The goal of this study is to run a series of performance tests to check the responsiveness of the Annotation Lab web application under heavy workloads, in terms of both a high number of concurrent users and database scalability. The most problematic pages are the projects page and the tasks page, because they handle large amounts of data. This performance study focuses on loading those two pages.
The tests were conducted with the automated load testing tool JMeter. The database was populated with synthetic user profiles and projects, as described below.
Annotation Lab Configuration
• 20 NER projects with taxonomies of 30 labels each were created, and 125 tasks were imported to each project.
• Each task consisted of 600,000 characters of text (around 200 pages).
• 7000 labeled tokens/chunks were added to each task.
• 20 users were assigned to all 20 projects as Annotator and Manager.
• 6 floating licenses were imported.
• 5 preannotation servers, as well as 1 training server, were run in parallel. The deployed model was: ner_jsl with embeddings_clinical.
Cluster Specification:
• Number of EKS nodes: 3
• Instance type: t2.2xlarge
• Specification: 8-core CPU with 32 GB RAM
The preannotation and training servers were first used to predict and train with the minimal configuration (1 replica, 1 web server worker, and 1 thread) to check how many preannotation and training servers could be deployed at the same time on an EKS cluster with 3 nodes. As the number of replicas, web server workers, and web server threads was gradually increased, the system was able to handle more users, at the cost of more resources.
Hence, to allow teams of over 100 users to work in parallel on the same Annotation Lab instance deployed on the above cluster specification, the system should not be running preannotation or training at the same time.
Performance/Stress Test:
A JMeter script was used to call the API that retrieves the project list page and the task list of a project, once per thread. In JMeter, the number of threads corresponds to the number of users using the web application concurrently.
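The per-thread request pattern described above can be approximated outside JMeter. The following is a minimal Python sketch, not the script used in the study: the base URL and both API paths are hypothetical placeholders, authentication is not handled, and the error percentage is simply the share of requests that did not return HTTP 200.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Hypothetical values: the real Annotation Lab host and API paths
# are not given in this study.
BASE_URL = "http://localhost:8080"
PROJECTS_PATH = "/api/projects"
TASKS_PATH = "/api/projects/demo/tasks"


def http_get(path, timeout=30):
    """Return True if the request completes with HTTP 200, False otherwise."""
    try:
        with urlopen(BASE_URL + path, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        return False


def simulate_user(fetch):
    """One simulated user loads the project list, then a task list."""
    return [fetch(PROJECTS_PATH), fetch(TASKS_PATH)]


def error_percentage(n_users, fetch=http_get):
    """Run n_users simulated users concurrently; return the % of failed requests."""
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        per_user = list(pool.map(lambda _: simulate_user(fetch), range(n_users)))
    outcomes = [ok for user in per_user for ok in user]
    return 100.0 * outcomes.count(False) / len(outcomes)
```

Passing `fetch` as a parameter keeps the concurrency logic separate from the HTTP call, so the sketch can be tried with a stub function before pointing it at a real instance.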
In the most constrained scenario, when the base Annotation Lab configuration (1 replica, 1 web server worker, and 8 web server threads) is used, the application can support 10 concurrent users.
To improve the performance of Annotation Lab so that it can handle a higher number of users, several parameters were tuned.
First, the number of web server threads was increased from 8 to 10 and then to 12. With this change, the application handled 25 users without errors, and errors appeared at 30 users. In the table below you can see that the performance boost from 8 to 10 threads was much higher than the performance boost from 10 to 12 web server threads.
Second, the number of web server workers was increased from 1 to 2 and then to 3. The application handled 45 users without errors, and errors appeared at 50 users. You can notice that the performance boost was better from 2 workers to 3 workers than from 1 worker to 2 workers.
Finally, the number of replicas was increased from 1 to 2 and then to 3. The application handled 100 users without errors, and errors appeared at 110 users. Again, the performance boost was better from 1 replica to 2 replicas than from 2 replicas to 3 replicas.
SN | Replicas | Webserver Worker | Webserver Thread | No of User (Threads) | Result |
1 | 1 | 1 | 8 | 10 | pass |
2 | 1 | 1 | 8 | 12 | % |
3 | 1 | 1 | 10 | 12 | pass |
4 | 1 | 1 | 10 | 15 | pass |
5 | 1 | 1 | 10 | 20 | pass |
6 | 1 | 1 | 10 | 25 | slight error % |
7 | 1 | 1 | 12 | 25 | pass |
8 | 1 | 1 | 12 | 30 | error % |
9 | 1 | 2 | 12 | 30 | pass |
10 | 1 | 2 | 12 | 35 | pass |
11 | 1 | 2 | 12 | 40 | error % |
12 | 1 | 3 | 12 | 45 | pass |
13 | 1 | 3 | 12 | 50 | error % |
14 | 2 | 1 | 8 | 22 | pass |
15 | 2 | 1 | 8 | 24 | slight error % |
16 | 2 | 1 | 10 | 35 | pass |
17 | 2 | 1 | 10 | 40 | error % |
18 | 2 | 1 | 12 | 43 | pass |
19 | 2 | 1 | 12 | 45 | slight error % |
20 | 2 | 2 | 12 | 50 | pass |
21 | 2 | 2 | 12 | 55 | slight error % |
22 | 2 | 3 | 12 | 80 | pass |
23 | 2 | 3 | 12 | 90 | slight error % |
24 | 3 | 3 | 12 | 100 | pass |
25 | 3 | 3 | 12 | 110 | error % |
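To read the table more easily, the rows can be reduced to the highest user count that passed for each configuration. The sketch below copies the rows as they appear above. The function name is illustrative, and any result other than "pass" (including the bare "%" in row 2) is treated as not passing.

```python
# Rows copied from the results table:
# (replicas, web server workers, web server threads, users, result)
ROWS = [
    (1, 1, 8, 10, "pass"), (1, 1, 8, 12, "%"),
    (1, 1, 10, 12, "pass"), (1, 1, 10, 15, "pass"),
    (1, 1, 10, 20, "pass"), (1, 1, 10, 25, "slight error %"),
    (1, 1, 12, 25, "pass"), (1, 1, 12, 30, "error %"),
    (1, 2, 12, 30, "pass"), (1, 2, 12, 35, "pass"),
    (1, 2, 12, 40, "error %"),
    (1, 3, 12, 45, "pass"), (1, 3, 12, 50, "error %"),
    (2, 1, 8, 22, "pass"), (2, 1, 8, 24, "slight error %"),
    (2, 1, 10, 35, "pass"), (2, 1, 10, 40, "error %"),
    (2, 1, 12, 43, "pass"), (2, 1, 12, 45, "slight error %"),
    (2, 2, 12, 50, "pass"), (2, 2, 12, 55, "slight error %"),
    (2, 3, 12, 80, "pass"), (2, 3, 12, 90, "slight error %"),
    (3, 3, 12, 100, "pass"), (3, 3, 12, 110, "error %"),
]


def max_passing_users(rows):
    """Map each (replicas, workers, threads) to its highest passing user count."""
    best = {}
    for replicas, workers, threads, users, result in rows:
        if result == "pass":
            key = (replicas, workers, threads)
            best[key] = max(best.get(key, 0), users)
    return best


for (replicas, workers, threads), users in sorted(max_passing_users(ROWS).items()):
    print(f"replicas={replicas} workers={workers} threads={threads}: {users} users")
```

Note that these values are lower bounds: the true limit of each configuration lies somewhere between the highest passing load and the first load that produced errors.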
Conclusion:
The results of this performance test show that the EKS Annotation Lab configuration with 2 replicas, 3 webserver workers, and 12 web server threads is optimal in terms of resource cost to performance ratio.
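One simple way to look at the cost to performance ratio is to compare how many additional users each extra replica supports. The sketch below uses the highest passing user counts from the table for 3 web server workers and 12 web server threads. Treating the replica count as a proxy for resource cost is an assumption made here for illustration, not a metric reported by the study.

```python
# Highest passing user counts from the table for 3 web server workers
# and 12 web server threads, keyed by number of replicas.
MAX_USERS_BY_REPLICAS = {1: 45, 2: 80, 3: 100}


def marginal_gains(max_users):
    """Extra users supported by each added replica, keyed by the new replica count."""
    counts = sorted(max_users)
    return {new: max_users[new] - max_users[old]
            for old, new in zip(counts, counts[1:])}


def users_per_replica(max_users):
    """Average number of supported users per replica (replicas as a cost proxy)."""
    return {replicas: users / replicas for replicas, users in max_users.items()}


print(marginal_gains(MAX_USERS_BY_REPLICAS))
print(users_per_replica(MAX_USERS_BY_REPLICAS))
```

Under this reading, the second replica adds 35 users and the third adds 20, which matches the diminishing return noted for the step from 2 to 3 replicas.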