Parallel Training and Preannotation with support for Floating Licenses in the Annotation Lab

07.04.2022

Nabin Khadka

Data Scientist at John Snow Labs


We are very excited to release Annotation Lab 3 with support for Floating Licenses and for parallel training and preannotation jobs, created on demand by Project Owners and Managers across various projects. Below are more details about the release.

Support Floating Licenses

Annotation Lab now supports floating licenses with different scopes (ocr: training, ocr: inference, healthcare: inference, healthcare: training). Depending on the scope of the available license, users can perform model training and/or deploy preannotation servers. Licenses are required only for training Spark NLP for Healthcare models and for deploying Spark NLP for Healthcare models as preannotation servers.
One floating license can be bound to only one server (preannotation server, OCR server, training job) at a time. To run multiple model training jobs and/or preannotation servers, users must provide multiple floating licenses.
Annotation Lab supports either floating licenses or air-gapped licenses. Mixing floating and air-gapped licenses on the same Annotation Lab instance is not allowed.
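
The binding rule can be pictured with a small model. The following is a minimal Python sketch, not Annotation Lab code: the FloatingLicense and LicensePool names, the server names, and the way scopes are matched are all assumptions made for illustration. Only the scope strings come from the list above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FloatingLicense:
    """Hypothetical model of a floating license (not an Annotation Lab class)."""
    name: str
    scopes: frozenset                 # e.g. {"healthcare: training"}
    bound_to: Optional[str] = None    # server currently holding the license


@dataclass
class LicensePool:
    """Hypothetical pool enforcing 'one license, one server at a time'."""
    licenses: list = field(default_factory=list)

    def acquire(self, server: str, scope: str) -> Optional[FloatingLicense]:
        """Bind the first free license that covers `scope` to `server`."""
        for lic in self.licenses:
            if lic.bound_to is None and scope in lic.scopes:
                lic.bound_to = server
                return lic
        return None  # no free license with this scope

    def release(self, server: str) -> None:
        """Free the license held by `server`, e.g. after the server is deleted."""
        for lic in self.licenses:
            if lic.bound_to == server:
                lic.bound_to = None


pool = LicensePool([
    FloatingLicense("lic-1", frozenset({"healthcare: training", "healthcare: inference"})),
])
first = pool.acquire("preannotation-a", "healthcare: inference")
second = pool.acquire("training-b", "healthcare: training")  # license is already bound
```

In this sketch the second request gets nothing back until `pool.release("preannotation-a")` is called, which mirrors the need for one floating license per concurrently running server.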

Parallel Trainings and Preannotations

When Annotation Lab is part of an enterprise-level deployment, the annotation team can be distributed across different locations or countries, or logically divided by the sub-domain or type of documents it covers. In this context, teams can end up competing for shared resources (e.g. a single preannotation server and/or a single OCR server) when running preannotation on a large number of documents across multiple projects. Things can get even worse when running multiple training jobs, because each job can take hours to complete, and waiting for another training/active learning process to end is not a good experience. Annotation Lab now offers a way of running model training and document preannotation across multiple projects and/or teams in parallel. If the infrastructure dedicated to the Annotation Lab includes sufficient resources, each team/project can run smoothly without being blocked.

On-demand deployment of preannotation servers and training jobs

Even though Annotation Lab does not impose any restriction on the number of parallel training and preannotation processes, it is good practice to bind the setup to an upper limit. This way, you make sure the number of parallel resources created on the fly does not exceed the given threshold. When the maximum number of deployed servers is reached, a Project Owner/Manager has the option to remove the ones that are no longer needed and free resources for new deployments. This can be done from the Settings page – Server tab. The DevOps engineer installing the Annotation Lab must specify this value during the installation process, using the model_server.count parameter of the installation/update script. Note: This count can be changed at any time by updating the value of the model_server.count parameter in the update script and running it again.
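
To make the effect of the upper limit concrete, here is a minimal Python sketch of the behavior described above. It is an illustration under assumptions, not the Annotation Lab implementation: ServerRegistry, ServerLimitReached and the server names are hypothetical, and max_servers simply plays the role of the model_server.count value.

```python
class ServerLimitReached(Exception):
    """Raised when the configured upper limit of servers is already in use."""


class ServerRegistry:
    """Hypothetical stand-in for the server list (not Annotation Lab code)."""

    def __init__(self, max_servers: int) -> None:
        # max_servers plays the role of the model_server.count setting.
        self.max_servers = max_servers
        self.servers = {}  # server name -> usage (preannotation, training, OCR)

    def deploy(self, name: str, usage: str) -> None:
        """Register a new server, refusing once the limit is reached."""
        if len(self.servers) >= self.max_servers:
            raise ServerLimitReached(
                f"limit of {self.max_servers} servers reached; delete one first"
            )
        self.servers[name] = usage

    def delete(self, name: str) -> None:
        """Remove a server that is no longer needed, freeing one slot."""
        self.servers.pop(name, None)


registry = ServerRegistry(max_servers=2)
registry.deploy("preannotation-a", "preannotation")
registry.deploy("training-b", "training")
try:
    registry.deploy("ocr-c", "OCR")
    third_deployed = True
except ServerLimitReached:
    third_deployed = False  # the limit of 2 blocks the third server

registry.delete("preannotation-a")  # free a slot
registry.deploy("ocr-c", "OCR")     # now succeeds
```

The design point is that deleting a server is the only way to make room once the limit is hit, which is why the Server tab exposes a delete action.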

Deploy a new training job

With this release, users can run multiple training jobs at the same time, depending on the available resources/license(s). Users can opt to create new training jobs independently of already running training/preannotation/OCR jobs. If resources/licenses are available when the Train Now button is pressed, a new training server is launched.

Deploy a new preannotation server

Previously, only one project could run preannotation at a time. While preannotation was running on one project, it was unavailable to all other projects. This release is all about pushing the limits of concurrency: multiple preannotation servers can now be deployed simultaneously across multiple projects, so one or more projects can spawn their own custom preannotation servers and perform inference.

Concurrency is not only supported between preannotation servers but between training and preannotation too. Users can have training running on one project and preannotation running on another project at the same time.

OCR and Visual NER servers

Just like preannotation servers, Annotation Lab 3.0.0 also supports the deployment of multiple OCR servers. If a user has uploaded a Spark OCR license, be it air-gapped or floating, OCR inference is enabled. To create a Visual NER project, users have to deploy at least one OCR server. Any OCR server can perform preannotation. To select the OCR server, users have to go to the Import page, toggle the OCR option and, from the popup, choose one of the available OCR servers. If no OCR server is present, users can deploy a new one instantly by clicking the deploy button.

Usage of Spark NLP for Healthcare licenses

The number of available floating licenses also limits how many training and preannotation servers can be created. For example, to deploy 5 preannotation servers using Spark NLP for Healthcare models or embeddings across 5 different projects, you will need 5 floating licenses. Since one floating license can only be used for one server at a time, it is not possible to deploy a preannotation server and then trigger training from the same project using the same license. In this case, the preannotation server has to be deleted first, and then the training can be started.

These restrictions do not apply when using air-gapped licenses or when using Spark NLP models and embeddings.
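
As a worked example of this arithmetic, the sketch below counts the floating licenses needed for a set of servers running at the same time. It is a simplification under stated assumptions: the function name, the (name, uses_healthcare_models) tuple layout and the project names are hypothetical, and OCR servers are left out.

```python
def floating_licenses_needed(servers, air_gapped=False):
    """Count the floating licenses needed to run `servers` concurrently.

    Each server is a (name, uses_healthcare_models) pair. Servers that only
    use Spark NLP models and embeddings need no license, and the one-license-
    per-server restriction does not apply with an air-gapped license.
    """
    if air_gapped:
        return 0
    return sum(1 for _name, uses_healthcare in servers if uses_healthcare)


# Five preannotation servers with Healthcare models, one per project.
healthcare_servers = [(f"project-{i}-preannotation", True) for i in range(1, 6)]

# Two Healthcare servers plus one server using only Spark NLP models.
mixed_servers = healthcare_servers[:2] + [("project-6-preannotation", False)]

print(floating_licenses_needed(healthcare_servers))                    # 5
print(floating_licenses_needed(mixed_servers))                         # 2
print(floating_licenses_needed(healthcare_servers, air_gapped=True))   # 0
```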

Management of Preannotation and Training Servers

This release of Annotation Lab gives users the ability to view the list of all active servers. Any user can access the Server List page by navigating to the Settings page > Server tab. This page gives the following details:

A summary of the status/limitations of the current infrastructure for running Spark NLP for Healthcare training jobs and/or preannotation servers.
The ability to delete a server and free up resources when required, so that another training job and/or preannotation server can be started.
The details of each server:
- Server Name: The name of the server, which helps identify it when running preannotation or importing files.
- License Details: The license used by the server and its scope.
- Usage: What the server is used for. A server can be used for preannotation, training or OCR.
- Deployed by: The user who deployed the server. This information is useful for contacting that user before deleting the server.
- Deployed at: The time when the server was deployed.

New options available when running preannotation

Since multiple preannotation servers can now be available to preannotate the tasks of a project, the dialog box that opens when clicking the Preannotate button on the Tasks page has been extended with new options. Namely, Project Owners or Managers can now select the server to use. Once a server is selected, information about the configuration deployed on it is shown in the popup, so users can make an informed decision on which server to use.

If the target preannotation server does not exist yet, the dialog box also offers the option to deploy a new server with the current project’s configuration. If this option is selected and enough resources are available (infrastructure capacity and a free license, if required), the server is deployed and preannotation can be started. If there are no free resources, users can delete one or several existing servers from the Settings page – Server tab.

Reuse existing or create new OCR servers for image & PDF processing

On the tasks import page, toggling the OCR option activates a new popup where users can select an existing OCR server to use for the import of images and PDF files. If no suitable OCR server is available, one can be created by choosing the “Create New” option.

License Page

Users can see an info message regarding the number of servers that can be spawned based on available licenses.

Get & Install It HERE.

Full Feature Set HERE.

Nabin Khadka leads the team building the Annotation Lab at John Snow Labs. He has 7 years of experience as a software engineer, covering a broad range of technologies from web & mobile apps to distributed systems and large-scale machine learning.