
    Inter-Annotator Agreement Charts, Transfer Learning and Open Source Model Training in the Annotation Lab


    A new generation of the NLP Lab is now available: the Generative AI Lab. Check details here https://www.johnsnowlabs.com/nlp-lab/

    We’re very excited to announce the release of Annotation Lab 2.0, which adds numerous Inter-Annotator Agreement charts, Transfer Learning support for model training, training of community models without the need for a license, custom training scripts, and many more features that improve usability. With this version, Annotation Lab also starts moving toward a single-page application, which means UI components and navigation get faster with this and subsequent versions.

    Here are details of features and fixes included in this release.

    Inter-Annotator Agreement Charts (IAA)

    To measure how consistently multiple annotators make the same annotation decision for a given category, we are shipping seven different charts. To see them, users can click on the third tab, “Inter-Annotator Agreement”, of the Analytics Dashboard of NER projects. Dropdown boxes let users change which annotators are compared. The data behind some charts can also be downloaded in CSV format by clicking the download button at the bottom-right corner of each chart.

    Note: Only the Submitted and Starred (Ground Truth) completions are used to render these charts.

    • Shows the agreement percentage across all labels between two annotators (e.g., John and Jane)
    • Shows agreement with a label-wise breakdown
    • Shows whether two annotators agree on each annotated chunk
    • Shows the agreement between one annotator and the preannotation results on each annotated chunk
    • Shows the labels assigned to each chunk by one annotator, together with their context in the tasks
    • Shows the frequency of each chunk-label pair for one annotator
    • Shows the frequency of each chunk-annotator pair for one label
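
    The exact formulas behind these charts are not documented in this post, but as an intuition for what chunk-level agreement means, here is a minimal Python sketch that treats each (start, end, label) chunk as a single annotation decision (the data model is a simplifying assumption, not the Annotation Lab’s internals):

        # Minimal sketch of chunk-level agreement between two annotators.
        # Assumption: this is not the exact formula used by the charts; it
        # simply treats each (start, end, label) chunk as one decision.
        def chunk_agreement(ann_a, ann_b):
            """ann_a, ann_b: sets of (start, end, label) tuples from two annotators."""
            union = ann_a | ann_b
            if not union:
                return 1.0  # neither annotator marked anything: full agreement
            return len(ann_a & ann_b) / len(union)

        john = {(0, 4, "PERSON"), (10, 18, "LOCATION")}
        jane = {(0, 4, "PERSON"), (10, 18, "DATE")}
        print(f"{chunk_agreement(john, jane):.0%}")  # 33%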

    Changes in CoNLL Export

    There are significant changes in the way the Annotation Lab performs CoNLL export. In previous versions, numerous files were created based on tasks and completions, the header was malformed, sentence boundaries were not detected, and some punctuation marks were exported incorrectly or were missing.

    The new CoNLL export produces a single file and fixes all of the above issues. As in previous versions, users who only need Starred completions in the exported file can select the “Only ground truth” checkbox.
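
    For reference, a CoNLL 2003-style file stores one token per line with part-of-speech, chunk, and NER columns, and blank lines separate sentences; an illustrative fragment (the values are made up, and the middle columns depend on the export) looks like this:

        -DOCSTART- -X- -X- O

        John NNP B-NP B-PER
        Snow NNP I-NP I-PER
        lives VBZ B-VP O
        in IN B-PP O
        London NNP B-NP B-LOC
        . . O O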

    Search label:token

    It is now possible to list tasks based on annotation criteria.

    Searching options:

    • label: ABC -> returns all tasks that have a completion with label ABC
    • label: ABC=DEF -> returns all tasks that have a completion with label ABC applied to the text DEF
    • choice: Mychoice -> returns all tasks that have a completion with choice Mychoice

    Search is case insensitive: label: ABC=DEF, label: Abc=Def, and label: abc=def are all equivalent.

    Example:

    Consider a system with 3 tasks, annotated as follows: tasks 0 and 1 each contain a chunk labeled person (in task 1 the labeled chunk is the text “the water”), and task 2 contains a chunk labeled location.

    • The search query “label:person” will list task 0 and task 1
    • The search query “label:location” will list task 2
    • The search query “label:person=the water” will list task 1
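
    The matching itself happens server-side; purely as an illustration of the rule described above, here is a hypothetical Python sketch (field names such as completions, annotations, label, and text are assumptions, not the Annotation Lab’s actual data model):

        # Hypothetical sketch of the case-insensitive label search rule; the
        # real implementation is server-side and is not documented here.
        # choice: queries would match a completion's choices analogously.
        def matches(task, query):
            kind, _, rest = query.partition(":")
            value, _, text = rest.strip().partition("=")
            for completion in task.get("completions", []):
                for ann in completion.get("annotations", []):
                    if kind.strip().lower() == "label" and ann["label"].lower() == value.lower():
                        if not text or ann["text"].lower() == text.lower():
                            return True
            return False

        task1 = {"completions": [{"annotations": [{"label": "person", "text": "the water"}]}]}
        print(matches(task1, "label:Person=The Water"))  # True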

    Model Validation

    Labels and models are now validated before training starts, and an error message is shown if a label is incompatible with the selected models.


    Transfer Learning

    It is now possible to continue model training from an already available model. If a Medical NER model is present in the system, the project owner or manager can go to the Advanced Options settings of the Training section on the Setup Page and choose it for fine-tuning. When fine-tuning is enabled, the embeddings that were used to train the base model need to be present in the system. If they are, they will be selected automatically; otherwise, users need to go to the Models Hub page and download or upload them.
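
    Inside the Annotation Lab this is driven entirely from the UI; for readers using Spark NLP for Healthcare directly, fine-tuning maps roughly to the transfer-learning option of MedicalNerApproach sketched below (the model path and parameter values are assumptions):

        # Rough sketch of NER fine-tuning with Spark NLP for Healthcare.
        # Assumptions: a licensed sparknlp_jsl installation, a base model saved
        # under /models/ner_base, and the same embeddings the base model was
        # trained with already present in the pipeline.
        from sparknlp_jsl.annotator import MedicalNerApproach

        ner = (MedicalNerApproach()
               .setInputCols(["sentence", "token", "embeddings"])
               .setLabelColumn("label")
               .setOutputCol("ner")
               .setPretrainedModelPath("/models/ner_base")  # continue from this model
               .setMaxEpochs(10))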


    This version uses version 3.1.3 of both Spark NLP and Spark NLP for Healthcare. Spark OCR has also been upgraded, to version 3.5.0.

    Training without a License

    In previous versions, the Annotation Lab did not allow training without a Spark NLP for Healthcare license. Now, training with community embeddings is possible even without a valid license.

    Custom Training Script

    Users who want to change the default training script shipped with the Annotation Lab can upload their own training pipeline in the Training section of the Project Setup Page; only admin users can upload training scripts. At the moment, only custom NER training scripts are supported (see the sketch below).

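    The interface a custom script must expose is not described in this post; as a point of reference, a typical community NER training pipeline in Spark NLP (which also works without a Healthcare license, matching the previous section) looks roughly like this:

        # Sketch of a community NER training pipeline with Spark NLP; no
        # Healthcare license required. File names and parameters are assumptions.
        import sparknlp
        from sparknlp.training import CoNLL
        from sparknlp.annotator import WordEmbeddingsModel, NerDLApproach
        from pyspark.ml import Pipeline

        spark = sparknlp.start()
        training_data = CoNLL().readDataset(spark, "annotations.conll")  # e.g. the Lab's CoNLL export

        embeddings = (WordEmbeddingsModel.pretrained("glove_100d")
                      .setInputCols(["sentence", "token"])
                      .setOutputCol("embeddings"))
        ner = (NerDLApproach()
               .setInputCols(["sentence", "token", "embeddings"])
               .setLabelColumn("label")
               .setOutputCol("ner")
               .setMaxEpochs(10))

        model = Pipeline(stages=[embeddings, ner]).fit(training_data)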

    UI Improvements

    We are planning to transform the existing Annotation Lab UI completely into a single-page application. Users will notice differences on multiple pages starting with this version. Many actions and navigations no longer require a full page reload, which makes the application feel much smoother than before. The following pages have already been converted, and others will follow in subsequent releases.

    • Project List Page
    • Create Project Page
    • About Us Page
    • Settings Page
    • Profile Page
    • Users module Page
    • License Page

    Note: As we are in the transition phase, there may be UI glitches or broken links within the application. We will keep fixing them.

    Also, thanks to the UI improvements listed below, uploading custom models or embeddings has become easier:

    • The name of the model is prefilled based on the uploaded file
    • Adding a description is now optional
    • Adding a large number of labels is easier, as multiple comma-separated labels can be added at once
    • Model or embeddings uploads can be canceled, resumed, and removed
    • While a model or embeddings upload is in progress, the upload modal cannot be closed

    Miscellaneous

    Users now see a clear message on the Models Hub page when the Annotation Lab is not connected to the internet (more precisely, to AWS S3). This happens in air-gapped environments or when there are issues with the enterprise network.


    Users now have the option to download the trained models from the Models Hub page. The download option is available under the overflow menu of each Model on the “Available Models” tab.


    The last Annotation Lab version introduced live training logs, but they included too many details. The logs are now much cleaner and more readable.

    Not all embeddings present in the Models Hub are supported by NER and Assertion Status training; this is now properly validated in the UI.

    Sometimes two users could conflict: one user might be using embeddings for training while another tried to delete the same embeddings, either to free up disk space or by accident. In that case, the training would fail with unreadable exceptions in the logs. The existence of the embeddings is now verified during both training and deployment, and a readable message is shown to users.

    It is now possible to provide a custom CA certificate chain to be included in the Annotation Lab deployment; follow the instruction.md file present in the installation artifact.

    More Improvements

    Several identified bugs are fixed in this release. When a multi-page OCR file was imported using Spark OCR, the created task did not have pagination. Sometimes a deleted project was not completely cleared from the system, which led to various validation errors in the Labeling Config of a new project. Also, due to a bug in the Assertion Status script, that training was not working at all.

    In addition, any admin user could delete the main “admin” user, or even their own account. Proper validation has been added to prevent this.
