A new generation of the NLP Lab is now available: the Generative AI Lab. Check the details here: https://www.johnsnowlabs.com/nlp-lab/
We’re very excited to announce the new release of Annotation Lab 2.0, which adds Inter-Annotator Agreement charts, Transfer Learning support for training models, training of community models without a license, custom training scripts, and many more features that enhance usability. With this version, Annotation Lab also moves toward becoming a single-page application, which means UI components and navigation get faster with this and subsequent versions.
Here are details of features and fixes included in this release.
Inter-Annotator Agreement Charts (IAA)
To measure how consistently multiple annotators make the same annotation decision for a given category, this release ships seven different charts. They are available on the third tab, “Inter-Annotator Agreement”, of the Analytics Dashboard for NER projects. Dropdown boxes allow changing the annotators being compared. The data behind some charts can also be downloaded in CSV format by clicking the download button at the bottom-right corner of each chart.
Note: Only the Submitted and Starred (Ground Truth) completions are used to render these charts.
- Shows the agreement percentage across all labels between two annotators (e.g. John and Jane)
- Shows agreement with a label-wise breakdown
- Shows whether two annotators agree on each annotated chunk
- Shows the agreement between one annotator and the preannotation results on each annotated chunk
- Shows the label each chunk received from one annotator, together with its context in the task
- Shows the frequency of chunk-label pairs for one annotator
- Shows the frequency of chunk-annotator pairs for one label
Changes in CoNLL Export
There are significant changes in the way Annotation Lab exports CoNLL files. In previous versions, numerous files were created, one per task and completion; the file headers were incorrect, no sentence boundaries were detected, and some punctuation was exported incorrectly or was missing entirely.
The new CoNLL export implementation produces a single file and fixes all of the above issues. As in previous versions, if only Starred completions are needed in the exported file, users can select the “Only ground truth” checkbox.
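For context, CoNLL exports in the Spark NLP ecosystem follow the four-column CoNLL 2003 layout (token, POS tag, chunk tag, label), with blank lines marking sentence boundaries and -DOCSTART- lines separating documents. The snippet below is purely illustrative: the tokens, tags, and labels are hypothetical, and the exact columns in your export may differ.

```
-DOCSTART- -X- -X- O

John NNP NNP B-PERSON
Smith NNP NNP I-PERSON
was VBD VBD O
admitted VBN VBN O
to TO TO O
hospital NN NN O
. . . O
```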
Search label:token
Now, it is possible to list the tasks based on some annotation criteria:
Searching options:
- “label: ABC” -> returns all tasks that have a completion with label ABC
- “label: ABC=DEF” -> returns all tasks that have a completion with label ABC applied to the text DEF
- “choice: Mychoice” -> returns all tasks that have a completion with the choice Mychoice

The search is case insensitive: “label: ABC=DEF”, “label: Abc=Def”, and “label: abc=def” are considered equivalent.
Example:
Consider a system with three tasks annotated as shown below:
- Search-query “label:person” will list task 0 and task 1
- Search-query “label:location” will list task 2
- Search-query “label:person=the water” will list task 1
Model Validation
Labels and models are now validated up front. An error message is shown if a label is incompatible with the selected models.
Transfer Learning
It is now possible to continue model training from an already available model. If a Medical NER model is present in the system, the project owner or manager can go to the Advanced Options settings of the Training section on the Setup Page and choose it for fine-tuning. When fine-tuning is enabled, the embeddings that were used to train the original model need to be present in the system. If present, they are selected automatically; otherwise, users need to go to the Models Hub page and download or upload them.
This version uses Spark NLP and Spark NLP for Healthcare 3.1.3. Spark OCR has also been upgraded to version 3.5.0.
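Conceptually, fine-tuning continues training from the weights of an existing model instead of starting from scratch. The sketch below is only an approximation of what happens behind the scenes: it assumes the licensed MedicalNerApproach annotator and its setPretrainedModelPath option (check the Spark NLP for Healthcare documentation for your version), and the paths, embeddings, and parameters are hypothetical.

```python
from pyspark.ml import Pipeline
import sparknlp_jsl
from sparknlp.training import CoNLL
from sparknlp.annotator import WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerApproach

# Start a licensed Spark session (license keys and secret are assumed to be configured)
spark = sparknlp_jsl.start("<license-secret>")

# CoNLL training data exported from Annotation Lab (hypothetical path)
training_data = CoNLL().readDataset(spark, "annotations.conll")

# The embeddings the original model was trained with must be available
embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

# Continue training from an existing Medical NER model instead of starting from scratch.
# setPretrainedModelPath is an assumption here; verify it against your library version.
ner_trainer = MedicalNerApproach() \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setLabelColumn("label") \
    .setOutputCol("ner") \
    .setPretrainedModelPath("/path/to/existing_medical_ner_model") \
    .setMaxEpochs(10) \
    .setLr(0.003) \
    .setBatchSize(8)

fine_tuned = Pipeline(stages=[embeddings, ner_trainer]).fit(training_data)
```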
Training without License
In previous versions, Annotation Lab did not allow training without a Spark NLP for Healthcare license. Training with community embeddings is now possible even without a valid license.
Custom Training Script
If users want to change the default training script shipped with Annotation Lab, they can upload their own training pipeline. Only admin users can upload training scripts, from the Training section of the Project Setup Page. At the moment, only custom NER training scripts are supported; a sketch of what such a script might look like is shown below.
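For a rough idea of what a training script contains, here is a minimal open-source Spark NLP NER pipeline trained on an exported CoNLL file with community embeddings (which also requires no Healthcare license). The entry points, file paths, and parameters Annotation Lab actually expects are defined by its default script, so everything below is illustrative only.

```python
from pyspark.ml import Pipeline
import sparknlp
from sparknlp.annotator import WordEmbeddingsModel, NerDLApproach
from sparknlp.training import CoNLL

spark = sparknlp.start()

# CoNLL file produced by the Annotation Lab export (hypothetical path)
training_data = CoNLL().readDataset(spark, "annotations.conll")

# Community embeddings -- no Healthcare license required
glove = WordEmbeddingsModel.pretrained("glove_100d") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

# Train a fresh NER model on the exported annotations
ner = NerDLApproach() \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setLabelColumn("label") \
    .setOutputCol("ner") \
    .setMaxEpochs(20) \
    .setLr(0.003) \
    .setBatchSize(8)

ner_model = Pipeline(stages=[glove, ner]).fit(training_data)

# Persist the trained NER stage so it can be reused later (hypothetical output path)
ner_model.stages[-1].write().overwrite().save("custom_ner_model")
```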
UI Improvements
We are planning to completely transform the existing Annotation Lab UI into a single-page application. Users will see the differences on multiple pages starting from this version: for many actions and navigations the entire page no longer needs to reload, which makes the application much smoother than before. The following pages have already been converted; others will follow in subsequent releases.
- Project List Page
- Create Project Page
- About Us Page
- Settings Page
- Profile Page
- Users module Page
- License Page
Note: As we are in the transition phase, there may be UI glitches or broken links within the application. We will keep fixing them.
Also, with the UI improvements listed below, uploading custom models or embeddings has become easier:
- The name of the model is prefilled based on the uploaded file
- Adding a description is now optional
- Adding a large number of labels is easier, as multiple comma-separated labels can be added at once
- Model and embeddings uploads can be canceled, resumed, or removed
- While a model or embeddings upload is in progress, the upload modal cannot be closed
Miscellaneous
Users now see a proper message on the Models Hub page when Annotation Lab is not connected to the internet (to AWS S3, to be more precise). This happens in air-gapped environments or when there are issues in the enterprise network.
Users now have the option to download the trained models from the Models Hub page. The download option is available under the overflow menu of each Model on the “Available Models” tab.
The previous Annotation Lab version introduced live training logs, but they contained too many details. The logs have now been made much cleaner and more readable.
Not all embeddings present in the Models Hub are supported by NER and Assertion Status training. This is now properly validated in the UI.
Sometimes two users can conflict: one user may be using embeddings for training while another tries to delete them, either to free up disk space or by accident. Previously, the training would fail with unpleasant exceptions in the logs. The existence of the embeddings is now checked for both training and deployment, and a readable message is shown to users.
It is now possible to provide a custom CA certificate chain to include in the Annotation Lab deployment. Follow the instruction.md file present in the installation artifact.
More Improvements
When a multi-page OCR file was imported using Spark OCR, the created task did not have pagination.
Sometimes a deleted project was not completely cleared from the system, which led to various validation errors on the Labeling Config of a new project.
Also, due to a bug in the Assertion Status training script, training was not working at all. All of these identified bugs are now fixed.
Previously, any admin user could delete the main “admin” user, or even itself. Proper validation has been added to avoid such situations.