
    Inter-Annotator Agreement Charts, Transfer Learning and Open Source Model Training in the Annotation Lab


    A new generation of the NLP Lab is now available: the Generative AI Lab. Check details here https://www.johnsnowlabs.com/nlp-lab/

We’re very excited to announce the new release of Annotation Lab 2.0, which adds numerous Inter-Annotator Agreement charts, Transfer Learning support for model training, training of community models without the need for a license, custom training scripts, and many more features that improve usability. With this version, Annotation Lab also moves in the direction of a single-page application, which means the UI components and navigation get faster with this and subsequent versions.

    Here are details of features and fixes included in this release.

    Inter-Annotator Agreement Charts (IAA)

To measure how consistently multiple annotators make the same annotation decision for a given category, we are shipping seven different charts. To see these charts, users can click on the third tab, “Inter-Annotator Agreement”, of the Analytics Dashboard of NER projects. Dropdown boxes allow changing the annotators being compared. The data behind some charts can also be downloaded in CSV format by clicking the download button in the bottom right corner of each chart.

Note: Only Submitted and Starred (Ground Truth) completions are used to render these charts.

• Shows the agreement percentage across all labels between two annotators (e.g. John and Jane)
• Shows agreement with a label-wise breakdown
• Shows whether two annotators agree on each annotated chunk
• Shows the agreement between one annotator and the preannotation results on each annotated chunk
• Shows the labels assigned to each chunk by one annotator, along with their context in the tasks
• Shows the frequency of each chunk-label pair for one annotator
• Shows the frequency of each chunk-annotator pair for one label
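
For intuition, here is a minimal sketch of how a per-label agreement percentage between two annotators could be computed from their submitted chunk annotations. This is not the Annotation Lab implementation; the function and sample data are purely illustrative:

```python
def agreement_by_label(chunks_a, chunks_b):
    """Per-label agreement percentage between two annotators, where each
    annotation is an exact (start, end, label) chunk."""
    per_label = {}
    for label in {label for _, _, label in chunks_a | chunks_b}:
        a = {c for c in chunks_a if c[2] == label}
        b = {c for c in chunks_b if c[2] == label}
        # agreed chunks divided by all chunks either annotator produced
        per_label[label] = 100.0 * len(a & b) / len(a | b)
    return per_label

# Illustrative annotations as sets of (start_offset, end_offset, label).
john = {(0, 4, "PERSON"), (10, 18, "LOCATION"), (25, 30, "PERSON")}
jane = {(0, 4, "PERSON"), (10, 18, "LOCATION"), (40, 44, "DATE")}
print(agreement_by_label(john, jane))
# e.g. {'PERSON': 50.0, 'LOCATION': 100.0, 'DATE': 0.0} (key order may vary)
```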

Changes in CoNLL Export

There are significant changes in the way Annotation Lab does the CoNLL export. In previous versions, numerous files were created based on tasks and completions, there were issues with the file header, no sentence boundaries were detected, and some punctuation tokens were exported incorrectly or were missing.

The new CoNLL export implementation produces a single file and fixes all of the above issues. As in previous versions, if only Starred completions are needed in the exported file, users can select the “Only ground truth” checkbox.
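
For reference, here is a small sketch of reading such a single-file export back into sentences of (token, tag) pairs. It assumes the standard CoNLL 2003 column layout (token first, NER tag last, blank lines separating sentences, -DOCSTART- markers between documents); treat it as illustrative rather than the exact on-disk format:

```python
def read_conll(path):
    """Parse a CoNLL 2003 style export into a list of sentences,
    each a list of (token, ner_tag) pairs."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line.startswith("-DOCSTART-"):
                continue  # document marker, not a token
            if not line:
                if current:  # blank line marks a sentence boundary
                    sentences.append(current)
                    current = []
                continue
            cols = line.split()
            current.append((cols[0], cols[-1]))  # first column: token, last: NER tag
    if current:
        sentences.append(current)
    return sentences
```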

    Search label:token

Now it is possible to list tasks based on annotation criteria. The search options are:

• label: ABC -> returns all tasks that have a completion with the label ABC
• label: ABC=DEF -> returns all tasks that have a completion with the label ABC tagged on the text DEF
• choice: Mychoice -> returns all tasks that have a completion with the choice Mychoice

The search is case insensitive: label: ABC=DEF, label: Abc=Def, and label: abc=def are considered equivalent.

    Example:

Consider a system with three annotated tasks, where task 0 and task 1 contain chunks labeled person (in task 1, the chunk “the water”), and task 2 contains a chunk labeled location:

• The search query “label:person” will list task 0 and task 1
• The search query “label:location” will list task 2
• The search query “label:person=the water” will list task 1
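
Conceptually, the search behaves like a case-insensitive filter over each task’s completions. The sketch below is a hypothetical reimplementation of the label-query behavior, with an illustrative task structure, not the product’s internals:

```python
def matches_query(task, query):
    """Illustrative matcher for queries like 'label:PERSON' or
    'label:PERSON=the water' over a task's completions."""
    kind, _, rest = query.partition(":")
    label, _, text = rest.strip().partition("=")
    for completion in task["completions"]:
        for ann in completion["annotations"]:
            if kind.strip().lower() == "label" and ann["label"].lower() == label.lower():
                if not text or ann["text"].lower() == text.lower():
                    return True
    return False

tasks = [
    {"completions": [{"annotations": [{"label": "Person", "text": "John"}]}]},
    {"completions": [{"annotations": [{"label": "Person", "text": "the water"}]}]},
    {"completions": [{"annotations": [{"label": "Location", "text": "Kathmandu"}]}]},
]
print([i for i, t in enumerate(tasks) if matches_query(t, "label:person")])            # [0, 1]
print([i for i, t in enumerate(tasks) if matches_query(t, "label:person=the water")])  # [1]
```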

    Model Validation

Labels and models are now validated beforehand, and an error message is shown if a label is incompatible with the selected models.


    Transfer Learning

Now it is possible to continue training from an already available model. If a Medical NER model is present in the system, the project owner or manager can go to the Advanced Options settings of the Training section on the Setup Page and choose it for fine-tuning. When fine-tuning is enabled, the embeddings that were used to train the base model need to be present in the system. If present, they are selected automatically; otherwise, users need to go to the Models Hub page and download or upload them.
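
Under the hood, this corresponds to resuming training from an existing model in Spark NLP for Healthcare. The sketch below assumes MedicalNerApproach and its setPretrainedModelPath option; the paths and column names are illustrative:

```python
from sparknlp.annotator import WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerApproach

# The embeddings the base model was trained with must be present in the system.
embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

ner = MedicalNerApproach() \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setLabelColumn("label") \
    .setOutputCol("ner") \
    .setPretrainedModelPath("/models/existing_medical_ner")  # assumed option: resume from this model
```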


This version uses Spark NLP and Spark NLP for Healthcare 3.1.3. Spark OCR has also been upgraded to version 3.5.0.

    Training without License

In previous versions, Annotation Lab did not allow training without a Spark NLP for Healthcare license. Now, training with community embeddings is possible even without a valid license.

    Custom Training Script

If users want to change the default training script included in the Annotation Lab, they can upload their own training pipeline; a sketch of what such a pipeline looks like is shown below. In the Training section of the Project Setup Page, only admin users can upload training scripts. At the moment, only custom NER training scripts are supported.
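
For illustration only (the exact contract that uploaded scripts must follow is defined by the Annotation Lab and is not shown here), a typical Spark NLP NER training pipeline boils down to something like this:

```python
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.annotator import NerDLApproach, WordEmbeddingsModel
from sparknlp.training import CoNLL

spark = sparknlp.start()

# Load CoNLL-formatted training data (e.g. the Annotation Lab export).
training_data = CoNLL().readDataset(spark, "train.conll")

embeddings = WordEmbeddingsModel.pretrained("glove_100d") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

ner = NerDLApproach() \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setLabelColumn("label") \
    .setOutputCol("ner") \
    .setMaxEpochs(10) \
    .setLr(0.001)

model = Pipeline(stages=[embeddings, ner]).fit(training_data)
```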


    UI Improvements

We are transforming the existing Annotation Lab UI into a single-page application. Users will notice the difference on multiple pages starting with this version. For many actions and navigations, the entire page no longer needs to reload, which makes the application feel much smoother than before. The following pages have already been converted, and others will follow in subsequent releases.

    • Project List Page
    • Create Project Page
    • About Us Page
    • Settings Page
    • Profile Page
    • Users module Page
    • License Page

    Note: As we are in the transition phase, there may be UI glitches or broken links within the application. We will keep fixing them.

Also, with the UI improvements listed below, uploading custom models or embeddings has become easier:

• The name of the model is prefilled based on the uploaded file
• Adding a description is now optional
• Adding a large number of labels is easier, as multiple comma-separated labels can be added at once
• Model or embeddings uploads can be canceled, resumed, or removed
• While a model or embeddings upload is in progress, the upload modal cannot be closed

    Miscellaneous

Users now see a proper message on the Models Hub page when the Annotation Lab is not connected to the internet (more precisely, to AWS S3). This happens in air-gapped environments or when there are issues in the enterprise network.


Users now have the option to download trained models from the Models Hub page. The download option is available under the overflow menu of each model on the “Available Models” tab.


The last Annotation Lab version introduced Live Training Logs, but they contained too many details. The logs are now much cleaner and more readable.

Not all embeddings present in the Models Hub are supported by NER and Assertion Status training. This is now properly validated in the UI.

Sometimes two users can conflict: one user may be using certain embeddings for training while another tries to delete them, either to free up disk space or by accident. Previously, the training would fail with unreadable exceptions and logs. The existence of the embeddings is now verified during training as well as deployment, and a readable message is shown to users.

It is now possible to provide a custom CA certificate chain to include in the Annotation Lab deployment. Follow the instructions in the instruction.md file present in the installation artifact.

    More Improvements

When a multi-page OCR file was imported using Spark OCR, the created task did not have pagination. Sometimes a deleted project was not completely cleared from the system, which led to various validation errors in the Labeling Config of new projects. Also, due to a bug in the Assertion Status training script, that training was not working at all. All of these identified bugs are now fixed.

Previously, any admin user could delete the main “admin” user, as well as itself. We have added proper validation to avoid such situations.


    Our additional expert:
    Nabin Khada leads the team building the Annotation Lab at John Snow Labs. He has 7 years of experience as a software engineer, covering a broad range of technologies from web & mobile apps to distributed systems and large-scale machine learning.
