was successfully added to your cart.

    Spark NLP For Healthcare 2.7.3 with Biobert Extraction Models, Higher Accuracy, De-Identification, New Radiology NER Model & More

    Spark NLP for Healthcare 2.7.3 is available now! If you’re an existing customer, expect an email with upgrade instructions.

    In this release we introduce the following new models, features, and improvements:

    • Massive improvements & feature enhancements in De-Identification module
    • A brand-new RelationExtractionDL Annotator
    • Drug Normalizer
    • Confidence Scores in assertion output
    • Cosine similarity metrics in entity resolvers
    • AuxLabel in the metadata of entity resolvers
    • New Relation Extraction models
    • New Entity Resolver models
    • New Clinical Pretrained NER model
    • Bug fixes & general improvements
    • Matching the version with Spark NLP open-source v2.7.3

     

    Improvements in De-Identification Module

     

    Faker Library

    This release introduces the ‘faker’ library in De-Identification Module to automatically generate random data like names, dates, addresses etc. so users do not have to specify dummy data (custom obfuscation files can still be used). It also improves the obfuscation results due to a bigger pool of random values.

     

    How to use

     

    Structured De-Identification Module

    This release comes with a new annotator to handle the de-identification of structured data. It allows users to define a mapping of columns and their obfuscation policy. Users can also provide dummy data and map them to columns they want to replace values in.

     

    How to use

     

    Input:

    Deidentified:

     

    Introducing SOTA Relation Extraction model using BioBert

    We released a brand-new end-to-end trained BERT model, resulting in massive improvements. Another new annotator (ReChunksFilter) is also developed for this new model to allow syntactic features to work well with BioBert to extract relations.

     

    How to use

     

    Drug Normalizer

    This release comes with Drug Normalizer to standardize units of drugs and handle abbreviations in raw text or drug chunks identified by any NER model. This normalization significantly improves the performance of entity resolvers.

     

    How to use

     

    Assertion models to support confidence in the output

    Just like NER output, now assertion models also provide confidence scores for each prediction.

     

    New Relation Extraction Models

    This release introduces 3 new relation extraction models between body parts and clinical entities and 1 new relation extraction model between date and clinical entities. These models are trained using a binary relation extraction approach for better accuracy.

    • re_bodypart_direction : Relation Extraction between Body Part and Direction entities.

    Example:

    Text: “MRI demonstrated infarction in the upper brain stem, left cerebellum and right basil ganglia”

    • re_bodypart_problem : Relation Extraction between Body Part and Problem entities.

    Example:

    Text: “No neurologic deficits other than some numbness in his left hand.”

    • re_bodypart_proceduretest : Relation Extraction between Body Part and Procedure, Test entities.

    Example:

    Text: “TECHNIQUE IN DETAIL: After informed consent was obtained from the patient and his mother, the chest was scanned with portable ultrasound.”

    • re_date_clinical : Relation Extraction between Date and different clinical entities.

    Example:

    Text: “This 73 y/o patient had CT on 1/12/95, with progressive memory and cognitive decline since 8/11/94.”

     

    How to use

     

    New Resolver Models using JSL SBERT

    • sbiobertresolve_icd10cm_augmented
    • sbiobertresolve_cpt_augmented
    • sbiobertresolve_cpt_procedures_augmented
    • sbiobertresolve_icd10cm_augmented_billable_hcc
    • sbiobertresolve_hcc_augmented

    Returning auxiliary columns mapped to resolutions: Chunk entity resolver and sentence entity resolver now returns auxiliary data that is mapped the resolutions during training. This will allow users to get multiple resolutions with a single model without using any other annotator in the pipeline (In order to get billable codes otherwise there need to be other modules in the same pipeline)

    We also added a cosine similarity metric in metadata to resolve entities and find the closest matches, resulting in better, more semantically correct results.

    • sbiobertresolve_icd10cm_augmented_billable_hcc

    Example:

    Input Text: “bladder cancer”

    • sbiobertresolve_cpt_augmented

    Example:

    Input Text: “ct abdomen without contrast”

     

    New Pretrained Clinical NER Model

    • NER Radiology

    Input Text: “Bilateral breast ultrasound was subsequently performed, which demonstrated an ovoid mass measuring approximately 0.5 x 0.5 x 0.4 cm in diameter located within the anteromedial aspect of the left shoulder. This mass demonstrates isoechoic echotexture to the adjacent muscle, with no evidence of internal color flow. This may represent benign fibrous tissue or a lipoma.”

    How useful was this post?

    Our additional expert:
    Veysel is a Head of Data Science at John Snow Labs, improving the Spark NLP for the Healthcare library and delivering hands-on projects in Healthcare and Life Science. Holding a PhD degree in ML, Dr. Kocaman has authored more than 25 papers in peer reviewed journals and conferences in the last few years, focusing on solving real world problems in healthcare with NLP. He is a seasoned data scientist with a strong background in every aspect of data science including machine learning, artificial intelligence, and big data with over ten years of experience. Veysel has broad consulting experience in Statistics, Data Science, Software Architecture, DevOps, Machine Learning, and AI to several start-ups, boot camps, and companies around the globe. He also speaks at Data Science & AI events, conferences and workshops, and has delivered more than a hundred talks at international as well as national conferences and meetups.

    Automated Drug Adverse Event Detection from Unstructured Text

    Adverse Drug Events (ADEs) are potentially very dangerous to patients and are amongst the top causes of morbidity and mortality. Monitoring &...
    preloader