
    Comparing De-Identification Performance: Healthcare NLP, Azure Health Data Services, And Amazon Medical Comprehend


    In an era of rapidly advancing healthcare technology, the protection of patient privacy is more critical than ever. Medical records, rich with sensitive information, are invaluable for research and innovation but must be carefully managed to ensure compliance with regulations like HIPAA and GDPR. De-identification, the process of removing or obscuring personally identifiable information (PII) from medical data, lies at the heart of this effort.

    The de-identification process identifies pieces of content that contain personal information about patients and removes them, replacing them either with semantic tags or with realistic fake entities.
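    As a toy illustration (plain Python, not any vendor’s API; the entity spans and surrogate values are made up for the example), the two replacement strategies look like this on a short sentence:

    entities = [  # (text to hide, semantic label, plausible surrogate), illustrative values
        ("John Doe", "NAME", "Mark Smith"),
        ("Boston General Hospital", "LOCATION", "Riverside Medical Center"),
        ("01/12/2023", "DATE", "03/25/2024"),
    ]
    text = "Patient John Doe was admitted to Boston General Hospital on 01/12/2023."

    masked, obfuscated = text, text
    for original, label, surrogate in entities:
        masked = masked.replace(original, f"<{label}>")       # semantic tags
        obfuscated = obfuscated.replace(original, surrogate)  # fake entities

    print(masked)      # Patient <NAME> was admitted to <LOCATION> on <DATE>.
    print(obfuscated)  # Patient Mark Smith was admitted to Riverside Medical Center on 03/25/2024.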

    Leveraging LLMs for de-identifying sensitive data (PHI) might be considered excessive and potentially unreliable, depending on the use case and the required level of customization. While LLMs like ChatGPT are highly capable of generating high-quality text, they are not specifically designed or optimized for the precise task of PHI de-identification.

    This blog compares the performance of the de-identification services provided by Healthcare NLP, Amazon, and Azure, focusing on their accuracy on a dataset annotated by healthcare experts.

    This comparison is valuable because it provides critical insights into the strengths and limitations of each de-identification solution, enabling researchers, developers, and organizations to make informed decisions when choosing a tool. For researchers, it helps identify the most accurate and reliable service for sensitive data processing. For developers, it highlights the ease of integration and API flexibility, which are crucial for building scalable solutions. For organizations, particularly in healthcare and finance, it offers a clear perspective on compliance capabilities and performance, ensuring the selected tool meets regulatory requirements while optimizing operational efficiency.

    Dataset

    For this benchmark, we utilized 48 open-source documents annotated by domain experts from John Snow Labs. The annotations focused on the extraction of IDNUM, LOCATION, DATE, AGE, NAME, and CONTACT entities because these labels represent critical personal information that is commonly targeted for de-identification in various healthcare settings. These labels are typically associated with sensitive aspects of a person’s identity and are often required to be removed or anonymized in compliance with regulations like HIPAA or GDPR. By centering the benchmark around these entities, the evaluation stays directly relevant to the challenges of real-world de-identification, which often involves identifying and obscuring exactly this kind of personal data.

    Ground Truth Dataset

    Tools Compared

    Healthcare NLP & LLM

    The Healthcare Library is a powerful component of John Snow Labs’ Spark NLP platform, designed to facilitate NLP tasks within the healthcare domain. This library provides over 2,500 pre-trained models and pipelines tailored for medical data, enabling accurate information extraction, NER for clinical and medical concepts, and text analysis capabilities. Regularly updated and built with cutting-edge algorithms, the Healthcare library aims to streamline information processing and empower healthcare professionals with deeper insights from unstructured medical data sources, such as electronic health records, clinical notes, and biomedical literature.

    John Snow Labs has created custom large language models (LLMs) tailored for diverse healthcare use cases. These models come in different sizes and quantization levels, and are designed to handle tasks such as summarizing medical notes, answering questions, performing retrieval-augmented generation (RAG), recognizing named entities, and facilitating healthcare-related chats.

    John Snow Labs’ Healthcare NLP & LLM library offers a powerful solution to streamline the de-identification of medical records. By leveraging advanced Named Entity Recognition (NER) models, the library can automatically identify and deidentify Protected Health Information (PHI) from unstructured clinical notes. This capability ensures compliance with privacy regulations while maintaining the utility of the data for research and analysis. With this technology, healthcare organizations can efficiently anonymize sensitive information, enabling the safe sharing of clinical data for studies, improving patient privacy, and fostering innovation in medical research.

    With the Healthcare NLP library, you can either build a custom de-identification pipeline to target specific labels or use pretrained pipelines with just two lines of code to deidentify a wide range of entities, including AGE, CONTACT, DATE, ID, LOCATION, NAME, PROFESSION, CITY, COUNTRY, DOCTOR, HOSPITAL, IDNUM, MEDICALRECORD, ORGANIZATION, PATIENT, PHONE, STREET, USERNAME, ZIP, ACCOUNT, LICENSE, VIN, SSN, DLN, PLATE, IPADDR, and EMAIL.

    For this benchmark, we specifically used the clinical_deidentification_docwise_benchmark pretrained pipeline, which is designed to extract and de-identify NAME, IDNUM, CONTACT, LOCATION, AGE, and DATE entities. It’s important to note that this pipeline does not rely on any LLM components.

    Here is a sample code for using the Healthcare NLP pipeline:

    # Assumes an active Spark session started with the Healthcare NLP license,
    # e.g. via sparknlp_jsl.start(license_secret).
    from sparknlp.pretrained import PretrainedPipeline

    # Load the pretrained document-wise de-identification pipeline.
    deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_benchmark", "en", "clinical/models")

    sample_text = "Patient John Doe was admitted to Boston General Hospital on 01/12/2023."
    result = deid_pipeline.fullAnnotate(sample_text)

    Azure Health Data Services

    Azure Health Data Services’ de-identification service is designed to protect sensitive health information while preserving data utility. This API leverages natural language processing techniques to identify, label, redact, or surrogate Protected Health Information (PHI) in unstructured medical texts. Launched in 2024, the service offers three key operations: Tag, Redact, and Surrogate, enabling healthcare organizations to process diverse types of clinical documents securely and efficiently.

    The de-identification service stands out for its ability to balance privacy concerns with data usability. By employing machine learning algorithms, it can detect HIPAA’s 18 identifiers and other PHI entities, supporting compliance with various regional privacy regulations such as GDPR and CCPA. This tool empowers healthcare professionals, researchers, and organizations to unlock the potential of their clinical data for machine learning, analytics, and collaborative research while maintaining high standards of patient privacy and data protection. More details are available here: https://learn.microsoft.com/en-us/azure/healthcare-apis/deidentification/

    Here is a sample code for using Azure Health Data Services:

    import os

    # Service-principal credentials used by DefaultAzureCredential
    CREDENTIALS_IN_JSON_FORMAT = {
        "AZURE_CLIENT_ID": "",      # update here with yours
        "AZURE_TENANT_ID": "",      # update here with yours
        "AZURE_CLIENT_SECRET": "",  # update here with yours
    }
    for k, v in CREDENTIALS_IN_JSON_FORMAT.items():
        os.environ[k] = v

    # This is constant for one app setup
    os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"] = ""  # update here

    from azure.health.deidentification import DeidentificationClient
    from azure.health.deidentification.models import DeidentificationContent, DeidentificationResult
    from azure.identity import DefaultAzureCredential

    endpoint = os.environ["AZURE_HEALTH_DEIDENTIFICATION_ENDPOINT"]
    endpoint = endpoint.replace("https://", "")

    credential = DefaultAzureCredential(exclude_interactive_browser_credential=False)
    client = DeidentificationClient(endpoint, credential)

    sample_text = "Patient John Doe was admitted to Boston General Hospital on 01/12/2023."
    body = DeidentificationContent(input_text=sample_text, operation="tag")
    result: DeidentificationResult = client.deidentify(body)

    Amazon Medical Comprehend

    Amazon Comprehend Medical is a HIPAA-eligible natural language processing (NLP) service that leverages machine learning to extract valuable health data from unstructured medical text. This powerful tool can quickly and accurately identify medical entities such as conditions, medications, dosages, tests, treatments, and protected health information (PHI) from various clinical documents, including physician’s notes, discharge summaries, and test results. With its ability to understand context and relationships between extracted information, Amazon Comprehend Medical offers a robust solution for healthcare professionals and researchers looking to automate data extraction, improve patient care, and streamline clinical workflows. More details are available here: https://docs.aws.amazon.com/comprehend-medical/latest/dev/textanalysis-phi.html

    Here is a sample code for using Amazon Medical Comprehend:

    import boto3
    
    # MFA_validated_token is assumed to hold the response of an earlier
    # sts.get_session_token(...) call made with an MFA device; extract its
    # temporary credentials for role assumption.
    validated_access_key_id = MFA_validated_token['Credentials']['AccessKeyId']
    validated_secret_access_key = MFA_validated_token['Credentials']['SecretAccessKey']
    validated_session_token = MFA_validated_token['Credentials']['SessionToken']
    
    temp_sts_client = boto3.client(
        'sts',
        aws_access_key_id=validated_access_key_id,
        aws_secret_access_key=validated_secret_access_key,
        aws_session_token=validated_session_token
    )
    
    # choose your role
    target_role_arn = "" # update here
    
    # Assume the desired role
    response = temp_sts_client.assume_role(
        RoleArn=target_role_arn,
        RoleSessionName='MedComp'
    )
    
    # temporary creds
    tmp_access_key_id = response['Credentials']['AccessKeyId']
    tmp_secret_access_key = response['Credentials']['SecretAccessKey']
    tmp_session_token = response['Credentials']['SessionToken']
    
    client = boto3.client(
        service_name='comprehendmedical',
        region_name='',  # update here
        aws_access_key_id=tmp_access_key_id,
        aws_secret_access_key=tmp_secret_access_key,
        aws_session_token=tmp_session_token,
    )

    sample_text = "Patient John Doe was admitted to Boston General Hospital on 01/12/2023."
    result = client.detect_phi(Text=sample_text)
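    As a follow-up to the block above, here is a hedged sketch of how the detect_phi response could be turned into a masked string; it reuses sample_text and result from the previous snippet, and the output shown in the final comment is only an example of what the returned PHI types typically look like for this sentence.

    # Sketch: mask the detected PHI spans using the character offsets in the
    # response (each entry in result["Entities"] carries BeginOffset, EndOffset, Type).
    def mask_phi(text, phi_response):
        masked = text
        # Replace spans from right to left so earlier offsets remain valid.
        for entity in sorted(phi_response["Entities"], key=lambda e: e["BeginOffset"], reverse=True):
            masked = masked[:entity["BeginOffset"]] + f"<{entity['Type']}>" + masked[entity["EndOffset"]:]
        return masked

    print(mask_phi(sample_text, result))
    # e.g. "Patient <NAME> was admitted to <ADDRESS> on <DATE>."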

    Comparison of the Tools

    The most significant difference between these tools lies in their adaptability. Azure Health Data Services and Amazon Medical Comprehend are API-based, black-box cloud solutions, so their results cannot be modified or adapted to specific needs. The Healthcare NLP library’s de-identification pipeline, on the other hand, can be loaded and used with only two lines of code, its outputs can be customized by adjusting the pipeline’s stages, and it can also run entirely locally.

    Comparison Table Of The Tools

    You can find the code for using all these tools in the De-identification Performance Comparison Of Healthcare NLP VS Cloud Solutions Notebook.

    Evaluation Criteria

    In this benchmark study, we employed two distinct approaches to compare accuracy:

    A. Entity-Level Evaluation: Since de-identifying PHI data is a critical task, we evaluated how well de-identification tools detected entities present in the annotated dataset, regardless of their specific labels in the ground truth. The detection outcomes were categorized as:

    • full_match: The entire entity was correctly detected.
    • partial_match: Only a portion of the entity was detected.
    • not_matched: The entity was not detected at all.

    For example:

    Text: “Patient John Doe was admitted to Boston General Hospital on 01/12/2023.”
    Ground Truth Entity: John Doe (NAME)
    Predicted Entity: John Doe (NAME) ==> full_match
    Predicted Entity: John ==> partial_match

    If no prediction overlaps “John Doe” at all, the result is “not_matched”. A minimal sketch of this matching logic is shown below.
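    The sketch below is an assumed illustration of that matching logic (not the benchmark’s exact implementation), representing each entity by its character span:

    def match_status(gt_start, gt_end, predicted_spans):
        """Classify a ground-truth span against a tool's predicted (start, end) spans."""
        if any(p_start == gt_start and p_end == gt_end for p_start, p_end in predicted_spans):
            return "full_match"
        if any(p_start < gt_end and p_end > gt_start for p_start, p_end in predicted_spans):
            return "partial_match"  # some overlap, but not the full entity
        return "not_matched"

    # "John Doe" occupies characters 8-16 of the sample sentence above.
    print(match_status(8, 16, [(8, 16)]))   # full_match
    print(match_status(8, 16, [(8, 12)]))   # partial_match ("John" only)
    print(match_status(8, 16, [(60, 70)]))  # not_matched (only the date was found)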

    B. Token-Level Accuracy: The text in the annotated dataset was tokenized, and the ground truth labels assigned to each token were compared with predictions made by the Healthcare NLP library, Amazon Medical Comprehend, and Azure Health Data Services. Classification reports were generated for each tool, comparing their precision, recall, and F1 scores.
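    As a minimal sketch of how such reports could be produced, assuming the per-token labels from each tool have already been aligned with the ground truth, scikit-learn’s classification_report covers the per-label precision, recall, and F1 computation; the token lists below are illustrative only:

    from sklearn.metrics import classification_report

    # Illustrative token labels for the sample sentence; "O" marks non-PHI tokens.
    ground_truth = ["O", "NAME", "NAME", "O", "O", "O",
                    "LOCATION", "LOCATION", "LOCATION", "O", "DATE", "O"]
    tool_preds   = ["O", "NAME", "NAME", "O", "O", "O",
                    "LOCATION", "O", "LOCATION", "O", "DATE", "O"]

    print(classification_report(ground_truth, tool_preds, zero_division=0))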

    This dual approach comprehensively evaluated each tool’s performance in de-identifying PHI data.

    De-identification Results of the Tools on a Sample Text

    Methodology

    In this study, some differences were observed between the predictions of the de-identification services and the ground truth labels. In the ground truth dataset, entities were annotated with more generic labels. For example, all names were annotated as NAME instead of using labels like PATIENT_NAME for patient names and DOCTOR_NAME for doctor names. To ensure consistency, the labels from the de-identification tools’ predictions were mapped to the corresponding ground truth labels.

    Additionally, for a fair comparison, entities that could not be mapped to the ground truth labels (e.g., PROFESSION, ORGANIZATION) were excluded from the predictions before comparing the results.

    Label Mapping Table
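    A minimal sketch of this normalization step might look like the following; the mapping shown is illustrative, not the exact table used in the study:

    # Hypothetical mapping from tool-specific labels to the generic ground-truth labels.
    LABEL_MAP = {
        "PATIENT": "NAME", "DOCTOR": "NAME",
        "HOSPITAL": "LOCATION", "CITY": "LOCATION", "STREET": "LOCATION", "ZIP": "LOCATION",
        "PHONE": "CONTACT", "EMAIL": "CONTACT",
        "MEDICALRECORD": "IDNUM", "ID": "IDNUM",
    }
    GROUND_TRUTH_LABELS = {"NAME", "LOCATION", "DATE", "AGE", "IDNUM", "CONTACT"}

    def normalize(predictions):
        """predictions: list of (chunk_text, label) pairs from one tool."""
        mapped = [(text, LABEL_MAP.get(label, label)) for text, label in predictions]
        # Entities that cannot be mapped (e.g. PROFESSION, ORGANIZATION) are dropped.
        return [(text, label) for text, label in mapped if label in GROUND_TRUTH_LABELS]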

    After obtaining the predictions and completing the preprocessing, the entity count table is as follows:

    Entity Counts Table

    Results and Findings

    Entity-Level Evaluation

    The results obtained from matching the predictions made by Healthcare NLP, Amazon Medical Comprehend, and Azure Health Data Services with the ground truth entities are as follows:

    Matching Status Dataframe

    Based on these results, the percentages representing the match rates for each de-identification tool are plotted as shown below:

    Entity Match Results In Percentage

    Token-Level Evaluation

    The resulting data frame, obtained by tokenizing the text annotated as ground truth and adding the corresponding ground truth, Healthcare NLP, Amazon Medical Comprehend, and Azure Health Data Services prediction labels for each token, is as follows:

    Token Level Results Dataframe

    Based on the token-level results, the classification reports showing the comparison between each de-identification tool’s predictions and the ground truth labels are as follows:

    Healthcare NLP:

    Classification Report of The Healthcare NLP Library Predictions

    Azure Health Data Services:

    Classification Report of The Azure Health Data Services Predictions

    Amazon Medical Comprehend:

    Classification Report of The Amazon Medical Comprehend Predictions

    Let’s visualize the F1-Scores for each label on a single plot:

    F1-Score Table for Each Label

    PHI Entity Prediction Comparison

    The de-identification task is crucial in ensuring the privacy of sensitive data. In this context, the primary focus is not on the specific labels of PHI entities but on whether they are successfully detected. When evaluating the results based on the classification of entities as PHI or non-PHI, the outcomes for PHI entity detection are as follows:

     

    PHI Entity Detection Scores
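    For reference, here is a small sketch of how this binary PHI / non-PHI view could be computed, assuming per-token labels where “O” marks non-PHI tokens and every other label is collapsed to a single PHI class; the label lists are illustrative only:

    from sklearn.metrics import precision_recall_fscore_support

    ground_truth = ["O", "NAME", "NAME", "O", "LOCATION", "DATE", "O"]
    tool_preds   = ["O", "NAME", "O",    "O", "LOCATION", "DATE", "O"]

    # Collapse every non-"O" label to a single PHI class.
    to_phi = lambda labels: ["PHI" if label != "O" else "O" for label in labels]

    precision, recall, f1, _ = precision_recall_fscore_support(
        to_phi(ground_truth), to_phi(tool_preds), labels=["PHI"], zero_division=0
    )
    print(f"PHI precision={precision[0]:.2f} recall={recall[0]:.2f} f1={f1[0]:.2f}")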

    Price Analysis of the Tools

    When handling large volumes of clinical notes, cost becomes a key consideration. Since real-world applications rarely involve datasets as small as our 48-document benchmark, we estimated the expenses for these tools based on processing 1 million unstructured clinical notes, with each document averaging around 5,250 characters.

    Amazon Comprehend Medical Pricing: According to the price calculator, obtaining PHI predictions for 1M documents, with an average of 5,250 characters per document, costs $14,525.

    Azure Health Data Services Pricing: Based on the pricing page, generating PHI predictions for 1M documents, each averaging 5,250 characters, costs $13,125.

    Healthcare NLP Pricing: Using the John Snow Labs Healthcare NLP Prepaid product on an EC2 32-CPU machine (c6a.8xlarge at $1.20 per hour), de-identifying the PHI entities in the 48 benchmark documents takes around 39.4 seconds. At that rate, processing 1M documents would take about 13,680 minutes, i.e., 228 hours (approximately 9.5 days, though with proper scaling it can be completed in about a day), costing roughly $273 for infrastructure and $2,145 for the license (based on a 1-month license price of $7,000). Thus, the total cost for Healthcare NLP is approximately $2,418.
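    As a sanity check, the arithmetic behind the Healthcare NLP estimate can be reproduced from the figures quoted above (benchmark timing, hourly EC2 rate, and monthly license price); small rounding differences aside, it lands on the same ≈$2,418 total:

    docs_in_benchmark, seconds_for_benchmark = 48, 39.4
    target_docs = 1_000_000

    total_hours = target_docs / docs_in_benchmark * seconds_for_benchmark / 3600
    infra_cost = total_hours * 1.20                 # c6a.8xlarge at ~$1.20/hour
    license_cost = total_hours / (31 * 24) * 7000   # prorated share of a $7,000 monthly license

    print(round(total_hours), round(infra_cost), round(license_cost), round(infra_cost + license_cost))
    # -> about 228 hours, ~$274 infrastructure, ~$2,145 license, ~$2,419 total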

    Cost Comparison of Tools for Processing 1 Million Clinical Notes

    According to our estimates for processing 1 million clinical notes, Azure and Amazon incur substantially higher costs, while Healthcare NLP allows organizations to perform de-identification at a fraction of the price. This makes it an ideal choice for institutions handling large volumes of data, providing both affordability and customization without compromising performance.

    Conclusion

    In this study, we compared the performance of the Healthcare NLP, Amazon Medical Comprehend, and Azure Health Data Services tools on a ground truth dataset annotated by medical experts. The results were evaluated from two perspectives: entity (chunk) level and token level.

    • The chunk-level evaluation showed that Healthcare NLP outperformed the others in terms of accurately capturing entities and minimizing missed entities. Following Healthcare NLP, Azure Health Data Services and Amazon Medical Comprehend ranked second and third, respectively.
    • The token-level evaluation revealed that Healthcare NLP achieved the best results in terms of precision, recall, and F1 score, with Azure Health Data Services and Amazon Medical Comprehend trailing behind in that order.
    • Adaptability of Tools: One of the key differences among these tools is their level of adaptability. While Azure Health Data Services and Amazon Medical Comprehend are API-based, black-box cloud solutions that do not allow modifications or customization of results, the Healthcare NLP library offers a highly flexible approach. Its de-identification pipeline can be implemented with just two lines of code, and users can easily customize outputs by modifying the pipeline’s stages to suit specific requirements.
    • Cost-Effectiveness: Healthcare NLP emerges as the most cost-effective solution for processing large-scale clinical notes. Unlike API-based services such as Azure and Amazon, which charge per request and grow more expensive as volume increases, Healthcare NLP offers a local deployment option with a fixed license cost: for the same time period, the price stays the same whether you process 1 million or 1 billion clinical notes.

    In summary, the Healthcare NLP library consistently outperformed Azure Health Data Services and Amazon Medical Comprehend in both chunk-level and token-level evaluations, achieving the highest precision, recall, and F1 scores while minimizing missed entities. Beyond accuracy, its adaptability sets it apart: unlike Azure and Amazon, which are black-box API solutions with no customization options, Healthcare NLP allows users to modify its de-identification pipeline to meet specific needs. Additionally, it proves to be the most cost-effective solution, offering substantial savings compared to cloud-based alternatives that charge per request.

    You can find the whole code in this notebook: De-identification Performance Comparison Of Healthcare NLP VS Cloud Solutions Notebook

    Beyond its standalone capabilities, the Healthcare NLP library is also available on the AWS, Snowflake, and Databricks Marketplaces, making it easier for organizations to integrate de-identification solutions into their existing workflows.

    • Seamless Integration: Deploy directly within your preferred cloud ecosystem without a complex setup.
    • Scalability: Process large-scale clinical data efficiently.
    • Compliance & Security: Designed for healthcare-grade data protection, meeting HIPAA and GDPR standards.

    Healthcare NLP models are licensed, so if you want to use these models, you can watch the “Get a Free License For John Snow Labs NLP Libraries” video and request a license here.


    Our additional expert:
    Muhammet Santas has a Master’s Degree St. in Artificial Intelligence and works as a Data Scientist at John Snow Labs as part of the Healthcare NLP Team.

