- Sensitive information is often “burned” into the image, which requires computer vision or OCR to identify
- Sensitive information is also stored in metadata fields, some of which include unstructured text
- The DICOM standard is decades old, hence there are thousands of variants of file formats and metadata fields
- Each DICOM file can contain thousands of images (slices), in different resolutions
- Different image modalities (MRI vs. ultrasound vs. CT scans) have their own nuances
This session presents a scalable, enterprise-grade solution that delivers high accuracy across multiple image formats and clinical modalities. Join us to see live demos and code that tackle these challenges with the help of John Snow Labs’ Visual NLP. We’ll explore DICOM processing capabilities, from computing basic metrics on a potentially large dataset to de-identifying images and metadata. We will also discuss infrastructure and how to scale pipelines to handle heavy workloads.
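To make the metadata side of de-identification concrete, here is a minimal sketch in plain Python. It is not the Visual NLP API and not the approach shown in the session. It assumes the metadata has already been read into a dictionary keyed by DICOM attribute keyword, and the keyword sets, regular expressions, and the `deidentify_metadata` helper are all hypothetical choices made for illustration.

```python
import re

# Assumption: attribute keywords treated as direct identifiers (illustrative, not exhaustive).
PHI_KEYWORDS = {"PatientName", "PatientID", "PatientBirthDate", "ReferringPhysicianName"}

# Assumption: free-text fields that may embed identifiers inside unstructured text.
FREE_TEXT_KEYWORDS = {"StudyDescription", "ImageComments"}

# Hypothetical patterns for identifiers hidden in free text.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
MRN_PATTERN = re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE)


def deidentify_metadata(metadata):
    """Return a copy of the metadata with identifiers replaced by placeholders."""
    cleaned = {}
    for keyword, value in metadata.items():
        if keyword in PHI_KEYWORDS:
            # Structured identifier: replace the whole value.
            cleaned[keyword] = "ANONYMIZED"
        elif keyword in FREE_TEXT_KEYWORDS:
            # Unstructured text: mask only the matching substrings.
            text = DATE_PATTERN.sub("<DATE>", str(value))
            cleaned[keyword] = MRN_PATTERN.sub("<ID>", text)
        else:
            # Everything else passes through unchanged.
            cleaned[keyword] = value
    return cleaned


sample = {
    "PatientName": "DOE^JANE",
    "Modality": "CT",
    "StudyDescription": "Follow-up for MRN: 12345 seen 2021-03-04",
}
result = deidentify_metadata(sample)
```

A rule-based pass like this covers only a small set of known fields and patterns. Text burned into the pixels still needs OCR or computer vision, and real-world free text generally calls for more than two regular expressions.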
Alberto Andreotti is a data scientist at John Snow Labs, specializing in Machine Learning, Natural Language Processing, and Distributed Computing. With a background in Computer Engineering, he has expertise in developing software for both Embedded Systems and Distributed Applications. Alberto is skilled in Java and C++ programming, particularly for mobile platforms. His focus includes Machine Learning, High-Performance Computing (HPC), and Distributed Systems, making him a pivotal member of the John Snow Labs team.