Accelerating progress in personalized healthcare requires learning the causal relationships between diseases, genes, treatments, medications, labs, and other clinical information – at scale over a large population and time range. More than half of the clinically relevant data in oncology is only found in free-text pathology reports, radiology reports, sequencing reports, and progress notes.
Extracting and normalizing these facts from these clinical documents requires training oncology-specific models that can accurately extract these specific facts from a variety of documents. This talk describes results and lessons learned, from a real-world project doing this at scale.