Applying State-Of-The-Art NLP to Healthcare

23.03.2021

Stepheni Hass

There’s no doubt that natural language processing in healthcare and biomedicine can help turn data into real value. Recent advances in NLP research are leading to promising new applications, from finding symptom attributes to extracting information from electronic health records and summarizing medical text.

The Healthcare NLP Summit from April 6-9 features 30+ talks and trainings from leading NLP data scientists who will give insights into the latest NLP best practices and research in the healthcare industry as well as the open-source libraries, models & transformers you can use today.

We talked to some of the conference speakers and asked them about their favorite NLP techniques, notable breakthroughs, and trends, as well as the NLP projects they are currently working on.

What’s your favorite NLP technique or tool for healthcare?

Data extraction from electronic health records

“Information extraction from electronic health records (EHRs) is my favorite NLP technique in healthcare. The huge volume of unstructured patient data entered into EHRs makes it very difficult for any single physician to analyze it and get a comprehensive view of the patient’s history. By using NLP information extraction techniques, we can provide the physician with a comprehensive, easily accessible, and accurate patient history.” —Mukesh Mithrakumar, Sr. Machine Learning Engineer at IQVIA

TextRay and UMLS

“Personally, I’m a big fan of ‘whatever works’. I won’t reach for the most recent and complicated neural network for a problem that can be solved with a simple rule-based or bag-of-words system. When I joined Zebra Medical, my favorite model was TextRay, a method invented by Jonathan Laserson to quickly and accurately label multiple X-ray reports with almost zero ML training.

I also believe in exploiting available information and databases, which makes the use of UMLS and other medical ontologies in NLP one of my favorite research subjects.” —Rachel Wities, NLP Data Scientist at Zebra Medical Vision

Self-supervised learning (SSL)

“I’m excited about self-supervised learning (SSL) that helps overcome the annotation bottleneck, including task-agnostic SSL such as neural language model pretraining and task-specific SSL such as distant supervision and deep probabilistic logic.” —Hoifung Poon, Senior Director of Biomedical NLP at Microsoft

Whatever tool serves the purpose

“I’m technology/tool agnostic. In a current pharmacoepidemiology project, I’m implementing and benchmarking performance across multiple approaches, ranging from MTERMS, a home-grown linguistics-focused system, to n-grams, word embeddings, transformer models, and topic modeling.

For some of these models, I’m using open-source packages like Spark NLP, NLU, Gensim, and NLTK. It is worth considering different approaches (e.g., linguistics-focused and/or probability-based) for a particular task, as there may be task-dependent cases where one approach works well while another yields suboptimal performance.” —Joseph Plasek, Postdoctoral Research Fellow at Mass General Brigham

What NLP research breakthrough do you find particularly valuable for the healthcare industry?

Pre-trained language models

“Pre-trained language models are undoubtedly the most valuable breakthrough in our field. Transformer models pre-trained on clinical data not only reach state-of-the-art performance on many clinical NLP tasks, but they also help to overcome one of the most painful problems of Healthcare NLP – the shortage of labeled data.

However, this method requires a lot of time and resources for pre-training and suffers from all the problems of black-box algorithms. How much medical knowledge such models have, and how well they transfer learning between different medical subdomains, are still topics for further investigation.” —Rachel Wities, NLP Data Scientist at Zebra Medical Vision

Explainable NLP

“Explainable/interpretable NLP, such as the SHAP package that Microsoft Research has been developing, may be particularly useful in healthcare, as it provides insights into transformer models that clinicians can use to understand why the model made the choice it did, which will hopefully increase trust in the system.” —Joseph Plasek, Postdoctoral Research Fellow at Mass General Brigham

What technical problem are you currently solving in the project you’re working on?

Applying model analysis tools to large language models

“My team and I are working on how best to adapt our model analysis tools for use on large language models. These models can have hundreds of billions of parameters, making it non-trivial to serve them live for interactive analysis. As larger models find more usage, it’s important that people can probe them in a low-latency way, in order to enable interactive analysis and debugging.” —James Wexler, Staff Software Engineer at Google

Finding symptom attributes

“We’re building generalized models to find symptom attributes (for example, ‘sharp’ in ‘sharp headache’) and relate each attribute to its entity. The model has to be generic enough to find attributes of new symptoms that our deep NLP model didn’t see during the training phase, for example, lower abdominal pain. The main challenge here is to deal with these unseen symptoms and achieve evaluation results on par with those for the symptoms seen during training.” —Moran Beladev, Senior ML Researcher at Diagnostic Robotics

What is a trend you see for NLP in healthcare in 2021?

Smarter pre-trained language models and multi-modal algorithms

“After firmly adopting pre-trained language models such as ELMo, BERT, and GPT in previous years, in 2020 the general NLP community explored what these models actually ‘know,’ and what their gaps and limitations are. This exploration is now reaching the Healthcare NLP domain, and I expect that in 2021 we’ll see more efforts to make medical pre-trained language models smarter. In my lecture titled Sending BERT To Med School at the Healthcare NLP Summit, I will elaborate on one way of doing this: using external medical knowledge bases.

Another rising trend is the use of multi-modal algorithms that process both text and images. This is a very promising lead, as using two modalities compensates for the lack of labeled data and gives a fuller picture of the patient’s situation.” —Rachel Wities, NLP Data Scientist at Zebra Medical Vision

More accurate outputs through creative applications

“Today, the incorporation of NLP data into AI models is still in its early stages: we are focused on making use of entities extracted from notes and of their temporal or causal relations. With the progress fueled by BERT and its clinical variants, these outputs are significantly more accurate than what was available even a couple of years ago. I expect a lot of activity in developing creative applications of this ‘low-hanging fruit’.” —Sutanay Choudhury, Chief Scientist at PNNL

Increasing adoption of NLP across healthcare

“While adoption of NLP across healthcare and life science organizations has been slow, the algorithms and techniques behind this technology are continuously being refined and improved. Researchers are leveraging AI concepts like neural nets, genetic algorithms, and active learning to improve the accuracy and relevance of insights and drive value. Although NLP and text analytics are still evolving in the healthcare and life sciences industries, they are expected to be ready for production use soon. Regulatory authorities are proactively evaluating this technology to determine how healthcare and life sciences organizations can adopt and deploy it to improve life expectancy and quality of life.” —Prathamesh Karmalkar, Principal Data Scientist at Merck

Healthcare NLP Summit 2021

Join us at the Healthcare NLP Summit on April 6-9 to hear more about the latest NLP research and use cases related to the healthcare industry. You can learn from leading data scientists from companies such as Roche, Curai, Microsoft, Alpha Health, Cigna, Amazon Health AI, Merck, and many more about best practices and challenges of applying NLP, deep learning & transfer learning in practice.

Stepheni Hass


Marketing and brand evangelist who is equally analytical and creative, with a strong focus on collaboration. Highly experienced with proven results in marketing, events planning and execution, project management, business development, team building, operations, and much more. I'm a driven, high-performing business unicorn with a passion for connecting people and building communities!