White Swan’s Impressions of John Snow Labs’ Clinical Named Entity Recognition

11.03.2021

Beth Fordham

Operations Director at Black Swan Data

At an industry conference in 2018, our volunteer Rob Lovelock was introduced to John Snow Labs, a company specializing in healthcare AI and natural language processing (NLP) models. At the time there was no immediate project suitable for White Swan and John Snow Labs to work on together, but Rob was impressed with their work and optimistic that an opportunity would arise.

A couple of years later, when White Swan began building the Million Minds app, Rob suggested that John Snow Labs might be able to help with one of the key parts of the project.

I set up a call with David Talby, the CTO of John Snow Labs, who recognized the potential synergy between our organizations and kindly arranged a complimentary trial of their Named Entity Recognition (NER) models to see whether they could help. John Snow Labs, an innovative healthcare AI company, works with major healthcare companies that use its NER models to pull key terms out of free-form text, something that could significantly speed up the development of the Million Minds product.

One of the key elements of the Million Minds app is the ability to draw out pertinent terms from a patient’s free-form description of their symptoms. Including free-form text here matters for two reasons. For patients, it lets them list every single symptom, no matter how small; for the app, it lets us build a broader view of symptoms that may never have been shared with a doctor. Many symptom-tracking apps use drop-downs and predetermined lists to ask patients about their symptoms, but we feel that first giving patients the opportunity to describe how they feel in their own words may reveal clues to a diagnosis that have previously been missed. We’ll then use a chatbot to intelligently draw out additional information, which will help the AI narrow the possibilities down to a list of candidate diagnoses.

For example, when Julie King was listing her symptoms to Steve, she told him that one of her toes had been curling. Steve was noting down absolutely everything Julie said, whether she thought it was important or not, and the curling toe came up again and again. This symptom might well have been missed with a drop-down symptom list. Ultimately, the curling toe turned out to be one of the factors differentiating her specific type of Parkinson’s from other variants, and it led to Julie’s diagnosis.

This crucial feature of the app meant we needed a way to identify the important terms within free-form text. The John Snow Labs models will do part of this work for us: we can run our free-form descriptions through their model, and it will highlight some of the key terms. The models can also link these terms to other medical classifications, for example ICD (the International Classification of Diseases, the World Health Organization’s standard classification of diseases, symptoms and conditions), which helps us expand the list of relevant terms.
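To make the linking idea concrete, here is a minimal Python sketch of how a term-to-code mapping could be used to expand a list of recognized terms. It is not the John Snow Labs API: the dictionaries TERM_TO_CODE and CODE_TO_TERMS and the function expand_terms are hypothetical, the synonym lists are invented for illustration, and in practice the code assignment would come from the model rather than a hand-written table. The codes are intended to be the WHO ICD-10 codes for the named symptoms and should be checked against the official classification before any real use.

```python
# Hypothetical sketch: expanding recognized terms via a shared ICD-10 code.
# Both lookup tables are hand-written illustrations, not model output.

# Hypothetical lookup: recognized term -> ICD-10 code (WHO ICD-10 titles in comments).
TERM_TO_CODE = {
    "headaches": "R51",       # Headache
    "palpitations": "R00.2",  # Palpitations
    "muscle pain": "M79.1",   # Myalgia
    "joint pain": "M25.5",    # Pain in joint
}

# Hypothetical synonym lists keyed by code (e.g. from a curated vocabulary).
CODE_TO_TERMS = {
    "R51": ["headache", "head pain", "sore head"],
    "R00.2": ["palpitations", "heart racing", "pounding heart"],
    "M79.1": ["myalgia", "muscle pain", "aching muscles"],
    "M25.5": ["joint pain", "arthralgia", "aching joints"],
}


def expand_terms(terms):
    """Return {code: sorted related terms} for each recognized term with a known code."""
    expanded = {}
    for term in terms:
        code = TERM_TO_CODE.get(term.lower())
        if code is None:
            continue  # no code known for this term, so it is not expanded
        related = set(CODE_TO_TERMS.get(code, []))
        related.add(term.lower())  # keep the patient's own wording as well
        expanded.setdefault(code, set()).update(related)
    return {code: sorted(related) for code, related in expanded.items()}


if __name__ == "__main__":
    print(expand_terms(["headaches", "palpitations", "tired"]))
```

Here “tired” is dropped from the output only because the toy table has no code for it; a real resolver would cover far more terms.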

Use Case

White Swan’s Patient Description:

“I’ve been getting a high temperature and feeling hot and shivery. I also get headaches and muscle and joint pain and feel tired and exhausted with a complete loss of energy. I’ve had some standard tests done by the doctor but nothing major has come back. I sometimes also get heart palpitations”

Spark NLP for Healthcare’s entity recognition picked out the following terms from this description (including “palpitations”, which it recognized despite a misspelling):

[“hot”, “shivery”, “high”, “feeling”, “headaches”, “muscle”, “tired”, “joint”, “palpitations”, “complete”, “pain”, “a”, “heart”, “loss”, “energy”, “temperature”]

The model then categorizes these terms into ‘problems’, ‘treatments’, and ‘tests’.
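One plausible reason the list above contains fragments such as “a”, “complete” and “high” is that NER models usually tag individual tokens, which are then grouped into multi-word entities. The sketch below assumes IOB-style tags (B- begins an entity, I- continues it, O is outside any entity) using the problem/treatment/test categories mentioned above. The tags are hand-written for illustration, not real model output, and merge_iob is a hypothetical helper; Spark NLP provides a NerConverter annotator for this grouping step, and this is only a plain-Python illustration of the idea.

```python
# Hypothetical sketch: grouping token-level IOB tags into labelled chunks.
# Assumes tags like "B-PROBLEM", "I-PROBLEM", "B-TEST", "O" (illustrative only).
from typing import List, Optional, Tuple


def merge_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group consecutive B-/I- tokens with the same label into (chunk, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current: List[str] = []
    label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new entity starts; an I- tag with a new label is treated like B-.
            if current:
                chunks.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the open entity
        else:
            # "O": outside any entity, so close the open chunk if there is one.
            if current:
                chunks.append((" ".join(current), label))
            current, label = [], None
    if current:
        chunks.append((" ".join(current), label))
    return chunks


if __name__ == "__main__":
    tokens = ["a", "complete", "loss", "of", "energy", "and", "some", "standard", "tests"]
    tags = ["B-PROBLEM", "I-PROBLEM", "I-PROBLEM", "I-PROBLEM", "I-PROBLEM",
            "O", "O", "B-TEST", "I-TEST"]
    for chunk, label in merge_iob(tokens, tags):
        print(label, chunk)
```

With these illustrative tags, the tokens “a” and “complete” end up inside the single problem “a complete loss of energy” rather than standing alone.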

It also outputs ICD-10 classification codes for the recognized terms.

We found the models straightforward to set up and use, and the documentation and training for this process were thorough. The models perform well with clinical terms and reasonably well with more colloquial terminology, but we believe we’ll need to either train the model or supplement it with our own colloquial taxonomy to get full coverage and pick up all the relevant terms. What we love about this tool is that the ‘out of the box’ model from John Snow Labs can be further trained, giving us the flexibility to fast-track certain elements (using their standard model) and then customize others to fit our use case.
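Supplementing the model with a colloquial taxonomy could be as simple as a phrase dictionary applied alongside it. The following is a minimal sketch under that assumption: COLLOQUIAL_TAXONOMY and its entries are hypothetical examples (not an existing White Swan or John Snow Labs resource), matching is a case-insensitive whole-phrase regular-expression search, and the helpers find_colloquial and combine merge the matches with the model’s terms by a simple set union.

```python
# Hypothetical sketch: supplementing model output with a colloquial taxonomy.
# The taxonomy entries below are illustrative assumptions, not a real resource.
import re
from typing import Dict, List, Set

# Hypothetical colloquial phrase -> clinical term mapping.
COLLOQUIAL_TAXONOMY: Dict[str, str] = {
    "shivery": "chills",
    "feeling hot": "fever",
    "high temperature": "fever",
    "exhausted": "fatigue",
    "no energy": "fatigue",
}


def find_colloquial(text: str, taxonomy: Dict[str, str]) -> Set[str]:
    """Return the clinical terms whose colloquial phrases appear in the text."""
    found: Set[str] = set()
    lowered = text.lower()
    for phrase, clinical in taxonomy.items():
        # \b word boundaries stop "hot" from matching inside words like "hotshot".
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.add(clinical)
    return found


def combine(model_terms: List[str], text: str, taxonomy: Dict[str, str]) -> List[str]:
    """Union of model-extracted terms and taxonomy matches, de-duplicated and sorted."""
    terms = {t.lower() for t in model_terms}
    terms |= find_colloquial(text, taxonomy)
    return sorted(terms)


if __name__ == "__main__":
    text = "I've been getting a high temperature and feeling hot and shivery."
    print(combine(["headaches"], text, COLLOQUIAL_TAXONOMY))
```

A fixed phrase list like this only catches wording it already contains, so it complements rather than replaces further training of the model.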

We are very grateful for the opportunity to trial these models while building our proof of concept; without them, we would have spent much more volunteer time developing our own entity recognition. Instead, we’ve been able to devote that volunteer time to other areas of the initial build that aren’t available from third parties. In short, this collaboration has saved us what could have been several months of building this ourselves!

The trial models have performed well in testing and we believe they are reasonably priced, so we anticipate using them in the full version of the application once we have completed a successful pilot. Stay tuned for a further blog post with an update on this.