Randomized Controlled Trials (RCT) classification using Spark NLP

08.07.2022

Sai Shailesh

In this article, I give a brief introduction to Randomized Controlled Trials (RCT), followed by an overview of the classification models and pretrained pipelines available in Spark NLP for classifying RCT.

What are Randomized controlled trials (RCT)?

An RCT is a type of study in which participants are randomly assigned to one of two or more clinical interventions or treatments. It is generally considered the most scientifically rigorous method of hypothesis testing available and is regarded as the gold standard for evaluating the effectiveness of interventions.

In clinical research, randomized controlled trials (RCT) are the best way to study the safety and efficacy of new treatments. RCT are used to answer patient-related questions and are required by governmental regulatory bodies as the basis for approval decisions.

Why are RCT important?

Because the techniques used in an RCT, randomisation above all, minimise the risk of confounding factors influencing the outcomes, an RCT is thought to provide the most credible evidence on the effectiveness of treatments. As a result, the results of RCT are more likely to be close to the true effect than the results of other study designs.

Why is it difficult to identify the RCT articles?

The term “RCT” is frequently absent or applied inconsistently in the controlled vocabularies of major research databases. As a result, people conducting systematic reviews must often manually read (“screen”) thousands of irrelevant records to locate a small number of relevant RCT. This is one of the reasons why systematic reviews are time-consuming and costly, an issue made worse by the rapid growth of the published evidence base.

What have we achieved?

We provide state-of-the-art (SOTA) models that can determine whether or not a scientific publication is an RCT. We also provide pretrained pipelines, which come already fitted with the annotators and transformers required for the use case and can be launched with a single line of code.
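As a rough sketch of how launching a pretrained pipeline might look, the snippet below loads one and classifies a single abstract. `PretrainedPipeline` and its `annotate` method are part of Spark NLP; the pipeline name, the `clinical/models` location, the `class` output key, and the `"True"`/`"False"` label strings are assumptions and should be checked against the Models Hub. It also assumes a Spark session with the licensed healthcare library is already running.

```python
from typing import Any, Dict, List


def load_rct_pipeline(name: str = "rct_binary_classifier_use_pipeline") -> Any:
    """Download a pretrained pipeline (needs a running, licensed Spark NLP session).

    The default name and the "clinical/models" location are assumptions;
    check the Models Hub for the pipelines actually available to you.
    """
    # Imported lazily so is_rct() below can be used without Spark installed.
    from sparknlp.pretrained import PretrainedPipeline

    return PretrainedPipeline(name, "en", "clinical/models")


def is_rct(pipeline: Any, text: str, output_key: str = "class") -> bool:
    """Return True if the pipeline labels `text` as an RCT.

    `annotate` returns a dict mapping output column names to lists of
    strings. The "class" key and the "True"/"False" label strings are
    assumptions about this particular pipeline's output.
    """
    result: Dict[str, List[str]] = pipeline.annotate(text)
    labels = result.get(output_key, [])
    return bool(labels) and labels[0].strip().lower() == "true"


# Usage, once a licensed Spark NLP session has been started:
# pipeline = load_rct_pipeline()
# is_rct(pipeline, "Patients were randomly assigned to treatment or placebo.")
```

Keeping the download in its own function means the label-handling logic can be exercised with any object that exposes `annotate`, which is handy when no Spark session is available.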

What type of dataset have we used?

We used a binary classification dataset to test whether the model could identify if a scientific paper was an RCT or not (True or False).
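To make the shape of such a dataset concrete, here is a minimal sketch of how its records might be represented. The field names and the two example rows are invented for illustration and are not taken from the actual dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    text: str     # title and/or abstract of the publication (assumed field)
    is_rct: bool  # True if the paper reports an RCT, otherwise False


def positive_rate(records: List[Record]) -> float:
    """Fraction of records labelled as RCT; 0.0 for an empty list."""
    if not records:
        return 0.0
    return sum(r.is_rct for r in records) / len(records)


# Hypothetical rows, invented purely to show the format.
examples = [
    Record("Patients were randomly assigned to drug A or placebo.", True),
    Record("A retrospective review of hospital records.", False),
]
print(positive_rate(examples))
```

Checking the share of positive labels is a useful first step with data like this, since RCT are usually a minority of the records returned by a database search.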


Sai Shailesh


👋 I am a Computer Science senior who juggles studies with life as a Data Scientist. ⚓ I currently work at John Snow Labs as a Data Scientist, maintaining and contributing to the Spark NLP library.


Randomized Controlled Trials (RCT) classification using Spark NLP

What are Randomized controlled trials (RCT)?

Why are RCT important?

Why is it difficult to identify the RCT articles?

What have we achieved?

What type of dataset have we used?

1. Classification using UniversalSentenceEncoder

Let Us Break It Down:

2. Classification using BertSentenceEmbeddings

3. Classification using MedicalBertForSequenceClassification

Conclusion

SparkNLP Resources
