Legal NLP releases a Subpoenas section classification model, plus more LLM examples and use cases

13.06.2023

David Cecchini

Data Scientist at John Snow Labs

The latest version of Legal NLP, 1.15, introduces several new features to the existing collection of 926+ models and 125+ Language Models from previous releases of the library. Let’s examine each of these new capabilities in detail.

New Subpoenas section classifier

This model can identify important sections of subpoena documents, such as `INSTRUCTION`, `ARGUMENT`, `DEFINITION`, `NOTIFICATION`, `DOCUMENT_REQUEST`, `STATEMENT_OF_FACTS`, `CONCLUSION`, among others.

Once the sections are classified, we can run other models that specialize in extracting insights from each specific section type.
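The classify-then-route idea can be sketched in plain Python. The sketch assumes the section classifier's output is already available as (section_text, label) pairs. `route_sections`, `HANDLERS`, the two handler functions and the sample texts are all hypothetical, and the toy handlers only stand in for real section-specific models.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical classifier output: (section_text, predicted_label) pairs.
ClassifiedSection = Tuple[str, str]


def extract_requested_items(text: str) -> List[str]:
    """Toy stand-in for a DOCUMENT_REQUEST model: collects "- item" lines."""
    return [line.strip()[1:].strip()
            for line in text.splitlines() if line.strip().startswith("-")]


def extract_defined_terms(text: str) -> List[str]:
    """Toy stand-in for a DEFINITION model: collects double-quoted terms."""
    return re.findall(r'"([^"]+)"', text)


# Which specialized handler processes which section label.
HANDLERS: Dict[str, Callable[[str], List[str]]] = {
    "DOCUMENT_REQUEST": extract_requested_items,
    "DEFINITION": extract_defined_terms,
}


def route_sections(sections: List[ClassifiedSection]) -> Dict[str, List[str]]:
    """Send each section to the handler for its label; skip unhandled labels."""
    results: Dict[str, List[str]] = {}
    for text, label in sections:
        handler = HANDLERS.get(label)
        if handler is None:
            continue  # no specialized model registered for this section type
        results.setdefault(label, []).extend(handler(text))
    return results


sections = [
    ('"Documents" means any written or electronic record.', "DEFINITION"),
    ("Produce the following:\n- All invoices from 2021\n- Board minutes", "DOCUMENT_REQUEST"),
    ("The Court should grant the motion.", "CONCLUSION"),
]
print(route_sections(sections))
```

With this layout, supporting another section type only requires adding an entry to the `HANDLERS` mapping.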

Updated LLM examples

As the capabilities of the library grow, we have added new examples to help users understand how to perform specific tasks:

Text summarization

The updated notebook now shows how to summarize long documents. This is one approach to a challenging problem: current models can only process a limited number of tokens in their input, while long documents often exceed that limit.
By splitting the document into chunks sized according to the number of tokens the model can process in each run, and then merging the partial results, this split-and-merge strategy was able to summarize a long document.
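A minimal sketch of the general split-and-merge idea follows; it is not the notebook's code. It assumes tokens can be approximated by whitespace-separated words (real models count subword tokens) and takes the model as a plain `summarize` callable. `toy_summarize` is a hypothetical stand-in that keeps the first five words, used only so the example runs without a model.

```python
from typing import Callable, List


def chunk_by_tokens(tokens: List[str], max_tokens: int) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most max_tokens each."""
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def split_and_merge(text: str, summarize: Callable[[str], str], max_tokens: int) -> str:
    """Summarize text of any length with a model limited to max_tokens per input."""
    tokens = text.split()  # assumption: one whitespace-separated word = one token
    while len(tokens) > max_tokens:
        # Split: summarize every chunk that fits in the model's input limit.
        partials = [summarize(" ".join(chunk)) for chunk in chunk_by_tokens(tokens, max_tokens)]
        # Merge: concatenate the partial summaries and check the new length.
        merged = " ".join(partials).split()
        if len(merged) >= len(tokens):
            raise ValueError("summaries are not shorter than their input; cannot converge")
        tokens = merged
    return summarize(" ".join(tokens))  # final pass fits in a single model call


def toy_summarize(text: str) -> str:
    """Hypothetical stand-in for a summarization model: keeps the first five words."""
    return " ".join(text.split()[:5])


document = " ".join(f"w{i}" for i in range(100))
print(split_and_merge(document, toy_summarize, max_tokens=20))  # w0 w1 w2 w3 w4
```

Each pass must shrink the text, otherwise the loop could never fit the result into a single model call; that is why the sketch raises an error when a pass does not reduce the total length.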

Text generation

In this notebook, we show how to use a Flan-T5-based model, fine-tuned on in-house data, to continue generating text in the legal domain (text generation).

Normalizing date mentions in text

This notebook shows how to use Legal NLP to standardize date mentions in texts to a single format. When working with data coming from various sources, we may face the problem of some sources using the format mm/dd/yyyy while others use dd/mm/yyyy or yet another format. By standardizing the date mentions, we can easily apply further analytics on the texts to obtain insights from the data.

The legal.DateNormalizer annotator can even standardize relative dates (such as “next year” or “past month”); the reference “current day” they are resolved against is customizable.
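To make the ambiguity concrete, here is a small plain-Python sketch; it does not use or reproduce the legal.DateNormalizer API. It assumes the date format of each source is known in advance, writes ISO-8601 (YYYY-MM-DD) as its target format, and handles only three relative words. `normalize_date` and `RELATIVE_OFFSETS` are hypothetical names.

```python
from datetime import date, datetime, timedelta
from typing import Optional

# Relative expressions this sketch understands, as day offsets from an anchor date.
RELATIVE_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def normalize_date(mention: str, source_format: str, anchor: Optional[date] = None) -> str:
    """Return the mention as an ISO-8601 string (YYYY-MM-DD).

    source_format is the strptime pattern used by the mention's source,
    e.g. "%m/%d/%Y" or "%d/%m/%Y". anchor plays the role of the
    customizable "current day" for relative mentions (defaults to today).
    """
    key = mention.strip().lower()
    if key in RELATIVE_OFFSETS:
        base = anchor or date.today()
        return (base + timedelta(days=RELATIVE_OFFSETS[key])).isoformat()
    return datetime.strptime(mention.strip(), source_format).date().isoformat()


print(normalize_date("03/04/2023", "%m/%d/%Y"))  # US-style source: 2023-03-04
print(normalize_date("03/04/2023", "%d/%m/%Y"))  # European-style source: 2023-04-03
print(normalize_date("yesterday", "%d/%m/%Y", anchor=date(2023, 6, 13)))  # 2023-06-12
```

The same string 03/04/2023 yields two different dates depending on its source, which is why the source format has to be known before normalizing.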

Extracting important key phrases from text

With the legal.ChunkKeyPhraseExtraction annotator, it is possible to extract the most relevant phrases from a set of candidates coming from either N-grams or NER entities.
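The ranking step can be illustrated with a simplified sketch: generate N-gram candidates, then score each candidate by cosine similarity to the whole document. To stay self-contained, the sketch swaps the sentence embeddings an embedding-based extractor would use for bag-of-words count vectors and uses a tiny hand-picked stopword list. `key_phrases` and its parameters are hypothetical and are not the legal.ChunkKeyPhraseExtraction API.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Tiny illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"the", "of", "a", "an", "and", "to", "in", "by", "if", "may", "shall", "be", "any", "this"}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def key_phrases(text: str, top_k: int = 3, max_n: int = 2) -> List[Tuple[str, float]]:
    """Rank 1..max_n-gram candidates by similarity to the whole document."""
    tokens = re.findall(r"[a-z]+", text.lower())
    doc_vec = Counter(t for t in tokens if t not in STOPWORDS)
    # N-gram candidates that neither start nor end with a stopword.
    candidates = {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
        if tokens[i] not in STOPWORDS and tokens[i + n - 1] not in STOPWORDS
    }
    scored = [(c, cosine(Counter(c.split()), doc_vec)) for c in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first, ties alphabetical
    return scored[:top_k]


clause = ("The tenant shall pay rent to the landlord. The landlord may "
          "terminate the lease if the tenant fails to pay rent.")
for phrase, score in key_phrases(clause):
    print(f"{phrase}: {score:.3f}")
```

With real embeddings, a candidate can score high because it is semantically close to the document even when it shares few exact words with it, which bag-of-words vectors cannot capture.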

Drawing boxes around entities in PDF files with Visual NLP and Legal NLP

This example notebook shows how to combine the power of Visual NLP and Legal NLP to identify entities in PDF or image files: it first extracts the text from the file, then runs one of the Legal NLP pretrained NER models on it, and finally maps the found entities back to the file and marks them visually.
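The final mapping step can be illustrated in plain Python. The sketch assumes OCR yields words in reading order, each with a line index and a bounding box, and that NER yields character offsets into the text built by joining those words with spaces. `OcrWord`, `build_text` and `entity_boxes` are hypothetical names, not Visual NLP API. The sketch uses half-open [begin, end) offsets; Spark NLP annotations carry an inclusive end offset, so pass end + 1 when converting from them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates


@dataclass
class OcrWord:
    """A recognized word with its line index and bounding box (hypothetical structure)."""
    text: str
    line: int
    box: Box


def build_text(words: List[OcrWord]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join words with single spaces; record each word's [start, end) character span."""
    spans: List[Tuple[int, int]] = []
    pos = 0
    for w in words:
        spans.append((pos, pos + len(w.text)))
        pos += len(w.text) + 1  # +1 for the joining space
    return " ".join(w.text for w in words), spans


def entity_boxes(words: List[OcrWord], spans: List[Tuple[int, int]],
                 begin: int, end: int) -> List[Box]:
    """Return one box per text line, covering every word that overlaps [begin, end)."""
    per_line: Dict[int, Box] = {}
    for w, (s, e) in zip(words, spans):
        if s < end and e > begin:  # the word overlaps the entity span
            x0, y0, x1, y1 = per_line.get(w.line, w.box)
            per_line[w.line] = (min(x0, w.box[0]), min(y0, w.box[1]),
                                max(x1, w.box[2]), max(y1, w.box[3]))
    return [per_line[k] for k in sorted(per_line)]


words = [
    OcrWord("Acme", 0, (10, 10, 40, 20)), OcrWord("Corp.", 0, (45, 10, 80, 20)),
    OcrWord("signed", 0, (85, 10, 120, 20)), OcrWord("with", 1, (10, 30, 35, 40)),
    OcrWord("John", 1, (40, 30, 65, 40)), OcrWord("Doe", 1, (70, 30, 90, 40)),
]
text, spans = build_text(words)
begin = text.index("John Doe")  # in practice, offsets come from the NER model
print(entity_boxes(words, spans, begin, begin + len("John Doe")))  # [(40, 30, 90, 40)]
```

Returning one box per line, rather than a single union box, keeps an entity that wraps onto the next line from being highlighted as one large rectangle covering unrelated text.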

Fancy trying?

We’ve got 30-day free licenses for you, with technical support from our legal team of technical and subject-matter experts (SMEs). This trial includes complete access to more than 926 models, including Classification, NER, Relation Extraction, Similarity Search, Summarization, Sentiment Analysis, Question Answering, etc., and 120+ legal language models.

Just go to https://www.johnsnowlabs.com/install/ and follow the instructions!

Don’t forget to check our notebooks and demos.

How to run

Legal NLP is extremely easy to run on both clusters and driver-only environments using the johnsnowlabs library:

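```python
# In a notebook cell, install the johnsnowlabs library
!pip install johnsnowlabs

from johnsnowlabs import nlp

# Install the licensed libraries; force_browser=True authenticates through a browser login
nlp.install(force_browser=True)

# Start Spark Session
spark = nlp.start()
```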
