Legal NLP Releases E5 and BGE Sentence Embedding models and two subpoena demo apps

14.11.2023

David Cecchini

Data Scientist at John Snow Labs

Version 1.20.0 of the library comes with optimized sentence embedding models for RAG applications in the Legal domain and new demo apps for Subpoenas.

Sentence Embedding Models

The new sentence embedding model expands the capabilities of the library for Retrieval Augmented Generation (RAG) applications, and the capability to train text classification models. E5 and BGE models obtain top performances on standard NLP benchmarks, and we provide a Legal-specific fine-tuned version.

To use the model on a Spark NLP pipeline, simply download the pretrained model:

document_assembler = (
    nlp.DocumentAssembler().setInputCol("text").setOutputCol("document")
)

E5_embedding = (
    nlp.E5Embeddings.pretrained(
        "legembedding_e5_base", "en", "legal/models"
    )
    .setInputCols(["document"])
    .setOutputCol("E5")
)

pipeline = nlp.Pipeline(stages=[document_assembler, E5_embedding])

Then, we can transform texts into vector representations with:

data = spark.createDataFrame(
    [["Receiving Party shall not use any Confidential Information for any purpose other than the purposes stated in Agreement."]]
).toDF("text")

result = pipeline.fit(data).transform(data)

We can then obtain the vector arrays with:

result.select("E5.embeddings").show(truncate=100)

Obtaining:

+----------------------------------------------------------------------------------------------------+
|                                                                                          embeddings|
+----------------------------------------------------------------------------------------------------+
|[0.023808457, 0.38471168, 0.43703818, 0.6033733, -0.40925637, -0.78775537, -0.329775, -0.028943084, -0.08633131, ...|
+----------------------------------------------------------------------------------------------------+

The vector representation of the texts can easily be used on VectorDBs to implement RAG systems on domain-specific applications. Reach out to us if you want to use the capabilities of Spark and Legal NLP to enhance your RAG application on legal documents.

New demo apps for Subpoena analysis

A subpoena is a formal document issued by a court, grand jury, legislative body or committee, or authorized administrative agency. It commands an individual to appear at a specific time and provide testimony, either orally or in writing, regarding the matter specified in the document.

We released two new demo apps to identify subpoena entities and section classification, showing examples using the latest models of the library.

Identifying relevant Entities in Subpoena

Link to the demo app: here.

The demo illustrates how to use pretrained models to identify relevant information on subpoena texts. The existing entities are:

ADDRESS
MATTER_VS
APPOINTMENT_HOUR
DOCUMENT_TOPIC
DOCUMENT_PERSON
COURT_ADDRESS
APPOINTMENT_DATE
COUNTY
CASE
SIGNER
COURT
DOCUMENT_DATE_TO
DOCUMENT_TYPE
STATE
DOCUMENT_DATE_FROM
RECEIVER
MATTER
SUBPOENA_DATE
DOCUMENT_DATE_YEAR

Classifying Subpoena sections

Link to the demo app: here.

This demo shows the capabilities of our pretrained model to classify sections of the subpoena into a specific category. Available categories are:

ATTACHMENT
INSTRUCTION
ARGUMENT
DEFINITION
REQUEST_FOR_ISSUANCE_OF_SUBPOENA
NOTIFICATION
DOCUMENT_REQUEST
PRELIMINARY_STATEMENT
CERTIFICATE_OF_SERVICE
STATEMENT_OF_FACTS
CONCLUSION
BACKGROUND
CERTIFICATE_OF_FILING
INTRODUCTION
DECLARATION

Our library can expand and customize NER and Classification models for specific use cases. If you have specific needs, contact us!

Fancy trying?

We’ve got 30-day free licenses for you with technical support from our legal team of technical and SMEs. This trial includes complete access to more than 926 models, including Classification, NER, Relation Extraction, Similarity Search, Summarization, Sentiment Analysis, Question Answering, etc. and 120+ legal language models.

Just go to https://www.johnsnowlabs.com/install/ and follow the instructions!

Don’t forget to check our notebooks and demos.

How to run

Legal NLP is extremely easy to run on both clusters and driver-only environments using johnsnowlabs library:

Install the johnsnowlabs library:

pip install johnsnowlabs

Then, in Python, install Legal NLP with:

from johnsnowlabs import nlp

nlp.install(force_browser=True)

You are ready to use all the capabilities of the library. You can start a spar session and start analyzing legal texts.

# Start Spark Session
spark = nlp.start()

For alternative installation methods of how to install in specific environments, please check the docs.

Try Legal NLP

See in action

David Cecchini

Data Scientist at John Snow Labs

Our additional expert:

Ph.D. at Tsinghua-Berkeley Shenzhen Institute | Data Scientist

Finance NLP Releases Large E5 Sentence Embedding model and medium Aspect-based Sentiment Analysis models

David Cecchini

Version 1.20.0 of the library has an optimized sentence embedding model for RAG application in the Finance domain and aspect-based sentiment analysis...

Legal NLP Releases E5 and BGE Sentence Embedding models and two subpoena demo apps

Sentence Embedding Models

New demo apps for Subpoena analysis

Fancy trying?

How to run

Finance NLP Releases Large E5 Sentence Embedding model and medium Aspect-based Sentiment Analysis models

Recommended For You