Legal NLP Releases new Multi-label model on stack exchange topics

13.10.2023

David Cecchini

Data Scientist at John Snow Labs

Version 1.19.0 of the library comes with a topic classification multi-label model using the INSTRUCTOR embedding.

New Multilabel Model on Legal Stack Exchange topics with INSTRUCTOR

This model extends a previously released model to multilabel, trained on a more recent dataset for Legal stack exchange questions topics using the recently released INSTRUCTOR embedding model.

The model can identify more than thirty topics, listed below:

`business`, `california`, `canada`, `civil-law`, `constitutional-law`, `consumer-protection`, `contract-law`, `copyright`, `corporate-law`, `criminal-law`, `employment`, `england-and-wales`, `european-union`, `fraud`, `gdpr`, `germany`, `intellectual-property`, `international`, `internet`, `landlord`, `legal-terms`, `liability`, `licensing`, `police`, `privacy`, `property`, `real-estate`, `rental-property`, `software`, `tax-law`, `terms-of-service`, `trademark`, `traffic`, `united-kingdom`, `united-states`, `us-constitution`

Please note that some categories from the original data were removed due to limited examples. The model depends on the INSTRUCTOR embedding, which can be used in a Spark NLP pipeline as follows:

embeddings = (
    nlp.InstructorEmbeddings.pretrained("instructor_large", "en")
    .setInstruction("Represent for multilabel classification:")
    .setInputCols(["document"])
    .setOutputCol("sentence_embeddings")
)

classifierdl = (
    nlp.MultiClassifierDLModel.pretrained(
        "legmulticlf_law_stack_exchange", "en", "legal/models"
    )
    .setInputCols(["sentence_embeddings"])
    .setOutputCol("class")
)

clf_pipeline = nlp.Pipeline(
    stages=[document_assembler, embeddings, classifierdl]
)

For the given example:

At one point, I saw this label on Coke cans and wondered: is this legally enforceable? If it’s not, is it possible for a retailer in any way to disallow the resale of an item purchased? Something like, I don’t know, maybe a license you must agree to to purchase said item? This comes in the larger context of these new tech releases (GPUs, consoles) and how the producers/retailers could legally prevent scalpers.

We can extract the topics with the pipeline (text formatted for visualization purposes):

    df = spark.createDataFrame(
    [
        [
            """I've seen this label on Coke cans at one point, and I was 
            wondering: is this legally enforceable? If it's not, is it possible
            for a retailer in any way to disallow the resale of an item
            purchased? Something like, I don't know, maybe a license you have
            to agree to in order to be allowed to purchase said item?
            This comes in the larger context of these new tech releases
            (GPUs, consoles) and how the producers/retailers could legally
            prevent scalpers."""
        ]
    ]
).toDF("text")

model = clf_pipeline.fit(df)
result = model.transform(df)

result.select("class.result").show(truncate=False)

Obtaining the following results:

+------------------------+
result                  |
+------------------------+
|[contract-law, business]|
+------------------------+

The topics obtained were contract-law and business.

Fancy trying?

We have 30-day free licenses for you with technical support from our legal team of experts and SME. This trial includes access to more than 926 models, including Classification, NER, Relation Extraction, Similarity Search, Summarization, Sentiment Analysis, Question Answering, etc., and 120+ legal language models.

Just go to https://www.johnsnowlabs.com/install/ and follow the instructions!

Don’t forget to check our notebooks and demos.

How to run

Legal NLP (NLP for contracts) is extremely easy to run on both clusters and driver-only environments using johnsnowlabs library:

pip install johnsnowlabs

from johnsnowlabs import nlp

nlp.install(force_browser=True)

# Start Spark Session
spark = nlp.start()

# Import the Legal NLP module
from johnsnowlabs import legal

For alternative installation methods of how to install in specific environments, please check the docs.

Try The Generative AI Lab - No-Code Platform For Model Tuning & Validation

See in action

David Cecchini

Data Scientist at John Snow Labs

Our additional expert:

Ph.D. at Tsinghua-Berkeley Shenzhen Institute | Data Scientist

NLP Summit Insights: How LLMs Are Shaping the Future of Modern Business

Ida Lucente

NLP Summit Experts Share Perspectives on How Advanced NLP Technologies for Healthcare, Finance, Legal Will Shape Their Industries and Unleash Better &...