We are happy to announce the Legal NLP 1.6 is out.
Legal NLP is a John Snow Lab’s product, launched 2022 to provide state-of-the-art, autoscalable, domain-specific NLP on top of Spark.
With more than 600 models, featuring Deep Learning and Transformer-based architectures, NLP for legal includes:
- Annotators to carry out Name Entity Recognition, Relation Extraction, Assertion Status / Understanding Entities in Context, Data Mapping to external sources, Deidentification, Question Answering, Table Question Answering, Sentiment Analysis, Summarization and much more, both training and inference!
- Zero-shot Name Entity Recognition and Relation extraction;
- 600+ pretrained Deep Learning / Transformer-based models;
- Fully integration with Databricks, AWS or Azure;
- 33+ notebooks and 25+ demos ready to showcase its features.
- Full integration with NLP Lab (former Annotation Lab) for managing your annotation projects and train your legal NLP model in a zero-code fashion.
- Compatiblity with Visual NLP, to combine OCR/Visual capabilities, as Signature Extraction, Form Recognition or Table detection, to Legal NLP.
Portuguese, Italian, French, English Legal Language Models
This is a list of new LM you can use to calculate the embeddings of your texts and train Legal nlp models on the top of them (including English, Portuguese, Italian and French):
bert_embeddings_Legal_BERTimbau_base_pt bert_embeddings_legal_bert_base_cased_ptbr_pt bert_embeddings_bert_large_portuguese_cased_legal_mlm_pt
bert_embeddings_Legal_BERTimbau_large_pt
bert_embeddings_Italian_Legal_BERT_it
bert_embeddings_legal_bert_small_uncased_en
bert_embeddings_custom_legalbert_en
bert_embeddings_legalbert_en
bert_embeddings_legalbert_large_1.7M_2_en
bert_embeddings_Legal_heBERT_he
bert_embeddings_legalbert_large_1.7M_1_en
bert_embeddings_legal_bert_base_uncased_finetuned_ledgarscotus7_en
bert_embeddings_Legal_heBERT_ft_he
bert_embeddings_bert_small_finetuned_legal_contracts_larger4010_en
bert_embeddings_bert_tiny_finetuned_legal_definitions_en
bert_embeddings_bert_small_finetuned_legal_definitions_en
bert_embeddings_legal_bert_base_uncased_finetuned_RRamicus_en
bert_embeddings_bert_small_finetuned_legal_definitions_longer_en
bert_embeddings_bert_small_finetuned_legal_contracts10train10val_en
bert_embeddings_bert_small_finetuned_legal_contracts_larger20_5_1_en
camembert_embeddings_Italian_Legal_BERT_SC_it
camembert_embeddings_legal_camembert_fr
camembert_embeddings_legal_distilcamembert_fr
camembert_embeddings_lsg16k_Italian_Legal_BERT_SC_it
Applicable Law NER and Classification
Classify paragraphs talking about Applicable Law and extract the laws from them using legclf_applicable_law_cuad_en and legner_applicable_law_clause.
Dispute Clauses Law NER and Classification
Classify paragraphs talking about Dispute Clauses using legclf_dispute_clauses_cuad and extract entities from them using legner_dispute_clauses.
11 New Clause Classifiers
Identify clauses in your legal agreements with this new clause binary classifiers. They can retrieve for you the following categories given an input paragraph:
– legclf_tax_matters_md: [tax_matter, other]
– legclf_choice_of_law_md:[choice_of_law, other]
– legclf_term_of_agreement_md:[term_of_agreement, other]
– legclf_attorney_fees_md:[attorney_fees, other]
– legclf_effect_of_termination_md:[effect_of_termination, other]
– legclf_affirmative_covenants_md:[affirmative_covenants, other]
– legclf_amendments_and_waivers_md:[amendments_and_waivers, other]
– legclf_miscellaneous_provisions_md: [miscellaneous_provisions, other]
– legclf_conditions_precedent_md: [conditions_precedent, other]
– legclf_fees_and_expenses_md:[fees_and_expenses, other]
– legclf_termination_for_cause_md:[termination_for_cause, other]
Contextual Parser (scalable rule-base NER)
Detect HEADERs and SUBHEADERs in legal agreements, for splitting purposes. Included in legpipe_header_subheader
all along with other DL-based models for a hybrid extraction approach.
Split legal agreements into sections using pretrained pipelines
Combine NER, Contextual Parser and ChunkSplitting models to split legal agreements into different sections automatically, using legpipe_header_subheader
Augmented legner_contract_doc_parties
We have improved our Legal NER model to detect Document Types, Parties, Aliases, Former Names, Organizations and Effective Dates in legal agreements. The new model is lg(large)
and can be found by the name of legner_contract_doc_parties_lg
New training notebooks and certification training
We have continued creating notebooks showcasing Legal NLP functionalities, with more than 33 notebooks available. Some of the new ones include:
- Legal BertForTokenClassification training and inference, using any available Legal Bert model available in Hugging Face.
- ContextualParser for carrying out rule-based NER at scale, both training and inference.
- Binary, Multiclass and Multilabel classification of legal texts, training and inference.
- RelationExtraction based on spanBert training.
- EntityResolution (normalization) and Chunk Mapping (data augmentation training notebooks.
- Using NER to split a text with ChunkSentenceSplitting.
- Legal Use Case notebook: Creating graphs from Credit Agreements.
How to run
Legal NLP is very easy to run on both clusters and driver-only environments using johnsnowlabs
library:
!pip install johnsnowlabs
nlp.install(force_browser=True) nlp.start()
Fancy trying?
We’ve got 30-days free licenses for you with technical support from our legal team of technical and SME. Just go to https://www.johnsnowlabs.com/install/ and follow the instructions!
Try Legal NLP
See in action