Finance NLP releases new ABSA demo app and BGE embeddings model

Data Scientist at John Snow Labs

The latest version of the library comes with a better embedding model and a new demo app for Aspect-Based Sentiment Analysis (ABSA).

BGE Sentence Embedding Model

The new model extends the library's ability to create vector representations of financial texts for Retrieval Augmented Generation (RAG) applications. BAAI General Embeddings (BGE) is one of the top open-source models for retrieval applications, according to the MTEB benchmark.
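To make the retrieval step of a RAG application concrete, here is a minimal sketch of how embedding vectors are typically compared: passages are ranked by cosine similarity to the query vector. The helper names (cosine_similarity, rank_passages) and the tiny 3-dimensional vectors are illustrative assumptions, not part of Finance NLP and not real BGE output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return passage indices ordered from most to least similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(passage_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored]


# Made-up toy vectors standing in for query and passage embeddings.
query = [1.0, 0.0, 1.0]
passages = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_passages(query, passages)
print(ranking)
```

In a real application the vectors would come from the embedding model, and the top-ranked passages would be passed to the generator as context.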

To use the model in Finance NLP, add the BertEmbeddings annotator with the finembeddings_bge_base pretrained model. This annotator requires document and token columns:

documentAssembler = (
    nlp.DocumentAssembler().setInputCol("text").setOutputCol("document")
)

tokenizer = nlp.Tokenizer().setInputCols("document").setOutputCol("token")

bge = (
    nlp.BertEmbeddings.pretrained(
        "finembeddings_bge_base", "en", "finance/models"
    )
    .setInputCols(["document", "token"])
    .setOutputCol("bge")
)

pipeline = nlp.Pipeline(stages=[documentAssembler, tokenizer, bge])

Then, we can create an example sentence to transform using the pipeline:

example = spark.createDataFrame(
    [["""What is the best way to invest in the stock market?"""]]
).toDF("text")

# Fit and run the pipeline, then emit one row per token embedding vector
result = (
    pipeline.fit(example)
    .transform(example)
    .selectExpr("explode(bge.embeddings) as bge_embeddings")
)

The result contains word embeddings, one BGE vector per token. To turn them into a sentence embedding, we can use the SentenceEmbeddings annotator to apply a pooling strategy (currently AVERAGE and SUM are available).

sent_embedding = (
    nlp.SentenceEmbeddings()
    .setInputCols(["document", "bge"])
    .setOutputCol("sentence_bge")
    .setPoolingStrategy("AVERAGE")
)
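As a rough illustration of what the two pooling strategies compute, the sketch below collapses a list of token vectors into a single sentence vector in plain Python. The pool_tokens function and the toy numbers are assumptions made for this example; this is not the annotator's actual implementation.

```python
def pool_tokens(token_vectors, strategy="AVERAGE"):
    """Collapse a list of equal-length token vectors into one sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(vec) != dim for vec in token_vectors):
        raise ValueError("all token vectors must have the same length")

    # Sum each dimension across tokens.
    sums = [float(sum(column)) for column in zip(*token_vectors)]

    if strategy == "SUM":
        return sums
    if strategy == "AVERAGE":
        return [value / len(token_vectors) for value in sums]
    raise ValueError(f"unknown pooling strategy: {strategy}")


# Made-up 4-dimensional vectors for a 3-token sentence (not real BGE output).
toy_tokens = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]

sum_vec = pool_tokens(toy_tokens, "SUM")
avg_vec = pool_tokens(toy_tokens, "AVERAGE")
print(sum_vec)
print(avg_vec)
```

AVERAGE keeps the sentence vector on the same scale regardless of sentence length, while SUM grows with the number of tokens.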

New ABSA demo app

The new demo app shows the capabilities of the previously released ABSA model, which identifies financial entities in a text and classifies the sentiment associated with each of them. The demo is available at this link.

For example, given the sentence:

Very few regions were immune from the harsh winter, but our segregation focused on those stores where we believe the adverse impact on net sales was significant

the model identifies net sales as a financial entity (REVENUE) with a NEGATIVE sentiment associated with it.

Fancy trying?

We’ve got 30-day free licenses for you, with technical support from our team of technical and financial subject matter experts (SMEs). The trial includes complete access to more than 150 models, including Classification, NER, Relation Extraction, Similarity Search, Summarization, Sentiment Analysis, Question Answering, etc., and 50+ financial language models.

Just go to https://www.johnsnowlabs.com/install/ and follow the instructions!

Don’t forget to check our notebooks and demos.

How to run

Finance NLP is relatively easy to run on both clusters and driver-only environments using the johnsnowlabs library:

Install the johnsnowlabs library:

pip install johnsnowlabs

Then, in Python, install Finance NLP with:

from johnsnowlabs import nlp

nlp.install(force_browser=True)

Then, we can import the Finance NLP module and start working with Spark.

from johnsnowlabs import nlp, finance

# Start Spark Session
spark = nlp.start()

For how to install in specific environments, please check the docs.

Our additional expert:
Ph.D. at Tsinghua-Berkeley Shenzhen Institute | Data Scientist
