1 Line of Code to Train A Multilingual Text Classifier for 100+ Languages with NLU 1.1.4

Senior Data Scientist at John Snow Labs

We are very excited to announce that NLU 1.1.4 has been released. It comes with tutorials showcasing how you can train a multilingual text classifier on just one starting language, and the resulting model will correctly classify text in over 100 languages.


This is possible by leveraging the language-agnostic BERT Sentence Embeddings (LaBSE). In addition, tutorials for pure English classifiers for stock market sentiment, sarcasm, and negation have been added.

You can train a classifier with default USE embeddings in just 1 line

You can use any other embedding by specifying it before the classifier reference

If you train with LaBSE, your model will understand 100+ languages, even if you train in only one language!

Finally, this release makes working in Spark environments easier by providing a way to get a Spark DataFrame, regardless of your input data.

New NLU Multi-Lingual training tutorials

These notebooks showcase how to leverage the powerful language-agnostic BERT Sentence Embeddings (LaBSE) to train a language-agnostic classifier.
You can train on one starting language (e.g., an English dataset), and your model will be able to correctly predict the labels in every one of the 100+ languages covered by the LaBSE embeddings.

New NLU training tutorials (English)

These are simple training notebooks for binary classification in English.


Our additional expert:
Christian Kasim Loan is a computer scientist with over 10 years of coding experience. He works at John Snow Labs as a Senior Data Scientist, where he helps port the latest and greatest machine learning models to Spark and created the NLU library.