Intent and Action Classification, analyze Chinese News and Crypto market, 200+ languages & answer questions with NLU 1.1.3

02.03.2021

Christian Kasim Loan

Senior Data Scientist at John Snow Labs

We are very excited to announce that the latest NLU release comes with a new pretrained Intent Classifier and NER Action Extractor for text related to music, restaurants, and movies, both trained on the SNIPS dataset. Make sure to check out the Models Hub and the easy 1-liners below for more info!

In addition to that, new NER and Embedding models for Bengali are now available.

Finally, there is a new NLU webinar with 9 accompanying tutorial notebooks, segmented into the following parts:

Part 1: Easy 1-liners
- Spell checking, sentiment, POS, NER, and BERTology embeddings
Part 2: Data analysis and NLP tasks on the Crypto News Headline dataset
- Preprocessing, extracting keywords, emotions, and named entities, and visualizing them
Part 3: NLU multilingual 1-liners with Microsoft’s Marian models
- Translating between 200+ languages (and classifying the language afterward)
Part 4: Data analysis and NLP tasks on a Chinese news article dataset
- Word segmentation, lemmatization, keyword extraction, named entity extraction, and translation to English
Part 5: Train a sentiment classifier that understands 100+ languages
- Train on a French sentiment dataset and predict the sentiment of 100+ languages with Language-agnostic BERT Sentence Embedding (LaBSE)
Part 6: Question answering, summarization, SQuAD, and more with Google’s T5
- T5 question answering and 18+ other NLP tasks (SQuAD / GLUE / SuperGLUE)

New Models

NLU 1.1.3 New Non-English Models

Language	nlu.load() reference	Spark NLP Model reference	Type
Bengali	bn.ner.cc_300d	bengaliner_cc_300d	NerDLModel
Bengali	bn.embed	bengali_cc_300d	Word Embeddings Model
Bengali	bn.embed.cc_300d	bengali_cc_300d	Word Embeddings Model (Alias)
Bengali	bn.embed.glove	bengali_cc_300d	Word Embeddings Model (Alias)

NLU 1.1.3 New English Models

Language	nlu.load() reference	Spark NLP Model reference	Type
English	en.classify.snips	nerdl_snips_100d	NerDLModel
English	en.ner.snips	classifierdl_use_snips	ClassifierDLModel
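Both tables map an `nlu.load()` reference to a Spark NLP model, and several references are aliases for the same model: `bn.embed`, `bn.embed.cc_300d` and `bn.embed.glove` all resolve to `bengali_cc_300d`. The sketch below models that lookup as a plain dictionary built from the table rows. `MODEL_TABLE`, `references_for` and `aliases_by_model` are hypothetical names for illustration, not NLU's internal namespace. It also shows why the Type column is worth checking rather than guessing the task from the reference: according to the table, `en.classify.snips` is a NerDLModel and `en.ner.snips` is a ClassifierDLModel.

```python
from collections import defaultdict

# Hypothetical lookup built from the two tables above:
# nlu.load() reference -> Spark NLP model reference.
MODEL_TABLE = {
    "bn.ner.cc_300d": "bengaliner_cc_300d",
    "bn.embed": "bengali_cc_300d",
    "bn.embed.cc_300d": "bengali_cc_300d",
    "bn.embed.glove": "bengali_cc_300d",
    "en.classify.snips": "nerdl_snips_100d",
    "en.ner.snips": "classifierdl_use_snips",
}


def references_for(model_name: str) -> list[str]:
    """Return every nlu.load() reference (alias) that resolves to model_name."""
    return sorted(ref for ref, model in MODEL_TABLE.items() if model == model_name)


def aliases_by_model() -> dict[str, list[str]]:
    """Group all references by the Spark NLP model they resolve to."""
    groups: dict[str, list[str]] = defaultdict(list)
    for ref, model in MODEL_TABLE.items():
        groups[model].append(ref)
    return {model: sorted(refs) for model, refs in groups.items()}


if __name__ == "__main__":
    print(references_for("bengali_cc_300d"))
    print(aliases_by_model())
```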

New NLU Webinar

State-of-the-art Natural Language Processing for 200+ Languages with 1 Line of code

Talk Abstract

Learn to harness the power of 1,000+ production-grade & scalable NLP models for 200+ languages – all available with just 1 line of Python code by leveraging the open-source NLU library, which is powered by the widely popular Spark NLP.

John Snow Labs has delivered over 80 releases of Spark NLP to date, making it the most widely used NLP library in the enterprise and providing the AI community with state-of-the-art accuracy and scale for a variety of common NLP tasks. The most recent releases include pre-trained models for over 200 languages, including languages that do not use spaces between words and therefore require word segmentation, such as Chinese, Japanese, and Korean, and languages written from right to left, such as Arabic, Farsi, Urdu, and Hebrew. All software and models are free and open source under an Apache 2.0 license.

This webinar will show you how to leverage the multilingual capabilities of Spark NLP & NLU, including automated language detection for up to 375 languages and the ability to perform translation, named entity recognition, stopword removal, lemmatization, and more in a variety of language families. We will write Python code live and solve these problems in just 30 minutes. The notebooks will then be made freely available online.

You can watch the video here.

NLU 1.1.3 New Notebooks and tutorials

New Webinar Notebooks

New easy NLU 1-liners in NLU 1.1.3

Detect actions in general commands related to music, restaurants, and movies.

import nlu

nlu.load("en.classify.snips").predict("book a spot for nona gray  myrtle and alison at a top-rated brasserie that is distant from wilson av on nov  the 4th  2030 that serves ouzeri", output_level="document")

Output:

ner_confidence	entities	document	Entities_Classes
[1.0, 1.0, 0.9997000098228455, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9990000128746033, 1.0, 1.0, 1.0, 0.9965000152587891, 0.9998999834060669, 0.9567000269889832, 1.0, 1.0, 1.0, 0.9980000257492065, 0.9991999864578247, 0.9988999962806702, 1.0, 1.0, 0.9998999834060669]	[‘nona gray myrtle and alison’, ‘top-rated’, ‘brasserie’, ‘distant’, ‘wilson av’, ‘nov the 4th 2030’, ‘ouzeri’]	book a spot for nona gray myrtle and alison at a top-rated brasserie that is distant from wilson av on nov the 4th 2030 that serves ouzeri	[‘party_size_description’, ‘sort’, ‘restaurant_type’, ‘spatial_relation’, ‘poi’, ‘timeRange’, ‘cuisine’]
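With `output_level = "document"`, the `entities` and `Entities_Classes` columns hold parallel lists, so the i-th entity belongs to the i-th class. A minimal sketch of turning them into slot/value pairs, using the values copied from the output row above; plain Python lists stand in for the DataFrame cells, and nothing here calls NLU.

```python
# Values copied from the en.classify.snips output row above.
entities = [
    "nona gray myrtle and alison", "top-rated", "brasserie", "distant",
    "wilson av", "nov the 4th 2030", "ouzeri",
]
entities_classes = [
    "party_size_description", "sort", "restaurant_type", "spatial_relation",
    "poi", "timeRange", "cuisine",
]

# Position i in one list corresponds to position i in the other.
pairs = list(zip(entities_classes, entities))

# A dict is convenient for slot filling, but it keeps only the last value
# if the same class occurs twice; iterate over `pairs` in that case.
slots = dict(pairs)

for slot, value in pairs:
    print(f"{slot:>22}: {value}")
```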

Named Entity Recognition (NER) Model in Bengali (bengaliner_cc_300d)

# Bengali for: 'Iajuddin Ahmed passed Matriculation from Munshiganj High School in 1948 and Intermediate from Munshiganj Horganga College in 1950.'
import nlu

nlu.load("bn.ner.cc_300d").predict("১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন", output_level="document")

Output:

ner_confidence	entities	Entities_Classes	document
[0.9987999796867371, 0.9854000210762024, 0.8604000210762024, 0.6686999797821045, 0.5289999842643738, 0.7009999752044678, 0.7684999704360962, 0.9979000091552734, 0.9976000189781189, 0.9930999875068665, 0.9994000196456909, 0.9879000186920166, 0.7407000064849854, 0.9215999841690063, 0.7657999992370605, 0.39419999718666077, 0.9124000072479248, 0.9932000041007996, 0.9919999837875366, 0.995199978351593, 0.9991999864578247]	[‘সালে’, ‘ইয়াজউদ্দিন আহম্মেদ’, ‘মুন্সিগঞ্জ উচ্চ বিদ্যালয়’, ‘সালে’, ‘মুন্সিগঞ্জ হরগঙ্গা কলেজ’]	[‘TIME’, ‘PER’, ‘ORG’, ‘TIME’, ‘ORG’]	১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন
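The `ner_confidence` column contains 21 scores, which matches the 21 whitespace-separated tokens of the Bengali sentence, so it appears to hold one score per token rather than one per entity. The sketch below assumes that NLU's tokenizer splits this sentence on whitespace (an assumption, not something the output states), pairs each token with its score, and flags uncertain tokens; the 0.6 threshold is an arbitrary choice for illustration.

```python
# Bengali sentence from the bn.ner.cc_300d example above.
sentence = "১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন"

# ner_confidence values from the output row above, rounded to 4 decimals.
ner_confidence = [
    0.9988, 0.9854, 0.8604, 0.6687, 0.5290, 0.7010, 0.7685,
    0.9979, 0.9976, 0.9931, 0.9994, 0.9879, 0.7407, 0.9216,
    0.7658, 0.3942, 0.9124, 0.9932, 0.9920, 0.9952, 0.9992,
]

# Assumption: one score per whitespace-separated token.
tokens = sentence.split()
if len(tokens) != len(ner_confidence):
    raise ValueError(f"{len(tokens)} tokens but {len(ner_confidence)} scores")

THRESHOLD = 0.6  # arbitrary cut-off for "uncertain"
low_confidence = [
    (index, token, score)
    for index, (token, score) in enumerate(zip(tokens, ner_confidence))
    if score < THRESHOLD
]

for index, token, score in low_confidence:
    print(f"token {index}: {token} ({score:.4f})")
```

On this sentence the two flagged tokens are the first মুন্সিগঞ্জ and হরগঙ্গা, and both sit inside the spans the model labeled ORG.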

Identify intent in general text – SNIPS dataset

import nlu

nlu.load("en.ner.snips").predict("I want to bring six of us to a bistro in town that serves hot chicken sandwich that is within the same area", output_level="document")

Output:

document	snips	snips_confidence
I want to bring six of us to a bistro in town that serves hot chicken sandwich that is within the same area	BookRestaurant	1
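In an application, the predicted intent in the `snips` column and its `snips_confidence` score are typically used to route a request. A minimal, hypothetical dispatcher is sketched below: the handler functions, the `HANDLERS` table and the 0.8 threshold are illustrative assumptions, and it assumes the classifier emits the SNIPS intent labels (AddToPlaylist, BookRestaurant, GetWeather, PlayMusic, RateBook, SearchCreativeWork, SearchScreeningEvent) as plain strings, like `BookRestaurant` above.

```python
from typing import Callable


def book_restaurant(text: str) -> str:
    # Hypothetical handler; a real one would call a booking service.
    return f"Opening a restaurant booking for: {text!r}"


def fallback(text: str) -> str:
    # Used when the intent is unknown or the classifier is not confident.
    return f"Sorry, I am not sure what you meant by: {text!r}"


# Hypothetical routing table keyed by SNIPS intent label.
# Other intents (PlayMusic, GetWeather, ...) would be registered the same way.
HANDLERS: dict[str, Callable[[str], str]] = {
    "BookRestaurant": book_restaurant,
}


def route(text: str, intent: str, confidence: float, threshold: float = 0.8) -> str:
    """Dispatch to the handler for `intent`, or fall back if it is unknown or uncertain."""
    handler = HANDLERS.get(intent) if confidence >= threshold else None
    return (handler or fallback)(text)


text = "I want to bring six of us to a bistro in town that serves hot chicken sandwich that is within the same area"
print(route(text, intent="BookRestaurant", confidence=1.0))
```

Keeping the threshold check in one place means low-confidence predictions never reach a handler, which is usually safer than acting on a guess.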

Word Embeddings for Bengali (bengali_cc_300d)

# Bengali for: 'Iajuddin Ahmed passed Matriculation from Munshiganj High School in 1948 and Intermediate from Munshiganj Horganga College in 1950.'
import nlu

nlu.load("bn.embed").predict("১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন", output_level="document")

Output:

document	bn_embed_embeddings
১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন	[-0.0828 0.0683 0.0215 … 0.0679 -0.0484…]

NLU 1.1.3 Enhancements

Added automatic conversion of Word Embeddings to Sentence Embeddings when a model needs Sentence Embeddings to run and none are available.
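Conceptually, converting Word Embeddings into a Sentence Embedding means pooling the token vectors of a sentence into one vector. Spark NLP's `SentenceEmbeddings` annotator supports AVERAGE and SUM pooling; whether NLU's automatic conversion uses exactly that annotator and setting is an assumption here. The pure-Python sketch below shows average pooling on made-up 3-dimensional vectors (the `bengali_cc_300d` vectors have 300 dimensions) and is not NLU's implementation.

```python
from typing import Sequence


def average_pool(word_vectors: Sequence[Sequence[float]]) -> list[float]:
    """Average the token vectors of one sentence, dimension by dimension."""
    if not word_vectors:
        raise ValueError("cannot pool an empty sentence")
    dim = len(word_vectors[0])
    if any(len(v) != dim for v in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    n = len(word_vectors)
    return [sum(v[i] for v in word_vectors) / n for i in range(dim)]


# Toy 3-d "word embeddings" for a 2-token sentence (made-up numbers).
sentence_vectors = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0]]
print(average_pool(sentence_vectors))  # [2.0, 2.0, 1.0]
```

Averaging keeps the result on the same scale as the word vectors regardless of sentence length, whereas summing grows with the number of tokens.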

NLU 1.1.3 Bug Fixes

Fixed a bug that caused the ur.sentiment NLU pipeline to build incorrectly
Fixed a bug that caused the sentiment.imdb.glove NLU pipeline to build incorrectly
Fixed a bug that caused the en.sentiment.glove.imdb NLU pipeline to build incorrectly
Fixed a bug that caused Spark 2.3.X environments to crash.

NLU Installation

# PyPI (the leading "!" runs the command from a Jupyter/Colab notebook cell; omit it in a terminal)
!pip install nlu pyspark==2.4.7
# Conda: install NLU from Anaconda
conda install -c johnsnowlabs nlu

Additional NLU resources


Christian Kasim Loan is a computer scientist with over 10 years of coding experience who works at John Snow Labs as a Senior Data Scientist, where he helps port the latest and greatest machine learning models to Spark, and who created the NLU library.
