We are very excited to announce that the latest NLU release comes with a new pretrained Intent Classifier and NER Action Extractor for text related to music, restaurants, and movies trained on the SNIPS dataset. Make sure to check out the models’ hub and the easy 1-liners for more info!
In addition to that, new NER and Embedding models for Bengali are now available.
Finally, there is a new NLU Webinar with 9 accompanying tutorial notebooks, segmented into the following parts:
- Part 1: Easy 1-Liners
  - Spell checking / Sentiment / POS / NER / BERTology embeddings
- Part 2: Data analysis and NLP tasks on Crypto News Headline dataset
  - Preprocessing and extracting Keywords, Emotions, and Named Entities, and visualizing them
- Part 3: NLU Multi-Lingual 1-Liners with Microsoft’s Marian Models
  - Translate between 200+ languages (and classify the language afterward)
- Part 4: Data analysis and NLP tasks on Chinese News Article Dataset
  - Word Segmentation, Lemmatization, Keyword Extraction, Named Entity Extraction, and translation to English
- Part 5: Train a Sentiment Classifier that understands 100+ Languages
  - Train on a French sentiment dataset and predict the sentiment of 100+ languages with language-agnostic BERT Sentence Embeddings
- Part 6: Question answering, Summarization, SQuAD, and more with Google’s T5
  - T5 Question answering and 18+ other NLP tasks (SQuAD / GLUE / SuperGLUE)
New Models
NLU 1.1.3 New Non-English Models
Language | nlu.load() reference | Spark NLP Model reference | Type |
---|---|---|---|
Bengali | bn.ner.cc_300d | bengaliner_cc_300d | NerDLModel |
Bengali | bn.embed | bengali_cc_300d | Word Embeddings Model (Alias) |
Bengali | bn.embed.cc_300d | bengali_cc_300d | Word Embeddings Model (Alias) |
Bengali | bn.embed.glove | bengali_cc_300d | Word Embeddings Model (Alias) |
NLU 1.1.3 New English Models
Language | nlu.load() reference | Spark NLP Model reference | Type |
---|---|---|---|
English | en.classify.snips | nerdl_snips_100d | NerDLModel |
English | en.ner.snips | classifierdl_use_snips | ClassifierDLModel |
New NLU Webinar
State-of-the-art Natural Language Processing for 200+ Languages with 1 Line of code
Talk Abstract
Learn to harness the power of 1,000+ production-grade & scalable NLP models for 200+ languages – all available with just 1 line of Python code by leveraging the open-source NLU library, which is powered by the widely popular Spark NLP.
John Snow Labs has delivered over 80 releases of Spark NLP to date, making it the most widely used NLP library in the enterprise and providing the AI community with state-of-the-art accuracy and scale for a variety of common NLP tasks. The most recent releases include pre-trained models for over 200 languages – including languages that do not use spaces for word segmentation, like Chinese, Japanese, and Korean, and languages written from right to left, like Arabic, Farsi, Urdu, and Hebrew. All software and models are free and open source under an Apache 2.0 license.
This webinar will show you how to leverage the multi-lingual capabilities of Spark NLP & NLU – including automated language detection for up to 375 languages, and the ability to perform translation, named entity recognition, stopword removal, lemmatization, and more in a variety of language families. We will create Python code in real-time and solve these problems in just 30 minutes. The notebooks will then be made freely available online.
You can watch the video here.
NLU 1.1.3 New Notebooks and tutorials
New Webinar Notebooks
- NLU basics, easy 1-liners (Spellchecking, sentiment, NER, POS, BERT)
- Analyze Crypto News dataset with Keyword extraction, NER, Emotional distribution, and stemming
- Translate Crypto News dataset between 300 Languages with the Marian Model (German, French, Hebrew examples)
- Translate Crypto News dataset between 300 Languages with the Marian Model (Hindi, Russian, Chinese examples)
- Analyze Chinese News Headlines with Chinese Word Segmentation, Lemmatization, NER, and Keyword extraction
- Train a Sentiment Classifier that will understand 100+ languages on just a French Dataset with the powerful Language Agnostic Bert Embeddings
- Summarize text and Answer Questions with T5
- Solve any task in 1 line from SQUAD, GLUE and SUPER GLUE with T5
- Overview of models for various languages
New easy NLU 1-liners in NLU 1.1.3
Detect actions in general commands related to music, restaurant, movies.
nlu.load("en.classify.snips").predict("book a spot for nona gray myrtle and alison at a top-rated brasserie that is distant from wilson av on nov the 4th 2030 that serves ouzeri",output_level = "document")
outputs :
ner_confidence | entities | document | Entities_Classes |
---|---|---|---|
[1.0, 1.0, 0.9997000098228455, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.9990000128746033, 1.0, 1.0, 1.0, 0.9965000152587891, 0.9998999834060669, 0.9567000269889832, 1.0, 1.0, 1.0, 0.9980000257492065, 0.9991999864578247, 0.9988999962806702, 1.0, 1.0, 0.9998999834060669] | [‘nona gray myrtle and alison’, ‘top-rated’, ‘brasserie’, ‘distant’, ‘wilson av’, ‘nov the 4th 2030’, ‘ouzeri’] | book a spot for nona gray myrtle and alison at a top-rated brasserie that is distant from wilson av on nov the 4th 2030 that serves ouzeri | [‘party_size_description’, ‘sort’, ‘restaurant_type’, ‘spatial_relation’, ‘poi’, ‘timeRange’, ‘cuisine’] |
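The `entities` and `Entities_Classes` columns in the row above are parallel lists, so they can be combined into slot/value pairs. The following is a minimal sketch in plain Python, assuming the two lists have already been pulled out of the returned row; the helper name `pair_entities` is hypothetical and not part of NLU.

```python
from typing import Dict, List


def pair_entities(entities: List[str], classes: List[str]) -> Dict[str, List[str]]:
    """Group extracted entity strings by their predicted class.

    Hypothetical helper (not part of NLU). It assumes both lists are
    aligned index by index, as in the example output row.
    """
    if len(entities) != len(classes):
        raise ValueError("entities and classes must have the same length")
    grouped: Dict[str, List[str]] = {}
    for text, label in zip(entities, classes):
        # A class may occur more than once, so collect values in a list.
        grouped.setdefault(label, []).append(text)
    return grouped


# Values copied from the example output row.
entities = ["nona gray myrtle and alison", "top-rated", "brasserie", "distant",
            "wilson av", "nov the 4th 2030", "ouzeri"]
classes = ["party_size_description", "sort", "restaurant_type", "spatial_relation",
           "poi", "timeRange", "cuisine"]

slots = pair_entities(entities, classes)
print(slots["restaurant_type"])
```

Grouping into lists rather than single values keeps repeated classes intact, for example when a sentence contains two entities of the same type.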
Named Entity Recognition (NER) Model in Bengali (bengaliner_cc_300d)
# Bengali for: 'Iajuddin Ahmed passed Matriculation from Munshiganj High School in 1948 and Intermediate from Munshiganj Horganga College in 1950.'
nlu.load("bn.ner.cc_300d").predict("১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন",output_level = "document")
outputs :
ner_confidence | entities | Entities_Classes | document |
---|---|---|---|
[0.9987999796867371, 0.9854000210762024, 0.8604000210762024, 0.6686999797821045, 0.5289999842643738, 0.7009999752044678, 0.7684999704360962, 0.9979000091552734, 0.9976000189781189, 0.9930999875068665, 0.9994000196456909, 0.9879000186920166, 0.7407000064849854, 0.9215999841690063, 0.7657999992370605, 0.39419999718666077, 0.9124000072479248, 0.9932000041007996, 0.9919999837875366, 0.995199978351593, 0.9991999864578247] | [‘সালে’, ‘ইয়াজউদ্দিন আহম্মেদ’, ‘মুন্সিগঞ্জ উচ্চ বিদ্যালয়’, ‘সালে’, ‘মুন্সিগঞ্জ হরগঙ্গা কলেজ’] | [‘TIME’, ‘PER’, ‘ORG’, ‘TIME’, ‘ORG’] | ১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন |
Identify intent in general text – SNIPS dataset
nlu.load("en.ner.snips").predict("I want to bring six of us to a bistro in town that serves hot chicken sandwich that is within the same area",output_level = "document")
outputs :
document | snips | snips_confidence |
---|---|---|
I want to bring six of us to a bistro in town that serves hot chicken sandwich that is within the same area | BookRestaurant | 1 |
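Downstream code will often want to act on the predicted intent only when the classifier is confident. The sketch below is a hedged illustration in plain Python that takes the `snips` and `snips_confidence` values from a row like the one above; the function name `accept_intent` and the default threshold of 0.8 are assumptions chosen for illustration, not values recommended by NLU.

```python
from typing import Optional


def accept_intent(intent: str, confidence: float, threshold: float = 0.8) -> Optional[str]:
    """Return the intent if confidence meets the threshold, otherwise None.

    Hypothetical helper (not part of NLU); the threshold is an arbitrary
    illustrative default.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return intent if confidence >= threshold else None


# Values taken from the example output row.
routed = accept_intent("BookRestaurant", 1.0)
print(routed)
```

Returning `None` for low-confidence predictions lets the caller fall back to a clarification prompt or a default handler.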
Word Embeddings for Bengali (bengali_cc_300d)
# Bengali for: 'Iajuddin Ahmed passed Matriculation from Munshiganj High School in 1948 and Intermediate from Munshiganj Horganga College in 1950.'
nlu.load("bn.embed").predict("১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন",output_level = "document")
outputs :
document | bn_embed_embeddings |
---|---|
১৯৪৮ সালে ইয়াজউদ্দিন আহম্মেদ মুন্সিগঞ্জ উচ্চ বিদ্যালয় থেকে মেট্রিক পাশ করেন এবং ১৯৫০ সালে মুন্সিগঞ্জ হরগঙ্গা কলেজ থেকে ইন্টারমেডিয়েট পাশ করেন | [-0.0828 0.0683 0.0215 … 0.0679 -0.0484…] |
NLU 1.1.3 Enhancements
- Added automatic conversion of Word Embeddings to Sentence Embeddings when no Sentence Embedding is available and a model needs the converted version to run.
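One common way to turn word embeddings into a sentence embedding is to pool the token vectors, for example by averaging them; Spark NLP's SentenceEmbeddings annotator offers AVERAGE and SUM pooling strategies. Whether NLU's automatic conversion uses exactly this strategy is an assumption here. The sketch below illustrates average pooling in plain Python with toy 3-dimensional vectors; the function name `average_pool` is hypothetical.

```python
from typing import List, Sequence


def average_pool(word_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average word vectors component-wise into one sentence vector.

    Illustrative only: this is not NLU's internal conversion code.
    """
    if not word_vectors:
        raise ValueError("need at least one word vector")
    dim = len(word_vectors[0])
    if any(len(vec) != dim for vec in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    count = len(word_vectors)
    return [sum(vec[i] for vec in word_vectors) / count for i in range(dim)]


# Toy vectors standing in for real 300-dimensional word embeddings.
toy_word_vectors = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
sentence_vector = average_pool(toy_word_vectors)
print(sentence_vector)  # [2.0, 1.0, 1.0]
```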
NLU 1.1.3 Bug Fixes
- Fixed a bug that caused the ur.sentiment NLU pipeline to build incorrectly
- Fixed a bug that caused the sentiment.imdb.glove NLU pipeline to build incorrectly
- Fixed a bug that caused the en.sentiment.glove.imdb NLU pipeline to build incorrectly
- Fixed a bug that caused Spark 2.3.X environments to crash.
NLU Installation
# PyPi
!pip install nlu pyspark==2.4.7
# Conda
# Install NLU from Anaconda/Conda
conda install -c johnsnowlabs nlu