This release features 3 new tutorial notebooks for Open/Closed book question answering with Google’s T5, Intent classification, and Aspect Based NER.
In addition, NLU 1.1.1 comes with 25+ pre-trained models and pipelines in Amharic, Bengali, Bhojpuri, Japanese, and Korean from the Spark NLP 2.7.2 release. Finally, NLU now supports running on Spark 2.3 clusters.
NLU 1.1.1 New English Models and Pipelines
New Easy NLU 1-liner Examples:
Extract aspects and entities from airline questions (ATIS dataset)
nlu.load("en.ner.atis").predict("i want to fly from baltimore to dallas round trip")
output: ["baltimore", "dallas", "round trip"]
Intent Classification for Airline Traffic Information System queries (ATIS dataset)
nlu.load("en.classify.questions.atis").predict("what is the price of flight from newyork to washington")
output: "atis_airfare"
Recognize Entities OntoNotes – ELECTRA Large
nlu.load("en.ner.onto.large").predict("Johnson first entered politics when elected in 2001 as a member of Parliament. He then served eight years as the mayor of London.")
output: ["Johnson", "first", "2001", "eight years", "London"]
Question classification of open-domain and fact-based questions Pipeline – TREC50
nlu.load("en.classify.trec50.pipe").predict("When did the construction of stone circles begin in the UK? ")
output: LOC_other
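Every example in this section follows the same two-step pattern: load a model by its reference string, then call predict on some text. A minimal sketch of running several references in one loop could look like the following. The helper name run_examples and its loader parameter are hypothetical, introduced only for illustration; when no loader is passed, the helper falls back to nlu.load, which assumes the nlu package and a working Spark environment are installed.

```python
def run_examples(examples, loader=None):
    """Run each (model_reference, text) pair and collect the predictions.

    `loader` is a hypothetical injection point: any callable that takes a
    model reference and returns an object with a .predict(text) method.
    When omitted, nlu.load is used, which requires `nlu` to be installed.
    """
    if loader is None:
        import nlu  # imported lazily so the helper can be exercised without Spark
        loader = nlu.load
    results = {}
    for reference, text in examples:
        results[reference] = loader(reference).predict(text)
    return results


# Model references and sentences taken from the examples in this section.
EXAMPLES = [
    ("en.ner.atis", "i want to fly from baltimore to dallas round trip"),
    ("en.classify.questions.atis",
     "what is the price of flight from newyork to washington"),
]
```

Calling run_examples(EXAMPLES) with nlu installed should return one prediction result per model reference; each model is downloaded the first time it is loaded.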
Traditional Chinese Word Segmentation
# 'However, this treatment also creates some problems' in Chinese
nlu.load("zh.segment_words.gsd").predict("然而,這樣的處理也衍生了一些問題。")
output: ["然而",",","這樣","的","處理","也","衍生","了","一些","問題","。"]
Part of Speech for Traditional Chinese
# 'However, this treatment also creates some problems' in Chinese
nlu.load("zh.pos.ud_gsd_trad").predict("然而,這樣的處理也衍生了一些問題。")
Output:
Token | POS
然而 | ADV
, | PUNCT
這樣 | PRON
的 | PART
處理 | NOUN
也 | ADV
衍生 | VERB
了 | PART
一些 | ADJ
問題 | NOUN
。 | PUNCT
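Since each token comes back paired with one POS tag, picking out a word class is a simple filter. The sketch below works on plain Python lists copied from the output above rather than on the DataFrame that predict returns; the helper name tokens_with_tag is hypothetical.

```python
# Tokens and tags copied from the Traditional Chinese POS output above.
tokens = ["然而", ",", "這樣", "的", "處理", "也", "衍生", "了", "一些", "問題", "。"]
tags = ["ADV", "PUNCT", "PRON", "PART", "NOUN", "ADV",
        "VERB", "PART", "ADJ", "NOUN", "PUNCT"]


def tokens_with_tag(tokens, tags, wanted):
    """Return the tokens whose POS tag equals `wanted`, in sentence order."""
    return [token for token, tag in zip(tokens, tags) if tag == wanted]


nouns = tokens_with_tag(tokens, tags, "NOUN")
verbs = tokens_with_tag(tokens, tags, "VERB")
print(nouns, verbs)
```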
Thai Word Segment Recognition
# 'Mona Lisa is a 16th-century oil painting created by Leonardo held at the Louvre in Paris' in Thai
nlu.load("th.segment_words").predict("Mona Lisa เป็นภาพวาดสีน้ำมันในศตวรรษที่ 16 ที่สร้างโดย Leonardo จัดขึ้นที่พิพิธภัณฑ์ลูฟร์ในปารีส")
Output:
token |
M |
o |
n |
a |
Lisa |
เป็น |
ภาพ |
ว |
า |
ด |
สีน้ำ |
มัน |
ใน |
ศตวรรษ |
ที่ |
16 |
ที่ |
สร้าง |
โ |
ด |
ย |
L |
e |
o |
n |
a |
r |
d |
o |
จัด |
ขึ้น |
ที่ |
พิพิธภัณฑ์ |
ลูฟร์ |
ใน |
ปารีส |
Part of Speech for Bengali (POS)
# 'The village is also called 'Mod' in Tora language' in Bengali
nlu.load("bn.pos").predict("বাসস্থান-ঘরগৃহস্থালি তোড়া ভাষায় গ্রামকেও বলে ` মোদ ' ৷")
Output:
token | pos
বাসস্থান-ঘরগৃহস্থালি | NN
তোড়া | NNP
ভাষায় | NN
গ্রামকেও | NN
বলে | VM
` | SYM
মোদ | NN
‘ | SYM
৷ | SYM
Stop Words Cleaner for Bengali
# 'This language is not enough' in Bengali
df = nlu.load("bn.stopwords").predict("এই ভাষা যথেষ্ট নয়")
Output:
cleanTokens | token
ভাষা | এই
যথেষ্ট | ভাষা
নয় | যথেষ্ট
None | নয়
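In this output the token column holds all four input tokens, while cleanTokens holds only three: the stop word এই ("this") has been dropped, and the shorter column is padded with None. Conceptually, stop word cleaning is a membership filter, as in the sketch below. This is an illustration of the idea, not the model's implementation, and the one-word stop list is an assumption chosen to reproduce the output above; the real list is larger.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not in the stop-word set, preserving order."""
    return [token for token in tokens if token not in stop_words]


# Tokens of the example sentence, in order.
tokens = ["এই", "ভাষা", "যথেষ্ট", "নয়"]

# Assumption: a single-entry stop list, just enough to mirror the example.
stop_words = {"এই"}

clean_tokens = remove_stop_words(tokens, stop_words)
print(clean_tokens)
```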
Part of Speech for Bhojpuri
# 'The people of Ohu know that the foundation of Bhojpuri was shaken' in Bhojpuri
nlu.load('bh.pos').predict("ओहु लोग के मालूम बा कि श्लील होखते भोजपुरी के नींव हिल जाई")
Output:
pos | token
DET | ओहु
NOUN | लोग
ADP | के
NOUN | मालूम
VERB | बा
SCONJ | कि
ADJ | श्लील
VERB | होखते
PROPN | भोजपुरी
ADP | के
NOUN | नींव
VERB | हिल
AUX | जाई
Amharic Part of Speech (POS)
# ' "Son, finish the job," he said.' in Amharic
nlu.load('am.pos').predict('ልጅ ኡ ን ሥራ ው ን አስጨርስ ኧው ኣል ኧሁ"')
Output:
pos | token
NOUN | ልጅ
DET | ኡ
PART | ን
NOUN | ሥራ
DET | ው
PART | ን
VERB | አስጨርስ
PRON | ኧው
AUX | ኣል
PRON | ኧሁ
PUNCT | ።
NOUN | “
Thai Sentiment Classification
# 'I love peanut butter and jelly!' in Thai
nlu.load('th.classify.sentiment').predict('ฉันชอบเนยถั่วและเยลลี่!')[['sentiment','sentiment_confidence']]
Output:
sentiment | sentiment_confidence
positive | 0.999998
Arabic Named Entity Recognition (NER)
# 'In 1918, the forces of the Arab Revolt liberated Damascus with the help of the British' in Arabic
nlu.load('ar.ner').predict('في عام 1918 حررت قوات الثورة العربية دمشق بمساعدة من الإنكليز',output_level='chunk')[['entities_confidence','ner_confidence','entities']]
Output:
entity_class | ner_confidence | entities
ORG | [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669] | قوات الثورة العربية
LOC | [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669] | دمشق
PER | [1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063, 0.9987999796867371, 0.9990000128746033, 0.9998999834060669, 0.9998999834060669, 0.9993000030517578, 0.9998999834060669] | الإنكليز
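The same 11-element ner_confidence list appears with every entity, and the input sentence has 11 whitespace-separated words, which suggests (as an assumption, not something the release notes state) one score per token of the sentence. A minimal sketch of reducing such a list to summary numbers, using the values copied from the output above:

```python
from statistics import mean

# Confidence scores copied from the Arabic NER output above.
ner_confidence = [
    1.0, 1.0, 1.0, 0.9997000098228455, 0.9840999841690063,
    0.9987999796867371, 0.9990000128746033, 0.9998999834060669,
    0.9998999834060669, 0.9993000030517578, 0.9998999834060669,
]

# The lowest score marks the tagging decision the model was least sure about.
lowest = min(ner_confidence)
lowest_position = ner_confidence.index(lowest)
average = mean(ner_confidence)

print(f"scores: {len(ner_confidence)}, lowest: {lowest:.4f} "
      f"at position {lowest_position}, mean: {average:.4f}")
```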