35 | Natural Language Understanding
We saw earlier how to use ctrl+= to enter natural language input. Now we’re going to talk about how to set up functions that understand natural language.
Interpreter is the key to much of this. You tell Interpreter what type of thing you want, and it takes any string you give it and tries to interpret it as that type.
Interpret the string "nyc" as a city:
“The big apple” is a nickname for New York City:
Interpret the string "hot pink" as a color:
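Here’s a sketch of inputs that do these three interpretations. Interpreter["type"] gives a function, which you then apply to a string. The type names "City" and "Color" are assumed to be among the built-in interpreter types, and looking up cities needs a network connection.

```wolfram
(* Interpreter["City"] is itself a function; apply it to strings to interpret them *)
cityInterpreter = Interpreter["City"];

nyc = cityInterpreter["nyc"]                  (* an Entity representing New York City *)
bigApple = cityInterpreter["the big apple"]   (* the nickname should give the same Entity *)

hotPink = Interpreter["Color"]["hot pink"]    (* a color directive such as RGBColor[...] *)

(* A string that can't be interpreted typically gives a Failure object, not an error *)
Interpreter["Color"]["qwertyuiop"]
```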
Interpreter converts natural language to Wolfram Language expressions that you can compute with. Here’s an example involving currency amounts.
Interpret various currency amounts:
Compute the total, doing conversions at current exchange rates:
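A sketch of the currency example, with amounts made up for illustration. Interpreter works on a list of strings as well as on a single string. Total then adds the resulting quantities, converting to a common currency as it goes, which needs current exchange rates and so a network connection.

```wolfram
(* Interpret a list of strings at once; each result is a Quantity with a currency unit *)
amounts = Interpreter["CurrencyAmount"][{"4 dollars", "3.50 euros", "500 yen"}]

(* Adding quantities with different currency units converts them at current rates *)
Total[amounts]
```

Which kind of "dollars" you get depends on your geo location.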
Here’s another example, involving locations.
Interpreter gives the geo location of the White House:
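A sketch of the location example. It assumes that the "Location" interpreter type gives a GeoPosition, which you can then feed to other geographic functions such as GeoGraphics and GeoDistance.

```wolfram
(* "Location" gives a GeoPosition: a latitude-longitude pair *)
whiteHouse = Interpreter["Location"]["White House"]

(* The result plugs straight into geographic computations *)
GeoGraphics[GeoMarker[whiteHouse]]
GeoDistance[whiteHouse, Interpreter["City"]["nyc"]]
```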
Interpreter handles many hundreds of different types of objects.
Interpret names of universities (which “U of I” is picked depends on geo location):
Interpret names of chemicals:
Interpret names of animals, then get images of them:
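A sketch of interpreting several kinds of objects. The specific strings are illustrative, and the type names "University", "Chemical" and "Animal", as well as the "Image" property, are assumed to be available for these entity types.

```wolfram
(* Lists of strings give lists of entities *)
Interpreter["University"][{"U of I", "U of M"}]   (* which ones you get depends on geo location *)
Interpreter["Chemical"][{"water", "caffeine", "ethanol"}]

(* Interpret animal names, then ask each resulting entity for its "Image" property *)
animals = Interpreter["Animal"][{"alligator", "anteater", "antelope"}];
EntityValue[animals, "Image"]
```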
Interpreter interprets whole strings. TextCases, on the other hand, tries to pick out instances of what you request from a string.
Pick out the nouns in a piece of text:
Pick out currency amounts:
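A sketch of TextCases on some made-up sentences. The first argument is the text to search, and the second is the kind of thing to pick out. Both grammatical types such as "Noun" and interpreter types such as "CurrencyAmount" work.

```wolfram
text = "The quick brown fox jumps over the lazy dog.";
TextCases[text, "Noun"]   (* the nouns found in the text, as strings *)

(* Interpreter-style types work too, such as currency amounts *)
TextCases["It costs 12 dollars here, or 10 euros in Paris.", "CurrencyAmount"]
```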
You can use TextCases to pick out particular kinds of things from a piece of text. Here we pick out instances of country names in a Wikipedia article.
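Here’s a sketch of picking out country names from a Wikipedia article. The article "tea" is an arbitrary choice for illustration, and WikipediaData needs a network connection. Counting the matches shows which countries are mentioned most.

```wolfram
(* Plain text of a Wikipedia article *)
article = WikipediaData["tea"];

(* Every mention of a country name, as strings *)
countries = TextCases[article, "Country"];

(* The five most often mentioned countries, with their counts *)
TakeLargest[Counts[countries], 5]
```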
TextStructure shows you the whole structure of a piece of text.
Find how a sentence of English can be parsed into grammatical units:
An alternative representation, as a graph:
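A sketch of both forms of TextStructure on a made-up sentence. It assumes that "ConstituentGraphs" is the form that gives the graph representation, returning one graph per sentence.

```wolfram
sentence = "The cat sat on the mat.";

(* Nested display of the grammatical units (noun phrases, verb phrases, ...) *)
TextStructure[sentence]

(* The same parse represented as graphs, one per sentence *)
TextStructure[sentence, "ConstituentGraphs"]
```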
WordList[] gives a list of common words. WordList["Noun"] and similar forms give lists of words that can be used as particular parts of speech.
Give the first 20 in a list of common verbs in English:
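A sketch of working with WordList. Take gives the first 20 common verbs, and mapping Length over an association shows how big each word list is.

```wolfram
Take[WordList["Verb"], 20]   (* the first 20 common verbs *)

(* Sizes of the word lists for several parts of speech *)
Length /@ <|"All" -> WordList[], "Noun" -> WordList["Noun"],
  "Verb" -> WordList["Verb"], "Adjective" -> WordList["Adjective"]|>
```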
It’s easy to study properties of words. Here are histograms comparing the length distributions of nouns, verbs and adjectives in the list of common words.
Make histograms of the lengths of common nouns, verbs and adjectives:
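A sketch of the histogram comparison. A pure function is mapped over the three parts of speech, and StringLength is applied to a whole word list at once. The mean lengths give a single number to compare for each part of speech.

```wolfram
parts = {"Noun", "Verb", "Adjective"};

(* One histogram of word lengths for each part of speech, labeled by its name *)
Histogram[StringLength[WordList[#]], PlotLabel -> #] & /@ parts

(* The mean word length for each part of speech, as an approximate number *)
means = N[Mean[StringLength[WordList[#]]]] & /@ parts
```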
So far we’ve only talked about English. But the Wolfram Language also knows about other languages. For example, WordTranslation gives translations of words.
Translate “hello” into French:
Translate into Korean:
Translate into Korean, then transliterate to the English alphabet:
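A sketch of the three translation steps. WordTranslation gives a list of possible translations, so the sketch maps Transliterate over that list rather than assuming it accepts a list directly.

```wolfram
WordTranslation["hello", "French"]           (* a list of French translations *)
korean = WordTranslation["hello", "Korean"]  (* a list of Korean translations *)

(* Convert each Korean string to the Latin alphabet *)
Transliterate /@ korean
```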
If you want to compare lots of different languages, give All as the language for WordTranslation. The result is an association which gives translations for different languages, with the languages listed roughly in order of decreasing worldwide usage.
Let’s take the top 100 languages and, for each one, look at the first character of the first translation of “hello”. Here’s a word cloud that shows that among these languages, “h” is the most common letter to start the word for “hello”.
For the top 100 languages, make a word cloud of the first characters in the word for “hello”:
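A sketch of the word cloud computation. Take on an association keeps its first entries, which here are the most widely used languages. As a precaution, the sketch drops any language whose entry isn’t a nonempty list of strings before taking first characters.

```wolfram
(* An Association: language -> list of translations of "hello" *)
all = WordTranslation["hello", All];

(* Keep the first 100 languages, skipping any without a string translation *)
firstTranslations = Select[Values[Take[all, 100]], MatchQ[#, {__String}] &];

(* The first character of the first translation for each language *)
firstChars = StringTake[First[#], 1] & /@ firstTranslations;

WordCloud[firstChars]
```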
Interpreter["type"] | specify a function to interpret natural language
TextCases["text", "type"] | find cases of a given type of object in text
TextStructure["text"] | find the grammatical structure of text
WordTranslation["word", "language"] | translate a word into another language
35.5 Find universities that can be referred to as “U of X”, where X is any letter of the alphabet.
35.6 Find which US state capital names can be interpreted as movie titles (use CommonName to get the string versions of entity names).
35.7 Find cities that can be referred to by permutations of the letters a, i, l and m.
35.8 Make a word cloud of country names in the Wikipedia article on “gunpowder”.
35.9 Find all nouns in “She sells seashells by the sea shore.”
35.10 Use TextCases to find the number of nouns, verbs and adjectives in the first 1000 characters of the Wikipedia article on computers.
35.11 Find the grammatical structure of the first sentence of the Wikipedia article about computers.
35.13 Make a community graph plot of the graph representation of the text structure of the first sentence of the Wikipedia article about language.
35.15 Generate a list of the translations of numbers 2 through 10 into French.
What possible types of interpreters are there?
It’s a long list. Check out the documentation, or evaluate $InterpreterTypes to see the list.
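A quick way to get a feel for the list, assuming $InterpreterTypes evaluates to a list of type names as described above:

```wolfram
Length[$InterpreterTypes]     (* how many interpreter types there are *)
Take[$InterpreterTypes, 10]   (* the first few type names *)
```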
Does Interpreter need a network connection?
In simple cases, such as dates or basic currency, no. But for full natural language input, yes.
When I say “4 dollars”, how does it know if I want US dollars or something else?
It uses what it knows of your geo location to tell what kind of dollars you’re likely to mean.
Can Interpreter deal with arbitrary natural language?
If something can be expressed in the Wolfram Language, then Interpreter should be able to interpret it. Interpreter["SemanticExpression"] takes any input, and tries to understand its meaning so as to get a Wolfram Language expression that captures it. What it’s doing is essentially the first stage of what Wolfram|Alpha does.
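A minimal sketch of trying Interpreter["SemanticExpression"] on a free-form phrase. The phrase is an arbitrary example, and what comes back depends on how the phrase is understood.

```wolfram
(* Try to turn a free-form phrase into a Wolfram Language expression *)
Interpreter["SemanticExpression"]["the population of france"]
```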
Can I add my own interpreters?
Yes. GrammarRules lets you build up your own grammar, making use of whatever existing interpreters you want.
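Here’s a sketch of a tiny grammar that reuses the built-in city interpreter through GrammarToken. It assumes that GrammarApply works on a grammar deployed with CloudDeploy, so you need to be connected to the Wolfram Cloud. The phrase pattern "from ... to ..." is just an illustration.

```wolfram
(* One rule: "from <city> to <city>" gives the pair of city entities *)
grammar = CloudDeploy[
   GrammarRules[{
     FixedOrder["from", a : GrammarToken["City"], "to", b : GrammarToken["City"]] :> {a, b}
   }]];

(* Apply the deployed grammar to a string *)
GrammarApply[grammar, "from nyc to boston"]
```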
Can I find the meaning of a word?
WordDefinition gives dictionary definitions.
Can I find what part of speech a word is?
PartOfSpeech tells you all the parts of speech a word can correspond to. So for “fish” it gives noun and verb. Which of these is correct in a given case depends on how the word is used in a sentence—and that’s what TextStructure figures out.
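A sketch showing these functions on “fish”. The sentence given to TextStructure is a made-up example in which “fish” is used as a verb.

```wolfram
WordDefinition["fish"]   (* dictionary definitions of "fish" *)
PartOfSpeech["fish"]     (* every part of speech "fish" can be *)

(* In an actual sentence, TextStructure works out which part of speech applies *)
TextStructure["We fish in the river."]
```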
Can I translate whole sentences as well as words?
TextTranslation does this for some languages, usually by calling an external service.
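A minimal sketch of translating a sentence, with the target language given by name. Since this typically goes through an external service, it needs a network connection.

```wolfram
(* Translate a whole sentence into French *)
TextTranslation["Where is the train station?", "French"]
```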
What languages does WordTranslation handle?
It can translate lots of words for the few hundred most common languages. It can translate at least a few words for well over a thousand languages. LanguageData gives information on over 10,000 languages.
- TextStructure requires complete grammatical text, but Interpreter uses many different techniques to also work with fragments of text.
- When you use ctrl+= you can resolve ambiguous input interactively. With Interpreter you have to do it programmatically, using the option AmbiguityFunction.
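A sketch of handling ambiguity programmatically, using “Springfield” as an example of a name shared by many cities. Setting AmbiguityFunction to All asks Interpreter for all the candidates rather than a single best guess.

```wolfram
(* By default, Interpreter picks the most likely reading of an ambiguous input *)
Interpreter["City"]["Springfield"]

(* With AmbiguityFunction -> All, it returns all the candidate readings,
   so your program can choose among them *)
Interpreter["City", AmbiguityFunction -> All]["Springfield"]
```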