35 | Natural Language Understanding |
We saw earlier how to use ctrl+= to enter natural language input. Now we’re going to talk about how to set up functions that understand natural language.
Interpreter is the key to much of this. You tell Interpreter what type of thing you want to get, and it will take any string you provide and try to interpret it that way.
Interpret the string "nyc" as a city:
“The big apple” is a nickname for New York City:
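The same "City" interpreter can be tried on a nickname. The lowercase input string here is an assumption:

```wolfram
(* A nickname should be interpreted as a city entity, just like "nyc" *)
nickname = Interpreter["City"]["the big apple"]
```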
Interpret the string "hot pink" as a color:
Interpreter converts natural language to Wolfram Language expressions that you can compute with. Here’s an example involving currency amounts.
Interpret various currency amounts:
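One way to write this, with input strings chosen purely for illustration. An interpreter applied to a list interprets each element:

```wolfram
(* Each string becomes a Quantity in its own currency unit *)
amounts = Interpreter["CurrencyAmount"][
  {"4.25 dollars", "34 russian rubles", "5 euros", "85 cents"}]
```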
Compute the total, doing conversions at current exchange rates:
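A sketch of the computation, repeating the illustrative amounts so the input stands on its own. The numerical result depends on the exchange rates at the time of evaluation:

```wolfram
(* Interpret the amounts, then add them; Total converts to a common currency *)
amounts = Interpreter["CurrencyAmount"][
  {"4.25 dollars", "34 russian rubles", "5 euros", "85 cents"}];
total = Total[amounts]
```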
Here’s another example, involving locations.
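For instance, the name of a landmark can be interpreted as a geo location. This sketch assumes "White House" as the input string, chosen for illustration:

```wolfram
(* Interpret a landmark name as a geographic position *)
landmark = Interpreter["Location"]["White House"]
```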
It can also work from a street address:
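A sketch with an illustrative address (any well-formed street address should work the same way):

```wolfram
(* Interpret a street address as a geographic position *)
address = Interpreter["Location"]["1600 Pennsylvania Avenue, Washington, DC"]
```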
Interpreter handles many hundreds of different types of objects.
Interpret names of universities (which “U of I” is picked depends on geo location):
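A possible input, assuming the interpreter type "University". The first two names are illustrative, and the entity chosen for "U of I" may differ from place to place:

```wolfram
(* Interpret several university names at once *)
universities = Interpreter["University"][{"Harvard", "Stanford", "U of I"}]
```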
Interpret names of chemicals:
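A sketch assuming the interpreter type "Chemical", with illustrative inputs that mix formulas and common names:

```wolfram
(* Formulas and common names are both accepted as input *)
chemicals = Interpreter["Chemical"][{"H2O", "aspirin", "CO2"}]
```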
Interpret names of animals, then get images of them:
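One way this might be written, assuming the interpreter type "Animal" and the entity property "Image". The animal names are illustrative:

```wolfram
(* First interpret the names as entities, then look up an image for each *)
animals = Interpreter["Animal"][{"cheetah", "tiger", "elephant"}];
images = EntityValue[animals, "Image"]
```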
Interpreter interprets whole strings. TextCases, on the other hand, tries to pick out instances of what you request from a string.
Pick out the nouns in a piece of text:
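A sketch using a sample sentence chosen for illustration:

```wolfram
(* Return the substrings of the text that are identified as nouns *)
nouns = TextCases["A sentence is a linguistic construct", "Noun"]
```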
Pick out currency amounts:
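The same function with a different type. The sample text is an assumption:

```wolfram
(* Return the substrings of the text that look like currency amounts *)
prices = TextCases["Choose between $5, €5 and ¥5000", "CurrencyAmount"]
```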
You can use TextCases to pick out particular kinds of things from a piece of text. Here we pick out instances of country names in a Wikipedia article.
Generate a word cloud of country names from the Wikipedia article on the EU:
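A sketch of how this could be done, assuming the article can be fetched with the title "EU". Countries mentioned more often appear larger:

```wolfram
(* Fetch the article text, extract country names, and size them by frequency *)
countries = TextCases[WikipediaData["EU"], "Country"];
cloud = WordCloud[countries]
```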
TextStructure shows you the whole structure of a piece of text.
Find how a sentence of English can be parsed into grammatical units:
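A sketch with an illustrative sentence:

```wolfram
(* Show the nested grammatical units of a complete sentence *)
structure = TextStructure["You can do so much with the Wolfram Language."]
```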
An alternative representation, as a graph:
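The graph form can be requested with the "ConstituentGraphs" property. This sketch reuses the same illustrative sentence:

```wolfram
(* Return a list with one parse graph for each sentence in the text *)
graphs = TextStructure["You can do so much with the Wolfram Language.",
  "ConstituentGraphs"]
```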
WordList[] gives a list of common words. WordList["Noun"], etc. give lists of words that can be used as particular parts of speech.
Give the first 20 in a list of common verbs in English:
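A minimal sketch:

```wolfram
(* Take the first 20 entries from the list of common verbs *)
verbs = Take[WordList["Verb"], 20]
```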
It’s easy to study properties of words. Here are histograms comparing the length distributions of nouns, verbs and adjectives in the list of common words.
Make histograms of the lengths of common nouns, verbs and adjectives:
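One way to write this is to map a pure function over the three part-of-speech names. This is a sketch without any styling options:

```wolfram
(* For each part of speech, histogram the string lengths of its common words *)
histograms = Histogram[StringLength[WordList[#]]] & /@
  {"Noun", "Verb", "Adjective"}
```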
So far we’ve only talked about English. But the Wolfram Language also knows about other languages. For example, WordTranslation gives translations of words.
Translate “hello” into French:
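A minimal sketch. The result is a list, since a word can have more than one translation:

```wolfram
(* Translate an English word into French *)
french = WordTranslation["hello", "French"]
```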
Translate into Korean, then transliterate to the English alphabet:
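A sketch that combines the two steps, assuming Transliterate is applied directly to the list of translations:

```wolfram
(* Translate into Korean, then convert the result to Latin letters *)
korean = Transliterate[WordTranslation["hello", "Korean"]]
```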
If you want to compare lots of different languages, give All as the language for WordTranslation. The result is an association which gives translations for different languages, with the languages listed roughly in order of decreasing worldwide usage.
Give translations of “hello” into the 5 most common languages in the world:
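A sketch that relies on the ordering of the association described above:

```wolfram
(* Keep the first 5 entries of the association of translations *)
top5 = Take[WordTranslation["hello", All], 5]
```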
Let’s take the top 100 languages, and look at the first character in the first translation for “hello” that appears. Here’s a word cloud that shows that among these languages, “h” is the most common letter to start the word for “hello”.
For the top 100 languages, make a word cloud of the first characters in the word for “hello”:
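One way this might be written, in three steps. The intermediate variable names are illustrative, and the sketch assumes every language among the first 100 has at least one translation:

```wolfram
(* Translations for the first 100 languages in the association *)
translations = Take[WordTranslation["hello", All], 100];

(* First character of the first translation for each language *)
firstLetters = Values[StringTake[First[#], 1] & /@ translations];

(* Size each character by how often it occurs *)
cloud = WordCloud[firstLetters]
```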
Interpreter["type"] | specify a function to interpret natural language
TextCases["text","type"] | find cases of a given type of object in text
TextStructure["text"] | find the grammatical structure of text
WordTranslation["word","language"] | translate a word into another language
35.1 Use Interpreter to find the location of the Eiffel Tower.
35.2 Use Interpreter to find a university referred to as “U of T”.
35.3 Use Interpreter to find the chemicals referred to as C2H4, C2H6 and C3H8.
35.5 Find universities that can be referred to as “U of X”, where X is any letter of the alphabet.
35.6 Find which US state capital names can be interpreted as movie titles (use CommonName to get the string versions of entity names).
35.7 Find cities that can be referred to by permutations of the letters a, i, l and m.
35.8 Make a word cloud of country names in the Wikipedia article on “gunpowder”.
35.9 Find all nouns in “She sells seashells by the sea shore.”
35.10 Use TextCases to find the number of nouns, verbs and adjectives in the first 1000 characters of the Wikipedia article on computers.
35.11 Find the grammatical structure of the first sentence of the Wikipedia article about computers.
35.12 Find the 10 most common nouns in ExampleData[{"Text", "AliceInWonderland"}].
35.13 Make a community graph plot of the graph representation of the text structure of the first sentence of the Wikipedia article about language.
35.14 Make a list of numbers of nouns, verbs, adjectives and adverbs found by WordList in English.
35.15 Generate a list of the translations of numbers 2 through 10 into French.
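What possible types of interpreters are there?
It’s a long list. Check the documentation, or evaluate $InterpreterTypes to see the list.
Does Interpreter need a network connection?
In simple cases, such as dates or basic currency, no. But for full natural language input, yes.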
When I say “4 dollars”, how does it know if I want US dollars or something else?
It uses what it knows of your geo location to tell what kind of dollars you’re likely to mean.
Can Interpreter deal with arbitrary natural language?
If something can be expressed in the Wolfram Language, then Interpreter should be able to interpret it. Interpreter["SemanticExpression"] takes any input, and tries to understand its meaning so as to get a Wolfram Language expression that captures it. What it’s doing is essentially the first stage of what Wolfram|Alpha does.
Can I add my own interpreters?
Yes. GrammarRules lets you build up your own grammar, making use of whatever existing interpreters you want.
Can I find the meaning of a word?
WordDefinition gives dictionary definitions.
Can I find what part of speech a word is?
PartOfSpeech tells you all the parts of speech a word can correspond to. So for “fish” it gives noun and verb. Which of these is correct in a given case depends on how the word is used in a sentence, and that’s what TextStructure figures out.
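As a quick check of the “fish” example, a minimal sketch:

```wolfram
(* List every part of speech the word can be used as *)
pos = PartOfSpeech["fish"]
```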
Can I translate whole sentences as well as words?
TextTranslation does this for some languages, usually by calling an external service.
What languages does WordTranslation handle?
It can translate lots of words for the few hundred most common languages. It can translate at least a few words for well over a thousand languages. LanguageData gives information on over 10,000 languages.
- TextStructure requires complete grammatical text, but Interpreter uses many different techniques to also work with fragments of text.
- When you use ctrl+= you can resolve ambiguous input interactively. With Interpreter you have to do it programmatically, using the option AmbiguityFunction.
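As a sketch of the second note, one possible setting is AmbiguityFunction -> All, which asks for every candidate interpretation instead of a single best guess. The input "springfield" is an assumption, chosen because many cities share that name:

```wolfram
(* Ask for all candidate cities rather than one automatically chosen result *)
candidates = Interpreter["City", AmbiguityFunction -> All]["springfield"]
```

Your own code can then choose among the candidates, which is the programmatic counterpart of picking an interpretation interactively.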