11 | Strings and Text
Another thing the Wolfram Language lets you compute with is text. You enter text as a string, indicated by quotes (").
Enter a string:
In[1]:= |
Out[1]= |
Just like when you enter a number, a string on its own comes back unchanged, except that the quotes aren’t visible when the string is displayed. There are many functions that work on strings, such as StringLength, which gives the length of a string.
StringLength counts the number of characters in a string:
In[2]:= |
Out[2]= |
StringReverse reverses the characters in a string:
In[3]:= |
Out[3]= |
ToUpperCase makes all the characters in a string uppercase (capital letters):
In[4]:= |
Out[4]= |
StringTake takes a certain number of characters from the beginning of a string:
In[5]:= |
Out[5]= |
If you take 10 characters, you get a string of length 10:
In[6]:= |
Out[6]= |
In[7]:= |
Out[7]= |
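The functions above can be tried together. This is a minimal sketch, assuming the sample string "hello world" (chosen here for illustration, not taken from the original cells):

```wolfram
s = "hello world";              (* assumed sample string *)
StringLength[s]                 (* 11 *)
StringReverse[s]                (* "dlrow olleh" *)
ToUpperCase[s]                  (* "HELLO WORLD" *)
StringTake[s, 5]                (* "hello" *)
StringLength[StringTake[s, 5]]  (* 5 *)
```

In a notebook, the string results are displayed without the quotes shown in the comments.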
You can make lists of strings, then apply functions to them.
A list of strings:
In[8]:= |
Out[8]= |
In[9]:= |
Out[9]= |
StringJoin joins the strings in a list:
In[10]:= |
Out[10]= |
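As a sketch of working with lists of strings, assuming a sample list of three fruit names (the list is an illustration, not necessarily the one used in the original cells):

```wolfram
fruits = {"apple", "banana", "strawberry"};  (* assumed sample list *)
StringTake[fruits, 2]    (* first 2 characters of each: {"ap", "ba", "st"} *)
StringLength[fruits]     (* length of each string: {5, 6, 10} *)
StringJoin[fruits]       (* one joined string: "applebananastrawberry" *)
```

StringTake and StringLength both operate on each string in the list separately, while StringJoin combines the whole list into a single string.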
Sometimes it’s useful to turn strings into lists of their constituent characters. Each character is actually a string itself, of length 1.
Characters breaks a string into a list of its characters:
In[11]:= |
Out[11]= |
Once you’ve broken a string into a list of characters, you can use all the usual list functions on it.
Sort the characters in a string:
In[12]:= |
Out[12]= |
The invisible elements at the beginning of the list are space characters. If you want to see strings in the form you’d input them, complete with "...", use InputForm.
InputForm shows strings as you would input them, including quotes:
In[13]:= |
Out[13]= |
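The steps above can be sketched as follows, assuming the sample string "a string" (an illustrative choice):

```wolfram
chars = Characters["a string"]  (* {"a", " ", "s", "t", "r", "i", "n", "g"} *)
sorted = Sort[chars]            (* the space sorts first: {" ", "a", "g", "i", "n", "r", "s", "t"} *)
InputForm[sorted]               (* displays each element with its quotes, so the space is visible *)
```

Without InputForm, the space at the start of the sorted list is easy to miss, because it displays as an empty-looking element.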
Functions like StringJoin and Characters work on strings of any kind; it doesn’t matter if they’re meaningful text or not. There are other functions, like TextWords, that specifically work on meaningful text, written, say, in English.
TextWords gives a list of the words in a string of text:
In[14]:= |
Out[14]= |
This gives the length of each word:
In[15]:= |
Out[15]= |
In[16]:= |
Out[16]= |
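A minimal sketch of splitting text into words, assuming a short sample sentence (not taken from the original cells):

```wolfram
text = "The quick brown fox jumps.";  (* assumed sample text *)
words = TextWords[text]               (* {"The", "quick", "brown", "fox", "jumps"} *)
StringLength[words]                   (* length of each word: {3, 5, 5, 3, 5} *)
```

Note that TextWords drops the punctuation, so the final period is not part of the last word.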
There are lots of ways to get text into the Wolfram Language. One example is the WikipediaData function, which gets the current text of Wikipedia articles.
Get the first 100 characters of the Wikipedia article about “computers”:
In[17]:= |
Out[17]= |
A convenient way to get a sense of what’s in a piece of text is to create a word cloud. The function WordCloud does this.
Create a word cloud for the Wikipedia article on “computers”:
In[18]:= |
Out[18]= |
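The two Wikipedia examples above might be written like this. It is a sketch that needs an internet connection, and the text returned depends on the current state of the article, so no output is shown:

```wolfram
article = WikipediaData["computers"];  (* current plain text of the article *)
intro = StringTake[article, 100]       (* the first 100 characters *)
WordCloud[article]                     (* word sizes reflect word frequencies *)
```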
The Wolfram Language has lots of built-in knowledge about words that appear in English and other languages. WordList gives lists of words.
Get the first 20 words from a list of common English words:
In[19]:= |
Out[19]= |
In[20]:= |
Out[20]= |
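One way to get the first 20 words is to combine WordList with Take. This is a sketch; the exact words returned depend on the built-in word list, so they are not reproduced here:

```wolfram
common = Take[WordList[], 20]  (* first 20 entries of the common-word list *)
StringLength[common]           (* the length of each of those words *)
```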
Strings don’t have to contain text. In a juxtaposition of ancient and modern, we can for example generate Roman numerals as strings.
Generate the Roman numeral string for 1988:
In[21]:= |
Out[21]= |
Make a table of the Roman numerals for numbers up to 20:
In[22]:= |
Out[22]= |
As with everything, we can do computations on these strings. For example, we can plot the lengths of successive Roman numerals.
In[23]:= |
Out[23]= |
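The Roman numeral examples above can be sketched as follows. The plot range of 100 and the use of ListLinePlot are assumptions, since the original cell did not survive:

```wolfram
RomanNumeral[1988]                   (* "MCMLXXXVIII" *)
numerals = Table[RomanNumeral[n], {n, 20}]
                                     (* {"I", "II", "III", "IV", ..., "XIX", "XX"} *)
lengths = Table[StringLength[RomanNumeral[n]], {n, 100}];
ListLinePlot[lengths]                (* lengths of successive Roman numerals *)
```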
IntegerName gives the English name of an integer.
Generate a string giving the name of the integer 56:
In[24]:= |
Out[24]= |
In[25]:= |
Out[25]= |
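A short sketch of IntegerName. The second line, which measures the length of the name, is an illustrative addition rather than a reconstruction of the original cell:

```wolfram
name = IntegerName[56]  (* the name "fifty-six", as a string *)
StringLength[name]      (* 9, counting the hyphen *)
```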
Alphabet gives the alphabet:
In[26]:= |
Out[26]= |
LetterNumber tells you where in the alphabet a letter appears:
In[27]:= |
Out[27]= |
FromLetterNumber does the opposite:
In[28]:= |
Out[28]= |
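The three alphabet functions fit together as in this sketch, which assumes the sample letter "q" (an illustrative choice):

```wolfram
letters = Alphabet[];   (* {"a", "b", "c", ..., "z"} *)
Length[letters]         (* 26 *)
LetterNumber["q"]       (* 17 *)
FromLetterNumber[17]    (* "q" *)
```

LetterNumber and FromLetterNumber are inverses of each other for positions 1 through 26.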
Alphabet knows about non-English alphabets too:
In[29]:= |
Out[29]= |
Transliterate converts to (approximately) equivalent English letters:
In[30]:= |
Out[30]= |
This transliterates the word “wolfram” into the Russian alphabet:
In[31]:= |
Out[31]= |
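A sketch of the non-English examples, assuming Russian as the alphabet in the first two lines (the text above names Russian only for the last one). Outputs are omitted because they consist of Cyrillic letters and approximate transliterations:

```wolfram
russian = Alphabet["Russian"];                        (* letters of the Russian alphabet *)
Transliterate[russian]                                (* approximate English equivalents *)
cyrillicWord = Transliterate["wolfram", "Russian"]    (* "wolfram" written in Cyrillic letters *)
```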
If you want to, you can also turn text into images, which you can then manipulate using image processing. The function Rasterize makes a raster, or bitmap, of something.
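As a sketch of this idea, the following rasterizes some styled text and then applies an image-processing function to it. The text "ABC", the size 100, and the choice of EdgeDetect are assumptions made for illustration:

```wolfram
img = Rasterize[Style["ABC", 100]];  (* bitmap image of size-100 text *)
EdgeDetect[img]                      (* outlines of the rasterized letters *)
```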
In[32]:= |
Out[32]= |
In[33]:= |
Out[33]= |
"string" | a string
StringLength["string"] | length of a string
StringReverse["string"] | reverse a string
StringTake["string",4] | take characters at the beginning of a string
StringJoin["string","string"] | join strings together
StringJoin[{"string","string"}] | join a list of strings
ToUpperCase["string"] | convert characters to uppercase
Characters["string"] | convert a string to a list of characters
TextWords["string"] | list of words from a string
TextSentences["string"] | list of sentences
WikipediaData["topic"] | Wikipedia article about a topic
WordCloud["text"] | word cloud based on word frequencies
WordList[] | list of common words in English
Alphabet[] | list of letters of the alphabet
LetterNumber["c"] | where a letter appears in the alphabet
FromLetterNumber[n] | letter appearing at a position in the alphabet
Transliterate["text"] | transliterate text in any language into English
Transliterate["text","alphabet"] | transliterate text into other alphabets
RomanNumeral[n] | convert a number to its Roman numeral string
IntegerName[n] | convert a number to its English name string
InputForm["string"] | show a string with quotes
Rasterize["string"] | make a bitmap image
11.2 Make a single string of the whole alphabet, in uppercase.
11.3 Generate a string of the alphabet in reverse order.
11.11 Make a string from the first letters of all sentences in the Wikipedia article about computers.
11.15 Use StringJoin and Characters to make a word cloud of all letters in the words from WordList[].
11.17 Find the Roman numerals for the year 1959.
11.18 Find the maximum string length of any Roman-numeral year from 1 to 2020.
11.19 Make a word cloud from the first characters of the Roman numerals up to 100.
11.21 Generate the uppercase Greek alphabet.
11.24 Make a list of 100 random 5-letter strings.
11.26 Get the Arabic alphabet and transliterate it into English.
11.28 Use Manipulate to make an interactive selector of size-100 characters from the alphabet, controlled by a slider.
11.29 Use Manipulate to make an interactive selector of black-on-white outlines of rasterized size-100 characters from the alphabet, controlled by a menu.
11.30 Use Manipulate to create a “vision simulator” that blurs a size-200 letter “A” by an amount from 0 to 50.
+11.1 Generate a string of the alphabet followed by the alphabet written in reverse.
+11.2 Make a column of a string of the alphabet and its reverse.
+11.4 Join together without spaces, etc. the words in the first sentence in the Wikipedia article for “strings”.
+11.5 Find the length of the longest word in the Wikipedia article about computers.
+11.6 Plot the lengths of Roman numerals for numbers up to 2000.
+11.7 Generate a string by joining the Roman numerals up to 100.
+11.8 Make a line plot of the successive letter numbers for the concatenation of all Roman numerals up to 30.
+11.9 Find the maximum string length of the name of any integer up to 1000.
+11.10 Make a list of uppercase size-20 letters of the alphabet in random colors.
+11.11 Make a list of 100 random 5-letter strings with the Russian alphabet.
+11.13 Add together white-on-black size-200 letters A and B.
What is the difference between "x" and x?
"x" is a string; x is a Wolfram Language symbol, just like Plus or Max, that can be defined to actually do computations. We’ll talk much more about symbols later.
How do I enter characters that aren’t on my keyboard?
You can use whatever methods your computer provides, or you can do it directly with the Wolfram Language using constructs such as \[Alpha].
How do I put quotes (") inside a string?
Use \" (and if you want to put \" literally in the string, use \\\"). (You’ll use a lot of backslashes if you want to put \\\" in: \\\\\\\".)
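A small sketch of a string containing quote characters, using an assumed sample sentence:

```wolfram
quoted = "He said \"hi\"";  (* each \" is a single quote character inside the string *)
StringLength[quoted]        (* 12: the backslashes are not counted *)
InputForm[quoted]           (* shows the string as entered, with the \" escapes *)
```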
How are the colors of elements in word clouds determined?
By default it’s random within a certain color palette. You can specify it if you want to.
How come the word cloud shows “s” as the most common letter?
Because it is the most common first letter for common words in English. If you look at all letters, the most common is “e”.
What about letters that aren’t English? How are they numbered?
LetterNumber["α", "Greek"] gives numbering in the Greek alphabet. All characters are assigned a character code. You can find it using ToCharacterCode.
What alphabets does the Wolfram Language know about?
Basically all the ones that are used today. Try “Greek” or “Arabic”, or the name of a language. Note that when a language uses accented characters, it’s sometimes tricky to decide what’s “in” the alphabet, and what’s just derived from it.
Can I translate words instead of just transliterating their letters?
Yes. Use WordTranslation. See Section 35.
- RandomWord[10] gives 10 random words. How many of them do you know?
- StringTake["string", -2] takes 2 characters from the end of the string.
- Every character, whether “a”, “α” or “狼” is represented by a Unicode character code, found with ToCharacterCode. You can explore “Unicode space” with FromCharacterCode.
- If you get a different result from WikipediaData, that’s because Wikipedia has been changed.
- WordCloud automatically removes “uninteresting” words in text, like “the”, “and”, etc.
- If you can’t figure out the name of an alphabet or language, use ctrl+= (as described in Section 16) to give it in natural language form.
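Two of the notes above, on negative arguments to StringTake and on character codes, can be sketched as follows. The sample characters are illustrative choices:

```wolfram
StringTake["string", -2]    (* last 2 characters: "ng" *)
ToCharacterCode["a"]        (* {97} *)
ToCharacterCode["α"]        (* {945} *)
FromCharacterCode[945]      (* "α" *)
LetterNumber["α", "Greek"]  (* 1: alpha is the first Greek letter *)
```

ToCharacterCode returns a list because it gives one code for each character in the string.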