11 | Strings and Text
Another thing the Wolfram Language lets you compute with is text. You enter text as a string, indicated by quotes (").
Enter a string:
In[1]:=
Out[1]=
Just like when you enter a number, a string on its own comes back unchanged, except that the quotes aren’t visible when the string is displayed. There are many functions that work on strings, such as StringLength, which gives the length of a string.
StringLength counts the number of characters in a string:
In[2]:=
Out[2]=
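The input for this step is not shown here, so the following is only a minimal sketch; the string "hello" is an illustrative choice, not necessarily the one used originally.

```wolfram
(* A string on its own evaluates to itself; the quotes are hidden in the output *)
"hello"

(* StringLength counts characters: h, e, l, l, o *)
StringLength["hello"]   (* 5 *)
```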
StringReverse reverses the characters in a string:
In[3]:=
Out[3]=
ToUpperCase makes all the characters in a string uppercase (capital letters):
In[4]:=
Out[4]=
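As a hedged illustration of these two functions, using an example string chosen here for the purpose:

```wolfram
(* Reverse the order of the characters *)
StringReverse["hello"]   (* "olleh" *)

(* Convert every letter to its uppercase form *)
ToUpperCase["hello"]     (* "HELLO" *)
```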
StringTake takes a certain number of characters from the beginning of a string:
In[5]:=
Out[5]=
If you take 10 characters, you get a string of length 10:
In[6]:=
Out[6]=
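A small sketch of StringTake, assuming an arbitrary example sentence (the original inputs are not available here):

```wolfram
(* Take the first 10 characters; spaces count as characters too *)
StringTake["this is about strings", 10]   (* "this is ab" *)

(* The result always has the requested length *)
StringLength[StringTake["this is about strings", 10]]   (* 10 *)
```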
StringJoin joins strings (don’t forget spaces if you want to separate words):
In[7]:=
Out[7]=
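For instance, with words picked purely for illustration, the space has to be supplied as its own string:

```wolfram
(* Without the " " argument the words would run together *)
StringJoin["Hello", " ", "there", "!"]   (* "Hello there!" *)

StringJoin["Hello", "there"]             (* "Hellothere" *)
```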
A list of strings:
In[8]:=
Out[8]=
Get the first two characters from each string:
In[9]:=
Out[9]=
StringJoin joins the strings in a list:
In[10]:=
Out[10]=
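The three steps above might look like the following sketch; the list of fruit names is an assumption made for the example.

```wolfram
(* A list of strings *)
{"apple", "banana", "strawberry"}

(* StringTake works on each string in a list *)
StringTake[{"apple", "banana", "strawberry"}, 2]   (* {"ap", "ba", "st"} *)

(* Given a list, StringJoin concatenates all of its elements *)
StringJoin[{"apple", "banana", "strawberry"}]      (* "applebananastrawberry" *)
```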
Sometimes it’s useful to turn strings into lists of their constituent characters. Each character is actually a string itself, of length 1.
Characters breaks a string into a list of its characters:
In[11]:=
Out[11]=
Once you’ve broken a string into a list of characters, you can use all the usual list functions on it.
Sort the characters in a string:
In[12]:=
Out[12]=
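A minimal sketch of breaking a string apart and then applying a list function, with an example string chosen for illustration:

```wolfram
(* Each element of the result is itself a string of length 1 *)
Characters["cat"]   (* {"c", "a", "t"} *)

(* Once it is a list, ordinary list functions such as Sort apply *)
Sort[Characters["a string of text"]]
```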
The invisible elements at the beginning of the list are space characters. If you want to see strings in the form you’d input them, complete with "...", use InputForm.
InputForm shows strings as you would input them, including quotes:
In[13]:=
Out[13]=
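To make the space characters visible, one could wrap the same kind of expression in InputForm. This is a sketch with an illustrative string:

```wolfram
(* InputForm prints each character with its quotes, so spaces appear as " " *)
InputForm[Sort[Characters["a string of text"]]]
```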
Functions like StringJoin and Characters work on strings of any kind; it doesn’t matter if they’re meaningful text or not. There are other functions, like TextWords, that specifically work on meaningful text, written, say, in English.
TextWords gives a list of the words in a string of text:
In[14]:=
Out[14]=
This gives the length of each word:
In[15]:=
Out[15]=
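As a sketch of these two steps, assuming a short piece of English text made up for the example:

```wolfram
(* Split text into words; punctuation is not included in the words *)
TextWords["This is a sentence. Sentences are made of words."]

(* StringLength applies to each word in the resulting list *)
StringLength[TextWords["This is a sentence. Sentences are made of words."]]
```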
TextSentences breaks a text string into a list of sentences:
In[16]:=
Out[16]=
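A hedged example of sentence splitting, using illustrative text:

```wolfram
(* Each sentence becomes one string in the list, keeping its final period *)
TextSentences["This is a sentence. Sentences are made of words."]
(* {"This is a sentence.", "Sentences are made of words."} *)
```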
There are lots of ways to get text into the Wolfram Language. One example is the WikipediaData function, which gets the current text of Wikipedia articles.
Get the first 100 characters of the Wikipedia article about “computers”:
In[17]:=
Out[17]=
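One plausible way to write this step, assuming the article is looked up under the name "computers" as in the caption (it needs an internet connection, and the text returned depends on the current state of Wikipedia):

```wolfram
(* Fetch the article text, then keep only its first 100 characters *)
StringTake[WikipediaData["computers"], 100]
```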
A convenient way to get a sense of what’s in a piece of text is to create a word cloud. The function WordCloud does this.
Create a word cloud for the Wikipedia article on “computers”:
In[18]:=
Out[18]=
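A sketch of the word cloud step, again assuming the article name "computers"; the layout and colors will vary from run to run.

```wolfram
(* Words that occur more often in the article are drawn larger *)
WordCloud[WikipediaData["computers"]]
```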
The Wolfram Language has lots of built-in knowledge about words that appear in English and other languages. WordList gives lists of words.
Get the first 20 words from a list of common English words:
In[19]:=
Out[19]=
Make a word cloud from the first letters of all the words:
In[20]:=
Out[20]=
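The two steps above could be sketched as follows; combining StringTake with WordCloud is an assumption about how the original input was written.

```wolfram
(* The first 20 entries of the built-in list of common English words *)
Take[WordList[], 20]

(* Take the first letter of every word, then size each letter by how often it occurs *)
WordCloud[StringTake[WordList[], 1]]
```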
Strings don’t have to contain text. In a juxtaposition of ancient and modern, we can for example generate Roman numerals as strings.
Generate the Roman numeral string for 1988:
In[21]:=
Out[21]=
Make a table of the Roman numerals for numbers up to 20:
In[22]:=
Out[22]=
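A minimal sketch of both steps; 1988 decomposes as M (1000) + CM (900) + LXXX (80) + VIII (8).

```wolfram
(* Roman numeral for a single number *)
RomanNumeral[1988]   (* "MCMLXXXVIII" *)

(* Roman numerals for 1 through 20 *)
Table[RomanNumeral[n], {n, 20}]
```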
As with everything, we can do computations on these strings. For example, we can plot the lengths of successive Roman numerals.
Plot the lengths of the Roman numerals for numbers up to 100:
In[23]:=
Out[23]=
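One way this plot might be produced; the choice of ListLinePlot is an assumption, and ListPlot would work equally well.

```wolfram
(* Build the list of string lengths for 1..100, then plot it *)
ListLinePlot[Table[StringLength[RomanNumeral[n]], {n, 100}]]
```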
IntegerName gives the English name of an integer.
Generate a string giving the name of the integer 56:
In[24]:=
Out[24]=
Here’s a plot of the lengths of integer names in English:
In[25]:=
Out[25]=
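A hedged sketch of these two steps. The upper limit of 200 in the plot is an arbitrary choice for the example, since the original range is not shown.

```wolfram
(* The English name of 56, as a string *)
IntegerName[56]

(* Lengths of the names of the integers 1..200 (range assumed for illustration) *)
ListLinePlot[Table[StringLength[IntegerName[n]], {n, 200}]]
```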
There are various ways to turn letters into numbers (and vice versa).
Alphabet gives the alphabet:
In[26]:=
Out[26]=
LetterNumber tells you where in the alphabet a letter appears:
In[27]:=
Out[27]=
FromLetterNumber does the opposite:
In[28]:=
Out[28]=
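As an illustration of moving between letters and numbers, with the letter and position chosen arbitrarily:

```wolfram
(* The 26 lowercase letters of the English alphabet *)
Alphabet[]

(* "q" is the 17th letter *)
LetterNumber["q"]      (* 17 *)

(* The 10th letter is "j" *)
FromLetterNumber[10]   (* "j" *)
```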
Alphabet knows about non-English alphabets too:
In[29]:=
Out[29]=
In[30]:=
Out[30]=
This transliterates the word “wolfram” into the Russian alphabet:
In[31]:=
Out[31]=
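A sketch of the non-English examples. Which alphabets the original cells used is an assumption here; "Russian" and "Greek" are used as representative names.

```wolfram
(* Alphabets are requested by language name *)
Alphabet["Russian"]
Alphabet["Greek"]

(* Transliterate Latin letters into the Russian (Cyrillic) alphabet *)
Transliterate["wolfram", "Russian"]
```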
The characters in strings can be anything you can type on your computer, including for example emoji.
Reverse the characters in a string made of emoji:
In[32]:=
Out[32]=
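For example, with three emoji picked for illustration (how they render depends on your operating system and fonts):

```wolfram
(* Each emoji is treated as a single character, so the order is simply reversed *)
StringReverse["🐱🐶🐭"]
```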
If you want to, you can also turn text into images, which you can then manipulate using image processing. The function Rasterize makes a raster, or bitmap, of something.
Generate an image of a piece of text:
In[33]:=
Out[33]=
Do image processing on it:
In[34]:=
Out[34]=
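The two steps above can be sketched as follows. The text "ABC", the size 100, and the use of EdgeDetect as the image-processing step are all assumptions made for the example.

```wolfram
(* Style makes the text large; Rasterize turns it into a bitmap image *)
Rasterize[Style["ABC", 100]]

(* Any image-processing function can then be applied, for example edge detection *)
EdgeDetect[Rasterize[Style["ABC", 100]]]
```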
"string" | a string
StringLength["string"] | length of a string
StringReverse["string"] | reverse a string
StringTake["string",4] | take characters at the beginning of a string
StringJoin["string","string"] | join strings together
StringJoin[{"string","string"}] | join a list of strings
ToUpperCase["string"] | convert characters to uppercase
Characters["string"] | convert a string to a list of characters
TextWords["string"] | list of words from a string
TextSentences["string"] | list of sentences
WikipediaData["topic"] | Wikipedia article about a topic
WordCloud["text"] | word cloud based on word frequencies
WordList[] | list of common words in English
Alphabet[] | list of letters of the alphabet
LetterNumber["c"] | where a letter appears in the alphabet
FromLetterNumber[n] | letter appearing at a position in the alphabet
Transliterate["text"] | transliterate text in any language into English
Transliterate["text","alphabet"] | transliterate text into other alphabets
RomanNumeral[n] | convert a number to its Roman numeral string
IntegerName[n] | convert a number to its English name string
InputForm["string"] | show a string with quotes
Rasterize["string"] | make a bitmap image
11.2 Make a single string of the whole alphabet, in uppercase.
11.3 Generate a string of the alphabet in reverse order.
11.7 Make a bar chart of the lengths of the words in “A long time ago, in a galaxy far, far away”.
11.8 Find the string length of the Wikipedia article for “computer”.
11.9 Find how many words are in the Wikipedia article for “computer”.
11.10 Find the first sentence in the Wikipedia article about “strings”.
11.11 Make a string from the first letters of all sentences in the Wikipedia article about computers.
11.15 Use StringJoin and Characters to make a word cloud of all letters in the words from WordList[].
11.17 Find the Roman numerals for the year 1959.
11.18 Find the maximum string length of any Roman-numeral year from 1 to 2020.
11.19 Make a word cloud from the first characters of the Roman numerals up to 100.
11.21 Generate the uppercase Greek alphabet.
11.22 Make a bar chart of the letter numbers in “wolfram”.
11.24 Make a list of 100 random 5-letter strings.
11.25 Transliterate “wolfram” into Greek.
11.26 Make a string of 10 wolf, ram emoji 🐺🐏🐺🐏....
11.27 Get the Arabic alphabet and transliterate it into English.
11.28 Make a white-on-black size-200 letter “A”.
11.29 Use Manipulate to make an interactive selector of size-100 characters from the alphabet, controlled by a slider.
11.30 Use Manipulate to make an interactive selector of black-on-white outlines of rasterized size-100 characters from the alphabet, controlled by a menu.
11.31 Use Manipulate to create a “vision simulator” that blurs a size-200 letter “A” by an amount from 0 to 50.
+11.1 Generate a string of the alphabet followed by the alphabet written in reverse.
+11.2 Make a column of a string of the alphabet and its reverse.
+11.3 Find how many sentences are in the Wikipedia article for “computer”.
+11.4 Join together without spaces, etc. the words in the first sentence in the Wikipedia article for “strings”.
+11.5 Find the length of the longest word in the Wikipedia article about computers.
+11.6 Plot the lengths of Roman numerals for numbers up to 2000.
+11.7 Generate a string by joining the Roman numerals up to 100.
+11.8 Make a line plot of the successive letter numbers for the concatenation of all Roman numerals up to 30.
+11.9 Find the maximum string length of the name of any integer up to 1000.
+11.10 Make a list of uppercase size-20 letters of the alphabet in random colors.
+11.11 Make a list of 100 random 5-letter strings with the Russian alphabet.
+11.13 Add together white-on-black size-200 letters A and B.
What is the difference between "x" and x?
"x" is a string; x is a Wolfram Language symbol, just like Plus or Max, that can be defined to actually do computations. We’ll talk much more about symbols later.
How do I enter characters that aren’t on my keyboard?
You can use whatever methods your computer provides, or you can do it directly with the Wolfram Language using constructs such as \[Alpha].
How do I put quotes (") inside a string?
Use \" (and if you want to put \" literally in the string, use \\\"). (You’ll use a lot of backslashes if you want to put \\\" in: \\\\\\\".)
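A short sketch of how the escapes count as characters; the example strings are chosen for illustration.

```wolfram
(* \" inside a string stands for a single quote character *)
"She said \"hi\""       (* displays as: She said "hi" *)

StringLength["\""]      (* 1: just the quote character *)

(* \\ is a literal backslash, so \\\" is a backslash followed by a quote *)
StringLength["\\\""]    (* 2 *)
```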
How are the colors of elements in word clouds determined?
By default it’s random within a certain color palette. You can specify it if you want to.
How come the word cloud shows “s” as the most common letter?
Because it is the most common first letter for common words in English. If you look at all letters, the most common is “e”.
What about letters that aren’t English? How are they numbered?
LetterNumber["α", "Greek"] gives numbering in the Greek alphabet. All characters are assigned a character code. You can find it using ToCharacterCode.
What alphabets does the Wolfram Language know about?
Basically all the ones that are used today. Try “Greek” or “Arabic”, or the name of a language. Note that when a language uses accented characters, it’s sometimes tricky to decide what’s “in” the alphabet, and what’s just derived from it.
Can I translate words instead of just transliterating their letters?
Yes. Use WordTranslation. See Section 35.
Can I get lists of common words in languages other than English?
Yes. Use WordList[Language -> "Spanish"], etc.
- RandomWord[10] gives 10 random words. How many of them do you know?
- StringTake["string", -2] takes 2 characters from the end of the string.
- Every character, whether “a”, “α”, “狼” or “🐼”, is represented by a Unicode character code, found with ToCharacterCode. You can explore “Unicode space” with FromCharacterCode.
- Characters can look very different when they’re shown in different fonts; emoji can look very different between different operating systems.
- If you get a different result from WikipediaData, that’s because Wikipedia has been changed.
- WordCloud automatically removes “uninteresting” words in text, like “the”, “and”, etc.
- If you can’t figure out the name of an alphabet or language, use ctrl+= (as described in Section 16) to give it in natural language form.
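A few of the notes above can be tried directly. This is a minimal sketch, with the particular characters and code chosen as examples:

```wolfram
(* A negative count takes characters from the end of the string *)
StringTake["string", -2]   (* "ng" *)

(* Character codes: "a" is code 97 *)
ToCharacterCode["a"]       (* {97} *)

(* Going the other way: code 945 is the Greek letter alpha *)
FromCharacterCode[945]     (* "α" *)

(* Ten random words; the result differs on every evaluation *)
RandomWord[10]
```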