11 Strings and Text

Another thing the Wolfram Language lets you compute with is text. You enter text as a string, indicated by quotes (").
Enter a string:
In[1]:=
Out[1]=
Just like when you enter a number, a string on its own comes back unchanged, except that the quotes aren’t visible when the string is displayed. There are many functions that work on strings, such as StringLength, which gives the length of a string.
StringLength counts the number of characters in a string:
In[2]:=
Out[2]=
StringReverse reverses the characters in a string:
In[3]:=
Out[3]=
ToUpperCase makes all the characters in a string uppercase (capital letters):
In[4]:=
Out[4]=
StringTake takes a certain number of characters from the beginning of a string:
In[5]:=
Out[5]=
If you take 10 characters, you get a string of length 10:
In[6]:=
Out[6]=
StringJoin joins strings (don’t forget spaces if you want to separate words):
In[7]:=
Out[7]=
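Here’s a small sketch that puts these basic string functions together. The sample strings are made up for illustration; the comments note what each call should give.

```wolfram
s = "A string of text";            (* hypothetical sample string *)

StringLength[s]                    (* 16 *)
StringReverse[s]                   (* "txet fo gnirts A" *)
ToUpperCase[s]                     (* "A STRING OF TEXT" *)
StringTake[s, 8]                   (* "A string" *)
StringJoin["Hello", " ", "there"]  (* "Hello there" *)
```

Note the explicit " " in the last line: StringJoin doesn’t add spaces for you.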
You can make lists of strings, then apply functions to them.
A list of strings:
In[8]:=
Out[8]=
Get the first two characters from each string:
In[9]:=
Out[9]=
StringJoin joins the strings in a list:
In[10]:=
Out[10]=
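As a rough illustration of working with lists of strings, the sketch below uses a made-up list of fruit names (not the list from the examples above). String functions like StringTake and StringLength act on each string in the list.

```wolfram
fruits = {"apple", "banana", "strawberry"};  (* hypothetical list of strings *)

StringTake[fruits, 2]   (* {"ap", "ba", "st"} *)
StringLength[fruits]    (* {5, 6, 10} *)
StringJoin[fruits]      (* "applebananastrawberry" *)
```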
Sometimes it’s useful to turn strings into lists of their constituent characters. Each character is actually a string itself, of length 1.
Characters breaks a string into a list of its characters:
In[11]:=
Out[11]=
Once you’ve broken a string into a list of characters, you can use all the usual list functions on it.
Sort the characters in a string:
In[12]:=
Out[12]=
The invisible elements at the beginning of the list are space characters. If you want to see strings in the form you’d input them, complete with "...", use InputForm.
InputForm shows strings as you would input them, including quotes:
In[13]:=
Out[13]=
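A minimal sketch of breaking a string into characters, sorting them and showing the result with quotes. The sample string "hi there" is just an assumption for illustration.

```wolfram
chars = Characters["hi there"];   (* hypothetical sample string, split into 8 one-character strings *)

Sort[chars]              (* the space sorts to the front, so the list appears to start with a gap *)
InputForm[Sort[chars]]   (* {" ", "e", "e", "h", "h", "i", "r", "t"} *)
```

Without InputForm the space at the front of the sorted list is easy to miss; with it, every element is shown in quotes.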
Functions like StringJoin and Characters work on strings of any kind; it doesn’t matter if they’re meaningful text or not. There are other functions, like TextWords, that specifically work on meaningful text, written, say, in English.
TextWords gives a list of the words in a string of text:
In[14]:=
Out[14]=
This gives the length of each word:
In[15]:=
Out[15]=
TextSentences breaks a text string into a list of sentences:
In[16]:=
Out[16]=
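Here’s a short sketch of the text functions on a made-up piece of English text. The variable name text and the sample sentences are assumptions for illustration.

```wolfram
text = "This is a sentence. Sentences are made of words.";  (* hypothetical sample text *)

TextWords[text]                 (* {"This", "is", "a", "sentence", "Sentences", "are", "made", "of", "words"} *)
StringLength[TextWords[text]]   (* {4, 2, 1, 8, 9, 3, 4, 2, 5} *)
TextSentences[text]             (* {"This is a sentence.", "Sentences are made of words."} *)
```

Notice that TextWords drops the punctuation, while TextSentences keeps it.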
There are lots of ways to get text into the Wolfram Language. One example is the WikipediaData function, which gets the current text of Wikipedia articles.
Get the first 100 characters of the Wikipedia article about “computers”:
In[17]:=
Out[17]=
A convenient way to get a sense of what’s in a piece of text is to create a word cloud. The function WordCloud does this.
Create a word cloud for the Wikipedia article on “computers”:
In[18]:=
Out[18]=
Not surprisingly, “computer” and “computers” are the most common words in the article.
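One way the two Wikipedia examples might be written is sketched below. The variable name article is an assumption; the code needs an internet connection, and the text you get depends on the current state of Wikipedia.

```wolfram
article = WikipediaData["computers"];  (* fetch the article text once and reuse it *)

StringTake[article, 100]   (* the first 100 characters of the article *)
WordCloud[article]         (* word cloud based on word frequencies *)
```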
The Wolfram Language has lots of built-in knowledge about words that appear in English and other languages. WordList gives lists of words.
Get the first 20 words from a list of common English words:
In[19]:=
Out[19]=
Make a word cloud from the first letters of all the words:
In[20]:=
Out[20]=
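A possible way to write these two steps is sketched below; the variable name words is an assumption. The first use of WordList may need to download data.

```wolfram
words = WordList[];                (* list of common English words *)

Take[words, 20]                    (* the first 20 words in the list *)
WordCloud[StringTake[words, 1]]    (* word cloud of the first letter of every word *)
```

StringTake[words, 1] takes the first character of each word, so the word cloud is built from single letters rather than whole words.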
Strings don’t have to contain text. In a juxtaposition of ancient and modern, we can for example generate Roman numerals as strings.
Generate the Roman numeral string for 1988:
In[21]:=
Out[21]=
Make a table of the Roman numerals for numbers up to 20:
In[22]:=
Out[22]=
As with everything, we can do computations on these strings. For example, we can plot the lengths of successive Roman numerals.
Plot the lengths of the Roman numerals for numbers up to 100:
In[23]:=
Out[23]=
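The Roman numeral computations might look like the sketch below. The choice of ListLinePlot and the variable name lengths are assumptions; any list plotting function would do.

```wolfram
RomanNumeral[1988]                  (* "MCMLXXXVIII" *)

Table[RomanNumeral[n], {n, 20}]     (* Roman numerals for 1 through 20 *)

lengths = Table[StringLength[RomanNumeral[n]], {n, 100}];  (* length of each numeral up to 100 *)
ListLinePlot[lengths]
```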
IntegerName gives the English name of an integer.
Generate a string giving the name of the integer 56:
In[24]:=
Out[24]=
Here’s a plot of the lengths of integer names in English:
In[25]:=
Out[25]=
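A sketch of the same idea for integer names follows. The range 1 to 100 and the use of ListLinePlot are assumptions for illustration.

```wolfram
IntegerName[56]                   (* the English name of 56, a 9-character string *)
StringLength[IntegerName[56]]     (* 9 *)

ListLinePlot[Table[StringLength[IntegerName[n]], {n, 100}]]  (* name lengths for 1 through 100 *)
```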
There are various ways to turn letters into numbers (and vice versa).
Alphabet gives the alphabet:
In[26]:=
Out[26]=
LetterNumber tells you where in the alphabet a letter appears:
In[27]:=
Out[27]=
FromLetterNumber does the opposite:
In[28]:=
Out[28]=
Alphabet knows about non-English alphabets too:
In[29]:=
Out[29]=
In[30]:=
Out[30]=
This transliterates the word “wolfram” into the Russian alphabet:
In[31]:=
Out[31]=
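The letter and alphabet functions above can be tried out along these lines. The particular letter and position are picked only for illustration.

```wolfram
Alphabet[]                            (* the 26 lowercase letters, from "a" to "z" *)
LetterNumber["q"]                     (* 17 *)
FromLetterNumber[10]                  (* "j" *)

Alphabet["Russian"]                   (* letters of the Russian alphabet *)
Transliterate["wolfram", "Russian"]   (* "wolfram" written in the Russian alphabet *)
```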
The characters in strings can be anything you can type on your computer, including for example emoji.
Reverse the characters in a string made of emoji:
In[32]:=
Out[32]=
If you want to, you can also turn text into images, which you can then manipulate using image processing. The function Rasterize makes a raster, or bitmap, of something.
Generate an image of a piece of text:
In[33]:=
Out[33]=
Do image processing on it:
In[34]:=
Out[34]=
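Here’s a rough sketch of turning text into an image and then processing it. The text "ABC", the size 100 and the choice of EdgeDetect and ColorNegate as the image processing steps are assumptions for illustration.

```wolfram
img = Rasterize[Style["ABC", 100]];   (* bitmap of the text "ABC" at font size 100 *)

img                 (* display the rasterized text *)
EdgeDetect[img]     (* outlines of the letters *)
ColorNegate[img]    (* white-on-black version *)
```

Once the text has been rasterized it is an ordinary image, so string functions like StringLength no longer apply to it.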
Vocabulary
"string"                            a string
StringLength["string"]              length of a string
StringReverse["string"]             reverse a string
StringTake["string",4]              take characters at the beginning of a string
StringJoin["string","string"]       join strings together
StringJoin[{"string","string"}]     join a list of strings
ToUpperCase["string"]               convert characters to uppercase
Characters["string"]                convert a string to a list of characters
TextWords["string"]                 list of words from a string
TextSentences["string"]             list of sentences
WikipediaData["topic"]              Wikipedia article about a topic
WordCloud["text"]                   word cloud based on word frequencies
WordList[]                          list of common words in English
Alphabet[]                          list of letters of the alphabet
LetterNumber["c"]                   where a letter appears in the alphabet
FromLetterNumber[n]                 letter appearing at a position in the alphabet
Transliterate["text"]               transliterate text in any language into English
Transliterate["text","alphabet"]    transliterate text into other alphabets
RomanNumeral[n]                     convert a number to its Roman numeral string
IntegerName[n]                      convert a number to its English name string
InputForm["string"]                 show a string with quotes
Rasterize["string"]                 make a bitmap image
Exercises
11.1 Join two copies of the string "Hello".
Expected output:
Out[]=
11.2 Make a single string of the whole alphabet, in uppercase.
Expected output:
Out[]=
11.3 Generate a string of the alphabet in reverse order.
Expected output:
Out[]=
11.4 Join 100 copies of the string "AGCT".
Expected output:
Out[]=
11.5 Use StringTake, StringJoin and Alphabet to get "abcdef".
Expected output:
Out[]=
11.6 Create a column with increasing numbers of letters from the string "this is about strings".
Expected output:
Out[]=
11.7 Make a bar chart of the lengths of the words in “A long time ago, in a galaxy far, far away”.
Expected output:
Out[]=
11.8 Find the string length of the Wikipedia article for “computer”.
Sample expected output:
Out[]=
11.9 Find how many words are in the Wikipedia article for “computer”.
Sample expected output:
Out[]=
11.10 Find the first sentence in the Wikipedia article about “strings”.
Sample expected output:
Out[]=
11.11 Make a string from the first letters of all sentences in the Wikipedia article about computers.
Sample expected output:
Out[]=
11.12 Find the maximum word length among English words from WordList[].
Sample expected output:
Out[]=
11.13 Count the number of words in WordList[] that start with “q”.
Sample expected output:
Out[]=
11.14 Make a line plot of the lengths of the first 1000 words from WordList[].
Sample expected output:
Out[]=
11.15 Use StringJoin and Characters to make a word cloud of all letters in the words from WordList[].
Sample expected output:
Out[]=
11.16 Use StringReverse to make a word cloud of the last letters in the words from WordList[].
Sample expected output:
Out[]=
11.17 Find the Roman numerals for the year 1959.
Expected output:
Out[]=
11.18 Find the maximum string length of any Roman-numeral year from 1 to 2020.
Expected output:
Out[]=
11.19 Make a word cloud from the first characters of the Roman numerals up to 100.
Expected output:
Out[]=
11.20 Use Length to find the length of the Russian alphabet.
Expected output:
Out[]=
11.21 Generate the uppercase Greek alphabet.
Expected output:
Out[]=
11.22 Make a bar chart of the letter numbers in “wolfram”.
Expected output:
Out[]=
11.23 Use FromLetterNumber to make a string of 1000 random letters.
Sample expected output:
Out[]=
11.24 Make a list of 100 random 5-letter strings.
Sample expected output:
Out[]=
11.25 Transliterate “wolfram” into Greek.
Expected output:
Out[]=
11.26 Make a string of 10 wolf, ram emoji 🐺🐏🐺🐏....
Expected output:
Out[]=
11.27 Get the Arabic alphabet and transliterate it into English.
Expected output:
Out[]=
11.28 Make a white-on-black size-200 letter “A”.
Expected output:
Out[]=
11.29 Use Manipulate to make an interactive selector of size-100 characters from the alphabet, controlled by a slider.
Expected output:
Out[]=
11.30 Use Manipulate to make an interactive selector of black-on-white outlines of rasterized size-100 characters from the alphabet, controlled by a menu.
Expected output:
Out[]=
11.31 Use Manipulate to create a “vision simulator” that blurs a size-200 letter “A” by an amount from 0 to 50.
Expected output:
Out[]=
+11.1 Generate a string of the alphabet followed by the alphabet written in reverse.
Expected output:
Out[]=
+11.2 Make a column of a string of the alphabet and its reverse.
Expected output:
Out[]=
+11.3 Find how many sentences are in the Wikipedia article for “computer”.
Sample expected output:
Out[]=
+11.4 Join together without spaces, etc. the words in the first sentence in the Wikipedia article for “strings”.
Sample expected output:
Out[]=
+11.5 Find the length of the longest word in the Wikipedia article about computers.
Sample expected output:
Out[]=
+11.6 Plot the lengths of Roman numerals for numbers up to 2000.
Sample expected output:
Out[]=
+11.7 Generate a string by joining the Roman numerals up to 100.
Expected output:
Out[]=
+11.8 Make a line plot of the successive letter numbers for the concatenation of all Roman numerals up to 30.
Expected output:
Out[]=
+11.9 Find the maximum string length of the name of any integer up to 1000.
Expected output:
Out[]=
+11.10 Make a list of uppercase size-20 letters of the alphabet in random colors.
Sample expected output:
Out[]=
+11.11 Make a list of 100 random 5-letter strings with the Russian alphabet.
Sample expected output:
Out[]=
+11.12 Create a Manipulate to display edges in a size-200 letter “A”, blurred from 0 to 50.
Expected output:
Out[]=
+11.13 Add together white-on-black size-200 letters A and B.
Expected output:
Out[]=
Q&A
What is the difference between "x" and x?
"x" is a string; x is a Wolfram Language symbol, just like Plus or Max, that can be defined to actually do computations. We’ll talk much more about symbols later.
How do I enter characters that aren’t on my keyboard?
You can use whatever methods your computer provides, or you can do it directly with the Wolfram Language using constructs such as \[Alpha].
How do I put quotes (") inside a string?
Use \" (and if you want to put \" literally in the string, use \\\"). (You’ll use a lot of backslashes if you want to put \\\" in: \\\\\\\".)
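As a small sketch of the escaping rule, the example below puts quotes inside a made-up string. The variable name quoted is an assumption. Each \" counts as a single character.

```wolfram
quoted = "She said \"hi\"";   (* hypothetical sample string with embedded quotes *)

quoted                 (* displays as: She said "hi" *)
StringLength[quoted]   (* 13 *)
InputForm[quoted]      (* "She said \"hi\"" *)
```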
How are the colors of elements in word clouds determined?
By default it’s random within a certain color palette. You can specify it if you want to.
How come the word cloud shows “s” as the most common letter?
Because it is the most common first letter for common words in English. If you look at all letters, the most common is “e”.
What about letters that aren’t English? How are they numbered?
LetterNumber["α", "Greek"] gives numbering in the Greek alphabet. All characters are assigned a character code. You can find it using ToCharacterCode.
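For instance, the following sketch looks at the Greek letter α both ways: by its position in the Greek alphabet and by its character code.

```wolfram
LetterNumber["α", "Greek"]   (* 1, since α is the first Greek letter *)
ToCharacterCode["α"]         (* {945} *)
FromCharacterCode[945]       (* "α" *)
```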
What alphabets does the Wolfram Language know about?
Basically all the ones that are used today. Try “Greek” or “Arabic”, or the name of a language. Note that when a language uses accented characters, it’s sometimes tricky to decide what’s “in” the alphabet, and what’s just derived from it.
Can I translate words instead of just transliterating their letters?
Yes. Use WordTranslation.
Can I get lists of common words in languages other than English?
Yes. Use WordList[Language -> "Spanish"], etc.
Tech Notes
  • RandomWord[10] gives 10 random words. How many of them do you know?
  • StringTake["string", -2] takes 2 characters from the end of the string.
  • Every character, whether “a”, “α”, “狼” or “🐼”, is represented by a Unicode character code, found with ToCharacterCode. You can explore “Unicode space” with FromCharacterCode.
  • Characters can look very different when they’re shown in different fonts; emoji can look very different between different operating systems.
  • If you get a different result from WikipediaData, that’s because Wikipedia has been changed.
  • WordCloud automatically removes “uninteresting” words in text, like “the”, “and”, etc.
  • If you can’t figure out the name of an alphabet or language, use Ctrl+= (as described in Section 16) to give it in natural language form.
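A few of the notes above can be tried directly. The sketch below uses made-up inputs; the character codes in the third line spell out a word chosen only for illustration.

```wolfram
StringTake["string", -2]                  (* "ng", the last 2 characters *)
ToCharacterCode["a"]                      (* {97} *)
FromCharacterCode[{119, 111, 108, 102}]   (* "wolf" *)
RandomWord[10]                            (* 10 random words; different every time *)
```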