11 Strings and Text
Another thing the Wolfram Language lets you compute with is text. You enter text as a string, indicated by quotes (").
Enter a string:
In[1]:=
Out[1]=
StringLength counts the number of characters in a string:
In[2]:=
Out[2]=
StringReverse reverses the characters in a string:
In[3]:=
Out[3]=
ToUpperCase makes all the characters in a string uppercase (capital letters):
In[4]:=
Out[4]=
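As a minimal sketch of the three functions just described, here they are applied to the string "hello", which is an illustrative choice and not necessarily the input used in the original examples. The comments show results in input form, with quotes.

```wolfram
StringLength["hello"]    (* 5 *)
StringReverse["hello"]   (* "olleh" *)
ToUpperCase["hello"]     (* "HELLO" *)
```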
StringTake takes a certain number of characters from the beginning of a string:
In[5]:=
Out[5]=
If you take 10 characters, you get a string of length 10:
In[6]:=
Out[6]=
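A short sketch of StringTake, using an assumed sample string: taking the first 10 characters gives a string whose length is 10, spaces included.

```wolfram
StringTake["this is about strings", 10]                 (* "this is ab" *)
StringLength[StringTake["this is about strings", 10]]   (* 10 *)
```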
StringJoin joins strings (don’t forget spaces if you want to separate words):
In[7]:=
Out[7]=
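The point about spaces can be seen in a small sketch with two illustrative words: StringJoin adds nothing between its arguments, so any separator has to be supplied as its own string.

```wolfram
StringJoin["Hello", " ", "there"]   (* "Hello there" *)
StringJoin["Hello", "there"]        (* "Hellothere" *)
```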
A list of strings:
In[8]:=
Out[8]=
Get the first two characters from each string:
In[9]:=
Out[9]=
StringJoin joins the strings in a list:
In[10]:=
Out[10]=
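The list operations above might look like the following sketch. The list of fruit names is an assumption made for illustration; StringTake applies to each string in the list, while StringJoin combines the whole list into one string.

```wolfram
StringTake[{"apple", "banana", "strawberry"}, 2]   (* {"ap", "ba", "st"} *)
StringJoin[{"apple", "banana", "strawberry"}]      (* "applebananastrawberry" *)
```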
Sometimes it’s useful to turn strings into lists of their constituent characters. Each character is actually a string itself, of length 1.
Characters breaks a string into a list of its characters:
In[11]:=
Out[11]=
Once you’ve broken a string into a list of characters, you can use all the usual list functions on it.
Sort the characters in a string:
In[12]:=
Out[12]=
The invisible elements at the beginning of the list are space characters. If you want to see strings in the form you’d input them, complete with "...", use InputForm.
InputForm shows strings as you would input them, including quotes:
In[13]:=
Out[13]=
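A compact sketch of Characters, Sort and InputForm working together, using the assumed sample string "a string". In a notebook the sorted list displays without quotes, so the leading space is invisible; InputForm makes it visible.

```wolfram
Characters["a string"]
(* {"a", " ", "s", "t", "r", "i", "n", "g"} *)

Sort[Characters["a string"]]
(* the space sorts before the letters *)

InputForm[Sort[Characters["a string"]]]
(* {" ", "a", "g", "i", "n", "r", "s", "t"} *)
```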
Functions like StringJoin and Characters work on strings of any kind; it doesn’t matter if they’re meaningful text or not. There are other functions, like TextWords, that specifically work on meaningful text, written, say, in English.
TextWords gives a list of the words in a string of text:
In[14]:=
Out[14]=
This gives the length of each word:
In[15]:=
Out[15]=
TextSentences breaks a text string into a list of sentences:
In[16]:=
Out[16]=
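The text functions can be sketched on a short piece of English; the sample text here is an assumption chosen for illustration. TextWords drops the punctuation, StringLength applies to each word in the resulting list, and TextSentences keeps each sentence whole.

```wolfram
TextWords["This is a sentence. Sentences are made of words."]
(* {"This", "is", "a", "sentence", "Sentences", "are", "made", "of", "words"} *)

StringLength[TextWords["This is a sentence. Sentences are made of words."]]
(* {4, 2, 1, 8, 9, 3, 4, 2, 5} *)

TextSentences["This is a sentence. Sentences are made of words."]
(* {"This is a sentence.", "Sentences are made of words."} *)
```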
There are lots of ways to get text into the Wolfram Language. One example is the WikipediaData function, which gets the current text of Wikipedia articles.
Get the first 100 characters of the Wikipedia article about “computers”:
In[17]:=
Out[17]=
A convenient way to get a sense of what’s in a piece of text is to create a word cloud. The function WordCloud does this.
Create a word cloud for the Wikipedia article on “computers”:
In[18]:=
Out[18]=
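One plausible way to write the two Wikipedia examples is sketched below. Both need an internet connection, and the results depend on the current state of the article, so no output is shown.

```wolfram
(* first 100 characters of the article text *)
StringTake[WikipediaData["computers"], 100]

(* word cloud of the full article text *)
WordCloud[WikipediaData["computers"]]
```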
Get the first 20 words from a list of common English words:
In[19]:=
Out[19]=
Make a word cloud from the first letters of all the words:
In[20]:=
Out[20]=
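The word-list examples could be written roughly as follows. This is a sketch: WordList[] gives a list of common English words, and StringTake with 1 picks out the first letter of each word in the list. The actual words returned can vary with the version of the word data.

```wolfram
(* first 20 entries of the list of common words *)
Take[WordList[], 20]

(* word cloud built from the first letter of every word *)
WordCloud[StringTake[WordList[], 1]]
```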
Strings don’t have to contain text. In a juxtaposition of ancient and modern, we can for example generate Roman numerals as strings.
Generate the Roman numeral string for 1988:
In[21]:=
Out[21]=
Make a table of the Roman numerals for numbers up to 20:
In[22]:=
Out[22]=
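A minimal sketch of these two steps, assuming the RomanNumeral function is what produces the numerals: the result is an ordinary string, and Table builds a list of such strings.

```wolfram
RomanNumeral[1988]                (* "MCMLXXXVIII" *)
Table[RomanNumeral[n], {n, 20}]   (* "I", "II", "III", ... up to "XX" *)
```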
As with everything, we can do computations on these strings. For example, we can plot the lengths of successive Roman numerals.
Plot the lengths of the Roman numerals for numbers up to 100:
In[23]:=
Out[23]=
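The plot might be produced along these lines. ListLinePlot is one reasonable choice of plotting function here; the original may have used a different one.

```wolfram
(* length of the Roman numeral string for each number from 1 to 100 *)
ListLinePlot[Table[StringLength[RomanNumeral[n]], {n, 100}]]
```

The longest numeral in this range is the one for 88, "LXXXVIII", with 8 characters.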
IntegerName gives the English name of an integer.
Generate a string giving the name of the integer 56:
In[24]:=
Out[24]=
Here’s a plot of the lengths of integer names in English:
In[25]:=
Out[25]=
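A sketch of the IntegerName examples follows. The range of 1 to 100 for the plot is an assumption, since the original range is not recoverable from the text.

```wolfram
IntegerName[56]                 (* the name of 56 as a string *)
StringLength[IntegerName[56]]   (* 9, counting the hyphen *)

(* lengths of the English names of the integers 1 to 100 *)
ListLinePlot[Table[StringLength[IntegerName[n]], {n, 100}]]
```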
There are various ways to turn letters into numbers (and vice versa).
Alphabet gives the alphabet:
In[26]:=
Out[26]=
LetterNumber tells you where in the alphabet a letter appears:
In[27]:=
Out[27]=
FromLetterNumber does the opposite:
In[28]:=
Out[28]=
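The conversions between letters and numbers can be sketched as follows; the particular letters and numbers are illustrative assumptions.

```wolfram
Alphabet[]                                   (* {"a", "b", "c", ..., "z"} *)
LetterNumber[{"a", "b", "x", "y", "z"}]      (* {1, 2, 24, 25, 26} *)
FromLetterNumber[{10, 11, 12, 13, 14, 15}]   (* {"j", "k", "l", "m", "n", "o"} *)
```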
Alphabet knows about non-English alphabets too:
In[29]:=
Out[29]=
In[30]:=
Out[30]=
This transliterates the word “wolfram” into the Russian alphabet:
In[31]:=
Out[31]=
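The non-English examples might be written as in this sketch. Which alphabets the original examples requested is an assumption; Greek and Russian are used here for illustration.

```wolfram
Alphabet["Greek"]           (* 24 lowercase Greek letters, starting with "α" *)
Length[Alphabet["Greek"]]   (* 24 *)
Alphabet["Russian"]

(* write "wolfram" using letters of the Russian alphabet *)
Transliterate["wolfram", "Russian"]
```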
Reverse the characters in a string made of emoji:
In[32]:=
Out[32]=
Generate an image of a piece of text:
In[33]:=
Out[33]=
Do image processing on it:
In[34]:=
Out[34]=
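Turning text into an image and then processing it could look like the sketch below. The text "ABC", the size 100 and the choice of EdgeDetect as the image-processing step are all assumptions made for illustration.

```wolfram
(* render large styled text as an image *)
Rasterize[Style["ABC", 100]]

(* apply an image-processing function to that image *)
EdgeDetect[Rasterize[Style["ABC", 100]]]
```

Any other image function, such as Blur or ColorNegate, can be applied to the rasterized text in the same way.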
11.1 Join two copies of the string "Hello".
11.2 Make a single string of the whole alphabet, in uppercase.
11.3 Generate a string of the alphabet in reverse order.
11.4 Join 100 copies of the string "AGCT".
11.5 Use StringTake, StringJoin and Alphabet to get "abcdef".
11.6 Create a column with increasing numbers of letters from the string "this is about strings".
11.7 Make a bar chart of the lengths of the words in “A long time ago, in a galaxy far, far away”.
11.8 Find the string length of the Wikipedia article for “computer”.
11.9 Find how many words are in the Wikipedia article for “computer”.
11.10 Find the first sentence in the Wikipedia article about “strings”.
11.11 Make a string from the first letters of all sentences in the Wikipedia article about computers.
11.12 Find the maximum word length among English words from WordList[].
11.13 Count the number of words in WordList[] that start with “q”.
11.14 Make a line plot of the lengths of the first 1000 words from WordList[].
11.15 Use StringJoin and Characters to make a word cloud of all letters in the words from WordList[].
11.16 Use StringReverse to make a word cloud of the last letters in the words from WordList[].
11.17 Find the Roman numerals for the year 1959.
11.18 Find the maximum string length of any Roman-numeral year from 1 to 2020.
11.19 Make a word cloud from the first characters of the Roman numerals up to 100.
11.20 Use Length to find the length of the Russian alphabet.
11.21 Generate the uppercase Greek alphabet.
11.22 Make a bar chart of the letter numbers in “wolfram”.
11.23 Use FromLetterNumber to make a string of 1000 random letters.
11.24 Make a list of 100 random 5-letter strings.
11.25 Transliterate “wolfram” into Greek.
11.26 Make a string of 10 wolf, ram emoji 🐺🐏🐺🐏....
11.27 Get the Arabic alphabet and transliterate it into English.
11.28 Make a white-on-black size-200 letter “A”.
11.29 Use Manipulate to make an interactive selector of size-100 characters from the alphabet, controlled by a slider.
11.30 Use Manipulate to make an interactive selector of black-on-white outlines of rasterized size-100 characters from the alphabet, controlled by a menu.
11.31 Use Manipulate to create a “vision simulator” that blurs a size-200 letter “A” by an amount from 0 to 50.
+11.1 Generate a string of the alphabet followed by the alphabet written in reverse.
+11.2 Make a column of a string of the alphabet and its reverse.
+11.3 Find how many sentences are in the Wikipedia article for “computer”.
+11.4 Join together without spaces, etc. the words in the first sentence in the Wikipedia article for “strings”.
+11.5 Find the length of the longest word in the Wikipedia article about computers.
+11.6 Plot the lengths of Roman numerals for numbers up to 2000.
+11.7 Generate a string by joining the Roman numerals up to 100.
+11.8 Make a line plot of the successive letter numbers for the concatenation of all Roman numerals up to 30.
+11.9 Find the maximum string length of the name of any integer up to 1000.
+11.10 Make a list of uppercase size-20 letters of the alphabet in random colors.
+11.11 Make a list of 100 random 5-letter strings with the Russian alphabet.
+11.12 Create a Manipulate to display edges in a size-200 letter “A”, blurred from 0 to 50.
+11.13 Add together white-on-black size-200 letters A and B.
What is the difference between "x" and x?
"x" is a string; x is a Wolfram Language symbol, just like Plus or Max, that can be defined to actually do computations. We’ll talk much more about symbols later.
How do I enter characters that aren’t on my keyboard?
You can use whatever methods your computer provides, or you can do it directly with the Wolfram Language using constructs such as \[Alpha].
How do I put quotes (") inside a string?
Use \" (and if you want to put \" literally in the string, use \\\"). (You’ll use a lot of backslashes if you want to put \\\" in: \\\\\\\".)
How are the colors of elements in word clouds determined?
By default it’s random within a certain color palette. You can specify it if you want to.
How come the word cloud shows “s” as the most common letter?
Because it is the most common first letter for common words in English. If you look at all letters, the most common is “e”.
What about letters that aren’t English? How are they numbered?
LetterNumber["α", "Greek"] gives numbering in the Greek alphabet. All characters are assigned a character code. You can find it using ToCharacterCode.
What alphabets does the Wolfram Language know about?
Basically all the ones that are used today. Try “Greek” or “Arabic”, or the name of a language. Note that when a language uses accented characters, it’s sometimes tricky to decide what’s “in” the alphabet, and what’s just derived from it.
Can I translate words instead of just transliterating their letters?
Yes. Use WordTranslation.
Can I get lists of common words in languages other than English?
Yes. Use WordList[Language -> "Spanish"], etc.
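Several of the answers above can be checked directly. The following is a minimal sketch with illustrative strings: it shows that an escaped quote counts as a single character, that \[Alpha] is just another way to type “α”, and how letter numbers differ from character codes.

```wolfram
StringLength["say \"hi\""]   (* 8: each \" is one character *)
StringLength["\\\""]         (* 2: a backslash followed by a quote *)

"\[Alpha]" === "α"           (* True *)
LetterNumber["α", "Greek"]   (* 1: position in the Greek alphabet *)
ToCharacterCode["α"]         (* {945}: the character code *)
```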
  • RandomWord[10] gives 10 random words. How many of them do you know?
  • StringTake["string", -2] takes 2 characters from the end of the string.
  • Every character, whether “a”, “α”, “狼” or “🐼”, is represented by a Unicode character code, found with ToCharacterCode. You can explore “Unicode space” with FromCharacterCode.
  • Characters can look very different when they’re shown in different fonts; emoji can look very different between different operating systems.
  • If you get a different result from WikipediaData, that’s because Wikipedia has been changed.
  • WordCloud automatically removes “uninteresting” words in text, like “the”, “and”, etc.
  • If you can’t figure out the name of an alphabet or language, use ctrl+= (as described in Section 16) to give it in natural language form.
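A few of the notes above can be tried out with a short sketch; the specific character codes are illustrative. RandomWord gives different words each time, so no output is shown for it.

```wolfram
StringTake["string", -2]       (* "ng": the last 2 characters *)
ToCharacterCode["a"]           (* {97} *)
FromCharacterCode[{97, 945}]   (* "aα" *)
RandomWord[10]                 (* a list of 10 random words *)
```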