Wolfram Language

Frequencies of Letters vs. First Letters

Show that in an English word list the letters that most often begin a word do not coincide with the letters that occur most often overall.

Get a list of common English words from WordList.

In[1]:=
Length[words = WordList[]]

Take the first letter of each word.

In[2]:=
firstchars = StringTake[words, 1];

Count the number of words starting with each of these letters.

In[3]:=
Counts[firstchars]
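
The raw counts are easier to read once they are ranked. The following is a minimal sketch, not part of the original example: it rebuilds the first-letter list and keeps the four largest counts with TakeLargest. The name topFirst is introduced here for illustration, and lowercasing the first letters is an added assumption so that capitalized entries, if any, are merged with their lowercase forms.

```wolfram
(* Self-contained sketch: rebuild the list of first letters *)
words = WordList[];

(* Assumption: lowercase the letters so "S" and "s" are counted together *)
firstchars = ToLowerCase[StringTake[words, 1]];

(* TakeLargest on an association keeps the n entries with the largest values,
   ordered from largest to smallest *)
topFirst = TakeLargest[Counts[firstchars], 4]
```

The result is an association of four letters and their counts, which can be checked against the letters named in the text.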

Visualize their relative frequencies by generating a WordCloud, which draws each letter at a size that reflects how often it occurs. The most frequent first letters are the consonants s, c, p, and d.

In[4]:=
WordCloud[firstchars]

Count how often each letter occurs across all the words by using LetterCounts, ignoring case.

In[5]:=
allchars = LetterCounts[StringJoin[words], IgnoreCase -> True]
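
LetterCounts returns absolute counts. To express them as a share of all letters, one possible approach (a sketch, not part of the original example) is to divide the association by its total. The name letterShare is hypothetical and used only for illustration.

```wolfram
(* Self-contained sketch: recompute the letter counts over all words *)
words = WordList[];
allchars = LetterCounts[StringJoin[words], IgnoreCase -> True];

(* Arithmetic threads over the values of an association, so dividing by
   the total turns each count into a fraction of all letters *)
letterShare = N[allchars/Total[allchars]];

(* The three letters with the largest shares *)
TakeLargest[letterShare, 3]
```

Because the shares sum to 1, they can be compared directly with shares computed from a word list of a different size.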

Now the most frequent letters are the vowels e, i, and a.

In[6]:=
WordCloud[allchars]
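
The two word clouds can also be compared numerically. The sketch below, which is an illustration rather than part of the original example, looks up where each of the four most frequent first letters falls in the overall letter ranking. The names firstCounts, allCounts, ranked, rankOf, and topFirst are hypothetical, and lowercasing every word up front is an added assumption so that both counts treat case the same way.

```wolfram
(* Self-contained sketch; assumption: lowercase all words first *)
words = ToLowerCase[WordList[]];

firstCounts = Counts[StringTake[words, 1]];   (* counts of first letters *)
allCounts = LetterCounts[StringJoin[words]];  (* counts of all letters *)

(* Letters ordered from most to least frequent overall *)
ranked = Keys[Sort[allCounts, Greater]];

(* Map each letter to its position in that ordering (1 = most frequent) *)
rankOf = AssociationThread[ranked -> Range[Length[ranked]]];

(* Overall rank of each of the four most frequent first letters;
   Lookup gives Missing[...] for a key that is not a counted letter *)
topFirst = Keys[TakeLargest[firstCounts, 4]];
AssociationThread[topFirst -> Lookup[rankOf, topFirst]]
```

If the top first letters coincided with the top letters overall, the ranks returned would be 1 through 4 in some order; ranks further down the list indicate the mismatch described in the text.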
