Examine Characteristics of Languages, Alphabets, and Scripts
Version 11 provides access to extensive built-in knowledge about languages, writing scripts, and alphabets.
Different languages may share the same writing script (or writing system) but still use different alphabets of characters. This example explores how widely the number of characters varies among the alphabets that use the Latin writing script.
Take the list of alphabets that use the Latin writing script.
alphabets = EntityList[
   EntityClass["Alphabet",
    "WritingScripts" -> Entity["WritingScript", "Latin::6tr5q"]]];
Length[alphabets]
There are 131 such alphabets. Show a small sample of them.
RandomSample[alphabets, 15]
Construct an association storing the list of characters of each alphabet.
letters = EntityValue[alphabets, "CommonAlphabet", "EntityAssociation"];
The shortest alphabet is Mohawk, with 12 letters.
letters[Entity["Alphabet", "Mohawk::p8wq4"]]
The longest alphabet is Slovak, with 46 letters.
letters[Entity["Alphabet", "Slovak::kj62d"]]
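The shortest and longest alphabets can also be found programmatically instead of being looked up by name. The following is a minimal sketch, assuming `letters` is the association built above and that every alphabet has a list of characters available; the name `lengths` is introduced here only for illustration.

```wolfram
(* Assumes `letters` maps each alphabet entity to its list of characters. *)
(* `lengths` is an illustrative name, not part of the original example. *)
lengths = Length /@ letters;

(* Alphabet with the fewest characters, returned as alphabet -> length *)
TakeSmallest[lengths, 1]

(* Alphabet with the most characters, returned as alphabet -> length *)
TakeLargest[lengths, 1]
```

Raising the second argument, for example to 5, shows the runners-up and reveals any ties at either extreme.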
This histogram shows that the most common length is 26 letters, like English, though not all 26-letter alphabets contain the same letters.
Histogram[Length /@ letters, 30]
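The reading of the histogram can be cross-checked numerically. This sketch, which again assumes `letters` as defined above, tallies how many alphabets have each length and then counts how many distinct letter sets occur among the 26-letter alphabets; `sameSize` is a hypothetical helper name.

```wolfram
(* Most common alphabet lengths, as length -> number of alphabets *)
TakeLargest[Counts[Length /@ Values[letters]], 5]

(* Alphabets with exactly 26 letters; `sameSize` is an illustrative name *)
sameSize = Select[letters, Length[#] == 26 &];
Keys[sameSize]

(* Number of distinct letter sets among them; a result above 1
   means they do not all contain the same letters *)
Length[DeleteDuplicates[Sort /@ Values[sameSize]]]
```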
Now count the number of alphabets in which each letter appears. Only three letters, a, i, and n, appear in all 131 Latin alphabets.
TakeLargest[Counts[Flatten[Values[letters]]], 10]
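The letters shared by every alphabet can be obtained directly as a set intersection, which avoids reading them off the counts. A minimal sketch, assuming `letters` as defined above:

```wolfram
(* Letters common to every alphabet: intersect all character lists *)
Intersection @@ Values[letters]

(* Equivalent check using counts: keep letters that occur in as many
   alphabets as there are entries in `letters`. DeleteDuplicates guards
   against an alphabet listing the same letter twice. *)
Keys[Select[Counts[Flatten[DeleteDuplicates /@ Values[letters]]],
  # == Length[letters] &]]
```

The two expressions should agree up to ordering; the intersection form is shorter, while the count form generalizes easily to "present in at least n alphabets".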
Mohawk does not contain the letter m, and the Hawaiian alphabet is the only one not containing t.
letters[Entity["Alphabet", "Hawaiian::p38r5"]]
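Claims about missing letters can be verified by selecting the alphabets that lack a given character. In this sketch, `alphabetsWithout` is a hypothetical helper introduced for illustration, and the letters are assumed to be stored as lowercase single-character strings such as "t".

```wolfram
(* Hypothetical helper: alphabets whose character list lacks `ch`. *)
(* Assumes `letters` as defined above, with lowercase string entries. *)
alphabetsWithout[ch_String] := Keys[Select[letters, ! MemberQ[#, ch] &]]

(* Alphabets without the letter t *)
alphabetsWithout["t"]

(* Alphabets without the letter m *)
alphabetsWithout["m"]
```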