New in Wolfram
Mathematica
8: Parametric Probability Distributions
◄
previous
|
next
►
Core Algorithms
English Is Not a Random Set of Words
The length of words in English follows a
BinomialDistribution
—the fit is shown on the first picture. The collection of words with random length can be modeled well by a
WaringYuleDistribution
, as the second graphic shows.
In[1]:=
X
worddata = StringLength /@ DictionaryLookup[{"English", All}];
In[2]:=
X
binom = EstimatedDistribution[worddata, BinomialDistribution[n, p], ParameterEstimator -> {"MaximumLikelihood", Method -> {"FindRoot", MaxIterations -> 1000}}];
In[3]:=
X
randomWordLength = StringLength /@ StringSplit[ StringJoin[ RandomChoice[CharacterRange["a", "z"]~Join~{" "}, 10^6]]];
In[4]:=
X
wyd = EstimatedDistribution[randomWordLength - 1, WaringYuleDistribution[a, b]];
In[5]:=
X
Show[Histogram[worddata, {Range[25] - 1/2}, "ProbabilityDensity", ChartStyle -> "SolarColors", BaseStyle -> {FontFamily -> "Verdana"}, PlotLabel -> "Length of Words in English"], DiscretePlot[PDF[binom, x], {x, 0, 25}, PlotRange -> All, PlotStyle -> Black, PlotMarkers -> {Automatic, Medium}], Epilog -> Inset[Framed[ Style[Grid[{{"Estimated Distribution:"}, {binom}}], 12], Background -> LightBlue, RoundingRadius -> 3], {Right, 0.15}, {Right, Top}], ImageSize -> 400] Show[Histogram[randomWordLength - 1, {0, 120, 10}, "PDF", ChartStyle -> "SolarColors", BaseStyle -> {FontFamily -> "Verdana"}, PlotLabel -> "Length of Words at Random"], DiscretePlot[PDF[wyd, k], {k, 0, 120}, PlotStyle -> Black, PlotMarkers -> {Automatic, Tiny}], Epilog -> Inset[Framed[Style[Grid[{{"Estimated Distribution:"}, {wyd}}], 12], Background -> LightBlue, RoundingRadius -> 3], {Right, 0.03}, {Right, Top}], ImageSize -> 400]
Out[5]=
Out[5]=