New in Wolfram Mathematica 8: Parametric Probability Distributions
Core Algorithms
English Is Not a Random Set of Words
The lengths of English words are well approximated by a BinomialDistribution; the first plot shows the fit. The lengths of "words" obtained by splitting a random string of letters and spaces can be modeled well by a WaringYuleDistribution, as the second plot shows.
In[1]:=
worddata = StringLength /@ DictionaryLookup[{"English", All}];
In[2]:=
binom = EstimatedDistribution[worddata, BinomialDistribution[n, p],
   ParameterEstimator -> {"MaximumLikelihood",
     Method -> {"FindRoot", MaxIterations -> 1000}}];
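As a hedged aside on what the "MaximumLikelihood" setting optimizes: for word lengths x_1, ..., x_N, the binomial log-likelihood in its standard textbook form is written below. This is the usual definition, not a description of how EstimatedDistribution is implemented internally; the symbols x_i, N, and \bar{x} (the sample mean) are introduced here for illustration only.

```latex
\ell(n, p) = \sum_{i=1}^{N} \left[ \log \binom{n}{x_i} + x_i \log p + (n - x_i) \log(1 - p) \right],
\qquad
\hat{p}(n) = \frac{\bar{x}}{n} \quad \text{for fixed } n \ge \max_i x_i .
```

Setting the derivative with respect to p to zero gives the closed form for \hat{p}(n), so in this formulation only n remains to be searched numerically.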
In[3]:=
randomWordLength = StringLength /@ StringSplit[ StringJoin[ RandomChoice[CharacterRange["a", "z"]~Join~{" "}, 10^6]]];
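A short derivation shows what distribution the simulated lengths should have. It assumes that the 27 symbols (26 letters plus the space) are drawn independently and uniformly, which is the default behavior of RandomChoice. Each character is then a space with probability 1/27. StringSplit drops the empty strings between consecutive spaces, so a word of length L is one letter followed by further letters until the first space. The symbols L, k, and j are introduced here for illustration.

```latex
P(L = k) = \left(\frac{26}{27}\right)^{k-1} \frac{1}{27}, \quad k = 1, 2, \dots
\qquad\Longrightarrow\qquad
P(L - 1 = j) = \left(\frac{26}{27}\right)^{j} \frac{1}{27}, \quad j = 0, 1, \dots,
\qquad
\mathbb{E}[L - 1] = 26 .
```

So L - 1 is geometric on the nonnegative integers, which is also the support of WaringYuleDistribution. The truncated word at the end of the 10^6-character string is ignored in this sketch.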
In[4]:=
wyd = EstimatedDistribution[randomWordLength - 1, WaringYuleDistribution[a, b]];
In[5]:=
Show[Histogram[worddata, {Range[25] - 1/2}, "ProbabilityDensity",
   ChartStyle -> "SolarColors", BaseStyle -> {FontFamily -> "Verdana"},
   PlotLabel -> "Length of Words in English"],
  DiscretePlot[PDF[binom, x], {x, 0, 25}, PlotRange -> All,
   PlotStyle -> Black, PlotMarkers -> {Automatic, Medium}],
  Epilog -> Inset[Framed[Style[Grid[{{"Estimated Distribution:"}, {binom}}], 12],
     Background -> LightBlue, RoundingRadius -> 3], {Right, 0.15}, {Right, Top}],
  ImageSize -> 400]
Show[Histogram[randomWordLength - 1, {0, 120, 10}, "PDF",
   ChartStyle -> "SolarColors", BaseStyle -> {FontFamily -> "Verdana"},
   PlotLabel -> "Length of Words at Random"],
  DiscretePlot[PDF[wyd, k], {k, 0, 120}, PlotStyle -> Black,
   PlotMarkers -> {Automatic, Tiny}],
  Epilog -> Inset[Framed[Style[Grid[{{"Estimated Distribution:"}, {wyd}}], 12],
     Background -> LightBlue, RoundingRadius -> 3], {Right, 0.03}, {Right, Top}],
  ImageSize -> 400]
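Out[5]= [Two plots: "Length of Words in English", a histogram with the fitted binomial PDF overlaid, and "Length of Words at Random", a histogram with the fitted Waring-Yule PDF overlaid.]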