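Enhance Curated Datasets with Built-in Data
The rich collection of built-in socioeconomic data in the Wolfram Knowledgebase can be used to enhance external data sources.
Start by importing a dataset of Head Start center locations from the U.S. Department of Housing and Urban Development (HUD).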
In[1]:=
(* Keep only the columns used in the steps below *)
hs = ResourceData["Head Start Locations"][
   All, {"CenterName", "CenterAddress", "CenterStateEntity",
    "CenterCityEntity", "CenterZipCodeEntity", "Coordinates"}];
In[2]:=
Take[hs, 5]
Out[2]=
![](assets.zh/enhance-curated-datasets-with-built-in-data/O_57.png)
Use the operator form of Select to extract a subset of the locations.
In[3]:=
hsChicago =
  hs[Select[#CenterCityEntity ===
      Entity["City", {"Chicago", "Illinois", "UnitedStates"}] &]];
Show them on a map of Chicago.
In[4]:=
GeoListPlot[hsChicago[All, #Coordinates &]]
Out[4]=
![](assets.zh/enhance-curated-datasets-with-built-in-data/O_58.png)
Group the Head Start locations by ZIP code and plot their distribution.
In[5]:=
chicagoCounts = Length /@ GroupBy[hsChicago, #CenterZipCodeEntity &];
In[6]:=
GeoRegionValuePlot[chicagoCounts]
Out[6]=
![](assets.zh/enhance-curated-datasets-with-built-in-data/O_59.png)
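The group-and-count step can also be tried without downloading the resource. The following is a minimal sketch on a small in-memory Dataset; the column names, center names, and ZIP labels are placeholders invented for illustration and are not taken from the HUD data.

```wolfram
(* Toy data: placeholder center names and ZIP labels, not HUD records *)
toy = Dataset[{
    <|"CenterName" -> "A", "Zip" -> "zip1"|>,
    <|"CenterName" -> "B", "Zip" -> "zip1"|>,
    <|"CenterName" -> "C", "Zip" -> "zip2"|>}];

(* Group the rows by the "Zip" column, then count the rows in each group *)
toyCounts = toy[GroupBy["Zip"], Length];

(* Convert the grouped Dataset to an ordinary Association of label -> count *)
Normal[toyCounts]
```

The result maps each ZIP label to the number of rows that carry it, which is the same shape of data that GeoRegionValuePlot receives when the keys are "ZIPCode" entities.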
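Retrieve built-in data on the school-age population within the Chicago ZIP codes, and plot these values against the Head Start location counts derived from the external HUD data.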
In[7]:=
zipcodes = Normal@Keys[chicagoCounts]
Out[7]=
![](assets.zh/enhance-curated-datasets-with-built-in-data/O_60.png)
In[8]:=
(* Despite its name, chicagoMedian holds the school-age population for each ZIP code *)
chicagoMedian =
  EntityValue[zipcodes,
   EntityProperty["ZIPCode", "Population", {"Age" -> "SchoolAge"}],
   "EntityAssociation"];
Out[9]=
![](assets.zh/enhance-curated-datasets-with-built-in-data/O_61.png)
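The input that produced the plot above is not reproduced on this page. As a rough sketch of how two such associations might be paired for plotting (not the original input), the following uses small placeholder associations with invented values in place of the school-age population data and the location counts.

```wolfram
(* Placeholder associations; keys and values are invented for illustration *)
toyPopulation = <|"zip1" -> Quantity[1200, "People"],
   "zip2" -> Missing["NotAvailable"],
   "zip3" -> Quantity[800, "People"]|>;
toyCounts = <|"zip1" -> 2, "zip2" -> 1, "zip3" -> 4|>;

(* Drop ZIP labels without a population value, then keep only keys present in both *)
{pop, cnt} = KeyIntersection[{DeleteMissing[toyPopulation], toyCounts}];

(* Build one {population, count} pair per remaining ZIP label *)
pairs = Values@Merge[{QuantityMagnitude /@ pop, cnt}, Identity];

ListPlot[pairs,
 AxesLabel -> {"School-age population", "Head Start locations"}]
```

With the real data, chicagoCounts is a Dataset, so it would presumably need to be converted with Normal before being combined with the association returned by EntityValue.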