Create a simple dataset that can be viewed as having 2 rows and 3 columns:
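A minimal sketch of such a dataset, assuming the name data, row keys "a" and "b", and column keys "x", "y" and "z". The numbers are made up for illustration; only the shape (2 rows, 3 columns) comes from the text.

```wolfram
(* Hypothetical values; only the 2-row, 3-column shape comes from the text *)
data = Dataset[<|
   "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
   "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>]

(* Element at row "b", column "z" *)
data["b", "z"]   (* 7 *)
```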
You can first extract the whole “b row”, then get the “z” element of the result:
You can also just get the whole “b row” of the dataset. The result is a new dataset, which for ease of reading happens to be displayed in this case as a column.
Generate a new dataset from the “b row” of the original dataset:
Here is the dataset that corresponds to the “z column” for all “rows”.
Generate a dataset consisting of the “z column” for all rows:
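The extractions described above might look like the following sketch. It assumes a dataset named data with made-up values, row keys "a" and "b", and column keys "x", "y" and "z".

```wolfram
(* Hypothetical dataset, redefined here so the example stands alone *)
data = Dataset[<|
   "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
   "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

data["b"]["z"]   (* extract the "b" row first, then its "z" element: 7 *)
data["b"]        (* the whole "b" row, as a new dataset *)
data[All, "z"]   (* the "z" column for all rows *)
```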
If we use f instead of Total, we can see what’s going on: the function is being applied to each of the “row” associations.
Apply the function f to each row:
Apply a function that adds the x and z elements of each association:
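As a sketch of applying functions to each row, here are three queries on a made-up dataset. The name data and its values are assumptions; f is deliberately left undefined so that the result stays symbolic.

```wolfram
(* Hypothetical dataset, redefined here so the example stands alone *)
data = Dataset[<|
   "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
   "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

data[All, Total]        (* total of each row: 6 for "a", 22 for "b" *)
data[All, f]            (* f applied to each row association *)
data[All, #x + #z &]    (* x + z for each row: 4 for "a", 12 for "b" *)
```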
You can give a function to apply to all rows too.
This extracts the value of each “z column”, then applies f to the association of results:
Apply f to the totals of all columns:
Find totals for all rows, then pick out the total for the “b row”:
It’s equivalent to this:
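The queries described above could be written as in this sketch. The dataset and its values are assumptions made for illustration, and f is left undefined so the structure of the result is visible.

```wolfram
(* Hypothetical dataset, redefined here so the example stands alone *)
data = Dataset[<|
   "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
   "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

data[Total]             (* function given at the level of all rows: x -> 6, y -> 12, z -> 10 *)
data[f, "z"]            (* extract "z" from each row, then apply f to the association of results *)
data[f, Total]          (* total each row, then apply f to the association of totals *)
data[All, Total]["b"]   (* totals for all rows, then pick the one for "b": 22 *)
data["b", Total]        (* the equivalent direct form: 22 *)
```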
The operator form of Select is a function which can be applied to actually perform the Select operation.
Make a dataset by selecting only rows whose “z column” is greater than 5:
For each row, select columns whose values are greater than 5, leaving a ragged structure:
Normal turns the dataset into an ordinary association of associations:
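Here is a sketch of the operator form of Select, first on a plain list and then inside dataset queries. The dataset and its values are made up for illustration.

```wolfram
(* Operator form: Select[crit] is itself a function *)
Select[# > 5 &][{1, 3, 5, 7, 9}]   (* {7, 9} *)

(* Hypothetical dataset, redefined here so the example stands alone *)
data = Dataset[<|
   "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
   "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

data[Select[#z > 5 &]]                (* keeps only row "b" with these values *)
ragged = data[All, Select[# > 5 &]]   (* per row, keep only values greater than 5 *)
Normal[ragged]   (* <|"a" -> <||>, "b" -> <|"y" -> 10, "z" -> 7|>|> *)
```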
Many Wolfram Language functions have operator forms.
Sort rows according to the value of the difference of the x and y columns:
Sort the rows, and find the total of all columns:
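SortBy also has an operator form, so the two sorts above can be sketched like this. The dataset and its values are assumptions made for illustration.

```wolfram
(* Hypothetical dataset, redefined here so the example stands alone *)
data = Dataset[<|
   "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
   "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

(* x - y is -1 for "a" and -5 for "b", so "b" sorts first *)
data[SortBy[#x - #y &]]

(* Same sort, followed by the total of each row *)
data[SortBy[#x - #y &], Total]
```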
Sometimes you want to apply a function to each element in the dataset.
Apply f to each element in the dataset:
Sort the rows before totaling the squares of their elements:
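A sketch of element-level operations, using a made-up dataset. A third argument in the query addresses the individual elements inside each row.

```wolfram
(* Hypothetical dataset, redefined here so the example stands alone *)
data = Dataset[<|
   "a" -> <|"x" -> 1, "y" -> 2, "z" -> 3|>,
   "b" -> <|"x" -> 5, "y" -> 10, "z" -> 7|>|>];

data[All, All, f]          (* f applied to every element *)

(* Square each element, total each row; with these values the totals are 14 and 174 *)
data[Sort, Total, #^2 &]
```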
A dataset formed from a list of associations:
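For instance, a list of two associations with the same keys gives a dataset whose rows are numbered rather than named. The name rows and the values below are hypothetical.

```wolfram
(* Hypothetical values; rows are addressed by position instead of by key *)
rows = Dataset[{
   <|"x" -> 2, "y" -> 4, "z" -> 6|>,
   <|"x" -> 11, "y" -> 7, "z" -> 1|>}]

rows[1, "z"]     (* 6 *)
rows[All, "x"]   (* the "x" column: 2 and 11 *)
```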
If we ask about the moons of Mars, we get a dataset, which we can then query further.
Get a dataset about the moons of Mars:
“Drill down” to make a table of radii of all the moons of Mars:
Make a dataset of the number of moons listed for each planet:
Find the total mass of all moons for each planet:
Get the same result, but only for planets with more than 10 moons:
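The planet queries above might be written as in the following sketch. The URL is the one given in the exercises note for this section. The keys "Moons", "Radius" and "Mass" are assumptions about how the dataset is organized and are not confirmed by the text.

```wolfram
(* URL taken from the exercises note; needs network access *)
planets = CloudGet["http://wolfr.am/7FxLgPm5"];

(* Assumed keys: "Moons", "Radius", "Mass" *)
planets["Mars", "Moons"]                  (* dataset about the moons of Mars *)
planets["Mars", "Moons", All, "Radius"]   (* radii of all the moons of Mars *)
planets[All, "Moons", Length]             (* number of moons listed per planet *)
planets[All, "Moons", Total, "Mass"]      (* total mass of moons per planet *)

(* Same totals, restricted to planets with more than 10 moons *)
planets[Select[Length[#Moons] > 10 &], "Moons", Total, "Mass"]
```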
Get a dataset of the moons whose mass is more than 1% of the mass of the Earth.
For all moons, select ones whose mass is greater than 0.01 times the mass of the Earth:
Get the list of keys (i.e. moon names) in the resulting association for each planet:
Here’s the whole computation in one line:
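One possible one-line form chains two queries: first select the moons, then take the keys for each planet. This is a sketch, not necessarily the original input, and it assumes the keys "Moons" and "Mass" and that the masses are stored as quantities.

```wolfram
(* URL taken from the exercises note; needs network access *)
planets = CloudGet["http://wolfr.am/7FxLgPm5"];

(* Assumed keys "Moons" and "Mass"; masses assumed to be Quantity objects *)
planets[All, "Moons",
  Select[#Mass > Quantity[0.01, "EarthMass"] &]][All, Keys]
```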
Make number line plots of the logarithms of masses for moons of each planet:
Here’s how to make a word cloud of names of moons, sized according to the masses of the moons. To do this, we need a single association that associates the name of each moon with its mass.
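To show the idea without relying on the structure of the planets dataset, here is a sketch with two small made-up associations. Catenate merges them into a single association, which WordCloud accepts as a set of weighted words. All names and weights below are hypothetical.

```wolfram
(* Hypothetical per-planet associations of moon name -> weight *)
moonsA = <|"Alpha" -> 10, "Beta" -> 2|>;
moonsB = <|"Gamma" -> 5|>;

(* Combine the associations into one *)
all = Catenate[{moonsA, moonsB}]   (* <|"Alpha" -> 10, "Beta" -> 2, "Gamma" -> 5|> *)

(* Word sizes follow the weights *)
WordCloud[all]
```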
We’ve seen before that we can write something like f[g[x]] as f@g@x or x//g//f. We can also write it f[g[#]]&[x]. But what about f[g[#]]&? Is there a short way to write this? The answer is that there is, in terms of the function composition operators @* and /*.
f@*g@*h represents a composition of functions to be applied right-to-left:
h/*g/*f represents a composition of functions to be applied left-to-right:
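A short sketch of both operators with undefined symbols f, g, h and x, so the nesting is visible in the result. Parentheses are needed around the composition before it is applied.

```wolfram
(* Right-to-left: h is applied first, f last *)
(f @* g @* h)[x]    (* f[g[h[x]]] *)

(* Left-to-right: h is applied first, f last *)
(h /* g /* f)[x]    (* f[g[h[x]]] *)

(* Both are short forms of the pure function f[g[h[#]]] & *)
(f[g[h[#]]] &)[x]   (* f[g[h[x]]] *)
```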
Here’s the previous code rewritten using composition @*:
As a final example, let’s look at another dataset, this time coming straight from the Wolfram Data Repository. Here’s a webpage (about big meteors) from the repository:
To get the main dataset that’s mentioned here, just use ResourceData.
Extract the coordinates entry from each row, and plot the results:
Make a histogram of the altitudes:
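A sketch of these steps follows. The resource name "Fireballs and Bolides" and the column names "Coordinates" and "Altitude" are assumptions based on the dataset name used in the exercises; the variable name fireballs is hypothetical. Normal and DeleteMissing are added defensively in case some rows lack values.

```wolfram
(* Assumed resource name; needs network access *)
fireballs = ResourceData["Fireballs and Bolides"];

(* Assumed column "Coordinates": plot the locations on a map *)
GeoListPlot[DeleteMissing[Normal[fireballs[All, "Coordinates"]]]]

(* Assumed column "Altitude": histogram of the recorded altitudes *)
Histogram[DeleteMissing[Normal[fireballs[All, "Altitude"]]]]
```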
Dataset[data] | a dataset
Normal[dataset] | convert a dataset to normal lists and associations
Catenate[{assoc1, ...}] | catenate associations, combining their elements
f@*g | composition of functions (f[g[x]] when applied to x)
f/*g | right composition (g[f[x]] when applied to x)
Note: These exercises use the dataset planets=CloudGet["http://wolfr.am/7FxLgPm5"].
45.1 Make a word cloud of the planets, with weights determined by their number of moons.
45.2 Make a bar chart of the number of moons for each planet.
45.3 Make a dataset of the masses of the planets, sorted by their number of moons.
45.4 Make a dataset of planets and the mass of each one’s most massive moon.
45.5 Make a dataset of masses of planets, where the planets are sorted by the largest mass of their moons.
45.6 Make a dataset of the median mass of all moons for each planet.
45.8 Make a word cloud of countries in Central America, with the names of countries sized in proportion to the lengths of the Wikipedia articles about them.
45.9 Find the maximum observed altitude in the Fireballs & Bolides dataset.
45.10 Find a dataset of the 5 largest observed altitudes in the Fireballs & Bolides dataset.
45.11 Make a histogram of the differences in successive peak brightness times in the Fireballs & Bolides dataset.
45.12 Plot the nearest cities for the first 10 entries in the Fireballs & Bolides dataset, labeling each city.
45.13 Plot the nearest cities for the 10 entries with largest altitudes in the Fireballs & Bolides dataset, labeling each city.
What kinds of data can datasets contain?
Any kind: not just numbers and text but also images, graphs and lots more. There’s no need for all elements of a particular row or column to be the same type.
What are databases and how do they relate to Dataset?
SQL databases are strictly based on tables of data arranged in rows and columns of particular types, with additional data linked in through “foreign keys”.
Dataset can have any mixture of types of data, with any number of levels of nesting, and any hierarchical structure, somewhat more analogous to a NoSQL database, but with additional operations made possible by the symbolic nature of the language.