Work with a Large Database
This example looks at a terabyte-scale database and runs some basic queries that would be impossible to perform in memory.
OpenStreetMap is a collaborative effort to create a free map of the world. The project was started in 2004, and its more than two million users have generated over a terabyte of data. This makes it a good example database for showcasing out-of-core data science. Instructions on how to get the data and set up a database server can be found here.
Register the database for use with entities.
This is a very large database; its largest table "planet_osm_nodes" takes up almost 200 GB on disk. Here is how many rows it contains.
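The point of a row count like this is that the database engine does the counting and returns a single number, so the table never has to fit in memory. A minimal sketch of that idea, assuming Python's built-in sqlite3 module and an in-memory toy table as a stand-in for the real server; the columns and the three placeholder rows are hypothetical, not real OpenStreetMap data:

```python
import sqlite3

# Toy stand-in: an in-memory SQLite database instead of the real server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE planet_osm_nodes (id INTEGER PRIMARY KEY, lat INTEGER, lon INTEGER)"
)
conn.executemany(
    "INSERT INTO planet_osm_nodes (id, lat, lon) VALUES (?, ?, ?)",
    [(1, 0, 0), (2, 10, 10), (3, 20, 20)],  # placeholder rows, not real data
)

# The count is computed inside the database; only one number is returned.
(row_count,) = conn.execute("SELECT COUNT(*) FROM planet_osm_nodes").fetchone()
print(row_count)
```

Against a real server, only the connection line would change; the query itself stays the same.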
Suppose you wanted to find all the streets that contain "Wolf".
Unfortunately, the results contain quite a few duplicates, but you can count the number of distinct names.
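The street search and the duplicate check can be sketched as a substring match followed by a distinct count. This is only an illustration on a toy in-memory SQLite table: the table name `planet_osm_line` and the `name` and `highway` columns are assumptions about the schema, and the rows are made up. Note also that SQLite's `LIKE` ignores ASCII case by default, whereas other database engines may match case-sensitively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE planet_osm_line (name TEXT, highway TEXT)")
conn.executemany(
    "INSERT INTO planet_osm_line VALUES (?, ?)",
    [
        ("Wolfstrasse", "residential"),
        ("Wolfstrasse", "residential"),  # same street split into two segments
        ("Am Wolfsberg", "tertiary"),
        ("Main Street", "primary"),
        ("Wolf Creek", None),  # no highway tag, so not treated as a street
    ],
)

pattern = "%Wolf%"  # substring match on the street name

# All matching streets, duplicates included.
matches = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM planet_osm_line "
        "WHERE highway IS NOT NULL AND name LIKE ? ORDER BY name",
        (pattern,),
    )
]

# Number of distinct matching names, computed by the database.
(distinct_count,) = conn.execute(
    "SELECT COUNT(DISTINCT name) FROM planet_osm_line "
    "WHERE highway IS NOT NULL AND name LIKE ?",
    (pattern,),
).fetchone()

print(matches, distinct_count)
```

Duplicates arise naturally in this kind of data because one street is often stored as several segments that share a name.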
Another interesting thing to look at is the "planet_osm_table", which contains lots of metadata about various objects. For example, you can check how many trees were mapped.
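In OpenStreetMap, individual trees are tagged `natural=tree`, so counting trees amounts to a filtered row count. A minimal sketch, assuming the metadata lives in a table with a `natural` column; the table name `planet_osm_point` used here is an assumption and may differ from the table the text refers to, and the rows are placeholders. The column name is quoted because NATURAL is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "natural" is quoted because NATURAL is an SQL keyword.
conn.execute('CREATE TABLE planet_osm_point (osm_id INTEGER, "natural" TEXT)')
conn.executemany(
    "INSERT INTO planet_osm_point VALUES (?, ?)",
    [(1, "tree"), (2, "tree"), (3, "peak"), (4, None)],  # placeholder rows
)

# Count only the rows tagged as trees.
(tree_count,) = conn.execute(
    'SELECT COUNT(*) FROM planet_osm_point WHERE "natural" = ?', ("tree",)
).fetchone()
print(tree_count)
```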
Or what the most common sport structures are.
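Finding the most common values of a tag is a group-and-count query, again evaluated entirely inside the database. The sketch below assumes a `sport` column on a hypothetical `planet_osm_point` table and uses made-up rows, so the resulting ranking says nothing about the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE planet_osm_point (osm_id INTEGER, sport TEXT)")
conn.executemany(
    "INSERT INTO planet_osm_point VALUES (?, ?)",
    [
        (1, "soccer"), (2, "soccer"), (3, "soccer"),
        (4, "tennis"), (5, "tennis"),
        (6, "basketball"),
        (7, None),  # most objects carry no sport tag at all
    ],
)

# Group by tag value, count each group, and keep the two largest groups.
top_sports = conn.execute(
    "SELECT sport, COUNT(*) AS n FROM planet_osm_point "
    "WHERE sport IS NOT NULL "
    "GROUP BY sport ORDER BY n DESC, sport LIMIT 2"
).fetchall()
print(top_sports)
```

Only the aggregated rows leave the database, which is what keeps this feasible on a table far larger than memory.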
Visualize the result.
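Because the aggregated result is tiny, any plotting tool can display it. As a dependency-free sketch, the hypothetical helper `render_bars` below draws a text bar chart; the counts are placeholders for illustration, not query results.

```python
def render_bars(counts, width=20):
    """Render (label, count) pairs as horizontal text bars scaled to `width`."""
    largest = max(n for _, n in counts)
    pad = max(len(label) for label, _ in counts)
    lines = []
    for label, n in counts:
        length = round(n / largest * width)  # bar length relative to the maximum
        lines.append(f"{label:<{pad}} {'#' * length} {n}")
    return lines


# Placeholder counts, for illustration only.
sample_counts = [("soccer", 3), ("tennis", 2), ("basketball", 1)]
chart_lines = render_bars(sample_counts)
print("\n".join(chart_lines))
```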