Assignment 6
Due: Wednesday Nov 24th at 11:59 pm ET
Goal
Analyze large vector data using Dask-GeoPandas and cloud-native data on Source Cooperative.
Instructions
You should submit this assignment to your existing geog213-assignments or geog313-assignments GitHub repository under a new directory named assignment-6/.
Include:
Scripts that contain the functions that you develop in this assignment.
An executed Jupyter notebook (.ipynb) with all the outputs.
A Dockerfile that builds a reproducible environment.
A README.md with instructions to reproduce the results of your work.
Data
For this assignment you will be working with the Google-Microsoft Open Buildings Dataset - Combined by VIDA, which is available on Source Cooperative. Read the dataset's Read Me to understand how the data is organized and to familiarize yourself with its metadata.
Tasks
1) Download Data (15 pts)
Write a function that receives the ISO code for a country and downloads the corresponding geoparquet file or files for that country.
Use this function to download the data for Haiti.
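One possible shape for such a function is sketched below using only the standard library. The base URL and the `country_iso=<ISO>/<ISO>.parquet` layout are assumptions about how the dataset is organized on Source Cooperative and should be checked against the dataset's Read Me; `country_url` and `download_country` are hypothetical names chosen for illustration.

```python
from pathlib import Path
import shutil
import urllib.request

# Assumption: base location of the per-country files; verify in the dataset Read Me.
BASE_URL = (
    "https://data.source.coop/vida/google-microsoft-open-buildings/"
    "geoparquet/by_country"
)


def country_url(iso_code, base_url=BASE_URL):
    """Build the URL of a country's file (assumed layout, 3-letter ISO code)."""
    iso = iso_code.strip().upper()
    if len(iso) != 3 or not iso.isalpha():
        raise ValueError(f"expected a 3-letter ISO code, got {iso_code!r}")
    return f"{base_url}/country_iso={iso}/{iso}.parquet"


def download_country(iso_code, out_dir="data", base_url=BASE_URL):
    """Download the country's file into out_dir and return the local path."""
    url = country_url(iso_code, base_url)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / url.rsplit("/", 1)[-1]
    if target.exists():
        return target  # skip files that were already downloaded

    # Stream to a temporary name so an interrupted download is not mistaken
    # for a complete file on the next run.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as fh:
        shutil.copyfileobj(response, fh)
    partial.replace(target)
    return target
```

With these assumptions, `download_country("HTI")` would place `HTI.parquet` under `data/`. A country split across several files would need a listing step that this sketch does not cover.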
2) Load Geoparquet Data (15 pts)
Write a function that loads the building footprints for the country of interest from the downloaded geoparquet file(s). The geoparquet file(s) in this dataset might be large depending on the country you are working with, so you should implement lazy data loading using dask_geopandas functionality.
Use this function to load building footprints for Haiti.
3) Analyze the Data (70 pts)
In this section, you will analyze the data using the functionality provided by Dask:
Plot a histogram of the areas of all buildings whose source is Microsoft.
Note: this might have a very skewed distribution. Try passing arguments to your histogram function to create a more even histogram plot, and explain your approach.
Count the number of building footprints that intersect with each other across the country.
From the intersecting building footprints, calculate how many:
Google building footprints intersect another Google building footprint
Microsoft building footprints intersect another Microsoft building footprint
Google building footprints intersect a Microsoft building footprint
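For the histogram task above, one common way to handle a strongly right-skewed variable is to use logarithmically spaced bins. The sketch below uses NumPy on a small made-up array; `log_spaced_edges` is a hypothetical helper, and the sample areas are not taken from the dataset.

```python
import numpy as np


def log_spaced_edges(areas, n_bins=50):
    """Return n_bins + 1 logarithmically spaced edges spanning the positive areas."""
    areas = np.asarray(areas, dtype=float)
    positive = areas[areas > 0]  # a log scale is undefined for zero or negative values
    return np.geomspace(positive.min(), positive.max(), n_bins + 1)


# Made-up building areas in square metres, for illustration only.
areas = np.array([1.0, 5.0, 50.0, 500.0, 1000.0])
edges = log_spaced_edges(areas, n_bins=3)
counts, _ = np.histogram(areas, bins=edges)
```

On a Dask collection, the minimum and maximum could be computed first and the same edges passed to `dask.array.histogram`, so that only the bin counts are brought into memory. Plotting the counts against a log-scaled x-axis then spreads the long tail more evenly.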
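For the intersection counts above, one possible approach is a spatial self-join. The sketch below runs on an in-memory GeoPandas GeoDataFrame (for example a single computed partition) rather than on the full Dask collection. The column name `bf_source` and the values `"google"` and `"microsoft"` are assumptions about the dataset's schema, and `intersection_counts` is a hypothetical name.

```python
import geopandas as gpd


def intersection_counts(gdf, source_col="bf_source"):
    """Count footprints that intersect at least one other footprint, by source."""
    slim = gdf[[source_col, "geometry"]].reset_index(drop=True)

    # Self-join: every row is paired with every footprint it intersects.
    pairs = gpd.sjoin(slim, slim, how="inner", predicate="intersects")

    # Drop the trivial match of each footprint with itself.
    not_self = pairs.index.to_numpy() != pairs["index_right"].to_numpy()
    pairs = pairs[not_self]

    left, right = f"{source_col}_left", f"{source_col}_right"

    def n_buildings(src_a, src_b):
        """Distinct footprints from src_a that intersect a footprint from src_b."""
        mask = ((pairs[left] == src_a) & (pairs[right] == src_b)).to_numpy()
        return int(pairs.index[mask].nunique())

    return {
        "any": int(pairs.index.nunique()),
        "google_google": n_buildings("google", "google"),
        "microsoft_microsoft": n_buildings("microsoft", "microsoft"),
        "google_microsoft": n_buildings("google", "microsoft"),
    }
```

Note that `intersects` also counts footprints that merely touch along an edge or at a corner. For the whole country, the same logic would need to account for pairs that fall in different partitions, for instance by spatially partitioning the data before joining.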
4) Spatial Join (This task is only for students registered at the 200-level) (50 pts)
Perform a spatial join between the building footprints dataset and the level-2 administrative boundaries dataset for Haiti. For each administrative unit, compute:
The total number of buildings.
The percentage of the area of the administrative unit covered by buildings.
The mean and median of building areas.
The proportion of buildings from Google vs. Microsoft within each unit.
Plot a map of the country with all the level-2 administrative boundaries that shows the number of buildings in each administrative unit.
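The per-unit statistics listed above could be assembled roughly as in the sketch below, written for in-memory GeoPandas objects. Several things here are assumptions rather than part of the assignment: the column name `bf_source` and the value `"google"`, the `admin_id` argument naming the identifier column of the boundaries layer, the choice of EPSG:32618 as a metric CRS for Haiti, and the decision to assign each building to the unit containing its representative point so that no building is counted twice.

```python
import geopandas as gpd


def summarize_by_admin(buildings, admin, admin_id,
                       source_col="bf_source", metric_crs="EPSG:32618"):
    """Join buildings to administrative units and compute per-unit statistics."""
    # Areas are only meaningful in a projected CRS with metre units.
    b = buildings[[source_col, "geometry"]].to_crs(metric_crs)
    a = admin[[admin_id, "geometry"]].to_crs(metric_crs)
    b = b.assign(area_m2=b.geometry.area)

    # One point per building, guaranteed to lie inside its footprint.
    points = b.set_geometry(b.geometry.representative_point())
    joined = gpd.sjoin(points, a, how="inner", predicate="within")
    joined["is_google"] = joined[source_col] == "google"

    stats = joined.groupby(admin_id).agg(
        n_buildings=("area_m2", "size"),
        built_area_m2=("area_m2", "sum"),
        mean_area_m2=("area_m2", "mean"),
        median_area_m2=("area_m2", "median"),
        google_share=("is_google", "mean"),
    )

    out = a.merge(stats, left_on=admin_id, right_index=True, how="left")
    out["pct_covered"] = 100 * out["built_area_m2"] / out.geometry.area
    return out
```

The Microsoft share is `1 - google_share` when only those two sources are present. Because the result keeps the unit geometries, a call such as `out.plot(column="n_buildings", legend=True)` would draw the requested map, assuming matplotlib is available.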