{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Dask-GeoPandas\n", "\n", "**Attribution**: *This notebook is a revised version of the [Basic Introduction](https://dask-geopandas.readthedocs.io/en/stable/guide/basic-intro.html) notebook from the Dask-GeoPandas documentation.* \n", "\n", "This notebook illustrates the basic API of Dask-GeoPandas and provides a basic timing comparison between operations on a `geopandas.GeoDataFrame` and a parallel `dask_geopandas.GeoDataFrame`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can access this notebook (in a Docker image) on this [GitHub repo](https://github.com/HamedAlemo/vector-data-tutorial)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "import os\n", "import requests\n", "import numpy as np\n", "import geopandas as gpd\n", "import dask_geopandas as dg\n", "import cartopy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download a Sample Dataset using Cartopy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are going to use the [Natural Earth dataset](https://www.naturalearthdata.com/). This dataset provides vector files for physical and cultural boundaries at several spatial scales. We will use the [110m Admin 0 Countries dataset](https://www.naturalearthdata.com/downloads/110m-cultural-vectors/110m-admin-0-countries/). \n", "\n", "You can use `cartopy` to download this dataset locally. The file is downloaded to a local cache folder, and the function returns the path to the shapefile.
" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "admin_shp = cartopy.io.shapereader.natural_earth(\n", " resolution='110m',\n", " category='cultural',\n", " name='admin_0_countries'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a Dask-GeoPandas `GeoDataFrame`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many ways to create a `dask_geopandas.GeoDataFrame`. If your initial data fits in memory, you can create it from a `geopandas.GeoDataFrame` using the `from_geopandas` function:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "ERROR 1: PROJ: proj_create_from_database: Open of /opt/conda/envs/vector_tutorial/share/proj failed\n" ] } ], "source": [ "gdf = gpd.read_file(admin_shp)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead>\n",
"<tr><th></th><th>featurecla</th><th>scalerank</th><th>LABELRANK</th><th>SOVEREIGNT</th><th>SOV_A3</th><th>ADM0_DIF</th><th>LEVEL</th><th>TYPE</th><th>TLC</th><th>ADMIN</th><th>...</th><th>FCLASS_TR</th><th>FCLASS_ID</th><th>FCLASS_PL</th><th>FCLASS_GR</th><th>FCLASS_IT</th><th>FCLASS_NL</th><th>FCLASS_SE</th><th>FCLASS_BD</th><th>FCLASS_UA</th><th>geometry</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>Admin-0 country</td><td>1</td><td>6</td><td>Fiji</td><td>FJI</td><td>0</td><td>2</td><td>Sovereign country</td><td>1</td><td>Fiji</td><td>...</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>MULTIPOLYGON (((180 -16.06713, 180 -16.55522, ...</td></tr>\n",
"<tr><th>1</th><td>Admin-0 country</td><td>1</td><td>3</td><td>United Republic of Tanzania</td><td>TZA</td><td>0</td><td>2</td><td>Sovereign country</td><td>1</td><td>United Republic of Tanzania</td><td>...</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>POLYGON ((33.90371 -0.95, 34.07262 -1.05982, 3...</td></tr>\n",
"<tr><th>2</th><td>Admin-0 country</td><td>1</td><td>7</td><td>Western Sahara</td><td>SAH</td><td>0</td><td>2</td><td>Indeterminate</td><td>1</td><td>Western Sahara</td><td>...</td><td>Unrecognized</td><td>Unrecognized</td><td>Unrecognized</td><td>None</td><td>None</td><td>Unrecognized</td><td>None</td><td>None</td><td>None</td><td>POLYGON ((-8.66559 27.65643, -8.66512 27.58948...</td></tr>\n",
"<tr><th>3</th><td>Admin-0 country</td><td>1</td><td>2</td><td>Canada</td><td>CAN</td><td>0</td><td>2</td><td>Sovereign country</td><td>1</td><td>Canada</td><td>...</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>MULTIPOLYGON (((-122.84 49, -122.97421 49.0025...</td></tr>\n",
"<tr><th>4</th><td>Admin-0 country</td><td>1</td><td>2</td><td>United States of America</td><td>US1</td><td>1</td><td>2</td><td>Country</td><td>1</td><td>United States of America</td><td>...</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>MULTIPOLYGON (((-122.84 49, -120 49, -117.0312...</td></tr>\n",
"</tbody>\n", "</table>\n", "<p>5 rows × 169 columns</p>\n", "</div>" ], "text/plain": [ " featurecla scalerank LABELRANK SOVEREIGNT SOV_A3 \\\n", "0 Admin-0 country 1 6 Fiji FJI \n", "1 Admin-0 country 1 3 United Republic of Tanzania TZA \n", "2 Admin-0 country 1 7 Western Sahara SAH \n", "3 Admin-0 country 1 2 Canada CAN \n", "4 Admin-0 country 1 2 United States of America US1 \n", "\n", " ADM0_DIF LEVEL TYPE TLC ADMIN ... \\\n", "0 0 2 Sovereign country 1 Fiji ... \n", "1 0 2 Sovereign country 1 United Republic of Tanzania ... \n", "2 0 2 Indeterminate 1 Western Sahara ... \n", "3 0 2 Sovereign country 1 Canada ... \n", "4 1 2 Country 1 United States of America ... \n", "\n", " FCLASS_TR FCLASS_ID FCLASS_PL FCLASS_GR FCLASS_IT \\\n", "0 None None None None None \n", "1 None None None None None \n", "2 Unrecognized Unrecognized Unrecognized None None \n", "3 None None None None None \n", "4 None None None None None \n", "\n", " FCLASS_NL FCLASS_SE FCLASS_BD FCLASS_UA \\\n", "0 None None None None \n", "1 None None None None \n", "2 Unrecognized None None None \n", "3 None None None None \n", "4 None None None None \n", "\n", " geometry \n", "0 MULTIPOLYGON (((180 -16.06713, 180 -16.55522, ... \n", "1 POLYGON ((33.90371 -0.95, 34.07262 -1.05982, 3... \n", "2 POLYGON ((-8.66559 27.65643, -8.66512 27.58948... \n", "3 MULTIPOLYGON (((-122.84 49, -122.97421 49.0025... \n", "4 MULTIPOLYGON (((-122.84 49, -120 49, -117.0312... \n", "\n", "[5 rows x 169 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdf.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When creating a `dask_geopandas.GeoDataFrame`, we have to specify how to partition it using either `npartitions` or `chunksize`. Here we use `npartitions=4` to split it into four roughly equal partitions."
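, "\n", "\n",
"As a rough illustration of what `npartitions=4` means for this dataset, the sketch below splits 177 rows (the length of the area Series computed later in this notebook) into four near-equal contiguous blocks and derives the index boundaries between them. It is a plain-Python approximation written for this tutorial, not the partitioning code used by Dask, and `partition_sizes` is a hypothetical helper name.

```python
def partition_sizes(n_rows, npartitions):
    # Spread the rows as evenly as possible: the first `extra` partitions
    # receive one additional row each.
    base, extra = divmod(n_rows, npartitions)
    return [base + 1 if i < extra else base for i in range(npartitions)]


n_rows = 177  # assumption: number of rows in the countries layer used here
sizes = partition_sizes(n_rows, 4)

# Division boundaries: the first index label of each partition,
# followed by the last index label of the final partition.
divisions = []
start = 0
for size in sizes:
    divisions.append(start)
    start += size
divisions.append(n_rows - 1)

print(sizes)
print(divisions)
```

With 177 rows the blocks hold 45, 44, 44 and 44 rows, and the boundaries match the divisions 0, 45, 89, 133 and 176 that appear in the `ddf` representation."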
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [], "source": [ "ddf = dg.from_geopandas(gdf, npartitions=4)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div><strong>Dask-GeoPandas GeoDataFrame Structure:</strong></div>\n", "<div>npartitions=4 (divisions: 0, 45, 89, 133, 176), 169 columns</div>\n", "<div>Dask Name: frompandas, 1 expression</div>
" ], "text/plain": [ "Dask GeoDataFrame Structure:\n", " featurecla scalerank LABELRANK SOVEREIGNT SOV_A3 ADM0_DIF LEVEL TYPE TLC ADMIN ADM0_A3 GEOU_DIF GEOUNIT GU_A3 SU_DIF SUBUNIT SU_A3 BRK_DIFF NAME NAME_LONG BRK_A3 BRK_NAME BRK_GROUP ABBREV POSTAL FORMAL_EN FORMAL_FR NAME_CIAWF NOTE_ADM0 NOTE_BRK NAME_SORT NAME_ALT MAPCOLOR7 MAPCOLOR8 MAPCOLOR9 MAPCOLOR13 POP_EST POP_RANK POP_YEAR GDP_MD GDP_YEAR ECONOMY INCOME_GRP FIPS_10 ISO_A2 ISO_A2_EH ISO_A3 ISO_A3_EH ISO_N3 ISO_N3_EH UN_A3 WB_A2 WB_A3 WOE_ID WOE_ID_EH WOE_NOTE ADM0_ISO ADM0_DIFF ADM0_TLC ADM0_A3_US ADM0_A3_FR ADM0_A3_RU ADM0_A3_ES ADM0_A3_CN ADM0_A3_TW ADM0_A3_IN ADM0_A3_NP ADM0_A3_PK ADM0_A3_DE ADM0_A3_GB ADM0_A3_BR ADM0_A3_IL ADM0_A3_PS ADM0_A3_SA ADM0_A3_EG ADM0_A3_MA ADM0_A3_PT ADM0_A3_AR ADM0_A3_JP ADM0_A3_KO ADM0_A3_VN ADM0_A3_TR ADM0_A3_ID ADM0_A3_PL ADM0_A3_GR ADM0_A3_IT ADM0_A3_NL ADM0_A3_SE ADM0_A3_BD ADM0_A3_UA ADM0_A3_UN ADM0_A3_WB CONTINENT REGION_UN SUBREGION REGION_WB NAME_LEN LONG_LEN ABBREV_LEN TINY HOMEPART MIN_ZOOM MIN_LABEL MAX_LABEL LABEL_X LABEL_Y NE_ID WIKIDATAID NAME_AR NAME_BN NAME_DE NAME_EN NAME_ES NAME_FA NAME_FR NAME_EL NAME_HE NAME_HI NAME_HU NAME_ID NAME_IT NAME_JA NAME_KO NAME_NL NAME_PL NAME_PT NAME_RU NAME_SV NAME_TR NAME_UK NAME_UR NAME_VI NAME_ZH NAME_ZHT FCLASS_ISO TLC_DIFF FCLASS_TLC FCLASS_US FCLASS_FR FCLASS_RU FCLASS_ES FCLASS_CN FCLASS_TW FCLASS_IN FCLASS_NP FCLASS_PK FCLASS_DE FCLASS_GB FCLASS_BR FCLASS_IL FCLASS_PS FCLASS_SA FCLASS_EG FCLASS_MA FCLASS_PT FCLASS_AR FCLASS_JP FCLASS_KO FCLASS_VN FCLASS_TR FCLASS_ID FCLASS_PL FCLASS_GR FCLASS_IT FCLASS_NL FCLASS_SE FCLASS_BD FCLASS_UA geometry\n", "npartitions=4 \n", "0 string int32 int32 string string int32 int32 string string string string int32 string string int32 string string int32 string string string string string string string string string string string string string string int32 int32 int32 int32 float64 int32 int32 int32 int32 string string string string string string string string string string string 
string int32 int32 string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string int32 int32 string string string string int32 int32 int32 int32 int32 float64 float64 float64 float64 float64 int64 string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string string geometry\n", "45 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...\n", "89 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...\n", "133 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...\n", "176 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 
...\n", "Dask Name: frompandas, 1 expression\n", "Expr=df" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ddf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try a computation on a non-geometry column:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "CONTINENT\n", "Africa 51\n", "Asia 47\n", "Seven seas (open ocean) 1\n", "South America 13\n", "Oceania 7\n", "Antarctica 1\n", "Europe 39\n", "North America 18\n", "Name: count, dtype: int64[pyarrow]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/envs/vector_tutorial/lib/python3.12/site-packages/dask/dataframe/core.py:7175: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.\n", "\n", " df = func(*args, **kwargs)\n" ] } ], "source": [ "ddf[\"CONTINENT\"].value_counts().compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also call one of the `geopandas`-specific methods or attributes:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/envs/vector_tutorial/lib/python3.12/site-packages/dask_geopandas/expr.py:185: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect.
Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.\n", "\n", " meta = getattr(self._meta, attr)\n" ] }, { "data": { "text/plain": [ "Dask Series Structure:\n", "npartitions=4\n", "0 float64\n", "45 ...\n", "89 ...\n", "133 ...\n", "176 ...\n", "Dask Name: area, 3 expressions\n", "Expr=MapPartitions(getattr)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ddf.geometry.area" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, without calling `compute()`, the resulting Series does not yet contain any values. Also note the warning about the area calculation: because the CRS of our dataset is `EPSG:4326`, the areas are computed in squared degrees of latitude and longitude, which are not meaningful physical areas. To get areas in real units, re-project the geometries to a projected CRS first, as the warning suggests." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 1.639511\n", "1 76.301964\n", "2 8.603984\n", "3 1712.995228\n", "4 1122.281921\n", " ...
\n", "172 8.604719\n", "173 1.479321\n", "174 1.231641\n", "175 0.639000\n", "176 51.196106\n", "Length: 177, dtype: float64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ddf.geometry.area.compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Timing comparison: Point-in-polygon with 10 million points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `GeoDataFrame` used above is too small to show any benefit from parallelization with Dask (the overhead of the task scheduler is larger than the actual operation on such a small dataframe), so let's create a bigger `GeoDataFrame` of random points:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "tags": [] }, "outputs": [], "source": [ "N = 10_000_000" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [] }, "outputs": [], "source": [ "points_df = gpd.GeoDataFrame(geometry=gpd.points_from_xy(np.random.randn(N), np.random.randn(N)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, create the Dask-GeoPandas version of this `GeoDataFrame`:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "tags": [] }, "outputs": [], "source": [ "points_ddf = dg.from_geopandas(points_df, npartitions=16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a polygon (a unit square) and check which points are located within it:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "tags": [] }, "outputs": [], "source": [ "import shapely.geometry\n", "box = shapely.geometry.box(0, 0, 1, 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `within` operation results in a lazy boolean Series:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "Dask Series Structure:\n", "npartitions=16\n", "0 bool\n", "625000 ...\n", " ...
\n", "9375000 ...\n", "9999999 ...\n", "Dask Name: within, 2 expressions\n", "Expr=UFunc(within)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "points_ddf.within(box)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fraction of points that fall within the polygon:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0.1164894" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(points_ddf.within(box).sum() / len(points_ddf)).compute()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's compare how long the `within` check takes with GeoPandas and with Dask-GeoPandas:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "235 ms ± 2.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" ] } ], "source": [ "%timeit points_df.within(box)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "46.7 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "%timeit points_ddf.within(box).compute()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 4 }