Creating a Choropleth Map of the World in Python using GeoPandas
There are different ways of creating choropleth maps in Python. In a previous notebook, I showed how you can use the Basemap library to accomplish this. More than 2 years have passed since publication and the available tools have evolved a lot. In this notebook I use the GeoPandas library to create a choropleth map. As you'll see the code is more concise and easier to follow along.
To allow for better comparison of the 2 approaches I'll again use a World Bank dataset, but to make it a bit more interesting the map will show the number of Individuals using the Internet (% of population) based on the International Telecommunication Union, World Telecommunication/ICT Development Report and database.
Setup
Load the necessary modules and specify the files for input and output, set the number of colors to use, the size of the figure in inches (width, height) and meta information about what is displayed.
%load_ext signature %matplotlib inline import os import matplotlib.pyplot as plt import pandas as pd import geopandas as gpd from helpers import slug datafile = os.path.expanduser('~/data/worldbank/API_IT.NET.USER.ZS_DS2_en_csv_v2.csv') shapefile = os.path.expanduser('~/data/geo/naturalearthdata.com/ne_10m_admin_0_countries_lakes/ne_10m_admin_0_countries_lakes.shp') colors = 9 cmap = 'Blues' figsize = (16, 10) year = '2016' cols = ['Country Name', 'Country Code', year] title = 'Individuals using the Internet (% of population) in {}'.format(year) imgfile = 'img/{}.png'.format(slug(title)) description = ''' Individuals who have used the Internet from any location in the last 3 months via any device based on the International Telecommunication Union, World Telecommunication/ICT Development Report and database. Data: World Bank - worldbank.org • Author: Ramiro Gómez - ramiro.org'''.strip()
Create a GeoDataFrame from the Admin 0 - Countries shapefile available from Natural Earth Data and show a sample of 5 records. We only read the ADM0_A3
and geometry
columns, which contain the 3-letter country codes defined in ISO 3166-1 alpha-3 and the country shapes as polygons respectively.
gdf = gpd.read_file(shapefile)[['ADM0_A3', 'geometry']].to_crs('+proj=robin') gdf.sample(5)
ADM0_A3 | geometry | |
---|---|---|
52 | CPV | (POLYGON ((-2309170.347710123 1587430.58249633... |
100 | HTI | (POLYGON ((-6860253.524149654 1936562.75358100... |
253 | ZMB | POLYGON ((2934183.839812623 -918610.8758938238... |
241 | VAT | POLYGON ((1073492.780621549 4478198.522616305,... |
219 | SYR | POLYGON ((3738892.582840674 3968612.091458879,... |
Next read the datafile downloaded from the World Bank Open Data site and create a pandas DataFrame that contains values for Country Code
, Country Name
and the percentages of Internet users in the year 2016
.
df = pd.read_csv(datafile, skiprows=4, usecols=cols) df.sample(5)
Country Name | Country Code | 2016 | |
---|---|---|---|
144 | Macao SAR, China | MAC | 81.642985 |
157 | Malta | MLT | 77.289395 |
204 | Sudan | SDN | 28.000000 |
165 | Mauritius | MUS | 53.226178 |
11 | Australia | AUS | 88.238658 |
Next we merge the data frames on the columns containing the 3-letter country codes and show summary statistics as returned from the describe method.
merged = gdf.merge(df, left_on='ADM0_A3', right_on='Country Code') merged.describe()
2016 | |
---|---|
count | 201.000000 |
mean | 51.417892 |
std | 28.569949 |
min | 1.177119 |
25% | 25.366301 |
50% | 54.000000 |
75% | 76.409085 |
max | 98.240016 |
Mapping the data
The merge operation above returned a GeoDataFrame. From this data structure it is very easy to create a choropleth map by invoking the plot
method. We need to specify the column
to plot and since we don't want a continuous color scale we set scheme
to equal_interval
and the number of classes k
to 9
. We also set the size of the figure and show a legend in the plot.
ax = merged.dropna().plot(column=year, cmap=cmap, figsize=figsize, scheme='equal_interval', k=colors, legend=True)
This is pretty nice already, but before publishing this map, there remains some work to be done. As is often the case, some data is missing. You may or may not have noticed it, but the corresponding countries are not shown at all, look for North Korea. The call to dropna()
right before the plot()
call removed these records from the plotted GeoDataFrame.
We could just leave it like that, because we simply don't know the values, but I'm sure that would put off some people. So let's draw these countries and fill them with a light gray and a striped pattern as in this D3.js based map.
Moreover, the image taken by itself provides no clue about what is shown, so we'll add a title and an annotation. Also we to turn off the axes, cut off some space in the far west and east, and move the legend to the lower left of the figure, because there is more empty space.
merged[merged.isna().any(axis=1)].plot(ax=ax, color='#fafafa', hatch='///') ax.set_title(title, fontdict={'fontsize': 20}, loc='left') ax.annotate(description, xy=(0.1, 0.1), size=12, xycoords='figure fraction') ax.set_axis_off() ax.set_xlim([-1.5e7, 1.7e7]) ax.get_legend().set_bbox_to_anchor((.12, .4)) ax.get_figure()
I think this map is fine for publication and the code is pretty easy to follow, but there is some room for improvement as far as I'm concerned. For example reducing the left and right margins, turning off axis display and specifying a color and/or pattern for missing data via parameters to the plot
method would be nice to have. In any case, I think the GeoPandas project is headed in a good direction and hope it will continue to evolve as a library for analyzing and mapping geographic data in Python.
%signature
Author: Ramiro Gómez • Last edited: March 23, 2018Linux 4.13.0-37-generic - CPython 3.6.4 - IPython 6.2.1 - matplotlib 2.2.2 - numpy 1.14.2 - pandas 0.22.0
Shirts for Python Programmers
Published:March 20, 2018 by Ramiro Gómez. If you want to be notified about new content, click here to subscribe to the newsletter or RSS feed.
Disclosure: External links on this page may contain affiliate IDs, which means that I earn a commission if you make a purchase via such a link. This allows me to operate this site and offer hopefully valuable content that is freely accessible. More information about affiliate programs.
© Ramiro Gómez. Berlin, Germany.
Be informed about new content
Share this page
Hit ESC or click X on top right to close this dialog.