The goal is to build something similar to Our World In Data's interactive map. This work was inspired by Shivangi Patel's guide.
We will work with the same dataset: prevalence of obesity (BMI ≥ 30) among adults, estimated by country and standardised by age.
Data was obtained from the Global Health Observatory data repository (World Health Organization): under "Download complete data set as", click on "more...", then under "CSV", download the "list containing text, codes and values".
First, let's clean up and organise the data. We will use the 'pandas' library for this.
import pandas as pd
# read file
data = pd.read_csv('data-verbose.csv')
data.columns
# discard male only and female only data
data = data.loc[data["SEX (DISPLAY)"] == 'Both sexes']
# only keep columns of interest
data = data[['YEAR (CODE)','COUNTRY (CODE)','COUNTRY (DISPLAY)','Numeric']]
data.reset_index(inplace=True, drop=True)
data.rename(columns={
    'YEAR (CODE)': 'Year',
    'COUNTRY (CODE)': 'Code',
    'COUNTRY (DISPLAY)': 'Country',
    'Numeric': 'Prevalence'
}, inplace=True)
data.head()
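Before going further, it can help to check that the cleaned table has exactly one row per country and year. The following is a minimal sketch on a toy table: the values are made up, and `toy`, `clean` and `n_duplicates` are hypothetical names. It mirrors the filter and rename steps above, written as a single chain.

```python
import pandas as pd

# Toy stand-in for the WHO export; the numbers are invented for illustration
toy = pd.DataFrame({
    "YEAR (CODE)": [2016, 2016, 2016, 2015],
    "COUNTRY (CODE)": ["FRA", "FRA", "FRA", "FRA"],
    "COUNTRY (DISPLAY)": ["France"] * 4,
    "SEX (DISPLAY)": ["Both sexes", "Male", "Female", "Both sexes"],
    "Numeric": [21.6, 22.0, 21.1, 21.2],
})

# Same filter, column selection and rename as above, as one chain
clean = (
    toy.loc[toy["SEX (DISPLAY)"] == "Both sexes",
            ["YEAR (CODE)", "COUNTRY (CODE)", "COUNTRY (DISPLAY)", "Numeric"]]
    .rename(columns={
        "YEAR (CODE)": "Year",
        "COUNTRY (CODE)": "Code",
        "COUNTRY (DISPLAY)": "Country",
        "Numeric": "Prevalence",
    })
    .reset_index(drop=True)
)

# Once the sex-specific rows are gone, each (Code, Year) pair should be unique
n_duplicates = int(clean.duplicated(subset=["Code", "Year"]).sum())
```

If `n_duplicates` is not zero on the real data, the slider would later show several values for the same country and year.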
Since we'll be coloring each country according to its obesity prevalence, we need access to the shape of each country. This is done using the 'geopandas' package and data from natural-earth-vector. Download all the files named "ne_110m_admin_0_countries.*".
import geopandas as gpd
# read shapes
geo = gpd.read_file("ne_110m_admin_0_countries.shp")[['ADMIN', 'ADM0_A3', 'geometry']]
geo.columns = ['Country', 'Code', 'geometry']
geo.head()
If we display the map now, we will see that Antarctica takes a lot of space. Since we don't have data on it, let's drop it.
geo = geo.loc[~(geo['Country'] == 'Antarctica')]
If we look closely at the data, we are missing information on some countries.
data[data["Prevalence"].isna()]["Country"].unique()
In the case of Sudan, it's more of a labelling problem, because Sudan was split into two separate countries in 2011.
data[data["Country"].str.contains("Sudan")]["Country"].unique()
geo[geo["Country"].str.contains("Sudan")]["Country"].unique()
In the current version of the dataset, only "Sudan (former)" contains data, but our version of the map only has the two independent states, not the former one. We will simply copy the data from "Sudan (former)" into both new countries and drop the former.
for year in data["Year"].unique():
    data.loc[(data["Country"].isin(["Sudan","South Sudan"])) & (data["Year"] == year),
             "Prevalence"] = data[(data["Country"] == "Sudan (former)") & (data["Year"] == year)]["Prevalence"].values[0]
data = data.loc[~(data['Country'] == 'Sudan (former)')]
data[(data["Country"].str.contains("Sudan")) & (data["Year"] == 2016)]
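The same copy can also be written without a loop over years. This is only a sketch on a toy table with invented values (`toy`, `former` and `successors` are hypothetical names): it builds a year-to-prevalence lookup from "Sudan (former)" and maps it onto both successor states.

```python
import pandas as pd

# Toy table; the prevalence values are made up for illustration
toy = pd.DataFrame({
    "Country": ["Sudan (former)", "Sudan (former)", "Sudan", "Sudan",
                "South Sudan", "South Sudan"],
    "Year": [2015, 2016, 2015, 2016, 2015, 2016],
    "Prevalence": [6.3, 6.6, None, None, None, None],
})

# Year -> prevalence lookup built from the former state
former = toy.loc[toy["Country"] == "Sudan (former)"].set_index("Year")["Prevalence"]

# Fill both successor states from the lookup, then drop the former state
successors = toy["Country"].isin(["Sudan", "South Sudan"])
toy.loc[successors, "Prevalence"] = toy.loc[successors, "Year"].map(former)
toy = toy.loc[toy["Country"] != "Sudan (former)"].reset_index(drop=True)
```

A year that is missing from the lookup simply stays empty instead of raising an error, which is a small difference from the loop above.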
# Also, the 3-letter code for "South Sudan" is "SSD" and not "SDS" in the geographic data
geo.loc[geo["Code"]=="SDS", "Code"] = "SSD"
geo[geo["Code"]=="SSD"]
Now let's create the interactive plot. We will use the 'bokeh' and 'matplotlib' libraries for this.
from bokeh.io import save, show, output_file, output_notebook, reset_output, export_png
from bokeh.plotting import figure
from bokeh.models import (
    GeoJSONDataSource, ColumnDataSource, ColorBar, Slider, Spacer,
    HoverTool, TapTool, Panel, Tabs, Legend, Toggle, LegendItem,
)
from bokeh.palettes import brewer
from bokeh.models.callbacks import CustomJS
from bokeh.models.widgets import Div
from bokeh.layouts import widgetbox, row, column
from matplotlib import pyplot as plt
from matplotlib.colors import rgb2hex
The first thing we need to do is to group our data in predefined bins. We will assign each bin to a color.
# Create bins to color each country
bins = [0,2,5,10,15,20,25,30,100]
# create stylish labels
bin_labels = [f'≤{bins[1]}%'] + [f'{bins[i]}-{bins[i+1]}%' for i in range(1,len(bins)-2)] + [f'>{bins[-2]}%']
# assign each row to a bin
data['bin'] = pd.cut(
    data['Prevalence'], bins=bins, right=True, include_lowest=True, precision=0, labels=bin_labels,
).astype(str)
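To see how values that fall exactly on a bin edge are handled, here is a small sketch that applies the same `pd.cut` settings to a few hand-picked values (the `values` series is hypothetical, chosen only to hit the edges). With `right=True` each bin includes its upper edge, and `include_lowest=True` keeps 0 in the first bin.

```python
import pandas as pd

bins = [0, 2, 5, 10, 15, 20, 25, 30, 100]
bin_labels = (
    [f"≤{bins[1]}%"]
    + [f"{bins[i]}-{bins[i+1]}%" for i in range(1, len(bins) - 2)]
    + [f">{bins[-2]}%"]
)

# Hand-picked values on and around the bin edges (not real data)
values = pd.Series([0.0, 2.0, 2.1, 30.0, 30.5])
binned = pd.cut(values, bins=bins, right=True, include_lowest=True,
                labels=bin_labels).astype(str)

print(binned.tolist())
# ['≤2%', '≤2%', '2-5%', '25-30%', '>30%']
```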
# Merge the geographic data with obesity data
df = geo.merge(data, on='Code', how='left')
df = df.drop(columns="Country_y").rename(columns={"Country_x":"Country"})
df[df["Prevalence"].isna()]["Country"].unique()
# Add a 'No data' bin for countries without data on their obesity
df.loc[df['Prevalence'].isna(), 'bin'] = 'No data'
df.fillna('No data', inplace = True)
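A left merge keeps every shape and leaves the prevalence empty when a code has no match, which is why a mismatch such as "SDS" versus "SSD" silently produces a "No data" country. One way to list such cases is the `indicator` option of `merge`. The sketch below uses toy tables with invented values (`geo_toy`, `data_toy` and `unmatched` are hypothetical names).

```python
import pandas as pd

# Toy shapes table (geometry omitted) and toy obesity table; values are made up
geo_toy = pd.DataFrame({"Country": ["France", "South Sudan", "Greenland"],
                        "Code": ["FRA", "SDS", "GRL"]})
data_toy = pd.DataFrame({"Code": ["FRA", "SSD"],
                         "Prevalence": [21.6, 6.6]})

# indicator=True adds a '_merge' column telling where each row came from
merged = geo_toy.merge(data_toy, on="Code", how="left", indicator=True)

# Shapes that found no matching row in the obesity table
unmatched = merged.loc[merged["_merge"] == "left_only", "Country"].tolist()

print(unmatched)
# ['South Sudan', 'Greenland']
```

Here "Greenland" is genuinely absent from the toy data, while "South Sudan" is only missing because of the code mismatch.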
# Define a yellow to red color palette
palette = brewer['YlOrRd'][len(bins)-1]
# Reverse color order so that dark red corresponds to highest obesity
palette = palette[::-1]
# Assign obesity prevalence to a color
def val_to_color(value, nan_color='#d9d9d9'):
    if isinstance(value, str): return nan_color
    for i in range(1,len(bins)):
        if value <= bins[i]:
            return palette[i-1]
df['color'] = df['Prevalence'].apply(val_to_color)
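The lookup above walks through the bin edges until it finds the first upper edge that is not smaller than the value. As a sketch, the same rule can be expressed with the standard library's `bisect_left`. The function name `val_to_color_bisect` and the stand-in palette entries are hypothetical; the real palette holds the reversed YlOrRd hex codes.

```python
from bisect import bisect_left

bins = [0, 2, 5, 10, 15, 20, 25, 30, 100]
# Stand-in palette entries (hypothetical), one per bin
palette = [f"shade_{i}" for i in range(len(bins) - 1)]

def val_to_color_bisect(value, nan_color="#d9d9d9"):
    """Return the palette entry of the first bin whose upper edge is >= value."""
    if isinstance(value, str):  # the 'No data' marker
        return nan_color
    # lo=1 skips the lowest edge, so 0 still lands in the first bin
    i = bisect_left(bins, value, lo=1)
    if i >= len(bins):  # above the last edge: no bin, like the loop version
        return None
    return palette[i - 1]

print(val_to_color_bisect(2), val_to_color_bisect(2.1), val_to_color_bisect("No data"))
# shade_0 shade_1 #d9d9d9
```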
Since Bokeh doesn't have an interactive colorbar, we will create one by plotting rectangles on a figure. This is a bit cumbersome because we need to define x coordinates and a width for each bin in our data, but I find the interactive colorbar to be very useful.
# assign x coordinates
def bin_to_cbar_x(value):
    if value == 'No data': return -2
    for i,b in enumerate(bin_labels):
        if value == b:
            return 5*(i+1)
df['cbar_x'] = df['bin'].apply(bin_to_cbar_x)
# assign width
df['cbar_w'] = df['Prevalence'].apply(lambda x: 5 if x == 'No data' else 4.7)
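To make the layout of the colorbar concrete, the sketch below computes the x position of every label without touching the dataframe (`cbar_positions` and `tick_labels` are hypothetical names). Bins are centred every 5 units, so a width of 4.7 leaves a gap of 0.3 between neighbouring rectangles, and "No data" sits apart at x = -2.

```python
bins = [0, 2, 5, 10, 15, 20, 25, 30, 100]
bin_labels = (
    [f"≤{bins[1]}%"]
    + [f"{bins[i]}-{bins[i+1]}%" for i in range(1, len(bins) - 2)]
    + [f">{bins[-2]}%"]
)

# 'No data' sits left of the scale; the bins follow at x = 5, 10, ..., 40
cbar_positions = {"No data": -2}
cbar_positions.update({label: 5 * (i + 1) for i, label in enumerate(bin_labels)})

# Tick position -> tick label: a dictionary of this shape is what we later
# pass to the axis as major_label_overrides
tick_labels = {x: label for label, x in cbar_positions.items()}

print(sorted(tick_labels))
# [-2, 5, 10, 15, 20, 25, 30, 35, 40]
```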
We will also add a second figure which displays the evolution of each country's obesity rate. We need to define another color palette for this.
# create color palette for the graph
countries = sorted(df[df["bin"] != "No data"]["Country"].unique())
n_country = len(countries)
print("%d countries to plot" % n_country)
cmap = plt.get_cmap('gist_ncar', n_country)
country_palette = [rgb2hex(cmap(i)[:3]) for i in range(cmap.N)]
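If matplotlib is not available, a comparable list of hex colors can be generated with the standard library alone. This is a different technique from the `gist_ncar` colormap used above: the sketch spaces hues evenly in HSV space, and `distinct_hex_colors` is a hypothetical helper, not part of any library. The saturation and brightness values are arbitrary choices.

```python
import colorsys

def distinct_hex_colors(n, saturation=0.8, value=0.9):
    """Return n hex colors with evenly spaced hues (hypothetical helper)."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors

# One color per country; 12 is just an example size
sample_palette = distinct_hex_colors(12)
```

With close to 200 countries, neighbouring hues become hard to tell apart whichever method is used, so the legend and hover tool remain the main way to identify a line.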
Now all that is left to do is to create the different objects that bokeh will display. Let's start with the datasources. We will define which year to display on the map first, as well as which country.
# define the output file
reset_output()
output_file("obesity-trends.html", title="Obesity trends", mode="inline")
# Input sources
df.sort_values(by=["Country","Year"], inplace=True)
# source that will contain all necessary data for the map
geosource = GeoJSONDataSource(geojson=df.to_json())
# source that contains the data that is actually shown on the map (for a given year)
displayed_src = GeoJSONDataSource(geojson=df[df['Year'].isin(['No data', 1975])].to_json())
# source that will be used for the graph (we don't need the countries shapes for this)
country_source = ColumnDataSource(df[df['Country'] == "France"].drop(columns=["geometry"]))
Next, we define the tools displayed with our map and graph.
# Tools
# slider to select the year
slider = Slider(title='Year',start=1975, end=2016, step=1, value=1975)
# hover tool for the map
map_hover = HoverTool(tooltips=[
    ('Country','@Country (@Code)'),
    ('Obesity rate (%)', '@Prevalence')
])
# hover tool for the graph
graph_hover = HoverTool(tooltips=[
    ('Country','@Country (@Code)'),
    ('Obesity rate (%)', '@Prevalence'),
    ('Year', '@Year')
])
# button for the animation
anim_button = Toggle(label="▶ Play", button_type="success", width=50, active=False)
Now let's create the plot!
# create map figure
p = figure(
    title = 'Share of adults who are obese in 1975',
    plot_height=550 , plot_width=1100,
    toolbar_location="right", tools="tap,pan,wheel_zoom,box_zoom,save,reset", toolbar_sticky=False,
    active_scroll="wheel_zoom",
)
p.title.text_font_size = '16pt'
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.axis.visible = False
# Add hover tool
p.add_tools(map_hover)
# Add patches (countries) to the figure
patches = p.patches(
    'xs','ys', source=displayed_src,
    fill_color='color',
    line_color='black', line_width=0.25, fill_alpha=1,
    hover_fill_color='color',
)
# outline when we hover over a country
patches.hover_glyph.line_color = '#3bdd9d'
patches.hover_glyph.line_width = 3
patches.nonselection_glyph = None
# create the interactive colorbar
p_bar = figure(
title=None, plot_height=80 , plot_width=600,
tools="tap", toolbar_location=None
)
p_bar.xgrid.grid_line_color = None
p_bar.ygrid.grid_line_color = None
p_bar.outline_line_color = None
p_bar.yaxis.visible = False
# set the title and ticks of the colorbar
p_bar.xaxis.axis_label = "% Obesity (BMI ≥ 30)"
p_bar.xaxis.ticker = sorted(df['cbar_x'].unique())
p_bar.xaxis.major_label_overrides = dict([(i[0],i[1]) for i in df.groupby(['cbar_x','bin']).describe().index])
p_bar.xaxis.axis_label_text_font_size = "12pt"
p_bar.xaxis.major_label_text_font_size = "10pt"
# activate the hover but hide tooltips
hover_bar = HoverTool(tooltips=None)
p_bar.add_tools(hover_bar)
# plot the rectangles for the colorbar
cbar = p_bar.rect(x='cbar_x', y=0, width='cbar_w', height=1,
color='color', source=displayed_src,
hover_line_color='#3bdd9d', hover_fill_color='color')
# outline when we hover over the colorbar legend
cbar.hover_glyph.line_width = 4
cbar.nonselection_glyph = None
# create the graph figure
p_country = figure(
title="Evolution of obesity", plot_height=700 , plot_width=1100,
tools="pan,wheel_zoom,save", active_scroll="wheel_zoom", toolbar_location="right",
)
p_country.title.text_font_size = '14pt'
p_country.xaxis.axis_label = "Year"
p_country.yaxis.axis_label = "Obesity rate (%)"
p_country.axis.major_label_text_font_size = "12pt"
p_country.axis.axis_label_text_font_size = "14pt"
# plot data on the figure
line_plots = {}
legend_items = {}
for i, country in enumerate(countries):
    # get subset of data corresponding to a country
    country_source = ColumnDataSource(df[df['Country'] == country].drop(columns=["geometry"]))
    # plot
    line = p_country.line("Year", "Prevalence", legend=False, source=country_source,
                          color=country_palette[i], line_width=2)
    circle = p_country.circle("Year", "Prevalence", legend=False, source=country_source,
                              line_color="darkgrey", fill_color=country_palette[i], size=8)
    # used later in the interactive callbacks
    line_plots[country] = [line, circle]
    legend_items[country] = LegendItem(label=country, renderers=[line, circle])
    # only display France at first
    if country != "France":
        line.visible = False
        circle.visible = False
default_legend = [
("France", line_plots["France"]),
]
legend = Legend(items=default_legend, location="top_center")
legend.click_policy = "hide"
p_country.add_layout(legend, 'right')
# Add hover tool
p_country.add_tools(graph_hover)
The interactivity will be handled with JavaScript callbacks, since they give much more flexibility and we won't need to run a Bokeh server to display the map.
# JS callbacks
# Update the map on slider change
slider_callback = CustomJS(args=dict(slider=slider, source=geosource, displayed_src=displayed_src), code="""
    var year = slider.value;
    var show = [year, 'No data'];
    var data = {};
    columns = Object.keys(source.data);
    columns.forEach(function(key) {
        data[key] = [];
    });
    for (var i = 0; i < source.get_length(); i++){
        if (show.includes(source.data['Year'][i])){
            columns.forEach(function(key) {
                data[key].push(source.data[key][i])
            });
        }
    }
    displayed_src.data = data;
    displayed_src.change.emit();
""")
slider.js_on_change('value', slider_callback)
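For readers less familiar with JavaScript, the filtering done by the slider callback can be mirrored in plain Python. This is only an illustration of the logic, not code that Bokeh runs: `filter_columns` is a hypothetical function, and the toy `source_data` dictionary (with invented values) stands in for the column-oriented `source.data`.

```python
def filter_columns(source_data, year):
    """Keep rows whose 'Year' is the requested year or the 'No data' marker."""
    show = {year, "No data"}
    keep = [i for i, y in enumerate(source_data["Year"]) if y in show]
    # Rebuild every column with only the rows that were kept
    return {key: [values[i] for i in keep] for key, values in source_data.items()}

# Toy column-oriented data; the prevalence values are made up
source_data = {
    "Country": ["France", "France", "Greenland"],
    "Year": [1975, 1976, "No data"],
    "Prevalence": [8.9, 9.1, "No data"],
}
displayed = filter_columns(source_data, 1976)

print(displayed["Country"])
# ['France', 'Greenland']
```

Countries without data are always kept, so they stay on the map in grey whichever year is selected.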
# Update figure title from slider change
callback_title = CustomJS(args=dict(slider=slider, figure=p), code="""
var year = slider.value;
figure.title.text = 'Share of adults who are obese in ' + year;
""")
slider.js_on_change('value', callback_title)
# Add callback on country click
plot_callback = CustomJS(args=dict(
    csource=country_source, source=geosource, displayed_src=displayed_src, line_plots=line_plots, legend=legend, legend_items=legend_items), code="""
    // only continue if a country was selected
    var ixs = displayed_src.selected.indices;
    if (ixs.length == 0) { return; }
    // init
    var data = {};
    var items = [];
    countries = [];
    columns = Object.keys(source.data);
    columns.forEach(function(key) {
        data[key] = [];
    });
    // hide all plots
    for (var country in line_plots) {
        var line = line_plots[country][0];
        var circle = line_plots[country][1];
        line.visible = false;
        circle.visible = false;
    }
    // loop over the selected countries
    ixs.forEach(function(ix) {
        // identify corresponding country
        country = displayed_src.data["Country"][ix];
        countries.push(country);
    });
    // sort them in order
    countries.sort()
    // display the corresponding glyphs and legend
    countries.forEach(function(country) {
        line = line_plots[country][0];
        circle = line_plots[country][1];
        line.visible = true;
        circle.visible = true;
        items.push(legend_items[country]);
        for (var i = 0; i < source.get_length(); i++){
            if (source.data['Country'][i] == country) {
                columns.forEach(function(key) {
                    data[key].push(source.data[key][i])
                });
            }
        }
    });
    legend.items = items;
    csource.data = data;
    csource.change.emit();
""")
displayed_src.selected.js_on_change('indices', plot_callback)
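The core of the click callback is small: turn the selected row indices into a sorted list of country names, then show only those lines. A plain-Python sketch of that logic is given below (`countries_to_show` and `visibility` are hypothetical helpers, and the country list is a toy example); it is meant to explain the callback, not to replace it.

```python
def countries_to_show(displayed_countries, selected_indices):
    """Sorted names of the countries at the selected row indices."""
    return sorted(displayed_countries[ix] for ix in selected_indices)

def visibility(all_countries, shown):
    """Map every country to True if its line should be drawn, else False."""
    shown = set(shown)
    return {country: country in shown for country in all_countries}

# Toy example: rows 3 and 1 were clicked on the map
displayed_countries = ["Brazil", "France", "Japan", "Kenya"]
shown = countries_to_show(displayed_countries, [3, 1])
flags = visibility(displayed_countries, shown)

print(shown)
# ['France', 'Kenya']
```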
# add animation
update_interval = 500 # in ms
anim_callback = CustomJS(args=dict(slider=slider, update_interval=update_interval), code="""
    var button = cb_obj;
    if (button.active == true){
        button.label = "◼ Stop";
        button.button_type = "danger";
        mytimer = setInterval(update_year, update_interval);
    } else {
        button.label = "▶ Play";
        button.button_type = "success";
        clearInterval(mytimer);
    }
    function update_year() {
        year = slider.value;
        if (year < 2016) {
            slider.value += 1;
        } else {
            slider.value = 1975;
        }
    }
""")
anim_button.callback = anim_callback
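The animation simply advances the slider by one year on every tick and wraps back to the first year after the last one. As a sketch, the stepping rule looks like this in Python (`next_year`, `FIRST_YEAR` and `LAST_YEAR` are hypothetical names; the bounds match the slider defined earlier).

```python
FIRST_YEAR, LAST_YEAR = 1975, 2016

def next_year(year):
    """Advance by one year, wrapping back to the start after the last year."""
    return year + 1 if year < LAST_YEAR else FIRST_YEAR

# Simulate three ticks of the timer, starting near the end of the range
years = [2015]
for _ in range(3):
    years.append(next_year(years[-1]))

print(years)
# [2015, 2016, 1975, 1976]
```

Because each tick changes `slider.value`, the two slider callbacks defined earlier fire as well, so the map and its title follow the animation.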
Finally, we define the layout for all these elements. We will have 2 tabs, one for the map, and one for the chart.
# arrange display with tabs
tab_map = Panel(title="Map",
    child=column(
        p, # map
        p_bar, # colorbar
        row(widgetbox(anim_button), Spacer(width=10), widgetbox(slider)) # animation button and slider
    ))
tab_chart = Panel(title="Chart", child=column(p_country))
tabs = Tabs(tabs=[ tab_map, tab_chart ])
# save the document and display it!
footer = Div(text="""
Data: World Health Organization - Global Health Observatory<br>
Author: <a href="https://cbouy.github.io">Cédric Bouysset</a>
""")
layout = column(tabs, footer)
show(layout)