The goal is to build something similar to Our World in Data's interactive map. This work was inspired by Shivangi Patel's guide.

We will work with the same dataset: Prevalence of obesity (BMI ≥ 30) among adults, estimated by country, standardised by age.

Data was obtained from the Global Health Observatory data repository (World Health Organization): under "Download complete data set as", click on "more...", then under "CSV" download "list containing text, codes and values".

Data

First, let's clean up and organise the data. We will use the 'pandas' library for this.

import pandas as pd

# read file
data = pd.read_csv('data-verbose.csv')
data.columns

Index(['GHO (CODE)', 'GHO (DISPLAY)', 'GHO (URL)', 'PUBLISHSTATE (CODE)',
       'PUBLISHSTATE (DISPLAY)', 'PUBLISHSTATE (URL)', 'YEAR (CODE)',
       'YEAR (DISPLAY)', 'YEAR (URL)', 'REGION (CODE)', 'REGION (DISPLAY)',
       'REGION (URL)', 'COUNTRY (CODE)', 'COUNTRY (DISPLAY)', 'COUNTRY (URL)',
       'AGEGROUP (CODE)', 'AGEGROUP (DISPLAY)', 'AGEGROUP (URL)', 'SEX (CODE)',
       'SEX (DISPLAY)', 'SEX (URL)', 'Display Value', 'Numeric', 'Low', 'High',
       'StdErr', 'StdDev', 'Comments'],
      dtype='object')

# discard male only and female only data
data = data.loc[data["SEX (DISPLAY)"] == 'Both sexes']
# only keep columns of interest
data = data[['YEAR (CODE)','COUNTRY (CODE)','COUNTRY (DISPLAY)','Numeric']]
data.reset_index(inplace=True, drop=True)
data.rename(columns={
    'YEAR (CODE)': 'Year',
    'COUNTRY (CODE)': 'Code',
    'COUNTRY (DISPLAY)': 'Country',
    'Numeric': 'Prevalence'
}, inplace=True)
data.head()
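To see exactly what this cleanup does row by row, the same three steps (keep the 'Both sexes' rows, keep four columns, rename them) can be sketched with the standard csv module. This is only an illustrative equivalent of the pandas code, not a replacement for it: clean_rows is a hypothetical helper, the sample is a toy excerpt (the Male value is a placeholder), and it assumes missing values appear as empty Numeric fields, which pandas reads as NaN.

```python
import csv
import io

# Original column name -> new name, as in the pandas rename above
RENAME = {
    "YEAR (CODE)": "Year",
    "COUNTRY (CODE)": "Code",
    "COUNTRY (DISPLAY)": "Country",
    "Numeric": "Prevalence",
}


def clean_rows(csv_text):
    """Keep 'Both sexes' rows and the four columns of interest, renamed."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        if record["SEX (DISPLAY)"] != "Both sexes":
            continue  # discard male-only and female-only rows
        row = {new: record[old] for old, new in RENAME.items()}
        row["Year"] = int(row["Year"])
        # assumption: an empty 'Numeric' field plays the role of NaN in pandas
        row["Prevalence"] = float(row["Prevalence"]) if row["Prevalence"] else None
        rows.append(row)
    return rows


# Toy excerpt with only the columns the sketch reads (the Male value is a placeholder)
SAMPLE = """YEAR (CODE),COUNTRY (CODE),COUNTRY (DISPLAY),SEX (DISPLAY),Numeric
1978,UZB,Uzbekistan,Both sexes,5.0
1978,UZB,Uzbekistan,Male,0.0
2016,SMR,San Marino,Both sexes,
"""

for row in clean_rows(SAMPLE):
    print(row)
```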

Since we'll be coloring each country according to its obesity prevalence, we need the shape of each country. We get these shapes with the 'geopandas' package and data from natural-earth-vector: download all the files named "ne_110m_admin_0_countries.*" (the .shp file alone is not enough, since it relies on its companion .shx and .dbf files).

import geopandas as gpd

# read shapes
geo = gpd.read_file("ne_110m_admin_0_countries.shp")[['ADMIN', 'ADM0_A3', 'geometry']]
geo.columns = ['Country', 'Code', 'geometry']
geo.head()

If we display the map now, we will see that Antarctica takes up a lot of space. Since we have no data for it, let's drop it.

geo = geo.loc[~(geo['Country'] == 'Antarctica')]

Looking closely at the data, we see that some countries have missing values:

data[data["Prevalence"].isna()]["Country"].unique()

array(['San Marino', 'Sudan', 'Monaco', 'South Sudan'], dtype=object)

In the case of Sudan, it's more of a labelling problem: Sudan split into two separate countries in 2011.

data[data["Country"].str.contains("Sudan")]["Country"].unique()

array(['Sudan (former)', 'Sudan', 'South Sudan'], dtype=object)

geo[geo["Country"].str.contains("Sudan")]["Country"].unique()

array(['Sudan', 'South Sudan'], dtype=object)

In the current version of the dataset, only "Sudan (former)" contains data, but our version of the map only has the two independent states, not the former one. We will simply copy the data from "Sudan (former)" to both new countries and drop the former.

# copy the value of "Sudan (former)" to both successor states, year by year
for year in data["Year"].unique():
    former = data[(data["Country"] == "Sudan (former)") & (data["Year"] == year)]["Prevalence"].values[0]
    data.loc[(data["Country"].isin(["Sudan", "South Sudan"])) & (data["Year"] == year), "Prevalence"] = former
# drop the former country
data = data.loc[data['Country'] != 'Sudan (former)']
# check the result for 2016
data[(data["Country"].str.contains("Sudan")) & (data["Year"] == 2016)]
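The copy above amounts to a year-keyed lookup: read the former country's value for each year, then write it onto each successor state. Here is a small pure-Python sketch of that idea on plain records; propagate is a hypothetical helper, and the 8.6 value is the 2016 figure shown for both countries in the output of this check.

```python
def propagate(records, source, targets):
    """Copy `source`'s value for each year onto `targets`, then drop `source`."""
    # year -> value of the former country
    by_year = {r["Year"]: r["Prevalence"] for r in records if r["Country"] == source}
    result = []
    for r in records:
        if r["Country"] == source:
            continue  # drop the former country
        if r["Country"] in targets:
            r = {**r, "Prevalence": by_year[r["Year"]]}  # copy, don't mutate the input
        result.append(r)
    return result


records = [
    {"Year": 2016, "Country": "Sudan (former)", "Prevalence": 8.6},
    {"Year": 2016, "Country": "Sudan", "Prevalence": None},
    {"Year": 2016, "Country": "South Sudan", "Prevalence": None},
]
for r in propagate(records, "Sudan (former)", {"Sudan", "South Sudan"}):
    print(r["Country"], r["Prevalence"])
```

Like the pandas version, it fails loudly (here with a KeyError) if a year has no value for the former country, rather than silently leaving a gap.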

# Natural Earth uses "SDS" as the 3-letter code for South Sudan, whereas the WHO data uses "SSD"
geo.loc[geo["Code"]=="SDS", "Code"] = "SSD"
geo[geo["Code"]=="SSD"]
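Code mismatches like SDS/SSD are easy to miss. One way to spot them before merging is to compare the sets of codes on both sides. The helper name and the short code lists below are illustrative only; with the real data you would pass geo["Code"] and data["Code"].

```python
def code_mismatches(map_codes, data_codes):
    """Return (codes only on the map, codes only in the data), each sorted."""
    map_set, data_set = set(map_codes), set(data_codes)
    return sorted(map_set - data_set), sorted(data_set - map_set)


# Before the fix, South Sudan is "SDS" on the map but "SSD" in the WHO data
only_map, only_data = code_mismatches(["FRA", "SDN", "SDS"], ["FRA", "SDN", "SSD"])
print(only_map, only_data)  # ['SDS'] ['SSD']
```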

Preparing the plot

Now let's create the interactive plot. We will use the 'bokeh' and 'matplotlib' libraries for this.

from bokeh.io import save, show, output_file, output_notebook, reset_output, export_png
from bokeh.plotting import figure
from bokeh.models import (
    GeoJSONDataSource, ColumnDataSource, ColorBar, Slider, Spacer,
    HoverTool, TapTool, Panel, Tabs, Legend, Toggle, LegendItem,
)
from bokeh.palettes import brewer
from bokeh.models.callbacks import CustomJS
from bokeh.models.widgets import Div
from bokeh.layouts import widgetbox, row, column
from matplotlib import pyplot as plt
from matplotlib.colors import rgb2hex

The first thing we need to do is to group our data into predefined bins and assign a color to each bin.
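To see exactly which bin a value lands in, here is a dependency-free sketch of the rule applied by pd.cut with right=True and include_lowest=True: each bin is closed on the right, and the first bin also includes its left edge (0). It uses bisect instead of pandas, with the same edges and labels as the code that follows; to_bin is a hypothetical helper for illustration.

```python
from bisect import bisect_left

bins = [0, 2, 5, 10, 15, 20, 25, 30, 100]
bin_labels = (
    [f"≤{bins[1]}%"]
    + [f"{bins[i]}-{bins[i + 1]}%" for i in range(1, len(bins) - 2)]
    + [f">{bins[-2]}%"]
)


def to_bin(value):
    """Label of the right-closed bin containing value (None if missing or out of range)."""
    if value is None or not bins[0] <= value <= bins[-1]:
        return None
    # bisect_left returns the index of the first edge >= value, so bins are right-closed;
    # max(..., 1) folds value == bins[0] into the first bin (include_lowest)
    i = max(bisect_left(bins, value), 1)
    return bin_labels[i - 1]


print(to_bin(2), to_bin(2.1), to_bin(30), to_bin(30.5))  # ≤2% 2-5% 25-30% >30%
```

A value exactly on an edge, such as 30, therefore goes to the lower bin ("25-30%"), which matches the "≤" and "-" wording of the labels.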

# Create bins to color each country
bins = [0,2,5,10,15,20,25,30,100]
# create stylish labels
bin_labels = [f'≤{bins[1]}%'] + [f'{bins[i]}-{bins[i+1]}%' for i in range(1,len(bins)-2)] + [f'>{bins[-2]}%']
# assign each row to a bin
data['bin'] = pd.cut(
    data['Prevalence'], bins=bins, right=True, include_lowest=True, precision=0, labels=bin_labels,
).astype(str)

# Merge the geographic data with obesity data
df = geo.merge(data, on='Code', how='left')
df = df.drop(columns="Country_y").rename(columns={"Country_x":"Country"})
df[df["Prevalence"].isna()]["Country"].unique()

array(['Western Sahara', 'Falkland Islands', 'Greenland',
       'French Southern and Antarctic Lands', 'Puerto Rico', 'Palestine',
       'New Caledonia', 'Taiwan', 'Northern Cyprus', 'Somaliland',
       'Kosovo'], dtype=object)

# Add a 'No data' bin for countries without data on their obesity
df.loc[df['Prevalence'].isna(), 'bin'] = 'No data'
df.fillna('No data', inplace = True)

# Define a yellow to red color palette
palette = brewer['YlOrRd'][len(bins)-1]
# Reverse color order so that dark red corresponds to highest obesity
palette = palette[::-1]

# Assign obesity prevalence to a color
def val_to_color(value, nan_color='#d9d9d9'):
    if isinstance(value, str): return nan_color
    for i in range(1,len(bins)):
        if value <= bins[i]:
            return palette[i-1]
df['color'] = df['Prevalence'].apply(val_to_color)

Since Bokeh's ColorBar is not interactive, we will build our own colorbar by plotting rectangles on a separate figure. This is a bit cumbersome because we need to define an x coordinate and a width for each bin, but I find an interactive colorbar very useful.

# assign x coordinates
def bin_to_cbar_x(value):
    if value == 'No data': return -2
    for i,b in enumerate(bin_labels):
        if value == b:
            return 5*(i+1)
df['cbar_x'] = df['bin'].apply(bin_to_cbar_x)
# assign width
df['cbar_w'] = df['Prevalence'].apply(lambda x: 5 if x == 'No data' else 4.7)
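With these choices, the bins are centered at x = 5, 10, ..., 40 with a width of 4.7, so neighboring rectangles are separated by a 0.3 gap, while the 'No data' box (centered at -2, width 5) sits apart on the left. A quick stdlib check of the resulting rectangle edges; cbar_rects is a hypothetical helper whose defaults copy the values above.

```python
def cbar_rects(n_bins, spacing=5, width=4.7, nodata_x=-2, nodata_w=5):
    """Return (left, right) edges: the 'No data' box first, then one per bin."""
    centers = [nodata_x] + [spacing * (i + 1) for i in range(n_bins)]
    widths = [nodata_w] + [width] * n_bins
    return [(c - w / 2, c + w / 2) for c, w in zip(centers, widths)]


rects = cbar_rects(8)
gaps = [round(b[0] - a[1], 2) for a, b in zip(rects, rects[1:])]
print(gaps)  # [2.15, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
```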

We will also add a second figure that displays the evolution of each country's obesity rate over time. It needs its own color palette, with one color per country.

# create color palette for the graph
countries = sorted(df[df["bin"] != "No data"]["Country"].unique())
n_country = len(countries)
print("%d countries to plot" % n_country)
cmap = plt.get_cmap('gist_ncar', n_country)
country_palette = [rgb2hex(cmap(i)[:3]) for i in range(cmap.N)]

165 countries to plot
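rgb2hex turns each color sampled from the colormap, an (r, g, b) tuple with channels between 0 and 1, into a '#rrggbb' string. A dependency-free sketch of that conversion, assuming it scales each channel by 255 and rounds to the nearest integer (as recent matplotlib versions do); rgb_to_hex is a hypothetical name, handy for checking colors without matplotlib.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to a '#rrggbb' string."""
    return "#" + "".join(format(round(c * 255), "02x") for c in rgb)


print(rgb_to_hex((1.0, 0.0, 0.0)), rgb_to_hex((0.0, 0.5, 1.0)))  # #ff0000 #0080ff
```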

Plotting

Now all that is left to do is to create the different objects that Bokeh will display. Let's start with the data sources. We also define which year is shown on the map at first (1975), and which country is shown on the chart (France).

# define the output file
reset_output()
output_file("obesity-trends.html", title="Obesity trends", mode="inline")

# Input sources
df.sort_values(by=["Country","Year"], inplace=True)
# source that will contain all necessary data for the map
geosource = GeoJSONDataSource(geojson=df.to_json())
# source that contains the data that is actually shown on the map (for a given year)
displayed_src = GeoJSONDataSource(geojson=df[df['Year'].isin(['No data', 1975])].to_json())
# source that will be used for the graph (we don't need the countries shapes for this)
country_source = ColumnDataSource(df[df['Country'] == "France"].drop(columns=["geometry"]))
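Both displayed_src and the slider callback defined further down keep only the rows whose Year is the selected year or 'No data' (after the fillna step, map countries absent from the WHO data carry 'No data' as their Year). Below is a Python mirror of that column-wise filter, written only to make the logic testable: filter_year is a hypothetical helper, the dict of lists stands in for a data source's data, and the prevalence values are placeholders.

```python
def filter_year(data, year):
    """Keep the rows whose 'Year' is `year` or 'No data'.

    `data` maps column name -> list of values, like a data source's `data`.
    """
    show = {year, "No data"}
    keep = [i for i, y in enumerate(data["Year"]) if y in show]
    return {col: [values[i] for i in keep] for col, values in data.items()}


# Toy columns: the prevalence values are placeholders
toy = {
    "Country": ["France", "France", "Greenland"],
    "Year": [1975, 1976, "No data"],
    "Prevalence": [1.0, 2.0, "No data"],
}
print(filter_year(toy, 1976)["Country"])  # ['France', 'Greenland']
```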

Next, we define the tools displayed with our map and graph.

# Tools

# slider to select the year
slider = Slider(title='Year',start=1975, end=2016, step=1, value=1975)

# hover tool for the map
map_hover = HoverTool(tooltips=[
    ('Country','@Country (@Code)'),
    ('Obesity rate (%)', '@Prevalence')
])

# hover tool for the graph
graph_hover = HoverTool(tooltips=[
    ('Country','@Country (@Code)'),
    ('Obesity rate (%)', '@Prevalence'),
    ('Year', '@Year')
])

# button for the animation
anim_button = Toggle(label="▶ Play", button_type="success", width=50, active=False)

Now let's create the plot!

# create map figure
p = figure(
    title = 'Share of adults who are obese in 1975',
    plot_height=550 , plot_width=1100,
    toolbar_location="right", tools="tap,pan,wheel_zoom,box_zoom,save,reset", toolbar_sticky=False,
    active_scroll="wheel_zoom",
)
p.title.text_font_size = '16pt'
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.axis.visible = False

# Add hover tool
p.add_tools(map_hover)

# Add patches (countries) to the figure
patches = p.patches(
    'xs','ys', source=displayed_src,
    fill_color='color',
    line_color='black', line_width=0.25, fill_alpha=1,
    hover_fill_color='color',
)
# outline when we hover over a country
patches.hover_glyph.line_color = '#3bdd9d'
patches.hover_glyph.line_width = 3
patches.nonselection_glyph = None

# create the interactive colorbar
p_bar = figure(
    title=None, plot_height=80 , plot_width=600,
    tools="tap", toolbar_location=None
)
p_bar.xgrid.grid_line_color = None
p_bar.ygrid.grid_line_color = None
p_bar.outline_line_color = None
p_bar.yaxis.visible = False

# set the title and ticks of the colorbar
p_bar.xaxis.axis_label = "% Obesity (BMI ≥ 30)"
p_bar.xaxis.ticker = sorted(df['cbar_x'].unique())
p_bar.xaxis.major_label_overrides = dict([(i[0],i[1]) for i in df.groupby(['cbar_x','bin']).describe().index])
p_bar.xaxis.axis_label_text_font_size = "12pt"
p_bar.xaxis.major_label_text_font_size = "10pt"

# activate the hover but hide tooltips
hover_bar = HoverTool(tooltips=None)
p_bar.add_tools(hover_bar)

# plot the rectangles for the colorbar
cbar = p_bar.rect(x='cbar_x', y=0, width='cbar_w', height=1,
    color='color', source=displayed_src,
    hover_line_color='#3bdd9d', hover_fill_color='color')

# outline when we hover over the colorbar legend
cbar.hover_glyph.line_width = 4
cbar.nonselection_glyph = None

# create the graph figure
p_country = figure(
    title="Evolution of obesity", plot_height=700 , plot_width=1100,
    tools="pan,wheel_zoom,save", active_scroll="wheel_zoom", toolbar_location="right",
)
p_country.title.text_font_size = '14pt'
p_country.xaxis.axis_label = "Year"
p_country.yaxis.axis_label = "Obesity rate (%)"
p_country.axis.major_label_text_font_size = "12pt"
p_country.axis.axis_label_text_font_size = "14pt"

# plot data on the figure
line_plots = {}
legend_items = {}
for i, country in enumerate(countries):
    # get subset of data corresponding to a country (stored under a separate name,
    # so that country_source, which the tap callback updates, is not overwritten)
    line_source = ColumnDataSource(df[df['Country'] == country].drop(columns=["geometry"]))
    # plot
    line = p_country.line("Year", "Prevalence", legend=False, source=line_source,
                      color=country_palette[i], line_width=2)
    circle = p_country.circle("Year", "Prevalence", legend=False, source=line_source,
                          line_color="darkgrey", fill_color=country_palette[i], size=8)
    # used later in the interactive callbacks
    line_plots[country] = [line, circle]
    legend_items[country] = LegendItem(label=country, renderers=[line, circle])
    # only display France at first
    if country != "France":
        line.visible = False
        circle.visible = False

default_legend = [
    ("France", line_plots["France"]),
]
legend = Legend(items=default_legend, location="top_center")
legend.click_policy = "hide"
p_country.add_layout(legend, 'right')

# Add hover tool
p_country.add_tools(graph_hover)

The interactivity is implemented with JavaScript callbacks (CustomJS): they give us much more flexibility, and the resulting HTML file works on its own, without running a Bokeh server.

# JS callbacks

# Update the map on slider change
slider_callback = CustomJS(args=dict(slider=slider, source=geosource, displayed_src=displayed_src), code="""
    var year = slider.value;
    var show = [year, 'No data'];
    var data = {};
    var columns = Object.keys(source.data);
    columns.forEach(function(key) {
        data[key] = [];
    });
    for (var i = 0; i < source.get_length(); i++){
        if (show.includes(source.data['Year'][i])){
            columns.forEach(function(key) {
                data[key].push(source.data[key][i])
            });
        }
    }
    displayed_src.data = data;
    displayed_src.change.emit();
""")
slider.js_on_change('value', slider_callback)

# Update figure title from slider change
callback_title = CustomJS(args=dict(slider=slider, figure=p), code="""
    var year = slider.value;
    figure.title.text = 'Share of adults who are obese in ' + year;
""")
slider.js_on_change('value', callback_title)


# Add callback on country click
plot_callback = CustomJS(args=dict(
    csource=country_source, source=geosource, displayed_src=displayed_src, line_plots=line_plots, legend=legend, legend_items=legend_items), code="""
    // only continue if a country was selected
    var ixs = displayed_src.selected.indices;
    if (ixs.length == 0) { return; }
    
    // init
    var data = {};
    var items = [];
    var countries = [];
    var columns = Object.keys(source.data);
    columns.forEach(function(key) {
        data[key] = [];
    });
    
    // hide all plots
    for (var country in line_plots) {
        var line = line_plots[country][0];
        var circle = line_plots[country][1];
        line.visible = false;
        circle.visible = false;
    }
    
    // loop over the selected countries
    ixs.forEach(function(ix) {
        // identify corresponding country; countries without a line
        // (e.g. those in the 'No data' bin) are skipped to avoid an error
        var country = displayed_src.data["Country"][ix];
        if (country in line_plots) {
            countries.push(country);
        }
    });
    // sort them in order
    countries.sort();
    // display the corresponding glyphs and legend
    countries.forEach(function(country) {
        var line = line_plots[country][0];
        var circle = line_plots[country][1];
        line.visible = true;
        circle.visible = true;
        items.push(legend_items[country]);
        
        for (var i = 0; i < source.get_length(); i++){
            if (source.data['Country'][i] == country) {
                columns.forEach(function(key) {
                    data[key].push(source.data[key][i])
                });
            }
        }
    });
    legend.items = items;
    csource.data = data;
    csource.change.emit();
""")
displayed_src.selected.js_on_change('indices', plot_callback)

# add animation
update_interval = 500 # in ms
anim_callback = CustomJS(args=dict(slider=slider, update_interval=update_interval), code="""
    var button = cb_obj;
    if (button.active == true){
        button.label = "◼ Stop";
        button.button_type = "danger";
        mytimer = setInterval(update_year, update_interval);           
    } else {
        button.label = "▶ Play";
        button.button_type = "success";
        clearInterval(mytimer);
    }

    function update_year() {
        year = slider.value;
        if (year < 2016) {
            slider.value += 1;
        } else {
            slider.value = 1975;
        }
    }
""")
# run the animation callback whenever the button is toggled
anim_button.js_on_change('active', anim_callback)
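When countries are selected on the map, the plot callback turns the selected row indices of displayed_src into country names, sorts them, and makes only their lines visible, in that order in the legend. A Python mirror of that bookkeeping, with plain lists and dicts standing in for the Bokeh renderers; apply_selection is a hypothetical helper. The sketch also skips countries that have no line on the chart (those in the 'No data' bin), since there is nothing to show for them.

```python
def apply_selection(selected, displayed_countries, line_countries):
    """Return (legend labels in order, visibility per country), or None.

    selected: row indices, like displayed_src.selected.indices
    displayed_countries: the 'Country' column of the displayed rows
    line_countries: countries that have a line on the chart
    """
    if not selected:
        return None  # like the early return in the JS callback: nothing changes
    # countries without a line (e.g. in the 'No data' bin) are skipped
    chosen = sorted(
        displayed_countries[i] for i in selected if displayed_countries[i] in line_countries
    )
    visibility = {country: country in chosen for country in line_countries}
    return chosen, visibility


legend, visible = apply_selection([2, 0], ["France", "Ghana", "Burundi"], ["Burundi", "France", "Ghana"])
print(legend)   # ['Burundi', 'France']
print(visible)  # {'Burundi': True, 'France': True, 'Ghana': False}
```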

Finally, we define the layout for all these elements. We will have two tabs: one for the map and one for the chart.

# arrange display with tabs
tab_map = Panel(title="Map",
    child=column(
        p, # map
        p_bar, # colorbar
        row(widgetbox(anim_button), Spacer(width=10), widgetbox(slider)) # animation button and slider
    ))
tab_chart = Panel(title="Chart", child=column(p_country))
tabs = Tabs(tabs=[ tab_map, tab_chart ])

# save the document and display it!
footer = Div(text="""
Data: World Health Organization - Global Health Observatory<br>
Author: <a href="https://cbouy.github.io">Cédric Bouysset</a>
""")
layout = column(tabs, footer)
show(layout)

Output of data.head() after the cleanup:

   Year  Code  Country     Prevalence
0  1978  UZB   Uzbekistan  5.0
1  2003  BDI   Burundi     2.8
2  1999  CHN   China       2.2
3  1996  GHA   Ghana       4.4
4  1992  HND   Honduras    9.3

Output of geo.head():

   Country                      Code  geometry
0  Fiji                         FJI   (POLYGON ((180 -16.06713266364245, 180 -16.555...
1  United Republic of Tanzania  TZA   POLYGON ((33.90371119710453 -0.950000000000000...
2  Western Sahara               SAH   POLYGON ((-8.665589565454809 27.65642588959236...
3  Canada                       CAN   (POLYGON ((-122.84 49.00000000000011, -122.974...
4  United States of America     USA   (POLYGON ((-122.84 49.00000000000011, -120 49....

Output of the Sudan check for 2016:

      Year  Code  Country      Prevalence
1632  2016  SSD   South Sudan  8.6
4848  2016  SDN   Sudan        8.6