The goal is to show something similar to Our World in Data's interactive map. This work was inspired by Shivangi Patel's guide.

We will work with the same dataset: Prevalence of obesity (BMI ≥ 30) among adults, estimated by country, standardised by age.

Data was obtained from the Global Health Observatory data repository (World Health Organization): under "Download complete data set as", click on "more...", then under "CSV", download "list containing text, codes and values".

Data

First, let's clean up and organise the data. We will use the 'pandas' library for this.

In [1]:
import pandas as pd
In [2]:
# read file
data = pd.read_csv('data-verbose.csv')
data.columns
Out[2]:
Index(['GHO (CODE)', 'GHO (DISPLAY)', 'GHO (URL)', 'PUBLISHSTATE (CODE)',
       'PUBLISHSTATE (DISPLAY)', 'PUBLISHSTATE (URL)', 'YEAR (CODE)',
       'YEAR (DISPLAY)', 'YEAR (URL)', 'REGION (CODE)', 'REGION (DISPLAY)',
       'REGION (URL)', 'COUNTRY (CODE)', 'COUNTRY (DISPLAY)', 'COUNTRY (URL)',
       'AGEGROUP (CODE)', 'AGEGROUP (DISPLAY)', 'AGEGROUP (URL)', 'SEX (CODE)',
       'SEX (DISPLAY)', 'SEX (URL)', 'Display Value', 'Numeric', 'Low', 'High',
       'StdErr', 'StdDev', 'Comments'],
      dtype='object')
In [3]:
# discard male only and female only data
data = data.loc[data["SEX (DISPLAY)"] == 'Both sexes']
# only keep columns of interest
data = data[['YEAR (CODE)','COUNTRY (CODE)','COUNTRY (DISPLAY)','Numeric']]
data.reset_index(inplace=True, drop=True)
data.rename(columns={
    'YEAR (CODE)': 'Year',
    'COUNTRY (CODE)': 'Code',
    'COUNTRY (DISPLAY)': 'Country',
    'Numeric': 'Prevalence'
}, inplace=True)
data.head()
Out[3]:
   Year  Code  Country     Prevalence
0  1978  UZB   Uzbekistan  5.0
1  2003  BDI   Burundi     2.8
2  1999  CHN   China       2.2
3  1996  GHA   Ghana       4.4
4  1992  HND   Honduras    9.3
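For readers who want to see the cleanup of the cell above without pandas, here is a minimal sketch with Python's standard `csv` module: keep only the "Both sexes" rows, keep four columns and rename them. It is an illustration, not the notebook's method. The inline CSV is a hypothetical excerpt containing only the relevant columns, the "Male" row's value is made up, and treating an empty "Numeric" cell as missing is an assumption about how gaps appear in the file.

```python
import csv
import io

# Hypothetical excerpt of data-verbose.csv: only the columns used below are kept,
# and the "Male" row's value is made up for illustration.
SAMPLE = """\
YEAR (CODE),COUNTRY (CODE),COUNTRY (DISPLAY),SEX (DISPLAY),Numeric
1978,UZB,Uzbekistan,Both sexes,5.0
1978,UZB,Uzbekistan,Male,4.0
2003,BDI,Burundi,Both sexes,2.8
"""

# Original column name -> short name used in the rest of the notebook
RENAME = {
    "YEAR (CODE)": "Year",
    "COUNTRY (CODE)": "Code",
    "COUNTRY (DISPLAY)": "Country",
    "Numeric": "Prevalence",
}


def clean_rows(text):
    """Keep 'Both sexes' rows and only the renamed columns of interest."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        if record["SEX (DISPLAY)"] != "Both sexes":
            continue  # discard male-only and female-only rows
        row = {new: record[old] for old, new in RENAME.items()}
        row["Year"] = int(row["Year"])
        # Assumption: an empty "Numeric" cell means the value is missing
        row["Prevalence"] = float(row["Prevalence"]) if row["Prevalence"] else None
        rows.append(row)
    return rows


rows = clean_rows(SAMPLE)
print(rows[0])  # prints {'Year': 1978, 'Code': 'UZB', 'Country': 'Uzbekistan', 'Prevalence': 5.0}
```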

Since we'll be coloring each country according to its obesity prevalence, we need the shape of each country. We get these shapes with the 'geopandas' package and data from the natural-earth-vector repository. Download all the files named "ne_110m_admin_0_countries.*": geopandas needs the companion files (such as .shx and .dbf) next to the .shp file, not just the .shp itself.

In [4]:
import geopandas as gpd
In [5]:
# read shapes
geo = gpd.read_file("ne_110m_admin_0_countries.shp")[['ADMIN', 'ADM0_A3', 'geometry']]
geo.columns = ['Country', 'Code', 'geometry']
geo.head()
Out[5]:
   Country                      Code  geometry
0  Fiji                         FJI   (POLYGON ((180 -16.06713266364245, 180 -16.555...
1  United Republic of Tanzania  TZA   POLYGON ((33.90371119710453 -0.950000000000000...
2  Western Sahara               SAH   POLYGON ((-8.665589565454809 27.65642588959236...
3  Canada                       CAN   (POLYGON ((-122.84 49.00000000000011, -122.974...
4  United States of America     USA   (POLYGON ((-122.84 49.00000000000011, -120 49....

If we plot the map now, Antarctica takes up a lot of space. Since we have no data for it, let's drop it.

In [6]:
geo = geo.loc[geo['Country'] != 'Antarctica']

Looking closely at the data, we see that some countries have no values:

In [7]:
data[data["Prevalence"].isna()]["Country"].unique()
Out[7]:
array(['San Marino', 'Sudan', 'Monaco', 'South Sudan'], dtype=object)

In the case of Sudan, it's more of a labelling problem: Sudan was split into two separate countries in 2011.

In [8]:
data[data["Country"].str.contains("Sudan")]["Country"].unique()
Out[8]:
array(['Sudan (former)', 'Sudan', 'South Sudan'], dtype=object)
In [9]:
geo[geo["Country"].str.contains("Sudan")]["Country"].unique()
Out[9]:
array(['Sudan', 'South Sudan'], dtype=object)

In the current version of the dataset, only "Sudan (former)" contains data, but our version of the map only has the two independent states, not the former one. We will simply copy the data from "Sudan (former)" into both new countries and drop the former.

In [10]:
# copy the "Sudan (former)" value of each year into both successor states
for year in data["Year"].unique():
    former = data.loc[(data["Country"] == "Sudan (former)") & (data["Year"] == year), "Prevalence"]
    data.loc[data["Country"].isin(["Sudan", "South Sudan"]) & (data["Year"] == year),
             "Prevalence"] = former.values[0]
# then drop the former country
data = data.loc[data['Country'] != 'Sudan (former)']
data[(data["Country"].str.contains("Sudan")) & (data["Year"] == 2016)]
Out[10]:
      Year  Code  Country      Prevalence
1632  2016  SSD   South Sudan  8.6
4848  2016  SDN   Sudan        8.6
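The loop in the cell above looks up the "Sudan (former)" value once per year. The same idea can be expressed as a lookup table keyed by year (with pandas, this would typically be a `Series.map` on the "Year" column). Below is a minimal stdlib sketch on a list of row dictionaries; the function name is hypothetical, the 2016 value matches the output above and the 2015 value is made up. Unlike the loop, a year with no "Sudan (former)" row is left untouched instead of raising an error.

```python
def split_former_sudan(rows, former="Sudan (former)", successors=("Sudan", "South Sudan")):
    """Copy the former country's yearly value into its successor states, then drop it."""
    # Year -> prevalence of the former country
    by_year = {r["Year"]: r["Prevalence"] for r in rows if r["Country"] == former}
    result = []
    for r in rows:
        if r["Country"] == former:
            continue  # drop the former country
        if r["Country"] in successors and r["Year"] in by_year:
            r = {**r, "Prevalence": by_year[r["Year"]]}  # copy, don't mutate the input
        result.append(r)
    return result


# Illustrative rows: the 2016 value matches the output above, the 2015 one is made up
rows = [
    {"Year": 2015, "Country": "Sudan (former)", "Prevalence": 8.3},
    {"Year": 2016, "Country": "Sudan (former)", "Prevalence": 8.6},
    {"Year": 2016, "Country": "Sudan", "Prevalence": None},
    {"Year": 2016, "Country": "South Sudan", "Prevalence": None},
]
for r in split_former_sudan(rows):
    print(r["Year"], r["Country"], r["Prevalence"])
```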
In [11]:
# Also, the geographic data uses "SDS" for South Sudan, whereas the WHO data uses "SSD"
geo.loc[geo["Code"]=="SDS", "Code"] = "SSD"
geo[geo["Code"]=="SSD"]
Out[11]:
     Country      Code  geometry
176  South Sudan  SSD   POLYGON ((30.83385242171543 3.509171604222463,...

Preparing the plot

Now let's create the interactive plot. We will use the 'bokeh' library for this, plus 'matplotlib' to generate a color palette.

In [12]:
from bokeh.io import save, show, output_file, output_notebook, reset_output, export_png
from bokeh.plotting import figure
from bokeh.models import (
    GeoJSONDataSource, ColumnDataSource, ColorBar, Slider, Spacer,
    HoverTool, TapTool, Panel, Tabs, Legend, Toggle, LegendItem,
)
from bokeh.palettes import brewer
from bokeh.models.callbacks import CustomJS
from bokeh.models.widgets import Div
from bokeh.layouts import widgetbox, row, column
from matplotlib import pyplot as plt
from matplotlib.colors import rgb2hex

First, we group the prevalence values into predefined bins and assign a color to each bin.

In [13]:
# Create bins to color each country
bins = [0,2,5,10,15,20,25,30,100]
# create stylish labels
bin_labels = [f'≤{bins[1]}%'] + [f'{bins[i]}-{bins[i+1]}%' for i in range(1,len(bins)-2)] + [f'>{bins[-2]}%']
# assign each row to a bin
data['bin'] = pd.cut(
    data['Prevalence'], bins=bins, right=True, include_lowest=True, precision=0, labels=bin_labels,
).astype(str)
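To make the binning convention concrete, the sketch below rebuilds the same labels and mimics the `right=True, include_lowest=True` behaviour of `pd.cut` with the standard `bisect` module. This is a stand-in for illustration (the `to_bin` helper is hypothetical), not how pandas implements it. A value sitting exactly on an edge, such as 2 or 30, falls into the bin that ends at that edge.

```python
from bisect import bisect_left

bins = [0, 2, 5, 10, 15, 20, 25, 30, 100]
bin_labels = (
    [f"≤{bins[1]}%"]
    + [f"{bins[i]}-{bins[i + 1]}%" for i in range(1, len(bins) - 2)]
    + [f">{bins[-2]}%"]
)


def to_bin(value):
    """Label of the right-closed bin (a, b] containing value; the first bin also includes 0."""
    if value is None or value < bins[0] or value > bins[-1]:
        return None  # pd.cut would give NaN here
    # bisect_left returns the index of the first edge >= value, i.e. the bin's right edge
    i = max(bisect_left(bins, value), 1)  # include_lowest: value == bins[0] goes to the first bin
    return bin_labels[i - 1]


print(bin_labels)
for v in (0, 2, 2.1, 30, 35.4):
    print(v, to_bin(v))
```

The `val_to_color` function defined a little further down follows the same convention, since it picks the first bin whose right edge is greater than or equal to the value.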
In [14]:
# Merge the geographic data with obesity data
df = geo.merge(data, on='Code', how='left')
df = df.drop(columns="Country_y").rename(columns={"Country_x":"Country"})
df[df["Prevalence"].isna()]["Country"].unique()
Out[14]:
array(['Western Sahara', 'Falkland Islands', 'Greenland',
       'French Southern and Antarctic Lands', 'Puerto Rico', 'Palestine',
       'New Caledonia', 'Taiwan', 'Northern Cyprus', 'Somaliland',
       'Kosovo'], dtype=object)
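Some of these gaps are territories that the WHO data does not cover, but a code mismatch like the "SDS"/"SSD" one fixed earlier would also end up in this list. A quick way to catch such mismatches before merging is to compare the two sets of codes. The sketch below uses plain Python sets on small hypothetical code lists; with pandas, `geo.merge(data, on='Code', how='left', indicator=True)` exposes the same information through its `_merge` column.

```python
def code_mismatches(geo_codes, who_codes):
    """Return (codes only in the map, codes only in the WHO data), both sorted."""
    geo_codes, who_codes = set(geo_codes), set(who_codes)
    return sorted(geo_codes - who_codes), sorted(who_codes - geo_codes)


# Hypothetical code lists; "SDS" is the map's code for South Sudan before the fix
geo_codes = ["FJI", "TZA", "SAH", "CAN", "USA", "SDS"]
who_codes = ["UZB", "BDI", "CHN", "GHA", "HND", "SSD", "FJI", "TZA", "CAN", "USA"]
only_geo, only_who = code_mismatches(geo_codes, who_codes)
print(only_geo)  # ['SAH', 'SDS']
print(only_who)  # ['BDI', 'CHN', 'GHA', 'HND', 'SSD', 'UZB']
```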
In [15]:
# Add a 'No data' bin for countries without data on their obesity
df.loc[df['Prevalence'].isna(), 'bin'] = 'No data'
df.fillna('No data', inplace = True)
In [16]:
# Define a yellow to red color palette
palette = brewer['YlOrRd'][len(bins)-1]
# Reverse color order so that dark red corresponds to highest obesity
palette = palette[::-1]

# Assign obesity prevalence to a color
def val_to_color(value, nan_color='#d9d9d9'):
    if isinstance(value, str): return nan_color
    for i in range(1,len(bins)):
        if value <= bins[i]:
            return palette[i-1]
df['color'] = df['Prevalence'].apply(val_to_color)

Since Bokeh doesn't provide an interactive colorbar, we build one by plotting rectangles on a separate figure. This is a bit cumbersome, because we need to define an x coordinate and a width for each bin, but I find the interactive colorbar very useful: its rectangles use the same data source as the map, so clicking a bin selects every country in that bin.

In [17]:
# assign x coordinates
def bin_to_cbar_x(value):
    if value == 'No data': return -2
    for i,b in enumerate(bin_labels):
        if value == b:
            return 5*(i+1)
df['cbar_x'] = df['bin'].apply(bin_to_cbar_x)
# assign width
df['cbar_w'] = df['Prevalence'].apply(lambda x: 5 if x == 'No data' else 4.7)
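With these values, bin `i` (counting from 0) is drawn as a rectangle centred at x = 5(i+1) with width 4.7, so consecutive bins are separated by a 0.3 gap, and the wider "No data" rectangle sits apart at x = -2. The short sketch below (plain Python, hypothetical helper names) simply recomputes that layout and checks that no two rectangles overlap.

```python
def cbar_layout(n_bins, step=5, width=4.7, nodata_x=-2, nodata_w=5):
    """Centre x and width of each colorbar rectangle: 'No data' first, then one per bin."""
    rects = [(nodata_x, nodata_w)]
    rects += [(step * (i + 1), width) for i in range(n_bins)]
    return rects


def gaps(rects):
    """Horizontal gap between consecutive rectangles (a negative value would mean overlap)."""
    return [
        round((x2 - w2 / 2) - (x1 + w1 / 2), 2)
        for (x1, w1), (x2, w2) in zip(rects, rects[1:])
    ]


rects = cbar_layout(8)
print(rects[:3])    # [(-2, 5), (5, 4.7), (10, 4.7)]
print(gaps(rects))  # [2.15, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
```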

We will also add a second figure that displays the evolution of each country's obesity rate. This requires another color palette, with one color per country.

In [18]:
# create color palette for the graph
countries = sorted(df[df["bin"] != "No data"]["Country"].unique())
n_country = len(countries)
print("%d countries to plot" % n_country)
cmap = plt.get_cmap('gist_ncar', n_country)
country_palette = [rgb2hex(cmap(i)[:3]) for i in range(cmap.N)]
165 countries to plot
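The matplotlib colormap is only used here to get `n_country` distinct hex colors. If you'd rather avoid the matplotlib dependency, a comparable (though not identical) palette can be sketched with the standard `colorsys` module by spacing hues evenly around the color wheel. This is a swapped-in technique, not `gist_ncar`, and the `hue_palette` name and its saturation/value defaults are assumptions.

```python
import colorsys


def hue_palette(n, saturation=0.85, value=0.9):
    """n hex colors with evenly spaced hues (a stand-in for sampling a matplotlib colormap)."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        # scale each channel from [0, 1] to 0-255 and format as #rrggbb
        colors.append("#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in (r, g, b))))
    return colors


country_palette = hue_palette(165)
print(country_palette[:3])
```

One drawback of evenly spaced hues is that neighbouring colors become hard to tell apart when there are many countries; this matters little here because only the selected countries are shown at once.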

Plotting

Now all that's left is to create the different objects that Bokeh will display. Let's start with the data sources, where we also define which year is shown on the map at first (1975) and which country is shown on the chart (France).

In [19]:
# define the output file
reset_output()
output_file("obesity-trends.html", title="Obesity trends", mode="inline")
In [20]:
# Input sources
df.sort_values(by=["Country","Year"], inplace=True)
# source that will contain all necessary data for the map
geosource = GeoJSONDataSource(geojson=df.to_json())
# source that contains the data that is actually shown on the map (for a given year)
displayed_src = GeoJSONDataSource(geojson=df[df['Year'].isin(['No data', 1975])].to_json())
# source that will be used for the graph (we don't need the countries shapes for this)
country_source = ColumnDataSource(df[df['Country'] == "France"].drop(columns=["geometry"]))

Next, the tools displayed with our map and graph:

In [21]:
# Tools

# slider to select the year
slider = Slider(title='Year',start=1975, end=2016, step=1, value=1975)

# hover tool for the map
map_hover = HoverTool(tooltips=[
    ('Country','@Country (@Code)'),
    ('Obesity rate (%)', '@Prevalence')
])

# hover tool for the graph
graph_hover = HoverTool(tooltips=[
    ('Country','@Country (@Code)'),
    ('Obesity rate (%)', '@Prevalence'),
    ('Year', '@Year')
])

# button for the animation
anim_button = Toggle(label="▶ Play", button_type="success", width=50, active=False)

Now let's create the plots!

In [22]:
# create map figure
p = figure(
    title = 'Share of adults who are obese in 1975',
    plot_height=550 , plot_width=1100,
    toolbar_location="right", tools="tap,pan,wheel_zoom,box_zoom,save,reset", toolbar_sticky=False,
    active_scroll="wheel_zoom",
)
p.title.text_font_size = '16pt'
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.axis.visible = False

# Add hover tool
p.add_tools(map_hover)

# Add patches (countries) to the figure
patches = p.patches(
    'xs','ys', source=displayed_src,
    fill_color='color',
    line_color='black', line_width=0.25, fill_alpha=1,
    hover_fill_color='color',
)
# outline when we hover over a country
patches.hover_glyph.line_color = '#3bdd9d'
patches.hover_glyph.line_width = 3
patches.nonselection_glyph = None
In [23]:
# create the interactive colorbar
p_bar = figure(
    title=None, plot_height=80 , plot_width=600,
    tools="tap", toolbar_location=None
)
p_bar.xgrid.grid_line_color = None
p_bar.ygrid.grid_line_color = None
p_bar.outline_line_color = None
p_bar.yaxis.visible = False

# set the title and ticks of the colorbar
p_bar.xaxis.axis_label = "% Obesity (BMI ≥ 30)"
p_bar.xaxis.ticker = sorted(df['cbar_x'].unique())
# label each tick with its bin (one label per colorbar rectangle)
ticks = df[['cbar_x', 'bin']].drop_duplicates()
p_bar.xaxis.major_label_overrides = dict(zip(ticks['cbar_x'], ticks['bin']))
p_bar.xaxis.axis_label_text_font_size = "12pt"
p_bar.xaxis.major_label_text_font_size = "10pt"

# activate the hover but hide tooltips
hover_bar = HoverTool(tooltips=None)
p_bar.add_tools(hover_bar)

# plot the rectangles for the colorbar
cbar = p_bar.rect(x='cbar_x', y=0, width='cbar_w', height=1,
    color='color', source=displayed_src,
    hover_line_color='#3bdd9d', hover_fill_color='color')

# outline when we hover over the colorbar legend
cbar.hover_glyph.line_width = 4
cbar.nonselection_glyph = None
In [24]:
# create the graph figure
p_country = figure(
    title="Evolution of obesity", plot_height=700 , plot_width=1100,
    tools="pan,wheel_zoom,save", active_scroll="wheel_zoom", toolbar_location="right",
)
p_country.title.text_font_size = '14pt'
p_country.xaxis.axis_label = "Year"
p_country.yaxis.axis_label = "Obesity rate (%)"
p_country.axis.major_label_text_font_size = "12pt"
p_country.axis.axis_label_text_font_size = "14pt"

# plot data on the figure
line_plots = {}
legend_items = {}
for i, country in enumerate(countries):
    # get subset of data corresponding to a country
    # (use a new name so that `country_source`, passed to the JS callback later, is not overwritten)
    src = ColumnDataSource(df[df['Country'] == country].drop(columns=["geometry"]))
    # plot
    line = p_country.line("Year", "Prevalence", legend=False, source=src,
                          color=country_palette[i], line_width=2)
    circle = p_country.circle("Year", "Prevalence", legend=False, source=src,
                              line_color="darkgrey", fill_color=country_palette[i], size=8)
    # used later in the interactive callbacks
    line_plots[country] = [line, circle]
    legend_items[country] = LegendItem(label=country, renderers=[line, circle])
    # only display France at first
    if country != "France":
        line.visible = False
        circle.visible = False

default_legend = [
    ("France", line_plots["France"]),
]
legend = Legend(items=default_legend, location="top_center")
legend.click_policy = "hide"
p_country.add_layout(legend, 'right')

# Add hover tool
p_country.add_tools(graph_hover)

The interactivity is implemented with JavaScript callbacks (CustomJS): they give much more freedom, and since they run in the browser, we don't need a Bokeh server to display the map.
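The slider callback in the next cell rebuilds `displayed_src.data` by keeping, column by column, only the rows whose "Year" is the selected year or 'No data'. The same column-wise filter is sketched below in Python on a dict of lists shaped like a Bokeh `ColumnDataSource.data`; the `filter_columns` name and the sample values are made up for illustration. Checking the logic this way can be handy before porting it to JavaScript.

```python
def filter_columns(data, year):
    """Keep the rows whose 'Year' is `year` or 'No data', preserving every column."""
    show = {year, "No data"}
    keep = [i for i, y in enumerate(data["Year"]) if y in show]
    return {key: [values[i] for i in keep] for key, values in data.items()}


# Illustrative source with made-up values: two countries with data, one without
source = {
    "Country": ["France", "France", "Greenland", "Spain", "Spain"],
    "Year": [1975, 1976, "No data", 1975, 1976],
    "Prevalence": [8.0, 8.2, "No data", 9.1, 9.3],
}
print(filter_columns(source, 1976))
```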

In [25]:
# JS callbacks

# Update the map on slider change
slider_callback = CustomJS(args=dict(slider=slider, source=geosource, displayed_src=displayed_src), code="""
    var year = slider.value;
    var show = [year, 'No data'];
    var data = {};
    var columns = Object.keys(source.data);
    columns.forEach(function(key) {
        data[key] = [];
    });
    for (var i = 0; i < source.get_length(); i++){
        if (show.includes(source.data['Year'][i])){
            columns.forEach(function(key) {
                data[key].push(source.data[key][i])
            });
        }
    }
    displayed_src.data = data;
    displayed_src.change.emit();
""")
slider.js_on_change('value', slider_callback)

# Update figure title from slider change
callback_title = CustomJS(args=dict(slider=slider, figure=p), code="""
    var year = slider.value;
    figure.title.text = 'Share of adults who are obese in ' + year;
""")
slider.js_on_change('value', callback_title)


# Add callback on country click
plot_callback = CustomJS(args=dict(
    csource=country_source, source=geosource, displayed_src=displayed_src, line_plots=line_plots, legend=legend, legend_items=legend_items), code="""
    // only continue if a country was selected
    var ixs = displayed_src.selected.indices;
    if (ixs.length == 0) { return; }
    
    // init
    var data = {};
    var items = [];
    var countries = [];
    var columns = Object.keys(source.data);
    columns.forEach(function(key) {
        data[key] = [];
    });
    
    // hide all plots
    for (var country in line_plots) {
        var line = line_plots[country][0];
        var circle = line_plots[country][1];
        line.visible = false;
        circle.visible = false;
    }
    
    // loop over the selected countries
    ixs.forEach(function(ix) {
        // identify corresponding country
        country = displayed_src.data["Country"][ix];
        countries.push(country);
    });
    // sort them in order
    countries.sort()
    // display the corresponding glyphs and legend
    countries.forEach(function(country) {
        line = line_plots[country][0];
        circle = line_plots[country][1];
        line.visible = true;
        circle.visible = true;
        items.push(legend_items[country]);
        
        for (var i = 0; i < source.get_length(); i++){
            if (source.data['Country'][i] == country) {
                columns.forEach(function(key) {
                    data[key].push(source.data[key][i])
                });
            }
        }
    });
    legend.items = items;
    csource.data = data;
    csource.change.emit();
""")
displayed_src.selected.js_on_change('indices', plot_callback)

# add animation
update_interval = 500 # in ms
anim_callback = CustomJS(args=dict(slider=slider, update_interval=update_interval), code="""
    var button = cb_obj;
    if (button.active == true){
        button.label = "◼ Stop";
        button.button_type = "danger";
        mytimer = setInterval(update_year, update_interval);           
    } else {
        button.label = "▶ Play";
        button.button_type = "success";
        clearInterval(mytimer);
    }

    function update_year() {
        year = slider.value;
        if (year < 2016) {
            slider.value += 1;
        } else {
            slider.value = 1975;
        }
    }
""")
# run the animation callback whenever the toggle is switched on or off
anim_button.js_on_change('active', anim_callback)
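Stripped of the Bokeh objects, the tap callback boils down to mapping the selected row indices to country names, sorting them, and toggling the visibility of the matching lines. That bookkeeping can be sketched in Python as follows (hypothetical function name, illustrative inputs). In this sketch, countries without data, which have no line on the chart, are skipped.

```python
def chart_state(selected_indices, displayed_countries, plotted_countries):
    """Legend order and visibility of the chart lines for a given map selection."""
    # Countries without data have no line on the chart, so they are skipped
    selected = sorted(
        displayed_countries[i]
        for i in selected_indices
        if displayed_countries[i] in plotted_countries
    )
    visible = {country: country in selected for country in plotted_countries}
    return selected, visible


displayed = ["France", "Greenland", "Spain", "Chile"]  # one row per country on the map
plotted = ["Chile", "France", "Spain"]                 # countries that have a line
order, visible = chart_state([2, 1, 0], displayed, plotted)
print(order)    # ['France', 'Spain']
print(visible)  # {'Chile': False, 'France': True, 'Spain': True}
```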

Finally, we define the layout for all these elements. We will have two tabs: one for the map and one for the chart.

In [26]:
# arrange display with tabs
tab_map = Panel(title="Map",
    child=column(
        p, # map
        p_bar, # colorbar
        row(widgetbox(anim_button), Spacer(width=10), widgetbox(slider)) # animation button and slider
    ))
tab_chart = Panel(title="Chart", child=column(p_country))
tabs = Tabs(tabs=[ tab_map, tab_chart ])
In [27]:
# save the document and display it!
footer = Div(text="""
Data: World Health Organization - Global Health Observatory<br>
Author: <a href="https://cbouy.github.io">Cédric Bouysset</a>
""")
layout = column(tabs, footer)
show(layout)