{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The goal is to build something similar to [Our World In Data](https://ourworldindata.org/grapher/share-of-adults-defined-as-obese?time=1975..2016&country=SAU)'s interactive map. This work was inspired by [Shivangi Patel's guide](https://towardsdatascience.com/a-complete-guide-to-an-interactive-geographical-map-using-python-f4c5197e23e0).\n",
    "\n",
    "We will work with the same dataset: the prevalence of obesity (BMI ≥ 30) among adults, estimated by country and standardised by age.\n",
    "\n",
    "The data comes from the [Global Health Observatory data repository](http://apps.who.int/gho/data/node.main.A900A?lang=en) (World Health Organization): under `Download complete data set as`, click on **more...**, then under **CSV** download `list containing text, codes and values`, or [click here](https://apps.who.int/gho/athena/data/data-verbose.csv?target=GHO/NCD_BMI_30A&profile=verbose&filter=AGEGROUP:*;COUNTRY:*;SEX:*)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, let's clean up and organise the data. We will use the 'pandas' library for this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:17.710454Z",
     "start_time": "2019-06-09T16:40:17.247155Z"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:17.887712Z",
     "start_time": "2019-06-09T16:40:17.712411Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['GHO (CODE)', 'GHO (DISPLAY)', 'GHO (URL)', 'PUBLISHSTATE (CODE)',\n",
       "       'PUBLISHSTATE (DISPLAY)', 'PUBLISHSTATE (URL)', 'YEAR (CODE)',\n",
       "       'YEAR (DISPLAY)', 'YEAR (URL)', 'REGION (CODE)', 'REGION (DISPLAY)',\n",
       "       'REGION (URL)', 'COUNTRY (CODE)', 'COUNTRY (DISPLAY)', 'COUNTRY (URL)',\n",
       "       'AGEGROUP (CODE)', 'AGEGROUP (DISPLAY)', 'AGEGROUP (URL)', 'SEX (CODE)',\n",
       "       'SEX (DISPLAY)', 'SEX (URL)', 'Display Value', 'Numeric', 'Low', 'High',\n",
       "       'StdErr', 'StdDev', 'Comments'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# read file\n",
    "data = pd.read_csv('data-verbose.csv')\n",
    "data.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:17.946393Z",
     "start_time": "2019-06-09T16:40:17.890088Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Year</th>\n",
       "      <th>Code</th>\n",
       "      <th>Country</th>\n",
       "      <th>Prevalence</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1978</td>\n",
       "      <td>UZB</td>\n",
       "      <td>Uzbekistan</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2003</td>\n",
       "      <td>BDI</td>\n",
       "      <td>Burundi</td>\n",
       "      <td>2.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1999</td>\n",
       "      <td>CHN</td>\n",
       "      <td>China</td>\n",
       "      <td>2.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1996</td>\n",
       "      <td>GHA</td>\n",
       "      <td>Ghana</td>\n",
       "      <td>4.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1992</td>\n",
       "      <td>HND</td>\n",
       "      <td>Honduras</td>\n",
       "      <td>9.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Year Code     Country  Prevalence\n",
       "0  1978  UZB  Uzbekistan         5.0\n",
       "1  2003  BDI     Burundi         2.8\n",
       "2  1999  CHN       China         2.2\n",
       "3  1996  GHA       Ghana         4.4\n",
       "4  1992  HND    Honduras         9.3"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# discard male-only and female-only data\n",
    "data = data.loc[data[\"SEX (DISPLAY)\"] == 'Both sexes']\n",
    "# only keep the columns of interest\n",
    "data = data[['YEAR (CODE)','COUNTRY (CODE)','COUNTRY (DISPLAY)','Numeric']]\n",
    "# reset the index and give the columns shorter names\n",
    "data = data.reset_index(drop=True).rename(columns={\n",
    "    'YEAR (CODE)': 'Year',\n",
    "    'COUNTRY (CODE)': 'Code',\n",
    "    'COUNTRY (DISPLAY)': 'Country',\n",
    "    'Numeric': 'Prevalence'\n",
    "})\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since we'll be coloring each country according to its obesity prevalence, we need access to the shape of each country. We get this with the 'geopandas' package and data from [natural-earth-vector](https://github.com/nvkelso/natural-earth-vector/tree/master/110m_cultural). Download all the files named \"ne_110m_admin_0_countries.*\": a shapefile is made up of several companion files that must sit in the same folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:18.320879Z",
     "start_time": "2019-06-09T16:40:17.948984Z"
    }
   },
   "outputs": [],
   "source": [
    "import geopandas as gpd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:18.447389Z",
     "start_time": "2019-06-09T16:40:18.327076Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Code</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Fiji</td>\n",
       "      <td>FJI</td>\n",
       "      <td>(POLYGON ((180 -16.06713266364245, 180 -16.555...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>United Republic of Tanzania</td>\n",
       "      <td>TZA</td>\n",
       "      <td>POLYGON ((33.90371119710453 -0.950000000000000...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Western Sahara</td>\n",
       "      <td>SAH</td>\n",
       "      <td>POLYGON ((-8.665589565454809 27.65642588959236...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Canada</td>\n",
       "      <td>CAN</td>\n",
       "      <td>(POLYGON ((-122.84 49.00000000000011, -122.974...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>United States of America</td>\n",
       "      <td>USA</td>\n",
       "      <td>(POLYGON ((-122.84 49.00000000000011, -120 49....</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                       Country Code  \\\n",
       "0                         Fiji  FJI   \n",
       "1  United Republic of Tanzania  TZA   \n",
       "2               Western Sahara  SAH   \n",
       "3                       Canada  CAN   \n",
       "4     United States of America  USA   \n",
       "\n",
       "                                            geometry  \n",
       "0  (POLYGON ((180 -16.06713266364245, 180 -16.555...  \n",
       "1  POLYGON ((33.90371119710453 -0.950000000000000...  \n",
       "2  POLYGON ((-8.665589565454809 27.65642588959236...  \n",
       "3  (POLYGON ((-122.84 49.00000000000011, -122.974...  \n",
       "4  (POLYGON ((-122.84 49.00000000000011, -120 49....  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# read shapes\n",
    "geo = gpd.read_file(\"ne_110m_admin_0_countries.shp\")[['ADMIN', 'ADM0_A3', 'geometry']]\n",
    "geo.columns = ['Country', 'Code', 'geometry']\n",
    "geo.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we display the map now, we will see that Antarctica takes up a lot of space. Since we don't have data on it, let's drop it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:18.453802Z",
     "start_time": "2019-06-09T16:40:18.449110Z"
    }
   },
   "outputs": [],
   "source": [
    "geo = geo.loc[~(geo['Country'] == 'Antarctica')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A closer look at the obesity data shows that values are missing for some countries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:18.465472Z",
     "start_time": "2019-06-09T16:40:18.455600Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['San Marino', 'Sudan', 'Monaco', 'South Sudan'], dtype=object)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[data[\"Prevalence\"].isna()][\"Country\"].unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the case of Sudan, it's more of a labelling problem, because Sudan was split into two separate countries in 2011."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:18.481996Z",
     "start_time": "2019-06-09T16:40:18.467460Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Sudan (former)', 'Sudan', 'South Sudan'], dtype=object)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[data[\"Country\"].str.contains(\"Sudan\")][\"Country\"].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:18.494472Z",
     "start_time": "2019-06-09T16:40:18.483931Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Sudan', 'South Sudan'], dtype=object)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "geo[geo[\"Country\"].str.contains(\"Sudan\")][\"Country\"].unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the current version of the dataset, only \"Sudan (former)\" contains data, but our version of the map only has the two independent states, not the former one. We will simply copy the data from \"Sudan (former)\" into both new countries and drop the former."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:18.723627Z",
     "start_time": "2019-06-09T16:40:18.496054Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Year</th>\n",
       "      <th>Code</th>\n",
       "      <th>Country</th>\n",
       "      <th>Prevalence</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1632</th>\n",
       "      <td>2016</td>\n",
       "      <td>SSD</td>\n",
       "      <td>South Sudan</td>\n",
       "      <td>8.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4848</th>\n",
       "      <td>2016</td>\n",
       "      <td>SDN</td>\n",
       "      <td>Sudan</td>\n",
       "      <td>8.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Year Code      Country  Prevalence\n",
       "1632  2016  SSD  South Sudan         8.6\n",
       "4848  2016  SDN        Sudan         8.6"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# for each year, copy the value of \"Sudan (former)\" into both successor states\n",
    "for year in data[\"Year\"].unique():\n",
    "    data.loc[(data[\"Country\"].isin([\"Sudan\",\"South Sudan\"])) & (data[\"Year\"] == year), \n",
    "         \"Prevalence\"] = data[(data[\"Country\"] == \"Sudan (former)\") & (data[\"Year\"] == year)][\"Prevalence\"].values[0]\n",
    "data = data.loc[~(data['Country'] == 'Sudan (former)')]\n",
    "data[(data[\"Country\"].str.contains(\"Sudan\")) & (data[\"Year\"] == 2016)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:18.768492Z",
     "start_time": "2019-06-09T16:40:18.725228Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Code</th>\n",
       "      <th>geometry</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>176</th>\n",
       "      <td>South Sudan</td>\n",
       "      <td>SSD</td>\n",
       "      <td>POLYGON ((30.83385242171543 3.509171604222463,...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Country Code                                           geometry\n",
       "176  South Sudan  SSD  POLYGON ((30.83385242171543 3.509171604222463,..."
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The geographic data uses the code \"SDS\" for South Sudan, whereas the obesity data uses \"SSD\": align them\n",
    "geo.loc[geo[\"Code\"]==\"SDS\", \"Code\"] = \"SSD\"\n",
    "geo[geo[\"Code\"]==\"SSD\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Preparing the plot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's create the interactive plot. We will use the 'bokeh' and 'matplotlib' libraries for this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:19.243145Z",
     "start_time": "2019-06-09T16:40:18.770241Z"
    }
   },
   "outputs": [],
   "source": [
    "from bokeh.io import save, show, output_file, output_notebook, reset_output, export_png\n",
    "from bokeh.plotting import figure\n",
    "from bokeh.models import (\n",
    "    GeoJSONDataSource, ColumnDataSource, ColorBar, Slider, Spacer,\n",
    "    HoverTool, TapTool, Panel, Tabs, Legend, Toggle, LegendItem,\n",
    ")\n",
    "from bokeh.palettes import brewer\n",
    "from bokeh.models.callbacks import CustomJS\n",
    "from bokeh.models.widgets import Div\n",
    "from bokeh.layouts import widgetbox, row, column\n",
    "from matplotlib import pyplot as plt\n",
    "from matplotlib.colors import rgb2hex"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first thing we need to do is to group our data into predefined bins. We will assign a color to each bin."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:19.295580Z",
     "start_time": "2019-06-09T16:40:19.244622Z"
    },
    "code_folding": [],
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Create bins to color each country\n",
    "bins = [0,2,5,10,15,20,25,30,100]\n",
    "# create stylish labels\n",
    "bin_labels = [f'≤{bins[1]}%'] + [f'{bins[i]}-{bins[i+1]}%' for i in range(1,len(bins)-2)] + [f'>{bins[-2]}%']\n",
    "# assign each row to a bin\n",
    "data['bin'] = pd.cut(\n",
    "    data['Prevalence'], bins=bins, right=True, include_lowest=True, precision=0, labels=bin_labels,\n",
    ").astype(str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:19.319190Z",
     "start_time": "2019-06-09T16:40:19.297462Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Western Sahara', 'Falkland Islands', 'Greenland',\n",
       "       'French Southern and Antarctic Lands', 'Puerto Rico', 'Palestine',\n",
       "       'New Caledonia', 'Taiwan', 'Northern Cyprus', 'Somaliland',\n",
       "       'Kosovo'], dtype=object)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Merge the geographic data with obesity data\n",
    "df = geo.merge(data, on='Code', how='left')\n",
    "df = df.drop(columns=\"Country_y\").rename(columns={\"Country_x\":\"Country\"})\n",
    "df[df[\"Prevalence\"].isna()][\"Country\"].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:19.335127Z",
     "start_time": "2019-06-09T16:40:19.321603Z"
    }
   },
   "outputs": [],
   "source": [
    "# Add a 'No data' bin for countries without obesity data\n",
    "df.loc[df['Prevalence'].isna(), 'bin'] = 'No data'\n",
    "df.fillna('No data', inplace = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:19.353401Z",
     "start_time": "2019-06-09T16:40:19.337840Z"
    }
   },
   "outputs": [],
   "source": [
    "# Define a yellow to red color palette\n",
    "palette = brewer['YlOrRd'][len(bins)-1]\n",
    "# Reverse color order so that dark red corresponds to highest obesity\n",
    "palette = palette[::-1]\n",
    "\n",
    "# Assign obesity prevalence to a color\n",
    "def val_to_color(value, nan_color='#d9d9d9'):\n",
    "    if isinstance(value, str): return nan_color\n",
    "    for i in range(1,len(bins)):\n",
    "        if value <= bins[i]:\n",
    "            return palette[i-1]\n",
    "df['color'] = df['Prevalence'].apply(val_to_color)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since Bokeh doesn't have an interactive colorbar, we will create one by plotting rectangles on a figure. This is a bit cumbersome because we need to define x coordinates and a width for each bin in our data, but I find the interactive colorbar to be very useful."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:19.374823Z",
     "start_time": "2019-06-09T16:40:19.355901Z"
    }
   },
   "outputs": [],
   "source": [
    "# assign x coordinates\n",
    "def bin_to_cbar_x(value):\n",
    "    if value == 'No data': return -2\n",
    "    for i,b in enumerate(bin_labels):\n",
    "        if value == b:\n",
    "            return 5*(i+1)\n",
    "df['cbar_x'] = df['bin'].apply(bin_to_cbar_x)\n",
    "# assign width\n",
    "df['cbar_w'] = df['Prevalence'].apply(lambda x: 5 if x == 'No data' else 4.7)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will also add a second figure which displays the evolution of each country's obesity rate. We need to define another color palette for this, with one color per country."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:19.395974Z",
     "start_time": "2019-06-09T16:40:19.376973Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "165 countries to plot\n"
     ]
    }
   ],
   "source": [
    "# create color palette for the graph\n",
    "countries = sorted(df[df[\"bin\"] != \"No data\"][\"Country\"].unique())\n",
    "n_country = len(countries)\n",
    "print(\"%d countries to plot\" % n_country)\n",
    "cmap = plt.get_cmap('gist_ncar', n_country)\n",
    "country_palette = [rgb2hex(cmap(i)[:3]) for i in range(cmap.N)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Plotting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now all that is left to do is to create the different objects that Bokeh will display. Let's start with the data sources. We will define which year to display on the map first, as well as which country to show on the graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:19.399879Z",
     "start_time": "2019-06-09T16:40:19.397319Z"
    }
   },
   "outputs": [],
   "source": [
    "# define the output file\n",
    "reset_output()\n",
    "output_file(\"obesity-trends.html\", title=\"Obesity trends\", mode=\"inline\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:23.477821Z",
     "start_time": "2019-06-09T16:40:19.401138Z"
    }
   },
   "outputs": [],
   "source": [
    "# Input sources\n",
    "df.sort_values(by=[\"Country\",\"Year\"], inplace=True)\n",
    "# source that will contain all necessary data for the map\n",
    "geosource = GeoJSONDataSource(geojson=df.to_json())\n",
    "# source that contains the data that is actually shown on the map (for a given year)\n",
    "displayed_src = GeoJSONDataSource(geojson=df[df['Year'].isin(['No data', 1975])].to_json())\n",
    "# source that will be used for the graph (we don't need the countries shapes for this)\n",
    "country_source = ColumnDataSource(df[df['Country'] == \"France\"].drop(columns=[\"geometry\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we define the tools displayed with our map and graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:23.483041Z",
     "start_time": "2019-06-09T16:40:23.478975Z"
    }
   },
   "outputs": [],
   "source": [
    "# Tools\n",
    "\n",
    "# slider to select the year\n",
    "slider = Slider(title='Year',start=1975, end=2016, step=1, value=1975)\n",
    "\n",
    "# hover tool for the map\n",
    "map_hover = HoverTool(tooltips=[ \n",
    "    ('Country','@Country (@Code)'),\n",
    "    ('Obesity rate (%)', '@Prevalence')\n",
    "])\n",
    "\n",
    "# hover tool for the graph\n",
    "graph_hover = HoverTool(tooltips=[ \n",
    "    ('Country','@Country (@Code)'),\n",
    "    ('Obesity rate (%)', '@Prevalence'),\n",
    "    ('Year', '@Year')\n",
    "])\n",
    "\n",
    "# button for the animation\n",
    "anim_button = Toggle(label=\"▶ Play\", button_type=\"success\", width=50, active=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's create the plot!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:23.512692Z",
     "start_time": "2019-06-09T16:40:23.484294Z"
    }
   },
   "outputs": [],
   "source": [
    "# create map figure\n",
    "p = figure(\n",
    "    title = 'Share of adults who are obese in 1975', \n",
    "    plot_height=550 , plot_width=1100, \n",
    "    toolbar_location=\"right\", tools=\"tap,pan,wheel_zoom,box_zoom,save,reset\", toolbar_sticky=False,\n",
    "    active_scroll=\"wheel_zoom\",\n",
    ")\n",
    "p.title.text_font_size = '16pt'\n",
    "p.xgrid.grid_line_color = None\n",
    "p.ygrid.grid_line_color = None\n",
    "p.axis.visible = False\n",
    "\n",
    "# Add hover tool\n",
    "p.add_tools(map_hover)\n",
    "\n",
    "# Add patches (countries) to the figure\n",
    "patches = p.patches(\n",
    "    'xs','ys', source=displayed_src, \n",
    "    fill_color='color',\n",
    "    line_color='black', line_width=0.25, fill_alpha=1, \n",
    "    hover_fill_color='color',\n",
    ")\n",
    "# outline when we hover over a country\n",
    "patches.hover_glyph.line_color = '#3bdd9d'\n",
    "patches.hover_glyph.line_width = 3\n",
    "patches.nonselection_glyph = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:23.604836Z",
     "start_time": "2019-06-09T16:40:23.515127Z"
    }
   },
   "outputs": [],
   "source": [
    "# create the interactive colorbar\n",
    "p_bar = figure(\n",
    "    title=None, plot_height=80 , plot_width=600, \n",
    "    tools=\"tap\", toolbar_location=None\n",
    ")\n",
    "p_bar.xgrid.grid_line_color = None\n",
    "p_bar.ygrid.grid_line_color = None\n",
    "p_bar.outline_line_color = None\n",
    "p_bar.yaxis.visible = False\n",
    "\n",
    "# set the title and ticks of the colorbar\n",
    "p_bar.xaxis.axis_label = \"% Obesity (BMI ≥ 30)\"\n",
    "p_bar.xaxis.ticker = sorted(df['cbar_x'].unique())\n",
    "p_bar.xaxis.major_label_overrides = dict([(i[0],i[1]) for i in df.groupby(['cbar_x','bin']).describe().index])\n",
    "p_bar.xaxis.axis_label_text_font_size = \"12pt\"\n",
    "p_bar.xaxis.major_label_text_font_size = \"10pt\"\n",
    "\n",
    "# activate the hover but hide tooltips\n",
    "hover_bar = HoverTool(tooltips=None)\n",
    "p_bar.add_tools(hover_bar)\n",
    "\n",
    "# plot the rectangles for the colorbar\n",
    "cbar = p_bar.rect(x='cbar_x', y=0, width='cbar_w', height=1, \n",
    "    color='color', source=displayed_src,\n",
    "    hover_line_color='#3bdd9d', hover_fill_color='color')\n",
    "\n",
    "# outline when we hover over the colorbar legend\n",
    "cbar.hover_glyph.line_width = 4\n",
    "cbar.nonselection_glyph = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:24.447675Z",
     "start_time": "2019-06-09T16:40:23.606643Z"
    }
   },
   "outputs": [],
   "source": [
    "# create the graph figure\n",
    "p_country = figure(\n",
    "    title=\"Evolution of obesity\", plot_height=700 , plot_width=1100, \n",
    "    tools=\"pan,wheel_zoom,save\", active_scroll=\"wheel_zoom\", toolbar_location=\"right\",\n",
    ")\n",
    "p_country.title.text_font_size = '14pt'\n",
    "p_country.xaxis.axis_label = \"Year\"\n",
    "p_country.yaxis.axis_label = \"Obesity rate (%)\"\n",
    "p_country.axis.major_label_text_font_size = \"12pt\"\n",
    "p_country.axis.axis_label_text_font_size = \"14pt\"\n",
    "\n",
    "# plot data on the figure\n",
    "line_plots = {}\n",
    "legend_items = {}\n",
    "for i, country in enumerate(countries):\n",
    "    # get subset of data corresponding to a country\n",
    "    country_source = ColumnDataSource(df[df['Country'] == country].drop(columns=[\"geometry\"]))\n",
    "    # plot\n",
    "    line = p_country.line(\"Year\", \"Prevalence\", legend=False, source=country_source, \n",
    "                      color=country_palette[i], line_width=2)\n",
    "    circle = p_country.circle(\"Year\", \"Prevalence\", legend=False, source=country_source, \n",
    "                          line_color=\"darkgrey\", fill_color=country_palette[i], size=8)\n",
    "    # used later in the interactive callbacks\n",
    "    line_plots[country] = [line, circle]\n",
    "    legend_items[country] = LegendItem(label=country, renderers=[line, circle])\n",
    "    # only display France at first\n",
    "    if country != \"France\":\n",
    "        line.visible = False\n",
    "        circle.visible = False\n",
    "\n",
    "default_legend = [\n",
    "    (\"France\", line_plots[\"France\"]),\n",
    "]\n",
    "legend = Legend(items=default_legend, location=\"top_center\")\n",
    "legend.click_policy = \"hide\"\n",
    "p_country.add_layout(legend, 'right')\n",
    "\n",
    "# Add hover tool\n",
    "p_country.add_tools(graph_hover)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The interactivity is implemented with JavaScript callbacks: they give much more freedom, and we won't need to run a Bokeh server to display the map."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:24.456683Z",
     "start_time": "2019-06-09T16:40:24.448931Z"
    },
    "code_folding": [],
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# JS callbacks\n",
    "\n",
    "# Update the map on slider change\n",
    "slider_callback = CustomJS(args=dict(slider=slider, source=geosource, displayed_src=displayed_src), code=\"\"\"\n",
    "    var year = slider.value;\n",
    "    var show = [year, 'No data'];\n",
    "    var data = {};\n",
    "    var columns = Object.keys(source.data);\n",
    "    columns.forEach(function(key) {\n",
    "        data[key] = [];\n",
    "    });\n",
    "    for (var i = 0; i < source.get_length(); i++){\n",
    "        if (show.includes(source.data['Year'][i])){\n",
    "            columns.forEach(function(key) {\n",
    "                data[key].push(source.data[key][i])\n",
    "            });\n",
    "        }\n",
    "    }\n",
    "    displayed_src.data = data;\n",
    "    displayed_src.change.emit();\n",
    "\"\"\")\n",
    "slider.js_on_change('value', slider_callback)\n",
    "\n",
    "# Update figure title from slider change\n",
    "callback_title = CustomJS(args=dict(slider=slider, figure=p), code=\"\"\"\n",
    "    var year = slider.value;\n",
    "    figure.title.text = 'Share of adults who are obese in ' + year;\n",
    "\"\"\")\n",
    "slider.js_on_change('value', callback_title)\n",
    "\n",
    "\n",
    "# Add callback on country click\n",
    "plot_callback = CustomJS(args=dict(\n",
    "    csource=country_source, source=geosource, displayed_src=displayed_src, line_plots=line_plots, legend=legend, legend_items=legend_items), code=\"\"\"\n",
    "    // only continue if a country was selected\n",
    "    var ixs = displayed_src.selected.indices;\n",
    "    if (ixs.length == 0) { return; }\n",
    "    \n",
    "    // init\n",
    "    var data = {};\n",
    "    var items = [];\n",
    "    countries = [];\n",
    "    columns = Object.keys(source.data);\n",
    "    columns.forEach(function(key) {\n",
    "        data[key] = [];\n",
    "    });\n",
    "    \n",
    "    // hide all plots\n",
    "    for (var country in line_plots) {\n",
    "        var line = line_plots[country][0];\n",
    "        var circle = line_plots[country][1];\n",
    "        line.visible = false;\n",
    "        circle.visible = false;\n",
    "    }\n",
    "    \n",
    "    // loop over the selected countries\n",
    "    ixs.forEach(function(ix) {\n",
    "        // identify corresponding country\n",
    "        country = displayed_src.data[\"Country\"][ix];\n",
    "        countries.push(country);\n",
    "    });\n",
    "    // sort them in order\n",
    "    countries.sort()\n",
    "    // display the corresponding glyphs and legend\n",
    "    countries.forEach(function(country) {\n",
    "        line = line_plots[country][0];\n",
    "        circle = line_plots[country][1];\n",
    "        line.visible = true;\n",
    "        circle.visible = true;\n",
    "        items.push(legend_items[country]);\n",
    "        \n",
    "        for (var i = 0; i < source.get_length(); i++){\n",
    "            if (source.data['Country'][i] == country) {\n",
    "                columns.forEach(function(key) {\n",
    "                    data[key].push(source.data[key][i])\n",
    "                });\n",
    "            }\n",
    "        }\n",
    "    });\n",
    "    legend.items = items;\n",
    "    csource.data = data;\n",
    "    csource.change.emit();\n",
    "\"\"\")\n",
    "displayed_src.selected.js_on_change('indices', plot_callback)\n",
    "\n",
    "# add animation\n",
    "update_interval = 500 # in ms\n",
    "anim_callback = CustomJS(args=dict(slider=slider, update_interval=update_interval), code=\"\"\"\n",
    "    var button = cb_obj;\n",
    "    if (button.active == true){\n",
    "        button.label = \"◼ Stop\";\n",
    "        button.button_type = \"danger\";\n",
    "        mytimer = setInterval(update_year, update_interval);           \n",
    "    } else {\n",
    "        button.label = \"▶ Play\";\n",
    "        button.button_type = \"success\";\n",
    "        clearInterval(mytimer);\n",
    "    }\n",
    "\n",
    "    function update_year() {\n",
    "        year = slider.value;\n",
    "        if (year < 2016) {\n",
    "            slider.value += 1;\n",
    "        } else {\n",
    "            slider.value = 1975;\n",
    "        }\n",
    "    }\n",
    "\"\"\")\n",
    "anim_button.callback = anim_callback"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we define the layout for all these elements. We will have 2 tabs, one for the map, and one for the chart."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:24.475205Z",
     "start_time": "2019-06-09T16:40:24.458064Z"
    }
   },
   "outputs": [],
   "source": [
    "# arrange display with tabs\n",
    "tab_map = Panel(title=\"Map\",\n",
    "    child=column(\n",
    "        p, # map\n",
    "        p_bar, # colorbar\n",
    "        row(widgetbox(anim_button), Spacer(width=10), widgetbox(slider)) # animation button and slider\n",
    "    ))\n",
    "tab_chart = Panel(title=\"Chart\", child=column(p_country))\n",
    "tabs = Tabs(tabs=[ tab_map, tab_chart ])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-06-09T16:40:26.784784Z",
     "start_time": "2019-06-09T16:40:24.476885Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# save the document and display it !\n",
    "footer = Div(text=\"\"\"\n",
    "Data: World Health Organization - Global Health Observatory</br >\n",
    "Author: <a href=\"https://cbouy.github.io\">Cédric Bouysset</a>\n",
    "\"\"\")\n",
    "layout = column(tabs, footer)\n",
    "show(layout)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "259.261px"
   },
   "toc_section_display": "block",
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}