There are humans behind those stats

Last night, I presented some visualizations to a local user group at Wake Forest University. Several of them were modern remakes of classic work by William Playfair (below, the original and my remake), John Snow (the cholera map) and Charles Joseph Minard (the march on Moscow).

William Playfair (exports and imports)
Remake of the previous chart by Francois Dion

Another one was a remake of a much more recent visualization, one that was featured in a 2006 TED talk by Hans Rosling, built with software called Gapminder. However, it was the data and the storytelling, far more than the software, that made the talk memorable. So much so that it influenced the field of visualization for years and showed many practitioners how effectively statistics can be communicated.

In my own presentation last night, the main goal was to show how easily this interactive visualization can be recreated using modern dashboarding and plotting libraries.

But that was not the end of it. I wanted to add a video to the code repository, so that people who didn't attend the talk could get an idea of what the visualization and the interaction looked like. I sat in my office and got to work, completely focused on the screen. About 20 seconds into the recording, something really unusual happened. Watch the 30-second clip (you might want to watch it fullscreen):

Did you notice what happened in 1994? Why was the Y axis (life expectancy) suddenly showing its origin (0 years)? Surely that couldn't be right, or could it? Right away, one of Tukey's sayings came to mind:

"The greatest value of a picture is when it forces us to notice what we never expected to see." - John W. Tukey

I had gone over the Gapminder data many times, my view no doubt shaped by Rosling's presentation. Had I not decided to make the graph interactive instead of letting it autoplay, I might never have noticed this outlier.

In a different talk ("Seeking Exotics"; the notebooks are on GitHub), machine learning based anomaly detection with Isolation Forest, applied to fertility rate data, had flagged Timor-Leste as having an unusual population pattern caused by war. But this was on a totally different scale...

Rwanda, 1994

It was not a natural disaster that struck this particular country. The moment I hovered over this dot and saw Rwanda, I knew what I was looking at: the effect of the civil war and of the genocide that took place that year. Between 1964 and 2003, no other catastrophic event, man-made or not, had such a traumatic impact on a population's life expectancy.

And the moment I hovered over this dot and saw Rwanda, I saw the human beings behind those stats.

As a parting thought, in the wake of so many recent hurricanes and earthquakes, it is good to keep in mind that there are humans behind those statistics too.

Learn more

The dataset, Jupyter notebooks, various web apps, the Gapminder remake and many links from the talk (including a link to Hans Rosling's original TED talk) are all available in the GitHub repository here:

Francois Dion

Chief Data Scientist