Some data science principles from Gelman, Rosling and me

March 6, 2015

(This article was originally published at Big Data, Plainly Spoken (aka Numbers Rule Your World), and syndicated at StatsBlogs.)

I discovered Hans Rosling's Gapminder work when I first started Junk Charts almost ten years ago, with this series of posts. So I was very excited to meet Hans yesterday at the Data, Children and Post-2015 Agenda Event hosted by the UNICEF Data and Analytics Section. And he gave a marvellous talk. I came away touched in equal parts by his humanity, his animated passion for his subject, and his insatiable desire to communicate.

Before I get to Hans, I should say that the event's host also made an impression. The UN has put a lot of effort into the Open Data movement. They revamped the website that hosts data from MICS (Multiple Indicator Cluster Surveys), which can be a good source of data for classes and projects. An older resource called DevInfo also appears to be very useful for data about the plight of children (link). The home page for UNICEF data is here.

Hans is a straight talker. And here are a few zingers from his talk.

It's a personality disorder for someone to be interested only in the data. Data is not enough.

He came back to this point at the end of the talk, pointing out that great work comes from people who understand the statistical reasoning and how the data is collected.

We don't need Big Data. We need Basic Data.

Here is an example of Basic Data, presented in the simplest possible way:


You have all the granular data and yet the majority of people continue to harbor myths about world social statistics. The above chart, for example, makes the point that in the last thirty years, Asia (which holds more than half of the world's population) has dramatically reduced fertility rates to reach the same level as the Americas. And yet, when Rosling quizzes his audience about world population growth, 80 to 90 percent still hold the impression that global population will continue to grow at either a linear or sub-linear rate.

Throughout the presentation, I noticed a further cleansing of his visual palette. This leads to another provocation:

The passion of the people plus Excel were all you need. You don't need fancy software.

He was talking about the Ebola crisis in Liberia, where he worked with locals to help measure and staunch the emerging epidemic. Many Western news outlets did not do enough homework and reported vastly inflated numbers during the course of the epidemic. As of yesterday, there are no known cases in Liberia. Hurray!

Saving the best for last. My favorite quote of the evening:

Big Data is a big bag of numerators without denominators.

This gets at the heart of the first C in OCCAM datasets: we are in desperate need of controls.
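
To make the numerator/denominator point concrete, here is a minimal Python sketch. The two regions and all of the counts are made up purely for illustration (they are not from Rosling's talk or any real dataset), and the helper name rate_per_1000 is my own. It shows how a raw count can point one way while the rate, once you divide by the population at risk, points the other.

```python
def rate_per_1000(events, population):
    """Turn a raw count (numerator) into a rate using a denominator."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * events / population


# Hypothetical counts for two hypothetical regions, for illustration only.
regions = {
    "Region A": {"events": 500, "population": 1_000_000},
    "Region B": {"events": 50, "population": 20_000},
}

# Compute the rate for each region from its numerator and denominator.
rates = {
    name: rate_per_1000(r["events"], r["population"])
    for name, r in regions.items()
}

for name, rate in rates.items():
    count = regions[name]["events"]
    print(f"{name}: {count} events, {rate:.1f} per 1,000 people")
```

In this made-up example, Region A has ten times as many events as Region B, but Region B's rate is five times higher. A bag of numerators alone would send you to the wrong region.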


Meanwhile, Gelman found an elegant way to describe the mentality of statisticians:



I talk about Big Data, statistics education, and business analytics in the second part of my interview with KDnuggets, which has just come online. See here.

I argue that introductory statistics should be taught as a liberal arts course. Reflecting on Rosling's disappointment that the majority of highly-educated people are so ignorant of basic world facts, I also wonder whether the education sector will find a way to teach students these facts. Thinking back to my own college days, the introductory courses in statistics, economics, psychology, etc., were great at training me how to think theoretically, but none bothered to connect the theories with any real-world statistics! Here is a past post about the dearth of a Census 101 class.

One of the questions I pose most frequently to my team members is: Do you think or do you know? In the spirit of stacking this post with quotes, I offer:

    Thinking comes before knowing but knowing doesn't come from thinking.

The first part of the KDnuggets interview is here.





