Taking Data Journalism Seriously

May 16, 2017

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

This is a bit of a followup to our recent review of “Everybody Lies.”

While writing the review I searched the blog for mentions of Seth Stephens-Davidowitz, and I came across this post from last year, concerning a claim made by author J. D. Vance that “the middle part of America is more religious than the South.” This claim stunned me, given that I’d seen some of the statistics on the topic, and it turned out that Vance had been mistaken: he’d used some unadjusted numbers that were not directly comparable across regions of the country. It was an interesting statistical example, and interesting too in that claims made in data journalism, just like claims made in academic research, can get all sorts of uncritical publicity. People just trust the numbers, which makes sense in that it takes some combination of effort, subject-matter knowledge, and technical expertise to dig deeper and figure out what’s really going on.

How should we think about data journalism, an endeavor which might be characterized as “informal social science”?

Data journalism is a thing, it’s out there, and maybe it needs to be evaluated by the same standards we apply to published scholarly research. For example, this exercise in noise mining (a study on college basketball that appeared in the New York Times) is as bad as this Psychological Science paper on sports team performance. And then there’s data journalism done by academic researchers on holiday, as it were; wacky things like this. When I do data journalism I think it’s of the same high quality as my published work (except that it’s more likely to have some mistakes because it gets posted right away and hasn’t had the benefit of reviews), but I get the impression that other academics have different standards for newspaper articles and blog posts than for scholarly articles. One thing I like about Stephens-Davidowitz’s book is that it mixes results from different sources without privileging PPNAS or whatever.

Anyway, I don’t currently have any big picture regarding data journalism. I just think it’s important; it’s different from the sorts of social science research done in academia, business, and government; and we should be taking it seriously.

P.S. According to Wikipedia, J. D. Vance (author of the mistaken quote above about religiosity) is an “author and venture capitalist,” which connects us to another theme, that of silly statistics from clueless rich guys, of which my favorite remains this credulity-straining graph of “percentage of slaves or serfs in the world” from rich person Peter Diamandis. Wealthy people have no monopoly on foolishness, of course. But when a rich guy does believe passionately in some error, he might well have the connections to promulgate it widely. Henry Ford and Ron Unz come to mind.
