Do we trust these data on political news consumption?

September 11, 2017

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

Mark Palko writes:

The Monkey Cage just retweeted this but some of the numbers look funny.

“This” is a paper, “The Myth of Partisan Selective Exposure: A Portrait of the Online Political News Audience,” by Jacob Nelson and James Webster, who write:

We explore observed online audience behavior data to present a portrait of the actual online political news audience. We find that this audience frequently navigates to news sites from Facebook, and that it congregates among a few popular, well-known political news sites. We also find that political news sites comprise ideologically diverse audiences, and that they share audiences with nearly all smaller, more ideologically extreme outlets. Our results call into question the strength of the so-called red/blue divide in actual web use.

But Palko is skeptical, in particular pointing to the graph below:

Palko writes:

Yes, the claims are somewhat counterintuitive, but that wouldn’t bother me so much if the rest of the data didn’t look so screwy. In particular, the average total minutes per visitor per month graph strongly suggests some kind of math or data collection error. It’s not just the size of the spikes at Drudge and Google News search (though those are troubling); it’s their placement.

If we were seeing huge spikes for sites with relatively small but loyal readerships (particularly with longform content), that would be understandable, but this graph shows pretty much the opposite. Large, heavy traffic sites inevitably have a significant portion of casual users and one-time drop-ins. These ought to drag down the average. This makes me suspicious of the Drudge Report numbers (Drudge has huge traffic) and even more so of Google News (surely there are a lot of people who very occasionally spend 30 seconds to a minute looking for stories on some topic).

Drilling down a bit further, the fifth highest average time goes to Bloomberg, another high traffic site that must have a significant portion of casual readers. Compare that to Breitbart, a site known for fanatical followers. I would expect the average Breitbart reader to have a much higher monthly total than the average Bloomberg reader. Likewise the Hill, Mother Jones, Vox and the New Yorker.

I exchanged some tweets with one of the authors. He conceded that the Drudge numbers look strange, but said they had great confidence in their data, and offered to discuss the paper further. He hasn’t replied to any of my tweets since then.

I feel like I’m missing the obvious here, but these numbers seem to run completely counter to what I’d expect.
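
To make the arithmetic behind Palko's plausibility check concrete, here is a minimal sketch of how a large pool of casual visitors pulls down a site's average minutes per visitor. All the visitor counts and minutes are hypothetical, chosen only to illustrate the argument; none of them come from the paper or its data, and the function name is made up for this example.

```python
def average_minutes(segments):
    """Average total minutes per visitor per month.

    `segments` is a list of (visitor_count, minutes_per_visitor) pairs,
    one pair per type of visitor.
    """
    total_visitors = sum(count for count, _ in segments)
    total_minutes = sum(count * minutes for count, minutes in segments)
    return total_minutes / total_visitors


# Hypothetical numbers, for illustration only (not from the paper).
# A small site with only a loyal readership:
niche_site = [(10_000, 60.0)]

# A high-traffic site with the same loyal core plus many one-time drop-ins
# who each spend about a minute:
big_site = [(10_000, 60.0), (990_000, 1.0)]

print(average_minutes(niche_site))  # 60.0
print(average_minutes(big_site))    # about 1.59
```

Under these made-up numbers the loyal core is identical on both sites, yet the high-traffic site's average falls from 60 minutes to under 2. That is the pattern Palko would expect, and the reverse of what the graph appears to show for Drudge and Google News.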

The snappy nature of Twitter makes it difficult to have a good discussion about something like data quality, so I think it’s better to have the discussion here on the blog.

In discussing this example, let me emphasize that I’ve not looked at these data at all. Palko also wanted to say that he does not have expertise here; he’s just a casual observer doing a quick plausibility check of the data.

So I’m interested to hear the story, and I hope that a clarification of the quality of these data can be helpful in moving forward our understanding of the important topic of the consumption of political news.
