(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)
Mark Palko writes:
I can understand the appeal of the cutting edge. The new stuff is sexier. It gets people’s attention. The trouble is, those cutting edge studies often collapse under scrutiny. Some can’t be replicated. Others prove to be not that important.
Confirmation, on the other hand, is not sexy. It doesn’t drive traffic. It’s harder to fit into a paragraph. In a way, though, it’s more interesting because it has a high likelihood of being true and fills in the gaps in big, important questions. The interaction between the ideas is usually the interesting part.
In this particular example, Palko is telling the story of a journalist who reports a finding as new when it is essentially a replication of decades-old work. Palko’s point is not that there’s anything wrong with replication but rather that the journalist seems to feel it necessary to report the idea as new and cutting-edge, even when it falls within a long tradition. (Also, Palko is not claiming that this newly published work is unoriginal, merely that it is more valuable when seen in the context of the earlier research.)
Palko’s observations fit into a topic that’s been coming up a lot in this blog (as well as in statistical discussions more generally) in recent years:
- Lots of iffy studies are published every year in psychology, medicine, biology, etc. For reasons explained by Uri Simonsohn and others, it’s possible to get tons of publishable (i.e., “statistically significant”) results out of noise. Even some well-respected work turns out to be quite possibly wrong.
- It would be great if studies were routinely replicated. In medicine there are ethical concerns, but in biolab or psychology experiments, why not? What if first- and second-year grad students in these fields were routinely required to conduct replications of well-known findings? There are lots of grad students out there, and we’d soon get a big N on all these questionable claims—at least those that can be evaluated by collecting new data in the lab.
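To see what that “big N” buys, here is a minimal sketch (the effect size, sample size, and number of replications are made-up illustrative numbers): pooling K independent replications of the same experiment shrinks the standard error of the combined estimate by a factor of sqrt(K).

```python
import math
import random

random.seed(1)

# Hypothetical numbers for illustration only
true_effect = 0.1   # assumed small true effect
n, K = 50, 20       # subjects per replication, number of replications

def one_replication():
    # Sample mean of n noisy observations (unit-variance noise)
    return sum(random.gauss(true_effect, 1) for _ in range(n)) / n

estimates = [one_replication() for _ in range(K)]
pooled = sum(estimates) / K

se_single = 1 / math.sqrt(n)      # standard error of one replication
se_pooled = 1 / math.sqrt(n * K)  # standard error of the pooled estimate

# Pooling K replications shrinks the standard error by sqrt(K)
print(round(se_single, 3), round(se_pooled, 3))
```

With twenty replications, the pooled estimate is roughly 4.5 times more precise than any single study, which is the statistical payoff of routinely re-running well-known experiments.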
- As many have noted, it’s hard to publish a replication. But now we can have online journals. Why not a Journal of Replication? We also need to ditch the system where people are expected to anonymously review papers for free, but that’s not such a big deal: we could pay reviewers (using the money that otherwise would go to the executives at Springer etc.) and also move to an open post-publication review system (that is, the journal looks something like a blog, with space to comment on any article). Paying reviewers might sound expensive, but peer review is part of the scientific process. It’s worth paying for.
- There’s also the Bayesian point that surprising claims are likely to be wrong. Journalists like to report “man bites dog” rather than “dog bites man,” but when you look into some of those “man bites dog” stories, they’re not actually true. I don’t see a resolution for this one.
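The base-rate arithmetic behind that Bayesian point can be sketched in a few lines (the prior probabilities, power, and significance threshold here are made-up illustrative numbers, not anything from the post):

```python
def prob_true_given_significant(prior, power=0.8, alpha=0.05):
    """P(claim is true | statistically significant result), by Bayes' rule.

    prior: prior probability the hypothesized effect is real
    power: P(significant | effect is real)
    alpha: P(significant | effect is absent), the false-positive rate
    """
    p_significant = prior * power + (1 - prior) * alpha
    return prior * power / p_significant

# "Dog bites man": an unsurprising claim with a moderate prior
print(round(prob_true_given_significant(0.5), 2))   # → 0.94

# "Man bites dog": a surprising claim with a low prior
print(round(prob_true_given_significant(0.01), 2))  # → 0.14
```

Same evidence, same significance threshold, but the surprising claim is still probably false after a significant result, simply because it started out so unlikely.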
- Everybody’s talking about the problem of false claims in the scientific literature. People are talking about it much more now than they were twenty years ago. Back then, we thought of publication bias as a minor nuisance, but now we see it as part of a big picture of the breakdown of the culture of science.
- I think statistics needs to move beyond the paradigm of analyzing studies or datasets one at a time. I doubt this even made sense in the days of R. A. Fisher. I’m guessing that back in the day at Rothamsted Experimental Station or the experimental farm in Ames, Iowa, each experiment was part of a long thread of trial and error (perhaps Steve Stigler can supply more details on this). But somewhere along the way came the idea that each little experiment was supposed to come in a box, nicely tied up and statistically significant.
P.S. The picture above is the first item that came up in a google image search on *confirmation is not sexy*.
Please comment on the article here: Statistical Modeling, Causal Inference, and Social Science