Failure of failure to replicate

April 11, 2018

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

Dan Kahan tells this story:

Too much here to digest probably, but the common theme is—what if people start saying their work “replicates” or “fails to replicate” when the studies in question are massively underpowered &/or have significantly different design (& sample) from target study?

1. Kahan after discovering that authors claim my study “failed to replicate”:
On Thu, Aug 10, 2017 at 6:37 PM, Dan Kahan wrote:
Hi, Steve & Cristina.
So predictably, people are picking up on your line that “[you] failed to replicate Kahan et al.’s ‘motivated numeracy effect.’ ”
As we have discussed, your study differed from ours in various critical respects, including N & variance of sample in numeracy & ideology. I think it is misleading to say one found no “replication” when study designs differ.  All the guidelines on replication make this point.

2. Them–acknowledging this point

co-author 2

Hi Dan,

If we didn’t, we should have said “conceptual replication.” I certainly agree we didn’t fail to replicate any specific study of yours. And we could have had a bigger N and more conservatives. That’s why we haven’t tried to publish the work in a journal, just a conference proceedings. But, as appealing as the hypothesis is, Cristina’s work does leave me with less faith in the general rule that more numerate people engage in more motivated reasoning using the contingency table task.

best, s

lead author:
Hi Dan,
I agree– we should have used a phrase other than “replication” in describing those parts of the results. 
To add, I tried to make it clear in our poster presentation, as well as in our paper, that the effect of reasons we found was not predicated on the existence of the motivated numeracy effect. And I explicitly noted that this null result was likely attributable to the differences between the two studies– in fact, many people I talked to pointed out the difference in N and the differences in variance on their own.

3. Kahan—trying to figure out how they can acknowledge somewhere that their studies aren’t commensurable w/ ours & it was a mistake to assert otherwise

Hi, Steve & Cristina.

Thanks for reflecting on my email & responding so quickly.
I am left, however, with the feeling that your willingness to acknowledge my points in private correspondence doesn’t address my objection to the fairness of what you have done in an open scholarly forum.
You have “published” your paper in the proceedings collection. The abstract of your paper states expressly “we failed to replicate Kahan et al.’s ‘motivated numeracy effect.’ ” In the text you state that “you attempted to replicate” our study and “failed to find a significant effect of motivated numeracy.”
The perfectly foreseeable result is that readers are now treating your study as a “failed replication” attempt, notwithstanding your acknowledgement to me that such a conclusion “clearly,” “definitely” isn’t warranted. Expecting them to “figure this out” for themselves isn’t realistic given the limited time & attention span of casual readers, and the lure of the abstract.
I think the fair thing to do would be to remove the references to “failed replication” and to acknowledge in the paper that your design — because of the N and because of the lack of variance in ideology & numeracy in the study subjects — was not suited for testing the replicability of ours.
Anything short of this puts me in the position of bearing the burden of explaining away your expressly stated conclusion that our study results “didn’t replicate.” Because my advocacy would be discounted as self-serving, I would suffer an obvious rhetorical handicap. What’s more, I’d be forced to spend considerable time on this at the expense of other projects I am working on.
Avoiding such unfairness informs the protocols for replication that various scholars and groups of scholars have compiled and that guided the Science article. I’m sure you agree that this is good practice & hope you will accommodate me & my co-authors on this.

4. Co-author tells me I should feel “honored” that they examined my work & otherwise shut up; also, “replication” carries no special meaning that justifies my focus on it…

Dear Dan,
I will speak for myself, not Cristina.
You seem to have misunderstood my email. I am not taking back our claim that we failed to replicate. What I said is that I admitted that we could have characterized it as a failure of a “conceptual replication.” This is still a type of replication. We were testing an hypothesis we derived from your paper, we used a similar experimental procedure though a wildly smaller N, which we tried to counterbalance by giving each subject more tasks to do. So we had more data than you per subject. We also only tested half your hypothesis in the sense that we didn’t have many conservatives. Nevertheless, we fully expected to see the same pattern of results that you found. But we didn’t; we found the opposite. We were surprised and disappointed but nevertheless decided to report the data in a public forum. I stand by our report even if you don’t like one of our verbs.
Even if we wanted to, we couldn’t deliver on your request. The proceedings have been published. It’s too late to change them. But the fact is that I wouldn’t want to change them anyway. Yes, we could have added the word “conceptual” in a couple of places. But that wouldn’t change the gist of the story. There are failures to replicate all the time. Ours is a minor study, reported in a minor venue. If people challenge you because of it, I’m sure you’re smart enough and have enough data to meet the challenge. I think you should consider it an honor that we took the time and made the effort to look at one boundary of your effect. If you feel strongly about it, then feel free to go out and explain why our data look the way they do. Simply saying our N was too small and our population too narrow explains nothing. We found some very systematic effects, not just random noise.
all the best, steve

Just to interrupt here, I agree with Dan that this seems wrong.  Earlier, Steve had written, “I certainly agree we didn’t fail to replicate any specific study of yours,” and Cristina had written, “we should have used a phrase other than ‘replication’ in describing those parts of the results.”  But now Steve is saying:

I stand by our report even if you don’t like one of our verbs.

I guess the verb here is “replicate”—but at the very least it’s not just Dan who doesn’t think that word is appropriate.  It’s also Cristina, the first author of the paper!

The point here is not to do some sort of gotcha or to penalize Cristina in any way for being open in an email. Rather, it’s the opposite: the point is that Kahan is offering to help Steve and Cristina out by giving them a chance to fix a mistake they’d made, just as, earlier, Steve and Cristina were helping Dan out by doing conceptual replications of his work. It seems that those conceptual replications may have been too noisy to tell us much, but that’s fine too; we have to start somewhere.

OK, back to Dan:

5. So Dan writes attached paper w/ co-author:
Futile gesture, no doubt.
Kahan concludes:
This is a case study in how replication can easily go off the rails. The same types of errors people make in non-replicated papers will now be made in replications.