Just forget the Type 1 error thing.

John Christie writes:

I was reading this paper by Habibnezhad, Lawrence, & Klein (2018) and came across the following footnote:

In a research program seeking to apply null-hypothesis testing to achieve one-off decisions with regard to the presence/absence of an effect, a flexible stopping-rule would induce inflation of the Type I error rate. Although our decision to double the N from 20 to 40 to reduce the 95% CI is not such a flexible stopping rule, it might increase the Type I error rate. That noted, we are not proposing any such one-off decisions, but instead seek to contribute to the cumulative evidence of the scientific process. Those seeking such decisions may consider the current report exploratory rather than confirmatory. (fn 2)

Given the recent strong recommendations by many against adding participants after looking at the result, I wonder if you feel the footnote is sufficient or if you wanted to comment on it on your blog.

My quick reply is that I hate this type 1 error thing.

Let me explain in the context of a simple example. Consider two classical designs:

1. N=20 experiment

2. N=2,000,000 experiment.

Both of these have “type 1 error rates” of 0.05, but experiment #2 will be much more likely to give statistical significance, because the null hypothesis of zero effect and zero systematic error is always false and a huge sample will pick up even a tiny departure from it. Who cares about the type 1 error rate? I don’t.
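Here is a rough sketch of that comparison, not anything from the paper. It assumes a two-sided z-test on a mean with known standard deviation, and the "tiny" true effect of 0.01 standard deviations is a made-up number chosen only for illustration; the function name `rejection_probability` is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rejection_probability(true_effect, n, sd=1.0, alpha=0.05):
    """Probability that a two-sided z-test of 'mean = 0' rejects at level alpha.

    Assumes n independent observations with known standard deviation sd.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The z-statistic is normal with this mean and unit variance.
    shift = true_effect * sqrt(n) / sd
    lower_tail = std_normal.cdf(-z_crit - shift)
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    return lower_tail + upper_tail


for n in (20, 2_000_000):
    under_null = rejection_probability(0.0, n)    # the nominal type 1 error rate
    tiny_effect = rejection_probability(0.01, n)  # assumed effect of 0.01 sd
    print(f"N={n:>9,}: P(reject | effect=0) = {under_null:.3f}, "
          f"P(reject | effect=0.01 sd) = {tiny_effect:.3f}")
```

Under the exact null the two designs reject equally often, by construction. Under the assumed tiny effect, the N=20 design rejects barely more often than under the null, while the N=2,000,000 design rejects almost every time.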

To put it another way: it’s completely fine to add participants after looking at the result. The goal should not be to get “statistical significance” or to get 95% intervals that exclude zero or whatever. Once you forget that, you can move forward.

But now let’s step back and consider the motivation for type 1 error control in the first place. The concern is that if you don’t control type 1 error, you’ll routinely jump to conclusions. I’d prefer to frame this in terms of type M (magnitude) errors, where an estimate exaggerates the size of an effect, and type S (sign) errors, where it gets the direction wrong. I think the way to avoid jumping to unwarranted conclusions is by making each statement stand up on its own. To put it another way, I have no problem presenting a thousand 95% intervals, under the expectation that about 50 of them will not contain the true value.
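The thousand-intervals point can be checked with a small simulation. This is only a sketch under assumptions that are not in the post: estimates are normally distributed around their true values with known standard errors, and the distributions used to draw the true values and standard errors are arbitrary. The function name `count_misses` is hypothetical.

```python
import random
from statistics import NormalDist


def count_misses(n_intervals=1000, level=0.95, seed=1):
    """Count how many simulated intervals fail to contain their true value."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95% intervals
    misses = 0
    for _ in range(n_intervals):
        true_value = rng.gauss(0.0, 1.0)   # assumed spread of true effects
        std_error = rng.uniform(0.5, 2.0)  # assumed range of standard errors
        estimate = rng.gauss(true_value, std_error)
        lower = estimate - z * std_error
        upper = estimate + z * std_error
        if not (lower <= true_value <= upper):
            misses += 1
    return misses


print("Expected misses out of 1000:", round(1000 * (1 - 0.95)))
print("Simulated misses out of 1000:", count_misses())
```

The simulated count will bounce around 50 from seed to seed, which is the point: each interval stands on its own, and roughly 5% of them missing is what you sign up for.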