Valentin Amrhein, Sander Greenland, and Blake McShane write:

We have a forthcoming comment in Nature arguing that it is time to abandon statistical significance. The comment serves to introduce a new special issue of The American Statistician on “Statistical inference in the 21st century: A world beyond P < 0.05”. It is titled “Retire Statistical Significance” (a theme of many of the papers in the special issue, including the editorial introduction), and it focuses on the absurdities generated by so-called “proofs of the null”. Nature has asked us to recruit “co-signatories” for the comment (for an example, see here), and we think readers of your blog would be interested. If so, we would be delighted to send a draft to interested parties for signature. Please request a copy at retire.significance2019@gmail.com and we will send it (Nature has a very strict embargo policy, so please explicitly indicate you will keep it to yourself) or, if you already agree with the message, please just sign here. The timeline is tight, so we need endorsements by Mar 8, but the comment is short at ~1500 words.

I signed the form myself! I like their paper and agree with all of it, with just a few minor issues:

– They write, “For example, the difference between getting P = 0.03 versus P = 0.06 is the same as the difference between getting heads versus tails on a single fair coin toss.” I’d remove this sentence, first because the connection to the coin toss does not seem clear (it’s a cute mathematical argument, but I think it’s just confusing in this context), and second because I feel that the whole p=0.03 vs. p=0.06 thing (or, worse, p=0.49 vs. p=0.51) is misleading. The fundamental problem with “statistical significance” is not the arbitrariness of the bright-line rule, but rather the fact that even apparently large differences in p-values (for example, p=0.01 and p=0.30, mentioned later in that paragraph) can be easily explained by noise.

– Also in that paragraph they refer to two studies with 80% power. This too is a bit misleading, I think: People always think they have 80% power when they don’t (see here and here).

– I like that they say we must learn to embrace uncertainty!

– I’m somewhat bothered by this recommendation from their paper: “We recommend that authors describe the practical implications of all values inside the interval, especially the observed effect or point estimate (that is, the value most compatible with the data) and the limits. All the values between the interval’s limits are reasonably compatible with the data.” My problem is that in many cases of forking paths and selection, we have no good reason to think of *any* of the values within the confidence interval as reasonable. Consider, for example, that study of beauty and sex ratio, which purportedly found an 8-percentage-point difference with a 95% confidence interval of something like [2%, 14%]. Even 2%, even 1%, would be highly implausible here. In this example, I don’t think it’s accurate even to say that values in the range [2%, 14%] are “reasonably compatible with the data.”

I understand the point they’re trying to make, and I like the term “compatibility intervals,” but I think you have to be careful not to put too much of a burden on these intervals. There are lots of people out there who say, Let’s dump p-values and instead use confidence intervals. But confidence intervals have these selection problems too. I agree with the things they say in the paragraphs following the above quote.

– They write that in the future, “P-values will be reported precisely (e.g., P = 0.021 or P = 0.13) rather than as binary inequalities.” I don’t like this! I mean, sure, binary is terrible. But “P = 0.021” is, to my mind, ridiculous over-precision. I’d rather see the estimate and the standard error.
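To make the p=0.01 vs. p=0.30 point concrete, here is a minimal Python sketch. It assumes, purely for illustration, two independent studies whose estimates point in the same direction, have equal standard errors, and whose two-sided p-values come from normal z-tests; none of these assumptions come from the comment itself. The function names are hypothetical. The sketch converts each p-value back to a z-score and then tests the difference between the two estimates.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def z_from_two_sided_p(p: float) -> float:
    """Return the |z| that yields two-sided p-value p under a normal test."""
    return STD_NORMAL.inv_cdf(1 - p / 2)


def two_sided_p_from_z(z: float) -> float:
    """Return the two-sided p-value for a z-statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def p_for_difference(p1: float, p2: float) -> float:
    """Two-sided p-value for the difference between two independent estimates.

    Assumption: both estimates have the same standard error (scaled to 1,
    so each estimate equals its z-score) and the same sign.
    """
    z1 = z_from_two_sided_p(p1)
    z2 = z_from_two_sided_p(p2)
    # The difference of two independent estimates, each with SE 1, has SE sqrt(2).
    z_diff = (z1 - z2) / sqrt(2)
    return two_sided_p_from_z(z_diff)


if __name__ == "__main__":
    print(f"z-scores: {z_from_two_sided_p(0.01):.2f} vs {z_from_two_sided_p(0.30):.2f}")
    print(f"p-value for the difference: {p_for_difference(0.01, 0.30):.2f}")
```

Under these assumptions the z-scores are about 2.58 and 1.04, and the difference between the two studies has a p-value of about 0.28: the gap between "highly significant" and "nowhere near significant" is itself well within what noise would produce. Working with estimates and standard errors, rather than with the p-values themselves, makes this kind of comparison direct.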

Anyway, I think their article is great; the above comments are minor.

Key point from Amrhein, Greenland, and McShane:

We don’t mean to drop P-values, but rather to stop using them dichotomously to decide whether a result refutes or supports a hypothesis.

Also this:

The objection we hear most against retiring statistical significance is that it is needed to make yes-or-no decisions. But for the choices often required in regulatory, policy, and business environments, decisions based on the costs, benefits, and likelihoods of all potential consequences always beat those made based solely on statistical significance. Moreover, for decisions about whether to further pursue a research idea, there is no simple connection between a P-value and the probable results of subsequent studies.

Yes yes yes yes yes. See this other paper of ours for further elaboration of these points.

**P.S.** As noted above, I signed the petition and I recommend you, the readers, consider doing so as well. That said, I fully respect people who don’t like to sign petitions. Feel free to use the comment thread both to discuss the general idea of retiring statistical significance and to discuss whether petitions are a good idea . . .