The American Statistical Association’s (ASA’s) recent effort to advise the statistical and scientific communities on how they should think about statistics in research is ambitious in scope. It is an initial attempt to depict what empirical research might look like in “a world beyond p<0.05” (The American Statistician, 2019, 73, S1, 1-401). Quite surprisingly, the main recommendation of the lead editorial article in the Special Issue of The American Statistician devoted to this topic (Wasserstein, Schirm, & Lazar, 2019; hereafter, ASA II) is that “it is time to stop using the term ‘statistically significant’ entirely” (p. 2). ASA II acknowledges the controversial nature of this directive and anticipates that it will be subject to critical examination. Indeed, in a recent post, Deborah Mayo began her evaluation of ASA II by making constructive amendments to three recommendations that appear early in the document (‘Error Statistics Philosophy’, June 17, 2019). These amendments have received numerous endorsements, and I record mine here. In this short commentary, I briefly state a number of general reservations that I have about ASA II.
1. The proposal that we should stop using the expression “statistical significance” is given a weak justification.
ASA II proposes a superficial linguistic reform that is unlikely to overcome the widespread misconceptions about, and misuse of, significance testing. A more reasonable, common-sense strategy would be to diagnose the reasons for the misconceptions and misuse and take ameliorative action through better statistics education, much as ASA I (Wasserstein & Lazar, 2016) did with p-values. Interestingly, ASA II references Mayo’s recent book, Statistical Inference as Severe Testing (2018), when mentioning the “statistics wars”. However, it does not consider the fact that her error-statistical perspective provides an informed justification for continuing to use tests of significance, along with the expression “statistically significant”. Further, ASA II reports cases where some of the Special Issue authors thought that use of a p-value threshold might be acceptable. However, it makes no effort to consider how these cases might challenge its main recommendation.
2. The claimed benefits of abandoning talk of statistical significance are hopeful conjectures.
ASA II makes a number of claims about the benefits that it thinks will follow from abandoning talk of statistical significance. It says, “researchers will see their results more easily replicated – and, even when not, they will better understand why”; “[We] will begin to see fewer false alarms [and] fewer overlooked discoveries …”; and “As ‘statistical significance’ is used less, statistical thinking will be used more.” (p. 1) I do not believe that any of these benefits is likely to follow from retiring the expression “statistical significance”, and ASA II provides no justification for the plausibility of any of them. To take two of these claims: first, removing the common expression “statistical significance” will make little difference to the success rate of replications. It is well known that successful replication depends on a number of important factors, including research design, data quality, effect size, and study power, along with the multiple criteria often invoked in ascertaining replication success. Second, it is simply implausible to suggest that refraining from talk about statistical significance will appreciably help overcome mechanical decision-making in statistical practice and lead to greater engagement with statistical thinking. Such an outcome will require, among other things, science education reforms that centre on the conceptual foundations of statistical inference.
3. ASA II’s main recommendation is not a majority view.
ASA II bases its main recommendation to stop using the language of “statistical significance” in good part on its review of the articles in the Special Issue. However, an inspection of the Special Issue reveals that this recommendation is at variance with the views expressed in many of the 40-odd articles it contains. Those articles range widely in the topics they cover and in their attitudes to the usefulness of tests of significance. By my reckoning, only two of the articles advocate banning talk of significance testing. To be fair, ASA II acknowledges the diversity of views held about the nature of tests of significance. However, this diversity should have prompted it to take proper account of the fact that its recommendation is only one of a number of alternative views about significance testing. At the very least, ASA II should have tempered its strong recommendation to stop speaking of statistical significance.
4. The claim for continuity between ASA I and ASA II is misleading.
There is no evidence in ASA I (Wasserstein & Lazar, 2016) for the assertion made in ASA II that the earlier document stopped just short of recommending that claims of “statistical significance” should be eliminated. In fact, ASA II marks a clear departure from ASA I, which was essentially concerned with how to better understand and use p-values. There is nothing in the earlier document to suggest that abandoning talk of statistical significance might be the next appropriate step forward in the ASA’s efforts to guide our statistical thinking.
5. Nothing is said about scientific method, and little is said about science.
The announcement of the ASA’s 2017 Symposium on Statistical Inference stated that the Symposium would “focus on specific approaches for advancing scientific methods in the 21st century”. However, the Symposium, and the resulting Special Issue of The American Statistician, showed little interest in matters of scientific method. This is regrettable because the many insights about scientific inquiry contained in contemporary scientific methodology have the potential to greatly enrich statistical science. The post-“p<0.05” world depicted by ASA II is not an informed scientific world. It is an important truism that statistical inference plays a major role in scientific reasoning. However, for this role to be properly conveyed, ASA II would have to employ an informative conception of the nature of scientific inquiry.
6. Scientists who speak of statistical significance do embrace uncertainty.
I think that it is uncharitable, indeed incorrect, of ASA II to depict many researchers who use the language of significance testing as being engaged in a quest for certainty. John Dewey, Charles Peirce, and Karl Popper taught us long ago that we are fallible, error-prone creatures, and that we must embrace uncertainty. Further, despite their limitations, our science education efforts frequently teach learners to regard uncertainty as an appropriate epistemic attitude to hold in science. This fact, combined with the oft-made claim that statistics uses ideas about probability in order to quantify uncertainty, means that ASA II owes us a factually based justification for its claim that many scientists who employ tests of significance do so in a quest for certainty.
Under the heading “Has the American Statistical Association Gone Post-Modern?”, the legal scholar Nathan Schachtman recently stated:
The ASA may claim to be agnostic in the face of the contradictory recommendations, but there is one thing we know for sure: over-reaching litigants and their expert witnesses will exploit the real or apparent chaos in the ASA’s approach. The lack of coherent, consistent guidance will launch a thousand litigation ships, with no epistemic compass. (‘Schachtman’s Law’, March 24, 2019)
I suggest that, with appropriate adjustment, the same can fairly be said of researchers and statisticians who might look to ASA II as an informative guide to a better understanding of tests of significance and of the many misconceptions about them that need to be corrected.
Mayo, D. G. (2018). Statistical inference as severe testing: How to get beyond the statistics wars. New York, NY: Cambridge University Press.
Mayo, D. G. (2019). The 2019 ASA Guide to P-values and Statistical Significance: Don’t Say What You Don’t Mean (Some Recommendations)(ii). Blog post, Error Statistics Philosophy, June 17, 2019.
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70, 129-133.
Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Editorial: Moving to a world beyond “p<0.05”. The American Statistician, 73, S1, 1-19.