The above phrase just came up, and I think it’s important enough to deserve its own post.
Well-meaning researchers do statistical-significance filtering all the time. It’s what they’re trained to do, it’s what they see in published papers in top journals, and it’s what reviewers for journals want them to do, so I can understand why they do it. But it’s a mistake: it’s a noise amplifier.
To put it another way: Statistical significance filtering has two major problems:
– Noise amplifier. That’s what I’m focusing on here. P-values are super noisy. You’re trained to think that p-values are not noisy—you’re given the (false) impression that if the true effect is zero, there’s only a 5% chance that you’ll get statistical significance, and you’re also given the (false) impression that if the effect is real, there’s an 80% chance you will get statistical significance. In fact, whether the underlying effect is real or not, your p-value is noisy noisy noisy (see here), and selecting what to report, or deciding how to report, based on statistical significance, is little better than putting all your findings on a sheet of paper, folding it up, cutting it a few times with scissors, and picking out a few random shards to publish. See section 2.2 here for an example (and not a particularly bad example, more like standard practice).
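To get a feel for how noisy p-values are, here is a minimal simulation sketch. It is an illustration under stated assumptions, not an analysis from any particular paper: the same study is replicated many times with a normally distributed estimate, a known standard error of 1, and a true effect of 2.8 standard errors, which is the idealized case where the design really does have about 80% power. The function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def two_sided_p(z):
    """Two-sided p-value for a z-statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_p_values(true_effect, se, n_reps, seed=0):
    """Replicate one study n_reps times and return the p-value of each replication."""
    rng = random.Random(seed)
    return [two_sided_p(rng.gauss(true_effect, se) / se) for _ in range(n_reps)]


# Assumption: true effect = 2.8 standard errors, i.e. roughly 80% power at alpha = 0.05.
p_values = simulate_p_values(true_effect=2.8, se=1.0, n_reps=10_000)

# statistics.quantiles with n=10 returns the nine decile cut points.
deciles = statistics.quantiles(p_values, n=10)
p10, p50, p90 = deciles[0], deciles[4], deciles[8]
share_significant = sum(p < 0.05 for p in p_values) / len(p_values)

print(f"share with p < 0.05:  {share_significant:.3f}")
print(f"10th percentile of p: {p10:.6f}")
print(f"median p:             {p50:.6f}")
print(f"90th percentile of p: {p90:.6f}")
```

Even in this favorable setup, where nothing changes between replications except sampling noise, the middle 80% of the p-values spans several orders of magnitude. With lower power, which is the more realistic case, it only gets worse.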
So, again, statistical-significance filtering is a noise amplifier. We should avoid filtering our results by statistical significance, not just because we’re worried about our “alpha level” or “p-hacking” or because it’s a “questionable research practice” or because of “multiple testing” or whatever, but because it adds noise to our already noisy data. And that’s just irresponsible. It’s a bad idea, even if it’s done in complete innocence.
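One way to see the “adds noise” point is to compare how far estimates are from the truth before and after filtering. The sketch below is a toy simulation under assumptions I’ve made up for illustration: true effects are drawn from a normal distribution with standard deviation 0.5, each study estimates its effect with standard error 1, and a result is “significant” if its absolute z-score exceeds 1.96. The names `simulate_studies` and `rmse` are hypothetical helpers.

```python
import math
import random


def rmse(pairs):
    """Root mean squared error of estimates against true effects."""
    return math.sqrt(sum((est - truth) ** 2 for est, truth in pairs) / len(pairs))


def simulate_studies(n_studies, effect_sd, se, seed=1):
    """Return (estimate, truth) pairs for independent studies with noisy estimates."""
    rng = random.Random(seed)
    studies = []
    for _ in range(n_studies):
        truth = rng.gauss(0.0, effect_sd)   # assumed distribution of true effects
        estimate = rng.gauss(truth, se)     # noisy estimate of that effect
        studies.append((estimate, truth))
    return studies


SE = 1.0
studies = simulate_studies(n_studies=20_000, effect_sd=0.5, se=SE)

# The significance filter: keep only estimates with |z| > 1.96.
significant = [(est, truth) for est, truth in studies if abs(est) / SE > 1.96]

rmse_all = rmse(studies)
rmse_filtered = rmse(significant)

print(f"studies kept by the filter: {len(significant)} of {len(studies)}")
print(f"RMSE, all studies:          {rmse_all:.2f}")
print(f"RMSE, significant only:     {rmse_filtered:.2f}")
```

In this setup the estimates that pass the filter are noticeably further from their true values than the full, unfiltered set, even though nobody in the simulation is doing anything shady. The specific numbers depend entirely on the assumed effect sizes and standard errors.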