Is Significance Significant?

February 5, 2013

(This article was originally published at Carlisle Rainey » Methods/Statistics, and syndicated at StatsBlogs.)

Justin Esarey has a nice post up on his blog. It is so interesting that I had to stop and think about it for a while.

My reaction best belongs in the comment section of his blog, but I want to use some equations and I know that I can make them work here. [In hindsight, this was a bad idea. I wrote this post on the day Justin published his, but only just got LaTeX working again on my blog, after a redesign of the whole site.] So first, hop over and read his nuanced discussion and then come back and read my coarse reaction and questions.

The claim I find most interesting is his conclusion from the simulation study.

So, what can we conclude? First, a small magnitude but statistically significant result contains virtually no important information. I think lots of political scientists sort-of intuitively recognize this fact, but seeing it in black and white really underscores that these sorts of results aren’t (by themselves) all that scientifically meaningful. Second, even a large magnitude, statistically significant result is not especially convincing on its own. To be blunt, even though such a result moves our posterior probabilities a lot, if we’re starting from a basis of skepticism no single result is going to be adequate to convince us otherwise.

The key question is how to update our posterior belief about the effect being zero or non-zero in light of statistical significance. This is an interesting idea to me, because I recently defended statistical significance as a useful way to quickly summarize empirical results (compared to a confidence interval).

Justin's Idea

Here's Justin's idea. What is the posterior probability that the effect is zero, given that it is statistically significant? Of course this assumes that we put some prior mass on exactly no effect. Justin suggests that a skeptical social scientist might believe that the null hypothesis (of exactly no effect) is true with probability 0.9. (I don't care for point-mass priors at zero, and I don't think Justin does either; it just happens to be convenient here for making a point.) Then the posterior probability that the null is true is given by the equation below.

\( Pr(\beta = 0 | Sig.) = \dfrac{Pr(Sig. | \beta = 0)Pr(\beta = 0)}{Pr(Sig. | \beta = 0)Pr(\beta = 0) + Pr(Sig. | \beta \neq 0)Pr(\beta \neq 0)}\)

As I noted above, Justin views 0.9 as a useful prior probability for the null. For simplicity, we can consider the limiting case--a sample so large (or an effect so big) that we always find significance when the effect is not zero. In this limiting case, we know that the probability of significance given the null is false is one. We also know, by construction, that the probability of significance, given the null is true, is 0.05. We can just plug this information into the equation above.

\(Pr(\beta = 0 | Sig.) = \dfrac{0.05 \times 0.9}{0.05 \times 0.9 + 1 \times 0.1} \approx 0.31\)

That is, even with a huge sample size and a statistically significant estimate, we still think there is a 31% chance that the null is true. That is a counter-intuitive result.
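The arithmetic above is easy to check in a few lines of Python. This is just a sketch of the Bayes' rule calculation; the function name and arguments are mine, and `power=1.0` encodes the limiting case in which a real effect is always detected:

```python
def posterior_null(prior_null, alpha=0.05, power=1.0):
    """Pr(beta = 0 | significance) via Bayes' rule."""
    numerator = alpha * prior_null                      # Pr(Sig | null) * Pr(null)
    denominator = numerator + power * (1 - prior_null)  # + Pr(Sig | alt) * Pr(alt)
    return numerator / denominator

print(round(posterior_null(prior_null=0.9), 2))  # 0.31
```

Dropping the prior on the null to 0.5, as in part of Justin's simulation, gives `posterior_null(0.5)` of roughly 0.05 instead.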

But why is this? When I first saw Justin's simulations, I was a little puzzled and had to run the code myself and work through the analytic stuff to believe the result.

Like other tricky Bayes' rule problems (e.g., the Monty Hall problem or the boy-or-girl problem), you have to be careful about the information contained in the likelihood. Here, "statistically significant" tells us only that the observed data are relatively unlikely under the null; it says nothing about *how* unlikely. This means that extremely large estimates are treated the same as merely large ones. To see this, note that once we are rejecting almost always, it no longer matters how big the effect is.
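A small Monte Carlo sketch of this limiting case may make the result concrete. The 0.9 prior and the 0.05 rejection rate under the null are the numbers used above; the rest (the seed, the number of trials) are arbitrary choices of mine:

```python
import random

random.seed(1)
prior_null, alpha, trials = 0.9, 0.05, 200_000
sig = sig_and_null = 0
for _ in range(trials):
    null_true = random.random() < prior_null
    # Limiting case: reject with probability alpha when the null is true,
    # and always reject when the effect is real.
    rejected = random.random() < alpha if null_true else True
    if rejected:
        sig += 1
        sig_and_null += null_true
print(f"share of significant results from the null: {sig_and_null / sig:.2f}")
```

In runs of this size the share lands near 0.31, matching the analytic answer: significance alone leaves a sizable chance that there is no effect at all.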

Okay, I understand why the posterior probability doesn't go to zero. But why 31%? Shouldn't it settle a little closer to zero? It settles well away from zero because the amount of information in the likelihood is capped. The most informative thing we can possibly observe is statistical significance. Because significance still happens 5% of the time when the null is true, and we start out strongly believing the null (with probability 0.9), observing significance doesn't help us much.
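The cap is easiest to see in the odds form of Bayes' rule: even in the limiting case, significance can cut the prior odds on the null by at most a factor of \(1/0.05 = 20\).

\(\dfrac{Pr(\beta = 0 | Sig.)}{Pr(\beta \neq 0 | Sig.)} = \dfrac{Pr(Sig. | \beta = 0)}{Pr(Sig. | \beta \neq 0)} \times \dfrac{Pr(\beta = 0)}{Pr(\beta \neq 0)} = \dfrac{0.05}{1} \times \dfrac{0.9}{0.1} = 0.45\)

Posterior odds of 0.45 correspond to a posterior probability of \(0.45/1.45 \approx 0.31\), which is why the answer settles there and no lower.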

Now that I've worked through that bit in my own mind, I see that Justin is making a powerful point--statistical significance doesn't contain that much information.

Remaining Questions

The main question I still have is about the prior. I don't believe that the probability that the null is true is 0.9. I think it is zero. So then why is this prior useful?

Instead of placing a point mass at zero, my skeptical prior is a strongly informative distribution centered at zero, say a normal distribution with mean zero and standard deviation 0.1. Then I would hypothesize about the sign of the coefficient. How would this change the argument?

I think this would be more in line with Justin's simulation when the prior probability of the null is 0.5. The math, I think, is quite similar for a continuous prior centered at zero and a discrete prior that places mass 0.5 at zero. Working through the math would require some complicated integration or simulation. Nonetheless, I'll fearlessly intuit the answer. First, when one uses a continuous prior centered at zero, the particular prior chosen matters much less than when a discrete prior is used. Second, the resulting posterior allows us to be much more skeptical about the null. In short, I don't think Justin's results would hold for my skeptical, but continuous, prior centered at zero.
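One rough way to probe that intuition is a simulation with a sign-based hypothesis instead of a point null. Everything here is my own assumption: the prior is the normal(0, 0.1) suggested above, and I invent a standard error of 0.05 for the estimate purely for illustration.

```python
import random

random.seed(7)
prior_sd, se, z_crit, trials = 0.1, 0.05, 1.96, 200_000
sig = right_sign = 0
for _ in range(trials):
    beta = random.gauss(0, prior_sd)   # skeptical continuous prior centered at zero
    beta_hat = random.gauss(beta, se)  # sampling distribution of the estimate
    if abs(beta_hat) > z_crit * se:    # two-sided test at the 0.05 level
        sig += 1
        if beta * beta_hat > 0:        # significant estimate has the right sign
            right_sign += 1
print(f"Pr(correct sign | significant) = {right_sign / sig:.2f}")
```

Under these (assumed) numbers, the probability that a significant estimate has the right sign comes out well above 0.9, which is at least consistent with the intuition that a continuous skeptical prior leaves us far less stuck than a 0.9 point mass at zero.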

P.S. While I was fixing up my website to display the equations, Justin posted this, which gets at my questions, but doesn't leave me completely satisfied. I'll think more about it and discuss it in a future post.
