(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

This one surprised me. I included the following question in an exam:

In causal inference, it is often important to study varying treatment effects: for example, a treatment could be more effective for men than for women, or for healthy than for unhealthy patients. Suppose a study is designed to have 80% power to detect a main effect at a 95% confidence level. Further suppose that interactions of interest are half the size of main effects. What is its power for detecting an interaction, comparing men to women (say) in a study that is half men and half women? Suppose 1000 studies of this size are performed. How many of the studies would you expect to report a statistically significant interaction? Of these, what is the expectation of the ratio of estimated effect size to actual effect size?

None of the students got any part of this question correct.

In retrospect, the question was too difficult; it had too many parts given that it was an in-class exam, and I can see how it would be tough to figure out all these numbers. But the students even didn’t get close: they had no idea how to start. They had no sense that you can work backward from power to effect size and go from there.

And these were statistics Ph.D. students. OK, they’re still students and they have time to learn. But this experience reminds me, once again, that classical hypothesis testing is really really hard. All these null hypotheses and type 1 and type 2 errors are distractions, and it’s hard to keep your eye on the ball.

I like the above exam question. I’ll put it in our new book, but I’ll need to break it up into many pieces to make it more doable.

**P.S.** See here for an awesome joke-but-not-really-a-joke solution from an anonymous commenter.

**P.P.S.** Solution is here.

The post Classical hypothesis testing is really really hard appeared first on Statistical Modeling, Causal Inference, and Social Science.

**Please comment on the article here:** **Statistical Modeling, Causal Inference, and Social Science**