Hans van Maanen writes:

May I put another statistical question to you?

If I ask my frequentist statistician for a 95%-confidence interval, I can be 95% sure that the true value will be in the interval she just gave me. My visualisation is that she filled a bowl with 100 intervals, 95 of which do contain the true value and 5 do not, and she picked one at random.

Now, if she gives me two independent 95%-CI’s (e.g., two primary endpoints in a clinical trial), I can only be 90% sure (0.95^2 = 0.9025) that they both contain the true values. If I have a table with four measurements and 95%-CI’s, there’s only an 81% chance (0.95^4 = 0.8145) they all contain the true values. Also, if we have two results and we want to be 95% sure both intervals contain the true values, we should construct two 97.5%-CI’s (0.95^(1/2) = 0.9747), and if we want to have 95% confidence in four results, we need 98.7%-CI’s (0.95^(1/4) = 0.9873).

I’ve read quite a few texts trying to get my head around confidence intervals, but I don’t remember seeing this discussed anywhere. So am I completely off, is this a well-known issue, or have I just invented the Van Maanen Correction for Multiple Confidence Intervals? ;-))

I hope you have time for an answer. It puzzles me!

My reply:

Sure, I can help you, but in English:

1. “If I ask my frequentist statistician for a 95%-confidence interval, I can be 95% sure that the true value will be in the interval she just gave me.” Not quite true. Yes, true on average, but not necessarily true in any individual case. Some intervals are clearly wrong. Here’s the point: even if you picked an interval at random from the bowl, once you see the interval you have additional information. Sometimes the entire interval is implausible, suggesting that you probably happened to pick one of the bad intervals in the bowl. Other times, the interval contains the entire range of plausible values, so you can be almost completely sure that you have picked one of the good intervals in the bowl. This can especially happen if your study is noisy and the sample size is small. For example, suppose you’re trying to estimate the difference in proportion of girl births, comparing two different groups of parents (for example, beautiful parents and ugly parents). You decide to conduct a study of N=400 births, with 200 in each group. Your estimate will be p2 – p1, with standard error sqrt(0.5^2/200 + 0.5^2/200) = 0.05, so your 95% confidence interval will be p2 – p1 +/- 0.10. We happen to be pretty sure that any true population difference will be less than 0.01 in absolute value (see here), hence if the estimate p2 – p1 is between -0.09 and +0.09, we can be pretty sure that our 95% interval *does* contain the true value. Conversely, if p2 – p1 is less than -0.11 or more than +0.11, then we can be pretty sure that our interval *does not* contain the true value. Thus, once we *see the interval*, it’s no longer generally a correct statement to say that you can be 95% sure the interval contains the true value.

2. Regarding your question: I don’t really think it makes sense to want 95% confidence in four results. It makes more sense to accept that our inferences are uncertain; we should not demand that they all be correct, or act as if they are.
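Both points are easy to check numerically. Here is a minimal Python sketch, not a definitive analysis. The first two functions just redo the arithmetic in the question, assuming the intervals are independent. The third simulates the girl-births example under assumptions of my own choosing: the estimate p2 – p1 is drawn from a normal distribution with standard error 0.05, and the true difference is fixed at a hypothetical 0.005 (any value within +/- 0.01 gives the same picture). The function names are made up for illustration.

```python
import random


def joint_coverage(level, k):
    """Chance that k independent intervals, each at `level`, all cover."""
    return level ** k


def per_interval_level(joint, k):
    """Per-interval level needed so that k independent intervals jointly cover at `joint`."""
    return joint ** (1.0 / k)


def simulate(true_diff=0.005, se=0.05, n_sims=100_000, seed=0):
    """Coverage of (estimate +/- 2 se), overall and conditional on the estimate seen.

    Assumptions (not from the post): normal sampling distribution for the
    estimate, and a fixed hypothetical true difference `true_diff`.
    """
    rng = random.Random(seed)
    half_width = 2 * se  # the +/- 0.10 used in the text
    counts = {"all": [0, 0], "inside": [0, 0], "outside": [0, 0]}
    for _ in range(n_sims):
        est = rng.gauss(true_diff, se)
        covered = abs(est - true_diff) <= half_width
        groups = ["all"]
        if abs(est) < 0.09:      # interval spans every plausible value
            groups.append("inside")
        elif abs(est) > 0.11:    # interval excludes every plausible value
            groups.append("outside")
        for g in groups:
            counts[g][0] += covered
            counts[g][1] += 1
    return {g: (hits / n if n else float("nan")) for g, (hits, n) in counts.items()}


if __name__ == "__main__":
    print(joint_coverage(0.95, 2), joint_coverage(0.95, 4))
    print(per_interval_level(0.95, 2), per_interval_level(0.95, 4))
    print(simulate())
```

Under these assumptions the overall coverage comes out near 95%, but every interval in the "inside" group covers the true value and none in the "outside" group does, which is the sense in which seeing the interval changes what you can say about it.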