Using multilevel modeling to improve analysis of multiple comparisons

Justin Chumbley writes:

I have mused on drafting a simple paper inspired by your paper “Why we (usually) don’t have to worry about multiple comparisons”.

The initial idea is simply to revisit frequentist “weak FWER” (familywise error rate) control or “omnibus tests” (which assume the null everywhere), connecting them to a Bayesian perspective. To do this, I focus on the distribution of the posterior maximum or extrema (not the maximum a posteriori point estimate) of the joint posterior, given a dataset simulated under the omnibus null hypothesis. This joint posterior may be, for example, defined on a set of a priori exchangeable random coefficients in a multilevel model: its maximum simply encodes my posterior belief in the magnitude of the largest of those coefficients (which “should” be zero for this data) and can be estimated, for example, by MCMC.

The idea is that hierarchical Bayesian extreme values helpfully contract to zero with the number of coefficients in this setting, while non-hierarchical frequentist extreme values increase. The latter are more typically quantified by other “error” measures, such as FWER (the “multiple comparisons problem”) or MSE (“overfitting”). Thus, this offers a clear way to show that hierarchical inference can automatically control the (weak) FWER, without Bonferroni-style adjustments to the test threshold. Mathematically, I imagine some argument, asymptotic in the number of coefficients, for this behavior of the maximum, which I would need time or collaboration to formalize (I am not a mathematician by any means). In any case, the intuition is that because the posterior coefficients are all increasingly shrunk, so is their maximum.

I have chosen to study the maximum because it is applicable across the very different hierarchical and frequentist models used in practice in the fields I work on (imaging, genomics): spatial, cross-sectional, temporal, neither or both. For example, the posterior maximum is defined for a discretely indexed, exchangeable random process, or a continuously indexed, non-stationary process. As a point of interest, the frequentist distribution of spatial maxima is used for the standard multiple-comparisons-adjusted p-values in mainstream neuroimaging, e.g. SPM.
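
To make the intuition concrete, here is a minimal simulation sketch. It is not from the message above, and every modeling choice in it is an assumption made for illustration: a normal-normal multilevel model with known unit noise variance, a flat prior on the group-level scale tau over [0, 3], and a grid approximation over tau in place of MCMC. The function name `null_maxima` is hypothetical. For data simulated under the omnibus null, it compares the raw maximum of the unpooled estimates with the posterior expected maximum of the coefficients. How quickly the latter shrinks will depend on the prior for tau, so this should be read as a rough check rather than a demonstration of the general claim.

```python
import numpy as np

def null_maxima(J, n_datasets=20, n_draws=100, seed=0):
    """Average raw maximum and posterior expected maximum under the omnibus null.

    Assumed model (illustrative only):
        y_j | theta_j ~ N(theta_j, 1),  theta_j | tau ~ N(0, tau^2),
        tau ~ Uniform(0, 3), handled on a grid.
    """
    rng = np.random.default_rng(seed)
    tau_grid = np.linspace(0.0, 3.0, 601)
    v = 1.0 + tau_grid**2                 # marginal variance of y_j given tau
    raw, post = [], []
    for _ in range(n_datasets):
        y = rng.standard_normal(J)        # every true coefficient is zero
        # Log marginal likelihood of tau, using y_j | tau ~ N(0, 1 + tau^2)
        log_w = -0.5 * J * np.log(v) - 0.5 * np.sum(y**2) / v
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Draw tau from its grid posterior, then theta | y, tau ~ N(B * y, B)
        tau = rng.choice(tau_grid, size=n_draws, p=w)
        B = tau**2 / (1.0 + tau**2)       # shrinkage factor
        noise = rng.standard_normal((n_draws, J))
        theta = B[:, None] * y[None, :] + np.sqrt(B)[:, None] * noise
        raw.append(y.max())                    # maximum of unpooled estimates
        post.append(theta.max(axis=1).mean())  # posterior mean of max_j theta_j
    return float(np.mean(raw)), float(np.mean(post))

for J in (10, 100, 1000, 5000):
    raw_max, post_max = null_maxima(J)
    print(f"J={J:5d}  raw max={raw_max:.2f}  posterior max={post_max:.2f}")
```

The raw maximum of J independent standard normal estimates grows roughly like the square root of 2 log J, which is the behavior that Bonferroni-style corrections are designed to offset. In the hierarchical fit, the data themselves push tau toward zero as J grows, so each coefficient's posterior, and hence the posterior maximum, is pulled toward zero without any explicit threshold adjustment.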

I am very keen to learn more about the possible pros or cons of the idea above.
- Its “novelty”
- How it fares relative to alternative Bayesian omnibus “tests”, e.g. based on comparison of posterior model probabilities for an omnibus null model (a degenerate spike prior) versus some credible alternative model.
- How generally it might be formalized.
- How to integrate type II error and bias into the framework.
… and any more!

My reply:

This idea is not really my sort of thing—I’d prefer a more direct decision analysis on the full posterior distribution. But given that many researchers are interested in hypothesis testing but still want to do something better than classical null hypothesis significance testing, I thought there might be interest in these ideas. So I’m sharing them with the blog readership. Comment away!

The post Using multilevel modeling to improve analysis of multiple comparisons appeared first on Statistical Modeling, Causal Inference, and Social Science.