(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

Ed Vul writes:

In the course of tinkering with someone else’s hairy dataset with a great many candidate explanatory variables (some of which are largely orthogonal factors, but the ones of most interest are competing “binning” schemes of the same latent elements), I wondered about the following “model selection” strategy, which you may have alluded to in your multiple comparisons paper:

Include all plausible factors/interactions as random intercept effects (i.e., (1|A), (1|A:B) in lmer parlance [that’s stan_lmer() now — ed.]). Since there are many competing, non-orthogonal binning schemes included all at once, the model would be overdetermined (singular) if they were all included as fixed effects [he means “non-varying coefficients.” Or maybe he means “varying coefficients estimated without regularization.” The “fixed”/“random” terminology is unclear. — ed.]. However, as random effects [“varying coefficients estimated using regularization” — ed.], we can rely on partial pooling and shrinkage to sort out among them, such that variance along factors that are not well supported by the data (or are explained away by other factors) shrinks to zero. This happens quite decisively in lmer, but a “regularizing” exponential prior on the variances in a fully Bayesian model would achieve something similar, I think [easy to do in Stan or rstanarm — ed.]. (A more sophisticated approach would be to put a common, pooling prior on the variances for all the individual factors…)
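This is a minimal sketch of the shrinkage mechanism for a single varying-intercept factor. It assumes a normal model in which both the data-level sd `sigma_y` and the group-level sd `sigma_alpha` are known. lmer and stan_lmer do not work this way: they estimate those scales, for many crossed factors at once. The function name `partial_pool` is hypothetical. The group-level sd controls everything here. At zero the factor is pooled away entirely, and at infinity each group keeps its own mean.

```python
import numpy as np

def partial_pool(ybar, n, sigma_y, sigma_alpha):
    """Partial-pooling estimates of the group means for one varying-intercept factor.

    Toy assumptions: normal model, sigma_y (data-level sd) and sigma_alpha
    (group-level sd) treated as known; ybar[j] is the sample mean of the
    n[j] observations in group j.
    """
    ybar = np.asarray(ybar, dtype=float)
    n = np.asarray(n, dtype=float)
    se2 = sigma_y**2 / n                      # sampling variance of each group mean
    if np.isinf(sigma_alpha):
        return ybar.copy()                    # no pooling: keep each group's own mean
    # Grand mean, weighting each group by its marginal precision.
    w = 1.0 / (se2 + sigma_alpha**2)
    mu = np.sum(w * ybar) / np.sum(w)
    # Shrinkage factor: 1 means complete pooling, 0 means no pooling.
    shrink = se2 / (se2 + sigma_alpha**2)
    return shrink * mu + (1.0 - shrink) * ybar

ybar = [2.0, 3.0, 7.0]   # hypothetical group means
n = [5, 20, 5]           # hypothetical group sizes
for s in [0.0, 1.0, np.inf]:
    print(s, partial_pool(ybar, n, sigma_y=2.0, sigma_alpha=s))
```

With `sigma_alpha = 0`, all three estimates equal the precision-weighted grand mean, 3.5. This is what a zero variance estimate for a factor amounts to: the factor no longer moves any group away from the overall mean.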

This approach seems to yield sensible results, but I am a bit concerned because I have never seen it used by others, so I am probably missing something. It may just be that it is rarely computationally practical to include all candidate factors/binning-schemes in such an “overcomplete” model. Or perhaps there is a compelling reason why explanatory variables of substantive interest should be treated as fixed, rather than random effects? Is there a fundamental problem with this approach that I am not thinking of? Or is this a well-known technique that I have simply never heard of?

My reply (in addition to the editorial comments inserted above):

This sounds fine. I think, though, you may be overestimating what lmer will do (perhaps my fault given that I featured lmer in my multilevel modeling book). The variance estimate from lmer can be noisy. But, sure, yes, partial pooling is the way to go, I think. Once you’ve fit the model I don’t really see the need to shrink small coefficients all the way to zero, but I guess you can if you want. Easiest I think is to fit the model in Stan (or rstanarm if you want to use lmer-style notation).
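The noisiness of the group-level variance estimate shows up even in a much simpler setting than lmer's. This sketch is a simplified analogue, not lmer itself. It uses a balanced one-way layout with `sigma_y` treated as known. Under those assumptions, maximizing the restricted likelihood of the group means gives a closed-form estimate of the group-level sd that is truncated at zero. The function names and all simulation settings are hypothetical.

```python
import numpy as np

def reml_tau(ybar, n, sigma_y):
    """Estimate the group-level sd tau in a balanced one-way layout.

    Assumes every group has n observations and sigma_y is known, so the
    group means satisfy ybar_j ~ N(mu, tau^2 + sigma_y^2 / n). Maximizing
    the restricted likelihood gives this closed form, truncated at zero.
    """
    ybar = np.asarray(ybar, dtype=float)
    between = np.var(ybar, ddof=1)            # spread of the group means
    tau2 = max(0.0, between - sigma_y**2 / n)
    return np.sqrt(tau2)

def simulate_tau_hats(true_tau, J=8, n=5, sigma_y=1.0, n_sims=1000, seed=0):
    """Refit the estimator on many simulated datasets that share one true tau."""
    rng = np.random.default_rng(seed)
    hats = np.empty(n_sims)
    for s in range(n_sims):
        alpha = rng.normal(0.0, true_tau, size=J)                  # group effects
        y = alpha[:, None] + rng.normal(0.0, sigma_y, size=(J, n))  # J x n data
        hats[s] = reml_tau(y.mean(axis=1), n, sigma_y)
    return hats

hats = simulate_tau_hats(true_tau=0.3)
print("share of fits with tau_hat = 0:", np.mean(hats == 0.0))
print("5%, 50%, 95% quantiles:", np.quantile(hats, [0.05, 0.5, 0.95]))
```

With only eight groups, a sizable share of the fits land exactly at zero even though the true tau is 0.3. That boundary behavior is what can make a point estimate of the variance look more "decisive" than the data justify.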

Also, the whole fixed/random thing is no big deal. You can allow any and all coefficients to vary; it’s just that if you don’t have a lot of data then it can be a good idea to put informative priors on some of the group-level variance parameters to control the amount of partial pooling. Putting in all these priors might sound kinda weird but that’s just cos we don’t have a lot of experience with such models. Once we have more examples of these (and once they’re in the new edition of my book with Jennifer), then it will be no big deal for you and others to follow this plan in your own models.
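An informative prior on a group-level sd controls the amount of partial pooling. This minimal grid-approximation sketch shows that effect under illustrative assumptions:

- a balanced one-way layout with known `sigma_y`;
- a flat prior on the grand mean;
- a half-normal prior on the group-level sd tau, with scale `prior_scale`.

The half-normal is one convenient choice, not necessarily the default in Stan or rstanarm. The function name and the group means are hypothetical. The function returns the posterior mean of the pooling factor, where 1 means complete pooling and 0 means none.

```python
import numpy as np

def pooling_under_prior(ybar, n, sigma_y, prior_scale, grid_size=2000, tau_max=10.0):
    """Posterior mean of the pooling factor, integrating tau out on a grid.

    Illustrative assumptions: n observations per group, known sigma_y,
    flat prior on the grand mean, half-normal(prior_scale) prior on tau.
    """
    ybar = np.asarray(ybar, dtype=float)
    J = ybar.size
    S = np.sum((ybar - ybar.mean()) ** 2)
    se2 = sigma_y**2 / n                     # sampling variance of each group mean
    tau = np.linspace(0.0, tau_max, grid_size)
    v = tau**2 + se2
    # Log restricted likelihood of the group means (grand mean integrated out).
    log_lik = -0.5 * (J - 1) * np.log(v) - S / (2.0 * v)
    # Log half-normal prior density on tau, up to an additive constant.
    log_prior = -0.5 * (tau / prior_scale) ** 2
    log_post = log_lik + log_prior
    w = np.exp(log_post - log_post.max())    # stabilized unnormalized weights
    w /= w.sum()
    B = se2 / v                              # shrinkage toward the grand mean
    return float(np.sum(w * B))

ybar = [0.1, -0.4, 0.8, 0.3, -0.2, 1.1, -0.6, 0.5]  # hypothetical group means
for scale in [0.1, 0.5, 2.0]:
    print(scale, round(pooling_under_prior(ybar, n=5, sigma_y=1.0, prior_scale=scale), 3))
```

A tighter prior pulls tau toward zero and so increases pooling. Complete pooling corresponds to a point-mass prior at tau = 0, and no pooling to tau = infinity. A prior with a chosen scale sits between those hard choices and still lets the data move tau.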

Informative priors on group-level variances make a lot more sense than making a priori decisions to do no pooling, partial pooling, or complete pooling.

The post Partial pooling with informative priors on the hierarchical variance parameters: The next frontier in multilevel modeling appeared first on Statistical Modeling, Causal Inference, and Social Science.
