Multilevel Bayesian analyses of the growth mindset experiment

Jared Murray, one of the coauthors of the Growth Mindset study we discussed yesterday, writes:

Here are some pointers to details about the multilevel Bayesian modeling we did in the Nature paper, and some notes about ongoing & future work.

We did a Bayesian analysis not dissimilar to the one you wished for! In section 8 of the supplemental material to the Nature paper, you’ll find some information about the Bayesian multilevel model we fit, starting on page 46 with the model statement and some information about priors below (variable definitions are just above). If you squint at the nonparametric regression functions and imagine them as linear, this is a pretty vanilla Bayesian multilevel model with school varying intercepts and slopes (on the treatment indicator). (For the Nature analysis all our potential treatment effect moderators are at the school level.) But the nonparametric prior distribution on those functions is actually imposing the kind of partial pooling you wanted to see, and in the end our Bayesian analysis produces substantively similar findings as the “classical” analysis, including strong evidence of positive average treatment effects and the same patterns of treatment effect heterogeneity.

The model & prior we use is a multilevel adaptation of the modeling approach we (Richard Hahn, Carlos Carvalho, and I) described in our paper “Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects.” In that paper we focused on observational studies and the pernicious effects of even completely observed confounding. But the parameterization we use there is useful in general, including RCTs like the Mindset study. In particular:

1) Explicitly parameterizing the model in terms of the conditional average treatment effect function (lambda in the Nature materials, tau in our arxiv preprint) is important so we can include in the model many variables measured at baseline (to reduce residual variance) while also restricting our attention to a smaller subset of theoretically-motivated potential treatment effect moderators.

2) Perhaps more importantly, in this parameterization we are able to put a prior on the nonparametric treatment effect function (tau/lambda) directly. This way we can control the nature and degree of regularization/shrinkage/partial pooling. For our model which uses a BART prior on the treatment effect function this amounts to careful priors on how deep the trees grow and how far the leaf parameters vary from zero (and to a lesser extent the number of trees). As you suggest, our prior shrinks all the treatment effects toward zero, and also shrinks the nonparametric conditional average treatment effect function tau/lambda toward something that’s close to additive. If that function were exactly additive we’d have only two-way covariate by treatment interactions which seems like a sensible target to shrink towards. (As an aside that might be interesting to you and your readers, this kind of shrinkage is an advantage of BART priors over many alternatives like commonly used Gaussian process priors).

These are important points of divergence of our work from the multitude of “black box” methods for estimating heterogeneous treatment effects non/semiparametrically, including Jennifer’s (wonderful!) work on BART for causal inference.

In terms of what we presented in the Nature paper we were a little constrained by the pre-registration plan, which fixed before some of us joined the team. In turn that prereg plan was constrained by convention—unfortunately, it would probably have been difficult or impossible at the time to fund the study and publish this paper in a similar venue without a prereg plan that primarily focused on the classical analysis and some NHST. [Indeed in my advice to this research team a couple years ago, I advised them to start with the classical analysis and then move to the multilevel model. —AG.] In terms of the Bayesian analysis we did present, we were limited by space considerations in the main document and a desire to avoid undermining later papers by burying new stats in supplemental materials.

We’re working on another paper that foregrounds the potential of Bayesian modeling for these kinds of problems and illustrates how it could enhance and simplify the design and analysis of a study like the NSLM. I think our approach will address many of your critiques: Rather than trying to test multiple competing hypotheses/models, we estimate a rich model of conditional average treatment effects with carefully specified, weakly informative prior distributions. Instead of “strewing the text with p-values”, we focus on different ways to summarize the posterior distribution of the treatment effect function (i.e. the covariate by treatment interactions). We do this via subgroup finding in our arxiv paper above (we kept it simple there, but those subgroup estimates are in fact the Bayes estimates of subgroups under a reasonable loss function). Of course given any set of interesting subgroups we can obtain the joint posterior distribution of subgroup average treatment effects directly once we have posterior samples, which we do in the Nature paper. The subgroup finding exercise is an instance of a more general approach to summarizing the posterior distribution over complex functions by projecting each draw onto a simpler proxy or summary, an idea we (Spencer Woody, Carlos and I) explore in a predictive context in another preprint, “Model interpretation through lower-dimensional posterior summarization.”

If you want to get an idea of what this looks like when it all comes together, here are slides from a couple of recent talks I’ve given (one at SREE aimed primarily at ed researchers, and the other at the Bayesian Nonparameterics meeting last June).

In both cases the analysis I presented diverges from the analysis in the Nature paper (the outcome in these talks is just math GPA, and I looked at the entire population of students rather than lower achieving students as in the Nature paper). So while we find similar patterns of treatment effect heterogeneity as in the Nature paper, the actual treatment effects aren’t directly comparable because the outcomes and populations are different. Anyway, these should give you a sense for the kinds of analyses we’re currently doing and hoping to normalize going forward. Hopefully the Nature paper helps that process along by showing a Bayesian analysis alongside a more conventional one.

It’s great to see statisticians and applied researchers working together in this way.