Principal Stratification on a Latent Variable (fitting a multilevel model using Stan)

Adam Sales points to this article, written with John Pane, on principal stratification on a latent variable, and writes:

Besides the fact that the paper uses Stan, and it’s about principal stratification, which you just blogged about, I thought you might like it because of its central methodological contribution.

We had been trying to use computer log data to see if the effect of a piece of educational software varied with the way the software was used. We had originally been using student-level sample means of the underlying variables (e.g. proportion of worked sections that the student “mastered,” or the average number of hints a student requested). Eventually (with a slap to the forehead) I realized that all of the apparent effect variation we saw was being driven by students who barely used the software—their sample averages had very small sample sizes (# of sections or problems) and hence large variance. That reminded me of that example from the beginning of BDA about county-level cancer incidence, so to solve it I thought of multilevel modeling. So we ended up nesting a section-level model inside our student model and that’s basically our paper.
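To make the small-sample problem concrete, here is a minimal simulation sketch in Python. It is not the model from the paper: the Beta(8, 4) distribution of true mastery rates, the section counts, and the function names are all assumptions made up for illustration, and the shrinkage step uses the known simulating prior where a real multilevel model would estimate it from the data. The point is only that raw per-student proportions are far noisier for students who worked few sections, and that partial pooling toward the group mean reduces that noise.

```python
import random

# Hypothetical prior on each student's true mastery rate (assumed for illustration).
PRIOR_A, PRIOR_B = 8.0, 4.0


def simulate_students(n_students=2000, seed=1):
    """Return a list of (true_rate, sections_worked, sections_mastered)."""
    rng = random.Random(seed)
    students = []
    for _ in range(n_students):
        p = rng.betavariate(PRIOR_A, PRIOR_B)   # true mastery rate
        n = rng.choice([2, 5, 20, 100])         # sections the student worked
        k = sum(rng.random() < p for _ in range(n))
        students.append((p, n, k))
    return students


def raw_mean(n, k):
    """Sample proportion: the estimate that caused trouble for light users."""
    return k / n


def shrunk_mean(n, k, a=PRIOR_A, b=PRIOR_B):
    """Posterior mean under a Beta(a, b) prior: pulled toward a / (a + b),
    with more pull when n is small."""
    return (k + a) / (n + a + b)


def mse(students, estimator, n_sections):
    """Mean squared error of an estimator among students with n_sections."""
    errs = [(estimator(n, k) - p) ** 2
            for p, n, k in students if n == n_sections]
    return sum(errs) / len(errs)


students = simulate_students()
for n in (2, 5, 20, 100):
    print(n, round(mse(students, raw_mean, n), 4),
          round(mse(students, shrunk_mean, n), 4))
```

Each printed row gives the number of sections worked, then the error of the raw proportion, then the error of the partially pooled estimate. The gap between the two should be largest for the students who barely used the software, which is the same pattern as in the county-level cancer example.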

And here’s the abstract of the Sales and Pane article:

Mastery learning—the idea that students’ mastery of target skills should govern their advancement through a curriculum—lies at the heart of the Cognitive Tutor, a computer program designed to help teach. This paper uses log data from a large-scale effectiveness trial of the Cognitive Tutor Algebra I curriculum to estimate the role mastery learning plays in the tutor’s effect, using principal stratification. A continuous principal stratification analysis models treatment effect as a function of students’ potential adherence to mastery learning. However, adherence is not observed, but may be measured as a latent variable in an item response model. This paper describes a model for mastery learning in the Cognitive Tutor that includes an item response model in the principal stratification framework, and finds that the treatment effect may in fact decrease with adherence to mastery, or may be nearly unrelated on average.
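As a rough sketch of the structure the abstract describes, in generic notation (the symbols and the particular Rasch-style form below are assumptions for illustration, not taken from the paper): let $M_{ij}$ indicate whether student $i$ mastered section $j$, let $\eta_i$ be the student's latent adherence to mastery learning, let $\delta_j$ be a section difficulty, and let $Y_i(1)$ and $Y_i(0)$ be the potential outcomes with and without the software.

```latex
\begin{align*}
  M_{ij} \mid \eta_i &\sim \mathrm{Bernoulli}\!\left(\mathrm{logit}^{-1}(\eta_i - \delta_j)\right)
    && \text{(item response model for mastery)} \\
  \mathrm{E}\!\left[\,Y_i(1) - Y_i(0) \mid \eta_i\,\right] &= \tau(\eta_i)
    && \text{(treatment effect as a function of latent adherence)}
\end{align*}
```

Under this reading, a student who worked only a few sections contributes few $M_{ij}$ terms, so $\eta_i$ is estimated with more uncertainty and pulled toward the population distribution, rather than being plugged in as a noisy sample average. Since the abstract refers to potential adherence, $\eta_i$ would presumably have to be inferred rather than observed for students who were not assigned the software.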

One of the cool things about statistics, or applied math more generally, is the way in which tools that are developed for one purpose can be useful in so many other settings that have similar mathematical structures.

I sent the above discussion to Avi Feller, who wrote:

I think Adam and John have done some very careful work here, and I’m happy to see it in print.

At the same time, I’ve grown skeptical of using mixture modeling (either explicitly or implicitly) for estimating causal effects in principal stratification models (I’ve written a bit about it, but the damn paper keeps getting rejected!). So while I applaud Adam for his work, I’m less confident that this is a generally applicable strategy. Of course, these are quite challenging questions, and I’m thrilled to see more researchers tackling them!