“You took my sadness out of context at the Mariners Apartment Complex” – Lana Del Rey
It’s sunny, I’m in England, and I’m having a very tasty beer, and Lauren, Andrew, and I just finished a paper called The experiment is just as important as the likelihood in understanding the prior: A cautionary note on robust cognitive modelling.
So I guess it’s time to resurrect a blog series. On the off chance that any of you have forgotten, the Against Arianism series focusses on the idea that, in the same way that Arianism1 was heretical, so too is the idea that priors and likelihoods can be considered separately. Rather, they are consubstantial–built of the same probability substance.
There is no new thing under the sun, so obviously this has been written about a lot. But because it’s my damn blog post, I’m going to focus on a paper Andrew, Michael, and I wrote in 2017 called The Prior Can Often Only Be Understood in the Context of the Likelihood. This paper was dashed off in a hurry and under deadline pressure, but I quite like it. But it’s also maybe not the best place to stop the story.
An opportunity to comment
A few months back, the fabulous Lauren Kennedy was visiting me in Toronto on a different project. Lauren is a postdoc at Columbia working partly on complex survey data, but her background is quantitative methods in psychology. Among other things, we saw a fairly regrettable (but excellent) Claire Denis movie about vampires2.
But that’s not relevant to the story. What is relevant was that Lauren had seen an open invitation to write a comment on a paper in Computational Brain & Behaviour about Robust3 Modelling in Cognitive Science written by a team cognitive scientists and researchers in scientific theory, philosophy, and practice (Michael Lee, Amy Criss, Berna Devezer, Christopher Donkin, Alexander Etz, Fábio Leite, Dora Matzke, Jeffrey Rouder, Jennifer Trueblood, Corey White, and Joachim Vandekerckhove).
Their bold aim to sketch out the boundaries of good practice for cognitive modelling (and particularly for the times where modelling meets data) is laudable, not least because such an endeavor will always be doomed to fail in some way. But the act of stating some ideas for what constitutes best practice gives the community a concrete pole to hang this important discussion on. And Computational Brain & Behaviour recognized this and decided to hang an issue off the paper and its discussions.
The paper itself is really thoughtful and well done. And obviously I do not agree with everything in it, but that doesn’t stop me from the feeling that wide-spread adoption of their suggestions would definitely make quantitative research better.
But Lauren noticed one tool that we have found extremely useful that wasn’t mentioned in the paper: prior predictive checks. She asked if I’d be interested in joining her on a paper, and I quickly said yes!
It turns out there is another BART
The best thing about working with Lauren on this was that she is a legit psychology researcher so she isn’t just playing in someone’s back yard, she owns a patch of sand. It was immediately clear that it would be super-quick to write a comment that just said “you should use prior predictive checks”. But that would miss a real opportunity. Because cognitive modelling isn’t quite the same as standard statistical modelling (although in the case where multilevel models are appropriate Daniel Schad, Michael Betancourt, and Shravan Vasishth just wrote an excellent paper on importing general ideas of good statistical workflows into Cognitive applications).
Rather than using our standard data analysis models, a lot of the time cognitive models are generative models for the cognitive process coupled (sometimes awkwardly) with models for the data that is generated from a certain experiment. So we wanted an example model that is more in line with this practice than our standard multilevel regression examples.
Lauren found the Balloon Analogue Risk Task (BART) in Lee and Wagenmakers’ book Bayesian Cognitive Modeling: A Practical Course, which conveniently has Stan code online4. We decided to focus on this example because it’s fairly easy to understand and has all the features we needed. But hopefully we will eventually write a longer paper that covers more common types of models.
BART is an experiment that makes participants simulate pumping balloons with some fixed probability of popping after every pump. Every pump gets them more money, but they get nothing if the balloon pops. The model contains a parameter () for risk taking behaviour and the experiment is designed to see if the risk taking behaviour changes as a person gets more drunk. The model is described in the following DAG:
Exploring the prior predictive distribution
Those of you who have been paying attention will notice the Uniform(0,10) priors on the logit scale and think that these priors are a little bit terrible. And they are! Direct simulation from model leads to absolutely silly predictive distributions for the number of pumps in a single trial. Worse still, the pumps are extremely uniform across trials. Which means that the model thinks, a priori, that it is quite likely for a tipsy undergraduate to pump a balloon 90 times in each of the 20 trials. The mean number of pumps is a much more reasonable 10.
Choosing tighter upper bounds on the uniform priors leads to more sensible prior predictive distributions, but then Lauren went to test out what changes this made to inference (in particular looking at how it affects the Bayes factor against the null that the parameters were the same across different levels of drunkenness). It made very little difference. This seemed odd so she started looking closer.
Where is the p? Or, the Likelihood Principle gets in the way
So what is going on here? Well the model describe in Lee and Wagenmaker’s book is not a generative model for the experimental data. Why not? Because the balloon sometimes pops! But because in this modelling setup the probability of explosion is independent of the number of pumps, this explosive possibility only appears as a constant in the likelihood.
The much lauded Likelihood Principle tells us that we do not need to worry about these constants when we are doing inference. But when we are trying to generate data from the prior predictive distribution, we really need to care about these aspects of the model.
Once the context on the experiment is taken into account, the prior predictive distributions change a lot.
Context is important when taking statistical methods into new domains
Prior predictive checks are really powerful tools. They give us a way to set priors, they give us a way to understand what our model does, they give us a way to generate data that we can use to assess the behaviour of different model comparison tools under the experimental design at hand. (Neyman-Pearson acolytes would talk about power here, but the general question lives on beyond that framework).
Modifications of prior predictive checks should also be used to assess how predictions, inference, and model comparison methods behave under different but realistic deviations from the assumed generative model. (One of the points where I disagree with Lee et al.‘s paper is that it’s enough to just pre-register model comparision methods. We also need some sort of simulation study to know how they work for the problem at hand!)
But prior predictive checks require understanding of the substantive field as well as understanding of how the experiment was performed. And it is not always as simple as just predict y!
Balloons pop. Substantive knowledge may only be about contrasts or combinations of predictions. We need to always be aware that it’s a lot of work to translate a tool to a new scientific context. Even when that tool appears to be as straightforward to use and as easy to explain as prior predictive checks.
And maybe we should’ve called that paper The Prior Can Often Only Be Understood in the Context of the Experiment.
1 The fourth century Christian heresy that posited that Jesus was created by God and hence was not of the same substance. The council of Nicaea ended up writing a creed to stamp that one out.
2 Really never let me choose the movie. Never.
3 I hate the word “robust” here. Robust against what?! The answer appears to be “robust against un-earned certainty”, but I’m not sure. Maybe they want to Winsorize cognitive science?
4 Lauren had to twiddle it a bit, particularly using a non-centered parameterization to eliminate divergences.