“Do you have any recommendations for useful priors when datasets are small?”

Someone who wishes to remain anonymous writes:

I just read your paper with Daniel Simpson and Michael Betancourt, The Prior Can Often Only Be Understood in the Context of the Likelihood, and I find it refreshing to read that “the practical utility of a prior distribution within a given analysis then depends critically on both how it interacts with the assumed probability model for the data in the context of the actual data that are observed.” I also welcome your comment about the importance of the “data generating mechanism” because, for me, it is akin to selecting the “appropriate” distribution for a given response. I always make the point to the people I’m working with that we need to consider the clinical, scientific, physical and engineering principles governing the underlying phenomenon that generates the data; e.g., forces are positive quantities, particles are counts, yield is bounded between 0 and 1.

You also talk about the “big data, and small signal revolution.” In industry, however, we face the opposite problem: our datasets are usually quite small. We may have a new product, for which we want to make some claims, and we may have only 4 observations. I do not consider myself a Bayesian, but I do believe that Bayesian methods can be very helpful in industrial situations. I also read your Prior Choice Recommendations but did not find anything specific about small sample sizes. Do you have any recommendations for useful priors when datasets are small?

My quick response is that when sample size is small, or measurements are noisy, or the underlying phenomenon has high variation, then the prior distribution will become more important.

So your question is a good one!
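A minimal sketch of the quick response above, assuming the simplest possible setting: a normal mean with known measurement standard deviation and a normal prior. The function name and all the numbers are hypothetical, chosen only for illustration. In this setting the posterior mean is a precision-weighted average of the prior mean and the sample mean, so the prior's share of the answer can be read off directly.

```python
from math import sqrt

def normal_posterior(ybar, n, sigma, prior_mean, prior_sd):
    """Posterior for a normal mean with known data sd `sigma` and a normal prior."""
    prior_prec = 1.0 / prior_sd**2   # precision contributed by the prior
    data_prec = n / sigma**2         # precision contributed by n observations
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * ybar) / post_prec
    prior_weight = prior_prec / post_prec  # share of the posterior mean due to the prior
    return post_mean, 1.0 / sqrt(post_prec), prior_weight

# Hypothetical numbers: same prior and same noise level, different sample sizes.
for n in (4, 40, 400):
    m, sd, w = normal_posterior(ybar=1.0, n=n, sigma=1.0, prior_mean=0.0, prior_sd=1.0)
    print(f"n={n:3d}  posterior mean={m:.3f}  sd={sd:.3f}  prior weight={w:.3f}")
```

The prior's weight is (1/prior_sd^2) / (1/prior_sd^2 + n/sigma^2), so it grows when n is small or when sigma is large. With 4 observations and a prior as wide as the measurement noise, the prior still accounts for a fifth of the posterior mean.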

To continue, when priors are important, you’ll have to think harder about what real prior information is available.

One way to do this is . . . and I’m sorry for being so predictable in my answer, but I’ll say it anyway . . . embed your problem in a multilevel model. You have a new product with just four observations. Fine. But this new product is the latest in a stream of products, so create a model of the underlying attributes of interest, given product characteristics and time.

Don’t think of your “prior” for a parameter as some distinct piece of information; think of it as the culmination of a group-level model.

Just like when we do Mister P (multilevel regression and poststratification): We don’t slap down separate priors for the 50 states, we set up a hierarchical model with state-level predictors, and this does the partial pooling more organically. So the choice of priors becomes something more familiar: the choice of predictors in a regression model, along with choices about how to set that predictive model up.

Even with a hierarchical model, you still might want to add priors on hyperparameters, but that’s something we do discuss a bit in the Prior Choice Recommendations mentioned above.
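To illustrate the idea of a prior as the output of a group-level model, here is a rough sketch under strong simplifying assumptions. It is not a full multilevel fit: it treats the attribute values of earlier products as known exactly, plugs in point estimates for the group-level regression rather than putting priors on the hyperparameters, and assumes a known measurement standard deviation. The function names, the single product-level predictor, and all the data are hypothetical.

```python
from math import sqrt
from statistics import mean

def fit_group_level(x, theta):
    """Least-squares line theta ~ a + b*x, plus the residual sd tau."""
    xbar, tbar = mean(x), mean(theta)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (ti - tbar) for xi, ti in zip(x, theta)) / sxx
    a = tbar - b * xbar
    resid = [ti - (a + b * xi) for xi, ti in zip(x, theta)]
    tau = sqrt(sum(r * r for r in resid) / (len(x) - 2))  # between-product sd
    return a, b, tau

def partial_pool(y_new, sigma, x_new, a, b, tau):
    """Combine the group-level prediction with the new product's own data."""
    prior_mean = a + b * x_new          # what the earlier products predict
    prior_prec = 1.0 / tau**2
    data_prec = len(y_new) / sigma**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * mean(y_new)) / post_prec
    return post_mean, 1.0 / sqrt(post_prec)

# Hypothetical data: six earlier products, indexed by a product characteristic x.
x_old = [1, 2, 3, 4, 5, 6]
theta_old = [10.1, 10.8, 11.0, 11.9, 12.1, 12.9]

# The new product: four observations, with an assumed measurement sd of 0.8.
y_new = [14.2, 13.1, 13.8, 14.5]
a, b, tau = fit_group_level(x_old, theta_old)
post_mean, post_sd = partial_pool(y_new, sigma=0.8, x_new=7, a=a, b=b, tau=tau)
print(f"group-level prediction: {a + b * 7:.2f}, sample mean: {mean(y_new):.2f}")
print(f"partially pooled estimate: {post_mean:.2f} (sd {post_sd:.2f})")
```

The estimate for the new product lands between its own sample mean and the prediction from the earlier products, and the pull toward the prediction is stronger when the earlier products scatter tightly around the regression line. A full hierarchical model would estimate all of these pieces jointly and carry the uncertainty in the hyperparameters through to the result.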