Yes, you can include prior information on quantities of interest, not just on parameters in your model

Nick Kavanagh writes:

I studied economics in college and never heard more than a passing reference to Bayesian stats. I started to encounter Bayesian concepts in the workplace and decided to teach myself on the side.

I was hoping to get your advice on a problem that I recently encountered. It has to do with the best way to encode prior information into a model in which the prior knowledge pertains to the overall effect of some change (not the values of individual parameters). I haven’t seen this question addressed before and thought it might be a good topic for a blog post.

I’m building a model to understand the effects of advertising on sales, controlling for other factors like pricing. A simplified version of the model is presented below.

sales = alpha + beta_ad * ad_spend + beta_price * log(price)

Additional units of advertising will, at some point, yield lower incremental sales. This non-linearity is incorporated into the model through a variable transformation — f(ad_spend, s) — where the parameter s determines the rate of diminishing returns.

sales = alpha + beta_ad * f(ad_spend, s) + beta_price * log(price)

Outside the model, I have estimates of the impact of advertising on sales obtained through randomized experiments. These experiments don’t provide estimates of beta_ad and s. They simply tell you that “increasing advertising spend by $100K generated 400 [300, 500] incremental sales.” The challenge is that different sets of parameter values for beta_ad and s yield very similar results in terms of incremental sales. I’m struggling with the best way to incorporate the experimental results into the model.
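Here is a small Python sketch that makes the identifiability problem concrete. It assumes a hypothetical saturating transform, f(ad_spend, s) = 1 - exp(-ad_spend/s), with spend in thousands of dollars and a hypothetical baseline spend of $200K. Neither of these comes from the letter. For any value of s you can solve for the beta_ad that gives exactly 400 incremental sales, so the experiment on its own is consistent with a whole ridge of (beta_ad, s) pairs.

```python
import math

def f(ad_spend, s):
    """Hypothetical saturating transform (an assumption): climbs toward 1,
    and a larger s means slower saturation. ad_spend is in $K."""
    return 1.0 - math.exp(-ad_spend / s)

def incremental_sales(beta_ad, s, base_spend=200.0, extra=100.0):
    """Lift from raising spend by `extra` ($K) at a fixed price.
    alpha and beta_price * log(price) cancel in the difference."""
    return beta_ad * (f(base_spend + extra, s) - f(base_spend, s))

def beta_matching(target, s, base_spend=200.0, extra=100.0):
    """For a given s, the beta_ad that reproduces `target` incremental sales exactly."""
    return target / incremental_sales(1.0, s, base_spend, extra)

for s in (100.0, 200.0, 400.0, 800.0):
    b = beta_matching(400.0, s)
    print(f"s = {s:5.0f}   beta_ad = {b:7.1f}   lift = {incremental_sales(b, s):.1f}")
```

Every row reports a lift of 400.0, even though beta_ad changes substantially from one row to the next.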

My reply:

In Stan this is super-easy: You can put priors on anything, including combinations of parameters. Consider this code fragment:

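data { int<lower=0> N; vector[N] x; vector[N] y; }
parameters { real a; real b; real<lower=0> sigma; }
model {
  target += normal_lpdf(y | a + b*x, sigma);  // data model
  target += normal_lpdf(a | 0, 10);           // weak prior on a
  target += normal_lpdf(b | 0, 10);           // weak prior on b
  target += normal_lpdf(a + 5*b | 4.5, 0.2);  // informative prior on a + 5*b
}
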
In this example, you have prior information on the linear combination a + 5*b: an estimate of 4.5, with standard error 0.2, from some previous experiment.

The key is that prior information is, mathematically, just more data.
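Here is a minimal Python sketch of that point. It uses only the standard library and assumes sigma is known (fixed at 1) and some made-up data for x and y. The function log_density adds up the same four terms as the Stan model block. The function mode_from_pseudo_data reaches the same maximum by weighted least squares, treating each prior as one more row of data with its own standard deviation. Because every term is normal, the log density is quadratic in (a, b), so this mode is also the posterior mean.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log of the normal density for a scalar, like Stan's normal_lpdf."""
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def log_density(a, b, x, y, sigma):
    """One term per line of the model block (sigma held fixed here)."""
    target = 0.0
    for xi, yi in zip(x, y):
        target += normal_lpdf(yi, a + b * xi, sigma)  # data model
    target += normal_lpdf(a, 0.0, 10.0)               # weak prior on a
    target += normal_lpdf(b, 0.0, 10.0)               # weak prior on b
    target += normal_lpdf(a + 5 * b, 4.5, 0.2)        # informative prior on a + 5*b
    return target

def mode_from_pseudo_data(x, y, sigma):
    """Weighted least squares where each prior is one extra 'observation' row:
    (coefficient on a, coefficient on b, observed value, sd)."""
    rows = [(1.0, xi, yi, sigma) for xi, yi in zip(x, y)]
    rows += [(1.0, 0.0, 0.0, 10.0),   # a ~ normal(0, 10)
             (0.0, 1.0, 0.0, 10.0),   # b ~ normal(0, 10)
             (1.0, 5.0, 4.5, 0.2)]    # a + 5*b ~ normal(4.5, 0.2)
    s11 = s12 = s22 = t1 = t2 = 0.0
    for c1, c2, z, sd in rows:
        w = 1.0 / sd ** 2             # weight = precision of this row
        s11 += w * c1 * c1
        s12 += w * c1 * c2
        s22 += w * c2 * c2
        t1 += w * c1 * z
        t2 += w * c2 * z
    det = s11 * s22 - s12 * s12       # solve the 2x2 normal equations
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2]
a_hat, b_hat = mode_from_pseudo_data(x, y, sigma=1.0)
print(a_hat + 5 * b_hat)
```

With these made-up numbers, the data alone would put a + 5*b at about 6. The prior information, with its standard error of 0.2, pulls the estimate close to 4.5, just as one very precise extra observation would.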

You should be able to do the same thing if you have information on a nonlinear function of the parameters, such as the incremental sales implied by beta_ad and s, but then you need to adjust for the Jacobian, or maybe there's some way to do this in Stan.
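For the advertising example, a rough sketch (not a settled recipe) is to add one more term to the log density, this time on the incremental sales implied by beta_ad and s. The Python below makes three assumptions. It uses a hypothetical transform, f(ad_spend, s) = 1 - exp(-ad_spend/s). It uses a hypothetical baseline spend of $200K. It reads "400 [300, 500]" as normal(400, 50), that is, roughly a 95% interval of plus or minus 2 standard deviations. The sketch makes no Jacobian adjustment. It therefore treats the experiment as pseudo-data on the lift rather than as a prior density on (beta_ad, s), and it does not settle the Jacobian question.

```python
import math

def f(ad_spend, s):
    """Hypothetical saturating transform (an assumption); ad_spend in $K."""
    return 1.0 - math.exp(-ad_spend / s)

def normal_lpdf(y, mu, sigma):
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def lift(beta_ad, s, base_spend=200.0, extra=100.0):
    """Incremental sales from raising spend from base_spend to base_spend + extra."""
    return beta_ad * (f(base_spend + extra, s) - f(base_spend, s))

def experiment_term(beta_ad, s):
    """Extra log-density term for "400 [300, 500] incremental sales",
    read as normal(400, 50). No Jacobian adjustment is made."""
    return normal_lpdf(lift(beta_ad, s), 400.0, 50.0)

# Points on the ridge where the implied lift is exactly 400.
ridge = [(400.0 / lift(1.0, s), s) for s in (100.0, 200.0, 400.0)]
for beta_ad, s in ridge:
    print(f"beta_ad = {beta_ad:7.1f}   s = {s:5.0f}   term = {experiment_term(beta_ad, s):.4f}")

# Off the ridge (lift of 600 instead of 400) the term is much lower.
beta_ad, s = ridge[1]
print(f"off the ridge: term = {experiment_term(1.5 * beta_ad, s):.4f}")
```

The term takes the same value everywhere along the ridge. It pins down the lift but cannot by itself separate beta_ad from s, so that separation has to come from the sales data or from prior information on s.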