I really liked this paper, and am curious what other people think before I base a grant application around applying Stan to this problem in a machine-learning context.
- Gneiting, T., Balabdaoui, F., & Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(2), 243–268.
Gneiting et al. define what I think is a pretty standard notion of calibration for Bayesian models based on coverage, but I’m not 100% sure whether there are alternative sensible definitions.
They also define a notion of sharpness, which for continuous predictions essentially means narrow posterior intervals, hence the name.
By way of analogy to point estimators, calibration is like unbiasedness and sharpness is like precision (i.e., inverse variance).
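To make the two notions concrete, here is a minimal sketch in Python of how one might check them from predictive draws. It is my own illustration under stated assumptions, not code from the paper: the function names (`central_interval`, `coverage`, `mean_width`) are hypothetical, the data are simulated, and I am reading calibration as empirical coverage of central intervals and sharpness as average interval width.

```python
import numpy as np


def central_interval(draws, level=0.9):
    """Central predictive interval per observation.

    draws has shape (n_obs, n_draws): one row of predictive draws
    for each held-out observation.
    """
    alpha = 1.0 - level
    lower = np.quantile(draws, alpha / 2, axis=1)
    upper = np.quantile(draws, 1 - alpha / 2, axis=1)
    return lower, upper


def coverage(draws, y, level=0.9):
    """Fraction of observations that fall inside their interval.

    For a calibrated forecaster this should be close to `level`.
    """
    lower, upper = central_interval(draws, level)
    return float(np.mean((y >= lower) & (y <= upper)))


def mean_width(draws, level=0.9):
    """Average interval width; smaller means sharper."""
    lower, upper = central_interval(draws, level)
    return float(np.mean(upper - lower))


# Simulated example (assumption: the true data are standard normal).
rng = np.random.default_rng(0)
n_obs, n_draws = 2000, 1000
y = rng.normal(0.0, 1.0, size=n_obs)

# Forecaster A predicts with the true distribution.
draws_matched = rng.normal(0.0, 1.0, size=(n_obs, n_draws))
# Forecaster B is overdispersed: its intervals are far too wide.
draws_wide = rng.normal(0.0, 3.0, size=(n_obs, n_draws))

cov_matched = coverage(draws_matched, y)
cov_wide = coverage(draws_wide, y)
width_matched = mean_width(draws_matched)
width_wide = mean_width(draws_wide)

print(f"matched: coverage={cov_matched:.3f}, width={width_matched:.2f}")
print(f"wide:    coverage={cov_wide:.3f}, width={width_wide:.2f}")
```

In this toy setup the matched forecaster's 90% intervals should cover roughly 90% of the observations, while the overdispersed forecaster covers nearly all of them with much wider intervals, so it is both miscalibrated and less sharp. The harder trade-off comes when two forecasters are both calibrated and differ only in sharpness.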
I seem to recall that Andrew told me that calibration is a frequentist notion, whereas a true Bayesian would just believe their priors. I’m not so worried about those labels here as about the methodological ramifications of taking the ideas of calibration and sharpness seriously.