Richard Artner, Francis Tuerlinckx, and Wolf Vanpaemel write:

We are currently doing research on model selection, model averaging, misspecification, and post-selection inference. As far as we understand, your approach to Bayesian statistical analysis looks (drastically simplified) like this:

1. A series of models of increasing complexity is fitted sequentially, with the type of misfit found at each step motivating how the model is extended next. This process stops when the amount of data at hand can no longer support additional complexity (i.e., when parameter uncertainty due to estimation exceeds a certain point), or possibly earlier in the (lucky!) case that we arrive at a model for which no discrepancies between the observed data pattern and the model assumptions can be found.

2. The final model is then put to the acid test once again: residual plots, posterior predictive checks, and the like.

3. Inference for the model parameters of interest, as well as for functions of them (e.g., the expected mean or quantiles of the response variable), is then conducted within the chosen model.

An example of this process is, for instance, given in BDA (Chapter 22.2 “Using regression predictions: incentives for telephone surveys”). [That example is in section 9.2 of the third edition of BDA. — ed.]
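Step 2 mentions posterior predictive checks. As a minimal sketch of one such check (not the check used in the telephone-survey example), assume a normal model y_i ~ N(mu, sigma^2) with the standard noninformative prior p(mu, sigma^2) ∝ 1/sigma^2. We draw parameters from the posterior, simulate a replicated dataset for each draw, and compare a test statistic T on the replicated data with its observed value. The function name, the default statistic `np.max`, and the number of draws are illustrative assumptions.

```python
import numpy as np


def posterior_predictive_pvalue(y, stat=np.max, draws=4000, seed=0):
    """Posterior predictive p-value Pr(T(y_rep) >= T(y) | y) for a normal model.

    Assumes y_i ~ N(mu, sigma^2) with the noninformative prior
    p(mu, sigma^2) proportional to 1/sigma^2, under which
    sigma^2 | y ~ Scaled-Inv-chi^2(n - 1, s^2) and mu | sigma^2, y ~ N(ybar, sigma^2 / n).
    `stat` must accept an `axis` keyword (np.max, np.min, np.std, ...).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar, s2 = y.mean(), y.var(ddof=1)

    # Draw (mu, sigma^2) from the joint posterior.
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=draws)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))

    # One replicated dataset of size n per posterior draw (rows of y_rep).
    y_rep = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(draws, n))

    # Share of replicated statistics at least as large as the observed one.
    t_rep = stat(y_rep, axis=1)
    return float(np.mean(t_rep >= stat(y)))
```

A p-value near 0 or 1 flags an aspect of the data that the model fails to reproduce; for example, a single extreme outlier pushes the p-value for `np.max` toward 0.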

We are wondering to what extent the inferences produced by such a process can be problematic and potentially misleading, since the data are used twice: first to arrive at the final model, and second to fit the likelihood on which the inferences are based. You do not mention any broadening of credible intervals, nor data splitting in which the third step is conducted on an unused test sample. Maybe you do not mention it because it does not matter much, in theory or in practice. Or perhaps because the issue is too difficult to deal with in a Bayesian framework.

As far as we understand it, in such a process the dataset influences the form of the likelihood, the prior distributions, and the parameter estimates (e.g., via ML), thereby violating the internal consistency of Bayesian inference (i.e., given an a priori specified likelihood and the “correct” prior distribution, the posterior distribution is correct, where in the M-open case correctness is defined relative to the best-approximating model).
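The concern about using the data twice can be made concrete with a toy simulation. This is a frequentist sketch, not a Bayesian analysis and not the workflow summarized above: the "model selection" is a blind search over pure-noise predictors, and the setup (n = 100 observations, k = 10 candidate predictors, a 1.96 normal-approximation interval for an OLS slope, hypothetical helper names) is assumed purely for illustration. It compares interval coverage of the true slope (zero) when the predictor is chosen and estimated on the same data versus chosen on one half and estimated on the unused other half.

```python
import numpy as np


def covers_zero(x, y):
    """1 if the normal-approximation 95% interval for the OLS slope of y on x contains 0."""
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = xc @ xc
    b = (xc @ yc) / sxx
    resid = yc - b * xc
    se = np.sqrt((resid @ resid) / (len(y) - 2) / sxx)
    return int(abs(b) <= 1.96 * se)


def pick_predictor(X, y):
    """Index of the column of X with the largest absolute sample covariance with y."""
    return int(np.argmax(np.abs(X.T @ (y - y.mean()))))


def simulate_coverage(n=100, k=10, reps=2000, seed=1):
    """Coverage of the true slope (0) after choosing a predictor by looking at the data."""
    rng = np.random.default_rng(seed)
    half = n // 2
    hits_same = hits_split = 0
    for _ in range(reps):
        X = rng.normal(size=(n, k))
        y = rng.normal(size=n)  # y is unrelated to every column: all true slopes are 0

        # Double dipping: select and estimate on the same n observations.
        j = pick_predictor(X, y)
        hits_same += covers_zero(X[:, j], y)

        # Data splitting: select on the first half, estimate on the unused second half.
        j = pick_predictor(X[:half], y[:half])
        hits_split += covers_zero(X[half:, j], y[half:])
    return hits_same / reps, hits_split / reps
```

Calling `simulate_coverage()` should give coverage well below 95% for the same-data intervals and close to 95% for the split-sample intervals, at the price of estimating with half the data.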

My reply:

– Yes, that’s a reasonable summary of our model-building approach. A more elaborate version is in this paper with Jonah Gabry, Dan Simpson, Aki Vehtari, and Mike Betancourt.

– I don’t think it will ever make sense to put all of Bayesian inference in a coherent framework, even for a single application. For one thing, as Dan, Mike, and I wrote a couple of years ago, the prior can often only be understood in the context of the likelihood. And that’s just a special case of the general principle that there’s always something more we could throw into our models. Whatever we have is at best a temporary solution.

– That said, much depends on how the model is set up. We might add features to a model in a haphazard way but then go back and restructure it. For example, the survey-incentives model in section 9.2 of BDA is pretty ugly, and recently Lauren Kennedy and I have gone back to this problem and set up a model that makes more sense. So I wouldn’t consider the BDA version of this model (which in turn comes from our 2003 paper) an ideal example.

– To put it another way, we shouldn’t think of the model-building process as a blind data-fitting exercise. It’s more like we’re working toward building a larger model that makes sense, and each step in the process is a way of incorporating more information.
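The point that the prior can often only be understood in the context of the likelihood can be checked with a prior predictive simulation. As a minimal sketch, assume a hypothetical logistic regression with five standardized predictors, zero intercept, and independent normal priors on the coefficients. A N(0, 10^2) prior looks vague on its own, but passed through the inverse-logit link it puts most of its prior predictive mass on probabilities below 0.01 or above 0.99. The setup, thresholds, and function name are illustrative choices, not taken from the work cited in the post.

```python
import numpy as np


def share_extreme_probs(prior_sd, k=5, draws=10_000, seed=0):
    """Prior predictive share of Pr(y = 1) values below 0.01 or above 0.99.

    Hypothetical setup: logistic regression with k standardized predictors,
    intercept 0, and independent N(0, prior_sd^2) priors on the coefficients.
    """
    rng = np.random.default_rng(seed)
    beta = rng.normal(0.0, prior_sd, size=(draws, k))  # coefficients drawn from the prior
    x = rng.normal(size=(draws, k))                    # one simulated predictor vector per draw
    eta = np.sum(beta * x, axis=1)                     # linear predictor
    p = 1.0 / (1.0 + np.exp(-eta))                     # inverse logit
    return float(np.mean((p < 0.01) | (p > 0.99)))
```

Comparing `share_extreme_probs(10.0)` with `share_extreme_probs(1.0)` shows how strongly the implied predictions depend on the prior scale once the prior is combined with this particular likelihood.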