Data partitioning as an essential element in evaluation of predictive properties of a statistical method

In a discussion of our stacking paper, the point came up that LOO (leave-one-out cross-validation) requires a partitioning of data: you can only “leave one out” if you define what “one” is.

It is sometimes said that LOO “relies on the data-exchangeability assumption.” I don’t think that’s quite the right way to put it, but LOO does assume the relevance of a data partition. We discuss this briefly in section 3.5 of this article. For regular Bayes, p(theta|y) proportional to p(y|theta) * p(theta), there is no partition of data. “y” is just a single object. But for LOO, y is partitioned. At first this bothered me about LOO, but then I decided that this is a fundamental idea, related to the idea of “internal replication” discussed by Ripley in his spatial statistics book. The idea is that with just “y” and no partitions, there is no internal replication and no statistically general way of making reliable statements about new cases.
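
To make the point about partitions concrete, here is a minimal sketch in Python. Everything in it is an assumption chosen for illustration and not something from the stacking paper: a normal model with known sigma = 1, a Normal(0, 10) prior on the mean, made-up data in three groups, and hypothetical helper names (posterior, log_pred, elpd). The same y and the same model are scored under two partitions, leaving out one observation at a time and leaving out one group at a time.

```python
import math

def posterior(ys, prior_mean=0.0, prior_sd=10.0, sigma=1.0):
    """Conjugate posterior (mean, sd) for mu in y_i ~ Normal(mu, sigma^2), sigma known."""
    prec = 1.0 / prior_sd**2 + len(ys) / sigma**2
    mean = (prior_mean / prior_sd**2 + sum(ys) / sigma**2) / prec
    return mean, math.sqrt(1.0 / prec)

def log_pred(held_out, train, sigma=1.0):
    """Joint log predictive density of the held-out points given the training points.

    Uses the chain rule: predict one held-out point, add it to the
    conditioning set, then predict the next.
    """
    train = list(train)
    total = 0.0
    for y in held_out:
        m, s = posterior(train, sigma=sigma)
        var = sigma**2 + s**2  # posterior predictive variance for one new point
        total += -0.5 * math.log(2 * math.pi * var) - (y - m) ** 2 / (2 * var)
        train.append(y)
    return total

def elpd(y, folds):
    """Sum of held-out log predictive densities over a partition of the indices of y."""
    total = 0.0
    for fold in folds:
        held = set(fold)
        train = [y[i] for i in range(len(y)) if i not in held]
        total += log_pred([y[i] for i in fold], train)
    return total

# Hypothetical data: three groups of three observations, with group-level shifts.
y = [-2.1, -1.8, -2.4, 0.1, -0.2, 0.3, 2.2, 1.9, 2.5]

by_point = [[i] for i in range(len(y))]        # "one" = a single observation
by_group = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]   # "one" = a whole group

elpd_point = elpd(y, by_point)
elpd_group = elpd(y, by_group)
print("leave-one-observation-out:", round(elpd_point, 2))
print("leave-one-group-out:      ", round(elpd_group, 2))
```

The posterior for the mean given all of y is identical in both cases; only the definition of “one” changes, and the predictive score changes with it. With these made-up numbers the group-level score is lower, which is what you'd expect when the question is about a new group and not a new observation within an existing group.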

This is similar to (but different from) the distinction in chapter 6 of BDA between the likelihood and the sampling distribution. To do inference for a given model, all we need from the data is the likelihood function. But to do model checking, we need the sampling distribution, p(y|theta), which implies a likelihood function but requires more assumptions (as can be seen, for example, in the distinction between binomial and negative binomial sampling). Similarly, to do inference for a given model, all we need is p(y|theta) with no partitioning of y, but to do predictive evaluation we need a partitioning.
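
The binomial versus negative binomial distinction can be checked numerically. The sketch below is only an illustration under assumed numbers: 9 successes and 3 failures, observed either with the number of trials fixed at 12 (binomial) or by sampling until the 3rd failure (negative binomial). The function names are hypothetical helpers, not from BDA.

```python
from math import comb

def binom_pmf(k, n, theta):
    """P(k successes in n trials), with n fixed in advance."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def negbinom_pmf(k, r, theta):
    """P(k successes before the r-th failure), with r fixed in advance."""
    return comb(k + r - 1, k) * theta**k * (1 - theta) ** r

k, n, r = 9, 12, 3  # assumed data: 9 successes, 3 failures

# Same likelihood: the ratio of the two densities does not depend on theta,
# so inference for theta is the same under either sampling rule.
thetas = [0.2, 0.5, 0.8]
ratios = [binom_pmf(k, n, t) / negbinom_pmf(k, r, t) for t in thetas]
print("likelihood ratios:", ratios)

# Different sampling distributions: the probability of seeing 9 or more
# successes at theta = 0.5 depends on which replications were possible.
theta0 = 0.5
tail_binom = sum(binom_pmf(j, n, theta0) for j in range(k, n + 1))
tail_negbinom = 1.0 - sum(negbinom_pmf(j, r, theta0) for j in range(k))
print("P(successes >= 9), binomial:         ", tail_binom)
print("P(successes >= 9), negative binomial:", tail_negbinom)
```

The ratio is constant in theta, so the likelihood function is the same up to a multiplicative factor. The tail probabilities differ because they depend on the set of datasets that could have been observed, which is the extra assumption that the sampling distribution brings in.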