Lecture: Model Evaluation, Error and Inference (Advanced Data Analysis from an Elementary Point of View)

February 7, 2013

(This article was originally published at Three-Toed Sloth, and syndicated at StatsBlogs.)

Lecture 3, Model evaluation: error and inference. Statistical models have three main uses: as ways of summarizing (reducing, compressing) the data; as scientific models, facilitating actual scientific inference; and as predictors. Both summarizing and scientific inference are linked to prediction (though in different ways), so we'll focus on prediction, and in particular, for now, on the expected error of prediction under some particular measure of error. The distinction between in-sample error and generalization error, and why the former is almost invariably optimistic about (that is, lower than) the latter. Over-fitting. Examples of just how spectacularly one can over-fit really very harmless data. A brief sketch of the ideas of learning theory and capacity control. Data-set splitting as a first attempt at practically controlling over-fitting. Cross-validation for estimating generalization error and for model selection. Justifying model-based inferences.
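To make the gap between in-sample and generalization error concrete, here is a minimal sketch in Python (standard library only). It is not taken from the lecture notes. It assumes a hypothetical toy setup: data simulated as y = sin(2x) plus Gaussian noise, and k-nearest-neighbour regression as a stand-in model family whose flexibility is set by k. All function names (`make_data`, `knn_predict`, `mse`, `cv_error`, `compare`) are illustrative. For each k it reports the in-sample error, a 5-fold cross-validation estimate, and the error on a large fresh sample, which approximates the true generalization error only because the data are simulated. It then picks k by cross-validation.

```python
import math
import random
import statistics


def make_data(n, seed, noise_sd=0.3):
    """Hypothetical toy data: y = sin(2x) + Gaussian noise, x uniform on [0, 3]."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 3.0) for _ in range(n)]
    ys = [math.sin(2.0 * x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x0, k):
    """k-NN regression: average the y's of the k training points closest to x0."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))[:k]
    return statistics.fmean(train_y[i] for i in nearest)


def mse(xs, ys, train_x, train_y, k):
    """Mean squared error on (xs, ys) of a k-NN fit to (train_x, train_y)."""
    return statistics.fmean(
        (y - knn_predict(train_x, train_y, x, k)) ** 2 for x, y in zip(xs, ys)
    )


def cv_error(xs, ys, k, n_folds=5, seed=0):
    """K-fold cross-validation estimate of generalization MSE for k-NN."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    fold_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit only on the other folds, then score on the held-out fold.
        fold_errors.append(
            mse([xs[i] for i in fold], [ys[i] for i in fold],
                [xs[i] for i in train], [ys[i] for i in train], k)
        )
    return statistics.fmean(fold_errors)


def compare(ks=(1, 3, 5, 10, 20, 40)):
    train_x, train_y = make_data(60, seed=1)
    test_x, test_y = make_data(2000, seed=2)  # fresh draw approximates generalization error
    rows = {}
    for k in ks:
        rows[k] = {
            "in_sample": mse(train_x, train_y, train_x, train_y, k),
            "cv": cv_error(train_x, train_y, k),
            "fresh": mse(test_x, test_y, train_x, train_y, k),
        }
    best_k = min(ks, key=lambda k: rows[k]["cv"])  # model selection by CV
    return rows, best_k


if __name__ == "__main__":
    rows, best_k = compare()
    print(f"{'k':>3} {'in-sample':>10} {'5-fold CV':>10} {'fresh data':>10}")
    for k, r in rows.items():
        print(f"{k:>3} {r['in_sample']:>10.3f} {r['cv']:>10.3f} {r['fresh']:>10.3f}")
    print("k chosen by cross-validation:", best_k)
```

The choice of k-NN is deliberate. With k = 1 every training point is its own nearest neighbour, so the in-sample error is exactly zero even though the noise guarantees a positive error on new data. This is the most extreme form of the optimism described above. Cross-validation avoids it by never scoring a fit on the points it was fitted to.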

Reading: Notes, chapter 3 (R)
Cox and Donnelly, ch. 6

