Lecture: The Bootstrap (Advanced Data Analysis from an Elementary Point of View)

February 12, 2013
(This article was originally published at Three-Toed Sloth, and syndicated at StatsBlogs.)

The sampling distribution is the source of all knowledge regarding statistical uncertainty. Unfortunately, the true sampling distribution is inaccessible, since it depends on exactly the quantities we are trying to infer. One exit from this vicious circle is the bootstrap principle: approximate the true sampling distribution by simulating from a good model of the data-generating process, and treat the simulated data just like the real data. The simplest form of this is parametric bootstrapping, i.e., simulating from the fitted model. Nonparametric bootstrapping means simulating by re-sampling, i.e., treating the observed sample as a complete population and drawing new samples from it, with replacement. Bootstrapped standard errors, biases, confidence intervals, p-values. Tricks for making the simulated distribution closer to the true sampling distribution (pivotal intervals, studentized intervals, the double bootstrap). Bootstrapping regression models: by parametric bootstrapping; by resampling residuals; by resampling cases. Many, many examples. When does the bootstrap fail?
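The nonparametric recipe above can be sketched in a few lines. The following is an illustrative toy in Python (the course materials themselves use R); the exponential sample is a made-up stand-in for real data, and the median is just one convenient statistic. It resamples with replacement, recomputes the statistic each time, and reads off the standard error, bias, a percentile interval, and the reflected ("basic"/pivotal) interval mentioned in the abstract:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative data: a skewed sample standing in for real observations.
x = rng.exponential(scale=10.0, size=200)

def bootstrap(data, statistic, B=2000, rng=rng):
    """Nonparametric bootstrap: treat the sample as the whole population,
    draw B resamples with replacement, recompute the statistic on each."""
    n = len(data)
    reps = np.empty(B)
    for b in range(B):
        resample = rng.choice(data, size=n, replace=True)
        reps[b] = statistic(resample)
    return reps

theta_hat = np.median(x)          # point estimate from the actual sample
reps = bootstrap(x, np.median)    # bootstrap distribution of the median

se = reps.std(ddof=1)                      # bootstrap standard error
bias = reps.mean() - theta_hat             # bootstrap estimate of bias
lo, hi = np.percentile(reps, [2.5, 97.5])  # crude percentile 95% interval
# Pivotal ("basic") interval: reflect the percentile limits around theta_hat.
piv = (2 * theta_hat - hi, 2 * theta_hat - lo)
```

The same loop covers the regression variants: resampling cases means applying it to whole (x, y) rows, while resampling residuals means refitting the model to the original x's plus resampled residuals.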

Note: Thanks to Prof. Christopher Genovese for delivering this lecture while I was enjoying the hospitality of the fen-folk.

Reading: Notes, chapter 6 (R for figures and examples; pareto.R; wealth.dat);
Lecture slides; R for in-class examples
Cox and Donnelly, chapter 8



