Books to Read While the Algae Grow in Your Fur, July 2012

August 12, 2012

(This article was originally published at Three-Toed Sloth, and syndicated at StatsBlogs.)

Attention conservation notice: I have no taste.

Laurence Gough, Karaoke Rap and Funny Money
Mind candy. Mis-adventures of scheming, amoral low-lives in Vancouver; also the criminals they're supposed to be catching. (I kid, but Willows and Parker's colleagues are not very prepossessing.) As always, Gough does a very nice line in studiously disabused narrative.
10th and 12th (!) books in a series (some previous installments: nos. 1, 2, 3, 7 and 8), but both self-contained.
Josh Bazell, Wild Thing
Mind candy. Sequel to Beat the Reaper, though it can be read separately. This is funny and exciting by turns (and I really like the footnotes, even when I want to argue with them), but not as well-constructed as its predecessor.
Diana Rowland, Even White Trash Zombies Get the Blues
Mind candy. The continuing travails of the titular white trash zombie, as she tries to keep herself supplied with brain slurpees, and on the right side of her parole officer.
Karin Slaughter, Criminal
Mind candy. Equal parts gripping (if squicky) thriller, and portrait of struggling against entrenched sexism in 1975 Atlanta. Part of a long-running series (previously), but I think one could jump in here, without loss.
Spoilery remark: V unq orra cerfhzvat nyy guvf gvzr gung Nznaqn jbhyq ghea bhg gb or Jvyy'f zbgure. V gnxr fbzr fngvfsnpgvba, ubjrire, va Fynhtugre'f cebivqvat na rkcynangvba sbe gubfr pyhrf...
Olivier Catoni, Statistical Learning Theory and Stochastic Optimization [Free PostScript]
Lots of finite-sample results about randomized and aggregated predictors, with what Catoni nicely describes as a "pseudo-Bayesian" flavor. Specifically, he puts a lot of emphasis on "Gibbs estimators", which go as follows. Start with a space $$\Theta$$ of models, where each model $$\theta$$ gives us a distribution over samples, say $$q(x;\theta)$$. Stick a prior measure $$\pi(d\theta)$$ over the model space, and fix an "inverse temperature" $$\beta > 0$$. Nature generates data according to some distribution $$\mathbb{P}$$ which, in general, has nothing to do with any of our models. After seeing data $$x_1, x_2, \ldots, x_n \equiv x_1^n$$, we predict $$x_{n+1}$$ by averaging over models, according to the Gibbs measure / exponential family / pseudo-posterior
$$\rho(d\theta) = \frac{\left(q(x_1^n;\theta)\right)^{\beta}}{\int{\left(q(x_1^n;\theta^{\prime})\right)^{\beta} \pi(d\theta^{\prime})}}\pi(d\theta)$$
The point of doing this is that if $$\beta$$ is chosen reasonably, then the expected log-likelihood, predicting according to $$\rho$$, is always within $$O(1/n)$$ of the expected log-likelihood of the best models in $$\Theta$$. (Catoni actually calculates the constant buried in the $$O(1/n)$$, but the answer is more complicated than I feel like writing out.) Here, importantly, expectations are all taken with respect to the true distribution $$\mathbb{P}$$, not the prior $$\pi$$. This would not be true if one did a straight Bayesian model averaging with $$\beta=1$$.
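To make the recipe concrete, here is a minimal Python sketch, assuming (my choices, not Catoni's) a finite grid of Bernoulli models $$q(x;\theta) = \theta^x (1-\theta)^{1-x}$$ for i.i.d. binary data, with a uniform prior over the grid. The pseudo-posterior weight on each grid point is proportional to $$\pi(\theta)\, q(x_1^n;\theta)^{\beta}$$, computed in log space, and the prediction for $$x_{n+1}$$ is the $$\rho$$-weighted average of the models' predictions. The function names and the data are hypothetical, and nothing here chooses $$\beta$$ the way Catoni's bounds would; it is just a free parameter.

```python
import math

def log_lik(theta, xs):
    """log q(x_1^n; theta) for i.i.d. Bernoulli(theta) observations xs (each 0 or 1)."""
    k = sum(xs)
    return k * math.log(theta) + (len(xs) - k) * math.log(1.0 - theta)

def gibbs_weights(thetas, prior, xs, beta):
    """Pseudo-posterior rho over a finite grid: rho(theta) is proportional to
    prior(theta) * q(x_1^n; theta) ** beta, normalized via log-sum-exp for stability."""
    logs = [math.log(p) + beta * log_lik(t, xs) for t, p in zip(thetas, prior)]
    top = max(logs)
    unnorm = [math.exp(lg - top) for lg in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def predict_one(thetas, weights):
    """Predictive probability that x_{n+1} = 1: each model's prediction, averaged under rho."""
    return sum(w * t for t, w in zip(thetas, weights))

# Illustrative (made-up) grid and data, not taken from the book.
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = [1.0 / len(thetas)] * len(thetas)
xs = [1, 1, 0, 1, 1, 1, 0, 1]
for beta in (0.0, 0.5, 1.0, 5.0):
    w = gibbs_weights(thetas, prior, xs, beta)
    print(beta, [round(v, 3) for v in w], round(predict_one(thetas, w), 3))
```

With `beta=1.0` this is ordinary Bayesian posterior averaging over the grid, `beta=0.0` just hands back the prior, and larger values of `beta` pile the weight onto the best-fitting grid point.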
As the name "Gibbs estimator" suggests, Catoni milks the thermodynamic analogy for all it's worth, and much of chapters 4–6 is about approximating free energies and even susceptibilities. (I suspect that some of these results are superseded by Maurer's brilliant "Thermodynamics and Concentration" paper [arxiv:1205.1595], but am under-motivated to check.) These analytical results are about proving generalization error bounds; when it comes to actually doing stuff, Catoni still recommends sampling from the (pseudo-) posterior with Monte Carlo, hence the last chapter on transitions in Markov chains. There is also a natural connection with the results about compressing individual sequences which open the book.
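For a continuous model space the pseudo-posterior generally has to be sampled rather than computed. Below is a minimal random-walk Metropolis sketch, assuming i.i.d. binary data with $$q(x;\theta) = \theta^x (1-\theta)^{1-x}$$, $$\theta$$ ranging over $$(0,1)$$, and a uniform prior; the step size, chain length, burn-in and data are arbitrary illustrative values, not recommendations from the book, and the function names are hypothetical. Under these particular assumptions the target is exactly a Beta$$(\beta k + 1, \beta(n-k) + 1)$$ distribution, with $$k$$ the number of ones, so the chain's mean can be checked against $$(\beta k + 1)/(\beta n + 2)$$.

```python
import math
import random

def log_target(theta, xs, beta):
    """Unnormalized log pseudo-posterior: uniform prior on (0, 1) times q(x_1^n; theta) ** beta."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    k = sum(xs)
    return beta * (k * math.log(theta) + (len(xs) - k) * math.log(1.0 - theta))

def metropolis(xs, beta, n_steps=50_000, step=0.2, burn_in=5_000, seed=0):
    """Random-walk Metropolis chain targeting the pseudo-posterior; returns post-burn-in draws."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_target(theta, xs, beta)
    draws = []
    for i in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal, so no Hastings correction
        cand = log_target(proposal, xs, beta)
        # Accept with probability min(1, target(proposal) / target(theta)).
        if rng.random() < math.exp(min(0.0, cand - current)):
            theta, current = proposal, cand
        if i >= burn_in:
            draws.append(theta)
    return draws

xs = [1, 1, 0, 1, 1, 1, 0, 1]  # illustrative data: k = 6 ones out of n = 8
for beta in (0.5, 1.0):
    draws = metropolis(xs, beta)
    exact = (beta * sum(xs) + 1) / (beta * len(xs) + 2)  # mean of Beta(beta*k + 1, beta*(n-k) + 1)
    print(beta, round(sum(draws) / len(draws), 3), round(exact, 3))
```

Proposals that leave $$(0,1)$$ get log-density $$-\infty$$ and are always rejected, which keeps the chain inside the prior's support without any special-casing.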
The notation is very detailed and sprawling, and I often found it hard to follow. (The writing sometimes seems to lose the forest for the leaves.) But many of the results are quite powerful, and I will be keeping my copy for reference.
Why oh why can't we have a better academic publishing system? dep't.: Prof. Catoni has, generously, put a PostScript next-to-final draft of the book on his website. (Free is, to repeat, the economically efficient price.) Comparing this to the printed edition shows that Springer did absolutely nothing to the manuscript of any value to any reader. (They didn't even run it through an English spell-checker: e.g., for "Larve" in the title of section 7.3, read "Large".) They did, however, print and bind it, and put it in the distribution channels to libraries. For this, they charge $69.95 per copy, none of which goes as royalties to the author. (I got my copy second hand.) This is not as ridiculous as what they charge for access to individual articles, but still exactly what I mean when I say that commercial academic publishing has become a parasitic obstacle to the growth and dissemination of knowledge.
V. N. Vapnik, The Nature of Statistical Learning Theory
I recently had occasion to revisit this book, and to re-read my review from 1999. The main thing I would change in the review is to bring out more strongly Vapnik's insistence on bounds on generalization error which hold uniformly across all data-generating distributions. It is this which makes finite VC dimension a necessary condition for his notion of learning. One could learn a model from a family of infinite VC dimension if the family was adapted to the distribution (say, if the VC entropy was well-behaved).
Still strongly recommended, with my old caveats.
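To see what "uniformly across all data-generating distributions" buys, here is a small sketch evaluating the confidence term in one commonly quoted form of Vapnik's bound for classification with 0-1 loss: with probability at least $$1-\eta$$, every function in a class of VC dimension $$h$$ satisfies $$R \le R_{\mathrm{emp}} + \sqrt{(h(\ln(2n/h) + 1) - \ln(\eta/4))/n}$$ for $$n$$ training points. Statements of the result differ in their constants, so treat this exact form, and the hypothetical function name, as assumptions. What matters is that the term depends only on $$h$$, $$n$$ and $$\eta$$, never on $$\mathbb{P}$$, and it is only informative when $$h$$ is finite and small relative to $$n$$.

```python
import math

def vc_confidence(h, n, eta):
    """VC confidence term for VC dimension h, sample size n, failure probability eta.

    Assumes the commonly quoted form sqrt((h*(ln(2n/h) + 1) - ln(eta/4)) / n), meant for h < n.
    No property of the data-generating distribution appears anywhere in the formula.
    """
    if not (0 < h < n and 0 < eta < 1):
        raise ValueError("need 0 < h < n and 0 < eta < 1")
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

# The same guarantee holds whatever P is; it only shrinks as n grows relative to h.
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(vc_confidence(h=10, n=n, eta=0.05), 4))
```

Letting $$h$$ grow without bound makes the term blow up for every fixed $$n$$, which is the distribution-free sense in which finite VC dimension is necessary.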

Please comment on the article here: Three-Toed Sloth
