# Books to Read While the Algae Grow in Your Fur, August 2012

September 7, 2012

(This article was originally published at Three-Toed Sloth, and syndicated at StatsBlogs.)

Attention conservation notice: I have no taste.

M. S. Bartlett, Stochastic Population Models in Ecology and Epidemiology
Short (< 100 pp.) introduction to the topic as of 1960, and so inevitably mostly of historical interest. It was aimed at people who already knew some statistics and probability, but no biology to speak of. The models: correlated spread of plants (or colonies) around Poisson-distributed centers; birth-death models of population dynamics; Lotka-Volterra style competition between species; susceptible-infected-removed models of epidemics, with and without spatial structure. (These are, of course, still staples of the field...) All models are presented generatively, developed from plausible, though explicitly highly simplified, premises about how organisms behave. Bartlett makes a lot of use of generating-functionology, and sometimes heroic approximations to get closed-form expressions --- but he also introduces his readers to Monte Carlo, and one gets the impression that he uses this as much as he could afford to. Nicest of all, he takes pains to connect everything to real data.
Despite its technical obsolescence, I plan to rip it off shamelessly for examples when next I teach stochastic processes, or complexity and inference.
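As a small illustration of the kind of Monte Carlo exercise these models invite, here is a minimal sketch of a stochastic susceptible-infected-removed epidemic in a closed population, simulated event by event. This is the standard textbook continuous-time formulation, not a reconstruction of anything in Bartlett's book; the function name and the parameters `beta` (infection rate) and `gamma` (removal rate) are assumptions made for the example.

```python
import random

def simulate_sir(n_susceptible, n_infected, beta, gamma, rng):
    """Simulate a closed-population stochastic SIR epidemic until no infecteds remain.

    Returns the sample path as a list of (time, S, I, R) tuples.
    beta and gamma are hypothetical per-capita infection and removal rates.
    """
    s, i, r = n_susceptible, n_infected, 0
    n = s + i
    t = 0.0
    path = [(t, s, i, r)]
    while i > 0:
        infection_rate = beta * s * i / n   # rate of S -> I events
        removal_rate = gamma * i            # rate of I -> R events
        total_rate = infection_rate + removal_rate
        t += rng.expovariate(total_rate)    # exponential waiting time to next event
        if rng.random() < infection_rate / total_rate:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        path.append((t, s, i, r))
    return path

rng = random.Random(2012)
final_sizes = [simulate_sir(95, 5, beta=1.5, gamma=1.0, rng=rng)[-1][3]
               for _ in range(200)]
print("mean final epidemic size over 200 runs:", sum(final_sizes) / len(final_sizes))
```

Repeating the simulation many times, as in the last lines, gives a Monte Carlo picture of the distribution of final epidemic sizes, which is hard to get in closed form.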
Patricia A. McKillip, The Bards of Bone Plain
Two interlocking stories about bards, far-separated in time, searching for secrets in poetry. Beautiful prose, as always, but for the first time that I can remember with one of McKillip's books, the ending felt rushed. Still eminently worth reading.
Mind candy: a sorceress from Ohio continues (previous installments) her efforts to get out of Texas, but keeps being dragged back to various hells. I am deeply ambivalent about recommending it, however. The best I can do to say why, without spoilers, is that key parts of the book are at once effectively (even viscerally) narrated, and stuff I wish I'd never encountered. Mileage, as they say, varies.
Spoilers: Gur pbasyvpg orgjrra Wrffvr naq gur rcbalzbhf "fjvgpuoynqr tbqqrff" guvf gvzr vaibyirf abg whfg zvaq-tnzrf, nf va gur cerivbhf obbx, ohg ivivqyl qrfpevorq naq uvtuyl frkhnyvmrq obql ubeebe, jvgu Wrffr'f bja ernpgvbaf gb gur ivbyngvbaf bs ure obql naq zvaq orvat irel zhpu n cneg bs obgu gur frkhnyvgl naq gur ubeebe. Senaxyl, gubfr puncgref fdhvpxrq zr gur shpx bhg. V'z cerggl fher gung'f jung gubfr cnegf bs gur obbx jrer vagraqrq gb qb, fb cbvagf sbe rssrpgvir jevgvat, ohg V qvqa'g rawbl vg ng nyy. Cneg bs gung znl or gur pbagenfg gb gur guevyyvat-nqiragher gbar bs gur cerivbhf obbxf, naq rira zbfg bs guvf bar. (Znlor vs V'q tbar va rkcrpgvat ubeebe?)
Mind candy; thriller about the US war in Afghanistan, drawing on the author's experience as a military pilot. I liked it — Young has a knack for effective descriptions in unflashy prose — but I am not sure if that wasn't because it played to some of my less-defensible prejudices. (Sequel to Silent Enemy and The Mullah's Storm, but self-contained.)
Lt. Col. Sir Wolseley Haig (ed.), Cambridge History of India, vol. III: Turks and Afghans
Picked up because I ran across it at the local used bookstore, and it occurred to me I knew next to nothing about what happened in India between the invasions by Mahmud of Ghazni and the conquest by Babur. What I was neglecting was that this was published in 1928...
Given over 500 pages to describe half a millennium of the life of one of the major branches of civilization, do you spend them on the daily life and customs of the people, craft and technology, commerce, science, literature, religion, administration, and the arts? Or do you rather devote it almost exclusively to wars (alternately petty and devastating), palace intrigues, rebuking the long dead for binge drinking, and biologically absurd speculations on the "degeneration" of descendants of central Asians brought on by the climates of India, with a handful of pages that mention Persianate poetry, religion (solely as an excuse for political divisions) and tax farming? Evidently if you were a British historian towards the end of the Raj, aspiring to write the definitive history of India, c. +1000 to c. +1500, the choice was clear.
To be fair, the long final chapter, on monumental Muslim architecture during the period, is informed and informative, though still full of pronouncements about the tastes and capacities of "the Hindu"*, "the Muslim"**, "the Persian"***, "the Arab" (and "Semitic peoples")****, etc. And no doubt there are readers for whom this sort of obsessive recital of politico-military futility is actually useful, and who would appreciate it being told briskly, which it is.
Recommended primarily if you want a depiction of 500 years of aggression and treachery that makes Game of Thrones seem like Jenny and the Cat Club.
*: "In the Indian architect this sense for the decorative was innate; it came to him as a legacy from the pre-Aryan races..."
**: "Elaborate decoration and brightly coloured ornament were at all times dear to the heart of the Muslim."
***: "[Persia's] genius was of the mimetic rather than the creative order, but she possessed a magic gift for absorbing the artistic creations of other countries and refining them to her own standard of perfection."
****: "With the Arabs, who in the beginning of the eighth century possessed themselves of Sind, our concern is small. Like other Semitic peoples they showed but little natural instinct for architecture or the formative arts." Not, "Our concern is small, because few of their works have survived, and they seem to have had little influence on what came later", which would have been perfectly reasonable.
Alexandre B. Tsybakov, Introduction to Nonparametric Estimation
What it says on the label. This short (~200 pp.) book is an introduction to the theory of non-parametric statistical estimation, divided, like Gaul, into three parts.
The first chapter introduces the basic problems considered: estimating a probability density function, estimating a regression function (with fixed and random placement of the input variable), and estimating a function observed through Gaussian noise. (The last of these has applications in signal processing, not discussed, and equivalences to the other problems, treated in detail.) The chapter then introduces the main methods to be used: kernel estimators, local polynomial estimators, and "projection" estimators (i.e., approximating the unknown function by a series expansion in orthogonal functions, especially but not exclusively Fourier expansions). The goal in this chapter is to establish upper bounds on the error of the function estimates, for different notions of error (mean-square at one point, mean-square averaged over space, maximum error, etc.). The emphasis is on finding the asymptotic rate at which these upper bounds go to zero. To achieve this, the text assumes that the unknown function lies in a space of functions which are more or less smooth, and upper-bounds how badly wrong kernels (or whatever) can go on such functions. (If you find yourself skeptically muttering "And how do I know the regression curve lies in a Sobolev $$\mathcal{S}(\beta,L)$$ space[1]?", I would first of all ask you why assuming linearity isn't even worse, and secondly ask you to wait until the third chapter.) A typical rate here would be that the mean-squared error of kernel regression is $$O(n^{-2\beta/(2\beta+1)})$$, where $$\beta > 0$$ is a measure of the smoothness of the function class. While such upper bounds have real value, in reassuring us that we can't be doing too badly, they may leave us worrying that some other estimator, beyond the ones we've considered, would do much better.
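For readers wondering where an exponent like that comes from, here is the usual bias-variance sketch for a kernel estimator with bandwidth $$h$$, evaluated at a single point. The constants $$C_1$$ and $$C_2$$ are illustrative placeholders rather than Tsybakov's notation, and the kernel is assumed to be of high enough order to exploit smoothness $$\beta$$.

```latex
\begin{aligned}
\mathrm{MSE}(h) &= \mathrm{bias}^2 + \mathrm{variance}
  \;\le\; C_1 h^{2\beta} + \frac{C_2}{nh}, \\
\frac{d}{dh}\left( C_1 h^{2\beta} + \frac{C_2}{nh} \right) = 0
  &\;\Longrightarrow\;
  h^{*} = \left( \frac{C_2}{2\beta C_1 n} \right)^{1/(2\beta+1)} \propto n^{-1/(2\beta+1)}, \\
\mathrm{MSE}(h^{*}) &= O\!\left( (h^{*})^{2\beta} \right)
  = O\!\left( n^{-2\beta/(2\beta+1)} \right).
\end{aligned}
```

With $$\beta = 2$$ this gives the familiar $$n^{-4/5}$$. The calculation bounds only what the kernel estimator achieves; it says nothing about whether some other estimator could do better.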
The goal of the second chapter is to alleviate this worry, by establishing lower bounds, and showing that they match the upper bounds found in chapter 1. This is a slightly tricky business. Consider the calibrating macroeconomist Fool[2] who says in his heart "The regression line is $$y = x/1600$$", and sticks to this no matter what the data might be. In general, the Fool has horrible, $$O(1)$$ error --- except when he's right, in which case his error is exactly zero. To avoid such awkwardness, we compare our non-parametric estimators to the minimax error rate, the error which would be obtained by a slightly-imaginary[3] estimator designed to make its error on the worst possible function as small as possible. (What counts as "the worst possible function" depends on the estimator, of course.) The Fool is not the minimax estimator, since his worst-case error is $$O(1)$$, and the upper bounds tell us we could at least get $$O(n^{-2\beta/(2\beta+1)})$$.
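A toy simulation makes the Fool's predicament concrete. This is only a sketch. The two candidate truths, the Gaussian-kernel (Nadaraya-Watson) smoother, the hand-picked bandwidth of 0.05, and the noise level are all assumptions introduced for illustration. Taking the worst case over two functions is a crude stand-in for the supremum over a whole smoothness class.

```python
import math
import random

def fool(x):
    """The Fool's estimate of the regression function: it ignores the data."""
    return x / 1600.0

def kernel_regression(xs, ys, h):
    """Return a Nadaraya-Watson estimator using a Gaussian kernel of bandwidth h."""
    def estimate(x0):
        weights = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
        return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    return estimate

def average_sq_error(estimate, truth, grid):
    """Squared error between estimate and truth, averaged over the grid points."""
    return sum((estimate(x) - truth(x)) ** 2 for x in grid) / len(grid)

def compare_estimators(n=400, noise_sd=0.1, h=0.05, seed=1):
    """Error of each estimator under each hypothetical truth, on one simulated sample."""
    rng = random.Random(seed)
    truths = {
        "fool is right": lambda x: x / 1600.0,
        "sine wave": lambda x: math.sin(2 * math.pi * x),
    }
    grid = [k / 100 for k in range(101)]
    errors = {"fool": {}, "kernel": {}}
    for name, truth in truths.items():
        xs = [rng.random() for _ in range(n)]
        ys = [truth(x) + rng.gauss(0.0, noise_sd) for x in xs]
        smoother = kernel_regression(xs, ys, h)
        errors["fool"][name] = average_sq_error(fool, truth, grid)
        errors["kernel"][name] = average_sq_error(smoother, truth, grid)
    return errors

errors = compare_estimators()
for estimator, by_truth in errors.items():
    print(estimator, "worst case over the two truths:", max(by_truth.values()))
```

The Fool wins outright when he happens to be right, and loses badly otherwise. Comparing worst cases rather than best cases is what removes his advantage.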
To get actual lower bounds, we use the correspondence between estimation and testing. Suppose we can always find two far-apart regression curves no hypothesis test could tell apart reliably. Then the expected estimation error has to be at least the testing error-rate times the distance between those hypotheses. (I'm speaking a little loosely; see the book for details.) To turn it around, if we can estimate functions very precisely, we can use our estimates to reliably test which of various near-by functions are right. Thus, invoking Neyman-Pearson theory, and various measures of distance or divergence between probability distributions, gives us fundamental lower bounds on function estimation. This reasoning can be extended to testing among more than two hypotheses, and to Fano's inequality. There is also an intriguing section, with new-to-me material, on Van Trees's inequality, which bounds Bayes risk[4] in terms of integrated Fisher information.
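The two-hypothesis version of this reduction can be written out in a few lines. What follows is a sketch of the standard two-point argument, in notation chosen here for illustration ($$d$$ a distance between functions, $$s$$ half the separation between the hypotheses, $$\psi$$ a test, $$\mathrm{TV}$$ the total variation distance). The book's own statements and conditions should be preferred to this summary.

```latex
\begin{aligned}
&\text{Suppose } d(f_0, f_1) \ge 2s. \text{ For any estimator } \hat{f}, \text{ let }
  \psi^{*} = \arg\min_{j \in \{0,1\}} d(\hat{f}, f_j). \\
&\text{By the triangle inequality, } \psi^{*} \ne j \text{ implies } d(\hat{f}, f_j) \ge s,
  \text{ so, using Markov's inequality,} \\
&\max_{j} \mathbb{E}_{j}\, d(\hat{f}, f_j)
  \;\ge\; s \max_{j} P_j\!\left( d(\hat{f}, f_j) \ge s \right)
  \;\ge\; s \max_{j} P_j(\psi^{*} \ne j)
  \;\ge\; s \inf_{\psi} \max_{j} P_j(\psi \ne j), \\
&\inf_{\psi} \max_{j} P_j(\psi \ne j) \;\ge\; \frac{1 - \mathrm{TV}(P_0, P_1)}{2}.
\end{aligned}
```

So if the two curves are $$2s$$ apart while the data distributions they generate are close in total variation, no estimator can have worst-case error much below $$s/2$$.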
It will not, I trust, surprise anyone that the lower bounds from Chapter 2 match the upper bounds from Chapter 1.
The rates obtained in Chapters 1 and 2 depend on the smoothness of the true function being estimated, which is unknown. It would be very annoying to have to guess this — and more than annoying to have to guess it right. An "adaptive" estimator, roughly speaking, is one which doesn't have to be told how smooth the function is, but can do (about) as well as one which was told that by an Oracle. The point of chapter 3 is to set up the machinery needed to examine adaptive estimation, and to exhibit some adaptive estimators for particular problems, mostly of the projection-estimator/series-expansion type. Unlike the first two chapters, the text of chapter 3 does not motivate itself very well, but the plot will be clear to experienced readers.
The implied reader has a firm grasp of parametric statistical inference (to the level of, say, Pitman or Casella and Berger) and of Fourier analysis, but in principle no more. There is a lot more about statistical theory than I have included in my quick sketch of the book's contents, such as the material on unbiased risk estimation, efficiency and super-efficiency, etc.; the patient reader could figure this all out from what's in Tsybakov, but either a lot of prior exposure, or a teacher, would help considerably. There is also nothing about data, or practical/computational issues (not even a mention of the curse of dimensionality!). The extensive problem sets at the end of each chapter will help with self-study, but I feel like this is really going to work best as a textbook, which is what it was written for. It would be the basis for a strong one-semester course in advanced statistical theory, or, supplemented with practical exercises (and perhaps with All of Nonparametric Statistics) a first graduate[5] class in non-parametric estimation.
1: As you know, Bob, that's the class of all functions which can be differentiated at least $$\beta$$ times, and where the integral of the squared $$\beta^{\mathrm{th}}$$ derivative is no more than $$L$$. (Oddly, in some places Tsybakov's text has $$\beta-1$$ in place of $$\beta$$, but I think the math always uses the conventional definition.)
2: To be clear, I'm the one introducing the character of the Fool here; Tsybakov is more dignified.
3: I say "slightly imaginary" because we're really taking an infimum over all estimators, and there may not be any estimator which actually attains the infimum. But "infsup" doesn't sound as good as "minimax".
4: Since Bayes risk is integrated over a prior distribution on the unknown function, and minimax risk is the risk at the single worst unknown function, Bayes risk provides a lower bound on minimax risk.
5: For a first undergraduate course in non-parametric estimation, you could use Simonoff's Smoothing Methods in Statistics, or even, if desperate, Advanced Data Analysis from an Elementary Point of View.
Peter J. Diggle and Amanda G. Chetwynd, Statistics and Scientific Method: An Introduction for Students and Researchers
Let me begin with the good things. The book's heart is very much in the right place: instead of presenting statistics as a meaningless collection of rituals, show it as a coherent body of principles, which scientific investigators can use as tools for inquiry. The intended audience is (p. ix) "first-year postgraduate students in science and technology" (i.e., what we'd call first-year graduate students), with "no prior knowledge of statistics", and no "mathematical demands... beyond a willingness to get to grips with mathematical notation... and an understanding of basic algebra". After some introductory material, a toy example of least-squares fitting, and a chapter on general ideas of probability and maximum likelihood estimation, Chapters 4--10 all cover useful statistical topics, all motivated by real data, which is used in the discussion*. The book treats regression modeling, experimental design, and dependent data all on an equal footing. Confidence intervals are emphasized over hypothesis tests, except when there is some substantive reason to want to test specific hypotheses. There is no messing about with commercial statistical software (there is a very brief but good appendix on R), and code and data are given to reproduce everything. Simulation is used to good effect, where older texts would've wasted time on exact calculations. I would much rather see scientists read this than the usual sort of "research methods" boilerplate.
On the negative side: The bit about "scientific method" in the title, chapter 1, chapter 7, and sporadically throughout, is not very good. There is no real attempt to grapple with the literature on methodology — the only philosopher cited is Popper, who gets invoked once, on p. 80. I will permit myself to quote the section where this happens in full.
7.2 Scientific Laws
Scientific laws are expressions of quantitative relationships between variables in nature that have been validated by a combination of observational and experimental evidence.

As with laws in everyday life, accepted scientific laws can be challenged over time as new evidence is acquired. The philosopher Karl Popper summarizes this by emphasizing that science progresses not by proving things, but by disproving them (Popper, 1959, p. 31). To put this another way, a scientific hypothesis must, at least in principle, be falsifiable by experiment (iron is more dense than water), whereas a personal belief need not be (Charlie Parker was a better saxophonist than John Coltrane).

7.3 Turning a Scientific Theory into a Statistical Model...

That sound you hear is pretty much every philosopher of science since Popper and Hempel, crying out from Limbo, "Have we lived and fought in vain?"

Worse: This also has very little to do with what chapter 7 does, which is to fit some regression models relating how much plants grow to how much of the pollutant glyphosate they were exposed to. The book settles on a simple linear model after some totally ad hoc transformations of the variables to make that look more plausible. I am sure that the authors, who are both statisticians of great experience and professional eminence, would not claim that this model is an actual scientific law, but they've written themselves into a corner, where they either have to pretend that it is, or be unable to explain the scientific value of their model. (On the other hand, accounts of scientific method centered on models, e.g., Ronald Giere's, have no particular difficulty here.)
Relatedly, the book curiously neglects issues of power in model-checking. Still with the example of modeling the response of plants to different concentrations of pollutants, section 7.6.8 considers whether to separately model the response depending on whether the plants were watered with distilled or tap water. This amounts to adding an extra parameter, which increases the likelihood, but by a statistically-insignificant amount (p. 97). This ignores, however, the question of whether there is enough data, precisely-enough measured, to notice a difference — i.e., the power to detect effects. Of course, a sufficiently small effect would always be insignificant, but this is why we have confidence intervals, so that we can distinguish between parameters which are precisely known to be near zero, and those about which we know squat. (Actually, using a confidence interval for the difference in slopes would fit better with the general ideas laid out here in chapter 3.) If we're going to talk about scientific method, then we need to talk about ruling out alternatives (as in, e.g., Kitcher), and so about power and severity (as in Mayo).
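The difference between a parameter precisely known to be near zero and one about which we know very little can be shown with two made-up estimates. The numbers below are hypothetical, not taken from the book's data, and the interval uses a plain Normal approximation purely for illustration.

```python
from statistics import NormalDist

def normal_confidence_interval(estimate, std_error, level=0.95):
    """Two-sided confidence interval, assuming a Normal sampling distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (estimate - z * std_error, estimate + z * std_error)

# Hypothetical estimated differences in slope between two watering treatments.
precise = normal_confidence_interval(estimate=0.01, std_error=0.02)
vague = normal_confidence_interval(estimate=0.50, std_error=2.00)

for label, (low, high) in [("precise", precise), ("vague", vague)]:
    significant = not (low <= 0.0 <= high)
    print(f"{label}: [{low:.3f}, {high:.3f}], significant at 5%: {significant}")
```

Neither difference is statistically significant, but only the first interval rules out a large effect. The second is compatible with nearly anything, which is what low power looks like.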
This brings me to lies-told-to-children. Critical values of likelihood ratio tests, under the standard asymptotic assumptions, are given in Table 3.2, for selected confidence levels and numbers of parameters. The reader is not told where these numbers come from ($$\chi^2$$ distributions), so they are given no route to figure out what to do in cases which go beyond the table. What is worse, from my point of view, is that they are given no rationale at all for where the table comes from ($$\chi^2$$ here falls out from Gaussian fluctuations of estimates around the truth, plus a second-order Taylor expansion), or why the likelihood ratio test works as it does, or even a hint that there are situations where the usual asymptotics will not apply. Throughout, confidence intervals and the like are stated based on Gaussian (or, as the book puts it, capital-N "Normal") approximations to sampling distributions, without any indication to the reader as to why this is sound, or when it might fail. (The word "bootstrap" does not appear in the index, and I don't think they use the concept at all.) Despite their good intentions, they are falling back on rituals.
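For readers who want to go beyond a printed table, critical values of this kind can be computed directly from the $$\chi^2$$ distribution. The sketch below uses only the Python standard library: it evaluates the $$\chi^2$$ distribution function through the series for the incomplete gamma function and inverts it by bisection. The function names are made up for this example. Whether the book's Table 3.2 reports these quantiles or half of them (for differences in log-likelihood rather than twice the difference) is not something this sketch assumes.

```python
import math

def chi2_cdf(x, df):
    """Chi-squared CDF, via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_critical_value(level, df):
    """Smallest x with chi2_cdf(x, df) >= level, found by bisection."""
    low, high = 0.0, 1.0
    while chi2_cdf(high, df) < level:
        high *= 2.0
    for _ in range(200):
        mid = (low + high) / 2.0
        if chi2_cdf(mid, df) < level:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

# Critical values for twice the log-likelihood ratio, by number of extra parameters.
for df in (1, 2, 3, 4, 5):
    row = [round(chi2_critical_value(level, df), 3) for level in (0.90, 0.95, 0.99)]
    print(df, row)
```

These are asymptotic values, resting on the Gaussian-fluctuation argument sketched in the paragraph above. They carry no guarantee in small samples or at the boundary of the parameter space.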
Diggle and Chetwynd are both very experienced, both as applied statisticians and as teachers of statistics. They know better in their professional practice. I am sure that they teach their statistics students better. That they don't teach the readers of this book better is a real lost opportunity.
Disclaimer: I may turn my own data analysis notes into a book, which would to some degree compete with this one.
*: For the record: exploratory data analysis and visualization, motivated by gene expression microarrays; experimental design, motivated by agricultural and clinical field trials; comparison of means, motivated by comparing drugs; regression modeling, motivated by experiments on the effects of pollution on plant growth; survival analysis, motivated by kidney dialysis; time series, motivated by weather forecasting; and spatial statistics, motivated by air pollution monitoring.
Geoffrey Grimmett and David Stirzaker, Probability and Random Processes, 3rd edition
This is still my favorite stochastic processes textbook. My copy of the second edition, which has been with me since graduate school, is falling apart, and so I picked up a new copy at JSM, and of course began re-reading on the plane...
It's still great: it strikes a very nice balance between accessibility and mathematical seriousness. (There is just enough shown of measure-theoretic probability that students can see why it will be useful, without overwhelming situations where more elementary methods suffice.) It's extremely sound at focusing on topics which are interesting because they can be connected back to the real world, rather than being self-referential mathematical games. The problems and exercises are abundant and well-constructed, on a wide range of difficulty levels. (They are now available separately, with solutions manual, as One Thousand Exercises in Probability.)
I am very happy to see more in this edition on Monte Carlo and on stochastic calculus. (My disappointment that the latter builds towards the Black-Scholes model is irrational, since they're giving the audience what it wants.) Nothing seems to have been dropped from earlier editions.
It does have limitations. It's a book about the mathematics of probabilistic models, but has little to say about how one designs such a model in the first place. This may be inevitable, since the tools of model-building must change with the subject matter[1]. There is also no systematic account here of statistical inference for stochastic processes, but this is so universal among textbooks on stochastic processes that it's easier to name exceptions[2] than instances. If a fourth edition would fix this, I would regard the book as perfect; instead, it is merely almost perfect.
The implied reader has a firm grasp of calculus (through multidimensional integration) and a little knowledge of linear algebra. They can also read and do proofs. No prior knowledge of probability is, strictly speaking, necessary, though it surely won't hurt. With that background, and the patience to tackle 600 pages of math, I unhesitatingly recommend this as a first book on random processes for advanced undergraduates or beginning graduate students, or for self-study.
1: E.g., tracking stocks and flows of conserved quantities, and making sure they balance, is very useful in physics and chemistry, and even some parts of biology. But it's not very useful in the social sciences, since hardly any social or economic variables of any interest are conserved. (I had never truly appreciated Galbraith's quip that "The process by which banks create money is so simple that the mind is repelled" until I tried to explain to an econophysicist that money is not, in fact, a conserved quantity.) And so on.
2: The best exception I've seen is Peter Guttorp's Stochastic Modeling of Scientific Data. It's a very good introduction to stochastic processes and their inference for an audience who already knows some probability, and statistics for independent data; it also talks about model-building. But it doesn't have the same large view of stochastic processes as Grimmett and Stirzaker's book, or the same clarity of exposition. Behind that, there is Bartlett's Stochastic Processes, though it's now antiquated. From a different tack, Davison's Statistical Models includes a lot on models of dependent data, but doesn't systematically go into the theory of such processes.
