(This article was originally published at Three-Toed Sloth, and syndicated at StatsBlogs.)

*Attention conservation notice*: I have no taste.

- M. S. Bartlett, Stochastic Population Models in Ecology and Epidemiology
- Short (< 100 pp.) introduction to the topic as of 1960, and so inevitably
mostly of historical interest. It was aimed at people who already knew some
statistics and probability, but no biology to speak of. The models: correlated
spread of plants (or colonies) around Poisson-distributed centers; birth-death
models of population dynamics; Lotka-Volterra style competition between
species; susceptible-infected-removed models of epidemics, with and without
spatial structure. (These are, of course, still staples of the field...) All
models are presented generatively, developed from plausible, though explicitly
highly simplified, premises about how organisms behave. Bartlett makes a lot
of use of generating-functionology, and sometimes heroic approximations to get
closed-form expressions --- but he also introduces his readers to Monte Carlo,
and one gets the impression that he uses this as much as he could afford to.
Nicest of all, he takes pains to connect everything to *real data*.
- Despite its technical obsolescence, I plan to rip it off shamelessly for examples when next I teach stochastic processes, or complexity and inference.
- Patricia A. McKillip, The Bards of Bone Plain
- Two interlocking stories about bards, far-separated in time, searching for secrets in poetry. Beautiful prose, as always, but for the first time that I can remember with one of McKillip's books, the ending felt rushed. Still eminently worth reading.
- Lucy A. Snyder, Switchblade Goddess
- Mind candy: a sorceress from Ohio continues (previous installments) her efforts to get out of Texas, but keeps being dragged back to various hells. I am deeply ambivalent about recommending it, however. The best I can do to say why, without spoilers, is that key parts of the book are at once effectively (even viscerally) narrated, and stuff I wish I'd never encountered. Mileage, as they say, varies.
- Spoilers: Gur pbasyvpg orgjrra Wrffvr naq gur rcbalzbhf "fjvgpuoynqr tbqqrff" guvf gvzr vaibyirf abg whfg zvaq-tnzrf, nf va gur cerivbhf obbx, ohg ivivqyl qrfpevorq naq uvtuyl frkhnyvmrq obql ubeebe, jvgu Wrffr'f bja ernpgvbaf gb gur ivbyngvbaf bs ure obql naq zvaq orvat irel zhpu n cneg bs obgu gur frkhnyvgl naq gur ubeebe. Senaxyl, gubfr puncgref fdhvpxrq zr gur shpx bhg. V'z cerggl fher gung'f jung gubfr cnegf bs gur obbx jrer vagraqrq gb qb, fb cbvagf sbe rssrpgvir jevgvat, ohg V qvqa'g rawbl vg ng nyy. Cneg bs gung znl or gur pbagenfg gb gur guevyyvat-nqiragher gbar bs gur cerivbhf obbxf, naq rira zbfg bs guvf bar. (Znlor vs V'q tbar va rkcrpgvat ubeebe?)
- Thomas W. Young, The Renegades
- Mind candy; thriller about the US war in Afghanistan, drawing on the author's experience as a military pilot. I liked it — Young has a knack for effective descriptions in unflashy prose — but I am not sure if that wasn't because it played to some of my less-defensible prejudices. (Sequel to Silent Enemy and The Mullah's Storm, but self-contained.)
- Lt. Col. Sir Wolseley Haig (ed.), Cambridge History of India, vol. III: Turks and Afghans
- Picked up because I ran across it at
the local used
bookstore, and it occurred to me I knew next to nothing about what happened
in India between the invasions
by Mahmud of Ghazni
and the conquest by Babur.
What I was neglecting was that this was published in *1928*...
- Given over 500 pages to describe half a millennium of the life of one of the
major branches of civilization, do you spend them on the daily life and customs
of the people, craft and technology, commerce, science, literature, religion,
administration, and the arts? Or do you rather devote it almost exclusively to
wars (alternately petty and devastating), palace intrigues, rebuking the long
dead for binge drinking, and biologically absurd speculations on the
"degeneration" of descendants of central Asians brought on by the climates of
India, with a handful of pages that mention Persianate poetry, religion (solely
as an excuse for political divisions) and tax farming? Evidently if you were a
British historian towards the end of the Raj, aspiring to write the definitive
history of India, *c*. +1000 to *c*. +1500, the choice was clear.
- To be fair, the long final chapter, on monumental Muslim architecture during the period, is informed and informative, though still full of pronouncements about the tastes and capacities of "the Hindu"*, "the Muslim"**, "the Persian"***, "the Arab" (and "Semitic peoples")****, etc. And no doubt there are readers for whom this sort of obsessive recital of politico-military futility is actually useful, and who would appreciate its being told briskly, which it is.
- Recommended primarily if you want a depiction of 500 years of aggression and treachery that makes Game of Thrones seem like Jenny and the Cat Club.
- *: "In the Indian architect this sense for the decorative was innate; it came to him as a legacy from the pre-Aryan races..."
- **: "Elaborate decoration and brightly coloured ornament were at all times dear to the heart of the Muslim."
- ***: "[Persia's] genius was of the mimetic rather than the creative order, but she possessed a magic gift for absorbing the artistic creations of other countries and refining them to her own standard of perfection."
- ****: "With the Arabs, who in the beginning of the
eighth century possessed themselves of Sind, our concern is small. Like other
Semitic peoples they showed but little natural instinct for architecture or the
formative arts." Not, "Our concern is small, because few of their works have
survived, and they seem to have had little influence on what came later", which
would have been *perfectly reasonable*.
- Alexandre B. Tsybakov, Introduction to Nonparametric Estimation
- What it says on the label. This short (~200 pp.) book is an introduction to the theory of non-parametric statistical estimation, divided, like Gaul, into three parts.
- The first chapter introduces the basic problems considered: estimating a
probability density function, estimating a regression function (with fixed and
random placement of the input variable), and estimating a function observed
through Gaussian noise. (The last of these has applications in signal
processing, not discussed, and equivalences to the other problems, treated in
detail.) The chapter then introduces the main methods to be used: kernel
estimators, local polynomial estimators, and "projection" estimators (i.e., approximating the
unknown function by a series expansion in orthogonal functions, especially but
not exclusively Fourier expansions). The goal in this case is to establish
upper bounds on the error of the function estimates, for different notions of
error (mean-square at one point, mean-square averaged over space, maximum
error, etc.). The emphasis is on finding the asymptotic rate at which these
upper bounds go to zero. To achieve this, the text assumes that the unknown
function lies in a space of functions which are more or less smooth, and
upper-bounds how badly wrong kernels (or whatever) can go on such functions.
(If you find yourself skeptically muttering "And how do I know the regression
curve lies in a Sobolev \( \mathcal{S}(\beta,L) \) space^{1}?", I would first of all ask you why assuming linearity isn't even worse, and secondly ask you to wait until the third chapter.) A typical rate here would be that the mean-squared error of kernel regression is \( O(n^{-2\beta/(2\beta+1)}) \), where \( \beta > 0 \) is a measure of the smoothness of the function class. While such upper bounds have real value, in reassuring us that we can't be doing too badly, they may leave us worrying that some other estimator, beyond the ones we've considered, would do much better.
- The goal of the second chapter is to alleviate this worry, by establishing
lower bounds, and showing that they match the upper bounds found in chapter 1.
This is a slightly tricky business. Consider the ~~calibrating macroeconomist~~ Fool^{2} who says in his heart "The regression line is \( y = x/1600 \)", and sticks to this no matter what the data might be. In general, the Fool has horrible, \( O(1) \) error --- except when he's right, in which case his error is exactly zero. To avoid such awkwardness, we compare our non-parametric estimators to the *minimax* error rate, the error which would be obtained by a slightly-imaginary^{3} estimator designed to make its error on the worst possible function as small as possible. (What counts as "the worst possible function" depends on the estimator, of course.) The Fool is not the minimax estimator, since his worst-case error is \( O(1) \), and the upper bounds tell us we could at least get \( O(n^{-2\beta/(2\beta+1)}) \).
- To get actual lower bounds, we use the correspondence between estimation
and testing. Suppose we can always find two far-apart regression curves no
hypothesis test could tell apart reliably. Then the expected estimation error
has to be at least the testing error-rate times the distance between those
hypotheses. (I'm speaking a little loosely; see the book for details.) To
turn it around, if we can estimate functions very precisely, we can use our
estimates to reliably test which of various near-by functions are right. Thus,
invoking Neyman-Pearson theory, and various measures of
distance or divergence between probability distributions, gives us fundamental
lower bounds on function estimation. This reasoning can be extended to testing
among more than two hypotheses, and
to Fano's
inequality. There is also an intriguing section, with new-to-me material,
on Van Trees's
inequality, which bounds Bayes risk^{4} in terms of integrated Fisher information.
- It will not, I trust, surprise anyone that the lower bounds from Chapter 2 match the upper bounds from Chapter 1.
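The contrast between the Fool and a rate-achieving estimator can be made concrete with a small simulation. What follows is a minimal sketch, not anything taken from Tsybakov: the "true" curve \( \sin(2\pi x) \), the noise level, the smoothness \( \beta = 2 \), the bandwidth constant, and all function names are assumptions chosen for illustration. It compares the average squared error of a Nadaraya-Watson kernel regression, with bandwidth shrinking like \( n^{-1/(2\beta+1)} \), against the Fool's fixed line \( y = x/1600 \).

```python
import math
import random

SIGMA = 0.3                                  # noise standard deviation (assumed)
GRID = [0.1 + 0.02 * i for i in range(41)]   # interior evaluation points in [0.1, 0.9]

def true_f(x):
    """Hypothetical 'true' regression curve; the estimators never see it directly."""
    return math.sin(2 * math.pi * x)

def fool(x):
    """The Fool's estimate: y = x/1600, whatever the data say."""
    return x / 1600.0

def kernel_regression(xs, ys, x0, h):
    """Nadaraya-Watson estimate at x0, Gaussian kernel with bandwidth h."""
    num = den = 0.0
    for xi, yi in zip(xs, ys):
        w = math.exp(-0.5 * ((x0 - xi) / h) ** 2)
        num += w * yi
        den += w
    return num / den

def avg_sq_error(estimate):
    """Squared error against true_f, averaged over GRID."""
    return sum((estimate(x) - true_f(x)) ** 2 for x in GRID) / len(GRID)

def kernel_error(n, rng, beta=2, c=0.2):
    """Error of kernel regression on one simulated sample of size n."""
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, SIGMA) for x in xs]
    h = c * n ** (-1.0 / (2 * beta + 1))     # bandwidth shrinking at the usual rate
    return avg_sq_error(lambda x0: kernel_regression(xs, ys, x0, h))

def run_comparison(sizes=(100, 1600), reps=5, seed=0):
    """Average kernel error over `reps` samples per size, next to the Fool's error."""
    rng = random.Random(seed)
    kernel = {n: sum(kernel_error(n, rng) for _ in range(reps)) / reps
              for n in sizes}
    return {"fool": avg_sq_error(fool), "kernel": kernel}

if __name__ == "__main__":
    print(run_comparison())
```

The Fool's error does not depend on the sample size at all, while the kernel estimator's error should shrink as \( n \) grows. Changing `true_f` to `fool` reproduces the case where the Fool happens to be right and his error is exactly zero.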
- The rates obtained in Chapters 1 and 2 depend on the smoothness of the true
function being estimated, which is unknown. It would be very annoying to have
to *guess* this, and more than annoying to have to guess it right. An "adaptive" estimator, roughly speaking, is one which doesn't have to be told how smooth the function is, but can do (about) as well as one which was told that by an Oracle. The point of chapter 3 is to set up the machinery needed to examine adaptive estimation, and to exhibit some adaptive estimators for particular problems, mostly of the projection-estimator/series-expansion type. Unlike the first two chapters, the text of chapter 3 does not motivate itself very well, but the plot will be clear to experienced readers.
- The implied reader has a firm grasp of parametric statistical inference (to
the level of, say, Pitman or
Casella
and Berger) and of Fourier analysis, but in principle no more. There is a
lot more about statistical theory than I have included in my quick sketch of
the book's contents, such as the material on unbiased risk estimation,
efficiency and super-efficiency, etc.; the *patient* reader could figure this all out from what's in Tsybakov, but either a lot of prior exposure, or a teacher, would help considerably. There is also nothing about data, or practical/computational issues (not even a mention of the curse of dimensionality!). The extensive problem sets at the end of each chapter will help with self-study, but I feel like this is really going to work best as a textbook, which is what it was written for. It would be the basis for a strong one-semester course in advanced statistical theory, or, supplemented with practical exercises (and perhaps with All of Nonparametric Statistics), a first graduate^{5} class in non-parametric estimation.
- 1: As
you know, Bob, that's the class of all functions which can be
differentiated at least \( \beta \) times, and where the integral of the
squared \( \beta^{\mathrm{th}} \) derivative is no more than \( L \). (Oddly,
in some places Tsybakov's text has \( \beta-1 \) in place of \( \beta \), but I
think the *math* always uses the conventional definition.) ^
- 2: To be clear, I'm the one introducing the character of the Fool here; Tsybakov is more dignified. ^
- 3: I say "slightly imaginary" because we're really taking an infimum over all estimators, and there may not be any estimator which actually attains the infimum. But "infsup" doesn't sound as good as "minimax". ^
- 4: Since Bayes risk is
integrated over a prior distribution on the unknown function, and minimax risk
is the risk at *the single worst* unknown function, Bayes risk provides a lower bound on minimax risk. ^
- 5: For a first undergraduate course in non-parametric estimation, you could use Simonoff's Smoothing Methods in Statistics, or even, if desperate, Advanced Data Analysis from an Elementary Point of View. ^
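Footnotes 3 and 4, and the testing argument from chapter 2, can be put in symbols. This is a sketch in notation chosen here for illustration (not necessarily Tsybakov's): \( \mathcal{F} \) is the function class, \( d \) a distance between functions, \( \hat{f}_n \) ranges over all estimators, \( \pi \) is any prior on \( \mathcal{F} \), and \( \psi \) ranges over all tests taking values in \( \{0,1\} \). The minimax risk is the inf-sup

\[
\mathcal{R}_n^{*} = \inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} \mathbb{E}_f\left[ d^2(\hat{f}_n, f) \right] .
\]

Since a maximum is never smaller than an average, Bayes risk bounds it from below:

\[
\mathcal{R}_n^{*} \geq \inf_{\hat{f}_n} \int_{\mathcal{F}} \mathbb{E}_f\left[ d^2(\hat{f}_n, f) \right] \pi(df) .
\]

For the two-hypothesis argument, suppose \( f_0, f_1 \in \mathcal{F} \) with \( d(f_0, f_1) \geq 2s \). Markov's inequality, plus the test which picks whichever \( f_j \) is closer to \( \hat{f}_n \), gives

\[
\mathcal{R}_n^{*} \geq s^2 \, \inf_{\psi} \max_{j \in \{0,1\}} \mathbb{P}_{f_j}\left( \psi \neq j \right) ,
\]

so if no test can tell the two curves apart reliably, no estimator can have error much smaller than \( s^2 \).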
- Peter J. Diggle and Amanda G. Chetwynd, Statistics and Scientific Method: An Introduction for Students and Researchers
- I have mixed feelings about this.
- Let me begin with the good things. The book's heart is very much in the
right place: instead of presenting statistics as a meaningless collection of
rituals, show it as a coherent body of principles, which scientific
investigators can use as tools for inquiry. The intended audience is (p. ix)
"first-year postgraduate students in science and technology" (i.e., what we'd
call first-year graduate students), with "no prior knowledge of statistics",
and no "mathematical demands... beyond a willingness to get to grips with
mathematical notation... and an understanding of basic algebra". After some
introductory material, a toy example of least-squares fitting, and a chapter on
general ideas of probability and maximum likelihood estimation, Chapters 4--10
all cover useful statistical topics, all motivated by real data, which is used
in the discussion*. The book treats regression modeling, experimental design,
and dependent data all on an equal footing. Confidence intervals are
emphasized over hypothesis tests, except when there is some substantive reason
to want to test specific hypotheses. There is no messing about with commercial
statistical software (there is a very brief but good appendix on R), and code
and data are given to reproduce everything. Simulation is used to good effect,
where older texts would've wasted time on exact calculations. I would *much* rather see scientists read this than the usual sort of "research methods" boilerplate.
- On the negative side: The bit about "scientific method" in the title,
chapter 1, chapter 7, and sporadically throughout, is not very good. There is
no real attempt to grapple with the literature on methodology — the only
philosopher cited is Popper, who gets invoked once, on p. 80. I will permit
myself to quote the section where this happens in full.
**7.2 Scientific Laws**

Scientific laws are expressions of quantitative relationships between variables in nature that have been validated by a combination of observational and experimental evidence. As with laws in everyday life, accepted scientific laws can be challenged over time as new evidence is acquired. The philosopher Karl Popper summarizes this by emphasizing that science progresses not by proving things, but by disproving them (Popper, 1959, p. 31). To put this another way, a scientific hypothesis must, at least in principle, be falsifiable by experiment (iron is more dense than water), whereas a personal belief need not be (Charlie Parker was a better saxophonist than John Coltrane).

**7.3 Turning a Scientific Theory into a Statistical Model**...

That sound you hear is pretty much every philosopher of science since Popper and Hempel, crying out from Limbo, "Have we lived and fought in vain?"

- Worse: This has also got *very little* to do with what chapter 7 does, which is fit some regression models relating how much plants grow to how much of the pollutant glyphosate they were exposed to. The book settles on a simple linear model after some totally *ad hoc* transformations of the variables to make that look more plausible. I am sure that the authors (who are both statisticians of great experience and professional eminence) would not claim that this model is an actual scientific law, but they've written themselves into a corner, where they either have to pretend that it is, or be unable to explain the *scientific* value of their model. (On the other hand, accounts of scientific method centered on models, e.g., Ronald Giere's, have no particular difficulty here.)
- Relatedly, the book curiously neglects issues of power in model-checking.
Still with the example of modeling the response of plants to different
concentrations of pollutants, section 7.6.8 considers whether to separately
model the response depending on whether the plants were watered with distilled
or tap water. This amounts to adding an extra parameter, which increases the
likelihood, but by a statistically-insignificant amount (p. 97). This ignores,
however, the question of whether there is enough data, precisely-enough
measured, to *notice* a difference, i.e., the power to detect effects. Of course, a sufficiently small effect would always be insignificant, but this is why we have confidence intervals, so that we can distinguish between parameters which are precisely known to be near zero, and those about which we know squat. (Actually, using a confidence interval for the difference in slopes would fit better with the general ideas laid out here in chapter 3.) If we're going to talk about scientific method, then we need to talk about ruling out alternatives (as in, e.g., Kitcher), and so about power and severity (as in Mayo).
- This brings me to lies-told-to-children.
Critical values of likelihood ratio tests, under the standard asymptotic
assumptions, are given in Table 3.2, for selected confidence levels and numbers
of parameters. The reader is not told where these numbers come from (\( \chi^2 \)
distributions), so they are given no route to figure out what to do in cases
which go beyond the table. What is worse, from my point of view, is that they
are given no rationale *at all* for where the table comes from (\( \chi^2 \) here falls out from Gaussian fluctuations of estimates around the truth, plus a second-order Taylor expansion), or why the likelihood ratio test works as it does, or even a hint that there are situations where the usual asymptotics will *not* apply. Throughout, confidence intervals and the like are stated based on Gaussian (or, as the book puts it, capital-N "Normal") approximations to sampling distributions, without any indication to the reader as to why this is sound, or when it might fail. (The word "bootstrap" does not appear in the index, and I don't think they use the concept at all.) Despite their good intentions, they are falling back on rituals.
- Diggle and Chetwynd are both very experienced, both as applied statisticians and as teachers of statistics. They know better in their professional practice. I am sure that they teach their statistics students better. That they don't teach the readers of this book better is a real lost opportunity.
*Disclaimer*: I may turn my own data analysis notes into a book, which would to some degree compete with this one.
- *: For the record: exploratory data analysis and visualization, motivated by gene expression microarrays; experimental design, motivated by agricultural and clinical field trials; comparison of means, motivated by comparing drugs; regression modeling, motivated by experiments on the effects of pollution on plant growth; survival analysis, motivated by kidney dialysis; time series, motivated by weather forecasting; and spatial statistics, motivated by air pollution monitoring.
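On the route beyond a table of likelihood-ratio critical values: under the standard asymptotic assumptions, twice the log-likelihood ratio is approximately \( \chi^2 \)-distributed, with degrees of freedom equal to the number of extra parameters, so the critical values are just \( \chi^2 \) quantiles. The following is a minimal sketch of computing them with nothing but the standard library. The function names are hypothetical, and it is an assumption that the book's Table 3.2 is on the scale of twice the log-likelihood ratio; if it tabulates the raw log-likelihood difference instead, halve the values.

```python
import math

def chi2_cdf(q, df):
    """CDF of a chi-squared variable with df degrees of freedom, via the
    power series for the regularized lower incomplete gamma function."""
    if q <= 0:
        return 0.0
    a, x = df / 2.0, q / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def chi2_quantile(p, df):
    """Invert chi2_cdf by bisection; p must lie strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:      # expand until the quantile is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def critical_value_table(levels=(0.90, 0.95, 0.99), max_params=4):
    """Critical values for twice the log-likelihood ratio, indexed by
    (confidence level, number of extra parameters)."""
    return {(level, k): chi2_quantile(level, k)
            for level in levels
            for k in range(1, max_params + 1)}

if __name__ == "__main__":
    for key, value in sorted(critical_value_table().items()):
        print(key, round(value, 3))
```

Because the quantile function takes any level and any number of parameters, it covers cases that go beyond a printed table. It does nothing, of course, for the situations where the usual asymptotics fail.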
- Geoffrey Grimmett and David Stirzaker, Probability and Random Processes, 3rd edition
- This is still my favorite stochastic processes textbook. My copy of the second edition, which has been with me since graduate school, is falling apart, and so I picked up a new copy at JSM, and of course began re-reading on the plane...
- It's still great: it strikes a very nice balance between accessibility and
mathematical seriousness. (There is *just enough* shown of measure-theoretic probability that students can see why it will be useful, without overwhelming situations where more elementary methods suffice.) It's extremely sound at focusing on topics which are interesting because they can be connected back to the real world, rather than being self-referential mathematical games. The problems and exercises are abundant and well-constructed, on a wide range of difficulty levels. (They are now available separately, with solutions manual, as One Thousand Exercises in Probability.)
- I am very happy to see more in this edition on Monte Carlo and on stochastic calculus. (My disappointment that the latter builds towards the Black-Scholes model is irrational, since they're giving the audience what it wants.) Nothing seems to have been dropped from earlier editions.
- It does have limitations. It's a book about the mathematics of
probabilistic models, but has little to say about how one designs such a model
in the first place. This may be inevitable, since the tools of model-building
must change with the subject matter^{1}. There is also no systematic account here of statistical inference for stochastic processes, but this is so universal among textbooks on stochastic processes that it's easier to name exceptions^{2} than instances. If a fourth edition would fix this, I would regard the book as perfect; instead, it is merely almost perfect.
- The implied reader has a firm grasp of calculus (through multidimensional integration) and a little knowledge of linear algebra. They can also read and do proofs. No prior knowledge of probability is, strictly speaking, necessary, though it surely won't hurt. With that background, and the patience to tackle 600 pages of math, I unhesitatingly recommend this as a first book on random processes for advanced undergraduates or beginning graduate students, or for self-study.
- 1: E.g., tracking stocks and flows of conserved quantities, and making sure they balance, is very useful in physics and chemistry, and even some parts of biology. But it's not very useful in the social sciences, since hardly any social or economic variables of any interest are conserved. (I had never truly appreciated Galbraith's quip that "The process by which banks create money is so simple that the mind is repelled" until I tried to explain to an econophysicist that money is not, in fact, a conserved quantity.) And so on. ^
- 2: The best exception I've seen is Peter Guttorp's Stochastic Modeling of Scientific Data. It's a very good introduction to stochastic processes and their inference for an audience who already knows some probability, and statistics for independent data; it also talks about model-building. But it doesn't have the same large view of stochastic processes as Grimmett and Stirzaker's book, or the same clarity of exposition. Behind that, there is Bartlett's Stochastic Processes, though it's now antiquated. From a different tack, Davison's Statistical Models includes a lot on models of dependent data, but doesn't systematically go into the theory of such processes. ^

Books to Read While the Algae Grow in Your Fur; Enigmas of Chance; Scientifiction and Fantastica; Afghanistan and Central Asia; Biology; Writing for Antiquity
