(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

This post is by Phil Price.

I’ve been preparing a review of a new statistics textbook aimed at students and practitioners in the “physical sciences,” as distinct from the social sciences and also distinct from people who intend to take more statistics courses. I figured that since it’s been years since I looked at an intro stats textbook, I should look at a few others and see how they differ from this one, so in addition to the book I’m reviewing I’ve looked at some other textbooks aimed at similar audiences: Milton and Arnold; Hines, Montgomery, Goldsman, and Borror; and a few others. I also looked at the table of contents of several more. There is a lot of overlap in the coverage of these books — they all have discussions of common discrete and continuous distributions, joint distributions, descriptive statistics, parameter estimation, hypothesis testing, linear regression, ANOVA, factorial experimental design, and a few other topics.

I can see how, from a statistician’s point of view, the standard arrangement of topics makes perfect sense; indeed, given that adding anything else to the list necessarily means taking something away — there’s only so much time in an academic year, after all — perhaps people think no other set of topics is even possible. But here’s the thing. In my 20 years as a practicing data analyst/scientist, I have been involved in one way or another with a wide variety of projects and topics, sometimes as a full participant, sometimes just as a stats kibbitzer. A partial list of topics I’ve worked on includes the geographical and statistical distribution of indoor radon in the U.S.; computed tomography of air pollutant concentrations; the airborne transport of biological agents; statistical and spatial distributions of ventilation and ventilation practices in homes and in large commercial buildings; effectiveness of kitchen exhaust hoods; time series models to predict electricity use in large buildings; statistical and causal relationships between vehicle characteristics and fatality rates; performance of very low-cost cookstoves in the developing world; and several other topics, but I’m getting tired of listing them. It’s a pretty big range of topics and a large number of researchers I’ve worked with, so I think I’m qualified to express this opinion: the standard curriculum covered in these books leaves out some of the most important topics — the ones my colleagues (and I) tend to struggle with or that would be most useful to us — and includes several that are effectively useless or may even be harmful if people apply them without full understanding.

(keep reading below the fold)

To go ahead and shoot the largest fish in the barrel, in most of these books there is far too much discussion of hypothesis tests and far too little discussion of what people ought to do instead of hypothesis testing. This is a case where a little knowledge can be a dangerous thing. First, no matter what caveats are put in the books, many people will incorrectly interpret “we cannot reject the hypothesis that A=B” as meaning “we can safely assume that A=B”; after all, if that’s not the point of the test then what IS the point of the test? Second — and this might actually be the more important point — people who know of the existence of hypothesis tests often assume that that’s what they want, which prevents them from pondering what they really _do_ want. To give one example out of many in my own experience: I have worked with a group that is trying to provide small cookstoves to desperately poor people, mostly in Africa, to decrease the amount of wood they need to gather in order to cook their food. The group had cooked a standard meal several times, using the type of wood and the type of pan available to the people of north Sudan, and using cookstoves of different designs, and they wanted to see which cookstove required the least wood on average. They approached me with the request that I help them do a hypothesis test to see whether all of the stoves are equivalent. This is an example of a place where a hypothesis test is not what you want: the stoves couldn’t possibly perform *exactly* equally, so all the test will tell you is whether you have enough statistical power to convincingly demonstrate the difference. What these researchers needed to do was to think about what information they really need in order to choose a stove: how big a difference in performance is of practical importance; how robust are their results (for example, they had done all of their tests using the same type of wood, dried to the same amount, but in actual use the wood type and moisture level would vary widely, so are they sure the best stove with their test wood would be the best stove in reality?); and a whole bunch of other questions. In any case there is usually little point in testing a hypothesis (in this case: that all of the stoves are exactly equal) that you know to be false.
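To make the contrast concrete, here is a minimal sketch of what I would rather see than a test of exact equality: estimate the size of the difference and its uncertainty, then ask whether a difference of that size matters in practice. The numbers below are invented for illustration; they are not real stove measurements.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical wood use (kg per standard meal) for two stove designs.
# These values are made up purely to illustrate the calculation.
stove_a = rng.normal(loc=1.80, scale=0.15, size=10)
stove_b = rng.normal(loc=1.65, scale=0.15, size=10)

# Instead of testing "are the stoves exactly equal?" (they aren't),
# estimate how big the difference is and how uncertain that estimate is.
diff = stove_a.mean() - stove_b.mean()
se = np.sqrt(stove_a.var(ddof=1) / len(stove_a) +
             stove_b.var(ddof=1) / len(stove_b))

print(f"estimated extra wood for stove A: {diff:.2f} kg per meal")
print(f"approximate 95% interval: ({diff - 2*se:.2f}, {diff + 2*se:.2f}) kg")
```

An interval like this invites the right follow-up questions (is the difference big enough to matter? is it robust to wood type and moisture?) in a way that “reject / fail to reject” does not.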

It’s true that in the physical sciences you do occasionally find cases in which a null hypothesis really could be true — the electron and the positron really could have *exactly* the same mass, for example — but in most real-world cases, and literally every case I have actually encountered, the null hypothesis is known at the outset to be false, so a test of hypothetical equality of two quantities is merely a test of statistical power (often without being recognized as such by the people performing the test).

These textbooks (and courses) should eliminate the chapter on hypothesis testing, replacing it with a one-page description of what hypothesis testing is and why it’s less useful than it seems. I’ll admit that hypothesis tests have their place and that it is a pity if students only get a one-page discussion of them, but something has to give.

Once the hypothesis test chapter is gone, it can be replaced by something useful. One thing that is needed (but usually missing) is a chapter on exploratory data analysis, especially including graphics. Graphics are important. In many of the research areas I have listed above (and others not listed), my introduction to the project occurs when a grad student, or sometimes a senior researcher, comes to me with questions about data they have been looking at for weeks. The first thing I ask for is appropriate plots: histograms of x, y, and z; plots of y vs. x; and so on — the details depend on the problem, but I always want to start by looking at the data. Amazingly often, the student has either never plotted the raw data or, at best, has used default plotting procedures (often from Microsoft Excel) to make just a few plots that are inadequate and fail to reveal important features. Often they have simply calculated some summary statistics and not plotted anything at all. (I keep a folder of plots of “Anscombe’s Quartet”, and I give one to anyone who comes to my office with calculations of summary statistics but without plots.)
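For readers who haven’t met Anscombe’s Quartet: it is four small datasets whose summary statistics are nearly identical but whose shapes, when plotted, are wildly different. A quick sketch using the standard published values:

```python
import numpy as np

# Anscombe's Quartet: the standard published data values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

for x, y in [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]:
    xa, ya = np.asarray(x, float), np.asarray(y, float)
    r = np.corrcoef(xa, ya)[0, 1]
    print(f"mean(y)={ya.mean():.2f}  var(y)={ya.var(ddof=1):.2f}  corr={r:.3f}")

# All four lines print essentially the same statistics (mean ~7.50,
# correlation ~0.816); only plotting the data reveals the differences.
```

The summary statistics alone would never tell you that one dataset is a perfect curve, one has a single wild outlier, and so on — which is exactly the point of handing out those plots.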

Another thing missing from the books I’ve looked at is a useful discussion of some common pitfalls of real-world experiments. Many, perhaps most, experimental datasets I’ve encountered have some really undesirable properties, such as the potential for large selection bias, or the inability to distinguish the effect of nuisance covariates from the variables of interest, or simply sources of uncertainty that are far larger than expected. I’ll again illustrate with a real example: an experiment to manipulate ventilation rates and to quantify the effect of ventilation rate on the performance of workers in a “call center.” These workers answer phones to do routine tasks such as scheduling doctor appointments. There are hundreds of workers in the building, and the experiment involved varying the amount of outdoor air they got, ranging from the typical amount to several times that much. A statistics professor had designed a pattern of low, medium, and high ventilation, with the rate varying daily during some periods and weekly during other periods. Average time spent to process a phone call was the main performance metric of interest. It all seemed pretty clean on paper, but in practice it was a mess. Partway through the study, the business introduced a new computer system, which led to an immediate drop in performance that was much larger than the effect of interest could possibly be, with performance gradually improving again as employees learned the system. Additionally, several times large numbers of new workers were added, and again there was a short-term effect on productivity that was large compared to the effect of interest (and the available data recorded only the average call processing times, not the times for individual workers). There were some other problems too. In the end, there wasn’t enough statistical power to see effects of a size that could reasonably have occurred.
This example fits into a broader pattern that is almost a general rule: real data are rarely as clean as the textbook examples. In fact, many of the challenges that are routinely faced by a data analyst are related to coping with inadequacies of data. The nice, clean examples given in most textbooks are very much the exception, not the rule.

Yet another topic that has come up frequently in my experience is numerical simulation, either to directly determine an answer of interest or to confirm an analytical result. An example is error propagation: I have an output that is a complicated function of several inputs, and the values of the inputs are subject to uncertainty. What is the uncertainty in the output? The easiest way to answer this is often to sample from the input distributions and generate the resulting output value; repeat as needed to get a statistical distribution of outputs. Importantly, this approach can be used with any statistical distribution of input parameters (including empirical distributions), not just standard, friendly distributions. On the whole I’d probably prefer that researchers understand the use of this method but don’t know the analytical results for, say, the normal distribution, rather than the other way around. But the other way around is the only thing that’s taught in these books. (Of course it would be better to know both, but again, there’s only so much time available.) Oh, and speaking of errors, a standard weapon in the arsenal is cross-validation, but most of these books don’t seem to cover it, and many don’t even mention it in passing.
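The simulation approach to error propagation is only a few lines of code. Here is a sketch; the function and the input uncertainties below are made up purely for illustration, and the inputs can follow any distribution you like, not just a normal one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo error propagation: push samples of the uncertain inputs
# through the function and look at the distribution of the output.
# The function and the input uncertainties are invented for this example.
def output(a, b, c):
    return a * np.exp(b) / (1.0 + c**2)

n = 100_000
a = rng.normal(2.0, 0.1, n)    # a is roughly normal
b = rng.normal(0.5, 0.05, n)   # so is b
c = rng.uniform(0.9, 1.1, n)   # c need not be normal: any distribution works

y = output(a, b, c)
lo, hi = np.percentile(y, [2.5, 97.5])
print(f"output: {y.mean():.3f} +/- {y.std():.3f}  (95% interval {lo:.3f} to {hi:.3f})")
```

No analytical propagation formula is needed, and swapping in an empirical distribution for any input (e.g. resampling measured values) changes nothing else in the code.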

Overall, I don’t much like the statistics book that I’m reviewing, but I can’t say it’s any worse than the typical stats book aimed at physical scientists and engineers who do not plan to take further statistics courses. But I don’t claim to have anything like an exhaustive knowledge of the state of intro stats textbooks. Are there any books out there that cover exploratory data analysis (including exploratory graphics), and dealing with common problems of real-world data, and other things that I think should be in these books? If not, someone should write one.

I’ll say it here too, ’cause people always forget: this post is by Phil Price.

**Please comment on the article here:** **Statistical Modeling, Causal Inference, and Social Science**