(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)
David Hoaglin writes:
After seeing it cited, I just read your paper in Technometrics. The home radon levels provide an interesting and instructive example.
I [Hoaglin] have a different take on the difficulty of interpreting the estimated coefficient of the county-level basement proportion (gamma-sub-2) on page 434. An important part of the difficulty involves “other things being equal.” That sounds like the widespread interpretation of a regression coefficient as telling how the dependent variable responds to change in that predictor when the other predictors are held constant. Unfortunately, as a general interpretation, that language is oversimplified; it doesn’t reflect how regression actually works. The appropriate general interpretation is that the coefficient tells how the dependent variable responds to change in that predictor after allowing for simultaneous change in the other predictors in the data at hand. Thus, in the county-level regression gamma-sub-2 summarizes the relation of alpha to x-bar after allowing for the contribution of u (the log of the uranium level in the county). What was the relation between the basement proportion and the uranium level? A look at that scatterplot may make it easier to interpret gamma-sub-2.
My reply: This reminds me of the old literature in statistics and psychometrics on partial correlation. Sometimes I think that with all our technical capabilities now, we have lost some of the closeness-to-the-data that existed in earlier methods. Ideally we should be able to have the best of both worlds—complex adaptive models along with graphical and analytical tools for understanding what these models do—but we’re certainly not there yet.
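The partial-regression idea behind this exchange can be checked numerically. The sketch below uses simulated data, not the radon data; the names `alpha`, `xbar`, and `u` only mimic the county-level regression, and every number in it is made up for illustration. It shows that the coefficient of `xbar` in the regression on both predictors equals the slope obtained after the part of `alpha` and `xbar` that is linearly related to `u` has been removed from each.

```python
import numpy as np

# Simulated stand-ins for the county-level quantities (assumed values, not real data).
rng = np.random.default_rng(0)
n = 500
u = rng.normal(size=n)                     # plays the role of log uranium
xbar = 0.6 * u + rng.normal(size=n)        # basement proportion, correlated with u
alpha = 1.0 + 2.0 * u - 1.5 * xbar + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Full regression: alpha on an intercept, u, and xbar.
full = ols(np.column_stack([ones, u, xbar]), alpha)
gamma2_full = full[2]

# Partial regression: residualize alpha and xbar on (intercept, u),
# then take the slope of one set of residuals on the other.
Z = np.column_stack([ones, u])
xbar_resid = xbar - Z @ ols(Z, xbar)
alpha_resid = alpha - Z @ ols(Z, alpha)
gamma2_partial = (xbar_resid @ alpha_resid) / (xbar_resid @ xbar_resid)

print(gamma2_full, gamma2_partial)
```

The two printed numbers agree up to floating-point error, and a scatterplot of `alpha_resid` against `xbar_resid` is one way to stay close to the data while looking at a multiple-regression coefficient.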
David followed up with:
I strongly agree that close contact with the data is often missing, though current computing and graphics should make it easier than it was years ago. Part of the gap must lie in what students are taught to do. It should be possible to overcome that.
In connection with partial correlation and partial regression, Terry Speed’s column in the August IMS Bulletin (attached) is relevant.
I continue to be surprised at the number of textbooks that shortchange students by teaching the “held constant” interpretation of coefficients in multiple regression. Indeed, Section 3.2 of Gelman and Hill (2007) could go farther than it does, by not trying to hold any predictors constant. “Unless the data support it, one usually can’t change one predictor while holding all others constant.” In Data Analysis and Regression (1977) Fred Mosteller and John Tukey devote a chapter to “Woes of Regression Coefficients.”
My reply: As Jennifer and I discuss in our book, regression coefficients can be interpreted in more than one way. Hoaglin writes, “The appropriate general interpretation is that the coefficient tells how the dependent variable responds to change in that predictor after allowing for simultaneous change in the other predictors in the data at hand,” and Speed says something similar. But I don’t actually find that description very helpful because I don’t really know how to interpret the phrase, “allowing for simultaneous change in the other predictors.” If I’m in purely descriptive mode, I prefer to say that, if you’re regressing y on u and v, the coefficient of u is the average difference in y per difference in u, comparing pairs of items that differ in u but are identical in v. (See my paper with Pardoe on average predictive comparisons for more on this idea, including how to define this averaging so that, in a simple linear model, you end up with the usual regression coefficient.) Note three things about my purely descriptive interpretation:
1. It’s all about comparisons, nothing about how a variable “responds to change.” Why? Because, in its most basic form, regression tells you nothing at all about change. It’s a structured way of computing average comparisons in data.
2. We are comparing items that differ in u but are identical in v. Nothing about v being held constant or “clamped” (to use Terry’s term).
3. For sparse or continuous data, you can’t really find these comparisons where v is identical, so it’s clear that regression coefficients are model-based. In that sense, I don’t mind vague statements such as “allowing for simultaneous change in the other predictors.” I’d prefer the term “comparison” rather than “change,” but the real point is that regression coefficients represent averages in a sort of smoothed comparison, a particular smoothing based on a linear model.
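The comparison interpretation can be made concrete in a case where exact matches on v exist. The sketch below is only an illustration under stated assumptions: the data are simulated, v takes four values with equal numbers of items at each, v enters the regression as indicator variables, and each pair's difference ratio is weighted by its squared difference in u. That weighting is my choice for this example and is not necessarily the one used in the paper on average predictive comparisons.

```python
import numpy as np

# Simulated data (assumed values): v is discrete with equal-sized groups.
rng = np.random.default_rng(1)
levels, per_level = 4, 50
v = np.repeat(np.arange(levels), per_level)
u = 0.5 * v + rng.normal(size=v.size)
y = 1.0 + 2.0 * u - 1.5 * v + rng.normal(size=v.size)

# Regression of y on u plus one indicator per level of v.
X = np.column_stack([u] + [(v == k).astype(float) for k in range(levels)])
coef_u = np.linalg.lstsq(X, y, rcond=None)[0][0]

# Average comparison over all pairs of items that share the same v.
# Each pair contributes (y_i - y_j) / (u_i - u_j) with weight (u_i - u_j)^2,
# so the weighted average is sum(du * dy) / sum(du^2).
num = 0.0
den = 0.0
for k in range(levels):
    uk, yk = u[v == k], y[v == k]
    du = uk[:, None] - uk[None, :]   # all pairwise differences in u within the level
    dy = yk[:, None] - yk[None, :]   # matching pairwise differences in y
    num += (du * dy).sum()
    den += (du ** 2).sum()
pairwise_avg = num / den

print(coef_u, pairwise_avg)
```

With balanced groups the two numbers coincide up to floating-point error. Nothing is changed or held constant anywhere in the calculation; it is a weighted average of comparisons between items that happen to share the same v.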
I followed up by reading a second article by Terry on linear regression. This article too was interesting while offering points for me to disagree with, or at least to elaborate on. Terry writes:
Why do we run [multiple regression]? . . . To summarize. To predict. To estimate a parameter. To attempt a causal analysis. To find a model. I hope it is clear that these are different reasons.
I actually don’t think these are so different. More in a bit, but first another quote from Terry:
Think of the world of difference between using a regression model for prediction and using one for estimating a parameter with a causal interpretation, for example, the effect of class size on school children’s test scores. With prediction, we don’t need our relationship to be causal, but we do need to be concerned with the relation between our training and our test set. If we have reason to think that our future test set may differ from our past training set in unknown ways, nothing, including cross-validation, will save us. When estimating the causal parameter, we do need to ask whether the children were randomly assigned to classes of different sizes, and if not, we need to find a way to deal with possible selection bias. If we have not measured suitable covariates on our children, we may not be able to adjust for any bias.
Terry seems unaware of the potential-outcome framing of causal inference, in which causal estimands are defined in terms of various hypothetical scenarios. In that approach, causal estimation is in fact a special case of prediction. To put it another way, Speed’s “relation between our training and our test set” and his “possible selection bias” are just two special cases of the requirement that a model generalize to predictions of interest.
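A minimal sketch of "causal estimation as prediction," under strong assumptions: the data are simulated, the treatment `z` is randomly assigned, and a linear model with an interaction is taken to be adequate. All names and numbers are hypothetical. The fitted model is used to predict each item's outcome under both hypothetical scenarios (z = 1 and z = 0), and the estimate is the average difference between the two sets of predictions.

```python
import numpy as np

# Simulated data (assumed values): randomized binary treatment z, covariate x.
rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)
z = rng.integers(0, 2, size=n).astype(float)
y = 1.0 + 0.5 * x + 2.0 * z + 0.8 * z * x + rng.normal(size=n)

def design(z, x):
    """Design matrix with intercept, x, z, and the z-by-x interaction."""
    return np.column_stack([np.ones_like(x), x, z, z * x])

beta = np.linalg.lstsq(design(z, x), y, rcond=None)[0]

# Predict both hypothetical scenarios for every item, then compare.
y1_hat = design(np.ones(n), x) @ beta    # predicted outcome if everyone had z = 1
y0_hat = design(np.zeros(n), x) @ beta   # predicted outcome if everyone had z = 0
ate_hat = np.mean(y1_hat - y0_hat)

print(ate_hat)
```

Whether `ate_hat` deserves a causal reading depends on whether the model generalizes to the scenarios being predicted, which is exactly where random assignment, measured covariates, and selection bias come in.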
Terry also writes:
I would like to see multiple regression taught as a series of case studies, each study addressing a sharp question, and focussing on those aspects of the topic that are relevant to that question.
I doubt Terry’s seen my book with Jennifer Hill, but actually we pretty much do what he recommends. So I recommend he take a look at our book! I’m sure we don’t do everything just how he’d like but it could be a useful start for the next time he teaches the subject.