(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)
As I’ve written here many times, my experiences in social science and public health research have left me skeptical of statistical methods that hypothesize or try to detect zero relationships between observational data (see, for example, the discussion starting at the bottom of page 960 in my review of causal inference in the American Journal of Sociology). In short, I have a taste for continuous rather than discrete models.
As discussed in the above-linked article (with respect to the writings of cognitive scientist Steven Sloman), I think that common-sense thinking about causal inference can often mislead.
In many cases, I have found that the theoretical frameworks of instrumental variables and potential outcomes (for a review see, for example, chapters 9 and 10 of my book with Jennifer) help clarify my thinking.
Here is an example that came up in a recent blog discussion. Computer science student Elias Bareinboim gave the following example: “suppose we know nothing about the world, except that one causal link is missing (e.g., skin color does not affect intellectual capacity).” Bareinboim describes this as a “transparent set of assumptions” but to me it’s not transparent at all. That’s ok, that’s my problem not his. But to resolve my problem, I’ll bring out my tools to understand it.
What does it mean to say that “skin color does not affect . . .”? I have to imagine an alteration of skin color. There are different ways to do this. For example, I could go to the beach and get a tan. This could well negatively affect my cognitive skills (we can call this the Jersey Shore theory). Or maybe at conception you could switch some of my genes around. Assuming this sort of manipulation were technically possible, it would change things about me other than skin color. That’s ok. Similarly, tanning has effects other than changing my skin: it also puts me at the beach (or the tanning salon) rather than in the library where I might be improving my intelligence.
All of this is the “instrumental variables” way of thinking about the world. If you want to understand the effect of some observed condition X on an outcome Y, you manipulate some instrument I that affects X, then you look at the effects of I on X and on Y.
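To make the recipe concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the data-generating process, the coefficient values, and the function names (simulate, wald_estimate, naive_slope) are all hypothetical. It assumes a randomly assigned binary instrument I that affects Y only through X, which is a strong condition that real instruments often fail to meet. Under that assumption, dividing the effect of I on Y by the effect of I on X (the Wald ratio) should land near the effect of X on Y built into the simulation, while a naive regression of Y on X is thrown off by an unobserved confounder.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=2.0, seed=0):
    """Generate (I, X, Y) rows from a made-up world with a hidden confounder."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)        # unobserved confounder, affects both X and Y
        i = rng.randint(0, 1)      # instrument, assigned at random
        x = 1.0 * i + u + rng.gauss(0, 1)
        # Assumption: I does not appear here, so it reaches Y only through X.
        y = true_effect * x + 3.0 * u + rng.gauss(0, 1)
        rows.append((i, x, y))
    return rows


def diff_in_means(rows, col):
    """Effect of the instrument on column `col` (1 = X, 2 = Y)."""
    treated = [r[col] for r in rows if r[0] == 1]
    control = [r[col] for r in rows if r[0] == 0]
    return mean(treated) - mean(control)


def wald_estimate(rows):
    """Ratio of the effect of I on Y to the effect of I on X."""
    return diff_in_means(rows, 2) / diff_in_means(rows, 1)


def naive_slope(rows):
    """Least-squares slope of Y on X, ignoring the instrument."""
    xs = [r[1] for r in rows]
    ys = [r[2] for r in rows]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


rows = simulate()
iv = wald_estimate(rows)
naive = naive_slope(rows)
print(f"effect of I on X: {diff_in_means(rows, 1):.2f}")
print(f"effect of I on Y: {diff_in_means(rows, 2):.2f}")
print(f"Wald ratio: {iv:.2f}   naive slope: {naive:.2f}   built-in effect: 2.00")
```

The two differences in means are the quantities the paragraph above describes. Taking their ratio is one common way to combine them, and it is only as trustworthy as the assumption that the instrument has no other route to Y.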
The example X = skin color is typical in that there are different possible instruments that can be imagined, and these will have different effects on Y. For that reason, I find a claim such as “skin color does not affect intellectual capacity” to be undefined (until I know what instrument is being considered to affect skin color) and implausible, in that any instruments I can think of would have some (possibly small) effects on intellectual capacity. This is the “potential outcome” approach: we consider possible outcomes under different potential treatments (that is, different assignments of the instrument).
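A toy simulation can show why the choice of instrument matters. The sketch below is purely illustrative: the two instruments are abstract placeholders, and their side-effect sizes (-0.5 and 0.2) are arbitrary numbers, not estimates of anything. In this made-up world X has no effect on Y at all, yet each instrument moves X by the same amount while also changing Y through its own side channel, so the "effect" one observes depends on which manipulation is imagined.

```python
import random
from statistics import mean


def instrument_effects(direct_effect_on_y, n=20000, seed=1):
    """Return (effect of I on X, effect of I on Y) for a hypothetical instrument.

    Assumption: X itself does nothing to Y here; any change in Y comes from
    the instrument's own side channel, whose size is `direct_effect_on_y`.
    """
    rng = random.Random(seed)
    xs = {0: [], 1: []}
    ys = {0: [], 1: []}
    for _ in range(n):
        i = rng.randint(0, 1)                         # randomly assigned instrument
        x = 1.0 * i + rng.gauss(0, 1)                 # instrument shifts X by 1
        y = direct_effect_on_y * i + rng.gauss(0, 1)  # side effect only
        xs[i].append(x)
        ys[i].append(y)
    return mean(xs[1]) - mean(xs[0]), mean(ys[1]) - mean(ys[0])


# Two hypothetical manipulations of the same X, with arbitrary side effects.
a_on_x, a_on_y = instrument_effects(direct_effect_on_y=-0.5)
b_on_x, b_on_y = instrument_effects(direct_effect_on_y=0.2)

print(f"instrument A: effect on X {a_on_x:.2f}, effect on Y {a_on_y:.2f}")
print(f"instrument B: effect on X {b_on_x:.2f}, effect on Y {b_on_y:.2f}")
```

Both instruments produce nearly the same change in X but different changes in Y, so a statement about "the effect of X" has no single answer until the instrument is named.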
For an applied example, you can see our 1990 article on incumbency advantage, where we were explicit in defining conditions and potential outcomes. The point is that in studying such causal relations, it can be helpful to define the manipulation or instrument explicitly, even if it only has a theoretical existence. In that sense, instrumental variables and potential outcomes are a sort of accounting principle, giving us a tool to define as precisely as possible what we are studying.
