The ladder of social science reasoning, 4 statements in increasing order of generality, or Why didn’t they say they were sorry when it turned out they’d messed up?

[Image: the Reinhart-Rogoff spreadsheet coding error]

First the statistical point, then the background.

Statistical point

Consider the following two sentences from the abstract of the paper, Growth in a Time of Debt, published in early 2010 by Carmen Reinhart and Kenneth Rogoff:

[T]he relationship between government debt and real GDP growth is weak for debt/GDP ratios below a threshold of 90 percent of GDP. Above 90 percent, median growth rates fall by one percent, and average growth falls considerably more.

This passage corresponds to a ladder of four statements, which I’ll write in increasing order of generality:

(a) Their data show a pattern with a low correlation for debt/GDP ratios below 90%, and a strong negative correlation for debt/GDP ratios above 90%.

(b) The just-described pattern is not just a feature of past data; it also could be expected to continue into the future.

(c) The correlation between government debt and real GDP growth is low when debt/GDP ratios are low, and strongly negative when debt/GDP ratios are high.

(d) Too much debt is bad for a national economy, and the United States as of 2009 had too much debt.

– Step (a) is a simple data summary.

– You can get from (a) to (b) with simple statistical modeling, as long as you’re willing to assume stationarity and independence in some way. To put it most simply: you assume the future and the past are samples from a common distribution, or you fit some time-series model which is essentially making stationarity and independence assumptions on the residuals (see the sketch after this list).

– Step (c) is a weaker, thus more general version of (b).

– And you can get from (c) to (d) by assuming the average patterns of correlation apply to the particular case of the United States in 2009.

– Completely independently, you can believe (d) in the absence of this specific evidence, just based on some macroeconomic theory.
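
To make steps (a) and (a)-to-(b) concrete, here’s a minimal sketch in Python. The numbers are synthetic, not the Reinhart-Rogoff dataset; the 90% split and the resampling step are just illustrations of the kind of data summary and stationarity assumption involved:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic country-year observations standing in for the historical data:
# debt/GDP ratios (in percent) and real GDP growth rates.
n = 1000
debt_ratio = rng.uniform(10, 150, size=n)
growth = 3.0 - 0.02 * np.maximum(debt_ratio - 90, 0) + rng.normal(0, 2, size=n)

# Step (a): a simple data summary, split at the 90% threshold.
low = debt_ratio < 90
print("correlation below 90%:", np.corrcoef(debt_ratio[low], growth[low])[0, 1])
print("correlation above 90%:", np.corrcoef(debt_ratio[~low], growth[~low])[0, 1])
print("median growth below / above 90%:",
      np.median(growth[low]), np.median(growth[~low]))

# Step (a) -> (b): assume future country-years are draws from the same
# distribution as the past (stationarity and independence), so resampling
# the high-debt observations says something about future high-debt growth.
boot_medians = [np.median(rng.choice(growth[~low], size=(~low).sum()))
                for _ in range(2000)]
print("interval for median growth at debt/GDP > 90%:",
      np.percentile(boot_medians, [2.5, 97.5]))
```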

So what happened?

It turns out that Reinhart and Rogoff made an error in their data processing, so step (a) was just wrong. Upon learning this, you might think, Game Over, but they stuck with (d), (c), and much of (b).

That’s fine—they can feel free to argue for (d) based on theoretical grounds alone, or based on some direct modeling of the U.S. economy, without reference to those historical data. They’d lose something, though, because in their published article they wrote:

Our approach here is decidedly empirical, taking advantage of a broad new historical dataset on public debt . . . Prior to this dataset, it was exceedingly difficult to get more than two or three decades of public debt data even for many rich countries, and virtually impossible for most emerging markets. Our results incorporate data on 44 countries spanning about 200 years. . . .

So if the data got screwed up, you’re kicking away the strongest leg of the stool.

And, given the importance of the published claims (essentially point (d) above) and the relevance of their unique dataset to making these claims, you’d think the authors should’ve made this broad new dataset publicly available from the start. Given the importance of this argument to the U.S. economy, one might argue they had a duty to share their data, so that other researchers could evaluate the data claim (a) and the reasonableness of the steps taking us from (a) to (b) to (c) to (d).

Background

We were discussing the Reinhart and Rogoff story in class today, and the students had two questions:

1. How could it be that the authors of this very influential paper did not share their data?

2. Why did Reinhart and Rogoff never apologize for not sharing their data?

My response to question 1 was quick; answering question 2 took longer, and brought up some interesting thoughts about scientific modeling.

First, question 1. This one’s easy to answer. Data sharing takes work, so we don’t typically do it unless it’s required, or habitual, or we’re doing it as a public service. Views about data sharing have changed since “Growth in a Time of Debt” was published back in 2010. But even now I usually don’t get around to posting my data. I do send data to people when asked, but that’s not as good as posting so anyone can download whenever they want. There also can be legal, ethical, business, or security reasons for not sharing data, but none of these concerns arose for the Reinhart and Rogoff paper. So, the quick answer to question 1: They didn’t share their data because people just didn’t generally share data back then. The data were public, so if anyone else wanted to reproduce the dataset, they could put in the work themselves. Attitudes have changed, but back in 2010, that’s how it usually was: If you wanted to replicate someone’s study, the burden was on you to do it all, to figure out every step.

Now on to question 2. Given that Reinhart and Rogoff didn’t share their data, and they did make a mistake which would surely have been found out years earlier had the data been shared all along, and given that the published work reportedly had a big impact on policy, why didn’t they feel bad about not sharing the data all along? Even if not-sharing-data is an “everybody does it” sort of thing, you’d still think the authors would, in retrospect, regret not just making all their files available to the world right away. But we didn’t see this reaction from Reinhart and Rogoff. Why?

My answer has to do with the different steps of modeling described in the first part of this post. I have no idea what the authors of this paper were thinking when they responded to the criticism, but here are a couple of possibilities:

Suppose you start by believing that (c) and (d) are true, and then you find data that show (a), and you convince yourself that your model deriving (b) is reasonable. Then you have a complete story and you’re done.

Or, suppose you start by disbelieving (c) and (d), but then you analyze your data and conclude (a) and (b): This implies that (c) is correct, and now you’re convinced of (d). Meanwhile you can adjust your theory so that (c) and (d) make perfect sense.

Now someone goes to the trouble of replicating your analysis, and it turns out you got (a) wrong. What do you do?

At this point, one option would be to toss the whole thing out and start over: forget about (c) and (d) until you’ve fixed (a) and (b).

But another option is to stick with your theory, continue believing (c) and (d), and just adjust (a) and (b) as needed to fit the data.

If you take that latter option, the spreadsheet error and all the questionable data-coding choices don’t really matter so much.

I think this happens a lot

I think this happens a lot in research:

(a) Discovery of a specific pattern in data;

(b) Inference of same specific pattern in the general population;

(c) Blurring this specific pattern into a stylized fact, assumed valid in the general population;

(d) Assumed applicability in new cases, beyond the conditions under which the data were gathered.

Then if there’s a failed replication, or a data problem, or a data analysis problem that invalidates (a) or (b), researchers still hang on to (c) and (d). Kind of like building a ladder to the sky and then pulling the ladder up after you so you can climb even higher.

Consider power pose, for example:

(a) Under particular experimental conditions, people in an experiment who held the “power pose” had different measurements of certain hormones and behaviors, on average, compared to people in a control group.

(b) A p-value less than 0.05 was taken as evidence that the observed data differences represented large causal effects in the general population.

(c) Assumption that power posing (not just the specific instructions in that one experiment) has general effects on power and social behavior (not just the specific things measured in that study).

(d) Statement that the average patterns represented by (c) will apply to individual people in job interviews and other social settings.

A series of failed replications cast doubt on the relevance of (a), and statistical reasoning revealed problems with the inferences in (b); furthermore, the first author of the original paper revealed data problems which further weakened (a). But, in the meantime, (c) and (d) became popular, and people didn’t want to let them go.
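
To illustrate the kind of statistical reasoning that undermines step (b), here’s a small simulation (not from the original post, with hypothetical numbers): when the true effect is small and the study is noisy, the estimates that happen to clear p < 0.05 systematically exaggerate the effect, so a significant result by itself is weak evidence for a large effect in the population.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

true_effect = 0.1   # small true difference between treatment and control (hypothetical)
sigma, n = 1.0, 20  # noisy outcome, small groups (hypothetical)

significant_estimates = []
for _ in range(10_000):
    treatment = rng.normal(true_effect, sigma, n)
    control = rng.normal(0.0, sigma, n)
    result = stats.ttest_ind(treatment, control)
    if result.pvalue < 0.05 and result.statistic > 0:
        significant_estimates.append(treatment.mean() - control.mean())

print("true effect:", true_effect)
print("average estimate among the 'significant' studies:",
      round(np.mean(significant_estimates), 2))
# Conditioning on p < 0.05 selects the overestimates: the significant
# estimates average several times the true effect.
```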

And, indeed, the claims about economic growth and government debt, or power pose, or ESP, or various other unproven theories out there, could be correct. Statements (c) and (d) could be true, even if they were derived from mistakes in (a) and (b). This sort of thing happens all the time. But, without the strong backing of (a) and (b), our beliefs in (c) and (d) are going to depend much more on theory. And theory is tricky: often the very same theory that supports (c) and (d) can also support their opposite. These are theories that Jeremy Freese calls “more vampirical than empirical—unable to be killed by mere evidence.”

Once you’re all-in on (c) and (d), you can just park your beliefs there forever. And, if that’s where you are, then when people point out problems with (a) and (b), you’re likely to react with annoyance rather than gratitude toward the people who, from a scientific standpoint, are doing you a favor.