354 possible control groups; what to do?

February 8, 2018

Jonas Cederlöf writes:

I’m a PhD student in economics at Stockholm University and a frequent reader of your blog. I have for a long time followed your quest to bring attention to p-hacking and multiple comparison problems in research. I’m now faced with this problem myself and want, at the very least, to avoid picking (or being criticized for having picked) a control group that merely gives me fancy results. The setting is the following:

I run a difference-in-differences (DD) model between occupations, where people working in occupation X are treated in year T. There are 354 other occupations, and for at least 20-30 of them I could make up a “credible” story about why they would be a natural control group. One could of course run the DD estimation on the treated group vs. the entire labor market, but claiming causality between the reform and the outcome hinges not only on the parallel trends assumption but also on the absence of group-specific shocks. Hence one might want to find a control group that would be subject to the same types of shocks as the treated occupation X, so one might be better off picking specific occupations from the 354 categories. Some of these might have parallel trends and others not, but wouldn’t choosing groups like this, based on parallel trends, be p-hacking? The reader has no guarantee that I as a researcher haven’t picked the control groups that give me the results that will get me published.

So in summary: When one has 1 treated group and 354 potential control groups, how does one go about choosing among these?

My response: rather than picking one analysis (either ahead of time or after seeing the data), I suggest you do all 354 analyses and put them together using a hierarchical model as discussed in this paper. Really, this is not doing 354 analyses; it’s doing one analysis that includes all these comparisons.
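To make the idea of "one analysis that includes all these comparisons" concrete, here is a minimal sketch of partial pooling, not the method from the paper itself. It assumes each of the 354 control groups yields a DD estimate y_j with standard error s_j, and that y_j ~ Normal(theta_j, s_j^2) with theta_j ~ Normal(mu, tau^2). The function name `partial_pool`, the simulated numbers, and the method-of-moments estimate of tau^2 are all illustrative assumptions. The sketch also ignores the correlation among the estimates that arises because they share the same treated group; a fuller analysis would fit a multilevel model to the underlying data.

```python
import random


def partial_pool(estimates, std_errors):
    """Shrink noisy per-control-group DD estimates toward a common mean.

    Assumed model: y_j ~ Normal(theta_j, s_j^2), theta_j ~ Normal(mu, tau^2).
    tau^2 is estimated by method of moments, an empirical-Bayes shortcut
    rather than a full Bayesian fit.
    """
    k = len(estimates)
    w = [1.0 / s ** 2 for s in std_errors]
    sw = sum(w)
    # Precision-weighted mean under the "no between-group variation" model.
    mu_fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    # Excess dispersion beyond sampling noise gives the tau^2 estimate.
    q = sum(wi * (y - mu_fixed) ** 2 for wi, y in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Re-estimate the overall mean with weights 1 / (s_j^2 + tau^2).
    w_star = [1.0 / (s ** 2 + tau2) for s in std_errors]
    mu = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    pooled = []
    for y, s in zip(estimates, std_errors):
        shrink = s ** 2 / (s ** 2 + tau2)  # 1.0 means full pooling to mu
        pooled.append(shrink * mu + (1.0 - shrink) * y)
    return mu, tau2, pooled


# Simulated stand-ins for 354 DD estimates (hypothetical numbers, not real data).
random.seed(0)
true_mu, true_tau = 0.5, 0.2
ses = [random.uniform(0.1, 0.6) for _ in range(354)]
thetas = [random.gauss(true_mu, true_tau) for _ in ses]
ys = [random.gauss(t, s) for t, s in zip(thetas, ses)]

mu_hat, tau2_hat, pooled = partial_pool(ys, ses)
print(f"overall effect: {mu_hat:.3f}, between-group sd: {tau2_hat ** 0.5:.3f}")
```

The noisier a comparison is, the more its estimate is pulled toward the overall mean, so no single control group, however conveniently chosen, can drive the conclusion on its own.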

