# Forking paths come from choices in data processing and also from choices in analysis

June 5, 2018

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

Michael Wiebe writes:

I’m a PhD student in economics at UBC. I’m trying to get a good understanding of the garden of forking paths, and I have some questions about your paper with Eric Loken.

You describe the garden of forking paths as “researcher degrees of freedom without fishing” (#3), where the researcher only performs one test. However, in your example of partisan differences in math skills, you discuss the multiple potential comparisons that could be made: an effect for men and not women, an effect for women and not men, a significant difference, etc. I would describe this as multiple testing: the researcher is running many regressions, and reporting the significant ones. Am I misunderstanding?

The case where the researcher only performs one test is when the degrees of freedom come only from data processing. For example, the researcher only tests for a significant difference between men and women, but because they have flexibility in measuring partisanship, classifying independents, etc., they can still run multiple versions of the same test and find significance that way.

So we can classify researcher degrees of freedom as coming from (1) multiple potential comparisons and (2) flexibility in data processing. In the extreme case, the degrees of freedom come only from (2), and the researcher only performs one test. But that doesn’t seem to be how you use the term “garden of forking paths” in practice.
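The data-processing case described in the letter can be made concrete with a small simulation. This is a minimal sketch under assumptions of our own, none of which come from the paper: party ID is a hypothetical 7-point scale, math skill is measured by two hypothetical test sections, and in the simulated world nothing is related to anything. The researcher always runs the same two-sample test of a partisan gap, but may choose how to code independents and which score to use. The sketch compares the false-positive rate of one coding fixed in advance with the rate when any of the 12 processing choices may be reported. The names `CODINGS`, `OUTCOMES`, and the helper functions are illustrative only.

```python
import math
import random
import statistics

# Hypothetical codings of a 7-point party-ID scale
# (1 = strong Democrat, 4 = independent, 7 = strong Republican) into two groups.
CODINGS = [
    ("drop independents", lambda s: s <= 3, lambda s: s >= 5),
    ("independents as Democrats", lambda s: s <= 4, lambda s: s >= 5),
    ("independents as Republicans", lambda s: s <= 3, lambda s: s >= 4),
    ("strong partisans only", lambda s: s <= 2, lambda s: s >= 6),
]

# Hypothetical outcome choices, each with its true standard deviation in the simulation.
OUTCOMES = [
    ("section A", lambda a, b: a, 1.0),
    ("section B", lambda a, b: b, 1.0),
    ("average of A and B", lambda a, b: (a + b) / 2, math.sqrt(0.5)),
]


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_test(group1, group2, sigma):
    """Two-sample z-test for a difference in means with known sigma."""
    se = sigma * math.sqrt(1 / len(group1) + 1 / len(group2))
    return two_sided_p((statistics.fmean(group1) - statistics.fmean(group2)) / se)


def simulate_survey(rng, n=200):
    """Null world: party ID and both test scores are mutually independent."""
    return [(rng.randint(1, 7), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


def forking_p_values(survey):
    """The same partisan-gap test, run once per data-processing choice."""
    p_values = []
    for _, is_dem, is_rep in CODINGS:
        for _, outcome, sigma in OUTCOMES:
            dems = [outcome(a, b) for s, a, b in survey if is_dem(s)]
            reps = [outcome(a, b) for s, a, b in survey if is_rep(s)]
            p_values.append(z_test(dems, reps, sigma))
    return p_values


def false_positive_rates(n_sims=1000, seed=1):
    rng = random.Random(seed)
    prespecified = forked = 0
    for _ in range(n_sims):
        p_values = forking_p_values(simulate_survey(rng))
        prespecified += p_values[0] < 0.05  # one coding and outcome, fixed in advance
        forked += min(p_values) < 0.05      # report whichever version "works"
    return prespecified / n_sims, forked / n_sims


if __name__ == "__main__":
    fixed, forked = false_positive_rates()
    print(f"pre-specified coding and outcome: {fixed:.3f}")
    print(f"best of {len(CODINGS) * len(OUTCOMES)} processing choices: {forked:.3f}")
```

Under these assumptions the pre-specified test should reject about 5% of the time, while the best-of-12 rate should come out well above that, even though each individual analysis is "one test." A known-sigma z-test is used so that the single-test baseline is exactly 5% under the null; a t-test would behave similarly for groups this size.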

– You point to an example of multiple potential comparisons and write that you “would describe this as multiple testing: the researcher is running many regressions, and reporting the significant ones.” I’d say it’s multiple potential testing: the researcher might perform one analysis, but he or she gets to choose which analysis to do, based on the data. For example, the researcher notices a striking pattern among men but not women, and so performs that comparison, computes the significance level, etc. Later on, someone else points to the other comparisons that could’ve been done, and the original researcher replies, “No, I only did one comparison, so I couldn’t’ve been p-hacking.” Loken and I would reply that, as long as the analysis is affected by the data that were seen, there’s a multiple potential comparisons problem, even if only one particular comparison was done on the particular data at hand.

– You distinguish between choices in the data analysis and choices in the data processing. I don’t see these as being much different; either way, you have researcher degrees of freedom, and both sets of choices give you forking paths.

– Finally, let me emphasize that my preferred solution is not to perform just one, preregistered, comparison, nor is it to take the most extreme comparison and then perform a multiplicity correction. Rather, I recommend analyzing and presenting the grid of all relevant comparisons, ideally combining them in a multilevel model.
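The argument in the first reply item, that a single comparison chosen after seeing the data still carries a multiple-comparisons problem, can be checked with a similar sketch. Assume, hypothetically, that the true partisan gap is zero among both men and women, and that the two estimated gaps are independent and normal with a known standard error of 0.1 each, so the simulation draws the estimates directly from their sampling distributions. The researcher looks at the three possible comparisons (gap among men, gap among women, difference between them), picks the most striking one, and runs exactly one test on it. Function names and parameter values here are illustrative assumptions.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_once(rng, se_men=0.1, se_women=0.1):
    """One hypothetical study in a null world with no partisan gap for anyone."""
    gap_men = rng.gauss(0, se_men)      # estimated gap among men
    gap_women = rng.gauss(0, se_women)  # estimated gap among women
    # The comparisons the researcher *could* report, as (estimate, standard error).
    candidates = [
        (gap_men, se_men),
        (gap_women, se_women),
        (gap_men - gap_women, math.hypot(se_men, se_women)),
    ]
    # Look at the data first, then test only the most striking comparison.
    estimate, se = max(candidates, key=lambda c: abs(c[0]) / c[1])
    p_chosen = two_sided_p(estimate / se)
    # Benchmark: the comparison among men, chosen before seeing any data.
    p_prespecified = two_sided_p(gap_men / se_men)
    return p_prespecified, p_chosen


def false_positive_rates(n_sims=20000, seed=2):
    rng = random.Random(seed)
    fixed = chosen = 0
    for _ in range(n_sims):
        p_fixed, p_chosen = simulate_once(rng)
        fixed += p_fixed < 0.05
        chosen += p_chosen < 0.05
    return fixed / n_sims, chosen / n_sims


if __name__ == "__main__":
    fixed, chosen = false_positive_rates()
    print(f"comparison fixed in advance: {fixed:.3f}")
    print(f"one comparison chosen after seeing the data: {chosen:.3f}")
```

Every simulated researcher runs only one test, yet the chosen-after-looking rate should land well above 5%, because which test gets run depends on the data. This is the sense of "multiple potential comparisons" in the reply.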
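For the alternative recommended in the last reply item, a full multilevel model would usually be fit with a Bayesian tool such as Stan. As a lighter-weight stand-in, the sketch below assumes a normal-normal model, y_j ~ N(theta_j, s_j^2) with theta_j ~ N(mu, tau^2), estimates tau^2 with the DerSimonian-Laird moment estimator, and shrinks every comparison in the grid toward the common mean instead of singling one out. This plug-in version ignores uncertainty in mu and tau, so treat it as an illustration of partial pooling, not as the method from the paper. The `partial_pool` function and the example numbers are hypothetical.

```python
import math


def partial_pool(estimates, std_errors):
    """Normal-normal partial pooling with a DerSimonian-Laird estimate of tau^2.

    Assumed model: y_j ~ N(theta_j, s_j^2), theta_j ~ N(mu, tau^2).
    Returns (mu, tau, posterior means of theta_j given mu and tau).
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two comparisons with matching standard errors")
    w = [1 / s**2 for s in std_errors]
    sum_w = sum(w)
    y_bar = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, estimates))
    denom = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(estimates) - 1)) / denom)

    # Re-estimate the common mean, weighting by total (within + between) variance.
    w_star = [1 / (s**2 + tau2) for s in std_errors]
    mu = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)

    if tau2 == 0.0:
        # Complete pooling: no variation beyond what noise alone would produce.
        return mu, 0.0, [mu] * len(estimates)

    # Each posterior mean is a precision-weighted average of y_j and mu.
    shrunk = [
        (yi / s**2 + mu / tau2) / (1 / s**2 + 1 / tau2)
        for yi, s in zip(estimates, std_errors)
    ]
    return mu, math.sqrt(tau2), shrunk


if __name__ == "__main__":
    # Made-up grid of six comparisons, for illustration only.
    raw = [0.25, -0.05, 0.10, 0.02, 0.18, -0.12]
    mu, tau, shrunk = partial_pool(raw, [0.10] * 6)
    print(f"mu = {mu:.3f}, tau = {tau:.3f}")
    for r, t in zip(raw, shrunk):
        print(f"raw {r:+.2f} -> partially pooled {t:+.3f}")
```

Each pooled estimate lies between its raw value and the common mean, and the noisier the comparison relative to tau, the more it is pulled in. Reporting the whole shrunken grid replaces both picking one comparison and correcting the most extreme one after the fact.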
