Kevin Lewis points us to this article by Taylor Hargrove, which states:
Although skin color represents a particularly salient dimension of race, its consequences for health remains unclear. The author uses four waves of panel data from the Coronary Artery Risk Development in Young Adults study and random-intercept multilevel models to address three research questions critical to understanding the skin color–health relationship among African American adults (n = 1,680): What is the relationship between skin color and two global measures of health (cumulative biological risk and self-rated health)? . . .
The findings indicate that dark-skinned women experience more physiological deterioration and self-report worse health than lighter skinned women. These associations are not evident among men, and socioeconomic factors, stressors, and discrimination do not explain the dark-light disparity in physiological deterioration among women. Differences in self-ratings of health among women, however, are generally explained by education and income.
When I read this, my first thought was that this is a topic worth studying (I don’t know the literature on this stuff, so I can’t comment on what’s come before), and my second thought was that the difference between “significant” and “not significant” is not itself statistically significant.
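To make that second point concrete, here is a minimal arithmetic sketch. The numbers are made up for illustration and are not taken from the paper: suppose the estimate for women is 25 with standard error 10, the estimate for men is 10 with standard error 10, and the two estimates are independent.

```python
import math

def z_score(estimate, std_error):
    """Estimate divided by its standard error."""
    return estimate / std_error

def difference_z(est_a, se_a, est_b, se_b):
    """z-score for the difference of two independent estimates."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (est_a - est_b) / se_diff

# Hypothetical (estimate, standard error) pairs, NOT values from the paper.
women = (25.0, 10.0)
men = (10.0, 10.0)

z_women = z_score(*women)           # 2.5: "significant" at the 5% level
z_men = z_score(*men)               # 1.0: "not significant"
z_diff = difference_z(*women, *men)  # about 1.06: the difference is not significant

print(f"women z = {z_women:.2f}, men z = {z_men:.2f}, difference z = {z_diff:.2f}")
```

One group clears the 1.96 threshold and the other does not, yet the comparison between them is only about one standard error from zero.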
Having skimmed through the paper, I have a lot of problems with the details. Just for example, one of the biomarkers is “waist circumference (1 = >88 cm in women and >102 cm in men).” But that doesn’t sound right. If the issue is being fat, wouldn’t you want some more continuous measure? It’s also not clear to me why they discretize skin color into only three levels.
More generally, I think the strategy of going through results and pulling out statistically significant comparisons isn’t going to work. I’m not exactly talking about “p-hacking” here—it’s not that I think these researchers are fishing around looking for something statistically significant to sell—my problem is more that statistical-significance filtering is a noise amplifier.
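A small simulation shows what I mean by a noise amplifier. This is only a sketch under assumed numbers: a hypothetical true effect of 0.1 measured with standard error 0.5, normally distributed estimates, and a two-sided 5% threshold. None of these values come from the paper, and `significance_filter` is a name I made up for the example.

```python
import random

def significance_filter(true_effect, std_error, n_sims=100_000, seed=1):
    """Simulate noisy estimates and keep only the 'significant' ones."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > 1.96 * std_error:
            kept.append(estimate)
    share_significant = len(kept) / n_sims
    # Average magnitude of the surviving estimates, relative to the truth.
    exaggeration = sum(abs(e) for e in kept) / len(kept) / abs(true_effect)
    # Share of surviving estimates that point in the wrong direction.
    wrong_sign = sum(1 for e in kept if e * true_effect < 0) / len(kept)
    return share_significant, exaggeration, wrong_sign

# Hypothetical inputs, chosen to represent a small effect measured noisily.
share, exaggeration, wrong_sign = significance_filter(true_effect=0.1, std_error=0.5)
print(f"share significant: {share:.3f}")
print(f"exaggeration factor: {exaggeration:.1f}")
print(f"wrong-sign share: {wrong_sign:.2f}")
```

In this setup any estimate that passes the filter has to be at least 0.98 in absolute value, so every reported result overstates the assumed true effect by roughly a factor of ten or more, and a nontrivial share of them have the wrong sign.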
I think it would make more sense to use these data to answer specific questions in a focused way, or to perform more clearly exploratory analyses that display all the data.
The current paper is a mix of both, which I don’t think works at all. Just for example:
Additionally, the coefficient for dark skin among women in Model 2 is reduced in magnitude (by approximately 37 percent) and to statistical nonsignificance at the .05 α level, suggesting that skin color differences in education and income explain the dark-light gap in self-rated health among women.
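From a statistical point of view, this analysis doesn’t really make sense. A coefficient that drops from significant to nonsignificant when covariates are added does not show that those covariates explain the gap, because the change in the coefficient can itself be well within noise. To put it another way: this sort of statistical procedure has poor frequency properties, in that if it is used repeatedly, it will often give wrong answers.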
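Here is one way to check the frequency properties of that kind of rule, as a rough sketch. I assume a world in which the added covariates explain nothing: the true coefficient is the same in both models, equal to 2.2 standard errors, and the two estimates have correlated noise (correlation 0.8) because they come from the same data. All of these numbers are hypothetical, and the simulation is a cartoon of the procedure, not a reanalysis of the paper.

```python
import math
import random

def rate_of_false_explanations(true_z=2.2, rho=0.8, n_sims=100_000, seed=2):
    """Among runs where Model 1 is significant, how often is Model 2 not?

    The true coefficient is identical in both models, so every
    'explained by the covariates' conclusion counted here is wrong.
    """
    rng = random.Random(seed)
    model1_significant = 0
    declared_explained = 0
    for _ in range(n_sims):
        noise1 = rng.gauss(0.0, 1.0)
        # Correlated noise for the second model's estimate.
        noise2 = rho * noise1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        z1 = true_z + noise1  # z-score of the coefficient in Model 1
        z2 = true_z + noise2  # z-score of the coefficient in Model 2
        if abs(z1) > 1.96:
            model1_significant += 1
            if abs(z2) <= 1.96:
                declared_explained += 1
    return declared_explained / model1_significant

rate = rate_of_false_explanations()
print(f"wrongly declared 'explained': {rate:.2f}")
```

Under these assumptions the rule declares the gap "explained" in a substantial fraction of runs even though adjustment changes nothing, which is the sense in which the procedure will often give wrong answers.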
I feel kinda bad about saying this, as the paper in question does not seem like a bunch of hype, nor do I see any sign of research misconduct. Still, honesty and transparency are not enough. If the ultimate goal of this research is to learn what we can about skin color and health, I recommend looking at all the data, not selecting on statistical significance, and using multilevel models to get better estimates (see here).