Dropping out dropouts is a mischievous act

May 7, 2012

(This article was originally published at Numbers Rule Your World, and syndicated at StatsBlogs.)

The following throw-away lines in a Wall Street Journal article about the "return on investment" of getting into college debt (what an idea) are the most important ones:

The report [by the College Board] also doesn't account for dropouts or extra college years. Only 56% of students who enroll in a four-year college earn a bachelor's degree within six years, according to a report last year by the Harvard Graduate School of Education...

PayScale, a Seattle data firm, examines the links between pay and variables like colleges and majors. Its analysis, which also ignores dropouts but accounts for students who take longer to complete their degrees, ...

I cut that off since I've heard enough. How can they get away with ignoring dropouts when they are assessing the return on investment of college debt?

Imagine a cohort of 10,000 students who take on debt to start college. By year 6, which apparently is when they stop counting, 4,400 have not graduated, either because they dropped out or because they are still in school. Both of these groups are likely to have the lowest return on investment in the cohort. Most of the dropouts won't be getting the college-graduate jobs that pay more. Those still in school are probably troubled students who, if they do graduate later, would also earn less. Even if they are equally qualified, their earnings start later and so are worth less by the time value of money.

Given this reality, the analyses by the College Board and by PayScale "ignore dropouts" as if they never existed. In other words, they look only at the 5,600, not the 10,000. This means whatever return on investment they compute will be exaggerated.
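To get a feel for how much the exaggeration could matter, here is a back-of-envelope sketch. The cohort size and the 56% graduation rate come from the article; the two return figures are numbers I made up purely for illustration, not estimates from the College Board, PayScale, or anyone else.

```python
# Cohort figures quoted in the article.
COHORT = 10_000
GRAD_RATE = 0.56  # share earning a bachelor's degree within six years

# HYPOTHETICAL returns on investment, invented for illustration only.
ROI_GRADUATE = 0.10       # assumed average return for six-year graduates
ROI_NON_GRADUATE = -0.05  # assumed average return for dropouts and late finishers

graduates = round(COHORT * GRAD_RATE)
non_graduates = COHORT - graduates

# What an analysis that ignores dropouts reports: the graduates-only average.
roi_graduates_only = ROI_GRADUATE

# What the whole cohort experiences: a weighted average over everyone who enrolled.
roi_full_cohort = (
    graduates * ROI_GRADUATE + non_graduates * ROI_NON_GRADUATE
) / COHORT

print(f"graduates: {graduates}, non-graduates: {non_graduates}")
print(f"ROI, graduates only: {roi_graduates_only:.1%}")
print(f"ROI, full cohort:    {roi_full_cohort:.1%}")
```

Under these made-up inputs the graduates-only figure is roughly three times the full-cohort figure. The exact gap depends entirely on the assumed returns, but the direction does not: as long as non-graduates do worse than graduates, leaving them out pushes the average up.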


Technically, this is an example of survivorship bias. The sample being studied does not contain "non-survivors", in this case dropouts, so its results don't generalize to everyone who enrolled.

Also, the data is censored, in the sense that the observation window is not long enough for us to know what happens to those who stay in college longer than 6 years. This is a common feature of such data sets; you'd want to do something about it, not just ignore it.
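One simple thing you can do with censored data, short of a full survival analysis, is to report bounds instead of a single number. The sketch below does this for the eventual graduation rate. The 56% is from the report quoted above; the share still enrolled at year 6 is a hypothetical value I picked, since the article doesn't give it.

```python
# Share of enrollees who graduated within six years (from the quoted report).
GRADUATED_BY_YEAR_6 = 0.56

# HYPOTHETICAL share still enrolled at year 6; the source does not report this.
STILL_ENROLLED_AT_YEAR_6 = 0.15

# Worst case: none of the still-enrolled students ever graduates.
lower_bound = GRADUATED_BY_YEAR_6

# Best case: every still-enrolled student eventually graduates.
upper_bound = GRADUATED_BY_YEAR_6 + STILL_ENROLLED_AT_YEAR_6

print(f"eventual graduation rate is between {lower_bound:.0%} and {upper_bound:.0%}")
```

The width of that interval is what the six-year window cannot tell us. A longer follow-up, or a model of time-to-graduation, would be needed to narrow it.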

There are in fact many other problems with this type of analysis. Here's another crucial one: to reason about whether debt is the cause of higher future wellbeing, the right counterfactual is not having debt. In other words, any such analysis must tell us what would happen if the same students were able to complete college without having to incur debt. Based on what the WSJ reporter said, I don't think this is how they framed the problem.


