“Incentives to Learn”: How to interpret this estimate of a varying treatment effect?

Germán Jeremias Reyes writes:

I am currently taking a course on Applied Econometrics and would like to ask you about how you would interpret a particular piece of evidence.

Some background: In 2009, Michael Kremer et al. published an article called “Incentives to Learn.” This is from the abstract (emphasis is mine):

We study a randomized evaluation of a merit scholarship program in which Kenyan girls who scored well on academic exams had school fees paid and received a grant. Girls showed substantial exam score gains, and teacher attendance improved in program schools. There were positive externalities for girls with low pretest scores, who were unlikely to win a scholarship.

With that in mind, for my applied econometrics class, I had to replicate one of the main figures of the paper, which shows how the treatment effect varied with baseline test scores. The idea is that if we observe a positive treatment effect at the lower end of the baseline test score distribution, that would be evidence of positive externalities. This is what I found:

My question is about how to interpret this figure. Before reading your blog, my interpretation would have been something like this:

“Results show that the estimated treatment effect is not statistically different from zero for most values of the 2000 test score distribution. There are two exceptions: girls with a baseline test score slightly above -1 and girls with a baseline test score around 0.5. In both cases we detect treatment effects statistically different from zero at the 95% level, calculated with 300 bootstrap replications. These results suggest that the test score gains are concentrated in the lower middle of the test score distribution, i.e., among students with baseline test scores between -1.5 and -1 (and, perhaps, among students slightly above the mean, i.e., with a baseline score close to 0.5), since those are the only estimated treatment effects statistically different from zero at the usual levels. Furthermore, the fact that the treatment effect is not statistically different from zero for students with a baseline score below -1 (i.e., on the lower side of the test score distribution) shows that the program did not have negative externalities for low-achieving students.”
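(For concreteness, here is a stripped-down sketch of the kind of calculation behind such a figure: a kernel-weighted local estimate of the treatment effect at each baseline score, with percentile intervals from 300 bootstrap draws. The estimator, bandwidth, and variable names are illustrative assumptions, not necessarily what the paper or my replication used.)

```python
import numpy as np

rng = np.random.default_rng(0)

# pretest, endterm, treat are 1-D numpy arrays of baseline scores,
# endline scores, and 0/1 treatment assignment (hypothetical names).

def local_effect(pretest, endterm, treat, grid, bw=0.5):
    """Kernel-weighted difference in mean outcomes (treated minus control)
    evaluated at each point of `grid`."""
    effects = []
    for x0 in grid:
        w = np.exp(-0.5 * ((pretest - x0) / bw) ** 2)  # Gaussian kernel weights
        wt = w * (treat == 1)
        wc = w * (treat == 0)
        effects.append(endterm @ wt / wt.sum() - endterm @ wc / wc.sum())
    return np.array(effects)

def bootstrap_bands(pretest, endterm, treat, grid, reps=300):
    """Percentile 95% bands for local_effect from `reps` bootstrap resamples."""
    n = len(pretest)
    draws = np.empty((reps, len(grid)))
    for b in range(reps):
        idx = rng.integers(0, n, n)  # resample individuals with replacement
        draws[b] = local_effect(pretest[idx], endterm[idx], treat[idx], grid)
    lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
    return lo, hi

# Usage: plot est against grid, with (lo, hi) as the confidence band.
grid = np.linspace(-2, 2, 41)
# est = local_effect(pretest, endterm, treat, grid)
# lo, hi = bootstrap_bands(pretest, endterm, treat, grid)
```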

However, your blog made me wary of reaching conclusions based only on p-values/standard errors/statistical significance. So, my question to you is: what do you think we should learn about the treatment effect from the figure above?

My reply:

Based on the above picture, I’d be inclined to just fit a model with a linear interaction of test score with treatment. There will be uncertainty in the slope—I’m guessing the data are consistent with a slope of zero, that is, a constant treatment effect—but the interaction could be of policy interest, so it could be worth estimating even if the resulting estimate is highly uncertain.
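That is, fit endterm = a + b·treat + c·pretest + d·(treat × pretest) + error, so that the estimated treatment effect at baseline score x is b + d·x. A minimal sketch of that fit (the file and column names are hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical columns: endterm (2001 score), treat (0/1), pretest (2000 score).
df = pd.read_csv("kremer_replication.csv")  # hypothetical file name

# Linear interaction of baseline score with treatment, with
# heteroskedasticity-robust (HC2) standard errors.
fit = smf.ols("endterm ~ treat * pretest", data=df).fit(cov_type="HC2")

# The interaction coefficient d ("treat:pretest") and its standard error
# summarize what the data can say about a trend in the treatment effect;
# an interval covering zero is consistent with a constant effect.
print(fit.params["treat:pretest"], fit.bse["treat:pretest"])
print(fit.summary())
```

The point of reporting the interaction estimate with its uncertainty, rather than scanning the curve for pointwise-significant bumps, is that it directly answers the policy question of whether the effect trends with baseline score.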