(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

Berk Özler writes:

Background: You receive a fictional proposal from a major foundation to review. The proposal wants to look at the impact of 5 minute “patience” training on all kinds of behaviors. This is a poor country, so there are no admin data. They make the following points:

A. If successful, this is really zero cost to roll out—it’s just pushed through smart phones. Therefore, the cost of the program can be modelled as approximately zero. It falls in the “letters to people” kind of stuff.

B. However, they want to show the impact of this on a whole bunch of things. They can check take-up because they know who clicks through and goes through everything, and they expect it to be low.

C. Given take-up and expected impacts, they argue that a fairly small impact could have quite a large effect.

D. But here is the rub: For the experiment they will need $2 million for data collection. They need to survey XY,000 households and do very long surveys on everything you can think of 4 times. Having carefully read all the stuff on p-values, this is the power they need to detect a 1% increase in savings. The combination of a “letter to people” type experiment with (a) lack of admin data and (b) desire to show effects on all kinds of things essentially blows the budget.

I am quite worried about this p-values and power stuff, since (a) on the one hand it’s good that we don’t give too much credence to small sample studies with large effects because of publication bias and (b) on the other hand, if this is going to be interpreted as “powering up” what are essentially stupid interventions, that’s not a great direction to go either. The problem is that without a theoretical framework, it’s harder to be ex ante sanguine about what is “stupid”—agnostic may be a better phrase, since it *might* turn up something.

My instinct here, motivated by the Bayesian optimal sampling literature, would be to say that they should first try this with 100 households for $500 and see what the effects are AND publish the effects. There should be an optimal sequence of experiments that leads to scale-up as positive results arise. In short, publication bias implies a preference for small sample size experiments with big effects, which are probably false. But this should cause us to solve the publication bias problem, NOT create a further distortion by powering up stupidity. Ramsey second best is not going to work here. Unfortunately, the math even in the simple case with single sequential samples is a complete nightmare, but I am wondering if there is a simpler way to explain this.

I’ll respond to these questions in reverse order:

– To address the last sentence in the above quote: no, the math is not a nightmare here at all. There’s no “math” at all to worry about here: just gather the data, and then, in your analysis, include as predictors all variables that are used in the data collection. With a sequential design, just include time as a predictor in your model. This general issue actually came up in a recent discussion.

To do this and get reasonable results, you’ll want to do a reasonable analysis: don’t aim for or look at p-values, Bayes factors, or “statistical significance”: just fit a serious regression model, using relevant prior information to partially pool effect size estimates toward zero. And of course commit to publishing your results no matter what shows up.

– I’m not quite sure what is meant by “a 1% increase in savings”? This is a poor country; are these cash savings? Who’s doing the saving? Are you saying that the savers will save 1% more? Or that 1% more people will save? These questions are relevant to who you target the intervention to.

– I don’t have a good sense of where this $2 million cost is coming from. I guess $2 million is a bargain if the intervention really works. I don’t have a good sense of whether you think it will really work.

– One way to get a handle on the “how effective is the intervention?” question is to consider a large set of possible interventions. You have this “5 minute patience training,” whatever that is. But there must be a lot of other ideas out there that are similarly zero-cost and somehow vaguely backed by previous theory and experiment. Would it make sense to spend $2 million on each of these? This is not a rhetorical question: I’m really asking. If there are 10 possible interventions, would you do a set of studies costing $20 million? Or is the idea that any of these 10 interventions would work, but their effects would not be additive, so you just want to find one good one and it doesn’t matter which one it is?

– A related point is that interventions are compared to alternative courses of action. What are people currently doing? Maybe whatever they are currently doing is actually more effective than this 5 minute patience training?
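To make the earlier advice about partially pooling effect size estimates toward zero a bit more concrete, here is a minimal sketch of the simplest possible version: a single effect estimate with a known standard error, combined with a normal prior centered at zero. This is a one-parameter simplification, not the full regression model recommended above. The function name `shrink_toward_zero`, the prior standard deviation of 0.05, and all of the numbers are hypothetical choices made only for illustration.

```python
import math


def shrink_toward_zero(estimate, std_error, prior_sd):
    """Posterior mean and sd for an effect under a Normal(0, prior_sd^2) prior.

    Assumes the raw estimate is normally distributed around the true effect
    with a known standard error (the standard normal-normal model).
    """
    prior_precision = 1.0 / prior_sd**2
    data_precision = 1.0 / std_error**2
    post_precision = prior_precision + data_precision
    # The prior mean is zero, so only the data term contributes to the mean.
    post_mean = (data_precision * estimate) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    return post_mean, post_sd


# Hypothetical numbers: a noisy pilot and a precise large study,
# both reporting the same raw estimate of 0.10.
pilot = shrink_toward_zero(estimate=0.10, std_error=0.10, prior_sd=0.05)
large = shrink_toward_zero(estimate=0.10, std_error=0.01, prior_sd=0.05)
print(f"pilot: mean={pilot[0]:.3f}, sd={pilot[1]:.3f}")
print(f"large: mean={large[0]:.3f}, sd={large[1]:.3f}")
```

With these made-up inputs, the noisy pilot estimate is pulled most of the way toward zero, while the precise estimate barely moves. That is the sense in which a small, cheap study can be published and taken seriously without taking its raw point estimate at face value.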

Anyway, the good news is that you don’t need to worry about “the math.” I think all the difficulty here comes in thinking about the many possible interventions you might do.

The post Should Berk Özler spend $2 million to test a “5 minute patience training”? appeared first on Statistical Modeling, Causal Inference, and Social Science.
