Chasing the noise in industrial A/B testing: what to do when all the low-hanging fruit have been picked?

Commenting on this post on the “80% power” lie, Roger Bohn writes:

The low power problem bugged me so much in the semiconductor industry that I wrote 2 papers about it around 1995. Variability estimates come naturally from routine manufacturing statistics, which in semicon were tracked carefully because they are economically important. The sample size is determined by how many production lots (e.g. 24 wafers each) you are willing to run in the experiment – each lot adds to the cost.

What I found was that small process improvements were almost impossible to detect, using the then-standard experimental methods. For example, if an experiment has a genuine yield impact of 0.2 percent, that can be worth a few million dollars. (A semiconductor fabrication facility produced at that time roughly $1 to $5 billion of output per year.) But a change of that size was lost in the noise. Only when the true effect rose into the 1% or higher range was there much hope of detecting it. (And a 1% yield change, from a single experiment, would be spectacular.)

Yet semicon engineers were running these experiments all the time, and often acting on the results. What was going on? One conclusion was that most good experiments were “short loop” trials, meaning that the wafers did not go all the way through the process. For example, you could run an experiment on a single mask layer, and then measure the effect on manufacturing tolerances. (Not the right terminology in semicon, but that is what they are called elsewhere.) In this way, the only noise was from the single mask layer. Such an experiment would not tell you the impact on yields, but an engineering model could estimate the relationship between tolerances and yields. Now, small changes were detectable with reasonable sample sizes.

This relates to noise-chasing in A/B testing, to the failure of null hypothesis significance testing when studying incremental changes (and what to do about it), and to our recent discussions about how to do medical trials using precise measurements of relevant intermediate outcomes.
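
To get a feel for the arithmetic behind Bohn's comment, here is a rough sketch of the power calculation for a simple two-arm comparison of mean yield, using a normal approximation. The numbers are my assumptions, not his: a lot-to-lot standard deviation of 1 percentage point of yield and 10 lots per arm are hypothetical, chosen only for illustration, and the function names are made up for this example. The last line mimics a short-loop measurement by assuming, again hypothetically, that the noise is five times smaller relative to the effect.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_two_sample(effect, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect and sd are in the same units (here, percentage points of yield);
    n_per_arm is the number of lots in each arm.
    """
    se = sd * sqrt(2.0 / n_per_arm)          # standard error of the difference
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / se                      # true effect in standard-error units
    # Probability of landing in either rejection region.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def lots_per_arm_needed(effect, sd, power=0.8, alpha=0.05):
    """Lots per arm for the target power (standard normal-approximation formula)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) * sd / effect) ** 2)


# Hypothetical inputs, not taken from Bohn's papers.
ASSUMED_SD = 1.0      # lot-to-lot sd of yield, in percentage points
ASSUMED_LOTS = 10     # lots per arm

for effect in (0.2, 1.0):
    p = power_two_sample(effect, ASSUMED_SD, ASSUMED_LOTS)
    n = lots_per_arm_needed(effect, ASSUMED_SD)
    print(f"effect {effect:.1f} pts: power {p:.2f}, lots per arm for 80% power: {n}")

# Short-loop analogue: same effect, noise assumed five times smaller.
print("with 5x less noise:", lots_per_arm_needed(0.2, ASSUMED_SD / 5))
```

Under these assumed inputs, the 0.2-point effect is essentially undetectable at any affordable number of lots, while the 1-point effect has a reasonable chance. Because the required sample size scales with the square of the noise-to-effect ratio, cutting the noise by a factor of five cuts the number of lots by roughly a factor of 25, which is the appeal of measuring a precise intermediate outcome.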

The post Chasing the noise in industrial A/B testing: what to do when all the low-hanging fruit have been picked? appeared first on Statistical Modeling, Causal Inference, and Social Science.