Steve Stigler writes:

I saw a piece on your blog about putting. It suggests to me that you do not play golf, or you would not think this was a model – length is much more important than you indicate. I attach an old piece by a friend that is indeed the work of a golfer!

The linked article is called “How to lower your putting score without improving,” and it’s by B. Hoadley and published in 1994. Hoadley’s recommendation is “to target a distance beyond the hole given by the formula: [Two feet]*[Probability of sinking the putt].” Sounds reasonable. And his model has lots of geometry, for example:

Anyway, what I responded to Steve was that the simple model fits just about perfectly up to 20 feet! But for longer putts, it definitely helps to include distance, as was noted by Mark Broadie in the material he sent me.

The other thing is that there’s a difference between prediction and improvement (or, as we would say in statistics jargon, a difference between prediction and causal inference).

I was able to fit a simple one-parameter model that accurately predicted success rates while not including any consideration of the difficulty of hitting the ball the right length (not too soft and not too hard). At least for short distances, up to 20 feet, my model worked, I assume because it the took the uncertainty in how hard the ball is hit, and interpreted it as uncertainty in the angle of the ball. For larger distances, these two errors don’t trade off some cleanly, hence the need for another parameter in the model.

But even for these short putts, my model can be improved if the goal is not just to predict success rates, but to figure out how to put better—that is, to predict success rates if you hit the ball differently.

It’s an interesting example of the difference between in-sample and out-of-sample prediction (and, from a statistical standpoint, causal inference is just a special case of out-of-sample prediction), similar to the familiar problem of regression with collinear or near-collinear predictors, where a wide range of possible parameter vectors will fit the data well (that’s what it means to have a ridge in the likelihood), but if you want to apply the model to predict for new data off that region of near-collinearity it will be necessary to bite the bullet and think harder about what those predictors really mean.

So . . . prior information can be important for an out-of-sample prediction or causal inference problem, even if it’s not needed to fit existing data.