Why do we, as a discipline, have so little understanding of the methods we have created and promote? Our primary tool for gaining understanding is mathematics, which has obvious appeal: most of us trained in math, and there is no better form of information than a theorem that establishes a useful fact about a method. But the preceding sentence imposes a heavy burden: it must be possible to prove a theorem, and the facts established by the theorem must be useful. We find finite-sample facts indispensable because real datasets have finite samples and asymptotic theorems never tell us how to apply their conclusions to finite samples. But finite-sample theorems about contemporary methods are rare; given how popular such theorems were in earlier eras, it seems inescapable that they are at least extremely difficult to prove.
This paper considers a complementary tool for opening our black-box methods, modeled explicitly on the approach molecular biologists use to open Nature’s black boxes.
I (Dan) came across a fun and fascinating article by Jim Hodges about how we explain and understand (or, really, how we don't) statistical models. It falls nicely within the space of things that I've been thinking about recently, and it is well worth a read.
The focus here is very much on quite simple linear mixed effects models (and maximum likelihood-type fitting of those models), but it's really about building up a framework for systematically understanding statistical models. The language he uses is that of scientific experiments, and it reminds me a lot of how Aki both talks about his computational experiments and writes them up before they're turned into papers. (See, for example, his R notebooks.)
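For concreteness, here is a minimal sketch (mine, not from the article) of the kind of model in question: a simple linear mixed effects model with a random intercept, fit by maximum likelihood. It uses lme4 and its built-in sleepstudy data; Hodges' own examples are different.

```r
# A simple linear mixed effects model: reaction time with a fixed
# effect of Days and a random intercept for each Subject.
library(lme4)

# REML = FALSE requests a maximum likelihood fit rather than the
# default restricted maximum likelihood (REML) fit.
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)
summary(fit)
```

The summary reports the fixed-effect estimates alongside the variance components for the Subject-level intercepts; even a model this simple is the sort of object Hodges' experimental framework is aimed at understanding.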
Anyway. I have nothing specific to add except this is a thing that is worth reading and thinking about and building off.