*Pryor unhooks the deer’s skull from the wall above his still-curled-up companion. Examines it. Not a good specimen –the back half of the lower jaw’s missing, a gap that, with the open cranial cavity, makes room enough for Pryor’s head.*

*He puts it on. – Will Eaves, Murmur*

So as we roll into the last dying embers of the flaming glory that is (North American) Pride Month (Europe you’ve still got some fun ahead), I’ve doing my best to make both good and questionable decisions.

The good decision was to read an absolutely fabulous book by a British author named Will Eaves (on a run of absolutely stunning books) who fairly recently released a book called Murmur. Murmur is an odd beast, I guess you could say it’s a novel about Alan Turing but that would be moderately inaccurate and probably unfair.

Firstly, because even from one of my favourite authors, I am basically allergic to people writing about maths and mathematicians. It’s always so stodgy and wooden, as if they are writing about a world they don’t understand but also can’t convincingly fake. (Pride month analogy: I mostly avoid straight people writing queer stories for the same reason.) And Turing is a particular disaster for cultural portrayals: he intersects with too many complicated things (world war 2, cryptography, computers, pre-1967 British homosexuality) for him to ever be anything but an avatar.

So this is not a book about Alan Turing being a tortured genius. It’s a book about a guy called Alec Pryor who just happens to share a bunch of biographical details with Turing (Bletchley, Cambridge, ex-Fiance, arrest and chemical castration, Jungian therapist). And it’s not a book about him being sad, wronged, gay genius who kills himself. It’s a story about him living and him processes the changes in his internal life due to his punishment and his interactions with the outside world and his past and his musings on consciousness and computation.

All of which is to say Murmur is a complex, wonderfully written book that over its hundred and seventy something pages sketches out a fully realized world that doesn’t make me want to hide under the sofa in despair. And it does that rare thing for people telling this story: it doesn’t flatten out the story by focusing on the punchline but rather the person and the life behind it.

(As an aside, I’d strongly recommend Hannah Gadsby’s Netflix special Nanette, which talks about the damage we do by truncating our stories to make other people happy. It’s the only cultural document so far of 2018 worth going out of your way to see.)

I would recommend you find yourself a copy. It’s published by a small press, so order it from them or, if you’re in London, pop into Gays The Word and get yourself a copy.

**And now that you’ve made your way through the unexpected book review, let’s get to the point**

But the point of writing this post wasn’t to do a short review of a wonderful book (it was to annoy Aki who is waiting for me to finish something). But Murmur is a book that spends some time (as you inevitably do when considering an ersatz Turing) considering the philosophical implications of artificial intelligence. (Really selling it there Daniel.) And this parallels some discussion that I’ve been seeing around the traps about what we talk about when we talk about neural networks.

Also because the quote that I ripped from an absolutely wonderful run in the novel to unceremoniously shove at the top of this post made me think of how we use methods that we don’t fully understand.

The first paper I fell into (and incidentally reading papers on neural nets is my aforementioned questionable decision) has the direct title Polynomial Regression As an Alternative to Neural Nets, where Cheng, Khomtchouk, and Matloff argue that we might as well use polynomial regression as it’s easier to interpret than a NN and basically gives the same answer.

The main mathematical argument in the paper is that if you build a NN with a polynomial activation function, each layer gives a higher-order polynomial. They argue that the Stone-Weierstrass approximation theorem suggests that any activation function will lead to a NN that can be well approximated by a high-order polynomial.

Now as a general rule, anytime someone whips out Stone-Weierstrass I feel a little skeptical. Because the bit of me that remembers my approximation theory remembers that the construction in this theorem is very slow to converge. I’m also alarmed by the use of high-degree polynomial regression using the natural basis and no regularization. Both of these things are a **very bad idea**.

But the whole story–that neural networks can also be understood as being quite like polynomial regression and that analogy can allow us to improve our NN techniques–is the sort of story that we need to tell to understand how to make these methods better in a principled way.

(Honestly, I’m not a giant fan of a lot of the paper–it does polynomial regression in exactly the way people have been telling applied people they should never do it. But hey, there’s some interesting stuff in there.)

The other paper I read was a whole lot more interesting. Rather than trying to characterize a neural network as something else, it instead tries to argue that NNs should be used to process electronic health records. And it gets good results compared to standard methods, which is always a good sign. But that’s not what’s interesting.

The interesting thing comes via Twitter. Uri Shalit, who’s a prof at Technion, noticed something very interesting in the appendix. Table 1 in the appendix showed that regularized logistic regression performs almost exactly as well as the complicated deep net.

Once again, the Goliath of AI is slain by the David of “really boring statistical methods”.

But again, there’s more to this than that. Firstly, the logistic regression required some light “feature engineering” (ie people who knew what they were doing had to do something to the covariates). In particular, they had to separate them into time bins to allow the model to do a good job at modelling time. The Deep Net didn’t need that.

This particular case of feature engineering is trivial, but in a lot of cases it’s careful understanding of the structure of the problem that lets us use simple statistical techniques instead of something weird and complex. My favourite example of this is Bin Yu’s work where she (read: her and a lot of collaborators) basically reconstructed movies a person was watching from an fMRI scan! The way they did it was to understand how certain patterns excite certain pathway (basically using existing science) and putting those activation patterns in as covariates in a LASSO regression. So the simplest modern technique + feature engineering (aka science) gave *fabulous *results.

The argument for all of these complex AI and deep learning methods is that they allow us to be a little more sloppy with the science. And it seems to work quite well for images and movies, which are extremely structured. But electronic health records are not particularly structured and can have some really weird missingness problems, so it’s not clear that the same methods will have as much room to move. In this case a pretty boring regularized logistic regression does almost exactly as well, which suggests that the deep net is not able to really fly.

The path forward now is to start understanding these cases, working out how things work, when things work, and what techniques we can beg borrow and steal from other areas of stats and machine learning. Because deep learning is not a panacea, it’s just a boy standing in front of a girl asking her to love him.

The post In my role as professional singer and ham appeared first on Statistical Modeling, Causal Inference, and Social Science.