Rise of the Machines

February 16, 2013

(This article was originally published at Normal Deviate, and syndicated at StatsBlogs.)

The Committee of Presidents of Statistical Societies (COPSS) is celebrating its 50th Anniversary. They have decided to publish a collection and I was honored to be invited to contribute. The theme of the book is Past, Present and Future of Statistical Science.

My paper, entitled Rise of the Machines, can be found here.

To whet your appetite, here is the beginning of the paper.

Larry Wasserman


On the 50th anniversary of the Committee of Presidents of Statistical Societies I reflect on the rise of the field of Machine Learning and what it means for Statistics. Machine Learning offers a plethora of new research areas, new application areas and new colleagues to work with. Our students now compete with Machine Learning students for jobs. I am optimistic that visionary Statistics departments will embrace this emerging field; those that ignore or eschew Machine Learning do so at their own risk and may find themselves in the rubble of an outdated, antiquated field.

1. Introduction

Statistics is the science of learning from data. Machine Learning (ML) is the science of learning from data. These fields are identical in intent although they differ in their history, conventions, emphasis and culture.

There is no denying the success and importance of the field of Statistics for science and, more generally, for society. I’m proud to be a part of the field. The focus of this essay is on one challenge (and opportunity) to our field: the rise of Machine Learning.

During my twenty-five-year career I have seen Machine Learning evolve from a collection of rather primitive (yet clever) methods for classification into a sophisticated science that is rich in theory and applications.

A quick glance at The Journal of Machine Learning Research (mlr.csail.mit.edu) and NIPS (books.nips.cc) reveals papers on a variety of topics that will be familiar to Statisticians, such as conditional likelihood, sequential design, reproducing kernel Hilbert spaces, clustering, bioinformatics, minimax theory, sparse regression, estimating large covariance matrices, model selection, density estimation, graphical models, wavelets, and nonparametric regression. These could just as well be papers in our flagship statistics journals.

This sampling of topics should make it clear that researchers in Machine Learning — who were at one time somewhat unaware of mainstream statistical methods and theory — are now not only aware of, but actively engaged in, cutting edge research on these topics.

On the other hand, there are statistical topics that are active areas of research in Machine Learning but are virtually ignored in Statistics. To avoid becoming irrelevant, we Statisticians need to (i) stay current on research areas in ML, (ii) change our outdated model for disseminating knowledge, and (iii) revamp our graduate programs.

The rest of the paper can be found here.
