NSF should understand that Statistics is not Mathematics

January 11, 2013

(This article was originally published at Simply Statistics, and syndicated at StatsBlogs.)

NSF has realized that the role of Statistics is growing in all areas of science and engineering and has formed a subcommittee to examine the current structure of support of the statistical sciences. As Roger explained in August, the NSF is divided into directorates composed of divisions. Statistics is in the Division of Mathematical Sciences (DMS) within the Directorate for Mathematical and Physical Sciences. Within this division it is a Disciplinary Research Program along with Topology, Geometric Analysis, etc. To statisticians this does not make much sense, and my first thought when asked for recommendations was that we need a proper division. But the committee is seeking out recommendations that

[do] not include renaming of the Division of Mathematical Sciences. Particularly desired are recommendations that can be implemented within the current divisional and directorate structure of NSF.

This clarification is there because former director Sastry Pantula suggested that DMS change its name to “Division of Mathematical and Statistical Sciences”. The NSF shot down this idea and listed this as one of the reasons:

If the name change attracts more proposals to the Division from the statistics community, this could draw funding away from other subfields

So NSF does not want to take away from the other math programs, and this is understandable given the current levels of research funding for Mathematics. But this being the case, I can’t really think of a recommendation other than giving Statistics its own division or giving data-related sciences their own directorate. Increasing support for the statistical sciences means increasing funding. You secure the necessary funding either by asking Congress for a bigger budget (good luck with that) or by cutting from other divisions, not just Mathematics. A new division makes sense not only in practice but also in principle because Statistics is not Mathematics.

Statistics is analogous to other disciplines that use mathematics as a fundamental language, like Physics, Engineering, and Computer Science. But like those disciplines, Statistics contributes separate and fundamental scientific knowledge. While the field of applied mathematics tries to explain the world with deterministic equations, Statistics takes a dramatically different approach. In highly complex systems, such as the weather, Mathematicians battle Laplace’s demon and struggle to explain nature using mathematics derived from first principles. Statisticians accept that deterministic approaches are not always useful and instead develop and rely on random models. These two approaches are both important, as demonstrated by the improvements in meteorological predictions achieved once data-driven statistical models were used to complement deterministic mathematical models.

Although Statisticians rely heavily on theoretical/mathematical thinking, another important distinction from Mathematics is that advances in our field are almost exclusively driven by empirical work. Statistics always starts with a specific, concrete real-world problem: we thrive in Pasteur’s quadrant. Important theoretical work that generalizes our solutions always follows. This approach, built mostly by basic researchers, has yielded some of the most useful concepts relied upon by modern science: the p-value, randomization, analysis of variance, regression, the proportional hazards model, causal inference, Bayesian methods, and the Bootstrap, just to name a few examples. These ideas were instrumental in the most important genetic discoveries, improving agriculture, the inception of the empirical social sciences, and revolutionizing medicine via randomized clinical trials. They have also fundamentally changed the way we abstract quantitative problems from real data.

The 21st century brings the era of big data, and distinguishing Statistics from Mathematics becomes more important than ever. Many areas of science are now being driven by new measurement technologies. Insights are being gained from discovery-driven, as opposed to hypothesis-driven, experiments. Although testing hypotheses developed theoretically will of course remain important to science, it is entirely conceivable that, just as Leeuwenhoek became the father of microbiology by looking through the microscope without theoretical predictions, the era of big data will enable discoveries that we have not yet even imagined. However, it is naive to think that these new datasets will be free of noise and unwanted variability. Deterministic models alone will almost certainly fail at extracting useful information from these data, just as they have failed at predicting complex systems like the weather. The advancement in science during the era of big data that the NSF wants to see will only happen if the field that specializes in making sense of data is properly defined as a separate field from Mathematics and appropriately supported.

Addendum: On a related note, NIH just announced that it plans to recruit for a new senior scientific position: the Associate Director for Data Science.
