Posts Tagged ‘data’

Composite ranking and numbersense

February 25, 2015

Chapter 1 of Numbersense (link) uses the example of the U.S. News ranking of law schools to explore the national pastime of ranking almost anything. Since there is no objective standard for the "correct" ranking, it is pointless to complain about "arbitrary" weighting and so on. Every replacement has its own assumptions. A more productive path forward is to understand how the composite ranking is created, and shine a light on the…

Optimizely Stats Engine 2: what about advanced users?

February 9, 2015

In Part 1, I covered the logic behind recent changes to the statistical analysis used in standard reports by Optimizely. In Part 2, I ponder what this change means for more sophisticated customers--those who are following the proper protocols for classical design of experiments, such as running tests of predetermined sample sizes, adjusting for multiple comparisons, and constructing and analyzing multivariate tests using regression with interactions. For this segment, the…

More 3D Graphics (rgl) for Classification with Local Logistic Regression and Kernel Density Estimates (from The Elements of Statistical Learning)

February 7, 2015

This post builds on a previous post, but can be read and understood independently. As part of my course on statistical learning, we created 3D graphics to foster a more intuitive understanding of the various methods that are used to relax the assumption of linearity (in the predictors) in regression and classification methods. The authors […]

Deflate-gate, Part 2: not average != extreme, and Sunday talk shows

February 2, 2015

Last week, I pointed out the futility of using data as proof or disproof in Deflate-gate. Emphatically, a case of "N=All" does not make things better. I later edited the post for HBR (link). In this post, I want to address a couple of more subtle technical issues related to the Sharp analysis, which can be summarized as follows: 1. New England is an outlier in the plays per fumbles…

Some 3D Graphics (rgl) for Classification with Splines and Logistic Regression (from The Elements of Statistical Learning)

February 1, 2015

This semester I'm teaching from Hastie, Tibshirani, and Friedman's book, The Elements of Statistical Learning, 2nd Edition. The authors provide a Mixture Simulation data set that has two continuous predictors and a binary outcome. This data is used to demonstrate classification procedures by plotting classification boundaries in the two predictors. For example, the figure below […]

Some Suggested Reading

January 31, 2015

Bachoc, F., H. Leeb, and B. M. Potscher, 2014. Valid confidence intervals for post-model-selection predictors. Working Paper, Department of Statistics, University of Vienna. Baumeister, C. and J. D. Hamilton, 2014. Sign restrictions, structural vector au...

Football and statistics, on HBR!

January 30, 2015

I was asked to adapt my earlier post for the HBR audience, and the new version is now up on HBR. Here is the link. I'm happy that they picked up this post because most business problems concern reverse causation. A small subset of problems can be solved using A/B testing, but only those in which causes are known in advance and subject to manipulation. Even then, Facebook got into…

Limits of statistics, and by extension data science, as illustrated by Deflate-gate

January 27, 2015

A number of readers sent me Warren Sharp's piece about the ongoing New England Patriots' deflate-gate scandal (link to Slate's version of this), so I suppose I should say something about it. For those readers who are not into American football, the Super Bowl is soon upon us. New England, one of the two finalists, has been accused of using footballs that are below the weight requirements in the rulebook, hence…

Why you need a second pair of eyes

January 13, 2015

Reader Aaron K. submitted an infographic advertising the upcoming New England Auto Show to be held in Boston (link). As Aaron pointed out, there are plenty of elementary errors contained in one page. I don't think the designer did these...

Trifacta revisited: tackling a Big Data problem

January 12, 2015

During my vacation, I had a chance to visit Trifacta, the data-wrangling startup I blogged about last year (link). Wei Zheng, Tye Rattenbury, and Will Davis hosted me, and showed some of the new stuff they are working on. Trifacta is tackling a major Big Data problem, and I remain excited about the direction they are heading. From the beginning, I have been attracted by Trifacta’s user interface. The user in…
