Blog Archives

Announcing: Introduction to Data Science video course

February 25, 2015

Win-Vector LLC’s Nina Zumel and John Mount are proud to announce that their new data science video course, Introduction to Data Science, is now available on Udemy. We designed the course as an introduction to an advanced topic. The course description is: Use the R Programming Language to execute data science projects and become a data …

Check your return types when modeling in R

January 27, 2015

Just a warning: double-check your return types in R, especially when using different modeling packages. We consider ourselves pretty familiar with R. We have years of experience, many other programming languages to compare R to, and we have taken Hadley Wickham’s Master R Developer Workshop (highly recommended). We already knew R’s predict function is …

R bracket is a bit irregular

January 17, 2015

While skimming Professor Hadley Wickham’s Advanced R, I got to thinking about the nature of the square-bracket or extract operator in R. It turns out “[,]” is a bit more irregular than I remembered. The subsetting section of Advanced R has a very good discussion of the subsetting and selection operators found in R. In particular …

Is there a Kindle edition of Practical Data Science with R?

December 21, 2014

We have often been asked “why is there no Kindle edition of Practical Data Science with R on Amazon.com?” The short answer is: there is an edition you can read on your Kindle, but it is from the publisher Manning (not Amazon.com). The long answer is: when Amazon.com supplies a Kindle edition, readers have to …

A comment on preparing data for classifiers

December 4, 2014

I have been working through (with some honest appreciation) a recent article comparing many classifiers on many data sets: “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?” Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, Dinani Amorim; Journal of Machine Learning Research 15(Oct):3133–3181, 2014 (which we will call “the DWN paper” in this note). This paper applies 179 …

Can we try to make an adjustment?

November 14, 2014

In most of our data science teaching (including our book Practical Data Science with R) we emphasize the deliberately easy problem of “exchangeable prediction.” We define exchangeable prediction as: given a series of observations with two distinguished classes of variables/observations denoted “x”s (denoting control variables, independent variables, experimental variables, or predictor variables) and “y” (denoting …

Bias/variance tradeoff as gamesmanship

October 30, 2014

Continuing our series of reading out loud from a single page of a statistics book, we look at page 224 of the 1972 Dover edition of Leonard J. Savage’s “The Foundations of Statistics.” On this page we are treated to an example attributed to Leo A. Goodman in 1953 that illustrates how for normally distributed …

Factors are not first-class citizens in R

September 23, 2014

The primary user-facing data types in the R statistical computing environment behave as vectors. That is: one-dimensional arrays of scalar values that have a nice operational algebra. There are additional types (lists, data frames, matrices, environments, and so on) but the most common data types are vectors. In fact, vectors are so common in R …

Reading the Gauss-Markov theorem

August 26, 2014

What is the Gauss-Markov theorem? From “The Cambridge Dictionary of Statistics” B. S. Everitt, 2nd Edition: A theorem that proves that if the error terms in a multiple regression have the same variance and are uncorrelated, then the estimators of the parameters in the model produced by least squares estimation are better (in the sense …

Automatic bias correction doesn’t fix omitted variable bias

July 8, 2014

Page 94 of Gelman, Carlin, Stern, Dunson, Vehtari, Rubin “Bayesian Data Analysis” 3rd Edition (which we will call BDA3) provides a great example of what happens when common broad frequentist bias criticisms are over-applied to predictions from ordinary linear regression: the predictions appear to fall apart. BDA3 goes on to exhibit what might be considered …

