Blog Archives

On accuracy

July 22, 2016

In our last article on the algebra of classifier measures we encouraged readers to work through Nina Zumel’s original “Statistics to English Translation” series. This series has become slightly harder to find as we have used the original category designation “statistics to English translation” for additional work. To make things easier here are links to …


A budget of classifier evaluation measures

July 22, 2016

Beginning analysts and data scientists often ask: “how does one remember and master the seemingly endless number of classifier metrics?” My concrete advice is: Read Nina Zumel’s excellent series on scoring classifiers. Keep notes. Settle on one or two metrics as you move project to project. We prefer “AUC” early in a project (when you …


vtreat version 0.5.26 released on CRAN

July 12, 2016

Win-Vector LLC, Nina Zumel and I are pleased to announce that ‘vtreat’ version 0.5.26 has been released on CRAN. ‘vtreat’ is a data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. (from the package documentation) ‘vtreat’ is an R package that incorporates a number of transforms and simulated out of …


y-aware scaling in context

June 22, 2016

Nina Zumel introduced y-aware scaling in her recent article Principal Components Regression, Pt. 2: Y-Aware Methods. I really encourage you to read the article and add the technique to your repertoire. The method combines well with other methods and can drive better predictive modeling results. From feedback I am not sure everybody noticed that in …


Why you should read Nina Zumel’s 3 part series on principal components analysis and regression

June 9, 2016

Short form: Win-Vector LLC’s Dr. Nina Zumel has a three-part series on Principal Components Regression that we think is well worth your time. Part 1: the proper preparation of data (including scaling) and use of principal components analysis (particularly for supervised learning or regression). Part 2: the introduction of y-aware scaling to direct the …


Free e-book: Exploring Data Science

June 8, 2016

We are pleased to announce a new free e-book from Manning Publications: Exploring Data Science. Exploring Data Science is a collection of five chapters hand-picked by John Mount and Nina Zumel, introducing you to various areas in data science and explaining which methodologies work best for each. Exploring Data Science gives you a free …


A demonstration of vtreat data preparation

June 2, 2016

This article is a demonstration of the use of the R vtreat variable preparation package followed by caret controlled training. In previous writings we have gone to great lengths to document, explain and motivate vtreat. That necessarily gets long and unnecessarily feels complicated. In this example we are going to show what building a predictive model …


On ranger respect.unordered.factors

May 30, 2016

It is often said that “R is its packages.” One package of interest is ranger, a fast parallel C++ implementation of random forest machine learning. Ranger is a great package and at first glance appears to remove the “only 63 levels allowed for string/categorical variables” limit found in the Fortran randomForest package. Actually this appearance is …


For a short time: Half Off Some Manning Data Science Books

May 12, 2016

Our publisher Manning Publications is celebrating the release of a new data science in Python title Introducing Data Science by offering it and other Manning titles at half off until Wednesday, May 18. As part of the promotion you can also use the supplied discount code mlcielenlt for half off some R titles including R …


Coming up: principal components analysis

May 7, 2016

Just a “heads-up.” I’ve been editing a three-part series Nina Zumel is writing on some of the pitfalls of improperly applied principal components analysis/regression and how to avoid them (we are using the plural spelling, following Everitt, The Cambridge Dictionary of Statistics). The series is looking absolutely fantastic and I think …


