Blog Archives

y-aware scaling in context

June 22, 2016

Nina Zumel introduced y-aware scaling in her recent article Principal Components Regression, Pt. 2: Y-Aware Methods. I really encourage you to read the article and add the technique to your repertoire. The method combines well with other methods and can drive better predictive modeling results. From feedback I am not sure everybody noticed that in …

Why you should read Nina Zumel’s 3 part series on principal components analysis and regression

June 9, 2016

Short form: Win-Vector LLC’s Dr. Nina Zumel has a three-part series on Principal Components Regression that we think is well worth your time. Part 1: the proper preparation of data (including scaling) and use of principal components analysis (particularly for supervised learning or regression). Part 2: the introduction of y-aware scaling to direct the …

Free e-book: Exploring Data Science

June 8, 2016

We are pleased to announce a new free e-book from Manning Publications: Exploring Data Science. Exploring Data Science is a collection of five chapters hand-picked by John Mount and Nina Zumel, introducing you to various areas in data science and explaining which methodologies work best for each. Exploring Data Science gives you a free …

A demonstration of vtreat data preparation

June 2, 2016

This article is a demonstration of the use of the R vtreat variable preparation package followed by caret-controlled training. In previous writings we have gone to great lengths to document, explain and motivate vtreat. That necessarily gets long and unnecessarily feels complicated. In this example we are going to show what building a predictive model …

On ranger respect.unordered.factors

May 30, 2016

It is often said that “R is its packages.” One package of interest is ranger, a fast parallel C++ implementation of random forest machine learning. Ranger is a great package and at first glance appears to remove the “only 63 levels allowed for string/categorical variables” limit found in the Fortran randomForest package. Actually this appearance is …

For a short time: Half Off Some Manning Data Science Books

May 12, 2016

Our publisher Manning Publications is celebrating the release of a new data-science-in-Python title, Introducing Data Science, by offering it and other Manning titles at half off until Wednesday, May 18. As part of the promotion you can also use the supplied discount code mlcielenlt for half off some R titles including R …

Coming up: principal components analysis

May 7, 2016

Just a “heads-up.” I’ve been editing a three-part series Nina Zumel is writing on some of the pitfalls of improperly applied principal components analysis/regression and how to avoid them (we are using the plural spelling, following Everitt, The Cambridge Dictionary of Statistics). The series is looking absolutely fantastic and I think …

vtreat cross frames

May 5, 2016

John Mount, Nina Zumel, 2016-05-05. As a follow-on to “On Nested Models” we work R examples demonstrating “cross validated training frames” (or “cross frames”) in vtreat. Consider the following data frame. The outcome only depends on the “good” variables, not on the (high degree of freedom) “bad” variables. Modeling such a …

On Nested Models

April 26, 2016

We have recently been working on, and presenting on, nested modeling issues. These are situations where the output of one trained machine learning model is part of the input of a later model or procedure. I am now of the opinion that correct treatment of nested models is one of the biggest opportunities for improvement …

Improved vtreat documentation

April 17, 2016

Nina Zumel has donated some time to greatly improve the vtreat R package documentation (now available as pre-rendered HTML). vtreat is an R data.frame processor/conditioner package that helps prepare real-world data for predictive modeling in a statistically sound manner. Even with modern machine learning techniques (random forests, support vector machines, neural nets, gradient boosted …

