Posts Tagged ‘Tutorials’

Variable pruning is NP hard

August 28, 2016

I am working on some practical articles on variable selection, especially in the context of stepwise linear regression and logistic regression. One thing I noticed while preparing some examples is that summaries such as model quality (especially out-of-sample quality) and variable significances are not quite as simple as one would hope (they in …


My criticism of R numeric summary

August 18, 2016

My criticism of R’s numeric summary() method is: it is unfaithful to numeric arguments (due to bad default behavior) and frankly it should be considered unreliable. It is likely the way it is for historic and compatibility reasons, but in my opinion it does not currently represent a desirable set of tradeoffs. summary() likely represents …


The Win-Vector parallel computing in R series

August 16, 2016

With our recent publication of “Can you nest parallel operations in R?” we now have a nice series of “how to speed up statistical computations in R” that moves from application, to larger/cloud application, and then to details. For your convenience here they are in order: A gentle introduction to parallel computing in R Running …


On accuracy

July 22, 2016

In our last article on the algebra of classifier measures we encouraged readers to work through Nina Zumel’s original “Statistics to English Translation” series. This series has become slightly harder to find as we have used the original category designation “statistics to English translation” for additional work. To make things easier, here are links to …


y-aware scaling in context

June 22, 2016

Nina Zumel introduced y-aware scaling in her recent article Principal Components Regression, Pt. 2: Y-Aware Methods. I really encourage you to read the article and add the technique to your repertoire. The method combines well with other methods and can drive better predictive modeling results. From feedback I am not sure everybody noticed that in …


Why you should read Nina Zumel’s 3 part series on principal components analysis and regression

June 9, 2016

Short form: Win-Vector LLC’s Dr. Nina Zumel has a three part series on Principal Components Regression that we think is well worth your time. Part 1: the proper preparation of data (including scaling) and use of principal components analysis (particularly for supervised learning or regression). Part 2: the introduction of y-aware scaling to direct the …


A demonstration of vtreat data preparation

June 2, 2016

This article is a demonstration of the use of the R vtreat variable preparation package followed by caret-controlled training. In previous writings we have gone to great lengths to document, explain and motivate vtreat. That necessarily gets long and unnecessarily feels complicated. In this example we are going to show what building a predictive model …


On ranger respect.unordered.factors

May 30, 2016

It is often said that “R is its packages.” One package of interest is ranger, a fast parallel C++ implementation of random forest machine learning. Ranger is a great package and at first glance appears to remove the “only 63 levels allowed for string/categorical variables” limit found in the Fortran randomForest package. Actually this appearance is …


Principal Components Regression, Pt. 2: Y-Aware Methods

May 23, 2016

In our previous note, we discussed some problems that can arise when using standard principal components analysis (specifically, principal components regression) to model the relationship between independent (x) and dependent (y) variables. In this note, we present some dimensionality reduction techniques that alleviate some of those problems, in particular what we call Y-Aware Principal Components …


Principal Components Regression, Pt.1: The Standard Method

May 17, 2016

In this note, we discuss principal components regression and some of the issues with it: The need for scaling. The need for pruning. The lack of “y-awareness” of the standard dimensionality reduction step. The purpose of this article is to set the stage for presenting dimensionality reduction techniques appropriate for predictive modeling, such as y-aware …


