Posts Tagged ‘Tutorials’

Principal Components Regression, Pt. 2: Y-Aware Methods

May 23, 2016

In our previous note, we discussed some problems that can arise when using standard principal components analysis (specifically, principal components regression) to model the relationship between independent (x) and dependent (y) variables. In this note, we present some dimensionality reduction techniques that alleviate some of those problems, in particular what we call Y-Aware Principal Components …

Principal Components Regression, Pt.1: The Standard Method

May 17, 2016

In this note, we discuss principal components regression and some of the issues with it: the need for scaling, the need for pruning, and the lack of “y-awareness” of the standard dimensionality reduction step. The purpose of this article is to set the stage for presenting dimensionality reduction techniques appropriate for predictive modeling, such as y-aware …

Coming up: principal components analysis

May 7, 2016

Just a “heads-up.” I’ve been editing a three-part series Nina Zumel is writing on some of the pitfalls of improperly applied principal components analysis/regression and how to avoid them (we are using the plural spelling, following Everitt’s The Cambridge Dictionary of Statistics). The series is looking absolutely fantastic and I think …

vtreat cross frames

May 5, 2016

John Mount, Nina Zumel, 2016-05-05. As a follow-on to “On Nested Models,” we work through R examples demonstrating “cross validated training frames” (or “cross frames”) in vtreat. Consider the following data frame. The outcome depends only on the “good” variables, not on the (high degree of freedom) “bad” variables. Modeling such a …

On Nested Models

April 26, 2016

We have recently been working on, and presenting on, nested modeling issues. These are situations where the output of one trained machine learning model is part of the input of a later model or procedure. I am now of the opinion that correct treatment of nested models is one of the biggest opportunities for improvement …

Improved vtreat documentation

April 17, 2016

Nina Zumel has donated some time to greatly improve the vtreat R package documentation (now available as pre-rendered HTML here). vtreat is an R data.frame processor/conditioner package that helps prepare real-world data for predictive modeling in a statistically sound manner. Even with modern machine learning techniques (random forests, support vector machines, neural nets, gradient boosted …

A bit on the F1 score floor

April 2, 2016

At the Strata+Hadoop World “R Day” Tutorial (Tuesday, March 29, 2016, San Jose, California), we spent some time on classifier measures derived from the so-called “confusion matrix.” We repeated our usual admonition to not use “accuracy itself” as a project quality goal (business people tend to ask for it as it is the word they are …

WVPlots: example plots in R using ggplot2

April 1, 2016

Nina Zumel and I have been working on packaging our favorite graphing techniques in a more reusable way that emphasizes the analysis task at hand over the steps needed to produce a good visualization. The idea is: we sacrifice some of the flexibility and composability inherent to ggplot2 in R for a menu of prescribed …

Upcoming Win-Vector LLC appearances

March 23, 2016

Win-Vector LLC will be presenting on statistically validating models using R and data science at the Strata+Hadoop World “R Day” Tutorial (9:00am–5:00pm, Tuesday, March 29, 2016, San Jose, California) and at the ODSC San Francisco Meetup (6:30pm–9:00pm, Thursday, March 31, 2016, San Francisco, California). We will share code and examples. Registration required (and Strata is a paid conference). Please …

More on preparing data

March 18, 2016

The Microsoft Data Science User Group just sponsored Dr. Nina Zumel’s presentation “Preparing Data for Analysis Using R”. Microsoft saw Win-Vector LLC’s ODSC West 2015 presentation “Prepping Data for Analysis using R” and generously offered to sponsor improving it and disseminating it to a wider audience. We feel Nina really hit the ball out of …