Posts Tagged ‘Practical Data Science’

R Tip: Use the vtreat Package For Data Preparation

March 11, 2018

If you are working with predictive modeling or machine learning in R, this is the R tip that is going to save you the most time and deliver the biggest improvement in your results. R Tip: Use the vtreat package for data preparation in predictive analytics and machine learning projects. When attempting predictive modeling with …

Data Reshaping with cdata

January 17, 2018

I’ve just shared a short webcast on data reshaping in R using the cdata package. We also have two really nifty articles on the theory and methods: “Fluid data reshaping with cdata” and “Coordinatized Data: A Fluid Data Specification”. Please give it a try! This is the material I recently presented at the January 2017 …
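To make “reshaping” concrete, here is a minimal Python sketch of the two basic moves: unpivoting a wide table (one row per record) into a long table (one row per record/measure pair), and pivoting it back. The function names `unpivot_to_long` and `pivot_to_wide` and the list-of-dicts table representation are our own illustration, not the cdata API; cdata works on R data frames and, as we understand it, handles more general record layouts than this one-column-per-measure case.

```python
from typing import Dict, List

Row = Dict[str, object]


def unpivot_to_long(rows: List[Row], id_col: str, value_cols: List[str],
                    key_name: str = "measure", value_name: str = "value") -> List[Row]:
    """Wide -> long: emit one (id, measure, value) row per listed value column."""
    out: List[Row] = []
    for row in rows:
        for col in value_cols:
            out.append({id_col: row[id_col], key_name: col, value_name: row[col]})
    return out


def pivot_to_wide(rows: List[Row], id_col: str,
                  key_name: str = "measure", value_name: str = "value") -> List[Row]:
    """Long -> wide: collect the (id, measure, value) rows back into one record per id."""
    records: Dict[object, Row] = {}
    for row in rows:
        # Start a new record the first time an id is seen; dicts keep insertion order.
        rec = records.setdefault(row[id_col], {id_col: row[id_col]})
        rec[row[key_name]] = row[value_name]
    return list(records.values())


wide = [{"id": 1, "a": 10, "b": 20}, {"id": 2, "a": 30, "b": 40}]
long = unpivot_to_long(wide, "id", ["a", "b"])
print(long[0])                          # {'id': 1, 'measure': 'a', 'value': 10}
print(pivot_to_wide(long, "id") == wide)  # True
```

The round trip shows the two transforms undoing each other on this data, which is the property that makes moving between layouts safe.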

Big cdata News

January 4, 2018

I have some big news about our R package cdata. We have greatly improved the calling interface, and Nina Zumel has just written the definitive introduction to cdata. cdata is our general coordinatized data tool. It is what powers the deep learning performance graph (here demonstrated with R and Keras) that I announced a while …

Partial Pooling for Lower Variance Variable Encoding

September 28, 2017

Banaue rice terraces. Photo: Jon Rawlinson

In a previous article, we showed the use of partial pooling, or hierarchical/multilevel models, for level coding high-cardinality categorical variables in vtreat. In this article, we will discuss a little more about the how and why of partial pooling in R. We will use the lme4 package to fit …
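A minimal sketch of the partial-pooling idea, assuming the simplest normal model: each level's estimate is a precision-weighted average of that level's observed mean and the grand mean. Here the within-level variance `sigma2` and the between-level variance `tau2` are assumed known and passed in by hand; a package such as lme4 would instead estimate them (and the grand mean) from the data. Treat this as an illustration of the shrinkage, not of lme4 or vtreat.

```python
from statistics import mean
from typing import Dict, List


def partial_pool(groups: Dict[str, List[float]], sigma2: float, tau2: float) -> Dict[str, float]:
    """Shrink each level's mean toward the grand mean.

    groups: level -> observed outcomes for that level
    sigma2: within-level (noise) variance, ASSUMED known here
    tau2:   between-level variance of the true level means, ASSUMED known here
    """
    all_y = [y for ys in groups.values() for y in ys]
    mu = mean(all_y)  # grand mean (simplification: a fitted model would estimate this)
    estimates: Dict[str, float] = {}
    for level, ys in groups.items():
        w_data = len(ys) / sigma2   # precision of the level's own sample mean
        w_prior = 1.0 / tau2        # precision of the "typical level" (grand mean)
        estimates[level] = (w_data * mean(ys) + w_prior * mu) / (w_data + w_prior)
    return estimates


est = partial_pool({"a": [4.0, 4.0, 4.0, 4.0], "b": [0.0]}, sigma2=1.0, tau2=1.0)
print(est)  # "b" (one row) is pulled much further toward the grand mean than "a"
```

Levels with many rows stay close to their own mean, while a level seen only once is pulled strongly toward the grand mean; subtracting the grand mean from these estimates gives a lower-variance level code than the raw per-level means.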

Custom Level Coding in vtreat

September 25, 2017

One of the services that the R package vtreat provides is level coding (what we sometimes call impact coding): converting the levels of a categorical variable to a meaningful and concise single numeric variable, rather than coding them as indicator variables (AKA "one-hot encoding"). Level coding can be computationally and statistically preferable to one-hot encoding …
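The contrast between the two codings can be sketched in a few lines of Python. This is a naive, illustrative version of impact coding (per-level mean outcome minus the grand mean), not vtreat's implementation. In particular it fits the codes on the same rows it codes, which tends to overfit, and mapping unseen levels to 0.0 is our own assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def one_hot(values: List[str]) -> Tuple[List[List[int]], List[str]]:
    """Indicator coding: one 0/1 column per distinct level."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return rows, levels


def fit_impact_code(values: List[str], y: List[float]) -> Dict[str, float]:
    """Naive impact coding: per-level mean outcome minus the grand mean.

    Fit in-sample for illustration only; codes fit on the training rows
    themselves tend to overstate how informative rare levels are.
    """
    grand = mean(y)
    by_level: Dict[str, List[float]] = defaultdict(list)
    for level, target in zip(values, y):
        by_level[level].append(target)
    return {level: mean(ts) - grand for level, ts in by_level.items()}


def apply_impact_code(values: List[str], codes: Dict[str, float]) -> List[float]:
    """Map each level to its single numeric code; unseen levels get 0.0 (the grand mean)."""
    return [codes.get(v, 0.0) for v in values]


x = ["x", "x", "y", "z"]
rows, levels = one_hot(x)                      # 3 indicator columns
codes = fit_impact_code(x, [1.0, 3.0, 4.0, 0.0])
print(levels, codes)                           # one numeric column instead
print(apply_impact_code(["y", "new"], codes))
```

With thousands of levels, one-hot coding grows one column per level while the impact code stays a single column, which is where the computational advantage comes from.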

Upcoming data preparation and modeling article series

September 23, 2017

I am pleased to announce that vtreat version 0.6.0 is now available to R users on CRAN. vtreat is an excellent way to prepare data for machine learning, statistical inference, and predictive analytics projects. If you are an R user, we strongly suggest you incorporate vtreat into your projects. vtreat handles, in a statistically sound …

Supervised Learning in R: Regression

August 14, 2017

We are very excited to announce a new (paid) Win-Vector LLC video training course, Supervised Learning in R: Regression, now available on DataCamp. The course is primarily authored by Dr. Nina Zumel (our chief of course design) with contributions from Dr. John Mount. This course will get you quickly up to speed covering: What is …

More documentation for Win-Vector R packages

July 29, 2017

The Win-Vector public R packages now all have new pkgdown documentation sites! (And a thank-you to Hadley Wickham for developing the pkgdown tool.) Please check them out (hint: vtreat is our favorite). The package sites: cdata, replyr, seplyr, sigr, vtre...

Join Dependency Sorting

July 1, 2017

In our latest installment of “R and big data”, let’s again discuss the task of left joining many tables from a data warehouse using R and a system called “a join controller” (last discussed here). One of the great advantages of specifying complicated sequences of operations in data (rather than in code) is: it is …
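A minimal sketch of dependency sorting, under assumptions of our own: each table in a hypothetical join plan lists the key columns it needs from the result so far and the new columns it provides, and a table may only be joined after the tables that provide its keys. Ordering the joins is then a topological sort, done here with Python's standard `graphlib` module (Python 3.9+). The plan format is an illustration, not the replyr join controller's API or data format.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical plan format: table -> {"keys": columns needed to join it,
#                                     "provides": new columns it adds}
Plan = Dict[str, Dict[str, Set[str]]]


def join_order(plan: Plan) -> List[str]:
    """Return an order in which every table comes after the providers of its keys."""
    provider: Dict[str, str] = {}
    for table, spec in plan.items():
        for col in spec["provides"]:
            provider.setdefault(col, table)  # first table in the plan wins
    graph: Dict[str, Set[str]] = {}
    for table, spec in plan.items():
        deps: Set[str] = set()
        for col in spec["keys"]:
            if col not in provider:
                raise ValueError(f"no table provides key column {col!r} needed by {table!r}")
            if provider[col] != table:
                deps.add(provider[col])
        graph[table] = deps  # graphlib expects node -> its predecessors
    # static_order() yields predecessors first; raises graphlib.CycleError on cycles.
    return list(TopologicalSorter(graph).static_order())


# Listed out of order on purpose; the sort recovers a valid join sequence.
plan: Plan = {
    "regions":   {"keys": {"region_id"},   "provides": {"region_name"}},
    "customers": {"keys": {"customer_id"}, "provides": {"region_id", "customer_name"}},
    "orders":    {"keys": set(),           "provides": {"order_id", "customer_id"}},
}
print(join_order(plan))  # ['orders', 'customers', 'regions']
```

Because the plan is plain data, the same structure can be checked, sorted, and printed before any join is run.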

Use a Join Controller to Document Your Work

June 13, 2017

This note describes a useful replyr tool we call a “join controller” (it is part of our “R and Big Data” series; please see here for the introduction, and here for one of our big data courses). When working on real world predictive modeling tasks in production, the ability to join data and document how you …
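To illustrate the “plan as data” idea, here is a minimal Python sketch in which one hypothetical plan (a list of steps, each naming a table, its join keys, and the columns to bring over) both drives a sequence of left joins and renders a human-readable description of them. The plan format and the function names `execute_plan` and `describe_plan` are assumptions for illustration, not replyr's join controller.

```python
from typing import Dict, List

Row = Dict[str, object]
Step = Dict[str, object]  # hypothetical: {"table": str, "keys": [str], "columns": [str]}


def execute_plan(tables: Dict[str, List[Row]], plan: List[Step]) -> List[Row]:
    """Start from the first table, then left join each later table on its keys."""
    first, *rest = plan
    result = [{c: row[c] for c in first["columns"]} for row in tables[first["table"]]]
    for step in rest:
        keys = step["keys"]
        index: Dict[tuple, Row] = {}
        for row in tables[step["table"]]:
            # Assumes unique keys on the right; if not, the first match wins.
            index.setdefault(tuple(row[k] for k in keys), row)
        for rec in result:
            match = index.get(tuple(rec[k] for k in keys))
            for c in step["columns"]:
                rec[c] = match[c] if match is not None else None  # left join: keep unmatched rows
    return result


def describe_plan(plan: List[Step]) -> List[str]:
    """Render the same plan as documentation, one line per step."""
    lines = []
    for i, step in enumerate(plan):
        how = "base table" if i == 0 else "left join on " + ", ".join(step["keys"])
        lines.append(f"{step['table']} ({how}): {', '.join(step['columns'])}")
    return lines


tables = {
    "orders": [{"order_id": 1, "customer_id": "c1"}, {"order_id": 2, "customer_id": "c9"}],
    "customers": [{"customer_id": "c1", "name": "Ann"}],
}
plan = [
    {"table": "orders", "keys": [], "columns": ["order_id", "customer_id"]},
    {"table": "customers", "keys": ["customer_id"], "columns": ["name"]},
]
print("\n".join(describe_plan(plan)))
print(execute_plan(tables, plan))
```

Keeping execution and documentation driven by the same plan means the description cannot drift away from what the joins actually do.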
