Posts Tagged ‘Tutorials’

wrapr Implementation Update

June 19, 2017

Introduction: The development version of our R helper function wrapr::let() has switched from string-based substitution to abstract-syntax-tree-based substitution (AST-based substitution, or language-based substitution). I am looking for feedback from wrapr::let() users who are already doing substantial work with it. If you are already using wrapr::let(), please test if the current development …


Non-Standard Evaluation and Function Composition in R

June 16, 2017

In this article we will discuss composing standard-evaluation (SE) interfaces and composing non-standard-evaluation (NSE) interfaces in R. In R, the package tidyeval/rlang is a tool for building domain-specific languages, intended to allow easier composition of NSE interfaces. To use it you must know some of its structure and notation. Here are some details paraphrased …


An easy way to accidentally inflate reported R-squared in linear regression models

June 15, 2017

Here is an absolutely horrible way to confuse yourself and get an inflated reported R-squared on a simple linear regression model in R. We have written about this before, but we found a new twist on the problem (interactions with categorical variable encoding) which we would like to call out here. First let’s set up …


Use a Join Controller to Document Your Work

June 13, 2017

This note describes a useful replyr tool we call a "join controller" (it is part of our "R and Big Data" series; please see here for the introduction, and here for one of our big data courses). When working on real-world predictive modeling tasks in production, the ability to join data and document how you …


Managing intermediate results when using R/sparklyr

June 9, 2017

In our latest “R and big data” article we show how to manage intermediate results in non-trivial Apache Spark workflows using R, sparklyr, dplyr, and replyr. Handle management: Many sparklyr tasks involve the creation of intermediate or temporary tables. This can be through dplyr::copy_to() or through dplyr::compute(). These handles can represent a reference leak and eat …


Managing Spark data handles in R

May 26, 2017

When working with big data in R (say, using Spark and sparklyr) we have found it very convenient to keep data handles in a neat list or data_frame. Please read on for our handy hints on keeping your data handles neat. When using R to work over a big data system (such as Spark) much …


New series: R and big data (concentrating on Spark and sparklyr)

May 20, 2017

Win-Vector LLC has recently been teaching how to use R with big data through Spark and sparklyr. We have also been helping clients become productive on R/Spark infrastructure through direct consulting and bespoke training. I thought this would be a good time to talk about the power of working with big data using R, share some …


dplyr in Context

May 7, 2017

Introduction: Beginning R users often come away with the false impression that the popular packages dplyr and tidyr are both all of R and sui generis inventions (in that they might be unprecedented and there might be no other reasonable way to get the same effects in R). These packages and their conventions are high-value, but they …


Encoding categorical variables: one-hot and beyond

April 15, 2017

(or: how to correctly use xgboost from R). R has "one-hot" encoding hidden in most of its modeling paths. Asking an R user where one-hot encoding is used is like asking a fish where there is water; they can’t point to it, as it is everywhere. For example, we can see evidence of one-hot encoding …


Teaching pivot / un-pivot

April 11, 2017

Authors: John Mount and Nina Zumel. Introduction: In teaching thinking in terms of coordinatized data, we find the hardest operations to teach are joins and pivot. One thing we commented on is that moving data values into columns, or into a “thin” or entity/attribute/value form (often called “un-pivoting”, “stacking”, “melting” or “gathering”) is easy to …


