Posts Tagged ‘Tutorials’

More Pipes in R

December 16, 2017

I was enjoying Gabriel’s article Pipes in R Tutorial For Beginners and wanted to call attention to a few more pipes in R (not all for beginners). data.table has essentially used the square-bracket sequence “][” in a manner equivalent to piping in R since about 2006. Here is an example. The Bizarro Pipe “->.;” has always … Continue reading More Pipes in R
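
For concreteness, here is a minimal sketch of the two notations named above, using a toy data.table invented purely for illustration:

library(data.table)

d <- data.table(x = 1:10, g = rep(c("a", "b"), 5))

# data.table's "][" chaining: each [] returns a data.table,
# so steps compose left to right much like a pipe.
d[, y := x * 2][g == "a", .(total_y = sum(y))]

# The Bizarro Pipe "->.;": an ordinary right-assignment into a variable
# named ".", followed by a new statement that reads it back.
d ->.; .[, .(mean_x = mean(x)), by = g] ->.; print(.)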

Read more »

Win-Vector LLC announces new “big data in R” tools

November 29, 2017

Win-Vector LLC is proud to introduce two important new tool families (with documentation) in the 0.5.0 version of seplyr (also now available on CRAN). partition_mutate_se() / partition_mutate_qt(): these are query planners/optimizers that work over dplyr::mutate() assignments. When using big-data systems through R (such as PostgreSQL or Apache Spark), these planners can make your code faster … Continue reading Win-Vector LLC announces new “big data in R” tools
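
A rough sketch of calling one of these planners follows; the named-character-vector form of the assignments is an assumption based on seplyr's standard-evaluation style, and the toy expressions are invented:

library(seplyr)

# Three mutate()-style assignments; "b" depends on "a", so a planner
# must split them into separate, ordered stages.
steps <- c(
  a = "x + 1",
  b = "a + 1",
  c = "x + 2"
)

# partition_mutate_se() is the planner named in the announcement above;
# it returns a staged plan (groups of assignments that can run together).
plan <- partition_mutate_se(steps)
print(plan)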

Read more »

Arbitrary Data Transforms Using cdata

November 22, 2017

We have been writing a lot on higher-order data transforms lately: "Coordinatized Data: A Fluid Data Specification," "Data Wrangling at Scale," "Fluid Data," and "Big Data Transforms." What I want to do now is "write a bit more, so I finally feel I have been concise." The cdata R package supplies general data transform operators. The … Continue reading Arbitrary Data Transforms Using cdata
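
To give a concrete flavor of such an operator, here is a tiny sketch; the function and argument names below follow the cdata API as it later settled and are an assumption relative to the release discussed in this post, and the data is invented:

library(cdata)

# Illustrative data: one row per student, one column per subject.
d <- data.frame(student = c("A", "B"),
                math    = c(90, 75),
                reading = c(85, 95))

# Move the subject columns into rows (a "blocks" layout).
long <- unpivot_to_blocks(
  d,
  nameForNewKeyColumn   = "subject",
  nameForNewValueColumn = "score",
  columnsToTakeFrom     = c("math", "reading"))
print(long)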

Read more »

RStudio Keyboard Shortcuts for Pipes

November 18, 2017

I have just released some simple RStudio add-ins that are great for creating keyboard shortcuts when working with pipes in R. You can install the add-ins from here (which also includes installation instructions and usage examples).

Read more »

Data Wrangling at Scale

November 15, 2017

I have just written a new R article: “Data Wrangling at Scale” (using Dirk Eddelbuettel’s tint template). Please check it out.

Read more »

Update on coordinatized or fluid data

November 13, 2017

We have just released a major update of the cdata R package to CRAN. If you work with R and data, now is the time to check out the cdata package. Among the changes in the 0.5.* version of the cdata package: All coordinatized data or fluid data operations are now in the cdata package (no … Continue reading Update on coordinatized or fluid data

Read more »

Let X=X in R

November 3, 2017

Our article "Let’s Have Some Sympathy For The Part-time R User" includes two points: sometimes you have to write parameterized or re-usable code, and the methods for doing this should be easy and legible. The first point feels abstract until you find yourself wanting to re-use code on new projects. As for the second point: I … Continue reading Let X=X in R
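
One tool Win-Vector has written about for exactly this kind of parameterized, legible code is wrapr::let(); treating it as the method behind this particular article is an assumption, but a minimal sketch (with an invented one-column data frame) looks like this:

library(wrapr)
library(dplyr)

d <- data.frame(measurement = c(1, 2, 3))

# The column to summarize is held in an ordinary string variable; let()
# substitutes the placeholder symbol COL for that name before evaluating.
target_column <- "measurement"
let(
  c(COL = target_column),
  d %>% summarize(mean_value = mean(COL))
)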

Read more »

Big Data Transforms

October 29, 2017

As part of our consulting practice, Win-Vector LLC has been helping a few clients stand up advanced analytics and machine learning stacks using R and substantial data stores (relational databases such as PostgreSQL, or big data systems such as Spark). Often we come to a point where we or a partner realize: "the … Continue reading Big Data Transforms

Read more »

Partial Pooling for Lower Variance Variable Encoding

September 28, 2017

[Banaue rice terraces. Photo: Jon Rawlinson] In a previous article, we showed the use of partial pooling, or hierarchical/multilevel models, for level coding high-cardinality categorical variables in vtreat. In this article, we will discuss a little more about the how and why of partial pooling in R. We will use the lme4 package to fit … Continue reading Partial Pooling for Lower Variance Variable Encoding
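
A minimal sketch of the idea with lme4 (the data below is synthetic, invented purely for illustration): a random-intercept model shrinks per-level estimates toward the grand mean, which is what lowers the variance of estimates for rare levels.

library(lme4)

# Synthetic example: numeric outcome y, categorical variable g with 10 levels.
set.seed(2017)
d <- data.frame(g = sample(letters[1:10], 200, replace = TRUE))
d$y <- as.integer(factor(d$g)) + rnorm(nrow(d), sd = 2)

# Random intercept per level of g: per-level effects are partially
# pooled (shrunk) toward the overall mean.
fit <- lmer(y ~ 1 + (1 | g), data = d)
head(coef(fit)$g)   # shrunken per-level intercept estimates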

Read more »

Custom Level Coding in vtreat

September 25, 2017

One of the services that the R package vtreat provides is level coding (what we sometimes call impact coding): converting the levels of a categorical variable to a meaningful and concise single numeric variable, rather than coding them as indicator variables (AKA "one-hot encoding"). Level coding can be computationally and statistically preferable to one-hot encoding … Continue reading Custom Level Coding in vtreat
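
For a concrete feel of what level coding produces, here is a small sketch using vtreat's numeric-outcome treatment design (the data is invented, and in a real project you would prepare on held-out data or use vtreat's cross-frames):

library(vtreat)

# Synthetic data: a many-level categorical x and a numeric outcome y.
set.seed(2017)
d <- data.frame(x = sample(letters, 100, replace = TRUE),
                stringsAsFactors = FALSE)
d$y <- ifelse(d$x %in% c("a", "b"), 1, 0) + rnorm(100, sd = 0.1)

# Design a treatment plan for a numeric outcome: the categorical x becomes
# a single numeric impact ("catN") column instead of many indicator columns.
plan <- designTreatmentsN(d, varlist = "x", outcomename = "y", verbose = FALSE)
d_treated <- prepare(plan, d)
head(d_treated[, grep("catN", names(d_treated)), drop = FALSE])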

Read more »

