Posts Tagged ‘ Tutorials ’

Coordinatized Data: A Fluid Data Specification

March 29, 2017

Authors: John Mount and Nina Zumel. Introduction: It has been our experience when teaching the data wrangling part of data science that students often have difficulty understanding the conversion between row-oriented and column-oriented data formats (what is commonly called pivoting and un-pivoting). Real trust and understanding of this concept doesn’t …

Debugging Pipelines in R with Bizarro Pipe and Eager Assignment

March 25, 2017

This is a note on debugging magrittr pipelines in R using Bizarro Pipe and eager assignment. Pipes in R: The magrittr R package supplies an operator called “pipe”, which is written as “%>%”. The pipe operator is famous partly because of its extensive use in dplyr and by dplyr users. The pipe operator is …

Sorting correlation coefficients by their magnitudes in a SAS macro

Theoretical Background: Many statisticians and data scientists use the correlation coefficient to study the relationship between 2 variables. For 2 random variables, $X$ and $Y$, the correlation coefficient between them is defined as their covariance scaled by the product of their standard deviations. Algebraically, this can be expressed as $\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$. In real life, you can never […]

New screencast: using R and RStudio to install and experiment with Apache Spark

March 15, 2017

I have a new short screencast up: using R and RStudio to install and experiment with Apache Spark. More material from my recent Strata workshop Modeling big data with R, sparklyr, and Apache Spark can be found here.

Step-Debugging magrittr/dplyr Pipelines in R with wrapr and replyr

March 6, 2017

In this screencast we demonstrate how to easily and effectively step-debug magrittr/dplyr pipelines in R using wrapr and replyr. Some of the big issues in trying to debug magrittr/dplyr pipelines include: pipelines being large expressions that are hard to line-step into; visibility of intermediate results; localizing operations (in time and code position) in the presence …

vtreat: prepare data

March 3, 2017

This article is on preparing data for modeling in R using vtreat. Our example: Suppose we wish to work with some data. Our example task is to train a classification model for credit approval using the ranger implementation of the random forests method. We will take our data from John Ross Quinlan's re-processed "credit approval" …

Iteration and closures in R

February 26, 2017

I recently read an interesting thread on unexpected behavior in R when creating a list of functions in a loop or iteration. The issue is solved, but I am going to take the liberty of restating and slowing down the discussion of the problem (and fix) for clarity. The issue is: are references …

The Zero Bug

February 21, 2017

I am going to write about an insidious statistical, data analysis, and presentation fallacy I call “the zero bug” and the habits you need to cultivate to avoid it. The zero bug: Here is the zero bug in a nutshell: common data aggregation tools often cannot “count to zero” from examples, and this causes …

Evolving R Tools and Practices

February 6, 2017

One of the distinctive features of the R platform is how explicit and user controllable everything is. This allows the style of use of R to evolve fairly rapidly. I will discuss this and end with some new notations, methods, and tools I am nominating for inclusion into your view of the evolving “current best …

Using the Bizarro Pipe to Debug magrittr Pipelines in R

January 30, 2017

I have just finished and released a free new R video lecture demonstrating how to use the “Bizarro pipe” to debug magrittr pipelines. I think R dplyr users will really enjoy it. Please read on for the link to the video lecture. In this video lecture I use the “Bizarro pipe” to debug the example …

