Blog Archives

Coordinatized Data: A Fluid Data Specification

March 29, 2017

Authors: John Mount and Nina Zumel. Introduction: It’s been our experience when teaching the data wrangling part of data science that students often have difficulty understanding the conversion between row-oriented and column-oriented data formats (what is commonly called pivoting and un-pivoting). Real trust and understanding of this concept doesn’t fully …

Debugging Pipelines in R with Bizarro Pipe and Eager Assignment

March 25, 2017

This is a note on debugging magrittr pipelines in R using Bizarro Pipe and eager assignment. Pipes in R: The magrittr R package supplies an operator called “pipe”, written as “%>%”. The pipe operator is famous partly because of its extensive use in dplyr and by dplyr users. The pipe operator is …

Datashader is a big deal

March 22, 2017

I recently got back from Strata West 2017 (where I ran a very well-received workshop on R and Spark). One thing that really stood out for me at the exhibition hall was Bokeh plus datashader from Continuum Analytics. I had the privilege of having Peter Wang himself demonstrate datashader for me and answer a …

Practical Data Science with R: ACM SIGACT News Book Review and Discount!

March 19, 2017

Our book Practical Data Science with R has just been reviewed in the Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory (ACM SIGACT) News by Dr. Allan M. Miller (U.C. Berkeley)! The book is half off at Manning on March 21st, 2017 using the following code (please share/Tweet): Deal of the Day March …

New screencast: using R and RStudio to install and experiment with Apache Spark

March 15, 2017

I have a new short screencast up: using R and RStudio to install and experiment with Apache Spark. More material from my recent Strata workshop Modeling big data with R, sparklyr, and Apache Spark can be found here.

Some Win-Vector R packages

March 9, 2017

This post concludes our mini-series of Win-Vector open source R packages. We end with WVPlots, a collection of ready-made ggplot2 plots we find handy. Please read on for a list of some of the Win-Vector LLC open-source R packages that we are pleased to share. For each package we have prepared a short introduction, so you …

sigr: Simple Significance Reporting

March 7, 2017

sigr is a simple R package that conveniently formats a few statistics and their significance tests. This allows the analyst to use the correct test no matter what modeling package or procedure they use. Model Example: Let’s take as our example the following linear relation between x and y: library('sigr') set.seed(353525) d <- data.frame(x= rnorm(5)) …

Step-Debugging magrittr/dplyr Pipelines in R with wrapr and replyr

March 6, 2017

In this screencast we demonstrate how to easily and effectively step-debug magrittr/dplyr pipelines in R using wrapr and replyr. Some of the big issues in trying to debug magrittr/dplyr pipelines include: Pipelines being large expressions that are hard to line-step into. Visibility of intermediate results. Localizing operations (in time and code position) in the presence …

replyr: Get a Grip on Big Data in R

March 5, 2017

replyr is an R package that contains extensions, adaptions, and work-arounds to make remote R dplyr data sources (including big data systems such as Spark) behave more like local data. This allows the analyst to more easily develop and debug procedures that simultaneously work on a variety of data services (in-memory data.frame, SQLite, PostgreSQL, and …

