Blog Archives

Data Reshaping with cdata

January 17, 2018

I’ve just shared a short webcast on data reshaping in R using the cdata package. We also have two really nifty articles on the theory and methods: “Fluid data reshaping with cdata” and “Coordinatized Data: A Fluid Data Specification.” Please give it a try! This is the material I recently presented at the January 2017 …

Base R can be Fast

January 15, 2018

“Base R” (call it “Pure R” or “Good Old R”; just don’t call it “Old R” or late for dinner) can be fast for in-memory tasks. This is despite the commonly repeated claim that “packages written in C/C++ are (edit: “always”) faster than R code.” The benchmark results of “rquery: Fast Data Manipulation in R” really …

Setting up RStudio Server quickly on Amazon EC2

January 13, 2018

I have recently been working on projects using Amazon EC2 (Elastic Compute Cloud) and RStudio Server. I thought I would share some of my working notes. Amazon EC2 supplies near-instant access to on-demand, disposable computing in a variety of sizes (billed in hours). RStudio Server supplies an interactive user interface to your remote R …

Big cdata News

January 4, 2018

I have some big news about our R package cdata. We have greatly improved the calling interface, and Nina Zumel has just written the definitive introduction to cdata. cdata is our general coordinatized data tool. It is what powers the deep learning performance graph (here demonstrated with R and Keras) that I announced a while …

Kudos to Professor Andrew Gelman

December 29, 2017

Kudos to Professor Andrew Gelman for telling a great joke at his own expense: “Stupid-ass statisticians don’t know what a goddam confidence interval is.” He brilliantly burlesqued a frustratingly common occurrence that many people say they “have never seen happen.” One of the pains of writing about data science is that there is a (small but vocal) …

Announcing rquery

December 28, 2017

We are excited to announce the rquery R package. rquery is Win-Vector LLC’s currently-in-development big data query tool for R. rquery supplies a set of operators inspired by Edgar F. Codd’s relational algebra (updated to reflect lessons learned from working with R, SQL, and dplyr at big data scale in production). As an example: …

Plotting Deep Learning Model Performance Trajectories

December 23, 2017

I am excited to share a new deep learning model performance trajectory graph. Here is an example, produced with Keras in R and plotted using ggplot2. The ideas include: we plot model performance as a function of training epoch, data set (training and validation), and metric. For legibility we facet on metric, and facets are adjusted …

How to Greatly Speed Up Your Spark Queries

December 20, 2017

For some time we have been teaching R users: “when working with wide tables on Spark or on databases, narrow to the columns you really want to work with early in your analysis.” The idea behind the advice is that working with fewer columns makes for quicker queries. The issue arises …

More Pipes in R

December 16, 2017

I was enjoying Gabriel’s article “Pipes in R Tutorial For Beginners” and wanted to call attention to a few more pipes in R (not all for beginners). data.table has essentially used the square bracket sequence “][” in a manner equivalent to piping in R since about 2006. Here is an example. The Bizarro Pipe “->.;” has always …

Getting started with seplyr

December 14, 2017

A big “thank you!!!” to Microsoft for hosting our new introduction to seplyr. If you are working with R and big data, I think the seplyr package can be a valuable tool. For how and why, please check out our new introductory article. Note: now that wrapr version 1.0.2 is up on CRAN, all of the …
