Posts Tagged ‘Pragmatic Data Science’

R Tip: Use the vtreat Package For Data Preparation

March 11, 2018

If you are working with predictive modeling or machine learning in R, this is the R tip that is going to save you the most time and deliver the biggest improvement in your results. R Tip: Use the vtreat package for data preparation in predictive analytics and machine learning projects. When attempting predictive modeling with …

Data Reshaping with cdata

January 17, 2018

I’ve just shared a short webcast on data reshaping in R using the cdata package. We also have two really nifty articles on the theory and methods: “Fluid data reshaping with cdata” and “Coordinatized Data: A Fluid Data Specification”. Please give it a try! This is the material I recently presented at the January 2017 …

Big cdata News

January 4, 2018

I have some big news about our R package cdata. We have greatly improved the calling interface and Nina Zumel has just written the definitive introduction to cdata. cdata is our general coordinatized data tool. It is what powers the deep learning performance graph (here demonstrated with R and Keras) that I announced a while …

Announcing rquery

December 28, 2017

We are excited to announce the rquery R package. rquery is Win-Vector LLC’s currently-in-development big data query tool for R. rquery supplies a set of operators inspired by Edgar F. Codd’s relational algebra (updated to reflect lessons learned from working with R, SQL, and dplyr at big data scale in production). As an example: …

Plotting Deep Learning Model Performance Trajectories

December 23, 2017

I am excited to share a new deep learning model performance trajectory graph. Here is an example produced with Keras in R using ggplot2. The ideas include: We plot model performance as a function of training epoch, data set (training and validation), and metric. For legibility we facet on metric, and facets are adjusted …

How to Greatly Speed Up Your Spark Queries

December 20, 2017

For some time we have been teaching R users "when working with wide tables on Spark or on databases: narrow to the columns you really want to work with early in your analysis." The idea behind the advice is: working with fewer columns makes for quicker queries. The issue arises …

Getting started with seplyr

December 14, 2017

A big “thank you!!!” to Microsoft for hosting our new introduction to seplyr. If you are working with R and big data, I think the seplyr package can be a valuable tool. For how and why, please check out our new introductory article. Note: now that wrapr version 1.0.2 is up on CRAN all of the …

Win-Vector LLC announces new “big data in R” tools

November 29, 2017

Win-Vector LLC is proud to introduce two important new tool families (with documentation) in the 0.5.0 version of seplyr (also now available on CRAN): partition_mutate_se() / partition_mutate_qt(): these are query planners/optimizers that work over dplyr::mutate() assignments. When using big-data systems through R (such as PostgreSQL or Apache Spark) these planners can make your code faster …

Vectorized Block ifelse in R

November 28, 2017

Win-Vector LLC has been working on porting some significant large-scale production systems from SAS to R. From this experience we want to share how to simulate, in R with Apache Spark (via sparklyr), a nifty SAS feature: the vectorized “block if(){}else{}” structure. When porting code from one language to another you hope the expressive …

Arbitrary Data Transforms Using cdata

November 22, 2017

We have been writing a lot on higher-order data transforms lately: “Coordinatized Data: A Fluid Data Specification”, “Data Wrangling at Scale”, “Fluid Data”, and “Big Data Transforms”. What I want to do now is "write a bit more, so I finally feel I have been concise." The cdata R package supplies general data transform operators. The …
