Posts Tagged ‘Exciting Techniques’

rqdatatable: rquery Powered by data.table

June 3, 2018

rquery is an R package for specifying data transforms using piped Codd-style operators. It has already shown great performance on PostgreSQL and Apache Spark. rqdatatable is a new package that supplies a screaming fast in-memory implementation of the rquery system using the data.table package. rquery is already one of the fastest and most teachable (due …

Upcoming speaking engagements

April 19, 2018

I have a couple of public appearances coming up soon. The East Bay R Language Beginners Group: Preparing Datasets – The Ugly Truth & Some Solutions, Tuesday, May 1, 2018 at Robert Half Technologies, 1999 Harrison Street, Oakland, CA, 94612. Official May 2018 BARUG Meeting: rquery: a Query Generator for Working With SQL Data, Tuesday, …

Wanted: cdata Test Pilots

February 25, 2018

I need a few volunteers to “test pilot” the development version of the R package cdata, please. Jacqueline Cochran: at the time of her death, no other pilot held more speed, distance, or altitude records in aviation history than Cochran. Our cdata package is using an upcoming new feature called “build_frame()” that allows for …

Supercharge your R code with wrapr

January 27, 2018

I would like to demonstrate some helpful wrapr R notation tools that really neaten up your R code. Named Map Builder: First I will demonstrate wrapr’s "named map builder": :=. The named map builder adds names to vectors and lists using a convenient "names on the left and values on the right" notation. …

Data Reshaping with cdata

January 17, 2018

I’ve just shared a short webcast on data reshaping in R using the cdata package. We also have two really nifty articles on the theory and methods: “Fluid data reshaping with cdata” and “Coordinatized Data: A Fluid Data Specification”. Please give it a try! This is the material I recently presented at the January 2017 …

Big cdata News

January 4, 2018

I have some big news about our R package cdata. We have greatly improved the calling interface and Nina Zumel has just written the definitive introduction to cdata. cdata is our general coordinatized data tool. It is what powers the deep learning performance graph (here demonstrated with R and Keras) that I announced a while …

Plotting Deep Learning Model Performance Trajectories

December 23, 2017

I am excited to share a new deep learning model performance trajectory graph. Here is an example produced with Keras in R using ggplot2. The ideas include: we plot model performance as a function of training epoch, data set (training and validation), and metric. For legibility we facet on metric, and facets are adjusted …

How to Greatly Speed Up Your Spark Queries

December 20, 2017

For some time we have been teaching R users: "when working with wide tables on Spark or on databases, narrow to the columns you really want to work with early in your analysis." The idea behind the advice is that working with fewer columns makes for quicker queries. The issue arises …

Win-Vector LLC announces new “big data in R” tools

November 29, 2017

Win-Vector LLC is proud to introduce two important new tool families (with documentation) in the 0.5.0 version of seplyr (also now available on CRAN). partition_mutate_se() / partition_mutate_qt(): these are query planners/optimizers that work over dplyr::mutate() assignments. When using big-data systems through R (such as PostgreSQL or Apache Spark) these planners can make your code faster …

Partial Pooling for Lower Variance Variable Encoding

September 28, 2017

In a previous article, we showed the use of partial pooling, or hierarchical/multilevel models, for level coding high-cardinality categorical variables in vtreat. In this article, we will discuss a little more about the how and why of partial pooling in R. We will use the lme4 package to fit …

