Blog Archives

RStudio Keyboard Shortcuts for Pipes

November 18, 2017

I have just released some simple RStudio add-ins that are great for creating keyboard shortcuts when working with pipes in R. You can install the add-ins from here (the page also includes installation instructions and usage examples).

Data Wrangling at Scale

November 15, 2017

Just wrote a new R article: “Data Wrangling at Scale” (using Dirk Eddelbuettel’s tint template). Please check it out.

Update on coordinatized or fluid data

November 13, 2017

We have just released a major update of the cdata R package to CRAN. If you work with R and data, now is the time to check out the cdata package. Among the changes in the 0.5.* version of the cdata package: All coordinatized data or fluid data operations are now in the cdata package (no …

Let X=X in R

November 3, 2017

Our article "Let’s Have Some Sympathy For The Part-time R User" includes two points: (1) sometimes you have to write parameterized or re-usable code, and (2) the methods for doing this should be easy and legible. The first point feels abstract, until you find yourself wanting to re-use code on new projects. As for the second point: I …

Big Data Transforms

October 29, 2017

As part of our consulting practice, Win-Vector LLC has been helping a few clients stand up advanced analytics and machine learning stacks using R and substantial data stores (relational databases such as PostgreSQL, or big data systems such as Spark). Often we come to a point where we or a partner realize: "the …

Some Announcements

October 24, 2017

Some Announcements: Dr. Nina Zumel will be presenting “Myths of Data Science: Things you Should and Should Not Believe”, Sunday, October 29, 2017 10:00 AM to 12:30 PM at the She Talks Data Meetup (Bay Area). ODSC West 2017 is soon. It is our favorite conference and we will be giving both a workshop and …

Upcoming data preparation and modeling article series

September 23, 2017

I am pleased to announce that vtreat version 0.6.0 is now available to R users on CRAN. vtreat is an excellent way to prepare data for machine learning, statistical inference, and predictive analytic projects. If you are an R user we strongly suggest you incorporate vtreat into your projects. vtreat handles, in a statistically sound …

My advice on dplyr::mutate()

September 22, 2017

There are substantial differences between ad-hoc analyses (be they machine learning research, data science contests, or other demonstrations) and production-worthy systems. Roughly: ad-hoc analyses have to be correct only at the moment they are run (and often once they are correct, that is the last time they are run; obviously the idea of reproducible …

Remember: p-values Are Not Effect Sizes

September 9, 2017

Authors: John Mount and Nina Zumel. The p-value is a valid frequentist statistical concept that is much abused and misused in practice. In this article I would like to call out a few features of p-values that can cause problems in evaluating summaries. Keep in mind: p-values are useful and routinely taught correctly in statistics, …

It is Needlessly Difficult to Count Rows Using dplyr

September 3, 2017

Question: how hard is it to count rows using the R package dplyr? Answer: surprisingly difficult. When trying to count rows using dplyr or dplyr-controlled data structures (remote tbls such as sparklyr or dbplyr structures) one is sailing between Scylla and Charybdis. The task is to avoid dplyr corner-cases and irregularities (a few of which …

