Here is an example how easy it is to use cdata to re-layout your data. Tim Morris recently tweeted the following problem (corrected). Please will you take pity on me #rstats folks? I only want to reshape two variables x & y from wide to long! Starting with: d xa xb ya yb 1 1 … Continue reading Controlling Data Layout With cdata
What R users now call piping, popularized by Stefan Milton Bache and Hadley Wickham, is inline function application (this is notationally similar to, but distinct from the powerful interprocess communication and concurrency tool introduced to Unix by Douglas McIlroy in 1973). In object oriented languages this sort of notation for function application has been called … Continue reading Piping is Method Chaining
As of cdata version 1.0.8 cdata implements an operator notation for data transform.
The idea is simple, yet powerful.
First let’s start with some data.
d <- wrapr::build_frame(
"id", "measure", "value" |
With all of the excitement surrounding cdata style control table based data transforms (the cdata ideas being named as the “replacements” for tidyr‘s current methodology, by the tidyr authors themselves!) I thought I would take a moment to describe how they work. cdata defines two primary data manipulation operators: rowrecs_to_blocks() and blocks_to_rowrecs(). These are the … Continue reading How cdata Control Table Data Transforms Work
Recently ran into something interesting in the R macros/quasi-quotation/substitution/syntax front:
It appears !! is no longer the last word in substitution (it certainly wasn’t the first).
The described effect is actually already pretty easy t…
To make getting started with rquery (an advanced query generator for R) easier we have re-worked the package README for various data-sources (including SparkR!). Here are our current examples: rquery and MonetDBLite rquery and RPostgreSQL rquery and RSQLite rquery and SparkR rquery and sparklyr For the MonetDBLite the query diagrammer shows a repeated calculation that … Continue reading Getting Started With rquery
Recently Hadley Wickham prescribed pronouncing the magrittr pipe as “then” and using right-assignment as follows: I am not sure if it is a good or bad idea. But let’s play with it a bit, and perhaps readers can submit their experience and opinions in the comments section. Right assignment Right assignment is a bit of … Continue reading Playing With Pipe Notations
While developing the RcppDynProg R package I took a little extra time to port the core algorithm from C++ to both R and Python. This means I can time the exact same algorithm implemented nearly identically in each of these three languages. So I can extract some comparative “apples to apples” timings. Please read on … Continue reading Timing the Same Algorithm in R, Python, and C++
In our last note we used wrapr::qe() to help quote expressions. In this note we will discuss quoting and code-capturing interfaces (interfaces that capture user source code) a bit more. My position on code-capturing interfaces (or non-standard-evaluation/NSE) is: if poorly handled, they can be a large interface price/risk to pay for the minor convenience of … Continue reading Quoting Concatenate
Pipelines in R are popular, the most popular one being magrittr as used by dplyr. This note will discuss the advanced re-usable piping systems: rquery/rqdatatable operator trees and wrapr function object pipelines. In each case we have a set of objects designed to extract extra power from the wrapr dot-arrow pipe %.>%. Piping Piping is … Continue reading Reusable Pipelines in R
This note is a comment on some of the timings shared in the dplyr-0.8.0 pre-release announcement. The original published timings were as follows: With performance metrics: measurements are marketing. So let’s dig in the above a bit. These timings are of the kind of small task large number of repetition breed that Matt Dowle writes … Continue reading Timing Grouped Mean Calculation in R