With all of the excitement surrounding cdata-style control-table-based data transforms (the cdata ideas being named as the “replacements” for tidyr’s current methodology, by the tidyr authors themselves!), I thought I would take a moment to describe how they work. cdata defines two primary data manipulation operators: rowrecs_to_blocks() and blocks_to_rowrecs(). These are the … Continue reading How cdata Control Table Data Transforms Work
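As a rough illustration of the control table idea, here is a minimal base R sketch of moving row records into blocks. The helper `rowrecs_to_blocks_sketch()` and the example data are hypothetical and are not cdata's implementation; cdata's own `rowrecs_to_blocks()` is the real operator. The sketch assumes the first control table column holds the block keys and the remaining columns name, cell by cell, which wide column supplies each value.

```r
# Hypothetical helper (not part of cdata): a base R sketch of a
# control-table-driven "row records to blocks" transform.
rowrecs_to_blocks_sketch <- function(wide, control, copy_cols) {
  key_col <- colnames(control)[[1]]      # first column: block row keys
  value_cols <- colnames(control)[-1]    # other columns: value columns
  pieces <- lapply(seq_len(nrow(control)), function(i) {
    # start each piece with the columns that identify the record
    block_row <- wide[, copy_cols, drop = FALSE]
    block_row[[key_col]] <- control[[key_col]][[i]]
    for (v in value_cols) {
      # the control table cell names the wide column to read from
      block_row[[v]] <- wide[[control[[v]][[i]]]]
    }
    block_row
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Example data (made up for illustration): one row per record.
wide <- data.frame(id = c(1, 2), AUC = c(0.6, 0.5), R2 = c(0.2, 0.3))

# The control table describes the shape of one block.
control <- data.frame(
  measure = c("AUC", "R2"),
  value = c("AUC", "R2"),
  stringsAsFactors = FALSE
)

blocks <- rowrecs_to_blocks_sketch(wide, control, copy_cols = "id")
print(blocks)
```

Each input row becomes a two-row block, one row per line of the control table, with `id` copied along so the records can be reassembled later.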
We recently saw this UX (user experience) question from the tidyr author as he adapts tidyr to cdata techniques. The terminology that he is not adopting from cdata is “unpivot_to_blocks()” and “pivot_to_rowrecs()”. One of the research ideas in the cdata package is that the important thing to call out is record structure. The important point … Continue reading Why we Did Not Name the cdata Transforms wide/tall/long/short
Thank you to Win-Vector LLC General Partner Nina Zumel for stepping up her workload, allowing me to take some time off from Win-Vector LLC (and time off from revising chapter 8 of Practical Data Science with R 2nd Edition) to make time to help administer the Vietnam Rotary Global Grant mentioned below. This project is … Continue reading Support Rotary to Support our World
From https://twitter.com/sharon000/status/1107771331012108288: From https://tidyr.tidyverse.org/dev/articles/pivot.html: There are two important new features inspired by other R packages that have been advancing of reshaping in R: The reshaping operation can be specified with a data frame that describes precisely how metadata stored in column names becomes data variables (and vice versa). This is inspired by the cdata package … Continue reading Tidyverse users: gather/spread are on the way out
We recently commented on excess package dependencies as representing risk in the R package ecosystem. The question remains: how much risk? Is low dependency a mere talisman, or is there evidence it is a good practice (or at least correlates with other good practices)? Well, it turns out we can quantify it: each additional non-core … Continue reading Quantifying R Package Dependency Risk
I would like to once again recommend to our readers our note on wrapr::let(), an R function that can help you eliminate many problematic NSE (non-standard evaluation) interfaces (and their associated problems) from your R programming tasks. The idea is to imitate the following lambda-calculus idea: let x be y in z := ( λ … Continue reading wrapr::let()
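The "let x be y in z" idea is name substitution followed by evaluation. Below is a minimal base R sketch of that idea using `substitute()` and `eval()`. It does not call `wrapr::let()` and is not how wrapr implements it; the data frame `d`, the placeholder `COL`, and the variable `col_to_use` are assumptions made for illustration.

```r
# Example data (made up for illustration).
d <- data.frame(x = c(1, 2, 3))

# The column name arrives as a string, as it might from a function argument.
col_to_use <- "x"

# "let COL be x in mean(d$COL)": replace the placeholder symbol COL
# with the real column name before the code is evaluated.
expr <- substitute(mean(d$COL), list(COL = as.name(col_to_use)))
print(expr)

# Evaluate the re-written expression as ordinary R code.
result <- eval(expr)
print(result)
```

The point of the pattern is that the code being run is plain standard-evaluation R once the substitution is done, so the caller can parameterize column names with strings.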
Dirk Eddelbuettel just shared an important point on software and analyses: dependencies are hard-to-manage risks. If your software or research depends on many complex and changing packages, you have no way to establish your work is correct. This is because to establish the correctness of your work, you would need to also establish … Continue reading Software Dependencies and Risk
I am collecting here some notes on testing in R. There seems to be a general (false) impression among non R-core developers that to run tests, R package developers need a test management system such as RUnit or testthat. And a further false impression that testthat is the only R test management system. This is … Continue reading Unit Tests in R
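To make the point concrete, here is a minimal sketch of a test written with nothing but base R. The function `scale_to_unit()` is a hypothetical example, not from the article. A script like this, placed in a package's `tests/` directory, is run by `R CMD check`, which reports a failure if the script signals an error.

```r
# Hypothetical function under test: rescale a numeric vector to [0, 1].
scale_to_unit <- function(x) {
  rng <- range(x)
  (x - rng[[1]]) / (rng[[2]] - rng[[1]])
}

# Test 1: known input gives the expected output.
# stopifnot() signals an error if any condition is not TRUE.
stopifnot(isTRUE(all.equal(scale_to_unit(c(2, 4, 6)), c(0, 0.5, 1))))

# Test 2: non-numeric input should signal an error.
got_error <- tryCatch(
  {
    scale_to_unit("a")
    FALSE
  },
  error = function(e) TRUE
)
stopifnot(got_error)
```

No test management package is involved; the only machinery is `stopifnot()`, `all.equal()`, and `tryCatch()`.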
Let’s try some “ugly corner cases” for data manipulation in R. Corner cases are examples where the user might be running to the edge of where the package developer intended their package to work, and thus often where things can go wrong. Let’s see what happens when we try to stick a fork in the … Continue reading Data Manipulation Corner Cases
Starting With Data Science: A rigorous hands-on introduction to data science for engineers. Win-Vector LLC is now offering a 4-day on-site intensive data science course. The course targets engineers familiar with Python and introduces them to the basics of current data science practice. This is designed as an interactive in-person (not remote or … Continue reading Starting With Data Science: A Rigorous Hands-On Introduction to Data Science for Engineers
The rquery R package has several places where the user can ask for what they have typed in to be substituted for a name or value stored in a variable. This becomes important as many of the rquery commands capture column names from un-executed code. So knowing if something is treated as a symbol/name (which … Continue reading rquery Substitution
Roz King just wrote an interesting article on binning data (a common data analytics step) in a database. He compares a case-based approach (where the bin divisions are stuffed into code) with a join-based approach. He shares code and timings. Best of all: rquery gets some attention and turns out to be the dominant … Continue reading Binning Data in a Database
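The two approaches can be sketched in base R on an in-memory data frame. This is only an illustration of the idea, not the article's database code or timings; the data, the bin boundaries, and all column names are assumptions.

```r
# Example data and bin definitions (made up for illustration).
d <- data.frame(x = c(1, 12, 25))
bins <- data.frame(
  bin = c("low", "mid", "high"),
  lo = c(0, 10, 20),
  hi = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Case-based approach: the bin divisions are written into the code.
d$bin_case <- ifelse(d$x < 10, "low",
                     ifelse(d$x < 20, "mid", "high"))

# Join-based approach: the bin divisions live in a table.
# merge(by = NULL) forms the cross product; the filter keeps the
# one bin whose half-open range [lo, hi) contains each x.
crossed <- merge(d, bins, by = NULL)
joined <- crossed[crossed$x >= crossed$lo & crossed$x < crossed$hi, ]
joined <- joined[order(joined$x), ]

print(joined[, c("x", "bin_case", "bin")])
```

With the join-based form, changing the bins means editing the `bins` table rather than the code.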
We’ve been getting some good uptake on our piping in R article announcement. The article is necessarily a bit technical. But one of its key points comes from the observation that piping into names is a special opportunity to give general objects the following personality quiz: “If you were an R function, what function would … Continue reading “If You Were an R Function, What Function Would You Be?”
We forgot to say: R Journal Volume 10/2, December 2018 is out!
A huge thanks to the editors who work very hard to make this possible.
And a big “thank you” for helping improve, and for including, our note on pipes in R.
I recently ran into something interesting on the R macros/quasi-quotation/substitution/syntax front:
It appears !! is no longer the last word in substitution (it certainly wasn’t the first).
The described effect is actually already pretty easy t…
To make getting started with rquery (an advanced query generator for R) easier, we have re-worked the package README for various data-sources (including SparkR!). Here are our current examples: rquery and MonetDBLite; rquery and RPostgreSQL; rquery and RSQLite; rquery and SparkR; rquery and sparklyr. For MonetDBLite, the query diagrammer shows a repeated calculation that … Continue reading Getting Started With rquery
Recently Hadley Wickham prescribed pronouncing the magrittr pipe as “then” and using right-assignment as follows: I am not sure if it is a good or bad idea. But let’s play with it a bit, and perhaps readers can submit their experience and opinions in the comments section. Right assignment Right assignment is a bit of … Continue reading Playing With Pipe Notations
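For readers who want to experiment, here is a small self-contained sketch of left-to-right notation ending in a right assignment. The `%then%` operator is a hypothetical stand-in defined here so the example runs in base R; it is not the magrittr pipe and not the code from the original recommendation.

```r
# Right assignment in base R: the value is on the left, the name on the right.
5 -> x
print(x)

# Hypothetical minimal pipe (not magrittr's %>%): apply function f to lhs.
`%then%` <- function(lhs, f) f(lhs)

# Reads left to right: take the vector, then sqrt, then sum,
# then store the result in total.
c(1, 4, 9) %then% sqrt %then% sum -> total
print(total)
```

The `%...%` operators bind more tightly than `->`, so the whole pipeline is evaluated before the assignment happens.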
R users have been enjoying the benefits of SQL query generators for quite some time, most notably using the dbplyr package. I would like to talk about some features of our own rquery query generator, concentrating on derived result re-use. Introduction SQL represents value use by nesting. To use a query result within another query … Continue reading Query Generation in R
Manning has a new discount code and a free excerpt of our book Practical Data Science with R, 2nd Edition: here.
This section is elementary, but things really pick up speed later on (also available in a paid preview).
In our cdata R package and training materials we emphasize record-oriented thinking and how to design a transform control table. We now have an additional exciting new feature: control table keys. The user can now control which columns of a cdata control table are the keys, including now using composite keys (that is, keys … Continue reading cdata Control Table Keys