We recently saw this UX (user experience) question from the tidyr author as he adapts tidyr to cdata techniques. The terminology that he is not adopting from cdata is “unpivot_to_blocks()” and “pivot_to_rowrecs()”. One of the research ideas in the cdata package is that the important thing to call out is record structure. The important point … Continue reading Why we Did Not Name the cdata Transforms wide/tall/long/short
We recently commented on excess package dependencies as representing risk in the R package ecosystem. The question remains: how much risk? Is low dependency a mere talisman, or is there evidence it is a good practices (or at least correlates with other good pracices)? Well, it turns out we can quantify it: each additional non-core … Continue reading Quantifying R Package Dependency Risk
Dirk Eddelbuettel just shared an important point on software and analyses: dependencies are hard to manage risks. If your software or research depends on many complex and changing packages, you have no way to establish your work is correct. This is because to establish the correctness of your work, you would need to also establish … Continue reading Software Dependencies and Risk
I am collecting here some notes on testing in R. There seems to be a general (false) impression among non R-core developers that to run tests, R package developers need a test management system such as RUnit or testthat. And a further false impression that testthat is the only R test management system. This is … Continue reading Unit Tests in R
Let’s try some “ugly corner cases” for data manipulation in R. Corner cases are examples where the user might be running to the edge of where the package developer intended their package to work, and thus often where things can go wrong. Let’s see what happens when we try to stick a fork in the … Continue reading Data Manipulation Corner Cases
Recently ran into something interesting in the R macros/quasi-quotation/substitution/syntax front:
It appears !! is no longer the last word in substitution (it certainly wasn’t the first).
The described effect is actually already pretty easy t…
Recently Hadley Wickham prescribed pronouncing the magrittr pipe as “then” and using right-assignment as follows: I am not sure if it is a good or bad idea. But let’s play with it a bit, and perhaps readers can submit their experience and opinions in the comments section. Right assignment Right assignment is a bit of … Continue reading Playing With Pipe Notations
Manning has a new discount code and a free excerpt of our book Practical Data Science with R, 2nd Edition: here.
This section is elementary, but things really pick up speed as later on (also available in a paid preview).
In our cdata R package and training materials we emphasize the record-oriented thinking and how to design a transform control table. We now have an additional exciting new feature: control table keys. The user can now control which columns of a cdata control table are the keys, including now using composite keys (that is keys … Continue reading cdata Control Table Keys
To make teaching R quasi-quotation easier it would be nice if R string-interpolation and quasi-quotation both used the same notation. They are related concepts. So some commonality of notation would actually be clarifying, and help teach the concepts. We will define both of the above terms, and demonstrate the relation between the two concepts. String-interpolation … Continue reading Make Teaching R Quasi-Quotation Easier
While working on a variation of the RcppDynProg algorithm we derived the following beautiful identity of 2 by 2 real matrices: The superscript “top” denoting the transpose operation, the ||.||^2_2 denoting sum of squares norm, and the single |.| denoting determinant. This is derived from one of the check equations for the Moore–Penrose inverse and … Continue reading A Beautiful 2 by 2 Matrix Identity
While developing the RcppDynProg R package I took a little extra time to port the core algorithm from C++ to both R and Python. This means I can time the exact same algorithm implemented nearly identically in each of these three languages. So I can extract some comparative “apples to apples” timings. Please read on … Continue reading Timing the Same Algorithm in R, Python, and C++
In our last note we used wrapr::qe() to help quote expressions. In this note we will discuss quoting and code-capturing interfaces (interfaces that capture user source code) a bit more. My position on code-capturing interfaces (or non-standard-evaluation/NSE) is: if poorly handled, they can be a large interface price/risk to pay for the minor convenience of … Continue reading Quoting Concatenate
This note is a comment on some of the timings shared in the dplyr-0.8.0 pre-release announcement. The original published timings were as follows: With performance metrics: measurements are marketing. So let’s dig in the above a bit. These timings are of the kind of small task large number of repetition breed that Matt Dowle writes … Continue reading Timing Grouped Mean Calculation in R
Our group has done a lot of work with non-standard calling conventions in R. Our tools work hard to eliminate non-standard calling (as is the purpose of wrapr::let()), or at least make it cleaner and more controllable (as is done in the wrapr dot pipe). And even so, we still get surprised by some of … Continue reading Very Non-Standard Calling in R
This note is just a quick follow-up to our last note on correcting the bias in estimated standard deviations for binomial experiments. For normal deviates there is, of course, a well know scaling correction that returns an unbiased estimate for observed standard deviations. It (from the same source): … provides an example where imposing the … Continue reading More on Bias Corrected Standard Deviation Estimates
This note is about attempting to remove the bias brought in by using sample standard deviation estimates to estimate an unknown true standard deviation of a population. We establish there is a bias, concentrate on why it is not important to remove it for reasonable sized samples, and (despite that) give a very complete bias … Continue reading How to de-Bias Standard Deviation Estimates