(This article was originally published at Statistics – Win-Vector Blog, and syndicated at StatsBlogs.)
The R package cdata now has version 0.7.0 available from CRAN.
cdata is a data manipulation package that subsumes many higher-order data manipulation operations, including pivot/un-pivot, spread/gather, and cast/melt. Record-to-record transforms are specified by drawing a table that expresses the record structure. This table is called the “control table”, and it is also the link between the key concepts of row-records and block-records.
What can be quickly specified and achieved using these concepts and notations is amazing and quite teachable. These transforms can be run in-memory or in remote database or big-data systems (such as Spark).
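As a rough illustration of the control table idea, here is a minimal in-memory sketch. The data frame `d` and the column names `id`, `measure`, and `value` are made up for this example, and the calls assume the `build_unpivot_control()`, `rowrecs_to_blocks()`, and `blocks_to_rowrecs()` functions as documented for recent cdata releases; please check the documentation for your installed version.

```r
library(cdata)

# Hypothetical example data: one row-record per model (values are invented).
d <- data.frame(
  id  = c(1, 2),
  AUC = c(0.6, 0.8),
  R2  = c(0.2, 0.3)
)

# The control table describes the shape of one block-record:
# a key column ("measure") and a value column ("value"),
# with one block row for each of the columns AUC and R2.
controlTable <- build_unpivot_control(
  nameForNewKeyColumn   = "measure",
  nameForNewValueColumn = "value",
  columnsToTakeFrom     = c("AUC", "R2")
)

# Row-records to block-records (an un-pivot / gather / melt).
tall <- rowrecs_to_blocks(d, controlTable, columnsToCopy = "id")

# Block-records back to row-records (a pivot / spread / cast),
# using the same control table.
wide <- blocks_to_rowrecs(tall, keyColumns = "id", controlTable = controlTable)

print(controlTable)
print(tall)
print(wide)
```

The same control table drives both directions, which is why it serves as the link between row-records and block-records.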
The concepts are taught in Nina Zumel’s excellent tutorial and in John Mount’s quick screencast/lecture.
The 0.7.0 update adds local versions of the operators in addition to the Spark and database implementations. These methods should now be a bit safer for in-memory complex/annotated types such as dates and times.