Posts Tagged ‘Programming’

Managing intermediate results when using R/sparklyr

June 9, 2017

In our latest “R and big data” article we show how to manage intermediate results in non-trivial Apache Spark workflows using R, sparklyr, dplyr, and replyr.

Handle management

Many sparklyr tasks involve creation of intermediate or temporary tables. This can happen through dplyr::copy_to() and through dplyr::compute(). These handles can represent a reference leak and eat …


More on safe substitution in R

June 7, 2017

Let’s worry a bit about substitution in R. Substitution is very powerful, which means it can be both used and misused. However, that does not mean every use is unsafe or a mistake. From Advanced R: substitute: We can confirm the above code performs no substitution: a <- 1; b <- 2; substitute(a + …
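The excerpt is cut off before its code finishes, so the following is only a small base-R sketch of the behaviour it alludes to, not the post's own example. The variable names e1, e2, e3 and the helper f are hypothetical. It assumes the documented rule that substitute() leaves symbols untouched when the environment is the global environment, and replaces them when given an explicit list or when called on a function argument.

```r
a <- 1
b <- 2

# With the global environment (the default when called at top level),
# substitute() performs no substitution: the symbols are left as-is.
e1 <- substitute(a + b, globalenv())

# With an explicit list, only the symbols named in the list are replaced.
e2 <- substitute(a + b, list(a = 10))

# Inside a function, substitute() on an argument returns the unevaluated
# expression the caller supplied.
f <- function(x) substitute(x)
e3 <- f(a + b)

print(e1)
print(e2)
print(e3)
```

Passing the environment explicitly keeps the first case from depending on where the snippet happens to be run.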


There is usually more than one way in R

June 5, 2017

Python has a fairly famous design principle (from “PEP 20 – The Zen of Python”): There should be one-- and preferably only one --obvious way to do it. Frankly, in R (especially once you add many packages) there is usually more than one way. As an example we will talk about the common R functions: …


Quick illustration of Metropolis and Metropolis-in-Gibbs Sampling in R

June 4, 2017

The code below gives a simple implementation of the Metropolis and Metropolis-in-Gibbs sampling algorithms, which are useful for sampling probability densities whose normalizing constant is difficult to calculate, that are irregular, or that have high dimension (Metropolis-in-Gibbs).

## Metropolis sampling
## x - current value of Markov chain (numeric vector)
## targ - target log …
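Since the post's code is truncated in this listing, here is a minimal sketch of a random-walk Metropolis sampler in base R, written under stated assumptions and not taken from the post. The function names metropolis_step and run_chain, the Gaussian proposal, and the scale argument are all illustrative choices; only x and targ echo the names in the excerpt, with targ assumed to return the log of the (possibly unnormalized) target density.

```r
# One random-walk Metropolis update (illustrative sketch).
# x     - current value of the Markov chain (numeric vector)
# targ  - function returning the log target density, up to a constant
# scale - standard deviation of the Gaussian proposal (assumed here)
metropolis_step <- function(x, targ, scale = 1) {
  proposal <- x + rnorm(length(x), mean = 0, sd = scale)
  # Symmetric proposal, so the acceptance ratio only needs the target.
  log_ratio <- targ(proposal) - targ(x)
  if (log(runif(1)) < log_ratio) proposal else x
}

# Run the chain for n_iter steps and store every state as a matrix row.
run_chain <- function(x0, targ, n_iter, scale = 1) {
  chain <- matrix(NA_real_, nrow = n_iter, ncol = length(x0))
  x <- x0
  for (i in seq_len(n_iter)) {
    x <- metropolis_step(x, targ, scale)
    chain[i, ] <- x
  }
  chain
}

# Usage: sample a standard normal from its unnormalized log density.
set.seed(1)
log_target <- function(x) -0.5 * sum(x^2)
draws <- run_chain(x0 = 0, targ = log_target, n_iter = 5000, scale = 1)
```

Because only the difference of log densities is used, the normalizing constant never has to be computed. A Metropolis-in-Gibbs variant would apply the same accept/reject step to one coordinate of x at a time.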


In defense of wrapr::let()

June 1, 2017

Saw this the other day:

In defense of wrapr::let() (originally part of replyr, and still re-exported by that package) I would say: let() was deliberately designed for a single real-world use case: working with data when you don’t know the column names when you are writing the code (i.e., the column names will come later …


Statistical computing with Scala free on-line course

May 31, 2017

I’ve recently delivered a three-day intensive short-course on Scala for statistical computing and data science. The course seemed to go well, and the experience has convinced me that Scala should be used a lot more by statisticians and data scientists for a range of problems in statistical computing. In particular, the simplicity of writing fast …


Why to use wrapr::let()

May 2, 2017

I have written about referential transparency before. In this article I would like to discuss “leaky abstractions” and why wrapr::let() supplies a useful (but leaky) abstraction for R programmers.

Abstractions

A common definition of an abstraction is (from the OSX dictionary): the process of considering something independently of its associations, attributes, or concrete accompaniments. In …


Teaching pivot / un-pivot

April 11, 2017

Authors: John Mount and Nina Zumel

Introduction

In teaching thinking in terms of coordinatized data we find the hardest operations to teach are joins and pivot. One thing we commented on is that moving data values into columns, or into a “thin” or entity/attribute/value form (often called “un-pivoting”, “stacking”, “melting” or “gathering”) is easy to …
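To make the two operations named in the excerpt concrete, here is a small base-R sketch, offered as an illustration and not as the authors' method or any package's API. The data frame wide, its columns id, x, and y, and the key/value column names are all made up for the example.

```r
# A tiny "wide" table: one row per entity, one column per measurement.
wide <- data.frame(id = c(1, 2), x = c(10, 20), y = c(30, 40))
measure_cols <- c("x", "y")

# Un-pivot: one row per (entity, attribute) pair, the "thin"
# entity/attribute/value form.
tall <- do.call(rbind, lapply(measure_cols, function(col) {
  data.frame(id = wide$id, key = col, value = wide[[col]],
             stringsAsFactors = FALSE)
}))

# Pivot: rebuild one column per key, aligning rows on id.
back <- data.frame(id = unique(tall$id))
for (col in unique(tall$key)) {
  rows <- tall[tall$key == col, ]
  back[[col]] <- rows$value[match(back$id, rows$id)]
}

print(tall)
print(back)
```

The round trip recovers the original columns, which is one way to check that the (id, key) pairs really do act as coordinates for each value.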


Coordinatized Data: A Fluid Data Specification

March 29, 2017

Authors: John Mount and Nina Zumel.

Introduction

It has been our experience when teaching the data wrangling part of data science that students often have difficulty understanding the conversion between row-oriented and column-oriented data formats (what is commonly called pivoting and un-pivoting).

[Illustration: Boris Artzybasheff]

Real trust and understanding of this concept doesn’t …


Superpixels in imager

March 24, 2017

Superpixels are used in image segmentation as a pre-processing step. Instead of segmenting pixels directly, we first group similar pixels into “super-pixels”, which can then be processed further (and more cheaply). (image from Wikimedia) The current version of imager doesn’t implement them, but it turns out that SLIC superpixels are particularly easy to implement. SLIC […]


