m x n matrix with randomly assigned 0/1

August 28, 2012
By

(This article was originally published at Quantum Forest » rblogs, and syndicated at StatsBlogs.)

Today Scott Chamberlain tweeted asking for a better/faster solution to building an m x n matrix with randomly assigned 0/1. He already had a working version:

r <- 1000
c <- 1000
m0 <- matrix(0, r, c)
apply(m0, c(1,2), function(x) sample(c(0,1),1)) 

Now, I’m the first to acknowledge that I’ve never got the ‘apply’ family of functions and that—thanks Hadley—if I need to do something like that I go for the plyr package. Nevertheless, if you think about it, the starting point is a sugary version of a loop; I like loops because they are explicit and clear, but they can be very slow in R. The loop could be written as:

m00 <- matrix(0, r, c)
for(i in 1:r){
  for(j in 1:c){
    m00[i, j] <- sample(c(0,1),1)
  }
}

In contrast, my first idea was to generate a bunch of uniformly distributed [0, 1) random numbers and round them to the closest integer, which is a more 'matricy' way of thinking:

m1 <- round(matrix(runif(r*c), r, c)))

and it happens to be a lot faster. My second idea, and Edmund Hart beat me to that one, was to use something like:

m2 <- matrix(rbinom(r*c,1,0.5),r,c)

which is a nice option, generating r*c random numbers following a binomial distribution, which also has the advantage of allowing for different probabilities, rather than 0.5 as in m1. This is also faster than m1, although probably one would not notice the difference until working with fairly big matrices. When Scott pointed out the speed bump he got me thinking about order of the operations; would this be better?

m3 <- matrix(round(runif(r*c)), r, c)

In terms of speed, m3 fits between m1 and m2, so the order can make a difference.

David Smith and Rafael Maia came up with a different approach, using sample() which I had not considered at all. m4 has the advantage that one could use any number of values to randomize.

m4 <- matrix(sample(0:1,r*c, replace=TRUE),r,c)

Any of these options can be timed using the system.time() function as in system.time(m3 <- matrix(round(runif(r*c)), r, c)). It's interesting how different people come up with alternative strategies (reflecting different ways of thinking about the problem) using exactly the same language. I'm not sure what should be the 'obvious' solution; but for almost any purpose any of them (with the exception of the explicit loop) will work just fine.

Gratuitous picture: training horses in the beach (Photo: Luis).

P.S. 2012-08-29, 22:33. Another way that would work (although 4 times slower than m1) would be m5 <- matrix(ifelse(runif(r*c) < 0.5, 0, 1)). Still, it takes less than 0.5 seconds for a 1,000,000 elements matrix.
P.S. 2012-09-04. Scott Chamberlain wrote his own blog post discussing the different suggestions. Even cooler, Dirk Eddelbuettel implemented even faster creation of random 1/0 matrices using Rcpp, with inline C++ code. Much faster in a crazy overkill sort of way.



Please comment on the article here: Quantum Forest » rblogs

Tags: , ,


Subscribe

Email:

  Subscribe