I've just been trying out the parallel library in R, which provides a pretty simple means of using multiple processors for calculations. It comes with some nice functions like pvec, which applies a function to a vector after splitting it across multiple processors. The example given in the help page converts date strings to Unix (POSIXct) format - the parallel method ought to be faster than the standard single-core approach. However, let's see what happens.
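Before the benchmark, here's a minimal sketch of what pvec promises (my own illustration, not taken from the package docs): the result should match applying the function to the whole vector, just computed chunk-by-chunk in forked child processes.

library(parallel)
# pvec(v, FUN) splits v into chunks, applies FUN to each chunk in a forked
# child process (Unix-alikes only), and concatenates the results - so it
# should be interchangeable with a plain FUN(v)
identical(pvec(1:10, sqrt, mc.cores = 2), sqrt(1:10))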
Create a vector of 10,000 random dates.
library(parallel)
N <- 10000
dates <- sprintf("%04d-%02d-%02d", as.integer(2000 + rnorm(N)), as.integer(runif(N,
1, 12)), as.integer(runif(N, 1, 28)))
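As a quick sanity check (not part of the original example), each element should be an ISO-8601 "YYYY-MM-DD" string:

# inspect the first few generated strings (values vary run to run)
head(dates)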
Time how long it takes for the standard function:
system.time(a <- as.POSIXct(dates))
## user system elapsed
## 2.356 0.003 2.359
Try it with four cores:
options(mc.cores = 4)
system.time(b <- pvec(dates, as.POSIXct))
## user system elapsed
## 15.37 127.56 59.44
The multi-core method takes roughly 25 times as long as just using a single processor (59.4 s versus 2.4 s elapsed)! The main time-sink, it seems, is at the system level - there's some serious overhead there. I'm running this on a MacBook Air, on battery power. Hardly a workhorse, but still, what's going on?
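One way to probe this (my own diagnostic, not from the original benchmark): time pvec with a trivial function, so that almost everything measured is the parallel machinery rather than the date parsing.

# if this is also slow, the cost is in forking the children and shipping
# the vector back and forth, not in as.POSIXct itself
system.time(pvec(dates, identity))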