I was curious about Drew's example showing that doing the sum,
\[
\sum_{i=1}^n (-1)^{i+1} \log(i)
\]
in R using a for loop took only
\(\approx 1.3\) times longer than doing it in a vectorized fashion.
That went against most of what I had been told, or had encountered myself,
when programming in R.
Investigating further seemed like a good opportunity to practice with Markdown (and maybe get an RPubs account).
So, let's recreate it. Here is the function that uses a for loop:
f1 <- function(ind) {
    sum <- 0
    for (i in ind) {
        # subtract the log of even terms, add the log of odd terms
        if (i %% 2 == 0) sum <- sum - log(i) else sum <- sum + log(i)
    }
    sum
}
Here is version 2, which uses sapply:
f2 <- function(ind) {
    # flip the sign term by term, then sum the results
    sum(sapply(X = ind, FUN = function(i) if (i %% 2 == 0) -log(i) else log(i)))
}
And here we have the vectorized version:
f3 <- function(ind) {
    # a vector of +1s, one per element of ind, then flip the even positions
    sign <- replicate(length(ind), 1L)
    sign[seq.int(from = 2, to = length(ind), by = 2)] <- -1L
    sum(sign * log(ind))
}
And while we are at it, let's throw another one into the mix that uses vectorization and relies on recycling so we don't have to type as much:
f4 <- function(ind) {
    # c(1, -1) gets recycled out to the length of log(ind)
    sum(c(1, -1) * log(ind))
}
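Before racing them, a quick sanity check that all four functions actually agree (this little check is my addition, not part of the original post):
ind_check <- 1:1000  # even length, so the recycling in f4 lines up exactly
stopifnot(isTRUE(all.equal(f1(ind_check), f2(ind_check))),
          isTRUE(all.equal(f1(ind_check), f3(ind_check))),
          isTRUE(all.equal(f1(ind_check), f4(ind_check))))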
So, now we rbenchmark these:
library(rbenchmark)
ind <- 1:50000
benchmark(f1(ind), f2(ind), f3(ind), f4(ind), replications = 10)
##      test replications elapsed relative user.self sys.self user.child sys.child
## 1 f1(ind)           10   0.985    31.77     0.981    0.005          0         0
## 2 f2(ind)           10   2.378    76.71     2.360    0.018          0         0
## 3 f3(ind)           10   0.748    24.13     0.736    0.011          0         0
## 4 f4(ind)           10   0.031     1.00     0.027    0.004          0         0
Holy Crap Batman! We see here that f1 fared nearly as well as f3 not
because f1 did well, but rather because f3 did quite poorly. (The relative
column is each elapsed time divided by that of the fastest function.)
This, of course, raises the question, “which part of the perfectly fine-looking
R code in f3 is the culprit?”
But before we try to answer that question,
I am going to take a quick digression into microbenchmark, a
package that Hadley Wickham recommends in Advanced R, the book he is
writing (it has a whole section on microbenchmarking). If
you haven't run across this book, do check it out. It is a fabulous resource.
Let's try it:
library(microbenchmark)
microbenchmark(f1(ind), f2(ind), f3(ind), f4(ind), times = 10)
## Unit: milliseconds
##     expr     min      lq  median      uq    max neval
##  f1(ind)  97.311  97.839 102.397 105.304 120.28    10
##  f2(ind) 173.658 183.795 192.045 220.447 276.78    10
##  f3(ind)  71.584  78.554  93.350 100.370 131.16    10
##  f4(ind)   1.626   1.689   2.254   5.087  26.19    10
OK, similar results, but perhaps with more accurate timings. Apparently
microbenchmark is intended for timing only very small pieces of code, but
this is what the authors say about it:
microbenchmark serves as a more accurate replacement of the often seen system.time(replicate(1000, expr)) expression. It tries hard to accurately measure only the time it takes to evaluate expr. To achieve this, the sub-millisecond (supposedly nanosecond) accurate timing functions most modern operating systems provide are used. Additionally all evaluations of the expressions are done in C code to minimize any overhead.
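For reference, the coarser idiom mentioned in that quote looks like this (my illustration, not from the original post); system.time reports much coarser timings, so it needs a big batch of replications to measure fast code at all:
# the older idiom: time a whole batch of evaluations at once
system.time(replicate(100, f4(ind)))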
Let's try some of the cool profiling methods that Drew told us about.
Here we will run f3 20 times while running Rprof()
Rprof(interval = 0.02)  # start the sampling profiler, sampling every 0.02 s
for (i in 1:20) f3(ind)
Rprof(NULL)             # turn the profiler off
summaryRprof()
Now, we don't actually evaluate that chunk while knitting to HTML, because when
evaluated within knitr it gets embedded within some evaluation functions that
make the summaryRprof output rather nasty. So I will just paste it in here:
$by.self
                 self.time self.pct total.time total.pct
"lapply"              0.82    56.94       1.22     84.72
"FUN"                 0.40    27.78       0.40     27.78
"unlist"              0.10     6.94       0.62     43.06
"f3"                  0.06     4.17       1.44    100.00
"*"                   0.04     2.78       0.04      2.78
"unique.default"      0.02     1.39       0.02      1.39

$by.total
                 total.time total.pct self.time self.pct
"f3"                   1.44    100.00      0.06      4.17
"replicate"            1.34     93.06      0.00      0.00
"sapply"               1.34     93.06      0.00      0.00
"lapply"               1.22     84.72      0.82     56.94
"simplify2array"       0.64     44.44      0.00      0.00
"unlist"               0.62     43.06      0.10      6.94
"unique"               0.60     41.67      0.00      0.00
"FUN"                  0.40     27.78      0.40     27.78
"*"                    0.04      2.78      0.04      2.78
"unique.default"       0.02      1.39      0.02      1.39

$sample.interval
[1] 0.02

$sampling.time
[1] 1.44
Aha! There is the problem. Just a typo really. f3 uses replicate where it
probably wants to use rep or even rep.int. Check out the source code for
replicate:
function (n, expr, simplify = "array")
sapply(integer(n), eval.parent(substitute(function(...) expr)),
simplify = simplify)
<bytecode: 0x100eeaf20>
<environment: namespace:base>
replicate is actually a wrapper for sapply, which means that innocent-looking
line in f3 was pushing 50,000 evaluations of 1L through the whole sapply
machinery just to build a vector of ones. That is exactly what the lapply,
FUN, and simplify2array lines in the profile were telling us.
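To see that cost in isolation, here is a quick timing of just the vector-building step (my own aside, not in the original post):
# replicate evaluates 1L via sapply 50,000 times; rep.int just fills a vector
microbenchmark(replicate(50000, 1L), rep.int(1L, 50000), times = 10)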
So, let's see what happens
if we use rep.int
there instead. Make a new function just like f3, but with rep.int, and call
it f5:
f5 <- function(ind) {
    # rep.int builds the vector of 1s directly, with no sapply overhead
    sign <- rep.int(1L, length(ind))
    sign[seq.int(from = 2, to = length(ind), by = 2)] <- -1L
    sum(sign * log(ind))
}
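And a quick check, again my addition, that f5 computes the same value as the others:
stopifnot(isTRUE(all.equal(f4(ind), f5(ind))))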
Then microbenchmark them all:
microbenchmark(f1(ind), f2(ind), f3(ind), f4(ind), f5(ind), times = 10)
## Unit: milliseconds
##     expr     min      lq  median      uq    max neval
##  f1(ind) 101.396 108.730 114.925 121.844 148.53    10
##  f2(ind) 181.568 204.382 213.661 231.774 248.90    10
##  f3(ind)  58.376  62.911  81.815  88.162 100.24    10
##  f4(ind)   1.593   1.614   1.987   5.724  14.18    10
##  f5(ind)   2.488   2.515   5.364   7.019  15.77    10
That seems a little more like it: f5 is now almost as fast as f4. (The small
remaining gap presumably comes from building the index of even positions and
doing the replacement assignment, steps f4 skips entirely thanks to recycling.)
Cool stuff.