To explore the speeding up of R code with the JVM-based implementation of (a subset of) R by the renjin project, and to compare it with other options, including R’s bytecode compiler, Rcpp, and vectorization.
Renjin in general, and the R package that contains renjin are described at http://docs.renjin.org/en/latest/package. It can be installed via the instructions given there (it is currently not on CRAN). Note that it requires a working instance of the R package rJava installed.
library("renjin")
We use this as an example of an algebra-heavy computation that involves a lengthy loop.
We approximate the below infinite sum by one with a large value of m.
\[\begin{equation} \frac{\pi}{4} = \lim_{m\to\infty}\sum_{n=0}^{m} \frac{(-1)^n}{2n+1} \end{equation}\]
compute_pi <- function(m) {
s = 0
sign = 1
for (n in 0:m) {
s = s + sign / (2 * n + 1)
sign = -sign
}
4 * s
}
compute_pi_bcc <- compiler::cmpfun(compute_pi)
Vectorization is often a very appropriate way to get rid of explicit loops in R, and thus to speed up code, and even make it more readable and more robust. However, vectorization also has two potential drawbacks:
When starting from a loop-based version, it requires manual rewriting of code. In the example here, that is simple, even trivial, but for more complex calculations, e.g., if the computations in one loop iteration depend on those from previous iterations, this may not always be easy or possible. (An example for that would be an iterative optimization algorithm.)
It requires the allocation of large vectors; in the example below, of the vector n as well as several temporary expressions that arise in the evaluation of the second line of the below function. A double precision float vector of length \(5\times10^8\) requires 3.7 GB.
compute_pi_vec_1 <- function(m) {
n <- 0:m
4 * sum((-1)^n / (2 * n + 1))
}
With a little bit more thinking, we can come up with a second approach, which avoids calculating (-1)^n and uses vectors that are half the full length.
compute_pi_vec_2 <- function(m) {
n <- seq(0, (m - 1) / 2, by = 2)
den <- 2 * n + 1
4 * (sum(1 / den) - sum(1 / (den + 2)))
}
Iterative algorithms can often be implemented in C / C++. However, one has to be able to program in multiple languages and the development process can be slow.
library("Rcpp")
compute_pi_cpp <- cppFunction("
double compute_pi_cpp(int m) {
double s = 0, sign = 1;
for (int n = 0; n <= m; ++n) {
s = s + sign / (2 * n + 1);
sign = -sign;
}
return 4*s;
}
")
pis <- rep(NA_real_, 6)
m <- 5e8
system.time(pis[1] <- compute_pi(m))
## user system elapsed
## 434.357 9.784 459.340
system.time(pis[2] <- compute_pi_bcc(m))
## user system elapsed
## 32.327 0.100 32.555
Indeed, R will already compile many functions by default, automatically, without your explicit calling of the compiler. This behavior depends on a global state setting of R that is accessed by the enableJIT function in the compiler package. For demonstration purposes, we have above disabled the automatic compilation.
system.time(pis[3] <- renjin(compute_pi(m)))
## user system elapsed
## 2.87 0.03 2.37
About half a second of the above timing is spent on the compilation of the code (rather than its execution) and would be saved when called again.
system.time(pis[4] <- compute_pi_vec_1(m))
## user system elapsed
## 12.042 5.353 17.995
system.time(pis[5] <- compute_pi_vec_2(m))
## user system elapsed
## 2.531 1.308 3.842
system.time(pis[6] <- compute_pi_cpp(m))
## user system elapsed
## 0.630 0.003 0.636
print(pis)
## [1] 3.141593 3.141593 3.141593 3.141593 3.141593 3.141593
max(abs(pis[-length(pis)] - pis[length(pis)]))
## [1] 5.999458e-09
Do something similar as above, but instead of computing \(\pi\), compute the Mandelbrot set: https://en.wikipedia.org/wiki/Mandelbrot_set
Such kinds of speed optimization are a last resort. First, make sure the code addresses the right problem and computes the correct solution. In comparison, this is more important than being fast. Using mathematical reasoning, symmetries, good approximations, the right algorithm and data structures is preferable to mindless optimization of clumsy code.
Parts of the development of renjin was funded by the European Commission through the Horizon 2020 project SOUND: http://www.sound-biomed.eu
devtools::session_info()
## Session info ---------------------------------------------------------------
## setting value
## version R Under development (unstable) (2018-07-16 r74967)
## system x86_64, darwin17.6.0
## ui unknown
## language (EN)
## collate en_US.UTF-8
## tz Europe/Berlin
## date 2018-07-18
## Packages -------------------------------------------------------------------
## package * version date source
## backports 1.1.2 2017-12-13 CRAN (R 3.6.0)
## base * 3.6.0 2018-07-16 local
## BiocStyle * 2.9.3 2018-06-13 Bioconductor
## bookdown 0.7 2018-02-18 CRAN (R 3.6.0)
## compiler 3.6.0 2018-07-16 local
## datasets * 3.6.0 2018-07-16 local
## devtools 1.13.6 2018-06-27 CRAN (R 3.6.0)
## digest 0.6.15 2018-01-28 CRAN (R 3.6.0)
## evaluate 0.10.1 2017-06-24 CRAN (R 3.6.0)
## graphics * 3.6.0 2018-07-16 local
## grDevices * 3.6.0 2018-07-16 local
## htmltools 0.3.6 2017-04-28 CRAN (R 3.6.0)
## knitr 1.20 2018-02-20 CRAN (R 3.6.0)
## magrittr 1.5 2014-11-22 CRAN (R 3.6.0)
## memoise 1.1.0 2017-04-21 CRAN (R 3.6.0)
## methods * 3.6.0 2018-07-16 local
## Rcpp * 0.12.17 2018-05-18 CRAN (R 3.6.0)
## renjin * 0.9.2648 2018-07-12 local
## rJava 0.9-10 2018-05-29 CRAN (R 3.6.0)
## rmarkdown 1.10 2018-06-11 CRAN (R 3.6.0)
## rprojroot 1.3-2 2018-01-03 CRAN (R 3.6.0)
## stats * 3.6.0 2018-07-16 local
## stringi 1.2.3 2018-06-12 CRAN (R 3.6.0)
## stringr 1.3.1 2018-05-10 CRAN (R 3.6.0)
## tools 3.6.0 2018-07-16 local
## utils * 3.6.0 2018-07-16 local
## withr 2.1.2 2018-03-15 CRAN (R 3.6.0)
## xfun 0.3 2018-07-06 CRAN (R 3.6.0)
## yaml 2.1.19 2018-05-01 CRAN (R 3.6.0)