1 Versions of this document

Rmarkdown source: https://gist.github.com/wolfganghuber/909e14e45af6888eec384b82682b3766
Rendered HTML: http://rpubs.com/WolfgangHuber/404211

2 Purpose of this gist

To explore the speeding up of R code with the JVM-based implementation of (a subset of) R by the renjin project, and to compare it with other options, including R’s bytecode compiler, Rcpp, and vectorization.

Renjin in general, and the R package that contains renjin are described at http://docs.renjin.org/en/latest/package. It can be installed via the instructions given there (it is currently not on CRAN). Note that it requires a working instance of the R package rJava installed.

library("renjin")

3 Computing pi via the Leibniz formula

We use this as an example of an algebra-heavy computation that involves a lengthy loop. We approximate the below infinite sum by one with a large value of m.

\[\begin{equation} \frac{\pi}{4} = \lim_{m\to\infty}\sum_{n=0}^{m} \frac{(-1)^n}{2n+1} \end{equation}\]

compute_pi <- function(m) {
    s = 0
    sign = 1
    for (n in 0:m) {
        s = s + sign / (2 * n + 1)
        sign = -sign
    }
    4 * s
}

3.1 R’s built-in byte code compiler

compute_pi_bcc <- compiler::cmpfun(compute_pi)

3.2 Vectorization

Vectorization is often a very appropriate way to get rid of explicit loops in R, and thus to speed up code, and even make it more readable and more robust. However, vectorization also has two potential drawbacks:

When starting from a loop-based version, it requires manual rewriting of code. In the example here, that is simple, even trivial, but for more complex calculations, e.g., if the computations in one loop iteration depend on those from previous iterations, this may not always be easy or possible. (An example for that would be an iterative optimization algorithm.)
It requires the allocation of large vectors; in the example below, of the vector n as well as several temporary expressions that arise in the evaluation of the second line of the below function. A double precision float vector of length \(5\times10^8\) requires 3.7 GB.

compute_pi_vec_1 <- function(m) {
  n <- 0:m
  4 * sum((-1)^n / (2 * n + 1))
}

With a little bit more thinking, we can come up with a second approach, which avoids calculating (-1)^n and uses vectors that are half the full length.

compute_pi_vec_2 <- function(m) {
  n <- seq(0, (m - 1) / 2, by = 2)
  den <- 2 * n + 1
  4 * (sum(1 / den) - sum(1 / (den + 2)))
}

3.3 Rcpp

Iterative algorithms can often be implemented in C / C++. However, one has to be able to program in multiple languages and the development process can be slow.

library("Rcpp")
compute_pi_cpp <- cppFunction("
    double compute_pi_cpp(int m) {
        double s = 0, sign = 1;
        for (int n = 0; n <= m; ++n) {
            s = s + sign / (2 * n + 1);
            sign = -sign;
        }
        return 4*s;
    }
")

4 Benchmark

pis <- rep(NA_real_, 6)
m <- 5e8

4.1 Ordinary uncompiled R function

system.time(pis[1] <- compute_pi(m))

##    user  system elapsed 
## 434.357   9.784 459.340

4.2 Compiled R function

system.time(pis[2] <- compute_pi_bcc(m))

##    user  system elapsed 
##  32.327   0.100  32.555

Indeed, R will already compile many functions by default, automatically, without your explicit calling of the compiler. This behavior depends on a global state setting of R that is accessed by the enableJIT function in the compiler package. For demonstration purposes, we have above disabled the automatic compilation.

4.3 renjin

system.time(pis[3] <- renjin(compute_pi(m)))

##    user  system elapsed 
##    2.87    0.03    2.37

About half a second of the above timing is spent on the compilation of the code (rather than its execution) and would be saved when called again.

4.4 Vectorized

system.time(pis[4] <- compute_pi_vec_1(m))

##    user  system elapsed 
##  12.042   5.353  17.995

system.time(pis[5] <- compute_pi_vec_2(m))

##    user  system elapsed 
##   2.531   1.308   3.842

4.5 Rcpp

system.time(pis[6] <- compute_pi_cpp(m))

##    user  system elapsed 
##   0.630   0.003   0.636

4.6 Check that the results are the same

print(pis)

## [1] 3.141593 3.141593 3.141593 3.141593 3.141593 3.141593

max(abs(pis[-length(pis)] - pis[length(pis)]))

## [1] 5.999458e-09

5 Exercise

Do something similar as above, but instead of computing \(\pi\), compute the Mandelbrot set: https://en.wikipedia.org/wiki/Mandelbrot_set

6 Disclaimer

Such kinds of speed optimization are a last resort. First, make sure the code addresses the right problem and computes the correct solution. In comparison, this is more important than being fast. Using mathematical reasoning, symmetries, good approximations, the right algorithm and data structures is preferable to mindless optimization of clumsy code.

7 Acknowledgements

Parts of the development of renjin was funded by the European Commission through the Horizon 2020 project SOUND: http://www.sound-biomed.eu

8 Session Info

devtools::session_info()

## Session info ---------------------------------------------------------------

##  setting  value                                             
##  version  R Under development (unstable) (2018-07-16 r74967)
##  system   x86_64, darwin17.6.0                              
##  ui       unknown                                           
##  language (EN)                                              
##  collate  en_US.UTF-8                                       
##  tz       Europe/Berlin                                     
##  date     2018-07-18

## Packages -------------------------------------------------------------------

##  package   * version  date       source        
##  backports   1.1.2    2017-12-13 CRAN (R 3.6.0)
##  base      * 3.6.0    2018-07-16 local         
##  BiocStyle * 2.9.3    2018-06-13 Bioconductor  
##  bookdown    0.7      2018-02-18 CRAN (R 3.6.0)
##  compiler    3.6.0    2018-07-16 local         
##  datasets  * 3.6.0    2018-07-16 local         
##  devtools    1.13.6   2018-06-27 CRAN (R 3.6.0)
##  digest      0.6.15   2018-01-28 CRAN (R 3.6.0)
##  evaluate    0.10.1   2017-06-24 CRAN (R 3.6.0)
##  graphics  * 3.6.0    2018-07-16 local         
##  grDevices * 3.6.0    2018-07-16 local         
##  htmltools   0.3.6    2017-04-28 CRAN (R 3.6.0)
##  knitr       1.20     2018-02-20 CRAN (R 3.6.0)
##  magrittr    1.5      2014-11-22 CRAN (R 3.6.0)
##  memoise     1.1.0    2017-04-21 CRAN (R 3.6.0)
##  methods   * 3.6.0    2018-07-16 local         
##  Rcpp      * 0.12.17  2018-05-18 CRAN (R 3.6.0)
##  renjin    * 0.9.2648 2018-07-12 local         
##  rJava       0.9-10   2018-05-29 CRAN (R 3.6.0)
##  rmarkdown   1.10     2018-06-11 CRAN (R 3.6.0)
##  rprojroot   1.3-2    2018-01-03 CRAN (R 3.6.0)
##  stats     * 3.6.0    2018-07-16 local         
##  stringi     1.2.3    2018-06-12 CRAN (R 3.6.0)
##  stringr     1.3.1    2018-05-10 CRAN (R 3.6.0)
##  tools       3.6.0    2018-07-16 local         
##  utils     * 3.6.0    2018-07-16 local         
##  withr       2.1.2    2018-03-15 CRAN (R 3.6.0)
##  xfun        0.3      2018-07-06 CRAN (R 3.6.0)
##  yaml        2.1.19   2018-05-01 CRAN (R 3.6.0)

Speeding up ‘loopy’ R code

18 July 2018

Contents