15 June, 2020

Credits

Should you bother?

Premature optimisation is the root of all evil

~Donald Knuth

  • But …
  • Sometimes slow code is slowing you down
  • A little optimisation goes a long way
  • Nobody wants to be slow

Before you optimise

  1. Be ready to test the correctness of your code
  2. Be ready to test the speed of your code (a sketch of both checks follows below)

Because … common mistakes when optimising are:

  • Writing faster but incorrect code
  • Writing code that isn’t actually faster!
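As a minimal sketch of both checks (the old_version()/new_version() pair here is hypothetical), this can be as simple as:

x <- runif(100)

# Hypothetical example: two implementations of the same calculation
old_version <- function(x) mean(x)
new_version <- function(x) sum(x) / length(x)

# 1. Test correctness: stop if the two versions disagree
stopifnot(isTRUE(all.equal(old_version(x), new_version(x))))

# 2. Test speed: benchmark both versions
bench::mark(
  old_version(x),
  new_version(x)
)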

Microbenchmarking

The bench package provides useful methods to accurately measure the performance of code.

library(dplyr)  # for %>% and select()
library(knitr)  # for kable()

x <- runif(100)

# Time two ways of computing square roots; bench::mark() runs each
# expression many times and collects timing statistics
lb <- bench::mark(
  sqrt(x),
  x ^ 0.5
)

lb %>% select(expression, total_time) %>%
  kable("html")
expression    total_time
sqrt(x)           8.29ms
x^0.5            26.64ms
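Note that by default bench::mark() also checks that all the expressions return the same result, so it helps with testing correctness as well as speed.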

Optimisation Strategies

  • Do as little as possible
  • Avoid I/O calls
  • Avoid copies
  • Vectorise
  • Rewrite your R code in C++
  • Parallelise

Avoiding I/O calls

The slowest possible thing you can ask a computer to do is read from a file on disk.

library(readr)  # for read_rds()

# Reading even a tiny object from disk is far slower than creating it in
# memory (mydata.Rds is assumed to contain the same c("a","b","c") vector)
bench::mark(
  c("a", "b", "c"),
  read_rds("mydata.Rds")
) %>% select(expression, total_time) %>% kable("html")
expression                total_time
c("a", "b", "c")              2.24ms
read_rds("mydata.Rds")       425.8ms

Do as little as possible

Often this is as simple as avoiding needlessly redoing the same thing. What’s wrong with this code?

dna2codons <- function(dna){
  # Split the DNA string into consecutive triplets (codons)
  starts <- seq(1, nchar(dna), by = 3)
  substring(dna, starts, starts + 2)
}

translate <- function(dna){
  # The codon table is re-read from disk on every call!
  codontable <- read_tsv("data/genetic_code.tsv")
  codondict <- codontable$AA
  names(codondict) <- codontable$Codon
  codondict[dna2codons(dna)]
}

dna <- "ATGGGGACCATGAAG"
translate(dna)
## ATG GGG ACC ATG AAG 
## "M" "G" "T" "M" "K"

Do as little as possible

# Read the codon table from disk once, up front
codontable <- read_tsv("data/genetic_code.tsv")
codondict <- codontable$AA
names(codondict) <- codontable$Codon

translate_fast <- function(dna, codondict){
  codondict[dna2codons(dna)]
}

dna <- "ATGGGGACCATGAAG"

bench::mark(
  translate(dna),
  translate_fast(dna,codondict)
) %>% select(expression,total_time) %>% kable("html") 
expression                        total_time
translate(dna)                         491ms
translate_fast(dna, codondict)         220ms

Avoid copies

We made our way into the second Circle, here live the gluttons.

~ The R Inferno

The second slowest thing a computer can do is allocate memory.

Because R mostly manages memory automatically, it’s easy to accidentally write code that is slow because it needlessly copies data (and reallocates memory).

Avoid copies

Here are two ways to make the same vector:

grow <- function(n){
  # Start empty and grow: every c() call copies the whole vector
  vec <- numeric(0)
  for(i in 1:n) vec <- c(vec, i)
  vec
}

assign <- function(n){
  # Preallocate the full vector once, then fill it in place
  vec <- numeric(n)
  for(i in 1:n) vec[i] <- i
  vec
}
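The timings below were presumably produced with the same pattern as the earlier slides, i.e. something like:

bench::mark(
  grow(100),
  assign(100)
) %>% select(expression, total_time) %>% kable("html")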
expression     total_time
grow(100)           365ms
assign(100)        82.5ms

Vectorise

Consider the following code. Can you think of a faster way to do this?

x <- runif(10)

# Loop version: add up the logs one element at a time
lsum <- 0
for(i in 1:10){
  lsum <- lsum + log(x[i])
}

# Vectorised version: one call replaces the whole loop
lsum <- sum(log(x))

Vectorise

Let’s benchmark these approaches.

x <- runif(10)

# Loop version wrapped in a function so we can benchmark it
lsum_loop <- function(x){
  lsum <- 0
  for(i in seq_along(x)){ lsum <- lsum + log(x[i]) }
  lsum
}

bench::mark(
  lsum_loop(x),
  sum(log(x))
) %>% select(expression, total_time) %>% kable("html")
expression     total_time
lsum_loop(x)      14.84ms
sum(log(x))        7.03ms

Rewrite in C++

Vectorisation depends on the availability of appropriate vectorised functions (e.g. log() in the previous example).

Sometimes it is difficult or impossible to rewrite your code to use these built-in functions, e.g. when:

  • You have loops where subsequent iterations depend on previous ones
  • You need to call functions millions of times (e.g. when using recursion)
  • Your problem is just too complex to break down into R’s vectorised functions

In such cases you can use the Rcpp package to write your own C++ functions and call them from R.

Rewrite in C++
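The original example from this slide isn’t preserved here. As a minimal sketch, Rcpp::cppFunction() compiles a C++ function and makes it callable from R; this one is a loop where each iteration depends on the previous one (a cumulative sum):

library(Rcpp)

# Compile a C++ function and expose it to R
cppFunction('
NumericVector cumsum_cpp(NumericVector x) {
  int n = x.size();
  NumericVector out(n);
  double total = 0;
  for (int i = 0; i < n; ++i) {
    total += x[i];   // each iteration depends on the previous one
    out[i] = total;
  }
  return out;
}
')

cumsum_cpp(c(1, 2, 3))
## [1] 1 3 6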

Parallelise

Most modern computers have multiple cores. If your problem can easily be broken into independent chunks, you can most likely achieve a big performance boost simply by running those chunks in parallel on separate cores.

The parallel package provides a really simple way to achieve this. It provides the function mclapply() as a multicore alternative to the usual lapply().*

*On Windows this is slightly messier (mclapply() relies on forking, which Windows doesn’t support) but still easy.

See http://adv-r.had.co.nz/Profiling.html#parallelise

Parallelise

Suppose we have a function which takes some time to do its work.

This function just waits and does nothing, but in a real application it would be doing some operation on a chunk of input data.
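The function’s definition isn’t shown on the slide; following the adv-r chapter linked on the previous slide, it was presumably a function factory along these lines:

# Hypothetical reconstruction: pause(sec) returns a function that
# ignores its argument and just sleeps for sec seconds
pause <- function(sec){
  function(i) Sys.sleep(sec)
}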

We can use lapply() to run it 10 times. In this case our input data consists of the numbers 1 to 10.
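The timed call, again reconstructed, would have been:

system.time(lapply(1:10, pause(0.5)))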

##    user  system elapsed 
##   0.000   0.000   5.025

Parallelise

Since the operation on each chunk of data is independent, we can run each chunk on a separate core using mclapply(). First, how many cores do we have?

library(parallel)
detectCores()
## [1] 6
system.time(mclapply(1:10, pause(0.5), mc.cores = 6))
##    user  system elapsed 
##   0.009   0.011   1.012

We had 6 cores but our speedup was a factor of 5, not 6. Partly this is because 10 tasks don’t divide evenly over 6 cores (some cores have to run two 0.5 s tasks), and partly because there is a little overhead in running all those separate processes.

Profiling

In a complex codebase it can sometimes be hard to spot the slow parts. Modern computers also use lots of clever tricks, so your intuition about what is fast and what is slow can be wrong.

This is where profiling comes in. A profiling tool tracks your code as it runs, measuring the time spent within each function and on each line of code. The profvis package provides this functionality to R programmers.
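As a minimal sketch (not from the original slides), here is profvis() applied to the slow vector-growing loop from the “Avoid copies” slides; it opens an interactive view showing how much time each line takes:

library(profvis)

# Profile the vector-growing loop; the repeated c() copies should dominate
profvis({
  vec <- numeric(0)
  for(i in 1:50000) vec <- c(vec, i)
})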