This talk mostly follows chapters 23 and 24 of the book Advanced R by Hadley Wickham.
Another source of inspiration, and a worthwhile read, is The R Inferno by Patrick Burns
15 June, 2020
Premature optimisation is the root of all evil
~ Donald Knuth
Because … common mistakes when optimising are:
The bench package provides useful methods to accurately measure the performance of code
library(dplyr)   # for select() and %>%
library(knitr)   # for kable()

x <- runif(100)
lb <- bench::mark(
sqrt(x),
x ^ 0.5
)
lb %>% select(expression,total_time) %>%
kable("html")
| expression | total_time |
|---|---|
| sqrt(x) | 8.29ms |
| x^0.5 | 26.64ms |
The slowest possible thing you can ask a computer to do is read from a file on disk
library(readr)   # for read_rds()

bench::mark(
c("a","b","c"),
read_rds("mydata.Rds")
) %>% select(expression,total_time) %>% kable("html")
| expression | total_time |
|---|---|
| c("a", "b", "c") | 2.24ms |
| read_rds("mydata.Rds") | 425.8ms |
Often this is as simple as avoiding needlessly redoing the same thing. What’s wrong with this code?
dna2codons <- function(dna){
  # split a DNA string into consecutive codons (3-base substrings)
  starts <- seq(1, nchar(dna), by = 3)
  substring(dna, starts, starts + 2)
}

translate <- function(dna){
  # read the codon table and build a named lookup vector, then translate
  codontable <- read_tsv("data/genetic_code.tsv")
  codondict <- codontable$AA
  names(codondict) <- codontable$Codon
  codondict[dna2codons(dna)]
}
dna <- "ATGGGGACCATGAAG"
translate(dna)
##   ATG   GGG   ACC   ATG   AAG 
##   "M"   "G"   "T"   "M"   "K" 
# read the codon table once, outside the function
codontable <- read_tsv("data/genetic_code.tsv")
codondict <- codontable$AA
names(codondict) <- codontable$Codon

translate_fast <- function(dna, codondict){
  codondict[dna2codons(dna)]
}
dna <- "ATGGGGACCATGAAG"
bench::mark(
translate(dna),
translate_fast(dna,codondict)
) %>% select(expression,total_time) %>% kable("html")
| expression | total_time |
|---|---|
| translate(dna) | 491ms |
| translate_fast(dna, codondict) | 220ms |
We made our way into the second Circle, here live the gluttons.
~ The R Inferno
The second slowest thing a computer can do is allocate memory.
Because R mostly manages memory automatically, it's easy to accidentally write code that is slow because it needlessly copies data (and reallocates memory).
Here are two ways to make the same vector
grow <- function(n){
  # start from an empty vector and grow it one element at a time,
  # forcing a reallocation and copy at every iteration
  vec <- numeric(0)
  for(i in 1:n) vec <- c(vec, i)
  vec
}

assign <- function(n){
  # pre-allocate the full vector once, then fill it in place
  vec <- numeric(n)
  for(i in 1:n) vec[i] <- i
  vec
}
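The benchmark that produced the timings below is not shown here; following the same pattern as the earlier examples, it would presumably be something like this:

# Sketch only: the bench::mark() call behind the table below
bench::mark(
  grow(100),
  assign(100)
) %>% select(expression, total_time) %>% kable("html")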
| expression | total_time |
|---|---|
| grow(100) | 365ms |
| assign(100) | 82.5ms |
Consider the following code. Can you think of a faster way to do this?
x <- runif(10)
lsum <- 0
for(i in 1:10){
lsum <- lsum + log(x[i])
}
lsum <- sum(log(x))
Let’s benchmark these approaches.
x <- runif(10)
lsum_loop <- function(x){
lsum <- 0
for(i in 1:10){ lsum <- lsum + log(x[i])}
lsum
}
bench::mark(
lsum_loop(x),
sum(log(x))
) %>% select(expression,total_time) %>% kable("html")
| expression | total_time |
|---|---|
| lsum_loop(x) | 14.84ms |
| sum(log(x)) | 7.03ms |
Vectorisation depends on the availability of appropriate vectorised functions (e.g. log in the previous example).
Sometimes it is difficult or impossible to rewrite your code to use these built-in functions, e.g. when:
In such cases you can use the Rcpp package to write your own C++ functions and call them from R.
Rewriting in C++ means that your function will need to be compiled before it can be used. It also involves learning a few new concepts and some C++.
A good place to start (as always) is Hadley Wickham's Advanced R chapter on the topic:
http://adv-r.had.co.nz/Rcpp.html
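As a purely illustrative sketch (the function logSumC below is made up for this example), the log-sum from earlier could be written in C++ and compiled from R with Rcpp::cppFunction():

library(Rcpp)

# Sketch only: a C++ version of the log-sum computed earlier
cppFunction('
double logSumC(NumericVector x) {
  double total = 0;
  for (int i = 0; i < x.size(); ++i) {
    total += std::log(x[i]);
  }
  return total;
}
')

logSumC(runif(10))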
I’ve used this approach twice in R packages and both times it gave me at least a 10x speed improvement
https://github.com/iracooke/AlignStat https://github.com/iracooke/ampir
Most modern computers have multiple cores. If your problem can easily be broken into independent chunks, you can most likely achieve a big performance boost simply by running those chunks in parallel on separate cores.
The parallel package provides a really simple way to achieve this. It provides the function mclapply() as a multicore alternative to the usual lapply()
*On Windows this is slightly messier but still easy (see the socket-cluster sketch below).
Suppose we have a function which takes some time to do its work.
This function just waits and does nothing, but in a real application it would be doing some operation on a chunk of input data.
We can use lapply() to run it 10 times. In this case our input data consists of the numbers 1 to 10.
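The definition of pause() is not shown in this extract; given how it is called below, it is presumably a small function factory along these lines (a sketch, not the original code):

# Sketch only: pause(sec) returns a function that sleeps for sec seconds,
# standing in for real work on a chunk of data
pause <- function(sec) function(i) Sys.sleep(sec)

# Run it 10 times in sequence
system.time(lapply(1:10, pause(0.5)))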
##    user  system elapsed 
##   0.000   0.000   5.025 
Since the operation on each chunk of data is independent we can run each chunk on a separate core using mclapply(). First, how many cores do we have?
library(parallel)
detectCores()
## [1] 6
system.time(mclapply(1:10, pause(0.5), mc.cores = 6))
##    user  system elapsed 
##   0.009   0.011   1.012 
We had 6 cores and our speedup was a factor of 5. This is because there is a little overhead in running all those separate processes.
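As noted earlier, mclapply() relies on forking and so runs serially on Windows. A minimal sketch of the socket-cluster alternative (reusing the same pause() helper) that works there too:

# Sketch only: socket-cluster version, which also works on Windows
cl <- makeCluster(6)                          # start 6 worker processes
system.time(parLapply(cl, 1:10, pause(0.5)))  # same interface as lapply()
stopCluster(cl)                               # always shut the workers down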
In a complex codebase it can sometimes be hard to spot the slow parts. Modern computers also have lots of clever tricks, so sometimes your intuition about what is fast and what is slow can be wrong.
This is where profiling comes in. A profiling tool will track your code as it runs, measuring the time spent within each function and on each line of code. The profvis package provides this functionality to R programmers.
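As a minimal sketch (using the translate() function and dna string defined earlier), profiling with profvis looks like this:

library(profvis)

# Sketch only: profile repeated calls to the slow translate() function
profvis({
  for (i in 1:50) translate(dna)
})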