

Performance Enhancement


Background

The process of improving the efficiency and speed of code execution.

It involves optimizing algorithms, data structures, and implementation techniques to reduce resource consumption and minimize execution time.

In R programming, performance enhancement is essential for handling large datasets, complex computations, and real-time applications effectively.

Many R commands are written in C and thus run fast.

Other commands and your own R code are pure R, which can be slow.


Writing Fast R Code

Vectorization

Utilizing vectorized functions and operations to perform computations on entire vectors or matrices at once, instead of using slower, scalar operations or loops.

Parallel Processing

Packages like parallel, foreach, and doParallel facilitate parallel execution of code across multiple cores, enabling faster computation for tasks that can be parallelized.

Compiled Code Integration

Packages like Rcpp allow for seamless integration of compiled code written in languages like C or C++ with R, enabling faster execution of performance-critical tasks.
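
As a minimal sketch of this route (assuming the Rcpp package and a working C++ compiler are installed; the function name sum_cpp is made up for illustration):

library(Rcpp)

# Compile a small C++ function and expose it to R
cppFunction('
double sum_cpp(NumericVector x) {
  double total = 0;
  for (int i = 0; i < x.size(); i++) total += x[i];
  return total;
}')

sum_cpp(runif(10))   # called like any other R function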


Vectorization

Vectorization is the process of performing operations on entire vectors or matrices at once, rather than iterating over individual elements.

Benefits of Vectorization:

Improved Performance: Vectorized operations are typically faster than equivalent scalar operations or loops.

Simplified Code: Vectorized code is often more concise and easier to read compared to equivalent iterative implementations.

Enhanced Efficiency: Vectorization leverages optimized internal routines and parallel processing capabilities of R.


A Simple Example

Use vectorization in place of explicit loops for a speedup.

The system.time() function can be used to report time consumption; check the user column of the output.

x <- runif(10^7)
y <- runif(10^7)
z <- vector(length = 10^7)
system.time(z <- x + y)
##    user  system elapsed 
##   0.015   0.004   0.018
system.time(for(i in 1:length(z)){z[i] <- x[i]+y[i]})
##    user  system elapsed 
##   0.239   0.002   0.241

Example with a User-Defined Function

Define a function that counts the number of odd numbers in a vector:

odd.count <- function(x){
  count <- 0
  for (i in seq_along(x)){   # visit each element in turn
    if(x[i] %% 2 == 1){      # test whether the element is odd
      count <- count + 1
    }
  }
  return(count)
}

odd.count.vec <- function(x){
  ret <- sum(x %% 2 == 1)    # vectorized: test all elements at once, then sum
  return(ret)
}

Time Consumption

x <- sample(1:10000000, 1000000, replace = TRUE)

system.time(odd.count(x))
##    user  system elapsed 
##   0.068   0.002   0.070
system.time(odd.count.vec(x))
##    user  system elapsed 
##   0.004   0.000   0.003

More Vectorized Functions


ifelse()
which()
any()
all()

For matrices:
rowSums()
colSums()
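
A quick illustration of these helpers on small, made-up inputs:

x <- c(3, 8, 15, 6)

ifelse(x %% 2 == 1, "odd", "even")  # element-wise choice: "odd" "even" "odd" "even"
which(x > 5)                        # indices of elements meeting a condition: 2 3 4
any(x > 10)                         # TRUE if at least one element qualifies
all(x > 2)                          # TRUE only if every element qualifies

m <- matrix(1:6, nrow = 2)
rowSums(m)                          # sum of each row: 9 12
colSums(m)                          # sum of each column: 3 7 11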

Simulation Speedup

Simulations are very common in statistical studies. Here is a simple example that estimates E[max(X, Y)] for two independent standard normal variables.

sim.1 <- function(){
  total <- 0
  nrep <- 1000000
  for (i in 1:nrep){
    xy <- rnorm(2)             # two standard normal draws per iteration
    total <- total + max(xy)
  }
  print(total/nrep)
}

sim.2 <- function(){
  nrep <- 1000000
  xy.mat <- matrix(rnorm(2*nrep), ncol = 2)  # generate all draws at once
  maxs <- pmax(xy.mat[,1], xy.mat[,2])       # pmax() returns the parallel (element-wise) maxima
  return(mean(maxs))
}

Time Consumption

system.time(sim.1())
## [1] 0.5637604
##    user  system elapsed 
##   0.425   0.003   0.428
system.time(sim.2())
##    user  system elapsed 
##   0.039   0.003   0.042

Growing Objects

Growing objects (via c(), cbind(), etc.) inside a loop is inefficient, especially when the object is large: to add a single element or row/column, R has to create a new object and copy across all the existing contents.

This issue becomes worse as the object gets larger.

To speed up, pre-allocate an object large enough to hold all the results and fill it in during the loop.

grow.obj <- function(){
  total.iteration <- 30000
  ret <- NULL
  for (i in 1:total.iteration){
    ret <- c(ret, i)
  }
  return(ret)
}

system.time(grow.obj())
##    user  system elapsed 
##   0.375   0.102   0.476
allocate.obj <- function(){
  total.iteration <- 30000
  ret <- rep(0, total.iteration)
  for (i in 1:total.iteration){
    ret[i] <- i
  }
  return(ret)
}

system.time(allocate.obj())
##    user  system elapsed 
##   0.002   0.000   0.002

A Few More Examples

Element-wise Multiplication

# Scalar Operation

x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)
result_scalar <- numeric(length(x))
for (i in seq_along(x)) {
  result_scalar[i] <- x[i] * y[i]
}

# Vectorized Operation

result_vectorized <- x * y


# Scalar Logarithm

x <- c(1, 2, 3, 4, 5)
result_scalar <- numeric(length(x))
for (i in seq_along(x)) {
  result_scalar[i] <- log(x[i])
}

# Vectorized Logarithm

result_vectorized <- log(x)


# Scalar Function

scalar_function <- function(x) {
  result <- numeric(length(x))
  for (i in seq_along(x)) {
    result[i] <- x[i] ^ 2 + 2 * x[i] + 1
  }
  return(result)
}

# Vectorized Function

vectorized_function <- function(x) {
  return(x^2 + 2 * x + 1)
}

Key Takeaways

Vectorization is a fundamental concept in R for improving code performance and readability.

By leveraging vectorized operations and functions, developers can write more efficient and concise code.

Understanding and utilizing vectorization techniques is essential for optimizing R code for performance-critical tasks.


Vectorization Demo Code

Vectorization

# Goal: compute x = 1/(sqrt(1)+1) + 1/(sqrt(2)+1) + ... + 1/(sqrt(n)+1)

1 / (sqrt(1) + 1) + 1 / (sqrt(2) + 1)
## [1] 0.9142136
get.sum <- function(n){
  ret <- 0
  for (i in 1:n){
    one.ret <- 1 / (sqrt(i) + 1)   # one term of the sum
    ret <- ret + one.ret
  }
  return(ret)
}

get.sum(10)
## [1] 3.238236
system.time(get.sum(2^20))
##    user  system elapsed 
##   0.021   0.000   0.021
# Vectorization
n <- 2^20

sum(1/(sqrt(1:n) + 1))
## [1] 2033.782
system.time(sum(1/(sqrt(1:n) + 1)))
##    user  system elapsed 
##   0.002   0.000   0.003

Growing Objects

grow.obj <- function(){
  total.iteration <- 30000
  ret <- NULL                  # start with an empty object
  for (i in 1:total.iteration){
    ret <- c(ret, i)           # append one value, copying the whole vector each time
  }
  return(ret)
}

system.time(grow.obj())
##    user  system elapsed 
##   0.391   0.106   0.498
allocate.obj <- function(){
  total.iteration <- 30000
  ret <- rep(0, total.iteration)   # create the properly sized vector ahead of time
  for (i in 1:total.iteration){
    ret[i] <- i                    # assign each value to its position in the vector
  }
  return(ret)
}

system.time(allocate.obj())
##    user  system elapsed 
##   0.000   0.000   0.001

Parallel


Background

What

Parallelization is the technique of dividing a task into smaller subtasks and executing them simultaneously on multiple processors or cores.

It aims to improve performance and efficiency by utilizing available computing resources effectively.

Why

Faster Execution: Parallelization allows tasks to be completed more quickly by distributing the workload across multiple processors.

Efficient Resource Utilization: It maximizes the use of available hardware resources, such as multi-core CPUs or distributed computing clusters.

Scalability: Parallelization enables scaling to larger datasets or more complex computations without significantly increasing execution time.

Types

Task Parallelism: Dividing a task into smaller independent subtasks that can be executed in parallel.

Data Parallelism: Distributing data across multiple processing units and performing the same operation on each subset simultaneously.

Pipeline Parallelism: Chaining multiple processing stages together, where each stage operates on different subsets of data concurrently.
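
As a rough sketch (illustrative code, not from the original notes) of how the first two types can look in R:

library(parallel)

cl <- makeCluster(2)

# Data parallelism: the same operation applied to different chunks of the data
chunks <- split(1:1000, rep(1:2, each = 500))
data.par <- parLapply(cl, chunks, function(ch) sum(sqrt(ch)))

# Task parallelism: different, independent tasks run at the same time
tasks <- list(function() mean(rnorm(1e5)),
              function() median(runif(1e5)))
task.par <- parLapply(cl, tasks, function(f) f())

stopCluster(cl)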


Parallel in R

R provides several packages and tools for parallel computing, allowing users to leverage parallelization techniques to speed up computations.

Popular packages include parallel, foreach, doParallel, and future.

At times, R programs can take a long time to run.

Many modern computers contain more than one CPU core.

Parallel computing takes advantage of those cores.

Parallelization usually reduces computation time significantly. However, because of code copying, data copying, and other operating-system overhead, the speedup may not be linear in the number of added CPU cores.
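
One standard way to reason about this limit is Amdahl's law (added here as a reference point, not part of the original notes): if a fraction p of the work can be parallelized, the best possible speedup on n cores is 1 / ((1 - p) + p / n).

# Amdahl's law: theoretical best speedup with parallelizable fraction p on n cores
amdahl <- function(p, n) 1 / ((1 - p) + p / n)

amdahl(0.9, 8)    # 90% parallelizable on 8 cores: about 4.7x, not 8x
amdahl(0.9, 100)  # even 100 cores cannot beat 1 / (1 - p) = 10x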


When to Parallelize

Not every task can be parallelized.

For example, a for loop cannot be parallelized if iteration i+1 requires the result of iteration i; the iterations must then run in order, as the sketch below illustrates.
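
As a minimal illustration (made-up loops, not from the original example), the first loop below has independent iterations and could be parallelized, while the second could not:

# Independent iterations: each result depends only on i,
# so the work can be distributed across cores.
res <- numeric(10)
for (i in 1:10) res[i] <- sqrt(i)

# Dependent iterations: x[i+1] needs x[i], so the loop must run in order.
x <- numeric(10)
x[1] <- 1
for (i in 1:9) x[i+1] <- 0.5 * x[i] + 1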


Simple Example

Suppose we have a large dataset and need to perform time-consuming computations on each row.

Serial processing

result <- lapply(1:nrow(data), function(i) {
  # Perform the time-consuming computation on row i of data
})

Parallel processing

library(doParallel)
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
# Create a cluster
cl <- makeCluster(detectCores())

# Register the cluster for parallel processing
registerDoParallel(cl)  

result <- foreach(i = 1:nrow(data), .combine = rbind) %dopar% {
  # Perform the time-consuming computation on row i of data
}

# Stop the cluster
stopCluster(cl)
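
For a self-contained version of the same pattern (a sketch with a made-up 20-row dataset and a stand-in computation):

library(doParallel)

cl <- makeCluster(2)      # a small cluster for illustration
registerDoParallel(cl)

data <- matrix(rnorm(20 * 5), nrow = 20)  # hypothetical 20 x 5 dataset

result <- foreach(i = 1:nrow(data), .combine = rbind) %dopar% {
  # stand-in for an expensive per-row computation
  c(row = i, mean = mean(data[i, ]), sd = sd(data[i, ]))
}

stopCluster(cl)
head(result)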

Functions with Built-In Multiple Core Support

Some computation-intensive R functions / packages have built-in multiple-core support.

E.g., the xgboost package by Chen et al. (the nthread argument sets the number of cores):

bst <- xgboost(data = sparse_matrix, label = output_vector, max_depth = 4, eta = 1, nthread = 2, nrounds = 10, objective = "binary:logistic")

Other packages with built-in multiple-core support are listed in the "Parallel computing: Applications" section of the CRAN High-Performance Computing task view:

https://cran.r-project.org/web/views/HighPerformanceComputing.html


Use mclapply()

For user-defined functions, you can use mclapply() on Mac or Linux machines (it relies on process forking, which is not available on Windows):

library(parallel)
library(MASS)

starts <- rep(100, 40)
fx <- function(nstart){
  # Perform k-means clustering on a data matrix Boston.
  return(kmeans(Boston, centers = 4, nstart=nstart))
}
numCores <- detectCores()

system.time(results <- lapply(starts, fx))
##    user  system elapsed 
##   0.421   0.019   0.441
system.time(results <- mclapply(starts, fx, mc.cores = numCores))
##    user  system elapsed 
##   0.452   0.156   0.083

parLapply() on Windows

starts <- rep(100, 400)
data("Boston")

# Serial
start.time <- proc.time()
results <- lapply(starts, fx)
print(proc.time() - start.time)
##    user  system elapsed 
##    8.73    0.11    8.86

# Parallel: on Windows, create a cluster and export the objects the workers need
start.time <- proc.time()
cl <- makeCluster(8)
clusterExport(cl = cl, c('Boston'))
results <- parLapply(cl, starts, fx)
print(proc.time() - start.time)
##    user  system elapsed 
##    0.02    0.09    5.12
stopCluster(cl)


Parallel Demo Code

# Demonstration of parallel computing

library(MASS)
library(parallel)

#----------------
# Demonstration 1
#----------------

# This example involves k-means clustering of the Boston data set.
# We will check the performance before and after parallelization.

data("Boston")
fx <- function(nstart){
  # Perform k-means clustering on the Boston data matrix
  return(kmeans(Boston, centers = 4, nstart = nstart))  # partition the data into 4 clusters
}


starts <- rep(100, 400)  # the value 100, repeated 400 times

# Serial computation: apply fx to each element of starts
system.time(results <- lapply(starts, fx))
##    user  system elapsed 
##   4.201   0.208   4.409
# Parallel computation
numCores <- detectCores()
cl <- makeCluster(8)
clusterExport(cl = cl, c('Boston'))   # ship Boston to each worker
system.time(results <- parLapply(cl, starts, fx))
##    user  system elapsed 
##   0.002   0.000   0.620
stopCluster(cl)

#----------------
# Demonstration 2
#----------------
# We create a function that estimates the value of pi using a Monte Carlo simulation.
# We will parallelize this function using the parallel package in R and compare
# the execution time of the parallelized version with the serial version.

# Define function to estimate pi using Monte Carlo simulation
simulate_monte_carlo <- function(n) {
  count_inside_circle <- 0
  for (i in 1:n) {                     # one random point per iteration
    x <- runif(1, min = -1, max = 1)   # random x and y coordinates in [-1, 1]
    y <- runif(1, min = -1, max = 1)
    if (x^2 + y^2 <= 1) {              # check whether the point falls inside the unit circle
      count_inside_circle <- count_inside_circle + 1
    }
  }
  return(4 * count_inside_circle / n)  # fraction inside the circle estimates pi/4
}

# Set up parameters
num_iterations <- 1000000

# Serial computation
system.time(pi_serial <- simulate_monte_carlo(num_iterations))
##    user  system elapsed 
##   0.682   0.004   0.687
# Parallel computation
num_cores <- detectCores()
cl <- makeCluster(num_cores)  # use every available core
clusterExport(cl, c('simulate_monte_carlo'))
# Split the iterations evenly across the workers:
system.time(pi_parallel <- parSapply(cl, rep(num_iterations / num_cores, num_cores), simulate_monte_carlo))
##    user  system elapsed 
##   0.001   0.000   0.075
# Handing the whole workload to a single worker gives no speedup:
system.time(pi_parallel <- parSapply(cl, num_iterations, simulate_monte_carlo))
##    user  system elapsed 
##   0.000   0.000   0.662
stopCluster(cl)