Performance enhancement is the process of improving the efficiency and speed of code execution.
It involves optimizing algorithms, data structures, and implementation techniques to reduce resource consumption and minimize execution time.
In R programming, performance enhancement is essential for handling large datasets, complex computations, and real-time applications effectively.
Many R commands are written in C and thus run fast.
Other commands, and your own R code, are pure R and can be slow.
Utilizing vectorized functions and operations to perform computations on entire vectors or matrices at once, instead of using slower scalar operations or loops.
Packages like parallel, foreach, and doParallel facilitate parallel execution of code across multiple cores, enabling faster computation for tasks that can be parallelized.
Packages like Rcpp allow for seamless integration of compiled code written in languages like C or C++ with R, enabling faster execution of performance-critical tasks.
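As a minimal sketch of this approach (assuming the Rcpp package and a working C++ toolchain are installed; sumLoopC is an illustrative function, not from any particular package):
library(Rcpp)
# Compile a small C++ function from inside R and call it like any R function.
cppFunction('
double sumLoopC(NumericVector x) {
  double total = 0;
  for (int i = 0; i < x.size(); i++) {
    total += x[i];
  }
  return total;
}
')
sumLoopC(runif(10^6)) # the loop runs at compiled-code speed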
Vectorization is the process of performing operations on entire vectors or matrices at once, rather than iterating over individual elements.
Improved Performance: Vectorized operations are typically faster than equivalent scalar operations or loops.
Simplified Code: Vectorized code is often more concise and easier to read compared to equivalent iterative implementations.
Enhanced Efficiency: Vectorization leverages optimized internal routines and parallel processing capabilities of R.
Use vectorization to replace loops for a speedup.
system.time() can be used to report time consumption; check the user time.
x <- runif(10^7)
y <- runif(10^7)
z <- vector(length = 10^7)
system.time(z <- x + y)
## user system elapsed
## 0.015 0.004 0.018
system.time(for(i in 1:length(z)){z[i] <- x[i]+y[i]})
## user system elapsed
## 0.239 0.002 0.241
Define a function that counts the number of odd numbers in a vector.
odd.count <- function(x){ # loop version: check each element in turn
  c <- 0
  for (i in 1:length(x)){
    if(x[i] %% 2 == 1){
      c <- c + 1
    }
  }
  return(c)
}
odd.count.vec <- function(x){ # vectorized version: one logical test on the whole vector
  ret <- sum(x %% 2 == 1)
  return(ret)
}
x <- sample(1:10000000, 1000000, replace = TRUE)
system.time(odd.count(x))
## user system elapsed
## 0.068 0.002 0.070
system.time(odd.count.vec(x))
## user system elapsed
## 0.004 0.000 0.003
Other useful vectorized helper functions in base R (a short sketch follows this list):
ifelse()
which()
any()
all()
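For example (the vector x here is just an illustrative choice):
x <- c(3, 8, 15, 22, 7)
ifelse(x %% 2 == 1, "odd", "even") # element-wise conditional
which(x > 10) # indices of the elements that satisfy a condition
any(x > 20) # TRUE if at least one element satisfies the condition
all(x > 0) # TRUE if every element satisfies the condition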
Simulations are very common in statistical studies. Here is a simple simulation example.
sim.1 <- function(){ # loop version: simulate one pair at a time
  sum <- 0
  nrep <- 1000000
  for (i in 1:nrep){
    xy <- rnorm(2)
    sum <- sum + max(xy)
  }
  print(sum/nrep)
}
sim.2 <- function(){ # vectorized version: simulate all pairs at once
  nrep <- 1000000
  xy.mat <- matrix(rnorm(2*nrep), ncol=2)
  maxs <- pmax(xy.mat[,1], xy.mat[,2]) # pmax() returns the parallel (element-wise) maxima of its inputs
  return(mean(maxs))
}
system.time(sim.1())
## [1] 0.5637604
## user system elapsed
## 0.425 0.003 0.428
system.time(sim.2())
## user system elapsed
## 0.039 0.003 0.042
Growing objects (via c(), cbind(), etc.) during a loop is inefficient, especially when the object is large: R has to create a new object and copy across the existing contents just to add one element or row/column.
This issue gets worse as the object grows.
To speed up, pre-allocate an object to hold all the results and fill it in during the loop.
grow.obj <- function(){
  total.iteration <- 30000
  ret <- NULL
  for (i in 1:total.iteration){
    ret <- c(ret, i)
  }
  return(ret)
}
system.time(grow.obj())
## user system elapsed
## 0.375 0.102 0.476
allocate.obj <- function(){
  total.iteration <- 30000
  ret <- rep(0, total.iteration)
  for (i in 1:total.iteration){
    ret[i] <- i
  }
  return(ret)
}
system.time(allocate.obj())
## user system elapsed
## 0.002 0.000 0.002
Element-wise Multiplication
# Scalar Operation
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)
result_scalar <- numeric(length(x))
for (i in seq_along(x)) {
  result_scalar[i] <- x[i] * y[i]
}
# Vectorized Operation
result_vectorized <- x * y
# Scalar Logarithm
x <- c(1, 2, 3, 4, 5)
result_scalar <- numeric(length(x))
for (i in seq_along(x)) {
  result_scalar[i] <- log(x[i])
}
# Vectorized Logarithm
result_vectorized <- log(x)
# Scalar Function
scalar_function <- function(x) {
  result <- numeric(length(x))
  for (i in seq_along(x)) {
    result[i] <- x[i] ^ 2 + 2 * x[i] + 1
  }
  return(result)
}
# Vectorized Function
vectorized_function <- function(x) {
  return(x^2 + 2 * x + 1)
}
Vectorization is a fundamental concept in R for improving code performance and readability.
By leveraging vectorized operations and functions, developers can write more efficient and concise code.
Understanding and utilizing vectorization techniques is essential for optimizing R code for performance-critical tasks.
Vectorization
# x = 1/(sqrt(1)+1) + 1/(sqrt(2)+1) + ... + 1/(sqrt(n)+1)
1 / (sqrt(1) + 1) + 1 / (sqrt(2) + 1)
## [1] 0.9142136
get.sum <- function(n){
  ret <- 0
  for (i in 1:n){
    one.ret <- 1 / (sqrt(i) + 1)
    ret <- ret + one.ret
  }
  return(ret)
}
get.sum(10)
## [1] 3.238236
system.time(get.sum(2^20))
## user system elapsed
## 0.021 0.000 0.021
#Vectorization
n <- 2^20
sum(1/(sqrt(1:n) + 1))
## [1] 2033.782
system.time(sum(1/(sqrt(1:n) + 1)))
## user system elapsed
## 0.002 0.000 0.003
Growing Objects
grow.obj <- function(){
  total.iteration <- 30000
  ret <- NULL # create an empty object
  for (i in 1:total.iteration){ # for 1:30000
    ret <- c(ret, i) # append i to the growing vector (copies the whole vector each time)
  }
  return(ret)
}
system.time(grow.obj())
## user system elapsed
## 0.391 0.106 0.498
allocate.obj <- function(){
  total.iteration <- 30000
  ret <- rep(0, total.iteration) # create the properly sized vector ahead of time
  for (i in 1:total.iteration){
    ret[i] <- i # assign each value to its position in the vector
  }
  return(ret)
}
system.time(allocate.obj())
## user system elapsed
## 0.000 0.000 0.001
Parallelization is the technique of dividing a task into smaller subtasks and executing them simultaneously on multiple processors or cores.
It aims to improve performance and efficiency by utilizing available computing resources effectively.
Faster Execution: Parallelization allows tasks to be completed more quickly by distributing the workload across multiple processors.
Efficient Resource Utilization: It maximizes the use of available hardware resources, such as multi-core CPUs or distributed computing clusters.
Scalability: Parallelization enables scaling to larger datasets or more complex computations without significantly increasing execution time.
Task Parallelism: Dividing a task into smaller independent subtasks that can be executed in parallel.
Data Parallelism: Distributing data across multiple processing units and performing the same operation on each subset simultaneously (see the sketch after this list).
Pipeline Parallelism: Chaining multiple processing stages together, where each stage operates on different subsets of data concurrently.
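A minimal sketch of data parallelism (the vector, the chunking into 4 subsets, and the sum operation are illustrative assumptions):
library(parallel)
x <- runif(10^6)
chunks <- split(x, cut(seq_along(x), 4)) # 4 roughly equal subsets
cl <- makeCluster(4)
partial.sums <- parLapply(cl, chunks, sum) # same operation on each subset, in parallel
stopCluster(cl)
total <- Reduce(`+`, partial.sums) # combine the partial results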
R provides several packages and tools for parallel computing, allowing users to leverage parallelization techniques to speed up computations.
Popular packages include parallel, foreach, doParallel, and future.
At times, R programs can take a long CPU time to run.
Many modern computers contain more than one CPU core.
Parallel computing takes advantage of those computation cores.
Parallelization usually reduces computation time significantly. However, due to code copying, data copying, and other traffic in the OS, the speedup may not be linear in the number of CPU cores.
Not every task can be parallelized: for example, a for loop cannot be parallelized if iteration i+1 requires results from iteration i, as in the sketch below.
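A minimal sketch of such a sequential dependence (an AR(1)-style recursion, chosen here purely for illustration):
n <- 10
x <- numeric(n)
x[1] <- rnorm(1)
for (i in 2:n) {
  x[i] <- 0.9 * x[i - 1] + rnorm(1) # iteration i needs x[i - 1] first
}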
Suppose we have a large dataset and need to perform time-consuming computations on each row.
Serial processing
result <- lapply(1:nrow(data), function(i) {
  # Perform a time-consuming computation on row i of data
})
Parallel processing
library(doParallel)
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
# Create a cluster
cl <- makeCluster(detectCores())
# Register the cluster for parallel processing
registerDoParallel(cl)
result <- foreach(i = 1:nrow(data), .combine = rbind) %dopar% {
  # Perform a time-consuming computation on row i of data
}
# Stop the cluster
stopCluster(cl)
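A self-contained sketch of the pattern above (the toy matrix and the row-sum computation are illustrative stand-ins for real data and a real computation):
library(doParallel)
data <- matrix(rnorm(20 * 5), nrow = 20) # toy data with 20 rows
cl <- makeCluster(2) # a small cluster for the sketch
registerDoParallel(cl)
result <- foreach(i = 1:nrow(data), .combine = rbind) %dopar% {
  sum(data[i, ]) # stand-in for a time-consuming per-row computation
}
stopCluster(cl)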
Some computation-intensive R functions / packages have built-in multiple-core support.
E.g. the xgboost package by Chen et al.:
bst <- xgboost(data = sparse_matrix, label = output_vector, max_depth = 4, eta = 1, nthread = 2, nrounds = 10, objective = "binary:logistic")
Other packages that have built-in multiple-core support are listed in the CRAN High-Performance Computing task view: https://cran.r-project.org/web/views/HighPerformanceComputing.html
Parallel computing: Applications
For user-defined functions, you can use mclapply() on Mac or Linux machines:
library(parallel)
library(MASS)
starts <- rep(100, 40)
fx <- function(nstart){
  # Perform k-means clustering on the Boston data matrix.
  return(kmeans(Boston, centers = 4, nstart = nstart))
}
numCores <- detectCores()
system.time(results <- lapply(starts, fx))
## user system elapsed
## 0.421 0.019 0.441
system.time(results <- mclapply(starts, fx, mc.cores = numCores))
## user system elapsed
## 0.452 0.156 0.083
starts <- rep(100, 400)
start.time <- proc.time()
data("Boston")
results <- lapply(starts, fx)
print(proc.time() - start.time)
## user system elapsed
## 8.73 0.11 8.86
For Windows:
start.time <- proc.time()
cl <- makeCluster(8)
clusterExport(cl = cl, c('Boston'))
results <- parLapply(cl, starts, fx)
print(proc.time() - start.time)
stopCluster(cl)
## user system elapsed
## 0.02 0.09 5.12
# Demonstration on Parallel computing
library(MASS)
library(parallel)
#----------------
# Demonstration 1
#----------------
# This example involves k-means clustering of the Boston data set.
# We will check the performance before and after parallelization.
data("Boston")
fx <- function(nstart){
  # Perform k-means clustering on a data matrix Boston.
  return(kmeans(Boston, centers = 4, nstart = nstart)) # creates 4 clusters of data
}
starts <- rep(100, 400) # the value 100, repeated 400 times
# Serial computation
results <- lapply(starts, fx) # apply fx (with nstart = 100) 400 times
system.time(results <- lapply(starts, fx))
## user system elapsed
## 4.201 0.208 4.409
# Parallel computation
numCores <- detectCores()
cl <- makeCluster(8) # use 8 workers here; numCores shows how many are available
clusterExport(cl=cl, c('Boston'))
results <- parLapply(cl,starts, fx)
system.time(results <- parLapply(cl,starts, fx))
## user system elapsed
## 0.002 0.000 0.620
stopCluster(cl)
#----------------
# Demonstration 2
#----------------
# We create a function that estimates the value of pi using a Monte Carlo simulation.
# We will parallelize this function using the parallel package in R and compare
# the execution time of the parallelized version with the serial version.
# Define function to estimate pi using Monte Carlo simulation
simulate_monte_carlo <- function(n) {
  count_inside_circle <- 0
  for (i in 1:n) { # for each iteration
    x <- runif(1, min = -1, max = 1) # generate random x and y coordinates between -1 and 1
    y <- runif(1, min = -1, max = 1)
    if (x^2 + y^2 <= 1) { # check whether the point falls inside the unit circle
      count_inside_circle <- count_inside_circle + 1
    }
  }
  return(4 * count_inside_circle / n)
}
# Set up parameters
num_iterations <- 1000000
# Serial computation
system.time(pi_serial <- simulate_monte_carlo(num_iterations))
## user system elapsed
## 0.682 0.004 0.687
# Parallel computation
num_cores <- detectCores()
cl <- makeCluster(num_cores) #uses every available core
clusterExport(cl, c('simulate_monte_carlo'))
system.time(pi_parallel <- parSapply(cl, rep(num_iterations / num_cores, num_cores), simulate_monte_carlo))
## user system elapsed
## 0.001 0.000 0.075
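Note that each worker returns its own estimate of pi, so the per-core results still need to be combined, e.g. with mean(pi_parallel).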
system.time(pi_parallel <- parSapply(cl, num_iterations, simulate_monte_carlo)) # a single full-size task gains nothing from the cluster
## user system elapsed
## 0.000 0.000 0.662
stopCluster(cl)