The goal of this tutorial is to learn how to create vectors efficiently, so that the code uses as little computing time as possible.
Let’s see three different methods for creating the same vector: the `:` operator, `seq()` (with and without an explicit step), and a `for` loop.
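# Method 1: the colon operator
x <- 1:5
# Method 2: seq(), without and with an explicit step
x_seq <- seq(from = 1, to = 5)
x_seq1 <- seq(from = 1, to = 5, by = 1)
# Method 3: start from an empty integer vector and fill it in a loop
x_loop <- vector(mode = "integer", length = 0)
for (i in 1:5) {
  x_loop[i] <- i
}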
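Before comparing speed, it can help to confirm that the vectors really hold the same values. The following is only a minimal sketch using base R's `all.equal()` and `typeof()`; it repeats the definitions above so that it runs on its own.

```r
# Recreate the vectors from the three methods
x <- 1:5
x_seq <- seq(from = 1, to = 5)
x_seq1 <- seq(from = 1, to = 5, by = 1)
x_loop <- vector(mode = "integer", length = 0)
for (i in 1:5) {
  x_loop[i] <- i
}

# all.equal() compares values with a small numeric tolerance,
# so it ignores the difference between integer and double storage
all.equal(x, x_seq)
all.equal(x, x_seq1)
all.equal(x, x_loop)

# Show how each vector is stored internally
sapply(list(x = x, x_seq = x_seq, x_seq1 = x_seq1, x_loop = x_loop), typeof)
```

The values match in every case. The storage type can differ, though: passing `by = 1` to `seq()` typically yields a double vector rather than an integer one, which is why the sketch uses `all.equal()` instead of the stricter `identical()`.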
All of these methods produce a vector with the same values. But are they equally efficient? Let’s check with the microbenchmark package, which measures how long each expression takes to run.
library(microbenchmark)
microbenchmark(
  x <- 1:5,
  x_seq <- seq(from = 1, to = 5),
  x_seq1 <- seq(from = 1, to = 5, by = 1),
  x_loop <- vector(mode = "integer", length = 0),
  for (i in 1:5) {
    x_loop[i] <- i
  }
)
## Unit: nanoseconds
##                                            expr     min        lq
##                                        x <- 1:5       0       0.0
##                  x_seq <- seq(from = 1, to = 5)    5531    6716.0
##         x_seq1 <- seq(from = 1, to = 5, by = 1)   15802   19160.5
##  x_loop <- vector(mode = "integer", length = 0)       0     395.0
##               for (i in 1:5) { x_loop[i] <- i } 1409187 1461927.5
##        mean  median        uq     max neval
##      359.60     395     790.0    1580   100
##    10168.88    8296   13037.0   38321   100
##    28452.43   26469   34172.5   63210   100
##     1189.25    1185    1580.0   11062   100
##  1596208.69 1526125 1596643.0 2552496   100
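As the results show, the plain `:` operator is by far the fastest way to create this vector. `seq()` is slower because the call passes through additional R code before the vector is built, and the loop is the slowest of all because it fills the vector one element at a time. When creating a vector, prefer the simplest built-in construct and avoid unnecessary function calls and loops.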
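One way to see where the extra work comes from is to inspect the functions themselves. The sketch below uses only base R (`is.primitive()`, `typeof()`); it is an illustration of the difference in how the two are implemented, not a full explanation of the timings above.

```r
# `:` is a primitive: the call is handed directly to compiled code
is.primitive(`:`)

# seq() is an ordinary R function (a closure), not a primitive
is.primitive(seq)
typeof(seq)

# Printing seq shows that it only dispatches to a method;
# for plain numbers that method is seq.default(), itself written in R
seq
is.function(seq.default)
```

So a call to `seq()` has to go through method dispatch and the argument checks inside `seq.default()` before any vector is produced, while `:` skips all of that.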
Although several methods reach the same result, it pays to look for the most efficient one: running inefficient code on large amounts of data will keep you waiting much longer than efficient code would.
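To get a feeling for how this plays out on larger data, the comparison can be repeated with a longer vector. The following is a rough sketch under stated assumptions: the size `n` and the helper functions `grow()` and `prealloc()` are hypothetical names introduced here for illustration, and base R's `system.time()` is used instead of microbenchmark so that the code runs without extra packages. Timings will vary between machines and R versions.

```r
n <- 1e5  # hypothetical vector length, chosen only for illustration

# Hypothetical helper: grow an empty vector one element at a time,
# as in the loop example above
grow <- function(n) {
  out <- vector(mode = "integer", length = 0)
  for (i in seq_len(n)) {
    out[i] <- i
  }
  out
}

# Hypothetical helper: allocate the full length first, then fill it in
prealloc <- function(n) {
  out <- integer(n)
  for (i in seq_len(n)) {
    out[i] <- i
  }
  out
}

# All three approaches produce the same integer vector
x_colon <- 1:n
x_grow <- grow(n)
x_pre <- prealloc(n)
stopifnot(identical(x_colon, x_grow), identical(x_colon, x_pre))

# Compare the elapsed time of each approach
system.time(1:n)
system.time(grow(n))
system.time(prealloc(n))
```

If a loop cannot be avoided, allocating the vector at its final length before filling it is generally a safer habit than growing it from length zero.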