1 Goal


The goal of this tutorial is to become familiar with R's apply family of functions through typical examples.


2 Data preparation


# First we load the libraries (ggplot2 is not needed by the examples below; iris ships with R itself)
library(ggplot2)

# In this example we use iris, the classic flower classification dataset that ships with R.
data("iris")
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# Remove the only non-numeric variable (Species) so that functions such as mean() work on every column
iris$Species <- NULL
str(iris)
## 'data.frame':    150 obs. of  4 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
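
Setting a column to NULL changes the working copy of iris for the rest of the session. A minimal sketch of an alternative, assuming you would rather keep the original data frame intact: select the numeric columns into a new object. The names `numeric_cols` and `iris_num` are hypothetical and do not appear elsewhere in this tutorial.

```r
# Flag the numeric columns; sapply() is covered later in this tutorial
numeric_cols <- sapply(datasets::iris, is.numeric)

# iris_num is a hypothetical name for a numeric-only copy of the dataset
iris_num <- datasets::iris[, numeric_cols]
names(iris_num)
```

The original dataset still has its Species column, which is convenient if you later want group-wise summaries.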

3 Apply family of functions

3.1 The apply function


# The apply family of functions is an alternative to writing explicit loops in R
# Each member applies a function to pieces of a data structure (matrix, vector, list, data frame, etc.)

# The apply function takes three main arguments
# The first one is the matrix or array to work with (a data frame is converted to a matrix first)
# The second one is the margin: 1 applies the function to rows, 2 to columns
# The third one is the function to apply to each row or column

# Example: calculating the average of each column 

apply(iris, 2, mean)
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##     5.843333     3.057333     3.758000     1.199333
# Example: calculating the average of each row
head(apply(iris, 1, mean))
## [1] 2.550 2.375 2.350 2.350 2.550 2.850
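
The three arguments described above can be sketched with a custom function. This is an illustrative example, not part of the original tutorial: `iris_num`, `m` and `col_ranges` are hypothetical names, and the numeric columns are taken directly from `datasets::iris` so the block runs on its own.

```r
iris_num <- datasets::iris[, 1:4]

# apply() converts a data frame to a matrix before doing any work,
# so both calls give the same column means
m <- as.matrix(iris_num)
all.equal(apply(iris_num, 2, mean), apply(m, 2, mean))

# Any function can be passed, including an anonymous one:
# here, the spread (max - min) of each column
col_ranges <- apply(iris_num, 2, function(x) max(x) - min(x))
col_ranges

# With the Species column still present the matrix becomes character,
# which is why the tutorial drops that column first
class(as.matrix(datasets::iris)[1, 1])
```

For plain means, base R also offers `colMeans()` and `rowMeans()`, which are usually faster than the equivalent `apply()` call.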

3.2 The lapply function


# The lapply function applies a function to each element of a list or vector
# A data frame is a list of columns, so lapply works on it column by column
# Unlike sapply, lapply has no "simplify" parameter: it always returns a list
# Let's calculate the mean of each column of the iris dataset with lapply, which returns one list element per column

lapply(iris, mean)
## $Sepal.Length
## [1] 5.843333
## 
## $Sepal.Width
## [1] 3.057333
## 
## $Petal.Length
## [1] 3.758
## 
## $Petal.Width
## [1] 1.199333
# Extra arguments are passed on to mean; na.rm = TRUE drops missing values (iris has none, so the result is unchanged)
lapply(iris, mean, na.rm = TRUE)
## $Sepal.Length
## [1] 5.843333
## 
## $Sepal.Width
## [1] 3.057333
## 
## $Petal.Length
## [1] 3.758
## 
## $Petal.Width
## [1] 1.199333
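
Because iris has no missing values, the two lapply calls above print the same result. A minimal sketch of when `na.rm = TRUE` makes a difference, assuming a copy of the data with one value deliberately set to NA. The names `iris_num`, `col_summaries` and `with_na` are hypothetical.

```r
iris_num <- datasets::iris[, 1:4]

# lapply() returns a list no matter what the function returns
col_summaries <- lapply(iris_num, function(x) c(min = min(x), max = max(x)))
class(col_summaries)
col_summaries$Petal.Length

# Introduce one missing value to see the effect of na.rm
with_na <- iris_num
with_na$Sepal.Length[1] <- NA

lapply(with_na, mean)$Sepal.Length                # NA
lapply(with_na, mean, na.rm = TRUE)$Sepal.Length  # mean of the remaining 149 values
```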

3.3 The sapply function


# The sapply function is a wrapper around lapply that tries to simplify the result
# One of its parameters is "simplify": when TRUE (the default) the output is simplified to a vector or matrix if possible; when FALSE it is left as a list
# Let's calculate the mean of each column of the iris dataset with sapply, which returns a named numeric vector

sapply(iris, mean)
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##     5.843333     3.057333     3.758000     1.199333
# Extra arguments are passed on to mean; na.rm = TRUE drops missing values (iris has none, so the result is unchanged)
sapply(iris, mean, na.rm = TRUE)
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##     5.843333     3.057333     3.758000     1.199333
# With simplify = FALSE, sapply skips the simplification step and returns the same list as lapply

sapply(iris, mean, na.rm = TRUE, simplify = FALSE)
## $Sepal.Length
## [1] 5.843333
## 
## $Sepal.Width
## [1] 3.057333
## 
## $Petal.Length
## [1] 3.758
## 
## $Petal.Width
## [1] 1.199333
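
The simplification step can also produce a matrix rather than a vector. The sketch below is illustrative only: it shows that case, checks the equivalence with lapply, and ends with `vapply()`, a stricter relative of sapply from base R that the tutorial does not otherwise cover. The names `iris_num` and `rng` are hypothetical.

```r
iris_num <- datasets::iris[, 1:4]

# When every call returns a vector of the same length, sapply() builds a matrix
rng <- sapply(iris_num, range)
dim(rng)   # 2 rows (min, max) by 4 columns

# simplify = FALSE gives exactly the list that lapply() returns
identical(sapply(iris_num, mean, simplify = FALSE), lapply(iris_num, mean))

# vapply() declares the expected type and length of each result up front
# and raises an error if a result does not match
vapply(iris_num, mean, numeric(1))
```

Declaring the result shape makes `vapply()` a safer choice inside functions and scripts, where a silent change from vector to list could cause trouble later.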

4 Conclusion


In this tutorial we have taken a first look at the apply family of functions, covering apply, lapply and sapply. The family is larger than this (it also includes vapply, mapply and tapply, among others), and we encourage you to explore the remaining functions.
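
As a starting point for that exploration, here is a brief sketch of two other members of the family from base R. It is an illustration rather than a full treatment, and it uses the original `datasets::iris` (with Species) so that there is a grouping variable. The names `species_means` and `products` are hypothetical.

```r
# tapply(): apply a function to groups defined by a factor
species_means <- tapply(datasets::iris$Sepal.Length, datasets::iris$Species, mean)
species_means

# mapply(): apply a function over several vectors in parallel
products <- mapply(function(x, y) x * y, 1:3, 4:6)
products
```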