Learning Apply Family Functions in R

The objective of this worksheet is to build working knowledge of the apply family of functions.

1. `apply` function

`apply` applies a function to each row or column of a matrix (more generally, over the margins of an array). A data frame is not a matrix, but `apply` coerces it to one with `as.matrix`, so it also works on data frames whose columns share a common type. For example:

Df <- data.frame(a = 1:10, b = rnorm(10), c = runif(10, 10, 100))
apply(Df, 1, sum) # second argument 1: sum of each row

##  [1]  21.39524  64.37107  98.04714  31.53353  49.88592  37.38125  55.24396
##  [8] 107.99997  87.07680  57.44775

apply(Df, 2, sum) # second argument 2: sum of each column

##          a          b          c 
##  55.000000   1.756291 553.626336

# More examples:
# Summary statistics of each column
apply(Df, 2, summary)

##             a           b        c
## Min.     1.00 -1.32141966 20.77769
## 1st Qu.  3.25 -0.52045449 33.59466
## Median   5.50  0.09476273 47.72110
## Mean     5.50  0.17562908 55.36263
## 3rd Qu.  7.75  0.92338168 75.28305
## Max.    10.00  1.64472951 98.35524

#only mean of each column
apply(Df, 2, mean)

##          a          b          c 
##  5.5000000  0.1756291 55.3626336

#min of each column
apply(Df, 2, min)

##        a        b        c 
##  1.00000 -1.32142 20.77769

# Sum of squares of each column
apply(Df, 2, function(x) sum(x^2))

##            a            b            c 
##   385.000000     9.150518 37433.727216

#Number of NA in each column
apply(Df, 2, function(x) sum(is.na(x)))

## a b c 
## 0 0 0
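
To illustrate the "array" part of the description, here is a minimal sketch that uses a small matrix and a 3-dimensional array. The objects `m`, `arr`, and `slice_sums` are made up for this illustration and are not part of the worksheet's data.

```r
# A 2 x 3 matrix filled column-wise with 1..6
m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)  # row sums: 9 12
apply(m, 2, sum)  # column sums: 3 7 11

# A 2 x 3 x 2 array: margin 3 applies the function to each 2 x 3 slice
arr <- array(1:12, dim = c(2, 3, 2))
slice_sums <- apply(arr, 3, sum)
slice_sums        # sums of 1..6 and 7..12: 21 57
```

The second argument names the dimension that is kept: 1 keeps rows, 2 keeps columns, and 3 keeps the slices of a 3-dimensional array.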

Try the following exercises:

# Ex1: Why are the sums of the first two columns not returned?
# Create the following R object:
Dfm <- data.frame(a = 1:10, b = 10:1, letters[1:10])
apply(Dfm, 2, sum)

## Error in FUN(newX[, i], ...): invalid 'type' (character) of argument

# Ex2: Get the mean and variance of each column with a single line of code.
Df <- data.frame(a = 1:10, b = rnorm(10), c = runif(10, 10, 100))

# Ex3: Replace the NA values with 0 in the following data frame with a single line of code that uses apply.
Df <- data.frame(a = c(1:9, NA), b = sample(c(1, 5, NA), size = 10, replace = TRUE))

2. `lapply` function

`lapply` applies a function to each element of a list or vector. It returns a list of the same length as `X` (the input vector or list).

v <- c(1, 2, 3, 4, 5)

#square root of each element
lapply(v, sqrt)

## [[1]]
## [1] 1
## 
## [[2]]
## [1] 1.414214
## 
## [[3]]
## [1] 1.732051
## 
## [[4]]
## [1] 2
## 
## [[5]]
## [1] 2.236068

#square of each element
lapply(v, function(x) x^2)

## [[1]]
## [1] 1
## 
## [[2]]
## [1] 4
## 
## [[3]]
## [1] 9
## 
## [[4]]
## [1] 16
## 
## [[5]]
## [1] 25

#Function on list
Ls <- list(a = 1:10, b = runif(5), c = rnorm(15))

#length of each element
lapply(Ls, length)

## $a
## [1] 10
## 
## $b
## [1] 5
## 
## $c
## [1] 15

#mean of each element
lapply(Ls, mean)

## $a
## [1] 5.5
## 
## $b
## [1] 0.528497
## 
## $c
## [1] -0.02633585

# Summary statistics of each element of the list
lapply(Ls, summary)

## $a
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    3.25    5.50    5.50    7.75   10.00 
## 
## $b
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.07896 0.20182 0.45567 0.52850 0.91396 0.99208 
## 
## $c
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -1.08278 -0.44829 -0.12568 -0.02634  0.36911  1.46365
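
The claim that `lapply` always returns a list of the same length as its input can be checked directly. The sketch below reuses the worksheet's `v` and `Ls`; the names `res_v` and `res_l` are introduced only for this illustration.

```r
v <- c(1, 2, 3, 4, 5)
res_v <- lapply(v, sqrt)
is.list(res_v)                      # TRUE
length(res_v) == length(v)          # TRUE

Ls <- list(a = 1:10, b = runif(5), c = rnorm(15))
res_l <- lapply(Ls, length)
length(res_l) == length(Ls)         # TRUE
identical(names(res_l), names(Ls))  # TRUE: element names are carried over
```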

Try the following exercises:

# Ex1: Why does the following code work on a data frame?
Df <- data.frame(a = 1:10, b = 10:1, c = rnorm(10)) 
lapply(Df, mean)

## $a
## [1] 5.5
## 
## $b
## [1] 5.5
## 
## $c
## [1] -0.3396287

# Ex2: Calculate the number of missing values in each column of each data set in the list (Lsd).
data(cars)
data("mtcars")
Lsd <- list(cars = cars, mtcars = mtcars)

# Ex3: Calculate the summary of each column of each data set in Lsd.

# Ex4: Write a function that uses lapply within an lapply.

# Ex5: Attempt all the `apply` exercises using lapply.

3. `sapply` function

`sapply` is the same as `lapply`, but it simplifies the output to a vector or matrix when possible. `sapply(x, f, simplify = FALSE, USE.NAMES = FALSE)` is the same as `lapply(x, f)`. The examples below repeat the `lapply` examples with `sapply`; compare the output.

v <- c(1, 2, 3, 4, 5)

#square root of each element
sapply(v, sqrt)

## [1] 1.000000 1.414214 1.732051 2.000000 2.236068

#square of each element
sapply(v, function(x) x^2)

## [1]  1  4  9 16 25

#Function on list
Ls <- list(a = 1:10, b = runif(5), c = rnorm(15))

#length of each element
sapply(Ls, length)

##  a  b  c 
## 10  5 15

#mean of each element
sapply(Ls, mean)

##           a           b           c 
##  5.50000000  0.30585782 -0.03530268

# Summary statistics of each element of the list
sapply(Ls, summary) # a matrix here, which may be easier to read than the list from lapply

##             a         b           c
## Min.     1.00 0.1492890 -1.14370132
## 1st Qu.  3.25 0.1936619 -0.58296544
## Median   5.50 0.2005962 -0.18599022
## Mean     5.50 0.3058578 -0.03530268
## 3rd Qu.  7.75 0.2083238  0.41617079
## Max.    10.00 0.7774181  1.60228060
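
The relationship between `sapply` and `lapply` described at the start of this section can be checked with a short sketch. It reuses the worksheet's `Ls`; the names `same_len` and `diff_len` are introduced only for this illustration.

```r
Ls <- list(a = 1:10, b = runif(5), c = rnorm(15))

# With simplification switched off, sapply() returns the same list as lapply()
identical(sapply(Ls, length, simplify = FALSE, USE.NAMES = FALSE),
          lapply(Ls, length))  # TRUE

# Results that all have length 1 simplify to a vector
same_len <- sapply(Ls, length)
is.list(same_len)              # FALSE

# Results of different lengths (10, 5, 15) cannot be simplified,
# so sapply() falls back to a list
diff_len <- sapply(Ls, rev)
is.list(diff_len)              # TRUE
```

Because the return type depends on the results, it is worth checking what `sapply` actually returned before relying on it in later code.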

Attempt all the `lapply` exercises using `sapply` and check how the output differs. If you find yourself typing `unlist(lapply(...))`, stop and consider `sapply`.

4. `vapply` function

vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use. For example,

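# Df is the data frame from lapply Ex1 (a = 1:10, b = 10:1, c = rnorm(10))
vapply(Df, FUN = mean, FUN.VALUE = 0) # works: each result is a single numeric value, as FUN.VALUE declares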

##          a          b          c 
##  5.5000000  5.5000000 -0.3396287

vapply(Df, FUN = mean, FUN.VALUE = "a")  # fails: the results are double, but FUN.VALUE is character

## Error in vapply(Df, FUN = mean, FUN.VALUE = "a"): values must be type 'character',
##  but FUN(X[[1]]) result is type 'double'

For more information, see `?vapply`.
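
One way to see why `vapply` can be safer is to look at empty input and at results of the wrong length. This is a minimal sketch; the objects `empty`, `Lv`, `ok`, and `msg` are made up for illustration.

```r
# With empty input, sapply() returns an empty list,
# while vapply() still returns the declared type
empty <- list()
sapply(empty, length)                          # list()
vapply(empty, length, FUN.VALUE = integer(1))  # integer(0)

# vapply() also checks the length of every result against FUN.VALUE
Lv <- list(a = c(1, 5, 3), b = c(2, 8))
ok <- vapply(Lv, range, FUN.VALUE = numeric(2))  # 2 x 2 matrix, one column per element

# Declaring length 1 while range() returns 2 values raises an error
msg <- tryCatch(
  vapply(Lv, range, FUN.VALUE = numeric(1)),
  error = function(e) "length mismatch caught"
)
msg
```

The error in the last call is the point: a mismatch is reported where it happens instead of silently changing the shape of the result.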

5. `mapply` function

`mapply` is useful when you have several data structures (e.g. vectors, lists) and you want to apply a function to the 1st elements of each, then to the 2nd elements of each, and so on, coercing the result to a vector or array as in `sapply`.

#Sums the 1st elements, the 2nd elements, etc. 
mapply(sum, 1:5, 1:5, 1:5)

## [1]  3  6  9 12 15

mapply(rep, 1:4, 4:1)

## [[1]]
## [1] 1 1 1 1
## 
## [[2]]
## [1] 2 2 2
## 
## [[3]]
## [1] 3 3
## 
## [[4]]
## [1] 4

#To generate random numbers with different mean and standard deviation
mapply(FUN = function(x, y) rnorm(5, x, y), 1:5, 5:1)

##            [,1]       [,2]      [,3]     [,4]     [,5]
## [1,] -1.4199575 10.9466578 -1.178147 2.480393 4.231029
## [2,]  2.7047023  9.4547140  3.964918 2.161408 5.467605
## [3,]  0.6912268  0.9505227  2.268016 4.452615 4.822792
## [4,]  9.2267466 -2.5584926 -2.002012 3.025329 5.724015
## [5,] 10.1348753 -0.7810312  2.342527 7.124626 5.635718
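
The element-wise pairing that `mapply` performs can be written out as an explicit loop for comparison. This is a minimal sketch; `pair_sums`, `loop_sums`, and `as_list` are names made up for illustration.

```r
# Element-wise pairing: f(1, 4), f(2, 5), f(3, 6)
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)
pair_sums  # 5 7 9

# The same result written as an explicit loop over positions
loop_sums <- integer(3)
for (i in 1:3) loop_sums[i] <- (1:3)[i] + (4:6)[i]
all(pair_sums == loop_sums)  # TRUE

# SIMPLIFY = FALSE keeps the results in a list instead of simplifying them
as_list <- mapply(function(x, y) x + y, 1:3, 4:6, SIMPLIFY = FALSE)
is.list(as_list)  # TRUE
```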

I have never needed `mapply` for any of my own problems.

Neeraj Jain

24/03/2020
