Load the packages

suppressPackageStartupMessages({
library(knitr)
library(pander)
})

Introduction

R has a family of apply functions that can greatly simplify code. Packages such as dplyr cover much of the same ground, but it is still worth knowing how the apply functions differ and how to use them. They are too convenient to ignore.

First, the following mnemonics give you an overview of what each *apply function does in general.

Mnemonics

Example

apply

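apply applies a function over the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix or array. For the sum or mean of each row or column, there are more optimized functions: colMeans, rowMeans, colSums, and rowSums. When apply is used on a data frame, the data frame is automatically coerced to a matrix first.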

# Two dimensional matrix
myMetric <- matrix(floor(runif(15,0,100)),5,3)
myMetric
##      [,1] [,2] [,3]
## [1,]   73   25   70
## [2,]   59   86   46
## [3,]   46   18   45
## [4,]   80   76   97
## [5,]   80   62   93
# apply min to rows
apply(myMetric,1,min)
## [1] 25 46 18 76 62
# apply min to columns
apply(myMetric,2,min)
## [1] 46 18 45
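
As a small sketch of the two points above (the optimized helpers and the coercion of data frames), the following uses a fixed matrix so the results are reproducible. The names m, df and the result variables are made up for illustration and are not part of the original example.

```r
# A small fixed matrix so the results are reproducible
m <- matrix(1:6, nrow = 2, ncol = 3)

# apply() and the dedicated helpers give the same values
row_sums_apply  <- apply(m, 1, sum)
row_sums_fast   <- rowSums(m)
col_means_apply <- apply(m, 2, mean)
col_means_fast  <- colMeans(m)

# A data frame with mixed column types is coerced to a character matrix,
# so every row reaches the function as a character vector
df <- data.frame(id = 1:2, label = c("a", "b"))
coerced_type <- apply(df, 1, class)
```

Because of this coercion, apply is usually a poor fit for data frames with mixed column types; lapply or sapply over the columns avoids the problem.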

lapply

lapply applies a function to each element of a list (or vector) and always returns a list. It is the most fundamental member of the family and the workhorse behind several of the others; sapply, for example, is a wrapper around it.

# Note: seq(10:100) would return 1:91, not 10:100, so the ranges are written directly
x <- list(a = runif(5, 0, 1), b = 1:10, c = 10:100)
lapply(x, FUN = mean)
## $a
## [1] 0.5111451
## 
## $b
## [1] 5.5
## 
## $c
## [1] 55
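
A minimal sketch of two common lapply patterns: forwarding extra arguments to the function, and using an anonymous function. The list scores and the result names are hypothetical, chosen only for this illustration.

```r
# A list with a missing value in one element
scores <- list(a = c(1, 2, NA), b = c(4, 5, 6))

# Arguments after FUN are forwarded to every call of the function
means <- lapply(scores, mean, na.rm = TRUE)

# An anonymous function works the same way
spreads <- lapply(scores, function(v) {
    max(v, na.rm = TRUE) - min(v, na.rm = TRUE)
})
```

In both cases the result is a list with the same names as the input.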

sapply

sapply does the same thing as lapply; only the output differs. Where possible, it simplifies the result to a vector (or a matrix) rather than returning a list.

# Note: seq(10:100) would return 1:91, not 10:100, so the ranges are written directly
x <- list(a = runif(5, 0, 1), b = 1:10, c = 10:100)
sapply(x, FUN = mean)
##          a          b          c 
##  0.6295385  5.5000000 55.0000000
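
The shape of what sapply returns depends on what the function returns. The sketch below illustrates the three usual cases; the variable names are made up for this example.

```r
x <- list(a = 1:5, b = 6:10)

# One value per element: simplified to a named vector
as_vector <- sapply(x, mean)

# Two values per element: simplified to a matrix, one column per element
as_matrix <- sapply(x, range)

# Results of differing lengths cannot be simplified, so a list comes back
as_list <- sapply(list(a = 1:2, b = 1:3), function(v) v * 2)
```

This flexibility is convenient interactively, but it means the return type is not known in advance, which is the gap vapply is designed to close.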

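vapply

vapply is similar to sapply, but the type and length of each result must be declared through FUN.VALUE. This makes it safer and can make it slightly faster.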

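# Note: seq(10:100) would return 1:91, not 10:100, so the ranges are written directly
x <- list(a = runif(5, 0, 1), b = 1:10, c = 10:100)
# FUN.VALUE declares that every result must be a single integer
vapply(x, FUN = length, FUN.VALUE = integer(1))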
##  a  b  c 
##  5 10 91
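
A short sketch of the practical benefit of FUN.VALUE: a result that does not match the declared type raises an error instead of silently changing the return type. The list mixed and the result names are hypothetical.

```r
mixed <- list(a = 1:3, b = letters[1:2])

# length() always returns a single integer, so this succeeds
lens <- vapply(mixed, length, FUN.VALUE = integer(1))

# The first element of b is a character string, which does not match
# numeric(1), so vapply stops with an error; here it is caught and flagged
res <- tryCatch(
    vapply(mixed, function(v) v[1], FUN.VALUE = numeric(1)),
    error = function(e) "error"
)
```

With sapply, the second call would instead quietly return a character vector, which can hide bugs in non-interactive code.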

mapply and Map

Use mapply when you have several data structures (e.g. vectors, lists) and want to apply a function to the first elements of each, then the second elements of each, and so on, simplifying the result to a vector or array as sapply does.

Map is a wrapper around mapply with SIMPLIFY = FALSE, so it is guaranteed to return a list.

# Shorter arguments are recycled to the length of the longest one (20 here)
mapply(sum, 1:5, 1:10, 1:20)
##  [1]  3  6  9 12 15 13 16 19 22 25 13 16 19 22 25 23 26 29 32 35
mapply(rep, 1:4, 4:1)
## [[1]]
## [1] 1 1 1 1
## 
## [[2]]
## [1] 2 2 2
## 
## [[3]]
## [1] 3 3
## 
## [[4]]
## [1] 4
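
The examples above use only mapply, so here is a minimal sketch contrasting it with Map, plus the MoreArgs argument for values that stay fixed across calls. The variable names are made up for illustration.

```r
# Map always returns a list, even when the results could be simplified
sums_list <- Map(function(a, b) a + b, 1:3, 4:6)

# mapply simplifies the same computation to a vector
sums_vec <- mapply(function(a, b) a + b, 1:3, 4:6)

# MoreArgs supplies arguments that are the same for every call;
# only times varies here, so the results have lengths 1, 2 and 3
reps <- mapply(rep, times = 1:3, MoreArgs = list(x = "z"))
```

Map is the safer choice inside functions, since its return type does not depend on the data.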

rapply

This is a recursive apply, especially useful for a nested list structure. For example:

# Append "!" to strings; otherwise increment by 1
myFun <- function(x) {
    if (is.character(x)) {
        return(paste0(x, "!"))
    } else {
        return(x + 1)
    }
}

# A nested list structure
l <- list(a = list(a1 = "Boo", b1 = 2, c1 = "Eeek"), 
          b = 3, c = "Yikes", 
          d = list(a2 = 1, b2 = list(a3 = "Hey", b3 = 5)))


# Result is a named vector, coerced to character
rapply(l,myFun)
##     a.a1     a.b1     a.c1        b        c     d.a2  d.b2.a3  d.b2.b3 
##   "Boo!"      "3"  "Eeek!"      "4" "Yikes!"      "2"   "Hey!"      "6"
print('break')
## [1] "break"
# Result is a nested list like l, with values altered
rapply(l, myFun, how = "replace")
## $a
## $a$a1
## [1] "Boo!"
## 
## $a$b1
## [1] 3
## 
## $a$c1
## [1] "Eeek!"
## 
## 
## $b
## [1] 4
## 
## $c
## [1] "Yikes!"
## 
## $d
## $d$a2
## [1] 2
## 
## $d$b2
## $d$b2$a3
## [1] "Hey!"
## 
## $d$b2$b3
## [1] 6
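
Instead of testing the type inside the function, rapply can restrict which leaves are touched through its classes argument. The sketch below assumes a smaller nested list, nested, invented for this example.

```r
nested <- list(a = list(a1 = "Boo", b1 = 2), b = 3, c = "Yikes")

# Only numeric leaves are passed to the function; with how = "replace",
# leaves of other classes are kept unchanged
doubled <- rapply(nested, function(v) v * 2,
                  classes = "numeric", how = "replace")

# With how = "unlist", non-matching leaves are dropped (deflt is NULL)
only_numeric <- rapply(nested, function(v) v * 2,
                       classes = "numeric", how = "unlist")
```

This keeps the function itself simple and avoids the character coercion seen in the first rapply call above.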

tapply

Use tapply when you want to apply a function to subsets of a vector, where the subsets are defined by another vector, usually a factor.

tapply is similar in spirit to the split-apply-combine functions that are common in R (aggregate, by, ave, plyr's ddply, etc.).

x <- 1:20
y <- factor(rep(letters[1:5], each = 4))
tapply(x,y,sum)
##  a  b  c  d  e 
## 10 26 42 58 74
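
To make the split-apply-combine connection concrete, the sketch below computes the same group sums in two ways: with tapply directly, and by splitting the vector first and then applying the function to each piece. The names g, by_tapply and by_split are made up for this illustration.

```r
x <- 1:20
g <- factor(rep(letters[1:5], each = 4))

# tapply does the grouping and the summing in one step
by_tapply <- tapply(x, g, sum)

# The same result spelled out: split into groups, then apply to each group
by_split <- sapply(split(x, g), sum)
```

The values and names agree; the only difference is that tapply returns a one-dimensional array while sapply returns a plain named vector.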