The apply family is a set of basic functions in R. It includes apply, lapply, sapply, mapply and tapply.
###(1) apply
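apply: for a matrix or array (a data frame is coerced to a matrix first). It applies a function over the rows (MARGIN = 1) or the columns (MARGIN = 2).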
apply(X, MARGIN, FUN, …)
a <- matrix(1:12, nrow = 3, ncol = 4)
a
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
apply(a,1,sum)
## [1] 22 26 30
apply(a,2,sum)
## [1] 6 15 24 33
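The effect of MARGIN can be checked against R's built-in row and column helpers. The following is a minimal sketch that rebuilds the same matrix; the names `row_sums`, `col_sums` and `col_ranges` are only illustrative.

```r
# Same 3 x 4 matrix as in the text
a <- matrix(1:12, nrow = 3, ncol = 4)

# MARGIN = 1 works row by row, MARGIN = 2 column by column
row_sums <- apply(a, 1, sum)
col_sums <- apply(a, 2, sum)

# For plain sums, rowSums() and colSums() give the same numbers
all(row_sums == rowSums(a))
all(col_sums == colSums(a))

# FUN may return more than one value per row or column;
# here each column yields c(min, max), giving a 2 x 4 matrix
col_ranges <- apply(a, 2, range)
col_ranges
```

When FUN returns a vector of fixed length greater than one, apply places each result in its own column of the output, whichever MARGIN was used.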
###(2) lapply and sapply
lapply and sapply are nearly the same; they differ only in the output format. Both apply a function to each element of a list. A data frame is a special kind of list, so for a data frame they work column by column.
lapply(X, FUN, …)
a.df<-data.frame(a)
is.list(a.df)
## [1] TRUE
str(a.df)
## 'data.frame': 3 obs. of 4 variables:
## $ X1: int 1 2 3
## $ X2: int 4 5 6
## $ X3: int 7 8 9
## $ X4: int 10 11 12
lapply(a.df, function(x) x+3)
## $X1
## [1] 4 5 6
##
## $X2
## [1] 7 8 9
##
## $X3
## [1] 10 11 12
##
## $X4
## [1] 13 14 15
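lapply is not limited to data frames. As a small sketch, here it is applied to an ordinary list whose elements have different lengths; the list `scores` and its contents are made up for illustration.

```r
# A hypothetical list; the elements need not have the same length
scores <- list(math = c(90, 80, 70), art = c(60, 100))

# lapply always returns a list with one element per input element
means <- lapply(scores, mean)
str(means)

# The names of the input carry over to the output
names(means)
```

Because the result is always a list, lapply is a safe choice when the function may return values of varying length or type.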
###(3) sapply
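sapply(X, FUN, …, simplify = TRUE)
If simplify = FALSE, the output is the same as that of lapply. If simplify = TRUE, the output format is determined by the results of the function: results of length one are combined into a vector, results of equal length greater than one are combined into a matrix, and anything else stays a list.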
yy<-sapply(a.df, function(x) x^2)
yy
## X1 X2 X3 X4
## [1,] 1 16 49 100
## [2,] 4 25 64 121
## [3,] 9 36 81 144
y1<-sapply(a.df, sum)
y1
## X1 X2 X3 X4
## 6 15 24 33
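The three possible shapes can be compared side by side. This is a sketch that rebuilds the same data frame; the names `s_vec`, `s_mat` and `s_list` are only illustrative.

```r
# Same data frame as in the text (columns X1 to X4)
a.df <- data.frame(matrix(1:12, nrow = 3, ncol = 4))

# One value per column: simplified to a named vector
s_vec <- sapply(a.df, sum)

# Three values per column: simplified to a 3 x 4 matrix
s_mat <- sapply(a.df, function(x) x^2)

# simplify = FALSE keeps the list, exactly as lapply would return it
s_list <- sapply(a.df, sum, simplify = FALSE)

is.matrix(s_mat)
is.list(s_list)
identical(s_list, lapply(a.df, sum))
```

Since the shape of the result depends on what the function returns, lapply or simplify = FALSE is the more predictable option inside functions and scripts.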
###(4) mapply
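mapply is a multivariate version of sapply. It applies a function to multiple list or vector arguments in parallel: the first call uses the first element of each argument, the second call uses the second elements, and so on. This makes it suitable for looping over several inputs at once.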
mapply(FUN, …, MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
mapply(function(x,y) x^y, c(1:5), c(1:5))
## [1] 1 4 27 256 3125
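Two further details of mapply are worth a small sketch: results of unequal length are returned as a list, and MoreArgs passes arguments that stay the same for every call. The argument name `offset` and the variable names are made up for illustration.

```r
# Element-wise pairing: the i-th call receives x[i] and times[i],
# i.e. rep(1, times = 3), rep(2, times = 2), rep(3, times = 1)
reps <- mapply(rep, x = 1:3, times = 3:1)
reps  # a list, because the results have different lengths

# MoreArgs holds arguments that are not vectorised over;
# 'offset' is a hypothetical extra argument, fixed at 10 for all calls
powers <- mapply(function(x, y, offset) x^y + offset,
                 1:3, 1:3,
                 MoreArgs = list(offset = 10))
powers
```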
###(5) tapply
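tapply: applies a function to groups of values, where the groups are defined by one or more factors.
tapply(X, INDEX, FUN = NULL, …, simplify = TRUE)
X: a vector
INDEX: a factor, or a list of factors, each of the same length as X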
df <- data.frame(year = rep(2001:2003, each = 4),
                 loc = c('beijing', 'beijing', 'shanghai', 'shanghai'),
                 type = rep(c('A', 'B'), 6),
                 sale = 1:12)
df
## year loc type sale
## 1 2001 beijing A 1
## 2 2001 beijing B 2
## 3 2001 shanghai A 3
## 4 2001 shanghai B 4
## 5 2002 beijing A 5
## 6 2002 beijing B 6
## 7 2002 shanghai A 7
## 8 2002 shanghai B 8
## 9 2003 beijing A 9
## 10 2003 beijing B 10
## 11 2003 shanghai A 11
## 12 2003 shanghai B 12
tapply(df$sale,df[,c('year','loc')],sum)
## loc
## year beijing shanghai
## 2001 3 7
## 2002 11 15
## 2003 19 23
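With a single grouping variable the result is one-dimensional, with one named entry per group. The sketch below rebuilds the same data frame with `rep()` and groups by one column at a time; the names `by_loc` and `by_type` are only illustrative.

```r
# Same data as in the text
df <- data.frame(year = rep(2001:2003, each = 4),
                 loc  = c('beijing', 'beijing', 'shanghai', 'shanghai'),
                 type = rep(c('A', 'B'), 6),
                 sale = 1:12)

# Total sales per location
by_loc <- tapply(df$sale, df$loc, sum)
by_loc

# Any summary function can be used, here the mean sale per type
by_type <- tapply(df$sale, df$type, mean)
by_type
```

The group totals add up to the overall total, `sum(df$sale)`, which is a quick way to check that no rows were dropped by the grouping.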