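The tapply() function takes three main arguments:

tapply(X, INDEX, FUN, …)

where:

  • X is the vector of values to summarize.

  • INDEX is the grouping variable: a factor (or a list of factors) of the same length as X.

  • FUN is the function to apply to each group.

I’ll show how to use it with an example: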

# First, create a data frame

df <- data.frame(
  class = c("A", "A", "A", "B", "B", "B"),
  result = c("1", "0", "1", "1", "0", "1"),
  income = c(14, 15, 13, 13, 20, 6)
)

# Take a look at this data frame

df
##   class result income
## 1     A      1     14
## 2     A      0     15
## 3     A      1     13
## 4     B      1     13
## 5     B      0     20
## 6     B      1      6

In this made-up data frame there are 6 people, each with a class (A or B), a result (stored as the string "1" or "0"), and an income.

Apply Function to One Variable, Grouped by Another Variable

For example, we want to know the average income for each class.

# Find out the average income for each class:

tapply(df$income, df$class, mean)
##  A  B 
## 14 13

We can see that the average income is 14 for class A and 13 for class B.
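One way to picture what tapply() does here: it splits income by class and then applies mean to each piece. The sketch below rebuilds the same result by hand with split() and sapply() (sapply() is covered later in this post). The data frame is re-created, without the result column, so the block runs on its own.

```r
# Re-create the example data so this block is self-contained
df <- data.frame(
  class  = c("A", "A", "A", "B", "B", "B"),
  income = c(14, 15, 13, 13, 20, 6)
)

# Step 1: split income into one vector per class
groups <- split(df$income, df$class)

# Step 2: apply mean() to each group
by_hand <- sapply(groups, mean)

# tapply() does both steps in a single call
by_tapply <- tapply(df$income, df$class, mean)

by_hand
by_tapply
```

Both objects hold the same two group means, labelled A and B.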

We can also pass the na.rm argument through tapply() to indicate that we wish to calculate the mean while ignoring NA values in the data frame:

# I update my data.frame with one NA item

df_new <- data.frame(
  class = c("A", "A", "A", "B", "B", "B"),
  result = c("1", "0", "1", "1", "0", "1"),
  income = c(14, 15, 13, 13, NA, 6)
)

# Take a look!
df_new
##   class result income
## 1     A      1     14
## 2     A      0     15
## 3     A      1     13
## 4     B      1     13
## 5     B      0     NA
## 6     B      1      6

Then I calculate the average income for each class.

# Without na.rm = TRUE, the mean for class B comes back as NA
tapply(df_new$income, df_new$class, mean)
##  A  B 
## 14 NA
tapply(df_new$income, df_new$class, mean,
       na.rm = TRUE)
##    A    B 
## 14.0  9.5

Furthermore, I wish to know the average income for each result:

tapply(df_new$income, df_new$result, mean, 
       na.rm = TRUE)
##    0    1 
## 15.0 11.5
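The na.rm = TRUE argument is not used by tapply() itself; it is forwarded to mean() through the … argument. A minimal sketch of the same calculation with an explicit anonymous function, which can be easier to read when FUN needs several settings (the data frame is re-created, without the result column, so the block runs on its own):

```r
df_new <- data.frame(
  class  = c("A", "A", "A", "B", "B", "B"),
  income = c(14, 15, 13, 13, NA, 6)
)

# Extra arguments placed after FUN are forwarded to FUN
via_dots <- tapply(df_new$income, df_new$class, mean, na.rm = TRUE)

# Equivalent: wrap the call in an anonymous function
via_wrapper <- tapply(df_new$income, df_new$class,
                      function(x) mean(x, na.rm = TRUE))

via_dots
via_wrapper
```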

Apply Function to One Variable, Grouped by Multiple Variables

For example, we want to find the average income for each combination of class and result.

# find out the average income, grouped by class and result

tapply(df$income, list(df$class, df$result), mean)
##    0    1
## A 15 13.5
## B 20  9.5

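We can interpret the result as:

  • In class A, the average income is 15 for result 0 and 13.5 for result 1.

  • In class B, the average income is 20 for result 0 and 9.5 for result 1.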

You can also think of each cell as a conditional expectation:

\[ E(\text{income} \mid \text{result} = 0, \text{class} = A) = 15 \]

Note: In this example we grouped by two variables, so we need to wrap them in list().
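With two grouping variables the result is a matrix, so a single cell can be looked up by its row and column names. A small sketch, assuming we also name the list elements (the names class and result are a labelling choice, not something tapply() requires):

```r
df <- data.frame(
  class  = c("A", "A", "A", "B", "B", "B"),
  result = c("1", "0", "1", "1", "0", "1"),
  income = c(14, 15, 13, 13, 20, 6)
)

# Naming the list elements labels the two dimensions of the output
avg <- tapply(df$income, list(class = df$class, result = df$result), mean)

avg

# Look up one cell by name
avg["A", "0"]   # average income for class A with result 0
avg["B", "1"]   # average income for class B with result 1
```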

apply() function

This function enables us to apply a function to the rows or columns of a matrix or data frame.


Basic syntax:

apply(X, MARGIN, FUN, …)

where MARGIN = 1 applies the function to each row and MARGIN = 2 applies it to each column.


Let’s show it by an example!

# Let's create a matrix

m <- matrix(1:9, nrow = 3, byrow = TRUE)

# Take a look at it:
m
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

For example, I want to compute the mean of each row:

# calculate the mean of each row
apply(m, 1, mean)
## [1] 2 5 8

We can check this by using:

rowMeans(m)
## [1] 2 5 8

Furthermore, I wish to find out the sum of each column:

# find the sum of each column of m
apply(m, 2, sum)
## [1] 12 15 18

We can check this by using:

colSums(m)
## [1] 12 15 18
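apply() is not limited to built-in summaries like mean and sum. The sketch below passes an anonymous function for the rows and a function that returns two values for the columns; the name r inside the anonymous function is just an illustrative label for one row.

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# MARGIN = 1: spread (max minus min) of each row
row_spread <- apply(m, 1, function(r) max(r) - min(r))

# MARGIN = 2: range() returns two values per column, so the result is a
# matrix with one column for each column of m
col_range <- apply(m, 2, range)

row_spread
col_range
```

When FUN returns more than one value, the results are stacked as columns, so col_range has the minimums in its first row and the maximums in its second.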

lapply() function

lapply() applies a function to every element of a list, vector, or data frame and always returns a list of the same length as its input. Because it works element by element, it doesn’t need a MARGIN argument.


Basic syntax:

lapply(X, FUN, …)


# I'll keep using the matrix m created before
# Now, I define a function of my own:

stretch <- function(x){
  2 * x + 3
}

t(lapply(m, stretch))
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 5    11   17   7    13   19   9    15   21

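lapply() returns a list, which normally prints one element per block. To save space, I transposed the result with t() so it prints on a single row.

As you can see, my function stretch multiplies each cell of the matrix by 2 and then adds 3. The cells are visited column by column, which is why the output starts with 5, 11, 17.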

Another example:

# Here, I have a character vector of names
names <- c("abe", "bush", "charlie", "daisy")

lapply(names, toupper)
## [[1]]
## [1] "ABE"
## 
## [[2]]
## [1] "BUSH"
## 
## [[3]]
## [1] "CHARLIE"
## 
## [[4]]
## [1] "DAISY"
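lapply() is most natural on an actual list, where the elements can have different lengths. A minimal sketch with a hypothetical list of scores (the names and numbers are made up for illustration):

```r
# A hypothetical list whose elements have different lengths
scores <- list(math = c(90, 80, 70), art = c(60, 100), music = 85)

# lapply() applies mean() to each element and keeps the element names
avg_scores <- lapply(scores, mean)

avg_scores

# The output list has the same length as the input list
length(avg_scores) == length(scores)
```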

sapply() function

This function applies a function to every element of a list, vector, or data frame, like lapply(), but simplifies the result to a vector or matrix when possible.

Like lapply(), sapply() works element by element, so it doesn’t need a MARGIN argument.


Basic syntax: sapply(X, FUN, …)


Let's see how to use it with an example:

# remember the 3*3 matrix m, and the *stretch* function?

sapply(m, stretch)
## [1]  5 11 17  7 13 19  9 15 21
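Note that sapply() visits the cells of m column by column and returns a plain vector, so the 3 x 3 shape is lost. Because the arithmetic inside stretch is already vectorized, calling the function on the whole matrix gives the same numbers and keeps the shape. A small sketch (m and stretch are re-created here so the block runs on its own, with stretch written to return its value directly):

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)
stretch <- function(x) 2 * x + 3

flat <- sapply(m, stretch)   # plain vector, in column-major order
kept <- stretch(m)           # still a 3 x 3 matrix

flat
kept
```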

Differences between sapply() and lapply()

  • lapply() always returns a list.

  • sapply() returns a simplified version of the result (vector or matrix) if possible; otherwise, returns a list.

For example:

df_diff <- data.frame(
  a = 0:5,
  b = 5:10,
  c = 20:25
)

df_diff
##   a  b  c
## 1 0  5 20
## 2 1  6 21
## 3 2  7 22
## 4 3  8 23
## 5 4  9 24
## 6 5 10 25
# using lapply()
lapply(df_diff, mean)
## $a
## [1] 2.5
## 
## $b
## [1] 7.5
## 
## $c
## [1] 22.5
# using sapply()
sapply(df_diff, mean)
##    a    b    c 
##  2.5  7.5 22.5
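The "if possible" part of sapply() depends on what the function returns for each element. The sketch below shows the two other outcomes: a matrix when every result has the same length greater than one, and a list when the lengths differ. The object names as_matrix and as_list are only illustrative.

```r
df_diff <- data.frame(a = 0:5, b = 5:10, c = 20:25)

# range() returns 2 values for every column, so sapply() builds a
# 2 x 3 matrix (first row: minimums, second row: maximums)
as_matrix <- sapply(df_diff, range)

# seq_len() returns vectors of different lengths here, so nothing can be
# simplified and sapply() falls back to a list
as_list <- sapply(c(1, 2, 3), seq_len)

as_matrix
as_list
```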