library(stringr)

It is important to note that the dplyr package can achieve the same results as the apply() family of functions with efficient, clean code when working with data frames or tibbles.

apply()

Calculate the body mass index (BMI) from weight and height.

weight <- c(
  94, 85, 82, 100, 83, 85, 77, 80, 64, 57, 98, 95, 85, 90, 51,
  74, 88, 61, 66, 62
)
height <- c(
  1.62, 1.82, 1.66, 1.85, 1.88, 1.52, 1.71, 1.86, 1.72, 1.68,
  1.88, 1.68, 1.77, 1.73, 1.54, 1.86, 1.6, 1.52, 1.63, 1.88
)

# combine vectors into a data frame
df <- data.frame(weight, height)

# define a function to calculate BMI
calculate_bmi <- function(row) {
  weight <- row[1]
  height <- row[2]
  return(weight / (height ^ 2))
}
df$bmi <- apply(df, 1, calculate_bmi)
df
##    weight height      bmi
## 1      94   1.62 35.81771
## 2      85   1.82 25.66115
## 3      82   1.66 29.75758
## 4     100   1.85 29.21841
## 5      83   1.88 23.48348
## 6      85   1.52 36.79017
## 7      77   1.71 26.33289
## 8      80   1.86 23.12406
## 9      64   1.72 21.63332
## 10     57   1.68 20.19558
## 11     98   1.88 27.72748
## 12     95   1.68 33.65930
## 13     85   1.77 27.13141
## 14     90   1.73 30.07117
## 15     51   1.54 21.50447
## 16     74   1.86 21.38976
## 17     88   1.60 34.37500
## 18     61   1.52 26.40235
## 19     66   1.63 24.84098
## 20     62   1.88 17.54187
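
The second argument of apply() is the margin: 1 applies the function to each row and 2 to each column. apply() also coerces a data frame to a matrix before iterating, so it is best suited to all-numeric data. A minimal sketch of the two margins, using a small hypothetical matrix m that is not part of the example above:

```r
# hypothetical 2 x 3 matrix, used only for illustration
m <- matrix(1:6, nrow = 2)

# MARGIN = 1: apply sum() to each row
row_totals <- apply(m, 1, sum)

# MARGIN = 2: apply sum() to each column
col_totals <- apply(m, 2, sum)

row_totals
col_totals
```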

lapply()

The lapply() function applies a function to each element of a vector or list (including each column of a data frame) and always returns a list.
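
As a quick illustration of the return type, the sketch below applies sqrt() to a hypothetical numeric vector x (an assumption made for this illustration) and shows that the result is a list even though the input was a plain vector:

```r
# hypothetical numeric vector, used only for illustration
x <- c(1, 4, 9)

# lapply() calls sqrt() on each element and always returns a list
roots <- lapply(x, sqrt)
class(roots)

# unlist() flattens the list back into a vector when needed
unlist(roots)
```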

Example

Capitalize the first letter of every value in the character columns of a data frame (and lowercase the rest).

df <- data.frame(
  Name = c("ALICE", "Daniel", "james", "Mary"),
  Education = c('secondary', 'primary', 'Tertiary', "SECONDARY"),
  Age = c(34, 45, 28, 21)
)
df
##     Name Education Age
## 1  ALICE secondary  34
## 2 Daniel   primary  45
## 3  james  Tertiary  28
## 4   Mary SECONDARY  21
# create a function
capitalize_first_letter <- function(x) {
  if (is.character(x)) {
    x <- paste0(toupper(substr(x, 1, 1)), substr(tolower(x), 2, nchar(x)))
  }
  return(x)
}
# apply it to all the variables of the DataFrame
df1 = df
df1[] <- lapply(df1, capitalize_first_letter)
df1
##     Name Education Age
## 1  Alice Secondary  34
## 2 Daniel   Primary  45
## 3  James  Tertiary  28
## 4   Mary Secondary  21

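The same result can be achieved with the str_to_title() function from the stringr package. Note that str_to_title() capitalizes the first letter of every word, so the two approaches agree here only because each value is a single word.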

df2 = df
df2[, 1:2] <- lapply(df2[, 1:2], str_to_title)
df2
##     Name Education Age
## 1  Alice Secondary  34
## 2 Daniel   Primary  45
## 3  James  Tertiary  28
## 4   Mary Secondary  21

Notice that in the above you must specify the columns (here we selected only the string columns with df2[, 1:2]); failing to do so converts the numeric column Age to character. For data frames with many variables, listing the columns manually is impractical. The code can instead be modified so that R detects the character columns itself and leaves the others untouched.

df3 = df
title_case <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) str_to_title(col) else col
  })
  return(df)
}
df3 <- title_case(df3)
df3
##     Name Education Age
## 1  Alice Secondary  34
## 2 Daniel   Primary  45
## 3  James  Tertiary  28
## 4   Mary Secondary  21
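
Another way to express the same idea, sketched here with base R only, is to build a logical index of the character columns and transform just those. The data frame people and the use of toupper() are assumptions made for this illustration; the first call also shows the pitfall of transforming every column.

```r
# hypothetical data frame, used only for illustration
people <- data.frame(
  Name = c("alice", "daniel"),
  Age = c(34, 45),
  stringsAsFactors = FALSE
)

# pitfall: transforming every column turns the numeric Age into character
all_cols <- lapply(people, toupper)
class(all_cols$Age)

# detect the character columns, then transform only those
is_chr <- vapply(people, is.character, logical(1))
people[is_chr] <- lapply(people[is_chr], toupper)
str(people)
```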

sapply()

The sapply() function works like lapply(), except that it attempts to simplify the result to a vector or a matrix. If the result cannot be simplified, it returns a list.
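
The three possible shapes can be seen in a short sketch. The inputs below are hypothetical and chosen only to trigger each case:

```r
words <- c("apple", "fig", "banana")   # hypothetical input

# one value per element: simplified to a (named) vector
lens <- sapply(words, nchar)
lens

# several values per element: simplified to a matrix, one column per element
mat <- sapply(1:3, function(i) c(min = i, max = i^2))
mat

# results of different lengths: cannot be simplified, so a list is returned
lst <- sapply(1:3, seq_len)
lst
```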

Example

df4 = df
df4[, 1:2] <- sapply(df4[, 1:2], str_to_title)
df4
##     Name Education Age
## 1  Alice Secondary  34
## 2 Daniel   Primary  45
## 3  James  Tertiary  28
## 4   Mary Secondary  21

mapply()

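The mapply() function applies a function to the corresponding elements of multiple objects (e.g. vectors) in parallel: the first elements of each, then the second elements, and so on.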
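
A minimal sketch of this element-by-element pairing, using two hypothetical vectors base and exponent that are not part of the example that follows:

```r
# hypothetical inputs, used only for illustration
base <- c(2, 3, 4)
exponent <- c(3, 2, 1)

# the function receives base[1] and exponent[1], then base[2] and exponent[2], ...
powers <- mapply(function(b, e) b^e, base, exponent)
powers
```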

Example

We use this function to calculate the body mass index from height and weight. (We already did this with the apply() function.)

weight <- c(
  94, 85, 82, 100, 83, 85, 77, 80, 64, 57, 98, 95, 85, 90, 51,
  74, 88, 61, 66, 62
)
height <- c(
  1.62, 1.82, 1.66, 1.85, 1.88, 1.52, 1.71, 1.86, 1.72, 1.68,
  1.88, 1.68, 1.77, 1.73, 1.54, 1.86, 1.6, 1.52, 1.63, 1.88
)
calculate_bmi <- function(height, weight) {
  return(weight / (height ^ 2))
}
bmi <- mapply(calculate_bmi, height, weight)
df = data.frame(weight, height, bmi)
df
##    weight height      bmi
## 1      94   1.62 35.81771
## 2      85   1.82 25.66115
## 3      82   1.66 29.75758
## 4     100   1.85 29.21841
## 5      83   1.88 23.48348
## 6      85   1.52 36.79017
## 7      77   1.71 26.33289
## 8      80   1.86 23.12406
## 9      64   1.72 21.63332
## 10     57   1.68 20.19558
## 11     98   1.88 27.72748
## 12     95   1.68 33.65930
## 13     85   1.77 27.13141
## 14     90   1.73 30.07117
## 15     51   1.54 21.50447
## 16     74   1.86 21.38976
## 17     88   1.60 34.37500
## 18     61   1.52 26.40235
## 19     66   1.63 24.84098
## 20     62   1.88 17.54187

Clearly, the mapply() function is more straightforward than the apply() function, at least for this task.
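
Because R's arithmetic operators are already vectorized, this particular calculation can also be written without any apply-family function. The sketch below uses the first three weight and height values under the hypothetical names w and h, and checks that both approaches agree:

```r
# first three values from the example, under hypothetical names w and h
w <- c(94, 85, 82)
h <- c(1.62, 1.82, 1.66)

# vectorized arithmetic works element by element
bmi_vectorized <- w / h^2

# same calculation with mapply(), for comparison
bmi_mapply <- mapply(function(height, weight) weight / height^2, h, w)

all.equal(bmi_vectorized, bmi_mapply)
```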

tapply()

The tapply() function can be used to calculate statistics by a grouping variable. In the following example, we calculate the average systolic blood pressure by age_group.

age_group <- c(
  "18-29", "30-39", "40-49", "50-59", "60-69", "18-29", "30-39", 
  "40-49", "50-59", "60-69", "18-29", "30-39", "40-49", "50-59", 
  "60-69", "18-29", "30-39", "40-49", "50-59", "60-69"
)
systolic_bp <- c(
  120, 130, 140, 150, 160, 118, 132, 142, 148, 155, 
  125, 128, 135, 145, 158, 122, 134, 139, 147, 162
)
# calculate average systolic blood pressure
avg_sbp <- tapply(systolic_bp, age_group, mean)
avg_sbp
##  18-29  30-39  40-49  50-59  60-69 
## 121.25 131.00 139.00 147.50 158.75
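
Extra arguments given to tapply() are passed on to the summary function, which is useful when the data contain missing values. A small sketch with hypothetical readings bp and groups grp (not the data above):

```r
# hypothetical readings with one missing value
bp <- c(120, NA, 140, 150)
grp <- c("a", "a", "b", "b")

# without na.rm, any group containing an NA yields NA
with_na <- tapply(bp, grp, mean)
with_na

# na.rm = TRUE is forwarded to mean(), so NAs are dropped first
without_na <- tapply(bp, grp, mean, na.rm = TRUE)
without_na
```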

vapply()

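vapply() is similar to sapply(), but it requires a third argument (FUN.VALUE) that declares the type and length of each result, which makes it stricter and safer. Combined with split(), it can reproduce the tapply() result above, although the code is a bit more involved.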

average_systolic_bp <- function(bp) {
  mean(bp)
}
average_systolic_bp_by_age <- vapply(
  split(systolic_bp, age_group), 
  average_systolic_bp, numeric(1)
)
average_systolic_bp_by_age
##  18-29  30-39  40-49  50-59  60-69 
## 121.25 131.00 139.00 147.50 158.75
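
The benefit of declaring the result type shows up when a function returns something unexpected. In the sketch below (hypothetical list groups), range() returns two numbers per group: sapply() quietly changes the shape of its output, whereas vapply() with numeric(1) raises an error, captured here with tryCatch():

```r
# hypothetical grouped readings, used only for illustration
groups <- list(a = c(120, 118), b = c(150, 148))

# range() returns two values per group, so sapply() returns a matrix
shape_changed <- sapply(groups, range)
shape_changed

# vapply() expects exactly one number per group and signals an error instead
result <- tryCatch(
  vapply(groups, range, numeric(1)),
  error = function(e) conditionMessage(e)
)
result
```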

switch() statement with sapply()

The R switch() function is a multi-way branch: it evaluates an expression and returns the value of the matching case. When a single value must be compared against many alternatives, it is usually more readable than a long chain of if-else statements. Because switch() handles one value at a time, we use sapply() to apply it to each element of a vector.

gender <- c("f", "m", "f", "f", "m", "f")
gender_new <- sapply(gender, function(x) {
  switch(x,
         f = "Female",
         m = "Male"
  )
})
df = data.frame(gender, gender_new)
df
##   gender gender_new
## 1      f     Female
## 2      m       Male
## 3      f     Female
## 4      f     Female
## 5      m       Male
## 6      f     Female
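
When no case matches, switch() returns NULL invisibly, which would stop sapply() from simplifying the result to a vector. A minimal sketch of a fallback, assuming a hypothetical unrecognized code "x" and using vapply() to guarantee a character vector:

```r
codes <- c("f", "m", "x")   # "x" is a hypothetical unrecognized code

labels <- vapply(codes, function(x) {
  switch(x,
         f = "Female",
         m = "Male",
         "Unknown"          # unnamed last argument acts as the default
  )
}, character(1), USE.NAMES = FALSE)
labels
```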