library(stringr)
It is important to note that the dplyr package
can achieve the same results as the apply() family of
functions with efficient, clean code (when working with data frames or
tibbles).
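As a minimal sketch of the dplyr approach mentioned above (assuming dplyr is installed; the patients data frame and its three rows are hypothetical and only for illustration), a new column can be computed with mutate() instead of apply():

```r
library(dplyr)

# Hypothetical illustration data (three rows only)
patients <- data.frame(
  weight = c(94, 85, 82),
  height = c(1.62, 1.82, 1.66)
)

# mutate() adds a column computed from existing columns;
# the arithmetic is vectorized over all rows at once
patients <- mutate(patients, bmi = weight / height^2)
patients
```

Because the arithmetic is vectorized, no row-by-row function is needed here.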
Calculate body mass index (BMI) given height and
weight.
weight <- c(
  94, 85, 82, 100, 83, 85, 77, 80, 64, 57, 98, 95, 85, 90, 51,
  74, 88, 61, 66, 62
)
height <- c(
  1.62, 1.82, 1.66, 1.85, 1.88, 1.52, 1.71, 1.86, 1.72, 1.68,
  1.88, 1.68, 1.77, 1.73, 1.54, 1.86, 1.6, 1.52, 1.63, 1.88
)
# combine vectors into a data frame
df <- data.frame(weight, height)
# define a function to calculate BMI
calculate_bmi <- function(row) {
  weight <- row[1]
  height <- row[2]
  return(weight / (height ^ 2))
}
df$bmi <- apply(df, 1, calculate_bmi)
df
## weight height bmi
## 1 94 1.62 35.81771
## 2 85 1.82 25.66115
## 3 82 1.66 29.75758
## 4 100 1.85 29.21841
## 5 83 1.88 23.48348
## 6 85 1.52 36.79017
## 7 77 1.71 26.33289
## 8 80 1.86 23.12406
## 9 64 1.72 21.63332
## 10 57 1.68 20.19558
## 11 98 1.88 27.72748
## 12 95 1.68 33.65930
## 13 85 1.77 27.13141
## 14 90 1.73 30.07117
## 15 51 1.54 21.50447
## 16 74 1.86 21.38976
## 17 88 1.60 34.37500
## 18 61 1.52 26.40235
## 19 66 1.63 24.84098
## 20 62 1.88 17.54187
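One caveat with this approach is that apply() converts a data frame to a matrix before looping over it, so a single character column turns every row into a character vector. The sketch below uses a hypothetical patients data frame (not part of the example above) to show the effect, a way around it, and the vectorized alternative that needs no apply() at all:

```r
# Hypothetical data: one character column alongside two numeric columns
patients <- data.frame(
  id = c("a", "b", "c"),
  weight = c(94, 85, 82),
  height = c(1.62, 1.82, 1.66)
)

# apply() calls as.matrix() on the data frame first, so each row
# reaches the function as a character vector
row_types <- apply(patients, 1, class)
row_types

# Passing only the numeric columns keeps each row numeric
bmi_apply <- apply(patients[, c("weight", "height")], 1, function(row) {
  row[["weight"]] / row[["height"]]^2
})

# Arithmetic on columns is vectorized, so apply() is optional here
bmi_vectorized <- patients$weight / patients$height^2
```

Both bmi_apply and bmi_vectorized hold the same three values; the vectorized form is usually the simpler choice for column-wise arithmetic.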
The lapply() function applies a function to each element of a vector
or list (including each column of a data frame) and returns a list.
Example
Capitalize the first letter of each value in the character columns of a data frame, and lowercase the rest.
df <- data.frame(
  Name = c("ALICE", "Daniel", "james", "Mary"),
  Education = c("secondary", "primary", "Tertiary", "SECONDARY"),
  Age = c(34, 45, 28, 21)
)
df
## Name Education Age
## 1 ALICE secondary 34
## 2 Daniel primary 45
## 3 james Tertiary 28
## 4 Mary SECONDARY 21
# create a function
capitalize_first_letter <- function(x) {
  if (is.character(x)) {
    x <- paste0(toupper(substr(x, 1, 1)), substr(tolower(x), 2, nchar(x)))
  }
  return(x)
}
# apply it to all the variables of the DataFrame
df1 <- df
df1[] <- lapply(df1, capitalize_first_letter)
df1
## Name Education Age
## 1 Alice Secondary 34
## 2 Daniel Primary 45
## 3 James Tertiary 28
## 4 Mary Secondary 21
The same result can be achieved with the str_to_title()
function from the stringr package.
df2 <- df
df2[, 1:2] <- lapply(df2[, 1:2], str_to_title)
df2
## Name Education Age
## 1 Alice Secondary 34
## 2 Daniel Primary 45
## 3 James Tertiary 28
## 4 Mary Secondary 21
Notice that in the above you must specify the columns (here
we have selected only the string columns with
df2[, 1:2]). Failing to do so converts the numeric
column Age to character. Specifying the variables manually,
as we have done above, is impractical for data frames with many
variables. The code can instead be modified so that R detects the
character columns itself and leaves the other columns untouched.
df3 <- df
title_case <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) str_to_title(col) else col
  })
  return(df)
}
df3 <- title_case(df3)
df3
## Name Education Age
## 1 Alice Secondary 34
## 2 Daniel Primary 45
## 3 James Tertiary 28
## 4 Mary Secondary 21
The sapply() function works like lapply(),
except that it attempts to simplify the result to a vector or a
matrix.
Example
df4 <- df
df4[, 1:2] <- sapply(df4[, 1:2], str_to_title)
df4
## Name Education Age
## 1 Alice Secondary 34
## 2 Daniel Primary 45
## 3 James Tertiary 28
## 4 Mary Secondary 21
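The difference between lapply() and sapply() is easiest to see by inspecting what each one returns. The following is a minimal sketch using a small hypothetical list named values (not part of the examples above):

```r
# Hypothetical input: a named list of two integer vectors
values <- list(a = 1:3, b = 4:6)

# lapply() always returns a list, one element per input element
as_list <- lapply(values, sum)

# sapply() simplifies length-1 results to a named vector
as_vector <- sapply(values, sum)

# When every result has length 2, sapply() simplifies to a matrix
# with one column per input element
as_matrix <- sapply(values, range)

class(as_list)
class(as_vector)
dim(as_matrix)
```

Because the shape of the sapply() result depends on what the function returns, lapply() is the more predictable choice when the output is assigned back into a data frame.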
The mapply() function applies a function to multiple objects (e.g. vectors) in parallel, element by element.
Example
We use mapply() to calculate body mass index given
height and weight. (We have already done this with
the apply() function.)
weight <- c(
  94, 85, 82, 100, 83, 85, 77, 80, 64, 57, 98, 95, 85, 90, 51,
  74, 88, 61, 66, 62
)
height <- c(
  1.62, 1.82, 1.66, 1.85, 1.88, 1.52, 1.71, 1.86, 1.72, 1.68,
  1.88, 1.68, 1.77, 1.73, 1.54, 1.86, 1.6, 1.52, 1.63, 1.88
)
calculate_bmi <- function(height, weight) {
  return(weight / (height ^ 2))
}
bmi <- mapply(calculate_bmi, height, weight)
df <- data.frame(weight, height, bmi)
df
## weight height bmi
## 1 94 1.62 35.81771
## 2 85 1.82 25.66115
## 3 82 1.66 29.75758
## 4 100 1.85 29.21841
## 5 83 1.88 23.48348
## 6 85 1.52 36.79017
## 7 77 1.71 26.33289
## 8 80 1.86 23.12406
## 9 64 1.72 21.63332
## 10 57 1.68 20.19558
## 11 98 1.88 27.72748
## 12 95 1.68 33.65930
## 13 85 1.77 27.13141
## 14 90 1.73 30.07117
## 15 51 1.54 21.50447
## 16 74 1.86 21.38976
## 17 88 1.60 34.37500
## 18 61 1.52 26.40235
## 19 66 1.63 24.84098
## 20 62 1.88 17.54187
Clearly, the mapply() function is more straightforward
than the apply() function (at least for this
task).
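Since mapply() matches its vectors to the function's arguments by position, swapping height and weight in the call would silently give wrong results. The sketch below, which uses only the first three values of the data above and a hypothetical rounded_bmi() helper, shows two options that may help: passing the vectors by name, and using MoreArgs for an argument that stays fixed across calls.

```r
calculate_bmi <- function(height, weight) {
  weight / height^2
}

weight <- c(94, 85, 82)
height <- c(1.62, 1.82, 1.66)

# Positional matching: the first vector feeds `height`, the second `weight`
bmi_positional <- mapply(calculate_bmi, height, weight)

# Named matching: the order of the vectors no longer matters
bmi_named <- mapply(calculate_bmi, weight = weight, height = height)

# Hypothetical helper with an extra argument that is the same for every call
rounded_bmi <- function(height, weight, digits) {
  round(weight / height^2, digits)
}
bmi_rounded <- mapply(rounded_bmi, height, weight,
                      MoreArgs = list(digits = 1))
bmi_rounded
```

Naming the arguments costs a few extra characters but makes the call robust to reordering.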
The tapply() function can be used to calculate statistics by a
grouping variable. In the following example, we calculate the average systolic
blood pressure by age_group.
age_group <- c(
  "18-29", "30-39", "40-49", "50-59", "60-69", "18-29", "30-39",
  "40-49", "50-59", "60-69", "18-29", "30-39", "40-49", "50-59",
  "60-69", "18-29", "30-39", "40-49", "50-59", "60-69"
)
systolic_bp <- c(
  120, 130, 140, 150, 160, 118, 132, 142, 148, 155,
  125, 128, 135, 145, 158, 122, 134, 139, 147, 162
)
# calculate average systolic blood pressure
avg_sbp <- tapply(systolic_bp, age_group, mean)
avg_sbp
## 18-29 30-39 40-49 50-59 60-69
## 121.25 131.00 139.00 147.50 158.75
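Any summary function can take the place of mean, and extra arguments after the function are passed through to it. The following sketch uses a small hypothetical subset of readings (with one missing value added for illustration) to count the observations per group and to compute a group mean that ignores missing values:

```r
# Hypothetical readings; the NA is added only to illustrate na.rm
age_group <- c("18-29", "30-39", "18-29", "30-39", "18-29")
systolic_bp <- c(120, 130, 118, 132, NA)

# Number of readings per group (missing values included in the count)
n_readings <- tapply(systolic_bp, age_group, length)

# Arguments after the function are passed on to it, here na.rm = TRUE
mean_sbp <- tapply(systolic_bp, age_group, mean, na.rm = TRUE)

n_readings
mean_sbp
```

Without na.rm = TRUE, the mean for the group containing the missing value would itself be NA.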
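The vapply() function works like sapply() but requires declaring the
type and length of each result (here numeric(1)). Combined with split(),
it can reproduce the tapply() result above, though the code is slightly
more complex.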
average_systolic_bp <- function(bp) {
  mean(bp)
}
average_systolic_bp_by_age <- vapply(
  split(systolic_bp, age_group),
  average_systolic_bp, numeric(1)
)
average_systolic_bp_by_age
## 18-29 30-39 40-49 50-59 60-69
## 121.25 131.00 139.00 147.50 158.75
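The third argument to vapply() acts as a template for each result: if a result does not match it, vapply() stops with an error instead of quietly returning a different shape. A minimal sketch, using a hypothetical readings list built with split():

```r
# Hypothetical grouped readings
readings <- split(
  c(120, 118, 130, 132),
  c("18-29", "18-29", "30-39", "30-39")
)

# Each mean is a single number, which matches numeric(1)
group_means <- vapply(readings, mean, numeric(1))

# range() returns two numbers, so numeric(1) is violated and vapply() errors;
# tryCatch() captures the error so the script keeps running
mismatch <- tryCatch(
  vapply(readings, range, numeric(1)),
  error = function(e) "error"
)

# Declaring numeric(2) instead yields a matrix with one column per group
group_ranges <- vapply(readings, range, numeric(2))

group_means
mismatch
group_ranges
```

This strictness is the main reason to prefer vapply() over sapply() inside functions and scripts, where an unexpected result shape should fail early.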
The R switch() function is a multi-way branch for
conditional execution. It evaluates an expression once and runs the
code for the matching case, which is often more readable than a long
chain of if-else statements. In the example below, switch() is applied
to each element of a vector with sapply() to recode gender values.
gender <- c("f", "m", "f", "f", "m", "f")
gender_new <- sapply(gender, function(x) {
  switch(x,
    f = "Female",
    m = "Male"
  )
})
df <- data.frame(gender, gender_new)
df
## gender gender_new
## 1 f Female
## 2 m Male
## 3 f Female
## 4 f Female
## 5 m Male
## 6 f Female
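When the value matches none of the named cases, switch() returns NULL, which would prevent sapply() from simplifying the result to a character vector. A final unnamed argument serves as the default. The sketch below assumes a hypothetical recode_gender() helper and an "Unknown" label that are not part of the example above:

```r
# Hypothetical helper: the last, unnamed argument is the default case
recode_gender <- function(x) {
  switch(x,
    f = "Female",
    m = "Male",
    "Unknown"
  )
}

gender <- c("f", "m", "x")

# vapply() guarantees one string per input element
gender_new <- vapply(gender, recode_gender, character(1), USE.NAMES = FALSE)
gender_new

# Without a default, an unmatched value yields NULL
no_match <- switch("x", f = "Female", m = "Male")
is.null(no_match)
```

Adding a default keeps the output the same length as the input even when unexpected codes appear in the data.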