This is a reproduction of the the Functionals chapter, which comes from Chapter 11 of Advanced R. The purpose was to learn the concepts. I skipped a lot of the extraneous information, focusing on the apply and Map functions.
library(tidyverse)
lapply2(): R implementation of lapply():
lapply2 <- function(x, f, ...) {
out <- vector("list", length(x))
for (i in seq_along(x)) {
out[[i]] <- f(x[[i]], ...)
}
out
}
Show how lapply2() works:
replicate() takes n, a number of replications, and a function to replicate. We pass rnorm(10), which computes 10 random normals. The simplify = FALSE argument returns a list instead of a matrix. lapply2(l, mean) computes the mean of each column of the list, l.
set.seed(159)
l <- replicate(20, rnorm(10), simplify = FALSE)
lapply2(l, mean) %>% unlist()
[1] -0.44782126 -0.28139494 -0.04527234 -0.38860758 -0.33920482 -0.81036404 0.18424530
[8] -0.42299603 0.44155869 -0.26928065 -0.05602635 -0.14358085 0.44727663 -0.11995728
[15] -0.25086861 -0.12244024 -0.23319594 0.48824863 0.22955091 -0.51299327
Very useful for getting information from columns of a data frame:
mtcars %>%
lapply(class) %>%
unlist()
mpg cyl disp hp drat wt qsec vs
"numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
am gear carb
"numeric" "numeric" "numeric"
Divide each column of mtcars by it’s mean:
mtcars %>%
lapply(function(x) round(x / mean(x), 2)) %>%
as_tibble()
lapply() equivalent?trims <- c(0, 0.1, 0.2, 0.5)
x <- rcauchy(100)
lapply(trims, function(trim) mean(x, trim = trim))
lapply(trims, mean, x = x)
The two functions are the same due to lapply()’s ... argument, which passes additional arguments to the function being looped. The first one is more readable in my opinion, but they do the same thing.
scale01 <- function(x) {
rng <- range(x, na.rm = TRUE)
(x - rng[1]) / (rng[2] - rng[1])
}
We’ll use the flights data set from the nycflights13 library because it has columns in various formats. To apply to only numeric columns, first get numeric columns using is.numeric().
library(nycflights13)
flights_num <- flights %>%
lapply(is.numeric) %>%
unlist()
flights_num
year month day dep_time sched_dep_time
TRUE TRUE TRUE TRUE TRUE
dep_delay arr_time sched_arr_time arr_delay carrier
TRUE TRUE TRUE TRUE FALSE
flight tailnum origin dest air_time
TRUE FALSE FALSE FALSE TRUE
distance hour minute time_hour
TRUE TRUE TRUE FALSE
Then, apply scale01 to subset.
flights[,flights_num] %>%
lapply(scale01) %>%
as_tibble()
lapply() to fit linear models to the mtcars using the formulas stored in this list:formulas <- list(
mpg ~ disp,
mpg ~ I(1 / disp),
mpg ~ disp + wt,
mpg ~ I(1 / disp) + wt
)
fits <- lapply(formulas, function(x) lm(x, mtcars))
fits[[1]] # show first fit
Call:
lm(formula = x, data = mtcars)
Coefficients:
(Intercept) disp
29.59985 -0.04122
mpg ~ disp to each of the bootstrap replicates of mtcars in the list below by using lapply().bootstraps <- lapply(1:10, function(i) {
rows <- sample(1:nrow(mtcars), rep = TRUE)
mtcars[rows, ]
})
Use lapply() to loop over the bootstraps. Need to use an anonymous function to send the bootstraps to the data argument.
fits_boot <- bootstraps %>%
lapply(function(x) lm(formulas[[1]], data = x))
fits_boot[[1]] # show first fit
Call:
lm(formula = formulas[[1]], data = x)
Coefficients:
(Intercept) disp
31.27484 -0.04702
rsq <- function(mod) summary(mod)$r.squared
lapply(fits, rsq) %>% unlist()
[1] 0.7183433 0.8596865 0.7809306 0.8838038
# lapply(fits_boot, rsq) %>% unlist() # Uncomment to get boostrapped fits
Used for vector output.
The output of sapply() is equivalent to lapply() %>% unlist() since the default is simplify = TRUE.
sapply(mtcars, is.numeric)
mpg cyl disp hp drat wt qsec vs am gear carb
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
lapply(mtcars, is.numeric) %>% unlist()
mpg cyl disp hp drat wt qsec vs am gear carb
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
vapply() is the same as sapply() but with more structure as the output type and length must be specified.
vapply(mtcars, is.numeric, logical(length = 1))
mpg cyl disp hp drat wt qsec vs am gear carb
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
mapply() is used for multiple inputs; however, Map() is preferred.
set.seed(10)
x <- 1:10
y <- 2 * (x + rnorm(10))
mapply(`/`, y, x) %>%
mean()
[1] 1.803832
Used for working in two dimensions (i.e. matrixstructure).
a <- matrix(1:25, ncol = 5) %>%
t()
a
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
[4,] 16 17 18 19 20
[5,] 21 22 23 24 25
# Rowwise
apply(a, 1, sum)
[1] 15 40 65 90 115
# Columnwise
apply(a, 2, sum)
[1] 55 60 65 70 75
# Row and column
apply(a, 2, function(x) x ^ 2)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 9 16 25
[2,] 36 49 64 81 100
[3,] 121 144 169 196 225
[4,] 256 289 324 361 400
[5,] 441 484 529 576 625
Two other useful matrix functions are sweep() and outer(). These are skipped for brevity.
Used for group apply.
tapply(mtcars$mpg, mtcars$gear, mean)
3 4 5
16.10667 24.53333 21.38000
Can achieve the same result using split() and sapply().
mtcars$mpg %>%
split(mtcars$gear) %>%
sapply(mean)
3 4 5
16.10667 24.53333 21.38000
I restructured the contents here to discuss Map(), Reduce(), and predicate functions including Filter(), Find(), and Position() all together.
Used for multiple inputs so we can process lists or data frames in parallel.
# Generate some sample data
xs <- replicate(5, runif(10), simplify = FALSE)
ws <- replicate(5, rpois(10, 5) + 1, simplify = FALSE)
Map(weighted.mean, xs, ws) %>%
unlist()
[1] 0.5938165 0.5972073 0.4717003 0.5697301 0.4826299
Used to recursively apply a function two arguments at a time to a vector.
x <- 1:10
Reduce(`*`, x) # Factorial
[1] 3628800
factorial(10)
[1] 3628800
Can find intersection between all list elements using reduce.
set.seed(684)
l <- replicate(5, sample(1:10, 15, replace = T), simplify = FALSE)
l
[[1]]
[1] 10 5 7 6 8 5 4 3 9 5 1 3 10 1 1
[[2]]
[1] 8 1 5 3 6 10 3 7 3 2 2 6 2 6 9
[[3]]
[1] 1 4 6 6 4 7 3 9 9 5 1 8 4 7 1
[[4]]
[1] 7 4 7 9 1 1 10 9 1 8 10 5 2 2 4
[[5]]
[1] 2 1 8 10 6 4 4 1 1 2 1 6 6 5 3
Reduce(intersect, l) %>% unlist()
[1] 5 8 1
A predicate is a function that returns a TRUE or FALSE. Predicate functions apply a predicate to each member of a list.
Selects only items that match the predicate function.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
Filter(is.numeric, df)
Finds column values that meet the criteria.
Find(is.numeric, df)
[1] 1 2 3
Finds the column position that meets the criteria
Position(is.numeric, df)
[1] 1