── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.3 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.3 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
9.2.1 purrr::map()
purrr::map() takes a vector and a function, calls the function once for each element of the vector, and returns the results as a list. In other words, map(1:3, f) is equivalent to list(f(1), f(2), f(3)).
# map_chr() always returns a character vector
map_chr(mtcars, typeof)
mpg cyl disp hp drat wt qsec vs
"double" "double" "double" "double" "double" "double" "double" "double"
am gear carb
"double" "double" "double"
# map_lgl() always returns a logical vector
map_lgl(iris, is.double)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
TRUE TRUE TRUE TRUE FALSE
# map_int() always returns an integer vector
n_unique <- function(x) length(unique(x))
map_int(mtcars, n_unique)
mpg cyl disp hp drat wt qsec vs am gear carb
25 3 27 22 22 29 30 2 2 3 6
# map_dbl() always returns a double vector
map_dbl(mtcars, mean)
mpg cyl disp hp drat wt qsec
20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750
vs am gear carb
0.437500 0.406250 3.687500 2.812500
Notice that mtcars is a data frame, and data frames are lists of vectors of the same length, so map() iterates over its columns.
Property:
All map variants return an output vector of the same length as the input.
For the atomic variants (map_lgl(), map_int(), map_dbl(), map_chr()), this means each call of .f must return a single value.
pair <- function(x) c(x, x)
map_dbl(1:3, pair)
#> ! Result must be length 1, not 2.
Similarly, the return value must have the right type.
map_dbl(1:2, as.character)
#> Error: Can't coerce element 1 from a character to a double
map_dbl() fails when it tries to coerce the output to a length-1 double.
In either case, it’s often useful to switch back to map(), because map() can accept any type of output. That allows you to see the problematic output, and figure out what to do with it.
Equivalents in base R
sapply() and vapply() can also return atomic vectors.
sapply():
avoid it, because it tries to simplify the result and may return a matrix, a list, or a vector depending on the input.
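This small sketch shows why sapply() is risky and how vapply() and map_dbl() avoid the problem. The type sapply() returns depends on what the function returns. vapply(), through its FUN.VALUE template, and map_dbl() always give a double vector here:

```r
library(purrr)

# sapply() picks its output type from whatever the function returns
sapply(1:3, function(x) x * 2)         # a numeric vector
sapply(1:3, function(x) c(x, x))       # a 2 x 3 matrix
sapply(integer(0), function(x) x * 2)  # an empty list()

# vapply() needs a template (FUN.VALUE) for every result, so the type is fixed
vapply(1:3, function(x) x * 2, FUN.VALUE = numeric(1))

# map_dbl() gives the same type guarantee with less typing
map_dbl(1:3, function(x) x * 2)
```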
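With the formula shortcut (~), purrr lets you refer to the arguments as .x for one-argument functions, .x and .y for two-argument functions, and ..1, ..2, ..3, etc., for functions with an arbitrary number of arguments.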
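Here is a short sketch of the three placeholder styles. map2_dbl() and pmap_dbl() are purrr's two-input and multi-input variants, and they appear here only so that there is more than one argument. as_mapper() prints the function that purrr builds from a formula:

```r
library(purrr)

# One argument: .x is the current element
map_dbl(1:3, ~ .x^2)

# Two arguments (map2 family): .x and .y
map2_dbl(1:3, c(10, 20, 30), ~ .x + .y)

# Any number of arguments (pmap family): ..1, ..2, ..3
pmap_dbl(list(1:2, 3:4, 5:6), ~ ..1 * ..2 + ..3)

# Inspect the function purrr generates from a formula
as_mapper(~ .x + .y)
```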
The shortcut is also useful for generating random data.
x <- map(1:3, ~ runif(2))
str(x)
List of 3
$ : num [1:2] 0.0879 0.813
$ : num [1:2] 0.798 0.351
$ : num [1:2] 0.575 0.587
Indexing
Powered by purrr::pluck(), the map() family can also be used for indexing. Instead of a function, you can supply:
a character vector to select elements by name,
an integer vector to select by position, or
a list to select by both name and position (useful for nested lists!).
x <- list(
  list(-1, x = 1, y = c(2), z = "a"),
  list(-2, x = 4, y = c(5, 6), z = "b"),
  list(-3, x = 8, y = c(9, 10, 11))
)
# Select by name
# for each element in x, index by name "x"
map_dbl(x, "x")
[1] 1 4 8
# Or by position
# for each element in x, index the first element
map_dbl(x, 1)
[1] -1 -2 -3
# Or by both
# for each element in x, index by name "y", and then index the first element
map_dbl(x, list("y", 1))
[1] 2 5 9
Don’t confuse map() with map_chr() here.
Notice: indexing a component that doesn't exist returns NULL by default (see ?pluck). So although map(x, "z") works, map_chr(x, "z") does not, because NULL cannot be coerced into a single string. The exception is when we provide a default.
# You'll get an error if a component doesn't exist:
map_chr(x, "z")
#> Error: Result 3 must be a single string, not NULL of length 0
# Unless you supply a .default value
map_chr(x, "z", .default = NA)
[1] "a" "b" NA
# or simply use map()
map(x, "z")
[[1]]
[1] "a"
[[2]]
[1] "b"
[[3]]
NULL
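lapply() accepts the function either as a function or as a string naming it (it looks the name up with match.fun()). For example, lapply(1:3, squaring) is equivalent to lapply(1:3, "squaring"). purrr's map() behaves differently: as shown above, it treats a string as an index.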
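This quick sketch shows the contrast using a hypothetical squaring() helper. lapply() looks the string up as a function, whereas purrr's map() treats the same string as a name to extract:

```r
library(purrr)

# Hypothetical helper, named to match the example above
squaring <- function(x) x^2

# lapply() resolves the string to a function via match.fun()
lapply(1:3, "squaring")

# map() reads a string as a name to extract (pluck-style indexing) instead;
# plain numbers have no element called "squaring", so every result is NULL
map(1:3, "squaring")
```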
9.2.3 Passing additional arguments
In other words, passing arguments with ... (dots).
Additional arguments
For example, we may want to supply na.rm = TRUE to mean().
Method 1: pass the argument inside an anonymous function
x <- list(1:5, c(NA, NA, 2, 10))
# We can do it this way.
# quick review: map() returns a list
map(x, ~ mean(.x, na.rm = TRUE))
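The second approach passes the extra argument through map()'s own ..., so mean() itself becomes .f and na.rm = TRUE is forwarded to every call. Here is a minimal sketch with the same x:

```r
library(purrr)

x <- list(1:5, c(NA, NA, 2, 10))

# Method 2: each call becomes mean(x[[i]], na.rm = TRUE)
map(x, mean, na.rm = TRUE)
```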
Don’t combine a twiddle (~) with additional argument(s) meant for the function called inside it: the latter will be ignored!
Arguments placed after an anonymous function are passed to the anonymous function, not to the named function in its body. Since the anonymous function never uses them, they have no effect, as the NA below shows.
map(x, ~ mean(.x), na.rm = TRUE)
[[1]]
[1] 3
[[2]]
[1] NA
But we can pass additional arguments this way to an anonymous function when the anonymous function itself defines them, either as named parameters or through .x, .y, ..1, ..2, ..3, and so on.
map(1:3, function(x, y) x^2 + y, y = 10000)
[[1]]
[1] 10001
[[2]]
[1] 10004
[[3]]
[1] 10009
is equivalent to
map(1:3, ~ .x^2 + .y, .y = 10000)
[[1]]
[1] 10001
[[2]]
[1] 10004
[[3]]
[1] 10009
is equivalent to
map(1:3, .f = ~ ..1^2 + ..2, ..2 = 10000)
[[1]]
[1] 10001
[[2]]
[1] 10004
[[3]]
[1] 10009
Properties:
Any arguments after .f in the map() call are inserted after the individual element in each .f() call, i.e. map(x, f, a) calls f(x[[i]], a).
map() is only vectorised over its first argument. If an argument after .f is a vector, it is passed along as is.
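This minimal sketch illustrates the second property. y is a length-2 vector passed after .f, and every call receives all of it rather than one element per call:

```r
library(purrr)

# y = c(10, 100) is not iterated over: each call gets one x and the whole y
map(1:3, function(x, y) x + y, y = c(10, 100))
```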
Difference between method 1 (anonymous function) and method 2 (passing arguments through map()'s ...)
Method 1: the extra argument(s) are evaluated on every call of .f.
Method 2: the extra argument(s) are evaluated only once, when map() is called.
my_func <- function(a, b) a + b
x <- rep(0, 5)
# evaluated for every f call
map_dbl(x, ~ my_func(.x, runif(n = 1)))
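This side-by-side sketch of the two methods reuses my_func and x. set.seed() is there only to make the run reproducible. Method 1 draws a new random number for every element, while method 2 draws one number when map_dbl() is called and reuses it:

```r
library(purrr)

my_func <- function(a, b) a + b
x <- rep(0, 5)

set.seed(42)  # only for reproducibility

# Method 1: runif(n = 1) sits inside the anonymous function,
# so it is evaluated again for every element of x
m1 <- map_dbl(x, ~ my_func(.x, runif(n = 1)))
m1

# Method 2: runif(n = 1) is an argument to map_dbl() itself,
# so it is evaluated once and the same value is reused for every element
m2 <- map_dbl(x, my_func, runif(n = 1))
m2
```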
Naming the extra arguments explicitly (e.g. na.rm = TRUE) is good for readability; otherwise the reader needs to remember the order of the function's arguments.
Why map() uses .x and .f
map() uses the unusual argument names .x and .f to avoid clashes when the function provided to map() itself has arguments named x or f.
For example, recall our simple_map(), which uses f as the argument name for the function.
simple_map <- function(x, f, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)
  }
  out
}
Now suppose our function also has an argument named f.
map_dbl() still works! :D
It knows that f is not its own function argument (that is .f = my_func); instead, f is an extra argument to supply to my_func.
my_func <- function(a, f) {
  a + f
}
map_dbl(1:3, my_func, f = 10)
[1] 11 12 13
But simple_map() seizes f for itself, wrongly treating 10 as the function to apply over 1:3.
simple_map(1:3, my_func, f = 10)
#> Error in f(x[[i]], ...) : could not find function "f"
Note that simple_map(1:3, my_func, f = 10) is matched as simple_map(x = 1:3, f = 10, my_func): my_func ends up in ..., and f = 10 is not a function.
Debugging gets harder when the provided function itself expects f to be a function, because the resulting error can be hard to fathom.
# f is supposed to be a function here!
bootstrap_summary <- function(x, f) {
  f(sample(x, replace = TRUE))
}
# f is seized by simple_map() ):
simple_map(mtcars, bootstrap_summary, f = mean)
#> Error in mean.default(x[[i]], ...): 'trim' must be numeric of length one
This is essentially calling simple_map(x = mtcars, f = mean, bootstrap_summary), so each call becomes mean(x[[i]], bootstrap_summary). bootstrap_summary is then matched to mean()'s trim argument, which causes the error.
Takeaways
The .x and .f names avoid conflicts in cases where the .f function itself has arguments named x or f.
If .f happens to have arguments named .x or .f as well, use an anonymous function instead.
Explicitly name the extra arguments you pass to map().
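This sketch illustrates the second takeaway with a hypothetical apply_twice() whose own argument is called .f. Passing .f = sqrt straight to map() would be captured by map() itself, so the call is wrapped in an anonymous function instead:

```r
library(purrr)

# Hypothetical function that itself has an argument called .f
apply_twice <- function(n, .f) .f(.f(n))

# map_dbl(c(16, 81), apply_twice, .f = sqrt) would not work: map_dbl() takes
# .f = sqrt as its own function and passes apply_twice to sqrt() via ...

# Wrapping the call in an anonymous function avoids the clash
map_dbl(c(16, 81), function(n) apply_twice(n, .f = sqrt))
```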
Compare with the apply() family
Base functions that pass along ... use a variety of naming conventions to prevent undesired argument matching:
The apply family mostly uses capital letters (e.g. X and FUN).
What is ...?
... collects any additional arguments passed to the current function so they can be forwarded to another function, such as .f in map() or FUN in apply().
e.g.
map(.x, .f, ..., .progress = FALSE)
apply(X, MARGIN, FUN, ..., simplify = TRUE)
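Here is a minimal sketch of ... outside of map(). The hypothetical wrapper my_mean() forwards everything it does not recognise to mean(), which mirrors how map() forwards extra arguments to .f. ...length() (base R 3.5.0 and later) counts what was collected:

```r
# Hypothetical wrapper: anything not matched to x is collected in ...
# and forwarded unchanged to mean()
my_mean <- function(x, ...) mean(x, ...)

my_mean(c(1, NA, 3))                # NA
my_mean(c(1, NA, 3), na.rm = TRUE)  # 2

# ...length() counts how many arguments were collected in ...
count_dots <- function(...) ...length()
count_dots(1, "a", TRUE)            # 3
```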
9.2.6 Exercises
Q1
Use as_mapper() to explore how purrr generates anonymous functions for the integer, character, and list helpers. What helper allows you to extract attributes? Read the documentation to find out.
as_mapper(2)
function (x, ...)
pluck_raw(x, list(2), .default = NULL)
<environment: 0x10c0934b0>
as_mapper("cool")
function (x, ...)
pluck_raw(x, list("cool"), .default = NULL)
<environment: 0x10c1064b0>
as_mapper(runif)
function (n, min = 0, max = 1)
.Call(C_runif, n, min, max)
<bytecode: 0x10c169958>
<environment: namespace:stats>
as_mapper(mean)
function (x, ...)
UseMethod("mean")
<bytecode: 0x10e2e74f8>
<environment: namespace:base>
as_mapper(head)
function (x, ...)
UseMethod("head")
<bytecode: 0x108a2f870>
<environment: namespace:utils>
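The last part of Q1 is not answered above. To my knowledge, the helper for attributes is purrr::attr_getter(), which returns a function that extracts a named attribute. Here is a short sketch with an example matrix:

```r
library(purrr)

# attr_getter("dim") returns a function that extracts the "dim" attribute
get_dim <- attr_getter("dim")

m <- matrix(1:6, nrow = 2)
get_dim(m)

# Like any other function, it can be handed to map()
map(list(m, diag(3)), attr_getter("dim"))
```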
Q2
map(1:3, ~ runif(2)) is a useful pattern for generating random numbers, but map(1:3, runif(2)) is not. Why not? Can you explain why it returns the result that it does?
With ~ runif(2), the twiddle turns runif(2) into the body of an anonymous function. It therefore runs once per iteration (three times), and each run generates a pair of Uniform(0, 1) numbers.
In map(1:3, runif(2)), by contrast, runif(2) is evaluated once, before map() starts. The result is a numeric vector: it is not a function, and there is no twiddle to turn it into an anonymous function. In that case, as_mapper() turns it into an indexing function, much as if we had passed two fixed numbers between 0 and 1 as indices, e.g. map(1:3, .f = c(0.5, 0.5)). In this version of purrr, it seems that fractional indices are rounded down and that indexing by 0 is equivalent to indexing by 1.
map(1:3, runif(2))
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
We basically indexed each length-one element by 1, and then indexed the resulting length-one vector by 1 again.
If the indices pointed beyond what each element contains, the result would be NULL, the default of pluck_raw().
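This small sketch checks the explanation. By the time map() sees it, runif(2) is already a plain numeric vector, while the formula version becomes a function. The exact pluck_raw() body printed by as_mapper() may differ between purrr versions:

```r
library(purrr)

# runif(2) is evaluated before map() starts, so .f receives a numeric vector
idx <- runif(2)
is.function(idx)

# as_mapper() turns a numeric vector into an extractor built on pluck_raw()
as_mapper(idx)

# With a twiddle, as_mapper() builds a function instead,
# so runif(2) runs again each time it is called
random_pair <- as_mapper(~ runif(2))
random_pair()
```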
The following code uses a map nested inside another map to apply a function to every element of a nested list. Why does it fail, and what do you need to do to make it work?
The intended behaviour is for the outer map() to call the inner map() on each element of x, with triple as the inner map()'s .f. Instead, the code passes triple as the .f of the outer map(), and map becomes an extra argument to triple, which results in the error.
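The question's code is not reproduced above. This sketch therefore assumes a small nested list x and a triple() helper (both hypothetical), and takes the failing call to be map(x, map, .f = triple). It shows two ways to route triple to the inner map():

```r
library(purrr)

# Hypothetical inputs for illustration
triple <- function(x) x * 3
x <- list(
  list(1, c(3, 9)),
  list(c(3, 6), 7, c(4, 7, 6))
)

# With .f = triple named, triple becomes the outer .f and map lands in its ...,
# so map(x, map, .f = triple) calls triple(x[[i]], map) and fails.

# Fix 1: pass triple positionally so it goes through ... to the inner map()
map(x, map, triple)

# Fix 2: make the nesting explicit with an anonymous function
map(x, ~ map(.x, triple))
```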
Alternatively, if we recognize that the first argument of lm() is simply the formula, we can pass lm directly and forward data through ...:
models <- map(formulas, lm, data = mtcars)
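Here is a minimal sketch of both versions, assuming a small hypothetical formulas list (the exercise's own list is not shown in these notes). The anonymous-function form spells out where each formula goes. The positional form relies on formula being lm()'s first argument:

```r
library(purrr)

# Hypothetical list of formulas for illustration
formulas <- list(
  mpg ~ disp,
  mpg ~ I(1 / disp),
  mpg ~ disp + wt
)

# Anonymous-function version: the formula is passed explicitly as .x
models_1 <- map(formulas, ~ lm(.x, data = mtcars))

# Positional version: each formula becomes lm()'s first argument,
# and data = mtcars is forwarded through map()'s ...
models_2 <- map(formulas, lm, data = mtcars)

map(models_2, coef)
```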
Q7
Fit the model mpg ~ disp to each of the bootstrap replicates of mtcars in the list below, then extract the \(R^2\) of the model fit (Hint: you can compute the \(R^2\) with summary().)
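The replicate list referenced in the question is not reproduced in these notes, so this sketch builds its own with a hypothetical bootstrap() helper (an assumption, not the book's code). It fits mpg ~ disp to each replicate and then pulls r.squared out of each summary() by name, using the indexing shortcut from earlier:

```r
library(purrr)

# Assumed setup: hypothetical helper that resamples the rows of a data frame
bootstrap <- function(df) {
  df[sample(nrow(df), replace = TRUE), , drop = FALSE]
}

set.seed(2023)  # only for reproducibility
bootstraps <- map(1:10, ~ bootstrap(mtcars))

# Fit mpg ~ disp to each replicate
models <- map(bootstraps, ~ lm(mpg ~ disp, data = .x))

# summary.lm objects are lists, so "r.squared" can be extracted by name
r2 <- map_dbl(map(models, summary), "r.squared")
r2
```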