This article is inspired by a purrr webinar by R-Ladies Baltimore, available on YouTube: https://www.youtube.com/watch?v=IewsPpjKElc&t=3248s
The purrr package (Henry and Wickham 2020) is part of the tidyverse (Wickham et al. 2019) and contains functions that allow us to:
Apply named or anonymous functions to vectors or lists.
Save the results in a variety of convenient formats.
For instance, you could have data spread across a series of Microsoft Excel files and want to combine them into one data set. Similarly, you could use purrr to generate a series of reports, each customised for a particular audience.
The formula for converting a temperature in degrees Fahrenheit to degrees Celsius is as follows.
\(\text{Celsius} = (\text{Fahrenheit} - 32) \times \frac{5}{9}\)
Now, we can use this formula to define a function that allows us to convert temperatures easily.
## Convert a temperature in degrees Fahrenheit to degrees Celsius
fahrenheit_to_celsius <- function(fahrenheit){
(fahrenheit - 32) * 5 / 9
}
The function fahrenheit_to_celsius takes one argument, a temperature in degrees Fahrenheit. We use it to convert 100 degrees Fahrenheit and 0 degrees Fahrenheit to Celsius.
## 100 degrees fahrenheit to celsius
fahrenheit_to_celsius(100)
[1] 37.77778
## 0 degrees fahrenheit to celsius
fahrenheit_to_celsius(0)
[1] -17.77778
But suppose we had a long vector of temperatures that we wanted to convert into Celsius. Consider this vector, temp, of average temperatures in degrees Fahrenheit recorded in 100 different cities.
[1] -50.0000000 -48.9898990 -47.9797980 -46.9696970 -45.9595960
[6] -44.9494949 -43.9393939 -42.9292929 -41.9191919 -40.9090909
[11] -39.8989899 -38.8888889 -37.8787879 -36.8686869 -35.8585859
[16] -34.8484848 -33.8383838 -32.8282828 -31.8181818 -30.8080808
[21] -29.7979798 -28.7878788 -27.7777778 -26.7676768 -25.7575758
[26] -24.7474747 -23.7373737 -22.7272727 -21.7171717 -20.7070707
[31] -19.6969697 -18.6868687 -17.6767677 -16.6666667 -15.6565657
[36] -14.6464646 -13.6363636 -12.6262626 -11.6161616 -10.6060606
[41] -9.5959596 -8.5858586 -7.5757576 -6.5656566 -5.5555556
[46] -4.5454545 -3.5353535 -2.5252525 -1.5151515 -0.5050505
[51] 0.5050505 1.5151515 2.5252525 3.5353535 4.5454545
[56] 5.5555556 6.5656566 7.5757576 8.5858586 9.5959596
[61] 10.6060606 11.6161616 12.6262626 13.6363636 14.6464646
[66] 15.6565657 16.6666667 17.6767677 18.6868687 19.6969697
[71] 20.7070707 21.7171717 22.7272727 23.7373737 24.7474747
[76] 25.7575758 26.7676768 27.7777778 28.7878788 29.7979798
[81] 30.8080808 31.8181818 32.8282828 33.8383838 34.8484848
[86] 35.8585859 36.8686869 37.8787879 38.8888889 39.8989899
[91] 40.9090909 41.9191919 42.9292929 43.9393939 44.9494949
[96] 45.9595960 46.9696970 47.9797980 48.9898990 50.0000000
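The code that created temp is not shown. The printed values are evenly spaced from -50 to 50 in 100 steps, so a sketch that should reproduce them, assuming the vector was built with seq(), is:

```r
# Assumption: temp is an evenly spaced sequence of 100 values from -50 to 50,
# which matches the printed output above (step size 100 / 99, about 1.0101)
temp <- seq(from = -50, to = 50, length.out = 100)

length(temp)  # 100
head(temp, 3)
```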
Converting each of these values into Celsius by hand is not practical. This is where the functions in purrr come in handy. In this article, I focus on the map functions.
map Functions
The map functions from purrr take a vector or a list, such as a column of a data frame or a whole data frame, and apply the specified function to each element in turn. Because a data frame is a list of columns, passing a whole data frame applies the function to each column.
## Arguments required by purrr map functions
## .x is the vector or list, .f is the function to apply,
## and ... refers to additional arguments passed on to .f
map(.x, .f, ...)
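Two details of this signature are easy to miss: arguments placed in ... are forwarded to the function on every call, and a data frame is treated as a list of columns. A small sketch with made-up values (the numbers and column names below are purely illustrative):

```r
library(purrr)

# Extra arguments after the function are forwarded to it:
# here each call is round(value, digits = 1)
rounded <- map_dbl(c(1.234, 5.678), round, digits = 1)
rounded  # 1.2 5.7

# A data frame is a list of columns, so map_dbl() visits one column at a time
# (hypothetical example data)
readings <- data.frame(morning = c(50, 60, 70), evening = c(40, 45, 50))
column_means <- map_dbl(readings, mean)
column_means  # morning = 60, evening = 45
```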
map Function
The map function in purrr loops over the 100 elements of temp and converts each one into Celsius. The output is a list of the 100 conversions; the first five are shown below.
library(purrr)
map(temp, fahrenheit_to_celsius) |> head(5)
[[1]]
[1] -45.55556
[[2]]
[1] -44.99439
[[3]]
[1] -44.43322
[[4]]
[1] -43.87205
[[5]]
[1] -43.31089
The equivalent for loop is as follows:
## Convert and print each temperature in turn
for (i in temp) {
celsius <- fahrenheit_to_celsius(i)
print(celsius)
}
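The loop above only prints each value. To keep the results, as map() does, the usual pattern is to preallocate an output vector and fill it by index. A minimal sketch that also checks the loop agrees with map_dbl() (covered in the next section); a small hypothetical input vector stands in for temp so the block runs on its own:

```r
library(purrr)

fahrenheit_to_celsius <- function(fahrenheit) {
  (fahrenheit - 32) * 5 / 9
}

# Hypothetical input; any numeric vector works here
temps <- c(100, 32, 0)

# Preallocate a numeric vector of the right length, then fill it in the loop
celsius <- numeric(length(temps))
for (i in seq_along(temps)) {
  celsius[i] <- fahrenheit_to_celsius(temps[i])
}

# map_dbl() produces the same vector in one line
celsius_map <- map_dbl(temps, fahrenheit_to_celsius)
all.equal(celsius, celsius_map)  # TRUE
```

Preallocating with numeric() avoids growing the vector on every iteration, which is the main performance pitfall of naive for loops in R.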
While the map function is more compact than the for loop, it has the disadvantage of returning a list of values rather than a vector.
map_dbl Function
Unlike the map function, which returns a list, map_dbl returns a vector of doubles (numbers).
map_dbl(temp, fahrenheit_to_celsius)
[1] -45.5555556 -44.9943883 -44.4332211 -43.8720539 -43.3108866
[6] -42.7497194 -42.1885522 -41.6273850 -41.0662177 -40.5050505
[11] -39.9438833 -39.3827160 -38.8215488 -38.2603816 -37.6992144
[16] -37.1380471 -36.5768799 -36.0157127 -35.4545455 -34.8933782
[21] -34.3322110 -33.7710438 -33.2098765 -32.6487093 -32.0875421
[26] -31.5263749 -30.9652076 -30.4040404 -29.8428732 -29.2817059
[31] -28.7205387 -28.1593715 -27.5982043 -27.0370370 -26.4758698
[36] -25.9147026 -25.3535354 -24.7923681 -24.2312009 -23.6700337
[41] -23.1088664 -22.5476992 -21.9865320 -21.4253648 -20.8641975
[46] -20.3030303 -19.7418631 -19.1806958 -18.6195286 -18.0583614
[51] -17.4971942 -16.9360269 -16.3748597 -15.8136925 -15.2525253
[56] -14.6913580 -14.1301908 -13.5690236 -13.0078563 -12.4466891
[61] -11.8855219 -11.3243547 -10.7631874 -10.2020202 -9.6408530
[66] -9.0796857 -8.5185185 -7.9573513 -7.3961841 -6.8350168
[71] -6.2738496 -5.7126824 -5.1515152 -4.5903479 -4.0291807
[76] -3.4680135 -2.9068462 -2.3456790 -1.7845118 -1.2233446
[81] -0.6621773 -0.1010101 0.4601571 1.0213244 1.5824916
[86] 2.1436588 2.7048260 3.2659933 3.8271605 4.3883277
[91] 4.9494949 5.5106622 6.0718294 6.6329966 7.1941639
[96] 7.7553311 8.3164983 8.8776655 9.4388328 10.0000000
The map_dbl function also lets us create new columns when used together with the mutate function from dplyr, part of the tidyverse.
library(tidyverse)
temp %>%
tibble() %>%
set_names("fahrenheit") %>%
mutate(celsius = map_dbl(fahrenheit, fahrenheit_to_celsius)) %>%
head()
# A tibble: 6 × 2
fahrenheit celsius
<dbl> <dbl>
1 -50 -45.6
2 -49.0 -45.0
3 -48.0 -44.4
4 -47.0 -43.9
5 -46.0 -43.3
6 -44.9 -42.7
If we use map instead, we get a messier data frame: the new variable is a list-column, with each value wrapped in its own list element. We would have to use unlist to turn it into an ordinary numeric column. See below.
temp %>%
tibble() %>%
set_names("fahrenheit") %>%
mutate(celsius = map(fahrenheit, fahrenheit_to_celsius)) %>%
head()
# A tibble: 6 × 2
fahrenheit celsius
<dbl> <list>
1 -50 <dbl [1]>
2 -49.0 <dbl [1]>
3 -48.0 <dbl [1]>
4 -47.0 <dbl [1]>
5 -46.0 <dbl [1]>
6 -44.9 <dbl [1]>
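Following the unlist suggestion above, wrapping the map() call in unlist() flattens the list-column back into an ordinary numeric column. A sketch, assuming dplyr and tibble are installed and using a short hypothetical temperature vector:

```r
library(purrr)
library(dplyr)
library(tibble)

fahrenheit_to_celsius <- function(fahrenheit) {
  (fahrenheit - 32) * 5 / 9
}

# Hypothetical temperatures in degrees Fahrenheit
converted <- tibble(fahrenheit = c(-50, 32, 50)) %>%
  # map() returns a list; unlist() turns it into a plain double vector
  mutate(celsius = unlist(map(fahrenheit, fahrenheit_to_celsius)))

converted
```

In practice map_dbl() is the more direct choice, since it returns a double vector in the first place.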
map_chr, map_lgl and map_int Functions
Like map and map_dbl, the map_chr, map_lgl and map_int functions take the same arguments, but they return a character, logical or integer vector, respectively, or die trying: if a result cannot be converted to the required type, they stop with an error.
temp %>%
tibble() %>%
set_names("fahrenheit") %>%
mutate(celsius = map_chr(fahrenheit, fahrenheit_to_celsius)) %>%
head()
# A tibble: 6 × 2
fahrenheit celsius
<dbl> <chr>
1 -50 -45.555556
2 -49.0 -44.994388
3 -48.0 -44.433221
4 -47.0 -43.872054
5 -46.0 -43.310887
6 -44.9 -42.749719
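The other two variants, and the "die trying" behaviour, can be seen in a short sketch with hypothetical inputs; tryCatch() is used only so the block keeps running after the expected error. Recent purrr releases (1.0.0 onwards) also deprecate the automatic double-to-character coercion used in the map_chr() example above and emit a warning, so building the string explicitly inside the function, as below, is the more future-proof style.

```r
library(purrr)

temps <- c(-50, 32, 100)  # hypothetical temperatures in degrees Fahrenheit

# map_lgl(): TRUE when the temperature is at or below freezing (32 F)
below_freezing <- map_lgl(temps, ~ .x <= 32)
below_freezing  # TRUE TRUE FALSE

# map_int(): number of characters in each city name (hypothetical names)
name_lengths <- map_int(c("Nairobi", "Baltimore"), nchar)
name_lengths  # 7 9

# map_chr() with an explicit conversion, avoiding implicit coercion
labels <- map_chr(temps, ~ paste0(.x, " F"))
labels  # "-50 F" "32 F" "100 F"

# "Or die trying": 1.5 cannot be stored as an integer without loss,
# so map_int() throws an error instead of silently truncating
result <- tryCatch(
  map_int(c(1.5, 2), ~ .x),
  error = function(e) "map_int() failed"
)
result
```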
map_dfr and map_dfc Functions
The map_dfr() and map_dfc() functions return a data frame created by row-binding and column-binding the results, respectively. They require dplyr to be installed. For instance, you may have 100 Excel sheets containing data that you wish to read into R and combine into a single data frame. I have illustrated how to use map_dfr() and map_dfc() for this in separate tutorials available at https://rpubs.com/Karuitha/.
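The combine-many-files workflow can be sketched without Excel by writing a few small CSV files to a temporary folder and reading them back. Here read.csv() stands in for an Excel reader such as readxl::read_excel(), the folder, file and column names are hypothetical, and dplyr must be installed for map_dfr(). In purrr 1.0.0 and later, map_dfr() is superseded, and list_rbind(map(files, read.csv)) is the recommended equivalent.

```r
library(purrr)

# Write three small CSV files to a temporary folder (hypothetical data)
demo_dir <- file.path(tempdir(), "purrr_map_dfr_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  sheet <- data.frame(city = paste0("city_", i), fahrenheit = i * 10)
  write.csv(sheet, file.path(demo_dir, paste0("sheet_", i, ".csv")),
            row.names = FALSE)
}

# List the files, read each one, and row-bind the results
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- map_dfr(files, read.csv)

combined  # one row per file
```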
walk Functions
The walk functions work similarly to the map functions, but you use them when you’re interested in applying a function that performs an action instead of producing data (e.g., print()) (Altman, Behrman, and Wickham 2021).
The walk functions are useful for performing actions like writing files and printing a list of plots.
The function walk() returns its input .x invisibly, which makes it easy to use in a pipe.
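A short sketch of both points: walk() is called for its side effect (here, cat() printing one line per temperature), and because it returns its input invisibly, the original vector continues down the pipe. The vector is hypothetical, and the native pipe |> needs R 4.1 or later.

```r
library(purrr)

fahrenheit_to_celsius <- function(fahrenheit) {
  (fahrenheit - 32) * 5 / 9
}

temps <- c(100, 32, 0)  # hypothetical temperatures in degrees Fahrenheit

# walk() prints one line per value, then passes temps through unchanged
total <- temps |>
  walk(~ cat(.x, "F =", round(fahrenheit_to_celsius(.x), 1), "C\n")) |>
  sum()

total  # 132: the sum of the original input, not of anything walk() computed
```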
map2 Functions: Map over Multiple Inputs Simultaneously
As per the R documentation:
These functions are variants of map() that iterate over multiple arguments simultaneously. They are parallel in the sense that each input is processed in parallel with the others, not in the sense of multicore computing. They share the same notion of “parallel” as base::pmax() and base::pmin(). map2() and walk2() are specialised for the two argument case; pmap() and pwalk() allow you to provide any number of arguments in a list. Note that a data frame is a very important special case, in which case pmap() and pwalk() apply the function .f to each row. map2_dfr(), pmap_dfr() and map2_dfc(), pmap_dfc() return data frames created by row-binding and column-binding respectively. They require dplyr to be installed (R Core Team 2022).
Suppose we have two numeric vectors of the same length, x and y. We can use map2 to apply a function to each pair of corresponding elements of x and y.
map2(x, y, ~ .x + .y)
[[1]]
[1] 11
[[2]]
[1] 21
[[3]]
[1] 31
Alternatively, we can achieve the same result by passing the + operator itself as the function:
map2(x, y, `+`)
[[1]]
[1] 11
[[2]]
[1] 21
[[3]]
[1] 31
Like the map function, map2 has several variants, including but not limited to:
map2_dbl()
map2_chr()
map2_lgl()
map2_int()
map2_dfr() and map2_dfc()
walk2()
Please consult the R documentation (?map2) for more details.
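Two of the typed variants, map2_dbl() and map2_chr(), can be sketched with hypothetical vectors (these are illustrative values, not the x and y used above). Each returns an atomic vector instead of a list:

```r
library(purrr)

# Hypothetical inputs: daily minimum and maximum temperatures (degrees F)
lows  <- c(30, 45, 60)
highs <- c(50, 65, 80)

# map2_dbl(): element-wise midpoint, returned as a double vector
midpoints <- map2_dbl(lows, highs, ~ (.x + .y) / 2)
midpoints  # 40 55 70

# map2_chr(): element-wise labels built from both inputs
ranges <- map2_chr(lows, highs, ~ paste0(.x, "-", .y, " F"))
ranges  # "30-50 F" "45-65 F" "60-80 F"
```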
pmap Function
Like map2, pmap operates over multiple inputs, but it takes them as a list, so it can handle more than two.
To sum the corresponding elements of three vectors, x, y and z, we use the following piece of code.
pmap(list(x, y, z), sum)
[[1]]
[1] 29
[[2]]
[1] 38
[[3]]
[1] 46
There are several variants of pmap, including but not limited to:
pmap_dbl()
pmap_chr()
pmap_lgl()
pmap_int()
pmap_dfr() and pmap_dfc()
pwalk()
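The R documentation quoted earlier singles out data frames: because a data frame is a list of equal-length columns, pmap() calls the function once per row, matching column names to argument names. A sketch with a hypothetical data frame and a hypothetical helper, daily_mean(), using pmap_dbl() so the result is a plain vector:

```r
library(purrr)

# Hypothetical data: one row per city, columns named like the function's arguments
readings <- data.frame(
  morning   = c(50, 60),
  afternoon = c(70, 80),
  evening   = c(55, 65)
)

# Hypothetical helper; each row becomes one call with named arguments
daily_mean <- function(morning, afternoon, evening) {
  (morning + afternoon + evening) / 3
}

means <- pmap_dbl(readings, daily_mean)
means  # 58.33333 68.33333
```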
In this article, I have highlighted the use of map functions from the purrr package in R. Additional information is available in the R help pages and the resources cited in the references section.