Functional Programming in R Using Purrr

Background

The purrr package (Henry and Wickham 2020) is part of the tidyverse (Wickham et al. 2019) and contains functions that allow us to:

Apply named or anonymous functions to vectors or lists.
Save the results in a variety of convenient formats.

For instance, you could have data spread across a series of Microsoft Excel workbooks and want to combine them into one data set. Similarly, you could use purrr to generate a series of reports, each customised to a different audience.

First Example: Converting Fahrenheit to Celsius

The formula for converting a temperature from Fahrenheit to Celsius is as follows.

\(\text{Celsius} = (\text{Fahrenheit} - 32) \times \frac{5}{9}\)

Now, we can use this formula to define a function that allows us to convert temperatures easily.

fahrenheit_to_celsius <- function(fahrenheit){
  
  (fahrenheit - 32) * 5 / 9
}

The function fahrenheit_to_celsius takes one argument, the temperature in degrees Fahrenheit. Below, we use it to convert 100 degrees Fahrenheit and 0 degrees Fahrenheit to Celsius.

## 100 degrees fahrenheit to celsius
fahrenheit_to_celsius(100)

[1] 37.77778

## 0 degrees fahrenheit to celsius
fahrenheit_to_celsius(0)

[1] -17.77778

But suppose we had a long vector of temperatures that we wanted to convert into Celsius. Consider this vector of average temperatures, in degrees Fahrenheit, recorded in 100 different cities.

temp <- seq(-50, 50, length.out = 100)
print(temp)

  [1] -50.0000000 -48.9898990 -47.9797980 -46.9696970 -45.9595960
  [6] -44.9494949 -43.9393939 -42.9292929 -41.9191919 -40.9090909
 [11] -39.8989899 -38.8888889 -37.8787879 -36.8686869 -35.8585859
 [16] -34.8484848 -33.8383838 -32.8282828 -31.8181818 -30.8080808
 [21] -29.7979798 -28.7878788 -27.7777778 -26.7676768 -25.7575758
 [26] -24.7474747 -23.7373737 -22.7272727 -21.7171717 -20.7070707
 [31] -19.6969697 -18.6868687 -17.6767677 -16.6666667 -15.6565657
 [36] -14.6464646 -13.6363636 -12.6262626 -11.6161616 -10.6060606
 [41]  -9.5959596  -8.5858586  -7.5757576  -6.5656566  -5.5555556
 [46]  -4.5454545  -3.5353535  -2.5252525  -1.5151515  -0.5050505
 [51]   0.5050505   1.5151515   2.5252525   3.5353535   4.5454545
 [56]   5.5555556   6.5656566   7.5757576   8.5858586   9.5959596
 [61]  10.6060606  11.6161616  12.6262626  13.6363636  14.6464646
 [66]  15.6565657  16.6666667  17.6767677  18.6868687  19.6969697
 [71]  20.7070707  21.7171717  22.7272727  23.7373737  24.7474747
 [76]  25.7575758  26.7676768  27.7777778  28.7878788  29.7979798
 [81]  30.8080808  31.8181818  32.8282828  33.8383838  34.8484848
 [86]  35.8585859  36.8686869  37.8787879  38.8888889  39.8989899
 [91]  40.9090909  41.9191919  42.9292929  43.9393939  44.9494949
 [96]  45.9595960  46.9696970  47.9797980  48.9898990  50.0000000

Converting each of these values into Celsius one at a time is not convenient. This is where the functions in purrr come in handy. In this article, I focus on the map functions.

Mechanics of `map` Functions

The map functions from purrr typically take a vector, a list, a column of a data frame, or even a whole data frame. map then applies the specified function to each element of the input; for a whole data frame, the elements are its columns.

## Arguments taken by the purrr map functions
## .x is the vector or list, .f is the function to apply,
## and ... refers to additional arguments passed on to .f
map(.x, .f, ...)
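As a minimal sketch of these mechanics, the snippet below applies a function to each element of a vector and to each column of a data frame, and forwards one additional argument. The data frame `df` and its values are made up for illustration.

```r
library(purrr)

# Each element of the vector is passed to the function in turn
squares <- map(1:3, function(x) x^2)

# A data frame is a list of columns, so map() visits one column at a time
df <- data.frame(a = c(1, 2, 3), b = c(10, NA, 30))  # hypothetical data

# Arguments placed after the function are forwarded to every call;
# here na.rm = TRUE is handed to mean() for each column
col_means <- map(df, mean, na.rm = TRUE)
```

Both results are lists: `squares` has one entry per input value, and `col_means` has one named entry per column.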

The `map` Function

The map function in purrr will loop over the 100 elements and convert each into Celsius. The output is a list of the 100 conversions. Below are the first five conversions.

library(tidyverse)  # attaches purrr, dplyr, and tibble, used throughout
map(temp, fahrenheit_to_celsius) |> head(5)

[[1]]
[1] -45.55556

[[2]]
[1] -44.99439

[[3]]
[1] -44.43322

[[4]]
[1] -43.87205

[[5]]
[1] -43.31089

The equivalent for loop is as follows:

for (temperature in temp) {
  celsius <- fahrenheit_to_celsius(temperature)
  print(celsius)
}

While the map function is more compact than the for loop, it has the disadvantage of returning a list of values.
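To make the comparison concrete, here is a rough sketch of a for loop that collects its results in a list, which is closer to what map returns than a loop that only prints. The name `celsius_list` is introduced here purely for illustration.

```r
fahrenheit_to_celsius <- function(fahrenheit) (fahrenheit - 32) * 5 / 9
temp <- seq(-50, 50, length.out = 100)

# Pre-allocate a list with one slot per temperature
celsius_list <- vector("list", length(temp))

# Fill each slot with the converted value
for (i in seq_along(temp)) {
  celsius_list[[i]] <- fahrenheit_to_celsius(temp[[i]])
}
```

The loop needs the pre-allocation and the index bookkeeping that `map(temp, fahrenheit_to_celsius)` handles in a single call.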

The `map_dbl` Function

Unlike map, which returns a list, map_dbl returns a numeric (double) vector.

map_dbl(temp, fahrenheit_to_celsius)

  [1] -45.5555556 -44.9943883 -44.4332211 -43.8720539 -43.3108866
  [6] -42.7497194 -42.1885522 -41.6273850 -41.0662177 -40.5050505
 [11] -39.9438833 -39.3827160 -38.8215488 -38.2603816 -37.6992144
 [16] -37.1380471 -36.5768799 -36.0157127 -35.4545455 -34.8933782
 [21] -34.3322110 -33.7710438 -33.2098765 -32.6487093 -32.0875421
 [26] -31.5263749 -30.9652076 -30.4040404 -29.8428732 -29.2817059
 [31] -28.7205387 -28.1593715 -27.5982043 -27.0370370 -26.4758698
 [36] -25.9147026 -25.3535354 -24.7923681 -24.2312009 -23.6700337
 [41] -23.1088664 -22.5476992 -21.9865320 -21.4253648 -20.8641975
 [46] -20.3030303 -19.7418631 -19.1806958 -18.6195286 -18.0583614
 [51] -17.4971942 -16.9360269 -16.3748597 -15.8136925 -15.2525253
 [56] -14.6913580 -14.1301908 -13.5690236 -13.0078563 -12.4466891
 [61] -11.8855219 -11.3243547 -10.7631874 -10.2020202  -9.6408530
 [66]  -9.0796857  -8.5185185  -7.9573513  -7.3961841  -6.8350168
 [71]  -6.2738496  -5.7126824  -5.1515152  -4.5903479  -4.0291807
 [76]  -3.4680135  -2.9068462  -2.3456790  -1.7845118  -1.2233446
 [81]  -0.6621773  -0.1010101   0.4601571   1.0213244   1.5824916
 [86]   2.1436588   2.7048260   3.2659933   3.8271605   4.3883277
 [91]   4.9494949   5.5106622   6.0718294   6.6329966   7.1941639
 [96]   7.7553311   8.3164983   8.8776655   9.4388328  10.0000000

The function map_dbl allows us to create new columns when used together with the mutate function from the tidyverse.

temp %>% 
  
  tibble() %>% 
  
  set_names("fahrenheit") %>% 
  
  mutate(celsius = map_dbl(fahrenheit, fahrenheit_to_celsius)) %>% 
  
  head()

# A tibble: 6 × 2
  fahrenheit celsius
       <dbl>   <dbl>
1      -50     -45.6
2      -49.0   -45.0
3      -48.0   -44.4
4      -47.0   -43.9
5      -46.0   -43.3
6      -44.9   -42.7

If we use map instead, we get a messy data frame in which the new variable appears as a list column. We would have to use unlist to turn the new variable into an ordinary numeric column. See below.

temp %>% 
  
  tibble() %>% 
  
  set_names("fahrenheit") %>% 
  
  mutate(celsius = map(fahrenheit, fahrenheit_to_celsius)) %>% 
  
  head()

# A tibble: 6 × 2
  fahrenheit celsius  
       <dbl> <list>   
1      -50   <dbl [1]>
2      -49.0 <dbl [1]>
3      -48.0 <dbl [1]>
4      -47.0 <dbl [1]>
5      -46.0 <dbl [1]>
6      -44.9 <dbl [1]>
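One possible way to apply the unlist step mentioned above is sketched here. The object names `messy` and `clean` are illustrative, and the approach assumes every list element holds exactly one number.

```r
library(dplyr)
library(purrr)
library(tibble)

fahrenheit_to_celsius <- function(fahrenheit) (fahrenheit - 32) * 5 / 9
temp <- seq(-50, 50, length.out = 100)

# map() leaves celsius as a list column
messy <- tibble(fahrenheit = temp) %>%
  mutate(celsius = map(fahrenheit, fahrenheit_to_celsius))

# unlist() flattens the list column into a plain double column
clean <- messy %>%
  mutate(celsius = unlist(celsius))
```

Using map_dbl from the start avoids this extra step, which is why it is usually the more convenient choice inside mutate.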

The `map_chr`, `map_lgl` and `map_int` Functions

Like map and map_dbl, the map_chr, map_lgl and map_int functions take the same arguments, but they return a character, logical, or integer vector, respectively, or throw an error if the results cannot be returned as that type.

temp %>% 
  
  tibble() %>% 
  
  set_names("fahrenheit") %>% 
  
  mutate(celsius = map_chr(fahrenheit, fahrenheit_to_celsius)) %>% 
  
  head()

# A tibble: 6 × 2
  fahrenheit celsius   
       <dbl> <chr>     
1      -50   -45.555556
2      -49.0 -44.994388
3      -48.0 -44.433221
4      -47.0 -43.872054
5      -46.0 -43.310887
6      -44.9 -42.749719
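The following sketch shows all three variants on a small made-up vector. As far as I am aware, purrr 1.0.0 and later deprecate the automatic conversion of numbers to character in map_chr, so the example builds the strings explicitly; the helper names are illustrative.

```r
library(purrr)

fahrenheit_to_celsius <- function(fahrenheit) (fahrenheit - 32) * 5 / 9
temp_small <- c(-50, 0, 32, 50)  # hypothetical short vector

# map_lgl(): the function must return a single TRUE or FALSE
below_freezing <- map_lgl(temp_small, function(f) fahrenheit_to_celsius(f) < 0)

# map_int(): the function must return a single integer; nchar() does
name_lengths <- map_int(c("purrr", "map", "walk"), nchar)

# map_chr(): the function returns a single string, built explicitly here
labels <- map_chr(temp_small, function(f) {
  paste0(round(fahrenheit_to_celsius(f), 1), " C")
})
```

Each call fails with an error, rather than silently changing type, if the function returns something that does not fit the requested vector type.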

The `map_dfr` and `map_dfc` Functions

The map_dfr() and map_dfc() functions return a data frame created by row-binding and column-binding, respectively. They require dplyr to be installed. For instance, you may have 100 Excel sheets with some data and wish to read them into R and combine them into a single data frame. I have illustrated how we can use map_dfr() and map_dfc() to do this in separate tutorials available at https://rpubs.com/Karuitha/.
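As a small self-contained sketch that does not rely on any files, the example below uses in-memory data frames as stand-ins for the Excel sheets. The `sheets` list and its contents are invented for illustration. Note that, to my knowledge, purrr 1.0.0 and later mark these two functions as superseded in favour of map() followed by list_rbind() or list_cbind(), although they still work.

```r
library(purrr)
library(dplyr)

# Three made-up "sheets" with the same columns
sheets <- list(
  data.frame(city = "A", fahrenheit = 50),
  data.frame(city = "B", fahrenheit = 32),
  data.frame(city = "C", fahrenheit = 14)
)

# map_dfr(): process each sheet, then stack the results row by row
combined <- map_dfr(sheets, function(sheet) {
  mutate(sheet, celsius = (fahrenheit - 32) * 5 / 9)
})

# map_dfc(): process each element, then bind the results as columns
doubled <- map_dfc(list(a = 1:3, b = 4:6), function(x) x * 2)
```

Here `combined` has one row per sheet, and `doubled` has one column per list element.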

The `walk` Function

The walk functions work similarly to the map functions, but you use them when you’re interested in applying a function that performs an action instead of producing data (e.g., print()) (Altman, Behrman, and Wickham 2021).

The walk functions are useful for performing actions like writing files and printing a list of plots.

The function walk() returns the input .x (invisibly). This makes it easy to use in a pipe.
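A brief sketch of this behaviour follows. The vector `steps` and the printed message are made up for illustration.

```r
library(purrr)

steps <- c("first", "second", "third")

# walk() calls the function once per element for its side effect
# (printing a line here) and hands the original input on unchanged,
# so the pipe can continue with toupper()
result <- steps %>%
  walk(function(step) cat("Processing", step, "\n")) %>%
  toupper()
```

Because walk passes `steps` through untouched, `result` contains the upper-case version of the original vector rather than anything returned by cat.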

The `map2` Functions: Map over Multiple Inputs Simultaneously

As per the R documentation:

These functions are variants of map() that iterate over multiple arguments simultaneously. They are parallel in the sense that each input is processed in parallel with the others, not in the sense of multicore computing. They share the same notion of “parallel” as base::pmax() and base::pmin(). map2() and walk2() are specialised for the two argument case; pmap() and pwalk() allow you to provide any number of arguments in a list. Note that a data frame is a very important special case, in which case pmap() and pwalk() apply the function .f to each row. map_dfr(), pmap_dfr() and map2_dfc(), pmap_dfc() return data frames created by row-binding and column-binding respectively. They require dplyr to be installed (R Core Team 2022).

Suppose we have the following data.

x <- list(1, 1, 1)
y <- list(10, 20, 30)

We can use map2 to apply a function to both x and y.

map2(x, y, ~ .x +  .y)

[[1]]
[1] 11

[[2]]
[1] 21

[[3]]
[1] 31

Alternatively, we can achieve the same result by using:

map2(x, y, `+`)

[[1]]
[1] 11

[[2]]
[1] 21

[[3]]
[1] 31

Like the map functions, the map2 function has several variants, including but not limited to:

map2_lgl
map2_int
map2_dbl
map2_chr

Please consult R documentation (??map2) for more details.
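As a short illustration of these variants, the sketch below reuses the x and y lists from above. The result names are illustrative only.

```r
library(purrr)

x <- list(1, 1, 1)
y <- list(10, 20, 30)

# map2_dbl(): same pairwise sum as before, returned as a numeric vector
sums <- map2_dbl(x, y, function(a, b) a + b)

# map2_lgl(): one TRUE or FALSE per pair
x_is_smaller <- map2_lgl(x, y, function(a, b) a < b)

# map2_chr(): one string per pair
pairs <- map2_chr(x, y, function(a, b) paste(a, "and", b))
```

The typed variants return plain vectors, so their results can go straight into a data frame column without unlisting.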

The `pmap` Function

Like map2, pmap iterates over multiple inputs in parallel, but it accepts any number of them, supplied as a list. Here we use three.

x <- list(1, 1, 1)
y <- list(10, 20, 30)
z <- list(18, 17, 15)

To sum the three lists element by element, we use the following piece of code.

pmap(list(x, y, z), sum)

[[1]]
[1] 29

[[2]]
[1] 38

[[3]]
[1] 46

There are several variants of pmap, including but not limited to:

pmap_lgl
pmap_int
pmap_dbl
pmap_chr
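The sketch below shows one of these variants, together with the data frame case in which pmap applies the function to each row. The data frame `readings` is a made-up rearrangement of the same numbers.

```r
library(purrr)

x <- list(1, 1, 1)
y <- list(10, 20, 30)
z <- list(18, 17, 15)

# pmap_dbl(): element-wise sums returned as a numeric vector
totals <- pmap_dbl(list(x, y, z), sum)

# With a data frame, each row supplies one set of arguments,
# matched to the function's parameters by column name
readings <- data.frame(x = c(1, 1, 1), y = c(10, 20, 30), z = c(18, 17, 15))
row_totals <- pmap_dbl(readings, function(x, y, z) x + y + z)
```

Both calls produce the same three sums as the pmap example above, only as a vector instead of a list.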

Conclusion

In this article, I have highlighted the use of map functions from the purrr package in R. Additional information is available in the R help pages and the resources cited in the references section.

References

Altman, Sara, Bill Behrman, and Hadley Wickham. 2021. Functional Programming. Stanford University Press.

Henry, Lionel, and Hadley Wickham. 2020. Purrr: Functional Programming Tools. https://CRAN.R-project.org/package=purrr.

R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
