##Introduction

This notebook walks through map, map2, and pmap from the purrr package, which is part of the tidyverse. We start with tibble, which comes from the tibble package (also loaded with the tidyverse), and use it to create a one-column table holding the numbers 1 to 26 on which to test the map functions.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
num_list <- tibble(x = 1:26)
head(num_list)
## # A tibble: 6 x 1
##       x
##   <int>
## 1     1
## 2     2
## 3     3
## 4     4
## 5     5
## 6     6

###Map(x, f)

Map takes two main arguments: ‘x’, the list or vector you want to map over, and ‘f’, the function you want to apply to each element. Map calls the function on every element of the input and returns the results as a list, so the result always has the same length as the input. The function can be as complicated or as simple as you like. Note that a tibble is a list of columns, so in the examples below the function is called once, on the whole column x.

# doubles every number in column x and returns the result as a list
num_list %>% map(function(x) x*2)
## $x
##  [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
## [26] 52
# halves every number in column x and returns the result as a list
num_list %>% map(function(x) x*.5)
## $x
##  [1]  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5
## [16]  8.0  8.5  9.0  9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0
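
In the calls above, num_list is a tibble, so map iterates over its columns and the function receives the whole column x at once. The following is a minimal sketch of how this differs from mapping over a plain vector; the vector `nums` and the result names are hypothetical and not part of the original notebook.

```r
library(purrr)

# Hypothetical example vector (not from the original notebook)
nums <- 1:5

# Over a plain vector, map() calls the function once per element
# and always returns a list
doubled_list <- map(nums, function(n) n * 2)

# map_dbl() does the same work but returns a double vector
doubled_vec <- map_dbl(nums, function(n) n * 2)

# A data frame is a list of columns, so the function is called once
# per column; here it reports the length of the single column x
col_lengths <- map_int(data.frame(x = 1:26), length)
```

The typed variants (map_dbl, map_int, map_chr) are often more convenient than map when every call returns a single value of a known type.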

###Map2(x, y, f)

Map2 takes the same arguments as map, with one addition: ‘y’, a second vector, which must be the same length as ‘x’ or have length one (in which case it is recycled). Map2 steps through the two inputs together, calls the function on each pair of elements, and outputs a single list of the same length.

x <- 6:10
y <- 51:55
# returns the element-wise sum of x and y as a single list of the same length
map2(x, y,  `+`)
## [[1]]
## [1] 57
## 
## [[2]]
## [1] 59
## 
## [[3]]
## [1] 61
## 
## [[4]]
## [1] 63
## 
## [[5]]
## [1] 65
# returns the element-wise product of x and y as a list of the same length
map2(x, y, `*`)
## [[1]]
## [1] 306
## 
## [[2]]
## [1] 364
## 
## [[3]]
## [1] 424
## 
## [[4]]
## [1] 486
## 
## [[5]]
## [1] 550
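
As a small sketch of two related behaviours, the block below reuses x and y from above with the typed variants map2_int and map2_dbl, and shows a length-one second argument being recycled. The result names `sums` and `plus_100` are hypothetical.

```r
library(purrr)

x <- 6:10
y <- 51:55

# map2_int() returns an integer vector instead of a list
sums <- map2_int(x, y, `+`)

# A length-one second argument is recycled against every element of x
plus_100 <- map2_dbl(x, 100, `+`)
```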

###Pmap(l,f)

Pmap takes two arguments: ‘f’, which we have explained before, and ‘l’, which is a list of vectors, such as a data frame. The length of l determines the number of arguments that f will be called with, and list names are matched to argument names if present. Pmap can therefore combine any number of variables element by element and output a single list (or, with variants such as pmap_dfr, a data frame).

three_num_list <- tibble(x = 0:5, y = 10:15, z = 20:25)
three_num_list
## # A tibble: 6 x 3
##       x     y     z
##   <int> <int> <int>
## 1     0    10    20
## 2     1    11    21
## 3     2    12    22
## 4     3    13    23
## 5     4    14    24
## 6     5    15    25
# sums the three columns row by row
pmap(list(three_num_list$x,three_num_list$y, three_num_list$z ), sum)
## [[1]]
## [1] 30
## 
## [[2]]
## [1] 33
## 
## [[3]]
## [1] 36
## 
## [[4]]
## [1] 39
## 
## [[5]]
## [1] 42
## 
## [[6]]
## [1] 45
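
Because a data frame is itself a list of vectors, it can also be passed to pmap directly, in which case the column names are matched to the function's argument names. This is a minimal sketch under that assumption; it rebuilds the example table with base data.frame so it runs on its own, and the names `row_sums` and `weighted` are hypothetical.

```r
library(purrr)

# Same values as three_num_list above, rebuilt so the block is self-contained
three_num_list <- data.frame(x = 0:5, y = 10:15, z = 20:25)

# Each row becomes one call; columns x, y, z are matched by name
row_sums <- pmap_int(three_num_list, function(x, y, z) x + y + z)

# Named arguments make it easy to treat the columns differently
weighted <- pmap_dbl(three_num_list, function(x, y, z) x + 2 * y - z)
```

Matching by name means the order of the columns does not matter, which is less error-prone than relying on position when there are many arguments.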

###Using Real Data

In this code block we will load fivethirtyeight’s dataset on alcohol consumption by country.

library(RCurl)
## 
## Attaching package: 'RCurl'
## The following object is masked from 'package:tidyr':
## 
##     complete
data <- getURL('https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv')
alchohol_df <- read.csv(text = data)
head(alchohol_df)
##             country beer_servings spirit_servings wine_servings
## 1       Afghanistan             0               0             0
## 2           Albania            89             132            54
## 3           Algeria            25               0            14
## 4           Andorra           245             138           312
## 5            Angola           217              57            45
## 6 Antigua & Barbuda           102             128            45
##   total_litres_of_pure_alcohol
## 1                          0.0
## 2                          4.9
## 3                          0.7
## 4                         12.4
## 5                          5.9
## 6                          4.9
# map_chr applies the function to each element and returns a character vector
alchohol_df$country <- (alchohol_df$country) %>% map_chr(paste0, ': Surveyed')
head(alchohol_df)
##                       country beer_servings spirit_servings wine_servings
## 1       Afghanistan: Surveyed             0               0             0
## 2           Albania: Surveyed            89             132            54
## 3           Algeria: Surveyed            25               0            14
## 4           Andorra: Surveyed           245             138           312
## 5            Angola: Surveyed           217              57            45
## 6 Antigua & Barbuda: Surveyed           102             128            45
##   total_litres_of_pure_alcohol
## 1                          0.0
## 2                          4.9
## 3                          0.7
## 4                         12.4
## 5                          5.9
## 6                          4.9
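# multiplies two values and wraps the result in a one-row data frame
product_function <- function(arg1, arg2){
  col <- arg1 * arg2
  as.data.frame(col)
}
# map2_df row-binds the one-row results, so the new column holds a data frame with one column, col
alchohol_df$beer_wine_prod <- (map2_df(alchohol_df$beer_servings, alchohol_df$wine_servings, product_function))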
head(alchohol_df$beer_wine_prod)
##     col
## 1     0
## 2  4806
## 3   350
## 4 76440
## 5  9765
## 6  4590
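
The call above stores a data frame inside alchohol_df$beer_wine_prod, which is why the output carries the col header. If a plain numeric column is wanted instead, one option is map2_dbl. The sketch below uses a hypothetical stand-in table, `drinks`, built from three of the rows shown earlier so that it runs without downloading the data.

```r
library(purrr)

# Hypothetical stand-in for the first three rows of alchohol_df
drinks <- data.frame(
  beer_servings = c(0, 89, 25),
  wine_servings = c(0, 54, 14)
)

# map2_dbl() returns a numeric vector, which fits a data frame column directly
drinks$beer_wine_prod <- map2_dbl(drinks$beer_servings, drinks$wine_servings, `*`)

# Arithmetic in R is already vectorised, so this gives the same values without purrr
vectorised <- drinks$beer_servings * drinks$wine_servings
```

For simple arithmetic the vectorised form is usually preferable; map2 becomes useful when the function cannot operate on whole vectors at once.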
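# mean() treats only its first argument as data, so the three values are combined with c() first
head(pmap(list(alchohol_df$beer_servings, alchohol_df$spirit_servings, alchohol_df$wine_servings), function(...) mean(c(...))))
## [[1]]
## [1] 0
## 
## [[2]]
## [1] 91.66667
## 
## [[3]]
## [1] 13
## 
## [[4]]
## [1] 231.6667
## 
## [[5]]
## [1] 106.3333
## 
## [[6]]
## [1] 91.66667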

###Information From R Help Docs

Map Description: The map functions transform their input by applying a function to each element and returning a vector the same length as the input.

Map2 & Pmap Description: These functions are variants of map() that iterate over multiple arguments simultaneously. They are parallel in the sense that each input is processed in parallel with the others, not in the sense of multicore computing. They share the same notion of “parallel” as base::pmax() and base::pmin(). map2() and walk2() are specialised for the two argument case; pmap() and pwalk() allow you to provide any number of arguments in a list. Note that a data frame is a very important special case, in which case pmap() and pwalk() apply the function .f to each row. map_dfr(), pmap_dfr() and map2_dfc(), pmap_dfc() return data frames created by row-binding and column-binding respectively. They require dplyr to be installed.
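
To illustrate the notion of “parallel” described above, here is a short sketch comparing base::pmax() with map2_dbl. The vectors `a` and `b` and the result names are hypothetical.

```r
library(purrr)

# Hypothetical example vectors
a <- c(1, 5, 3)
b <- c(4, 2, 6)

# pmax() takes the element-wise ("parallel") maximum of its inputs
pmax_result <- pmax(a, b)

# map2_dbl() steps through a and b together in the same sense,
# calling max() on one pair of elements at a time
map2_result <- map2_dbl(a, b, max)
```

Both calls pair the first elements, then the second, and so on; neither involves multicore execution.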