Foundations of Functional programming with purrr

1. Simplifying iteration and lists with purrr
2. More complex iterations
3. Troubleshooting lists
- What if one of your elements isn’t a number when you need it to be
- Another way
4. Problem solving
- What if the values you want are buried?
- Graphs - ggplot2

This report summarizes a DataCamp course.

1. Simplifying iteration and lists with purrr

map(object, function)

object: can be a vector or a list
function: any function in R that takes the input offered by the object

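# Setup: purrr for the map functions; repurrrsive for the example lists
# (wesanderson, sw_films, sw_people, gh_users, gh_repos, gap_split)
library(purrr)
library(repurrrsive)

str(wesanderson)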

## List of 15
##  $ GrandBudapest : chr [1:4] "#F1BB7B" "#FD6467" "#5B1A18" "#D67236"
##  $ Moonrise1     : chr [1:4] "#F3DF6C" "#CEAB07" "#D5D5D3" "#24281A"
##  $ Royal1        : chr [1:4] "#899DA4" "#C93312" "#FAEFD1" "#DC863B"
##  $ Moonrise2     : chr [1:4] "#798E87" "#C27D38" "#CCC591" "#29211F"
##  $ Cavalcanti    : chr [1:5] "#D8B70A" "#02401B" "#A2A475" "#81A88D" ...
##  $ Royal2        : chr [1:5] "#9A8822" "#F5CDB4" "#F8AFA8" "#FDDDA0" ...
##  $ GrandBudapest2: chr [1:4] "#E6A0C4" "#C6CDF7" "#D8A499" "#7294D4"
##  $ Moonrise3     : chr [1:5] "#85D4E3" "#F4B5BD" "#9C964A" "#CDC08C" ...
##  $ Chevalier     : chr [1:4] "#446455" "#FDD262" "#D3DDDC" "#C7B19C"
##  $ Zissou        : chr [1:5] "#3B9AB2" "#78B7C5" "#EBCC2A" "#E1AF00" ...
##  $ FantasticFox  : chr [1:5] "#DD8D29" "#E2D200" "#46ACC8" "#E58601" ...
##  $ Darjeeling    : chr [1:5] "#FF0000" "#00A08A" "#F2AD00" "#F98400" ...
##  $ Rushmore      : chr [1:5] "#E1BD6D" "#EABE94" "#0B775E" "#35274A" ...
##  $ BottleRocket  : chr [1:7] "#A42820" "#5F5647" "#9B110E" "#3F5151" ...
##  $ Darjeeling2   : chr [1:5] "#ECCBAE" "#046C9A" "#D69C4E" "#ABDDDE" ...

map(wesanderson, ~length(.x))

## $GrandBudapest
## [1] 4
## 
## $Moonrise1
## [1] 4
## 
## $Royal1
## [1] 4
## 
## $Moonrise2
## [1] 4
## 
## $Cavalcanti
## [1] 5
## 
## $Royal2
## [1] 5
## 
## $GrandBudapest2
## [1] 4
## 
## $Moonrise3
## [1] 5
## 
## $Chevalier
## [1] 4
## 
## $Zissou
## [1] 5
## 
## $FantasticFox
## [1] 5
## 
## $Darjeeling
## [1] 5
## 
## $Rushmore
## [1] 5
## 
## $BottleRocket
## [1] 7
## 
## $Darjeeling2
## [1] 5
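
The map(object, function) pattern accepts the function in more than one form. The sketch below shows three equivalent spellings; the `palettes` list is a hypothetical stand-in for a named list such as wesanderson, and its colour values are made up for illustration.

```r
library(purrr)

# Hypothetical stand-in for a named list of colour palettes
palettes <- list(
  first  = c("#111111", "#222222", "#333333", "#444444"),
  second = c("#AAAAAA", "#BBBBBB", "#CCCCCC", "#DDDDDD", "#EEEEEE")
)

# 1. Pass an existing function by name
by_name <- map(palettes, length)

# 2. Pass an anonymous function
by_anon <- map(palettes, function(x) length(x))

# 3. Pass a purrr formula, where .x stands for the current element
by_formula <- map(palettes, ~ length(.x))

# All three produce the same named list of lengths
identical(by_name, by_anon) && identical(by_name, by_formula)
```

The formula form is the one used throughout these notes because it stays short when the body grows into a pipeline.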

2. More complex iterations

Using pipes (%>%)

map_chr: character vector
map_lgl: logical vector (TRUE or FALSE)
map_int: integer vector
map_dbl: double vector
map_df: data frame

# Set names so each element of the list is named for the film title
sw_films_named <- sw_films %>% 
  set_names(map_chr(sw_films, "title"))

# Check to see if the names worked/are correct
names(sw_films_named)

## [1] "A New Hope"              "Attack of the Clones"   
## [3] "The Phantom Menace"      "Revenge of the Sith"    
## [5] "Return of the Jedi"      "The Empire Strikes Back"
## [7] "The Force Awakens"
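
Passing a string instead of a function, as in map_chr(sw_films, "title"), extracts the element with that name from every list item. A minimal sketch of the same naming step, assuming a hypothetical two-film list in place of sw_films:

```r
library(purrr)

# Hypothetical stand-in for sw_films: an unnamed list of records
films <- list(
  list(title = "Film A", episode_id = 4L),
  list(title = "Film B", episode_id = 5L)
)

# A string is shorthand for "extract the element with this name"
titles <- map_chr(films, "title")

# Use the extracted titles as the names of the outer list
films_named <- set_names(films, titles)

names(films_named)

# Named elements can now be looked up directly
films_named[["Film B"]][["episode_id"]]
```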

# Create a list of values from 1 through 10
numlist <- list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Iterate over the numlist 
map(numlist, ~.x %>% sqrt() %>% sin())

## [[1]]
## [1] 0.841471
## 
## [[2]]
## [1] 0.9877659
## 
## [[3]]
## [1] 0.9870266
## 
## [[4]]
## [1] 0.9092974
## 
## [[5]]
## [1] 0.7867491
## 
## [[6]]
## [1] 0.6381576
## 
## [[7]]
## [1] 0.4757718
## 
## [[8]]
## [1] 0.3080717
## 
## [[9]]
## [1] 0.14112
## 
## [[10]]
## [1] -0.02068353

Simulating data

# List of sites north, east, and west
sites <- list("north", "east", "west")

# Create a list of data frames, each with a sites, a, and b column
list_of_df <-  map(sites,  
  ~data.frame(sites = .x,
       a = rnorm(mean = 5, n = 200, sd = (5/2)),
       b = rnorm(mean = 200, n = 200, sd = 15)))

str(list_of_df)

## List of 3
##  $ :'data.frame':    200 obs. of  3 variables:
##   ..$ sites: chr [1:200] "north" "north" "north" "north" ...
##   ..$ a    : num [1:200] 5.93 7 6.37 8.44 2.71 ...
##   ..$ b    : num [1:200] 190 204 202 209 214 ...
##  $ :'data.frame':    200 obs. of  3 variables:
##   ..$ sites: chr [1:200] "east" "east" "east" "east" ...
##   ..$ a    : num [1:200] 4.62 7.11 9.81 4.63 3.26 ...
##   ..$ b    : num [1:200] 201 183 189 219 185 ...
##  $ :'data.frame':    200 obs. of  3 variables:
##   ..$ sites: chr [1:200] "west" "west" "west" "west" ...
##   ..$ a    : num [1:200] 4.67 2.59 4.84 8.1 7.34 ...
##   ..$ b    : num [1:200] 171 203 204 211 224 ...

Run a linear model

# Map over the models to look at the relationship of a vs. b
list_of_df %>%
    map(~ lm(a ~ b, data = .)) %>%
    map(summary)

## [[1]]
## 
## Call:
## lm(formula = a ~ b, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.6297 -1.5939 -0.0227  1.7435  6.0581 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 4.204233   2.419925   1.737   0.0839 .
## b           0.004737   0.012127   0.391   0.6965  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.484 on 198 degrees of freedom
## Multiple R-squared:  0.0007699,  Adjusted R-squared:  -0.004277 
## F-statistic: 0.1526 on 1 and 198 DF,  p-value: 0.6965
## 
## 
## [[2]]
## 
## Call:
## lm(formula = a ~ b, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9136 -1.5043  0.1051  1.6355  6.8671 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 4.452565   2.311250   1.926   0.0555 .
## b           0.001106   0.011573   0.096   0.9239  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.343 on 198 degrees of freedom
## Multiple R-squared:  4.614e-05,  Adjusted R-squared:  -0.005004 
## F-statistic: 0.009137 on 1 and 198 DF,  p-value: 0.9239
## 
## 
## [[3]]
## 
## Call:
## lm(formula = a ~ b, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.1749 -1.6561 -0.1603  1.8163  5.5570 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  6.307431   2.335960   2.700  0.00753 **
## b           -0.005617   0.011772  -0.477  0.63379   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.372 on 198 degrees of freedom
## Multiple R-squared:  0.001148,   Adjusted R-squared:  -0.003896 
## F-statistic: 0.2277 on 1 and 198 DF,  p-value: 0.6338

map_*()

The map_*() variants return an atomic vector instead of a list.

# Pull out the director element of sw_films as a character vector
map_chr(sw_films, ~.x[["director"]])

## [1] "George Lucas"     "George Lucas"     "George Lucas"     "George Lucas"    
## [5] "Richard Marquand" "Irvin Kershner"   "J. J. Abrams"

# Compare outputs when checking if director is George Lucas
map_lgl(sw_films, ~.x[["director"]] == "George Lucas")

## [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

# Pull out episode_id element as double vector
map_dbl(sw_films, ~.x[["episode_id"]])

## [1] 4 2 1 3 6 5 7

# Pull out episode_id element as integer vector
map_int(sw_films, ~.x[["episode_id"]])

## [1] 4 2 1 3 6 5 7
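
To make the difference between map() and the map_*() variants concrete, here is a small sketch on a hypothetical list of numbers. It only illustrates the return types; the values are not from the course data.

```r
library(purrr)

values <- list(1, 4, 9)

# map() always returns a list
as_list <- map(values, sqrt)

# map_dbl() returns a double vector of the same length
as_dbl <- map_dbl(values, sqrt)

# map_chr() returns a character vector
as_chr <- map_chr(values, ~ paste0("root-", sqrt(.x)))

# map_lgl() returns a logical vector
as_lgl <- map_lgl(values, ~ .x > 3)

is.list(as_list)
as_dbl
as_chr
as_lgl
```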

map2() and pmap()

map2(): iterates over two lists in parallel; .x refers to the first list and .y to the second

# List of 1, 2 and 3
means <- list(1, 2, 3)
# Create sites list
sites <- list("north", "west", "east")

# Map over two arguments: sites and means
list_of_files_map2 <- map2(means, sites, ~data.frame(sites = .y,
                           a = rnorm(mean = .x, n = 200, sd = (5/2))))
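
Because the map2() call above draws random numbers, its result changes on every run. The deterministic sketch below uses the same two lists to show how .x and .y are paired position by position; the label format is an illustrative choice.

```r
library(purrr)

means <- list(1, 2, 3)
sites <- list("north", "west", "east")

# .x walks over means, .y walks over sites, one pair at a time
labels <- map2_chr(means, sites, ~ paste0(.y, ": mean = ", .x))

labels
```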

pmap(): iterates over three or more lists. Unlike map() and map2(), it requires a single list that bundles all of the input lists.

# Create a master list, a list of lists
sigma <- list(1, 2, 3)
means2 <- list(0.5, 1, 1.5)
sigma2 <- list(0.5, 1, 1.5)

pmapinputs <- list(sites = sites,  means = means, sigma = sigma, 
                   means2 = means2, sigma2 = sigma2)

# Map over the master list
list_of_files_pmap <- pmap(pmapinputs, 
  function(sites, means, sigma, means2, sigma2) 
    data.frame(sites = sites,
        a = rnorm(mean = means, n = 200, sd = sigma),
        b = rnorm(mean = means2, n = 200, sd = sigma2)))
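
When the bundled list is named, pmap() matches its elements to the function's arguments by name, so the order of the inputs does not matter. A minimal deterministic sketch, assuming hypothetical input names (site, mean, sd) and a helper called describe:

```r
library(purrr)

# One named list that bundles all inputs
inputs <- list(
  site = c("north", "west"),
  mean = c(1, 2),
  sd   = c(0.5, 1)
)

# Hypothetical helper; argument names match the names in inputs
describe <- function(site, mean, sd) {
  paste0(site, ": N(", mean, ", ", sd, ")")
}

described <- pmap_chr(inputs, describe)

# Reordering the bundled list gives the same result, because matching is by name
reordered <- pmap_chr(inputs[c("sd", "site", "mean")], describe)

described
identical(described, reordered)
```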

3. Troubleshooting lists

What if one of your elements isn’t a number when you need it to be

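safely(): identifies where in the list an issue is occurring
- It can be used inside map(), and it produces both a $result and an $error element for each input
- transpose() reorders the output into one $result list and one $error list, which makes the elements that failed easier to find

Replacing errors with NA using safely()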

# Map safely over log
a <- list(1, "I can", 10, 0, "purrr") %>%
      map(safely(log, otherwise = NA_real_)) %>%
    # Transpose the result
      transpose()
# Print the list
a

## $result
## $result[[1]]
## [1] 0
## 
## $result[[2]]
## [1] NA
## 
## $result[[3]]
## [1] 2.302585
## 
## $result[[4]]
## [1] -Inf
## 
## $result[[5]]
## [1] NA
## 
## 
## $error
## $error[[1]]
## NULL
## 
## $error[[2]]
## <simpleError in .Primitive("log")(x, base): non-numeric argument to mathematical function>
## 
## $error[[3]]
## NULL
## 
## $error[[4]]
## NULL
## 
## $error[[5]]
## <simpleError in .Primitive("log")(x, base): non-numeric argument to mathematical function>

# Print the result element in the list
a[["result"]]

## [[1]]
## [1] 0
## 
## [[2]]
## [1] NA
## 
## [[3]]
## [1] 2.302585
## 
## [[4]]
## [1] -Inf
## 
## [[5]]
## [1] NA

# Print the error element in the list
a[["error"]]

## [[1]]
## NULL
## 
## [[2]]
## <simpleError in .Primitive("log")(x, base): non-numeric argument to mathematical function>
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL
## 
## [[5]]
## <simpleError in .Primitive("log")(x, base): non-numeric argument to mathematical function>
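
Once the output is transposed, the positions of the failing elements can be read off the $error list, since successful calls store NULL there. A sketch of that lookup on the same input list (the names `res`, `failed`, and `messages` are illustrative; the wording of the messages depends on the R locale):

```r
library(purrr)

res <- list(1, "I can", 10, 0, "purrr") %>%
  map(safely(log, otherwise = NA_real_)) %>%
  transpose()

# Successful calls have a NULL error, so non-NULL entries mark the failures
failed <- which(!map_lgl(res$error, is.null))

failed

# Pull the message text out of each captured error
messages <- map_chr(res$error[failed], conditionMessage)
```

Note that log(0) returns -Inf without raising an error, so element 4 is not flagged.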

Convert data to numeric

height_cm <- map(sw_people, "height") %>% 
  map(function(x) {
    ifelse(x == "unknown", NA, as.numeric(x))
  })

str(height_cm)

## List of 87
##  $ : num 172
##  $ : num 167
##  $ : num 96
##  $ : num 202
##  $ : num 150
##  $ : num 178
##  $ : num 165
##  $ : num 97
##  $ : num 183
##  $ : num 182
##  $ : num 188
##  $ : num 180
##  $ : num 228
##  $ : num 180
##  $ : num 173
##  $ : num 175
##  $ : num 170
##  $ : num 180
##  $ : num 66
##  $ : num 170
##  $ : num 183
##  $ : num 200
##  $ : num 190
##  $ : num 177
##  $ : num 175
##  $ : num 180
##  $ : num 150
##  $ : logi NA
##  $ : num 88
##  $ : num 160
##  $ : num 193
##  $ : num 191
##  $ : num 170
##  $ : num 196
##  $ : num 224
##  $ : num 206
##  $ : num 183
##  $ : num 137
##  $ : num 112
##  $ : num 183
##  $ : num 163
##  $ : num 175
##  $ : num 180
##  $ : num 178
##  $ : num 94
##  $ : num 122
##  $ : num 163
##  $ : num 188
##  $ : num 198
##  $ : num 196
##  $ : num 171
##  $ : num 184
##  $ : num 188
##  $ : num 264
##  $ : num 188
##  $ : num 196
##  $ : num 185
##  $ : num 157
##  $ : num 183
##  $ : num 183
##  $ : num 170
##  $ : num 166
##  $ : num 165
##  $ : num 193
##  $ : num 191
##  $ : num 183
##  $ : num 168
##  $ : num 198
##  $ : num 229
##  $ : num 213
##  $ : num 167
##  $ : num 79
##  $ : num 96
##  $ : num 193
##  $ : num 191
##  $ : num 178
##  $ : num 216
##  $ : num 234
##  $ : num 188
##  $ : num 178
##  $ : num 206
##  $ : logi NA
##  $ : logi NA
##  $ : logi NA
##  $ : logi NA
##  $ : logi NA
##  $ : num 165

Finding the problem areas

height_ft <- map(sw_people , "height") %>% 
  map(safely(function(x){
    x * 0.0328084
    # print errors
  }, quiet = FALSE)) %>% 
transpose()

height_ft[["result"]]

## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## ... (elements [[4]] through [[87]] are likewise NULL)

height_ft[["error"]]

## [[1]]
## <simpleError in x * 0.0328084: non-numeric argument to binary operator>
## 
## [[2]]
## <simpleError in x * 0.0328084: non-numeric argument to binary operator>
## 
## [[3]]
## <simpleError in x * 0.0328084: non-numeric argument to binary operator>
## 
## ... (elements [[4]] through [[87]] contain the same error)
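
Every element fails here, which suggests the heights are stored as character strings rather than numbers, so the multiplication itself is the problem. A small sketch of that diagnosis on a single hypothetical value ("172"), using an illustrative wrapper named to_feet:

```r
library(purrr)

# Wrap the conversion so that errors are captured instead of stopping the run
to_feet <- safely(function(x) x * 0.0328084)

# Multiplying a character string fails: result is NULL, error is filled in
as_text <- to_feet("172")

# Converting to numeric first succeeds: error is NULL, result is filled in
as_number <- to_feet(as.numeric("172"))

is.null(as_text$result)
is.null(as_number$error)
as_number$result
```

This is why the later examples map over height_cm, the converted list, rather than over the raw height values.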

Another way

possibly(): works like safely() but does not return the error message; each element is either the result or the otherwise value

# Take the log of each element in the list
list(1, "I can", 10, 0, "purrr") %>% 
  map(possibly(function(x){
    log(x)
}, otherwise = NA_real_))

## [[1]]
## [1] 0
## 
## [[2]]
## [1] NA
## 
## [[3]]
## [1] 2.302585
## 
## [[4]]
## [1] -Inf
## 
## [[5]]
## [1] NA

# Create a piped workflow that returns double vectors
height_cm %>%  
  map_dbl(possibly(function(x){
  # Convert centimeters to feet
    x * 0.0328084
}, otherwise = NA_real_))

##  [1] 5.643045 5.479003 3.149606 6.627297 4.921260 5.839895 5.413386 3.182415
##  [9] 6.003937 5.971129 6.167979 5.905512 7.480315 5.905512 5.675853 5.741470
## [17] 5.577428 5.905512 2.165354 5.577428 6.003937 6.561680 6.233596 5.807087
## [25] 5.741470 5.905512 4.921260       NA 2.887139 5.249344 6.332021 6.266404
## [33] 5.577428 6.430446 7.349082 6.758530 6.003937 4.494751 3.674541 6.003937
## [41] 5.347769 5.741470 5.905512 5.839895 3.083990 4.002625 5.347769 6.167979
## [49] 6.496063 6.430446 5.610236 6.036746 6.167979 8.661418 6.167979 6.430446
## [57] 6.069554 5.150919 6.003937 6.003937 5.577428 5.446194 5.413386 6.332021
## [65] 6.266404 6.003937 5.511811 6.496063 7.513124 6.988189 5.479003 2.591864
## [73] 3.149606 6.332021 6.266404 5.839895 7.086614 7.677166 6.167979 5.839895
## [81] 6.758530       NA       NA       NA       NA       NA 5.413386
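
The practical difference between the two wrappers is the shape of what they return. The sketch below runs both on a hypothetical three-element list: safely() gives a result/error pair per element, while possibly() gives the bare value, which is why it can be combined with map_dbl().

```r
library(purrr)

inputs <- list(1, "I can", 10)

# safely(): each element is a list with $result and $error
safe_out <- map(inputs, safely(log, otherwise = NA_real_))

# possibly(): each element is just the value (or the fallback),
# so the output can be simplified to a double vector
poss_out <- map_dbl(inputs, possibly(log, otherwise = NA_real_))

names(safe_out[[1]])
poss_out
```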

Cleaner, more human-readable output

walk()

Comparing walk() vs. no walk() outputs

short_list <- list(-10, 1, 10)

short_list

## [[1]]
## [1] -10
## 
## [[2]]
## [1] 1
## 
## [[3]]
## [1] 10

walk(short_list, print)

## [1] -10
## [1] 1
## [1] 10
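
walk() calls the function only for its side effect (here, printing) and returns its input invisibly, which is why no list structure is echoed. A short sketch that checks this; the name `returned` is illustrative:

```r
library(purrr)

short_list <- list(-10, 1, 10)

# print() runs once per element; the value of walk() itself is the input list
returned <- walk(short_list, print)

identical(returned, short_list)
```

Because the input comes back unchanged, walk() can sit in the middle of a pipeline without interrupting it.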

walk() for printing cleaner list outputs

# Map over the first 3 elements of gap_split (requires ggplot2)
plots <- map2(gap_split[1:3], 
              names(gap_split[1:3]), 
              ~ ggplot(.x, aes(year, lifeExp)) + 
                geom_line() +
                labs(title = .y))

# Object name, then function name
walk(plots, print)

4. Problem solving

Checking for names
- names
- map_chr
- set_names

gh_users_named <- gh_users %>% 
    set_names(map_chr(gh_users,"name"))

names(gh_users_named)

## [1] "Gábor Csárdi"           "Jennifer (Jenny) Bryan" "Jeff L."               
## [4] "Julia Silge"            "Thomas J. Leeper"       "Maëlle Salmon"

Asking questions of a list

Which user joined GitHub first?

map_chr(gh_users, ~.x[["created_at"]]) %>%
  set_names(map_chr(gh_users, "name")) %>%
  sort()

## Jennifer (Jenny) Bryan           Gábor Csárdi                Jeff L. 
## "2011-02-03T22:37:41Z" "2011-03-09T17:29:25Z" "2012-03-24T18:16:43Z" 
##       Thomas J. Leeper          Maëlle Salmon            Julia Silge 
## "2013-02-07T21:07:00Z" "2014-08-05T08:10:04Z" "2015-05-19T02:51:23Z"

Are all the repositories user-owned, rather than organization-owned?

map_lgl(gh_users, ~.x[["type"]] == "User")

## [1] TRUE TRUE TRUE TRUE TRUE TRUE

Which user has the most public repositories?

map_int(gh_users, ~.x[["public_repos"]]) %>% 
  set_names(map_chr(gh_users, "name")) %>% 
  sort()

##            Julia Silge          Maëlle Salmon           Gábor Csárdi 
##                     26                     31                     52 
##                Jeff L.       Thomas J. Leeper Jennifer (Jenny) Bryan 
##                     67                     99                    168
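
The extract, name, and sort pattern used for these questions can also return just the answer. A minimal sketch, assuming a hypothetical three-user list in place of gh_users:

```r
library(purrr)

# Hypothetical stand-in for gh_users
users <- list(
  list(name = "Ann", public_repos = 26L),
  list(name = "Bob", public_repos = 168L),
  list(name = "Cy",  public_repos = 52L)
)

# Extract the counts and label each one with the user's name
repos <- map_int(users, "public_repos") %>%
  set_names(map_chr(users, "name"))

# Largest first
sort(repos, decreasing = TRUE)

# Name of the user with the most public repositories
top_user <- names(which.max(repos))
top_user
```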

What if the values you want are buried?

# Map over gh_repos to generate numeric output
map(gh_repos, 
    ~map_dbl(.x, 
             ~.x[["size"]])) %>%
    # Grab the largest element
    map(~max(.x))

## [[1]]
## [1] 39461
## 
## [[2]]
## [1] 96325
## 
## [[3]]
## [1] 374812
## 
## [[4]]
## [1] 24070
## 
## [[5]]
## [1] 558176
## 
## [[6]]
## [1] 76455
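
The nested call works from the inside out: the inner map_dbl() pulls one number from each repository, and the outer map runs once per user. A compact sketch of the same idea, assuming a hypothetical two-user structure shaped like gh_repos (a list of users, each a list of repositories):

```r
library(purrr)

# Hypothetical nested list: users -> repositories -> fields
repos <- list(
  alice = list(list(name = "a1", size = 10), list(name = "a2", size = 250)),
  bob   = list(list(name = "b1", size = 75))
)

# Inner map_dbl(): sizes of one user's repositories
# Outer map_dbl(): the largest size for each user
largest <- map_dbl(repos, ~ max(map_dbl(.x, "size")))

largest
```

Using map_dbl() for the outer step returns one named vector instead of a list, which is easier to sort or plot.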

Graphs - ggplot2

# Create a data frame with four columns
map_df(gh_users, ~data.frame(
  login = .x[["login"]],
  name = .x[["name"]],
  followers = .x[["followers"]],
  public_repos = .x[["public_repos"]]
)) %>% 
  # Plot followers by public_repos
  ggplot(., aes(x = followers, y = public_repos)) +
  geom_point()

# or 
map_df(gh_users, `[`, 
       c("login","name","followers","public_repos"))

## # A tibble: 6 × 4
##   login       name                   followers public_repos
##   <chr>       <chr>                      <int>        <int>
## 1 gaborcsardi Gábor Csárdi                 303           52
## 2 jennybc     Jennifer (Jenny) Bryan       780          168
## 3 jtleek      Jeff L.                     3958           67
## 4 juliasilge  Julia Silge                  115           26
## 5 leeper      Thomas J. Leeper             213           99
## 6 masalmon    Maëlle Salmon                 34           31
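
In the second form, `[` is the ordinary subsetting function, so map_df(gh_users, `[`, cols) keeps only the named fields of each user before the rows are combined. The sketch below shows that step on a hypothetical two-user list. To stay within purrr and base R it combines the rows with rbind() rather than map_df(), which is a substitution made for this example only.

```r
library(purrr)

# Hypothetical stand-in for gh_users, with one field we do not want
users <- list(
  list(login = "ann", followers = 3L,  bio = "not needed"),
  list(login = "bob", followers = 12L, bio = "not needed")
)

cols <- c("login", "followers")

# `[` applied to each user keeps only the requested fields;
# equivalent to map(users, ~ .x[cols])
subsets <- map(users, `[`, cols)

# Turn each subset into a one-row data frame and stack the rows (base R)
df <- do.call(rbind, map(subsets, as.data.frame))

df
```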

What is the distribution of heights of characters in each of the Star Wars films?

# Requires dplyr (tibble(), mutate(), inner_join()), tidyr (unnest()), and ggplot2
film_by_character <- tibble(filmtitle = map_chr(sw_films, "title")) %>% 
  mutate(filmtitle,
         characters = map(sw_films, "characters")) %>% 
  unnest(cols = c(characters))

sw_characters <- map_df(sw_people, `[`, c("height", "mass", "name", "url"))

character_data <- inner_join(film_by_character, sw_characters, by = c("characters" = "url")) %>% 
  mutate(height = as.numeric(height),
         mass = as.numeric(mass))

ggplot(character_data, aes(x = height)) +
  geom_histogram(stat = "count") +
  facet_wrap(~filmtitle)

Foundations of Functional programming with purrr

lumpen95

2025-02-07