Purrr

Let’s say we have a dataset of alcohol consumption by country and we want to find the mean consumption of beer, spirits, wine and pure alcohol. The dataset was retrieved from FiveThirtyEight.

alcohol <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv", sep = ",", stringsAsFactors = FALSE)


With base R we would type:

beer <- mean(alcohol$beer_servings)
spirit <- mean(alcohol$spirit_servings)
wine <- mean(alcohol$wine_servings)
pure <- mean(alcohol$total_litres_of_pure_alcohol)

c(beer, spirit, wine, pure)
## [1] 106.160622  80.994819  49.450777   4.717098

As you can see, writing code like this is not very efficient: it involves a lot of copying and pasting, which makes errors likely. To minimize this repetition, we can use purrr. purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.

purrr allows you to map functions to data. The variants differ in the type of output they return:

  • map() makes a list.
  • map_lgl() makes a logical vector.
  • map_int() makes an integer vector.
  • map_dbl() makes a double vector.
  • map_chr() makes a character vector.

Appropriately, the basic function in purrr is called map(). It transforms its input by applying a function to each element and returns a result of the same length as the input: a list for map(), or an atomic vector for the typed variants.

We can use map_dbl() to perform the same computations as above, since each mean is a single double.

map_dbl(alcohol[, c(2, 3, 4, 5)], mean)
##                beer_servings              spirit_servings 
##                   106.160622                    80.994819 
##                wine_servings total_litres_of_pure_alcohol 
##                    49.450777                     4.717098
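The difference between map() and its typed variants is easiest to see on a small input. The following is a minimal sketch, assuming purrr is installed; the list `servings` is a made-up toy input and not part of the drinks dataset.

```r
library(purrr)

# Hypothetical toy input (not taken from the drinks dataset)
servings <- list(beer = c(0, 89, 25), wine = c(0, 54, 14))

# map() always returns a list, one element per input element
as_list <- map(servings, mean)

# map_dbl() returns a named double vector instead
as_dbl <- map_dbl(servings, mean)

# The other typed variants return logical, integer, and character vectors
any_zero <- map_lgl(servings, ~ any(.x == 0))
n_values <- map_int(servings, length)
labels   <- map_chr(servings, ~ paste(.x, collapse = "/"))
```

A typed variant raises an error when the function returns a value of the wrong type or length, so a mismatch shows up immediately rather than further down the analysis.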


The pipe operator (%>%) can also be used:

alcohol[, c(2, 3, 4, 5)] %>% map_dbl(mean)
##                beer_servings              spirit_servings 
##                   106.160622                    80.994819 
##                wine_servings total_litres_of_pure_alcohol 
##                    49.450777                     4.717098
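Selecting columns by position, as in c(2, 3, 4, 5), breaks if the column order changes. One possible alternative is to pick the numeric columns by type with purrr's keep(). The sketch below is an assumption about how this could be done, not part of the original analysis; `drinks` is a small stand-in built from the first three rows that head(alcohol) prints further down in this post.

```r
library(purrr)

# Stand-in for `alcohol`: the first three rows of the dataset
drinks <- data.frame(
  country = c("Afghanistan", "Albania", "Algeria"),
  beer_servings = c(0, 89, 25),
  spirit_servings = c(0, 132, 0),
  wine_servings = c(0, 54, 14),
  total_litres_of_pure_alcohol = c(0.0, 4.9, 0.7),
  stringsAsFactors = FALSE
)

# keep() retains the columns for which is.numeric() is TRUE,
# so `country` is dropped without naming any positions
numeric_means <- map_dbl(keep(drinks, is.numeric), mean)
```

On the full dataset the same call would be map_dbl(keep(alcohol, is.numeric), mean).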


Other purrr functions are listed in the package's reference documentation.


Stringr

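This package is used to manipulate strings. For instance, we can gather the countries beginning with the letter “S” using a regular expression (regex); patterns in stringr are interpreted as regexes by default. In the pattern below, ^ anchors the match to the start of the string and .+ matches the rest of the name.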

unlist(str_extract_all(alcohol$country, "^S.+"))
##  [1] "South Korea"                  "St. Kitts & Nevis"           
##  [3] "St. Lucia"                    "St. Vincent & the Grenadines"
##  [5] "Samoa"                        "San Marino"                  
##  [7] "Sao Tome & Principe"          "Saudi Arabia"                
##  [9] "Senegal"                      "Serbia"                      
## [11] "Seychelles"                   "Sierra Leone"                
## [13] "Singapore"                    "Slovakia"                    
## [15] "Slovenia"                     "Solomon Islands"             
## [17] "Somalia"                      "South Africa"                
## [19] "Spain"                        "Sri Lanka"                   
## [21] "Sudan"                        "Suriname"                    
## [23] "Swaziland"                    "Sweden"                      
## [25] "Switzerland"                  "Syria"

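Or count how many countries begin with “S”. str_count() returns the number of matches in each string (0 or 1 here, because the pattern is anchored to the start), so the per-country counts are summed afterwards: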

str_count(alcohol$country, "^S.+")
##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [36] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [71] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [106] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [141] 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
## [176] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# Sum the per-country counts to get the total
sum(str_count(alcohol$country, "^S.+"))
## [1] 26
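The same two tasks can also be written with str_subset() and str_detect(), which return the matching strings and a logical vector respectively. This is a minimal sketch, assuming stringr is installed; `countries` is a short illustrative vector of names that appear in this post, not the full column.

```r
library(stringr)

# A few country names that appear in this post
countries <- c("Albania", "Samoa", "Spain", "Sweden", "Angola")

# str_subset() keeps only the strings that match; "^S" is enough here
# because only the first character matters
s_countries <- str_subset(countries, "^S")

# str_detect() returns TRUE/FALSE per string, so sum() counts the matches
n_s <- sum(str_detect(countries, "^S"))
```

Unlike str_extract_all(), str_subset() returns a character vector directly, so no unlist() is needed.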


Another use is to convert all column names to title case in one step, instead of renaming each column individually.

#original 
names(alcohol)
## [1] "country"                      "beer_servings"               
## [3] "spirit_servings"              "wine_servings"               
## [5] "total_litres_of_pure_alcohol"
kable(head(alcohol)) %>% kable_styling()
country            beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol
Afghanistan                    0                0              0                           0.0
Albania                       89              132             54                           4.9
Algeria                       25                0             14                           0.7
Andorra                      245              138            312                          12.4
Angola                       217               57             45                           5.9
Antigua & Barbuda            102              128             45                           4.9
#to title
names(alcohol) <- str_to_title(names(alcohol), locale = "en")
names(alcohol)
## [1] "Country"                      "Beer_servings"               
## [3] "Spirit_servings"              "Wine_servings"               
## [5] "Total_litres_of_pure_alcohol"
kable(head(alcohol)) %>% kable_styling()
Country            Beer_servings  Spirit_servings  Wine_servings  Total_litres_of_pure_alcohol
Afghanistan                    0                0              0                           0.0
Albania                       89              132             54                           4.9
Algeria                       25                0             14                           0.7
Andorra                      245              138            312                          12.4
Angola                       217               57             45                           5.9
Antigua & Barbuda            102              128             45                           4.9
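Note that str_to_title() capitalized only the first letter of each name, because the words are joined by underscores. If every word should be capitalized, one option is to replace the underscores with spaces first. This is a sketch of that variation, not something the original analysis does; `cols` simply repeats the original column names printed by names(alcohol).

```r
library(stringr)

# Original column names of the drinks dataset, as printed by names(alcohol)
cols <- c("country", "beer_servings", "spirit_servings",
          "wine_servings", "total_litres_of_pure_alcohol")

# Replace underscores with spaces first so that each word is
# title-cased separately
pretty <- str_to_title(str_replace_all(cols, "_", " "))
```

Names containing spaces, such as "Beer Servings", are better suited to table headers than to code, since they must be wrapped in backticks when used with $.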


There are many other functions in the stringr package; they are described in the package's reference documentation.