Purrr

Suppose we have a dataset of alcohol consumption by country and we want to find the mean consumption of beer, spirits, wine, and pure alcohol. The dataset was retrieved from FiveThirtyEight.

library(tidyverse)  # attaches purrr and stringr, which also provide the %>% pipe
alcohol <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv", stringsAsFactors = FALSE)

With base R we would type:

beer <- mean(alcohol$beer_servings)
spirit <- mean(alcohol$spirit_servings)
wine <- mean(alcohol$wine_servings)
pure <- mean(alcohol$total_litres_of_pure_alcohol)

c(beer, spirit, wine, pure)

## [1] 106.160622  80.994819  49.450777   4.717098

As you can see, writing code like this is inefficient: it involves a lot of copying and pasting, which invites errors. To cut down on this repetition, we can use purrr. purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.

purrr allows you to map functions to data.

map() makes a list.
map_lgl() makes a logical vector.
map_int() makes an integer vector.
map_dbl() makes a double vector.
map_chr() makes a character vector.

Fittingly, the basic function in purrr is called map(). It applies a function to each element of its input and returns a list the same length as the input; the typed variants listed above return an atomic vector of the named type instead.
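To make the difference between the variants concrete, here is a minimal sketch on a small made-up list. The list `x` and its values are illustrative assumptions and are not part of the alcohol dataset. Each call applies a function to every element; only the type of the result changes.

```r
library(purrr)

# Hypothetical input: a named list of numeric vectors
x <- list(a = 1:3, b = c(2.5, 4.5), c = 10)

map(x, length)          # list with one length per element
map_int(x, length)      # integer vector of lengths
map_dbl(x, mean)        # double vector of means
map_chr(x, class)       # character vector of class names
map_lgl(x, is.numeric)  # logical vector, TRUE where the element is numeric
```

Asking a typed variant for a result it cannot hold, as in map_dbl(x, class), raises an error instead of silently converting, which is one reason to prefer these variants over an untyped loop.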

We can use this function to perform the same computations as above.

map_dbl(alcohol[, c(2, 3, 4, 5)], mean)

##                beer_servings              spirit_servings 
##                   106.160622                    80.994819 
##                wine_servings total_litres_of_pure_alcohol 
##                    49.450777                     4.717098

The same call can also be written with the pipe operator:

alcohol[, c(2, 3, 4, 5)] %>% map_dbl(mean)

##                beer_servings              spirit_servings 
##                   106.160622                    80.994819 
##                wine_servings total_litres_of_pure_alcohol 
##                    49.450777                     4.717098
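Selecting columns by position, as in c(2, 3, 4, 5), breaks if the column order ever changes. One hedged alternative is to select columns by type. The sketch below assumes a small stand-in data frame, `drinks`, built from the first three rows of the dataset printed later in this article; `drinks` and `numeric_means` are hypothetical names used so the example runs without downloading anything.

```r
library(purrr)

# Stand-in for the full dataset: first three rows only (illustrative)
drinks <- data.frame(
  country = c("Afghanistan", "Albania", "Algeria"),
  beer_servings = c(0, 89, 25),
  spirit_servings = c(0, 132, 0),
  wine_servings = c(0, 54, 14),
  total_litres_of_pure_alcohol = c(0, 4.9, 0.7),
  stringsAsFactors = FALSE
)

# keep() retains the columns for which is.numeric() is TRUE,
# then map_dbl() computes the mean of each remaining column
numeric_means <- drinks %>%
  keep(is.numeric) %>%
  map_dbl(mean)

numeric_means
```

The character column country is dropped by keep(), so map_dbl() only ever sees numeric columns, whatever their position.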

Other purrr functions are described in the package documentation.

Stringr

This package is used to manipulate strings. For instance, we can gather the countries beginning with the letter “S” using regular expressions (regexes), since stringr interprets patterns as regexes by default.

unlist(str_extract_all(alcohol$country, "^S.+"))

##  [1] "South Korea"                  "St. Kitts & Nevis"           
##  [3] "St. Lucia"                    "St. Vincent & the Grenadines"
##  [5] "Samoa"                        "San Marino"                  
##  [7] "Sao Tome & Principe"          "Saudi Arabia"                
##  [9] "Senegal"                      "Serbia"                      
## [11] "Seychelles"                   "Sierra Leone"                
## [13] "Singapore"                    "Slovakia"                    
## [15] "Slovenia"                     "Solomon Islands"             
## [17] "Somalia"                      "South Africa"                
## [19] "Spain"                        "Sri Lanka"                   
## [21] "Sudan"                        "Suriname"                    
## [23] "Swaziland"                    "Sweden"                      
## [25] "Switzerland"                  "Syria"

Or count how many countries begin with “S”. str_count() returns the number of matches in each string, so every country scores either 0 or 1:

str_count(alcohol$country, "^S.+")

##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [36] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [71] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [106] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [141] 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
## [176] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

# Sum the per-country counts to get the total
sum(str_count(alcohol$country, "^S.+"))

## [1] 26
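Because a country name can match the anchored pattern at most once, the count and a simple yes/no test give the same total. The sketch below shows two other stringr functions, str_detect() and str_subset(), that express this intent more directly. It uses a short made-up vector, `countries`, which is an assumption for illustration and not the full dataset.

```r
library(stringr)

# Hypothetical sample of country names
countries <- c("Samoa", "Peru", "Spain", "Norway", "Sudan")

# TRUE for each name that starts with "S"
starts_with_s <- str_detect(countries, "^S")

# Summing a logical vector counts the TRUE values
n_s <- sum(starts_with_s)
n_s

# Return only the matching names, without unlist()
s_countries <- str_subset(countries, "^S")
s_countries
```

On the real data, the same calls would take alcohol$country in place of `countries`.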

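Another use is converting all the column names to title case in one call instead of editing each one individually. Note that str_to_title() treats an underscore-joined name as a single word, so only its first letter is capitalized.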

#original 
names(alcohol)

## [1] "country"                      "beer_servings"               
## [3] "spirit_servings"              "wine_servings"               
## [5] "total_litres_of_pure_alcohol"

kable(head(alcohol)) %>% kable_styling()

country	beer_servings	spirit_servings	wine_servings	total_litres_of_pure_alcohol
Afghanistan	0	0	0	0.0
Albania	89	132	54	4.9
Algeria	25	0	14	0.7
Andorra	245	138	312	12.4
Angola	217	57	45	5.9
Antigua & Barbuda	102	128	45	4.9

# Convert to title case
names(alcohol) <- str_to_title(names(alcohol), locale = "en")
names(alcohol)

## [1] "Country"                      "Beer_servings"               
## [3] "Spirit_servings"              "Wine_servings"               
## [5] "Total_litres_of_pure_alcohol"

kable(head(alcohol)) %>% kable_styling()

Country	Beer_servings	Spirit_servings	Wine_servings	Total_litres_of_pure_alcohol
Afghanistan	0	0	0	0.0
Albania	89	132	54	4.9
Algeria	25	0	14	0.7
Andorra	245	138	312	12.4
Angola	217	57	45	5.9
Antigua & Barbuda	102	128	45	4.9
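If fully title-cased labels are wanted, for example for a table header, one option is to replace the underscores with spaces first so that each word is capitalized. This is a sketch of that idea and not what the code above does; `col_names` and `pretty_names` are hypothetical names, with values copied from the original column names.

```r
library(stringr)

# A few of the original column names
col_names <- c("country", "beer_servings", "total_litres_of_pure_alcohol")

# Replace underscores with spaces, then capitalize each word
pretty_names <- str_to_title(str_replace_all(col_names, "_", " "))
pretty_names
```

Names containing spaces must be wrapped in backticks when referenced in code, so this form is better suited to display labels than to working column names.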

TidyVerse: Purrr and Stringr

Javern Wilson

April 24, 2019

The stringr package has many other functions, which are described in its documentation.