Let’s say we have a dataset of alcohol consumption by country and we want to find the mean consumption of beer, spirits, wine and pure alcohol. The dataset was retrieved from FiveThirtyEight.
alcohol <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv", sep = ",", stringsAsFactors = FALSE)
With base R we would type:
beer <- mean(alcohol$beer_servings)
spirit <- mean(alcohol$spirit_servings)
wine <- mean(alcohol$wine_servings)
pure <- mean(alcohol$total_litres_of_pure_alcohol)
c(beer, spirit, wine, pure)
## [1] 106.160622 80.994819 49.450777 4.717098
As you can see, writing code like this is not very efficient: it involves a lot of copying and pasting, which makes errors more likely. To reduce this repetition, we can use purrr. purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors. In particular, purrr allows you to map functions to data.
Appropriately, the basic function in purrr is called map(). It transforms the input by applying a function to each element and returns a list the same length as the input. Typed variants such as map_dbl() return an atomic vector instead (here, a double vector).
We can use map_dbl() to perform the same computations as above.
map_dbl(alcohol[, c(2, 3, 4, 5)], mean)
## beer_servings spirit_servings
## 106.160622 80.994819
## wine_servings total_litres_of_pure_alcohol
## 49.450777 4.717098
Pipes (%>%) can also be used:
alcohol[, c(2, 3, 4, 5)] %>% map_dbl(mean)
## beer_servings spirit_servings
## 106.160622 80.994819
## wine_servings total_litres_of_pure_alcohol
## 49.450777 4.717098
The stringr package is used to manipulate strings. For instance, we can gather the countries beginning with the letter “S” using regular expressions (regexes). Patterns in stringr are interpreted as regexes by default.
unlist(str_extract_all(alcohol$country, "^S.+"))
## [1] "South Korea" "St. Kitts & Nevis"
## [3] "St. Lucia" "St. Vincent & the Grenadines"
## [5] "Samoa" "San Marino"
## [7] "Sao Tome & Principe" "Saudi Arabia"
## [9] "Senegal" "Serbia"
## [11] "Seychelles" "Sierra Leone"
## [13] "Singapore" "Slovakia"
## [15] "Slovenia" "Solomon Islands"
## [17] "Somalia" "South Africa"
## [19] "Spain" "Sri Lanka"
## [21] "Sudan" "Suriname"
## [23] "Swaziland" "Sweden"
## [25] "Switzerland" "Syria"
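Or count how many countries begin with “S”. str_count() returns the number of matches in each string, so with an anchored pattern every country gets either 0 or 1: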
str_count(alcohol$country, "^S.+")
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [36] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [106] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [141] 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
## [176] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#Sum up
sum(str_count(alcohol$country, "^S.+"))
## [1] 26
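As an alternative sketch, str_detect() and str_subset() express the same idea more directly: the first returns a logical vector and the second returns only the matching strings. The vector `countries` below is a small made-up sample, not the full country column.

```r
library(stringr)

# Hypothetical sample of country names
countries <- c("Spain", "Sweden", "Albania", "South Africa", "Russia")

# TRUE for each name that starts with a capital "S"
starts_with_s <- str_detect(countries, "^S")

# Summing a logical vector counts the TRUE values
n_s <- sum(starts_with_s)

# Keep only the matching names
s_countries <- str_subset(countries, "^S")

n_s
s_countries
```

Since str_subset() already returns a plain character vector, there is no need for unlist() as with str_extract_all().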
Another use is to convert all the column names to title case in one call instead of renaming each column individually:
#original
names(alcohol)
## [1] "country" "beer_servings"
## [3] "spirit_servings" "wine_servings"
## [5] "total_litres_of_pure_alcohol"
kable(head(alcohol)) %>% kable_styling()
country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol |
---|---|---|---|---|
Afghanistan | 0 | 0 | 0 | 0.0 |
Albania | 89 | 132 | 54 | 4.9 |
Algeria | 25 | 0 | 14 | 0.7 |
Andorra | 245 | 138 | 312 | 12.4 |
Angola | 217 | 57 | 45 | 5.9 |
Antigua & Barbuda | 102 | 128 | 45 | 4.9 |
#to title
names(alcohol) <- str_to_title(names(alcohol), locale = "en")
names(alcohol)
## [1] "Country" "Beer_servings"
## [3] "Spirit_servings" "Wine_servings"
## [5] "Total_litres_of_pure_alcohol"
kable(head(alcohol)) %>% kable_styling()
Country | Beer_servings | Spirit_servings | Wine_servings | Total_litres_of_pure_alcohol |
---|---|---|---|---|
Afghanistan | 0 | 0 | 0 | 0.0 |
Albania | 89 | 132 | 54 | 4.9 |
Algeria | 25 | 0 | 14 | 0.7 |
Andorra | 245 | 138 | 312 | 12.4 |
Angola | 217 | 57 | 45 | 5.9 |
Antigua & Barbuda | 102 | 128 | 45 | 4.9 |
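Note that str_to_title() treats a name like beer_servings as a single word, so only its first letter is capitalised. If fully capitalised, space-separated labels are wanted (an assumption about the desired output, for example for table headers), one option is to replace the underscores first. The vector `col_names` below simply repeats three of the original column names.

```r
library(stringr)

# Three of the original column names from the drinks data
col_names <- c("country", "beer_servings", "total_litres_of_pure_alcohol")

# Replace underscores with spaces so each part counts as its own word,
# then convert every word to title case
pretty_names <- str_to_title(str_replace_all(col_names, "_", " "))

pretty_names
```

Names containing spaces are awkward to use in code, so labels like these are usually better kept for display only.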