Javern Wilson (Original) Code location: https://github.com/acatlin/SPRING2019TIDYVERSE/blob/master/DATA607%20Tidyverse%20Assignment.Rmd
Mohamed (Extended) Code location: https://github.com/acatlin/SPRING2019TIDYVERSE/blob/master/tidyverse_task2_mohamed.Rmd
Let’s say we have a dataset of alcohol consumption by country and we want to find the mean consumption of beer, spirits, wine and pure alcohol. The dataset was retrieved from FiveThirtyEight.
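library(tidyverse)   # loads purrr, stringr, dplyr, tidyr and ggplot2
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
alcohol <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv", sep = ",", stringsAsFactors = FALSE)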
alcohol
## country beer_servings spirit_servings
## 1 Afghanistan 0 0
## 2 Albania 89 132
## 3 Algeria 25 0
## 4 Andorra 245 138
## 5 Angola 217 57
## 6 Antigua & Barbuda 102 128
## 7 Argentina 193 25
## 8 Armenia 21 179
## 9 Australia 261 72
## 10 Austria 279 75
## 11 Azerbaijan 21 46
## 12 Bahamas 122 176
## 13 Bahrain 42 63
## 14 Bangladesh 0 0
## 15 Barbados 143 173
## 16 Belarus 142 373
## 17 Belgium 295 84
## 18 Belize 263 114
## 19 Benin 34 4
## 20 Bhutan 23 0
## 21 Bolivia 167 41
## 22 Bosnia-Herzegovina 76 173
## 23 Botswana 173 35
## 24 Brazil 245 145
## 25 Brunei 31 2
## 26 Bulgaria 231 252
## 27 Burkina Faso 25 7
## 28 Burundi 88 0
## 29 Cote d'Ivoire 37 1
## 30 Cabo Verde 144 56
## 31 Cambodia 57 65
## 32 Cameroon 147 1
## 33 Canada 240 122
## 34 Central African Republic 17 2
## 35 Chad 15 1
## 36 Chile 130 124
## 37 China 79 192
## 38 Colombia 159 76
## 39 Comoros 1 3
## 40 Congo 76 1
## 41 Cook Islands 0 254
## 42 Costa Rica 149 87
## 43 Croatia 230 87
## 44 Cuba 93 137
## 45 Cyprus 192 154
## 46 Czech Republic 361 170
## 47 North Korea 0 0
## 48 DR Congo 32 3
## 49 Denmark 224 81
## 50 Djibouti 15 44
## 51 Dominica 52 286
## 52 Dominican Republic 193 147
## 53 Ecuador 162 74
## 54 Egypt 6 4
## 55 El Salvador 52 69
## 56 Equatorial Guinea 92 0
## 57 Eritrea 18 0
## 58 Estonia 224 194
## 59 Ethiopia 20 3
## 60 Fiji 77 35
## 61 Finland 263 133
## 62 France 127 151
## 63 Gabon 347 98
## 64 Gambia 8 0
## 65 Georgia 52 100
## 66 Germany 346 117
## 67 Ghana 31 3
## 68 Greece 133 112
## 69 Grenada 199 438
## 70 Guatemala 53 69
## 71 Guinea 9 0
## 72 Guinea-Bissau 28 31
## 73 Guyana 93 302
## 74 Haiti 1 326
## 75 Honduras 69 98
## 76 Hungary 234 215
## 77 Iceland 233 61
## 78 India 9 114
## 79 Indonesia 5 1
## 80 Iran 0 0
## 81 Iraq 9 3
## 82 Ireland 313 118
## 83 Israel 63 69
## 84 Italy 85 42
## 85 Jamaica 82 97
## 86 Japan 77 202
## 87 Jordan 6 21
## 88 Kazakhstan 124 246
## 89 Kenya 58 22
## 90 Kiribati 21 34
## 91 Kuwait 0 0
## 92 Kyrgyzstan 31 97
## 93 Laos 62 0
## 94 Latvia 281 216
## 95 Lebanon 20 55
## 96 Lesotho 82 29
## 97 Liberia 19 152
## 98 Libya 0 0
## 99 Lithuania 343 244
## 100 Luxembourg 236 133
## 101 Madagascar 26 15
## 102 Malawi 8 11
## 103 Malaysia 13 4
## 104 Maldives 0 0
## 105 Mali 5 1
## 106 Malta 149 100
## 107 Marshall Islands 0 0
## 108 Mauritania 0 0
## 109 Mauritius 98 31
## 110 Mexico 238 68
## 111 Micronesia 62 50
## 112 Monaco 0 0
## 113 Mongolia 77 189
## 114 Montenegro 31 114
## 115 Morocco 12 6
## 116 Mozambique 47 18
## 117 Myanmar 5 1
## 118 Namibia 376 3
## 119 Nauru 49 0
## 120 Nepal 5 6
## 121 Netherlands 251 88
## 122 New Zealand 203 79
## 123 Nicaragua 78 118
## 124 Niger 3 2
## 125 Nigeria 42 5
## 126 Niue 188 200
## 127 Norway 169 71
## 128 Oman 22 16
## 129 Pakistan 0 0
## 130 Palau 306 63
## 131 Panama 285 104
## 132 Papua New Guinea 44 39
## 133 Paraguay 213 117
## 134 Peru 163 160
## 135 Philippines 71 186
## 136 Poland 343 215
## 137 Portugal 194 67
## 138 Qatar 1 42
## 139 South Korea 140 16
## 140 Moldova 109 226
## 141 Romania 297 122
## 142 Russian Federation 247 326
## 143 Rwanda 43 2
## 144 St. Kitts & Nevis 194 205
## 145 St. Lucia 171 315
## 146 St. Vincent & the Grenadines 120 221
## 147 Samoa 105 18
## 148 San Marino 0 0
## 149 Sao Tome & Principe 56 38
## 150 Saudi Arabia 0 5
## 151 Senegal 9 1
## 152 Serbia 283 131
## 153 Seychelles 157 25
## 154 Sierra Leone 25 3
## 155 Singapore 60 12
## 156 Slovakia 196 293
## 157 Slovenia 270 51
## 158 Solomon Islands 56 11
## 159 Somalia 0 0
## 160 South Africa 225 76
## 161 Spain 284 157
## 162 Sri Lanka 16 104
## 163 Sudan 8 13
## 164 Suriname 128 178
## 165 Swaziland 90 2
## 166 Sweden 152 60
## 167 Switzerland 185 100
## 168 Syria 5 35
## 169 Tajikistan 2 15
## 170 Thailand 99 258
## 171 Macedonia 106 27
## 172 Timor-Leste 1 1
## 173 Togo 36 2
## 174 Tonga 36 21
## 175 Trinidad & Tobago 197 156
## 176 Tunisia 51 3
## 177 Turkey 51 22
## 178 Turkmenistan 19 71
## 179 Tuvalu 6 41
## 180 Uganda 45 9
## 181 Ukraine 206 237
## 182 United Arab Emirates 16 135
## 183 United Kingdom 219 126
## 184 Tanzania 36 6
## 185 USA 249 158
## 186 Uruguay 115 35
## 187 Uzbekistan 25 101
## 188 Vanuatu 21 18
## 189 Venezuela 333 100
## 190 Vietnam 111 2
## 191 Yemen 6 0
## 192 Zambia 32 19
## 193 Zimbabwe 64 18
## wine_servings total_litres_of_pure_alcohol
## 1 0 0.0
## 2 54 4.9
## 3 14 0.7
## 4 312 12.4
## 5 45 5.9
## 6 45 4.9
## 7 221 8.3
## 8 11 3.8
## 9 212 10.4
## 10 191 9.7
## 11 5 1.3
## 12 51 6.3
## 13 7 2.0
## 14 0 0.0
## 15 36 6.3
## 16 42 14.4
## 17 212 10.5
## 18 8 6.8
## 19 13 1.1
## 20 0 0.4
## 21 8 3.8
## 22 8 4.6
## 23 35 5.4
## 24 16 7.2
## 25 1 0.6
## 26 94 10.3
## 27 7 4.3
## 28 0 6.3
## 29 7 4.0
## 30 16 4.0
## 31 1 2.2
## 32 4 5.8
## 33 100 8.2
## 34 1 1.8
## 35 1 0.4
## 36 172 7.6
## 37 8 5.0
## 38 3 4.2
## 39 1 0.1
## 40 9 1.7
## 41 74 5.9
## 42 11 4.4
## 43 254 10.2
## 44 5 4.2
## 45 113 8.2
## 46 134 11.8
## 47 0 0.0
## 48 1 2.3
## 49 278 10.4
## 50 3 1.1
## 51 26 6.6
## 52 9 6.2
## 53 3 4.2
## 54 1 0.2
## 55 2 2.2
## 56 233 5.8
## 57 0 0.5
## 58 59 9.5
## 59 0 0.7
## 60 1 2.0
## 61 97 10.0
## 62 370 11.8
## 63 59 8.9
## 64 1 2.4
## 65 149 5.4
## 66 175 11.3
## 67 10 1.8
## 68 218 8.3
## 69 28 11.9
## 70 2 2.2
## 71 2 0.2
## 72 21 2.5
## 73 1 7.1
## 74 1 5.9
## 75 2 3.0
## 76 185 11.3
## 77 78 6.6
## 78 0 2.2
## 79 0 0.1
## 80 0 0.0
## 81 0 0.2
## 82 165 11.4
## 83 9 2.5
## 84 237 6.5
## 85 9 3.4
## 86 16 7.0
## 87 1 0.5
## 88 12 6.8
## 89 2 1.8
## 90 1 1.0
## 91 0 0.0
## 92 6 2.4
## 93 123 6.2
## 94 62 10.5
## 95 31 1.9
## 96 0 2.8
## 97 2 3.1
## 98 0 0.0
## 99 56 12.9
## 100 271 11.4
## 101 4 0.8
## 102 1 1.5
## 103 0 0.3
## 104 0 0.0
## 105 1 0.6
## 106 120 6.6
## 107 0 0.0
## 108 0 0.0
## 109 18 2.6
## 110 5 5.5
## 111 18 2.3
## 112 0 0.0
## 113 8 4.9
## 114 128 4.9
## 115 10 0.5
## 116 5 1.3
## 117 0 0.1
## 118 1 6.8
## 119 8 1.0
## 120 0 0.2
## 121 190 9.4
## 122 175 9.3
## 123 1 3.5
## 124 1 0.1
## 125 2 9.1
## 126 7 7.0
## 127 129 6.7
## 128 1 0.7
## 129 0 0.0
## 130 23 6.9
## 131 18 7.2
## 132 1 1.5
## 133 74 7.3
## 134 21 6.1
## 135 1 4.6
## 136 56 10.9
## 137 339 11.0
## 138 7 0.9
## 139 9 9.8
## 140 18 6.3
## 141 167 10.4
## 142 73 11.5
## 143 0 6.8
## 144 32 7.7
## 145 71 10.1
## 146 11 6.3
## 147 24 2.6
## 148 0 0.0
## 149 140 4.2
## 150 0 0.1
## 151 7 0.3
## 152 127 9.6
## 153 51 4.1
## 154 2 6.7
## 155 11 1.5
## 156 116 11.4
## 157 276 10.6
## 158 1 1.2
## 159 0 0.0
## 160 81 8.2
## 161 112 10.0
## 162 0 2.2
## 163 0 1.7
## 164 7 5.6
## 165 2 4.7
## 166 186 7.2
## 167 280 10.2
## 168 16 1.0
## 169 0 0.3
## 170 1 6.4
## 171 86 3.9
## 172 4 0.1
## 173 19 1.3
## 174 5 1.1
## 175 7 6.4
## 176 20 1.3
## 177 7 1.4
## 178 32 2.2
## 179 9 1.0
## 180 0 8.3
## 181 45 8.9
## 182 5 2.8
## 183 195 10.4
## 184 1 5.7
## 185 84 8.7
## 186 220 6.6
## 187 8 2.4
## 188 11 0.9
## 189 3 7.7
## 190 1 2.0
## 191 0 0.1
## 192 4 2.5
## 193 4 4.7
With base R we would type:
beer <- mean(alcohol$beer_servings)
spirit <- mean(alcohol$spirit_servings)
wine <- mean(alcohol$wine_servings)
pure <- mean(alcohol$total_litres_of_pure_alcohol)
c(beer, spirit, wine, pure)
## [1] 106.160622 80.994819 49.450777 4.717098
As you can see, writing code like this is inefficient: it involves a lot of copying and pasting, and every pasted line is a chance to forget to change a column name. To cut down on this repetition, we can use purrr. purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.
purrr allows you to map functions to data.
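Appropriately, the basic function in purrr is called map(). It applies a function to each element of the input and returns a list of the same length as the input. Typed variants such as map_dbl() do the same but return an atomic vector (here, a double vector) instead of a list.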
We can use this function to perform the same computations as above.
map_dbl(alcohol[, c(2, 3, 4, 5)], mean)
## beer_servings spirit_servings
## 106.160622 80.994819
## wine_servings total_litres_of_pure_alcohol
## 49.450777 4.717098
The same call can also be written with the pipe operator:
alcohol[, c(2, 3, 4, 5)] %>% map_dbl(mean)
## beer_servings spirit_servings
## 106.160622 80.994819
## wine_servings total_litres_of_pure_alcohol
## 49.450777 4.717098
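To make the difference between map() and map_dbl() concrete, here is a minimal sketch on a small made-up data frame (`toy` and its columns are hypothetical, not part of the alcohol dataset). It also shows keep(), which selects the numeric columns by type rather than by position.

```r
library(purrr)

# Hypothetical toy data; not the alcohol dataset
toy <- data.frame(
  label = c("x", "y", "z"),
  a = c(1, 2, 3),
  b = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# keep() retains only the columns for which the predicate is TRUE
numeric_cols <- keep(toy, is.numeric)

# map() always returns a list, one element per column
means_list <- map(numeric_cols, mean)

# map_dbl() returns a named double vector and fails if any result
# is not a single number
means_dbl <- map_dbl(numeric_cols, mean)

print(means_list)
print(means_dbl)
```

Selecting columns with a predicate avoids hard-coded positions such as c(2, 3, 4, 5), which silently break if the column order changes.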
The stringr package is used to manipulate strings. For instance, we can gather the countries whose names begin with the letter “S” using regular expressions (regexes). Patterns in stringr are interpreted as regexes by default.
unlist(str_extract_all(alcohol$country, "^S.+"))
## [1] "South Korea" "St. Kitts & Nevis"
## [3] "St. Lucia" "St. Vincent & the Grenadines"
## [5] "Samoa" "San Marino"
## [7] "Sao Tome & Principe" "Saudi Arabia"
## [9] "Senegal" "Serbia"
## [11] "Seychelles" "Sierra Leone"
## [13] "Singapore" "Slovakia"
## [15] "Slovenia" "Solomon Islands"
## [17] "Somalia" "South Africa"
## [19] "Spain" "Sri Lanka"
## [21] "Sudan" "Suriname"
## [23] "Swaziland" "Sweden"
## [25] "Switzerland" "Syria"
Or we can count how many countries begin with “S”. str_count() returns the number of matches per string, so each country gets a 0 or a 1:
str_count(alcohol$country, "^S.+")
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [36] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [71] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [106] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [141] 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
## [176] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# Sum the per-country counts
sum(str_count(alcohol$country, "^S.+"))
## [1] 26
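As an alternative to summing str_count(), the sketch below uses str_detect() and str_subset(), which test for and extract matching strings directly. The `countries` vector is a hypothetical sample, not the full dataset.

```r
library(stringr)

# Hypothetical sample of country names
countries <- c("Samoa", "Peru", "Spain", "South Korea", "Laos")

# str_detect() returns TRUE/FALSE per string
starts_with_s <- str_detect(countries, "^S")

# Summing a logical vector counts the TRUE values
n_s <- sum(starts_with_s)

# str_subset() keeps only the strings that match the pattern
s_countries <- str_subset(countries, "^S")

print(n_s)
print(s_countries)
```

Because the pattern is anchored with ^, "Laos" is not matched even though it contains an "s".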
Another use is to convert all of the column names to title case in one call instead of renaming them individually.
# original
names(alcohol)
## [1] "country" "beer_servings"
## [3] "spirit_servings" "wine_servings"
## [5] "total_litres_of_pure_alcohol"
kable(head(alcohol)) %>% kable_styling()

| country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol |
|---|---|---|---|---|
| Afghanistan | 0 | 0 | 0 | 0.0 |
| Albania | 89 | 132 | 54 | 4.9 |
| Algeria | 25 | 0 | 14 | 0.7 |
| Andorra | 245 | 138 | 312 | 12.4 |
| Angola | 217 | 57 | 45 | 5.9 |
| Antigua & Barbuda | 102 | 128 | 45 | 4.9 |
# to title case
names(alcohol) <- str_to_title(names(alcohol), locale = "en")
names(alcohol)
## [1] "Country" "Beer_servings"
## [3] "Spirit_servings" "Wine_servings"
## [5] "Total_litres_of_pure_alcohol"
kable(head(alcohol)) %>% kable_styling()

| Country | Beer_servings | Spirit_servings | Wine_servings | Total_litres_of_pure_alcohol |
|---|---|---|---|---|
| Afghanistan | 0 | 0 | 0 | 0.0 |
| Albania | 89 | 132 | 54 | 4.9 |
| Algeria | 25 | 0 | 14 | 0.7 |
| Andorra | 245 | 138 | 312 | 12.4 |
| Angola | 217 | 57 | 45 | 5.9 |
| Antigua & Barbuda | 102 | 128 | 45 | 4.9 |
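As the output above shows, str_to_title() capitalises only the first letter of each underscore-joined name. If fully title-cased display labels are wanted, one option is to replace the underscores with spaces first. This is a sketch on a hypothetical `cols` vector rather than on the data frame itself.

```r
library(stringr)

# Hypothetical vector standing in for names(alcohol)
cols <- c("country", "beer_servings", "total_litres_of_pure_alcohol")

# Replace underscores with spaces so each word is title-cased separately
pretty <- str_to_title(str_replace_all(cols, "_", " "))

print(pretty)
```

Names containing spaces must be wrapped in backticks when used as column names in dplyr code, so labels like these are usually better kept for table headers and plot axes.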
alcohol_Top10 <- alcohol %>%
  arrange(desc(Total_litres_of_pure_alcohol)) %>%
  top_n(10) %>%
  select(Country, Beer_servings, Spirit_servings, Wine_servings)
## Selecting by Total_litres_of_pure_alcohol
# desc() takes a single vector, so Country is passed to arrange() as a separate tie-breaker
alcohol_gather <- alcohol_Top10 %>%
  gather(Type, Servings, Beer_servings, Spirit_servings, Wine_servings) %>%
  arrange(desc(Servings), Country)
kable(alcohol_Top10) %>% kable_styling()

| Country | Beer_servings | Spirit_servings | Wine_servings |
|---|---|---|---|
| Belarus | 142 | 373 | 42 |
| Lithuania | 343 | 244 | 56 |
| Andorra | 245 | 138 | 312 |
| Grenada | 199 | 438 | 28 |
| Czech Republic | 361 | 170 | 134 |
| France | 127 | 151 | 370 |
| Russian Federation | 247 | 326 | 73 |
| Ireland | 313 | 118 | 165 |
| Luxembourg | 236 | 133 | 271 |
| Slovakia | 196 | 293 | 116 |
ggplot(data = alcohol_gather, aes(x = Country, y = Servings, fill = Type)) +
  geom_col() +
  coord_flip() +
  scale_fill_brewer(palette = 12) +
  labs(title = "Top 10 Alcohol Consuming Countries - Servings Type")
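In newer releases of dplyr and tidyr, top_n() and gather() are superseded by slice_max() and pivot_longer(). The sketch below shows what the same reshaping might look like with those functions. It assumes dplyr 1.0.0 or later and tidyr 1.0.0 or later, and uses a hypothetical `toy` tibble with the same column layout rather than the real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same column layout as alcohol
toy <- tibble(
  Country = c("A", "B", "C"),
  Beer_servings = c(100, 300, 200),
  Spirit_servings = c(50, 20, 80),
  Wine_servings = c(10, 90, 40),
  Total_litres_of_pure_alcohol = c(3.1, 9.5, 6.2)
)

# slice_max() names the ordering column explicitly and returns
# rows sorted from largest to smallest
top2 <- toy %>%
  slice_max(Total_litres_of_pure_alcohol, n = 2) %>%
  select(-Total_litres_of_pure_alcohol)

# pivot_longer() stacks the three serving columns into Type/Servings pairs
long <- top2 %>%
  pivot_longer(
    cols = c(Beer_servings, Spirit_servings, Wine_servings),
    names_to = "Type",
    values_to = "Servings"
  ) %>%
  arrange(desc(Servings), Country)

print(long)
```

Naming the ordering column in slice_max() removes the reliance on top_n() picking the last column by default, which is what produced the "Selecting by" message.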