We are going to be using the drinks dataset from the fivethirtyeight package (you will need to install) reported in Mona Chalabi’s article “Dear Mona Followup: Where Do People Drink The Most Beer, Wine, and Spirits?”
Replicate, as best you can, the horizontal bar chart for the four countries shown below. Hint: you will need to convert from wide to long. Publish your chart and code as a report to RPubs and link your report in the submission text below.
Grading criteria:
Assignment Due: Wednesday, November 25 Monday, November 30 at 11:59 PM
The fivethirtyeight dataset was obtained from RStudio’s Tools Install Packages menu.
See if the new dataset can be used without changes in next section.
# One-time 'fivethirtyeight' package installation from interface, then load library:
library(fivethirtyeight)
# Examine drinks tibble
library(tidyverse)
names(drinks)
## [1] "country" "beer_servings"
## [3] "spirit_servings" "wine_servings"
## [5] "total_litres_of_pure_alcohol"
head(drinks)
## # A tibble: 6 x 5
## country beer_servings spirit_servings wine_servings total_litres_of_pure…
## <chr> <int> <int> <int> <dbl>
## 1 Afghanistan 0 0 0 0
## 2 Albania 89 132 54 4.9
## 3 Algeria 25 0 14 0.7
## 4 Andorra 245 138 312 12.4
## 5 Angola 217 57 45 5.9
## 6 Antigua & B… 102 128 45 4.9
It looks like the new dataset is not immediately useful. According to E.F. Codd’s third normal form, tidy data should have the following attributes:
We already have a tibble with observations, but according to the sample chart, we only need to show country, type (of alcohol consumed), and servings (per type). But the drinks dataset has separate columns for spirits, beer and wine, making it untidy.
So, we should reshape the drinks tibble by “converting from wide to long”. This can be done by gathering these untidy columns into key/value column pairs of type and servings to tidy the dataset.
Then the spirits, beer, and wine columns, with their respective values will be transformed into rows, becoming observations of type and servings variables. The new tibble will qualify as tidy.
Click here to see the data after it has been cleaned up a bit.
# Tidy the tibble by gathering the types and servings, and drop the totals.
drinks2 <- gather(drinks, country,
key = type, value = "servings",
c(beer_servings, spirit_servings, wine_servings),
total_litres_of_pure_alcohol,
-c(country, total_litres_of_pure_alcohol)) %>%
select(country, type, servings)
# Examine the gathered drinks2 tibble
head(drinks2)
## # A tibble: 6 x 3
## country type servings
## <chr> <chr> <int>
## 1 Afghanistan beer_servings 0
## 2 Albania beer_servings 89
## 3 Algeria beer_servings 25
## 4 Andorra beer_servings 245
## 5 Angola beer_servings 217
## 6 Antigua & Barbuda beer_servings 102
Note: After gathering and selecting the desired data format, we now have a new drinks tibble with:
The next step will be to use the new tidy drinks tibble to complete the data cleanup.
Steps required to match the sample chart:
# Complete the dataset cleanup, and select the four required countries. Remove the _servings text from the type observations. Sort the results.
result <- drinks2 %>%
group_by(country) %>%
filter(country %in% c('USA', 'Seychelles', 'Iceland', 'Greece')) %>%
mutate(type = str_replace(type, "_servings", "")) %>%
arrange(desc(country)) %>%
select(country, type, servings) %>%
rowid_to_column(var = "Observation")
The resulting drinks data subset should be useful now. To confirm this, Table 1 (below) shows the results by country.
Click here to skip to the final step and see if the data produces a charts matching the sample!
# The resulting table displays beer, spirits, and wine servings observations for each of our four selected contries.
library(kableExtra)
kable(result,"html", align = 'cllc', escape = TRUE,
caption = "Table 1. Alcohol consumption observations for the USA and selected countries.",
col.names = c("Observation", "Country","Alcohol Beverage Type","# of Servings")) %>%
column_spec(column = 1, width = ".5in") %>%
column_spec(column = 2, width = ".1in") %>%
column_spec(column = 3, width = ".5in") %>%
column_spec(column = 4, width = ".5in") %>%
kable_paper("hover", full_width = F)
| Observation | Country | Alcohol Beverage Type | # of Servings |
|---|---|---|---|
| 1 | USA | beer | 249 |
| 2 | USA | spirit | 158 |
| 3 | USA | wine | 84 |
| 4 | Seychelles | beer | 157 |
| 5 | Seychelles | spirit | 25 |
| 6 | Seychelles | wine | 51 |
| 7 | Iceland | beer | 233 |
| 8 | Iceland | spirit | 61 |
| 9 | Iceland | wine | 78 |
| 10 | Greece | beer | 133 |
| 11 | Greece | spirit | 112 |
| 12 | Greece | wine | 218 |
It looks like the displayed data in the new chart below is a perfect match for the sample chart!
The resulting report is complete and ready to publish.
# We use the new drinks subset result in a graph that should match the sample provided with the assignment.
library(ggplot2)
ggplot(result) +
geom_col(aes(x=servings, y=country, fill=type),
position = "dodge", show.legend = TRUE) +
theme(panel.background = element_rect(fill = "#EBEBEB"),
panel.grid.major = element_line(size = 0.5,
linetype = 'solid', colour = "white"),
panel.grid.minor = element_line(size = 0.25,
linetype = 'solid',colour = "white"))