Assignment 4 - Dear Mona Followup - Where do people drink the most beer, wine and spirits?

Assignment 4 Description

We are going to be using the drinks dataset from the fivethirtyeight package (you will need to install) reported in Mona Chalabi’s article “Dear Mona Followup: Where Do People Drink The Most Beer, Wine, and Spirits?”

Replicate, as best you can, the horizontal bar chart for the four countries shown below. Hint: you will need to convert from wide to long. Publish your chart and code as a report to RPubs and link your report in the submission text below.

Grading criteria:

perfect replication - 100 points
near-perfect replication (i.e., minor differences between the chart shown and your submission) - 90 points
the chart shows correct data, but not a near-perfect replication - 80 points
chart visually represents the data correctly but significantly differs from the one shown - 70 points
chart visually represents the data incorrectly and significantly differs from the one shown - 60 points
there is no chart but a long data frame was created with the proper data - 50 points

Assignment Due: ~~Wednesday, November 25~~ Monday, November 30 at 11:59 PM

Acquiring the Drinks Dataset

The fivethirtyeight dataset was obtained from RStudio’s Tools Install Packages menu.

See if the new dataset can be used without changes in next section.

# One-time 'fivethirtyeight' package installation from interface, then load library:
library(fivethirtyeight)

# Examine drinks tibble
library(tidyverse)
names(drinks)

## [1] "country"                      "beer_servings"               
## [3] "spirit_servings"              "wine_servings"               
## [5] "total_litres_of_pure_alcohol"

head(drinks)

## # A tibble: 6 x 5
##   country      beer_servings spirit_servings wine_servings total_litres_of_pure…
##   <chr>                <int>           <int>         <int>                 <dbl>
## 1 Afghanistan              0               0             0                   0  
## 2 Albania                 89             132            54                   4.9
## 3 Algeria                 25               0            14                   0.7
## 4 Andorra                245             138           312                  12.4
## 5 Angola                 217              57            45                   5.9
## 6 Antigua & B…           102             128            45                   4.9

Drinks are Untidy!

It looks like the new dataset is not immediately useful. According to E.F. Codd’s third normal form, tidy data should have the following attributes:

Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.

We already have a tibble with observations, but according to the sample chart, we only need to show country, type (of alcohol consumed), and servings (per type). But the drinks dataset has separate columns for spirits, beer and wine, making it untidy.

So, we should reshape the drinks tibble by “converting from wide to long”. This can be done by gathering these untidy columns into key/value column pairs of type and servings to tidy the dataset.

Then the spirits, beer, and wine columns, with their respective values will be transformed into rows, becoming observations of type and servings variables. The new tibble will qualify as tidy.

Click here to see the data after it has been cleaned up a bit.

# Tidy the tibble by gathering the types and servings, and drop the totals.
drinks2 <- gather(drinks, country, 
    key = type, value = "servings", 
    c(beer_servings, spirit_servings, wine_servings),
    total_litres_of_pure_alcohol, 
    -c(country, total_litres_of_pure_alcohol)) %>%
  select(country, type, servings)

# Examine the gathered drinks2 tibble
head(drinks2)

## # A tibble: 6 x 3
##   country           type          servings
##   <chr>             <chr>            <int>
## 1 Afghanistan       beer_servings        0
## 2 Albania           beer_servings       89
## 3 Algeria           beer_servings       25
## 4 Andorra           beer_servings      245
## 5 Angola            beer_servings      217
## 6 Antigua & Barbuda beer_servings      102

Note: After gathering and selecting the desired data format, we now have a new drinks tibble with:

Only the variables we need: country, type, servings.
Rows and columns of 579, 3 (transformed from 193, 5). This confirms that all of the original servings columns have been converted to rows, and we have dropped all unnecessary columns.

Data Cleanup, Aisle 2.

The next step will be to use the new tidy drinks tibble to complete the data cleanup.

Steps required to match the sample chart:

Filter for USA, Seychelles, Iceland, and Greece.
Mutate the "_servings" text from the gathered observations.
Arrange the result tibble by country, in descending order.
Select only necessary columns: country, type, and servings.
Add an observation number for easy reference.
Display the resulting subset: Table 1. Alcohol consumption observations for the USA and selected countries.

# Complete the dataset cleanup, and select the four required countries. Remove the _servings text from the type observations. Sort the results.
result <- drinks2 %>% 
  group_by(country) %>% 
  filter(country %in% c('USA', 'Seychelles', 'Iceland', 'Greece')) %>% 
  mutate(type = str_replace(type, "_servings", "")) %>% 
  arrange(desc(country)) %>% 
  select(country, type, servings) %>% 
  rowid_to_column(var = "Observation")

Tidy Drinks On Display.

The resulting drinks data subset should be useful now. To confirm this, Table 1 (below) shows the results by country.

Click here to skip to the final step and see if the data produces a charts matching the sample!

# The resulting table displays beer, spirits, and wine servings observations for each of our four selected contries. 
library(kableExtra)
kable(result,"html", align = 'cllc', escape = TRUE, 
        caption = "Table 1. Alcohol consumption observations for the USA and selected countries.",
        col.names = c("Observation", "Country","Alcohol Beverage Type","# of Servings")) %>% 
  column_spec(column = 1, width = ".5in") %>% 
  column_spec(column = 2, width = ".1in") %>% 
  column_spec(column = 3, width = ".5in") %>% 
  column_spec(column = 4, width = ".5in") %>% 
  kable_paper("hover", full_width = F)

Table 1. Alcohol consumption observations for the USA and selected countries.
Observation	Country	Alcohol Beverage Type	# of Servings
1	USA	beer	249
2	USA	spirit	158
3	USA	wine	84
4	Seychelles	beer	157
5	Seychelles	spirit	25
6	Seychelles	wine	51
7	Iceland	beer	233
8	Iceland	spirit	61
9	Iceland	wine	78
10	Greece	beer	133
11	Greece	spirit	112
12	Greece	wine	218

Conclusion: Charting the Course.

It looks like the displayed data in the new chart below is a perfect match for the sample chart!

The servings are on the x-axis, and the countries are on the y-axis.
Each country has columns for beer, spirit and wine.
Colors are the same for columns, the background panel, and lines.
Labels are the same for both charts.
The same legend displayed in the sample.

The resulting report is complete and ready to publish.

# We use the new drinks subset result in a graph that should match the sample provided with the assignment.
library(ggplot2)
ggplot(result) +
  geom_col(aes(x=servings, y=country, fill=type), 
           position = "dodge", show.legend = TRUE) +
  theme(panel.background = element_rect(fill = "#EBEBEB"),
        panel.grid.major = element_line(size = 0.5,
                    linetype = 'solid', colour = "white"),
        panel.grid.minor = element_line(size = 0.25,
                    linetype = 'solid',colour = "white"))