Let’s start by loading the libraries:
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tidyr)
library(dplyr)
library(fivethirtyeight)
Instructions: We are going to be using the drinks dataset from the fivethirtyeight package (which you will need to install), as reported in Mona Chalabi’s article “Dear Mona Followup: Where Do People Drink The Most Beer, Wine, and Spirits?”
Replicate, as best you can, the horizontal bar chart for the four countries shown below. Hint: you will need to convert from wide to long. Publish your chart and code as a report to RPubs and link your report in the submission text below.
Grading criteria:
perfect replication - 100 points
near-perfect replication (i.e., minor differences between the chart shown and your submission) - 90 points
the chart shows correct data, but not a near-perfect replication - 80 points
chart visually represents the data correctly but significantly differs from the one shown - 70 points
chart visually represents the data incorrectly and significantly differs from the one shown - 60 points
there is no chart but a long data frame was created with the proper data - 50 points
Now, let’s explore the dataset by opening its help page:
?drinks
The help page opened by the command above shows that drinks is a data frame with 193 rows, one per country, and 5 variables:
country country
beer_servings Servings of beer in average serving sizes per person
spirit_servings Servings of spirits in average serving sizes per person
wine_servings Servings of wine in average serving sizes per person
total_litres_of_pure_alcohol Total litres of pure alcohol per person
Now, let’s do some data wrangling with dplyr: filter the four countries we want, drop the total column, shorten the remaining column names, and store the result in drinks_Subgroup:
drinks_Subgroup <- drinks %>%
  filter(country %in% c("USA", "Seychelles", "Iceland", "Greece")) %>%
  select(-total_litres_of_pure_alcohol) %>%
  rename(beer = beer_servings, spirit = spirit_servings, wine = wine_servings)
drinks_Subgroup
## # A tibble: 4 x 4
## country beer spirit wine
## <chr> <int> <int> <int>
## 1 Greece 133 112 218
## 2 Iceland 233 61 78
## 3 Seychelles 157 25 51
## 4 USA 249 158 84
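Because filter() with %in% silently drops any name it cannot match (a spelling such as "United States" instead of "USA" would simply vanish), it can help to confirm that every requested country was found. The following is a minimal sketch, not part of the assignment: missing_countries() is a hypothetical helper, and toy is a stand-in rebuilt by hand from the output printed above so that the example runs without the fivethirtyeight package.

```r
library(tibble)

# Hypothetical helper: returns the requested names that do not occur
# in the country column of `data`.
missing_countries <- function(data, wanted) {
  setdiff(wanted, data$country)
}

# Stand-in for drinks_Subgroup, copied from the printed tibble above.
toy <- tribble(
  ~country,     ~beer, ~spirit, ~wine,
  "Greece",      133L,    112L,  218L,
  "Iceland",     233L,     61L,   78L,
  "Seychelles",  157L,     25L,   51L,
  "USA",         249L,    158L,   84L
)

wanted <- c("USA", "Seychelles", "Iceland", "Greece")

# An empty result means every requested country was matched.
missing_countries(toy, wanted)

# A misspelled or alternative name shows up in the result.
missing_countries(toy, c(wanted, "United States"))
```

In the real workflow the same call would be made as missing_countries(drinks, wanted) before filtering.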
Now, let’s convert our data from wide to long (tidy) format, so that each row holds one country and one type of drink:
drinks_smaller_tidy <- drinks_Subgroup %>%
  pivot_longer(cols = -country,
               names_to = "type",
               values_to = "servings")
drinks_smaller_tidy
## # A tibble: 12 x 3
## country type servings
## <chr> <chr> <int>
## 1 Greece beer 133
## 2 Greece spirit 112
## 3 Greece wine 218
## 4 Iceland beer 233
## 5 Iceland spirit 61
## 6 Iceland wine 78
## 7 Seychelles beer 157
## 8 Seychelles spirit 25
## 9 Seychelles wine 51
## 10 USA beer 249
## 11 USA spirit 158
## 12 USA wine 84
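One way to convince yourself that the reshape lost nothing is to pivot the long table back to wide and compare it with the starting point. This is a sketch under assumptions: wide is a hand-copied stand-in for drinks_Subgroup taken from the output printed earlier, and the names long and round_trip are illustrative only.

```r
library(tibble)
library(tidyr)

# Stand-in for drinks_Subgroup, copied from the printed output.
wide <- tribble(
  ~country,     ~beer, ~spirit, ~wine,
  "Greece",      133L,    112L,  218L,
  "Iceland",     233L,     61L,   78L,
  "Seychelles",  157L,     25L,   51L,
  "USA",         249L,    158L,   84L
)

# Wide to long: one row per country and drink type.
long <- pivot_longer(wide,
                     cols = -country,
                     names_to = "type",
                     values_to = "servings")

# Long back to wide: should reproduce the original table.
round_trip <- pivot_wider(long,
                          names_from = type,
                          values_from = servings)

# 4 countries x 3 drink types should give 12 rows, and the round trip
# should match the starting table; stopifnot() errors if either fails.
stopifnot(nrow(long) == 12)
stopifnot(isTRUE(all.equal(as.data.frame(wide), as.data.frame(round_trip))))
```

If either check fails, the usual culprits are a column accidentally included in cols or duplicated country rows.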
Now, let’s plot our data:
library(ggplot2)
ggplot(drinks_smaller_tidy, aes(x = country, y = servings, fill = type)) +
  geom_col(position = "dodge") +
  coord_flip()
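The plot above shows the right data, but coord_flip() places the first country at the bottom of the axis, and the default axis titles are just the column names. The sketch below shows one possible way to adjust both. The target chart is not reproduced in this report, so the country order and the label text used here are assumptions rather than a confirmed match; long is a stand-in for drinks_smaller_tidy rebuilt from the printed output.

```r
library(ggplot2)

# Stand-in for drinks_smaller_tidy, copied from the printed output.
long <- data.frame(
  country  = rep(c("Greece", "Iceland", "Seychelles", "USA"), each = 3),
  type     = rep(c("beer", "spirit", "wine"), times = 4),
  servings = c(133L, 112L, 218L,
               233L,  61L,  78L,
               157L,  25L,  51L,
               249L, 158L,  84L)
)

# After coord_flip() the first factor level is drawn at the bottom,
# so reverse the alphabetical levels to read Greece to USA from the top.
long$country <- factor(long$country,
                       levels = rev(sort(unique(long$country))))

p <- ggplot(long, aes(x = country, y = servings, fill = type)) +
  geom_col(position = "dodge") +
  coord_flip() +
  # Label text is an assumption, not taken from the original chart.
  labs(x = NULL, y = "Servings per person", fill = "Type")

p
```

Colours, fonts, and the legend position would still need tuning by eye against the chart being replicated.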