Join the R4DS Online Learning Community in the weekly #TidyTuesday event! Every week we post a raw dataset, a chart or article related to that dataset, and ask you to explore the data. While the dataset will be “tamed”, it will not always be tidy! As such, you might need to apply various R for Data Science techniques to wrangle the data into a true tidy format. The goal of TidyTuesday is to apply your R skills, get feedback, explore others’ work, and connect with the greater #RStats community! We encourage everyone, at every skill level, to participate!
library(tidyverse)
library(tidytuesdayR)
tt <- tt_load("2021-01-26")
##
## Downloading file 1 of 1: `plastics.csv`
write.csv(tt$plastics, "plastics.csv")
plastics <- read.csv("plastics.csv")
Take an initial look at the format of the data available.
tt %>%
  map(glimpse)
## Rows: 13,380
## Columns: 14
## $ country <chr> "Argentina", "Argentina", "Argentina", "Argentina", "A…
## $ year <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, …
## $ parent_company <chr> "Grand Total", "Unbranded", "The Coca-Cola Company", "…
## $ empty <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ hdpe <dbl> 215, 155, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ ldpe <dbl> 55, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ o <dbl> 607, 532, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 0, 0, 0…
## $ pet <dbl> 1376, 848, 222, 39, 38, 22, 21, 26, 19, 14, 14, 14, 14…
## $ pp <dbl> 281, 122, 35, 4, 0, 7, 6, 0, 1, 4, 3, 1, 0, 0, 3, 0, 4…
## $ ps <dbl> 116, 114, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ pvc <dbl> 18, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ grand_total <dbl> 2668, 1838, 257, 43, 38, 29, 27, 26, 20, 18, 17, 15, 1…
## $ num_events <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, …
## $ volunteers <dbl> 243, 243, 243, 243, 243, 243, 243, 243, 243, 243, 243,…
## $plastics
## # A tibble: 13,380 x 14
## country year parent_company empty hdpe ldpe o pet pp ps pvc
## <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Argent… 2019 Grand Total 0 215 55 607 1376 281 116 18
## 2 Argent… 2019 Unbranded 0 155 50 532 848 122 114 17
## 3 Argent… 2019 The Coca-Cola… 0 0 0 0 222 35 0 0
## 4 Argent… 2019 Secco 0 0 0 0 39 4 0 0
## 5 Argent… 2019 Doble Cola 0 0 0 0 38 0 0 0
## 6 Argent… 2019 Pritty 0 0 0 0 22 7 0 0
## 7 Argent… 2019 PepsiCo 0 0 0 0 21 6 0 0
## 8 Argent… 2019 Casoni 0 0 0 0 26 0 0 0
## 9 Argent… 2019 Villa Del Sur… 0 0 0 0 19 1 0 0
## 10 Argent… 2019 Manaos 0 0 0 0 14 4 0 0
## # … with 13,370 more rows, and 3 more variables: grand_total <dbl>,
## # num_events <dbl>, volunteers <dbl>
Explore the data and process it into a nice format for plotting! Access each dataset by name by typing a dollar sign after the tt object followed by the name of the dataset (for example, tt$plastics).
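As a small sketch of that access pattern, the stand-in below mimics the object returned by `tt_load()` as a plain named list holding one tibble per file. `tt_demo` and `plastics_demo` are hypothetical names, and the two rows are copied from the glimpse output above; with the real object you would simply write `tt$plastics`.

```r
library(tibble)

# Hypothetical stand-in for the object returned by tt_load():
# a named list with one tibble per dataset file
tt_demo <- list(
  plastics = tibble(
    country = c("Argentina", "Argentina"),
    parent_company = c("Grand Total", "Unbranded"),
    grand_total = c(2668, 1838)
  )
)

# List the dataset names that are available
names(tt_demo)

# Dollar sign followed by the dataset name
plastics_demo <- tt_demo$plastics

# Equivalent double-bracket form, handy when the name is stored in a variable
same_thing <- tt_demo[["plastics"]]
```

The `[[` form does the same lookup as `$` but accepts a string, which helps when looping over several datasets in one week.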
Bad plot below... pretty plot even further down!
plastics %>%
  filter(country == "United States of America") %>%
  filter(parent_company != "Unbranded", parent_company != "null", parent_company != "Grand Total") %>%
  group_by(year) %>%
  arrange(desc(grand_total)) %>%
  # with no wt, top_n() ranks by the last column (volunteers), which repeats
  # on every row of a country-year, so rows tie and few or none are dropped
  top_n(7) %>%
  # grand_total is pivoted alongside the polymer columns, so bars double count
  pivot_longer(empty:grand_total, names_to = "plastic_type", values_to = "count") %>%
  ggplot(aes(x = reorder(parent_company, -count), y = count)) +
  geom_bar(aes(fill = plastic_type), stat = "identity")
## Warning: Removed 584 rows containing missing values (position_stack).
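Part of what makes this first plot unhelpful is `top_n(7)` called without a `wt` argument: dplyr then ranks by the last column of the data, which here is `volunteers`. The toy data below (hypothetical values, not taken from the plastics table) sketches the difference and also shows `slice_max()`, which newer dplyr releases suggest in place of the superseded `top_n()`.

```r
library(dplyr)

# Hypothetical toy data: volunteers is constant, as it is within a
# country-year in the plastics table
toy <- tibble(
  parent_company = c("A", "B", "C", "D"),
  grand_total = c(50, 40, 30, 20),
  volunteers = c(243, 243, 243, 243)
)

# No wt: ranks by the last column (volunteers); every row ties, so all are kept
by_default <- toy %>% top_n(2)

# Naming the ranking column keeps only the two largest totals
by_total <- toy %>% top_n(2, wt = grand_total)

# slice_max() does the same job and returns rows sorted by grand_total
by_slice <- toy %>% slice_max(grand_total, n = 2)
```

Passing the ranking column explicitly also makes the pipeline robust to later changes in column order, such as dropping `num_events` and `volunteers`.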
plastics <- readr::read_csv("plastics.csv") %>%
  select(-num_events, -volunteers)
## Warning: Missing column names filled in: 'X1' [1]
devtools::install_github("ciannabp/inauguration")
library(inauguration)
inauguration("inauguration_2021_bernie")
plastics %>%
  # US rows from the 2019 audit only
  filter(country == "United States of America", year == 2019) %>%
  filter(!parent_company %in% c("Unbranded", "null", "Grand Total")) %>%
  # seven companies with the largest grand_total (ties may add rows)
  top_n(7, wt = grand_total) %>%
  mutate(company_ordered = fct_reorder(parent_company, -grand_total)) %>%
  pivot_longer(empty:grand_total, names_to = "plastic_type", values_to = "count") %>%
  # drop the total so each bar is not counted twice
  filter(plastic_type != "grand_total") %>%
  ggplot(aes(x = company_ordered, y = count)) +
  geom_col(aes(fill = plastic_type)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  scale_fill_manual(values = inauguration("inauguration_2021_bernie")) +
  xlab("Company Name")
## Warning: Removed 7 rows containing missing values (position_stack).
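The `filter(plastic_type != "grand_total")` step matters because `empty:grand_total` pivots the total alongside the individual polymer columns, so stacking every long row would draw each bar at roughly twice its height. A minimal sketch with a hypothetical one-company row (made-up counts, assuming the polymer columns add up to `grand_total`):

```r
library(dplyr)
library(tidyr)

# Hypothetical wide row: three polymer counts plus their total
toy <- tibble(
  parent_company = "Example Co",
  hdpe = 10, pet = 25, pp = 5,
  grand_total = 40
)

# Pivot every column from hdpe through grand_total into long format
long_all <- toy %>%
  pivot_longer(hdpe:grand_total, names_to = "plastic_type", values_to = "count")

# Keep only the individual plastic types
long_types <- long_all %>%
  filter(plastic_type != "grand_total")

sum(long_all$count)    # 80: the total is stacked on top of its own parts
sum(long_types$count)  # 40: matches grand_total
```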