TidyTuesday

Join the R4DS Online Learning Community in the weekly #TidyTuesday event! Every week we post a raw dataset, a chart or article related to that dataset, and ask you to explore the data. While the dataset will be “tamed”, it will not always be tidy! As such, you might need to apply various R for Data Science techniques to wrangle the data into a true tidy format. The goal of TidyTuesday is to apply your R skills, get feedback, explore others’ work, and connect with the greater #RStats community! We encourage everyone of all skill levels to participate!

library(tidyverse)
library(tidytuesdayR)

tt <- tt_load("2021-01-26")

## 
##  Downloading file 1 of 1: `plastics.csv`

write.csv(tt$plastics, "plastics.csv")

plastics <- read.csv("plastics.csv")
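One side effect of the round trip above is worth seeing in isolation: by default `write.csv()` also writes the row names, which come back as an extra unnamed column on re-import (base `read.csv()` calls it `X`; the readr warning further down reports it as `X1`). The following is a minimal sketch on a made-up data frame, not the plastics data; `toy`, `path_default` and `path_clean` are hypothetical names used only for illustration.

```r
# Toy data frame standing in for tt$plastics (values are made up)
toy <- data.frame(
  country = c("Argentina", "Chile"),
  grand_total = c(2668, 100)
)

path_default <- tempfile(fileext = ".csv")
path_clean <- tempfile(fileext = ".csv")

# Default behaviour: row names are written as an unnamed first column
write.csv(toy, path_default)

# row.names = FALSE writes only the actual columns
write.csv(toy, path_clean, row.names = FALSE)

cols_default <- names(read.csv(path_default))
cols_clean <- names(read.csv(path_clean))

print(cols_default)
print(cols_clean)
```

If the extra column is unwanted, passing `row.names = FALSE` when saving is probably the simplest way to avoid it.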

Glimpse Data

Take an initial look at the format of the data available.

tt %>% 
  map(glimpse)

## Rows: 13,380
## Columns: 14
## $ country        <chr> "Argentina", "Argentina", "Argentina", "Argentina", "A…
## $ year           <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, …
## $ parent_company <chr> "Grand Total", "Unbranded", "The Coca-Cola Company", "…
## $ empty          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ hdpe           <dbl> 215, 155, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ ldpe           <dbl> 55, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ o              <dbl> 607, 532, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 0, 0, 0…
## $ pet            <dbl> 1376, 848, 222, 39, 38, 22, 21, 26, 19, 14, 14, 14, 14…
## $ pp             <dbl> 281, 122, 35, 4, 0, 7, 6, 0, 1, 4, 3, 1, 0, 0, 3, 0, 4…
## $ ps             <dbl> 116, 114, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ pvc            <dbl> 18, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ grand_total    <dbl> 2668, 1838, 257, 43, 38, 29, 27, 26, 20, 18, 17, 15, 1…
## $ num_events     <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, …
## $ volunteers     <dbl> 243, 243, 243, 243, 243, 243, 243, 243, 243, 243, 243,…

## $plastics
## # A tibble: 13,380 x 14
##    country  year parent_company empty  hdpe  ldpe     o   pet    pp    ps   pvc
##    <chr>   <dbl> <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Argent…  2019 Grand Total        0   215    55   607  1376   281   116    18
##  2 Argent…  2019 Unbranded          0   155    50   532   848   122   114    17
##  3 Argent…  2019 The Coca-Cola…     0     0     0     0   222    35     0     0
##  4 Argent…  2019 Secco              0     0     0     0    39     4     0     0
##  5 Argent…  2019 Doble Cola         0     0     0     0    38     0     0     0
##  6 Argent…  2019 Pritty             0     0     0     0    22     7     0     0
##  7 Argent…  2019 PepsiCo            0     0     0     0    21     6     0     0
##  8 Argent…  2019 Casoni             0     0     0     0    26     0     0     0
##  9 Argent…  2019 Villa Del Sur…     0     0     0     0    19     1     0     0
## 10 Argent…  2019 Manaos             0     0     0     0    14     4     0     0
## # … with 13,370 more rows, and 3 more variables: grand_total <dbl>,
## #   num_events <dbl>, volunteers <dbl>

Wrangle

Explore the data and process it into a nice format for plotting! Access each dataset by name by using a dollar sign after the tt object, followed by the name of the dataset (for example, tt$plastics).
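The dollar-sign access described above works the same way as on any named list. This is a small sketch using a plain list as a stand-in for the tt object; `tt_like` and its contents are hypothetical, and the real object returned by `tt_load()` may carry extra attributes.

```r
# A plain named list standing in for the tt object (contents are made up)
tt_like <- list(
  plastics = data.frame(
    country = c("Argentina", "Chile"),
    year = c(2019, 2019)
  )
)

# List the datasets that are available by name
print(names(tt_like))

# Two equivalent ways to pull out one dataset
by_dollar <- tt_like$plastics
by_name <- tt_like[["plastics"]]

print(identical(by_dollar, by_name))
```

The `[[ ]]` form is handy when the dataset name is stored in a variable rather than typed out.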

A rough first plot is below; a prettier version is further down!

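plastics %>%
  filter(country == "United States of America") %>%
  filter(parent_company != "Unbranded", parent_company != "null", parent_company != "Grand Total") %>%
  # group by year only: a second group_by() call replaces the first rather than adding to it
  group_by(year) %>%
  arrange(desc(grand_total)) %>%
  # with no wt argument, top_n() ranks by the last column of the data (here volunteers)
  top_n(7) %>%
  # the pivot range includes grand_total, so totals get stacked on top of the individual plastic types
  pivot_longer(empty:grand_total, names_to = "plastic_type", values_to = "count") %>%
  ggplot(aes(x = reorder(parent_company, -count), y = count)) +
  geom_bar(aes(fill = plastic_type), stat = "identity")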

## Warning: Removed 584 rows containing missing values (position_stack).
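The warning means that some of the stacked counts are `NA`. A quick way to see where they sit is to count missing values per column before plotting. The sketch below uses a made-up data frame; `toy`, `num_cols` and `toy_filled` are hypothetical names, and treating a missing count as zero is an assumption that the source data does not confirm.

```r
# Toy counts with some missing values (made up, not the plastics data)
toy <- data.frame(
  parent_company = c("A", "B", "C"),
  hdpe = c(1, NA, 3),
  pet = c(NA, NA, 2)
)

num_cols <- c("hdpe", "pet")

# Number of missing values in each count column
na_per_column <- colSums(is.na(toy[num_cols]))
print(na_per_column)

# Optional: replace missing counts with 0 before plotting.
# ASSUMPTION: a missing count means "none recorded"; check this for real data.
toy_filled <- toy
toy_filled[num_cols][is.na(toy_filled[num_cols])] <- 0
print(toy_filled)
```

Whether to fill, drop, or simply accept the warning depends on what a missing count means in the dataset.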

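# write.csv() stored the row names as an unnamed first column, which read_csv() names X1 (see the warning below)
plastics <- readr::read_csv("plastics.csv") %>%
  select(-num_events, -volunteers)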

## Warning: Missing column names filled in: 'X1' [1]

devtools::install_github("ciannabp/inauguration")
library(inauguration)
inauguration("inauguration_2021_bernie")

plastics %>%
  filter(country == "United States of America") %>%
  filter(!parent_company %in% c("Unbranded", "null", "Grand Total")) %>%
  group_by(year) %>%
  arrange(desc(grand_total)) %>%
  # rank explicitly by grand_total rather than relying on it being the last column
  top_n(7, grand_total) %>%
  mutate(company_ordered = fct_reorder(parent_company, -grand_total)) %>%
  pivot_longer(empty:grand_total, names_to = "plastic_type", values_to = "count") %>%
  # drop the total so that only the individual plastic types are stacked
  filter(plastic_type != "grand_total") %>%
  filter(year == 2019) %>%
  ggplot(aes(x = company_ordered, y = count)) +
  geom_bar(aes(fill = plastic_type), stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  scale_fill_manual(values = inauguration("inauguration_2021_bernie")) +
  xlab("Company Name")

## Warning: Removed 7 rows containing missing values (position_stack).
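The top-7 selection in the plots above relies on `top_n()`, which recent dplyr versions (1.0.0 and later) mark as superseded in favour of `slice_max()`. The sketch below shows the same "top rows per group" idea on made-up data; `toy` and `top2` are hypothetical names, and the numbers are not from the plastics dataset.

```r
library(dplyr)

# Toy data (made-up values, not from the plastics dataset)
toy <- tibble(
  year = c(2019, 2019, 2019, 2020, 2020, 2020),
  parent_company = c("A", "B", "C", "A", "B", "C"),
  grand_total = c(10, 30, 20, 5, 15, 25)
)

# Keep the two largest grand_total rows within each year.
# The ranking column is always named explicitly, and with_ties = FALSE
# guarantees exactly n rows per group.
top2 <- toy %>%
  group_by(year) %>%
  slice_max(grand_total, n = 2, with_ties = FALSE) %>%
  ungroup()

print(top2)
```

Because `slice_max()` requires the ranking column as an argument, it cannot silently fall back to whichever column happens to be last.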

Plastics and Corporations

2021-01-26
