This is an extension of the tidytuesday assignment you have already done. Complete the questions below, using the screencast you chose for the tidytuesday assignment.

Import data

library(tidyverse)
library(scales)
theme_set(theme_light())
coast_vs_waste <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/coastal-population-vs-mismanaged-plastic.csv")
mismanaged_vs_gdp <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/per-capita-mismanaged-plastic-waste-vs-gdp-per-capita.csv")
waste_vs_gdp <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/per-capita-plastic-waste-vs-gdp-per-capita.csv")

coast_vs_waste
## # A tibble: 20,093 x 6
##    Entity  Code   Year `Mismanaged plastic… `Coastal populat… `Total population…
##    <chr>   <chr> <dbl>                <dbl>             <dbl>              <dbl>
##  1 Afghan… AFG    1800                   NA                NA            3280000
##  2 Afghan… AFG    1820                   NA                NA            3280000
##  3 Afghan… AFG    1870                   NA                NA            4207000
##  4 Afghan… AFG    1913                   NA                NA            5730000
##  5 Afghan… AFG    1950                   NA                NA            8151455
##  6 Afghan… AFG    1951                   NA                NA            8276820
##  7 Afghan… AFG    1952                   NA                NA            8407148
##  8 Afghan… AFG    1953                   NA                NA            8542906
##  9 Afghan… AFG    1954                   NA                NA            8684494
## 10 Afghan… AFG    1955                   NA                NA            8832253
## # … with 20,083 more rows
mismanaged_vs_gdp
## # A tibble: 22,204 x 6
##    Entity  Code   Year `Per capita mismana… `GDP per capita, P… `Total populati…
##    <chr>   <chr> <dbl>                <dbl>               <dbl>            <dbl>
##  1 Afghan… AFG    1800                   NA                  NA          3280000
##  2 Afghan… AFG    1820                   NA                  NA          3280000
##  3 Afghan… AFG    1870                   NA                  NA          4207000
##  4 Afghan… AFG    1913                   NA                  NA          5730000
##  5 Afghan… AFG    1950                   NA                  NA          8151455
##  6 Afghan… AFG    1951                   NA                  NA          8276820
##  7 Afghan… AFG    1952                   NA                  NA          8407148
##  8 Afghan… AFG    1953                   NA                  NA          8542906
##  9 Afghan… AFG    1954                   NA                  NA          8684494
## 10 Afghan… AFG    1955                   NA                  NA          8832253
## # … with 22,194 more rows
waste_vs_gdp
## # A tibble: 22,204 x 6
##    Entity  Code   Year `Per capita plasti… `GDP per capita, PP… `Total populati…
##    <chr>   <chr> <dbl>               <dbl>                <dbl>            <dbl>
##  1 Afghan… AFG    1800                  NA                   NA          3280000
##  2 Afghan… AFG    1820                  NA                   NA          3280000
##  3 Afghan… AFG    1870                  NA                   NA          4207000
##  4 Afghan… AFG    1913                  NA                   NA          5730000
##  5 Afghan… AFG    1950                  NA                   NA          8151455
##  6 Afghan… AFG    1951                  NA                   NA          8276820
##  7 Afghan… AFG    1952                  NA                   NA          8407148
##  8 Afghan… AFG    1953                  NA                   NA          8542906
##  9 Afghan… AFG    1954                  NA                   NA          8684494
## 10 Afghan… AFG    1955                  NA                   NA          8832253
## # … with 22,194 more rows
library(janitor)
# Data cleaning: standardize the column names, rename the country columns,
# and keep only the 2010 observations
clean_dataset <- function(tbl) {
  tbl %>%
    clean_names() %>%
    rename(country = entity,
           country_code = code) %>%
    filter(year == 2010) %>%
    select(-year)
}
plastic_waste <- coast_vs_waste %>%
  clean_dataset() %>%
  select(-total_population_gapminder) %>%
  inner_join(clean_dataset(mismanaged_vs_gdp) %>%
               select(-total_population_gapminder), by = c("country", "country_code")) %>%
  inner_join(clean_dataset(waste_vs_gdp), by = c("country", "country_code")) %>%
  select(country,
         country_code,
         mismanaged_waste = mismanaged_plastic_waste_tonnes,
         coastal_population,
         total_population = total_population_gapminder,
         mismanaged_per_capita = per_capita_mismanaged_plastic_waste_kilograms_per_person_per_day,
         gdp_per_capita = gdp_per_capita_ppp_constant_2011_international_rate) %>%
  filter(!is.na(mismanaged_waste))

plastic_waste
## # A tibble: 187 x 7
##    country country_code mismanaged_waste coastal_populat… total_population
##    <chr>   <chr>                   <dbl>            <dbl>            <dbl>
##  1 Albania ALB                     29705          2530533          3204284
##  2 Algeria DZA                    520555         16556580         35468208
##  3 Angola  AGO                     62528          3790041         19081912
##  4 Anguil… AIA                        52            14561            15358
##  5 Antigu… ATG                      1253            66843            88710
##  6 Argent… ARG                    157777         16449245         40412376
##  7 Aruba   ABW                       372           137910           107488
##  8 Austra… AUS                     13889         17235954         22268384
##  9 Bahamas BHS                      1333           341145           342877
## 10 Bahrain BHR                      4376           743574          1261835
## # … with 177 more rows, and 2 more variables: mismanaged_per_capita <dbl>,
## #   gdp_per_capita <dbl>

Description of the data and definition of variables

Dave uses data from the Global Plastic Waste data set, which is made up of three separate data sets: coast_vs_waste, mismanaged_vs_gdp, and waste_vs_gdp. The first, coast_vs_waste, has six columns: entity (the country), code (a three-letter abbreviation of the country's name), year, mismanaged plastic waste in tonnes, the coastal population of the country, and the total population of the country. Each row is one country in one year, which makes it easy to pull the information for a specific year. The second, mismanaged_vs_gdp, has the columns entity, code, year, per capita mismanaged plastic waste (kilograms per person per day), GDP per capita, and total population. The third, waste_vs_gdp, has the columns entity, code, year, per capita plastic waste, GDP per capita, and total population. These two data sets also have one row per country per year.

One interesting thing Dave had to do was clean the column names with clean_names() from the janitor package, and then rename entity and code to country and country_code. All three data sets have a large number of NA values. These come mostly from the earlier years, possibly because the information could not be measured at the time. To make the data more relevant and easier to read, Dave filtered each data set down to the year 2010, joined the three by country and country code, and removed the rows where mismanaged plastic waste was still NA. The resulting table, plastic_waste, has 187 rows (one per country) and 7 columns.
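
The cleaning steps described above can be tried on a small table. This is a minimal sketch, not Dave's exact code. The `toy` table is hypothetical: its full column name `Mismanaged plastic waste (tonnes)` is an assumption inferred from the cleaned name used in the code above, the 1950 row and the "Exampleland" row are made up, and only the 2010 figures for Albania and Algeria are copied from the output above.

```r
library(dplyr)
library(janitor)

# Hypothetical toy table shaped like coast_vs_waste (one row per country-year)
toy <- tibble(
  Entity = c("Albania", "Albania", "Algeria", "Exampleland"),
  Code   = c("ALB", "ALB", "DZA", "EXL"),
  Year   = c(1950, 2010, 2010, 2010),
  `Mismanaged plastic waste (tonnes)` = c(NA, 29705, 520555, NA)
)

cleaned <- toy %>%
  clean_names() %>%                                   # snake_case column names
  rename(country = entity, country_code = code) %>%
  filter(year == 2010) %>%                            # keep a single year
  select(-year) %>%
  filter(!is.na(mismanaged_plastic_waste_tonnes))     # drop rows still missing waste

cleaned
```

Only the two 2010 rows with a waste value remain. The 1950 row is removed by the year filter and the made-up country is removed by the NA filter.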

Visualize data

Hint: One graph of your choice.

g1 <- plastic_waste %>%
  arrange(-total_population) %>%
  mutate(pct_population_coastal = pmin(1, coastal_population / total_population),
         high_coastal_pop = ifelse(pct_population_coastal >= .8, ">=80%", "<80%")) %>%
  ggplot(aes(gdp_per_capita, mismanaged_per_capita)) +
  geom_point(aes(size = total_population)) +
  geom_text(aes(label = country), vjust = 1, hjust = 1, check_overlap = TRUE) +
  scale_x_log10(labels = dollar_format()) +
  scale_y_log10() +
  scale_size_continuous(guide = "none") +
  labs(x = "GDP per capita",
       y = "Mismanaged plastic waste (kg per person per day)",
       color = "Coastal population",
       title = "How plastic waste mismanagement correlates with country income",
       subtitle = "Based on Our World in Data 2010 numbers. Size represents total population")
g1
## Warning: Removed 42 rows containing missing values (geom_point).
## Warning: Removed 39 rows containing missing values (geom_text).
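
The `pmin(1, ...)` call in the code above caps the coastal share at 100%. A minimal sketch of why that is needed, using three rows copied from the `plastic_waste` output shown earlier (the `raw_share` column is a name added here for illustration only):

```r
library(dplyr)

# Three countries with values copied from the plastic_waste output above
coastal <- tibble(
  country            = c("Albania", "Aruba", "Bahamas"),
  coastal_population = c(2530533, 137910, 341145),
  total_population   = c(3204284, 107488, 342877)
) %>%
  mutate(
    raw_share = coastal_population / total_population,   # can exceed 1
    pct_population_coastal = pmin(1, raw_share),          # capped at 1
    high_coastal_pop = ifelse(pct_population_coastal >= .8, ">=80%", "<80%")
  )

coastal
```

Aruba's reported coastal population is larger than its total population, so its raw share is above 1 and the cap brings it back to 1. Albania falls just under the 80% cutoff, while Aruba and Bahamas are above it.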

What is the story behind the graph?

This graph shows how much plastic waste each country mismanages (in kilograms per person per day) compared to the country's GDP per capita, using the 2010 data. Both axes are on a log scale, so countries with very different incomes and waste levels fit on the same plot. This makes it a good way to look for a relationship between GDP per capita and the amount of mismanaged plastic waste in a country. The size of each point represents the total population of the country.
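
One way to put a number on the relationship the graph suggests is a correlation on the same log scales. This is a sketch only: `log_log_cor()` is a hypothetical helper that is not part of the screencast, and the `toy` values are made up so that the result is close to -1 by construction. They say nothing about the real data.

```r
# Hypothetical helper: correlation between log10 GDP per capita and
# log10 mismanaged waste per capita, skipping rows with missing values
log_log_cor <- function(tbl) {
  cor(log10(tbl$gdp_per_capita),
      log10(tbl$mismanaged_per_capita),
      use = "complete.obs")
}

# Made-up values: waste falls tenfold each time GDP rises tenfold
toy <- data.frame(
  gdp_per_capita        = c(1e3, 1e4, 1e5, NA),
  mismanaged_per_capita = c(0.1, 0.01, 0.001, 0.05)
)

log_log_cor(toy)
```

The same helper could be applied to `plastic_waste`. A single correlation only measures a straight-line trend, so it should be read alongside the graph rather than instead of it.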

Hide the messages, but display the code and its results on the webpage.
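
A minimal sketch of one way to do this, assuming the page is knit from an R Markdown file with knitr and that the lines below are placed in a setup chunk near the top of the file:

```r
library(knitr)

# echo = TRUE shows the code, message = FALSE hides messages such as
# package start-up notes, and results = "markup" keeps the printed results
opts_chunk$set(echo = TRUE, message = FALSE, results = "markup")
```

Warnings are controlled separately through the `warning` chunk option, so they still appear with these settings.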

Write your name for the author at the top.

Use the correct slug.