This is an extension of the tidytuesday assignment you have already done. Complete the questions below, using the screencast you chose for the tidytuesday assignment.
library(tidyverse)
library(scales)
theme_set(theme_light())
coast_vs_waste <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/coastal-population-vs-mismanaged-plastic.csv")
mismanaged_vs_gdp <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/per-capita-mismanaged-plastic-waste-vs-gdp-per-capita.csv")
waste_vs_gdp <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/per-capita-plastic-waste-vs-gdp-per-capita.csv")
coast_vs_waste
## # A tibble: 20,093 x 6
## Entity Code Year `Mismanaged plastic… `Coastal populat… `Total population…
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Afghan… AFG 1800 NA NA 3280000
## 2 Afghan… AFG 1820 NA NA 3280000
## 3 Afghan… AFG 1870 NA NA 4207000
## 4 Afghan… AFG 1913 NA NA 5730000
## 5 Afghan… AFG 1950 NA NA 8151455
## 6 Afghan… AFG 1951 NA NA 8276820
## 7 Afghan… AFG 1952 NA NA 8407148
## 8 Afghan… AFG 1953 NA NA 8542906
## 9 Afghan… AFG 1954 NA NA 8684494
## 10 Afghan… AFG 1955 NA NA 8832253
## # … with 20,083 more rows
mismanaged_vs_gdp
## # A tibble: 22,204 x 6
## Entity Code Year `Per capita mismana… `GDP per capita, P… `Total populati…
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Afghan… AFG 1800 NA NA 3280000
## 2 Afghan… AFG 1820 NA NA 3280000
## 3 Afghan… AFG 1870 NA NA 4207000
## 4 Afghan… AFG 1913 NA NA 5730000
## 5 Afghan… AFG 1950 NA NA 8151455
## 6 Afghan… AFG 1951 NA NA 8276820
## 7 Afghan… AFG 1952 NA NA 8407148
## 8 Afghan… AFG 1953 NA NA 8542906
## 9 Afghan… AFG 1954 NA NA 8684494
## 10 Afghan… AFG 1955 NA NA 8832253
## # … with 22,194 more rows
waste_vs_gdp
## # A tibble: 22,204 x 6
## Entity Code Year `Per capita plasti… `GDP per capita, PP… `Total populati…
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Afghan… AFG 1800 NA NA 3280000
## 2 Afghan… AFG 1820 NA NA 3280000
## 3 Afghan… AFG 1870 NA NA 4207000
## 4 Afghan… AFG 1913 NA NA 5730000
## 5 Afghan… AFG 1950 NA NA 8151455
## 6 Afghan… AFG 1951 NA NA 8276820
## 7 Afghan… AFG 1952 NA NA 8407148
## 8 Afghan… AFG 1953 NA NA 8542906
## 9 Afghan… AFG 1954 NA NA 8684494
## 10 Afghan… AFG 1955 NA NA 8832253
## # … with 22,194 more rows
library(janitor)
# Data cleaning
clean_dataset <- function(tbl) {
  tbl %>%
    clean_names() %>%
    rename(country = entity,
           country_code = code) %>%
    filter(year == 2010) %>%
    select(-year)
}
plastic_waste <- coast_vs_waste %>%
  clean_dataset() %>%
  select(-total_population_gapminder) %>%
  inner_join(clean_dataset(mismanaged_vs_gdp) %>%
               select(-total_population_gapminder), by = c("country", "country_code")) %>%
  inner_join(clean_dataset(waste_vs_gdp), by = c("country", "country_code")) %>%
  select(country,
         country_code,
         mismanaged_waste = mismanaged_plastic_waste_tonnes,
         coastal_population,
         total_population = total_population_gapminder,
         mismanaged_per_capita = per_capita_mismanaged_plastic_waste_kilograms_per_person_per_day,
         gdp_per_capita = gdp_per_capita_ppp_constant_2011_international_rate) %>%
  filter(!is.na(mismanaged_waste))
plastic_waste
## # A tibble: 187 x 7
## country country_code mismanaged_waste coastal_populat… total_population
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Albania ALB 29705 2530533 3204284
## 2 Algeria DZA 520555 16556580 35468208
## 3 Angola AGO 62528 3790041 19081912
## 4 Anguil… AIA 52 14561 15358
## 5 Antigu… ATG 1253 66843 88710
## 6 Argent… ARG 157777 16449245 40412376
## 7 Aruba ABW 372 137910 107488
## 8 Austra… AUS 13889 17235954 22268384
## 9 Bahamas BHS 1333 341145 342877
## 10 Bahrain BHR 4376 743574 1261835
## # … with 177 more rows, and 2 more variables: mismanaged_per_capita <dbl>,
## # gdp_per_capita <dbl>
Dave uses data from the Global Plastic Waste data set, which is made up of three separate tables that have to be combined to get the full picture: coast_vs_waste, mismanaged_vs_gdp, and waste_vs_gdp. Each table has six columns and one row per country per year. coast_vs_waste (20,093 rows) contains the entity (country), a three-letter country code, the year, mismanaged plastic waste in tonnes, the coastal population, and the total population. mismanaged_vs_gdp (22,204 rows) contains the entity, code, year, per capita mismanaged plastic waste, GDP per capita, and total population. waste_vs_gdp (22,204 rows) contains the entity, code, year, per capita plastic waste, GDP per capita, and total population.
One interesting thing Dave had to do was clean the data. He used clean_names() from the janitor package to turn the long column names into consistent, code-friendly names, and renamed entity and code to country and country_code. All three tables have a large number of NA values, especially in the earlier years, where the only value recorded is total population. This is probably because plastic waste and GDP were not measured for those years. To keep only relevant rows, Dave filtered each table to the year 2010, joined the three tables by country and country code, and then dropped the rows where mismanaged waste was still NA. The result is a single table, plastic_waste, with 187 rows and 7 columns.
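One way to check which years actually carry waste data is to count the non-missing values per year. The sketch below is only an illustration: `toy` is a hypothetical stand-in for coast_vs_waste after clean_names(), with made-up values, and `coverage_by_year` is a name introduced here, not something from the screencast.

```r
library(dplyr)

# Hypothetical toy table shaped like coast_vs_waste after clean_names().
# The values are made up for illustration only.
toy <- tibble::tibble(
  country = c("A", "A", "B", "B"),
  year = c(1800, 2010, 1800, 2010),
  mismanaged_plastic_waste_tonnes = c(NA, 100, NA, 250)
)

# For each year, count the rows and how many of them have a waste value
coverage_by_year <- toy %>%
  group_by(year) %>%
  summarise(n_rows = n(),
            n_with_waste = sum(!is.na(mismanaged_plastic_waste_tonnes)),
            .groups = "drop")

coverage_by_year
```

Swapping `toy` for the real table (after clean_names()) should show which years have any waste values at all, which is a quick way to see why a single-year filter makes sense.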
Hint: One graph of your choice.
g1 <- plastic_waste %>%
  arrange(-total_population) %>%
  mutate(pct_population_coastal = pmin(1, coastal_population / total_population),
         high_coastal_pop = ifelse(pct_population_coastal >= .8, ">=80%", "<80%")) %>%
  ggplot(aes(gdp_per_capita, mismanaged_per_capita)) +
  geom_point(aes(size = total_population)) +
  geom_text(aes(label = country), vjust = 1, hjust = 1, check_overlap = TRUE) +
  scale_x_log10(labels = dollar_format()) +
  scale_y_log10() +
  # guide = "none" hides the size legend (guide = FALSE is deprecated in recent ggplot2)
  scale_size_continuous(guide = "none") +
  labs(x = "GDP per capita",
       y = "Mismanaged plastic waste (kg per person per day)",
       color = "Coastal population",
       title = "How plastic waste mismanagement correlates with country income",
       subtitle = "Based on Our World in Data 2010 numbers. Size represents total population")
g1
## Warning: Removed 42 rows containing missing values (geom_point).
## Warning: Removed 39 rows containing missing values (geom_text).
This graph shows how much mismanaged plastic waste each country produces (in kilograms per person per day) compared to the country's GDP per capita. Both axes are on a log scale, and each point is one country in 2010. This makes it a good way to look for a relationship between GDP per capita and the amount of plastic pollution in a country. The size of each point represents the country's total population.
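To put a number on the relationship the plot suggests, one option is a correlation on the same log scales as the axes. This is a minimal sketch, not part of the screencast: `log_log_cor` is a hypothetical helper, and `toy_waste` is made-up data chosen so the answer is easy to check by hand. It says nothing about the real result.

```r
library(dplyr)

# Hypothetical toy rows shaped like plastic_waste; the numbers are made up
toy_waste <- tibble::tibble(
  country = c("A", "B", "C", "D"),
  gdp_per_capita = c(1000, 10000, 100000, NA),
  mismanaged_per_capita = c(0.1, 0.01, 0.001, 0.05)
)

# Pearson correlation of the two variables on log10 scales.
# Rows with missing or non-positive values are dropped first,
# because log10() is undefined or infinite for them.
log_log_cor <- function(tbl) {
  complete <- tbl %>%
    filter(!is.na(gdp_per_capita), !is.na(mismanaged_per_capita),
           gdp_per_capita > 0, mismanaged_per_capita > 0)
  cor(log10(complete$gdp_per_capita), log10(complete$mismanaged_per_capita))
}

toy_cor <- log_log_cor(toy_waste)
toy_cor
```

Calling `log_log_cor(plastic_waste)` would give the corresponding value for the real 2010 data. A single correlation coefficient only captures a straight-line trend on the log-log plot, so it is best read alongside the graph rather than instead of it.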