This is an extension of the tidytuesday assignment you have already done. Complete the questions below, using the screencast you chose for the tidytuesday assignment.
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.4
## ✓ tibble 3.0.1 ✓ dplyr 0.8.5
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
coast_vs_waste <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/coastal-population-vs-mismanaged-plastic.csv")
## Parsed with column specification:
## cols(
## Entity = col_character(),
## Code = col_character(),
## Year = col_double(),
## `Mismanaged plastic waste (tonnes)` = col_double(),
## `Coastal population` = col_double(),
## `Total population (Gapminder)` = col_double()
## )
mismanaged_vs_gdp <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/per-capita-mismanaged-plastic-waste-vs-gdp-per-capita.csv")
## Parsed with column specification:
## cols(
## Entity = col_character(),
## Code = col_character(),
## Year = col_double(),
## `Per capita mismanaged plastic waste (kilograms per person per day)` = col_double(),
## `GDP per capita, PPP (constant 2011 international $) (Rate)` = col_double(),
## `Total population (Gapminder)` = col_double()
## )
waste_vs_gdp <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/per-capita-plastic-waste-vs-gdp-per-capita.csv")
## Parsed with column specification:
## cols(
## Entity = col_character(),
## Code = col_character(),
## Year = col_double(),
## `Per capita plastic waste (kilograms per person per day)` = col_double(),
## `GDP per capita, PPP (constant 2011 international $) (constant 2011 international $)` = col_double(),
## `Total population (Gapminder)` = col_double()
## )
coast_vs_waste
## # A tibble: 20,093 x 6
## Entity Code Year `Mismanaged plastic… `Coastal populat… `Total population…
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Afghan… AFG 1800 NA NA 3280000
## 2 Afghan… AFG 1820 NA NA 3280000
## 3 Afghan… AFG 1870 NA NA 4207000
## 4 Afghan… AFG 1913 NA NA 5730000
## 5 Afghan… AFG 1950 NA NA 8151455
## 6 Afghan… AFG 1951 NA NA 8276820
## 7 Afghan… AFG 1952 NA NA 8407148
## 8 Afghan… AFG 1953 NA NA 8542906
## 9 Afghan… AFG 1954 NA NA 8684494
## 10 Afghan… AFG 1955 NA NA 8832253
## # … with 20,083 more rows
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
# Data cleaning
# Standardize column names, rename the country columns, and keep only 2010
clean_dataset <- function(tbl) {
  tbl %>%
    clean_names() %>%
    rename(country = entity,
           country_code = code) %>%
    filter(year == 2010) %>%
    select(-year)
}
# Join the three tables by country, keeping a single total population column
plastic_waste <- coast_vs_waste %>%
  clean_dataset() %>%
  select(-total_population_gapminder) %>%
  inner_join(clean_dataset(mismanaged_vs_gdp) %>%
               select(-total_population_gapminder), by = c("country", "country_code")) %>%
  inner_join(clean_dataset(waste_vs_gdp), by = c("country", "country_code")) %>%
  select(country,
         country_code,
         mismanaged_waste = mismanaged_plastic_waste_tonnes,
         coastal_population,
         total_population = total_population_gapminder,
         mismanaged_per_capita = per_capita_mismanaged_plastic_waste_kilograms_per_person_per_day,
         gdp_per_capita = gdp_per_capita_ppp_constant_2011_international_rate) %>%
  # Drop countries with no mismanaged waste figure
  filter(!is.na(mismanaged_waste))
mismanaged_vs_gdp
## # A tibble: 22,204 x 6
## Entity Code Year `Per capita mismana… `GDP per capita, P… `Total populati…
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Afghan… AFG 1800 NA NA 3280000
## 2 Afghan… AFG 1820 NA NA 3280000
## 3 Afghan… AFG 1870 NA NA 4207000
## 4 Afghan… AFG 1913 NA NA 5730000
## 5 Afghan… AFG 1950 NA NA 8151455
## 6 Afghan… AFG 1951 NA NA 8276820
## 7 Afghan… AFG 1952 NA NA 8407148
## 8 Afghan… AFG 1953 NA NA 8542906
## 9 Afghan… AFG 1954 NA NA 8684494
## 10 Afghan… AFG 1955 NA NA 8832253
## # … with 22,194 more rows
waste_vs_gdp
## # A tibble: 22,204 x 6
## Entity Code Year `Per capita plasti… `GDP per capita, PP… `Total populati…
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Afghan… AFG 1800 NA NA 3280000
## 2 Afghan… AFG 1820 NA NA 3280000
## 3 Afghan… AFG 1870 NA NA 4207000
## 4 Afghan… AFG 1913 NA NA 5730000
## 5 Afghan… AFG 1950 NA NA 8151455
## 6 Afghan… AFG 1951 NA NA 8276820
## 7 Afghan… AFG 1952 NA NA 8407148
## 8 Afghan… AFG 1953 NA NA 8542906
## 9 Afghan… AFG 1954 NA NA 8684494
## 10 Afghan… AFG 1955 NA NA 8832253
## # … with 22,194 more rows
The data set records plastic waste amounts for each country, identified by a three-letter country code and measured in tonnes. It also includes other variables such as total population, coastal population, and the year of each record. Because the waste values are NA for almost every year, he filtered the data to 2010 and dropped the year column to narrow his results. Watching how he cleaned his data gave me techniques I can apply when cleaning my own data. The coast_vs_waste table in particular focuses on waste along countries' coasts rather than on country-wide totals.
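A quick way to check which years actually contain waste values is to count the non-missing entries per year. The sketch below uses a small invented table, toy_waste, as a stand-in for the cleaned coast_vs_waste data; the column names follow the clean_names() output, but the numbers are made up for illustration.

```r
library(dplyr)

# Toy stand-in for coast_vs_waste after clean_names(); the values are invented.
toy_waste <- tibble(
  country = c("A", "A", "A", "B", "B", "B"),
  year = c(2009, 2010, 2011, 2009, 2010, 2011),
  mismanaged_plastic_waste_tonnes = c(NA, 120, NA, NA, 45, NA)
)

# Count the non-missing waste values in each year, largest count first.
non_missing_by_year <- toy_waste %>%
  group_by(year) %>%
  summarise(n_non_missing = sum(!is.na(mismanaged_plastic_waste_tonnes))) %>%
  arrange(desc(n_non_missing))

non_missing_by_year
```

If one year dominates the counts, as 2010 does in this toy table, filtering to that year and dropping the year column loses very little information.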
Hint: One graph of your choice.
ggplot(plastic_waste, aes(gdp_per_capita, mismanaged_per_capita)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()
This graph shows that very rich countries have relatively low rates of mismanaged plastic waste per person, while countries with lower wealth tend to have higher rates. The correlation is not tight: the highest rates of mismanaged waste (in kg per person per day) appear mainly among middle-income countries. Overall, though, the graph shows a negative relationship, with lower GDP per capita associated with higher per capita mismanaged waste.
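One way to put a number on the trend in the scatterplot is to compute a correlation on the same log scale as the axes. This is only a sketch: the data frame toy holds invented values, not figures from plastic_waste, and the column names are borrowed from the plot for readability.

```r
# Toy values, invented for illustration; not taken from plastic_waste.
toy <- data.frame(
  gdp_per_capita = c(1000, 3000, 10000, 30000, 100000),
  mismanaged_per_capita = c(0.05, 0.08, 0.06, 0.01, 0.002)
)

# Correlate on the log10 scale, mirroring scale_x_log10() and scale_y_log10().
log_cor <- cor(
  log10(toy$gdp_per_capita),
  log10(toy$mismanaged_per_capita),
  use = "complete.obs"  # ignore rows where either value is NA
)

log_cor
```

A negative value supports the downward trend, but a single coefficient can hide a hump among middle-income countries, so it is best read alongside the plot.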