Prepare a report with an interesting narrative that focuses on a subset of the data you find interesting and includes both arsenic and fluoride data. Upload your report to RPubs and post a link to it in Piazza. You are required to join the data. It is up to you to determine how to handle missing values. Your document title should be exactly Assignment 1:
Also, the HTML document you publish to RPubs must have the following elements:
* at least two level two headers ## and at least one bulleted list with at least two items (5 points)
* a data frame or tibble that joins both arsenic and fluoride by location (10 points)
* at least one table showing relevant data that is not so long that it overwhelms the report (consider using the head command). The code that creates the portion of the table must be displayed. (7.5 points)
* at least one chart. For at least one of your charts, the code that created it must not be displayed. (7.5 points)
* a narrative discussing what you find interesting along with any issues you might have had preparing the data (10 points)
* published on RPubs (40 points)
* clickable link posted in Piazza as a note titled “yourname’s Assignment 1,” where yourname is your actual name (10 points)
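One way to meet the requirement that a chart's code not be displayed is knitr's `echo` chunk option. The following is a minimal sketch, not the required approach. The data frame `toy` and its columns `site` and `value` are invented for illustration and are not the assignment data.

```r
# In the .Rmd file, this code would sit in a chunk whose header is
#   {r example-chart, echo = FALSE}
# so the knitted HTML shows the plot but not the code that made it.
library(ggplot2)

# Toy data, invented for illustration only (not the assignment data)
toy <- data.frame(
  site  = c("A", "B", "C"),
  value = c(1.2, 3.4, 2.1)
)

# A simple bar chart with labelled axes
p <- ggplot(toy, aes(x = site, y = value)) +
  geom_col() +
  labs(x = "Site", y = "Measured value", title = "Toy chart")
p
```

Because the setup chunk sets `echo = TRUE` globally, only chunks that explicitly set `echo = FALSE` will hide their code.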
Setup
My setup chunk looks like this:
{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(scipen = 999)
library(tidyverse)
library(janitor)
library(kableExtra)
library(scales)
library(gapminder)
library(viridis)
I loaded the data and renamed some of the long columns so they’re easier to work with in R. I also dropped a few columns from the second dataset that I won’t need, because I’m going to merge the datasets together.
# Read the coastal population data and shorten the long column names
coast_vs_waste <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/coastal-population-vs-mismanaged-plastic.csv") %>%
  dplyr::rename("Mismanaged_waste" = "Mismanaged plastic waste (tonnes)",
                "Coastal_pop" = "Coastal population",
                "Total_pop" = "Total population (Gapminder)",
                "country" = "Entity")
coast_vs_waste %>% head()
## # A tibble: 6 x 6
## country Code Year Mismanaged_waste Coastal_pop Total_pop
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1800 NA NA 3280000
## 2 Afghanistan AFG 1820 NA NA 3280000
## 3 Afghanistan AFG 1870 NA NA 4207000
## 4 Afghanistan AFG 1913 NA NA 5730000
## 5 Afghanistan AFG 1950 NA NA 8151455
## 6 Afghanistan AFG 1951 NA NA 8276820
# Read the per capita waste and GDP data, rename columns, and drop the ones already in coast_vs_waste
mismanaged_vs_gdp <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/per-capita-mismanaged-plastic-waste-vs-gdp-per-capita.csv") %>%
  dplyr::rename("Waste_percapita" = "Per capita mismanaged plastic waste (kilograms per person per day)",
                "GDP_percapita" = "GDP per capita, PPP (constant 2011 international $) (Rate)",
                "country" = "Entity") %>%
  dplyr::select(-`Total population (Gapminder)`, -Code)
mismanaged_vs_gdp %>% head()
## # A tibble: 6 x 4
## country Year Waste_percapita GDP_percapita
## <chr> <dbl> <dbl> <dbl>
## 1 Afghanistan 1800 NA NA
## 2 Afghanistan 1820 NA NA
## 3 Afghanistan 1870 NA NA
## 4 Afghanistan 1913 NA NA
## 5 Afghanistan 1950 NA NA
## 6 Afghanistan 1951 NA NA
# Keep only the country and continent columns from the gapminder data
gap_continent <- gapminder %>%
  dplyr::select(country, continent)
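The renaming and column-dropping pattern used above can be tried on a small example first. This is a sketch under stated assumptions: the tibble `toy`, its column names, and its values are all invented for illustration.

```r
library(dplyr)
library(tibble)

# Toy tibble with long column names containing spaces (invented values)
toy <- tibble(
  `Entity` = c("Aland", "Borduria"),
  `Total population (example)` = c(100, 250),
  `Unneeded column` = c("x", "y")
)

toy_clean <- toy %>%
  # rename() takes new_name = old_name; backticks quote names with spaces
  rename(country = Entity, total_pop = `Total population (example)`) %>%
  # a leading minus in select() drops that column
  select(-`Unneeded column`)

names(toy_clean)
```

Checking `names()` after each step is a quick way to confirm that the columns you plan to join on have the same names in both tables.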
Plastic pollution is a major and growing problem that negatively affects oceans and wildlife health. Our World in Data has a lot of great data at various levels, including global, per-country, and over time.
coast_vs_waste.csv

| variable | class | description | name changed to |
|---|---|---|---|
| Entity | character | Country name | country |
| Code | character | 3-letter country code | Code |
| Year | integer (date) | Year | Year |
| Mismanaged plastic waste (tonnes) | double | Tonnes of mismanaged plastic waste | Mismanaged_waste |
| Coastal population | double | Number of individuals living on/near coast | Coastal_pop |
| Total population (Gapminder) | double | Total population according to Gapminder | Total_pop |
mismanaged_vs_gdp.csv

| variable | class | description | name changed to |
|---|---|---|---|
| Entity | character | Country name | country |
| Code | character | 3-letter country code | (dropped) |
| Year | integer (date) | Year | Year |
| Per capita mismanaged plastic waste (kg per day) | double | Amount of mismanaged plastic waste per capita in kg/day | Waste_percapita |
| GDP per capita | double | GDP per capita constant 2011 international $, rate | GDP_percapita |
| Total population (Gapminder) | double | Total population according to Gapminder | (dropped) |
In this step, I joined the two datasets by country and Year using left_join() and removed rows with NA values in the Mismanaged_waste column using filter(!is.na(Mismanaged_waste)).
# Join on country and Year, then keep only rows that have a mismanaged waste value
waste_df <- coast_vs_waste %>%
  left_join(mismanaged_vs_gdp, by = c("country", "Year")) %>%
  filter(!is.na(Mismanaged_waste))
waste_df %>% head(10)
## # A tibble: 10 x 8
## country Code Year Mismanaged_waste Coastal_pop Total_pop
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Albania ALB 2010 29705 2530533 3204284
## 2 Algeria DZA 2010 520555 16556580 35468208
## 3 Angola AGO 2010 62528 3790041 19081912
## 4 Anguil~ AIA 2010 52 14561 15358
## 5 Antigu~ ATG 2010 1253 66843 88710
## 6 Argent~ ARG 2010 157777 16449245 40412376
## 7 Aruba ABW 2010 372 137910 107488
## 8 Austra~ AUS 2010 13889 17235954 22268384
## 9 Bahamas BHS 2010 1333 341145 342877
## 10 Bahrain BHR 2010 4376 743574 1261835
## # ... with 2 more variables: Waste_percapita <dbl>, GDP_percapita <dbl>
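To see how left_join() and filter(!is.na()) interact, it can help to run them on a tiny example. The sketch below uses two invented tables (`left_tbl` and `right_tbl`, with made-up countries and values), so treat it as an illustration of the technique rather than the code used for this report.

```r
library(dplyr)
library(tibble)

# Toy tables, invented for illustration (not the real data)
left_tbl <- tibble(
  country = c("Aland", "Borduria", "Syldavia"),
  Year    = c(2010, 2010, 2010),
  waste   = c(10, NA, 30)
)
right_tbl <- tibble(
  country = c("Aland", "Borduria"),
  Year    = c(2010, 2010),
  gdp     = c(5000, 7000)
)

joined <- left_tbl %>%
  # left_join() keeps every row of left_tbl; unmatched rows get NA for gdp
  left_join(right_tbl, by = c("country", "Year")) %>%
  # drop rows where waste is missing (removes Borduria)
  filter(!is.na(waste))

# Rows of left_tbl that have no match in right_tbl (Syldavia)
unmatched <- anti_join(left_tbl, right_tbl, by = c("country", "Year"))

joined
unmatched
```

Note that filtering on one column does not remove missing values in the others: Syldavia stays in `joined` with an NA for `gdp`. Whether to keep or drop such rows is a choice to explain in your narrative.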
I’ve hidden the code for this table because I’d like you to write your own. The table displays the top 5 countries with the greatest mismanaged waste, sorted in descending order.
| country | Mismanaged_waste | Total_pop | Waste_percapita |
|---|---|---|---|
| China | 8819717 | 1341335152 | 0.092 |
| Indonesia | 3216856 | 239870937 | 0.047 |
| Philippines | 1883659 | 93260798 | 0.062 |
| Vietnam | 1833819 | 87848445 | 0.090 |
| Sri Lanka | 1591179 | 20859949 | 0.299 |
You should say some things like…
“It was surprising that countries with small populations had some of the largest mismanaged waste per capita because…”
or
“I was surprised that the USA had a pretty low mismanaged waste per capita. It seems like the USA has a lot of mismanaged waste to me!”
or
“It makes sense that wealthy nations like Sweden, Canada, and Japan would produce less mismanaged waste per capita. In contrast, developing nations like Sri Lanka would have many waste management challenges because of a lack of infrastructure, which would lead to high levels of mismanaged waste despite small population sizes.”
or
“I struggled when figuring out how to remove missing values.”
or
“It was difficult to come up with an interesting and informative figure for these data because…”