Alternate Example: Assignment 1

Instructions:

Prepare a report with an interesting narrative that focuses on a subset of the data you find interesting and that includes both arsenic and fluoride data. Your report should be uploaded to RPubs, and you should post a link to your RPubs report in Piazza. You are required to join the data. It is up to you to determine how to handle missing values. Your document title should be exactly “Assignment 1:” followed by your actual name. (10 points)

Also, the HTML document you publish to RPubs must have the following elements:

* at least two level-two headers (##) and at least one bulleted list with at least two items (5 points)
* a data frame or tibble that joins both arsenic and fluoride by location (10 points)
* at least one table showing relevant data that is not so long that it overwhelms the report (consider using the head command). The code that creates the displayed portion of the table must be shown. (7.5 points)
* at least one chart. For at least one of your charts, the code that created it must not be displayed. (7.5 points)
* a narrative discussing what you find interesting along with any issues you might have had preparing the data (10 points)
* published on RPubs (40 points)
* clickable link posted in Piazza as a note titled “yourname’s Assignment 1,” where yourname is your actual name (10 points)

Setup

My setup chunk looks like this:
{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(scipen = 999)
library(tidyverse)
library(janitor)
library(kableExtra)
library(scales)
library(gapminder)
library(viridis)

Load data

I loaded the data and renamed some long columns so they’re easier to work with in R. I also dropped a few columns from the second dataset that I won’t need, because they already appear in the first dataset and I’m going to merge the two together.

coast_vs_waste <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/coastal-population-vs-mismanaged-plastic.csv") %>% 
  dplyr::rename("Mismanaged_waste" = "Mismanaged plastic waste (tonnes)", "Coastal_pop" = "Coastal population", 
                "Total_pop" = "Total population (Gapminder)", "country" = "Entity")

coast_vs_waste %>% head()

## # A tibble: 6 x 6
##   country     Code   Year Mismanaged_waste Coastal_pop Total_pop
##   <chr>       <chr> <dbl>            <dbl>       <dbl>     <dbl>
## 1 Afghanistan AFG    1800               NA          NA   3280000
## 2 Afghanistan AFG    1820               NA          NA   3280000
## 3 Afghanistan AFG    1870               NA          NA   4207000
## 4 Afghanistan AFG    1913               NA          NA   5730000
## 5 Afghanistan AFG    1950               NA          NA   8151455
## 6 Afghanistan AFG    1951               NA          NA   8276820

mismanaged_vs_gdp <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-21/per-capita-mismanaged-plastic-waste-vs-gdp-per-capita.csv") %>% 
  dplyr::rename("Waste_percapita" = "Per capita mismanaged plastic waste (kilograms per person per day)", 
                "GDP_percapita" = "GDP per capita, PPP (constant 2011 international $) (Rate)",
                "country" = "Entity") %>% 
  dplyr::select(-`Total population (Gapminder)`, -Code)

mismanaged_vs_gdp %>% head()

## # A tibble: 6 x 4
##   country      Year Waste_percapita GDP_percapita
##   <chr>       <dbl>           <dbl>         <dbl>
## 1 Afghanistan  1800              NA            NA
## 2 Afghanistan  1820              NA            NA
## 3 Afghanistan  1870              NA            NA
## 4 Afghanistan  1913              NA            NA
## 5 Afghanistan  1950              NA            NA
## 6 Afghanistan  1951              NA            NA

gap_continent <- gapminder %>% 
  dplyr::select(country, continent) %>% 
  # gapminder has one row per country per year; keep one row per country
  dplyr::distinct()

Metadata:

Plastic pollution is a major and growing problem that negatively affects oceans and wildlife health. Our World in Data has a lot of great data at various levels, including globally, per country, and over time.

`coast_vs_waste.csv`

variable	class	description	name changed to
Entity	character	Country name	country
Code	character	3-letter country code	Code
Year	integer (date)	Year	Year
Mismanaged plastic waste (tonnes)	double	Tonnes of mismanaged plastic waste	Mismanaged_waste
Coastal population	double	Number of individuals living on/near coast	Coastal_pop
Total population (Gapminder)	double	Total population according to Gapminder	Total_pop

`mismanaged_vs_gdp.csv`

variable	class	description	name changed to
Entity	character	Country name	country
Code	character	3-letter country code	(dropped)
Year	integer (date)	Year	Year
Per capita mismanaged plastic waste (kg per day)	double	Amount of mismanaged plastic waste per capita in kg/day	Waste_percapita
GDP per capita	double	GDP per capita constant 2011 international $, rate	GDP_percapita
Total population (Gapminder)	double	Total population according to Gapminder	(dropped)

Join datasets

In this step, I joined the two datasets by country and Year using left_join(), then removed rows with NA values in the Mismanaged_waste column using filter(!is.na()).

waste_df <- coast_vs_waste %>% 
  left_join(mismanaged_vs_gdp, by = c("country", "Year")) %>% 
  filter(!is.na(Mismanaged_waste))

waste_df %>% head(10)

## # A tibble: 10 x 8
##    country Code   Year Mismanaged_waste Coastal_pop Total_pop
##    <chr>   <chr> <dbl>            <dbl>       <dbl>     <dbl>
##  1 Albania ALB    2010            29705     2530533   3204284
##  2 Algeria DZA    2010           520555    16556580  35468208
##  3 Angola  AGO    2010            62528     3790041  19081912
##  4 Anguil~ AIA    2010               52       14561     15358
##  5 Antigu~ ATG    2010             1253       66843     88710
##  6 Argent~ ARG    2010           157777    16449245  40412376
##  7 Aruba   ABW    2010              372      137910    107488
##  8 Austra~ AUS    2010            13889    17235954  22268384
##  9 Bahamas BHS    2010             1333      341145    342877
## 10 Bahrain BHR    2010             4376      743574   1261835
## # ... with 2 more variables: Waste_percapita <dbl>, GDP_percapita <dbl>
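
To see what the join and the NA filter each do, here is a minimal sketch on made-up data. The tibbles left_tbl and right_tbl and their values are hypothetical and are not part of the plastic waste data; only left_join(), filter(), and is.na() are taken from the step above.

```r
library(dplyr)

# Toy data, invented for illustration only
left_tbl <- tibble::tibble(
  country = c("A", "B", "C"),
  Year    = c(2010, 2010, 2010),
  waste   = c(100, NA, 300)
)
right_tbl <- tibble::tibble(
  country = c("A", "C"),
  Year    = c(2010, 2010),
  gdp     = c(5000, 7000)
)

joined <- left_tbl %>%
  # left_join() keeps all three rows of left_tbl; B has no match, so its gdp is NA
  left_join(right_tbl, by = c("country", "Year")) %>%
  # drop rows where waste is missing (here, country B)
  filter(!is.na(waste))

joined
```

In this toy case the filter removes country B, so only A and C remain. Note that filter(!is.na(waste)) checks a single column; missing values in other columns would still be kept.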

Create a table using kable()

Most mismanaged waste (top 5 producers)

I’ve hidden the code for this table because I’d like you to write your own. The table displays the 5 countries with the most mismanaged waste, sorted in descending order.

country	Mismanaged_waste	Total_pop	Waste_percapita
China	8819717	1341335152	0.092
Indonesia	3216856	239870937	0.047
Philippines	1883659	93260798	0.062
Vietnam	1833819	87848445	0.090
Sri Lanka	1591179	20859949	0.299
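
As a hint rather than the hidden code itself, the general pattern (sort, keep the first few rows, pass the result to kable()) can be sketched on made-up data. The tibble toy_rain and its columns are hypothetical; your own table should use the joined data and whichever columns suit your narrative.

```r
library(dplyr)
library(knitr)

# Toy data, invented for illustration only
toy_rain <- tibble::tibble(
  city        = c("A", "B", "C", "D"),
  rainfall_mm = c(120, 340, 90, 210)
)

top2 <- toy_rain %>%
  arrange(desc(rainfall_mm)) %>%  # largest values first
  head(2)                         # keep only the first two rows

# kable() turns the small tibble into a formatted table when the document is knit
kable(top2, caption = "Two wettest cities (toy data)")
```

Keeping the table to a handful of rows with head() is what stops it from overwhelming the report.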

Create a data visualization

Discuss findings

You should say some things like…
“It was surprising that countries with small populations had some of the largest mismanaged waste per capita because…”
or
“I was surprised that the USA had a pretty low mismanaged waste per capita. It seems like the USA has a lot of mismanaged waste to me!”
or
“It makes sense that wealthy nations like Sweden, Canada, and Japan would produce less mismanaged waste per capita. In contrast, developing nations like Sri Lanka would face many waste management challenges because of a lack of infrastructure, which would lead to high levels of mismanaged waste despite small population sizes.”
or
“I struggled when figuring out how to remove missing values.”
or
“It was difficult to come up with an interesting and informative figure for these data because…”