FinalProjectDraft

library(ggplot2)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ lubridate 1.9.4     ✔ tibble    3.3.0
✔ purrr     1.2.0     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

Problem Statement

Plastic production continues to increase by millions of tons annually, with 18% of it coming solely from North America and 46% from Asia (Thomas, et al. 2024).

With this sheer amount of plastic production, there is worry about managing plastic waste. While there are multiple types, one key factor of most plastics is their persistence; Plastic bags take about 100 years to decompose effectively (Dey, et al. 2023). Currently, the most common factors of plastic waste management are decomposition, incineration, and land filling (Dey, et al. 2023). Each of these methods create harmful byproducts for environments and the atmosphere. Land filling and discarded plastic in particular have detrimental effects on coastal ecosystems: specifically coral. When coral reefs come into contact with plastic, “likelihood of disease increases from 4% to 89%” (Lamb, et al. 2018).

One question that may arise: What about recycling? Journalists from the Guardian found that most of the plastic recycled in the US ends up shipped to other countries and sold for profit. In 2015, about 1.6m tons of recycled plastic was sent to China and Hong Kong for processing, however the majority was contaminated or non-recyclable, adding to the plastic waste issue (McCormick, et al. 2019).

We plan on looking at this data of exported plastic from the US and compare it to both quality of life in these countries as well as the health of coral reefs along their coasts. We hope to answer our hypothesis:

Is there a relationship between US plastic exports and coral bleaching from microplastics?

US Plastic Exports

Sums of Plastic Exports

We want to get the sum of plastic exports in each country, to ultimately find which countries are taking in the most total exports.

exports <- read.csv("data/PlasticExportsAllCountries.csv")
exports$Country <- replace(exports$Country, exports$Country=="Korea, South", "South Korea") # fix names 
countriesALL <- unique(exports$Country)
sums <- c()
for (co in countriesALL){
  temp <- exports |>
    filter(Country == co)
  sums <- c(sums, sum(temp$Vessel.Total.Exports.SWT..kg.))
}
total <- data.frame(countriesALL, sums) |>
  arrange(desc(sums)) |>
  slice(1:10)

These will be our actual countries of interest, these are the top 10 countries that have taken the largest sum of US plastics from 2002-2025.

countries <- total$countriesALL
plastics <- exports |>
  filter(Country %in% countries)
countries
 [1] "China"       "Turkey"      "India"       "South Korea" "Taiwan"     
 [6] "Thailand"    "Malaysia"    "Vietnam"     "Mexico"      "Indonesia"  

Change Over Time

Now we can plot each of these top 10 countries, to see how much plastic they import from the US has changed over time.

plastics |>
  ggplot(aes(
    x=Time, 
    y=Vessel.Total.Exports.SWT..kg., 
    group=Country, 
    color=Country)) +
  geom_line(linewidth = 1) +
  ylim(0,1e+10) +
  scale_color_manual(values = c("China" = "red", 
                                "India" = "orange", 
                                "Indonesia" = "yellow", 
                                "South Korea" = "green", 
                                "Malaysia" = "cyan", 
                                "Mexico" = "blue", 
                                "Taiwan" = "purple", 
                                "Thailand"="magenta", 
                                "Turkey"="brown", 
                                "Vietnam" = "black"))

China used to take so many imports it goes off the grid!! We can see those specific values below.

china <- plastics |>
  filter(Country == "China")
china[4:5]
                Time Vessel.Total.Exports.SWT..kg.
1               2002                    6837504264
2               2003                    9657904605
3               2004                   10056645383
4               2005                   12644015078
5               2006                   13366723862
6               2007                   15130617662
7               2008                   15347431005
8               2009                   21320328280
9               2010                   17548150307
10              2011                   21869331900
11              2012                   19426783263
12              2013                   18037054581
13              2014                   16449414539
14              2015                   16450206494
15              2016                   16026925729
16              2017                   13780362926
17              2018                    8833965905
18              2019                    5819363579
19              2020                    4800312085
20              2021                     870630978
21              2022                     748688923
22              2023                     877117693
23              2024                     844556759
24 2025 through July                     151550283

Looking at the plot, what interesting changes do you see? One general trend is how other countries seem to increase after China’s sharp decrease. But as of 2025, each country seems to be decreasing how much plastic they are accepting from the US.

Difference in 2025

We can also see if these are still the top 10 countries in 2025. Let’s get the 2025 top 10 below.

top10_2025 <- plastics |>
  filter(Time == "2025 through July") |>
  arrange(desc(Vessel.Total.Exports.SWT..kg.)) |>
  slice(1:10)
top10_2025 <- top10_2025$Country
top10_2025
 [1] "India"       "Turkey"      "Thailand"    "Vietnam"     "Malaysia"   
 [6] "Taiwan"      "South Korea" "Mexico"      "Indonesia"   "China"      

Let’s see the difference between the total (2002-2025) top 10 and the 2025 top 10!

intersect(countries, top10_2025)
 [1] "China"       "Turkey"      "India"       "South Korea" "Taiwan"     
 [6] "Thailand"    "Malaysia"    "Vietnam"     "Mexico"      "Indonesia"  

They are still the same! Even with stricter import laws in Hong Kong and China, they are still at the top of the list for US plastic imports, although the actual placements in the top 10 may have changed.

Quality of Life

Combining data

We have data for the GDP and Life expectancy in each country over time, lets combine that with our plastic data to see if there are any trends with quality of life and how much plastic is imported from the US.

quality <- read.csv("data/life-expectancy-vs-gdp-per-capita.csv")
plastics <- plastics |>
      mutate(Time = as.integer(Time))
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `Time = as.integer(Time)`.
Caused by warning:
! NAs introduced by coercion
qualityPlastic <- left_join(x = quality, y = plastics, by=join_by(Entity == Country, Year == Time))

Now we have added a column in our country data to include the kilograms of plastic imported from the US in each country.

Plotting the data

Lets start by plotting our top 10 countries using the most recent data we have in 2022. We can size each point by how much plastic it imported from the US in that year.

# Following: https://plotly.com/ggplot2/animations/

qualityPlastic$Vessel.Total.Exports.SWT..kg.[is.na(qualityPlastic$Vessel.Total.Exports.SWT..kg.)] <- 3 # Make sure US is not size 0
p <- qualityPlastic |>
  filter(Entity %in% countries & Year > 2002 & Year < 2023) 
plot <- p|>
  ggplot(aes(x = gdp_per_capita,
             y = life_expectancy_0,
             color = Entity)) +
  scale_color_manual(values = c("China" = "red",
                                "India" = "orange",
                                "Indonesia" = "yellow",
                                "South Korea" = "green",
                                "Malaysia" = "cyan",
                                "Mexico" = "blue",
                                "Taiwan" = "purple",
                                "Thailand"="magenta",
                                "Turkey"="brown",
                                "Vietnam" = "black",
                                "United States" = "gray")) +
  geom_point(aes(size=Vessel.Total.Exports.SWT..kg., frame = Year)) +
  geom_point(data=qualityPlastic |>
  filter(Entity=="United States" & Year > 2002 & Year < 2023), aes(frame = Year))
Warning in geom_point(aes(size = Vessel.Total.Exports.SWT..kg., frame = Year)):
Ignoring unknown aesthetics: frame
Warning in geom_point(data = filter(qualityPlastic, Entity == "United States" &
: Ignoring unknown aesthetics: frame
ggplotly(plot)

While there are dots all over this plot, we can see each of our countries had a lower GDP than the small gray dot of the US. Surprisingly, though, there are a wide variety of life expectancies among the top 10 countries to take in plastic imported from the US.

Surprising Regressions!

While it may seem like large amounts of plastic imports would detriment rather than help a country’s wellbeing, there is actually a strong positive trend between kilograms of plastic imports and both GDP and life expectancy!

What do you think contributes to this relationship?

GDPRel <- lm(Vessel.Total.Exports.SWT..kg. ~ gdp_per_capita, data = qualityPlastic)
summary(GDPRel)

Call:
lm(formula = Vessel.Total.Exports.SWT..kg. ~ gdp_per_capita, 
    data = qualityPlastic)

Residuals:
       Min         1Q     Median         3Q        Max 
-3.714e+08 -2.750e+07 -1.751e+07 -1.507e+07  2.183e+10 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    1.182e+07  3.888e+06    3.04  0.00237 ** 
gdp_per_capita 2.247e+03  3.053e+02    7.36 1.91e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.81e+08 on 21584 degrees of freedom
  (42972 observations deleted due to missingness)
Multiple R-squared:  0.002503,  Adjusted R-squared:  0.002457 
F-statistic: 54.17 on 1 and 21584 DF,  p-value: 1.906e-13
ggplot(data = qualityPlastic, aes(x = Vessel.Total.Exports.SWT..kg., y = gdp_per_capita)) +
  geom_point() +
  geom_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 42972 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 42972 rows containing missing values or values outside the scale range
(`geom_point()`).

LifeRel <- lm(Vessel.Total.Exports.SWT..kg. ~ life_expectancy_0, data = qualityPlastic)
summary(GDPRel)

Call:
lm(formula = Vessel.Total.Exports.SWT..kg. ~ gdp_per_capita, 
    data = qualityPlastic)

Residuals:
       Min         1Q     Median         3Q        Max 
-3.714e+08 -2.750e+07 -1.751e+07 -1.507e+07  2.183e+10 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    1.182e+07  3.888e+06    3.04  0.00237 ** 
gdp_per_capita 2.247e+03  3.053e+02    7.36 1.91e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.81e+08 on 21584 degrees of freedom
  (42972 observations deleted due to missingness)
Multiple R-squared:  0.002503,  Adjusted R-squared:  0.002457 
F-statistic: 54.17 on 1 and 21584 DF,  p-value: 1.906e-13
ggplot(data = qualityPlastic, 
       aes(
         x = Vessel.Total.Exports.SWT..kg., 
         y = life_expectancy_0)) +
  geom_point() +
  geom_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 42993 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 42993 rows containing missing values or values outside the scale range
(`geom_point()`).

Coral and Plastic

Bleaching Stats

Bleaching <- read_csv("data/global_bleaching_environmental (3).csv")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 41361 Columns: 62
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (52): Data_Source, Ocean_Name, Reef_ID, Realm_Name, Ecoregion_Name, Cou...
dbl   (9): Site_ID, Sample_ID, Latitude_Degrees, Longitude_Degrees, Turbidit...
date  (1): Date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Bleaching <- Bleaching |>
  filter(Date_Month == "12")

Bleaching$Country_Name <- replace(Bleaching$Country_Name, Bleaching$Country_Name=="Japan", "South Korea")
# Japan and South Korea share coral reefs hence above code
Bleaching$Percent_Bleaching <- replace(Bleaching$Percent_Bleaching, Bleaching$Percent_Bleaching=="nd", "0")
countries_use <- c(
  "China", "India", "Indonesia", "South Korea", "Malaysia", "Mexico",
  "Taiwan", "Thailand", "Vietnam"
)



# explained by https://www.statology.org/r-select-rows-by-condition/
clean_bleaching <- Bleaching[Bleaching$Country_Name %in% countries_use,
                      c("Country_Name", "Percent_Bleaching", "Date_Year")]
clean_bleaching <- clean_bleaching |>
  dplyr::mutate(
    Percent_Bleaching = dplyr::na_if(Percent_Bleaching, "nd"),
    Percent_Bleaching = as.numeric(Percent_Bleaching)
  )
view(clean_bleaching)

This filters out the countries that were included in the top 10 plastic imports.

clean_bleaching |>
  ggplot(aes(
    x = Country_Name,
    y = Percent_Bleaching,
    fill = Country_Name
  )) +
  geom_col() +
  scale_fill_manual(values = c(
    "China" = "red",
    "India" = "orange",
    "Indonesia" = "yellow",
    "Malaysia" = "cyan",
    "Mexico" = "blue",
    "Taiwan" = "purple",
    "Thailand" = "magenta",
    "Vietnam" = "black",
    "South Korea" = "green"
  )) +
  ylim(0, 100) +
  labs(
    x = "Country",
    y = "Percent Bleaching",
    title = "Coral Bleaching Percent by Country"
  ) 
Warning: Removed 236 rows containing missing values or values outside the scale range
(`geom_col()`).

##Microplastics

MarineMicroplastics <- read_csv("data/Marine_Microplastics_WGS84_8553846406879449657.csv")
Rows: 22530 Columns: 36
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (20): Ocean, Region, Subregion, Country, State, Beach Location, Marine S...
dbl (16): OBJECTID, Latitude (degree), Longitude(degree), Ocean Bottom Depth...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
countries_use <- c( "China", "India", "Indonesia", "South Korea", "Malaysia", "Mexico", "Taiwan", "Thailand", "Vietnam" ) 

MarineMicroplastics <- MarineMicroplastics %>%
  #https://www.statology.org/is-na/
  mutate(
    Country = case_when(
      is.na(Country) & Region == "Bay of Bengal"     ~ "India",
      is.na(Country) & Region == "Java Sea"          ~ "Indonesia",
      is.na(Country) & Region == "East China Sea"    ~ "South Korea",
      is.na(Country) & Region == "Malacca Strait"   ~ "Malaysia",
      is.na(Country) & Region == "South China Sea" ~ "Vietnam",
      TRUE ~ Country
    )
  )


clean_microplastics <- MarineMicroplastics[MarineMicroplastics$Country %in% countries_use, c("Country", "Region", "Concentration class range")]
view(clean_microplastics)
ggplot(clean_microplastics,
       aes(x = Country, fill = `Country`)) +
  geom_bar() +
  scale_fill_manual(values = c(
    "China" = "red",
    "India" = "orange",
    "Indonesia" = "yellow",
    "Malaysia" = "cyan",
    "Mexico" = "blue",
    "Taiwan" = "purple",
    "Thailand" = "magenta",
    "Vietnam" = "black",
    "South Korea" = "green"
  )) +
  labs(
    title = "Microplastic Concentration Class Ranges by Country",
    x = "Country",
    y = "Count",
    fill = "Country"
  )

Relation to Plastic Imports

importCoral <- left_join(x = clean_bleaching, y = plastics, by=join_by(Country_Name == Country, Date_Year == Time))
importCoral |>
  ggplot(aes(
    x=Vessel.Total.Exports.SWT..kg., 
    y=Percent_Bleaching, 
    #group=Country, 
    color=Country_Name)) +
  geom_point() +
  scale_color_manual(values = c("China" = "red", 
                                "India" = "orange", 
                                "Indonesia" = "yellow", 
                                "South Korea" = "green", 
                                "Malaysia" = "cyan", 
                                "Mexico" = "blue", 
                                "Taiwan" = "purple", 
                                "Thailand"="magenta", 
                                "Turkey"="brown", 
                                "Vietnam" = "black")) +
  labs(title="Percent Bleaching vs US Plastic Imports (2002-2019)")
Warning: Removed 37 rows containing missing values or values outside the scale range
(`geom_point()`).