# Load libraries
library(tidyverse)
library(leaflet)
library(ggfortify)
library(plotly)
library(highcharter)
# Read data
setwd("~/DATA 110")
bleaching <- read_csv("global_bleaching_environmental.csv")Global Coral Bleaching
Introduction & Background Information
This project explores the Biological and Chemical Oceanography Data Management Office’s (BCO-DMO) global coral bleaching data base. Coral bleaching is when corals become stressed and expel the algae that lives in them and with which they have a symbiotic relationship. Changing conditions can cause this to happen; including changes in temperature, light, and nutrients (Source 2). This data set includes the percentage of an identified group of coral that is bleached, along with other factors about the area. These factors include temperature, sea surface temperature anomaly, date of measurement, exposure, turbidity, location, and more. This data is a compilation from seven different sources and is identified in the data_source variable. The creators of this data set cleaned and standardized the contributions of their sources and then added some data including distance to land and cyclone frequency (Source 3). I struggled to find a dataset that was acceptable for this project so after searching for a long time I was happy to find this one. In addition to it having the requirements for the project I chose this dataset because I wanted to make a map (it has latitude and longitude variables) and because I am interested in environmental data. This dataset needed some cleaning, it used the string “nd” rather than NA for missing values to I had to convert them. Due to this string many numeric variables were interpenetrated as character values, so I used mutate_at to fix the variable type. I also made the variable names lowercase to make them easier to work with.
Libraries & Data
Data Cleaning
# Replace nd with NA and copy data into new clean data frame (Source 4)
bleaching2 <- data.frame(lapply(bleaching, function(x){
gsub("nd", NA, x)
}))
# Make column names lowercase
names(bleaching2) <- tolower(names(bleaching2))
# Convert proper columns to numerical variables (Source 5)
bleaching2 <- bleaching2 |>
mutate_at(vars(latitude_degrees, longitude_degrees, distance_to_shore, turbidity, cyclone_frequency, date_day, date_month, date_year, depth_m, percent_cover, percent_bleaching, climsst, temperature_kelvin, temperature_mean, temperature_minimum, temperature_maximum, temperature_kelvin_standard_deviation, windspeed, ssta, ssta_standard_deviation, ssta_mean, ssta_minimum, ssta_maximum, ssta_frequency, ssta_frequency_standard_deviation, ssta_frequencymax, ssta_frequencymean, ssta_dhw, ssta_dhw_standard_deviation, ssta_dhwmax, ssta_dhwmean, tsa, tsa_standard_deviation, tsa_minimum, tsa_maximum, tsa_mean), as.numeric)
# Make sure all observations include bleaching
bleaching2 <- bleaching2 |>
filter(!is.na(percent_bleaching))Linear Regression
# Model
mod <- lm(percent_bleaching ~ country_name + ocean_name + exposure + date_month + depth_m + temperature_kelvin + temperature_minimum + temperature_maximum + ssta_minimum + ssta_frequency + ssta_frequencymean + ssta_dhw + longitude_degrees + latitude_degrees, data = bleaching2)
summary(mod)
Call:
lm(formula = percent_bleaching ~ country_name + ocean_name +
exposure + date_month + depth_m + temperature_kelvin + temperature_minimum +
temperature_maximum + ssta_minimum + ssta_frequency + ssta_frequencymean +
ssta_dhw + longitude_degrees + latitude_degrees, data = bleaching2)
Residuals:
Min 1Q Median 3Q Max
-64.377 -6.514 -2.083 2.235 97.754
Coefficients: (2 not defined because of singularities)
Estimate Std. Error t value
(Intercept) -613.62385 36.31695 -16.896
country_nameBahamas 20.07060 2.85358 7.033
country_nameBarbados 20.83533 3.17760 6.557
country_nameBelize 12.20722 2.80007 4.360
country_nameBermuda 23.19369 3.81250 6.084
country_nameBrazil -2.84615 2.31493 -1.229
country_nameBrunei -1.13899 2.00394 -0.568
country_nameCambodia 9.78313 1.90080 5.147
country_nameChile 84.66719 15.18558 5.575
country_nameChina 16.41366 1.85149 8.865
country_nameColombia 11.22013 2.79946 4.008
country_nameCosta Rica 22.69047 3.33338 6.807
country_nameCuba 12.26630 2.81851 4.352
country_nameDjibouti 13.89439 4.36073 3.186
country_nameDominica 21.03332 2.98254 7.052
country_nameDominican Republic 12.46203 2.82653 4.409
country_nameEast Timor -3.07915 2.73980 -1.124
country_nameEcuador 35.13612 5.45693 6.439
country_nameEgypt 19.01531 2.24592 8.467
country_nameFederated States of Micronesia 0.56888 1.88479 0.302
country_nameFiji -4.10596 0.71720 -5.725
country_nameFrance 9.65179 2.84714 3.390
country_nameFrench Polynesia 5.09022 2.22557 2.287
country_nameGrenada 5.70410 3.07667 1.854
country_nameGuatemala 11.12888 5.97154 1.864
country_nameHaiti 4.88167 2.95616 1.651
country_nameIran 16.77229 3.36976 4.977
country_nameJamaica 10.09284 2.78372 3.626
country_nameJapan 32.24098 2.09349 15.401
country_nameJordan 22.73794 4.92163 4.620
country_nameKiribati 6.39640 7.63197 0.838
country_nameKuwait 17.51272 7.82932 2.237
country_nameMalaysia 2.89600 1.35903 2.131
country_nameMartinique 24.99494 4.01943 6.219
country_nameMexico 29.64111 2.82091 10.508
country_nameMontserrat 7.13699 4.02782 1.772
country_nameMyanmar 8.33343 2.59995 3.205
country_nameNew Caledonia -5.55949 0.86784 -6.406
country_nameNicaragua 5.80774 5.94482 0.977
country_nameOman 13.32574 2.26712 5.878
country_namePalau 12.87995 3.24423 3.970
country_namePanama 12.60334 2.71721 4.638
country_namePapua New Guinea 2.90364 1.79728 1.616
country_namePhilippines 8.09123 1.56843 5.159
country_nameSaint Lucia 8.97886 2.86230 3.137
country_nameSao Tome & Principe -3.22641 3.20984 -1.005
country_nameSaudi Arabia 19.88694 3.30816 6.011
country_nameSudan 18.17439 3.11630 5.832
country_nameTaiwan 13.02591 2.01636 6.460
country_nameTonga 4.57158 10.90052 0.419
country_nameUnited Arab Emirates 21.18400 3.07792 6.883
country_nameUnited Kingdom 29.78273 2.88367 10.328
country_nameUnited States 26.07635 2.74994 9.483
country_nameVanuatu -3.31815 1.45305 -2.284
country_nameVenezuela 2.29802 2.97669 0.772
country_nameVietnam 10.80267 1.67104 6.465
country_nameYemen 11.80272 4.68499 2.519
ocean_nameAtlantic 8.97962 1.26216 7.115
ocean_namePacific NA NA NA
ocean_nameRed Sea NA NA NA
exposureSheltered 0.36834 0.24220 1.521
exposureSometimes 2.74749 0.37643 7.299
date_month 0.24934 0.03528 7.067
depth_m 0.30738 0.02454 12.523
temperature_kelvin 0.64095 0.06103 10.503
temperature_minimum 0.51431 0.09706 5.299
temperature_maximum 0.83851 0.10432 8.038
ssta_minimum -1.42894 0.22022 -6.489
ssta_frequency 0.49446 0.01984 24.925
ssta_frequencymean -1.07811 0.06144 -17.547
ssta_dhw 0.48583 0.02760 17.605
longitude_degrees 0.02782 0.00697 3.992
latitude_degrees -0.49979 0.04160 -12.015
Pr(>|t|)
(Intercept) < 2e-16 ***
country_nameBahamas 2.06e-12 ***
country_nameBarbados 5.59e-11 ***
country_nameBelize 1.31e-05 ***
country_nameBermuda 1.19e-09 ***
country_nameBrazil 0.218905
country_nameBrunei 0.569786
country_nameCambodia 2.67e-07 ***
country_nameChile 2.49e-08 ***
country_nameChina < 2e-16 ***
country_nameColombia 6.14e-05 ***
country_nameCosta Rica 1.02e-11 ***
country_nameCuba 1.35e-05 ***
country_nameDjibouti 0.001443 **
country_nameDominica 1.80e-12 ***
country_nameDominican Republic 1.04e-05 ***
country_nameEast Timor 0.261083
country_nameEcuador 1.22e-10 ***
country_nameEgypt < 2e-16 ***
country_nameFederated States of Micronesia 0.762784
country_nameFiji 1.05e-08 ***
country_nameFrance 0.000700 ***
country_nameFrench Polynesia 0.022195 *
country_nameGrenada 0.063752 .
country_nameGuatemala 0.062382 .
country_nameHaiti 0.098678 .
country_nameIran 6.49e-07 ***
country_nameJamaica 0.000289 ***
country_nameJapan < 2e-16 ***
country_nameJordan 3.86e-06 ***
country_nameKiribati 0.401979
country_nameKuwait 0.025307 *
country_nameMalaysia 0.033104 *
country_nameMartinique 5.09e-10 ***
country_nameMexico < 2e-16 ***
country_nameMontserrat 0.076419 .
country_nameMyanmar 0.001351 **
country_nameNew Caledonia 1.52e-10 ***
country_nameNicaragua 0.328607
country_nameOman 4.21e-09 ***
country_namePalau 7.20e-05 ***
country_namePanama 3.53e-06 ***
country_namePapua New Guinea 0.106198
country_namePhilippines 2.50e-07 ***
country_nameSaint Lucia 0.001709 **
country_nameSao Tome & Principe 0.314828
country_nameSaudi Arabia 1.86e-09 ***
country_nameSudan 5.54e-09 ***
country_nameTaiwan 1.06e-10 ***
country_nameTonga 0.674933
country_nameUnited Arab Emirates 6.01e-12 ***
country_nameUnited Kingdom < 2e-16 ***
country_nameUnited States < 2e-16 ***
country_nameVanuatu 0.022405 *
country_nameVenezuela 0.440118
country_nameVietnam 1.03e-10 ***
country_nameYemen 0.011766 *
ocean_nameAtlantic 1.15e-12 ***
ocean_namePacific NA
ocean_nameRed Sea NA
exposureSheltered 0.128314
exposureSometimes 2.99e-13 ***
date_month 1.62e-12 ***
depth_m < 2e-16 ***
temperature_kelvin < 2e-16 ***
temperature_minimum 1.17e-07 ***
temperature_maximum 9.51e-16 ***
ssta_minimum 8.80e-11 ***
ssta_frequency < 2e-16 ***
ssta_frequencymean < 2e-16 ***
ssta_dhw < 2e-16 ***
longitude_degrees 6.58e-05 ***
latitude_degrees < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.06 on 26824 degrees of freedom
(7620 observations deleted due to missingness)
Multiple R-squared: 0.2958, Adjusted R-squared: 0.294
F-statistic: 161 on 70 and 26824 DF, p-value: < 2.2e-16
# Diagnostic plots
autoplot(mod, 1:4, nrow=2, ncol=2)Warning: Removed 10 rows containing missing values or values outside the scale range
(`geom_line()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_segment()`).

Equation
percent_bleaching = 20.07060(Bahamas) + 20.83533(Barbados) + 12.20722(Belize) + 23.19369(Bermuda) - 2.84615(Brazil) - 1.13899(Brunei) + 9.78313(Cambodia) + 84.66719(Chile) + 16.41366(China) + 11.22013(Colombia) + 22.69047(Costa Rica) + 12.26630(Cuba) + 13.89439(Djibouti) + 21.03332(Dominica) + 12.46203(Dominican Republic) - 3.07915(Timor) + 35.13612(Ecuador) + 19.01531(Egypt) + 0.56888(Federated States of Micronesia) - 4.10596(Fiji) + 9.65179(France) + 5.09022(French Polynesia) + 5.70410(Grenada) + 11.12888(Guatemala) + 4.88167(Haiti) + 16.77229(Jamaica) + 32.24098(Japan) + 22.73794(Jordan) + 6.39640(Kiribati) + 17.51272(Kuwait) + 2.89600(Malaysia) + 24.99494(Martinique) + 29.64111(Mexico) + 7.13699(Montserrat) + 8.33343(Myanmar) - 5.55949(New Caledonia) + 5.80774(Nicaragua) + 13.32574(Oman) + 12.87995(Palau) 12.60334(Panama) + 2.90364(Papua New Guinea) + 8.09123(Philippines) + 8.97886(Saint Lucia) - 3.22641(Sao Tome & Principe) + 19.88694(Saudi Arabia) + 18.17439(Sudan) + 13.02591(Taiwan) + 4.57158(Tonga) + 21.18400(United Arab Emirates + 29.78273(United Kingdom) + 26.07635(United States) - 3.31815(Vanuatu) + 2.29802(Venezuela) + 10.80267(Vietnam) + 11.80272(Yemen) + 8.97962(Atlantic) + 0.36834(Sheltered) + 2.74749(Sometimes) + 0.24934(month) + 0.30738(depth) + 0.64095(Temperature) + 0.51431(Temperature min) + 0.83851(Temperature max) - 1.42894(ssta min) + 0.49446(ssta frequency) - 1.07811(ssta frequency mean) + 0.48583(ssta dhw) + 0.02782(longitude) - 0.49979(latitude) - 613.62385
Analysis
This model attempts to predict the percentage of bleached coral in each given observation based on several environmental factors. All the p-values for numerical variables are < 0.001 making them very significant for the model. Each categorical variable has at least one factor with p < 0.001. The model shows some trends such as higher temperatures, deeper water, sea surface temperature anomaly frequency all correlating with higher bleaching percentages. The adjusted R^2 is 0.294 which means that the model was able to explain 29.4% of the variability in the data. The model’s accuracy being so low could suggest that the data is missing one or more significant predictors and/or that a linear model isn’t appropriate. The diagnostic plots support the idea that a linear model may not be appropriate because the residuals form a strange square pattern, and the qq plot is not at all linear looking more like part of a logistic curve.
Global Coral Reef Bleaching Map
# Make color palette (Source 6)
cols <- colorNumeric(palette = c("#FF007B", "#8777B9", "#00FBFF", "#FFFFFF"), domain = bleaching2$percent_bleaching)
# Make tooltip
tooltip <- paste0("<b>Percent Bleaching: </b>", bleaching2$percent_bleaching, "<b>%</b>", "<br>",
"<b>Date: </b>", bleaching2$date, "<br>")
# Final Map
leaflet() |>
setView(lng = 0, lat = 0, zoom = 1) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(data = bleaching2, lat = bleaching2$latitude_degrees, lng = bleaching2$longitude_degrees,
color = ~cols(bleaching2$percent_bleaching), radius = 80000, popup = tooltip) |>
addLegend(pal = cols, values = bleaching2$percent_bleaching, group = "circles", position = "bottomleft", title = "Percent Bleaching") |> # (Source 7)
addControl(position = "bottomright", html = "Source: Biological and Chemical Oceanography Data Management Office (BCO-DMO)") |> # (Source 8)
addControl(position = "topright", html = "<b>Global Coral Reef Bleaching</b>")Analysis
This map shows the locations of each observation with each circle being colored by the percent of bleached coral. The tooltip shows the exact bleaching percentage along with the date the data was recorded. This map shows that while there is a lot of variability in how much coral is bleached even between sties that are very close to each other, there are still some locations that in general experience less recorded bleaching than others (north east African coastline has far fewer recorded bleaching events when compared to the reefs near Madagascar). I would like it if the size of the bubbles shrank as the user zoomed in on part of the map to make it easier to see overlapping circles.
Line Graph
# Set colors
cols <- c("#8777B9", "#FF007B")
# Dataset of only eastern hemisphere
east <- bleaching2 |>
filter(longitude_degrees > 0)
# Make trend line data by finding averages for each month
eastByMonth <- data.frame(month = c("Jan", "Feb", "March", "April", "May", "June", "July", "Aug", "Sept", "Oct", "Nov", "Dec"))
for(i in 1:12){ # (Source 9)
avg <- east |>
filter(date_month == i & !is.na(temperature_kelvin))
eastByMonth$temp[i] <- mean(avg$temperature_kelvin)
}
for(i in 1:12){
avg <- east |>
filter(date_month == i)
eastByMonth$bleach[i] <- mean(avg$percent_bleaching)
}
# Final graph for the east
highchart() |>
hc_yAxis_multiples(
list(title = list(text = "Percent Bleaching")),
list(title = list(text = "Temperature (kelvin)"),
opposite = TRUE)) |>
hc_add_series(data = eastByMonth$bleach,
name = "Percent Bleaching", type = "line", yAxis = 0) |>
hc_add_series(data = eastByMonth$temp,
name = "Temperature (kelvin)", type = "line", yAxis = 1) |>
hc_xAxis(categories = eastByMonth$month, categoryorder = "category ascending") |>
hc_chart(style = list(fontFamily = "AvantGarde",
fontWeight = "bold")) |>
hc_colors(cols) |>
hc_tooltip(shared = TRUE) |>
hc_title(text = "Eastern Hemisphere Coral Bleaching & Temperature by Month") |>
hc_caption(text = "Source: Biological and Chemical Oceanography Data Management Office (BCO-DMO)")# Dataset of only western hemisphere
west <- bleaching2 |>
filter(longitude_degrees < 0)
# Make trendline data by finding averages for each month
westByMonth <- data.frame(month = c("Jan", "Feb", "March", "April", "May", "June", "July", "Aug", "Sept", "Oct", "Nov", "Dec"))
for(i in 1:12){
avg <- west |>
filter(date_month == i & !is.na(temperature_kelvin))
westByMonth$temp[i] <- mean(avg$temperature_kelvin)
}
for(i in 1:12){
avg <- west |>
filter(date_month == i)
westByMonth$bleach[i] <- mean(avg$percent_bleaching)
}
# Final graph for the west
highchart() |>
hc_yAxis_multiples(
list(title = list(text = "Percent Bleaching")),
list(title = list(text = "Temperature (kelvin)"),
opposite = TRUE)) |>
hc_add_series(data = westByMonth$bleach,
name = "Percent Bleaching", type = "line", yAxis = 0) |>
hc_add_series(data = westByMonth$temp,
name = "Temperature (kelvin)", type = "line", yAxis = 1) |>
hc_xAxis(categories = westByMonth$month, categoryorder = "category ascending") |>
hc_chart(style = list(fontFamily = "AvantGarde",
fontWeight = "bold")) |>
hc_colors(cols) |>
hc_tooltip(shared = TRUE) |> #, pointFormat = "{point.bleach:.2f}: {point.temp:.2f}") |>
hc_title(text = "Western Hemisphere Coral Bleaching & Temperature by Month") |>
hc_caption(text = "Source: Biological and Chemical Oceanography Data Management Office (BCO-DMO)")Analysis
These graphs show the average temperature and average percent bleaching (one for the eastern hemisphere and one for the west). I separated the east and west to better control for temperature differences. I was hoping the bleaching would line up with the temperature to show a cause of bleaching and it’s yearly cycle. While existent in the graphs these trends weren’t as stark as I had hoped (and expected). A temperature vs bleaching graph may have been better at telling that story. Regardless these graphs definitely show the variability of bleaching based on the time of year. It also appears that bleaching is more prevalent in the western hemisphere as it has a max average bleaching of around 22% while the east had a short lived max of around 15%. I wish I had been able to round the values in the tool tip but I couldn’t get the code in the notes to work for my graphs.
Data Collection Graph
p <- bleaching2 |>
ggplot(aes(date_year, group = date_month, fill = date_month)) +
geom_area(stat = "count") +
scale_fill_gradient(high = c("#8777B9", "#FF007B"), low = c("#FFFFFF", "#00FBFF")) +
theme_dark() +
geom_vline(xintercept = 2005, color = "#f7f9fc", size = 1) +
geom_vline(xintercept = 2016, color = "#ffe3fd", size = 1) +
geom_vline(xintercept = 1998, color = "#b9adde", size = 1) +
labs(title = "Coral Bleaching Yearly Data Collection", caption = "Source: Biological and Chemical Oceanography Data Management Office (BCO-DMO)", fill = "Month") +
xlab("Year") +
ylab("Total Observations") +
geom_text(aes(x = 1999.2, y = 3400, label = "2005 Caribbean:\n80% corals bleached\n >40% corals killed"), color = "#f7f9fc", cex=3.5) + # (Source 10)
geom_text(aes(x = 2011.4, y = 2500, label = "2014-2017 Global:\nMost damaging &\nlongest bleaching\nevent"), color = "#ffe3fd", cex=3.5) + # (Source 11)
geom_text(aes(x = 1992, y = 1700, label = "145 total observations\nbefore 1998"), color = "#b9adde", cex=3.5)Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
p
#ggplotly(p)Analysis
This area plot shows how much data was collected each year and in which month is was collected. The oldest observation is from 1980 while the most resent one added to this data set was recorded in 2020. This plot shows that bleaching measurements started to be taken in higher volumes starting in 1998 and then really picked up in 2005 aligning with a record bleaching crisis in the Caribbean. The data seems to be fairly evenly distributed between it’s months of collection. Since there are so many observations plotly takes several minutes to load and then runs slowly once it does. While I can’t be certain that the collection trends of this data are reflective of global collection it is cool to see collection increase when there are major bleaching events.
Sources
https://www.nationalgeographic.com/science/article/coral-bleaching-reefs-climate-change-el-nino-environment
https://oceanservice.noaa.gov/facts/coral_bleach.html
https://www.bco-dmo.org/dataset/773466/data/view
https://stackoverflow.com/questions/29271549/replace-all-occurrences-of-a-string-in-a-data-frame
https://www.geeksforgeeks.org/convert-multiple-columns-to-numeric-using-dplyr/
https://cssgradient.io/
https://rstudio.github.io/leaflet/reference/addLegend.html
R documentation addControl
https://www.w3schools.com/r/r_for_loop.asp
https://www.climate.gov/news-features/featured-images/coral-bleaching-could-be-severe-2005-event
https://www.theguardian.com/environment/2025/apr/23/coral-reef-bleaching-worst-global-event-on-record
