This project explores the Biological and Chemical Oceanography Data Management Office’s (BCO-DMO) global coral bleaching data base. Coral bleaching is when corals become stressed and expel the algae that lives in them and with which they have a symbiotic relationship. Changing conditions can cause this to happen; including changes in temperature, light, and nutrients (Source 2). This data set includes the percentage of an identified group of coral that is bleached, along with other factors about the area. These factors include temperature, sea surface temperature anomaly, date of measurement, exposure, turbidity, location, and more. This data is a compilation from seven different sources and is identified in the data_source variable. The creators of this data set cleaned and standardized the contributions of their sources and then added some data including distance to land and cyclone frequency (Source 3). I struggled to find a dataset that was acceptable for this project so after searching for a long time I was happy to find this one. In addition to it having the requirements for the project I chose this dataset because I wanted to make a map (it has latitude and longitude variables) and because I am interested in environmental data. This dataset needed some cleaning, it used the string “nd” rather than NA for missing values to I had to convert them. Due to this string many numeric variables were interpenetrated as character values, so I used mutate_at to fix the variable type. I also made the variable names lowercase to make them easier to work with.
Libraries & Data
# Load librarieslibrary(tidyverse)
Warning: package 'ggplot2' was built under R version 4.4.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(leaflet)
Warning: package 'leaflet' was built under R version 4.4.3
library(ggfortify)
Warning: package 'ggfortify' was built under R version 4.4.2
library(plotly)
Warning: package 'plotly' was built under R version 4.4.3
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(highcharter)
Warning: package 'highcharter' was built under R version 4.4.3
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 41361 Columns: 62
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (52): Data_Source, Ocean_Name, Reef_ID, Realm_Name, Ecoregion_Name, Cou...
dbl (9): Site_ID, Sample_ID, Latitude_Degrees, Longitude_Degrees, Turbidit...
date (1): Date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Data Cleaning
# Replace nd with NA and copy data into new clean data frame (Source 4)bleaching2 <-data.frame(lapply(bleaching, function(x){gsub("nd", NA, x)}))# Make column names lowercasenames(bleaching2) <-tolower(names(bleaching2))# Convert proper columns to numerical variables (Source 5)bleaching2 <- bleaching2 |>mutate_at(vars(latitude_degrees, longitude_degrees, distance_to_shore, turbidity, cyclone_frequency, date_day, date_month, date_year, depth_m, percent_cover, percent_bleaching, climsst, temperature_kelvin, temperature_mean, temperature_minimum, temperature_maximum, temperature_kelvin_standard_deviation, windspeed, ssta, ssta_standard_deviation, ssta_mean, ssta_minimum, ssta_maximum, ssta_frequency, ssta_frequency_standard_deviation, ssta_frequencymax, ssta_frequencymean, ssta_dhw, ssta_dhw_standard_deviation, ssta_dhwmax, ssta_dhwmean, tsa, tsa_standard_deviation, tsa_minimum, tsa_maximum, tsa_mean), as.numeric)# Make sure all observations include bleachingbleaching2 <- bleaching2 |>filter(!is.na(percent_bleaching))
This model attempts to predict the percentage of bleached coral in each given observation based on several environmental factors. All the p-values for numerical variables are < 0.001 making them very significant for the model. Each categorical variable has at least one factor with p < 0.001. The model shows some trends such as higher temperatures, deeper water, sea surface temperature anomaly frequency all correlating with higher bleaching percentages. The adjusted R^2 is 0.294 which means that the model was able to explain 29.4% of the variability in the data. The model’s accuracy being so low could suggest that the data is missing one or more significant predictors and/or that a linear model isn’t appropriate. The diagnostic plots support the idea that a linear model may not be appropriate because the residuals form a strange square pattern, and the qq plot is not at all linear looking more like part of a logistic curve.
Global Coral Reef Bleaching Map
Make color palette (Source 6)
cols <- colorNumeric(palette = c("#FF007B", "#8777B9", "#00FBFF", "#FFFFFF"), domain = bleaching2$percent_bleaching)
Make tooltip
tooltip <- paste0("<b>Percent Bleaching: </b>", bleaching2$percent_bleaching, "<b>%</b>", "<br>",
"<b>Date: </b>", bleaching2$date, "<br>")
Final Map
leaflet() |>
setView(lng = 0, lat = 0, zoom = 1) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(data = bleaching2, lat = bleaching2$latitude_degrees, lng = bleaching2$longitude_degrees,
color = ~cols(bleaching2$percent_bleaching), radius = 80000, popup = tooltip) |>
addLegend(pal = cols, values = bleaching2$percent_bleaching, group = "circles", position = "bottomleft", title = "Percent Bleaching") |> # (Source 7)
addControl(position = "bottomright", html = "Source: Biological and Chemical Oceanography Data Management Office (BCO-DMO)") |> # (Source 8)
addControl(position = "topright", html = "<b>Global Coral Reef Bleaching</b>")
Analysis
This map shows the locations of each observation with each circle being colored by the percent of bleached coral. The tooltip shows the exact bleaching percentage along with the date the data was recorded. This map shows that while there is a lot of variability in how much coral is bleached even between sties that are very close to each other, there are still some locations that in general experience less recorded bleaching than others (north east African coastline has far fewer recorded bleaching events when compared to the reefs near Madagascar). I would like it if the size of the bubbles shrank as the user zoomed in on part of the map to make it easier to see overlapping circles.
Line Graph
# Set colorscols <-c("#8777B9", "#FF007B")# Dataset of only eastern hemisphereeast <- bleaching2 |>filter(longitude_degrees >0)# Make trend line data by finding averages for each montheastByMonth <-data.frame(month =c("Jan", "Feb", "March", "April", "May", "June", "July", "Aug", "Sept", "Oct", "Nov", "Dec"))for(i in1:12){ # (Source 9) avg <- east |>filter(date_month == i &!is.na(temperature_kelvin)) eastByMonth$temp[i] <-mean(avg$temperature_kelvin)}for(i in1:12){ avg <- east |>filter(date_month == i) eastByMonth$bleach[i] <-mean(avg$percent_bleaching)}# Final graph for the easthighchart() |>hc_yAxis_multiples(list(title =list(text ="Percent Bleaching")),list(title =list(text ="Temperature (kelvin)"),opposite =TRUE)) |>hc_add_series(data = eastByMonth$bleach, name ="Percent Bleaching", type ="line", yAxis =0) |>hc_add_series(data = eastByMonth$temp, name ="Temperature (kelvin)", type ="line", yAxis =1) |>hc_xAxis(categories = eastByMonth$month, categoryorder ="category ascending") |>hc_chart(style =list(fontFamily ="AvantGarde",fontWeight ="bold")) |>hc_colors(cols) |>hc_tooltip(shared =TRUE) |>hc_title(text ="Eastern Hemisphere Coral Bleaching & Temperature by Month") |>hc_caption(text ="Source: Biological and Chemical Oceanography Data Management Office (BCO-DMO)")
# Dataset of only western hemispherewest <- bleaching2 |>filter(longitude_degrees <0)# Make trendline data by finding averages for each monthwestByMonth <-data.frame(month =c("Jan", "Feb", "March", "April", "May", "June", "July", "Aug", "Sept", "Oct", "Nov", "Dec"))for(i in1:12){ avg <- west |>filter(date_month == i &!is.na(temperature_kelvin)) westByMonth$temp[i] <-mean(avg$temperature_kelvin)}for(i in1:12){ avg <- west |>filter(date_month == i) westByMonth$bleach[i] <-mean(avg$percent_bleaching)}# Final graph for the westhighchart() |>hc_yAxis_multiples(list(title =list(text ="Percent Bleaching")),list(title =list(text ="Temperature (kelvin)"),opposite =TRUE)) |>hc_add_series(data = westByMonth$bleach, name ="Percent Bleaching", type ="line", yAxis =0) |>hc_add_series(data = westByMonth$temp, name ="Temperature (kelvin)", type ="line", yAxis =1) |>hc_xAxis(categories = westByMonth$month, categoryorder ="category ascending") |>hc_chart(style =list(fontFamily ="AvantGarde",fontWeight ="bold")) |>hc_colors(cols) |>hc_tooltip(shared =TRUE) |>#, pointFormat = "{point.bleach:.2f}: {point.temp:.2f}") |>hc_title(text ="Western Hemisphere Coral Bleaching & Temperature by Month") |>hc_caption(text ="Source: Biological and Chemical Oceanography Data Management Office (BCO-DMO)")
Analysis
These graphs show the average temperature and average percent bleaching (one for the eastern hemisphere and one for the west). I separated the east and west to better control for temperature differences. I was hoping the bleaching would line up with the temperature to show a cause of bleaching and it’s yearly cycle. While existent in the graphs these trends weren’t as stark as I had hoped (and expected). A temperature vs bleaching graph may have been better at telling that story. Regardless these graphs definitely show the variability of bleaching based on the time of year. It also appears that bleaching is more prevalent in the western hemisphere as it has a max average bleaching of around 22% while the east had a short lived max of around 15%. I wish I had been able to round the values in the tool tip but I couldn’t get the code in the notes to work for my graphs.
Data Collection Graph
p <- bleaching2 |>ggplot(aes(date_year, group = date_month, fill = date_month)) +geom_area(stat ="count") +scale_fill_gradient(high =c("#8777B9", "#FF007B"), low =c("#FFFFFF", "#00FBFF")) +theme_dark() +geom_vline(xintercept =2005, color ="#f7f9fc", size =1) +geom_vline(xintercept =2016, color ="#ffe3fd", size =1) +geom_vline(xintercept =1998, color ="#b9adde", size =1) +labs(title ="Coral Bleaching Yearly Data Collection", caption ="Source: Biological and Chemical Oceanography Data Management Office (BCO-DMO)", fill ="Month") +xlab("Year") +ylab("Total Observations") +geom_text(aes(x =1999.2, y =3400, label ="2005 Caribbean:\n80% corals bleached\n >40% corals killed"), color ="#f7f9fc", cex=3.5) +# (Source 10)geom_text(aes(x =2011.4, y =2500, label ="2014-2017 Global:\nMost damaging &\nlongest bleaching\nevent"), color ="#ffe3fd", cex=3.5) +# (Source 11)geom_text(aes(x =1992, y =1700, label ="145 total observations\nbefore 1998"), color ="#b9adde", cex=3.5)
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
p
#ggplotly(p)
Analysis
This area plot shows how much data was collected each year and in which month is was collected. The oldest observation is from 1980 while the most resent one added to this data set was recorded in 2020. This plot shows that bleaching measurements started to be taken in higher volumes starting in 1998 and then really picked up in 2005 aligning with a record bleaching crisis in the Caribbean. The data seems to be fairly evenly distributed between it’s months of collection. Since there are so many observations plotly takes several minutes to load and then runs slowly once it does. While I can’t be certain that the collection trends of this data are reflective of global collection it is cool to see collection increase when there are major bleaching events.