My Dataset: SFCN Coral Reef Monitoring Data Package Version 2 (1999-2024): CDR, Disease, Species List, and Video Datasets
Description: The coral reef monitoring dataset from the South Florida/Caribbean Network (SFCN) covers reefs in five national parks: Biscayne, Dry Tortugas, Virgin Islands, Buck Island, and Salt River Bay. Researchers collected information on stony coral cover, algae, sponges, gorgonians, and substrate, as well as coral species diversity, reef structure, and rugosity (surface complexity). They also tracked coral bleaching, coral disease, and the abundance of the long-spined sea urchin Diadema antillarum, since these factors strongly influence reef health. The data were collected to understand how reef ecosystems change over time and across different management zones, and to evaluate the effects of unusual events such as hurricanes or bleaching episodes.
Rows: 48889 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (12): TripName, StartDate, Park, SiteFullName, Site, Project, Purpose, ...
dbl (9): TripID, Year, SiteLatitude, SiteLongitude, ProtocolVersion, Trans...
lgl (1): SensitiveTaxon
date (1): VerifiedDate
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Welch Two Sample t-test
data: richness_2024 and richness_all_years
t = -2.9558, df = 21.489, p-value = 0.007425
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.249403 -1.091591
sample estimates:
mean of x mean of y
16.5500 20.2205
Mean species richness in 2024 (16.55 taxa per plot) was significantly lower than in other years (20.22 taxa per plot; Welch's t = -2.96, df = 21.49, p = 0.007, assuming a significance level of 0.05). The 95% confidence interval indicates that 2024 plots held about 1 to 6 fewer taxa on average.
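As a rough check on how the test output above is produced, Welch's statistic and its degrees of freedom can be computed by hand and compared with `t.test()`. The sketch below uses made-up richness vectors (`x` and `y` are hypothetical, not the SFCN data), so its numbers will not match the report.

```r
# Hypothetical richness values per plot; NOT the SFCN data.
x <- c(14, 18, 15, 17, 19, 16)          # stand-in for richness_2024
y <- c(21, 19, 23, 20, 18, 22, 24, 20)  # stand-in for richness_all_years

# Squared standard error of each group mean (variances are not pooled).
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch's t statistic: difference in means over the unpooled standard error.
t_manual <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite degrees of freedom (usually not a whole number).
df_manual <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# t.test() runs the Welch version by default (var.equal = FALSE).
welch <- t.test(x, y)
print(c(t_manual = t_manual, t_builtin = unname(welch$statistic)))
print(c(df_manual = df_manual, df_builtin = unname(welch$parameter)))
```

The fractional degrees of freedom in the report (21.489) are what this correction typically produces when the two groups differ in size and variance.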
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
# linewidth replaces the size argument for lines, which was deprecated in ggplot2 3.4.0
ggplot(taxon_year_counts, aes(x = Year, y = Count, color = TaxonScientificName)) +
  geom_line(linewidth = 1) +
  labs(title = "Individuals per Taxon Over Time Across All Sites",
       x = "Year",
       y = "Number of Individuals",
       color = "Taxon") +
  theme_minimal() +
  theme(legend.position = "right",
        legend.title = element_text(face = "bold"),
        axis.text.x = element_text(angle = 45, hjust = 1))
A Better View of the Graph:
My year of focus is 2024. The sharp dip in 2020 across all species in the general graph reflects incomplete data: not all of the plots sampled in the first couple of years were surveyed that year, primarily because quarantine restrictions during the COVID-19 pandemic limited fieldwork. In the years following 2020, the number of plots sampled is still lower than in the first couple of years, but there is an upward trend.
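One way to check sampling effort is to count the distinct sites surveyed in each year. The sketch below is a minimal base R version on a toy data frame; `toy` and its rows are invented for illustration, and only the column names `Year` and `SiteFullName` come from the dataset.

```r
# Toy stand-in for CoralData with the two columns used here (rows are invented).
toy <- data.frame(
  Year = c(2019, 2019, 2019, 2020, 2021, 2021),
  SiteFullName = c("Haulover", "Yawzi", "Tektite", "Yawzi", "Haulover", "Yawzi")
)

# Number of distinct sites surveyed in each year.
sites_per_year <- tapply(toy$SiteFullName, toy$Year,
                         function(s) length(unique(s)))
print(sites_per_year)
```

Running the same summary on the real data would show directly which years have reduced coverage and by how much.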
Individuals Per Taxon Per Site 2024:
library(dplyr)
library(ggplot2)

# Count records per site and taxon for 2024
taxon_site_2024 <- CoralData %>%
  filter(Year == 2024) %>%
  group_by(SiteFullName, TaxonScientificName) %>%
  summarise(Count = n(), .groups = "drop")

ggplot(taxon_site_2024, aes(x = SiteFullName, y = Count, fill = TaxonScientificName)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Individuals per Taxon per Site (2024)",
       x = "Site",
       y = "Number of Individuals",
       fill = "Taxon") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "bottom")
# A tibble: 6 × 2
SiteFullName num_taxa
<chr> <int>
1 Haulover 25
2 Newfound 25
3 Salt River 25
4 Yawzi 25
5 Tektite 23
6 South Fore Reef 21
Now that we know which sites had the greatest number of taxa in 2024, we want to compare these sites to see how similar or different their species compositions are to each other. To do that, we will calculate the average Sorensen dissimilarity index across these six sites.
Average Sorensen Dissimilarity Index across the 6 sites with the Greatest Biodiversity (2024):
We obtained an average Sorensen dissimilarity of 0.127, which indicates that these sites share most of their taxa. Although they are the sites with the greatest biodiversity, they support largely the same species.
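For reference, the Sorensen dissimilarity between two sites is (b + c) / (2a + b + c), where a is the number of shared taxa and b and c are the taxa found at only one of the two sites. The sketch below averages it over all site pairs in base R; the presence/absence matrix `pa` and the helper `sorensen()` are hypothetical and are not the code used for the reported 0.127.

```r
# Toy presence/absence matrix: rows are sites, columns are taxa (hypothetical).
pa <- rbind(
  site_A = c(1, 1, 1, 0),
  site_B = c(1, 1, 0, 1),
  site_C = c(1, 1, 1, 1)
)

# Sorensen dissimilarity for two presence/absence vectors.
sorensen <- function(u, v) {
  shared <- sum(u == 1 & v == 1)  # taxa at both sites (a)
  only_u <- sum(u == 1 & v == 0)  # taxa only at the first site (b)
  only_v <- sum(u == 0 & v == 1)  # taxa only at the second site (c)
  (only_u + only_v) / (2 * shared + only_u + only_v)
}

# Evaluate every pair of sites, then average the pairwise values.
site_pairs <- combn(nrow(pa), 2)
pairwise <- apply(site_pairs, 2,
                  function(ix) sorensen(pa[ix[1], ], pa[ix[2], ]))
mean_sorensen <- mean(pairwise)
print(mean_sorensen)
```

A value of 0 means two sites have identical taxon lists and 1 means they share none, so an average near 0.1 corresponds to heavy overlap.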
Average Individuals per Species per Site Across All Years:
ggplot(avg_counts, aes(x = SiteFullName, y = AvgIndividuals, fill = TaxonScientificName)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Average Individuals per Species per Site Across All Years",
       x = "Site",
       y = "Average Individuals",
       fill = "Taxon") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "bottom")
Overall, the most abundant species per site across all years is Orbicella annularis, with about 15 individuals on average. This is consistent with the 2024 results as well, and there is an explanation for it. Orbicella annularis is the most abundant coral species in South Florida and Caribbean reefs because it is a dominant reef-builder that forms large, massive colonies that make up the backbone of reef structures. Its ability to adapt to different depths, currents, and light conditions allows it to thrive across diverse reef zones (de Matas, 2016). Compared to more vulnerable species such as Acropora palmata or Dendrogyra cylindrus, Orbicella annularis has historically shown greater resilience to disease and bleaching, which has helped it maintain high population levels (de Matas, 2016). These reefs fall within its natural geographic range, so it is consistently present in the monitoring data across all years. As a keystone species, its established colonies and ecological importance explain why it remains the most abundant even in 2024.
The average Shannon diversity per site, 2.57, points to moderate biodiversity: it indicates a community with a fair number of species and some balance among them, but not the highest possible richness or evenness.
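To put a Shannon value in context, the index is H = -sum(p * ln(p)) over the taxon proportions p, and its maximum for S equally abundant taxa is ln(S), which is about 3.22 for 25 taxa. The sketch below is a minimal base R version on invented counts; `shannon()`, `even`, and `skewed` are hypothetical names, not objects from the analysis.

```r
# Shannon diversity from a vector of counts per taxon (natural log).
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)  # proportions; zero counts are dropped
  -sum(p * log(p))
}

# Two invented communities, each with 25 taxa.
even   <- rep(10, 25)          # every taxon equally abundant
skewed <- c(200, rep(2, 24))   # one taxon holds most individuals

print(shannon(even))    # equals log(25), the maximum for 25 taxa
print(shannon(skewed))  # lower, because one taxon dominates
```

Against a ceiling of ln(25), a value around 2.57 sits well above a strongly dominated community but below a perfectly even one, which fits the "moderate" reading.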
Simpson Diversity Index Per Site (2024)
# index = "simpson" returns 1 - D for each site (row) of the matrix
simpson_diversity <- diversity(species_matrix, index = "simpson")
print(simpson_diversity)
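Our average Simpson index value is 0.915, so most of our Simpson values are very close to 1. Assuming `diversity()` is the vegan function, `index = "simpson"` returns 1 - D, where D is the sum of the squared taxon proportions, so values near 1 mean high species diversity and low dominance within sites.
0.915 is therefore the probability that two randomly selected individuals from a site belong to different taxa. Its complement, about 0.084, is the probability that they belong to the same taxon, which is low and again indicates low dominance within sites.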
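The direction of the Simpson index is easy to mix up, so a small base R sketch may help. It assumes the `diversity()` call above is vegan's, where `index = "simpson"` returns 1 - D with D = sum(p^2); `simpson_d()` and the two count vectors are hypothetical.

```r
# Simpson's dominance D: probability that two randomly drawn individuals
# belong to the same taxon (drawing with replacement).
simpson_d <- function(counts) {
  p <- counts / sum(counts)
  sum(p^2)
}

# Two invented communities, each with 25 taxa.
even   <- rep(10, 25)          # every taxon equally abundant
skewed <- c(200, rep(2, 24))   # one taxon holds most individuals

# 1 - D is the form reported by index = "simpson" (under the vegan assumption).
gini_simpson_even   <- 1 - simpson_d(even)
gini_simpson_skewed <- 1 - simpson_d(skewed)
print(c(even = gini_simpson_even, skewed = gini_simpson_skewed))
```

On this scale the even community scores close to 1 and the dominated one scores much lower, so a site average of 0.915 falls on the diverse side.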
Justification for Not Using Jaccard, Euclidean, or Bray-Curtis Metrics:
For my specific dataset, I don't believe the Jaccard, Euclidean, or Bray-Curtis metrics would be useful. These indices emphasize differences in species presence or abundance across sites, but the sites I compared share most of their taxa and are led by the same few abundant taxa, such as Orbicella annularis. Because the taxon lists overlap so heavily, Jaccard similarity would be high for nearly every pair of sites and would fail to capture meaningful variation. Euclidean distance would mostly reflect differences in absolute counts of the dominant taxa rather than true ecological differences. Similarly, Bray-Curtis would highlight shifts in dominance patterns rather than overall community structure, so measures like the Simpson index, Shannon diversity, and the Sorensen dissimilarity index are more appropriate for interpreting this dataset. Also, when I tried Euclidean distance, I got very large values per site, in the hundreds. Those values are not representative next to an average of around 15 species per site, because Euclidean distance on raw counts is measured in individuals, not species. I will admit that part of that is due to human error, but for the information outlined in the rubric, the metrics used suffice.
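The scale problem with Euclidean distance can be seen on two invented sites that have the same proportions of each taxon but different total counts. This is only an illustrative sketch in base R; `site_1` and `site_2` are hypothetical and the formulas are the standard definitions, not code from the analysis.

```r
# Two invented sites with identical taxon proportions; site_2 has twice the counts.
site_1 <- c(120, 30, 10)
site_2 <- c(240, 60, 20)

# Euclidean distance on raw counts: unbounded, in units of individuals.
euclid <- sqrt(sum((site_1 - site_2)^2))

# Bray-Curtis dissimilarity: bounded between 0 and 1.
bray <- sum(abs(site_1 - site_2)) / sum(site_1 + site_2)

print(c(euclidean = euclid, bray_curtis = bray))
```

Here the Euclidean distance is in the hundreds even though the two sites have the same composition, while Bray-Curtis stays on a 0 to 1 scale. Both still respond to the difference in total counts, which is why unequal sampling effort matters for either metric.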
What overall hypotheses or conclusions can you draw from your analysis of this dataset?
The analysis shows that reef biodiversity in 2024 experienced a statistically significant decline in species richness, dropping to an average of 16.55 taxa per plot compared to 20.22 in prior years. Despite this loss, overall diversity remained moderate, with a Shannon index of 2.57 indicating a fair but not maximal balance of species. The reef's structural resilience is maintained by the dominance of Orbicella annularis, a keystone species that continues to anchor the ecosystem and resist disease and bleaching. Among the most biodiverse sites, species composition was highly similar, reflected in a low Sorensen dissimilarity index of 0.127, which suggests widespread overlap in taxa. Nevertheless, sampling limitations, especially the reduced effort in 2020 during the COVID-19 pandemic and the smaller number of sites surveyed in 2024, mean that our year-to-year comparisons may partly reflect differences in sampling effort rather than real ecological change.
Citations:
de Matas, T.-J. (2016). Orbicella annularis – Boulder star coral. The Online Guide to the Animals of Trinidad and Tobago, Department of Life Sciences, University of the West Indies. Retrieved from https://sta.uwi.edu/fst/lifesciences/sites/default/files/lifesciences/documents/ogatt/Orbicella_annularis%20-%20Boulder%20Star%20Coral.pdf
National Park Service. (2024). SFCN coral reef monitoring data package, version 2 (1999–2024): CDR, disease, species list, and video datasets. Data.gov. Retrieved from https://catalog.data.gov/dataset/sfcn-coral-reef-monitoring-data-package-version-2-1999-2024-cdr-disease-species-list-and-v