knitr::opts_chunk$set(
project_root <- normalizePath("~/Hunter/S26/GTECH-385/final-project", mustWork = TRUE),
knitr::opts_knit$set(root.dir = project_root),
echo = TRUE,
message = FALSE,
warning = FALSE,
fig.width = 9,
fig.height = 5.5,
dpi = 300
)
library(terra)
library(sf)
library(dplyr)
library(tidyr)
library(ggplot2)
library(readr)
library(readxl)
library(janitor)
library(stringr)
# Pulling from TIGER to provide shoreline for context in figures
has_tigris <- requireNamespace("tigris", quietly = TRUE)
if (has_tigris) {
options(tigris_use_cache = TRUE)
}
dir.create("figures", showWarnings = FALSE)
dir.create("outputs", showWarnings = FALSE)
dir.create("outputs/rasters", recursive = TRUE, showWarnings = FALSE)
dir.create("outputs/rasters/sst", recursive = TRUE, showWarnings = FALSE)
dir.create("outputs/tables", recursive = TRUE, showWarnings = FALSE)
The sudden recent collapse of the Peconic bay scallop industry has been the immediate inspiration behind this project. Despite several bay restoration efforts and seemingly healthy juvenile recruitment, over summer 2019 NYSDEC reported catastrophic adult bay scallop mortality, exceeding 90% mortality across the overall Peconic region, with some closures nearing a complete decimation of stock [3]. The Peconic Estuary Partnership convened a pretty intensive technical review committee after the 2019 adult die-off to assess possible drivers, and NYSDEC’s species assessment links the collapse to interacting stressors including high summer water temperatures, low dissolved oxygen, and a coccidian parasite — although the actual cause of the crash is still not really well understood [4].
Peconic Bay scallop landings context from the Peconic Bay report [3].
In an attempt to demystify this anomaly, I will be looking across a ~10 year timescale. This squarely centers the 2019 event while still allowing enough wiggle room for some greater ecological context. I’m using a 2014–2025 snapshot as the study window, which allows for a baseline prior to the crash, the actual scallop collapse interval between 2019–2022, and finally the current window from ~2024–2025. The end goal is to create a station-year covariate table that checks whether collapse-period years and nearby shellfish-management areas also carry stacked stress signatures: high modeled nitrogen pressure, high chlorophyll-a, low dissolved oxygen, shallow/low-slope bathymetry, and warmer or less variable SST. Unfortunately, despite bay scallop closures being readily available as a spatial dataset, per-closure landings are absent; as a result, I can only give an approximation of risk rather than a definite cause.
The main R libraries are kept close to the course workflow.
terra handles raster reading, reprojection, masking, slope
creation, and raster extraction; sf handles station points,
subwatersheds, closure polygons, and distance operations;
dplyr, tidyr, readr,
readxl, janitor, and stringr
handle table cleanup; ggplot2 creates the static figures.
tigris is optional and only adds shoreline/water-context
geometry to maps.
The working datasets are summarized below before moving into the actual raster/vector processing.
| Dataset | Source | Role in workflow |
|---|---|---|
| NOAA 2023 NGS Long Island Sound Topobathymetric DEM, product 10274 | NOAA coastal lidar DEM repository | High-resolution nearshore/topobathymetric depth where measured coverage exists [1]. |
| NOAA/NCEI Gardiners Bay M030 30 m Bathymetric DEM | NOAA/NCEI regional bathymetric DEM | Broader Peconic/Gardiners bathymetric base where the 2023 topobathy has no measured cells [2]. |
| Peconic Estuary Program boundary | Suffolk County / Peconic Estuary Program boundary layer | Initial clipping frame for the study area. It includes land, so water cells are later isolated from DEM elevation values below 0 m. |
| TIGER/Line Area Water polygons | U.S. Census TIGER/Line | Optional shoreline/water-context geometry for maps only. |
| Peconic nitrogen loading model | Stephen Lloyd’s 2014 NLM application for the Peconic Estuary, exposed through the NLM ArcGIS FeatureServer | Static land-based nitrogen-pressure layer for 43 modeled Peconic subwatersheds. Numeric fields include atmospheric deposition, wastewater treatment facilities, septic systems, agriculture, lawns, parks/golf, total load, and kg N per year per hectare [5]. |
| SCDHS Peconic water-quality workbook | Suffolk County Department of Health Services | Station observations for dissolved oxygen, chlorophyll-a, Secchi depth, temperature, salinity, and nutrient species where available [6]. |
| NOAA CoastWatch MUR SST | NOAA CoastWatch ERDDAP daily MUR SST | April–August daily surface-temperature fields from 2014–2025, later summarized into yearly and monthly SST covariates [7]. |
| NYSDEC shellfish closure polygons | NYSDEC shellfish closure data | Management/context polygons for summarizing environmental stress near shellfish-relevant areas [8]. |
The bathymetry layer combines the two finished NOAA DEM products. The 2023 topobathymetric product was downloaded from the NOAA coastal lidar DEM repository as tiled GeoTIFFs through the product VRT. It supplies the finer nearshore detail where coverage exists [1]. The Gardiners Bay M030 DEM supplies broader bathymetric coverage for the Peconic/Gardiners system [2]. In R, both products are converted to positive water depth by retaining only cells below 0 m. The final depth layer keeps the 2023 topobathymetry wherever measured coverage exists, then fills remaining bathymetric cells from M030.
The nitrogen layer comes from Stephen Lloyd’s 2014 nitrogen loading model for the Peconic Estuary [5]. The model divides the estuary watershed into 43 subwatersheds and estimates nitrogen loading by source category, including atmospheric deposition, wastewater treatment facilities, septic systems, agriculture, lawns, parks, and golf courses. In this workflow, each water-quality station receives the modeled nitrogen yield of the nearest subwatershed. That assignment is used as a coarse shoreline-proximity pressure variable and kept alongside the station-to-subwatershed distance.
After the bathymetry and nitrogen setup, the workflow shifts to station-year summaries. SCDHS station observations are summarized over the warm season, then joined to fixed spatial covariates: extracted depth, extracted slope, bathymetric class, nearest modeled nitrogen yield, and distance to the nearest nitrogen-loading subwatershed. MUR SST is summarized separately by year and month to create April–August thermal exposure and variability metrics. These derived tables are then used for the final covariate comparisons and stress-index summaries.
Note that several preparation steps are done outside R for speed and file-size reasons (to prime datasets for actual ingestion and evaluation using the actual software). GDAL handles DEM clipping, raster conversion, and VRT-based DEM ingestion; curl and wget handle ERDDAP, FeatureServer, and NetCDF downloads. The relevant terminal commands are documented in sidecar repository files, especially the README, so that this R Markdown file can focus on the derived layers and analysis rather than re-downloading every source file during knitting.
I will only be providing one example of the intial GDAL workflow (External DEM preparation with GDAL). For a full step-by-step walk through of this process refer to the accompanying git repository for direct replication of methodology: https://github.com/gamfeld/GTECH385-peconic-estuary-health-analysis
paths <- list(
dem_2023 = "data/dem/peconic_dem_10m_boundary_10274.tif",
dem_m030 = "data/dem/gardiners_bay_m030_30m.nc",
pep_boundary = "data/Peconic_Estuary_Program_Boundary.geojson"
)
dem23 <- rast(paths$dem_2023)
m030_raw <- rast(paths$dem_m030)
pep_boundary <- st_read(paths$pep_boundary, quiet = TRUE)
# Project the PEP boundary to the 2023 DEM CRS for masking and plotting.
pep_6347 <- st_transform(pep_boundary, crs(dem23))
pep_v <- vect(pep_6347)
dem23
## class : SpatRaster
## size : 3807, 8603, 1 (nrow, ncol, nlyr)
## resolution : 10, 10 (x, y)
## extent : 678110, 764140, 4525640, 4563710 (xmin, xmax, ymin, ymax)
## coord. ref. : NAD83(2011) / UTM zone 18N (EPSG:6347)
## source : peconic_dem_10m_boundary_10274.tif
## name : peconic_dem_10m_boundary_10274
## min value : -13.31304
## max value : 55.06969
m030_raw
## class : SpatRaster
## size : 1010, 2064, 1 (nrow, ncol, nlyr)
## resolution : 30, 30 (x, y)
## extent : 698199.6, 760119.6, 4530077, 4560377 (xmin, xmax, ymin, ymax)
## coord. ref. : NAD83 / UTM zone 18N (EPSG:26918)
## source : gardiners_bay_m030_30m.nc
## varname : Band1 (GDAL Band Number 1)
## name : Band1
# TIGER area-water polygons are only used for visual context.
shoreline_ctx <- NULL
if (has_tigris) {
shoreline_ctx <- tryCatch({
suffolk_water <- tigris::area_water(state = "NY", county = "Suffolk", class = "sf")
nassau_water <- tigris::area_water(state = "NY", county = "Nassau", class = "sf")
bind_rows(suffolk_water, nassau_water) |>
st_transform(st_crs(pep_6347)) |>
st_crop(st_bbox(pep_6347))
}, error = function(e) NULL)
}
if (is.null(shoreline_ctx)) {
message("No TIGER area-water shoreline context loaded; using PEP boundary only.")
}
dem_ranges <- tibble(
raster = c("2023 topobathymetric DEM", "M030 Gardiners Bay DEM"),
min_elevation_m = c(
as.numeric(global(dem23, min, na.rm = TRUE)[1, 1]),
as.numeric(global(m030_raw, min, na.rm = TRUE)[1, 1])
),
max_elevation_m = c(
as.numeric(global(dem23, max, na.rm = TRUE)[1, 1]),
as.numeric(global(m030_raw, max, na.rm = TRUE)[1, 1])
)
)
knitr::kable(dem_ranges, digits = 2)
| raster | min_elevation_m | max_elevation_m |
|---|---|---|
| 2023 topobathymetric DEM | -13.31 | 55.07 |
| M030 Gardiners Bay DEM | -29.60 | 0.87 |
# Mask the 2023 product to the PEP boundary. It is already clipped, but this keeps the step explicit.
dem23_clip <- mask(crop(dem23, pep_v), pep_v)
# Project M030 onto the 2023 10 m working grid, then mask to the same PEP boundary.
m030_10m <- project(m030_raw, dem23_clip, method = "bilinear")
m030_clip <- mask(crop(m030_10m, pep_v), pep_v)
# Convert measured below-zero elevations to positive water depth.
# Cells >= 0 are land/upland or above vertical datum and are removed from bathymetric analysis.
depth23 <- abs(ifel(dem23_clip < 0, dem23_clip, NA))
depth_m030 <- abs(ifel(m030_clip < 0, m030_clip, NA))
names(depth23) <- "depth_2023_m"
names(depth_m030) <- "depth_m030_m"
# Keep 2023 topobathy wherever it exists; fill remaining bathymetric cells from M030.
depth_final <- cover(depth23, depth_m030)
names(depth_final) <- "depth_m"
# Track which DEM supplied each cell.
depth_source <- ifel(!is.na(depth23), 1, ifel(!is.na(depth_m030), 2, NA))
names(depth_source) <- "depth_source"
writeRaster(depth23, "outputs/rasters/depth_2023_topobathy_measured_10m.tif", overwrite = TRUE)
writeRaster(depth_m030, "outputs/rasters/depth_m030_gardiners_resampled_10m.tif", overwrite = TRUE)
writeRaster(depth_final, "outputs/rasters/peconic_depth_10m_2023_with_m030_base.tif", overwrite = TRUE)
writeRaster(depth_source, "outputs/rasters/peconic_depth_source_2023_vs_m030.tif", overwrite = TRUE)
depth_stats <- tibble(
raster = c("2023 measured topobathy", "M030 Gardiners Bay base", "Combined depth layer"),
min_depth_m = c(
as.numeric(global(depth23, min, na.rm = TRUE)[1, 1]),
as.numeric(global(depth_m030, min, na.rm = TRUE)[1, 1]),
as.numeric(global(depth_final, min, na.rm = TRUE)[1, 1])
),
max_depth_m = c(
as.numeric(global(depth23, max, na.rm = TRUE)[1, 1]),
as.numeric(global(depth_m030, max, na.rm = TRUE)[1, 1]),
as.numeric(global(depth_final, max, na.rm = TRUE)[1, 1])
),
mean_depth_m = c(
as.numeric(global(depth23, mean, na.rm = TRUE)[1, 1]),
as.numeric(global(depth_m030, mean, na.rm = TRUE)[1, 1]),
as.numeric(global(depth_final, mean, na.rm = TRUE)[1, 1])
),
valid_cells = c(
as.numeric(global(!is.na(depth23), sum, na.rm = TRUE)[1, 1]),
as.numeric(global(!is.na(depth_m030), sum, na.rm = TRUE)[1, 1]),
as.numeric(global(!is.na(depth_final), sum, na.rm = TRUE)[1, 1])
)
)
write_csv(depth_stats, "outputs/tables/depth_stats_2023_m030_combined.csv")
knitr::kable(depth_stats, digits = 2)
| raster | min_depth_m | max_depth_m | mean_depth_m | valid_cells |
|---|---|---|---|---|
| 2023 measured topobathy | 0 | 13.31 | 4.48 | 3268954 |
| M030 Gardiners Bay base | 0 | 29.54 | 7.31 | 4838533 |
| Combined depth layer | 0 | 29.54 | 7.17 | 5466411 |
source_key <- tibble(
depth_source = c(1, 2),
source_label = c("2023 NOAA topobathy", "M030 Gardiners Bay 30 m bathymetry")
)
write_csv(source_key, "outputs/tables/depth_source_key_2023_m030.csv")
knitr::kable(source_key)
| depth_source | source_label |
|---|---|
| 1 | 2023 NOAA topobathy |
| 2 | M030 Gardiners Bay 30 m bathymetry |
sample_raster_df <- function(r, value_name, size = 150000, method = "regular") {
df <- terra::spatSample(
r,
size = size,
method = method,
as.df = TRUE,
xy = TRUE,
na.rm = TRUE
) |>
as_tibble(.name_repair = "minimal")
df <- df[, 1:3]
names(df) <- c("x", "y", value_name)
df
}
mode_numeric <- function(x, ...) {
x <- x[!is.na(x)]
if (length(x) == 0) return(NA_real_)
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
make_source_outline <- function(source_raster, fact = 20) {
src_lowres <- terra::aggregate(source_raster, fact = fact, fun = mode_numeric, na.rm = TRUE)
src_poly <- as.polygons(src_lowres, dissolve = TRUE, na.rm = TRUE)
st_as_sf(src_poly) |>
mutate(
depth_source = factor(
depth_source,
levels = c(1, 2),
labels = c("2023 topobathy", "M030 Gardiners Bay")
)
) |>
filter(!is.na(depth_source))
}
depth_vals <- values(depth_final, na.rm = TRUE)
depth_98 <- as.numeric(quantile(depth_vals, 0.98, na.rm = TRUE))
depth_plot <- clamp(depth_final, 0, depth_98, values = TRUE)
depth_df <- sample_raster_df(depth_plot, "depth_m", size = 180000)
source_outline <- make_source_outline(depth_source, fact = 25)
p_depth <- ggplot() +
geom_raster(data = depth_df, aes(x = x, y = y, fill = depth_m)) +
{if (!is.null(shoreline_ctx)) geom_sf(data = shoreline_ctx, fill = NA, color = "grey35", linewidth = 0.18, inherit.aes = FALSE)} +
geom_sf(data = pep_6347, fill = NA, color = "black", linewidth = 0.35, inherit.aes = FALSE) +
geom_sf(data = source_outline, aes(color = depth_source), fill = NA, linewidth = 0.45, inherit.aes = FALSE) +
coord_sf(crs = st_crs(pep_6347), expand = FALSE) +
scale_fill_gradient(low = "#d9f0ff", high = "#08306b", name = "Depth (m)", limits = c(0, depth_98)) +
scale_color_manual(
values = c("2023 topobathy" = "steelblue4", "M030 Gardiners Bay" = "orange3"),
name = "Depth source outline"
) +
labs(
title = "Figure 1. Combined Peconic bathymetry layer",
subtitle = paste0("2023 topobathy retained where available; M030 fills broader bathymetry. Ramp capped at 98th percentile = ", round(depth_98, 2), " m."),
x = NULL,
y = NULL,
caption = "Blue/orange outlines mark the approximate source footprint used in the splice. TIGER area-water boundaries, when available, are drawn only as visual shoreline context."
) +
theme_minimal() +
theme(legend.position = "bottom")
print(p_depth)
ggsave("figures/figure_01_combined_peconic_bathymetry.png", p_depth, width = 9, height = 5.5, dpi = 300)
depth_hist <- tibble(depth_m = depth_vals)
p_depth_hist <- ggplot(depth_hist, aes(x = depth_m)) +
geom_histogram(bins = 70, fill = "steelblue4", color = "white") +
labs(
title = "Figure 2. Combined depth distribution",
subtitle = "Distribution after retaining 2023 topobathymetry and filling remaining bathymetry from M030",
x = "Depth (m)",
y = "Raster cell count"
) +
theme_minimal()
print(p_depth_hist)
ggsave("figures/figure_02_combined_depth_distribution.png", p_depth_hist, width = 7.5, height = 4.5, dpi = 300)
The final depth layer is used to derive slope, then slope is split into relative classes. I use relative categories here because the aim is to separate broad bathymetric regimes inside this estuary, rather than impose a universal slope threshold. Low-slope cells approximate flats and broad shallow shelves. High-slope cells identify sharper gradients, channel edges, and places where bathymetry may be more relevant to exchange or mixing proxies. These classes will later become the fixed spatial bins for chlorophyll-a, SST-change, dissolved oxygen, nitrogen, or eelgrass comparisons.
slope <- terrain(depth_final, v = "slope", unit = "degrees")
names(slope) <- "slope_deg"
writeRaster(slope, "outputs/rasters/peconic_slope_deg_10m.tif", overwrite = TRUE)
slope_vals <- values(slope, na.rm = TRUE)
slope_99 <- as.numeric(quantile(slope_vals, 0.99, na.rm = TRUE))
slope_plot <- clamp(slope, 0, slope_99, values = TRUE)
slope_df <- sample_raster_df(slope_plot, "slope_deg", size = 180000)
p_slope <- ggplot() +
geom_raster(data = slope_df, aes(x = x, y = y, fill = slope_deg)) +
{if (!is.null(shoreline_ctx)) geom_sf(data = shoreline_ctx, fill = NA, color = "grey35", linewidth = 0.18, inherit.aes = FALSE)} +
geom_sf(data = pep_6347, fill = NA, color = "black", linewidth = 0.35, inherit.aes = FALSE) +
coord_sf(crs = st_crs(pep_6347), expand = FALSE) +
scale_fill_gradient(low = "#f7fbff", high = "#08519c", name = "Slope (°)", limits = c(0, slope_99)) +
labs(
title = "Figure 3. Bathymetric slope from combined depth layer",
subtitle = paste0("Slope ramp capped at the 99th percentile = ", round(slope_99, 2), "° for display."),
x = NULL,
y = NULL
) +
theme_minimal() +
theme(legend.position = "bottom")
print(p_slope)
ggsave("figures/figure_03_bathymetric_slope.png", p_slope, width = 9, height = 5.5, dpi = 300)
# Coarse depth classes used as physical proxy bins.
depth_class <- classify(
depth_final,
rcl = matrix(
c(
0, 3, 1,
3, 8, 2,
8, Inf, 3
),
ncol = 3,
byrow = TRUE
)
)
names(depth_class) <- "depth_class"
depth_key <- tibble(
depth_class = c(1, 2, 3),
depth_label = c("shallow_0_3m", "mid_3_8m", "deep_gt_8m")
)
# Relative slope classes: terciles of the observed slope distribution.
slope_q <- quantile(slope_vals, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE)
slope_class <- classify(
slope,
rcl = matrix(
c(
slope_q[1], slope_q[2], 1,
slope_q[2], slope_q[3], 2,
slope_q[3], slope_q[4], 3
),
ncol = 3,
byrow = TRUE
)
)
names(slope_class) <- "slope_class"
slope_key <- tibble(
slope_class = c(1, 2, 3),
slope_label = c("low_slope", "moderate_slope", "high_slope"),
lower_deg = as.numeric(slope_q[1:3]),
upper_deg = as.numeric(slope_q[2:4])
)
# Cross depth class with slope class to make combined bathymetric regimes.
bathy_regime <- depth_class * 10 + slope_class
names(bathy_regime) <- "bathy_regime"
bathy_key <- expand_grid(
depth_class = c(1, 2, 3),
slope_class = c(1, 2, 3)
) |>
left_join(depth_key, by = "depth_class") |>
left_join(slope_key |> select(slope_class, slope_label), by = "slope_class") |>
mutate(
bathy_regime = depth_class * 10 + slope_class,
bathy_label = paste(depth_label, slope_label, sep = " / ")
)
writeRaster(depth_class, "outputs/rasters/peconic_depth_class_10m.tif", overwrite = TRUE)
writeRaster(slope_class, "outputs/rasters/peconic_slope_class_10m.tif", overwrite = TRUE)
writeRaster(bathy_regime, "outputs/rasters/peconic_bathymetric_regime_10m.tif", overwrite = TRUE)
write_csv(depth_key, "outputs/tables/depth_class_key.csv")
write_csv(slope_key, "outputs/tables/slope_class_key.csv")
write_csv(bathy_key, "outputs/tables/bathymetric_regime_key.csv")
knitr::kable(slope_key, digits = 4)
| slope_class | slope_label | lower_deg | upper_deg |
|---|---|---|---|
| 1 | low_slope | 0.0000 | 0.1691 |
| 2 | moderate_slope | 0.1691 | 0.4715 |
| 3 | high_slope | 0.4715 | 41.8144 |
knitr::kable(bathy_key)
| depth_class | slope_class | depth_label | slope_label | bathy_regime | bathy_label |
|---|---|---|---|---|---|
| 1 | 1 | shallow_0_3m | low_slope | 11 | shallow_0_3m / low_slope |
| 1 | 2 | shallow_0_3m | moderate_slope | 12 | shallow_0_3m / moderate_slope |
| 1 | 3 | shallow_0_3m | high_slope | 13 | shallow_0_3m / high_slope |
| 2 | 1 | mid_3_8m | low_slope | 21 | mid_3_8m / low_slope |
| 2 | 2 | mid_3_8m | moderate_slope | 22 | mid_3_8m / moderate_slope |
| 2 | 3 | mid_3_8m | high_slope | 23 | mid_3_8m / high_slope |
| 3 | 1 | deep_gt_8m | low_slope | 31 | deep_gt_8m / low_slope |
| 3 | 2 | deep_gt_8m | moderate_slope | 32 | deep_gt_8m / moderate_slope |
| 3 | 3 | deep_gt_8m | high_slope | 33 | deep_gt_8m / high_slope |
bathy_df <- sample_raster_df(bathy_regime, "bathy_regime", size = 180000) |>
mutate(bathy_regime = as.numeric(bathy_regime)) |>
left_join(bathy_key, by = "bathy_regime")
regime_cols <- c(
"#deebf7", "#c6dbef", "#9ecae1",
"#6baed6", "#4292c6", "#2171b5",
"#08519c", "#08306b", "#041f4a"
)
names(regime_cols) <- unique(bathy_key$bathy_label)
p_regime <- ggplot() +
geom_raster(data = bathy_df, aes(x = x, y = y, fill = bathy_label)) +
{if (!is.null(shoreline_ctx)) geom_sf(data = shoreline_ctx, fill = NA, color = "grey35", linewidth = 0.18, inherit.aes = FALSE)} +
geom_sf(data = pep_6347, fill = NA, color = "black", linewidth = 0.35, inherit.aes = FALSE) +
coord_sf(crs = st_crs(pep_6347), expand = FALSE) +
scale_fill_manual(values = regime_cols, name = "Bathymetric regime") +
labs(
title = "Figure 4. Bathymetric proxy regimes",
subtitle = "Depth classes crossed with relative slope classes; used later as fixed spatial bins for water-quality proxies.",
x = NULL,
y = NULL
) +
theme_minimal() +
theme(
legend.position = "bottom",
legend.text = element_text(size = 7)
)
print(p_regime)
ggsave("figures/figure_04_bathymetric_proxy_regimes.png", p_regime, width = 9, height = 5.5, dpi = 300)
The next spatial layer is the Peconic Estuary nitrogen loading model. I treat it as a static watershed-pressure layer over the 2014–2024 analysis window. Land use, wastewater inputs, and fertilizer sources can change through time. Here, the layer provides a subwatershed-scale gradient of modeled land-based nitrogen pressure that can be attached to monitoring stations.
The Peconic nitrogen loading model contains 43 subwatersheds and
reports source categories including atmospheric deposition, wastewater
treatment facilities, septic systems, agriculture, lawns, and parks/golf
courses. The key field is kg_N_per_year_per_Ha, a yield
normalized by subwatershed area. Total is retained as the
absolute modeled load.
Most SCDHS stations sit in open water, while the NLM layer is mapped as terrestrial and shoreline-adjacent drainage polygons. Each station therefore inherits the nitrogen yield of the nearest modeled subwatershed, and the station-to-subwatershed distance is retained. This makes nitrogen loading a discrete shoreline-proximity pressure covariate, with some error expected around embayment geometry and tidal exchange. For visual context, I also extend the subwatershed influence zones across the estuary with a nearest-polygon/Voronoi-style partition clipped to the Peconic boundary.
The station layer becomes the limiting geometry from here forward. Each SCDHS station is assigned fixed spatial covariates: bathymetric depth, slope/regime, nearest nitrogen yield, and nearest-subwatershed distance. Later DO, chlorophyll-a, Secchi, temperature, salinity, and nutrient observations can be summarized by station-year and joined to this table.
The ArcGIS web-map metadata exposes the usable feature layer endpoint below. The shell chunk is kept for reproducibility, but the R workflow uses the local GeoPackage if it exists.
mkdir -p data/nitrogen
curl -L -G \
"https://services.arcgis.com/JsDD4qdG5r2a7hR5/arcgis/rest/services/NLM/FeatureServer/0/query" \
--data-urlencode "where=1=1" \
--data-urlencode "outFields=*" \
--data-urlencode "returnGeometry=true" \
--data-urlencode "f=geojson" \
-o data/nitrogen/peconic_nlm_subwatersheds.geojson
ogrinfo -so data/nitrogen/peconic_nlm_subwatersheds.geojson
ogr2ogr \
-f GPKG \
data/nitrogen/peconic_nlm_subwatersheds.gpkg \
data/nitrogen/peconic_nlm_subwatersheds.geojson
nlm_path <- "data/nitrogen/peconic_nlm_subwatersheds.gpkg"
nlm_geojson_path <- "data/nitrogen/peconic_nlm_subwatersheds.geojson"
nlm_url <- paste0(
"https://services.arcgis.com/JsDD4qdG5r2a7hR5/arcgis/rest/services/",
"NLM/FeatureServer/0/query?",
"where=1%3D1&outFields=*&returnGeometry=true&f=geojson"
)
if (file.exists(nlm_path)) {
nlm <- st_read(nlm_path, quiet = TRUE)
} else if (file.exists(nlm_geojson_path)) {
nlm <- st_read(nlm_geojson_path, quiet = TRUE)
} else {
nlm <- st_read(nlm_url, quiet = TRUE)
}
nlm <- nlm |>
st_make_valid() |>
st_transform(st_crs(pep_6347))
# Save a stable local copy for later runs.
st_write(nlm, "data/nitrogen/peconic_nlm_subwatersheds.gpkg", delete_dsn = TRUE, quiet = TRUE)
# Create a visual-only nearest-subwatershed influence surface across the Peconic boundary.
# This shows which terrestrial NLM polygon is closest to each part of the estuary frame.
nlm_pts <- st_point_on_surface(nlm)
nlm_voronoi_geom <- st_voronoi(st_union(st_geometry(nlm_pts)), envelope = st_as_sfc(st_bbox(pep_6347))) |>
st_collection_extract("POLYGON")
nlm_voronoi <- st_as_sf(nlm_voronoi_geom) |>
st_set_crs(st_crs(pep_6347)) |>
st_make_valid() |>
st_intersection(pep_6347)
nlm_voronoi_idx <- st_nearest_feature(st_centroid(nlm_voronoi), nlm_pts)
nlm_voronoi <- bind_cols(
nlm_voronoi,
nlm |>
st_drop_geometry() |>
select(subwatershed_code, kg_N_per_year_per_Ha) |>
slice(nlm_voronoi_idx)
)
nlm_summary <- nlm |>
st_drop_geometry() |>
select(
subwatershed_code,
atmospheric,
wwtf,
septic,
agriculture,
lawns,
parks_golf,
Total,
kg_N_per_year_per_Ha,
PercentofTotal
) |>
arrange(desc(kg_N_per_year_per_Ha))
write_csv(nlm_summary, "outputs/tables/nitrogen_loading_subwatershed_summary.csv")
knitr::kable(head(nlm_summary, 12), digits = 2)
| subwatershed_code | atmospheric | wwtf | septic | agriculture | lawns | parks_golf | Total | kg_N_per_year_per_Ha | PercentofTotal |
|---|---|---|---|---|---|---|---|---|---|
| NF0 | 2191.16 | 0.00 | 5778.99 | 10199.53 | 620.17 | 841.71 | 19631.57 | 13.47 | 7.35 |
| SF8 | 186.41 | 0.00 | 1138.94 | 0.00 | 131.21 | 0.00 | 1456.56 | 12.08 | 0.55 |
| NF15 | 1669.06 | 0.00 | 4409.92 | 4911.54 | 470.05 | 736.34 | 12196.92 | 10.83 | 4.57 |
| NF14 | 1523.93 | 0.00 | 4345.45 | 4566.14 | 460.60 | 2.87 | 10898.98 | 10.80 | 4.08 |
| NF6 | 278.92 | 0.00 | 957.58 | 623.64 | 117.41 | 0.00 | 1977.56 | 10.51 | 0.74 |
| NF12 | 96.46 | 0.00 | 526.22 | 0.00 | 51.62 | 0.00 | 674.29 | 10.49 | 0.25 |
| NF8 | 1703.26 | 0.00 | 6009.13 | 2292.92 | 652.48 | 0.00 | 10657.79 | 9.75 | 3.99 |
| NF13 | 212.35 | 0.00 | 1060.87 | 0.00 | 109.53 | 0.00 | 1382.75 | 9.42 | 0.52 |
| NF11 | 1075.23 | 0.00 | 1981.35 | 3443.52 | 210.01 | 0.00 | 6710.10 | 9.29 | 2.51 |
| FB1 | 9396.97 | 15118.66 | 11746.76 | 12617.40 | 1670.21 | 2362.59 | 52912.59 | 8.75 | 19.81 |
| NF9 | 402.09 | 0.00 | 1807.75 | 0.00 | 199.37 | 0.00 | 2409.22 | 8.72 | 0.90 |
| NF5 | 843.54 | 0.00 | 336.89 | 4056.35 | 49.65 | 0.00 | 5286.43 | 8.09 | 1.98 |
p_nlm <- ggplot() +
geom_sf(data = nlm_voronoi, aes(fill = kg_N_per_year_per_Ha), color = NA, alpha = 0.22) +
geom_sf(data = nlm, aes(fill = kg_N_per_year_per_Ha), color = "white", linewidth = 0.2) +
{if (!is.null(shoreline_ctx)) geom_sf(data = shoreline_ctx, fill = NA, color = "grey35", linewidth = 0.15, inherit.aes = FALSE)} +
geom_sf(data = pep_6347, fill = NA, color = "black", linewidth = 0.35) +
coord_sf(crs = st_crs(pep_6347), expand = FALSE) +
scale_fill_gradient(low = "#e5f5e0", high = "#006d2c", name = "kg N / yr / ha") +
labs(
title = "Figure 5. Peconic nitrogen yield by subwatershed",
subtitle = "Solid polygons are NLM subwatersheds; transparent fill extends nearest-subwatershed influence across the estuary frame",
x = NULL,
y = NULL,
caption = "The transparent bay-area partition shows the nearest NLM polygon across the estuary frame."
) +
theme_minimal() +
theme(legend.position = "bottom")
print(p_nlm)
ggsave("figures/figure_05_nitrogen_yield_subwatersheds.png", p_nlm, width = 9, height = 5.5, dpi = 300)
stations_path <- "data/water_quality/scdhs_peconic/Station locations - updated 102824.csv"
stations_raw <- read_csv(
stations_path,
show_col_types = FALSE,
locale = locale(encoding = "Windows-1252")
) |>
clean_names()
stations_sf <- stations_raw |>
filter(!is.na(latitude), !is.na(longitude)) |>
st_as_sf(coords = c("longitude", "latitude"), crs = 4326, remove = FALSE) |>
st_transform(st_crs(pep_6347))
# Keep stations inside the Peconic boundary. This avoids carrying unrelated county stations forward.
peconic_stations <- stations_sf[st_intersects(stations_sf, pep_6347, sparse = FALSE)[, 1], ]
nrow(peconic_stations)
## [1] 334
# Bathymetric covariates come from the fixed raster products created above.
station_bathy <- terra::extract(
c(depth_final, slope, depth_class, slope_class, bathy_regime),
vect(peconic_stations)
) |>
as_tibble() |>
select(-ID)
peconic_station_covariates <- bind_cols(
peconic_stations,
station_bathy
) |>
left_join(depth_key, by = "depth_class") |>
left_join(slope_key |> select(slope_class, slope_label), by = "slope_class") |>
left_join(bathy_key |> select(bathy_regime, bathy_label), by = "bathy_regime")
# NLM polygons are terrestrial/subwatershed features.
# Marine station points inherit the nearest NLM polygon as a shoreline-pressure covariate.
nlm_for_join <- nlm |>
select(
subwatershed_code,
subwatershed,
atmospheric,
wwtf,
septic,
agriculture,
lawns,
parks_golf,
Total,
kg_N_per_year_per_Ha,
PercentofTotal
)
nearest_nlm_idx <- st_nearest_feature(peconic_station_covariates, nlm_for_join)
nearest_nlm_attrs <- nlm_for_join[nearest_nlm_idx, ] |> st_drop_geometry()
nearest_nlm_distance_m <- as.numeric(
st_distance(
peconic_station_covariates,
nlm_for_join[nearest_nlm_idx, ],
by_element = TRUE
)
)
station_nlm <- bind_cols(
peconic_station_covariates,
nearest_nlm_attrs,
nearest_nlm_distance_m = nearest_nlm_distance_m
)
station_nlm_table <- station_nlm |> st_drop_geometry()
write_csv(station_nlm_table, "outputs/tables/peconic_station_bathymetry_nearest_nitrogen_covariates.csv")
st_write(station_nlm, "outputs/peconic_station_bathymetry_nearest_nitrogen_covariates.gpkg", delete_dsn = TRUE, quiet = TRUE)
knitr::kable(
station_nlm_table |>
select(any_of(c(
"record_id", "record_name", "water_body_area", "water_body",
"depth_m", "slope_deg", "depth_label", "slope_label", "bathy_label",
"subwatershed_code", "kg_N_per_year_per_Ha", "nearest_nlm_distance_m",
"Total", "septic", "agriculture", "wwtf"
))) |>
head(12),
digits = 2
)
| record_id | record_name | water_body_area | water_body | depth_m | slope_deg | depth_label | slope_label | bathy_label | subwatershed_code | kg_N_per_year_per_Ha | nearest_nlm_distance_m | Total | septic | agriculture | wwtf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SITE-060069 | Peconic Estuary | Peconic Estuary | Great Peconic Bay | 2.48 | 0.97 | shallow_0_3m | high_slope | shallow_0_3m / high_slope | RI | 1.49 | 143.67 | 276.46 | 19.70 | 0.00 | 0.00 |
| SITE-060081 | Peconic Estuary | Peconic Estuary | Hog Creek | 1.78 | 0.92 | shallow_0_3m | high_slope | shallow_0_3m / high_slope | NF12 | 10.49 | 131.26 | 674.29 | 526.22 | 0.00 | 0.00 |
| SITE-060100 | Peconic Estuary | Peconic Estuary | Great Peconic Bay | 1.01 | 5.75 | shallow_0_3m | high_slope | shallow_0_3m / high_slope | NF14 | 10.80 | 531.41 | 10898.98 | 4345.45 | 4566.14 | 0.00 |
| SITE-060101 | Peconic Estuary | Peconic Estuary | East Creek | 1.97 | 1.62 | shallow_0_3m | high_slope | shallow_0_3m / high_slope | NF0 | 13.47 | 15.81 | 19631.57 | 5778.99 | 10199.53 | 0.00 |
| SITE-060102 | Peconic Estuary | Peconic Estuary | Cutchogue Harbor | 4.09 | 0.04 | mid_3_8m | low_slope | mid_3_8m / low_slope | NF13 | 9.42 | 332.81 | 1382.75 | 1060.87 | 0.00 | 0.00 |
| SITE-060103 | Peconic Estuary | Peconic Estuary | East Creek | 3.01 | 2.24 | mid_3_8m | high_slope | mid_3_8m / high_slope | NF14 | 10.80 | 27.23 | 10898.98 | 4345.45 | 4566.14 | 0.00 |
| SITE-060104 | Peconic Estuary | Peconic Estuary | North Sea Harbor | 3.74 | 4.20 | mid_3_8m | high_slope | mid_3_8m / high_slope | SF2 | 5.40 | 14.53 | 15510.50 | 9004.15 | 796.36 | 32.21 |
| SITE-060105 | Peconic Estuary | Peconic Estuary | NA | 4.40 | 0.79 | mid_3_8m | high_slope | mid_3_8m / high_slope | NF11 | 9.29 | 563.67 | 6710.10 | 1981.35 | 3443.52 | 0.00 |
| SITE-060106 | Peconic Estuary | Peconic Estuary | Goose Creek | 2.66 | 5.30 | shallow_0_3m | high_slope | shallow_0_3m / high_slope | NF9 | 8.72 | 19.51 | 2409.22 | 1807.75 | 0.00 | 0.00 |
| SITE-060107 | Peconic Estuary | Peconic Estuary | Town Creek | 3.14 | 1.94 | mid_3_8m | high_slope | mid_3_8m / high_slope | NF8 | 9.75 | 70.66 | 10657.79 | 6009.13 | 2292.92 | 0.00 |
| SITE-060108 | Peconic Estuary | Peconic Estuary | Southold Bay | 5.70 | 0.46 | mid_3_8m | moderate_slope | mid_3_8m / moderate_slope | NF9 | 8.72 | 645.00 | 2409.22 | 1807.75 | 0.00 | 0.00 |
| SITE-060109 | Peconic Estuary | Peconic Estuary | Hashamomuck Pond | 1.24 | 4.90 | shallow_0_3m | high_slope | shallow_0_3m / high_slope | NF8 | 9.75 | 0.00 | 10657.79 | 6009.13 | 2292.92 | 0.00 |
This map checks the station assignment step. The polygons show modeled nitrogen yield, while station colors show the local bathymetric depth class extracted from the DEM. The table written above preserves the nearest-subwatershed distance so this proxy can be filtered later if a station is too far from a modeled drainage polygon.
p_station_cov <- ggplot() +
geom_sf(data = nlm_voronoi, aes(fill = kg_N_per_year_per_Ha), color = NA, alpha = 0.18) +
geom_sf(data = nlm, aes(fill = kg_N_per_year_per_Ha), color = "white", linewidth = 0.15, alpha = 0.85) +
{if (!is.null(shoreline_ctx)) geom_sf(data = shoreline_ctx, fill = NA, color = "grey35", linewidth = 0.15, inherit.aes = FALSE)} +
geom_sf(data = pep_6347, fill = NA, color = "black", linewidth = 0.35) +
geom_sf(data = station_nlm, aes(color = depth_label), size = 1.6, alpha = 0.85) +
coord_sf(crs = st_crs(pep_6347), expand = FALSE) +
scale_fill_gradient(low = "#e5f5e0", high = "#006d2c", name = "kg N / yr / ha") +
scale_color_manual(
values = c(
"shallow_0_3m" = "#d9f0ff",
"mid_3_8m" = "#3182bd",
"deep_gt_8m" = "#08306b"
),
name = "Station depth class",
na.value = "grey30"
) +
labs(
title = "Figure 6. Station covariates: nitrogen yield and bathymetric class",
subtitle = "SCDHS stations are assigned DEM depth/slope classes and the nearest NLM subwatershed yield",
x = NULL,
y = NULL
) +
theme_minimal() +
theme(legend.position = "bottom")
print(p_station_cov)
ggsave("figures/figure_06_station_covariates_nitrogen_depth.png", p_station_cov, width = 9, height = 5.5, dpi = 300)
The station covariate table preserves the spatial assumptions used here: DEM-derived bathymetric class, nearest NLM subwatershed yield, and the distance from each station to that assigned subwatershed. Water-quality observations can then be summarized by station-year and joined to these fixed covariates for the 2014–2024 comparison window.
MUR SST is added here as a coarse thermal-variability layer rather than as a detailed circulation model. The files were curled from the NOAA CoastWatch ERDDAP MUR daily SST endpoint, clipped at download time to a broader Peconic bounding box, and downloaded daily from April through August for 2014–2025. I use these rasters to create simple year-level temperature covariates: monthly mean SST, cell-level SST standard deviation, and a seasonal surface-temperature variability map. Cells with stronger April–August variability may reflect more rapid heating/cooling, exchange, wind mixing, or overturn-type behavior, while lower variability may indicate more stable surface-water conditions.
For the final analysis, SST is treated as a supporting thermal-context layer rather than direct evidence of upwelling. The main outputs retained from this block are a compact year-level SST covariate table, a year-level SST trend figure, station-level SST extraction, and SST summaries by DEM-derived depth and slope class. I drop the heavier spatial SST map here because it was not doing much additional work for the argument, and because the more useful unit for the later analysis is the monitoring station-year.
sst_dir <- "data/sst/mur_daily_apr_aug"
dir.create("outputs/tables", recursive = TRUE, showWarnings = FALSE)
dir.create("outputs/rasters/sst", recursive = TRUE, showWarnings = FALSE)
dir.create("figures", recursive = TRUE, showWarnings = FALSE)
sst_files <- list.files(
sst_dir,
pattern = "^mur_sst_peconic_\\d{8}\\.nc$",
full.names = TRUE
)
sst_index <- tibble(file = sst_files) |>
mutate(
date = as.Date(str_extract(basename(file), "\\d{8}"), format = "%Y%m%d"),
year = as.integer(format(date, "%Y")),
month_num = as.integer(format(date, "%m")),
month = factor(month.abb[month_num], levels = month.abb[4:8])
) |>
filter(month_num %in% 4:8) |>
arrange(date)
sst_file_counts <- sst_index |>
count(year, month) |>
tidyr::pivot_wider(names_from = month, values_from = n)
knitr::kable(
sst_file_counts,
caption = "MUR SST file counts by year and month. Files are daily from April through August."
)
| year | Apr | May | Jun | Jul | Aug |
|---|---|---|---|---|---|
| 2014 | 30 | 31 | 30 | 31 | 31 |
| 2015 | 30 | 31 | 30 | 31 | 31 |
| 2016 | 30 | 31 | 30 | 31 | 31 |
| 2017 | 30 | 31 | 30 | 31 | 31 |
| 2018 | 30 | 31 | 30 | 31 | 31 |
| 2019 | 30 | 31 | 30 | 31 | 31 |
| 2020 | 30 | 31 | 30 | 31 | 31 |
| 2021 | 30 | 31 | 30 | 31 | 31 |
| 2022 | 30 | 31 | 30 | 31 | 31 |
| 2023 | 30 | 31 | 30 | 31 | 31 |
| 2024 | 30 | 31 | 30 | 31 | 31 |
| 2025 | 30 | 31 | 30 | 31 | 31 |
read_sst_raster_c <- function(f) {
r <- rast(f)[[1]]
rng <- global(r, range, na.rm = TRUE)
if (rng[1, 2] > 100) {
r <- r - 273.15
}
names(r) <- "sst_c"
r
}
read_daily_sst_mean <- function(f) {
as.numeric(global(read_sst_raster_c(f), mean, na.rm = TRUE)[1, 1])
}
Each daily raster is first collapsed to a Peconic-box mean temperature. Those daily means are then averaged by year and month. In the same loop, daily rasters are stacked by year to calculate April–August cell-level mean SST, SST standard deviation, and SST range. The year table is the important output here: it lets me compare whether 2019 was unusual in either mean SST or SST variability.
sst_daily <- sst_index |>
mutate(mean_sst_c = vapply(file, read_daily_sst_mean, numeric(1)))
write_csv(sst_daily, "outputs/tables/mur_daily_mean_sst_peconic_apr_aug.csv")
sst_monthly_mean <- sst_daily |>
group_by(year, month_num, month) |>
summarise(
n_days = n(),
mean_monthly_sst_c = mean(mean_sst_c, na.rm = TRUE),
.groups = "drop"
)
write_csv(sst_monthly_mean, "outputs/tables/mur_monthly_mean_sst_peconic_apr_aug.csv")
monthly_cell_sd <- list()
seasonal_cell_sd <- list()
seasonal_cell_mean <- list()
seasonal_cell_delta <- list()
station_sst_rows <- list()
station_attrs <- station_nlm |>
st_drop_geometry() |>
mutate(station_id = as.character(record_id))
for (yr in sort(unique(sst_index$year))) {
year_files <- sst_index |>
filter(year == yr) |>
pull(file)
year_stack <- rast(lapply(year_files, read_sst_raster_c))
year_sd <- app(year_stack, sd, na.rm = TRUE)
year_mean <- app(year_stack, mean, na.rm = TRUE)
year_delta <- app(year_stack, max, na.rm = TRUE) - app(year_stack, min, na.rm = TRUE)
names(year_sd) <- paste0("sst_sd_apr_aug_", yr)
names(year_mean) <- paste0("sst_mean_apr_aug_", yr)
names(year_delta) <- paste0("sst_delta_apr_aug_", yr)
writeRaster(year_sd, paste0("outputs/rasters/sst/sst_apr_aug_cell_sd_", yr, ".tif"), overwrite = TRUE)
writeRaster(year_mean, paste0("outputs/rasters/sst/sst_apr_aug_cell_mean_", yr, ".tif"), overwrite = TRUE)
writeRaster(year_delta, paste0("outputs/rasters/sst/sst_apr_aug_cell_delta_", yr, ".tif"), overwrite = TRUE)
seasonal_cell_sd[[as.character(yr)]] <- year_sd
seasonal_cell_mean[[as.character(yr)]] <- year_mean
seasonal_cell_delta[[as.character(yr)]] <- year_delta
stations_sst <- st_transform(station_nlm, st_crs(crs(year_mean)))
station_extracted <- terra::extract(
c(year_mean, year_sd),
vect(stations_sst)
) |>
as_tibble() |>
select(-ID)
station_sst_rows[[as.character(yr)]] <- bind_cols(
station_attrs |>
select(any_of(c(
"station_id", "record_id", "record_name", "water_body", "water_body_area",
"depth_m", "slope_deg", "depth_label", "slope_label", "bathy_label",
"kg_N_per_year_per_Ha", "nearest_nlm_distance_m"
))),
station_extracted
) |>
rename(
mur_apr_aug_mean_sst_c = starts_with("sst_mean_apr_aug_"),
mur_apr_aug_sd_sst_c = starts_with("sst_sd_apr_aug_")
) |>
mutate(year = yr)
for (mo in 4:8) {
files_ym <- sst_index |>
filter(year == yr, month_num == mo) |>
pull(file)
month_stack <- rast(lapply(files_ym, read_sst_raster_c))
cell_sd <- app(month_stack, sd, na.rm = TRUE)
cell_delta <- app(month_stack, max, na.rm = TRUE) - app(month_stack, min, na.rm = TRUE)
names(cell_sd) <- paste0("sst_cell_sd_", yr, "_", sprintf("%02d", mo))
names(cell_delta) <- paste0("sst_cell_delta_", yr, "_", sprintf("%02d", mo))
writeRaster(cell_sd, paste0("outputs/rasters/sst/sst_cell_sd_", yr, "_", sprintf("%02d", mo), ".tif"), overwrite = TRUE)
writeRaster(cell_delta, paste0("outputs/rasters/sst/sst_cell_delta_", yr, "_", sprintf("%02d", mo), ".tif"), overwrite = TRUE)
monthly_cell_sd[[paste(yr, mo, sep = "_")]] <- tibble(
year = yr,
month_num = mo,
month = month.abb[mo],
n_days = length(files_ym),
mean_cell_sd_c = as.numeric(global(cell_sd, mean, na.rm = TRUE)[1, 1]),
median_cell_sd_c = as.numeric(global(cell_sd, median, na.rm = TRUE)[1, 1]),
max_cell_sd_c = as.numeric(global(cell_sd, max, na.rm = TRUE)[1, 1]),
mean_cell_delta_c = as.numeric(global(cell_delta, mean, na.rm = TRUE)[1, 1]),
max_cell_delta_c = as.numeric(global(cell_delta, max, na.rm = TRUE)[1, 1])
)
}
}
station_year_sst <- bind_rows(station_sst_rows)
write_csv(station_year_sst, "outputs/tables/mur_station_year_sst_by_slope_depth_apr_aug.csv")
sst_cell_sd_monthly <- bind_rows(monthly_cell_sd)
write_csv(sst_cell_sd_monthly, "outputs/tables/mur_monthly_cell_sd_sst_peconic_apr_aug.csv")
sst_cell_sd_yearly <- sst_cell_sd_monthly |>
group_by(year) |>
summarise(
mean_apr_aug_cell_sd_c = mean(mean_cell_sd_c, na.rm = TRUE),
median_apr_aug_cell_sd_c = median(median_cell_sd_c, na.rm = TRUE),
max_monthly_cell_sd_c = max(max_cell_sd_c, na.rm = TRUE),
mean_apr_aug_cell_delta_c = mean(mean_cell_delta_c, na.rm = TRUE),
max_monthly_cell_delta_c = max(max_cell_delta_c, na.rm = TRUE),
.groups = "drop"
) |>
left_join(
sst_monthly_mean |>
group_by(year) |>
summarise(mean_apr_aug_sst_c = mean(mean_monthly_sst_c, na.rm = TRUE), .groups = "drop"),
by = "year"
) |>
mutate(
mean_sst_rank_warmest = dense_rank(desc(mean_apr_aug_sst_c)),
variability_rank_highest = dense_rank(desc(mean_apr_aug_cell_sd_c))
)
write_csv(sst_cell_sd_yearly, "outputs/tables/mur_yearly_sst_covariates_apr_aug.csv")
knitr::kable(
sst_cell_sd_yearly,
digits = 2,
caption = "Year-level MUR SST covariates. Cell SD is the average of monthly per-cell SST standard deviations across April--August."
)
| year | mean_apr_aug_cell_sd_c | median_apr_aug_cell_sd_c | max_monthly_cell_sd_c | mean_apr_aug_cell_delta_c | max_monthly_cell_delta_c | mean_apr_aug_sst_c | mean_sst_rank_warmest | variability_rank_highest |
|---|---|---|---|---|---|---|---|---|
| 2014 | 1.23 | 1.07 | 2.19 | 4.12 | 8.06 | 15.03 | 12 | 11 |
| 2015 | 1.46 | 1.47 | 2.94 | 5.39 | 12.95 | 15.15 | 11 | 3 |
| 2016 | 1.37 | 1.11 | 2.49 | 4.53 | 8.85 | 16.12 | 5 | 6 |
| 2017 | 1.48 | 1.60 | 2.45 | 4.72 | 6.94 | 15.51 | 10 | 2 |
| 2018 | 1.50 | 1.23 | 2.43 | 4.85 | 7.98 | 15.81 | 9 | 1 |
| 2019 | 1.43 | 1.38 | 2.49 | 4.68 | 7.81 | 15.97 | 7 | 4 |
| 2020 | 1.25 | 1.06 | 2.60 | 4.31 | 8.42 | 16.06 | 6 | 10 |
| 2021 | 1.27 | 1.00 | 2.50 | 4.41 | 8.21 | 15.92 | 8 | 9 |
| 2022 | 1.29 | 1.29 | 2.43 | 4.29 | 7.72 | 16.67 | 1 | 8 |
| 2023 | 1.20 | 1.20 | 1.97 | 4.21 | 6.97 | 16.55 | 3 | 12 |
| 2024 | 1.32 | 1.14 | 2.87 | 4.24 | 7.70 | 16.59 | 2 | 7 |
| 2025 | 1.40 | 1.25 | 2.29 | 4.97 | 8.41 | 16.27 | 4 | 5 |
The year-level table is used to classify whether particular years were warm, variable, or both. The station-level SST table is then summarized by bathymetric depth and slope class, so that the SST diagnostics remain aligned with the station-year covariate structure used later.
sst_long <- sst_cell_sd_yearly |>
select(
year,
mean_apr_aug_sst_c,
mean_apr_aug_cell_sd_c,
mean_apr_aug_cell_delta_c
) |>
pivot_longer(cols = -year, names_to = "metric", values_to = "value") |>
mutate(
metric = recode(
metric,
mean_apr_aug_sst_c = "Mean SST (°C)",
mean_apr_aug_cell_sd_c = "Mean cell SST SD (°C)",
mean_apr_aug_cell_delta_c = "Mean cell SST range (°C)"
)
)
p_sst_compact <- ggplot(sst_long, aes(x = year, y = value)) +
geom_line(linewidth = 0.7) +
geom_point(size = 1.8) +
geom_vline(xintercept = 2019, linetype = "dashed", alpha = 0.6) +
facet_wrap(~ metric, scales = "free_y", ncol = 1) +
labs(
title = "Figure 7. April--August MUR SST covariates by year",
subtitle = "Dashed line marks the 2019 bay scallop collapse year",
x = "Year",
y = NULL
) +
theme_minimal()
ggsave("figures/figure_07_mur_sst_covariates_by_year.png", p_sst_compact, width = 8, height = 7, dpi = 300)
knitr::include_graphics("figures/figure_07_mur_sst_covariates_by_year.png")
sst_corr_year <- tibble(
metric = c(
"Mean April--August SST",
"Mean April--August cell SD",
"Median April--August cell SD",
"Mean April--August cell range",
"Maximum monthly cell range"
),
spearman_rho_with_year = c(
cor(sst_cell_sd_yearly$year, sst_cell_sd_yearly$mean_apr_aug_sst_c, method = "spearman", use = "complete.obs"),
cor(sst_cell_sd_yearly$year, sst_cell_sd_yearly$mean_apr_aug_cell_sd_c, method = "spearman", use = "complete.obs"),
cor(sst_cell_sd_yearly$year, sst_cell_sd_yearly$median_apr_aug_cell_sd_c, method = "spearman", use = "complete.obs"),
cor(sst_cell_sd_yearly$year, sst_cell_sd_yearly$mean_apr_aug_cell_delta_c, method = "spearman", use = "complete.obs"),
cor(sst_cell_sd_yearly$year, sst_cell_sd_yearly$max_monthly_cell_delta_c, method = "spearman", use = "complete.obs")
)
)
write_csv(sst_corr_year, "outputs/tables/mur_sst_correlations_with_year.csv")
knitr::kable(
sst_corr_year,
digits = 2,
caption = "Rank correlations between year and April--August MUR SST covariates."
)
| metric | spearman_rho_with_year |
|---|---|
| Mean April–August SST | 0.81 |
| Mean April–August cell SD | -0.25 |
| Median April–August cell SD | -0.09 |
| Mean April–August cell range | -0.16 |
| Maximum monthly cell range | -0.34 |
sst_by_depth_year <- station_year_sst |>
filter(!is.na(depth_label), !is.na(mur_apr_aug_sd_sst_c)) |>
group_by(year, depth_label) |>
summarise(
n_stations = n(),
median_sst_c = median(mur_apr_aug_mean_sst_c, na.rm = TRUE),
median_sst_sd_c = median(mur_apr_aug_sd_sst_c, na.rm = TRUE),
.groups = "drop"
)
sst_by_slope_year <- station_year_sst |>
filter(!is.na(slope_label), !is.na(mur_apr_aug_sd_sst_c)) |>
group_by(year, slope_label) |>
summarise(
n_stations = n(),
median_sst_c = median(mur_apr_aug_mean_sst_c, na.rm = TRUE),
median_sst_sd_c = median(mur_apr_aug_sd_sst_c, na.rm = TRUE),
.groups = "drop"
)
write_csv(sst_by_depth_year, "outputs/tables/mur_sst_by_depth_class_year.csv")
write_csv(sst_by_slope_year, "outputs/tables/mur_sst_by_slope_class_year.csv")
p_sst_depth <- ggplot(
sst_by_depth_year,
aes(x = year, y = median_sst_sd_c, color = depth_label, group = depth_label)
) +
geom_line(linewidth = 0.7) +
geom_point(size = 1.8) +
geom_vline(xintercept = 2019, linetype = "dashed", alpha = 0.6) +
labs(
title = "Figure 8. April--August SST variability by station depth class",
subtitle = "MUR SST SD extracted to monitoring stations; depth classes are DEM-derived",
x = "Year",
y = "Median station SST SD (°C)",
color = "Depth class"
) +
theme_minimal()
ggsave("figures/figure_08_mur_sst_sd_by_depth_class.png", p_sst_depth, width = 9, height = 5, dpi = 300)
knitr::include_graphics("figures/figure_08_mur_sst_sd_by_depth_class.png")
p_sst_slope <- ggplot(
sst_by_slope_year,
aes(x = year, y = median_sst_sd_c, color = slope_label, group = slope_label)
) +
geom_line(linewidth = 0.7) +
geom_point(size = 1.8) +
geom_vline(xintercept = 2019, linetype = "dashed", alpha = 0.6) +
labs(
title = "Figure 9. April--August SST variability by station slope class",
subtitle = "MUR SST SD extracted to monitoring stations; slope classes are DEM-derived",
x = "Year",
y = "Median station SST SD (°C)",
color = "Slope class"
) +
theme_minimal()
ggsave("figures/figure_09_mur_sst_sd_by_slope_class.png", p_sst_slope, width = 9, height = 5, dpi = 300)
knitr::include_graphics("figures/figure_09_mur_sst_sd_by_slope_class.png")
sst_year_classes <- sst_cell_sd_yearly |>
mutate(
mean_sst_z = as.numeric(scale(mean_apr_aug_sst_c)),
variability_z = as.numeric(scale(mean_apr_aug_cell_sd_c)),
thermal_retention_score = rowMeans(cbind(mean_sst_z, -variability_z), na.rm = TRUE),
sst_year_class = case_when(
thermal_retention_score >= quantile(thermal_retention_score, 0.75, na.rm = TRUE) ~ "higher thermal-retention pressure",
thermal_retention_score <= quantile(thermal_retention_score, 0.25, na.rm = TRUE) ~ "lower thermal-retention pressure",
TRUE ~ "middle"
)
) |>
arrange(desc(thermal_retention_score))
write_csv(sst_year_classes, "outputs/tables/mur_sst_year_thermal_pressure_classes.csv")
knitr::kable(
sst_year_classes,
digits = 2,
caption = "SST-based year classes. Higher scores indicate warmer April--August SST combined with lower SST variability."
)
| year | mean_apr_aug_cell_sd_c | median_apr_aug_cell_sd_c | max_monthly_cell_sd_c | mean_apr_aug_cell_delta_c | max_monthly_cell_delta_c | mean_apr_aug_sst_c | mean_sst_rank_warmest | variability_rank_highest | mean_sst_z | variability_z | thermal_retention_score | sst_year_class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023 | 1.20 | 1.20 | 1.97 | 4.21 | 6.97 | 16.55 | 3 | 12 | 1.08 | -1.45 | 1.27 | higher thermal-retention pressure |
| 2022 | 1.29 | 1.29 | 2.43 | 4.29 | 7.72 | 16.67 | 1 | 8 | 1.32 | -0.55 | 0.94 | higher thermal-retention pressure |
| 2024 | 1.32 | 1.14 | 2.87 | 4.24 | 7.70 | 16.59 | 2 | 7 | 1.17 | -0.26 | 0.72 | higher thermal-retention pressure |
| 2020 | 1.25 | 1.06 | 2.60 | 4.31 | 8.42 | 16.06 | 6 | 10 | 0.16 | -0.99 | 0.57 | middle |
| 2021 | 1.27 | 1.00 | 2.50 | 4.41 | 8.21 | 15.92 | 8 | 9 | -0.10 | -0.79 | 0.35 | middle |
| 2025 | 1.40 | 1.25 | 2.29 | 4.97 | 8.41 | 16.27 | 4 | 5 | 0.56 | 0.49 | 0.04 | middle |
| 2016 | 1.37 | 1.11 | 2.49 | 4.53 | 8.85 | 16.12 | 5 | 6 | 0.28 | 0.22 | 0.03 | middle |
| 2014 | 1.23 | 1.07 | 2.19 | 4.12 | 8.06 | 15.03 | 12 | 11 | -1.76 | -1.16 | -0.30 | middle |
| 2019 | 1.43 | 1.38 | 2.49 | 4.68 | 7.81 | 15.97 | 7 | 4 | 0.00 | 0.79 | -0.39 | middle |
| 2018 | 1.50 | 1.23 | 2.43 | 4.85 | 7.98 | 15.81 | 9 | 1 | -0.31 | 1.44 | -0.87 | lower thermal-retention pressure |
| 2017 | 1.48 | 1.60 | 2.45 | 4.72 | 6.94 | 15.51 | 10 | 2 | -0.86 | 1.24 | -1.05 | lower thermal-retention pressure |
| 2015 | 1.46 | 1.47 | 2.94 | 5.39 | 12.95 | 15.15 | 11 | 3 | -1.54 | 1.03 | -1.28 | lower thermal-retention pressure |
The important result from this block is negative but useful. The MUR SST summaries do not identify 2019 as a clear basin-wide anomaly in either April–August mean SST or April–August SST variability. That weakens a simple wedge-like or seasonal surface-temperature anomaly explanation for the 2019 scallop collapse, at least at the spatial and temporal scale used here. It does not rule out finer-scale bottom-water stress, short heat events, dissolved-oxygen crashes, parasite timing, or interacting physiological stress. For this project, SST is therefore retained as broad thermal context and as a covariate in the later station-year table, while the actual explanatory weight remains with station water-quality records, bathymetric setting, and nitrogen-loading context.
The final analysis unit is a station-year. Fixed spatial variables come from earlier sections: bathymetric depth, bathymetric slope, bathymetric regime, nearest Peconic nitrogen-loading subwatershed, and distance to that subwatershed. Temporal variables come from the SCDHS water-quality workbook and from the MUR SST products. The goal here is to assemble a compact covariate table and check whether water-quality stress is aligned with the physical and watershed-pressure variables developed above.
The working questions are deliberately limited:
The stress logic is directional. Higher chlorophyll-a, higher nitrogen loading, higher summer temperature, lower dissolved oxygen, lower SST variability, lower slope, and shallower water are treated as more stressful. The SST component is a surface-temperature variability proxy, so it should be read as evidence of thermal behavior at the surface rather than direct proof of overturn or upwelling.
The SCDHS workbook is read from the two Peconic station sheets.
Station identifiers are converted into the same SITE-060000
style used by the station-location file, which allows the water-quality
records to join to the bathymetry and nitrogen covariate table.
wq_path <- "data/water_quality/scdhs_peconic/Peconics SCDHS WQ data - 1976 - 2024 DRAFT - updated 041025.xlsx"
# The two sheets have some columns inferred as different types by readxl.
# Converting both sheets to character before bind_rows() prevents type conflicts;
# the fields used below are then parsed back to numeric explicitly.
wq060_raw <- read_excel(wq_path, sheet = "Pec 060 stations ") |>
clean_names() |>
mutate(across(everything(), as.character))
wq200_raw <- read_excel(wq_path, sheet = "Pec 200 stations") |>
clean_names() |>
mutate(across(everything(), as.character))
parse_wq_number <- function(x) {
x <- as.character(x)
x <- str_replace_all(x, "<\\s*", "")
x <- str_replace_all(x, ">\\s*", "")
x <- str_replace_all(x, ",", "")
suppressWarnings(as.numeric(x))
}
mean_na <- function(x) {
if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}
min_na <- function(x) {
if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
}
wq_clean <- bind_rows(
wq060_raw |> mutate(program_sheet = "Pec 060"),
wq200_raw |> mutate(program_sheet = "Pec 200")
) |>
transmute(
bay_station = parse_wq_number(bay_station),
sample_date = as.Date(date),
year = as.integer(format(sample_date, "%Y")),
month = as.integer(format(sample_date, "%m")),
do_mg_l = parse_wq_number(dissolved_oxygen_mg_l),
chla_ug_l = parse_wq_number(chlorophyll_a_total_ug_l),
secchi_m = parse_wq_number(secchi_ft) * 0.3048,
nitrate_mg_l = parse_wq_number(nitrate_mg_l),
nox_mg_l = parse_wq_number(nitrate_nitrite_mg_l),
tkn_mg_l = parse_wq_number(total_kjeldahl_nitrogen_mg_l),
tn_mg_l = parse_wq_number(total_nitrogen_mg_l),
salinity_ppt = parse_wq_number(salinity_psu),
station_temp_c = parse_wq_number(temperature_0c)
) |>
filter(
year >= 2014,
year <= 2025,
!is.na(bay_station),
!is.na(sample_date)
) |>
mutate(station_id = sprintf("SITE-%06d", as.integer(bay_station)))
station_year_wq <- wq_clean |>
filter(month %in% 3:7) |>
group_by(station_id, year) |>
summarise(
n_wq_samples = n(),
mean_do_mg_l = mean_na(do_mg_l),
min_do_mg_l = min_na(do_mg_l),
mean_chla_ug_l = mean_na(chla_ug_l),
mean_secchi_m = mean_na(secchi_m),
mean_nitrate_mg_l = mean_na(nitrate_mg_l),
mean_nox_mg_l = mean_na(nox_mg_l),
mean_tkn_mg_l = mean_na(tkn_mg_l),
mean_tn_mg_l = mean_na(tn_mg_l),
mean_salinity_ppt = mean_na(salinity_ppt),
mean_station_temp_c = mean_na(station_temp_c),
.groups = "drop"
)
write_csv(station_year_wq, "outputs/tables/scdhs_station_year_wq_2014_2025_mar_jul.csv")
knitr::kable(
station_year_wq |> head(12),
digits = 2,
caption = "Station-year water-quality summaries from SCDHS Peconic monitoring records, March--July."
)
| station_id | year | n_wq_samples | mean_do_mg_l | min_do_mg_l | mean_chla_ug_l | mean_secchi_m | mean_nitrate_mg_l | mean_nox_mg_l | mean_tkn_mg_l | mean_tn_mg_l | mean_salinity_ppt | mean_station_temp_c |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SITE-060101 | 2014 | 10 | 7.72 | 4.0 | 4.41 | 1.37 | NA | NA | NA | NA | 25.96 | 16.57 |
| SITE-060101 | 2015 | 16 | 6.76 | 3.2 | 6.89 | 1.28 | NA | NA | NA | NA | 27.08 | 18.09 |
| SITE-060101 | 2016 | 10 | 6.70 | 3.1 | 10.05 | 1.34 | NA | NA | NA | NA | 27.26 | 15.96 |
| SITE-060101 | 2017 | 8 | 8.07 | 3.3 | 4.86 | 1.57 | NA | NA | NA | NA | 26.77 | 16.54 |
| SITE-060101 | 2018 | 8 | 7.71 | 5.0 | 7.79 | 1.52 | NA | NA | NA | NA | 25.75 | 17.42 |
| SITE-060101 | 2019 | 4 | 8.12 | 5.0 | 8.58 | 1.52 | NA | NA | NA | NA | 24.70 | 16.05 |
| SITE-060101 | 2020 | 2 | 5.45 | 5.3 | 7.79 | 0.91 | NA | NA | NA | NA | 27.80 | 27.70 |
| SITE-060101 | 2021 | 6 | 8.55 | 3.4 | 12.48 | 1.45 | NA | NA | NA | NA | 25.78 | 15.08 |
| SITE-060101 | 2022 | 8 | 7.32 | 4.2 | 14.58 | 1.26 | NA | NA | NA | NA | 27.09 | 17.42 |
| SITE-060101 | 2023 | 5 | 7.62 | 5.5 | 3.91 | 1.83 | NA | NA | NA | NA | 26.82 | 19.30 |
| SITE-060101 | 2024 | 10 | 7.42 | 4.2 | 5.36 | 1.32 | NA | NA | NA | NA | 24.25 | 18.79 |
| SITE-060102 | 2014 | 10 | 8.20 | 4.6 | 2.18 | 1.83 | NA | NA | NA | NA | 27.10 | 15.93 |
Nitrogen loading and bathymetry are static over the analysis window; SST and water-quality values vary by year. This chunk creates the combined station-year table used for the correlation screen and stress-index plots.
station_covariates <- read_csv(
"outputs/tables/peconic_station_bathymetry_nearest_nitrogen_covariates.csv",
show_col_types = FALSE
) |>
mutate(station_id = as.character(record_id))
station_year_sst_for_join <- read_csv(
"outputs/tables/mur_station_year_sst_by_slope_depth_apr_aug.csv",
show_col_types = FALSE
) |>
select(station_id, year, mur_apr_aug_mean_sst_c, mur_apr_aug_sd_sst_c)
station_year_model <- station_year_wq |>
left_join(station_covariates, by = "station_id") |>
left_join(station_year_sst_for_join, by = c("station_id", "year"))
write_csv(station_year_model, "outputs/tables/station_year_combined_covariates_2014_2025.csv")
knitr::kable(
station_year_model |>
select(any_of(c(
"station_id", "year", "mean_chla_ug_l", "min_do_mg_l",
"depth_m", "slope_deg", "kg_N_per_year_per_Ha",
"mur_apr_aug_mean_sst_c", "mur_apr_aug_sd_sst_c"
))) |>
head(12),
digits = 2,
caption = "Combined station-year table used for the final covariate screen."
)
| station_id | year | mean_chla_ug_l | min_do_mg_l | depth_m | slope_deg | kg_N_per_year_per_Ha | mur_apr_aug_mean_sst_c | mur_apr_aug_sd_sst_c |
|---|---|---|---|---|---|---|---|---|
| SITE-060101 | 2014 | 4.41 | 4.0 | 1.97 | 1.62 | 13.47 | 15.55 | 6.33 |
| SITE-060101 | 2015 | 6.89 | 3.2 | 1.97 | 1.62 | 13.47 | 15.60 | 6.99 |
| SITE-060101 | 2016 | 10.05 | 3.1 | 1.97 | 1.62 | 13.47 | 16.65 | 6.38 |
| SITE-060101 | 2017 | 4.86 | 3.3 | 1.97 | 1.62 | 13.47 | 16.00 | 6.12 |
| SITE-060101 | 2018 | 7.79 | 5.0 | 1.97 | 1.62 | 13.47 | 16.14 | 6.96 |
| SITE-060101 | 2019 | 8.58 | 5.0 | 1.97 | 1.62 | 13.47 | 16.38 | 6.15 |
| SITE-060101 | 2020 | 7.79 | 5.3 | 1.97 | 1.62 | 13.47 | 16.50 | 6.40 |
| SITE-060101 | 2021 | 12.48 | 3.4 | 1.97 | 1.62 | 13.47 | 16.38 | 5.97 |
| SITE-060101 | 2022 | 14.58 | 4.2 | 1.97 | 1.62 | 13.47 | 16.96 | 6.43 |
| SITE-060101 | 2023 | 3.91 | 5.5 | 1.97 | 1.62 | 13.47 | 16.96 | 5.91 |
| SITE-060101 | 2024 | 5.36 | 4.2 | 1.97 | 1.62 | 13.47 | 16.94 | 5.86 |
| SITE-060102 | 2014 | 2.18 | 4.6 | 4.09 | 0.04 | 9.42 | 15.41 | 6.32 |
A Spearman correlation matrix is the first check because several variables are skewed, bounded, or partly ordinal. The matrix is not a mechanism test; it is a quick way to see whether the proposed stress variables move together in the expected directions.
cor_vars <- station_year_model |>
select(any_of(c(
"min_do_mg_l",
"mean_do_mg_l",
"mean_chla_ug_l",
"mean_secchi_m",
"mean_nitrate_mg_l",
"mean_nox_mg_l",
"mean_tkn_mg_l",
"mean_tn_mg_l",
"depth_m",
"slope_deg",
"kg_N_per_year_per_Ha",
"nearest_nlm_distance_m",
"mur_apr_aug_mean_sst_c",
"mur_apr_aug_sd_sst_c"
))) |>
select(where(is.numeric))
cor_mat <- cor(cor_vars, use = "pairwise.complete.obs", method = "spearman")
cor_df <- as.data.frame(as.table(cor_mat)) |>
rename(var1 = Var1, var2 = Var2, spearman_r = Freq)
pairwise_cor_tests <- combn(names(cor_vars), 2, simplify = FALSE) |>
lapply(function(v) {
x <- cor_vars[[v[1]]]
y <- cor_vars[[v[2]]]
ok <- complete.cases(x, y)
if (sum(ok) < 10 || sd(x[ok], na.rm = TRUE) == 0 || sd(y[ok], na.rm = TRUE) == 0) {
return(tibble(var1 = v[1], var2 = v[2], n = sum(ok), spearman_r = NA_real_, p_value = NA_real_))
}
test <- suppressWarnings(cor.test(x[ok], y[ok], method = "spearman", exact = FALSE))
tibble(
var1 = v[1],
var2 = v[2],
n = sum(ok),
spearman_r = unname(test$estimate),
p_value = test$p.value
)
}) |>
bind_rows() |>
arrange(desc(abs(spearman_r)))
write_csv(cor_df, "outputs/tables/station_year_spearman_correlations.csv")
write_csv(pairwise_cor_tests, "outputs/tables/station_year_pairwise_spearman_tests.csv")
p_cor <- ggplot(cor_df, aes(x = var1, y = var2, fill = spearman_r)) +
geom_tile(color = "white", linewidth = 0.2) +
scale_fill_gradient2(
low = "#2166ac",
mid = "white",
high = "#b2182b",
midpoint = 0,
limits = c(-1, 1),
name = "Spearman r"
) +
labs(
title = "Figure 10. Correlation screen for station-year covariates",
subtitle = "Spearman rank correlations; 2014--2025 station-year summaries",
x = NULL,
y = NULL
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggsave("figures/figure_10_station_year_correlation_matrix.png", p_cor, width = 8.5, height = 7, dpi = 300)
knitr::include_graphics("figures/figure_10_station_year_correlation_matrix.png")
knitr::kable(
pairwise_cor_tests |>
filter(!is.na(spearman_r)) |>
slice_head(n = 15),
digits = 3,
caption = "Strongest pairwise Spearman associations among station-year covariates."
)
| var1 | var2 | n | spearman_r | p_value |
|---|---|---|---|---|
| mean_nox_mg_l | depth_m | 13 | -0.866 | 0 |
| min_do_mg_l | mean_do_mg_l | 753 | 0.703 | 0 |
| depth_m | nearest_nlm_distance_m | 527 | 0.679 | 0 |
| mean_secchi_m | depth_m | 424 | 0.630 | 0 |
| mean_nox_mg_l | kg_N_per_year_per_Ha | 196 | 0.596 | 0 |
| mean_nitrate_mg_l | mean_nox_mg_l | 137 | 0.587 | 0 |
| mean_do_mg_l | mean_secchi_m | 452 | 0.543 | 0 |
| mean_secchi_m | nearest_nlm_distance_m | 452 | 0.525 | 0 |
| min_do_mg_l | mean_secchi_m | 452 | 0.524 | 0 |
| slope_deg | nearest_nlm_distance_m | 470 | -0.514 | 0 |
| min_do_mg_l | mean_chla_ug_l | 514 | -0.501 | 0 |
| mean_chla_ug_l | mean_secchi_m | 419 | -0.472 | 0 |
| mean_nitrate_mg_l | kg_N_per_year_per_Ha | 139 | 0.432 | 0 |
| mean_chla_ug_l | kg_N_per_year_per_Ha | 515 | 0.394 | 0 |
| min_do_mg_l | mean_nox_mg_l | 192 | 0.392 | 0 |
The index is scaled so that larger values indicate greater proxy stress. Direct response variables are weighted more heavily than static spatial covariates. Dissolved oxygen is inverted because lower oxygen is the stress condition. SST variability and slope are also inverted because in my working hypothesis I’m assuming more variable/mixed and steeper-gradient areas as less prone to stagnant warm-water stress.
z_safe <- function(x) {
x <- as.numeric(x)
sx <- sd(x, na.rm = TRUE)
if (is.na(sx) || sx == 0) return(rep(NA_real_, length(x)))
as.numeric((x - mean(x, na.rm = TRUE)) / sx)
}
trend_test <- function(dat, value_col) {
x <- dat$year
y <- dat[[value_col]]
ok <- complete.cases(x, y)
if (sum(ok) < 4 || sd(y[ok], na.rm = TRUE) == 0) {
return(tibble(metric = value_col, n = sum(ok), spearman_rho = NA_real_, p_value = NA_real_))
}
test <- suppressWarnings(cor.test(x[ok], y[ok], method = "spearman", exact = FALSE))
tibble(
metric = value_col,
n = sum(ok),
spearman_rho = unname(test$estimate),
p_value = test$p.value
)
}
station_year_stress <- station_year_model |>
mutate(
z_chla = z_safe(mean_chla_ug_l),
z_low_do = -z_safe(min_do_mg_l),
z_n_yield = z_safe(kg_N_per_year_per_Ha),
z_warm_sst = z_safe(mur_apr_aug_mean_sst_c),
z_low_sst_variability = -z_safe(mur_apr_aug_sd_sst_c),
z_low_slope = -z_safe(slope_deg),
z_shallow = -z_safe(depth_m),
stress_index =
0.25 * z_chla +
0.25 * z_low_do +
0.15 * z_n_yield +
0.15 * z_warm_sst +
0.10 * z_low_sst_variability +
0.05 * z_low_slope +
0.05 * z_shallow
)
write_csv(station_year_stress, "outputs/tables/station_year_stress_index_2014_2025.csv")
stress_year_summary <- station_year_stress |>
group_by(year) |>
summarise(
n_station_years = sum(!is.na(stress_index)),
median_stress_index = median(stress_index, na.rm = TRUE),
mean_stress_index = mean(stress_index, na.rm = TRUE),
upper_quartile_stress = quantile(stress_index, 0.75, na.rm = TRUE),
.groups = "drop"
)
stress_trend_tests <- bind_rows(
trend_test(stress_year_summary, "median_stress_index"),
trend_test(stress_year_summary, "mean_stress_index"),
trend_test(stress_year_summary, "upper_quartile_stress")
)
write_csv(stress_year_summary, "outputs/tables/station_year_stress_summary_by_year.csv")
write_csv(stress_trend_tests, "outputs/tables/station_year_stress_trend_tests.csv")
knitr::kable(
stress_year_summary,
digits = 2,
caption = "Annual summary of the provisional station-year stress index."
)
| year | n_station_years | median_stress_index | mean_stress_index | upper_quartile_stress |
|---|---|---|---|---|
| 2014 | 39 | -0.39 | -0.34 | -0.22 |
| 2015 | 39 | -0.62 | -0.53 | -0.43 |
| 2016 | 39 | -0.05 | 0.01 | 0.17 |
| 2017 | 39 | -0.16 | -0.04 | 0.10 |
| 2018 | 39 | -0.27 | -0.28 | -0.06 |
| 2019 | 41 | -0.14 | -0.05 | 0.26 |
| 2020 | 40 | -0.01 | 0.01 | 0.13 |
| 2021 | 43 | 0.03 | 0.17 | 0.39 |
| 2022 | 43 | -0.03 | 0.08 | 0.23 |
| 2023 | 43 | 0.22 | 0.26 | 0.46 |
| 2024 | 50 | 0.19 | 0.28 | 0.53 |
knitr::kable(
stress_trend_tests,
digits = 4,
caption = "Spearman trend tests for annual stress-index summaries."
)
| metric | n | spearman_rho | p_value |
|---|---|---|---|
| median_stress_index | 11 | 0.8909 | 2e-04 |
| mean_stress_index | 11 | 0.8636 | 6e-04 |
| upper_quartile_stress | 11 | 0.8727 | 5e-04 |
p_stress_year <- ggplot(station_year_stress, aes(x = factor(year), y = stress_index)) +
geom_boxplot(fill = "grey90", color = "grey30", outlier.alpha = 0.45) +
stat_summary(fun = mean, geom = "point", color = "#b2182b", size = 2) +
geom_vline(xintercept = which(sort(unique(station_year_stress$year)) == 2019), linetype = "dashed", alpha = 0.5) +
labs(
title = "Figure 11. Station-year stress index by year",
subtitle = "Higher values indicate higher chlorophyll/nitrogen/temperature and lower oxygen/variability/slope/depth",
x = "Year",
y = "Relative stress index"
) +
theme_minimal()
ggsave("figures/figure_11_station_year_stress_index.png", p_stress_year, width = 8.5, height = 4.8, dpi = 300)
knitr::include_graphics("figures/figure_11_station_year_stress_index.png")
stress_components_long <- station_year_stress |>
select(year, z_chla, z_low_do, z_n_yield, z_warm_sst, z_low_sst_variability, z_low_slope, z_shallow) |>
pivot_longer(cols = -year, names_to = "component", values_to = "z_value") |>
group_by(year, component) |>
summarise(median_component = median(z_value, na.rm = TRUE), .groups = "drop")
p_components <- ggplot(stress_components_long, aes(x = year, y = median_component, color = component)) +
geom_line(linewidth = 0.6) +
geom_point(size = 1.5) +
geom_vline(xintercept = 2019, linetype = "dashed", alpha = 0.5) +
labs(
title = "Figure 12. Median stress-index components by year",
subtitle = "Static components separate station groups; water-quality and SST components drive year-to-year change",
x = "Year",
y = "Median z-scored component",
color = "Component"
) +
theme_minimal()
ggsave("figures/figure_12_stress_components_by_year.png", p_components, width = 9, height = 5, dpi = 300)
knitr::include_graphics("figures/figure_12_stress_components_by_year.png")
Each station-year stress value is assigned to the nearest shellfish-closure polygon, and closure-year stress is summarized from nearby stations. The output gives an environmental stress summary around closure polygons while leaving scallop abundance and landings outside the model.
closures <- st_read("data/nysdec_shellfish_closures.geojson", quiet = TRUE) |>
mutate(closure_id = row_number())
station_geom <- st_read(
"outputs/peconic_station_bathymetry_nearest_nitrogen_covariates.gpkg",
quiet = TRUE
) |>
mutate(station_id = as.character(record_id)) |>
select(station_id)
station_stress_sf <- station_geom |>
left_join(station_year_stress, by = "station_id") |>
filter(!is.na(year), !is.na(stress_index))
closures_m <- st_transform(closures, 6347)
stations_m <- st_transform(station_stress_sf, 6347)
nearest_closure <- st_nearest_feature(stations_m, closures_m)
nearest_closure_dist_m <- as.numeric(st_distance(stations_m, closures_m[nearest_closure, ], by_element = TRUE))
station_closure_stress <- stations_m |>
mutate(
closure_id = closures_m$closure_id[nearest_closure],
nearest_closure_distance_m = nearest_closure_dist_m
)
closure_year_stress <- station_closure_stress |>
st_drop_geometry() |>
group_by(year, closure_id) |>
summarise(
n_stations = n(),
mean_stress_index = mean(stress_index, na.rm = TRUE),
max_stress_index = max(stress_index, na.rm = TRUE),
mean_nearest_station_distance_m = mean(nearest_closure_distance_m, na.rm = TRUE),
.groups = "drop"
)
closure_stress_series <- closure_year_stress |>
group_by(year) |>
summarise(
mean_closure_stress = mean(mean_stress_index, na.rm = TRUE),
upper_closure_stress = quantile(mean_stress_index, 0.75, na.rm = TRUE),
.groups = "drop"
)
closure_trend_tests <- bind_rows(
trend_test(closure_stress_series, "mean_closure_stress"),
trend_test(closure_stress_series, "upper_closure_stress")
)
write_csv(closure_year_stress, "outputs/tables/shellfish_closure_year_stress_index.csv")
write_csv(closure_stress_series, "outputs/tables/shellfish_closure_stress_series.csv")
write_csv(closure_trend_tests, "outputs/tables/shellfish_closure_stress_trend_tests.csv")
knitr::kable(
closure_trend_tests,
digits = 4,
caption = "Spearman trend tests for shellfish-closure stress summaries."
)
| metric | n | spearman_rho | p_value |
|---|---|---|---|
| mean_closure_stress | 11 | 0.9182 | 1e-04 |
| upper_closure_stress | 11 | 0.8909 | 2e-04 |
p_closure_hist <- ggplot(closure_year_stress, aes(x = factor(year), y = mean_stress_index)) +
geom_boxplot(fill = "grey90", color = "grey35", outlier.alpha = 0.45) +
stat_summary(fun = mean, geom = "point", color = "#b2182b", size = 2) +
labs(
title = "Figure 13. Shellfish-closure stress summaries by year",
subtitle = "Closure stress is approximated from nearest station-year stress values",
x = "Year",
y = "Mean closure stress index"
) +
theme_minimal()
ggsave("figures/figure_13_shellfish_closure_stress_by_year.png", p_closure_hist, width = 8.5, height = 4.8, dpi = 300)
knitr::include_graphics("figures/figure_13_shellfish_closure_stress_by_year.png")
p_closure_ts <- ggplot(closure_stress_series, aes(x = year, y = mean_closure_stress)) +
geom_line(linewidth = 0.6, color = "#2166ac") +
geom_point(size = 2, color = "#2166ac") +
geom_line(aes(y = upper_closure_stress), linewidth = 0.4, color = "#b2182b", linetype = "dashed") +
geom_vline(xintercept = 2019, linetype = "dashed", alpha = 0.5) +
labs(
title = "Figure 14. Mean shellfish-closure stress through time",
subtitle = "Red dashed series shows the upper quartile of closure stress values",
x = "Year",
y = "Stress index"
) +
theme_minimal()
ggsave("figures/figure_14_shellfish_closure_stress_timeseries.png", p_closure_ts, width = 8.5, height = 4.8, dpi = 300)
knitr::include_graphics("figures/figure_14_shellfish_closure_stress_timeseries.png")
The strongest result from the SST section is that 2019 was not clearly anomalous in basin-wide April–August mean SST or SST variability. However, the later station-year analysis gives a stronger pattern, where mean water-quality stress rises through time, and the closure-level summaries show the same broad increase. Earlier years, especially 2014–2015 and 2018, sit lower in the stress distribution, while 2021–2024 occupy the upper part of the index, with 2023–2024 especially high.
Although static nitrogen pressure and bathymetric setting really only separate stations spatially, year-to-year movement is carried more by water-quality and thermal covariates. Chlorophyll-a, dissolved oxygen, seasonal SST, and low SST variability do not all peak in the same year, yet their combined score shifts upward after 2019. This lines up with the failure of the fishery to recover: the collapse year itself is not uniquely explained by the broad MUR SST summaries, but the post-collapse period remains environmentally unfavorable in the combined station-year stack.
As per the closure plots, once station stress is summarized to the nearest shellfish-closure polygons, across the station-year output the main result is an increase in combined stress after the 2019 collapse year. The annual stress-index table moves from mostly negative values in 2014–2018 to mostly positive values after 2020, with the highest mean and upper-quartile values appearing in 2023–2024.
The trend tests also help to quantify this pattern visually, wherein the mean, median, and upper-quartile station-year stress summaries are all positively correlated with year. The closure summary shows the same directional structure when station stress is transferred to the nearest shellfish-closure polygons.
In practical terms, the recent Peconic record does not look like a short 2019-only anomaly; in fact, it seems more like a persistent worsening of the covariate stack after the collapse… (no wonder the scallops haven’t recovered yet.)
This opens up the possibility of the mas dying events being caused by the possible intrusion of a novel parasitoid (such as hte coccidian parasite discussed in the PEC report [4], although there still remains a dearth of evidence regarding any biotic explanation.
[1] National Oceanic and Atmospheric Administration, National
Geodetic Survey. 2023 NOAA NGS Topobathy Lidar DEM for Long Island
Sound, New York, product 10274. NOAA Digital Coast. Accessed 2026. Used
for high-resolution measured topobathymetry where coverage exists.
https://www.fisheries.noaa.gov/inport/item/74870
[2] National Centers for Environmental Information. Continuously
Updated Digital Elevation Model (CUDEM): Ninth arc-second resolution
bathymetric-topographic tiles. NOAA NCEI. Accessed 2026. Used as broader
bathymetric base coverage where the 2023 topobathymetric lidar product
did not return bottom elevations.
https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ngdc.mgg.dem:199919
[3] Peconic Bay Scallop Technical Review Committee. 2020.
Preliminary Findings and Recommendations of the Peconic Bay Scallop
Technical Review Committee. Peconic Estuary Partnership. Used for
the 2019 adult bay scallop die-off context, landings figure, and
discussion of temperature, dissolved oxygen, disease, and monitoring
needs.
https://www.peconicestuary.org/wp-content/uploads/2020/03/Preliminary-Findings-and-Recommendations-of-the-Peconic-Bay-Scallop-Technical-Review-Committee-3-5-20-2.pdf
[4] New York State Department of Environmental Conservation. 2025.
Bay Scallop Species Status Assessment. NYSDEC State Wildlife
Action Plan materials. Used for species-status context and linked
stressors including high summer water temperature, low dissolved oxygen,
and coccidian parasite infection.
https://extapps.dec.ny.gov/fs/programs/dfw/SWAP2025/Marine%20Invertebrates/Bay%20scallop.pdf
[5] Lloyd, S. 2014. Nitrogen Load Modeling to Forty-Three
Subwatersheds of the Peconic Estuary. The Nature Conservancy, in
partnership with the Peconic Estuary Program. Used as the source for the
Peconic Nitrogen Loading Model subwatershed nitrogen-yield layer.
https://www.peconicestuary.org/wp-content/uploads/2017/06/Nitrogenloadmodelingtoforty-thr.pdf
[6] Suffolk County Department of Health Services. 2025. Peconics SCDHS Water Quality Data, 1976–2024, draft updated 2025-04-10 and Station Locations, updated 2024-10-28. Local project files supplied through the SCDHS/Peconic data package. Used for station locations and station-level dissolved oxygen, chlorophyll-a, Secchi depth, temperature, salinity, and nutrient summaries.
[7] Jet Propulsion Laboratory MUR MEaSUREs Project. Multi-scale
Ultra-high Resolution Sea Surface Temperature Analysis, GHRSST Level 4,
version 4.1. Accessed through NOAA CoastWatch ERDDAP. Used for
April–August daily SST subsets and year-level thermal covariates.
https://coastwatch.pfeg.noaa.gov/erddap/griddap/jplMURSST41.html
[8] New York State Department of Environmental Conservation.
Shellfishing and public shellfish mapper resources. NYSDEC. Accessed
2026. Used as management geography for approximate closure-level stress
summaries.
https://dec.ny.gov/things-to-do/shellfishing
[9] Li, C., and O’Donnell, J. 1997. Tidally driven residual circulation in shallow estuaries with lateral depth variation. Journal of Geophysical Research: Oceans 102(C13):27915–27929. doi:10.1029/97JC02330.
[10] Cornell Cooperative Extension Suffolk County. Peconic bay
scallop restoration and monitoring program summaries. Accessed 2026.
Used for recent restoration and die-off context.
https://ccesuffolk.org/