knitr::opts_chunk$set(
  project_root <- normalizePath("~/Hunter/S26/GTECH-385/final-project", mustWork = TRUE),
  knitr::opts_knit$set(root.dir = project_root),
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.width = 9,
  fig.height = 5.5,
  dpi = 300
)

library(terra)
library(sf)
library(dplyr)
library(tidyr)
library(ggplot2)
library(readr)
library(readxl)
library(janitor)
library(stringr)

# Pulling from TIGER to provide shoreline for context in figures 
has_tigris <- requireNamespace("tigris", quietly = TRUE)
if (has_tigris) {
  options(tigris_use_cache = TRUE)
}
dir.create("figures", showWarnings = FALSE)
dir.create("outputs", showWarnings = FALSE)
dir.create("outputs/rasters", recursive = TRUE, showWarnings = FALSE)
dir.create("outputs/rasters/sst", recursive = TRUE, showWarnings = FALSE)
dir.create("outputs/tables", recursive = TRUE, showWarnings = FALSE)

1 Framing and data sources

The sudden recent collapse of the Peconic bay scallop industry has been the immediate inspiration behind this project. Despite several bay restoration efforts and seemingly healthy juvenile recruitment, over summer 2019 NYSDEC reported catastrophic adult bay scallop mortality, exceeding 90% mortality across the overall Peconic region, with some closures nearing a complete decimation of stock [3]. The Peconic Estuary Partnership convened a pretty intensive technical review committee after the 2019 adult die-off to assess possible drivers, and NYSDEC’s species assessment links the collapse to interacting stressors including high summer water temperatures, low dissolved oxygen, and a coccidian parasite — although the actual cause of the crash is still not really well understood [4].

Peconic Bay scallop landings context from the Peconic Bay report [3].

In an attempt to demystify this anomaly, I will be looking across a ~10 year timescale. This squarely centers the 2019 event while still allowing enough wiggle room for some greater ecological context. I’m using a 2014–2025 snapshot as the study window, which allows for a baseline prior to the crash, the actual scallop collapse interval between 2019–2022, and finally the current window from ~2024–2025. The end goal is to create a station-year covariate table that checks whether collapse-period years and nearby shellfish-management areas also carry stacked stress signatures: high modeled nitrogen pressure, high chlorophyll-a, low dissolved oxygen, shallow/low-slope bathymetry, and warmer or less variable SST. Unfortunately, despite bay scallop closures being readily available as a spatial dataset, per-closure landings are absent; as a result, I can only give an approximation of risk rather than a definite cause.

The main R libraries are kept close to the course workflow. terra handles raster reading, reprojection, masking, slope creation, and raster extraction; sf handles station points, subwatersheds, closure polygons, and distance operations; dplyr, tidyr, readr, readxl, janitor, and stringr handle table cleanup; ggplot2 creates the static figures. tigris is optional and only adds shoreline/water-context geometry to maps.

The working datasets are summarized below before moving into the actual raster/vector processing.

Dataset	Source	Role in workflow
NOAA 2023 NGS Long Island Sound Topobathymetric DEM, product 10274	NOAA coastal lidar DEM repository	High-resolution nearshore/topobathymetric depth where measured coverage exists [1].
NOAA/NCEI Gardiners Bay M030 30 m Bathymetric DEM	NOAA/NCEI regional bathymetric DEM	Broader Peconic/Gardiners bathymetric base where the 2023 topobathy has no measured cells [2].
Peconic Estuary Program boundary	Suffolk County / Peconic Estuary Program boundary layer	Initial clipping frame for the study area. It includes land, so water cells are later isolated from DEM elevation values below 0 m.
TIGER/Line Area Water polygons	U.S. Census TIGER/Line	Optional shoreline/water-context geometry for maps only.
Peconic nitrogen loading model	Stephen Lloyd’s 2014 NLM application for the Peconic Estuary, exposed through the NLM ArcGIS FeatureServer	Static land-based nitrogen-pressure layer for 43 modeled Peconic subwatersheds. Numeric fields include atmospheric deposition, wastewater treatment facilities, septic systems, agriculture, lawns, parks/golf, total load, and kg N per year per hectare [5].
SCDHS Peconic water-quality workbook	Suffolk County Department of Health Services	Station observations for dissolved oxygen, chlorophyll-a, Secchi depth, temperature, salinity, and nutrient species where available [6].
NOAA CoastWatch MUR SST	NOAA CoastWatch ERDDAP daily MUR SST	April–August daily surface-temperature fields from 2014–2025, later summarized into yearly and monthly SST covariates [7].
NYSDEC shellfish closure polygons	NYSDEC shellfish closure data	Management/context polygons for summarizing environmental stress near shellfish-relevant areas [8].

The bathymetry layer combines the two finished NOAA DEM products. The 2023 topobathymetric product was downloaded from the NOAA coastal lidar DEM repository as tiled GeoTIFFs through the product VRT. It supplies the finer nearshore detail where coverage exists [1]. The Gardiners Bay M030 DEM supplies broader bathymetric coverage for the Peconic/Gardiners system [2]. In R, both products are converted to positive water depth by retaining only cells below 0 m. The final depth layer keeps the 2023 topobathymetry wherever measured coverage exists, then fills remaining bathymetric cells from M030.

The nitrogen layer comes from Stephen Lloyd’s 2014 nitrogen loading model for the Peconic Estuary [5]. The model divides the estuary watershed into 43 subwatersheds and estimates nitrogen loading by source category, including atmospheric deposition, wastewater treatment facilities, septic systems, agriculture, lawns, parks, and golf courses. In this workflow, each water-quality station receives the modeled nitrogen yield of the nearest subwatershed. That assignment is used as a coarse shoreline-proximity pressure variable and kept alongside the station-to-subwatershed distance.

After the bathymetry and nitrogen setup, the workflow shifts to station-year summaries. SCDHS station observations are summarized over the warm season, then joined to fixed spatial covariates: extracted depth, extracted slope, bathymetric class, nearest modeled nitrogen yield, and distance to the nearest nitrogen-loading subwatershed. MUR SST is summarized separately by year and month to create April–August thermal exposure and variability metrics. These derived tables are then used for the final covariate comparisons and stress-index summaries.

Note that several preparation steps are done outside R for speed and file-size reasons (to prime datasets for actual ingestion and evaluation using the actual software). GDAL handles DEM clipping, raster conversion, and VRT-based DEM ingestion; curl and wget handle ERDDAP, FeatureServer, and NetCDF downloads. The relevant terminal commands are documented in sidecar repository files, especially the README, so that this R Markdown file can focus on the derived layers and analysis rather than re-downloading every source file during knitting.

I will only be providing one example of the intial GDAL workflow (External DEM preparation with GDAL). For a full step-by-step walk through of this process refer to the accompanying git repository for direct replication of methodology: https://github.com/gamfeld/GTECH385-peconic-estuary-health-analysis

2 Bathymetric indexing

2.1 Load the DEM products and boundary

paths <- list(
  dem_2023 = "data/dem/peconic_dem_10m_boundary_10274.tif",
  dem_m030 = "data/dem/gardiners_bay_m030_30m.nc",
  pep_boundary = "data/Peconic_Estuary_Program_Boundary.geojson"
)

dem23 <- rast(paths$dem_2023)
m030_raw <- rast(paths$dem_m030)
pep_boundary <- st_read(paths$pep_boundary, quiet = TRUE)

# Project the PEP boundary to the 2023 DEM CRS for masking and plotting.
pep_6347 <- st_transform(pep_boundary, crs(dem23))
pep_v <- vect(pep_6347)

dem23

## class       : SpatRaster 
## size        : 3807, 8603, 1  (nrow, ncol, nlyr)
## resolution  : 10, 10  (x, y)
## extent      : 678110, 764140, 4525640, 4563710  (xmin, xmax, ymin, ymax)
## coord. ref. : NAD83(2011) / UTM zone 18N (EPSG:6347) 
## source      : peconic_dem_10m_boundary_10274.tif 
## name        : peconic_dem_10m_boundary_10274 
## min value   :                      -13.31304 
## max value   :                       55.06969

m030_raw

## class       : SpatRaster 
## size        : 1010, 2064, 1  (nrow, ncol, nlyr)
## resolution  : 30, 30  (x, y)
## extent      : 698199.6, 760119.6, 4530077, 4560377  (xmin, xmax, ymin, ymax)
## coord. ref. : NAD83 / UTM zone 18N (EPSG:26918) 
## source      : gardiners_bay_m030_30m.nc 
## varname     : Band1 (GDAL Band Number 1) 
## name        : Band1

2.2 Optional shoreline / water-context layer

# TIGER area-water polygons are only used for visual context.
shoreline_ctx <- NULL

if (has_tigris) {
  shoreline_ctx <- tryCatch({
    suffolk_water <- tigris::area_water(state = "NY", county = "Suffolk", class = "sf")
    nassau_water  <- tigris::area_water(state = "NY", county = "Nassau",  class = "sf")

    bind_rows(suffolk_water, nassau_water) |>
      st_transform(st_crs(pep_6347)) |>
      st_crop(st_bbox(pep_6347))
  }, error = function(e) NULL)
}

if (is.null(shoreline_ctx)) {
  message("No TIGER area-water shoreline context loaded; using PEP boundary only.")
}

2.3 Inspect the raw DEM ranges

dem_ranges <- tibble(
  raster = c("2023 topobathymetric DEM", "M030 Gardiners Bay DEM"),
  min_elevation_m = c(
    as.numeric(global(dem23, min, na.rm = TRUE)[1, 1]),
    as.numeric(global(m030_raw, min, na.rm = TRUE)[1, 1])
  ),
  max_elevation_m = c(
    as.numeric(global(dem23, max, na.rm = TRUE)[1, 1]),
    as.numeric(global(m030_raw, max, na.rm = TRUE)[1, 1])
  )
)

knitr::kable(dem_ranges, digits = 2)

raster	min_elevation_m	max_elevation_m
2023 topobathymetric DEM	-13.31	55.07
M030 Gardiners Bay DEM	-29.60	0.87

2.4 Create the combined depth layer

# Mask the 2023 product to the PEP boundary. It is already clipped, but this keeps the step explicit.
dem23_clip <- mask(crop(dem23, pep_v), pep_v)

# Project M030 onto the 2023 10 m working grid, then mask to the same PEP boundary.
m030_10m <- project(m030_raw, dem23_clip, method = "bilinear")
m030_clip <- mask(crop(m030_10m, pep_v), pep_v)

# Convert measured below-zero elevations to positive water depth.
# Cells >= 0 are land/upland or above vertical datum and are removed from bathymetric analysis.
depth23 <- abs(ifel(dem23_clip < 0, dem23_clip, NA))
depth_m030 <- abs(ifel(m030_clip < 0, m030_clip, NA))

names(depth23) <- "depth_2023_m"
names(depth_m030) <- "depth_m030_m"

# Keep 2023 topobathy wherever it exists; fill remaining bathymetric cells from M030.
depth_final <- cover(depth23, depth_m030)
names(depth_final) <- "depth_m"

# Track which DEM supplied each cell.
depth_source <- ifel(!is.na(depth23), 1, ifel(!is.na(depth_m030), 2, NA))
names(depth_source) <- "depth_source"

writeRaster(depth23, "outputs/rasters/depth_2023_topobathy_measured_10m.tif", overwrite = TRUE)
writeRaster(depth_m030, "outputs/rasters/depth_m030_gardiners_resampled_10m.tif", overwrite = TRUE)
writeRaster(depth_final, "outputs/rasters/peconic_depth_10m_2023_with_m030_base.tif", overwrite = TRUE)
writeRaster(depth_source, "outputs/rasters/peconic_depth_source_2023_vs_m030.tif", overwrite = TRUE)

2.5 Depth statistics

depth_stats <- tibble(
  raster = c("2023 measured topobathy", "M030 Gardiners Bay base", "Combined depth layer"),
  min_depth_m = c(
    as.numeric(global(depth23, min, na.rm = TRUE)[1, 1]),
    as.numeric(global(depth_m030, min, na.rm = TRUE)[1, 1]),
    as.numeric(global(depth_final, min, na.rm = TRUE)[1, 1])
  ),
  max_depth_m = c(
    as.numeric(global(depth23, max, na.rm = TRUE)[1, 1]),
    as.numeric(global(depth_m030, max, na.rm = TRUE)[1, 1]),
    as.numeric(global(depth_final, max, na.rm = TRUE)[1, 1])
  ),
  mean_depth_m = c(
    as.numeric(global(depth23, mean, na.rm = TRUE)[1, 1]),
    as.numeric(global(depth_m030, mean, na.rm = TRUE)[1, 1]),
    as.numeric(global(depth_final, mean, na.rm = TRUE)[1, 1])
  ),
  valid_cells = c(
    as.numeric(global(!is.na(depth23), sum, na.rm = TRUE)[1, 1]),
    as.numeric(global(!is.na(depth_m030), sum, na.rm = TRUE)[1, 1]),
    as.numeric(global(!is.na(depth_final), sum, na.rm = TRUE)[1, 1])
  )
)

write_csv(depth_stats, "outputs/tables/depth_stats_2023_m030_combined.csv")
knitr::kable(depth_stats, digits = 2)

raster	max_depth_m	mean_depth_m	valid_cells
2023 measured topobathy	13.31	4.48	3268954
M030 Gardiners Bay base	29.54	7.31	4838533
Combined depth layer	29.54	7.17	5466411

source_key <- tibble(
  depth_source = c(1, 2),
  source_label = c("2023 NOAA topobathy", "M030 Gardiners Bay 30 m bathymetry")
)
write_csv(source_key, "outputs/tables/depth_source_key_2023_m030.csv")
knitr::kable(source_key)

depth_source	source_label
1	2023 NOAA topobathy
2	M030 Gardiners Bay 30 m bathymetry

3 Depth and source visual checks

sample_raster_df <- function(r, value_name, size = 150000, method = "regular") {
  df <- terra::spatSample(
    r,
    size = size,
    method = method,
    as.df = TRUE,
    xy = TRUE,
    na.rm = TRUE
  ) |>
    as_tibble(.name_repair = "minimal")

  df <- df[, 1:3]
  names(df) <- c("x", "y", value_name)
  df
}

mode_numeric <- function(x, ...) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

make_source_outline <- function(source_raster, fact = 20) {
  src_lowres <- terra::aggregate(source_raster, fact = fact, fun = mode_numeric, na.rm = TRUE)
  src_poly <- as.polygons(src_lowres, dissolve = TRUE, na.rm = TRUE)
  st_as_sf(src_poly) |>
    mutate(
      depth_source = factor(
        depth_source,
        levels = c(1, 2),
        labels = c("2023 topobathy", "M030 Gardiners Bay")
      )
    ) |>
    filter(!is.na(depth_source))
}

3.1 Combined depth map

depth_vals <- values(depth_final, na.rm = TRUE)
depth_98 <- as.numeric(quantile(depth_vals, 0.98, na.rm = TRUE))
depth_plot <- clamp(depth_final, 0, depth_98, values = TRUE)

depth_df <- sample_raster_df(depth_plot, "depth_m", size = 180000)
source_outline <- make_source_outline(depth_source, fact = 25)

p_depth <- ggplot() +
  geom_raster(data = depth_df, aes(x = x, y = y, fill = depth_m)) +
  {if (!is.null(shoreline_ctx)) geom_sf(data = shoreline_ctx, fill = NA, color = "grey35", linewidth = 0.18, inherit.aes = FALSE)} +
  geom_sf(data = pep_6347, fill = NA, color = "black", linewidth = 0.35, inherit.aes = FALSE) +
  geom_sf(data = source_outline, aes(color = depth_source), fill = NA, linewidth = 0.45, inherit.aes = FALSE) +
  coord_sf(crs = st_crs(pep_6347), expand = FALSE) +
  scale_fill_gradient(low = "#d9f0ff", high = "#08306b", name = "Depth (m)", limits = c(0, depth_98)) +
  scale_color_manual(
    values = c("2023 topobathy" = "steelblue4", "M030 Gardiners Bay" = "orange3"),
    name = "Depth source outline"
  ) +
  labs(
    title = "Figure 1. Combined Peconic bathymetry layer",
    subtitle = paste0("2023 topobathy retained where available; M030 fills broader bathymetry. Ramp capped at 98th percentile = ", round(depth_98, 2), " m."),
    x = NULL,
    y = NULL,
    caption = "Blue/orange outlines mark the approximate source footprint used in the splice. TIGER area-water boundaries, when available, are drawn only as visual shoreline context."
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

print(p_depth)

ggsave("figures/figure_01_combined_peconic_bathymetry.png", p_depth, width = 9, height = 5.5, dpi = 300)

3.2 Depth distribution

depth_hist <- tibble(depth_m = depth_vals)

p_depth_hist <- ggplot(depth_hist, aes(x = depth_m)) +
  geom_histogram(bins = 70, fill = "steelblue4", color = "white") +
  labs(
    title = "Figure 2. Combined depth distribution",
    subtitle = "Distribution after retaining 2023 topobathymetry and filling remaining bathymetry from M030",
    x = "Depth (m)",
    y = "Raster cell count"
  ) +
  theme_minimal()

print(p_depth_hist)

ggsave("figures/figure_02_combined_depth_distribution.png", p_depth_hist, width = 7.5, height = 4.5, dpi = 300)

4 Slope and bathymetric proxy classes

The final depth layer is used to derive slope, then slope is split into relative classes. I use relative categories here because the aim is to separate broad bathymetric regimes inside this estuary, rather than impose a universal slope threshold. Low-slope cells approximate flats and broad shallow shelves. High-slope cells identify sharper gradients, channel edges, and places where bathymetry may be more relevant to exchange or mixing proxies. These classes will later become the fixed spatial bins for chlorophyll-a, SST-change, dissolved oxygen, nitrogen, or eelgrass comparisons.

4.1 Calculate bathymetric slope

slope <- terrain(depth_final, v = "slope", unit = "degrees")
names(slope) <- "slope_deg"

writeRaster(slope, "outputs/rasters/peconic_slope_deg_10m.tif", overwrite = TRUE)

slope_vals <- values(slope, na.rm = TRUE)
slope_99 <- as.numeric(quantile(slope_vals, 0.99, na.rm = TRUE))
slope_plot <- clamp(slope, 0, slope_99, values = TRUE)
slope_df <- sample_raster_df(slope_plot, "slope_deg", size = 180000)

p_slope <- ggplot() +
  geom_raster(data = slope_df, aes(x = x, y = y, fill = slope_deg)) +
  {if (!is.null(shoreline_ctx)) geom_sf(data = shoreline_ctx, fill = NA, color = "grey35", linewidth = 0.18, inherit.aes = FALSE)} +
  geom_sf(data = pep_6347, fill = NA, color = "black", linewidth = 0.35, inherit.aes = FALSE) +
  coord_sf(crs = st_crs(pep_6347), expand = FALSE) +
  scale_fill_gradient(low = "#f7fbff", high = "#08519c", name = "Slope (°)", limits = c(0, slope_99)) +
  labs(
    title = "Figure 3. Bathymetric slope from combined depth layer",
    subtitle = paste0("Slope ramp capped at the 99th percentile = ", round(slope_99, 2), "° for display."),
    x = NULL,
    y = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

print(p_slope)

ggsave("figures/figure_03_bathymetric_slope.png", p_slope, width = 9, height = 5.5, dpi = 300)

4.2 Define depth and slope regimes

# Coarse depth classes used as physical proxy bins.
depth_class <- classify(
  depth_final,
  rcl = matrix(
    c(
      0, 3, 1,
      3, 8, 2,
      8, Inf, 3
    ),
    ncol = 3,
    byrow = TRUE
  )
)
names(depth_class) <- "depth_class"

depth_key <- tibble(
  depth_class = c(1, 2, 3),
  depth_label = c("shallow_0_3m", "mid_3_8m", "deep_gt_8m")
)

# Relative slope classes: terciles of the observed slope distribution.
slope_q <- quantile(slope_vals, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE)

slope_class <- classify(
  slope,
  rcl = matrix(
    c(
      slope_q[1], slope_q[2], 1,
      slope_q[2], slope_q[3], 2,
      slope_q[3], slope_q[4], 3
    ),
    ncol = 3,
    byrow = TRUE
  )
)
names(slope_class) <- "slope_class"

slope_key <- tibble(
  slope_class = c(1, 2, 3),
  slope_label = c("low_slope", "moderate_slope", "high_slope"),
  lower_deg = as.numeric(slope_q[1:3]),
  upper_deg = as.numeric(slope_q[2:4])
)

# Cross depth class with slope class to make combined bathymetric regimes.
bathy_regime <- depth_class * 10 + slope_class
names(bathy_regime) <- "bathy_regime"

bathy_key <- expand_grid(
  depth_class = c(1, 2, 3),
  slope_class = c(1, 2, 3)
) |>
  left_join(depth_key, by = "depth_class") |>
  left_join(slope_key |> select(slope_class, slope_label), by = "slope_class") |>
  mutate(
    bathy_regime = depth_class * 10 + slope_class,
    bathy_label = paste(depth_label, slope_label, sep = " / ")
  )

writeRaster(depth_class, "outputs/rasters/peconic_depth_class_10m.tif", overwrite = TRUE)
writeRaster(slope_class, "outputs/rasters/peconic_slope_class_10m.tif", overwrite = TRUE)
writeRaster(bathy_regime, "outputs/rasters/peconic_bathymetric_regime_10m.tif", overwrite = TRUE)
write_csv(depth_key, "outputs/tables/depth_class_key.csv")
write_csv(slope_key, "outputs/tables/slope_class_key.csv")
write_csv(bathy_key, "outputs/tables/bathymetric_regime_key.csv")

knitr::kable(slope_key, digits = 4)

slope_class	slope_label	lower_deg	upper_deg
1	low_slope	0.0000	0.1691
2	moderate_slope	0.1691	0.4715
3	high_slope	0.4715	41.8144

knitr::kable(bathy_key)

depth_class	slope_class	depth_label	slope_label	bathy_regime	bathy_label
1	1	shallow_0_3m	low_slope	11	shallow_0_3m / low_slope
1	2	shallow_0_3m	moderate_slope	12	shallow_0_3m / moderate_slope
1	3	shallow_0_3m	high_slope	13	shallow_0_3m / high_slope
2	1	mid_3_8m	low_slope	21	mid_3_8m / low_slope
2	2	mid_3_8m	moderate_slope	22	mid_3_8m / moderate_slope
2	3	mid_3_8m	high_slope	23	mid_3_8m / high_slope
3	1	deep_gt_8m	low_slope	31	deep_gt_8m / low_slope
3	2	deep_gt_8m	moderate_slope	32	deep_gt_8m / moderate_slope
3	3	deep_gt_8m	high_slope	33	deep_gt_8m / high_slope

4.3 Bathymetric regime map

bathy_df <- sample_raster_df(bathy_regime, "bathy_regime", size = 180000) |>
  mutate(bathy_regime = as.numeric(bathy_regime)) |>
  left_join(bathy_key, by = "bathy_regime")

regime_cols <- c(
  "#deebf7", "#c6dbef", "#9ecae1",
  "#6baed6", "#4292c6", "#2171b5",
  "#08519c", "#08306b", "#041f4a"
)
names(regime_cols) <- unique(bathy_key$bathy_label)

p_regime <- ggplot() +
  geom_raster(data = bathy_df, aes(x = x, y = y, fill = bathy_label)) +
  {if (!is.null(shoreline_ctx)) geom_sf(data = shoreline_ctx, fill = NA, color = "grey35", linewidth = 0.18, inherit.aes = FALSE)} +
  geom_sf(data = pep_6347, fill = NA, color = "black", linewidth = 0.35, inherit.aes = FALSE) +
  coord_sf(crs = st_crs(pep_6347), expand = FALSE) +
  scale_fill_manual(values = regime_cols, name = "Bathymetric regime") +
  labs(
    title = "Figure 4. Bathymetric proxy regimes",
    subtitle = "Depth classes crossed with relative slope classes; used later as fixed spatial bins for water-quality proxies.",
    x = NULL,
    y = NULL
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.text = element_text(size = 7)
  )

print(p_regime)

ggsave("figures/figure_04_bathymetric_proxy_regimes.png", p_regime, width = 9, height = 5.5, dpi = 300)

5 Nitrogen loading and station covariates

The next spatial layer is the Peconic Estuary nitrogen loading model. I treat it as a static watershed-pressure layer over the 2014–2024 analysis window. Land use, wastewater inputs, and fertilizer sources can change through time. Here, the layer provides a subwatershed-scale gradient of modeled land-based nitrogen pressure that can be attached to monitoring stations.

The Peconic nitrogen loading model contains 43 subwatersheds and reports source categories including atmospheric deposition, wastewater treatment facilities, septic systems, agriculture, lawns, and parks/golf courses. The key field is kg_N_per_year_per_Ha, a yield normalized by subwatershed area. Total is retained as the absolute modeled load.

Most SCDHS stations sit in open water, while the NLM layer is mapped as terrestrial and shoreline-adjacent drainage polygons. Each station therefore inherits the nitrogen yield of the nearest modeled subwatershed, and the station-to-subwatershed distance is retained. This makes nitrogen loading a discrete shoreline-proximity pressure covariate, with some error expected around embayment geometry and tidal exchange. For visual context, I also extend the subwatershed influence zones across the estuary with a nearest-polygon/Voronoi-style partition clipped to the Peconic boundary.

The station layer becomes the limiting geometry from here forward. Each SCDHS station is assigned fixed spatial covariates: bathymetric depth, slope/regime, nearest nitrogen yield, and nearest-subwatershed distance. Later DO, chlorophyll-a, Secchi, temperature, salinity, and nutrient observations can be summarized by station-year and joined to this table.

5.1 Nitrogen loading data download

The ArcGIS web-map metadata exposes the usable feature layer endpoint below. The shell chunk is kept for reproducibility, but the R workflow uses the local GeoPackage if it exists.

mkdir -p data/nitrogen

curl -L -G \
  "https://services.arcgis.com/JsDD4qdG5r2a7hR5/arcgis/rest/services/NLM/FeatureServer/0/query" \
  --data-urlencode "where=1=1" \
  --data-urlencode "outFields=*" \
  --data-urlencode "returnGeometry=true" \
  --data-urlencode "f=geojson" \
  -o data/nitrogen/peconic_nlm_subwatersheds.geojson

ogrinfo -so data/nitrogen/peconic_nlm_subwatersheds.geojson

ogr2ogr \
  -f GPKG \
  data/nitrogen/peconic_nlm_subwatersheds.gpkg \
  data/nitrogen/peconic_nlm_subwatersheds.geojson

5.2 Load and map nitrogen yield

nlm_path <- "data/nitrogen/peconic_nlm_subwatersheds.gpkg"
nlm_geojson_path <- "data/nitrogen/peconic_nlm_subwatersheds.geojson"
nlm_url <- paste0(
  "https://services.arcgis.com/JsDD4qdG5r2a7hR5/arcgis/rest/services/",
  "NLM/FeatureServer/0/query?",
  "where=1%3D1&outFields=*&returnGeometry=true&f=geojson"
)

if (file.exists(nlm_path)) {
  nlm <- st_read(nlm_path, quiet = TRUE)
} else if (file.exists(nlm_geojson_path)) {
  nlm <- st_read(nlm_geojson_path, quiet = TRUE)
} else {
  nlm <- st_read(nlm_url, quiet = TRUE)
}

nlm <- nlm |>
  st_make_valid() |>
  st_transform(st_crs(pep_6347))

# Save a stable local copy for later runs.
st_write(nlm, "data/nitrogen/peconic_nlm_subwatersheds.gpkg", delete_dsn = TRUE, quiet = TRUE)

# Create a visual-only nearest-subwatershed influence surface across the Peconic boundary.
# This shows which terrestrial NLM polygon is closest to each part of the estuary frame.
nlm_pts <- st_point_on_surface(nlm)
nlm_voronoi_geom <- st_voronoi(st_union(st_geometry(nlm_pts)), envelope = st_as_sfc(st_bbox(pep_6347))) |>
  st_collection_extract("POLYGON")
nlm_voronoi <- st_as_sf(nlm_voronoi_geom) |>
  st_set_crs(st_crs(pep_6347)) |>
  st_make_valid() |>
  st_intersection(pep_6347)

nlm_voronoi_idx <- st_nearest_feature(st_centroid(nlm_voronoi), nlm_pts)
nlm_voronoi <- bind_cols(
  nlm_voronoi,
  nlm |>
    st_drop_geometry() |>
    select(subwatershed_code, kg_N_per_year_per_Ha) |>
    slice(nlm_voronoi_idx)
)

nlm_summary <- nlm |>
  st_drop_geometry() |>
  select(
    subwatershed_code,
    atmospheric,
    wwtf,
    septic,
    agriculture,
    lawns,
    parks_golf,
    Total,
    kg_N_per_year_per_Ha,
    PercentofTotal
  ) |>
  arrange(desc(kg_N_per_year_per_Ha))

write_csv(nlm_summary, "outputs/tables/nitrogen_loading_subwatershed_summary.csv")
knitr::kable(head(nlm_summary, 12), digits = 2)

subwatershed_code	atmospheric	wwtf	septic	agriculture	lawns	parks_golf	Total	kg_N_per_year_per_Ha	PercentofTotal
NF0	2191.16	0.00	5778.99	10199.53	620.17	841.71	19631.57	13.47	7.35
SF8	186.41	0.00	1138.94	0.00	131.21	0.00	1456.56	12.08	0.55
NF15	1669.06	0.00	4409.92	4911.54	470.05	736.34	12196.92	10.83	4.57
NF14	1523.93	0.00	4345.45	4566.14	460.60	2.87	10898.98	10.80	4.08
NF6	278.92	0.00	957.58	623.64	117.41	0.00	1977.56	10.51	0.74
NF12	96.46	0.00	526.22	0.00	51.62	0.00	674.29	10.49	0.25
NF8	1703.26	0.00	6009.13	2292.92	652.48	0.00	10657.79	9.75	3.99
NF13	212.35	0.00	1060.87	0.00	109.53	0.00	1382.75	9.42	0.52
NF11	1075.23	0.00	1981.35	3443.52	210.01	0.00	6710.10	9.29	2.51
FB1	9396.97	15118.66	11746.76	12617.40	1670.21	2362.59	52912.59	8.75	19.81
NF9	402.09	0.00	1807.75	0.00	199.37	0.00	2409.22	8.72	0.90
NF5	843.54	0.00	336.89	4056.35	49.65	0.00	5286.43	8.09	1.98

p_nlm <- ggplot() +
  geom_sf(data = nlm_voronoi, aes(fill = kg_N_per_year_per_Ha), color = NA, alpha = 0.22) +
  geom_sf(data = nlm, aes(fill = kg_N_per_year_per_Ha), color = "white", linewidth = 0.2) +
  {if (!is.null(shoreline_ctx)) geom_sf(data = shoreline_ctx, fill = NA, color = "grey35", linewidth = 0.15, inherit.aes = FALSE)} +
  geom_sf(data = pep_6347, fill = NA, color = "black", linewidth = 0.35) +
  coord_sf(crs = st_crs(pep_6347), expand = FALSE) +
  scale_fill_gradient(low = "#e5f5e0", high = "#006d2c", name = "kg N / yr / ha") +
  labs(
    title = "Figure 5. Peconic nitrogen yield by subwatershed",
    subtitle = "Solid polygons are NLM subwatersheds; transparent fill extends nearest-subwatershed influence across the estuary frame",
    x = NULL,
    y = NULL,
    caption = "The transparent bay-area partition shows the nearest NLM polygon across the estuary frame."
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

print(p_nlm)

ggsave("figures/figure_05_nitrogen_yield_subwatersheds.png", p_nlm, width = 9, height = 5.5, dpi = 300)

5.3 Station locations as the next limiting geometry

stations_path <- "data/water_quality/scdhs_peconic/Station locations - updated 102824.csv"

stations_raw <- read_csv(
  stations_path,
  show_col_types = FALSE,
  locale = locale(encoding = "Windows-1252")
) |>
  clean_names()

stations_sf <- stations_raw |>
  filter(!is.na(latitude), !is.na(longitude)) |>
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326, remove = FALSE) |>
  st_transform(st_crs(pep_6347))

# Keep stations inside the Peconic boundary. This avoids carrying unrelated county stations forward.
peconic_stations <- stations_sf[st_intersects(stations_sf, pep_6347, sparse = FALSE)[, 1], ]

nrow(peconic_stations)

## [1] 334

5.4 Add bathymetry and nearest-subwatershed nitrogen covariates to stations

# Bathymetric covariates come from the fixed raster products created above.
station_bathy <- terra::extract(
  c(depth_final, slope, depth_class, slope_class, bathy_regime),
  vect(peconic_stations)
) |>
  as_tibble() |>
  select(-ID)

peconic_station_covariates <- bind_cols(
  peconic_stations,
  station_bathy
) |>
  left_join(depth_key, by = "depth_class") |>
  left_join(slope_key |> select(slope_class, slope_label), by = "slope_class") |>
  left_join(bathy_key |> select(bathy_regime, bathy_label), by = "bathy_regime")

# NLM polygons are terrestrial/subwatershed features.
# Marine station points inherit the nearest NLM polygon as a shoreline-pressure covariate.
nlm_for_join <- nlm |>
  select(
    subwatershed_code,
    subwatershed,
    atmospheric,
    wwtf,
    septic,
    agriculture,
    lawns,
    parks_golf,
    Total,
    kg_N_per_year_per_Ha,
    PercentofTotal
  )

nearest_nlm_idx <- st_nearest_feature(peconic_station_covariates, nlm_for_join)
nearest_nlm_attrs <- nlm_for_join[nearest_nlm_idx, ] |> st_drop_geometry()
nearest_nlm_distance_m <- as.numeric(
  st_distance(
    peconic_station_covariates,
    nlm_for_join[nearest_nlm_idx, ],
    by_element = TRUE
  )
)

station_nlm <- bind_cols(
  peconic_station_covariates,
  nearest_nlm_attrs,
  nearest_nlm_distance_m = nearest_nlm_distance_m
)

station_nlm_table <- station_nlm |> st_drop_geometry()

write_csv(station_nlm_table, "outputs/tables/peconic_station_bathymetry_nearest_nitrogen_covariates.csv")
st_write(station_nlm, "outputs/peconic_station_bathymetry_nearest_nitrogen_covariates.gpkg", delete_dsn = TRUE, quiet = TRUE)

knitr::kable(
  station_nlm_table |>
    select(any_of(c(
      "record_id", "record_name", "water_body_area", "water_body",
      "depth_m", "slope_deg", "depth_label", "slope_label", "bathy_label",
      "subwatershed_code", "kg_N_per_year_per_Ha", "nearest_nlm_distance_m",
      "Total", "septic", "agriculture", "wwtf"
    ))) |>
    head(12),
  digits = 2
)

record_id	record_name	water_body_area	water_body	depth_m	slope_deg	depth_label	slope_label	bathy_label	subwatershed_code	kg_N_per_year_per_Ha	nearest_nlm_distance_m	Total	septic	agriculture	wwtf
SITE-060069	Peconic Estuary	Peconic Estuary	Great Peconic Bay	2.48	0.97	shallow_0_3m	high_slope	shallow_0_3m / high_slope	RI	1.49	143.67	276.46	19.70	0.00	0.00
SITE-060081	Peconic Estuary	Peconic Estuary	Hog Creek	1.78	0.92	shallow_0_3m	high_slope	shallow_0_3m / high_slope	NF12	10.49	131.26	674.29	526.22	0.00	0.00
SITE-060100	Peconic Estuary	Peconic Estuary	Great Peconic Bay	1.01	5.75	shallow_0_3m	high_slope	shallow_0_3m / high_slope	NF14	10.80	531.41	10898.98	4345.45	4566.14	0.00
SITE-060101	Peconic Estuary	Peconic Estuary	East Creek	1.97	1.62	shallow_0_3m	high_slope	shallow_0_3m / high_slope	NF0	13.47	15.81	19631.57	5778.99	10199.53	0.00
SITE-060102	Peconic Estuary	Peconic Estuary	Cutchogue Harbor	4.09	0.04	mid_3_8m	low_slope	mid_3_8m / low_slope	NF13	9.42	332.81	1382.75	1060.87	0.00	0.00
SITE-060103	Peconic Estuary	Peconic Estuary	East Creek	3.01	2.24	mid_3_8m	high_slope	mid_3_8m / high_slope	NF14	10.80	27.23	10898.98	4345.45	4566.14	0.00
SITE-060104	Peconic Estuary	Peconic Estuary	North Sea Harbor	3.74	4.20	mid_3_8m	high_slope	mid_3_8m / high_slope	SF2	5.40	14.53	15510.50	9004.15	796.36	32.21
SITE-060105	Peconic Estuary	Peconic Estuary	NA	4.40	0.79	mid_3_8m	high_slope	mid_3_8m / high_slope	NF11	9.29	563.67	6710.10	1981.35	3443.52	0.00
SITE-060106	Peconic Estuary	Peconic Estuary	Goose Creek	2.66	5.30	shallow_0_3m	high_slope	shallow_0_3m / high_slope	NF9	8.72	19.51	2409.22	1807.75	0.00	0.00
SITE-060107	Peconic Estuary	Peconic Estuary	Town Creek	3.14	1.94	mid_3_8m	high_slope	mid_3_8m / high_slope	NF8	9.75	70.66	10657.79	6009.13	2292.92	0.00
SITE-060108	Peconic Estuary	Peconic Estuary	Southold Bay	5.70	0.46	mid_3_8m	moderate_slope	mid_3_8m / moderate_slope	NF9	8.72	645.00	2409.22	1807.75	0.00	0.00
SITE-060109	Peconic Estuary	Peconic Estuary	Hashamomuck Pond	1.24	4.90	shallow_0_3m	high_slope	shallow_0_3m / high_slope	NF8	9.75	0.00	10657.79	6009.13	2292.92	0.00

5.5 Station covariate map

This map checks the station assignment step. The polygons show modeled nitrogen yield, while station colors show the local bathymetric depth class extracted from the DEM. The table written above preserves the nearest-subwatershed distance so this proxy can be filtered later if a station is too far from a modeled drainage polygon.

p_station_cov <- ggplot() +
  geom_sf(data = nlm_voronoi, aes(fill = kg_N_per_year_per_Ha), color = NA, alpha = 0.18) +
  geom_sf(data = nlm, aes(fill = kg_N_per_year_per_Ha), color = "white", linewidth = 0.15, alpha = 0.85) +
  {if (!is.null(shoreline_ctx)) geom_sf(data = shoreline_ctx, fill = NA, color = "grey35", linewidth = 0.15, inherit.aes = FALSE)} +
  geom_sf(data = pep_6347, fill = NA, color = "black", linewidth = 0.35) +
  geom_sf(data = station_nlm, aes(color = depth_label), size = 1.6, alpha = 0.85) +
  coord_sf(crs = st_crs(pep_6347), expand = FALSE) +
  scale_fill_gradient(low = "#e5f5e0", high = "#006d2c", name = "kg N / yr / ha") +
  scale_color_manual(
    values = c(
      "shallow_0_3m" = "#d9f0ff",
      "mid_3_8m" = "#3182bd",
      "deep_gt_8m" = "#08306b"
    ),
    name = "Station depth class",
    na.value = "grey30"
  ) +
  labs(
    title = "Figure 6. Station covariates: nitrogen yield and bathymetric class",
    subtitle = "SCDHS stations are assigned DEM depth/slope classes and the nearest NLM subwatershed yield",
    x = NULL,
    y = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

print(p_station_cov)

ggsave("figures/figure_06_station_covariates_nitrogen_depth.png", p_station_cov, width = 9, height = 5.5, dpi = 300)

The station covariate table preserves the spatial assumptions used here: DEM-derived bathymetric class, nearest NLM subwatershed yield, and the distance from each station to that assigned subwatershed. Water-quality observations can then be summarized by station-year and joined to these fixed covariates for the 2014–2024 comparison window.

6 MUR SST summer-temperature diagnostics

MUR SST is added here as a coarse thermal-variability layer rather than as a detailed circulation model. The files were curled from the NOAA CoastWatch ERDDAP MUR daily SST endpoint, clipped at download time to a broader Peconic bounding box, and downloaded daily from April through August for 2014–2025. I use these rasters to create simple year-level temperature covariates: monthly mean SST, cell-level SST standard deviation, and a seasonal surface-temperature variability map. Cells with stronger April–August variability may reflect more rapid heating/cooling, exchange, wind mixing, or overturn-type behavior, while lower variability may indicate more stable surface-water conditions.

For the final analysis, SST is treated as a supporting thermal-context layer rather than direct evidence of upwelling. The main outputs retained from this block are a compact year-level SST covariate table, a year-level SST trend figure, station-level SST extraction, and SST summaries by DEM-derived depth and slope class. I drop the heavier spatial SST map here because it was not doing much additional work for the argument, and because the more useful unit for the later analysis is the monitoring station-year.

6.1 Read MUR SST files

sst_dir <- "data/sst/mur_daily_apr_aug"

dir.create("outputs/tables", recursive = TRUE, showWarnings = FALSE)
dir.create("outputs/rasters/sst", recursive = TRUE, showWarnings = FALSE)
dir.create("figures", recursive = TRUE, showWarnings = FALSE)

sst_files <- list.files(
  sst_dir,
  pattern = "^mur_sst_peconic_\\d{8}\\.nc$",
  full.names = TRUE
)

sst_index <- tibble(file = sst_files) |>
  mutate(
    date = as.Date(str_extract(basename(file), "\\d{8}"), format = "%Y%m%d"),
    year = as.integer(format(date, "%Y")),
    month_num = as.integer(format(date, "%m")),
    month = factor(month.abb[month_num], levels = month.abb[4:8])
  ) |>
  filter(month_num %in% 4:8) |>
  arrange(date)

sst_file_counts <- sst_index |>
  count(year, month) |>
  tidyr::pivot_wider(names_from = month, values_from = n)

knitr::kable(
  sst_file_counts,
  caption = "MUR SST file counts by year and month. Files are daily from April through August."
)

MUR SST file counts by year and month. Files are daily from April through August.
year	Apr	May	Jun	Jul	Aug
2014	30	31	30	31	31
2015	30	31	30	31	31
2016	30	31	30	31	31
2017	30	31	30	31	31
2018	30	31	30	31	31
2019	30	31	30	31	31
2020	30	31	30	31	31
2021	30	31	30	31	31
2022	30	31	30	31	31
2023	30	31	30	31	31
2024	30	31	30	31	31
2025	30	31	30	31	31

read_sst_raster_c <- function(f) {
  r <- rast(f)[[1]]
  rng <- global(r, range, na.rm = TRUE)

  if (rng[1, 2] > 100) {
    r <- r - 273.15
  }

  names(r) <- "sst_c"
  r
}

read_daily_sst_mean <- function(f) {
  as.numeric(global(read_sst_raster_c(f), mean, na.rm = TRUE)[1, 1])
}

6.2 Year-level SST covariates

Each daily raster is first collapsed to a Peconic-box mean temperature. Those daily means are then averaged by year and month. In the same loop, daily rasters are stacked by year to calculate April–August cell-level mean SST, SST standard deviation, and SST range. The year table is the important output here: it lets me compare whether 2019 was unusual in either mean SST or SST variability.

sst_daily <- sst_index |>
  mutate(mean_sst_c = vapply(file, read_daily_sst_mean, numeric(1)))

write_csv(sst_daily, "outputs/tables/mur_daily_mean_sst_peconic_apr_aug.csv")

sst_monthly_mean <- sst_daily |>
  group_by(year, month_num, month) |>
  summarise(
    n_days = n(),
    mean_monthly_sst_c = mean(mean_sst_c, na.rm = TRUE),
    .groups = "drop"
  )

write_csv(sst_monthly_mean, "outputs/tables/mur_monthly_mean_sst_peconic_apr_aug.csv")

monthly_cell_sd <- list()
seasonal_cell_sd <- list()
seasonal_cell_mean <- list()
seasonal_cell_delta <- list()
station_sst_rows <- list()

station_attrs <- station_nlm |>
  st_drop_geometry() |>
  mutate(station_id = as.character(record_id))

for (yr in sort(unique(sst_index$year))) {
  year_files <- sst_index |>
    filter(year == yr) |>
    pull(file)

  year_stack <- rast(lapply(year_files, read_sst_raster_c))
  year_sd <- app(year_stack, sd, na.rm = TRUE)
  year_mean <- app(year_stack, mean, na.rm = TRUE)
  year_delta <- app(year_stack, max, na.rm = TRUE) - app(year_stack, min, na.rm = TRUE)

  names(year_sd) <- paste0("sst_sd_apr_aug_", yr)
  names(year_mean) <- paste0("sst_mean_apr_aug_", yr)
  names(year_delta) <- paste0("sst_delta_apr_aug_", yr)

  writeRaster(year_sd, paste0("outputs/rasters/sst/sst_apr_aug_cell_sd_", yr, ".tif"), overwrite = TRUE)
  writeRaster(year_mean, paste0("outputs/rasters/sst/sst_apr_aug_cell_mean_", yr, ".tif"), overwrite = TRUE)
  writeRaster(year_delta, paste0("outputs/rasters/sst/sst_apr_aug_cell_delta_", yr, ".tif"), overwrite = TRUE)

  seasonal_cell_sd[[as.character(yr)]] <- year_sd
  seasonal_cell_mean[[as.character(yr)]] <- year_mean
  seasonal_cell_delta[[as.character(yr)]] <- year_delta

  stations_sst <- st_transform(station_nlm, st_crs(crs(year_mean)))

  station_extracted <- terra::extract(
    c(year_mean, year_sd),
    vect(stations_sst)
  ) |>
    as_tibble() |>
    select(-ID)

  station_sst_rows[[as.character(yr)]] <- bind_cols(
    station_attrs |>
      select(any_of(c(
        "station_id", "record_id", "record_name", "water_body", "water_body_area",
        "depth_m", "slope_deg", "depth_label", "slope_label", "bathy_label",
        "kg_N_per_year_per_Ha", "nearest_nlm_distance_m"
      ))),
    station_extracted
  ) |>
    rename(
      mur_apr_aug_mean_sst_c = starts_with("sst_mean_apr_aug_"),
      mur_apr_aug_sd_sst_c = starts_with("sst_sd_apr_aug_")
    ) |>
    mutate(year = yr)

  for (mo in 4:8) {
    files_ym <- sst_index |>
      filter(year == yr, month_num == mo) |>
      pull(file)

    month_stack <- rast(lapply(files_ym, read_sst_raster_c))
    cell_sd <- app(month_stack, sd, na.rm = TRUE)
    cell_delta <- app(month_stack, max, na.rm = TRUE) - app(month_stack, min, na.rm = TRUE)

    names(cell_sd) <- paste0("sst_cell_sd_", yr, "_", sprintf("%02d", mo))
    names(cell_delta) <- paste0("sst_cell_delta_", yr, "_", sprintf("%02d", mo))

    writeRaster(cell_sd, paste0("outputs/rasters/sst/sst_cell_sd_", yr, "_", sprintf("%02d", mo), ".tif"), overwrite = TRUE)
    writeRaster(cell_delta, paste0("outputs/rasters/sst/sst_cell_delta_", yr, "_", sprintf("%02d", mo), ".tif"), overwrite = TRUE)

    monthly_cell_sd[[paste(yr, mo, sep = "_")]] <- tibble(
      year = yr,
      month_num = mo,
      month = month.abb[mo],
      n_days = length(files_ym),
      mean_cell_sd_c = as.numeric(global(cell_sd, mean, na.rm = TRUE)[1, 1]),
      median_cell_sd_c = as.numeric(global(cell_sd, median, na.rm = TRUE)[1, 1]),
      max_cell_sd_c = as.numeric(global(cell_sd, max, na.rm = TRUE)[1, 1]),
      mean_cell_delta_c = as.numeric(global(cell_delta, mean, na.rm = TRUE)[1, 1]),
      max_cell_delta_c = as.numeric(global(cell_delta, max, na.rm = TRUE)[1, 1])
    )
  }
}

station_year_sst <- bind_rows(station_sst_rows)
write_csv(station_year_sst, "outputs/tables/mur_station_year_sst_by_slope_depth_apr_aug.csv")

sst_cell_sd_monthly <- bind_rows(monthly_cell_sd)
write_csv(sst_cell_sd_monthly, "outputs/tables/mur_monthly_cell_sd_sst_peconic_apr_aug.csv")

sst_cell_sd_yearly <- sst_cell_sd_monthly |>
  group_by(year) |>
  summarise(
    mean_apr_aug_cell_sd_c = mean(mean_cell_sd_c, na.rm = TRUE),
    median_apr_aug_cell_sd_c = median(median_cell_sd_c, na.rm = TRUE),
    max_monthly_cell_sd_c = max(max_cell_sd_c, na.rm = TRUE),
    mean_apr_aug_cell_delta_c = mean(mean_cell_delta_c, na.rm = TRUE),
    max_monthly_cell_delta_c = max(max_cell_delta_c, na.rm = TRUE),
    .groups = "drop"
  ) |>
  left_join(
    sst_monthly_mean |>
      group_by(year) |>
      summarise(mean_apr_aug_sst_c = mean(mean_monthly_sst_c, na.rm = TRUE), .groups = "drop"),
    by = "year"
  ) |>
  mutate(
    mean_sst_rank_warmest = dense_rank(desc(mean_apr_aug_sst_c)),
    variability_rank_highest = dense_rank(desc(mean_apr_aug_cell_sd_c))
  )

write_csv(sst_cell_sd_yearly, "outputs/tables/mur_yearly_sst_covariates_apr_aug.csv")

knitr::kable(
  sst_cell_sd_yearly,
  digits = 2,
  caption = "Year-level MUR SST covariates. Cell SD is the average of monthly per-cell SST standard deviations across April--August."
)

Year-level MUR SST covariates. Cell SD is the average of monthly per-cell SST standard deviations across April–August.
year	mean_apr_aug_cell_sd_c	median_apr_aug_cell_sd_c	max_monthly_cell_sd_c	mean_apr_aug_cell_delta_c	max_monthly_cell_delta_c	mean_apr_aug_sst_c	mean_sst_rank_warmest	variability_rank_highest
2014	1.23	1.07	2.19	4.12	8.06	15.03	12	11
2015	1.46	1.47	2.94	5.39	12.95	15.15	11	3
2016	1.37	1.11	2.49	4.53	8.85	16.12	5	6
2017	1.48	1.60	2.45	4.72	6.94	15.51	10	2
2018	1.50	1.23	2.43	4.85	7.98	15.81	9	1
2019	1.43	1.38	2.49	4.68	7.81	15.97	7	4
2020	1.25	1.06	2.60	4.31	8.42	16.06	6	10
2021	1.27	1.00	2.50	4.41	8.21	15.92	8	9
2022	1.29	1.29	2.43	4.29	7.72	16.67	1	8
2023	1.20	1.20	1.97	4.21	6.97	16.55	3	12
2024	1.32	1.14	2.87	4.24	7.70	16.59	2	7
2025	1.40	1.25	2.29	4.97	8.41	16.27	4	5

6.3 SST trends, depth classes, and slope classes

The year-level table is used to classify whether particular years were warm, variable, or both. The station-level SST table is then summarized by bathymetric depth and slope class, so that the SST diagnostics remain aligned with the station-year covariate structure used later.

sst_long <- sst_cell_sd_yearly |>
  select(
    year,
    mean_apr_aug_sst_c,
    mean_apr_aug_cell_sd_c,
    mean_apr_aug_cell_delta_c
  ) |>
  pivot_longer(cols = -year, names_to = "metric", values_to = "value") |>
  mutate(
    metric = recode(
      metric,
      mean_apr_aug_sst_c = "Mean SST (°C)",
      mean_apr_aug_cell_sd_c = "Mean cell SST SD (°C)",
      mean_apr_aug_cell_delta_c = "Mean cell SST range (°C)"
    )
  )

p_sst_compact <- ggplot(sst_long, aes(x = year, y = value)) +
  geom_line(linewidth = 0.7) +
  geom_point(size = 1.8) +
  geom_vline(xintercept = 2019, linetype = "dashed", alpha = 0.6) +
  facet_wrap(~ metric, scales = "free_y", ncol = 1) +
  labs(
    title = "Figure 7. April--August MUR SST covariates by year",
    subtitle = "Dashed line marks the 2019 bay scallop collapse year",
    x = "Year",
    y = NULL
  ) +
  theme_minimal()

ggsave("figures/figure_07_mur_sst_covariates_by_year.png", p_sst_compact, width = 8, height = 7, dpi = 300)
knitr::include_graphics("figures/figure_07_mur_sst_covariates_by_year.png")

sst_corr_year <- tibble(
  metric = c(
    "Mean April--August SST",
    "Mean April--August cell SD",
    "Median April--August cell SD",
    "Mean April--August cell range",
    "Maximum monthly cell range"
  ),
  spearman_rho_with_year = c(
    cor(sst_cell_sd_yearly$year, sst_cell_sd_yearly$mean_apr_aug_sst_c, method = "spearman", use = "complete.obs"),
    cor(sst_cell_sd_yearly$year, sst_cell_sd_yearly$mean_apr_aug_cell_sd_c, method = "spearman", use = "complete.obs"),
    cor(sst_cell_sd_yearly$year, sst_cell_sd_yearly$median_apr_aug_cell_sd_c, method = "spearman", use = "complete.obs"),
    cor(sst_cell_sd_yearly$year, sst_cell_sd_yearly$mean_apr_aug_cell_delta_c, method = "spearman", use = "complete.obs"),
    cor(sst_cell_sd_yearly$year, sst_cell_sd_yearly$max_monthly_cell_delta_c, method = "spearman", use = "complete.obs")
  )
)

write_csv(sst_corr_year, "outputs/tables/mur_sst_correlations_with_year.csv")

knitr::kable(
  sst_corr_year,
  digits = 2,
  caption = "Rank correlations between year and April--August MUR SST covariates."
)

Rank correlations between year and April–August MUR SST covariates.
metric	spearman_rho_with_year
Mean April–August SST	0.81
Mean April–August cell SD	-0.25
Median April–August cell SD	-0.09
Mean April–August cell range	-0.16
Maximum monthly cell range	-0.34

sst_by_depth_year <- station_year_sst |>
  filter(!is.na(depth_label), !is.na(mur_apr_aug_sd_sst_c)) |>
  group_by(year, depth_label) |>
  summarise(
    n_stations = n(),
    median_sst_c = median(mur_apr_aug_mean_sst_c, na.rm = TRUE),
    median_sst_sd_c = median(mur_apr_aug_sd_sst_c, na.rm = TRUE),
    .groups = "drop"
  )

sst_by_slope_year <- station_year_sst |>
  filter(!is.na(slope_label), !is.na(mur_apr_aug_sd_sst_c)) |>
  group_by(year, slope_label) |>
  summarise(
    n_stations = n(),
    median_sst_c = median(mur_apr_aug_mean_sst_c, na.rm = TRUE),
    median_sst_sd_c = median(mur_apr_aug_sd_sst_c, na.rm = TRUE),
    .groups = "drop"
  )

write_csv(sst_by_depth_year, "outputs/tables/mur_sst_by_depth_class_year.csv")
write_csv(sst_by_slope_year, "outputs/tables/mur_sst_by_slope_class_year.csv")

p_sst_depth <- ggplot(
  sst_by_depth_year,
  aes(x = year, y = median_sst_sd_c, color = depth_label, group = depth_label)
) +
  geom_line(linewidth = 0.7) +
  geom_point(size = 1.8) +
  geom_vline(xintercept = 2019, linetype = "dashed", alpha = 0.6) +
  labs(
    title = "Figure 8. April--August SST variability by station depth class",
    subtitle = "MUR SST SD extracted to monitoring stations; depth classes are DEM-derived",
    x = "Year",
    y = "Median station SST SD (°C)",
    color = "Depth class"
  ) +
  theme_minimal()

ggsave("figures/figure_08_mur_sst_sd_by_depth_class.png", p_sst_depth, width = 9, height = 5, dpi = 300)
knitr::include_graphics("figures/figure_08_mur_sst_sd_by_depth_class.png")

p_sst_slope <- ggplot(
  sst_by_slope_year,
  aes(x = year, y = median_sst_sd_c, color = slope_label, group = slope_label)
) +
  geom_line(linewidth = 0.7) +
  geom_point(size = 1.8) +
  geom_vline(xintercept = 2019, linetype = "dashed", alpha = 0.6) +
  labs(
    title = "Figure 9. April--August SST variability by station slope class",
    subtitle = "MUR SST SD extracted to monitoring stations; slope classes are DEM-derived",
    x = "Year",
    y = "Median station SST SD (°C)",
    color = "Slope class"
  ) +
  theme_minimal()

ggsave("figures/figure_09_mur_sst_sd_by_slope_class.png", p_sst_slope, width = 9, height = 5, dpi = 300)
knitr::include_graphics("figures/figure_09_mur_sst_sd_by_slope_class.png")

sst_year_classes <- sst_cell_sd_yearly |>
  mutate(
    mean_sst_z = as.numeric(scale(mean_apr_aug_sst_c)),
    variability_z = as.numeric(scale(mean_apr_aug_cell_sd_c)),
    thermal_retention_score = rowMeans(cbind(mean_sst_z, -variability_z), na.rm = TRUE),
    sst_year_class = case_when(
      thermal_retention_score >= quantile(thermal_retention_score, 0.75, na.rm = TRUE) ~ "higher thermal-retention pressure",
      thermal_retention_score <= quantile(thermal_retention_score, 0.25, na.rm = TRUE) ~ "lower thermal-retention pressure",
      TRUE ~ "middle"
    )
  ) |>
  arrange(desc(thermal_retention_score))

write_csv(sst_year_classes, "outputs/tables/mur_sst_year_thermal_pressure_classes.csv")

knitr::kable(
  sst_year_classes,
  digits = 2,
  caption = "SST-based year classes. Higher scores indicate warmer April--August SST combined with lower SST variability."
)

SST-based year classes. Higher scores indicate warmer April–August SST combined with lower SST variability.
year	mean_apr_aug_cell_sd_c	median_apr_aug_cell_sd_c	max_monthly_cell_sd_c	mean_apr_aug_cell_delta_c	max_monthly_cell_delta_c	mean_apr_aug_sst_c	mean_sst_rank_warmest	variability_rank_highest	mean_sst_z	variability_z	thermal_retention_score	sst_year_class
2023	1.20	1.20	1.97	4.21	6.97	16.55	3	12	1.08	-1.45	1.27	higher thermal-retention pressure
2022	1.29	1.29	2.43	4.29	7.72	16.67	1	8	1.32	-0.55	0.94	higher thermal-retention pressure
2024	1.32	1.14	2.87	4.24	7.70	16.59	2	7	1.17	-0.26	0.72	higher thermal-retention pressure
2020	1.25	1.06	2.60	4.31	8.42	16.06	6	10	0.16	-0.99	0.57	middle
2021	1.27	1.00	2.50	4.41	8.21	15.92	8	9	-0.10	-0.79	0.35	middle
2025	1.40	1.25	2.29	4.97	8.41	16.27	4	5	0.56	0.49	0.04	middle
2016	1.37	1.11	2.49	4.53	8.85	16.12	5	6	0.28	0.22	0.03	middle
2014	1.23	1.07	2.19	4.12	8.06	15.03	12	11	-1.76	-1.16	-0.30	middle
2019	1.43	1.38	2.49	4.68	7.81	15.97	7	4	0.00	0.79	-0.39	middle
2018	1.50	1.23	2.43	4.85	7.98	15.81	9	1	-0.31	1.44	-0.87	lower thermal-retention pressure
2017	1.48	1.60	2.45	4.72	6.94	15.51	10	2	-0.86	1.24	-1.05	lower thermal-retention pressure
2015	1.46	1.47	2.94	5.39	12.95	15.15	11	3	-1.54	1.03	-1.28	lower thermal-retention pressure

The important result from this block is negative but useful. The MUR SST summaries do not identify 2019 as a clear basin-wide anomaly in either April–August mean SST or April–August SST variability. That weakens a simple wedge-like or seasonal surface-temperature anomaly explanation for the 2019 scallop collapse, at least at the spatial and temporal scale used here. It does not rule out finer-scale bottom-water stress, short heat events, dissolved-oxygen crashes, parasite timing, or interacting physiological stress. For this project, SST is therefore retained as broad thermal context and as a covariate in the later station-year table, while the actual explanatory weight remains with station water-quality records, bathymetric setting, and nitrogen-loading context.

7 Combined station-year covariate analysis

The final analysis unit is a station-year. Fixed spatial variables come from earlier sections: bathymetric depth, bathymetric slope, bathymetric regime, nearest Peconic nitrogen-loading subwatershed, and distance to that subwatershed. Temporal variables come from the SCDHS water-quality workbook and from the MUR SST products. The goal here is to assemble a compact covariate table and check whether water-quality stress is aligned with the physical and watershed-pressure variables developed above.

7.1 Working analysis structure

The working questions are deliberately limited:

Do station-year dissolved oxygen and chlorophyll-a values covary with static nitrogen loading and bathymetric setting?
Do warmer or lower-variability SST years coincide with higher station-level water-quality stress?
Can the station-year stress score be summarized to nearby shellfish-closure polygons as a management-oriented spatial proxy?

The stress logic is directional. Higher chlorophyll-a, higher nitrogen loading, higher summer temperature, lower dissolved oxygen, lower SST variability, lower slope, and shallower water are treated as more stressful. The SST component is a surface-temperature variability proxy, so it should be read as evidence of thermal behavior at the surface rather than direct proof of overturn or upwelling.

7.2 Load SCDHS station water-quality records

The SCDHS workbook is read from the two Peconic station sheets. Station identifiers are converted into the same SITE-060000 style used by the station-location file, which allows the water-quality records to join to the bathymetry and nitrogen covariate table.

wq_path <- "data/water_quality/scdhs_peconic/Peconics SCDHS WQ data - 1976 - 2024 DRAFT - updated 041025.xlsx"

# The two sheets have some columns inferred as different types by readxl.
# Converting both sheets to character before bind_rows() prevents type conflicts;
# the fields used below are then parsed back to numeric explicitly.
wq060_raw <- read_excel(wq_path, sheet = "Pec 060 stations ") |>
  clean_names() |>
  mutate(across(everything(), as.character))

wq200_raw <- read_excel(wq_path, sheet = "Pec 200 stations") |>
  clean_names() |>
  mutate(across(everything(), as.character))

parse_wq_number <- function(x) {
  x <- as.character(x)
  x <- str_replace_all(x, "<\\s*", "")
  x <- str_replace_all(x, ">\\s*", "")
  x <- str_replace_all(x, ",", "")
  suppressWarnings(as.numeric(x))
}

mean_na <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

min_na <- function(x) {
  if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
}

wq_clean <- bind_rows(
  wq060_raw |> mutate(program_sheet = "Pec 060"),
  wq200_raw |> mutate(program_sheet = "Pec 200")
) |>
  transmute(
    bay_station = parse_wq_number(bay_station),
    sample_date = as.Date(date),
    year = as.integer(format(sample_date, "%Y")),
    month = as.integer(format(sample_date, "%m")),
    do_mg_l = parse_wq_number(dissolved_oxygen_mg_l),
    chla_ug_l = parse_wq_number(chlorophyll_a_total_ug_l),
    secchi_m = parse_wq_number(secchi_ft) * 0.3048,
    nitrate_mg_l = parse_wq_number(nitrate_mg_l),
    nox_mg_l = parse_wq_number(nitrate_nitrite_mg_l),
    tkn_mg_l = parse_wq_number(total_kjeldahl_nitrogen_mg_l),
    tn_mg_l = parse_wq_number(total_nitrogen_mg_l),
    salinity_ppt = parse_wq_number(salinity_psu),
    station_temp_c = parse_wq_number(temperature_0c)
  ) |>
  filter(
    year >= 2014,
    year <= 2025,
    !is.na(bay_station),
    !is.na(sample_date)
  ) |>
  mutate(station_id = sprintf("SITE-%06d", as.integer(bay_station)))

station_year_wq <- wq_clean |>
  filter(month %in% 3:7) |>
  group_by(station_id, year) |>
  summarise(
    n_wq_samples = n(),
    mean_do_mg_l = mean_na(do_mg_l),
    min_do_mg_l = min_na(do_mg_l),
    mean_chla_ug_l = mean_na(chla_ug_l),
    mean_secchi_m = mean_na(secchi_m),
    mean_nitrate_mg_l = mean_na(nitrate_mg_l),
    mean_nox_mg_l = mean_na(nox_mg_l),
    mean_tkn_mg_l = mean_na(tkn_mg_l),
    mean_tn_mg_l = mean_na(tn_mg_l),
    mean_salinity_ppt = mean_na(salinity_ppt),
    mean_station_temp_c = mean_na(station_temp_c),
    .groups = "drop"
  )

write_csv(station_year_wq, "outputs/tables/scdhs_station_year_wq_2014_2025_mar_jul.csv")

knitr::kable(
  station_year_wq |> head(12),
  digits = 2,
  caption = "Station-year water-quality summaries from SCDHS Peconic monitoring records, March--July."
)

Station-year water-quality summaries from SCDHS Peconic monitoring records, March–July.
station_id	year	n_wq_samples	mean_do_mg_l	min_do_mg_l	mean_chla_ug_l	mean_secchi_m	mean_nitrate_mg_l	mean_nox_mg_l	mean_tkn_mg_l	mean_tn_mg_l	mean_salinity_ppt	mean_station_temp_c
SITE-060101	2014	10	7.72	4.0	4.41	1.37	NA	NA	NA	NA	25.96	16.57
SITE-060101	2015	16	6.76	3.2	6.89	1.28	NA	NA	NA	NA	27.08	18.09
SITE-060101	2016	10	6.70	3.1	10.05	1.34	NA	NA	NA	NA	27.26	15.96
SITE-060101	2017	8	8.07	3.3	4.86	1.57	NA	NA	NA	NA	26.77	16.54
SITE-060101	2018	8	7.71	5.0	7.79	1.52	NA	NA	NA	NA	25.75	17.42
SITE-060101	2019	4	8.12	5.0	8.58	1.52	NA	NA	NA	NA	24.70	16.05
SITE-060101	2020	2	5.45	5.3	7.79	0.91	NA	NA	NA	NA	27.80	27.70
SITE-060101	2021	6	8.55	3.4	12.48	1.45	NA	NA	NA	NA	25.78	15.08
SITE-060101	2022	8	7.32	4.2	14.58	1.26	NA	NA	NA	NA	27.09	17.42
SITE-060101	2023	5	7.62	5.5	3.91	1.83	NA	NA	NA	NA	26.82	19.30
SITE-060101	2024	10	7.42	4.2	5.36	1.32	NA	NA	NA	NA	24.25	18.79
SITE-060102	2014	10	8.20	4.6	2.18	1.83	NA	NA	NA	NA	27.10	15.93

7.3 Join fixed spatial covariates and annual SST covariates

Nitrogen loading and bathymetry are static over the analysis window; SST and water-quality values vary by year. This chunk creates the combined station-year table used for the correlation screen and stress-index plots.

station_covariates <- read_csv(
  "outputs/tables/peconic_station_bathymetry_nearest_nitrogen_covariates.csv",
  show_col_types = FALSE
) |>
  mutate(station_id = as.character(record_id))

station_year_sst_for_join <- read_csv(
  "outputs/tables/mur_station_year_sst_by_slope_depth_apr_aug.csv",
  show_col_types = FALSE
) |>
  select(station_id, year, mur_apr_aug_mean_sst_c, mur_apr_aug_sd_sst_c)

station_year_model <- station_year_wq |>
  left_join(station_covariates, by = "station_id") |>
  left_join(station_year_sst_for_join, by = c("station_id", "year"))

write_csv(station_year_model, "outputs/tables/station_year_combined_covariates_2014_2025.csv")

knitr::kable(
  station_year_model |>
    select(any_of(c(
      "station_id", "year", "mean_chla_ug_l", "min_do_mg_l",
      "depth_m", "slope_deg", "kg_N_per_year_per_Ha",
      "mur_apr_aug_mean_sst_c", "mur_apr_aug_sd_sst_c"
    ))) |>
    head(12),
  digits = 2,
  caption = "Combined station-year table used for the final covariate screen."
)

Combined station-year table used for the final covariate screen.
station_id	year	mean_chla_ug_l	min_do_mg_l	depth_m	slope_deg	kg_N_per_year_per_Ha	mur_apr_aug_mean_sst_c	mur_apr_aug_sd_sst_c
SITE-060101	2014	4.41	4.0	1.97	1.62	13.47	15.55	6.33
SITE-060101	2015	6.89	3.2	1.97	1.62	13.47	15.60	6.99
SITE-060101	2016	10.05	3.1	1.97	1.62	13.47	16.65	6.38
SITE-060101	2017	4.86	3.3	1.97	1.62	13.47	16.00	6.12
SITE-060101	2018	7.79	5.0	1.97	1.62	13.47	16.14	6.96
SITE-060101	2019	8.58	5.0	1.97	1.62	13.47	16.38	6.15
SITE-060101	2020	7.79	5.3	1.97	1.62	13.47	16.50	6.40
SITE-060101	2021	12.48	3.4	1.97	1.62	13.47	16.38	5.97
SITE-060101	2022	14.58	4.2	1.97	1.62	13.47	16.96	6.43
SITE-060101	2023	3.91	5.5	1.97	1.62	13.47	16.96	5.91
SITE-060101	2024	5.36	4.2	1.97	1.62	13.47	16.94	5.86
SITE-060102	2014	2.18	4.6	4.09	0.04	9.42	15.41	6.32

7.4 Correlation screen

A Spearman correlation matrix is the first check because several variables are skewed, bounded, or partly ordinal. The matrix is not a mechanism test; it is a quick way to see whether the proposed stress variables move together in the expected directions.

cor_vars <- station_year_model |>
  select(any_of(c(
    "min_do_mg_l",
    "mean_do_mg_l",
    "mean_chla_ug_l",
    "mean_secchi_m",
    "mean_nitrate_mg_l",
    "mean_nox_mg_l",
    "mean_tkn_mg_l",
    "mean_tn_mg_l",
    "depth_m",
    "slope_deg",
    "kg_N_per_year_per_Ha",
    "nearest_nlm_distance_m",
    "mur_apr_aug_mean_sst_c",
    "mur_apr_aug_sd_sst_c"
  ))) |>
  select(where(is.numeric))

cor_mat <- cor(cor_vars, use = "pairwise.complete.obs", method = "spearman")

cor_df <- as.data.frame(as.table(cor_mat)) |>
  rename(var1 = Var1, var2 = Var2, spearman_r = Freq)

pairwise_cor_tests <- combn(names(cor_vars), 2, simplify = FALSE) |>
  lapply(function(v) {
    x <- cor_vars[[v[1]]]
    y <- cor_vars[[v[2]]]
    ok <- complete.cases(x, y)

    if (sum(ok) < 10 || sd(x[ok], na.rm = TRUE) == 0 || sd(y[ok], na.rm = TRUE) == 0) {
      return(tibble(var1 = v[1], var2 = v[2], n = sum(ok), spearman_r = NA_real_, p_value = NA_real_))
    }

    test <- suppressWarnings(cor.test(x[ok], y[ok], method = "spearman", exact = FALSE))

    tibble(
      var1 = v[1],
      var2 = v[2],
      n = sum(ok),
      spearman_r = unname(test$estimate),
      p_value = test$p.value
    )
  }) |>
  bind_rows() |>
  arrange(desc(abs(spearman_r)))

write_csv(cor_df, "outputs/tables/station_year_spearman_correlations.csv")
write_csv(pairwise_cor_tests, "outputs/tables/station_year_pairwise_spearman_tests.csv")

p_cor <- ggplot(cor_df, aes(x = var1, y = var2, fill = spearman_r)) +
  geom_tile(color = "white", linewidth = 0.2) +
  scale_fill_gradient2(
    low = "#2166ac",
    mid = "white",
    high = "#b2182b",
    midpoint = 0,
    limits = c(-1, 1),
    name = "Spearman r"
  ) +
  labs(
    title = "Figure 10. Correlation screen for station-year covariates",
    subtitle = "Spearman rank correlations; 2014--2025 station-year summaries",
    x = NULL,
    y = NULL
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggsave("figures/figure_10_station_year_correlation_matrix.png", p_cor, width = 8.5, height = 7, dpi = 300)
knitr::include_graphics("figures/figure_10_station_year_correlation_matrix.png")

knitr::kable(
  pairwise_cor_tests |>
    filter(!is.na(spearman_r)) |>
    slice_head(n = 15),
  digits = 3,
  caption = "Strongest pairwise Spearman associations among station-year covariates."
)

Strongest pairwise Spearman associations among station-year covariates.
var1	var2	n	spearman_r
mean_nox_mg_l	depth_m	13	-0.866
min_do_mg_l	mean_do_mg_l	753	0.703
depth_m	nearest_nlm_distance_m	527	0.679
mean_secchi_m	depth_m	424	0.630
mean_nox_mg_l	kg_N_per_year_per_Ha	196	0.596
mean_nitrate_mg_l	mean_nox_mg_l	137	0.587
mean_do_mg_l	mean_secchi_m	452	0.543
mean_secchi_m	nearest_nlm_distance_m	452	0.525
min_do_mg_l	mean_secchi_m	452	0.524
slope_deg	nearest_nlm_distance_m	470	-0.514
min_do_mg_l	mean_chla_ug_l	514	-0.501
mean_chla_ug_l	mean_secchi_m	419	-0.472
mean_nitrate_mg_l	kg_N_per_year_per_Ha	139	0.432
mean_chla_ug_l	kg_N_per_year_per_Ha	515	0.394
min_do_mg_l	mean_nox_mg_l	192	0.392

7.5 Provisional station-year stress index

The index is scaled so that larger values indicate greater proxy stress. Direct response variables are weighted more heavily than static spatial covariates. Dissolved oxygen is inverted because lower oxygen is the stress condition. SST variability and slope are also inverted because in my working hypothesis I’m assuming more variable/mixed and steeper-gradient areas as less prone to stagnant warm-water stress.

z_safe <- function(x) {
  x <- as.numeric(x)
  sx <- sd(x, na.rm = TRUE)
  if (is.na(sx) || sx == 0) return(rep(NA_real_, length(x)))
  as.numeric((x - mean(x, na.rm = TRUE)) / sx)
}

trend_test <- function(dat, value_col) {
  x <- dat$year
  y <- dat[[value_col]]
  ok <- complete.cases(x, y)

  if (sum(ok) < 4 || sd(y[ok], na.rm = TRUE) == 0) {
    return(tibble(metric = value_col, n = sum(ok), spearman_rho = NA_real_, p_value = NA_real_))
  }

  test <- suppressWarnings(cor.test(x[ok], y[ok], method = "spearman", exact = FALSE))

  tibble(
    metric = value_col,
    n = sum(ok),
    spearman_rho = unname(test$estimate),
    p_value = test$p.value
  )
}

station_year_stress <- station_year_model |>
  mutate(
    z_chla = z_safe(mean_chla_ug_l),
    z_low_do = -z_safe(min_do_mg_l),
    z_n_yield = z_safe(kg_N_per_year_per_Ha),
    z_warm_sst = z_safe(mur_apr_aug_mean_sst_c),
    z_low_sst_variability = -z_safe(mur_apr_aug_sd_sst_c),
    z_low_slope = -z_safe(slope_deg),
    z_shallow = -z_safe(depth_m),
    stress_index =
      0.25 * z_chla +
      0.25 * z_low_do +
      0.15 * z_n_yield +
      0.15 * z_warm_sst +
      0.10 * z_low_sst_variability +
      0.05 * z_low_slope +
      0.05 * z_shallow
  )

write_csv(station_year_stress, "outputs/tables/station_year_stress_index_2014_2025.csv")

stress_year_summary <- station_year_stress |>
  group_by(year) |>
  summarise(
    n_station_years = sum(!is.na(stress_index)),
    median_stress_index = median(stress_index, na.rm = TRUE),
    mean_stress_index = mean(stress_index, na.rm = TRUE),
    upper_quartile_stress = quantile(stress_index, 0.75, na.rm = TRUE),
    .groups = "drop"
  )

stress_trend_tests <- bind_rows(
  trend_test(stress_year_summary, "median_stress_index"),
  trend_test(stress_year_summary, "mean_stress_index"),
  trend_test(stress_year_summary, "upper_quartile_stress")
)

write_csv(stress_year_summary, "outputs/tables/station_year_stress_summary_by_year.csv")
write_csv(stress_trend_tests, "outputs/tables/station_year_stress_trend_tests.csv")

knitr::kable(
  stress_year_summary,
  digits = 2,
  caption = "Annual summary of the provisional station-year stress index."
)

Annual summary of the provisional station-year stress index.
year	n_station_years	median_stress_index	mean_stress_index	upper_quartile_stress
2014	39	-0.39	-0.34	-0.22
2015	39	-0.62	-0.53	-0.43
2016	39	-0.05	0.01	0.17
2017	39	-0.16	-0.04	0.10
2018	39	-0.27	-0.28	-0.06
2019	41	-0.14	-0.05	0.26
2020	40	-0.01	0.01	0.13
2021	43	0.03	0.17	0.39
2022	43	-0.03	0.08	0.23
2023	43	0.22	0.26	0.46
2024	50	0.19	0.28	0.53

knitr::kable(
  stress_trend_tests,
  digits = 4,
  caption = "Spearman trend tests for annual stress-index summaries."
)

Spearman trend tests for annual stress-index summaries.
metric	n	spearman_rho	p_value
median_stress_index	11	0.8909	2e-04
mean_stress_index	11	0.8636	6e-04
upper_quartile_stress	11	0.8727	5e-04

p_stress_year <- ggplot(station_year_stress, aes(x = factor(year), y = stress_index)) +
  geom_boxplot(fill = "grey90", color = "grey30", outlier.alpha = 0.45) +
  stat_summary(fun = mean, geom = "point", color = "#b2182b", size = 2) +
  geom_vline(xintercept = which(sort(unique(station_year_stress$year)) == 2019), linetype = "dashed", alpha = 0.5) +
  labs(
    title = "Figure 11. Station-year stress index by year",
    subtitle = "Higher values indicate higher chlorophyll/nitrogen/temperature and lower oxygen/variability/slope/depth",
    x = "Year",
    y = "Relative stress index"
  ) +
  theme_minimal()

ggsave("figures/figure_11_station_year_stress_index.png", p_stress_year, width = 8.5, height = 4.8, dpi = 300)
knitr::include_graphics("figures/figure_11_station_year_stress_index.png")

stress_components_long <- station_year_stress |>
  select(year, z_chla, z_low_do, z_n_yield, z_warm_sst, z_low_sst_variability, z_low_slope, z_shallow) |>
  pivot_longer(cols = -year, names_to = "component", values_to = "z_value") |>
  group_by(year, component) |>
  summarise(median_component = median(z_value, na.rm = TRUE), .groups = "drop")

p_components <- ggplot(stress_components_long, aes(x = year, y = median_component, color = component)) +
  geom_line(linewidth = 0.6) +
  geom_point(size = 1.5) +
  geom_vline(xintercept = 2019, linetype = "dashed", alpha = 0.5) +
  labs(
    title = "Figure 12. Median stress-index components by year",
    subtitle = "Static components separate station groups; water-quality and SST components drive year-to-year change",
    x = "Year",
    y = "Median z-scored component",
    color = "Component"
  ) +
  theme_minimal()

ggsave("figures/figure_12_stress_components_by_year.png", p_components, width = 9, height = 5, dpi = 300)
knitr::include_graphics("figures/figure_12_stress_components_by_year.png")

7.6 Approximate shellfish-closure stress summaries

Each station-year stress value is assigned to the nearest shellfish-closure polygon, and closure-year stress is summarized from nearby stations. The output gives an environmental stress summary around closure polygons while leaving scallop abundance and landings outside the model.

closures <- st_read("data/nysdec_shellfish_closures.geojson", quiet = TRUE) |>
  mutate(closure_id = row_number())

station_geom <- st_read(
  "outputs/peconic_station_bathymetry_nearest_nitrogen_covariates.gpkg",
  quiet = TRUE
) |>
  mutate(station_id = as.character(record_id)) |>
  select(station_id)

station_stress_sf <- station_geom |>
  left_join(station_year_stress, by = "station_id") |>
  filter(!is.na(year), !is.na(stress_index))

closures_m <- st_transform(closures, 6347)
stations_m <- st_transform(station_stress_sf, 6347)

nearest_closure <- st_nearest_feature(stations_m, closures_m)
nearest_closure_dist_m <- as.numeric(st_distance(stations_m, closures_m[nearest_closure, ], by_element = TRUE))

station_closure_stress <- stations_m |>
  mutate(
    closure_id = closures_m$closure_id[nearest_closure],
    nearest_closure_distance_m = nearest_closure_dist_m
  )

closure_year_stress <- station_closure_stress |>
  st_drop_geometry() |>
  group_by(year, closure_id) |>
  summarise(
    n_stations = n(),
    mean_stress_index = mean(stress_index, na.rm = TRUE),
    max_stress_index = max(stress_index, na.rm = TRUE),
    mean_nearest_station_distance_m = mean(nearest_closure_distance_m, na.rm = TRUE),
    .groups = "drop"
  )

closure_stress_series <- closure_year_stress |>
  group_by(year) |>
  summarise(
    mean_closure_stress = mean(mean_stress_index, na.rm = TRUE),
    upper_closure_stress = quantile(mean_stress_index, 0.75, na.rm = TRUE),
    .groups = "drop"
  )

closure_trend_tests <- bind_rows(
  trend_test(closure_stress_series, "mean_closure_stress"),
  trend_test(closure_stress_series, "upper_closure_stress")
)

write_csv(closure_year_stress, "outputs/tables/shellfish_closure_year_stress_index.csv")
write_csv(closure_stress_series, "outputs/tables/shellfish_closure_stress_series.csv")
write_csv(closure_trend_tests, "outputs/tables/shellfish_closure_stress_trend_tests.csv")

knitr::kable(
  closure_trend_tests,
  digits = 4,
  caption = "Spearman trend tests for shellfish-closure stress summaries."
)

Spearman trend tests for shellfish-closure stress summaries.
metric	n	spearman_rho	p_value
mean_closure_stress	11	0.9182	1e-04
upper_closure_stress	11	0.8909	2e-04

p_closure_hist <- ggplot(closure_year_stress, aes(x = factor(year), y = mean_stress_index)) +
  geom_boxplot(fill = "grey90", color = "grey35", outlier.alpha = 0.45) +
  stat_summary(fun = mean, geom = "point", color = "#b2182b", size = 2) +
  labs(
    title = "Figure 13. Shellfish-closure stress summaries by year",
    subtitle = "Closure stress is approximated from nearest station-year stress values",
    x = "Year",
    y = "Mean closure stress index"
  ) +
  theme_minimal()

ggsave("figures/figure_13_shellfish_closure_stress_by_year.png", p_closure_hist, width = 8.5, height = 4.8, dpi = 300)
knitr::include_graphics("figures/figure_13_shellfish_closure_stress_by_year.png")

p_closure_ts <- ggplot(closure_stress_series, aes(x = year, y = mean_closure_stress)) +
  geom_line(linewidth = 0.6, color = "#2166ac") +
  geom_point(size = 2, color = "#2166ac") +
  geom_line(aes(y = upper_closure_stress), linewidth = 0.4, color = "#b2182b", linetype = "dashed") +
  geom_vline(xintercept = 2019, linetype = "dashed", alpha = 0.5) +
  labs(
    title = "Figure 14. Mean shellfish-closure stress through time",
    subtitle = "Red dashed series shows the upper quartile of closure stress values",
    x = "Year",
    y = "Stress index"
  ) +
  theme_minimal()

ggsave("figures/figure_14_shellfish_closure_stress_timeseries.png", p_closure_ts, width = 8.5, height = 4.8, dpi = 300)
knitr::include_graphics("figures/figure_14_shellfish_closure_stress_timeseries.png")

8 Overall synthesis

The strongest result from the SST section is that 2019 was not clearly anomalous in basin-wide April–August mean SST or SST variability. However, the later station-year analysis gives a stronger pattern, where mean water-quality stress rises through time, and the closure-level summaries show the same broad increase. Earlier years, especially 2014–2015 and 2018, sit lower in the stress distribution, while 2021–2024 occupy the upper part of the index, with 2023–2024 especially high.

Although static nitrogen pressure and bathymetric setting really only separate stations spatially, year-to-year movement is carried more by water-quality and thermal covariates. Chlorophyll-a, dissolved oxygen, seasonal SST, and low SST variability do not all peak in the same year, yet their combined score shifts upward after 2019. This lines up with the failure of the fishery to recover: the collapse year itself is not uniquely explained by the broad MUR SST summaries, but the post-collapse period remains environmentally unfavorable in the combined station-year stack.

As per the closure plots, once station stress is summarized to the nearest shellfish-closure polygons, across the station-year output the main result is an increase in combined stress after the 2019 collapse year. The annual stress-index table moves from mostly negative values in 2014–2018 to mostly positive values after 2020, with the highest mean and upper-quartile values appearing in 2023–2024.

The trend tests also help to quantify this pattern visually, wherein the mean, median, and upper-quartile station-year stress summaries are all positively correlated with year. The closure summary shows the same directional structure when station stress is transferred to the nearest shellfish-closure polygons.

In practical terms, the recent Peconic record does not look like a short 2019-only anomaly; in fact, it seems more like a persistent worsening of the covariate stack after the collapse… (no wonder the scallops haven’t recovered yet.)

This opens up the possibility of the mas dying events being caused by the possible intrusion of a novel parasitoid (such as hte coccidian parasite discussed in the PEC report [4], although there still remains a dearth of evidence regarding any biotic explanation.

9 References and source notes

[1] National Oceanic and Atmospheric Administration, National Geodetic Survey. 2023 NOAA NGS Topobathy Lidar DEM for Long Island Sound, New York, product 10274. NOAA Digital Coast. Accessed 2026. Used for high-resolution measured topobathymetry where coverage exists.
https://www.fisheries.noaa.gov/inport/item/74870

[2] National Centers for Environmental Information. Continuously Updated Digital Elevation Model (CUDEM): Ninth arc-second resolution bathymetric-topographic tiles. NOAA NCEI. Accessed 2026. Used as broader bathymetric base coverage where the 2023 topobathymetric lidar product did not return bottom elevations.
https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ngdc.mgg.dem:199919

[3] Peconic Bay Scallop Technical Review Committee. 2020. Preliminary Findings and Recommendations of the Peconic Bay Scallop Technical Review Committee. Peconic Estuary Partnership. Used for the 2019 adult bay scallop die-off context, landings figure, and discussion of temperature, dissolved oxygen, disease, and monitoring needs.
https://www.peconicestuary.org/wp-content/uploads/2020/03/Preliminary-Findings-and-Recommendations-of-the-Peconic-Bay-Scallop-Technical-Review-Committee-3-5-20-2.pdf

[4] New York State Department of Environmental Conservation. 2025. Bay Scallop Species Status Assessment. NYSDEC State Wildlife Action Plan materials. Used for species-status context and linked stressors including high summer water temperature, low dissolved oxygen, and coccidian parasite infection.
https://extapps.dec.ny.gov/fs/programs/dfw/SWAP2025/Marine%20Invertebrates/Bay%20scallop.pdf

[5] Lloyd, S. 2014. Nitrogen Load Modeling to Forty-Three Subwatersheds of the Peconic Estuary. The Nature Conservancy, in partnership with the Peconic Estuary Program. Used as the source for the Peconic Nitrogen Loading Model subwatershed nitrogen-yield layer.
https://www.peconicestuary.org/wp-content/uploads/2017/06/Nitrogenloadmodelingtoforty-thr.pdf

[6] Suffolk County Department of Health Services. 2025. Peconics SCDHS Water Quality Data, 1976–2024, draft updated 2025-04-10 and Station Locations, updated 2024-10-28. Local project files supplied through the SCDHS/Peconic data package. Used for station locations and station-level dissolved oxygen, chlorophyll-a, Secchi depth, temperature, salinity, and nutrient summaries.

[7] Jet Propulsion Laboratory MUR MEaSUREs Project. Multi-scale Ultra-high Resolution Sea Surface Temperature Analysis, GHRSST Level 4, version 4.1. Accessed through NOAA CoastWatch ERDDAP. Used for April–August daily SST subsets and year-level thermal covariates.
https://coastwatch.pfeg.noaa.gov/erddap/griddap/jplMURSST41.html

[8] New York State Department of Environmental Conservation. Shellfishing and public shellfish mapper resources. NYSDEC. Accessed 2026. Used as management geography for approximate closure-level stress summaries.
https://dec.ny.gov/things-to-do/shellfishing

[9] Li, C., and O’Donnell, J. 1997. Tidally driven residual circulation in shallow estuaries with lateral depth variation. Journal of Geophysical Research: Oceans 102(C13):27915–27929. doi:10.1029/97JC02330.

[10] Cornell Cooperative Extension Suffolk County. Peconic bay scallop restoration and monitoring program summaries. Accessed 2026. Used for recent restoration and die-off context.
https://ccesuffolk.org/

Bathymetric and Water-Quality Proxies for Eutrophication Risk in the Peconic Estuary

E. Feusi

2026-05-16