Hard Coral Cover — Bayesian Binomial BGLMM

HTHH eruption impact on Tongan reef hard coral

Published

April 29, 2026

Overview

This document fits a series of Bayesian Generalised Linear Mixed Models (BGLMMs) to post-tsunami total hard coral point count data from 104 replicated Tongan reef sites surveyed pre- (2017/2018) and post-eruption (2024) following the January 2022 Hunga Tonga–Hunga Ha’apai (HTHH) eruption.

Research question: What environmental drivers (wave exposure and distance from the eruption epicentre) predict post-eruption hard coral cover decline on Tongan reefs, after accounting for pre-eruption baseline cover?

Hard coral groups included (from lookup table classification_lookup_benthic_260226.csv, coarse_group == "hard coral"): branching acropora, corymbose and digitate acropora, table acropora, encrusting coral, massive coral, foliose coral, fungiidae, other branching coral, millepora. Total hard coral COUNT per transect is the sum of counts across all these groups.

Response variable: Post-eruption total hard coral point counts out of total benthic points per transect (COUNT | trials(TOTAL)), modelled with a binomial likelihood and logit link.

Pre-cover as variable intercept (0 + pre_cover_scaled): Pre-eruption hard coral cover anchors the model baseline. Unlike sargassum (63.9% zeros, mean = 0.17%), hard coral pre-cover is well distributed (mean = 16.0%, SD = 11.7%, only 1.7% zeros, max|z| = 3.86). No transformation is needed — raw z-scoring is appropriate and the variable intercept is ecologically meaningful: sites with high pre-eruption coral had more to lose.

Key difference from sargassum analysis: Hard coral cover declined post-eruption (pre mean = 16.0% → post mean = 8.2%), the opposite direction to sargassum. Priors are adjusted accordingly — coral is not a rare group so the intercept prior is centred higher than for sargassum.


1. Setup

Show code
library(MASS)         # glm.nb and other stats functions # MASS masks tidyverse - load before tidyverse to avoid select() conflict
library(tidyverse)    # data wrangling and plotting
library(brms)         # Bayesian regression models
library(tidybayes)    # tidy posterior draws
library(bayesplot)    # MCMC diagnostics and posterior plots
library(loo)          # LOO-CV model comparison
library(patchwork)    # combining ggplot panels
library(here)         # relative file paths
library(knitr)        # kable tables
library(future)       # parallel computation
library(broom)        # tidy model output
library(broom.mixed)  # tidy output for mixed models
library(rstan)        # Stan interface (stan_trace, stan_ac etc)
library(effects)      # partial effects plots
library(emmeans)      # estimating marginal means
library(DHARMa)       # residual diagnostics
library(flextable)

# Resolve any remaining namespace conflicts
select    <- dplyr::select
filter    <- dplyr::filter
mutate    <- dplyr::mutate
summarise <- dplyr::summarise

options(brms.backend = "cmdstanr")
theme_set(theme_bw(base_size = 11))
color_scheme_set("blue")
set.seed(42)

region_cols <- c(
  "Tongatapu" = "#C4A35A",
  "Haapai"    = "#7A3D38",
  "Vavau"     = "#6B5070"
)

# ── File paths ──────────────────────────────────────────────────────────────
doc_name <- "14_coral_BGLMM_analysis"

csv_path   <- here("output csvs", doc_name)
model_path <- here("models", doc_name)

dir.create(csv_path,   showWarnings = FALSE, recursive = TRUE)
dir.create(model_path, showWarnings = FALSE, recursive = TRUE)

#cat("Input:", data_path, "\n")
cat("CSVs:", csv_path, "\n")
CSVs: /Users/jc928558/Library/CloudStorage/GoogleDrive-lucy.k.southworth@gmail.com/My Drive/JCU (Google drive)/PhD/Thesis Chapters:papers/Thesis Chapters and related docs/CH1 - Benthic shifts/Ch1_Analysis/Reefcloud_data_analyses_ch1/output csvs/14_coral_BGLMM_analysis 
Show code
cat("Models:", model_path, "\n")
Models: /Users/jc928558/Library/CloudStorage/GoogleDrive-lucy.k.southworth@gmail.com/My Drive/JCU (Google drive)/PhD/Thesis Chapters:papers/Thesis Chapters and related docs/CH1 - Benthic shifts/Ch1_Analysis/Reefcloud_data_analyses_ch1/models/14_coral_BGLMM_analysis 

2. Data Preparation

2.1 Load data and join wave covariates

Loads the coarse-group benthic transect data and joins site-level wave model values from a separate CSV. The wave CSV contains three modelled maximum wave height estimates per site (GN, SV, PDC0 models). The join is by site name — check the output confirms zero missing wave values after joining.

Show code
# Coarse group benthic data (pre and post, all coarse groups)
dat_coarse <- read.csv(here("output csvs/08_replicated_sites_prepost_dist_delta_coarse_corrected_for_quarto/transect_coarsegroup_count_cov_withaddedzeros.csv"))

# Read wave model data — stored in covariate data folder
wave <- read.csv(here("covariate data",
                       "wave_model_extracted_values_110326.csv")) %>%
  mutate(site_country_region = factor(site_country_region,
                                       levels = c("Tongatapu", "Haapai", "Vavau")))

# Join wave values to coarse group data (site level — one value per site)
dat_coarse <- dat_coarse %>%
  left_join(
    wave %>% distinct(site,
                       HTHH_pdc_GN4km3_3600_etamax,
                       HTHH_pdc_SV4km3_etamax_end,
                       pdc0_4km3_fr1e2_etamax_end),
    by = "site"
  )

cat("Coarse group data:\n")
Coarse group data:
Show code
cat("  Rows:", nrow(dat_coarse), "\n")
  Rows: 6656 
Show code
cat("  Coarse groups:", paste(sort(unique(dat_coarse$coarse_group)),
                               collapse = ", "), "\n")
  Coarse groups: algae, consolidated substrate, EAM, hard coral, other invert, other nonbenthic, soft coral, unconsolidated substrate 
Show code
cat("  N sites:", length(unique(dat_coarse$site)), "\n")
  N sites: 104 
Show code
cat("  Wave join successful:",
    !any(is.na(dat_coarse$HTHH_pdc_GN4km3_3600_etamax)), "\n")
  Wave join successful: TRUE 
Show code
cat("  N sites missing wave data:",
    sum(is.na(dat_coarse$HTHH_pdc_GN4km3_3600_etamax) &
          dat_coarse$coarse_group == "hard coral" &
          dat_coarse$pre_post == "post"), "\n")
  N sites missing wave data: 0 

2.2 Filter to hard coral

Subsets the data to coarse_group == "hard coral". Unlike the morph-group dataset, the coarse CSV already has one row per transect per group — no summing across morph groups is needed. The summary table confirms the expected decline in mean cover from pre to post eruption.

Show code
dat_hc <- dat_coarse %>%
  filter(coarse_group == "hard coral") %>%
  mutate(
    pre_post = factor(pre_post, levels = c("pre", "post")),
    site_country_region = factor(site_country_region,
                                  levels = c("Tongatapu", "Haapai", "Vavau"))
  )

cat("Hard coral rows:", nrow(dat_hc), "\n")
Hard coral rows: 832 
Show code
cat("Pre:", sum(dat_hc$pre_post == "pre"),
    "| Post:", sum(dat_hc$pre_post == "post"), "\n")
Pre: 416 | Post: 416 
Show code
dat_hc %>%
  group_by(pre_post) %>%
  summarise(
    mean_cover = round(mean(COVER) * 100, 2),
    sd_cover   = round(sd(COVER) * 100, 2),
    max_cover  = round(max(COVER) * 100, 2),
    pct_zeros  = round(mean(COVER == 0) * 100, 1),
    .groups    = "drop"
  ) %>%
  kable(caption = "Hard coral cover summary by survey period (%)")
Hard coral cover summary by survey period (%)
pre_post mean_cover sd_cover max_cover pct_zeros
pre 15.98 11.73 61.29 1.7
post 8.19 9.58 46.00 0.0
Show code
# pre cover mean =15.98
# post cover mean = 8.19

2.3 Explore distributions

Three plots: (1) pre-eruption cover histogram — shows distribution shape and zero inflation; (2) post-eruption histogram — compare shape and mean to pre; (3) boxplots by region and period — shows whether regional patterns differ. Hard coral is expected to show a clear decline post-eruption across all regions, with possible variation in magnitude between regions.

Show code
pre_hc  <- dat_hc %>% filter(pre_post == "pre")
post_hc <- dat_hc %>% filter(pre_post == "post")

p1 <- ggplot(pre_hc, aes(x = COVER * 100)) +
  geom_histogram(bins = 40, fill = "steelblue", alpha = 0.8) +
  labs(title = "Pre-eruption hard coral cover",
       subtitle = paste0(round(mean(pre_hc$COVER == 0) * 100, 1),
                         "% zeros, mean = ",
                         round(mean(pre_hc$COVER) * 100, 1), "%"),
       x = "Cover (%)", y = "Count")

p2 <- ggplot(post_hc, aes(x = COVER * 100)) +
  geom_histogram(bins = 40, fill = "coral", alpha = 0.8) +
  labs(title = "Post-eruption hard coral cover",
       subtitle = paste0(round(mean(post_hc$COVER == 0) * 100, 1),
                         "% zeros, mean = ",
                         round(mean(post_hc$COVER) * 100, 1), "%"),
       x = "Cover (%)", y = "Count")

p3 <- dat_hc %>%
  ggplot(aes(y = COVER * 100, x = pre_post,
             fill = site_country_region)) +
  geom_boxplot(alpha = 0.7, outlier.size = 0.8) +
  scale_fill_manual(values = region_cols) +
  facet_wrap(~site_country_region, ncol = 3) +
  labs(title = "Hard coral cover by region and period",
       x = "Survey period", y = "Hard coral cover (%)") +
  theme(legend.position = "none",
        strip.background = element_blank(),
        strip.text = element_text(face = "bold"))

(p1 + p2) / p3

2.4 Pre-to-post trajectories

Each line connects the pre and post cover at a single transect. For hard coral, nearly all lines should slope downward — cover declined post-eruption. The spread of line lengths shows how much the magnitude of decline varied between transects. If lines fan out (some large declines, some small), it suggests site-level predictors may explain that variation — justifying inclusion of distance and wave height as fixed effects. Vava’u (the furthest region from the eruption)also increased marginally in coral cover so some lines will increase here.

Show code
dat_hc %>%
  mutate(
    transect_uid = paste(site, survey_transect_number, sep = "_")
  ) %>%
  ggplot(aes(y = COVER * 100, x = pre_post)) +
  geom_line(aes(group = transect_uid, colour = site_country_region),
            alpha = 0.3, linewidth = 0.4) +
  geom_point(aes(colour = site_country_region),
             alpha = 0.3, size = 0.8) +
  scale_colour_manual(values = region_cols) +
  labs(x = "Survey period", y = "Hard coral cover (%)",
       colour = "Region",
       title = "Pre to post hard coral trajectories by transect",
       subtitle = "Most transects declined — magnitude of decline varies by site") +
  theme(legend.position = "bottom")

2.5 Assess pre-cover variable / predictor

Data-driven assessment: Pre-coral cover has mean = 16.0%, SD = 11.7%, only 1.7% zeros, and max|z| = 3.86 when raw z-scored. This is a well-behaved continuous distribution — unlike sargassum (63.9% zeros, max|z| = 8.63), no transformation is needed. Raw z-scoring is used directly.

Show code
pre_cover_vals <- pre_hc$COVER

raw_z <- scale(pre_cover_vals)

cat("Pre hard coral cover statistics:\n")
Pre hard coral cover statistics:
Show code
cat(sprintf("  mean  = %.4f (%.1f%%)\n", mean(pre_cover_vals),
            mean(pre_cover_vals) * 100))
  mean  = 0.1598 (16.0%)
Show code
cat(sprintf("  SD    = %.4f (%.1f%%)\n", sd(pre_cover_vals),
            sd(pre_cover_vals) * 100))
  SD    = 0.1173 (11.7%)
Show code
cat(sprintf("  max   = %.4f (%.1f%%)\n", max(pre_cover_vals),
            max(pre_cover_vals) * 100))
  max   = 0.6129 (61.3%)
Show code
cat(sprintf("  zeros = %d / %d (%.1f%%)\n",
            sum(pre_cover_vals == 0), length(pre_cover_vals),
            mean(pre_cover_vals == 0) * 100))
  zeros = 7 / 416 (1.7%)
Show code
cat(sprintf("  max|z| raw = %.2f — acceptable, no transformation needed\n", # the max scaled (z-score) pre-cover value is only just above 3 standard deviations above the mean - this is acceptable and wont affect your model so no need to transform this pre cover predictor
            max(abs(raw_z))))
  max|z| raw = 3.86 — acceptable, no transformation needed
Show code
ggplot(data.frame(x = pre_cover_vals * 100), aes(x = x)) +
  geom_histogram(bins = 40, fill = "steelblue", alpha = 0.8) +
  labs(title = "Pre-cover distribution — hard coral",
       subtitle = paste0("mean = ", round(mean(pre_cover_vals) * 100, 1),
                         "%, SD = ", round(sd(pre_cover_vals) * 100, 1),
                         "%, max|z| = ", round(max(abs(raw_z)), 2),
                         " — raw z-score used directly (no transformation needed)"),
       x = "Pre-eruption hard coral cover (%)", y = "Count")

2.6 Sites with greatest hard coral decline

Summarises post-eruption cover at the site level (mean across 4 transects) and calculates delta = post - pre. Most negative delta = greatest decline. Distance and wave height are included to visually inspect whether sites with the largest declines tend to be closer to or more wave-exposed to the volcano. This is a descriptive check before modelling — not a statistical test.

Show code
# Calculate mean pre-eruption coral cover per site from pre-survey transects
pre_site_hc <- dat_hc %>%
  filter(pre_post == "pre") %>%          # keep only pre-eruption rows
  group_by(site) %>%                     # group by site
  summarise(pre_mean_cov = mean(COVER),  # mean pre cover across transects within site
            .groups = "drop")            # ungroup after summarise

dat_hc %>%
  filter(pre_post == "post") %>%         # keep only post-eruption rows
  group_by(site, site_country_region, year,
           dist_km, pdc0_4km3_fr1e2_etamax_end) %>%  # group by site and covariates
  summarise(
    post_mean = mean(COVER),             # mean post-eruption cover per site
    post_max  = max(COVER),              # max post-eruption cover per site
    n         = n(),                     # number of transects per site
    .groups   = "drop"
  ) %>%
  left_join(pre_site_hc, by = "site") %>%  # join pre-cover means onto post data
  mutate(delta = post_mean - pre_mean_cov) %>%  # calculate change: negative = decline. this is essentially what you need for the map deltas per site
  arrange(delta) %>%                     # sort ascending — most negative delta first
  head(20) %>%                           # keep top 20 sites with greatest decline
  mutate(across(where(is.numeric), ~ round(.x, 3))) %>%  # round all numeric columns
  select(site, site_country_region, year, dist_km,
         pdc0_4km3_fr1e2_etamax_end,
         pre_mean_cov, post_mean, delta) %>%   # keep relevant columns only
  rename(region   = site_country_region,
         wave_PDC0  = pdc0_4km3_fr1e2_etamax_end,  # NOTE:— PDC0 was best model for hard coral
        pre_cov_mean  = pre_mean_cov,
         post_cov_mean = post_mean) %>%
  kable(caption = "Sites with greatest hard coral decline (most negative delta)")
Sites with greatest hard coral decline (most negative delta)
site region year dist_km wave_PDC0 pre_cov_mean post_cov_mean delta
Fafa control close Tongatapu 2024 64.278 1.515 0.473 0.007 -0.466
Sandy Beach Bommie Haapai 2024 147.386 1.184 0.411 0.010 -0.401
Sopu Far Tongatapu 2024 61.771 1.004 0.444 0.060 -0.384
Atata SMA 3 Tongatapu 2024 58.762 1.968 0.422 0.045 -0.377
Uiha FHR Town Haapai 2024 125.273 0.222 0.361 0.034 -0.327
Nomuka Town 2 Haapai 2024 69.103 2.459 0.336 0.014 -0.322
Atata SMA 1 Tongatapu 2024 58.250 1.832 0.307 0.012 -0.295
Atata SMA 2 Tongatapu 2024 58.281 1.845 0.288 0.032 -0.256
Nomuka FHR 1 Haapai 2024 69.786 1.637 0.259 0.009 -0.249
Uiha FHR Far Haapai 2024 125.423 0.211 0.284 0.042 -0.242
Felemea FHR 3 Bommie Haapai 2024 123.598 0.342 0.248 0.010 -0.239
Mango SMA Far Haapai 2024 73.474 1.925 0.314 0.075 -0.238
Nomuka FHR 2 Haapai 2024 69.774 1.637 0.259 0.039 -0.220
Nomuka Iki outside 1 Haapai 2024 67.038 3.270 0.237 0.023 -0.214
Nomuka Town 1 Haapai 2024 69.123 2.459 0.212 0.007 -0.205
Kelefesia 2 Haapai 2024 67.598 1.113 0.211 0.008 -0.202
Kelefesia 1 Haapai 2024 67.579 1.028 0.197 0.002 -0.195
Atata FHR 3 Tongatapu 2024 57.398 3.320 0.205 0.010 -0.195
Atata FHR 2 Tongatapu 2024 57.402 3.320 0.206 0.015 -0.191
Mango FHR Close Haapai 2024 74.708 1.532 0.217 0.041 -0.177

2.7 Build post-only dataset with nested IDs

Creates the modelling dataset. We model post-eruption counts only, using pre-cover as a fixed effect (variable intercept). Steps: (1) extract pre-cover per transect from pre rows; (2) join to post rows so each post transect has its matching pre-cover; (3) create manual nested IDs for the three-level random effects structure. Nested IDs are built manually (region_num_site_num) rather than relying on site names, which gives clean independent variance components at each spatial level. All continuous predictors are z-scored so coefficients are on a comparable scale.

Show code
# Extract pre-cover per transect
pre_cover_transect_hc <- dat_hc %>%
  filter(pre_post == "pre") %>%
  select(site, survey_transect_number, pre_cover = COVER)

cat("Pre-cover extracted for", nrow(pre_cover_transect_hc), "transects\n")
Pre-cover extracted for 416 transects
Show code
cat("Zeros:", sum(pre_cover_transect_hc$pre_cover == 0), "/",
    nrow(pre_cover_transect_hc),
    "(", round(mean(pre_cover_transect_hc$pre_cover == 0) * 100, 1), "%)\n")
Zeros: 7 / 416 ( 1.7 %)
Show code
# Region key for nested IDs
region_key <- tibble(
  site_country_region = c("Tongatapu", "Haapai", "Vavau"),
  region_num          = c(1, 2, 3)
)

# Build post-only dataset
coral <- dat_hc %>%
  filter(pre_post == "post") %>%
  left_join(pre_cover_transect_hc,
            by = c("site", "survey_transect_number")) %>%
  left_join(region_key, by = "site_country_region") %>%
  group_by(site_country_region) %>%
  mutate(site_num = as.numeric(factor(site))) %>% # create site number IDs per region
  ungroup() %>%
  mutate(
    # Pre-cover: raw z-score — no transformation needed
    # mean = 16%, SD = 11.7%, max|z| = 3.86, only 1.7% zeros
    pre_cover_scaled = as.numeric(scale(pre_cover)),

    # Environmental predictors: z-scored
    dist_scaled     = as.numeric(scale(dist_km)),
    depth_scaled    = as.numeric(scale(survey_depth)),
    wave_GN_scaled  = as.numeric(scale(HTHH_pdc_GN4km3_3600_etamax)),
    wave_SV_scaled  = as.numeric(scale(HTHH_pdc_SV4km3_etamax_end)),
    wave_PDC0_scaled = as.numeric(scale(pdc0_4km3_fr1e2_etamax_end)),

    # Manual nested IDs
    region_id   = factor(region_num),
    site_id     = factor(paste(region_num, site_num, sep = "_")),
    transect_id = factor(paste(region_num, site_num,
                                survey_transect_number, sep = "_")),

    site_country_region = factor(site_country_region,
                                  levels = c("Tongatapu", "Haapai", "Vavau"))
  )

# store mean and SD for back-transformation later
wave_mean <- mean(coral$pdc0_4km3_fr1e2_etamax_end)
wave_sd   <- sd(coral$pdc0_4km3_fr1e2_etamax_end)

cat("\nPost dataset:\n")

Post dataset:
Show code
cat("  Rows:", nrow(coral), "\n")
  Rows: 416 
Show code
cat("  N region levels:", length(unique(coral$region_id)), "\n")
  N region levels: 3 
Show code
cat("  N site levels:", length(unique(coral$site_id)), "\n")
  N site levels: 104 
Show code
cat("  N transect levels:", length(unique(coral$transect_id)), "\n")
  N transect levels: 416 
Show code
cat("  pre_cover_scaled range:",
    round(range(coral$pre_cover_scaled), 2), "\n")
  pre_cover_scaled range: -1.36 3.86 
Show code
cat("  max |z|:", round(max(abs(coral$pre_cover_scaled)), 2), "\n")
  max |z|: 3.86 
Show code
cat("  Mean TOTAL:", round(mean(coral$TOTAL)), "\n")
  Mean TOTAL: 1859 
Show code
cat("  Post cover zeros:", sum(coral$COVER == 0), "/", nrow(coral), "\n")
  Post cover zeros: 0 / 416 
Show code
write.csv(coral, file.path(csv_path, "coral_postdat.csv"), row.names = FALSE)
cat("\nSaved:", file.path(csv_path, "coral_postdat.csv"), "\n")

Saved: /Users/jc928558/Library/CloudStorage/GoogleDrive-lucy.k.southworth@gmail.com/My Drive/JCU (Google drive)/PhD/Thesis Chapters:papers/Thesis Chapters and related docs/CH1 - Benthic shifts/Ch1_Analysis/Reefcloud_data_analyses_ch1/output csvs/14_coral_BGLMM_analysis/coral_postdat.csv 

2.8 Covariate correlations

Checks for multicollinearity between predictors. High correlation between wave models (r > 0.9) means they are measuring essentially the same spatial pattern — only one should be used. Correlation between wave and distance reflects geography: Tongatapu sites are close to the volcano AND experienced high wave heights. If r > 0.7 between distance and wave, their independent effects will be difficult to separate in the model.

Show code
coral %>%
  select(pre_cover_scaled, dist_scaled,
         wave_GN_scaled, wave_SV_scaled, wave_PDC0_scaled) %>%
  cor() %>%
  round(3) %>%
  kable(caption = "Correlations between scaled predictors")
Correlations between scaled predictors
pre_cover_scaled dist_scaled wave_GN_scaled wave_SV_scaled wave_PDC0_scaled
pre_cover_scaled 1.000 -0.375 -0.016 0.034 0.052
dist_scaled -0.375 1.000 -0.479 -0.550 -0.565
wave_GN_scaled -0.016 -0.479 1.000 0.968 0.959
wave_SV_scaled 0.034 -0.550 0.968 1.000 0.973
wave_PDC0_scaled 0.052 -0.565 0.959 0.973 1.000
Show code
# visualise
coral %>%
  select(pre_cover_scaled, dist_scaled, wave_PDC0_scaled) %>%
  pairs()

Distance and wave are moderately correlated (r = −0.48 to −0.57 across wave models) — sites closer to the volcano received higher wave exposure. This is ecologically expected, but means distance and wave are not fully independent predictors. This collinearity likely contributes to the wide credible interval on dist_scaled in M2 and M3 once wave is included — the two predictors are competing to explain overlapping variance.

Distance and pre-cover are moderately correlated (r = −0.375) — sites closer to the volcano had higher pre-eruption coral cover. Meaning distance, pre-cover and wave are all partially collinear, which may contribute to the non-significant pre-cover coefficient in M3.

Pre-cover and wave are weakly correlated (r = 0.05 for PDC0) — negligible collinearity between these two predictors.


3. Prior Specification

McElreath (2020): Priors should reflect genuine prior knowledge. Hard coral is a dominant benthic group on healthy Pacific reefs ~ 11% Priors adjusted accordingly.

3.1 Prior rationale for hard coral

Key differences from sargassum priors:

With 0 + pre_cover_scaled there is no free intercept — the pre_cover coefficient anchors the baseline. For hard coral:

  • Pre-cover is well distributed (mean 16%, max 61%) — the coefficient represents a genuine slope relationship between pre and post cover. normal(0, 1) is appropriate — a coefficient of 1 means a 1 SD increase in pre-cover is associated with a 1 logit unit increase in post-cover, which is a moderate to large effect
  • Distance and wave effects: normal(0, 1) — same as sargassum
  • Random effect SDs: student_t(3, 0, 0.5) — same as sargassum, constrains to moderate between-site variation. Hard coral has more sites with non-zero cover so this may be slightly less restrictive than needed — check prior predictive check

3.2 Check default priors

Shows what brms would use if no priors were specified. The flat student_t default for fixed effects is too vague — it allows implausibly large coefficients. Checking defaults here justifies why we set our own priors rather than accepting the defaults. The key thing to note is that with 0 + pre_cover_scaled, the pre-cover coefficient has class = "b" not class = "Intercept" — this matters for prior specification.

Show code
cat("Default priors — null model:\n")
Default priors — null model:
Show code
bf(COUNT | trials(TOTAL) ~ 1 +
     (1 | region_id) + (1 | site_id) + (1 | transect_id)) |>
  get_prior(data = coral, family = binomial()) |>
  print()
                prior     class      coef       group resp dpar nlpar lb ub
 student_t(3, 0, 2.5) Intercept                                            
 student_t(3, 0, 2.5)        sd                                        0   
 student_t(3, 0, 2.5)        sd             region_id                  0   
 student_t(3, 0, 2.5)        sd Intercept   region_id                  0   
 student_t(3, 0, 2.5)        sd               site_id                  0   
 student_t(3, 0, 2.5)        sd Intercept     site_id                  0   
 student_t(3, 0, 2.5)        sd           transect_id                  0   
 student_t(3, 0, 2.5)        sd Intercept transect_id                  0   
       source
      default
      default
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
Show code
cat("\nDefault priors — 0 + pre_cover model:\n")

Default priors — 0 + pre_cover model:
Show code
bf(COUNT | trials(TOTAL) ~ 0 + pre_cover_scaled +
     (1 | region_id) + (1 | site_id) + (1 | transect_id)) |>
  get_prior(data = coral, family = binomial()) |>
  print()
                prior class             coef       group resp dpar nlpar lb ub
               (flat)     b                                                   
               (flat)     b pre_cover_scaled                                  
 student_t(3, 0, 2.5)    sd                                               0   
 student_t(3, 0, 2.5)    sd                    region_id                  0   
 student_t(3, 0, 2.5)    sd        Intercept   region_id                  0   
 student_t(3, 0, 2.5)    sd                      site_id                  0   
 student_t(3, 0, 2.5)    sd        Intercept     site_id                  0   
 student_t(3, 0, 2.5)    sd                  transect_id                  0   
 student_t(3, 0, 2.5)    sd        Intercept transect_id                  0   
       source
      default
 (vectorized)
      default
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)

These are the default priors for the null model: student_t(3, 0, 2.5) Intercept # global intercept student_t(3, 0, 2.5) sd # random effect SDs

or.. priors_null <- c( prior(student_t(3, 0, 2.5), class = “Intercept”), # global intercept prior(student_t(3, 0, 2.5), class = “sd”) # random effect SDs )

These are the default priors for the variable intercept model: NB: no Intercept row at all - pre_cover_scaled takes its place as a b class coefficient and controls the baseline predictions

(flat) b # fixed effects including pre_cover_scaled - (flat) for the b class (fixed effects) — meaning completely uninformative, any value from −∞ to +∞ is equally plausible. student_t(3, 0, 2.5) sd # random effect SDs

or.. priors_full <- c( prior(flat(), class = “b”), # fixed effects including pre_cover_scaled prior(student_t(3, 0, 2.5), class = “sd”) )

Notes on variable intercept model priors:

Moved fixed effects prior (including precover) to normal(0, 1): This is the standard weakly informative prior for logistic regression coefficients recommended by Gelman, McElreath, and the Stan team. It says “effects are probably moderate in size” — a coefficient of ±2 on the logit scale is already a large effect (corresponds to roughly a 4-fold change in odds), so normal(0, 1) is genuinely weakly informative while still providing enough regularisation to keep sampling stable.

The normal(0, 1) on b centres predictions at 50% cover for a site with mean pre-cover (z scaled = 0), which is why the M1 prior predictive looks wide regardless of how much it is tighetened (see pp checks below).

https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations

For pre_cover_scaled — the normal(0, 1) is acting as the variable intercept prior. The plogis back-transformations output are showing what range of post-eruption cover the prior considers plausible at mean pre-cover (z = 0), which is 50%, this is wide.

For dist_scaled and wave_PDC0_scaled — the same normal(0, 1) prior applies but the interpretation is different. Here it’s saying a 1 SD change in distance or wave height shifts the logit by ±2 at most, corresponding to cover changing from ~12% to ~88%. This is a reasonable weakly informative prior for a slope coefficient — it’s not saying we expect large effects, just that we’re not ruling them out.

For sd terms — student_t(3, 0, 0.5) is a separate prior controlling between-group variance at region, site and transect levels, not shown in the plogis back-transformations. ## 3.3 Define priors

Two prior sets: one for M0 (which has a free intercept) and one for M1–M4 (which use 0 + pre_cover_scaled). The printed plogis values translate the logit-scale priors into cover percentages so you can assess whether the prior range is biologically reasonable. For example normal(0, 1) for the pre_cover coefficient means a 1 SD change in pre-cover shifts expected post-cover by up to ±88% on the probability scale — wide but not absurd.

Show code
# Null model
priors_null <- c(
  prior(normal(-2, 1), class = "Intercept"), #  tightened the global intercept to normal(-2, 1) to centre predictions at ~11% cover — more ecologically appropriate compared to the default prior.
  prior(student_t(3, 0, 0.5), class = "sd")  # tightened the random effects sd to student_t(3, 0, 0.5) to constrain between-group variation to a more realistic range.
)

# Models 1-4: 0 + pre_cover_scaled as variable intercept
# normal(0, 1) for all fixed effects (pre_cover_scaled, dist, wave):
#   Hard coral is not rare — a moderate slope between pre and post cover
#   is expected. plogis(0 ± 2) = 12% to 88%, covers plausible range.
#   Same prior applied to dist and wave — moderate expected effects.
# student_t(3, 0, 0.5) for SDs: constrains between-group variation

priors_full <- c(
  prior(normal(0, 1), class = "b"), # changed this from the flat prior - to the standard weakly informative prior for logistic regression coefficients - suggested by McElreath, Gelman and the Stan team
  prior(student_t(3, 0, 0.5), class = "sd") # same as null - tightened the random effects sd to student_t(3, 0, 0.5) to constrain between-group variation to a more realistic range.
)

cat("Priors:\n")
Priors:
Show code
cat("  pre_cover_scaled: normal(0, 1)\n")
  pre_cover_scaled: normal(0, 1)
Show code
cat("    plogis(-2) =", round(plogis(-2) * 100, 1), "% cover\n")
    plogis(-2) = 11.9 % cover
Show code
cat("    plogis( 0) =", round(plogis( 0) * 100, 1), "% cover\n")
    plogis( 0) = 50 % cover
Show code
cat("    plogis(+2) =", round(plogis(+2) * 100, 1), "% cover\n")
    plogis(+2) = 88.1 % cover
Show code
cat("  dist, wave: normal(0, 1)\n")
  dist, wave: normal(0, 1)
Show code
cat("  SD: student_t(3, 0, 0.5)\n")
  SD: student_t(3, 0, 0.5)
Show code
# Define priors for M5:
priors_m5 <- c(
  # Global intercept: centred at logit(-2) = ~11% cover, consistent with
  # observed post-eruption mean (~8%). SD = 1 allows range of ~1-50% cover
  # at ±2SD. Same prior used for M0 null model intercept.
  prior(normal(-2, 1), class = "Intercept"),
  
  # Fixed effects (dist_scaled, pre_cover_scaled, wave_PDC0_scaled and their
  # interaction): normal(0, 1) is weakly informative on the logit scale —
  # plogis(±2) = 12-88% cover, allows moderate to large effects without
  # forcing extreme predictions. Consistent with M1-M4 priors_full.
  prior(normal(0, 1),  class = "b"),
  
  # Random effect SDs (region, site, transect): student_t(3, 0, 0.5) 
  # constrains between-group variance to realistic range. Heavy tails
  # allow for larger variance if data support it. Lower bound = 0
  # enforced automatically as SDs cannot be negative.
  prior(student_t(3, 0, 0.5), class = "sd")
)

3.4 Prior predictive check

Fits the null model using priors only — no data. This generates hypothetical datasets from the prior distribution alone and lets us ask: do these priors generate biologically plausible hard coral counts before seeing any data?

What to look for in the plots: - Density overlay: Light blue lines (prior predictions) should span a plausible range. Not all clustered at zero and not uniformly flat across the entire count range. - Mean count: The observed mean (dark bar) should sit within the prior predictive distribution — not in the extreme tail. - Max count: Prior should be capable of generating values at least as large as the observed maximum. - Proportion zeros: Prior should generate a similar proportion of zero counts to what is observed. Hard coral has very few zeros (~0%) so the prior should not predict excess zeros.

Show code
m_prior <- brm(
  formula      = bf(COUNT | trials(TOTAL) ~ 1 +
                      (1 | region_id) + (1 | site_id) + (1 | transect_id)),
  data         = coral,
  family       = binomial(link = "logit"),
  prior        = priors_null,
  sample_prior = "only",
  chains       = 4,
  iter         = 3000,
  warmup       = 1000,
  cores        = 4,
  seed         = 42,
  backend      = "cmdstanr",
  refresh      = 500
)
Running MCMC with 4 parallel chains...

Chain 1 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 1 Iteration:  500 / 3000 [ 16%]  (Warmup) 
Chain 2 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 2 Iteration:  500 / 3000 [ 16%]  (Warmup) 
Chain 3 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 3 Iteration:  500 / 3000 [ 16%]  (Warmup) 
Chain 4 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 1 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
Chain 1 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
Chain 2 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
Chain 2 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
Chain 4 Iteration:  500 / 3000 [ 16%]  (Warmup) 
Chain 3 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
Chain 3 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
Chain 4 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
Chain 4 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
Chain 1 Iteration: 1500 / 3000 [ 50%]  (Sampling) 
Chain 2 Iteration: 1500 / 3000 [ 50%]  (Sampling) 
Chain 3 Iteration: 1500 / 3000 [ 50%]  (Sampling) 
Chain 4 Iteration: 1500 / 3000 [ 50%]  (Sampling) 
Chain 1 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
Chain 2 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
Chain 3 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
Chain 4 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
Chain 1 Iteration: 2500 / 3000 [ 83%]  (Sampling) 
Chain 2 Iteration: 2500 / 3000 [ 83%]  (Sampling) 
Chain 3 Iteration: 2500 / 3000 [ 83%]  (Sampling) 
Chain 4 Iteration: 2500 / 3000 [ 83%]  (Sampling) 
Chain 1 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 1 finished in 1.8 seconds.
Chain 2 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 3 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 2 finished in 1.8 seconds.
Chain 3 finished in 1.8 seconds.
Chain 4 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 4 finished in 1.9 seconds.

All 4 chains finished successfully.
Mean chain execution time: 1.8 seconds.
Total execution time: 2.1 seconds.
Show code
set.seed(42)
prop_zero <- function(x) mean(x == 0)

p1 <- pp_check(m_prior, type = "dens_overlay", ndraws = 100) +
  labs(title = "Prior predictive: density",
       subtitle = "Light = prior predictions, dark = observed (reference only)")
p2 <- pp_check(m_prior, type = "stat", stat = "mean") +
  labs(title = "Prior predictive: mean count",
       subtitle = paste0("Observed mean = ", round(mean(coral$COUNT))))
p3 <- pp_check(m_prior, type = "stat", stat = "max") +
  labs(title = "Prior predictive: max count",
       subtitle = paste0("Observed max = ", max(coral$COUNT)))
p4 <- pp_check(m_prior, type = "stat", stat = prop_zero) +
  labs(title = "Prior predictive: proportion zeros",
       subtitle = paste0("Observed = ", round(mean(coral$COUNT == 0), 3)))

(p1 + p2) / (p3 + p4) +
  plot_annotation(title = "Prior predictive checks — hard coral null model")

Mean count: After lowering the intercept to normal(-2, 1), the observed mean (155) sits near the mode of the prior predictive distribution. Earlier iterations with normal(0, 1.5) allowed unrealistically high cover scenarios with the distribution skewed far to the right — the current prior is much more ecologically grounded.

Max count: The prior still allows max counts up to ~3500, but the observed max (972) now sits near the left shoulder rather than being buried in the left tail. The binomial is bounded by TOTAL (~1859 points) so counts above that are impossible — the remaining right tail reflects prior uncertainty about extreme-cover transects, which is acceptable at this stage.

Proportion zeros: The prior allows up to ~0.5 zero-count transects whereas the observed proportion is ~0. This is not a problem for the null model, but worth monitoring in M1 — sites with very low pre-cover could legitimately have zero post-cover, so the prior is not unreasonable. If the posterior PPC over-predicts zeros in M1+, revisit. ## 3.5 Prior predictive check — M1 (variable intercept)

Checks whether priors_full produce biologically plausible hard coral counts under the M1 formula (0 + pre_cover_scaled). This is necessary because M1 removes the global intercept — predictions are now anchored entirely to pre_cover_scaled, so the normal(0, 1) prior on class = "b" is doing different work than the Intercept prior in M0. In particular, the extreme end of the pre-cover distribution (max|z| = 3.86) could push predictions toward implausibly high cover if the prior is too wide. This check confirms the prior is reasonable before committing to the full MCMC run.

What to look for: The same four panels as 3.4. The observed mean (155) and max (972) should sit comfortably within — not in the tail of — the prior predictive distributions. The proportion-zeros panel should remain low. Compare directly to 3.4 — if the distributions look similar, the change from intercept to variable-intercept parameterisation has not introduced implausible prior predictions.

Show code
m1_prior <- brm(
  formula      =  bf(COUNT | trials(TOTAL) ~ 0 + pre_cover_scaled +
            (1 | region_id) + (1 | site_id) + (1 | transect_id)),
  data         = coral,
  family       = binomial(link = "logit"),
  prior        = priors_full,
  sample_prior = "only",
  chains       = 4,
  iter         = 3000,
  warmup       = 1000,
  cores        = 4,
  seed         = 42,
  backend      = "cmdstanr",
  refresh      = 500
)
Running MCMC with 4 parallel chains...

Chain 1 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 1 Iteration:  500 / 3000 [ 16%]  (Warmup) 
Chain 2 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 2 Iteration:  500 / 3000 [ 16%]  (Warmup) 
Chain 3 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 3 Iteration:  500 / 3000 [ 16%]  (Warmup) 
Chain 4 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 1 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
Chain 1 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
Chain 2 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
Chain 2 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
Chain 4 Iteration:  500 / 3000 [ 16%]  (Warmup) 
Chain 3 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
Chain 3 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
Chain 4 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
Chain 4 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
Chain 1 Iteration: 1500 / 3000 [ 50%]  (Sampling) 
Chain 2 Iteration: 1500 / 3000 [ 50%]  (Sampling) 
Chain 3 Iteration: 1500 / 3000 [ 50%]  (Sampling) 
Chain 4 Iteration: 1500 / 3000 [ 50%]  (Sampling) 
Chain 1 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
Chain 2 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
Chain 3 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
Chain 4 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
Chain 1 Iteration: 2500 / 3000 [ 83%]  (Sampling) 
Chain 2 Iteration: 2500 / 3000 [ 83%]  (Sampling) 
Chain 3 Iteration: 2500 / 3000 [ 83%]  (Sampling) 
Chain 4 Iteration: 2500 / 3000 [ 83%]  (Sampling) 
Chain 1 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 2 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 4 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 1 finished in 1.9 seconds.
Chain 2 finished in 1.9 seconds.
Chain 4 finished in 1.9 seconds.
Chain 3 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 3 finished in 2.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 1.9 seconds.
Total execution time: 2.2 seconds.
Show code
set.seed(42)
p1 <- pp_check(m1_prior, type = "dens_overlay", ndraws = 100) +
  labs(title = "Prior predictive: density — M1",
       subtitle = "Light = prior predictions, dark = observed (reference only)")
p2 <- pp_check(m1_prior, type = "stat", stat = "mean") +
  labs(title = "Prior predictive: mean count — M1",
       subtitle = paste0("Observed mean = ", round(mean(coral$COUNT))))
p3 <- pp_check(m1_prior, type = "stat", stat = "max") +
  labs(title = "Prior predictive: max count — M1",
       subtitle = paste0("Observed max = ", max(coral$COUNT)))
p4 <- pp_check(m1_prior, type = "stat", stat = prop_zero) +
  labs(title = "Prior predictive: proportion zeros — M1",
       subtitle = paste0("Observed = ", round(mean(coral$COUNT == 0), 3)))

(p1 + p2) / (p3 + p4) +
  plot_annotation(title = "Prior predictive checks — M1 variable intercept")

Show code
m1_prior |>
  conditional_effects(conditions = data.frame(TOTAL = 100)) |>
  plot(points = TRUE) #  prior predictive conditional effects plot showing what the normal(0, 1) prior on pre_cover_scaled considers plausible before seeing any data.

If the distributions look implausible (e.g. mean count distribution shifted far right, or extreme pre-cover sites pushing predictions to near 100%), tighten the pre_cover_scaled prior from normal(0, 1) to normal(0, 0.5) and recheck before fitting M1–M4.

Show code
m5_prior <- brm(
  formula = bf(COUNT | trials(TOTAL) ~ 1 + dist_scaled +
                 pre_cover_scaled * wave_PDC0_scaled +
                 (1 | region_id) + (1 | site_id) + (1 | transect_id)),
  data         = coral,
  family       = binomial(link = "logit"),
  prior        = priors_m5,
  sample_prior = "only",
  chains = 4, iter = 3000, warmup = 1000,
  cores = 4, seed = 42, backend = "cmdstanr"
)
Running MCMC with 4 parallel chains...

Chain 1 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 1 Iteration:  100 / 3000 [  3%]  (Warmup) 
Chain 1 Iteration:  200 / 3000 [  6%]  (Warmup) 
Chain 1 Iteration:  300 / 3000 [ 10%]  (Warmup) 
Chain 1 Iteration:  400 / 3000 [ 13%]  (Warmup) 
Chain 2 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 2 Iteration:  100 / 3000 [  3%]  (Warmup) 
Chain 2 Iteration:  200 / 3000 [  6%]  (Warmup) 
Chain 2 Iteration:  300 / 3000 [ 10%]  (Warmup) 
Chain 2 Iteration:  400 / 3000 [ 13%]  (Warmup) 
Chain 2 Iteration:  500 / 3000 [ 16%]  (Warmup) 
Chain 2 Iteration:  600 / 3000 [ 20%]  (Warmup) 
Chain 2 Iteration:  700 / 3000 [ 23%]  (Warmup) 
Chain 3 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 3 Iteration:  100 / 3000 [  3%]  (Warmup) 
Chain 3 Iteration:  200 / 3000 [  6%]  (Warmup) 
Chain 3 Iteration:  300 / 3000 [ 10%]  (Warmup) 
Chain 3 Iteration:  400 / 3000 [ 13%]  (Warmup) 
Chain 3 Iteration:  500 / 3000 [ 16%]  (Warmup) 
Chain 3 Iteration:  600 / 3000 [ 20%]  (Warmup) 
Chain 3 Iteration:  700 / 3000 [ 23%]  (Warmup) 
Chain 4 Iteration:    1 / 3000 [  0%]  (Warmup) 
Chain 4 Iteration:  100 / 3000 [  3%]  (Warmup) 
Chain 4 Iteration:  200 / 3000 [  6%]  (Warmup) 
Chain 4 Iteration:  300 / 3000 [ 10%]  (Warmup) 
Chain 4 Iteration:  400 / 3000 [ 13%]  (Warmup) 
Chain 4 Iteration:  500 / 3000 [ 16%]  (Warmup) 
Chain 4 Iteration:  600 / 3000 [ 20%]  (Warmup) 
Chain 1 Iteration:  500 / 3000 [ 16%]  (Warmup) 
Chain 1 Iteration:  600 / 3000 [ 20%]  (Warmup) 
Chain 1 Iteration:  700 / 3000 [ 23%]  (Warmup) 
Chain 1 Iteration:  800 / 3000 [ 26%]  (Warmup) 
Chain 1 Iteration:  900 / 3000 [ 30%]  (Warmup) 
Chain 1 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
Chain 1 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
Chain 2 Iteration:  800 / 3000 [ 26%]  (Warmup) 
Chain 2 Iteration:  900 / 3000 [ 30%]  (Warmup) 
Chain 2 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
Chain 2 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
Chain 3 Iteration:  800 / 3000 [ 26%]  (Warmup) 
Chain 3 Iteration:  900 / 3000 [ 30%]  (Warmup) 
Chain 3 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
Chain 3 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
Chain 4 Iteration:  700 / 3000 [ 23%]  (Warmup) 
Chain 4 Iteration:  800 / 3000 [ 26%]  (Warmup) 
Chain 4 Iteration:  900 / 3000 [ 30%]  (Warmup) 
Chain 1 Iteration: 1100 / 3000 [ 36%]  (Sampling) 
Chain 2 Iteration: 1100 / 3000 [ 36%]  (Sampling) 
Chain 4 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
Chain 4 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
Chain 1 Iteration: 1200 / 3000 [ 40%]  (Sampling) 
Chain 1 Iteration: 1300 / 3000 [ 43%]  (Sampling) 
Chain 2 Iteration: 1200 / 3000 [ 40%]  (Sampling) 
Chain 3 Iteration: 1100 / 3000 [ 36%]  (Sampling) 
Chain 3 Iteration: 1200 / 3000 [ 40%]  (Sampling) 
Chain 4 Iteration: 1100 / 3000 [ 36%]  (Sampling) 
Chain 1 Iteration: 1400 / 3000 [ 46%]  (Sampling) 
Chain 2 Iteration: 1300 / 3000 [ 43%]  (Sampling) 
Chain 3 Iteration: 1300 / 3000 [ 43%]  (Sampling) 
Chain 4 Iteration: 1200 / 3000 [ 40%]  (Sampling) 
Chain 4 Iteration: 1300 / 3000 [ 43%]  (Sampling) 
Chain 1 Iteration: 1500 / 3000 [ 50%]  (Sampling) 
Chain 2 Iteration: 1400 / 3000 [ 46%]  (Sampling) 
Chain 2 Iteration: 1500 / 3000 [ 50%]  (Sampling) 
Chain 3 Iteration: 1400 / 3000 [ 46%]  (Sampling) 
Chain 3 Iteration: 1500 / 3000 [ 50%]  (Sampling) 
Chain 4 Iteration: 1400 / 3000 [ 46%]  (Sampling) 
Chain 1 Iteration: 1600 / 3000 [ 53%]  (Sampling) 
Chain 1 Iteration: 1700 / 3000 [ 56%]  (Sampling) 
Chain 2 Iteration: 1600 / 3000 [ 53%]  (Sampling) 
Chain 3 Iteration: 1600 / 3000 [ 53%]  (Sampling) 
Chain 4 Iteration: 1500 / 3000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1800 / 3000 [ 60%]  (Sampling) 
Chain 2 Iteration: 1700 / 3000 [ 56%]  (Sampling) 
Chain 2 Iteration: 1800 / 3000 [ 60%]  (Sampling) 
Chain 3 Iteration: 1700 / 3000 [ 56%]  (Sampling) 
Chain 4 Iteration: 1600 / 3000 [ 53%]  (Sampling) 
Chain 4 Iteration: 1700 / 3000 [ 56%]  (Sampling) 
Chain 1 Iteration: 1900 / 3000 [ 63%]  (Sampling) 
Chain 3 Iteration: 1800 / 3000 [ 60%]  (Sampling) 
Chain 4 Iteration: 1800 / 3000 [ 60%]  (Sampling) 
Chain 1 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
Chain 2 Iteration: 1900 / 3000 [ 63%]  (Sampling) 
Chain 2 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
Chain 3 Iteration: 1900 / 3000 [ 63%]  (Sampling) 
Chain 4 Iteration: 1900 / 3000 [ 63%]  (Sampling) 
Chain 1 Iteration: 2100 / 3000 [ 70%]  (Sampling) 
Chain 2 Iteration: 2100 / 3000 [ 70%]  (Sampling) 
Chain 3 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
Chain 3 Iteration: 2100 / 3000 [ 70%]  (Sampling) 
Chain 4 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
Chain 4 Iteration: 2100 / 3000 [ 70%]  (Sampling) 
Chain 1 Iteration: 2200 / 3000 [ 73%]  (Sampling) 
Chain 1 Iteration: 2300 / 3000 [ 76%]  (Sampling) 
Chain 2 Iteration: 2200 / 3000 [ 73%]  (Sampling) 
Chain 2 Iteration: 2300 / 3000 [ 76%]  (Sampling) 
Chain 3 Iteration: 2200 / 3000 [ 73%]  (Sampling) 
Chain 4 Iteration: 2200 / 3000 [ 73%]  (Sampling) 
Chain 1 Iteration: 2400 / 3000 [ 80%]  (Sampling) 
Chain 2 Iteration: 2400 / 3000 [ 80%]  (Sampling) 
Chain 3 Iteration: 2300 / 3000 [ 76%]  (Sampling) 
Chain 4 Iteration: 2300 / 3000 [ 76%]  (Sampling) 
Chain 1 Iteration: 2500 / 3000 [ 83%]  (Sampling) 
Chain 1 Iteration: 2600 / 3000 [ 86%]  (Sampling) 
Chain 2 Iteration: 2500 / 3000 [ 83%]  (Sampling) 
Chain 3 Iteration: 2400 / 3000 [ 80%]  (Sampling) 
Chain 3 Iteration: 2500 / 3000 [ 83%]  (Sampling) 
Chain 4 Iteration: 2400 / 3000 [ 80%]  (Sampling) 
Chain 4 Iteration: 2500 / 3000 [ 83%]  (Sampling) 
Chain 1 Iteration: 2700 / 3000 [ 90%]  (Sampling) 
Chain 2 Iteration: 2600 / 3000 [ 86%]  (Sampling) 
Chain 2 Iteration: 2700 / 3000 [ 90%]  (Sampling) 
Chain 3 Iteration: 2600 / 3000 [ 86%]  (Sampling) 
Chain 3 Iteration: 2700 / 3000 [ 90%]  (Sampling) 
Chain 4 Iteration: 2600 / 3000 [ 86%]  (Sampling) 
Chain 1 Iteration: 2800 / 3000 [ 93%]  (Sampling) 
Chain 1 Iteration: 2900 / 3000 [ 96%]  (Sampling) 
Chain 2 Iteration: 2800 / 3000 [ 93%]  (Sampling) 
Chain 2 Iteration: 2900 / 3000 [ 96%]  (Sampling) 
Chain 3 Iteration: 2800 / 3000 [ 93%]  (Sampling) 
Chain 4 Iteration: 2700 / 3000 [ 90%]  (Sampling) 
Chain 4 Iteration: 2800 / 3000 [ 93%]  (Sampling) 
Chain 1 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 2 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 3 Iteration: 2900 / 3000 [ 96%]  (Sampling) 
Chain 3 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 4 Iteration: 2900 / 3000 [ 96%]  (Sampling) 
Chain 4 Iteration: 3000 / 3000 [100%]  (Sampling) 
Chain 1 finished in 2.2 seconds.
Chain 2 finished in 2.2 seconds.
Chain 3 finished in 2.2 seconds.
Chain 4 finished in 2.2 seconds.

All 4 chains finished successfully.
Mean chain execution time: 2.2 seconds.
Total execution time: 2.3 seconds.
Show code
set.seed(42)
p1 <- pp_check(m5_prior, type = "dens_overlay", ndraws = 100) +
  labs(title = "Prior predictive: density — M5",
       subtitle = "Light = prior predictions, dark = observed (reference only)")
p2 <- pp_check(m5_prior, type = "stat", stat = "mean") +
  labs(title = "Prior predictive: mean count — M5",
       subtitle = paste0("Observed mean = ", round(mean(coral$COUNT))))
p3 <- pp_check(m5_prior, type = "stat", stat = "max") +
  labs(title = "Prior predictive: max count — M5",
       subtitle = paste0("Observed max = ", max(coral$COUNT)))
p4 <- pp_check(m5_prior, type = "stat", stat = prop_zero) +
  labs(title = "Prior predictive: proportion zeros — M5",
       subtitle = paste0("Observed = ", round(mean(coral$COUNT == 0), 3)))

(p1 + p2) / (p3 + p4) +
  plot_annotation(title = "Prior predictive checks — M5")

Show code
m5_prior |>
  conditional_effects(conditions = data.frame(TOTAL = 100)) |>
  plot(points = TRUE) #  prior predictive conditional effects plot showing what the normal(0, 1) prior on pre_cover_scaled considers plausible before seeing any data.

M5 prior predictive check: The global intercept prior normal(-2, 1) is doing its job — the mean count distribution is right-skewed with the observed mean (155) sitting near the left shoulder rather than being buried in the far left tail as seen in the M1 prior predictive. The max count distribution still has a wide right tail but the observed max (972) sits near the mode. Proportion zeros is low and well-contained near zero, consistent with coral being present at nearly all sites.

The density overlay shows prior draws spanning a wider range than the observed data — expected for a weakly informative prior. The normal(-2, 1) intercept is successfully anchoring predictions around ecologically plausible cover values (~11% at mean predictors) while still allowing the data to update freely. Priors are acceptable — proceed with fitting M5.

4. MCMC Settings

MCMC settings control how the sampler explores the posterior distribution. chains = 4 runs four independent chains — they should converge to the same distribution. iter = 5000 total iterations per chain; warmup = 1000 are discarded as the sampler tunes itself, leaving 4000 posterior draws per chain (16000 total). adapt_delta = 0.99 makes the sampler more cautious — reducing divergent transitions at the cost of slower sampling. Increase to 0.999 if you get divergence warnings. max_treedepth = 15 controls sampler depth — increase to 20 if you get max treedepth warnings.

Show code
# Julie: iter=10000, warmup=500, adapt_delta=0.999
# Our settings: iter=5000, warmup=1000, adapt_delta=0.99
mcmc_args <- list(
  chains  = 4,
  iter    = 5000,
  warmup  = 1000,
  cores   = 4,
  seed    = 42,
  backend = "cmdstanr",
  control = list(adapt_delta = 0.999, max_treedepth = 15),
  save_pars = save_pars(all = TRUE)
)

5. Model Formulas

Five models are defined here with increasing complexity. Each model is a superset of the previous — M1 adds to M0, M2 adds to M1, and so on. This incremental structure lets LOO-CV identify at which step predictors stop improving predictions. COUNT | trials(TOTAL) specifies a binomial response — COUNT hard coral points out of TOTAL benthic points. The three random intercepts (1 | region_id) + (1 | site_id) + (1 | transect_id) partition variance across the three spatial scales in the survey design.

Show code
# M0: Null
m0f <- bf(COUNT | trials(TOTAL) ~ 1 +
            (1 | region_id) + (1 | site_id) + (1 | transect_id))

# M1: Pre-cover as variable intercept
m1f <- bf(COUNT | trials(TOTAL) ~ 0 + pre_cover_scaled +
            (1 | region_id) + (1 | site_id) + (1 | transect_id))

# M2: Add distance
m2f <- bf(COUNT | trials(TOTAL) ~ 0 + pre_cover_scaled + dist_scaled +
            (1 | region_id) + (1 | site_id) + (1 | transect_id))

# M3: Add wave — looped over GN, SV, PDC0
# M4: Distance x wave interaction

#M5: 
m5f <-COUNT | trials(TOTAL) ~ 1 +  dist_scaled +  pre_cover_scaled * wave_PDC0_scaled + (1 | region_id) + (1 | site_id) + (1 | transect_id)

6. Model 0: Null

The null model has no fixed predictors — only the three random intercepts. It establishes the baseline variance structure and gives us the overall post-eruption mean cover. The primary outputs are the three random effect SDs which show at which spatial scale coral cover varies most. If sd_site is large relative to sd_transect, site-level covariates (wave, distance) are the most promising predictors to add.

Show code
rds_m0 <- file.path(model_path, "m0_null.RDS")

# FIT AND SAVE — unhash to refit and overwrite
# m0 <- brm(
#   formula = m0f, data = coral,
#   family = binomial(link = "logit"),
#   prior = priors_null, sample_prior = "yes",
#   chains = mcmc_args$chains, iter = mcmc_args$iter,
#   warmup = mcmc_args$warmup, cores = mcmc_args$cores,
#   seed = mcmc_args$seed, backend = mcmc_args$backend,
#   control = mcmc_args$control, save_pars = mcmc_args$save_pars
# )
# saveRDS(m0, rds_m0); cat("M0 fitted and saved.\n")

# LOAD SAVED MODEL
m0 <- readRDS(rds_m0); cat("M0 loaded.\n")
M0 loaded.
Show code
summary(m0)
 Family: binomial 
  Links: mu = logit 
Formula: COUNT | trials(TOTAL) ~ 1 + (1 | region_id) + (1 | site_id) + (1 | transect_id) 
   Data: coral (Number of observations: 416) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~region_id (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.41      0.30     0.03     1.16 1.00     4561     8513

~site_id (Number of levels: 104) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.25      0.09     1.09     1.45 1.00     6852    10414

~transect_id (Number of levels: 416) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.61      0.03     0.56     0.67 1.00     7022    11581

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -2.99      0.31    -3.52    -2.28 1.00    11681    10645

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Interpreting M0: The intercept back-transforms to the overall mean hard coral cover. Compare to sargassum M0 — sd_site should be large since sites vary considerably in hard coral cover. sd_transect reflects within-site patchiness.

M0 summary: Convergence is clean — all Rhat = 1.00 and ESS well above acceptable thresholds. The 3 remaining divergences are negligible and expected given only 3 region levels; increasing adapt_delta further is not warranted. Site-level variation dominates (sd_site = 1.25, CI: 1.09–1.45), confirming that site-level covariates (wave, distance) are the most promising predictors to add.

Regional variation is uncertain (sd_region = 0.41, CI: 0.03–1.16) — expected with only 3 levels.

The intercept (−2.99) back-transforms to ~4.8% mean post-eruption cover (plogis(-2.99)), slightly below the observed 8.2% as the random effects absorb site-level variation.

M0 back-transformed intercept: Global mean post-eruption hard coral cover = 5.0% (95% CI: 2.87–9.25%). Below the observed raw mean (8.2%) as expected — the intercept averages over the random effect distribution. Wide CI reflects uncertain regional variance (sd_region CI: 0.03–1.16). ## 6.1 Convergence

M0 refit (28April) — convergence clean: All Rhat = 1.00, all ESS well above thresholds. 3 divergent transitions — acceptable given only 3 region levels and the near-zero boundary of sd_region (CI: 0.03–1.16); increasing adapt_delta further is not warranted. Estimates are stable and consistent with the previous fit — intercept −2.99 (back-transforms to ~5% cover), sd_site = 1.25 confirming site-level variation dominates. Model saved as m0_null.RDS.

Show code
as_draws_df(m0) %>%
  mutate(prob = plogis(b_Intercept) * 100) %>%
  summarise(
    mean_pct = round(mean(prob), 2),
    lower_95 = round(quantile(prob, 0.025), 2),
    upper_95 = round(quantile(prob, 0.975), 2)
  ) %>%
  kable(caption = "M0 intercept — mean post-eruption hard coral cover (%)")
M0 intercept — mean post-eruption hard coral cover (%)
mean_pct lower_95 upper_95
5.33 2.98 11.37

Trace plots: Each coloured line is one chain. Good mixing looks like a hairy caterpillar — chains overlap and zig-zag freely around a stable mean. Drifting, separated or spiky chains indicate poor convergence.

Rhat: Measures chain agreement. Values < 1.01 indicate convergence. Values > 1.01 mean chains have not mixed — do not interpret results.

ESS ratio: Effective sample size as a proportion of total draws. Values > 0.1 are acceptable; > 0.5 is good. Low ESS means high autocorrelation — the sampler is not exploring efficiently.

Show code
pars_m0 <- m0 |> get_variables() |> str_subset("^b_.*|^sd_.*")
m0$fit |> stan_trace(pars = pars_m0) +
  labs(title = "Trace plots — M0")

M0 trace plots: Convergence is clean for all parameters. b_Intercept, sd_site_id, and sd_transect_id show good chain mixing — classic hairy caterpillar pattern with all four chains overlapping around a stable mean. sd_region_id shows wider excursions and occasional spikes near zero, expected with only 3 levels — the sampler is exploring the near-zero boundary of the variance parameter. Not a concern given Rhat = 1.00 and good ESS.

Show code
p_rhat <- m0$fit |> stan_rhat(pars = pars_m0) +
  labs(title = "Rhat — all < 1.01")
p_ess  <- m0$fit |> stan_ess(pars = pars_m0) +
  labs(title = "ESS ratio — all > 0.1")
p_rhat / p_ess

M0 Rhat and ESS: All Rhat < 1.001 — well within the convergence threshold of 1.01. ESS ratios range ~0.38–0.70, all comfortably above the 0.1 minimum. The lower ESS ratio (~0.38) corresponds to sd_region_id — expected given only 3 levels and the near-zero boundary exploration seen in the trace plots. No action needed.

6.2 Prior vs posterior

Compares the prior distribution (outline) to the posterior (shaded) for each parameter. This only works when the model was fitted with sample_prior = "yes".

What to look for: - Posterior much narrower than prior = data is informative for that parameter - Posterior similar to prior = data has little information (common for sd_region with only 3 regions) - Posterior shifted from prior centre = data updated prior beliefs about the direction or magnitude of the effect

Show code
# Get names of all prior draws saved in m0 (only exist if sample_prior = "yes")
prior_pars_m0 <- variables(m0) |> str_subset("^prior_")

# Check if any prior draws exist before attempting to plot
if (length(prior_pars_m0) > 0) {
  
  # Select only parameters that exist in the model from this named list
  # intersect() prevents errors if a parameter name is missing
  plot_pars <- intersect(
    c("prior_b_Intercept",              "b_Intercept",               # intercept: prior and posterior
      "prior_sd_region_id__Intercept",  "sd_region_id__Intercept",   # region SD: prior and posterior
      "prior_sd_site_id__Intercept",    "sd_site_id__Intercept",     # site SD: prior and posterior
      "prior_sd_transect_id__Intercept","sd_transect_id__Intercept"),# transect SD: prior and posterior
    variables(m0)
  )
  
  # Plot prior vs posterior distributions for selected parameters
  mcmc_areas(as.array(m0), pars = plot_pars) +
    labs(title = "Prior vs posterior — M0",
         subtitle = "Narrow posterior = data is informative")
  
} else {
  # Safety message if model was fitted without sample_prior = "yes"
  cat("No prior samples — refit with sample_prior = 'yes'.\n")
}

M0 prior vs posterior: The data updated all parameters meaningfully. b_Intercept — posterior is much narrower than the prior and shifted left to ~−3, confirming the data strongly inform the global mean cover estimate. sd_site_id and sd_transect_id — very tight posteriors, data are highly informative at these levels given 104 sites and 416 transects. sd_region_id — posterior is broader with a long right tail, as expected with only 3 levels; the prior is only partially updated. This wide posterior will propagate uncertainty into regional-level predictions throughout all models.

6.3 Posterior predictive check

Simulates new datasets from the fitted model and compares them to the observed data. A well-fitting model should generate replicates that look like the observed data.

  • Density overlay: Dark line = observed, light lines = model replicates. The model should reproduce the overall shape of the count distribution.
  • Mean: The observed mean (dark bar) should sit within the predictive distribution. If it falls in the tail, the model is mis-estimating the average.
  • Maximum: Checks whether the model can generate values as extreme as the observed maximum — important for detecting underdispersion.
  • Proportion zeros: Hard coral has very few zeros. If the model over-predicts zeros it is not capturing the data structure well.
Show code
set.seed(42)
p1 <- pp_check(m0, type = "dens_overlay", ndraws = 100) +
  labs(title = "PPC: Density — M0")
p2 <- pp_check(m0, type = "stat", stat = "mean") +
  labs(title = "PPC: Mean")
p3 <- pp_check(m0, type = "stat", stat = "max") +
  labs(title = "PPC: Maximum")
p4 <- pp_check(m0, type = "stat", stat = prop_zero) +
  labs(title = "PPC: Proportion zeros")
(p1 + p2) / (p3 + p4) +
  plot_annotation(title = "PPC — M0")

M0 posterior predictive check: Excellent fit across all four panels. Density: simulated replicates closely track the observed distribution — the left-skewed shape with a mode near zero and long right tail is well reproduced. Mean: observed mean (155) sits near the centre of the predictive distribution, indicating no systematic bias in the global mean. Maximum: observed max (972) falls centrally — the model can generate extreme high-cover transects without underdispersion. Proportion zeros: observed zero proportion sits within the predictive distribution and is very low (~0.005), consistent with hard coral being present at nearly all transects. Overall the binomial null model fits the data well — a strong baseline before adding predictors.

6.4 Variance partitioning

Converts the random effect SDs from M0 into percentage variance at each spatial level. Each SD is squared to give variance, then expressed as a proportion of total variance. This tells you where most of the natural variation in hard coral cover sits: - High region % → regional baseline differences dominate (e.g. Vava’u vs Tongatapu have fundamentally different coral communities) - High site % → site-level covariates (wave, distance) are most promising predictors — the pattern to hope for - High transect % → within-site patchiness dominates — environmental predictors measured at site level will struggle to explain it

Show code
var_df <- as_draws_df(m0) %>%
  mutate(
    var_r = sd_region_id__Intercept^2,
    var_s = sd_site_id__Intercept^2,
    var_t = sd_transect_id__Intercept^2,
    var_total = var_r + var_s + var_t
  ) %>%
  summarise(
    pct_region   = round(mean(var_r / var_total) * 100, 1),
    pct_site     = round(mean(var_s / var_total) * 100, 1),
    pct_transect = round(mean(var_t / var_total) * 100, 1)
  )

cat("Variance partitioning — M0:\n"); print(var_df)
Variance partitioning — M0:
# A tibble: 1 × 3
  pct_region pct_site pct_transect
       <dbl>    <dbl>        <dbl>
1       10.9     71.9         17.3
Show code
var_df %>%
  pivot_longer(everything(), names_to = "level", values_to = "pct") %>%
  mutate(level = str_remove(level, "pct_"),
         level = factor(level, levels = c("region","site","transect"))) %>%
  ggplot(aes(x = level, y = pct, fill = level)) +
  geom_col(width = 0.6) +
  scale_fill_manual(values = c("region"="#6B5070",
                                "site"="#7A3D38",
                                "transect"="#C4A35A")) +
  labs(x = "Spatial level", y = "% variance",
       title = "Variance partitioning — M0 hard coral") +
  theme(legend.position = "none")

M0 variance partitioning: Site-level variation dominates, accounting for ~73% of total variance — confirming that between-site differences in hard coral cover are the primary source of variation in the data. Transect-level (within-site) patchiness accounts for ~18%, and regional differences only ~10%. This is the pattern to hope for — the dominance of site-level variance means site-level covariates (wave exposure, distance from volcano) have the potential to explain a meaningful proportion of the total variation. If region dominated instead, fixed effects would be largely redundant given the regional random effect already in the model.

6.5 LOO

Leave-one-out cross-validation estimates how well the model predicts new observations. The key output is elpd_loo (expected log pointwise predictive density) — higher is better. Pareto k values > 0.7 flag influential observations the model struggles to predict when left out. High k values are expected for M0 since with no fixed predictors the model cannot generalise. This baseline value is used for comparison with later models.

Show code
# FIT AND SAVE — unhash to recalculate
# loo_m0 <- loo(m0, seed = 42)
# saveRDS(loo_m0, file.path(model_path, "loo_m0.RDS"))

# LOAD SAVED
loo_m0 <- readRDS(file.path(model_path, "loo_m0.RDS"))

print(loo_m0)

Computed from 16000 by 416 log-likelihood matrix.

         Estimate   SE
elpd_loo  -1758.8 14.0
p_loo       364.5  5.4
looic      3517.5 28.0
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.4]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)       9    2.2%   402     
   (0.7, 1]   (bad)      353   84.9%   <NA>    
   (1, Inf)   (very bad)  54   13.0%   <NA>    
See help('pareto-k-diagnostic') for details.

7. Model 1: Pre-Cover Variable Intercept

Adds pre-eruption hard coral cover as the variable intercept (0 + pre_cover_scaled). This anchors the model baseline to pre-eruption conditions — sites with more coral pre-eruption are expected to retain more post-eruption. Unlike sargassum (63.9% zeros), hard coral pre-cover is well distributed so this relationship is ecologically meaningful and statistically estimable. Compare sd_site from M0 — if it decreases, pre-cover is explaining between-site variance.

Show code
rds_m1 <- file.path(model_path, "m1_precover.RDS")

# FIT AND SAVE — unhash to refit and overwrite
# m1 <- brm(
#   formula = m1f, data = coral,
#   family = binomial(link = "logit"),
#   prior = priors_full, sample_prior = "yes",
#   chains = mcmc_args$chains, iter = mcmc_args$iter,
#   warmup = mcmc_args$warmup, cores = mcmc_args$cores,
#   seed = mcmc_args$seed, backend = mcmc_args$backend,
#   control = mcmc_args$control, save_pars = mcmc_args$save_pars
# )
# saveRDS(m1, rds_m1); cat("M1 fitted and saved.\n")

# LOAD SAVED MODEL
m1 <- readRDS(rds_m1); cat("M1 loaded.\n")
M1 loaded.
Show code
summary(m1)
 Family: binomial 
  Links: mu = logit 
Formula: COUNT | trials(TOTAL) ~ 0 + pre_cover_scaled + (1 | region_id) + (1 | site_id) + (1 | transect_id) 
   Data: coral (Number of observations: 416) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~region_id (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     2.58      0.97     1.41     4.98 1.00    14412    11142

~site_id (Number of levels: 104) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.24      0.09     1.08     1.44 1.00     8365    11369

~transect_id (Number of levels: 416) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.61      0.03     0.56     0.67 1.00     7220    10575

Regression Coefficients:
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
pre_cover_scaled     0.08      0.06    -0.04     0.21 1.00    18189    12654

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Interpreting M1: A positive pre_cover_scaled coefficient means sites with more pre-eruption coral also have more post-eruption coral — persistence. A coefficient near 1 would suggest proportional survival. A coefficient < 1 suggests high-cover sites lost proportionally more. Compare sd_site to M0 — if it decreases, pre-cover is absorbing between-site variation.

M1 summary: Convergence clean — all Rhat = 1.00, all ESS well above thresholds. No divergent transitions.

pre_cover_scaled: β = 0.08 (95% CI: −0.04 to 0.21) — CI crosses zero, weak positive association between pre and post cover but not reliably different from zero. Adding pre-cover as a variable intercept did not substantially improve predictions over M0 — sd_site essentially unchanged (1.24 vs 1.25 in M0), confirming pre-cover is not absorbing meaningful between-site variance. sd_region increased substantially from M0 (0.41 → 2.58) — the removal of the global intercept has shifted variance into the regional random effect.

PPC density — M1: Adding pre-cover should improve the model’s ability to reproduce the observed distribution. The light lines should track the observed dark line more closely than M0’s PPC. If not, pre-cover is not substantially improving predictive fit despite absorbing variance.

Show code
set.seed(42)
pp_check(m1, type = "dens_overlay", ndraws = 100) +
  labs(title = "PPC: Density — M1")

LOO — M1: Compare elpd_loo to M0. A positive ΔELPD means M1 predicts better than M0 — pre-cover improved predictions. If ΔELPD ± SE overlaps zero the improvement is not reliable.

Show code
# FIT AND SAVE — unhash to recalculate
# loo_m1 <- loo(m1, seed = 42)
# saveRDS(loo_m1, file.path(model_path, "loo_m1.RDS"))

# LOAD SAVED
loo_m1 <- readRDS(file.path(model_path, "loo_m1.RDS"))
print(loo_m1)

Computed from 16000 by 416 log-likelihood matrix.

         Estimate   SE
elpd_loo  -1759.7 14.1
p_loo       365.1  5.4
looic      3519.4 28.3
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.1]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)      16    3.8%   253     
   (0.7, 1]   (bad)      350   84.1%   <NA>    
   (1, Inf)   (very bad)  50   12.0%   <NA>    
See help('pareto-k-diagnostic') for details.

LOO: 400/416 observations (96%) have Pareto k > 0.7 — same pattern as M0, a structural feature of binomial models with large trial numbers rather than model misspecification. MCSE of elpd_loo is NA, meaning the LOO estimate is unreliable. Moment matching recommended before comparing to other models. elpd_loo = −1759.7, marginally better than M0 (to be confirmed via formal comparison). Model saved as m1_precover.RDS.


8. Model 2: Add Distance

Adds distance from the volcano as a fixed effect alongside pre-cover. Tests whether the spatial gradient of eruption impact explains post-eruption coral decline beyond the pre-eruption baseline. A negative coefficient would indicate greater coral loss at sites closer to the volcano — consistent with greater physical disturbance (wave, tephra, pressure) in the near field. Check whether sd_site decreases from M1 — if so, distance is explaining site-level variance.

Show code
rds_m2 <- file.path(model_path, "m2_dist.RDS")

# FIT AND SAVE — unhash to refit and overwrite
# m2 <- brm(
#   formula = m2f, data = coral,
#   family = binomial(link = "logit"),
#   prior = priors_full, sample_prior = "yes",
#   chains = mcmc_args$chains, iter = mcmc_args$iter,
#   warmup = mcmc_args$warmup, cores = mcmc_args$cores,
#   seed = mcmc_args$seed, backend = mcmc_args$backend,
#   control = mcmc_args$control, save_pars = mcmc_args$save_pars
# )
# saveRDS(m2, rds_m2); cat("M2 fitted and saved.\n")

# LOAD SAVED MODEL
m2 <- readRDS(rds_m2); cat("M2 loaded.\n")
M2 loaded.
Show code
summary(m2)
 Family: binomial 
  Links: mu = logit 
Formula: COUNT | trials(TOTAL) ~ 0 + pre_cover_scaled + dist_scaled + (1 | region_id) + (1 | site_id) + (1 | transect_id) 
   Data: coral (Number of observations: 416) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~region_id (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     2.70      0.96     1.50     5.12 1.00    15854    12083

~site_id (Number of levels: 104) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.21      0.09     1.04     1.41 1.00     9624    12138

~transect_id (Number of levels: 416) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.61      0.03     0.56     0.67 1.00     6193     9684

Regression Coefficients:
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
pre_cover_scaled     0.10      0.07    -0.03     0.22 1.00    29912    11627
dist_scaled          0.69      0.33     0.03     1.36 1.00    17436    12612

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Interpreting dist_scaled for coral: A negative coefficient means more coral loss at sites closer to the volcano — the expected pattern if wave disturbance, blast pressure or tephra deposition caused greater damage near the epicentre. A positive coefficient would be unexpected for coral.

PPC — M2: Adding distance should improve fit if distance explains meaningful between-site variation. Improved tracking of the observed distribution relative to M1 indicates distance is adding predictive value.

Show code
set.seed(42)
pp_check(m2, type = "dens_overlay", ndraws = 100) +
  labs(title = "PPC: Density — M2")

LOO — M2: Compare to M1. If ΔELPD is positive and SE does not overlap zero, distance meaningfully improves predictions beyond pre-cover alone.

Show code
# FIT AND SAVE — unhash to recalculate
# loo_m2 <- loo(m2, seed = 42)
# saveRDS(loo_m2, file.path(model_path, "loo_m2.RDS"))

# LOAD SAVED
loo_m2 <- readRDS(file.path(model_path, "loo_m2.RDS"))
print(loo_m2)

Computed from 16000 by 416 log-likelihood matrix.

         Estimate   SE
elpd_loo  -1757.6 14.2
p_loo       363.1  5.3
looic      3515.2 28.5
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.0]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)      13    3.1%   296     
   (0.7, 1]   (bad)      352   84.6%   <NA>    
   (1, Inf)   (very bad)  51   12.3%   <NA>    
See help('pareto-k-diagnostic') for details.

9. Model 3: Add Wave Height — All Three Models

Adds wave height as an additional fixed effect. Three wave models (GN, SV, PDC0) represent different estimates of maximum wave height at each site from different tsunami simulation approaches. Each is fitted separately and compared via LOO and model weights to identify which wave model best predicts coral decline. For hard coral, a negative wave coefficient is expected — greater physical wave disturbance associated with greater coral mortality.

Show code
# Original loop that fitted GN, SV and PDC0 wave models for comparison:

# wave_cols   <- c("wave_GN_scaled", "wave_SV_scaled", "wave_PDC0_scaled")
# wave_labels <- c("GN", "SV", "PDC0")
# m3_list     <- list()
# loo_m3_list <- list()

# for (i in seq_along(wave_cols)) {
#   rds_i <- file.path(model_path, paste0("m3_wave_", wave_labels[i], ".RDS"))
#   if (file.exists(rds_i)) {
#     m3_list[[i]] <- readRDS(rds_i); cat("M3 (", wave_labels[i], ") loaded.\n")
#   } else {
#     f_i <- bf(paste0("COUNT | trials(TOTAL) ~ 0 + pre_cover_scaled + dist_scaled + ",
#       wave_cols[i], " + (1 | region_id) + (1 | site_id) + (1 | transect_id)"))
#     m3_list[[i]] <- brm(formula = f_i, data = coral,
#       family = binomial(link = "logit"), prior = priors_full,
#       sample_prior = "yes", chains = mcmc_args$chains, iter = mcmc_args$iter,
#       warmup = mcmc_args$warmup, cores = mcmc_args$cores, seed = mcmc_args$seed,
#       backend = mcmc_args$backend, control = mcmc_args$control,
#       save_pars = mcmc_args$save_pars)
#     saveRDS(m3_list[[i]], rds_i); cat("M3 (", wave_labels[i], ") fitted and saved.\n")
#   }
#   loo_m3_list[[i]] <- loo(m3_list[[i]])
# }
# names(m3_list)     <- wave_labels
# names(loo_m3_list) <- wave_labels
# PDC0 selected as best wave model (LOO weight = 0.99, GN = 0.006, SV = 0.001)

9.1 Wave model comparison

loo_compare ranks the three wave models by predictive performance. Model weights express this as relative probabilities — the wave model with the highest weight best predicts post-eruption coral cover. Update best_wave below to match the winner. If two models have similar weights (e.g. 0.45 and 0.55), they perform equivalently and the choice is less critical.

Show code
# Original wave comparison code — kept for transparency:

# cat("=== LOO comparison — wave models ===\n")
# loo_compare(loo_m3_list$GN, loo_m3_list$SV, loo_m3_list$PDC0)
# w_wave <- brms::model_weights(
#   m3_list$GN, m3_list$SV, m3_list$PDC0, weights = "loo"
# )
# tibble(wave_model = c("GN","SV","PDC0"),
#        loo_weight = round(as.numeric(w_wave), 4)) %>%
#   arrange(desc(loo_weight)) %>%
#   kable(caption = "LOO weights — wave models in M3")

# Results from wave model comparison:
tibble(
  wave_model = c("PDC0", "GN", "SV"),
  loo_weight = c(0.9939, 0.0057, 0.0005)
) %>%
  kable(caption = "LOO weights — wave models in M3. PDC0 selected as best.")
LOO weights — wave models in M3. PDC0 selected as best.
wave_model loo_weight
PDC0 0.9939
GN 0.0057
SV 0.0005
LOO weights — wave models in M3
wave_model loo_weight
PDC0 0.9939
GN 0.0057
SV 0.0005
Show code
rds_m3 <- file.path(model_path, "m3_wave_PDC0.RDS")

# FIT AND SAVE — unhash to refit and overwrite
# m3 <- brm(
#   formula = bf(COUNT | trials(TOTAL) ~ 0 + pre_cover_scaled + dist_scaled +
#                  wave_PDC0_scaled +
#                  (1 | region_id) + (1 | site_id) + (1 | transect_id)),
#   data         = coral,
#   family       = binomial(link = "logit"),
#   prior        = priors_full,
#   sample_prior = "yes",
#   chains       = mcmc_args$chains,
#   iter         = mcmc_args$iter,
#   warmup       = mcmc_args$warmup,
#   cores        = mcmc_args$cores,
#   seed         = mcmc_args$seed,
#   backend      = mcmc_args$backend,
#   control      = mcmc_args$control,
#   save_pars    = mcmc_args$save_pars
# )
# saveRDS(m3, rds_m3); cat("M3 PDC0 fitted and saved.\n")

# LOAD SAVED MODEL
m3 <- readRDS(rds_m3); cat("M3 loaded.\n")
M3 loaded.
Show code
summary(m3)
 Family: binomial 
  Links: mu = logit 
Formula: COUNT | trials(TOTAL) ~ 0 + pre_cover_scaled + dist_scaled + wave_PDC0_scaled + (1 | region_id) + (1 | site_id) + (1 | transect_id) 
   Data: coral (Number of observations: 416) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~region_id (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     2.71      1.02     1.50     5.23 1.00    13428    11014

~site_id (Number of levels: 104) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.17      0.09     1.01     1.35 1.00     8585    10329

~transect_id (Number of levels: 416) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.61      0.03     0.56     0.67 1.00     6565     9305

Regression Coefficients:
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
pre_cover_scaled     0.08      0.06    -0.04     0.21 1.00    20335    10836
dist_scaled          0.49      0.33    -0.15     1.13 1.00    12071    11579
wave_PDC0_scaled    -0.43      0.15    -0.73    -0.15 1.00    13675    10800

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Show code
# FIT AND SAVE LOO— unhash to recalculate
# loo_m3 <- loo(m3, seed = 42)
# saveRDS(loo_m3, file.path(model_path, "loo_m3.RDS"))

# LOAD SAVED LOO
loo_m3 <- readRDS(file.path(model_path, "loo_m3.RDS"))

print(loo_m3)

Computed from 16000 by 416 log-likelihood matrix.

         Estimate   SE
elpd_loo  -1757.0 14.1
p_loo       362.4  5.1
looic      3514.1 28.1
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.2]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)      12    2.9%   238     
   (0.7, 1]   (bad)      358   86.1%   <NA>    
   (1, Inf)   (very bad)  46   11.1%   <NA>    
See help('pareto-k-diagnostic') for details.

Interpreting wave coefficient for coral: A negative coefficient means higher wave exposure is associated with lower post-eruption coral cover — consistent with physical wave disturbance causing coral mortality. This is the expected direction for coral (opposite to sargassum).

PPC — M3: The density overlay for the best wave model should track the observed distribution as well or better than M2. If there is little visual improvement over M2 despite LOO improvement, wave height is adding signal that matters for prediction but does not dramatically change the distribution shape.

Show code
set.seed(42)
pp_check(m3, type = "dens_overlay", ndraws = 100) +
  labs(title = "PPC: Density — M3 wave PDC0")


10. Model 4: Distance × Wave Interaction

Tests whether the effect of wave height on coral loss varies with distance from the volcano. A negative interaction (dist × wave) would mean wave disturbance caused more coral loss at near-field sites — where eruption energy was already higher. A positive interaction would mean wave effects are stronger at distant sites — less expected ecologically. If the interaction CI crosses zero, M3 is preferred as the simpler model.

Show code
rds_m4 <- file.path(model_path, "m4_interaction_PDC0.RDS")

# FIT AND SAVE — unhash to refit and overwrite
# m4 <- brm(
#   formula = bf(COUNT | trials(TOTAL) ~ 0 + pre_cover_scaled +
#                  dist_scaled * wave_PDC0_scaled +
#                  (1 | region_id) + (1 | site_id) + (1 | transect_id)),
#   data         = coral,
#   family       = binomial(link = "logit"),
#   prior        = priors_full,
#   sample_prior = "yes",
#   chains       = mcmc_args$chains,
#   iter         = mcmc_args$iter,
#   warmup       = mcmc_args$warmup,
#   cores        = mcmc_args$cores,
#   seed         = mcmc_args$seed,
#   backend      = mcmc_args$backend,
#   control      = mcmc_args$control,
#   save_pars    = mcmc_args$save_pars
# )
# saveRDS(m4, rds_m4); cat("M4 fitted and saved.\n")

# LOAD SAVED MODEL
m4 <- readRDS(rds_m4); cat("M4 loaded.\n")
M4 loaded.
Show code
summary(m4)
 Family: binomial 
  Links: mu = logit 
Formula: COUNT | trials(TOTAL) ~ 0 + pre_cover_scaled + dist_scaled * wave_PDC0_scaled + (1 | region_id) + (1 | site_id) + (1 | transect_id) 
   Data: coral (Number of observations: 416) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~region_id (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     2.76      1.03     1.48     5.33 1.00    15844    12595

~site_id (Number of levels: 104) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.17      0.09     1.01     1.36 1.00     8704    10746

~transect_id (Number of levels: 416) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.61      0.03     0.56     0.67 1.00     6050     9923

Regression Coefficients:
                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
pre_cover_scaled                 0.08      0.06    -0.05     0.20 1.00    25869
dist_scaled                      0.50      0.33    -0.18     1.14 1.00    17571
wave_PDC0_scaled                -0.46      0.21    -0.87    -0.04 1.00    19199
dist_scaled:wave_PDC0_scaled    -0.05      0.25    -0.54     0.45 1.00    20161
                             Tail_ESS
pre_cover_scaled                10934
dist_scaled                     12201
wave_PDC0_scaled                11365
dist_scaled:wave_PDC0_scaled    12111

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

LOO — M4: Compare to M3. If adding the interaction does not improve LOO (ΔELPD ≈ 0 or negative), the wave effect does not meaningfully vary with distance and M3 is the preferred model. Interactions require more data to estimate reliably so a modest ΔELPD may not justify the added complexity.

Show code
# FIT AND SAVE — unhash to recalculate
# loo_m4 <- loo(m4, seed = 42)
# saveRDS(loo_m4, file.path(model_path, "loo_m4.RDS"))

# LOAD SAVED
loo_m4 <- readRDS(file.path(model_path, "loo_m4.RDS"))
print(loo_m4)

Computed from 16000 by 416 log-likelihood matrix.

         Estimate   SE
elpd_loo  -1762.5 14.4
p_loo       367.9  5.6
looic      3524.9 28.8
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.0]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)      14    3.4%   240     
   (0.7, 1]   (bad)      344   82.7%   <NA>    
   (1, Inf)   (very bad)  58   13.9%   <NA>    
See help('pareto-k-diagnostic') for details.

11. Model 5: Intercept + Precover x wave

Does the effect of wave exposure on post-eruption coral cover vary depending on >how much coral was present before the eruption?” — i.e. did sites with high >pre-eruption coral respond differently to wave disturbance than low >pre-eruption coral sites. It also includes a global intercept (1 +) rather than >the variable intercept parameterisation, and distance as a main effect only.

Show code
rds_m5 <- file.path(model_path, "m5_precover_wave_interaction.RDS")

# FIT AND SAVE — unhash to refit and overwrite
# m5 <- brm(
#   formula = bf(COUNT | trials(TOTAL) ~ 1 + dist_scaled +
#                  pre_cover_scaled * wave_PDC0_scaled +
#                  (1 | region_id) + (1 | site_id) + (1 | transect_id)),
#   data         = coral,
#   family       = binomial(link = "logit"),
#   prior        = priors_m5,
#   sample_prior = "yes",
#   chains       = mcmc_args$chains,
#   iter         = mcmc_args$iter,
#   warmup       = mcmc_args$warmup,
#   cores        = mcmc_args$cores,
#   seed         = mcmc_args$seed,
#   backend      = mcmc_args$backend,
#   control      = mcmc_args$control,
#   save_pars    = mcmc_args$save_pars
# )
# saveRDS(m5, rds_m5); cat("M5 fitted and saved.\n")

# LOAD SAVED MODEL
m5 <- readRDS(rds_m5); cat("M5 loaded.\n")
M5 loaded.
Show code
summary(m5)
 Family: binomial 
  Links: mu = logit 
Formula: COUNT | trials(TOTAL) ~ 1 + dist_scaled + pre_cover_scaled * wave_PDC0_scaled + (1 | region_id) + (1 | site_id) + (1 | transect_id) 
   Data: coral (Number of observations: 416) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~region_id (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.44      0.36     0.02     1.30 1.00     7048    10633

~site_id (Number of levels: 104) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.18      0.09     1.02     1.37 1.00     9779    12186

~transect_id (Number of levels: 416) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.61      0.03     0.56     0.67 1.00     7136    10368

Regression Coefficients:
                                  Estimate Est.Error l-95% CI u-95% CI Rhat
Intercept                            -3.02      0.32    -3.58    -2.26 1.00
dist_scaled                           0.27      0.24    -0.15     0.80 1.00
pre_cover_scaled                      0.09      0.07    -0.04     0.22 1.00
wave_PDC0_scaled                     -0.38      0.15    -0.68    -0.09 1.00
pre_cover_scaled:wave_PDC0_scaled     0.06      0.09    -0.12     0.24 1.00
                                  Bulk_ESS Tail_ESS
Intercept                            17291    10773
dist_scaled                          12242    10538
pre_cover_scaled                     26497    11891
wave_PDC0_scaled                     14608    12117
pre_cover_scaled:wave_PDC0_scaled    25386    12027

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Show code
# FIT AND SAVE LOO — unhash to recalculate
# loo_m5 <- loo(m5, seed = 42)
# saveRDS(loo_m5, file.path(model_path, "loo_m5.RDS"))

# LOAD SAVED LOO
loo_m5 <- readRDS(file.path(model_path, "loo_m5.RDS"))
print(loo_m5)

Computed from 16000 by 416 log-likelihood matrix.

         Estimate   SE
elpd_loo  -1760.4 14.0
p_loo       365.9  5.2
looic      3520.8 28.1
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.0]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)       5    1.2%   318     
   (0.7, 1]   (bad)      347   83.4%   <NA>    
   (1, Inf)   (very bad)  64   15.4%   <NA>    
See help('pareto-k-diagnostic') for details.

12. Model Comparison

12.1 LOO-CV

Ranks all five models by predictive performance. The best model (elpd_diff = 0) is listed first. elpd_diff shows how much worse each model is relative to the best. se_diff is the standard error of that difference — if elpd_diff is less than 2× se_diff, models are considered equivalent and the simpler one is preferred. For coral we expect M1 and M3 to improve substantially over M0 given the clear pre-post decline.

Show code
cat("=== LOO comparison — all models ===\n")
=== LOO comparison — all models ===
Show code
# save LOO comparison matrix
#loo_comp <- loo_compare(loo_m0, loo_m1, loo_m2, loo_m3, loo_m4, loo_m5)

#Store the LOO compare matrix
#saveRDS(loo_comp, file.path(model_path, "loo_comp.RDS"))

#LOAD LOO COMPARISON
loo_comp <- readRDS(file.path(model_path, "loo_comp.RDS"))

print(loo_comp)
   elpd_diff se_diff
m3  0.0       0.0   
m2 -0.6       3.0   
m0 -1.7       3.3   
m1 -2.7       3.1   
m5 -3.3       3.2   
m4 -5.4       3.3   
Show code
# Julie suggested — send Julie results once run
#loo0 <- loo(m0, moment_match = TRUE) # moment match is taking too long to run
#loo3 <- loo(m3, moment_match = TRUE)

#moment_match = TRUE is a correction method that tries to fix the high Pareto k values.
#When standard LOO finds observations with high Pareto k (meaning leaving that observation out causes a large perturbation to the posterior), moment_match attempts to reuse the existing MCMC draws and transform them to better approximate what the posterior would look like without that observation — rather than refitting the model from scratch.

LOO comparison — all models: The best model always has elpd_diff = 0 and se_diff = 0. All other models have negative elpd_diff — showing how much worse they are than the best. The larger the negative elpd_diff relative to se_diff, the more confidently you can say that model is worse.

M3 (pre_cover + dist + wave PDC0) ranks first with elpd_diff = 0. However all differences are small relative to their SE — no model is reliably better than any other (elpd_diff / se_diff < 2 for all comparisons).

Specifically: - M2 vs M3: −0.6 / 3.0 — negligible, distance alone performs essentially as well as distance + wave - M0 vs M3: −1.7 / 3.3 — null model is competitive, fixed effects adding little beyond random effects - M1 vs M3: −2.7 / 3.1 — pre-cover alone performs poorly - M5 vs M3: −3.3 / 3.2 — pre-cover × wave interaction not supported - M4 vs M3: −5.4 / 3.3 — distance × wave interaction not supported, M4 is the worst performing model

M3 is selected as the best model on the basis of ranking first and having the most ecologically interpretable fixed effects (wave PDC0 CI excludes zero). Note that LOO estimates are unreliable for all models due to high Pareto k values (>96% observations with k > 0.7) — interpret rankings cautiously. Stacking weights and moment-matched LOO comparison with M0 are reported separately as more reliable alternatives.

12.2 Model weights

Model weights convert LOO differences into probabilities — the probability that each model is the best given the data. Weights sum to 1. A weight of 0.7 for one model indicates strong support. If weights are spread across models (e.g. 0.4 and 0.4), two models are performing similarly and you can report both or use the simpler one. For the sargassum analysis M3 received weight 0.72 — check whether the pattern is similar or clearer for coral given its less zero-inflated distribution.

Show code
#w_all <- brms::model_weights(m0, m1, m2, m3, m4, m5, weights = "loo")

#tibble(
#  model = c("M0: null",
#            "M1: + pre_cover",
#            "M2: + dist",
#            "M3: + wave PDC0",
#            "M4: dist x wave PDC0",
#            "M5: + pre_cover x wave PDC0"),
#  loo_weight = round(as.numeric(w_all), 4) # round to 4 decimal places
#) %>%
#  arrange(desc(loo_weight)) %>%
#  kable(caption = "LOO model weights. Sum to 1. Higher = better supported.")


# Create table with LOO comparisons, weights and stacking weights

# stacking model weights
w_stack <- model_weights(m0, m1, m2, m3, m4, m5, weights = "stacking", seed = 42)

# LOO weights
w_loo <- model_weights(m0, m1, m2, m3, m4, m5, weights = "loo", seed = 42)

# model name lookup — must match order of models passed above
model_labels <- c(
  m0 = "M0: null",
  m1 = "M1: + pre_cover",
  m2 = "M2: + dist",
  m3 = "M3: + wave PDC0",
  m4 = "M4: dist x wave PDC0",
  m5 = "M5: pre_cover x wave PDC0"
)

# build table by extracting values directly from loo_comp
# build the table object once
comp_table <- as_tibble(loo_comp, rownames = "model_id") %>%
  mutate(
    model        = model_labels[model_id],
    loo_weight   = round(as.numeric(w_loo)[match(model_id, names(w_loo))], 4),
    stack_weight = round(as.numeric(w_stack)[match(model_id, names(w_stack))], 4)
  ) %>%
  select(model, elpd_diff, se_diff, loo_weight, stack_weight) %>%
  mutate(across(c(elpd_diff, se_diff), ~ round(.x, 1))) %>%
  arrange(desc(stack_weight))

# render table in quarto
comp_table %>%
  kable(
    digits = 4,
    caption = "Model comparison. elpd_diff and se_diff from LOO-CV (M3 as reference).
    LOO weights and stacking weights sum to 1. Note: LOO estimates unreliable 
    due to high Pareto k values (>96% observations k > 0.7) — stacking weights 
    are the preferred comparison metric."
  )
Model comparison. elpd_diff and se_diff from LOO-CV (M3 as reference). LOO weights and stacking weights sum to 1. Note: LOO estimates unreliable due to high Pareto k values (>96% observations k > 0.7) — stacking weights are the preferred comparison metric.
model elpd_diff se_diff loo_weight stack_weight
M3: + wave PDC0 0.0 0.0 0.5437 0.4771
M2: + dist -0.6 3.0 0.3022 0.3366
M0: null -1.7 3.3 0.0955 0.1862
M1: + pre_cover -2.7 3.1 0.0371 0.0001
M5: pre_cover x wave PDC0 -3.3 3.2 0.0191 0.0000
M4: dist x wave PDC0 -5.4 3.3 0.0024 0.0000
Show code
# save table as word doc with caption
comp_table %>%
  flextable() %>%
  set_caption("Model comparison. elpd_diff and se_diff from LOO-CV (M3 as reference).
    LOO weights and stacking weights sum to 1. Note: LOO estimates unreliable 
    due to high Pareto k values (>96% observations k > 0.7) — stacking weights 
    are the preferred comparison metric.") %>%
  autofit() %>%
  save_as_docx(path = here("output csvs", "14_coral_BGLMM_analysis", "coral_variableintercept_BGLMM_loo_weights_model_comparison_table.docx"))

LOO-CV ranked M3 first but all elpd_diff values were within 2 SE of zero, indicating no model was reliably superior. Given high Pareto k diagnostics across all models (>96% of observations k > 0.7), stacking weights are reported as the preferred comparison metric. Stacking assigned highest weight to M3 (0.48) followed by M2 (0.34) and M0 (0.19), with M1, M4 and M5 receiving negligible weight — consistent with distance and wave PDC0 being the meaningful predictors of post-eruption coral decline.

12.3 Variance partitioning across models

Tracks how random effect SDs change as fixed predictors are added. If sd_site decreases as each predictor is added, that predictor is explaining between-site variance. The total reduction from M0 to the best model quantifies how much of the spatial variation in coral decline is explained by pre-cover, distance and wave height combined. sd_transect should remain relatively stable since within-site patchiness is not explained by site-level covariates.

Show code
extract_sds <- function(model, model_name) {
  as_draws_df(model) %>%
    summarise(
      sd_region   = round(mean(sd_region_id__Intercept), 3),
      sd_site     = round(mean(sd_site_id__Intercept), 3),
      sd_transect = round(mean(sd_transect_id__Intercept), 3)
    ) %>%
    mutate(model = model_name)
}

sd_table <- bind_rows(
  extract_sds(m0, "M0: null"),
  extract_sds(m1, "M1: + pre_cover"),
  extract_sds(m2, "M2: + dist"),
  extract_sds(m3, "M3: + wave PDC0"),
  extract_sds(m4, "M4: dist x wave PDC0"),
  extract_sds(m5, "M5: + pre_cover x wave PDC0")
) %>%
  select(model, sd_region, sd_site, sd_transect)

# render in quarto
sd_table %>%
  kable(caption = "Random effect SDs across models. Decreasing sd_site = predictors explaining between-site variation.")
Random effect SDs across models. Decreasing sd_site = predictors explaining between-site variation.
model sd_region sd_site sd_transect
M0: null 0.408 1.254 0.612
M1: + pre_cover 2.585 1.242 0.612
M2: + dist 2.702 1.213 0.612
M3: + wave PDC0 2.713 1.167 0.612
M4: dist x wave PDC0 2.764 1.173 0.612
M5: + pre_cover x wave PDC0 0.441 1.184 0.611
Show code
# save as word doc
sd_table %>%
  flextable() %>%
  set_caption("Random effect SDs across models. Decreasing sd_site = predictors explaining between-site variation.") %>%
  autofit() %>%
  save_as_docx(path = here("output csvs", "14_coral_BGLMM_analysis", "hardcoral_BGLMM_variableintercepts_sd_variance_table.docx"))
Random effect SDs across models. Decreasing sd_site = predictors explaining between-site variation.
model sd_region sd_site sd_transect
M0: null 0.408 1.254 0.612
M1: + pre_cover 2.585 1.242 0.612
M2: + dist 2.702 1.213 0.612
M3: + wave PDC0 2.713 1.167 0.612
M4: dist x wave PDC0 2.764 1.173 0.612
M5: + pre_cover x wave PDC0 0.441 1.184 0.611

Variance partitioning across models: sd_transect is stable across all models (0.611–0.612) — within-site patchiness is not explained by any site-level predictor, as expected.

sd_site decreases progressively as predictors are added (M0: 1.254 → M3: 1.167), with wave PDC0 producing the largest reduction — consistent with it being the strongest fixed effect predictor of between-site variation in post-eruption coral cover. The interaction terms in M4 and M5 do not reduce sd_site further, consistent with their negligible stacking weights.

sd_region increases dramatically in M1–M4 (0.408 → ~2.7) when the global intercept is removed (0 + pre_cover_scaled) — the regional random effect expands to compensate for the absent intercept. M5 restores the global intercept and sd_region returns to near-M0 levels (0.441), confirming this is a parameterisation effect rather than a biological signal.

M0: 0.408 — low, regional variance well constrained with global intercept M1–M4: jumps to ~2.6–2.8 — removing the global intercept (0 +) massively inflates regional variance. The regional random effect is expanding to compensate for the missing intercept M5: 0.441 — drops back to near M0 level because M5 restores the global intercept (1 +)

The fixed effects are explaining some between-site variance (sd_site decreasing) but the regional random effect behaviour differs fundamentally between the two parameterisations.


13. Best Model: Full Diagnostics

Full diagnostics for the winning model from Section 11. Update m_best and m_best_name to match whichever model had the highest LOO weight. All downstream plots and tables will update automatically.

Show code
# UPDATE to winning model from LOO weights

m_best_name <- "M3: pre_cover + dist + wave PDC0"  # update to match
cat("Best model:", m_best_name, "\n")
Best model: M3: pre_cover + dist + wave PDC0 
Show code
summary(m3)
 Family: binomial 
  Links: mu = logit 
Formula: COUNT | trials(TOTAL) ~ 0 + pre_cover_scaled + dist_scaled + wave_PDC0_scaled + (1 | region_id) + (1 | site_id) + (1 | transect_id) 
   Data: coral (Number of observations: 416) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~region_id (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     2.71      1.02     1.50     5.23 1.00    13428    11014

~site_id (Number of levels: 104) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.17      0.09     1.01     1.35 1.00     8585    10329

~transect_id (Number of levels: 416) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.61      0.03     0.56     0.67 1.00     6565     9305

Regression Coefficients:
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
pre_cover_scaled     0.08      0.06    -0.04     0.21 1.00    20335    10836
dist_scaled          0.49      0.33    -0.15     1.13 1.00    12071    11579
wave_PDC0_scaled    -0.43      0.15    -0.73    -0.15 1.00    13675    10800

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Family: binomial Links: mu = logit Formula: COUNT | trials(TOTAL) ~ 0 + pre_cover_scaled + dist_scaled + wave_PDC0_scaled + (1 | region_id) + (1 | site_id) + (1 | transect_id) Data: coral (Number of observations: 416) Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1; total post-warmup draws = 16000

Multilevel Hyperparameters: ~region_id (Number of levels: 3) Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS sd(Intercept) 2.71 1.02 1.50 5.23 1.00 13428 11014

~site_id (Number of levels: 104) Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS sd(Intercept) 1.17 0.09 1.01 1.35 1.00 8585 10329

~transect_id (Number of levels: 416) Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS sd(Intercept) 0.61 0.03 0.56 0.67 1.00 6565 9305

Regression Coefficients: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS pre_cover_scaled 0.08 0.06 -0.04 0.21 1.00 20335 10836 dist_scaled 0.49 0.33 -0.15 1.13 1.00 12071 11579 wave_PDC0_scaled -0.43 0.15 -0.73 -0.15 1.00 13675 10800

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).

M3 summary (best model): Convergence clean — all Rhat = 1.00, all ESS well above thresholds. No divergent transitions.

wave_PDC0_scaled: β = −0.43 (95% CI: −0.73 to −0.15) — the only fixed effect with a CI excluding zero. Higher wave exposure is associated with lower post-eruption coral cover, suggesting physical wave disturbance from the HTHH tsunami influenced coral mortality. A 1 SD increase in wave height corresponds to a shift of −0.43 on the logit scale in predicted post-eruption cover.

dist_scaled: β = 0.49 (95% CI: −0.15 to 1.13) — CI crosses zero. Positive direction suggests more coral retention at sites closer to the volcano, but this is unreliable and likely confounded with the regional random effect (sd_region = 2.71) which is absorbing the broad spatial gradient.

pre_cover_scaled: β = 0.08 (95% CI: −0.04 to 0.21) — CI crosses zero. Weak positive persistence signal — sites with more coral before the eruption did not reliably retain more coral after, suggesting the eruption was a levelling event that overrode baseline coral state.

Random effects: sd_site = 1.17 dominates — site identity remains the strongest predictor of post-eruption coral cover even after accounting for wave, distance and pre-cover. sd_region = 2.71 (CI: 1.50–5.23) is large and uncertain, reflecting the 0 + parameterisation inflating regional variance as discussed in section 11.3.

13.1 Convergence

Full convergence check for the best model. With more parameters than M0 (pre_cover + dist + wave + random effects), trace plots should still show good mixing. Pay particular attention to b_dist_scaled and b_wave_scaled — if these show poor mixing, the sampler is struggling with the fixed effects, possibly due to collinearity between distance and wave. If Rhat > 1.01 for any parameter, increase adapt_delta to 0.999 and refit.

Written in my results (subject to change!): Hard coral cover declined substantially between pre-eruption (2017/2018: mean 16%) and post-eruption (2024: mean 8.2%) surveys across all three regions, representing an approximate 50% reduction in cover. Model selection across five candidate models using LOO-CV and stacking weights identified M3 (pre-eruption cover + distance + wave height PDC0) as the best supported model (stacking weight = 0.48), followed by M2 (distance only; stacking weight = 0.34). All LOO-CV elpd_diff values were within 2 SE of zero, indicating no model was decisively superior. Given high Pareto k diagnostics across all models (>96% of observations k > 0.7), stacking weights are reported as the preferred comparison metric. The model converged well with all Rhat = 1.00 and effective sample sizes well above acceptable thresholds.

Site-level variation dominated the random effect structure (sd_site = 1.17, 95% CI: 1.01–1.35), accounting for the largest proportion of variance in post-eruption coral cover, with transect-level patchiness (sd_transect = 0.61, 95% CI: 0.56–0.67) and regional differences (sd_region = 2.71, 95% CI: 1.50–5.23) contributing less. The persistence of strong site-level variance after accounting for all fixed predictors indicates that unmeasured site-level factors beyond wave exposure and distance from the volcano contributed substantially to determining post-eruption coral cover.

Wave height from the PDC0 tsunami model was the only fixed effect predictor with a credible interval excluding zero (β = −0.43, 95% CI: −0.73 to −0.15), with higher wave exposure associated with lower post-eruption coral cover. Pre-eruption coral cover did not reliably predict post-eruption cover (β = 0.08, 95% CI: −0.04 to 0.21), suggesting the HTHH eruption was a levelling event that overrode baseline coral state — sites with high pre-eruption cover were not buffered against post-eruption decline. Distance from the volcano was also not a reliable predictor after accounting for wave exposure and random effects (β = 0.49, 95% CI: −0.15 to 1.13), likely reflecting confounding between distance and the regional random effect given only three regions spanning the distance gradient.

Show code
pars_m3 <- m3 |> get_variables() |> str_subset("^b_.*|^sd_.*")
m3$fit |> stan_trace(pars = pars_m3) +
  labs(title = paste("Trace plots —", m_best_name))

Show code
p_rhat <- m3$fit |> stan_rhat(pars = pars_m3) +
  labs(title = "Rhat")
p_ess  <- m3$fit |> stan_ess(pars = pars_m3) +
  labs(title = "ESS ratio")
p_rhat / p_ess

M3 trace plots: All fixed effect chains show good mixing — classic hairy caterpillar pattern forb_pre_cover_scaled, b_dist_scaled and b_wave_PDC0_scaled display stable sampling with all four chains overlapping throughout. sd_region_id shows wider excursions and occasional spikes — expected with only 3 region levels. sd_site_id and sd_transect_id mix well.

Rhat: All parameters < 1.001 — well within the convergence threshold of 1.01.

ESS ratios range ~0.40–1.25, all well above the 0.1 minimum. Convergence is clean.

Rhat < 1.01 & ESS ratio > 0.1 thresholds: Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2021). Rank-normalization, folding, and localization: An >improved Rhat for assessing convergence of MCMC. Bayesian Analysis, 16(2), 667–718.

13.2 Prior vs posterior

For the best model, this shows how much the data updated beliefs about each fixed effect coefficient. For coral — unlike sargassum — we expect the data to update the priors meaningfully for pre_cover_scaled since the pre-cover distribution is well spread and informative. A very tight posterior for pre_cover_scaled would confirm strong persistence signal. Wide posteriors for distance and wave (similar to sargassum) would again suggest confounding with the regional random effect.

Show code
# get all prior parameter names
prior_pars <- variables(m3) |> str_subset("^prior_")

# get all posterior parameter names  
post_pars <- variables(m3) |> str_subset("^b_|^sd_")

# combine and plot
plot_pars <- intersect(c(prior_pars, post_pars), variables(m3))

mcmc_areas(as.array(m3), pars = plot_pars) +
  labs(title = "Prior vs posterior — M3: pre_cover + dist + wave PDC0")

Result of Prior vs Posterior: Each parameter has two distributions plotted — the prior (wide, flat) and the posterior (narrow, peaked). The narrower the posterior relative to the prior, the more the data updated beliefs about that parameter. The posterior is the updated probability distribution for each parameter after the model has seen all 416 transects of coral count data. It represents what the model believes about each parameter value given both the prior and the data combined.

Reading each parameter: prior_b — wide prior spanning roughly −5 to +5, confirming normal(0,1) is weakly informative for the fixed effects.

b_pre_cover_scaled, b_dist_scaled, b_wave_PDC0_scaled — all three posteriors are very tight and close to zero relative to the prior. The data have updated these strongly — the posterior is much narrower than the prior.

prior_sd_region_id, prior_sd_site_id, prior_sd_transect_id — all show very wide priors extending far to the right (up to ~20), confirming student_t(3, 0, 0.5) has heavy tails.

sd_region_id__Intercept — posterior is still quite wide with a long right tail — the data only partially updated this, consistent with only 3 region levels.

sd_site_id__Intercept and sd_transect_id__Intercept — very tight posteriors, data were highly informative at these levels given 104 sites and 416 transects.

13.3 Posterior predictive checks

Four PPC panels for the best model. Density overlay: simulated count proportions should track the observed distribution — for coral (mean ~8%, few zeros) the model should reproduce both the mode and the right tail without the zero-inflation issues seen in sargassum. Mean: the vertical line (observed mean) should sit well within the simulated distribution; a systematic offset indicates the intercept is mis-specified. Maximum: checks whether the model can reproduce extreme high-cover transects — binomial models sometimes under-predict the tail if overdispersion is present (consider adding an observation-level random effect if the max is consistently underpredicted). Proportion zeros: with only ~1.7% pre-cover zeros and a post-eruption distribution still containing points, the zero proportion should be low and well-captured; poor zero reproduction here would suggest more zero-inflation than expected. Overall, the coral PPC should look better than sargassum given the more continuous response.

Show code
set.seed(42)
p1 <- pp_check(m3, type = "dens_overlay", ndraws = 100) +
  labs(title = "PPC: Density")
p2 <- pp_check(m3, type = "stat", stat = "mean") +
  labs(title = "PPC: Mean")
p3 <- pp_check(m3, type = "stat", stat = "max") +
  labs(title = "PPC: Maximum")
p4 <- pp_check(m3, type = "stat", stat = prop_zero) +
  labs(title = "PPC: Proportion zeros")
(p1 + p2) / (p3 + p4) +
  plot_annotation(title = paste("PPC —", m_best_name))

M3 posterior predictive check: Excellent fit across all four panels, consistent with M0 PPC quality.

Density: simulated replicates closely track the observed distribution - the left-skewed shape with a mode near zero and long right tail is well reproduced.

Mean: observed mean (155) sits near the centre of the predictive distribution with no systematic bias.

Maximum: observed max (972) falls near the mode of the predictive distribution — the model reproduces extreme high-cover transects well.

Proportion zeros: observed zero proportion (~0.005) sits within the predictive distribution and is very low, consistent with hard coral being present at nearly all transects. Adding pre-cover, distance and wave PDC0 as fixed predictors has not degraded the model fit relative to M0 — the binomial likelihood is well specified for these data.

13.4 Conditional effects

Marginal effect plots for each fixed predictor, holding others at their mean. conditions = data.frame(TOTAL = 1859) sets the trial denominator to the dataset mean so predictions are on the cover (proportion) scale rather than raw counts. pre_cover_scaled: expect a positive slope — higher baseline coral predicts higher post-eruption cover; a steep slope would indicate strong persistence. dist_scaled: the direction indicates whether sites closer to or further from the volcano retained more coral; a negative slope would mean greater loss near the eruption. wave_scaled: a negative slope would indicate greater coral loss at wave-exposed sites, consistent with physical disturbance. Note that these are marginal (average across random effects) so the uncertainty bands will be wide — the random effects absorb much of the site-level variation. Points are individual transect observations and should show substantial scatter around the smooth.

Show code
m3 |>
  conditional_effects(conditions = data.frame(TOTAL = 1859)) |>
  plot(points = TRUE)

Show code
#plot allc conditional effects in a 3 panel plot
ce_plots <- plot(
  m3 |> conditional_effects(conditions = data.frame(TOTAL = 1859)),
  points = TRUE,
  plot = FALSE  # returns list of ggplot objects without printing
)

ce_plots$pre_cover_scaled + 
ce_plots$dist_scaled + 
ce_plots$wave_PDC0_scaled +
plot_layout(ncol = 3) +
plot_annotation(title = "Conditional effects — M3: pre_cover + dist + wave PDC0")

Conditional effects — M3: All three predictors shown with other predictors held at their mean (z = 0). Predicted counts are out of TOTAL = 1859 points. Note predicted values sit around 500–1000 counts (~27–54% cover) — high relative to the actual observed post-eruption mean of ~8% — this may be due to the absence of a global intercept in the 0 + parameterisation.

pre_cover_scaled: Shallow positive slope with a wide uncertainty band that includes zero across the full range — consistent with the non-significant coefficient (β = 0.08, CI crosses zero). The data points show high scatter with no clear trend.

dist_scaled: Positive slope — predicted coral cover increases with distance from the volcano. Wide uncertainty band consistent with the non-significant coefficient (β = 0.49, CI crosses zero).

wave_PDC0_scaled: Clear negative slope — predicted coral cover decreases with increasing wave exposure. The uncertainty band is narrower than for the other two predictors, consistent with wave PDC0 being the only coefficient with a CI excluding zero (β = −0.43, CI: −0.73 to −0.15).

Show code
#extract original scale of the wave variable to back-transform the x axis from z-scores to the actual wave height units
#mean(coral$wave_PDC0_scaled)   # should be ~0
#sd(coral$wave_PDC0_scaled)     # need this to back-transform
#mean(coral$pdc0_4km3_fr1e2_etamax_end)  # original wave mean
#sd(coral$pdc0_4km3_fr1e2_etamax_end)    # original wave SD
#range(coral$pdc0_4km3_fr1e2_etamax_end) # original wave range

# extract conditional effects for wave_PDC0_scaled
#ce_wave <- conditional_effects(
#  m3,
#  effects    = "wave_PDC0_scaled",
#  conditions = data.frame(TOTAL = 1859)
#)

# extract the data frame from the conditional effects object
#ce_df <- ce_wave$wave_PDC0_scaled

# back-transform x axis from scaled z-score to original wave height (m)
# original = z * sd + mean
#wave_mean <- 1.29284
#wave_sd   <- 1.195784

#ce_df <- ce_df %>%
#  mutate(
#   wave_original = wave_PDC0_scaled * wave_sd + wave_mean,  # back-transform x
#    cover_pct     = estimate__  / 1859 * 100,                # convert count to %
#    lower_pct     = lower__     / 1859 * 100,                # lower 95% CI to %
#    upper_pct     = upper__     / 1859 * 100                 # upper 95% CI to %
#  )

# plot
#ggplot(ce_df, aes(x = wave_original, y = cover_pct)) +
#  geom_ribbon(aes(ymin = lower_pct, ymax = upper_pct),
#              fill = "grey80", alpha = 0.6) +
#  geom_line(linewidth = 1) +
#  labs(
#    x     = "Maximum wave height — PDC0 (m)",
#    y     = "Predicted hard coral cover (%)",
#    title = "Predicted post-eruption hard coral cover by wave exposure",
#    caption = "Posterior mean with 95% credible interval. Other predictors held at mean."
#  ) +
#  theme_classic()
Show code
# posterior predictive marginal effects plot

# create a sequence of wave values across the observed range
wave_seq <- seq(
  min(coral$pdc0_4km3_fr1e2_etamax_end),
  max(coral$pdc0_4km3_fr1e2_etamax_end),
  length.out = 100
)

# create a prediction dataframe using observed mean values for other predictors
pred_df <- data.frame(
  wave_PDC0_scaled = (wave_seq - wave_mean) / wave_sd,
  pre_cover_scaled = mean(coral$pre_cover_scaled),  # = 0
  dist_scaled      = mean(coral$dist_scaled),        # = 0
  TOTAL            = 1859
)

# get fitted values with uncertainty
fitted_vals <- fitted(m3, newdata = pred_df, 
                      re_formula = NA,  # marginalise over random effects
                      summary = TRUE)

# combine and convert to %
p <-pred_df %>%
  mutate(
    wave_original = wave_seq,
    cover_pct     = fitted_vals[, "Estimate"]  / 1859 * 100,
    lower_pct     = fitted_vals[, "Q2.5"]      / 1859 * 100,
    upper_pct     = fitted_vals[, "Q97.5"]     / 1859 * 100
  ) %>%
  ggplot(aes(x = wave_original, y = cover_pct)) +
  geom_ribbon(aes(ymin = lower_pct, ymax = upper_pct),
              fill = "grey80", alpha = 0.6) +
  geom_line(linewidth = 1) +
  labs(
    x       = "Maximum wave height — PDC0 (m)",
    y       = "Predicted hard coral cover (%)",
    title   = "Predicted post-eruption hard coral cover by wave exposure",
    caption = "Posterior mean with 95% credible interval. Other predictors held at mean."
  ) +
  theme_classic()
p

Predicted post-eruption hard coral cover by wave exposure: Model-derived predictions from M3 showing the relationship between maximum modelled wave height (PDC0, metres) and predicted post-eruption hard coral cover (%). The black line is the posterior mean prediction and the grey band is the 95% credible interval. Predictions are for a hypothetical average transect (TOTAL = 1859 points) with distance and pre-cover held at their mean values (z = 0). Predicted cover declines with increasing wave exposure, consistent with the negative wave coefficient (β = −0.43, 95% CI: −0.73 to −0.15). These are model-predicted proportions.

Predictions are unrealistically high for coral cover for post hard coral. Mean cover for hardcoral in the post period is ~8%

13.5 Coefficient plot

Posterior distributions for all fixed effects on the logit scale, with 95% credible intervals. The dashed line at zero is the null hypothesis (no effect). Unlike sargassum, where all CIs crossed zero, coral is expected to show at least one clear directional effect given the large pre-post decline (16% → 8%). Key things to look for: (1) b_pre_cover_scaled — does the CI exclude zero? A positive coefficient entirely above zero confirms that pre-eruption cover is a reliable predictor of post-eruption retention. (2) b_dist_scaled — direction and CI overlap with zero. If it crosses zero, distance is still confounded with region despite being included as a fixed effect. (3) b_wave_scaled — negative coefficient not crossing zero would be ecologically meaningful (wave exposure drove coral loss). A tight cluster of all CIs crossing zero would indicate that, as with sargassum, the random effects are doing most of the explanatory work.

Show code
mcmc_areas(
  as.array(m3),
  pars = m3 |>
    get_variables() |>
    str_subset("^b_") |>
    str_subset("prior", negate = TRUE),
  prob = 0.95
) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey50") +
  labs(
    title    = paste("Posterior coefficients —", m_best_name),
    subtitle = "95% CI. Negative = lower hard coral cover. CI not crossing 0 = reliable effect.",
    x        = "Estimate (logit scale)"
  )

Posterior coefficient plot — M3: Posterior distributions for all three fixed effect coefficients on the logit scale. The dashed vertical line at zero is the null (no effect). The shaded region represents the 95% credible interval.

b_pre_cover_scaled: Narrow posterior centred just above zero, with the 95% CI straddling zero — no reliable effect of pre-eruption cover on post-eruption coral cover.

b_dist_scaled: Wide, flat posterior spanning from negative to strongly positive values, with the 95% CI crossing zero — no reliable effect of distance from the volcano on post-eruption coral cover. The wide posterior reflects high uncertainty in this coefficient.

b_wave_PDC0_scaled: Posterior centred around −0.43 and entirely to the left of zero — the 95% CI does not cross zero, confirming wave exposure is the only reliable predictor of post-eruption coral cover in M3. Higher wave exposure is associated with lower coral cover.


14. Results — Template

The results table reports posterior mean, SD, and 95% credible interval for each fixed effect coefficient on the logit scale. prob_neg is the posterior probability that the coefficient is negative — values near 0 or 1 indicate directional certainty; values near 0.5 indicate no clear direction. To interpret coefficient magnitudes: on the logit scale, a coefficient of ±0.5 corresponds to a moderate shift in cover probability; ±1 is a large shift. Back-transform key estimates using plogis(intercept + coef) to express predicted cover at specific predictor values.

For the template paragraph below: fill in [VALUES] from the actual model output. The note at the end about comparison to sargassum is important context — if coral fixed-effect CIs also cross zero, discuss this explicitly as evidence that regional identity (captured by the random effects) is the dominant spatial predictor of post-eruption benthic change across both functional groups.

Show code
as_draws_df(m3) %>%                          # extract posterior draws as a dataframe
  select(starts_with("b_")) %>%              # keep only fixed effect coefficient columns
  select(-contains("prior")) %>%             # remove prior draw columns
  pivot_longer(everything(),                 # reshape from wide to long format —
               names_to = "parameter",       # one row per parameter per draw
               values_to = "value") %>%
  group_by(parameter) %>%                    # group by parameter name
  summarise(
    mean     = round(mean(value), 3),        # posterior mean
    sd       = round(sd(value), 3),          # posterior standard deviation
    lower_95 = round(quantile(value, 0.025), 3),  # lower 95% credible interval
    upper_95 = round(quantile(value, 0.975), 3),  # upper 95% credible interval
    prob_neg = round(mean(value < 0), 3),    # posterior probability that coefficient is negative
    .groups  = "drop"
  ) %>%
  kable(caption = paste("Fixed effects —", m_best_name,
                        ". prob_neg = P(estimate < 0)."))  # render as table in quarto
Fixed effects — M3: pre_cover + dist + wave PDC0 . prob_neg = P(estimate < 0).
parameter mean sd lower_95 upper_95 prob_neg
b_dist_scaled 0.489 0.329 -0.148 1.130 0.067
b_pre_cover_scaled 0.080 0.064 -0.044 0.208 0.102
b_wave_PDC0_scaled -0.433 0.148 -0.727 -0.148 0.998

Fixed effects table — M3: Posterior means, SDs and 95% credible intervals on the logit scale. prob_neg is the posterior probability that the coefficient is negative — values near 1 indicate strong directional certainty.

b_wave_PDC0_scaled (mean = −0.433, 95% CI: −0.727 to −0.148, prob_neg = 0.998) — 99.8% posterior probability that the wave effect is negative, confirming wave exposure reliably predicts lower post-eruption coral cover.

b_dist_scaled (mean = 0.489, 95% CI: −0.148 to 1.130, prob_neg = 0.067) — 93.3% posterior probability that the distance effect is positive, but CI crosses zero so not reliable.

b_pre_cover_scaled (mean = 0.080, 95% CI: −0.044 to 0.208, prob_neg = 0.102) — 89.8% posterior probability that the pre-cover effect is positive, but CI crosses zero so not reliable.

Template results paragraph — replace [VALUES] with actual output: NOT DONE YET.

The null model (M0) revealed that variation in post-eruption hard coral cover was predominantly structured at the site level (sd_site = [VALUE], 95% CI: [LOWER]–[UPPER]), with smaller variation at the transect level (sd_transect = [VALUE]) and uncertain regional-level variation (sd_region = [VALUE], wide CI reflecting only 3 regions).

Model comparison via LOO-CV and Bayesian model weights identified [MODEL NAME] as best supported (LOO weight = [VALUE]). Pre-eruption hard coral cover (M1) [substantially/weakly] improved predictions (ΔELPD = [VALUE] ± [SE]), consistent with sites that had more coral pre-eruption also retaining more post-eruption. Distance from the volcano (M2) [improved/did not improve] fit further (ΔELPD = [VALUE] ± [SE]). The [GN/SV/PDC0] wave model (M3) received [the highest/similar] weight ([VALUE]).

In the best model, pre-eruption cover showed a [positive] association with post-eruption coral (β = [VALUE], 95% CI: [LOWER]–[UPPER]) — sites with higher baseline retained more coral post-eruption. Distance showed a [negative/positive] effect (β = [VALUE], 95% CI: [LOWER]–[UPPER]) — [greater/less] coral loss at sites [closer to/further from] the volcano. Wave height showed a [negative] effect (β = [VALUE], 95% CI: [LOWER]–[UPPER]) — [greater coral loss at more wave-exposed sites/ update based on actual direction].

Note: unlike sargassum where all fixed effect CIs crossed zero, hard coral coefficients are expected to show clearer directional effects given the substantial pre-post decline (mean 16.0% → 8.2%) and the more continuous pre-cover distribution.


15. Session Info

Show code
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS 26.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Brisbane
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] flextable_0.9.11        DHARMa_0.4.7            emmeans_1.11.1         
 [4] effects_4.2-5           carData_3.0-5           rstan_2.36.0.9000      
 [7] StanHeaders_2.36.0.9000 broom.mixed_0.2.9.6     broom_1.0.7            
[10] future_1.70.0           knitr_1.49              here_1.0.2             
[13] patchwork_1.3.0         loo_2.8.0               bayesplot_1.11.1       
[16] tidybayes_3.0.7         brms_2.22.0             Rcpp_1.1.0             
[19] lubridate_1.9.4         forcats_1.0.0           stringr_1.5.1          
[22] dplyr_1.1.4             purrr_1.0.4             readr_2.1.5            
[25] tidyr_1.3.1             tibble_3.3.0            ggplot2_4.0.0          
[28] tidyverse_2.0.0         MASS_7.3-61            

loaded via a namespace (and not attached):
 [1] DBI_1.2.3               Rdpack_2.6.3            gridExtra_2.3          
 [4] inline_0.3.21           rlang_1.1.6             magrittr_2.0.3         
 [7] furrr_0.3.1             matrixStats_1.5.0       compiler_4.4.2         
[10] systemfonts_1.3.2       vctrs_0.6.5             pkgconfig_2.0.3        
[13] arrayhelpers_1.1-0      fastmap_1.2.0           backports_1.5.0        
[16] rmarkdown_2.29          tzdb_0.5.0              nloptr_2.2.1           
[19] ragg_1.3.3              xfun_0.52               jsonlite_2.0.0         
[22] uuid_1.2-1              parallel_4.4.2          R6_2.6.1               
[25] stringi_1.8.7           RColorBrewer_1.1-3      parallelly_1.46.1      
[28] boot_1.3-31             estimability_1.5.1      Matrix_1.7-1           
[31] splines_4.4.2           nnet_7.3-19             timechange_0.3.0       
[34] tidyselect_1.2.1        rstudioapi_0.17.1       abind_1.4-8            
[37] yaml_2.3.10             codetools_0.2-20        curl_6.2.0             
[40] listenv_0.9.1           pkgbuild_1.4.8          lattice_0.22-6         
[43] withr_3.0.2             bridgesampling_1.1-2    S7_0.2.0               
[46] askpass_1.2.1           posterior_1.6.1         coda_0.19-4.1          
[49] evaluate_1.0.3          survival_3.7-0          RcppParallel_5.1.10    
[52] survey_4.5              zip_2.3.2               xml2_1.3.6             
[55] ggdist_3.3.2            pillar_1.11.0           tensorA_0.36.2.1       
[58] checkmate_2.3.2         stats4_4.4.2            insight_1.3.1          
[61] reformulas_0.4.0        distributional_0.5.0    generics_0.1.4         
[64] rprojroot_2.1.1         hms_1.1.3               rstantools_2.4.0       
[67] scales_1.4.0            minqa_1.2.8             globals_0.19.1         
[70] xtable_1.8-4            glue_1.8.0              gdtools_0.5.0          
[73] tools_4.4.2             data.table_1.17.0       lme4_1.1-37            
[76] mvtnorm_1.3-3           grid_4.4.2              mitools_2.4            
[79] rbibutils_2.3           QuickJSR_1.8.0          colorspace_2.1-1       
[82] nlme_3.1-166            cli_3.6.5               textshaping_1.0.0      
[85] officer_0.7.3           fontBitstreamVera_0.1.1 svUnit_1.0.6           
[88] Brobdingnag_1.2-9       V8_6.0.4                gtable_0.3.6           
[91] fontquiver_0.2.1        digest_0.6.37           htmlwidgets_1.6.4      
[94] farver_2.1.2            htmltools_0.5.8.1       lifecycle_1.0.4        
[97] openssl_2.3.2           fontLiberation_0.1.0