This notebook cleans and analyses the Round-2 Expert Elicitation (EE) export to produce policy-relevant diagnostics of:
The survey is structured around taxa × habitat group
combinations. Some taxa are represented by multiple
survey_taxa rows within the same habitat group
(notably freshwater assemblages and lichens). To make sure headline
results are not an artefact of that survey-row multiplicity, the
analysis reports key summaries under two transparent aggregation
assumptions:
survey_taxa
rows contribute as-is (faithful to survey structure).Where A and B agree, conclusions are robust to the aggregation choice; where they differ, results should be interpreted as sensitive to how survey rows represent ecological “units”.
This analysis is structured as expert × taxa:habitat (survey row) × scenario observations.
Policy magnitudes (Q2) are explicitly defined as changes relative to the counterfactual (Q1). Therefore the implied signed outcome under policy is:
\[ \text{Absolute policy outcome (signed \%)} = \text{Counterfactual signed \%} + \text{Policy delta signed \%} \]
This ensures we keep separate:
Example (signed magnitudes): - If Q1b = 20 and Q1a indicates “decrease” → counterfactual = −20% - If Q2b = 10 and Q2a indicates “increase” → delta = +10% - Absolute outcome under policy = −20 + 10 = −10%
In the code, this quantity is stored as:
abs_policy_mag_signed = cf_mag_signed + pol_mag_signed
Outputs include summary tables, compact plots, category distributions, and a panel-size sensitivity test (resampling stability).
# Packages
# ============================================================
library(readxl)
library(dplyr)
library(tidyr)
library(stringr)
library(purrr)
library(readr)
library(ggplot2)
library(forcats)
library(knitr)
library(kableExtra)
library(scales)
# ------------------------------------------------------------
# Input file - update 2 March 2026
# ------------------------------------------------------------
#xlsx_path <- "./data/p591243814707_benjamin.firth_38179087.xlsx"
xlsx_path <- "./data/p591243814707_benjamin.firth_38179087_update_260302.xlsx"
df_raw <- read_excel(xlsx_path)
tibble(
n_rows = nrow(df_raw),
n_cols = ncol(df_raw),
n_Q_cols = sum(str_detect(names(df_raw), "^Q\\d"))
) %>%
kable(caption = "Raw Excel shape") %>%
kable_styling(full_width = FALSE)
| n_rows | n_cols | n_Q_cols |
|---|---|---|
| 7 | 445 | 420 |
Some respondent cells contain HTML tags
(e.g. <span style="...">No meaningful change</span>).
We remove tags while preserving the displayed text.
Some, but not all respondent cells contain HTML (e.g. No meaningful change). Remove tags while preserving text:
strip_html <- function(x) {
if (is.character(x) || inherits(x, "factor")) {
x2 <- as.character(x)
x2 <- str_replace_all(x2, "<[^>]+>", "")
str_squish(x2)
} else {
x
}
}
df_clean <- df_raw %>%
mutate(across(everything(), strip_html))
id_gj) mapped from
Contact_name (keep all experts)Two respondents are missing some metadata but do have
Contact_name. To keep all experts while removing
identifiable information, we assign a stable integer ID (1..N) based on
Contact_name and do not carry forward names.
df_clean <- df_clean %>%
mutate(Contact_name = str_squish(as.character(Contact_name)))
contact_key <- df_clean %>%
distinct(Contact_name) %>%
filter(!is.na(Contact_name) & Contact_name != "") %>%
arrange(Contact_name) %>%
mutate(id_gj = row_number())
# save if needed later
#write.csv(contact_key, "./data/respondent_key_lookup.csv", row.names = FALSE)
# Expert mapping: Contact_name → id_gj
df1 <- df_clean %>%
left_join(contact_key, by = "Contact_name")
df_tidy (expert ×
survey_taxa × scenario)We pivot all Q* columns into long format, parse
scenario, then pivot wide.
Parsing rules:
Q1* = "counterfactual" scenarioQ2*_*_{1..4} =
"policy_1".."policy_4" scenariodf_q <- df1 %>%
select(id_gj, matches("^Q\\d"))
long_q <- df_q %>%
pivot_longer(
cols = matches("^Q\\d"),
names_to = "colname",
values_to = "value",
values_transform = list(value = as.character) # avoid type conflicts
) %>%
mutate(
q_code = str_match(colname, "^(Q\\d+[a-z])_\\d+")[,2],
idx = as.integer(str_match(colname, "^(Q\\d+[a-z])_(\\d+)")[,3]),
scen_num = str_match(colname, "^(Q\\d+[a-z])_\\d+_(\\d+)")[,3],
scen_num = if_else(is.na(scen_num), NA_integer_, as.integer(scen_num)),
scenario = case_when(
str_detect(q_code, "^Q1") ~ "counterfactual",
str_detect(q_code, "^Q2") & !is.na(scen_num) ~ paste0("policy_", scen_num),
TRUE ~ NA_character_
),
question = str_trim(str_match(colname, ":\\s*([^\\(]+)\\s*\\(")[,2]),
survey_taxa = str_trim(str_match(colname, "\\((.*)\\)\\s*$")[,2])
) %>%
mutate(
col_out = paste0(
q_code, "_",
str_replace_all(str_to_lower(question), "[^a-z0-9]+", "_") %>%
str_replace_all("^_|_$", "")
)
) %>%
filter(!is.na(scenario)) %>%
select(id_gj, survey_taxa, scenario, col_out, value)
# reshape_wide
df_tidy <- long_q %>%
pivot_wider(
id_cols = c(id_gj, survey_taxa, scenario),
names_from = col_out,
values_from = value,
values_fn = ~ {
if (length(.x) == 0) NA_character_
else if (length(.x) == 1) .x
else paste(.x, collapse = " | ")
}
) %>%
arrange(id_gj, survey_taxa, scenario)
df_tidy %>%
count(scenario) %>%
kable(caption = "Scenario counts in df_tidy") %>%
kable_styling(full_width = FALSE)
| scenario | n |
|---|---|
| counterfactual | 147 |
| policy_1 | 147 |
| policy_2 | 147 |
| policy_3 | 147 |
| policy_4 | 147 |
habitat parsingGoal: 1. Keep habitat_group logic
exactly as-is (broad keyword mapping).
2. Set habitat_type to NA for all rows EXCEPT the
five listed survey_taxa values, where we want a human-readable,
standardised habitat_type label.
Why: - Many survey_taxa strings do not have a trailing “(…)” substring, and for reporting we only want habitat_type populated for the handful of taxa where we explicitly want a more granular label.
Output targets (exact): - “Ditch & Wetland Channel Freshwater invert assemblages” -> “Ditch & Wetland Channel” - “Running-Water Freshwater invert assemblages (Rivers/Streams)” -> “Running-Water (Rivers/Streams)” - “Standing-Water Freshwater invert assemblages (Ponds/Lakes)” -> “Standing-Water (Ponds/Lakes)” - “Saxicolous Lichens (Rock & Stone)” -> “Saxicolous (Rock & Stone)” - “Terricolous Heath/Bog Lichens” -> “Terricolous (Heath/Bog)”
# ============================================================
# Habitat parsing
# ============================================================
# Helper:
# Return NA unless survey_taxa is one of the five special cases
habitat_type <- function(survey_taxa) {
x <- stringr::str_squish(as.character(survey_taxa))
dplyr::case_when(
# Freshwater assemblages
stringr::str_detect(x, "^Ditch\\s*&\\s*Wetland\\s*Channel\\s+Freshwater\\s+invert\\s+assemblages\\s*$") ~
"Ditch & Wetland Channel",
stringr::str_detect(x, "^Running\\-Water\\s+Freshwater\\s+invert\\s+assemblages\\s*\\(Rivers/Streams\\)\\s*$") ~
"Running-Water (Rivers/Streams)",
stringr::str_detect(x, "^Standing\\-Water\\s+Freshwater\\s+invert\\s+assemblages\\s*\\(Ponds/Lakes\\)\\s*$") ~
"Standing-Water (Ponds/Lakes)",
# Lichens
stringr::str_detect(x, "^Saxicolous\\s+Lichens\\s*\\(Rock\\s*&\\s*Stone\\)\\s*$") ~
"Saxicolous (Rock & Stone)",
stringr::str_detect(x, "^Terricolous\\s+Heath/Bog\\s+Lichens\\s*$") ~
"Terricolous (Heath/Bog)",
TRUE ~ NA_character_
)
}
df_tidy <- df_tidy %>%
dplyr::mutate(
# --- habitat_type: NA except for five explicitly defined taxa ---
habitat_type = habitat_type(survey_taxa),
# Primary classification text for habitat_group mapping:
# keep our original intent: use habitat_type if present, else survey_taxa.
hg_txt = stringr::str_to_lower(dplyr::coalesce(habitat_type, survey_taxa)),
# --- habitat_group mapping (UNCHANGED) ---
habitat_group = dplyr::case_when(
stringr::str_detect(hg_txt, "wetland|aquatic|water\\b|water\\-|water\\s+margin|river|stream|rivers|streams|pond|ponds|lake|lakes|ditch|channel|riparian|rheophytic|rheo") ~
"Freshwater/Wetland",
stringr::str_detect(hg_txt, "woodland|parkland|scrub|deadwood|epiphyt|humidity|veteran|ancient|saproxylic") ~
"Woodland",
stringr::str_detect(hg_txt, "grassland|heath|bog|open\\-habitat|open\\s+habitat|open\\s+ground|bare\\-ground|bare\\s+ground|exposed\\s+substrates?|terricolous") ~
"Open habitats",
stringr::str_detect(hg_txt, "rock|stone|saxicolous") ~
"Open habitats",
TRUE ~ "Unknown"
),
# Force non-missing even if something slips through
habitat_group = dplyr::if_else(is.na(habitat_group) | habitat_group == "", "Unknown", habitat_group),
# taxa grouping (unchanged idea; keyword-based)
taxa = dplyr::case_when(
stringr::str_detect(survey_taxa, stringr::regex("\\bDiptera\\b", ignore_case = TRUE)) ~ "Diptera",
stringr::str_detect(survey_taxa, stringr::regex("\\bBeetles\\b", ignore_case = TRUE)) ~ "Beetles",
stringr::str_detect(survey_taxa, stringr::regex("Vascular Plants\\b", ignore_case = TRUE)) ~ "Vascular plants",
stringr::str_detect(survey_taxa, stringr::regex("\\bLichens\\b", ignore_case = TRUE)) ~ "Lichens",
stringr::str_detect(survey_taxa, stringr::regex("\\bBryophytes?\\b", ignore_case = TRUE)) ~ "Bryophytes",
stringr::str_detect(survey_taxa, stringr::regex("\\bSpiders\\b", ignore_case = TRUE)) ~ "Spiders",
stringr::str_detect(survey_taxa, stringr::regex("Freshwater invert assemblages\\b", ignore_case = TRUE)) ~ "Freshwater invertebrate assemblages",
TRUE ~ "Other/unknown"
)
) %>%
dplyr::select(-hg_txt)
# Diagnostics: do we have any NA habitat_group now?
df_tidy %>%
summarise(
n_rows = n(),
n_habitat_group_na = sum(is.na(habitat_group)),
n_habitat_group_unknown = sum(habitat_group == "Unknown", na.rm = TRUE)
) %>%
kable(caption = "habitat_group diagnostics (should have 0 NA)") %>%
kable_styling(full_width = FALSE)
| n_rows | n_habitat_group_na | n_habitat_group_unknown |
|---|---|---|
| 735 | 0 | 0 |
# Coverage: show habitat_group by survey_taxa (useful spot-check)
df_tidy %>%
distinct(survey_taxa, habitat_type, habitat_group) %>%
arrange(habitat_group, survey_taxa) %>%
kable(caption = "Mapping check: survey_taxa to habitat_group") %>%
kable_styling(full_width = FALSE) %>%
scroll_box(height = "320px")
| survey_taxa | habitat_type | habitat_group |
|---|---|---|
| Ditch & Wetland Channel Freshwater invert assemblages | Ditch & Wetland Channel | Freshwater/Wetland |
| Running-Water Freshwater invert assemblages (Rivers/Streams) | Running-Water (Rivers/Streams) | Freshwater/Wetland |
| Standing-Water Freshwater invert assemblages (Ponds/Lakes) | Standing-Water (Ponds/Lakes) | Freshwater/Wetland |
| Wetland & Aquatic Beetles | NA | Freshwater/Wetland |
| Wetland & Aquatic Vascular Plants | NA | Freshwater/Wetland |
| Wetland & Rheophytic Bryophytes | NA | Freshwater/Wetland |
| Wetland & Riparian Spiders | NA | Freshwater/Wetland |
| Wetland & Water-Margin Diptera | NA | Freshwater/Wetland |
| Open Grassland & Heath Vascular Plants | NA | Open habitats |
| Open Grassland/Heath/Bare-Ground Spiders | NA | Open habitats |
| Open-Ground Bryophytes (Grassland, Heath, Exposed Substrates) | NA | Open habitats |
| Open-Habitat & Flower-Visiting Diptera | NA | Open habitats |
| Open-Habitat Ground Beetles (Grassland & Heath) | NA | Open habitats |
| Saxicolous Lichens (Rock & Stone) | Saxicolous (Rock & Stone) | Open habitats |
| Terricolous Heath/Bog Lichens | Terricolous (Heath/Bog) | Open habitats |
| Epiphytic Lichens (Woodland & Parkland) | NA | Woodland |
| Saproxylic & Woodland Diptera | NA | Woodland |
| Saproxylic Beetles (Woodland Deadwood) | NA | Woodland |
| Woodland & Scrub Spiders | NA | Woodland |
| Woodland & Scrub Vascular Plants | NA | Woodland |
| Woodland Humidity & Epiphyte Bryophytes | NA | Woodland |
We now translate/compute:
cf_mag_signed + pol_mag_signedImportant convention: responses of “do not know/unsure” are treated the same as missing.
# ============================================================
# Scoring helpers
# ============================================================
score_dist <- function(x){
x0 <- str_squish(tolower(as.character(x)))
x0[x0 %in% c("do not know/unsure","do not know","unsure","don't know","dont know","dk","")] <- NA_character_
case_when(
is.na(x0) ~ NA_real_,
str_detect(x0, "robust|significant|large") & str_detect(x0, "decrease") ~ 1,
str_detect(x0, "moderate") & str_detect(x0, "decrease") ~ 2,
str_detect(x0, "no meaningful|no change") ~ 3,
str_detect(x0, "moderate") & str_detect(x0, "increase") ~ 4,
str_detect(x0, "robust|significant|large") & str_detect(x0, "increase") ~ 5,
TRUE ~ NA_real_
)
}
score_cert <- function(x){
x0 <- str_squish(tolower(as.character(x)))
x0[x0 %in% c("do not know/unsure","do not know","unsure","don't know","dont know","dk","")] <- NA_character_
case_when(
is.na(x0) ~ NA_real_,
x0 %in% c("very low","very_low","very-low") ~ 1,
x0 %in% c("low") ~ 2,
x0 %in% c("medium","moderate") ~ 3,
x0 %in% c("high") ~ 4,
x0 %in% c("very high","very_high","very-high") ~ 5,
TRUE ~ NA_real_
)
}
dir_sign <- function(x){
x0 <- str_squish(tolower(as.character(x)))
x0[x0 %in% c("do not know/unsure","do not know","unsure","don't know","dont know","dk","")] <- NA_character_
case_when(
is.na(x0) ~ NA_real_,
str_detect(x0, "decrease") ~ -1,
str_detect(x0, "increase") ~ 1,
str_detect(x0, "no meaningful|no change") ~ 0,
TRUE ~ NA_real_
)
}
# ============================================================
# Score and sign responses
# ============================================================
Q1a <- "Q1a_likely_change_in_distribution"
Q1b <- "Q1b_magnitude_of_change"
Q1c <- "Q1c_your_certainty"
Q2a <- "Q2a_likely_change_in_distribution_compared_to_counterfactual"
Q2b <- "Q2b_magnitude_of_change_compared_to_the_counterfactual"
Q2c <- "Q2c_your_certainty"
df_scored <- df_tidy %>%
mutate(
# parse numeric magnitudes (percent scale assumed in survey)
Q1b_num = parse_number(as.character(.data[[Q1b]]), na = c("N/A","NA","na","n/a","")),
Q2b_num = parse_number(as.character(.data[[Q2b]]), na = c("N/A","NA","na","n/a","")),
# scored categories
cf_dist = score_dist(.data[[Q1a]]),
pol_dist = score_dist(.data[[Q2a]]),
cf_cert = score_cert(.data[[Q1c]]),
pol_cert = score_cert(.data[[Q2c]]),
# direction
cf_dir = dir_sign(.data[[Q1a]]),
pol_dir = dir_sign(.data[[Q2a]]),
# signed magnitudes
cf_mag_signed = if_else(is.na(Q1b_num) | is.na(cf_dir), NA_real_, Q1b_num * cf_dir),
pol_mag_signed = if_else(is.na(Q2b_num) | is.na(pol_dir), NA_real_, Q2b_num * pol_dir)
)
# ============================================================
# Attach counterfactual (Q1) values onto policy rows
# so we can compute implied absolute outcomes later
# ============================================================
scenario_levels <- c("policy_1","policy_2","policy_3","policy_4")
cf_lookup <- df_scored %>%
filter(scenario == "counterfactual") %>%
transmute(
id_gj, survey_taxa,
cf_dist2 = cf_dist,
cf_cert2 = cf_cert,
cf_mag_signed2 = cf_mag_signed
)
df_scored2 <- df_scored %>%
left_join(cf_lookup, by = c("id_gj", "survey_taxa")) %>%
mutate(
cf_dist = cf_dist2,
cf_cert = cf_cert2,
cf_mag_signed = cf_mag_signed2
) %>%
select(-cf_dist2, -cf_cert2, -cf_mag_signed2) %>%
mutate(
abs_policy_mag_signed = if_else(
scenario %in% scenario_levels,
cf_mag_signed + pol_mag_signed,
cf_mag_signed
),
delta_cert = if_else(
scenario %in% scenario_levels,
pol_cert - cf_cert,
NA_real_
)
)
df_scored2 %>%
summarise(
n_rows = n(),
n_experts = n_distinct(id_gj),
n_survey_taxa = n_distinct(survey_taxa),
n_cf_signed = sum(!is.na(cf_mag_signed)),
n_pol_delta_signed = sum(!is.na(pol_mag_signed)),
n_abs_policy_signed = sum(!is.na(abs_policy_mag_signed))
) %>%
kable(caption = "Coverage: signed magnitudes and absolute policy outcomes") %>%
kable_styling(full_width = FALSE)
| n_rows | n_experts | n_survey_taxa | n_cf_signed | n_pol_delta_signed | n_abs_policy_signed |
|---|---|---|---|---|---|
| 735 | 7 | 21 | 705 | 564 | 705 |
# Expected columns from reshape
Q1a <- "Q1a_likely_change_in_distribution"
Q1b <- "Q1b_magnitude_of_change"
Q1c <- "Q1c_your_certainty"
Q2a <- "Q2a_likely_change_in_distribution_compared_to_counterfactual"
Q2b <- "Q2b_magnitude_of_change_compared_to_the_counterfactual"
Q2c <- "Q2c_your_certainty"
df_scored <- df_tidy %>%
mutate(
Q1b_num = parse_number(as.character(.data[[Q1b]]), na = c("N/A","NA","na","n/a","")),
Q2b_num = parse_number(as.character(.data[[Q2b]]), na = c("N/A","NA","na","n/a","")),
cf_dist = score_dist(.data[[Q1a]]),
pol_dist = score_dist(.data[[Q2a]]),
cf_cert = score_cert(.data[[Q1c]]),
pol_cert = score_cert(.data[[Q2c]]),
cf_dir = dir_sign(.data[[Q1a]]),
pol_dir = dir_sign(.data[[Q2a]]),
cf_mag_signed = if_else(is.na(Q1b_num) | is.na(cf_dir), NA_real_, Q1b_num * cf_dir),
pol_mag_signed = if_else(is.na(Q2b_num) | is.na(pol_dir), NA_real_, Q2b_num * pol_dir),
abs_policy_mag_signed = cf_mag_signed + pol_mag_signed,
delta_cert = pol_cert - cf_cert
)
df_scored %>%
summarise(
n_rows = n(),
n_cf_mag = sum(!is.na(Q1b_num)),
n_pol_mag = sum(!is.na(Q2b_num)),
n_cf_signed = sum(!is.na(cf_mag_signed)),
n_delta_signed = sum(!is.na(pol_mag_signed))
) %>%
kable(caption = "Coverage: magnitudes and signed magnitudes") %>%
kable_styling(full_width = FALSE)
| n_rows | n_cf_mag | n_pol_mag | n_cf_signed | n_delta_signed |
|---|---|---|---|---|
| 735 | 141 | 564 | 141 | 564 |
# ============================================================
# FIX: attach counterfactual (Q1) values onto policy rows
# so abs_policy_mag_signed and delta_cert can be computed
# ============================================================
scenario_levels <- c("policy_1","policy_2","policy_3","policy_4")
# 1) Build a CF lookup (one row per id_gj × survey_taxa)
cf_lookup <- df_scored %>%
filter(scenario == "counterfactual") %>%
transmute(
id_gj, survey_taxa,
cf_dist2 = cf_dist,
cf_cert2 = cf_cert,
cf_mag_signed2 = cf_mag_signed
)
# 2) Join CF values onto ALL rows (including policy rows)
df_scored2 <- df_scored %>%
left_join(cf_lookup, by = c("id_gj", "survey_taxa")) %>%
mutate(
# overwrite cf_* with joined versions (so they exist on policy rows too)
cf_dist = cf_dist2,
cf_cert = cf_cert2,
cf_mag_signed = cf_mag_signed2
) %>%
select(-cf_dist2, -cf_cert2, -cf_mag_signed2) %>%
mutate(
# Now these work (because cf_* exist on policy rows)
abs_policy_mag_signed = if_else(
scenario %in% scenario_levels,
cf_mag_signed + pol_mag_signed,
cf_mag_signed
),
delta_cert = if_else(
scenario %in% scenario_levels,
pol_cert - cf_cert,
NA_real_
)
)
# 3) Quick sanity check: abs_policy_mag_signed should no longer be all NA
df_scored2 %>%
group_by(scenario) %>%
summarise(
n = n(),
n_abs_policy_mag = sum(!is.na(abs_policy_mag_signed)),
n_delta_cert = sum(!is.na(delta_cert)),
.groups = "drop"
) %>% print()
## # A tibble: 5 × 4
## scenario n n_abs_policy_mag n_delta_cert
## <chr> <int> <int> <int>
## 1 counterfactual 147 141 0
## 2 policy_1 147 141 147
## 3 policy_2 147 141 147
## 4 policy_3 147 141 147
## 5 policy_4 147 141 147
The dataset is balanced across scenarios and experts. All scenarios are evaluated on the same underlying baseline expectations, ensuring comparability.
The elicitation design is such that the data includes
multiple survey_taxa rows for some taxa
within a single habitat group (notably freshwater assemblages and
lichens). If we compute means across all survey_taxa rows,
these taxa would contribute more weight to habitat-group and overall
summaries simply because they have more rows.
To make this transparent, we run the analysis under two assumptions:
Unit: > expert × scenario × survey_taxa
This preserves the survey structure exactly.
Unit: > expert × scenario × taxa × habitat_group
This averages across multiple survey_taxa rows that fall
into the same taxa:habitat_group cell for each expert and scenario.
Both assumptions are shown side-by-side for: - overall scenario summaries, and - habitat_group summaries.
scenario_levels <- c("counterfactual","policy_1","policy_2","policy_3","policy_4")
# A) Row-based: keep all survey_taxa rows (no collapsing)
df_A <- df_scored2 %>%
filter(scenario %in% scenario_levels) %>%
filter(!(taxa == "Lichens" & habitat_group == "Freshwater/Wetland")) %>%
mutate(assumption = "A (rows = all survey_taxa)")
# B) Collapse within habitat: unique taxa:habitat_group per expert × scenario
df_B <- df_A %>%
group_by(id_gj, scenario, habitat_group, taxa) %>%
summarise(
cf_mag_signed = mean(cf_mag_signed, na.rm = TRUE),
pol_mag_signed = mean(pol_mag_signed, na.rm = TRUE),
abs_policy_mag_signed = mean(abs_policy_mag_signed, na.rm = TRUE),
cf_dist = mean(cf_dist, na.rm = TRUE),
pol_dist = mean(pol_dist, na.rm = TRUE),
cf_cert = mean(cf_cert, na.rm = TRUE),
pol_cert = mean(pol_cert, na.rm = TRUE),
n_survey_taxa_rows_collapsed = n(),
.groups = "drop"
) %>%
mutate(assumption = "B (collapsed = unique taxa:habitat_group)")
# Combined object (handy for faceting tables/plots)
df_AB <- bind_rows(df_A, df_B)
# Diagnostics: show where collapsing occurred
df_B %>%
filter(n_survey_taxa_rows_collapsed > 1) %>%
count(habitat_group, taxa, n_survey_taxa_rows_collapsed, sort = TRUE) %>%
kable(caption = "Where Assumption B collapses multiple survey_taxa rows into one taxa:habitat_group unit") %>%
kable_styling(full_width = FALSE)
| habitat_group | taxa | n_survey_taxa_rows_collapsed | n |
|---|---|---|---|
| Freshwater/Wetland | Freshwater invertebrate assemblages | 3 | 35 |
| Open habitats | Lichens | 2 | 35 |
This section establishes the counterfactual (“business-as-usual”) baseline from Q1, before we interpret policy deltas from Q2. Because Q2 magnitudes are explicitly defined relative to Q1, we need a clear view of baseline “pull” to answer the policy-relevant question: do uplifts merely slow decline, or do they offset it sufficiently to deliver stability/recovery by 2042?
We report results under two aggregation assumptions:
Assumption A (rows = all survey_taxa): each survey row contributes as an observation. This reflects the survey design directly, but can implicitly overweight taxa that were split into multiple survey_taxa rows within a habitat group (notably freshwater assemblages and lichens).
Assumption B (collapsed = unique taxa:habitat_group): within each expert × scenario, multiple survey_taxa rows that represent the same taxa within the same habitat group are collapsed to one unit (averaged). This is the most direct way to prevent survey-row multiplicity from driving habitat-group summaries.
Importantly, taxa-level comparisons are not expected to change materially across A vs B, because the collapsing only affects where a taxon has multiple survey rows within the same habitat group. By construction, the main place where A vs B can matter is therefore habitat-group summaries.
cf_AB <- df_AB %>%
filter(scenario == "counterfactual")
# Headline baseline (A vs B)
cf_AB %>%
group_by(assumption) %>%
summarise(
n_units = n(),
n_experts = n_distinct(id_gj),
n_taxa = n_distinct(taxa),
n_hab_groups = n_distinct(habitat_group),
mean_cf = mean(cf_mag_signed, na.rm = TRUE),
median_cf = median(cf_mag_signed, na.rm = TRUE),
sd_cf = sd(cf_mag_signed, na.rm = TRUE),
p10 = quantile(cf_mag_signed, 0.10, na.rm = TRUE),
p90 = quantile(cf_mag_signed, 0.90, na.rm = TRUE),
prop_negative = mean(cf_mag_signed < 0, na.rm = TRUE),
prop_zero = mean(cf_mag_signed == 0, na.rm = TRUE),
prop_positive = mean(cf_mag_signed > 0, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(across(where(is.numeric), ~ round(.x, 2))) %>%
kable(caption = "Counterfactual baseline (Q1 signed %): Assumptions A and B") %>%
kable_styling(full_width = FALSE)
| assumption | n_units | n_experts | n_taxa | n_hab_groups | mean_cf | median_cf | sd_cf | p10 | p90 | prop_negative | prop_zero | prop_positive |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A (rows = all survey_taxa) | 147 | 7 | 7 | 3 | -12.33 | -10 | 11.46 | -30 | 0 | 0.79 | 0.13 | 0.07 |
| B (collapsed = unique taxa:habitat_group) | 126 | 7 | 7 | 3 | -11.95 | -10 | 11.74 | -30 | 0 | 0.78 | 0.15 | 0.07 |
The baseline tables report:
mean_cf: mean signed counterfactual
change (%), where negative values imply decline by 2042.prop_negative: share of units with baseline decline
(< 0), which is a compact signal of how widespread negative
expectations are.mean_cert: average reported certainty (1–5 scale).At headline level, the baseline is strongly negative under both assumptions (mean around −12% in A and slightly less negative in B), and the proportion negative is high (around four-fifths). This indicates that, in expert judgement, continued decline is the default state without further intervention.
The habitat-group baseline table is shown explicitly under both assumptions because this is where multiplicity can matter. In these data, the only systematic collapsing in B is:
so A can overweight those components in habitat-group means relative to other taxa that appear as a single survey row within habitat.
This table is shown under both assumptions because it is precisely where survey-row multiplicity can matter.
cf_by_hab_AB <- cf_AB %>%
group_by(assumption, habitat_group) %>%
summarise(
n = n(),
mean_cf = mean(cf_mag_signed, na.rm = TRUE),
prop_negative = mean(cf_mag_signed < 0, na.rm = TRUE),
mean_cert = mean(cf_cert, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(assumption, mean_cf)
cf_by_hab_AB %>%
mutate(
mean_cf = round(mean_cf, 2),
mean_cert = round(mean_cert, 2),
prop_negative = scales::percent(prop_negative, accuracy = 1)
) %>%
kable(caption = "Counterfactual baseline by habitat_group (Assumptions A and B)") %>%
kable_styling(full_width = FALSE)
| assumption | habitat_group | n | mean_cf | prop_negative | mean_cert |
|---|---|---|---|---|---|
| A (rows = all survey_taxa) | Freshwater/Wetland | 56 | -13.83 | 87% | 2.02 |
| A (rows = all survey_taxa) | Open habitats | 49 | -13.43 | 85% | 1.98 |
| A (rows = all survey_taxa) | Woodland | 42 | -9.00 | 62% | 1.88 |
| B (collapsed = unique taxa:habitat_group) | Freshwater/Wetland | 42 | -13.47 | 85% | 1.94 |
| B (collapsed = unique taxa:habitat_group) | Open habitats | 42 | -13.36 | 85% | 1.95 |
| B (collapsed = unique taxa:habitat_group) | Woodland | 42 | -9.00 | 62% | 1.88 |
two_cols <- c("#c9643e", "#618f72")
ggplot(cf_by_hab_AB, aes(x = habitat_group, y = mean_cf, fill = assumption)) +
geom_col(position = position_dodge(width = 0.8)) +
coord_flip() +
geom_hline(yintercept = 0, linetype = "dashed") +
scale_fill_manual(values = two_cols) +
labs(
title = "Counterfactual baseline by habitat_group (Assumptions A and B)",
x = NULL,
y = "Mean signed % change (counterfactual)"
) +
theme_minimal()
Across both assumptions, Freshwater/Wetland and Open habitats show the most negative baseline means, while Woodland is less negative. Under Assumption A, the mean counterfactual baseline is approximately:
Under Assumption B, the corresponding habitat-group means are very similar (Freshwater/Wetland and Open habitats remain strongly negative; Woodland remains less negative), with small shifts consistent with removing survey-row multiplicity.
Policy implication at this stage: the baseline “pull” is large enough that even sizeable positive deltas may still leave net negative outcomes once CF + delta are combined. This is exactly what Sections 5–6 test.
Taxa-level comparisons are not meaningfully altered by collapsing within habitat, so we report once using the row-based structure.
cf_by_taxa <- df_A %>%
filter(scenario == "counterfactual") %>%
group_by(taxa) %>%
summarise(
n = n(),
mean_cf = mean(cf_mag_signed, na.rm = TRUE),
prop_negative = mean(cf_mag_signed < 0, na.rm = TRUE),
mean_cert = mean(cf_cert, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(mean_cf)
cf_by_taxa %>%
mutate(
mean_cf = round(mean_cf, 2),
mean_cert = round(mean_cert, 2),
prop_negative = scales::percent(prop_negative, accuracy = 1)
) %>%
kable(caption = "Counterfactual baseline by taxa (A; identical interpretation under B)") %>%
kable_styling(full_width = FALSE)
| taxa | n | mean_cf | prop_negative | mean_cert |
|---|---|---|---|---|
| Freshwater invertebrate assemblages | 21 | -14.86 | 95% | 2.24 |
| Vascular plants | 21 | -13.33 | 76% | 2.19 |
| Lichens | 21 | -13.00 | 90% | 2.14 |
| Bryophytes | 21 | -12.62 | 86% | 2.00 |
| Diptera | 21 | -12.39 | 78% | 1.67 |
| Beetles | 21 | -10.48 | 67% | 1.95 |
| Spiders | 21 | -9.17 | 61% | 1.57 |
ggplot(cf_AB, aes(x = cf_mag_signed, fill = assumption)) +
geom_histogram(bins = 18, color = "white") +
geom_vline(xintercept = 0, linetype = "dashed") +
facet_wrap(~ assumption) +
scale_fill_manual(values = two_cols) +
guides(fill = "none") +
labs(
title = "Counterfactual (Q1): distribution of signed magnitude (% change)",
x = "Signed % change under counterfactual",
y = "Count"
) +
theme_minimal()
Section 5 interprets Q2 policy deltas, which are defined as incremental change relative to the counterfactual. These are not absolute outcomes. A positive delta means “less decline than counterfactual” (or “more increase than counterfactual”), but it does not tell us whether the system is stable/recovering overall.
We therefore treat Section 5 as answering:
The coverage table confirms that both A and B are based on the same expert panel (n=7), with B having fewer units because of the within-habitat collapsing step.
Q2 magnitudes are defined relative to the counterfactual. We show overall and habitat-group summaries under both assumptions.
pol_AB <- df_AB %>%
filter(str_detect(scenario, "^policy_"))
pol_AB %>%
group_by(assumption) %>%
summarise(
n_units = n(),
n_nonmiss = sum(!is.na(pol_mag_signed)),
n_experts = n_distinct(id_gj),
.groups = "drop"
) %>%
kable(caption = "Incremental policy delta (Q2) coverage check (Assumptions A and B)") %>%
kable_styling(full_width = FALSE)
| assumption | n_units | n_nonmiss | n_experts |
|---|---|---|---|
| A (rows = all survey_taxa) | 588 | 564 | 7 |
| B (collapsed = unique taxa:habitat_group) | 504 | 480 | 7 |
The scenario-level delta table shows positive mean uplifts for every policy scenario under both assumptions. Under Assumption A (row-based), the mean deltas are:
This produces a clear and stable ranking: policy_3 > policy_2 ≈ policy_4 > policy_1.
The “prop_positive” column shows that most expert–unit responses are positive deltas (for example, under policy_3 it is very high), meaning experts generally agree policies improve outcomes relative to baseline even when they disagree about magnitude.
A vs B: At the overall scenario level, collapsing multiplicity (B) should not materially change the ordering and typically shifts headline means only modestly, because only a small subset of taxa:habitat combinations are affected by the collapse step. The main sensitivity to A vs B is expected to show up in habitat-group summaries, not in the overall ranking.
delta_headline_AB <- pol_AB %>%
group_by(assumption, scenario) %>%
summarise(
mean_delta = mean(pol_mag_signed, na.rm = TRUE),
median_delta = median(pol_mag_signed, na.rm = TRUE),
sd_delta = sd(pol_mag_signed, na.rm = TRUE),
p10 = quantile(pol_mag_signed, 0.10, na.rm = TRUE),
p90 = quantile(pol_mag_signed, 0.90, na.rm = TRUE),
prop_positive = mean(pol_mag_signed > 0, na.rm = TRUE),
prop_zero = mean(pol_mag_signed == 0, na.rm = TRUE),
prop_negative = mean(pol_mag_signed < 0, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(assumption, desc(mean_delta))
delta_headline_AB %>%
mutate(across(where(is.numeric), ~ round(.x, 2))) %>%
kable(caption = "Incremental policy uplifts (Q2 deltas): overall by scenario (Assumptions A and B)") %>%
kable_styling(full_width = FALSE)
| assumption | scenario | mean_delta | median_delta | sd_delta | p10 | p90 | prop_positive | prop_zero | prop_negative |
|---|---|---|---|---|---|---|---|---|---|
| A (rows = all survey_taxa) | policy_3 | 16.55 | 15 | 6.43 | 8.00 | 25.0 | 0.99 | 0.01 | 0.00 |
| A (rows = all survey_taxa) | policy_4 | 11.81 | 13 | 5.02 | 5.00 | 18.0 | 0.96 | 0.04 | 0.00 |
| A (rows = all survey_taxa) | policy_2 | 10.75 | 10 | 5.77 | 4.00 | 20.0 | 0.94 | 0.06 | 0.00 |
| A (rows = all survey_taxa) | policy_1 | 4.15 | 5 | 6.98 | -10.00 | 10.0 | 0.73 | 0.13 | 0.14 |
| B (collapsed = unique taxa:habitat_group) | policy_3 | 16.58 | 15 | 6.34 | 7.90 | 25.0 | 0.98 | 0.02 | 0.00 |
| B (collapsed = unique taxa:habitat_group) | policy_4 | 11.81 | 13 | 4.98 | 5.90 | 17.1 | 0.96 | 0.04 | 0.00 |
| B (collapsed = unique taxa:habitat_group) | policy_2 | 10.94 | 10 | 5.63 | 4.95 | 20.0 | 0.97 | 0.03 | 0.00 |
| B (collapsed = unique taxa:habitat_group) | policy_1 | 4.24 | 5 | 6.96 | -10.00 | 10.0 | 0.77 | 0.09 | 0.14 |
ggplot(delta_headline_AB, aes(x = scenario, y = mean_delta, fill = assumption)) +
geom_col(position = position_dodge(width = 0.8)) +
geom_hline(yintercept = 0, linetype = "dashed") +
scale_fill_manual(values = two_cols) +
labs(
title = "Incremental policy uplifts by scenario (Assumptions A and B)",
x = NULL,
y = "Mean signed % delta"
) +
theme_minimal()
delta_by_hab_AB <- pol_AB %>%
group_by(assumption, habitat_group, scenario) %>%
summarise(
n = n(),
mean_delta = mean(pol_mag_signed, na.rm = TRUE),
prop_positive = mean(pol_mag_signed > 0, na.rm = TRUE),
.groups = "drop"
)
delta_by_hab_AB %>%
mutate(
mean_delta = round(mean_delta, 2),
prop_positive = scales::percent(prop_positive, accuracy = 1)
) %>%
arrange(assumption, habitat_group, scenario) %>%
kable(caption = "Incremental policy uplifts by habitat_group (Assumptions A and B)") %>%
kable_styling(full_width = FALSE) %>%
scroll_box(height = "320px")
| assumption | habitat_group | scenario | n | mean_delta | prop_positive |
|---|---|---|---|---|---|
| A (rows = all survey_taxa) | Freshwater/Wetland | policy_1 | 56 | 4.44 | 74% |
| A (rows = all survey_taxa) | Freshwater/Wetland | policy_2 | 56 | 10.83 | 93% |
| A (rows = all survey_taxa) | Freshwater/Wetland | policy_3 | 56 | 17.61 | 100% |
| A (rows = all survey_taxa) | Freshwater/Wetland | policy_4 | 56 | 13.13 | 98% |
| A (rows = all survey_taxa) | Open habitats | policy_1 | 49 | 3.94 | 72% |
| A (rows = all survey_taxa) | Open habitats | policy_2 | 49 | 10.81 | 94% |
| A (rows = all survey_taxa) | Open habitats | policy_3 | 49 | 16.32 | 98% |
| A (rows = all survey_taxa) | Open habitats | policy_4 | 49 | 11.51 | 98% |
| A (rows = all survey_taxa) | Woodland | policy_1 | 42 | 4.00 | 72% |
| A (rows = all survey_taxa) | Woodland | policy_2 | 42 | 10.57 | 98% |
| A (rows = all survey_taxa) | Woodland | policy_3 | 42 | 15.40 | 98% |
| A (rows = all survey_taxa) | Woodland | policy_4 | 42 | 10.38 | 92% |
| B (collapsed = unique taxa:habitat_group) | Freshwater/Wetland | policy_1 | 42 | 4.35 | 78% |
| B (collapsed = unique taxa:habitat_group) | Freshwater/Wetland | policy_2 | 42 | 10.89 | 98% |
| B (collapsed = unique taxa:habitat_group) | Freshwater/Wetland | policy_3 | 42 | 17.59 | 100% |
| B (collapsed = unique taxa:habitat_group) | Freshwater/Wetland | policy_4 | 42 | 13.12 | 98% |
| B (collapsed = unique taxa:habitat_group) | Open habitats | policy_1 | 42 | 4.38 | 80% |
| B (collapsed = unique taxa:habitat_group) | Open habitats | policy_2 | 42 | 11.35 | 95% |
| B (collapsed = unique taxa:habitat_group) | Open habitats | policy_3 | 42 | 16.75 | 98% |
| B (collapsed = unique taxa:habitat_group) | Open habitats | policy_4 | 42 | 11.93 | 98% |
| B (collapsed = unique taxa:habitat_group) | Woodland | policy_1 | 42 | 4.00 | 72% |
| B (collapsed = unique taxa:habitat_group) | Woodland | policy_2 | 42 | 10.57 | 98% |
| B (collapsed = unique taxa:habitat_group) | Woodland | policy_3 | 42 | 15.40 | 98% |
| B (collapsed = unique taxa:habitat_group) | Woodland | policy_4 | 42 | 10.38 | 92% |
ggplot(delta_by_hab_AB, aes(x = scenario, y = mean_delta, fill = assumption)) +
geom_col(position = position_dodge(width = 0.8)) +
facet_wrap(~ habitat_group) +
geom_hline(yintercept = 0, linetype = "dashed") +
scale_fill_manual(values = two_cols) +
labs(
title = "Incremental policy uplifts by habitat_group (Assumptions A and B)",
x = NULL,
y = "Mean signed % delta"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 0.5)
)
Habitat-group uplift summaries provide the first indication of where policy leverage is expected to be strongest. The key point is not just “uplifts are positive”, but whether uplifts in the most-negative baseline habitat groups (Freshwater/Wetland and Open habitats) are large enough to offset the baseline pull established in Section 4.
Because Assumption B collapses the freshwater assemblage multiplicity (and lichen multiplicity in open habitats), the habitat-group mean delta is the place to look for any material change between A vs B. If the habitat-group conclusions are robust across A and B, that strengthens confidence that the reported habitat differences are not artefacts of how the survey rows were structured.
pol_A <- df_A %>%
filter(str_detect(scenario, "^policy_"))
delta_by_taxa <- pol_A %>%
group_by(taxa, scenario) %>%
summarise(
n = n(),
mean_delta = mean(pol_mag_signed, na.rm = TRUE),
prop_positive = mean(pol_mag_signed > 0, na.rm = TRUE),
.groups = "drop"
)
delta_by_taxa %>%
mutate(
mean_delta = round(mean_delta, 2),
prop_positive = scales::percent(prop_positive, accuracy = 1)
) %>%
kable(caption = "Incremental policy uplifts by taxa (A; identical interpretation under B)") %>%
kable_styling(full_width = FALSE) %>%
scroll_box(height = "320px")
| taxa | scenario | n | mean_delta | prop_positive |
|---|---|---|---|---|
| Beetles | policy_1 | 21 | 4.76 | 71% |
| Beetles | policy_2 | 21 | 12.43 | 100% |
| Beetles | policy_3 | 21 | 17.10 | 100% |
| Beetles | policy_4 | 21 | 11.71 | 95% |
| Bryophytes | policy_1 | 21 | 3.24 | 76% |
| Bryophytes | policy_2 | 21 | 9.95 | 95% |
| Bryophytes | policy_3 | 21 | 16.10 | 95% |
| Bryophytes | policy_4 | 21 | 11.52 | 95% |
| Diptera | policy_1 | 21 | 3.89 | 67% |
| Diptera | policy_2 | 21 | 10.44 | 89% |
| Diptera | policy_3 | 21 | 15.83 | 94% |
| Diptera | policy_4 | 21 | 11.56 | 83% |
| Freshwater invertebrate assemblages | policy_1 | 21 | 4.71 | 71% |
| Freshwater invertebrate assemblages | policy_2 | 21 | 10.67 | 86% |
| Freshwater invertebrate assemblages | policy_3 | 21 | 17.67 | 100% |
| Freshwater invertebrate assemblages | policy_4 | 21 | 13.14 | 100% |
| Lichens | policy_1 | 21 | 2.52 | 62% |
| Lichens | policy_2 | 21 | 7.90 | 90% |
| Lichens | policy_3 | 21 | 13.90 | 100% |
| Lichens | policy_4 | 21 | 9.19 | 100% |
| Spiders | policy_1 | 21 | 3.50 | 78% |
| Spiders | policy_2 | 21 | 10.94 | 100% |
| Spiders | policy_3 | 21 | 16.28 | 100% |
| Spiders | policy_4 | 21 | 11.44 | 100% |
| Vascular plants | policy_1 | 21 | 6.29 | 86% |
| Vascular plants | policy_2 | 21 | 12.90 | 100% |
| Vascular plants | policy_3 | 21 | 18.86 | 100% |
| Vascular plants | policy_4 | 21 | 14.00 | 100% |
Across both aggregation assumptions, experts perceive all policy scenarios as improving outcomes relative to the counterfactual. The policy scenarios are clearly differentiated in strength, with policy_3 consistently eliciting the largest average uplift, followed by policy_2 and policy_4, with policy_1 weakest.
However, deltas alone do not answer whether the target-relevant condition (stability/recovery by 2042) is achieved. That requires combining Q1 and Q2 into absolute outcomes (Section 6).
Section 6 performs the key “sufficiency test” by converting the elicited components into absolute outcomes under each policy scenario:
abs_policy_mag_signed = cf_mag_signed + pol_mag_signed
This step matters because it answers the policy-facing question:
We present headline results under both Assumptions A and B. Here, A vs B matters most for habitat-group interpretation, because the collapsing step in B prevents freshwater assemblage and lichen multiplicity from dominating habitat-group means.
Absolute signed outcomes under policy are:
Absolute policy outcome = Counterfactual (Q1) + Policy delta (Q2)policy_summary_AB <- pol_AB %>%
group_by(assumption, scenario) %>%
summarise(
mean_policy = mean(abs_policy_mag_signed, na.rm = TRUE),
median_policy = median(abs_policy_mag_signed, na.rm = TRUE),
sd_policy = sd(abs_policy_mag_signed, na.rm = TRUE),
p10 = quantile(abs_policy_mag_signed, 0.10, na.rm = TRUE),
p90 = quantile(abs_policy_mag_signed, 0.90, na.rm = TRUE),
prop_positive = mean(abs_policy_mag_signed > 0, na.rm = TRUE),
prop_near_zero = mean(abs(abs_policy_mag_signed) <= 1, na.rm = TRUE),
prop_negative = mean(abs_policy_mag_signed < 0, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(assumption, desc(mean_policy))
policy_summary_AB %>%
mutate(across(where(is.numeric), ~ round(.x, 2))) %>%
kable(caption = "Absolute signed % outcome under each policy (Assumptions A and B)") %>%
kable_styling(full_width = FALSE)
| assumption | scenario | mean_policy | median_policy | sd_policy | p10 | p90 | prop_positive | prop_near_zero | prop_negative |
|---|---|---|---|---|---|---|---|---|---|
| A (rows = all survey_taxa) | policy_3 | 4.23 | 5 | 9.42 | -10 | 15.0 | 0.67 | 0.13 | 0.21 |
| A (rows = all survey_taxa) | policy_4 | -0.52 | 0 | 10.30 | -15 | 13.0 | 0.41 | 0.23 | 0.36 |
| A (rows = all survey_taxa) | policy_2 | -1.57 | 0 | 10.54 | -20 | 9.0 | 0.34 | 0.28 | 0.39 |
| A (rows = all survey_taxa) | policy_1 | -8.18 | -5 | 12.20 | -25 | 5.0 | 0.18 | 0.20 | 0.63 |
| B (collapsed = unique taxa:habitat_group) | policy_3 | 4.63 | 5 | 9.78 | -10 | 15.0 | 0.68 | 0.12 | 0.21 |
| B (collapsed = unique taxa:habitat_group) | policy_4 | -0.14 | 0 | 10.67 | -15 | 13.0 | 0.42 | 0.22 | 0.35 |
| B (collapsed = unique taxa:habitat_group) | policy_2 | -1.01 | 0 | 10.89 | -20 | 10.0 | 0.38 | 0.31 | 0.34 |
| B (collapsed = unique taxa:habitat_group) | policy_1 | -7.70 | -5 | 12.64 | -25 | 5.3 | 0.20 | 0.22 | 0.59 |
The scenario-level absolute outcome table shows a consistent pattern:
Under Assumption A, the mean absolute outcomes are:
This is the core quantitative message: policy_3 comes close to offsetting baseline decline on average, but does not cross the stability threshold (0%) in the mean.
Two nuances are important for interpretation:
A vs B: the table is shown under both assumptions because the “headline mean” is sensitive to how units are counted. If the A vs B results are close, that strengthens confidence that conclusions are not being driven by survey-row multiplicity.
ggplot(policy_summary_AB, aes(x = scenario, y = mean_policy, fill = assumption)) +
geom_col(position = position_dodge(width = 0.8)) +
geom_hline(yintercept = 0, linetype = "dashed") +
scale_fill_manual(values = two_cols) +
labs(
title = "Absolute signed % outcome under each policy (Assumptions A and B)",
x = NULL,
y = "Mean signed % change (CF + delta)"
) +
theme_minimal()
cf_mean_AB <- cf_AB %>%
group_by(assumption) %>%
summarise(mean_cf = mean(cf_mag_signed, na.rm = TRUE), .groups = "drop")
policy_summary_plot_AB <- policy_summary_AB %>%
left_join(cf_mean_AB, by = "assumption")
ggplot(policy_summary_plot_AB, aes(x = scenario)) +
geom_col(aes(y = mean_policy, fill = assumption), position = position_dodge(width = 0.8)) +
facet_wrap(~ assumption) +
geom_hline(yintercept = 0, linetype = "dashed") +
geom_hline(aes(yintercept = mean_cf), linetype = "dotted") +
scale_fill_manual(values = two_cols) +
labs(
title = "Mean signed % outcome: Counterfactual vs Policy (A and B)",
subtitle = "Dotted = counterfactual mean; dashed = stability (0%)",
x = NULL,
y = "Mean signed % change"
) +
theme_minimal()
In the mean comparison plot, the dashed line at 0% represents stability (no net change by 2042). The dotted line marks the counterfactual mean. The vertical distance:
The important result is that all scenario means sit above the counterfactual mean (uplift), but still below zero (insufficient for stability in the mean). This graph makes the distinction between “uplift” and “recovery” visually explicit.
pol_AB %>%
group_by(assumption, scenario) %>%
summarise(
prop_recovery = mean(abs_policy_mag_signed > 0, na.rm = TRUE),
prop_stabilised = mean(abs_policy_mag_signed >= 0, na.rm = TRUE),
prop_decline_remaining = mean(abs_policy_mag_signed < 0, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(across(where(is.numeric), ~ scales::percent(.x, accuracy = 1))) %>%
kable(caption = "Proportion stabilised/recovered under each policy (Assumptions A and B)") %>%
kable_styling(full_width = FALSE)
| assumption | scenario | prop_recovery | prop_stabilised | prop_decline_remaining |
|---|---|---|---|---|
| A (rows = all survey_taxa) | policy_1 | 18% | 37% | 63% |
| A (rows = all survey_taxa) | policy_2 | 34% | 61% | 39% |
| A (rows = all survey_taxa) | policy_3 | 67% | 79% | 21% |
| A (rows = all survey_taxa) | policy_4 | 41% | 64% | 36% |
| B (collapsed = unique taxa:habitat_group) | policy_1 | 20% | 41% | 59% |
| B (collapsed = unique taxa:habitat_group) | policy_2 | 38% | 66% | 34% |
| B (collapsed = unique taxa:habitat_group) | policy_3 | 68% | 79% | 21% |
| B (collapsed = unique taxa:habitat_group) | policy_4 | 42% | 65% | 35% |
The stabilised/recovered fractions provide a more decision-friendly view than the mean alone:
This reconciles two facts that can otherwise look contradictory:
A vs B: the fraction table is presented under both assumptions; the similarity between A and B proportions indicates the “many improve, some remain strongly negative” pattern is not an artefact of survey-row multiplicity.
policy_by_hab_AB <- df_AB %>%
group_by(assumption, habitat_group, scenario) %>%
summarise(
n = n(),
mean_policy = mean(abs_policy_mag_signed, na.rm = TRUE),
prop_positive = mean(abs_policy_mag_signed > 0, na.rm = TRUE),
.groups = "drop"
)
policy_by_hab_AB %>%
mutate(
mean_policy = round(mean_policy, 2),
prop_positive = scales::percent(prop_positive, accuracy = 1)
) %>%
arrange(assumption, habitat_group, scenario) %>%
kable(caption = "Absolute outcomes under policy by habitat_group (Assumptions A and B)") %>%
kable_styling(full_width = FALSE) %>%
scroll_box(height = "360px")
| assumption | habitat_group | scenario | n | mean_policy | prop_positive |
|---|---|---|---|---|---|
| A (rows = all survey_taxa) | Freshwater/Wetland | counterfactual | 56 | -13.83 | 2% |
| A (rows = all survey_taxa) | Freshwater/Wetland | policy_1 | 56 | -9.39 | 9% |
| A (rows = all survey_taxa) | Freshwater/Wetland | policy_2 | 56 | -3.00 | 24% |
| A (rows = all survey_taxa) | Freshwater/Wetland | policy_3 | 56 | 3.78 | 69% |
| A (rows = all survey_taxa) | Freshwater/Wetland | policy_4 | 56 | -0.70 | 41% |
| A (rows = all survey_taxa) | Open habitats | counterfactual | 49 | -13.43 | 0% |
| A (rows = all survey_taxa) | Open habitats | policy_1 | 49 | -9.49 | 17% |
| A (rows = all survey_taxa) | Open habitats | policy_2 | 49 | -2.62 | 36% |
| A (rows = all survey_taxa) | Open habitats | policy_3 | 49 | 2.89 | 66% |
| A (rows = all survey_taxa) | Open habitats | policy_4 | 49 | -1.91 | 40% |
| A (rows = all survey_taxa) | Woodland | counterfactual | 42 | -9.00 | 22% |
| A (rows = all survey_taxa) | Woodland | policy_1 | 42 | -5.00 | 30% |
| A (rows = all survey_taxa) | Woodland | policy_2 | 42 | 1.57 | 45% |
| A (rows = all survey_taxa) | Woodland | policy_3 | 42 | 6.40 | 68% |
| A (rows = all survey_taxa) | Woodland | policy_4 | 42 | 1.38 | 42% |
| B (collapsed = unique taxa:habitat_group) | Freshwater/Wetland | counterfactual | 42 | -13.47 | 0% |
| B (collapsed = unique taxa:habitat_group) | Freshwater/Wetland | policy_1 | 42 | -9.12 | 10% |
| B (collapsed = unique taxa:habitat_group) | Freshwater/Wetland | policy_2 | 42 | -2.58 | 30% |
| B (collapsed = unique taxa:habitat_group) | Freshwater/Wetland | policy_3 | 42 | 4.12 | 68% |
| B (collapsed = unique taxa:habitat_group) | Freshwater/Wetland | policy_4 | 42 | -0.35 | 42% |
| B (collapsed = unique taxa:habitat_group) | Open habitats | counterfactual | 42 | -13.36 | 0% |
| B (collapsed = unique taxa:habitat_group) | Open habitats | policy_1 | 42 | -8.99 | 20% |
| B (collapsed = unique taxa:habitat_group) | Open habitats | policy_2 | 42 | -2.01 | 40% |
| B (collapsed = unique taxa:habitat_group) | Open habitats | policy_3 | 42 | 3.39 | 70% |
| B (collapsed = unique taxa:habitat_group) | Open habitats | policy_4 | 42 | -1.44 | 42% |
| B (collapsed = unique taxa:habitat_group) | Woodland | counterfactual | 42 | -9.00 | 22% |
| B (collapsed = unique taxa:habitat_group) | Woodland | policy_1 | 42 | -5.00 | 30% |
| B (collapsed = unique taxa:habitat_group) | Woodland | policy_2 | 42 | 1.57 | 45% |
| B (collapsed = unique taxa:habitat_group) | Woodland | policy_3 | 42 | 6.40 | 68% |
| B (collapsed = unique taxa:habitat_group) | Woodland | policy_4 | 42 | 1.38 | 42% |
ggplot(policy_by_hab_AB, aes(x = scenario, y = mean_policy, fill = assumption)) +
geom_col(position = position_dodge(width = 0.8)) +
facet_wrap(~ habitat_group) +
geom_hline(yintercept = 0, linetype = "dashed") +
scale_fill_manual(values = two_cols) +
labs(
title = "Absolute outcomes under policy by habitat_group (Assumptions A and B)",
x = NULL,
y = "Mean signed % change (CF + delta)"
) +
theme_minimal()
The habitat-group absolute outcomes sharpen where the recovery gap sits:
This pattern is consistent with the baseline in Section 4: the most negative baseline domains (Freshwater/Wetland and Open habitats) remain the binding constraints on achieving system-wide stability by 2042.
Because these habitat-group results are shown for both A and B, we can interpret the habitat differences with higher confidence: the story persists even after removing the freshwater/lichen row multiplicity in Assumption B.
policy_by_taxa <- pol_A %>%
group_by(taxa, scenario) %>%
summarise(
mean_policy = mean(abs_policy_mag_signed, na.rm = TRUE),
prop_positive = mean(abs_policy_mag_signed > 0, na.rm = TRUE),
.groups = "drop"
)
policy_by_taxa %>%
mutate(
mean_policy = round(mean_policy, 2),
prop_positive = scales::percent(prop_positive, accuracy = 1)
) %>%
kable(caption = "Absolute outcomes under policy by taxa (A; identical interpretation under B)") %>%
kable_styling(full_width = FALSE) %>%
scroll_box(height = "360px")
| taxa | scenario | mean_policy | prop_positive |
|---|---|---|---|
| Beetles | policy_1 | -5.71 | 29% |
| Beetles | policy_2 | 1.95 | 57% |
| Beetles | policy_3 | 6.62 | 67% |
| Beetles | policy_4 | 1.24 | 52% |
| Bryophytes | policy_1 | -9.38 | 10% |
| Bryophytes | policy_2 | -2.67 | 24% |
| Bryophytes | policy_3 | 3.48 | 71% |
| Bryophytes | policy_4 | -1.10 | 33% |
| Diptera | policy_1 | -8.50 | 22% |
| Diptera | policy_2 | -1.94 | 28% |
| Diptera | policy_3 | 3.44 | 61% |
| Diptera | policy_4 | -0.83 | 33% |
| Freshwater invertebrate assemblages | policy_1 | -10.14 | 5% |
| Freshwater invertebrate assemblages | policy_2 | -4.19 | 10% |
| Freshwater invertebrate assemblages | policy_3 | 2.81 | 71% |
| Freshwater invertebrate assemblages | policy_4 | -1.71 | 38% |
| Lichens | policy_1 | -10.48 | 5% |
| Lichens | policy_2 | -5.10 | 14% |
| Lichens | policy_3 | 0.90 | 52% |
| Lichens | policy_4 | -3.81 | 24% |
| Spiders | policy_1 | -5.67 | 33% |
| Spiders | policy_2 | 1.78 | 61% |
| Spiders | policy_3 | 7.11 | 83% |
| Spiders | policy_4 | 2.28 | 61% |
| Vascular plants | policy_1 | -7.05 | 24% |
| Vascular plants | policy_2 | -0.43 | 48% |
| Vascular plants | policy_3 | 5.52 | 67% |
| Vascular plants | policy_4 | 0.67 | 48% |
To make the scenario results easier to interpret, we compute the probability of full recovery for each scenario:
abs_policy_mag_signed (counterfactual signed
magnitude plus policy delta).To quantify uncertainty given a small expert panel, we compute the
recovery probability for each scenario and assumption, and attach
95% percentile intervals from a cluster
bootstrap over experts (resampling experts,id_gj,
with replacement).
# ============================================================
# probability of full recovery (abs_policy_mag_signed >= 0)
# cluster bootstrap over experts (id_gj), by Assumption (A vs B)
# ============================================================
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)
library(scales)
library(knitr)
library(kableExtra)
# ---- 0) Use the Section 6 object exactly as-is ----
stopifnot(exists("df_AB"))
stopifnot(all(c("assumption","scenario","id_gj","abs_policy_mag_signed") %in% names(df_AB)))
policy_levels <- c("policy_1","policy_2","policy_3","policy_4")
dat <- df_AB %>%
mutate(
assumption = as.character(assumption),
scenario = as.character(scenario)
) %>%
filter(scenario %in% policy_levels)
assumptions <- sort(unique(dat$assumption))
scenarios <- policy_levels
# ---- 1) Define full recovery indicator ----
dat <- dat %>%
mutate(full_recovery = abs_policy_mag_signed >= 0)
# ---- 2) Bootstrap settings ----
set.seed(1)
B <- 5000
# helper: weighted mean for logical indicators under expert-resampling weights
wtd_mean <- function(x, w) {
ok <- !is.na(x) & !is.na(w)
if (!any(ok)) return(NA_real_)
sum(w[ok] * x[ok]) / sum(w[ok])
}
# ---- 3) One cluster-bootstrap draw within ONE assumption ----
boot_once_assumption <- function(df_a) {
experts <- sort(unique(df_a$id_gj[!is.na(df_a$id_gj)]))
Nexp <- length(experts)
samp <- sample(experts, size = Nexp, replace = TRUE)
w <- table(samp)
dfb <- df_a %>%
semi_join(tibble(id_gj = as.integer(names(w))), by = "id_gj") %>%
mutate(.w = as.numeric(w[as.character(id_gj)]))
dfb %>%
group_by(scenario) %>%
summarise(
p = wtd_mean(as.numeric(full_recovery), .w),
.groups = "drop"
)
}
# ---- 4) Run bootstrap separately for A and B ----
boot_draws <- map_dfr(assumptions, function(a) {
df_a <- dat %>% filter(assumption == a)
map_dfr(seq_len(B), function(b) {
boot_once_assumption(df_a) %>%
mutate(assumption = a, b = b)
})
})
# ---- 5) Point estimates (non-bootstrap) ----
point_est <- dat %>%
group_by(assumption, scenario) %>%
summarise(
p_hat = mean(full_recovery, na.rm = TRUE),
n = n(),
.groups = "drop"
)
# ---- 6) Bootstrap percentile CIs ----
boot_ci <- boot_draws %>%
group_by(assumption, scenario) %>%
summarise(
p_lo = quantile(p, 0.025, na.rm = TRUE),
p_med = quantile(p, 0.50, na.rm = TRUE),
p_hi = quantile(p, 0.975, na.rm = TRUE),
.groups = "drop"
)
prob_recovery_AB <- point_est %>%
left_join(boot_ci, by = c("assumption","scenario")) %>%
mutate(
scenario = factor(scenario, levels = policy_levels),
assumption = factor(assumption, levels = assumptions)
) %>%
arrange(assumption, scenario)
# ---- 7) Pretty table ----
prob_recovery_AB %>%
mutate(
p_hat = percent(p_hat, accuracy = 1),
p_lo = percent(p_lo, accuracy = 1),
p_hi = percent(p_hi, accuracy = 1),
ci95 = paste0(p_lo, "–", p_hi)
) %>%
select(assumption, scenario, n, p_hat, ci95) %>%
pivot_wider(
id_cols = c(scenario),
names_from = assumption,
values_from = c(n, p_hat, ci95),
names_glue = "{assumption}_{.value}"
) %>%
kable(caption = "Probability of full recovery (abs_policy_mag_signed ≥ 0%) by scenario and assumption (cluster bootstrap over experts; 95% CI)") %>%
kable_styling(full_width = FALSE)
| scenario | A (rows = all survey_taxa)_n | B (collapsed = unique taxa:habitat_group)_n | A (rows = all survey_taxa)_p_hat | B (collapsed = unique taxa:habitat_group)_p_hat | A (rows = all survey_taxa)_ci95 | B (collapsed = unique taxa:habitat_group)_ci95 |
|---|---|---|---|---|---|---|
| policy_1 | 147 | 126 | 37% | 41% | 17%–58% | 18%–64% |
| policy_2 | 147 | 126 | 61% | 66% | 37%–80% | 41%–85% |
| policy_3 | 147 | 126 | 79% | 79% | 50%–97% | 51%–98% |
| policy_4 | 147 | 126 | 64% | 65% | 33%–90% | 35%–91% |
# ---- 8) Plot: probability with CI (facet by assumption) ----
prob_recovery_AB %>%
ggplot(aes(x = scenario, y = p_hat)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = p_lo, ymax = p_hi), width = 0.15) +
facet_wrap(~ assumption) +
geom_hline(yintercept=1, linetype="dashed") +
scale_y_continuous(labels = percent_format(accuracy = 1), limits = c(0, 1)) +
labs(
title = "Probability of full recovery (abs_policy_mag_signed ≥ 0%)",
subtitle = "Cluster bootstrap over experts (id_gj), 95% percentile intervals",
x = NULL,
y = "Probability"
) +
theme_minimal()
# ---- 9) context table showing mean absolute outcomes too ----
dat %>%
group_by(assumption, scenario) %>%
summarise(
mean_abs_policy = mean(abs_policy_mag_signed, na.rm = TRUE),
median_abs_policy = median(abs_policy_mag_signed, na.rm = TRUE),
prop_full_recovery = mean(full_recovery, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(assumption, scenario) %>%
kable(digits = 2, caption = "Context: mean/median absolute policy outcome and raw recovery share") %>%
kable_styling(full_width = FALSE)
| assumption | scenario | mean_abs_policy | median_abs_policy | prop_full_recovery |
|---|---|---|---|---|
| A (rows = all survey_taxa) | policy_1 | -9.67 | -5 | 0.37 |
| A (rows = all survey_taxa) | policy_2 | -4.55 | 0 | 0.61 |
| A (rows = all survey_taxa) | policy_3 | -1.73 | 5 | 0.79 |
| A (rows = all survey_taxa) | policy_4 | -4.99 | 0 | 0.64 |
| B (collapsed = unique taxa:habitat_group) | policy_1 | -9.20 | -5 | 0.41 |
| B (collapsed = unique taxa:habitat_group) | policy_2 | -4.01 | 0 | 0.66 |
| B (collapsed = unique taxa:habitat_group) | policy_3 | -1.37 | 5 | 0.79 |
| B (collapsed = unique taxa:habitat_group) | policy_4 | -4.64 | 0 | 0.65 |
Across both assumptions, recovery probabilities increase strongly with policy ambition. Under Assumption A (all survey_taxa rows), the probability that the implied absolute policy outcome is ≥0% rises from 37% under policy_1 to 61% under policy_2, 79% under policy_3, and 64% under policy_4. The bootstrap intervals are wide—reflecting the small expert panel—but clearly shift upward with scenario strength (e.g. policy_3: 50–97%).
Results are highly consistent under Assumption B (collapsed to unique taxa × habitat_group), with probabilities of 41%, 66%, 79%, and 65% for policy_1–4 respectively. The similarity between A and B indicates that aggregation structure does not materially alter the overall recovery signal.
Policy_3 consistently emerges as the strongest scenario: it produces the highest probability of full recovery (~79%) under both assumptions. However, the wide bootstrap intervals (e.g. ~51–98% under B) indicate substantial expert-level uncertainty. Policy_1 remains clearly insufficient, with recovery probabilities below 50% under both assumptions and lower bounds well below 20%. Overall, stronger policy scenarios substantially increase the likelihood of stabilisation or recovery, but do not deliver near-certainty of full recovery across taxa.
Section 6 provides the core “sufficiency” conclusion:
This creates a clear bridge to Section 7: we now test whether this sufficiency verdict is robust given panel size and disagreement.
Section 7 examines whether the headline conclusions (notably: “policy_3 is strongest but still not sufficient on average”) are robust to the small expert panel and to disagreement about magnitude.
We focus on three complementary diagnostics:
To keep uncertainty aligned with the outcome summaries, dispersion and resampling are reported under both assumptions A and B.
consensus_var_AB <- df_AB %>%
filter(str_detect(scenario, "^policy_") | scenario == "counterfactual") %>%
group_by(assumption, taxa, habitat_group, scenario) %>%
summarise(
mean_signed = mean(abs_policy_mag_signed, na.rm = TRUE),
sd_signed = sd(abs_policy_mag_signed, na.rm = TRUE),
n = n(),
.groups = "drop"
)
consensus_summary_AB <- consensus_var_AB %>%
group_by(assumption, scenario) %>%
summarise(
mean_sd = mean(sd_signed, na.rm = TRUE),
median_sd = median(sd_signed, na.rm = TRUE),
.groups = "drop"
)
consensus_summary_AB %>%
mutate(across(where(is.numeric), ~ round(.x, 2))) %>%
kable(caption = "Mean dispersion (SD) in signed magnitude by scenario (Assumptions A and B)") %>%
kable_styling(full_width = FALSE)
| assumption | scenario | mean_sd | median_sd |
|---|---|---|---|
| A (rows = all survey_taxa) | counterfactual | 11.94 | 11.13 |
| A (rows = all survey_taxa) | policy_1 | 12.77 | 12.47 |
| A (rows = all survey_taxa) | policy_2 | 10.67 | 10.05 |
| A (rows = all survey_taxa) | policy_3 | 9.54 | 8.70 |
| A (rows = all survey_taxa) | policy_4 | 10.51 | 9.67 |
| B (collapsed = unique taxa:habitat_group) | counterfactual | 11.92 | 11.13 |
| B (collapsed = unique taxa:habitat_group) | policy_1 | 12.77 | 12.47 |
| B (collapsed = unique taxa:habitat_group) | policy_2 | 10.65 | 10.05 |
| B (collapsed = unique taxa:habitat_group) | policy_3 | 9.51 | 8.70 |
| B (collapsed = unique taxa:habitat_group) | policy_4 | 10.49 | 9.67 |
The dispersion table shows that SD in signed outcomes is lowest under the counterfactual and increases under stronger policy packages, peaking around policy_3:
This pattern is typical of expert elicitation: experts agree more on the existence of baseline pressures than on the extent to which ambitious intervention can offset them. Importantly, the pattern is essentially the same under A and B, suggesting the uncertainty structure is not being driven by survey-row multiplicity.
Policy interpretation: uncertainty is concentrated where decisions are highest leverage (policy_3), so “how close to zero” policy_3 sits should be treated probabilistically rather than as a point estimate, even when the ranking of scenarios is stable.
ggplot(consensus_var_AB, aes(x = mean_signed, y = sd_signed)) +
geom_point(alpha = 0.6) +
facet_grid(assumption ~ scenario) +
labs(
title = "Mean–variance relationship across taxa × habitat_group (A and B)",
x = "Mean signed % outcome",
y = "Standard deviation (expert disagreement)"
) +
theme_minimal()
The mean–variance plot tests whether disagreement tends to increase as projected outcomes become more extreme. A widening spread away from zero (a “funnel” pattern) is a healthy signal: it suggests experts are most aligned around modest changes and diverge most when asked about large recovery or large decline.
In this analysis, the combination of (i) increasing dispersion under ambitious policy and (ii) mean–variance structure supports the interpretation that uncertainty is epistemic and structured (reflecting real limits on knowledge of intervention efficacy), rather than arbitrary disagreement.
certainty_summary_AB <- df_AB %>%
filter(str_detect(scenario, "^policy_") | scenario == "counterfactual") %>%
group_by(assumption, scenario) %>%
summarise(
mean_cert = mean(if_else(scenario == "counterfactual", cf_cert, pol_cert), na.rm = TRUE),
sd_cert = sd(if_else(scenario == "counterfactual", cf_cert, pol_cert), na.rm = TRUE),
.groups = "drop"
)
certainty_summary_AB %>%
mutate(across(where(is.numeric), round, 2)) %>%
kable(caption = "Mean and SD of certainty scores (1–5) (Assumptions A and B)") %>%
kable_styling(full_width = FALSE)
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `across(where(is.numeric), round, 2)`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
##
## # Previously
## across(a:b, mean, na.rm = TRUE)
##
## # Now
## across(a:b, \(x) mean(x, na.rm = TRUE))
| assumption | scenario | mean_cert | sd_cert |
|---|---|---|---|
| A (rows = all survey_taxa) | counterfactual | 1.97 | 0.57 |
| A (rows = all survey_taxa) | policy_1 | 1.99 | 0.79 |
| A (rows = all survey_taxa) | policy_2 | 1.97 | 0.72 |
| A (rows = all survey_taxa) | policy_3 | 1.93 | 0.75 |
| A (rows = all survey_taxa) | policy_4 | 1.95 | 0.73 |
| B (collapsed = unique taxa:habitat_group) | counterfactual | 1.93 | 0.55 |
| B (collapsed = unique taxa:habitat_group) | policy_1 | 1.97 | 0.79 |
| B (collapsed = unique taxa:habitat_group) | policy_2 | 1.95 | 0.74 |
| B (collapsed = unique taxa:habitat_group) | policy_3 | 1.91 | 0.76 |
| B (collapsed = unique taxa:habitat_group) | policy_4 | 1.93 | 0.75 |
Certainty scores provide a self-reported complement to dispersion. In practice, with small panels, certainty scales can be “sticky” (limited spread), so it is appropriate that the analysis relies more heavily on:
Where certainty is systematically low in the most negative baseline domains, it reinforces the case for carrying uncertainty forward into trajectory modelling rather than over-interpreting point means.
set.seed(123)
scenario_levels_pol <- c("policy_1","policy_2","policy_3","policy_4")
resample_means_any <- function(dat, k, scenario_name, n_iter = 1000){
replicate(n_iter, {
sample_ids <- sample(unique(dat$id_gj), k, replace = FALSE)
dat %>%
filter(id_gj %in% sample_ids, scenario == scenario_name) %>%
summarise(mean_signed = mean(abs_policy_mag_signed, na.rm = TRUE)) %>%
pull(mean_signed)
})
}
resampling_results_AB <- df_AB %>%
filter(str_detect(scenario, "^policy_")) %>%
group_by(assumption) %>%
group_modify(~{
dat <- .x
purrr::map_dfr(scenario_levels_pol, function(sc){
tibble(
scenario = sc,
k2 = sd(resample_means_any(dat, 2, sc)),
k3 = sd(resample_means_any(dat, 3, sc)),
k4 = sd(resample_means_any(dat, 4, sc)),
k5 = sd(resample_means_any(dat, 5, sc)),
k6 = sd(resample_means_any(dat, 6, sc)),
k7 = sd(resample_means_any(dat, 7, sc))
)
})
}) %>%
ungroup()
resampling_results_AB %>%
mutate(across(where(is.numeric), round, 2)) %>%
kable(caption = "Resampling SD of mean signed outcome by panel size (Assumptions A and B)") %>%
kable_styling(full_width = FALSE)
| assumption | scenario | k2 | k3 | k4 | k5 | k6 | k7 |
|---|---|---|---|---|---|---|---|
| A (rows = all survey_taxa) | policy_1 | 8.32 | 6.01 | 4.47 | 3.24 | 2.07 | 0 |
| A (rows = all survey_taxa) | policy_2 | 9.77 | 7.10 | 5.36 | 3.83 | 2.50 | 0 |
| A (rows = all survey_taxa) | policy_3 | 13.14 | 9.80 | 7.38 | 5.36 | 3.44 | 0 |
| A (rows = all survey_taxa) | policy_4 | 11.02 | 8.40 | 6.31 | 4.60 | 2.97 | 0 |
| B (collapsed = unique taxa:habitat_group) | policy_1 | 8.24 | 6.24 | 4.56 | 3.34 | 2.24 | 0 |
| B (collapsed = unique taxa:habitat_group) | policy_2 | 10.24 | 7.20 | 5.51 | 4.15 | 2.57 | 0 |
| B (collapsed = unique taxa:habitat_group) | policy_3 | 13.74 | 10.04 | 7.50 | 5.57 | 3.34 | 0 |
| B (collapsed = unique taxa:habitat_group) | policy_4 | 11.63 | 8.48 | 6.30 | 4.68 | 2.97 | 0 |
resampling_long_AB <- resampling_results_AB %>%
pivot_longer(cols = starts_with("k"), names_to = "panel_size", values_to = "sd_mean") %>%
mutate(panel_size = as.numeric(str_remove(panel_size, "k")))
ggplot(resampling_long_AB, aes(x = panel_size, y = sd_mean)) +
geom_line() +
geom_point() +
facet_grid(assumption ~ scenario) +
labs(
title = "Panel-size sensitivity of mean signed policy outcome (Assumptions A and B)",
x = "Number of experts sampled",
y = "SD of mean signed outcome"
) +
theme_minimal()
The resampling analysis asks: “If we had elicited from a slightly different subset of experts, would the headline means and scenario ranking change?”
The expected and policy-relevant pattern is:
This supports a nuanced conclusion:
Uncertainty diagnostics strengthen (rather than weaken) the credibility of the results:
Together, this supports using the elicitation outputs as inputs to trajectory modelling, with uncertainty propagation and careful interpretation of near-zero outcomes.
This expert elicitation separates two components of expected change by 2042:
This structure is valuable because it prevents us from conflating “policies help” with “policies deliver recovery.” Q2 can be strongly positive while absolute outcomes remain negative if baseline decline is large.
All synthesis statements below are robust to the two aggregation assumptions used in the analysis:
Only a small subset of combinations are affected by the A→B collapse step (freshwater assemblages in Freshwater/Wetland; lichens in Open habitats), so A vs B is best seen as a sensitivity check for habitat-group interpretation rather than a competing “true” baseline.
Selective evidence to keep visible in this section: - Table: Counterfactual baseline (Q1 signed %): Assumptions A and B - Table + figure: Counterfactual baseline by habitat_group (Assumptions A and B)
Across both assumptions, the counterfactual baseline mean is around −12%, with a high proportion of units negative.
Habitat-group baselines show that:
This is a crucial policy framing point: the baseline “pull” is substantial, so “reasonable” policy uplifts may still be insufficient to deliver stability unless they are large, broad, and fast-acting.
Selective evidence to keep visible: - Table + figure: Counterfactual baseline by habitat_group (Assumptions A and B)
All policy scenarios show positive mean deltas relative to the counterfactual, with a consistent ranking:
The high proportion of positive deltas, especially under policy_3, indicates broad agreement that interventions improve outcomes relative to baseline. However, because these are deltas, this result should be communicated as “policies reduce decline relative to BAU,” not as “policies deliver recovery.”
Selective evidence to keep visible: - Table:
Incremental policy uplifts (Q2 deltas): overall by scenario
(Assumptions A and B)
- One figure: the scenario-level uplift bar chart (faceting by
assumption if shown)
The key sufficiency test is the absolute outcome:
Absolute outcome under policy = counterfactual + delta
This is the quantity that matters for target framing (stability vs recovery). Under Assumption A, the mean absolute outcomes remain negative for every scenario:
This means the policy packages, as elicited, are best characterised as decline-reduction pathways, not system-wide recovery pathways, at least in the mean.
At the same time, the distributions are heterogeneous: under policy_3, most units are stable/positive, yet a material minority remain negative, and these residual declines keep the mean below zero. This is exactly why it is helpful to report both the mean and the recovered/stabilised fractions.
Selective evidence to keep visible: - Table: Absolute signed % outcome under each policy (Assumptions A and B) - Figure: Mean signed % outcome: counterfactual vs policy
The recovered/stabilised/decline-remaining table provides the most accessible translation:
This reconciles the apparent tension between “many outcomes are positive” and “mean outcome is negative.” The minority of negative outcomes are often sufficiently negative to dominate the mean. From a policy perspective, this implies that:
Selective evidence to keep visible: - Table: recovered/stabilised/decline remaining under A and B
Habitat-group absolute outcomes indicate that:
Because these habitat-group results are reported under both A and B, one can interpret them as a robust signal rather than a survey-row artefact.
Selective evidence to keep visible: - Table/scroll
box: Absolute outcomes under policy by habitat_group (Assumptions A
and B)
- One figure: habitat-group mean outcome bars by scenario (faceted by
habitat_group and/or assumption)
Uncertainty diagnostics show:
This supports a credible communication stance:
Selective evidence to keep visible: - Table: Mean dispersion (SD) in signed magnitude by scenario (Assumptions A and B) - One figure: mean–variance scatter (faceted by scenario), OR resampling stability plot (pick one, not both)
The statutory extinction-risk target implies no net increase in extinction risk by 2042, and ideally improvement. We have not translated percent distribution-change expectations into explicit Red List category transitions (and therefore not directly into ERLI change), because that would require an additional mapping model and strong assumptions about how these percent changes relate to category thresholds.
However, the CF + delta results provide a strong sufficiency diagnostic:
Therefore, on the elicited evidence alone, it is difficult to argue that any of the tested policy packages is likely to deliver system-wide stability/recovery by 2042 in a way that would reliably support ERLI neutrality or improvement. The most defensible framing is: