# Paths — assumes dams_master.R has been run, OR set manually
if (!exists("intermed_dir")) {
  if (Sys.info()["user"] == "katiekzhang") {
    base_dir <- "~/water_conflict_files"
  } else {
    base_dir <- file.path(path.expand("~"), "..", "Dropbox", "research_directory", "dams")
  }
  data_dir     <- file.path(base_dir, "Data")
  intermed_dir <- file.path(data_dir, "Intermediate")
  raw_dir      <- file.path(data_dir, "Raw")
}

ddd_output_dir <- file.path(intermed_dir, "DDD_analysis")
out_dir        <- file.path(dirname(data_dir), "Output")

# Load pre-saved data
panel        <- readRDS(file.path(ddd_output_dir, "panel_ddd_v2.rds"))
full_treatment <- readRDS(file.path(ddd_output_dir, "treatment_assignments_v2.rds"))

# Load DOR panel if available (from dor_analysis.R)
panel_dor_path <- file.path(ddd_output_dir, "panel_dor.rds")
has_dor <- file.exists(panel_dor_path)
if (has_dor) panel_dor <- readRDS(panel_dor_path)

# Load matching covariates if available
covs_path <- file.path(ddd_output_dir, "station_matching_covariates.rds")
has_covs <- file.exists(covs_path)
if (has_covs) station_covs <- readRDS(covs_path)

# Load cached HonestDiD results if available
hdid_path <- file.path(ddd_output_dir, "honestdid_results.rds")
has_hdid  <- file.exists(hdid_path)
if (has_hdid) hdid_cached <- readRDS(hdid_path)

Executive Summary

For a time-constrained read-through: The key sections are flagged below. Everything else provides supporting detail.

Binary DDD (Sections 2–6): The original triple-difference design (downstream x transboundary x post) is not credibly identified. The DDD coefficient has an unexpected sign (positive, not negative), severe pre-trends persist across all 9 matching specifications, and HonestDiD bounds include zero even under the most favorable assumption. Key section: Section 6 (Assessment) for the bottom line.

DOR specification (Section 7): Pivoting from binary treatment to a continuous Degree of Regulation measure yields the paper’s strongest result. Key sections:

Section 7.5 (Main DOR Results): The static DOR x TRB coefficient is $-0.038^*$ with climate controls — consistent with strategic withholding. Each additional unit of DOR reduces TRB discharge by $\sim 3.7$ more than domestic.
Section 7.7 (DOR Bins + Coefficient Plot): The TRB effect concentrates in the Low DOR bin ($-0.234^{***}$), with higher bins insignificant. This non-monotonic pattern is the most provocative finding.
Section 7.10 (Leave-One-Out): The result is moderately robust across countries but sensitive to Africa.
Section 7.12 (DOR Key Takeaways): High-level synthesis of what the DOR results mean for the paper.

Seasonal decomposition (Section 8): Splitting the binary DDD by season and dam type reveals opposite seasonal patterns: hydropower dams show effects in high-flow months, while irrigation/water-supply dams show effects in low-flow and warm months. Stations with weak precipitation seasonality show no effect. Key section: Section 8 for the mechanism story.

Bottom line: The binary DDD should be relegated to background/motivation. The DOR specification provides suggestive evidence of strategic withholding, with the strongest signal at low regulation levels, but pre-trend concerns from the dose-response event study prevent a definitive causal claim. The seasonal decomposition adds a mechanism story — the type of withholding differs by dam purpose — but does not resolve the identification concerns.

1 Research Question and Setup

Question: Do transboundary dams reduce downstream water discharge more than domestic dams, consistent with strategic water withholding?

Design: Triple Difference-in-Differences (DDD) using three binary differences:

Time: pre vs. post dam construction
Space: downstream vs. upstream of the dam
River type: transboundary (station and dam in different countries) vs. domestic

\[\log(Q)_{it} = \beta \cdot (\text{Downstream}_i \times \text{Transboundary}_i \times \text{Post}_{it}) + \gamma \mathbf{X}_{it} + \alpha_i + \delta_t + \varepsilon_{it}\]

Identifying assumption: Absent strategic behavior, the downstream–upstream discharge gap would have evolved identically on transboundary and domestic rivers.

cell_counts <- full_treatment %>%
  count(downstream, transboundary) %>%
  mutate(
    group = case_when(
      downstream == 1 & transboundary == 1 ~ "TRB Downstream (treated)",
      downstream == 1 & transboundary == 0 ~ "Domestic Downstream",
      downstream == 0 & transboundary == 1 ~ "TRB Upstream",
      downstream == 0 & transboundary == 0 ~ "Domestic Upstream"
    )
  ) %>%
  select(Group = group, Stations = n)

kable(cell_counts, caption = "Station counts by DDD cell") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

Station counts by DDD cell
Group	Stations
Domestic Upstream	433
TRB Upstream	440
Domestic Downstream	1000
TRB Downstream (treated)	192

cat("Panel:", format(nrow(panel), big.mark = ","), "station-months,",
    n_distinct(panel$station_id), "stations\n")

## Panel: 533,062 station-months, 1241 stations

2 Binary DDD Results

2.1 Model specifications

I estimate five models to examine the DDD from different angles:

Column	Model	Description
(1) Baseline	Full DDD	`downstream x transboundary x post` with station + year-month FE. The triple interaction is the coefficient of interest.
(2) Climate	DDD + climate controls	Adds precipitation and temperature to account for weather-driven discharge variation. Restricted to observations with ERA5 climate data.
(3) Detrended	DDD + group-specific linear trends	Adds `years_to_treatment` interacted with each DDD group. The DDD coefficient now captures a discrete jump at treatment, net of any linear drift. This directly addresses the pre-trend.
(4) Split: TRB	DD on transboundary rivers only	`downstream x post` estimated on TRB subsample. Shows how the downstream–upstream gap changes around dam construction on transboundary rivers.
(5) Split: Domestic	DD on domestic rivers only	Same as (4) but on domestic subsample. The DDD is conceptually the difference between columns (4) and (5).

# Baseline DDD
m1 <- feols(
  log_discharge ~ downstream * transboundary * post | station_id + year_month,
  data = panel, cluster = ~station_id
)

# DDD with climate controls
panel_climate <- panel %>% filter(!is.na(precipitation_mm))
m2 <- feols(
  log_discharge ~ downstream * transboundary * post +
    precipitation_mm + temperature_c | station_id + year_month,
  data = panel_climate, cluster = ~station_id
)

# Detrended
m10 <- feols(
  log_discharge ~ downstream * transboundary * post +
    years_to_treatment +
    downstream:years_to_treatment +
    transboundary:years_to_treatment +
    I(downstream * transboundary * years_to_treatment) |
    station_id + year_month,
  data = panel, cluster = ~station_id
)

# Split-sample DD
m_trb <- feols(
  log_discharge ~ downstream * post | station_id + year_month,
  data = panel %>% filter(transboundary == 1), cluster = ~station_id
)

m_dom <- feols(
  log_discharge ~ downstream * post | station_id + year_month,
  data = panel %>% filter(transboundary == 0), cluster = ~station_id
)

# Rename coefficients for display
cm <- c(
  "downstream:transboundary:post" = "Down x TRB x Post (DDD)",
  "downstream:post"               = "Down x Post (DD)",
  "transboundary:post"            = "TRB x Post",
  "downstream:transboundary"      = "Down x TRB",
  "downstream"                    = "Downstream",
  "transboundary"                 = "Transboundary",
  "post"                          = "Post",
  "precipitation_mm"              = "Precipitation (mm)",
  "temperature_c"                 = "Temperature (C)"
)

modelsummary(
  list("(1) Baseline" = m1, "(2) Climate" = m2, "(3) Detrended" = m10,
       "(4) Split: TRB" = m_trb, "(5) Split: Domestic" = m_dom),
  coef_map = cm,
  gof_map = c("nobs", "r.squared", "FE: station_id", "FE: year_month"),
  stars = c('*' = 0.1, '**' = 0.05, '***' = 0.01),
  title = "Binary DDD and Split-Sample DD Results",
  notes = list("Station and year-month FE in all columns. SE clustered at station level."),
  escape = FALSE,
  output = "kableExtra"
) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE, font_size = 13)

Binary DDD and Split-Sample DD Results
	(1) Baseline	(2) Climate	(3) Detrended	(4) Split: TRB	(5) Split: Domestic
Down x TRB x Post (DDD)	0.391***	0.184**	0.004
	(0.089)	(0.083)	(0.097)
Down x Post (DD)	0.105*	0.119**	0.241***	0.511***	0.073
	(0.058)	(0.047)	(0.055)	(0.069)	(0.058)
TRB x Post	−0.392***	−0.067	−0.076
	(0.064)	(0.054)	(0.066)
Transboundary		0.315
		(117112.501)
Post	−0.048	−0.128***	−0.162***	−0.463***	−0.025
	(0.053)	(0.042)	(0.050)	(0.054)	(0.054)
Precipitation (mm)		0.275***
		(0.007)
Temperature (C)		−0.011***
		(0.003)
Num.Obs.	533062	324461	533062	119182	413880
R2	0.811	0.839	0.811	0.809	0.823
FE: station_id	X	X	X	X	X
FE: year_month	X	X	X	X	X
* p < 0.1, p < 0.05, * p < 0.01

2.1.1 Interpretation

The DDD coefficient ($\text{Down} \times \text{TRB} \times \text{Post}$) is positive in the baseline — unexpected under the strategic withholding hypothesis, which predicts $\beta < 0$. The positive sign means the data suggests TRB downstream discharge rises (relative to domestic downstream) after dam construction, controlling for upstream changes.

However, this result should be interpreted with extreme caution. The detrended specification (column 3) shows that the DDD coefficient loses significance once each DDD group is allowed its own linear time trend, suggesting the positive coefficient in columns (1)–(2) is driven by pre-existing trends rather than a causal treatment effect. The pre-trend problem is examined in the event study below.

2.2 Event studies

event_data <- panel %>%
  filter(abs(years_to_treatment) <= 10) %>%
  mutate(rel_year = years_to_treatment)

# DDD event study
m_es_ddd <- feols(
  log_discharge ~ i(rel_year, downstream, ref = -1) +
    i(rel_year, transboundary, ref = -1) +
    i(rel_year, I(downstream * transboundary), ref = -1) |
    station_id + year_month,
  data = event_data, cluster = ~station_id
)

# Split-sample event studies
m_es_trb <- feols(
  log_discharge ~ i(rel_year, downstream, ref = -1) | station_id + year_month,
  data = event_data %>% filter(transboundary == 1), cluster = ~station_id
)

m_es_dom <- feols(
  log_discharge ~ i(rel_year, downstream, ref = -1) | station_id + year_month,
  data = event_data %>% filter(transboundary == 0), cluster = ~station_id
)

par(mfrow = c(2, 2), mar = c(4, 4, 3, 1))

# DDD triple interaction
all_c <- coef(m_es_ddd)
all_s <- se(m_es_ddd)
nms   <- names(all_c)
triple_idx <- grepl("I\\(downstream", nms)
if (sum(triple_idx) == 0) triple_idx <- grepl("downstream", nms) & grepl("transboundary", nms)

triple_est <- all_c[triple_idx]
triple_se  <- all_s[triple_idx]
triple_yrs <- as.numeric(regmatches(names(triple_est), regexpr("-?\\d+", names(triple_est))))

ddd_coefs <- data.frame(
  yr = c(triple_yrs, -1), est = c(triple_est, 0),
  lo = c(triple_est - 1.96 * triple_se, 0),
  hi = c(triple_est + 1.96 * triple_se, 0)
) %>% arrange(yr)

plot(ddd_coefs$yr, ddd_coefs$est, pch = 19, col = "darkgreen",
     xlim = c(-10, 10), ylim = range(c(ddd_coefs$lo, ddd_coefs$hi)),
     main = "DDD Event Study (Down x TRB x Time)",
     xlab = "Years to Dam", ylab = "DDD Coefficient")
segments(ddd_coefs$yr, ddd_coefs$lo, ddd_coefs$yr, ddd_coefs$hi,
         col = "darkgreen", lwd = 1.5)
abline(h = 0); abline(v = -0.5, lty = 2); grid(col = "grey90")

# Split DD - Transboundary
iplot(m_es_trb,
      main = "Transboundary DD (Down x Time)",
      xlab = "Years to Dam", ylab = "Downstream Coef",
      col = "firebrick", ci_col = "firebrick", ref.line = 0)

# Split DD - Domestic
iplot(m_es_dom,
      main = "Domestic DD (Down x Time)",
      xlab = "Years to Dam", ylab = "Downstream Coef",
      col = "steelblue", ci_col = "steelblue", ref.line = 0)

# Raw DD gaps
cell_trends <- event_data %>%
  mutate(cell = case_when(
    downstream == 1 & transboundary == 1 ~ "TRB Down",
    downstream == 1 & transboundary == 0 ~ "Dom Down",
    downstream == 0 & transboundary == 1 ~ "TRB Up",
    downstream == 0 & transboundary == 0 ~ "Dom Up"
  )) %>%
  group_by(cell, rel_year) %>%
  summarise(mean_log_q = mean(log_discharge, na.rm = TRUE), .groups = "drop") %>%
  mutate(
    river = if_else(grepl("TRB", cell), "TRB", "Domestic"),
    pos   = if_else(grepl("Down", cell), "down", "up")
  ) %>%
  select(-cell) %>%
  pivot_wider(names_from = pos, values_from = mean_log_q) %>%
  mutate(dd_gap = down - up)

plot(NULL, xlim = c(-10, 10),
     ylim = range(cell_trends$dd_gap, na.rm = TRUE),
     main = "Raw DD Gap: Down - Up",
     xlab = "Years to Dam", ylab = "Mean Log(Q): Down - Up")
lines(cell_trends$rel_year[cell_trends$river == "TRB"],
      cell_trends$dd_gap[cell_trends$river == "TRB"],
      col = "firebrick", lwd = 2)
lines(cell_trends$rel_year[cell_trends$river == "Domestic"],
      cell_trends$dd_gap[cell_trends$river == "Domestic"],
      col = "steelblue", lwd = 2)
abline(v = -0.5, lty = 2); grid(col = "grey90")
legend("topleft", c("TRB", "Domestic"),
       col = c("firebrick", "steelblue"), lwd = 2, cex = 0.8)

par(mfrow = c(1, 1))

2.2.1 The pre-trend problem

The four panels above reveal pre-existing trends in the DDD cells that complicate identification (reference year is $t=-1$, normalized to zero):

Top right — Transboundary DD: On TRB rivers, the downstream coefficient is relatively flat in the pre-period, hovering near zero with wide confidence intervals. There is no strong monotonic pre-trend on TRB rivers alone. Post-treatment, the coefficients drift positive, suggesting downstream discharge rises relative to upstream after dam construction on TRB rivers.

Bottom left — Domestic DD: On domestic rivers, the pre-treatment coefficients show more variation. There are negative values at earlier event times that trend upward toward $t=-1$, suggesting the downstream–upstream gap on domestic rivers was already changing before dam construction. Post-treatment, coefficients remain noisy.

Top left — DDD triple interaction: This is the diff between the TRB and domestic DD event studies. The pre-treatment coefficients are noisy and bounce around zero (sans $t=-7$) with wide confidence intervals, but some negative values at earlier event times suggest the TRB and domestic DD gaps were likely on somewhat different trajectories before dam construction.

Bottom right — Raw DD gap: The raw (non-regression) downstream-minus-upstream mean log discharge shows that TRB (red) and domestic (blue) rivers have very different levels of the DD gap, and the two evolve along different paths over the event window. The domestic gap trends upward over time, while the TRB gap is relatively more stable. Something weird happening on the very right for TRB, but the pre-period dynamics are more concerning for identification.

What this means for identification: The DDD requires that, absent dam construction, the downstream–upstream discharge gap would have evolved identically on TRB and domestic rivers. The event study suggests this assumption is questionable: while the TRB DD is relatively flat, the domestic DD shows concerning pre-period trends, and the raw DD gaps evolve on different trajectories. The DDD triple interaction inherits noise from both components, making it difficult to cleanly separate a treatment effect from pre-existing differences in the two groups’ dynamics.

Why might the pre-trends differ? Several non-exclusive explanations:

Selection on trends: Transboundary and domestic dams are built on rivers with systematically different hydrological dynamics. Domestic rivers may have more pre-existing upstream dams (e.g. as it is easier to build without international blowback) or land-use changes that generate pre-trends in the downstream–upstream gap. There’s also the simple fact that transboundary rivers are likely to be bigger!
Anticipation effects: Countries may begin altering water management during the multi-year dam planning and construction phase, well before the official completion date. This would be a major contamination.
Measurement timing: The dam completion year (from GDAT) may not precisely capture when the dam begins affecting flows; construction-phase diversion channels can alter hydrology years before impoundment.
Compositional differences: Despite matching (Section 4), TRB and domestic stations sit on systematically different rivers that may have different underlying hydrological trends.
Seasonal aggregation: Monthly discharge averages may mask seasonal dynamics. If dams on TRB vs. domestic rivers have different seasonal storage/release patterns, this could generate differential trends even before the treatment dam is built (due to other dams in the system). Section 8 investigates this directly: the DDD does concentrate in specific seasons and differs a fair bit between hydropower and non-hydropower dams.

3 Pre-Trend Diagnostics

I investigated several potential threads for the pre-trend.

diagnoses <- tibble(
  Diagnostic = c(
    "4A. Raw mean trends by cell",
    "4B. Continental decomposition",
    "4C. Pre-existing dams",
    "4D. Station-level pre-trend slopes",
    "4E. Country-pair corridors",
    "4F. River size (large vs small)"
  ),
  Finding = c(
    "TRB downstream mean log(Q) trends upward relative to TRB upstream in the pre-period; domestic gap is flatter. The DDD pre-trend is visible in raw means.",
    "Pre-trend present across multiple continents, though strongest in Asia. Not driven by a single region.",
    "Stations with prior upstream dams (already-dammed rivers) show a stronger pre-trend, suggesting contamination from earlier regulation. But the pre-trend persists even on 'clean' (no prior dam) rivers.",
    "Station-level pre-treatment discharge slopes differ across cells: TRB downstream stations have more negative (declining) pre-slopes than domestic downstream.",
    "No single TRB corridor (dam country -> station country) dominates. Multiple corridors contribute.",
    "Pre-trend appears in both large and small rivers, though noisier on small rivers."
  ),
  Implication = c(
    "Pre-trend is in the data, not an artifact of the regression.",
    "Not a single-continent problem.",
    "Prior regulation contributes but doesn't fully explain the pre-trend.",
    "Systematic composition difference in pre-period behavior.",
    "Not driven by a specific bilateral relationship.",
    "Not a large-river phenomenon."
  )
)

kable(diagnoses, caption = "Pre-Trend Diagnostic Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE) %>%
  column_spec(2, width = "40%") %>%
  column_spec(3, width = "25%")

Pre-Trend Diagnostic Summary
Diagnostic	Finding	Implication
4A. Raw mean trends by cell	TRB downstream mean log(Q) trends upward relative to TRB upstream in the pre-period; domestic gap is flatter. The DDD pre-trend is visible in raw means.	Pre-trend is in the data, not an artifact of the regression.
4B. Continental decomposition	Pre-trend present across multiple continents, though strongest in Asia. Not driven by a single region.	Not a single-continent problem.
4C. Pre-existing dams	Stations with prior upstream dams (already-dammed rivers) show a stronger pre-trend, suggesting contamination from earlier regulation. But the pre-trend persists even on ‘clean’ (no prior dam) rivers.	Prior regulation contributes but doesn’t fully explain the pre-trend.
4D. Station-level pre-trend slopes	Station-level pre-treatment discharge slopes differ across cells: TRB downstream stations have more negative (declining) pre-slopes than domestic downstream.	Systematic composition difference in pre-period behavior.
4E. Country-pair corridors	No single TRB corridor (dam country -> station country) dominates. Multiple corridors contribute.	Not driven by a specific bilateral relationship.
4F. River size (large vs small)	Pre-trend appears in both large and small rivers, though noisier on small rivers.	Not a large-river phenomenon.

Bottom line: The pre-trend appears to be a genuine feature of the data across regions, river types, and corridors. It is not driven by a single confounding source that can be easily fixed.

4 Matching Attempts

I tried multiple matching strategies to address the concern that the pre-trend is driven by composition differences between TRB and domestic stations.

4.1 Approach A: Matching on observables

Entropy balancing (Hainmueller, 2012; Basri et al., 2021) reweights domestic stations to match transboundary stations on observable river characteristics (discharge, drainage area, stream order, gradient, elevation, dam capacity, dam purpose, climate).

Result: Observable matching achieves near-perfect balance on the targeted covariates. However, pre_trend_log_q balance often worsens — the pre-trend is not driven by observable river characteristics.

Why might balance actually worsen on the pre-trend? Entropy balancing finds weights that equalize the targeted moments (e.g., mean discharge, drainage area). But the pre-trend is not a simple function of these observables. Reweighting to match on observables can inadvertently shift the composition of domestic stations in a way that increases the pre-trend difference. In particular:

Matching on river size and dam capacity may select domestic stations on heavily regulated rivers that had strong pre-existing discharge trends, thereby amplifying the pre-trend gap.
If the pre-trend is driven by unobserved factors (e.g., upstream water management practices, irrigation expansion, or land-use change) that are partially correlated with observables, balancing on observables could be exacerbating the imbalance on the unobserved drivers.
The entropy balancing objective prioritises exact moment conditions on the specified covariates, but it does not penalize imbalance on untargeted variables like the pre-trend slope.

# Love plot: standardized mean differences before and after entropy balancing
library(ebal)

# Prepare covariate matrix for downstream stations
covs_down <- station_covs %>%
  filter(downstream == 1) %>%
  drop_na(transboundary, log_dis_av, log_upland, log_capacity)

# Observable matching variables (same as ddd_matching.R Approach A)
obs_vars <- intersect(
  c("log_dis_av", "log_upland", "ord_stra", "log_capacity", "hydropower",
    "pre_mean_precip", "pre_mean_temp", "pre_cv_precip",
    "gradient_m_per_km", "avg_elev_m"),
  names(covs_down)
)

# Non-targeted variables (the ones that may worsen)
extra_vars <- intersect(
  c("pre_mean_log_q", "pre_sd_log_q", "pre_trend_log_q"),
  names(covs_down)
)

all_vars <- c(obs_vars, extra_vars)

# Drop rows with NAs in matching variables
covs_complete <- covs_down %>%
  drop_na(all_of(all_vars))

X_mat <- as.matrix(covs_complete[, obs_vars])
treat <- covs_complete$transboundary

# Run entropy balancing
eb <- tryCatch(
  ebalance(Treatment = treat, X = X_mat),
  error = function(e) NULL
)

## Converged within tolerance

if (!is.null(eb)) {
  # Compute standardized mean differences
  smd_df <- tibble(Variable = character(), Unmatched = numeric(), Matched = numeric())

  for (v in all_vars) {
    x <- covs_complete[[v]]
    pooled_sd <- sd(x, na.rm = TRUE)
    if (pooled_sd == 0) next

    mean_trb <- mean(x[treat == 1], na.rm = TRUE)
    mean_dom_raw <- mean(x[treat == 0], na.rm = TRUE)

    # Weighted mean for domestic (control) stations
    w <- eb$w
    mean_dom_wt <- weighted.mean(x[treat == 0], w = w, na.rm = TRUE)

    smd_raw <- (mean_trb - mean_dom_raw) / pooled_sd
    smd_wt  <- (mean_trb - mean_dom_wt) / pooled_sd

    smd_df <- bind_rows(smd_df, tibble(
      Variable = v, Unmatched = smd_raw, Matched = smd_wt
    ))
  }

  # Clean variable names for display
  name_map <- c(
    log_dis_av = "Log Discharge", log_upland = "Log Drainage Area",
    ord_stra = "Stream Order", log_capacity = "Log Dam Capacity",
    hydropower = "Hydropower", pre_mean_precip = "Mean Precip (pre)",
    pre_mean_temp = "Mean Temp (pre)", pre_cv_precip = "CV Precip (pre)",
    gradient_m_per_km = "Gradient", avg_elev_m = "Elevation",
    pre_mean_log_q = "Mean Log Q (pre)", pre_sd_log_q = "SD Log Q (pre)",
    pre_trend_log_q = "Pre-Trend Log Q"
  )
  smd_df$Label <- ifelse(smd_df$Variable %in% names(name_map),
                          name_map[smd_df$Variable], smd_df$Variable)
  smd_df$Targeted <- ifelse(smd_df$Variable %in% obs_vars, "Targeted", "Not Targeted")

  # Plot
  smd_long <- smd_df %>%
    pivot_longer(cols = c(Unmatched, Matched), names_to = "Status", values_to = "SMD")

  smd_long$Label <- factor(smd_long$Label, levels = rev(smd_df$Label))
  smd_long$Status <- factor(smd_long$Status, levels = c("Unmatched", "Matched"))

  par(mar = c(5, 14, 4, 2))
  n_vars <- nrow(smd_df)
  y_pos <- seq(1, n_vars)

  plot(NULL, xlim = c(-0.6, 0.6), ylim = c(0.5, n_vars + 0.5),
       xlab = "Standardized Mean Difference (TRB - Domestic)",
       ylab = "", yaxt = "n",
       main = "Covariate Balance: Before vs After Entropy Balancing (Observables)")
  axis(2, at = y_pos, labels = rev(smd_df$Label), las = 1, cex.axis = 0.8)
  abline(v = 0, lty = 1, col = "grey50")
  abline(v = c(-0.1, 0.1), lty = 2, col = "grey70")

  # Shade non-targeted variables
  non_targ_idx <- which(rev(smd_df$Targeted) == "Not Targeted")
  if (length(non_targ_idx) > 0) {
    for (idx in non_targ_idx) {
      rect(-0.7, idx - 0.4, 0.7, idx + 0.4, col = rgb(1, 0.95, 0.9), border = NA)
    }
  }

  # Replot axis and lines on top of shading
  abline(v = 0, lty = 1, col = "grey50")
  abline(v = c(-0.1, 0.1), lty = 2, col = "grey70")

  # Points
  points(rev(smd_df$Unmatched), y_pos, pch = 1, col = "steelblue", cex = 1.3, lwd = 2)
  points(rev(smd_df$Matched), y_pos, pch = 19, col = "firebrick", cex = 1.3)

  # Arrows from unmatched to matched
  arrows(rev(smd_df$Unmatched), y_pos, rev(smd_df$Matched), y_pos,
         length = 0.08, col = "grey40", lwd = 0.8)

  legend("topright",
         legend = c("Unmatched", "After Ebal", "Not targeted (may worsen)"),
         pch = c(1, 19, NA), col = c("steelblue", "firebrick", NA),
         fill = c(NA, NA, rgb(1, 0.95, 0.9)),
         border = c(NA, NA, "grey80"),
         pt.lwd = c(2, 1, NA), pt.cex = c(1.3, 1.3, NA),
         cex = 0.8, bg = "white")
}

The love plot shows that entropy balancing successfully moves targeted covariates (white background) toward zero SMD, but Pre-Trend Log Q (shaded, not targeted) can move away from zero, confirming that observable matching does not fix — and may worsen — the pre-trend.

4.2 Approach B: Matching on pre-treatment dynamics

Entropy balancing on pre-treatment discharge statistics (mean, trend, SD) directly targets the pre-trend divergence.

Result: Even when I force domestic and TRB stations to have identical pre-treatment means and trends, the DDD event study pre-trend persists. This means the pre-trend operates at a finer level than station-level annual averages.

Why annual averages may be insufficient: The pre-treatment statistics used for matching (mean, trend, SD of annual log discharge) capture station-level averages but miss seasonal dynamics, which are likely critical given how dams operate. Dams typically store water during wet seasons and release during dry seasons. If TRB and domestic rivers differ in their seasonal storage/release patterns — even with identical annual means and trends — matching on annual statistics will fail to equalize the within-year discharge dynamics that drive the event-study pre-trend. Section 8 confirms that the DDD signal is indeed concentrated in specific seasons and differs sharply by dam purpose (hydropower vs. irrigation/water supply), supporting the view that seasonal dynamics are important to investigate.

4.3 Summary across matching methods

match_summary <- tibble(
  Method = c(
    "Unmatched",
    "Entropy Balance (Observables)",
    "Entropy Balance (Obs, reduced)",
    "Mahalanobis (3:1 NN)",
    "Coarsened Exact Matching",
    "Propensity Score (caliper 0.2)",
    "Entropy Balance (Pre-treatment)",
    "Caliper restriction",
    "Entropy Balance (All variables)"
  ),
  `DDD Coef` = c(
    "Positive, significant",
    "Positive, significant (attenuated)",
    "Positive, significant",
    "Positive, significant (attenuated)",
    "Positive, insignificant (sample loss)",
    "Positive, significant",
    "Positive, significant",
    "Positive, significant",
    "Positive, significant"
  ),
  `Pre-Trend Fixed?` = c(
    "No", "No", "No", "No", "No", "No", "No", "No", "No"
  ),
  Notes = c(
    "Baseline",
    "10 river/climate covariates balanced",
    "4 key covariates balanced",
    "~85K obs remain; wider CIs",
    "Heavy sample attrition (~85K obs)",
    "Caliper = 0.2 SD of propensity score",
    "Pre-mean, pre-trend, pre-SD balanced",
    "Domestic within ±1 SD of TRB pre-stats",
    "Observables + pre-treatment combined"
  )
)

kable(match_summary,
      caption = "DDD Coefficient Across 9 Matching Specifications") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE) %>%
  column_spec(3, bold = TRUE, color = "red")

DDD Coefficient Across 9 Matching Specifications
Method	DDD Coef	Pre-Trend Fixed?	Notes
Unmatched	Positive, significant	No	Baseline
Entropy Balance (Observables)	Positive, significant (attenuated)	No	10 river/climate covariates balanced
Entropy Balance (Obs, reduced)	Positive, significant	No	4 key covariates balanced
Mahalanobis (3:1 NN)	Positive, significant (attenuated)	No	~85K obs remain; wider CIs
Coarsened Exact Matching	Positive, insignificant (sample loss)	No	Heavy sample attrition (~85K obs)
Propensity Score (caliper 0.2)	Positive, significant	No	Caliper = 0.2 SD of propensity score
Entropy Balance (Pre-treatment)	Positive, significant	No	Pre-mean, pre-trend, pre-SD balanced
Caliper restriction	Positive, significant	No	Domestic within ±1 SD of TRB pre-stats
Entropy Balance (All variables)	Positive, significant	No	Observables + pre-treatment combined

Key takeaway: The DDD coefficient is positive across all 9 matching methods (significant in 8/9; CEM loses significance due to sample attrition), but the pre-trend is never eliminated. The pre-trend is not an artifact of comparing different types of rivers — it reflects a genuine difference in pre-dam discharge dynamics on TRB vs. domestic rivers.

5 HonestDiD Sensitivity

Rambachan & Roth (2023) provide formal bounds on the treatment effect under violations of parallel trends. The smoothness restriction $\Delta^{SD}(\bar{M})$ bounds the second difference of the trend violation:

$\bar{M} = 0$: trend violation is linear (linear extrapolation of the pre-trend into post-period)
$\bar{M} > 0$: allows curvature in the trend violation

# Build summary from cached results
hdid_summary <- tibble(
  Specification = character(), M = numeric(),
  Lower = numeric(), Upper = numeric()
)

for (spec in list(
  list(res = hdid_cached$trb_unmatched, name = "TRB DD (Unmatched)"),
  list(res = hdid_cached$trb_ebal_A,   name = "TRB DD (Ebal Obs)"),
  list(res = hdid_cached$ddd_unmatched, name = "DDD (Unmatched)"),
  list(res = hdid_cached$ddd_ebal_A,   name = "DDD (Ebal Obs)")
)) {
  if (!is.null(spec$res)) {
    tryCatch({
      res_df <- as_tibble(spec$res)
      m_col  <- intersect(names(res_df), c("Mbar", "M", "m"))[1]
      lb_col <- intersect(names(res_df), c("lb", "lower", "CI.lower"))[1]
      ub_col <- intersect(names(res_df), c("ub", "upper", "CI.upper"))[1]
      if (!is.na(m_col) && !is.na(lb_col) && !is.na(ub_col)) {
        hdid_summary <- bind_rows(hdid_summary,
          tibble(Specification = spec$name,
                 M = res_df[[m_col]], Lower = res_df[[lb_col]], Upper = res_df[[ub_col]]))
      }
    }, error = function(e) NULL)
  }
}

if (nrow(hdid_summary) > 0) {
  ggplot(hdid_summary,
         aes(x = M, ymin = Lower, ymax = Upper,
             color = Specification, fill = Specification)) +
    geom_ribbon(alpha = 0.15) +
    geom_line(aes(y = Lower), linewidth = 0.6) +
    geom_line(aes(y = Upper), linewidth = 0.6) +
    geom_hline(yintercept = 0, linetype = "dashed", color = "grey40") +
    facet_wrap(~ ifelse(grepl("DDD", Specification), "DDD", "TRB DD"),
               scales = "free_y") +
    scale_color_manual(values = c("TRB DD (Unmatched)" = "firebrick",
                                  "TRB DD (Ebal Obs)"  = "steelblue",
                                  "DDD (Unmatched)"    = "firebrick",
                                  "DDD (Ebal Obs)"     = "steelblue")) +
    scale_fill_manual(values = c("TRB DD (Unmatched)" = "firebrick",
                                 "TRB DD (Ebal Obs)"  = "steelblue",
                                 "DDD (Unmatched)"    = "firebrick",
                                 "DDD (Ebal Obs)"     = "steelblue")) +
    labs(x = expression(bar(M) ~ "(Smoothness Bound)"),
         y = "Treatment Effect (Log Discharge)",
         title = "HonestDiD Sensitivity: Identified Set vs. Trend Violation Bound",
         subtitle = expression(bar(M) ~ "= 0: linear trend extrapolation; larger" ~ bar(M) ~ "allows more curvature")) +
    theme_minimal(base_size = 12) +
    theme(legend.position = "bottom", legend.title = element_blank(),
          strip.text = element_text(face = "bold", size = 13))
}

if (nrow(hdid_summary) > 0) {
  hdid_wide <- hdid_summary %>%
    filter(M %in% c(0, 0.005, 0.01, 0.02, 0.05)) %>%
    mutate(CI = paste0("[", round(Lower, 3), ", ", round(Upper, 3), "]")) %>%
    select(Specification, M, CI) %>%
    pivot_wider(names_from = Specification, values_from = CI)

  kable(hdid_wide,
        caption = "HonestDiD 95% Confidence Sets at Selected Smoothness Bounds",
        col.names = c("M-bar", names(hdid_wide)[-1])) %>%
    kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
}

HonestDiD 95% Confidence Sets at Selected Smoothness Bounds
M-bar	TRB DD (Unmatched)	TRB DD (Ebal Obs)	DDD (Unmatched)	DDD (Ebal Obs)
0.000	[-0.265, 0.037]	[-0.201, 0.137]	[-0.053, 0.078]	[-0.078, 0.106]
0.005	[-0.376, 0.375]	[-0.335, 0.48]	[-1.247, 1.575]	[-1.259, 1.677]
0.010	[-0.482, 0.662]	[-0.41, 0.848]	[-2.625, 2.628]	[-2.544, 2.862]
0.020	[-0.701, 1.157]	[-0.634, 1.36]	[-4.621, 5.107]	[-4.714, 5.296]
0.050	[-1.657, 2.159]	[-1.547, 2.467]	[-11.715, 10.708]	[-11.877, 11.104]

5.0.1 Interpretation

At $\bar{M} = 0$ (the most conservative assumption: linear extrapolation of the pre-trend), the identified sets for both the TRB DD and the DDD include zero.
The bounds widen rapidly with $\bar{M}$.
Matching (Ebal Obs) does not meaningfully tighten the bounds.

Conclusion: Even at $\bar{M} = 0$ (the tightest bounds — assumes the trend violation is exactly linear), the identified sets include zero. Larger $\bar{M}$ only makes the bounds wider. HonestDiD cannot rescue the binary DDD because the pre-trend is too large relative to the signal.

6 Assessment: Binary DDD

assessment <- tibble(
  Issue = c(
    "Sign of DDD coefficient",
    "Pre-trend in event study",
    "Matching eliminates pre-trend?",
    "HonestDiD at M=0",
    "Detrended specification"
  ),
  Status = c(
    "Positive (unexpected under strategic withholding, which predicts negative)",
    "Severe (negative, upward-trending pre-treatment DDD coefficients)",
    "No (persists across all 9 matching methods)",
    "Includes zero (cannot reject null effect)",
    "Coefficient loses significance once group-specific trends are absorbed"
  )
)

kable(assessment, caption = "Binary DDD: Summary of Issues") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE)

Binary DDD: Summary of Issues
Issue	Status
Sign of DDD coefficient	Positive (unexpected under strategic withholding, which predicts negative)
Pre-trend in event study	Severe (negative, upward-trending pre-treatment DDD coefficients)
Matching eliminates pre-trend?	No (persists across all 9 matching methods)
HonestDiD at M=0	Includes zero (cannot reject null effect)
Detrended specification	Coefficient loses significance once group-specific trends are absorbed

The binary DDD is not credibly identified. The pre-trend is the central problem: the downstream–upstream discharge gap was already evolving differently on TRB and domestic rivers before dam construction. The positive coefficient likely reflects this pre-existing trend rather than a causal effect. The result does not survive detrending, and HonestDiD bounds include zero even under the most favorable assumption.

6.1 One potentially interesting pattern: late reversal

# Zoom in on the DDD event study post-treatment coefficients
par(mar = c(5, 5, 4, 2))

plot(ddd_coefs$yr, ddd_coefs$est, pch = 19,
     col = if_else(ddd_coefs$yr >= 0, "firebrick", "steelblue"),
     cex = 1.2,
     xlim = c(-10, 10), ylim = range(c(ddd_coefs$lo, ddd_coefs$hi)),
     main = "DDD Event Study: Note Late Reversal at t = +7 to +10",
     xlab = "Years Relative to Dam Construction",
     ylab = "DDD Coefficient (Down x TRB)")
segments(ddd_coefs$yr, ddd_coefs$lo, ddd_coefs$yr, ddd_coefs$hi,
         col = if_else(ddd_coefs$yr >= 0, "firebrick", "steelblue"), lwd = 1.5)
abline(h = 0, lwd = 1)
abline(v = -0.5, lty = 2)
grid(col = "grey90")

# Highlight the late reversal region
rect(6.5, min(ddd_coefs$lo) - 0.05, 10.5, max(ddd_coefs$hi) + 0.05,
     border = "orange", lwd = 2, lty = 2)
text(8.5, max(ddd_coefs$hi) + 0.03, "Late reversal?", col = "orange", cex = 0.9)

The DDD event study shows the coefficients dipping negative around $t = +7$ to $t = +10$. If taken at face value, this would be consistent with a theory where:

Dam construction occurs (t = 0)
Reservoirs take years to fill
Countries gradually learn to exploit the dam for strategic withholding
The negative discharge effect manifests with a multi-year delay

However, this interpretation is highly speculative given:

The severe pre-trend makes any post-treatment inference unreliable
I am looking at the tails of the event window where sample sizes shrink
This could be an artifact of the pre-trend reverting or noise

This pattern motivates the continuous DOR specification, which directly tests whether more regulation on transboundary rivers produces more withholding.

7 DOR Analysis: Degree of Regulation

7.1 Motivation

The binary DDD has a fundamental problem: it treats all dams equally. A 1 MCM dam and a 10,000 MCM dam on a river with 100 MCM/year of flow get the same “treatment.” This:

Reduces power: Many dams are too small relative to river flow to produce measurable effects
Ignores dose-response: Strategic withholding capacity depends on how much water the dam can withhold relative to the river’s total flow. Small vs large dams are likely to have very different operations.
Coarse comparison: Binary downstream/upstream x pre/post collapses all variation into four cells, making it vulnerable to any compositional difference between TRB and domestic rivers

7.2 DOR Definition

The Degree of Regulation (DOR) is a standard hydrological measure (Lehner et al. 2011, Grill et al. 2019):

\[\text{DOR}_{it} = \frac{\text{Cumulative upstream reservoir capacity (MCM)}_i \text{ built by year } t}{\text{Mean annual streamflow (MCM/year)}_i}\]

DOR = 0: no upstream dams (upstream stations, or pre-dam observations)
DOR = 0.5: upstream dams can store half a year’s flow
DOR = 1: upstream dams can store an entire year’s flow
DOR > 1: cumulative storage exceeds annual flow (heavily regulated)

Key properties for this analysis:

Time-varying: Increases as new dams are built upstream; 0 before any upstream dam exists
Continuous: Captures dose-response (more regulation → more potential withholding)
Station-specific: Depends on the station’s location and upstream dam portfolio
Upstream stations: Generally have DOR = 0 or very low DOR (they are upstream of the treatment dam, so most regulation is downstream of them). However, upstream stations can have DOR > 0 if other dams exist further upstream. This occurs because the DOR computation requires volume_max (maximum reservoir capacity) from the GDAT database, and many smaller dams lack this field. The dams that do have volume_max and appear in the DOR calculation are likely the larger, more significant dams. Smaller upstream dams without capacity data are omitted, so measured upstream DOR understates the true degree of regulation.

Bias implications: If unmeasured upstream regulation exists, it means some stations classified as “low DOR” actually face moderate regulation. This would attenuate the estimated DOR effect: stations with DOR = 0 in the data are actually somewhat regulated, blurring the contrast between regulated and unregulated observations. The attenuation bias would work against finding a significant DOR x TRB coefficient, so the current estimates may be conservative. This also connects to the binned results (Section 7.6): the “Low” DOR bin (0–0.1) may contain a mix of genuinely low-regulation stations and stations where DOR is underestimated due to missing dam data, which could explain why the Low bin behaves differently from higher bins.

7.3 Why DOR over alternative continuous measures?

Two other continuous treatment measures were considered and rejected:

Total count of upstream dams: Simply counting the number of dams upstream of a station ignores scale. A station with 10 small run-of-river dams (e.g. $5 \times 10 = 50 MCM$) would receive a higher “treatment intensity” than a station with one massive dam (capacity: 5,000 MCM). Since strategic withholding depends on the capacity to store and release water, not just the total number of structures, dam count is a poor proxy for regulatory capability. DOR, by contrast, aggregates total storage capacity and normalizes by river flow, directly capturing how much of the river’s annual discharge can be controlled.
Inverse network distance to the nearest dam: Distance-based measures capture proximity but also ignore scale. A station 10 km downstream of a small weir would appear more “treated” than one 100 km downstream of a mega-dam, despite the latter facing far more regulatory impact. Additionally, inverse distance does not account for the cumulative effect of multiple dams upstream, whereas DOR naturally aggregates all upstream storage. Distance may matter for the timing of flow impacts (hydrological lag), but it does not capture the magnitude of regulation. I may later try to incorporate distance as an additional variable to capture timing effects, but DOR is the more fundamental measure of regulatory intensity.

DOR is the standard measure in the hydrology literature (Lehner et al. 2011, Grill et al. 2019) precisely because it combines both the storage capacity and the river’s natural flow volume into a single, physically meaningful ratio.

7.4 DOR Specification

Replace the binary DDD with a continuous treatment TWFE:

\[\log(Q)_{it} = \beta_1 \cdot \text{DOR}_{it} + \beta_2 \cdot \text{DOR}_{it} \times \text{Transboundary}_i + \gamma \mathbf{X}_{it} + \alpha_i + \delta_t + \varepsilon_{it}\]

Interpretation:

$\beta_1$: The average effect of an additional unit of DOR on log discharge. Sign is ambiguous a priori — dams store water (reducing flow in some periods) but also release it (increasing flow in dry periods). The net effect depends on the timing of storage and release relative to measurement.
$\beta_2 < 0$: Regulation reduces TRB discharge more than domestic → consistent with strategic withholding (TRB dam operators store more / release less compared to domestic dam operators at the same DOR level)
$\beta_2 = 0$: No evidence of strategic behavior; TRB and domestic dams have the same downstream discharge impact per unit of regulation

7.5 Main DOR Results

The three DOR models are:

Column	Model	Description
(1) DOR Baseline	DOR + DOR x TRB	Station + year-month FE. DOR is time-varying and continuous. The interaction with transboundary is the key coefficient.
(2) DOR + Climate	Adds precipitation and temperature	Controls for weather-driven discharge variation. Same identification as (1) but with climate controls.
(3) Capacity (MCM)	Uses cumulative capacity (MCM) instead of DOR	Tests whether the result is driven by the DOR normalization. Capacity is the numerator of DOR (not divided by river flow).

# Main DOR regression
d1 <- feols(
  log_discharge ~ DOR + DOR:transboundary | station_id + year_month,
  data = panel_dor, cluster = ~station_id
)

# DOR with climate controls
panel_dor_clim <- panel_dor %>% filter(!is.na(precipitation_mm))
d2 <- feols(
  log_discharge ~ DOR + DOR:transboundary +
    precipitation_mm + temperature_c | station_id + year_month,
  data = panel_dor_clim, cluster = ~station_id
)

# Cumulative capacity (levels, not ratio)
d5 <- feols(
  log_discharge ~ cum_capacity_mcm + cum_capacity_mcm:transboundary |
    station_id + year_month,
  data = panel_dor, cluster = ~station_id
)

cm_dor <- c(
  "DOR"                              = "Degree of Regulation",
  "DOR:transboundary"                = "DOR x Transboundary",
  "cum_capacity_mcm"                 = "Cum. Capacity (MCM)",
  "cum_capacity_mcm:transboundary"   = "Capacity x TRB",
  "precipitation_mm"                 = "Precipitation (mm)",
  "temperature_c"                    = "Temperature (C)"
)

modelsummary(
  list("(1) DOR Baseline" = d1, "(2) DOR + Climate" = d2,
       "(3) Capacity (MCM)" = d5),
  coef_map = cm_dor,
  gof_map = c("nobs", "r.squared", "FE: station_id", "FE: year_month"),
  stars = c('*' = 0.1, '**' = 0.05, '***' = 0.01),
  title = "DOR (Continuous Treatment) Regression Results",
  notes = list("Station and year-month FE. SE clustered at station level.",
               "DOR = cumulative upstream capacity / mean annual flow."),
  escape = FALSE,
  output = "kableExtra"
) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE, font_size = 13)

DOR (Continuous Treatment) Regression Results
	(1) DOR Baseline	(2) DOR + Climate	(3) Capacity (MCM)
Degree of Regulation	0.029***	0.039**
	(0.010)	(0.018)
DOR x Transboundary	−0.022*	−0.037**
	(0.012)	(0.019)
Cum. Capacity (MCM)			0.000
			(0.000)
Capacity x TRB			0.000
			(0.000)
Precipitation (mm)		0.275***
		(0.007)
Temperature (C)		−0.011***
		(0.003)
Num.Obs.	533062	324461	533062
R2	0.810	0.839	0.810
FE: station_id	X	X	X
FE: year_month	X	X	X
* p < 0.1, p < 0.05, * p < 0.01

coef_dor   <- coef(d2)["DOR:transboundary"]
se_dor     <- se(d2)["DOR:transboundary"]
pv_dor     <- pvalue(d2)["DOR:transboundary"]

cat("Key result: DOR x Transboundary =", round(coef_dor, 4),
    " (SE =", round(se_dor, 4), ", p =", round(pv_dor, 4), ")\n")

## Key result: DOR x Transboundary = -0.0375  (SE = 0.0188 , p = 0.0459 )

cat("Interpretation: A 1-unit increase in DOR (capacity = 1 year's flow)\n")

## Interpretation: A 1-unit increase in DOR (capacity = 1 year's flow)

cat("  changes TRB discharge by an additional",
    round(100 * (exp(coef_dor) - 1), 2), "% vs domestic.\n")

##   changes TRB discharge by an additional -3.68 % vs domestic.

7.5.1 Interpretation

Degree of Regulation (main effect): The DOR coefficient is positive and significant in both columns (1) and (2) ($0.029^{***}$ and $0.039^{**}$). This means that, on average, more upstream regulation is associated with higher measured discharge. This is not paradoxical: heavily regulated rivers have dams releasing stored water, which can raise measured flow at downstream gauges relative to what the station-level mean would predict (recall that station FE absorb level differences). The positive sign likely reflects the timing of measurement relative to dam releases.

DOR x Transboundary (key coefficient): The interaction is negative and significant: $-0.022^*$ in the baseline and $-0.037^{**}$ with climate controls. This means that each additional unit of DOR reduces discharge at transboundary stations by 2–4% more than at domestic stations. This is consistent with the strategic withholding hypothesis: at the same level of regulation, TRB rivers experience more discharge reduction than domestic rivers, as if TRB dam operators store more or release less water.

Cumulative Capacity (column 3): When using raw capacity (MCM) instead of DOR, both the main effect and the TRB interaction round to 0.000, reflecting the massive scale difference (capacity is in MCM, not normalized). The near-zero coefficients suggest that the DOR normalization by river flow is necessary: it is the ratio of storage to flow, not absolute storage, that predicts discharge impacts.

7.5.2 Comparison with binary DDD

comparison <- tibble(
  Feature = c(
    "Treatment variable",
    "Sign of TRB interaction",
    "Consistent with theory?",
    "Statistical significance",
    "Pre-trend severity",
    "Identifies what?",
    "Power"
  ),
  `Binary DDD` = c(
    "0/1 (downstream x post)",
    "Positive",
    "Unexpected (predicts TRB increases flow)",
    "Significant but driven by pre-trend",
    "Severe (cannot be eliminated)",
    "Any dam effect, regardless of size",
    "Low (small and large dams pooled)"
  ),
  `DOR Specification` = c(
    "Continuous, time-varying DOR",
    "Negative",
    "Consistent (more regulation, more TRB withholding)",
    "Significant with climate controls",
    "Present in dose-response ES but different nature (positive, not negative)",
    "Dose-response: more capacity = more effect",
    "Higher (variation in DOR across stations and time)"
  )
)

kable(comparison,
      caption = "Binary DDD vs. DOR Specification Comparison") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE) %>%
  column_spec(3, bold = TRUE)

Binary DDD vs. DOR Specification Comparison
Feature	Binary DDD	DOR Specification
Treatment variable	0/1 (downstream x post)	Continuous, time-varying DOR
Sign of TRB interaction	Positive	Negative
Consistent with theory?	Unexpected (predicts TRB increases flow)	Consistent (more regulation, more TRB withholding)
Statistical significance	Significant but driven by pre-trend	Significant with climate controls
Pre-trend severity	Severe (cannot be eliminated)	Present in dose-response ES but different nature (positive, not negative)
Identifies what?	Any dam effect, regardless of size	Dose-response: more capacity = more effect
Power	Low (small and large dams pooled)	Higher (variation in DOR across stations and time)

7.6 DOR Distribution

dor_downstream <- panel_dor %>%
  filter(downstream == 1) %>%
  distinct(station_id, year, .keep_all = TRUE)

par(mfrow = c(1, 2))

hist(log(dor_downstream$DOR[dor_downstream$DOR > 0] + 0.001),
     breaks = 50, col = "steelblue", border = "white",
     main = "DOR Distribution (Downstream, Post-Dam)",
     xlab = "log(DOR)", ylab = "Frequency")
abline(v = log(c(0.1, 0.5, 1)), lty = 2, col = "red")
legend("topright", c("0.1", "0.5", "1.0"),
       lty = 2, col = "red", title = "DOR thresholds", cex = 0.8)

boxplot(DOR ~ transboundary,
        data = dor_downstream %>% filter(DOR > 0),
        names = c("Domestic", "Transboundary"),
        col = c("steelblue", "firebrick"),
        main = "DOR by River Type",
        ylab = "DOR", outline = FALSE)

par(mfrow = c(1, 1))

7.7 DOR Bins and Coefficient Plot (Non-linear Specification)

d3 <- feols(
  log_discharge ~ i(DOR_bin, ref = "None") +
    i(DOR_bin, transboundary, ref = "None") | station_id + year_month,
  data = panel_dor, cluster = ~station_id
)

modelsummary(
  list("DOR Bins (Non-Linear)" = d3),
  gof_map = c("nobs", "r.squared", "FE: station_id", "FE: year_month"),
  stars = c('*' = 0.1, '**' = 0.05, '***' = 0.01),
  title = "DOR Binned Specification",
  notes = list("Bins: None (ref), Low (0-0.1), Medium (0.1-0.5), High (0.5-1), Very High (>1).",
               "Station and year-month FE. SE clustered at station level."),
  escape = FALSE,
  output = "kableExtra"
) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE, font_size = 13)

DOR Binned Specification
	DOR Bins (Non-Linear)
DOR_bin = Low	−0.042
	(0.036)
DOR_bin = Medium	−0.073*
	(0.043)
DOR_bin = High	−0.006
	(0.054)
DOR_bin = Very High	0.037
	(0.069)
DOR_bin = Low × transboundary	−0.234***
	(0.085)
DOR_bin = Medium × transboundary	0.091
	(0.081)
DOR_bin = High × transboundary	0.097
	(0.184)
DOR_bin = Very High × transboundary	0.174
	(0.121)
Num.Obs.	533062
R2	0.810
FE: station_id	X
FE: year_month	X
* p < 0.1, p < 0.05, * p < 0.01

# Extract d3 coefficients
cc3 <- coef(d3); se3 <- se(d3)

# Main bin effects
bins <- c("Low", "Medium", "High", "Very High")
main_idx <- paste0("DOR_bin::", bins)
trb_idx  <- paste0("DOR_bin::", bins, ":transboundary")

main_est <- cc3[main_idx]; main_se <- se3[main_idx]
trb_est  <- cc3[trb_idx];  trb_se  <- se3[trb_idx]

par(mfrow = c(1, 2), mar = c(7, 5, 4, 2))

# Left: Main effects
x_pos <- 1:4
plot(x_pos, main_est, pch = 19, col = "steelblue", cex = 1.3,
     xlim = c(0.5, 4.5), ylim = range(c(main_est - 1.96*main_se, main_est + 1.96*main_se)),
     xaxt = "n", main = "DOR Bin Effects on Log(Discharge)\n(ref = None/Upstream)",
     xlab = "", ylab = "Coefficient")
axis(1, at = x_pos, labels = bins, las = 2)
segments(x_pos, main_est - 1.96*main_se, x_pos, main_est + 1.96*main_se,
         col = "steelblue", lwd = 2)
abline(h = 0, lty = 2); grid(col = "grey90")

# Right: TRB interactions
plot(x_pos, trb_est, pch = 19, col = "firebrick", cex = 1.3,
     xlim = c(0.5, 4.5), ylim = range(c(trb_est - 1.96*trb_se, trb_est + 1.96*trb_se)),
     xaxt = "n", main = "DOR Bin x TRB Interaction\n(Additional TRB Effect, ref = None)",
     xlab = "", ylab = "Coefficient")
axis(1, at = x_pos, labels = bins, las = 2)
segments(x_pos, trb_est - 1.96*trb_se, x_pos, trb_est + 1.96*trb_se,
         col = "firebrick", lwd = 2)
abline(h = 0, lty = 2); grid(col = "grey90")

par(mfrow = c(1, 1))

7.7.1 Interpretation

Main DOR bin effects (left panel, top table rows): The overall effect of regulation on discharge is small and concentrated at low-to-medium levels. Low and Medium DOR bins show negative coefficients ($-0.042$ and $-0.073^*$), consistent with moderate regulation reducing downstream flow. High and Very High bins are near zero or slightly positive, suggesting that at very high regulation levels, the discharge impact may plateau or reverse (e.g., heavily regulated rivers releasing water for irrigation or hydropower).

DOR bin x TRB interactions (right panel, bottom table rows): These are the key coefficients for strategic withholding. The coefficient plot makes the non-monotonic pattern visually striking:

Low DOR x TRB = $-0.234^{***}$ (SE = 0.085): The largest and only statistically significant interaction. At low levels of regulation (DOR 0–0.1), TRB stations experience substantially more discharge reduction than domestic stations.
Medium, High, Very High x TRB: Positive but insignificant (0.091, 0.097, 0.174). The confidence intervals on the higher bins are wide enough to include substantial negative effects, so I cannot rule out withholding at higher DOR levels, but the point estimates do not support a monotonic dose-response story.

What does this pattern mean? The concentration of the TRB effect in the Low DOR bin is puzzling under a simple “more regulation = more withholding” story. Several possible explanations:

Measurement issue in the Low bin: As discussed above, the Low DOR bin may contain stations where upstream regulation is underestimated due to missing dam capacity data. If these are actually moderately regulated stations misclassified as low-DOR, the “Low DOR x TRB” coefficient may be capturing a different margin than intended.
Threshold effect: Strategic behavior may be most detectable when a dam first comes online (small initial DOR), i.e. the marginal impact of going from no regulation to some regulation is large. At higher DOR levels, downstream stations may have already adapted (alternative water sources, adjusted agriculture), muting the measurable effect.
Composition: Low-DOR TRB stations may be on rivers where the dam is small relative to the river but strategically important (e.g., a diversion dam on an international border), while high-DOR TRB stations may be on rivers with complex multi-dam systems where the bilateral dynamic is diluted.
Power: The Low bin likely has the most observations (many dams produce small DOR increases), giving it more statistical power. The higher bins have fewer observations and correspondingly wider confidence intervals.
Lower scrutiny: Low-DOR dams are smaller operations that may attract less attention from downstream countries. Strategic withholding from a large, highly visible dam risks diplomatic consequences, whereas small diversions from a minor dam may go unnoticed or uncontested. If downstream countries monitor and respond to large dams more aggressively (through treaties, complaints, or retaliatory action), strategic behavior may be easier to sustain, and therefore more prevalent, at low regulation levels.

This non-monotonic pattern warrants further investigation and motivates the split-sample event studies by DOR group below.

7.8 DOR Dose-Response Event Study

The dose-response event study interacts station-level $\Delta\text{DOR}$ (the DOR increase caused by the treatment dam) with event-time dummies. For upstream stations, $\Delta\text{DOR} = 0$ by construction (the treatment dam is downstream of them). For downstream stations, $\Delta\text{DOR} = \text{treatment capacity} / \text{mean annual flow}$.

Two models:

D-ES1 (Main effect): $\log(Q)_{it} = \sum_k \beta_k \cdot \mathbb{1}[t = k] \cdot \Delta\text{DOR}_i + \alpha_i + \delta_t + \varepsilon_{it}$
D-ES2 (TRB interaction): Adds $\sum_k \gamma_k \cdot \mathbb{1}[t = k] \cdot \Delta\text{DOR}_i \cdot \text{TRB}_i$ to capture the additional TRB effect.

# Compute delta_DOR per station
station_flow_rmd <- panel %>%
  group_by(station_id) %>%
  summarise(
    mean_discharge_m3s = mean(exp(log_discharge), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(annual_flow_mcm = mean_discharge_m3s * 365.25 * 86400 / 1e6)

delta_dor_station <- full_treatment %>%
  select(station_id, downstream, treatment_capacity) %>%
  left_join(station_flow_rmd %>% select(station_id, annual_flow_mcm),
            by = "station_id") %>%
  mutate(
    delta_DOR = case_when(
      downstream == 0 ~ 0,
      downstream == 1 & annual_flow_mcm > 0 ~ treatment_capacity / annual_flow_mcm,
      TRUE ~ NA_real_
    )
  )

# Build event study data
event_dor <- panel_dor %>%
  filter(abs(years_to_treatment) <= 10) %>%
  mutate(rel_year = years_to_treatment) %>%
  left_join(delta_dor_station %>% select(station_id, delta_DOR),
            by = "station_id") %>%
  filter(!is.na(delta_DOR))

# D-ES1: Main effect
d_es1 <- feols(
  log_discharge ~ i(rel_year, delta_DOR, ref = -1) |
    station_id + year_month,
  data = event_dor, cluster = ~station_id
)

# D-ES2: TRB interaction
d_es2 <- feols(
  log_discharge ~ i(rel_year, delta_DOR, ref = -1) +
    i(rel_year, I(delta_DOR * transboundary), ref = -1) |
    station_id + year_month,
  data = event_dor, cluster = ~station_id
)

# Extract coefficients for plotting
extract_es_coefs <- function(model, pattern) {
  cc <- coef(model)
  ss <- se(model)
  idx <- grepl(pattern, names(cc))
  est <- cc[idx]
  se_vals <- ss[idx]
  yrs <- as.numeric(regmatches(names(est), regexpr("-?\\d+", names(est))))
  df <- data.frame(yr = c(yrs, -1), est = c(est, 0),
                   lo = c(est - 1.96 * se_vals, 0),
                   hi = c(est + 1.96 * se_vals, 0)) %>% arrange(yr)
  df
}

main_coefs <- extract_es_coefs(d_es1, "delta_DOR")
trb_coefs  <- extract_es_coefs(d_es2, "I\\(delta_DOR \\* transboundary\\)")

par(mfrow = c(1, 2), mar = c(5, 5, 4, 2))

# Left panel: Main effect
plot(main_coefs$yr, main_coefs$est, pch = 19, col = "steelblue", cex = 1.1,
     xlim = c(-10, 10), ylim = range(c(main_coefs$lo, main_coefs$hi)),
     main = expression("DOR Dose Event Study: Main Effect"),
     xlab = "Years Relative to Dam Construction",
     ylab = expression("Effect of " * Delta * "DOR on Log(Discharge)"))
segments(main_coefs$yr, main_coefs$lo, main_coefs$yr, main_coefs$hi,
         col = "steelblue", lwd = 1.5)
abline(h = 0, lty = 3); abline(v = -0.5, lty = 2); grid(col = "grey90")

# Right panel: TRB interaction
plot(trb_coefs$yr, trb_coefs$est, pch = 19, col = "firebrick", cex = 1.1,
     xlim = c(-10, 10), ylim = range(c(trb_coefs$lo, trb_coefs$hi)),
     main = expression("DOR Dose Event Study: " * Delta * "DOR x TRB"),
     xlab = "Years Relative to Dam Construction",
     ylab = expression("Additional TRB Effect of " * Delta * "DOR"))
segments(trb_coefs$yr, trb_coefs$lo, trb_coefs$yr, trb_coefs$hi,
         col = "firebrick", lwd = 1.5)
abline(h = 0, lty = 3); abline(v = -0.5, lty = 2); grid(col = "grey90")

par(mfrow = c(1, 1))

7.8.1 Interpretation

Main effect (left panel): The pre-treatment coefficients hover around zero with tight confidence intervals and don’t show a strong pre-trend in the main DOR dose effect, which is encouraging. Around $t=0$, coefficients remain near zero and then drift slightly negative by $t = +8$ to $+10$ (reaching $\sim -0.10$ to $-0.15$), suggesting a gradual discharge reduction as regulation takes hold. The slow onset is consistent with reservoirs taking years to fill and regulation building incrementally. Overall, the main effect is well-behaved and does not raise serious pre-trend concerns.

TRB interaction (right panel): The $\Delta\text{DOR} \times \text{TRB}$ interaction coefficients are positive in the pre-period ($\sim 0.3$–$0.5$) with wide confidence intervals, indicating that transboundary stations with higher future DOR already had somewhat different discharge dynamics before dam construction. Post-treatment, the coefficients rise further ($\sim 1.0$–$2.5$), but the pre-existing positive level complicates attribution. The wide confidence intervals throughout (many individual coefficients include zero) reflect the limited number of TRB stations with high $\Delta$DOR.

Implications for the DOR specification: The main effect panel is reassuring: the pre-period is approximately flat, suggesting that the DOR dose variable itself is not contaminated by pre-trends. The TRB interaction is more concerning: positive pre-treatment coefficients suggest that the $\Delta$DOR x TRB interaction captures some pre-existing differences between TRB and domestic stations. Importantly, the pre-treatment interaction coefficients are positive, while the static DOR x TRB coefficient from d2 is negative. This sign discrepancy likely arises because the event study uses $\Delta$DOR (the marginal DOR change from the treatment dam) while the static regression uses total cumulative DOR, which captures different variation. The event study result does not necessarily invalidate the static result, but it highlights that the identification is coming from different sources of variation in the two specifications.

7.9 Split-Sample Event Study by DOR Group

I split downstream stations into three groups based on their post-treatment DOR level (Low: 0–0.1, Medium: 0.1–0.5, High: >0.5) and run a DDD event study within each group (using all upstream stations as controls).

# Classify downstream stations by post-treatment DOR
post_dor_class <- panel_dor %>%
  filter(downstream == 1, post == 1) %>%
  group_by(station_id) %>%
  summarise(mean_post_DOR = mean(DOR, na.rm = TRUE), .groups = "drop") %>%
  mutate(dor_group = case_when(
    mean_post_DOR <= 0.1 ~ "Low DOR (0-0.1)",
    mean_post_DOR <= 0.5 ~ "Medium DOR (0.1-0.5)",
    TRUE                 ~ "High DOR (>0.5)"
  ))

# Build event data with DOR group
event_dor_grouped <- panel_dor %>%
  filter(abs(years_to_treatment) <= 10) %>%
  mutate(rel_year = years_to_treatment) %>%
  left_join(post_dor_class %>% select(station_id, dor_group), by = "station_id")

# Upstream stations get included in all subsamples as controls
upstream_data <- event_dor_grouped %>% filter(downstream == 0)

groups <- c("Low DOR (0-0.1)", "Medium DOR (0.1-0.5)", "High DOR (>0.5)")
colors <- c("steelblue", "orange", "firebrick")

par(mfrow = c(1, 3), mar = c(5, 5, 4, 2))

for (g in seq_along(groups)) {
  sub_data <- bind_rows(
    event_dor_grouped %>% filter(dor_group == groups[g]),
    upstream_data
  )

  m_sub <- tryCatch(
    feols(log_discharge ~ i(rel_year, downstream, ref = -1) +
            i(rel_year, transboundary, ref = -1) +
            i(rel_year, I(downstream * transboundary), ref = -1) |
            station_id + year_month,
          data = sub_data, cluster = ~station_id),
    error = function(e) NULL
  )

  if (!is.null(m_sub)) {
    cc <- coef(m_sub); ss <- se(m_sub)
    idx <- grepl("I\\(downstream", names(cc))
    if (sum(idx) == 0) idx <- grepl("downstream", names(cc)) & grepl("transboundary", names(cc))
    est <- cc[idx]; se_v <- ss[idx]
    yrs <- as.numeric(regmatches(names(est), regexpr("-?\\d+", names(est))))
    df <- data.frame(yr = c(yrs, -1), est = c(est, 0),
                     lo = c(est - 1.96 * se_v, 0),
                     hi = c(est + 1.96 * se_v, 0)) %>% arrange(yr)

    plot(df$yr, df$est, pch = 19, col = colors[g], cex = 1.1,
         xlim = c(-10, 10), ylim = range(c(df$lo, df$hi)),
         main = groups[g],
         xlab = "Years Relative to Dam Construction",
         ylab = "DDD Coef (Down x TRB)")
    segments(df$yr, df$lo, df$yr, df$hi, col = colors[g], lwd = 1.5)
    abline(h = 0, lty = 3); abline(v = -0.5, lty = 2); grid(col = "grey90")
  }
}

par(mfrow = c(1, 1))

7.9.1 Interpretation

Low DOR (0–0.1): The pre-treatment coefficients are negative ($\sim -0.2$ to $-0.3$) and trend upward toward $t=-1$, a pattern similar to the full-sample binary DDD. Post-treatment, coefficients bounce around zero or become slightly positive. The pre-trend suggests the Low DOR group inherits the identification problems of the binary specification.

Medium DOR (0.1–0.5): Very noisy with wide confidence intervals, reflecting the smaller sample. A large negative spike at $t=-10$ is followed by positive values. No clear pre-trend or treatment effect can be identified.

High DOR (>0.5): Pre-treatment coefficients trend from near zero upward to positive values. Post-treatment, coefficients are consistently positive ($\sim 0.2$–$0.4$). This is unexpected under the withholding hypothesis (which predicts negative TRB differentials for highly regulated rivers) and mirrors the sign puzzle of the binary DDD.

Overall pattern: The split-sample analysis does not reveal a clean dose-response pattern. If strategic withholding intensifies with regulation, high-DOR TRB stations should show the most negative coefficients, but instead they show positive coefficients. This complicates the interpretation of the static DOR x TRB result.

7.10 Leave-One-Out Robustness

To assess whether the DOR x TRB result is driven by a single region or country, I re-run the baseline DOR regression (d2) after dropping each continent or country one at a time.

# Try to load saved LOO results; if not available, compute
loo_cont_path <- file.path(out_dir, "Tables", "dor_loo_continent.csv")
loo_ctry_path <- file.path(out_dir, "Tables", "dor_loo_country.csv")

# Also check relative path in case out_dir differs
if (!file.exists(loo_cont_path)) {
  loo_cont_path <- "Output/Tables/dor_loo_continent.csv"
  loo_ctry_path <- "Output/Tables/dor_loo_country.csv"
}

has_loo <- file.exists(loo_cont_path) && file.exists(loo_ctry_path)

if (!has_loo && has_dor) {
  # Compute LOO if CSVs not available
  cat("Computing leave-one-out (CSVs not found, running from panel_dor)...\n")

  panel_dor_loo <- panel_dor %>% filter(!is.na(precipitation_mm))

  if ("continent" %in% names(panel_dor_loo)) {
    continents <- unique(na.omit(panel_dor_loo$continent))
    loo_cont_list <- list()

    d2_full <- feols(log_discharge ~ DOR + DOR:transboundary +
                       precipitation_mm + temperature_c | station_id + year_month,
                     data = panel_dor_loo, cluster = ~station_id)
    loo_cont_list[["None (full sample)"]] <- data.frame(
      dropped = "None (full sample)",
      coef_dor_trb = coef(d2_full)["DOR:transboundary"],
      se_dor_trb = se(d2_full)["DOR:transboundary"],
      n_stations = n_distinct(panel_dor_loo$station_id)
    )

    for (cont in continents) {
      sub <- panel_dor_loo %>% filter(continent != cont)
      m <- tryCatch(
        feols(log_discharge ~ DOR + DOR:transboundary +
                precipitation_mm + temperature_c | station_id + year_month,
              data = sub, cluster = ~station_id),
        error = function(e) NULL
      )
      if (!is.null(m)) {
        loo_cont_list[[cont]] <- data.frame(
          dropped = cont, coef_dor_trb = coef(m)["DOR:transboundary"],
          se_dor_trb = se(m)["DOR:transboundary"],
          n_stations = n_distinct(sub$station_id)
        )
      }
    }
    loo_continent_results <- bind_rows(loo_cont_list)
  }

  if ("station_country" %in% names(panel_dor_loo)) {
    country_counts <- panel_dor_loo %>%
      distinct(station_id, station_country) %>%
      count(station_country, sort = TRUE) %>%
      filter(n >= 5)

    loo_ctry_list <- list()
    for (ctry in country_counts$station_country) {
      sub <- panel_dor_loo %>% filter(station_country != ctry)
      m <- tryCatch(
        feols(log_discharge ~ DOR + DOR:transboundary +
                precipitation_mm + temperature_c | station_id + year_month,
              data = sub, cluster = ~station_id),
        error = function(e) NULL
      )
      if (!is.null(m)) {
        loo_ctry_list[[ctry]] <- data.frame(
          dropped = ctry, coef_dor_trb = coef(m)["DOR:transboundary"],
          se_dor_trb = se(m)["DOR:transboundary"],
          n_stations = n_distinct(sub$station_id)
        )
      }
    }
    loo_country_results <- bind_rows(loo_ctry_list)
    baseline_coef <- coef(d2_full)["DOR:transboundary"]
    loo_country_results <- loo_country_results %>%
      mutate(deviation = abs(coef_dor_trb - baseline_coef)) %>%
      arrange(desc(deviation)) %>%
      head(15)
  }

  has_loo <- exists("loo_continent_results")
} else if (has_loo) {
  loo_continent_results <- read_csv(loo_cont_path, show_col_types = FALSE)
  loo_country_results   <- read_csv(loo_ctry_path, show_col_types = FALSE)
}

if (has_loo) {
  par(mfrow = c(1, 2), mar = c(8, 5, 4, 2))

  # Continent LOO
  loo_continent_results <- loo_continent_results[order(loo_continent_results$coef_dor_trb), ]
  x_pos <- seq_len(nrow(loo_continent_results))

  cols <- ifelse(loo_continent_results$dropped == "None (full sample)", "firebrick", "steelblue")

  plot(x_pos, loo_continent_results$coef_dor_trb, pch = 19, col = cols, cex = 1.3,
       xaxt = "n",
       ylim = range(c(loo_continent_results$coef_dor_trb - 1.96 * loo_continent_results$se_dor_trb,
                       loo_continent_results$coef_dor_trb + 1.96 * loo_continent_results$se_dor_trb)),
       main = "Leave-One-Continent-Out:\nDOR x TRB Coefficient",
       xlab = "", ylab = "DOR x TRB Coefficient")
  axis(1, at = x_pos, labels = loo_continent_results$dropped, las = 2, cex.axis = 0.8)
  segments(x_pos, loo_continent_results$coef_dor_trb - 1.96 * loo_continent_results$se_dor_trb,
           x_pos, loo_continent_results$coef_dor_trb + 1.96 * loo_continent_results$se_dor_trb,
           col = cols, lwd = 1.5)
  abline(h = 0, lty = 2); grid(col = "grey90")
  abline(h = loo_continent_results$coef_dor_trb[loo_continent_results$dropped == "None (full sample)"],
         lty = 3, col = "firebrick")

  # Country LOO — keep top 15 most influential
  if (exists("loo_country_results") && nrow(loo_country_results) > 0) {
    if (nrow(loo_country_results) > 15) {
      full_est <- loo_continent_results$coef_dor_trb[
        loo_continent_results$dropped == "None (full sample)"]
      loo_country_results$deviation <- abs(loo_country_results$coef_dor_trb - full_est)
      loo_country_results <- loo_country_results[order(-loo_country_results$deviation), ]
      loo_country_results <- head(loo_country_results, 15)
    }
    loo_country_results <- loo_country_results[order(loo_country_results$coef_dor_trb), ]
    x_pos2 <- seq_len(nrow(loo_country_results))

    plot(x_pos2, loo_country_results$coef_dor_trb, pch = 19, col = "steelblue", cex = 1.1,
         xaxt = "n",
         ylim = range(c(loo_country_results$coef_dor_trb - 1.96 * loo_country_results$se_dor_trb,
                         loo_country_results$coef_dor_trb + 1.96 * loo_country_results$se_dor_trb)),
         main = "Leave-One-Country-Out:\nDOR x TRB (15 Most Influential)",
         xlab = "", ylab = "DOR x TRB Coefficient")
    axis(1, at = x_pos2, labels = loo_country_results$dropped, las = 2, cex.axis = 0.7)
    segments(x_pos2, loo_country_results$coef_dor_trb - 1.96 * loo_country_results$se_dor_trb,
             x_pos2, loo_country_results$coef_dor_trb + 1.96 * loo_country_results$se_dor_trb,
             col = "steelblue", lwd = 1.5)
    abline(h = 0, lty = 2); grid(col = "grey90")

    # Full sample reference line
    if (exists("d2")) {
      abline(h = coef(d2)["DOR:transboundary"], lty = 3, col = "firebrick")
      legend("bottomright", "Full sample", lty = 3, col = "firebrick", cex = 0.8)
    }
  }

  par(mfrow = c(1, 1))
}

7.10.1 Interpretation

Leave-one-continent-out (left): The full-sample DOR x TRB coefficient (red) is modestly negative ($\sim -0.03$). Dropping Africa makes it substantially more negative ($\sim -0.26$), suggesting Africa contains TRB stations that attenuate the overall negative effect (i.e., African TRB rivers do not show withholding, and removing them strengthens the result for the remaining sample). Dropping North America moves the coefficient toward zero ($\sim -0.02$). Asia and South America are close to the full-sample estimate.

Leave-one-country-out (right): Most countries cluster around the full-sample estimate ($\sim -0.03$ to $-0.05$). Dropping Zimbabwe or South Africa moves the coefficient toward zero or slightly positive, indicating these countries contribute observations that drive the negative result. Dropping the United States makes it more negative. The result is not dominated by any single country, though a handful of African countries are influential.

Overall assessment: The DOR x TRB result is moderately robust across countries (no single country’s removal eliminates it) but it is pretty sensitive to continental composition as African TRB rivers appear to behave differently from the global pattern, potentially reflecting different institutional environments or dam operating practices.

7.11 DOR Time Path

This descriptive plot shows how mean DOR has evolved over calendar time for downstream stations, separately for TRB and domestic rivers.

dor_time <- panel_dor %>%
  filter(downstream == 1, DOR > 0) %>%
  mutate(river_type = ifelse(transboundary == 1, "TRB", "Domestic")) %>%
  group_by(river_type, year) %>%
  summarise(
    mean_DOR = mean(DOR, na.rm = TRUE),
    p75_DOR  = quantile(DOR, 0.75, na.rm = TRUE),
    n = n(),
    .groups = "drop"
  ) %>%
  filter(n >= 10)  # drop years with very few obs

plot(NULL, xlim = range(dor_time$year), ylim = c(0, max(dor_time$p75_DOR, na.rm = TRUE) * 1.1),
     main = "Mean DOR Over Time: TRB vs Domestic (Downstream Stations)",
     xlab = "Year", ylab = "Mean Degree of Regulation (DOR)")
grid(col = "grey90")

# Mean lines
lines(dor_time$year[dor_time$river_type == "TRB"],
      dor_time$mean_DOR[dor_time$river_type == "TRB"],
      col = "firebrick", lwd = 2.5)
lines(dor_time$year[dor_time$river_type == "Domestic"],
      dor_time$mean_DOR[dor_time$river_type == "Domestic"],
      col = "steelblue", lwd = 2.5)

# 75th percentile lines
lines(dor_time$year[dor_time$river_type == "TRB"],
      dor_time$p75_DOR[dor_time$river_type == "TRB"],
      col = "firebrick", lwd = 1.5, lty = 2)
lines(dor_time$year[dor_time$river_type == "Domestic"],
      dor_time$p75_DOR[dor_time$river_type == "Domestic"],
      col = "steelblue", lwd = 1.5, lty = 2)

legend("topleft",
       c("TRB (mean)", "Domestic (mean)", "TRB (75th pct)", "Domestic (75th pct)"),
       col = c("firebrick", "steelblue", "firebrick", "steelblue"),
       lwd = c(2.5, 2.5, 1.5, 1.5), lty = c(1, 1, 2, 2), cex = 0.8, bg = "white")

7.11.1 Interpretation

Domestic rivers (blue) became regulated earlier and reached higher DOR levels than transboundary rivers (red). By the 1980s, mean domestic DOR exceeded 1.0 (full year’s flow stored), while mean TRB DOR peaked around 0.5–0.7. This has important implications:

Domestic rivers are more heavily regulated overall. The DOR x TRB interaction captures the differential effect of regulation on TRB vs. domestic rivers. Since domestic rivers face higher baseline DOR, the TRB interaction is estimated conditional on domestic rivers already being heavily regulated.
Historical timing differs. Domestic dam-building peaked earlier (1960s–1980s) than TRB dam-building. This means the pre-treatment period for TRB dams overlaps with a period of active domestic dam construction, potentially confounding the comparison.
The late spike and decline in TRB DOR (the red series rising sharply then falling after ~2000) is a sample composition artifact, not a real change in regulation. DOR for any individual station can only stay flat or increase (dams are almost never decommissioned and I don’t think I can observe this in my data anyway), so the aggregate decline is implausible if the station set were constant. What happens is that many water gauges dropped out of my sample in the 2000s–2010s (possibly due to funding cuts, equipment failure, or countries ceasing data submission). If the highly regulated TRB stations that drove the peak disproportionately drop out of the panel, while less-regulated TRB stations continue reporting, the mean DOR among surviving stations falls mechanically. The dramatic spike in the TRB 75th percentile (dashed red) around 2010–2020 reinforces this: a few very high-DOR stations entering or persisting in a shrinking sample create large swings in the summary statistics. This means the late portions of the time path should be interpreted with caution, as they reflect who is still reporting rather than changes in actual river regulation.

7.12 DOR Key Takeaways

The DOR analysis yields three central findings and two important caveats:

Findings:

The static DOR x TRB coefficient is the paper’s strongest result. At $-0.038^*$ with climate controls, it has the correct sign for strategic withholding and survives basic robustness checks. Each additional unit of DOR (capacity equal to one year’s flow) reduces transboundary discharge by $\sim 3.7$ more than domestic. This is economically meaningful: for a transboundary river with mean flow of 500 MCM/year, a dam storing half the annual flow (DOR = 0.5) implies an additional $$1.8% discharge reduction relative to a comparable domestic dam.
The non-monotonic binned pattern is the most interesting finding. The entire TRB effect is concentrated in the Low DOR bin ($-0.234^{***}$), with higher bins insignificant. This is difficult to reconcile with a simple dose-response story but consistent with several mechanisms: threshold effects at initial regulation, lower international scrutiny of small dams, and measurement issues in the Low bin. This finding might warrant further investigation and may could be a really interesting results for the paper.
Leave-one-out analysis provides moderate comfort that the result is not an artifact of a single country or corridor, though African TRB rivers behave differently from the global pattern and may warrant a separate analysis (though there are also few corridors dominating the results in Africa, so I could later investigate a few case studies)/

Caveats:

The dose-response event study reveals pre-existing differences in the $\Delta$DOR x TRB interaction. While the nature of this pre-trend differs from the binary DDD (positive rather than negative pre-treatment coefficients), it complicates the causal interpretation of the static result.
The split-sample event study does not show a clean dose-response pattern. High-DOR TRB stations show positive coefficients, opposite to the withholding prediction. The static and dynamic results tell partially conflicting stories, and resolving this tension is the key challenge for the paper.

8 Seasonal Decomposition

If dams enable strategic withholding, the effect should vary by season: dams store water during high-flow periods and release during low-flow periods, and the incentive to withhold depends on when water is scarce downstream. I decompose the DDD by season using three complementary definitions computed at the station level.

Season definitions:

Wet / Dry: Station-specific precipitation median split (each station’s months classified by whether precipitation is above or below that station’s median).
High / Low flow: Station-specific discharge median split (above or below station median discharge_m3s).
Warm / Cold: Hemisphere-aware calendar split (Apr–Sep vs. Oct–Mar in Northern Hemisphere; reversed in Southern).

# Compute season indicators (not saved in panel_ddd_v2.rds)
panel <- panel %>%
  group_by(station_id) %>%
  mutate(
    precip_median_stn = median(precipitation_mm, na.rm = TRUE),
    wet_month  = if_else(has_climate & precipitation_mm >= precip_median_stn, 1L, 0L),
    dry_month  = if_else(has_climate & precipitation_mm <  precip_median_stn, 1L, 0L),
    q_median_stn = median(discharge_m3s, na.rm = TRUE),
    high_flow  = if_else(discharge_m3s >= q_median_stn, 1L, 0L),
    low_flow   = if_else(discharge_m3s <  q_median_stn, 1L, 0L)
  ) %>%
  ungroup()

# Hemisphere-aware warm/cold
panel <- panel %>%
  mutate(
    warm_season = case_when(
      lat >= 0 & month %in% 4:9  ~ 1L,
      lat <  0 & month %in% c(10:12, 1:3) ~ 1L,
      TRUE ~ 0L
    ),
    cold_season = 1L - warm_season
  )

# Seasonality strength: precipitation CV
stn_cv <- panel %>%
  filter(has_climate) %>%
  group_by(station_id) %>%
  summarise(
    precip_cv = sd(precipitation_mm, na.rm = TRUE) /
                mean(precipitation_mm, na.rm = TRUE),
    .groups = "drop"
  )

panel <- panel %>%
  left_join(stn_cv, by = "station_id") %>%
  mutate(strong_seasonality = if_else(!is.na(precip_cv) & precip_cv > 0.7, 1L, 0L))

8.1 Overall seasonal DDD

# Helper: run DDD on a subset with tryCatch
run_ddd <- function(dat, label = "") {
  tryCatch(
    feols(log_discharge ~ downstream * transboundary * post |
            station_id + year_month, data = dat, cluster = ~station_id),
    error = function(e) { message(label, " failed: ", e$message); NULL }
  )
}

m_all  <- run_ddd(panel, "All")
m_wet  <- run_ddd(panel %>% filter(wet_month == 1),  "Wet")
m_dry  <- run_ddd(panel %>% filter(dry_month == 1),  "Dry")
m_hi   <- run_ddd(panel %>% filter(high_flow == 1),  "High flow")
m_lo   <- run_ddd(panel %>% filter(low_flow == 1),   "Low flow")
m_warm <- run_ddd(panel %>% filter(warm_season == 1), "Warm")
m_cold <- run_ddd(panel %>% filter(cold_season == 1), "Cold")

seasonal_models <- list(
  "(1) All" = m_all, "(2) Wet" = m_wet, "(3) Dry" = m_dry,
  "(4) High Flow" = m_hi, "(5) Low Flow" = m_lo,
  "(6) Warm" = m_warm, "(7) Cold" = m_cold
)

modelsummary(
  seasonal_models,
  coef_map = c("downstream:transboundary:post" = "Down x TRB x Post (DDD)"),
  gof_map = c("nobs", "r.squared"),
  stars = c('*' = 0.1, '**' = 0.05, '***' = 0.01),
  title = "DDD by Season",
  notes = "Station and year-month FE. SE clustered at station level."
)

DDD by Season
	(1) All	(2) Wet	(3) Dry	(4) High Flow	(5) Low Flow	(6) Warm	(7) Cold
* p < 0.1, p < 0.05, * p < 0.01
Station and year-month FE. SE clustered at station level.
Down x TRB x Post (DDD)	0.391***	0.249***	0.315***	0.043	0.319***	0.519***	0.691***
	(0.089)	(0.080)	(0.106)	(0.039)	(0.100)	(0.090)	(0.112)
Num.Obs.	533062	162476	161985	266909	266153	272068	260994
R2	0.811	0.834	0.865	0.938	0.914	0.826	0.869

The overall seasonal split shows where the aggregate DDD concentrates. But this masks important heterogeneity by dam purpose.

8.2 Hydro vs. non-hydro dams

Hydropower and irrigation/water-supply dams operate on fundamentally different seasonal schedules: hydropower dams store water during peak flow to generate electricity on demand, while irrigation dams store water for release during growing seasons. If the DDD captures strategic withholding, the seasonal pattern should differ by dam type.

panel_hydro    <- panel %>% filter(hydropower == 1)
panel_nonhydro <- panel %>% filter(hydropower == 0)

# Extract compact coefficient string
fmt_coef <- function(m) {
  if (is.null(m)) return("---")
  cf <- coef(m)["downstream:transboundary:post"]
  se <- sqrt(vcov(m)["downstream:transboundary:post",
                      "downstream:transboundary:post"])
  pv <- 2 * pnorm(-abs(cf / se))
  stars <- if (pv < 0.01) "***" else if (pv < 0.05) "**" else
           if (pv < 0.1) "*" else ""
  sprintf("%.3f (%.3f)%s", cf, se, stars)
}

seasons <- c("All", "Wet", "Dry", "High Flow", "Low Flow", "Warm", "Cold")
hydro_coefs <- nonhydro_coefs <- character(7)

for (i in seq_along(seasons)) {
  s <- seasons[i]
  sub_h <- switch(s,
    "All"       = panel_hydro,
    "Wet"       = panel_hydro %>% filter(wet_month == 1),
    "Dry"       = panel_hydro %>% filter(dry_month == 1),
    "High Flow" = panel_hydro %>% filter(high_flow == 1),
    "Low Flow"  = panel_hydro %>% filter(low_flow == 1),
    "Warm"      = panel_hydro %>% filter(warm_season == 1),
    "Cold"      = panel_hydro %>% filter(cold_season == 1)
  )
  sub_n <- switch(s,
    "All"       = panel_nonhydro,
    "Wet"       = panel_nonhydro %>% filter(wet_month == 1),
    "Dry"       = panel_nonhydro %>% filter(dry_month == 1),
    "High Flow" = panel_nonhydro %>% filter(high_flow == 1),
    "Low Flow"  = panel_nonhydro %>% filter(low_flow == 1),
    "Warm"      = panel_nonhydro %>% filter(warm_season == 1),
    "Cold"      = panel_nonhydro %>% filter(cold_season == 1)
  )
  hydro_coefs[i]    <- fmt_coef(run_ddd(sub_h, paste("Hydro", s)))
  nonhydro_coefs[i] <- fmt_coef(run_ddd(sub_n, paste("Non-hydro", s)))
}

tibble(Season = seasons, `Hydropower` = hydro_coefs, `Non-Hydro` = nonhydro_coefs) %>%
  kable(caption = "DDD coefficient (Down x TRB x Post) by season and dam type",
        align = c("l", "r", "r")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Station and year-month FE. SE (in parentheses) clustered at station level. * p<0.1, ** p<0.05, *** p<0.01.")

DDD coefficient (Down x TRB x Post) by season and dam type
Season	Hydropower	Non-Hydro
All	0.392 (0.133)***	0.323 (0.109)***
Wet	0.193 (0.114)*	0.395 (0.102)***
Dry	0.037 (0.168)	0.243 (0.131)*
High Flow	0.216 (0.052)***	-0.013 (0.046)
Low Flow	0.134 (0.142)	0.378 (0.115)***
Warm	0.204 (0.139)	0.438 (0.109)***
Cold	0.388 (0.173)**	0.591 (0.139)***
Note:
Station and year-month FE. SE (in parentheses) clustered at station level. * p<0.1, p<0.05, * p<0.01.

# Aseasonal stations: weak precipitation seasonality (CV <= 0.7)
n_aseasonal <- n_distinct(panel$station_id[panel$strong_seasonality == 0 & panel$has_climate])
n_seasonal  <- n_distinct(panel$station_id[panel$strong_seasonality == 1])

m_aseasonal <- run_ddd(
  panel %>% filter(has_climate, strong_seasonality == 0), "Aseasonal"
)
aseasonal_coef <- fmt_coef(m_aseasonal)

Aseasonal placebo: Among the 366 stations with weak precipitation seasonality (CV $\leq$ 0.7), the DDD is 0.053 (0.156) (not significant). The effect requires seasonal variation to operate.

8.2.1 Interpretation

The hydro vs. non-hydro split reveals strikingly opposite seasonal patterns:

Hydropower dams show a significant DDD only in high-flow months. This is consistent with hydropower dams on transboundary rivers storing peak flows more aggressively than domestic counterparts, so it’s potentially retaining water that would otherwise flow downstream during the wet season. Higher storage during peak flow provides more head and generation capacity. On domestic rivers, regulators may constrain peak storage to protect downstream users; on transboundary rivers, this constraint is weaker.
Non-hydropower dams (irrigation, water supply, flood control) show the opposite: significant effects in low-flow, warm, and cold months but not high-flow months. This is consistent with irrigation/water-supply dams withdrawing stored water for consumptive use during dry or growing seasons, reducing downstream flows relative to what would occur without strategic behavior. The null in high-flow months makes sense: during wet periods, reservoirs are filling and there is less need to withdraw.
The aseasonal null provides a useful placebo: at stations where precipitation is roughly uniform year-round (CV $\leq$ 0.7, about 29% of stations), the hypothesised mechanism shouldn’t operate. There, the DDD is close to zero. That is, in regions that don’t tend to have a wet/dry contrast, there is less scope for seasonal reallocation, and there seems to be no differential TRB effect.

Caveat: This decomposition is descriptive. Splitting by season does not constitute a separate identification strategy. The pre-trend concerns from the binary DDD (Section 3) still apply within each seasonal subsample. These results are best interpreted as characterizing where the aggregate DDD signal concentrates, which informs the mechanism but does not resolve the causal question.

9 Summary and Next Steps

9.1 Current status

summary_final <- tibble(
  Finding = c(
    "Binary DDD: unexpected sign, pre-trend, not credibly identified",
    "Matching: cannot fix the pre-trend (9/9 methods fail)",
    "HonestDiD: bounds include zero even at M=0",
    "Static DOR x TRB: negative and significant with climate controls",
    "DOR dose-response event study: pre-trend in TRB interaction",
    "DOR binned: Low DOR x TRB significant; higher bins not",
    "Leave-one-out: result moderately robust across countries, sensitive to Africa",
    "DOR time path: domestic rivers more heavily regulated than TRB",
    "Seasonal: hydro dams affect high-flow months; non-hydro affect low-flow/warm",
    "Seasonal: aseasonal stations (precip CV <= 0.7) show null DDD"
  ),
  Implication = c(
    "Cannot use binary DDD to claim strategic withholding",
    "Pre-trend is a genuine feature, not compositional",
    "No formal basis for the binary DDD result",
    "Consistent with strategic withholding (more regulation, more TRB discharge reduction)",
    "Static DOR x TRB result may also be affected by pre-existing differences",
    "Non-monotonic pattern complicates simple dose-response narrative",
    "Not driven by a single country; Africa attenuates the global result",
    "TRB-domestic comparison is not apples-to-apples on baseline regulation",
    "Different dam types withhold in different seasons; mechanism story for the paper",
    "Effect requires seasonal variation --- supports strategic timing interpretation"
  )
)

kable(summary_final, caption = "Summary of Current Status") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE)

Summary of Current Status
Finding	Implication
Binary DDD: unexpected sign, pre-trend, not credibly identified	Cannot use binary DDD to claim strategic withholding
Matching: cannot fix the pre-trend (9/9 methods fail)	Pre-trend is a genuine feature, not compositional
HonestDiD: bounds include zero even at M=0	No formal basis for the binary DDD result
Static DOR x TRB: negative and significant with climate controls	Consistent with strategic withholding (more regulation, more TRB discharge reduction)
DOR dose-response event study: pre-trend in TRB interaction	Static DOR x TRB result may also be affected by pre-existing differences
DOR binned: Low DOR x TRB significant; higher bins not	Non-monotonic pattern complicates simple dose-response narrative
Leave-one-out: result moderately robust across countries, sensitive to Africa	Not driven by a single country; Africa attenuates the global result
DOR time path: domestic rivers more heavily regulated than TRB	TRB-domestic comparison is not apples-to-apples on baseline regulation
Seasonal: hydro dams affect high-flow months; non-hydro affect low-flow/warm	Different dam types withhold in different seasons; mechanism story for the paper
Seasonal: aseasonal stations (precip CV <= 0.7) show null DDD	Effect requires seasonal variation — supports strategic timing interpretation

9.2 Proposed path forward

Further investigate seasonal dynamics: Seasonal decomposition reveals opposite patterns by dam type: hydropower dams affect high-flow months, non-hydro dams affect low-flow/warm months. Aseasonal stations show no effect. This provides a mechanism story but does not resolve pre-trend concerns.
Explore the Low DOR result: The significant Low DOR x TRB coefficient ($-0.234^{***}$) is intriguing. Investigate what types of dams/rivers are in this bin and whether the result has a clean interpretation.
Address pre-trends in DOR: Consider detrended DOR specifications (analogous to the binary DDD detrended model) or alternative identification strategies that can account for pre-existing differences in the $\Delta$DOR x TRB interaction.
Continental heterogeneity: Run the full analysis separately for Africa vs. rest-of-world, since Africa appears to drive important compositional effects.
Reconsider main specification: The static DOR regression remains the strongest individual result, but the supplementary analyses raise important caveats. I may need to present this as suggestive evidence rather than a definitive causal claim.
Consider what this means along with strategic placement: The DDD and DOR results together suggest a complex picture: it’s unlikely that domestic is a good control for transboundary, but the strategic placement results may indicate exactly that – that TRB dams are placed on rivers with different characteristics (e.g., more variable flow, more upstream regulation) that make them very different. This could be a key insight: the “treatment” of being transboundary may come with a bundle of river characteristics that drive the observed patterns, rather than the DOR x TRB interaction being a pure effect of regulation on withholding behavior.

Report generated on February 22, 2026 at 02:31 PM.

DDD Status Report: Binary DDD Assessment and DOR Analysis

Transboundary Dam Impacts on Downstream Discharge

Katie Zhang

February 22, 2026