1 Abstract

This report presents a simulation-based power analysis for the final negative binomial mixed log-linear model used to study Nematodirus infection intensity in Wyoming sheep. The response is the number of sheep in each infection-intensity category within each farm. The fitted model includes infection-intensity scale, stocking density, number of deworming treatments per year, their three-way interaction, heterogeneous dispersion by intensity scale, and a farm-level random intercept. Because this model contains overdispersion, clustering, and an interaction term, simulation is more appropriate than a closed-form power calculation.The final model included several features that make closed-form power calculations unreliable: negative binomial overdispersion, heterogeneous dispersion across infection-intensity scale, farm-level clustering, and a three-way interaction among infection scale, stocking density, and deworming frequency. In this setting, power depends not only on sample size, but also on the distribution of farm-level predictors, the random-effect variance, the dispersion structure, and the chosen testing procedure, e.g.: LRT, or Wald. Therefore, repeated simulation from the fitted model provides a more realistic estimate of detection probability for this specific study design.

The main power question is:

How often would this study design detect the observed three-way interaction if the fitted negative binomial mixed model were treated as the true data-generating process?

The analysis emphasizes the power of current-design, sensitivity to smaller interaction effects, the operating behavior of the test near the null, and the likely value of adding more farms in future studies.

2 Study context and goal

The original analysis modeled aggregated counts from sheep farms. Each farm contributed rows for ordered Nematodirus infection-intensity categories. The predictors of interest were the infection-intensity scale, stocking density per acre, and number of deworming treatments per year.

The final model from the original analysis was:

\[ \text{frequency}_{ij} \sim \text{NB2}(\mu_{ij}, \phi_{ij}) \]

\[ \log(\mu_{ij}) = \beta_0 + \beta_1\text{scale}_{ij} + \beta_2\text{density}_{i} + \beta_3\text{deworm}_{i} + \beta_4(\text{scale}_{ij} \times \text{density}_{i}) + \beta_5(\text{scale}_{ij} \times \text{deworm}_{i}) + \beta_6(\text{density}_{i} \times \text{deworm}_{i}) + \beta_7(\text{scale}_{ij} \times \text{density}_{i} \times \text{deworm}_{i}) + b_i \]

where \(b_i\) is a farm-level random intercept. The dispersion model was:

\[ \log(\phi_{ij}) = \gamma_0 + \gamma_1\text{scale}_{ij}. \]

The analysis is planning-oriented. It does not prove that the original biological effect is true. It asks how reliably a similar design would detect that effect under repeated sampling, assuming the fitted model is a reasonable approximation to the data-generating process.

3 Data structure

data/work_data.csv

The minimum required variables are shown below.

Minimum variables required for the power-analysis report.
Variable Role
Farms Farm identifier / grouping variable
nem_scaled Ordered Nematodirus infection-intensity scale
frequency Number of sheep in each farm-by-scale cell
density_per_acre Farm-level stocking-density predictor
number_deworm_year Farm-level deworming-frequency predictor
data_path <- "data/work_data.csv"

work_dat <- readr::read_csv(data_path, show_col_types = FALSE) |>
  dplyr::mutate(
    Farms = factor(Farms),
    density_per_acre = as.numeric(density_per_acre),
    number_deworm_year = as.numeric(number_deworm_year),
    nem_scaled = as.numeric(nem_scaled),
    frequency = as.integer(frequency)
  ) |>
  dplyr::arrange(Farms, nem_scaled)

data_summary <- tibble::tibble(
  Quantity = c(
    "Number of farm-by-scale rows",
    "Number of farms",
    "Rows per farm",
    "Minimum observed count",
    "Maximum observed count"
  ),
  Value = c(
    nrow(work_dat),
    dplyr::n_distinct(work_dat$Farms),
    4,
    min(work_dat$frequency, na.rm = TRUE),
    max(work_dat$frequency, na.rm = TRUE)
  )
)

kable_scroll(data_summary, caption = "Basic data structure used for the power simulation.")
Basic data structure used for the power simulation.
Quantity Value
Number of farm-by-scale rows 44
Number of farms 11
Rows per farm 4
Minimum observed count 0
Maximum observed count 82

4 Infection intensity

During fecal flotation, eggs were attached to the cover slips and were counted in each field of view of microscope, and the mean egg count per field of view was calculated. Based on the average egg count, an egg excretion intensity score, referred to as the ‘scale’, was assigned to each sample for each GIN type according to the criteria described in Table 1.

Semi-quantitative fecal egg count intensity scoring criteria for four gastrointestinal nematode groups.
GIN group Very low (1) Low (2) Moderate (3) High (4)
Nematodirus n/a 1 2-5 >5
Trichostrongylids 1-2 3-10 11-20 >20
Marshallagia n/a 1 2-5 >5
Trichuris n/a 1-2 3-10 >10
Note: Observed mean number of eggs per field of view using the 10x objective, equivalent to 100x total magnification.

In this analysis, animals with no detected eggs in feces were considered uninfected and assigned to scale 0, which is not shown in the above Table . For Nematodirus, score 1 (very low intensity) was not applicable. This classification was used to maintain comparability of infection intensity with other roundworm groups, particularly Trichostrongylids. Because female Trichostrongylid worms generally produce more eggs than female Nematodirus worms, 1 egg per field for Nematodirus was considered approximately equivalent in infection intensity to 3 to 10 eggs per field for Trichostrongylids. Therefore, Nematodirus was classified beginning at score 2 (low intensity).

5 Exploratory view of the design

The four rows per farm represent infection-intensity categories, not four independent animals. This matters because density and deworming are farm-level predictors. For management-level inference, the number of farms is the main design constraint.

ggplot(work_dat, aes(x = factor(nem_scaled), y = frequency)) +
  geom_boxplot() +
  geom_jitter(width = 0.08, alpha = 0.55) +
  labs(
    x = "Nematodirus infection-intensity scale",
    y = "Observed frequency",
    title = "Observed count distribution by infection-intensity scale",
    subtitle = "Each point is a farm-by-scale cell"
  ) +
  theme_power()
Observed sheep counts by Nematodirus infection-intensity scale.

Observed sheep counts by Nematodirus infection-intensity scale.

ggplot(work_dat, aes(x = nem_scaled, y = frequency, group = Farms, color = Farms)) +
  geom_line(alpha = 0.75, linewidth = 0.9) +
  geom_point(size = 2) +
  scale_x_continuous(breaks = sort(unique(work_dat$nem_scaled))) +
  labs(
    x = "Nematodirus infection-intensity scale",
    y = "Observed frequency",
    color = "Farm",
    title = "Farm-specific observed count profiles",
    subtitle = "Each color represents one farm"
  ) +
  theme_power() +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold")
  )
Farm-specific count profiles across the Nematodirus infection-intensity scale.

Farm-specific count profiles across the Nematodirus infection-intensity scale.

6 Baseline fitted model

The power simulation begins by fitting the final negative binomial mixed model to the observed data.

fit_nb <- glmmTMB::glmmTMB(
  frequency ~ nem_scaled * density_per_acre * number_deworm_year + (1 | Farms),
  dispformula = ~ nem_scaled,
  family = glmmTMB::nbinom2(link = "log"),
  data = work_dat
)
model_fit_table <- tibble::tibble(
  Statistic = c("AIC", "BIC", "logLik", "-2 logLik", "Residual df"),
  Value = c(
    AIC(fit_nb),
    BIC(fit_nb),
    as.numeric(logLik(fit_nb)),
    as.numeric(-2 * logLik(fit_nb)),
    df.residual(fit_nb)
  )
)

kable_scroll(model_fit_table, caption = "Baseline fitted-model summary.", digits = 3)
Baseline fitted-model summary.
Statistic Value
AIC 230.840
BIC 250.466
logLik -104.420
-2 logLik 208.840
Residual df 33.000
# Conditional mean component
cond_raw <- summary(fit_nb)$coefficients$cond

cond_tbl <- as.data.frame(cond_raw) |>
  tibble::rownames_to_column("Term") |>
  dplyr::transmute(
    Component = "Conditional mean",
    Term = Term,
    Estimate = Estimate,
    SE = `Std. Error`,
    z = `z value`,
    p_value = `Pr(>|z|)`
  )

# Dispersion component
disp_raw <- summary(fit_nb)$coefficients$disp

disp_tbl <- as.data.frame(disp_raw) |>
  tibble::rownames_to_column("Term") |>
  dplyr::transmute(
    Component = "Dispersion",
    Term = Term,
    Estimate = Estimate,
    SE = `Std. Error`,
    z = `z value`,
    p_value = `Pr(>|z|)`
  )

# Random-effect component
vc <- glmmTMB::VarCorr(fit_nb)

rand_tbl <- tibble::tibble(
  Component = "Random effect",
  Term = c("Farms: intercept variance", "Farms: intercept SD"),
  Estimate = c(
    as.numeric(vc$cond$Farms[1]),
    as.numeric(attr(vc$cond$Farms, "stddev"))
  ),
  SE = NA_real_,
  z = NA_real_,
  p_value = NA_real_
)

baseline_output <- dplyr::bind_rows(
  cond_tbl,
  disp_tbl,
  rand_tbl
) |>
  dplyr::mutate(
    dplyr::across(
      c(Estimate, SE, z, p_value),
      ~ round(.x, 4)
    )
  )

kable_scroll(
  baseline_output,
  caption = "Baseline negative binomial mixed model used as the simulation source.",
  digits = 4,
  height = "520px"
)
Baseline negative binomial mixed model used as the simulation source.
Component Term Estimate SE z p_value
Conditional mean (Intercept) 3.5832 0.6376 5.6201 0.0000
Conditional mean nem_scaled -1.0270 0.2291 -4.4829 0.0000
Conditional mean density_per_acre -3.6007 2.3806 -1.5125 0.1304
Conditional mean number_deworm_year -0.3754 0.4383 -0.8566 0.3917
Conditional mean nem_scaled:density_per_acre 2.5994 0.9697 2.6805 0.0074
Conditional mean nem_scaled:number_deworm_year 0.2024 0.1446 1.3992 0.1617
Conditional mean density_per_acre:number_deworm_year 1.5822 1.1999 1.3186 0.1873
Conditional mean nem_scaled:density_per_acre:number_deworm_year -1.2892 0.4874 -2.6451 0.0082
Dispersion (Intercept) 4.9753 2.1427 2.3220 0.0202
Dispersion nem_scaled -1.7332 0.6539 -2.6505 0.0080
Random effect Farms: intercept variance 0.4495 NA NA NA
Random effect Farms: intercept SD 0.6704 NA NA NA

The target term for power estimation is the three-way interaction:

target_term <- "nem_scaled:density_per_acre:number_deworm_year"

beta_cond <- glmmTMB::fixef(fit_nb)$cond
beta_disp <- glmmTMB::fixef(fit_nb)$disp
sd_farm <- as.numeric(attr(glmmTMB::VarCorr(fit_nb)$cond$Farms, "stddev"))

target_summary <- tibble::tibble(
  Quantity = c(
    "Target interaction",
    "Observed coefficient",
    "Farm random-effect SD",
    "Dispersion scale coefficient"
  ),
  Value = c(
    target_term,
    round(beta_cond[target_term], 4),
    round(sd_farm, 4),
    round(beta_disp["nem_scaled"], 4)
  )
)

kable_scroll(target_summary, caption = "Key quantities carried forward into the simulation.")
Key quantities carried forward into the simulation.
Quantity Value
Target interaction nem_scaled:density_per_acre:number_deworm_year
Observed coefficient -1.2892
Farm random-effect SD 0.6704
Dispersion scale coefficient -1.7332

7 What power means here

Power is estimated by simulation. For each simulated dataset:

  1. Generate new counts from the fitted negative binomial mixed model.
  2. Refit the same model to the simulated dataset.
  3. Test the target interaction.
  4. Record whether the p-value is below \(\alpha = 0.05\).

\[ \widehat{\text{Power}} = \frac{\text{Number of significant successful simulations}} {\text{Number of successful fitted simulations}} \]

The Monte Carlo standard error is approximated as:

\[ SE_{MC} = \sqrt{\frac{\widehat{p}(1-\widehat{p})}{N_{\text{successful}}}}. \]

Simulation-based power analysis is a standard strategy when generalized linear mixed models are too complex for simple closed-form calculations (Kumle, Võ, and Draschkow 2021; Pargent et al. 2024). The glmmTMB package is used here because it supports negative binomial mixed models with flexible dispersion structures (Brooks et al. 2017).

8 Simulation functions

The functions below implement two complementary simulation strategies:

  • Current-design simulation: uses simulate() directly from the fitted glmmTMB model.
  • Effect-size sensitivity simulation: manually modifies the three-way coefficient before generating new counts.
get_p_wald <- function(fit, term = target_term) {
  coefs <- summary(fit)$coefficients$cond

  if (!term %in% rownames(coefs)) {
    return(NA_real_)
  }

  coefs[term, "Pr(>|z|)"]
}

get_p_lrt <- function(dat) {
  full_fit <- tryCatch(
    glmmTMB::glmmTMB(
      frequency ~ nem_scaled * density_per_acre * number_deworm_year + (1 | Farms),
      dispformula = ~ nem_scaled,
      family = glmmTMB::nbinom2(link = "log"),
      data = dat
    ),
    error = function(e) NULL
  )

  reduced_fit <- tryCatch(
    glmmTMB::glmmTMB(
      frequency ~
        nem_scaled + density_per_acre + number_deworm_year +
        nem_scaled:density_per_acre +
        nem_scaled:number_deworm_year +
        density_per_acre:number_deworm_year +
        (1 | Farms),
      dispformula = ~ nem_scaled,
      family = glmmTMB::nbinom2(link = "log"),
      data = dat
    ),
    error = function(e) NULL
  )

  if (is.null(full_fit) || is.null(reduced_fit)) {
    return(NA_real_)
  }

  out <- tryCatch(
    stats::anova(reduced_fit, full_fit),
    error = function(e) NULL
  )

  if (is.null(out)) {
    return(NA_real_)
  }

  out$`Pr(>Chisq)`[2]
}

summarize_pvals <- function(pvals, nsim, alpha, test, effect_multiplier = NA_real_,
                            true_threeway_effect = NA_real_) {
  successful_fits <- sum(!is.na(pvals))
  power_est <- mean(pvals < alpha, na.rm = TRUE)

  tibble::tibble(
    effect_multiplier = effect_multiplier,
    true_threeway_effect = true_threeway_effect,
    nsim = nsim,
    alpha = alpha,
    test = test,
    successful_fits = successful_fits,
    failed_fits = sum(is.na(pvals)),
    power = power_est,
    mc_se = sqrt(power_est * (1 - power_est) / successful_fits)
  )
}

run_power_observed_design <- function(
  base_fit,
  base_data,
  term = target_term,
  nsim = 1000,
  alpha = 0.05,
  test = c("wald", "lrt"),
  seed = 123
) {
  test <- match.arg(test)
  set.seed(seed)

  pvals <- rep(NA_real_, nsim)

  for (i in seq_len(nsim)) {
    sim_y <- tryCatch(
      stats::simulate(base_fit, nsim = 1)[[1]],
      error = function(e) NA
    )

    if (all(is.na(sim_y))) {
      next
    }

    sim_dat <- base_data
    sim_dat$frequency <- sim_y

    if (test == "wald") {
      sim_fit <- tryCatch(
        glmmTMB::glmmTMB(
          frequency ~ nem_scaled * density_per_acre * number_deworm_year + (1 | Farms),
          dispformula = ~ nem_scaled,
          family = glmmTMB::nbinom2(link = "log"),
          data = sim_dat
        ),
        error = function(e) NULL
      )

      if (!is.null(sim_fit)) {
        pvals[i] <- get_p_wald(sim_fit, term)
      }
    } else {
      pvals[i] <- get_p_lrt(sim_dat)
    }
  }

  summarize_pvals(
    pvals = pvals,
    nsim = nsim,
    alpha = alpha,
    test = test,
    effect_multiplier = 1,
    true_threeway_effect = beta_cond[term]
  )
}

simulate_nb_glmmTMB_style <- function(
  dat,
  beta_cond,
  beta_disp,
  sd_farm,
  effect_multiplier = 1,
  term = target_term
) {
  dat <- dat |>
    dplyr::mutate(Farms = factor(Farms))

  beta_use <- beta_cond
  beta_use[term] <- beta_cond[term] * effect_multiplier

  X <- stats::model.matrix(
    ~ nem_scaled * density_per_acre * number_deworm_year,
    data = dat
  )

  Xdisp <- stats::model.matrix(
    ~ nem_scaled,
    data = dat
  )

  farm_levels <- levels(dat$Farms)
  b_farm <- stats::rnorm(length(farm_levels), mean = 0, sd = sd_farm)
  names(b_farm) <- farm_levels

  eta <- as.vector(X %*% beta_use) + b_farm[as.character(dat$Farms)]
  mu <- exp(eta)

  eta_disp <- as.vector(Xdisp %*% beta_disp)
  phi <- exp(eta_disp)

  dat$frequency <- stats::rnbinom(
    n = nrow(dat),
    size = phi,
    mu = mu
  )

  dat
}

estimate_power_glmmTMB <- function(
  base_data,
  beta_cond,
  beta_disp,
  sd_farm,
  effect_multiplier = 1,
  term = target_term,
  nsim = 1000,
  alpha = 0.05,
  test = c("wald", "lrt"),
  seed = 123
) {
  test <- match.arg(test)
  set.seed(seed)

  pvals <- rep(NA_real_, nsim)

  for (i in seq_len(nsim)) {
    sim_dat <- simulate_nb_glmmTMB_style(
      dat = base_data,
      beta_cond = beta_cond,
      beta_disp = beta_disp,
      sd_farm = sd_farm,
      effect_multiplier = effect_multiplier,
      term = term
    )

    if (test == "wald") {
      fit_i <- tryCatch(
        glmmTMB::glmmTMB(
          frequency ~ nem_scaled * density_per_acre * number_deworm_year + (1 | Farms),
          dispformula = ~ nem_scaled,
          family = glmmTMB::nbinom2(link = "log"),
          data = sim_dat
        ),
        error = function(e) NULL
      )

      if (!is.null(fit_i)) {
        pvals[i] <- get_p_wald(fit_i, term)
      }
    } else {
      pvals[i] <- get_p_lrt(sim_dat)
    }
  }

  summarize_pvals(
    pvals = pvals,
    nsim = nsim,
    alpha = alpha,
    test = test,
    effect_multiplier = effect_multiplier,
    true_threeway_effect = beta_cond[term] * effect_multiplier
  )
}

9 Current-design power

The first analysis asks whether the current design can detect the observed three-way interaction under repeated sampling.

current_power_file <- "results/current_design_lrt_power.csv"

if (params$run_simulations || !file.exists(current_power_file)) {
  power_current_lrt <- run_power_observed_design(
    base_fit = fit_nb,
    base_data = work_dat,
    nsim = params$nsim_current,
    test = "lrt",
    seed = 102
  )

  readr::write_csv(power_current_lrt, current_power_file)
} else {
  power_current_lrt <- readr::read_csv(current_power_file, show_col_types = FALSE)
}

kable_scroll(
  power_current_lrt |>
    dplyr::mutate(dplyr::across(where(is.numeric), ~ round(.x, 4))),
  caption = "Current-design power for detecting the observed three-way interaction using the likelihood-ratio test.",
  digits = 4
)
Current-design power for detecting the observed three-way interaction using the likelihood-ratio test.
effect_multiplier true_threeway_effect nsim alpha test successful_fits failed_fits power mc_se
1 -1.2892 1000 0.05 lrt 950 50 0.8179 0.0125

If the fitted model is treated as the true data-generating model, the current design estimates the probability of detecting the observed three-way interaction. This estimate is conditional on the fitted fixed effects, dispersion structure, and farm random-effect variance.

10 Effect-size sensitivity

Observed effects from small pilot studies can be larger than the true population effect. For that reason, the three-way interaction was evaluated at 50%, 75%, and 100% of the observed coefficient.

effect_power_file <- "results/effect_sensitivity_lrt_power.csv"
effect_grid <- c(0.50, 0.75, 1.00)

if (params$run_simulations || !file.exists(effect_power_file)) {
  power_effects_lrt <- purrr::map_dfr(
    effect_grid,
    ~ estimate_power_glmmTMB(
      base_data = work_dat,
      beta_cond = beta_cond,
      beta_disp = beta_disp,
      sd_farm = sd_farm,
      effect_multiplier = .x,
      nsim = params$nsim_effect,
      test = "lrt",
      seed = 200 + round(.x * 100)
    )
  )

  readr::write_csv(power_effects_lrt, effect_power_file)
} else {
  power_effects_lrt <- readr::read_csv(effect_power_file, show_col_types = FALSE)
}

effect_table <- power_effects_lrt |>
  dplyr::mutate(
    effect_label = scales::percent(effect_multiplier, accuracy = 1),
    dplyr::across(where(is.numeric), ~ round(.x, 4))
  ) |>
  dplyr::select(
    effect_label,
    true_threeway_effect,
    nsim,
    test,
    successful_fits,
    failed_fits,
    power,
    mc_se
  )

kable_scroll(
  effect_table,
  caption = "Power sensitivity for 50%, 75%, and 100% of the observed three-way interaction.",
  digits = 4
)
Power sensitivity for 50%, 75%, and 100% of the observed three-way interaction.
effect_label true_threeway_effect nsim test successful_fits failed_fits power mc_se
50% -0.6446 1000 lrt 998 2 0.4269 0.0157
75% -0.9669 1000 lrt 993 7 0.6254 0.0154
100% -1.2892 1000 lrt 965 35 0.8021 0.0128
ggplot(power_effects_lrt, aes(x = effect_multiplier, y = power)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2.5) +
  geom_hline(yintercept = 0.80, linetype = "dashed") +
  scale_x_continuous(
    labels = scales::percent_format(accuracy = 1),
    breaks = effect_grid
  ) +
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1),
    limits = c(0, 1)
  ) +
  labs(
    x = "Three-way interaction effect multiplier",
    y = "Estimated detection probability",
    title = "Effect-size sensitivity",
    subtitle = "Likelihood-ratio test; dashed line marks 80% power"
  ) +
  theme_power()
Power for detecting the three-way interaction under 50%, 75%, and 100% of the observed effect size.

Power for detecting the three-way interaction under 50%, 75%, and 100% of the observed effect size.

The detection depends strongly on the true size of the interaction. A design that is close to adequate for the observed effect may still have weak power for smaller but biologically meaningful effects.

11 Type I error and full effect curve

The null scenario sets the three-way interaction to zero. This estimates the simulated false-positive rate for the target test under the same model-fitting workflow.

type1_file <- "results/type1_wald_result.csv"
curve_file <- "results/power_curve_wald.csv"

if (params$run_simulations || !file.exists(type1_file)) {
  type1_result <- estimate_power_glmmTMB(
    base_data = work_dat,
    beta_cond = beta_cond,
    beta_disp = beta_disp,
    sd_farm = sd_farm,
    effect_multiplier = 0,
    nsim = params$nsim_curve,
    test = "wald",
    seed = 999
  )

  readr::write_csv(type1_result, type1_file)
} else {
  type1_result <- readr::read_csv(type1_file, show_col_types = FALSE)
}

curve_grid <- seq(0, 1.5, by = 0.25)

if (params$run_simulations || !file.exists(curve_file)) {
  power_curve_effect <- purrr::map_dfr(
    curve_grid,
    ~ estimate_power_glmmTMB(
      base_data = work_dat,
      beta_cond = beta_cond,
      beta_disp = beta_disp,
      sd_farm = sd_farm,
      effect_multiplier = .x,
      nsim = params$nsim_curve,
      test = "wald",
      seed = 500 + round(.x * 100)
    )
  )

  readr::write_csv(power_curve_effect, curve_file)
} else {
  power_curve_effect <- readr::read_csv(curve_file, show_col_types = FALSE)
}

kable_scroll(
  type1_result |>
    dplyr::mutate(dplyr::across(where(is.numeric), ~ round(.x, 4))),
  caption = "Null-scenario result. Here, the estimated power column is interpreted as the simulated Type I error rate.",
  digits = 4
)
Null-scenario result. Here, the estimated power column is interpreted as the simulated Type I error rate.
effect_multiplier true_threeway_effect nsim alpha test successful_fits failed_fits power mc_se
0 0 1000 0.05 wald 940 60 0.0947 0.0095
ggplot(power_curve_effect, aes(x = effect_multiplier, y = power)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2.2) +
  geom_hline(yintercept = 0.80, linetype = "dashed") +
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1),
    limits = c(0, 1)
  ) +
  labs(
    x = "Three-way interaction effect-size multiplier",
    y = "Estimated detection probability",
    title = "Power curve for the three-way interaction",
    subtitle = "0 = null scenario / Type I error; 1 = observed effect size"
  ) +
  theme_power()
Power curve for the three-way interaction. The point at zero corresponds to the null scenario.

Power curve for the three-way interaction. The point at zero corresponds to the null scenario.

The full curve is useful for visualizing the general relationship between effect size and power. However, irregular behavior at larger effect multipliers should not be overinterpreted. Sparse count patterns and model-fitting instability can affect simulated power estimates, especially in small clustered designs.

12 Design sensitivity: number of farms

Because density and deworming are farm-level variables, adding more farms is usually more valuable than adding more rows within the same farms. The design expansion below preserves the farm-by-scale structure by sampling observed farm profiles and relabeling them as new farms.

make_extended_farm_data <- function(dat, n_farms_new, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)

  dat <- dat |>
    dplyr::mutate(Farms = factor(Farms))

  old_farms <- levels(dat$Farms)
  new_data_list <- vector("list", n_farms_new)

  for (j in seq_len(n_farms_new)) {
    sampled_farm <- sample(old_farms, size = 1)

    farm_dat_new <- dat |>
      dplyr::filter(Farms == sampled_farm) |>
      dplyr::arrange(nem_scaled)

    farm_dat_new$Farms <- factor(paste0("Farm_", j))
    new_data_list[[j]] <- farm_dat_new
  }

  dplyr::bind_rows(new_data_list) |>
    dplyr::mutate(Farms = factor(Farms))
}
farm_power_file <- "results/power_by_farms_wald.csv"
farm_grid <- c(11, 15, 20, 25, 30)

if (params$run_simulations || !file.exists(farm_power_file)) {
  power_by_farms <- purrr::map_dfr(
    farm_grid,
    function(nf) {
      dat_design <- make_extended_farm_data(
        dat = work_dat,
        n_farms_new = nf,
        seed = 1000 + nf
      )

      estimate_power_glmmTMB(
        base_data = dat_design,
        beta_cond = beta_cond,
        beta_disp = beta_disp,
        sd_farm = sd_farm,
        effect_multiplier = 1.00,
        nsim = params$nsim_farms,
        test = "wald",
        seed = 2000 + nf
      ) |>
        dplyr::mutate(
          n_farms = nf,
          rows_per_farm = nrow(dat_design) / dplyr::n_distinct(dat_design$Farms),
          total_rows = nrow(dat_design)
        )
    }
  )

  readr::write_csv(power_by_farms, farm_power_file)
} else {
  power_by_farms <- readr::read_csv(farm_power_file, show_col_types = FALSE)
}

farm_table <- power_by_farms |>
  dplyr::select(
    n_farms,
    rows_per_farm,
    total_rows,
    effect_multiplier,
    true_threeway_effect,
    nsim,
    test,
    successful_fits,
    failed_fits,
    power,
    mc_se
  ) |>
  dplyr::mutate(dplyr::across(where(is.numeric), ~ round(.x, 4)))

kable_scroll(
  farm_table,
  caption = "Design sensitivity by number of farms.",
  digits = 4
)
Design sensitivity by number of farms.
n_farms rows_per_farm total_rows effect_multiplier true_threeway_effect nsim test successful_fits failed_fits power mc_se
11 4 44 1 -1.2892 1000 wald 995 5 0.1196 0.0103
15 4 60 1 -1.2892 1000 wald 999 1 0.1081 0.0098
20 4 80 1 -1.2892 1000 wald 1000 0 0.8050 0.0125
25 4 100 1 -1.2892 1000 wald 1000 0 0.9920 0.0028
30 4 120 1 -1.2892 1000 wald 1000 0 0.9980 0.0014
ggplot(power_by_farms, aes(x = n_farms, y = power)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2.5) +
  geom_hline(yintercept = 0.80, linetype = "dashed") +
  scale_x_continuous(breaks = farm_grid) +
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1),
    limits = c(0, 1)
  ) +
  labs(
    x = "Number of farms",
    y = "Estimated power",
    title = "Design sensitivity by number of farms",
    subtitle = "Observed three-way interaction effect size; dashed line marks 80% power"
  ) +
  theme_power()
Estimated power for detecting the observed three-way interaction across expanded farm-count scenarios.

Estimated power for detecting the observed three-way interaction across expanded farm-count scenarios.

farm_threshold <- power_by_farms |>
  dplyr::filter(power >= 0.80) |>
  dplyr::arrange(n_farms) |>
  dplyr::slice(1)

if (nrow(farm_threshold) == 0) {
  threshold_text <- tibble::tibble(
    Message = "No evaluated farm-count scenario reached 80% power."
  )
  kable_scroll(threshold_text, caption = "Farm-count threshold summary.")
} else {
  kable_scroll(
    farm_threshold |>
      dplyr::select(n_farms, total_rows, power, mc_se, successful_fits, failed_fits) |>
      dplyr::mutate(dplyr::across(where(is.numeric), ~ round(.x, 4))),
    caption = "Smallest evaluated farm-count scenario reaching at least 80% simulated power.",
    digits = 4
  )
}
Smallest evaluated farm-count scenario reaching at least 80% simulated power.
n_farms total_rows power mc_se successful_fits failed_fits
20 80 0.805 0.0125 1000 0

The farm-count analysis should be interpreted cautiously. The new-farm generator reuses observed farm profiles, so it is useful for planning but not a substitute for a fully designed sampling plan. A future study should deliberately balance stocking density and deworming-frequency values across farms.

13 Interpretation

The current design is evaluated under a strong but transparent assumption: the fitted negative binomial mixed model is treated as the true data-generating process. Under that assumption, current-design power estimates the chance that the observed three-way interaction would be detected in repeated studies with the same farm-by-scale structure.

The effect-size sensitivity analysis is the most important caution. If the true interaction is smaller than the observed estimate, power can drop sharply. This is especially relevant because small pilot studies can overestimate effect magnitudes.

The design analysis reinforces the practical message: for management-level predictors, farms are the main independent units. Additional farms are likely more valuable than adding more rows within the same farms, because the four rows per farm are the intensity-scale categories rather than independent animals.

14 Limitations

This power analysis is conditional on the fitted model, including the estimated fixed effects, farm random-effect variance, and dispersion model. If the true data-generating process differs substantially, the estimated power may change.

Model-fitting failures and convergence problems are part of the result. They reflect the difficulty of fitting complex negative binomial mixed models to sparse, clustered count data.

The expanded farm-count simulation is a planning exercise. It should not be read as an exact guarantee that a specific number of farms will achieve a fixed power level. A stronger future design simulation should specify realistic distributions of stocking density and deworming frequency among newly sampled farms.

15 Conclusion

Simulation-based power analysis was appropriate because the target model contains overdispersion, heterogeneous dispersion, farm-level clustering, and a three-way interaction. The current pilot design may be capable of detecting a large interaction similar to the observed effect, but power is much weaker for attenuated effects. Future studies should prioritize additional farms and realistic coverage of management conditions.

The power analysis tells us how much confidence we should have in detecting that pattern again.

16 Reproducibility notes

sessionInfo()
## R version 4.5.0 (2025-04-11 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/Chicago
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] stringr_1.5.1       scales_1.4.0        broom.mixed_0.2.9.7
##  [4] kableExtra_1.4.0    knitr_1.50          readr_2.1.5        
##  [7] ggplot2_3.5.2       purrr_1.2.2         tidyr_1.3.1        
## [10] dplyr_1.1.4         glmmTMB_1.1.11     
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1    viridisLite_0.4.2   farver_2.1.2       
##  [4] fastmap_1.2.0       TH.data_1.1-3       digest_0.6.37      
##  [7] estimability_1.5.1  lifecycle_1.0.4     survival_3.8-3     
## [10] magrittr_2.0.3      compiler_4.5.0      rlang_1.2.0        
## [13] sass_0.4.10         tools_4.5.0         yaml_2.3.10        
## [16] labeling_0.4.3      bit_4.6.0           xml2_1.3.8         
## [19] RColorBrewer_1.1-3  multcomp_1.4-28     withr_3.0.2        
## [22] numDeriv_2016.8-1.1 grid_4.5.0          xtable_1.8-4       
## [25] future_1.70.0       emmeans_1.11.1      globals_0.19.1     
## [28] MASS_7.3-65         cli_3.6.4           mvtnorm_1.3-3      
## [31] crayon_1.5.3        rmarkdown_2.29      reformulas_0.4.0   
## [34] generics_0.1.3      rstudioapi_0.17.1   tzdb_0.5.0         
## [37] minqa_1.2.8         cachem_1.1.0        splines_4.5.0      
## [40] parallel_4.5.0      vctrs_0.7.3         boot_1.3-31        
## [43] Matrix_1.7-3        sandwich_3.1-1      jsonlite_2.0.0     
## [46] hms_1.1.3           bit64_4.6.0-1       listenv_0.9.1      
## [49] systemfonts_1.2.3   jquerylib_0.1.4     glue_1.8.0         
## [52] parallelly_1.47.0   nloptr_2.2.1        codetools_0.2-20   
## [55] stringi_1.8.7       gtable_0.3.6        lme4_1.1-37        
## [58] tibble_3.2.1        pillar_1.10.2       furrr_0.4.0        
## [61] htmltools_0.5.8.1   R6_2.6.1            TMB_1.9.17         
## [64] textshaping_1.0.1   Rdpack_2.6.4        vroom_1.6.5        
## [67] evaluate_1.0.3      lattice_0.22-6      rbibutils_2.3      
## [70] backports_1.5.0     broom_1.0.8         bslib_0.9.0        
## [73] Rcpp_1.0.14         svglite_2.2.1       coda_0.19-4.1      
## [76] nlme_3.1-168        mgcv_1.9-1          xfun_0.52          
## [79] zoo_1.8-14          forcats_1.0.0       pkgconfig_2.0.3

References

Brooks, Mollie E., Kasper Kristensen, Koen J. van Benthem, Arni Magnusson, Casper W. Berg, Anders Nielsen, Hans J. Skaug, Martin Mächler, and Benjamin M. Bolker. 2017. “glmmTMB Balances Speed and Flexibility Among Packages for Zero-Inflated Generalized Linear Mixed Modeling.” The R Journal 9 (2): 378–400. https://doi.org/10.32614/RJ-2017-066.
Kumle, Levi, Melissa L.-H. Võ, and Dejan Draschkow. 2021. “Estimating Power in (Generalized) Linear Mixed Models: An Open Introduction and Tutorial in r.” Behavior Research Methods 53 (6): 2528–43. https://doi.org/10.3758/s13428-021-01546-0.
Pargent, Florian, Timo K. Koch, Anne-Kathrin Kleine, Eva Lermer, and Susanne Gaube. 2024. “A Tutorial on Tailored Simulation-Based Sample-Size Planning for Experimental Designs with Generalized Linear Mixed Models.” Advances in Methods and Practices in Psychological Science 7 (4). https://doi.org/10.1177/25152459241287132.