STRAT-NUDGE cluster randomized trial

Author

A.Amstutz

STRAT-NUDGE cluster randomized trial (CRT)

Nudging intervention at the level of SHCS physicians at SHCS sites in Switzerland to promote the use of a novel TB risk score for individually risk-tailored TB screening (i.e. IGRA testing):

  • Control: Enhanced standard of care: electronic health record (EHR)-based nudge consisting of a physician-targeted reminder for IGRA testing (no risk score)

  • Intervention: EHR-based nudge consisting of a physician-targeted reminder for IGRA testing guided by the risk score (individually risk-tailored: no testing for low-risk, testing for high-risk)

Parameters and design considerations

  • Two-arm parallel-group, multi-center, superiority, cluster-randomized trial

  • Clusters are SHCS physicians, with their SHCS cohort participants, from all major sites (7 sites across Switzerland)

  • Eligible physician-clusters are active, stable SHCS physicians; eligible participants are SHCS participants with no recorded IGRA

  • Primary outcome, on which the sample size is powered, is the “individually risk-tailored IGRA testing rate”, i.e. testing among high-risk and no testing among low-risk participants (according to the TB risk score)

  • The unit of data collection will be the SHCS cohort participants, but the level of inference will be the physicians, i.e. cluster-average

  • Binary primary outcome: Proportion of patients successfully risk-tailored screened at first presentation

  • Baseline primary outcome rate (control clusters):

    • We estimate across the entire cohort that 10% are at high-risk, 90% at low-risk for developing TB (according to score)

    • The high-risk should be screened, the low-risk should NOT be screened

    • We estimate an IGRA testing rate of 45% on average across all physician-clusters, higher among the 10% high-risk and lower among the 90% low-risk, e.g.:

      • High-risk: 90%, Low-risk: 40% => Overall: 0.1*(0.9) + 0.9*(0.40) = 45%

      • That means 90% (of the 10% high-risk) are correctly tested and 60% (of the 90% low-risk) are correctly NOT tested at the moment => Overall: 0.1*(0.9) + 0.9*(0.60) = 63%

      • => we estimate a baseline outcome rate of 63%

  • Expected outcome in intervention, and delta:

    • We estimate that the intervention can increase the risk-tailored testing success rate to:

      • 95% (of the 10% high-risk) are correctly tested and 80% (of the 90% low-risk) are correctly NOT tested => Overall: 0.1*(0.95) + 0.9*(0.8) = 81.5%

      • => we estimate an outcome rate of 81% in the intervention arm (81.5% conservatively rounded down)

      • => delta of 81 − 63 = 18 absolute percentage points (a short R check of these weighted averages follows after this list)

  • Cluster size (m) of eligible participants (6-month recruitment period):

    • on average 18 eligible participants per physician-cluster over the 6-month period

  • CV (coefficient of variation), i.e. the ratio of the standard deviation of cluster sizes to the mean cluster size:

    • 0.98

  • ICC for the primary outcome: 0.20 (behavioural intervention/outcome)

  • Max. ca. 90 eligible clusters, i.e. 45 clusters per arm, given the size of the cohort

  • Min. desired power 80%, two-sided alpha of 0.05

  • 1:1 allocation
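
A short R check of the weighted averages behind these assumptions (the exact intervention rate of 81.5% is conservatively rounded down to 81% in the calculations below):

Code
# Weighted averages behind the assumed control and intervention outcome rates
prop_high <- 0.10; prop_low <- 0.90                     # prevalence of high-/low-risk participants
p_control <- prop_high * 0.90 + prop_low * 0.60         # control: 90% of high-risk tested, 60% of low-risk correctly NOT tested
p_intervention <- prop_high * 0.95 + prop_low * 0.80    # intervention: 95% tested, 80% correctly NOT tested
cat("Control outcome rate:", p_control, "\n")           # 0.63
cat("Intervention outcome rate:", p_intervention, "\n") # 0.815
cat("Delta (percentage points):", 100 * (p_intervention - p_control), "\n") # 18.5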

Packages

Code
req_pkgs <- c("pwr",
              "dplyr",
              "purrr",
              "ggplot2",
              "lme4",
              "geepack", # for GEE (if needed)
              "MASS", # for GLMM PQL
              "marginaleffects", # for marginal standardization
              
              "future",
              "future.apply",
              "nlme",
              
              "tibble",
              "knitr",
              "kableExtra",
              "splines"
)
install_if_missing <- function(pkgs){
  for(p in pkgs){
    if(!requireNamespace(p, quietly=TRUE)){
      install.packages(p, repos="https://cloud.r-project.org")
    }
    library(p, character.only=TRUE)
  }
}
install_if_missing(req_pkgs)

# set global RNG seed for reproducibility
set.seed(20250809)

Corresponding individual randomized trial

Sample size for the individual randomized trial on the same question

Code
# Parameters
p_C <- 0.63
p_I <- 0.81
power <- 0.80 
ICC <- 0.20
alpha <- 0.05

# Effect size, standardized as Cohen's h
h_I_C <- ES.h(p1 = p_I, p2 = p_C)
cat("Cohen's h for Intervention vs Control:", round(h_I_C, 3), "\n")
Cohen's h for Intervention vs Control: 0.406 
Code
# Sample size pair-wise comparison
ss_I_C <- pwr.2p.test(h = h_I_C, sig.level = alpha, power = power)
n_per_arm <- ceiling(ss_I_C$n)
n_total <- n_per_arm * 2

cat("Sample size per arm:", n_per_arm, "\n")
Sample size per arm: 96 
Code
cat("Total trial sample size (2-arm trial):", n_total)
Total trial sample size (2-arm trial): 192

(1) Sample size calculation CRT: formula-based

Apply the design effect (DEFF) to the sample size of the corresponding individual RCT. The standard DEFF formula is:

DEFF = 1 + (m − 1) × ICC, where m = cluster size

However, cluster size variation must not be forgotten. The usual conservative adjustment of the DEFF for cluster size variation is (e.g. see here: https://pmc.ncbi.nlm.nih.gov/articles/PMC7394950/#sup1):

DEFF_cv = 1 + (m(1 + CV^2) − 1) × ICC, where CV is the coefficient of variation of cluster sizes (SD of cluster sizes divided by the mean cluster size)

Code
# Parameters
p_C <- 0.63
p_I <- 0.81  
power <- 0.80 
ICC <- 0.20
alpha <- 0.05

m <- 18 # average cluster size
CV <- 0.98 # CV

deff <- 1+(m-1)*ICC # standard DEFF
deff_cv <- 1+((m*(1+CV^2))-1)*ICC # DEFF with cluster size variation

# Effect size
h_I_C <- ES.h(p1 = p_I, p2 = p_C)

# sample size for corresponding individual RCT 
ss <- pwr.2p.test(h = h_I_C, power = power, sig.level = alpha)$n

# CRT sample size
ss_crt <- ceiling(ss * deff_cv)
n_clusters <- ceiling(ss_crt / m)
cat("Cluster sample size int arm 1:", n_clusters, "\n")
Cluster sample size int arm 1: 42 
Code
cat("Individual sample size int arm 1:", ss_crt, "\n")
Individual sample size int arm 1: 750 
Code
# Total
tot_clusters <- n_clusters * 2
tot_ind <- ss_crt * 2
cat("Total cluster sample size:", tot_clusters, "\n")
Total cluster sample size: 84 
Code
cat("Total individual sample size:", tot_ind, "\n")
Total individual sample size: 1500 

(1.1) Varying assumptions - Standard sample size calculation

(1.1.1) Varying baseline control rate

Assuming same delta, but varying baseline control rates, all other parameters fixed

Code
power <- 0.80
alpha <- 0.05
ICC <- 0.20
CV <- 0.98
m <- 18
delta <- 0.18

# Baseline control rates
p_C_values <- seq(0.30, 0.80, by = 0.05)

results_df <- data.frame(
  p_C = numeric(),
  n_clusters_per_arm = numeric(),
  n_individuals_per_arm = numeric()
)

cohen_h <- function(p1, p2) {
  2 * (asin(sqrt(p1)) - asin(sqrt(p2)))
}

for (p_C in p_C_values) {
  p_I1 <- p_C + delta
  
  # Skip if the intervention rate is invalid (greater than 1)
  if (p_I1 > 1) {
    next
  }

  deff_cv <- 1 + ((m * (1 + CV^2)) - 1) * ICC

  h_I1_C <- cohen_h(p_I1, p_C)

  ss1 <- pwr.2p.test(h = h_I1_C, power = power, sig.level = alpha)$n
  
  # Individual sample size of the CRT
  n_per_arm_crt <- ceiling(ss1 * deff_cv)
  
  # Cluster sample size of the CRT
  n_clusters_per_arm <- ceiling(n_per_arm_crt / m)
  
  # Append results
  results_df <- rbind(results_df, data.frame(
    p_C = p_C,
    n_clusters_per_arm = n_clusters_per_arm,
    n_individuals_per_arm = n_per_arm_crt
  ))
}
Code
ggplot(results_df, aes(x = p_C, y = n_clusters_per_arm * 2)) +
  geom_line(color = "darkgreen", size = 1) +
  geom_point(color = "darkgreen", size = 2) +
  labs(
    title = "Total clusters needed vs. Baseline control rate",
    x = "Baseline control rate",
    y = "Total clusters needed (for 2-arm trial)"
  ) +
  theme_minimal() +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1),
                     breaks = seq(0.30, 0.80, by = 0.05)) +
  scale_y_continuous(breaks = seq(0, max(results_df$n_clusters_per_arm * 2), by = 1))

Code
ggplot(results_df, aes(x = p_C, y = n_individuals_per_arm * 2)) +
  geom_line(color = "lightgreen", size = 1) +
  geom_point(color = "lightgreen", size = 2) +
  labs(
    title = "Total individuals needed vs. Baseline control rate",
    x = "Baseline control rate",
    y = "Total individuals needed (for 2-arm trial)"
  ) +
  theme_minimal() +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1),
                     breaks = seq(0.30, 0.80, by = 0.05)) +
  scale_y_continuous(breaks = seq(0, max(results_df$n_individuals_per_arm * 2), by = 50))

(1.1.2) Varying ICC

Varying ICC, all other parameters fixed

Code
power <- 0.80
alpha <- 0.05
p_C <- 0.63 
m <- 18
CV <- 0.98
delta <- 0.18 # 18 percentage points

# Range of ICC values to test
ICC_values <- seq(0.05, 0.35, by = 0.01)

results_df <- data.frame(
  ICC = numeric(),
  n_clusters_per_arm = numeric(),
  n_individuals_per_arm = numeric()
)

cohen_h <- function(p1, p2) {
  2 * (asin(sqrt(p1)) - asin(sqrt(p2)))
}

for (icc in ICC_values) {
  p_I1 <- p_C + delta
  
  deff_cv <- 1 + ((m * (1 + CV^2)) - 1) * icc
  
  h_I1_C <- cohen_h(p_I1, p_C)
  
  ss1 <- pwr.2p.test(h = h_I1_C, power = power, sig.level = alpha)$n
  
  n_per_arm_crt <- ceiling(ss1 * deff_cv)
  
  n_clusters_per_arm <- ceiling(n_per_arm_crt / m)
  
  results_df <- rbind(results_df, data.frame(
    ICC = icc,
    n_clusters_per_arm = n_clusters_per_arm,
    n_individuals_per_arm = n_per_arm_crt
  ))
}
Code
ggplot(results_df, aes(x = ICC, y = n_clusters_per_arm * 2)) +
  geom_line(color = "darkred", size = 1) +
  geom_point(color = "darkred", size = 2) +
  labs(
    title = "Total clusters needed vs. ICC",
    x = "ICC",
    y = "Total clusters needed"
  ) +
  theme_minimal() +
  scale_x_continuous(breaks = seq(0.05, 0.35, by = 0.02)) +
  scale_y_continuous(breaks = seq(0, max(results_df$n_clusters_per_arm * 2), by = 5))

Code
ggplot(results_df, aes(x = ICC, y = n_individuals_per_arm * 2)) +
  geom_line(color = "red", size = 1) +
  geom_point(color = "red", size = 2) +
  labs(
    title = "Total individuals needed vs. ICC",
    x = "ICC",
    y = "Total individuals needed"
  ) +
  theme_minimal() +
  scale_x_continuous(breaks = seq(0.05, 0.35, by = 0.02)) +
  scale_y_continuous(breaks = seq(0, max(results_df$n_individuals_per_arm * 2), by = 500))

(2) Sample size calculation CRT: Simulation-based

(2.1) Parameters

We follow the simulation setup according to J. Thompson & C. Leyrat for binary outcomes: https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01699-2

Data-generating model (per cluster):

  • For arm i (0=control, 1=intervention) and cluster j: Y_ij ∼ Binomial(m_ij, p_ij), logit⁡(p_ij) = β0 + β1_i + u_j , where u_j is a cluster random effect with mean 0 and variance σ^2_b

    • Binomial(m_ij, p_ij): conditional on p_ij, each of the m_ij individuals in that cluster is an independent Bernoulli trial with success probability p_ij, so Y_ij is a binomial draw at the cluster level.

      • A Bernoulli trial is a random event with two outcomes (success/failure), with the same, independent, probability of success every time.

      • Independence assumption (within-cluster): Whether one person gets the IGRA test doesn’t change the probability for another person in the same cluster (once p_ij is fixed). The correlation between people’s outcomes in the same cluster comes entirely from them sharing the same p_ij (via the physician)

      • => Y_ij can be any integer from 0 to m_ij. E.g. if m_ij =42 and p_ij =0.50, then Y_ij is the total number of IGRA tests in that cluster, drawn from a binomial distribution with 42 trials and 50% success probability.

    • logit⁡(p_ij) = β0 + β1_i + u_j:

      • Using the logit link maps probability p ∈ (0,1) to the whole real line, so we can model it as a linear predictor.

      • β0 is the baseline log-odds (the logit of the control probability for a typical cluster, i.e. when u_j = 0), representing the cluster-specific (conditional) control probability.

      • β1_i encodes the treatment effect and is a log-odds difference; exp(β1) is the conditional odds ratio comparing treatment vs control for the same cluster, holding u_j fixed (a small numeric example follows after this list).

      • u_j is the cluster random intercept (a cluster-level shift on the log-odds scale). It captures unobserved cluster-level factors (e.g. physician tendency) that move all individuals in the cluster up/down in log-odds. Typically, u_j has mean 0 and variance σ^2_b and is independent across clusters (see above). The random intercept does not change the conditional treatment effect, it only shifts the baseline log-odds for that whole cluster. In other words, the difference in log-odds between arms for the same cluster is always constant, but the actual probabilities shift up/down with u_j. For clusters with positive u_j both arms have higher probabilities; for negative u_j both are lower.
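
To make the notation concrete, a small numeric example using the assumed control and intervention rates (the same conversions are implemented by the p_to_beta0() and p0_p1_to_OR() helpers in chapter 2.2 below):

Code
# Numeric example with the trial assumptions: p0 = 0.63, p1 = 0.81
beta0 <- qlogis(0.63)                            # baseline log-odds, approx. 0.532
OR <- (0.81 / (1 - 0.81)) / (0.63 / (1 - 0.63))  # conditional odds ratio, approx. 2.50
beta1 <- log(OR)                                 # treatment effect on the log-odds scale, approx. 0.918
round(c(beta0 = beta0, OR = OR, beta1 = beta1), 3)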

ICC on log-odds scale:

  • ICC = rho = σ^2_b / (σ^2_b + π^2/3)

  • The ICC is the ratio of between-cluster variance to total variance (between-cluster plus individual-level variance). The between-cluster variance is the cluster random effect variance (σ^2_b)

  • In logistic models with binary outcomes, the individual-level variance is usually fixed at π^2/3 (≈3.29)

  • Solving the formula above for the cluster random effect SD gives: σ_b = sqrt((ICC × π^2/3)/(1 − ICC)) (a quick numeric check follows below)

  • (If there’s additional within-site variation over time, i.e. a baseline period or SW-CRT, we would include σ^2_b_p, typically as a fraction of σ^2_b, e.g., half the site-level variance)
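
A quick numeric check of this relationship for the assumed ICC of 0.20 (the same conversion is implemented by the icc_to_sigma() helper in chapter 2.2 below):

Code
# Cluster random-effect SD implied by ICC = 0.20 on the latent logit scale
ICC <- 0.20
sigma_b <- sqrt((ICC * (pi^2 / 3)) / (1 - ICC))
round(sigma_b, 3) # approx. 0.907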

Cluster effect distributions:

  • While the ICC is the proportion of the total variance (on the latent scale) that comes from between-cluster differences (“what fraction of the total variability is due to between-cluster differences”), σ^2_b is an absolute variance (“how big the cluster intercept spread is in log-odds units”, or “how much variation there is in testing tendency across clusters”) and can have different shapes.

  • The GLMM assumes a normal distribution of the random effects, but reality is often skewed. We simulate three scenarios, including a realistic/skewed/conservative one, to see whether the GLMM breaks (as in the paper above):

  • a) Normal: u_j ∼ N(0, σ^2_b)

    • Symmetric, bell-shaped, skewness = 0, kurtosis = 0.
  • b) Gamma (skewed): generate a_j ∼ Gamma(shape = 2, scale = 1), then set u_j = σ_b (a_j − 2)/sqrt(2)

    • A shape parameter of 2 gives a distribution with skewness 1.4 and kurtosis 3, i.e., positive skew (some clusters have a much higher tendency than average)
  • c) Uniform: u_j ∼ Uniform(−sqrt(3)σ_b, sqrt(3)σ_b)

    • Skewness = 0 (perfectly symmetric), Kurtosis = −6/5 (lighter tails than normal), no extreme values, overall flat, all clusters are evenly spread; to test if GLMMs are sensitive to lack of tail weight, i.e., whether they rely on the normal distribution’s tails to stabilize estimates.

Cluster sizes m_ij​:

  • Allow for varying cluster size, i.e. a varying coefficient of variation (CV) of cluster sizes, using the same approach as the paper: cluster sizes are sampled as m_ij = 2 + δ_ij, with δ_ij drawn from a negative binomial:

    • δ_ij ∼ NegBin(size = (m−2)^2 / (s^2 − (m−2)), prob = (m−2)/s^2)

    • where s is the SD of cluster sizes (CV = s/m).

    • This yields a minimum cluster size of 3 (note: the paper specifies the negative binomial via the number of failures and the probability of failure; the parameterization above is equivalent). A quick check of this parameterization follows below.

    • δ is the random component added to 2 to obtain the cluster size (minimum 3).
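
A quick check, under the stated assumptions (m = 18, CV = 0.98), that this negative-binomial parameterization reproduces the target mean and variance of δ:

Code
# Negative-binomial parameters for delta = cluster size - 2, with m = 18 and CV = 0.98
m <- 18; CV <- 0.98
s <- CV * m                                      # SD of cluster sizes
size_nb <- (m - 2)^2 / (s^2 - (m - 2))
prob_nb <- (m - 2) / s^2
# rnbinom(): mean = size * (1 - prob) / prob, variance = mean / prob
mean_delta <- size_nb * (1 - prob_nb) / prob_nb  # should equal m - 2 = 16
var_delta <- mean_delta / prob_nb                # should equal s^2 = 311.2
round(c(mean = mean_delta, variance = var_delta), 1)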

(2.2) Create main functions and simulate one dataset

Code
# 1) compute sigma_b from ICC (on latent logit scale):
icc_to_sigma <- function(rho){
  if(rho<=0) return(0)
  sigma_b <- sqrt( (rho * (pi^2/3)) / (1 - rho) )
  return(sigma_b)
}

# 2) compute beta0 for given control prevalence p0
p_to_beta0 <- function(p0){
  qlogis(p0)
}

# 3) given p0 and p1, compute OR on the cluster-specific log-odds scale
p0_p1_to_OR <- function(p0, p1){
  odds0 <- p0 / (1 - p0)
  odds1 <- p1 / (1 - p1)
  odds1 / odds0
}

# 4) generate random cluster-level u_j for the three distributions
generate_u <- function(n_clusters, sigma_b, dist = c("normal","gamma","uniform")){
  dist <- match.arg(dist)
  if(sigma_b == 0) return(rep(0, n_clusters))
  if(dist == "normal"){
    return(rnorm(n_clusters, mean=0, sd = sigma_b))
  } else if(dist == "gamma"){
    # they used Gamma(shape=2, scale=1) then standardized to mean 0 and sd sigma_b
    a <- rgamma(n_clusters, shape=2, scale=1)
    # a has mean 2, var 2. Standardize: (a - 2)/sqrt(2) then scale to sigma_b
    return(sigma_b * (a - 2)/sqrt(2))
  } else if(dist == "uniform"){
    cut <- sqrt(3) * sigma_b
    return(runif(n_clusters, min = -cut, max = cut))
  }
}

# 5) generate cluster sizes with target mean m and CV. Implementation follows their negative-binomial based approach and enforces minimum cluster size of 3.
generate_cluster_sizes <- function(n_clusters, m, CV){
  if(CV == 0){
    return(rep(m, n_clusters))
  }
  s <- CV * m
  # We want delta = m_j - 2 to follow NegBin with mean (m-2) and variance s^2
  mu_delta <- m - 2
  var_delta <- s^2
  if(var_delta <= mu_delta){
    # Negative Binomial requires variance > mean. So, this is an impossible NB parameterization
    # If so, fall back to a discrete uniform around m
    low <- max(3, floor(m - s*1.5))
    high <- ceiling(m + s*1.5)
    out <- pmax(3, round(runif(n_clusters, low, high)))
    return(out)
  }
  size_nb <- (mu_delta^2) / (var_delta - mu_delta) # see formula above
  prob_nb <- mu_delta / var_delta # see formula above
  # rnbinom in R uses size, prob; mean = size*(1-prob)/prob, but with this param it matches
  delta <- rnbinom(n_clusters, size = size_nb, prob = prob_nb)
  m_j <- 2 + delta
  m_j[m_j < 3] <- 3 # enforce min 3 (generating 2+delta ensures >=2, we bump to 3)
  return(m_j)
}

# Parameters for single simulated dataset
n_clusters <- 84
m_mean <- 18
CV <- 0.98
p0 <- 0.63
p1 <- 0.82
OR <- p0_p1_to_OR(p0, p1) # compute OR
rho <- 0.20 # ICC
re_dist <- "gamma"

# Simulate
set.seed(20250809)
sigma_b <- icc_to_sigma(rho)
u_j <- generate_u(n_clusters, sigma_b, dist = re_dist)
sizes <- generate_cluster_sizes(n_clusters, m_mean, CV)
arm_assign <- sample(rep(0:1, length.out = n_clusters))
beta0 <- p_to_beta0(p0)
beta1 <- log(OR)
y <- integer(n_clusters)

for(j in seq_len(n_clusters)){ # iterate over each cluster
  # create the linear predictor (NOTE: beta1 turns 0 if arm0, and 1 * beta1 if arm1)
  linpred <- beta0 + beta1 * arm_assign[j] + u_j[j] 
  # apply the inverse logit (logistic function) to convert log-odds to probability
  p_j <- plogis(linpred) 
  # Simulate the number of successes in cluster j
  y[j] <- rbinom(1, size = sizes[j], prob = p_j) 
}

df_sim <- data.frame(cluster = seq_len(n_clusters),
                      arm = arm_assign,
                      size = sizes,
                      y = y)
df_sim
   cluster arm size   y
1        1   1   82  59
2        2   0   16   6
3        3   0   10   6
4        4   0   26  14
5        5   0   10   9
6        6   1   11  10
7        7   1    3   3
8        8   0   17  15
9        9   1   18  13
10      10   0    7   3
11      11   0   41  22
12      12   0    3   1
13      13   0   26  10
14      14   1   36  34
15      15   0    5   3
16      16   0   15  12
17      17   1   18  12
18      18   1    3   3
19      19   1   38  24
20      20   1   17  14
21      21   0   22  18
22      22   1   44  31
23      23   1    5   5
24      24   0   73  43
25      25   0    6   2
26      26   1    3   3
27      27   0   13   5
28      28   1    5   3
29      29   0    5   3
30      30   1   14  14
31      31   0    4   0
32      32   0    4   4
33      33   1   31  25
34      34   0   17  17
35      35   1    3   3
36      36   0   16  10
37      37   0    3   1
38      38   1   30  28
39      39   0    7   4
40      40   1   19  16
41      41   0   15   7
42      42   0   20  11
43      43   1   56  36
44      44   0    3   1
45      45   1   13  12
46      46   1   22  20
47      47   1   25  21
48      48   1   21  21
49      49   0    4   2
50      50   1   62  55
51      51   1    7   5
52      52   1    3   2
53      53   1    5   4
54      54   1   13  11
55      55   1    3   2
56      56   1    3   3
57      57   0    6   2
58      58   1   27  22
59      59   0    7   3
60      60   0    5   4
61      61   0   28  23
62      62   1    5   5
63      63   1   11   8
64      64   0   17  13
65      65   0   12   6
66      66   1    6   6
67      67   0   19  10
68      68   0   24  20
69      69   1   10   8
70      70   1   10   9
71      71   0   12   5
72      72   1    3   3
73      73   0    4   2
74      74   1    3   3
75      75   0   19  15
76      76   1   45  27
77      77   1   20  15
78      78   0    3   1
79      79   1   28  21
80      80   1  104 100
81      81   0    6   5
82      82   0   11   7
83      83   0   13   4
84      84   0    3   3
Code
mean_sizes <- df_sim %>%
  group_by(arm) %>%
  summarise(mean_size = mean(size))

ggplot(df_sim, aes(x = factor(cluster), y = size, fill = factor(arm))) +
  geom_bar(stat = "identity", color = "black") +
  geom_hline(data = mean_sizes, aes(yintercept = mean_size, color = factor(arm)),
             linetype = "dashed", size = 1, show.legend = FALSE) +
  geom_text(data = mean_sizes, aes(x = Inf, y = mean_size, label = paste0("Mean = ", round(mean_size, 1))),
            hjust = 1.1, vjust = -0.5, color = c("skyblue4", "tomato3"), size = 4) +
  scale_fill_manual(values = c("skyblue", "tomato"), labels = c("Control (arm=0)", "Intervention (arm=1)")) +
  scale_color_manual(values = c("skyblue4", "tomato3")) +
  labs(x = "Cluster", y = "Cluster Size", fill = "Treatment Group") +
  theme_minimal() +
  ggtitle("Cluster size per cluster") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

size = number of individuals in a cluster

y = number of individual-level successes (binary = 1) observed in the cluster, i.e., the number of individuals in that cluster with correct risk-tailored testing (not tested if low-risk, tested if high-risk)

(2.3) Simulate power, using cluster-level analysis approach

NOTES:

  • Use a cluster-level analysis (unweighted t-test on cluster-level log-odds, with 0.5 continuity correction, as per the guidance of Thompson, Leyrat et al. -> “clan” command)

  • Use the gamma distribution (most conservative) and simulate 500-1000 trials

(2.3.1) Create function

Code
simulate_power <- function(n_clusters = 84, 
                           m_mean = 18, 
                           CV = 0.98,
                           p0 = 0.63, 
                           p1 = 0.81, 
                           rho = 0.20,
                           re_dist = "gamma", 
                           n_sim = 1000,
                           alpha = 0.05, 
                           seed = 20250809) {
  set.seed(seed)
  
  # Compute derived parameters
  sigma_b <- icc_to_sigma(rho)
  beta0 <- p_to_beta0(p0)
  OR <- p0_p1_to_OR(p0, p1)
  beta1 <- log(OR)
  
  p_values <- numeric(n_sim)
  
  for (i in seq_len(n_sim)) {
    u_j <- generate_u(n_clusters, sigma_b, dist = re_dist)
    sizes <- generate_cluster_sizes(n_clusters, m_mean, CV)
    arm_assign <- sample(rep(0:1, length.out = n_clusters))
    
    y <- integer(n_clusters)
    for (j in seq_len(n_clusters)) {
      linpred <- beta0 + beta1 * arm_assign[j] + u_j[j]
      p_j <- plogis(linpred)
      y[j] <- rbinom(1, size = sizes[j], prob = p_j)
    }
    
    # Cluster-level log-odds with 0.5 continuity correction
    log_odds <- log((y + 0.5) / (sizes - y + 0.5))
    
    # Unweighted t-test
    group0 <- log_odds[arm_assign == 0]
    group1 <- log_odds[arm_assign == 1]
    
    test <- try(t.test(group1, group0, var.equal = TRUE), silent = TRUE)
    p_values[i] <- if (inherits(test, "try-error")) NA else test$p.value
  }
  
  # Estimate power
  mean(p_values < alpha, na.rm = TRUE)
}

(2.3.2) Calculate baseline scenario

Code
power_estimate <- simulate_power(n_clusters = 84,
                                 m_mean = 18,
                                 CV = 0.98,
                                 p0 = 0.63,
                                 p1 = 0.81,
                                 rho = 0.20,
                                 re_dist = "gamma",
                                 n_sim = 1000)

cat("Estimated power:", round(power_estimate, 3), "\n")
Estimated power: 0.945 

(2.3.3) Vary effect sizes

Code
p0_vals <- seq(0.35, 0.55, by = 0.05)
p1_vals <- seq(0.50, 0.85, by = 0.05)

grid <- expand.grid(p0 = p0_vals, p1 = p1_vals)

results <- grid %>%
  rowwise() %>%
  mutate(power = simulate_power(n_clusters = 84,
                                m_mean = 18,
                                CV = 0.98,
                                p0 = p0,
                                p1 = p1,
                                rho = 0.2,
                                re_dist = "gamma",
                                n_sim = 1000)) %>%
  ungroup()

# Plot
ggplot(results, aes(x = p1, y = power, color = factor(p0))) +
  
  # Shaded region above 90% power
  geom_rect(aes(xmin = -Inf, xmax = Inf, ymin = 0.9, ymax = Inf),
            fill = "lightgrey", alpha = 0.3, inherit.aes = FALSE) +
  
  # Power curves
  geom_line(size = 1.2) +
  geom_point() +
  
  # Labels and scales
  labs(title = "Power curves by p0 and p1",
       x = "Intervention group probability (p1)",
       y = "Estimated power",
       color = "Control group probability (p0)") +
  scale_y_continuous(breaks = seq(0, 1, by = 0.1),
                     limits = c(0, 1),
                     labels = scales::percent_format(accuracy = 1)) +
  scale_x_continuous(breaks = seq(0, 1, by = 0.05)) +
  theme_minimal(base_size = 14)

(2.3.4) Vary ICC

Code
# Vector of ICC values to test
icc_values <- seq(0.05, 0.45, by = 0.02)

# Run power simulations for each ICC
power_results <- sapply(icc_values, function(rho) {
  simulate_power(n_clusters = 84,
                 m_mean = 18,
                 CV = 0.98,
                 p0 = 0.63,
                 p1 = 0.81,
                 rho = rho,
                 re_dist = "gamma",
                 n_sim = 1000,
                 alpha = 0.05,
                 seed = 20250809)
})

# Create data frame for plotting
df_power_icc <- data.frame(ICC = icc_values, Power = power_results)

# Plot
ggplot(df_power_icc, aes(x = ICC, y = Power)) +
  geom_line(color = "darkred", size = 1.2) +
  geom_point(color = "firebrick") +
  labs(title = "Power Curve by ICC",
       x = "Intraclass Correlation (ICC)",
       y = "Estimated Power") +
  scale_y_continuous(breaks = seq(0.70, 1, by = 0.1),
                     limits = c(0.70, 1),
                     labels = scales::percent_format(accuracy = 1)) +
  theme_minimal()

(2.3.5) Vary number of clusters

Code
# Vector of cluster counts to test
n_clusters_vec <- seq(30, 110, by = 1)

# Run power simulations for each cluster count
power_results <- sapply(n_clusters_vec, function(nc) {
  simulate_power(n_clusters = nc,
                 m_mean = 18,
                 CV = 0.98,
                 p0 = 0.63,
                 p1 = 0.81,
                 rho = 0.20,
                 re_dist = "gamma",
                 n_sim = 1000,
                 alpha = 0.05,
                 seed = 20250809)
})

# Create data frame for plotting
df_power_css <- data.frame(Cluster_ss = n_clusters_vec, Power = power_results)

# Plot
ggplot(df_power_css, aes(x = Cluster_ss, y = Power)) +
  geom_line(color = "darkgreen", size = 1.2) +
  geom_point(color = "forestgreen") +
  labs(title = "Power vs Number of total clusters",
       x = "Total number of clusters",
       y = "Estimated power") +
  scale_y_continuous(breaks = seq(0.50, 1, by = 0.1),
                     limits = c(0.50, 1),
                     labels = scales::percent_format(accuracy = 1)) +
  scale_x_continuous(breaks = seq(30, 110, by = 5)) +
  theme_minimal()

(2.3.6) Vary number of individuals per cluster (mean cluster size)

Code
m_mean_vec <- seq(2, 40, by = 2)

# Run power simulations for each cluster count
power_results <- sapply(m_mean_vec, function(n) {
  simulate_power(n_clusters = 84,
                 m_mean = n,
                 CV = 0.98,
                 p0 = 0.63,
                 p1 = 0.81,
                 rho = 0.20,
                 re_dist = "gamma",
                 n_sim = 1000,
                 alpha = 0.05,
                 seed = 20250809)
})

# Create data frame for plotting
df_power_iss <- data.frame(Individual_ss = m_mean_vec, Power = power_results)

# Plot
ggplot(df_power_iss, aes(x = Individual_ss, y = Power)) +
  geom_line(color = "darkblue", size = 1.2) +
  geom_point(color = "skyblue") +
  labs(title = "Power vs Average number of individuals per cluster",
       x = "Average number of individuals per cluster",
       y = "Estimated power") +
  scale_y_continuous(breaks = seq(0.70, 1, by = 0.1),
                     limits = c(0.70, 1),
                     labels = scales::percent_format(accuracy = 1)) +
  scale_x_continuous(breaks = seq(2, 40, by = 2)) +
  theme_minimal()

(2.4) Simulate power, using GLMM analysis approach

NOTES:

  • As per trial protocol

  • 84 clusters in total, 42 per arm => number of clusters large enough to use a standard GLMM (without restricted pseudo-likelihood and reduced degrees of freedom, as per the guidance of Thompson, Leyrat et al.)

  • Keep the gamma distribution throughout

(2.4.1) Create function

Code
simulate_power_glmer <- function(n_clusters = 84, 
                                   m_mean = 18, 
                                   CV = 0.98,
                                   p0 = 0.63, 
                                   p1 = 0.81, 
                                   rho = 0.20,
                                   re_dist = "gamma", 
                                   n_sim = 1000,
                                   alpha = 0.05, 
                                   seed = 20250809) {
  set.seed(seed)
  
  sigma_b <- icc_to_sigma(rho)
  beta0 <- p_to_beta0(p0)
  OR <- p0_p1_to_OR(p0, p1)
  beta1 <- log(OR)
  
  p_values <- numeric(n_sim)
  
  for (i in seq_len(n_sim)) {
    u_j <- generate_u(n_clusters, sigma_b, dist = re_dist)
    sizes <- generate_cluster_sizes(n_clusters, m_mean, CV)
    arm_assign <- sample(rep(0:1, length.out = n_clusters))
    
    y <- integer(n_clusters)
    for (j in seq_len(n_clusters)) {
      linpred <- beta0 + beta1 * arm_assign[j] + u_j[j]
      p_j <- plogis(linpred)
      y[j] <- rbinom(1, size = sizes[j], prob = p_j)
    }
    
    dd_sim <- data.frame(
      y = y,
      size = sizes,
      arm = factor(arm_assign),
      cluster = factor(seq_len(n_clusters))
    )
    
    # Fit GLMM using glmer
    fit <- try(
      lme4::glmer(
        cbind(y, size - y) ~ arm + (1 | cluster),
        family = binomial(link = "logit"),
        data = dd_sim,
        control = lme4::glmerControl(optimizer = "bobyqa") # see Bates et al. (lme4 developers)
      ),
      silent = TRUE
    )

    if (!inherits(fit, "try-error")) {
      sm <- summary(fit)
      p_values[i] <- sm$coefficients["arm1", "Pr(>|z|)"]
    } else {
      p_values[i] <- NA
    }
  }

  mean(p_values < alpha, na.rm = TRUE)
}

(2.4.2) Calculate baseline scenario

Code
power_estimate <- simulate_power_glmer(n_clusters = 84,
                                         m_mean = 18,
                                         CV = 0.98,
                                         p0 = 0.63,
                                         p1 = 0.81,
                                         rho = 0.20,
                                         re_dist = "gamma",
                                         n_sim = 1000)

cat("Estimated power (GLMM):", round(power_estimate, 3), "\n")
Estimated power (GLMM): 0.981 

Currently, the cluster size variation (CV = 0.98) is high, which makes the estimation imprecise. We need to refine the eligibility criteria for the clusters (physicians).

(2.4.3) Vary effect sizes

Code
# p0_vals <- seq(0.35, 0.55, by = 0.05)
# p1_vals <- seq(0.50, 0.85, by = 0.05)
# 
# grid_glmm <- expand.grid(p0 = p0_vals, p1 = p1_vals)
# 
# # Use map2 to apply the function to each p0/p1 pair, bit more efficient
# grid_glmm$power <- map2_dbl(grid_glmm$p0, grid_glmm$p1, ~ simulate_power_glmer(
#   n_clusters = 84,
#   m_mean = 18,
#   CV = 0.98,
#   p0 = .x,
#   p1 = .y,
#   rho = 0.2,
#   re_dist = "gamma",
#   n_sim = 300 # reduced for speed
# ))
# 
# ggplot(grid_glmm, aes(x = p1, y = power, color = factor(p0))) +
#   geom_rect(aes(xmin = -Inf, xmax = Inf, ymin = 0.9, ymax = Inf),
#             fill = "lightgrey", alpha = 0.3, inherit.aes = FALSE) +
#   geom_line(size = 1.2) +
#   geom_point() +
#   labs(title = "Power Curves by p0 and p1 (GLMM)",
#        x = "Intervention group probability (p1)",
#        y = "Estimated power",
#        color = "Control group probability (p0)") +
#   scale_y_continuous(breaks = seq(0, 1, by = 0.1),
#                      limits = c(0, 1),
#                      labels = scales::percent_format(accuracy = 1)) +
#   theme_minimal(base_size = 14)

(2.4.4) Vary ICC

Code
# icc_values <- seq(0.05, 0.35, by = 0.02)
# 
# power_results_glmm <- sapply(icc_values, function(rho) {
#   simulate_power_glmer(n_clusters = 84,
#                  m_mean = 18,
#                  CV = 0.98,
#                  p0 = 0.63,
#                  p1 = 0.81,
#                  rho = rho,
#                  re_dist = "gamma",
#                  n_sim = 300, # reduced for speed
#                  alpha = 0.05,
#                  seed = 20250809)
# })
# 
# df_power_icc_glmm <- data.frame(ICC = icc_values, Power = power_results_glmm)
# 
# ggplot(df_power_icc_glmm, aes(x = ICC, y = Power)) +
#   geom_line(color = "darkred", size = 1.2) +
#   geom_point(color = "firebrick") +
#   labs(title = "Power curve by ICC (GLMM)",
#        x = "Intra-cluster correlation (ICC)",
#        y = "Estimated Power") +
#   scale_y_continuous(breaks = seq(0.70, 1, by = 0.1),
#                      limits = c(0.70, 1),
#                      labels = scales::percent_format(accuracy = 1)) +
#   theme_minimal()

(2.4.5) Vary number of clusters

Code
# n_clusters_vec <- seq(80, 120, by = 1)
# 
# power_results_glmm <- sapply(n_clusters_vec, function(nc) {
#   simulate_power_glmer(n_clusters = nc,
#                  m_mean = 18,
#                  CV = 0.98,
#                  p0 = 0.63,
#                  p1 = 0.81,
#                  rho = 0.20,
#                  re_dist = "gamma",
#                  n_sim = 300, # reduced for speed
#                  alpha = 0.05,
#                  seed = 20250809)
# })
# 
# df_power_css_glmm <- data.frame(Cluster_ss = n_clusters_vec, Power = power_results_glmm)
# 
# ggplot(df_power_css_glmm, aes(x = Cluster_ss, y = Power)) +
#   geom_line(color = "darkgreen", size = 1.2) +
#   geom_point(color = "forestgreen") +
#   labs(title = "Power vs Number of total clusters (GLMM)",
#        x = "Total number of clusters",
#        y = "Estimated power") +
#   scale_y_continuous(breaks = seq(0.70, 1, by = 0.1),
#                      limits = c(0.70, 1),
#                      labels = scales::percent_format(accuracy = 1)) +
#   scale_x_continuous(breaks = seq(80, 120, by = 5)) +
#   theme_minimal()

(3) Simulate the full dataset and implement the main analysis strategy

The main analysis strategy:

  • Generalized linear mixed model (GLMM) with restricted pseudo-likelihood estimation and small-sample correction of the degrees of freedom (number of clusters minus number of cluster-level parameters), as suggested by Thompson and colleagues (could be dropped here with 50+ clusters per arm; a minimal sketch of the degrees-of-freedom correction follows after this list)

  • We will adjust the model for these a priori defined covariates (as fixed effects):

    • Cluster-level covariates: baseline testing rate (cluster mean, as a continuous covariate assuming linearity) and site (7 levels)

    • Individual-level covariates:

      • Self-reported sex/gender (for now, as binary covariate: male, female)

      • Age (as a continuous covariate, assuming a non-linear association modelled using restricted cubic splines with 3 knots at the 10th, 50th and 90th percentiles of the observed age distribution)

      • Region of origin (xxx levels)

  • We will report the resulting treatment-effect (beta-1), which is the log-odds difference between intervention and control or – when exponentiated – the adjusted odds ratio, with its 95% confidence interval. This represents a relative cluster-specific effect, conditional on all included covariates.

  • In addition, we will use marginal standardization and report the resulting population-average marginal relative risk and risk difference with their 95% confidence intervals (this is primarily an individual-average estimate, but equals a cluster-average estimate if cluster size is not informative)
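
As a minimal sketch of the degrees-of-freedom part of the correction mentioned in the first bullet above (illustrative values only, not from a fitted model; the restricted pseudo-likelihood estimation itself is not shown): the Wald statistic of the treatment coefficient is referred to a t distribution with degrees of freedom equal to the number of clusters minus the number of cluster-level parameters.

Code
# Minimal sketch of the small-sample df correction (z_arm, K and q are illustrative assumptions)
z_arm <- 2.10      # hypothetical Wald z statistic for the treatment coefficient
K <- 84            # total number of clusters
q <- 2             # cluster-level parameters in a minimal model (intercept + arm); more if cluster-level covariates are added
df_small <- K - q
p_normal <- 2 * pnorm(-abs(z_arm))             # default Wald p-value (normal reference)
p_t      <- 2 * pt(-abs(z_arm), df = df_small) # small-sample corrected p-value (t reference)
round(c(p_normal = p_normal, p_t = p_t), 4)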

Notes on simulating a realistic dataset:

  • Reuse the helper functions from chapter 2.2, incl. conservative gamma distribution for u_j

  • Causal structure:

    • Cluster latent effect (u_j) influences baseline IGRA testing rate (through alpha, the correlation strength between baseline and u_j) and directly affects the outcome via the cluster random effect

    • Baseline IGRA testing rate directly affects the outcome (via beta_baseline), representing residual correlation beyond the shared cluster effect alpha

    • => baseline_rate directly pushes the outcome (via beta_baseline) and shares correlation with u_j (i.e., indirectly pushes the outcome), in the sense of: “Clusters with higher IGRA testing rate at baseline -> higher individually risk-tailored testing rate at endline”

    • Treatment (arm 1) directly affects the outcome (through beta_1), but the correlation above and the noise below partially mask it

  • Add some baseline noise (e.g. tau = 0.45) ensuring that even clusters with the same u_j will show some variability in their observed baseline_rate

  • alpha (correlation baseline_rate with u_j): e.g. a value of 0.3 means 30% of the variation in the baseline logit is driven by u_j (i.e. drives true cluster tendency or the “cluster-to-cluster variation” at baseline, which also has an impact on the outcome), while the remaining comes from independent measurement noise.

  • Produce an individual-level dataset, not cluster-level only - as the real-life dataset will look like and in case we also want to add individual-level correlations

(3.1) Simulate one dataset and check some diagnostics

Code
## We use the helper functions from chapter 2.2:

# icc_to_sigma
# generate_u
# generate_cluster_sizes
# p_to_beta0
# p0_p1_to_OR

## General parameters
set.seed(20250809)
n_clusters <- 84
m_mean <- 18
CV <- 0.98
p0 <- 0.63
p1 <- 0.81
OR <- p0_p1_to_OR(p0, p1)
rho <- 0.20 # ICC (on latent logit scale)
re_dist <- "gamma" # distribution for u_j, keep it conservative

# Individual-level covariates
age_mean <- 35
age_sd <- 12
sex_prob <- 0.40

## Generate cluster structure
sizes <- generate_cluster_sizes(n_clusters, m_mean, CV)
sigma_b <- icc_to_sigma(rho)
u_j <- generate_u(n_clusters, sigma_b, dist = re_dist)
arm_assign <- sample(rep(0:1, length.out = n_clusters))

# First important thing to mimic: IGRA testing rate at baseline
# alpha controls how much the baseline rate depends on the same latent cluster effect
# The bigger alpha, the more high-baseline clusters will also tend to have high endline outcomes indirectly, because u_j is reused in the outcome model => indirect correlation
# Baseline IGRA testing rate is explained by u_j + random noise eps + global average level gamma0.
gamma0 <- qlogis(p0) # the average cluster-level log-odds of baseline IGRA testing rate
alpha <- 0.3 # how much baseline (logit) depends on u_j (i.e. the latent cluster effect); 0 would be no correlation
tau <- 0.45 # the residual variation (SD) in baseline log-odds not explained by u_j, i.e. the random measurement noise
eps <- rnorm(n_clusters, 0, tau)
logit_b <- gamma0 + alpha * u_j + eps # putting it all together
baseline_rate <- plogis(logit_b) # map back to probability scale

# Second, the sites: uncorrelated 7-level
site <- sample(1:7, n_clusters, replace = TRUE,
               prob = c(0.05, 0.15, 0.20, 0.10, 0.15, 0.25, 0.10))

## Fixed effects on outcome, direct correlations on outcome
beta0 <- p_to_beta0(p0) # intercept
beta1 <- log(OR) # intervention effect
beta_baseline <- 0.5 # how strongly the baseline rate predicts the endline outcome, independent of u_j
beta_site <- 0.0 # no correlation
beta0_adj <- beta0 - 1.0 # after including u_j, baseline_rate, etc., the overall mean outcome probability can drift because of the nonlinear logistic function. Stabilize.

## Simulate individual-level data
ind_list <- vector("list", length = n_clusters)
for(j in seq_len(n_clusters)){
  nj <- sizes[j]
  age_j <- rnorm(nj, mean = age_mean, sd = age_sd) # draw from normal
  sex_j <- rbinom(nj, 1, prob = sex_prob) # draw from bernoulli
  
  logit_baseline_j <- qlogis(baseline_rate[j]) # back to logit
  # the log-odds of IGRA testing rate for all individuals in cluster j (same cluster-level predictors for all)
  linpred_j <- beta0 +
               beta1 * arm_assign[j] +
               beta_baseline * logit_baseline_j +
               beta_site * site[j] +
               u_j[j] # latent cluster random effect
  
  p_ij <- plogis(linpred_j) # Predicted probability of receiving an IGRA testing for each individual in cluster j. Since all individuals in a cluster share the same cluster-level covariates, p_ij is identical for everyone in the cluster (unless we later include individual-level predictors...)
  y_ij <- rbinom(nj, 1, p_ij) # the outcome; bernoulli with probability p_ij
  
  # save data for this one cluster
  ind_list[[j]] <- data.frame(
    cluster = j,
    arm = arm_assign[j],
    age = age_j,
    sex = sex_j,
    site = site[j],
    baseline_rate = baseline_rate[j],
    u_j = u_j[j],
    p = p_ij,
    y = y_ij
  )
}
df_ind <- do.call(rbind, ind_list)

## Cluster-level summary, aggregate at cluster-level
df_cluster <- aggregate(y ~ cluster + arm, data = df_ind, sum) # aggregate number of outcomes per cluster
df_cluster$size <- as.integer(table(df_ind$cluster)[as.character(df_cluster$cluster)]) # individuals per cluster => cluster size (matched by cluster id, not by row position)
cluster_meta <- data.frame(
  cluster = seq_len(n_clusters),
  arm = arm_assign,
  site = site,
  baseline_rate = baseline_rate,
  u_j = u_j
)
df_sim <- merge(df_cluster, cluster_meta, by = c("cluster","arm"))
df_sim <- df_sim[order(df_sim$cluster),
                 c("cluster","arm","size","y","baseline_rate",
                   "site","u_j")]

## Diagnostics
cat("Mean baseline_rate =", round(mean(baseline_rate),3), "\n")
Mean baseline_rate = 0.615 
Code
cat("First few cluster-level rows:\n")
First few cluster-level rows:
Code
print(head(df_sim, 10))
   cluster arm size  y baseline_rate site         u_j
1        1   1    3 12     0.5241710    6 -0.54533049
12       2   0   15 12     0.6786808    7 -0.02958850
23       3   0   20 21     0.5343476    7  0.13334179
34       4   1    5  4     0.5292495    4  0.02987162
45       5   0   33  4     0.8141927    3 -0.50985282
56       6   0    5 11     0.7074863    7 -1.00681936
67       7   1    3 29     0.4490206    5 -0.34833021
78       8   0    5  2     0.6245495    2 -0.43983164
84       9   1   30 24     0.6804400    5  0.33511213
2       10   0   13 12     0.5644116    5  1.21584720
Code
cat("First few individual-level rows:\n")
First few individual-level rows:
Code
print(head(df_ind, 50))
   cluster arm       age sex site baseline_rate        u_j         p y
1        1   1 43.487902   1    6     0.5241710 -0.5453305 0.7217300 0
2        1   1 36.093478   0    6     0.5241710 -0.5453305 0.7217300 1
3        1   1 35.830547   0    6     0.5241710 -0.5453305 0.7217300 1
4        1   1 61.808917   0    6     0.5241710 -0.5453305 0.7217300 1
5        1   1 33.756109   0    6     0.5241710 -0.5453305 0.7217300 0
6        1   1 21.832337   0    6     0.5241710 -0.5453305 0.7217300 1
7        1   1 41.505355   1    6     0.5241710 -0.5453305 0.7217300 1
8        1   1 26.195464   0    6     0.5241710 -0.5453305 0.7217300 1
9        1   1 41.345142   1    6     0.5241710 -0.5453305 0.7217300 1
10       1   1 48.646471   0    6     0.5241710 -0.5453305 0.7217300 1
11       1   1 25.924216   1    6     0.5241710 -0.5453305 0.7217300 1
12       1   1 25.958232   0    6     0.5241710 -0.5453305 0.7217300 1
13       1   1 58.882986   1    6     0.5241710 -0.5453305 0.7217300 0
14       1   1 33.900753   0    6     0.5241710 -0.5453305 0.7217300 1
15       1   1 49.396689   1    6     0.5241710 -0.5453305 0.7217300 1
16       2   0 19.820009   0    7     0.6786808 -0.0295885 0.7060936 1
17       2   0 49.845996   0    7     0.6786808 -0.0295885 0.7060936 0
18       2   0 25.980348   0    7     0.6786808 -0.0295885 0.7060936 1
19       2   0 34.500022   0    7     0.6786808 -0.0295885 0.7060936 1
20       2   0 56.066420   1    7     0.6786808 -0.0295885 0.7060936 1
21       2   0  8.990755   0    7     0.6786808 -0.0295885 0.7060936 0
22       2   0 44.342623   1    7     0.6786808 -0.0295885 0.7060936 1
23       2   0 52.066237   0    7     0.6786808 -0.0295885 0.7060936 1
24       2   0 52.103608   0    7     0.6786808 -0.0295885 0.7060936 0
25       2   0 23.644091   1    7     0.6786808 -0.0295885 0.7060936 1
26       2   0 40.964825   1    7     0.6786808 -0.0295885 0.7060936 1
27       2   0 43.050042   0    7     0.6786808 -0.0295885 0.7060936 0
28       2   0 36.919039   0    7     0.6786808 -0.0295885 0.7060936 0
29       2   0 25.352254   0    7     0.6786808 -0.0295885 0.7060936 1
30       2   0 27.091611   1    7     0.6786808 -0.0295885 0.7060936 1
31       2   0 33.633571   1    7     0.6786808 -0.0295885 0.7060936 1
32       2   0 39.662819   0    7     0.6786808 -0.0295885 0.7060936 0
33       2   0 22.123820   0    7     0.6786808 -0.0295885 0.7060936 0
34       2   0 40.446962   0    7     0.6786808 -0.0295885 0.7060936 0
35       2   0 44.056765   1    7     0.6786808 -0.0295885 0.7060936 1
36       3   0 26.055303   0    7     0.5343476  0.1333418 0.6757618 0
37       3   0 34.919207   0    7     0.5343476  0.1333418 0.6757618 0
38       3   0 39.721004   0    7     0.5343476  0.1333418 0.6757618 1
39       3   0 25.793194   0    7     0.5343476  0.1333418 0.6757618 1
40       3   0 23.434322   0    7     0.5343476  0.1333418 0.6757618 0
41       3   0 22.620895   0    7     0.5343476  0.1333418 0.6757618 1
42       3   0 31.902922   0    7     0.5343476  0.1333418 0.6757618 1
43       3   0 37.658602   0    7     0.5343476  0.1333418 0.6757618 1
44       3   0 45.585714   0    7     0.5343476  0.1333418 0.6757618 1
45       3   0 18.930409   0    7     0.5343476  0.1333418 0.6757618 1
46       3   0 45.909040   1    7     0.5343476  0.1333418 0.6757618 1
47       3   0 42.280878   0    7     0.5343476  0.1333418 0.6757618 0
48       3   0 34.655976   0    7     0.5343476  0.1333418 0.6757618 0
49       3   0 38.711691   1    7     0.5343476  0.1333418 0.6757618 1
50       3   0 50.531720   0    7     0.5343476  0.1333418 0.6757618 0
Code
cat("\nOverall N =", sum(df_sim$size), "individuals across", n_clusters, "clusters\n")

Overall N = 1494 individuals across 84 clusters
Code
# Compute the mean observed outcome (risk-tailored testing) rate per arm
arm_rates <- aggregate(y ~ arm, data = df_ind, mean)
arm_rates$y <- round(arm_rates$y, 3)
for(i in seq_len(nrow(arm_rates))){
  cat("Arm", arm_rates$arm[i], "observed IGRA testing rate:", arm_rates$y[i], "\n")
}
Arm 0 observed IGRA testing rate: 0.667 
Arm 1 observed IGRA testing rate: 0.797 
Code
invisible(list(individual = df_ind, cluster = df_sim)) # prevents automatic printing to console

(3.2) The primary analysis approach (GLMM, fully adjusted)

Code
## Precompute spline basis for age and convert to numeric
age_spline <- as.data.frame(ns(df_ind$age, knots = quantile(df_ind$age, probs=c(0.1,0.5,0.9))))
colnames(age_spline) <- paste0("age_spline", seq_len(ncol(age_spline)))
age_spline[] <- lapply(age_spline, as.numeric)
df_ind <- cbind(df_ind, age_spline)

## Ensure factor levels
df_ind$arm <- factor(df_ind$arm, levels = c(0,1)) # 0 = control, 1 = intervention
df_ind$sex <- factor(df_ind$sex, levels = c(0,1)) # 0 = male, 1 = female
df_ind$site <- factor(df_ind$site, levels = c(1,2,3,4,5,6,7))

## Fit GLMM using glmer (fully adjusted, age spline) 
spline_cols <- colnames(df_ind)[grepl("^age_spline", colnames(df_ind))]

form_primary <- as.formula(
  paste("y ~ arm + baseline_rate + site + sex +",
        paste(spline_cols, collapse = " + "),
        "+ (1 | cluster)")
)
model_primary <- lme4::glmer(
  formula = form_primary,
  family = binomial(link = "logit"),
  data = df_ind,
  control = lme4::glmerControl(optimizer = "bobyqa") 
)

### Now, let's make a few comparisons
## 1. Unadjusted OR
form_unadj <- as.formula(paste("y ~ arm + (1 | cluster)"))
model_unadj <- lme4::glmer(
  formula = form_unadj,
  family = binomial(link = "logit"),
  data = df_ind,
  control = lme4::glmerControl(optimizer = "bobyqa") 
)

coef_name_unadj <- grep("^arm", names(fixef(model_unadj)), value=TRUE)
coef_arm_unadj <- fixef(model_unadj)[coef_name_unadj]
se_arm_unadj <- summary(model_unadj)$coefficients[coef_name_unadj,"Std. Error"]
p_val_unadj <- summary(model_unadj)$coefficients[coef_name_unadj,"Pr(>|z|)"]
OR_unadj <- exp(coef_arm_unadj)
CI_unadj <- exp(confint(model_unadj, method="Wald")[coef_name_unadj, ])

## 2. Adjusted for stratification variables only
form_strata <- as.formula(paste("y ~ arm + baseline_rate + site + (1 | cluster)"))
model_strata <- lme4::glmer(
  formula = form_strata,
  family = binomial(link = "logit"),
  data = df_ind,
  control = lme4::glmerControl(optimizer = "bobyqa") 
)

coef_name_strata <- grep("^arm", names(fixef(model_strata)), value=TRUE)
coef_arm_strata <- fixef(model_strata)[coef_name_strata]
se_arm_strata <- summary(model_strata)$coefficients[coef_name_strata,"Std. Error"]
p_val_strata <- summary(model_strata)$coefficients[coef_name_strata,"Pr(>|z|)"]
OR_strata <- exp(coef_arm_strata)
CI_strata <- exp(confint(model_strata, method="Wald")[coef_name_strata, ])

## 3. Fully adjusted, age as spline (see main model above)
coef_name_full <- grep("^arm", names(fixef(model_primary)), value=TRUE)
coef_arm_full <- fixef(model_primary)[coef_name_full]
se_arm_full <- summary(model_primary)$coefficients[coef_name_full,"Std. Error"]
p_val_full <- summary(model_primary)$coefficients[coef_name_full,"Pr(>|z|)"]
OR_full <- exp(coef_arm_full)
CI_full <- exp(confint(model_primary, method="Wald")[coef_name_full, ])

## 4. And finally, calculate RR for the main model, using marginal standardization
RR_model <- tryCatch({
  avg_comparisons(model_primary, variables="arm", type="response", comparison="ratio", re.form = NA)
}, error=function(e) NULL)

if(!is.null(RR_model)){
  rr <- RR_model$estimate[1]
  rr_cl <- RR_model$conf.low[1]
  rr_ch <- RR_model$conf.high[1]
} else {
  rr <- rr_cl <- rr_ch <- NA_real_
}

## Combine it all into a table
results_table <- data.frame(
  Metric = c("Unadjusted", "Adjusted for strat only", "Fully adjusted; age spline"),
  OR = c(sprintf("%.3f", OR_unadj),
         sprintf("%.3f", OR_strata),
         sprintf("%.3f", OR_full)),
  CI_lower = c(sprintf("%.3f", CI_unadj[1]),
               sprintf("%.3f", CI_strata[1]),
               sprintf("%.3f", CI_full[1])),
  CI_upper = c(sprintf("%.3f", CI_unadj[2]),
               sprintf("%.3f", CI_strata[2]),
               sprintf("%.3f", CI_full[2])),
  wald_based_p_value = c(sprintf("%.3f", p_val_unadj), sprintf("%.3f", p_val_strata), sprintf("%.3f", p_val_full)),
  RR = c(NA, NA, sprintf("%.3f", rr)),
  RR_CI_lower = c(NA, NA, sprintf("%.3f", rr_cl)),
  RR_CI_upper = c(NA, NA, sprintf("%.3f", rr_ch))
)

results_table %>%
  kable("html", caption="Intervention effect: OR and RR with 95% CI (single simulation)") %>%
  kable_styling(bootstrap_options="striped", full_width=FALSE)
Intervention effect: OR and RR with 95% CI (single simulation)
Metric OR CI_lower CI_upper wald_based_p_value RR RR_CI_lower RR_CI_upper
Unadjusted 1.853 1.188 2.888 0.007 NA NA NA
Adjusted for strat only 2.266 1.615 3.179 0.000 NA NA NA
Fully adjusted; age spline 2.281 1.627 3.199 0.000 1.223 1.121 1.325

CAVE: This is 1 randomly simulated dataset.

Also: Add third individual-level covariate (region of origin)

Due to the correlation structure, the adjustment for the baseline outcome rate (part of the stratification factors) increases power and precision. The further adjustment for individual-level covariates does not change much, since no correlation is simulated at that level.

The RR is only constructed for the primary (fully adjusted) model.
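
As a complement, a minimal sketch of the corresponding population-average marginal risk difference for the primary model, using the same marginaleffects call as above but with comparison = "difference" (re.form = NA again sets the cluster random effects to zero for prediction):

Code
# Sketch: population-average marginal risk difference (same approach as the RR above)
RD_model <- tryCatch({
  avg_comparisons(model_primary, variables = "arm", type = "response",
                  comparison = "difference", re.form = NA)
}, error = function(e) NULL)

if(!is.null(RD_model)){
  cat("Marginal risk difference:", round(RD_model$estimate[1], 3),
      "(95% CI", round(RD_model$conf.low[1], 3), "to", round(RD_model$conf.high[1], 3), ")\n")
}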

(3.3) Put all together and simulate the power

100 simulations (kept low here for speed; increase n_sims for the final calculation), based on the dataset simulation (Chapter 3.1) and the primary analysis model (Chapter 3.2)

Code
simulate_crt <- function(
  n_clusters = 84,
  m_mean = 18,
  CV = 0.98,
  p0 = 0.63,
  p1 = 0.81,
  rho = 0.20,
  re_dist = "gamma",
  alpha = 0.3, # weak-moderate correlation between u_j and baseline outcome rate
  tau = 0.45, # SD of baseline noise
  beta_baseline = 0.2, # weak-moderate pos correlation: baseline outcome rate -> outcome, independent of u_j
  beta_site = 0.0, # no direct site effect on the outcome (passed explicitly rather than relying on the global from chapter 3.1)
  age_mean = 35,
  age_sd = 12,
  sex_prob = 0.40
){

  # (1) Compute OR and intercept
  OR <- p0_p1_to_OR(p0, p1)
  beta0 <- p_to_beta0(p0)
  beta1 <- log(OR)
  beta0_adj <- beta0 - 1.0
  
  # (2) Generate clusters
  sizes <- generate_cluster_sizes(n_clusters, m_mean, CV)
  sigma_b <- icc_to_sigma(rho)
  u_j <- generate_u(n_clusters, sigma_b, dist = re_dist)
  arm_assign <- sample(rep(0:1, length.out = n_clusters))
  
  # (3) Baseline IGRA testing rate
  eps <- rnorm(n_clusters, 0, tau)
  logit_b <- qlogis(p0) + alpha * u_j + eps
  baseline_rate <- plogis(logit_b)
  
  # (4) site
  site <- sample(1:7, n_clusters, replace = TRUE,
               prob = c(0.05, 0.15, 0.20, 0.10, 0.15, 0.25, 0.10))
  
  # (5) Individual-level simulation
  ind_list <- vector("list", length = n_clusters)
  for(j in seq_len(n_clusters)){
    nj <- sizes[j]
    age_j <- rnorm(nj, mean = age_mean, sd = age_sd)
    sex_j <- rbinom(nj, 1, sex_prob)
    logit_baseline_j <- qlogis(baseline_rate[j])
    
    linpred_j <- beta0_adj +
                 beta1 * arm_assign[j] +
                 beta_baseline * logit_baseline_j +
                 beta_site * site[j] + 
                 u_j[j]
    
    p_ij <- plogis(linpred_j)
    y_ij <- rbinom(nj, 1, p_ij)
    
    ind_list[[j]] <- data.frame(
      cluster = j,
      arm = arm_assign[j],
      age = age_j,
      sex = sex_j,
      site = site[j],
      baseline_rate = baseline_rate[j],
      u_j = u_j[j],
      p = p_ij,
      y = y_ij
    )
  }
  
  df_ind <- do.call(rbind, ind_list)
  
  # (6) Cluster-level summary
  df_cluster <- aggregate(y ~ cluster + arm, data = df_ind, sum)
  df_cluster$size <- as.integer(table(df_ind$cluster)[as.character(df_cluster$cluster)]) # matched by cluster id, not by row position
  cluster_meta <- data.frame(
    cluster = seq_len(n_clusters),
    arm = arm_assign,
    site = site,
    baseline_rate = baseline_rate,
    u_j = u_j
  )
  df_sim <- merge(df_cluster, cluster_meta, by = c("cluster","arm"))
  df_sim <- df_sim[order(df_sim$cluster),
                   c("cluster","arm","size","y","baseline_rate",
                     "site","u_j")]
  
  return(list(
    individual = df_ind,
    cluster = df_sim
  ))
}

# Default simulation
# sim_data <- simulate_crt()
# df_ind <- sim_data$individual
# df_cluster <- sim_data$cluster

# Number of simulations
n_sims <- 100
set.seed(20250809)

# Storage
results <- data.frame(
  sim = 1:n_sims,
  unadj_signif = NA,
  adj_signif = NA,
  beta_unadj = NA,
  beta_adj = NA,
  OR_unadj = NA,
  OR_unadj_lower = NA,
  OR_unadj_upper = NA,
  OR_adj = NA,
  OR_adj_lower = NA,
  OR_adj_upper = NA
)

for(i in seq_len(n_sims)){
  
  # (1) Simulate trial
  sim_data <- simulate_crt()
  df_ind <- sim_data$individual
  
  # (2) Prepare age spline
  age_spline <- as.data.frame(ns(df_ind$age, 
                                 knots = quantile(df_ind$age, probs=c(0.1,0.5,0.9))))
  colnames(age_spline) <- paste0("age_spline", seq_len(ncol(age_spline)))
  df_ind <- cbind(df_ind, age_spline)
  
  # (3) Ensure factors
  df_ind$arm <- factor(df_ind$arm, levels = c(0,1))
  df_ind$sex <- factor(df_ind$sex, levels = c(0,1))
  df_ind$site <- factor(df_ind$site, levels = c(1,2,3,4,5,6,7))
  
  # (4) Unadjusted model
  form_unadj <- as.formula(paste("y ~ arm + (1 | cluster)"))
  model_unadj <- lme4::glmer(
  formula = form_unadj,
  family = binomial(link = "logit"),
  data = df_ind,
  control = lme4::glmerControl(optimizer = "bobyqa")
  )
  coef_name_unadj <- grep("^arm", names(fixef(model_unadj)), value=TRUE)
  beta1_unadj <- fixef(model_unadj)[coef_name_unadj]
  se1_unadj <- summary(model_unadj)$coefficients[coef_name_unadj,"Std. Error"]
  pval_unadj <- summary(model_unadj)$coefficients[coef_name_unadj,"Pr(>|z|)"]
  
  # (5) Fully adjusted model: arm plus cluster-level baseline rate, site, sex, and age spline
  spline_cols <- colnames(df_ind)[grepl("^age_spline", colnames(df_ind))]
  form_adj <- as.formula(
    paste("y ~ arm + baseline_rate + site + sex +",
          paste(spline_cols, collapse = " + "),
          "+ (1 | cluster)")
  )
  model_adj <- lme4::glmer(
    formula = form_adj,
    family = binomial(link = "logit"),
    data = df_ind,
    control = lme4::glmerControl(optimizer = "bobyqa")
  )
  coef_name_adj <- grep("^arm", names(fixef(model_adj)), value=TRUE)
  beta1_adj <- fixef(model_adj)[coef_name_adj]
  se1_adj <- summary(model_adj)$coefficients[coef_name_adj,"Std. Error"]
  pval_adj <- summary(model_adj)$coefficients[coef_name_adj,"Pr(>|z|)"]
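  
  # Optional (sketch, not run here; marginaleffects is listed in the packages above):
  # a marginal / standardized treatment effect from the adjusted model could be
  # obtained with, e.g.,
  # marginaleffects::avg_comparisons(model_adj, variables = "arm")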
  
  # (6) Save results: log-ORs, ORs with Wald CIs, and significance flags
  results$beta_unadj[i] <- beta1_unadj
  results$beta_adj[i]   <- beta1_adj
  
  CI_unadj <- exp(confint(model_unadj, method="Wald")[coef_name_unadj, ])
  results$OR_unadj[i] <- exp(beta1_unadj)
  results$OR_unadj_lower[i] <- CI_unadj[1]
  results$OR_unadj_upper[i] <- CI_unadj[2]
  
  CI_adj <- exp(confint(model_adj, method="Wald")[coef_name_adj, ])
  results$OR_adj[i] <- exp(beta1_adj)
  results$OR_adj_lower[i] <- CI_adj[1]
  results$OR_adj_upper[i] <- CI_adj[2]
  
  results$unadj_signif[i] <- (pval_unadj < 0.05)
  results$adj_signif[i]   <- (pval_adj < 0.05)
}
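
# Note (assumption, not part of the run above): with ~18 participants per cluster
# and an ICC of 0.20, individual glmer fits can occasionally fail to converge.
# A more defensive loop would wrap each fit, e.g.
#   fit <- tryCatch(
#     lme4::glmer(form_adj, family = binomial, data = df_ind,
#                 control = lme4::glmerControl(optimizer = "bobyqa")),
#     error = function(e) NULL
#   )
# and skip or flag simulations where fit is NULL or carries convergence warnings,
# so that the power estimate is based on usable fits only.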

# (7) Compute estimated power
power_unadj <- mean(results$unadj_signif)
power_adj <- mean(results$adj_signif)

cat("Estimated power (unadjusted)  =", round(power_unadj,4), "\n")
Estimated power (unadjusted)  = 0.98 
Code
cat("Estimated power (fully adjusted) =", round(power_adj,4), "\n")
Estimated power (fully adjusted) = 0.99 
Code
# Summary of ORs
summary(results[,c("OR_unadj","OR_unadj_lower","OR_unadj_upper",
                   "OR_adj","OR_adj_lower","OR_adj_upper")])
    OR_unadj     OR_unadj_lower   OR_unadj_upper      OR_adj     
 Min.   :1.421   Min.   :0.8232   Min.   :2.360   Min.   :1.480  
 1st Qu.:2.075   1st Qu.:1.2656   1st Qu.:3.387   1st Qu.:2.061  
 Median :2.354   Median :1.4809   Median :3.733   Median :2.449  
 Mean   :2.460   Mean   :1.5329   Mean   :3.957   Mean   :2.557  
 3rd Qu.:2.820   3rd Qu.:1.7779   3rd Qu.:4.479   3rd Qu.:2.799  
 Max.   :4.223   Max.   :2.4439   Max.   :7.297   Max.   :5.040  
  OR_adj_lower     OR_adj_upper  
 Min.   :0.9115   Min.   :2.310  
 1st Qu.:1.3575   1st Qu.:3.129  
 Median :1.6095   Median :3.685  
 Mean   :1.6821   Mean   :3.896  
 3rd Qu.:1.8696   3rd Qu.:4.197  
 Max.   :3.1847   Max.   :7.976  
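
With 100 simulations, the power estimates above carry non-negligible Monte Carlo error. A minimal sketch (reusing the results object above) to attach a binomial standard error and approximate 95% CI to each power estimate:

Code
# Monte Carlo uncertainty of the simulated power estimates
mc_se <- function(p_hat, n) sqrt(p_hat * (1 - p_hat) / n)

power_summary <- data.frame(
  model = c("unadjusted", "fully adjusted"),
  power = c(power_unadj, power_adj),
  se    = c(mc_se(power_unadj, n_sims), mc_se(power_adj, n_sims))
)
power_summary$ci_lower <- pmax(0, power_summary$power - 1.96 * power_summary$se)
power_summary$ci_upper <- pmin(1, power_summary$power + 1.96 * power_summary$se)
power_summary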

(4) Stratified randomization algorithm

(4.1) Covariate-constrained randomization

If we use batch randomization (all clusters are randomized at once and no new clusters enter later), the simplest suitable approach is probably covariate-constrained randomization: https://rethinkingclinicaltrials.org/chapters/design/experimental-designs-and-randomization-schemes/covariate-constrained-randomization/

  • Exact 1:1 overall allocation:

    • 42 Control / 42 Intervention
  • Soft site stratification:

    • Each site gets roughly half clusters in each arm

    • Small deviations allowed for odd-sized sites

  • Global numeric balance on the baseline outcome rate:

    • Minimises the difference in mean baseline_rate between arms
  • Random selection among best allocations:

    • Preserves randomness while restricting the choice to the best-balanced candidate allocations
Code
set.seed(20250820)

n_clusters <- 84
site <- factor(sample(
  1:7, n_clusters, replace = TRUE,
  prob = c(0.05, 0.15, 0.20, 0.10, 0.15, 0.25, 0.10)
))

cluster_data <- data.frame(
  cluster_id = 1:n_clusters,
  baseline_rate = runif(n_clusters, 0.35, 0.65),
  site = site
)

treatments <- c("Control", "Intervention")

### CCR function for global randomisation
run_global_ccr <- function(df, n_sims = 10000, top_pct = 0.10) {
  n <- nrow(df)
  n_ctrl <- n_int <- n / 2  # exact 1:1 allocation
  
  scores <- numeric(n_sims)
  allocs <- matrix(NA, nrow = n_sims, ncol = n)
  
  for (i in 1:n_sims) {
    # Random 1:1 allocation
    arm_assign <- sample(c(rep("Control", n_ctrl),
                           rep("Intervention", n_int)))
    allocs[i, ] <- arm_assign
    temp <- df
    temp$arm <- arm_assign
    
    # Numeric covariate imbalance
    means_diff <- abs(tapply(temp$baseline_rate, temp$arm, mean)["Control"] -
                      tapply(temp$baseline_rate, temp$arm, mean)["Intervention"])
    
    # Soft site stratification: imbalance = sum of squared differences between observed vs ideal allocation per site
    site_table <- table(temp$site, temp$arm)
    ideal_site <- table(temp$site) / 2  # target per arm per site (soft)
    site_diff <- sum((site_table[, "Control"] - ideal_site)^2 +
                     (site_table[, "Intervention"] - ideal_site)^2)
    
    # Total score: numeric imbalance + small weight for site imbalance
    scores[i] <- means_diff + 0.01 * site_diff
  }
  
  # Select best allocations
  threshold <- quantile(scores, top_pct)
  best_idx <- which(scores <= threshold)
  # index explicitly: sample(best_idx, 1) would misbehave if best_idx had length 1
  chosen <- best_idx[sample(length(best_idx), 1)]
  
  df$final_arm <- allocs[chosen, ]
  return(df)
}

### Run it
final_result <- run_global_ccr(cluster_data)
print(final_result)
   cluster_id baseline_rate site    final_arm
1           1     0.5055127    4 Intervention
2           2     0.4308069    6      Control
3           3     0.6309318    1 Intervention
4           4     0.5829530    7 Intervention
5           5     0.5397852    6 Intervention
6           6     0.4321546    3 Intervention
7           7     0.5446242    2 Intervention
8           8     0.3965107    6      Control
9           9     0.4836322    6      Control
10         10     0.6162682    1      Control
11         11     0.6090239    2      Control
12         12     0.5005644    6      Control
13         13     0.6034779    6 Intervention
14         14     0.5563255    5      Control
15         15     0.4810453    6      Control
16         16     0.6341021    6      Control
17         17     0.5228288    4      Control
18         18     0.4573054    5      Control
19         19     0.4577337    6 Intervention
20         20     0.4572299    3      Control
21         21     0.6190049    6      Control
22         22     0.6238735    5      Control
23         23     0.3792085    1      Control
24         24     0.3979843    3 Intervention
25         25     0.5827194    6 Intervention
26         26     0.3678664    1 Intervention
27         27     0.4648712    7 Intervention
28         28     0.5082518    3 Intervention
29         29     0.5782608    4      Control
30         30     0.5159968    6      Control
31         31     0.5251654    3 Intervention
32         32     0.3843489    4 Intervention
33         33     0.6089731    7      Control
34         34     0.5031032    6      Control
35         35     0.4041871    6 Intervention
36         36     0.6092656    5 Intervention
37         37     0.4701395    7      Control
38         38     0.4326763    5 Intervention
39         39     0.5573211    2      Control
40         40     0.3511044    3      Control
41         41     0.4021090    1      Control
42         42     0.4842421    5      Control
43         43     0.3615362    6      Control
44         44     0.5424458    3      Control
45         45     0.5349092    6      Control
46         46     0.4250874    6 Intervention
47         47     0.6187297    4 Intervention
48         48     0.5905047    2 Intervention
49         49     0.4687591    2      Control
50         50     0.4065764    3      Control
51         51     0.6257611    3 Intervention
52         52     0.4012752    4      Control
53         53     0.6443693    1      Control
54         54     0.5372378    6 Intervention
55         55     0.5869562    4 Intervention
56         56     0.4314816    6 Intervention
57         57     0.5850968    6 Intervention
58         58     0.4714507    2      Control
59         59     0.4283994    5      Control
60         60     0.4214659    4 Intervention
61         61     0.4622420    3 Intervention
62         62     0.4819583    1 Intervention
63         63     0.4594293    3      Control
64         64     0.4733779    6 Intervention
65         65     0.6452266    3 Intervention
66         66     0.4311654    3      Control
67         67     0.5038346    4      Control
68         68     0.4152730    2 Intervention
69         69     0.5302194    1 Intervention
70         70     0.3952863    6 Intervention
71         71     0.6158604    3      Control
72         72     0.5933622    4      Control
73         73     0.5612075    5      Control
74         74     0.5715312    5 Intervention
75         75     0.4923413    3      Control
76         76     0.4417027    2 Intervention
77         77     0.4798992    7 Intervention
78         78     0.6474203    4 Intervention
79         79     0.4415370    7 Intervention
80         80     0.4459414    1      Control
81         81     0.5831044    4 Intervention
82         82     0.4066814    5 Intervention
83         83     0.4704573    2 Intervention
84         84     0.5853412    6      Control
Code
### Checks
cat("\nOverall treatment counts:\n")

Overall treatment counts:
Code
print(table(final_result$final_arm))

     Control Intervention 
          42           42 
Code
cat("\nBalance within each site (aim: approximate 1:1):\n")

Balance within each site (aim: approximate 1:1):
Code
print(table(final_result$site, final_result$final_arm))
   
    Control Intervention
  1       5            4
  2       4            5
  3       8            7
  4       5            7
  5       6            4
  6      12           11
  7       2            4
Code
cat("\nBalance by mean baseline_rate by arm (aim: approximate 1:1):\n")

Balance by mean baseline_rate by arm (aim: approximate 1:1):
Code
print(tapply(final_result$baseline_rate, final_result$final_arm, mean))
     Control Intervention 
   0.5044568    0.5057797
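
As a last check, the imbalance score used inside run_global_ccr (absolute difference in mean baseline_rate plus 0.01 times the site imbalance term) can be recomputed for the selected allocation. This is a small sketch reusing the objects above, not an additional step of the randomization itself:

Code
# Recompute the imbalance score of the selected allocation
arm_means <- tapply(final_result$baseline_rate, final_result$final_arm, mean)
means_diff <- abs(arm_means["Control"] - arm_means["Intervention"])

site_table <- table(final_result$site, final_result$final_arm)
ideal_site <- table(final_result$site) / 2
site_diff <- sum((site_table[, "Control"] - ideal_site)^2 +
                 (site_table[, "Intervention"] - ideal_site)^2)

cat("Selected allocation score:", round(means_diff + 0.01 * site_diff, 4), "\n")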