NUDGE cluster randomized trial

Author

B. Surial, A. Boyd, A.Amstutz

NUDGE SHCS precision CVD/statin CRT

Nudging intervention on the level of SHCS physicians at SHCS sites to promote statin prescription:

Control: Usual Care
Intervention: Nudge shared-decision making based on improved CVD risk score

Parameters and design considerations

Two-arm parallel-group, 1:1 allocation, superiority, cluster-randomized
Cluster eligibility:
- SHCS physicians who use Django
- Saw at least 5 elig pat in the previous year
- Are not planning to move out of the site within coming 6m
Individual eligibility:
- Age ≥40 years and ≤75 years
- Receiving antiretroviral therapy
- Currently not receiving a statin
- No documented intolerance, allergy, or adverse reaction towards statins
- Not pregnant or breastfeeding
- For primary analysis population (powered): No prior CVD or diabetes
Primary outcome (binary): “being prescribed a statin within 6m after first patient-encounter”
Unit of data collection with be the SHCS cohort participants, but level of inference will be the physicians, i.e. cluster-average
Baseline primary outcome rate, see calculation below:
- ca. 22%
Expected delta:
- 15 pp (based on JAMA + a bit more to be expected due to difference in providers & nudge
Cluster size (m) of eligible overall participants, see calculation below:
- on average 33 eligible participants per physician-cluster
CV (coefficient of variation), see calculation below:
- 0.95
ICC for the primary outcome, see calculation below:
- 0.07 (take conservative 0.10 for now)
Max. ca. 170 eligible clusters
Min. desired power 80%, two-sided alpha of 0.05

Packages & seed

Code

req_pkgs <- c("pwr",
              "dplyr",
              "tidyr",
              "purrr",
              "ggplot2",
              "here",
              "lme4",
              "performance",
              "gt"
)
install_if_missing <- function(pkgs){
  for(p in pkgs){
    if(!requireNamespace(p, quietly=TRUE)){
      install.packages(p, repos="https://cloud.r-project.org")
    }
    library(p, character.only=TRUE)
  }
}
install_if_missing(req_pkgs)

# set global RNG seed for reproducibility
set.seed(20250809)

Prep Data SHCS

Code

df_ev <- readRDS(here("01-eligible_visits25.rds"))
df_vpp <- readRDS(here("01-visits_per_physician25.rds"))

# Remove any existing grouping
df_ev <- df_ev |> 
  ungroup()
df_vpp <- df_vpp |> 
  ungroup()

Calculation CV

Code

# # Keep only physicians who have at least seen 5 pat in past year
df_vpp <- df_vpp |>
  filter(n_without_statin >= 5)

# Calculate CV for cluster sizes (=physicians)
# CV = sd(n_without_statin) / mean(n_without_statin)
cv_tbl <- df_vpp |>
  summarise(
    n_clusters = n(),
    mean_cluster_size = mean(n_without_statin),
    sd_cluster_size = sd(n_without_statin),
    cv = sd_cluster_size / mean_cluster_size
  )

cv_tbl |>
  gt() |>
  cols_label(
    n_clusters = "Number of Clusters",
    mean_cluster_size = "Mean Cluster Size",
    sd_cluster_size = "SD Cluster Size",
    cv = "CV"
  ) |>
  fmt_number(
    columns = c(mean_cluster_size, sd_cluster_size, cv),
    decimals = 2
  # ) |>
  # tab_header(
  #   title = "Coefficient of Variation"
  ) |>
  tab_options(
    table.font.size = 14
  )

Number of Clusters	Mean Cluster Size	SD Cluster Size	CV
171	33.88	32.18	0.95

Calculation baseline rate and ICC

Code

# Calculate number of eligible patients who received statins
df_vpp <- df_vpp |>
  mutate(n_with_statin = n_overall - n_without_statin)

# Reconstruct individual patient data (only eligible patients)
individual_data <- df_vpp |>
  rowwise() |>
  mutate(
    patient_data = list(
      tibble(
        statin_prescribed = c(
          rep(1, n_with_statin),
          rep(0, n_without_statin)
        )
      )
    )
  ) |>
  ungroup() |>
  unnest(patient_data) |>
  select(center, physician, statin_prescribed)

# Verify
ind_data <- individual_data |>
  group_by(physician) |>
  summarise(
    n_total = n(),
    n_received_statin = sum(statin_prescribed),
    rate = mean(statin_prescribed)
  )

# baseline rate
baseline_rate <- individual_data |>
  summarise(
    baseline_rate = mean(statin_prescribed)
  )

baseline_rate

# A tibble: 1 × 1
  baseline_rate
          <dbl>
1         0.221

Code

# Calculate ICC
model <- glmer(statin_prescribed ~ 1 + (1 | physician), 
               data = individual_data, 
               family = binomial)

icc(model)

# Intraclass Correlation Coefficient

    Adjusted ICC: 0.068
  Unadjusted ICC: 0.068

Corresponding individual randomized trial

Sample size for the individual randomized trial on the same question

Code

# Parameters
p_C <- 0.22
p_I <- 0.37
power <- 0.80 
alpha <- 0.05

# Effect size, standardized as Cohen's h
h_I_C <- ES.h(p1 = p_I, p2 = p_C)
cat("Cohen's h for Intervention vs Control:", round(h_I_C, 3), "\n")

Cohen's h for Intervention vs Control: 0.331

Code

# Sample size pair-wise comparison
ss_I_C <- pwr.2p.test(h = h_I_C, sig.level = alpha, power = power)
n_per_arm <- ceiling(ss_I_C$n)
n_total <- n_per_arm * 2

cat("Sample size per arm:", n_per_arm, "\n")

Sample size per arm: 143

Code

cat("Total trial sample size (2-arm trial):", n_total)

Total trial sample size (2-arm trial): 286

(1) Sample size calculation CRT: formula-based

Add the design effect (DEFF) to the individual RCT sample size. The usual standard DEFF formula:

DEFF = 1+(m−1)ICC , whereby m = cluster size

However, let’s not forget the cluster size variation. The usual conservative adjustment of the DEFF with cluster size variation is (e.g. see here: https://pmc.ncbi.nlm.nih.gov/articles/PMC7394950/#sup1):

DEFF_cv = 1+((m(1+CV^2)−1))ICC , whereby CV is the coefficient of variation (ratio of standard deviation of cluster sizes to mean of cluster sizes)

Code

# Parameters
p_C <- 0.22
p_I <- 0.37  
power <- 0.80 
ICC <- 0.10
alpha <- 0.05

m <- 33 # average cluster size
CV <- 0.95 # CV

deff <- 1+(m-1)*ICC # standard DEFF
deff_cv <- 1+((m*(1+CV^2))-1)*ICC # DEFF with cluster size variation

# Effect size
h_I_C <- ES.h(p1 = p_I, p2 = p_C)

# sample size for corresponding individual RCT 
ss <- pwr.2p.test(h = h_I_C, power = power, sig.level = alpha)$n

# CRT sample size
ss_crt <- ceiling(ss * deff_cv)
n_clusters <- ceiling(ss_crt / m)
cat("Cluster sample size one arm:", n_clusters, "\n")

Cluster sample size one arm: 32

Code

# cat("Individual sample size one arm:", ss_crt, "\n")

# Total
tot_clusters <- n_clusters * 2
tot_ind <- ss_crt * 2
cat("Total cluster sample size:", tot_clusters, "\n")

Total cluster sample size: 64

Code

cat("Total individual sample size:", tot_ind, "\n")

Total individual sample size: 2054

(1.1) Varying assumptions - Standard sample size calculation

(1.1.1) Varying Effect size and varying ICC

3-D plot, varying the effect size (10 pp to 20 pp) & varying ICC (0.05 to 0.2)

Keep the baseline prescription rate at 22% (control rate)

Keep m (cluster size) at 33

Keep the CV at 0.95

Code

# Define parameters
power <- 0.80
alpha <- 0.05
p_C <- 0.22
CV <- 0.95
m <- 33

# Ranges
ICC_values <- seq(0.05, 0.20, by = 0.01)
effect_sizes_pp <- seq(10, 20, by = 1)

# Create grid
results_3d <- expand.grid(
  ICC = ICC_values,
  effect_size_pp = effect_sizes_pp
)

results_3d$n_clusters_per_arm <- NA

cohen_h <- function(p1, p2) {
  2 * (asin(sqrt(p1)) - asin(sqrt(p2)))
}

for (i in 1:nrow(results_3d)) {
  icc <- results_3d$ICC[i]
  delta_pp <- results_3d$effect_size_pp[i]
  
  p_I <- p_C + (delta_pp / 100)
  
  if (p_I < 0) {
    next
  }
  
  deff_cv <- 1 + ((m * (1 + CV^2)) - 1) * icc
  
  h <- cohen_h(p_I, p_C)
  
  ss <- pwr.2p.test(h = h, power = power, sig.level = alpha)$n
  
  n_per_arm_crt <- ceiling(ss * deff_cv)
  
  n_clusters_per_arm <- ceiling(n_per_arm_crt / m)
  
  results_3d$n_clusters_per_arm[i] <- n_clusters_per_arm
}

Code

ggplot(results_3d, aes(x = effect_size_pp, y = ICC, fill = n_clusters_per_arm)) +
  geom_tile() +
  geom_text(aes(label = n_clusters_per_arm), size = 2.5, color = "black") +
  scale_fill_gradient2(
    low = "darkgreen", 
    mid = "yellow", 
    high = "darkred",
    midpoint = median(results_3d$n_clusters_per_arm, na.rm = TRUE),
    name = "Clusters\nper arm"
  ) +
  labs(
    title = "Clusters per arm: ICC vs Effect size",
    x = "Effect size (percentage point increase)",
    y = "ICC"
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    plot.title = element_text(hjust = 0.5, face = "bold")
  ) +
  scale_x_continuous(breaks = seq(10, 20, by = 1)) +
  scale_y_continuous(breaks = seq(0.05, 0.30, by = 0.01))

(1.1.2) Varying CV and varying ICC

3-D plot, varying the CV (0.5 to 1.2) & varying ICC (0.05 to 0.2)

Keep the baseline prescription rate at 22% (control rate) and delta at 15 pp

Keep m (cluster size) at 33

Code

# Define parameters
power <- 0.80
alpha <- 0.05
p_C <- 0.22
delta_pp <- 15
m <- 33

# Ranges
ICC_values <- seq(0.05, 0.20, by = 0.01)
CV_values <- seq(0.5, 1.2, by = 0.05)

# Create grid
results_3d <- expand.grid(
  ICC = ICC_values,
  CV = CV_values
)

results_3d$n_clusters_per_arm <- NA

cohen_h <- function(p1, p2) {
  2 * (asin(sqrt(p1)) - asin(sqrt(p2)))
}

for (i in 1:nrow(results_3d)) {
  icc <- results_3d$ICC[i]
  cv <- results_3d$CV[i]
  
  p_I <- p_C + (delta_pp / 100)
  
  deff_cv <- 1 + ((m * (1 + cv^2)) - 1) * icc
  
  h <- cohen_h(p_I, p_C)
  
  ss <- pwr.2p.test(h = h, power = power, sig.level = alpha)$n
  
  n_per_arm_crt <- ceiling(ss * deff_cv)
  n_clusters_per_arm <- ceiling(n_per_arm_crt / m)
  
  results_3d$n_clusters_per_arm[i] <- n_clusters_per_arm
}

Code

ggplot(results_3d, aes(x = CV, y = ICC, fill = n_clusters_per_arm)) +
  geom_tile() +
  geom_text(aes(label = n_clusters_per_arm), size = 2.5, color = "black") +
  scale_fill_gradient2(
    low = "darkgreen",
    mid = "yellow",
    high = "darkred",
    midpoint = median(results_3d$n_clusters_per_arm, na.rm = TRUE),
    name = "Clusters\nper arm"
  ) +
  labs(
    title = "Clusters per arm: ICC vs CV (Effect size = 15pp)",
    x = "Coefficient of Variation (CV)",
    y = "ICC"
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    plot.title = element_text(hjust = 0.5, face = "bold")
  ) +
  scale_x_continuous(breaks = seq(0.5, 1.2, by = 0.05)) +
  scale_y_continuous(breaks = seq(0.05, 0.20, by = 0.01))

(2) Sample size calculation CRT: Simulation-based

(3) Simulate the full dataset and implement the main analysis strategy

(4) Stratified randomization algorithm

(4.1) Covariate-constrained randomization

If we use batch-randomization (all clusters randomized at once and no new clusters entering later), the probably simple covariate-constrained randomization to be used: https://rethinkingclinicaltrials.org/chapters/design/experimental-designs-and-randomization-schemes/covariate-constrained-randomization/

Exact 1:1 overall allocation:
- XX Control / XX Intervention
Soft site stratification:
- Each site gets roughly half clusters in each arm
- Small deviations allowed for odd-sized sites
Global numeric balance re baseline outcome rate:
- Optimises baseline_rate mean difference between arms
Random selection among best allocations:
- Preserves randomness while enforcing optimal balance

Code

set.seed(20250820)

n_clusters <- 84
site <- factor(sample(
  1:7, n_clusters, replace = TRUE,
  prob = c(0.05, 0.15, 0.20, 0.10, 0.15, 0.25, 0.10)
))

cluster_data <- data.frame(
  cluster_id = 1:n_clusters,
  baseline_rate = runif(n_clusters, 0.10, 0.30),
  site = site
)

treatments <- c("Control", "Intervention")

### CCR function for global randomisation
run_global_ccr <- function(df, n_sims = 10000, top_pct = 0.10) {
  n <- nrow(df)
  n_ctrl <- n_int <- n / 2  # exact 1:1 allocation
  
  scores <- numeric(n_sims)
  allocs <- matrix(NA, nrow = n_sims, ncol = n)
  
  for (i in 1:n_sims) {
    # Random 1:1 allocation
    arm_assign <- sample(c(rep("Control", n_ctrl),
                           rep("Intervention", n_int)))
    allocs[i, ] <- arm_assign
    temp <- df
    temp$arm <- arm_assign
    
    # Numeric covariate imbalance
    means_diff <- abs(tapply(temp$baseline_rate, temp$arm, mean)["Control"] -
                      tapply(temp$baseline_rate, temp$arm, mean)["Intervention"])
    
    # Soft site stratification: imbalance = sum of squared differences between observed vs ideal allocation per site
    site_table <- table(temp$site, temp$arm)
    ideal_site <- table(temp$site) / 2  # target per arm per site (soft)
    site_diff <- sum((site_table[, "Control"] - ideal_site)^2 +
                     (site_table[, "Intervention"] - ideal_site)^2)
    
    # Total score: numeric imbalance + small weight for site imbalance
    scores[i] <- means_diff + 0.01 * site_diff
  }
  
  # Select best allocations
  threshold <- quantile(scores, top_pct)
  best_idx <- which(scores <= threshold)
  chosen <- sample(best_idx, 1)
  
  df$final_arm <- allocs[chosen, ]
  return(df)
}

### Run it
final_result <- run_global_ccr(cluster_data)
print(final_result)

   cluster_id baseline_rate site    final_arm
1           1     0.2036752    4 Intervention
2           2     0.1538713    6 Intervention
3           3     0.2872879    1 Intervention
4           4     0.2553020    7      Control
5           5     0.2265234    6 Intervention
6           6     0.1547697    3 Intervention
7           7     0.2297495    2      Control
8           8     0.1310071    6      Control
9           9     0.1890881    6      Control
10         10     0.2775121    1      Control
11         11     0.2726826    2      Control
12         12     0.2003763    6 Intervention
13         13     0.2689853    6 Intervention
14         14     0.2375503    5 Intervention
15         15     0.1873635    6      Control
16         16     0.2894014    6      Control
17         17     0.2152192    4 Intervention
18         18     0.1715369    5 Intervention
19         19     0.1718225    6 Intervention
20         20     0.1714866    3 Intervention
21         21     0.2793366    6      Control
22         22     0.2825824    5      Control
23         23     0.1194723    1 Intervention
24         24     0.1319895    3 Intervention
25         25     0.2551463    6      Control
26         26     0.1119110    1 Intervention
27         27     0.1765808    7 Intervention
28         28     0.2055012    3 Intervention
29         29     0.2521738    4      Control
30         30     0.2106645    6 Intervention
31         31     0.2167770    3      Control
32         32     0.1228993    4      Control
33         33     0.2726487    7      Control
34         34     0.2020688    6 Intervention
35         35     0.1361247    6 Intervention
36         36     0.2728437    5 Intervention
37         37     0.1800930    7      Control
38         38     0.1551175    5      Control
39         39     0.2382141    2 Intervention
40         40     0.1007362    3 Intervention
41         41     0.1347393    1      Control
42         42     0.1894948    5 Intervention
43         43     0.1076908    6      Control
44         44     0.2282972    3 Intervention
45         45     0.2232728    6 Intervention
46         46     0.1500583    6 Intervention
47         47     0.2791531    4      Control
48         48     0.2603365    2 Intervention
49         49     0.1791727    2      Control
50         50     0.1377176    3      Control
51         51     0.2838408    3      Control
52         52     0.1341835    4 Intervention
53         53     0.2962462    1 Intervention
54         54     0.2248252    6      Control
55         55     0.2579708    4      Control
56         56     0.1543211    6 Intervention
57         57     0.2567312    6      Control
58         58     0.1809671    2 Intervention
59         59     0.1522663    5      Control
60         60     0.1476439    4      Control
61         61     0.1748280    3      Control
62         62     0.1879722    1      Control
63         63     0.1729529    3 Intervention
64         64     0.1822520    6      Control
65         65     0.2968177    3 Intervention
66         66     0.1541102    3      Control
67         67     0.2025564    4 Intervention
68         68     0.1435154    2 Intervention
69         69     0.2201463    1      Control
70         70     0.1301909    6 Intervention
71         71     0.2772403    3      Control
72         72     0.2622415    4 Intervention
73         73     0.2408050    5 Intervention
74         74     0.2476875    5      Control
75         75     0.1948942    3      Control
76         76     0.1611351    2      Control
77         77     0.1865994    7      Control
78         78     0.2982802    4      Control
79         79     0.1610247    7 Intervention
80         80     0.1639609    1      Control
81         81     0.2554029    4 Intervention
82         82     0.1377876    5      Control
83         83     0.1803049    2      Control
84         84     0.2568941    6 Intervention

Code

### Checks
cat("\nOverall treatment counts:\n")


Overall treatment counts:

Code

print(table(final_result$final_arm))


     Control Intervention 
          42           42

Code

cat("\nBalance within each site (aim: approximate 1:1):\n")


Balance within each site (aim: approximate 1:1):

Code

print(table(final_result$site, final_result$final_arm))

   
    Control Intervention
  1       5            4
  2       5            4
  3       7            8
  4       6            6
  5       5            5
  6      10           13
  7       4            2

Code

cat("\nBalance by mean baseline_rate by arm (aim: approximate 1:1):\n")


Balance by mean baseline_rate by arm (aim: approximate 1:1):

Code

print(tapply(final_result$baseline_rate, final_result$final_arm, mean))

     Control Intervention 
   0.2089960    0.1978283