NUDGE SHCS precision CVD/statin CRT
Nudging intervention at the level of SHCS physicians at SHCS sites to promote statin prescription:
Parameters and design considerations
Two-arm parallel-group, 1:1 allocation, superiority, cluster-randomized
Cluster eligibility:
SHCS physicians who use Django
Saw at least 5 eligible patients in the previous year
Are not planning to leave the site within the coming 6 months
Individual eligibility:
Age ≥40 years and ≤75 years
Receiving antiretroviral therapy
Currently not receiving a statin
No documented intolerance, allergy, or adverse reaction towards statins
Not pregnant or breastfeeding
For primary analysis population (powered): No prior CVD or diabetes
Primary outcome (binary): “being prescribed a statin within 6 months after the first patient encounter”
The unit of data collection will be the SHCS cohort participants, but the level of inference will be the physicians, i.e. the cluster average (a minimal analysis sketch follows this list)
Baseline primary outcome rate, see calculation below:
Expected delta:
15 percentage points (based on JAMA, plus a bit more expected due to differences in providers and the nudge)
Cluster size (m) of eligible participants overall, see calculation below:
on average 33 eligible participants per physician-cluster
CV (coefficient of variation), see calculation below:
ICC for the primary outcome, see calculation below:
0.07 (use a conservative 0.10 for now)
Max. ca. 170 eligible clusters
Min. desired power 80%, two-sided alpha of 0.05
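Since the level of inference is the physician (cluster average), the primary analysis could compare cluster-level prescription proportions between arms. A minimal sketch, assuming a hypothetical analysis dataset trial_data with one row per eligible participant and columns physician, arm, and statin_6m (1 if a statin was prescribed within 6 months of the first encounter); these names are placeholders, not existing SHCS variables:
Code
# Minimal sketch of a cluster-level ("cluster-average") primary analysis.
# `trial_data`, `arm` and `statin_6m` are hypothetical placeholder names.
library(dplyr)

cluster_summary <- trial_data |>
  group_by(physician, arm) |>
  summarise(
    cluster_rate = mean(statin_6m),  # per-physician prescription proportion
    n_pat        = n(),
    .groups      = "drop"
  )

# Unadjusted comparison of cluster-level proportions between arms
t.test(cluster_rate ~ arm, data = cluster_summary)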
Packages & seed
Code
req_pkgs <- c("pwr", "dplyr", "tidyr", "purrr", "ggplot2",
              "here", "lme4", "performance", "gt")

install_if_missing <- function(pkgs) {
  for (p in pkgs) {
    if (!requireNamespace(p, quietly = TRUE)) {
      install.packages(p, repos = "https://cloud.r-project.org")
    }
    library(p, character.only = TRUE)
  }
}

install_if_missing(req_pkgs)

# set global RNG seed for reproducibility
set.seed(20250809)
Prep SHCS data
Code
df_ev  <- readRDS(here("01-eligible_visits25.rds"))
df_vpp <- readRDS(here("01-visits_per_physician25.rds"))

# Remove any existing grouping
df_ev  <- df_ev |> ungroup()
df_vpp <- df_vpp |> ungroup()
Calculation CV
Code
# Keep only physicians who have seen at least 5 patients in the past year
df_vpp <- df_vpp |>
  filter(n_without_statin >= 5)

# Calculate CV for cluster sizes (clusters = physicians)
# CV = sd(n_without_statin) / mean(n_without_statin)
cv_tbl <- df_vpp |>
  summarise(
    n_clusters        = n(),
    mean_cluster_size = mean(n_without_statin),
    sd_cluster_size   = sd(n_without_statin),
    cv                = sd_cluster_size / mean_cluster_size
  )

cv_tbl |>
  gt() |>
  cols_label(
    n_clusters        = "Number of Clusters",
    mean_cluster_size = "Mean Cluster Size",
    sd_cluster_size   = "SD Cluster Size",
    cv                = "CV"
  ) |>
  fmt_number(
    columns  = c(mean_cluster_size, sd_cluster_size, cv),
    decimals = 2
    # ) |>
    # tab_header(
    #   title = "Coefficient of Variation"
  ) |>
  tab_options(
    table.font.size = 14
  )
Calculation baseline rate and ICC
Code
# Calculate the number of eligible patients who received statins
df_vpp <- df_vpp |>
  mutate(n_with_statin = n_overall - n_without_statin)

# Reconstruct individual patient data (only eligible patients)
individual_data <- df_vpp |>
  rowwise() |>
  mutate(
    patient_data = list(
      tibble(
        statin_prescribed = c(
          rep(1, n_with_statin),
          rep(0, n_without_statin)
        )
      )
    )
  ) |>
  ungroup() |>
  unnest(patient_data) |>
  select(center, physician, statin_prescribed)

# Verify per-physician totals and prescription rates
ind_data <- individual_data |>
  group_by(physician) |>
  summarise(
    n_total           = n(),
    n_received_statin = sum(statin_prescribed),
    rate              = mean(statin_prescribed)
  )

# Baseline rate
baseline_rate <- individual_data |>
  summarise(
    baseline_rate = mean(statin_prescribed)
  )
baseline_rate
# A tibble: 1 × 1
baseline_rate
<dbl>
1 0.221
Code
# Calculate ICC from a random-intercept logistic model
model <- glmer(statin_prescribed ~ 1 + (1 | physician),
               data = individual_data,
               family = binomial)
icc(model)
# Intraclass Correlation Coefficient
Adjusted ICC: 0.068
Unadjusted ICC: 0.068
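As a cross-check, the adjusted ICC reported above can be recomputed by hand: for a logistic random-intercept model, performance::icc() uses the latent-scale residual variance pi^2/3, so the ICC is the physician-level variance divided by the sum of that variance and pi^2/3. A minimal sketch reusing the model object fitted above:
Code
# Hand recomputation of the latent-scale ICC from the fitted model:
# between-physician variance / (between-physician variance + pi^2 / 3),
# where pi^2 / 3 is the residual variance of the logistic distribution.
vc <- as.data.frame(VarCorr(model))
var_physician <- vc$vcov[vc$grp == "physician"]
icc_manual <- var_physician / (var_physician + pi^2 / 3)
round(icc_manual, 3)  # should match the adjusted ICC above (~0.068)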
Corresponding individually randomized trial
Sample size for an individually randomized trial addressing the same question
Code
# Parameters
p_C   <- 0.22   # control-arm (baseline) rate
p_I   <- 0.37   # intervention-arm rate (baseline + 15 pp)
power <- 0.80
alpha <- 0.05

# Effect size, standardized as Cohen's h
h_I_C <- ES.h(p1 = p_I, p2 = p_C)
cat("Cohen's h for Intervention vs Control:", round(h_I_C, 3), "\n")
Cohen's h for Intervention vs Control: 0.331
Code
# Sample size for the pairwise comparison
ss_I_C <- pwr.2p.test(h = h_I_C, sig.level = alpha, power = power)
n_per_arm <- ceiling(ss_I_C$n)
n_total   <- n_per_arm * 2
cat("Sample size per arm:", n_per_arm, "\n")
Sample size per arm: 143
Code
cat ("Total trial sample size (2-arm trial):" , n_total)
Total trial sample size (2-arm trial): 286
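Before the simulation-based calculation for the CRT, a rough analytical cross-check: the individually randomized sample size can be inflated by the design effect for cluster randomization with unequal cluster sizes, DEFF = 1 + ((CV^2 + 1) * m_bar - 1) * ICC. The ICC (conservative 0.10) and mean cluster size (m_bar = 33) come from the parameter list above; the CV value below is a placeholder to be replaced by the empirical value from cv_tbl.
Code
# Rough analytical cross-check (the primary calculation is simulation-based):
# inflate the individually randomized sample size by the design effect for
# unequal cluster sizes: DEFF = 1 + ((CV^2 + 1) * m_bar - 1) * ICC
icc_assumed <- 0.10  # conservative ICC from the parameter list
m_bar       <- 33    # mean eligible participants per physician-cluster
cv_assumed  <- 0.60  # placeholder; replace with the empirical CV from cv_tbl

deff <- 1 + ((cv_assumed^2 + 1) * m_bar - 1) * icc_assumed
n_crt_per_arm    <- ceiling(n_per_arm * deff)       # individuals per arm
clusters_per_arm <- ceiling(n_crt_per_arm / m_bar)  # physician-clusters per arm
cat("Design effect:", round(deff, 2),
    "| approx. clusters per arm:", clusters_per_arm, "\n")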
(2) Sample size calculation CRT: Simulation-based
(3) Simulate the full dataset and implement the main analysis strategy
(4) Stratified randomization algorithm
(4.1) Covariate-constrained randomization
If we use batch randomization (all clusters randomized at once, with no new clusters entering later), then simple covariate-constrained randomization can probably be used: https://rethinkingclinicaltrials.org/chapters/design/experimental-designs-and-randomization-schemes/covariate-constrained-randomization/
Exact 1:1 overall allocation:
XX Control / XX Intervention
Soft site stratification: per-site imbalance contributes to the balance score with a small weight rather than being enforced as a hard constraint
Global numeric balance on the baseline outcome rate:
Minimises the absolute difference in mean baseline_rate between arms
Random selection among the best-balanced allocations:
Preserves randomness while still enforcing good balance
Code
set.seed(20250820)

n_clusters <- 84
site <- factor(sample(
  1:7, n_clusters, replace = TRUE,
  prob = c(0.05, 0.15, 0.20, 0.10, 0.15, 0.25, 0.10)
))

cluster_data <- data.frame(
  cluster_id    = 1:n_clusters,
  baseline_rate = runif(n_clusters, 0.10, 0.30),
  site          = site
)

treatments <- c("Control", "Intervention")

### CCR function for global randomisation
run_global_ccr <- function(df, n_sims = 10000, top_pct = 0.10) {
  n <- nrow(df)
  n_ctrl <- n_int <- n / 2  # exact 1:1 allocation
  scores <- numeric(n_sims)
  allocs <- matrix(NA, nrow = n_sims, ncol = n)

  for (i in 1:n_sims) {
    # Random 1:1 allocation
    arm_assign <- sample(c(rep("Control", n_ctrl),
                           rep("Intervention", n_int)))
    allocs[i, ] <- arm_assign

    temp <- df
    temp$arm <- arm_assign

    # Numeric covariate imbalance: absolute difference in mean baseline_rate
    means_diff <- abs(tapply(temp$baseline_rate, temp$arm, mean)["Control"] -
                        tapply(temp$baseline_rate, temp$arm, mean)["Intervention"])

    # Soft site stratification: imbalance = sum of squared differences
    # between observed and ideal allocation per site
    site_table <- table(temp$site, temp$arm)
    ideal_site <- table(temp$site) / 2  # target per arm per site (soft)
    site_diff  <- sum((site_table[, "Control"] - ideal_site)^2 +
                        (site_table[, "Intervention"] - ideal_site)^2)

    # Total score: numeric imbalance + small weight for site imbalance
    scores[i] <- means_diff + 0.01 * site_diff
  }

  # Select at random among the best allocations
  threshold <- quantile(scores, top_pct)
  best_idx  <- which(scores <= threshold)
  chosen    <- sample(best_idx, 1)
  df$final_arm <- allocs[chosen, ]
  return(df)
}

### Run it
final_result <- run_global_ccr(cluster_data)
print(final_result)
cluster_id baseline_rate site final_arm
1 1 0.2036752 4 Intervention
2 2 0.1538713 6 Intervention
3 3 0.2872879 1 Intervention
4 4 0.2553020 7 Control
5 5 0.2265234 6 Intervention
6 6 0.1547697 3 Intervention
7 7 0.2297495 2 Control
8 8 0.1310071 6 Control
9 9 0.1890881 6 Control
10 10 0.2775121 1 Control
11 11 0.2726826 2 Control
12 12 0.2003763 6 Intervention
13 13 0.2689853 6 Intervention
14 14 0.2375503 5 Intervention
15 15 0.1873635 6 Control
16 16 0.2894014 6 Control
17 17 0.2152192 4 Intervention
18 18 0.1715369 5 Intervention
19 19 0.1718225 6 Intervention
20 20 0.1714866 3 Intervention
21 21 0.2793366 6 Control
22 22 0.2825824 5 Control
23 23 0.1194723 1 Intervention
24 24 0.1319895 3 Intervention
25 25 0.2551463 6 Control
26 26 0.1119110 1 Intervention
27 27 0.1765808 7 Intervention
28 28 0.2055012 3 Intervention
29 29 0.2521738 4 Control
30 30 0.2106645 6 Intervention
31 31 0.2167770 3 Control
32 32 0.1228993 4 Control
33 33 0.2726487 7 Control
34 34 0.2020688 6 Intervention
35 35 0.1361247 6 Intervention
36 36 0.2728437 5 Intervention
37 37 0.1800930 7 Control
38 38 0.1551175 5 Control
39 39 0.2382141 2 Intervention
40 40 0.1007362 3 Intervention
41 41 0.1347393 1 Control
42 42 0.1894948 5 Intervention
43 43 0.1076908 6 Control
44 44 0.2282972 3 Intervention
45 45 0.2232728 6 Intervention
46 46 0.1500583 6 Intervention
47 47 0.2791531 4 Control
48 48 0.2603365 2 Intervention
49 49 0.1791727 2 Control
50 50 0.1377176 3 Control
51 51 0.2838408 3 Control
52 52 0.1341835 4 Intervention
53 53 0.2962462 1 Intervention
54 54 0.2248252 6 Control
55 55 0.2579708 4 Control
56 56 0.1543211 6 Intervention
57 57 0.2567312 6 Control
58 58 0.1809671 2 Intervention
59 59 0.1522663 5 Control
60 60 0.1476439 4 Control
61 61 0.1748280 3 Control
62 62 0.1879722 1 Control
63 63 0.1729529 3 Intervention
64 64 0.1822520 6 Control
65 65 0.2968177 3 Intervention
66 66 0.1541102 3 Control
67 67 0.2025564 4 Intervention
68 68 0.1435154 2 Intervention
69 69 0.2201463 1 Control
70 70 0.1301909 6 Intervention
71 71 0.2772403 3 Control
72 72 0.2622415 4 Intervention
73 73 0.2408050 5 Intervention
74 74 0.2476875 5 Control
75 75 0.1948942 3 Control
76 76 0.1611351 2 Control
77 77 0.1865994 7 Control
78 78 0.2982802 4 Control
79 79 0.1610247 7 Intervention
80 80 0.1639609 1 Control
81 81 0.2554029 4 Intervention
82 82 0.1377876 5 Control
83 83 0.1803049 2 Control
84 84 0.2568941 6 Intervention
Code
### Checks
cat (" \n Overall treatment counts: \n " )
Overall treatment counts:
Code
print(table(final_result$final_arm))
Control Intervention
42 42
Code
cat (" \n Balance within each site (aim: approximate 1:1): \n " )
Balance within each site (aim: approximate 1:1):
Code
print(table(final_result$site, final_result$final_arm))
Control Intervention
1 5 4
2 5 4
3 7 8
4 6 6
5 5 5
6 10 13
7 4 2
Code
cat (" \n Balance by mean baseline_rate by arm (aim: approximate 1:1): \n " )
Balance by mean baseline_rate by arm (aim: approximate 1:1):
Code
print(tapply(final_result$baseline_rate, final_result$final_arm, mean))
Control Intervention
0.2089960 0.1978283
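A known caveat of covariate-constrained randomization is that restricting the selection to the best-balanced allocations can make some cluster assignments nearly deterministic. One way to inspect this is to rerun the constrained selection many times and look at each cluster's empirical probability of receiving the intervention; values far from 0.5 would flag overly restrictive constraints. A minimal sketch reusing run_global_ccr and cluster_data from above (the seed and the number of replicates are arbitrary choices for illustration):
Code
# Rerun the constrained selection repeatedly and tabulate how often each
# cluster ends up in the intervention arm. Probabilities far from 0.5
# indicate assignments that are (almost) forced by the balance constraints.
set.seed(20250821)  # illustrative seed, not part of the trial randomization
n_reps <- 200       # small for illustration; increase for a real check
int_count <- numeric(nrow(cluster_data))

for (r in seq_len(n_reps)) {
  res <- run_global_ccr(cluster_data, n_sims = 2000)
  int_count <- int_count + (res$final_arm == "Intervention")
}

prob_intervention <- int_count / n_reps
summary(prob_intervention)  # ideally centred near 0.5, without extremes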