We determined the sample size for the primary outcome assessed at the interim timepoint (month 16 of the trial), before the final sequence of clusters switches to the intervention. With two sequences of six clusters each, we achieve 80% power at a 5% significance level to detect a 12% higher engagement in care (for hypertension and diabetes combined) at 6 months in the intervention condition compared with control.
Based on our recent research in the same study area, we anticipate a hypertension prevalence of 17% and a type 2 diabetes prevalence of 4%, with 75% of individuals with hypertension and 67% of those with diabetes already engaged in care. In the predecessor ComBaCaL trial, we observed a 12% higher engagement in care for hypertension and a 34% higher engagement in care for diabetes in the intervention clusters compared with controls at 6 months of follow-up. The intracluster correlation coefficient (ICC) for engagement in care ranged from 0.06 to 0.14. Each cluster encompasses 30–40 villages with a combined 2700–3600 adult residents, resulting in an estimated 400–600 individuals with hypertension and/or diabetes per cluster.
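The per-cluster estimate can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the 2700–3600 adult residents refer to a whole cluster, and because the overlap between hypertension and diabetes is not stated here, it reports only the bounds implied by full and zero overlap. The variable names are hypothetical, and the pooled engagement figure is a simple prevalence-weighted average that ignores comorbidity.

```r
# Inputs taken from the text
prev_htn <- 0.17   # anticipated hypertension prevalence
prev_dm  <- 0.04   # anticipated type 2 diabetes prevalence
eng_htn  <- 0.75   # already engaged in care, hypertension
eng_dm   <- 0.67   # already engaged in care, diabetes
adults   <- c(low = 2700, high = 3600)  # ASSUMPTION: adults per cluster

htn_cases <- adults * prev_htn   # 459 and 612
dm_cases  <- adults * prev_dm    # 108 and 144

# Overlap between the two conditions is unknown, so bound the count of
# people with hypertension and/or diabetes:
cases_full_overlap <- pmax(htn_cases, dm_cases)  # everyone with diabetes also has hypertension
cases_no_overlap   <- htn_cases + dm_cases       # no one has both conditions

# ASSUMPTION: prevalence-weighted average, ignoring comorbidity
pooled_engagement <- (prev_htn * eng_htn + prev_dm * eng_dm) /
  (prev_htn + prev_dm)

print(rbind(cases_full_overlap, cases_no_overlap))
print(round(pooled_engagement, 3))
```

Under these assumptions, the full-overlap bound falls close to the 400–600 range quoted above, and the pooled baseline engagement comes out slightly below 75%.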
Sample size calculations followed the recommendation for a SW-CRT with repeated cross-sectional data collection, using these assumptions drawn from a similar intervention in the same setting:
Baseline engagement in care (across hypertension and diabetes): 75%
Expected increase in engagement in care at 6 months: 12%
ICC range: 0.06–0.14
Mean cluster size: 400 participants
Discrete-time decay correlation structure, in which the within-cluster correlation decreases over successive measurement periods (a more realistic assumption than a correlation that stays constant over time)
Two sequences and three periods (including baseline period)
Under these assumptions, 80% power is achieved with two sequences of at least six clusters each. Notably, the assumptions for ICC and mean cluster size are conservative, suggesting that actual power may exceed this estimate. Furthermore, since these calculations are based on the interim analysis after two sequences, including a third sequence would further increase statistical power.
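To make the structure of such a calculation concrete, the following base-R sketch approximates power from the generalized least squares variance of the treatment effect on cluster-period means, using a linear mixed model approximation with a discrete-time decay correlation. It is not the calculation behind the figures reported here. The function `sw_power`, the cluster autocorrelation `cac = 0.8`, the exact layout of the two sequences, and the reading of "12%" as an absolute difference of 0.12 are all assumptions made for illustration, so the output should not be expected to reproduce the stated 80%.

```r
# Approximate power for a cross-sectional stepped-wedge design with
# discrete-time correlation decay (linear mixed model approximation).
sw_power <- function(k_per_seq, icc, m = 400, p0 = 0.75, delta = 0.12,
                     cac = 0.8,                   # ASSUMPTION: decay per period, not given in the text
                     design = rbind(c(0, 1, 1),   # ASSUMPTION: sequence 1 switches after baseline
                                    c(0, 0, 1)),  # ASSUMPTION: sequence 2 switches one period later
                     alpha = 0.05) {
  n_per  <- ncol(design)
  sigma2 <- p0 * (1 - p0)  # simplification: binomial variance at the baseline proportion

  # Covariance of the cluster-period means within one cluster:
  # diagonal     sigma2 * (icc + (1 - icc) / m)
  # off-diagonal sigma2 * icc * cac^|t - s|
  lag   <- abs(outer(seq_len(n_per), seq_len(n_per), "-"))
  V     <- sigma2 * (icc * cac^lag + diag((1 - icc) / m, n_per))
  V_inv <- solve(V)

  # Information matrix X' V^-1 X summed over clusters;
  # columns of X: one fixed effect per period, then the treatment indicator
  info <- matrix(0, n_per + 1, n_per + 1)
  for (s in seq_len(nrow(design))) {
    X    <- cbind(diag(n_per), design[s, ])
    info <- info + k_per_seq * t(X) %*% V_inv %*% X
  }
  var_theta <- solve(info)[n_per + 1, n_per + 1]

  # Normal approximation to the power of a two-sided test
  pnorm(abs(delta) / sqrt(var_theta) - qnorm(1 - alpha / 2))
}

# Power at six clusters per sequence across the ICC range
power_by_icc <- sapply(c(lower = 0.06, base = 0.10, upper = 0.14),
                       function(icc) sw_power(k_per_seq = 6, icc = icc))
print(round(power_by_icc, 2))
```

In this sketch, power rises with the number of clusters per sequence and falls as the ICC increases. The result is sensitive to the assumed `cac`, which is why a dedicated tool and the actual decay parameter are needed for the definitive calculation.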
req_pkgs <- c("dplyr", "ggplot2", "readr", "tidyr")

install_if_missing <- function(pkgs) {
  for (p in pkgs) {
    if (!requireNamespace(p, quietly = TRUE)) {
      install.packages(p, repos = "https://cloud.r-project.org")
    }
    library(p, character.only = TRUE)
  }
}

install_if_missing(req_pkgs)

# set global RNG seed for reproducibility
set.seed(20250809)
Sample size curve
# Read your data
df <- read_csv("save_curve.csv")

# Keep only the needed columns
df <- df %>%
  select(no_clusters_x, power_x, power_x_l, power_x_u)

# Reshape to long format for ggplot
df_long <- df %>%
  pivot_longer(
    cols = c(power_x, power_x_l, power_x_u),
    names_to = "scenario",
    values_to = "power"
  )

# Map custom labels and colors
df_long$scenario <- factor(
  df_long$scenario,
  levels = c("power_x", "power_x_l", "power_x_u"),
  labels = c("base ICC", "lower ICC", "upper ICC")
)

# Plot (line widths use `linewidth`, which replaces `size` for lines in ggplot2 >= 3.4.0)
ggplot(df_long, aes(x = no_clusters_x, y = power, color = scenario)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_color_manual(values = c("black", "blue", "red")) +
  scale_x_continuous(breaks = seq(min(df$no_clusters_x), max(df$no_clusters_x), 1)) +
  geom_vline(xintercept = 6, linetype = "dashed", color = "darkgreen", linewidth = 1) + # emphasize 6
  geom_hline(yintercept = 0.8, linetype = "dotted", color = "darkorange", linewidth = 1) + # highlight 0.8
  annotate(
    "text",
    x = max(df$no_clusters_x) - 1, # shift left a bit
    y = 0.71, # place the label below the 0.8 line
    label = "Power = 0.8",
    hjust = 1, vjust = 0, color = "darkorange", fontface = "bold", size = 4
  ) +
  labs(
    title = "Sample Size Curve: Clusters vs Power",
    x = "Number of clusters per sequence",
    y = "Power",
    color = "Scenario"
  ) +
  theme_classic(base_size = 16, base_family = "Helvetica") +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    axis.title = element_text(face = "bold"),
    legend.position = "top",
    legend.title = element_text(face = "bold")
  )
Footnote: base ICC (0.10), lower ICC (0.06), upper ICC (0.14)