ComBaScale stepped-wedge cluster randomized trial

Author

A.Amstutz

ComBaScale stepped-wedge cluster randomized trial (SW-CRT)

Our sample size was determined based on the primary outcome assessed at the interim timepoint (month 16 of the trial), prior to switching the final sequence of clusters to the intervention. With two sequences of six clusters each, we achieve 80% power at a 5% significance level to detect a 12% higher engagement in care (across both hypertension and diabetes taken together) at 6 months in the intervention compared to the control.

Based on our recent research in the same study area, we anticipate a hypertension prevalence of 17% and a type 2 diabetes prevalence of 4%, with 75% of individuals with hypertension and 67% of those with diabetes already engaged in care. In the predecessor ComBaCaL trial, we observed a 12% higher engagement in care for hypertension and a 34% higher engagement in care for diabetes in the intervention clusters compared to controls at 6 months of follow-up. The intracluster correlation coefficient (ICC) for engagement in care ranged from 0.06 to 0.14. Each cluster encompasses 30–40 villages with a combined 2,700–3,600 adult residents, resulting in an estimated 400–600 individuals with hypertension and/or diabetes per cluster.
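As a rough plausibility check, the per-cluster estimate can be reproduced from the prevalences quoted above. This is a minimal sketch that assumes the 2,700–3,600 adults refer to a whole cluster; the overlap between hypertension and diabetes is not reported here, so the code only brackets the count between complete overlap and no overlap. All variable names are illustrative.

```r
# Figures quoted in the text
adults_per_cluster <- c(low = 2700, high = 3600)
prev_htn <- 0.17  # anticipated hypertension prevalence
prev_dm  <- 0.04  # anticipated type 2 diabetes prevalence

# Expected number of adults with each condition per cluster
n_htn <- adults_per_cluster * prev_htn
n_dm  <- adults_per_cluster * prev_dm

# Bounds for "hypertension and/or diabetes" (overlap unknown, assumption):
lower_either <- pmax(n_htn, n_dm)  # everyone with diabetes also has hypertension
upper_either <- n_htn + n_dm       # no one has both conditions

round(rbind(lower_either, upper_either))
```

The bounds (about 460–610 with complete overlap, about 570–760 with none) are broadly in line with the 400–600 individuals per cluster stated above, with the stated range sitting toward the cautious end.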

Sample size calculations followed the recommendation for a SW-CRT with repeated cross-sectional data collection, using these assumptions drawn from a similar intervention in the same setting:

  • Baseline engagement in care (across hypertension and diabetes): 75%

  • Expected increase in engagement in care at 6 months: 12%

  • ICC range: 0.06–0.14

  • Mean cluster size: 400 participants

  • Discrete-time decay correlation structure, in which the within-cluster correlation decreases the further apart two measurement periods are (more realistic than a constant correlation)

  • Two sequences and three periods (including baseline period)

Under these assumptions, 80% power is achieved with two sequences of at least six clusters each. Notably, the assumptions for ICC and mean cluster size are conservative, suggesting that actual power may exceed this estimate. Furthermore, since these calculations are based on the interim analysis after two sequences, including a third sequence would further increase statistical power.
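To make two of the listed assumptions concrete, the sketch below writes out a treatment-indicator matrix for two sequences and three periods, and the within-cluster correlation implied by a discrete-time decay structure. The layout of the matrix is an assumed standard stepped-wedge arrangement, and the decay factor `cac` is a hypothetical value chosen for illustration only; neither is taken from the settings used in the Shiny app.

```r
# Treatment-indicator matrix: rows = sequences, columns = periods
# (period 1 = baseline; 0 = control, 1 = intervention). Assumed layout.
design <- rbind(seq1 = c(0, 1, 1),
                seq2 = c(0, 0, 1))
colnames(design) <- paste0("period", 1:3)

icc <- 0.10  # base ICC scenario (midpoint of the 0.06-0.14 range)
cac <- 0.80  # HYPOTHETICAL per-period decay factor, not given in the text

# Discrete-time decay: the correlation between two individuals in the same
# cluster, measured in periods s and t, is icc * cac^|s - t|
periods <- 1:3
lag <- abs(outer(periods, periods, "-"))
within_cluster_corr <- icc * cac^lag
dimnames(within_cluster_corr) <- list(colnames(design), colnames(design))

design
round(within_cluster_corr, 3)
```

With these illustrative values the correlation is 0.10 within a period, 0.08 between adjacent periods, and 0.064 between periods 1 and 3. Setting `cac <- 1` recovers a constant (exchangeable) within-cluster correlation.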

Plot data exported from: https://clusterrcts.shinyapps.io/rshinyapp/

Packages

req_pkgs <- c("dplyr",
              "ggplot2",
              "readr",
              "tidyr")
install_if_missing <- function(pkgs){
  for(p in pkgs){
    if(!requireNamespace(p, quietly=TRUE)){
      install.packages(p, repos="https://cloud.r-project.org")
    }
    library(p, character.only=TRUE)
  }
}
install_if_missing(req_pkgs)

# set global RNG seed for reproducibility
set.seed(20250809)

Sample size curve

# Read the power-curve data exported from the Shiny app (see URL above)
df <- read_csv("save_curve.csv")

# Keep only the needed columns
df <- df %>%
  select(no_clusters_x, power_x, power_x_l, power_x_u)

# Reshape to long format for ggplot
df_long <- df %>%
  pivot_longer(cols = c(power_x, power_x_l, power_x_u),
               names_to = "scenario",
               values_to = "power")

# Map custom labels and colors
df_long$scenario <- factor(df_long$scenario,
                           levels = c("power_x", "power_x_l", "power_x_u"),
                           labels = c("base ICC", "lower ICC", "upper ICC"))

# Plot
ggplot(df_long, aes(x = no_clusters_x, y = power, color = scenario)) +
  geom_line(linewidth = 1.2) +   # `linewidth` replaces `size` for lines in ggplot2 >= 3.4.0
  geom_point(size = 3) +
  scale_color_manual(values = c("black", "blue", "red")) +
  scale_x_continuous(breaks = seq(min(df$no_clusters_x), max(df$no_clusters_x), 1)) + 
  geom_vline(xintercept = 6, linetype = "dashed", color = "darkgreen", linewidth = 1) +     # emphasize 6 clusters per sequence
  geom_hline(yintercept = 0.8, linetype = "dotted", color = "darkorange", linewidth = 1) +  # highlight the 0.8 power target
  annotate("text", 
         x = max(df$no_clusters_x) - 1,  # shift left a bit
         y = 0.71,                       # place the label below the 0.8 line
         label = "Power = 0.8",
         hjust = 1, vjust = 0, 
         color = "darkorange", fontface = "bold", size = 4) +
  labs(
    title = "Sample Size Curve: Clusters vs Power",
    x = "Number of clusters per sequence",
    y = "Power",
    color = "Scenario"
  ) +
  theme_classic(base_size = 16, base_family = "Helvetica") +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    axis.title = element_text(face = "bold"),
    legend.position = "top",
    legend.title = element_text(face = "bold")
  )

Footnote: base ICC (0.10), lower ICC (0.06), upper ICC (0.14)
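The number of clusters needed per scenario can also be read off the exported data directly rather than from the figure. The helper below is a minimal base-R sketch; `min_clusters_for_power` is a hypothetical name, and it assumes a long-format data frame with the columns `no_clusters_x`, `scenario`, and `power`, as produced for `df_long` above.

```r
# Smallest number of clusters per sequence at which each scenario
# reaches the target power (hypothetical helper, base R only)
min_clusters_for_power <- function(data, target = 0.8) {
  # keep only the rows that meet or exceed the target power
  reached <- data[data$power >= target, , drop = FALSE]
  # minimum cluster count within each scenario
  tapply(reached$no_clusters_x, reached$scenario, min)
}

# Usage with the plot data (not run here, requires save_curve.csv):
# min_clusters_for_power(df_long, target = 0.8)
```

A scenario that never reaches the target within the exported range is returned as `NA` when `scenario` is a factor, which makes such cases easy to spot.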