This section contains the setup and the various utility functions used throughout.
Libraries used:
library(data.table)
library(ggplot2)
library(swdpwr)
Compiled code (the models are shown later):
The stepped-wedge RCT is a cluster randomised design, but unlike a parallel cluster trial, each cluster is exposed to both the control and the intervention conditions over the course of several time periods.
Clusters are randomised to the order that they are exposed to the intervention, which forms a stepped pattern, hence the name.
There are three basic types of SW-CRCT: cross-sectional designs, where different individuals are measured in each period; closed-cohort designs, where the same individuals are followed across all periods; and open-cohort designs, where individuals may enter and leave the clusters over the course of the trial.
The above distinction is important but a lot of the time it is glossed over.
The time periods are usually evenly spaced, but not always. In the “complete” version of the design, every cluster contributes observations in every time period.
An “incomplete” design is any deviation from this, usually arising from unevenly spaced observations or missing observation periods.
One of the important considerations for SW-CRCTs relates to background changes in the underlying response of interest; in other words, we need to deal with a potential confounding effect of time. In an ordinary cluster trial or individually randomised trial you do not typically have to worry about time effects, because the arms are observed concurrently and any secular trend affects them equally. In a SW-CRCT, later periods contain progressively more intervention exposure, so a secular trend can masquerade as (or mask) a treatment effect unless time is modelled.
A GLMM is often used to analyse SW-CRCTs, and the following outlines some generalities in the parameterisations that are used:
Hussey and Hughes 2007 introduced the following (HH) model for the outcome variable, \(y_{ijk}\), for individual \(k\) during period \(j\) in cluster \(i\)
\[ \begin{equation} \begin{split} y_{ijk} &= \mu + \beta_j + \delta x_{ij} + \alpha_i + \epsilon_{ijk} \end{split} \tag{1} \end{equation} \]
where \(\mu\) is the grand mean (intercept), \(\beta_j\) is a fixed effect for period \(j\), \(x_{ij}\) is the treatment indicator for cluster \(i\) in period \(j\), \(\delta\) is the treatment effect, \(\alpha_i \sim \mathsf{Normal}(0, \tau_\alpha^2)\) is a cluster-level random intercept and \(\epsilon_{ijk} \sim \mathsf{Normal}(0, \sigma_\epsilon^2)\) is the residual error.
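As a concrete (if simplistic) illustration, here is a sketch that simulates a small cross-sectional SW-CRCT and fits the HH model. It assumes lme4 is available (it is not loaded in the setup above) and the data and column names (d, clust, per, trt, y) are arbitrary choices of mine.
library(lme4)
set.seed(1)
# 8 clusters, 5 periods (period 0 all control), 20 individuals per cluster-period
d <- CJ(clust = 1:8, per = 0:4, k = 1:20)
d[, seqn := (clust - 1) %/% 2 + 1]        # 4 sequences, 2 clusters per sequence
d[, trt := as.integer(per >= seqn)]       # exposed from the cluster's step onwards
d[, a := rnorm(1, 0, 0.15), by = clust]   # cluster random intercept
d[, y := 1 + 0.1 * per + 0.5 * trt + a + rnorm(.N, 0, 1)]
# HH model: fixed period effects, fixed treatment effect, random cluster intercept
fit_hh <- lmer(y ~ factor(per) + trt + (1 | clust), data = d)
VarCorr(fit_hh)   # cluster and residual variance components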
The HH model is probably the simplest GLMM (actually an LMM as I have written it) available for a SW-CRCT. Obviously, we could recast it as a true GLMM (for counts, yes/no responses etc.) by introducing a link function and revising the variance specification. Anyway, from the HH model we can get a common intraclass correlation (ICC)
\[ \begin{equation} \begin{split} \rho = \frac{\tau_\alpha^2}{(\tau_\alpha^2 + \sigma_\epsilon^2)} \end{split} \tag{2} \end{equation} \]
i.e. a ratio formed from the variance components. A high ICC indicates that the variation between clusters is high relative to the variation within clusters; the clusters are homogeneous, i.e. observations from the same cluster tend to be similar. A low ICC indicates that most of the variation is within clusters, so observations from the same cluster will tend to be quite different.
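Continuing the sketch from above, the variance components of the fitted HH model give the ICC directly:
# ICC (eq 2) from the variance components of the fitted HH model above
vc <- as.data.frame(VarCorr(fit_hh))
vc$vcov[vc$grp == "clust"] / sum(vc$vcov)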
The HH model imposes an exchangeable correlation structure, which implies that the intracluster correlation (correlation between two subjects in the same cluster and same period) and interperiod correlations (correlation between two subjects in the same cluster but different periods) are equal. A more plausible model has been proposed that includes both a random intercept and slope for time defined at the level of the cluster, allowing the intracluster and inter-period correlations to differ.
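In lme4 notation, one version of that extension might look like the following. This is a sketch only, reusing the simulated d from earlier and using the numeric period for the random slope.
# random intercept and slope on (numeric) period at the cluster level, which
# lets the inter-period correlation differ from the within-period correlation
fit_rs <- lmer(y ~ factor(per) + trt + (1 + per | clust), data = d)
VarCorr(fit_rs)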
Clearly, the HH model makes many assumptions, not least of which is that it does not adjust for individual-level repeat measures and is therefore only applicable to the cross-sectional variant of a SW-CRCT.
The HH model can be extended in a number of ways; Fan Li 2020 gives a good overview. One variation is to introduce a more sophisticated approach to modelling the time trend, and another is to find better ways to conceptualise the treatment effect. This latter point is an important consideration because the treatment effect may quite reasonably be expected to grow or decline over time, or even be different in every period. Many variations are possible, with a somewhat generalised version being the time-on-treatment specification
\[ \begin{equation} \begin{split} \delta_{0}\mathsf{I}_{j=s} + \delta_{1}\mathsf{I}_{j=s+1}+ \dots + \delta_{J-s}\mathsf{I}_{j=J} \end{split} \tag{3} \end{equation} \]
where \(\mathsf{I}\) is the indicator function and \(s \in \{1, 2, 3, \dots, J\}\) is the period in which the cluster first receives the intervention (its sequence). Under this approach, the first period of the intervention phase for each cluster has a shared treatment effect \(\delta_0\), the second period of the intervention phase has \(\delta_1\) as the treatment effect, and so on.
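A sketch of one way to construct that exposure-time coding, again reusing the simulated d from earlier (expo is just a name I made up):
# exposure time: 0 = not yet exposed, 1 = first exposed period, 2 = second, ...
d[, expo := pmax(0, per - seqn + 1)]
# separate treatment effect for each exposure time, with expo == 0 (control) as reference
fit_tot <- lmer(y ~ factor(per) + factor(expo) + (1 | clust), data = d)
fixef(fit_tot)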
The random effects structure can also be extended. For example, a cluster-by-time interaction can be introduced to allow for deviation from the group average to be both cluster-specific and period-specific. The cluster random effect becomes:
\[ \begin{equation} \begin{split} \alpha_i + \gamma_{ij} \end{split} \tag{4} \end{equation} \]
where \(\gamma_{ij} \sim \mathsf{Normal}(0, \tau_\gamma^2)\).
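As a sketch in lme4 notation (reusing d), the cluster-by-time term can be coded as an extra grouping factor:
# cluster-period random effect gamma_ij (eq 4) on top of the cluster intercept alpha_i
d[, cp := interaction(clust, per)]
fit_cp <- lmer(y ~ factor(per) + trt + (1 | clust) + (1 | cp), data = d)
VarCorr(fit_cp)   # tau_alpha^2, tau_gamma^2 and sigma_epsilon^2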
To accommodate cohort designs, the HH model is extended with a subject-specific random effect \(\phi_{ik}\) to account for the repeated measures on person \(k\)
\[ \begin{equation} \begin{split} y_{ijk} &= \mu + \beta_j + \delta x_{ij} + \alpha_i + \phi_{ik} + \epsilon_{ijk} \end{split} \tag{5} \end{equation} \]
The implication of this is that you get two ICC values: one for repeated measures on the same individual and one across different individuals, irrespective of time. More detail can be found in the paper from Fan Li 2020.
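Assuming \(\phi_{ik} \sim \mathsf{Normal}(0, \tau_\phi^2)\), both correlations fall straight out of the variance components; the numbers below are made up:
# within-cluster (different individuals) and within-individual (repeat measures)
# correlations implied by the cohort model (5)
tau2_a   <- 0.02   # cluster variance
tau2_phi <- 0.10   # individual variance
sig2_e   <- 0.25   # residual variance
tot <- tau2_a + tau2_phi + sig2_e
c(within_cluster = tau2_a / tot,
  within_individual = (tau2_a + tau2_phi) / tot)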
Note that for both the HH cross-sectional model and the extended version for a cohort setting, when the cluster-level sample size gets large (i.e. more people in each cluster), the HH sample size formula is anti-conservative; more detail below.
Software support is mainly geared towards sample size estimation and power analysis:

swdpwr
: power calculations, including for binary endpoints, but only as an approximation.

The example in Hussey and Hughes looks at the population level effect of expedited partner therapy versus standard notification for treatment of STD. The formulas are all in the paper; however, frankly, I think that people who work on developing sample size formulas should really find something better to do with their time. Sample size formulas do little but a disservice to comprehension and good practice.
Note that while the outcome is binary, a linear model is assumed based on the proportion of responses. In the power/sample size calculation, they compute the variance as \(\sigma^2 = \frac{p_0(1-p_0)}{N}\) and use a normal approximation for the risk difference.
# builds the treatment design matrix: one row per cluster, one column per period;
# the first period is all control and clusters cross over in equal-sized blocks
build_design <- function(clust = 24, time_seqs = 4){
  if(clust %% time_seqs != 0) stop("clust must be a multiple of time_seqs")
  m <- matrix(0, nrow = clust, ncol = time_seqs + 1)
  # number of clusters crossing over at each step
  step <- clust / time_seqs
  for(i in 1:time_seqs){
    m[1:(step*i), i+1] <- 1
  }
  m
}
# summary quantities of the design used in the HH variance formula
summarise_design <- function(m){
  l <- list()
  l$U <- sum(m)                                    # total number of exposed cluster-periods
  l$W <- sum(apply(m, 2, function(i) sum(i)^2))    # sum over periods of squared column totals
  l$V <- sum(apply(m, 1, function(i) sum(i)^2))    # sum over clusters of squared row totals
  l$I <- nrow(m)                                   # number of clusters
  l$TT <- ncol(m)                                  # number of periods
  l
}
# variance of theta (the effect size as a risk difference),
# using the closed-form expression from Hussey and Hughes
v_theta <- function(I, sig2, TT, tau2, U, W, V){
a <- I * sig2 * (sig2 + TT * tau2)
b <- (I*U - W)*sig2
c <- (U^2 + I*TT*U - TT*W - I*V)*tau2
a / (b + c)
}
# N: sample size per cluster per time period
# p0: baseline probability of response
# cv: assumed coefficient of variation (between-county sd over mean prevalence)
# alpha: type I error rate
# theta: effect sizes (risk differences)
# m: design matrix
pwr_calc <- function(N = 100, p0 = 0.05, cv = 0.3, alpha = 0.05,
theta = 0.05 * seq(0.2, 0.5, len = 20),
m){
sig2 <- p0*(1-p0)/N
# variance of random effect for cluster
tau2 <- (cv * p0)^2
# design summary
l <- summarise_design(m)
# variance of theta (the effect size - risk difference)
vv <- v_theta(l$I, sig2, l$TT, tau2, l$U, l$W, l$V)
  p1 <- p0 - theta
  # relative reduction in risk (e.g. 0.2 = a 20% reduction from p0)
  rr <- -(p1 - p0) / p0
  # power via the normal approximation (eq 6)
  pwr <- pnorm((theta / sqrt(vv)) - qnorm(1 - (alpha/2)))
list(pwr = pwr, rr = rr, theta = theta,
sig2 = sig2, tau2 = tau2, N = N, p0 = p0,
cv = cv, alpha = alpha, m = m)
}
m <- build_design()
pwr_0_3 <- pwr_calc(cv = 0.3, m = m)
pwr_0_5 <- pwr_calc(cv = 0.5, m = m)
# solid line: cv = 0.3, dashed line: cv = 0.5
plot(pwr_0_3$rr, pwr_0_3$pwr, type = "l", ylim = c(0, 1),
     xlab = "Relative risk reduction", ylab = "Power",
     main = "HH Model")
lines(pwr_0_5$rr, pwr_0_5$pwr, lty = 2)
What happens when you have a small number of clusters and sequences but \(N\) per period per cluster gets big (ostensibly to offset the limited clusters)?
m <- build_design(clust = 4, time_seqs = 4)
pwr <- pwr_calc(N = 800, p0 = 0.5, cv = 0.3, alpha = 0.05,
theta = 0.5 * seq(0.2, 0.5, len = 20),
                m = m)
plot(pwr$rr, pwr$pwr, type = "l", ylim = c(0, 1),
     xlab = "Relative risk reduction", ylab = "Power",
     main = "HH Model")
Clearly, that isn’t right. Why? Because \(\lim_{N\to\infty} \frac{p_0(1-p_0)}{N} = 0\), so in the limit \(\sigma^2\) goes to zero, and when that happens the variance of the effect estimate also tends to zero (see the code snippet below). Now, because the power is computed from
\[ \begin{equation} \begin{split} \mathsf{Power} = \Phi \biggl( \frac{\theta}{\sqrt{\mathsf{Var}(\theta)}} - Z_{1-\alpha/2} \biggr) \end{split} \tag{6} \end{equation} \]
the first term inside \(\Phi\) becomes very large, which leads to an overly optimistic value for the power. In reality, there are too few clusters to reliably recover the variance components.
p0 <- 0.5
N <- 1000
cv <- 0.4
sig2 <- p0*(1-p0)/N
# variance of random effect for cluster
tau2 <- (cv * p0)^2
# design summary
l <- summarise_design(m)
# variance of theta (the effect size - risk difference)
vv <- v_theta(l$I, sig2, l$TT, tau2, l$U, l$W, l$V)
round(vv, 3)
## [1] 0
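And just to hammer the point home, here is the same calculation across increasing \(N\) for this 4 cluster design, reusing p0, tau2, l and v_theta from above:
# Var(theta) shrinks towards zero as the per-cluster-period sample size grows,
# no matter how few clusters there are
sapply(c(10, 100, 1000, 10000), function(N){
  sig2 <- p0 * (1 - p0) / N
  v_theta(l$I, sig2, l$TT, tau2, l$U, l$W, l$V)
})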