This section contains the setup and the various utility functions used throughout.
Libraries used:
library(data.table)
library(ggplot2)
library(swdpwr)
Compiled code (the models are shown later):
The stepped-wedge RCT is a cluster randomised design, but unlike a parallel cluster trial, each cluster is exposed to both the control and the intervention conditions over the course of several time periods.
Clusters are randomised to the order that they are exposed to the intervention, which forms a stepped pattern, hence the name.
There are three basic types of SW-CRCT: cross-sectional designs, where different individuals are measured in each period; closed-cohort designs, where the same individuals are followed across all periods; and open-cohort designs, where individuals may enter and leave the clusters over the course of the trial.
The above distinction is important but a lot of the time it is glossed over.
The time periods are usually evenly spaced, but not always. In the “complete” version of the design, every cluster contributes observations in every time period.
An “incomplete” design is any deviation from this, usually arising from unevenly spaced observations or missing observation periods.
One of the important considerations for SW-CRCTs relates to background changes in the underlying response of interest; in other words, we need to deal with a potential confounding effect of time. In an ordinary cluster trial or individually randomised trial you do not typically have to worry about time effects, because the arms are observed concurrently and any secular trend affects them equally. In a SW-CRCT, later periods contain progressively more intervention exposure, so a secular trend can masquerade as (or mask) a treatment effect unless time is modelled.
A GLMM is often used to analyse SW-CRCTs, and the following outlines some generalities in the parameterisations that are used:
Hussey and Hughes 2007 introduced the following (HH) model for the outcome variable, \(y_{ijk}\), for individual \(k\) during period \(j\) in cluster \(i\)
\[ \begin{equation} \begin{split} y_{ijk} &= \mu + \beta_j + \delta x_{ij} + \alpha_i + \epsilon_{ijk} \end{split} \tag{1} \end{equation} \]
where \(\mu\) is the grand mean (intercept), \(\beta_j\) is a fixed effect for period \(j\), \(x_{ij}\) is the treatment indicator for cluster \(i\) in period \(j\), \(\delta\) is the treatment effect, \(\alpha_i \sim \mathsf{Normal}(0, \tau_\alpha^2)\) is a cluster-level random intercept and \(\epsilon_{ijk} \sim \mathsf{Normal}(0, \sigma_\epsilon^2)\) is the residual error.
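As a concrete (if simplistic) illustration, here is a sketch that simulates a small cross-sectional SW-CRCT and fits the HH model. It assumes lme4 is available (it is not loaded in the setup above) and the data and column names (d, clust, per, trt, y) are arbitrary choices of mine.
library(lme4)
set.seed(1)
# 8 clusters, 5 periods (period 0 all control), 20 individuals per cluster-period
d <- CJ(clust = 1:8, per = 0:4, k = 1:20)
d[, seqn := (clust - 1) %/% 2 + 1]        # 4 sequences, 2 clusters per sequence
d[, trt := as.integer(per >= seqn)]       # exposed from the cluster's step onwards
d[, a := rnorm(1, 0, 0.15), by = clust]   # cluster random intercept
d[, y := 1 + 0.1 * per + 0.5 * trt + a + rnorm(.N, 0, 1)]
# HH model: fixed period effects, fixed treatment effect, random cluster intercept
fit_hh <- lmer(y ~ factor(per) + trt + (1 | clust), data = d)
VarCorr(fit_hh)   # cluster and residual variance components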
The HH model is probably the simplest GLMM (actually an LMM as I have written it) available for a SW-CRCT. Obviously, we could recast it as a true GLMM (for counts, yes/no responses etc.) by introducing a link function and revising the variance specification. Anyway, from the HH model we can get a common intraclass correlation (ICC)
\[ \begin{equation} \begin{split} \rho = \frac{\tau_\alpha^2}{(\tau_\alpha^2 + \sigma_\epsilon^2)} \end{split} \tag{2} \end{equation} \]
i.e. a ratio formed from the variance components. A high ICC indicates that the variation between clusters is high relative to the variation within clusters; the clusters are homogeneous, i.e. observations from the same cluster tend to be similar. A low ICC indicates that most of the variation is within clusters, so observations from the same cluster will tend to be quite different.
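Continuing the sketch from above, the variance components of the fitted HH model give the ICC directly:
# ICC (eq 2) from the variance components of the fitted HH model above
vc <- as.data.frame(VarCorr(fit_hh))
vc$vcov[vc$grp == "clust"] / sum(vc$vcov)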
The HH model imposes an exchangeable correlation structure, which implies that the intracluster correlation (correlation between two subjects in the same cluster and same period) and interperiod correlations (correlation between two subjects in the same cluster but different periods) are equal. A more plausible model has been proposed that includes both a random intercept and slope for time defined at the level of the cluster, allowing the intracluster and inter-period correlations to differ.
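In lme4 notation, one version of that extension might look like the following. This is a sketch only, reusing the simulated d from earlier and using the numeric period for the random slope.
# random intercept and slope on (numeric) period at the cluster level, which
# lets the inter-period correlation differ from the within-period correlation
fit_rs <- lmer(y ~ factor(per) + trt + (1 + per | clust), data = d)
VarCorr(fit_rs)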
Clearly, the HH model makes many assumptions, not least of which is that it does not adjust for individual-level repeat measures and is therefore only applicable to the cross-sectional variant of a SW-CRCT.
The HH model can be extended in a number of ways; Fan Li 2020 gives a good overview. One variation is to introduce a more sophisticated approach to modelling the time trend, and another is to find better ways to conceptualise the treatment effect. This latter point is an important consideration because the treatment effect may quite reasonably be expected to grow or decline over time, or even be different in every period. Many variations are possible, with a somewhat generalised version being the time-on-treatment specification
\[ \begin{equation} \begin{split} \delta_{0}\mathsf{I}_{j=s} + \delta_{1}\mathsf{I}_{j=s+1}+ \dots + \delta_{J-s}\mathsf{I}_{j=J} \end{split} \tag{3} \end{equation} \]
where \(\mathsf{I}\) is the indicator function and \(s \in \{1, 2, 3, \dots, J\}\) is the period in which the cluster first receives the intervention (its sequence). Under this approach, the first period of the intervention phase for each cluster has a shared treatment effect \(\delta_0\), the second period of the intervention phase has \(\delta_1\) as the treatment effect, and so on.
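A sketch of one way to construct that exposure-time coding, again reusing the simulated d from earlier (expo is just a name I made up):
# exposure time: 0 = not yet exposed, 1 = first exposed period, 2 = second, ...
d[, expo := pmax(0, per - seqn + 1)]
# separate treatment effect for each exposure time, with expo == 0 (control) as reference
fit_tot <- lmer(y ~ factor(per) + factor(expo) + (1 | clust), data = d)
fixef(fit_tot)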
The random effects structure can also be extended. For example, a cluster-by-time interaction can be introduced to allow for deviation from the group average to be both cluster-specific and period-specific. The cluster random effect becomes:
\[ \begin{equation} \begin{split} \alpha_i + \gamma_{ij} \end{split} \tag{4} \end{equation} \]
where \(\gamma_{ij} \sim \mathsf{Normal}(0, \tau_\gamma^2)\).
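As a sketch in lme4 notation (reusing d), the cluster-by-time term can be coded as an extra grouping factor:
# cluster-period random effect gamma_ij (eq 4) on top of the cluster intercept alpha_i
d[, cp := interaction(clust, per)]
fit_cp <- lmer(y ~ factor(per) + trt + (1 | clust) + (1 | cp), data = d)
VarCorr(fit_cp)   # tau_alpha^2, tau_gamma^2 and sigma_epsilon^2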
To accommodate cohort designs, the HH model is extended with a subject-specific random effect \(\phi_{ik}\) to account for the repeated measures on person \(k\)
\[ \begin{equation} \begin{split} y_{ijk} &= \mu + \beta_j + \delta x_{ij} + \alpha_i + \phi_{ik} + \epsilon_{ijk} \end{split} \tag{5} \end{equation} \]
The implication of this is that you get two ICC values: one for repeated measures on the same individual and one across different individuals, irrespective of time. More detail can be found in the paper from Fan Li 2020.
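Assuming \(\phi_{ik} \sim \mathsf{Normal}(0, \tau_\phi^2)\), both correlations fall straight out of the variance components; the numbers below are made up:
# within-cluster (different individuals) and within-individual (repeat measures)
# correlations implied by the cohort model (5)
tau2_a   <- 0.02   # cluster variance
tau2_phi <- 0.10   # individual variance
sig2_e   <- 0.25   # residual variance
tot <- tau2_a + tau2_phi + sig2_e
c(within_cluster = tau2_a / tot,
  within_individual = (tau2_a + tau2_phi) / tot)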
Note that for both the HH cross-sectional model and the extended version for a cohort setting, when the cluster-level sample size gets large (i.e. more people in each cluster), the HH sample size formula is anti-conservative; more detail below.
Software support is mainly geared towards sample size estimation and power analysis:

swdpwr
: power calculations, including for binary endpoints, but only as an approximation.

The example in Hussey and Hughes looks at the population level effect of expedited partner therapy versus standard notification for treatment of STD. The formulas are all in the paper; however, frankly, I think that people who work on developing sample size formulas should really find something better to do with their time. Sample size formulas do little but a disservice to comprehension and good practice.
Note that while the outcome is binary, a linear model is assumed based on the proportion of responses. In the power/sample size calculation, they compute the variance as \(\sigma^2 = \frac{p_0(1-p_0)}{N}\) and use a normal approximation for the risk difference.
# builds the treatment design matrix: one row per cluster, one column per period;
# the first period is all control and clusters cross over in equal-sized blocks
build_design <- function(clust = 24, time_seqs = 4){
  if(clust %% time_seqs != 0) stop("clust must be a multiple of time_seqs")
  m <- matrix(0, nrow = clust, ncol = time_seqs + 1)
  # number of clusters crossing over at each step
  step <- clust / time_seqs
  for(i in 1:time_seqs){
    m[1:(step*i), i+1] <- 1
  }
  m
}
# summary quantities of the design used in the HH variance formula
summarise_design <- function(m){
  l <- list()
  l$U <- sum(m)                                    # total number of exposed cluster-periods
  l$W <- sum(apply(m, 2, function(i) sum(i)^2))    # sum over periods of squared column totals
  l$V <- sum(apply(m, 1, function(i) sum(i)^2))    # sum over clusters of squared row totals
  l$I <- nrow(m)                                   # number of clusters
  l$TT <- ncol(m)                                  # number of periods
  l
}
# variance of theta (the effect size as a risk difference),
# using the closed-form expression from Hussey and Hughes
v_theta <- function(I, sig2, TT, tau2, U, W, V){
a <- I * sig2 * (sig2 + TT * tau2)
b <- (I*U - W)*sig2
c <- (U^2 + I*TT*U - TT*W - I*V)*tau2
a / (b + c)
}
# N: sample size per cluster per time period
# p0: baseline probability of response
# cv: assumed coefficient of variation (between-county sd over mean prevalence)
# alpha: type I error rate
# theta: effect sizes (risk differences)
# m: design matrix
pwr_calc <- function(N = 100, p0 = 0.05, cv = 0.3, alpha = 0.05,
theta = 0.05 * seq(0.2, 0.5, len = 20),
m){
sig2 <- p0*(1-p0)/N
# variance of random effect for cluster
tau2 <- (cv * p0)^2
# design summary
l <- summarise_design(m)
# variance of theta (the effect size - risk difference)
vv <- v_theta(l$I, sig2, l$TT, tau2, l$U, l$W, l$V)
  p1 <- p0 - theta
  # relative reduction in risk (e.g. 0.2 = a 20% reduction from p0)
  rr <- -(p1 - p0) / p0
  # power via the normal approximation (eq 6)
  pwr <- pnorm((theta / sqrt(vv)) - qnorm(1 - (alpha/2)))
list(pwr = pwr, rr = rr, theta = theta,
sig2 = sig2, tau2 = tau2, N = N, p0 = p0,
cv = cv, alpha = alpha, m = m)
}
m <- build_design()
pwr_0_3 <- pwr_calc(cv = 0.3, m = m)
pwr_0_5 <- pwr_calc(cv = 0.5, m = m)
# solid line: cv = 0.3, dashed line: cv = 0.5
plot(pwr_0_3$rr, pwr_0_3$pwr, type = "l", ylim = c(0, 1),
     xlab = "Relative risk reduction", ylab = "Power",
     main = "HH Model")
lines(pwr_0_5$rr, pwr_0_5$pwr, lty = 2)
What happens when you have a small number of clusters and sequences but \(N\) per period per cluster gets big (ostensibly to offset the limited clusters)?
m <- build_design(clust = 4, time_seqs = 4)
pwr <- pwr_calc(N = 800, p0 = 0.5, cv = 0.3, alpha = 0.05,
theta = 0.5 * seq(0.2, 0.5, len = 20),
                m = m)
plot(pwr$rr, pwr$pwr, type = "l", ylim = c(0, 1),
     xlab = "Relative risk reduction", ylab = "Power",
     main = "HH Model")
Clearly, that isn’t right. Why? Because \(\lim_{N\to\infty} \frac{p_0(1-p_0)}{N} = 0\), so in the limit \(\sigma^2\) goes to zero, and when that happens the variance of the effect estimate also tends to zero (see the code snippet below). Now, because the power is computed from
\[ \begin{equation} \begin{split} \mathsf{Power} = \Phi \biggl( \frac{\theta}{\sqrt{\mathsf{Var}(\theta)}} - Z_{1-\alpha/2} \biggr) \end{split} \tag{6} \end{equation} \]
the first term inside \(\Phi\) becomes very large, which leads to an overly optimistic value for the power. In reality, there are too few clusters to reliably recover the variance components.
p0 <- 0.5
N <- 1000
cv <- 0.4
sig2 <- p0*(1-p0)/N
# variance of random effect for cluster
tau2 <- (cv * p0)^2
# design summary
l <- summarise_design(m)
# variance of theta (the effect size - risk difference)
vv <- v_theta(l$I, sig2, l$TT, tau2, l$U, l$W, l$V)
round(vv, 3)
## [1] 0
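And just to hammer the point home, here is the same calculation across increasing \(N\) for this 4 cluster design, reusing p0, tau2, l and v_theta from above:
# Var(theta) shrinks towards zero as the per-cluster-period sample size grows,
# no matter how few clusters there are
sapply(c(10, 100, 1000, 10000), function(N){
  sig2 <- p0 * (1 - p0) / N
  v_theta(l$I, sig2, l$TT, tau2, l$U, l$W, l$V)
})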