From Allen et al. (2016). A Combined Patient and Provider
Intervention for Management of Osteoarthritis in Veterans: A Randomized
Clinical Trial. Annals of Internal Medicine, 164(2): 73-83.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4732728/
Study Design.
This was a cluster randomized, controlled trial, with primary care
providers (PCPs) assigned to an osteoarthritis intervention group or a
usual care control group. Randomization was computer-generated,
maintained by the study statistician, and stratified on the basis of the
providers’ volume of female patients (<15% vs. ≥15%). We aimed to
enroll 10 patient participants (5 white and 5 nonwhite) from each of 30
PCPs.
Sample Size.
We based our sample size of 300 patient participants on detection of a
moderate effect size of approximately 0.30 for the difference in mean
WOMAC scores between groups, with 80% power and a type I error rate of
0.05. This translates to a 4.2-point difference at 12 months, which is
equivalent to an improvement of approximately 11% from the anticipated
mean baseline score; this allowed sufficient power to detect a
clinically relevant difference (12% to 18%, based on prior relevant
literature) (37–39). We used a 2-sample t test sample size calculation
for the between-group difference at 12 months multiplied by a factor of
1 − ρ², where ρ represents the Pearson correlation between baseline and
follow-up outcome measures (0.60) (40). This sample size was then
adjusted to reflect provider clustering using an intraclass correlation
coefficient of 0.02 (41) and was inflated to compensate for potential
attrition (12%). On the basis of our pilot work, we assumed a mean
baseline WOMAC score of 38 with a standard deviation of 14.
1. An effect of \(d = 0.30 = \frac{\text{Mean Difference}}{SD} = \frac{4.2}{14}\) yields a sample size of \(n_j = 175.385\) per arm (group), rounded up to 176 per arm; \(N = 352\) total subjects. So where does the sample size of 300 come from?
2. The ANCOVA adjustment factor is actually applied to the variance; equivalently, multiply the SD by the square root of the factor: \[SD_C = SD_B\sqrt{1 - \rho^2}\]where \(SD_C\) is the covariate-adjusted SD of the outcome and \(SD_B\) is the assumed baseline SD. In this case, \(SD_B = 14\) and \(\rho = 0.60\), so \(SD_C = 14\sqrt{1 - 0.6^2} = 14(0.8) = 11.2\).
Now an effect size of \(d = \frac{4.2}{11.2} = 0.375\), or equivalently \(\frac{d}{\sqrt{1 - \rho^2}} = \frac{0.3}{0.8} = 0.375\). This yields \(n_j = 112.5967\) per arm, rounded up to 113; \(N = 226\) total subjects for \(80\%\) power at \(\alpha = 0.05\).
3. The adjustment reflects that patients are nested within providers (clustering) and that a random-effects mixed model will be used for the analyses. The adjustment factor is called the Design Effect. The authors state they will “enroll 10 patient participants from each of 30 PCPs.” The Design Effect formula is \[D = 1 + ((C - 1)\times ICC)\] where \(C\) is the average cluster size and \(ICC\) is the Intra-Class (Intra-Cluster) Correlation. In this case, \(C = 10\) and \(ICC = 0.02\): \(D = 1 + ((10-1)(0.02)) = 1.18\). This yields a sample size of \(n_j = 112.5967 \times 1.18 = 132.864\) per arm; \(N = 265.72\) total subjects.
4. Inflating for attrition divides by \(1 - A\), where \(A\) is the anticipated proportion of attrition: final \(N = \frac{N}{1-A}\). Here \(A = 0.12\), so final \(N = \frac{265.72}{1 - 0.12} = 301.96\), rounded to \(302\). The full chain of steps 2 through 4 is sketched in the code below.
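Putting steps 2 through 4 together, a minimal SAS data-step sketch of the arithmetic (the numbers are from the text; this is not the authors' code):

data chain;
  n_arm   = 112.5967;              /* step 2: ANCOVA-adjusted n per arm     */
  deff    = 1 + (10 - 1)*0.02;     /* step 3: design effect, C=10, ICC=0.02 */
  n_clust = n_arm * deff;          /* 132.864 per arm                       */
  n_total = 2 * n_clust;           /* 265.72 total                          */
  n_final = n_total / (1 - 0.12);  /* step 4: attrition -> 301.96, ~302     */
  put n_clust= n_total= n_final=;
run;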
We based our sample size of 300 patient participants on detection of a moderate effect size of approximately \(d = 0.30\) for the difference in mean WOMAC scores between groups, with \(80\%\) power and a type I error rate of \(\alpha = 0.05\). This translates to a 4.2-point difference at 12 months, which is equivalent to an improvement of approximately \(11\%\) from the anticipated mean baseline score (since \(11\% = 4.2/38\)); this allowed sufficient power to detect a clinically relevant difference (12% to 18%, based on prior relevant literature) (37–39).
1. An effect of \(d = 0.30 = \frac{\text{Mean Difference}}{SD} = \frac{4.2}{14}\) yields a sample size of \(n_j = 175.385\) per arm (group), rounded up to 176 per arm; \(N = 352\) total subjects. So where does the sample size of 300 come from?
The POWER Procedure
Two-Sample t Test for Mean Difference (standardized units)

Fixed Scenario Elements
    Distribution          Normal
    Method                Exact
    Alpha                 0.05
    Mean Difference       0.3
    Standard Deviation    1
    Nominal Power         0.8
    Number of Sides       2
    Null Difference       0

Computed Ceiling N per Group
    Fractional N per Group    Actual Power    Ceiling N per Group
    175.384669                0.801           176

The POWER Procedure
Two-Sample t Test for Mean Difference (raw units)

Fixed Scenario Elements
    Distribution          Normal
    Method                Exact
    Alpha                 0.05
    Mean Difference       4.2
    Standard Deviation    14
    Nominal Power         0.8
    Number of Sides       2
    Null Difference       0

Computed Ceiling N per Group
    Fractional N per Group    Actual Power    Ceiling N per Group
    175.384669                0.801           176
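The handout shows only the output; a minimal PROC POWER call that should reproduce the raw-units run (assumed syntax, not the authors' actual code) is:

proc power;
  twosamplemeans test=diff
    meandiff  = 4.2    /* anticipated 12-month difference */
    stddev    = 14     /* assumed baseline SD             */
    alpha     = 0.05
    power     = 0.8
    npergroup = .;     /* solve for n per group           */
run;

Setting meandiff = 0.3 and stddev = 1 instead gives the standardized-units run.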
2. The ANCOVA adjustment factor is actually applied to the variance; equivalently, multiply the SD by the square root of the factor:
\[ SD_C = SD_B \sqrt{1 - \rho^2} \]
where \(SD_C\) is the covariate-adjusted SD of the outcome and \(SD_B\) is the assumed baseline SD.
In this case, \(SD_B = 14\) and \(\rho = 0.60\), so:
\[ SD_C = 14\sqrt{1 - 0.6^2} = 14(0.8) = 11.2 \]
Now an effect size of \(d = \frac{4.2}{11.2} = 0.375\), which yields \(n_j = 112.5967\) per arm; \(N = 226\) total subjects for 80% power at \(\alpha = 0.05\).
So where does the sample size of 300 come from?
\(n_j = 112.5967\) per arm; \(N = 226\) total subjects for 80% power at \(\alpha = 0.05\).
The POWER Procedure
Two-Sample t Test for Mean Difference (standardized units)

Fixed Scenario Elements
    Distribution          Normal
    Method                Exact
    Alpha                 0.05
    Mean Difference       0.375
    Standard Deviation    1
    Nominal Power         0.8
    Number of Sides       2
    Null Difference       0

Computed Ceiling N per Group
    Fractional N per Group    Actual Power    Ceiling N per Group
    112.596695                0.801           113

The POWER Procedure
Two-Sample t Test for Mean Difference (raw units)

Fixed Scenario Elements
    Distribution          Normal
    Method                Exact
    Alpha                 0.05
    Mean Difference       4.2
    Standard Deviation    11.2
    Nominal Power         0.8
    Number of Sides       2
    Null Difference       0

Computed Ceiling N per Group
    Fractional N per Group    Actual Power    Ceiling N per Group
    112.596695                0.801           113
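Again, the assumed call behind the covariate-adjusted run:

proc power;
  twosamplemeans test=diff
    meandiff  = 4.2
    stddev    = 11.2   /* 14 * sqrt(1 - 0.6**2) */
    alpha     = 0.05
    power     = 0.8
    npergroup = .;
run;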
SIDE NOTE on the COVARIANCE ADJUSTMENT (Borm et al., 2007)
When calculating power or sample size for pre–post designs there are two different covariance adjustments that can be applied.
Difference Score Covariance Adjustment (ANOVA model):
\[
(Y_{\text{FOLLOW}} - Y_{\text{BASE}}) = \beta_0 + \beta_G G +
\varepsilon
\]
Assume pre (baseline) and post (follow-up) have the same
variance:
\[
S_B^2 = S_F^2 = S_{*}^2
\]
Then the variance of the difference score is:
\[
\begin{aligned}
S_D^2 &= S_B^2 + S_F^2 - 2\,\text{COV}(B,F) \\
&= S_B^2 + S_F^2 - 2 r_{BF} S_B S_F \\
&= S_{*}^2 + S_{*}^2 - 2 r_{BF} S_{*} S_{*} \\
&= 2S_{*}^2 - 2 r_{BF} S_{*}^2 \\
&= 2S_{*}^2(1 - r_{BF})
\end{aligned}
\]
Current Example:
\[
\begin{aligned}
S_D^2 &= 2 \times 14^2 (1 - 0.6) = 156.8 \\
S_D &= 12.522 \\
d &= \frac{4.2}{12.522} = 0.3354
\end{aligned}
\]
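A hedged PROC POWER sketch under the difference-score adjustment (an assumed call; the handout itself only shows ANCOVA runs):

proc power;
  twosamplemeans test=diff
    meandiff  = 4.2
    stddev    = 12.522   /* sqrt(2 * 14**2 * (1 - 0.6)) */
    alpha     = 0.05
    power     = 0.8
    npergroup = .;       /* exceeds the 112.6 per arm from the ANCOVA run */
run;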
ANCOVA Covariance Adjustment (ANCOVA model):
\[
\hat{Y}_{\text{FOLLOW}} = \beta_0 + \beta_G G + \beta_B Y_{\text{BASE}}
\]
The slope for the baseline covariate is a function of the baseline–follow-up correlation \(r_{BF}\).
Assume that group (\(G\)) and baseline (\(X = Y_{\text{BASE}}\)) are uncorrelated (\(r_{GX} = 0\)), which is a reasonable assumption when groups are randomized. In any finite sample, however, the correlation is almost never exactly zero, which is one reason we design for 80% or 90% power (to allow for random fluctuation).
Since we assume \(r_{GX} = 0\):
\[
\beta_B = r_{BF}\left(\frac{S_F}{S_B}\right)
\]
If \(S_B^2 = S_F^2 = S_{*}^2\), then \(\beta_B = r_{BF}\).
Thus, the ANCOVA model becomes:
\[
\begin{aligned}
Y_{\text{FOLLOW}} &= \beta_0 + \beta_G G + \beta_B Y_{\text{BASE}} +
e \\
&= \beta_0 + \beta_G G + r_{BF} Y_{\text{BASE}} +
e \\
Y_{\text{FOLLOW}} - r_{BF} Y_{\text{BASE}} &= \beta_0 + \beta_G G +
e
\end{aligned}
\]
(The rearrangement above is conceptual.)
The variance of \(\hat{Y}_F\)
reduces as a function of the pre–post correlation:
\[
S_{\hat{Y}}^2 = S_{*}^2(1 - r_{BF}^2)
\]
A more formal treatment is presented below.
Starting from the adjusted ANCOVA model: \[ Y_{\text{FOLLOW}} - r_{BF} Y_{\text{BASE}} = \beta_0 + \beta_G G + e \]
Assume \(S_B^2 = S_F^2 = S_{*}^2 \;\Rightarrow\; S_B = S_F = S_{*}\).
Expanding the quadratic form: \[ (Y_F - r_{BF} Y_B)'(Y_F - r_{BF} Y_B) = Y_F^2 + r_{BF}^2 Y_B^2 - 2r_{BF} Y_B Y_F \]
After summation: \[ \Sigma Y_F^2 + r_{BF}^2 \Sigma Y_B^2 - 2r_{BF} \Sigma Y_B Y_F \]
In variance terms: \[ S_F^2 + r_{BF}^2 S_B^2 - 2r_{BF} S_{BF} \]
With the assumption \(S_B^2 = S_F^2 = S_{*}^2\), we have \[ S_{BF} = r_{BF} S_B S_F = r_{BF} S_{*} S_{*} = r_{BF} S_{*}^2. \]
Substituting \(r_{BF} S_{*}^2\) for \(S_{BF}\) gives \[ S_{*}^2 + r_{BF}^2 S_{*}^2 - 2 r_{BF} r_{BF} S_{*}^2 \]
Continuing on with some factoring, we have \[ \begin{aligned} S_{*}^2 + r_{BF}^2 S_{*}^2 - 2 r_{BF}^2 S_{*}^2 &= S_{*}^2\,(1 + r_{BF}^2 - 2 r_{BF}^2) \\ &= S_{*}^2\,(1 - r_{BF}^2). \end{aligned} \]
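Comparing the two adjustments for the working example (\(r_{BF} = 0.6\)): required sample size is proportional to the outcome variance, so
\[
\frac{S_{*}^2(1 - r_{BF}^2)}{2S_{*}^2(1 - r_{BF})} = \frac{1 - 0.6^2}{2(1 - 0.6)} = \frac{0.64}{0.80} = 0.80,
\]
meaning the ANCOVA design needs only 80% as many subjects as the difference-score design. In general, \(1 - r_{BF}^2 = (1 - r_{BF})(1 + r_{BF}) < 2(1 - r_{BF})\) whenever \(r_{BF} < 1\), so ANCOVA is never less efficient.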
This sample size was then adjusted to reflect provider clustering using an intraclass correlation coefficient of 0.02 (Batistatou et al., 2014).
The Design Effect Formula is:
\[
D = 1 + ((C - 1)\, ICC)
\]
where \(C\) is the average cluster size
and \(ICC\) is the Intra-Class
(Cluster) Correlation.
Based on this formula, for a fixed total sample size, many small clusters will yield more power than a few large clusters (smaller \(C\) gives a smaller design effect).
ICC is a measure of the relatedness of clustered
data. It accounts for this relatedness by comparing the
variance within clusters with the variance between clusters:
\[
ICC = \frac{S_B^2}{S_B^2 + S_W^2}
\]
where \(S_B^2\) =
Between Cluster Variance and \(S_W^2\) = Within
Cluster Variance.
The authors state they will “enroll 10 patient participants from each of 30 PCPs.”
Example values of Design Effect D:
    C     ICC     D
    1     1       1.00
    2     0.05    1.05
    5     0.05    1.20
    10    0.05    1.45
    20    0.05    1.95
    2     0.02    1.02
    5     0.02    1.08
    10    0.02    1.18
    20    0.02    1.38
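A small SAS sketch that regenerates the non-degenerate rows of this table from the formula:

data deff;
  do icc = 0.02, 0.05;
    do c = 2, 5, 10, 20;
      d = 1 + (c - 1)*icc;   /* design effect */
      output;
    end;
  end;
run;

proc print data=deff noobs; run;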
Interpretation for RCTs: the larger the design effect, the more subjects are needed to preserve power; for a fixed total \(N\), adding clusters is more efficient than enlarging them.
This sample size was then adjusted to reflect provider clustering using an intraclass correlation coefficient of 0.02 (41) and was inflated to compensate for potential attrition (12%).
The authors state they will “enroll 10 patient participants from
each of 30 PCPs.”
In this case \(C = 10\) and \(ICC = 0.02\).
\[ D = \big(1 + ((10 - 1)(0.02))\big) = 1.18 \]
Yielding a sample size of:
\[
n_j = (112.5967 \times 1.18) = 132.864 \ \text{per arm}; \quad N =
265.72 \ \text{Total.}
\]
This was then inflated to compensate for potential attrition (12%).
4. The inflation for attrition divides the sample size by \(1 - A\), where \(A\) is the anticipated proportion of attrition:
\[
N_{\text{final}} = \frac{N}{1 - A}
\]
Here \(A = 0.12\). So:
\[
\text{Final } N = \frac{265.72}{1 - 0.12} = 301.96 \ \ (\text{rounded to } 302).
\]
Most likely the authors simply rounded down to an even 300 subjects, or used slightly different (rounded) values of the SD and ICC in their calculation.
Working backwards
1. Take the total sample size and multiply by the attrition factor \[ N(1 - A) = 300 \times (1 - 0.12) = 264 \]
2. Divide by the clustering design effect \[ \frac{N(1 - A)}{D} = \frac{300 \times (1 - 0.12)}{1.18} = \frac{264}{1.18} = 223.729 \]
3. Find the effect size that gives 80% power with 223.729 total subjects; it should be close to the covariate-adjusted \(d = 0.375\):
SAS output (sweep of MeanDiff values):

    Obs    MeanDiff    Power
    124    0.37623     0.7999874329
    125    0.37624     0.8000082792
4. Multiply \(d^*\) by the covariate adjustment factor:
\[
d = d^* \sqrt{1 - \rho^2} = 0.37624 \times \sqrt{1 - 0.6^2} = 0.37624 \times 0.8 = 0.300992,
\]
recovering the stated effect size of approximately \(0.30\).
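The sweep output in step 3 suggests a grid search over candidate mean differences. A hedged SAS sketch of such a data step (exact noncentral-t power for a two-sample t test in standardized units; the grid start and step are assumptions):

data backsolve;
  n_arm = 223.729 / 2;                  /* per-arm n from step 2        */
  df    = 2*n_arm - 2;
  tcrit = tinv(1 - 0.05/2, df);         /* two-sided critical value     */
  do meandiff = 0.375 to 0.377 by 0.00001;
    ncp   = meandiff * sqrt(n_arm/2);   /* noncentrality, SD = 1        */
    power = 1 - probt(tcrit, df, ncp) + probt(-tcrit, df, ncp);
    output;
  end;
run;

proc print data=backsolve; run;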
References
Borm GF, Fransen J, Lemmens WA. A simple sample size formula for analysis of covariance in randomized clinical trials. J Clin Epidemiol. 2007;60:1234–1238. https://pubmed.ncbi.nlm.nih.gov/17998077/
Donner A, Klar N. Design and Analysis of Cluster Randomized Trials in Health Research. New York: Oxford University Press; 2000.
Power for Unequal Variances and Unequally Sized Treatment Arms
Batistatou E, Roberts C, Roberts S. Sample size and power calculations for trials and quasi-experimental studies with clustering. The Stata Journal. 2014;14(1):159–175. https://journals.sagepub.com/doi/pdf/10.1177/1536867X1401400111
Power for Unequal Cluster Sizes (variation in cluster size can affect power)
Guittet L, Ravaud P, Giraudeau B. Planning a cluster randomized trial with unequal cluster sizes: practical issues involving continuous outcomes. BMC Med Res Methodol. 2006;6:17. https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-6-17
Kerry SM, Bland JM. Unequal cluster sizes for trials in English and Welsh general practice: implications for sample size calculations. Stat Med. 2001;20:377–390. https://onlinelibrary.wiley.com/doi/epdf/10.1002/1097-0258%2820010215%2920%3A3%3C377%3A%3AAID-SIM799%3E3.0.CO%3B2-N
From Bennell et al. (2019). Comparison of weight-bearing functional exercise and non-weight-bearing quadriceps strengthening exercise on pain and function for people with knee osteoarthritis and obesity: protocol for the TARGET randomized controlled trial. BMC Musculoskeletal Disorders, 20:291. https://doi.org/10.1186/s12891-019-2662-5
Trial sample size
The sample size was calculated based on both primary outcomes of pain and function. For an effect size of 0.5, power 80% and two-sided significance level 0.05, with a correlation between pre- and post-measurements of 0.45 for pain, 51 participants per arm will be required (using analysis of covariance including baseline pain measurement as a covariate). To account for 20% loss to follow up, sample size will be increased to 64 per arm, for a total of 128. This gives power of 83% to detect an effect size of 0.5 for function with a correlation between pre- and post-measurements of 0.49 and a two-sided significance level of 0.05.
Can you reproduce these values?
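A hedged attempt, mirroring the ANCOVA adjustment used above (the adjusted SDs \(\sqrt{1 - 0.45^2} \approx 0.893\) and \(\sqrt{1 - 0.49^2} \approx 0.872\) are hand-computed; the calls are assumptions, not the trial's actual code):

/* Pain: solve for n per group with the ANCOVA-adjusted SD */
proc power;
  twosamplemeans test=diff
    meandiff  = 0.5
    stddev    = 0.893    /* sqrt(1 - 0.45**2) */
    alpha     = 0.05
    power     = 0.8
    npergroup = .;       /* expect about 51 per arm */
run;

/* Function: solve for power with 51 analyzable per arm */
proc power;
  twosamplemeans test=diff
    meandiff  = 0.5
    stddev    = 0.872    /* sqrt(1 - 0.49**2) */
    alpha     = 0.05
    npergroup = 51
    power     = .;       /* expect a value close to the reported 83% */
run;

With 20% loss to follow-up, 51 / (1 - 0.2) = 63.75, rounded up to 64 per arm, or 128 in total.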