Estimating the sample size for 2-Sample, 2-Sided Equality testing

This tutorial provide the codes for estimating the sample sizes needed for the comparison of 2 means of 2 indepdent groups A and B, based on 2-sided equality testing.

Formula and Terms

First, we must introduce our hypotheses

\[ H_0:\mu_A-\mu_B=0 \]

\[ H_0:\mu_A-\mu_B=0 H_1:\mu_A-\mu_B\neq0 \]

Our calculation also implies a matching ratio Kappa which is the ratio between the sample sizes of the two groups

\[ \kappa=\frac{n_A}{n_B} \]

The formula for estimating the sample size is:

\[ n_A=\kappa n_B \;\text{ and }\; n_B=\left(1+\frac{1}{\kappa}\right) \left(\sigma\frac{z_{1-\alpha/2}+z_{1-\beta}}{\mu_A-\mu_B}\right)^2 \]

This could be considered an inverse problem from determination of standardised score Z of the difference. This means we estimate the sample size by solving inversely the equation of Z-score:

\[ \ z=\frac{\mu_A-\mu_B}{\sigma\sqrt{\frac{1}{n_A}+\frac{1}{n_B}}} \]

\[ \alpha \] is the Type I error or the wrong rejection of a true null hypothesis (false positive).

\[ \beta \] is the Type II error, or the probability of incorectly retaining a false null hypothesis (false negative)

When comparing two means, concluding the means were different when in reality they were not different would be a Type I error; concluding the means were not different when in reality they were different would be a Type II error.

\[ (1 -beta) \] is called the Power

muA, muB and sigma are respectively the mean in group A and group B, and the standard deviation sd. Those values could be determined from a pilot study, or from the previous studies. When no prior data are available, we should simulate our sample size based on a possible (fictional) range of MuA or MuB, then choose the most appropriate sample size that make a good balance between feasibility and statistical power.

Implementing to R codes

Asumming that we got MuA= 20, muB = 25, and Sd=5; we want equal sample size for both A and B groups so kappa = 1. The alpha is fixed at 5% or 0.05 and beta is fixed at 0.2 for an expected statistical Power of 80%

A simple calculation could be done as follows:

muA=20
muB=25
kappa=1
sd=5
alpha=0.05
beta=0.2

nB=(1+1/kappa)*(sd*(qnorm(1-alpha/2)+qnorm(1-beta))/(muA-muB))^2
nB=ceiling(nB)

Sample_size=kappa*nB+nB

Z=(muA-muB)/(sd*sqrt((1+1/kappa)/nB))
             
Power=pnorm(Z-qnorm(1-alpha/2))+pnorm(-Z-qnorm(1-alpha/2))

rbind(nB,Sample_size,Power,Z)
##                   [,1]
## nB          16.0000000
## Sample_size 32.0000000
## Power        0.8074304
## Z           -2.8284271

The results could be interpreted as follows:

A total sample size of n=32 would be required to detect a standardised difference of Z=-2.83 with a statistical power of 0.8

In reality, it’s difficult to balance between the feasibility and our expectation for Z or Power. That’s why a single point estimation is not sufficient. It would be better if we make a simulation on a wider range of MuA or MuB, and under different hypotheses of statistical power.

Simulation

For this example, we will simulate the sample size for A group in function of A mean that could vary from 18 to 22, the sample size of B group in function of B mean that could vary from 22 to 28, kappa from 1 to 4 and sd from 2 to 8, while other parameters are kept as suggested by prior conditions.

rangeA=seq(18,22,length.out=100)
rangeB=seq(22,28,length.out=100)
rangekappa=seq(1,4,length.out=100)
rangesd=seq(2,8,length.out=100)
df=cbind(rangeA,rangeB,rangesd,rangekappa)%>%as.data.frame()

df$muA=20
df$muB=25
df$kappa=1
df$sd=5
df$alpha=0.05
df$beta=0.2

df$AP90=(1+1/df$kappa)*(df$sd*(qnorm(1-df$alpha/2)+qnorm(1-0.1))/(df$rangeA-df$muB))^2
df$AP80=(1+1/df$kappa)*(df$sd*(qnorm(1-df$alpha/2)+qnorm(1-0.2))/(df$rangeA-df$muB))^2
df$AP70=(1+1/df$kappa)*(df$sd*(qnorm(1-df$alpha/2)+qnorm(1-0.3))/(df$rangeA-df$muB))^2

df$BP90=(1+1/df$kappa)*(df$sd*(qnorm(1-df$alpha/2)+qnorm(1-0.1))/(df$muA-df$rangeB))^2
df$BP80=(1+1/df$kappa)*(df$sd*(qnorm(1-df$alpha/2)+qnorm(1-0.2))/(df$muA-df$rangeB))^2
df$BP70=(1+1/df$kappa)*(df$sd*(qnorm(1-df$alpha/2)+qnorm(1-0.3))/(df$muA-df$rangeB))^2

df$SP90=(1+1/df$kappa)*(df$rangesd*(qnorm(1-df$alpha/2)+qnorm(1-0.1))/(df$muA-df$muB))^2
df$SP80=(1+1/df$kappa)*(df$rangesd*(qnorm(1-df$alpha/2)+qnorm(1-0.2))/(df$muA-df$muB))^2
df$SP70=(1+1/df$kappa)*(df$rangesd*(qnorm(1-df$alpha/2)+qnorm(1-0.3))/(df$muA-df$muB))^2

df$KP90=(1+1/df$rangekappa)*(df$sd*(qnorm(1-df$alpha/2)+qnorm(1-0.1))/(df$muA-df$muB))^2
df$KP80=(1+1/df$rangekappa)*(df$sd*(qnorm(1-df$alpha/2)+qnorm(1-0.2))/(df$muA-df$muB))^2
df$KP70=(1+1/df$rangekappa)*(df$sd*(qnorm(1-df$alpha/2)+qnorm(1-0.3))/(df$muA-df$muB))^2

library(tidyverse)
library(dygraphs)
df[,c(1,13,12,11)]%>%dygraph(main = "Simulation for MuA", ylab = "A group Sample size",xlab="Mean of A group")%>%
  dySeries("AP70",label="70%") %>%
  dySeries("AP80",label="80%") %>%
  dySeries("AP90",label="90%") %>%
  dyOptions(stepPlot = TRUE,fillGraph = TRUE, fillAlpha = 0.1,colors = RColorBrewer::brewer.pal(3, "Set2"))%>%
  dyRangeSelector(height = 40)
df[,c(2,16,15,14)]%>%dygraph(main = "Simulation for MuB", ylab = "B group Sample size",xlab="Mean of B group")%>%
  dySeries("BP70",label="70%") %>%
  dySeries("BP80",label="80%") %>%
  dySeries("BP90",label="90%") %>%
  dyOptions(stepPlot = TRUE,fillGraph = TRUE, fillAlpha = 0.1,colors = RColorBrewer::brewer.pal(3, "Set2"))%>%
  dyRangeSelector(height = 40)
df[,c(3,17,18,19)]%>%dygraph(main = "Simulation for Sd", ylab = "B group Sample size",xlab="Sd value")%>%
  dySeries("SP70",label="70%") %>%
  dySeries("SP80",label="80%") %>%
  dySeries("SP90",label="90%") %>%
  dyOptions(stepPlot = TRUE,fillGraph = TRUE, fillAlpha = 0.1,colors = RColorBrewer::brewer.pal(3, "Set2"))%>%
  dyRangeSelector(height = 40)
df[,c(4,20,21,22)]%>%dygraph(main = "Simulation for matching ratio", ylab = "B group Sample size",xlab="Matching ratio kappa")%>%
  dySeries("KP70",label="70%") %>%
  dySeries("KP80",label="80%") %>%
  dySeries("KP90",label="90%") %>%
  dyOptions(stepPlot = TRUE,fillGraph = TRUE, fillAlpha = 0.1,colors = RColorBrewer::brewer.pal(3, "Set2"))%>%
  dyRangeSelector(height = 40)

Now you can evaluate every possibilities of sample size for group B, that correspond to many possible conditions of mean, Sd, kappa or Power. Then you make the right decision by optimising the feasibility and statistical power. Larger sample sizes require more time, money and human resources for data collection.

I hope you enjoyed the show. See you next time for more tips on sample size estimation.

References: The formula was taken from:

Chow S, Shao J, Wang H. 2008. Sample Size Calculations in Clinical Research. 2nd Ed. Chapman & Hall/CRC Biostatistics Series. page 58.