Joel Correa da Rosa
March 29th, 2017
There are two main goals for sample size calculation
When estimating the mean of a population from a sample of size \( n \), the \( 1-\alpha \) confidence interval (with known standard deviation \( \sigma \)) is written as
\( \bar{x}\pm Z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}} \)
The goal is to find the smallest sample size \( n \) such that the width of the interval does not exceed a chosen value \( w \):
\( 2Z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\leq w \)
Equivalently,
\( n\geq 4Z_{1-\frac{\alpha}{2}}^2\frac{\sigma^2}{w^2} \)
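The step from the width condition to the bound on \( n \) is a simple rearrangement. All quantities are positive, so squaring preserves the inequality:
\( 2Z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\leq w \;\Longleftrightarrow\; \sqrt{n}\geq \frac{2Z_{1-\frac{\alpha}{2}}\sigma}{w} \;\Longleftrightarrow\; n\geq 4Z_{1-\frac{\alpha}{2}}^2\frac{\sigma^2}{w^2} \)
Note that \( w \) is the full width of the interval, that is, twice the margin of error.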
A clinician wishes to estimate the mean serum albumin level in a specific population of patients with primary biliary cirrhosis of the liver. The goal is to obtain a tight confidence interval around the estimated mean. An earlier study found a mean of 35 g/L and a standard deviation of 6 g/L. If she wishes to estimate the mean in a new population with a 95% confidence interval of width \( w= \) 4 g/L, how many patients should she enroll in this study?
# standard deviation
sigma<-6
# precision (desired width of the confidence interval)
w<-4
# significance level
alpha<-0.05
# quantile for a significance level
Zalpha<-qnorm(1-alpha/2)
# sample size calculation
n<-4*Zalpha^2*sigma^2/(w^2)
n
[1] 34.57313
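Because a sample size must be a whole number, the value above is rounded up. The following is a minimal sketch that wraps the calculation in a helper; the function name `n_ci_mean` and its argument names are illustrative and not part of the original material.

```r
# Hypothetical helper: smallest n giving a (1 - alpha) confidence interval
# for a mean with total width at most w, assuming sigma is known
n_ci_mean <- function(sigma, w, alpha = 0.05) {
  Zalpha <- qnorm(1 - alpha / 2)           # two-sided Normal quantile
  ceiling(4 * Zalpha^2 * sigma^2 / w^2)    # round up to the next integer
}

n_albumin <- n_ci_mean(sigma = 6, w = 4)   # 34.57313 rounded up to 35
n_albumin
```

Rounding up rather than to the nearest integer ensures that the interval is no wider than \( w \).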
For binomial proportions, the mean \( \bar{x} \) is replaced by the sample proportion \( \hat{p} \), and the variance \( \sigma^2 \) used for the confidence interval around the mean is replaced by \( p(1-p) \). As a consequence, the sample size needed for a \( 1-\alpha \) confidence interval of width \( w \) is obtained as:
\( n\geq 4Z_{1-\frac{\alpha}{2}}^2\frac{p(1-p)}{w^2} \)
Important: This sample size relies on the Normal approximation to the binomial distribution, which is justified by the Central Limit Theorem.
Suppose a hematologist wishes to estimate the prevalence of Factor V Leiden among patients treated for a deep vein thrombosis. On the basis of past studies, she expects this prevalence to be approximately 25%. If she wishes to obtain a 95% confidence interval with a width of 0.1 (on average) for this prevalence, what will be the minimum number of subjects to be enrolled?
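# standard deviation for a proportion, using the conservative value p = 0.5
# (p = 0.5 maximizes p*(1-p); the expected prevalence in the example is 0.25)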
sigma<-sqrt(0.5*(0.5))
# width of the confidence interval
w<-0.1
# significance level
alpha<-0.05
# quantile for a significance level
Zalpha<-qnorm(1-alpha/2)
# sample size calculation
n<-4*Zalpha^2*sigma^2/(w^2)
n
[1] 384.1459
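The code above plugs in \( p=0.5 \), the value that maximizes \( p(1-p) \) and therefore gives the most conservative sample size. As a sketch, assuming the expected prevalence of 25% is used instead, the same formula can be evaluated as follows. The helper name `n_ci_prop` is illustrative.

```r
# Hypothetical helper: sample size for a (1 - alpha) confidence interval
# of total width w around a proportion assumed to be close to p
n_ci_prop <- function(p, w, alpha = 0.05) {
  Zalpha <- qnorm(1 - alpha / 2)
  4 * Zalpha^2 * p * (1 - p) / w^2
}

n_worst    <- n_ci_prop(p = 0.5,  w = 0.1)  # reproduces the value above
n_expected <- n_ci_prop(p = 0.25, w = 0.1)  # smaller, since 0.25*0.75 < 0.5*0.5

# round both up to whole subjects
ceiling(c(worst_case = n_worst, expected = n_expected))
```

Using \( p=0.5 \) is a common safeguard when the prior guess for the prevalence is uncertain, at the cost of enrolling more subjects.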
Unlike confidence intervals, hypothesis testing is subject to decision errors.
Type-I Error (rejecting a true \( H_0 \)): Medical practice may switch, with resultant costs.
Type-II Error (failing to reject a false \( H_0 \)): Medical practice would remain unaltered.
Sample size evaluation aims to keep the probabilities of both errors acceptably small.
\( \alpha = Pr(\text{Type-I Error}) = Pr(\text{Reject } H_0 \mid H_0 \text{ is true}) \)
\( \beta = Pr(\text{Type-II Error}) = Pr(\text{Accept } H_0 \mid H_1 \text{ is true}) \)
Power: \( 1-\beta = Pr(\text{Reject } H_0 \mid H_1 \text{ is true}) \)
Consider the following hypotheses about a population mean:
\( H_0:\mu=\mu_0 \)
\( H_1:\mu=\mu_1 \)
Denote by \( \delta = \mu_1-\mu_0 \) the scientifically or clinically meaningful difference. The decision about this hypothesis test is based on the statistic:
\( z = \frac{\bar{x}-\mu_0}{\frac{\sigma}{\sqrt{n}}} \)
If the significance level \( \alpha \) and the type-II error \( \beta \) are fixed, the required sample size is given by:
\( n=\frac{(Z_{1-\frac{\alpha}{2}}+Z_{1-\beta})^2\sigma^2}{\delta^2} \)
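A sketch of where this formula comes from, assuming \( \delta>0 \) and neglecting the small probability of rejecting in the wrong tail: under \( H_1 \) the statistic \( z \) is Normal with mean \( \frac{\delta\sqrt{n}}{\sigma} \) and unit variance, so
\( 1-\beta \approx Pr(z > Z_{1-\frac{\alpha}{2}} \mid H_1) = \Phi\left(\frac{\delta\sqrt{n}}{\sigma}-Z_{1-\frac{\alpha}{2}}\right) \)
where \( \Phi \) denotes the standard Normal distribution function. Setting the argument of \( \Phi \) equal to \( Z_{1-\beta} \) gives
\( \frac{\delta\sqrt{n}}{\sigma} = Z_{1-\frac{\alpha}{2}}+Z_{1-\beta} \)
and solving for \( n \) yields the expression above.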
Patients with hypertrophic cardiomyopathy (HCM) have enlarged left ventricles (mean 300 g) compared with the general population (mean 120 g). A cardiologist studying a particular genetic mutation that causes HCM wishes to determine whether the mean left ventricular mass of patients with this particular mutation differs from the mean for other patients with HCM. If the true difference equals or exceeds the meaningful difference of \( \delta=10 \) g in either direction, it is important to reject the null hypothesis of equality (\( \mu= \) 300 g). If a past study suggests that \( \sigma= \) 30 g, what is the minimum sample size required for a two-sided test at the 5% significance level with 90% power?
# clinically meaningful difference
delta<-10
# standard deviation
sigma<-30
# significance level
alpha<-0.05
# type-II error probability
beta<-0.1
# quantile for the significance level
Zalpha<-qnorm(1-alpha/2)
# quantile for power
Zbeta<-qnorm(1-beta)
# sample size calculation
n<-(((Zalpha+Zbeta)^2)*sigma^2)/delta^2
n
[1] 94.56681
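As a sanity check, the rounded-up sample size can be plugged back into the power formula. The sketch below uses a hypothetical helper `power_z`, which approximates the power of the two-sided test by ignoring the negligible chance of rejecting in the wrong tail.

```r
# Hypothetical helper: approximate power of the two-sided one-sample z-test
# for a true difference delta, known sigma and sample size n
power_z <- function(n, delta, sigma, alpha = 0.05) {
  pnorm(abs(delta) * sqrt(n) / sigma - qnorm(1 - alpha / 2))
}

n_required <- ceiling(94.56681)   # round the computed sample size up

power_at_n      <- power_z(n_required,     delta = 10, sigma = 30)
power_one_fewer <- power_z(n_required - 1, delta = 10, sigma = 30)
c(power_at_n = power_at_n, power_one_fewer = power_one_fewer)
```

With one subject fewer the power falls just short of the 90% target, which is why the result is always rounded up.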
We can again adapt the sample size calculation derived for the hypothesis test about a population mean to the case where we are interested in making inference about a proportion.
With \( H_0:p=p_0 \), \( H_1:p=p_1 \) and \( \delta=p_1-p_0 \), the minimum sample size for a fixed significance level and power is obtained by:
\( n=\frac{(Z_{1-\frac{\alpha}{2}}\sqrt{p_0(1-p_0)}+Z_{1-\beta}\sqrt{p_1(1-p_1)})^2}{\delta^2} \)
Note: use \( Z_{1-\frac{\alpha}{2}} \) for two-sided tests and \( Z_{1-\alpha} \) for one-sided tests.
Suppose an oncologist wishes to conduct a Phase II (safety/efficacy) clinical trial to test a new cancer drug. If only 20% of patients will benefit from this drug, she does not wish to continue to study it because drugs with comparable efficacy are already available. Conversely, if at least 40% of patients will benefit from this drug, she wishes to have an 80% chance to reject the null hypothesis and consequently to continue the study. Using a one-sided z-test at the 5% significance level with 80% power, how many participants should she enroll in this clinical trial?
# clinically meaningful difference
delta<-0.2
# standard deviation according to H0
sigma1<-sqrt(0.2*0.8)
# standard deviation according to H1
sigma2<-sqrt(0.4*0.6)
# significance level
alpha<-0.05
# type-II error probability
beta<-0.2
# quantile for the significance level
Zalpha<-qnorm(1-alpha)
# quantile for power
Zbeta<-qnorm(1-beta)
# sample size calculation
n<-((Zalpha*sigma1+Zbeta*sigma2)^2)/delta^2
n
[1] 28.63587
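The same calculation can be packaged so that the one-sided and two-sided quantiles are chosen explicitly. This is a minimal sketch; the function name `n_prop_test` and the `two_sided` flag are illustrative choices, not part of the original material.

```r
# Hypothetical helper: sample size for a one-sample z-test of a proportion,
# H0: p = p0 against H1: p = p1
n_prop_test <- function(p0, p1, alpha = 0.05, beta = 0.2, two_sided = FALSE) {
  # alpha/2 in each tail for a two-sided test, alpha in one tail otherwise
  Zalpha <- if (two_sided) qnorm(1 - alpha / 2) else qnorm(1 - alpha)
  Zbeta  <- qnorm(1 - beta)
  (Zalpha * sqrt(p0 * (1 - p0)) + Zbeta * sqrt(p1 * (1 - p1)))^2 / (p1 - p0)^2
}

n_one_sided <- n_prop_test(p0 = 0.2, p1 = 0.4)                    # as above
n_two_sided <- n_prop_test(p0 = 0.2, p1 = 0.4, two_sided = TRUE)  # larger
ceiling(c(one_sided = n_one_sided, two_sided = n_two_sided))
```

Rounding up, the one-sided design calls for 29 participants; a two-sided test at the same level would need more.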
If we have a sample of pairs \( (x,y) \), each one coming from the same subject, consider the two hypotheses about the mean difference \( \mu_d \):
\( H_0:\mu_d = \mu_0 \)
\( H_1:\mu_d = \mu_1 \)
Again, let \( \delta = \mu_1-\mu_0 \) denote the scientifically or clinically meaningful difference and \( \bar{d}=\bar{x}-\bar{y} \) the sample mean of the within-subject differences. The test statistic for this setup is given by:
\( z = \frac{\bar{d}-\mu_0}{\frac{\sigma_d}{\sqrt{n}}} \)
The required sample size is given by:
\( n=\frac{(Z_{1-\frac{\alpha}{2}}+Z_{1-\beta})^2\sigma_d^2}{\delta^2} \)
Suppose an investigator wishes to design a pilot study to investigate the effect of a new medication on diastolic blood pressure in hypertensive patients. He plans to take two measurements of each subject, one measurement at baseline when the subject has not yet taken the medication (\( x \)), followed by a second measurement when the subject has been taking the medication for 12 weeks (\( y \)). Past laboratory measurements suggest that \( \sigma_x=\sigma_y= \) 20 mm Hg. The investigator wishes to perform a two-sided paired z-test at the 5% significance level regarding whether there is a change in average diastolic blood pressure on the new medication. He wants a 90% chance to reject the null hypothesis of equality if the true difference is \( \delta= \) 3 mm Hg in either direction. If past measurements suggest that the standard deviation of the difference is \( \sigma_d= \) 15 mm Hg, what sample size does he need?
# clinically meaningful difference
delta<-3
# standard deviation for the difference
sigma_d<-15
# significance level
alpha<-0.05
# type-II error probability
beta<-0.1
# quantile for the significance level
Zalpha<-qnorm(1-alpha/2)
# quantile for power
Zbeta<-qnorm(1-beta)
# sample size calculation
n<-(((Zalpha+Zbeta)^2)*sigma_d^2)/delta^2
n
[1] 262.6856
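The example quotes both the individual standard deviations and the standard deviation of the difference. These are linked through the correlation \( \rho \) between the two measurements by the standard identity \( \sigma_d^2=\sigma_x^2+\sigma_y^2-2\rho\sigma_x\sigma_y \). The sketch below, with illustrative variable names, recovers the correlation implied by the stated values and contrasts the paired sample size with the one that would result if the two measurements were uncorrelated.

```r
sigma_x <- 20
sigma_y <- 20
sigma_d <- 15
delta   <- 3

# correlation implied by sigma_d^2 = sigma_x^2 + sigma_y^2 - 2*rho*sigma_x*sigma_y
rho <- (sigma_x^2 + sigma_y^2 - sigma_d^2) / (2 * sigma_x * sigma_y)
rho  # 0.71875

# common factor (Z_{1-alpha/2} + Z_{1-beta})^2 for alpha = 0.05, beta = 0.1
z_factor <- (qnorm(1 - 0.05 / 2) + qnorm(1 - 0.1))^2

n_paired       <- z_factor * sigma_d^2 / delta^2
n_uncorrelated <- z_factor * (sigma_x^2 + sigma_y^2) / delta^2  # rho = 0
ceiling(n_paired)  # 263
```

The stronger the positive correlation between the before and after measurements, the smaller \( \sigma_d \) becomes and the fewer subjects the paired design requires.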
We will use R to create a function that computes the sample size for Example 05 for different values of \( \beta \), the type-II error probability.
# a function to evaluate the sample size as a function of the power
eval.n <- function(beta, alpha = 0.05, sigma_d = 15, delta = 3) {
  # quantile for the significance level
  Zalpha <- qnorm(1 - alpha / 2)
  # quantile for power
  Zbeta <- qnorm(1 - beta)
  # sample size calculation
  n <- (((Zalpha + Zbeta)^2) * sigma_d^2) / delta^2
  return(n)
}
beta <- seq(0.5, 0.001, -0.0001)
plot(1 - beta, eval.n(beta), xlab = 'power', ylab = 'sample size')
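Alongside the plot, a few values read off the curve can be tabulated. The sketch below is self-contained and parameterized by power rather than by \( \beta \); the name `n_paired_z` and the chosen power levels are illustrative.

```r
# Hypothetical helper: same formula as above, written in terms of power = 1 - beta
n_paired_z <- function(power, alpha = 0.05, sigma_d = 15, delta = 3) {
  (qnorm(1 - alpha / 2) + qnorm(power))^2 * sigma_d^2 / delta^2
}

power_grid <- c(0.80, 0.90, 0.95)

# sample sizes rounded up to whole subjects
n_table <- data.frame(power = power_grid,
                      n     = ceiling(n_paired_z(power_grid)))
n_table
```

The table gives 197, 263 and 325 subjects for 80%, 90% and 95% power, which illustrates how quickly the required sample size grows as the power approaches 1.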
Suppose an infectious disease specialist wishes to estimate the mean CD4 count among a population of HIV-infected pregnant women before starting treatment. He expects the data to have (approximately) a Normal distribution with a mean of 500 cells/\( mm^3 \) and a standard deviation of 50 cells/\( mm^3 \). If he wishes to obtain a 95% confidence interval with a width of 20 cells/\( mm^3 \) for the true mean, show that he should enroll at least \( n=97 \) subjects in this study.
Show graphically how the power \( 1-\beta \) increases with an increase in a) the significance level, b) the sample size, c) the meaningful difference (\( \delta \)).
Do the plots for one-sided and two-sided hypotheses, considering the study of one population mean and assuming that the outcome is normally distributed.
Suppose a biochemist wishes to study homocysteine levels in blood specimens from men older than 50 who have cardiovascular disease. The mean serum homocysteine level among these men is 14 \( \mu mol/L \) before treatment, and she wants an 80% chance to reject the null hypothesis of no change if the mean serum homocysteine level drops to 12 \( \mu mol/L \) after these men take folate tablets for 10 weeks. She plans to use a one-sided z-test at the 5% significance level. If the standard deviation is 4 \( \mu mol/L \), show that she needs to enroll at least \( n=25 \) patients in this study.
Suppose a nutritionist wishes to study the weight change among obese men (BMI>30) on a 16-week low-fat diet, complemented by daily exercise. Assume that the standard deviations of the before and after weights are both 25kg whereas the standard deviation of the differences is 15kg. She plans to use a two-sided paired z-test at the 5% significance level. She wishes to have a 90% chance to reject the null hypothesis of equality when the true change in weight (in either direction) is 8 kg. Find the minimum sample size to power her study.
Describe a problem of potential interest to you that requires sample size evaluation (sample size calculation or power analysis), and enumerate the elements that would enable the calculation.