In this tutorial, we will use the tools available in R and R-Studio softwares for hypothesis testing in the most important and frequent situations.
Consider an investigator that is interested in the proportion (rate) of adverse events after application of a drug. The statistical summary is simple in this case, 35 out of 125 patients have showed at least one adverse event.
prop.test(35,125)
##
## 1-sample proportions test with continuity correction
##
## data: 35 out of 125, null probability 0.5
## X-squared = 23.328, df = 1, p-value = 1.366e-06
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.2052033 0.3685699
## sample estimates:
## p
## 0.28
If the investigator expects the proportion of adverse events to be lower than 20%, in other words, he is willing to test the hypothesis
\(H_0:p=0.2\)
\(H_1: p<0.2\)
prop.test(35,125,p=0.2,alternative = 'less')
##
## 1-sample proportions test with continuity correction
##
## data: 35 out of 125, null probability 0.2
## X-squared = 4.5125, df = 1, p-value = 0.9832
## alternative hypothesis: true p is less than 0.2
## 95 percent confidence interval:
## 0.0000000 0.3543709
## sample estimates:
## p
## 0.28
The function prop.test does simultaneously hypothesis testing and confidence interval evaluation with approximation to the normal distribution. To build exact confidence intervals based on the binomial distribution, someone can use the binom.test function.
binom.test(35,125,alternative = 'less')
##
## Exact binomial test
##
## data: 35 and 125
## number of successes = 35, number of trials = 125, p-value =
## 4.62e-07
## alternative hypothesis: true probability of success is less than 0.5
## 95 percent confidence interval:
## 0.0000000 0.3535727
## sample estimates:
## probability of success
## 0.28
Suppose an oncologist wishes to conduct a Phase II (safety/efficacy) clinical trial to test a new cancer drug. If only 20% of patients will benefit from this drug, she does not wish to continue to study it because drugs with comparable efficacy are already available. Conversely, if at least 40% of patients will benefit from this drug, she wishes to have an 80% chance to reject the null hypothesis and consequently to continue the study. Using a one sided z-test at 5% significance level and 80% power, how many participants should she enroll in this clinical trial?
The sample size formulation for this hypothesis testing is given by:
\(n=\frac{(Z_{1-\frac{\alpha}{2}}\sqrt{p_0(1-p_0)}+Z_{1-\beta}\sqrt{p_1(1-p_1)})^2}{\delta^2}\)
There is no implementation of this formula in the R base.
To overcome this problem, we will write a R function to implement this calculation.
power.one.prop<-function(alpha,beta,p0,p1){
# clinically meaningful difference
delta<-abs(p1-p0)
# standard deviation according to H0
sigma0<-sqrt(p0*(1-p0))
# standard deviation according to H1
sigma1<-sqrt(p1*(1-p1))
# quantile for the significance level
Zalpha<-qnorm(1-alpha)
# quantile for power
Zbeta<-qnorm(1-beta)
# sample size calculation
n<-((Zalpha*sigma0+Zbeta*sigma1)^2)/delta^2
return(n)
}
power.one.prop(alpha=0.05,beta=0.2,p0=0.2,p1=0.4)
## [1] 28.63587
As seen in R Tutorial Part I, we will use dataset: Diabetes in Pima Indian Women.
# loading a library with functions and datasets
library(MASS)
# loading a data set called Pima.tr
data(Pima.tr)
# dimensions of the data frame (number of rows / number of columns)
dim(Pima.tr)
## [1] 200 8
# displaying the first 6 rows of the dataset
head(Pima.tr)
## npreg glu bp skin bmi ped age type
## 1 5 86 68 28 30.2 0.364 24 No
## 2 7 195 70 33 25.1 0.163 55 Yes
## 3 5 77 82 41 35.8 0.156 35 No
## 4 0 165 76 43 47.9 0.259 26 No
## 5 0 107 60 25 26.4 0.133 23 No
## 6 5 97 76 27 35.6 0.378 52 Yes
We will test the hypothesis that the proportions of diabetic Pima Indian Women will differ between lean (bmi<30) and obese (bmi>30).
# create a variable that indicates weight group
Pima.tr$obese<-ifelse(Pima.tr$bmi>30,'obese','lean')
# Frequency table
tab.obese.diabetes<-table(Pima.tr$obese,Pima.tr$type)
tab.obese.diabetes
##
## No Yes
## lean 58 10
## obese 74 58
# Proportions in rows
prop.table(tab.obese.diabetes,margin=1)
##
## No Yes
## lean 0.8529412 0.1470588
## obese 0.5606061 0.4393939
# Proportions in columns
prop.table(tab.obese.diabetes,margin = 2)
##
## No Yes
## lean 0.4393939 0.1470588
## obese 0.5606061 0.8529412
# barplot
barplot(tab.obese.diabetes,beside = TRUE)
test.prop<-prop.test(tab.obese.diabetes)
test.prop
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: tab.obese.diabetes
## X-squared = 15.814, df = 1, p-value = 6.988e-05
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.1618019 0.4228683
## sample estimates:
## prop 1 prop 2
## 0.8529412 0.5606061
Alternatively, we will use the Chi-square test and the Fisher’s Exact Test
chisq.test(tab.obese.diabetes)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: tab.obese.diabetes
## X-squared = 15.814, df = 1, p-value = 6.988e-05
fisher.test(tab.obese.diabetes)
##
## Fisher's Exact Test for Count Data
##
## data: tab.obese.diabetes
## p-value = 3.314e-05
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 2.059697 10.788318
## sample estimates:
## odds ratio
## 4.513701
A placebo-controlled randomized trial proposes to assess the effectiveness of Drug A in curing infants suffering from sepsis. A previous study showed that the proportion of subjects cured by Drug A is 50% and a clinically important difference of 16% as compared to placebo is acceptable. What is the sample size that provides 80% power at 5% significance to detect the clinically important difference?
power.prop.test(p1=0.5,p2=0.34,power=0.8,sig.level = 0.05)
##
## Two-sample comparison of proportions power calculation
##
## n = 148.1896
## p1 = 0.5
## p2 = 0.34
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
Consider in the Pima Indian Women dataset that we want to test the hypothesis that the mean bmi is different from 30.
\(H_0:\mu=30\)
\(H_1:\mu \neq 30\)
t.test(Pima.tr$bmi,mu = 30)
##
## One Sample t-test
##
## data: Pima.tr$bmi
## t = 5.3291, df = 199, p-value = 2.661e-07
## alternative hypothesis: true mean is not equal to 30
## 95 percent confidence interval:
## 31.45521 33.16479
## sample estimates:
## mean of x
## 32.31
The function t.test generates results that can be stored for future use.
mytest<-t.test(Pima.tr$bmi,mu=30)
To see the stored contents of this object called mytest just type:
names(mytest)
## [1] "statistic" "parameter" "p.value" "conf.int" "estimate"
## [6] "null.value" "alternative" "method" "data.name"
Patients with hypertrophic cardiomyopathy (HCM) have enlarged left ventricles (mean 300 g) compared with the general population (mean 120 g). A cardiologist studying a particular genetic mutation that causes HCM wishes to determine whether the mean left ventricular mass of patients with this particular mutation differs from the mean for other patients with HCM. If the true difference equals or exceeds the meaningful difference of \(\delta=10\) g in either direction it is important to reject the null hypothesis of equality (\(\mu=\) 300g).If a past study suggest that \(\sigma=\) 30g, what is the minimum sample size that will guarantee a minimum 5% significance with 90% power ?
power.t.test(delta=10,sd=30,power=0.9,sig.level = 0.05)
##
## Two-sample t test power calculation
##
## n = 190.0991
## delta = 10
## sd = 30
## sig.level = 0.05
## power = 0.9
## alternative = two.sided
##
## NOTE: n is number in *each* group
We will again use the Pima Indian Women example to test the hypothesis that mean bmi is different between diabetic and non-diabetic Indian.
\(H_0:\mu_{diabetic}=\mu_{non diabetic}\)
\(H_1:\mu_{diabetic}\neq \mu_{non diabetic}\)
# boxplot is a graphic that indicates differences
boxplot(Pima.tr$bmi~Pima.tr$type)
# t.test as implemented in R
t.test(Pima.tr$bmi~Pima.tr$type)
##
## Welch Two Sample t-test
##
## data: Pima.tr$bmi by Pima.tr$type
## t = -4.512, df = 171.46, p-value = 1.188e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.224615 -2.044547
## sample estimates:
## mean in group No mean in group Yes
## 31.07424 34.70882
Suppose as investigator wishes to design a pilot study to investigate the effect of a new medication on diastolic blood pressure in hypertensive patients by using a parallel-groups design. He plans to randomly assign patients to receive either the treatment or a placebo. Measurements will be collected at baseline and after 12 weeks’ follow up. The investigator plans to use a two-sided z-test to determine whether the change in the treatment group is different from the placebo at the 5% significance level (\(\alpha = 0.05\)). He wants 90% chance to reject the null hypothesis of equality if the true difference is \(\delta=3\) mmHg in either direction. If past measurements suggest that the common standard deviation of the changes in both groups is \(\sigma=15\) mmHg, what sample size does he need for each group ?
power.t.test(delta = 3,power=0.9,sd=15,type='two.sample')
##
## Two-sample t test power calculation
##
## n = 526.3334
## delta = 3
## sd = 15
## sig.level = 0.05
## power = 0.9
## alternative = two.sided
##
## NOTE: n is number in *each* group
Suppose an infectious disease specialist wishes to estimate the mean CD4 counts among a population of HIV-infected pregnant women before starting treatment. He expects the data to have (approximately ) a Normal distribution with a mean of 500 cells/\(mm^3\) and a standard deviation of 50 cells/\(mm^3\). If he wishes to obtain a 95% confidence interval with a width of 20 cells/\(mm^3\) for the true mean, show that he should enroll at least n=97 subjects in this study.
Show graphically the increase in power \(1-\beta\) according to the increase in a) significance level, b) increase in sample size, c) increase in th meaningful difference (\(\delta\)).
Do the plots for one-sided and two-sided hypothesis considering the study of one population mean assuming that the outcome is normally distributed.
Suppose a biochemist wishes to study homocysteine levels in blood specimens from men older than 50 who have cardiovascular disease. The mean serum homocysteine level among these men is 14\(\mu mol/L\) before treatment, and she wants an 80% chance to reject the null hypothesis of no change if the mean serum homocysteine level drops to 12 \(\mu mol/L\) after these men take folate tablets for 10 weeks. She plans to use a one sided z-test at the 5% significance level. If the standard deviation is 4\(\mu mol/L\), show that she needs to enroll at least \(n=25\) patients in this study.
A psychologist wishes to design a randomized parallel-groups trial to compare the impact of white noise and classical music on the performance of college students on a 200 question problem solving test. On the basis of ther knowledge of past students, she expects the white noise group to have a bell-shaped distribution with a mean of a 120 points and a standard deviation of 15 points. She plans to compare the performance of the students in each group with a two-sided z-test at the 5% significance level. She wants 90% power to detect a better performance of the classical music group by 10 points. What are the sample sizes if : - the standard deviation for the music group is also 15 points - the standard deviation of the music group is 30 points - the standard deviation of the music group is 30 points and she wishes to enroll twice as many students in the music group as in the white noise group.