One of the most common tests in statistics is the t-test, used to determine whether the means of two groups are equal. The test assumes that both groups are sampled from normal distributions with equal variances. The null hypothesis is that the two means are equal, and the alternative is that they are not. Under the null hypothesis, the t-statistic follows a t-distribution with \(n_1 + n_2 - 2\) degrees of freedom. A widely used modification, Welch's t-test, adjusts the number of degrees of freedom when the variances are not thought to be equal. _(Source: http://statistics.berkeley.edu/computing/r-t-tests)_
The length of odontoblasts (the cells responsible for tooth growth) in each of 10 guinea pigs is recorded for each combination of vitamin C dosage and delivery mode. There are three dosage levels (0.5 mg, 1.0 mg and 2.0 mg) and two delivery modes: orange juice (OJ) and ascorbic acid (VC).
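For reference, the two statistics mentioned above can be written out as follows. The notation is an assumption introduced here and is not taken from the source: \(\bar{x}_i\), \(s_i^2\) and \(n_i\) denote the sample mean, sample variance and sample size of group \(i\).

\[
t_{\text{pooled}} = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
\]

\[
t_{\text{Welch}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
\]

The pooled statistic is compared against a t-distribution with \(n_1 + n_2 - 2\) degrees of freedom, while Welch's statistic uses the approximate degrees of freedom \(\nu\).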
We assume that,
We begin with a basic summary of the groups:
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
tg<-data.frame(ToothGrowth)
summary(filter(tg,supp == "OJ"))
## len supp dose
## Min. : 8.20 OJ:30 Min. :0.500
## 1st Qu.:15.53 VC: 0 1st Qu.:0.500
## Median :22.70 Median :1.000
## Mean :20.66 Mean :1.167
## 3rd Qu.:25.73 3rd Qu.:2.000
## Max. :30.90 Max. :2.000
summary(filter(tg,supp == "VC"))
## len supp dose
## Min. : 4.20 OJ: 0 Min. :0.500
## 1st Qu.:11.20 VC:30 1st Qu.:0.500
## Median :16.50 Median :1.000
## Mean :16.96 Mean :1.167
## 3rd Qu.:23.10 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
library(dplyr)
tg<-data.frame(ToothGrowth)
summarise(group_by(tg,supp,dose),
avgLen=mean(len))
## Source: local data frame [6 x 3]
## Groups: supp
##
## supp dose avgLen
## 1 OJ 0.5 13.23
## 2 OJ 1.0 22.70
## 3 OJ 2.0 26.06
## 4 VC 0.5 7.98
## 5 VC 1.0 16.77
## 6 VC 2.0 26.14
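Since the t-test depends on group sizes and spread as well as on means, it can help to look at all three per cell. The following is a minimal base-R sketch of that idea; the name `cell_stats` is hypothetical and chosen only for illustration.

```r
# Per-cell size, mean and standard deviation of tooth length (base R only).
tg <- data.frame(ToothGrowth)

# aggregate() returns a matrix-valued `len` column here;
# do.call(data.frame, ...) flattens it into len.n, len.mean and len.sd.
cell_stats <- do.call(data.frame, aggregate(
  len ~ supp + dose,
  data = tg,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
))

cell_stats
```

Each of the six supplement and dose combinations should appear as one row.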
We notice that the mean tooth lengths in the OJ and VC groups are similar at the 2.0 mg dosage (26.06 versus 26.14). Therefore, we state our null hypothesis as
With 2.0 mg of dosage, the average tooth length is the same for both supplements.
Our alternative hypothesis is
With 2.0 mg of dosage, the average tooth length improves when the supplement is changed from orange juice to ascorbic acid. Note that the t.test calls below use R's default two-sided alternative (true difference in means not equal to 0).
Do the sample data support the null hypothesis?
We will apply a paired-sample t-test. For this test, we assume that each guinea pig has been administered the correct dosage and delivery mode. To match up the guinea pigs, we first sort the data by delivery mode and dose. The first 30 rows then denote the 10 guinea pigs at the three dosages with orange juice as the delivery mode. The second 30 rows denote the same guinea pigs, in the same order, at the three dosages with ascorbic acid as the delivery mode.
library(dplyr)
tg<-data.frame(ToothGrowth)
tgSorted<-arrange(tg,
supp,
dose)
oj<-filter(tgSorted,supp=="OJ" & dose==2.0)
vc<-filter(tgSorted,supp=="VC" & dose==2.0)
t.test(oj$len,vc$len,paired=TRUE,var.equal = TRUE)
##
## Paired t-test
##
## data: oj$len and vc$len
## t = -0.0426, df = 9, p-value = 0.967
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.328976 4.168976
## sample estimates:
## mean of the differences
## -0.08
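The paired statistic reported above can be reproduced by hand from the within-pair differences. This is a minimal sketch in base R under the same pairing assumption as the text (row i of the OJ group and row i of the VC group are taken to be the same animal); the names `oj_len`, `vc_len`, `t_manual`, `p_two_sided` and `p_less` are hypothetical.

```r
# Paired t-test by hand for the 2.0 mg dose (base R only).
tg <- data.frame(ToothGrowth)
oj_len <- tg$len[tg$supp == "OJ" & tg$dose == 2.0]
vc_len <- tg$len[tg$supp == "VC" & tg$dose == 2.0]

d  <- oj_len - vc_len          # within-pair differences (assumes matched rows)
n  <- length(d)
df <- n - 1                    # degrees of freedom of a paired test

# t = mean difference divided by its standard error
t_manual <- mean(d) / (sd(d) / sqrt(n))

# Two-sided p-value, matching t.test's default alternative
p_two_sided <- 2 * pt(-abs(t_manual), df = df)

# One-sided p-value for the directional alternative "VC mean > OJ mean",
# i.e. mean(OJ - VC) < 0
p_less <- pt(t_manual, df = df)

# Cross-check against the built-in test
res <- t.test(oj_len, vc_len, paired = TRUE)
c(t_manual = t_manual, t_builtin = unname(res$statistic))
```

The one-sided value corresponds to calling `t.test` with `alternative = "less"`, which is closer to the alternative hypothesis as worded earlier than the two-sided default.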
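Suppose we instead assume that the variances are not equal, and apply the t-test as shown below. Because a paired test operates only on the within-pair differences, the var.equal argument has no effect here, and the output is identical to the previous one.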
library(dplyr)
tg<-data.frame(ToothGrowth)
tgSorted<-arrange(tg,
supp,
dose)
oj<-filter(tgSorted,supp=="OJ" & dose==2.0)
vc<-filter(tgSorted,supp=="VC" & dose==2.0)
t.test(oj$len,vc$len,paired=TRUE)
##
## Paired t-test
##
## data: oj$len and vc$len
## t = -0.0426, df = 9, p-value = 0.967
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.328976 4.168976
## sample estimates:
## mean of the differences
## -0.08
With a high p-value of \(0.967\), we fail to reject the null hypothesis: the sample data are consistent with equal average tooth length for the two delivery modes at the 2.0 mg dosage.
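The `var.equal` argument only changes the result of an unpaired test. As a rough sketch, the comparison below treats the two 2.0 mg groups as independent samples. That is an assumption made here purely for illustration and differs from the paired design used in this analysis; the names `pooled`, `welch`, `paired_a` and `paired_b` are hypothetical.

```r
# Pooled versus Welch t-test on the 2.0 mg groups, treated as independent samples.
tg <- data.frame(ToothGrowth)
oj_len <- tg$len[tg$supp == "OJ" & tg$dose == 2.0]
vc_len <- tg$len[tg$supp == "VC" & tg$dose == 2.0]

# Classic two-sample test: pooled variance, n1 + n2 - 2 degrees of freedom
pooled <- t.test(oj_len, vc_len, paired = FALSE, var.equal = TRUE)

# Welch's test: separate variances, adjusted (usually fractional) degrees of freedom
welch <- t.test(oj_len, vc_len, paired = FALSE, var.equal = FALSE)

# With equal group sizes the two t-statistics coincide; only the df differ.
c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter))

# For a paired test, var.equal makes no difference at all.
paired_a <- t.test(oj_len, vc_len, paired = TRUE, var.equal = TRUE)
paired_b <- t.test(oj_len, vc_len, paired = TRUE)
identical(paired_a$p.value, paired_b$p.value)
```

Comparing the two sets of degrees of freedom shows how much Welch's adjustment costs when the group variances differ.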