Introduction

One of the most common tests in statistics is the t-test, used to determine whether the means of two groups are equal. The test assumes that both groups are sampled from normal distributions with equal variances. The null hypothesis is that the two means are equal, and the alternative is that they are not. Under the null hypothesis, the t-statistic follows a t-distribution with \(n_1 + n_2 - 2\) degrees of freedom, where \(n_1\) and \(n_2\) are the two sample sizes. A widely used modification, Welch's t-test, adjusts the degrees of freedom when the variances are not assumed to be equal. _(Source: http://statistics.berkeley.edu/computing/r-t-tests)_
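For reference, the usual textbook forms of the two statistics described above are written out below, with \(\bar{x}_i\), \(s_i^2\) and \(n_i\) denoting the sample mean, sample variance and size of group \(i\). The first line is the pooled (equal-variance) t-statistic; the second is Welch's statistic with the Welch-Satterthwaite approximation \(\nu\) to its degrees of freedom.

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]

\[ t_W = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \qquad \nu \approx \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \]

When \(s_1^2 = s_2^2\) and \(n_1 = n_2\), \(\nu\) reduces to \(n_1 + n_2 - 2\); otherwise it is smaller.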

Analysis

The ToothGrowth data set records the length of odontoblasts (the cells responsible for tooth growth) in guinea pigs given Vitamin C, with 10 observations for each combination of dosage and delivery mode. There are three dosage levels: 0.5mg, 1.0mg and 2.0mg. There are two delivery modes: orange juice (OJ) and ascorbic acid (VC).

Assumptions

We assume that:

The same 10 guinea pigs received Vitamin C under every combination of delivery mode and dosage. Therefore, we use a paired-sample test.
The guinea pigs are a sample from a normally distributed population of guinea pigs (go ahead, laugh!).
The two groups have equal variances.
There are two groups, based on delivery mode: the OJ (orange juice) group and the VC (ascorbic acid) group.
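The normality and equal-variance assumptions above can be probed informally. The following is a minimal base-R sketch for the 2.0mg groups, using shapiro.test() (Shapiro-Wilk normality test) and var.test() (F test for equal variances) from the stats package. For a paired test, the assumption that matters most is normality of the within-pair differences, so those are checked too, pairing rows by their order in the data as this analysis assumes. With only 10 observations per group these tests have little power, so a large p-value is weak reassurance rather than proof.

```r
# Sketch: informal checks of the normality and equal-variance assumptions
# for the 2.0mg groups, using only base R (stats + datasets).
data(ToothGrowth, package = "datasets")

oj2 <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 2.0]
vc2 <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 2.0]

# Shapiro-Wilk tests: a small p-value is evidence against normality
sw_oj   <- shapiro.test(oj2)
sw_vc   <- shapiro.test(vc2)
# Differences, pairing rows by position (the analysis's pairing assumption)
sw_diff <- shapiro.test(oj2 - vc2)

# F test: a small p-value is evidence against equal variances
ft <- var.test(oj2, vc2)

pvals <- c(shapiro_oj   = sw_oj$p.value,
           shapiro_vc   = sw_vc$p.value,
           shapiro_diff = sw_diff$p.value,
           var_test     = ft$p.value)
print(round(pvals, 3))
```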

Exploratory analysis

We begin with a basic summary of each group:

library(dplyr)

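# Work on a data frame copy of the built-in ToothGrowth data set
tg <- data.frame(ToothGrowth)
# Summary of the OJ (orange juice) group
summary(filter(tg, supp == "OJ"))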

##       len        supp         dose      
##  Min.   : 8.20   OJ:30   Min.   :0.500  
##  1st Qu.:15.53   VC: 0   1st Qu.:0.500  
##  Median :22.70           Median :1.000  
##  Mean   :20.66           Mean   :1.167  
##  3rd Qu.:25.73           3rd Qu.:2.000  
##  Max.   :30.90           Max.   :2.000

# Summary of the VC (ascorbic acid) group
summary(filter(tg, supp == "VC"))

##       len        supp         dose      
##  Min.   : 4.20   OJ: 0   Min.   :0.500  
##  1st Qu.:11.20   VC:30   1st Qu.:0.500  
##  Median :16.50           Median :1.000  
##  Mean   :16.96           Mean   :1.167  
##  3rd Qu.:23.10           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
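The summaries above show ranges and quartiles per delivery mode. A quick look at the spread within each supplement and dose combination is another informal check of the equal-variance assumption; a base-R sketch using tapply():

```r
data(ToothGrowth, package = "datasets")

# Standard deviation of tooth length for each supplement x dose cell;
# rows are supplements (OJ, VC), columns are doses (0.5, 1, 2)
sds <- with(ToothGrowth, tapply(len, list(supp, dose), sd))
print(round(sds, 2))
```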

# Mean tooth length for each supplement and dose combination
summarise(group_by(tg, supp, dose),
          avgLen = mean(len))

## Source: local data frame [6 x 3]
## Groups: supp
## 
##   supp dose avgLen
## 1   OJ  0.5  13.23
## 2   OJ  1.0  22.70
## 3   OJ  2.0  26.06
## 4   VC  0.5   7.98
## 5   VC  1.0  16.77
## 6   VC  2.0  26.14
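The same grouped means can also be computed without dplyr. A base-R sketch using tapply(), which returns a supplement-by-dose matrix:

```r
data(ToothGrowth, package = "datasets")

# Mean tooth length by supplement (rows) and dose (columns)
avg_len <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
print(avg_len)
```

The "2" column holds the 2.0mg means, which should match the 26.06 (OJ) and 26.14 (VC) in the table above.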

We notice that the mean tooth lengths of the OJ and VC groups are nearly identical at a dosage of 2.0mg (26.06 and 26.14). Therefore, we state our null hypothesis as:

At a dosage of 2.0mg, the mean tooth length is the same for both supplements.

Our alternative hypothesis is:

At a dosage of 2.0mg, the mean tooth length is greater when the supplement is changed from orange juice to ascorbic acid.
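Written formally for the paired design assumed in this analysis, let \(d_i\) be the OJ minus VC tooth length for guinea pig \(i\) at 2.0mg, and \(\mu_d\) the population mean of these differences. Ascorbic acid giving longer teeth then corresponds to a negative mean difference. The standard paired t-statistic uses the sample mean \(\bar{d}\) and standard deviation \(s_d\) of the \(n = 10\) differences:

\[ H_0: \mu_d = 0 \qquad H_1: \mu_d < 0 \]

\[ t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \text{df} = n - 1 = 9 \]

Note that R's t.test() tests the two-sided alternative \(H_1: \mu_d \neq 0\) unless the alternative argument is set.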

Tests

Are the sample data consistent with the null hypothesis?

t-test

We apply a paired t-test. For this test, we assume that each guinea pig received the correct dosage and delivery mode. To identify the guinea pigs, we first sort the data by delivery mode and dose, and take the first 30 rows to represent the 10 guinea pigs at the three dosages with orange juice as the delivery mode. The next 30 rows represent the same guinea pigs, in the same order, at the three dosages with ascorbic acid as the delivery mode.
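The pairing described above can be made explicit. This is a minimal base-R sketch, assuming (as the analysis does) that row position within each sorted 2.0mg subset identifies the same guinea pig; the pig column is a hypothetical label introduced only for illustration.

```r
data(ToothGrowth, package = "datasets")

# Sort by delivery mode (factor levels OJ, VC), then dose;
# order() leaves tied rows in their original order
tg_sorted <- ToothGrowth[order(ToothGrowth$supp, ToothGrowth$dose), ]

oj <- tg_sorted[tg_sorted$supp == "OJ" & tg_sorted$dose == 2.0, ]
vc <- tg_sorted[tg_sorted$supp == "VC" & tg_sorted$dose == 2.0, ]

# Hypothetical pig id: simply the position within each subset
paired_df <- data.frame(pig    = seq_len(nrow(oj)),
                        oj_len = oj$len,
                        vc_len = vc$len)
# Within-pair difference, OJ minus VC
paired_df$diff <- paired_df$oj_len - paired_df$vc_len
print(paired_df)
```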

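# Sort by delivery mode, then dose, so row order identifies each guinea pig
tgSorted <- arrange(tg, supp, dose)
# The 2.0mg observations for each delivery mode (10 rows each)
oj <- filter(tgSorted, supp == "OJ" & dose == 2.0)
vc <- filter(tgSorted, supp == "VC" & dose == 2.0)

# Paired t-test on the differences oj$len - vc$len
# (var.equal has no effect when paired = TRUE)
t.test(oj$len, vc$len, paired = TRUE, var.equal = TRUE)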

## 
##  Paired t-test
## 
## data:  oj$len and vc$len
## t = -0.0426, df = 9, p-value = 0.967
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.328976  4.168976
## sample estimates:
## mean of the differences 
##                   -0.08
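A paired t-test is a one-sample t-test on the within-pair differences. The base-R sketch below recomputes the statistic, p-value and 95 percent confidence interval by hand from \(t = \bar{d} / (s_d/\sqrt{n})\) with \(n - 1\) degrees of freedom, and checks them against t.test(). It pairs observations by their order in the data, as the analysis does.

```r
data(ToothGrowth, package = "datasets")

# 2.0mg tooth lengths; rows are paired by position, as in the analysis
oj2 <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 2.0]
vc2 <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 2.0]

d  <- oj2 - vc2           # within-pair differences (OJ minus VC)
n  <- length(d)
se <- sd(d) / sqrt(n)     # standard error of the mean difference

t_manual  <- mean(d) / se
df_manual <- n - 1
p_manual  <- 2 * pt(-abs(t_manual), df = df_manual)   # two-sided p-value
ci_manual <- mean(d) + c(-1, 1) * qt(0.975, df = df_manual) * se

res <- t.test(oj2, vc2, paired = TRUE)
print(c(t = t_manual, df = df_manual, p = p_manual))
```

The hand-computed t should be about -0.0426 on 9 degrees of freedom, matching the output above.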

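Suppose instead that we do not assume equal variances, and rerun the test without var.equal = TRUE. For a paired test, however, R computes a one-sample t-test on the differences, so the var.equal argument has no effect and the output is identical.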

t.test(oj$len,vc$len,paired=TRUE)

## 
##  Paired t-test
## 
## data:  oj$len and vc$len
## t = -0.0426, df = 9, p-value = 0.967
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.328976  4.168976
## sample estimates:
## mean of the differences 
##                   -0.08
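Because a paired test reduces to a one-sample test on the differences, var.equal only matters when the two groups are treated as independent samples. As a contrast (treating the OJ and VC groups as unpaired, which departs from the paired design assumed in this analysis), the sketch below compares the pooled and Welch versions. The pooled test uses \(n_1 + n_2 - 2 = 18\) degrees of freedom, while Welch's approximation gives fewer when the sample variances differ.

```r
data(ToothGrowth, package = "datasets")

oj2 <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 2.0]
vc2 <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 2.0]

# Paired: var.equal is ignored, so both calls give the same result
paired_eq  <- t.test(oj2, vc2, paired = TRUE, var.equal = TRUE)
paired_neq <- t.test(oj2, vc2, paired = TRUE, var.equal = FALSE)

# Unpaired (independent samples): var.equal picks pooled vs Welch
pooled <- t.test(oj2, vc2, var.equal = TRUE)
welch  <- t.test(oj2, vc2, var.equal = FALSE)

print(c(pooled_df = unname(pooled$parameter),
        welch_df  = unname(welch$parameter)))
```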

Conclusion

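With a p-value of \(0.967\), far above the conventional 0.05 significance level, we fail to reject the null hypothesis: the sample data are consistent with equal mean tooth length for both supplements at 2.0mg, and the 95 percent confidence interval for the mean difference (-4.33 to 4.17) contains 0. A large p-value does not prove the null hypothesis; it only means the data give no evidence against it.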
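The alternative hypothesis stated earlier is one-sided (ascorbic acid gives longer teeth), while the tests above use R's default two-sided alternative. A sketch of the matching one-sided paired test: since the differences are OJ minus VC, "VC greater" corresponds to a negative mean difference, hence alternative = "less".

```r
data(ToothGrowth, package = "datasets")

oj2 <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 2.0]
vc2 <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 2.0]

two_sided <- t.test(oj2, vc2, paired = TRUE)
# H1: mean of (OJ - VC) < 0, i.e. VC gives longer teeth
one_sided <- t.test(oj2, vc2, paired = TRUE, alternative = "less")

print(c(two_sided = two_sided$p.value, one_sided = one_sided$p.value))
```

Because the observed t is negative, the one-sided p-value is half the two-sided one (about 0.48), still far above 0.05, so the conclusion does not change.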

Sources

Some very interesting links about p-values:

ToothGrowth Data Analysis

Nagesh Subrahmanyam

Sunday 22 March 2015