by epigenus
December 2014
For this project we are asked to statistically analyze the ToothGrowth dataset provided with base R (in the datasets package). The Tooth Growth experiment documented the effects of vitamin C on the tooth length of guinea pigs as the source and dosage of vitamin C were varied. In this article we set up and perform statistical tests to determine which sources and dosages had a statistically significant effect on tooth length.
The available documentation on the Tooth Growth dataset can be loaded using the help(ToothGrowth) command in R. Also of note: we will make use of Hadley Wickham’s ggplot2 library for graphing.
library(ggplot2)
The documentation describes the experiment: 10 guinea pigs were each given 3 dosages of vitamin C (0.5 mg, 1.0 mg, 2.0 mg) from two sources of vitamin C (orange juice - OJ and ascorbic acid - VC). The tooth length of each guinea pig was then recorded.
A quick look at the data shows that three variables are recorded for each observation: len (tooth length), dose (dosage), and supp (supplement source), for a total of 60 observations. This limited sample size makes the t-distribution the most appropriate model for our data.
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
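As a quick sanity check on the structure described above, the observations can be cross-tabulated by source and dose. This is a minimal sketch; the names `tg` and `cell_counts` are introduced here for illustration and are not part of the original analysis.

```r
# Work on a fresh copy so the sketch does not depend on earlier modifications.
tg <- datasets::ToothGrowth

# Count the observations for every supplement/dose combination.
cell_counts <- table(supp = tg$supp, dose = tg$dose)
print(cell_counts)
```

If every cell holds the same count, the design is balanced, which keeps the group comparisons that follow straightforward.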
We see that the pigs are not labeled in the dataset. This leads us to the necessary assumption that the tooth length data come from independent groups.
We also see that the dose variable is not treated as a classifier (factor). We correct that now for our convenience.
ToothGrowth$dose <- factor(ToothGrowth$dose, levels=c(2.0, 1.0, 0.5))
With only one response variable (tooth length), from one type of population (guinea pigs), being documented, we feel safe assuming constant variance across the independent groups.
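The constant-variance assumption could also be checked against the data. The sketch below is one possible check rather than part of the original analysis: it prints the standard deviation of each source/dose group and runs Bartlett's test, whose null hypothesis is equal variance in all groups. The names `tg`, `group_sd`, and `variance_check` are illustrative.

```r
tg <- datasets::ToothGrowth

# One group per supplement/dose combination (six groups in total).
tg$group <- interaction(tg$supp, tg$dose)

# Standard deviation of tooth length within each group.
group_sd <- tapply(tg$len, tg$group, sd)
print(round(group_sd, 2))

# Bartlett's test: the null hypothesis is equal variance in all groups.
variance_check <- bartlett.test(len ~ group, data = tg)
print(variance_check$p.value)
```

A large p-value here would be consistent with the assumption, although with only a handful of observations per group the test has limited power.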
Finally we wish to get a sense of the data to form some hypotheses. We have not performed an analysis yet, so we use box plots to display the gross effects of dosage and supplement source on the length of the guinea pig teeth.
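The original figure is not reproduced here. As a rough sketch of how such box plots might be drawn with ggplot2, assuming one panel per supplement source (the layout, labels, and the name `tooth_boxplot` are assumptions, not the author's exact plot):

```r
library(ggplot2)

tg <- datasets::ToothGrowth
tg$dose <- factor(tg$dose)  # treat dose as a category on the x axis

# One box per dose, one panel per supplement source.
tooth_boxplot <- ggplot(tg, aes(x = dose, y = len, fill = supp)) +
  geom_boxplot() +
  facet_wrap(~ supp) +
  labs(x = "Dose (mg)", y = "Tooth length", fill = "Source")

print(tooth_boxplot)
```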
From this initial exploration we see that higher dosages seem to have a positive effect on tooth length, while orange juice might be a better source of vitamin C than the ascorbic acid supplement for tooth growth. So let’s state and test these hypotheses.
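For each pairing of groups we will perform a two-sample t-test and examine its 95% confidence interval.
We use the following assumptions: the groups are independent (unpaired), the variance is constant across groups, and the t-distribution is an appropriate model for the data.
Each test will take the form: the null hypothesis is that the two groups have the same mean tooth length, and the alternative hypothesis is that the means differ. We reject the null hypothesis when the 95% confidence interval for the difference in means does not contain zero.
The property of transitivity applies to this data set in the following way: if group A has a greater effect than group B, and group B has a greater effect than group C, then group A has a greater effect than group C.
To test the gross effects of vitamin C dosage on tooth growth (regardless of source), we will run the tests on the following pairs: A (2.0 mg) against B (1.0 mg), and B (1.0 mg) against C (0.5 mg).
To accomplish this we create subsets of the data: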
pairAB <- subset(ToothGrowth, dose %in% c(2.0, 1.0))
pairBC <- subset(ToothGrowth, dose %in% c(1.0, 0.5))
Then we run the t test with our stated assumptions:
# The formula interface compares independent (unpaired) groups by default.
testAB <- t.test(len~dose, var.equal=TRUE, data=pairAB)$conf.int
testBC <- t.test(len~dose, var.equal=TRUE, data=pairBC)$conf.int
testAB
## [1] 3.736 8.994
## attr(,"conf.level")
## [1] 0.95
From the AB pair test we see that the 95% confidence interval does not contain zero. We therefore reject the null hypothesis that the 2.0 mg dose and the 1.0 mg dose had the same effect on tooth length.
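To make the equal-variance interval less of a black box, it can be rebuilt by hand from the pooled variance. The following is a minimal sketch for the 2.0 mg and 1.0 mg groups; the variable names are illustrative and the calculation is the standard pooled two-sample formula, not code from the original analysis.

```r
tg <- datasets::ToothGrowth
len_a <- tg$len[tg$dose == 2.0]  # group A: 2.0 mg
len_b <- tg$len[tg$dose == 1.0]  # group B: 1.0 mg

n_a <- length(len_a)
n_b <- length(len_b)

# Pooled variance under the equal-variance assumption.
pooled_var <- ((n_a - 1) * var(len_a) + (n_b - 1) * var(len_b)) / (n_a + n_b - 2)
std_error  <- sqrt(pooled_var * (1 / n_a + 1 / n_b))

# The 97.5% quantile of the t distribution gives a two-sided 95% interval.
t_quantile <- qt(0.975, df = n_a + n_b - 2)
manual_ci  <- (mean(len_a) - mean(len_b)) + c(-1, 1) * t_quantile * std_error
print(round(manual_ci, 3))
```

The result should agree with the interval that t.test reports when var.equal=TRUE is set.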
testBC
## [1] 6.276 11.984
## attr(,"conf.level")
## [1] 0.95
From the BC pair test we see that the 95% confidence interval does not contain zero. We therefore reject the null hypothesis that the 1.0 mg dose and the 0.5 mg dose had the same effect on tooth length.
Because both intervals lie entirely above zero, transitivity leads us to also reject the null hypothesis that the 2.0 mg and the 0.5 mg doses had the same effect on tooth length.
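The 2.0 mg against 0.5 mg comparison can also be tested directly rather than inferred. A minimal sketch under the same equal-variance assumption follows; `pairAC` and `testAC` are names chosen here to match the existing naming pattern and do not appear in the original.

```r
tg <- datasets::ToothGrowth
tg$dose <- factor(tg$dose, levels = c(2.0, 1.0, 0.5))

# Keep only the highest and lowest doses.
pairAC <- subset(tg, dose %in% c(2.0, 0.5))

# 95% interval for mean(2.0 mg) - mean(0.5 mg).
testAC <- t.test(len ~ dose, var.equal = TRUE, data = pairAC)$conf.int
print(testAC)
```

An interval that excludes zero would support the same conclusion without relying on the transitivity argument.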
Having rejected all three null hypotheses, we favor the alternative hypothesis (with 95% confidence) that, in general, higher doses of vitamin C increase tooth growth in guinea pigs.
To test the gross effects of vitamin C source on tooth growth (regardless of dosage), we will run the test on the following pair: A (orange juice, OJ) against B (ascorbic acid, VC).
We do not need to subset the data because there are already only two sources of vitamin C. So we run the t test with our stated assumptions:
testAB <- t.test(len~supp, var.equal=TRUE, data=ToothGrowth)$conf.int
testAB
## [1] -0.167 7.567
## attr(,"conf.level")
## [1] 0.95
From the AB pair test we see that the 95% confidence interval does contain zero. We therefore fail to reject the null hypothesis that orange juice and ascorbic acid had the same gross effect on tooth length.
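Since this conclusion rests on the equal-variance assumption, one possible sensitivity check is to rerun the comparison with Welch's unequal-variance t-test. This is a sketch for comparison only; `welch_supp` is an illustrative name and the check is not part of the original analysis.

```r
tg <- datasets::ToothGrowth

# var.equal = FALSE (the default) requests Welch's unequal-variance test.
welch_supp <- t.test(len ~ supp, var.equal = FALSE, data = tg)
print(welch_supp$conf.int)
print(welch_supp$p.value)
```

If the Welch interval also contains zero, the conclusion does not hinge on the variance assumption.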
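To test the specific effect of vitamin C source on tooth growth by dosage level, we will run the tests on the following pairs: A (OJ) against B (VC) at 0.5 mg, C (OJ) against D (VC) at 1.0 mg, and E (OJ) against F (VC) at 2.0 mg.
Note that transitivity does not directly apply here.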
To accomplish this we create subsets of the data:
dosepairAB <- subset(ToothGrowth, dose %in% c(0.5))
dosepairCD <- subset(ToothGrowth, dose %in% c(1.0))
dosepairEF <- subset(ToothGrowth, dose %in% c(2.0))
Then we run the t test with our stated assumptions:
dosetestAB <- t.test(len~supp, var.equal=TRUE, data=dosepairAB)$conf.int
dosetestCD <- t.test(len~supp, var.equal=TRUE, data=dosepairCD)$conf.int
dosetestEF <- t.test(len~supp, var.equal=TRUE, data=dosepairEF)$conf.int
dosetestAB
## [1] 1.77 8.73
## attr(,"conf.level")
## [1] 0.95
From the AB pair test we see that the 95% confidence interval does not contain zero. We therefore reject the null hypothesis that orange juice and ascorbic acid had the same effect on tooth length at the 0.5 mg dose.
dosetestCD
## [1] 2.841 9.019
## attr(,"conf.level")
## [1] 0.95
From the CD pair test we see that the 95% confidence interval does not contain zero. We therefore reject the null hypothesis that orange juice and ascorbic acid had the same effect on tooth length at the 1.0 mg dose.
dosetestEF
## [1] -3.723 3.563
## attr(,"conf.level")
## [1] 0.95
From the EF pair test we see that the 95% confidence interval does contain zero. We therefore fail to reject the null hypothesis that orange juice and ascorbic acid had the same effect on tooth length at the 2.0 mg dose.
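The three per-dose tests follow the same pattern, so they could also be run in a single pass. The sketch below splits the data by dose and collects the intervals into a small table; `by_dose` and `dose_intervals` are illustrative names, and this is an alternative arrangement rather than the code used above.

```r
tg <- datasets::ToothGrowth

# Split the data by dose, then compute the OJ - VC interval within each dose.
by_dose <- split(tg, tg$dose)
dose_intervals <- t(sapply(by_dose, function(d) {
  t.test(len ~ supp, var.equal = TRUE, data = d)$conf.int
}))
colnames(dose_intervals) <- c("lower", "upper")
print(round(dose_intervals, 3))
```

Each row corresponds to one dose, and the rows should agree with the three intervals printed above.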
From these results we infer that the effect of vitamin C source on tooth growth depends on the dosage level: orange juice produced greater tooth growth at the 0.5 mg and 1.0 mg doses, but we cannot reject the assertion that the two sources had an equal effect on tooth growth at the highest dosage level tested.
We give a sense of this result graphically below.
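The original figure is not reproduced here. One way such a graphic might be drawn is to plot the mean tooth length of each source against dose, so that the two lines can be compared at each dosage level. This is a sketch under that assumption; the plot type and the names `group_means` and `means_plot` are not taken from the original.

```r
library(ggplot2)

tg <- datasets::ToothGrowth

# Mean tooth length for each supplement/dose combination.
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)

# One line per source; the gap between lines shows the source effect at each dose.
means_plot <- ggplot(group_means, aes(x = dose, y = len, colour = supp)) +
  geom_line() +
  geom_point() +
  labs(x = "Dose (mg)", y = "Mean tooth length", colour = "Source")

print(means_plot)
```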