The ToothGrowth data set measures the growth of teeth in guinea pigs in response to being given varying doses of Vitamin C via two delivery methods, orange juice or ascorbic acid.
This analysis provides an answer to these questions:
We found that:
For a 2.0 mg dosage, there was no significant difference in tooth growth for the guinea pigs given orange juice versus those given ascorbic acid.
For dosages lower than 2.0 mg, guinea pigs given orange juice had longer teeth on average (18.0 mm) than those given ascorbic acid (12.4 mm).
Guinea pigs given 1.0 mg doses had longer teeth on average (19.7 mm) than those given 0.5 mg doses (10.6 mm).
The details of the analysis are described in the following sections.
The ToothGrowth data set is a standard data set in R. Let’s load the data set:
require("datasets")
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
And look at an excerpt of the documentation for more information on the columns:
?ToothGrowth
The response is the length of odontoblasts (teeth) in each of 10 guinea pigs at
each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery
methods (orange juice or ascorbic acid).
A data frame with 60 observations on 3 variables:
[,1] len numeric Tooth length
[,2] supp factor Supplement type (VC or OJ)
[,3] dose numeric Dose in milligrams
Note:
Tooth length is the dependent variable. Let’s look at how many samples we have for the independent variables, supplement type and dosage:
table(ToothGrowth$supp, ToothGrowth$dose)
##
## 0.5 1 2
## OJ 10 10 10
## VC 10 10 10
We see there are 10 samples for each combination of dosage and supplement type, 20 samples for each dosage, and 30 samples for each supplement type. Ten observations per group is a small sample, so let’s keep that in mind in the later analysis.
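As a quick check of those totals, the sketch below adds row and column sums to the same table using `addmargins()` from base R. The variable names `counts` and `counts_with_totals` are illustrative choices, not part of the original analysis.

```r
library(datasets)

# Cross-tabulate supplement type against dosage, as in the table above
counts <- table(ToothGrowth$supp, ToothGrowth$dose)

# Append a "Sum" row and a "Sum" column holding the marginal totals
counts_with_totals <- addmargins(counts)
print(counts_with_totals)
```

The "Sum" column gives the number of samples per supplement type, and the "Sum" row gives the number of samples per dosage.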
We can analyze each combination of supplement type and dosage separately, or we can analyze a single variable at an aggregate level, ignoring the other variable (for example, comparing supplement types while ignoring dosage levels). Again, let’s keep this in mind.
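To make the two levels of analysis concrete, here is a minimal sketch that computes mean tooth length per supplement and dosage pair, and then per supplement type alone. It uses `aggregate()` from base R; the names `by_pair` and `by_supp` are hypothetical and only serve this illustration.

```r
library(datasets)

# Mean tooth length for every (supplement type, dosage) pair: 6 groups of 10
by_pair <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(by_pair)

# Mean tooth length per supplement type, ignoring dosage: 2 groups of 30
by_supp <- aggregate(len ~ supp, data = ToothGrowth, FUN = mean)
print(by_supp)
```

The aggregate view has more samples per group, but it hides any differences between dosage levels within a supplement type.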
Let’s look at a box plot of tooth length broken down by supplement type and dosage:
require(ggplot2)
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=supp)) +
  geom_boxplot() +
  ggtitle('Tooth Length by Supplement Type and Dosage') +
  xlab('Dosage (mg)') +
  ylab('Tooth Length (mm)') +
  guides(fill=guide_legend(title='Supplement Type'))
From this plot, we see:
I know this is a somewhat arbitrary decision, but:
Based on those reasons, I’m going to exclude the 2.0 mg dosage data from the rest of the analysis:
require(dplyr)
myToothData <- ToothGrowth %>% filter(dose != 2)
table(myToothData$supp, myToothData$dose)
##
## 0.5 1
## OJ 10 10
## VC 10 10
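The 2.0 mg rows set aside here can still be examined on their own. As a rough sketch, a Welch’s t-test restricted to those rows checks whether the two supplement types differ at that dosage. The name `highDoseData` is a hypothetical helper, not part of the original analysis, and base R’s `subset()` is used so the example does not depend on dplyr.

```r
library(datasets)

# Keep only the 2.0 mg observations (10 per supplement type)
highDoseData <- subset(ToothGrowth, dose == 2)

# Welch's t-test: unpaired samples, variances not assumed equal
highDoseTest <- t.test(len ~ supp, paired = FALSE, var.equal = FALSE,
                       data = highDoseData)
print(highDoseTest)
```

A p-value above 5% here, together with a confidence interval that contains 0, would be consistent with the statement that there is no significant difference between the supplements at 2.0 mg.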
For this analysis, we’re going to do a Welch’s t-test. We make the following assumptions (from the Wikipedia entry for Welch’s t-test):
We have seen that the supplement type appears to have an impact on tooth growth. Now let’s do a formal test to see whether we can legitimately make that claim.
We will compare the mean tooth length for each of the two supplement types, OJ and VC, across the two remaining dosage levels (0.5 mg and 1.0 mg).
For this test:
Let’s do the test (note we’re saying the two sets are not paired and that the variances are not equal):
t.test(len ~ supp, paired=FALSE, var.equal=FALSE, data=myToothData)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.0503, df = 36.553, p-value = 0.004239
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.875234 9.304766
## sample estimates:
## mean in group OJ mean in group VC
## 17.965 12.375
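The output of the test states that the p-value is 0.004239 and that the 95% confidence interval for the difference in means (OJ minus VC) runs from 1.88 mm to 9.30 mm.
So, since the 95% confidence interval does not contain 0 and the p-value is less than 5%, we reject the null hypothesis and conclude that, for dosages lower than 2.0 mg, guinea pigs given orange juice had longer teeth on average (18.0 mm) than those given ascorbic acid (12.4 mm).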
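To show where the t statistic, degrees of freedom, and p-value in that output come from, here is a minimal sketch that recomputes them by hand with the standard Welch formulas (the Welch-Satterthwaite approximation for the degrees of freedom). All variable names are illustrative, and base R’s `subset()` stands in for the dplyr filter used earlier.

```r
library(datasets)

# Same filtered data as in the analysis: drop the 2.0 mg rows
d  <- subset(ToothGrowth, dose != 2)
oj <- d$len[d$supp == "OJ"]
vc <- d$len[d$supp == "VC"]

# Squared standard error of each group mean: s^2 / n
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch's t statistic: difference in means over the combined standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

These values should agree with the `t`, `df`, and `p-value` lines reported by `t.test()`.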
Now let’s compare the two remaining dosage levels, 0.5 mg and 1.0 mg. For this test:
t.test(len ~ dose, paired=FALSE, var.equal=FALSE, data = myToothData)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
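The output of the test states that the p-value is 1.268e-07 and that the 95% confidence interval for the difference in means (0.5 mg minus 1.0 mg) runs from -11.98 mm to -6.28 mm.
So, since the 95% confidence interval does not contain 0 and the p-value is less than 5%, we reject the null hypothesis and conclude that guinea pigs given 1.0 mg doses had longer teeth on average (19.7 mm) than those given 0.5 mg doses (10.6 mm).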
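The 95% confidence interval for the dosage comparison can be rebuilt by hand in the same spirit. The sketch below assumes the usual Welch interval, the difference in means plus or minus a t quantile times the combined standard error. Variable names are illustrative only.

```r
library(datasets)

# Same filtered data as in the analysis: drop the 2.0 mg rows
d    <- subset(ToothGrowth, dose != 2)
low  <- d$len[d$dose == 0.5]
high <- d$len[d$dose == 1]

# Squared standard error of each group mean: s^2 / n
se2_low  <- var(low) / length(low)
se2_high <- var(high) / length(high)
se_diff  <- sqrt(se2_low + se2_high)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_low + se2_high)^2 /
  (se2_low^2 / (length(low) - 1) + se2_high^2 / (length(high) - 1))

# 95% interval for (mean at 0.5 mg) minus (mean at 1.0 mg)
diff_means <- mean(low) - mean(high)
ci <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se_diff
print(ci)
```

Both ends of the interval are negative because the difference is taken as the 0.5 mg mean minus the 1.0 mg mean, matching the group order that `t.test()` reports.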