In this second portion of the project, I analyze the ToothGrowth data from the R datasets package.
The data set ships with R's datasets package. I load it and convert dose to a factor so the data can be grouped by dose. To make the plots more readable, I also relabel the levels of the supplement factor from OJ and VC to "Orange Juice" and "Vitamin C" (in the dataset, VC denotes ascorbic acid, a form of vitamin C).
library(datasets)
data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
levels(ToothGrowth$supp) <- c("Orange Juice","Vitamin C")
summary(ToothGrowth)
##       len                   supp     dose
##  Min.   : 4.20   Orange Juice:30   0.5:20
##  1st Qu.:13.07   Vitamin C   :30   1  :20
##  Median :19.25                     2  :20
##  Mean   :18.81
##  3rd Qu.:25.27
##  Max.   :33.90
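Because `levels(x) <- ...` relabels factor levels by position, it only gives the right labels if the stored order is OJ, VC. A minimal sketch, assuming a fresh copy of the data (`datasets::ToothGrowth`, so an already-modified `ToothGrowth` in the workspace does not interfere), confirms the order and relabels by name instead; the variable `tg` is just an illustrative name.

```r
tg <- datasets::ToothGrowth  # pristine copy, independent of any edited workspace object

# levels(x) <- ... works positionally, so check the stored order first
print(levels(tg$supp))

# Relabel by level name rather than by position
tg$supp <- factor(tg$supp,
                  levels = c("OJ", "VC"),
                  labels = c("Orange Juice", "Vitamin C"))
tg$dose <- factor(tg$dose)

# Cross-tabulate to confirm the group sizes (supplement x dose)
print(table(tg$supp, tg$dose))
```

Relabelling by name fails loudly (with `NA` values) if the level codes ever differ, rather than silently swapping labels.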
Let’s look at some boxplots of the length by supplement and by dose.
par(mfrow = c(1, 2))
# Set titles and axis labels inside boxplot() so they do not overprint default labels
boxplot(len ~ supp, data = ToothGrowth, col = c("red", "green"),
        main = "Length by Supplement", xlab = "Supplement", ylab = "Length")
boxplot(len ~ dose, data = ToothGrowth, col = c("yellow", "orange", "red"),
        main = "Length by Dose", xlab = "Dose", ylab = "Length")
The Length by Dose plot above shows a clear relationship between dose and tooth length: length increases with dose. The Length by Supplement plot suggests a relationship as well, but a much weaker one, because each supplement's median lies within the other's interquartile range (1st to 3rd quartile).
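The median-versus-quartile comparison can be checked numerically. This is a minimal sketch on a fresh copy of the data; note that `quantile()` uses R's default (type 7) quartiles, which can differ slightly from the hinges that `boxplot()` draws.

```r
tg <- datasets::ToothGrowth

# Quartiles of tooth length per supplement: a 3 x 2 matrix
# with rows "25%", "50%", "75%" and columns "OJ", "VC"
q <- sapply(split(tg$len, tg$supp), quantile, probs = c(0.25, 0.5, 0.75))
print(q)

# TRUE if a median lies between the other supplement's 1st and 3rd quartiles
within_iqr <- function(med, other) {
  med >= q["25%", other] && med <= q["75%", other]
}
oj_in_vc <- within_iqr(q["50%", "OJ"], "VC")
vc_in_oj <- within_iqr(q["50%", "VC"], "OJ")
print(c(oj_median_in_vc_iqr = oj_in_vc, vc_median_in_oj_iqr = vc_in_oj))
```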
Let’s take a closer look at the breakdown of the length data:
library(ggplot2)
ggplot(ToothGrowth, aes(x = dose, y = len)) + geom_boxplot(aes(fill = supp)) + labs(title = "Length by Dose, by Supplement", x = "Dose", y = "Length")
At a dose of 2 mg/day the two supplements appear about equally effective, although the Vitamin C group shows greater variance than the Orange Juice group. At the lower doses, Orange Juice appears more effective than Vitamin C.
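The visual impressions from the grouped boxplot can be backed with per-cell summaries. A minimal sketch, assuming a fresh copy of the data, tabulates the mean and standard deviation of tooth length for every supplement and dose combination.

```r
tg <- datasets::ToothGrowth

# Rows are supplements (OJ, VC); columns are doses ("0.5", "1", "2")
cells <- list(supp = tg$supp, dose = tg$dose)
means <- tapply(tg$len, cells, mean)
sds   <- tapply(tg$len, cells, sd)

print(round(means, 2))
print(round(sds, 2))
```

Comparing the `"2"` column of `sds` is a direct way to check the variance claim at the highest dose.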
Based on the plots above, at each of the three dose levels I will test the null hypothesis that the two supplements produce equal mean tooth length.
Null Hypothesis 1 is that the two supplements provide equivalent growth at a dose of 0.5 mg/day.
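Since the same test is repeated at each dose, it can also be run in a single loop. This is a sketch, assuming a fresh copy of the data relabelled as above; it uses the same `t.test(len ~ supp, ...)` call as the per-dose code and collects the difference in means (Orange Juice minus Vitamin C), the 95% confidence interval, and the p-value into one data frame.

```r
tg <- datasets::ToothGrowth
tg$supp <- factor(tg$supp, levels = c("OJ", "VC"),
                  labels = c("Orange Juice", "Vitamin C"))

# Run the same Welch two-sample t-test at each dose and collect the results
doses <- sort(unique(tg$dose))
results <- do.call(rbind, lapply(doses, function(d) {
  tt <- t.test(len ~ supp, data = subset(tg, dose == d))
  data.frame(dose    = d,
             diff    = unname(tt$estimate[1] - tt$estimate[2]),  # OJ minus VC
             lower   = tt$conf.int[1],
             upper   = tt$conf.int[2],
             p.value = tt$p.value)
}))
print(results)
```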
lowdose <- subset(ToothGrowth, dose == 0.5)
conf <- t.test(len ~ supp, data=lowdose)
conf$conf.int
## [1] 1.719057 8.780943
## attr(,"conf.level")
## [1] 0.95
The 95% confidence interval for the difference in means (Orange Juice minus Vitamin C, because t.test subtracts the second level of supp from the first) runs from 1.72 to 8.78 and does not include 0. The p-value of 0.0063586 is below the 0.05 threshold, so we reject the null hypothesis and conclude that Orange Juice produces greater tooth growth than Vitamin C at a dose of 0.5 mg/day.
Null Hypothesis 2 is that the two supplements provide equivalent growth at a dose of 1.0 mg/day.
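To make the reported numbers less of a black box, the Welch t-test that `t.test()` runs by default can be reproduced by hand. A minimal sketch for the 0.5 mg/day group, assuming a fresh copy of the data; `welch_df` and the other variable names are illustrative.

```r
tg <- datasets::ToothGrowth
low <- subset(tg, dose == 0.5)
oj <- low$len[low$supp == "OJ"]
vc <- low$len[low$supp == "VC"]

# Welch t statistic: difference in means divided by its estimated standard error
v1 <- var(oj) / length(oj)
v2 <- var(vc) / length(vc)
se <- sqrt(v1 + v2)
t_stat <- (mean(oj) - mean(vc)) / se

# Welch-Satterthwaite degrees of freedom
welch_df <- (v1 + v2)^2 / (v1^2 / (length(oj) - 1) + v2^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for mean(OJ) - mean(VC)
p_value <- 2 * pt(-abs(t_stat), welch_df)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, welch_df) * se

ref <- t.test(oj, vc)  # Welch test is the default (var.equal = FALSE)
print(c(t = t_stat, df = welch_df, p = p_value))
print(rbind(manual = ci, t.test = as.numeric(ref$conf.int)))
```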
middose <- subset(ToothGrowth, dose == 1)
conf2 <- t.test(len ~ supp, data=middose)
conf2$conf.int
## [1] 2.802148 9.057852
## attr(,"conf.level")
## [1] 0.95
Again, the 95% confidence interval (2.80 to 9.06) does not include 0, and the p-value of 0.0010384 is below the 0.05 threshold, so we also reject this null hypothesis and conclude that Orange Juice provides greater growth than Vitamin C at a dose of 1 mg/day.
Null Hypothesis 3 is that the two supplements provide equivalent growth at a dose of 2.0 mg/day.
highdose <- subset(ToothGrowth, dose == 2)
conf3 <- t.test(len ~ supp, data=highdose)
conf3$conf.int
## [1] -3.79807 3.63807
## attr(,"conf.level")
## [1] 0.95
For this hypothesis the 95% confidence interval (-3.80 to 3.64) does include 0 and is nearly centered on it. The p-value of 0.9638516 is far above 0.05, so we cannot reject the null hypothesis: at a dose of 2.0 mg/day the data show no evidence of a difference in tooth growth between Orange Juice and Vitamin C.
Based on the t-tests, I conclude that Orange Juice is more effective than Vitamin C at promoting tooth growth at doses of 0.5 mg/day and 1 mg/day. At a dose of 2.0 mg/day there is no significant difference between the two supplements.
This analysis assumes that tooth length is approximately normally distributed within each supplement and dose group. That assumption is hard to verify with only 10 guinea pigs per group, and the high variability of the data makes it questionable. Because t.test() runs Welch's test by default, the analysis does not assume equal variances in the two groups.
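Failing to reject the null hypothesis is not the same as demonstrating equivalence. One standard way to test equivalence directly is the two one-sided tests (TOST) procedure, sketched below with two one-sided Welch t-tests. The equivalence margin `delta = 4` (in tooth-length units) is a purely hypothetical choice for illustration; a real margin would have to come from subject-matter judgement.

```r
tg <- datasets::ToothGrowth
high <- subset(tg, dose == 2)

delta <- 4  # HYPOTHETICAL equivalence margin, chosen only for illustration

# Two one-sided Welch t-tests on mean(OJ) - mean(VC):
#   test a: H0 difference <= -delta  vs  H1 difference > -delta
#   test b: H0 difference >=  delta  vs  H1 difference <  delta
lower_test <- t.test(len ~ supp, data = high, mu = -delta, alternative = "greater")
upper_test <- t.test(len ~ supp, data = high, mu =  delta, alternative = "less")

# Equivalence within +/- delta is supported only if both one-sided nulls are rejected
p_tost <- max(lower_test$p.value, upper_test$p.value)
print(c(p_lower = lower_test$p.value, p_upper = upper_test$p.value, p_tost = p_tost))
```

With a different margin the conclusion can change, which is exactly why the margin must be justified before running the test.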
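These conclusions rest on three separate tests, each at the 0.05 level. As a sketch of how they would fare under a family-wise error correction, the reported p-values can be passed to `p.adjust()` with Holm's method; this correction is an addition for illustration, not part of the original analysis.

```r
# p-values reported above for doses 0.5, 1 and 2 mg/day
p <- c(dose0.5 = 0.0063586, dose1 = 0.0010384, dose2 = 0.9638516)

# Holm's step-down adjustment controls the family-wise error rate across the three tests
p_holm <- p.adjust(p, method = "holm")
print(round(p_holm, 5))
print(p_holm < 0.05)
```

Under this adjustment the two low-dose results stay below 0.05, so the conclusions above would not change.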
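The normality assumption can be examined group by group. A minimal sketch, assuming a fresh copy of the data, runs a Shapiro-Wilk test and draws a normal Q-Q plot for each of the six supplement and dose groups. With only 10 observations per group the test has little power, so a non-significant result does not confirm normality.

```r
tg <- datasets::ToothGrowth

# Six groups named like "OJ.0.5", "VC.0.5", ..., "VC.2"
groups <- split(tg$len, list(tg$supp, tg$dose))

# Shapiro-Wilk normality test p-value for each group
sw_p <- sapply(groups, function(x) shapiro.test(x)$p.value)
print(round(sw_p, 3))

# Normal Q-Q plots as a visual check alongside the tests
par(mfrow = c(2, 3))
for (g in names(groups)) {
  qqnorm(groups[[g]], main = g)
  qqline(groups[[g]])
}
```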