This exploratory analysis examines the ToothGrowth dataset, which records the effect of vitamin C on tooth growth in guinea pigs. The data contain the following variables:
len - numeric tooth length
supp - supplement type - ascorbic acid (VC) or orange juice (OJ)
dose - numeric dose in milligrams per day
The summary statistics and plots below suggest that tooth length differs by supplement type and by dose. The OJ group has a mean tooth length of 20.66 (standard deviation 6.61); the VC group has a mean of 16.96 (standard deviation 8.27). Whether that difference is statistically significant, and therefore whether the null hypothesis can be rejected, is tested in the next step of the process. Length also appears to increase with dose: the mean length is 10.6 at 0.5 mg/day (Dose 1), 19.7 at 1 mg/day (Dose 2), and 26.1 at 2 mg/day (Dose 3). The dose hypotheses are tested pairwise, each dose level against the other two.
# loading the data and providing summary statistics
data <- ToothGrowth
library(plyr)
ddply(data,~supp,summarise,mean=mean(len),sd=sd(len))
## supp mean sd
## 1 OJ 20.66333 6.605561
## 2 VC 16.96333 8.266029
ddply(data,~dose,summarise,mean=mean(len),sd=sd(len))
## dose mean sd
## 1 0.5 10.605 4.499763
## 2 1.0 19.735 4.415436
## 3 2.0 26.100 3.774150
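The two summaries above each average over the other factor. As a complementary view (an addition here, not part of the original analysis), base R's aggregate() can tabulate the mean length for every supplement and dose combination. The name cell_means is illustrative.

```r
# Mean tooth length for each supplement x dose cell (6 cells, 10 animals each)
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
cell_means
```

Reading down the table shows whether the OJ and VC means move together as the dose rises, which the marginal means alone cannot reveal.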
# plotting the data by type of supplement
library(ggplot2)
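ggplot(ToothGrowth, aes(x = dose, y = len, color = supp)) +
  geom_point() +
  # linear fit; the default loess smoother is poorly suited to three distinct doses
  geom_smooth(method = "lm") +
  facet_grid(supp ~ .) +
  labs(title = "Change In Tooth Length by Supplement",
       x = "Dose in mg/day", y = "Tooth length")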
H0 is a statement of no difference. For the purposes of this report, H0 states that the mean tooth lengths of the two supplement groups (OJ and VC) are equal, and likewise that the means of each pair of dose levels (Dose 1, 2, and 3) are equal.
#t-test for OJ vs VC
t.test(len~supp, data=ToothGrowth, paired=FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
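The statistic and degrees of freedom reported by t.test() can be reproduced by hand from the group means and variances. The sketch below uses the standard Welch formula with the Welch-Satterthwaite approximation for the degrees of freedom. The variable names (oj, vc, se2_oj, se2_vc, t_stat, welch_df, p_value) are illustrative and do not appear in the original analysis.

```r
# Tooth lengths for each supplement group
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), welch_df)
c(t = t_stat, df = welch_df, p = p_value)
```

The three values should agree with the t, df, and p-value lines in the output above.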
At 0.06, the p-value from the first t-test exceeds 0.05, and the 95 percent confidence interval (-0.17 to 7.57) includes zero. Although the OJ mean exceeds the VC mean by 3.7, the test does not find that difference significant at the 5 percent level. In this case, H0 cannot be rejected. The next tests compare dosage groups.
# t-tests for each pair of doses: 0.5 vs 1, 0.5 vs 2, and 1 vs 2
dose1 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
t.test(len~dose, data=dose1, paired=FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
dose2 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
t.test(len~dose, data=dose2, paired=FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
dose3 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
t.test(len~dose, data=dose3, paired=FALSE, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
In all three comparisons, the p-values are far below 0.05, and they remain below the Bonferroni-adjusted threshold of 0.05/3 ≈ 0.017 that accounts for running three tests. The means are therefore significantly different, and each H0 is rejected in favor of its alternative hypothesis. The data support the conclusion that higher vitamin C doses result in greater tooth length in guinea pigs. Because these tests pool both supplements, they do not show whether the dose effect is the same for OJ and VC.
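Because three dose comparisons are made on the same data, the p-values can also be adjusted for multiple testing. The sketch below is one possible approach, not part of the original analysis: it uses pairwise.t.test() from the stats package with pool.sd = FALSE, so each pair is compared with its own variances as in the tests above, and applies the Holm correction. The Holm method and the name adjusted are assumptions made for illustration.

```r
# Pairwise comparisons between dose levels, Holm-adjusted p-values
adjusted <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                            p.adjust.method = "holm",
                            pool.sd = FALSE)
adjusted
```

If every adjusted p-value stays below 0.05, the conclusion about dose does not depend on ignoring the multiple comparisons.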
The overall conclusion is that higher vitamin C intake is associated with greater tooth growth, while the difference between the two supplements is not significant at the 5 percent level. The t-tests assume that tooth length is approximately normally distributed within each group. Given the small samples (60 observations in total, 30 per supplement and 20 per dose), this is a bold assumption, and the individual distributions do vary in shape (see Appendix). Bootstrapping could help address the issue, and the package “boot” is a good candidate for deeper analysis if one is interested in resampling.
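As a rough illustration of the resampling idea, the sketch below bootstraps the difference in mean length between OJ and VC with the boot package and reports a percentile confidence interval. It is a minimal example under stated assumptions, not the report's method: the helper mean_diff, the seed, the 2000 replicates, and the choice of a percentile interval are all illustrative.

```r
library(boot)

# Statistic: difference in mean length (OJ minus VC) on a resampled data frame
mean_diff <- function(d, idx) {
  s <- d[idx, ]
  mean(s$len[s$supp == "OJ"]) - mean(s$len[s$supp == "VC"])
}

set.seed(42)  # arbitrary seed, used only for reproducibility

# Resample rows within each supplement group so both group sizes stay at 30
boot_out <- boot(ToothGrowth, statistic = mean_diff, R = 2000,
                 strata = ToothGrowth$supp)

# Percentile bootstrap confidence interval for the difference in means
boot.ci(boot_out, type = "perc")
```

The interval can be compared with the Welch interval from the first t-test; it does not rely on the normality assumption.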
hist(ToothGrowth$len[ToothGrowth$dose == 0.5], main = "Dose 1 by Length", xlab = NULL)
hist(ToothGrowth$len[ToothGrowth$dose == 1.0], main = "Dose 2 by Length", xlab = NULL)
hist(ToothGrowth$len[ToothGrowth$dose == 2.0], main = "Dose 3 by Length", xlab = NULL)
Tsplit <- split(ToothGrowth, ToothGrowth$supp)
hist(Tsplit$OJ$len, main = "Length by Supplement OJ", xlab = NULL)
hist(Tsplit$VC$len, main = "Length by Supplement VC", xlab = NULL)
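The histograms give a visual impression of shape. A formal check could complement them; the sketch below applies the Shapiro-Wilk test from the stats package to each dose group and each supplement group. This is an optional addition, not part of the original analysis, and the names shapiro_by_dose and shapiro_by_supp are illustrative.

```r
# Shapiro-Wilk p-value for tooth length within each dose level
shapiro_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose,
                          function(x) shapiro.test(x)$p.value)

# Shapiro-Wilk p-value for tooth length within each supplement group
shapiro_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp,
                          function(x) shapiro.test(x)$p.value)

round(shapiro_by_dose, 3)
round(shapiro_by_supp, 3)
```

A small p-value would indicate a departure from normality in that group, although with 20 to 30 observations the test has limited power.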