This is part of the project for the Statistical Inference class in the Johns Hopkins Data Science Specialization by Coursera.
This report analyzes the ToothGrowth data in the R datasets package. The goals of this analysis are:
The ToothGrowth dataset records the effect of vitamin C on tooth growth in guinea pigs. The response is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs, 10 for each combination of three vitamin C dose levels (0.5, 1, and 2 mg/day) and two delivery methods (orange juice, coded OJ, or ascorbic acid, coded VC).
This data frame has 60 observations and 3 variables:
The following R code compactly displays the internal structure of the ToothGrowth dataset:
library(datasets) ## Loading the package "datasets"
data(ToothGrowth) ## Loading the data
str(ToothGrowth) ## Looking at the dataset variables
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
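As a quick check of the design described above, the following minimal sketch cross-tabulates the two explanatory variables. The variable name `design` is introduced here for illustration only and is not part of the original analysis.

```r
library(datasets)

## Cross-tabulate supplement by dose to inspect the 2 x 3 design
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)

## TRUE when every supplement/dose cell holds 10 observations
all(design == 10)
```

A balanced table of this kind is what allows the later code to place the six groups side by side as equal-length columns.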
The following scatterplot shows approximately how the variable len (tooth length) responds to the variable dose for each type of supp:
library(ggplot2)
## Scatterplot shows data by factor
## dose is still numeric at this point, so a continuous x scale is used
ggplot(ToothGrowth, aes(x = dose, y = len)) + geom_point(aes(color = supp)) +
scale_x_continuous("Dosage in mg", breaks = c(0.5, 1, 2)) + scale_y_continuous("Length of Teeth") +
ggtitle("Tooth Length by Dose for each Supplement")
The summary of the len variable for each level of the factor supp is:
tapply(ToothGrowth$len, ToothGrowth$supp, FUN=summary)
## $OJ
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.2 15.5 22.7 20.7 25.7 30.9
##
## $VC
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.2 11.2 16.5 17.0 23.1 33.9
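The summary above pools the three doses within each supplement. As a complementary sketch, the mean tooth length can also be broken down by supplement and dose together. The name `cellMeans` is hypothetical and used only in this example.

```r
library(datasets)

## Mean tooth length for each supplement (rows) and dose (columns)
cellMeans <- tapply(ToothGrowth$len,
                    list(ToothGrowth$supp, ToothGrowth$dose),
                    mean)
round(cellMeans, 2)
```

Reading across a row shows how the mean changes with dose for one supplement; reading down a column compares the two supplements at a fixed dose.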
The following box plot summarizes tooth length at each dose for each supplement:
## Boxplot shows data summary for each Supplement
library(plyr)
ggplot(ToothGrowth, aes(x=factor(dose), y=len,fill=supp))+
geom_boxplot()+ facet_grid(.~supp)+
theme(axis.text.x=element_text(angle=-90, vjust=0.4,hjust=1)) +
scale_x_discrete("Dosage in mg") + scale_y_continuous("Length of Teeth") +
ggtitle("Box Plot of Tooth Length by Dose for each Supplement")
Hypothesis tests are performed with the t.test function in R.
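To illustrate what a single call returns, here is a minimal sketch comparing the two supplements at the 0.5 mg dose. The names `oj`, `vc`, and `result` are introduced for this example only. With default arguments, t.test runs a two-sided Welch test (unequal variances) and reports a 95% confidence interval.

```r
library(datasets)

## Tooth lengths for the two supplements at the 0.5 mg dose
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]

## Default call: two-sided Welch test with a 95% confidence interval
result <- t.test(oj, vc)
result$p.value    ## p-value for the null hypothesis of equal means
result$conf.int   ## interval for mean(oj) - mean(vc)
```

The interval is for the mean of the first argument minus the mean of the second, so its sign depends on the order in which the groups are passed.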
The analysis makes the following assumptions:
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
## One column of 10 len values per supplement/dose combination
suppDoseGroups <- as.data.frame(split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose)))
## Row labels for all 15 pairwise comparisons of the 6 groups
combinationNames <- vector()
k <- 0 ## pair counter (named k so the base function c is not masked)
for ( i in 1:5 ) for ( j in (i+1):6 ) { k <- k + 1
combinationNames[k] <- paste(names(suppDoseGroups)[i], names(suppDoseGroups)[j], sep="~") }
hypothesisTest <- matrix(data=NA, nrow=length(combinationNames), ncol=3, byrow=TRUE,
dimnames=list(combinationNames, c("P-value", "Conf low", "Conf high")))
k <- 0
for ( i in 1:5 ) for ( j in (i+1):6 ) { k <- k + 1
## Run the t-test once per pair and reuse the result
test <- t.test(suppDoseGroups[,i], suppDoseGroups[,j])
hypothesisTest[k,1] <- test$p.value
hypothesisTest[k,2] <- test$conf.int[1]
hypothesisTest[k,3] <- test$conf.int[2]
}
hypothesisTest
## P-value Conf low Conf high
## OJ.0.5~VC.0.5 6.359e-03 1.719 8.78094
## OJ.0.5~OJ.1 8.785e-05 -13.416 -5.52437
## OJ.0.5~VC.1 4.601e-02 -7.008 -0.07189
## OJ.0.5~OJ.2 1.324e-06 -16.335 -9.32476
## OJ.0.5~VC.2 7.196e-06 -17.264 -8.55648
## VC.0.5~OJ.1 3.655e-08 -17.921 -11.51851
## VC.0.5~VC.1 6.811e-07 -11.266 -6.31429
## VC.0.5~OJ.2 1.362e-11 -20.618 -15.54182
## VC.0.5~VC.2 4.682e-08 -21.902 -14.41849
## OJ.1~VC.1 1.038e-03 2.802 9.05785
## OJ.1~OJ.2 3.920e-02 -6.531 -0.18856
## OJ.1~VC.2 9.653e-02 -7.564 0.68433
## VC.1~OJ.2 2.361e-07 -11.720 -6.85967
## VC.1~VC.2 9.156e-05 -13.054 -5.68573
## OJ.2~VC.2 9.639e-01 -3.798 3.63807
Thirteen of the fifteen p-values are below 0.05, and the corresponding 95% confidence intervals do not contain zero. For these comparisons the null hypothesis of equal means can be rejected, which indicates a significant difference in mean tooth length between the two groups compared. There are two exceptions: orange juice versus ascorbic acid at a dose of 2 mg (OJ.2~VC.2, p = 0.964), and orange juice at 1 mg versus ascorbic acid at 2 mg (OJ.1~VC.2, p = 0.097). In both cases the confidence interval contains zero, so the null hypothesis cannot be rejected.
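The counts quoted above can be checked programmatically. The sketch below is one possible way to do so and is not the original method: it swaps the nested loops for combn and apply, and the names `groups`, `groupPairs`, `results`, and `notSignificant` are hypothetical.

```r
library(datasets)

## One vector of len values per supplement/dose group
groups <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose))

## All 15 unordered pairs of the 6 group names, one pair per column
groupPairs <- combn(names(groups), 2)

## Welch t-test for each pair, keeping the p-value and 95% interval
results <- t(apply(groupPairs, 2, function(p) {
  test <- t.test(groups[[p[1]]], groups[[p[2]]])
  c(p.value = test$p.value,
    conf.low = test$conf.int[1],
    conf.high = test$conf.int[2])
}))
rownames(results) <- paste(groupPairs[1, ], groupPairs[2, ], sep = "~")

## Number of comparisons significant at the 0.05 level
sum(results[, "p.value"] < 0.05)

## Comparisons that do not reach significance
notSignificant <- results[results[, "p.value"] >= 0.05, , drop = FALSE]
rownames(notSignificant)
```

Because combn enumerates pairs in the same order as the nested loops, the rows line up with the table printed earlier.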
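Within the same supplement, p-values decrease as the gap between doses widens (compare OJ.0.5~OJ.1 with OJ.0.5~OJ.2, for example), and the confidence intervals for these lower-dose-minus-higher-dose comparisons are entirely negative. This indicates that increasing the dose has a positive effect on tooth growth.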
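The direction of the dose effect can be made explicit with a short sketch that, for each supplement, compares the lowest dose with the highest. The name `doseEffect` and the row labels are chosen for this example only.

```r
library(datasets)

## For each supplement, compare the 0.5 mg dose with the 2 mg dose
doseEffect <- sapply(levels(ToothGrowth$supp), function(s) {
  low  <- ToothGrowth$len[ToothGrowth$supp == s & ToothGrowth$dose == 0.5]
  high <- ToothGrowth$len[ToothGrowth$supp == s & ToothGrowth$dose == 2]
  test <- t.test(low, high)  ## interval is for mean(low) - mean(high)
  c(mean.low = mean(low), mean.high = mean(high),
    conf.low = test$conf.int[1], conf.high = test$conf.int[2])
})
round(doseEffect, 2)
```

An interval that lies entirely below zero means the low-dose mean is smaller than the high-dose mean for that supplement.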
The main conclusions are: