In this project we analyze the ToothGrowth data from the R datasets package. In addition to exploratory data analysis, we use Student's t-test to build confidence intervals and run hypothesis tests that compare tooth growth by supplement (vitamin C and orange juice) and dose (0.5, 1.0, and 2.0 mg/day).
Load ToothGrowth Data
data("ToothGrowth")
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
| len | supp | dose |
|---|---|---|
| 4.2 | VC | 0.5 |
| 11.5 | VC | 0.5 |
| 7.3 | VC | 0.5 |
| 5.8 | VC | 0.5 |
| 6.4 | VC | 0.5 |
| 10.0 | VC | 0.5 |
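# Compare tooth length by supplement (range = 0 extends the whiskers to the data extremes)
with(ToothGrowth, boxplot(len[supp == "VC"], len[supp == "OJ"], range = 0, names = c("Vitamin C", "Orange Juice"), col = c("green", "blue"), ylab = "Growth Length"))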
Basic summary of the data:
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
The previous boxplot suggests that, overall, orange juice (OJ) has a larger effect on tooth growth than vitamin C (VC). Splitting each supplement by dose as well gives the following box plot.
# Rows are ordered by supplement (VC, then OJ) and, within each, by dose (0.5, 1, 2), 10 rows per group
VC5 <- ToothGrowth$len[1:10]
VC1 <- ToothGrowth$len[11:20]
VC2 <- ToothGrowth$len[21:30]
OJ5 <- ToothGrowth$len[31:40]
OJ1 <- ToothGrowth$len[41:50]
OJ2 <- ToothGrowth$len[51:60]
boxplot(VC5, VC1, VC2, OJ5, OJ1, OJ2, range = 0, col = c("green", "gold", "royalblue", "green", "gold", "royalblue"),
        names = c("VC0.5", "VC1", "VC2", "OJ0.5", "OJ1", "OJ2"), ylab = "Growth Length")
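The index-based slicing above relies on the row order of the dataset. As a hedged alternative, the sketch below summarizes the same six groups by name instead of by position; `cell_counts` and `group_means` are illustrative names, not part of the original analysis.

```r
# Number of observations in each supplement x dose cell
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
cell_counts

# Mean tooth length for each supplement x dose combination
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_means
```

Selecting groups by `supp` and `dose` keeps the summary correct even if the rows were reordered.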
We compare OJ and VC separately at each dose using two-sample t confidence intervals and Student's t-tests, as covered in class.
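For reference, the unequal-variance (Welch) form of the two-sample t statistic, which is what `t.test` computes when `var.equal = FALSE`, can be written as follows. The symbols are notation assumed here rather than taken from the original text: $\bar{x}$, $s^2$, and $n$ are the sample mean, sample variance, and sample size of each supplement group at a given dose, and $\nu$ is the approximate degrees of freedom.

```latex
t = \frac{\bar{x}_{OJ} - \bar{x}_{VC}}
         {\sqrt{\dfrac{s_{OJ}^2}{n_{OJ}} + \dfrac{s_{VC}^2}{n_{VC}}}},
\qquad
\nu \approx
\frac{\left(\dfrac{s_{OJ}^2}{n_{OJ}} + \dfrac{s_{VC}^2}{n_{VC}}\right)^{2}}
     {\dfrac{\left(s_{OJ}^2 / n_{OJ}\right)^{2}}{n_{OJ} - 1}
      + \dfrac{\left(s_{VC}^2 / n_{VC}\right)^{2}}{n_{VC} - 1}}

% 95% confidence interval for the difference in means
\left(\bar{x}_{OJ} - \bar{x}_{VC}\right)
\pm t_{0.975,\,\nu}
\sqrt{\dfrac{s_{OJ}^2}{n_{OJ}} + \dfrac{s_{VC}^2}{n_{VC}}}
```

Because $\nu$ is an approximation, it is generally not an integer, which is why the `df` values reported by `t.test` are fractional.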
data <- ToothGrowth
dose5 <- subset(data, dose == 0.5)
dose1 <- subset(data, dose == 1)
dose2 <- subset(data, dose == 2)
test5 <- t.test(len ~ supp, data= dose5, var.equal = FALSE, paired=FALSE ,conf.level = .95)
test1 <- t.test(len ~ supp, data= dose1, var.equal = FALSE, paired=FALSE ,conf.level = .95)
test2 <- t.test(len ~ supp, data= dose2, var.equal = FALSE, paired=FALSE ,conf.level = .95)
result <- data.frame("t-statistic" = c(test5$statistic, test1$statistic, test2$statistic),
                     "df" = c(test5$parameter, test1$parameter, test2$parameter),
                     "p-value" = c(test5$p.value, test1$p.value, test2$p.value),
                     "lower CL" = c(test5$conf.int[1], test1$conf.int[1], test2$conf.int[1]),
                     "upper CL" = c(test5$conf.int[2], test1$conf.int[2], test2$conf.int[2]),
                     "OJ mean" = c(test5$estimate[1], test1$estimate[1], test2$estimate[1]),
                     "VC mean" = c(test5$estimate[2], test1$estimate[2], test2$estimate[2]),
                     row.names = c("OJ vs VC at dose = 0.5", "OJ vs VC at dose = 1", "OJ vs VC at dose = 2"))
knitr::kable(round (x = result, 3),align = 'c',
caption = "Summary of two sample t-test for tooth growth by supplement and dosage")
| | t.statistic | df | p.value | lower.CL | upper.CL | OJ.mean | VC.mean |
|---|---|---|---|---|---|---|---|
| OJ vs VC at dose = 0.5 | 3.170 | 14.969 | 0.006 | 1.719 | 8.781 | 13.23 | 7.98 |
| OJ vs VC at dose = 1 | 4.033 | 15.358 | 0.001 | 2.802 | 9.058 | 22.70 | 16.77 |
| OJ vs VC at dose = 2 | -0.046 | 14.040 | 0.964 | -3.798 | 3.638 | 26.06 | 26.14 |
Since the p-values for OJ vs VC at dose = 0.5 and at dose = 1 are both less than 0.05, and neither confidence interval contains 0, we conclude that mean tooth growth differs significantly between the two supplements at these doses. At dose = 2, however, the difference in means is not significant: the p-value is not less than 0.05 and the confidence interval contains zero.
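To make the reported numbers easier to check, the following sketch recomputes the dose = 0.5 comparison by hand and compares it with `t.test`. It is an illustration under the same unequal-variance assumption, not part of the original analysis; the names `d`, `oj`, `vc`, `se`, `t_stat`, `df`, `p_value`, `ci`, and `ref` are introduced here for the example.

```r
# Manual Welch two-sample t-test for the dose = 0.5 group (illustrative sketch)
d  <- subset(ToothGrowth, dose == 0.5)
oj <- d$len[d$supp == "OJ"]
vc <- d$len[d$supp == "VC"]

# Per-group variance of the sample mean (no pooling of variances)
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

# Standard error of the difference in means and the t statistic
se     <- sqrt(v_oj + v_vc)
t_stat <- (mean(oj) - mean(vc)) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (v_oj + v_vc)^2 / (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for mean(OJ) - mean(VC)
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se

# Compare against the built-in implementation
ref <- t.test(len ~ supp, data = d, var.equal = FALSE)
all.equal(unname(ref$statistic), t_stat)
all.equal(unname(ref$parameter), df)
all.equal(ref$p.value, p_value)
all.equal(as.numeric(ref$conf.int), ci)
```

The same steps apply to the other two doses by changing the `subset` condition.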
Based on the analysis in the previous section, we conclude that at the lower doses (0.5 and 1.0) orange juice is more effective for tooth growth than vitamin C. At the higher dose (2.0), however, the data do not show whether OJ or VC has the greater effect.
This analysis assumes that the underlying data are iid normal (Gaussian) and that the two supplement groups are independent (unpaired). We also assume the groups come from populations with unequal variances.
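As a rough, hedged way to look at these assumptions, the sketch below runs a Shapiro-Wilk normality test and computes the standard deviation within each supplement and dose group. This check is not part of the original analysis, and with only 10 observations per group it has little power, so it should be read as a sanity check rather than evidence that the assumptions hold. The names `groups`, `shapiro_p`, and `group_sd` are illustrative.

```r
# Split tooth length into the six supplement x dose groups
groups <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose))

# Shapiro-Wilk p-value per group (small values suggest departure from normality)
shapiro_p <- sapply(groups, function(x) shapiro.test(x)$p.value)

# Standard deviation per group, to compare spread across groups
group_sd <- sapply(groups, sd)

round(cbind(shapiro_p, group_sd), 3)
```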