In this document, we analyze the ToothGrowth data from the R datasets package. We begin with a basic summary of the data, followed by confidence intervals and hypothesis tests that compare tooth growth by supplement type (supp) and dose.
Load the data
library(datasets)
data(ToothGrowth)
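Before summarizing, it can help to check the structure of the data. The sketch below uses only base R; `tg` and `design` are names introduced here for illustration. It confirms the column types and counts the observations in each supplement/dose cell.

```r
# Local copy so the global ToothGrowth object is left untouched
tg <- datasets::ToothGrowth

# Column types: len is numeric, supp is a factor (OJ, VC), dose is numeric
str(tg)

# Cross-tabulate supplement by dose to check whether the design is balanced
design <- table(tg$supp, tg$dose)
print(design)
```

ToothGrowth has 60 rows, and each of the six supplement/dose cells holds 10 observations, so the group summaries below compare groups of equal size.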
Summarize the data in three ways:
1. Overall tooth length and dose
2. Tooth length by supplement type (OJ or VC), irrespective of dose level
3. Tooth length by each combination of supplement type and dose level
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
tapply(ToothGrowth$len, ToothGrowth$supp, FUN=summary)
## $OJ
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.20 15.52 22.70 20.66 25.72 30.90
##
## $VC
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.20 11.20 16.50 16.96 23.10 33.90
library(plyr)
# Mean, SD, max, and min of len for each supp/dose combination
ddply(ToothGrowth, .(supp, dose), summarize, mean = round(mean(len), 2),
      sd = round(sd(len), 2), max = round(max(len), 2), min = round(min(len), 2))
## supp dose mean sd max min
## 1 OJ 0.5 13.23 4.46 21.5 8.2
## 2 OJ 1.0 22.70 3.91 27.3 14.5
## 3 OJ 2.0 26.06 2.66 30.9 22.4
## 4 VC 0.5 7.98 2.75 11.5 4.2
## 5 VC 1.0 16.77 2.52 22.5 13.6
## 6 VC 2.0 26.14 4.80 33.9 18.5
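If plyr is not available, the same per-group table can be produced with base R. This is a sketch using `stats::aggregate()`; the names `tg` and `by_group` are illustrative. The rows may come out in a different order than in the ddply() output, and the statistic columns are named `len.mean`, `len.sd`, and so on.

```r
tg <- datasets::ToothGrowth

# len ~ supp + dose groups the data by every supp/dose combination;
# FUN returns four rounded statistics per group
by_group <- aggregate(len ~ supp + dose, data = tg,
                      FUN = function(v) round(c(mean = mean(v), sd = sd(v),
                                                max = max(v), min = min(v)), 2))

# aggregate() stores FUN's results as a single matrix column; flatten it
by_group <- do.call(data.frame, by_group)
print(by_group)
```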
Visualize summary 3 above with boxplots of tooth length for each combination of supplement type and dose level.
library(ggplot2)
# Treat dose and supp as categorical variables for plotting
ToothGrowth$dose <- factor(ToothGrowth$dose)
ToothGrowth$supp <- factor(ToothGrowth$supp)
# Refer to columns by name inside aes() so ggplot2 looks them up in `data`
ggplot(ToothGrowth, aes(x = dose, y = len, fill = supp)) + geom_boxplot() +
  facet_grid(. ~ supp) + scale_x_discrete("Dosage") + scale_y_continuous("Tooth Length") +
  ggtitle("Boxplot of Tooth Length by Dose Amount for Two Supplements")
Before running two-sample t-tests that compare the two supplements at each dose level, we check the assumption that the two groups have equal variances. An F-test (var.test) is used to test the null hypothesis that the variances of the two groups are equal, at the 0.05 significance level. We further assume that each group is drawn from a normal distribution; the F-test for variances is itself sensitive to departures from normality.
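# dlply() returns a list ordered OJ-0.5, OJ-1, OJ-2, VC-0.5, VC-1, VC-2,
# so element i (OJ) is compared with element i + 3 (VC) at the same dose
ToothGrouped <- dlply(ToothGrowth, .(supp, dose))
rbind(
  var.test(ToothGrouped[[1]]$len, ToothGrouped[[4]]$len)$p.value,  # dose 0.5
  var.test(ToothGrouped[[2]]$len, ToothGrouped[[5]]$len)$p.value,  # dose 1
  var.test(ToothGrouped[[3]]$len, ToothGrouped[[6]]$len)$p.value   # dose 2
)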
## [,1]
## [1,] 0.16489022
## [2,] 0.20462137
## [3,] 0.09274336
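The F-test compares the ratio of the two sample variances against an F distribution with (n_x - 1, n_y - 1) degrees of freedom. The sketch below computes it by hand for the dose 0.5 groups, as a check on what var.test() reports; `f_test_by_hand` is a hypothetical helper written for illustration, not part of any package.

```r
tg <- datasets::ToothGrowth

# Two-sided F-test for equal variances:
# F = s_x^2 / s_y^2 ~ F(n_x - 1, n_y - 1) under the null hypothesis
f_test_by_hand <- function(x, y) {
  f <- var(x) / var(y)
  df1 <- length(x) - 1
  df2 <- length(y) - 1
  p_lower <- pf(f, df1, df2)
  # Two-sided p-value: double the smaller tail probability
  p <- 2 * min(p_lower, 1 - p_lower)
  c(F = f, p.value = p)
}

oj <- tg$len[tg$supp == "OJ" & tg$dose == 0.5]
vc <- tg$len[tg$supp == "VC" & tg$dose == 0.5]
manual <- f_test_by_hand(oj, vc)
builtin <- var.test(oj, vc)
print(manual)
```

Doubling the smaller tail probability is the usual two-sided rule, so the hand-computed p-value should match the first var.test() p-value above.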
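All three p-values are greater than 0.05, so we cannot reject the null hypothesis of equal variances, and we proceed with the pooled-variance t-test (var.equal = TRUE in t.test). The t-test is run both as an independent two-sample test and as a paired test. According to the dataset's help page (?ToothGrowth), each of the 60 guinea pigs received only one supplement/dose combination, so the groups are independent and the unpaired test is the appropriate one; the paired results are shown only for comparison.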
# Unpaired two-sample t-tests: OJ (element i) vs VC (element i + 3) at each dose
output <- list()
for (i in 1:3){
  x <- t.test(ToothGrouped[[i]]$len, ToothGrouped[[i+3]]$len, var.equal = TRUE, paired = FALSE)
  rowName <- paste(as.character(ToothGrouped[[i]]$supp[1]), as.character(ToothGrouped[[i+3]]$supp[1]), as.character(ToothGrouped[[i]]$dose[1]))
  # One-row matrix: p-value and 95% confidence interval for mean(OJ) - mean(VC)
  output[[i]] <- cbind(x$p.value, x$conf.int[1], x$conf.int[2])
  rownames(output[[i]]) <- rowName
  colnames(output[[i]]) <- c("p-value", "conf-low", "conf-high")
}
print(output)
## [[1]]
## p-value conf-low conf-high
## OJ VC 0.5 0.005303661 1.770262 8.729738
##
## [[2]]
## p-value conf-low conf-high
## OJ VC 1 0.0007807262 2.840692 9.019308
##
## [[3]]
## p-value conf-low conf-high
## OJ VC 2 0.9637098 -3.722999 3.562999
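With var.equal = TRUE, t.test() runs the pooled-variance (Student) two-sample test. The sketch below spells out that computation for the dose 1 groups and compares it with t.test(); `pooled_t` is a hypothetical helper for illustration only.

```r
tg <- datasets::ToothGrowth

# Pooled-variance two-sample t-test, as in t.test(x, y, var.equal = TRUE)
pooled_t <- function(x, y, conf.level = 0.95) {
  nx <- length(x)
  ny <- length(y)
  df <- nx + ny - 2
  # Pooled estimate of the common variance
  sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / df
  se <- sqrt(sp2 * (1 / nx + 1 / ny))
  diff <- mean(x) - mean(y)
  t_stat <- diff / se
  p <- 2 * pt(-abs(t_stat), df)
  # Half-width of the confidence interval for mean(x) - mean(y)
  half <- qt(1 - (1 - conf.level) / 2, df) * se
  c(t = t_stat, df = df, p.value = p,
    conf.low = diff - half, conf.high = diff + half)
}

oj <- tg$len[tg$supp == "OJ" & tg$dose == 1]
vc <- tg$len[tg$supp == "VC" & tg$dose == 1]
manual <- pooled_t(oj, vc)
builtin <- t.test(oj, vc, var.equal = TRUE)
print(round(manual, 4))
```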
# Paired t-tests on the same OJ/VC groups at each dose
# (var.equal has no effect when paired = TRUE, so it is omitted)
for (i in 1:3){
  x <- t.test(ToothGrouped[[i]]$len, ToothGrouped[[i+3]]$len, paired = TRUE)
  rowName <- paste(as.character(ToothGrouped[[i]]$supp[1]), as.character(ToothGrouped[[i+3]]$supp[1]), as.character(ToothGrouped[[i]]$dose[1]))
  output[[i]] <- cbind(x$p.value, x$conf.int[1], x$conf.int[2])
  rownames(output[[i]]) <- rowName
  colnames(output[[i]]) <- c("p-value", "conf-low", "conf-high")
}
print(output)
## [[1]]
## p-value conf-low conf-high
## OJ VC 0.5 0.01547205 1.263458 9.236542
##
## [[2]]
## p-value conf-low conf-high
## OJ VC 1 0.008229248 1.951911 9.908089
##
## [[3]]
## p-value conf-low conf-high
## OJ VC 2 0.9669567 -4.328976 4.168976
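A paired t-test is a one-sample t-test on the element-wise differences, so its result depends on which OJ value is lined up with which VC value. Since each guinea pig received only one combination (per ?ToothGrowth), that pairing is just row order. The sketch below illustrates this at dose 0.5; the random shuffle and seed are arbitrary choices for illustration.

```r
tg <- datasets::ToothGrowth

oj <- tg$len[tg$supp == "OJ" & tg$dose == 0.5]
vc <- tg$len[tg$supp == "VC" & tg$dose == 0.5]

# Paired test == one-sample test on the differences oj - vc
paired <- t.test(oj, vc, paired = TRUE)
one_sample <- t.test(oj - vc)

# Re-ordering one group changes the pairing and, in general, the paired
# p-value, but it cannot change the unpaired test (means and variances
# do not depend on order)
set.seed(1)
vc_shuffled <- sample(vc)
paired_shuffled <- t.test(oj, vc_shuffled, paired = TRUE)
unpaired <- t.test(oj, vc, var.equal = TRUE)
unpaired_shuffled <- t.test(oj, vc_shuffled, var.equal = TRUE)

print(c(paired = paired$p.value, paired_shuffled = paired_shuffled$p.value))
```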
The p-values for dose levels 0.5 and 1.0 are below 0.05, and the corresponding confidence intervals exclude zero, so we reject the null hypothesis that mean tooth length is the same under OJ and VC at these doses. At dose level 2, the p-value is close to 1, so we fail to reject the null hypothesis. We conclude that OJ and VC have different effects at dose levels 0.5 and 1, with OJ associated with longer teeth, while no difference is detected at dose level 2. This is consistent with the boxplot above.
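The three unpaired comparisons can also be collected into a single table with the formula interface of t.test(). The sketch below does that and, as an optional extra step not taken in the analysis above, adjusts the p-values for running three tests with Holm's method via p.adjust(); the names `doses` and `results` are illustrative.

```r
tg <- datasets::ToothGrowth

# One unpaired, pooled-variance t-test of OJ vs VC at each dose level.
# supp's levels are OJ then VC, so the interval is for mean(OJ) - mean(VC).
doses <- sort(unique(tg$dose))
results <- t(sapply(doses, function(d) {
  tt <- t.test(len ~ supp, data = tg[tg$dose == d, ], var.equal = TRUE)
  c(dose = d, p.value = tt$p.value,
    conf.low = tt$conf.int[1], conf.high = tt$conf.int[2])
}))
results <- as.data.frame(results)

# Adjust for three comparisons ("bonferroni" is a more conservative option)
results$p.holm <- p.adjust(results$p.value, method = "holm")
print(results)
```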