This is the second part of the project for the statistical inference class. In it, I analyze the ToothGrowth data from the R datasets package.

Load the ToothGrowth data and perform some basic exploratory data analyses

library(datasets)
summary(ToothGrowth)
##       len       supp         dose     
##  Min.   : 4.2   OJ:30   Min.   :0.50  
##  1st Qu.:13.1   VC:30   1st Qu.:0.50  
##  Median :19.2           Median :1.00  
##  Mean   :18.8           Mean   :1.17  
##  3rd Qu.:25.3           3rd Qu.:2.00  
##  Max.   :33.9           Max.   :2.00

From the summary, one can tell that the dataset has 60 observations, split between two supp types, OJ and VC, with 30 observations each. Within each supp group there are three different doses: 0.5, 1, and 2.
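As a quick check of this reading of the summary, the group sizes can be tabulated directly. This is only a minimal sketch; the variable name `counts` is introduced here for illustration and is not part of the original analysis.

```r
library(datasets)

# Cross-tabulate supplement type against dose to confirm the 2x3 design
counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(counts)

# Total number of observations in the dataset
nrow(ToothGrowth)
```

Each of the six supp-by-dose cells should hold 10 observations, which is the group size used in the comparisons later on.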

Provide a basic summary of the data

To visualize the distribution of len in each of the 2x3 supp-by-dose groups, I made the histograms below.

The plots show that len tends to increase as the dose increases. The supp type, by contrast, does not seem to have an obvious effect on len.

library(ggplot2)
g <- ggplot(ToothGrowth, aes(x = len, fill = supp)) +
  geom_histogram(binwidth = 1.0, colour = "black", aes(y = after_stat(density)))
g + facet_grid(dose ~ supp)
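As a numeric complement to the histograms, the mean of len in each group can be tabulated. This is a rough sketch rather than part of the original analysis; the name `group_means` is an assumption made for illustration.

```r
library(datasets)

# Mean tooth length for every combination of supp and dose
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

If the visual impression is right, the means should rise with dose within each supp type.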

[Figure: density histograms of len, faceted by dose (rows) and supp (columns)]

Use confidence intervals and hypothesis tests to compare tooth growth by supp and dose. (Use the techniques from class even if there’s other approaches worth considering)

I first compare len for the 10 OJ observations at dose 0.5 versus the 10 VC observations at dose 0.5, assuming that the two groups are independent and have equal variances.

T confidence interval method

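d05 = ToothGrowth[ToothGrowth$dose == 0.5, ]
OJd05 = d05[d05$supp == "OJ", ]$len
VCd05 = d05[d05$supp == "VC", ]$len
sd_oj = sd(OJd05)
sd_vc = sd(VCd05)
# pooled standard deviation for two groups of 10 observations each
sp = sqrt((9*sd_oj^2 + 9*sd_vc^2)/(10 + 10 - 2))
# the standard error uses 1/n for each group, with n = 10
mean(OJd05) - mean(VCd05) + c(-1, 1)*qt(0.975, 18)*sp*(1/10 + 1/10)^.5
## [1] 1.77 8.73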

Because the interval does not contain zero, I can rule out zero as a plausible value for the population difference between these two groups.
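For reference, the interval computed above is the usual pooled-variance t interval for two independent groups. The symbols below ($n_1$, $n_2$, $s_{OJ}$, $s_{VC}$, $s_p$) are notation introduced here for illustration, with $n_1 = n_2 = 10$ in this comparison.

```latex
\[
  \bar{x}_{OJ} - \bar{x}_{VC}
  \;\pm\; t_{0.975,\; n_1 + n_2 - 2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},
  \qquad
  s_p^2 = \frac{(n_1 - 1)\, s_{OJ}^2 + (n_2 - 1)\, s_{VC}^2}{n_1 + n_2 - 2}
\]
```

With 10 observations per group, the degrees of freedom are $10 + 10 - 2 = 18$ and each variance is weighted by $n - 1 = 9$.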

Alternatively, I can use t.test to test the hypothesis that there is no difference between the means of these two groups, which leads to the same conclusion.

t.test(OJd05, VCd05, paired=FALSE, var.equal=TRUE)$conf.int
## [1] 1.77 8.73
## attr(,"conf.level")
## [1] 0.95
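The same t.test call also carries the test statistic and p-value, which express the hypothesis test more directly than the interval does. The following is a self-contained sketch; the names `oj_05`, `vc_05`, and `res` are assumptions made for illustration.

```r
library(datasets)

# Tooth lengths for the two supplements at dose 0.5
oj_05 <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
vc_05 <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]

# Two-sided, equal-variance t test of H_0: the two group means are equal
res <- t.test(oj_05, vc_05, paired = FALSE, var.equal = TRUE)

res$statistic  # t statistic
res$parameter  # degrees of freedom
res$p.value    # reject H_0 at the 5% level if this is below 0.05
```

A p-value below 0.05 corresponds to a 95% confidence interval that excludes zero, so the two approaches agree.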

Similarly, I can compare the two supp groups at the other two doses.

d10 = ToothGrowth[ToothGrowth$dose == 1.0, ]
OJd10 = d10[d10$supp == "OJ", ]$len
VCd10 = d10[d10$supp == "VC", ]$len
d20 = ToothGrowth[ToothGrowth$dose == 2.0, ]
OJd20 = d20[d20$supp == "OJ", ]$len
VCd20 = d20[d20$supp == "VC", ]$len

Different supp groups at dose 1.0.

t.test(OJd10, VCd10, paired=FALSE, var.equal=TRUE)$conf.int
## [1] 2.841 9.019
## attr(,"conf.level")
## [1] 0.95

Different supp groups at dose 2.0.

t.test(OJd20, VCd20, paired=FALSE, var.equal=TRUE)$conf.int
## [1] -3.723  3.563
## attr(,"conf.level")
## [1] 0.95

Different dose groups with supp OJ

t.test(OJd10, OJd05, paired=FALSE, var.equal=TRUE)$conf.int
## [1]  5.529 13.411
## attr(,"conf.level")
## [1] 0.95
t.test(OJd20, OJd05, paired=FALSE, var.equal=TRUE)$conf.int
## [1]  9.382 16.278
## attr(,"conf.level")
## [1] 0.95
t.test(OJd20, OJd10, paired=FALSE, var.equal=TRUE)$conf.int
## [1] 0.2195 6.5005
## attr(,"conf.level")
## [1] 0.95

Different dose groups with supp VC

t.test(VCd10, VCd05, paired=FALSE, var.equal=TRUE)$conf.int
## [1]  6.316 11.264
## attr(,"conf.level")
## [1] 0.95
t.test(VCd20, VCd05, paired=FALSE, var.equal=TRUE)$conf.int
## [1] 14.49 21.83
## attr(,"conf.level")
## [1] 0.95
t.test(VCd20, VCd10, paired=FALSE, var.equal=TRUE)$conf.int
## [1]  5.771 12.969
## attr(,"conf.level")
## [1] 0.95
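The supp comparisons above repeat the same call for each dose, so they could also be collected into one table. The sketch below is one possible way to do that; `ci_diff` is a hypothetical helper and `supp_ci` a name introduced for illustration, neither of which appears in the original analysis.

```r
library(datasets)

# Hypothetical helper: 95% equal-variance t interval for
# mean(len) in data frame a minus mean(len) in data frame b
ci_diff <- function(a, b) {
  as.numeric(t.test(a$len, b$len, paired = FALSE, var.equal = TRUE)$conf.int)
}

# OJ minus VC at each dose level, one row per dose
doses <- sort(unique(ToothGrowth$dose))
supp_ci <- t(sapply(doses, function(d) {
  at_dose <- ToothGrowth[ToothGrowth$dose == d, ]
  ci_diff(at_dose[at_dose$supp == "OJ", ], at_dose[at_dose$supp == "VC", ])
}))
dimnames(supp_ci) <- list(paste("dose", doses), c("lower", "upper"))
print(round(supp_ci, 3))
```

Rows whose interval excludes zero correspond to doses where the two supp types differ significantly at the 95% level.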

Conclusions

Assuming that all six groups are independent and have equal variances, I tested H_0: no difference in mean len between two groups, at the 95% confidence level. I conclude that, for the same supp, len differs significantly between every pair of doses, with the higher dose giving the larger mean len. For the same dose, OJ gives a significantly larger mean len than VC at doses 0.5 and 1.0, but there is no significant difference at dose 2.0.