Author: Anthony

In our statistical inference class, I want to use simulation to investigate the exponential distribution in R and use inference to analyze the ToothGrowth data in the R datasets package.

Inferential data analysis

Next, I'm going to analyze the ToothGrowth data in the R datasets package.

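library(ggplot2)
# Work on a local copy of the built-in dataset
ToothGrowth <- ToothGrowth
# Treat dose as a categorical factor (levels 0.5, 1, 2) rather than a number
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
summary(ToothGrowth)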
##       len       supp     dose   
##  Min.   : 4.2   OJ:30   0.5:20  
##  1st Qu.:13.1   VC:30   1  :20  
##  Median :19.2           2  :20  
##  Mean   :18.8                   
##  3rd Qu.:25.3                   
##  Max.   :33.9
# Box plots of tooth length by dose, one panel per supplement type
ggplot(ToothGrowth, aes(dose, len, fill = supp)) + facet_grid(. ~ supp) + geom_boxplot()

[Figure: box plots of tooth length (len) by dose, one panel per supplement type (OJ, VC)]

The box plots suggest that orange juice (OJ) is associated with greater tooth length than ascorbic acid (VC) at the lower dose levels (0.5 and 1 mg/day). At the 2 mg/day dose level, orange juice and ascorbic acid appear roughly equally effective.
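The visual impression above can be checked more formally by comparing the two supplements within each dose level. The following is a minimal sketch, not part of the original analysis: it assumes a Welch t-test per dose is an appropriate check, and the names `tg`, `by_dose`, and `dose_results` are hypothetical, introduced only for illustration.

```r
# Sketch: compare OJ and VC separately at each dose level (base R only).
# tg, by_dose and dose_results are illustrative names, not from the original.
tg <- datasets::ToothGrowth

by_dose <- lapply(split(tg, tg$dose), function(d) {
  # Formula interface; t.test() uses the Welch (unequal variance) test by default
  res <- t.test(len ~ supp, data = d)
  data.frame(
    dose     = d$dose[1],
    mean_OJ  = mean(d$len[d$supp == "OJ"]),
    mean_VC  = mean(d$len[d$supp == "VC"]),
    p_value  = res$p.value,
    ci_lower = res$conf.int[1],   # 95% CI for mean(OJ) - mean(VC)
    ci_upper = res$conf.int[2]
  )
})

# One row per dose level
dose_results <- do.call(rbind, by_dose)
print(dose_results)
```

Each row reports the group means, the p-value, and the 95% confidence interval for the OJ minus VC difference at one dose. Running three tests raises the chance of a false positive, so a multiple-comparison adjustment such as `p.adjust()` may be worth considering.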

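suppressPackageStartupMessages(library(dplyr, quietly=TRUE))
# Sort by supplement so the 30 OJ rows come first, followed by the 30 VC rows
tooth <- arrange(ToothGrowth, supp)
OJ <- tooth[1:30, 1]   # tooth lengths (column 1, len) for orange juice
VC <- tooth[31:60, 1]  # tooth lengths (column 1, len) for ascorbic acid
# Welch two-sample t-test (unequal variances) on mean tooth length
t.test(OJ, VC)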
## 
##  Welch Two Sample t-test
## 
## data:  OJ and VC
## t = 1.915, df = 55.31, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.171  7.571
## sample estimates:
## mean of x mean of y 
##     20.66     16.96

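If we assume a significance level of alpha = 0.05, the p-value (0.06063) is above that threshold, so we fail to reject the null hypothesis that mean tooth length is the same for orange juice and ascorbic acid. The 95% confidence interval for the difference in means (OJ minus VC) runs from -0.1710156 to 7.5710156; it contains zero, which is consistent with that conclusion.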
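The decision described above can also be read directly from the object that `t.test()` returns, instead of from the printed output. The sketch below is illustrative only: the variable names (`tg`, `oj`, `vc`, `res`, `t_manual`, and so on) are assumptions, and the hand computation of the Welch statistic is included just to show where the reported `t` value comes from.

```r
# Sketch: extract the test results programmatically and rebuild the t statistic.
# All variable names here are illustrative, not from the original analysis.
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

alpha <- 0.05                                  # assumed significance level
res <- t.test(oj, vc, conf.level = 1 - alpha)  # Welch two-sample t-test

# Welch statistic by hand: difference in means divided by its standard error
se <- sqrt(var(oj) / length(oj) + var(vc) / length(vc))
t_manual <- (mean(oj) - mean(vc)) / se

# Decision rule and confidence-interval check
reject_null <- res$p.value < alpha
ci_contains_zero <- res$conf.int[1] < 0 && res$conf.int[2] > 0

cat("t from t.test():", unname(res$statistic), "\n")
cat("t by hand:      ", t_manual, "\n")
cat("p-value:        ", res$p.value, "\n")
cat("reject H0:      ", reject_null, "\n")
cat("CI contains 0:  ", ci_contains_zero, "\n")
```

For a two-sided test, rejecting the null hypothesis at level alpha is equivalent to the (1 - alpha) confidence interval excluding zero, so the two logical flags should always disagree with each other.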