Author: Anthony
For our statistical inference class project, I use simulation to investigate the exponential distribution in R and use inference to analyze the ToothGrowth data from the R datasets package.
Next, I'm going to analyze the ToothGrowth data.
library(ggplot2)
# Load the built-in dataset and treat dose as a categorical variable
data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
summary(ToothGrowth)
##       len        supp     dose
##  Min.   : 4.2   OJ:30   0.5:20
##  1st Qu.:13.1   VC:30   1  :20
##  Median :19.2           2  :20
##  Mean   :18.8
##  3rd Qu.:25.3
##  Max.   :33.9
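The summary shows 30 observations per supplement and 20 per dose level, but not how the two factors cross. A quick cross-tabulation with base R's `table()` checks whether every supplement/dose combination has the same number of observations; this is a minimal sketch, and the variable name `design` is only illustrative.

```r
# Reload the built-in data so this snippet runs on its own
data(ToothGrowth)

# Count observations in each supplement x dose combination
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)
```

A balanced design (equal counts in every cell) means the supplement comparison is not confounded by an uneven mix of doses.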
# Boxplots of tooth length by dose, one panel per supplement type
ggplot(ToothGrowth, aes(x = dose, y = len, fill = supp)) +
  geom_boxplot() +
  facet_grid(. ~ supp)
The boxplots suggest that orange juice (OJ) is associated with greater tooth length than ascorbic acid (VC) at the lower dose levels (0.5 and 1 mg/day), while the two supplements appear roughly equally effective at 2 mg/day.
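As a rough numerical check on this visual reading, the mean tooth length can be tabulated for each supplement/dose combination. The sketch below uses only base R (`tapply()`); the names `means` and `diff_OJ_minus_VC` are illustrative, not part of the original analysis.

```r
# Reload the built-in data so this snippet runs on its own
data(ToothGrowth)

# 2 x 3 matrix of mean tooth length: rows = supplement, columns = dose
means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
print(round(means, 2))

# Difference in group means (OJ minus VC) at each dose level
diff_OJ_minus_VC <- means["OJ", ] - means["VC", ]
print(round(diff_OJ_minus_VC, 2))
```

A clearly positive difference at 0.5 and 1 mg/day and a difference near zero at 2 mg/day would match the pattern read off the boxplots; these are descriptive numbers only, not a test.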
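# Split tooth length by supplement type (selecting by label rather than by row position)
OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]
# Welch two-sample t-test (R's default: unequal variances)
t.test(OJ, VC)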
##
##  Welch Two Sample t-test
##
## data:  OJ and VC
## t = 1.915, df = 55.31, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.171  7.571
## sample estimates:
## mean of x mean of y
##     20.66     16.96
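To show where the numbers in this output come from, the sketch below recomputes the Welch test by hand in base R. The t statistic is the difference in sample means divided by the unpooled standard error SE = sqrt(s1^2/n1 + s2^2/n2), and the degrees of freedom come from the Welch-Satterthwaite approximation. Variable names such as `welch_df` are illustrative; the results should agree with `t.test()` up to floating-point rounding.

```r
# Reload the built-in data so this snippet runs on its own
data(ToothGrowth)
OJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
VC <- ToothGrowth$len[ToothGrowth$supp == "VC"]

n1 <- length(OJ)
n2 <- length(VC)
v1 <- var(OJ) / n1  # squared standard error of the OJ mean
v2 <- var(VC) / n2  # squared standard error of the VC mean

# Welch t statistic: difference in means over its unpooled standard error
se <- sqrt(v1 + v2)
t_stat <- (mean(OJ) - mean(VC)) / se

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value and 95% confidence interval for mean(OJ) - mean(VC)
p_value <- 2 * pt(-abs(t_stat), welch_df)
ci <- (mean(OJ) - mean(VC)) + c(-1, 1) * qt(0.975, welch_df) * se

print(c(t = t_stat, df = welch_df, p = p_value))
print(ci)
```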
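If we set the significance level at α = 0.05 (an assumption), the p-value of 0.06063 is slightly above it, so we fail to reject the null hypothesis: pooled over all doses, there is no statistically significant difference in mean tooth length between orange juice and ascorbic acid. Consistent with this, the 95% confidence interval for the difference in means (OJ minus VC) runs from -0.1710156 to 7.5710156 and contains 0.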
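The pooled comparison ignores dose, while the boxplots suggest that the gap between the supplements depends on it. One possible follow-up, sketched here as an extension rather than part of the original analysis, is a separate Welch t-test within each dose level, with a Bonferroni adjustment from base R's `p.adjust()` because three tests are run. The `per_dose` table and its column names are illustrative.

```r
# Reload the built-in data so this snippet runs on its own (dose is numeric here)
data(ToothGrowth)

# One Welch t-test (OJ vs VC) per dose level, collected into a data frame
doses <- sort(unique(ToothGrowth$dose))
per_dose <- do.call(rbind, lapply(doses, function(d) {
  tt <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
  data.frame(dose     = d,
             diff     = unname(tt$estimate[1] - tt$estimate[2]),  # OJ minus VC
             ci_lower = tt$conf.int[1],
             ci_upper = tt$conf.int[2],
             p_value  = tt$p.value)
}))

# Bonferroni-adjusted p-values for the three comparisons
per_dose$p_bonferroni <- p.adjust(per_dose$p_value, method = "bonferroni")
print(per_dose, digits = 3)
```

Reading the `p_bonferroni` column against α = 0.05 gives one way to check whether the dose-specific pattern seen in the boxplots holds up under a formal test.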