In our statistical inference class, I want to use simulation to investigate the exponential distribution in R and use inference to analyze the ToothGrowth data in the R datasets package.
First, I set up a vector of 1000 NAs to store the sample means, then use a for loop to draw 1000 samples of 40 exponentials (rate 0.2) and store the mean of each sample in “sample_mean”.
sample_mean=rep(NA,1000)
for (i in 1:1000){
samp=rexp(40,0.2)
sample_mean[i]=mean(samp)
}
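The same simulation can also be written without an explicit loop. The following is a minimal sketch, not the code that produced the outputs shown in this report: the seed value and the names n_sims, n, rate and sample_mean_vec are assumptions added for illustration, so its numbers will differ from those reported here.

```r
# Hypothetical seed, chosen only so that this sketch is reproducible.
set.seed(123)

n_sims <- 1000   # number of simulated samples
n      <- 40     # draws per sample
rate   <- 0.2    # rate of the exponential distribution

# replicate() evaluates the expression n_sims times and collects the results
# in a numeric vector, one sample mean per simulated sample.
sample_mean_vec <- replicate(n_sims, mean(rexp(n, rate)))

length(sample_mean_vec)
mean(sample_mean_vec)
```

Preallocating with rep(NA, 1000) and filling in a loop gives the same kind of result; replicate() is simply more compact.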
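Next, we use boxplots to demonstrate the difference between the simulated sample means and draws from the exponential distribution itself. Note that samp holds only the last sample of 40 draws from the loop, so the first boxplot shows one empirical sample rather than the theoretical distribution.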
par(mfrow=c(1,2))
boxplot(samp,ylim=c(0,15),main="Theoretical Boxplot")
boxplot(sample_mean,ylim=c(0,15),main="simulation sample mean")
mean(samp)
## [1] 7.176523
mean(sample_mean)
## [1] 4.979948
var(samp)
## [1] 38.2815
var(sample_mean)
## [1] 0.5869743
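The output shows that the mean of the simulated sample means (4.98) is very close to the theoretical mean of 1/0.2 = 5, whereas the single sample of 40 draws has a mean of 7.18. The boxplots also show that the individual exponential draws are definitely more variable (variance 38.28) than the sample means (variance 0.59).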
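The comparison above can be made against the theoretical values for an exponential distribution with rate 0.2. This is a short sketch using the standard results that the mean is 1/lambda, the variance is 1/lambda^2, and the variance of the mean of n independent draws is the variance divided by n; the variable names are illustrative only.

```r
lambda <- 0.2   # rate used in rexp() above
n      <- 40    # draws per sample

theo_mean     <- 1 / lambda          # mean of a single draw: 5
theo_var_draw <- 1 / lambda^2        # variance of a single draw: 25
theo_var_mean <- theo_var_draw / n   # variance of the mean of n draws: 0.625

c(mean = theo_mean, var_draw = theo_var_draw, var_mean = theo_var_mean)
```

The simulated values reported above (4.98 for the mean of the sample means, 0.587 for their variance) sit close to 5 and 0.625. The variance of the single sample (38.28) is estimated from only 40 draws, so it can stray well away from 25.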
par(mfrow=c(1,2))
hist(samp,main="exponential distribution",xlab="exponential")
hist(sample_mean,main="sample_mean dis",xlab="sample mean")
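However skewed the underlying distribution is, the distribution of the simulated sample means is approximately normal, as in the right-hand histogram above. This is the central limit theorem at work rather than a bootstrap: each sample is drawn fresh from the exponential distribution instead of being resampled from observed data, and the approximation relies on the distribution having finite variance and the sample size (here 40) being reasonably large.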
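The approximate normality of the sample means can be checked visually. The sketch below is an illustration under assumptions, not part of the original analysis: it re-simulates the means with a hypothetical seed, overlays the normal density with mean 5 and variance 0.625 (the theoretical values for rate 0.2 and n = 40), and draws a normal Q-Q plot.

```r
# Hypothetical seed; the simulated values differ from those reported above.
set.seed(123)
sim_means <- replicate(1000, mean(rexp(40, 0.2)))

par(mfrow = c(1, 2))

# Histogram on the density scale so the normal curve is comparable.
hist(sim_means, freq = FALSE,
     main = "sample means vs. normal density", xlab = "sample mean")
curve(dnorm(x, mean = 5, sd = sqrt(0.625)), add = TRUE, lwd = 2)

# Points close to the reference line indicate approximate normality.
qqnorm(sim_means)
qqline(sim_means)
```

With n = 40 a slight right skew usually remains visible in the upper tail of the Q-Q plot, which is why the result is described as approximately normal.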
Next, I’m going to analyze the ToothGrowth data in the R datasets package.
library(ggplot2)
ToothGrowth<-ToothGrowth
ToothGrowth$dose<-as.factor(ToothGrowth$dose)
summary(ToothGrowth)
##       len        supp     dose
##  Min.   : 4.20   OJ:30   0.5:20
##  1st Qu.:13.07   VC:30   1  :20
##  Median :19.25           2  :20
##  Mean   :18.81
##  3rd Qu.:25.27
##  Max.   :33.90
ggplot(aes(dose,len,fill=supp),data=ToothGrowth)+facet_grid(.~supp)+geom_boxplot()
The boxplots suggest that orange juice (OJ) is associated with greater tooth length than ascorbic acid (VC) at the lower dose levels (0.5 and 1 mg). At the 2 mg dose level, orange juice and ascorbic acid seem roughly equally effective.
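One way to probe this visual impression is to compare the two supplements separately at each dose level. The following is only a sketch and was not part of the original analysis: it uses a fresh copy of the data (named tg here for illustration, with dose still numeric), runs a Welch t-test per dose, and applies no correction for multiple comparisons. Each group contains only 10 observations, so the results should be read cautiously.

```r
tg <- datasets::ToothGrowth   # fresh copy; dose is numeric in the original data

doses <- c(0.5, 1, 2)

# Welch two-sample t-test of len by supp within each dose level.
p_by_dose <- sapply(doses, function(d) {
  t.test(len ~ supp, data = tg[tg$dose == d, ])$p.value
})
names(p_by_dose) <- doses

p_by_dose
```

Small p-values at 0.5 and 1 mg together with a large one at 2 mg would agree with the pattern seen in the boxplots.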
suppressPackageStartupMessages(library(dplyr, quietly=TRUE))
tooth<-arrange(ToothGrowth,supp)
OJ<-tooth$len[tooth$supp=="OJ"]
VC<-tooth$len[tooth$supp=="VC"]
t.test(OJ,VC)
##
## Welch Two Sample t-test
##
## data: OJ and VC
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
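If we set the significance level (alpha) at 0.05 (an assumption), the p-value (0.06063) exceeds it, so we fail to reject the null hypothesis: pooled over all doses, there is no statistically significant difference in mean tooth length between orange juice and ascorbic acid. Consistent with this, the 95% confidence interval for the difference in means (OJ minus VC), from -0.1710156 to 7.5710156, contains 0.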
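The decision rule above can also be read off the test object directly. This is a minimal sketch using the formula interface of t.test() on a fresh copy of the data, so no sorting or row indexing is needed; the names tg, res and alpha are illustrative, and alpha = 0.05 is the same assumed significance level as in the text.

```r
tg  <- datasets::ToothGrowth
res <- t.test(len ~ supp, data = tg)   # Welch test, OJ vs. VC, all doses pooled

alpha <- 0.05   # assumed significance level

res$p.value            # p-value of the test
res$conf.int           # 95% confidence interval for mean(OJ) - mean(VC)
res$p.value < alpha    # FALSE: fail to reject the null hypothesis at 0.05
```

Because the interval in res$conf.int contains 0, it leads to the same conclusion as comparing the p-value with alpha.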