Simulation exercise

For our statistical inference class, I use simulation to investigate the exponential distribution in R, and inference to analyze the ToothGrowth data in the R datasets package.

Simulation

First I set up a vector of 1000 NAs to store the sample means, then use a for loop to draw 1000 samples of 40 exponentials each (rate 0.2) and store the mean of each sample in “sample_mean”.

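sample_mean <- rep(NA, 1000)      # placeholder for the 1000 sample means
for (i in 1:1000) {
  samp <- rexp(40, 0.2)           # one sample of 40 exponentials, rate 0.2
  sample_mean[i] <- mean(samp)    # store the mean of this sample
}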
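The same simulation can also be written without an explicit loop. The following is a minimal sketch using base R's replicate(); the seed value and the names n_sims, n, rate and sample_mean_vec are illustrative assumptions rather than part of the original analysis, so the numbers it produces will not match the output shown in this report.

```r
# Assumption: the seed is arbitrary and only makes this sketch reproducible;
# the original run did not set one.
set.seed(1)

n_sims <- 1000   # number of simulated samples
n      <- 40     # exponentials per sample
rate   <- 0.2    # rate parameter used in the loop above

# replicate() evaluates the expression n_sims times and
# returns the results as a numeric vector of sample means
sample_mean_vec <- replicate(n_sims, mean(rexp(n, rate)))

length(sample_mean_vec)
```

Naming the constants once keeps the sample size and rate consistent if the simulation is rerun with different settings.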

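Next we use boxplots to demonstrate the difference between the simulated sample means and the exponential distribution itself. Note that “samp” is the last sample of 40 exponentials drawn in the loop, so the left-hand boxplot shows a single sample rather than the theoretical distribution.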

par(mfrow=c(1,2))
boxplot(samp,ylim=c(0,15),main="Theoretical Boxplot")
boxplot(sample_mean,ylim=c(0,15),main="simulation sample mean")

Figure: boxplots titled "Theoretical Boxplot" (left) and "simulation sample mean" (right).

mean(samp)

## [1] 4.553

mean(sample_mean)

## [1] 4.975

var(samp)

## [1] 18.84

var(sample_mean)

## [1] 0.6766

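The boxplots and the output above show that the mean of the sample means (4.975) is very close to the theoretical mean of 1/0.2 = 5. The exponential draws are also far more variable than the sample means: the single sample has variance 18.84 (the theoretical value is 1/0.2^2 = 25), while the sample means have variance 0.6766, close to the theoretical 25/40 = 0.625.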
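For reference, the theoretical quantities behind this comparison can be computed directly. This is a small sketch assuming the rate 0.2 and sample size 40 used in the simulation; the variable names are illustrative and do not appear in the original code.

```r
rate <- 0.2   # rate parameter of the exponential distribution
n    <- 40    # number of exponentials averaged in each sample

theo_mean       <- 1 / rate            # mean of one exponential draw
theo_var_single <- 1 / rate^2          # variance of one exponential draw
theo_var_mean   <- theo_var_single / n # variance of the mean of n draws

c(mean = theo_mean, var_single = theo_var_single, var_of_mean = theo_var_mean)
```

These values can be set against mean(sample_mean), var(samp) and var(sample_mean) from the simulation.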

par(mfrow=c(1,2))
hist(samp,main="exponential distribution",xlab="exponential")
hist(sample_mean,main="sample_mean dis",xlab="sample mean")

Figure: histograms titled "exponential distribution" (left) and "sample_mean dis" (right).

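No matter how skewed the underlying distribution is, the means of repeated samples are approximately normally distributed, as in the right-hand histogram above. This is the central limit theorem at work (it requires the distribution to have finite variance, which the exponential does), not a bootstrap: each sample is drawn fresh from the exponential distribution rather than resampled from observed data.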
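One way to check the approximate normality visually is to overlay the normal density suggested by the central limit theorem on the histogram of sample means. The sketch below is self-contained and regenerates the sample means with an arbitrary seed (an assumption, not part of the original run), so its figure will differ slightly from the one above; the names mu, se and z are illustrative.

```r
# Assumption: arbitrary seed, used only to make this sketch reproducible.
set.seed(1)

rate <- 0.2
n    <- 40
sample_mean <- replicate(1000, mean(rexp(n, rate)))

mu <- 1 / rate                # theoretical mean of the sample mean
se <- (1 / rate) / sqrt(n)    # theoretical standard deviation of the sample mean

# Histogram on the density scale so the normal curve is comparable
hist(sample_mean, freq = FALSE, main = "sample means vs. normal density",
     xlab = "sample mean")
curve(dnorm(x, mean = mu, sd = se), add = TRUE, lwd = 2)

# Standardized means should have mean near 0 and standard deviation near 1
z <- (sample_mean - mu) / se
c(mean_z = mean(z), sd_z = sd(z))
```

If the curve tracks the bars closely and the standardized values have mean near 0 and standard deviation near 1, the normal approximation is reasonable for samples of size 40.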