Use simulation to explore inference

author: angelayuan
date: Friday, March 20, 2015

Overview

In this project we investigate the exponential distribution and compare its behavior with what the Central Limit Theorem predicts. We set lambda = 0.2 for all of the simulations. We investigate the distribution of averages of 40 exponentials, and repeat the simulation one thousand times.

Simulations

First, we set the seed so that others get the same results when drawing randomly from the exponential distribution. We set lambda = 0.2 and run 1000 simulations. In each simulation, we draw 40 numbers from the exponential distribution and calculate their mean; the code then draws a second, independent sample of 40 numbers and calculates its standard deviation. We save the sample means and sample standard deviations to mns and sds, respectively, and save the mean of the sample means and the mean of the sample variances to mmns and mvar, respectively.

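set.seed(1000)           # fix the seed so the results are reproducible
lambda <- 0.2            # rate of the exponential distribution
mns <- numeric(1000)     # sample means, one per simulation
sds <- numeric(1000)     # sample standard deviations, one per simulation
for (i in 1:1000) {
        # note: the mean and the sd come from two separate draws of 40
        mns[i] <- mean(rexp(40, lambda))
        sds[i] <- sd(rexp(40, lambda))
}
mmns <- mean(mns)        # mean of the sample means
mmns
## [1] 4.99256
mvar <- mean(sds^2)      # mean of the sample variances
mvar
## [1] 25.32456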
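
The loop above takes the mean and the standard deviation from two different draws. As a minimal sketch of a variant, assuming we want both statistics from the same sample, the draws can be stored in a matrix with one row per simulation. The names n, n_sims, draws, sample_means and sample_vars are introduced here for illustration only, and because the random numbers are consumed in a different order, the values will not match the output shown above exactly.

```r
set.seed(1000)
lambda <- 0.2    # rate of the exponential distribution
n <- 40          # size of each sample
n_sims <- 1000   # number of simulated samples

# One row per simulation, one column per draw
draws <- matrix(rexp(n_sims * n, rate = lambda), nrow = n_sims)

# Mean and variance computed from the same 40 draws in each row
sample_means <- rowMeans(draws)
sample_vars <- apply(draws, 1, var)

c(mean_of_means = mean(sample_means), mean_of_variances = mean(sample_vars))
```

Both values should land near the theoretical mean 1/lambda = 5 and the theoretical variance (1/lambda)^2 = 25.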

Sample Mean versus Theoretical Mean

We plot the histogram of the sample means. The mean of the exponential distribution is known to be 1/lambda (i.e. 1/0.2 = 5 here). On the histogram, we highlight this theoretical mean in red and the mean of the sample means in green.

mu <- 1/lambda
hist(mns, main = "Distribution of sample means", xlab = "Sample means")
abline(v = mu, col = "red", lwd = 2 )
abline(v = mmns, col = "green", lwd = 2)

The figure shows that the sample means are centered at the theoretical mean, and that the mean of the sample means (4.99) is very close to the theoretical mean (5). Moreover, the distribution of the sample means is approximately normal.
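
The comparison can also be made numerically. The sketch below is an illustration rather than part of the original analysis: it assumes a fresh set of 1000 sample means (so the numbers differ slightly from those reported above) and uses the standard result that the standard deviation of a mean of n independent draws is sigma/sqrt(n), here (1/lambda)/sqrt(40). The names n, sim_means and comparison are hypothetical.

```r
set.seed(1000)
lambda <- 0.2
n <- 40

# 1000 averages of n exponential draws each
sim_means <- replicate(1000, mean(rexp(n, rate = lambda)))

# Theoretical values next to their simulated counterparts
comparison <- rbind(
  mean = c(theoretical = 1 / lambda, simulated = mean(sim_means)),
  sd   = c(theoretical = (1 / lambda) / sqrt(n), simulated = sd(sim_means))
)
comparison
```

If the simulated column is close to the theoretical one in both rows, the sample means are centered where theory predicts and are spread about as widely as theory predicts.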

Sample Variance versus Theoretical Variance

We plot the histogram of the sample variances. The standard deviation of the exponential distribution is known to be 1/lambda (i.e. 1/0.2 = 5 here), so the theoretical variance is (1/lambda)^2 (i.e. 25). On the histogram, we highlight this theoretical variance in red and the mean of the sample variances in green.

sigma <- 1/lambda
hist(sds^2, main = "Distribution of sample variance", xlab = "Sample variances")
abline(v = sigma^2, col = "red", lwd = 2)
abline(v = mvar, col = "green", lwd = 2)

The figure shows that the sample variances tend to center at the theoretical variance, although their distribution is less close to normal than that of the sample means. The mean of the sample variances (25.32) is nevertheless very close to the theoretical variance (25).
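
One way to put a number on "less normal" is the sample skewness, which is close to 0 for a symmetric distribution such as the normal. The following is a rough sketch under assumptions of our own: the skew helper is a hypothetical function written here (a simple moment-based estimate, not a function from base R), and the draws are generated afresh rather than reused from the simulation above.

```r
set.seed(1000)
lambda <- 0.2
n <- 40
n_sims <- 1000

# Hypothetical helper: moment-based sample skewness
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# One row per simulated sample of n exponentials
draws <- matrix(rexp(n_sims * n, rate = lambda), nrow = n_sims)
means <- rowMeans(draws)
vars <- apply(draws, 1, var)

c(skew_means = skew(means), skew_variances = skew(vars))
```

A clearly larger skewness for the variances than for the means would agree with the visual impression from the histogram.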

Distribution

To illustrate that the distribution of the sample means is approximately normal, we plot two figures in one panel. The left one is the distribution of a large collection (1000) of random exponentials, and the right one is the distribution of a large collection (1000) of averages of 40 exponentials. The theoretical mean is highlighted in red in the right figure.

par(mar = c(5,5,4,5))
par(mfrow = c(1,2))
hist(rexp(1000,lambda), main = "Distribution of random exponentials", xlab = "Sample")
hist(mns,main = "Distribution of sample means", xlab = "Sample means")
abline(v = mu, col = "red", lwd = 2)

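The two figures show a clear difference between these distributions: the individual exponentials are strongly right-skewed, whereas the averages of 40 exponentials are roughly symmetric and bell-shaped around the theoretical mean. We conclude that the distribution of the sample means is approximately normal.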
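
As a further, hedged check of approximate normality, one can overlay the normal density suggested by the Central Limit Theorem, with mean 1/lambda and standard deviation (1/lambda)/sqrt(40), on a density-scaled histogram, and count how many standardized sample means fall within 1.96 of zero (about 95% under exact normality). This sketch regenerates the sample means, so it is an illustration and not the original figure; the names n, sample_means, se, z and coverage are introduced here.

```r
set.seed(1000)
lambda <- 0.2
n <- 40

# 1000 averages of n exponential draws each
sample_means <- replicate(1000, mean(rexp(n, rate = lambda)))

mu <- 1 / lambda                # theoretical mean
se <- (1 / lambda) / sqrt(n)    # theoretical sd of the sample mean

# Density-scaled histogram with the CLT normal curve on top
hist(sample_means, freq = FALSE, breaks = 30,
     main = "Sample means with normal density", xlab = "Sample means")
curve(dnorm(x, mean = mu, sd = se), add = TRUE, col = "red", lwd = 2)

# Share of standardized means within +/- 1.96
z <- (sample_means - mu) / se
coverage <- mean(abs(z) < 1.96)
coverage
```

A coverage near 0.95, together with a histogram that follows the red curve, supports the approximation; some remaining right skew is expected at n = 40.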