Statistical Inference Project: Part I

Summary

In this report, I investigate the distribution of the mean of 40 exponentials and compare it with what the Central Limit Theorem (CLT) predicts. Using one thousand simulations, I compare the mean, variance, and shape of the distribution of averages of 40 exponentials with the corresponding normal distribution.

Simulations

In R, the exponential distribution can be simulated with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. In this analysis, as instructed, I set lambda = 0.2 for all simulations.
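
As a quick sanity check of these two facts, the following sketch draws a large sample with rexp and compares its mean and standard deviation with 1/lambda. The sample size of 100,000, the seed, and the variable names are illustrative assumptions, not part of the assignment.

```r
# Sanity check: for rate lambda, the mean and sd of rexp() draws
# should both be close to 1/lambda.
set.seed(1)            # arbitrary seed, used only for reproducibility
lambda <- 0.2
n_draws <- 100000      # illustrative sample size (assumption)
draws <- rexp(n_draws, rate = lambda)

check <- c(theoretical = 1 / lambda,
           sample_mean = mean(draws),
           sample_sd   = sd(draws))
print(round(check, 3))
```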

I use the following code to run the thousand simulations. It sets the seed, lambda, the number of simulations, and the sample size, then uses rexp to draw the values and, finally, computes the average of each set of 40 exponentials.

set.seed(2015)
lambda <- 0.2
num_sim <- 1000
sample_size <- 40
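# Each row of the matrix is one simulation of 40 exponentials
simulation <- matrix(rexp(num_sim * sample_size, rate = lambda), num_sim, sample_size)
# One average per simulation (1000 averages in total)
means <- rowMeans(simulation)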

Distribution

In this section, I first present the distribution of the sample means and then address the questions regarding the differences between the simulation distribution and the theoretical normal distribution.

# Histogram of the 1000 sample means on a density scale
hist(means, breaks = 50, prob = TRUE,
     main = "Mean Distribution\nfor Exponential Distributions",
     xlab = "Simulation Mean")
lines(density(means))                 # kernel density of the simulated means
abline(v = 1/lambda, col = "blue")    # theoretical mean
# Normal density predicted by the CLT
x_fit <- seq(min(means), max(means), length.out = 100)
y_fit <- dnorm(x_fit, mean = 1/lambda, sd = (1/lambda/sqrt(sample_size)))
lines(x_fit, y_fit, col = "green", lty = 5)
legend('topright', c("Simulation", "Theoretical Normal"),
       lty = c(1,5), col = c("black", "green"))

Distribution - Sample Mean vs. Theoretical Mean

The theoretical normal distribution has mean 5, while the simulated distribution is centered at 5.0115634. The two means are very close, differing by about 0.012.

simul_mean <- mean(means)
theo_mean <- 1 / lambda
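
To put "very close" on a scale, one option is to express the gap between the two means in units of the standard error of simul_mean. Because simul_mean averages all 40,000 draws, that standard error is (1/lambda)/sqrt(num_sim * sample_size). The sketch below repeats the setup from the Simulations section so that it runs on its own; se_simul_mean and z_gap are names introduced here for illustration.

```r
# Rebuild the simulated means with the same settings as the report
set.seed(2015)
lambda <- 0.2
num_sim <- 1000
sample_size <- 40
simulation <- matrix(rexp(num_sim * sample_size, rate = lambda), num_sim, sample_size)
means <- rowMeans(simulation)

simul_mean <- mean(means)
theo_mean <- 1 / lambda

# Standard error of the grand mean of all num_sim * sample_size draws
se_simul_mean <- (1 / lambda) / sqrt(num_sim * sample_size)

# Gap between simulated and theoretical mean, in standard-error units
z_gap <- (simul_mean - theo_mean) / se_simul_mean
print(z_gap)
```

A value of z_gap between roughly -2 and 2 is consistent with ordinary sampling noise. With the simulated mean of 5.0115634 reported above, the gap works out to about 0.46 standard errors.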

Distribution - Sample Variance vs. Theoretical Variance

The theoretical variance of the mean of 40 exponentials is 0.625, while the variance of the simulated means is 0.6410788. The two variances are also very close to each other.

simul_var <- var(means)
theo_var <- (1/lambda)^2/sample_size
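
The formula behind theo_var says that the variance of an average of n exponentials is (1/lambda)^2 / n, so it should shrink in proportion to the sample size. A minimal sketch of that relationship follows. The extra sample sizes 10 and 160 and the helper var_of_means are introduced here purely for illustration and are not part of the original analysis.

```r
set.seed(2015)
lambda <- 0.2
num_sim <- 1000

# Hypothetical helper: variance of num_sim averages, each taken over n exponentials
var_of_means <- function(n) {
  draws <- matrix(rexp(num_sim * n, rate = lambda), num_sim, n)
  var(rowMeans(draws))
}

sizes <- c(10, 40, 160)
comparison <- data.frame(
  sample_size = sizes,
  simulated   = sapply(sizes, var_of_means),
  theoretical = (1 / lambda)^2 / sizes
)
print(comparison)
```

The theoretical column is 2.5, 0.625, and 0.15625; the simulated column should track it up to sampling noise.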

Distribution - Sample Distribution vs. Theoretical Distribution

I compare the sample distribution with the theoretical normal distribution in the following three ways.

First, we visually compare the graph of the distributions. We can see that the distribution of sample means (the histogram) closely matches a theoretical normal distribution.

Second, we compare the mean, variance, and confidence intervals of the two distributions. The sections Distribution - Sample Mean vs. Theoretical Mean and Distribution - Sample Variance vs. Theoretical Variance above showed that the means and variances closely match. Now, the confidence intervals:

# sd(means) and sqrt(theo_var) are already standard errors of a 40-observation mean,
# so they are not divided by sqrt(sample_size) a second time
simul_interval <- round(mean(means) + c(-1,1)*1.96*sd(means),3)
theo_interval <- round(theo_mean + c(-1,1)*1.96*sqrt(theo_var),3)

The 95% interval for the simulation is approximately (3.442, 6.581), while the theoretical 95% interval is (3.45, 6.55); both follow from the means and variances reported above. We can conclude that the intervals also closely match each other.
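
A complementary check, sketched here under the same settings as the Simulations section, is to count the share of simulated averages that fall within 1.96 theoretical standard errors of 1/lambda. If the normal approximation holds, that share should be near 0.95. The names theo_se, inside, and coverage are introduced here for illustration.

```r
# Rebuild the simulated means with the same settings as the report
set.seed(2015)
lambda <- 0.2
num_sim <- 1000
sample_size <- 40
means <- rowMeans(matrix(rexp(num_sim * sample_size, rate = lambda),
                         num_sim, sample_size))

theo_mean <- 1 / lambda
theo_se <- (1 / lambda) / sqrt(sample_size)   # sd of an average of 40 exponentials

# TRUE for each simulated average inside the theoretical 95% band
inside <- abs(means - theo_mean) <= 1.96 * theo_se
coverage <- mean(inside)                      # proportion of averages inside the band
print(coverage)
```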

Third, we look at the QQ plot.

qqnorm(means)
qqline(means)

The QQ plot shows that the simulation quantiles closely match the theoretical normal quantiles.
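
The same comparison can also be read off numerically. The sketch below, which repeats the simulation setup so that it runs on its own, tabulates a few quantiles of the simulated averages next to the corresponding normal quantiles from qnorm. The chosen probabilities and the name qq_table are arbitrary illustrative choices.

```r
# Rebuild the simulated means with the same settings as the report
set.seed(2015)
lambda <- 0.2
num_sim <- 1000
sample_size <- 40
means <- rowMeans(matrix(rexp(num_sim * sample_size, rate = lambda),
                         num_sim, sample_size))

probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)
qq_table <- data.frame(
  prob      = probs,
  simulated = unname(quantile(means, probs)),   # empirical quantiles
  normal    = qnorm(probs, mean = 1 / lambda,   # quantiles of the CLT normal
                    sd = (1 / lambda) / sqrt(sample_size))
)
print(round(qq_table, 3))
```

Small gaps in the tails are expected, because an average of 40 exponentials is still slightly right-skewed.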

Conclusion

I used three comparisons to evaluate the distribution of the simulation results against the theoretical normal distribution, and all three indicate that the distribution of the simulation results is approximately normal.

Arash Amoozegar