Overview

This report investigates the behaviour of the mean of 40 exponentially distributed random variables, simulated 1000 times in R. We compare the sample mean and variance to the theoretical values and demonstrate the Central Limit Theorem.

Number 1 - Sample Mean vs. Theoretical Mean

Explanation

We run 1000 simulations (sims). Each simulation draws n = 40 exponential random variables with rexp(n, lambda), where lambda = 0.2, and records their mean; the 1000 means are stored in the vector means. The theoretical mean of an exponential distribution is 1/lambda = 5. Because the sample mean is an unbiased estimator of the population mean, the sampling distribution of the mean is also centred at 5, so the average of the simulated means should be close to 5.
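The objects used in the code that follows (lambda, n, sims, means) can be set up along these lines. This is a minimal sketch rather than the exact code behind the printed output: the seed is an assumption added for reproducibility, and lambda = 0.2 is inferred from 1/lambda = 5, so the numbers it produces will differ slightly from those shown in this report.

```r
# Sketch of the simulation setup described above.
set.seed(2024)   # hypothetical seed, not taken from the report
lambda <- 0.2    # rate of the exponential distribution (1/lambda = 5)
n      <- 40     # observations per simulated sample
sims   <- 1000   # number of simulated samples

# One mean per simulated sample of n exponential draws
means <- replicate(sims, mean(rexp(n, lambda)))
```

Using replicate() keeps the simulation to one line; a preallocated vector filled in a for loop would give an equivalent result.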

sample_mean <- mean(means)
theoretical_mean <- 1/lambda

sample_mean
## [1] 5.011911
theoretical_mean
## [1] 5
hist(means, breaks = 50, probability = TRUE,
     main = "Distribution of Sample Means (n=40)",
     xlab = "Sample Mean")
abline(v = sample_mean, col = "blue", lwd = 2)
abline(v = theoretical_mean, col = "red", lwd = 2, lty = 2)
legend("topright", legend=c("Sample Mean", "Theoretical Mean"),
       col=c("blue", "red"), lty=c(1,2), lwd=2)

Explanation

The histogram shows the distribution of the sample means. The blue line marks the observed mean of the simulated means, and the red dashed line marks the theoretical mean. The two nearly coincide (about 5.012 vs. 5), which is consistent with the sampling distribution of the mean being centred at the theoretical mean.

Number 2 - Sample Variance vs. Theoretical Variance

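The theoretical standard deviation of the exponential distribution is also 1/lambda = 5. For the mean of n = 40 independent draws, the standard deviation (the standard error of the mean) is (1/lambda)/sqrt(n) = 5/sqrt(40) ≈ 0.7906, so the theoretical variance of the sample mean is 25/40 = 0.625.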

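# Standard deviation of the simulated means vs. the theoretical standard error
sample_sd <- sd(means)
theoretical_sd <- (1 / lambda) / sqrt(n)

# Variance is the square of the standard deviation
sample_var <- sample_sd^2
theoretical_var <- theoretical_sd^2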

sample_sd
## [1] 0.7749147
theoretical_sd
## [1] 0.7905694
sample_var
## [1] 0.6004928
theoretical_var
## [1] 0.625

Explanation

The sample standard deviation and variance of the means are close to the theoretical values (about 0.775 vs. 0.791 and 0.600 vs. 0.625). This agrees with the result that the standard deviation of the sample mean is sigma/sqrt(n): the spread of the sample means shrinks in proportion to the square root of the sample size.
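The square-root scaling can be checked directly by repeating the simulation for several sample sizes. The sketch below is illustrative only: the seed and the sample sizes 10 and 160 are assumptions chosen for the comparison, since the report itself only uses n = 40.

```r
# Sketch: SD of simulated sample means for several sample sizes.
set.seed(1)               # hypothetical seed, not from the report
lambda <- 0.2
sims   <- 1000
sizes  <- c(10, 40, 160)  # illustrative sizes; only n = 40 appears in the report

# For each sample size, simulate sims sample means and take their SD
observed_sd <- sapply(sizes, function(n) {
  sd(replicate(sims, mean(rexp(n, lambda))))
})

# Theoretical standard error (1/lambda)/sqrt(n) for each size
expected_sd <- (1 / lambda) / sqrt(sizes)

data.frame(n = sizes, observed_sd, expected_sd)
```

Each fourfold increase in n should roughly halve the standard deviation of the means.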

Number 3 - Distribution Shape - Approximate Normality

To show that the distribution of the sample means is approximately normal, we overlay a normal curve on the histogram and draw a Q-Q plot.

# Histogram with normal curve - Figure 2 (Appendix)
hist(means, breaks = 50, probability = TRUE,
     main = "Sample Means vs. Normal Distribution",
     xlab = "Sample Mean")
curve(dnorm(x, mean=theoretical_mean, sd=theoretical_sd),
      col="darkgreen", lwd=2, add=TRUE)
legend("topright", legend="Normal Curve", col="darkgreen", lwd=2)

# Q-Q plot - Figure 3 (Appendix)
qqnorm(means, main = "Q-Q Plot of Sample Means")
qqline(means, col = "red")

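Explanation

The histogram follows the overlaid normal curve closely: the distribution of the means is bell-shaped and roughly symmetric. In the Q-Q plot the points mostly follow the reference line, suggesting that the sample means are approximately normally distributed. Some departure in the tails is to be expected, because the mean of 40 exponential draws still carries a small right skew (theoretical skewness 2/sqrt(40) ≈ 0.32).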
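A simple numerical complement to the plots is to standardise the simulated means and count how many fall inside the central 95% interval of a standard normal. This is a sketch under the same assumed setup (the seed is hypothetical and the means are regenerated here so the snippet runs on its own); if the normal approximation is good, the share should be close to 0.95.

```r
# Sketch: empirical coverage of the normal 95% interval.
set.seed(1)      # hypothetical seed, not from the report
lambda <- 0.2
n      <- 40
sims   <- 1000
means  <- replicate(sims, mean(rexp(n, lambda)))

# Standardise with the theoretical mean and standard error
z <- (means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Share of standardised means within +/- qnorm(0.975), about 1.96
coverage <- mean(abs(z) <= qnorm(0.975))
coverage
```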

Summary

Summary Metric              Sample Value   Theoretical Value
Mean of Sample Means        ~5.012         5
SD of Sample Means          ~0.775         0.7906
Variance of Sample Means    ~0.600         0.625

The sample mean and variance closely align with the theoretical expectations, and the distribution of the sample means is approximately normal. The simulation illustrates the Central Limit Theorem in action, even though the underlying exponential distribution is strongly skewed.