Synopsis

This project investigates the exponential distribution and uses it to illustrate the Central Limit Theorem. To do so, we simulate 1,000 samples of exponentials, with 40 values in each sample. The mean of each of the 1,000 samples is calculated to form a distribution of sample means. We then explore:

  1. How the mean and variance of the distribution of sample means compare with their theoretical values, 1/lambda and (1/lambda)^2 / n

  2. How the distribution of the sample means is approximately normal

Simulating the data

We will simulate 1,000 samples of 40 exponentials each and store them in a matrix, “exp_data,” with one sample per row.

NOTE that for the purpose of this project, we set lambda, the rate parameter, to 0.2. Theoretically, the mean and standard deviation of the exponential distribution are both equal to 1/lambda.
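As a worked reference, the identities in the note imply the following values for the mean of n = 40 draws. This assumes the draws are independent; the symbol \bar{X} for the sample mean is notation introduced here, not taken from the report.

```latex
\begin{aligned}
E[X] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, &
\operatorname{Var}(X) &= \frac{1}{\lambda^{2}} = 25, \\
E[\bar{X}] &= \frac{1}{\lambda} = 5, &
\operatorname{Var}(\bar{X}) &= \frac{1}{\lambda^{2} n} = \frac{25}{40} = 0.625, \\
& &
\operatorname{sd}(\bar{X}) &= \frac{1}{\lambda \sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906.
\end{aligned}
```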

# Setting seed for reproducibility
set.seed(123)

# Setting sample size at 40
n = 40

# Setting lambda at 0.2
lambda = 0.2

# Initializing an empty 1000 x n matrix (one row per sample)
exp_data = matrix(nrow = 1000, ncol = n)

# Running 1000 simulations 
for(i in 1:1000) exp_data[i,] = rexp(n, lambda)

# Creating distribution of sample means
dist.sample.mu = apply(exp_data, 1, mean)
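The same simulation can probably be written without an explicit loop. The sketch below is one possible alternative, not the report's method; the names n_sims, exp_data_vec, and sample_means_vec are hypothetical and chosen only for illustration.

```r
# Sketch of a loop-free version of the simulation (same parameters as the report)
set.seed(123)
n <- 40          # values per sample
lambda <- 0.2    # rate parameter
n_sims <- 1000   # hypothetical name for the number of samples

# Draw all values in one call and fill the matrix row by row, one sample per row
exp_data_vec <- matrix(rexp(n_sims * n, lambda), nrow = n_sims, byrow = TRUE)

# rowMeans() computes the same quantity as apply(x, 1, mean)
sample_means_vec <- rowMeans(exp_data_vec)
```

For a matrix of this size the difference in speed is negligible, so the choice is mostly a matter of readability.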

Showing and comparing theoretical and sample mean and variance

# Calculating theoretical mean, variance, and standard deviation of the sample mean
theo.mu = 1/lambda
theo.var = (1/lambda)^2 / n
theo.sd = sqrt(theo.var)


# Calculating mean, variance, and standard deviation of distribution of sample means
sample.mu = mean(dist.sample.mu)
sample.var = var(dist.sample.mu)
sample.sd = sd(dist.sample.mu)

# Comparing theoretical and simulated parameters
results = data.frame(Theoretical = c(theo.mu, theo.var, theo.sd), Simulated = c(sample.mu, sample.var, sample.sd))
rownames(results) = c("Mean", "Variance", "Std.Dev.")
print(results)
##          Theoretical Simulated
## Mean       5.0000000 5.0119113
## Variance   0.6250000 0.6004928
## Std.Dev.   0.7905694 0.7749147

The simulated values are close to the theoretical ones: the mean differs by about 0.012 (5.012 vs. 5) and the variance by about 0.025 (0.600 vs. 0.625).
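One way to put a number on this closeness is a relative error. From the table, that works out to roughly 0.24% for the mean and about -3.9% for the variance. The sketch below shows how such a figure could be computed; it re-runs the simulation so that it is self-contained, and the names sample_means, theoretical, simulated, and rel_error_pct are illustrative assumptions.

```r
# Sketch: relative error of the simulated mean and variance, in percent
set.seed(123)
n <- 40
lambda <- 0.2

# One sample mean per simulated sample of n exponentials
sample_means <- replicate(1000, mean(rexp(n, lambda)))

theoretical <- c(Mean = 1 / lambda, Variance = (1 / lambda)^2 / n)
simulated   <- c(Mean = mean(sample_means), Variance = var(sample_means))

# Positive values mean the simulation overshoots the theoretical value
rel_error_pct <- 100 * (simulated - theoretical) / theoretical
print(round(rel_error_pct, 2))
```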

Demonstrating that the distribution of sample means is approximately normal

# Plotting histogram of distribution of means 
hist(dist.sample.mu, freq = FALSE, xlab = "Sample Means", main = "Histogram of Distribution of Sample Means")

# Plotting normal curve with mean = sample mean and sd = sample sd
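# Grid of x values for the curves (the number of grid points is unrelated to the sample size n)
x.range = range(dist.sample.mu)
X = seq(x.range[1], x.range[2], length.out = 200)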
samplenormY = dnorm(X, mean=sample.mu, sd=sample.sd)
points(X, samplenormY, type="l", lty = 2, lwd = 2, col = "red")

# Plotting normal curve with mean = theoretical mean and sd = theoretical sd
theonormY = dnorm(X, mean=theo.mu, sd=theo.sd)
points(X, theonormY, type="l", lty = 4, lwd = 2, col = "blue")

# Plotting theoretical and sample mean
abline(v=c(sample.mu, theo.mu), lty=1, lwd=2, col=c("red", "blue"))

legend("topright", bty="n", lty=c(2,4,1,1), lwd=2, col=c("red", "blue","red", "blue"), legend = c("Sample Normal Curve", "Theoretical Normal Curve", "Sample Mean", "Theoretical Mean"))

The plot above shows that the distribution of sample means (histogram) is approximately normal, and that the sample mean (red) and theoretical mean (blue) nearly coincide.
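A numerical check can complement the visual one. The sketch below, which is an added illustration rather than part of the original analysis, standardizes the sample means using the theoretical mean and standard deviation and reports the share falling within 1 and 1.96 standard deviations. For an exactly normal distribution these shares would be about 0.683 and 0.95. The variable names are assumptions.

```r
# Sketch: compare coverage of standardized sample means with normal benchmarks
set.seed(123)
n <- 40
lambda <- 0.2
sample_means <- replicate(1000, mean(rexp(n, lambda)))

# Standardize with the theoretical mean 1/lambda and sd (1/lambda)/sqrt(n)
z_scores <- (sample_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Share of standardized means within 1 and within 1.96 standard deviations
coverage_1sd   <- mean(abs(z_scores) <= 1)
coverage_196sd <- mean(abs(z_scores) <= 1.96)
print(c(within_1 = coverage_1sd, within_1.96 = coverage_196sd))
```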

Conclusion

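The simulation shows that the distribution of the means of 40 exponentials behaves as the Central Limit Theorem predicts: it is approximately normal, centered near 1/lambda = 5, with variance close to (1/lambda)^2 / n = 0.625.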
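To see that the approximate normality belongs to the sample means rather than to the exponential draws themselves, one could compare their skewness. The sketch below is an added illustration; the helper skewness() is a hypothetical, simple moment-based estimator defined here, not a base R function. In theory the exponential distribution has skewness 2, while the mean of n = 40 independent draws has skewness 2/sqrt(40), about 0.32.

```r
# Sketch: skewness of raw exponential draws vs. skewness of sample means
set.seed(123)
n <- 40
lambda <- 0.2

raw_draws    <- rexp(1000, lambda)                       # individual exponentials
sample_means <- replicate(1000, mean(rexp(n, lambda)))   # means of n exponentials

# Hypothetical helper: moment-based sample skewness
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_raw   <- skewness(raw_draws)
skew_means <- skewness(sample_means)
print(c(raw = skew_raw, means = skew_means))
```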