Overview

This project investigates the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution is simulated with rexp(n, lambda) where lambda is the rate parameter, theoretical mean of exponential distribution is 1/lambda and theoretical standard deviation is also 1/lambda. This project performs a thousand simulations to get the distribution of averages of 40 exponentials, where the lambda is set to 0.2 for all of the simulations. The simulated samples are used to illustrate and explain the properties of the distribution of the mean of 40 exponentials in the following ways:

1.Show the sample mean and compare it to the theoretical mean of the distribution.
2.Show how variable the sample is (via variance) and compare it to the theoretical variance 
of the distribution.
3.Show that the distribution is approximately normal.

In point 3, focus on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.

Simulation

We start by simulating a thousand sets of 40 exponentials using lambda 0.2 and calculate the mean for each set.

echo = TRUE
set.seed(123456) # For reproducible results
lambda <- 0.2    # Rate for all simulations
n <- 40          # No. of samples
s <- 1000        # No. of simulations

sim <- data.frame(mean=numeric(s)) # data frame to store all the sample means

for (i in 1:s) { # simulate s times
  samples <-rexp(n,lambda) # simulate n samples
  sim[i,1]<-mean(samples) # sample mean
}

1.Sample mean versus theoretical mean

With the simulated data, we calculate the theoretical mean and sample mean for the exponential distribution.

tm <- 1/lambda       # calculate theoretical mean
sm <- mean(sim$mean) # calculate avg sample mean

As shown by the histogram in Figure 1, the sample mean is very close to the theoretical mean at 5:

[1] “Theoretical Mean: 5.00”

[1] “Average Sample Mean: 5.02”

2.Sample variance versus Theoretical Mean

We can make the comparison by defining the theoretical variance as 1/lambda squared, divided by the number of exponential observations (i.e. n), as shown below:

tv<-((1/lambda)^2)/n  # Theoretical Variance
tv
## [1] 0.625

Compute the variance for the simulated samples, as shown below:

sv <- var(sim$mean)    # Sample variance
sv
## [1] 0.6570391

The result shows that the theoretical variance is also very close to the sample variance.

3. Show that the distribution is approximately normal.

When we overlay the histogram of the theoretical mean of the exponential distribution with the histogram of simulated sample means (as shown in Figure 2), the two distribution curves are very similar and normally distributed. We observe that the histogram for the mean of 1000 simulated 40 random exponential values is symmetric around the mean with a bell shape.

Next we use a Quantile-Quantile plot to confirm if the distribution of the simulation means is approximatley normal, where any point (X,Y) denotes a data point from our sample distribution plotted against the theoretical distribution. The plot should lie on the 45 degree line of the QQ plot if its normal. As shown in Figure 3, the plot shows a linearity which confirms that normality is a good approximation.

Conclusion

This simulation is performed with a relatively large population of 40,000 samples (1000 simulations x 40 samples). The above observations show that, given this sufficiently large sample size from a population with a finite level of variance, the mean of all samples from the population is approximately equal to the mean of the population. In conclusion, the normal distribution of the mean of 40 random exponentials is consistent with the characteristic of the Central Limit Theorem.

Appendix A - R Source Codes

The R codes for the figures plotted in this project:

Plot Figure 1 - R Source Codes

hist(sim$mean,  freq=TRUE, breaks=50,
     main="Figure 1 
     Distribution of Sample Means  
     of Exponentials (lambda 0.2)",
     xlab="Sample Means from 1000 Simulations")

###Sample Mean - Green Line
abline(v=sm, col='red', lwd=1)

###Theoretical Mean - Red Line
abline(v=tm, col='blue', lwd=1)

legend('topright', c("Theoretical Mean","Avg Sample Mean"), 
       col=c("blue", "red"), lty=c(1,1), bty = "n")

Plot Figure 2 - R Source Codes

hist(sim$mean,  prob=T,
     main="Figure 2 
     Distribution of Sample Means vs 
     Theoretical Density of Exponentials (lambda 0.2)",
     xlab="Sample Means from 1000 Simulations" )

### Density of the Simulated sample means
lines(density(sim$mean))

###Theoretical Mean - Red Line
abline(v=tm, col='red', lwd=1)

### Theoretical density of the exponential distribution
xfit <- seq(min(sim$mean), max(sim$mean), length=100)
yfit <- dnorm(xfit, mean=1/lambda, sd=(1/lambda/sqrt(n)))
lines(xfit, yfit, pch=22, col="red", lty=1)

### Legend
legend('topright', c("Simulation", "Theoretical"), 
       col=c("black", "red"), lty=c(1,1))

Plot Figure 3 - R Source Codes

qqnorm(sim$mean, main="Figure 3 
       Normal Q-Q Plot")
qqline(sim$mean, col="2")