Part 1: Simulation Exercise

In this project, the exponential distribution was simulated in R and the distribution of its sample averages was compared with the prediction of the Central Limit Theorem. Exponential values were generated with rexp(n, lambda), where lambda is the rate parameter. In theory, both the mean and the standard deviation of the exponential distribution are 1/lambda. lambda = 0.2 was used for all of the simulations. The distribution of averages of 40 exponentials was investigated with a thousand simulations.
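For reference, the prediction that the simulation is checked against can be written out explicitly. The symbols \bar{X}_n, \mu and \sigma are introduced here for illustration only; X_1, ..., X_n are assumed to be independent exponentials with rate \lambda = 0.2 and n = 40:

```latex
\mu = \frac{1}{\lambda} = 5, \qquad
\sigma = \frac{1}{\lambda} = 5, \qquad
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i
\;\overset{\text{approx.}}{\sim}\;
\mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{n}\right)

\operatorname{Var}(\bar{X}_{40}) = \frac{25}{40} = 0.625, \qquad
\operatorname{SE}(\bar{X}_{40}) = \frac{5}{\sqrt{40}} \approx 0.7906
```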

Parameter and simulations

In the following R code, the parameters were set according to the given information. The simulation generated 40 * 1000 random exponential values, and the average of each set of 40 values was stored, giving a thousand averages. The mean, standard deviation, standard error and variance were then calculated.

## parameters
lambda <- 0.2
n <- 40 # no of exponentials
theo_mean <- 1 / lambda # theoretical mean
theo_sd <- 1 / lambda # theoretical sd of the population
theo_se <- theo_sd / sqrt(n) # theoretical standard error of the averages
theo_var <- theo_sd^2 # theoretical var of the population
theo_var_averages <- theo_se^2 # theoretical var of the averages

## simulations
set.seed(2026) # for reproducibility
nsim <- 1000 # no of simulations
simdata <- matrix(rexp(nsim * n, rate = lambda), nrow = nsim) # one simulation per row
data <- rowMeans(simdata) # 1000 averages of 40 exponentials
## alternative way to generate the data
# data <- sapply(seq_len(nsim), function(i) mean(rexp(n, lambda)))
sim_mean <- mean(data)
sim_se <- sd(data) # standard error of the averages
sim_sd <- sim_se * sqrt(n) # working out the sd of the simulated populations
sim_var <- sim_sd^2 # var of the simulated populations
sim_var_averages <- sim_se^2 # var of the simulated averages

Comparison

In this part, the theoretical values were compared to the estimated values based on the averages of the simulations.

## ** COMPARISONS **
## theoretical mean: 5
## simulations' mean: 4.994865
## theoretical SD (population): 5
## simulations' SD (population) 4.976796 (estimated from the averages)
## theoretical Var (population): 25
## simulations' Var (population): 24.7685 (estimated from the averages)
## theoretical SD (averages): 0.7905694
## simulations' SD (averages) 0.7869005
## theoretical Var (averages): 0.625
## simulations' Var (averages): 0.6192124
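The code that printed the comparison above is not shown in the report. A minimal sketch of how such a table could be built is given here, assuming the same parameters and seed as the simulation chunk. The names `averages` and `comparison` are hypothetical and introduced only for this sketch, and no particular output values are claimed for it.

```r
# Minimal sketch: rebuild the simulation and tabulate theory vs. simulation.
lambda <- 0.2   # rate parameter, as in the report
n <- 40         # exponentials per average
nsim <- 1000    # number of simulated averages

set.seed(2026)  # same seed as the report
averages <- rowMeans(matrix(rexp(nsim * n, rate = lambda), nrow = nsim))

# 'comparison' is a hypothetical name used only in this sketch
comparison <- data.frame(
  quantity = c("mean", "SD (population)", "Var (population)",
               "SD (averages)", "Var (averages)"),
  theoretical = c(1 / lambda,
                  1 / lambda,
                  (1 / lambda)^2,
                  (1 / lambda) / sqrt(n),
                  (1 / lambda)^2 / n),
  simulated = c(mean(averages),
                sd(averages) * sqrt(n),  # population SD estimated from the averages
                var(averages) * n,       # population variance estimated from the averages
                sd(averages),
                var(averages))
)
print(comparison, row.names = FALSE)
```

Keeping both columns in one data frame makes it straightforward to add a difference or ratio column if a more compact summary is wanted.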

Histogram

The simulated data were plotted as histograms:

par(mfrow = c(1, 2))
hist(simdata,
     breaks = 40,
     prob = TRUE,
     main = "Distribution of 40000 \nRandom Exponentials",
     xlab = "Random Exponentials", ylab="Density")
hist(data, 
     breaks = 40,
     prob = TRUE,
     main = "Averages of Simulated Exponentials",
     xlab = "Sample Average (n=40)", ylab = "Density")
abline(v = theo_mean, col = "green", lwd = 3) # a vertical line for the theoretical mean
abline(v = sim_mean, col = "red", lty = 3, lwd = 3) # a vertical line for the simulations' mean
## Overlay the theoretical normal curve
x <- seq(min(data), max(data), length.out = 100)
y <- dnorm(x, mean = theo_mean, sd = theo_sd / sqrt(n)) # sd is equivalent to the standard error
lines(x, y, col = "purple", lwd = 3)

Result summary

The averages of 40 exponentials were generated with a thousand simulations. The mean, standard deviation and variance of the simulated data were calculated and compared to the theoretical values:

Mean

  • The mean of the simulated averages (4.99) and the theoretical mean (5) are very close. In the histogram of averages shown above, the two lines nearly coincide (green line: theoretical mean, dotted red line: simulations’ mean).

Variance

  • The variance of the simulated population (24.77) is similar to that of the theoretical population (25). Similarly, the variance of the simulated averages (0.619) is very close to the theoretical variance of the averages (0.625).
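The population values marked "estimated from the averages" follow from rescaling by n, since the variance of an average of n independent values is the population variance divided by n. Using the reported figures (the symbols \hat{\sigma} and s_{\bar{X}} are notation introduced here for illustration):

```latex
\hat{\sigma}^2 = n \, s_{\bar{X}}^2 = 40 \times 0.6192124 \approx 24.7685,
\qquad
\hat{\sigma} = \sqrt{n}\, s_{\bar{X}} = \sqrt{40} \times 0.7869005 \approx 4.9768
```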

Distribution

  • The distribution of the simulated averages is bell-shaped, i.e. approximately normal, as shown in the histogram above, and it fits the theoretical normal distribution (the purple line) well.
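The visual fit can also be checked numerically. The following is a sketch of one possible check, not part of the original analysis: it compares a few quantiles of the simulated averages with those of the theoretical normal distribution and computes the share of averages that fall inside the central 95% interval of that normal. The names `quantile_check` and `coverage`, and the chosen probabilities, are assumptions made for illustration.

```r
# Sketch: compare simulated averages with the theoretical normal numerically.
lambda <- 0.2
n <- 40
nsim <- 1000

set.seed(2026)
averages <- rowMeans(matrix(rexp(nsim * n, rate = lambda), nrow = nsim))

theo_mean <- 1 / lambda
theo_se <- (1 / lambda) / sqrt(n)

# Quantiles of the simulated averages next to the matching normal quantiles
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)  # illustrative choice
quantile_check <- data.frame(
  prob = probs,
  simulated = unname(quantile(averages, probs)),
  normal = qnorm(probs, mean = theo_mean, sd = theo_se)
)
print(quantile_check, row.names = FALSE)

# Share of averages inside the central 95% interval of the normal curve;
# a value near 0.95 is what a good normal approximation would give
coverage <- mean(abs(averages - theo_mean) <= qnorm(0.975) * theo_se)
print(coverage)
```

Small differences in the tails are expected, because averages of 40 exponentials are still slightly right-skewed.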

Conclusion

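Thus, the results support the Central Limit Theorem for exponential averages: with n = 40, the averages are approximately normally distributed around 1/lambda with standard deviation (1/lambda)/sqrt(n), and the approximation is expected to improve as n increases.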
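The role of n can be illustrated with a short sketch that repeats the simulation for several sample sizes and reports the sample skewness of the averages. For averages of n exponentials the theoretical skewness is 2/sqrt(n), which shrinks toward 0, the value for a normal distribution. The sample sizes 5, 40 and 200 and the helper `sample_skewness` are assumptions made for this illustration and do not come from the original analysis.

```r
# Sketch: skewness of exponential averages for several sample sizes.
lambda <- 0.2
nsim <- 1000
sample_sizes <- c(5, 40, 200)  # illustrative choices, not from the report

# Hypothetical helper: third central moment divided by the
# 1.5 power of the second central moment
sample_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

set.seed(2026)
skew_by_n <- sapply(sample_sizes, function(n) {
  averages <- rowMeans(matrix(rexp(nsim * n, rate = lambda), nrow = nsim))
  sample_skewness(averages)
})

print(data.frame(n = sample_sizes,
                 simulated_skewness = skew_by_n,
                 theoretical_skewness = 2 / sqrt(sample_sizes)))
```

Skewness is only one summary of departure from normality; it is used here because it has a simple closed form for exponential averages.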