This project investigated the exponential distribution in R and compared the behaviour of its sample averages with the predictions of the Central Limit Theorem. Exponential values were simulated in R with rexp(n, lambda), where lambda is the rate parameter. Theoretically, the mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda. lambda = 0.2 was used for all of the simulations. The distribution of averages of 40 exponentials was investigated using a thousand simulations.
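The theoretical values used in this report follow from standard properties of the mean of independent, identically distributed variables. As a brief worked derivation (the symbols $X_i$ and $\bar{X}_n$ are notation introduced here for illustration and do not appear in the original analysis):

```latex
\begin{align*}
X_1,\dots,X_n &\overset{\text{iid}}{\sim} \operatorname{Exp}(\lambda), \qquad
E[X_i] = \frac{1}{\lambda}, \qquad
\operatorname{Var}(X_i) = \frac{1}{\lambda^2} \\
\bar{X}_n &= \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
E[\bar{X}_n] = \frac{1}{\lambda} = 5, \qquad
\operatorname{Var}(\bar{X}_n) = \frac{1}{n\lambda^2} = \frac{25}{40} = 0.625 \\
\operatorname{SD}(\bar{X}_n) &= \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906
\end{align*}
```

The Central Limit Theorem adds that, for large n, the distribution of $\bar{X}_n$ is approximately normal with this mean and variance.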
In the following R code, the parameters were set as described above. The simulation generated 40 * 1000 random exponential values, and the 1000 averages of 40 values each were stored. The mean, standard deviation, standard error and variance were then calculated.
## parameters
lambda <- 0.2
n <- 40 # number of exponentials per sample
theo_mean <- 1 / lambda # theoretical mean
theo_sd <- 1 / lambda # theoretical sd of the population
theo_se <- theo_sd / sqrt(n) # theoretical standard error of the averages
theo_var <- theo_sd^2 # theoretical var of the population
theo_var_averages <- theo_se^2 # theoretical var of the averages
## simulations
set.seed(2026) # for reproducibility
nsim <- 1000 # number of simulations
simdata <- matrix(rexp(nsim * n, rate = lambda), nrow = nsim) # one row per simulation
data <- rowMeans(simdata) # average of the 40 values in each row
## alternative way to generate the data
# data <- sapply(seq_len(nsim), function(i) mean(rexp(n, lambda)))
sim_mean <- mean(data)
sim_se <- sd(data) # standard error of the averages
sim_sd <- sim_se*sqrt(n) # working out the sd of the simulated populations
sim_var <- sim_sd^2 # var of the simulated populations
sim_var_averages <- sim_se^2 # var of the simulated averages
In this part, the theoretical values were compared to the estimated values based on the averages of the simulations.
## ** COMPARISONS **
## theoretical mean: 5
## simulations' mean: 4.994865
## theoretical SD (population): 5
## simulations' SD (population) 4.976796 (estimated from the averages)
## theoretical Var (population): 25
## simulations' Var (population): 24.7685 (estimated from the averages)
## theoretical SD (averages): 0.7905694
## simulations' SD (averages) 0.7869005
## theoretical Var (averages): 0.625
## simulations' Var (averages): 0.6192124
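The code that printed this comparison is not shown in the report. A minimal sketch of one way to tabulate it, assuming the same parameters and seed as the simulation code; the names `sample_means`, `comparison` and `rel_diff_pct` are hypothetical and chosen here only for illustration:

```r
# Recreate the simulation so that this block runs on its own
lambda <- 0.2
n <- 40
nsim <- 1000
set.seed(2026)
sample_means <- rowMeans(matrix(rexp(nsim * n, rate = lambda), nrow = nsim))

theo_sd <- 1 / lambda          # theoretical sd of the population
theo_se <- theo_sd / sqrt(n)   # theoretical standard error of the averages
sim_se <- sd(sample_means)     # observed sd of the averages

# One row per quantity; the population values are back-calculated from the averages
comparison <- data.frame(
  quantity = c("mean", "SD (population)", "Var (population)",
               "SD (averages)", "Var (averages)"),
  theoretical = c(1 / lambda, theo_sd, theo_sd^2, theo_se, theo_se^2),
  simulated = c(mean(sample_means), sim_se * sqrt(n), (sim_se * sqrt(n))^2,
                sim_se, sim_se^2)
)

# Relative difference between simulated and theoretical values, in percent
comparison$rel_diff_pct <- 100 *
  (comparison$simulated - comparison$theoretical) / comparison$theoretical

print(comparison, digits = 4)
```

Adding the relative difference makes it easier to judge how close each simulated value is to its theoretical counterpart, independent of scale.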
The simulated data were plotted as histograms:
par(mfrow = c(1, 2))
hist(simdata,
     breaks = 40,
     prob = TRUE,
     main = "Distribution of 40000 \nRandom Exponentials",
     xlab = "Random Exponentials", ylab = "Density")
hist(data,
     breaks = 40,
     prob = TRUE,
     main = "Averages of Simulated Exponentials",
     xlab = "Sample Average (n=40)", ylab = "Density")
abline(v = theo_mean, col = "green", lwd = 3) # a vertical line for the theoretical mean
abline(v = sim_mean, col = "red", lty = 3, lwd = 3) # a vertical line for the simulations' mean
## Overlay the theoretical normal curve
x <- seq(min(data), max(data), length = 100)
y <- dnorm(x, mean = theo_mean, sd = theo_sd / sqrt(n)) # sd of the averages, i.e. the standard error
lines(x, y, col = "purple", lwd = 3)
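The averages of 40 exponentials were generated with a thousand simulations. The mean, standard deviation and variance of the simulated data were calculated and compared with the theoretical values. The simulated mean (4.995) was close to the theoretical mean (5), and the standard deviation and variance of the averages (0.787 and 0.619) were close to their theoretical values (0.791 and 0.625).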
Thus, the results support the notion that the Central Limit Theorem applies to the distribution of averages of exponentials, provided that n is large enough.
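How large is "large enough" can be probed by repeating the simulation for several sample sizes and tracking the skewness of the averages, which is 0 for a normal distribution. A rough sketch under stated assumptions: the helper `skewness()` and the chosen sample sizes are hypothetical additions, not part of the original analysis.

```r
lambda <- 0.2
nsim <- 1000
set.seed(2026)

# Sample skewness (third standardized moment); defined here to avoid extra packages
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Hypothetical sample sizes, from single values up to larger samples
sample_sizes <- c(1, 5, 40, 200)

# For each sample size, simulate nsim averages and measure their skewness
skew_by_n <- sapply(sample_sizes, function(n) {
  means <- rowMeans(matrix(rexp(nsim * n, rate = lambda), nrow = nsim))
  skewness(means)
})

# For exponential data the skewness of the average is 2 / sqrt(n) in theory
skew_table <- data.frame(n = sample_sizes,
                         simulated_skew = round(skew_by_n, 3),
                         theoretical_skew = round(2 / sqrt(sample_sizes), 3))
print(skew_table)
```

The skewness should shrink towards 0 as n grows, which is one simple way to see the averages approaching a normal shape; a Q-Q plot with qqnorm() would be a graphical alternative.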