In this project we will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution is simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. The lambda is set to 0.2 for all of the simulations.
We look into the distribution of averages of 40 exponentials with one thousand simulations. The properties of the distribution are illustrated and explained by:
lambda <- 0.2 # Rate for all simulations
n <- 40 # No. of samples
nosim <- 1000 # No. of simulations
set.seed(seed = 100) # For reproducible results
# Get means of the 1000 simulations of 40 exponentials
means <- data.frame(mn40 = apply(matrix(rexp(nosim * n, lambda), nosim), 1, mean))
# Sample mean of the means of the simulations of 40 exponentials
mean(means$mn40)
## [1] 4.999702
# Theoretical mean
1/lambda
## [1] 5
The theoretical mean is 5. The sample mean of the means of the simulations of 40 exponentials is 4.9997, which is very close to the theoretical mean. The sample mean is the center of the distribution of the simulations.
# Sample standard deviation and variance
sd(means$mn40)
## [1] 0.7959461
sd(means$mn40)^2
## [1] 0.6335302
# Theoretical standard deviation and variance
(1/lambda)/sqrt(n)
## [1] 0.7905694
((1/lambda)/sqrt(n))^2
## [1] 0.625
The theoretical standard deviation is 0.7906. The sample standard deviation is 0.7959, which is very close to the theoretical standard deviation.
The theoretical variance is 0.625. The sample variance is 0.6335, which is very close to the theoretical variance.
Below is the histogram plot of the above simulations with the normal distribution curve centered at 4.9997 and a standard deviation of 0.7959:
ggplot(data = means, aes(x = mn40)) +
geom_histogram(aes(y=..density..), fill = "salmon",
binwidth = 0.20, color = "black") +
labs(title = "Exponentials Distribution of 1000 Simulations",
x = "Mean of 40 Exponentials", y = "Density") +
stat_function(fun = dnorm, geom = "line",
arg = list(mean = mean(means$mn40),
sd = sd(means$mn40)),
size = 1, col="yellow") +
geom_vline(aes(xintercept = mean(means$mn40)), color = "yellow",
size = 1, show.legend = TRUE)
We further compare the normal distribution curve of the simulations with the theoretical normal distribution curve:
ggplot(data = means, aes(x = mn40)) +
geom_histogram(aes(y=..density..), fill = "salmon",
binwidth = 0.20, color = "black") +
labs(title = "Exponentials Distribution of 1000 Simulations\nvs.\nNormal Distribution",
x = "Mean of 40 Exponentials", y = "Density") +
stat_function(fun = dnorm, geom = "line",
arg = list(mean = 1/lambda, sd = (1/lambda)/sqrt(n)),
size = 3, col="black") +
stat_function(fun = dnorm, geom = "line",
arg = list(mean = mean(means$mn40),
sd = sd(means$mn40)),
size = 1, col="yellow") +
geom_vline(aes(xintercept = 1/lambda, colour = "Theoretical"),
size = 3, show.legend = TRUE) +
geom_vline(aes(xintercept = mean(means$mn40), colour = "Simulation"),
size = 1, show.legend = TRUE) +
scale_colour_manual("", breaks = c("Theoretical", "Simulation"),
values = c("Theoretical" = "black",
"Simulation" = "yellow"))
Clearly, the distribution of our simulations appears to be approximately normal.