In this report we will investigate the exponential distribution in R and compare it with the Central Limit Theorem. We will investigate with averages of 40 exponentials using 1000 simulations. We will compare sample mean and variance to the theoretical values.
In the following code, we will find averages of 40 exponetial distributions by running 1000 simulations. Lambda value is 0.2 for the distribution. The histogram shows the distribution of means from these 1000 simulations.
## set the initial seed (for reproducibility)
set.seed(1)
## set other values as per instructions
numSimulations <- 1000
numExponentials <- 40
lambda <- 0.2
## simulate exponential distribution, calculate means and draw a histogram
means <- NULL
for (idx in 1:numSimulations) means <- c(means, mean(rexp(numExponentials, lambda)))
hist(means, col="salmon", main="Distribution of means for 40 Exponentials in 1000 simulations")
The following code finds the mean for simulation as well as the theoretical mean by applying 1/lambda formula.
## compare simulation and theoretical values for mean
simulationMean <- mean(means)
theoreticalMean <- 1/lambda
paste0("Simulation Mean = ", round(simulationMean, 3),
",Theoretical Mean = ", round(theoreticalMean, 3))
## [1] "Simulation Mean = 4.99,Theoretical Mean = 5"
You can see the simulation mean of 4.99 is very close the theoretical mean of 5.00!
The following code finds the mean for simulation as well as the theoretical mean by applying 1/lambda formula
## compare simulation and theoretical values for variance
simulationVariance <- var(means)
theoreticalVariance <- ((1/lambda)^2)/numExponentials
paste0("Simulation Variance = ", round(simulationVariance, 3),
", Theoretical Variance = ", round(theoreticalVariance, 3))
## [1] "Simulation Variance = 0.611, Theoretical Variance = 0.625"
You can see the simulation variance of 0.611 is reasonably close the theoretical variance of 0.625!
The following code shows the distribution from the above simulation exercise.
## plotting distribution
library(ggplot2)
meansDF <- data.frame(means)
g1 <- ggplot(meansDF, aes(means))
g1 <- g1 + geom_histogram(fill = "salmon", binwidth = 0.1, aes(y = ..density..), colour = "black") +
labs(title =
"Distribution of means for 40 Exponentials with Lambda = 0.2, simulations = 1000", y = "Density") +
stat_function(fun = dnorm, args = list(mean = theoreticalMean, sd = sqrt(theoreticalVariance)),
color = "red", size = 1.5) +
stat_function(fun = dnorm, args = list(mean = simulationMean, sd = sqrt(simulationVariance)),
color = "blue", size = 1.5)
print(g1)
In the above plot, red line is the theoretical normal distribution whereas the blue line is the sample distribution. See how close they are!!!
In the following code, we will run another simulation with the same lambda value but with the number of simulations reduced from 1000 to 10 and the number of samples in each simulation increased from 40 to 4000.
## plotting distribution
## redoing simulation with a large collection of random exponentials and less simulations
numSimulations2 <- 10
numExponentials2 <- 4000
means2 <- NULL
for (idx in 1:numSimulations2) means2 <- c(means2, mean(rexp(numExponentials2, lambda)))
simulationMean2 <- mean(means2)
theoreticalMean2 <- 1/lambda
paste0("Simulation Mean2 = ", round(simulationMean2, 3),
", Theoretical Mean2 =", round(theoreticalMean2, 3))
## [1] "Simulation Mean2 = 5.048, Theoretical Mean2 =5"
simulationVariance2 <- var(means2)
theoreticalVariance2 <- ((1/lambda)^2)/numExponentials2
paste0("Simulation Variance2 = ", round(simulationVariance2, 3),
", Theoretical Variance2 =", round(theoreticalVariance2, 3))
## [1] "Simulation Variance2 = 0.003, Theoretical Variance2 =0.006"
means2DF <- data.frame(means2)
g2 <- ggplot(means2DF, aes(means2))
g2 <- g2 + geom_histogram(fill = "salmon", binwidth = 0.02, aes(y = ..density..), colour = "black") +
labs(title =
"Distribution of means for 4000 Exponentials with Lambda = 0.2, simulations = 10", y = "Density") +
stat_function(fun = dnorm, args = list(mean = theoreticalMean2, sd = sqrt(theoreticalVariance2)),
color = "red", size = 1.5) +
stat_function(fun = dnorm, args = list(mean = simulationMean2, sd = sqrt(simulationVariance2)),
color = "blue", size = 1.5)
print(g2)
In the above plot, red line is the theoretical normal distribution whereas the blue line is the sample distribution. See how different they are!!!
We can see how having more simulations is very helpful!!!