The exponential distribution models the waiting time between events in a process where events occur continuously and independently at a constant average rate λ. Its theoretical mean is 1/λ and its variance is 1/λ², so its standard deviation is also 1/λ.
This is an inferential report on a simulation analysis of exponentially distributed data using the central limit theorem. It compares the theoretical mean and variance quoted above with simulated data.
Exponentially distributed data is generated using R's rexp function. Three kinds of simulation are run to show that the simulated mean converges to the theoretical mean 1/λ.
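For reference, the quantities being compared can be written out explicitly. This is the standard textbook statement rather than anything specific to this report; the symbols X_i, \bar{X}_n and n (the number of exponentials per sample) are notation introduced here for illustration.

```latex
\begin{aligned}
f(x;\lambda) &= \lambda e^{-\lambda x}, \quad x \ge 0, \\
\mathbb{E}[X] &= \frac{1}{\lambda}, \qquad \operatorname{Var}(X) = \frac{1}{\lambda^{2}}, \\
\bar{X}_n &= \frac{1}{n}\sum_{i=1}^{n} X_i
  \;\overset{\text{approx.}}{\sim}\;
  \mathcal{N}\!\left(\frac{1}{\lambda},\ \frac{1}{n\lambda^{2}}\right)
  \quad \text{for large } n .
\end{aligned}
```

With λ = 0.2 this gives a mean of 5 and a standard error of 5/√n for the sample mean.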
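library(ggplot2)
nosims <- 1000   # number of simulations
lambda <- 0.2    # rate parameter
n <- 40          # exponentials per sample
set.seed(123)
# running mean of the first k draws, for k = 1..nosims
means <- cumsum(rexp(nosims, lambda)) / (1:nosims)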
g <- ggplot(data.frame(x = 1 : nosims, y = means), aes(x = x, y = y))
g <- g + geom_hline(yintercept = 1/lambda, color = I('blue'), size = 1.5) + geom_line(size = 2)
g <- g + labs(x = "Number of iterations", y = "Cumulative mean")
g
The plot above shows the cumulative mean of the draws (black line) converging to the theoretical mean 1/λ = 5 (blue line).
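The same convergence can be checked numerically without a plot. This is a minimal sketch that reuses the nosims, lambda and seed values from above; the names draws, cum_means and checkpoints are illustrative and not part of the original analysis.

```r
nosims <- 1000
lambda <- 0.2
set.seed(123)

draws <- rexp(nosims, lambda)
# running mean after 1, 2, ..., nosims draws
cum_means <- cumsum(draws) / seq_len(nosims)

# running mean and its distance from 1/lambda at a few checkpoints
checkpoints <- c(10, 100, 1000)
data.frame(
  draws           = checkpoints,
  cumulative_mean = cum_means[checkpoints],
  abs_error       = abs(cum_means[checkpoints] - 1 / lambda)
)
```

The error generally shrinks as the number of draws grows, though not necessarily monotonically.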
# 1000 samples of 40 exponentials each, one sample per column
sim_data <- data.frame(x = sapply(1:nosims, function(i) rexp(n, lambda)))
# mean of every sample
means <- data.frame(x = sapply(sim_data[, 1:nosims], mean))
qplot(x, y, data = data.frame(x = 1:nosims, y = means$x)) +
  geom_smooth() +
  geom_hline(yintercept = 1/lambda) +
  labs(x = "Number of iterations", y = "Sample mean of 40 exponentials")
The black horizontal line is the theoretical mean 1/λ = 5; the blue curve is the smoothed trend of the sample means drawn by geom_smooth.
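The same simulation can be written more compactly with a matrix and colMeans. This is an alternative sketch, not the code used to produce the plot above; draws_mat and sample_means are illustrative names.

```r
nosims <- 1000
lambda <- 0.2
n <- 40
set.seed(123)

# one sample of n exponentials per column: n rows, nosims columns
draws_mat <- matrix(rexp(n * nosims, lambda), nrow = n, ncol = nosims)
# mean of each column, i.e. of each sample of n exponentials
sample_means <- colMeans(draws_mat)

mean(sample_means)   # should land near 1 / lambda = 5
```

Keeping the draws in a matrix avoids building a 1000-column data frame, which is convenient when only column summaries are needed.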
Now let us look at what happens when each sample contains 400 exponentials, again repeated 1000 times.
# note: this overwrites `means` from the previous simulation
means <- data.frame(x = sapply(1:nosims, function(i) {
  mean(rexp(400, lambda))
}))
qplot(x, y, data = data.frame(x = 1:nosims, y = means$x)) +
  geom_smooth() +
  geom_hline(yintercept = 1/lambda) +
  labs(x = "Number of iterations", y = "Sample mean of 400 exponentials")
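The plot above again shows the sample means centering on the theoretical mean. The spread is now much smaller and the sample means lie much closer to 1/λ = 5. This is expected, since the standard error of the mean, (1/λ)/√n, falls from about 0.79 for n = 40 to 0.25 for n = 400.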
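The drop in spread can be compared against the theoretical standard error (1/λ)/√n. The following is a rough sketch in base R; the helper sd_of_means and the comparison data frame are hypothetical names introduced for illustration.

```r
lambda <- 0.2
nosims <- 1000
set.seed(123)

# standard deviation of nosims simulated sample means, each from n draws
sd_of_means <- function(n) {
  sd(replicate(nosims, mean(rexp(n, lambda))))
}

sizes <- c(40, 400)
comparison <- data.frame(
  n           = sizes,
  simulated   = sapply(sizes, sd_of_means),
  theoretical = (1 / lambda) / sqrt(sizes)
)
comparison
```

A tenfold increase in sample size should shrink the spread by a factor of roughly √10 ≈ 3.16.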
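Let us use the standard deviation instead of the variance so that the plots share the same scale: for the exponential distribution the standard deviation is 1/λ, the same as the mean.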
sds <- sapply(sim_data[, 1:nosims], sd)
qplot(x, y, data = data.frame(x = 1:nosims, y = sds)) +
  geom_smooth() + geom_hline(yintercept = 1/lambda) +
  labs(x = "Number of iterations", y = "Standard Deviation")
This plot shows that the sample standard deviations are also concentrated around the theoretical value 1/λ = 5.
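Since the report set out to check the variance as well, a short sketch comparing the sample variances with 1/λ² = 25 may help. It uses base R only, and draws_mat, vars and sds are illustrative names. Note that the sample variance is unbiased, while the sample standard deviation is expected to come out slightly below 1/λ on average for finite n.

```r
lambda <- 0.2
nosims <- 1000
n <- 40
set.seed(123)

draws_mat <- matrix(rexp(n * nosims, lambda), nrow = n)
vars <- apply(draws_mat, 2, var)  # sample variance of each column
sds  <- sqrt(vars)                # sample standard deviation of each column

c(mean_variance = mean(vars), theoretical_variance = 1 / lambda^2)
c(mean_sd = mean(sds), theoretical_sd = 1 / lambda)
```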
# histogram of the sample means with a normal density overlaid
ggplot(data = means, aes(x = x)) +
  geom_histogram(aes(y = ..density..), fill = I('#aaccff'), color = I('blue')) +
  stat_function(fun = dnorm, args = list(mean = 1/lambda, sd = sd(means$x)), color = I('green'), size = 1.5) +
  geom_vline(xintercept = 1/lambda, size = 1, color = I('red'))
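The histogram above is approximately Gaussian and concentrated around the theoretical mean 1/λ = 5 (red line). The green curve is a normal density with mean 1/λ and the standard deviation of the simulated means. The simulated estimate of the standard deviation 1/λ was 4.8736512, and the sample mean turned out to be 4.9939304.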
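One rough way to quantify "approximately Gaussian" is to count how many sample means fall within 1.96 standard errors of 1/λ; under a normal approximation this should be close to 95%. The following is a minimal sketch in base R, assuming n = 40 exponentials per sample; sample_means, se, z and coverage are illustrative names.

```r
lambda <- 0.2
nosims <- 1000
n <- 40
set.seed(123)

sample_means <- replicate(nosims, mean(rexp(n, lambda)))
se <- (1 / lambda) / sqrt(n)   # theoretical standard error of the mean

# standardise, then count how many fall inside the central 95% normal band
z <- (sample_means - 1 / lambda) / se
coverage <- mean(abs(z) <= qnorm(0.975))
coverage
```

Calling qqnorm(z) followed by qqline(z) would give a visual version of the same check.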