Exponential Distribution - Simulation vs Calculations

Info

File Name: exponential_distribution.Rmd
Date: 2014.11.22
Author: Luis Jaraquemada S.

Abstract

This paper studies the results of exponential distribution simulations and compares them with theoretical results (i.e., those predicted by the Central Limit Theorem).

The experimental work consists of 1000 simulations, each averaging 40 samples drawn from an exponential distribution with rate lambda = 0.2.

The theoretical calculations are based on the assumptions of the Central Limit Theorem, together with basic statistics.

The results show that the mean estimated via the Central Limit Theorem (CLT) is a good estimate of the true population mean. In addition, we show qualitatively that the distribution of the simulated means approximates a normal distribution, which supports the use of the CLT.

All calculations are done in R, and this document is fully reproducible because a fixed random seed is set.

Theoretical vs Experimental Mean and Confidence Interval

As mentioned above, we simulate 40 exponential draws with lambda = 0.2 and repeat this 1000 times, keeping the mean of each of the 1000 samples. These means are stored in a vector called mns_exp, which is then used to plot a histogram of the results, as shown below. For reference, the code first plots a histogram of a single sample of 40 draws.

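# Simulation parameters
lambda <- 0.2      # rate of the exponential distribution
numsamples <- 40   # draws averaged in each simulation
nosim <- 1000      # number of simulations

set.seed(8888)
hist(rexp(numsamples, lambda))  # one sample of 40 draws, for reference

# Mean of 40 exponential draws, repeated nosim times
mns_exp <- replicate(nosim, mean(rexp(numsamples, lambda)))
n <- length(mns_exp)
hist(mns_exp)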
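The simulation step can also be written without repeated calls by drawing all values at once. The sketch below assumes the same parameter values; the names loop_means, draws and mat_means are hypothetical. It fills a 1000 x 40 matrix row by row, so each row holds the same 40 consecutive draws that one repetition of the per-simulation call would consume, and rowMeans() then returns one mean per simulation. It does not reproduce the exact numbers in this document, because there the reference histogram consumes 40 draws before the simulations start.

```r
# Hypothetical vectorized alternative to calling mean(rexp(...)) nosim times
lambda <- 0.2; numsamples <- 40; nosim <- 1000

set.seed(8888)
loop_means <- replicate(nosim, mean(rexp(numsamples, lambda)))

set.seed(8888)
# byrow = TRUE puts consecutive draws in the same row, matching the per-call order
draws <- matrix(rexp(nosim * numsamples, lambda), nrow = nosim, byrow = TRUE)
mat_means <- rowMeans(draws)

# Both approaches consume the random stream in the same order
all.equal(loop_means, mat_means)
```

The matrix version trades memory (all 40000 draws are held at once) for fewer function calls, which is harmless at this size.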

Center of the experimental distribution and comparison to the theoretical center of the distribution.

From the definition of the exponential distribution that generates the artificial data, its true mean is 1/lambda = 5.
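For reference, the mean and variance of an exponential distribution with rate lambda follow from integration by parts:

```latex
E[X] = \int_0^\infty x\,\lambda e^{-\lambda x}\,dx
     = \left[-x e^{-\lambda x}\right]_0^\infty + \int_0^\infty e^{-\lambda x}\,dx
     = \frac{1}{\lambda} = \frac{1}{0.2} = 5,
\qquad
\operatorname{Var}(X) = E[X^2] - E[X]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2} = 25.
```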

From our simulations, the average of the 1000 simulated means is 4.9958171, which differs from the theoretical mean by only -0.0041829.

Variability of experimental distribution and comparison to the theoretical variance of the distribution.

The theoretical standard deviation of the exponential distribution is sqrt(1/lambda^2) = 1/lambda = 5, so the CLT predicts a standard deviation of 5/sqrt(40) ≈ 0.79 for the mean of 40 draws.

The simulated means have a standard error of sd(mns_exp)/sqrt(n) = 0.0252768, which corresponds to sd(mns_exp) ≈ 0.0252768 * sqrt(1000) ≈ 0.80, close to the theoretical value. Since qnorm(0.95) ≈ 1.645 leaves 5% in each tail, the following is a two-sided 90% confidence interval:

mean(mns_exp) + c(-1, 1) * qnorm(0.95) * sd(mns_exp) / sqrt(n)

## [1] 4.954240 5.037394

In repeated experiments, 90% of the intervals built this way would contain the true mean. This interval does contain the true mean 1/lambda = 5, which is consistent with the theory.
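A self-contained sketch of both comparisons in this section follows. It re-runs the simulation with set.seed(8888) but without the preceding reference histogram, so its numbers will differ slightly from the ones quoted above. It compares the CLT prediction 5/sqrt(40) ≈ 0.79 with sd(mns_exp), and shows that a two-sided 95% interval needs qnorm(0.975) ≈ 1.96, which puts 2.5% in each tail, while qnorm(0.95) ≈ 1.645 gives a 90% interval. The names theo_sd_mean, theo_var_mean, se, ci90 and ci95 are introduced here for illustration only.

```r
# Sketch: theoretical vs simulated spread, and 90% vs 95% intervals
lambda <- 0.2; numsamples <- 40; nosim <- 1000
set.seed(8888)
mns_exp <- replicate(nosim, mean(rexp(numsamples, lambda)))
n <- length(mns_exp)

# CLT prediction for the SD and variance of a mean of 40 draws
theo_sd_mean  <- (1 / lambda) / sqrt(numsamples)  # 5 / sqrt(40), about 0.79
theo_var_mean <- (1 / lambda)^2 / numsamples      # 25 / 40 = 0.625
spread <- c(theoretical_sd = theo_sd_mean, simulated_sd = sd(mns_exp),
            theoretical_var = theo_var_mean, simulated_var = var(mns_exp))
print(spread)

# Two-sided intervals put alpha/2 in each tail
se   <- sd(mns_exp) / sqrt(n)
ci90 <- mean(mns_exp) + c(-1, 1) * qnorm(0.95)  * se  # z about 1.645
ci95 <- mean(mns_exp) + c(-1, 1) * qnorm(0.975) * se  # z about 1.960
print(rbind(ci90, ci95))
```

The 95% interval is wider than the 90% one by the ratio 1.96 / 1.645 ≈ 1.19, since both use the same standard error.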

Convergence to Normal Distribution

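So far we have assumed that the CLT applies, i.e., that averages of 40 exponential draws are approximately normally distributed. It is the number of draws in each average (40), not the number of simulations (1000), that drives this convergence; the 1000 simulations only let us observe the resulting distribution.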

The distribution of averages is approximately normal.
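Stated with the single-draw results lambda = 0.2, mean 5 and variance 25, the CLT approximation for the mean of n = 40 independent draws is:

```latex
\bar{X}_{40} = \frac{1}{40}\sum_{i=1}^{40} X_i \;\approx\; \mathcal{N}\!\left(\frac{1}{\lambda},\; \frac{1}{40\,\lambda^2}\right) = \mathcal{N}(5,\; 0.625),
\qquad
Z = \frac{\bar{X}_{40} - 5}{5/\sqrt{40}} \;\approx\; \mathcal{N}(0, 1).
```

The scale() call in the code that follows standardizes with the sample mean and sample SD of the simulated means rather than with 5 and 5/sqrt(40); with 1000 simulations the two choices should give very similar results.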

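# Standardize the simulated means (subtract their mean, divide by their SD)
mns_exp_scaled <- scale(mns_exp)
hist(mns_exp_scaled, probability = TRUE, main = "", ylim = c(0, 0.5))
lines(density(mns_exp_scaled), col = "blue")  # empirical density
# Compare with the standard normal distribution
curve(dnorm(x, 0, 1), -3, 3, col = "red", add = TRUE)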

As the plot shows, the distribution of the standardized simulated means closely follows the standard normal density.
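The histogram gives a qualitative picture; a normal Q-Q plot and a simple tail check make the comparison somewhat sharper. The sketch below is a hypothetical extension, not part of the original analysis. It standardizes the simulated means with the theoretical CLT parameters 5 and 5/sqrt(40), and reports the fraction falling within ±1 and ±1.96, which should be near 0.683 and 0.95 if the normal approximation holds. Because the exponential distribution is right-skewed, some residual skewness (2/sqrt(40) ≈ 0.32 for the mean of 40 draws) may still be visible in the upper tail of the Q-Q plot.

```r
# Sketch: Q-Q plot and tail coverage of the standardized simulated means
lambda <- 0.2; numsamples <- 40; nosim <- 1000
set.seed(8888)
mns_exp <- replicate(nosim, mean(rexp(numsamples, lambda)))

# Standardize with the theoretical CLT parameters rather than sample estimates
z <- (mns_exp - 1 / lambda) / ((1 / lambda) / sqrt(numsamples))

qqnorm(z)
qqline(z, col = "red")  # reference line through the quartiles

# Share of standardized means within +/-1 and +/-1.96
# (about 0.683 and 0.95 under exact normality)
coverage <- c(within_1 = mean(abs(z) < 1), within_196 = mean(abs(z) < 1.96))
print(coverage)
```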