In this project we will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda.
For the project, the following instructions were provided:
The below code setups the paramters as outline in the course project instructions. This includes the rate (lambda), number of exponentials and the number of simulations we wish to run.
We then create a numeric vector (‘means’) which contains the result of the simulations. In this case, our simulations are getting the mean.
Finally, we plot a histogram of the simulated mean values (‘means’).
# set seed for reproducability
set.seed(2016)
# Set sampling values as described in the project instructions
lambda <- 0.2 # lambda
n <- 40 # number of exponentials
sims <- 1000 # number of simulations
#Run simulations
sim_exp <- replicate(sims, rexp(n, lambda))
#Calc the means of the exponential simulations
means_exp <- apply(sim_exp, 2, mean)
#Histogram of the means
hist(means_exp, breaks=40, xlim = c(2,9), main="Exponential Function Simulation Means", col = "aliceblue")
The mean of the exponential distribution is 1/lambda. In this case, lambda is 0.2. Therefore, the theoretical mean should result as 5 (i.e. 1 / 0.2).
# plot histogram of the sample means
hist(means_exp, col="aliceblue", main="Theoretical Mean vs. Actual Mean", xlim = c(2,9),breaks=40, xlab = "Simulation Means")
# plot a vertical red line at the mean of the sample means
abline(v=mean(means_exp), lwd="4", col="red")
# plot a vertical blue line at the mean of the theoretical mean
abline(v=1/lambda, lwd="2", col="blue")
The code above shows us that our sample mean is 4.979 which is pretty close to our theoretical mean of 5.
The standard deviation of the exponential distribution is (1/lambda) / sqrt(n). Next, we’ll see if this matches our simulations.
# theoretical standard deviation vs. simulation standard deviation
print(paste("Theoretical standard deviation: ", round( (1/lambda)/sqrt(n) ,4)))
## [1] "Theoretical standard deviation: 0.7906"
print(paste("Practical standard deviation: ", round(sd(means_exp) ,4)))
## [1] "Practical standard deviation: 0.7991"
print(paste("Theoretical variance: ", round( ((1/lambda)/sqrt(n))^2 ,4)))
## [1] "Theoretical variance: 0.625"
print(paste("Practical variance: ", round(sd(means_exp)^2 ,4)))
## [1] "Practical variance: 0.6385"
The formulas above show us that the variances are very close.
Finally, we’ll investigate whether the exponential distribution is approximately normal. Due to the Central Limit Theorem, the means of the sample simulations should follow a normal distribution.
#General Plot with ditribution curve drawn
hist(means_exp, prob=TRUE, col="aliceblue", main="Exponential Function Simulation Means", breaks=40, xlim=c(2,9), xlab = "Simulation Means")
lines(density(means_exp), lwd=3, col="red")
# Normal distribution line creation
x <- seq(min(means_exp), max(means_exp), length=2*n)
y <- dnorm(x, mean=1/lambda, sd=sqrt(((1/lambda)/sqrt(n))^2))
lines(x, y, pch=22, col="black", lwd=2, lty = 2)
As the graph shows, the distribution of means of our sampled exponential distributions appear to follow a normal distribution, due to the Central Limit Theorem. If we increased our number of samples (currently 1000), the distribution would be even closer to the standard normal distribution.The dotted line above is a normal distribution curve and we can see that it is very close to our sampled curve, which is the red line above.