In this project we investigate the exponential distribution in R and compare the behavior of its sample means with the Central Limit Theorem.
Using simulation and explanatory text, we illustrate the properties of the distribution of the mean of 40 exponentials.
library(ggplot2) # load packages used to support the analysis
# knitr chunk settings:
knitr::opts_chunk$set(echo=TRUE, fig.path='part1/', fig.width=10, fig.height=6, cache=TRUE)
set.seed(10)
The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. We set lambda = 0.2 for all simulations, the number of exponentials to 40, and the number of simulations to 1000.
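The stated moments can also be checked without any simulation. As a minimal sketch (an illustration, not part of the original analysis), the code below integrates against the exponential density dexp() to recover the mean and standard deviation; the names m1, m2, exp_mean and exp_sd are hypothetical. It draws no random numbers, so it does not disturb the seeded simulation that follows.

```r
# Sketch: recover mean = sd = 1/lambda numerically from the density dexp()
lambda <- 0.2                                                     # rate parameter used in this project
m1 <- integrate(function(x) x   * dexp(x, lambda), 0, Inf)$value  # first moment E[X]
m2 <- integrate(function(x) x^2 * dexp(x, lambda), 0, Inf)$value  # second moment E[X^2]
exp_mean <- m1
exp_sd   <- sqrt(m2 - m1^2)                                       # sd from the first two moments
cat("Mean:", round(exp_mean, 4), " SD:", round(exp_sd, 4), " 1/lambda:", 1 / lambda, "\n")
```

Both numerical values should agree with 1/lambda = 5 up to integration error.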
lambda <- 0.2 # setting lambda
n <- 40 # setting number of exponentials
sim <- 1000 # setting number of simulations
sim_run <- replicate(sim, rexp(n, lambda)) # run simulations: an n x sim matrix with one simulation per column
means_exp <- apply(sim_run, 2, mean) # mean of each column, i.e. the average of each simulation's 40 exponentials
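To make the shape of the simulation output explicit, here is a small sketch that repeats the setup with separate, hypothetical variable names (demo_*), so sim_run and means_exp are left untouched. replicate() stacks the result of each rexp() call as a column, and colMeans() is a vectorised equivalent of apply(..., 2, mean).

```r
# Sketch: replicate() returns one simulation per column, so column means are the simulated averages
demo_lambda <- 0.2     # same rate as the main analysis
demo_n      <- 40      # exponentials per simulation
demo_sim    <- 1000    # number of simulations
demo_run    <- replicate(demo_sim, rexp(demo_n, demo_lambda))  # demo_n x demo_sim matrix
print(dim(demo_run))                                  # rows = draws, columns = simulations
demo_means  <- colMeans(demo_run)                     # one average per simulation
print(all.equal(demo_means, apply(demo_run, 2, mean)))  # same result as the apply() version
```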
The mean of the exponential distribution is 1/lambda. Since lambda is 0.2 for all simulations, the theoretical mean is 1/0.2 = 5.
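The same reasoning can be written out as a short derivation: by linearity of expectation, the average of n exponentials has the same mean as a single exponential, so the simulated averages should center on 1/lambda.

```latex
\mathbb{E}[\bar{X}_n] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i] = \mathbb{E}[X] = \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad n = 40
```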
sample_mean <- mean(means_exp)
theoretical_mean <- 1/lambda
cat("Sample Mean: ", sample_mean," ", "Theoretical Mean: ", theoretical_mean)
## Sample Mean: 5.04506 Theoretical Mean: 5
The sample mean (5.045) is very close to the theoretical mean (5).
# histogram of the simulated means, with a vertical line at the theoretical mean
hist(means_exp, xlab="Mean of 40 Exponentials", ylab="Frequency", xlim=c(2,9), main="Distribution of the Averages for 40 Exponentials", col="red")
abline(v=theoretical_mean, lwd=5, col="blue")
The histogram shows the distribution of the 1000 simulated means, with the theoretical mean marked by the blue vertical line.
sample_var <- var(means_exp) # calculate the sample variance
theoretical_var <- (1/lambda)^2/n # calculate the theoretical variance
cat("Sample Variance: ", sample_var," ", "Theoretical Variance: ",theoretical_var)
## Sample Variance: 0.6372544 Theoretical Variance: 0.625
The sample variance (0.637) is close to the theoretical variance (0.625).
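For reference, the theoretical value follows from the variance of a mean of n independent draws, which is the population variance divided by n; with sigma = 1/lambda this gives the number computed in the code above.

```latex
\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n} = \frac{(1/\lambda)^2}{n} = \frac{(1/0.2)^2}{40} = \frac{25}{40} = 0.625,
\qquad \operatorname{SE}(\bar{X}_n) = \sqrt{0.625} \approx 0.791
```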
# histogram on the density scale (prob=TRUE) so normal curves can be overlaid; breaks and shading density can be tuned for a finer view
hist(means_exp, density=20, breaks=20, prob=TRUE, xlim=c(2,9), xlab="Mean of 40 Exponentials", ylab="Density", main="Distribution of the Averages for 40 Exponentials", col="green")
curve(dnorm(x, mean=sample_mean, sd=sqrt(sample_var)), col="black", lwd=3, lty="dotted", add=TRUE) # normal curve from the sample estimates
curve(dnorm(x, mean=theoretical_mean, sd=sqrt(theoretical_var)), col="blue", lwd=3, add=TRUE) # normal curve from the theoretical values
The histogram closely follows both normal density curves, so the distribution of the simulated means appears approximately normal, as the Central Limit Theorem predicts.
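A rough numerical complement to the visual check, sketched under the assumption that the normal approximation is adequate at n = 40: about 95% of simulated means should fall within 1.96 theoretical standard errors of 1/lambda. The clt_* names are hypothetical and chosen so that no object used later is overwritten.

```r
# Sketch: share of simulated means within +/- 1.96 theoretical standard errors of 1/lambda
clt_lambda <- 0.2                                              # rate parameter, as in the main analysis
clt_n      <- 40                                               # exponentials averaged per simulation
clt_means  <- replicate(1000, mean(rexp(clt_n, clt_lambda)))   # 1000 simulated averages
clt_se     <- (1 / clt_lambda) / sqrt(clt_n)                   # theoretical standard error of the mean
inside     <- abs(clt_means - 1 / clt_lambda) <= 1.96 * clt_se # TRUE if within the normal 95% band
clt_coverage <- mean(inside)                                   # proportion of TRUE values
cat("Share of simulated means within 1.96 SE:", clt_coverage, "\n")
```

A share near 0.95 is consistent with the normal approximation; because the sample is random, the exact value varies from run to run.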
Let’s take a look at the Confidence Intervals.
# confidence intervals for the sample and theoretical means, rounded to three decimal places
sample_ci <- round(sample_mean + c(-1,1)*1.96*sd(means_exp)/sqrt(n), 3)
theoretical_ci <- round(theoretical_mean + c(-1,1)*1.96*sqrt(theoretical_var)/sqrt(n), 3)
cat("Sample CI:", sample_ci," ","Theoretical CI:", theoretical_ci)
## Sample CI: 4.798 5.292 Theoretical CI: 4.755 5.245
The sample and theoretical confidence intervals largely overlap, and both contain the theoretical mean of 5.
qqnorm(means_exp)
qqline(means_exp, col="red", lwd=4)
The points lie close to the reference line, so the distribution is approximately normal, with some departure from the line in the tails.
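One likely source of the tail departure is that the mean of exponentials is right-skewed. As a sketch (not part of the original analysis), the mean of n independent Exp(lambda) draws follows a Gamma distribution with shape n and rate n * lambda, so its exact 2.5% and 97.5% quantiles can be compared with those of the CLT normal approximation using qgamma() and qnorm(). The q_* names are hypothetical.

```r
# Sketch: exact Gamma(shape = n, rate = n * lambda) quantiles of the mean vs. the CLT normal
q_lambda <- 0.2
q_n      <- 40
probs    <- c(0.025, 0.975)
exact_q  <- qgamma(probs, shape = q_n, rate = q_n * q_lambda)                   # exact distribution of the mean
normal_q <- qnorm(probs, mean = 1 / q_lambda, sd = (1 / q_lambda) / sqrt(q_n))  # CLT approximation
print(rbind(exact = exact_q, normal = normal_q))
```

Both exact quantiles sit above their normal counterparts: the lower tail is shorter and the upper tail longer than the normal curve suggests, which is the pattern a right-skewed distribution produces on a normal Q-Q plot.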