This project investigates the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution is simulated with rexp(n, lambda) where lambda is the rate parameter, theoretical mean of exponential distribution is 1/lambda and theoretical standard deviation is also 1/lambda. This project performs a thousand simulations to get the distribution of averages of 40 exponentials, where the lambda is set to 0.2 for all of the simulations. The simulated samples are used to illustrate and explain the properties of the distribution of the mean of 40 exponentials in the following ways:
1.Show the sample mean and compare it to the theoretical mean of the distribution.
2.Show how variable the sample is (via variance) and compare it to the theoretical variance
of the distribution.
3.Show that the distribution is approximately normal.
In point 3, focus on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.
We start by simulating a thousand sets of 40 exponentials using lambda 0.2 and calculate the mean for each set.
echo = TRUE
set.seed(123456) # For reproducible results
lambda <- 0.2 # Rate for all simulations
n <- 40 # No. of samples
s <- 1000 # No. of simulations
sim <- data.frame(mean=numeric(s)) # data frame to store all the sample means
for (i in 1:s) { # simulate s times
samples <-rexp(n,lambda) # simulate n samples
sim[i,1]<-mean(samples) # sample mean
}
With the simulated data, we calculate the theoretical mean and sample mean for the exponential distribution.
tm <- 1/lambda # calculate theoretical mean
sm <- mean(sim$mean) # calculate avg sample mean
As shown by the histogram in Figure 1, the sample mean is very close to the theoretical mean at 5:
We can make the comparison by defining the theoretical variance as 1/lambda squared, divided by the number of exponential observations (i.e. n), as shown below:
tv<-((1/lambda)^2)/n # Theoretical Variance
tv
## [1] 0.625
Compute the variance for the simulated samples, as shown below:
sv <- var(sim$mean) # Sample variance
sv
## [1] 0.6570391
The result shows that the theoretical variance is also very close to the sample variance.
When we overlay the histogram of the theoretical mean of the exponential distribution with the histogram of simulated sample means (as shown in Figure 2), the two distribution curves are very similar and normally distributed. We observe that the histogram for the mean of 1000 simulated 40 random exponential values is symmetric around the mean with a bell shape.
Next we use a Quantile-Quantile plot to confirm if the distribution of the simulation means is approximatley normal, where any point (X,Y) denotes a data point from our sample distribution plotted against the theoretical distribution. The plot should lie on the 45 degree line of the QQ plot if its normal. As shown in Figure 3, the plot shows a linearity which confirms that normality is a good approximation.
This simulation is performed with a relatively large population of 40,000 samples (1000 simulations x 40 samples). The above observations show that, given this sufficiently large sample size from a population with a finite level of variance, the mean of all samples from the population is approximately equal to the mean of the population. In conclusion, the normal distribution of the mean of 40 random exponentials is consistent with the characteristic of the Central Limit Theorem.
The R codes for the figures plotted in this project:
hist(sim$mean, freq=TRUE, breaks=50,
main="Figure 1
Distribution of Sample Means
of Exponentials (lambda 0.2)",
xlab="Sample Means from 1000 Simulations")
###Sample Mean - Green Line
abline(v=sm, col='red', lwd=1)
###Theoretical Mean - Red Line
abline(v=tm, col='blue', lwd=1)
legend('topright', c("Theoretical Mean","Avg Sample Mean"),
col=c("blue", "red"), lty=c(1,1), bty = "n")
hist(sim$mean, prob=T,
main="Figure 2
Distribution of Sample Means vs
Theoretical Density of Exponentials (lambda 0.2)",
xlab="Sample Means from 1000 Simulations" )
### Density of the Simulated sample means
lines(density(sim$mean))
###Theoretical Mean - Red Line
abline(v=tm, col='red', lwd=1)
### Theoretical density of the exponential distribution
xfit <- seq(min(sim$mean), max(sim$mean), length=100)
yfit <- dnorm(xfit, mean=1/lambda, sd=(1/lambda/sqrt(n)))
lines(xfit, yfit, pch=22, col="red", lty=1)
### Legend
legend('topright', c("Simulation", "Theoretical"),
col=c("black", "red"), lty=c(1,1))
qqnorm(sim$mean, main="Figure 3
Normal Q-Q Plot")
qqline(sim$mean, col="2")