Investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
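Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials: compare the sample mean and the sample variance with their theoretical values, and show that the distribution is approximately normal.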
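Before simulating averages, the stated moments of a single exponential can be sanity-checked directly. The sketch below draws one large sample with rexp() and compares its mean and standard deviation with 1/lambda; the sample size of 100,000 and the seed are arbitrary choices for illustration, not part of the original analysis.

```r
# Sketch: a large sample from Exp(rate = lambda) should have
# mean and standard deviation both close to 1/lambda = 5.
lambda <- 0.2
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- rexp(100000, rate = lambda)  # 100,000 draws (arbitrary sample size)
c(mean = mean(x), sd = sd(x), theory = 1 / lambda)
```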
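The following code performs the simulations to collect the necessary data.
# load the plotting library (the plots below use base R graphics)
library(ggplot2)
n <- 40           # number of exponentials averaged in each simulation
lambda <- 0.2     # rate parameter
sim_Num <- 1000   # number of simulations
set.seed(10000)   # make the results reproducible
# 1000 x 40 matrix: each row holds one sample of 40 exponentials
simulated_data <- matrix(rexp(n = sim_Num * n, rate = lambda), sim_Num, n)
# the 1000 sample means, one per row
row_mean <- rowMeans(simulated_data)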
sample_Mean <- mean(row_mean)   # mean of the 1000 sample means
Theory_mean <- 1/lambda         # theoretical mean, 1/0.2 = 5
result1 <- data.frame("Mean" = c(sample_Mean, Theory_mean),
                      row.names = c("Sample mean", "Theoretical mean"))
result1
## Mean
## Sample mean 5.00599
## Theoretical mean 5.00000
The simulated sample mean of 5.00599 is close to the theoretical value of 5.
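One way to make "close" concrete is to compare the gap with the Monte Carlo standard error of the simulated mean. The sketch below is an addition, not part of the original analysis: it re-runs the simulation with the same seed (assuming the same R random number generator settings, this reproduces the draws above) and builds an approximate 95% interval around mean(row_mean). Using the reported values (mean 5.00599, variance 0.6296518), the interval is roughly 5.006 ± 0.049, which contains 5.

```r
# Sketch: approximate 95% interval for the mean of the simulated sample means.
n <- 40; lambda <- 0.2; sim_Num <- 1000
set.seed(10000)  # same seed as the main simulation
row_mean <- rowMeans(matrix(rexp(sim_Num * n, rate = lambda), sim_Num, n))
se <- sd(row_mean) / sqrt(sim_Num)                   # standard error of mean(row_mean)
ci <- mean(row_mean) + c(-1, 1) * qnorm(0.975) * se  # normal-approximation interval
ci
```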
# histogram of the 1000 sample means, scaled to a density (prob = TRUE)
hist(row_mean, breaks = 30, prob = TRUE, col = "lightblue",
     main = "Distribution of Means of 40 Exponentials",
     xlab = "Means of 40 Simulated Samples", ylab = "Density")
abline(v = Theory_mean, col = "blue", lwd = 3)
abline(v = sample_Mean, col = "red", lwd = 2)
legend('topright', c("Theoretical Mean", "Sample Mean"),
       bty = "n",
       lty = c(1, 1), lwd = c(3, 2),
       col = c("blue", "red"))
The blue vertical line marks the theoretical mean and the red vertical line marks the sample mean. The two lines nearly overlap: the center of the distribution of the averages of 40 exponentials is very close to its theoretical center, 1/lambda = 5.
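The spread of the sample means is measured by the variance of the 1000 entries in row_mean. This variance should be close to the population variance divided by the sample size, 40; equivalently, multiplying it by 40 gives an estimate of the population variance.
sample_var <- var(row_mean)   # variance of the 1000 sample means
The theoretical variance of the sample mean is $\sigma^2/n = (1/\lambda)^2/n$, where $\sigma = 1/\lambda$ is the standard deviation of a single exponential.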
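Because the 40 draws in each simulation are independent, the variance of their average is the variance of one draw divided by n. Plugging in lambda = 0.2 and n = 40 gives the theoretical value of 0.625:

```latex
\operatorname{Var}(\bar{X}) = \frac{\operatorname{Var}(X)}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{(1/0.2)^2}{40}
  = \frac{25}{40}
  = 0.625
```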
theory_var <- (1/lambda)^2/n   # (1/0.2)^2 / 40 = 0.625
result2 <- data.frame("Variance" = c(sample_var, theory_var),
                      row.names = c("Sample variance", "Theoretical variance"))
result2
## Variance
## Sample variance 0.6296518
## Theoretical variance 0.6250000
The variance of the simulated sample means is 0.6296518, close to the theoretical variance of 0.625.
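The rescaling mentioned earlier, multiplying the variance of the sample means by n, turns the result back into an estimate of the population variance (1/lambda)^2 = 25. From the table, 0.6296518 × 40 ≈ 25.19. The sketch below re-runs the simulation with the same seed so it stands alone (assuming the same RNG settings reproduce the original draws); pop_var_est is a name introduced here for illustration.

```r
# Sketch: estimate the population variance as n * Var(sample means).
n <- 40; lambda <- 0.2; sim_Num <- 1000
set.seed(10000)  # same seed as the main simulation
row_mean <- rowMeans(matrix(rexp(sim_Num * n, rate = lambda), sim_Num, n))
pop_var_est <- var(row_mean) * n   # estimated population variance
c(estimate = pop_var_est, theory = (1 / lambda)^2)
```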
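According to the central limit theorem (CLT), the average of n independent, identically distributed samples with finite variance is approximately normally distributed for large n. Here, the averages of 40 exponentials should be approximately $N(1/\lambda, (1/\lambda)^2/n)$.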
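A simple numerical view of the CLT is how skewness shrinks under averaging: a single exponential has skewness 2, and the mean of n of them has skewness 2/sqrt(n), about 0.32 for n = 40. The sketch below compares the two using a hand-written moment-based skewness function (an assumption made to avoid extra packages; it is not part of the original analysis) and the same seeded simulation.

```r
# Sketch: skewness of raw exponentials vs. skewness of means of 40 of them.
n <- 40; lambda <- 0.2; sim_Num <- 1000
set.seed(10000)
simulated_data <- matrix(rexp(sim_Num * n, rate = lambda), sim_Num, n)
row_mean <- rowMeans(simulated_data)

# moment-based sample skewness: third central moment / (second central moment)^(3/2)
skewness <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

c(raw_exponentials = skewness(as.vector(simulated_data)),  # theory: 2
  sample_means     = skewness(row_mean),                   # theory: 2/sqrt(40)
  theory_means     = 2 / sqrt(n))
```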
The following plot overlays two curves on the histogram of the simulated means: the kernel density estimate of the simulated means (red) and the normal density with the theoretical mean and variance (blue). The two curves nearly coincide, indicating that the distribution is approximately normal.
hist(row_mean,
     breaks = 30,
     prob = TRUE, col = "lightblue",
     main = "Density of Simulated Sample Means",
     xlab = "Means of 40 Exponentials", ylab = "Density")
# kernel density estimate of the simulated means
lines(density(row_mean), col = "red", lwd = 2)
# theoretical mean
abline(v = 1/lambda, col = "green", lwd = 2)
# normal density with theoretical mean 1/lambda and sd (1/lambda)/sqrt(n)
xfit <- seq(min(row_mean), max(row_mean), length.out = 100)
yfit <- dnorm(xfit, mean = 1/lambda, sd = (1/lambda)/sqrt(n))
lines(xfit, yfit, col = "blue", lwd = 2)
legend('topright', c("Theoretical normal density", "Simulated density", "Theoretical mean"),
       bty = "n", lwd = 2, col = c("blue", "red", "green"))
# normal Q-Q plot of the simulated means
qqnorm(row_mean, main = "Normal Q-Q Plot", col = "red")
qqline(row_mean, col = "blue", lwd = 2)
The density plot shows that the density curve of the simulated means is very similar to the normal density curve.
The Q-Q plot also suggests normality: the sample quantiles closely match the theoretical normal quantiles, with the points lying close to the reference line.
This indicates that the distribution of the sample means is approximately normal.
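The visual checks can be backed by two numbers, offered here as a sketch rather than as part of the original analysis: the correlation between the x and y coordinates of the normal Q-Q plot (1 means a perfectly straight line), and the share of sample means falling within 1.96 theoretical standard errors of 1/lambda, which is about 95% for a normal distribution. Names such as qq_cor and coverage are introduced for illustration, and the simulation is re-run with the same seed so the block stands alone.

```r
n <- 40; lambda <- 0.2; sim_Num <- 1000
set.seed(10000)
row_mean <- rowMeans(matrix(rexp(sim_Num * n, rate = lambda), sim_Num, n))

# correlation of the normal Q-Q plot coordinates (no plot drawn)
qq <- qqnorm(row_mean, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)

# share of sample means within 1.96 theoretical standard errors of 1/lambda
theory_se <- (1 / lambda) / sqrt(n)
coverage <- mean(abs(row_mean - 1 / lambda) < qnorm(0.975) * theory_se)

c(qq_correlation = qq_cor, coverage = coverage)
```

A coverage close to 0.95 and a Q-Q correlation close to 1 are both consistent with the approximate normality seen in the plots; the small remaining skewness of the means (about 2/sqrt(40)) is why the fit is approximate rather than exact.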