This project investigates the exponential distribution in R and compares the distribution of its sample means with what the Central Limit Theorem predicts. The exponential distribution is simulated with the R function rexp(n, lambda), where lambda is the rate parameter. One thousand simulations are run, and the distribution of the averages of 40 exponentials is examined.
Run 1000 simulations, each drawing 40 exponentials with the R function rexp(n, lambda). Set lambda = 0.2 for all of the simulations. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda.
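The claim that both the mean and the standard deviation equal 1/lambda can be checked empirically before running the main simulation. The following is a minimal sketch, not part of the original analysis: the seed, the sample size of 100000, and the variable names are assumptions chosen only for illustration.

```r
# Sketch: check that an exponential with rate lambda has mean and
# standard deviation both close to 1/lambda.
set.seed(2024)                              # arbitrary seed (assumption)
lambda <- 0.2
big_sample <- rexp(100000, rate = lambda)   # one large sample (size is an assumption)

empirical_mean <- mean(big_sample)
empirical_sd   <- sd(big_sample)

# Both empirical values should sit near the theoretical 1/lambda
c(mean = empirical_mean, sd = empirical_sd, theory = 1 / lambda)
```

With a sample this large, both estimates would be expected to land within a few hundredths of 5.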
set.seed(12345)
lambda <- 0.2
n <- 40
nosim <- 1000
# Preallocate the result vector, then store the mean of each simulated sample
sample_means <- numeric(nosim)
for (i in seq_len(nosim)) {
  sample_means[i] <- mean(rexp(n, lambda))
}
hist(sample_means)
sample_means_df <- as.data.frame(sample_means)
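The loop above can also be written without explicit iteration. The sketch below is one possible alternative, not the method used in this report: it draws all 40 x 1000 values at once, arranges them so that each row holds one simulated sample, and takes row means. The names draws and sample_means_vec are hypothetical. Because the matrix is filled row by row, it should consume the random stream in the same order as the loop, but that is worth confirming with all.equal() rather than taking on trust.

```r
# Sketch: vectorized version of the simulation loop.
set.seed(12345)
lambda <- 0.2
n <- 40
nosim <- 1000

# One row per simulation, one column per exponential draw
draws <- matrix(rexp(nosim * n, rate = lambda),
                nrow = nosim, ncol = n, byrow = TRUE)

# Mean of each row = mean of each simulated sample of 40
sample_means_vec <- rowMeans(draws)

length(sample_means_vec)
```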
Compare the sample mean with the theoretical mean.
simulation_mean <- mean(sample_means)
round(simulation_mean, 3)
## [1] 4.972
theoretical_mean <- 1/lambda
round(theoretical_mean, 3)
## [1] 5
Conclusion:
As Figure-1 in the Appendix shows, the sample mean of 4.972 (blue dashed line) is close to the theoretical mean of 5 (red dashed line).
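One way to put a number on "close" is to compare the gap with the standard error of the simulated mean. Using the standard deviation of the sample means computed in the next section (0.772), the standard error is about 0.772/sqrt(1000), or roughly 0.024, so the gap of 0.028 is a little over one standard error. The sketch below computes an approximate 95% interval; it is an illustrative addition, and replicate() stands in for the loop used in this report, while std_error, ci_95 and covers_theory are hypothetical names.

```r
# Sketch: approximate 95% confidence interval for the mean of the sample means.
set.seed(12345)
lambda <- 0.2
n <- 40
nosim <- 1000

# replicate() is used here as a compact stand-in for the simulation loop
sample_means <- replicate(nosim, mean(rexp(n, lambda)))

simulation_mean <- mean(sample_means)
std_error <- sd(sample_means) / sqrt(nosim)

# Normal-approximation interval: estimate +/- 1.96 standard errors
ci_95 <- simulation_mean + c(-1, 1) * qnorm(0.975) * std_error
ci_95

# TRUE if the theoretical mean 1/lambda lies inside the interval
covers_theory <- ci_95[1] <= 1 / lambda && 1 / lambda <= ci_95[2]
covers_theory
```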
Compare the variance of the sample means with the theoretical variance.
simulation_variance <- var(sample_means)
round(simulation_variance, 3)
## [1] 0.595
theoretical_variance <- (1/lambda)^2/n
round(theoretical_variance, 3)
## [1] 0.625
simulation_sd <- sd(sample_means)
round(simulation_sd, 3)
## [1] 0.772
theoretical_sd <- (1/lambda)/sqrt(n)
round(theoretical_sd, 3)
## [1] 0.791
Conclusion:
As the calculations above show, the sample variance (0.595) is close to the theoretical variance (0.625). Figure-2 in the Appendix shows that one sample standard deviation from the mean (green vertical lines) is close to one theoretical standard deviation (orange vertical lines).
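The theoretical values used in this comparison follow from the standard result that the mean of n independent draws has variance sigma^2/n. Written out for lambda = 0.2 and n = 40, with sigma = 1/lambda as stated earlier:

```latex
\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{(1/0.2)^2}{40}
  = \frac{25}{40}
  = 0.625,
\qquad
\operatorname{SD}(\bar{X}_n) = \frac{\sigma}{\sqrt{n}}
  = \frac{5}{\sqrt{40}}
  \approx 0.791
```

The simulated variance of 0.595 is about 5% below this theoretical value.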
Investigate whether the distribution of sample means is approximately normal.
Conclusion:
As Figure-3 in the Appendix shows, the sample density curve (green curve) is similar to the normal distribution curve (orange curve).
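A numerical complement to the visual comparison is to contrast the skewness of the 1000 sample means with the skewness of 1000 raw exponential draws. A normal distribution has skewness 0 and an exponential has skewness 2, so the means should be much closer to 0 than the raw draws. This is a sketch added for illustration: skewness() is a hypothetical helper defined here (base R does not provide one), raw_draws is an assumed name, and replicate() again stands in for the report's loop.

```r
# Sketch: compare skewness of sample means with skewness of raw exponentials.
set.seed(12345)
lambda <- 0.2
n <- 40
nosim <- 1000

sample_means <- replicate(nosim, mean(rexp(n, lambda)))  # averages of 40 draws
raw_draws    <- rexp(nosim, lambda)                      # 1000 single draws

# Hypothetical helper: moment-based sample skewness
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_means <- skewness(sample_means)
skew_raw   <- skewness(raw_draws)

c(sample_means = skew_means, raw_exponentials = skew_raw)
```

A normal Q-Q plot, for example qqnorm(sample_means) followed by qqline(sample_means), would give a further visual check of the same property.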
Appendix
library(ggplot2)
g <- ggplot(sample_means_df, aes(x=sample_means))
g <- g + geom_histogram(binwidth = .3, color="black") +
geom_vline(aes(xintercept = theoretical_mean,
color="theoretical_mean"), size=1, linetype=2) +
geom_vline(aes(xintercept = simulation_mean,
color="simulation_mean"), size=1, linetype=2) +
scale_color_manual(values = c(simulation_mean = "blue", theoretical_mean = "red"))+
labs(x="Sample means distribution", y= "Frequency",
title="Figure-1: Comparing theoretical and simulated means")
g
g <- ggplot(sample_means_df, aes(x=sample_means))
g <- g + geom_histogram(binwidth = .3, color="black") +
geom_vline(aes(xintercept = theoretical_mean,
color="theoretical_mean"), size=1, linetype=2) +
geom_vline(aes(xintercept = simulation_mean,
color="simulation_mean"), size=1, linetype=2) +
geom_vline(aes(xintercept = simulation_mean+simulation_sd,
color="simulation_sd"), size=1, linetype=1) +
geom_vline(aes(xintercept = theoretical_mean+theoretical_sd,
color="theoretical_sd"), size=1, linetype=1) +
geom_vline(aes(xintercept = simulation_mean-simulation_sd,
color="simulation_sd"), size=1, linetype=1) +
geom_vline(aes(xintercept = theoretical_mean-theoretical_sd,
color="theoretical_sd"), size=1, linetype=1) +
scale_color_manual(values = c(simulation_mean = "blue",
theoretical_mean = "red",
simulation_sd = "green",
theoretical_sd = "orange"))+
labs(x="Sample means distribution", y= "Frequency",
title="Figure-2: Comparing theoretical and simulated variances")
g
g <- ggplot(sample_means_df, aes(x=sample_means))
g <- g + geom_histogram(binwidth = .3, color="black", aes(y = after_stat(density))) +
stat_function(fun=dnorm, args=list(mean=theoretical_mean, sd=theoretical_sd),
aes(color="normal_distribution"), size =1) +
stat_density(geom = "line", aes(color = "simulation_density"), size =1) +
geom_vline(aes(xintercept = theoretical_mean,
color="theoretical_mean"), size=1, linetype=2) +
geom_vline(aes(xintercept = simulation_mean,
color="simulation_mean"), size=1, linetype=2)+
scale_color_manual(values = c(simulation_mean = "blue",
theoretical_mean = "red",
simulation_density = "green",
normal_distribution = "orange"))+
labs(x="Sample means distribution", y= "density",
title="Figure-3: Density of Simulated Exponential Sample Means")
g