In this project I investigate the distribution of averages of exponential random variables in R and compare it with the predictions of the Central Limit Theorem (CLT). The exponential distribution can be simulated in R with rexp(n, rate), where rate is the parameter lambda. The rate used in this project is lambda = 0.2. I run 10,000 simulations (nsims = 1e4 in the code below), each with a sample size of 40. The mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda, so the simulated sample means should centre on 1/lambda = 5 with variance (1/lambda)^2 / 40 = 0.625.
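For reference, the theoretical values used throughout follow from standard properties of the exponential distribution and the CLT. Here lambda is written as λ and sample_size as n:

```latex
\begin{aligned}
X_i &\sim \operatorname{Exp}(\lambda), \qquad E[X_i] = \frac{1}{\lambda}, \qquad \operatorname{SD}(X_i) = \frac{1}{\lambda} \\
\bar{X}_n &= \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad E[\bar{X}_n] = \frac{1}{\lambda}, \qquad \operatorname{Var}(\bar{X}_n) = \frac{1}{n\lambda^2} \\
\bar{X}_n &\;\dot\sim\; N\!\left(\frac{1}{\lambda},\ \frac{1}{n\lambda^2}\right) \quad \text{(CLT, for large } n\text{)} \\
\lambda = 0.2,\ n = 40:&\qquad E[\bar{X}_{40}] = 5, \qquad \operatorname{Var}(\bar{X}_{40}) = \frac{25}{40} = 0.625
\end{aligned}
```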

Libraries

library(ggplot2)

Initialize the simulation variables. Setting the seed makes the results reproducible.

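set.seed(1)
lambda <- 0.2        # rate parameter of the exponential distribution
nsims <- 1e4         # number of simulated samples (10,000)
sample_size <- 40    # draws per sample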

A single sample consists of 40 random draws from an exponential distribution with lambda = 0.2.
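As a minimal sketch of one such sample (the names single_sample and single_mean are hypothetical, chosen only for this illustration), the following draws 40 exponentials and compares their mean with 1/lambda. A single mean can land well away from 5; it is the distribution of many such means that the CLT describes.

```r
# Hypothetical illustration: one sample of 40 exponential draws with rate 0.2.
set.seed(1)
lambda <- 0.2
sample_size <- 40

single_sample <- rexp(sample_size, rate = lambda)

# The mean of one sample scatters around the theoretical mean 1/lambda = 5.
single_mean <- mean(single_sample)
print(c(sample_mean = single_mean, theoretical_mean = 1 / lambda))
```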

Now we simulate nsims = 10,000 such samples of 40 draws each, with lambda = 0.2, and store them in a matrix with one sample per row. These are independent simulations, not bootstrap resamples.

resamples <- matrix(rexp(nsims*sample_size, rate = lambda), nrow = nsims, ncol = sample_size)

Calculate the mean of each row, i.e. of each simulated sample of 40:

resamplesMean <- rowMeans(resamples)
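The same sampling distribution can also be built with replicate(), which draws one sample per iteration and keeps only its mean. The sketch below puts both approaches side by side; sim_means and matrix_means are hypothetical names for this illustration. Because the two approaches consume different random draws, their values differ, but both should centre near 5. The matrix version is usually faster because rexp() is called only once.

```r
set.seed(1)
lambda <- 0.2
nsims <- 1e4
sample_size <- 40

# replicate(): each iteration draws a fresh sample of 40 and returns its mean.
sim_means <- replicate(nsims, mean(rexp(sample_size, rate = lambda)))

# Matrix approach: one sample per row, then one mean per row.
resamples <- matrix(rexp(nsims * sample_size, rate = lambda), nrow = nsims)
matrix_means <- rowMeans(resamples)

c(replicate = mean(sim_means), matrix = mean(matrix_means))
```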

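Let’s visualize the sampling distribution of the means. The CLT predicts that it is approximately normal and centred at 1/lambda = 5; the dashed red line marks the average of the simulated means.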

g <- ggplot(data.frame(x = resamplesMean), aes(x = x))
g <- g + geom_histogram(breaks = seq(2, 9, 0.2), col = 'blue', aes(fill = after_stat(count)))
g <- g + geom_vline(xintercept = mean(resamplesMean), linewidth = 1, linetype = 'dashed', col = 'red')
g <- g + labs(title = 'Histogram of Means of 40 Exponentials', x = 'Sample means', y = 'Frequency')
print(g)
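To compare the histogram directly with the CLT, one option is to plot it on the density scale and overlay the normal density N(1/lambda, (1/lambda)^2 / n). The sketch below uses base R graphics rather than ggplot2, purely so that it runs without extra packages; clt_mean and clt_sd are hypothetical names for the theoretical mean and standard deviation of the sample means.

```r
set.seed(1)
lambda <- 0.2
nsims <- 1e4
sample_size <- 40

resamplesMean <- rowMeans(matrix(rexp(nsims * sample_size, rate = lambda), nrow = nsims))

# Theoretical mean and standard deviation of the sample means under the CLT.
clt_mean <- 1 / lambda
clt_sd <- (1 / lambda) / sqrt(sample_size)

# freq = FALSE puts the histogram on the density scale so it is comparable to dnorm().
h <- hist(resamplesMean, breaks = 50, freq = FALSE,
          main = "Sample means vs. CLT normal density", xlab = "Sample means")
curve(dnorm(x, mean = clt_mean, sd = clt_sd), add = TRUE, col = "red", lwd = 2)
```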

Compare means:

Now we compare the mean of the simulated sample means with the theoretical mean 1/lambda. The simulated value of 5.003 is close to the theoretical value of 5.0:

c(mean(resamplesMean), 1/lambda)
## [1] 5.002873 5.000000
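The introduction also states that the standard deviation of the exponential distribution itself is 1/lambda. A small sketch can check this by pooling all individual draws from the simulation matrix (draws is a hypothetical name); both their mean and their standard deviation should come out near 5.

```r
set.seed(1)
lambda <- 0.2
nsims <- 1e4
sample_size <- 40
resamples <- matrix(rexp(nsims * sample_size, rate = lambda), nrow = nsims)

# Pool all 400,000 individual draws; mean and SD should both be near 1/lambda = 5.
draws <- as.vector(resamples)
c(mean = mean(draws), sd = sd(draws), theory = 1 / lambda)
```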

Compare variance:

The variance of the simulated sample means, 0.626, is also close to the theoretical variance (1/lambda)^2 / n = 25/40 = 0.625:

c(var(resamplesMean), ((1/lambda)^2)/sample_size)
## [1] 0.6261976 0.6250000
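The formula (1/lambda)^2 / n predicts that the variance of the sample means shrinks in proportion to 1/n. The sketch below checks this for a few sample sizes; the values 10 and 160 are illustrative choices not used elsewhere in the project, and sizes, sim_var and theory_var are hypothetical names.

```r
set.seed(1)
lambda <- 0.2
nsims <- 1e4

# Variance of simulated means for several sample sizes vs. theory (1/lambda)^2 / n.
sizes <- c(10, 40, 160)
sim_var <- sapply(sizes, function(n) {
  means <- rowMeans(matrix(rexp(nsims * n, rate = lambda), nrow = nsims))
  var(means)
})
theory_var <- (1 / lambda)^2 / sizes
rbind(n = sizes, simulated = sim_var, theoretical = theory_var)
```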

Show that the distribution is approximately normal

  1. Use a normal Q-Q plot to check normality. In the plot produced below, the sample quantiles lie approximately on the reference line through the theoretical normal quantiles.
qqnorm(resamplesMean)
qqline(resamplesMean)

  2. Compare the 95% intervals implied by the simulated and the theoretical moments. Since var(resamplesMean) already estimates (1/lambda)^2 / n, the standard deviation of the sample means is used directly rather than being divided by sample_size a second time.
samp_ci <- mean(resamplesMean) + c(-1, 1) * 1.96 * sd(resamplesMean)
theor_ci <- 1/lambda + c(-1, 1) * 1.96 * (1/lambda) / sqrt(sample_size)

rbind(samp_ci, theor_ci)

Both intervals are approximately (3.45, 6.55), so the theoretical interval is close to the one obtained from the simulation.
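As a numeric complement to the Q-Q plot and the interval comparison, one can compare a few simulated quantiles with the corresponding normal quantiles and count how many simulated means fall inside the theoretical 95% interval. This is a sketch; probs, clt_mean, clt_sd and coverage are hypothetical names. The exponential is right-skewed, so the upper simulated quantile may sit slightly above the normal one at n = 40.

```r
set.seed(1)
lambda <- 0.2
nsims <- 1e4
sample_size <- 40
resamplesMean <- rowMeans(matrix(rexp(nsims * sample_size, rate = lambda), nrow = nsims))

clt_mean <- 1 / lambda
clt_sd <- (1 / lambda) / sqrt(sample_size)

# Simulated vs. normal quantiles at the 2.5%, 50% and 97.5% levels.
probs <- c(0.025, 0.5, 0.975)
rbind(simulated = quantile(resamplesMean, probs),
      normal = qnorm(probs, mean = clt_mean, sd = clt_sd))

# Share of simulated means inside the theoretical 95% interval; the CLT suggests about 0.95.
coverage <- mean(abs(resamplesMean - clt_mean) <= 1.96 * clt_sd)
coverage
```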