In this project we investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with \(rexp(n, lambda)\), where lambda is the rate parameter. The mean of the exponential distribution is \(1/\lambda\), and its standard deviation is also \(1/\lambda\). We set lambda = 0.2 for all of the simulations and investigate the distribution of averages of 40 exponentials, using a thousand simulations.

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials.

Requirement 1: Show the sample mean and compare it to the theoretical mean of the distribution.

First, we set the simulation parameters and calculate the theoretical mean of the distribution and the theoretical standard deviation of the sample mean.

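nosim <- 1000  # number of simulations
n <- 40        # exponentials averaged per simulation
lambda <- 0.2  # rate parameter
theoretical_mean <- 1 / lambda               # population mean
theoretical_sigma <- (1 / lambda) / sqrt(n)  # standard deviation of the sample mean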

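Note that the standard deviation of the sample means (the standard error) is the population standard deviation divided by the square root of the sample size: \(\sigma/\sqrt{n}\).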
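The standard error formula follows from the independence of the 40 draws in each simulation. As a short worked derivation, writing \(\bar{X}\) for the mean of \(n\) independent exponentials \(X_i\) with common standard deviation \(\sigma = 1/\lambda\) (the symbols \(\bar{X}\) and \(X_i\) are introduced here for illustration only):

```latex
\[
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n},
\qquad
\operatorname{sd}(\bar{X})
  = \frac{\sigma}{\sqrt{n}}
  = \frac{1/\lambda}{\sqrt{n}}
  = \frac{5}{\sqrt{40}}
  \approx 0.79
\]
```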

Next, we draw the simulated data into a 1000 × 40 matrix (one row per simulation) and use the ‘apply()’ function to compute a vector of 1000 means, one for each row of 40 samples.

simulation_data <- matrix(rexp(nosim * n, lambda), nosim)
x_bars <- apply(simulation_data, 1, mean)
c(mean(x_bars), 1/lambda) # Empirical vs theoretical mean
## [1] 4.933097 5.000000
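Because no random seed is set, rerunning the simulation gives slightly different numbers each time. The following is a minimal sketch of a reproducible variant; the seed value 1 is an arbitrary assumption and will not reproduce the figures printed in this report. It also checks that base R's `rowMeans()` gives the same row-wise means as the `apply()` call.

```r
# Sketch: reproducible version of the simulation step.
# The seed value is an arbitrary assumption; the original report sets no seed.
set.seed(1)
nosim <- 1000; n <- 40; lambda <- 0.2

# One row per simulation, 40 exponential draws per row
simulation_data <- matrix(rexp(nosim * n, lambda), nrow = nosim)

# rowMeans() computes the same row-wise means as apply(..., 1, mean)
x_bars_apply <- apply(simulation_data, 1, mean)
x_bars_rowmeans <- rowMeans(simulation_data)

# Compare the two approaches, then the empirical and theoretical means
print(all.equal(x_bars_apply, x_bars_rowmeans))
print(c(empirical = mean(x_bars_rowmeans), theoretical = 1 / lambda))
```

`rowMeans()` is usually faster than `apply()` on large matrices, though at this size the difference is negligible.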

The mean of the sample means is clearly close to the theoretical population mean. The following histogram of the means shows this: the red vertical line marks the average of the means, which lies near the theoretical mean of 5:

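library(ggplot2)
# Histogram of the 1000 sample means on the density scale
# (after_stat() and linewidth need ggplot2 >= 3.4.0)
g <- ggplot(data.frame(x_bars = x_bars), aes(x = x_bars)) +
  geom_histogram(aes(y = after_stat(density)), alpha = 0.2, binwidth = 0.15, colour = "black") +
  geom_vline(xintercept = mean(x_bars), colour = "red", linewidth = 1)  # mean of the sample means
print(g)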

Requirement 2: Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

c('Empirical variance: ', var(x_bars))
## [1] "Empirical variance: " "0.562689555630272"
c('theoretical variance: ', (1/lambda)^2/n)
## [1] "theoretical variance: " "0.625"

We can see that the variance of the 1000 means (about 0.563) is close to the theoretical variance \(\sigma^2/n = 0.625\).
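To put a number on how close the two variances are, the relative difference can be computed from the two values printed above. This is only an illustrative sketch; the variable names are introduced here and are not part of the original analysis. Keeping the values numeric, rather than pasting them into a character vector with a label, makes this kind of arithmetic possible.

```r
# Values copied from the printed output above
empirical_var <- 0.562689555630272
theoretical_var <- 0.625  # (1/lambda)^2 / n with lambda = 0.2 and n = 40

# Relative difference of the empirical variance from the theoretical one
rel_diff <- (empirical_var - theoretical_var) / theoretical_var

# Express it as a percentage, rounded to one decimal place
print(round(100 * rel_diff, 1))
```

For this particular run the empirical variance is roughly 10% below the theoretical value; a different run would give a different gap.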

Requirement 3: Show that the distribution is approximately normal.

First, let’s see what the exponential distribution itself looks like:

hist(rexp(nosim, lambda))

This strongly right-skewed shape has nothing in common with a normal distribution. However, look at the distribution of the 1000 means of samples of size 40:

hist(x_bars)

It closely resembles a normal distribution. That’s what the Central Limit Theorem states:

the sampling distribution of the sample mean approximates the normal distribution, regardless of the distribution of the population from which the samples are drawn, if the sample size is sufficiently large.
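A histogram gives only a rough impression of normality. As a complementary check that is not part of the original analysis, the sketch below draws a normal Q-Q plot and computes the sample skewness by hand. It regenerates the means with an arbitrary seed (an assumption), so its numbers will differ from those reported here. The theoretical skewness of a mean of \(n\) independent exponentials is \(2/\sqrt{n}\), so a small right skew is still expected at n = 40.

```r
# Sketch: additional normality checks for the simulated means.
# The seed is an arbitrary assumption; the original report sets none.
set.seed(1)
nosim <- 1000; n <- 40; lambda <- 0.2
x_bars <- rowMeans(matrix(rexp(nosim * n, lambda), nrow = nosim))

# Normal Q-Q plot: points lying near the reference line suggest approximate normality
qqnorm(x_bars, main = "Normal Q-Q plot of 1000 means of 40 exponentials")
qqline(x_bars, col = "red")

# Sample skewness (third standardized moment), computed by hand
sample_skewness <- mean((x_bars - mean(x_bars))^3) / sd(x_bars)^3

# Compare with the theoretical skewness of a mean of n exponentials
print(c(sample = sample_skewness, theoretical = 2 / sqrt(n)))
```

A slight upward bend in the right tail of the Q-Q plot would be consistent with this remaining skew.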

Let’s put it all together:

g <- ggplot(data.frame(x_bars = x_bars), aes(x = x_bars)) +
  # kernel density estimate of the simulated means
  geom_line(aes(y = after_stat(density), colour = 'Empirical'), stat = 'density', linewidth = 1) +
  # normal density with the theoretical mean and standard deviation
  stat_function(fun = dnorm, args = list(mean = theoretical_mean, sd = theoretical_sigma),
                aes(colour = 'Theoretical'), linewidth = 1) +
  geom_histogram(aes(y = after_stat(density)), alpha = 0.2, binwidth = 0.15, colour = "black") +
  geom_vline(xintercept = mean(x_bars), colour = 'red', linewidth = 1) +
  scale_colour_manual(name = '', values = c('red', 'blue')) +
  theme(legend.position = c(0.85, 0.85)) + xlab("means of observation")
print(g)

From the plot above, we can clearly see that the mean of the simulated means is very close to the theoretical mean, \(1/0.2 = 5\). At the same time, the empirical density of the means closely follows the normal density with mean \(1/\lambda\) and standard deviation \((1/\lambda)/\sqrt{n}\).