Exponential Distribution and Central Limit Theorem

For this part of the project we’re going to investigate the exponential distribution in R and compare it with the Central Limit Theorem. The Central Limit Theorem specifies that the means drawn from multiple samples will resemble the bell-shaped normal curve.

The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Using lambda = 0.2 for all of the simulations, we’re goint to investigate the distribution of averages of 40 exponentials, running one thousand (1000) simulations.

Simulation

1. Load required R Libraries

library(ggplot2)

2. Set initial values

set.seed(999)
num_sims <- 1000
n <- 40
lambda <- 0.2

3. Run the Simulations

To run the simulations, a Matrix of num_sims by n (1000x40) was created and populated with the R function rexp.

ed_matrix <- matrix(rexp(n*num_sims, lambda), num_sims, n)

4. Get the Means

Now, the mean of every simulation was calculated using rowMeans. Note that the result was stored as data frame, this is to allow the use of ggplot2 for our plots.

ed_mean <- as.data.frame(rowMeans(ed_matrix))
names(ed_mean) <- 'SampleMeans'

Theoretical Mean vs Sample Mean

The theoretical Mean, as explained before, is 1/lambda:

theoretical_mean <- 1/lambda
theoretical_mean

## [1] 5

While the Sample Mean is the Mean of our simulations:

sample_mean <- mean(ed_mean$SampleMeans)
sample_mean

## [1] 5.029028

Plotting the distribution of samples, along with the values of the Means, we can see the bell-shaped form we expected, as well as how close the computation of the means (Theoretical and Sample) are.

plot1 <- ggplot(ed_mean, aes(SampleMeans)) + 
    geom_histogram(breaks=seq(2, 8, by=0.25), fill="gold") +
    geom_vline(aes(xintercept = theoretical_mean, color="Theoretical")) +
    geom_vline(aes(xintercept = sample_mean, color="Sample"), linetype=5) +
    scale_color_manual(name = "Mean", values = c(Theoretical = "black", Sample = "red")) +
    labs(title="Sample Means Distribution")
print(plot1)

Theoretical Variance vs Sample Variance

The Theoretical Variance for the exponential distribution is:

theoretical_var <- (1/lambda)^2/n
theoretical_var

## [1] 0.625

The Sample variance is:

sample_var <- var(ed_mean$SampleMeans)
sample_var

## [1] 0.5930372

Confidence Intervals can be calculated with the formula \[\ \bar{X} \pm 1.96\sigma/\sqrt{n} \]

Confidence interval for both theoretical and sample values are calculated below:

theoretical_ci <- theoretical_mean + c(-1,1)*1.96*sqrt(theoretical_var)/sqrt(n)
theoretical_ci

## [1] 4.755 5.245

sample_ci <- sample_mean + c(-1,1)*1.96*sqrt(sample_var)/sqrt(n)
sample_ci

## [1] 4.790375 5.267681

Distribution

A QQ-Plot is very useful to show how a sample distribution is close to a normal distribution.

plot2 <- ggplot(ed_mean, aes(sample=SampleMeans)) +
    geom_point(stat="qq", shape=21, size=3) +
    geom_abline(intercept=sample_mean, slope=sqrt(sample_var), color="red") +
    labs(title="Normal QQ Plot")
print(plot2)

The mean, variance and confidence intervals obtained from our simulation showed values very close to the theoretical values. Likewise, the graphs are consistent with what is expected from a normal distribution, confirming the Central Limit Theorem.

Statistical Inference Course Project Part 1

adanlp

March 25, 2018