Comparison of exponential distribution in R with the Central Limit Theorem

Synopsis

In this project I shall investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. I shall set lambda = 0.2 for all of the simulations. I shall investigate the distribution of averages of 40 random exponentials, doing so with a thousand simulations.

This investigation shall then:

Show the sample mean and compare it to the theoretical mean of the distribution.
Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
Show that the distribution is approximately normal.

Simulation

The following R code will run 1,000 simulations whereby 40 random exponential values, and the mean of each set of 40 random exponential values, will be generated in each simulation.

# Deliberately not setting a seed as the results should be similar for any 1,000 simulations of 40 exponential values drawn randomly.

# Set simulation parameters
        sims = 1000     # number of simulations
        n = 40          # number of exponentials
        lambda = 0.2    # defined in assignment instructions
        means = NULL
        
# Run simulations
        for (i in 1 : sims) means = c(means, mean(rexp(n, lambda)))
        summary(means)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.745   4.519   5.027   5.041   5.516   7.664

Compare theoretical mean with sample mean

The theoretical mean is calculated in R as 1/lambda.

The sample mean is calculated in R as mean(means).

# Calculate theoretical mean
        theo_mean <- 1/lambda

# Calculate sample mean
        sample_mean <- mean(means)

The theoretical mean is 5.

The sample mean is 5.040891.

The difference between the theoretical mean and the sample mean is -0.040891.

# Create histogram showing distribution of simulation means
        hist(means, col = "light blue", main = "Random exponential values (1,000 simulations)",
             sub = "Sample mean in red", xlab = "Mean values")

# Show sample mean as vertical red line
        abline(v = sample_mean, col = "red", lwd = 2)

Compare theoretical variance with sample variance

The theoretical variance is calculated in R as the square of the theoretical mean divided by the number of random exponential values in each simulation.

The sample variance is calculated in R as var(means).

# Calculate theoretical variance
        theo_var <- (1/lambda)^2 / n

# Calculate sample mean
        sample_var <- var(means)

The theoretical variance is 0.625.

The sample variance is 0.5902553.

The difference between the theoretical variance and the sample variance is 0.0347447.

Compare theoretical distribution with sample distribution

The theoretical distribution is plotted as a normal distribution curve (red line) with mean = theoretical mean and standard deviation = square root of theoretical variance.

The sample distribution is plotted using the histogram generated earlier, but with frequencies converted to probabilities. This histogram is also overlaid with a distribution curve (blue line) of the 1,000 sample means calculated earlier.

# Create histogram showing probability distribution of simulation means
        hist(means, prob = T, col = "light blue", main = "Random exponential values (1,000 simulations)",
             xlab = "Mean values")

# Create sample distribution curve (blue line)
        lines(density(means), col = "blue", lwd = 2)
        
# Create theoretical distribution curve (red line)
        x <- seq(min(means), max(means), length = sims)
        y <- dnorm(x, theo_mean, sqrt(theo_var))
        lines(x, y, col = "red", lwd = 2)

The histogram and sample distribution curve (blue line) both approximate the theoretical normal distribution curve (red line) relatively closely.

Another method with which the theoretical and sample distributions can be compared is the Normal Q-Q plot. This plots the mean random exponential variables from the 1,000 simulations as data points against a normal QQ-line, which represents the theoretical normal distribution.

# Plot sample distribution as blue data points.
        qqnorm(means, col = "blue")

# Plot theoretical distribution as a red line.
        qqline(means, col = "red", lwd = 3)

The Normal Q-Q Plot shows that the sample distribution approximates the theoretical normal distribution relatively closely.

Software versions used

# print relevant software and versions
        sessionInfo()

## R version 3.5.3 (2019-03-11)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.5.3  magrittr_1.5    tools_3.5.3     htmltools_0.3.6
##  [5] yaml_2.2.0      Rcpp_1.0.0      stringi_1.4.3   rmarkdown_1.12 
##  [9] knitr_1.22      stringr_1.4.0   xfun_0.5        digest_0.6.18  
## [13] evaluate_0.13