Overview

The following report covers Task 1 of the final project for the Coursera course “Statistical Inference”. In this project I investigate the exponential distribution in R and compare it with the Central Limit Theorem.

Simulations

A thousand simulations are performed. The rate parameter lambda is set to 0.2, and each simulation draws 40 values from the exponential distribution. The report studies the distribution of the means of these 40 draws.
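In symbols (my notation, not taken from the assignment text), each simulation draws $X_1, \dots, X_{40}$ independently from an exponential distribution with rate $\lambda = 0.2$ and records their mean. The Central Limit Theorem then suggests that this mean is approximately normal, where the second parameter of $\mathcal{N}$ is the variance:

$$
\bar{X}_{40} = \frac{1}{40}\sum_{i=1}^{40} X_i \;\approx\; \mathcal{N}\!\left(\frac{1}{\lambda},\; \frac{1}{40\,\lambda^{2}}\right) = \mathcal{N}(5,\; 0.625)
$$

So the simulated means should center near 5 with a standard deviation of roughly $\sqrt{0.625} \approx 0.79$.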

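The simulations illustrate:

  1. the sample mean compared with the theoretical mean

  2. the sample variance compared with the theoretical variance

  3. that the distribution of the sample means is approximately normal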

Sample vs Theoretical means

Setup

set.seed(2612)
library(tidyverse)
library(ggrepel)
#library(qqplotr)

#install.packages("ggrepel")
#install.packages("qqplotr")

# rate parameter, given by the assignment
lambda <- 0.2

# number of exponential draws averaged in each simulation (the sample size)
n_of_distributions <- 40

# number of simulations
number_of_simulations <- 1000

# Run the simulations: replicate() returns a 40 x 1000 matrix
# in which each column holds one sample of 40 exponential draws
simulations <- replicate(number_of_simulations, 
                         rexp(n_of_distributions, lambda))
glimpse(simulations)
##  num [1:40, 1:1000] 7.75 1.68 6.89 6.86 21.05 ...

Mean comparison

The theoretical mean of the exponential distribution is 1/lambda:

theoretical_mean <- 1/lambda
paste0('Theoretical mean: ', theoretical_mean)
## [1] "Theoretical mean: 5"

To obtain the sample mean, I compute the mean of each simulation (each column of the matrix) and then take the mean of these 1000 sample means:

# mean of each column, i.e. of each simulation of 40 draws
simluated_means <- apply(simulations, 2, mean)

mean_of_sampled_means <- mean(simluated_means)
paste0('Sample mean: ', mean_of_sampled_means)
## [1] "Sample mean: 5.00834889322193"
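To judge whether the gap between 5.008 and 5 is meaningful, a rough check is to compare it with the standard error of the mean of the 1000 simulated means. This sketch assumes the draws are independent, so that each simulated mean has standard deviation $\sigma/\sqrt{40}$ with $\sigma = 1/\lambda$; the variable names are illustrative.

```r
# Rough check: is the observed gap between the mean of the sample means
# and the theoretical mean 1/lambda small relative to chance variation?
lambda <- 0.2
n <- 40              # draws averaged in each simulation
n_sims <- 1000       # number of simulations
sigma <- 1 / lambda  # standard deviation of a single exponential draw

# standard error of one simulated mean, then of the mean of n_sims such means
se_one_mean <- sigma / sqrt(n)
se_grand_mean <- se_one_mean / sqrt(n_sims)

# gap reported above, expressed in units of the standard error
observed_gap <- 5.00834889322193 - 1 / lambda
z <- observed_gap / se_grand_mean

round(c(se_grand_mean = se_grand_mean, z = z), 3)
```

With these numbers the standard error is 0.025, so the observed gap of about 0.008 is roughly a third of a standard error, which is consistent with ordinary sampling variation.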

Visually this can be displayed as:
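One possible way to draw this comparison is a histogram of the simulated means with vertical lines at the theoretical and the sample mean. This sketch regenerates the simulations with the same seed and parameters as the setup and uses base R graphics rather than ggplot2, so it needs no extra packages.

```r
# Regenerate the simulations with the setup's seed and parameters
set.seed(2612)
lambda <- 0.2
n <- 40
n_sims <- 1000
simulations <- replicate(n_sims, rexp(n, lambda))

# one mean per simulation (per column)
sim_means <- colMeans(simulations)

# histogram of the sample means with both reference lines
hist(sim_means, breaks = 30,
     main = "Distribution of 1000 means of 40 exponentials",
     xlab = "Sample mean")
abline(v = 1 / lambda, col = "red", lwd = 2)                # theoretical mean
abline(v = mean(sim_means), col = "blue", lwd = 2, lty = 2) # sample mean
legend("topright", legend = c("Theoretical mean", "Sample mean"),
       col = c("red", "blue"), lty = c(1, 2), lwd = 2)
```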

Sample vs Theoretical variances

The theoretical variance of the exponential distribution is 1/lambda^2:

theoretical_variance <- 1/(lambda^2)
paste0('Theoretical variance: ', theoretical_variance)
## [1] "Theoretical variance: 25"
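The value 25 is the variance of a single draw. As a complementary check, the Central Limit Theorem predicts that the variance of the simulated means should be close to $\sigma^2 / n = 25 / 40 = 0.625$. A minimal sketch, regenerating the simulations with the setup's parameters (the names `sim_means`, `theoretical_var_of_means` and `observed_var_of_means` are illustrative):

```r
# Regenerate the simulations with the setup's seed and parameters
set.seed(2612)
lambda <- 0.2
n <- 40
n_sims <- 1000
simulations <- replicate(n_sims, rexp(n, lambda))
sim_means <- colMeans(simulations)

# CLT prediction for the variance of a mean of n draws: sigma^2 / n
theoretical_var_of_means <- (1 / lambda^2) / n
observed_var_of_means <- var(sim_means)

c(theoretical = theoretical_var_of_means, observed = observed_var_of_means)
```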

Just as I computed a mean for each simulation, I can compute the variance of each simulation and then average the 1000 variances:

# sample variance of each column, i.e. of each simulation of 40 draws
simulated_variances <- apply(simulations, 2, var)

mean_of_sampled_variances <- mean(simulated_variances)
paste0('Sample data variance: ', mean_of_sampled_variances)
## [1] "Sample data variance: 25.3161795497843"

Visually, the difference can be displayed, for example, as follows:
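As with the means, one possible display is a histogram of the 1000 simulated variances with vertical lines at the theoretical variance of 25 and at their average. This is a base R sketch that regenerates the simulations with the setup's seed and parameters:

```r
# Regenerate the simulations with the setup's seed and parameters
set.seed(2612)
lambda <- 0.2
n <- 40
n_sims <- 1000
simulations <- replicate(n_sims, rexp(n, lambda))

# sample variance of each simulation (column)
sim_variances <- apply(simulations, 2, var)

hist(sim_variances, breaks = 30,
     main = "Distribution of 1000 sample variances",
     xlab = "Sample variance")
abline(v = 1 / lambda^2, col = "red", lwd = 2)                  # theoretical variance
abline(v = mean(sim_variances), col = "blue", lwd = 2, lty = 2) # mean of sample variances
legend("topright", legend = c("Theoretical variance", "Mean sample variance"),
       col = c("red", "blue"), lty = c(1, 2), lwd = 2)
```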

The distribution of the means is approximately normal

Two checks are used to assess normality:

  1. a quantile-quantile (qq) plot

  2. the density of the simulated means compared with the density of a normal distribution with the same mean and standard deviation
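For the first check, a quantile-quantile plot of the simulated means against normal quantiles should lie close to a straight line if the means are approximately normal. A base R sketch using `qqnorm()` and `qqline()`, which regenerates the simulations with the setup's parameters; the correlation `qq_correlation` is an illustrative numeric companion to the plot:

```r
# Regenerate the simulated means with the setup's seed and parameters
set.seed(2612)
lambda <- 0.2
n <- 40
n_sims <- 1000
sim_means <- colMeans(replicate(n_sims, rexp(n, lambda)))

# points near the reference line indicate approximate normality
qqnorm(sim_means, main = "Normal Q-Q plot of simulated means")
qqline(sim_means, col = "red", lwd = 2)

# numeric companion: correlation between sample and normal quantiles
qq <- qqnorm(sim_means, plot.it = FALSE)
qq_correlation <- cor(qq$x, qq$y)
```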

If we take a normal distribution with the same mean and standard deviation as our simulated means and compare the two densities, we can “eyeball” the plot to see how closely they match.
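A sketch of that comparison, assuming base R graphics: the histogram and kernel density estimate of the simulated means, overlaid with the density of a normal distribution that has the same mean and standard deviation. The simulations are regenerated with the setup's parameters.

```r
# Regenerate the simulated means with the setup's seed and parameters
set.seed(2612)
lambda <- 0.2
n <- 40
n_sims <- 1000
sim_means <- colMeans(replicate(n_sims, rexp(n, lambda)))

# normal distribution matched to the simulated means
m <- mean(sim_means)
s <- sd(sim_means)

hist(sim_means, breaks = 30, freq = FALSE,
     main = "Simulated means vs matched normal density",
     xlab = "Sample mean")
lines(density(sim_means), col = "blue", lwd = 2)  # empirical (kernel) density
curve(dnorm(x, mean = m, sd = s), add = TRUE,     # matched normal density
      col = "red", lwd = 2, lty = 2)
legend("topright", legend = c("Simulated means", "Normal, same mean and sd"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)
```

Matching the normal curve to the sample's own mean and standard deviation isolates the question of shape; comparing against the theoretical values 5 and about 0.79 instead would mix in the separate question of location and spread.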