This report covers Task 1 of the final project for the Coursera course “Statistical Inference”. In it, I investigate the exponential distribution in R and compare it with the Central Limit Theorem.
A thousand simulations are performed. The rate parameter lambda is set to 0.2, and each simulation takes the mean of 40 draws from the exponential distribution, so the object of study is the distribution of these 1,000 sample means.
The simulations illustrate:
how the sample mean compares with the theoretical mean of the distribution,
how variable the sample is, compared with the theoretical variance of the distribution,
that the distribution of the sample means is approximately normal.
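set.seed(2612)          # make the simulations reproducible
library(tidyverse)
library(ggrepel)        # non-overlapping text labels for ggplot2
#library(qqplotr)       # Q-Q plots with confidence bands for ggplot2
#install.packages("ggrepel")
#install.packages("qqplotr")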
# rate parameter, given by the assignment
lambda <- 0.2
# sample size: number of exponential draws averaged in each simulation
n_of_distributions <- 40
# number of simulations
number_of_simulations <- 1000
# Run the simulations: a 40 x 1000 matrix with one simulation per column
simulations <- replicate(number_of_simulations,
                         rexp(n_of_distributions, lambda))
glimpse(simulations)
## num [1:40, 1:1000] 7.75 1.68 6.89 6.86 21.05 ...
The theoretical mean of the exponential distribution is 1/lambda:
theoretical_mean <- 1/lambda
paste0('Theoretical mean: ', theoretical_mean)
## [1] "Theoretical mean: 5"
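As a quick numerical sanity check, which is not part of the original analysis, the same value can be recovered by integrating x times the exponential density with base R's `integrate()` and `dexp()`. The rate 0.2 is the assignment's lambda. The variable name `numeric_mean` is introduced only for this illustration.

```r
# Rate parameter from the assignment
lambda <- 0.2

# E[X] = integral of x * f(x) over [0, Inf), where f is the exponential density
numeric_mean <- integrate(function(x) x * dexp(x, rate = lambda),
                          lower = 0, upper = Inf)$value

numeric_mean  # should be very close to 1/lambda = 5
```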
For the sample mean, I compute the mean of each simulation (each column of the matrix) and then take the mean of these 1,000 sample means.
simluated_means <- apply(simulations, 2, mean)
mean_of_sampled_means <- mean(simluated_means)
paste0('Sample mean: ', mean_of_sampled_means)
## [1] "Sample mean: 5.00834889322193"
Visually, this can be displayed as follows:
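One way to produce such a plot is sketched below using ggplot2, which the report loads through tidyverse. This is an illustration, not necessarily the figure used in the original report. The block regenerates the simulations so it runs on its own. Because `library()` calls come after `set.seed()` in the report, the exact numbers may differ slightly from those shown above. The names `means_df` and `means_plot` are hypothetical.

```r
library(ggplot2)

# Re-create the simulations so this block is self-contained
set.seed(2612)
lambda <- 0.2
n_of_distributions <- 40
number_of_simulations <- 1000
simulations <- replicate(number_of_simulations,
                         rexp(n_of_distributions, lambda))

# One mean per simulation (column)
means_df <- data.frame(sample_mean = colMeans(simulations))

theoretical_mean <- 1 / lambda
mean_of_sampled_means <- mean(means_df$sample_mean)

means_plot <- ggplot(means_df, aes(x = sample_mean)) +
  geom_histogram(binwidth = 0.25, fill = "grey80", colour = "grey40") +
  # Solid red line: theoretical mean; dashed blue line: mean of the simulated means
  geom_vline(xintercept = theoretical_mean, colour = "red") +
  geom_vline(xintercept = mean_of_sampled_means, colour = "blue",
             linetype = "dashed") +
  labs(x = "Mean of 40 exponential draws", y = "Count",
       title = "Distribution of 1,000 simulated sample means")

means_plot
```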
The theoretical variance of the exponential distribution is 1/lambda^2:
theoretical_variance <- 1/(lambda^2)
paste0('Theoretical variance: ', theoretical_variance)
## [1] "Theoretical variance: 25"
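The same kind of numerical check works for the variance, which is the expected squared deviation from the mean. Below is a minimal sketch that uses base R's `integrate()` with the theoretical mean 1/lambda. The names `mu` and `numeric_variance` are illustrative only.

```r
lambda <- 0.2
mu <- 1 / lambda  # theoretical mean

# Var[X] = integral of (x - mu)^2 * f(x) over [0, Inf)
numeric_variance <- integrate(function(x) (x - mu)^2 * dexp(x, rate = lambda),
                              lower = 0, upper = Inf)$value

numeric_variance  # should be very close to 1/lambda^2 = 25
```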
Just as with the means, I calculate the variance of each simulation and then average the 1,000 sample variances:
simulated_variances <- apply(simulations, 2, var)
mean_of_sampled_variances <- mean(simulated_variances)
paste0('Sample data variance: ', mean_of_sampled_variances)
## [1] "Sample data variance: 25.3161795497843"
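The average of the per-simulation variances estimates the variance of a single exponential draw. A complementary check, closer to what the Central Limit Theorem predicts, is the variance of the 1,000 sample means. This should be near (1/lambda^2)/n = 25/40 = 0.625. The sketch below regenerates the simulations so it is self-contained. The names `variance_of_means` and `theoretical_variance_of_means` are hypothetical.

```r
set.seed(2612)
lambda <- 0.2
n_of_distributions <- 40
simulations <- replicate(1000, rexp(n_of_distributions, lambda))

# Variance of the simulated sample means (one mean per column)
variance_of_means <- var(colMeans(simulations))

# CLT prediction: population variance divided by the sample size
theoretical_variance_of_means <- (1 / lambda^2) / n_of_distributions

c(simulated = variance_of_means, theoretical = theoretical_variance_of_means)
```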
The differences can be displayed visually, for example as follows:
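A possible version of such a plot, sketched with ggplot2, shows the density of the 1,000 sample variances. A solid line marks the theoretical variance 1/lambda^2 and a dashed line marks the average sample variance. This is an illustration rather than the report's original figure, and `variances_df` and `variances_plot` are hypothetical names.

```r
library(ggplot2)

# Re-create the simulations so this block is self-contained
set.seed(2612)
lambda <- 0.2
simulations <- replicate(1000, rexp(40, lambda))

# One sample variance per simulation (column)
variances_df <- data.frame(sample_variance = apply(simulations, 2, var))

variances_plot <- ggplot(variances_df, aes(x = sample_variance)) +
  geom_density(fill = "grey85") +
  # Solid red line: theoretical variance; dashed blue line: mean sample variance
  geom_vline(xintercept = 1 / lambda^2, colour = "red") +
  geom_vline(xintercept = mean(variances_df$sample_variance),
             colour = "blue", linetype = "dashed") +
  labs(x = "Sample variance of 40 exponential draws", y = "Density")

variances_plot
```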
If we take a normal distribution with the same mean and standard deviation as the simulated sample means and overlay its density on theirs, we can “eyeball” the plot to see how closely the two densities agree.
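A minimal sketch of that comparison follows. It plots the empirical density of the sample means in blue and uses ggplot2's `stat_function()` with `dnorm` to draw the matching normal density as a dashed red curve. A base R normal Q-Q plot is added as a second visual check. The block regenerates the simulations, and the names `fit_mean`, `fit_sd` and `overlay_plot` are hypothetical.

```r
library(ggplot2)

# Re-create the simulations so this block is self-contained
set.seed(2612)
lambda <- 0.2
simulations <- replicate(1000, rexp(40, lambda))
means_df <- data.frame(sample_mean = colMeans(simulations))

# Parameters of the matching normal distribution
fit_mean <- mean(means_df$sample_mean)
fit_sd   <- sd(means_df$sample_mean)

overlay_plot <- ggplot(means_df, aes(x = sample_mean)) +
  # Empirical density of the simulated means
  geom_density(colour = "blue") +
  # Normal density with the same mean and standard deviation
  stat_function(fun = dnorm, args = list(mean = fit_mean, sd = fit_sd),
                colour = "red", linetype = "dashed") +
  labs(x = "Mean of 40 exponential draws", y = "Density")

overlay_plot

# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(means_df$sample_mean)
qqline(means_df$sample_mean, col = "red")
```

Under the Central Limit Theorem, `fit_sd` should be close to (1/lambda)/sqrt(40), roughly 0.79.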