Part 1: Simulation Exercise

Overview

In this project the exponential distribution in R is used to investigate mean value and distribution under the lens of the Central Limit Theorem (CLT) and the Law of the Large Numbers (LLN).

Simulations

We investigate the characteristics of the distribution of averages from 40 exponential values.

Theoretical characteristics of the exponential distribution are set as following: Lambda = 0.2 Mean = 5 (1/lambda) Standard deviation = 5 (1/lambda)

Each simulated mean are obtained from random exponential samples: Formula = rexp (n, rate) n = 40 (exponential samples) rate = 0.2 (lambda)

CLT states that the distribution of averages of iid random variables becomes that of a standard normal as the sample size increases. LLN states that the sample mean of iid samples is consistent for the population mean, and that the sample variance and the sample standard deviation of iid random variables are consistent as well.

To figure this out, we investigate characteristics and distribution of 1000 averages of 40 exponential samples.

Sample Mean versus Theoretical Mean

Theoretical Mean = 5 (1/lambda)

Comparison of averages from 1000 simulation of 40 samples to theoretical mean.

Histogram of averages with theoretical mean illustrate the distribution of obtained means around theoretical mean. Evolution of sample mean over number of simulation highlight the trend toward theoretical mean.

## [1] "1000*40 Samples mean = 5.02"

Sample Variance versus Theoretical Variance

Theoretical Variance = 25 (Standard deviation^2)

Comparison of variances from 1000 simulation of 40 samples to theoretical variance.

Histogram of variances with theoretical variance will illustrate the distribution of obtained variances around theoretical variance.

## [1] "1000*40 Samples variance = 24.57"

As stated, the sample variance, estimates the population variance, The distribution of the sample variance is centered around population variance.

Evolution of sample variance over number of simulation highlight the trend toward theoretical variance.

Distribution

Given CLT and LLN, if we use obtained 1000 sample means (“averages” vector) and we apply normalization by subtracting theoretical mean to each sample mean and dividing by Standard deviation. In our case: (averages-mean(averages))/5

We obtain data which tend towards a standard normal distribution (Mean = 0, SD = 1, calculation details in Appendix) The histogram display normalized sample mean and theoretical normal bell curve obtained with dnorm formula “dnorm(x, mean=0, sd=1)”

## [1] "Normalized mean = 0"
## [1] "Normalized standard deviation = 1"

Appendix code

Sample Mean versus Theoretical Mean

# Averages of 1000 simulation with 40 samples is stored in "averages" vector
averages = NULL
for (i in 1 : 1000) averages = c(averages, mean(rexp(n = 40, rate = 0.2)))

# Sample mean and Standard error
print(paste0("1000*40 Samples mean = ", round(mean(averages),2)))

# Averages distribution and value over simulations
par(mfrow=c(1,2))

hist(averages, 
     xlab = "Sample mean",
     main = "Distribution of 1000 averages \n from 40 samples",
     sub = "(samples from exponential distribution)")
abline(v = 5, col = "red")

n <- 1000
Average <- cumsum(sample(averages, n))/(1:n)
plot(Average, type = "l", 
     main = "Average value over simulations", 
     xlab = "Number of simulations")
abline(h = 5, col = "red")

Sample Variance versus Theoretical Variance

# Variances of 1000 simulation with 40 samples is stored in "variances" vector
variances = NULL
for (i in 1 : 1000) variances = c(variances, var(rexp(n = 40, rate = 0.2)))

# Sample mean and Standard error
print(paste0("1000*40 Samples variance = ", round(mean(variances),2)))

# Averages distribution and value over simulations
par(mfrow=c(1,2))

hist(variances, 
     xlab = "Sample variance",
     main = "Distribution of 1000 variances \n from 40 samples",
     sub = "(samples from exponential distribution)")
abline(v = 25, col = "red")

n <- 1000
Variance <- cumsum(sample(variances, n))/(1:n)
plot(Variance, type = "l", 
     main = "Variance value over simulations", 
     xlab = "Number of simulations")
abline(h = 25, col = "red")

Distribution

normalized_averages <- (averages-mean(averages))/sd(averages)
print(paste0("Normalized mean = ", round(mean(normalized_averages),2)))
print(paste0("Normalized standard deviation = ", round(sd(normalized_averages),2)))

hist(normalized_averages,
     ylim = c(0,0.45),
     freq = FALSE,
     xlab = "Normalized mean",
     main = "Distribution of 1000 normalized averages \n (from 40 samples)",
     sub = "(samples from exponential distribution)")
curve(dnorm(x, mean=0, sd=1), add=TRUE, lwd = 2)