Overview

This is an Simulation Exercise to explore inference. It will investigate the exponential distribution in R, and compare it with the Central Limit Theorem. Here we calculate sample mean and sample variance and compare it with the theoritical value (for an approximatrly normal distribution).

Simulations

The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. lambda = 0.2 is set for all of the simulations. We investigate the distribution of averages of 40 exponentials. We need to do a thousand simulations.

#load libraries

library(knitr)
library(ggplot2)

set.seed(121382)

#Parameters
lambda <- 0.2
sample_size <- 40 
n_simulations <- 1000

simulation_matrix <- matrix(rexp(n_simulations*sample_size, rate=lambda), n_simulations, sample_size)

Sample Mean Vs Theoretical Mean

We calculate sample and theoretical means below:

simulation_mean <- rowMeans(simulation_matrix)

sample_mean <- mean(simulation_mean)
#print(paste0("sample mean: ",sample_mean))

theoretical_mean <- 1/lambda
#print(paste0("theoretical mean: ",theoretical_mean))

Sample mean is 5.0328856 and the theoretical mean is 5. They’re very close.

We can visualize like below:

# visualization for mean
hist(simulation_mean,xlab="Mean for 40 simulations", ylab="Frequency", main="Histogram for Exponential Function Simulations", col="peachpuff")
abline(v = sample_mean, col = "red")
abline(v = theoretical_mean, col = "blue")
legend(x = "topright", 
 c("Sample Mean", "Theoretical Mean"),
 col = c("red","blue"),
 lwd = c(2, 2))

According to the above figure we can see the means are very close.

Sample Variance Vs Theoretical Variance

We calculate sample and theoretical variance below:

sample_var <- var(simulation_mean)
sample_sd <-sd(simulation_mean)
#print(paste0("sample variance: ", sample_var))

theoretical_var <- (1/lambda)^2/sample_size  
theoretical_sd <- sqrt(theoretical_var)
#print(paste0("theoretical variance: ", theoretical_var))

Sample variance is 0.5880936 and the theoretical variance is 0.625. They’re very close.

As Variances = Standard Deviation^2 we can plot standard deviations as below:

# visualization for variance
hist(simulation_mean,xlab="Mean for 40 simulations", ylab="Frequency", main="Histogram for Exponential Function Simulations", col="peachpuff")

abline(v = sample_mean+sample_sd, col = "red")
abline(v = sample_mean-sample_sd, col = "red")
abline(v = theoretical_mean+theoretical_sd, col = "blue")
abline(v = theoretical_mean-theoretical_sd, col = "blue")
legend(x = "topright", 
 c("Sample SD","Theoretical SD"),
 col = c("red","blue"),
 lwd = c(2, 2))

print(sample_mean+sample_sd)

## [1] 5.799758

print(sample_mean-sample_sd)

## [1] 4.266013

print(theoretical_mean+theoretical_sd)

## [1] 5.790569

print(theoretical_mean-theoretical_sd)

## [1] 4.209431

According to the above figure we can see the standard deviations, hence variances are very close.

Distribution

Due to the Central Limit Theorem, the means of the sample simulations should follow a normal distribution. There are 3 evidences below, to show that the simulated the distribution is approximately normal.

Evidence 1

We can compare how the sample aligns with a normal distribution:

hist(simulation_mean, breaks=20, prob=TRUE, xlab="Mean for 40 simulations", ylab="Frequency", main="Histogram for Exponential Function Simulations", col="peachpuff")
curve(dnorm(x, mean=sample_mean, sd=sqrt(sample_var)), col="red", lwd=2, lty = "dotted", add=TRUE, yaxt="n")
curve(dnorm(x, mean=theoretical_mean, sd=sqrt(theoretical_var)), col="blue", lwd=2, add=TRUE, yaxt="n")
legend(x = "topright", 
 c("Sample", "Theoretical"),
 col = c("red","blue"),
 lwd = c(2, 2))

The plot above shows the histogram can be approximated with the normal distribution.

Evidence 2

Let’s calculate sample and theoritical confidence intervals:

sample_conf_interval <- round (mean(simulation_mean) + c(-1,1)*1.96*sd(simulation_mean)/sqrt(n_simulations),3)
theoretical_conf_interval <- theoretical_mean + c(-1,1)*1.96*sqrt(theoretical_var)/sqrt(n_simulations)

Sample 95% confidence interval is [4.985, 5.08]. Theoretical 95% confidence interval is [4.951, 5.049]. They’re very close.

Evidence 3

Let’s create a Normal Q-Q Plot.

qqnorm(simulation_mean, main="Normal Q-Q Plot", xlab="Theoretical Quantiles", ylab="Sample Quantiles")
qqline(simulation_mean, col="blue")

The sample quantiles match closely with the theoretical quantiles.

Statistical Inference Part 1: A Simulation Exercise

Nethika Suraweera

October 06, 2017