Statistical Inference Course Project

Overview

In this project we investigate the exponential distribution, \(X \sim Exp(\lambda)\), in R and compare the behavior of its sample averages with what the Central Limit Theorem predicts.
As per the project instructions, the rate parameter \(\lambda\) is set to \(0.2\), which determines the mean \(\mu\) and standard deviation \(\sigma\), both equal to \(1/ \lambda=5\).
The properties of the distribution of sample averages are illustrated via simulation and compared with the theoretical mean \(\mu\) and the theoretical variance of the sample mean, \(\sigma_{\bar X_{n}}^2\).

Simulations

The R function \(rexp(n, lambda)\) is used to draw a random sample of \(n=40\) observations from the population of interest, \(X \sim Exp(\lambda)\).
We draw one such sample, compute its mean \(\bar x\), and repeat these two steps \(B=1000\) times to obtain a distribution of sample averages.

# setting up the simulation parameters
lambda <- 0.2    # rate for the exponential distribution
n <- 40          # sample size
B <- 1000        # number of simulations

set.seed(2021)   # seed for reproducibility

# running the simulation and creating the array of sample means
SampleMeans <- sapply( 1:B, function(i) mean(rexp(n, lambda)) )

# a quick check
properties <- matrix( data=c( mean(SampleMeans), 1/lambda, var(SampleMeans), (1/lambda)^2/n ),
         nrow=2, ncol=2, dimnames=list(c("mean","var"), c("sample","theory")), byrow = TRUE )

properties <- round( properties , 3)
properties

##      sample theory
## mean  5.009  5.000
## var   0.631  0.625
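
The same simulation can also be written without calling a function once per replicate. The block below is a minimal sketch of that alternative, not the method used in this report; the names `draws` and `row_means` are introduced here for illustration only.

```r
# Illustrative alternative: draw all B * n values at once and average by row.
lambda <- 0.2    # rate for the exponential distribution
n <- 40          # sample size
B <- 1000        # number of simulations

set.seed(2021)   # seed for reproducibility

# each row of the B x n matrix is treated as one simulated sample of size n
draws <- matrix(rexp(B * n, rate = lambda), nrow = B, ncol = n, byrow = TRUE)

# one mean per row gives B sample means
row_means <- rowMeans(draws)

# summary comparable to the "sample" column of the table above
round(c(mean = mean(row_means), var = var(row_means)), 3)
```

Both forms produce \(B\) sample means from samples of size \(n\); the matrix form trades a little memory for fewer function calls.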

Sample Mean versus Theoretical Mean

The Law of Large Numbers (LLN) states that the average of \(iid\) observations converges to the population mean, \(\mu\), as the number of observations grows.
The Central Limit Theorem (CLT) states that, for sufficiently large \(n\), sample averages are approximately normally distributed and centered at the population mean, \(\mu=5\).
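
The LLN statement can be illustrated with a running average of a single long stream of exponential draws. This is a rough sketch under assumptions made only for this example: the stream length of 10000 and the names `x` and `running_mean` are not part of the project setup.

```r
# Illustrative LLN check: the running average should drift toward 1/lambda.
lambda <- 0.2
set.seed(2021)

x <- rexp(10000, rate = lambda)            # one long stream of iid draws
running_mean <- cumsum(x) / seq_along(x)   # average of the first k draws, for every k

# running average after 10, 100, 1000 and 10000 draws
checkpoints <- c(10, 100, 1000, 10000)
round(running_mean[checkpoints], 3)

# distance from the population mean at the same checkpoints
round(abs(running_mean[checkpoints] - 1/lambda), 3)
```

The distance from \(\mu=5\) should tend to shrink as more draws are averaged, although it need not decrease at every checkpoint.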

# plotting the distribution of sample averages
hist( SampleMeans, main="Distribution of sample averages", breaks = 101, freq=FALSE,
      xlim=range(2,8), xlab="Sample means" )

abline(v=properties["mean","theory"], col="blue")  # vertical blue line for theoretical mean
abline(v=properties["mean","sample"], col="red" )  # red line for the mean of sample averages

legend("topright", c( paste("theoretical mean =",properties["mean","theory"]),
   paste("sample mean =",properties["mean","sample"]) ), col=c("blue","red"), lwd=2, cex=0.8)

round( mean(SampleMeans) , 3 )   # calculating the mean of sample averages (actual)

## [1] 5.009

The observed mean of the sample averages is 5.009, which is very close to the theoretical population mean, \(\mu=\) 5.

Sample Variance versus Theoretical Variance

The CLT states that averages of \(iid\) samples are approximately normal, with standard deviation equal to the standard error of the mean.
The standard error of the mean is \(\sigma / \sqrt{n}\), where \(\sigma = 1/ \lambda\), so the theoretical variance of the sample mean, \(\sigma_{\bar X_{n}}^2\), is \(\sigma^2/n = (1/ \lambda)^2/ n\).
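
For completeness, the variance of the sample mean follows from the independence of the observations. The short derivation below uses only the symbols already defined in this report, and \(\operatorname{Var}\) denotes the variance.

```latex
\[
\sigma_{\bar X_{n}}^2
  = \operatorname{Var}\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right)
  = \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{n \sigma^2}{n^2}
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{25}{40}
  = 0.625
\]
```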

(1/lambda)^2/n   # calculating the variance of the sample mean (theoretical)

## [1] 0.625

round( var(SampleMeans), 3 )   # calculating the actual variance of sample averages

## [1] 0.631

The actual variance of sample averages, \(S_{\bar x}^2\), equals 0.631. It is close to the theoretical variance, \(\sigma_{\bar X_{n}}^2\), which is 0.625.
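
The \(\sigma^2/n\) relationship can also be checked at other sample sizes. The sketch below is an illustration only: the sizes 10, 40 and 160 are hypothetical choices, and `sizes`, `observed`, `theory` and `result` are names introduced for this example.

```r
# Illustrative check that the variance of the sample mean scales as 1/n.
lambda <- 0.2
B <- 1000
set.seed(2021)

sizes <- c(10, 40, 160)   # hypothetical sample sizes, chosen for this example

# observed variance of B sample means at each sample size
observed <- sapply(sizes, function(n) var(replicate(B, mean(rexp(n, rate = lambda)))))

# theoretical variance of the sample mean, sigma^2 / n
theory <- (1/lambda)^2 / sizes

result <- rbind(observed, theory)
colnames(result) <- paste0("n=", sizes)
round(result, 3)
```

Quadrupling \(n\) should cut the variance of the sample mean to roughly a quarter, up to simulation noise.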

Distribution

According to the CLT, the distribution of sample averages, \(\bar x_{n}\), should be approximately normal, and the approximation by \(N(\mu , \sigma^2/n)\) improves as \(n\) grows.

# plotting the distribution of sample averages
hist( SampleMeans, main="Approximation with normal distribution", breaks = 101, freq=FALSE,
      xlim=range(2,8), xlab="Sample means" )

# vertical blue line for theoretical mean
abline( v = properties["mean","theory"] , col="blue" , lwd=2 )  

# plotting the normal distribution
curve( dnorm(x, mean=1/lambda, sd=1/lambda/sqrt(n)), from=2, to=8, type="l", col='red',
       lwd=2, add=TRUE )

legend( "topright", c("theoretical mean","normal distribution"), col=c("blue","red"),
        lwd=3, cex=0.8 )

The above plot illustrates the distribution of observed sample averages, \(\bar x_{n}\), which appears to follow the normal distribution \(N(\mu , \sigma^2/n)\).
This is consistent with the Central Limit Theorem.
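
A rough numerical check can complement the visual comparison. The sketch below reruns the simulation and reports two quantities: the share of sample means within \(1.96\) standard errors of \(\mu\) (about 0.95 under exact normality) and the sample skewness. For reference, the mean of \(n\) \(iid\) \(Exp(\lambda)\) variables follows a gamma distribution with shape \(n\) and rate \(n\lambda\), whose skewness is \(2/\sqrt{n}\), so some right skew is expected at \(n=40\). The names `means`, `coverage`, `skew_obs` and `skew_gamma` are introduced here for illustration.

```r
# Illustrative normality check for the simulated sample means.
lambda <- 0.2
n <- 40
B <- 1000
set.seed(2021)

means <- replicate(B, mean(rexp(n, rate = lambda)))

mu <- 1/lambda                 # theoretical mean
se <- (1/lambda) / sqrt(n)     # theoretical standard error of the mean

# share of sample means within 1.96 standard errors of mu
coverage <- mean(abs(means - mu) <= 1.96 * se)

# sample skewness: mean of cubed standardized values
z <- (means - mean(means)) / sd(means)
skew_obs <- mean(z^3)

# skewness of the exact gamma distribution of the sample mean
skew_gamma <- 2 / sqrt(n)

round(c(coverage = coverage, skew_obs = skew_obs, skew_gamma = skew_gamma), 3)
```

To compare the histogram with the exact distribution as well, one option is to add `curve(dgamma(x, shape = n, rate = n * lambda), from = 2, to = 8, add = TRUE)` to the plot.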


Part 1: Simulation Exercise

Vladimir Maganov

2021-10-24
