Central Limit Theorem and Statistical Inference

Introduction

This paper explores the Central Limit Theorem using the exponential distribution in R. The distribution can be simulated with rexp(n, lambda), where lambda is the rate parameter, set here to 0.2. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. Each sample consists of 40 exponentials, and we run a thousand simulations.
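
As a quick sanity check of the statement that both the mean and the standard deviation equal 1/lambda, the sketch below draws a large exponential sample and compares its moments with the theoretical value. The seed, the sample size of 100000, and the names lambda and draws are assumptions made only for this illustration; they are not part of the analysis that follows.

```r
# Sketch: check that rexp() draws have mean and sd close to 1/lambda.
set.seed(1)                            # arbitrary seed, used only for this illustration
lambda <- 0.2                          # rate parameter used throughout this paper
draws <- rexp(100000, rate = lambda)   # large sample; the size is an assumption
# Empirical mean and sd next to the theoretical value 1/lambda
print(c(mean = mean(draws), sd = sd(draws), theory = 1 / lambda))
```

Both empirical values should land near 5, the value of 1/lambda for lambda = 0.2.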

Simulation

First we set the seed to an arbitrary fixed value so that our findings are reproducible.

set.seed(5)
rnorm(5)

## [1] -0.84085548  1.38435934 -1.25549186  0.07014277  1.71144087

Then we set our parameters and compare the mean of the simulated sample means with the theoretical mean.

n <- 40 ## number of exponentials per sample
lam <- 0.2 ## lambda, the rate parameter
sim <- 1000 ## number of simulations
simdata <- matrix(rexp(n*sim, lam), sim, n) ## matrix of exponentials, one simulation per row
datamean <- apply(simdata, 1, mean) ## mean of each row in the matrix
meansam <- round(mean(datamean), 3) ## mean of the simulated sample means
meanthry <- round(1/lam, 3) ## theoretical mean of the exponential distribution, 1/lambda
cat("The mean of the sample is: ", meansam)

## The mean of the sample is:  5.043

cat("The theoretical mean according to the Central Limit Theorem is: ", meanthry)

## The theoretical mean according to the Central Limit Theorem is:  5

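The two means are very close: 5.043 from the simulation against a theoretical value of 5. The Central Limit Theorem also predicts the variance of the mean of n observations, (1/lambda)^2/n = 25/40 = 0.625, and the simulated variance is similar: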

varsam <- round(var(datamean), 3)
varthry <- (1/lam)^2/n
cat("The variance of the sample is: ", varsam)

## The variance of the sample is:  0.678

cat("The theoretical variance according to the Central Limit Theorem is: ", varthry)

## The theoretical variance according to the Central Limit Theorem is:  0.625

Plot Histogram

For a graphical demonstration of the distribution and means, we create a simple histogram.
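
The histogram code itself is not shown here, so the following is only a minimal sketch of how such a plot could be produced. It regenerates the sample means so that it runs on its own (the exact draws may therefore differ slightly from the values reported above), and the number of breaks, the title, the axis label, and the colours are assumptions chosen for illustration.

```r
# Sketch of the histogram step; regenerates the sample means so the block is self-contained.
set.seed(5)                                   # same seed value as above; draws may differ from the paper's
n <- 40; lam <- 0.2; sim <- 1000
datamean <- rowMeans(matrix(rexp(n * sim, lam), sim, n))
# Density-scaled histogram of the 1000 sample means
hist(datamean, breaks = 30, freq = FALSE,
     main = "Distribution of Sample Means", xlab = "Mean of 40 exponentials")
# Overlay the normal density suggested by the Central Limit Theorem
curve(dnorm(x, mean = 1 / lam, sd = (1 / lam) / sqrt(n)), add = TRUE, lwd = 2)
abline(v = mean(datamean), col = "red", lty = 2)   # simulated mean
abline(v = 1 / lam, col = "blue", lty = 3)         # theoretical mean
```

If the Central Limit Theorem applies, the bars should follow the overlaid curve and the two vertical lines should nearly coincide.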

Confidence Intervals

The sections above suggest that the sample means are approximately normally distributed. As a further comparison, we compute 95% intervals around the simulated and theoretical means, using 1.96 as the 97.5th percentile of the standard normal distribution.

samconf <- round(mean(datamean) + c(-1,1)*1.96*sd(datamean)/sqrt(n), 3)
thryconf <- meanthry + c(-1,1)*1.96*sqrt(varthry)/sqrt(n)
cat("The 95% Confidence Intervals of the sample are: ", samconf)

## The 95% Confidence Intervals of the sample are:  4.788 5.298

cat("The 95% Confidence Intervals according to the Central Limit Theorem are: ", thryconf)

## The 95% Confidence Intervals according to the Central Limit Theorem are:  4.755 5.245
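
A related check, sketched below under our own assumptions, builds a 95% interval from each individual sample of 40 exponentials and counts how often the interval contains the true mean 1/lambda. Note that sd(datamean) already estimates the standard deviation of a mean of n observations, so this sketch instead takes the standard error from within each sample. The names xbar, se, lower, upper, and coverage are introduced here for illustration only, and the data are regenerated so the block runs on its own.

```r
# Sketch: per-sample 95% intervals and their empirical coverage of 1/lambda.
set.seed(5)
n <- 40; lam <- 0.2; sim <- 1000
simdata <- matrix(rexp(n * sim, lam), sim, n)
xbar <- rowMeans(simdata)                  # mean of each sample
se <- apply(simdata, 1, sd) / sqrt(n)      # standard error estimated within each sample
lower <- xbar - 1.96 * se
upper <- xbar + 1.96 * se
# Share of the 1000 intervals that contain the true mean 1/lambda
coverage <- mean(lower <= 1 / lam & 1 / lam <= upper)
cat("Share of intervals containing 1/lambda: ", coverage, "\n")
```

The share is expected to be close to 95%, although the skewness of the exponential distribution and the moderate sample size can pull it slightly lower.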

Plot Quantiles

Finally, plotting the quantiles further demonstrates how closely the quantiles of the sample means match the theoretical normal quantiles.

qqnorm(datamean, main="Plot of Means"); qqline(datamean)
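
The same comparison can be made numerically. The sketch below, which is an illustration rather than part of the original analysis, tabulates a few quantiles of the simulated means next to the quantiles of a normal distribution with mean 1/lambda and standard deviation (1/lambda)/sqrt(n). The choice of probabilities and the names probs and qtab are assumptions, and the data are regenerated so the block is self-contained.

```r
# Sketch: compare selected quantiles of the sample means with normal quantiles.
set.seed(5)
n <- 40; lam <- 0.2; sim <- 1000
datamean <- rowMeans(matrix(rexp(n * sim, lam), sim, n))
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)   # probabilities chosen for illustration
qtab <- rbind(
  sample = quantile(datamean, probs),                               # empirical quantiles
  normal = qnorm(probs, mean = 1 / lam, sd = (1 / lam) / sqrt(n))   # quantiles under the normal approximation
)
print(round(qtab, 3))
```

Rows that agree closely correspond to points lying near the reference line in the quantile plot.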

Once again, the evidence indicates that the distribution of the sample means is consistent with the Central Limit Theorem.