Overview

This project investigates the exponential distribution in R and compares it with the Central Limit Theorem. It examines the distribution of averages of 40 exponentials through simulation.
Regarding the exponential distribution:
- lambda is the rate parameter
- the mean and the standard deviation are both 1/lambda
- lambda = 0.2 for all of the simulations
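
As a quick sanity check of the properties listed above, the following minimal sketch draws a large sample and compares its mean and standard deviation with 1/lambda. The sample size of 100000 and the seed are arbitrary assumptions made only for this illustration; they are not part of the project's simulation.

```r
# Sketch: the mean and sd of Exp(lambda) draws should both be near 1/lambda.
lambda <- 0.2
set.seed(1)                          # arbitrary seed, for reproducibility only
draws <- rexp(100000, rate = lambda) # illustrative sample size, not from the report

# Empirical mean and sd next to the theoretical value 1/lambda = 5
check <- c(mean = mean(draws), sd = sd(draws), theoretical = 1 / lambda)
check
```

Both empirical values should land close to 5, the value of 1/lambda used throughout.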

Simulation

The following code runs 1000 simulations, each drawing 40 random values from the exponential distribution with lambda = 0.2, and computes the mean of each simulation.

library(ggplot2) # required for the plots

n <- 1000 # number of simulations
lambda <- 0.2 # rate parameter of the exponential distribution
samp <- 40 # sample size (exponentials per simulation)
set.seed(100) # make the simulation reproducible

# create an (n x samp) matrix: one row per simulation, one column per draw
expDist <- matrix(rexp(n * samp, lambda), nrow = n)
# average each row to get one sample mean per simulation
expDist_means <- apply(expDist, 1, mean)

# plot the distribution of the simulated sample means
g <- ggplot(data.frame(x = expDist_means), aes(x = x))
g <- g + geom_histogram(binwidth = 0.1)
g <- g + labs(x = "Means", title = "Distribution of Means of 40 Exponentials")
g

Sample Mean vs. Theoretical Mean

The theoretical mean \(\mu\) of the exponential distribution with a rate of \(\lambda\) is \(\mu = \frac{1}{\lambda}\).

mu <- 1/lambda
mu

## [1] 5

The mean, \(\bar X\), of the 1000 simulated sample means (each the average of 40 random exponentials) is computed below:

meanOfmeans <- mean(expDist_means)
meanOfmeans

## [1] 4.999702

\(\bar X\) is 4.9997019, which is close to the theoretical mean, 5.

Sample Variance vs. Theoretical Variance

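The exponential distribution with a rate of \(\lambda\) has standard deviation \(1/\lambda\). The mean of a sample of 40 independent exponentials therefore has theoretical standard deviation (the standard error) \(\sigma = \frac{1/\lambda}{\sqrt{40}}\), where 40 is the sample size (`samp` in the code, not the simulation count `n`). Its theoretical variance is \(\sigma^2\).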

sd <- (1/lambda)/sqrt(samp)
sd

## [1] 0.7905694

var <- sd^2
var

## [1] 0.625
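
The value 0.625 can also be obtained by hand. The short derivation below is a sketch that assumes the 40 draws \(X_1, \dots, X_{40}\) in each simulation are independent, each with variance \(1/\lambda^2\):

\[
\operatorname{Var}(\bar X)
= \operatorname{Var}\left(\frac{1}{40}\sum_{i=1}^{40} X_i\right)
= \frac{1}{40^2}\sum_{i=1}^{40}\operatorname{Var}(X_i)
= \frac{1}{40\,\lambda^2}
= \frac{1}{40 \times 0.2^2}
= \frac{25}{40}
= 0.625
\]

Taking the square root gives \(\sqrt{0.625} \approx 0.7906\), which matches the standard deviation computed above.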

The variance, \(\sigma^2_x\), and standard deviation, \(\sigma_x\), of the 1000 simulated sample means (each the average of 40 random exponentials) are computed below:

varOfmeans <- var(expDist_means)
varOfmeans

## [1] 0.6335302

sdOfmeans <- sd(expDist_means)
sdOfmeans

## [1] 0.7959461

As the results show, the sample variance (0.6335) and sample standard deviation (0.7959) are both close to their theoretical counterparts (0.625 and 0.7906), differing by about 1.4% and 0.7% respectively.

Distribution

# histogram of the simulated means on a density scale
m <- ggplot(data.frame(x = expDist_means), aes(x = x))
m <- m + geom_histogram(aes(y = ..density..), fill = "lightblue", alpha = 0.7)
m <- m + labs(title = "Distribution of the Means of 1000 Simulations of 40 Samples", x = "Means", y = "Density")
# vertical lines: simulated mean (blue) and theoretical mean (red);
# colour is set outside aes() so the lines are drawn in the named colours
m <- m + geom_vline(xintercept = meanOfmeans, colour = "blue")
m <- m + geom_vline(xintercept = mu, colour = "red")
# normal curves: fitted to the simulated means (blue) and theoretical (red)
m <- m + stat_function(fun = dnorm, args = list(mean = meanOfmeans, sd = sdOfmeans), colour = "blue", size = 1.0)
m <- m + stat_function(fun = dnorm, args = list(mean = mu, sd = sd), colour = "red", size = 1.0)
m

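The graph above shows that the distribution of the 1000 simulated means, each taken from a sample of 40 exponentials, is approximately normal: the normal curve fitted to the simulated means (blue) nearly coincides with the theoretical normal curve (red).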
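
The visual impression of normality can be complemented by a simple numerical check. The sketch below is not part of the original analysis: it assumes the same settings as the simulation (1000 simulations, 40 draws, lambda = 0.2, seed 100), standardizes the means with the theoretical mean and standard error, and measures the share falling within ±1.96. The names `z` and `coverage` are introduced here only for illustration.

```r
# Self-contained sketch: re-create the simulated means with the report's settings
n <- 1000      # number of simulations
lambda <- 0.2  # rate parameter
samp <- 40     # sample size per simulation
set.seed(100)
expDist_means <- rowMeans(matrix(rexp(n * samp, lambda), nrow = n))

# Standardize with the theoretical mean (1/lambda) and standard error
z <- (expDist_means - 1 / lambda) / ((1 / lambda) / sqrt(samp))

# Share of standardized means within +/- 1.96;
# this would be about 0.95 if the means were exactly normal
coverage <- mean(abs(z) <= 1.96)
coverage
```

A value near 0.95 is consistent with approximate normality. For a visual counterpart, `qqnorm(z)` followed by `qqline(z)` draws a normal Q-Q plot of the same standardized means.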

Statistical Inference Course Project Part I

Jackie

November 23, 2017
