Statistical Inference Course Project 1

Outline

This project investigates the exponential distribution in R and compares the distribution of its sample means with what the Central Limit Theorem predicts. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. In this experiment, lambda = 0.2 is used for all simulations. The distribution of averages of 40 exponentials is then examined over 1000 simulations.
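As a quick sanity check of the stated mean and standard deviation, the following minimal sketch draws one large exponential sample and compares its summary statistics with 1/lambda. The sample size of 100000, the seed, and the variable names (check_lambda, draws, draws_mean, draws_sd) are assumptions chosen for illustration and are not part of the project instructions.

```r
# Illustrative check only; the sample size, seed, and names are assumptions
set.seed(1)
check_lambda <- 0.2
draws <- rexp(100000, check_lambda)

# Both statistics should be close to 1/check_lambda = 5
draws_mean <- mean(draws)
draws_sd <- sd(draws)
print(c(mean = draws_mean, sd = draws_sd, theoretical = 1/check_lambda))
```

Separate variable names are used so that the sketch does not overwrite the values set in the simulation code.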

Simulations

The plots of the results are generated with base R graphics (hist, abline and lines).

# set seed for reproducibility
set.seed(2016)

# Set sampling values as described in the project instructions
lambda <- 0.2   # lambda

n <- 40         # number of exponentials

sims <- 1000    # number of simulations

# Run the simulations: each column of sim_exp holds one sample of n exponentials
sim_exp <- replicate(sims, rexp(n, lambda))

# Calculate the mean of each simulated sample (one mean per column)
means_exp <- apply(sim_exp, 2, mean)

#Histogram of the means

hist(means_exp, breaks=40, xlim = c(2,9), main="Exponential Function Simulation Means", col = "blue")

Sample Mean versus Theoretical Mean

The expected mean \(\mu\) of an exponential distribution with rate \(\lambda\) is

\(\mu= \frac{1}{\lambda}\)
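The same value is the expected value of the sample mean. As a short worked step, assuming the \(n = 40\) draws \(X_1, \dots, X_n\) are independent exponentials with rate \(\lambda = 0.2\) (the notation \(\bar{X}\) and \(X_i\) is introduced here only for illustration):

```latex
\[
E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
           = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
           = \frac{1}{\lambda}
           = \frac{1}{0.2} = 5
\]
```

The average of the simulated sample means should therefore be close to 5.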

mean(means_exp)

## [1] 4.979186

Let \(v\) be the average of the 1000 simulated sample means, each computed from 40 randomly drawn exponential values.

# plot histogram of the sample means
hist(means_exp, col="blue", main="Theoretical Mean vs. Actual Mean", xlim = c(2,9),breaks=40, xlab = "Simulation Means")

# plot a vertical red line at the mean of the sample means
abline(v=mean(means_exp), lwd=4, col="red")

The theoretical mean \(\mu = 1/\lambda\) is 5.0 and the simulated mean \(v\) is 4.979186 (the output above), so the two are very close.

Sample Variance versus Theoretical Variance

# theoretical standard deviation vs. simulation standard deviation
print(paste("Theoretical standard deviation: ", round( (1/lambda)/sqrt(n) ,4)))

## [1] "Theoretical standard deviation:  0.7906"

The variance \(Var\) is the square of the standard deviation \(\sigma\): \(Var = \sigma^2\).

print(paste("Practical standard deviation: ", round(sd(means_exp) ,4)))

## [1] "Practical standard deviation:  0.7991"

print(paste("Theoretical variance: ", round( ((1/lambda)/sqrt(n))^2 ,4)))

## [1] "Theoretical variance:  0.625"

print(paste("Practical variance: ", round(sd(means_exp)^2 ,4)))

## [1] "Practical variance:  0.6385"

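As the output shows, the theoretical variance and standard deviation are very close to the simulated ("practical") variance and standard deviation. Because the variance is the square of the standard deviation, the small difference in the standard deviations (0.7991 vs. 0.7906) carries over to the variances (0.6385 vs. 0.625), but the values remain close.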
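The theoretical values printed above can be derived in a few steps. This is a sketch that assumes the \(n = 40\) draws in each sample are independent, each with standard deviation \(\sigma = 1/\lambda\); the symbol \(\bar{X}\) for the sample mean is introduced here only for illustration:

```latex
\[
Var(\bar{X}) = \frac{\sigma^2}{n}
             = \frac{1}{\lambda^2 n}
             = \frac{1}{0.2^2 \times 40}
             = \frac{25}{40} = 0.625,
\qquad
\sqrt{Var(\bar{X})} = \frac{1/\lambda}{\sqrt{n}}
                    = \frac{5}{\sqrt{40}} \approx 0.7906
\]
```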

Distribution

The histogram of the simulated sample means and its estimated density are compared with the normal distribution that has the theoretical mean and standard deviation.

# Histogram of the sample means on a density scale
hist(means_exp, prob=TRUE, col="blue", main="Exponential Function Simulation Means", breaks=40, xlim=c(2,9), xlab = "Simulation Means")
# Kernel density estimate of the sample means (solid red line)
lines(density(means_exp), lwd=3, col="red")
# Normal density with the theoretical mean 1/lambda and standard deviation (1/lambda)/sqrt(n)
x <- seq(min(means_exp), max(means_exp), length.out=2*n)
y <- dnorm(x, mean=1/lambda, sd=(1/lambda)/sqrt(n))
lines(x, y, col="black", lwd=2, lty=2)

As the graph shows, the distribution of the means of our sampled exponentials appears to follow a normal distribution, as the Central Limit Theorem predicts. Increasing the number of simulations (currently 1000) would make the histogram and the density estimate smoother, while increasing the sample size (currently 40) would bring the distribution of the means closer to a normal distribution. The dashed black line above is the normal density with mean 1/lambda and standard deviation (1/lambda)/sqrt(n), and it is very close to the density estimated from our simulated means, which is the red line above.
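A visual comparison of curves can be complemented by a normal Q-Q plot and a simple numeric check. The following is a minimal sketch, not part of the original analysis: it re-simulates the sample means so that it runs on its own, and the variable names (qq_lambda, qq_n, qq_sims, qq_means, se_theory, coverage) and the choice of a 1.96 standard error band are assumptions made for illustration.

```r
# Illustrative normality check; the names and the 1.96 band are assumptions
set.seed(2016)
qq_lambda <- 0.2
qq_n <- 40
qq_sims <- 1000

# One sample mean per simulated sample of qq_n exponentials
qq_means <- replicate(qq_sims, mean(rexp(qq_n, qq_lambda)))

# Normal Q-Q plot: points near the reference line suggest approximate normality
qqnorm(qq_means, main = "Normal Q-Q Plot of Simulation Means")
qqline(qq_means, col = "red", lwd = 2)

# Share of sample means within 1.96 theoretical standard errors of 1/lambda
se_theory <- (1/qq_lambda) / sqrt(qq_n)
coverage <- mean(abs(qq_means - 1/qq_lambda) <= 1.96 * se_theory)
print(coverage)
```

For an exactly normal distribution the share would be about 0.95, so a value near that figure is consistent with the approximation. Some curvature in the upper tail of the Q-Q plot would not be surprising, since the exponential distribution is right-skewed and the sample size is only 40.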