Overview

This is the course project for the Statistical Inference course on Coursera. The project investigates the exponential distribution in R and compares the distribution of its sample means with what the Central Limit Theorem predicts.

The project consists of two parts.
1. A simulation exercise.
2. Basic inferential data analysis.

Part 1. Simulation exercise

1.0 Simulation and Distribution of the Sample mean data

set.seed(5)
lambda <- 0.2 #the given parameter
n <- 40 #number of exponentials
simul <- 1000 #number of simulations

# the simulation data and distribution
# preallocate one row per simulation: its index and its sample mean
simuldata <- data.frame(simulation = seq_len(simul), mean = numeric(simul))
for (i in seq_len(simul)) {
    simuldata$mean[i] <- mean(rexp(n, lambda))
}

# histogram of the simulated sample means
hist(simuldata$mean, 
     col = "lightblue", breaks = 20, xlab = "Means", 
     main = "Distribution of Sample means")
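
The same kind of simulation can also be written without an explicit loop. The following is a minimal sketch rather than the approach used in this report; the names `draws` and `sample_means` are illustrative assumptions, and it repeats the parameter setup so that it runs on its own.

```r
set.seed(5)
lambda <- 0.2   # rate parameter, as in the report
n <- 40         # exponentials per sample
simul <- 1000   # number of simulated samples

# One row per simulated sample, one column per draw
draws <- matrix(rexp(n * simul, rate = lambda),
                nrow = simul, ncol = n, byrow = TRUE)

# Row means give all the sample means in one call
sample_means <- rowMeans(draws)

length(sample_means)
mean(sample_means)
```

Drawing all values at once and averaging by row avoids growing or indexing a data frame inside a loop, which tends to matter more as the number of simulations increases.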

1.1 Sample Mean versus Theoretical Mean

theo_mean <- 1/lambda
paste(" The Theoretical mean is " ,round(theo_mean,2))
## [1] " The Theoretical mean is  5"
simul_mean <- mean(simuldata$mean)
paste(" The Sample mean is ",round(simul_mean,2))
## [1] " The Sample mean is  5.04"
#Compare the simulated mean and the theoretical mean by marking both with vertical lines.
hist(simuldata$mean, 
     col = "lightblue", breaks = 20, xlab = "Means", 
     main = "Distribution of Sample means")
rug(simuldata$mean)

abline(v=theo_mean, lwd=4, col="red")
abline(v=simul_mean, lwd=4, col="blue")

text(6.5, 80, paste("Sample mean = ", round(simul_mean,2)), col="blue")
text(6.5, 90, paste("Theoretical mean = ", round(theo_mean, 2)), col="red")

1.2 Sample Variance versus Theoretical Variance

#The exponential distribution has standard deviation 1/lambda, so the standard deviation of the mean of n exponentials (the standard error) is (1/lambda)/sqrt(n).
paste("Theoretical standard deviation: ", round( (1/lambda)/sqrt(n) ,2), 
      ", Sample standard deviation", round(sd(simuldata$mean) ,2) ) 
## [1] "Theoretical standard deviation:  0.79 , Sample standard deviation 0.78"
paste("Theoretical variance: ", round((1/lambda)^2/n,2), 
      ", Sample variance", round(var(simuldata$mean) ,2) ) 
## [1] "Theoretical variance:  0.62 , Sample variance 0.6"
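
The theoretical values printed above follow from the variance of a mean of independent draws. As a short worked derivation, writing $\bar{X}$ for the mean of $n = 40$ exponentials with rate $\lambda = 0.2$ (the symbol $\bar{X}$ is notation introduced here, not used elsewhere in the report):

```latex
\begin{align*}
\operatorname{Var}(X) &= \frac{1}{\lambda^2} = \frac{1}{0.2^2} = 25, \\
\operatorname{Var}(\bar{X}) &= \frac{\operatorname{Var}(X)}{n} = \frac{25}{40} = 0.625, \\
\operatorname{SD}(\bar{X}) &= \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.79.
\end{align*}
```

The value 0.625 is what appears, rounded to two decimals, as the theoretical variance in the output above.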

1.3 Is the distribution of means normal?

hist(simuldata$mean, 
     probability = TRUE, col = "lightblue", breaks = 20, xlab = "Means", 
     main = "Distribution of Sample means")
lines(density(simuldata$mean), lwd =3, col = "blue")

#Generate normal distribution line
x <- seq(min(simuldata$mean), max(simuldata$mean), length.out = 1000)
y <- dnorm(x, mean = 1/lambda, sd = (1 / lambda)/sqrt(n))
lines(x, y, col="black", lwd=2, lty = 2)
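
A normal Q-Q plot is another common way to judge how close the sample means are to a normal distribution. The following is a supplementary sketch, not part of the original analysis; the names `sample_means` and `z` are illustrative assumptions, and the sample means are regenerated so that the block runs on its own.

```r
set.seed(5)
lambda <- 0.2   # rate parameter, as in the report
n <- 40         # exponentials per sample
simul <- 1000   # number of simulated samples

# Regenerate the sample means so the sketch is self-contained
sample_means <- replicate(simul, mean(rexp(n, rate = lambda)))

# Standardize with the theoretical mean and standard error
z <- (sample_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Points lying close to the reference line suggest approximate normality
qqnorm(z, main = "Normal Q-Q plot of standardized sample means")
qqline(z, col = "red", lwd = 2)
```

Departures from the line in the tails would indicate leftover skewness from the exponential distribution, which is plausible at a sample size of 40.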

Conclusion :

By the Central Limit Theorem, the means of sufficiently large samples are approximately normally distributed. As the graph shows, the distribution of the means of random samples of 40 exponentials closely overlaps the normal density with mean 1/lambda and standard deviation (1/lambda)/sqrt(n).