First, I make sure that everyone will be able to see the R code, set echo=“TRUE” for the whole document.

knitr:: opts_chunk$set(echo=TRUE, results = "asis", cache = TRUE)

This project is part 1 of course project for my Statistical Inference course that I took on Coursera. This is course number 6 out of 10 courses that I am taking for the Data Science Certificate from Johns Hopkins University through Cousera.

In this project, I will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. I will investigate the distribution of averages of 40 exponentials. Note that I will do 1000 simulations. Set up some constants:

lambda = 0.2
n = 40
sim  = 1000
set.seed(12345)

Creat a matrix of 1000 rows and 40 columns of exponential distribution, then calculate the mean of each row:

matrix = matrix(rexp(sim * n,lambda), ncol = n, nrow = sim)
mean_row <- rowMeans(matrix)

Make a histogram of the mean of exponential distribution:

hist(mean_row, xlab = "Mean of 40 exppnentials", ylab="Frequency", col = "green", main = "Distribution of the mean of 40 exponentials")

Compare sample mean vs theoretical mean:

sample_mean <- round(mean(mean_row),3)
theoretical_mean = round(1/lambda,3)

Sample mean is 4.972 and theoretical mean is 5. They are close enough.

Comapre samle variance and theortical variance:

sample_variance <- round(var(mean_row),3)
theoretical_var = round(((1/lambda)^2)/n,3)

Sample variance is 0.616 and theoretical variance is 0.625. Again, they are close enough.

Now, I want to investigate to see if the distribution is approximately normal Now I will show histogram of the mean of exponential distribution and normal distribution with mean = 1/lambda, sd= sqrt((1/lambda)^2/40).

hist(mean_row, breaks = 40,prob=TRUE, col = "green", main = "Histogram of mean of 40 exponentials", xlab = "Mean of each row")
lines(density(mean_row),col = "black", lwd = 2)
curve(dnorm(x,mean=5,sd=sqrt(theoretical_var)), col="red", add=TRUE, yaxt="n", lwd = 3)
legend(6, 0.4, legend=c("Normal Distribution", "Mean of exponential distribution"),col=c("red", "black"),lty = 1:1, cex=0.8)

We can see that the histogram approaches a normal distribution.