1 Overview

This report investigates the exponential distribution in R and compares it with the Central Limit Theorem. The exponential distribution is simulated in R with rexp(n, lambda), where lambda (\(\lambda\)) is the rate parameter. The mean of the exponential distribution is \(1/\lambda\) and its standard deviation is also \(1/\lambda\). All simulations use \(\lambda = 0.2\). The report examines the distribution of averages of 40 exponentials, based on 1000 simulations.
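For reference, the Central Limit Theorem statement being tested can be written out explicitly. This is the standard result rather than something stated in the report; \(n = 40\) denotes the sample size and \(\bar{X}_n\) the average of one sample of \(n\) exponentials.

\[
\bar{X}_n \approx N\left(\frac{1}{\lambda},\ \frac{1}{n\lambda^2}\right), \qquad E[\bar{X}_n] = \frac{1}{0.2} = 5, \qquad \mathrm{Var}(\bar{X}_n) = \frac{25}{40} = 0.625, \qquad \mathrm{SD}(\bar{X}_n) = \frac{5}{\sqrt{40}} \approx 0.79
\]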

2 Simulations

lambda <- 0.2
# Create the simulation data: 1000 samples (rows) of 40 exponentials each (columns)
set.seed(1000)
simData <- matrix(rexp(1000 * 40, lambda), nrow = 1000, ncol = 40)

2.1 Question 1: Sample Mean versus Theoretical Mean

This simulation shows the sample mean and compares it to the theoretical mean of the distribution.

# Mean of each of the 1000 samples of 40 exponentials
distMean <- rowMeans(simData)
t_mean <- 1 / lambda      # theoretical mean
s_mean <- mean(distMean)  # sample mean (average of the 1000 means)
hist(distMean, breaks = 50, main = "The distribution of 1000 means of 40 random exponentials",
     xlab = "Value of means",
     ylab = "Frequency of means", col = "pink")
# Mark the theoretical and sample means
abline(v = t_mean, lty = 1, lwd = 2, col = "blue")
abline(v = s_mean, lty = 1, lwd = 2, col = "yellow")
legend("topright", lty = 1, lwd = 2, col = c("blue", "yellow"),
       legend = c("Theoretical Mean", "Sample Mean"))

The theoretical mean of the distribution is 5 and the sample mean is 4.9869634. The histogram above marks the theoretical mean with the blue line and the sample mean with the yellow line. The two lines nearly coincide, so the sample mean is very close to the theoretical mean.
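One way to put a number on "very close" is a confidence interval around the sample mean. The sketch below is an illustration and not part of the original analysis: it recreates the simulation with the same seed and rate, and the names se_s_mean and ci are introduced here for the example.

```r
# Recreate the simulation with the report's seed and rate
lambda <- 0.2
set.seed(1000)
simData <- matrix(rexp(1000 * 40, lambda), nrow = 1000, ncol = 40)
distMean <- rowMeans(simData)

t_mean <- 1 / lambda      # theoretical mean
s_mean <- mean(distMean)  # average of the 1000 simulated means

# Standard error of s_mean (hypothetical helper name for this sketch)
se_s_mean <- sd(distMean) / sqrt(length(distMean))

# 95% t-interval for the centre of the distribution of means
ci <- t.test(distMean)$conf.int

cat("Theoretical mean:", t_mean, "\n")
cat("Sample mean:     ", s_mean, "\n")
cat("Standard error:  ", se_s_mean, "\n")
cat("95% CI:          ", ci[1], "to", ci[2], "\n")
```

If the interval covers the theoretical mean, the gap between the two lines in the histogram is within what sampling noise alone would produce.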

2.2 Question 2: Sample Variance versus Theoretical Variance

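This simulation shows how variable the simulated data are, measured by the variance, and compares that with the theoretical variance of the exponential distribution, which is 25 for lambda = 0.2. The histogram shows the variance of each of the 1000 samples of 40 exponentials. The sample variance marked on it is computed over all 40,000 simulated values.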

# Variance of each of the 1000 samples of 40 exponentials
distVar <- apply(simData, 1, var)
t_var <- (1 / lambda)^2                      # theoretical variance
s_var <- mean(simData^2) - mean(simData)^2   # variance of all 40,000 simulated values
hist(distVar, breaks = 40,
     main = "The distribution of 1000 variances of 40 random exponentials",
     xlab = "Value of Variance", ylab = "Frequency of Variance", col = "light blue")
# Mark the theoretical and sample variances
abline(v = t_var, lty = 1, lwd = 2, col = "blue")
abline(v = s_var, lty = 1, lwd = 2, col = "yellow")
legend("topright", lty = 1, lwd = 2, col = c("blue", "yellow"),
       legend = c("Theoretical Variance", "Sample Variance"))

The theoretical variance of the distribution is 25 and the sample variance is 25.1238078. The histogram above marks the theoretical variance with the blue line and the sample variance with the yellow line. The two lines nearly coincide, so the sample variance is very close to the theoretical variance.
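The comparison above concerns the variance of the individual exponentials. A complementary check, offered here as an illustration and not part of the original analysis, looks at the variance of the 1000 sample means, which the Central Limit Theorem puts at (1/lambda)^2 / 40 = 0.625. The names t_var_mean, s_var_mean and mean_of_vars are introduced for this sketch.

```r
# Recreate the simulation with the report's seed and rate
lambda <- 0.2
n <- 40
set.seed(1000)
simData <- matrix(rexp(1000 * n, lambda), nrow = 1000, ncol = n)
distMean <- rowMeans(simData)

# Variance of the sample means: theory versus simulation
t_var_mean <- (1 / lambda)^2 / n
s_var_mean <- var(distMean)

# Average of the 1000 per-sample variances, each an estimate of (1/lambda)^2
mean_of_vars <- mean(apply(simData, 1, var))

cat("Theoretical variance of the mean:", t_var_mean, "\n")
cat("Simulated variance of the mean:  ", s_var_mean, "\n")
cat("Average per-sample variance:     ", mean_of_vars, "\n")
```

The first two numbers describe the spread of the averages, while the third describes the spread of the underlying exponentials, so they are expected to differ by roughly a factor of 40.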

2.3 Question 3: Distribution

This simulation checks whether the distribution of the means is approximately normal. Normality is assessed visually with a density plot and a Q-Q plot.

2.3.1 Density Plot Normality Test

# Density plot of the distribution of means
library(ggplot2)

ggplot(data.frame(distMean), aes(x = distMean)) +
    labs(title = "Density plot of the distribution of 1000 means of 40 random exponentials",
         x = "Distribution Means",
         y = "Density") +
    # Histogram with density instead of count on the y-axis;
    # after_stat(density) replaces the deprecated ..density.. (ggplot2 >= 3.3.0)
    geom_histogram(aes(y = after_stat(density)),
                   binwidth = 0.5,
                   colour = "black", fill = "white") +
    # Overlay with a transparent density plot
    geom_density(alpha = 0.2, fill = "#FF6666")

The density plot above is bell shaped and roughly symmetric about 5, which suggests that the distribution of the means is approximately normal.
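The bell shape can be compared directly with the normal curve that the Central Limit Theorem predicts. The sketch below is an illustration only: it uses base R graphics so that it needs no extra packages, recreates the simulation with the same seed, and introduces the names mu and se for the predicted mean and standard deviation of the sample mean.

```r
# Recreate the simulation with the report's seed and rate
lambda <- 0.2
n <- 40
set.seed(1000)
simData <- matrix(rexp(1000 * n, lambda), nrow = 1000, ncol = n)
distMean <- rowMeans(simData)

# Normal approximation predicted by the Central Limit Theorem
mu <- 1 / lambda              # mean of the sample mean
se <- (1 / lambda) / sqrt(n)  # standard deviation of the sample mean

# Histogram on the density scale with the predicted normal curve on top
hist(distMean, breaks = 50, freq = FALSE,
     main = "Means of 40 exponentials with predicted normal curve",
     xlab = "Value of means", col = "pink")
curve(dnorm(x, mean = mu, sd = se), add = TRUE, lwd = 2, col = "blue")
```

Where the histogram bars follow the blue curve, the normal approximation is doing its job. Any visible mismatch is most likely in the right tail, since averages of exponentials keep a slight right skew at this sample size.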

2.3.2 Q-Q Plot Normality Test

# Q-Q plot of the distribution of means against the normal distribution
ggplot(data.frame(distMean), aes(sample = distMean)) +
    stat_qq(color = "blue") +
    stat_qq_line() +
    labs(title = "Q-Q plot of the distribution of 1000 means of 40 random exponentials")

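A Q-Q plot compares the quantiles of a sample with the corresponding quantiles of the normal distribution. The reference line drawn by stat_qq_line() passes through the first and third quartiles. The points, shown in blue, lie close to this line, so the distribution of the means is approximately normal.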
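The Q-Q comparison can also be read off numerically. The sketch below is an illustration and not part of the original analysis: it recreates the simulation with the same seed and tabulates a few empirical quantiles of the means next to the quantiles of the normal distribution predicted by the Central Limit Theorem. The chosen probabilities and the name comparison are assumptions made for this example.

```r
# Recreate the simulation with the report's seed and rate
lambda <- 0.2
n <- 40
set.seed(1000)
simData <- matrix(rexp(1000 * n, lambda), nrow = 1000, ncol = n)
distMean <- rowMeans(simData)

# Probabilities at which to compare quantiles (arbitrary choice for this sketch)
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)

# Empirical quantiles of the simulated means
empirical <- as.numeric(quantile(distMean, probs))

# Quantiles of the normal distribution predicted by the Central Limit Theorem
theoretical <- qnorm(probs, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n))

comparison <- data.frame(prob = probs,
                         empirical = empirical,
                         theoretical = theoretical,
                         difference = empirical - theoretical)
print(comparison)
```

Small differences across the table correspond to points lying near the reference line in the Q-Q plot, and larger differences at the extreme probabilities correspond to points drifting away from the line in the tails.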