This report investigates the exponential distribution in R and compares the distribution of its sample means with what the Central Limit Theorem predicts. The exponential distribution is simulated in R with \[rexp(n, \lambda)\], where \[\lambda\] is the rate parameter. The mean of the exponential distribution is \[1/\lambda\] and the standard deviation is also \[1/\lambda\]. All simulations use \[\lambda = 0.2\]. The report examines the distribution of averages of 40 exponentials, based on 1000 simulations.
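For reference, the quantities the Central Limit Theorem predicts for this setup can be worked out directly. The symbols \[X_i\] (a single exponential draw) and \[\bar{X}_{40}\] (the average of 40 draws) are notation introduced here for illustration, not taken from the original report:

\[
\bar{X}_{40} = \frac{1}{40}\sum_{i=1}^{40} X_i, \qquad
E\left[\bar{X}_{40}\right] = \frac{1}{\lambda} = 5, \qquad
\mathrm{SD}\left(\bar{X}_{40}\right) = \frac{1/\lambda}{\sqrt{40}} = \frac{5}{\sqrt{40}} \approx 0.79
\]

Under the theorem, the averages should therefore be approximately normal, centred at 5 with a standard deviation of about 0.79.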
lambda <- 0.2
# Create the simulation data: 1000 simulations (rows) of 40 exponentials (columns)
set.seed(1000)
simData <- matrix(rexp(1000 * 40, lambda), nrow = 1000, ncol = 40)
This simulation shows the sample mean and compares it to the theoretical mean of the distribution.
# Mean of each simulated sample of 40 exponentials (one per row)
distMean <- rowMeans(simData)
t_mean <- 1 / lambda       # theoretical mean
s_mean <- mean(distMean)   # sample mean: average of the 1000 simulated means
hist(distMean, breaks = 50, main = "The distribution of 1000 means of 40 random exponentials",
     xlab = "Value of means",
     ylab = "Frequency of means", col = "pink")
abline(v = t_mean, lty = 1, lwd = 2, col = "blue")
abline(v = s_mean, lty = 1, lwd = 2, col = "yellow")
legend("topright", lty = 1, lwd = 2, col = c("blue", "yellow"),
       legend = c("Theoretical Mean", "Sample Mean"))
The theoretical mean of the distribution is 5 and the sample mean is 4.9869634. The histogram above marks the theoretical mean with the blue line and the sample mean with the yellow line. The two lines nearly coincide: the sample mean differs from the theoretical mean by only about 0.013.
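One way to put "very close" on a scale is to compare the gap with the standard error expected for a mean taken over all 40,000 draws. The sketch below assumes the same seed and rate as the report; `se_s_mean` and `z_gap` are illustrative names that do not appear in the original analysis.

```r
# Rebuild the simulated data with the same seed and rate as the report.
lambda <- 0.2
set.seed(1000)
simData <- matrix(rexp(1000 * 40, lambda), nrow = 1000, ncol = 40)
distMean <- rowMeans(simData)

t_mean <- 1 / lambda        # theoretical mean
s_mean <- mean(distMean)    # mean of the 1000 simulated averages

# Theoretical standard error of s_mean: sd of one draw / sqrt(total draws).
# (se_s_mean and z_gap are hypothetical helper names for this sketch.)
se_s_mean <- (1 / lambda) / sqrt(1000 * 40)

# Gap between simulated and theoretical mean, in standard-error units.
z_gap <- (s_mean - t_mean) / se_s_mean
print(c(theoretical = t_mean, simulated = s_mean, se = se_s_mean, z = z_gap))
```

The standard error works out to 0.025, so the gap of about 0.013 reported above is roughly half a standard error, well within ordinary simulation noise.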
This simulation shows how variable the simulated data are (via variance) and compares that with the theoretical variance of the distribution.
# Variance of each simulated sample of 40 exponentials (one per row)
distVar <- apply(simData, 1, var)
t_var <- (1 / lambda)^2   # theoretical variance
# Sample variance: variance of all 40,000 simulated values pooled together
s_var <- mean(simData^2) - mean(simData)^2
hist(distVar, breaks = 40,
     main = "The distribution of 1000 variances of 40 random exponentials",
     xlab = "Value of Variance", ylab = "Frequency of Variance", col = "light blue")
abline(v = t_var, lty = 1, lwd = 2, col = "blue")
abline(v = s_var, lty = 1, lwd = 2, col = "yellow")
legend("topright", lty = 1, lwd = 2, col = c("blue", "yellow"),
       legend = c("Theoretical Variance", "Sample Variance"))
The theoretical variance of the distribution is 25 and the sample variance, computed over all simulated values, is 25.1238078. The histogram above marks the theoretical variance with the blue line and the sample variance with the yellow line. The figure also shows that the sample variance is very close to the theoretical variance.
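Since the report is about averages of 40 exponentials, a complementary check is the variance of the 1000 sample means themselves, which theory puts at \[(1/\lambda)^2/40 = 0.625\]. The following is a minimal sketch under the same seed and rate; `t_var_mean` and `s_var_mean` are illustrative names, not part of the original analysis.

```r
# Rebuild the simulated averages with the same seed and rate as the report.
lambda <- 0.2
n <- 40
set.seed(1000)
simData <- matrix(rexp(1000 * n, lambda), nrow = 1000, ncol = n)
distMean <- rowMeans(simData)

# Theoretical variance of an average of n draws: sigma^2 / n.
t_var_mean <- (1 / lambda)^2 / n
# Observed variance across the 1000 simulated averages.
s_var_mean <- var(distMean)

print(c(theoretical = t_var_mean, simulated = s_var_mean))
```

If the Central Limit Theorem describes the averages well, the two printed values should be close to each other.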
This simulation checks whether the distribution of the sample means is normal. A density plot and a Q-Q plot are used to check normality visually.
# Density plot of distribution means
library(ggplot2)
ggplot(data.frame(distMean), aes(x = distMean)) +
  labs(title = "Density plot of the distribution of 1000 means of 40 random exponentials",
       x = "Distribution Means",
       y = "Density") +
  geom_histogram(aes(y = after_stat(density)),  # Histogram with density instead of count on y-axis
                 binwidth = .5,
                 colour = "black", fill = "white") +
  geom_density(alpha = .2, fill = "#FF6666")    # Overlay with transparent density plot
The density plot above is roughly bell shaped, which is consistent with the sample means being approximately normally distributed.
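To make the visual comparison more direct, the normal density implied by the Central Limit Theorem can be drawn over the histogram of the means. This is a sketch using base graphics under the same seed and rate; `clt_mean` and `clt_sd` are illustrative names introduced here.

```r
# Rebuild the simulated averages with the same seed and rate as the report.
lambda <- 0.2
n <- 40
set.seed(1000)
distMean <- rowMeans(matrix(rexp(1000 * n, lambda), nrow = 1000, ncol = n))

# Mean and standard deviation of the normal curve predicted by the CLT.
clt_mean <- 1 / lambda
clt_sd <- (1 / lambda) / sqrt(n)

# Histogram on the density scale, with the predicted normal curve on top.
hist(distMean, breaks = 50, freq = FALSE, col = "white",
     main = "Sample means with the CLT normal density",
     xlab = "Distribution Means", ylab = "Density")
curve(dnorm(x, mean = clt_mean, sd = clt_sd), add = TRUE, lwd = 2, col = "blue")
```

The closer the histogram bars follow the blue curve, the better the normal approximation; a slightly heavier right tail would reflect the skew of the underlying exponential.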
# Q-Q plot of distribution means
ggplot(data.frame(distMean), aes(sample = distMean)) +
  stat_qq(color = "blue") +
  stat_qq_line() +
  labs(title = "Q-Q plot of the distribution of 1000 means of 40 random exponentials")
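A Q-Q plot compares the quantiles of a sample with the corresponding quantiles of the normal distribution. A reference line is also plotted; by default stat_qq_line() draws it through the first and third quartiles. The sample means, shown in blue, fall close to this line, so their distribution is approximately normal.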
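The same comparison can be read off numerically by putting a few sample quantiles next to the quantiles of the normal distribution that the Central Limit Theorem predicts. This is a minimal sketch under the same seed and rate; the chosen probabilities and the names `probs`, `sample_q`, and `normal_q` are assumptions made for illustration.

```r
# Rebuild the simulated averages with the same seed and rate as the report.
lambda <- 0.2
n <- 40
set.seed(1000)
distMean <- rowMeans(matrix(rexp(1000 * n, lambda), nrow = 1000, ncol = n))

# Probabilities at which to compare the two sets of quantiles (illustrative choice).
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)

# Quantiles of the simulated means and of the CLT normal approximation.
sample_q <- quantile(distMean, probs)
normal_q <- qnorm(probs, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n))

print(round(cbind(sample = sample_q, normal = normal_q), 3))
```

Rows where the two columns nearly agree correspond to points lying on the reference line of the Q-Q plot.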