In this project I investigate the exponential distribution in R and compare it with the predictions of the Central Limit Theorem. I also provide explanations and code to ensure that the investigation is reproducible.
This report is based on the distribution of averages of 40 exponentials over a thousand simulations.
Code to set the chosen parameters. The values for lambda, the number of exponentials, and the number of simulations were specified in the instructions.
set.seed(2016)
lambda <- .2
exponumber <- 40
simunumber <- 1000
data_simulation <- matrix(rexp(simunumber*exponumber, rate = lambda), simunumber, exponumber)
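The layout of the simulation matrix matters for the later steps: each row is one simulation and each column is one of the 40 draws, so row means give one average per simulation. The following is a minimal sketch of a sanity check under that assumption; the names `n_exp`, `n_sim`, `sims` and `row_means` are illustrative and not part of the original code.

```r
# Sanity check of the simulation layout (illustrative names, same parameters as the report)
set.seed(2016)
lambda <- 0.2
n_exp <- 40      # exponentials per simulation
n_sim <- 1000    # number of simulations

# One row per simulation, one column per exponential draw
sims <- matrix(rexp(n_sim * n_exp, rate = lambda), nrow = n_sim, ncol = n_exp)

print(dim(sims))            # number of rows and columns
row_means <- rowMeans(sims) # one average per simulation
print(length(row_means))
```

If the matrix were built with the dimensions swapped, `rowMeans` would average 1000 values per row instead of 40, and the variance of the averages would be much smaller than expected.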
Code to plot 1,000 random draws from the exponential distribution:
plot(rexp(1000, lambda), pch = 20, cex = .6, main = "The exponential distribution with rate 0.2 and 1,000 observations", col = "#2e8130")
Code to plot the averages of 40 exponentials in a histogram:
averages <- rowMeans(data_simulation)
hist(averages, xlab = "Averages", ylab = "Frequency", main = "Histogram of the averages of 40 exponentials", col = "#2e8130")
Code to calculate the sample mean and the theoretical mean of the distribution and to plot the comparison:
sample_mean <- mean(averages)
cat("Sample Mean: ", sample_mean)
## Sample Mean: 4.979186
theo_mean <- 1/lambda
cat("Theoretical Mean: ", theo_mean)
## Theoretical Mean: 5
hist(averages, col="#2e8130", main="Sample vs Theoretical Mean", breaks=20)
# abline() draws on the current plot and returns nothing useful, so no assignment is needed
abline(v = theo_mean, lwd = 4, col = "#c728c9")
abline(v = sample_mean, lwd = 2, col = "#f3eb17")
legend(6.2, 90, c("Theoretical", "Sample"), lty = c(1, 1), lwd = c(4, 2), col = c("#c728c9", "#f3eb17"))
Code to calculate the sample and theoretical variances:
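To judge whether the gap between the sample mean and the theoretical mean is meaningful, it can be compared with the Monte Carlo standard error of the mean of the 1000 averages. The sketch below is one way to do this, not part of the original analysis; the names `avgs`, `abs_diff` and `mc_se` are illustrative.

```r
# Compare the gap between sample and theoretical mean with its Monte Carlo standard error
set.seed(2016)
lambda <- 0.2
n_exp <- 40
n_sim <- 1000
avgs <- rowMeans(matrix(rexp(n_sim * n_exp, rate = lambda), n_sim, n_exp))

theo_mean <- 1 / lambda
abs_diff <- abs(mean(avgs) - theo_mean)  # distance between the two means
mc_se <- sd(avgs) / sqrt(n_sim)          # standard error of the mean of the averages

cat("Absolute difference:", abs_diff, "\n")
cat("Monte Carlo standard error:", mc_se, "\n")
```

With the values reported above (sample mean 4.979186, sample variance 0.6379013), the difference is about 0.021 and the standard error about 0.025, so the gap is within one standard error.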
sample_var <- var(averages)
cat("Sample Variance: ", sample_var)
## Sample Variance: 0.6379013
theo_var <- (1/lambda)^2/exponumber
cat("Theoretical Variance: ", theo_var)
## Theoretical Variance: 0.625
As shown above, the sample variance (about 0.638) is close to the theoretical variance (0.625), a difference of about 2%.
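The theoretical variance used above follows from the fact that a single exponential has standard deviation 1/lambda, so the mean of n independent draws has variance (1/lambda)^2 / n. The sketch below spells out that step and computes the relative difference from the simulated variance; `sigma`, `avgs` and `rel_diff` are illustrative names, not part of the original code.

```r
# Theoretical variance of the mean of n_exp exponentials, step by step
lambda <- 0.2
n_exp <- 40
n_sim <- 1000

sigma <- 1 / lambda            # standard deviation of one exponential draw
theo_var <- sigma^2 / n_exp    # variance of the average of n_exp draws
theo_sd <- sqrt(theo_var)      # standard error of that average

# Simulated counterpart, using the same seed and parameters as the report
set.seed(2016)
avgs <- rowMeans(matrix(rexp(n_sim * n_exp, rate = lambda), n_sim, n_exp))
rel_diff <- abs(var(avgs) - theo_var) / theo_var

cat("Theoretical variance:", theo_var, "\n")
cat("Relative difference from sample variance:", rel_diff, "\n")
```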
Code to overlay normal density curves, one with the sample parameters and one with the theoretical parameters, on the histogram of the averages:
hist(averages, density = 20, breaks = 20, prob = TRUE, xlab = "40 exponentials averages", ylab = "Density", main = "Distribution Analysis", col = "#f0e8f0")
curve(dnorm(x, mean = sample_mean, sd = sqrt(sample_var)), col = "#e6b818", lwd=2, lty = "dotted", add = TRUE, yaxt="n")
curve(dnorm(x, mean = theo_mean, sd = sqrt(theo_var)), col = "#18aae6", lwd=2, lty = "dotted", add = TRUE, yaxt="n")
As seen above, the histogram is well approximated by the normal distribution. To check this further, I propose to investigate whether the distribution of the averages is approximately normal. By the Central Limit Theorem, the averages of 40 exponentials should approximately follow a normal distribution.
Code to plot the histogram of the averages with its kernel density estimate:
hist(averages, prob = TRUE, col = "#f0e8f0", main = "Density of the Averages", density = 20, breaks = 20)
lines(density(averages), lwd = 2, col = "#e6b818")
The Distribution Analysis plot and the Density of the Averages plot are similar. Therefore, I can infer that the distribution of the averages is approximately normal.
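A visual comparison of two histograms is somewhat subjective, so a normal Q-Q plot of the standardized averages can serve as an additional check. This is a sketch of one common approach, not part of the original analysis; `avgs`, `z` and `coverage` are illustrative names.

```r
# Normal Q-Q plot of the standardized averages
set.seed(2016)
lambda <- 0.2
n_exp <- 40
n_sim <- 1000
avgs <- rowMeans(matrix(rexp(n_sim * n_exp, rate = lambda), n_sim, n_exp))

# Standardize with the theoretical mean and standard error of the average
z <- (avgs - 1 / lambda) / ((1 / lambda) / sqrt(n_exp))

qqnorm(z, main = "Normal Q-Q plot of standardized averages", pch = 20, cex = 0.6)
qqline(z, col = "#c728c9", lwd = 2)

# Share of standardized averages within 1.96 of zero (about 0.95 under normality)
coverage <- mean(abs(z) <= 1.96)
cat("Share within +/- 1.96:", coverage, "\n")
```

Points close to the reference line support the normal approximation. Because the average of 40 exponentials follows a gamma distribution with a slight right skew, some deviation in the upper tail is to be expected.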