The Central Limit Theorem tells us that, over repeated trials, the distribution of sample means drawn from the exponential distribution will approach the normal distribution. It is reasonable to expect some level of symmetry in the data, but when the simulated distribution is overlaid with the normal curve we see that it is not exactly normally distributed. However, treating the simulated means as normally distributed, a 95% confidence interval does not provide sufficient evidence to reject the hypothesis that the simulated data is centered around a mean of 5.
To generate the random data we set the seed to 42 and used the rexp function to populate a 1000 by 40 matrix, so that each row holds one simulated sample of 40 exponential draws. From this matrix we calculated the mean of each of the 1000 rows. Using the dplyr package, the mean of the 1000 calculated means was found to be 4.9865083, which is very close to the expected value of 5.
The exponential distribution is known to have a mean and a standard deviation that are both equal to 1/lambda. We let lambda equal 0.2, so the mean and standard deviation are both equal to 5.
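As a quick sanity check on these theoretical values, the sketch below draws a large exponential sample and compares its mean and standard deviation with 1/lambda. The sample size of 100,000 and the variable names are assumptions chosen for illustration; they are not part of the original analysis.

```r
# Illustrative check (not part of the original analysis): the exponential
# distribution with rate lambda has mean and standard deviation 1/lambda.
lambda <- 0.2
theoretical_value <- 1 / lambda   # 5 for both the mean and the sd

set.seed(42)
draws <- rexp(100000, rate = lambda)  # sample size is an arbitrary assumption

# Both estimates should land close to theoretical_value
check <- c(mean = mean(draws), sd = sd(draws))
print(check)
```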
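The theoretical variance of the exponential distribution is 1/lambda^2 = 25. The simulated figure of 24.865 is the square of the simulated mean (4.9865083^2); because the mean and standard deviation of the exponential distribution are equal, it serves as an estimate of 1/lambda^2 and is again close to the expected value. It is not a sample variance of the simulated means, whose theoretical variance is (1/lambda)^2/n = 0.625.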
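For comparison, the variance of the sample means can also be estimated directly. The sketch below is an assumed alternative, not the computation used in this report: it rebuilds the simulation with the same seed and parameters and compares var() of the row means with the Central Limit Theorem value (1/lambda)^2/n = 0.625. The names sim_means, var_of_means and expected_var_of_means are hypothetical.

```r
# Assumed alternative check: variance of the 1000 sample means
n <- 40
lambda <- 0.2
nosim <- 1000

set.seed(42)
sim <- matrix(rexp(n * nosim, rate = lambda), nrow = nosim)
sim_means <- rowMeans(sim)   # one mean per simulated sample of 40 draws

# The CLT predicts Var(sample mean) = (1/lambda)^2 / n
expected_var_of_means <- (1 / lambda)^2 / n
var_of_means <- var(sim_means)

print(c(expected = expected_var_of_means, simulated = var_of_means))
```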
The histogram displays the 1000 simulated sample means overlaid with the normal density function that has mean 1/lambda and standard deviation (1/lambda)/sqrt(n). The other graph is the QQ-plot of the simulated means; the approximate linearity of the chart strengthens the argument for normality.
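The visual impression of linearity can be backed by a number. As a minimal sketch, assuming the same seed and parameters as the simulation, the correlation between the theoretical normal quantiles and the sample means can be computed with qqnorm(); a value close to 1 indicates a nearly straight QQ-plot. This check and the name qq_cor are illustrative additions, not part of the original analysis.

```r
# Illustrative check: how close to a straight line is the normal QQ-plot?
n <- 40
lambda <- 0.2
nosim <- 1000

set.seed(42)
sim_means <- rowMeans(matrix(rexp(n * nosim, rate = lambda), nrow = nosim))

# qqnorm() returns theoretical quantiles (x) paired with the data (y);
# plot.it = FALSE skips drawing the chart
qq <- qqnorm(sim_means, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)

print(qq_cor)
```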
To test whether the simulated means are centered on the expected value, we use the Student's t distribution to generate a 95% confidence interval: (4.935, 5.038). Since the confidence interval contains the expected mean value of 5, there is not sufficient evidence to suggest that the simulated data has an expected mean value different from 5.
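The interval can be reproduced by hand, which makes the role of the t distribution explicit. The sketch below assumes the same seed and parameters as the simulation, computes mean ± t * s / sqrt(N) over the N = 1000 sample means, and compares the result with t.test(). The names manual_ci and builtin_ci are hypothetical.

```r
# Sketch: 95% t confidence interval for the mean of the sample means
n <- 40
lambda <- 0.2
nosim <- 1000

set.seed(42)
sim_means <- rowMeans(matrix(rexp(n * nosim, rate = lambda), nrow = nosim))

N <- length(sim_means)
std_error <- sd(sim_means) / sqrt(N)   # standard error of the grand mean
t_crit <- qt(0.975, df = N - 1)        # two-sided 95% critical value

manual_ci <- mean(sim_means) + c(-1, 1) * t_crit * std_error
builtin_ci <- as.numeric(t.test(sim_means)$conf.int)

print(round(manual_ci, 3))
print(all.equal(manual_ci, builtin_ci))
```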
When simulating data to examine how means of exponential draws approach the normal distribution, we found that even with a limited sample size of 40 the distribution of the means approximates a normal distribution. The simulation mean was not significantly different from the expected value of 5. Also, the QQ-plot shows a reasonable approximation to a linear function.
library(dplyr)
library(datasets)
library(rmarkdown)
library(tinytex)
library(ggplot2)
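# Simulation parameters: 1000 simulated samples of n = 40 exponential draws
n <- 40
lambda <- 0.2
nosim <- 1000
set.seed(42)
# Each row of the 1000 x 40 matrix is one simulated sample
simulation_data <- matrix(data = rexp(n * nosim, rate = lambda), nrow = nosim)
# Mean of each row (MARGIN = 1), giving 1000 sample means
simulation_means <- data.frame(means = apply(simulation_data,
                                             MARGIN = 1, FUN = mean))
# Raw draws and row means side by side; used for the QQ-plot
df <- cbind(simulation_data, simulation_means)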
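# Grand mean of the 1000 sample means
mean_simulation <- simulation_means %>%
  summarise(sim_mean = mean(means)) %>%
  unlist()
# Theoretical variance of the exponential distribution, 1/lambda^2
expected_variance <- round((1 / lambda)^2, 3)
# Squared simulated mean: estimates 1/lambda^2 because mean = sd for the
# exponential distribution. This is not a sample variance of the means.
variance_simulation <- round(mean_simulation^2, 3)
# 95% t confidence interval for the mean of the sample means
my_CI <- round(t.test(simulation_means$means)$conf.int, 3)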
# Histogram of the sample means with the CLT normal density overlaid
simulation_means %>%
  ggplot(aes(means)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.25, alpha = 0.8,
                 fill = "light blue", color = "black") +
  labs(title = "Distribution of Exponential Function") +
  stat_function(fun = dnorm,
                args = list(mean = 1 / lambda, sd = 1 / lambda / sqrt(n)))
# Normal QQ-plot of the sample means
ggplot(df, aes(sample = means)) +
  stat_qq(col = "red") +
  labs(title = "QQ-Plot of Simulation Means", y = "means")