In this project I investigate the exponential distribution in R and compare the distribution of its sample means with what the Central Limit Theorem predicts.
n <- 40
lambda <- 0.2
simul <- 1000
set.seed(1)
simul_data <- matrix(rexp(n * simul, rate = lambda), simul)
simul_means <- apply(simul_data, 1, mean) # mean of each row; equivalent to rowMeans(simul_data)
hist(simul_means, breaks = 40, col = "yellow",
     main = "Distribution of 1000 Simulated Means",
     xlab = "Mean of 40 Samples")
simul_mean <- mean(simul_means)
simul_mean
## [1] 4.990025
theor_mean <- 1 / lambda # Theoretical Mean
theor_mean
## [1] 5
A one-sample t-test of the simulated means against the theoretical mean gives a p-value of 0.6882554, so we fail to reject the null hypothesis that the true mean equals 5 (p > 0.05).
t.test(simul_means, mu = theor_mean)$p.value
## [1] 0.6882554
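As a rough worked check of this p-value, the one-sample t statistic can be written out by hand. This is a sketch that takes N = 1000 as the number of simulated means and plugs in the sample mean and standard deviation printed elsewhere in this report (4.990025 and 0.7859435):

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{N}} = \frac{4.990025 - 5}{0.7859435 / \sqrt{1000}} \approx -0.40
$$

With 999 degrees of freedom, a statistic this close to zero is consistent with the two-sided p-value of about 0.69 reported above.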
As a sanity check, testing the simulated means against their own sample mean gives a p-value of 1. This is expected by construction: the t statistic is exactly zero when the hypothesised mean equals the sample mean, so the result carries no information about the true mean of the data.
t.test(simul_means, mu = simul_mean)$p.value
## [1] 1
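The following minimal sketch illustrates why a test against a sample's own mean returns a p-value of 1 for any data. The vector `x` and the helper `t_stat` are hypothetical and are not part of the analysis above; the sample is arbitrary and could be replaced by any non-constant numeric vector.

```r
# Hypothetical illustration data; any non-constant numeric sample would do.
set.seed(42)
x <- rnorm(25, mean = 10, sd = 3)

# One-sample t statistic by hand: (sample mean - mu0) / standard error.
t_stat <- function(x, mu0) (mean(x) - mu0) / (sd(x) / sqrt(length(x)))

# Testing against the sample's own mean makes the numerator zero.
t_own <- t_stat(x, mean(x))
p_own <- t.test(x, mu = mean(x))$p.value

t_own
p_own
```

Here `t_own` is zero and `p_own` is 1 whatever the sample contains, which is why this check says nothing about how close the sample mean is to the true mean.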
simul_var <- var(simul_means)
simul_var
## [1] 0.6177072
theor_var <- (1 / lambda)^2 / n # Theoretical Variance
theor_var
## [1] 0.625
The simulated and theoretical variances of the sample mean are similar.
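The theoretical value follows from the variance of a sample mean. As a worked check, assuming the 40 draws in each sample are independent exponentials with rate λ = 0.2 (so each draw has variance σ² = 1/λ²):

$$
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{(1/\lambda)^2}{n} = \frac{5^2}{40} = 0.625,
\qquad
\operatorname{SD}(\bar{X}) = \frac{1}{\lambda \sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906
$$

The simulated variance of 0.6177072 is within about 1.2% of this value.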
Next, calculate the standard deviations; these will be used to compare the distributions in the next section.
simul_SD <- sd(simul_means)
simul_SD
## [1] 0.7859435
theor_SD <- 1/(lambda * sqrt(n))
theor_SD
## [1] 0.7905694
require(ggplot2)
## Loading required package: ggplot2
df <- data.frame(simul_means)
ggplot(df, aes(x = simul_means)) +
  # Histogram scaled to density so the normal curves are comparable
  geom_histogram(aes(y = ..density..), colour = "black",
                 fill = "yellow", bins = 40) +
  # Vertical lines at the simulated (blue) and theoretical (red) means
  geom_vline(aes(xintercept = simul_mean), colour = "blue") +
  geom_vline(aes(xintercept = theor_mean), colour = "red") +
  # Normal densities from the simulated (blue) and theoretical (red) parameters
  stat_function(fun = dnorm, args = list(mean = simul_mean, sd = simul_SD), colour = "blue", size = 0.5) +
  stat_function(fun = dnorm, args = list(mean = theor_mean, sd = theor_SD), colour = "red", size = 0.5) +
  labs(title = "Distribution of Means of 1000 Simulations of 40 Samples",
       x = "Mean of 40 Samples",
       y = "Density") +
  theme_bw()
The plot above compares the distribution of the simulated means with the normal density ‘dnorm’, drawn once with the simulated mean and standard deviation and once with the theoretical ones. The two curves nearly coincide and both follow the histogram closely, so the simulated means are approximately normally distributed, as the Central Limit Theorem predicts.
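A normal Q-Q plot is another common way to judge approximate normality. The sketch below is an addition to the original analysis: it re-creates the simulated means with the same parameters and seed, using `rowMeans` in place of `apply`. The name `qq_cor` is introduced here purely for illustration.

```r
# Re-create the simulated means with the same parameters as the report.
n <- 40
lambda <- 0.2
simul <- 1000
set.seed(1)
simul_data <- matrix(rexp(n * simul, rate = lambda), simul)
simul_means <- rowMeans(simul_data)

# Normal Q-Q plot: points close to the reference line suggest normality.
qq <- qqnorm(simul_means, main = "Normal Q-Q Plot of Simulated Means")
qqline(simul_means, col = "red")

# Correlation between theoretical and sample quantiles as a rough summary;
# values near 1 indicate the points lie close to a straight line.
qq_cor <- cor(qq$x, qq$y)
qq_cor
```

Because the exponential distribution is right-skewed, some departure from the line in the upper tail would not be surprising for samples of only 40 observations.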