INTRODUCTION

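In this project I investigate the exponential distribution in R and compare the distribution of its sample means with what the Central Limit Theorem predicts. The simulation draws 1000 samples of 40 exponential values each, with rate lambda = 0.2, and takes the mean of each sample.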

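n <- 40        # observations per sample
lambda <- 0.2  # rate of the exponential distribution
simul <- 1000  # number of simulated samples
set.seed(1)    # fix the seed so the results are reproducible
# One row per simulation, n exponential draws per row
simul_data <- matrix(rexp(n * simul, rate = lambda), simul)
simul_means <- apply(simul_data, 1, mean) # equivalent to rowMeans(simul_data)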
hist(simul_means, breaks = 40, col = "yellow", 
     main = "Distribution of 1000 Simulated Means", 
     xlab = "Mean of 40 Samples")

Mean of Means

simul_mean <- mean(simul_means)
simul_mean 
## [1] 4.990025

Theoretical Mean

theor_mean <- 1 / lambda # Theoretical Mean
theor_mean
## [1] 5

Student’s t Test

A one-sample t test of the simulated means against the theoretical mean gives a p-value of 0.6882554, so we fail to reject the null hypothesis that the true mean equals the theoretical mean (p > 0.05).

t.test(simul_means, mu = theor_mean)$p.value
## [1] 0.6882554
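
For reference, the p-value above can be reproduced by hand from the t statistic. The following is a minimal sketch, not part of the original analysis; the names `mu0`, `se`, `t_stat` and `p_manual` are introduced here for illustration, and the simulated means are regenerated so the block runs on its own.

```r
# Re-create the simulated means with the same seed and parameters as above
n <- 40
lambda <- 0.2
simul <- 1000
set.seed(1)
simul_means <- rowMeans(matrix(rexp(n * simul, rate = lambda), simul))

mu0 <- 1 / lambda                        # hypothesised mean under H0
se <- sd(simul_means) / sqrt(simul)      # standard error of the mean of means
t_stat <- (mean(simul_means) - mu0) / se # one-sample t statistic
p_manual <- 2 * pt(-abs(t_stat), df = simul - 1) # two-sided p-value

# Should agree with the built-in test up to floating-point error
all.equal(p_manual, t.test(simul_means, mu = mu0)$p.value)
```

The test has 999 degrees of freedom because it treats the 1000 simulated means as the observations.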

Testing the simulated means against their own sample mean gives a p-value of 1. This is expected by construction: the t statistic is exactly zero when the hypothesised mean equals the sample mean, so this result is a sanity check rather than additional evidence.

t.test(simul_means, mu = simul_mean)$p.value
## [1] 1

Variance Comparisons

simul_var <- var(simul_means)
simul_var
## [1] 0.6177072
theor_var  <- (1 / lambda)^2 / n # Theoretical Variance
theor_var
## [1] 0.625

The variance of the simulated means (0.6177072) is close to the theoretical variance of the sample mean (0.625).

Calculate the standard deviations; these are used to compare the distributions in the next section.

simul_SD <- sd(simul_means)
simul_SD
## [1] 0.7859435
theor_SD <- 1/(lambda * sqrt(n))
theor_SD
## [1] 0.7905694
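
The theoretical values used above follow from the fact that an exponential distribution with rate lambda has mean and standard deviation both equal to 1/lambda, together with the usual rule that averaging n independent observations divides the variance by n. As a short worked derivation (the symbols \bar{X} for the sample mean and \sigma for the population standard deviation are notation introduced here, not in the code):

```latex
\begin{align*}
\mathrm{E}[\bar{X}] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5 \\
\operatorname{Var}(\bar{X}) &= \frac{\sigma^2}{n} = \frac{(1/\lambda)^2}{n} = \frac{25}{40} = 0.625 \\
\operatorname{SD}(\bar{X}) &= \sqrt{\operatorname{Var}(\bar{X})} = \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906
\end{align*}
```

These match `theor_mean`, `theor_var` and `theor_SD` as computed in the code.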

Compare Distributions

require(ggplot2)
## Loading required package: ggplot2
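df <- data.frame(simul_means)
ggplot(df, aes(x = simul_means)) +
  # Histogram scaled to density so it is comparable with the normal curves
  # (after_stat(density) replaces the deprecated ..density..)
  geom_histogram(aes(y = after_stat(density)), colour = "black",
                 fill = "yellow", bins = 40) +
  # Vertical lines at the simulated and theoretical means
  geom_vline(aes(xintercept = simul_mean)) +
  geom_vline(aes(xintercept = theor_mean)) +
  # Normal densities: simulated parameters (blue), theoretical parameters (red)
  # (linewidth replaces size for lines in ggplot2 3.4.0 and later)
  stat_function(fun = dnorm, args = list(mean = simul_mean, sd = simul_SD),
                colour = "blue", linewidth = 0.5) +
  stat_function(fun = dnorm, args = list(mean = theor_mean, sd = theor_SD),
                colour = "red", linewidth = 0.5) +
  labs(title = "Distribution of Means of 1000 Simulations of 40 Samples",
       x = "Mean of 40 Samples",
       y = "Density") +
  theme_bw()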

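The plot compares the distribution of the simulated means with the normal density curve (‘dnorm’), drawn once with the simulated mean and standard deviation (blue) and once with the theoretical ones (red). The simulated and theoretical parameters are almost identical, and the histogram follows the normal curves reasonably well, so the simulated means are approximately normally distributed, as the Central Limit Theorem predicts.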
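
The visual comparison could be backed up with a simple numerical check and a normal Q-Q plot. The following is a sketch only and is not part of the original analysis; `z` and `coverage` are names introduced here, and the simulated means are regenerated so the block is self-contained. If the means were exactly normal, about 95% of them would fall within 1.96 theoretical standard deviations of the theoretical mean.

```r
# Re-create the simulated means with the same seed and parameters as above
n <- 40
lambda <- 0.2
simul <- 1000
set.seed(1)
simul_means <- rowMeans(matrix(rexp(n * simul, rate = lambda), simul))

theor_mean <- 1 / lambda
theor_SD <- 1 / (lambda * sqrt(n))

# Share of simulated means inside the central 95% band of the theoretical normal
z <- qnorm(0.975)
coverage <- mean(abs(simul_means - theor_mean) <= z * theor_SD)
coverage

# Normal Q-Q plot: points close to the reference line suggest approximate normality
qqnorm(simul_means)
qqline(simul_means, col = "red")
```

Some departure from the line in the upper tail would not be surprising, since the exponential distribution is right-skewed and each mean is based on only 40 observations.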