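This project investigates the exponential distribution in R and compares the distribution of its sample means with what the Central Limit Theorem predicts. With rate lambda = 0.2, the exponential distribution has a mean and a standard deviation of 1/lambda = 5. The analysis addresses three questions: how the simulated mean compares with the theoretical mean, how the simulated variance compares with the theoretical variance, and whether the distribution of the simulated means is approximately normal.
Here, each simulation averages 40 exponentials with lambda = 0.2, and we run 1000 simulations, as given in the specs.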
# Required library
library(ggplot2)
# rate parameter of the exponential distribution
lambda <- 0.2
# each simulation averages 40 exponentials
n <- 40
# indices of the 1000 simulations
nsims <- 1:1000
# set a seed to reproduce the data
set.seed(500)
# gather the means
means <- data.frame(x = sapply(nsims, function(x) {mean(rexp(n, lambda))}))
# let's take a look at the first few means
head(means)
## x
## 1 4.958067
## 2 5.178396
## 3 5.683279
## 4 5.619003
## 5 4.433293
## 6 5.867299
# Theoretical mean for lambda = 0.2 is
theomean <- 1/lambda
theomean
## [1] 5
# Sample mean of the simulation is
simmean <- mean(means$x)
simmean
## [1] 5.010562
The simulated mean (5.011) and the theoretical mean (5) are very close.
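One way to put a number on "very close" is a 95% confidence interval for the mean of the simulated means. The sketch below is an illustration rather than part of the original analysis: it re-creates the simulation with the same settings, and the names `sim_means`, `tt` and `ci` are assumptions introduced here.

```r
# Re-create the simulation with the settings used above
lambda <- 0.2
n <- 40
set.seed(500)
sim_means <- sapply(1:1000, function(i) mean(rexp(n, lambda)))

# One-sample t-test of the simulated means against the theoretical mean 1/lambda
tt <- t.test(sim_means, mu = 1 / lambda)

# 95% confidence interval for the mean of the simulated means
ci <- tt$conf.int
ci
```

If the interval contains 5, the simulation is consistent with the theoretical mean.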
# Theoretical standard deviation of the sample mean: (1/lambda) / sqrt(n)
theosd <- (1/lambda)/sqrt(n)
theosd
## [1] 0.7905694
# Theoretical variance of the sample mean
theovar <- theosd^2
theovar
## [1] 0.625
# Simulated standard deviation
simsd <- sd(means$x)
simsd
## [1] 0.7874779
# Simulated variance
simvar <- var(means$x)
simvar
## [1] 0.6201215
Again, the theoretical variance of the sample mean (0.625) is very close to the simulated variance (0.620).
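The theoretical values used above follow from the standard results for the mean of n independent draws. A short derivation, writing \bar{X} for the mean of n = 40 exponentials and using \mu = \sigma = 1/\lambda (the symbols are introduced here for illustration):

```latex
\begin{aligned}
E[\bar{X}] &= \mu = \frac{1}{\lambda} = \frac{1}{0.2} = 5 \\
\operatorname{Var}(\bar{X}) &= \frac{\sigma^2}{n} = \frac{(1/\lambda)^2}{n} = \frac{25}{40} = 0.625 \\
\operatorname{SD}(\bar{X}) &= \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906
\end{aligned}
```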
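# Histogram of the simulated means on the density scale, with its kernel
# density estimate (blue) and the theoretical normal curve (green)
ggplot(data = means, aes(x = x)) +
  geom_histogram(binwidth = 0.1, aes(y = ..density..)) +
  geom_density(colour = "blue", size = 2) +
  # vertical line at the simulated mean
  geom_vline(xintercept = simmean, size = 2, colour = "blue") +
  stat_function(fun = dnorm, args = list(mean = theomean, sd = theosd), colour = "green", size = 2) +
  scale_y_continuous(breaks = NULL) +
  # vertical line at the theoretical mean
  geom_vline(xintercept = theomean, size = 2, colour = "green") +
  labs(x = "Means", y = "Density")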
The graph shows that the density of the simulated means (blue curve) closely follows the normal density with mean 5 and standard deviation 0.79 (green curve), and that their means (blue and green vertical lines, respectively) nearly coincide. This indicates that the distribution of the simulated means is approximately normal, as the Central Limit Theorem predicts.
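The visual comparison can be complemented by a numerical check. The sketch below is a hedged illustration, not part of the original analysis. It standardises the simulated means with the theoretical mean and standard deviation, then computes the share that falls within 1.96 standard deviations, which would be about 0.95 for a normal distribution. It also draws a normal Q-Q plot. The names `sim_means`, `z` and `coverage` are assumptions introduced here.

```r
# Re-create the simulation with the settings used in this report
lambda <- 0.2
n <- 40
set.seed(500)
sim_means <- sapply(1:1000, function(i) mean(rexp(n, lambda)))

# Standardise with the theoretical mean and standard deviation of the sample mean
z <- (sim_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Share of standardised means within 1.96 standard deviations of zero
coverage <- mean(abs(z) <= 1.96)
coverage

# Normal Q-Q plot: points close to the reference line suggest approximate normality
qqnorm(z)
qqline(z, col = "green")
```

Some curvature in the upper tail of the Q-Q plot would not be surprising, because the exponential distribution is right-skewed and n = 40 is a moderate sample size.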