Overview

This project investigates the exponential distribution in R and compares the results with the Central Limit Theorem. The rate is lambda = 0.2, so the mean and standard deviation of the distribution are both 1/lambda = 5. The questions to be answered are as follows:

  1. Show the sample mean and compare it to the theoretical mean of the distribution.
  2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
  3. Show that the distribution is approximately normal.

Simulations

1. Sample mean vs Theoretical mean

Here, each simulation draws 40 exponentials with lambda = 0.2 and records their mean. We run 1000 such simulations, as given in the specs.

# Required library
library(ggplot2)
# lambda is 0.2
lambda = 0.2

# we will be using 40 exponentials
n = 40

# we will be running 1000 simulations
nsims = 1:1000

# set a seed to reproduce the data
set.seed(500)

# gather the means
means <- data.frame(x = sapply(nsims, function(x) {mean(rexp(n, lambda))}))

# let's take a look at the first few means
head(means)
##          x
## 1 4.958067
## 2 5.178396
## 3 5.683279
## 4 5.619003
## 5 4.433293
## 6 5.867299
# Theoretical mean for lambda = 0.2 is
theomean <- 1/lambda
theomean
## [1] 5
# Sample mean of the simulation is
simmean <- mean(means$x)
simmean
## [1] 5.010562

The simulated mean (5.011) and the theoretical mean (5) are very close.
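To put a number on "very close", the gap between the two means can be expressed in standard errors. The sketch below is self-contained and repeats the simulation with the same seed. The names sim_means, se_simmean and z are illustrative and not part of the original analysis, and the standard error assumes the 1000 simulated means are independent.

```r
# Same setup as above: 1000 means of 40 exponentials with rate 0.2
lambda <- 0.2
n <- 40
nsims <- 1000
set.seed(500)
sim_means <- replicate(nsims, mean(rexp(n, lambda)))

theomean <- 1 / lambda
simmean <- mean(sim_means)

# Standard error of the average of nsims sample means:
# sd of one sample mean, (1/lambda)/sqrt(n), divided by sqrt(nsims)
se_simmean <- (1 / lambda) / sqrt(n) / sqrt(nsims)

# Gap between simulated and theoretical mean, in standard errors
z <- (simmean - theomean) / se_simmean
cat("difference:", simmean - theomean, "\n")
cat("standard errors from theoretical mean:", z, "\n")
```

A gap of well under two standard errors is what one would expect from sampling noise alone.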

2. Sample variance vs Theoretical variance

# Theoretical standard deviation of the sample mean: sigma/sqrt(n), with sigma = 1/lambda
theosd <- (1/lambda)/sqrt(n)
theosd
## [1] 0.7905694
# Theoretical variance of the sample mean: sigma^2/n
theovar <- theosd^2
theovar
## [1] 0.625
# Simulated standard deviation of the sample means
simsd <- sd(means$x)
simsd
## [1] 0.7874779
# Simulated variance of the sample means
simvar <- var(means$x)
simvar
## [1] 0.6201215

Again, the theoretical variance of the sample mean (0.625) is very close to the simulated variance (0.620).
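The theoretical variance of the sample mean is sigma^2/n = (1/lambda)^2/n, so it should shrink in proportion to the sample size. As a rough illustration, the sketch below repeats the simulation for three sample sizes. The sizes 10 and 160 and the name var_table are assumptions added here for comparison; only n = 40 comes from the specs.

```r
lambda <- 0.2
nsims <- 1000
set.seed(500)

# Sample sizes to compare (40 is the size used in this report)
sizes <- c(10, 40, 160)

var_table <- data.frame(
  n = sizes,
  # Theoretical variance of the mean of n exponentials
  theoretical = (1 / lambda)^2 / sizes,
  # Variance of nsims simulated means for each sample size
  simulated = sapply(sizes, function(n) {
    var(replicate(nsims, mean(rexp(n, lambda))))
  })
)
print(var_table)
```

Each fourfold increase in n should cut both columns to roughly a quarter of their previous value.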

3. Is the distribution normal?

ggplot(data = means, aes(x = x)) + 
  geom_histogram(binwidth=0.1, aes(y=..density..)) +
  geom_density(colour="blue", size=2) +
  geom_vline(xintercept = simmean, size=2, colour="blue") + 
  stat_function(fun = dnorm, args = list(mean = theomean , sd = theosd), colour = "green", size=2) +
  scale_y_continuous(breaks = NULL) + 
  geom_vline(xintercept = theomean, size=2, colour="green") +
  labs(x="Means") +
  labs(y="Density")

The graph shows that the density of the simulated means (blue curve) closely follows the normal density with mean 1/lambda = 5 and standard deviation (1/lambda)/sqrt(n) = 0.79 (green curve), and that their means (blue and green vertical lines, respectively) nearly coincide. This indicates that the distribution of the simulated means is approximately normal, as the central limit theorem predicts.
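The visual comparison can be backed by two quick checks, sketched here with base R only. If the means were exactly normal, about 95% of them would fall within qnorm(0.975) theoretical standard deviations of the theoretical mean, and a normal Q-Q plot would be close to a straight line. The names sim_means and coverage are illustrative and not part of the original analysis.

```r
lambda <- 0.2
n <- 40
nsims <- 1000
set.seed(500)
sim_means <- replicate(nsims, mean(rexp(n, lambda)))

theomean <- 1 / lambda
theosd <- (1 / lambda) / sqrt(n)

# Share of simulated means inside the central 95% interval of the
# normal distribution with mean theomean and sd theosd
coverage <- mean(abs(sim_means - theomean) <= qnorm(0.975) * theosd)
coverage

# Normal Q-Q plot: points near the reference line suggest normality
qqnorm(sim_means, main = "Normal Q-Q plot of simulated means")
qqline(sim_means, col = "green", lwd = 2)
```

Some upward bending in the right tail of the Q-Q plot would not be surprising, since means of only 40 exponentials keep a little of the right skew of the underlying distribution.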