Simulation Exercise: Exponential Distribution and the Central Limit Theorem

library(ggplot2)
library(ggthemes)

Overview

This project investigates the exponential distribution and compares the distribution of its sample means with what the Central Limit Theorem predicts. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda.
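As a quick sanity check of the two facts above, one could draw a large exponential sample and compare its mean and standard deviation with 1/lambda. This is only an illustrative sketch: the seed, the sample size of 100,000, and the names rate and draws are assumptions made for this example and are not part of the analysis that follows.

```r
# Illustrative check only: rate and draws are hypothetical names.
set.seed(123)                 # arbitrary seed chosen for this sketch
rate <- 0.2                   # same rate as used in the project
draws <- rexp(100000, rate)   # large sample from the exponential distribution

# Both the sample mean and the sample sd should be close to 1/rate = 5
c(sample_mean = mean(draws), sample_sd = sd(draws), theoretical = 1 / rate)
```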

lambda = 0.2 for all simulations.
Each average is taken over 40 exponentials.
The simulation is repeated 1,000 times.

Simulations

The seed is set to 1 so that the results are reproducible.

set.seed(1)

lambda <- 0.2             # rate parameter
n <- 40                   # number of exponentials per average
n_simulations <- 1:1000   # one index per simulation

In this part, we simulate 1,000 averages of 40 exponentials each and then plot them as a histogram.

# Each element of pop is the mean of n draws from the exponential distribution
pop <- sapply(n_simulations, function(x) mean(rexp(n, lambda)))

data <- data.frame(pop)
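The same kind of simulation could also be written without an explicit loop, by drawing all values at once into a matrix and averaging each row. This is a sketch of an alternative, not the approach used in this report; n_sims, draws, and pop_alt are hypothetical names introduced here.

```r
# Sketch only: n_sims, draws and pop_alt are hypothetical names.
lambda <- 0.2
n <- 40
n_sims <- 1000

set.seed(1)
# One row per simulation, n exponential draws per row
draws <- matrix(rexp(n * n_sims, lambda), nrow = n_sims, ncol = n, byrow = TRUE)
pop_alt <- rowMeans(draws)   # one average per simulation

length(pop_alt)
summary(pop_alt)
```

The matrix form trades a little memory for speed, which matters little at this size but may help if the number of simulations grows.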

The parameters set above are the rate (lambda), the number of exponentials per average (n), and the number of simulations to run.

The histogram shows that the distribution of the averages is approximately bell-shaped.

# after_stat() requires ggplot2 >= 3.3.0; it replaces the older ..count.. notation
ggplot(data, aes(x = pop)) +
  geom_histogram(aes(fill = after_stat(count)), bins = 40) +
  labs(title = "Averages of 40 Exponentials over 1000 Simulations", y = "Frequency", x = "Mean") +
  theme_tufte()

Question 1 - Sample Mean vs Theoretical Mean

The mean of the exponential distribution is 1/lambda. Since lambda is 0.2, the theoretical mean is 1/0.2 = 5.

pop_mean <- mean(data$pop)   # mean of the 1000 simulated averages
dist_mean <- 1 / lambda      # theoretical mean
means <- data.frame(pop_mean, dist_mean)
names(means) <- c("Sample mean", "Theoretical mean")
means

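The table above shows that the sample mean and the theoretical mean are very close. The plot below marks the theoretical mean with a dashed red line and the sample mean with a solid yellow line.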

ggplot(data, aes(x = pop)) +
  geom_histogram(aes(fill = after_stat(count)), bins = 40) +
  labs(title = "Averages of 40 Exponentials over 1000 Simulations", y = "Frequency", x = "Mean") +
  theme_tufte() +
  # theoretical mean (1/lambda): dashed red line
  geom_vline(xintercept = dist_mean, linetype = "dashed", color = "red", linewidth = 1) +
  # sample mean of the simulated averages: solid yellow line
  geom_vline(xintercept = pop_mean, color = "yellow", linewidth = 1) +
  theme(legend.position = "none")

Applying the t.test function gives a 95% confidence interval for the mean of roughly 4.94 to 5.04, which contains the theoretical mean of 5.

t.test(data$pop)["conf.int"]

## $conf.int
## [1] 4.941515 5.038536
## attr(,"conf.level")
## [1] 0.95
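The interval reported by t.test can be reproduced by hand from the sample mean, the standard error, and the t quantile. The sketch below re-creates a set of simulated averages so that it runs on its own; means_40, m, se, and ci_manual are hypothetical names, and the assumption is that a one-sample, two-sided 95% interval is wanted.

```r
# Sketch only: means_40, m, se and ci_manual are hypothetical names.
set.seed(1)
means_40 <- replicate(1000, mean(rexp(40, 0.2)))   # 1000 averages of 40 exponentials

m <- mean(means_40)                             # point estimate
se <- sd(means_40) / sqrt(length(means_40))     # standard error of the mean
t_crit <- qt(0.975, df = length(means_40) - 1)  # two-sided 95% critical value

ci_manual <- m + c(-1, 1) * t_crit * se
ci_manual

# Compare with the interval computed by t.test
all.equal(ci_manual, as.numeric(t.test(means_40)$conf.int))
```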

Question 2 - Sample Variance vs Theoretical Variance

The table produced below shows that the sample variance of the averages and the theoretical variance, (1/lambda)^2/n = 0.625, are very close.

sample_var <- var(data$pop)             # variance of the simulated averages
theoretical_var <- (1 / lambda)^2 / n   # variance of the mean of n exponentials

vars <- data.frame(sample_var, theoretical_var)
names(vars) <- c("Sample Variance", "Theoretical Variance")
vars
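The theoretical variance used above follows from the variance of a mean of independent draws. Writing sigma = 1/lambda for the standard deviation of a single exponential draw, and assuming the n = 40 draws in each average are independent (as they are when generated by rexp), the derivation can be sketched as:

```latex
\[
\begin{aligned}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{\sigma^{2}}{n} \\
  &= \frac{(1/\lambda)^{2}}{n}
   = \frac{(1/0.2)^{2}}{40}
   = \frac{25}{40}
   = 0.625
\end{aligned}
\]
```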

Question 3 - Distribution

As we can see in the graph, the density of the simulated means of the exponential distribution is very close to the normal density, as the Central Limit Theorem predicts.
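A comparison of this kind could be drawn as in the sketch below, which overlays the empirical density of the simulated means and the normal density with mean 1/lambda and standard deviation (1/lambda)/sqrt(n). This is one possible way to produce such a plot, not necessarily the code behind the graph described here; sim_means, clt_mean, clt_sd, and p are hypothetical names, and the colors are arbitrary choices.

```r
library(ggplot2)

# Sketch only: sim_means, clt_mean, clt_sd and p are hypothetical names.
lambda <- 0.2
n <- 40

set.seed(1)
sim_means <- replicate(1000, mean(rexp(n, lambda)))   # 1000 averages of n exponentials

clt_mean <- 1 / lambda             # mean predicted by the Central Limit Theorem
clt_sd <- (1 / lambda) / sqrt(n)   # standard deviation predicted by the CLT

p <- ggplot(data.frame(sim_means), aes(x = sim_means)) +
  # histogram on the density scale so it is comparable with the curves
  geom_histogram(aes(y = after_stat(density)), bins = 40, fill = "grey80", color = "white") +
  # empirical density of the simulated means
  geom_density(color = "blue") +
  # normal density with the CLT mean and standard deviation
  stat_function(fun = dnorm, args = list(mean = clt_mean, sd = clt_sd),
                color = "red", linetype = "dashed") +
  labs(title = "Simulated Means vs Normal Density", x = "Mean", y = "Density")
p
```

If the two curves nearly coincide, that supports the normal approximation; a normal Q-Q plot of sim_means would be another way to check the same thing.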