Synopsis

This project investigates the distribution of sample means drawn from an exponential distribution. We simulate a large number of samples and use the central limit theorem as the framework for the analysis.

Development

Packages

The packages to be used for the project are:

library(ggplot2)
library(nortest)

Simulation

We will simulate 1000 samples, each consisting of 40 draws from an exponential distribution with rate parameter lambda = 0.2.

lambda <- 0.2 #parameter
sample_size <- 40 #sample size 
no_sim <- 1000 #total simulations

set.seed(2020) # fix the seed so the results are reproducible
# 1000 x 40 matrix: each row is one simulated sample of 40 exponentials
sim_exp <- replicate(sample_size, rexp(no_sim, rate = lambda))
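As a quick sanity check on the layout of the simulated data, the following sketch repeats the setup so that it runs on its own and inspects the shape of the result. It only illustrates why the later steps summarize by row.

```r
lambda <- 0.2
sample_size <- 40
no_sim <- 1000

set.seed(2020)
sim_exp <- replicate(sample_size, rexp(no_sim, rate = lambda))

# replicate() binds each rexp() vector of length 1000 as one column,
# so rows index the simulations and columns index the 40 draws
dim(sim_exp)

# One mean per row, i.e. one mean per simulated sample of 40 exponentials
length(rowMeans(sim_exp))
```

Because each row holds one sample, row-wise functions such as rowMeans() and apply(..., 1, var) return one value per simulation.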

Comparison of means

We compute the mean of each of the 1000 simulated samples.

mean_sim <- as.data.frame(rowMeans(sim_exp))

We then average these 1000 sample means.

m_sim <- mean(mean_sim[,1])
m_sim
## [1] 5.033948

We compare it with the expected value of an exponential distribution with lambda = 0.2. The expected value of an exponential distribution is 1/lambda.

m_exp <- 1/lambda
m_exp
## [1] 5

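The two values are very close (5.03 versus 5). To see this more clearly, we plot the distribution of the sample means; the red line marks the simulated mean and the yellow line marks the theoretical mean.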

library(ggplot2)
ggplot(mean_sim,aes(mean_sim[,1])) + 
        geom_histogram(fill="#9F7EC4", color="#FAA3F4", binwidth=0.38) +
        xlab("Mean") +
        ggtitle("Simulation of exponentials with lambda = 0.2") +
        geom_vline(xintercept = m_sim, color = "red") + 
        geom_vline(xintercept = m_exp, color = "#E8F916" )
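To put a number on how close the simulated and theoretical means are, one option is to express their difference in units of the Monte Carlo standard error. The sketch below is self-contained; the names `means`, `mc_se` and `z` are introduced here for illustration only and are not part of the original analysis.

```r
lambda <- 0.2
sample_size <- 40
no_sim <- 1000

set.seed(2020)
sim_exp <- replicate(sample_size, rexp(no_sim, rate = lambda))
means <- rowMeans(sim_exp)

# Monte Carlo standard error of the average of the 1000 sample means
mc_se <- sd(means) / sqrt(no_sim)

# Distance between the simulated and theoretical mean, in standard-error units
z <- (mean(means) - 1 / lambda) / mc_se
c(mc_se = mc_se, z = z)
```

A value of `z` within roughly plus or minus 2 suggests that the gap between the two means is of the size expected from simulation noise alone.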

Comparison of variances

We compute the variance of each of the 1000 simulated samples.

var_sim <- apply(sim_exp, 1, var)

We then average these 1000 sample variances.

v_sim <- mean(var_sim)
v_sim
## [1] 25.33102

We compare it with the variance of an exponential distribution with lambda = 0.2. The variance of an exponential distribution is 1/(lambda^2).

v_exp <- 1/(lambda^2)
v_exp
## [1] 25

The average sample variance (25.33) is close to the theoretical value of 25. The variance of the sample means is analyzed next.

The variance of sample means

v_sim_mean <- var(mean_sim[,1])
v_sim_mean 
## [1] 0.6539053

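The theoretical variance of the mean of n independent draws is the population variance divided by the sample size, that is (1/lambda)^2 / n: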

v_teo_mean <- ((1/lambda)^2)/sample_size
v_teo_mean  
## [1] 0.625

The two values are close (0.654 versus 0.625).
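The formula (1/lambda)^2 / n implies that the variance of the sample mean shrinks in proportion to the sample size. The following sketch checks this for a few sample sizes. The sizes 10, 40 and 160 are arbitrary values chosen for illustration, and the matrix() construction is an equivalent alternative to replicate(), not the code used above.

```r
lambda <- 0.2
no_sim <- 1000
set.seed(2020)

sizes <- c(10, 40, 160)  # hypothetical sample sizes, for illustration only

# For each size n: simulate 1000 samples of n exponentials (one per row)
# and take the variance of the 1000 sample means
simulated <- sapply(sizes, function(n) {
  draws <- matrix(rexp(no_sim * n, rate = lambda), nrow = no_sim)
  var(rowMeans(draws))
})

# Theoretical variance of the sample mean: (1/lambda)^2 / n
theoretical <- (1 / lambda)^2 / sizes

data.frame(n = sizes, simulated = simulated, theoretical = theoretical)
```

Each fourfold increase in the sample size should cut the variance of the mean to roughly a quarter, up to simulation noise.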

Relation to normal distribution

Based on the central limit theorem, we analyze whether the sample of the means has a normal distribution.

We run a hypothesis test using the Pearson chi-square normality test from the nortest package. The null hypothesis is that the sample is consistent with a normal distribution.

library(nortest)
pearson.test(mean_sim[,1])$p.value  
## [1] 0.06678971

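At the conventional 0.05 significance level, the p-value of 0.067 does not allow us to reject the null hypothesis, so the sample means are consistent with a normal distribution.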
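For contrast, the same test can be applied to raw exponential draws, which are strongly skewed and should fail it. The sketch below assumes a significance level of 0.05, which the original analysis does not state; the names `alpha`, `raw_draws`, `p_means` and `p_raw` are introduced here for illustration.

```r
library(nortest)

lambda <- 0.2
sample_size <- 40
no_sim <- 1000
alpha <- 0.05  # assumed significance level, not stated in the analysis

set.seed(2020)
sim_exp <- replicate(sample_size, rexp(no_sim, rate = lambda))
means <- rowMeans(sim_exp)

# 1000 individual exponential draws, with no averaging
raw_draws <- rexp(no_sim, rate = lambda)

p_means <- pearson.test(means)$p.value
p_raw <- pearson.test(raw_draws)$p.value

# TRUE means "fail to reject normality" at the assumed level
c(means = p_means > alpha, raw = p_raw > alpha)
```

The comparison shows that it is the averaging over 40 draws, not the exponential distribution itself, that produces the approximately normal shape.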

Next, we compare graphically the distribution of the sample means with a normal density that has the same mean and standard deviation.

ggplot(mean_sim,aes(mean_sim[,1])) +
        geom_histogram(aes(y = after_stat(density)), position="identity", fill="#9F7EC4", color="#FAA3F4", binwidth=0.38) + # after_stat() replaces the deprecated ..density.. notation
        stat_function(fun = dnorm, colour = "red", args = list(mean = m_sim, sd = sqrt(v_sim_mean))) +
        xlab("Mean") +
        ggtitle("Comparison between the sample and the normal distribution") 

Finally, we compare the sample quantiles with the theoretical quantiles of the normal distribution.

ggplot(mean_sim, aes(sample = mean_sim[,1])) +
        stat_qq(alpha = 0.5,colour = "#9F7EC4") +
        stat_qq_line( size = 1 , colour = "#FAA3F4") +
        ggtitle("Normal probability")  
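Any remaining departure from the straight line in the tails can be summarized by the skewness of the sample means. The mean of n independent exponentials follows a gamma distribution with shape n, whose skewness is 2/sqrt(n), so some right skew is expected at n = 40. The sketch below uses a hand-written moment estimator; the helper `skewness` is defined here for illustration and does not come from a package.

```r
lambda <- 0.2
sample_size <- 40
no_sim <- 1000

set.seed(2020)
means <- rowMeans(replicate(sample_size, rexp(no_sim, rate = lambda)))

# Moment estimator of skewness: third central moment over sd cubed
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

skew_sim <- skewness(means)
skew_theo <- 2 / sqrt(sample_size)  # skewness of the mean of n i.i.d. exponentials

c(simulated = skew_sim, theoretical = skew_theo)
```

A normal distribution has skewness 0, and the theoretical value 2/sqrt(n) moves toward 0 as the sample size grows, which is consistent with the central limit theorem.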

Conclusion

We observed that the central limit theorem applies to the exponential distribution: with a sufficiently large sample size, the distribution of the sample means is approximately normal.