This project studies the distribution of the mean of samples drawn from an exponential distribution. We simulate a large number of samples and use the central limit theorem as the guiding idea of the investigation.
The packages to be used for the project are:
library(ggplot2)
library(nortest)
We will simulate 1000 samples of 40 exponential random variables each, with rate parameter lambda = 0.2.
lambda <- 0.2 #parameter
sample_size <- 40 #sample size
no_sim <- 1000 #total simulations
set.seed(2020) #We ensure reproducible research
sim_exp <- replicate(sample_size, rexp(no_sim, rate = lambda)) #Simulations
We compute the mean of each of the 1000 simulated samples.
mean_sim <- as.data.frame(rowMeans(sim_exp))
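Because replicate() binds the result of each call as a column, the layout of the simulation matrix is easy to misread. The following sketch repeats the setup so that it runs on its own, and checks that each row holds one sample of 40 draws, which is why row means are taken. The name row_means is introduced here only for illustration.

```r
# Self-contained check of the simulation layout (same settings as above).
lambda <- 0.2
sample_size <- 40
no_sim <- 1000
set.seed(2020)

# replicate() returns one column per call: a no_sim x sample_size matrix.
sim_exp <- replicate(sample_size, rexp(no_sim, rate = lambda))
dim(sim_exp)  # 1000 rows, 40 columns

# One mean per row, i.e. one mean per simulated sample of 40 draws.
row_means <- rowMeans(sim_exp)
length(row_means)
```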
We then average these 1000 sample means.
m_sim <- mean(mean_sim[,1])
m_sim
## [1] 5.033948
We compare it with the expected value of an exponential distribution with lambda = 0.2, which is calculated as 1/lambda.
m_exp <- 1/lambda
m_exp
## [1] 5
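One way to judge how close the two values are is to express their gap in units of the standard error of the simulated average. The sketch below is an illustration that is not part of the original analysis; the names se_grand_mean and gap_in_se are hypothetical.

```r
# Sketch: gap between simulated and theoretical mean, in standard errors.
lambda <- 0.2
sample_size <- 40
no_sim <- 1000
set.seed(2020)
means <- rowMeans(replicate(sample_size, rexp(no_sim, rate = lambda)))

# The average of no_sim sample means is an average of
# sample_size * no_sim independent draws, each with sd 1/lambda.
se_grand_mean <- (1 / lambda) / sqrt(sample_size * no_sim)  # 0.025 here

# A gap of a few standard errors or less is compatible with pure noise.
gap_in_se <- (mean(means) - 1 / lambda) / se_grand_mean
gap_in_se
```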
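The two values are very close (5.034 versus 5). To see this more clearly, we plot the distribution of the sample means, marking the simulated mean in red and the theoretical mean in yellow.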
library(ggplot2)
ggplot(mean_sim,aes(mean_sim[,1])) +
geom_histogram(fill="#9F7EC4", color="#FAA3F4", binwidth=0.38) +
xlab("Mean") +
ggtitle("Simulation of exponentials with lambda = 0.2") +
geom_vline(xintercept = m_sim, color = "red") +
geom_vline(xintercept = m_exp, color = "#E8F916" )
We compute the variance of each of the 1000 simulated samples.
var_sim <- apply(sim_exp, 1, var)
We then average these 1000 sample variances.
v_sim <- mean(var_sim)
v_sim
## [1] 25.33102
We compare it with the variance of an exponential distribution with lambda = 0.2, which is calculated as 1/(lambda^2).
v_exp <- 1/(lambda^2)
v_exp
## [1] 25
Next, we analyze the variance of the sample means.
The variance of the sample means observed in the simulation:
v_sim_mean <- var(mean_sim[,1])
v_sim_mean
## [1] 0.6539053
The theoretical variance of the means, (1/lambda)^2 divided by the sample size:
v_teo_mean <- ((1/lambda)^2)/sample_size
v_teo_mean
## [1] 0.625
The two values are close (0.654 versus 0.625).
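The theoretical value follows from the independence of the draws. As a short derivation, writing $n$ for the sample size and $\bar{X}$ for the sample mean (notation introduced here for illustration):

```latex
\begin{aligned}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{1}{n\lambda^2} \\
  &= \frac{1}{40 \cdot 0.2^2} = \frac{25}{40} = 0.625
\end{aligned}
```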
Based on the central limit theorem, we analyze whether the sample of the means has a normal distribution.
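For reference, the form of the central limit theorem relied on here can be written as follows, using $n$ for the sample size and $\bar{X}_n$ for the sample mean (notation introduced for this note). For an exponential distribution, both the mean and the standard deviation equal 1/lambda:

```latex
\sqrt{n}\,\frac{\bar{X}_n - 1/\lambda}{1/\lambda}
  = \sqrt{n}\,\bigl(\lambda \bar{X}_n - 1\bigr)
  \xrightarrow{d} \mathcal{N}(0, 1),
\qquad\text{so}\qquad
\bar{X}_n \overset{\text{approx.}}{\sim}
  \mathcal{N}\!\left(\frac{1}{\lambda}, \frac{1}{n\lambda^2}\right)
\text{ for large } n .
```

With lambda = 0.2 and n = 40, the approximating normal distribution has mean 5 and variance 0.625.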
We will carry out a hypothesis test using the Pearson chi-square test for normality (pearson.test from the nortest package). The null hypothesis is that the sample comes from a normal distribution.
library(nortest)
pearson.test(mean_sim[,1])$p.value
## [1] 0.06678971
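The p-value (about 0.067) is above the conventional 0.05 significance level, so we do not reject the null hypothesis of normality, although the margin is narrow.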
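To make the idea behind a Pearson chi-square normality test concrete, here is a minimal base R sketch. It is an illustration and not the nortest implementation: the rule for the number of classes and the degrees-of-freedom adjustment are assumptions of this sketch, so its result need not match the value reported above. Check the nortest documentation for the package's actual defaults.

```r
# Sketch of a Pearson chi-square normality test in base R.
lambda <- 0.2
sample_size <- 40
no_sim <- 1000
set.seed(2020)
means <- rowMeans(replicate(sample_size, rexp(no_sim, rate = lambda)))

n <- length(means)
n_classes <- ceiling(2 * n^(2 / 5))  # assumed rule for the number of classes

# Assign each mean to one of n_classes bins that are equally likely
# under the normal distribution fitted to the data.
bin <- floor(1 + n_classes * pnorm(means, mean(means), sd(means)))
bin <- pmin(bin, n_classes)  # guard against a bin index above n_classes
observed <- tabulate(bin, nbins = n_classes)
expected <- n / n_classes

# Chi-square statistic; besides the usual 1, two degrees of freedom
# are removed for the estimated mean and standard deviation (assumption).
statistic <- sum((observed - expected)^2 / expected)
p_value <- pchisq(statistic, df = n_classes - 3, lower.tail = FALSE)
c(statistic = statistic, p_value = p_value)
```

A large statistic means the observed bin counts differ from the equal counts expected under normality by more than chance would explain, which leads to a small p-value.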
Next, we compare graphically the distribution of the sample means with a normal density that has the same mean and standard deviation.
ggplot(mean_sim,aes(mean_sim[,1])) +
geom_histogram(aes(y = after_stat(density)), position="identity", fill="#9F7EC4", color="#FAA3F4", binwidth=0.38) +
stat_function(fun = dnorm, colour = "red", args = list(mean = m_sim, sd = sqrt(v_sim_mean))) +
xlab("Mean") +
ggtitle("Comparison between the sample and the normal distribution")
Finally, we compare the sample quantiles of the means with the theoretical quantiles of a normal distribution.
ggplot(mean_sim, aes(sample = mean_sim[,1])) +
stat_qq(alpha = 0.5,colour = "#9F7EC4") +
stat_qq_line( size = 1 , colour = "#FAA3F4") +
ggtitle("Normal probability")
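We observe that the central limit theorem applies to the means of exponential samples when the sample size is large enough: the sample means are centered near 1/lambda, their variance is close to (1/lambda)^2 divided by the sample size, and their distribution is approximately normal.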
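To make "large enough" more concrete, the sketch below repeats the simulation for several sample sizes and reports the variance and the skewness of the sample means. The sizes 5, 40 and 200 and the helper summarise_size are illustrative choices that do not come from the analysis above.

```r
# Sketch: effect of the sample size on the distribution of the sample mean.
lambda <- 0.2
no_sim <- 1000
sizes <- c(5, 40, 200)  # illustrative sample sizes
set.seed(2020)

summarise_size <- function(n) {
  # no_sim samples of size n, one sample per row
  draws <- matrix(rexp(no_sim * n, rate = lambda), nrow = no_sim)
  means <- rowMeans(draws)
  # Sample skewness of the means; close to 0 for a symmetric distribution
  skew <- mean((means - mean(means))^3) / sd(means)^3
  c(n = n,
    var_simulated = var(means),
    var_theoretical = (1 / lambda)^2 / n,
    skewness = skew)
}

# One row per sample size
results <- t(sapply(sizes, summarise_size))
results
```

The skewness should shrink toward zero as the sample size grows, which is the sense in which the means of larger samples look more normal.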