library(ggplot2)
library(ggthemes)
This project investigates the exponential distribution and compares the distribution of its sample means with what the Central Limit Theorem predicts. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda.
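As a quick sanity check of these two facts, the sketch below draws one large sample with rexp and compares its mean and standard deviation with 1/rate. The names demo_rate and demo_draws, the sample size of 100000, and the seed 123 are illustrative assumptions, kept separate from the parameters used in the analysis itself.

```r
# Illustrative check only: demo_rate and demo_draws are hypothetical names,
# not part of the analysis that follows.
demo_rate <- 0.2
set.seed(123)                                 # arbitrary seed for this sketch
demo_draws <- rexp(100000, rate = demo_rate)  # one large exponential sample

# Both the sample mean and the sample standard deviation should be near 1/rate.
c(theoretical = 1 / demo_rate,
  sample_mean = mean(demo_draws),
  sample_sd   = sd(demo_draws))
```

With a sample this large, both estimates should land close to 1/rate, although the exact digits depend on the seed.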
The seed is set to 1 so that the results are reproducible.
set.seed(1)
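lambda <- 0.2            # rate parameter
n <- 40                  # number of exponentials per sample
n_simulations <- 1:1000  # index vector, one entry per simulation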
In this part, we simulate 1000 means of 40 exponentials each and then plot a histogram of those means.
# One sample mean of n exponentials per simulation
pop <- sapply(n_simulations, function(x) mean(rexp(n, lambda)))
data <- data.frame(pop)
The parameters set above are the rate (lambda), the number of exponentials per sample (n), and the number of simulations to run.
The histogram shows that the distribution of the sample means is approximately bell-shaped.
ggplot(data, aes(x = pop)) +
  geom_histogram(aes(fill = after_stat(count)), bins = 40) +
  labs(title = "Averages of 40 Exponentials over 1000 Simulations", y = "Frequency", x = "Mean") +
  theme_tufte()
The theoretical mean of the exponential distribution is 1/lambda. Since lambda is 0.2, the theoretical mean is 5.
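pop_mean <- mean(data$pop)  # mean of the simulated sample means
dist_mean <- 1/lambda       # theoretical mean of the distribution
means <- data.frame(pop_mean, dist_mean)
names(means) <- c("Sample mean", "Theoretical mean")
means
The table shows that the sample mean and the theoretical mean are very close. The plot below marks the theoretical mean with a dashed red line and the sample mean with a yellow line.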
ggplot(data, aes(x = pop)) +
  geom_histogram(aes(fill = after_stat(count)), bins = 40) +
  labs(title = "Averages of 40 Exponentials over 1000 Simulations", y = "Frequency", x = "Mean") +
  theme_tufte() +
  # Use the computed values rather than hard-coded numbers
  geom_vline(xintercept = dist_mean, linetype = "dashed", color = "red", size = 1) +
  geom_vline(xintercept = pop_mean, color = "yellow", size = 1) +
  theme(legend.position = "none")
Applying the t.test function gives a 95% confidence interval for the mean of approximately 4.94 to 5.04, which contains the theoretical mean of 5.
t.test(data$pop)[4]
## $conf.int
## [1] 4.941515 5.038536
## attr(,"conf.level")
## [1] 0.95
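The interval reported by t.test can also be computed by hand as the sample mean plus or minus a t critical value times the standard error. The sketch below is a minimal illustration under stated assumptions: mean_ci is a hypothetical helper, and ci_demo_means is freshly simulated data with its own arbitrary seed, so its numbers will not match the output above.

```r
# Hypothetical helper (not in the original analysis): t-based confidence
# interval for the mean of a numeric vector.
mean_ci <- function(x, level = 0.95) {
  n_obs   <- length(x)
  std_err <- sd(x) / sqrt(n_obs)                       # standard error of the mean
  t_crit  <- qt(1 - (1 - level) / 2, df = n_obs - 1)   # two-sided critical value
  mean(x) + c(-1, 1) * t_crit * std_err
}

# Illustrative data only: 1000 means of 40 exponentials with rate 0.2,
# generated with an arbitrary seed chosen for this sketch.
set.seed(2)
ci_demo_means <- replicate(1000, mean(rexp(40, rate = 0.2)))

manual_ci  <- mean_ci(ci_demo_means)
builtin_ci <- as.numeric(t.test(ci_demo_means)$conf.int)

# The two intervals should agree up to floating-point tolerance.
all.equal(manual_ci, builtin_ci)
```

Calling mean_ci(data$pop) on the simulated means from the analysis should reproduce the interval printed by t.test.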
The table shows that the theoretical variance and the sample variance of the means are also very close.
sample_var <- var(data$pop)        # variance of the simulated sample means
theorical_var <- ((1/lambda)^2)/n  # variance of the mean: sigma^2 / n
vars <- data.frame(sample_var, theorical_var)
names(vars) <- c("Sample Variance", "Theoretical Variance")
vars
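The expression ((1/lambda)^2)/n follows from the variance of a mean of independent draws. A short derivation, assuming the n exponentials X_1, ..., X_n in each sample are independent with common variance sigma^2 = (1/lambda)^2, and writing \bar{X}_n for their mean (notation introduced here for illustration):

```latex
\[
\operatorname{Var}\!\left(\bar{X}_n\right)
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^{2}}{n}
  = \frac{(1/\lambda)^{2}}{n}
\]
\[
\lambda = 0.2,\; n = 40
  \;\Longrightarrow\;
  \operatorname{Var}\!\left(\bar{X}_n\right) = \frac{25}{40} = 0.625,
  \qquad
  \operatorname{sd}\!\left(\bar{X}_n\right) = \sqrt{0.625} \approx 0.79
\]
```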
As the graph shows, the density of the simulated means of the exponential distribution is very close to the normal density, as the Central Limit Theorem predicts.
ggplot(data, aes(x = pop)) +
  geom_histogram(aes(y = after_stat(density), fill = after_stat(count)), bins = 40, color = "white") +
  labs(title = "Averages of 40 Exponentials over 1000 Simulations", y = "Density", x = "Mean") +
  geom_density(colour = "red", size = 1.3) +       # empirical density of the sample means
  stat_function(fun = dnorm, args = list(mean = 1/lambda, sd = sqrt(theorical_var)),
                color = "yellow", size = 1.3) +    # normal density predicted by the CLT
  theme_tufte() +
  # Label colours match the curves: red is the sample density, yellow the normal density
  annotate(geom = "text", x = 3.5, y = 0.5, label = "Density Sample Curve", color = "red") +
  annotate(geom = "text", x = 3.5, y = 0.45, label = "Density Normal Curve", color = "yellow") +
  theme(legend.position = "none")
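The visual agreement can be complemented with a rough numeric check. The sketch below standardizes simulated means with the theoretical mean and standard error, then looks at how many fall within the central 95% band of a standard normal. It is only an illustration: clt_rate, clt_n, clt_means, and clt_z are hypothetical names, the data is simulated afresh with an arbitrary seed, and the 1.96 cutoff is the usual two-sided 95% normal quantile.

```r
# Illustrative check only: hypothetical names, separate from the analysis objects.
clt_rate <- 0.2
clt_n    <- 40
set.seed(3)  # arbitrary seed for this sketch
clt_means <- replicate(1000, mean(rexp(clt_n, rate = clt_rate)))

# Standardize with the theoretical mean (1/rate) and standard error (1/rate / sqrt(n)).
clt_z <- (clt_means - 1 / clt_rate) / ((1 / clt_rate) / sqrt(clt_n))

# Under the normal approximation, roughly 95% of values should lie in [-1.96, 1.96].
mean(abs(clt_z) <= 1.96)

# A normal Q-Q plot gives a second view: points near the line suggest normality.
qqnorm(clt_z)
qqline(clt_z)
```

Because means of 40 exponentials are still slightly right-skewed, small departures from the line in the upper tail of the Q-Q plot are to be expected.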