The goal of this assignment is to investigate the exponential distribution in R and compare the distribution of its sample means with the Central Limit Theorem. An exponential distribution with rate λ has mean 1/λ and standard deviation 1/λ. Here it is simulated with λ = 0.2, a sample size of n = 40 and 1000 simulations.
In particular, the investigation compares the mean, the variance, the 95% interval and the shape of the distribution of the simulated means with their theoretical values.
Each simulation is stored as one column of the data matrix, so the matrix has dimension 40 x 1000. The mean of each simulation is therefore the mean of the corresponding column, computed with colMeans().
library(ggplot2)
library(knitr)
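n = 40        # sample size (draws per simulation)
nsim = 1000   # number of simulations
lambda = 0.2  # rate parameter of the exponential distribution
set.seed(200) # for reproducibility
# each column holds one simulation of n exponential draws
data = matrix(rexp(nsim * n, rate = lambda), nrow = n)
dim(data)
## [1] 40 1000
sim.mean = colMeans(data) # mean of each simulation (column)
hist(sim.mean, col = "skyblue1", breaks = 40, border = "aliceblue",
     main = "Histogram of simulated means", xlab = "Simulated mean")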
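A quick sanity check of the column layout described above, written as a self-contained sketch that repeats the report's setup (`nsim` is an illustrative name for the number of simulations): `colMeans()` should agree with applying `mean()` to each column, and there should be exactly one mean per simulation.

```r
n = 40         # draws per simulation
nsim = 1000    # number of simulations (illustrative name)
lambda = 0.2
set.seed(200)
data = matrix(rexp(nsim * n, rate = lambda), nrow = n)

# One simulation per column, so the column means are the simulated means
sim.mean = colMeans(data)

# colMeans() is a fast equivalent of applying mean() column by column
stopifnot(isTRUE(all.equal(sim.mean, apply(data, 2, mean))))
length(sim.mean)  # one mean per simulation: 1000
```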
The mean of the simulated means is compared with the theoretical mean of the exponential distribution, 1/λ = 5, in the table below:
actual.mean = round(mean(sim.mean),3)
theo.mean = round(1/lambda,3)
kable(matrix(c(as.character(actual.mean),as.character(theo.mean)),ncol = 2),
col.names= c("Actual mean","Theoretical mean"),format = "markdown")
| Actual mean | Theoretical mean |
|---|---|
| 4.984 | 5 |
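The gap between the actual and theoretical means (0.016) can be put on the scale of Monte Carlo error. The average of 1000 independent means of 40 draws is the mean of 40000 draws, so its standard error is (1/λ)/√(40 · 1000) = 0.025. The minimal sketch below, which copies the rounded actual mean from the table and uses the illustrative names `se.grand.mean` and `z`, shows that the gap is well under one standard error.

```r
n = 40
nsim = 1000          # number of simulations (illustrative name)
lambda = 0.2
actual.mean = 4.984  # mean of the simulated means, copied from the table above
theo.mean = 1 / lambda

# Standard error of the grand mean: sd(X) / sqrt(n * nsim), with sd(X) = 1 / lambda
se.grand.mean = (1 / lambda) / sqrt(n * nsim)
# Number of standard errors between the simulated and theoretical means
z = (actual.mean - theo.mean) / se.grand.mean
round(c(se = se.grand.mean, z = z), 3)  # se = 0.025, z = -0.64
```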
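Similarly, the variance of the simulated means is compared with the theoretical variance of the sample mean, (1/λ)²/n = 25/40 = 0.625, in the table below: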
actual.var = round(var(sim.mean),3)
theo.var = round(((1 / lambda)^2) / n,3)
kable(matrix(c(as.character(actual.var),as.character(theo.var)),ncol = 2),
col.names= c("Actual variance","Theoretical variance"),format="markdown")
| Actual variance | Theoretical variance |
|---|---|
| 0.658 | 0.625 |
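The theoretical values used for `theo.mean` and `theo.var` follow from the mean and variance of the exponential distribution, E[X] = 1/λ and Var(X) = 1/λ², applied to the mean of n independent draws:

```latex
\begin{aligned}
E[\bar X_n] &= E[X] = \frac{1}{\lambda} = \frac{1}{0.2} = 5, \\
\operatorname{Var}(\bar X_n) &= \frac{\operatorname{Var}(X)}{n} = \frac{(1/\lambda)^2}{n} = \frac{25}{40} = 0.625.
\end{aligned}
```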
First, examine the histogram of the simulated means. The blue curve shows the estimated density of the simulated means and the red curve shows the theoretical normal density with mean 1/λ and variance (1/λ)²/n. As the figure shows, the shape of the sample distribution is close to that of the normal distribution. This is what the Central Limit Theorem predicts: if we take sufficiently large random samples, with replacement, from a population with mean μ and standard deviation σ, then the distribution of the sample means will be approximately normal, with mean μ and standard deviation σ/√n.
ggplot(data.frame(sim.mean), aes(x = sim.mean)) +
  geom_histogram(aes(y = after_stat(density)), fill = "skyblue1", colour = "aliceblue", binwidth = 0.15) +
  labs(title = "Distribution of simulated means", x = "Simulated means", y = "Density") +
  # vertical lines at the actual and theoretical means (linewidth needs ggplot2 >= 3.4.0)
  geom_vline(aes(xintercept = actual.mean, colour = "actual mean"), linewidth = 1) +
  geom_vline(aes(xintercept = theo.mean, colour = "theoretical mean"), linewidth = 1) +
  # kernel density estimate of the simulated means (blue curve)
  geom_density(aes(colour = "sample distribution of means"), linewidth = 1.15) +
  # theoretical normal density N(1/lambda, (1/lambda)^2 / n) (red curve)
  stat_function(fun = dnorm,
                args = list(mean = theo.mean, sd = sqrt(theo.var)),
                aes(colour = "normal sample distribution"),
                linewidth = 1.15) +
  scale_color_manual(name = "Variable",
                     values = c("normal sample distribution" = "red3",
                                "sample distribution of means" = "royalblue",
                                "theoretical mean" = "springgreen3",
                                "actual mean" = "yellow"))
Next, compare the 95% intervals of the simulated means and of the theoretical normal distribution in the table below:
# theo.var and actual.var are already variances of the sample mean,
# so the interval uses their square roots directly (no extra division by n)
theo.CI = round(theo.mean + c(-1, 1) * 1.96 * sqrt(theo.var), 3)
actual.CI = round(actual.mean + c(-1, 1) * 1.96 * sqrt(actual.var), 3)
kable(matrix(c(paste("(", actual.CI[1], ",", actual.CI[2], ")"),
               paste("(", theo.CI[1], ",", theo.CI[2], ")")), ncol = 2),
      col.names = c("Actual 95% interval", "Theoretical 95% interval"), format = "markdown")
| Actual 95% interval | Theoretical 95% interval |
|---|---|
| ( 3.394 , 6.574 ) | ( 3.45 , 6.55 ) |
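Under the Central Limit Theorem, roughly 95% of the simulated means should fall within 1/λ ± 1.96 · (1/λ)/√n. The minimal sketch below checks this empirically by counting how many of the 1000 simulated means land inside that interval; `theo.sd`, `interval` and `coverage` are illustrative names.

```r
n = 40
nsim = 1000    # number of simulations (illustrative name)
lambda = 0.2
set.seed(200)
sim.mean = colMeans(matrix(rexp(nsim * n, rate = lambda), nrow = n))

# Theoretical sd of a sample mean and the CLT-based 95% interval around 1 / lambda
theo.sd = (1 / lambda) / sqrt(n)
interval = 1 / lambda + c(-1, 1) * qnorm(0.975) * theo.sd  # qnorm(0.975) is about 1.96

# Share of the simulated means inside the interval; should be close to 0.95
coverage = mean(sim.mean >= interval[1] & sim.mean <= interval[2])
round(interval, 3)
coverage
```

Because the exact sampling distribution of the mean is slightly right-skewed, the empirical share need not hit 0.95 exactly; a value close to it is what the normal approximation predicts.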
As the results above show, simulating the exponential distribution with a sufficiently large sample size (n = 40) is consistent with the Central Limit Theorem. All of the properties examined, namely the mean, the variance, the 95% interval and the shape of the distribution of the simulated means, closely match those of the theoretical normal distribution for samples of size 40.