This project compares the exponential distribution in R with the Central Limit Theorem. The mean of the exponential distribution is \(\frac{1}{\lambda}\) and its standard deviation is also \(\frac{1}{\lambda}\). The project focuses on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.
The exponential distribution can be simulated in R with rexp(n, lambda), where n is the number of observations and lambda is the rate parameter. Here lambda = 0.2.
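As a rough illustration of the contrast described above, the following minimal sketch draws 1000 individual exponentials and 1000 averages of 40 exponentials and compares their centre and spread. The seed and the names rawDraws and drawMeans are assumptions made for this example only; they are not part of the analysis that follows.

```r
# Illustrative sketch only: the seed and object names are assumptions.
set.seed(1)
lambda <- 0.2

# 1000 individual exponential draws.
rawDraws <- rexp(n = 1000, rate = lambda)

# 1000 averages, each taken over 40 exponential draws.
drawMeans <- replicate(1000, mean(rexp(n = 40, rate = lambda)))

# Both collections centre near 1 / lambda,
# but the averages are far less spread out than the raw draws.
c(rawMean = mean(rawDraws), meansMean = mean(drawMeans))
c(rawSd = sd(rawDraws), meansSd = sd(drawMeans))
```

The raw draws should have a standard deviation close to 1 / lambda, while the averages should have one close to (1 / lambda) / sqrt(40).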
First we load the ggplot2 plotting library.
library(ggplot2)
Initialize the variables that control the simulation.
noSim <- 1000
sampSize <- 40
lambda <- 0.2
Set the seed of the random number generator so that the analysis is reproducible.
set.seed(3)
Create a matrix with 1000 rows, one per simulation, and 40 columns, one per random exponential draw within a simulation.
simulationMatrix <- matrix(rexp(n = noSim * sampSize, rate = lambda), noSim, sampSize)
Create a vector of 1000 elements containing the mean of each row of simulationMatrix.
simulationMean <- rowMeans(simulationMatrix)
Create a data frame containing the simulated draws together with their row means.
simulationData <- data.frame(cbind(simulationMatrix, simulationMean))
Plot the simulation data.
# Histogram of the 1000 sample means; the dashed red line marks their overall mean.
ggplot(data = simulationData, aes(x = simulationMean)) +
  geom_histogram(breaks = seq(2, 9, by = 0.2), col = "blue", aes(fill = ..count..)) +
  labs(title = "Histogram of Mean Distribution", x = "Simulation Means", y = "Frequency") +
  geom_vline(aes(xintercept = mean(simulationMean)), color = "red",
             linetype = "dashed", size = 1)
The actual mean of the simulated sample means is 4.9866197, calculated by:
actualMean <- mean(simulationMean)
The theoretical mean is \(\frac{1}{\lambda} = 5\):
theoreticalMean <- (1 / lambda)
The mean of the simulated sample means is close to the theoretical mean of the original distribution.
The actual variance of the simulated sample means is 0.6257575, calculated by:
actualVariance <- var(simulationMean)
The theoretical variance of the mean of 40 exponentials is \(\frac{(1/\lambda)^2}{40} = 0.625\):
theoreticalVariance <- ((1 / lambda) ^ 2) / sampSize
Hence the actual variance of the simulated sample means is close to the theoretical variance of the mean of 40 exponentials, which is the variance of the original distribution divided by the sample size.
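The theoretical values used above follow from the standard results for the mean of independent, identically distributed draws. A short derivation, writing \(n\) for the sample size of 40 and \(\bar{X}\) for the mean of one sample (both symbols are introduced here for illustration):

```latex
\begin{aligned}
E[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{\lambda} = \frac{1}{0.2} = 5, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{(1/\lambda)^{2}}{n} = \frac{25}{40} = 0.625, \\
\operatorname{sd}(\bar{X}) &= \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906.
\end{aligned}
```

The second line relies on the draws being independent, so that the variance of their sum is the sum of their variances.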
To check that the distribution is approximately normal, we overlay a normal density on the sample data and study how well the two align. We then compare the 95% confidence intervals of the simulated sample means and of the theoretical normal distribution, and plot the quantiles.
qplot(simulationMean, geom = 'blank') +
  geom_line(aes(y = ..density.., colour = 'Empirical'), stat = 'density', size = 1) +
  stat_function(fun = dnorm, args = list(mean = (1 / lambda), sd = ((1 / lambda) / sqrt(sampSize))),
                aes(colour = 'Normal'), size = 1) +
  geom_histogram(aes(y = ..density.., fill = ..density..), alpha = 0.4,
                 breaks = seq(2, 9, by = 0.2), col = 'red') +
  scale_fill_gradient("Density", low = "yellow", high = "red") +
  scale_color_manual(name = 'Density', values = c('brown', 'blue')) +
  theme(legend.position = c(0.85, 0.60)) +
  labs(title = "Mean Density Distribution", x = "Simulation Means", y = "Density")
The plot above shows that the distribution of the simulated sample means is well approximated by the normal distribution.
actualConfInterval <- actualMean + c(-1, 1) * 1.96 * sqrt(actualVariance) / sqrt(sampSize)
theoreticalConfInterval <- theoreticalMean + c(-1, 1) * 1.96 *
  sqrt(theoreticalVariance) / sqrt(sampSize)
The actual 95% confidence interval is [4.7414712, 5.2317681] and the theoretical 95% confidence interval is [4.755, 5.245], so the two are approximately the same.
qqnorm(simulationMean)
qqline(simulationMean)
The actual quantiles are also almost identical to the theoretical quantiles, so we can conclude that the distribution is approximately normal.
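The visual comparison in the Q-Q plot can also be sketched numerically. The block below re-creates simulationMean with the same seed and settings as above so that it runs on its own, then tabulates empirical quantiles next to the quantiles of the theoretical normal distribution. The probability levels and the name quantileTable are assumptions chosen for illustration.

```r
# Illustrative sketch: re-creates the simulation with the same settings as above.
set.seed(3)
noSim <- 1000
sampSize <- 40
lambda <- 0.2
simulationMean <- rowMeans(
  matrix(rexp(n = noSim * sampSize, rate = lambda), noSim, sampSize)
)

# Probability levels at which to compare quantiles (an arbitrary choice).
probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)

# Empirical quantiles of the sample means next to the quantiles of a
# normal distribution with mean 1 / lambda and sd (1 / lambda) / sqrt(sampSize).
quantileTable <- data.frame(
  probability = probs,
  empirical   = unname(quantile(simulationMean, probs)),
  theoretical = qnorm(probs, mean = 1 / lambda, sd = (1 / lambda) / sqrt(sampSize))
)
quantileTable
```

Small gaps between the two columns are consistent with the points lying close to the reference line drawn by qqline.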