This document shows how the mean of an exponential distribution can be estimated using the Central Limit Theorem (CLT). First, random numbers are generated from the exponential distribution and grouped into simulations of 40 values each. Then, the mean of each simulation is calculated, and the distribution of these means is shown to be approximately normal (Gaussian).
Furthermore, the mean of the distribution of averages is shown to be close to the mean of the original (exponential) distribution, as expected because the sample mean is an unbiased estimator of the population mean. Additionally, the variance of the distribution of averages is related to the variance of the exponential distribution.
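For reference, the standard results behind this analysis can be written out for \(n\) i.i.d. draws \(X_1, \dots, X_n\) from an exponential distribution with rate \(\lambda\). Here \(n\) plays the role of sample_size and \(\lambda\) of lambda in the code:

\[
\begin{aligned}
\mathbb{E}[X_i] &= \frac{1}{\lambda}, & \operatorname{Var}(X_i) &= \frac{1}{\lambda^2},\\
\mathbb{E}[\bar{X}_n] &= \frac{1}{\lambda}, & \operatorname{Var}(\bar{X}_n) &= \frac{1}{n\lambda^2},\\
\bar{X}_n &\overset{\text{approx.}}{\sim} \mathcal{N}\!\left(\frac{1}{\lambda},\ \frac{1}{n\lambda^2}\right) \quad \text{for large } n.
\end{aligned}
\]

With \(\lambda = 0.2\) and \(n = 40\), this gives \(\mathbb{E}[\bar{X}_{40}] = 5\) and \(\operatorname{Var}(\bar{X}_{40}) = 25/40 = 0.625\).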
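The simulation is set up as follows:
- A matrix of nsim*sample_size random exponential numbers is generated, with one simulation of sample_size values per row.
- A means_summary data table holds the averages of the nsim simulations.
- An ExpRandoms column in means_summary holds nsim additional random exponential numbers, so that the distribution of averages can later be compared with the distribution of the exponentials themselves.

library(data.table)
library(ggplot2)
nsim <- 1000         # number of simulations
sample_size <- 40    # exponential draws per simulation
lambda <- 0.2        # rate parameter; mean = sd = 1/lambda = 5
set.seed(1234)       # for reproducibility
# One row per simulation, sample_size columns of exponential draws
simulations <- matrix(data=rexp(n=nsim*sample_size, rate=lambda), nrow=nsim, ncol=sample_size)
# Mean of each row = one sample mean per simulation
means_summary <- data.table(means=rowMeans(x=simulations))
# nsim raw exponential draws, kept alongside the means for the comparison plot
means_summary[, ExpRandoms := rexp(n=nsim, rate=lambda)]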
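The same kind of sampling distribution can also be built in base R with replicate(), drawing one simulation per call. This is a minimal sketch that does not depend on data.table; the variable name sample_means is illustrative, and because the draws are grouped differently from the matrix above, the individual means will not match means_summary exactly.

```r
# Minimal base-R sketch: nsim simulations, each the mean of sample_size exponential draws
nsim <- 1000
sample_size <- 40
lambda <- 0.2
set.seed(1234)
# replicate() evaluates the expression nsim times and simplifies the results to a numeric vector
sample_means <- replicate(nsim, mean(rexp(n = sample_size, rate = lambda)))
print(length(sample_means))   # one mean per simulation
print(summary(sample_means))
```

The matrix approach in the document draws all numbers in a single rexp() call, which is usually faster; replicate() reads more like a literal description of "repeat the experiment nsim times".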
The mean of the simulated sample means is close to the theoretical mean \(1/\lambda = 5\). This is consistent with the sample mean being an unbiased estimator of the population mean, i.e. \(E[\bar{X}] = \mu\).
simulation_mean <- mean(means_summary[,means])
theoretical_mean <- 1/lambda
simulation_mean_label <- paste0('Simulation Mean = ', round(simulation_mean, 4))
theoretical_mean_label <- paste0('Theoretical Mean = ', round(theoretical_mean, 4))
print(simulation_mean_label)
[1] "Simulation Mean = 4.9742"
print(theoretical_mean_label)
[1] "Theoretical Mean = 5"
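Unbiasedness means \(E[\bar{X}_n] = 1/\lambda\) for every sample size \(n\), not only for \(n = 40\). A rough sketch, assuming a larger illustrative number of simulations (nsim_check = 20000) to reduce Monte Carlo noise, averages the sample means for a few sample sizes, including a very small one:

```r
# Sketch: the average of many sample means stays near 1/lambda for small and large n
lambda <- 0.2
nsim_check <- 20000   # illustrative; larger values reduce Monte Carlo error
sizes <- c(2, 10, 40)
set.seed(42)
avg_of_means <- sapply(sizes, function(n) {
  # one row per simulation, n exponential draws per row
  sims <- matrix(rexp(nsim_check * n, rate = lambda), nrow = nsim_check, ncol = n)
  mean(rowMeans(sims))
})
names(avg_of_means) <- paste0("n=", sizes)
print(round(avg_of_means, 3))
```

All three averages should sit near 5; only their spread across repeated runs depends on \(n\), which is the subject of the variance section.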
The theoretical variance of the averages is \(\frac{1}{n}\) times the variance of the exponential distribution, where \(n\) is the sample size: \(\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625\). Thus, as the sample size of each simulation increases, the variance of the distribution of averages decreases. The simulated variance is close to this theoretical value.
simulation_sd <- sd(means_summary[,means])
theoretical_sd <- 1/lambda/sqrt(sample_size)
print(paste('Simulation Variance', round(simulation_sd^2,4), sep=' = '))
[1] "Simulation Variance = 0.595"
print(paste('Theoretical Variance', round(theoretical_sd^2,4), sep=' = '))
[1] "Theoretical Variance = 0.625"
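The \(1/n\) scaling can be checked directly. This sketch, with an illustrative seed, simulation count and set of sample sizes, compares the simulated variance of the sample means with \(1/(\lambda^2 n)\) for several values of \(n\):

```r
# Sketch: simulated vs theoretical variance of the sample mean for several sample sizes
lambda <- 0.2
nsim_check <- 5000
sizes <- c(10, 40, 160)
set.seed(2024)
sim_var <- sapply(sizes, function(n) {
  sims <- matrix(rexp(nsim_check * n, rate = lambda), nrow = nsim_check, ncol = n)
  var(rowMeans(sims))
})
theory_var <- 1 / (lambda^2 * sizes)   # Var(X) / n with Var(X) = 1/lambda^2
print(data.frame(sample_size = sizes,
                 simulated = round(sim_var, 4),
                 theoretical = round(theory_var, 4)))
```

Quadrupling the sample size should cut the variance by roughly a factor of four (and the standard deviation by a factor of two).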
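The following figure compares the distribution of the sample means with that of the raw random exponentials. It overlays a normal curve with the simulated mean and standard deviation (scaled from density to counts) and marks the simulated and theoretical means with vertical lines.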
# Bin width shared by both histograms; also used to scale the normal curve to counts
binwidth <- 1/6
# Text box comparing simulated and theoretical standard deviations of the sample means
variance_text_box <- paste(paste('Simulation SD', round(simulation_sd,4), sep=' = '),
                           paste('Theoretical SD', round(theoretical_sd,4), sep=' = '), sep='\n')
ggplot(data=means_summary, mapping=aes(x=means)) +
  # Histogram of the nsim sample means (sampling distribution)
  geom_histogram(mapping=aes(x=means, color='MeanExpRandom'), binwidth=binwidth, fill=NA) +
  # Histogram of the nsim raw exponential draws, for comparison
  geom_histogram(mapping=aes(x=ExpRandoms, color='ExpRandom'), binwidth=binwidth, fill=NA) +
  # Normal density with the simulated mean and SD, scaled to counts by nsim*binwidth
  stat_function(fun = function(x, mean, sd, n) n*dnorm(x=x, mean=mean, sd=sd),
                args = list(mean=simulation_mean, sd=simulation_sd, n=nsim*binwidth)) +
  geom_vline(mapping=aes(xintercept=simulation_mean, color='SimulationMean')) +
  geom_vline(mapping=aes(xintercept=1/lambda, color='TheoreticalMean')) +
  labs(title='Distribution of Sample Means and Random Exponentials',
       x='Mean/Value of Random Exponential', y='Count') +
  scale_color_manual(name = "Legend",
                     breaks = c('MeanExpRandom', 'ExpRandom', 'SimulationMean', 'TheoreticalMean'),
                     values = c(MeanExpRandom='black', ExpRandom='green',
                                SimulationMean='blue', TheoreticalMean='red'),
                     labels = c('Sampling Distribution', 'Random Exponentials',
                                simulation_mean_label, theoretical_mean_label)) +
  # annotate() draws the text box once, instead of once per data row as geom_text(aes(...)) would
  annotate('text', x=20, y=80, label=variance_text_box, vjust='inward', hjust='inward', color='black') +
  theme_bw() + xlim(0,20)
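The figure contrasts the right-skewed exponential draws with the much more symmetric sample means. One way to put a number on this, as a sketch, is sample skewness: theory gives 2 for the exponential distribution and \(2/\sqrt{n}\) (about 0.32 for \(n = 40\)) for the mean of \(n\) draws. The helper skewness_hat is a hypothetical name defined here for illustration, not a function from the original analysis or from a package; the seed and nsim_check are also illustrative.

```r
# Hypothetical helper: moment-based sample skewness (mean of cubed standardized values)
skewness_hat <- function(x) {
  z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))
  mean(z^3)
}
lambda <- 0.2
sample_size <- 40
nsim_check <- 10000
set.seed(7)
raw_draws <- rexp(nsim_check, rate = lambda)
# ncol is inferred as sample_size: one simulation per row
sample_means <- rowMeans(matrix(rexp(nsim_check * sample_size, rate = lambda), nrow = nsim_check))
print(round(c(exponential = skewness_hat(raw_draws),
              sample_means = skewness_hat(sample_means),
              theory_means = 2 / sqrt(sample_size)), 3))
```

The skewness of the means is much smaller than that of the raw draws but not zero, so for \(n = 40\) the normal curve is an approximation that is slightly off in the right tail.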
The normal QQ plot of the averages, produced by the code below, lies close to the theoretical reference line, consistent with the distribution of averages being approximately normal.
ggplot(data=means_summary, mapping=aes(sample=means)) +
stat_qq() + stat_qq_line() +
labs(title='QQ Plot for Averages Distribution')
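A QQ plot is a visual check. A rough numeric companion, sketched here in base R, is the correlation between the sorted sample means and the matching standard normal quantiles (the correlation behind a normal probability plot): values very close to 1 indicate a nearly straight QQ plot. The settings mirror the document's, and with the same seed and matrix layout sample_means should reproduce means_summary$means; the raw-exponential contrast is an illustrative addition.

```r
# Sketch: correlation between ordered sample means and normal quantiles
lambda <- 0.2
sample_size <- 40
nsim <- 1000
set.seed(1234)
sample_means <- rowMeans(matrix(rexp(nsim * sample_size, rate = lambda), nrow = nsim))
# ppoints() gives the same plotting positions that qqnorm() uses
normal_quantiles <- qnorm(ppoints(nsim))
qq_cor <- cor(sort(sample_means), normal_quantiles)
# For contrast: the same statistic for raw exponential draws, which are strongly skewed
raw_cor <- cor(sort(rexp(nsim, rate = lambda)), normal_quantiles)
print(round(c(sample_means = qq_cor, raw_exponentials = raw_cor), 4))
```

A high correlation does not prove normality; it only summarizes how straight the QQ plot is, and mild right skew in the means can still show up at the upper end of the plot.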