Simulation Exercise1

Overview

Wikipedia describes the exponential distribution as “the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate2”. It is characterized by the probability density function \(f(x) = \lambda e^{-\lambda x}\) over the support \([0, \infty)\) (i.e. for non-negative values of \(x\))3.

The goal of the present analysis is to perform a simulation so as to compare the sample mean and variance to the “theoretical”4 mean and variance, as well as to show how the sampling distribution of the mean tends to a normal distribution5.

Theoretical (population) Mean & Variance

It can be shown, by solving the corresponding defining integrals, that the expected value or “theoretical mean” of an exponential is just the reciprocal \(\mu = \frac{1}{\lambda}\) (also known as the “scale” parameter of the exponential), while the variance is just the square of the mean, which implies that the standard deviation is the same as the expected value.

I start by setting up the parameters for the simulation exercise.

lambda = 0.2 # lambda parameter of the exponential function 
n = 40 # sample size for the exponential distributions (input 'n' in the rexp() function) 
B = 1000 # number of replications 

Given such a parameter, we can asnwer the first question of the assignment: the table6 below shows the computed theoretical mean, variance and standard deviation.

\(\lambda = 0.2\) Mean \(\mu\) Variance \(\sigma^{2}\) Standard Deviation \(\sigma\)
Formula \(1/\lambda\) \(1/\lambda^{2}\) \(1/\lambda\)
Value 5 25 5

Simulation

A basic for loop is used to iterate up to the desired number of simulations; for each iteration, the mean is computed for an exp(n=40, \(\lambda\)=.2). The computed means are stored in a vector. Code and output is in the Appendix7.

Results

The average of the simulated means is 4.99: it is basically equal to the “theoretical” mean \(\mu\) of 5. The sampling and theoretical variances are pretty close as well: 0.625 and 0.6344 respectively8.

In order to compare the simulated sampling distribution and the theoretical (normal) distribution under the CLT, I superimpose the corresponding Normal PDF (solid red curve), as well as a kernel density estimated curve (dashed green line), to the histogram9 of the simulated sample means. The result is Figure 1 in the Appendix.

A simple plot comparing the quantiles of the simulated distribution and the normal one shows that there are some discrepancies, especially on the right tail; recall, however, that while the support for a Normal is the real line, an exponential distribution is defined only for non-negative values. Please refer to Figure 2 in the Appendix.

Appendix

Theoretical mean, variance, standard deviation

muExp <- 1/lambda 
sdExp <- 1/lambda 
varExp <- 1/(lambda^2) 
cat(" Theoretical mean =", muExp, "= Theoretical standard deviation", "\n", 
    "Theoretical variance =", varExp) 
##  Theoretical mean = 5 = Theoretical standard deviation 
##  Theoretical variance = 25

Theoretical variance of the sampling distribution of the (sample) mean

sampling.var <- varExp/n 
sampling.sd <- sdExp/sqrt(n) 
cat("Theoretical variance of sampling distribution =", 
       round(sampling.var, digits = 4)) 
## Theoretical variance of sampling distribution = 0.625

Simulation details

set.seed(42) # Answer to the Ultimate Question of Life, the Universe, and Everything 
sample.means <- NULL # Creates a null vector; to be filled as the loop iterates 
for (b in 1:B) { # B is the n° of simulations we are asked to perform  
                 # (i.e., we want to end up with 1000 means from the simulation) 
    sample.means <- c(sample.means, mean(rexp(n, lambda))) 
} 

Results

avg.sample.means <- mean(sample.means) 
sd.sample.means <- sd(sample.means) 
var.sample.means <- var(sample.means) 
cat(" Average of the simulated sampling distribution of means =", 
    round(avg.sample.means, digits = 2), "\n", 
    "Standard deviation of the simulated sampling distribution of means =", 
    round(sd.sample.means, digits = 4), "\n", 
    "Variance of the simulated sampling distribution of means = ", 
    round(var.sample.means, digits = 4) 
)
##  Average of the simulated sampling distribution of means = 4.99 
##  Standard deviation of the simulated sampling distribution of means = 0.7965 
##  Variance of the simulated sampling distribution of means =  0.6344

Plots

Figure 1

library(kdensity) 
kdensity.means <- kdensity(sample.means, start = "gumbel", 
                           kernel = "gaussian", na.rm = TRUE) 
xSeq <- seq(0, 8, length = 80) 
NormalPDF <- dnorm(xSeq, mean = muExp, sd = sampling.sd) # Normal under CLT 
hist(sample.means, col = "lightblue", breaks = 20, freq = FALSE, 
     main = "Sampling Distribution vs Normal PDF", 
     xlab = "Simulated Means", ylab = "Density") 
abline(v = muExp, lty = 1, lwd = 2.5, col = "darkorange2") 
legend(legend = "mean", "topright", lty = 1, lwd = 2.25, 
       bty = "n", col = "darkorange2") 
lines(xSeq, NormalPDF, lty = 1, col = "red") 
lines(kdensity.means, lty = 2, col = "darkgreen") 

Figure 2

qqnorm(sample.means) 
qqline(sample.means, col = "red") 


  1. Please note that blank space and division in sections were used to make the report more readable. Moreover, the Appendix shows code chunks (except those for the plots) for reproducibility. Please do factor these: overall, length requirements are (more or less) met!↩︎

  2. https://en.wikipedia.org/wiki/Exponential_distribution↩︎

  3. The exponential distribution is a special case of the Gamma family of distributions \(f(r, \lambda)\), for \(r = 1\). See e.g. Mood, A., Graybill, F. A., & Boes, D. C. (1974). Introduction to the Theory of Statistics, McGraw-Hill.↩︎

  4. Note that, abiding by the naming convention of the assignment, “theoretical” refers to the population mean of a process following an exponential distribution (for a given \(\lambda\)), as well as the “theoretical” mean and variance from the Central Limit Theorem (henceforth CLT).↩︎

  5. To be more precise, the CLT states that, assuming the mean \(\mu\) and variance \(\sigma^{2}>0\) exist, finite, then \(\forall\) \(-\infty < x < \infty\), \(\sqrt{n}(\bar{X}-\mu) / \sigma\), where \(\bar{X}\) is the sample mean, tends in distribution to a Standard Normal, as the sample size \(n \longrightarrow \infty\). See e.g. Casella, G., & Berger, R. L. (2002), Statistical Inference, Cengage Learning.↩︎

  6. The Appendix also reports these results, computed in R.↩︎

  7. A seed is set for reproducibility. For the choice of seed, please do read Adams, D. (2017). The Ultimate Hitchhiker’s Guide to the Galaxy: The Complete Trilogy in Five Parts (Vol. 6). Pan Macmillan.↩︎

  8. See the Appendix for code and output↩︎

  9. For the histogram, 20 break points were specified in the function call. This is often used as a “rule of thumb” number of bins↩︎