This exercise illustrates the Central Limit Theorem (CLT). The CLT states that, for a sufficiently large sample size, the distribution of the sample mean is approximately normal. Its mean equals the population mean it estimates, and its variance equals the square of the standard error (the population variance divided by the sample size). I will perform 1000 simulations, each drawing 40 random numbers from an exponential distribution with rate lambda = 0.2, and calculate the sample mean of each simulation.
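Plugging this exercise's parameters into the theorem gives the distribution the simulated means should approximately follow. For an exponential distribution with rate λ, the mean and the standard deviation are both 1/λ:

$$
\bar{X}_n \;\overset{\text{approx.}}{\sim}\; \mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{n}\right),\qquad
\mu = \frac{1}{\lambda} = 5,\quad \sigma^2 = \frac{1}{\lambda^2} = 25,\quad n = 40
$$

$$
\operatorname{Var}(\bar{X}_{40}) = \frac{25}{40} = 0.625,\qquad
\mathrm{SE} = \sqrt{0.625} = \frac{5}{\sqrt{40}} \approx 0.79
$$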
As an example, the following R code draws 40 random values from an exponential distribution with rate 0.2. The set.seed() function fixes the starting point of the random number generator so that the draw can be reproduced.
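```r
set.seed(8)            # fix the RNG seed so the draw is reproducible
x <- rexp(40, rate = 0.2)  # 40 exponential draws with rate lambda = 0.2
```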
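To see what set.seed() buys us, the short sketch below draws the same 40 values twice. Resetting the seed to the same value before each call makes the second draw identical to the first; the name `xAgain` is introduced here only for the comparison.

```r
set.seed(8)
x <- rexp(40, rate = 0.2)       # first draw of 40 exponentials

set.seed(8)                     # reset to the same seed...
xAgain <- rexp(40, rate = 0.2)  # ...and draw again

identical(x, xAgain)  # TRUE: same seed, same sequence of draws
length(x)             # 40 values, all positive
```

Without the second set.seed(8) call, `xAgain` would continue the random stream and differ from `x`.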
For the simulation, I will construct a matrix with 1000 rows (one per simulation) and 40 columns (one per draw), filled with random exponentials drawn from a population with lambda = 0.2. Each row is therefore one sample of size 40. I will calculate the mean of each row and plot a histogram of these 1000 sample means. For reproducibility, I will use the set.seed() function.
```r
nosim <- 1000  # number of simulations (matrix rows)
n <- 40        # random draws per simulation (matrix columns)
set.seed(8)    # fix the RNG seed so the results are reproducible
samples <- matrix(rexp(nosim * n, rate = 0.2), nrow = nosim, ncol = n)  # fill the matrix with exponentials
sampleMeans <- apply(samples, 1, mean)  # mean of each row: one sample mean per simulation
```
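As a quick sanity check on the simulation, the sketch below rebuilds the same matrix and compares the simulated mean and variance of the 1000 sample means with the values the CLT predicts (5 and 25/40 = 0.625). The variable `lambda` is introduced here for readability and is not in the original code; rowMeans() is shown as a built-in, faster equivalent of apply(samples, 1, mean).

```r
nosim <- 1000
n <- 40
lambda <- 0.2  # rate parameter, named here for readability
set.seed(8)
samples <- matrix(rexp(nosim * n, rate = lambda), nrow = nosim, ncol = n)
sampleMeans <- apply(samples, 1, mean)

# rowMeans() gives the same row means (up to floating-point tolerance)
all.equal(sampleMeans, rowMeans(samples))

# CLT prediction versus what the simulation produced
theory <- c(mean = 1 / lambda, var = (1 / lambda)^2 / n)
simulated <- c(mean = mean(sampleMeans), var = var(sampleMeans))
rbind(theory, simulated)
```

The simulated values will not match the theory exactly, since 1000 repetitions is a finite number, but they should be close.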
The histogram of the 40 values drawn above is right-skewed, the shape expected of an exponential distribution. This sample has a mean of 4.44, marked by the red line. The values were drawn from a population with mean 1/lambda = 1/0.2 = 5, marked by the blue, dashed line.
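A plot like the one described can be sketched in base R as follows. The original plotting code is not shown, so the choice of hist() and abline(), the number of bins, and the titles are assumptions; only the red (sample mean) and blue dashed (population mean) reference lines follow the description.

```r
set.seed(8)
x <- rexp(40, rate = 0.2)

hist(x, breaks = 10, col = "grey90",
     main = "40 random exponentials (rate = 0.2)", xlab = "value")
abline(v = mean(x), col = "red", lwd = 2)                   # sample mean
abline(v = 1 / 0.2, col = "blue", lwd = 2, lty = "dashed")  # population mean, 1/lambda = 5

round(mean(x), 2)  # the sample mean marked by the red line
```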
This histogram plots the squared distance of each of the 40 values from the sample mean. Dividing the sum of these squared deviations by n - 1 = 39 gives the sample variance (red line), which for this sample is 20.16. The population variance (blue line) depends only on lambda: 1/lambda^2 = 1/0.2^2 = 1/0.04 = 25.
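The sketch below computes the squared deviations explicitly and confirms that their sum divided by n - 1 (not n) reproduces R's built-in var(). The names `sqDev`, `sampleVar`, and `popVar` are introduced here for illustration, and the plot styling is an assumption.

```r
set.seed(8)
x <- rexp(40, rate = 0.2)
n <- length(x)

sqDev <- (x - mean(x))^2           # squared distance of each value from the sample mean
sampleVar <- sum(sqDev) / (n - 1)  # unbiased sample variance divides by n - 1
all.equal(sampleVar, var(x))       # TRUE: matches R's var()

popVar <- 1 / 0.2^2                # population variance 1/lambda^2, i.e. 25

hist(sqDev, breaks = 10, col = "grey90",
     main = "Squared deviations from the sample mean", xlab = "squared deviation")
abline(v = sampleVar, col = "red", lwd = 2)                 # sample variance
abline(v = popVar, col = "blue", lwd = 2, lty = "dashed")   # population variance
```

Dividing by n instead would give the average squared deviation, which systematically underestimates the population variance.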
This histogram plots the mean of each 40-value sample. The curve is a normal density centered on the population mean (5) with variance equal to the population variance (25) divided by the sample size (40), that is, 0.625, which corresponds to a standard deviation (the standard error) of sqrt(0.625), about 0.79. The curve fits the data well, showing that the distribution of sample means is approximately normal. Recall that a histogram of 1000 raw exponentials would have the same skewed exponential shape as the 40 draws above, whereas 1000 means of 40 exponentials are approximately normally distributed.
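A minimal sketch of this comparison, assuming base R graphics: the histogram is drawn on the density scale (freq = FALSE) so that the normal curve from dnorm() can be overlaid directly. The names `lambda`, `mu`, and `se`, the bin counts, and the titles are introduced here for illustration. A second histogram of 1000 raw exponential draws is included for contrast.

```r
nosim <- 1000
n <- 40
lambda <- 0.2
set.seed(8)
sampleMeans <- rowMeans(matrix(rexp(nosim * n, rate = lambda), nrow = nosim, ncol = n))

mu <- 1 / lambda              # population mean, 5
se <- (1 / lambda) / sqrt(n)  # standard error, sqrt(25 / 40), about 0.79

# Sample means on the density scale with the CLT normal curve overlaid
hist(sampleMeans, breaks = 30, freq = FALSE, col = "grey90",
     main = "1000 means of 40 exponentials", xlab = "sample mean")
curve(dnorm(x, mean = mu, sd = se), add = TRUE, col = "blue", lwd = 2)

# Contrast: 1000 raw exponential draws keep the skewed exponential shape
rawDraws <- rexp(nosim, rate = lambda)
hist(rawDraws, breaks = 30, col = "grey90",
     main = "1000 raw exponentials", xlab = "value")
```

Using the standard error (not the population variance) as the sd argument of dnorm() is what makes the curve match the spread of the means.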