We simulate 1000 sets of random variables that follow an exponential distribution with rate λ = 0.2. Each set contains 40 random variables (i.e. a sample size of 40). The mean and variance of each sample are compared against the theoretical population mean (1/λ = 5) and variance (1/λ² = 25), and their distributions are compared with those of samples drawn from a normal distribution.
```r
n <- 40         # Sample size
lambda <- 0.2   # Exponential distribution rate parameter
nosim <- 1000   # Number of trials / experiments
meansim <- numeric(nosim)   # Preallocate: one sample mean per simulation
varsim <- numeric(nosim)    # Preallocate: one sample variance per simulation
# For each simulation, draw ONE sample and compute both its mean and its variance
for (i in 1:nosim) {
  x <- rexp(n, rate = lambda)
  meansim[i] <- mean(x)
  varsim[i] <- var(x)
}
```
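The loop can also be written without an explicit loop. A minimal sketch, assuming it is acceptable to hold all 1000 × 40 draws in memory at once: fill a matrix with one sample per row, then take row-wise means with `rowMeans()` and row-wise variances with `apply()`. The `set.seed()` call and its value are additions for reproducibility, so the numbers this produces will not match the ones reported below.

```r
set.seed(1)      # arbitrary seed, added only for reproducibility
n <- 40          # sample size
lambda <- 0.2    # exponential rate parameter
nosim <- 1000    # number of simulated samples

# Draw all n * nosim values at once; each row of the matrix is one sample of size n
samples <- matrix(rexp(n * nosim, rate = lambda), nrow = nosim, ncol = n)

meansim <- rowMeans(samples)      # sample mean of each row
varsim <- apply(samples, 1, var)  # sample variance of each row (n - 1 denominator)

c(mean(meansim), mean(varsim))    # compare with 1 / lambda and 1 / lambda^2
```

Both versions produce the same kind of output: a vector of 1000 sample means and a vector of 1000 sample variances, each computed from the same sample.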
The average of the 1000 simulated sample means is 5.0266734, while the theoretical population mean is 1/λ = 5 (red vertical line in the histogram). The average of the simulated sample means is therefore very close to the theoretical population mean. Because the sample mean is an unbiased estimator of the population mean, averaging a large number of sample means gives a good estimate of the population mean of an exponential distribution.
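A minimal base-R sketch of this comparison: average the simulated sample means, compare the result with 1/λ, and draw the histogram with the theoretical mean as a red vertical line. It re-simulates the sample means with `replicate()` so that it runs on its own; the seed and the 30-bin setting are arbitrary choices, not taken from the original analysis.

```r
set.seed(1)      # arbitrary seed, for reproducibility only
n <- 40
lambda <- 0.2
nosim <- 1000
meansim <- replicate(nosim, mean(rexp(n, rate = lambda)))  # one sample mean per simulation

theo_mean <- 1 / lambda   # theoretical mean of Exp(lambda): 5
mean(meansim)             # average of the simulated sample means

# Histogram of the sample means, theoretical mean as a red vertical line
hist(meansim, breaks = 30, main = "Sample means (exponential, n = 40)",
     xlab = "Sample mean")
abline(v = theo_mean, col = "red", lwd = 2)
```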
The average of the 1000 simulated sample variances is 24.872329, while the theoretical population variance is 1/λ² = 25 (red vertical line in the histogram). The average of the simulated sample variances is therefore very close to the theoretical population variance.
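The same check for the variances, as a self-contained sketch under the same assumptions (arbitrary seed and bin count). Note that `var()` uses the n − 1 denominator, which is what makes the sample variance an unbiased estimator of the population variance.

```r
set.seed(1)      # arbitrary seed, for reproducibility only
n <- 40
lambda <- 0.2
nosim <- 1000
varsim <- replicate(nosim, var(rexp(n, rate = lambda)))  # one sample variance per simulation

theo_var <- 1 / lambda^2   # theoretical variance of Exp(lambda): 25
mean(varsim)               # average of the simulated sample variances

# Histogram of the sample variances, theoretical variance as a red vertical line
hist(varsim, breaks = 30, main = "Sample variances (exponential, n = 40)",
     xlab = "Sample variance")
abline(v = theo_var, col = "red", lwd = 2)
```

`1 / lambda^2` is not exactly 25 in floating point, so comparisons against 25 should use `all.equal()` rather than `==`.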
First, we simulate 1000 sets of 40 random variables that follow a normal distribution, in the same way as for the exponential distribution.
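The code and parameters for the normal simulation are not shown here. The reported results (means near 5, variances near 25) suggest a normal distribution whose mean and standard deviation both equal 1/λ = 5, matching the exponential distribution; the sketch below assumes those parameters. The names `meannorm` and `varnorm` are hypothetical.

```r
set.seed(2)      # arbitrary seed
n <- 40
lambda <- 0.2
nosim <- 1000
mu <- 1 / lambda      # ASSUMED normal mean, matching the exponential mean (5)
sigma <- 1 / lambda   # ASSUMED normal sd, matching the exponential sd (5)

meannorm <- numeric(nosim)   # hypothetical name: normal sample means
varnorm <- numeric(nosim)    # hypothetical name: normal sample variances
for (i in 1:nosim) {
  x <- rnorm(n, mean = mu, sd = sigma)   # one normal sample of size n
  meannorm[i] <- mean(x)
  varnorm[i] <- var(x)
}
c(mean(meannorm), mean(varnorm))   # centers of the normal sample means and variances
```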
The means of the normal samples are centered at 4.999637, and their variances are centered at 25.2336215. These values are close to those of the exponential samples, whose means and variances are centered at 5.0266734 and 24.872329, respectively. This is visualized in the following plots:
Both the exponentially distributed (red circles) and the normally distributed (blue circles) random variables have sample means and variances that are scattered around, and close to, the true population mean and variance (black horizontal lines).
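A base-R sketch of such a plot, assuming the horizontal axis is the simulation index (the original plotting code is not shown) and the normal parameters mean = sd = 1/λ. `matplot()` draws each column of a matrix as its own series, so column 1 (exponential) is drawn in red and column 2 (normal) in blue, with open circles (`pch = 1`).

```r
set.seed(3)      # arbitrary seed
n <- 40
lambda <- 0.2
nosim <- 1000

# One sample per row; the normal parameters (mean = sd = 1 / lambda) are assumed
exp_samples <- matrix(rexp(n * nosim, rate = lambda), nrow = nosim)
norm_samples <- matrix(rnorm(n * nosim, mean = 1 / lambda, sd = 1 / lambda), nrow = nosim)

# Column 1: exponential, column 2: normal
means <- cbind(rowMeans(exp_samples), rowMeans(norm_samples))
vars <- cbind(apply(exp_samples, 1, var), apply(norm_samples, 1, var))

par(mfrow = c(1, 2))   # two panels side by side
matplot(means, pch = 1, col = c("red", "blue"),
        xlab = "Simulation", ylab = "Sample mean")
abline(h = 1 / lambda, col = "black", lwd = 2)     # population mean, 5
matplot(vars, pch = 1, col = c("red", "blue"),
        xlab = "Simulation", ylab = "Sample variance")
abline(h = 1 / lambda^2, col = "black", lwd = 2)   # population variance, 25
```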
To further visualize the distribution of the sample means and variances for both distributions:
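A base-R sketch of overlaid histograms, assuming the same normal parameters as above. The original figure appears to have been drawn with ggplot2 (salmon and cyan are its default colors for two groups); this version uses `hist(..., plot = FALSE)` and `plot(..., add = TRUE)` instead, with an explicit bin width per panel rather than a default bin count. The helper `overlay_hist()` is hypothetical, and the bin widths are arbitrary choices.

```r
set.seed(4)      # arbitrary seed
n <- 40
lambda <- 0.2
nosim <- 1000
exp_s <- matrix(rexp(n * nosim, rate = lambda), nrow = nosim)
norm_s <- matrix(rnorm(n * nosim, mean = 1 / lambda, sd = 1 / lambda), nrow = nosim)  # assumed parameters

# Hypothetical helper: overlay two histograms on shared bins of width `width`
overlay_hist <- function(a, b, width, truth, xlab) {
  breaks <- seq(floor(min(a, b)), ceiling(max(a, b)) + width, by = width)
  ha <- hist(a, breaks = breaks, plot = FALSE)
  hb <- hist(b, breaks = breaks, plot = FALSE)
  ylim <- c(0, max(ha$counts, hb$counts))   # room for the taller of the two
  plot(ha, col = adjustcolor("salmon", alpha.f = 0.5), ylim = ylim,
       main = "", xlab = xlab)
  plot(hb, col = adjustcolor("cyan", alpha.f = 0.5), add = TRUE)
  abline(v = truth, lwd = 2)                # population value
  legend("topright", legend = c("exponential", "normal"),
         fill = adjustcolor(c("salmon", "cyan"), alpha.f = 0.5))
}

par(mfrow = c(2, 1))   # upper panel: means, lower panel: variances
overlay_hist(rowMeans(exp_s), rowMeans(norm_s), 0.25, 1 / lambda, "Sample mean")
overlay_hist(apply(exp_s, 1, var), apply(norm_s, 1, var), 2, 1 / lambda^2, "Sample variance")
```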
The upper plot shows that the distributions of sample means look roughly the same for the exponentially (salmon) and normally (cyan) distributed variables. Both are centered at the population mean of 5.
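The lower plot shows that the distribution of sample variances for the exponentially distributed variables is somewhat right-skewed: its peak lies a little below 25, with a longer tail to the right. It is nevertheless roughly centered at the population variance of 25, as is the distribution for the normally distributed variables.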
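The direction of the skew can be checked numerically. A minimal sketch, assuming the same normal parameters as above: it computes a moment-based sample skewness with a small hand-written function (so no package is required), where a positive value indicates a longer right tail, and compares each median with the population variance.

```r
set.seed(5)      # arbitrary seed
n <- 40
lambda <- 0.2
nosim <- 1000
exp_vars <- replicate(nosim, var(rexp(n, rate = lambda)))
norm_vars <- replicate(nosim, var(rnorm(n, mean = 1 / lambda, sd = 1 / lambda)))  # assumed parameters

# Moment-based sample skewness: third central moment over the 1.5 power of the second
skewness <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

c(exponential = skewness(exp_vars), normal = skewness(norm_vars))
c(exponential = median(exp_vars), normal = median(norm_vars))   # medians vs. 25
```

For normal samples, (n − 1)s²/σ² follows a chi-squared distribution with n − 1 degrees of freedom, so a mild positive skew is expected there as well; the exponential sample variances are expected to be noticeably more skewed.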