PART 1. SIMULATION EXERCISE

Given: Number of exponentials, n = 40; rate parameter, lambda = 0.2

Required: Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials.

Solution: 1000 simulations will be run, each drawing a random sample of 40 exponentials. The mean, standard deviation and variance of the resulting 1000 sample means will be calculated and compared with their theoretical values, and histograms will be used to compare the shape of the distribution with that of a normal distribution.

ASSUMPTIONS

Two asymptotic results describe the behavior of the mean of a random sample of exponentials. First, the Law of Large Numbers states that the sample mean of iid samples is consistent for the population mean: as the sample size increases to infinity, the sample mean converges to the population mean, here 1/lambda = 5. Second, the Central Limit Theorem (CLT) states that the distribution of averages of iid variables, standardized by subtracting the population mean and dividing by the standard error, approaches a standard normal distribution as the sample size increases. Together, these two principles predict that the means of repeated random samples of 40 exponentials will be centered near 1/lambda = 5 and approximately normally distributed, with variance close to (1/lambda)^2/n.
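Written out for this exercise, the two results give the theoretical targets used in the steps that follow. This is a sketch using the standard mean 1/lambda and variance 1/lambda^2 of an exponential with rate lambda; X_i denotes a single exponential draw and \bar X_n the mean of n of them.

```latex
\begin{aligned}
X_i &\sim \mathrm{Exp}(\lambda), \quad E[X_i] = 1/\lambda = 5, \quad \mathrm{Var}(X_i) = 1/\lambda^2 = 25 \\
\bar X_n &= \tfrac{1}{n}\textstyle\sum_{i=1}^{n} X_i \;\xrightarrow{P}\; 1/\lambda \quad \text{(LLN)} \\
\frac{\bar X_n - 1/\lambda}{(1/\lambda)/\sqrt{n}} &\;\xrightarrow{d}\; N(0, 1) \quad \text{(CLT)} \\
\bar X_{40} &\;\overset{\text{approx.}}{\sim}\; N\!\left(5,\ 25/40\right) = N(5,\ 0.625)
\end{aligned}
```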

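Step 1. Obtain the theoretical values: the mean of the exponential distribution (1/lambda), its standard deviation (also 1/lambda), and the variance of the mean of 40 exponentials ((1/lambda)^2/n). They are printed below in that order.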
## [1] 5
## [1] 5
## [1] 0.625
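The three printed values are, in order, 1/lambda, 1/lambda and (1/lambda)^2/n. The sketch below shows one way to compute them in R; the variable names `lambda`, `n`, `theory_mean`, `theory_sd` and `theory_var` are illustrative and not taken from the original analysis.

```r
## parameters given in the exercise
lambda <- 0.2   # rate of the exponential distribution
n <- 40         # number of exponentials averaged in each simulation

## theoretical mean of a single exponential: 1/lambda
theory_mean <- 1 / lambda
## theoretical standard deviation of a single exponential: also 1/lambda
theory_sd <- 1 / lambda
## theoretical variance of the mean of n exponentials: (1/lambda)^2 / n
theory_var <- theory_sd^2 / n

theory_mean
## [1] 5
theory_sd
## [1] 5
theory_var
## [1] 0.625
```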
Step 2. Calculate the mean and plot the distribution of a large random sample of 1000 x 40 exponentials
## simulate 1000 x 40 random exponentials in one vectorized call
mn1 <- rexp(1000 * 40, rate = 0.2)
## create matrix for the simulated samples
matrix_mn1 <- matrix(mn1, nrow=1000, ncol=40)
## calculate the mean of the data in the matrix
mean_matrix_mn1 <- mean(matrix_mn1)
mean_matrix_mn1
## [1] 5.010703
## plot the distribution
par(ps = 10, cex = 1, cex.main = 1)
hist(matrix_mn1, main = "Distribution of a large sample of 40X1000 exponentials", xlab = "Exponentials")
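Because this first histogram shows the raw draws rather than their averages, their spread should match the standard deviation of a single exponential, 1/lambda = 5, and the histogram should follow the exponential density. The sketch below checks this on a fresh sample; the seed value and the names `raw`, `raw_mean` and `raw_sd` are arbitrary choices for illustration, so its numbers will not reproduce the 5.010703 printed above.

```r
set.seed(2024)                          # arbitrary seed, only for reproducibility
lambda <- 0.2
raw <- rexp(1000 * 40, rate = lambda)   # 40,000 independent exponential draws

raw_mean <- mean(raw)   # should be close to 1/lambda = 5
raw_sd <- sd(raw)       # should also be close to 1/lambda = 5
c(mean = raw_mean, sd = raw_sd)

## rescale the histogram to densities and overlay the exponential density
hist(raw, breaks = 50, freq = FALSE,
     main = "Raw exponentials with Exp(0.2) density", xlab = "Exponentials")
curve(dexp(x, rate = lambda), from = 0, to = max(raw), add = TRUE, lwd = 2)
```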

Step 3. Calculate the mean and variance of the averages of 40 exponentials over 1000 simulations, and plot their distribution
## simulate 1000 x 40 random exponentials in one vectorized call
mn2 <- rexp(1000 * 40, rate = 0.2)
matrix_mn2 <- matrix(mn2, nrow=1000, ncol=40)
## average each row: 1000 means, each of 40 exponentials
mean_mn2 <- apply(matrix_mn2, 1, mean)
## calculate the average of the 1000 sample means
mean_sim2 <- mean(mean_mn2)
mean_sim2
## [1] 5.035157
## calculate the standard deviation 
sd_sim2 <- sd(mean_mn2)

## calculate the variance
var_sim2 <- var(mean_mn2)
var_sim2
## [1] 0.6375421
par(ps = 10, cex = 1, cex.main = 1)
hist(mean_mn2, main = "Distribution of averages of large sample of 40x1000 exponentials", xlab = "Avg of Exponentials")
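The loop-and-matrix approach of Step 3 can also be written with `replicate()`, which draws 40 exponentials and takes their mean 1000 times. The sketch below is an alternative formulation, not the original code; it also rescales the histogram to densities so that the normal curve N(5, 0.625) predicted by the CLT can be drawn over it. The seed and the names `sim_means`, `se_theory` and `comparison` are arbitrary.

```r
set.seed(2024)                  # arbitrary seed, only for reproducibility
lambda <- 0.2
n <- 40
nsim <- 1000

## each replicate is the mean of 40 exponentials
sim_means <- replicate(nsim, mean(rexp(n, rate = lambda)))

## simulated versus theoretical centre and spread of the sample mean
comparison <- data.frame(
  statistic = c("mean", "variance"),
  simulated = c(mean(sim_means), var(sim_means)),
  theoretical = c(1 / lambda, (1 / lambda)^2 / n)
)
comparison

## density-scaled histogram with the CLT normal approximation on top
se_theory <- (1 / lambda) / sqrt(n)   # theoretical standard error of the mean
hist(sim_means, breaks = 30, freq = FALSE,
     main = "Averages of 40 exponentials vs. N(5, 0.625)",
     xlab = "Avg of Exponentials")
curve(dnorm(x, mean = 1 / lambda, sd = se_theory), add = TRUE, lwd = 2)
```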

Comparison of sample and theoretical mean

In the foregoing calculations and histogram, the mean of the 1000 simulated averages, 5.035, is very close to the theoretical mean of 5, so the center of the simulated distribution almost coincides with the theoretical center.

Comparison of sample and theoretical variance

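The variance of the 1000 simulated averages, 0.638, is close to the theoretical variance of the mean, (1/lambda)^2/n = 25/40 = 0.625. The variance of the simulated averages estimates the variance of the sampling distribution of the mean, and the estimate becomes more precise as the number of simulations grows. It is much smaller than the variance of a single exponential, (1/lambda)^2 = 25, because averaging 40 values divides the variance by 40.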
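That the variance of the averages shrinks like (1/lambda)^2/n, rather than matching the variance 25 of a single exponential, can be checked by repeating the simulation for several sample sizes. In this sketch the sample sizes 10, 40 and 160, the 5000 simulations per size, the seed, and the helper name `var_of_means` are all arbitrary illustrative choices.

```r
set.seed(7)                     # arbitrary seed
lambda <- 0.2
nsim <- 5000

## variance of nsim simulated means, each the average of `size` exponentials
var_of_means <- function(size) {
  var(replicate(nsim, mean(rexp(size, rate = lambda))))
}

sizes <- c(10, 40, 160)
simulated <- sapply(sizes, var_of_means)
theoretical <- (1 / lambda)^2 / sizes   # 2.5, 0.625, 0.15625

## ratios near 1 indicate agreement with (1/lambda)^2 / n
data.frame(n = sizes, simulated = simulated, theoretical = theoretical,
           ratio = simulated / theoretical)
```

Quadrupling n should roughly quarter the variance of the averages, which is the pattern the `ratio` column is meant to expose.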

Confidence intervals

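The 95% t confidence interval for the mean of the 1000 simulated averages, computed below, runs from approximately 4.986 to 5.085. The theoretical mean of 5 falls within this range, which supports the conclusion that the simulated mean is consistent with the theoretical mean.

## calculate the standard error SE from the n = 1000 simulated averages
SE <- sd_sim2/sqrt(1000)

## calculate the 95% confidence interval
mean_sim2 + c(-1,1)*qt(.975,999)*SE
## with mean_sim2 = 5.035157 and var_sim2 = 0.6375421 this is approximately 4.986 to 5.085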
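A 95% t interval for the mean of the 1000 simulated averages is mean ± qt(0.975, 999) · s/sqrt(1000), which is also what base R's `t.test()` reports as `conf.int`. The small self-contained check below compares the two on freshly simulated averages; the seed and the names `sims`, `manual_ci` and `ttest_ci` are chosen for illustration only.

```r
set.seed(42)                    # arbitrary seed
sims <- replicate(1000, mean(rexp(40, rate = 0.2)))

## manual 95% t interval from the 1000 simulated averages
se <- sd(sims) / sqrt(length(sims))
manual_ci <- mean(sims) + c(-1, 1) * qt(0.975, df = length(sims) - 1) * se

## the same interval from t.test()
ttest_ci <- as.numeric(t.test(sims, conf.level = 0.95)$conf.int)

manual_ci
ttest_ci
```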
Normal distribution

The first plot, a large sample of 1000 x 40 individual exponentials, is right-skewed and follows the shape of the exponential distribution rather than a normal distribution. The second plot shows the 1000 averages of 40 exponentials each; this distribution is approximately normal, as shown by the roughly symmetric, bell-shaped histogram centered near 5.
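How normal a histogram looks depends on its bin widths, so a normal Q-Q plot is a common complementary check: if the averages are approximately normal, the points lie near a straight line. The sketch below also counts how many standardized averages fall within ±1.96, which should be close to 95% under the CLT approximation. The seed and the names `sim_means`, `z` and `coverage` are illustrative, and with n = 40 a slight right skew inherited from the exponential may still be visible in the upper tail.

```r
set.seed(99)                    # arbitrary seed
lambda <- 0.2
n <- 40
sim_means <- replicate(1000, mean(rexp(n, rate = lambda)))

## standardize with the theoretical mean and standard error from the CLT
z <- (sim_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

## share of standardized averages inside the central 95% of N(0, 1)
coverage <- mean(abs(z) < qnorm(0.975))
coverage

## points close to the reference line indicate approximate normality
qqnorm(z, main = "Normal Q-Q plot of standardized averages")
qqline(z)
```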