Simulation Exercise

By: Mohammed Teslim
5/10/22

Overview

The central limit theorem is one of the pillars of statistical inference. It lets us describe the sampling distribution of a statistic, such as the mean, even though in practice we usually observe only a single sample and therefore a single value of that statistic. In simple terms, for independent and identically distributed draws from a population, the means of sufficiently large samples are approximately normally distributed around the population mean. This report investigates the exponential distribution in R and compares the simulated results with what the central limit theorem predicts.

Simulations

To explore the exponential distribution, we draw samples of size 40 with a rate (lambda) of 0.2. The simulation is run a thousand times so that the distribution of the sample means can be visualized.

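# Preallocate a numeric vector to hold the 1000 sample means
mns <- vector("double", 1000)
for (i in seq_len(1000)) {
    # Mean of 40 draws from an exponential distribution with rate 0.2
    mns[i] <- mean(rexp(40, rate = 0.2))
}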

As shown above, a numeric (double) vector of length 1000 is preallocated to store the mean of each simulation; it starts out filled with zeros. A for loop then iterates from 1 to 1000 and stores the mean of each simulated sample at the corresponding position.
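The same simulation can also be written without an explicit loop. The following is a minimal sketch, not the approach used in this report: the seed value and the names n_sims, draws, mns_vec and sds_vec are assumptions introduced only for illustration. It draws all values at once into a matrix with one row per simulated sample, then summarizes each row.

```r
# Sketch: loop-free version of the simulation (names and seed are illustrative).
set.seed(2022)   # assumed seed, only so the run is reproducible
n_sims <- 1000   # number of simulated samples
n      <- 40     # observations per sample
lambda <- 0.2    # rate of the exponential distribution

# One row per simulated sample, one column per observation.
draws <- matrix(rexp(n_sims * n, rate = lambda), nrow = n_sims, ncol = n)

# Row-wise means give the 1000 sample averages in one call.
mns_vec <- rowMeans(draws)

# Row-wise standard deviations, computed on the same samples as the means.
sds_vec <- apply(draws, 1, sd)

length(mns_vec)
length(sds_vec)
```

Storing the draws in a matrix means the means and standard deviations describe the same 1000 samples, which can be convenient when the two are compared later.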

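# Preallocate a numeric vector to hold the 1000 sample standard deviations
standard_devs <- vector("double", 1000)
for (i in seq_len(1000)) {
    # Standard deviation of a fresh sample of 40 exponential draws
    standard_devs[i] <- sd(rexp(40, rate = 0.2))
}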

The same is done to assess how variable the samples are. Using the same iteration technique as above, a fresh sample of 40 is drawn in each of the thousand iterations and its standard deviation is stored in the vector called standard_devs.

Sample Mean versus Theoretical mean

The theoretical mean of an exponential distribution is \(1/\lambda\). Here \(\lambda = 0.2\), so the theoretical mean is 1/0.2 = 5.

hist(mns, main = "Distribution of 1000 averages")

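From the figure above, we can see that the distribution of the sample means is centered around 5, the theoretical mean. This is in keeping with the central limit theorem, under which the sample means are approximately normally distributed around the population mean.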
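The visual comparison can be backed up numerically. The sketch below re-runs the simulation under an assumed seed and marks both the theoretical mean and the average of the simulated means on the histogram; the names sim_means, theoretical_mean and sample_mean are hypothetical and chosen only for this example.

```r
# Sketch: compare the simulated mean with the theoretical mean (illustrative names).
set.seed(2022)                  # assumed seed, for reproducibility only
lambda <- 0.2                   # rate of the exponential distribution
n      <- 40                    # observations per sample

# 1000 sample means, each from a fresh sample of size n
sim_means <- replicate(1000, mean(rexp(n, rate = lambda)))

theoretical_mean <- 1 / lambda  # theoretical mean of the distribution
sample_mean <- mean(sim_means)  # average of the 1000 simulated means

hist(sim_means, main = "Distribution of 1000 averages", xlab = "Sample mean")
abline(v = theoretical_mean, col = "red", lwd = 2)           # theoretical mean
abline(v = sample_mean, col = "blue", lwd = 2, lty = 2)      # simulated mean
legend("topright", legend = c("Theoretical mean", "Simulated mean"),
       col = c("red", "blue"), lwd = 2, lty = c(1, 2))

# Print both values side by side
c(theoretical = theoretical_mean, simulated = sample_mean)
```

The two vertical lines should nearly coincide; the exact simulated value depends on the seed.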

Sample Variance versus Theoretical variance

The theoretical standard deviation of this distribution is also \(1/\lambda\), so the theoretical variance is \(1/\lambda^2\). With \(\lambda = 0.2\), the theoretical standard deviation is 1/0.2 = 5 and the theoretical variance is 25.
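As a short worked sketch, the variance of a single observation and of the sample mean can be written out as follows. The symbols \(X\) for one exponential draw, \(\bar{X}\) for the mean of a sample, and \(n = 40\) for the sample size are notation assumed here for illustration.

\[
\mathrm{Var}(X) = \frac{1}{\lambda^2} = \frac{1}{0.2^2} = 25, \qquad
\mathrm{Var}(\bar{X}) = \frac{\mathrm{Var}(X)}{n} = \frac{25}{40} = 0.625, \qquad
\mathrm{sd}(\bar{X}) = \frac{5}{\sqrt{40}} \approx 0.79
\]

So individual observations have a standard deviation of 5, while the averages of 40 observations are expected to vary much less, with a standard deviation of about 0.79.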

hist(standard_devs, main = "Distribution of 1000 SDs")

For samples of this size, the sample standard deviations are expected to cluster near the population standard deviation, which here is the theoretical standard deviation. Strictly speaking, this follows from the sample standard deviation being a consistent estimator of the population value rather than from the central limit theorem itself. That is what we see here, with the distribution centered close to 5 as calculated.
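The comparison can also be made in numbers rather than by eye. The following sketch, under an assumed seed and with hypothetical names (draws, sim_means, sim_sds, comparison), puts the theoretical values next to their simulated counterparts: the standard deviation of a single observation, and the variance of the sample mean, which for samples of size 40 is 1/(lambda^2 * 40) = 0.625.

```r
# Sketch: theoretical versus simulated variability (illustrative names).
set.seed(2022)   # assumed seed, for reproducibility only
lambda <- 0.2    # rate of the exponential distribution
n      <- 40     # observations per sample

# 1000 samples of size n, one per row
draws <- matrix(rexp(1000 * n, rate = lambda), nrow = 1000)

sim_means <- rowMeans(draws)       # mean of each sample
sim_sds   <- apply(draws, 1, sd)   # standard deviation of each sample

comparison <- data.frame(
  quantity    = c("SD of one observation", "Variance of the sample mean"),
  theoretical = c(1 / lambda, 1 / (lambda^2 * n)),
  simulated   = c(mean(sim_sds), var(sim_means))
)
print(comparison)
```

The average sample standard deviation is expected to fall slightly below 5, because the sample standard deviation is a slightly biased estimator in finite samples.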

Distribution

We compare two distributions: the distribution of a single large sample of exponential draws and the distribution of averages of 40 draws.

hist(rexp(1000, 0.2), main = "Histogram of one sample of size 1000")

hist(mns, main = "Distribution of 1000 averages")

From the figures, we can see that the distribution of a single large sample is not normal: it is strongly right-skewed, with most values close to zero and a long tail to the right. The distribution of the averages, however, is approximately normal (roughly symmetric and bell-shaped around the population mean of 5), which further supports the central limit theorem.
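One way to check the normality claim more directly is to overlay the normal density suggested by the central limit theorem, with mean 1/lambda and standard deviation (1/lambda)/sqrt(40), on a density-scaled histogram of the averages, and to draw a normal Q-Q plot. This is a minimal sketch under an assumed seed; the name sim_means and the choice of 30 histogram bins are illustrative assumptions.

```r
# Sketch: compare the simulated averages with the CLT normal approximation.
set.seed(2022)   # assumed seed, for reproducibility only
lambda <- 0.2    # rate of the exponential distribution
n      <- 40     # observations per sample

sim_means <- replicate(1000, mean(rexp(n, rate = lambda)))

# Histogram on the density scale so a density curve can be overlaid
hist(sim_means, freq = FALSE, breaks = 30,
     main = "Distribution of 1000 averages", xlab = "Sample mean")

# Normal density with the mean and standard deviation given by the CLT
curve(dnorm(x, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n)),
      add = TRUE, col = "red", lwd = 2)

# Normal Q-Q plot: points close to the line indicate approximate normality
qqnorm(sim_means)
qqline(sim_means, col = "red")
```

With averages of only 40 exponential draws, a slight right skew may still be visible in the upper tail of the Q-Q plot.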