Simulation Exercise
By: Mohammed Teslim
5/10/22
The central limit theorem (CLT) is one of the pillars of statistical inference. It describes the sampling distribution of a statistic even though, in practice, we only ever observe a single value of that statistic from a single sample. In simple terms, we can learn about the distribution of a statistic (e.g. the mean) across independent and identically distributed samples from a population, even though we have only one such sample and hence one sample statistic. This report investigates the exponential distribution in R and compares the behaviour of its simulated sample means with what the central limit theorem predicts.
To explore the exponential distribution, we simulate samples of size 40 with a rate parameter (lambda) of 0.2. The simulation is run a thousand times so that the distribution of the sample means can be visualized.
# Preallocate a numeric vector, then store the mean of each simulated sample
mns <- vector("double", 1000)
for (i in 1:1000) {
  mns[[i]] <- mean(rexp(40, 0.2))
}
As shown above, a numeric (double) vector of length 1000 is preallocated to store the mean of each simulated sample. A for loop iterates over the integers 1 to 1000, and on each iteration the mean of a fresh sample of 40 exponential draws is stored in the corresponding position.
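The same simulation can also be written without an explicit loop. The following is a minimal sketch rather than the code used for the figures in this report; the seed value and the names `n`, `lambda`, `n_sims` and `mns_alt` are assumptions introduced only for illustration.

```r
set.seed(2022)   # assumed seed, only so that the sketch is reproducible

n      <- 40     # sample size used in this report
lambda <- 0.2    # rate parameter used in this report
n_sims <- 1000   # number of simulated samples

# replicate() evaluates the expression n_sims times and collects the results
mns_alt <- replicate(n_sims, mean(rexp(n, lambda)))

length(mns_alt)  # one mean per simulated sample
```

Naming the parameters once makes it easier to rerun the exercise with a different sample size or rate.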
# Preallocate a numeric vector, then store the SD of each simulated sample
standard_devs <- vector("double", 1000)
for (i in 1:1000) {
  standard_devs[[i]] <- sd(rexp(40, 0.2))
}
The same approach is used to assess how variable the samples are. A fresh
set of 1000 samples is drawn, and the standard deviation of each sample is
stored in the vector called standard_devs, using the same
iteration technique as above.
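Because each loop calls rexp() separately, the means and the standard deviations in this report come from different sets of samples. If both statistics of the same sample are wanted, they can be computed in one pass. This is a minimal sketch under that assumption; the seed and the names `one_sample` and `sample_stats` are hypothetical and do not appear elsewhere in the report.

```r
set.seed(2022)  # assumed seed, only so that the sketch is reproducible

# Each replication draws one sample and returns both of its summary statistics
sample_stats <- t(replicate(1000, {
  one_sample <- rexp(40, 0.2)
  c(mean = mean(one_sample), sd = sd(one_sample))
}))

dim(sample_stats)       # 1000 rows (samples), 2 columns (statistics)
colnames(sample_stats)  # "mean" and "sd"
```

The columns `sample_stats[, "mean"]` and `sample_stats[, "sd"]` can then be passed to hist() in the same way as mns and standard_devs.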
The theoretical mean of an exponential distribution is \(1/\lambda\). Here \(\lambda = 0.2\),
so the theoretical mean is \(1/0.2 = 5\).
hist(mns, main = "Distribution of 1000 averages")
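From the figure above, we can see that the distribution of the sample means
is centered around 5, the theoretical mean. This is in keeping
with the central limit theorem, under which the sample means are
approximately normally distributed around the population mean.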
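The visual impression can be backed up with a quick numerical check. The sketch below regenerates the simulated means so that it is self-contained; the seed and the names `sim_means` and `theoretical_mean` are assumptions, and the exact numbers will differ from those behind the figure above.

```r
set.seed(2022)  # assumed seed, only so that the sketch is reproducible

lambda    <- 0.2
sim_means <- replicate(1000, mean(rexp(40, lambda)))

theoretical_mean <- 1 / lambda  # equals 5 for lambda = 0.2

mean(sim_means)                          # centre of the simulated means
abs(mean(sim_means) - theoretical_mean)  # distance from the theoretical mean
```

A small difference between the two values is what we would expect if the histogram is indeed centered on the theoretical mean.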
The theoretical standard deviation of this distribution is also
\(1/\lambda\). With \(\lambda = 0.2\), the theoretical standard deviation
is \(1/0.2 = 5\).
hist(standard_devs, main = "Distribution of 1000 SDs")
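The sample standard deviation is a consistent estimator of the
population standard deviation, so the distribution of the sample standard
deviations is expected to be centered close to the population (here,
theoretical) standard deviation. Strictly speaking, this follows from the
consistency of the estimator rather than from the central limit theorem,
which concerns sample means. That is what we see here, with the
distribution centered approximately around 5 as calculated.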
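The central limit theorem also makes a prediction about the spread of the sample means themselves: their standard deviation (the standard error) should be close to \(\sigma/\sqrt{n}\), which here is \(5/\sqrt{40} \approx 0.79\). The following is a minimal sketch of that comparison; the seed and the names `sim_means` and `theoretical_se` are assumptions introduced for illustration.

```r
set.seed(2022)  # assumed seed, only so that the sketch is reproducible

n      <- 40
lambda <- 0.2
sim_means <- replicate(1000, mean(rexp(n, lambda)))

# Theoretical standard error of the mean: sigma / sqrt(n), about 0.79
theoretical_se <- (1 / lambda) / sqrt(n)

sd(sim_means)    # empirical spread of the simulated means
theoretical_se   # value predicted by the central limit theorem
```

Note that this is much smaller than 5, the standard deviation of the individual observations, because averaging 40 values reduces the variability.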
We now compare two distributions: the distribution of a single large sample of 1000 exponential draws, and the distribution of 1000 averages of 40 exponential draws each.
hist(rexp(1000, 0.2), main = "Histogram of one sample of size 1000")
hist(mns, main = "Distribution of 1000 averages")
From the figures, we can see that the distribution of a single large sample is not normal: it is strongly right-skewed, with most values close to zero and a long tail to the right. The distribution of the averages, however, is approximately normal and centered around the population mean, which further illustrates the central limit theorem.
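The contrast between the two histograms can also be summarised numerically. The sketch below is one possible check, not part of the original analysis: it compares the sample skewness of the raw draws with that of the averages, and computes the share of averages that fall within 1.96 standard errors of the population mean, which should be roughly 95% if the normal approximation is reasonable. The seed, the helper function `skewness` and all variable names are assumptions.

```r
set.seed(2022)  # assumed seed, only so that the sketch is reproducible

n      <- 40
lambda <- 0.2
mu     <- 1 / lambda    # population mean
se     <- mu / sqrt(n)  # standard error of the mean (sigma = 1 / lambda)

raw_sample <- rexp(1000, lambda)
sim_means  <- replicate(1000, mean(rexp(n, lambda)))

# Sample skewness: third central moment divided by the cubed standard deviation
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skewness(raw_sample)  # exponential distribution has theoretical skewness 2
skewness(sim_means)   # much closer to 0, the skewness of a normal distribution

# Share of sample means within 1.96 standard errors of the population mean
mean(abs(sim_means - mu) <= 1.96 * se)
```

With samples of size 40 the averages still carry a little right skew, so the approximation is good but not exact.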