Part 1:

Title: Simulation of Exponential Distribution

Overview

The objective of this exercise is to investigate the distribution of sample means drawn from the exponential distribution in R and compare it with what the Central Limit Theorem predicts. We compare the simulated mean, variance and distribution shape against their theoretical counterparts.

Set up the simulation

Before starting the simulation, we fix the parameters of the distribution: random seed = 1000, lambda = 0.2, n = 40 (observations per sample) and number of simulations = 1000.

The sample matrix, with one simulated sample of size n per row, is generated as follows.

# parameters of the simulation
lambda = 0.2
n = 40
n.sim = 1000

set.seed(1000)
x = rexp(n.sim * n, rate = lambda)
sim = matrix(data = x, nrow = n.sim, ncol = n)

After that, the sample mean of each simulation is calculated.

# calculate the mean of each sample (one sample per row)
sim.mean = apply(sim, 1, mean)
head(sim.mean)
## [1] 4.450697 6.105520 4.933228 5.329610 4.989080 7.080864
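
As a cross-check, the same row means can be obtained with `rowMeans()`, the vectorised base R counterpart of `apply(sim, 1, mean)`. The sketch below regenerates the matrix so that it runs on its own; the name `sim.mean.alt` is introduced here purely for illustration.

```r
# same parameters and seed as in the set-up above
lambda = 0.2
n = 40
n.sim = 1000

set.seed(1000)
sim = matrix(rexp(n.sim * n, rate = lambda), nrow = n.sim, ncol = n)

# row means computed two ways
sim.mean = apply(sim, 1, mean)
sim.mean.alt = rowMeans(sim)   # illustrative name, not in the original report

# TRUE if both approaches agree up to numerical tolerance
all.equal(sim.mean, sim.mean.alt)
```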

Compare simulation mean of sampling distribution with theoretical mean of distribution

Now compare the two means and visualise them with a histogram (Figure 1; code in Appendix A).

t.mean = 1/lambda
sampling.mean = mean(sim.mean)
## [1] "Theoretical mean: 5"
## [1] "Simulation mean: 4.99"
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

From the histogram of the sampling distribution, we can see that the distribution is approximately centered at 5. The reference lines further show that the mean of the sampling distribution is a good approximation of the theoretical mean.
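
The report does not show the code that printed the two means. A minimal sketch of one way to reproduce them, assuming the same seed and parameters, is given below. It also expresses the gap between the two means in units of the standard error of the simulated mean, sqrt(0.625 / 1000) = 0.025. The names `se.sampling.mean` and `diff.mean` are introduced here for illustration only.

```r
# same parameters and seed as in the set-up section
lambda = 0.2
n = 40
n.sim = 1000

set.seed(1000)
sim = matrix(rexp(n.sim * n, rate = lambda), nrow = n.sim, ncol = n)
sim.mean = apply(sim, 1, mean)

t.mean = 1/lambda
sampling.mean = mean(sim.mean)
print(paste("Theoretical mean:", t.mean))
print(paste("Simulation mean:", round(sampling.mean, 2)))

# illustrative names (not in the original report)
t.var = (1/lambda)^2/n                    # theoretical variance of one sample mean
se.sampling.mean = sqrt(t.var / n.sim)    # standard error of the mean of n.sim sample means
diff.mean = abs(sampling.mean - t.mean)
diff.mean / se.sampling.mean              # gap in units of standard error
```

With the reported simulation mean of 4.99, the gap of roughly 0.01 is smaller than one standard error.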

Compare simulation variance of sampling distribution with theoretical variance of distribution

Calculate the variance of the simulated sample means and the corresponding theoretical variance of the sample mean.

# theoretical variance of the sample mean: sigma^2 / n, with sigma = 1/lambda
t.var = (1/lambda)^2/n

sim.var = var(sim.mean)
## [1] "Theoretical variance: 0.625"
## [1] "Simulation variance: 0.6584"

Again, we can see that the simulated variance of the sample mean (0.6584) is a fairly good approximation of the theoretical variance (0.625).
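
To make the comparison concrete, the sketch below recomputes both variances and their ratio under the same seed and parameters. The name `var.ratio` is an assumption introduced for illustration. With the values reported above, the ratio is 0.6584 / 0.625, about 1.05, so the simulated variance is roughly 5% above the theoretical one.

```r
# same parameters and seed as in the set-up section
lambda = 0.2
n = 40
n.sim = 1000

set.seed(1000)
sim = matrix(rexp(n.sim * n, rate = lambda), nrow = n.sim, ncol = n)
sim.mean = apply(sim, 1, mean)

# theoretical variance of the sample mean: (1/lambda)^2 / n = 25 / 40
t.var = (1/lambda)^2/n
sim.var = var(sim.mean)
print(paste("Theoretical variance:", t.var))
print(paste("Simulation variance:", round(sim.var, 4)))

# illustrative name (not in the original report): values near 1 indicate close agreement
var.ratio = sim.var / t.var
var.ratio
```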

Show that the simulation distribution follows a normal distribution

Superimpose the theoretical distribution of the sample mean (a normal distribution with mean 1/lambda and variance (1/lambda)^2/n) and the simulated distribution of the sample mean in one plot (Figure 2) to compare their shapes.

From the above plot, we can see that the shapes of the two distributions are almost identical.

We can further check this result with a quantile-quantile plot (Figure 3).

From the quantile-quantile plot, we observe that the quantiles of the simulated distribution fit the theoretical normal quantiles well, especially for data points close to the center of the distribution.

Conclusion

From the above analysis, we conclude that the simulation agrees with the Central Limit Theorem (CLT): the sample means are approximately normally distributed even though the underlying population is exponential. However, several assumptions must hold for the CLT to apply.

The assumptions are as below:

  1. The observations in each sample are independent and identically distributed.
  2. The size of the sample is sufficiently large.
  3. The population has a finite variance.
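
Assumption 2 can be illustrated with a small experiment. The sketch below, which is not part of the original analysis, repeats the simulation for a few sample sizes and measures the skewness of the simulated sample means. The helper `skew` and the sample sizes 2, 10 and 40 are assumptions chosen for illustration. For exponential data the mean of n observations follows a gamma distribution with shape n, whose skewness is 2/sqrt(n), so the skewness should shrink towards 0 (the value for a normal distribution) as n grows.

```r
# hypothetical helper: moment-based sample skewness (not from the original report)
skew = function(x) {
  d = x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

lambda = 0.2
n.sim = 1000
sizes = c(2, 10, 40)   # illustrative sample sizes

set.seed(1000)
sim.skew = sapply(sizes, function(n) {
  # n.sim samples of size n, one per row, reduced to their means
  m = rowMeans(matrix(rexp(n.sim * n, rate = lambda), nrow = n.sim, ncol = n))
  skew(m)
})

# simulated skewness next to the theoretical value 2 / sqrt(n)
data.frame(n = sizes, simulated = sim.skew, theoretical = 2 / sqrt(sizes))
```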

Appendix A

  1. R code for Figure 1.
library(ggplot2)

# Plot the histogram of simulated sample
g = ggplot(data = as.data.frame(sim.mean)) + 
  geom_histogram(mapping = aes(x = sim.mean), fill = 'white', color = 'gray') + theme_bw()

# Add reference line to show the sampling mean and theoretical mean
g + geom_vline(aes(xintercept = sampling.mean, color = 'Simulated mean')) +
  geom_vline(aes(xintercept = t.mean, color = 'Theoretical mean')) + 
  scale_color_manual(name = '', values = c('Simulated mean'= 'black', 'Theoretical mean' = 'red')) + 
  labs(title = 'Figure 1\nHistogram of sampling distribution', x = 'Sampling mean', y = 'Frequency')
  2. R code for Figure 2.
ggplot() + geom_density(aes(x = sim.mean, color = 'Simulation')) + 
  stat_function(aes(x = c(2, 8), color = 'Theoretical'), fun = dnorm, args = list(mean = t.mean, sd = sqrt(t.var))) + 
  geom_vline(mapping = aes(xintercept = qnorm(.975, t.mean, sqrt(t.var)), linetype = '95% conf'), show.legend = FALSE) + 
  geom_vline(mapping = aes(xintercept = qnorm(.025, t.mean, sqrt(t.var)), linetype = '95% conf')) + 
  scale_color_manual(name = '', values = c('Simulation' = 'red', 'Theoretical' = 'black'))  + 
  scale_linetype_manual(name = '', values = c('95% conf' = 2)) + theme_bw() +
  labs(title = 'Figure 2\nDensity plot of Simulation Distribution and Theoretical Distribution', x = 'Mean', y = 'Density')
  3. R code for Figure 3.
# normal Q-Q plot of the simulated sample means, with reference line in red
qqnorm(sim.mean, main = 'Figure 3\nQuantile - Quantile Plot')
qqline(sim.mean, col = 2)