Sampling Distribution in R

In this part we investigate the exponential distribution in R and compare the behavior of its sample mean with what the Central Limit Theorem predicts. We run 1000 simulations to study the distribution of averages of 40 exponentials with rate 0.2.

First, we generate 1000 samples of 40 exponentials each and compute the mean of each sample.

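# Pre-allocate a vector for the 1000 sample means
sample_means <- rep(NA, 1000)
for (i in 1:1000) {
  # Mean of 40 draws from an exponential distribution with rate 0.2
  sample_means[i] <- mean(rexp(40, 0.2))
}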
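The loop above can also be written more compactly. The following is a minimal sketch, assuming a seed of 1 (the original does not set one, so the numbers will not match those printed in this article) and using the hypothetical names n_sims, n, lambda and sample_means_alt:

```r
# Assumed seed for reproducibility; not part of the original simulation
set.seed(1)

n_sims <- 1000   # number of simulated samples
n      <- 40     # exponentials per sample
lambda <- 0.2    # rate parameter of the exponential distribution

# replicate() evaluates the expression n_sims times and
# collects the results in a numeric vector
sample_means_alt <- replicate(n_sims, mean(rexp(n, rate = lambda)))

length(sample_means_alt)
```

Naming the three constants once makes it easier to rerun the simulation with a different sample size or rate.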

After that, we will compare the sample mean from the simulation to the theoretical mean of the distribution.

Here is how to compute the mean of the simulated sample means.

mean(sample_means)

## [1] 4.962613

And here we calculate the theoretical mean of the distribution, which is 1 / rate for an exponential distribution.

1 / 0.2

## [1] 5

From the calculation above, we can see that the mean from the simulation (4.963) is very close to the theoretical mean (5), a difference of less than 1%.

Now, we will compare the variance of the simulated sample means to the theoretical variance of the sample mean.

Here is how to compute the variance of the simulated sample means.

var(sample_means)

## [1] 0.6480597

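And here we calculate the theoretical variance of the sample mean. The exponential distribution has standard deviation 1 / rate, so the mean of 40 exponentials has variance (1 / rate)^2 / 40.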

((1/0.2)^2)/40

## [1] 0.625

From the calculation above, we can see that the variance from the simulation (0.648) is close to the theoretical variance (0.625), a difference of about 4%.
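The two comparisons can be collected in one small table. This is only a sketch: it reruns the simulation under an assumed seed, so its simulated values will differ from the ones printed in this article, and the names sims, theory, simulated and comparison are hypothetical.

```r
# Assumed seed for reproducibility; not part of the original simulation
set.seed(1)

lambda <- 0.2   # rate parameter
n      <- 40    # exponentials per sample

# 1000 simulated sample means
sims <- replicate(1000, mean(rexp(n, rate = lambda)))

# Theoretical mean and variance of the sample mean
theory <- c(mean = 1 / lambda, variance = (1 / lambda)^2 / n)

# The same two quantities estimated from the simulation
simulated <- c(mean = mean(sims), variance = var(sims))

# Stack both rows and add the relative difference to the theoretical values
comparison <- rbind(simulated, theory,
                    rel_diff = (simulated - theory) / theory)
round(comparison, 4)
```

The rel_diff row shows how far each simulated value is from its theoretical counterpart as a fraction of the theoretical value.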

Now, we will check whether the distribution of the simulated sample means is approximately normal.

We can check this visually by drawing a density histogram of the sample means and overlaying the estimated density line, as shown below:

hist(sample_means, main = "", xlab = "Sample Means", prob = TRUE, col = "darkred")
lines(density(sample_means), col = "darkblue", lwd = 2)
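The density line above is estimated from the simulated data itself. To compare against the normal distribution that the Central Limit Theorem predicts, one option is to overlay the theoretical normal curve instead. The sketch below reruns the simulation under an assumed seed and uses the hypothetical names sims, mu and se:

```r
# Assumed seed for reproducibility; not part of the original simulation
set.seed(1)

lambda <- 0.2   # rate parameter
n      <- 40    # exponentials per sample

# 1000 simulated sample means
sims <- replicate(1000, mean(rexp(n, rate = lambda)))

# Mean and standard error of the sample mean predicted by theory
mu <- 1 / lambda
se <- (1 / lambda) / sqrt(n)

# Density histogram of the simulated sample means
hist(sims, main = "", xlab = "Sample Means", prob = TRUE, col = "darkred")

# Normal density with the theoretical mean and standard error
curve(dnorm(x, mean = mu, sd = se), add = TRUE, col = "darkblue", lwd = 2)
```

If the histogram follows this curve closely, the simulated sample means are consistent with a normal distribution with mean 5 and variance 0.625.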

Alternatively, we can check normality with a normal Q-Q plot, as shown below:

qqnorm(sample_means, col = "darkred", main = "Normal Q-Q Plot")
qqline(sample_means, col = "darkblue", lwd = 3)

From the plot above, we can see that the distribution of the sample means is very close to normal.