Overview

The objective of this project is to investigate the exponential distribution in R and compare it with the Central Limit Theorem. To this end, I will run 1000 simulations to investigate the distribution of the means of 40 exponentials. By analysing those results, I hope to be able to answer the questions asked.

Exponential Distribution

The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. As required for this project, lambda is set to 0.2 in all simulations.
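As a quick sanity check on these two facts, the sketch below draws a large sample with rexp and compares its mean and standard deviation with 1/lambda. The number of draws (100000) and the seed are arbitrary choices made for illustration and are not part of the project.

```r
# Illustrative check: the mean and sd of Exp(lambda) should both be near 1/lambda.
# The seed and the number of draws are arbitrary assumptions.
set.seed(1)
lambda <- 0.2
draws <- rexp(100000, lambda)

check <- c(theoretical = 1 / lambda,
           empirical_mean = mean(draws),
           empirical_sd = sd(draws))
print(round(check, 2))
```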

Simulations

Starting with a simulation of 1000 observations from an exponential distribution with lambda = 0.2 (Appendix #1), we can see that it is right-skewed. The Central Limit Theorem states that the means of sufficiently large samples are approximately normally distributed, even if the original population is skewed.

Let us see if this holds by running 1000 simulations, each taking the mean of 40 draws from the exponential distribution with lambda = 0.2 (Appendix #1).
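The simulation can also be written without replicate, by drawing all 40000 values at once and arranging them as one row per simulation. This is only an alternative sketch, not the approach used in the appendix; the names draws, means_matrix and means_replicate are introduced here for illustration. Because the random draws are consumed in the same order, the two versions should agree when started from the same seed, which the last line checks.

```r
# Sketch: the same simulation, vectorised. One row per simulation of 40 draws.
set.seed(14)
lambda <- 0.2
sample_size <- 40
n_trials <- 1000

draws <- matrix(rexp(n_trials * sample_size, lambda),
                nrow = n_trials, byrow = TRUE)
means_matrix <- rowMeans(draws)

# Reference: the replicate() form, restarted from the same seed.
set.seed(14)
means_replicate <- replicate(n_trials, mean(rexp(sample_size, lambda)))
print(all.equal(means_matrix, means_replicate))
```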

From the analysis of this histogram, we can see the CLT in action: because each plotted value is the mean of 40 observations, the resulting histogram is approximately normal, and the large number of simulations (1000) makes that shape clearly visible. A QQ plot (Appendix #2) lets us check the normality further.

As we can see, the qqline is a good fit to the qqplot, which supports the normality of the sample means.
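Another way to look at the same question is to overlay the normal density predicted by the CLT, with mean 1/lambda and standard deviation (1/lambda)/sqrt(40), on a density-scaled histogram of the means. The sketch below is an illustration only; the number of breaks and the colours are arbitrary choices, and the simulation is repeated so that the block runs on its own.

```r
# Sketch: compare the histogram of simulated means with the CLT normal density.
# Parameters mirror the report; breaks and colours are arbitrary assumptions.
set.seed(14)
lambda <- 0.2
sample_size <- 40
n_trials <- 1000
means <- replicate(n_trials, mean(rexp(sample_size, lambda)))

# Density-scaled histogram so the bars and the curve share a y-axis.
h <- hist(means, freq = FALSE, breaks = 30, col = "lightblue",
          main = "Sample means vs. normal density", xlab = "Mean")
curve(dnorm(x, mean = 1 / lambda, sd = (1 / lambda) / sqrt(sample_size)),
      add = TRUE, col = "red", lwd = 2)
```

If the bars follow the red curve closely, with at most a slight lean to the right, that agrees with what the QQ plot suggests.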

Sample Mean vs Theoretical Mean

According to the data given to us by Prof. Brian, the mean of the exponential distribution is 1/lambda. As such, the theoretical mean of the distribution can be computed directly (Appendix #3) and is 5. The sample mean is the mean of the means vector, which contains the results of all the simulations (Appendix #3).

As shown, the sample mean (5.014) is very close to the theoretical mean (5). This is expected: according to the CLT, the distribution of the sample means is centred on the true mean of the population, so the average of a large number of sample means will be close to it. The plot in Appendix #4 illustrates this, with the sample mean colored green and the theoretical mean colored red.
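To put a number on how close the two means are, the gap between them can be compared with the standard error of an average of 1000 simulated means. The sketch below plugs in the values reported in Appendix #3. The standard error formula sqrt((sigma^2/n)/n_trials) assumes the 1000 simulations are independent, and the names se and gap are introduced here for illustration.

```r
# Worked arithmetic using the values reported in Appendix #3.
lambda <- 0.2
sample_size <- 40
n_trials <- 1000
theoretical_mean <- 1 / lambda   # 5
sample_mean <- 5.013765          # value reported in Appendix #3

# Standard error of the average of n_trials independent sample means.
se <- sqrt((1 / lambda)^2 / sample_size / n_trials)
gap <- sample_mean - theoretical_mean
print(c(se = se, gap = gap, gap_in_se = gap / se))
```

Here the gap is about 0.014 against a standard error of 0.025, so the two means differ by roughly half a standard error.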

Sample Variance vs Theoretical Variance

According to Prof. Brian, the standard deviation of this distribution is also 1/lambda. The variance of a distribution is the standard deviation squared, so: (Appendix #5)

The theoretical variance of the sample mean, however, has to be divided by the sample size (40), as the resulting normal distribution is N(mu, sigma^2/n). (Appendix #5)

The sample variance is calculated from: (Appendix #5)

As we can see, and as expected, the two variance values (0.625 and 0.633) are very close to one another: according to the CLT, the variance of the sample means should be near sigma^2/n, and the large number of simulations makes the estimate stable.
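The sigma^2/n relationship can be illustrated further by repeating the simulation for several sample sizes and comparing the variance of the simulated means with (1/lambda)^2/n. In the sketch below, the extra sample sizes (10 and 160) and the seed are my own choices for illustration; only n = 40 is part of the project.

```r
# Sketch: the variance of the sample mean shrinks like sigma^2 / n.
# The extra sample sizes (10 and 160) and the seed are illustrative assumptions.
set.seed(14)
lambda <- 0.2
n_trials <- 1000
sizes <- c(10, 40, 160)

simulated <- sapply(sizes, function(n) {
  var(replicate(n_trials, mean(rexp(n, lambda))))
})
comparison <- data.frame(n = sizes,
                         theoretical = (1 / lambda)^2 / sizes,
                         simulated = simulated)
print(comparison)
```

Each fourfold increase in the sample size should cut the variance of the means by about a factor of four.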

Show the Distribution is Approximately Normal

This was already shown above: the histogram of the simulated means looks approximately normal and not skewed, and the qqline fits the qqplot quite well. Together, these indicate that the distribution of the means of 40 exponentials is approximately normal.

Appendix

Appendix #1

n_trials <- 1000
sample_size <- 40
lambda <- 0.2
set.seed(14)

par(mfrow=c(1,2))

hist(rexp(1000, lambda), col="red",
     main="Histogram of 1000 obs", 
     xlab="Exponential Distribution Result")

set.seed(14)

means <- replicate(n_trials, mean(rexp(sample_size, lambda)))
hist(means, 
     main="Histogram Averages of Exp Dist", 
     xlab="Means", col="blue")

Appendix #2

qqnorm(means, col="red")
qqline(means, col="blue")

Appendix #3

theoretical_mean <- 1 / lambda
theoretical_mean

## [1] 5

sample_mean <- mean(means)
sample_mean

## [1] 5.013765

Appendix #4

hist(means, 
     main="Histogram of the averages of the Exponential Distribution", 
     xlab="Mean", col="blue")
abline(v = theoretical_mean, col="red", lwd=3)
abline(v = sample_mean, col="green", lwd=3)

Appendix #5

# Named theoretical_sd to avoid masking the base function sd()
theoretical_sd <- 1/lambda
theoretical_variance <- theoretical_sd^2
theoretical_variance

## [1] 25

# Variance of the mean of sample_size observations: sigma^2 / n
theoretical_variance <- theoretical_variance/sample_size
theoretical_variance

## [1] 0.625

sample_variance <- var(means)
sample_variance

## [1] 0.6333924

Statistical Inference Project Part 1

José Pereira