Part 1: Simulation Exercise

Overview

In this project we investigate the exponential distribution in R and compare the distribution of its sample means with what the Central Limit Theorem predicts. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda. For this project:

1) We take lambda = 0.2 for all of the simulations.
2) We investigate the distribution of averages of 40 exponentials.
3) We do a thousand simulations.
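
As a quick orientation, the theoretical quantities implied by these settings can be worked out directly. The sketch below is only illustrative; the variable names (theo_mean, theo_sd, se_mean, var_mean) are chosen here and are not part of the assignment.

```r
lambda_val <- 0.2   # rate parameter from the assignment
n_val <- 40         # number of exponentials averaged per simulation

theo_mean <- 1 / lambda_val            # mean of a single exponential
theo_sd   <- 1 / lambda_val            # standard deviation of a single exponential
se_mean   <- theo_sd / sqrt(n_val)     # standard deviation of the average of n_val exponentials
var_mean  <- se_mean^2                 # variance of that average, 1 / (lambda^2 * n)

c(theo_mean = theo_mean, theo_sd = theo_sd, se_mean = se_mean, var_mean = var_mean)
```

The last two values are what the Central Limit Theorem comparison relies on: the average of 40 exponentials should have mean 5 and variance 0.625.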

Simulations

We first load the required libraries, set a seed for reproducibility, and then declare the variables provided in the assignment.

library(ggplot2)
set.seed(2020)
n<-40
simulation<-1000
lambda<-0.2

The code below performs 1000 simulations, each drawing 40 exponentially distributed values with lambda = 0.2, and stores the average of each simulation.

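# Preallocate a numeric vector to hold the mean of each simulation
sample_mean <- numeric(simulation)

for (i in 1:simulation)
{
  # Draw n exponentials and store their average
  # (named draws to avoid masking base::sample)
  draws <- rexp(n, lambda)
  sample_mean[i] <- mean(draws)
}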
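
The loop above can also be written without explicit indexing. The following is a minimal sketch of one vectorized alternative, assuming the same settings; the names draws_matrix and sample_mean_vec are introduced here for illustration only. It draws all values at once, arranges them as one row per simulation, and averages each row.

```r
set.seed(2020)
n <- 40
simulation <- 1000
lambda <- 0.2

# One row per simulation, n exponential draws per row
draws_matrix <- matrix(rexp(n * simulation, lambda),
                       nrow = simulation, ncol = n, byrow = TRUE)

# Average each row to get one sample mean per simulation
sample_mean_vec <- rowMeans(draws_matrix)

length(sample_mean_vec)
summary(sample_mean_vec)
```

For 1000 simulations either form is fast; the matrix version mainly makes the "1000 rows of 40 draws" structure explicit.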

Sample Parameters vs Population Parameters

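Here we compute the mean and variance of the 1000 simulated averages and compare them with their theoretical values. The theoretical mean of the average is 1/lambda = 5, and its theoretical variance is 1/(lambda^2 * n) = 0.625, that is, the population variance 1/lambda^2 divided by the sample size n.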

xbar<-mean(sample_mean)
pop_mean<-1/lambda
pop_var<-1/(lambda^2*n)
sample_var<-var(sample_mean)
xbar
## [1] 5.033948
pop_mean
## [1] 5
sample_var
## [1] 0.6070127
pop_var
## [1] 0.625
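
To put the comparison on a common scale, the reported values can be expressed as relative differences and as a distance in standard errors. This is a small sketch using the numbers printed above; the helper rel_diff and the variable names are hypothetical and introduced only for this illustration.

```r
# Values copied from the output reported above
xbar_reported       <- 5.033948
sample_var_reported <- 0.6070127
mean_theory         <- 5
var_theory          <- 0.625
n_sims              <- 1000

# Relative difference between an observed and an expected value
rel_diff <- function(observed, expected) (observed - expected) / expected

pct_mean <- round(100 * rel_diff(xbar_reported, mean_theory), 2)        # 0.68 (percent)
pct_var  <- round(100 * rel_diff(sample_var_reported, var_theory), 2)   # -2.88 (percent)

# Distance of the average of the 1000 means from 5, in standard errors
z_mean <- (xbar_reported - mean_theory) / sqrt(var_theory / n_sims)     # about 1.36

c(pct_mean = pct_mean, pct_var = pct_var, z_mean = z_mean)
```

Both differences are small, and the average of the simulated means lies within about 1.4 standard errors of the theoretical mean.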

Plotting the Distribution of Sample Mean

We plot a histogram of the 1000 sample means, with vertical lines marking their average and the theoretical mean.

data<-data.frame(sample_mean)
# Map x to the sample_mean column (not the whole data frame) and plot densities rather than counts
distb_plot<-ggplot(data)+stat_bin(aes(x=sample_mean,y=after_stat(density)),fill="salmon")
distb_plot<- distb_plot+ geom_vline(aes(xintercept = xbar, colour = "sample"))
distb_plot<- distb_plot + geom_vline(aes(xintercept = pop_mean, colour = "theoretical"))
distb_plot<-distb_plot + labs(title="Distribution of Sample Mean",x="Sample Means",y="Density")
distb_plot
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We see that the average of the sample means is 5.033948, while the theoretical mean is 5. As seen in the histogram, the average of the simulated means (red vertical line) is very close to the theoretical mean of the exponential distribution (blue vertical line).

The variance of the sample means is 0.6070127, which is also very close to the theoretical variance of 0.625.

Examining Normality of Distribution of Sample Mean

Here we show that, in line with the Central Limit Theorem, the distribution of the sample mean of exponential random variables is approximately normal, with mean 1/lambda and variance 1/(lambda^2 * n).

# Normal density using the simulated mean and variance (blue)
distb_plot<- distb_plot + stat_function(fun = dnorm, args = list(mean = xbar, sd = sqrt(sample_var)), color = "blue", size = 2.0)
# Normal density using the theoretical mean and variance (green)
distb_plot <- distb_plot + stat_function(fun = dnorm, args = list(mean = pop_mean, sd = sqrt(pop_var)), colour = "green", size = 2.0)
distb_plot
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The plot above shows that the normal density fitted to the sample means (blue curve) almost coincides with the theoretical normal density with mean 5 and variance 0.625 (green curve), and both follow the histogram closely. This indicates that the distribution of the sample mean is approximately normal.

Moreover, the closeness of the sample and theoretical parameters supports this conclusion.
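
A visual overlay can be complemented with a simple numerical check. The sketch below is an additional illustration, not part of the original analysis: it re-simulates the means under the same settings, standardizes them with the theoretical mean and standard error, and computes the share falling within ±1.96, which would be about 0.95 for a normal distribution. It also draws a normal Q-Q plot with base R. The names means_check, z_scores and coverage are chosen here.

```r
set.seed(2020)
lambda <- 0.2
n <- 40
simulation <- 1000

# Re-simulate the 1000 averages of 40 exponentials
means_check <- replicate(simulation, mean(rexp(n, lambda)))

# Standardize with the theoretical mean and standard error
z_scores <- (means_check - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Share of standardized means within +/- 1.96 (about 0.95 under normality)
coverage <- mean(abs(z_scores) <= 1.96)
coverage

# Points near the reference line suggest approximate normality
qqnorm(z_scores)
qqline(z_scores)
```

Some departure from the line in the upper tail of the Q-Q plot would not be surprising, since the average of 40 exponentials is still slightly right-skewed.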

Conclusion

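We find that, after running 1000 simulations of the average of 40 exponential random variables, the distribution of the sample mean is approximately normal, centered near the theoretical mean of 5 with a variance close to the theoretical value of 0.625.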