This report is the concluding study on Statistical Inference, Part A. It simulates samples from an exponential distribution and examines the distribution of their averages with reference to the Central Limit Theorem, including the mean and variance of those averages.

Let's look at the distribution of averages of 40 exponentials, with rate lambda = 0.2. Theoretically, the mean of an exponential distribution is 1/lambda, and its standard deviation is also 1/lambda.
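Both theoretical values can be checked numerically before any simulation. The sketch below assumes the rate parameterization of R's `dexp(x, rate)`, whose density is lambda * exp(-lambda * x). It integrates that density with base R's `integrate()`. The variable names (`rate`, `mean_exp`, `sd_exp`) are illustrative only.

```r
rate <- 0.2  # lambda, the rate parameter of the exponential distribution

# Mean: E[X] = integral of x * f(x) over [0, Inf)
mean_exp <- integrate(function(x) x * dexp(x, rate), 0, Inf)$value

# Variance: E[(X - mu)^2]; its square root is the standard deviation
var_exp <- integrate(function(x) (x - mean_exp)^2 * dexp(x, rate), 0, Inf)$value
sd_exp <- sqrt(var_exp)

# Both should agree with 1/lambda = 5
c(mean = mean_exp, sd = sd_exp, one_over_lambda = 1 / rate)
```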

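# Parameters: sample size and rate of the exponential distribution
n <- 40
lambda <- 0.2
# One sample of 40 exponentials and its histogram
expdata <- rexp(n, lambda)
hist(expdata)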

Simulating 1000 averages of samples of 40 exponentials:

set.seed(123)  # for reproducibility
# Each element of mns is the mean of one sample of n = 40 draws from Exp(lambda = 0.2)
mns <- replicate(1000, mean(rexp(n, lambda)))
hist(mns)
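The same kind of simulation can also be written in vectorized form. All 1000 x 40 values are drawn at once and arranged as a matrix with one sample per row. Each row is then averaged with `rowMeans()`. This is a sketch with illustrative names (`nsim`, `draws`, `sim_means`). It is not claimed to reproduce the exact numbers reported below.

```r
n <- 40; lambda <- 0.2; nsim <- 1000
set.seed(123)

# Each row holds one sample of n exponentials; byrow = TRUE keeps
# consecutive draws together in the same row
draws <- matrix(rexp(nsim * n, lambda), nrow = nsim, byrow = TRUE)

# One average per simulated sample; pass sim_means to hist() to plot it
sim_means <- rowMeans(draws)
summary(sim_means)
```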

Observed mean and variance of the 1000 averages of 40 exponentials:

ObservedMean <- mean(mns)
ObservedVar <- var(mns)
ObservedMean
## [1] 5.011911
ObservedVar
## [1] 0.6004928

Compare the observed mean and variance with their theoretical values.

The theoretical mean is 1/lambda = 1/0.2 = 5, and the theoretical variance of the average of n = 40 exponentials is (1/lambda)^2/n = 25/40 = 0.625.
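Both formulas follow from the linearity of expectation and the independence of the n draws. Each draw has mean mu = 1/lambda and variance sigma^2 = (1/lambda)^2 = 25. The variance of a sum of independent variables is the sum of their variances:

```latex
E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu = \frac{1}{\lambda} = 5

\operatorname{Var}(\bar{X})
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{25}{40} = 0.625
```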

theoreticalMean <- 1/lambda          # mean of Exp(lambda)
theoreticalVar <- (1/lambda)^2/n     # variance of the mean of n draws
theoreticalMean
## [1] 5
theoreticalVar
## [1] 0.625
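To put the comparison in relative terms, the sketch below computes the percentage difference between each observed value and its theoretical counterpart. It reuses the numbers printed above as literals, so it does not depend on re-running the simulation. The names `observed`, `theoretical` and `pct_diff` are illustrative.

```r
# Values printed above for the seeded simulation and from theory
observed    <- c(mean = 5.011911, var = 0.6004928)
theoretical <- c(mean = 5,        var = 0.625)

# Percentage difference of each observed value from its theoretical value
pct_diff <- round(100 * (observed - theoretical) / theoretical, 2)
pct_diff  # mean: 0.24, var: -3.92
```

The mean is within about a quarter of a percent of 5. The variance is about 4% below 0.625, a gap of roughly the size one would expect from run-to-run variation when a variance is estimated from 1000 simulated averages.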

These results are consistent with the Central Limit Theorem (CLT). For a sufficiently large sample size n drawn from a population with mean mu and finite variance sigma^2, the CLT states that the distribution of the sample mean is approximately normal, centered at mu, with variance sigma^2/n. Here mu = 5 and sigma^2/n = 25/40 = 0.625. These closely match the observed mean 5.011911 and variance 0.6004928 of the 1000 simulated averages.
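The CLT also makes a checkable prediction about spread. If the averages are approximately N(mu, sigma^2/n), then about 95% of them should fall within 1.96 standard errors of mu. The sketch below re-runs the seeded simulation under the illustrative name `sim_means`, standardizes the averages, and measures that share. The 1.96 cutoff and all variable names are choices made for this illustration.

```r
set.seed(123)
n <- 40; lambda <- 0.2
sim_means <- replicate(1000, mean(rexp(n, lambda)))

mu <- 1 / lambda              # population mean
se <- (1 / lambda) / sqrt(n)  # CLT standard error of the sample mean

# Standardize each average; under the CLT these are approximately N(0, 1)
z <- (sim_means - mu) / se

# Share of standardized averages within +/- 1.96 (about 0.95 for N(0, 1))
coverage <- mean(abs(z) <= 1.96)
coverage
```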

library(ggplot2)
ggplot(data.frame(mns), aes(x = mns)) + 
  geom_histogram(bins = 50, colour = "black", fill = "green") + 
  geom_vline(xintercept = theoreticalMean, colour = "red") +   # theoretical mean (5)
  geom_vline(xintercept = ObservedMean, colour = "blue") +     # observed mean of the 1000 averages
  ggtitle("Distribution of 1000 means of 40 exponentials") + 
  xlab("Means") + 
  ylab("Frequency")

Checking the normality of the distribution of the sample averages:

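df <- data.frame(mns)
ggplot(df, aes(x = mns)) +
  # histogram scaled to density so it is comparable with the normal curves
  geom_histogram(aes(y = after_stat(density)), bins = 50, colour = "black", fill = "grey") +
  labs(title = "Distribution of means of exponential distribution", x = "Means", y = "Density") +
  # blue: normal density with the theoretical mean and variance
  stat_function(fun = dnorm, args = list(mean = theoreticalMean, sd = sqrt(theoreticalVar)), color = "blue", linewidth = 1.0) +
  # green: normal density with the observed mean and variance
  stat_function(fun = dnorm, args = list(mean = ObservedMean, sd = sqrt(ObservedVar)), color = "green", linewidth = 1.0)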

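# Normal Q-Q plot: points close to the reference line indicate approximate normality
qqnorm(mns)
qqline(mns, col = 5)   # reference line through the first and third quartiles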
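To show what the Q-Q plot contains, the sketch below recomputes its points and reference line by hand. It assumes base R's defaults. `qqnorm()` plots the sorted data against `qnorm(ppoints(n))`. `qqline()` draws a line through the first and third quartiles, using `quantile()` type 7. The correlation of the Q-Q points is added only as a rough one-number summary of straightness and is not part of the original analysis. `sim_means` is a hypothetical re-run of the seeded simulation.

```r
set.seed(123)
sim_means <- replicate(1000, mean(rexp(40, 0.2)))
m <- length(sim_means)

# x-coordinates: standard normal quantiles at the plotting positions ppoints(m)
theo <- qnorm(ppoints(m))
# y-coordinates: the simulated averages in increasing order
samp <- sort(sim_means)

# Reference line through the first and third quartiles
probs <- c(0.25, 0.75)
yq <- quantile(sim_means, probs, names = FALSE)  # sample quartiles (type 7)
xq <- qnorm(probs)                               # normal quartiles
slope <- diff(yq) / diff(xq)
intercept <- yq[1] - slope * xq[1]

# Correlation of the Q-Q points; values near 1 indicate a nearly straight plot
qq_cor <- cor(theo, samp)
c(intercept = intercept, slope = slope, qq_cor = qq_cor)
```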

Conclusion:

Simulating a large number of averages of moderately sized samples (n = 40) from an exponential distribution gives results consistent with the CLT. The mean and variance of the 1000 simulated averages (5.012 and 0.600) are close to their theoretical values (5 and 0.625). The density overlay and the normal Q-Q plot both show that the distribution of the averages is approximately normal.