This report is the concluding study on Statistical Inference, Part A. It simulates samples from an exponential distribution and examines the distribution of their averages with reference to the Central Limit Theorem, the variance, and related quantities.
Let's look at the distribution of averages of 40 exponentials, with lambda = 0.2. Theoretically, the mean of an exponential distribution is 1/lambda, and its standard deviation is also 1/lambda.
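# Sample size and rate parameter used throughout
n <- 40
lambda <- 0.2
# Seed the generator so this single sample is reproducible
set.seed(123)
expdata <- rexp(n, lambda)
hist(expdata)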
# Collect the means of 1000 simulated samples of n = 40 exponentials each
mns <- NULL
set.seed(123)
for (i in 1:1000) mns <- c(mns, mean(rexp(n, lambda)))
hist(mns)
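The loop above can also be written without growing a vector one element at a time. The following is a minimal sketch, assuming the same sample size, rate, and number of simulations as in the text; the names `n_sims` and `sim_means` are hypothetical and do not appear in the original analysis.

```r
# Values taken from the text: 40 draws per sample, rate 0.2, 1000 samples.
n <- 40
lambda <- 0.2
n_sims <- 1000  # hypothetical name for the number of simulated samples

set.seed(123)
# Each replicate draws one sample of n exponentials and returns its mean.
sim_means <- replicate(n_sims, mean(rexp(n, lambda)))

length(sim_means)
summary(sim_means)
```

`replicate()` allocates the result in one step, which matters little at 1000 simulations but scales better than repeated `c()` calls for larger runs.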
Observed mean and variance of the 1000 averages, each taken over 40 exponential draws:
ObservedMean <- mean(mns)
ObservedVar <- var(mns)
ObservedMean
## [1] 5.011911
ObservedVar
## [1] 0.6004928
The theoretical mean is 1/lambda = 1/0.2 = 5. The theoretical variance of the mean of n = 40 draws is (1/lambda)^2/n = (1/0.2)^2/40 = 0.625.
theoreticalMean <- 1/lambda
theoreticalVar <- (1/lambda)^2/40
theoreticalMean
## [1] 5
theoreticalVar
## [1] 0.625
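The theoretical variance used here follows from the independence of the draws. A short worked derivation, assuming the 40 draws in each sample are independent and identically distributed; the symbols \bar{X}_n and \sigma are notation introduced for this sketch only:

```latex
\operatorname{Var}(\bar{X}_n)
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{(1/0.2)^2}{40}
  = \frac{25}{40}
  = 0.625
```

The corresponding standard deviation of the sample mean is \sqrt{0.625} \approx 0.79.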
These results are consistent with the CLT. The central limit theorem (CLT) states that, for a sufficiently large sample size drawn from a population with finite variance, the distribution of the sample mean is approximately normal, centered at the population mean, with variance approximately equal to the population variance divided by the sample size. Here the observed mean of the averages (5.011911) is close to the population mean (5), and the observed variance of the averages (0.6004928) is close to the theoretical value (0.625).
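One way to check the normal approximation numerically is to standardize the simulated means and see how often they fall inside the central 95% interval of a standard normal. This is a rough sketch under the same settings as the text (40 draws, rate 0.2, 1000 samples); the names `sim_means`, `z`, and `coverage` are hypothetical.

```r
n <- 40
lambda <- 0.2
mu <- 1 / lambda     # population mean of the exponential
sigma <- 1 / lambda  # population standard deviation of the exponential

set.seed(123)
sim_means <- replicate(1000, mean(rexp(n, lambda)))

# Standardize each sample mean using the theoretical mean and standard error.
z <- (sim_means - mu) / (sigma / sqrt(n))

# Share of standardized means within the central 95% of N(0, 1).
coverage <- mean(abs(z) <= qnorm(0.975))
coverage
```

If the approximation is reasonable, `coverage` should land near 0.95, with `mean(z)` near 0 and `sd(z)` near 1.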
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.5
# Histogram of the simulated means with the theoretical (red) and observed (blue) means
ggplot() +
aes(mns) +
geom_histogram(bins=50, colour="black", fill="green") +
geom_vline(xintercept = theoreticalMean, colour = "red") +
geom_vline(xintercept = ObservedMean, colour = "blue") +
ggtitle("Distribution of 1000 means of exponential distribution") +
xlab("Means") +
ylab("Frequency")
# Density histogram overlaid with the theoretical (blue) and observed (green) normal curves
df <- data.frame(mns)
ggplot(df,aes(x = mns)) +
geom_histogram(aes(y=..density..), bins=50, colour="black",fill="grey") +
labs(title="Distribution of Means of exponential distribution", x="Means", y="Density") +
stat_function(fun=dnorm,args=list( mean=1/lambda, sd=sqrt(theoreticalVar)), color = "blue", size = 1.0) +
stat_function(fun=dnorm,args=list( mean=mean(mns), sd=sqrt(ObservedVar)), color = "green", size = 1.0)
qqnorm(mns)
qqline(mns , col =5)
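The Q-Q plot can be complemented with a numeric look at asymmetry. The mean of n independent exponentials follows a gamma distribution with skewness 2/sqrt(n), so some right skew is expected at n = 40 even though the normal approximation is already reasonable. The sketch below reuses the settings from the text; `sim_means`, `sample_skew`, and `theoretical_skew` are hypothetical names.

```r
n <- 40
lambda <- 0.2

set.seed(123)
sim_means <- replicate(1000, mean(rexp(n, lambda)))

# Moment-based sample skewness: third central moment over variance^(3/2).
centered <- sim_means - mean(sim_means)
sample_skew <- mean(centered^3) / mean(centered^2)^1.5

# Skewness of the mean of n exponentials (gamma with shape n).
theoretical_skew <- 2 / sqrt(n)

c(sample = sample_skew, theoretical = theoretical_skew)
```

A skewness of exactly 0 would indicate a symmetric distribution; a small positive value here would match a slight upward bend at the right end of the Q-Q plot.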
Simulating a large number of averages of moderately sized samples from an exponential distribution gives results consistent with the CLT. The observed mean (5.011911) and variance (0.6004928) of the averages are close to their respective theoretical values (5 and 0.625). The distribution of the averages is also approximately normal, as the density overlay and the Q-Q plot indicate.