Overview

In this project we investigate the exponential distribution in R and compare the behaviour of its sample means with what the Central Limit Theorem predicts. The main tasks are:

1. Show the sample mean and compare it to the theoretical mean of the distribution.
2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
3. Show that the distribution is approximately normal.

Simulations

Loading libraries and setting parameters

suppressWarnings(library(ggplot2))
suppressWarnings(library(knitr))
set.seed(12345)
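# Data as per assignment
lambda <- 0.2    ## rate parameter of the exponential distribution
nexp <- 40       ## number of exponentials per sample
numsim <- 1000   ## number of simulations
mns <- NULL      ## set mns (means of simulations) to NULL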

Running 1000 simulations with a for loop and calculating the mean of each rexp sample

for (i in 1 : numsim) mns <- c(mns, mean(rexp(nexp,lambda)))
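
The loop above grows `mns` one element at a time. As a minimal sketch of an alternative, assuming the same parameters, the draws can be placed in a matrix with one row per simulation and averaged with `rowMeans`. The names `draws` and `sim_means` are illustrative and not part of the original analysis.

```r
# Illustrative alternative to the loop; parameters mirror the ones set above
set.seed(12345)
lambda <- 0.2
nexp <- 40
numsim <- 1000

# One row per simulation, nexp exponential draws per row
draws <- matrix(rexp(nexp * numsim, lambda), nrow = numsim, byrow = TRUE)
sim_means <- rowMeans(draws)  # mean of each simulated sample

length(sim_means)  # one mean per simulation
```

Preallocating the draws this way avoids repeatedly copying the result vector, which matters more as the number of simulations grows.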

Sample Mean versus Theoretical Mean

sample_mean<-mean(mns)  ## Sample mean
theoretical_mean <- 1/lambda  ## Theoretical mean
Mean_data <- data.frame("Mean" = c(sample_mean, theoretical_mean), row.names = c("Mean from the sample data", "Theoretical mean"))
kable(x = round(Mean_data, 3), align = NULL, padding = 10, caption = "Comparison of Sample and Theoretical Mean")

Comparison of Sample and Theoretical Mean

                             Mean
Mean from the sample data    4.972
Theoretical mean             5.000

The following histogram shows the distribution of the simulated sample means.

hist(mns, breaks=40, freq=FALSE, main="Distribution of sample means\n(exponential distribution)", xlab="Sample mean")
lines(density(mns))
abline(v=theoretical_mean, col="red")  # theoretical mean line

As we can see, the sample mean and the theoretical mean are almost the same.

Sample Variance versus Theoretical Variance

sample_var<-var(mns) ## Sample variance
theoritical_var <-(1/lambda)^2/nexp  ## Theoretical variance
Variance_data <- data.frame("Variance" = c(sample_var, theoritical_var), row.names = c("Variance from sample data", "Theoretical variance"))
kable(x = round(Variance_data, 3), align = NULL, padding = 10, caption = "Comparison of Sample and Theoretical Variance")

Comparison of Sample and Theoretical Variance

                             Variance
Variance from sample data    0.595
Theoretical variance         0.625

As we can see, the sample and theoretical variances are also close.
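
The theoretical value used above follows from the variance of a mean of n independent observations, each with standard deviation σ = 1/λ. As a worked check with the parameters of this report (λ = 0.2, n = 40):

```latex
\[
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{(1/\lambda)^2}{n} = \frac{(1/0.2)^2}{40} = \frac{25}{40} = 0.625
\]
```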

Distribution

According to the central limit theorem (CLT), the averages of samples approximately follow a normal distribution. We will perform two checks.

1. Confidence interval

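# Note: sd(mns) is already the standard deviation of means of nexp observations,
# so dividing by sqrt(nexp) again gives a narrower band than a 95% interval
# for a single sample mean. Both intervals below use the same scaling,
# so they remain comparable with each other.
se_sample <- sd(mns)/sqrt(nexp)
lower_sample <- mean(mns) - 1.96 * se_sample
upper_sample <- mean(mns) + 1.96 * se_sample
sample_conf<-c(lower_sample, upper_sample)

# Theoretical interval, centred on the sample mean like the one above
se_theoritical<-sqrt(theoritical_var)/sqrt(nexp)
lower_theory <- mean(mns) - 1.96 * se_theoritical
upper_theory <- mean(mns) + 1.96 * se_theoritical
theory_conf<-c(lower_theory, upper_theory)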

conf_data <- data.frame("Confidence Interval Data" = rbind(sample_conf, theory_conf), row.names = c("Sample confidence interval data", "Theoretical confidence interval data"))
colnames(conf_data) <- c("Limit 1", "Limit 2")

kable(x = round(conf_data, 3), align = NULL, padding = 10, caption = "95% confidence interval data for sample and theory")

95% confidence interval data for sample and theory

                                        Limit 1    Limit 2
Sample confidence interval data         4.733      5.211
Theoretical confidence interval data    4.727      5.217

From the above we can see that the two sets of limits are very close.
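
As a complementary check, sketched here under the assumption that the simulation is rerun with the same parameters, one can count how many of the simulated means fall within 1.96 theoretical standard errors of the theoretical mean. If the normal approximation holds, this fraction should be near 0.95. The names `sim_means`, `se_mean` and `coverage` are illustrative and not part of the original analysis.

```r
# Illustrative coverage check; parameters mirror the ones set earlier
set.seed(12345)
lambda <- 0.2
nexp <- 40
numsim <- 1000
sim_means <- replicate(numsim, mean(rexp(nexp, lambda)))

# Standard error of a mean of nexp exponentials: (1/lambda) / sqrt(nexp)
se_mean <- (1 / lambda) / sqrt(nexp)

# Fraction of simulated means within 1.96 standard errors of 1/lambda
coverage <- mean(abs(sim_means - 1 / lambda) <= 1.96 * se_mean)
coverage
```
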
2. Visual histogram

plot1 <- ggplot(data.frame(mns), aes(x = mns))
plot1 <- plot1 + geom_histogram(aes(y=..density..), colour="black", fill="blue")
plot1 <- plot1 + labs(title="Distribution of Means\nBlack line: normal curve from the sample mean and variance\nRed line: normal curve from the theoretical mean and variance", y="Density")
# Normal density with the theoretical mean and standard deviation
plot1 <- plot1 + stat_function(fun=dnorm, args=list(mean=1/lambda, sd=sqrt(theoritical_var)), color = "red", size = 1.0)
# Normal density with the mean and standard deviation of the simulated means
plot1 <- plot1 + stat_function(fun=dnorm, args=list(mean=mean(mns), sd=sqrt(sample_var)), color = "black", size = 1.0)
print(plot1)

From the histogram we can infer that the distribution of the sample means is approximately normal, since it closely matches both normal curves.
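
A normal Q-Q plot is another common way to judge normality. The sketch below assumes the same simulation settings; `sim_means` and `qq` are illustrative names, not part of the original analysis. Points lying close to the reference line suggest an approximately normal distribution.

```r
# Illustrative Q-Q check; parameters mirror the ones set earlier
set.seed(12345)
lambda <- 0.2
nexp <- 40
numsim <- 1000
sim_means <- replicate(numsim, mean(rexp(nexp, lambda)))

# Q-Q plot of the simulated means against normal quantiles
qq <- qqnorm(sim_means, main = "Normal Q-Q plot of sample means")
qqline(sim_means, col = "red")  # reference line through the quartiles

# Correlation between theoretical and sample quantiles (near 1 if close to normal)
cor(qq$x, qq$y)
```

Some curvature in the tails is to be expected, because the mean of 40 exponentials is still slightly right-skewed.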

Conclusion

From the above results we can say that the distribution of the sample means is very close to a normal distribution, as the CLT predicts.