In this project we investigate the exponential distribution in R and compare the behaviour of its sample means with what the Central Limit Theorem predicts. The main tasks are:
1. Show the sample mean and compare it to the theoretical mean of the distribution.
2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
3. Show that the distribution of the sample means is approximately normal.
Loading the libraries and setting the simulation parameters
suppressWarnings(library(ggplot2))
suppressWarnings(library(knitr))
set.seed(12345)
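# Parameters as specified in the assignment
lambda <- 0.2   # rate of the exponential distribution
nexp <- 40      # number of exponentials per sample
numsim <- 1000  # number of simulations
mns <- NULL     # will hold the mean of each simulation
Run 1000 simulations in a for loop and store the mean of the 40 rexp draws from each one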
for (i in 1 : numsim) mns <- c(mns, mean(rexp(nexp,lambda)))
sample_mean<-mean(mns) ## Sample mean
theoretical_mean <- 1/lambda ## Theoretical mean
Mean_data <- data.frame("Mean" = c(sample_mean, theoretical_mean), row.names = c("Mean from the sample data", "Theoretical mean"))
kable(x = round(Mean_data, 3), align = NULL, padding = 10, caption = "Comparison of sample and theoretical mean")
| | Mean |
|---|---|
| Mean from the sample data | 4.972 |
| Theoretical mean | 5.000 |
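The simulation loop above grows `mns` one element at a time. As a minimal sketch of an alternative, assuming the same parameters, all draws can be generated in a single call and averaged by row. The names `draws` and `mns_vec` are introduced here for illustration only. Filling the matrix by row is intended to group the draws in the same order as the loop does.

```r
# Sketch: generate every draw at once, then average each row.
# Parameter values mirror those used in the report.
set.seed(12345)
lambda <- 0.2
nexp <- 40
numsim <- 1000

# One row per simulation, one column per draw within that simulation.
draws <- matrix(rexp(numsim * nexp, rate = lambda),
                nrow = numsim, ncol = nexp, byrow = TRUE)
mns_vec <- rowMeans(draws)

length(mns_vec)  # one mean per simulation
mean(mns_vec)    # expected to sit near 1 / lambda
```

This avoids repeatedly copying the result vector, which matters little at 1000 simulations but becomes noticeable for much larger runs.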
The following histogram shows the distribution of the simulated means, with the theoretical mean marked in red
hist(mns, breaks = 40, freq = FALSE, main = "Distribution of sample means\n(exponential distribution)", xlab = "Sample mean")
lines(density(mns))
abline(v = theoretical_mean, col = "red") # theoretical mean line
As we can see, the sample mean (4.972) is very close to the theoretical mean (5.000).
#### Sample Variance versus Theoretical Variance
sample_var <- var(mns) ## Variance of the simulated sample means
theoritical_var <- (1/lambda)^2/nexp ## Theoretical variance of a mean of nexp exponentials
Variance_data <- data.frame("Variance" = c(sample_var, theoritical_var), row.names = c("Variance from sample data", "Theoretical variance"))
kable(x = round(Variance_data, 3), align = NULL, padding = 10, caption = "Comparison of sample and theoretical variance")
| | Variance |
|---|---|
| Variance from sample data | 0.595 |
| Theoretical variance | 0.625 |
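As we can see, the sample variance (0.595) is also close to the theoretical variance (0.625).
According to the central limit theorem (CLT), the averages of sufficiently large samples are approximately normally distributed. We will perform two checks: first, a comparison of 95% interval limits built from the sample and theoretical variances; second, a histogram of the means overlaid with normal density curves.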
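The theoretical values used above follow from standard properties of the exponential distribution, assuming the draws within each sample are independent. As a short worked derivation, writing $n$ for `nexp`, $X_i$ for a single draw and $\bar{X}$ for one sample mean (notation introduced here):

```latex
\begin{align*}
E[X_i] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, &
\operatorname{Var}(X_i) &= \frac{1}{\lambda^{2}} = 25, \\
E[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{\lambda} = 5, &
\operatorname{Var}(\bar{X}) &= \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{1}{\lambda^{2} n} = \frac{25}{40} = 0.625 .
\end{align*}
```

The last expression is the quantity computed in the code as `(1/lambda)^2/nexp`.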
# Limits from the sample: standard deviation of the simulated means, scaled by sqrt(nexp)
se_sample <- sd(mns)/sqrt(nexp)
lower_sample <- mean(mns) - 1.96 * se_sample
upper_sample <- mean(mns) + 1.96 * se_sample
sample_conf <- c(lower_sample, upper_sample)
# Limits from theory: same construction, using the theoretical variance
se_theoritical <- sqrt(theoritical_var)/sqrt(nexp)
lower_theory <- mean(mns) - 1.96 * se_theoritical
upper_theory <- mean(mns) + 1.96 * se_theoritical
theory_conf <- c(lower_theory, upper_theory)
conf_data <- data.frame("Confidence Interval Data" = rbind(sample_conf, theory_conf), row.names = c("Sample Confidence interval data", "Theoretical Confidence interval data"))
colnames(conf_data) <- c("Limit 1", "Limit 2")
kable(x = round(conf_data, 3), align = NULL, padding = 10, caption = "95% confidence interval data for sample and theory")
| | Limit 1 | Limit 2 |
|---|---|---|
| Sample Confidence interval data | 4.733 | 5.211 |
| Theoretical Confidence interval data | 4.727 | 5.217 |

From the above we can see that the limits are very close.
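One point worth checking in the interval calculation: `sd(mns)` is the spread of means of 40 observations, so it already estimates the standard error of a single sample mean. The sketch below is an alternative reading, not the calculation used for the table above. It separates two intervals that are easy to conflate: the central 95% range expected for one sample mean, and a 95% confidence interval for the average of all `numsim` means. The names `range_one_mean`, `range_theory`, `ci_grand_mean` and `coverage` are introduced here for illustration.

```r
# Sketch under stated assumptions; regenerates the simulated means so it runs on its own.
set.seed(12345)
lambda <- 0.2
nexp <- 40
numsim <- 1000
mns <- replicate(numsim, mean(rexp(nexp, rate = lambda)))

z <- qnorm(0.975)  # about 1.96

# (a) Central 95% range for a single mean of nexp draws.
#     sd(mns) already estimates sd(X) / sqrt(nexp), so there is no further division.
range_one_mean <- mean(mns) + c(-1, 1) * z * sd(mns)
range_theory   <- 1 / lambda + c(-1, 1) * z * (1 / lambda) / sqrt(nexp)

# (b) 95% confidence interval for the average of all numsim means.
ci_grand_mean <- mean(mns) + c(-1, 1) * z * sd(mns) / sqrt(numsim)

# Share of simulated means inside the theoretical range;
# under the normal approximation this should be roughly 0.95.
coverage <- mean(mns >= range_theory[1] & mns <= range_theory[2])

round(rbind(range_one_mean, range_theory, ci_grand_mean), 3)
coverage
```

The coverage figure gives a direct numerical check of normality: if the sample means were far from normal, noticeably more or fewer than 95% of them would fall inside the theoretical range.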
plot1 <- ggplot(data.frame(mns), aes(x = mns))
plot1 <- plot1 + geom_histogram(aes(y = after_stat(density)), bins = 30, colour = "black", fill = "blue")
plot1 <- plot1 + labs(title = "Distribution of Means\nBlack line: normal curve from the sample mean and variance\nRed line: normal curve from the theoretical mean and variance", y = "Density")
plot1 <- plot1 + stat_function(fun = dnorm, args = list(mean = 1/lambda, sd = sqrt(theoritical_var)), color = "red", linewidth = 1.0)
plot1 <- plot1 + stat_function(fun = dnorm, args = list(mean = mean(mns), sd = sqrt(sample_var)), color = "black", linewidth = 1.0)
print(plot1)
From the histogram we can infer that the distribution of the sample means is approximately normal: the histogram closely follows both normal density curves. Together with the close agreement of the means, variances and interval limits above, this is consistent with the central limit theorem.
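A normal Q-Q plot is another common way to judge approximate normality, and it is offered here only as a supplementary sketch that was not part of the analysis above. It regenerates the simulated means with the same parameters so that it runs on its own. The name `qq_cor` is introduced for illustration.

```r
# Sketch: normal Q-Q plot of the simulated means (base R graphics).
set.seed(12345)
lambda <- 0.2
nexp <- 40
numsim <- 1000
mns <- replicate(numsim, mean(rexp(nexp, rate = lambda)))

qqnorm(mns, main = "Normal Q-Q plot of sample means")
qqline(mns, col = "red")  # reference line through the first and third quartiles

# Correlation between the sample quantiles and the theoretical normal quantiles;
# values near 1 indicate that the points lie close to a straight line.
qq <- qqnorm(mns, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)
qq_cor
```

Points that bend away from the reference line in the upper tail would reflect the right skew that a mean of 40 exponentials still retains.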