This document covers a very important theorem of statistical inference: the Central Limit Theorem.
In this project we investigate the exponential distribution in R and compare the distribution of averages of exponential samples with the predictions of the Central Limit Theorem.
We simulate the exponential distribution 1000 times. In each simulation we draw 40 values from an exponential distribution with rate lambda = 0.2 and take their average.
The function `rexp(n, lambda)` generates the data, and the 1000 simulated averages are stored in the variable `average`.
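```r
library(ggplot2)
# Set the seed so the same points are reproduced every time
set.seed(42)
lambda <- 0.2  # rate parameter of the exponential distribution
n <- 40        # number of draws averaged in each simulation
# Mean of 40 exponential draws, repeated 1000 times
average <- replicate(1000, mean(rexp(n, lambda)))
```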
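The same simulation can also be written without any loop. The following is a minimal sketch using base R only: draw all 40 × 1000 values at once, arrange them in a 1000 × 40 matrix with one row per simulation, and average each row with `rowMeans`. The names `nsim`, `draws`, and `average_vec` are illustrative and not part of the original analysis.

```r
# Vectorized version of the simulation: no loop and no growing vector
set.seed(42)
lambda <- 0.2   # rate of the exponential distribution
n <- 40         # draws averaged per simulation
nsim <- 1000    # number of simulations (illustrative name)

# One row per simulation, each row holding 40 consecutive draws
draws <- matrix(rexp(n * nsim, lambda), nrow = nsim, ncol = n, byrow = TRUE)
average_vec <- rowMeans(draws)

length(average_vec)  # 1000 simulated sample means
summary(average_vec)
```

Building the whole matrix at once trades memory (here only 40,000 numbers) for speed and readability, which is usually a good deal at this scale.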
First, let's look at the raw data generated by `rexp`. To get a smoother picture of the distribution, we plot 1000 observations.
```r
ggplot(data.frame(x = rexp(1000, lambda)), aes(x = x)) + geom_density()  # density of 1000 raw exponential draws
```
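The density plot shows how skewed the raw data are. As a minimal numerical companion (base R only, reseeded so it runs on its own; `x` is an illustrative name), we can compare the sample statistics with known properties of the exponential distribution: its mean and standard deviation both equal 1/lambda, and its median is log(2)/lambda, well below the mean.

```r
set.seed(42)
lambda <- 0.2
x <- rexp(1000, lambda)   # 1000 raw exponential draws

# For an exponential distribution, mean and standard deviation both equal 1 / lambda
c(mean = mean(x), sd = sd(x), theory = 1 / lambda)

# Right skew: the median (log(2) / lambda in theory) sits well below the mean
c(sample_median = median(x), theory_median = log(2) / lambda)
```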
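CENTRAL LIMIT THEOREM
By the theorem, the sample means should be approximately normal, with mean equal to the population mean 1/lambda and variance equal to the population variance 1/lambda^2 divided by n. We compute these theoretical values together with the corresponding values from the simulation:
```r
theo_mean <- 1 / lambda              # theoretical mean of the exponential distribution
sample_mean <- mean(average)         # mean of the 1000 simulated sample means
thvar <- (lambda * sqrt(n))^-2       # theoretical variance of the sample mean: 1 / (lambda^2 * n)
samvar <- var(average)               # observed variance of the simulated sample means
```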
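For reference, the values that `theo_mean` and `thvar` should take follow directly from the properties of the exponential distribution with rate $\lambda = 0.2$ and of the mean $\bar{X}_n$ of $n = 40$ independent draws:

$$
\begin{aligned}
\mu &= E[X] = \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad \sigma^2 = \operatorname{Var}(X) = \frac{1}{\lambda^2} = 25,\\
E[\bar{X}_n] &= \mu = 5, \qquad \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n} = \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625.
\end{aligned}
$$

So the theorem predicts that the simulated averages are approximately $N(5,\ 0.625)$, i.e. with a standard deviation of about $\sqrt{0.625} \approx 0.79$.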
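Next, we plot the distribution of the 1000 sample means and compare it with the theoretical values. The green histogram shows the simulated means; the solid black line marks their mean and the dashed yellow line the theoretical mean 1/lambda. The blue curve is a normal density with the simulated mean and variance, and the red curve is the normal density predicted by the Central Limit Theorem.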
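```r
dfRowMeans <- data.frame(average)  # convert to data.frame for ggplot
mp <- ggplot(dfRowMeans, aes(x = average))
# Histogram of the sample means, scaled to a density so it is comparable to the curves
mp <- mp + geom_histogram(binwidth = lambda, fill = "green", color = "black", aes(y = after_stat(density)))
mp <- mp + labs(title = "Distribution of Means of 40 Exponential Draws", x = "Mean of 40 Selections", y = "Density")
# Solid black line: mean of the simulated sample means
mp <- mp + geom_vline(xintercept = sample_mean, linewidth = 1.0, color = "black")
# Blue curve: normal density with the simulated mean and variance
mp <- mp + stat_function(fun = dnorm, args = list(mean = sample_mean, sd = sqrt(samvar)), color = "blue", linewidth = 1.0)
# Dashed yellow line: theoretical mean 1 / lambda
mp <- mp + geom_vline(xintercept = theo_mean, linewidth = 1.0, color = "yellow", linetype = "longdash")
# Red curve: normal density predicted by the CLT, N(1 / lambda, 1 / (lambda^2 * n))
mp <- mp + stat_function(fun = dnorm, args = list(mean = theo_mean, sd = sqrt(thvar)), color = "red", linewidth = 1.0)
mp
```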
As the Central Limit Theorem predicts, the distribution of the sample means is approximately normal, even though the underlying data follow an exponential distribution, which is strongly right-skewed.
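The normality claim can also be checked numerically. A minimal sketch, assuming only base R: if the sample means are approximately N(1/lambda, 1/(lambda^2 n)), about 95% of them should fall within 1.96 theoretical standard deviations of 1/lambda. The snippet reruns the simulation so it is self-contained; `sim_means`, `mu`, `se`, and `coverage` are illustrative names. Because n = 40 is finite and the exponential distribution is skewed, expect a value close to, but not exactly, 0.95.

```r
set.seed(42)
lambda <- 0.2
n <- 40

# 1000 sample means of 40 exponential draws each
sim_means <- replicate(1000, mean(rexp(n, lambda)))

mu <- 1 / lambda                # theoretical mean, 5
se <- 1 / (lambda * sqrt(n))    # theoretical standard deviation of the mean, about 0.79

# Fraction of sample means inside the normal-theory 95% interval
coverage <- mean(abs(sim_means - mu) <= qnorm(0.975) * se)
coverage
```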
The graph also shows that the mean of the sample means (black line) and the theoretical population mean (dashed yellow line) almost coincide.
Likewise, the red and blue density curves almost overlap, so the variance of the sample means is close to the theoretical variance 1/(lambda^2 * n), consistent with the values computed earlier.
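To put numbers on these visual comparisons, the following sketch (base R only, reseeded and rerun so it stands alone; `comparison` and `rel_diff` are illustrative names) tabulates the theoretical and simulated mean and variance together with their relative difference.

```r
set.seed(42)
lambda <- 0.2
n <- 40
average <- replicate(1000, mean(rexp(n, lambda)))

# Side-by-side comparison of theoretical and simulated values
comparison <- data.frame(
  quantity    = c("mean", "variance"),
  theoretical = c(1 / lambda, 1 / (lambda^2 * n)),
  simulated   = c(mean(average), var(average))
)
# Relative difference: (simulated - theoretical) / theoretical
comparison$rel_diff <- (comparison$simulated - comparison$theoretical) / comparison$theoretical
comparison
```

Small relative differences in both rows support the conclusion drawn from the plot; with only 1000 simulations, the variance estimate is expected to wobble more than the mean estimate.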