Central Limit Theorem

Overview

This report examines one of the most important theorems of statistical inference, the Central Limit Theorem.

In this project we will investigate the exponential distribution in R and compare it with the Central Limit Theorem.

The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter.
The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. We set lambda = 0.2 for all of the simulations.
We investigate the distribution of averages of 40 exponentials, using a thousand simulations.
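The claim that both the mean and the standard deviation equal 1/lambda can be checked empirically. The following is a minimal sketch, not part of the original analysis: the seed, the number of draws, and the names `draws`, `empirical_mean`, and `empirical_sd` are assumptions chosen for illustration.

```r
# Illustrative check (assumed seed and sample size, not from the report)
set.seed(1)
lambda <- 0.2

# A large sample so the empirical moments settle near their theoretical values
draws <- rexp(100000, rate = lambda)

empirical_mean <- mean(draws)
empirical_sd <- sd(draws)

# Both values should be close to 1/lambda
print(c(theoretical = 1 / lambda, mean = empirical_mean, sd = empirical_sd))
```

With lambda = 0.2, both empirical values should land near 5.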

Simulations

As mentioned above, we simulate the exponential distribution 1000 times, and each time we take the average of 40 points drawn from it.

rexp(n, lambda) is the function we use to generate the data; the 1000 simulated averages are stored in the variable average.

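library(ggplot2)

## set the seed so the same points are reproduced on every run
set.seed(42)

lambda <- 0.2   # rate parameter of the exponential distribution
n <- 40         # number of exponentials averaged in each simulation
average <- NULL
## run 1000 simulations, storing the mean of n exponentials from each
for(i in 1:1000)
    average <- c(average, mean(rexp(n, lambda)))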
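Growing a vector inside a loop works at this size but copies the vector on every iteration. A more compact alternative is sketched below with `replicate()`; the names `n_sims` and `average_alt` are assumptions introduced here, and the result is expected to match the loop when the same seed is used, since the random draws are requested in the same order.

```r
# Alternative sketch using replicate(); n_sims and average_alt are illustrative names
set.seed(42)

lambda <- 0.2
n <- 40
n_sims <- 1000

# Evaluate mean(rexp(n, lambda)) n_sims times and collect the results in a vector
average_alt <- replicate(n_sims, mean(rexp(n, lambda)))

length(average_alt)
```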

Generated Data

Let us look at the data generated by the rexp function. For a smoother density plot we draw 1000 observations.

qplot(rexp(1000, lambda),geom="density")

Theorem

Central Limit Theorem

The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger, no matter what the shape of the population distribution, provided the population has a finite variance. A common rule of thumb is that the approximation is already good for sample sizes over 30, although heavily skewed populations may need more. In other words, the larger each sample is, the more closely a histogram of the sample means resembles a normal distribution.
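One way to see this convergence is to track the skewness of the sample means as the sample size grows: a normal distribution has skewness 0, while the mean of n exponentials has theoretical skewness 2/sqrt(n). The sketch below is an illustration only; the seed, the sample sizes, the number of simulations, and the helper `skewness()` are all assumptions introduced here.

```r
# Illustrative sketch: skewness of sample means shrinks as the sample size grows
set.seed(123)
lambda <- 0.2
n_sims <- 10000
sample_sizes <- c(2, 40, 400)

# Hypothetical helper: simple moment-based sample skewness
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skew_by_n <- sapply(sample_sizes, function(n) {
  # Each row of the matrix is one simulated sample of size n
  means <- rowMeans(matrix(rexp(n_sims * n, lambda), nrow = n_sims))
  skewness(means)
})
names(skew_by_n) <- paste0("n=", sample_sizes)

print(round(skew_by_n, 3))
```

The printed skewness should fall toward 0 as n increases, which is the behaviour the theorem describes.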

Sample Mean versus Theoretical Mean

theo_mean <- 1/lambda

Theoretical mean of distribution: 5

sample_mean <- mean(average)

Sample mean of the simulated averages: 4.9865083

The sample mean of the simulated averages is very close to the theoretical mean.
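How close the two means should be can be gauged with the standard error of the overall mean. The sketch below assumes the same seed and parameters as the simulation; `n_sims`, `se_of_mean`, and `z` are names introduced here for illustration.

```r
# Illustrative sketch: how far from 1/lambda should the mean of 1000 averages be?
set.seed(42)
lambda <- 0.2
n <- 40
n_sims <- 1000

theo_mean <- 1 / lambda

# Each average has variance (1/lambda)^2 / n, so the mean of n_sims
# averages has variance (1/lambda)^2 / (n * n_sims)
se_of_mean <- sqrt((1 / lambda)^2 / (n * n_sims))

average <- replicate(n_sims, mean(rexp(n, lambda)))

# Distance between the simulated and theoretical means, in standard errors
z <- (mean(average) - theo_mean) / se_of_mean

print(c(standard_error = se_of_mean, z = z))
```

The standard error works out to 0.025, so a gap of about 0.013 between the reported sample mean and 5 is well within ordinary sampling variation.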

Sample Variance versus Theoretical Variance

thvar <- (lambda * sqrt(n)) ^ -2

Theoretical variance of the sample mean: 0.625
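The expression `(lambda * sqrt(n)) ^ -2` follows from the variance of a mean of n independent draws. As a short worked derivation, writing sigma = 1/lambda for the standard deviation of a single exponential and using the report's values lambda = 0.2 and n = 40:

```latex
\operatorname{Var}(\bar{X})
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{1}{\lambda^2 n}
  = \bigl(\lambda \sqrt{n}\bigr)^{-2}
  = \frac{1}{0.2^2 \times 40}
  = 0.625
```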

samvar <- var(average)

Variance of the simulated averages: 0.6344405

As with the means, the theoretical and sample variances are approximately equal, differing by only about 0.009.

Visualize Our Results

We plot the distribution of the sample means and compare its mean with the population mean.

dfRowMeans <- data.frame(average) # convert to data.frame for ggplot
mp <- ggplot(dfRowMeans, aes(x = average))
# histogram of the simulated means on the density scale
mp <- mp + geom_histogram(binwidth = lambda, fill = "green", color = "black", aes(y = ..density..))
mp <- mp + labs(title = "Density of Means of 40 Exponentials", x = "Mean of 40 Selections", y = "Density")
mp <- mp + geom_vline(xintercept = sample_mean, size = 1.0, color = "black") # solid line at the sample mean
mp <- mp + stat_function(fun = dnorm, args = list(mean = sample_mean, sd = sqrt(samvar)), color = "blue", size = 1.0) # normal curve from the sample moments
mp <- mp + geom_vline(xintercept = theo_mean, size = 1.0, color = "yellow", linetype = "longdash") # dashed line at the theoretical mean
mp <- mp + stat_function(fun = dnorm, args = list(mean = theo_mean, sd = sqrt(thvar)), color = "red", size = 1.0) # normal curve from the theoretical moments
mp

As the plot shows, the distribution of the sample means is approximately normal, as the Central Limit Theorem predicts, even though the underlying data follow an exponential distribution.
The graph also shows that the sample mean and the population mean are almost the same.
The red and blue density curves almost overlap, so the variance of the sample means and its theoretical value are also comparable, in line with the values computed earlier.
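A normal Q-Q comparison offers a complementary check on the "approximately normal" conclusion. The sketch below uses base R rather than ggplot2 and regenerates the averages with the same seed; `qq` and `qq_correlation` are names introduced here, and the correlation between the quantiles is only an informal summary, not a formal test.

```r
# Illustrative sketch: compare quantiles of the simulated means with normal quantiles
set.seed(42)
lambda <- 0.2
n <- 40
average <- replicate(1000, mean(rexp(n, lambda)))

# plot.it = FALSE returns the quantile pairs without drawing anything
qq <- qqnorm(average, plot.it = FALSE)

# A correlation near 1 means the points lie close to a straight line
qq_correlation <- cor(qq$x, qq$y)
print(qq_correlation)
```

Calling `qqnorm(average)` followed by `qqline(average)` draws the plot itself; points that hug the line support the normal approximation, while a slight bend in the upper tail would reflect the skewness that remains at n = 40.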