Overview

In this report we investigate the exponential distribution, and the distribution of means of samples drawn from it, in R, and compare our findings with the Central Limit Theorem (C.L.T.).

We compare the mean and variance of the distribution of sample means with those of the population, and show that, as the C.L.T. predicts, the distribution of sample means is approximately normal.

Simulation

Simulation Of Population

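The exponential distribution can be simulated in R with rexp(n, lambda), which draws n values from an exponential distribution with rate lambda. Its theoretical mean and its theoretical standard deviation are both 1/lambda.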
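As a small sketch that is not part of the original analysis, the theoretical mean and variance of the exponential distribution can be checked numerically with base R's dexp() and integrate(). It assumes the same rate lambda = 0.2 used below; the names theoretical_mean, numeric_mean and so on are illustrative only.

```r
# Theoretical moments of Exp(rate = lambda), the parametrisation used by rexp()/dexp():
#   E[X] = 1/lambda,  Var(X) = 1/lambda^2
lambda <- 0.2

theoretical_mean <- 1 / lambda
theoretical_var  <- 1 / lambda^2

# Numerical check: integrate x * f(x) and (x - mu)^2 * f(x) over [0, Inf)
numeric_mean <- integrate(function(x) x * dexp(x, rate = lambda),
                          lower = 0, upper = Inf)$value
numeric_var <- integrate(function(x) (x - theoretical_mean)^2 * dexp(x, rate = lambda),
                         lower = 0, upper = Inf)$value

print(c(theoretical = theoretical_mean, numerical = numeric_mean))
print(c(theoretical = theoretical_var, numerical = numeric_var))
```

The numerical integrals should agree with 1/lambda = 5 and 1/lambda^2 = 25 up to integration error.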

We use the following parameters for the simulation:

Parameter	Value
Lambda (rate)	0.2
Sample size (n)	40
Number of simulations (nos)	1000

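library(ggplot2)  # plotting package used for all figures below

lambda<-0.2  # rate of the exponential distribution
n<-40        # number of exponentials averaged in each simulation
nos<-1000    # number of simulations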

Population

We simulate a population of 1000 random exponential values with the rate shown above.

set.seed(1000)                 # fix the random seed so results are reproducible
population<-rexp(1000,lambda)  # 1000 draws from Exp(rate = lambda)
head(population)

## [1]  5.0233101  2.5884649 12.1869221 10.8162445  2.3871486  0.8353505

Population Statistics

Calculation of Mean and Variance of Population

mean_population<-mean(population)
mean_population

## [1] 5.015616

var_population<-var(population)
var_population

## [1] 26.07005

The mean and variance of the simulated population are 5.0156161 and 26.0700456, respectively, close to the theoretical values 1/lambda = 5 and 1/lambda^2 = 25.

Simulation of the Distribution of Sample Means

We simulate the distribution of averages of exponentials, using a sample size of 40 and 1000 simulations.

Sample Means

# each element is the mean of n = 40 exponential draws; repeated nos = 1000 times
sample<-sapply(1:nos,function(x){mean(rexp(n,lambda))})
head(sample)

## [1] 5.229161 5.786361 4.354502 5.352259 4.535410 4.332345
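The sapply() call runs one simulation per iteration. A vectorised sketch of the same idea, assuming the parameters above, draws all n * nos values at once, stores one simulation per row of a matrix and takes the row means. The seed and the name sample_means (chosen so as not to mask base R's sample() function) are illustrative, and because the draws are grouped differently this does not reproduce the exact numbers above.

```r
# Vectorised equivalent of the sapply() loop: one simulated sample per row.
lambda <- 0.2
n <- 40
nos <- 1000

set.seed(1000)  # illustrative seed, for reproducibility only
draws <- matrix(rexp(n * nos, rate = lambda), nrow = nos, ncol = n)
sample_means <- rowMeans(draws)  # one sample mean per simulation

length(sample_means)
head(sample_means)
```

Both approaches produce nos independent means of n exponentials; the matrix version simply avoids calling rexp() 1000 times.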

Sample Statistics

Calculation of Mean and Variance of the Sample Means

mean_sample<-mean(sample)
mean_sample

## [1] 4.992426

var_sample<-var(sample)
var_sample

## [1] 0.6562298

The mean and variance of the sample means are 4.9924256 and 0.6562298, respectively.

Sample Mean versus Theoretical Mean

We can see that the population mean is 5.0156161 and the mean of the sample means is 4.9924256.

We plot the distribution of sample means, with dashed lines at the sample mean and the population mean:

# after_stat() needs ggplot2 >= 3.3.0; linewidth needs ggplot2 >= 3.4.0
g1<-ggplot(data.frame(sample=sample),aes(x=sample))
g1<-g1+geom_histogram(aes(fill=after_stat(count)),bins=30)
g1<-g1+labs(x="Sample mean",y="Frequency",title="Distribution of Sample Means")

# Dashed line and labels at the mean of the sample means
g1<-g1+geom_vline(xintercept=mean_sample,col='gray',linetype="longdash",linewidth=0.5)
g1<-g1+annotate('text',x=mean_sample,y=0,
                 label=round(mean_sample,2),angle=90,vjust=-0.5,size=3)
g1<-g1+annotate('text',x=mean_sample,y=max(sample),
                 label="mu[sample]",angle=90,hjust=-2,vjust=-0.5,parse=TRUE)

# Dashed line and labels at the population mean
g1<-g1+geom_vline(xintercept=mean_population,col='gray',linetype="longdash",linewidth=0.5)
g1<-g1+annotate('text',x=mean_population,y=0,
                 label=round(mean_population,2),angle=90,vjust=1.25,size=3)
g1<-g1+annotate('text',x=mean_population,y=max(sample),
                 label="mu[population/theoretical]",angle=90,hjust=-0.5,vjust=1.25,parse=TRUE)
g1

Here we can see that:

The distribution of sample means is centred on the population mean.
The mean of the sample means (4.99) is approximately equal to the population mean (5.02) and to the theoretical mean 1/lambda = 5.
This is what theory predicts: the distribution of sample means is centred on the population mean, and the Central Limit Theorem adds that, for large n, it is approximately normal around that centre.
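Strictly, the expected value of a sample mean equals the population mean for any n; what the simulation adds is that the average of the simulated means gets closer to 1/lambda = 5 as more simulations are run (the law of large numbers). A small sketch, with an arbitrary seed and illustrative simulation counts:

```r
# Average of the simulated sample means for an increasing number of simulations.
lambda <- 0.2
n <- 40
set.seed(2016)  # arbitrary seed, for reproducibility only

for (n_sims in c(10, 100, 1000, 10000)) {
  means <- replicate(n_sims, mean(rexp(n, rate = lambda)))
  cat(sprintf("simulations = %5d   mean of sample means = %.4f   theory = %.1f\n",
              as.integer(n_sims), mean(means), 1 / lambda))
}
```

With only 10 simulations the average can wander noticeably; with 10000 it typically sits within a few hundredths of 5.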

Sample Variance versus Theoretical Variance

We can see that the variance of the population is 26.0700456 and the variance of the sample means is 0.6562298.

For the mean of n independent observations (the setting of the C.L.T.), the variance of the distribution of sample means is related to the population variance by

\[\sigma_x^2=\sigma^2/n\]

where \(\sigma_x^2\) is the variance of the sample mean, \(\sigma^2\) is the variance of the population and n is the sample size. Rearranging,

\[n=\sigma^2/\sigma_x^2\]
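As a worked check of these formulas, the sketch below plugs in the values reported earlier (copied by hand, so they are rounded). It computes the predicted variance sigma^2/n, the sample size implied by the rearranged formula, and the purely theoretical value (1/lambda^2)/n for lambda = 0.2.

```r
# Worked check of sigma_x^2 = sigma^2 / n using values reported above.
n <- 40
var_population <- 26.0700456  # variance of the simulated population
var_sample     <- 0.6562298   # variance of the 1000 simulated sample means

predicted_var_of_mean   <- var_population / n   # sigma^2 / n
implied_n               <- var_population / var_sample  # n = sigma^2 / sigma_x^2
theoretical_var_of_mean <- (1 / 0.2)^2 / n      # (1 / lambda^2) / n

round(c(predicted = predicted_var_of_mean,
        implied_n = implied_n,
        theoretical = theoretical_var_of_mean), 4)
```

The implied n comes out at about 39.7, close to the actual sample size of 40, and the theoretical variance of the sample mean is 25/40 = 0.625.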

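Comparing the population variance with the variance of the sample means:

calculated_var_sample<-var_population/n   # predicted variance of the sample mean, sigma^2/n
calculated_var_sample

## [1] 0.6517511

The predicted value 0.6517511 is approximately equal to the observed variance of the sample means, 0.6562298.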

# sigma is the standard deviation, i.e. the square root of the variance
sd_sample<-sqrt(var_sample)
sd_theoretical<-sqrt(calculated_var_sample)

g2<-ggplot(data.frame(sample=sample),aes(x=sample))
g2<-g2+geom_histogram(aes(fill=after_stat(count)),bins=30)
g2<-g2+labs(x="Sample mean",y="Frequency",title="Distribution of Sample Means")

#Plotting Mean
g2<-g2+geom_vline(xintercept=mean_sample,col='gray',linetype="longdash",linewidth=0.5)
g2<-g2+annotate('text',x=mean_sample,y=0,
                 label=round(mean_sample,2),angle=90,vjust=-0.5,size=3)
g2<-g2+annotate('text',x=mean_sample,y=max(sample),
                 label="mu[sample]",angle=90,vjust=-0.5,hjust=-2.5,size=3,parse=TRUE)

#Plotting mean +/- one standard deviation of the simulated sample means
g2<-g2+geom_vline(xintercept=mean_sample+c(-1,1)*sd_sample,col='gray',linetype="longdash")
g2<-g2+annotate('text',x=mean_sample+c(-1,1)*sd_sample,y=0,
                label=round(mean_sample+c(-1,1)*sd_sample,2),angle=90,vjust=-0.5,size=3)
g2<-g2+annotate('text',x=mean_sample+c(-1,1)*sd_sample,y=max(sample),
                label=c("-sigma[sample]","+sigma[sample]"),angle=90,vjust=-0.5,hjust=-2,size=3,parse=TRUE)

#Plotting mean +/- one theoretical standard deviation, sqrt(sigma^2/n)
g2<-g2+geom_vline(xintercept=mean_sample+c(-1,1)*sd_theoretical,col='gray',linetype="longdash")
g2<-g2+annotate('text',x=mean_sample+c(-1,1)*sd_theoretical,y=0,
                label=round(mean_sample+c(-1,1)*sd_theoretical,2),angle=90,vjust=1.25,size=3)
g2<-g2+annotate('text',x=mean_sample+c(-1,1)*sd_theoretical,y=max(sample),
                label=c("-sigma[theoretical]","+sigma[theoretical]"),angle=90,vjust=1.25,hjust=-2,size=3,parse=TRUE)
g2

Here we can see that:

The observed spread of the sample means and the spread predicted by theory are plotted together.
The variance of the sample means should be \(\sigma_x^2=\sigma^2/n\), our theoretical variance, and the plot shows that the observed and theoretical values are almost equal, in agreement with the theory above.
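The relation \(\sigma_x^2=\sigma^2/n\) can also be checked across several sample sizes. The following sketch uses hypothetical sample sizes (10, 40 and 160), 5000 simulations each and an arbitrary seed, and compares the simulated variance of the sample mean with the theoretical (1/lambda^2)/n:

```r
# Simulated vs theoretical variance of the sample mean for several sample sizes.
lambda <- 0.2
nos <- 5000
set.seed(42)  # arbitrary seed, for reproducibility only

sizes <- c(10, 40, 160)  # hypothetical sample sizes
results <- t(sapply(sizes, function(n) {
  means <- replicate(nos, mean(rexp(n, rate = lambda)))
  c(n = n, simulated = var(means), theoretical = (1 / lambda^2) / n)
}))
print(round(results, 4))
```

Each fourfold increase in n should roughly quarter the variance of the sample mean.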

Distribution

Here we plot the simulated population distribution and the distribution of sample means.

Population Distribution

Plotting the population distribution with ggplot2:

g3<-ggplot(data.frame(population=population),aes(x=population))
g3<-g3+geom_histogram(aes(fill=after_stat(count)),bins=30)
g3<-g3+geom_vline(xintercept=mean_population,col='gray',linetype="longdash",linewidth=1)
g3<-g3+labs(x="Population",y="Frequency",title="Population Distribution")
g3<-g3+annotate('text',x=mean_population,y=0,
                 label=round(mean_population,2),angle=90,vjust=1.25,size=3)
g3<-g3+annotate('text',x=mean_population,y=max(sample),
                 label="mu[population]",angle=90,hjust=-0.5,vjust=1.25,parse=TRUE)
g3

The population distribution is strongly right-skewed, as expected for an exponential distribution, and is clearly not normal.
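One way to quantify "not normal" is skewness: an exponential distribution has skewness 2, and the mean of n independent exponentials has skewness 2/sqrt(n), about 0.32 for n = 40. A minimal sketch using a hand-written, moment-based skewness helper (hypothetical, not taken from a package) and an arbitrary seed:

```r
# Moment-based sample skewness (hypothetical helper written for this sketch).
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

lambda <- 0.2
n <- 40
set.seed(7)  # arbitrary seed, for reproducibility only

raw_draws <- rexp(100000, rate = lambda)                      # population-like draws
means_40  <- replicate(10000, mean(rexp(n, rate = lambda)))   # sample means, n = 40

round(c(population = skewness(raw_draws), theory_population = 2,
        sample_means = skewness(means_40), theory_means = 2 / sqrt(n)), 3)
```

Averaging 40 values cuts the skewness by a factor of sqrt(40), which is why the histogram of sample means looks far more symmetric than the population histogram.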

Distribution of Sample Mean

g4<-ggplot(data.frame(sample=sample),aes(x=sample))
g4<-g4+geom_histogram(aes(fill=after_stat(count)),bins=30)
g4<-g4+geom_vline(xintercept=mean_sample,col='gray',linetype="longdash",linewidth=1)
g4<-g4+labs(x="Sample mean",y="Frequency",title="Distribution of Sample Means")
g4<-g4+annotate('text',x=mean_sample,y=0,
                 label=round(mean_sample,2),angle=90,vjust=-0.5,size=3)
g4<-g4+annotate('text',x=mean_sample,y=max(sample),
                 label="mu[sample]",angle=90,vjust=-0.5,hjust=-2.5,size=3,parse=TRUE)
g4

Even though the original population is not normally distributed, the distribution of sample means is approximately normal, as the Central Limit Theorem predicts.
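A rough numerical check of approximate normality, offered as a sketch rather than a formal test: standardise the simulated means with the theoretical mean 1/lambda and standard error (1/lambda)/sqrt(n), compare the share falling within 1 and 2 standard errors with the normal values (about 68% and 95%), and draw a normal Q-Q plot. The seed and the 10000 simulations are arbitrary choices.

```r
# Coverage of +/- 1 and 2 standard errors, compared with the normal distribution.
lambda <- 0.2
n <- 40
nos <- 10000
set.seed(24)  # arbitrary seed, for reproducibility only

means <- replicate(nos, mean(rexp(n, rate = lambda)))
z <- (means - 1 / lambda) / ((1 / lambda) / sqrt(n))  # standardised sample means

coverage <- c(within_1_se = mean(abs(z) < 1), within_2_se = mean(abs(z) < 2))
print(round(rbind(simulated = coverage,
                  normal = c(pnorm(1) - pnorm(-1), pnorm(2) - pnorm(-2))), 3))

# Visual check: points close to the reference line indicate approximate normality
qqnorm(z)
qqline(z)
```

Some curvature in the upper tail of the Q-Q plot is expected, since means of 40 exponentials are still slightly right-skewed.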

Understanding Central Limit Theorem With Exponential Distribution

Dhawal Kapil

February 24, 2016