In this report we investigate the exponential distribution and the distribution of sample means of exponential random variables in R, and compare our findings with the Central Limit Theorem (C.L.T.).
We compare the mean and variance of the distribution of sample means with those of the population, and show that, as per the C.L.T., the distribution of sample means is approximately normal.
The exponential distribution can be simulated in R using the function rexp(n, rate), where rate is the rate parameter lambda.
We will be using the following parameters for our simulation exercise:
| Parameter | Value |
|---|---|
| Lambda | 0.2 |
| Sample Size (n) | 40 |
| No of Simulations | 1000 |
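For reference, the theoretical moments implied by these parameters can be computed directly. The sketch below relies on the standard facts that an exponential distribution with rate lambda has mean 1/lambda and variance 1/lambda^2. The names `theoretical_mean`, `theoretical_var` and `theoretical_var_of_mean` are illustrative and are not part of the original analysis.

```r
# Theoretical moments of an exponential distribution with rate lambda
lambda <- 0.2
n <- 40

theoretical_mean <- 1 / lambda                   # population mean
theoretical_var <- 1 / lambda^2                  # population variance
theoretical_var_of_mean <- theoretical_var / n   # variance of the mean of n draws

theoretical_mean          # 5
theoretical_var           # 25
theoretical_var_of_mean   # 0.625
```

These are the values that the simulated means and variances in the rest of the report should approach.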
library(ggplot2)
lambda<-0.2
n<-40
nos<-1000
Simulate a population of 1000 exponential random values with the rate shown above.
set.seed(1000)
population<-rexp(1000,lambda)
head(population)
## [1] 5.0233101 2.5884649 12.1869221 10.8162445 2.3871486 0.8353505
Calculation of Mean and Variance of Population
mean_population<-mean(population)
mean_population
## [1] 5.015616
var_population<-var(population)
var_population
## [1] 26.07005
Mean and Variance of the population are 5.0156161 and 26.0700456 respectively.
Simulate the distribution of averages of exponentials, using samples of size 40 over 1000 simulations.
sample<-sapply(1:nos,function(x){mean(rexp(n,lambda))})
head(sample)
## [1] 5.229161 5.786361 4.354502 5.352259 4.535410 4.332345
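As an aside, the same kind of simulation can be written without an explicit loop over simulations. This is only a sketch of an alternative, not the approach used in this report: it draws all values at once into a matrix with one column per simulated sample and averages each column. The names `draws` and `sample_means` and the seed are assumptions made for this illustration.

```r
# Alternative sketch: draw all n * nos values at once,
# one column per simulated sample, then average each column.
lambda <- 0.2
n <- 40
nos <- 1000

set.seed(1000)  # arbitrary seed chosen for this sketch
draws <- matrix(rexp(n * nos, lambda), nrow = n, ncol = nos)
sample_means <- colMeans(draws)

length(sample_means)  # one mean per simulation
mean(sample_means)    # should be close to 1 / lambda
```

For a simulation of this size either form runs quickly, so the choice is mostly a matter of readability.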
Calculation of Mean and Variance of Sample Means
mean_sample<-mean(sample)
mean_sample
## [1] 4.992426
var_sample<-var(sample)
var_sample
## [1] 0.6562298
Mean and Variance of the sample means are 4.9924256 and 0.6562298 respectively.
We can see that,
population mean is 5.0156161 and
sample mean is 4.9924256
We now plot the distribution of sample means with the population mean and the sample mean marked:
g1<-ggplot(data.frame(seq_along(1:length(sample)),sample))+aes(sample)
g1<-g1+geom_histogram(aes(fill=..count..))
g1<-g1+labs(x="Sample",y="Frequency",title="Sample Distribution")
g1<-g1+geom_vline(xintercept = mean_sample,col='gray',linetype="longdash",size=0.5)
g1<-g1+annotate('text',x=mean_sample,y=0,
label=round(mean_sample,2),angle=90,vjust=-0.5,size=3)
g1<-g1+annotate('text',x=mean_sample,y=max(sample),
label="mu[sample]",angle=90,hjust=-2,vjust=-0.5,parse = T)
g1<-g1+geom_vline(xintercept = mean_population,col='gray',linetype="longdash",size=0.5)
g1<-g1+annotate('text',x=mean_population,y=0,
label=round(mean_population,2),angle=90,vjust=1.25,size=3)
g1<-g1+annotate('text',x=mean_population,y=max(sample),
label="mu[population/theoretical]",angle=90,hjust=-0.5,vjust=1.25,parse=T)
g1
Here we can see that the sample mean and the population mean nearly coincide. As per theory, they should be approximately equal, which we can see here clearly.
We can see that,
Variance of Population = 26.0700456 and
Variance of Sample = 0.6562298
As per C.L.T the relationship between variance of sample distribution and that of population distribution is
\[\sigma_x^2=\sigma^2/n\]
Where \(\sigma_x^2\) is variance of Sample Mean and \(\sigma^2\) is variance of population. Hence,
\[n=\sigma^2/\sigma_x^2\]
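As a quick numerical check of this relationship, the ratio of the two variances reported above should come out close to the sample size of 40. The sketch below hard-codes the values printed earlier in this report; `implied_n` is a name introduced only for this illustration.

```r
# Values reported earlier in this report
var_population <- 26.0700456   # variance of the 1000 simulated exponentials
var_sample <- 0.6562298        # variance of the 1000 sample means

# Ratio of population variance to variance of the sample means
implied_n <- var_population / var_sample
implied_n   # roughly 39.7, close to n = 40
```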
Comparing our population variance and sample variance
calculated_var_sample<-var_population/n
We can see that this calculated value, 0.6517511, is approximately equal to the variance of our sample means, which is 0.6562298.
g2<-ggplot(data.frame(seq_along(1:length(sample)),sample))+aes(sample)
g2<-g2+geom_histogram(aes(fill=..count..))
g2<-g2+labs(x="Sample",y="Frequency",title="Sample Distribution")
#Plotting Mean
g2<-g2+geom_vline(xintercept = mean_sample,col='gray',linetype="longdash",size=0.5)
g2<-g2+annotate('text',x=mean_sample,y=0,
label=round(mean_sample,2),angle=90,vjust=-0.5,size=3)
g2<-g2+annotate('text',x=mean_sample,y=max(sample),
label="mu[sample]",angle=90,vjust=-0.5,hjust=-2.5,size=3,parse = T)
#Plotting Standard Deviation Of Sample Means (sigma is the square root of the variance)
sd_sample<-sqrt(var_sample)
g2<-g2+geom_vline(xintercept = mean_sample+sd_sample,col='gray',linetype="longdash")
g2<-g2+geom_vline(xintercept = mean_sample-sd_sample,col='gray',linetype="longdash")
g2<-g2+annotate('text',x=mean_sample+sd_sample,y=0,
label=round(mean_sample+sd_sample,2),angle=90,vjust=-0.5,size=3)
g2<-g2+annotate('text',x=mean_sample-sd_sample,y=0,
label=round(mean_sample-sd_sample,2),angle=90,vjust=-0.5,size=3)
g2<-g2+annotate('text',x=mean_sample+sd_sample,y=max(sample),
label="+sigma[sample]",angle=90,vjust=-0.5,hjust=-2,size=3,parse = T)
g2<-g2+annotate('text',x=mean_sample-sd_sample,y=max(sample),
label="-sigma[sample]",angle=90,vjust=-0.5,hjust=-2,size=3,parse = T)
#Plotting Theoretical/Calculated Standard Deviation
calculated_sd_sample<-sqrt(calculated_var_sample)
g2<-g2+geom_vline(xintercept = mean_sample+calculated_sd_sample,col='gray',linetype="longdash")
g2<-g2+geom_vline(xintercept = mean_sample-calculated_sd_sample,col='gray',linetype="longdash")
g2<-g2+annotate('text',x=mean_sample+calculated_sd_sample,y=0,
label=round(mean_sample+calculated_sd_sample,2),angle=90,vjust=1.25,size=3)
g2<-g2+annotate('text',x=mean_sample-calculated_sd_sample,y=0,
label=round(mean_sample-calculated_sd_sample,2),angle=90,vjust=1.25,size=3)
g2<-g2+annotate('text',x=mean_sample+calculated_sd_sample,y=max(sample),
label="+sigma[theoretical]",angle=90,vjust=1.25,hjust=-2,size=3,parse = T)
g2<-g2+annotate('text',x=mean_sample-calculated_sd_sample,y=max(sample),
label="-sigma[theoretical]",angle=90,vjust=1.25,hjust=-2,size=3,parse = T)
g2
Here we can see that the lines for the sample and the theoretical values nearly coincide.
Next we plot the actual population distribution and the distribution of averages drawn from it.
Plotting the population distribution via ggplot2 plotting.
g3<-ggplot(data.frame(seq_along(1:length(population)),population))+aes(population)
g3<-g3+geom_histogram(aes(fill=..count..))
g3<-g3+geom_vline(xintercept = mean_population,col='gray',linetype="longdash",size=1)
g3<-g3+labs(x="Population",y="Frequency",title="Population Distribution")
g3<-g3+annotate('text',x=mean_population,y=0,
label=round(mean_population,2),angle=90,vjust=1.25,size=3)
g3<-g3+annotate('text',x=mean_population,y=max(sample),
label="mu[population]",angle=90,hjust=-0.5,vjust=1.25,parse=T)
g3
We can see that the distribution of the population is not normal: it is strongly right-skewed.
g4<-ggplot(data.frame(seq_along(1:length(sample)),sample))+aes(sample)
g4<-g4+geom_histogram(aes(fill=..count..))
g4<-g4+geom_vline(xintercept = mean_sample,col='gray',linetype="longdash",size=1)
g4<-g4+labs(x="Sample",y="Frequency",title="Sample Distribution")
g4<-g4+annotate('text',x=mean_sample,y=0,
label=round(mean_sample,2),angle=90,vjust=-0.5,size=3)
g4<-g4+annotate('text',x=mean_sample,y=max(sample),
label="mu[sample]",angle=90,vjust=-0.5,hjust=-2.5,size=3,parse = T)
g4
Here we can see that even though the original population is not normally distributed, the distribution of sample means is approximately normal, which is as per the Central Limit Theorem.
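The visual impression of normality can also be checked numerically. The following is a minimal sketch under stated assumptions, not part of the original analysis: it re-simulates the sample means with an arbitrary seed, standardizes them using the theoretical mean 1/lambda and standard error (1/lambda)/sqrt(n), and compares them with a standard normal. The names `sample_means` and `z` are introduced only for this illustration.

```r
lambda <- 0.2
n <- 40
nos <- 1000

set.seed(1000)  # arbitrary seed chosen for this sketch
sample_means <- sapply(1:nos, function(x) mean(rexp(n, lambda)))

# Standardize with the theoretical mean (1/lambda) and
# standard error ((1/lambda)/sqrt(n)) of the sample mean
z <- (sample_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# If the sample means are approximately normal, these are near 0 and 1
mean(z)
sd(z)

# Share of standardized means within 1.96 of zero (about 0.95 for a normal)
mean(abs(z) <= 1.96)

# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(z)
qqline(z)
```

With n = 40 some right skew from the exponential distribution may still be visible in the upper tail of the Q-Q plot, which is why the C.L.T. result is described as approximate.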