In this report we use R to investigate the exponential distribution and the distribution of sample means drawn from it, and compare our findings with the Central Limit Theorem (CLT).
We compare the mean and variance of the distribution of sample means with those of the population, and show that, as the CLT predicts, the distribution of sample means is approximately normal.
The exponential distribution can be simulated in R using the function rexp(n,lambda).
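For reference, an exponential distribution with rate \(\lambda\) has mean \(1/\lambda\) and variance \(1/\lambda^2\). The mean of \(n\) independent draws therefore has variance \(1/(\lambda^2 n)\). The short sketch here computes these theoretical targets for our parameters. The `theory_*` variable names are our own and are not used elsewhere in the report.

```r
lambda <- 0.2  # rate parameter used throughout the report
n <- 40        # number of exponentials averaged in each simulation

# Theoretical population moments of Exp(lambda)
theory_mean <- 1 / lambda   # 5
theory_var <- 1 / lambda^2  # 25

# Theoretical moments of the mean of n independent draws
theory_mean_of_means <- theory_mean    # 5
theory_var_of_means <- theory_var / n  # 0.625

c(theory_mean, theory_var, theory_mean_of_means, theory_var_of_means)
```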
We use the following parameters for the simulation:
| Parameter | Value |
|---|---|
| Lambda | 0.2 |
| Sample Size (n) | 40 |
| Number of Simulations | 1000 |
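library(ggplot2)
lambda <- 0.2  # rate parameter of the exponential distribution
n <- 40        # number of exponentials averaged in each simulation
nos <- 1000    # number of simulations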
Simulate a population of 1000 random exponentials with the parameters shown above:
set.seed(1000)  # fix the random seed for reproducibility
population <- rexp(1000, lambda)
head(population)
## [1] 5.0233101 2.5884649 12.1869221 10.8162445 2.3871486 0.8353505
Calculate the mean and variance of the population:
mean_population<-mean(population)
mean_population
## [1] 5.015616
var_population<-var(population)
var_population
## [1] 26.07005
The population mean and variance are 5.0156161 and 26.0700456, respectively. These are close to the theoretical values \(1/\lambda = 5\) and \(1/\lambda^2 = 25\).
Simulate the distribution of averages of 40 exponentials over 1000 simulations:
# Each simulation draws n = 40 exponentials and records their mean
sample <- sapply(1:nos, function(x) { mean(rexp(n, lambda)) })
head(sample)
## [1] 5.229161 5.786361 4.354502 5.352259 4.535410 4.332345
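The `sapply()` call draws 40 exponentials, averages them, and repeats this 1000 times. This sketch shows an equivalent vectorised formulation. It is an alternative we propose, not the report's method. All \(n \times nos\) values are drawn at once into a matrix with one simulation per row, and `rowMeans()` averages each row. Naming the result `sample_means` rather than `sample` also avoids confusion with base R's `sample()` function.

```r
lambda <- 0.2  # rate parameter
n <- 40        # exponentials per simulation
nos <- 1000    # number of simulations

set.seed(1000)
# One row per simulation: byrow = TRUE puts n consecutive draws in each row
draws <- matrix(rexp(n * nos, lambda), nrow = nos, ncol = n, byrow = TRUE)
sample_means <- rowMeans(draws)  # vector of nos sample means

length(sample_means)  # 1000
mean(sample_means)    # close to 1 / lambda = 5
```

Drawing everything in one call trades memory (a 1000 x 40 matrix) for avoiding 1000 separate R function calls. At this size either approach is fast.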
Calculate the mean and variance of the sample means:
mean_sample<-mean(sample)
mean_sample
## [1] 4.992426
var_sample<-var(sample)
var_sample
## [1] 0.6562298
The mean and variance of the sample means are 4.9924256 and 0.6562298, respectively.
The population mean is 5.0156161 and the mean of the sample means is 4.9924256.
Plotting the distribution of sample means with the population mean marked:
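# Histogram of the 1000 sample means, with a dashed line at the population mean
g1 <- ggplot(data.frame(sample = sample), aes(x = sample)) +
  geom_histogram(aes(fill = after_stat(count)), bins = 30)
g1 <- g1 + geom_vline(xintercept = mean_population, col = "gray", linetype = "longdash", linewidth = 1)
g1 <- g1 + labs(x = "Sample mean", y = "Frequency", title = "Distribution of Sample Means")
# annotate() draws each label once (geom_text() would repeat it for every data row)
g1 <- g1 + annotate("text", x = mean_population, y = 0, label = round(mean_population, 2), vjust = 1)
g1 <- g1 + annotate("text", x = mean_population, y = Inf, label = "population mean", angle = 90, vjust = -0.5, hjust = 1)
g1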
The distribution of sample means is centred on the population mean: the mean of the 1000 sample means (4.99) is very close to the population mean (5.02). This matches theory, since the expected value of the sample mean equals the population mean. The Central Limit Theorem adds that, for large n, the sample means are approximately normally distributed around it.
The population variance is 26.0700456, while the variance of the sample means is only 0.6562298.
For independent draws, the variance of the sample mean is related to the population variance by
\[\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\]
where \(\sigma_{\bar{x}}^2\) is the variance of the sample mean and \(\sigma^2\) is the population variance. Equivalently,
\[n = \frac{\sigma^2}{\sigma_{\bar{x}}^2}\]
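Here \(\sigma_{\bar{x}}^2\) denotes the variance of the sample mean. The rearranged form can be checked with our simulated values. The second line uses the theoretical population variance \(1/\lambda^2 = 25\), a standard property of the exponential distribution and not a value computed in the original analysis:

\[\frac{\sigma^2}{\sigma_{\bar{x}}^2} = \frac{26.0700456}{0.6562298} \approx 39.7 \approx n = 40\]
\[\sigma_{\bar{x}}^2 = \frac{1/\lambda^2}{n} = \frac{25}{40} = 0.625\]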
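To compare, we compute the variance of the sample mean predicted from the population variance:
calculated_var_sample <- var_population / n  # predicted variance of the sample mean
calculated_var_sample
## [1] 0.6517511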
The predicted value, 0.6517511, is close to the observed variance of the sample means, 0.6562298.
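The \(\sigma^2/n\) relationship can also be checked across several sample sizes. The sizes 10, 40 and 160 used here are our own illustrative choice and do not come from the report. This sketch simulates 1000 means for each size and compares their variance with the theoretical \(1/(\lambda^2 n)\):

```r
set.seed(1000)
lambda <- 0.2
nos <- 1000
sizes <- c(10, 40, 160)  # hypothetical sample sizes for illustration

# For each size, the variance of nos simulated sample means
observed_var <- sapply(sizes, function(size) {
  var(sapply(1:nos, function(x) mean(rexp(size, lambda))))
})
theoretical_var <- (1 / lambda^2) / sizes  # 2.5, 0.625, 0.15625

data.frame(n = sizes, observed = observed_var, theoretical = theoretical_var)
```

Quadrupling the sample size should cut the variance of the sample means to roughly a quarter.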
The spread of the sample means in the plot is much smaller than the spread of the population, as the relationship above predicts.
# Standard deviation of the sample means (square root of their variance)
sd_sample <- sqrt(var_sample)
g2 <- ggplot(data.frame(sample = sample), aes(x = sample)) +
  geom_histogram(aes(fill = after_stat(count)), bins = 30)
g2 <- g2 + labs(x = "Sample mean", y = "Frequency", title = "Distribution of Sample Means")
# Dashed lines one standard deviation either side of the mean, labelled with their positions
g2 <- g2 + geom_vline(xintercept = mean_sample + c(-1, 1) * sd_sample, col = "gray", linetype = "longdash", linewidth = 1)
g2 <- g2 + annotate("text", x = mean_sample + c(-1, 1) * sd_sample, y = 0, label = round(mean_sample + c(-1, 1) * sd_sample, 2), vjust = 1)
g2 <- g2 + annotate("text", x = mean_sample + c(-1, 1) * sd_sample, y = Inf, label = c("-sigma", "+sigma"), angle = 90, vjust = -0.5, hjust = 1)
g2
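If the distribution of sample means is approximately normal, about 68% of them should lie within one standard deviation of their mean. This self-contained sketch regenerates the simulated means following the same seed and draw order as the report and computes that share. The name `within_1sd` is our own.

```r
lambda <- 0.2
n <- 40
nos <- 1000

set.seed(1000)
population <- rexp(1000, lambda)  # drawn first, as in the report, to keep the same random stream
sample_means <- sapply(1:nos, function(x) mean(rexp(n, lambda)))

# Share of sample means lying within one standard deviation of their mean
within_1sd <- mean(abs(sample_means - mean(sample_means)) <= sd(sample_means))
within_1sd  # a normal distribution would give about 0.683
```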
Next, we plot the population distribution with ggplot2 for comparison with the distribution of sample means:
# Histogram of the 1000 simulated exponentials, with a dashed line at their mean
g3 <- ggplot(data.frame(population = population), aes(x = population)) +
  geom_histogram(aes(fill = after_stat(count)), bins = 30)
g3 <- g3 + geom_vline(xintercept = mean_population, col = "gray", linetype = "longdash", linewidth = 1)
g3 <- g3 + labs(x = "Population", y = "Frequency", title = "Population Distribution")
g3 <- g3 + annotate("text", x = mean_population, y = 0, label = round(mean_population, 2), vjust = 1)
g3 <- g3 + annotate("text", x = mean_population, y = Inf, label = "mean", angle = 90, vjust = -0.5, hjust = 1)
g3
The population distribution is strongly right-skewed, as expected for an exponential distribution, and is clearly not normal.
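Sample skewness puts a number on "not normal": a normal distribution has skewness 0. The exponential distribution has skewness 2, and averaging \(n\) independent draws divides the skewness by \(\sqrt{n}\). For the sample means, theory therefore predicts about \(2/\sqrt{40} \approx 0.32\). This is a sketch with a hand-rolled, hypothetical `skewness()` helper, written to avoid an extra package dependency:

```r
# Moment-based sample skewness: third central moment divided by sd^3
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(1000)
lambda <- 0.2
n <- 40
nos <- 1000
population <- rexp(1000, lambda)
sample_means <- sapply(1:nos, function(x) mean(rexp(n, lambda)))

skewness(population)    # theory: 2
skewness(sample_means)  # theory: 2 / sqrt(40), about 0.32
```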
# Histogram of the 1000 sample means, with a dashed line at their mean
g4 <- ggplot(data.frame(sample = sample), aes(x = sample)) +
  geom_histogram(aes(fill = after_stat(count)), bins = 30)
g4 <- g4 + geom_vline(xintercept = mean_sample, col = "gray", linetype = "longdash", linewidth = 1)
g4 <- g4 + labs(x = "Sample mean", y = "Frequency", title = "Distribution of Sample Means")
g4 <- g4 + annotate("text", x = mean_sample, y = 0, label = round(mean_sample, 2), vjust = 1)
g4 <- g4 + annotate("text", x = mean_sample, y = Inf, label = "mean", angle = 90, vjust = -0.5, hjust = 1)
g4
Even though the original population is not normal, the distribution of sample means is approximately normal, as the Central Limit Theorem predicts.
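A more direct check of the normal approximation overlays the normal density the CLT predicts on a density-scaled histogram of the sample means. That density has mean \(1/\lambda\) and standard deviation \((1/\lambda)/\sqrt{n}\). This sketch assumes ggplot2 3.4 or later (for `after_stat()` and `linewidth`). The object names `clt_mean`, `clt_sd` and `g5` are our own.

```r
library(ggplot2)

set.seed(1000)
lambda <- 0.2
n <- 40
nos <- 1000
sample_means <- sapply(1:nos, function(x) mean(rexp(n, lambda)))

clt_mean <- 1 / lambda            # 5
clt_sd <- (1 / lambda) / sqrt(n)  # about 0.79

# Histogram on the density scale so it is comparable with the normal curve
g5 <- ggplot(data.frame(sample_means = sample_means), aes(x = sample_means)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "lightblue", col = "white") +
  stat_function(fun = dnorm, args = list(mean = clt_mean, sd = clt_sd), linewidth = 1) +
  labs(x = "Sample mean", y = "Density", title = "Sample Means vs. CLT Normal Approximation")
g5

# Normal Q-Q plot: points close to the line indicate approximate normality
qqnorm(sample_means)
qqline(sample_means)
```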