Overview

In this report we investigate the exponential distribution and the distribution of sample means of exponential variates in R, and compare our findings with the Central Limit Theorem (C.L.T.).

We compare the mean and variance of the distribution of sample means with those of the population, and show that, as the C.L.T. predicts, the distribution of sample means is approximately normal.

Simulation

Simulation Of Population

The exponential distribution can be simulated in R using the function rexp(n, rate), where n is the number of values to draw and rate is the rate parameter lambda.

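We will be using the following parameters for our simulation exercise:

Parameter            Value
Lambda               0.2
Sample Size (n)      40
No. of Simulations   1000

library(ggplot2)  # plotting
lambda <- 0.2     # rate parameter of the exponential distribution
n <- 40           # size of each sample
nos <- 1000       # number of simulations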

Population

Simulate a population of 1000 random exponential variates with the rate lambda shown above.

set.seed(1000)
population<-rexp(1000,lambda)
head(population)
## [1]  5.0233101  2.5884649 12.1869221 10.8162445  2.3871486  0.8353505

Population Statistics

Calculation of Mean and Variance of Population

mean_population<-mean(population)
mean_population
## [1] 5.015616
var_population<-var(population)
var_population
## [1] 26.07005

The mean and variance of the simulated population are 5.0156161 and 26.0700456, respectively.
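For reference, an exponential distribution with rate lambda has theoretical mean 1/lambda and theoretical variance 1/lambda^2. The following is a minimal sketch that compares these values with a simulated draw. The names theoretical_mean, theoretical_var and draws are introduced here for illustration only and are not part of the analysis above.

```r
# Theoretical moments of an exponential distribution with rate lambda
lambda <- 0.2
theoretical_mean <- 1 / lambda    # 1/0.2  = 5
theoretical_var  <- 1 / lambda^2  # 1/0.04 = 25

# Illustrative draw, using the same seed and size as the population above
set.seed(1000)
draws <- rexp(1000, lambda)

# Differences between the simulated and the theoretical values
c(mean_diff = mean(draws) - theoretical_mean,
  var_diff  = var(draws) - theoretical_var)
```

Both differences should be small relative to the theoretical values, and they shrink as the number of draws grows.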

Simulation Of Mean Distribution

Simulate the distribution of averages of exponentials: draw 1000 samples of size 40 and take the mean of each.

Sample Population

sample<-sapply(1:nos,function(x){mean(rexp(n,lambda))})
head(sample)
## [1] 5.229161 5.786361 4.354502 5.352259 4.535410 4.332345
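As an alternative to the sapply call, the same kind of simulation can be sketched by drawing all values at once and averaging row by row. This is an illustration rather than the method used in this report, and the names draws and sample_means are assumptions introduced here.

```r
lambda <- 0.2  # rate parameter
n <- 40        # size of each sample
nos <- 1000    # number of simulations

set.seed(1000)
# One row per simulation, one column per observation within that sample
draws <- matrix(rexp(n * nos, lambda), nrow = nos, ncol = n, byrow = TRUE)

# Mean of each row, i.e. one sample mean per simulation
sample_means <- rowMeans(draws)
length(sample_means)
```

The matrix form avoids an explicit loop over simulations and keeps the raw draws available for further inspection.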

Sample Statistics

Calculation of Mean and Variance of Sample Means

mean_sample<-mean(sample)
mean_sample
## [1] 4.992426
var_sample<-var(sample)
var_sample
## [1] 0.6562298

The mean and variance of the sample means are 4.9924256 and 0.6562298, respectively.

Sample Mean versus Theoretical Mean

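The population mean is 5.0156161 and the mean of the sample means is 4.9924256. Both are close to the theoretical mean of the exponential distribution, 1/lambda = 5.

Plotting the distribution of sample means with the population mean marked: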

g1<-ggplot(data.frame(seq_along(1:length(sample)),sample))+aes(sample)+geom_histogram(aes(fill=..count..))
# Mark the population mean, matching the labels placed at mean_population below
g1<-g1+geom_vline(xintercept = mean_population,col='gray',linetype="longdash",size=1)
g1<-g1+labs(x="Sample",y="Frequency",title="Sample Distribution")
g1<-g1+geom_text(x=mean_population,y=0,label=round(mean_population,2),vjust=1)
g1<-g1+geom_text(x=mean_population,y=max(sample),label="population mean",angle=90,vjust=2,hjust=-0.5)
g1

Here we can see that the distribution of sample means is centred on the population mean. This is what we expect: the sample mean is an unbiased estimator of the population mean, and the Central Limit Theorem tells us that its distribution is approximately normal around that mean, concentrating more tightly as n grows.

Sample Variance versus Theoretical Variance

The variance of the population is 26.0700456 and the variance of the sample means is 0.6562298.

As per the C.L.T., the relationship between the variance of the distribution of sample means and that of the population is

\[\sigma_x^2=\sigma^2/n\]

where \(\sigma_x^2\) is the variance of the sample mean and \(\sigma^2\) is the variance of the population. Hence,

\[n=\sigma^2/\sigma_x^2\]

Dividing our population variance by n gives the variance we expect for the sample means:

calculated_var_sample<-var_population/n

We can see that this expected value, 0.6517511, is approximately equal to the observed variance of the sample means, 0.6562298.
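The same comparison can be run in the other direction: dividing the population variance by the variance of the sample means should recover a value close to n = 40. The sketch below uses the figures reported in this document as constants. The names predicted_var and implied_n are introduced here for illustration.

```r
# Values reported earlier in this report
var_population <- 26.0700456
var_sample     <- 0.6562298
n              <- 40

# Variance of the sample means predicted from the population variance
predicted_var <- var_population / n

# Sample size implied by the two observed variances
implied_n <- var_population / var_sample

round(c(predicted_var = predicted_var, implied_n = implied_n), 3)
```

An implied sample size close to 40 is another way of seeing that the observed variances agree with the relationship above.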

The figure below also shows that the spread of the sample means, marked by the lines labelled +sigma and -sigma, is much smaller than the spread of the population, as explained by the C.L.T.

g2<-ggplot(data.frame(seq_along(1:length(sample)),sample))+aes(sample)+geom_histogram(aes(fill=..count..))
g2<-g2+labs(x="Sample",y="Frequency",title="Sample Distribution")
# One standard deviation of the sample means (sigma is the square root of the variance)
sd_sample<-sqrt(var_sample)
g2<-g2+geom_vline(xintercept = mean_sample+sd_sample,col='gray',linetype="longdash",size=1)
g2<-g2+geom_vline(xintercept = mean_sample-sd_sample,col='gray',linetype="longdash",size=1)
g2<-g2+geom_text(x=mean_sample+sd_sample,y=0,label=round(mean_sample+sd_sample,2),vjust=1)
g2<-g2+geom_text(x=mean_sample-sd_sample,y=0,label=round(mean_sample-sd_sample,2),vjust=1)
g2<-g2+geom_text(x=mean_sample+sd_sample,y=max(sample),label="+sigma",angle=90,vjust=2,hjust=-0.5)
g2<-g2+geom_text(x=mean_sample-sd_sample,y=max(sample),label="-sigma",angle=90,vjust=2,hjust=-0.5)
g2

Distribution

Here we plot the population distribution and the distribution of averages of samples drawn from it.

Population Distribution

Plotting the population distribution with ggplot2:

g3<-ggplot(data.frame(seq_along(1:length(population)),population))+aes(population)+geom_histogram(aes(fill=..count..))
g3<-g3+geom_vline(xintercept = mean_population,col='gray',linetype="longdash",size=1)
g3<-g3+labs(x="Population",y="Frequency",title="Population Distribution")
g3<-g3+geom_text(x=mean_population,y=0,label=round(mean_population,2),vjust=1)
g3<-g3+geom_text(x=mean_population,y=max(population),label="mean",angle=90,vjust=1)
g3

We can see that the population distribution is not normal: it is strongly right-skewed, as expected for an exponential distribution.

Distribution of Sample Mean

g4<-ggplot(data.frame(seq_along(1:length(sample)),sample))+aes(sample)+geom_histogram(aes(fill=..count..))
g4<-g4+geom_vline(xintercept = mean_sample,col='gray',linetype="longdash",size=1)
g4<-g4+labs(x="Sample",y="Frequency",title="Sample Distribution")
g4<-g4+geom_text(x=mean_sample,y=0,label=round(mean_sample,2),vjust=1)
g4<-g4+geom_text(x=mean_sample,y=max(sample),label="mean",angle=90,vjust=1)
g4

Here we can see that even though the distribution of the original population is not normal, the distribution of the sample means is approximately normal, as the Central Limit Theorem predicts.
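The visual impression of normality can also be checked numerically. The sketch below is one possible check, not part of the original analysis: it standardises a fresh set of simulated sample means using the theoretical mean 1/lambda and standard error (1/lambda)/sqrt(n), then compares them with the standard normal distribution. The names sample_means, z, coverage and probs are assumptions introduced here.

```r
lambda <- 0.2  # rate parameter
n <- 40        # size of each sample
nos <- 1000    # number of simulations

set.seed(1000)
sample_means <- sapply(1:nos, function(i) mean(rexp(n, lambda)))

# Standardise using the theoretical mean and standard error of the sample mean
z <- (sample_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Share of standardised means within +/- 1.96; about 0.95 for a normal distribution
coverage <- mean(abs(z) <= 1.96)
coverage

# Empirical quantiles next to the corresponding standard normal quantiles
probs <- c(0.025, 0.5, 0.975)
rbind(empirical = quantile(z, probs), normal = qnorm(probs))
```

For a visual version of the same check, qqnorm(z) followed by qqline(z) draws a normal Q-Q plot. Points lying close to the line suggest approximate normality.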