Overview

In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
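
As a quick check of the facts above, the following R sketch draws one large sample with rexp and compares its mean and standard deviation with 1/lambda. The sample size of 100000 and the seed are arbitrary choices for illustration and are not part of the assignment.

```r
## Illustrative check; the seed and the sample size are assumptions
set.seed(1)
lambda <- 0.2
x <- rexp(100000, lambda)   ## 100000 draws with rate lambda

## sample mean and sample sd next to the theoretical value 1/lambda
c(mean = mean(x), sd = sd(x), theory = 1 / lambda)
```

Both sample statistics should land close to 5, since the mean and the standard deviation of the exponential distribution are both 1/lambda.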

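Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. You should cover the three comparisons addressed in the sections below: sample mean versus theoretical mean, sample variance versus theoretical variance, and the distribution of sample means versus the normal distribution.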

Simulations

Sample Mean versus Theoretical Mean

Sample mean: the sample mean (or empirical mean) is a statistic computed from a collection of data (the sample) on a random variable. Here each sample mean is the average of the 40 exponential observations drawn in one simulation.

As the assignment requires, we run a series of 1000 simulations. Each simulation draws 40 observations from the exponential distribution with rate 0.2, that is “rexp(40, 0.2)”, and records their mean. The known values and working variables are:

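sims <- 1000                     ## number of simulations
n <- 40                          ## number of observations per simulation
lambda <- 0.2                    ## rate parameter of the exponential distribution
means <- vector("numeric")       ## mean of each simulation
means_sum <- vector("numeric")   ## running sum of the simulation means
means_cum <- vector("numeric")   ## running (cumulative) mean of the simulation means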

Now we calculate the mean of each simulation, followed by the running sum and the running (cumulative) mean of those means.

for (i in 1:sims) { means[i] <- mean(rexp(n, lambda))}
means_sum[1] <- means[1]
for (i in 2:sims) { means_sum[i] <- means_sum[i-1] + means[i] }
for (i in 1:sims) { means_cum[i] <- means_sum[i]/i }
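
The same quantities can also be computed without explicit loops. The sketch below is an equivalent formulation, not the code used for the results in this report. The `_v` names are hypothetical and the seed is an arbitrary assumption, so its numbers will differ from the output shown here.

```r
## Loop-free sketch; set.seed(123) is an arbitrary choice for reproducibility
set.seed(123)
sims <- 1000; n <- 40; lambda <- 0.2

## one mean of n exponential draws per simulation
means_v <- replicate(sims, mean(rexp(n, lambda)))

## running sum, then running mean after 1, 2, ..., sims simulations
means_sum_v <- cumsum(means_v)
means_cum_v <- means_sum_v / seq_len(sims)

means_cum_v[sims]   ## equals mean(means_v)
```

The last element of the running mean is the overall mean of the simulated averages, which is the value compared with 1/lambda.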

With the means calculated, we can compare the two values.

The sample mean (the average of all 1000 simulated means)

means_cum[sims]
## [1] 5.033982

The theoretical mean

1/lambda
## [1] 5

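Now we plot the cumulative sample mean against the number of simulations using ggplot2, with the theoretical mean drawn as a blue horizontal line.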

library(ggplot2)
g <- ggplot(data.frame(x = 1:sims, y = means_cum), aes(x = x, y = y))
## cumulative mean of the simulated averages
g <- g + geom_line(size = 2)
## theoretical mean 1/lambda as a horizontal reference line
g <- g + geom_hline(yintercept = 1 / lambda, color = "blue", size = 1)
g <- g + scale_y_continuous(breaks=c(4.50, 4.75, 5.00, 5.25, 5.50, 5.75), limits=c(4.25, 6))
g <- g + labs(title="Sample Mean vs Theoretical Mean")
g <- g + labs(x = "Simulations", y = "Cumulative Sample Mean")
print(g)

Sample Variance versus Theoretical Variance

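We now compare the variability of the 1000 sample means with the theoretical variance of the population. The variance of the mean of n independent observations is the population variance divided by n, so multiplying the variance of the sample means by n gives an estimate of the population variance, (1/lambda)^2.

The variance of the sample means, scaled by n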

var(means)*n
## [1] 25.62674

The theoretical variance of the population

(1/lambda)^2
## [1] 25

The scaled variance of the sample means is 25.62674 and the theoretical variance is 25, so the two values are close.
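
The comparison can also be made on the scale of the sample mean itself, where theory predicts a variance of (1/lambda)^2 / n = 0.625 and a standard deviation of (1/lambda) / sqrt(n), roughly 0.79. A minimal sketch on a freshly simulated vector follows. The name means_chk and the seed are assumptions for illustration, so the sample value will not match the output above.

```r
## Sketch only; the seed and the name means_chk are assumptions
set.seed(2024)
sims <- 1000; n <- 40; lambda <- 0.2
means_chk <- replicate(sims, mean(rexp(n, lambda)))

var_sample <- var(means_chk)        ## variance of the simulated means
var_theory <- (1 / lambda)^2 / n    ## population variance divided by n
c(sample = var_sample, theory = var_theory)
```

Multiplying both numbers by n recovers the comparison with (1/lambda)^2 = 25 used above.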

Distribution of Sample Means vs Normal Distribution

Finally, we plot a density histogram of the 1000 sample means and overlay a normal density curve for comparison.

library(ggplot2)
g <- ggplot(data.frame(x = means), aes(x = x))
g <- g + geom_histogram(position="identity", fill="yellow", color="black", alpha=0.2,binwidth=0.5, aes(y= ..density..))
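## normal density with the theoretical mean 1/lambda and standard deviation (1/lambda)/sqrt(n)
g <- g + stat_function(fun = dnorm, colour = "red", args=list(mean = 1/lambda, sd = (1/lambda)/sqrt(n)))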
g <- g + scale_x_continuous(breaks=c(1, 2, 3, 4, 5, 6, 7, 8, 9), limits=c(1, 9))
g <- g + scale_y_continuous(breaks=c()) 
g <- g + theme(plot.title = element_text(size=12, face="bold", vjust=2, hjust=0.5))
g <- g + labs(title="Distribution of Sample Means vs Normal Distribution")
g <- g + labs(x = "Sample Mean", y = "Density")
print(g)
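
A numerical complement to the plot is to standardize the sample means and check what share falls within 1.96 standard deviations of the theoretical mean, which would be about 95% for a normal distribution. This check is an addition for illustration and is not part of the original analysis. The names means_chk, mu, se and z and the seed are assumptions.

```r
## Sketch only; the seed and all variable names are assumptions
set.seed(42)
sims <- 1000; n <- 40; lambda <- 0.2
means_chk <- replicate(sims, mean(rexp(n, lambda)))

mu <- 1 / lambda                 ## theoretical mean
se <- (1 / lambda) / sqrt(n)     ## theoretical standard deviation of the mean
z <- (means_chk - mu) / se       ## standardized sample means

## share of standardized means within +/- 1.96
share_within <- mean(abs(z) < 1.96)
share_within
```

Calling qqnorm(z) followed by qqline(z) gives a visual version of the same check.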