Statistical inference project

Introduction

In this project I will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda. I will set lambda = 0.2 for all of the simulations and investigate the distribution of averages of 40 exponentials, using 1,000 simulations.
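As a quick reference for the comparisons that follow, the theoretical quantities implied by the Central Limit Theorem can be computed directly. This is a minimal sketch; the names mu, sigma and se_mean are illustrative and do not appear in the original code.

```r
lambda <- 0.2   # rate parameter used throughout the project
n <- 40         # number of exponentials averaged in each simulation

mu <- 1 / lambda              # theoretical mean of the exponential distribution
sigma <- 1 / lambda           # theoretical standard deviation of the exponential distribution
se_mean <- sigma / sqrt(n)    # standard deviation of the mean of n exponentials

c(mean = mu, sd_of_mean = se_mean, var_of_mean = se_mean^2)
```

The averages of 40 exponentials should therefore be centered near 5 with a standard deviation of roughly 0.79.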

Simulations

sims <- 1000   ## number of simulations
n <- 40        ## number of exponentials averaged in each simulation
lambda <- 0.2  ## rate parameter
## mean of n exponentials, one value per simulation
means <- replicate(sims, mean(rexp(n, lambda)))
## running total of the means, and the cumulative mean after each simulation
means_sum <- cumsum(means)
means_cum <- means_sum / seq_along(means)

Sample Mean versus Theoretical Mean

means_cum[sims]  #Sample mean

## [1] 5.019162

1/lambda  #Theoretical mean

## [1] 5

The distribution of sample means is centered close to the theoretical mean: 5.019 versus 5.
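To judge whether a gap of this size is plausible, it can be compared with the Monte Carlo standard error of the overall mean. The sketch below uses the sample mean reported above; the names se_mc and z are illustrative.

```r
lambda <- 0.2
n <- 40
sims <- 1000

observed_mean <- 5.019162     # sample mean reported above
theoretical_mean <- 1 / lambda

# The overall mean averages n * sims exponentials, so its standard error
# is (1/lambda) / sqrt(n * sims).
se_mc <- (1 / lambda) / sqrt(n * sims)

# Distance between the observed and theoretical mean, in standard errors
z <- (observed_mean - theoretical_mean) / se_mc
c(se = se_mc, z = z)
```

The observed mean lies less than one standard error from the theoretical mean, which is consistent with ordinary sampling variation.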

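Sample Variance versus Theoretical Variance

var(means)*n  #Variance of the sample means, scaled by n to estimate the population variance

## [1] 26.05707

(1/lambda)^2  #Theoretical variance of the exponential distribution

## [1] 25

The variance of the sample means, scaled by n, is 26.06, which is close to the theoretical variance of the exponential distribution, (1/lambda)^2 = 25.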
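The comparison can also be made on the scale of the sample means themselves, where the theoretical variance is (1/lambda)^2 / n = 0.625. The following is a self-contained sketch; the seed and the name sim_means are assumptions made for reproducibility and are not part of the original run, so the simulated value will differ slightly from the output above.

```r
set.seed(1)  # assumption: arbitrary seed, not used in the original run
lambda <- 0.2
n <- 40
sims <- 1000

# One mean of n exponentials per simulation
sim_means <- replicate(sims, mean(rexp(n, lambda)))

var_means_sample <- var(sim_means)        # variance of the simulated means
var_means_theory <- (1 / lambda)^2 / n    # theoretical variance of the mean

c(sample = var_means_sample, theoretical = var_means_theory)
```

Dividing the reported 26.05707 by n = 40 gives about 0.651 for the original run, against the theoretical 0.625.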

Visualize Sample Mean vs Theoretical Mean

library(ggplot2)

g <- ggplot(data.frame(x = 1:sims, y = means_cum), aes(x = x, y = y))
## cumulative sample mean after each simulation
g <- g + geom_line(size = 2)
## theoretical mean as a horizontal reference line
g <- g + geom_hline(yintercept = 1 / lambda, color = "red", size = 1)
g <- g + scale_y_continuous(breaks=c(4.50, 4.75, 5.00, 5.25, 5.50, 5.75), limits=c(4.25, 6))
g <- g + labs(title="Sample Mean vs Theoretical Mean")
g <- g + labs(x = "Simulations", y = "Sample Mean")
print(g)

The cumulative sample mean settles close to the theoretical mean of 5 as the number of simulations grows.

Normal Distribution

library(ggplot2)
g <- ggplot(data.frame(x = means), aes(x = x))
g <- g + geom_histogram(position="identity", fill="green", color="red", alpha=0.2, binwidth=0.5, aes(y= ..density..))
## normal density with the theoretical mean and standard deviation of the sample mean
g <- g + stat_function(fun = dnorm, colour = "red", args=list(mean = 1/lambda, sd = (1/lambda)/sqrt(n)))
g <- g + scale_x_continuous(breaks=c(1, 2, 3, 4, 5, 6, 7, 8, 9), limits=c(1, 9))
g <- g + scale_y_continuous(breaks=c())
g <- g + theme(plot.title = element_text(size=12, face="bold", vjust=2, hjust=0.5))
g <- g + labs(title="Distribution of Sample Means vs Normal Distribution")
g <- g + labs(x = "Sample Mean", y = "Density")
print(g)

## Warning: Removed 2 rows containing missing values (geom_bar).

The figure above overlays the density estimated from the histogram of sample means with the normal density implied by the Central Limit Theorem, which has mean 1/lambda = 5 and standard deviation (1/lambda)/sqrt(n), about 0.79.

Additionally, the Q-Q plot below supports normality: the sample quantiles closely follow the theoretical normal quantiles. Therefore, we conclude that the distribution of the sample means is approximately normal.

qqnorm(means, 
       main = "Q-Q plot of the Sample Means")
qqline(means, 
       col = "green")
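The Q-Q comparison can also be summarized numerically by standardizing the means with the theoretical mean and standard deviation and measuring how closely the quantile pairs follow a straight line. This is a sketch under stated assumptions: the seed and the names sim_means, z and qq_cor are illustrative, and the correlation is only an informal summary, not a formal normality test.

```r
set.seed(1)  # assumption: arbitrary seed, not used in the original run
lambda <- 0.2
n <- 40
sims <- 1000

sim_means <- replicate(sims, mean(rexp(n, lambda)))

# Standardize with the theoretical mean and standard deviation of the sample mean
z <- (sim_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# qqnorm() with plot.it = FALSE returns the theoretical (x) and sample (y) quantiles
qq <- qqnorm(z, plot.it = FALSE)

# Correlation between theoretical and sample quantiles; values near 1 mean
# the points lie close to a straight line
qq_cor <- cor(qq$x, qq$y)
qq_cor
```

If the means are approximately normal, the standardized values should have mean near 0 and the correlation should be close to 1.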