Overview:

In this project we will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution will be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. We will set lambda = 0.2 for all of the simulations. We will investigate the distribution of averages of 40 exponentials by doing thousand simulations.

In this simulation we will show: - Show the sample mean and compare it to the theoretical mean of the distribution. - Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution. - Show that the distribution is approximately normal.

Simulation:

First we will collect necessary data.

#load the ggplot library
library(ggplot2)

#variables that control the simulation
no.sim <- 1000
lambda <- 0.2
n <- 40

set.seed(1234)

#Create a matrix of 1000 rows with the columns corresponding to random simulation 40 times
sim.matrix <- matrix(rexp(no.sim * n, rate=lambda), no.sim, n)
sim.mean <- rowMeans(sim.matrix)

We will now plot the data and explore.

hist(sim.mean, col = "deepskyblue1")

1. Sample Mean versus Theoretical Mean:

We will now calculate the sample data mean and theoritcal mean.

mean.data <- mean(sim.mean)
theoritical.mean <- 1/lambda

print(mean.data)

## [1] 4.974239

print(theoritical.mean)

## [1] 5

We can see from the outputs of the code that the actual mean of the distribution is 4.974239 and the theortical mean for lambda = 0.2 is 5.

2. Sample Variance versus Theoretical Variance:

We will now calculate the actual variance and theoretical variance.

actual.var <- var(sim.mean)
theoritical.var <- (1/lambda)^2/n

print(actual.var)

## [1] 0.5949702

print(theoritical.var)

## [1] 0.625

We can see from the outputs of the code that the actual variance of the distribution is 0.5949702 and the theortical variance is 0.625.

3. Distribution:

3.1: Using a histogram we will show that the distribution is normal.

plot <- data.frame(sim.mean);
m <- ggplot(plot, aes(x =sim.mean))
m <- m + geom_histogram(aes(y=..density..), colour="black",fill = "deepskyblue1")
m + geom_density(colour="gold", size=1);

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

3.2: Now let us try to see the confidence intervals to see if the variance and mean of the data is resembles normal distribution.

actual.confidence.interval <- round (mean(sim.mean) + c(-1,1)*1.96*sd(sim.mean)/sqrt(n),3)
theoritical.confidence.interval <- theoritical.mean + c(-1,1)*1.96*sqrt(theoritical.var)/sqrt(n)

print(actual.confidence.interval)

## [1] 4.735 5.213

print(theoritical.confidence.interval)

## [1] 4.755 5.245

We can see that actual confidence interval is: 4.735, 5.213 and the thoeritical coinfidence interval is 4.755, 5.245

3.3: Now let us try to plot the acutal quantiles to the theoritcial quantiles.

qqnorm(sim.mean)
qqline(sim.mean)

We can see from the plot that the actual quantiles closely match the theorical quantiles.

Using these 3 evidences (means, variances and quantiles) we can confidently say that the distribution is failry normal.

Statistical Inference Course Project Part 1: Simulation Exercise

Karthik Chawala

September 10, 2017

Overview:

Simulation:

1. Sample Mean versus Theoretical Mean:

2. Sample Variance versus Theoretical Variance:

3. Distribution: