======================================

Title: “Statistical Inference - Course Project (Part 1)”

Date: “December 23, 2015”

======================================

Synopsis

This is the project for the statistical inference class (Part 1). In it, you will use simulation to explore inference and do some simple inferential data analysis. The project consists of two parts:

  1. A simulation exercise.
  2. Basic inferential data analysis.

Problem Description

The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
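
As a quick sanity check of the two facts above (mean and standard deviation both equal to 1/lambda), the following sketch draws a large exponential sample and compares its summary statistics with 1/lambda. The sample size of 100000, the seed, and the variable names are illustrative assumptions, not part of the assignment.

```r
# Assumption: seed and sample size are arbitrary, chosen only for illustration
set.seed(42)
lambda <- 0.2
draws <- rexp(100000, rate = lambda)

# Both the sample mean and the sample sd should be close to 1/lambda = 5
check <- c(theoretical = 1 / lambda,
           sample_mean = mean(draws),
           sample_sd   = sd(draws))
print(check)
```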

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials.

Rate (Lambda) = 0.2; Number of Observations = 40; Number of simulations = 1000;

Lambda = 0.2
n = 40
nsims = 1:1000

For each of the 1000 simulations, calculate the average across 40 observations.

set.seed(1)

avg <- data.frame(x = sapply(nsims, function(x){mean(rexp(n, Lambda))} ))
head(avg)
##          x
## 1 4.860372
## 2 5.961285
## 3 4.279204
## 4 4.702298
## 5 5.196446
## 6 4.397114
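
The same simulation can also be written without an explicit loop index. The sketch below is one possible alternative, not the method used in this report: it draws all values at once, arranges them in a 1000 x 40 matrix, and takes row means. The names lambda, n_sims, draws and avg_alt are hypothetical, and no claim is made that the individual values match the output above.

```r
# Alternative sketch: one row per simulation, one column per observation
set.seed(1)
lambda <- 0.2
n <- 40
n_sims <- 1000

draws <- matrix(rexp(n_sims * n, rate = lambda),
                nrow = n_sims, ncol = n, byrow = TRUE)

# Each row mean is the average of 40 exponentials
avg_alt <- data.frame(x = rowMeans(draws))
head(avg_alt)
```

The matrix form makes the structure of the experiment explicit (simulations in rows, observations in columns), at the cost of holding all 40000 draws in memory at once.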

Question 1: Show the sample mean and compare it to the theoretical mean of the distribution.

Theoretical_Mean <- 1/Lambda
print (paste("Theoretical mean of the distribution = ", Theoretical_Mean))
## [1] "Theoretical mean of the distribution =  5"
Sample_Mean <- mean(avg$x)
print (paste("Sample mean of the distribution = ", Sample_Mean))
## [1] "Sample mean of the distribution =  4.99002520077716"
print(paste("Difference between Theoretical and sample mean =", abs(Theoretical_Mean-Sample_Mean)))
## [1] "Difference between Theoretical and sample mean = 0.00997479922283873"
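
To judge whether a difference of about 0.01 is small, it can be compared with the standard error of the mean of 1000 simulated averages. The sketch below assumes the simulated averages are independent and each has variance (1/lambda)^2/n; the variable names are illustrative, and the observed difference is copied from the output above.

```r
lambda <- 0.2
n <- 40
n_sims <- 1000

# Standard error of the mean of n_sims averages, each with variance (1/lambda)^2 / n
se_of_sample_mean <- sqrt((1 / lambda)^2 / n / n_sims)   # 0.025

# Observed difference, copied from the printed output above
observed_diff <- 0.00997479922283873

# Difference expressed in standard errors (about 0.4)
ratio <- observed_diff / se_of_sample_mean
print(ratio)
```

A difference of roughly 0.4 standard errors is well within what random variation alone would produce.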

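Question 2: Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

Here the relevant theoretical quantity is the variance of the mean of 40 exponentials, (1/lambda)^2/n, not the variance of a single exponential, (1/lambda)^2.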

Theoretical_Var <- (1/Lambda)^2/n
print (paste("Theoretical variance of the distribution = ", Theoretical_Var))
## [1] "Theoretical variance of the distribution =  0.625"
Sample_Var <- var(avg$x)
print (paste("Sample variance of the distribution = ", Sample_Var))
## [1] "Sample variance of the distribution =  0.611116466559575"
print(paste("Difference between Theoretical and sample variance =", abs(Theoretical_Var-Sample_Var)))
## [1] "Difference between Theoretical and sample variance = 0.0138835334404246"
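
The theoretical value 0.625 used above follows from a short derivation. The notation is an assumption introduced here for illustration: X_1, ..., X_n are independent exponential draws with rate lambda, and \bar{X} is their average.

```latex
\[
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^{2}}{n}
  = \frac{(1/\lambda)^{2}}{n}
  = \frac{25}{40}
  = 0.625
\]
```

The second equality relies on the independence of the draws; with lambda = 0.2 and n = 40 the result matches Theoretical_Var.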

Question 3: Show that the distribution is approximately normal.

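library(ggplot2)
# avg is already a data frame; map its column x to the x-axis
g <- ggplot(avg, aes(x = x))
# Density-scaled histogram (ggplot2 >= 3.3.0 also accepts after_stat(density))
g <- g + geom_histogram(aes(y = ..density..), colour = "black", fill = "green")
g <- g + labs(x = "Simulated Mean of the Distribution", y = "Density")
# Overlay a normal density with the sample mean and sd; the argument name is args
g + stat_function(fun = dnorm, args = list(mean = mean(avg$x), sd = sd(avg$x)),
                  colour = "blue", size = 2)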

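The above plot overlays a normal density curve (blue), with the same mean and standard deviation as the simulated averages, on the histogram of the 1000 simulated means. The plot shows that the distribution is approximately normal, as the central limit theorem predicts for averages of 40 observations.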
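
A complementary check, offered here only as a sketch and not as part of the original analysis, is a normal Q-Q plot of the standardized means together with the share of them falling within 1.96 standard deviations of the theoretical mean. For a normal distribution that share would be about 95%. The names means, z and coverage are hypothetical, and the simulation is rerun so the block is self-contained.

```r
set.seed(1)
lambda <- 0.2
n <- 40
n_sims <- 1000

# Rerun the simulation: 1000 averages of 40 exponentials
means <- replicate(n_sims, mean(rexp(n, rate = lambda)))

# Standardize with the theoretical mean and standard error
z <- (means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Points close to the reference line suggest approximate normality
qqnorm(z)
qqline(z, col = "blue")

# Share of standardized means inside (-1.96, 1.96)
coverage <- mean(abs(z) < 1.96)
print(coverage)
```

Some departure from the line in the tails would be expected, since the average of 40 exponentials is still slightly right-skewed.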