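This is the project for the Statistical Inference class (Part 1). In it, you will use simulation to explore inference and do some simple inferential data analysis. The project consists of two parts: a simulation exercise and a basic inferential data analysis. This report covers Part 1, the simulation exercise.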
The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials, which requires a thousand simulations.
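As a quick sanity check of the stated mean and standard deviation, the minimal sketch below draws one large sample with `rexp()` and compares both statistics with 1/lambda. The sample size (100,000) and the seed are arbitrary choices for illustration, not part of the assignment.

```r
# Draw a large sample from an exponential distribution with rate Lambda
# and compare its mean and standard deviation with the theoretical 1/Lambda.
Lambda <- 0.2
set.seed(42)                  # arbitrary seed, for reproducibility only
draws <- rexp(1e5, rate = Lambda)

theoretical <- 1 / Lambda     # both the mean and the SD equal 1/lambda
c(theoretical = theoretical,
  sample_mean = mean(draws),
  sample_sd   = sd(draws))
```

Both sample statistics should land close to 5.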
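# Simulation parameters
Lambda <- 0.2   # rate parameter of the exponential distribution
n <- 40         # number of exponentials averaged in each simulation
nsims <- 1000   # number of simulations
set.seed(1)     # for reproducibility
# Each simulation draws n exponentials and stores their mean
avg <- data.frame(x = replicate(nsims, mean(rexp(n, Lambda))))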
head(avg)
## x
## 1 4.860372
## 2 5.961285
## 3 4.279204
## 4 4.702298
## 5 5.196446
## 6 4.397114
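The same 1,000 simulations can also be run without a per-simulation function call. The sketch below is one vectorized alternative, assuming the same `Lambda` and `n`: it draws all 40 × 1,000 exponentials at once, places one simulation per matrix column, and takes column means. The names `n_sims`, `draws`, and `avg_vec` are illustrative. It should give statistically equivalent results, though it is not presented as a guaranteed reproduction of the exact values printed above.

```r
# Vectorized alternative: one matrix column per simulation.
Lambda <- 0.2
n <- 40
n_sims <- 1000                       # illustrative name for the number of simulations
set.seed(1)
draws <- matrix(rexp(n * n_sims, Lambda), nrow = n, ncol = n_sims)
avg_vec <- data.frame(x = colMeans(draws))   # mean of each column = one simulated average
head(avg_vec)
```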
# The mean of an exponential distribution is 1/lambda
Theoretical_Mean <- 1/Lambda
print(paste("Theoretical mean of the distribution =", Theoretical_Mean))
## [1] "Theoretical mean of the distribution = 5"
Sample_Mean <- mean(avg$x)
print(paste("Sample mean of the distribution =", Sample_Mean))
## [1] "Sample mean of the distribution = 4.99002520077716"
print(paste("Difference between Theoretical and sample mean =", abs(Theoretical_Mean-Sample_Mean)))
## [1] "Difference between Theoretical and sample mean = 0.00997479922283873"
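To judge whether this difference is small, it helps to compare it with the Monte Carlo standard error of the sample mean. The sample mean averages 1,000 simulated means, each with variance $(1/\lambda)^2/n = 0.625$, so:

$$\mathrm{SE} = \sqrt{\frac{(1/\lambda)^2/n}{1000}} = \sqrt{\frac{0.625}{1000}} = 0.025$$

The observed difference of about 0.010 is roughly 0.4 standard errors, well within simulation noise.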
# Variance of the mean of n independent draws: sigma^2 / n, with sigma = 1/lambda
Theoretical_Var <- (1/Lambda)^2/n
print(paste("Theoretical variance of the distribution =", Theoretical_Var))
## [1] "Theoretical variance of the distribution = 0.625"
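Because the 40 draws in each simulation are independent, the variance of their average is the population variance divided by $n$:

$$\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n} = \frac{(1/\lambda)^2}{n} = \frac{(1/0.2)^2}{40} = \frac{25}{40} = 0.625$$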
Sample_Var <- var(avg$x)
print(paste("Sample variance of the distribution =", Sample_Var))
## [1] "Sample variance of the distribution = 0.611116466559575"
print(paste("Difference between Theoretical and sample variance =", abs(Theoretical_Var-Sample_Var)))
## [1] "Difference between Theoretical and sample variance = 0.0138835334404246"
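A similar yardstick can be sketched for the variance. Assuming the simulated averages are approximately normal (an approximation, since averages of exponentials remain slightly right-skewed), the standard error of a sample variance computed from $N = 1000$ values is roughly:

$$\mathrm{SE}(s^2) \approx \sigma_{\bar{X}}^2 \sqrt{\frac{2}{N-1}} = 0.625 \sqrt{\frac{2}{999}} \approx 0.028$$

Under that approximation, the observed difference of about 0.014 is roughly half a standard error.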
library(ggplot2)
# Histogram of the 1,000 simulated means on the density scale
g <- ggplot(avg, aes(x = x))
g <- g + geom_histogram(aes(y = after_stat(density)), bins = 30, colour = "black", fill = "green")
g <- g + labs(x = "Simulated Mean of the Distribution", y = "Density")
# Overlay a normal density with the sample mean and SD of the simulated means
g + stat_function(fun = dnorm, args = list(mean = mean(avg$x), sd = sd(avg$x)),
                  colour = "blue", linewidth = 2)
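The histogram of the 1,000 simulated means closely follows the overlaid normal density curve (blue), which shows that the distribution of averages of 40 exponentials is approximately normal, as the Central Limit Theorem predicts.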
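A numeric complement to the plot, sketched under the assumption that the CLT approximation $N(1/\lambda, (1/\lambda)^2/n)$ applies: if the averages are approximately normal, about 95% of them should fall within 1.96 theoretical standard errors of 1/lambda, and their empirical quantiles should line up with the normal quantiles. The names `means`, `se`, and `coverage` are illustrative.

```r
Lambda <- 0.2
n <- 40
set.seed(1)
means <- replicate(1000, mean(rexp(n, Lambda)))

mu <- 1 / Lambda                # theoretical mean of the averages
se <- (1 / Lambda) / sqrt(n)    # theoretical SD of the averages

# Share of simulated averages within 1.96 standard errors of mu (about 0.95 if normal)
coverage <- mean(abs(means - mu) <= qnorm(0.975) * se)

# Compare a few empirical quantiles with those of the normal approximation
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
round(rbind(empirical = quantile(means, probs),
            normal    = qnorm(probs, mean = mu, sd = se)), 3)
coverage
```

For a graphical version of the same check, base R's `qqnorm(means)` followed by `qqline(means)` draws a normal Q-Q plot.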