Final Assignment - Statistical inference

SYNOPSIS

This is the project for the statistical inference class. In it, you will use simulation to explore inference and do some simple inferential data analysis. The project consists of two parts:

1. A simulation exercise.
2. Basic inferential data analysis.

INTRODUCTION

The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. In this simulation, you will investigate the distribution of averages of 40 exponential(0.2) random variables. Note that you will need to simulate about a thousand such averages.
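
As a quick sanity check of the facts above, the following sketch draws one large exponential sample and compares its mean and standard deviation to 1/lambda. The names `rate`, `draws`, and `check`, the seed, and the sample size of 100000 are illustrative assumptions, not part of the assignment.

```r
# Illustrative check only: seed and sample size are assumed, not from the assignment
set.seed(1)
rate <- 0.2                      # same value as lambda in the assignment
draws <- rexp(100000, rate)      # one large sample of exponential(0.2) draws
theoretical <- 1 / rate          # mean and standard deviation are both 1/lambda

# Collect the empirical mean and sd next to the theoretical value
check <- c(mean = mean(draws), sd = sd(draws), theoretical = theoretical)
print(round(check, 3))
```

Both empirical values should land close to 5, which is the value used as the theoretical center in the questions that follow.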

Question 1

Show where the distribution is centered and compare it to the theoretical center of the distribution.

set.seed(10)                 # make the simulation reproducible
simulation_count <- 1000     # number of simulated averages
lambda <- 0.2                # rate parameter of the exponential distribution
# Each run draws 40 exponentials and records their mean
sample_mean_vector <- replicate(simulation_count, mean(rexp(40, lambda)))
sample_mean <- mean(sample_mean_vector)   # center of the simulated distribution
hist(sample_mean_vector, freq = FALSE, main = "Histogram of means of 40 exponential variables via 1000 runs", xlab = "sample mean", ylab = "Density", ylim = c(0, 0.7), breaks = 20)
abline(v = sample_mean, col = "red", lwd = 4)   # mark the simulated mean
text(6.8, 0.65, paste0("mean via simulation is ", round(sample_mean, 3)))
text(6.8, 0.60, paste0("theoretical mean is ", 1 / lambda))
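
The plot can be backed by a direct numeric comparison. The sketch below repeats the same kind of simulation and prints the simulated center, the theoretical center 1/lambda, and their difference. The names `n_sims`, `n_obs`, `rate`, `sim_means`, and `center_check` are hypothetical and chosen so they do not overwrite the variables used above.

```r
# Numeric version of the comparison (a sketch; variable names are illustrative)
set.seed(10)
n_sims <- 1000                   # number of simulated averages
n_obs <- 40                      # exponentials per average
rate <- 0.2                      # same value as lambda

# One sample mean per simulation run
sim_means <- replicate(n_sims, mean(rexp(n_obs, rate)))

center_check <- c(
  simulated   = mean(sim_means),
  theoretical = 1 / rate,
  difference  = mean(sim_means) - 1 / rate
)
print(round(center_check, 4))
```

A difference that is small relative to the spread of the histogram supports the claim that the distribution of averages is centered at 1/lambda.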

Question 2

Show how variable the sample is (via its variance) and compare it to the theoretical variance of the distribution.

set.seed(10)                 # same seed, so the same samples as in Question 1
# Each run draws 40 exponentials and records their sample variance
sample_variance_vector <- replicate(simulation_count, var(rexp(40, lambda)))
sample_variance <- mean(sample_variance_vector)   # average of the sample variances
hist(sample_variance_vector, freq = FALSE, main = "Histogram of Variance of 40 exponential variables via 1000 runs", xlab = "sample variance", ylab = "Density", ylim = c(0, 0.05), breaks = 20)
abline(v = sample_variance, col = "red", lwd = 4)   # mark the average sample variance
text(58, 0.05, paste0("Variance via simulation is ", round(sample_variance, 3)))
text(58, 0.047, paste0("theoretical variance is ", (1 / lambda)^2))
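
The histogram above describes the variance within each sample of 40 draws. Because the simulation is about averages, it may also be worth checking the variance of the sample mean itself, which theory puts at (1/lambda)^2 / 40 = 0.625. The sketch below is one way to do that; it is an additional reading of the question rather than the approach taken above, and the names `n_sims`, `n_obs`, `rate`, `sim_means`, and `variance_check` are hypothetical.

```r
# Variance of the averages themselves (a sketch; variable names are illustrative)
set.seed(10)
n_sims <- 1000                   # number of simulated averages
n_obs <- 40                      # exponentials per average
rate <- 0.2                      # same value as lambda

# One sample mean per simulation run
sim_means <- replicate(n_sims, mean(rexp(n_obs, rate)))

variance_check <- c(
  simulated   = var(sim_means),          # spread of the 1000 averages
  theoretical = (1 / rate)^2 / n_obs     # sigma^2 / n
)
print(round(variance_check, 4))
```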

Question 3

Show that the distribution is approximately normal.

set.seed(10)
# Standardize each sample mean: (mean - mu) / (sigma / sqrt(n)), with mu = sigma = 1/lambda and n = 40
sample_CLT_vector <- replicate(simulation_count, (mean(rexp(40, lambda)) - 1 / lambda) / ((1 / lambda) / sqrt(40)))
hist(sample_CLT_vector, freq = FALSE, main = "Standardized mean (via CLT) of 40 exponential variables via 1000 runs", xlab = "standardized sample mean via CLT", ylab = "Density", ylim = c(0, 0.5), breaks = 40)
sample_CLT_mean <- mean(sample_CLT_vector)
abline(v = sample_CLT_mean, col = "red", lwd = 4)   # mark the simulated standardized mean
text(2.8, 0.45, paste0("Standardized mean via CLT is ", round(sample_CLT_mean, 3)))
text(2.8, 0.41, "theoretical mean is 0")
text(2.8, 0.3, "histogram matches the normal")
text(2.8, 0.27, "density (blue curve) well")
# Overlay a normal density using the simulated mean and standard deviation
curve(dnorm(x, mean = sample_CLT_mean, sd = sd(sample_CLT_vector)), col = "blue", add = TRUE, lwd = 4)
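
A visual match can be complemented by comparing quantiles. The sketch below standardizes the simulated means in the same way and lines up a few empirical quantiles against those of the standard normal distribution. The names `n_sims`, `n_obs`, `rate`, `z_scores`, `probs`, and `quantile_check`, as well as the choice of probabilities, are illustrative assumptions.

```r
# Quantile comparison against N(0, 1) (a sketch; variable names are illustrative)
set.seed(10)
n_sims <- 1000                   # number of simulated averages
n_obs <- 40                      # exponentials per average
rate <- 0.2                      # same value as lambda
mu <- 1 / rate                   # theoretical mean of one exponential
se <- (1 / rate) / sqrt(n_obs)   # theoretical standard error of the average

# Standardized sample means, one per simulation run
z_scores <- replicate(n_sims, (mean(rexp(n_obs, rate)) - mu) / se)

probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
quantile_check <- data.frame(
  prob      = probs,
  simulated = unname(quantile(z_scores, probs)),
  normal    = qnorm(probs)
)
print(round(quantile_check, 3))
```

Some mismatch in the tails is expected, since an average of only 40 exponentials is still slightly right-skewed. For a graphical version of the same check, `qqnorm(z_scores)` followed by `qqline(z_scores)` draws a normal Q-Q plot.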