This is the project for the statistical inference class. In it, you will use simulation to explore inference and do some simple inferential data analysis. The project consists of two parts: a simulation exercise and a basic inferential data analysis.
Step 1: Set up the simulation. Each sample consists of 40 draws from an exponential distribution with rate lambda = 0.2.
# Ensure that the random numbers are always the same by setting the seed
set.seed(3890)
numberOfSimulations <- 1000
numberOfExponentials <- 40
lambda <- 0.2
Using the above values, run 1000 simulations, each drawing 40 exponentials, and store the mean of each simulation in a data frame.
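# Pre-allocate one row per simulation to hold that simulation's mean
simulationSetDF <- data.frame(mean = numeric(numberOfSimulations))
for (simulationIndex in seq_len(numberOfSimulations)) {
  # Draw 40 exponentials and record their average
  thisSimulationSet <- rexp(numberOfExponentials, lambda)
  simulationSetDF[simulationIndex, 1] <- mean(thisSimulationSet)
}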
# Sample contents
head(simulationSetDF)
## mean
## 1 4.457028
## 2 3.786469
## 3 5.678124
## 4 4.702073
## 5 4.585880
## 6 4.136375
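The loop above can also be written without explicit indexing. The following is a minimal sketch of one alternative, not the method used in this report: it draws all values in a single call to `rexp()`, arranges them as one simulation per row, and takes row means. The names `allDraws` and `vectorisedMeans` are illustrative. Because the draws are consumed in the same order, the same seed should give the same means as the loop, though that is worth verifying rather than assuming.

```r
# Same settings as above, repeated so the sketch runs on its own
set.seed(3890)
numberOfSimulations <- 1000
numberOfExponentials <- 40
lambda <- 0.2

# One row per simulation, one column per exponential draw
allDraws <- matrix(rexp(numberOfSimulations * numberOfExponentials, lambda),
                   nrow = numberOfSimulations, byrow = TRUE)

# Average each row to get the mean of each simulation
vectorisedMeans <- rowMeans(allDraws)
head(vectorisedMeans)
```

For a data set of this size the difference in speed is negligible; the main benefit is that the intent (average each simulated sample) is stated in one line.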
I will investigate the distribution of averages of 40 exponentials and, in the process, compare the sample mean and sample variance with their theoretical values and show that the distribution is approximately normal.
Sample Mean is the sum of a set of values divided by the number of values in the set. Also called the average, it is an unbiased estimator of the population mean. Here it is the mean of the 1000 simulated averages.
sampleMean <- mean(simulationSetDF$mean)
sampleMean
## [1] 4.99911
Theoretical Mean is the mean of the exponential distribution and is calculated as 1/lambda.
theoreticalMean <- 1/lambda
theoreticalMean
## [1] 5
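To put a number on how close the two means are, the sketch below reports the absolute and relative differences, along with the standard error of the overall mean as a yardstick. It is an illustrative addition, not part of the original analysis: it regenerates the simulated means with `replicate()` so that it runs on its own, and the names `simulatedMeans`, `absoluteDifference`, `relativeDifference` and `standardError` are assumptions made for this example.

```r
# Regenerate 1000 means of 40 exponentials (rate 0.2)
set.seed(3890)
lambda <- 0.2
simulatedMeans <- replicate(1000, mean(rexp(40, lambda)))

# Difference between the simulated and theoretical means
absoluteDifference <- abs(mean(simulatedMeans) - 1 / lambda)
relativeDifference <- absoluteDifference / (1 / lambda)

# Standard error of the overall mean, for scale
standardError <- sd(simulatedMeans) / sqrt(length(simulatedMeans))

c(absolute = absoluteDifference,
  relative = relativeDifference,
  standardError = standardError)
```

A difference that is small compared with the standard error is what one would expect if the sample mean is an unbiased estimate of 1/lambda.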
We now plot the distribution of the simulated means together with the theoretical mean and the theoretical normal density:
# plot histogram
hist(simulationSetDF$mean, probability = TRUE, main = "Distribution of Simulated and Theoretical Means", xlab = "Mean of 40 Exponentials")
# plot the density curve
lines(density(simulationSetDF$mean), col="red", lwd=5)
# plot theoretical mean
abline(v=theoreticalMean, col="yellow", lwd=5)
# build a grid of 100 evenly spaced values spanning the range of the simulated means
sequenceOfMeans <- seq(min(simulationSetDF$mean), max(simulationSetDF$mean), length.out = 100)
# evaluate the normal density with mean 1/lambda and sd (1/lambda)/sqrt(n) on that grid
densities <- dnorm(sequenceOfMeans, mean = theoreticalMean, sd = theoreticalMean / sqrt(numberOfExponentials))
# plot theoretical density curve
lines(sequenceOfMeans,densities, col="blue", lwd=5)
# add legend
legend('topright',
       c('Simulated Density Curve', 'Theoretical Density Curve', 'Theoretical Mean'),
       cex = 0.8, col = c('red', 'blue', 'yellow'), lty = 1, lwd = 5)
The histogram shows that the distribution of simulated means is centred on the theoretical mean: the sample mean (4.99911) is very close to the theoretical value of 5.
Calculate Theoretical Variance
# theoretical variance of the mean of 40 exponentials: (1/lambda)^2 / n
theoreticalVariance <- ((1/lambda)^2) / numberOfExponentials
theoreticalVariance
## [1] 0.625
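The formula used above follows from the variance of a mean of independent draws. Each exponential with rate lambda has variance (1/lambda)^2, and averaging n = 40 independent draws divides that variance by n. Written out with the values used in this report:

```latex
\operatorname{Var}(\bar{X})
  = \frac{\sigma^{2}}{n}
  = \frac{(1/\lambda)^{2}}{n}
  = \frac{(1/0.2)^{2}}{40}
  = \frac{25}{40}
  = 0.625
```

The corresponding standard deviation, (1/lambda)/sqrt(n), is the value passed as `sd` to `dnorm()` when drawing the theoretical density curve.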
Calculate Sample Variance
# sample variance of the 1000 simulated means
sampleVariance <- var(simulationSetDF$mean)
sampleVariance
## [1] 0.623486
We observe that the sample variance (0.623486) and the theoretical variance (0.625) are very close.
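From the plot above, we observe that the simulated density curve of the sample means closely follows the theoretical normal density, so the distribution of averages of 40 exponentials is approximately normal, as the Central Limit Theorem predicts.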
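A normal Q-Q plot is another common way to check this kind of claim. The sketch below is an addition, not part of the original analysis: it regenerates the means with `replicate()` so that it runs on its own, standardises them using the theoretical mean and standard deviation, and plots them against standard normal quantiles. The names `simulatedMeans` and `standardisedMeans` are illustrative.

```r
# Regenerate 1000 means of 40 exponentials (rate 0.2)
set.seed(3890)
lambda <- 0.2
numberOfExponentials <- 40
simulatedMeans <- replicate(1000, mean(rexp(numberOfExponentials, lambda)))

# Standardise with the theoretical mean and standard deviation of the mean
theoreticalMean <- 1 / lambda
theoreticalSD <- (1 / lambda) / sqrt(numberOfExponentials)
standardisedMeans <- (simulatedMeans - theoreticalMean) / theoreticalSD

# Compare the standardised means with standard normal quantiles
qqnorm(standardisedMeans, main = "Normal Q-Q Plot of Standardised Means")
qqline(standardisedMeans, col = "blue", lwd = 2)
```

Points lying close to the reference line indicate approximate normality. Some departure in the upper tail would not be surprising, since the exponential distribution is right-skewed and 40 is a moderate sample size.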