This project asks students to answer specific questions, use simulation to explore inference, and carry out some simple inferential data analysis, with the answers presented in a report. I chose to publish the project on RPubs to demonstrate my knowledge of Markdown, knitr, and PDF conversion. The main goal is to investigate the exponential distribution in R and compare the distribution of its sample means with what the Central Limit Theorem predicts.
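For reference, the quantities being compared can be written out as a short derivation. This is a sketch using the standard moments of the exponential distribution and the parameter values set in the code below (rate 0.2, sample size 40); the symbols are chosen here for illustration.

```latex
\begin{align*}
X_1, \dots, X_n &\overset{\text{iid}}{\sim} \mathrm{Exp}(\lambda),
  \qquad \mathbb{E}[X_i] = \frac{1}{\lambda},
  \qquad \mathrm{Var}(X_i) = \frac{1}{\lambda^2} \\
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i
  &\;\approx\; \mathcal{N}\!\left(\frac{1}{\lambda},\; \frac{1}{\lambda^2 n}\right)
  \quad \text{for large } n \text{ (Central Limit Theorem)} \\
\lambda = 0.2,\; n = 40:
  &\quad \mathbb{E}[\bar{X}_n] = 5,
  \quad \mathrm{SD}(\bar{X}_n) = \frac{1}{0.2\sqrt{40}} \approx 0.79,
  \quad \mathrm{Var}(\bar{X}_n) = \frac{25}{40} = 0.625
\end{align*}
```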
library(knitr)
library(ggplot2)
library(dplyr)
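# simulation parameters: 1000 samples of 40 exponentials with rate lambda
numberOfSim <- 1000
sampleSize <- 40
lambda <- 0.2
set.seed(2)
# one simulated sample per row
simulatedData <- matrix(rexp(numberOfSim * sampleSize, rate = lambda), numberOfSim, sampleSize)
# mean of each simulated sample
rowMeansSimulated <- rowMeans(simulatedData)
generatedMeansData <- data.frame(rowMeansSimulated)
# theoretical mean, standard deviation, and variance of the sample mean
theoreticalMean <- 1 / lambda
theoreticalSD <- 1 / (lambda * sqrt(sampleSize))
theoreticalVar <- theoreticalSD^2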
# histogram of the simulated sample means with a kernel density overlay
firstplot <- ggplot(generatedMeansData, aes(x = rowMeansSimulated)) + labs(title = "Sample Mean versus Theoretical Mean") + geom_histogram(binwidth = lambda, fill = "white", color = "black", aes(y = ..density..)) + geom_density(alpha = .2, fill = "#FF6666")
# dashed red vertical line at the mean of the simulated sample means
firstplot <- firstplot + geom_vline(aes(xintercept = mean(rowMeansSimulated, na.rm = TRUE)), color = "red", linetype = "dashed", size = 1) + xlab("Sample Mean") + ylab("Density")
# solid blue vertical line at the theoretical mean
firstplot <- firstplot + geom_vline(aes(xintercept = theoreticalMean), color = "blue", linetype = "solid", size = 1)
firstplot
The sample mean is computed by mean(generatedMeansData$rowMeansSimulated). The theoretical mean, as computed above, is 1/lambda = 5.
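The comparison can also be made numerically rather than read off the plot. The following is a minimal sketch that re-creates the simulation with the same seed and parameters and reports the gap between the two means; the names `sampleMean` and `absoluteGap` are introduced here for illustration and are not part of the original report.

```r
# Re-create the simulation with the same parameters and seed as above
numberOfSim <- 1000
sampleSize <- 40
lambda <- 0.2
set.seed(2)
simulatedData <- matrix(rexp(numberOfSim * sampleSize, rate = lambda),
                        numberOfSim, sampleSize)
rowMeansSimulated <- rowMeans(simulatedData)

# Theoretical mean of an exponential distribution with rate lambda
theoreticalMean <- 1 / lambda

# Illustrative names (assumptions, not in the original report)
sampleMean <- mean(rowMeansSimulated)
absoluteGap <- abs(sampleMean - theoreticalMean)

cat("Mean of simulated sample means:", round(sampleMean, 3), "\n")
cat("Theoretical mean:", theoreticalMean, "\n")
cat("Absolute difference:", round(absoluteGap, 3), "\n")
```

With 1,000 simulated means, the difference should be small relative to the theoretical standard deviation of about 0.79.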
set.seed(2)
# each replicate estimates the variance of the sample mean: sd(x / sqrt(n))^2 equals var(x) / n
simulatedDataForVariance <- replicate(numberOfSim, sd(rexp(sampleSize, lambda) / sqrt(sampleSize))^2)
# simulated variances vs. a normal density
dfVariance <- data.frame(simulatedDataForVariance)
secondplot <- ggplot(dfVariance, aes(x = simulatedDataForVariance)) + geom_histogram(binwidth = lambda, fill = "blue3", color = "white", alpha = .3, aes(y = ..density..)) + ylab("Density") + xlab("Sample Variance") + ggtitle("Sample Variance VS Theoretical Variance (w/1,000 simulations)")
# overlay a normal density with the same mean and sd as the simulated variances
secondplot <- secondplot + stat_function(fun = dnorm, color = "black", size = 2, args = list(mean = mean(simulatedDataForVariance), sd = sd(simulatedDataForVariance)))
print(secondplot)
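We can see that the histogram of simulated variances roughly follows the shape and central weight of the normal curve drawn with dnorm, which uses the mean and standard deviation of the simulated values. The theoretical variance of the sample mean, theoreticalVar = 1/(lambda^2 * sampleSize) = 0.625, should lie near the center of this distribution. Also, larger sample sizes narrow the distribution around its center, while more simulations make the histogram smoother. Again, this behavior is consistent with the predictions of the Central Limit Theorem and Borel’s Law of Large Numbers.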
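One way to check the variance claim directly is to compare the variance of the simulated sample means with the theoretical value, and to repeat the simulation with a larger sample size to see the spread shrink. This is a minimal sketch under stated assumptions: the helper `simulateMeans` and the second sample size of 160 are hypothetical choices made for illustration, not part of the original analysis.

```r
numberOfSim <- 1000
lambda <- 0.2

# Hypothetical helper: draw nSim samples of size n and return their means
simulateMeans <- function(n, nSim = numberOfSim, rate = lambda) {
  rowMeans(matrix(rexp(nSim * n, rate = rate), nSim, n))
}

set.seed(2)
meansN40 <- simulateMeans(40)
meansN160 <- simulateMeans(160)  # assumed larger sample size for comparison

# Theoretical variance of the sample mean: (1 / lambda)^2 / n
theoreticalVar40 <- (1 / lambda)^2 / 40
theoreticalVar160 <- (1 / lambda)^2 / 160

cat("n = 40:  simulated", round(var(meansN40), 3),
    "theoretical", round(theoreticalVar40, 3), "\n")
cat("n = 160: simulated", round(var(meansN160), 3),
    "theoretical", round(theoreticalVar160, 3), "\n")
```

Quadrupling the sample size should cut the variance of the sample mean to roughly a quarter, which is the narrowing the Central Limit Theorem predicts.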