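In this project we try to demonstrate three main things: that the mean of simulated exponential samples is close to the theoretical mean, that their variance is close to the theoretical variance, and that the distribution of the sample means is approximately normal.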
library(ggplot2)
library(knitr)
These are the simulation parameters:
n <- 40
lambda <- 0.2
num_simulations <- 1000
set.seed(41925) # To ensure people can reproduce the research
Here I run the simulation, drawing 1000 samples of 40 exponentials each and storing the mean and variance of every sample:
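means <- numeric(num_simulations) # preallocate instead of growing the vectors
vars <- numeric(num_simulations)
for (i in seq_len(num_simulations)) {
  simulatedData <- rexp(n, lambda) # one sample of n exponentials
  means[i] <- mean(simulatedData)
  vars[i] <- var(simulatedData)
}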
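The loop above can also be written without explicit iteration. The following is a minimal sketch, not the method used for the results in this report: it draws all values in a single `rexp` call and arranges them one sample per row. The names `sim_matrix`, `means_vec`, and `vars_vec` are illustrative only. Because R draws from a single random stream, the same seed should give the same samples as the loop.

```r
# Parameters repeated from the report so the sketch runs on its own
n <- 40
lambda <- 0.2
num_simulations <- 1000
set.seed(41925)

# Draw every value at once; byrow = TRUE puts each sample of n draws in its own row
sim_matrix <- matrix(rexp(n * num_simulations, lambda),
                     nrow = num_simulations, ncol = n, byrow = TRUE)

means_vec <- rowMeans(sim_matrix)     # one mean per simulated sample
vars_vec <- apply(sim_matrix, 1, var) # one variance per simulated sample

length(means_vec)
length(vars_vec)
```

For 1000 samples the speed difference is negligible, so the choice between the two forms is mostly a matter of readability.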
Here we calculate the average of the 1000 sample means and compare it with the theoretical mean 1/lambda:
sample_mean <- mean(means)
sample_mean
## [1] 5.008348
theoretical_mean <- 1/lambda
theoretical_mean
## [1] 5
As we can see, the average of the sample means is almost the same as the theoretical mean. The following plot illustrates this; the vertical line marks the theoretical mean:
g1 <- qplot(means, geom="histogram", xlab="Mean of 40 exponentials simulation", binwidth=0.2, xlim=c(1,9),
            main="Distribution of the mean of 1000 data samples (40 exponentials each)")
g1 <- g1 + geom_vline(xintercept = theoretical_mean, color="yellow")
g1 <- g1 + geom_text(mapping=aes(x=sample_mean, y=110, label=paste("sample mean=",round(sample_mean,3))), size=4, vjust= 1, hjust=-0.1)
g1
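The width of this histogram can be checked against theory as well. For independent draws with standard deviation 1/lambda, the mean of n draws has standard deviation (1/lambda)/sqrt(n). The sketch below recomputes the sample means so that it runs on its own; `theoretical_se` and `empirical_se` are names introduced here for illustration.

```r
# Parameters repeated from the report so the sketch runs on its own
n <- 40
lambda <- 0.2
num_simulations <- 1000
set.seed(41925)

# One mean per simulated sample of n exponentials
means <- replicate(num_simulations, mean(rexp(n, lambda)))

theoretical_se <- (1 / lambda) / sqrt(n) # standard deviation of the mean of n draws
empirical_se <- sd(means)                # spread observed in the simulation

c(theoretical = theoretical_se, empirical = empirical_se)
```

With lambda = 0.2 and n = 40 the theoretical value is 5/sqrt(40), roughly 0.79.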
Here we calculate the average of the 1000 sample variances, which turns out to be very close to the theoretical variance (1/lambda)^2:
sample_var <- mean(vars)
sample_var
## [1] 25.11678
theoretical_var <- (1/lambda)^2
theoretical_var
## [1] 25
The following plot shows this; the vertical line marks the theoretical variance:
g2 <- qplot(vars, geom="histogram", xlab="Variance of 40 exponentials simulation", binwidth=2,
            main="Distribution of the variance of 1000 data samples (40 exponentials each)")
g2 <- g2 + geom_vline(xintercept = theoretical_var, color="yellow")
g2 <- g2 + geom_text(mapping=aes(x=sample_var, y=130, label=paste("sample variance=",round(sample_var,3))), size=4, hjust=-0.1)
g2
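One reason the average of the sample variances lands close to 25 is that `var` divides by n - 1, which makes it an unbiased estimator of the population variance. The following is a minimal sketch of the comparison with the divide-by-n version, assuming the same parameters as above; `samples`, `unbiased_vars`, and `biased_vars` are illustrative names.

```r
# Parameters repeated from the report so the sketch runs on its own
n <- 40
lambda <- 0.2
num_simulations <- 1000
set.seed(41925)

# replicate() returns an n x num_simulations matrix: one sample per column
samples <- replicate(num_simulations, rexp(n, lambda))

unbiased_vars <- apply(samples, 2, var)     # var() uses the n - 1 denominator
biased_vars <- unbiased_vars * (n - 1) / n  # same sums of squares divided by n

c(unbiased = mean(unbiased_vars), biased = mean(biased_vars),
  theoretical = (1 / lambda)^2)
```

Dividing by n shrinks every estimate by the factor (n - 1)/n = 0.975, so the expected value of that version is 24.375 rather than 25.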
One of the easiest and clearest ways to see whether data are normally distributed is to draw a Q-Q plot. Here we draw one for the sample means:
qqnorm(means)
qqline(means, col = "yellow")
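The points lie close to the straight line, which suggests that the distribution of the sample means is approximately normal, as we expected. Note that this concerns the sample means; the underlying population is exponential, not normal.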
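A numeric check can complement the Q-Q plot. If the sample means are approximately normal, about 95% of them should fall within 1.96 standard errors of the theoretical mean. The sketch below is self-contained and uses illustrative names (`z`, `coverage`) that are not part of the original analysis.

```r
# Parameters repeated from the report so the sketch runs on its own
n <- 40
lambda <- 0.2
num_simulations <- 1000
set.seed(41925)

# One mean per simulated sample of n exponentials
means <- replicate(num_simulations, mean(rexp(n, lambda)))

theoretical_mean <- 1 / lambda
theoretical_se <- (1 / lambda) / sqrt(n)

# Standardize the sample means using the theoretical mean and standard error
z <- (means - theoretical_mean) / theoretical_se

# Share of standardized means inside the central 95% range of a standard normal
coverage <- mean(abs(z) < 1.96)
coverage
```

A value near 0.95 is consistent with approximate normality, although it does not prove it; the Q-Q plot remains the more informative view of the tails.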
We have shown that the sample mean and sample variance are good estimators of the population mean and variance if the number of simulated samples is large enough. Additionally, we can see that the distribution of the sample means tends toward a normal distribution.
The session info is:
sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
##
## locale:
## [1] es_ES.UTF-8/es_ES.UTF-8/es_ES.UTF-8/C/es_ES.UTF-8/es_ES.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.11 ggplot2_1.0.1
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.2-6 digest_0.6.8 evaluate_0.8 formatR_1.0
## [5] grid_3.0.2 gtable_0.1.2 htmltools_0.2.6 labeling_0.3
## [9] MASS_7.3-29 munsell_0.4.2 plyr_1.8.1 proto_0.3-10
## [13] Rcpp_0.11.5 reshape2_1.4.1 rmarkdown_0.8 scales_0.2.4
## [17] stringr_0.6.2 tools_3.0.2 yaml_2.1.13