Create the population
pop.size <- 10000
pop.mean <- 130
pop.sd <- 10
## Creating population with the parameters given above
population <- rnorm(n = pop.size, mean = pop.mean, sd = pop.sd)
Define standard error of mean simulator
sem.simulator <- function(population, sample.sizes = c(2, 10), iteration = 100) {
## Draw `iteration` samples (100 by default) for each sample size
samples.of.different.sizes <-
lapply(sample.sizes, # Outer loop for size
function(sample.size) {
lapply(seq_len(iteration), # Inner loop for iteration
function(sample.number) { # Sampling procedure
sample(population, size = sample.size)
}
)
}
)
sample.means <-
lapply(samples.of.different.sizes, # Outer loop for size
function(samples.of.same.size) {
lapply(samples.of.same.size, # Inner loop for iteration
mean) # Mean for each sample
}
)
## Aggregate means for appropriate format
sample.means.unlist <- lapply(sample.means,
unlist)
sample.means.matrix <- do.call(cbind, sample.means.unlist)
sample.means.data.frame <- data.frame(sample.means.matrix)
names(sample.means.data.frame) <- paste("n", sample.sizes, sep = "_")
## Melt data to long format for ggplot (columns: variable, value)
library(reshape)
sample.means.melt <- melt(sample.means.data.frame)
## Create graphic: one histogram panel per sample size
library(ggplot2)
gg.out <-
ggplot(sample.means.melt) +
geom_histogram(aes(value)) +
facet_grid(variable ~ .)
gg.out
}
Plot graphs for sample sizes of 1, 5, 10, 20, 50, and 100.
For each of these sizes, 100 samples are drawn from the population without replacement (6 * 100 = 600 samples in total).
The mean of each sample is then calculated.
Finally, the sample means are plotted as histograms, one panel per sample size.
graph.out <- sem.simulator(population, sample.sizes = c(1, 5, 10, 20, 50, 100))
graph.out
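As the sample size grows, the standard error of the mean (SEM), defined as \( \frac{\sigma}{\sqrt{n}} \) where \( \sigma \) is the population standard deviation and \( n \) is the sample size (in practice estimated by \( \frac{\text{standard deviation of sample}}{\sqrt{\text{sample size}}} \)), becomes smaller. Therefore, the means of bigger samples cluster more tightly around the population mean, which makes them more reliable estimates of it.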
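The shrinking spread of the histograms can also be checked numerically. The following is a minimal sketch, not part of the original simulator: it recreates a population with the same parameters, and the seed, the 2000 iterations, and the names `empirical.sem`, `theoretical.sem`, and `sem.comparison` are assumptions made for illustration. It compares the standard deviation of the simulated sample means with the population standard deviation divided by the square root of the sample size.

```r
set.seed(1)  # assumed seed, used only to make the sketch reproducible

## Population with the same parameters as above
population <- rnorm(n = 10000, mean = 130, sd = 10)
sample.sizes <- c(1, 5, 10, 20, 50, 100)
iteration <- 2000  # assumed: more iterations than the plot, for a steadier estimate

## Empirical SEM: standard deviation of the simulated sample means
empirical.sem <- sapply(sample.sizes, function(sample.size) {
    sample.means <- replicate(iteration,
                              mean(sample(population, size = sample.size)))
    sd(sample.means)
})

## Theoretical SEM: population SD divided by sqrt(sample size)
theoretical.sem <- sd(population) / sqrt(sample.sizes)

## Side-by-side comparison, one row per sample size
sem.comparison <- data.frame(n = sample.sizes,
                             empirical = round(empirical.sem, 2),
                             theoretical = round(theoretical.sem, 2))
sem.comparison
```

The two columns should agree closely, and both should shrink as n grows. Because the samples are drawn without replacement from a finite population, the empirical values are expected to sit very slightly below the formula, but with at most 100 out of 10000 values drawn that difference is negligible.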