Standard error of the mean

Create the population

pop.size <- 10000
pop.mean <- 130
pop.sd   <- 10

## Creating population with the parameters given above
population <- rnorm(n = pop.size, mean = pop.mean, sd = pop.sd)
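Because `rnorm()` draws random values, the generated population's mean and standard deviation are close to, but not exactly equal to, `pop.mean` and `pop.sd`, and they change on every run. Here is a minimal sketch for reproducible results. It fixes the random seed with `set.seed()` (the seed value 42 is an arbitrary assumption, not part of the original) and prints the realised parameters.

```r
pop.size <- 10000
pop.mean <- 130
pop.sd   <- 10

## Arbitrary seed (an assumption) so the population is identical on every run
set.seed(42)
population <- rnorm(n = pop.size, mean = pop.mean, sd = pop.sd)

## Realised parameters of this particular population
## (close to 130 and 10, but not exactly equal)
print(c(mean = mean(population), sd = sd(population)))
```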

Define a simulator for the standard error of the mean

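## No default for `population`: `population = population` would be a recursive default reference
sem.simulator <- function(population, sample.sizes = c(2, 10), iteration = 100) {
    ## Draw `iteration` samples (without replacement) for each sample size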
    samples.of.different.sizes <-
        lapply(sample.sizes,                    # Outer loop for size
               function(sample.size) {
                   lapply(seq_len(iteration),   # Inner loop for iteration
                          function(sample.number) { # Sampling procedure
                              sample(population, size = sample.size)
                          }
                          )
               }
               )
    sample.means <-
        lapply(samples.of.different.sizes,      # Outer loop for size
               function(samples.of.same.size) {
                   lapply(samples.of.same.size, # Inner loop for iteration
                          mean)                 # Mean for each sample
               }
               )

    ## Aggregate means for appropriate format
    sample.means.unlist <- lapply(sample.means,
                                  unlist)
    sample.means.matrix <- do.call(cbind, sample.means.unlist)
    sample.means.data.frame <- data.frame(sample.means.matrix)
    names(sample.means.data.frame) <- paste("n", sample.sizes, sep = "_")

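    ## Melt data for ggplot (library() stops with an error if a package is missing)
    library(reshape)
    sample.means.melt <- melt(sample.means.data.frame)

    ## Create graphic: one histogram of sample means per sample size
    library(ggplot2)
    gg.out <-
        ggplot(sample.means.melt) +
            geom_histogram(aes(x = value), bins = 30) +
            facet_grid(variable ~ .)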
    gg.out
}

Plot histograms for sample sizes of 1, 5, 10, 20, 50, and 100.
For each size, 100 samples are drawn from the population without replacement (6 * 100 = 600 samples in total).
The mean of each sample is then calculated.
Finally, the sample means are plotted as histograms, one panel per sample size.
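The same sampling procedure can also be written more compactly in base R. It uses `replicate()` and `vapply()` to return the sample means directly as a matrix, with one column per sample size, instead of building nested lists. This is an alternative formulation, not the author's code. The seed and object names are illustrative.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
population   <- rnorm(n = 10000, mean = 130, sd = 10)
sample.sizes <- c(1, 5, 10, 20, 50, 100)
iteration    <- 100

## For each sample size, draw `iteration` samples (without replacement)
## and keep only each sample's mean: an iteration x length(sample.sizes) matrix
sample.means.matrix <- vapply(
    sample.sizes,
    function(n) replicate(iteration, mean(sample(population, size = n))),
    numeric(iteration)
)
colnames(sample.means.matrix) <- paste("n", sample.sizes, sep = "_")

dim(sample.means.matrix)   # 100 rows, 6 columns: 600 sample means in total
```

`vapply()` is used instead of `sapply()` because it checks that every sample size returns exactly `iteration` means.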

graph.out <- sem.simulator(population, sample.sizes = c(1, 5, 10, 20, 50, 100))
graph.out

[Figure: histograms of the sample means, one panel per sample size (n_1, n_5, n_10, n_20, n_50, n_100)]

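As the sample size grows, the standard error of the mean (SEM) becomes smaller, and the histograms narrow accordingly. The SEM is \( \mathrm{SEM} = \frac{\sigma}{\sqrt{n}} \), where \( \sigma \) is the population standard deviation and \( n \) is the sample size. In practice it is estimated by \( \frac{s}{\sqrt{n}} \), where \( s \) is the sample standard deviation. Therefore, the means of larger samples cluster more closely around the population mean, which makes them more precise estimates of it.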
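The formula \( \sigma/\sqrt{n} \) can be checked numerically. For this population \( \sigma \approx 10 \), so the theoretical SEM is \( 10/\sqrt{n} \): 10 for \( n = 1 \), about 3.16 for \( n = 10 \), and 1 for \( n = 100 \). The sketch below compares \( \sigma/\sqrt{n} \) with the empirical standard deviation of simulated sample means. It is an illustration rather than part of the original analysis. It uses 2000 samples per size, an assumption chosen so that the empirical values are stable.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
population   <- rnorm(n = 10000, mean = 130, sd = 10)
sample.sizes <- c(1, 5, 10, 20, 50, 100)
iteration    <- 2000  # more samples than above, for a stable estimate

## Empirical SEM: standard deviation of the simulated sample means
empirical.sem <- vapply(
    sample.sizes,
    function(n) sd(replicate(iteration, mean(sample(population, size = n)))),
    numeric(1)
)

## Theoretical SEM: population standard deviation divided by sqrt(n)
theoretical.sem <- sd(population) / sqrt(sample.sizes)

print(round(data.frame(n = sample.sizes,
                       theoretical = theoretical.sem,
                       empirical = empirical.sem), 2))
```

Because `sample()` draws without replacement from a finite population of 10,000, the exact SEM includes a finite-population correction factor \( \sqrt{(N-n)/(N-1)} \). For \( n \le 100 \) this factor is at least 0.995, so it can be ignored here.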