1 Introduction

The standard error of the mean is estimated by dividing the sample standard deviation by the square root of the sample size, but where does that square root come from? It follows from De Moivre’s equation; originally, statisticians had thought the standard error would be proportional to 1/n. I am going to use simulations to show that it depends on the square root of n, where n is the sample size, and not on n itself.

1.1 The First Simulation

The first simulation takes the mean of a roll of ten dice and repeats this 10,000 times. This constructs a sampling distribution that shows the variation between samples and the overall mean across all the samples.

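# Means of 10,000 samples, each a roll of ten six-sided dice
y <- numeric(10000)
for (i in 1:10000){
  rolls <- sample(1:6, size=10, replace=TRUE)
  y[i] <- mean(rolls)
}
hist(y, main="Histogram of the Mean of Ten Dice", xlab="Mean", ylab="Density", freq=FALSE)
# Overlay a normal curve with the same mean and standard deviation as the simulated means
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
sd(y)  # standard deviation of the 10,000 sample means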

The mean of any number of fair dice should be 3.5, since this is the midpoint of the discrete uniform distribution on 1 to 6. The histogram is centred on this value. With 10,000 repeats the histogram is also smooth enough to see that the sample means are approximately normally distributed.

In this case the sample size is 10, but what is the standard deviation between these 10,000 sample means?

The standard deviation of the mean is 0.54 when the sample size is 10.
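As a cross-check on the simulated value, the same figure can be worked out directly from the formula in the introduction. This is only a minimal sketch, assuming a fair six-sided die; the names `faces`, `sigma_die`, `n` and `se_theory` are illustrative and do not appear in the simulation code.

```r
# Population standard deviation of a single fair six-sided die (fairness is assumed)
faces <- 1:6
sigma_die <- sqrt(mean((faces - mean(faces))^2))  # sqrt(35/12), about 1.71

# Theoretical standard error of the mean for a sample of ten dice
n <- 10
se_theory <- sigma_die / sqrt(n)
round(se_theory, 2)  # 0.54
```

The result agrees with the standard deviation of the simulated means to two decimal places.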

1.2 The Second Simulation

I am going to repeat the process, but this time the sample size will be 1000 dice, again with 10,000 repeats of the sampling to get the sampling distribution of the means.

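# Means of 10,000 samples, each a roll of 1000 six-sided dice
y <- numeric(10000)
for (i in 1:10000){
  rolls <- sample(1:6, size=1000, replace=TRUE)
  y[i] <- mean(rolls)
}
hist(y, main="Histogram of the Mean of a Thousand Dice", xlab="Mean", ylab="Density", freq=FALSE)
# Overlay a normal curve with the same mean and standard deviation as the simulated means
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
sd(y)  # standard deviation of the 10,000 sample means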

As before, but in this case the sample size is 1000. What is the standard deviation between these 10,000 sample means?

The standard deviation of the mean is 0.054 when the sample size is 1000.

This is one tenth of the standard deviation of the sampling distribution when the sample size was 10. The sample size grew by a factor of 100 (1000 is 100 times 10) and the square root of 100 is 10, which suggests that the standard error of the mean scales with the square root of the sample size and not with the sample size itself.
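Two sample sizes give only one ratio, so it may help to look at a few more. The sketch below is an illustration, not part of the original analysis: the sample sizes in `sizes`, the reduced repeat count `reps` and the seed are all arbitrary choices made here to keep it quick and reproducible.

```r
set.seed(1)                    # arbitrary seed, only so the sketch is reproducible
sizes <- c(10, 40, 160, 640)   # each size is 4 times the previous one
reps <- 2000                   # fewer repeats than the 10,000 used above

# Standard deviation of the sample means for each sample size
sd_means <- sapply(sizes, function(n) {
  sd(replicate(reps, mean(sample(1:6, size = n, replace = TRUE))))
})

# If the spread scales with 1/sqrt(n), quadrupling n should roughly halve it,
# and sd_means * sqrt(sizes) should stay near the sd of a single die (about 1.71)
data.frame(n = sizes, sd_of_means = sd_means, scaled = sd_means * sqrt(sizes))
```

If the standard error were proportional to 1/n instead, each fourfold increase in sample size would cut the spread to a quarter and the `scaled` column would shrink steadily.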

These simulations have shown that this holds for simulated dice, but what about real data such as heights and their sampling distributions?

2 Simulating the Sampling Distribution of Height Data

I am going to use a normally distributed dataset with a mean of 178 cm and a standard deviation of 3 cm. Those are my population parameters. I can now generate samples from this population.

2.1 Height Simulation One

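# Means of 10,000 samples, each of ten heights drawn from N(178, 3)
y <- numeric(10000)
for (i in 1:10000){
  heights <- rnorm(10, mean=178, sd=3)
  y[i] <- mean(heights)
}
hist(y, main="Histogram of the Mean of Ten Heights", xlab="Mean", ylab="Density", freq=FALSE)
# Overlay a normal curve with the same mean and standard deviation as the simulated means
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
sd(y)  # standard deviation of the 10,000 sample means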

For this sampling distribution the standard deviation of the mean is 0.95 when the sample size is 10.

2.2 Height Simulation Two

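# Means of 10,000 samples, each of 1000 heights drawn from N(178, 3)
y <- numeric(10000)
for (i in 1:10000){
  heights <- rnorm(1000, mean=178, sd=3)
  y[i] <- mean(heights)
}
hist(y, main="Histogram of the Mean of a Thousand Heights", xlab="Mean", ylab="Density", freq=FALSE)
# Overlay a normal curve with the same mean and standard deviation as the simulated means
curve(dnorm(x, mean=mean(y), sd=sd(y)), add=TRUE)
sd(y)  # standard deviation of the 10,000 sample means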

Now the sample size is 1000 and the standard deviation of the mean is 0.096 for this new sampling distribution.

This is again about one tenth of the value for the sample of ten. But the population standard deviation was 3, so why is the standard error of the mean about 0.95 for a sample of ten and about 0.095 for a sample of a thousand?

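That is because the population standard deviation describes the spread of individual observations, in effect samples of size 1. A sample of size 10 is 10 times bigger, and averaging its values shrinks the spread of the mean by a factor of \(\sqrt{10}\), not by a factor of 10.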
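The square root can be traced to how variances combine. The following is a short sketch of the standard argument, assuming the observations \(X_1, \dots, X_n\) are independent and share the population standard deviation \(\sigma\); the symbols \(X_i\), \(\bar{X}\) and \(\sigma\) are notation introduced here for illustration.

\[
\operatorname{Var}(\bar{X})
= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
= \frac{n\sigma^2}{n^2}
= \frac{\sigma^2}{n},
\qquad\text{so}\qquad
\operatorname{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}.
\]

The variance of the mean falls in proportion to 1/n, and taking the square root to return to the original units gives the \(1/\sqrt{n}\) behaviour seen in the simulations.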

That means the expected standard error for the sample of size 10 is: \(\dfrac{3}{\sqrt{10}} \approx 0.95\)
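The same arithmetic can be done for both sample sizes at once. This is a minimal sketch using the population standard deviation of 3 cm given earlier; the variable names `sigma`, `n` and `se_expected` are illustrative.

```r
sigma <- 3          # population standard deviation in cm, from the text
n <- c(10, 1000)    # the two sample sizes used in the height simulations

# Expected standard error of the mean for each sample size
se_expected <- sigma / sqrt(n)
round(se_expected, 3)  # 0.949 0.095
```

Both values are close to the standard deviations found in the two height simulations.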

The standard deviation of this sampling distribution is the standard deviation of the sampled means and so it is called the standard error of the mean.