According to the law of large numbers (read more here: https://en.wikipedia.org/wiki/Law_of_large_numbers), the average of the results obtained from a large number of trials will tend to get closer to the expected value as the number of trials grows. Likewise, if the population we are sampling from is normally distributed, our sample will look more and more like a normal distribution as we increase the sample size \(n\).
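
As a quick illustration of the law of large numbers (a minimal sketch that is not part of the original example; the seed and the distribution parameters below are arbitrary), we can plot the running mean of repeated random draws and watch it settle around the expected value:

# Sketch: the running mean of repeated draws settles towards the expected value
set.seed(1)                                   # arbitrary seed, for reproducibility
draws <- rnorm(n = 10000, mean = 10, sd = 2)  # arbitrary distribution and parameters
running_mean <- cumsum(draws) / seq_along(draws)
plot(running_mean, type = "l", xlab = "Number of draws", ylab = "Running mean")
abline(h = 10, lty = 2)                       # the expected value of the draws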

Let’s simulate collecting data about height from \(n\) people. For the sake of this example, let’s assume that these people come from a population whose mean height is \(\mu\) = 180 cm, with a standard deviation of \(\sigma\) = 5 cm (i.e., around 2/3 of the sample will have a height between 175 and 185 cm). Of course, in real life we never know the population mean and standard deviation, but we can estimate them from the sample mean \(\bar{x}\) and sample standard deviation \(s\).
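
As a small sketch (not part of the original tutorial; the seed and sample size below are arbitrary), this is how \(\bar{x}\) and \(s\) would be computed from a simulated sample using mean() and sd():

# Sketch: estimating the (in practice unknown) population parameters from a sample
set.seed(123)                                  # arbitrary seed, for reproducibility
heights <- rnorm(n = 1000, mean = 180, sd = 5)
mean(heights)   # sample mean x-bar, should land close to mu = 180
sd(heights)     # sample standard deviation s, should land close to sigma = 5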

We use the rnorm() function to randomly sample from a theoretical population that follows a normal distribution. Let’s start by drawing a single sample of \(n\) = 10 observations. We can plot the result as a histogram directly by “wrapping” the rnorm() call within the hist() function:

hist(rnorm(n = 10, mean = 180, sd = 5), breaks = 10)

As you can see, the distribution doesn’t look very normal; with so few observations it looks rather irregular. Let’s try again, this time simulating \(n\) = 50:

hist(rnorm(n = 50, mean = 180, sd = 5), breaks = 50)

We start getting hints of a normal distribution. As we increase \(n\), the histogram looks more and more like a normal distribution:

hist(rnorm(n = 100, mean = 180, sd = 5), breaks = 100)

hist(rnorm(n = 1000, mean = 180, sd = 5), breaks = 1000)

At \(n\) = 10,000, the distribution looks almost perfectly normal!

hist(rnorm(n = 10000, mean = 180, sd = 5), breaks = 10000)
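
One way to see how closely the simulated data match the theoretical distribution (a sketch that goes slightly beyond the original code; the object name heights is arbitrary) is to overlay the corresponding normal density curve on a density-scaled histogram:

# Sketch: comparing the simulated sample with the theoretical normal density
heights <- rnorm(n = 10000, mean = 180, sd = 5)
hist(heights, breaks = 100, freq = FALSE)                 # density scale, not counts
curve(dnorm(x, mean = 180, sd = 5), add = TRUE, lwd = 2)  # theoretical density curve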