Demonstration of what normally distributed data looks like

Introduction

This document gives a short example of what data from a normal distribution looks like. The aim is to show that even when you know that the underlying population is normally distributed, the actual data might not fit perfectly.

Generating random data

Even when the population is normally distributed, a random sample from that population might not look perfectly normally distributed. This is particularly true for small samples. As the sample size gets bigger, the sample will more closely represent the population that it is drawn from.

To show this, I will generate four samples of different sizes from a normal distribution.
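As a minimal sketch of this step, the Python below draws four samples using only the standard library. The sample sizes, the mean and standard deviation of the population, and the seed are all assumptions made for illustration, since none of them are stated in the text.

```python
import random
import statistics

# Assumed values: the text does not give the sample sizes or the
# parameters of the normal distribution.
SAMPLE_SIZES = [10, 50, 250, 1000]
MEAN, SD = 0.0, 1.0

rng = random.Random(1)  # fixed seed (arbitrary) so the run is repeatable

# One independent sample per size, all drawn from the same population.
samples = {n: [rng.gauss(MEAN, SD) for _ in range(n)] for n in SAMPLE_SIZES}

for n, data in samples.items():
    print(f"n={n:5d}  mean={statistics.fmean(data):+.3f}  "
          f"sd={statistics.stdev(data):.3f}")
```

The printed means and standard deviations give a rough numerical check: for the smaller samples they can sit noticeably away from the population values, even though every sample comes from the same distribution.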

From the plots we can see that the data doesn’t perfectly fit the normal distribution in either the histograms or the quantile plots. We can also see that the fit gets better as the samples get bigger. This shows why it’s important to make allowances for the sample size when interpreting histograms and quantile plots.
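To make the idea behind a quantile plot concrete, here is a hedged sketch of how its points could be computed by hand. The helper `qq_points` is a hypothetical name, the plotting positions `(i - 0.5) / n` are one common convention among several, and the sample sizes and seed are assumptions.

```python
import random
from statistics import NormalDist

def qq_points(data):
    """Return (theoretical, observed) pairs for a normal quantile plot."""
    n = len(data)
    std_normal = NormalDist(0.0, 1.0)
    # Plotting positions (i - 0.5) / n: an assumed convention; others exist.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    # Pair the k-th smallest observation with the k-th theoretical quantile.
    return list(zip(theoretical, sorted(data)))

rng = random.Random(1)  # arbitrary fixed seed
for n in (10, 1000):    # assumed sample sizes
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    points = qq_points(data)
    # Average vertical distance of the points from the line y = x.
    mean_gap = sum(abs(obs - theo) for theo, obs in points) / n
    print(f"n={n:5d}  mean |observed - theoretical| = {mean_gap:.3f}")
```

If the sample matched the standard normal distribution exactly, every point would lie on the line y = x and the average gap would be zero. In practice the gap tends to be larger for small samples, although any single run can vary.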

Histogram bin-widths

It’s also important to bear in mind that the width of the ‘bins’ used in the histogram can affect how ‘normal’ the data looks. We can illustrate this by plotting the same data several times, changing the bin-width each time:
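The effect can be sketched without a plotting library by printing text histograms of one sample at several bin-widths. The helper `histogram_counts` is a hypothetical name, and the sample size, bin-widths and seed are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter

def histogram_counts(data, bin_width):
    """Count observations per bin; bin k covers [k * w, (k + 1) * w)."""
    return Counter(math.floor(x / bin_width) for x in data)

rng = random.Random(1)                            # arbitrary fixed seed
data = [rng.gauss(0.0, 1.0) for _ in range(200)]  # assumed sample size

# The same data is binned three times; only the bin-width changes.
for bin_width in (0.25, 0.5, 1.0):                # assumed bin-widths
    counts = histogram_counts(data, bin_width)
    print(f"\nbin width = {bin_width}")
    for k in range(min(counts), max(counts) + 1):
        left_edge = k * bin_width
        print(f"{left_edge:+6.2f} | {'#' * counts[k]}")
```

Narrow bins tend to give a ragged outline with gaps and spikes, while wide bins smooth the shape but hide detail. The underlying data is identical in every case.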

This gives us some sense of how the number of bins used in the histogram can make the distribution of the data look subtly different. Remember that we know that the underlying population is a normal distribution here, because we used a computer to randomly select our data from a normal distribution.