Only one way of getting 2 (1 + 1), but six ways of getting 7
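To check the counts, we can list all 36 equally likely outcomes of two fair six-sided dice and count how many give each total. This is a minimal Python sketch; the names `ways` and `total` are just illustrative:

```python
from collections import Counter
from itertools import product

# Every (die1, die2) pair is equally likely; count how many pairs give each total.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in range(2, 13):
    print(total, ways[total])
```

The totals 2 and 12 can each be made in only one way, while 7 can be made in six ways, which is why 7 comes up most often.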
Let's throw the two dice 10,000 times. What distribution do we get?
Not quite normal: the totals form a triangle peaking at 7
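The experiment can be simulated in a few lines. This sketch assumes fair dice and uses Python's `random` module with a fixed seed (an arbitrary choice) so the run is repeatable; the text histogram is only a rough stand-in for the chart on the slide:

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the run is repeatable

N_THROWS = 10_000

# Each throw is the sum of two independent fair six-sided dice.
totals = [random.randint(1, 6) + random.randint(1, 6) for _ in range(N_THROWS)]
counts = Counter(totals)

# Crude text histogram: one '#' per 50 throws.
for total in range(2, 13):
    print(f"{total:>2} {counts[total]:>5} {'#' * (counts[total] // 50)}")
```

The bars rise steadily to 7 and fall steadily after it: a triangle rather than a smooth bell curve.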
Average the score over three games
Last slide was one game repeated 10,000 times
Now, let's average over three games, 10,000 times
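A sketch of the three-game version, again assuming fair dice and an arbitrary fixed seed. The helper `one_game` is a hypothetical name for "throw two dice and add them":

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the run is repeatable

N_REPEATS = 10_000
N_GAMES = 3

def one_game() -> int:
    """Score for one game: the total of two fair six-sided dice."""
    return random.randint(1, 6) + random.randint(1, 6)

# Each entry is the average score over three games.
averages = [sum(one_game() for _ in range(N_GAMES)) / N_GAMES for _ in range(N_REPEATS)]

# Averages of three whole numbers are multiples of 1/3; round to 2 d.p. for binning.
counts = Counter(round(avg, 2) for avg in averages)

# Crude text histogram: one '#' per 25 repeats.
for value in sorted(counts):
    print(f"{value:>5.2f} {'#' * (counts[value] // 25)}")
```

The averages still centre on 7, but the histogram now has the rounded shoulders and thinner tails of a bell curve.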
An approximately normal distribution: the central limit theorem in action
But why does real world data often fit a normal distribution?
The central limit theorem
If you take repeated samples from a population and calculate each sample's average, these averages will be approximately normally distributed, and the approximation gets better as the sample size grows.
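The theorem also says the averages centre on the population mean and their spread shrinks like σ/√n, where σ is the population standard deviation and n the sample size. A sketch checking this, assuming the "population" is single rolls of a fair die (mean 3.5, σ = √(35/12) ≈ 1.71); `sample_means` is a hypothetical helper:

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

POP_SD = math.sqrt(35 / 12)  # standard deviation of one fair die roll

def sample_means(sample_size: int, n_samples: int = 10_000) -> list[float]:
    """Averages of repeated samples, each made of `sample_size` die rolls."""
    return [
        sum(random.randint(1, 6) for _ in range(sample_size)) / sample_size
        for _ in range(n_samples)
    ]

for n in (1, 3, 10, 30):
    means = sample_means(n)
    print(
        f"n={n:>2}  mean of averages={statistics.fmean(means):.3f}  "
        f"sd of averages={statistics.stdev(means):.3f}  "
        f"predicted sd={POP_SD / math.sqrt(n):.3f}"
    )
```

The mean of the averages stays near 3.5 for every n, while their standard deviation tracks σ/√n.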
Real-world data is often the result of many processes combining, so in effect it is an average (or sum) of many small effects
Think of all the reasons that you are the height you are.
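A toy illustration of that idea, not real height data: assume height is a baseline plus many small, independent nudges, each uniform between -2 cm and +2 cm. The baseline of 170 cm and the 40 factors are made-up numbers. Even though no single factor is normal, the total behaves like a normal distribution, with about 68% of values within 1 standard deviation of the mean:

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

N_PEOPLE = 10_000
N_FACTORS = 40      # hypothetical number of small, independent influences
BASELINE_CM = 170.0  # hypothetical baseline height

def toy_height() -> float:
    """Toy model: baseline plus many small uniform nudges, in cm."""
    return BASELINE_CM + sum(random.uniform(-2.0, 2.0) for _ in range(N_FACTORS))

heights = [toy_height() for _ in range(N_PEOPLE)]
mu = statistics.fmean(heights)
sd = statistics.stdev(heights)

# For a normal distribution, roughly 68% of values lie within 1 SD of the mean.
within_1sd = sum(abs(h - mu) <= sd for h in heights) / N_PEOPLE
print(f"mean={mu:.1f} cm  sd={sd:.1f} cm  within 1 sd={within_1sd:.1%}")
```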
The normal distribution is well studied
It's symmetrical, so half of the values are below the mean and half above
We know what share of values falls in each part of the curve, e.g. about 16% of samples will be more than 1 standard deviation above the mean.
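Both facts can be read off the standard normal distribution (mean 0, standard deviation 1). A minimal sketch using Python's built-in `statistics.NormalDist`, where `cdf(x)` gives the share of values below `x`:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # mean 0, standard deviation 1

below_mean = std_normal.cdf(0.0)     # symmetry: half of values lie below the mean
above_1sd = 1 - std_normal.cdf(1.0)  # share more than 1 SD above the mean

print(f"below the mean: {below_mean:.2f}")
print(f"more than 1 SD above the mean: {above_1sd:.3f}")
```

The second figure is about 0.159, i.e. roughly 16%.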
Because we know this, we can work out which values we would expect from chance alone. The fact that 95% of values (remember 0.05?) lie within 1.96 standard deviations of the mean is often used in statistical tests.
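The 1.96 comes from asking where to cut the standard normal curve so that 2.5% is left in each tail, 5% in total. A sketch with `statistics.NormalDist`, where `inv_cdf(p)` returns the value below which a share `p` of the distribution lies:

```python
from statistics import NormalDist

std_normal = NormalDist()  # defaults: mean 0, standard deviation 1

z = std_normal.inv_cdf(0.975)  # cut-off leaving 2.5% in the upper tail
share_within = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

print(f"cut-off for 95% two-sided: {z:.2f}")
print(f"share within 1.96 SD: {share_within:.3f}")
```

A result further than 1.96 standard deviations from the mean would therefore turn up by chance only about 5% of the time.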