Content you should have understood before watching this video:
- Number 3, ‘Variation in Data’
- Number 4, ‘Basic Statistical Metrics’
- Number 5, ‘Standard Deviation and Standard Error’
- Number 7, ‘Distributions’
The normal distribution
The normal distribution is symmetrical, and both its tails extend infinitely
The two parameters are the mean and the standard deviation
The standard normal distribution has mean 0 and standard deviation 1
In R, you can create normally distributed random numbers using the function rnorm()
The normal distribution is of paramount importance! (Central Limit Theorem, assumptions of standard parametric tests)
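A minimal sketch of rnorm() in action, using base R only. The seed and the sample size are arbitrary choices for illustration. The code draws from the standard normal distribution and checks three things: the sample mean and standard deviation land close to the two parameters, the density is symmetric around the mean, and changing the parameters shifts and stretches the curve.

```r
# Arbitrary seed so the random draws are reproducible
set.seed(42)

# 10,000 draws from the standard normal distribution (mean 0, sd 1)
z <- rnorm(n = 10000, mean = 0, sd = 1)

# The sample statistics should be close to the two parameters
mean(z)  # close to 0
sd(z)    # close to 1

# Symmetry: the density at -1 equals the density at +1
all.equal(dnorm(-1), dnorm(1))

# Other parameters: the mean shifts the curve, the sd stretches it
heights <- rnorm(n = 10000, mean = 160, sd = 6)
mean(heights)  # close to 160
sd(heights)    # close to 6
```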
From quartiles to quantiles
For a standard normal distribution:
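Quartiles are simply the 25 %, 50 % and 75 % quantiles. The sketch below uses base R. For the standard normal distribution, qnorm() returns any quantile directly. The empirical quantiles of a large simulated sample come out close to these theoretical values. The seed and the sample size are arbitrary.

```r
# Quartiles of the standard normal distribution (25 %, 50 %, 75 %)
qnorm(p = c(0.25, 0.50, 0.75))   # roughly -0.674, 0, 0.674

# Any other quantile works the same way, e.g. the 2.5 % and 97.5 % quantiles
qnorm(p = c(0.025, 0.975))       # roughly -1.96, 1.96

# Empirical quantiles of a large simulated sample approximate these values
set.seed(1)
z <- rnorm(n = 100000)
quantile(z, probs = c(0.025, 0.25, 0.50, 0.75, 0.975))
```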
Playing with a simple data set to compute probabilities and quantiles
- Let’s retrieve a simple data set: values of body height together with sex (female/male)
- How is the variable ‘body height’ distributed? (Histogram!)
- How often do we expect a value of 150, 170 or 190 cm or less for females/males?
- To answer these questions, we approximate the distribution of female/male body height using the normal distribution!
Examples with human body height
Females: mean = 160 cm, sd = 6 cm; males: mean = 170 cm, sd = 7 cm
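With these parameters, pnorm() answers the question "how often do we expect a value of 150, 170 or 190 cm or less?" for both sexes under the normal approximation. The real data set is not reproduced here. The histogram part therefore uses hypothetical simulated female heights in its place. The sample size, the seed and the variable names are assumptions made for illustration only.

```r
# Parameters from the slide: females N(160, 6), males N(170, 7)
heights_cm <- c(150, 170, 190)

# P(height <= x) for females and males, under the normal approximation
p_female <- pnorm(q = heights_cm, mean = 160, sd = 6)
p_male   <- pnorm(q = heights_cm, mean = 170, sd = 7)
round(data.frame(height = heights_cm, female = p_female, male = p_male), 4)

# Hypothetical simulated data standing in for the real data set:
# a histogram on the density scale, with the normal curve drawn on top
set.seed(7)
female_heights <- rnorm(n = 500, mean = 160, sd = 6)
hist(female_heights, freq = FALSE,
     main = "Simulated female body height", xlab = "Body height (cm)")
curve(dnorm(x, mean = 160, sd = 6), add = TRUE)
```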
What is the probability that a woman is shorter than 175 cm?
pnorm(q = 175, mean = 160, sd = 6)
[1] 0.9937903
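By default, pnorm() returns the area in the left (lower) tail. The sketch below uses base R to show two ways of getting the right tail: the lower.tail argument, or subtracting from 1. It also shows that standardizing to a z-score first gives the same lower-tail probability on the standard normal distribution.

```r
# Upper tail: probability that a woman is taller than 175 cm
pnorm(q = 175, mean = 160, sd = 6, lower.tail = FALSE)
1 - pnorm(q = 175, mean = 160, sd = 6)   # same value

# Same lower-tail probability via the standard normal: z = (175 - 160) / 6 = 2.5
pnorm(q = (175 - 160) / 6)
```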
Below what height will we find 95 % of the male population?
qnorm(p = .95, mean = 170, sd = 7)
[1] 181.514
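The sketch below uses base R and shows two things. First, the same quantile can be reached through the standard normal as mean + z * sd. Second, pnorm() and qnorm() are inverses of each other: feeding the quantile back into pnorm() returns the original probability.

```r
# The 95 % quantile of the male distribution
q95 <- qnorm(p = 0.95, mean = 170, sd = 7)

# Equivalent route via the standard normal: mean + z * sd, with z = qnorm(0.95)
170 + qnorm(p = 0.95) * 7

# pnorm() undoes qnorm(): the quantile maps back to the probability 0.95
pnorm(q = q95, mean = 170, sd = 7)
```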
Tricky one:
Between which two values will we find the middle 70% of women's body heights?
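Here is one possible way to approach it, assuming the question means the central 70 % of the distribution for women (mean = 160 cm, sd = 6 cm). Keeping 70 % in the middle leaves 15 % in each tail. The two limits are therefore the 15 % and 85 % quantiles. The variable names are illustrative only.

```r
# 70 % in the middle leaves (1 - 0.70) / 2 = 15 % in each tail
tail_prob <- (1 - 0.70) / 2
limits <- qnorm(p = c(tail_prob, 1 - tail_prob), mean = 160, sd = 6)
limits   # roughly 153.8 and 166.2 cm

# Check: the probability mass between the two limits is 0.70
diff(pnorm(q = limits, mean = 160, sd = 6))
```

Because the curve is symmetrical, the two limits sit at the same distance from the mean.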
The most important in a nutshell
- We need to be able to read a probability density plot and have a sense of how rare or common an outcome is, given a certain mean and standard deviation
- We need to know how to work out probabilities and quantiles with pnorm (p for probability) and qnorm (q for quantile) when dealing with a normal distribution
- pnorm and qnorm are inverse functions of each other
- Be mindful of whether you are after the left or the right tail under the curve (pnorm returns the left tail by default; use lower.tail = FALSE or 1 - pnorm() for the right tail)
- Make a sketch and shade the probability you are after, that will help!