Content you should have understood before watching this video:
- Number 1, ‘Variables’
- Number 2, ‘Variation’
- Number 3, ‘Measuring Variation’
- Number 4, ‘Standard Deviation and Standard Error’
What is a distribution in a statistical sense?
Distributions in R
# Histogram of 1000 standard normal draws; only the x axis is drawn
hist(rnorm(1000), freq = FALSE, xlab = "Score or Quantile",
     ylab = "Density or Probability", main = "",
     axes = FALSE, cex.lab = 1.5)
axis(1, cex.axis = 1.5)
The uniform distribution
- Parameters to specify are the minimum and the maximum
- Equal probabilities between the min and the max
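A minimal sketch of simulating uniformly distributed numbers with runif(). The minimum of 0, the maximum of 10, the sample size, and the seed are assumed values chosen only for illustration:

```r
set.seed(42)  # arbitrary seed, only so the draws are reproducible

# 1000 draws from a uniform distribution between 0 and 10 (assumed values)
x <- runif(1000, min = 0, max = 10)

range(x)  # every value lies between the minimum and the maximum

# Bars of roughly equal height: each interval is about equally likely
hist(x, xlab = "Score", main = "")
```

Changing min and max shifts or stretches the range, but the bars stay roughly level.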
The Poisson distribution
- Parameter to specify is lambda (\(\lambda\)), which represents both the mean and the variance
- Count data: only non-negative integers (0, 1, 2, ...), so for discrete numeric variables such as counts of events
The Poisson distribution - how can lambda be the mean AND the variance?
If lambda (\(\lambda\)) increases, both the mean AND the spread get larger! Let us illustrate this with an example:
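One way to check this is by simulation with rpois(). The sketch below uses three lambda values (1, 5 and 20) that are assumptions for illustration, not values from the lecture, and compares the sample mean and sample variance for each:

```r
set.seed(1)  # arbitrary seed for reproducibility

lambdas <- c(1, 5, 20)  # illustrative values, not taken from the lecture

# For each lambda, draw 10000 counts and record the sample mean and variance
sims <- sapply(lambdas, function(l) {
  counts <- rpois(10000, lambda = l)
  c(lambda = l, mean = mean(counts), variance = var(counts))
})

# One column per lambda: mean and variance should both be close to lambda
round(sims, 2)
```

As lambda grows, the counts are centred on a larger value and are also more spread out, which is what a shared mean and variance implies.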
The normal distribution
The normal distribution is symmetrical, and both of its tails extend infinitely
The two parameters are the mean and the standard deviation
The standard normal distribution has mean 0 and standard deviation 1
In R, you can create normally distributed random numbers using the function rnorm()
The normal distribution is of particular importance! (Central Limit Theorem, assumptions of standard parametric tests)
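A short sketch of rnorm() in use. With no further arguments it draws from the standard normal distribution. The mean of 50 and standard deviation of 10 in the second call are assumed example values:

```r
set.seed(123)  # arbitrary seed for reproducibility

# Defaults are mean = 0 and sd = 1, i.e. the standard normal distribution
z <- rnorm(1000)

# The same function with an assumed mean of 50 and standard deviation of 10
x <- rnorm(1000, mean = 50, sd = 10)

c(mean = mean(z), sd = sd(z))  # close to 0 and 1
c(mean = mean(x), sd = sd(x))  # close to 50 and 10
```

The sample values will not match the parameters exactly, because each run is a random sample.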
The Central Limit Theorem
‘The sampling distribution of the sample means of any distribution (with finite variance) approaches a normal distribution as the sample size gets larger!’
Let’s visualise this:
https://seeing-theory.brown.edu/probability-distributions/index.html#section3
The ‘seeing theory’ page is absolutely great by the way!
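The same idea can also be sketched directly in R. The example below is only an illustration under assumed settings: it draws repeated samples from a skewed Poisson distribution with lambda = 1, and compares the distribution of the sample means for a small sample size (n = 2) and a larger one (n = 50). The helper name sample_means is hypothetical.

```r
set.seed(2024)  # arbitrary seed for reproducibility

# Hypothetical helper: draw `reps` samples of size n from a Poisson
# distribution with lambda = 1 and return the mean of each sample
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(rpois(n, lambda = 1)))
}

means_small <- sample_means(n = 2)   # small samples: means are still skewed
means_large <- sample_means(n = 50)  # larger samples: means look bell-shaped

# Two histograms side by side
par(mfrow = c(1, 2))
hist(means_small, main = "n = 2", xlab = "Sample mean")
hist(means_large, main = "n = 50", xlab = "Sample mean")
```

The histogram for n = 50 should look much closer to a normal curve, and it is also narrower, because means of larger samples vary less.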
The most important points in a nutshell
- To see what distribution a variable might follow, you have to look at its histogram
- A distribution can be visualised with a histogram (looks like a city skyline) OR a density plot (looks like a curve)
- You should be familiar with a few common distributions - the uniform, the Poisson, and definitely the normal distribution
- You should know their parameters and be able to simulate numbers in R that follow those distributions
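To make the difference between a histogram and a density plot concrete, here is a minimal sketch that draws both for the same simulated data. The sample size and the choice of a standard normal sample are assumptions for illustration:

```r
set.seed(7)  # arbitrary seed for reproducibility

x <- rnorm(1000)  # assumed example data: 1000 standard normal draws

# Histogram (the "city skyline"); freq = FALSE puts density on the y axis
hist(x, freq = FALSE, main = "", xlab = "Score")

# Density plot (the "curve"): a kernel density estimate drawn over the bars
lines(density(x), lwd = 2)
```

Replacing rnorm() with runif() or rpois() in the first line shows how the shape changes for the other distributions.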