Content you should have understood before watching this video:
- Number 2, ‘Variables’
- Number 3, ‘Variation in data’
- Number 4, ‘Basic statistical metrics’
- Number 5, ‘Standard deviation and standard error’
- Number 6, ‘Populations, samples, hypotheses’
- Number 7, ‘Distributions’
- Number 8, ‘Quantiles and probabilities’
Quick reminder
In terms of body height, between which limits do the central 95% of the male population fall? You need to be on top of these kinds of questions.
qnorm(p = .025, mean = 170, sd = 7)
[1] 156.2803
qnorm(p = .975, mean = 170, sd = 7)
[1] 183.7197
Survey on body height of female AUT students
From our survey, we get:
mean(d1$bodyheight[d1$sex == 'F'])
[1] 163.3333
length(d1$bodyheight[d1$sex == 'F']) #what does the function 'length()' do again?
[1] 87
From Wikipedia, we can learn that the average female New Zealander is 164 cm tall, with a standard deviation of 6 cm.
What question can we ask now?
Is our sample of body heights of female AUT students a ‘typical’ one?
If we want a more quantitative statement on this question
- We need to know whether our sample is ‘unusual’ or ‘normal’
- We need to know what ‘unusual’ or ‘normal’ means, so
- we need to quantify what is usual, normal, rare etc.!
- We need a testable hypothesis
So let’s try
Our hypothesis, called the ‘Null’ hypothesis
- Female AUT students are NOT different in terms of body height from the average New Zealand female (this is our so-called null hypothesis \(H_0\))
- Because we can stick with this hypothesis or reject it, we need an alternative hypothesis \(H_A\): Our students ARE different from a typical sample of New Zealanders
- Note that the Null hypothesis is negative, which makes it easier to falsify!
- Also note that we can never accept \(H_0\), we can only fail to reject \(H_0\)
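Written more formally, the two hypotheses could be stated as follows. This is only a sketch: the symbol \(\mu_{AUT}\) for the true mean body height of female AUT students is notation assumed here, and 164 cm is the Wikipedia value quoted above.

\[
H_0: \mu_{AUT} = 164 \text{ cm} \qquad H_A: \mu_{AUT} \neq 164 \text{ cm}
\]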
Are female AUT students typical NZers?
Now we need a test statistic and knowledge of the distribution we compare against:
Our test statistic is simply the mean of our sample:
mean(d1$bodyheight[d1$sex == 'F'])
[1] 163.3333
How does that compare with
pop = rnorm(2000000, mean = 164, sd = 6) #why 2000000?
mean(pop)
[1] 164.0014
YES, our female students are typical New Zealanders in terms of body height!
We can tell by just looking at how our sample mean compares to the distribution of NZ female body height. More quantitatively…:
The probability of obtaining a value equal to or smaller than the mean we got for our female students when sampling from the NZ population is about 46%, i.e. close to 50%:
pnorm(q = mean(d1$bodyheight[d1$sex == 'F']), mean = 164, sd = 6)
[1] 0.4557641
In other words our sample is NOT unusual and hence we cannot reject our null hypothesis!
We have no evidence that our students are different from a typical sample of New Zealanders
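The same probability can also be approximated empirically from a simulated population like the one generated earlier, which may help to see what pnorm() is computing. This is a minimal sketch: the seed and the variable names are assumptions made for illustration, and the sample mean is typed in as 163.3333 because the survey data `d1` is not available here.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

# Simulated NZ female population, as in the lecture
pop <- rnorm(2000000, mean = 164, sd = 6)

# Sample mean of the female AUT students, copied from the survey output
sample_mean <- 163.3333

# Share of simulated individuals at or below the sample mean
empirical_p <- mean(pop <= sample_mean)

# Exact lower-tail probability from the normal distribution
theoretical_p <- pnorm(q = sample_mean, mean = 164, sd = 6)

c(empirical = empirical_p, theoretical = theoretical_p)
```

With two million simulated values the two numbers should agree to about three decimal places.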
OK, but what if our sample mean had been different, say 160 cm, or 150 cm…?
pnorm(q = 160, mean = 164, sd = 6)
[1] 0.2524925
Is a value that we’d get 25% of the time by chance rare?
pnorm(q = 150, mean = 164, sd = 6)
[1] 0.009815329
Is a value that we’d get 1% of the time by chance rare?
P-values and statistical significance
- Those probabilities (46%, 25%, 1%) are our p-values. They are always to be interpreted as ‘the probability of obtaining a value at least this extreme by chance, if the null hypothesis is true’
- We need a threshold to objectively distinguish between ‘rare’ and ‘unusually rare’:
- Normally, we say that if our sample is more extreme than what we would find 5% of the time by chance, then our p-value is significant (in a statistical sense)
- This threshold is also called the \(\alpha\)-threshold
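To make the threshold concrete, it can be translated into a body height. The sketch below assumes the same lower-tail comparison used in the pnorm() calls above and the usual \(\alpha\) of 5%; the variable names are chosen here for illustration.

```r
alpha <- 0.05

# Height below which only 5% of the reference distribution falls (lower tail)
cutoff <- qnorm(p = alpha, mean = 164, sd = 6)
cutoff  # about 154.13 cm

# The three means considered so far: which ones fall below the cutoff?
candidate_means <- c(163.3333, 160, 150)
candidate_means < cutoff
```

Only the mean of 150 cm lies below the cutoff, which matches it being the only one with a p-value under 5%.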
In summary…
- We formulate a null hypothesis (\(H_0\)) and an alternative hypothesis (\(H_A\))
- We obtain one or several samples
- We calculate a metric (in our example this was simply the mean). This is our test statistic
- We then compare our test statistic against a random distribution of the same variable with known parameters (e.g. mean and standard deviation)
- If our sample is sufficiently ‘rare’ (i.e. past the \(\alpha\)-threshold), then we consider our test significant, i.e. we reject the null hypothesis and turn to \(H_A\)
Note that this protocol is VERY generic; it will differ slightly depending on what test you are performing. This is not (yet) a proper statistical test, just the general idea behind it.
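The steps above can be sketched as a small R function. `simple_test()` is a hypothetical helper written for this illustration, not a built-in R function and not a proper statistical test; it simply repeats the lower-tail comparison used in this lecture.

```r
# Hypothetical helper: compare a test statistic against a normal
# reference distribution and decide at the alpha-threshold
simple_test <- function(test_statistic, ref_mean, ref_sd, alpha = 0.05) {
  # Probability of a value this low or lower under the reference distribution
  p_value <- pnorm(q = test_statistic, mean = ref_mean, sd = ref_sd)
  # Reject H0 only if the value is rarer than the threshold
  decision <- ifelse(p_value < alpha, "reject H0", "fail to reject H0")
  data.frame(test_statistic, p_value, decision)
}

# The three means discussed in the lecture
result <- simple_test(c(163.3333, 160, 150), ref_mean = 164, ref_sd = 6)
result
```

Only the last row (150 cm) leads to rejecting \(H_0\); the other two are not rare enough.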
Type I vs. type II error
In all of this, we can make 2 types of errors!
- A type I error is when we falsely reject the null hypothesis \(H_0\)
- In plain language, this means that we call something ‘significant’ (e.g. a difference between 2 samples) while in reality there is no difference (or, more generally, ‘nothing going on’)
- A type II error is when we falsely fail to reject the null hypothesis \(H_0\)
- In plain language, this means that “we don’t see anything where in reality things (e.g. samples) are different”
Note that the ‘plain language’ definitions are inexact, but hopefully help you to understand the principle of type I/II errors
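A small simulation may help to see both error types. This is a sketch under stated assumptions: the seed, the number of simulated values and the alternative mean of 158 cm are made up for illustration, and the decision rule is the lower-tail comparison at \(\alpha\) = 5% used earlier.

```r
set.seed(1)       # assumed seed, for reproducibility only
alpha <- 0.05
n_sim <- 100000   # number of simulated test statistics

# Case 1: H0 is true, values really come from N(164, 6).
# Every rejection here is a type I error.
x_null <- rnorm(n_sim, mean = 164, sd = 6)
type1_rate <- mean(pnorm(x_null, mean = 164, sd = 6) < alpha)

# Case 2: H0 is false, values come from a hypothetical N(158, 6).
# Every failure to reject here is a type II error.
x_alt <- rnorm(n_sim, mean = 158, sd = 6)
type2_rate <- mean(pnorm(x_alt, mean = 164, sd = 6) >= alpha)

c(type1 = type1_rate, type2 = type2_rate)
```

The type I error rate comes out close to \(\alpha\) by construction, while the type II error rate depends on how far the true distribution is from the one assumed under \(H_0\).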
Maybe easier to remember…:
That was too much…a practical example
OK:
- Two boxes with pieces of paper with numbers written on them
- I claim that those numbers come from a standard normal distribution (this may or may not be true)
- Sanaa will test this:
- She states her \(H_0\): ‘The sample is no different from a standard normal distribution’
- She picks a number from each box
- She then compares the number (her test statistic) to a standard normal distribution, and asks ‘is it unusually low/high?’
- She then rejects or fails to reject her \(H_0\)
- She makes a decision whether box 1/box 2 actually contains numbers that follow a standard normal distribution!
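Sanaa's procedure could look roughly like this in R. The two picked numbers are hypothetical (they are not the values from the lecture), and because she asks about unusually low or high values, the sketch uses a two-sided p-value with \(\alpha\) = 5%.

```r
# Hypothetical numbers picked from box 1 and box 2
picked <- c(box1 = 0.4, box2 = 3.1)
alpha <- 0.05

# Two-sided p-value: probability of a value at least this far from 0,
# in either direction, under a standard normal distribution
p_values <- 2 * pnorm(q = -abs(picked), mean = 0, sd = 1)

# Decision for each box
decision <- ifelse(p_values < alpha, "reject H0", "fail to reject H0")
data.frame(picked, p_values, decision)
```

With these made-up numbers she would fail to reject \(H_0\) for box 1 and reject it for box 2; whether either decision is correct depends on what is really in the boxes.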
Again, how did Sanaa decide?
So what was the real story?
The most important points in a nutshell
- How to formulate a null and an alternative hypothesis
- The principle of a statistical test, using a simple test statistic, e.g. a mean
- Using that test statistic to make a frequentist statistical decision
- Understanding that in this decision we can be correct in two ways and wrong in two ways, the two errors being a type I and a type II error