11 - Error Types

Department of Environmental Science, AUT

Error Types: Prerequisites

Error Types

Content you should have understood before watching this video:

Number 2, ‘Variables’
Number 3, ‘Variation in data’
Number 4, ‘Basic statistical metrics’
Number 5, ‘Standard deviation and standard error’
Number 6, ‘Populations, samples, hypotheses’
Number 7, ‘Distributions’
Number 8, ‘Quantiles and probabilities’

Quick reminder

In terms of body height, between what limits do the central 95% of the male population fall (here assuming a mean of 170 cm and a standard deviation of 7 cm)? You need to be on top of this kind of question.

qnorm(p = .025, mean = 170, sd = 7)
[1] 156.2803
qnorm(p = .975, mean = 170, sd = 7)
[1] 183.7197
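
As a cross-check, both limits can be computed in one call and fed back into pnorm() to confirm that they enclose 95% of the distribution. This is only a small sketch; the variable names `limits` and `share` are made up for illustration, and the mean and standard deviation are the ones used above.

```r
# Lower and upper limit of the central 95% of a normal distribution
# with mean 170 and standard deviation 7 (values from the lines above)
limits <- qnorm(p = c(0.025, 0.975), mean = 170, sd = 7)
limits

# pnorm() reverses qnorm(): it turns the limits back into cumulative
# probabilities, and their difference is the share between the limits
share <- diff(pnorm(q = limits, mean = 170, sd = 7))
share
```

The two functions are inverses of each other: qnorm() goes from a probability to a value, pnorm() from a value to a probability.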

Survey on body height of female AUT students

From our survey, we get:

mean(d1$bodyheight[d1$sex == 'F']) 
[1] 163.3333
length(d1$bodyheight[d1$sex == 'F']) #what does the function 'length()' do again?
[1] 87

From Wikipedia, we can learn that the average female New Zealander is 164 cm tall, with a standard deviation of 6 cm

What question can we ask now?

Is our sample of body heights of female AUT students a ‘typical’ one?

If we want a more quantitative statement on this question

We need to know whether our sample is ‘unusual’ or ‘normal’
We need to know what ‘unusual’ or ‘normal’ means, so
- we need to quantify what is usual, normal, rare etc.!
We need a testable hypothesis

So let’s try

Our hypothesis, called the ‘Null’ hypothesis

Female AUT students are NOT different in terms of body height from the average New Zealand female (this is our so-called null hypothesis \(H_0\))
Because we can stick with this hypothesis or reject it, we need an alternative hypothesis \(H_A\): Our students ARE different from a typical sample of New Zealanders
Note that the Null hypothesis is negative, which makes it easier to falsify!
Also note that we can never accept \(H_0\), we can only fail to reject \(H_0\)

Are female AUT students typical NZers?

Now we need a test statistic and knowledge of the distribution we compare against:

Our test statistic is simply the mean of our sample:

mean(d1$bodyheight[d1$sex == 'F']) 
[1] 163.3333

How does that compare with

pop = rnorm(2000000, mean = 164, sd = 6) #why 2000000?
mean(pop)
[1] 164.0014

Are female AUT students typical NZers?

YES, our female students are typical New Zealanders in terms of body height!

We can tell just by looking at how our sample mean compares to the distribution of NZ female body height. More quantitatively…:

Are female AUT students typical NZers?

The probability of obtaining a value equal to or smaller than the mean we got for our female students when sampling from the NZ population is about 46%, i.e. close to 50%:

pnorm(q = mean(d1$bodyheight[d1$sex == 'F']) , mean = 164, sd = 6)
[1] 0.4557641

In other words our sample is NOT unusual and hence we cannot reject our null hypothesis!

Our students are NOT different from a typical sample of New Zealanders
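
The probability from pnorm() can also be checked by simulation, in the spirit of the `pop` vector used earlier. The sketch below is only an illustration: the seed and the number of simulated values (200000) are arbitrary choices, and the sample mean is typed in by hand instead of being read from `d1`.

```r
set.seed(1)  # arbitrary seed, only so that the simulation is repeatable

sample_mean <- 163.3333                    # mean of our female students (from above)
pop <- rnorm(200000, mean = 164, sd = 6)   # simulated NZ female body heights

# Share of simulated heights that are equal to or smaller than our mean
p_simulated <- mean(pop <= sample_mean)

# The same probability, calculated directly from the normal distribution
p_exact <- pnorm(q = sample_mean, mean = 164, sd = 6)

c(simulated = p_simulated, exact = p_exact)
```

The two numbers should agree to about two decimal places; the simulated value changes slightly with the seed.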

OK, but what if our sample mean had been different, say 160 cm, or 150 cm…?

Are female AUT students typical NZers?

pnorm(q = 160, mean = 164, sd = 6)
[1] 0.2524925

Is a value that we’d get 25% of the time by chance rare?

Are female AUT students typical NZers?

pnorm(q = 150, mean = 164, sd = 6)
[1] 0.009815329

Is a value that we’d get 1% of the time by chance rare?

P-values and statistical significance

Those probabilities (50%, 25%, 1%) are our p-values. They are always to be interpreted as ‘the probability of obtaining a value at least this extreme by chance, if the null hypothesis is true’
We need a threshold to objectively distinguish between ‘rare’ and ‘unusually rare’:
- Normally, we say that if our sample is more extreme than what we would find 5% of the time by chance, then our result is significant (in a statistical sense)
- This threshold is also called the \(\alpha\)-threshold
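
To make the threshold concrete, the three sample means discussed so far can be compared against \(\alpha\) = 0.05 in one go. This is a minimal sketch that uses the same lower-tail probabilities as the pnorm() calls above; the variable names are invented for illustration.

```r
alpha <- 0.05  # the conventional 5% threshold

# The three sample means discussed so far (in cm)
sample_means <- c(163.3333, 160, 150)

# Lower-tail p-values against the NZ population (mean 164, sd 6)
p_values <- pnorm(q = sample_means, mean = 164, sd = 6)

# TRUE means 'past the alpha-threshold', i.e. statistically significant
data.frame(sample_means, p_values, significant = p_values < alpha)
```

Only the mean of 150 cm ends up past the threshold; 160 cm, with a p-value of about 25%, does not.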

In summary…

We formulate a null hypothesis (\(H_0\)) and an alternative hypothesis (\(H_A\))
We obtain one (or several) sample(s)
We calculate a metric (in our example this was simply the mean). This is our test statistic
We then compare our test statistic against a random distribution of the same variable with known parameters (e.g. mean and standard deviation)
If our sample is sufficiently ‘rare’ (i.e. past the \(\alpha\)-threshold), then we consider our test significant, i.e. we reject the null hypothesis and turn to \(H_A\)

Note that this protocol is VERY generic; it will differ slightly depending on what test you are performing. This is not (yet) a proper statistical test, just the general idea behind it.
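
The steps of this generic protocol can be sketched as a small function. Note that `simple_check()` is a hypothetical helper written for this illustration, not a built-in R function, and that `fake_sample` is simulated data standing in for the survey. Like the example above, it looks only at the lower tail and is not a proper statistical test.

```r
# Hypothetical helper that follows the generic protocol step by step
simple_check <- function(x, pop_mean, pop_sd, alpha = 0.05) {
  test_statistic <- mean(x)                       # the metric: simply the mean
  p_value <- pnorm(q = test_statistic,            # compare against the known
                   mean = pop_mean, sd = pop_sd)  # distribution (lower tail)
  list(test_statistic = test_statistic,
       p_value = p_value,
       reject_H0 = p_value < alpha)               # decision at the alpha-threshold
}

set.seed(42)                                  # arbitrary seed
fake_sample <- rnorm(87, mean = 164, sd = 6)  # simulated stand-in for the survey data
simple_check(fake_sample, pop_mean = 164, pop_sd = 6)
```

The function returns the test statistic, the p-value and the decision, so each step of the protocol stays visible in the output.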

Type I vs. type II error

In all of this, we can make 2 types of errors!

Type I vs. type II error

A type I error is when we falsely reject the null hypothesis \(H_0\)
- In plain language, this means that we call something ‘significant’ (e.g. a difference between 2 samples) while in reality there is no difference (or, more generally, ‘nothing going on’)
A type II error is when we falsely fail to reject the null hypothesis \(H_0\)
- In plain language, this means that “we don’t see anything where in reality things (e.g. samples) are different”

Note that the ‘plain language’ definitions are inexact, but hopefully help you to understand the principle of type I/II errors
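
Both error types can be made visible with a small simulation. The sketch below rests on several assumptions that are not part of the lecture: the seed, the number of simulated values, and especially the ‘different’ population with a mean of 150 cm are chosen purely for illustration. It uses the same lower-tail comparison against the NZ population (mean 164, sd 6) as before.

```r
set.seed(123)   # arbitrary seed for repeatability
alpha <- 0.05
n_sim <- 100000

# Case 1: H0 is true, the values really come from the NZ population
x_h0 <- rnorm(n_sim, mean = 164, sd = 6)
p_h0 <- pnorm(q = x_h0, mean = 164, sd = 6)
type1_rate <- mean(p_h0 < alpha)    # H0 rejected although it is true

# Case 2: H0 is false, the values come from a shorter population
# (mean 150 is an assumption made up for this illustration)
x_ha <- rnorm(n_sim, mean = 150, sd = 6)
p_ha <- pnorm(q = x_ha, mean = 164, sd = 6)
type2_rate <- mean(p_ha >= alpha)   # H0 not rejected although it is false

c(type1 = type1_rate, type2 = type2_rate)
```

The type I rate comes out close to \(\alpha\), because that is exactly how the threshold is defined. The type II rate depends on how different the real population is: the closer it is to the one assumed under \(H_0\), the more often the difference is missed.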

Type I vs. type II error

Maybe easier to remember…:

That was too much…a practical example

OK:

Two boxes with pieces of paper with numbers written on them
I claim that those numbers come from a standard normal distribution (this may or may not be true)
Sanaa will test this:
- She states her \(H_0\): ‘The sample is no different from a standard normal distribution’
- She picks a number from each box
- She then compares the number (her test statistic) to a standard normal distribution, and asks ‘is it unusually low/high’?
- She then rejects or fails to reject her \(H_0\)
- She makes a decision whether box 1/box 2 actually contains numbers that follow a standard normal distribution!
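
Sanaa's procedure can be played through in R. Everything about the boxes in this sketch is hypothetical: the real contents are not described here, so box 1 is simply assumed to follow a standard normal distribution and box 2 a normal distribution with mean 3. Because she asks whether a number is unusually low or high, the p-value is two-sided.

```r
set.seed(7)   # arbitrary seed
alpha <- 0.05

# Hypothetical box contents, made up for this illustration
box1 <- rnorm(100, mean = 0, sd = 1)   # follows a standard normal distribution
box2 <- rnorm(100, mean = 3, sd = 1)   # does not

# Pick one number from each box
picks <- c(box1 = sample(box1, 1), box2 = sample(box2, 1))

# Two-sided p-value: probability of a value at least this far from 0
# (in either direction) if H0 is true
p_values <- 2 * pnorm(q = -abs(picks), mean = 0, sd = 1)

data.frame(pick = picks, p_value = p_values, reject_H0 = p_values < alpha)
```

With a single number per box, either error can happen: a pick from box 1 can land in the tails (type I error), and a pick from box 2 can land close enough to 0 that \(H_0\) is not rejected (type II error).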

Again, how did Sanaa decide?

So what was the real story?

The most important points in a nutshell

How to formulate a null and an alternative hypothesis
The principle of a statistical test, using a simple test statistic, e.g. a mean
Using that test statistic to make a frequentist statistical decision
Understanding that this decision has four possible outcomes: two correct decisions and two errors, namely a type I and a type II error