Lecture 6 Asking a question in statistics

Eamonn Mallon
7/09/2020

Everything varies (Separating signal from noise)

  • Think about height
  • We need a way of discriminating between variation that is scientifically interesting and variation that just represents background heterogeneity
  • Key concept: the amount of variation that we would expect to occur by chance alone
  • when we find a difference bigger than this, we say it is statistically significant (a result unlikely to have occurred by chance)
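
A minimal sketch of "variation by chance alone", using the height example. It repeatedly draws two samples from the same simulated population and records how far apart their means fall. The population mean (170 cm), standard deviation (10 cm), group size and seed are all illustrative assumptions, not values from the lecture.

```python
import random
import statistics

def chance_differences(n_per_group=10, mean=170.0, sd=10.0, n_sims=5000, seed=1):
    """Absolute differences between the means of two samples drawn
    from the SAME population, so any difference is due to chance."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sims):
        a = [rng.gauss(mean, sd) for _ in range(n_per_group)]
        b = [rng.gauss(mean, sd) for _ in range(n_per_group)]
        diffs.append(abs(statistics.mean(a) - statistics.mean(b)))
    return sorted(diffs)

diffs = chance_differences()
# The value that 95% of the chance differences fall below
threshold = diffs[int(0.95 * len(diffs))]
print(f"95% of chance differences are smaller than {threshold:.1f} cm")
```

Under these assumptions, an observed difference larger than the printed threshold would arise by chance in fewer than about 1 in 20 experiments.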

Good and bad hypotheses

A good hypothesis must be capable of rejection (Popper)

  1. There are vultures in the park
  2. There are no vultures in the park

absence of evidence is not evidence of absence

Null hypotheses

The null hypothesis says nothing is happening

  • when comparing two sample means, the null hypothesis is that the means of the two populations are the same
  • when looking at a graph of y against x, the null hypothesis is that y is independent of x

p Values

  • The p value is an estimate of the probability that a particular result, or one even more extreme, could occur by chance if the null hypothesis were true
  • By convention, we call a result statistically significant when p < 0.05
  • we can reject the null hypothesis when it is true (Type I error)
  • we can accept the null hypothesis when it is false (Type II error)
  • How to remember: A pregnant pause
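
One way to make the p value concrete is a permutation test, sketched below. This is a technique chosen for illustration, not one named in the lecture. If the null hypothesis is true, the group labels are arbitrary, so we shuffle them many times and count how often the shuffled difference in means is at least as extreme as the observed one. The two data sets are made up for the example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(a, b, n_perm=10000, seed=1):
    """Estimate the two-sided p value for a difference in means
    by randomly reassigning observations to the two groups."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            as_extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero
    return (as_extreme + 1) / (n_perm + 1)

# Hypothetical measurements for two groups
group_a = [3, 4, 5, 4, 6, 5]
group_b = [8, 9, 7, 10, 9, 8]
p = permutation_p_value(group_a, group_b)
print(f"p = {p:.4f}")
if p < 0.05:
    print("Reject the null hypothesis at the 0.05 level")
```

Comparing a group with itself gives p = 1, because every shuffled difference is at least as large as an observed difference of zero.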

Power

  • The power of a test is the probability of rejecting the null hypothesis when it is false
  • \( \beta \) is the probability of accepting the null hypothesis when it is false (Type II error)
  • \( \beta \) should be as small as possible
  • but, for a given sample size, the smaller we make \( \beta \) (reducing Type II error), the larger the probability of a Type I error (\( \alpha \))
  • Compromise: \( \alpha = 0.05 \) and \( \beta = 0.2 \)
  • power is \( 1 - \beta = 0.8 \)
  • can use this and the variance (\( s^2 \)) to calculate the number of replicates required (n), where \( \delta \) is the size of the difference we want to detect: \[ n \approx \frac{8 \times s^2}{\delta^2} \]
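
The rule of thumb above can be turned into a small calculation. This is a sketch only: the function name and the example numbers are hypothetical, and the result is rounded up because the number of replicates must be a whole number. The squared term in the denominator is taken to be the size of the difference to be detected.

```python
import math

def replicates_needed(variance, difference):
    """Approximate replicates from the rule of thumb n = 8 * s^2 / difference^2."""
    if difference == 0:
        raise ValueError("the difference to detect must be non-zero")
    return math.ceil(8 * variance / difference ** 2)

# Hypothetical pilot data: variance of 10, and we want to detect a difference of 2
n = replicates_needed(variance=10.0, difference=2.0)
print(n)  # 8 * 10 / 4 = 20
```

Because the difference is squared, halving the difference to be detected quadruples the number of replicates: with the same variance, a difference of 1 would need 80.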