The logic of an ANOVA

  • Imagine the means are not different
    • Then the residuals would be the same as in the previous graph, because the horizontal lines would not have moved (SSE = SSY)
  • Now imagine that the means are different (the amount of ozone in the two gardens is different)
    • We would predict that the residuals computed from the individual means (SSE) are smaller than the residuals computed from the overall mean (SSY)
  • We are back to signal versus noise (SSA vs SSE)
  • How do we do that in a test?
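
The three quantities above can be written out explicitly. This is a sketch in assumed notation (not taken from the slides): y_ij is observation i in garden j, \bar{y}_j is the mean of garden j, \bar{y} is the overall mean, and n_j is the number of observations in garden j.

```latex
\begin{align*}
SSY &= \sum_{j}\sum_{i} \left(y_{ij} - \bar{y}\right)^2
      && \text{(residuals from the overall mean)} \\
SSE &= \sum_{j}\sum_{i} \left(y_{ij} - \bar{y}_j\right)^2
      && \text{(residuals from the individual means)} \\
SSA &= SSY - SSE = \sum_{j} n_j \left(\bar{y}_j - \bar{y}\right)^2
      && \text{(variation explained by the garden means)}
\end{align*}
```

If the garden means are identical, every \bar{y}_j equals \bar{y}, so SSA = 0 and SSE = SSY, which is the first case in the list.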

An ANOVA sort of by hand {background-image=“background.jpg”}

So SSA = SSY - SSE = 44 - 24 = 20 (because SSY = SSE + SSA)
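
The same arithmetic can be sketched in a few lines of Python. The ozone readings below are illustrative values, assumed here because they reproduce the sums of squares on the slide (SSY = 44, SSE = 24); they are not necessarily the data used in the lecture. The helper names are also hypothetical.

```python
# Illustrative ozone readings for two gardens (assumed values, chosen so
# that the sums of squares match the slide: SSY = 44, SSE = 24).
garden_a = [3, 4, 4, 3, 2, 3, 1, 3, 5, 2]
garden_b = [5, 5, 6, 7, 4, 4, 3, 5, 6, 5]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sum_of_squares(values, centre):
    """Sum of squared deviations of the values from a given centre."""
    return sum((v - centre) ** 2 for v in values)


all_values = garden_a + garden_b

# SSY: residuals measured from the single overall mean.
ssy = sum_of_squares(all_values, mean(all_values))

# SSE: residuals measured from each garden's own mean.
sse = (sum_of_squares(garden_a, mean(garden_a))
       + sum_of_squares(garden_b, mean(garden_b)))

# SSA: what is left over is the variation explained by the garden means.
ssa = ssy - sse

print(ssy, sse, ssa)  # 44.0 24.0 20.0
```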

An ANOVA table

Source   Sum of squares   Degrees of freedom   Mean squares    F
Garden   20               1                    20              15
Error    24               18                   s^2 = 1.3333
Total    44               19
  • Degrees of freedom (n - p): number of values minus number of parameters estimated
    • Garden: 2 levels, 1 parameter (the overall mean), therefore 2 - 1 = 1
    • Error: 20 samples, 2 parameters (the two garden means; look at the equation), therefore 20 - 2 = 18
    • Total: add up the other two, 1 + 18 = 19
  • Mean squares (mean squared deviation, lecture 2) = SS/df
  • F = Mean squares (treatment) / Mean squares (error) = 20/1.3333 = 15 [Think signal over noise]
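
As a rough check, the table can be rebuilt from the two sums of squares and the sample sizes given on the slide. This is a minimal sketch; the variable names are hypothetical, and it stops at the F ratio rather than looking up a p-value.

```python
# Values taken from the slide.
n_samples = 20   # total number of ozone readings
n_levels = 2     # number of gardens
ssy = 44.0       # total sum of squares
sse = 24.0       # error sum of squares

# Treatment sum of squares by subtraction (SSY = SSE + SSA).
ssa = ssy - sse

# Degrees of freedom.
df_garden = n_levels - 1           # 2 - 1 = 1
df_error = n_samples - n_levels    # 20 - 2 = 18
df_total = df_garden + df_error    # 19

# Mean squares = SS / df.
ms_garden = ssa / df_garden
ms_error = sse / df_error          # the error variance, s^2

# F = signal over noise.
f_ratio = ms_garden / ms_error

print(f"MS garden = {ms_garden:.4f}, MS error = {ms_error:.4f}, F = {f_ratio:.2f}")
```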

ANOVA is much more

  • What if you are interested in two (or more) factors?
  • It would be cool to know if these factors interact
  • ANOVA (repeated measures, nested, multiway) can do this and more by partitioning out the variance, just like in the one-way example.
  • Imagine you are looking at the effect of two drugs. You measure men and women.
    • ANOVA can remove the variation due to sex (if it's uninteresting), statistically allowing you to act like you controlled for sex experimentally
    • And/or it can check the interaction between drug and sex, letting you say which drug is better for men and which is better for women.
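
The drug-by-sex idea can be sketched with the same partitioning logic. The responses below are made up purely for illustration, and the shortcut formulas assume a balanced design (the same number of subjects in every cell); unbalanced data would need a proper model fit instead.

```python
# Hypothetical responses (higher = better), 3 subjects per drug-by-sex cell.
cells = {
    ("X", "M"): [10, 11, 12],
    ("X", "F"): [14, 15, 16],
    ("Y", "M"): [13, 14, 15],
    ("Y", "F"): [11, 12, 13],
}
drugs = sorted({drug for drug, _ in cells})
sexes = sorted({sex for _, sex in cells})
n_per_cell = 3  # balanced design: same count in every cell


def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def level_mean(position, level):
    """Mean of all responses whose key has `level` at `position`
    (0 = drug, 1 = sex)."""
    return mean(y for key, ys in cells.items()
                if key[position] == level for y in ys)


everything = [y for ys in cells.values() for y in ys]
grand = mean(everything)

# Total variation around the grand mean.
ss_total = sum((y - grand) ** 2 for y in everything)

# Main effects: variation of each factor's level means around the grand mean.
ss_drug = n_per_cell * len(sexes) * sum(
    (level_mean(0, d) - grand) ** 2 for d in drugs)
ss_sex = n_per_cell * len(drugs) * sum(
    (level_mean(1, s) - grand) ** 2 for s in sexes)

# Error: variation of each response around its own cell mean.
ss_error = sum((y - mean(ys)) ** 2 for ys in cells.values() for y in ys)

# Interaction: whatever is left after the main effects and the error.
ss_interaction = ss_total - ss_drug - ss_sex - ss_error

print(ss_drug, ss_sex, ss_interaction, ss_error)  # 0.0 3.0 27.0 8.0
```

In this made-up example the drugs do not differ on average (SS for drug is 0), yet the interaction term is large: drug X scores higher for women and drug Y scores higher for men, which a one-way analysis of drug alone would miss.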