Lecture 11 Getting R to do ANOVAs and the alternatives

Eamonn Mallon
9/11/2020

An R command to do all this automatically

oneway <- read.csv("~/Dropbox/Teaching/old_teaching/zipped/oneway.csv") #Just getting the data in
model_ozone<-lm(oneway$ozone~oneway$garden) #Creates the linear model
ozone_anova<-aov(model_ozone) #Creates the anova from the linear model
summary(ozone_anova) #Outputs an ANOVA table
              Df Sum Sq Mean Sq F value  Pr(>F)   
oneway$garden  1     20  20.000      15 0.00111 **
Residuals     18     24   1.333                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The gardens differ in their ozone level (One-way ANOVA: \( F_{1,18} \) = 15.0, p = 0.0011). This is the correct way to report it: name the test, give the F value with both degrees of freedom, and give the p-value.
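
As a quick check on the table above, the F value is the ratio of the two mean squares, and each mean square is a sum of squares divided by its degrees of freedom. The subscript labels below are just shorthand for the two rows of the table:

\[
F_{1,18} = \frac{SS_{\text{garden}} / df_{\text{garden}}}{SS_{\text{residual}} / df_{\text{residual}}} = \frac{20 / 1}{24 / 18} = \frac{20}{1.333} = 15
\]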

Does feed type affect weight in chickens (4 treatments)?


model_weight<-lm(weight~Diet, data = ChickWeight)
chick_anova<-aov(model_weight)
summary(chick_anova)
             Df  Sum Sq Mean Sq F value   Pr(>F)    
Diet          3  155863   51954   10.81 6.43e-07 ***
Residuals   574 2758693    4806                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Diet type affects the weight of chickens (One-way ANOVA: \( F_{3,574} \) = 10.81, p = \( 6.433 \times 10^{-7} \)).
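
The numbers in that sentence can be pulled out of the ANOVA table rather than copied by hand. The following is a minimal sketch, assuming the same built-in ChickWeight data; the variable names (`anova_table`, `report`, and so on) are made up for illustration.

```r
# Refit the model from the slides so this block runs on its own
model_weight <- lm(weight ~ Diet, data = ChickWeight)
chick_anova <- aov(model_weight)

# summary() of an aov object is a list; its first element is the ANOVA table
anova_table <- summary(chick_anova)[[1]]

df_between <- as.integer(anova_table[["Df"]][1])  # Diet row
df_within <- as.integer(anova_table[["Df"]][2])   # Residuals row
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# Build the sentence fragment used when reporting the result
report <- sprintf("One-way ANOVA: F(%d,%d) = %.2f, p = %.3g",
                  df_between, df_within, f_value, p_value)
print(report)
```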

Great, so which diet is best? ANOVA doesn't tell you. It only says that diet has an effect somewhere among the four groups.

Tukey's post hoc test

TukeyHSD(chick_anova)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = model_weight)

$Diet
         diff         lwr      upr     p adj
2-1 19.971212  -0.2998092 40.24223 0.0552271
3-1 40.304545  20.0335241 60.57557 0.0000025
4-1 32.617257  12.2353820 52.99913 0.0002501
3-2 20.333333  -2.7268370 43.39350 0.1058474
4-2 12.646045 -10.5116315 35.80372 0.4954239
4-3 -7.687288 -30.8449649 15.47039 0.8277810

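Think of them as legitimate pairwise t-tests: each row compares two diets, and the p-values are adjusted for the fact that six comparisons are being made at once.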
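
To pick out the pairs that differ, the Tukey output can be filtered on the adjusted p-value. This is a small sketch, assuming the conventional 0.05 cut-off; `pairs_table` and `significant_pairs` are illustrative names.

```r
# Refit the model from the slides so this block runs on its own
model_weight <- lm(weight ~ Diet, data = ChickWeight)
chick_anova <- aov(model_weight)

# TukeyHSD() returns one matrix per factor; convert the Diet matrix to a data frame
pairs_table <- as.data.frame(TukeyHSD(chick_anova)$Diet)

# Keep only the comparisons whose adjusted p-value is below 0.05
significant_pairs <- pairs_table[pairs_table[["p adj"]] < 0.05, ]
print(significant_pairs)
```

With the table shown above, this keeps the 3-1 and 4-1 rows: diets 3 and 4 each differ from diet 1, while no other pair reaches significance.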

Assumptions of an ANOVA

  • Independence of observations.
  • Normality: the residuals are normally distributed. (ANOVA is fairly robust to violations of this.)
  • Homoscedasticity: the variance within each group should be the same.

library("ggfortify")
autoplot(model_weight)

[Figure: diagnostic plots produced by autoplot(model_weight)]
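
The diagnostic plots can be backed up with numbers. The sketch below is not part of the original slides: it assumes that the Shapiro-Wilk test (for normality of residuals) and Bartlett's test (for equal variances) are acceptable formal checks here. Both are in base R.

```r
# Refit the model from the slides so this block runs on its own
model_weight <- lm(weight ~ Diet, data = ChickWeight)

# Standard deviation of weight within each diet group (homoscedasticity at a glance)
group_sd <- tapply(ChickWeight$weight, ChickWeight$Diet, sd)
print(group_sd)

# Shapiro-Wilk test on the model residuals (null hypothesis: residuals are normal)
normality_check <- shapiro.test(residuals(model_weight))
print(normality_check)

# Bartlett's test (null hypothesis: all groups have the same variance)
variance_check <- bartlett.test(weight ~ Diet, data = ChickWeight)
print(variance_check)
```

With large samples these tests flag even small departures, so they are best read alongside the plots rather than instead of them.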

Non-parametric options

  • Kruskal–Wallis one-way ANOVA on ranks
  • Dunn's test (post-hoc test)
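
A minimal sketch of the Kruskal–Wallis test on the same chick data, using `kruskal.test()` from base R. Dunn's test is not in base R, so it is left out here; it is available from add-on packages such as `dunn.test`. The name `chick_kw` is illustrative.

```r
# Kruskal-Wallis rank-based test: does weight differ among the four diets?
chick_kw <- kruskal.test(weight ~ Diet, data = ChickWeight)
print(chick_kw)

# The pieces needed for reporting: test statistic, degrees of freedom, p-value
kw_statistic <- unname(chick_kw$statistic)
kw_df <- unname(chick_kw$parameter)
kw_p <- chick_kw$p.value
```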

ANOVA is much more

  • What if you are interested in two (or more) factors?
  • It would be cool to know if these factors interact.
  • ANOVA (repeated, nested, multiway) can do this and more by partitioning out the variance, just like in the one-way example.
  • Imagine you are looking at the effect of two drugs. You measure men and women.
    • ANOVA can remove the variation due to sex (if it's uninteresting), statistically allowing you to act as if you had controlled for sex experimentally.
    • And/or it can check the interaction between drug and sex, letting you say which drug is better for men and which is better for women.
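
A two-factor ANOVA with an interaction term can be sketched as follows. There is no drug-by-sex dataset in the lecture, so this uses R's built-in ToothGrowth data as a stand-in: `supp` (supplement type) plays the role of the drug and `dose` plays the role of the second factor. The names `tooth` and `two_way` are illustrative.

```r
# ToothGrowth stores dose as a number; treat it as a factor with three levels
tooth <- transform(ToothGrowth, dose = factor(dose))

# supp * dose expands to supp + dose + supp:dose,
# i.e. both main effects plus their interaction
two_way <- aov(len ~ supp * dose, data = tooth)
two_way_table <- summary(two_way)[[1]]
print(two_way_table)
```

The table has one row per source of variation: each main effect, the interaction, and the residuals. A significant interaction row would mean the effect of one factor depends on the level of the other.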