Understanding ANOVA

by Olesya Volchenko, v.2

It can be hard to imagine what ‘variances’ mean.

Here is an example:

Data set 1: mean = -3, 0, 3; sd = 1; normal distribution

library(ggplot2)
library(car)

dat1 <- data.frame(cond = factor(rep(c("A", "B", "C"), each = 200)), rating = c(rnorm(200, 
    mean = -3), rnorm(200, mean = 0), rnorm(200, mean = 3)))

What would it look like? Try drawing a sketch of the data.

dat1:
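The plot itself is not reproduced here. As a minimal sketch, assuming a density plot per condition is roughly what was intended (the original plotting code is not shown, and the seed and the object name `p1` are illustrative choices), dat1 could be drawn with ggplot2 like this:

```r
library(ggplot2)

set.seed(1)  # arbitrary seed (an assumption), only so the sketch is reproducible
dat1 <- data.frame(
  cond   = factor(rep(c("A", "B", "C"), each = 200)),
  rating = c(rnorm(200, mean = -3), rnorm(200, mean = 0), rnorm(200, mean = 3))
)

# One semi-transparent density curve per condition
p1 <- ggplot(dat1, aes(x = rating, fill = cond)) +
  geom_density(alpha = 0.5) +
  labs(x = "rating", y = "density", fill = "condition")

print(p1)
```

The same call works for the other data sets. Adding `coord_cartesian(xlim = c(-12, 12))` keeps the x-axis identical across plots, which makes the spreads easier to compare.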

Data set 2: mean = -1, 0, 1; sd = 1; normal distribution

dat2 <- data.frame(cond = factor(rep(c("A", "B", "C"), each = 200)), rating = c(rnorm(200, 
    mean = -1), rnorm(200, mean = 0), rnorm(200, mean = 1)))

In which ways would this data set be different from the first one?

What will happen to the shape of the distributions?

dat2:

Data set 3: mean = -3, 0, 3; sd = 3; normal distribution

dat3 <- data.frame(cond = factor(rep(c("A", "B", "C"), each = 200)), rating = c(rnorm(200, 
    mean = -3, sd = 3), rnorm(200, mean = 0, sd = 3), rnorm(200, mean = 3, sd = 3)))

How would this data set be different from the first and the second ones?

dat3:

The plot of dat3 resembles a group of ghosts from a music video.

Data set 4: mean = -1, 0, 1; sd = 3; normal distribution

dat4 <- data.frame(cond = factor(rep(c("A", "B", "C"), each = 200)), rating = c(rnorm(200, 
    mean = -1, sd = 3), rnorm(200, mean = 0, sd = 3), rnorm(200, mean = 1, sd = 3)))

dat4:

Compare one-way ANOVA results for the four data sets

# aov() always assumes equal variances; var.equal is an argument of oneway.test(), not aov()
anova1 <- aov(rating ~ cond, data = dat1)
anova2 <- aov(rating ~ cond, data = dat2)
anova3 <- aov(rating ~ cond, data = dat3)
anova4 <- aov(rating ~ cond, data = dat4)

  1. Do you expect the results to be the same?

  2. Which of the four data sets will have the largest F?

  3. Would the degrees of freedom be the same across the four data sets?
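One way to check your answers is to simulate the four designs side by side. The sketch below is not part of the original handout: `simulate_f` is a hypothetical helper, the seed is arbitrary, and the exact F values depend on the random draws, so they will not match any particular table.

```r
# Hypothetical helper (an assumption, not from the handout): simulate three
# groups of n normal draws and return the one-way ANOVA F with its df.
simulate_f <- function(means, sd, n = 200) {
  dat <- data.frame(
    cond   = factor(rep(c("A", "B", "C"), each = n)),
    rating = rnorm(3 * n, mean = rep(means, each = n), sd = sd)
  )
  tab <- summary(aov(rating ~ cond, data = dat))[[1]]
  c(F        = tab[["F value"]][1],
    df_num   = tab[["Df"]][1],
    df_denom = tab[["Df"]][2])
}

set.seed(42)  # arbitrary seed, only for reproducibility
results <- rbind(
  dat1 = simulate_f(c(-3, 0, 3), sd = 1),
  dat2 = simulate_f(c(-1, 0, 1), sd = 1),
  dat3 = simulate_f(c(-3, 0, 3), sd = 3),
  dat4 = simulate_f(c(-1, 0, 1), sd = 3)
)

print(round(results, 2))
```

The degrees of freedom depend only on the number of groups and the number of observations, so they are identical in all four rows; F changes with both the distance between the group means and the spread within the groups.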

Summary table:

                    data1      data2      data3      data4
group mean 1       -3.000     -1.000     -3.000     -1.000
group mean 2        0.000      0.000      0.000      0.000
group mean 3        3.000      1.000      3.000      1.000
SD                  1.000      1.000      3.000      3.000
F-statistic      1690.600    207.030    193.170     35.000
omega-squared       0.853      0.371      0.383      0.102
df num              2.000      2.000      2.000      2.000
df denom          597.000    597.000    597.000    597.000
p-value             0.000      0.000      0.000      0.000
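To see where the F-statistic and omega-squared come from, it can help to compute them by hand from the between-group and within-group sums of squares. The sketch below regenerates data with the dat1 settings under an arbitrary seed (an assumption; the handout's seed is not given) and uses one common formula for omega-squared; the handout does not say how its own values were computed.

```r
set.seed(1)  # arbitrary seed (an assumption), only for reproducibility
dat <- data.frame(
  cond   = factor(rep(c("A", "B", "C"), each = 200)),
  rating = c(rnorm(200, mean = -3), rnorm(200, mean = 0), rnorm(200, mean = 3))
)

grand_mean  <- mean(dat$rating)
group_means <- tapply(dat$rating, dat$cond, mean)
group_n     <- tapply(dat$rating, dat$cond, length)

# Between-group SS: how far the group means lie from the grand mean
ss_between <- sum(group_n * (group_means - grand_mean)^2)
# Within-group SS: how far each observation lies from its own group mean
ss_within  <- sum((dat$rating - ave(dat$rating, dat$cond))^2)

df_between <- nlevels(dat$cond) - 1          # 3 groups - 1
df_within  <- nrow(dat) - nlevels(dat$cond)  # 600 observations - 3 groups

ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_stat     <- ms_between / ms_within

# One common omega-squared formula: (SSB - dfB * MSW) / (SST + MSW)
omega_sq <- (ss_between - df_between * ms_within) /
  (ss_between + ss_within + ms_within)

# Cross-check the hand-computed F against aov()
f_aov <- summary(aov(rating ~ cond, data = dat))[[1]][["F value"]][1]
print(c(F_by_hand = f_stat, F_aov = f_aov, omega_sq = omega_sq))
```

F is the ratio of the between-group to the within-group mean square, so pulling the means apart raises it, while increasing the SD lowers it.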

Take-home messages