Understanding ANOVA

Data set 1: mean = -3, 0, 3; sd = 1; normal distribution

library(ggplot2)
library(car)

dat1 <- data.frame(cond = factor(rep(c("A", "B", "C"), each = 200)), rating = c(rnorm(200, 
    mean = -3), rnorm(200, mean = 0), rnorm(200, mean = 3)))

What would it look like? Try drawing a sketch of the data.
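Once you have drawn your own sketch, you can check it against a plot. The block below is a minimal sketch, not the plot used in this tutorial: `dat_demo` and `p_demo` are hypothetical names, the data are a fresh simulation with the same settings as data set 1, and the fixed seed is an assumption made only for reproducibility.

```r
library(ggplot2)

set.seed(1)  # assumption: a fixed seed, only so the picture is reproducible

# Hypothetical copy of data set 1, so the block runs on its own
dat_demo <- data.frame(
  cond = factor(rep(c("A", "B", "C"), each = 200)),
  rating = c(rnorm(200, mean = -3), rnorm(200, mean = 0), rnorm(200, mean = 3))
)

# One semi-transparent density curve per condition on a shared axis
p_demo <- ggplot(dat_demo, aes(x = rating, fill = cond)) +
  geom_density(alpha = 0.5) +
  labs(x = "rating", y = "density", fill = "condition")

print(p_demo)
```

Changing the `mean` and `sd` arguments to the values of the other data sets gives the corresponding pictures for comparison.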

Data set 2: mean = -1, 0, 1; sd = 1; normal distribution

dat2 <- data.frame(cond = factor(rep(c("A", "B", "C"), each = 200)), rating = c(rnorm(200, 
    mean = -1), rnorm(200, mean = 0), rnorm(200, mean = 1)))

In which ways would this data set be different from the first one?

What will happen to the shape of the distributions?

Data set 3: mean = -3, 0, 3; sd = 3; normal distribution

dat3 <- data.frame(cond = factor(rep(c("A", "B", "C"), each = 200)), rating = c(rnorm(200, 
    mean = -3, sd = 3), rnorm(200, mean = 0, sd = 3), rnorm(200, mean = 3, sd = 3)))

How would this data set be different from the first and the second one?

Plotted, dat3 is reminiscent of a group of ghosts from a music video.

Data set 4: mean = -1, 0, 1; sd = 3; normal distribution

dat4 <- data.frame(cond = factor(rep(c("A", "B", "C"), each = 200)), rating = c(rnorm(200, 
    mean = -1, sd = 3), rnorm(200, mean = 0, sd = 3), rnorm(200, mean = 1, sd = 3)))

Compare one-way ANOVA results for the four data sets

# aov() assumes equal variances across groups by default;
# var.equal is an argument of oneway.test(), not of aov(), so it is dropped here
anova1 <- aov(rating ~ cond, data = dat1)
anova2 <- aov(rating ~ cond, data = dat2)
anova3 <- aov(rating ~ cond, data = dat3)
anova4 <- aov(rating ~ cond, data = dat4)

Do you expect the results to be the same?
Which of the four data sets will have the largest F?
Would the degrees of freedom be the same across the four data sets?
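After making your predictions, you can read the F-statistic, the degrees of freedom and the p-value directly from a fitted model. The block below is a sketch under assumptions: `dat_demo`, `fit_demo` and `tab_demo` are hypothetical names, and the data are a fresh simulation with the settings of data set 1, so the numbers will not match the summary table exactly.

```r
set.seed(1)  # assumption: a fixed seed, only for reproducibility

# Hypothetical copy of data set 1, so the block runs on its own
dat_demo <- data.frame(
  cond = factor(rep(c("A", "B", "C"), each = 200)),
  rating = c(rnorm(200, mean = -3), rnorm(200, mean = 0), rnorm(200, mean = 3))
)

fit_demo <- aov(rating ~ cond, data = dat_demo)

# summary() of an aov fit is a list; its first element is the ANOVA table.
# Row 1 is the factor (between groups), row 2 is the residuals (within groups).
tab_demo <- summary(fit_demo)[[1]]

f_value  <- tab_demo[["F value"]][1]
df_num   <- tab_demo[["Df"]][1]   # number of groups - 1
df_denom <- tab_demo[["Df"]][2]   # number of observations - number of groups
p_value  <- tab_demo[["Pr(>F)"]][1]

print(c(F = f_value, df_num = df_num, df_denom = df_denom, p = p_value))
```

The degrees of freedom depend only on the number of groups and the number of observations, not on the means or standard deviations.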

Summary table:

statistic	dat1	dat2	dat3	dat4
group mean 1	-3.000	-1.000	-3.000	-1.000
group mean 2	0.000	0.000	0.000	0.000
group mean 3	3.000	1.000	3.000	1.000
SD	1.000	1.000	3.000	3.000
F-statistic	1690.600	207.030	193.170	35.000
omega-squared	0.853	0.371	0.383	0.102
df num	2	2	2	2
df denom	597	597	597	597
p-value	< 0.001	< 0.001	< 0.001	< 0.001
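The tutorial does not show how the omega-squared row was obtained. One common way to compute it for a one-way design is (SS between - df between * MS within) / (SS total + MS within). The block below is a sketch under that assumption: `omega_squared()` and `make_demo()` are hypothetical helpers, not functions from a package, and the data are fresh simulations, so the values will differ from the table.

```r
# Hypothetical helper: omega-squared for a one-way aov fit,
# using the textbook formula (an assumption, not taken from the tutorial)
omega_squared <- function(fit) {
  tab <- summary(fit)[[1]]
  ss_between <- tab[["Sum Sq"]][1]
  ss_within  <- tab[["Sum Sq"]][2]
  df_between <- tab[["Df"]][1]
  ms_within  <- tab[["Mean Sq"]][2]
  (ss_between - df_between * ms_within) / (ss_between + ss_within + ms_within)
}

# Hypothetical helper: three groups of 200 with means -3, 0, 3 and a common sd
make_demo <- function(sd) {
  data.frame(
    cond = factor(rep(c("A", "B", "C"), each = 200)),
    rating = c(rnorm(200, -3, sd), rnorm(200, 0, sd), rnorm(200, 3, sd))
  )
}

set.seed(1)  # assumption: a fixed seed, only for reproducibility
w_tight <- omega_squared(aov(rating ~ cond, data = make_demo(sd = 1)))
w_loose <- omega_squared(aov(rating ~ cond, data = make_demo(sd = 3)))

print(c(sd_1 = w_tight, sd_3 = w_loose))
```

With the same group means, the tighter groups should give the larger effect size, which mirrors the pattern between dat1 and dat3 in the table.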

Take home messages

The “variance” in “analysis of variance” refers to how spread out the observations are: ANOVA compares the spread between the group means with the spread of the observations within each group.
The larger the within-group variance, the looser the groups and, for the same group means, the lower the F-ratio.
Tightly distributed groups with widely separated means make it more likely that the differences between the group means will be statistically significant.
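To see where the F-ratio comes from, it can be computed by hand as the mean square between groups divided by the mean square within groups. The block below is a minimal sketch with hypothetical names (`dat_demo`, `f_by_hand`, `f_aov`) and freshly simulated data with the settings of data set 1. It only checks that the hand computation agrees with `aov()`.

```r
set.seed(1)  # assumption: a fixed seed, only for reproducibility

# Hypothetical copy of data set 1, so the block runs on its own
dat_demo <- data.frame(
  cond = factor(rep(c("A", "B", "C"), each = 200)),
  rating = c(rnorm(200, mean = -3), rnorm(200, mean = 0), rnorm(200, mean = 3))
)

k       <- nlevels(dat_demo$cond)  # number of groups
n_total <- nrow(dat_demo)          # number of observations

grand_mean  <- mean(dat_demo$rating)
group_means <- tapply(dat_demo$rating, dat_demo$cond, mean)
group_n     <- tapply(dat_demo$rating, dat_demo$cond, length)

# Between groups: how far the group means lie from the grand mean
ss_between <- sum(group_n * (group_means - grand_mean)^2)

# Within groups: how far each observation lies from its own group mean
# (ave() returns the group mean for every observation)
ss_within <- sum((dat_demo$rating - ave(dat_demo$rating, dat_demo$cond))^2)

f_by_hand <- (ss_between / (k - 1)) / (ss_within / (n_total - k))
f_aov     <- summary(aov(rating ~ cond, data = dat_demo))[[1]][["F value"]][1]

print(c(by_hand = f_by_hand, aov = f_aov))
```

Looser groups inflate `ss_within`, the denominator, and means that sit closer together shrink `ss_between`, the numerator. Either change lowers F.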

Understanding ANOVA

by Olesya Volchenko, v.2

It can be hard to imagine what ‘variances’ mean.

Here is an example: