This document is a quick sanity check to see if ANOVA outputs from R and SPSS are equivalent (i.e., am I specifying the parameters correctly)
We will use also look at two ways of calculating ANOVA in R - the aov function from base R and the ezANOVA function from the ez package.
Firstly let’s load the packages we need.
require(dplyr)
require(ez)
require(tidyr)
require(knitr)
options(scipen=999) # turn off scientific notation (so p-values are readable)
Now we’ll load in some practice data. This is a ‘depression’ dataset from Andy Field (he specifies how to do the SPSS analysis with contrasts here: http://twu.seanho.com/08fall/cpsy501/field/ContrastsUsingSyntax.pdf)
This is a 5x2 mixed factorial design with: - one between-subjects factor (Treatment: 5 levels: No-Treatment, Placebo, Seroxat, Effexor, Cheerup) - one within-subjects factor (Time: 2 levels: Before, After)
This will load the data and make sure it is in ‘long’ format:
filename <- file.path(getwd(), "depression.csv") # identify data location
depressionData <- read.csv(filename, header=TRUE) %>% # read in the data
gather(condition, value, before:after) %>% # convert to long format
rename(time = condition) %>% # rename condition column as 'time' to make things clearer
rename(treatment = treat) %>% # rename treat column as 'treatment' to make things clearer
mutate(participant = factor(participant)) %>% # make sure that participant column is of factor type
mutate(treatment = factor(treatment)) # make sure treatment column is of factor type
Here’s what the data looks like:
participant | treatment | time | value |
---|---|---|---|
1 | 0 | before | 13 |
2 | 0 | before | 10 |
3 | 0 | before | 19 |
4 | 0 | before | 20 |
5 | 0 | before | 18 |
6 | 0 | before | 14 |
7 | 0 | before | 16 |
8 | 0 | before | 20 |
9 | 0 | before | 15 |
10 | 0 | before | 16 |
11 | 1 | before | 16 |
12 | 1 | before | 14 |
13 | 1 | before | 20 |
14 | 1 | before | 14 |
15 | 1 | before | 16 |
16 | 1 | before | 12 |
17 | 1 | before | 12 |
18 | 1 | before | 20 |
19 | 1 | before | 19 |
20 | 1 | before | 19 |
21 | 2 | before | 17 |
22 | 2 | before | 14 |
23 | 2 | before | 20 |
24 | 2 | before | 16 |
25 | 2 | before | 17 |
26 | 2 | before | 20 |
27 | 2 | before | 16 |
28 | 2 | before | 16 |
29 | 2 | before | 19 |
30 | 2 | before | 14 |
31 | 3 | before | 17 |
32 | 3 | before | 20 |
33 | 3 | before | 13 |
34 | 3 | before | 16 |
35 | 3 | before | 15 |
36 | 3 | before | 16 |
37 | 3 | before | 14 |
38 | 3 | before | 20 |
39 | 3 | before | 20 |
40 | 3 | before | 13 |
41 | 4 | before | 17 |
42 | 4 | before | 18 |
43 | 4 | before | 17 |
44 | 4 | before | 18 |
45 | 4 | before | 17 |
46 | 4 | before | 16 |
47 | 4 | before | 19 |
48 | 4 | before | 17 |
49 | 4 | before | 17 |
50 | 4 | before | 18 |
1 | 0 | after | 16 |
2 | 0 | after | 18 |
3 | 0 | after | 13 |
4 | 0 | after | 15 |
5 | 0 | after | 18 |
6 | 0 | after | 16 |
7 | 0 | after | 18 |
8 | 0 | after | 19 |
9 | 0 | after | 9 |
10 | 0 | after | 16 |
11 | 1 | after | 13 |
12 | 1 | after | 12 |
13 | 1 | after | 11 |
14 | 1 | after | 8 |
15 | 1 | after | 13 |
16 | 1 | after | 8 |
17 | 1 | after | 9 |
18 | 1 | after | 15 |
19 | 1 | after | 16 |
20 | 1 | after | 12 |
21 | 2 | after | 15 |
22 | 2 | after | 19 |
23 | 2 | after | 14 |
24 | 2 | after | 16 |
25 | 2 | after | 10 |
26 | 2 | after | 11 |
27 | 2 | after | 11 |
28 | 2 | after | 13 |
29 | 2 | after | 15 |
30 | 2 | after | 10 |
31 | 3 | after | 19 |
32 | 3 | after | 12 |
33 | 3 | after | 10 |
34 | 3 | after | 15 |
35 | 3 | after | 15 |
36 | 3 | after | 11 |
37 | 3 | after | 17 |
38 | 3 | after | 13 |
39 | 3 | after | 12 |
40 | 3 | after | 18 |
41 | 4 | after | 11 |
42 | 4 | after | 10 |
43 | 4 | after | 13 |
44 | 4 | after | 5 |
45 | 4 | after | 10 |
46 | 4 | after | 11 |
47 | 4 | after | 11 |
48 | 4 | after | 14 |
49 | 4 | after | 7 |
50 | 4 | after | 12 |
First up, we’ll use the aov function.
aov_outcome <- aov(value ~treatment*time + Error(participant/time), data = depressionData)
In plain English this reads as “value as a function of treatment by time plus within-subjects error”
Let’s run an ANOVA with ezANOVA too, the function is a little more readable (although NB. ‘wid’ means ‘individual cases’ which for psychology experiments is typically the participants vector):
ez_outcome <- ezANOVA(
data = depressionData,
dv = value,
wid = participant,
within = time,
between = treatment,
type = 3,
detailed = TRUE,
return_aov = TRUE)
Now let’s compare the output. ezANOVA outputs a useful summary that looks like this:
Effect | DFn | DFd | SSn | SSd | F | p | p<.05 | ges |
---|---|---|---|---|---|---|---|---|
(Intercept) | 1 | 45 | 22052.25 | 359.95 | 2756.914155 | 0.0000000 | * | 0.9700737 |
treatment | 4 | 45 | 64.30 | 359.95 | 2.009654 | 0.1092469 | 0.0863551 | |
time | 1 | 45 | 306.25 | 320.35 | 43.019354 | 0.0000000 | * | 0.3104252 |
treatment:time | 4 | 45 | 125.90 | 320.35 | 4.421336 | 0.0042385 | * | 0.1561647 |
ezANOVA also outputs an aov object. So let’s see if we have the same output from running ezANOVA and running from aov directly. Here is the ezANOVA aov object:
summary(ez_outcome$aov)
##
## Error: participant
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 4 64.3 16.075 2.01 0.109
## Residuals 45 359.9 7.999
##
## Error: participant:time
## Df Sum Sq Mean Sq F value Pr(>F)
## time 1 306.2 306.25 43.019 0.0000000461 ***
## treatment:time 4 125.9 31.48 4.421 0.00424 **
## Residuals 45 320.4 7.12
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
and the direct aov output:
summary(aov_outcome)
##
## Error: participant
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 4 64.3 16.075 2.01 0.109
## Residuals 45 360.0 7.999
##
## Error: participant:time
## Df Sum Sq Mean Sq F value Pr(>F)
## time 1 306.2 306.25 43.019 0.0000000461 ***
## treatment:time 4 125.9 31.48 4.421 0.00424 **
## Residuals 45 320.4 7.12
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Phew, they are the same - somebody, somewhere, is doing something right. Now let’s look at the table of means from ezANOVA:
model.tables(ez_outcome$aov, "means")
## Tables of means
## Grand mean
##
## 14.85
##
## treatment
## treatment
## 0 1 2 3 4
## 15.95 13.95 15.15 15.30 13.90
##
## time
## time
## before after
## 16.6 13.1
##
## treatment:time
## time
## treatment before after
## 0 16.1 15.8
## 1 16.2 11.7
## 2 16.9 13.4
## 3 16.4 14.2
## 4 17.4 10.4
and from aov directly:
model.tables(aov_outcome, "means")
## Tables of means
## Grand mean
##
## 14.85
##
## treatment
## treatment
## 0 1 2 3 4
## 15.95 13.95 15.15 15.30 13.90
##
## time
## time
## before after
## 16.6 13.1
##
## treatment:time
## time
## treatment before after
## 0 16.1 15.8
## 1 16.2 11.7
## 2 16.9 13.4
## 3 16.4 14.2
## 4 17.4 10.4
Funky. Now we want to know if students of the SPSS variety will also get this output. If we load up the data in SPSS (see file: depression.sav) and using the gui click through: analyse >>> general linear model >>> repeated measures >>> use the within-subject factor name ‘time’ and have 2 levels >>> assign before and after to the within-subjects levels and treat as the between-subjects factor >>> use options to turn on ‘descriptives’ and bob’s your uncle.
Here is the output. The descriptives:
Ok on the same wavelength. Now within- and between- effects
Compared to the ezANOVA output…
ez_outcome$ANOVA
## Effect DFn DFd SSn SSd F
## 1 (Intercept) 1 45 22052.25 359.95 2756.914155
## 2 treatment 4 45 64.30 359.95 2.009654
## 3 time 1 45 306.25 320.35 43.019354
## 4 treatment:time 4 45 125.90 320.35 4.421336
## p p<.05 ges
## 1 0.000000000000000000000000000000000000000005079656 * 0.97007375
## 2 0.109246942702084889886826601923530688509345054626 0.08635509
## 3 0.000000046073539132099580600854824095208295275938 * 0.31042522
## 4 0.004238472987990770865507350606549152871593832970 * 0.15616472
We could also use the following SPSS syntax:
MANOVA
before after BY treat(0,4)
/WSFACTORS=time(2)
/METHOD UNIQUE
/ERROR WITHIN+RESIDUAL.
To get the same result:
This all matches up will Andy Field’s output: http://twu.seanho.com/08fall/cpsy501/field/ContrastsUsingSyntax.pdf
Looks like we’re all on the same page.