This document is a quick sanity check to see if ANOVA outputs from R and SPSS are equivalent (i.e., am I specifying the parameters correctly)

We will use also look at two ways of calculating ANOVA in R - the aov function from base R and the ezANOVA function from the ez package.

Set-up

Firstly let’s load the packages we need.

require(dplyr)
require(ez)
require(tidyr)
require(knitr)
options(scipen=999) # turn off scientific notation (so p-values are readable)

Now we’ll load in some practice data. This is a ‘depression’ dataset from Andy Field (he specifies how to do the SPSS analysis with contrasts here: http://twu.seanho.com/08fall/cpsy501/field/ContrastsUsingSyntax.pdf)

This is a 5x2 mixed factorial design with: - one between-subjects factor (Treatment: 5 levels: No-Treatment, Placebo, Seroxat, Effexor, Cheerup) - one within-subjects factor (Time: 2 levels: Before, After)

This will load the data and make sure it is in ‘long’ format:

filename <- file.path(getwd(), "depression.csv") # identify data location
depressionData <- read.csv(filename, header=TRUE) %>% # read in the data
  gather(condition, value, before:after) %>% # convert to long format
  rename(time = condition) %>% # rename condition column as 'time' to make things clearer
  rename(treatment = treat) %>% # rename treat column as 'treatment' to make things clearer
  mutate(participant = factor(participant)) %>% # make sure that participant column is of factor type
  mutate(treatment = factor(treatment)) # make sure treatment column is of factor type

Here’s what the data looks like:

participant treatment time value
1 0 before 13
2 0 before 10
3 0 before 19
4 0 before 20
5 0 before 18
6 0 before 14
7 0 before 16
8 0 before 20
9 0 before 15
10 0 before 16
11 1 before 16
12 1 before 14
13 1 before 20
14 1 before 14
15 1 before 16
16 1 before 12
17 1 before 12
18 1 before 20
19 1 before 19
20 1 before 19
21 2 before 17
22 2 before 14
23 2 before 20
24 2 before 16
25 2 before 17
26 2 before 20
27 2 before 16
28 2 before 16
29 2 before 19
30 2 before 14
31 3 before 17
32 3 before 20
33 3 before 13
34 3 before 16
35 3 before 15
36 3 before 16
37 3 before 14
38 3 before 20
39 3 before 20
40 3 before 13
41 4 before 17
42 4 before 18
43 4 before 17
44 4 before 18
45 4 before 17
46 4 before 16
47 4 before 19
48 4 before 17
49 4 before 17
50 4 before 18
1 0 after 16
2 0 after 18
3 0 after 13
4 0 after 15
5 0 after 18
6 0 after 16
7 0 after 18
8 0 after 19
9 0 after 9
10 0 after 16
11 1 after 13
12 1 after 12
13 1 after 11
14 1 after 8
15 1 after 13
16 1 after 8
17 1 after 9
18 1 after 15
19 1 after 16
20 1 after 12
21 2 after 15
22 2 after 19
23 2 after 14
24 2 after 16
25 2 after 10
26 2 after 11
27 2 after 11
28 2 after 13
29 2 after 15
30 2 after 10
31 3 after 19
32 3 after 12
33 3 after 10
34 3 after 15
35 3 after 15
36 3 after 11
37 3 after 17
38 3 after 13
39 3 after 12
40 3 after 18
41 4 after 11
42 4 after 10
43 4 after 13
44 4 after 5
45 4 after 10
46 4 after 11
47 4 after 11
48 4 after 14
49 4 after 7
50 4 after 12

ANOVA in R

First up, we’ll use the aov function.

aov_outcome <- aov(value ~treatment*time + Error(participant/time), data = depressionData)

In plain English this reads as “value as a function of treatment by time plus within-subjects error”

Let’s run an ANOVA with ezANOVA too, the function is a little more readable (although NB. ‘wid’ means ‘individual cases’ which for psychology experiments is typically the participants vector):

ez_outcome <- ezANOVA(
  data = depressionData,
  dv = value,
  wid = participant,
  within = time,
  between = treatment,
  type = 3,
  detailed = TRUE,
  return_aov = TRUE)

Now let’s compare the output. ezANOVA outputs a useful summary that looks like this:

Effect DFn DFd SSn SSd F p p<.05 ges
(Intercept) 1 45 22052.25 359.95 2756.914155 0.0000000 * 0.9700737
treatment 4 45 64.30 359.95 2.009654 0.1092469 0.0863551
time 1 45 306.25 320.35 43.019354 0.0000000 * 0.3104252
treatment:time 4 45 125.90 320.35 4.421336 0.0042385 * 0.1561647

ezANOVA also outputs an aov object. So let’s see if we have the same output from running ezANOVA and running from aov directly. Here is the ezANOVA aov object:

summary(ez_outcome$aov)
## 
## Error: participant
##           Df Sum Sq Mean Sq F value Pr(>F)
## treatment  4   64.3  16.075    2.01  0.109
## Residuals 45  359.9   7.999               
## 
## Error: participant:time
##                Df Sum Sq Mean Sq F value       Pr(>F)    
## time            1  306.2  306.25  43.019 0.0000000461 ***
## treatment:time  4  125.9   31.48   4.421      0.00424 ** 
## Residuals      45  320.4    7.12                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

and the direct aov output:

summary(aov_outcome)
## 
## Error: participant
##           Df Sum Sq Mean Sq F value Pr(>F)
## treatment  4   64.3  16.075    2.01  0.109
## Residuals 45  360.0   7.999               
## 
## Error: participant:time
##                Df Sum Sq Mean Sq F value       Pr(>F)    
## time            1  306.2  306.25  43.019 0.0000000461 ***
## treatment:time  4  125.9   31.48   4.421      0.00424 ** 
## Residuals      45  320.4    7.12                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Phew, they are the same - somebody, somewhere, is doing something right. Now let’s look at the table of means from ezANOVA:

model.tables(ez_outcome$aov, "means")
## Tables of means
## Grand mean
##       
## 14.85 
## 
##  treatment 
## treatment
##     0     1     2     3     4 
## 15.95 13.95 15.15 15.30 13.90 
## 
##  time 
## time
## before  after 
##   16.6   13.1 
## 
##  treatment:time 
##          time
## treatment before after
##         0 16.1   15.8 
##         1 16.2   11.7 
##         2 16.9   13.4 
##         3 16.4   14.2 
##         4 17.4   10.4

and from aov directly:

model.tables(aov_outcome, "means")
## Tables of means
## Grand mean
##       
## 14.85 
## 
##  treatment 
## treatment
##     0     1     2     3     4 
## 15.95 13.95 15.15 15.30 13.90 
## 
##  time 
## time
## before  after 
##   16.6   13.1 
## 
##  treatment:time 
##          time
## treatment before after
##         0 16.1   15.8 
##         1 16.2   11.7 
##         2 16.9   13.4 
##         3 16.4   14.2 
##         4 17.4   10.4

ANOVA in SPSS

Funky. Now we want to know if students of the SPSS variety will also get this output. If we load up the data in SPSS (see file: depression.sav) and using the gui click through: analyse >>> general linear model >>> repeated measures >>> use the within-subject factor name ‘time’ and have 2 levels >>> assign before and after to the within-subjects levels and treat as the between-subjects factor >>> use options to turn on ‘descriptives’ and bob’s your uncle.

Here is the output. The descriptives:

descriptives

Ok on the same wavelength. Now within- and between- effects

within between

Compared to the ezANOVA output…

ez_outcome$ANOVA
##           Effect DFn DFd      SSn    SSd           F
## 1    (Intercept)   1  45 22052.25 359.95 2756.914155
## 2      treatment   4  45    64.30 359.95    2.009654
## 3           time   1  45   306.25 320.35   43.019354
## 4 treatment:time   4  45   125.90 320.35    4.421336
##                                                    p p<.05        ges
## 1 0.000000000000000000000000000000000000000005079656     * 0.97007375
## 2 0.109246942702084889886826601923530688509345054626       0.08635509
## 3 0.000000046073539132099580600854824095208295275938     * 0.31042522
## 4 0.004238472987990770865507350606549152871593832970     * 0.15616472

We could also use the following SPSS syntax:


MANOVA
before after BY treat(0,4)
/WSFACTORS=time(2)
/METHOD UNIQUE
/ERROR WITHIN+RESIDUAL.

To get the same result: syntax

This all matches up will Andy Field’s output: http://twu.seanho.com/08fall/cpsy501/field/ContrastsUsingSyntax.pdf

Looks like we’re all on the same page.