This document is a quick sanity check to see if ANOVA outputs from R and SPSS are equivalent (i.e., am I specifying the parameters correctly)

We will use also look at two ways of calculating ANOVA in R - the aov function from base R and the ezANOVA function from the ez package.

Set-up

Firstly let’s load the packages we need.

require(dplyr)
require(ez)
require(tidyr)
require(knitr)
options(scipen=999) # turn off scientific notation (so p-values are readable)

Now we’ll load in some practice data. This is a ‘depression’ dataset from Andy Field (he specifies how to do the SPSS analysis with contrasts here: http://twu.seanho.com/08fall/cpsy501/field/ContrastsUsingSyntax.pdf)

This is a 5x2 mixed factorial design with: - one between-subjects factor (Treatment: 5 levels: No-Treatment, Placebo, Seroxat, Effexor, Cheerup) - one within-subjects factor (Time: 2 levels: Before, After)

This will load the data and make sure it is in ‘long’ format:

filename <- file.path(getwd(), "depression.csv") # identify data location
depressionData <- read.csv(filename, header=TRUE) %>% # read in the data
  gather(condition, value, before:after) %>% # convert to long format
  rename(time = condition) %>% # rename condition column as 'time' to make things clearer
  rename(treatment = treat) %>% # rename treat column as 'treatment' to make things clearer
  mutate(participant = factor(participant)) %>% # make sure that participant column is of factor type
  mutate(treatment = factor(treatment)) # make sure treatment column is of factor type

Here’s what the data looks like:

participant	treatment	time	value
1	0	before	13
2	0	before	10
3	0	before	19
4	0	before	20
5	0	before	18
6	0	before	14
7	0	before	16
8	0	before	20
9	0	before	15
10	0	before	16
11	1	before	16
12	1	before	14
13	1	before	20
14	1	before	14
15	1	before	16
16	1	before	12
17	1	before	12
18	1	before	20
19	1	before	19
20	1	before	19
21	2	before	17
22	2	before	14
23	2	before	20
24	2	before	16
25	2	before	17
26	2	before	20
27	2	before	16
28	2	before	16
29	2	before	19
30	2	before	14
31	3	before	17
32	3	before	20
33	3	before	13
34	3	before	16
35	3	before	15
36	3	before	16
37	3	before	14
38	3	before	20
39	3	before	20
40	3	before	13
41	4	before	17
42	4	before	18
43	4	before	17
44	4	before	18
45	4	before	17
46	4	before	16
47	4	before	19
48	4	before	17
49	4	before	17
50	4	before	18
1	0	after	16
2	0	after	18
3	0	after	13
4	0	after	15
5	0	after	18
6	0	after	16
7	0	after	18
8	0	after	19
9	0	after	9
10	0	after	16
11	1	after	13
12	1	after	12
13	1	after	11
14	1	after	8
15	1	after	13
16	1	after	8
17	1	after	9
18	1	after	15
19	1	after	16
20	1	after	12
21	2	after	15
22	2	after	19
23	2	after	14
24	2	after	16
25	2	after	10
26	2	after	11
27	2	after	11
28	2	after	13
29	2	after	15
30	2	after	10
31	3	after	19
32	3	after	12
33	3	after	10
34	3	after	15
35	3	after	15
36	3	after	11
37	3	after	17
38	3	after	13
39	3	after	12
40	3	after	18
41	4	after	11
42	4	after	10
43	4	after	13
44	4	after	5
45	4	after	10
46	4	after	11
47	4	after	11
48	4	after	14
49	4	after	7
50	4	after	12

ANOVA in R

First up, we’ll use the aov function.

aov_outcome <- aov(value ~treatment*time + Error(participant/time), data = depressionData)

In plain English this reads as “value as a function of treatment by time plus within-subjects error”

Let’s run an ANOVA with ezANOVA too, the function is a little more readable (although NB. ‘wid’ means ‘individual cases’ which for psychology experiments is typically the participants vector):

ez_outcome <- ezANOVA(
  data = depressionData,
  dv = value,
  wid = participant,
  within = time,
  between = treatment,
  type = 3,
  detailed = TRUE,
  return_aov = TRUE)

Now let’s compare the output. ezANOVA outputs a useful summary that looks like this:

Effect	DFn	DFd	SSn	SSd	F	p	p<.05	ges
(Intercept)	1	45	22052.25	359.95	2756.914155	0.0000000	*	0.9700737
treatment	4	45	64.30	359.95	2.009654	0.1092469		0.0863551
time	1	45	306.25	320.35	43.019354	0.0000000	*	0.3104252
treatment:time	4	45	125.90	320.35	4.421336	0.0042385	*	0.1561647

ezANOVA also outputs an aov object. So let’s see if we have the same output from running ezANOVA and running from aov directly. Here is the ezANOVA aov object:

summary(ez_outcome$aov)

## 
## Error: participant
##           Df Sum Sq Mean Sq F value Pr(>F)
## treatment  4   64.3  16.075    2.01  0.109
## Residuals 45  359.9   7.999               
## 
## Error: participant:time
##                Df Sum Sq Mean Sq F value       Pr(>F)    
## time            1  306.2  306.25  43.019 0.0000000461 ***
## treatment:time  4  125.9   31.48   4.421      0.00424 ** 
## Residuals      45  320.4    7.12                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

and the direct aov output:

summary(aov_outcome)

## 
## Error: participant
##           Df Sum Sq Mean Sq F value Pr(>F)
## treatment  4   64.3  16.075    2.01  0.109
## Residuals 45  360.0   7.999               
## 
## Error: participant:time
##                Df Sum Sq Mean Sq F value       Pr(>F)    
## time            1  306.2  306.25  43.019 0.0000000461 ***
## treatment:time  4  125.9   31.48   4.421      0.00424 ** 
## Residuals      45  320.4    7.12                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Phew, they are the same - somebody, somewhere, is doing something right. Now let’s look at the table of means from ezANOVA:

model.tables(ez_outcome$aov, "means")

## Tables of means
## Grand mean
##       
## 14.85 
## 
##  treatment 
## treatment
##     0     1     2     3     4 
## 15.95 13.95 15.15 15.30 13.90 
## 
##  time 
## time
## before  after 
##   16.6   13.1 
## 
##  treatment:time 
##          time
## treatment before after
##         0 16.1   15.8 
##         1 16.2   11.7 
##         2 16.9   13.4 
##         3 16.4   14.2 
##         4 17.4   10.4

and from aov directly:

model.tables(aov_outcome, "means")

## Tables of means
## Grand mean
##       
## 14.85 
## 
##  treatment 
## treatment
##     0     1     2     3     4 
## 15.95 13.95 15.15 15.30 13.90 
## 
##  time 
## time
## before  after 
##   16.6   13.1 
## 
##  treatment:time 
##          time
## treatment before after
##         0 16.1   15.8 
##         1 16.2   11.7 
##         2 16.9   13.4 
##         3 16.4   14.2 
##         4 17.4   10.4

ANOVA in SPSS

Funky. Now we want to know if students of the SPSS variety will also get this output. If we load up the data in SPSS (see file: depression.sav) and using the gui click through: analyse >>> general linear model >>> repeated measures >>> use the within-subject factor name ‘time’ and have 2 levels >>> assign before and after to the within-subjects levels and treat as the between-subjects factor >>> use options to turn on ‘descriptives’ and bob’s your uncle.

Here is the output. The descriptives:

descriptives

Ok on the same wavelength. Now within- and between- effects

within between

Compared to the ezANOVA output…

ez_outcome$ANOVA

##           Effect DFn DFd      SSn    SSd           F
## 1    (Intercept)   1  45 22052.25 359.95 2756.914155
## 2      treatment   4  45    64.30 359.95    2.009654
## 3           time   1  45   306.25 320.35   43.019354
## 4 treatment:time   4  45   125.90 320.35    4.421336
##                                                    p p<.05        ges
## 1 0.000000000000000000000000000000000000000005079656     * 0.97007375
## 2 0.109246942702084889886826601923530688509345054626       0.08635509
## 3 0.000000046073539132099580600854824095208295275938     * 0.31042522
## 4 0.004238472987990770865507350606549152871593832970     * 0.15616472

We could also use the following SPSS syntax:

MANOVA
before after BY treat(0,4)
/WSFACTORS=time(2)
/METHOD UNIQUE
/ERROR WITHIN+RESIDUAL.

To get the same result: syntax

This all matches up will Andy Field’s output: http://twu.seanho.com/08fall/cpsy501/field/ContrastsUsingSyntax.pdf

Looks like we’re all on the same page.

Comparing ANOVAs in R and SPSS

Tom Hardwicke

25 March 2015

Set-up

ANOVA in R

ANOVA in SPSS