Factorial Repeated Measures ANOVA

This gist pertains to a simple example of repeated measures ANOVA found in Field's Discovering Statistics Using R (1st Edition). There are two categorical variables: 3 levels of a drink variable (beer, wine, and water), and 3 levels of an imagery variable (positive, negative, and neutral), with each participant giving a rating for each cell.

The issue we had with this example is that the data are provided in wide format, but the ANOVA function used here requires them in long format. This is a demonstration of one way of converting the data from wide to long.
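To make the two layouts concrete, here is a minimal base-R sketch using the first two participants and only two of the nine rating columns from the data set. The object names wide and long and the column names condition and rating are chosen for illustration only.

```r
# Wide format: one row per participant, one column per condition
wide <- data.frame(
  participant = c("P1", "P2"),
  beerpos = c(1, 43),
  beerneg = c(6, 30)
)

# Long format: one row per participant-by-condition rating
long <- data.frame(
  participant = rep(wide$participant, times = 2),
  condition = rep(c("beerpos", "beerneg"), each = nrow(wide)),
  rating = c(wide$beerpos, wide$beerneg)
)

print(wide)
print(long)
```

The same four ratings appear in both objects; only the arrangement changes, from two rows of two ratings to four rows of one rating each.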

## Factorial Repeated Measures ANOVA in WIDE FORMAT
dat <- structure(list(beerpos = c(1L, 43L, 15L, 40L, 8L, 17L, 30L, 34L, 34L, 
    26L, 1L, 7L, 22L, 30L, 40L, 15L, 20L, 9L, 14L, 15L), beerneg = c(6L, 30L, 
    15L, 30L, 12L, 17L, 21L, 23L, 20L, 27L, -19L, -18L, -8L, -6L, -6L, -9L, 
    -17L, -12L, -11L, -6L), beerneut = c(5L, 8L, 12L, 19L, 8L, 15L, 21L, 28L, 
    26L, 27L, -10L, 6L, 4L, 3L, 0L, 4L, 9L, -5L, 7L, 13L), winepos = c(38L, 
    20L, 20L, 28L, 11L, 17L, 15L, 27L, 24L, 23L, 28L, 26L, 34L, 32L, 24L, 29L, 
    30L, 24L, 34L, 23L), wineneg = c(-5L, -12L, -15L, -4L, -2L, -6L, -2L, -7L, 
    -10L, -15L, -13L, -16L, -23L, -22L, -9L, -18L, -17L, -15L, -14L, -15L), 
    wineneut = c(4L, 4L, 6L, 0L, 6L, 6L, 16L, 7L, 12L, 14L, 13L, 19L, 14L, 21L, 
        19L, 7L, 12L, 18L, 20L, 15L), waterpos = c(10L, 9L, 6L, 20L, 27L, 9L, 
        19L, 12L, 12L, 21L, 33L, 23L, 21L, 17L, 15L, 13L, 16L, 17L, 19L, 29L), 
    waterneg = c(-14L, -10L, -16L, -10L, 5L, -6L, -20L, -12L, -9L, -6L, -2L, 
        -17L, -19L, -11L, -10L, -17L, -4L, -4L, -1L, -1L), waterneu = c(-2L, 
        -13L, 1L, 2L, -5L, -13L, 3L, 2L, 4L, 0L, 9L, 5L, 0L, 4L, 2L, 8L, 10L, 
        8L, 12L, 10L), participant = structure(c(1L, 12L, 14L, 15L, 16L, 17L, 
        18L, 19L, 20L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L), .Label = c("P1", 
        "P10", "P11", "P12", "P13", "P14", "P15", "P16", "P17", "P18", "P19", 
        "P2", "P20", "P3", "P4", "P5", "P6", "P7", "P8", "P9"), class = "factor")), 
    .Names = c("beerpos", "beerneg", "beerneut", "winepos", "wineneg", "wineneut", 
        "waterpos", "waterneg", "waterneu", "participant"), class = "data.frame", 
    row.names = c(NA, -20L))

The first step is to look at the structure of the supplied data:

str(dat)
## 'data.frame':    20 obs. of  10 variables:
##  $ beerpos    : int  1 43 15 40 8 17 30 34 34 26 ...
##  $ beerneg    : int  6 30 15 30 12 17 21 23 20 27 ...
##  $ beerneut   : int  5 8 12 19 8 15 21 28 26 27 ...
##  $ winepos    : int  38 20 20 28 11 17 15 27 24 23 ...
##  $ wineneg    : int  -5 -12 -15 -4 -2 -6 -2 -7 -10 -15 ...
##  $ wineneut   : int  4 4 6 0 6 6 16 7 12 14 ...
##  $ waterpos   : int  10 9 6 20 27 9 19 12 12 21 ...
##  $ waterneg   : int  -14 -10 -16 -10 5 -6 -20 -12 -9 -6 ...
##  $ waterneu   : int  -2 -13 1 2 -5 -13 3 2 4 0 ...
##  $ participant: Factor w/ 20 levels "P1","P10","P11",..: 1 12 14 15 16 17 18 19 20 2 ...

In order to convert this to long format, we need to use the melt and dcast functions from the reshape2 package:

require(reshape2)
## Loading required package: reshape2
datl <- melt(dat, id.vars = "participant")
str(datl)
## 'data.frame':    180 obs. of  3 variables:
##  $ participant: Factor w/ 20 levels "P1","P10","P11",..: 1 12 14 15 16 17 18 19 20 2 ...
##  $ variable   : Factor w/ 9 levels "beerpos","beerneg",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ value      : int  1 43 15 40 8 17 30 34 34 26 ...

The output of melt is a dataframe with 180 observations (20 participants × 9 conditions). This is almost the correct format. The remaining issue is that the “variable” column is a factor with 9 levels, one for each combination of drink and imagery condition. What we actually need is two variables: one for drink and one for imagery.

To do this, we have to split the strings in “variable”. Because substr extracts by character position, the drink strings must all be the same length (“water” is one character longer than “wine” and “beer”), and so must the imagery strings (“neut” is one character longer than “pos” and “neg”). Note that the last column is already named waterneu in the supplied data.

# Truncate the imagery part to 3 characters and the drink part to 4.
# gsub() also converts the factor to a character vector:
datl$variable <- gsub("neut", "neu", datl$variable)
datl$variable <- gsub("water", "wate", datl$variable)

# Create separate variables for drink and imagery:
datl$drink <- substr(datl$variable, start = 1, stop = 4)
datl$imagery <- substr(datl$variable, start = 5, stop = 7)

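# Declare new variables as factors and supply full names for levels.
# Levels are given explicitly so each label is matched to the right code:
datl$drink <- factor(datl$drink, levels = c("beer", "wate", "wine"),
    labels = c("beer", "water", "wine"))
datl$imagery <- factor(datl$imagery, levels = c("neg", "neu", "pos"),
    labels = c("negative", "neutral", "positive"))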
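As an alternative to truncating the names first, the split could also be sketched with a regular expression, so that drink names of different lengths need no adjustment. This is only an illustration on the nine column names, not the approach used in this gist; vars, drink_part, and imagery_part are hypothetical names.

```r
vars <- c("beerpos", "beerneg", "beerneut", "winepos", "wineneg", "wineneut",
          "waterpos", "waterneg", "waterneu")

# Capture the leading drink name, whatever its length
drink_part <- sub("^(beer|wine|water).*$", "\\1", vars)

# Drop the drink name, then keep 3 characters so "neut" and "neu" agree
imagery_part <- substr(sub("^(beer|wine|water)", "", vars), 1, 3)

print(data.frame(vars, drink_part, imagery_part))
```

The trade-off is that the drink names must be listed in the pattern, whereas the substr approach only needs the strings to have equal lengths.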

Almost there! The data now look like this:

str(datl)
## 'data.frame':    180 obs. of  5 variables:
##  $ participant: Factor w/ 20 levels "P1","P10","P11",..: 1 12 14 15 16 17 18 19 20 2 ...
##  $ variable   : chr  "beerpos" "beerpos" "beerpos" "beerpos" ...
##  $ value      : int  1 43 15 40 8 17 30 34 34 26 ...
##  $ drink      : Factor w/ 3 levels "beer","water",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ imagery    : Factor w/ 3 levels "negative","neutral",..: 3 3 3 3 3 3 3 3 3 3 ...

To clean it up, run the object through dcast (which takes and returns a dataframe). This gets rid of the column we no longer need (the character “variable”) and puts the rows in a meaningful order. Note the ~ 1 on the right-hand side of the formula: no variable should be spread back out into columns, so each participant, imagery, and drink combination keeps its own row with a single value column, which is then renamed to attitude.

datl2 <- dcast(datl, participant + imagery + drink ~ 1, value.var = "value")
names(datl2)[4] <- "attitude"
str(datl2)
## 'data.frame':    180 obs. of  4 variables:
##  $ participant: Factor w/ 20 levels "P1","P10","P11",..: 1 1 1 1 1 1 1 1 1 2 ...
##  $ imagery    : Factor w/ 3 levels "negative","neutral",..: 1 1 1 2 2 2 3 3 3 1 ...
##  $ drink      : Factor w/ 3 levels "beer","water",..: 1 2 3 1 2 3 1 2 3 1 ...
##  $ attitude   : int  6 -14 -5 5 -2 4 1 10 38 27 ...
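One quick structural check on a long data frame like this is to count rows per cell and per participant. The sketch below runs on a small stand-in built with expand.grid (toy and its placeholder attitude values are hypothetical, not the real data); for datl2 the same two tables should show 20 in every imagery-by-drink cell and 9 rows per participant.

```r
# Toy balanced design: 4 participants x 3 imagery x 3 drink levels
toy <- expand.grid(
  participant = paste0("P", 1:4),
  imagery = c("negative", "neutral", "positive"),
  drink = c("beer", "water", "wine")
)
toy$attitude <- seq_len(nrow(toy))  # placeholder ratings

# Each imagery-by-drink cell should hold one row per participant
cell_counts <- xtabs(~ imagery + drink, data = toy)
print(cell_counts)

# Each participant should contribute one row per cell (3 x 3 = 9)
rows_per_participant <- table(toy$participant)
print(rows_per_participant)
```

Unequal counts in either table would suggest that rows were dropped or duplicated somewhere in the reshaping.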

To make sure this dataframe lines up as it should, let's look at a plot and the basic ANOVA:

require(ggplot2)
## Loading required package: ggplot2
bar1 <- ggplot(datl2, aes(imagery, attitude, colour = drink, group = drink))
# fun replaces the deprecated fun.y argument (ggplot2 >= 3.3.0)
bar1 + stat_summary(fun = mean, geom = "point") + stat_summary(fun = mean, 
    geom = "line") + labs(x = "Valence of Imagery", y = "Mean attitude toward beverage")

[Figure: mean attitude toward beverage by valence of imagery, with one line per drink]

library(ez)
ezANOVA(data = datl2, dv = attitude, wid = participant, within = c(imagery, 
    drink), detailed = TRUE)
## $ANOVA
##          Effect DFn DFd   SSn  SSd       F         p p<.05    ges
## 1   (Intercept)   1  19 11218 1920 111.005 2.255e-09     * 0.4127
## 2       imagery   2  38 21629 3353 122.565 2.680e-17     * 0.5753
## 3         drink   2  38  2092 7786   5.106 1.086e-02     * 0.1159
## 4 imagery:drink   4  76  2624 2907  17.155 4.589e-10     * 0.1412
## 
## $`Mauchly's Test for Sphericity`
##          Effect      W         p p<.05
## 2       imagery 0.6621 2.445e-02     *
## 3         drink 0.2672 6.952e-06     *
## 4 imagery:drink 0.5950 4.357e-01      
## 
## $`Sphericity Corrections`
##          Effect    GGe     p[GG] p[GG]<.05    HFe     p[HF] p[HF]<.05
## 2       imagery 0.7474 1.757e-13         * 0.7968 3.143e-14         *
## 3         drink 0.5771 2.977e-02         * 0.5907 2.881e-02         *
## 4 imagery:drink 0.7984 1.900e-08         * 0.9786 6.810e-10         *
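
As a reading aid for the tables above, the F ratios can be reproduced from the sums of squares and degrees of freedom, and a sphericity correction rescales both degrees of freedom by the epsilon estimate. The sketch below does this for the imagery effect using the rounded values printed in the output, so the result agrees with the table only up to rounding; the variable names are illustrative.

```r
# Values copied from the imagery rows of the ezANOVA output
DFn <- 2
DFd <- 38
SSn <- 21629
SSd <- 3353
GGe <- 0.7474

# F is the ratio of the effect and error mean squares
F_imagery <- (SSn / DFn) / (SSd / DFd)

# Greenhouse-Geisser correction: multiply both df by epsilon
DFn_gg <- GGe * DFn
DFd_gg <- GGe * DFd

# p-value of the same F on the corrected degrees of freedom
p_gg <- pf(F_imagery, DFn_gg, DFd_gg, lower.tail = FALSE)

print(round(c(F = F_imagery, DFn_gg = DFn_gg, DFd_gg = DFd_gg), 3))
print(p_gg)
```

The same arithmetic applies to the drink and imagery:drink rows by substituting their values from the output.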