RM Anova

2022-01-23

Introduction

The repeated measures ANOVA makes the following assumptions about the data:

No significant outliers in any cell of the design. This can be checked by visualizing the data using box plot methods and by using the function identify_outliers() [rstatix package].
Normality: the outcome (or dependent) variable should be approximately normally distributed in each cell of the design. This can be checked using the Shapiro-Wilk normality test (shapiro_test() [rstatix]) or by visual inspection using QQ plot (ggqqplot() [ggpubr package]).
Assumption of sphericity: the variance of the differences between groups should be equal. This can be checked using the Mauchly’s test of sphericity, which is automatically reported when using the R function anova_test() [rstatix package].

Note that, if the above assumptions are not met there are a non-parametric alternative (Friedman test) to the one-way repeated measures ANOVA!

Unfortunately, there are no non-parametric alternatives to the two-way and the three-way repeated measures ANOVA. Thus, in the situation where the assumptions are not met, you could consider running the two-way/three-way repeated measures ANOVA on the transformed and non-transformed data to see if there are any meaningful differences.

If both tests lead you to the same conclusions, you might not choose to transform the outcome variable and carry on with the two-way/three-way repeated measures ANOVA on the original data.

It’s also possible to perform robust ANOVA test using the WRS2 R package.

No matter your choice, you should report what you did in your results.

RM Anova in R

Key R functions:

anova_test() [rstatix package], a wrapper around car::Anova() for making easy the computation of repeated measures ANOVA. Key arguments for performing repeated measures ANOVA:
- data: data frame
- dv: (numeric) the dependent (or outcome) variable name.
- wid: variable name specifying the case/sample identifier.
- within: within-subjects factor or grouping variable
get_anova_table() [rstatix package]. Extracts the ANOVA table from the output of anova_test(). It returns ANOVA table that is automatically corrected for eventual deviation from the sphericity assumption. The default is to apply automatically the Greenhouse-Geisser sphericity correction to only within-subject factors violating the sphericity assumption (i.e., Mauchly’s test p-value is significant, p <= 0.05).

1-way RM Anova

The dataset “selfesteem” contains 10 individuals’ self-esteem score on three time points during a specific diet to determine whether their self-esteem improved.

data("selfesteem", package = "datarium")
head(selfesteem, 3)

## # A tibble: 3 x 4
##      id    t1    t2    t3
##   <int> <dbl> <dbl> <dbl>
## 1     1  4.01  5.18  7.11
## 2     2  2.56  6.91  6.31
## 3     3  3.24  4.44  9.78

selfesteemcopy <- selfesteem

The one-way repeated measures ANOVA can be used to determine whether the means self-esteem scores are significantly different between the three time points. So let’s convert this data frame into long format:

selfesteem <- selfesteem %>%
  gather(key = "time", value = "score", t1, t2, t3) %>%
  convert_as_factor(id, time)
head(selfesteem, 3)

## # A tibble: 3 x 3
##   id    time  score
##   <fct> <fct> <dbl>
## 1 1     t1     4.01
## 2 2     t1     2.56
## 3 3     t1     3.24

Descriptive statistics

Firstly let us know a little bit more about the data we are going to conduct RM Anova on. Below there are shown: Spread of data with use of s-l plot (spread vs location-in this case means of self-esteem scores)

df1 <- selfesteem %>%
  group_by(time)  %>%
  mutate( Mean = mean(score),
          Residuals = abs(score - Mean))   

ggplot(df1, aes(x = Mean, y = Residuals)) + 
  geom_jitter(alpha = 0.4, width = 0.05, height = 0) +
  stat_summary(fun = mean, geom = "line", col = "red") +
  ylab(expression( abs(" Residuals "))) +
  geom_text(aes(x = Mean, y = 1.75, label = time))

#Means of groups for checking
selfesteem%>%group_by(time)%>%summarise_at(vars(score), list(name=mean))

## # A tibble: 3 x 2
##   time   name
##   <fct> <dbl>
## 1 t1     3.14
## 2 t2     4.93
## 3 t3     7.64

The looks of data prompts that that the means and variances probably differ at different times of experiment. A little boxplot woulnd’t hurt to check the data distribution and identify if there are any outliers

selfesteem%>%
  ggplot(aes(x = time, y = score, color = time)) + 
  geom_boxplot() + 
  xlab("Time") + ylab("Score") +
  theme_minimal()

Assumptions

As for assumptions for repeated measures ANOVA there are 4 of them to be met:

#Independence Right off the bat we can conclude that this assumption is met, as from the description of data set we know that the data came from 10 individuals

#Normality Next one to check is the normality of dependent variable, firstly let’s look at QQ plot

library(ggpubr)
ggqqplot(data=selfesteem,'score', facet.by = 'time')

From the plot we see that the data is normal, but let’s check with a Shapiro-Wilk test just to be 100% sure

shapiro_test(selfesteem$score)

## # A tibble: 1 x 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 selfesteem$score     0.949   0.158

selfesteem %>%
  group_by(time) %>%
  shapiro_test(score)

## # A tibble: 3 x 4
##   time  variable statistic     p
##   <fct> <chr>        <dbl> <dbl>
## 1 t1    score        0.967 0.859
## 2 t2    score        0.876 0.117
## 3 t3    score        0.923 0.380

The result gives clear information: data follows Gaussian distribution - p-value is greater than 0.05, hence we can’t reject the null hypothesis (values sampled from normal distribution)

#Sphercity This assumption is checked via Mauchly’s Test, which is used to assess whether all possible pairs of within-subject conditions are equal. This test is done automatically in anova test, but we can also measure shpericity by “hand”, and then compare the results.

# Used copy of selfesteem in order to calculate smoothly variances of differences below 
# 1. Compute group differences
grp.diff <- selfesteemcopy %>%
  transmute(
    `t1-t2` = t1 - t2,
    `t1-t3` = t1 - t3,
    `t2-t3` = t2 - t3
  )

# 2. Compute the variances
grp.diff  %>% map(var)

## $`t1-t2`
## [1] 1.303951
## 
## $`t1-t3`
## [1] 1.155306
## 
## $`t2-t3`
## [1] 3.081988

As can be seen variance of t2-t3 is rather troublesome
For now this assumption may be violated, but let’s not be frightened, we’ll compare the results with automatic Mauchly’s test.

#Outliers Last assumption to be met is that there are no outliers. As could be seen on the boxplot befor there are a few outliers, but we have to check wether they are extreme values.

selfesteem%>%group_by(time)%>%identify_outliers(score)

## # A tibble: 2 x 5
##   time  id    score is.outlier is.extreme
##   <fct> <fct> <dbl> <lgl>      <lgl>     
## 1 t1    6      2.05 TRUE       FALSE     
## 2 t2    2      6.91 TRUE       FALSE

As we can see there are no extreme outliers, we can move on

Anova

Finally, computation of RM ANOVA

result <- anova_test(data = selfesteem, dv = score, wid = id, within = time); result

## ANOVA Table (type III tests)
## 
## $ANOVA
##   Effect DFn DFd      F        p p<.05   ges
## 1   time   2  18 55.469 2.01e-08     * 0.829
## 
## $`Mauchly's Test for Sphericity`
##   Effect     W     p p<.05
## 1   time 0.551 0.092      
## 
## $`Sphericity Corrections`
##   Effect  GGe      DF[GG]    p[GG] p[GG]<.05   HFe      DF[HF]    p[HF]
## 1   time 0.69 1.38, 12.42 2.16e-06         * 0.774 1.55, 13.94 6.03e-07
##   p[HF]<.05
## 1         *

Luckly the sphericity assumption was met. Mauchly’s Test of Sphericity indicated that the assumption of sphericity had not been violated, p-value is equal to 0.09 > 0.05

get_anova_table(result)

## ANOVA Table (type III tests)
## 
##   Effect DFn DFd      F        p p<.05   ges
## 1   time   2  18 55.469 2.01e-08     * 0.829

The self-esteem score was statistically significantly different at the different time points during the diet, F(2, 18) = 55.5, which is a result that is far into the critical region.

Post-hoc tests

Null hypothesis rejected, thus next step is to conduct post-hoc test. For post-hoc test we’ll use pair-wise comparisons to complement the analysis of variance.

posthoc <- selfesteem %>% pairwise_t_test( score ~ time, paired = TRUE, p.adjust.method = "bonferroni"); posthoc

## # A tibble: 3 x 10
##   .y.   group1 group2    n1    n2 statistic    df          p  p.adj p.adj.signif
## * <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>      <dbl>  <dbl> <chr>       
## 1 score t1     t2        10    10     -4.97     9    7.72e-4   2e-3 **          
## 2 score t1     t3        10    10    -13.2      9    3.34e-7   1e-6 ****        
## 3 score t2     t3        10    10     -4.87     9    8.86e-4   3e-3 **

All the pairwise differences are statistically significant.

Conclusions

-The self-esteem score was statistically significant at the different time points -Post-hoc analysis with a Bonferroni adjustment revealed that all pairwise differences were statistically significant

2-way RM Anova

For Two-Way Repeated Measures ANOVA, “Two-way” means that there are two factors in the experiment, for example, different treatments and different conditions. “Repeated-measures” means that the same subject received more than one treatment and/or more than one condition. Similar to two-way ANOVA, two-way repeated measures ANOVA can be employed to test for significant differences between the factor level means within a factor and for interactions between factors.

Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures, and the data violates the ANOVA assumption of independence. Two-Way Repeated Measures ANOVA designs can be two repeated measures factors, or one repeated measures factor and one non-repeated factor. If any repeated factor is present, then the repeated measures ANOVA should be used.

Please apply Two-way RM-ANOVA to analyze if any of interactions are significant (between time and music, time and image, music and image, or music and time and image)! The response variable is level of stress experienced by a person watching one of 2 movie genres. Interpret your results. Use the following data set:

set.seed(5250)
myData <- data.frame(PID = rep(seq(from = 1,
                               to = 60, by = 1), 20),
                     stress = sample(x = 1:100,
                                     size = 1200,
                                     replace = TRUE),
                     image = sample(c("Happy", "Angry"),
                                    size = 1200,
                                    replace = TRUE),
                     music = sample(c("Disney", "Horror"),
                                    size = 1200,
                                    replace = TRUE)
)
myData <- within(myData, {
  PID   <- factor(PID)
  image <- factor(image)
  music <- factor(music)
})
myData <- myData[order(myData$PID), ]
head(myData)

##     PID stress image  music
## 1     1     90 Happy Horror
## 61    1      7 Angry Disney
## 121   1     31 Happy Disney
## 181   1     68 Angry Disney
## 241   1      6 Happy Disney
## 301   1     80 Angry Horror

First of all we have to find the mean stress values for each participant for each combination of conditions. Also to get to know the data more let’s explore the summary of each group of it.

myData.mean <- aggregate(myData$stress,
                      by = list(myData$PID, myData$music,
                              myData$image),
                      FUN = 'mean')
colnames(myData.mean) <- c("PID","music","image","stress")
myData.mean <- myData.mean[order(myData.mean$PID), ]
myData.mean %>% group_by(image,music) %>% get_summary_stats(stress)

## # A tibble: 4 x 15
##   music  image variable     n   min   max median    q1    q3   iqr   mad  mean
##   <fct>  <fct> <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Disney Angry stress      60  21.5  73.8   48.7  41.8  58.7  16.9  11.7  49.1
## 2 Horror Angry stress      60  19.5  78     53.3  45.1  62.7  17.5  13.5  53.4
## 3 Disney Happy stress      60   5    77.5   50.2  43.8  58.2  14.3  11.4  49.3
## 4 Horror Happy stress      59  11    90     47.6  37.3  57.4  20.2  16.2  47.0
## # ... with 3 more variables: sd <dbl>, se <dbl>, ci <dbl>

Descriptive statistics

As for looking into the data and visually present what we’re working with, let’s dive into mean stress values for each combinations of conditions:

df2 <- myData.mean %>%
  group_by(image,music)  %>%
  mutate( Mean = mean(stress),
          Residuals = sqrt(abs(stress - Mean)))   

ggplot(df2, aes(x = Mean, y = Residuals)) + 
  geom_jitter(alpha = 0.4, width = 0.05, height = 0) +
  stat_summary(fun = mean, geom = "line", col = "red") +
  ylab(expression( abs(" Residuals "))) +
  geom_text(aes(x = Mean, y = Mean-50, label = paste(image, music)), size=2.7, col = "blue")

myData.mean%>%group_by(image,music)%>%summarise_at(vars(stress), list(name=mean))

## # A tibble: 4 x 3
## # Groups:   image [2]
##   image music   name
##   <fct> <fct>  <dbl>
## 1 Angry Disney  49.1
## 2 Angry Horror  53.4
## 3 Happy Disney  49.3
## 4 Happy Horror  47.0

ggboxplot(
  myData.mean, x = "image", y = "stress",
  color = "music"
  )

As for now, we can identify 2 things: Disney’s impact is rather weak and there a few outliers in Happy category

Assumptions

#Normality The first one to be checked is the normality of dependent variable, which will be tested with Shapiro-Wilk test for each combination of factor levels:

myData.mean %>% group_by(image,music) %>% shapiro_test(stress)

## # A tibble: 4 x 5
##   music  image variable statistic      p
##   <fct>  <fct> <chr>        <dbl>  <dbl>
## 1 Disney Angry stress       0.977 0.315 
## 2 Horror Angry stress       0.985 0.649 
## 3 Disney Happy stress       0.964 0.0773
## 4 Horror Happy stress       0.988 0.816

Luckily at 5% significance level we can’t reject the hypothesis that each group follows normal distribution.

Now, let’s look at QQ plot and investigate potential outliers.

ggqqplot(myData.mean, "stress", ggtheme = theme_bw()) +
  facet_grid(image ~ music, labeller = "label_both")

#Outliers

myData.mean%>%group_by(image,music)%>%identify_outliers(stress)

## # A tibble: 5 x 6
##   music  image PID   stress is.outlier is.extreme
##   <fct>  <fct> <fct>  <dbl> <lgl>      <lgl>     
## 1 Disney Happy 15      18.7 TRUE       FALSE     
## 2 Disney Happy 17      12   TRUE       FALSE     
## 3 Disney Happy 21       5   TRUE       FALSE     
## 4 Disney Happy 23      16.7 TRUE       FALSE     
## 5 Horror Happy 1       90   TRUE       FALSE

Although there are a few outliers, none of them is an extreme one.

#Sphericity

This assumption is checked via Mauchly’s Test, which is used to assess whether all possible pairs of within-subject conditions are equal. This test is done automatically in anova test.

#Homogenity of variance

library(car)

## Warning: package 'car' was built under R version 4.0.5

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

## The following object is masked from 'package:purrr':
## 
##     some

leveneTest(stress ~ music * image, data = myData.mean)

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   3  0.7876 0.5019
##       235

As p value is very high, we can conclude that this assumption is met.

#ANOVA

res.aov <- anova_test(
  data = myData.mean, dv = stress, wid = PID,
  within = c(image, music)
  ); 
get_anova_table(res.aov)

## ANOVA Table (type III tests)
## 
##        Effect DFn DFd     F     p p<.05   ges
## 1       image   1  58 2.027 0.160       0.011
## 2       music   1  58 0.315 0.577       0.001
## 3 image:music   1  58 3.863 0.054       0.017

summary(res.aov)

##     Effect               DFn         DFd           F               p         
##  Length:3           Min.   :1   Min.   :58   Min.   :0.315   Min.   :0.0540  
##  Class :character   1st Qu.:1   1st Qu.:58   1st Qu.:1.171   1st Qu.:0.1070  
##  Mode  :character   Median :1   Median :58   Median :2.027   Median :0.1600  
##                     Mean   :1   Mean   :58   Mean   :2.068   Mean   :0.2637  
##                     3rd Qu.:1   3rd Qu.:58   3rd Qu.:2.945   3rd Qu.:0.3685  
##                     Max.   :1   Max.   :58   Max.   :3.863   Max.   :0.5770  
##     p<.05                ges          
##  Length:3           Min.   :0.001000  
##  Class :character   1st Qu.:0.006000  
##  Mode  :character   Median :0.011000  
##                     Mean   :0.009667  
##                     3rd Qu.:0.014000  
##                     Max.   :0.017000

stress.aov <- with(myData.mean,
                   aov(stress ~ music * image +
                       Error(PID / (music * image)))
)

## Warning in aov(stress ~ music * image + Error(PID/(music * image))): Error()
## model is singular

summary(stress.aov)

## 
## Error: PID
##           Df Sum Sq Mean Sq F value Pr(>F)
## music      1     21   21.37   0.129   0.72
## Residuals 58   9586  165.27               
## 
## Error: PID:music
##           Df Sum Sq Mean Sq F value Pr(>F)
## music      1     61   61.01   0.341  0.562
## image      1      6    6.02   0.034  0.855
## Residuals 58  10388  179.10               
## 
## Error: PID:image
##             Df Sum Sq Mean Sq F value Pr(>F)
## image        1    582   581.8   2.399  0.127
## music:image  1    406   405.9   1.674  0.201
## Residuals   58  14066   242.5               
## 
## Error: PID:music:image
##             Df Sum Sq Mean Sq F value Pr(>F)  
## music:image  1    767   766.8   3.863 0.0542 .
## Residuals   58  11513   198.5                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#Conclusion Unfortunately, not one variable/interaction has any significance from what has the test reported.