Statistical Inference

Repeated Measures Anova

Introduction

The repeated measures ANOVA makes the following assumptions about the data:

  • No significant outliers in any cell of the design. This can be checked by visualizing the data using box plot methods and by using the function identify_outliers() [rstatix package].

  • Normality: the outcome (or dependent) variable should be approximately normally distributed in each cell of the design. This can be checked using the Shapiro-Wilk normality test (shapiro_test() [rstatix]) or by visual inspection using QQ plot (ggqqplot() [ggpubr package]).

  • Assumption of sphericity: the variance of the differences between groups should be equal. This can be checked using the Mauchly’s test of sphericity, which is automatically reported when using the R function anova_test() [rstatix package].

Note that, if the above assumptions are not met there are a non-parametric alternative (Friedman test) to the one-way repeated measures ANOVA!

Unfortunately, there are no non-parametric alternatives to the two-way and the three-way repeated measures ANOVA. Thus, in the situation where the assumptions are not met, you could consider running the two-way/three-way repeated measures ANOVA on the transformed and non-transformed data to see if there are any meaningful differences.

If both tests lead you to the same conclusions, you might not choose to transform the outcome variable and carry on with the two-way/three-way repeated measures ANOVA on the original data.

It’s also possible to perform robust ANOVA test using the WRS2 R package.

No matter your choice, you should report what you did in your results.

RM Anova in R

Key R functions:

  • anova_test() [rstatix package], a wrapper around car::Anova() for making easy the computation of repeated measures ANOVA. Key arguments for performing repeated measures ANOVA:

    • data: data frame

    • dv: (numeric) the dependent (or outcome) variable name.

    • wid: variable name specifying the case/sample identifier.

    • within: within-subjects factor or grouping variable

  • get_anova_table() [rstatix package]. Extracts the ANOVA table from the output of anova_test(). It returns ANOVA table that is automatically corrected for eventual deviation from the sphericity assumption. The default is to apply automatically the Greenhouse-Geisser sphericity correction to only within-subject factors violating the sphericity assumption (i.e., Mauchly’s test p-value is significant, p <= 0.05).

1-way RM Anova

The dataset “selfesteem” contains 10 individuals’ self-esteem score on three time points during a specific diet to determine whether their self-esteem improved.

data("selfesteem", package = "datarium")
head(selfesteem, 3)
## # A tibble: 3 × 4
##      id    t1    t2    t3
##   <int> <dbl> <dbl> <dbl>
## 1     1  4.01  5.18  7.11
## 2     2  2.56  6.91  6.31
## 3     3  3.24  4.44  9.78

The one-way repeated measures ANOVA can be used to determine whether the means self-esteem scores are significantly different between the three time points. So let’s convert this data frame into long format:

selfesteem <- selfesteem %>%
  gather(key = "time", value = "score", t1, t2, t3) %>%
  convert_as_factor(id, time)
head(selfesteem, 3)
## # A tibble: 3 × 3
##   id    time  score
##   <fct> <fct> <dbl>
## 1 1     t1     4.01
## 2 2     t1     2.56
## 3 3     t1     3.24

Descriptive statistics

selfesteem %>%
  group_by(time) %>%
  get_summary_stats(score, type = "mean_sd")
## # A tibble: 3 × 5
##   time  variable     n  mean    sd
##   <fct> <fct>    <dbl> <dbl> <dbl>
## 1 t1    score       10  3.14 0.552
## 2 t2    score       10  4.93 0.863
## 3 t3    score       10  7.64 1.14
bxp<- ggboxplot(selfesteem, y="score", x="time", color ="time") 
bxp

Assumptions

selfesteem %>%
  group_by(time) %>%
  identify_outliers(score)
## # A tibble: 2 × 5
##   time  id    score is.outlier is.extreme
##   <fct> <fct> <dbl> <lgl>      <lgl>     
## 1 t1    6      2.05 TRUE       FALSE     
## 2 t2    2      6.91 TRUE       FALSE
selfesteem %>%
  group_by(time) %>%
  shapiro_test(score)
## # A tibble: 3 × 4
##   time  variable statistic     p
##   <fct> <chr>        <dbl> <dbl>
## 1 t1    score        0.967 0.859
## 2 t2    score        0.876 0.117
## 3 t3    score        0.923 0.380
ggqqplot(selfesteem, "score") +
  facet_grid(~time)

Anova

results <- anova_test(data = selfesteem,
           dv= score,
           wid= id,
           within = time)
results
## ANOVA Table (type III tests)
## 
## $ANOVA
##   Effect DFn DFd      F        p p<.05   ges
## 1   time   2  18 55.469 2.01e-08     * 0.829
## 
## $`Mauchly's Test for Sphericity`
##   Effect     W     p p<.05
## 1   time 0.551 0.092      
## 
## $`Sphericity Corrections`
##   Effect  GGe      DF[GG]    p[GG] p[GG]<.05   HFe      DF[HF]    p[HF]
## 1   time 0.69 1.38, 12.42 2.16e-06         * 0.774 1.55, 13.94 6.03e-07
##   p[HF]<.05
## 1         *

Post-hoc tests

pwc <- selfesteem %>%
  emmeans_test(score ~time, p.adjust.method = "bonferroni")
pwc
## # A tibble: 3 × 9
##   term  .y.   group1 group2    df statistic        p    p.adj p.adj.signif
## * <chr> <chr> <chr>  <chr>  <dbl>     <dbl>    <dbl>    <dbl> <chr>       
## 1 time  score t1     t2        27     -4.53 1.08e- 4 3.25e- 4 ***         
## 2 time  score t1     t3        27    -11.3  8.82e-12 2.65e-11 ****        
## 3 time  score t2     t3        27     -6.82 2.51e- 7 7.54e- 7 ****
pwc <- pwc %>% add_xy_position(x="time")
bxp + 
  stat_pvalue_manual(pwc) +
  labs(
    subtitle=get_test_label(results,detailed=TRUE),
    caption=get_pwc_label(pwc)
  )

Conclusions

Hence, we can conclude that the score was statistically significantly different during different times, with the result with F(2,18) = 55.5 and p = 0.0000000201. From the post-hoc tests, when pairwise comparisons were conducted we can state that all the differences between pairs are statistically significant.

2-way RM Anova

For Two-Way Repeated Measures ANOVA, “Two-way” means that there are two factors in the experiment, for example, different treatments and different conditions. “Repeated-measures” means that the same subject received more than one treatment and/or more than one condition. Similar to two-way ANOVA, two-way repeated measures ANOVA can be employed to test for significant differences between the factor level means within a factor and for interactions between factors.

Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures, and the data violates the ANOVA assumption of independence. Two-Way Repeated Measures ANOVA designs can be two repeated measures factors, or one repeated measures factor and one non-repeated factor. If any repeated factor is present, then the repeated measures ANOVA should be used.

Please apply Two-way RM-ANOVA to analyze if any of interactions are significant (between time and music, time and image, music and image, or music and time and image)! The response variable is level of stress experienced by a person watching one of 2 movie genres. Interpret your results. Use the following data set:

set.seed(5250)
myData <- data.frame(PID = rep(seq(from = 1,
                               to = 60, by = 1), 20),
                     stress = sample(x = 1:100,
                                     size = 1200,
                                     replace = TRUE),
                     image = sample(c("Happy", "Angry"),
                                    size = 1200,
                                    replace = TRUE),
                     music = sample(c("Disney", "Horror"),
                                    size = 1200,
                                    replace = TRUE)
)
myData <- within(myData, {
  PID   <- factor(PID)
  image <- factor(image)
  music <- factor(music)
})
myData <- myData[order(myData$PID), ]
head(myData)
##     PID stress image  music
## 1     1     90 Happy Horror
## 61    1      7 Angry Disney
## 121   1     31 Happy Disney
## 181   1     68 Angry Disney
## 241   1      6 Happy Disney
## 301   1     80 Angry Horror
head(myData, 60)
##      PID stress image  music
## 1      1     90 Happy Horror
## 61     1      7 Angry Disney
## 121    1     31 Happy Disney
## 181    1     68 Angry Disney
## 241    1      6 Happy Disney
## 301    1     80 Angry Horror
## 361    1     45 Angry Disney
## 421    1     30 Happy Disney
## 481    1     26 Happy Disney
## 541    1     59 Angry Horror
## 601    1     19 Angry Disney
## 661    1     15 Angry Disney
## 721    1      6 Angry Horror
## 781    1     97 Happy Disney
## 841    1     97 Happy Disney
## 901    1     28 Angry Disney
## 961    1     49 Angry Horror
## 1021   1     39 Angry Horror
## 1081   1      4 Angry Disney
## 1141   1     20 Happy Disney
## 2      2     31 Angry Horror
## 62     2     78 Happy Disney
## 122    2     12 Happy Horror
## 182    2     47 Angry Disney
## 242    2     34 Happy Disney
## 302    2     23 Happy Horror
## 362    2     48 Happy Horror
## 422    2     19 Angry Disney
## 482    2     67 Happy Disney
## 542    2      5 Angry Disney
## 602    2     67 Angry Horror
## 662    2     19 Happy Disney
## 722    2     37 Angry Disney
## 782    2     32 Happy Disney
## 842    2     77 Angry Horror
## 902    2     24 Angry Disney
## 962    2     25 Angry Disney
## 1022   2     56 Happy Disney
## 1082   2     74 Angry Horror
## 1142   2     74 Angry Horror
## 3      3     56 Happy Horror
## 63     3     75 Happy Disney
## 123    3     56 Happy Horror
## 183    3     29 Angry Disney
## 243    3     71 Happy Horror
## 303    3     73 Angry Disney
## 363    3     72 Happy Horror
## 423    3     57 Angry Disney
## 483    3      7 Angry Disney
## 543    3     38 Happy Horror
## 603    3     85 Angry Horror
## 663    3      9 Happy Disney
## 723    3     93 Angry Horror
## 783    3      7 Happy Disney
## 843    3     12 Angry Horror
## 903    3     52 Angry Horror
## 963    3     77 Angry Disney
## 1023   3     86 Happy Horror
## 1083   3     67 Happy Horror
## 1143   3    100 Happy Horror

The one-way repeated measures ANOVA can be used to determine whether the means self-esteem scores are significantly different between the three time points. So let’s convert this data frame into long format:

res <- xtabs(~ image + music, data = myData)
res
##        music
## image   Disney Horror
##   Angry    310    305
##   Happy    289    296

Descriptive statistics

ggplot(myData, aes(x=stress)) + 
  geom_histogram(bins=20) + 
  facet_grid(image ~ music) + 
  theme_classic()

bxp<- ggboxplot(myData, y="stress", x="image", color ="music") 
bxp

Computing the mean and the SD (standard deviation) of the score by groups:

myData %>%
  group_by(music, image) %>%
  get_summary_stats(stress, type = "mean_sd")
## # A tibble: 4 × 6
##   image music  variable     n  mean    sd
##   <fct> <fct>  <fct>    <dbl> <dbl> <dbl>
## 1 Angry Disney stress     310  48.9  29.4
## 2 Happy Disney stress     289  49.5  29.2
## 3 Angry Horror stress     305  53.0  28.4
## 4 Happy Horror stress     296  47.3  28.4

Assumptions

Outliers

Identifying outliers in each cell design:

myData %>%
  group_by(image, music) %>%
  identify_outliers(stress)
## [1] image      music      PID        stress     is.outlier is.extreme
## <0 rows> (or 0-length row.names)

There were no extreme outliers. Check normality assumption by analyzing the model residuals. QQ plot and Shapiro-Wilk test of normality are used.

myData %>%
  group_by(music, image) %>%
  shapiro_test(stress)
## # A tibble: 4 × 5
##   image music  variable statistic             p
##   <fct> <fct>  <chr>        <dbl>         <dbl>
## 1 Angry Disney stress       0.944 0.00000000171
## 2 Happy Disney stress       0.952 0.0000000403 
## 3 Angry Horror stress       0.957 0.0000000924 
## 4 Happy Horror stress       0.957 0.000000124

The score were normally distributed (p > 0.05) for each cell, as assessed by Shapiro-Wilk’s test of normality.

ggqqplot(myData, "stress") +
  facet_grid(music~image)

Homogeneity of variance

This can be checked using the Levene’s test:

myData %>%
  levene_test(stress~music*image)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     3  1196     0.623 0.600

The p-value is > 0.05, which is not significant. This means that, there is not significant difference between variances across groups. Therefore, we can assume the homogeneity of variances in the different treatment groups.

The residuals versus fits plot can be used to check the homogeneity of variances:

model<-lm(stress~music*image,data=myData)
plot(model,1)

Anova

fit <- aov(stress ~ music * image, data = myData)

# Run the ANOVA
Anova(fit, type = "II")
## Anova Table (Type II tests)
## 
## Response: stress
##             Sum Sq   Df F value  Pr(>F)  
## music          331    1  0.3976 0.52845  
## image         1939    1  2.3320 0.12701  
## music:image   2991    1  3.5970 0.05812 .
## Residuals   994510 1196                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
results2 <- myData %>% anova_test(stress ~ music * image)
results2
## ANOVA Table (type II tests)
## 
##        Effect DFn  DFd     F     p p<.05      ges
## 1       music   1 1196 0.398 0.528       0.000332
## 2       image   1 1196 2.332 0.127       0.002000
## 3 music:image   1 1196 3.597 0.058       0.003000

There are no statistically significant interaction between image and music.

Pairwise comparisons

A statistically significant simple main effect can be followed up by multiple pairwise comparisons to determine which group means are different. We’ll now perform multiple pairwise comparisons between the different education_level groups by gender.

# Group the data by gender and fit  anova
model <- lm(stress ~ image * music, data = myData)
myData %>%
  group_by(image) %>%
  anova_test(stress ~ music, error = model)
## # A tibble: 2 × 8
##   image Effect   DFn   DFd     F     p `p<.05`      ges
## * <fct> <chr>  <dbl> <dbl> <dbl> <dbl> <chr>      <dbl>
## 1 Angry music      1  1196 3.15  0.076 ""      0.003   
## 2 Happy music      1  1196 0.842 0.359 ""      0.000703

We can see it is not significant.