Repeated Measures ANOVA

Daria Skarbek, 184869

Published on 23.01.2022

Introduction

The repeated measures ANOVA makes the following assumptions about the data:

  • No significant outliers in any cell of the design. This can be checked by visualizing the data using box plot methods and by using the function identify_outliers() (rstatix package).

  • Normality: the outcome (or dependent) variable should be approximately normally distributed in each cell of the design. This can be checked using the Shapiro-Wilk normality test (shapiro_test() (rstatix)) or by visual inspection using QQ plot (ggqqplot() (ggpubr package)).

  • Assumption of sphericity: the variance of the differences between groups should be equal. This can be checked using the Mauchly’s test of sphericity, which is automatically reported when using the R function anova_test() (rstatix package).

Note that, if the above assumptions are not met there are a non-parametric alternative (Friedman test) to the one-way repeated measures ANOVA!

Unfortunately, there are no non-parametric alternatives to the two-way and the three-way repeated measures ANOVA. Thus, in the situation where the assumptions are not met, you could consider running the two-way/three-way repeated measures ANOVA on the transformed and non-transformed data to see if there are any meaningful differences.

If both tests lead you to the same conclusions, you might not choose to transform the outcome variable and carry on with the two-way/three-way repeated measures ANOVA on the original data.

It’s also possible to perform robust ANOVA test using the WRS2 R package.

No matter your choice, you should report what you did in your results.

RM Anova in R

Key R functions:

  • anova_test() [rstatix package], a wrapper around car::Anova() for making easy the computation of repeated measures ANOVA. Key arguments for performing repeated measures ANOVA:

    • data: data frame

    • dv: (numeric) the dependent (or outcome) variable name.

    • wid: variable name specifying the case/sample identifier.

    • within: within-subjects factor or grouping variable

  • get_anova_table() (rstatix package). Extracts the ANOVA table from the output of anova_test(). It returns ANOVA table that is automatically corrected for eventual deviation from the sphericity assumption. The default is to apply automatically the Greenhouse-Geisser sphericity correction to only within-subject factors violating the sphericity assumption (i.e., Mauchly’s test p-value is significant, p <= 0.05). Read more in Chapter @ref(mauchly-s-test-of-sphericity-in-r).

1-way RM Anova

The dataset “selfesteem” contains 10 individuals’ self-esteem score on three time points during a specific diet to determine whether their self-esteem improved.

t1, t2, t3 = three time points one after the other. Response variable = selfesteem score given by participants in these times.

First, to view the data:

id t1 t2 t3
1 4.005027 5.182286 7.107831
2 2.558124 6.912915 6.308434
3 3.244241 4.443434 9.778410

The one-way repeated measures ANOVA can be used to determine whether the means self-esteem scores are significantly different between the three time points. So let’s convert this data frame into long format:

id time score
1 t1 4.005027
2 t1 2.558124
3 t1 3.244241

Descriptive statistics

Now we would like to understand the data. We will start with a nice boxplot that shows self esteem scores in all three periods.

ggplot(aes(x = score, y = time, color = time), data = selfesteem) + geom_boxplot() + 
  xlab("Self esteem score") + ylab("Time period") + 
  ggtitle("Self esteem score distribution in three time periods") + 
  theme(legend.position = 'none')

Now we would like to see some summary of the data. Grouped by time when the observations was performed, we can see the range, mean, median and other statistics.

time variable n min max median q1 q3 iqr mad mean sd se ci
t1 score 10 2.046 4.005 3.212 2.914 3.486 0.571 0.452 3.140 0.552 0.174 0.395
t2 score 10 3.908 6.913 4.601 4.411 5.301 0.890 0.602 4.934 0.863 0.273 0.617
t3 score 10 6.308 9.778 7.463 6.700 8.440 1.740 1.401 7.636 1.143 0.361 0.817

Assumptions

Before getting into calculating ANOVA we must check if assumptions are met.

  1. NORMALITY

We can check normality assumption using Shapiro test. This can be either performend for total dataset at once (mshapiro_test) or grouped by time.

kbl(mshapiro_test(selfesteem$score))
statistic p.value
0.9488613 0.1575811
x <- selfesteem %>%
  group_by(time) %>%
  shapiro_test(score)
kbl(x)
time variable statistic p
t1 score 0.9666901 0.8585757
t2 score 0.8758846 0.1169956
t3 score 0.9227150 0.3801563

In both calculations we get p value higher than our significance level (0.05). Hence, we cannot reject H0 stating that the distribution is in fact normal.

We can also view the QQ plot:

ggqqplot(selfesteem, 'score', facet.by = 'time') + ggtitle("QQ plot grouped by times") 

We can continue checking further assumptions:

  1. HOMOGENEITY OF VARIANCE

To check homogeneity of variance we can use Levene’s Test.

kbl(leveneTest(score ~ time, data = selfesteem))
Df F value Pr(>F)
group 2 2.711826 0.0844841
27 NA NA

Levene Test shows that the variance is homogeneous.

  1. OUTLIERS

Lastly, we will check whether our data has any significant outliers.

kbl(selfesteem %>%
  group_by(time) %>%
  identify_outliers(score))
time id score is.outlier is.extreme
t1 6 2.045868 TRUE FALSE
t2 2 6.912915 TRUE FALSE

We managed to find 2 outliers. However, they are not extreme and we must remember that our dataset has only 10 observations. We will then continue using all values.

Anova

Now, since we managed to fulfill all assumptions we can calculate ANOVA.

x <- selfesteem %>% 
  anova_test(dv = score, wid = id, within = time) %>%
  get_anova_table()
kbl(x)
Effect DFn DFd F p p<.05 ges
time 2 18 55.469 0
0.829
Looking at our result, we can see that we managed to reject H0. Hence, the self esteem scores are improving significantly during the time.

Post-hoc tests

Now as we rejected H0, we must perform post-hoc test. We will use pairwise comparison for dependent sample (paired = TRUE).

kbl(pairwise_t_test(data = selfesteem, score ~ time, paired = TRUE, p.adjust.method = "bonferroni"))
.y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
score t1 t2 10 10 -4.967618 9 7.72e-04 2e-03 **
score t1 t3 10 10 -13.228148 9 3.00e-07 1e-06 ****
score t2 t3 10 10 -4.867816 9 8.86e-04 3e-03 **

From the result we can see that all pairs are significantly different. We can also review the plot from ggstatsplot package:

ggwithinstats(data = selfesteem, y = score, x = time)

Conclusions

From our calculations, we managed to observe that during the time stated in the task the self esteem scores of the participants significantly increased. The difference is not only between time1 and time3, but is significant in all three pairs.

2-way RM Anova

For Two-Way Repeated Measures ANOVA, “Two-way” means that there are two factors in the experiment, for example, different treatments and different conditions. “Repeated-measures” means that the same subject received more than one treatment and/or more than one condition. Similar to two-way ANOVA, two-way repeated measures ANOVA can be employed to test for significant differences between the factor level means within a factor and for interactions between factors.

Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures, and the data violates the ANOVA assumption of independence. Two-Way Repeated Measures ANOVA designs can be two repeated measures factors, or one repeated measures factor and one non-repeated factor. If any repeated factor is present, then the repeated measures ANOVA should be used.

Please apply Two-way RM-ANOVA to analyze if any of interactions are significant (between time and music, time and image, music and image, or music and time and image)! The response variable is level of stress experienced by a person watching one of 2 movie genres. Interpret your results. Use the following data set:

PID stress image music
1 1 6 Happy Horror
61 1 45 Happy Disney
121 1 66 Happy Horror
181 1 96 Happy Horror
241 1 1 Happy Horror
301 1 52 Angry Disney

Understanding the data

First, we will try to understand the dataset by creating summary tables. We assume following variables:

  • PID = participant ID
  • stress = stress level value given by the participant
  • image = Happy / Angry type of image shown to the participant
  • music = Horror / Disney type of music played to the participant

We can see that each of 60 participants took part in 20 tests and each time giving one of 4 possible collocations of music / image.

Now we can view the summary grouped by music and image for all participants at once. This gives us 4 possibilities: 2 musics * 2 images. We can also review the resutls for each participant seperately.

x1 <- myData %>% 
  group_by(image, music) %>%
  get_summary_stats(stress)
kbl(x1)
image music variable n min max median q1 q3 iqr mad mean sd se ci
Angry Disney stress 276 1 100 52.0 29.75 77 47.25 37.065 52.058 28.626 1.723 3.392
Angry Horror stress 283 1 100 49.0 25.50 76 50.50 37.065 49.855 28.122 1.672 3.291
Happy Disney stress 340 1 100 47.5 24.00 77 53.00 38.548 49.768 29.365 1.593 3.133
Happy Horror stress 301 1 100 52.0 24.00 76 52.00 37.065 51.060 29.581 1.705 3.355
x2 <- myData %>%
  group_by(PID) %>%
  get_summary_stats(stress)
kbl(head(x2))
PID variable n min max median q1 q3 iqr mad mean sd se ci
1 stress 20 1 98 52.5 17.75 74.50 56.75 44.478 49.35 32.896 7.356 15.396
2 stress 20 3 99 51.5 23.75 85.00 61.25 48.184 52.30 31.006 6.933 14.511
3 stress 20 2 100 52.0 31.75 70.00 38.25 33.358 50.80 29.581 6.614 13.844
4 stress 20 3 97 37.0 22.00 60.75 38.75 24.463 43.50 28.203 6.306 13.200
5 stress 20 3 97 36.5 23.75 61.25 37.50 28.169 43.45 28.913 6.465 13.532
6 stress 20 2 99 66.5 29.75 89.50 59.75 44.478 57.35 32.053 7.167 15.001

For each music / image collocation we can view the number of observations, min and max value as well as other statistics. In the second table we divide participants and check statistics for value of stress.

Plots

For each music / image collocation we can view the number of observations, min and max value as well as other statistics. In the second table we divide participants and check statistics for value of stress.

Now we would like to visualize the distribution of data. First, by possible collocations image-music using a boxplot.

ggplot(data = myData, aes(x = stress, y = image, fill = music)) + geom_boxplot() + 
  ggtitle("Stress result for music + image for all participants") + 
  xlab("Stress score") + ylab('') + scale_fill_brewer(name = "", palette = "Greens")

We can see that actually the general distribution of stress level between all 4 categories is quite similar - looking at all participants’ responses we do not observe any particular patterns.

We can also view the impact of factors on the stress score using plot.design.

plot.design(stress ~ image * music, data = myData, col = "brown", 
            ylab = "Mean of stress score", 
            main = "Impact of factors on stress score") 

Here we can see how factors impact observed stress score. We can see that actually music does not have big influence on stress score, higher difference is visible based on image. However, the total impact is negligible - the values on the scale do not even exceed a difference of 1 point.

interaction.plot(myData$music, myData$image, myData$stress, col = c("blue", "purple"), 
                 lty = 1, lwd = 2, ylim = c(48, 52), trace.label = "Image",
                 xlab = "Music type", ylab = "Mean of stress result")

From the interaction plot we can observe that all interactions that could occur are very small - the difference between stress levels under different conditions is barely 2 points (in a 100 scale). Hence, we cannot conclude anything and as the results are randomly generated that leaves us with no significant data.

Transforming the data

Now, we want to group the data by participant ID, music and image. This will reduce our dataset as we have some observations that were made under the same conditions and we are interested only in their mean.

myData2 <- myData %>% 
  group_by(PID, music, image) %>%
  summarise(mean_stress = mean(stress), num = n())

kbl(head(myData2))
PID music image mean_stress num
1 Disney Angry 37.500 4
1 Disney Happy 25.000 4
1 Horror Angry 84.500 4
1 Horror Happy 49.875 8
2 Disney Angry 54.000 6
2 Disney Happy 66.500 6

Assumptions

Now, we will try to calculate Repeated Measures ANOVA. We must check whether our dataset meets needed assumptions before we will be able to calculate it

  1. NORMALITY
kbl(mshapiro_test(myData2$mean_stress))
statistic p.value
0.9956031 0.7342363
kbl(myData2 %>%
  group_by(image, music) %>%
  shapiro_test(mean_stress))
music image variable statistic p
Disney Angry mean_stress 0.9794829 0.4070552
Horror Angry mean_stress 0.9835274 0.6052111
Disney Happy mean_stress 0.9873186 0.7886983
Horror Happy mean_stress 0.9905514 0.9283733

We can see that shapiro test returns a p value bigger than our significance level (0.05). Hence, we do not reject H0 stating that the population is normal.

We can also observe QQ plots:

ggqqplot(myData2, "mean_stress", ggtheme = theme_bw()) + 
  facet_grid(image ~ music, labeller = "label_both")

  1. HOMOGENEITY OF VARIANCE

To check homogeneity of variance we can use Levene Test.

kbl(leveneTest(mean_stress ~ music * image, data = myData2))
Df F value Pr(>F)
group 3 0.3597948 0.7820978
234 NA NA

We can see that p-value is high. Hence, we assume the variance is homogeneous.

  1. OUTLIERS

Now we would like to see if our data has any significant outliers.

kbl(myData2 %>%
  group_by(music, image) %>%
  identify_outliers(mean_stress))
music image PID mean_stress num is.outlier is.extreme
Horror Angry 5 97 1 TRUE FALSE

We can see that the identify_outliers function returns us one non extreme outlier for participant that was watching angry image with horror music playing. However, it is not extreme value so we will not delete it.

RM ANOVA CALCULATIONS

kbl(anova(aov(data = myData2, mean_stress ~ image * music)))
Df Sum Sq Mean Sq F value Pr(>F)
image 1 11.269282 11.269282 0.0539715 0.8164944
music 1 6.528303 6.528303 0.0312657 0.8598018
image:music 1 50.253829 50.253829 0.2406784 0.6241763
Residuals 234 48859.365968 208.800709 NA NA

Looking at our results we see that we are not able to reject any null hypothesis. Hence, there is no need to perform post-hoc tests.