Introduction
The repeated measures ANOVA makes the following assumptions about the data:
No significant outliers in any cell of the design. This can be checked by visualizing the data using box plot methods and by using the function
identify_outliers()(rstatix package).Normality: the outcome (or dependent) variable should be approximately normally distributed in each cell of the design. This can be checked using the Shapiro-Wilk normality test (
shapiro_test()(rstatix)) or by visual inspection using QQ plot (ggqqplot()(ggpubr package)).Assumption of sphericity: the variance of the differences between groups should be equal. This can be checked using the Mauchly’s test of sphericity, which is automatically reported when using the R function
anova_test()(rstatix package).
Note that, if the above assumptions are not met there are a non-parametric alternative (Friedman test) to the one-way repeated measures ANOVA!
Unfortunately, there are no non-parametric alternatives to the two-way and the three-way repeated measures ANOVA. Thus, in the situation where the assumptions are not met, you could consider running the two-way/three-way repeated measures ANOVA on the transformed and non-transformed data to see if there are any meaningful differences.
If both tests lead you to the same conclusions, you might not choose to transform the outcome variable and carry on with the two-way/three-way repeated measures ANOVA on the original data.
It’s also possible to perform robust ANOVA test using the WRS2 R package.
No matter your choice, you should report what you did in your results.
RM Anova in R
Key R functions:
anova_test()[rstatix package], a wrapper aroundcar::Anova()for making easy the computation of repeated measures ANOVA. Key arguments for performing repeated measures ANOVA:data: data framedv: (numeric) the dependent (or outcome) variable name.wid: variable name specifying the case/sample identifier.within: within-subjects factor or grouping variable
get_anova_table()(rstatix package). Extracts the ANOVA table from the output ofanova_test(). It returns ANOVA table that is automatically corrected for eventual deviation from the sphericity assumption. The default is to apply automatically the Greenhouse-Geisser sphericity correction to only within-subject factors violating the sphericity assumption (i.e., Mauchly’s test p-value is significant, p <= 0.05). Read more in Chapter @ref(mauchly-s-test-of-sphericity-in-r).
1-way RM Anova
The dataset “selfesteem” contains 10 individuals’ self-esteem score on three time points during a specific diet to determine whether their self-esteem improved.
t1, t2, t3 = three time points one after the other. Response variable = selfesteem score given by participants in these times.
First, to view the data:
| id | t1 | t2 | t3 |
|---|---|---|---|
| 1 | 4.005027 | 5.182286 | 7.107831 |
| 2 | 2.558124 | 6.912915 | 6.308434 |
| 3 | 3.244241 | 4.443434 | 9.778410 |
The one-way repeated measures ANOVA can be used to determine whether the means self-esteem scores are significantly different between the three time points. So let’s convert this data frame into long format:
| id | time | score |
|---|---|---|
| 1 | t1 | 4.005027 |
| 2 | t1 | 2.558124 |
| 3 | t1 | 3.244241 |
Descriptive statistics
Now we would like to understand the data. We will start with a nice boxplot that shows self esteem scores in all three periods.
ggplot(aes(x = score, y = time, color = time), data = selfesteem) + geom_boxplot() +
xlab("Self esteem score") + ylab("Time period") +
ggtitle("Self esteem score distribution in three time periods") +
theme(legend.position = 'none')
Now we would like to see some summary of the data. Grouped by time when the observations was performed, we can see the range, mean, median and other statistics.
| time | variable | n | min | max | median | q1 | q3 | iqr | mad | mean | sd | se | ci |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| t1 | score | 10 | 2.046 | 4.005 | 3.212 | 2.914 | 3.486 | 0.571 | 0.452 | 3.140 | 0.552 | 0.174 | 0.395 |
| t2 | score | 10 | 3.908 | 6.913 | 4.601 | 4.411 | 5.301 | 0.890 | 0.602 | 4.934 | 0.863 | 0.273 | 0.617 |
| t3 | score | 10 | 6.308 | 9.778 | 7.463 | 6.700 | 8.440 | 1.740 | 1.401 | 7.636 | 1.143 | 0.361 | 0.817 |
Assumptions
Before getting into calculating ANOVA we must check if assumptions are met.
- NORMALITY
We can check normality assumption using Shapiro test. This can be either performend for total dataset at once (mshapiro_test) or grouped by time.
kbl(mshapiro_test(selfesteem$score))
| statistic | p.value |
|---|---|
| 0.9488613 | 0.1575811 |
x <- selfesteem %>%
group_by(time) %>%
shapiro_test(score)
kbl(x)
| time | variable | statistic | p |
|---|---|---|---|
| t1 | score | 0.9666901 | 0.8585757 |
| t2 | score | 0.8758846 | 0.1169956 |
| t3 | score | 0.9227150 | 0.3801563 |
In both calculations we get p value higher than our significance level (0.05). Hence, we cannot reject H0 stating that the distribution is in fact normal.
We can also view the QQ plot:
ggqqplot(selfesteem, 'score', facet.by = 'time') + ggtitle("QQ plot grouped by times")
We can continue checking further assumptions:
- HOMOGENEITY OF VARIANCE
To check homogeneity of variance we can use Levene’s Test.
kbl(leveneTest(score ~ time, data = selfesteem))
| Df | F value | Pr(>F) | |
|---|---|---|---|
| group | 2 | 2.711826 | 0.0844841 |
| 27 | NA | NA |
Levene Test shows that the variance is homogeneous.
- OUTLIERS
Lastly, we will check whether our data has any significant outliers.
kbl(selfesteem %>%
group_by(time) %>%
identify_outliers(score))
| time | id | score | is.outlier | is.extreme |
|---|---|---|---|---|
| t1 | 6 | 2.045868 | TRUE | FALSE |
| t2 | 2 | 6.912915 | TRUE | FALSE |
We managed to find 2 outliers. However, they are not extreme and we must remember that our dataset has only 10 observations. We will then continue using all values.
Anova
Now, since we managed to fulfill all assumptions we can calculate ANOVA.
x <- selfesteem %>%
anova_test(dv = score, wid = id, within = time) %>%
get_anova_table()
kbl(x)
| Effect | DFn | DFd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|
| time | 2 | 18 | 55.469 | 0 |
|
0.829 |
Post-hoc tests
Now as we rejected H0, we must perform post-hoc test. We will use pairwise comparison for dependent sample (paired = TRUE).
kbl(pairwise_t_test(data = selfesteem, score ~ time, paired = TRUE, p.adjust.method = "bonferroni"))
| .y. | group1 | group2 | n1 | n2 | statistic | df | p | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|---|
| score | t1 | t2 | 10 | 10 | -4.967618 | 9 | 7.72e-04 | 2e-03 | ** |
| score | t1 | t3 | 10 | 10 | -13.228148 | 9 | 3.00e-07 | 1e-06 | **** |
| score | t2 | t3 | 10 | 10 | -4.867816 | 9 | 8.86e-04 | 3e-03 | ** |
From the result we can see that all pairs are significantly different. We can also review the plot from ggstatsplot package:
ggwithinstats(data = selfesteem, y = score, x = time)
Conclusions
From our calculations, we managed to observe that during the time stated in the task the self esteem scores of the participants significantly increased. The difference is not only between time1 and time3, but is significant in all three pairs.
2-way RM Anova
For Two-Way Repeated Measures ANOVA, “Two-way” means that there are two factors in the experiment, for example, different treatments and different conditions. “Repeated-measures” means that the same subject received more than one treatment and/or more than one condition. Similar to two-way ANOVA, two-way repeated measures ANOVA can be employed to test for significant differences between the factor level means within a factor and for interactions between factors.
Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures, and the data violates the ANOVA assumption of independence. Two-Way Repeated Measures ANOVA designs can be two repeated measures factors, or one repeated measures factor and one non-repeated factor. If any repeated factor is present, then the repeated measures ANOVA should be used.
Please apply Two-way RM-ANOVA to analyze if any of interactions are significant (between time and music, time and image, music and image, or music and time and image)! The response variable is level of stress experienced by a person watching one of 2 movie genres. Interpret your results. Use the following data set:
| PID | stress | image | music | |
|---|---|---|---|---|
| 1 | 1 | 6 | Happy | Horror |
| 61 | 1 | 45 | Happy | Disney |
| 121 | 1 | 66 | Happy | Horror |
| 181 | 1 | 96 | Happy | Horror |
| 241 | 1 | 1 | Happy | Horror |
| 301 | 1 | 52 | Angry | Disney |
Understanding the data
First, we will try to understand the dataset by creating summary tables. We assume following variables:
- PID = participant ID
- stress = stress level value given by the participant
- image = Happy / Angry type of image shown to the participant
- music = Horror / Disney type of music played to the participant
We can see that each of 60 participants took part in 20 tests and each time giving one of 4 possible collocations of music / image.
Now we can view the summary grouped by music and image for all participants at once. This gives us 4 possibilities: 2 musics * 2 images. We can also review the resutls for each participant seperately.
x1 <- myData %>%
group_by(image, music) %>%
get_summary_stats(stress)
kbl(x1)
| image | music | variable | n | min | max | median | q1 | q3 | iqr | mad | mean | sd | se | ci |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Angry | Disney | stress | 276 | 1 | 100 | 52.0 | 29.75 | 77 | 47.25 | 37.065 | 52.058 | 28.626 | 1.723 | 3.392 |
| Angry | Horror | stress | 283 | 1 | 100 | 49.0 | 25.50 | 76 | 50.50 | 37.065 | 49.855 | 28.122 | 1.672 | 3.291 |
| Happy | Disney | stress | 340 | 1 | 100 | 47.5 | 24.00 | 77 | 53.00 | 38.548 | 49.768 | 29.365 | 1.593 | 3.133 |
| Happy | Horror | stress | 301 | 1 | 100 | 52.0 | 24.00 | 76 | 52.00 | 37.065 | 51.060 | 29.581 | 1.705 | 3.355 |
x2 <- myData %>%
group_by(PID) %>%
get_summary_stats(stress)
kbl(head(x2))
| PID | variable | n | min | max | median | q1 | q3 | iqr | mad | mean | sd | se | ci |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | stress | 20 | 1 | 98 | 52.5 | 17.75 | 74.50 | 56.75 | 44.478 | 49.35 | 32.896 | 7.356 | 15.396 |
| 2 | stress | 20 | 3 | 99 | 51.5 | 23.75 | 85.00 | 61.25 | 48.184 | 52.30 | 31.006 | 6.933 | 14.511 |
| 3 | stress | 20 | 2 | 100 | 52.0 | 31.75 | 70.00 | 38.25 | 33.358 | 50.80 | 29.581 | 6.614 | 13.844 |
| 4 | stress | 20 | 3 | 97 | 37.0 | 22.00 | 60.75 | 38.75 | 24.463 | 43.50 | 28.203 | 6.306 | 13.200 |
| 5 | stress | 20 | 3 | 97 | 36.5 | 23.75 | 61.25 | 37.50 | 28.169 | 43.45 | 28.913 | 6.465 | 13.532 |
| 6 | stress | 20 | 2 | 99 | 66.5 | 29.75 | 89.50 | 59.75 | 44.478 | 57.35 | 32.053 | 7.167 | 15.001 |
For each music / image collocation we can view the number of observations, min and max value as well as other statistics. In the second table we divide participants and check statistics for value of stress.
Plots
For each music / image collocation we can view the number of observations, min and max value as well as other statistics. In the second table we divide participants and check statistics for value of stress.
Now we would like to visualize the distribution of data. First, by possible collocations image-music using a boxplot.
ggplot(data = myData, aes(x = stress, y = image, fill = music)) + geom_boxplot() +
ggtitle("Stress result for music + image for all participants") +
xlab("Stress score") + ylab('') + scale_fill_brewer(name = "", palette = "Greens")
We can see that actually the general distribution of stress level between all 4 categories is quite similar - looking at all participants’ responses we do not observe any particular patterns.
We can also view the impact of factors on the stress score using plot.design.
plot.design(stress ~ image * music, data = myData, col = "brown",
ylab = "Mean of stress score",
main = "Impact of factors on stress score")
Here we can see how factors impact observed stress score. We can see that actually music does not have big influence on stress score, higher difference is visible based on image. However, the total impact is negligible - the values on the scale do not even exceed a difference of 1 point.
interaction.plot(myData$music, myData$image, myData$stress, col = c("blue", "purple"),
lty = 1, lwd = 2, ylim = c(48, 52), trace.label = "Image",
xlab = "Music type", ylab = "Mean of stress result")
From the interaction plot we can observe that all interactions that could occur are very small - the difference between stress levels under different conditions is barely 2 points (in a 100 scale). Hence, we cannot conclude anything and as the results are randomly generated that leaves us with no significant data.
Transforming the data
Now, we want to group the data by participant ID, music and image. This will reduce our dataset as we have some observations that were made under the same conditions and we are interested only in their mean.
myData2 <- myData %>%
group_by(PID, music, image) %>%
summarise(mean_stress = mean(stress), num = n())
kbl(head(myData2))
| PID | music | image | mean_stress | num |
|---|---|---|---|---|
| 1 | Disney | Angry | 37.500 | 4 |
| 1 | Disney | Happy | 25.000 | 4 |
| 1 | Horror | Angry | 84.500 | 4 |
| 1 | Horror | Happy | 49.875 | 8 |
| 2 | Disney | Angry | 54.000 | 6 |
| 2 | Disney | Happy | 66.500 | 6 |
Assumptions
Now, we will try to calculate Repeated Measures ANOVA. We must check whether our dataset meets needed assumptions before we will be able to calculate it
- NORMALITY
kbl(mshapiro_test(myData2$mean_stress))
| statistic | p.value |
|---|---|
| 0.9956031 | 0.7342363 |
kbl(myData2 %>%
group_by(image, music) %>%
shapiro_test(mean_stress))
| music | image | variable | statistic | p |
|---|---|---|---|---|
| Disney | Angry | mean_stress | 0.9794829 | 0.4070552 |
| Horror | Angry | mean_stress | 0.9835274 | 0.6052111 |
| Disney | Happy | mean_stress | 0.9873186 | 0.7886983 |
| Horror | Happy | mean_stress | 0.9905514 | 0.9283733 |
We can see that shapiro test returns a p value bigger than our significance level (0.05). Hence, we do not reject H0 stating that the population is normal.
We can also observe QQ plots:
ggqqplot(myData2, "mean_stress", ggtheme = theme_bw()) +
facet_grid(image ~ music, labeller = "label_both")
- HOMOGENEITY OF VARIANCE
To check homogeneity of variance we can use Levene Test.
kbl(leveneTest(mean_stress ~ music * image, data = myData2))
| Df | F value | Pr(>F) | |
|---|---|---|---|
| group | 3 | 0.3597948 | 0.7820978 |
| 234 | NA | NA |
We can see that p-value is high. Hence, we assume the variance is homogeneous.
- OUTLIERS
Now we would like to see if our data has any significant outliers.
kbl(myData2 %>%
group_by(music, image) %>%
identify_outliers(mean_stress))
| music | image | PID | mean_stress | num | is.outlier | is.extreme |
|---|---|---|---|---|---|---|
| Horror | Angry | 5 | 97 | 1 | TRUE | FALSE |
We can see that the identify_outliers function returns us one non extreme outlier for participant that was watching angry image with horror music playing. However, it is not extreme value so we will not delete it.
RM ANOVA CALCULATIONS
kbl(anova(aov(data = myData2, mean_stress ~ image * music)))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| image | 1 | 11.269282 | 11.269282 | 0.0539715 | 0.8164944 |
| music | 1 | 6.528303 | 6.528303 | 0.0312657 | 0.8598018 |
| image:music | 1 | 50.253829 | 50.253829 | 0.2406784 | 0.6241763 |
| Residuals | 234 | 48859.365968 | 208.800709 | NA | NA |
Looking at our results we see that we are not able to reject any null hypothesis. Hence, there is no need to perform post-hoc tests.