Repeated Measures ANOVA

Daria Skarbek, 184869

Published on 23.01.2022

Introduction

The repeated measures ANOVA makes the following assumptions about the data:

No significant outliers in any cell of the design. This can be checked by visualizing the data using box plot methods and by using the function identify_outliers() (rstatix package).
Normality: the outcome (or dependent) variable should be approximately normally distributed in each cell of the design. This can be checked using the Shapiro-Wilk normality test (shapiro_test() (rstatix)) or by visual inspection using QQ plot (ggqqplot() (ggpubr package)).
Assumption of sphericity: the variance of the differences between groups should be equal. This can be checked using the Mauchly’s test of sphericity, which is automatically reported when using the R function anova_test() (rstatix package).

Note that, if the above assumptions are not met there are a non-parametric alternative (Friedman test) to the one-way repeated measures ANOVA!

Unfortunately, there are no non-parametric alternatives to the two-way and the three-way repeated measures ANOVA. Thus, in the situation where the assumptions are not met, you could consider running the two-way/three-way repeated measures ANOVA on the transformed and non-transformed data to see if there are any meaningful differences.

If both tests lead you to the same conclusions, you might not choose to transform the outcome variable and carry on with the two-way/three-way repeated measures ANOVA on the original data.

It’s also possible to perform robust ANOVA test using the WRS2 R package.

No matter your choice, you should report what you did in your results.

RM Anova in R

Key R functions:

anova_test() [rstatix package], a wrapper around car::Anova() for making easy the computation of repeated measures ANOVA. Key arguments for performing repeated measures ANOVA:
- data: data frame
- dv: (numeric) the dependent (or outcome) variable name.
- wid: variable name specifying the case/sample identifier.
- within: within-subjects factor or grouping variable
get_anova_table() (rstatix package). Extracts the ANOVA table from the output of anova_test(). It returns ANOVA table that is automatically corrected for eventual deviation from the sphericity assumption. The default is to apply automatically the Greenhouse-Geisser sphericity correction to only within-subject factors violating the sphericity assumption (i.e., Mauchly’s test p-value is significant, p <= 0.05). Read more in Chapter @ref(mauchly-s-test-of-sphericity-in-r).

1-way RM Anova

The dataset “selfesteem” contains 10 individuals’ self-esteem score on three time points during a specific diet to determine whether their self-esteem improved.

t1, t2, t3 = three time points one after the other. Response variable = selfesteem score given by participants in these times.

First, to view the data:

id	t1	t2	t3
1	4.005027	5.182286	7.107831
2	2.558124	6.912915	6.308434
3	3.244241	4.443434	9.778410

The one-way repeated measures ANOVA can be used to determine whether the means self-esteem scores are significantly different between the three time points. So let’s convert this data frame into long format:

id	time	score
1	t1	4.005027
2	t1	2.558124
3	t1	3.244241

Descriptive statistics

Now we would like to understand the data. We will start with a nice boxplot that shows self esteem scores in all three periods.

ggplot(aes(x = score, y = time, color = time), data = selfesteem) + geom_boxplot() + 
  xlab("Self esteem score") + ylab("Time period") + 
  ggtitle("Self esteem score distribution in three time periods") + 
  theme(legend.position = 'none')

Now we would like to see some summary of the data. Grouped by time when the observations was performed, we can see the range, mean, median and other statistics.

time	variable	n	min	max	median	q1	q3	iqr	mad	mean	sd	se	ci
t1	score	10	2.046	4.005	3.212	2.914	3.486	0.571	0.452	3.140	0.552	0.174	0.395
t2	score	10	3.908	6.913	4.601	4.411	5.301	0.890	0.602	4.934	0.863	0.273	0.617
t3	score	10	6.308	9.778	7.463	6.700	8.440	1.740	1.401	7.636	1.143	0.361	0.817

Assumptions

Before getting into calculating ANOVA we must check if assumptions are met.

NORMALITY

We can check normality assumption using Shapiro test. This can be either performend for total dataset at once (mshapiro_test) or grouped by time.

kbl(mshapiro_test(selfesteem$score))

statistic	p.value
0.9488613	0.1575811

x <- selfesteem %>%
  group_by(time) %>%
  shapiro_test(score)
kbl(x)

time	variable	statistic	p
t1	score	0.9666901	0.8585757
t2	score	0.8758846	0.1169956
t3	score	0.9227150	0.3801563

In both calculations we get p value higher than our significance level (0.05). Hence, we cannot reject H₀ stating that the distribution is in fact normal.

We can also view the QQ plot:

ggqqplot(selfesteem, 'score', facet.by = 'time') + ggtitle("QQ plot grouped by times")

We can continue checking further assumptions:

HOMOGENEITY OF VARIANCE

To check homogeneity of variance we can use Levene’s Test.

kbl(leveneTest(score ~ time, data = selfesteem))

	Df	F value	Pr(>F)
group	2	2.711826	0.0844841
	27	NA	NA

Levene Test shows that the variance is homogeneous.

OUTLIERS

Lastly, we will check whether our data has any significant outliers.

kbl(selfesteem %>%
  group_by(time) %>%
  identify_outliers(score))

time	id	score	is.outlier	is.extreme
t1	6	2.045868	TRUE	FALSE
t2	2	6.912915	TRUE	FALSE

We managed to find 2 outliers. However, they are not extreme and we must remember that our dataset has only 10 observations. We will then continue using all values.

Anova

Now, since we managed to fulfill all assumptions we can calculate ANOVA.

x <- selfesteem %>% 
  anova_test(dv = score, wid = id, within = time) %>%
  get_anova_table()
kbl(x)

Effect	DFn	DFd	F	p	p<.05	ges
time	2	18	55.469	0		0.829

Looking at our result, we can see that we managed to reject H₀. Hence, the self esteem scores are improving significantly during the time.

Post-hoc tests

Now as we rejected H₀, we must perform post-hoc test. We will use pairwise comparison for dependent sample (paired = TRUE).

kbl(pairwise_t_test(data = selfesteem, score ~ time, paired = TRUE, p.adjust.method = "bonferroni"))

.y.	group1	group2	n1	n2	statistic	df	p	p.adj	p.adj.signif
score	t1	t2	10	10	-4.967618	9	7.72e-04	2e-03	**
score	t1	t3	10	10	-13.228148	9	3.00e-07	1e-06	****
score	t2	t3	10	10	-4.867816	9	8.86e-04	3e-03	**

From the result we can see that all pairs are significantly different. We can also review the plot from ggstatsplot package:

ggwithinstats(data = selfesteem, y = score, x = time)

Conclusions

From our calculations, we managed to observe that during the time stated in the task the self esteem scores of the participants significantly increased. The difference is not only between time1 and time3, but is significant in all three pairs.

2-way RM Anova

For Two-Way Repeated Measures ANOVA, “Two-way” means that there are two factors in the experiment, for example, different treatments and different conditions. “Repeated-measures” means that the same subject received more than one treatment and/or more than one condition. Similar to two-way ANOVA, two-way repeated measures ANOVA can be employed to test for significant differences between the factor level means within a factor and for interactions between factors.

Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures, and the data violates the ANOVA assumption of independence. Two-Way Repeated Measures ANOVA designs can be two repeated measures factors, or one repeated measures factor and one non-repeated factor. If any repeated factor is present, then the repeated measures ANOVA should be used.

Please apply Two-way RM-ANOVA to analyze if any of interactions are significant (between time and music, time and image, music and image, or music and time and image)! The response variable is level of stress experienced by a person watching one of 2 movie genres. Interpret your results. Use the following data set:

	PID	stress	image	music
1	1	6	Happy	Horror
61	1	45	Happy	Disney
121	1	66	Happy	Horror
181	1	96	Happy	Horror
241	1	1	Happy	Horror
301	1	52	Angry	Disney

Understanding the data

First, we will try to understand the dataset by creating summary tables. We assume following variables:

PID = participant ID
stress = stress level value given by the participant
image = Happy / Angry type of image shown to the participant
music = Horror / Disney type of music played to the participant

We can see that each of 60 participants took part in 20 tests and each time giving one of 4 possible collocations of music / image.

Now we can view the summary grouped by music and image for all participants at once. This gives us 4 possibilities: 2 musics * 2 images. We can also review the resutls for each participant seperately.

x1 <- myData %>% 
  group_by(image, music) %>%
  get_summary_stats(stress)
kbl(x1)

image	music	variable	n	min	max	median	q1	q3	iqr	mad	mean	sd	se	ci
Angry	Disney	stress	276	1	100	52.0	29.75	77	47.25	37.065	52.058	28.626	1.723	3.392
Angry	Horror	stress	283	1	100	49.0	25.50	76	50.50	37.065	49.855	28.122	1.672	3.291
Happy	Disney	stress	340	1	100	47.5	24.00	77	53.00	38.548	49.768	29.365	1.593	3.133
Happy	Horror	stress	301	1	100	52.0	24.00	76	52.00	37.065	51.060	29.581	1.705	3.355

x2 <- myData %>%
  group_by(PID) %>%
  get_summary_stats(stress)
kbl(head(x2))

PID	variable	n	min	max	median	q1	q3	iqr	mad	mean	sd	se	ci
1	stress	20	1	98	52.5	17.75	74.50	56.75	44.478	49.35	32.896	7.356	15.396
2	stress	20	3	99	51.5	23.75	85.00	61.25	48.184	52.30	31.006	6.933	14.511
3	stress	20	2	100	52.0	31.75	70.00	38.25	33.358	50.80	29.581	6.614	13.844
4	stress	20	3	97	37.0	22.00	60.75	38.75	24.463	43.50	28.203	6.306	13.200
5	stress	20	3	97	36.5	23.75	61.25	37.50	28.169	43.45	28.913	6.465	13.532
6	stress	20	2	99	66.5	29.75	89.50	59.75	44.478	57.35	32.053	7.167	15.001

For each music / image collocation we can view the number of observations, min and max value as well as other statistics. In the second table we divide participants and check statistics for value of stress.

Plots

Now we would like to visualize the distribution of data. First, by possible collocations image-music using a boxplot.

ggplot(data = myData, aes(x = stress, y = image, fill = music)) + geom_boxplot() + 
  ggtitle("Stress result for music + image for all participants") + 
  xlab("Stress score") + ylab('') + scale_fill_brewer(name = "", palette = "Greens")

We can see that actually the general distribution of stress level between all 4 categories is quite similar - looking at all participants’ responses we do not observe any particular patterns.

We can also view the impact of factors on the stress score using plot.design.

plot.design(stress ~ image * music, data = myData, col = "brown", 
            ylab = "Mean of stress score", 
            main = "Impact of factors on stress score")

Here we can see how factors impact observed stress score. We can see that actually music does not have big influence on stress score, higher difference is visible based on image. However, the total impact is negligible - the values on the scale do not even exceed a difference of 1 point.

interaction.plot(myData$music, myData$image, myData$stress, col = c("blue", "purple"), 
                 lty = 1, lwd = 2, ylim = c(48, 52), trace.label = "Image",
                 xlab = "Music type", ylab = "Mean of stress result")

From the interaction plot we can observe that all interactions that could occur are very small - the difference between stress levels under different conditions is barely 2 points (in a 100 scale). Hence, we cannot conclude anything and as the results are randomly generated that leaves us with no significant data.

Transforming the data

Now, we want to group the data by participant ID, music and image. This will reduce our dataset as we have some observations that were made under the same conditions and we are interested only in their mean.

myData2 <- myData %>% 
  group_by(PID, music, image) %>%
  summarise(mean_stress = mean(stress), num = n())

kbl(head(myData2))

PID	music	image	mean_stress	num
1	Disney	Angry	37.500	4
1	Disney	Happy	25.000	4
1	Horror	Angry	84.500	4
1	Horror	Happy	49.875	8
2	Disney	Angry	54.000	6
2	Disney	Happy	66.500	6

Assumptions

Now, we will try to calculate Repeated Measures ANOVA. We must check whether our dataset meets needed assumptions before we will be able to calculate it

NORMALITY

kbl(mshapiro_test(myData2$mean_stress))

statistic	p.value
0.9956031	0.7342363

kbl(myData2 %>%
  group_by(image, music) %>%
  shapiro_test(mean_stress))

music	image	variable	statistic	p
Disney	Angry	mean_stress	0.9794829	0.4070552
Horror	Angry	mean_stress	0.9835274	0.6052111
Disney	Happy	mean_stress	0.9873186	0.7886983
Horror	Happy	mean_stress	0.9905514	0.9283733

We can see that shapiro test returns a p value bigger than our significance level (0.05). Hence, we do not reject H₀ stating that the population is normal.

We can also observe QQ plots:

ggqqplot(myData2, "mean_stress", ggtheme = theme_bw()) + 
  facet_grid(image ~ music, labeller = "label_both")

HOMOGENEITY OF VARIANCE

To check homogeneity of variance we can use Levene Test.

kbl(leveneTest(mean_stress ~ music * image, data = myData2))

	Df	F value	Pr(>F)
group	3	0.3597948	0.7820978
	234	NA	NA

We can see that p-value is high. Hence, we assume the variance is homogeneous.

OUTLIERS

Now we would like to see if our data has any significant outliers.

kbl(myData2 %>%
  group_by(music, image) %>%
  identify_outliers(mean_stress))

music	image	PID	mean_stress	num	is.outlier	is.extreme
Horror	Angry	5	97	1	TRUE	FALSE

We can see that the identify_outliers function returns us one non extreme outlier for participant that was watching angry image with horror music playing. However, it is not extreme value so we will not delete it.

RM ANOVA CALCULATIONS

kbl(anova(aov(data = myData2, mean_stress ~ image * music)))

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
image	1	11.269282	11.269282	0.0539715	0.8164944
music	1	6.528303	6.528303	0.0312657	0.8598018
image:music	1	50.253829	50.253829	0.2406784	0.6241763
Residuals	234	48859.365968	208.800709	NA	NA

Looking at our results we see that we are not able to reject any null hypothesis. Hence, there is no need to perform post-hoc tests.