BIO2POS Lecture Topic 3B

class: middle
background-image: url(data:image/png;base64,#LTU_logo_clear.jpg)
background-position: top left
background-size: 25%

# BIO2POS 
# Repeated Measures ANOVA
## Data Analysis Topic 3B
### La Trobe University

---

# Welcome!

### In this lecture we will continue our focus on ANOVAs and introduce the repeated measures ANOVA and its non-parametric equivalent.

Over the following slides, we will cover:

* .orangered_style[Repeated Measures ANOVA]

* Definition and Hypotheses
--

* Post-hoc testing via multiple comparisons
--

* Interpreting Output
--

* Assumptions
--

* .orangered_style[Friedman Test]

---

# Intended Learning Objectives

### By the end of this lecture you will:

* understand when and how to use a .orangered_style[repeated measures ANOVA]

* be able to assess whether repeated measures ANOVA test assumptions have been met

* understand when and how to use a .orangered_style[Friedman test]
  
--

* be able to correctly .seagreen_style[interpret] and .seagreen_style[summarise] the results of the above tests
  
--

The content you learn in Topics 3A and 3B extends the skills you have developed in Topics [2A](https://rpubs.com/LTU_BIO2POS/DA2A) and [2B](https://rpubs.com/LTU_BIO2POS/DA2B).

We will practice content from this topic in this week's DA computer lab, and the computer lab has some additional extension material if you would like to extend your knowledge.

---

# Repeated Measures ANOVA

Recall that we introduced the .orangered_style[One-way ANOVA] in [Topic 3A](https://rpubs.com/LTU_BIO2POS/DA3A).

* The one-way ANOVA can be used to compare the means of **two or more** independent groups, and generalises the two-sample `$t$`-test

If we are assessing  **two or more dependent groups**, we can instead use a .orangered_style[Repeated Measures ANOVA].

* We can use a repeated measures ANOVA when **the same individuals are measured multiple times**, for one or more variables of interest
  
--

* The repeated measures ANOVA generalises the .orangered_style[paired *t* test]

* In experimental design (ED) terminology, we are dealing with .seagreen_style[within subject design] data, with each individual being a .seagreen_style[block]

---

# Repeated Measures ANOVA Hypotheses

Suppose we have `$k$` observations recorded for a variable of interest, for each individual in a study `$(k \geq 2)$`.

* They might denote recordings made under different conditions
   
    * E.g. heart rate after walking, running, swimming
    
--

* Or they may denote the same condition at different time points
    
    * E.g. muscle mass over time in an exercise program

Let `$\mu_i$` denote the population mean for recording `$i$`, where `$i = 1, 2, \ldots, k$`.

Our .orangered_style[repeated measures ANOVA hypotheses] will be:

`$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \text{ vs } H_1: \text{ Not all means are equal}$$`
--

*Just as for the one-way ANOVA, it is important to note here that our alternate hypothesis is not saying that all the means are unequal necessarily, but rather that* **at least 2 means are not equal**.

---

# Repeated Measures ANOVA Benefits

Some of the benefits of using a repeated measures ANOVA include:

* .seagreen_style[Cost effective] - measure the same individuals for all conditions/time points
  
--

* .seagreen_style[Minimise error variance] - Individuals act as their own controls, meaning any variation we observe in our results is more likely to be due to the factor(s) in which we are interested
  
--

* .seagreen_style[Flexibility] in experiment design - data can be measured over days, months, years, decades

---

# Repeated Measures ANOVA - Sauna Example

Chang et al. (2023) assessed the neurological, psychological and physiological effects of .seagreen_style[sauna bathing].

Initial measurements were taken for all participants, before the sauna group (10 individuals) underwent three sets of bathing:

* Each set consisted of a .seagreen_style[hot sauna] (10 mins), a .seagreen_style[cold bath] (1-2 mins), and then .seagreen_style[outdoor rest] (7 mins)

* As part of the study, participants completed .seagreen_style[self-report measure to assess relaxation effects (S-MARE)] questionnaires initially and after each bathing set (agreement scale 0 to 100)

* In this example, we will focus on the effect of sauna bathing on individuals' responses to the psychological calm question 
  
.center[*Question 23: Right now, I am completely calm*
  ]
  
---

# RM ANOVA Hypotheses - Sauna Example

For this scenario, we have `$k=4$` observations for each individual.

* Let `$\mu_{pre}$` denote the population mean response score for Q23 pre-sauna
  
--

* Let `$\mu_{post1}$`, `$\mu_{post2}$` and `$\mu_{post3}$` denote the population mean response scores for Q23 after the first, second and third set of bathing, respectively

Our .orangered_style[repeated measures ANOVA hypotheses] will be:

`$$H_0: \mu_{pre} = \mu_{post1} = \mu_{post2} = \mu_{post3} \text{  vs  } H_1: \text{ Not all means are equal}$$`

---

# Sauna Example - Descriptives Plot

.left-column[

The Level of Calm sample means look quite different
{{content}}
]

This suggests a repeated measures ANOVA test is worthwhile

.right-column[
<img src="data:image/png;base64,#sauna_q23_marginal_means.png" width="450px" style="display: block; margin: auto;" />
]

---

# Repeated Measures ANOVA jamovi output

---

# What next?

At this stage, we have our initial results, and it might be tempting to conclude our analysis is done (we have `$p < 0.05$`).

However, this is just the beginning - we also have to consider:

1. The appropriate .orangered_style[*F*-test version] to use
  
--

2. The clinical importance of our results (.orangered_style[effect size])
  
--

3. What to do if we reject `$H_0$` `$(p\text{-value} < \alpha)$`
 
--

4. What to do if the .orangered_style[test assumptions are violated]
  
--

These are the concepts we will cover during the rest of this lecture.

---

# Repeated Measures ANOVA `$F$`-test versions

For a repeated measures ANOVA, just like for a one-way ANOVA, we compute a .orangered_style[test statistic *F*], which follows an `$F$`-distribution with two separate degrees of freedom values, `$df_1$` and `$df_2$` `$(F \sim F_{df_{1}, df_{2}})$`.

The calculations for `$df_1$` and `$df_2$` are slightly different to the one-way ANOVA case, with:

* `$df_1 = k - 1$`

* `$df_2 = (n-1) \times (k-1)$`

* E.g. for the sauna example, `$df_1 = 4 - 1 = 3$` and `$df_2 = (10-1) \times (4-1) = 27$`

* If we obtain a large `$F$` value, this suggests one or more of the means of the different conditions/time points is/are different

* A small `$F$` value suggests the means of the different conditions/time points are all similar

To compute the repeated measures `$F$` test statistic, the process is similar to that of a one-way ANOVA, although we can further partition the sources of variation.
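
The partitioning described above can be illustrated with a minimal Python sketch. It assumes a small hypothetical data set (3 individuals, 3 conditions; these numbers are not from the sauna study) and a hypothetical helper name `rm_anova_f`. It uses only the standard library and is not how jamovi produces its output.

```python
from statistics import mean

def rm_anova_f(data):
    """One-way repeated measures F statistic.

    data: list of rows, one row per individual (block),
          each row holding that individual's k repeated measurements.
    """
    n = len(data)        # number of individuals
    k = len(data[0])     # number of conditions / time points
    grand = mean(x for row in data for x in row)
    cond_means = [mean(row[j] for row in data) for j in range(k)]
    subj_means = [mean(row) for row in data]

    # Partition the total variation into condition, subject and error parts
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj

    df1 = k - 1
    df2 = (n - 1) * (k - 1)
    f_stat = (ss_cond / df1) / (ss_error / df2)
    return f_stat, df1, df2

# Hypothetical toy scores: 3 individuals measured under 3 conditions
toy_scores = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
f_stat, df1, df2 = rm_anova_f(toy_scores)
print(f_stat, df1, df2)
```

Removing the between-individual variation (`ss_subj`) from the error term is the extra partitioning step that a one-way ANOVA does not have.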

---

# Sphericity

Our specific calculation of the  `$F$` test statistic depends upon a key repeated measures ANOVA assumption - .orangered_style[Sphericity].

Sphericity is the assumption that the **variances of the differences** between all possible pairs of within-subject conditions are equal.

* For the sauna example, we would be assuming that the variances of the differences (for each individual) between `pre-post1`, `pre-post2`, `pre-post3`, `post1-post2`, `post1-post3` and  `post2-post3` were all equal

* We can think of this as the repeated measures equivalent of the one-way ANOVA assumption of equal variances between groups
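
To make the "variances of the differences" idea concrete, here is a minimal Python sketch. The scores are hypothetical (4 individuals, not the sauna data) and `difference_variances` is an illustrative helper name, not a jamovi function.

```python
from itertools import combinations
from statistics import variance

def difference_variances(data, labels):
    """Sample variance of the within-individual differences for every pair of conditions."""
    result = {}
    for i, j in combinations(range(len(labels)), 2):
        diffs = [row[i] - row[j] for row in data]   # one difference per individual
        result[f"{labels[i]}-{labels[j]}"] = variance(diffs)
    return result

labels = ["pre", "post1", "post2", "post3"]
# Hypothetical scores: one row per individual, one column per condition
scores = [
    [50, 60, 70, 75],
    [40, 55, 72, 70],
    [60, 62, 80, 78],
    [45, 58, 65, 74],
]
variances = difference_variances(scores, labels)
for pair, var in variances.items():
    print(pair, round(var, 2))
```

Sphericity holds when these six variances are (approximately) equal in the population.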

---

# Mauchly's Sphericity Test

We can test this assumption using .orangered_style[Mauchly's Sphericity Test], which has the hypotheses:

`$$H_0: \text{variances of differences in pairs all equal  vs  }$$`

`$$H_1: \text{ Not all such variances are equal}$$`
--

* We can think of this as the repeated measures equivalent of the .orangered_style[Levene's Test]

If the Mauchly's Sphericity Test `$p$`-value is less than `$0.05$`, we reject `$H_0$`. This means we **should not use** the default repeated measures ANOVA `$F$` test statistic value, as the Type I error rate may be inflated.

Instead, we should use an `$F$` test statistic with a .seagreen_style[correction] applied:

* .orangered_style[Greenhouse-Geisser] (use if `$\epsilon < 0.75$`)

* .orangered_style[Huynh-Feldt] (less conservative; commonly used otherwise)
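
As a rough illustration of where `$\epsilon$` comes from, the sketch below computes the Greenhouse-Geisser estimate from a covariance matrix of the repeated measurements, using the standard double-centring formula. All function names are hypothetical, the example inputs are made up, and jamovi's internal implementation may differ.

```python
def covariance_matrix(data):
    """Sample covariance matrix (k x k) of the k repeated measurements."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]

def greenhouse_geisser_epsilon(cov):
    """Greenhouse-Geisser epsilon; ranges from 1/(k-1) (worst case) to 1 (sphericity holds)."""
    k = len(cov)
    row_means = [sum(row) / k for row in cov]
    col_means = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    grand = sum(row_means) / k
    # Double-centre the covariance matrix
    centred = [
        [cov[i][j] - row_means[i] - col_means[j] + grand for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centred[i][i] for i in range(k))
    sum_sq = sum(v ** 2 for row in centred for v in row)
    return trace ** 2 / ((k - 1) * sum_sq)

def corrected_df(epsilon, n, k):
    """Both degrees of freedom are multiplied by epsilon when a correction is applied."""
    return epsilon * (k - 1), epsilon * (n - 1) * (k - 1)

# Equal variances and zero covariances satisfy sphericity
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(greenhouse_geisser_epsilon(identity))

# Hypothetical epsilon of 0.6 with n = 10 individuals and k = 4 conditions
print(corrected_df(0.6, n=10, k=4))
```

The corrections leave the `$F$` value itself unchanged. They shrink `$df_1$` and `$df_2$`, which makes the `$p$`-value larger when sphericity is violated.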

---

* Note how the curves change for even small `$df_1$` and `$df_2$` changes

---

# RM ANOVA jamovi output - all options

* *The main RM ANOVA outcome remains the same (reject `$H_0$`), although some values change slightly. This is not always the case: sometimes the changes can be large.*
  
---

# ANOVA Effect Sizes

Just as for the `$t$`-tests we considered previously, we can compute various effect sizes for our one-way ANOVA and repeated measures ANOVA.

* Recall we use effect sizes to determine the .seagreen_style[clinical significance] of our results
  
--

There are numerous ANOVA effect size options - we will focus on `$\eta^2$` (.orangered_style[eta-squared]).

We can think of `$\eta^2$` as a measure of the proportion of variation in our response variable which can be attributed to the independent variable.

* E.g.: What proportion of the variation in the responses to Q23: *Right now, I am completely calm* can be attributed to the sauna status?
  
--

Therefore, the larger the effect size, the more relevant the selected independent variable is, in explaining the variation we observe in our results.

---

# Interpreting the `$\eta^2$` Effect Size

The following conventions apply for interpreting `$\eta^2$` (J. Cohen, 1988):

.shadedbox[ .center[
`$\eta^2 < 0.01$`: "negligible effect size"

`$0.01 \leq \eta^2 < 0.06$`: "small effect size"

`$0.06 \leq \eta^2 < 0.14$`: "medium effect size"

`$\eta^2 \geq 0.14$`: "large effect size"
]
]

Note that these are very different to the Cohen's `$d$` specifications for `$t$`-tests!
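
The conventions above can be written as a small lookup. This is a minimal Python sketch; the helper names `eta_squared` and `interpret_eta_squared` are hypothetical, and `eta_squared` assumes the classical definition (effect sum of squares divided by total sum of squares), which may not match every software option.

```python
def eta_squared(ss_effect, ss_total):
    """Classical eta-squared: share of total variation attributed to the effect."""
    return ss_effect / ss_total

def interpret_eta_squared(eta_sq):
    """Label an eta-squared value using the Cohen (1988) conventions."""
    if not 0 <= eta_sq <= 1:
        raise ValueError("eta-squared must lie between 0 and 1")
    if eta_sq < 0.01:
        return "negligible"
    if eta_sq < 0.06:
        return "small"
    if eta_sq < 0.14:
        return "medium"
    return "large"

# Value reported for the sauna example
print(interpret_eta_squared(0.211))

# Hypothetical sums of squares, for illustration only
print(interpret_eta_squared(eta_squared(ss_effect=5.0, ss_total=100.0)))
```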

---

# RM ANOVA Post-hoc Tests

If our initial repeated measures ANOVA `$F$` test statistic provides evidence to reject `$H_0$`, we should then perform .orangered_style[post-hoc tests] to determine which population means are statistically significantly different.

We will focus on two .orangered_style[pairwise comparison] post-hoc tests:

* The .orangered_style[Tukey HSD] post-hoc test - a robust, good all-round post-hoc test choice

* The .orangered_style[Bonferroni Correction] post-hoc test - this can be overly conservative
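
As a minimal sketch of the Bonferroni idea, each unadjusted `$p$`-value is multiplied by the number of comparisons (and capped at 1). The `$p$`-values below are hypothetical placeholders, not the sauna results, and `bonferroni_adjust` is an illustrative helper name. The Tukey HSD adjustment relies on a different distribution and is not shown.

```python
from itertools import combinations

def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

conditions = ["pre", "post1", "post2", "post3"]
pairs = list(combinations(conditions, 2))   # all pairwise comparisons

# Hypothetical unadjusted p-values, one per pair (NOT from the sauna study)
raw_p = [0.20, 0.03, 0.04, 0.15, 0.25, 0.90]
adjusted_p = bonferroni_adjust(raw_p)

for (a, b), p_raw, p_adj in zip(pairs, raw_p, adjusted_p):
    print(f"{a} vs {b}: unadjusted p = {p_raw:.2f}, Bonferroni p = {p_adj:.2f}")
```

In this made-up example, two comparisons have unadjusted `$p < 0.05$`, but none remain below `$0.05$` after adjustment, which shows how conservative the correction can be.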

---

# Post-hoc Tests - Sauna Example

Note that, while our initial test result was significant, the pairwise comparisons are all .orangered_style[non-significant], once we account for multiple comparisons.

---

# Multiple Comparisons Note

Whenever we conduct statistical inference, there is a chance of observing a .orangered_style[Type I error (false positive)].

* Recall `$\alpha$` denotes our .orangered_style[level of significance] (our accepted Type I error rate)

If we carry out a single test, with `$\alpha = 0.05$`, our Type I error rate is 5%.

However, by conducting multiple tests simultaneously, we **compound the chance of observing false positives**.

* Consider our sauna example, with 6 post-hoc pairwise comparisons, and an initial `$\alpha = 0.05$`. We want a 95% chance of not observing a Type I error for each test. But overall, our chance of not observing **any** Type I errors becomes:

`$$0.95 \times 0.95 \times 0.95 \times 0.95 \times 0.95 \times 0.95 = 0.95^6 \approx 0.735$$`
--

* Here, our overall (family-wise) Type I error rate has increased to an unacceptable 26.5% `$(1 - 0.735 = 0.265)$`, using just 6 tests! This only gets worse as the number of simultaneous tests increases

This is why we apply the .seagreen_style[statistical correction methods], to ensure the overall Type I error rate across all comparisons stays at the `$\alpha$` initially specified.
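
The calculation on this slide can be reproduced with a short Python sketch. Like the multiplication above, it assumes the tests are independent, which is a simplification. `familywise_error_rate` is a hypothetical helper name.

```python
from math import comb

def familywise_error_rate(alpha, n_tests):
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

n_tests = comb(4, 2)   # pairwise comparisons among 4 conditions

uncorrected = familywise_error_rate(0.05, n_tests)
bonferroni = familywise_error_rate(0.05 / n_tests, n_tests)

print(f"{n_tests} tests at alpha = 0.05 each: {uncorrected:.3f}")
print(f"{n_tests} tests at alpha = 0.05/{n_tests} each: {bonferroni:.3f}")
```

Testing each comparison at `$\alpha / 6$` brings the overall rate back to just under `$0.05$`.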

---

# RM ANOVA Summary - Sauna Example

A .orangered_style[repeated measures ANOVA] was conducted to determine whether .seagreen_style[sauna bathing] had an impact on individuals' .seagreen_style[psychological calm]. The sample size was `$n=10$`. The mean psychological calm S-MARE scores (0-100) across the 4 statuses were different, with:

* .seagreen_style[pre] `$(M_{pre} = 48.594, SD_{pre} = 26.881)$`,

* .seagreen_style[post1] `$(M_{post1} = 60.990, SD_{post1} = 19.591)$`,

* .seagreen_style[post2] `$(M_{post2} = 73.573, SD_{post2} = 15.130)$`, and

* .seagreen_style[post3] `$(M_{post3} = 73.062, SD_{post3} = 20.319)$`

There was a .orangered_style[clinically significant] and .orangered_style[statistically significant] effect of sauna status on psychological calm, at the `$\alpha = 0.05$` level of significance, with a large effect size of `$\eta^2 = 0.211$`, and `$F(3, 27) = 3.846$`, `$p = 0.021 < 0.05$`.

However, Tukey and Bonferroni Correction post-hoc tests indicated that none of the pairwise comparisons were statistically significant (all `$p$`-values `$> 0.05$`). This may be due to the two-tailed nature of these tests, together with the small sample size and the power lost when adjusting for multiple comparisons.

---

# Repeated Measures ANOVA assumptions

When we conduct a repeated measures ANOVA, we make several assumptions.

To ensure our analysis procedure and conclusion are valid, we should always check these assumptions!

### Key Assumptions

1. The response should be a continuous .orangered_style[numeric] variable

2. .orangered_style[Observations are independent] between individuals (e.g. we have not studied twins, triplets etc); the repeated measurements within an individual are expected to be related
  
--

3. .orangered_style[Sphericity] (already covered)

4. The response variable should follow a .orangered_style[normal distribution], within each level of our categorical/factor variable (e.g. pre-sauna S-MARE scores should be normally distributed, etc)
  
    * This is equivalent to the **residuals being normally distributed**, just like the one-way ANOVA case

---

# Checking RM ANOVA assumptions

We have previously covered how to check the Sphericity assumption.

We will focus here on checking the normality of residuals assumption (4). The checks should look familiar:

### How to check

* <del>.orangered_style[Histogram] with normal/density curve overlaid </del>
 
--

* .orangered_style[Normal Q-Q Plot] (only option currently available in jamovi)
  
--

* <del>Formal statistical test: .orangered_style[Shapiro-Wilk test] and/or .orangered_style[Kolmogorov-Smirnov test]</del>

---

# RM ANOVA - Sauna Residuals Normality Check

.left-column[
  * The Q-Q plot suggests some non-normality (note deviation from line), although some fluctuations are expected for small sample sizes
]

.right-column[
<img src="data:image/png;base64,#sauna_qqplot.png" width="500px" style="display: block; margin: auto;" />

]

---

# Residuals Note

We could also take a look at the density curves for the original scores.

* Note for example the clear non-normality for the `post3` responses:

---

# Next Steps when Assumptions Fail

Fortunately, if one or more of the repeated measures ANOVA assumptions have failed, we have several options available:

* If .orangered_style[Mauchly's Sphericity Test] shows our sphericity assumption has failed, we can still use the .orangered_style[Greenhouse-Geisser] or .orangered_style[Huynh-Feldt *F*-test] versions of the repeated measures ANOVA
  
--

* Use the .orangered_style[Friedman test]. This test:
  
--

    * Does not assume the residuals follow a normal distribution
    
--

    * Is robust to skewed distributions, but not always as powerful as the `$F$`-tests

---

# Friedman Test

The .orangered_style[Friedman Test] is a non-parametric alternative to the one-way repeated measures ANOVA, which we can use when our normality of residuals assumption is violated.

It ranks each individual's (block's) observations across the different groups, and tests for clear differences in ranks across groups.

* We can think of it as being an extension of the .orangered_style[Wilcoxon Signed-Rank Test] (which is for 2 dependent groups)
  
--

We have:

`$$H_0: \text{ Pop. distributions are equal vs } H_1: \text{ Pop. distributions are not all equal}$$`
--

* *This is one of several options (e.g. Dunn's Test is also a good choice), but only Friedman's Test is available in jamovi*
  
--

For post-hoc pairwise comparisons, we can use tests like the .orangered_style[Durbin-Conover test].
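
The ranking procedure behind the Friedman Test can be sketched in Python as follows. This is a simplified illustration with hypothetical helper names (`rank_row`, `friedman_statistic`); it uses the basic chi-square form of the statistic without a tie correction, so software output may differ slightly when ties are present.

```python
def rank_row(row):
    """Rank one individual's k observations (1 = smallest), averaging tied ranks."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with the current one
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = average_rank
        pos = end + 1
    return ranks

def friedman_statistic(data):
    """Friedman chi-square statistic (no tie correction) and its degrees of freedom."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for j, r in enumerate(rank_row(row)):
            rank_sums[j] += r
    chi_sq = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    return chi_sq, k - 1

# Hypothetical scores: every individual shows the same ordering across 3 conditions
toy_scores = [[10, 20, 30], [12, 15, 40], [5, 6, 7]]
chi_sq, df = friedman_statistic(toy_scores)
print(round(chi_sq, 3), df)
```

When every individual ranks the conditions in the same order, as in this toy example, the statistic reaches its maximum of `$n(k-1)$`.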

---

# Friedman Test - Sauna Example

*Note that the pairwise comparison `$p$`-values are unadjusted here, so should be viewed with caution.*

---

# Friedman Test Summary - Sauna Example

A .orangered_style[Friedman Test] was conducted to compare the distributions of S-MARE scores (0-100) for a question on psychological calm, given by individuals across 4 sauna bathing statuses.

The differences in distributions were .orangered_style[statistically non-significant] across sauna statuses at the `$\alpha = 0.05$` level of significance, with `$\chi^2 = 6.5$`, `$df = 3$`, `$n=10$`, `$p= 0.09 > 0.05$` (two-tailed).

*Note if the initial test is not statistically significant, we would not conduct the post-hoc tests - the image on the previous page was included for discussion purposes.*
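
As a quick check of the reported `$p$`-value, the upper-tail probability of a chi-square distribution with 3 degrees of freedom has a closed form, which the following Python sketch evaluates. The function name `chi2_sf_df3` is hypothetical and the formula applies only to `$df = 3$`.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

p_value = chi2_sf_df3(6.5)   # test statistic reported for the sauna example
print(round(p_value, 2))
```

The result rounds to 0.09, in line with the value reported above.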

---

# Summary

A .orangered_style[repeated measures ANOVA] can be used to compare means of **two or more dependent groups**.

* We can use the `$\eta^2$` effect size to check for .orangered_style[clinical significance]
  
--

* We need to check the assumptions of our repeated measures ANOVA

* If the Sphericity assumption fails, we can use the .orangered_style[Greenhouse-Geisser] or .orangered_style[Huynh-Feldt] versions of the `$F$` test statistic

* If the normality of residuals assumption fails, we can use the non-parametric .orangered_style[Friedman Test]
  
--

* If our initial results are statistically significant, we need to perform .orangered_style[post-hoc tests] to determine which specific conditions/time points is/are different

---

# ANOVA Extensions

The repeated measures ANOVA we have discussed is technically a one-way repeated measures ANOVA (we considered one categorical variable).

While beyond the scope of this subject, several extensions to ANOVA exist, e.g.:

* .orangered_style[Mixed ANOVA]: Contains within subjects and between subjects variables (e.g. sauna status and gender)
  
--

* .orangered_style[Two-way ANOVA]: Two categorical variables rather than one (e.g. sauna status and time of day - morning vs evening)
  
--

* .orangered_style[Interaction Effects]: With two categorical variables, the effect of one variable may depend on the level of the other variable
  
--

* .orangered_style[ANCOVA]: Include numerical and categorical predictors to help model a numerical dependent variable
  
--

* .orangered_style[Linear Mixed Effects Models]: A more advanced modelling technique you may encounter in future studies

*Note that we will cover regression/linear modelling in a future topic.*

---

# End

That concludes our lecture on repeated measures ANOVA.

### What to do next:

* .seagreen_style[Quick Kahoot revision quiz]: Please go to [kahoot.it](https://kahoot.it) and type in the code shown

* Make sure to attend this week's DA computer lab

* If you have any questions, check the LMS, email us or ask in the computer labs

### Optional Further Reading

* Parts from Kokoska (2020) Chapters 11, 14
  
---

# References

* Chang, M., Ibaraki, T., Naruse, Y., & Imamura, Y. (2023). A study on neural changes induced by sauna bathing: Neural basis of the “totonou” state. *PLoS ONE*, 18(11), e0294137. [https://doi.org/10.1371/journal.pone.0294137](https://doi.org/10.1371/journal.pone.0294137)

* Cohen, J. (1988). *Statistical Power Analysis for the Behavioral Sciences*. 2nd edition. New York: Academic Press.

* Girden, E. R. (1992). ANOVA: repeated measures. Sage Publications.

* Kokoska, S. (2020). *Introductory statistics: a problem-solving approach* (3rd ed.). W. H. Freeman.

* The jamovi project. (2022). *Jamovi [Computer Software]*. [https://www.jamovi.org](https://www.jamovi.org).

---
class: middle

These notes have been prepared by Rupert Kuveke, Amanda Shaker, and other members of the Department of Mathematical and Physical Sciences. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with the Department of Environment and Genetics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License 
<a href = "https://creativecommons.org/licenses/by-nc-nd/4.0/" target="_blank"> BY-NC-ND. </a>