STM1001 Topic 9B Lecture

class: middle
background-image: url(data:image/png;base64,#LTU_logo.jpg)
background-position: top left
background-size: 30%

# STM1001 Topic 9B Lecture
## Repeated Measures Analysis
### La Trobe University

---

# Topic 9B: Repeated Measures Analysis

### In this topic, we will discuss Repeated Measures Analysis, and in particular, Repeated Measures ANOVA.

---

# Repeated Measures Analysis

* In [Lecture 7](https://rpubs.com/a_shaker/L7) and the associated 
[Computer Lab](https://rpubs.com/LTU_STM1001/CL8_j), we learnt about one-way ANOVA

* You may recall that one-way ANOVA allows us to test for a difference in means between ***two or more independent groups***

* In that sense, we can think of one-way ANOVA as an extension of the independent samples `$t$`-test, which tests for differences in means between ***two independent groups***

* In a similar way, Repeated Measures ANOVA can be thought of as an extension of the paired `$t$`-test

* While the paired `$t$`-test tests for mean differences between ***two dependent groups***, Repeated Measures ANOVA allows us to test for differences in means between ***two or more dependent groups***.

---

# What is a repeated measures data set?

.content-box-blue[
***Repeated measures data sets*** refer to data sets for which one or more of the variables of interest have been measured on more than one occasion for each individual.
]

* So, where we have two measurements per individual, we can use the paired `$t$`-test

* Where we have two or more measurements per individual, we can use ***Repeated Measures ANOVA***

---
# Repeated Measures ANOVA Example

* We will consider an example from the `datarium` R package (Kassambara, 2019)

* In particular, we will be using a data set called `selfesteem`

* This data set records the self-esteem score of ten individuals three times each, i.e., at three different time points

* Shortly, we will be using repeated measures ANOVA to see whether there was a significant change in the average self-esteem scores over time

---
# Repeated Measures ANOVA Example

| id | t1 | t2 | t3 |
|---:|---:|---:|---:|
| 1 | 4.005 | 5.1823 | 7.1078 |
| 2 | 2.5581 | 6.9129 | 6.3084 |
| 3 | 3.2442 | 4.4434 | 9.7784 |
| 4 | 3.4195 | 4.7117 | 8.3471 |
| 5 | 2.8712 | 3.9084 | 6.4573 |
| 6 | 2.0459 | 5.3405 | 6.6532 |
| 7 | 3.526 | 5.5807 | 6.8402 |
| 8 | 3.1794 | 4.3702 | 7.8186 |
| 9 | 3.508 | 4.3998 | 8.4712 |
| 10 | 3.0438 | 4.4894 | 8.5811 |

---
# Repeated Measures ANOVA Example

As we can see, the data set contains the following variables:

* `id` : the ID of the individual (ranges from 1 to 10)
* `t1` : self-esteem score at time-point 1
* `t2` : self-esteem score at time-point 2
* `t3` : self-esteem score at time-point 3

* So, there is one row for each individual in the data set

* This means the data set is currently in "**wide** format"
--
 (as opposed to "**long** format", which has one row per time-point and multiple rows for each individual). 
 
--

* We will begin by visualising the data via boxplots

* In this week's computer lab, you will also produce other descriptive statistics and plots to summarise the data
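
The wide-to-long conversion and the boxplots described above can be sketched in base R as follows. This is only a minimal illustration: the scores are typed in from the table on the earlier slide rather than loaded from `datarium`, and the object names `selfesteem_wide` and `selfesteem_long` are hypothetical choices, not names used in the computer lab.

```r
# Scores copied from the table on the earlier slide (wide format:
# one row per individual, one column per time-point)
selfesteem_wide <- data.frame(
  id = 1:10,
  t1 = c(4.005, 2.5581, 3.2442, 3.4195, 2.8712, 2.0459, 3.526, 3.1794, 3.508, 3.0438),
  t2 = c(5.1823, 6.9129, 4.4434, 4.7117, 3.9084, 5.3405, 5.5807, 4.3702, 4.3998, 4.4894),
  t3 = c(7.1078, 6.3084, 9.7784, 8.3471, 6.4573, 6.6532, 6.8402, 7.8186, 8.4712, 8.5811)
)

# Long format: stack the three time-point columns so that there is
# one row per individual per time-point
selfesteem_long <- data.frame(
  id    = factor(rep(selfesteem_wide$id, times = 3)),
  time  = factor(rep(c("t1", "t2", "t3"), each = nrow(selfesteem_wide))),
  score = c(selfesteem_wide$t1, selfesteem_wide$t2, selfesteem_wide$t3)
)

dim(selfesteem_wide)  # 10 rows (one per individual), 4 columns
dim(selfesteem_long)  # 30 rows (one per individual per time-point), 3 columns

# Side-by-side boxplots of self-esteem score at each time-point
boxplot(score ~ time, data = selfesteem_long,
        xlab = "Time-point", ylab = "Self-esteem score")
```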

---
# Repeated Measures ANOVA Example

* There does appear to be an association between self-esteem score and time. We will carry out a hypothesis test to ascertain whether or not this association is significant

---
# Repeated Measures ANOVA Example

The hypotheses for a repeated measures ANOVA can be set up as follows:

$$ H_0 : \mu_1 = \mu_2 = \ldots = \mu_k \text{     versus     } H_1 : \text{not all } \mu_i \text{'s are equal,}$$
where:

* For some number of `$k$` time-points (or conditions), `$\mu_1, \mu_2, \ldots, \mu_k$` denote the true population means for time-point 1, time-point 2, ..., and time-point `$k$`, respectively.

---
# Repeated Measures ANOVA Example

In our example, we have:

$$ H_0 : \mu_1 = \mu_2 = \mu_3 \text{     versus     } H_1 : \text{not all } \mu_i \text{'s are equal,}$$
where:

* `$\mu_1, \mu_2,$` and `$\mu_3$` denote the population mean self-esteem score for time-points 1, 2, and 3 respectively.

We also have that:

1. The **dependent** (response) variable is **score**
1. The **independent** (explanatory) variable is **time**
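
The population means `$\mu_1, \mu_2,$` and `$\mu_3$` are unknown, but each can be estimated by the corresponding sample mean. As a small sketch, assuming the scores are typed in from the table shown earlier (the name `selfesteem_wide` is a hypothetical choice):

```r
# Scores copied from the table on the earlier slide (wide format)
selfesteem_wide <- data.frame(
  id = 1:10,
  t1 = c(4.005, 2.5581, 3.2442, 3.4195, 2.8712, 2.0459, 3.526, 3.1794, 3.508, 3.0438),
  t2 = c(5.1823, 6.9129, 4.4434, 4.7117, 3.9084, 5.3405, 5.5807, 4.3702, 4.3998, 4.4894),
  t3 = c(7.1078, 6.3084, 9.7784, 8.3471, 6.4573, 6.6532, 6.8402, 7.8186, 8.4712, 8.5811)
)

# Sample mean self-esteem score at each time-point;
# these estimate mu_1, mu_2 and mu_3 respectively
sample_means <- colMeans(selfesteem_wide[, c("t1", "t2", "t3")])
round(sample_means, 2)  # roughly 3.14, 4.93 and 7.64
```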

---

name: menti
class: middle
background-image: url(data:image/png;base64,#menti.jpg)
background-size: 115%

# Kahoot!

## Go to [kahoot.it](https://kahoot.it/) and use

## the code provided

---
# Repeated Measures ANOVA Output

```r
Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9   4.57  0.5078

Error: Within
                Df Sum Sq Mean Sq F value   Pr(>F)    
as.factor(time)  2 102.46   51.23   55.47 2.01e-08 ***
Residuals       18  16.62    0.92                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
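
Output of this form can be produced with `aov()` from base R. The call used to generate this slide is not shown, so the formula below is an assumption inferred from the `as.factor(time)` term and the `Error: id` stratum in the output. The scores are typed in from the table shown earlier, and `selfesteem_long` and `rm_fit` are hypothetical names.

```r
# Long-format data: one row per individual per time-point
# (scores copied from the table on the earlier slide)
selfesteem_long <- data.frame(
  id    = factor(rep(1:10, times = 3)),
  time  = rep(1:3, each = 10),
  score = c(
    4.005, 2.5581, 3.2442, 3.4195, 2.8712, 2.0459, 3.526, 3.1794, 3.508, 3.0438,
    5.1823, 6.9129, 4.4434, 4.7117, 3.9084, 5.3405, 5.5807, 4.3702, 4.3998, 4.4894,
    7.1078, 6.3084, 9.7784, 8.3471, 6.4573, 6.6532, 6.8402, 7.8186, 8.4712, 8.5811
  )
)

# Repeated measures ANOVA: time is the within-subject factor, and
# Error(id) tells aov() that measurements are grouped by individual
# (assumed formula; id must be a factor for the 9 residual df shown above)
rm_fit <- aov(score ~ as.factor(time) + Error(id), data = selfesteem_long)
summary(rm_fit)
```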

---
# Repeated Measures ANOVA Output

```r
Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9   4.57  0.5078

Error: Within
                Df Sum Sq Mean Sq F value   `Pr(>F)`    
as.factor(time)  2 102.46   51.23   55.47 `2.01e-08` ***
Residuals       18  16.62    0.92                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

* The **`$p$`-value** is close to 0, i.e. `$p < .001$`

---
# Repeated Measures ANOVA Output

```r
Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9   4.57  0.5078

Error: Within
                Df Sum Sq Mean Sq `F value`   `Pr(>F)`    
as.factor(time)  2 102.46   51.23   `55.47` `2.01e-08` ***
Residuals       18  16.62    0.92                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

* The **`$p$`-value** is close to 0, i.e. `$p < .001$`
* The test statistic is `$F = 55.47$`

---
# Repeated Measures ANOVA Output

```r
Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9   4.57  0.5078

Error: Within
                 `Df` Sum Sq Mean Sq `F value`   `Pr(>F)`    
 `as.factor(time)` `2` 102.46   51.23   `55.47` `2.01e-08` ***
 Residuals       18  16.62    0.92                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

* The **`$p$`-value** is close to 0, i.e. `$p < .001$`
* The test statistic is `$F = 55.47$`
* The first degrees of freedom, `$d_1$`, is 2.  We can also calculate this as `$d_1 = r - 1$`, where `$r$` is the number of time-points (or conditions)

---
# Repeated Measures ANOVA Output

```r
Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9   4.57  0.5078

Error: Within
                 `Df` Sum Sq Mean Sq `F value`   `Pr(>F)`    
 `as.factor(time)` `2` 102.46   51.23   `55.47` `2.01e-08` ***
 `Residuals`       `18`  16.62    0.92                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

* The **`$p$`-value** is close to 0, i.e. `$p < .001$`
* The test statistic is `$F = 55.47$`
* The first degrees of freedom, `$d_1$`, is 2.  We can also calculate this as `$d_1 = r - 1$`, where `$r$` is the number of time-points (or conditions)
* The second degrees of freedom, `$d_2$`, is 18. We can also calculate this as `$d_2 = (n - 1)(r - 1)$`, where `$n$` is equal to the number of individuals
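
The two degrees-of-freedom formulas, and the link between the `$F$` statistic and the `$p$`-value, can be checked with a few lines of R. This is a sketch using the values reported in the output; the variable names are hypothetical.

```r
n <- 10  # number of individuals
r <- 3   # number of time-points (or conditions)

d1 <- r - 1              # first degrees of freedom
d2 <- (n - 1) * (r - 1)  # second degrees of freedom
c(d1 = d1, d2 = d2)      # 2 and 18, matching the Df column

# The p-value is the upper-tail area of the F(d1, d2) distribution
# beyond the observed test statistic
f_stat  <- 55.47
p_value <- pf(f_stat, df1 = d1, df2 = d2, lower.tail = FALSE)
p_value < 0.001          # TRUE, consistent with p < .001
```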

---
# Repeated Measures ANOVA Output

```r
Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9   4.57  0.5078
```

We can conclude with the following summary:

*There **was** a significant difference in mean self-esteem score `$[F(2, 18) = 55.47, p < .001]$` across time.*

---
# Repeated Measures ANOVA Example

* Since we have found evidence of a significant difference across time, we can also carry out post-hoc tests to see which pairs of time-points have significant differences

* We will cover post-hoc tests following Repeated Measures ANOVA in this topic's computer lab
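
As a preview, one common post-hoc approach is to run a paired `$t$`-test for each pair of time-points and adjust the `$p$`-values for multiple comparisons. The sketch below uses `pairwise.t.test()` from base R with a Bonferroni adjustment. This choice is an assumption for illustration and may differ from the method used in the computer lab. The scores are typed in from the table shown earlier, and the object names are hypothetical.

```r
# Long-format data, ordered by individual within each time-point so that
# the paired tests match up measurements from the same individual
selfesteem_long <- data.frame(
  id    = factor(rep(1:10, times = 3)),
  time  = factor(rep(c("t1", "t2", "t3"), each = 10)),
  score = c(
    4.005, 2.5581, 3.2442, 3.4195, 2.8712, 2.0459, 3.526, 3.1794, 3.508, 3.0438,
    5.1823, 6.9129, 4.4434, 4.7117, 3.9084, 5.3405, 5.5807, 4.3702, 4.3998, 4.4894,
    7.1078, 6.3084, 9.7784, 8.3471, 6.4573, 6.6532, 6.8402, 7.8186, 8.4712, 8.5811
  )
)

# Paired t-tests for t1 vs t2, t1 vs t3 and t2 vs t3,
# with Bonferroni-adjusted p-values
posthoc <- pairwise.t.test(
  x = selfesteem_long$score,
  g = selfesteem_long$time,
  paired = TRUE,
  p.adjust.method = "bonferroni"
)
posthoc$p.value  # matrix of adjusted p-values, one per pair of time-points
```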

---

# References

Kassambara, A. (2019). _datarium: Data Bank for Statistical Analysis
and Visualization_. R package version 0.1.0. URL:
[https://CRAN.R-project.org/package=datarium](https://CRAN.R-project.org/package=datarium).

---

background-image: url(data:image/png;base64,#computerlab.jpg)
background-position: bottom
background-size: 75%
class: center

# See you in the computer labs!

---
class: middle

<font color = "grey">
These notes have been prepared by Amanda Shaker. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License 
<a href = "https://creativecommons.org/licenses/by-nc-nd/4.0/" target="_blank"> BY-NC-ND. </a>
</font>