Introduction
The selfesteem data from the datarium package contains the self-esteem scores of 10 individuals, measured at three time points during a specific diet, to determine whether their self-esteem improved. The goal of this study is to determine whether there is a statistically significant difference between any pair of populations (time points) and, if so, to identify those pairs in follow-up tests. Null hypothesis: the mean self-esteem score is the same at every time point.
RM Anova in R
Key R functions:
anova_test() [rstatix package], a wrapper around car::Anova() that simplifies the computation of repeated measures ANOVA. Key arguments for performing repeated measures ANOVA:
- data: data frame
- dv: (numeric) the dependent (or outcome) variable name.
- wid: variable name specifying the case/sample identifier.
- within: within-subjects factor or grouping variable

get_anova_table() [rstatix package]. Extracts the ANOVA table from the output of anova_test(). It returns an ANOVA table that is automatically corrected for any deviation from the sphericity assumption. By default, the Greenhouse-Geisser sphericity correction is applied only to within-subject factors that violate the sphericity assumption (i.e., Mauchly’s test p-value is significant, p <= 0.05). Read more in Chapter (mauchly-s-test-of-sphericity-in-r).
1-way RM Anova
Head of the dataset:

| id | t1 | t2 | t3 |
|---|---|---|---|
| 1 | 4.005027 | 5.182286 | 7.107831 |
| 2 | 2.558124 | 6.912915 | 6.308434 |
| 3 | 3.244241 | 4.443434 | 9.778410 |
The one-way repeated measures ANOVA can be used to determine whether the mean self-esteem scores are significantly different between the three time points. So let’s convert this data frame into long format:
library(tidyverse)
library(rstatix)
data("selfesteem", package = "datarium")
selfesteem <- selfesteem %>%
  gather(key = "time", value = "score", t1, t2, t3) %>%
  convert_as_factor(id, time)

Descriptive statistics
Summary statistics for each time point during the diet:

| time | variable | n | mean | sd |
|---|---|---|---|---|
| t1 | score | 10 | 3.140 | 0.552 |
| t2 | score | 10 | 4.934 | 0.863 |
| t3 | score | 10 | 7.636 | 1.143 |
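A grouped summary like the one above can be sketched in base R. The example below uses only the three subjects printed in the head of the dataset, so its numbers will not match the 10-subject table; `head_rows` and `summary_tab` are names introduced here for illustration.

```r
# First three subjects, copied from the head of the dataset above
head_rows <- data.frame(
  id = factor(1:3),
  t1 = c(4.005027, 2.558124, 3.244241),
  t2 = c(5.182286, 6.912915, 4.443434),
  t3 = c(7.107831, 6.308434, 9.778410)
)

# Per-time-point sample size, mean and standard deviation
scores <- head_rows[c("t1", "t2", "t3")]
summary_tab <- data.frame(
  time = names(scores),
  n    = sapply(scores, length),
  mean = sapply(scores, mean),
  sd   = sapply(scores, sd),
  row.names = NULL
)
print(summary_tab)
```

On the full long-format data, a table with the same columns would probably come from grouping by time and calling get_summary_stats(score, type = "mean_sd") from rstatix, though the original call is not shown in this report.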
The distribution of all self-esteem scores is visibly right-skewed, which suggests that the mean is greater than the median.
The distribution of scores for each group shows that the mean score increases from one time point to the next.
All together:
Assumptions
The repeated measures ANOVA makes the following assumptions about the data:
- No significant outliers in any cell of the design. This can be checked by visualizing the data using box plot methods and by using the function identify_outliers() [rstatix package].
- Normality: the outcome (or dependent) variable should be approximately normally distributed in each cell of the design. This can be checked using the Shapiro-Wilk normality test (shapiro_test() [rstatix]) or by visual inspection using a QQ plot (ggqqplot() [ggpubr package]).
- Assumption of sphericity: the variance of the differences between groups should be equal. This can be checked using Mauchly’s test of sphericity, which is automatically reported when using the R function anova_test() [rstatix package].
Note that, if the above assumptions are not met, there is a non-parametric alternative (the Friedman test) to the one-way repeated measures ANOVA.
Unfortunately, there are no non-parametric alternatives to the two-way and the three-way repeated measures ANOVA. Thus, in the situation where the assumptions are not met, you could consider running the two-way/three-way repeated measures ANOVA on the transformed and non-transformed data to see if there are any meaningful differences.
If both tests lead you to the same conclusions, you might choose not to transform the outcome variable and carry on with the two-way/three-way repeated measures ANOVA on the original data.
It’s also possible to perform robust ANOVA test using the WRS2 R package.
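The Friedman test mentioned above is available in base R as friedman.test(). The sketch below only illustrates the call: it uses the three subjects shown in the head of the dataset rather than all 10, so the result says nothing about the full study.

```r
# First three subjects from the head of the dataset
# (rows = subjects, columns = time points)
scores <- matrix(
  c(4.005027, 5.182286, 7.107831,
    2.558124, 6.912915, 6.308434,
    3.244241, 4.443434, 9.778410),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3"), c("t1", "t2", "t3"))
)

# friedman.test() treats each row as a block (subject)
# and each column as a group (time point)
res <- friedman.test(scores)
print(res)
```

The test ranks the scores within each subject, so it relies neither on normality nor on the absence of outliers.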
No outliers
| time | id | score | is.outlier | is.extreme |
|---|---|---|---|---|
| t1 | 6 | 2.045868 | TRUE | FALSE |
| t2 | 2 | 6.912915 | TRUE | FALSE |
Two observations are flagged as outliers, but neither is extreme, so the assumption is considered met.
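The distinction between an outlier and an extreme point follows the boxplot rule: a value is an outlier when it lies more than 1.5 × IQR beyond the quartiles, and extreme when it lies more than 3 × IQR beyond them. A minimal base R sketch of that rule, with a hypothetical helper `flag_outliers()` and made-up scores:

```r
# Flag values outside the boxplot fences.
# coef = 1.5 marks an outlier, coef = 3 marks an extreme point.
flag_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  outside <- function(coef) x < q[1] - coef * iqr | x > q[2] + coef * iqr
  data.frame(value = x, is.outlier = outside(1.5), is.extreme = outside(3))
}

# Hypothetical scores: 8 lies above the 1.5 * IQR fence (7) but below
# the 3 * IQR fence (10), so it is an outlier that is not extreme
out <- flag_outliers(c(1, 2, 3, 4, 8))
print(out)
```

In the analysis itself the rule is applied separately within each time point, which is why the table above reports one flagged score for t1 and one for t2.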
Normality
From the plot above, as all the points fall approximately along the reference lines, we can assume normality.
| time | variable | statistic | p |
|---|---|---|---|
| t1 | score | 0.9666901 | 0.8585757 |
| t2 | score | 0.8758846 | 0.1169956 |
| t3 | score | 0.9227150 | 0.3801563 |
According to the results of the Shapiro-Wilk test, the p-value is greater than 0.05 at each time point. Normality is therefore not rejected, and the self-esteem scores can be treated as approximately normally distributed at each time point.
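The per-time-point test can be sketched in base R with shapiro.test() and tapply(). The example shows the mechanics only: it uses the three subjects printed in the head of the dataset, and three observations per group is the bare minimum that shapiro.test() accepts, so these p-values carry little information.

```r
# First three subjects from the head of the dataset, in long format
long <- data.frame(
  time  = rep(c("t1", "t2", "t3"), each = 3),
  score = c(4.005027, 2.558124, 3.244241,   # t1
            5.182286, 6.912915, 4.443434,   # t2
            7.107831, 6.308434, 9.778410)   # t3
)

# One Shapiro-Wilk p-value per time point
p_values <- tapply(long$score, long$time,
                   function(x) shapiro.test(x)$p.value)
print(round(p_values, 3))
```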
ANOVA
results <- anova_test(data = selfesteem, dv = score, wid = id, within = time)
tab_aov <- get_anova_table(results)

| Effect | DFn | DFd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|
| time | 2 | 18 | 55.469 | 2.01e-08 | * | 0.829 |
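For comparison, the same kind of model can be fitted with base R's aov() and an Error() term. This is a sketch on the three subjects from the head of the dataset, not a reproduction of the table above. It also shows where the degrees of freedom come from: with k time points and n subjects, DFn = k - 1 and DFd = (k - 1)(n - 1), which gives 2 and 18 for the full data (k = 3, n = 10).

```r
# First three subjects from the head of the dataset, in long format
long <- data.frame(
  id    = factor(rep(1:3, times = 3)),
  time  = factor(rep(c("t1", "t2", "t3"), each = 3)),
  score = c(4.005027, 2.558124, 3.244241,   # t1
            5.182286, 6.912915, 4.443434,   # t2
            7.107831, 6.308434, 9.778410)   # t3
)

# Error(id / time) tells aov() that time varies within each subject
fit <- aov(score ~ time + Error(id / time), data = long)
print(summary(fit))

# Degrees of freedom for the time effect in this 3-subject subset
k <- nlevels(long$time)
n <- nlevels(long$id)
dfs <- c(DFn = k - 1, DFd = (k - 1) * (n - 1))
print(dfs)
```

Unlike get_anova_table(), this plain aov() fit applies no sphericity correction.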
Post-hoc tests
To test which pairs of means differ, post-hoc tests are conducted.

| .y. | group1 | group2 | n1 | n2 | statistic | df | p | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|---|
| score | t1 | t2 | 10 | 10 | -4.967618 | 9 | 7.72e-04 | 2e-03 | ** |
| score | t1 | t3 | 10 | 10 | -13.228148 | 9 | 3.00e-07 | 1e-06 | **** |
| score | t2 | t3 | 10 | 10 | -4.867816 | 9 | 8.86e-04 | 2e-03 | ** |
All the pairwise differences are statistically significant.
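The report does not say how the p.adj column was computed. The values are consistent with Holm's step-down method, which appears to be the default adjustment in rstatix's pairwise_t_test(); treat that as an assumption. The check below applies base R's p.adjust() to the unadjusted p-values copied from the table.

```r
# Unadjusted p-values copied from the post-hoc table above
p_raw <- c("t1-t2" = 7.72e-04, "t1-t3" = 3.00e-07, "t2-t3" = 8.86e-04)

# Holm step-down adjustment for the three comparisons
p_holm <- p.adjust(p_raw, method = "holm")
print(signif(p_holm, 3))
```

Holm multiplies the smallest p-value by 3 and the next by 2, and then never lets an adjusted value fall below the previous one, which is why t1-t2 and t2-t3 end up with the same adjusted value.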
Conclusions
The plot below summarises the analysis across the three time points.
Fisher’s one-way repeated measures ANOVA shows that, across the self-esteem scores of 10 individuals measured three times during a diet, there was a statistically significant difference between at least one pair of means (F(2, 18) = 55.47, p < 0.001).
Effect size
library(effectsize)
interpret_omega_squared(c(0.82), rules = "field2013")
## [1] "large"
## (Rules: field2013)
Omega squared (ω²) is a measure of effect size. Its values lie in the range (-1, 1); under the field2013 rules, a value of 0.14 or more is interpreted as large, so 0.82 indicates a large effect. There is a
2-way RM Anova
For Two-Way Repeated Measures ANOVA, “Two-way” means that there are two factors in the experiment, for example, different treatments and different conditions. “Repeated-measures” means that the same subject received more than one treatment and/or more than one condition. Similar to two-way ANOVA, two-way repeated measures ANOVA can be employed to test for significant differences between the factor level means within a factor and for interactions between factors.
Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures, and the data violate the ANOVA assumption of independence. Two-Way Repeated Measures ANOVA designs can have two repeated measures factors, or one repeated measures factor and one non-repeated factor. If any repeated factor is present, then the repeated measures ANOVA should be used.
Please apply Two-way RM-ANOVA to analyze whether there are any significant interactions (between time and music, time and image, music and image, or music and time and image)! Use the following data set:

| | PID | stress | image | music |
|---|---|---|---|---|
| 1 | 1 | 90 | Happy | Horror |
| 61 | 1 | 7 | Angry | Disney |
| 121 | 1 | 31 | Happy | Disney |
| 181 | 1 | 68 | Angry | Disney |
| 241 | 1 | 6 | Happy | Disney |
| 301 | 1 | 80 | Angry | Horror |
Extracting Condition Means
Before we can run our ANOVA, we need to find the mean stress value for each participant for each combination of conditions.

| | PID | music | image | stress |
|---|---|---|---|---|
| 1 | 1 | Disney | Angry | 26.57143 |
| 61 | 1 | Horror | Angry | 46.60000 |
| 121 | 1 | Disney | Happy | 43.85714 |
| 181 | 1 | Horror | Happy | 90.00000 |
| 2 | 2 | Disney | Angry | 26.16667 |
| 62 | 2 | Horror | Angry | 64.60000 |
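The code for this aggregation step is not shown in the report. One way to do it in base R is aggregate() with a formula; the sketch below uses hypothetical trial-level data, and `trials` and `cell_means` are illustrative names rather than the objects used in the analysis.

```r
# Hypothetical trial-level data: two trials per participant and condition
trials <- data.frame(
  PID    = rep(1:2, each = 8),
  music  = rep(rep(c("Disney", "Horror"), each = 4), times = 2),
  image  = rep(rep(c("Angry", "Happy"), each = 2), times = 4),
  stress = c(20, 30, 40, 50, 60, 70, 80, 90,
             10, 20, 30, 40, 50, 60, 70, 80)
)

# One mean stress value per participant and music x image combination
cell_means <- aggregate(stress ~ PID + music + image,
                        data = trials, FUN = mean)
print(cell_means)
```

Each participant ends up with one row per music and image combination, which is the shape the repeated measures ANOVA expects.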
| music | image | variable | statistic | p |
|---|---|---|---|---|
| Disney | Angry | stress | 0.9770062 | 0.3153384 |
| Horror | Angry | stress | 0.9845936 | 0.6489055 |
| Disney | Happy | stress | 0.9643932 | 0.0773124 |
| Horror | Happy | stress | 0.9877208 | 0.8158102 |
The plots suggest that the data are approximately normally distributed.
The Shapiro-Wilk tests agree: the p-value for each combination is greater than 0.05, so normality is not rejected. Let’s continue the study to see the results.
ANOVA
##
## Error: PID
## Df Sum Sq Mean Sq F value Pr(>F)
## music 1 21 21.37 0.129 0.72
## Residuals 58 9586 165.27
##
## Error: PID:music
## Df Sum Sq Mean Sq F value Pr(>F)
## music 1 61 61.01 0.341 0.562
## image 1 6 6.02 0.034 0.855
## Residuals 58 10388 179.10
##
## Error: PID:image
## Df Sum Sq Mean Sq F value Pr(>F)
## image 1 582 581.8 2.399 0.127
## music:image 1 406 405.9 1.674 0.201
## Residuals 58 14066 242.5
##
## Error: PID:music:image
## Df Sum Sq Mean Sq F value Pr(>F)
## music:image 1 767 766.8 3.863 0.0542 .
## Residuals 58 11513 198.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is no evidence of a main effect of either music or image: the p-values for both are well above 0.05 in every stratum. There is also no statistically significant interaction between the two independent variables, although its p-value of 0.054 is very close to the 0.05 significance level.
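The call that produced the output above is not shown. A common way to specify a two-way repeated measures model in base R is aov() with an Error() term that nests both within-subject factors inside the participant. The sketch below runs on simulated, balanced data, so its numbers are meaningless; only the model structure is of interest.

```r
set.seed(1)  # simulated data for illustration only

# Hypothetical balanced design: 6 participants x 2 music x 2 image levels
design <- expand.grid(
  PID   = factor(1:6),
  music = factor(c("Disney", "Horror")),
  image = factor(c("Angry", "Happy"))
)
design$stress <- rnorm(nrow(design), mean = 50, sd = 15)

# Both factors are within-subject, so each effect gets its own error stratum
fit <- aov(stress ~ music * image + Error(PID / (music * image)),
           data = design)
print(summary(fit))
```

With balanced data each effect appears in exactly one stratum. Effects that show up in several strata, as in the output above, usually point to an unbalanced design, which would fit the missing HorrorHappy observation noted in the data transformation step.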
Data transformation
The combination of image and music is converted into a single factor that plays the role of a ‘time point’ measurement. One new row is added to balance the number of observations across combinations. Its stress value is imputed with the median of that category.
myData.new <- mutate(myData.mean, time = as.factor(paste0(music, image)))[,c(1,4,5)]
table(myData.new$time)
##
## DisneyAngry DisneyHappy HorrorAngry HorrorHappy
## 60 60 60 59
temp <- myData.new %>% filter(time == 'HorrorHappy')
df <- data.frame(PID = 61, stress = median(temp$stress), time = factor("HorrorHappy"))
myData.new <- rbind(myData.new, df)

| | PID | stress | time |
|---|---|---|---|
| 1 | 1 | 26.57143 | DisneyAngry |
| 61 | 1 | 46.60000 | HorrorAngry |
| 121 | 1 | 43.85714 | DisneyHappy |
| 181 | 1 | 90.00000 | HorrorHappy |
Conclusions
Fisher’s one-way repeated measures ANOVA, run on the stress scores of 60 participants measured in four factor configurations, showed no statistically significant difference between any pair of means (p-value > 0.05).