3-way Anova
The three-way ANOVA is an extension of the two-way ANOVA for assessing whether there is an interaction effect between three independent categorical variables on a continuous outcome variable.
We’ll use the headache dataset [datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine.
We want to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.
Descriptive statistics
At first, let’s have a look on a headache dataset:
ids <- sample(1:nrow(headache), 8)
samp <- headache[ids, ]
kbl(samp) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))| id | gender | risk | treatment | pain_score |
|---|---|---|---|---|
| 31 | male | high | Z | 74.4 |
| 51 | female | low | Z | 65.4 |
| 14 | male | low | Z | 68.3 |
| 67 | female | high | Z | 75.0 |
| 42 | female | low | X | 78.1 |
| 50 | female | low | Z | 72.1 |
| 43 | female | low | Y | 70.1 |
| 70 | female | high | Z | 80.4 |
Then, we can check the summary of pain scores data:
summary(headache$pain_score)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 63.7 72.9 77.9 77.6 81.5 100.0
Now we can examine how scores are distributed depending on used treatment method, risk of migraine and the gender of participants of the research. Then, let’s have a closer look on the means and quantiles of scores within the risk groups.
To get more complete picture we will compute summary statistics subsequently for:
- each treatment group
| treatment | variable | n | mean | sd |
|---|---|---|---|---|
| X | pain_score | 24 | 80.5 | 8.57 |
| Y | pain_score | 24 | 76.3 | 7.31 |
| Z | pain_score | 24 | 76.2 | 5.88 |
- subgroups created from each possible combination of risk, gender and treatment
gender risk treatment variable n min max median iqr mean sd se ci male high X pain_score 6 86.3 100.0 93.4 6.40 92.7 5.12 2.09 5.37 male low X pain_score 6 70.8 81.2 75.9 5.10 76.1 3.85 1.57 4.04 female high X pain_score 6 68.4 82.7 81.1 2.29 78.9 5.32 2.17 5.58 female low X pain_score 6 68.6 78.1 74.6 5.05 74.2 3.69 1.51 3.87 male high Y pain_score 6 77.5 91.2 81.2 4.95 82.3 5.00 2.04 5.25 male low Y pain_score 6 67.9 80.7 73.4 5.40 73.1 4.76 1.95 5.00 female high Y pain_score 6 73.1 86.6 81.8 3.65 81.2 4.62 1.89 4.85 female low Y pain_score 6 63.7 74.7 68.7 4.56 68.4 4.08 1.67 4.28 male high Z pain_score 6 74.4 85.1 80.4 5.38 79.7 4.05 1.65 4.25 male low Z pain_score 6 68.3 80.4 74.8 7.24 74.5 4.89 2.00 5.13 female high Z pain_score 6 75.0 87.1 80.8 2.59 81.0 3.98 1.63 4.18 female low Z pain_score 6 65.4 73.1 69.6 2.78 69.8 2.72 1.11 2.85 Assumptions
Outliers
| gender | risk | treatment | id | pain_score | is.outlier | is.extreme |
|---|---|---|---|---|---|---|
| female | high | X | 57 | 68.4 | TRUE | TRUE |
| female | high | Y | 62 | 73.1 | TRUE | FALSE |
| female | high | Z | 67 | 75.0 | TRUE | FALSE |
| female | high | Z | 71 | 87.1 | TRUE | FALSE |
Since the pain_score is related to subjective, personal perception of pain, I will not delete all outliers but only the extreme one.
headache2<- headache %>% filter(id != 57)Normality
To check the assumption of normality, I’ll use the Shapiro-Wilk’s test for each combination of treatment, risk and gender.
headache2 %>%
group_by(treatment, gender, risk) %>%
shapiro_test(pain_score) %>%
kbl() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))| gender | risk | treatment | variable | statistic | p |
|---|---|---|---|---|---|
| male | high | X | pain_score | 0.958 | 0.808 |
| male | low | X | pain_score | 0.982 | 0.962 |
| female | high | X | pain_score | 0.922 | 0.543 |
| female | low | X | pain_score | 0.933 | 0.600 |
| male | high | Y | pain_score | 0.902 | 0.384 |
| male | low | Y | pain_score | 0.920 | 0.507 |
| female | high | Y | pain_score | 0.939 | 0.654 |
| female | low | Y | pain_score | 0.927 | 0.555 |
| male | high | Z | pain_score | 0.955 | 0.784 |
| male | low | Z | pain_score | 0.924 | 0.535 |
| female | high | Z | pain_score | 0.971 | 0.901 |
| female | low | Z | pain_score | 0.958 | 0.801 |
For each row p > 0.05. That indicates the normal distribution of pain scores.
However, there’s other method available for testing the normality assumption - Quantile-Quantile plot. Let’s check its results as well.
In each group points form an approximately straight line. Again, we can conclude that normality assumption is fulfilled.
Homogeneity of variance
Performing a Levene’s test allows to asses whether variances within groups are homogenous:
headache2 %>%
levene_test(pain_score~treatment*gender*risk) %>%
kbl() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))| df1 | df2 | statistic | p |
|---|---|---|---|
| 11 | 59 | 0.525 | 0.879 |
p>0.05 => variances are, indeed, homogenous.
Anova
focal variable is treatment moderator variables are gender and risk
results <- aov(data = headache2, pain_score ~gender*risk*treatment)
anova(results) %>%
kbl() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| gender | 1 | 279.224 | 279.224 | 16.018 | 0.000 |
| risk | 1 | 1895.339 | 1895.339 | 108.729 | 0.000 |
| treatment | 2 | 369.748 | 184.874 | 10.606 | 0.000 |
| gender:risk | 1 | 0.062 | 0.062 | 0.004 | 0.953 |
| gender:treatment | 2 | 78.707 | 39.353 | 2.258 | 0.114 |
| risk:treatment | 2 | 44.590 | 22.295 | 1.279 | 0.286 |
| gender:risk:treatment | 2 | 213.453 | 106.726 | 6.123 | 0.004 |
| Residuals | 59 | 1028.470 | 17.432 | NA | NA |
report(results)## The ANOVA (formula: pain_score ~ gender * risk * treatment) suggests that:
##
## - The main effect of gender is statistically significant and large (F(1, 59) = 16.02, p < .001; Eta2 (partial) = 0.21, 95% CI [0.08, 1.00])
## - The main effect of risk is statistically significant and large (F(1, 59) = 108.73, p < .001; Eta2 (partial) = 0.65, 95% CI [0.53, 1.00])
## - The main effect of treatment is statistically significant and large (F(2, 59) = 10.61, p < .001; Eta2 (partial) = 0.26, 95% CI [0.11, 1.00])
## - The interaction between gender and risk is statistically not significant and very small (F(1, 59) = 3.55e-03, p = 0.953; Eta2 (partial) = 6.02e-05, 95% CI [0.00, 1.00])
## - The interaction between gender and treatment is statistically not significant and medium (F(2, 59) = 2.26, p = 0.114; Eta2 (partial) = 0.07, 95% CI [0.00, 1.00])
## - The interaction between risk and treatment is statistically not significant and small (F(2, 59) = 1.28, p = 0.286; Eta2 (partial) = 0.04, 95% CI [0.00, 1.00])
## - The interaction between gender, risk and treatment is statistically significant and large (F(2, 59) = 6.12, p = 0.004; Eta2 (partial) = 0.17, 95% CI [0.04, 1.00])
##
## Effect sizes were labelled following Field's (2013) recommendations.
As stated in the report created on the basis of anova results, there was a statistically significant three-way interaction between gender, risk and treatment.
Post-hoc tests
If there is a significant 3-way interaction effect, you can decompose it into:
- Simple two-way interaction: run two-way interaction at each level of third variable,
- Simple simple main effect: run one-way model at each level of second variable,
- Simple simple pairwise comparisons: run pairwise or other post-hoc comparisons if necessary.
If you do not have a statistically significant three-way interaction, you need to determine whether you have any statistically significant two-way interaction from the ANOVA output. You can follow up a significant two-way interaction by simple main effects analyses and pairwise comparisons between groups if necessary.
Two-way interactions
| gender | Effect | DFn | DFd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|---|
| male | risk | 1 | 59 | 55.54 | 0.000 |
|
0.485 |
| male | treatment | 2 | 59 | 11.29 | 0.000 |
|
0.277 |
| male | risk:treatment | 2 | 59 | 5.83 | 0.005 |
|
0.165 |
| female | risk | 1 | 59 | 54.17 | 0.000 |
|
0.479 |
| female | treatment | 2 | 59 | 1.57 | 0.216 | 0.051 | |
| female | risk:treatment | 2 | 59 | 1.57 | 0.216 | 0.051 |
Main effects
| gender | risk | Effect | DFn | DFd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|---|---|
| male | high | treatment | 2 | 59 | 16.390 | 0.000 |
|
0.357 |
| male | low | treatment | 2 | 59 | 0.732 | 0.485 | 0.024 | |
| female | high | treatment | 2 | 59 | 0.004 | 0.996 | 0.000 | |
| female | low | treatment | 2 | 59 | 3.140 | 0.051 | 0.096 |
Pairwise comparisons
| gender | risk | term | .y. | group1 | group2 | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|
| male | high | treatment | pain_score | X | Y | 0.000 | *** |
| male | high | treatment | pain_score | X | Z | 0.000 | **** |
| male | high | treatment | pain_score | Y | Z | 0.897 | ns |
| gender | risk | treatment | emmean | se | df | conf.low | conf.high | method |
|---|---|---|---|---|---|---|---|---|
| male | high | X | 92.7 | 1.8 | 60 | 89.1 | 96.3 | Emmeans test |
| male | high | Y | 82.3 | 1.8 | 60 | 78.8 | 85.9 | Emmeans test |
| male | high | Z | 79.7 | 1.8 | 60 | 76.1 | 83.3 | Emmeans test |
When considering male at high risk of migraine, difference between treatments X and Y and also X and Z is statistically significant. The difference between Y and Z is not statistically significant because of p.adj that equals almost 0.897.