3-way Anova

The three-way ANOVA is an extension of the two-way ANOVA for assessing whether there is an interaction effect between three independent categorical variables on a continuous outcome variable.

We’ll use the headache dataset [datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine.

We want to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.

Descriptive statistics

At first, let’s have a look on a headache dataset:

ids <- sample(1:nrow(headache), 8)
samp <- headache[ids, ]
kbl(samp) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
id gender risk treatment pain_score
31 male high Z 74.4
51 female low Z 65.4
14 male low Z 68.3
67 female high Z 75.0
42 female low X 78.1
50 female low Z 72.1
43 female low Y 70.1
70 female high Z 80.4

Then, we can check the summary of pain scores data:

summary(headache$pain_score)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    63.7    72.9    77.9    77.6    81.5   100.0

Now we can examine how scores are distributed depending on used treatment method, risk of migraine and the gender of participants of the research. Then, let’s have a closer look on the means and quantiles of scores within the risk groups. To get more complete picture we will compute summary statistics subsequently for:

  1. each treatment group
treatment variable n mean sd
X pain_score 24 80.5 8.57
Y pain_score 24 76.3 7.31
Z pain_score 24 76.2 5.88
  1. subgroups created from each possible combination of risk, gender and treatment
    gender risk treatment variable n min max median iqr mean sd se ci
    male high X pain_score 6 86.3 100.0 93.4 6.40 92.7 5.12 2.09 5.37
    male low X pain_score 6 70.8 81.2 75.9 5.10 76.1 3.85 1.57 4.04
    female high X pain_score 6 68.4 82.7 81.1 2.29 78.9 5.32 2.17 5.58
    female low X pain_score 6 68.6 78.1 74.6 5.05 74.2 3.69 1.51 3.87
    male high Y pain_score 6 77.5 91.2 81.2 4.95 82.3 5.00 2.04 5.25
    male low Y pain_score 6 67.9 80.7 73.4 5.40 73.1 4.76 1.95 5.00
    female high Y pain_score 6 73.1 86.6 81.8 3.65 81.2 4.62 1.89 4.85
    female low Y pain_score 6 63.7 74.7 68.7 4.56 68.4 4.08 1.67 4.28
    male high Z pain_score 6 74.4 85.1 80.4 5.38 79.7 4.05 1.65 4.25
    male low Z pain_score 6 68.3 80.4 74.8 7.24 74.5 4.89 2.00 5.13
    female high Z pain_score 6 75.0 87.1 80.8 2.59 81.0 3.98 1.63 4.18
    female low Z pain_score 6 65.4 73.1 69.6 2.78 69.8 2.72 1.11 2.85

    Assumptions

Outliers

gender risk treatment id pain_score is.outlier is.extreme
female high X 57 68.4 TRUE TRUE
female high Y 62 73.1 TRUE FALSE
female high Z 67 75.0 TRUE FALSE
female high Z 71 87.1 TRUE FALSE

Since the pain_score is related to subjective, personal perception of pain, I will not delete all outliers but only the extreme one.

headache2<- headache %>% filter(id != 57)

Normality

To check the assumption of normality, I’ll use the Shapiro-Wilk’s test for each combination of treatment, risk and gender.

headache2 %>%
  group_by(treatment, gender, risk) %>%
  shapiro_test(pain_score) %>%
  kbl() %>%
   kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
gender risk treatment variable statistic p
male high X pain_score 0.958 0.808
male low X pain_score 0.982 0.962
female high X pain_score 0.922 0.543
female low X pain_score 0.933 0.600
male high Y pain_score 0.902 0.384
male low Y pain_score 0.920 0.507
female high Y pain_score 0.939 0.654
female low Y pain_score 0.927 0.555
male high Z pain_score 0.955 0.784
male low Z pain_score 0.924 0.535
female high Z pain_score 0.971 0.901
female low Z pain_score 0.958 0.801

For each row p > 0.05. That indicates the normal distribution of pain scores.

However, there’s other method available for testing the normality assumption - Quantile-Quantile plot. Let’s check its results as well.

In each group points form an approximately straight line. Again, we can conclude that normality assumption is fulfilled.

Homogeneity of variance

Performing a Levene’s test allows to asses whether variances within groups are homogenous:

headache2 %>% 
  levene_test(pain_score~treatment*gender*risk) %>%
  kbl() %>%
   kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
df1 df2 statistic p
11 59 0.525 0.879

p>0.05 => variances are, indeed, homogenous.

Anova

focal variable is treatment moderator variables are gender and risk

results <- aov(data = headache2, pain_score ~gender*risk*treatment)
anova(results) %>%
  kbl() %>%
   kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Df Sum Sq Mean Sq F value Pr(>F)
gender 1 279.224 279.224 16.018 0.000
risk 1 1895.339 1895.339 108.729 0.000
treatment 2 369.748 184.874 10.606 0.000
gender:risk 1 0.062 0.062 0.004 0.953
gender:treatment 2 78.707 39.353 2.258 0.114
risk:treatment 2 44.590 22.295 1.279 0.286
gender:risk:treatment 2 213.453 106.726 6.123 0.004
Residuals 59 1028.470 17.432 NA NA
report(results)
## The ANOVA (formula: pain_score ~ gender * risk * treatment) suggests that:
## 
##   - The main effect of gender is statistically significant and large (F(1, 59) = 16.02, p < .001; Eta2 (partial) = 0.21, 95% CI [0.08, 1.00])
##   - The main effect of risk is statistically significant and large (F(1, 59) = 108.73, p < .001; Eta2 (partial) = 0.65, 95% CI [0.53, 1.00])
##   - The main effect of treatment is statistically significant and large (F(2, 59) = 10.61, p < .001; Eta2 (partial) = 0.26, 95% CI [0.11, 1.00])
##   - The interaction between gender and risk is statistically not significant and very small (F(1, 59) = 3.55e-03, p = 0.953; Eta2 (partial) = 6.02e-05, 95% CI [0.00, 1.00])
##   - The interaction between gender and treatment is statistically not significant and medium (F(2, 59) = 2.26, p = 0.114; Eta2 (partial) = 0.07, 95% CI [0.00, 1.00])
##   - The interaction between risk and treatment is statistically not significant and small (F(2, 59) = 1.28, p = 0.286; Eta2 (partial) = 0.04, 95% CI [0.00, 1.00])
##   - The interaction between gender, risk and treatment is statistically significant and large (F(2, 59) = 6.12, p = 0.004; Eta2 (partial) = 0.17, 95% CI [0.04, 1.00])
## 
## Effect sizes were labelled following Field's (2013) recommendations.

As stated in the report created on the basis of anova results, there was a statistically significant three-way interaction between gender, risk and treatment.

Post-hoc tests

If there is a significant 3-way interaction effect, you can decompose it into:

  • Simple two-way interaction: run two-way interaction at each level of third variable,
  • Simple simple main effect: run one-way model at each level of second variable,
  • Simple simple pairwise comparisons: run pairwise or other post-hoc comparisons if necessary.

If you do not have a statistically significant three-way interaction, you need to determine whether you have any statistically significant two-way interaction from the ANOVA output. You can follow up a significant two-way interaction by simple main effects analyses and pairwise comparisons between groups if necessary.

Two-way interactions

gender Effect DFn DFd F p p<.05 ges
male risk 1 59 55.54 0.000
0.485
male treatment 2 59 11.29 0.000
0.277
male risk:treatment 2 59 5.83 0.005
0.165
female risk 1 59 54.17 0.000
0.479
female treatment 2 59 1.57 0.216 0.051
female risk:treatment 2 59 1.57 0.216 0.051
The two-way interaction between risk and treatment for males was statistically significant (p = 0.005, p < 0.05). Therefore, risk strongly influences the outcome of the treatment (its effect on pain_score).

Main effects

gender risk Effect DFn DFd F p p<.05 ges
male high treatment 2 59 16.390 0.000
0.357
male low treatment 2 59 0.732 0.485 0.024
female high treatment 2 59 0.004 0.996 0.000
female low treatment 2 59 3.140 0.051 0.096
From the table above we can conclude that for men in a group of high risk chosen type of treatment has a statistically significant effect on pain_score. => there’s a statistically significant simple simple main effect of treatment for those individuals

Pairwise comparisons

gender risk term .y. group1 group2 p.adj p.adj.signif
male high treatment pain_score X Y 0.000 ***
male high treatment pain_score X Z 0.000 ****
male high treatment pain_score Y Z 0.897 ns
gender risk treatment emmean se df conf.low conf.high method
male high X 92.7 1.8 60 89.1 96.3 Emmeans test
male high Y 82.3 1.8 60 78.8 85.9 Emmeans test
male high Z 79.7 1.8 60 76.1 83.3 Emmeans test

When considering male at high risk of migraine, difference between treatments X and Y and also X and Z is statistically significant. The difference between Y and Z is not statistically significant because of p.adj that equals almost 0.897.