3-way Anova

3-Way ANOVA

The three-way ANOVA is an extension of the two-way ANOVA for assessing whether there is an interaction effect between three independent categorical variables on a continuous outcome variable.

We’ll use the headache dataset [datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine.

We want to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.

Descriptive statistics

Before we do the 3 way ANOVA test, we can use first get a graphical insight of the data.

plot.design(pain_score ~ gender + risk + treatment, data = headache)

It looks like every factor will influence a lot for pain_score. The risk is the most one while treatment Y and Z looks almost same. We can see more detailed in the data below. And while other factor are same, treatment Y and Z truly closed.

mean(headache$pain_score)
## [1] 77.6
ms <- headache %>%
  group_by(gender, risk, treatment) %>%
  get_summary_stats(pain_score, type = "mean_sd")

ms %>% 
  kbl() %>%
  kable_material("striped") %>%
  column_spec(6, color = "white",
              background = spec_color(ms$mean[1:12], end = 0.7)) %>%
  column_spec(7, color = "white",
              background = spec_color(ms$sd[1:12], end = 0.7))
gender risk treatment variable n mean sd
male high X pain_score 6 92.7 5.12
male high Y pain_score 6 82.3 5.00
male high Z pain_score 6 79.7 4.05
male low X pain_score 6 76.1 3.85
male low Y pain_score 6 73.1 4.76
male low Z pain_score 6 74.5 4.89
female high X pain_score 6 78.9 5.32
female high Y pain_score 6 81.2 4.62
female high Z pain_score 6 81.0 3.98
female low X pain_score 6 74.2 3.69
female low Y pain_score 6 68.4 4.08
female low Z pain_score 6 69.8 2.72

Boxplot, divide by gender, show that in the same treatment the data looks in the similar trend:

ggplot(headache, aes(x = treatment, y = pain_score, color = risk)) +
  geom_boxplot() +
  facet_grid(gender ~ .) 

There are some outliers in the boxplot. The distributions within each cell look more wonky than theme of 2-way ANOVA, but that’s still not surprising given the small sample size (n=72):

xt <- xtabs( ~ treatment + risk + gender, data = headache)
xt %>%
  kbl() %>%
  kable_material()
treatment risk gender Freq
X high male 6
Y high male 6
Z high male 6
X low male 6
Y low male 6
Z low male 6
X high female 6
Y high female 6
Z high female 6
X low female 6
Y low female 6
Z low female 6

Assumptions

Outliers

headache %>%
  group_by(gender, risk, treatment) %>%
  identify_outliers(pain_score) %>% 
  kbl() %>%
  kable_material(c("striped", "hover")) %>%
  column_spec(6, color = "white",
              background = spec_color(c(2,3),begin = 0.4,end=0.9))%>%
  column_spec(7, color = "white",
              background = spec_color(c(2,3),begin = 0.4,end=0.9))
gender risk treatment id pain_score is.outlier is.extreme
female high X 57 68.4 TRUE TRUE
female high Y 62 73.1 TRUE FALSE
female high Z 67 75.0 TRUE FALSE
female high Z 71 87.1 TRUE FALSE

It exist 1 extreme outlier, we try to delete it firstly. Also it can be seen in the boxplot above.

headache = headache[-57,]
headache %>%
  group_by(gender, risk, treatment) %>%
  identify_outliers(pain_score)%>% 
  kbl() %>%
  kable_material(c("striped", "hover")) %>%
  column_spec(6, color = "white",
              background = spec_color(c(TRUE,FALSE),begin = 0.4,end=0.9))%>%
  column_spec(7, color = "white",
              background = spec_color(c(TRUE,FALSE),begin = 0.4,end=0.9))
gender risk treatment id pain_score is.outlier is.extreme
female high X 56 82.7 TRUE FALSE
female high X 58 78.6 TRUE TRUE
female high Y 62 73.1 TRUE FALSE
female high Z 67 75.0 TRUE FALSE
female high Z 71 87.1 TRUE FALSE

But unfortunately it didn’t change a lot. However they all happened in the female with high risk and treatment X. And it only contains 6 samples. So I choose to include outliers in the analysis and firmly believe that they will not have a substantial impact on the results. Only declude the value of 68.4.

Normality

sp <- headache %>%
  group_by(gender, risk, treatment) %>%
  shapiro_test(pain_score)
sp %>%
  kbl() %>%
  kable_material(c("striped", "hover")) %>%
  column_spec(6, color = spec_color(sp$p[1:12],end=0.8))
gender risk treatment variable statistic p
male high X pain_score 0.958 0.808
male high Y pain_score 0.902 0.384
male high Z pain_score 0.955 0.784
male low X pain_score 0.982 0.962
male low Y pain_score 0.920 0.507
male low Z pain_score 0.924 0.535
female high X pain_score 0.922 0.543
female high Y pain_score 0.939 0.654
female high Z pain_score 0.971 0.901
female low X pain_score 0.933 0.600
female low Y pain_score 0.927 0.555
female low Z pain_score 0.958 0.801

The pain_score were normally distributed, because p > 0.05 for each cell which means cannot reject H0 in Shapiro-Wilk’s test of normality. We also can see it in the picture below, values are approximately along the reference line.

ggqqplot(headache, "pain_score") +
  facet_grid(risk~gender+treatment)

Homogeneity of variance

we can use Levene’s test:

levene_test(pain_score ~ gender * risk * treatment, data = headache)%>%
  kbl() %>%
  kable_material()
df1 df2 statistic p
11 59 0.525 0.879

p-value > 0.05: it means that there is no significant difference between variances across groups. Therefore, we can assume the homogeneity of variances in different groups are equal.

We can also see it in the plot, there is no evident relationships between residuals and fitted values (the mean of each groups). So, we can assume the homogeneity of variances.

model<-lm(pain_score~gender*risk*treatment,data=headache)
plot(model, 1)

Anova

In the case, treatment is the focal factor. It will depend on gender and risk which are moderator variable.

results <- aov(pain_score ~ gender * risk * treatment, data = headache)
anova(results)
## Analysis of Variance Table
## 
## Response: pain_score
##                       Df Sum Sq Mean Sq F value             Pr(>F)    
## gender                 1    279     279   16.02            0.00018 ***
## risk                   1   1895    1895  108.73 0.0000000000000052 ***
## treatment              2    370     185   10.61            0.00012 ***
## gender:risk            1      0       0    0.00            0.95268    
## gender:treatment       2     79      39    2.26            0.11357    
## risk:treatment         2     45      22    1.28            0.28592    
## gender:risk:treatment  2    213     107    6.12            0.00384 ** 
## Residuals             59   1028      17                               
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
rep<-report(results)

The ANOVA (formula: pain_score ~ gender * risk * treatment) suggests that:

  • The main effect of gender is statistically significant and large (F(1, 59) = 16.02, p < .001; Eta2 (partial) = 0.21, 95% CI [0.08, 1.00])
  • The main effect of risk is statistically significant and large (F(1, 59) = 108.73, p < .001; Eta2 (partial) = 0.65, 95% CI [0.53, 1.00])
  • The main effect of treatment is statistically significant and large (F(2, 59) = 10.61, p < .001; Eta2 (partial) = 0.26, 95% CI [0.11, 1.00])
  • The interaction between gender and risk is statistically not significant and very small (F(1, 59) = 3.55e-03, p = 0.953; Eta2 (partial) = 6.02e-05, 95% CI [0.00, 1.00])
  • The interaction between gender and treatment is statistically not significant and medium (F(2, 59) = 2.26, p = 0.114; Eta2 (partial) = 0.07, 95% CI [0.00, 1.00])
  • The interaction between risk and treatment is statistically not significant and small (F(2, 59) = 1.28, p = 0.286; Eta2 (partial) = 0.04, 95% CI [0.00, 1.00])
  • The interaction between gender, risk and treatment is statistically significant and large (F(2, 59) = 6.12, p = 0.004; Eta2 (partial) = 0.17, 95% CI [0.04, 1.00])

Effect sizes were labelled following Field’s (2013) recommendations.

By using the ‘anova_test’, the asterisk represents the interaction effect and the main effect of each variable (and all lower-order interactions).

results2 <- anova_test(pain_score ~ gender * risk * treatment, data = headache)
## Coefficient covariances computed by hccm()
results2
## ANOVA Table (type II tests)
## 
##                  Effect DFn DFd       F                   p p<.05       ges
## 1                gender   1  59  13.921 0.00043000000000000     * 0.1910000
## 2                  risk   1  59 109.707 0.00000000000000441     * 0.6500000
## 3             treatment   2  59  10.607 0.00011600000000000     * 0.2640000
## 4           gender:risk   1  59   0.002 0.96799999999999997       0.0000271
## 5      gender:treatment   2  59   2.185 0.12200000000000000       0.0690000
## 6        risk:treatment   2  59   1.279 0.28599999999999998       0.0420000
## 7 gender:risk:treatment   2  59   6.123 0.00400000000000000     * 0.1720000

From the above ANOVA table, it can be seen that there are significant differences between groups (p = 0.016), which are highlighted with “*“.

Post-hoc tests

If there is a significant 3-way interaction effect, you can decompose it into:

  • Simple two-way interaction: run two-way interaction at each level of third variable,
  • Simple simple main effect: run one-way model at each level of second variable,
  • Simple simple pairwise comparisons: run pairwise or other post-hoc comparisons if necessary.

If you do not have a statistically significant three-way interaction, you need to determine whether you have any statistically significant two-way interaction from the ANOVA output. You can follow up a significant two-way interaction by simple main effects analyses and pairwise comparisons between groups if necessary.

Two-way interactions

ANOVA tells us if there are differences among group means, but not what the differences are. To find out which groups are statistically different from one another, we can perform a Tukey’s Honestly Significant Difference (Tukey’s HSD) post-hoc test for pairwise comparisons:

tukey.results<-TukeyHSD(results)

Let’s focus on two-way interaction It is obviously and easily to find that only when it comes to treatment Y and Z, the p>0.05. There is not a significant difference between them. It is the same with what we assume in Descriptive statistics after plot.design.

tukey.results[4]%>%
  kbl() %>%
  kable_minimal()
diff lwr upr p adj
female:high-male:high -3.76 -7.49 -0.026 0.048
male:low-male:high -10.28 -13.96 -6.598 0.000
female:low-male:high -14.15 -17.83 -10.475 0.000
male:low-female:high -6.52 -10.25 -2.785 0.000
female:low-female:high -10.39 -14.13 -6.662 0.000
female:low-male:low -3.88 -7.56 -0.198 0.035
tukey.results[5]%>%
  kbl() %>%
  kable_minimal()
diff lwr upr p adj
female:X-male:X -6.820 -11.95 -1.69 0.003
male:Y-male:X -6.654 -11.67 -1.63 0.003
female:Y-male:X -9.774 -14.79 -4.75 0.000
male:Z-male:X -7.326 -12.35 -2.31 0.001
female:Z-male:X -9.135 -14.16 -4.12 0.000
male:Y-female:X 0.166 -4.97 5.30 1.000
female:Y-female:X -2.954 -8.09 2.18 0.540
male:Z-female:X -0.506 -5.64 4.63 1.000
female:Z-female:X -2.315 -7.45 2.82 0.768
female:Y-male:Y -3.120 -8.14 1.90 0.455
male:Z-male:Y -0.672 -5.69 4.35 0.999
female:Z-male:Y -2.481 -7.50 2.54 0.693
male:Z-female:Y 2.448 -2.57 7.47 0.705
female:Z-female:Y 0.639 -4.38 5.66 0.999
female:Z-male:Z -1.809 -6.83 3.21 0.894
tukey.results[6]%>%
  kbl() %>%
  kable_minimal()
diff lwr upr p adj
low:X-high:X -12.01 -17.146 -6.880 0.000
high:Y-high:X -5.38 -10.514 -0.247 0.035
low:Y-high:X -16.43 -21.565 -11.298 0.000
high:Z-high:X -6.78 -11.914 -1.647 0.003
low:Z-high:X -15.06 -20.197 -9.931 0.000
high:Y-low:X 6.63 1.612 11.653 0.003
low:Y-low:X -4.42 -9.439 0.602 0.115
high:Z-low:X 5.23 0.212 10.253 0.036
low:Z-low:X -3.05 -8.071 1.969 0.480
low:Y-high:Y -11.05 -16.072 -6.031 0.000
high:Z-high:Y -1.40 -6.421 3.620 0.962
low:Z-high:Y -9.68 -14.704 -4.663 0.000
high:Z-low:Y 9.65 4.630 14.671 0.000
low:Z-low:Y 1.37 -3.653 6.388 0.966
low:Z-high:Z -8.28 -13.304 -3.263 0.000

Main effects

As we can see in the ANOVA table, the main effects of risk are most statistically significant.

model <- lm(pain_score ~ gender * risk * treatment, data = headache)
headache %>%
  group_by(gender,treatment) %>%
  anova_test(pain_score ~ risk, error = model)%>%
  kbl() %>%
  kable_material(c("striped", "hover"))
gender treatment Effect DFn DFd F p p<.05 ges
male X risk 1 59 47.92 0.000 * 0.448
male Y risk 1 59 14.57 0.000 * 0.198
male Z risk 1 59 4.70 0.034 * 0.074
female X risk 1 59 7.25 0.009 * 0.110
female Y risk 1 59 28.26 0.000 * 0.324
female Z risk 1 59 21.80 0.000 * 0.270
From the table above we can find each p<0.5, it means, in each gender with each treatment, there is a statistically significant difference between risk low and risk high.

Compare the score of the different risk by gender and treatment levels:

pwc <- headache %>% 
  group_by(gender,treatment) %>%
  emmeans_test(pain_score ~ risk, p.adjust.method = "bonferroni") 
pwc%>%
  kbl() %>%
  kable_material(c("striped", "hover"))
gender treatment term .y. group1 group2 df statistic p p.adj p.adj.signif
female X risk pain_score high low 59 2.69 0.009 0.009 **
female Y risk pain_score high low 59 5.32 0.000 0.000 ****
female Z risk pain_score high low 59 4.67 0.000 0.000 ****
male X risk pain_score high low 59 6.92 0.000 0.000 ****
male Y risk pain_score high low 59 3.82 0.000 0.000 ***
male Z risk pain_score high low 59 2.17 0.034 0.034 *
There was a significant difference of pain score between all groups for each gender with each treatment (p < 0.05).

Pairwise comparisons

From the table of result below, we can easily find that in the pairwise samples, there is significantly difference between risk high and low,treatment X and Y, X and Z. But not significantly in treatment Y and Z. It is also the same with assumption we make after plot.design at the beginning.

pairwise <- headache %>%
  pairwise_t_test(
    pain_score ~ treatment, 
    p.adjust.method = "bonferroni"
    )
pairwise%>%
  kbl() %>%
  kable_material(c("striped", "hover"))
.y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
pain_score X Y 23 24 0.029 * 0.086 ns
pain_score X Z 23 24 0.028 * 0.084 ns
pain_score Y Z 24 24 0.994 ns 1.000 ns
pairwise1 <- headache %>%
  pairwise_t_test(
    pain_score ~ risk, 
    p.adjust.method = "bonferroni"
    )
pairwise1%>%
  kbl() %>%
  kable_material(c("striped", "hover"))
.y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
pain_score high low 35 36 0 **** 0 ****

Summary

bxp <- ggplot(headache, aes(x = risk, y = pain_score, color = gender)) +
  geom_boxplot() +
  facet_grid(treatment ~ .) 
pwc <- pwc %>% add_xy_position(x="gender")
bxp + 
  stat_pvalue_manual(pwc) +
  labs(
    subtitle=get_test_label(results2,detailed=TRUE),
    caption=get_pwc_label(pwc)
  )

A three-way ANOVA was conducted to examine the effects of gender, risk and treatment on pain score.

Residual analysis was performed to test for the assumptions of the three-way ANOVA. Outliers were assessed by box plot method, normality was assessed using Shapiro-Wilk’s normality test and homogeneity of variances was assessed by Levene’s test.

There was an extreme outliers, and delete only once. residuals were normally distributed (p > 0.05) and there was homogeneity of variances (p > 0.05).

There was a statistically significant interaction between three factors on pain score, F(2, 59) = 6.12, p = 0.004, eta2[g] = 0.17.

Consequently, an analysis of simple main effects for risk was performed with statistical significance receiving a Bonferroni adjustment. There was a statistically significant difference in mean pain scores between all groups for each gender with each treatment.

All pairwise comparisons were analyzed between the different risk groups organized by gender and treatment. There was a significant difference of pain score between all groups(p < 0.05). And after analyzed between the different treatment groups organized by gender and risk, we find there is no a significant difference between treatment Y and Z(p>0.05).