3-way Anova
3-Way ANOVA
The three-way ANOVA is an extension of the two-way ANOVA for assessing whether there is an interaction effect between three independent categorical variables on a continuous outcome variable.
We’ll use the headache dataset [datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine.
We want to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.
Descriptive statistics
Before we do the 3 way ANOVA test, we can use first get a graphical insight of the data.
plot.design(pain_score ~ gender + risk + treatment, data = headache) It looks like every factor will influence a lot for pain_score. The risk is the most one while treatment Y and Z looks almost same. We can see more detailed in the data below. And while other factor are same, treatment Y and Z truly closed.
mean(headache$pain_score)## [1] 77.6
ms <- headache %>%
group_by(gender, risk, treatment) %>%
get_summary_stats(pain_score, type = "mean_sd")
ms %>%
kbl() %>%
kable_material("striped") %>%
column_spec(6, color = "white",
background = spec_color(ms$mean[1:12], end = 0.7)) %>%
column_spec(7, color = "white",
background = spec_color(ms$sd[1:12], end = 0.7))| gender | risk | treatment | variable | n | mean | sd |
|---|---|---|---|---|---|---|
| male | high | X | pain_score | 6 | 92.7 | 5.12 |
| male | high | Y | pain_score | 6 | 82.3 | 5.00 |
| male | high | Z | pain_score | 6 | 79.7 | 4.05 |
| male | low | X | pain_score | 6 | 76.1 | 3.85 |
| male | low | Y | pain_score | 6 | 73.1 | 4.76 |
| male | low | Z | pain_score | 6 | 74.5 | 4.89 |
| female | high | X | pain_score | 6 | 78.9 | 5.32 |
| female | high | Y | pain_score | 6 | 81.2 | 4.62 |
| female | high | Z | pain_score | 6 | 81.0 | 3.98 |
| female | low | X | pain_score | 6 | 74.2 | 3.69 |
| female | low | Y | pain_score | 6 | 68.4 | 4.08 |
| female | low | Z | pain_score | 6 | 69.8 | 2.72 |
Boxplot, divide by gender, show that in the same treatment the data looks in the similar trend:
ggplot(headache, aes(x = treatment, y = pain_score, color = risk)) +
geom_boxplot() +
facet_grid(gender ~ .) There are some outliers in the boxplot. The distributions within each cell look more wonky than theme of 2-way ANOVA, but that’s still not surprising given the small sample size (n=72):
xt <- xtabs( ~ treatment + risk + gender, data = headache)
xt %>%
kbl() %>%
kable_material()| treatment | risk | gender | Freq |
|---|---|---|---|
| X | high | male | 6 |
| Y | high | male | 6 |
| Z | high | male | 6 |
| X | low | male | 6 |
| Y | low | male | 6 |
| Z | low | male | 6 |
| X | high | female | 6 |
| Y | high | female | 6 |
| Z | high | female | 6 |
| X | low | female | 6 |
| Y | low | female | 6 |
| Z | low | female | 6 |
Assumptions
Outliers
headache %>%
group_by(gender, risk, treatment) %>%
identify_outliers(pain_score) %>%
kbl() %>%
kable_material(c("striped", "hover")) %>%
column_spec(6, color = "white",
background = spec_color(c(2,3),begin = 0.4,end=0.9))%>%
column_spec(7, color = "white",
background = spec_color(c(2,3),begin = 0.4,end=0.9))| gender | risk | treatment | id | pain_score | is.outlier | is.extreme |
|---|---|---|---|---|---|---|
| female | high | X | 57 | 68.4 | TRUE | TRUE |
| female | high | Y | 62 | 73.1 | TRUE | FALSE |
| female | high | Z | 67 | 75.0 | TRUE | FALSE |
| female | high | Z | 71 | 87.1 | TRUE | FALSE |
It exist 1 extreme outlier, we try to delete it firstly. Also it can be seen in the boxplot above.
headache = headache[-57,]
headache %>%
group_by(gender, risk, treatment) %>%
identify_outliers(pain_score)%>%
kbl() %>%
kable_material(c("striped", "hover")) %>%
column_spec(6, color = "white",
background = spec_color(c(TRUE,FALSE),begin = 0.4,end=0.9))%>%
column_spec(7, color = "white",
background = spec_color(c(TRUE,FALSE),begin = 0.4,end=0.9))| gender | risk | treatment | id | pain_score | is.outlier | is.extreme |
|---|---|---|---|---|---|---|
| female | high | X | 56 | 82.7 | TRUE | FALSE |
| female | high | X | 58 | 78.6 | TRUE | TRUE |
| female | high | Y | 62 | 73.1 | TRUE | FALSE |
| female | high | Z | 67 | 75.0 | TRUE | FALSE |
| female | high | Z | 71 | 87.1 | TRUE | FALSE |
But unfortunately it didn’t change a lot. However they all happened in the female with high risk and treatment X. And it only contains 6 samples. So I choose to include outliers in the analysis and firmly believe that they will not have a substantial impact on the results. Only declude the value of 68.4.
Normality
sp <- headache %>%
group_by(gender, risk, treatment) %>%
shapiro_test(pain_score)
sp %>%
kbl() %>%
kable_material(c("striped", "hover")) %>%
column_spec(6, color = spec_color(sp$p[1:12],end=0.8))| gender | risk | treatment | variable | statistic | p |
|---|---|---|---|---|---|
| male | high | X | pain_score | 0.958 | 0.808 |
| male | high | Y | pain_score | 0.902 | 0.384 |
| male | high | Z | pain_score | 0.955 | 0.784 |
| male | low | X | pain_score | 0.982 | 0.962 |
| male | low | Y | pain_score | 0.920 | 0.507 |
| male | low | Z | pain_score | 0.924 | 0.535 |
| female | high | X | pain_score | 0.922 | 0.543 |
| female | high | Y | pain_score | 0.939 | 0.654 |
| female | high | Z | pain_score | 0.971 | 0.901 |
| female | low | X | pain_score | 0.933 | 0.600 |
| female | low | Y | pain_score | 0.927 | 0.555 |
| female | low | Z | pain_score | 0.958 | 0.801 |
The pain_score were normally distributed, because p > 0.05 for each cell which means cannot reject H0 in Shapiro-Wilk’s test of normality. We also can see it in the picture below, values are approximately along the reference line.
ggqqplot(headache, "pain_score") +
facet_grid(risk~gender+treatment)Homogeneity of variance
we can use Levene’s test:
levene_test(pain_score ~ gender * risk * treatment, data = headache)%>%
kbl() %>%
kable_material()| df1 | df2 | statistic | p |
|---|---|---|---|
| 11 | 59 | 0.525 | 0.879 |
p-value > 0.05: it means that there is no significant difference between variances across groups. Therefore, we can assume the homogeneity of variances in different groups are equal.
We can also see it in the plot, there is no evident relationships between residuals and fitted values (the mean of each groups). So, we can assume the homogeneity of variances.
model<-lm(pain_score~gender*risk*treatment,data=headache)
plot(model, 1)Anova
In the case, treatment is the focal factor. It will depend on gender and risk which are moderator variable.
results <- aov(pain_score ~ gender * risk * treatment, data = headache)
anova(results)## Analysis of Variance Table
##
## Response: pain_score
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 279 279 16.02 0.00018 ***
## risk 1 1895 1895 108.73 0.0000000000000052 ***
## treatment 2 370 185 10.61 0.00012 ***
## gender:risk 1 0 0 0.00 0.95268
## gender:treatment 2 79 39 2.26 0.11357
## risk:treatment 2 45 22 1.28 0.28592
## gender:risk:treatment 2 213 107 6.12 0.00384 **
## Residuals 59 1028 17
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
rep<-report(results)The ANOVA (formula: pain_score ~ gender * risk * treatment) suggests that:
- The main effect of gender is statistically significant and large (F(1, 59) = 16.02, p < .001; Eta2 (partial) = 0.21, 95% CI [0.08, 1.00])
- The main effect of risk is statistically significant and large (F(1, 59) = 108.73, p < .001; Eta2 (partial) = 0.65, 95% CI [0.53, 1.00])
- The main effect of treatment is statistically significant and large (F(2, 59) = 10.61, p < .001; Eta2 (partial) = 0.26, 95% CI [0.11, 1.00])
- The interaction between gender and risk is statistically not significant and very small (F(1, 59) = 3.55e-03, p = 0.953; Eta2 (partial) = 6.02e-05, 95% CI [0.00, 1.00])
- The interaction between gender and treatment is statistically not significant and medium (F(2, 59) = 2.26, p = 0.114; Eta2 (partial) = 0.07, 95% CI [0.00, 1.00])
- The interaction between risk and treatment is statistically not significant and small (F(2, 59) = 1.28, p = 0.286; Eta2 (partial) = 0.04, 95% CI [0.00, 1.00])
- The interaction between gender, risk and treatment is statistically significant and large (F(2, 59) = 6.12, p = 0.004; Eta2 (partial) = 0.17, 95% CI [0.04, 1.00])
Effect sizes were labelled following Field’s (2013) recommendations.
By using the ‘anova_test’, the asterisk represents the interaction effect and the main effect of each variable (and all lower-order interactions).
results2 <- anova_test(pain_score ~ gender * risk * treatment, data = headache)## Coefficient covariances computed by hccm()
results2## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 gender 1 59 13.921 0.00043000000000000 * 0.1910000
## 2 risk 1 59 109.707 0.00000000000000441 * 0.6500000
## 3 treatment 2 59 10.607 0.00011600000000000 * 0.2640000
## 4 gender:risk 1 59 0.002 0.96799999999999997 0.0000271
## 5 gender:treatment 2 59 2.185 0.12200000000000000 0.0690000
## 6 risk:treatment 2 59 1.279 0.28599999999999998 0.0420000
## 7 gender:risk:treatment 2 59 6.123 0.00400000000000000 * 0.1720000
From the above ANOVA table, it can be seen that there are significant differences between groups (p = 0.016), which are highlighted with “*“.
Post-hoc tests
If there is a significant 3-way interaction effect, you can decompose it into:
- Simple two-way interaction: run two-way interaction at each level of third variable,
- Simple simple main effect: run one-way model at each level of second variable,
- Simple simple pairwise comparisons: run pairwise or other post-hoc comparisons if necessary.
If you do not have a statistically significant three-way interaction, you need to determine whether you have any statistically significant two-way interaction from the ANOVA output. You can follow up a significant two-way interaction by simple main effects analyses and pairwise comparisons between groups if necessary.
Two-way interactions
ANOVA tells us if there are differences among group means, but not what the differences are. To find out which groups are statistically different from one another, we can perform a Tukey’s Honestly Significant Difference (Tukey’s HSD) post-hoc test for pairwise comparisons:
tukey.results<-TukeyHSD(results)Let’s focus on two-way interaction It is obviously and easily to find that only when it comes to treatment Y and Z, the p>0.05. There is not a significant difference between them. It is the same with what we assume in Descriptive statistics after plot.design.
tukey.results[4]%>%
kbl() %>%
kable_minimal()
|
tukey.results[5]%>%
kbl() %>%
kable_minimal()
|
tukey.results[6]%>%
kbl() %>%
kable_minimal()
|
Main effects
As we can see in the ANOVA table, the main effects of risk are most statistically significant.
model <- lm(pain_score ~ gender * risk * treatment, data = headache)
headache %>%
group_by(gender,treatment) %>%
anova_test(pain_score ~ risk, error = model)%>%
kbl() %>%
kable_material(c("striped", "hover"))| gender | treatment | Effect | DFn | DFd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|---|---|
| male | X | risk | 1 | 59 | 47.92 | 0.000 | * | 0.448 |
| male | Y | risk | 1 | 59 | 14.57 | 0.000 | * | 0.198 |
| male | Z | risk | 1 | 59 | 4.70 | 0.034 | * | 0.074 |
| female | X | risk | 1 | 59 | 7.25 | 0.009 | * | 0.110 |
| female | Y | risk | 1 | 59 | 28.26 | 0.000 | * | 0.324 |
| female | Z | risk | 1 | 59 | 21.80 | 0.000 | * | 0.270 |
Compare the score of the different risk by gender and treatment levels:
pwc <- headache %>%
group_by(gender,treatment) %>%
emmeans_test(pain_score ~ risk, p.adjust.method = "bonferroni")
pwc%>%
kbl() %>%
kable_material(c("striped", "hover"))| gender | treatment | term | .y. | group1 | group2 | df | statistic | p | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|---|---|
| female | X | risk | pain_score | high | low | 59 | 2.69 | 0.009 | 0.009 | ** |
| female | Y | risk | pain_score | high | low | 59 | 5.32 | 0.000 | 0.000 | **** |
| female | Z | risk | pain_score | high | low | 59 | 4.67 | 0.000 | 0.000 | **** |
| male | X | risk | pain_score | high | low | 59 | 6.92 | 0.000 | 0.000 | **** |
| male | Y | risk | pain_score | high | low | 59 | 3.82 | 0.000 | 0.000 | *** |
| male | Z | risk | pain_score | high | low | 59 | 2.17 | 0.034 | 0.034 | * |
Pairwise comparisons
From the table of result below, we can easily find that in the pairwise samples, there is significantly difference between risk high and low,treatment X and Y, X and Z. But not significantly in treatment Y and Z. It is also the same with assumption we make after plot.design at the beginning.
pairwise <- headache %>%
pairwise_t_test(
pain_score ~ treatment,
p.adjust.method = "bonferroni"
)
pairwise%>%
kbl() %>%
kable_material(c("striped", "hover"))| .y. | group1 | group2 | n1 | n2 | p | p.signif | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|
| pain_score | X | Y | 23 | 24 | 0.029 | * | 0.086 | ns |
| pain_score | X | Z | 23 | 24 | 0.028 | * | 0.084 | ns |
| pain_score | Y | Z | 24 | 24 | 0.994 | ns | 1.000 | ns |
pairwise1 <- headache %>%
pairwise_t_test(
pain_score ~ risk,
p.adjust.method = "bonferroni"
)
pairwise1%>%
kbl() %>%
kable_material(c("striped", "hover"))| .y. | group1 | group2 | n1 | n2 | p | p.signif | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|
| pain_score | high | low | 35 | 36 | 0 | **** | 0 | **** |
Summary
bxp <- ggplot(headache, aes(x = risk, y = pain_score, color = gender)) +
geom_boxplot() +
facet_grid(treatment ~ .)
pwc <- pwc %>% add_xy_position(x="gender")
bxp +
stat_pvalue_manual(pwc) +
labs(
subtitle=get_test_label(results2,detailed=TRUE),
caption=get_pwc_label(pwc)
) A three-way ANOVA was conducted to examine the effects of gender, risk and treatment on pain score.
Residual analysis was performed to test for the assumptions of the three-way ANOVA. Outliers were assessed by box plot method, normality was assessed using Shapiro-Wilk’s normality test and homogeneity of variances was assessed by Levene’s test.
There was an extreme outliers, and delete only once. residuals were normally distributed (p > 0.05) and there was homogeneity of variances (p > 0.05).
There was a statistically significant interaction between three factors on pain score, F(2, 59) = 6.12, p = 0.004, eta2[g] = 0.17.
Consequently, an analysis of simple main effects for risk was performed with statistical significance receiving a Bonferroni adjustment. There was a statistically significant difference in mean pain scores between all groups for each gender with each treatment.
All pairwise comparisons were analyzed between the different risk groups organized by gender and treatment. There was a significant difference of pain score between all groups(p < 0.05). And after analyzed between the different treatment groups organized by gender and risk, we find there is no a significant difference between treatment Y and Z(p>0.05).