3-way Anova
3-Way ANOVA
The three-way ANOVA is an extension of the two-way ANOVA for assessing whether there is an interaction effect between three independent categorical variables on a continuous outcome variable.
We’ll use the headache dataset [datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine.
We want to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.
Descriptive statistics
Let’s print headache dataset:
## # A tibble: 72 x 5
## id gender risk treatment pain_score
## <int> <fct> <fct> <fct> <dbl>
## 1 1 male low X 79.3
## 2 2 male low X 76.8
## 3 3 male low X 70.8
## 4 4 male low X 81.2
## 5 5 male low X 75.1
## 6 6 male low X 73.1
## 7 7 male low Y 68.2
## 8 8 male low Y 80.7
## 9 9 male low Y 75.3
## 10 10 male low Y 73.4
## # ... with 62 more rows
Now see a summary of headache dataset with regards to pain_score.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 63.7 72.9 77.9 77.6 81.5 100.0
Then, get stats of each treatment method.
## # A tibble: 3 x 8
## treatment variable n min max median mean sd
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 X pain_score 24 68.4 100 79.0 80.5 8.57
## 2 Y pain_score 24 63.7 91.2 76.4 76.3 7.31
## 3 Z pain_score 24 65.4 87.1 75.4 76.2 5.88
Also, show gender, risk, treatment and corresponding, more precise stats.
## # A tibble: 12 x 12
## gender risk treatment variable n min max median mean sd se
## <fct> <fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 male high X pain_score 6 86.3 100 93.4 92.7 5.12 2.09
## 2 male high Y pain_score 6 77.5 91.2 81.2 82.3 5.00 2.04
## 3 male high Z pain_score 6 74.4 85.1 80.4 79.7 4.05 1.65
## 4 male low X pain_score 6 70.8 81.2 75.9 76.1 3.86 1.57
## 5 male low Y pain_score 6 67.9 80.7 73.4 73.1 4.76 1.94
## 6 male low Z pain_score 6 68.3 80.4 74.8 74.5 4.89 2.00
## 7 female high X pain_score 6 68.4 82.7 81.1 78.9 5.32 2.17
## 8 female high Y pain_score 6 73.1 86.6 81.8 81.2 4.62 1.89
## 9 female high Z pain_score 6 75.0 87.1 80.8 81.0 3.98 1.63
## 10 female low X pain_score 6 68.6 78.1 74.6 74.2 3.69 1.51
## 11 female low Y pain_score 6 63.7 74.7 68.7 68.4 4.08 1.67
## 12 female low Z pain_score 6 65.4 73.1 69.6 69.8 2.72 1.11
## # ... with 1 more variable: ci <dbl>
Then check means and whiskers within males and females.
Assumptions
Outliers
## # A tibble: 4 x 7
## gender risk treatment id pain_score is.outlier is.extreme
## <fct> <fct> <fct> <int> <dbl> <lgl> <lgl>
## 1 female high X 57 68.4 TRUE TRUE
## 2 female high Y 62 73.1 TRUE FALSE
## 3 female high Z 67 75.0 TRUE FALSE
## 4 female high Z 71 87.1 TRUE FALSE
We have only one extreme outlier for a female (id=57) with a high risk of migraine taking drug X.
Normality
## # A tibble: 12 x 6
## gender risk treatment variable statistic p
## <fct> <fct> <fct> <chr> <dbl> <dbl>
## 1 male high X pain_score 0.958 0.808
## 2 male high Y pain_score 0.902 0.384
## 3 male high Z pain_score 0.955 0.784
## 4 male low X pain_score 0.982 0.962
## 5 male low Y pain_score 0.920 0.507
## 6 male low Z pain_score 0.924 0.535
## 7 female high X pain_score 0.714 0.00869
## 8 female high Y pain_score 0.939 0.654
## 9 female high Z pain_score 0.971 0.901
## 10 female low X pain_score 0.933 0.600
## 11 female low Y pain_score 0.927 0.555
## 12 female low Z pain_score 0.958 0.801
To test the normality I did use Shapiro-Wilk test. The only exception where pain score is not normally distributed is female with high risk treated with drug X.
Homogenity of variance
## # A tibble: 1 x 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 11 60 0.179 0.998
To test homogenity of variance I did use Levene Test. Since p > 0.05 variances are homogenous.
Anova
## Analysis of Variance Table
##
## Response: pain_score
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 313 313 16.20 0.00016 ***
## treatment 2 283 142 7.32 0.00143 **
## risk 1 1794 1794 92.70 0.000000000000088 ***
## gender:treatment 2 129 65 3.34 0.04220 *
## gender:risk 1 3 3 0.14 0.70849
## treatment:risk 2 28 14 0.71 0.49422
## gender:treatment:risk 2 287 143 7.41 0.00133 **
## Residuals 60 1161 19
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is very significant three-way relation gender:treatment:risk.
Post-hoc tests
If there is a significant 3-way interaction effect, you can decompose it into:
- Simple two-way interaction: run two-way interaction at each level of third variable,
- Simple simple main effect: run one-way model at each level of second variable,
- Simple simple pairwise comparisons: run pairwise or other post-hoc comparisons if necessary.
If you do not have a statistically significant three-way interaction, you need to determine whether you have any statistically significant two-way interaction from the ANOVA output. You can follow up a significant two-way interaction by simple main effects analyses and pairwise comparisons between groups if necessary.
Two-way interactions
## Coefficient covariances computed by hccm()
## Coefficient covariances computed by hccm()
## # A tibble: 6 x 8
## gender Effect DFn DFd F p `p<.05` ges
## * <fct> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 male risk 1 60 50.0 0.00000000187 "*" 0.455
## 2 male treatment 2 60 10.2 0.000157 "*" 0.253
## 3 male risk:treatment 2 60 5.25 0.008 "*" 0.149
## 4 female risk 1 60 42.8 0.000000015 "*" 0.416
## 5 female treatment 2 60 0.482 0.62 "" 0.016
## 6 female risk:treatment 2 60 2.87 0.065 "" 0.087
From the results we can see that risk is very significnt factor for both genders. Also because p<0.05 for mens within treatment and risk:treatment (risk has high infuence on the effect of treatment) these are significant factors.
Main effects
## Coefficient covariances computed by hccm()
## Coefficient covariances computed by hccm()
## Coefficient covariances computed by hccm()
## Coefficient covariances computed by hccm()
## # A tibble: 4 x 9
## gender risk Effect DFn DFd F p `p<.05` ges
## * <fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 male high treatment 2 60 14.8 0.0000061 "*" 0.33
## 2 male low treatment 2 60 0.66 0.521 "" 0.022
## 3 female high treatment 2 60 0.52 0.597 "" 0.017
## 4 female low treatment 2 60 2.83 0.067 "" 0.086
It seems that for mens in high risk group chosen type of treatment is very influencial when it comes describing pain score.
Pairwise comparisons
## # A tibble: 3 x 9
## .y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <chr> <dbl> <chr>
## 1 pain_score X Y 24 24 0.0514 ns 0.154 ns
## 2 pain_score X Z 24 24 0.0505 ns 0.152 ns
## 3 pain_score Y Z 24 24 0.994 ns 1 ns
The difference between X:Y and X:Z is statistically significant (p value is very close to 0.05, but compared to p value for Y:Z which is close to 1 we should treat them as significant). On the other hand difference between Y and Z is not significant at all.