Statistical Inference

3-way Anova

3-Way ANOVA

The three-way ANOVA is an extension of the two-way ANOVA for assessing whether there is an interaction effect between three independent categorical variables on a continuous outcome variable.

We’ll use the headache dataset [datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine.

We want to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.

Descriptive statistics

First, let’s observe some general things about our dataset.

summary_headache <- headache %>%
  group_by(gender, risk, treatment) %>%
  summarise(
    mean_pain_score = mean(pain_score, na.rm = TRUE),
    sd_pain_score = sd(pain_score, na.rm = TRUE),
    .groups = 'drop'
  )
print(summary_headache)

## # A tibble: 12 × 5
##    gender risk  treatment mean_pain_score sd_pain_score
##    <fct>  <fct> <fct>               <dbl>         <dbl>
##  1 male   high  X                    92.7          5.12
##  2 male   high  Y                    82.3          5.00
##  3 male   high  Z                    79.7          4.05
##  4 male   low   X                    76.1          3.85
##  5 male   low   Y                    73.1          4.77
##  6 male   low   Z                    74.5          4.89
##  7 female high  X                    78.9          5.32
##  8 female high  Y                    81.2          4.62
##  9 female high  Z                    81.0          3.98
## 10 female low   X                    74.2          3.69
## 11 female low   Y                    68.4          4.08
## 12 female low   Z                    69.8          2.72

ggplot(headache, aes(x = treatment, y = pain_score, fill = risk)) +
  geom_boxplot() +
  facet_wrap(gender ~ risk, scales = "free") +
  scale_fill_brewer(palette = "Pastel1") +
  labs(title = "Pain Score Across Treatments by Gender and Risk of Migraine",
       x = "Treatment",
       y = "Pain Score",
       fill = "Risk of Migraine") +
  theme_minimal()

As we can see, there are a few outliers, we would check that later on.

Assumptions

Outliers

outliers_test <- headache %>%
  group_by(gender, risk, treatment) %>%
  identify_outliers(pain_score)
outliers_test

## # A tibble: 4 × 7
##   gender risk  treatment    id pain_score is.outlier is.extreme
##   <fct>  <fct> <fct>     <int>      <dbl> <lgl>      <lgl>     
## 1 female high  X            57       68.4 TRUE       TRUE      
## 2 female high  Y            62       73.1 TRUE       FALSE     
## 3 female high  Z            67       75.0 TRUE       FALSE     
## 4 female high  Z            71       87.1 TRUE       FALSE

1 out of 4 outliers from this dataset is considered extreme.

Normality

To test normality, ggqqplot and Shapiro-Wilk test can be used. We will use both.

ggqqplot(headache, "pain_score") +
  facet_grid(gender ~ risk + treatment, scales = "free") + # Note the change in facet formula
  theme(legend.position = "none") +
  geom_qq_line() +
  geom_qq(color = "red") +
  labs(
    title = "Normal Q-Q Plots for Pain Scores",
    subtitle = "Grouped by Gender, Risk, and Treatment"
  )

As we can see, some outliers are still there and they affect the normality. Let’s use Shapiro-Wilk to prove this “hypothesis” ;)

normality_test <- headache %>%
  group_by(gender, risk, treatment) %>%
  shapiro_test(pain_score)
normality_test

## # A tibble: 12 × 6
##    gender risk  treatment variable   statistic       p
##    <fct>  <fct> <fct>     <chr>          <dbl>   <dbl>
##  1 male   high  X         pain_score     0.958 0.808  
##  2 male   high  Y         pain_score     0.902 0.384  
##  3 male   high  Z         pain_score     0.955 0.784  
##  4 male   low   X         pain_score     0.982 0.962  
##  5 male   low   Y         pain_score     0.920 0.507  
##  6 male   low   Z         pain_score     0.924 0.535  
##  7 female high  X         pain_score     0.714 0.00869
##  8 female high  Y         pain_score     0.939 0.654  
##  9 female high  Z         pain_score     0.971 0.901  
## 10 female low   X         pain_score     0.933 0.600  
## 11 female low   Y         pain_score     0.927 0.555  
## 12 female low   Z         pain_score     0.958 0.801

Exactly! The female individual with high risk of migraine on drug X has a p-value less than 0.05. Other groups appear to be normally distributed

Homogeneity of variance

Let’s test homogeneity of variance using levene test.

homogeneity_test <- headache %>%
  levene_test(pain_score ~ gender * treatment * risk)
homogeneity_test

## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1    11    60     0.179 0.998

P-value is higher that 0.05, which means that it is not significant. Conclusion: variances are homogenious

Anova

After checking all of the assumptions, we can finally use ANOVA.

anova_test <- aov(data = headache, pain_score ~ gender * treatment * risk)
anova(anova_test)

## Analysis of Variance Table
## 
## Response: pain_score
##                       Df Sum Sq Mean Sq F value            Pr(>F)    
## gender                 1    313     313   16.20           0.00016 ***
## treatment              2    283     142    7.32           0.00143 ** 
## risk                   1   1794    1794   92.70 0.000000000000088 ***
## gender:treatment       2    129      65    3.34           0.04220 *  
## gender:risk            1      3       3    0.14           0.70849    
## treatment:risk         2     28      14    0.71           0.49422    
## gender:treatment:risk  2    287     143    7.41           0.00133 ** 
## Residuals             60   1161      19                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

report(anova_test)

## The ANOVA (formula: pain_score ~ gender * treatment * risk) suggests that:
## 
##   - The main effect of gender is statistically significant and large (F(1, 60) =
## 16.20, p < .001; Eta2 (partial) = 0.21, 95% CI [0.08, 1.00])
##   - The main effect of treatment is statistically significant and large (F(2, 60)
## = 7.32, p = 0.001; Eta2 (partial) = 0.20, 95% CI [0.06, 1.00])
##   - The main effect of risk is statistically significant and large (F(1, 60) =
## 92.70, p < .001; Eta2 (partial) = 0.61, 95% CI [0.48, 1.00])
##   - The interaction between gender and treatment is statistically significant and
## medium (F(2, 60) = 3.34, p = 0.042; Eta2 (partial) = 0.10, 95% CI [2.01e-03,
## 1.00])
##   - The interaction between gender and risk is statistically not significant and
## very small (F(1, 60) = 0.14, p = 0.708; Eta2 (partial) = 2.35e-03, 95% CI
## [0.00, 1.00])
##   - The interaction between treatment and risk is statistically not significant
## and small (F(2, 60) = 0.71, p = 0.494; Eta2 (partial) = 0.02, 95% CI [0.00,
## 1.00])
##   - The interaction between gender, treatment and risk is statistically
## significant and large (F(2, 60) = 7.41, p = 0.001; Eta2 (partial) = 0.20, 95%
## CI [0.06, 1.00])
## 
## Effect sizes were labelled following Field's (2013) recommendations.

Based on the results of the ANOVA test, we can see that there that relations between pain score and gender, so as between pain score and risk are statistically significant. 3-way correlation is also statistically significant and large.

Post-hoc tests

If there is a significant 3-way interaction effect, you can decompose it into:

Simple two-way interaction: run two-way interaction at each level of third variable,
Simple simple main effect: run one-way model at each level of second variable,
Simple simple pairwise comparisons: run pairwise or other post-hoc comparisons if necessary.

If you do not have a statistically significant three-way interaction, you need to determine whether you have any statistically significant two-way interaction from the ANOVA output. You can follow up a significant two-way interaction by simple main effects analyses and pairwise comparisons between groups if necessary.

Two-way interactions

l_model <- lm(pain_score ~ gender * risk * treatment, data = headache)
headache %>% 
  group_by(gender) %>%
  anova_test(pain_score ~ risk * treatment, error = l_model)

## # A tibble: 6 × 8
##   gender Effect           DFn   DFd      F             p `p<.05`   ges
## * <fct>  <chr>          <dbl> <dbl>  <dbl>         <dbl> <chr>   <dbl>
## 1 male   risk               1    60 50.0   0.00000000187 "*"     0.455
## 2 male   treatment          2    60 10.2   0.000157      "*"     0.253
## 3 male   risk:treatment     2    60  5.25  0.008         "*"     0.149
## 4 female risk               1    60 42.8   0.000000015   "*"     0.416
## 5 female treatment          2    60  0.482 0.62          ""      0.016
## 6 female risk:treatment     2    60  2.87  0.065         ""      0.087

It appears that their is a high significance for male patients with high risk of migraine, since the p-value is so low. Basically, the effectiveness of treatment on pain scores depends on risk.

Main effects

effects <- headache %>%
  group_by(gender, risk) %>%
  anova_test(pain_score ~ treatment, error = l_model)
effects

## # A tibble: 4 × 9
##   gender risk  Effect      DFn   DFd     F         p `p<.05`   ges
## * <fct>  <fct> <chr>     <dbl> <dbl> <dbl>     <dbl> <chr>   <dbl>
## 1 male   high  treatment     2    60 14.8  0.0000061 "*"     0.33 
## 2 male   low   treatment     2    60  0.66 0.521     ""      0.022
## 3 female high  treatment     2    60  0.52 0.597     ""      0.017
## 4 female low   treatment     2    60  2.83 0.067     ""      0.086

From the table we may conclude, that type of treatment is significant only for male individuals at a high risk of migraine.

Pairwise comparisons

pairwise_comp <- headache %>%
  group_by(gender, risk) %>%
  emmeans_test(pain_score ~ treatment, p.adjust.method = "bonferroni")

pairwise_comp %>% 
  filter(gender == "male", risk == "high")

## # A tibble: 3 × 11
##   gender risk  term      .y.       group1 group2    df statistic       p   p.adj
##   <fct>  <fct> <chr>     <chr>     <chr>  <chr>  <dbl>     <dbl>   <dbl>   <dbl>
## 1 male   high  treatment pain_sco… X      Y         60      4.09 1.29e-4 3.86e-4
## 2 male   high  treatment pain_sco… X      Z         60      5.14 3.14e-6 9.42e-6
## 3 male   high  treatment pain_sco… Y      Z         60      1.05 2.99e-1 8.97e-1
## # ℹ 1 more variable: p.adj.signif <chr>

Difference between Y-Z treatments are statistically insignificant, while X-Y and X-Z show significant difference.

Conclusion

After checking all of the assumptions and performing statistical tests, we may draw a conclusion: indeed, there is a significant 3-way interaction between gender, risk and treatment.