BS Anova 2
2-way Anova
Two-way ANOVA test hypotheses
There is no difference in the means of factor A
There is no difference in means of factor B
There is no interaction between factors A and B
The alternative hypothesis for cases 1 and 2 is: the means are not equal.
The alternative hypothesis for case 3 is: there is an interaction between A and B.
Data
In this report we will the jobsatisfaction dataset [datarium package], which contains the job satisfaction score organized by gender and education levels.
In this study, a research wants to evaluate if there is a significant two-way interaction between gender and education_level on explaining the job satisfaction score.
An interaction effect occurs when the effect of one independent variable on an outcome variable depends on the level of the other independent variables. If an interaction effect does not exist, main effects could be reported.
Descriptive statistics
Before running a model, you should always plot the data, to check that your assumptions look okay.
Here are a couple plots you might generate while analyzing these data:
Create a box plot of the score by gender levels, colored by education levels:
The distributions within each cell look pretty wonky, but that’s not particularly surprising given the small sample size (n=58):
## gender education_level Freq
## 1 male school 9
## 2 female school 10
## 3 male college 9
## 4 female college 10
## 5 male university 10
## 6 female university 10
Compute the mean and the SD (standard deviation) of the score by groups:
## gender education_level mean_satisfaction sd_satisfaction
## 1 male school 5.43 0.364
## 2 male college 6.22 0.340
## 3 male university 9.29 0.445
## 4 female school 5.74 0.474
## 5 female college 6.46 0.475
## 6 female university 8.41 0.938
Assumptions
Outliers
Identify outliers in each cell design: because Anova makes assumption that there are no significant outliers in any cell of the design
x <- jobsatisfaction %>%
group_by(gender, education_level) %>%
identify_outliers(score)
data.frame(x)## [1] gender education_level id score
## [5] is.outlier is.extreme
## <0 wierszy> (lub 'row.names' o zerowej długości)
In our case there are no outliers.
If outliers were extreme outliers were detected, what can be due to: 1) data entry errors, measurement errors or unusual values.
Yo can include the outliers in the analysis anyway if you do not believe the result will be substantially affected. This can be evaluated by comparing the result of the ANOVA test with and without the outlier.
It’s also possible to keep the outliers in the data and perform robust ANOVA test using the WRS2 package.
Normality
Check normality assumption by analyzing the model residuals. QQ plot and Shapiro-Wilk test of normality are used.
# Compute Shapiro-Wilk test of normality
data.frame(shapiro_test(residuals(model)))## variable statistic p.value
## 1 residuals(model) 0.968 0.127
jobsatisfaction %>%
group_by(education_level, gender) %>%
shapiro_test(score)## # A tibble: 6 x 5
## gender education_level variable statistic p
## <fct> <fct> <chr> <dbl> <dbl>
## 1 male school score 0.980 0.966
## 2 female school score 0.963 0.819
## 3 male college score 0.958 0.779
## 4 female college score 0.963 0.819
## 5 male university score 0.916 0.323
## 6 female university score 0.950 0.674
Shapiro-Wilk test as well as qqplots show us that all groups as well as all results combined, without division on groups, are normally distributed.
Homogeneity of variance
This can be checked using the Levene’s test:
leveneTest(score~education_level*gender, data=jobsatisfaction)## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 5 2.2 0.069 .
## 52
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Levene’s test is not significant (p > 0.05). Therefore, we can assume the homogeneity of variances in the different groups.
Anova
res.aov <- jobsatisfaction %>% anova_test(score ~ gender * education_level)
res.aov## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05
## 1 gender 1 52 0.745 0.3920000000000000150990331
## 2 education_level 2 52 187.892 0.0000000000000000000000016 *
## 3 gender:education_level 2 52 7.338 0.0020000000000000000416334 *
## ges
## 1 0.014
## 2 0.878
## 3 0.220
There was a statistically significant interaction between gender and level of education for job satisfaction score.
In this example, the effect of education_level is our focal variable, that is our primary concern.
It is thought that the effect of education_level will depend on one other factor, gender, which are called a moderator variable.
Post-hoc tests
We only use comparison tests when we reject the null hypothesis in Anova. So in our case I will use it.
Pairwise comparisons
Post-hoct tests A significant two-way interaction indicates that the impact that one factor (e.g., education_level) has on the outcome variable (e.g., job satisfaction score) depends on the level of the other factor (e.g., gender) (and vice versa). So, you can decompose a significant two-way interaction into:
Simple main effect: run one-way model of the first variable at each level of the second variable, Simple pairwise comparisons: if the simple main effect is significant, run multiple pairwise comparisons to determine which groups are different.
For a non-significant two-way interaction, you need to determine whether you have any statistically significant main effects from the ANOVA output. A significant main effect can be followed up by pairwise comparisons between groups.
According to instruction from above we will perform post-hoc test for significant two-way interaction #### Simple main effects
model <- lm(score ~ gender * education_level, data = jobsatisfaction)
jobsatisfaction %>%
group_by(gender) %>%
anova_test(score ~ education_level, error = model)## # A tibble: 2 x 8
## gender Effect DFn DFd F p `p<.05` ges
## * <fct> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 male education_level 2 52 132. 3.92e-21 * 0.836
## 2 female education_level 2 52 62.8 1.35e-14 * 0.707
The simple main effect of “education_level” on job satisfaction score was statistically significant for both male and female (p < 0.0001).
In other words, there is a statistically significant difference in mean job satisfaction score between males educated to either school, college or university level, F(2, 52) = 132, p < 0.0001. The same conclusion holds true for females, F(2, 52) = 62.8, p < 0.0001.
Multiple pairwise comparisons
A statistically significant simple main effect can be followed up by multiple pairwise comparisons to determine which group means are different. We’ll now perform multiple pairwise comparisons between the different education_level groups by gender.
I will run and interpret all possible pairwise comparisons using a Bonferroni adjustment. This can be easily done using the function emmeans_test() [rstatix package], a wrapper around the emmeans package. Emmeans stands for estimated marginal means (aka least square means or adjusted means).
Compare the score of the different education levels by gender levels:
# pairwise comparisons
library(emmeans)
pwc <- jobsatisfaction %>%
group_by(gender) %>%
emmeans_test(score ~ education_level, p.adjust.method = "bonferroni")
pwc## # A tibble: 6 x 10
## gender term .y. group1 group2 df statistic p p.adj
## * <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 female education_level score school college 52 -2.94 4.95e- 3 1.49e- 2
## 2 female education_level score school univer~ 52 -10.8 6.07e-15 1.82e-14
## 3 female education_level score college univer~ 52 -7.90 1.84e-10 5.52e-10
## 4 male education_level score school college 52 -3.07 3.37e- 3 1.01e- 2
## 5 male education_level score school univer~ 52 -15.3 6.87e-21 2.06e-20
## 6 male education_level score college univer~ 52 -12.1 8.42e-17 2.53e-16
## # ... with 1 more variable: p.adj.signif <chr>
There was a significant difference of job satisfaction score between all groups for both males and females (p < 0.05).
Summary of 2-way Anova results
A two-way ANOVA was conducted to examine the effects of gender and education level on job satisfaction score.
Residual analysis was performed to test for the assumptions of the two-way ANOVA. Outliers were assessed by box plot method, normality was assessed using Shapiro-Wilk’s normality test and homogeneity of variances was assessed by Levene’s test.
There were no extreme outliers, residuals were normally distributed (p > 0.05) and there was homogeneity of variances (p > 0.05).
There was a statistically significant interaction between gender and education level on job satisfaction score, F(2, 52) = 7.33, p = 0.0016, eta2[g] = 0.22.
Consequently, an analysis of simple main effects for education level was performed with statistical significance receiving a Bonferroni adjustment. There was a statistically significant difference in mean “job satisfaction” scores for both males (F(2, 52) = 132, p < 0.0001) and females (F(2, 52) = 62.8, p < 0.0001) educated to either school, college or university level.
All pairwise comparisons were analyzed between the different education_level groups organized by gender. There was a significant difference of Job Satisfaction score between all groups for both males and females (p < 0.05).
Number of stars shows how strong is the difference between pairs of groups.
3-Way ANOVA
The three-way ANOVA is an extension of the two-way ANOVA for assessing whether there is an interaction effect between three independent categorical variables on a continuous outcome variable.
We’ll use the headache dataset [datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine.
We want to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.
Descriptive statistics
computing mean and sd of all groups.
## # A tibble: 12 x 7
## gender risk treatment variable n mean sd
## <fct> <fct> <fct> <chr> <dbl> <dbl> <dbl>
## 1 male high X pain_score 6 92.7 5.12
## 2 male high Y pain_score 6 82.3 5.00
## 3 male high Z pain_score 6 79.7 4.05
## 4 male low X pain_score 6 76.1 3.86
## 5 male low Y pain_score 6 73.1 4.76
## 6 male low Z pain_score 6 74.5 4.89
## 7 female high X pain_score 6 78.9 5.32
## 8 female high Y pain_score 6 81.2 4.62
## 9 female high Z pain_score 6 81.0 3.98
## 10 female low X pain_score 6 74.2 3.69
## 11 female low Y pain_score 6 68.4 4.08
## 12 female low Z pain_score 6 69.8 2.72
Visualization
Assumptions
Outliers
## # A tibble: 4 x 7
## gender risk treatment id pain_score is.outlier is.extreme
## <fct> <fct> <fct> <int> <dbl> <lgl> <lgl>
## 1 female high X 57 68.4 TRUE TRUE
## 2 female high Y 62 73.1 TRUE FALSE
## 3 female high Z 67 75.0 TRUE FALSE
## 4 female high Z 71 87.1 TRUE FALSE
Normality
## variable statistic p.value
## 1 residuals(model) 0.982 0.398
From results of Shapiro-Wilk (0 = 0.4) test as well as qqplot we can assume normality.
Checking normality assumprions by groups
## # A tibble: 12 x 6
## gender risk treatment variable statistic p
## <fct> <fct> <fct> <chr> <dbl> <dbl>
## 1 male high X pain_score 0.958 0.808
## 2 male high Y pain_score 0.902 0.384
## 3 male high Z pain_score 0.955 0.784
## 4 male low X pain_score 0.982 0.962
## 5 male low Y pain_score 0.920 0.507
## 6 male low Z pain_score 0.924 0.535
## 7 female high X pain_score 0.714 0.00869
## 8 female high Y pain_score 0.939 0.654
## 9 female high Z pain_score 0.971 0.901
## 10 female low X pain_score 0.933 0.600
## 11 female low Y pain_score 0.927 0.555
## 12 female low Z pain_score 0.958 0.801
From results we see that only females at high risk that were using treatment X are not normally distributed.
Homogeneity of variance
x <- headache %>% levene_test(pain_score ~ gender*risk*treatment)
data.frame(x)## df1 df2 statistic p
## 1 11 60 0.179 0.998
Result of Lavene’s test is not significant therefore we can assume homogeneity of variances.
Anova
res.aov <- headache %>% anova_test(pain_score ~ gender*risk*treatment)
res.aov## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 gender 1 60 16.196 0.000163000000000 * 0.213
## 2 risk 1 60 92.699 0.000000000000088 * 0.607
## 3 treatment 2 60 7.318 0.001000000000000 * 0.196
## 4 gender:risk 1 60 0.141 0.708000000000000 0.002
## 5 gender:treatment 2 60 3.338 0.042000000000000 * 0.100
## 6 risk:treatment 2 60 0.713 0.494000000000000 0.023
## 7 gender:risk:treatment 2 60 7.406 0.001000000000000 * 0.198
P-value at line 7 is equal to 0.001 it shows that there is significant tree-way interaction between gender, risk and treatement.
Post-hoc tests
If there is a significant 3-way interaction effect, you can decompose it into:
- Simple two-way interaction: run two-way interaction at each level of third variable,
- Simple simple main effect: run one-way model at each level of second variable,
- Simple simple pairwise comparisons: run pairwise or other post-hoc comparisons if necessary.
If you do not have a statistically significant three-way interaction, you need to determine whether you have any statistically significant two-way interaction from the ANOVA output. You can follow up a significant two-way interaction by simple main effects analyses and pairwise comparisons between groups if necessary.
Simple two-way interactions
# Group the data by gender and
# fit simple two-way interaction
model <- lm(pain_score ~ gender*risk*treatment, data = headache)
headache %>%
group_by(gender) %>%
anova_test(pain_score ~ risk*treatment, error = model)## # A tibble: 6 x 8
## gender Effect DFn DFd F p `p<.05` ges
## * <fct> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 male risk 1 60 50.0 0.00000000187 "*" 0.455
## 2 male treatment 2 60 10.2 0.000157 "*" 0.253
## 3 male risk:treatment 2 60 5.25 0.008 "*" 0.149
## 4 female risk 1 60 42.8 0.000000015 "*" 0.416
## 5 female treatment 2 60 0.482 0.62 "" 0.016
## 6 female risk:treatment 2 60 2.87 0.065 "" 0.087
There was a statistically significant simple two-way interaction between risk and treatment (risk:treatment) for males, F(2, 60) = 5.25, p = 0.008, but not for females, F(2, 60) = 2.87, p = 0.065.
For males, this result suggests that the effect of treatment on “pain_score” depends on one’s “risk” of migraine. In other words, the risk moderates the effect of the type of treatment on pain_score.
Simple Simple main effects
I was doing it only for males as this was the only simple two-way interaction that was statistically significant.
## # A tibble: 4 x 9
## gender risk Effect DFn DFd F p `p<.05` ges
## * <fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 male high treatment 2 60 14.8 0.0000061 "*" 0.33
## 2 male low treatment 2 60 0.66 0.521 "" 0.022
## 3 female high treatment 2 60 0.52 0.597 "" 0.017
## 4 female low treatment 2 60 2.83 0.067 "" 0.086
## gender risk Effect DFn DFd F p p<.05 ges
## 1 male high treatment 2 60 14.766 0.0000061 * 0.33
## 2 male low treatment 2 60 0.66 0.521 0.022
This analysis indicates that, the type of treatment taken has a statistically significant effect on pain_score in males who are at high risk.
In other words, the mean pain_score in the treatment X, Y and Z groups was statistically significantly different for males who at high risk, but not for males at low risk.
Simple Simple pairwise comparisons
A statistically significant simple simple main effect can be followed up by multiple pairwise comparisons to determine which group means are different.
Compare the different treatments by gender and risk variables:
Pairwise comparison
# Pairwise comparisons
library(emmeans)
pwc <- headache %>%
group_by(gender, risk) %>%
emmeans_test(pain_score ~ treatment, p.adjust.method = "bonferroni") %>%
select(-df, -statistic, -p) # Remove details
# Show comparison results for male at high risk
pwc %>% filter(gender == "male", risk == "high")## # A tibble: 3 x 8
## gender risk term .y. group1 group2 p.adj p.adj.signif
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 male high treatment pain_score X Y 0.000386 ***
## 2 male high treatment pain_score X Z 0.00000942 ****
## 3 male high treatment pain_score Y Z 0.897 ns
Estimated marginal means
## # A tibble: 3 x 9
## gender risk treatment emmean se df conf.low conf.high method
## <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 male high X 92.7 1.80 60 89.1 96.3 Emmeans test
## 2 male high Y 82.3 1.80 60 78.7 85.9 Emmeans test
## 3 male high Z 79.7 1.80 60 76.1 83.3 Emmeans test
As we can see from pairwise comparison and estimated marginal means above there are a differences in means between X and Y, as well as X and Z, but not between Y and Z.