title: “BS Anova 2” author: “Filip Ziętara” date: Published on Tuesday 04 January 2022 output: rmdformats::readthedown: highlight: kate toc_float: true toc_depth: 4 editor_options: chunk_output_type: console —
Two-way ANOVA test hypotheses - There is no difference in the means of factor A - There is no difference in means of factor B - There is no interaction between factors A and B The alternative hypothesis for cases 1 and 2 is: the means are not equal. The alternative hypothesis for case 3 is: there is an interaction between A and B. ## Data In this report we will the jobsatisfaction dataset [datarium package], which contains the job satisfaction score organized by gender and education levels. In this study, a research wants to evaluate if there is a significant two-way interaction between gender and education_level on explaining the job satisfaction score. An interaction effect occurs when the effect of one independent variable on an outcome variable depends on the level of the other independent variables. If an interaction effect does not exist, main effects could be reported. ## Descriptive statistics Before running a model, you should always plot the data, to check that your assumptions look okay. Here are a couple plots you might generate while analyzing these data:
ggplot(jobsatisfaction, aes(x=score)) +
geom_histogram(bins=10) +
facet_grid(gender ~ education_level) +
theme_classic()
Boxplot, to highlight the group means:
ggplot(jobsatisfaction, aes(y=score, x=education_level, fill = gender)) +
geom_boxplot() +
theme_classic()
The distributions within each cell look pretty wonky, but that’s not particularly surprising given the small sample size (n=58):
xtabs(~ gender + education_level, data = jobsatisfaction)
## education_level
## gender school college university
## male 9 9 10
## female 10 10 10
Compute the mean and the SD (standard deviation) of the score by groups:
jobsatisfaction %>% group_by(gender, education_level) %>% get_summary_stats(score, type="mean_sd")
## # A tibble: 6 x 6
## gender education_level variable n mean sd
## <fct> <fct> <chr> <dbl> <dbl> <dbl>
## 1 male school score 9 5.43 0.364
## 2 male college score 9 6.22 0.34
## 3 male university score 10 9.29 0.445
## 4 female school score 10 5.74 0.474
## 5 female college score 10 6.46 0.475
## 6 female university score 10 8.41 0.938
Identify outliers in each cell design:
jobsatisfaction %>%
group_by(gender, education_level) %>%
identify_outliers(score)
## [1] gender education_level id score
## [5] is.outlier is.extreme
## <0 rows> (or 0-length row.names)
Check normality assumption by analyzing the model residuals. QQ plot and Shapiro-Wilk test of normality are used.
jobsatisfaction %>%
group_by(gender,education_level) %>%
shapiro_test(score)
## # A tibble: 6 x 5
## gender education_level variable statistic p
## <fct> <fct> <chr> <dbl> <dbl>
## 1 male school score 0.980 0.966
## 2 male college score 0.958 0.779
## 3 male university score 0.916 0.323
## 4 female school score 0.963 0.819
## 5 female college score 0.963 0.819
## 6 female university score 0.950 0.674
ggqqplot(jobsatisfaction, "score") + facet_grid(gender~education_level)
This can be checked using the Levene’s test:
jobsatisfaction %>%
levene_test(score~education_level*gender)
## # A tibble: 1 x 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 5 52 2.20 0.0686
plot(lm(score~gender*education_level, data=jobsatisfaction))
## Anova In this example, the effect of education_level is our focal variable, that is our primary concern. It is thought that the effect of education_level will depend on one other factor, gender, which are called a moderator variable.
results<-aov(score~gender*education_level, data=jobsatisfaction)
anova(results)
## Analysis of Variance Table
##
## Response: score
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 0.5 0.5 1.79 0.1871
## education_level 2 113.7 56.8 187.89 <0.0000000000000002 ***
## gender:education_level 2 4.4 2.2 7.34 0.0016 **
## Residuals 52 15.7 0.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
report(results)
## The ANOVA (formula: score ~ gender * education_level) suggests that:
##
## - The main effect of gender is statistically not significant and small (F(1, 52) = 1.79, p = 0.187; Eta2 (partial) = 0.03, 95% CI [0.00, 1.00])
## - The main effect of education_level is statistically significant and large (F(2, 52) = 187.89, p < .001; Eta2 (partial) = 0.88, 95% CI [0.83, 1.00])
## - The interaction between gender and education_level is statistically significant and large (F(2, 52) = 7.34, p = 0.002; Eta2 (partial) = 0.22, 95% CI [0.06, 1.00])
##
## Effect sizes were labelled following Field's (2013) recommendations.
A significant two-way interaction indicates that the impact that one factor (e.g., education_level) has on the outcome variable (e.g., job satisfaction score) depends on the level of the other factor (e.g., gender) (and vice versa). So, you can decompose a significant two-way interaction into: - Simple main effect: run one-way model of the first variable at each level of the second variable, - Simple pairwise comparisons: if the simple main effect is significant, run multiple pairwise comparisons to determine which groups are different. For a non-significant two-way interaction, you need to determine whether you have any statistically significant main effects from the ANOVA output. A significant main effect can be followed up by pairwise comparisons between groups.
test <- jobsatisfaction %>% tukey_hsd(score~gender*education_level)
test
## # A tibble: 19 x 9
## term group1 group2 null.value estimate conf.low conf.high p.adj
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 gender male female 0 -0.193 -0.483 0.0968 1.87e- 1
## 2 education~ school college 0 0.757 0.327 1.19 2.64e- 4
## 3 education~ school univers~ 0 3.25 2.83 3.68 0
## 4 education~ college univers~ 0 2.49 2.07 2.92 0
## 5 gender:ed~ male:sch~ female:~ 0 0.314 -0.433 1.06 8.13e- 1
## 6 gender:ed~ male:sch~ male:co~ 0 0.797 0.0296 1.56 3.75e- 2
## 7 gender:ed~ male:sch~ female:~ 0 1.04 0.289 1.78 1.92e- 3
## 8 gender:ed~ male:sch~ male:un~ 0 3.87 3.12 4.61 0
## 9 gender:ed~ male:sch~ female:~ 0 2.98 2.23 3.73 0
## 10 gender:ed~ female:s~ male:co~ 0 0.482 -0.265 1.23 4.09e- 1
## 11 gender:ed~ female:s~ female:~ 0 0.722 -0.00575 1.45 5.3 e- 2
## 12 gender:ed~ female:s~ male:un~ 0 3.55 2.82 4.28 0
## 13 gender:ed~ female:s~ female:~ 0 2.67 1.94 3.39 0
## 14 gender:ed~ male:col~ female:~ 0 0.240 -0.508 0.987 9.32e- 1
## 15 gender:ed~ male:col~ male:un~ 0 3.07 2.32 3.82 0
## 16 gender:ed~ male:col~ female:~ 0 2.18 1.43 2.93 1.86e-10
## 17 gender:ed~ female:c~ male:un~ 0 2.83 2.10 3.56 0
## 18 gender:ed~ female:c~ female:~ 0 1.94 1.22 2.67 2.73e- 9
## 19 gender:ed~ male:uni~ female:~ 0 -0.886 -1.61 -0.158 8.75e- 3
## # ... with 1 more variable: p.adj.signif <chr>
jobsatisfaction %>%
pairwise_t_test(score~education_level)
## # A tibble: 3 x 9
## .y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <chr> <dbl> <chr>
## 1 score school college 19 19 3.27e- 4 *** 3.27e- 4 ***
## 2 score school university 19 20 3.43e-23 **** 1.03e-22 ****
## 3 score college university 19 20 3.8 e-18 **** 7.6 e-18 ****
The three-way ANOVA is an extension of the two-way ANOVA for assessing whether there is an interaction effect between three independent categorical variables on a continuous outcome variable. We’ll use the headache dataset [datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine. We want to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.
head(headache)
## # A tibble: 6 x 5
## id gender risk treatment pain_score
## <int> <fct> <fct> <fct> <dbl>
## 1 1 male low X 79.3
## 2 2 male low X 76.8
## 3 3 male low X 70.8
## 4 4 male low X 81.2
## 5 5 male low X 75.1
## 6 6 male low X 73.1
summary(headache)
## id gender risk treatment pain_score
## Min. : 1.0 male :36 high:36 X:24 Min. : 63.7
## 1st Qu.:18.8 female:36 low :36 Y:24 1st Qu.: 72.9
## Median :36.5 Z:24 Median : 77.9
## Mean :36.5 Mean : 77.6
## 3rd Qu.:54.2 3rd Qu.: 81.5
## Max. :72.0 Max. :100.0
ggplot(headache, aes(x=pain_score)) +
geom_histogram(bins=10) +
facet_grid(gender ~ risk ~ treatment) +
theme_classic()
headache %>% group_by(gender,risk,treatment) %>% get_summary_stats(pain_score, type="mean_sd")
## # A tibble: 12 x 7
## gender risk treatment variable n mean sd
## <fct> <fct> <fct> <chr> <dbl> <dbl> <dbl>
## 1 male high X pain_score 6 92.7 5.12
## 2 male high Y pain_score 6 82.3 5.00
## 3 male high Z pain_score 6 79.7 4.05
## 4 male low X pain_score 6 76.1 3.86
## 5 male low Y pain_score 6 73.1 4.76
## 6 male low Z pain_score 6 74.5 4.89
## 7 female high X pain_score 6 78.9 5.32
## 8 female high Y pain_score 6 81.2 4.62
## 9 female high Z pain_score 6 81.0 3.98
## 10 female low X pain_score 6 74.2 3.69
## 11 female low Y pain_score 6 68.4 4.08
## 12 female low Z pain_score 6 69.8 2.72
headache %>%
group_by(gender, risk, treatment) %>%
identify_outliers(pain_score)
## # A tibble: 4 x 7
## gender risk treatment id pain_score is.outlier is.extreme
## <fct> <fct> <fct> <int> <dbl> <lgl> <lgl>
## 1 female high X 57 68.4 TRUE TRUE
## 2 female high Y 62 73.1 TRUE FALSE
## 3 female high Z 67 75.0 TRUE FALSE
## 4 female high Z 71 87.1 TRUE FALSE
headache %>%
group_by(gender,risk,treatment) %>%
shapiro_test(pain_score)
## # A tibble: 12 x 6
## gender risk treatment variable statistic p
## <fct> <fct> <fct> <chr> <dbl> <dbl>
## 1 male high X pain_score 0.958 0.808
## 2 male high Y pain_score 0.902 0.384
## 3 male high Z pain_score 0.955 0.784
## 4 male low X pain_score 0.982 0.962
## 5 male low Y pain_score 0.920 0.507
## 6 male low Z pain_score 0.924 0.535
## 7 female high X pain_score 0.714 0.00869
## 8 female high Y pain_score 0.939 0.654
## 9 female high Z pain_score 0.971 0.901
## 10 female low X pain_score 0.933 0.600
## 11 female low Y pain_score 0.927 0.555
## 12 female low Z pain_score 0.958 0.801
ggqqplot(headache, "pain_score") + facet_grid(gender~risk~treatment)
headache %>%
levene_test(pain_score~risk*treatment*gender)
## # A tibble: 1 x 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 11 60 0.179 0.998
plot(lm(pain_score~gender*risk*treatment, data=headache))
results<-aov(pain_score~gender*risk*treatment, data=headache)
anova(results)
## Analysis of Variance Table
##
## Response: pain_score
## Df Sum Sq Mean Sq F value Pr(>F)
## gender 1 313 313 16.20 0.00016 ***
## risk 1 1794 1794 92.70 0.000000000000088 ***
## treatment 2 283 142 7.32 0.00143 **
## gender:risk 1 3 3 0.14 0.70849
## gender:treatment 2 129 65 3.34 0.04220 *
## risk:treatment 2 28 14 0.71 0.49422
## gender:risk:treatment 2 287 143 7.41 0.00133 **
## Residuals 60 1161 19
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
report(results)
## The ANOVA (formula: pain_score ~ gender * risk * treatment) suggests that:
##
## - The main effect of gender is statistically significant and large (F(1, 60) = 16.20, p < .001; Eta2 (partial) = 0.21, 95% CI [0.08, 1.00])
## - The main effect of risk is statistically significant and large (F(1, 60) = 92.70, p < .001; Eta2 (partial) = 0.61, 95% CI [0.48, 1.00])
## - The main effect of treatment is statistically significant and large (F(2, 60) = 7.32, p = 0.001; Eta2 (partial) = 0.20, 95% CI [0.06, 1.00])
## - The interaction between gender and risk is statistically not significant and very small (F(1, 60) = 0.14, p = 0.708; Eta2 (partial) = 2.35e-03, 95% CI [0.00, 1.00])
## - The interaction between gender and treatment is statistically significant and medium (F(2, 60) = 3.34, p = 0.042; Eta2 (partial) = 0.10, 95% CI [2.01e-03, 1.00])
## - The interaction between risk and treatment is statistically not significant and small (F(2, 60) = 0.71, p = 0.494; Eta2 (partial) = 0.02, 95% CI [0.00, 1.00])
## - The interaction between gender, risk and treatment is statistically significant and large (F(2, 60) = 7.41, p = 0.001; Eta2 (partial) = 0.20, 95% CI [0.06, 1.00])
##
## Effect sizes were labelled following Field's (2013) recommendations.
If there is a significant 3-way interaction effect, you can decompose it into: - Simple two-way interaction: run two-way interaction at each level of third variable, - Simple simple main effect: run one-way model at each level of second variable, - Simple simple pairwise comparisons: run pairwise or other post-hoc comparisons if necessary. If you do not have a statistically significant three-way interaction, you need to determine whether you have any statistically significant two-way interaction from the ANOVA output. You can follow up a significant two-way interaction by simple main effects analyses and pairwise comparisons between groups if necessary. #### Two-way interactions
model <- lm(pain_score ~ gender*risk*treatment, data = headache)
headache %>%
group_by(gender) %>%
anova_test(pain_score ~ risk*treatment, error = model)
## Coefficient covariances computed by hccm()
## Coefficient covariances computed by hccm()
## # A tibble: 6 x 8
## gender Effect DFn DFd F p `p<.05` ges
## * <fct> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 male risk 1 60 50.0 0.00000000187 "*" 0.455
## 2 male treatment 2 60 10.2 0.000157 "*" 0.253
## 3 male risk:treatment 2 60 5.25 0.008 "*" 0.149
## 4 female risk 1 60 42.8 0.000000015 "*" 0.416
## 5 female treatment 2 60 0.482 0.62 "" 0.016
## 6 female risk:treatment 2 60 2.87 0.065 "" 0.087
treatment.effect <- headache %>%
group_by(gender, risk) %>%
anova_test(pain_score ~ treatment, error = model)
## Coefficient covariances computed by hccm()
## Coefficient covariances computed by hccm()
## Coefficient covariances computed by hccm()
## Coefficient covariances computed by hccm()
treatment.effect
## # A tibble: 4 x 9
## gender risk Effect DFn DFd F p `p<.05` ges
## * <fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 male high treatment 2 60 14.8 0.0000061 "*" 0.33
## 2 male low treatment 2 60 0.66 0.521 "" 0.022
## 3 female high treatment 2 60 0.52 0.597 "" 0.017
## 4 female low treatment 2 60 2.83 0.067 "" 0.086
test <- headache %>% tukey_hsd(pain_score~gender*risk*treatment)
test
## # A tibble: 107 x 9
## term group1 group2 null.value estimate conf.low conf.high p.adj
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 gender male female 0 -4.17 -6.25 -2.10 1.63e- 4
## 2 risk high low 0 -9.98 -12.1 -7.91 2 e-11
## 3 treatment X Y 0 -4.20 -7.25 -1.15 4.49e- 3
## 4 treatment X Z 0 -4.22 -7.27 -1.16 4.32e- 3
## 5 treatment Y Z 0 -0.0166 -3.07 3.04 1 e+ 0
## 6 gender:risk male:hi~ female:~ 0 -4.56 -8.44 -0.687 1.47e- 2
## 7 gender:risk male:hi~ male:low 0 -10.4 -14.2 -6.50 1.12e- 8
## 8 gender:risk male:hi~ female:~ 0 -14.2 -18.0 -10.3 2.04e-11
## 9 gender:risk female:~ male:low 0 -5.81 -9.68 -1.94 1.12e- 3
## 10 gender:risk female:~ female:~ 0 -9.59 -13.5 -5.72 8.93e- 8
## # ... with 97 more rows, and 1 more variable: p.adj.signif <chr>
headache %>%
pairwise_t_test(pain_score~gender)
## # A tibble: 1 x 9
## .y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <chr> <dbl> <chr>
## 1 pain_score male female 36 36 0.0172 * 0.0172 *
headache %>%
pairwise_t_test(pain_score~risk)
## # A tibble: 1 x 9
## .y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <chr> <dbl> <chr>
## 1 pain_score high low 36 36 1.24e-10 **** 1.24e-10 ****
headache %>%
pairwise_t_test(pain_score~treatment)
## # A tibble: 3 x 9
## .y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <chr> <dbl> <chr>
## 1 pain_score X Y 24 24 0.0514 ns 0.152 ns
## 2 pain_score X Z 24 24 0.0505 ns 0.152 ns
## 3 pain_score Y Z 24 24 0.994 ns 0.994 ns