noweRmd.knit

title: “BS Anova 2” author: “Filip Ziętara” date: Published on Tuesday 04 January 2022 output: rmdformats::readthedown: highlight: kate toc_float: true toc_depth: 4 editor_options: chunk_output_type: console —

2-way Anova

Two-way ANOVA test hypotheses - There is no difference in the means of factor A - There is no difference in means of factor B - There is no interaction between factors A and B The alternative hypothesis for cases 1 and 2 is: the means are not equal. The alternative hypothesis for case 3 is: there is an interaction between A and B. ## Data In this report we will the jobsatisfaction dataset [datarium package], which contains the job satisfaction score organized by gender and education levels. In this study, a research wants to evaluate if there is a significant two-way interaction between gender and education_level on explaining the job satisfaction score. An interaction effect occurs when the effect of one independent variable on an outcome variable depends on the level of the other independent variables. If an interaction effect does not exist, main effects could be reported. ## Descriptive statistics Before running a model, you should always plot the data, to check that your assumptions look okay. Here are a couple plots you might generate while analyzing these data:

ggplot(jobsatisfaction, aes(x=score)) +
geom_histogram(bins=10) +
facet_grid(gender ~ education_level) +
theme_classic()

Boxplot, to highlight the group means:

ggplot(jobsatisfaction, aes(y=score, x=education_level, fill = gender)) +
geom_boxplot() +
theme_classic()

The distributions within each cell look pretty wonky, but that’s not particularly surprising given the small sample size (n=58):

xtabs(~ gender + education_level, data = jobsatisfaction)

##         education_level
## gender   school college university
##   male        9       9         10
##   female     10      10         10

Compute the mean and the SD (standard deviation) of the score by groups:

jobsatisfaction %>% group_by(gender, education_level) %>% get_summary_stats(score, type="mean_sd")

## # A tibble: 6 x 6
##   gender education_level variable     n  mean    sd
##   <fct>  <fct>           <chr>    <dbl> <dbl> <dbl>
## 1 male   school          score        9  5.43 0.364
## 2 male   college         score        9  6.22 0.34 
## 3 male   university      score       10  9.29 0.445
## 4 female school          score       10  5.74 0.474
## 5 female college         score       10  6.46 0.475
## 6 female university      score       10  8.41 0.938

Assumptions

Outliers

Identify outliers in each cell design:

jobsatisfaction %>%
group_by(gender, education_level) %>%
identify_outliers(score)

## [1] gender          education_level id              score          
## [5] is.outlier      is.extreme     
## <0 rows> (or 0-length row.names)

Normality

Check normality assumption by analyzing the model residuals. QQ plot and Shapiro-Wilk test of normality are used.

jobsatisfaction %>%
group_by(gender,education_level) %>%
shapiro_test(score)

## # A tibble: 6 x 5
##   gender education_level variable statistic     p
##   <fct>  <fct>           <chr>        <dbl> <dbl>
## 1 male   school          score        0.980 0.966
## 2 male   college         score        0.958 0.779
## 3 male   university      score        0.916 0.323
## 4 female school          score        0.963 0.819
## 5 female college         score        0.963 0.819
## 6 female university      score        0.950 0.674

ggqqplot(jobsatisfaction, "score") + facet_grid(gender~education_level)

Homogeneity of variance

This can be checked using the Levene’s test:

jobsatisfaction %>%
levene_test(score~education_level*gender)

## # A tibble: 1 x 4
##     df1   df2 statistic      p
##   <int> <int>     <dbl>  <dbl>
## 1     5    52      2.20 0.0686

plot(lm(score~gender*education_level, data=jobsatisfaction))

## Anova In this example, the effect of education_level is our focal variable, that is our primary concern. It is thought that the effect of education_level will depend on one other factor, gender, which are called a moderator variable.

results<-aov(score~gender*education_level, data=jobsatisfaction)
anova(results)

## Analysis of Variance Table
## 
## Response: score
##                        Df Sum Sq Mean Sq F value              Pr(>F)    
## gender                  1    0.5     0.5    1.79              0.1871    
## education_level         2  113.7    56.8  187.89 <0.0000000000000002 ***
## gender:education_level  2    4.4     2.2    7.34              0.0016 ** 
## Residuals              52   15.7     0.3                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

report(results)

## The ANOVA (formula: score ~ gender * education_level) suggests that:
## 
##   - The main effect of gender is statistically not significant and small (F(1, 52) = 1.79, p = 0.187; Eta2 (partial) = 0.03, 95% CI [0.00, 1.00])
##   - The main effect of education_level is statistically significant and large (F(2, 52) = 187.89, p < .001; Eta2 (partial) = 0.88, 95% CI [0.83, 1.00])
##   - The interaction between gender and education_level is statistically significant and large (F(2, 52) = 7.34, p = 0.002; Eta2 (partial) = 0.22, 95% CI [0.06, 1.00])
## 
## Effect sizes were labelled following Field's (2013) recommendations.

Post-hoc tests

A significant two-way interaction indicates that the impact that one factor (e.g., education_level) has on the outcome variable (e.g., job satisfaction score) depends on the level of the other factor (e.g., gender) (and vice versa). So, you can decompose a significant two-way interaction into: - Simple main effect: run one-way model of the first variable at each level of the second variable, - Simple pairwise comparisons: if the simple main effect is significant, run multiple pairwise comparisons to determine which groups are different. For a non-significant two-way interaction, you need to determine whether you have any statistically significant main effects from the ANOVA output. A significant main effect can be followed up by pairwise comparisons between groups.

Main effects

Pairwise comparisons

test <- jobsatisfaction %>% tukey_hsd(score~gender*education_level)
test

## # A tibble: 19 x 9
##    term       group1    group2   null.value estimate conf.low conf.high    p.adj
##  * <chr>      <chr>     <chr>         <dbl>    <dbl>    <dbl>     <dbl>    <dbl>
##  1 gender     male      female            0   -0.193 -0.483      0.0968 1.87e- 1
##  2 education~ school    college           0    0.757  0.327      1.19   2.64e- 4
##  3 education~ school    univers~          0    3.25   2.83       3.68   0       
##  4 education~ college   univers~          0    2.49   2.07       2.92   0       
##  5 gender:ed~ male:sch~ female:~          0    0.314 -0.433      1.06   8.13e- 1
##  6 gender:ed~ male:sch~ male:co~          0    0.797  0.0296     1.56   3.75e- 2
##  7 gender:ed~ male:sch~ female:~          0    1.04   0.289      1.78   1.92e- 3
##  8 gender:ed~ male:sch~ male:un~          0    3.87   3.12       4.61   0       
##  9 gender:ed~ male:sch~ female:~          0    2.98   2.23       3.73   0       
## 10 gender:ed~ female:s~ male:co~          0    0.482 -0.265      1.23   4.09e- 1
## 11 gender:ed~ female:s~ female:~          0    0.722 -0.00575    1.45   5.3 e- 2
## 12 gender:ed~ female:s~ male:un~          0    3.55   2.82       4.28   0       
## 13 gender:ed~ female:s~ female:~          0    2.67   1.94       3.39   0       
## 14 gender:ed~ male:col~ female:~          0    0.240 -0.508      0.987  9.32e- 1
## 15 gender:ed~ male:col~ male:un~          0    3.07   2.32       3.82   0       
## 16 gender:ed~ male:col~ female:~          0    2.18   1.43       2.93   1.86e-10
## 17 gender:ed~ female:c~ male:un~          0    2.83   2.10       3.56   0       
## 18 gender:ed~ female:c~ female:~          0    1.94   1.22       2.67   2.73e- 9
## 19 gender:ed~ male:uni~ female:~          0   -0.886 -1.61      -0.158  8.75e- 3
## # ... with 1 more variable: p.adj.signif <chr>

jobsatisfaction %>%
pairwise_t_test(score~education_level)

## # A tibble: 3 x 9
##   .y.   group1  group2        n1    n2        p p.signif    p.adj p.adj.signif
## * <chr> <chr>   <chr>      <int> <int>    <dbl> <chr>       <dbl> <chr>       
## 1 score school  college       19    19 3.27e- 4 ***      3.27e- 4 ***         
## 2 score school  university    19    20 3.43e-23 ****     1.03e-22 ****        
## 3 score college university    19    20 3.8 e-18 ****     7.6 e-18 ****

3-Way ANOVA

The three-way ANOVA is an extension of the two-way ANOVA for assessing whether there is an interaction effect between three independent categorical variables on a continuous outcome variable. We’ll use the headache dataset [datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine. We want to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.

head(headache)

## # A tibble: 6 x 5
##      id gender risk  treatment pain_score
##   <int> <fct>  <fct> <fct>          <dbl>
## 1     1 male   low   X               79.3
## 2     2 male   low   X               76.8
## 3     3 male   low   X               70.8
## 4     4 male   low   X               81.2
## 5     5 male   low   X               75.1
## 6     6 male   low   X               73.1

Descriptive statistics

summary(headache)

##        id          gender     risk    treatment   pain_score   
##  Min.   : 1.0   male  :36   high:36   X:24      Min.   : 63.7  
##  1st Qu.:18.8   female:36   low :36   Y:24      1st Qu.: 72.9  
##  Median :36.5                         Z:24      Median : 77.9  
##  Mean   :36.5                                   Mean   : 77.6  
##  3rd Qu.:54.2                                   3rd Qu.: 81.5  
##  Max.   :72.0                                   Max.   :100.0

ggplot(headache, aes(x=pain_score)) +
geom_histogram(bins=10) +
facet_grid(gender ~ risk ~ treatment) +
theme_classic()

headache %>% group_by(gender,risk,treatment) %>% get_summary_stats(pain_score, type="mean_sd")

## # A tibble: 12 x 7
##    gender risk  treatment variable       n  mean    sd
##    <fct>  <fct> <fct>     <chr>      <dbl> <dbl> <dbl>
##  1 male   high  X         pain_score     6  92.7  5.12
##  2 male   high  Y         pain_score     6  82.3  5.00
##  3 male   high  Z         pain_score     6  79.7  4.05
##  4 male   low   X         pain_score     6  76.1  3.86
##  5 male   low   Y         pain_score     6  73.1  4.76
##  6 male   low   Z         pain_score     6  74.5  4.89
##  7 female high  X         pain_score     6  78.9  5.32
##  8 female high  Y         pain_score     6  81.2  4.62
##  9 female high  Z         pain_score     6  81.0  3.98
## 10 female low   X         pain_score     6  74.2  3.69
## 11 female low   Y         pain_score     6  68.4  4.08
## 12 female low   Z         pain_score     6  69.8  2.72

Assumptions

Outliers

headache %>%
group_by(gender, risk, treatment) %>%
identify_outliers(pain_score)

## # A tibble: 4 x 7
##   gender risk  treatment    id pain_score is.outlier is.extreme
##   <fct>  <fct> <fct>     <int>      <dbl> <lgl>      <lgl>     
## 1 female high  X            57       68.4 TRUE       TRUE      
## 2 female high  Y            62       73.1 TRUE       FALSE     
## 3 female high  Z            67       75.0 TRUE       FALSE     
## 4 female high  Z            71       87.1 TRUE       FALSE

Normality

headache %>%
group_by(gender,risk,treatment) %>%
shapiro_test(pain_score)

## # A tibble: 12 x 6
##    gender risk  treatment variable   statistic       p
##    <fct>  <fct> <fct>     <chr>          <dbl>   <dbl>
##  1 male   high  X         pain_score     0.958 0.808  
##  2 male   high  Y         pain_score     0.902 0.384  
##  3 male   high  Z         pain_score     0.955 0.784  
##  4 male   low   X         pain_score     0.982 0.962  
##  5 male   low   Y         pain_score     0.920 0.507  
##  6 male   low   Z         pain_score     0.924 0.535  
##  7 female high  X         pain_score     0.714 0.00869
##  8 female high  Y         pain_score     0.939 0.654  
##  9 female high  Z         pain_score     0.971 0.901  
## 10 female low   X         pain_score     0.933 0.600  
## 11 female low   Y         pain_score     0.927 0.555  
## 12 female low   Z         pain_score     0.958 0.801

ggqqplot(headache, "pain_score") + facet_grid(gender~risk~treatment)

Homogeneity of variance

headache %>%
levene_test(pain_score~risk*treatment*gender)

## # A tibble: 1 x 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1    11    60     0.179 0.998

plot(lm(pain_score~gender*risk*treatment, data=headache))

Anova

results<-aov(pain_score~gender*risk*treatment, data=headache)
anova(results)

## Analysis of Variance Table
## 
## Response: pain_score
##                       Df Sum Sq Mean Sq F value            Pr(>F)    
## gender                 1    313     313   16.20           0.00016 ***
## risk                   1   1794    1794   92.70 0.000000000000088 ***
## treatment              2    283     142    7.32           0.00143 ** 
## gender:risk            1      3       3    0.14           0.70849    
## gender:treatment       2    129      65    3.34           0.04220 *  
## risk:treatment         2     28      14    0.71           0.49422    
## gender:risk:treatment  2    287     143    7.41           0.00133 ** 
## Residuals             60   1161      19                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

report(results)

## The ANOVA (formula: pain_score ~ gender * risk * treatment) suggests that:
## 
##   - The main effect of gender is statistically significant and large (F(1, 60) = 16.20, p < .001; Eta2 (partial) = 0.21, 95% CI [0.08, 1.00])
##   - The main effect of risk is statistically significant and large (F(1, 60) = 92.70, p < .001; Eta2 (partial) = 0.61, 95% CI [0.48, 1.00])
##   - The main effect of treatment is statistically significant and large (F(2, 60) = 7.32, p = 0.001; Eta2 (partial) = 0.20, 95% CI [0.06, 1.00])
##   - The interaction between gender and risk is statistically not significant and very small (F(1, 60) = 0.14, p = 0.708; Eta2 (partial) = 2.35e-03, 95% CI [0.00, 1.00])
##   - The interaction between gender and treatment is statistically significant and medium (F(2, 60) = 3.34, p = 0.042; Eta2 (partial) = 0.10, 95% CI [2.01e-03, 1.00])
##   - The interaction between risk and treatment is statistically not significant and small (F(2, 60) = 0.71, p = 0.494; Eta2 (partial) = 0.02, 95% CI [0.00, 1.00])
##   - The interaction between gender, risk and treatment is statistically significant and large (F(2, 60) = 7.41, p = 0.001; Eta2 (partial) = 0.20, 95% CI [0.06, 1.00])
## 
## Effect sizes were labelled following Field's (2013) recommendations.

Post-hoc tests

If there is a significant 3-way interaction effect, you can decompose it into: - Simple two-way interaction: run two-way interaction at each level of third variable, - Simple simple main effect: run one-way model at each level of second variable, - Simple simple pairwise comparisons: run pairwise or other post-hoc comparisons if necessary. If you do not have a statistically significant three-way interaction, you need to determine whether you have any statistically significant two-way interaction from the ANOVA output. You can follow up a significant two-way interaction by simple main effects analyses and pairwise comparisons between groups if necessary. #### Two-way interactions

model  <- lm(pain_score ~ gender*risk*treatment, data = headache)
headache %>%
  group_by(gender) %>%
  anova_test(pain_score ~ risk*treatment, error = model)

## Coefficient covariances computed by hccm()
## Coefficient covariances computed by hccm()

## # A tibble: 6 x 8
##   gender Effect           DFn   DFd      F             p `p<.05`   ges
## * <fct>  <chr>          <dbl> <dbl>  <dbl>         <dbl> <chr>   <dbl>
## 1 male   risk               1    60 50.0   0.00000000187 "*"     0.455
## 2 male   treatment          2    60 10.2   0.000157      "*"     0.253
## 3 male   risk:treatment     2    60  5.25  0.008         "*"     0.149
## 4 female risk               1    60 42.8   0.000000015   "*"     0.416
## 5 female treatment          2    60  0.482 0.62          ""      0.016
## 6 female risk:treatment     2    60  2.87  0.065         ""      0.087

Main effects

treatment.effect <- headache %>%
  group_by(gender, risk) %>%
  anova_test(pain_score ~ treatment, error = model)

## Coefficient covariances computed by hccm()
## Coefficient covariances computed by hccm()
## Coefficient covariances computed by hccm()
## Coefficient covariances computed by hccm()

treatment.effect

## # A tibble: 4 x 9
##   gender risk  Effect      DFn   DFd     F         p `p<.05`   ges
## * <fct>  <fct> <chr>     <dbl> <dbl> <dbl>     <dbl> <chr>   <dbl>
## 1 male   high  treatment     2    60 14.8  0.0000061 "*"     0.33 
## 2 male   low   treatment     2    60  0.66 0.521     ""      0.022
## 3 female high  treatment     2    60  0.52 0.597     ""      0.017
## 4 female low   treatment     2    60  2.83 0.067     ""      0.086

Pairwise comparisons

test <- headache %>% tukey_hsd(pain_score~gender*risk*treatment)
test

## # A tibble: 107 x 9
##    term        group1   group2   null.value estimate conf.low conf.high    p.adj
##  * <chr>       <chr>    <chr>         <dbl>    <dbl>    <dbl>     <dbl>    <dbl>
##  1 gender      male     female            0  -4.17      -6.25    -2.10  1.63e- 4
##  2 risk        high     low               0  -9.98     -12.1     -7.91  2   e-11
##  3 treatment   X        Y                 0  -4.20      -7.25    -1.15  4.49e- 3
##  4 treatment   X        Z                 0  -4.22      -7.27    -1.16  4.32e- 3
##  5 treatment   Y        Z                 0  -0.0166    -3.07     3.04  1   e+ 0
##  6 gender:risk male:hi~ female:~          0  -4.56      -8.44    -0.687 1.47e- 2
##  7 gender:risk male:hi~ male:low          0 -10.4      -14.2     -6.50  1.12e- 8
##  8 gender:risk male:hi~ female:~          0 -14.2      -18.0    -10.3   2.04e-11
##  9 gender:risk female:~ male:low          0  -5.81      -9.68    -1.94  1.12e- 3
## 10 gender:risk female:~ female:~          0  -9.59     -13.5     -5.72  8.93e- 8
## # ... with 97 more rows, and 1 more variable: p.adj.signif <chr>

headache %>%
pairwise_t_test(pain_score~gender)

## # A tibble: 1 x 9
##   .y.        group1 group2    n1    n2      p p.signif  p.adj p.adj.signif
## * <chr>      <chr>  <chr>  <int> <int>  <dbl> <chr>     <dbl> <chr>       
## 1 pain_score male   female    36    36 0.0172 *        0.0172 *

headache %>%
pairwise_t_test(pain_score~risk)

## # A tibble: 1 x 9
##   .y.        group1 group2    n1    n2        p p.signif    p.adj p.adj.signif
## * <chr>      <chr>  <chr>  <int> <int>    <dbl> <chr>       <dbl> <chr>       
## 1 pain_score high   low       36    36 1.24e-10 ****     1.24e-10 ****

headache %>%
pairwise_t_test(pain_score~treatment)

## # A tibble: 3 x 9
##   .y.        group1 group2    n1    n2      p p.signif p.adj p.adj.signif
## * <chr>      <chr>  <chr>  <int> <int>  <dbl> <chr>    <dbl> <chr>       
## 1 pain_score X      Y         24    24 0.0514 ns       0.152 ns          
## 2 pain_score X      Z         24    24 0.0505 ns       0.152 ns          
## 3 pain_score Y      Z         24    24 0.994  ns       0.994 ns