ANOVA is a statistical technique, commonly used to compare the mean between two or more groups. ANOVA test is centered on the different sources of variation in a typical variable. This statistical method is an extension of the statistical test for estimating how a quantitative dependent variable changes according to the levels of one or more categorical independent variables. ANOVA tests whether there is a difference in means of the groups at each level of the independent variable. Notably, these methods have seen a resurgence as a part of “A/B Testing”.

0.1 Exercise 1 One-way ANOVA

In this exercise you have to consider data set named PlantGrowth from R environment. It contains the weight of plants obtained under a control and two different treatment conditions. Please consider the following steps:

  • Data Preparation (One-way ANOVA)

We’ll use the PlantGrowth dataset from the datasets package. After that, we can check our different kind treat in our group’s variable, with levels function.

##    weight group
## 1    4.17  ctrl
## 2    5.58  ctrl
## 3    5.18  ctrl
## 4    6.11  ctrl
## 5    4.50  ctrl
## 6    4.61  ctrl
## 7    5.17  ctrl
## 8    4.53  ctrl
## 9    5.33  ctrl
## 10   5.14  ctrl
## 11   4.81  trt1
## 12   4.17  trt1
## 13   4.41  trt1
## 14   3.59  trt1
## 15   5.87  trt1
## 16   3.83  trt1
## 17   6.03  trt1
## 18   4.89  trt1
## 19   4.32  trt1
## 20   4.69  trt1
## 21   6.31  trt2
## 22   5.12  trt2
## 23   5.54  trt2
## 24   5.50  trt2
## 25   5.37  trt2
## 26   5.29  trt2
## 27   4.92  trt2
## 28   6.15  trt2
## 29   5.80  trt2
## 30   5.26  trt2
## [1] "ctrl" "trt1" "trt2"

Before build the model, we must consider the statistics summary of our data (PlantGrowth), and don’t forget to add the library that we need.

## # A tibble: 3 x 5
##   group variable     n  mean    sd
##   <fct> <chr>    <dbl> <dbl> <dbl>
## 1 ctrl  weight      10  5.03 0.583
## 2 trt1  weight      10  4.66 0.794
## 3 trt2  weight      10  5.53 0.443

We can apply box plot for our visualization data, then we can see if any of our data are outliers or not from the box plot.

From the box plot we know that our data have outliers, and we can assume that each group have a difference of means. We must check are our outliers is extreme or not. From the table below there were no extreme outliers.

## # A tibble: 2 x 4
##   group weight is.outlier is.extreme
##   <fct>  <dbl> <lgl>      <lgl>     
## 1 trt1    5.87 TRUE       FALSE     
## 2 trt1    6.03 TRUE       FALSE
  • One-way ANOVA Test

The aov() function is used to obtain the relevant sums of squares. Using the summary() function on the output from aov() creates the desired ANOVA table.

From the table below we get p-value < a, then we reject H0, we accept H1 that claims at least one group mean is not equal to the others.

##             Df Sum Sq Mean Sq F value Pr(>F)  
## group        2  3.766  1.8832   4.846 0.0159 *
## Residuals   27 10.492  0.3886                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here below, we’ve created a dataframe with a row for each diet. By predicting on this dataframe, we obtain the sample means of each group (ctrl,trt1, trt2).

##   group weight
## 1  ctrl  5.032
## 2  trt1  4.661
## 3  trt2  5.526
  • Check Assumptions (One-way ANOVA)

The ANOVA test assumes that, the data are normally distributed and the variance across groups are homogeneous. We can check that with some diagnostic plots

In the plot below, we can see that there’s no clear relationships between residuals and fitted values (the mean of each groups), which is good. So, we can assume the homogeneity of variances. And we can see that the outliers are o17, o15, and 4o. Not much.

Check normality assumption by groups. Computing Shapiro-Wilk test for each group level. If the data is normally distributed, the p-value should be greater than 0.05.

## # A tibble: 3 x 4
##   group variable statistic     p
##   <fct> <chr>        <dbl> <dbl>
## 1 ctrl  weight       0.957 0.747
## 2 trt1  weight       0.930 0.452
## 3 trt2  weight       0.941 0.564

The score that we get were normally distributed (p > 0.05) for each group, as assessed by Saphiro-Wilk’s test of normality.

Then, we can also possible to use the Levene’s test to check the homogeneity of variances. From the results below, we can see that the p-value > 0.05, which is not significant. This means that, there’s not significant difference between variances across groups. Therefore, we can assume that homogeneity of variances in the different treatment groups.

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  1.1192 0.3412
##       27
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  weight and group
## F = 5.181, num df = 2.000, denom df = 17.128, p-value = 0.01739

From the output above we can see that the p-value is less than the significance level of 0.05. This means that there is evidence to suggest that the variance across groups is statistically significantly different. Therefore, we can assume the homogeneity of variances in the different treatment groups.

  • Pairwise-comparison (One-way ANOVA)
## # A tibble: 3 x 9
##   term  group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr> <chr>  <chr>       <dbl>    <dbl>    <dbl>     <dbl> <dbl> <chr>       
## 1 group ctrl   trt1            0   -0.371   -1.06      0.320 0.391 ns          
## 2 group ctrl   trt2            0    0.494   -0.197     1.19  0.198 ns          
## 3 group trt1   trt2            0    0.865    0.174     1.56  0.012 *

From the output above, we can see that only the difference between trt2 and trt1 is significant (adjusted p-value = 0.012).

  • Tukey HSD (One-way ANOVA)

Tukey’s Honest Significance difference can be applied directly to an object which was created using aov(). It will adjust the p-values of the pairwise comparisons of the means to control the FWER, in this case, for 0.05. Notice it also gives confidence intervals for the difference of the means.

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = weight ~ group, data = PlantGrowth)
## 
## $group
##             diff        lwr       upr     p adj
## trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
## trt2-ctrl  0.494 -0.1972161 1.1852161 0.1979960
## trt2-trt1  0.865  0.1737839 1.5562161 0.0120064

Based the results, we can see that there’s difference between trt2 and trt1 is significant (adjusted p-value = 0.012). Also, nicely, we can easily produce a plot of these confidence intervals.

0.2 Exercise 2 Two-way ANOVA

Please apply Two-way ANOVA procedures to the rats data from the faraway package. There are two factors here: poison and treat. You can use the levels() function to extract the levels of a factor variable. Give your opinions accordingly!

  • Data Preparation (Two-way ANOVA)

We’ll use the rats dataset from the faraway package.

##    time poison treat
## 1  0.31      I     A
## 2  0.82      I     B
## 3  0.43      I     C
## 4  0.45      I     D
## 5  0.45      I     A
## 6  1.10      I     B
## 7  0.45      I     C
## 8  0.71      I     D
## 9  0.46      I     A
## 10 0.88      I     B
## 11 0.63      I     C
## 12 0.66      I     D
## 13 0.43      I     A
## 14 0.72      I     B
## 15 0.76      I     C
## 16 0.62      I     D
## 17 0.36     II     A
## 18 0.92     II     B
## 19 0.44     II     C
## 20 0.56     II     D
## 21 0.29     II     A
## 22 0.61     II     B
## 23 0.35     II     C
## 24 1.02     II     D
## 25 0.40     II     A
## 26 0.49     II     B
## 27 0.31     II     C
## 28 0.71     II     D
## 29 0.23     II     A
## 30 1.24     II     B
## 31 0.40     II     C
## 32 0.38     II     D
## 33 0.22    III     A
## 34 0.30    III     B
## 35 0.23    III     C
## 36 0.30    III     D
## 37 0.21    III     A
## 38 0.37    III     B
## 39 0.25    III     C
## 40 0.36    III     D
## 41 0.18    III     A
## 42 0.38    III     B
## 43 0.24    III     C
## 44 0.31    III     D
## 45 0.23    III     A
## 46 0.29    III     B
## 47 0.22    III     C
## 48 0.33    III     D

Before build the model, we must consider the kind of each variable in our data and the statistics summary of our data (rats), and don’t forget to add the library that we need.

## # A tibble: 12 x 3
##     time poison treat
##    <dbl> <fct>  <fct>
##  1  0.46 I      A    
##  2  0.88 I      B    
##  3  0.63 I      C    
##  4  0.71 I      D    
##  5  0.4  II     A    
##  6  0.61 II     B    
##  7  0.35 II     C    
##  8  1.02 II     D    
##  9  0.18 III    A    
## 10  0.3  III    B    
## 11  0.22 III    C    
## 12  0.36 III    D
## # A tibble: 12 x 6
##    poison treat variable     n  mean    sd
##    <fct>  <fct> <chr>    <dbl> <dbl> <dbl>
##  1 I      A     time         4 0.412 0.069
##  2 I      B     time         4 0.88  0.161
##  3 I      C     time         4 0.568 0.157
##  4 I      D     time         4 0.61  0.113
##  5 II     A     time         4 0.32  0.075
##  6 II     B     time         4 0.815 0.336
##  7 II     C     time         4 0.375 0.057
##  8 II     D     time         4 0.667 0.271
##  9 III    A     time         4 0.21  0.022
## 10 III    B     time         4 0.335 0.047
## 11 III    C     time         4 0.235 0.013
## 12 III    D     time         4 0.325 0.026

We can apply box plot for our visualization data, then we can see if any of our data are outliers or not from the box plot.

## # A tibble: 1 x 5
##   poison treat  time is.outlier is.extreme
##   <fct>  <fct> <dbl> <lgl>      <lgl>     
## 1 I      A      0.31 TRUE       FALSE
  • Two-way ANOVA Test

Let’s fit each of the possible models, then investigate their estimates for each of the group means.

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## treat         3 0.9212  0.3071  13.806 3.78e-06 ***
## poison        2 1.0330  0.5165  23.222 3.33e-07 ***
## treat:poison  6 0.2501  0.0417   1.874    0.112    
## Residuals    36 0.8007  0.0222                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## treat        3 0.9212  0.3071   12.27 6.7e-06 ***
## poison       2 1.0330  0.5165   20.64 5.7e-07 ***
## Residuals   42 1.0509  0.0250                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## treat        3 0.9212 0.30707   6.484 0.000992 ***
## Residuals   44 2.0839 0.04736                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## poison       2  1.033  0.5165   11.79 7.66e-05 ***
## Residuals   45  1.972  0.0438                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##             Df Sum Sq Mean Sq F value Pr(>F)
## Residuals   47  3.005 0.06394
## ANOVA Table (type II tests)
## 
##         Effect DFn DFd      F        p p<.05   ges
## 1        treat   3  36 13.806 3.78e-06     * 0.535
## 2       poison   2  36 23.222 3.33e-07     * 0.563
## 3 treat:poison   6  36  1.874 1.12e-01       0.238

There was a statistically significant interaction between treat and poison for time, F(6, 36) = 1.874, p = 1.12e-01

  • Check Assumptions (Two-way ANOVA)

The ANOVA test assumes that, the data are normally distributed and the variance across groups are homogeneous. We can check that with some diagnostic plots.

## # A tibble: 12 x 5
##    poison treat variable statistic      p
##    <fct>  <fct> <chr>        <dbl>  <dbl>
##  1 I      A     time         0.782 0.0741
##  2 II     A     time         0.971 0.848 
##  3 III    A     time         0.927 0.577 
##  4 I      B     time         0.947 0.700 
##  5 II     B     time         0.948 0.701 
##  6 III    B     time         0.831 0.171 
##  7 I      C     time         0.895 0.405 
##  8 II     C     time         0.983 0.921 
##  9 III    C     time         0.993 0.972 
## 10 I      D     time         0.899 0.427 
## 11 II     D     time         0.981 0.907 
## 12 III    D     time         0.946 0.689

Points 24,26,30 are detected as outliers, which can severely affect normality and homogeneity of variance. It can be useful to remove outliers to meet the test assumptions. We can do Levene’s test, which is less sensitive to departures from normal distribution. The score were normally distributed (p > 0.05) for each cell, as assessed by Shapiro-Wilk’s test of normality.

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value    Pr(>F)    
## group 11  4.1323 0.0005833 ***
##       36                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## # A tibble: 1 x 4
##     df1   df2 statistic        p
##   <int> <int>     <dbl>    <dbl>
## 1    11    36      4.13 0.000583
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  time and treat * poison
## F = 14.848, num df = 11.000, denom df = 13.801, p-value = 8.107e-06

From the output above we can see that the p-value is less than the significance level of 0.05. This means that there is evidence to suggest that the variance across groups is statistically significantly different. Therefore, we can assume the homogeneity of variances in the different treatment groups.

  • Pairwise-comparison (Two-way ANOVA)
## # A tibble: 12 x 10
##    treat term   .y.   group1 group2    df statistic       p   p.adj p.adj.signif
##  * <chr> <chr>  <chr> <chr>  <chr>  <dbl>     <dbl>   <dbl>   <dbl> <chr>       
##  1 A     poison time  I      II        36     0.877 3.86e-1 1.00e+0 ns          
##  2 A     poison time  I      III       36     1.92  6.28e-2 1.88e-1 ns          
##  3 A     poison time  II     III       36     1.04  3.04e-1 9.12e-1 ns          
##  4 B     poison time  I      II        36     0.616 5.42e-1 1.00e+0 ns          
##  5 B     poison time  I      III       36     5.17  8.98e-6 2.70e-5 ****        
##  6 B     poison time  II     III       36     4.55  5.86e-5 1.76e-4 ***         
##  7 C     poison time  I      II        36     1.83  7.62e-2 2.29e-1 ns          
##  8 C     poison time  I      III       36     3.15  3.25e-3 9.76e-3 **          
##  9 C     poison time  II     III       36     1.33  1.93e-1 5.78e-1 ns          
## 10 D     poison time  I      II        36    -0.545 5.89e-1 1.00e+0 ns          
## 11 D     poison time  I      III       36     2.70  1.04e-2 3.13e-2 *           
## 12 D     poison time  II     III       36     3.25  2.52e-3 7.56e-3 **

All pairwise comparisons were analyzed between the different poison groups organized by treat. There was a significant difference of time between all groups for treat (p < 0.05) except treat A,I,II and B,I,II and D,I,II.

  • Tukey HSD (Two-way ANOVA)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = time ~ treat * poison, data = rats)
## 
## $treat
##            diff         lwr         upr     p adj
## B-A  0.36250000  0.19852116  0.52647884 0.0000047
## C-A  0.07833333 -0.08564550  0.24231217 0.5772283
## D-A  0.22000000  0.05602116  0.38397884 0.0048556
## C-B -0.28416667 -0.44814550 -0.12018783 0.0002333
## D-B -0.14250000 -0.30647884  0.02147884 0.1077087
## D-C  0.14166667 -0.02231217  0.30564550 0.1107678
## 
## $poison
##             diff        lwr         upr     p adj
## II-I   -0.073125 -0.2020091  0.05575913 0.3583151
## III-I  -0.341250 -0.4701341 -0.21236587 0.0000005
## III-II -0.268125 -0.3970091 -0.13924087 0.0000339
## 
## $`treat:poison`
##                diff         lwr          upr     p adj
## B:I-A:I      0.4675  0.09942113  0.835578867 0.0041387
## C:I-A:I      0.1550 -0.21307887  0.523078867 0.9393871
## D:I-A:I      0.1975 -0.17057887  0.565578867 0.7673377
## A:II-A:I    -0.0925 -0.46057887  0.275578867 0.9989739
## B:II-A:I     0.4025  0.03442113  0.770578867 0.0220694
## C:II-A:I    -0.0375 -0.40557887  0.330578867 0.9999999
## D:II-A:I     0.2550 -0.11307887  0.623078867 0.4203275
## A:III-A:I   -0.2025 -0.57057887  0.165578867 0.7395476
## B:III-A:I   -0.0775 -0.44557887  0.290578867 0.9998038
## C:III-A:I   -0.1775 -0.54557887  0.190578867 0.8641326
## D:III-A:I   -0.0875 -0.45557887  0.280578867 0.9993833
## C:I-B:I     -0.3125 -0.68057887  0.055578867 0.1618705
## D:I-B:I     -0.2700 -0.63807887  0.098078867 0.3379251
## A:II-B:I    -0.5600 -0.92807887 -0.191921133 0.0003212
## B:II-B:I    -0.0650 -0.43307887  0.303078867 0.9999650
## C:II-B:I    -0.5050 -0.87307887 -0.136921133 0.0014940
## D:II-B:I    -0.2125 -0.58057887  0.155578867 0.6808722
## A:III-B:I   -0.6700 -1.03807887 -0.301921133 0.0000139
## B:III-B:I   -0.5450 -0.91307887 -0.176921133 0.0004903
## C:III-B:I   -0.6450 -1.01307887 -0.276921133 0.0000285
## D:III-B:I   -0.5550 -0.92307887 -0.186921133 0.0003699
## D:I-C:I      0.0425 -0.32557887  0.410578867 0.9999996
## A:II-C:I    -0.2475 -0.61557887  0.120578867 0.4645761
## B:II-C:I     0.2475 -0.12057887  0.615578867 0.4645761
## C:II-C:I    -0.1925 -0.56057887  0.175578867 0.7938454
## D:II-C:I     0.1000 -0.26807887  0.468078867 0.9979417
## A:III-C:I   -0.3575 -0.72557887  0.010578867 0.0634939
## B:III-C:I   -0.2325 -0.60057887  0.135578867 0.5569107
## C:III-C:I   -0.3325 -0.70057887  0.035578867 0.1086625
## D:III-C:I   -0.2425 -0.61057887  0.125578867 0.4949176
## A:II-D:I    -0.2900 -0.65807887  0.078078867 0.2439183
## B:II-D:I     0.2050 -0.16307887  0.573078867 0.7252287
## C:II-D:I    -0.2350 -0.60307887  0.133078867 0.5413031
## D:II-D:I     0.0575 -0.31057887  0.425578867 0.9999899
## A:III-D:I   -0.4000 -0.76807887 -0.031921133 0.0234641
## B:III-D:I   -0.2750 -0.64307887  0.093078867 0.3126107
## C:III-D:I   -0.3750 -0.74307887 -0.006921133 0.0426208
## D:III-D:I   -0.2850 -0.65307887  0.083078867 0.2655759
## B:II-A:II    0.4950  0.12692113  0.863078867 0.0019661
## C:II-A:II    0.0550 -0.31307887  0.423078867 0.9999936
## D:II-A:II    0.3475 -0.02057887  0.715578867 0.0790981
## A:III-A:II  -0.1100 -0.47807887  0.258078867 0.9953340
## B:III-A:II   0.0150 -0.35307887  0.383078867 1.0000000
## C:III-A:II  -0.0850 -0.45307887  0.283078867 0.9995291
## D:III-A:II   0.0050 -0.36307887  0.373078867 1.0000000
## C:II-B:II   -0.4400 -0.80807887 -0.071921133 0.0085466
## D:II-B:II   -0.1475 -0.51557887  0.220578867 0.9563176
## A:III-B:II  -0.6050 -0.97307887 -0.236921133 0.0000893
## B:III-B:II  -0.4800 -0.84807887 -0.111921133 0.0029569
## C:III-B:II  -0.5800 -0.94807887 -0.211921133 0.0001822
## D:III-B:II  -0.4900 -0.85807887 -0.121921133 0.0022537
## D:II-C:II    0.2925 -0.07557887  0.660578867 0.2335610
## A:III-C:II  -0.1650 -0.53307887  0.203078867 0.9105785
## B:III-C:II  -0.0400 -0.40807887  0.328078867 0.9999998
## C:III-C:II  -0.1400 -0.50807887  0.228078867 0.9695827
## D:III-C:II  -0.0500 -0.41807887  0.318078867 0.9999976
## A:III-D:II  -0.4575 -0.82557887 -0.089421133 0.0054007
## B:III-D:II  -0.3325 -0.70057887  0.035578867 0.1086625
## C:III-D:II  -0.4325 -0.80057887 -0.064421133 0.0103742
## D:III-D:II  -0.3425 -0.71057887  0.025578867 0.0880763
## B:III-A:III  0.1250 -0.24307887  0.493078867 0.9868993
## C:III-A:III  0.0250 -0.34307887  0.393078867 1.0000000
## D:III-A:III  0.1150 -0.25307887  0.483078867 0.9932581
## C:III-B:III -0.1000 -0.46807887  0.268078867 0.9979417
## D:III-B:III -0.0100 -0.37807887  0.358078867 1.0000000
## D:III-C:III  0.0900 -0.27807887  0.458078867 0.9992007

0.3 Exercise 3 Three-way ANOVA

In this exercise you will use the headache dataset [in datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine. You have to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.

  • Data Preparation (Three-way ANOVA)

We’ll use the headache dataset from the datarium package.

## # A tibble: 72 x 5
##       id gender risk  treatment pain_score
##    <int> <fct>  <fct> <fct>          <dbl>
##  1     1 male   low   X               79.3
##  2     2 male   low   X               76.8
##  3     3 male   low   X               70.8
##  4     4 male   low   X               81.2
##  5     5 male   low   X               75.1
##  6     6 male   low   X               73.1
##  7     7 male   low   Y               68.2
##  8     8 male   low   Y               80.7
##  9     9 male   low   Y               75.3
## 10    10 male   low   Y               73.4
## # ... with 62 more rows

Before build the model, we must consider the kind of each variable in our data and the statistics summary of our data (headache), and don’t forget to add the library that we need.

## # A tibble: 12 x 5
##       id gender risk  treatment pain_score
##    <int> <fct>  <fct> <fct>          <dbl>
##  1    21 male   high  X               92.7
##  2    30 male   high  Y               80.1
##  3    33 male   high  Z               81.3
##  4     2 male   low   X               76.8
##  5     8 male   low   Y               80.7
##  6    18 male   low   Z               74.7
##  7    57 female high  X               68.4
##  8    65 female high  Y               82.1
##  9    70 female high  Z               80.4
## 10    42 female low   X               78.1
## 11    48 female low   Y               64.1
## 12    49 female low   Z               69.0
## # A tibble: 12 x 7
##    gender risk  treatment variable       n  mean    sd
##    <fct>  <fct> <fct>     <chr>      <dbl> <dbl> <dbl>
##  1 male   high  X         pain_score     6  92.7  5.12
##  2 male   high  Y         pain_score     6  82.3  5.00
##  3 male   high  Z         pain_score     6  79.7  4.05
##  4 male   low   X         pain_score     6  76.1  3.86
##  5 male   low   Y         pain_score     6  73.1  4.76
##  6 male   low   Z         pain_score     6  74.5  4.89
##  7 female high  X         pain_score     6  78.9  5.32
##  8 female high  Y         pain_score     6  81.2  4.62
##  9 female high  Z         pain_score     6  81.0  3.98
## 10 female low   X         pain_score     6  74.2  3.69
## 11 female low   Y         pain_score     6  68.4  4.08
## 12 female low   Z         pain_score     6  69.8  2.72

We can apply box plot for our visualization data, then we can see if any of our data are outliers or not from the box plot. We create a box plot of pain_score by treatment, color lines by gender groups and gacet the plot by risk.

  • Check Assumptions (Three-way ANOVA)
## # A tibble: 4 x 7
##   gender risk  treatment    id pain_score is.outlier is.extreme
##   <fct>  <fct> <fct>     <int>      <dbl> <lgl>      <lgl>     
## 1 female high  X            57       68.4 TRUE       TRUE      
## 2 female high  Y            62       73.1 TRUE       FALSE     
## 3 female high  Z            67       75.0 TRUE       FALSE     
## 4 female high  Z            71       87.1 TRUE       FALSE

In the QQ plot below, as all the points fall approximately along the reference line, we can assume normality. This conclusion is supported by the Shapiro-Wilk test. The p-value is not significant (p = 0.39), so we can assume normality.

## # A tibble: 1 x 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 residuals(model)     0.982   0.398

The pain scores were normally distributed (p > 0.05) except for one group (female at high risk of migraine taking medicine X, p = 0.0086), as assessed by Shapiro-Wilk’s test of normality.

## # A tibble: 12 x 6
##    gender risk  treatment variable   statistic       p
##    <fct>  <fct> <fct>     <chr>          <dbl>   <dbl>
##  1 male   high  X         pain_score     0.958 0.808  
##  2 male   high  Y         pain_score     0.902 0.384  
##  3 male   high  Z         pain_score     0.955 0.784  
##  4 male   low   X         pain_score     0.982 0.962  
##  5 male   low   Y         pain_score     0.920 0.507  
##  6 male   low   Z         pain_score     0.924 0.535  
##  7 female high  X         pain_score     0.714 0.00869
##  8 female high  Y         pain_score     0.939 0.654  
##  9 female high  Z         pain_score     0.971 0.901  
## 10 female low   X         pain_score     0.933 0.600  
## 11 female low   Y         pain_score     0.927 0.555  
## 12 female low   Z         pain_score     0.958 0.801

From the QQ plot below, we know that all the points fall approximately along the reference line, except for one group (female at high risk of migraine taking medicine X), where we already identified an extreme outlier.

The output below, show the Levene’s test is not significant (p > 0.05). Therefore, we can assume the homogeneity of variances in the different groups.

## # A tibble: 1 x 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1    11    60     0.179 0.998

Then we compute the anova table, and there is a statistically significant three-way interaction between gender, risk and treatment, F(2, 60) = 7.41, p = 0.001.

## ANOVA Table (type II tests)
## 
##                  Effect DFn DFd      F        p p<.05   ges
## 1                gender   1  60 16.196 1.63e-04     * 0.213
## 2                  risk   1  60 92.699 8.80e-14     * 0.607
## 3             treatment   2  60  7.318 1.00e-03     * 0.196
## 4           gender:risk   1  60  0.141 7.08e-01       0.002
## 5      gender:treatment   2  60  3.338 4.20e-02     * 0.100
## 6        risk:treatment   2  60  0.713 4.94e-01       0.023
## 7 gender:risk:treatment   2  60  7.406 1.00e-03     * 0.198
  • Pairwise-comparison (Three-way ANOVA)

A statistically significant simple simple main effect can be followed up by multiple pairwise comparisons to determine which group means are different. This can be easily done using the function emmeans_test() described in the previous section.

Compare the different treatments by gender and risk variables:

## # A tibble: 3 x 8
##   gender risk  term      .y.        group1 group2      p.adj p.adj.signif
##   <chr>  <chr> <chr>     <chr>      <chr>  <chr>       <dbl> <chr>       
## 1 male   high  treatment pain_score X      Y      0.000386   ***         
## 2 male   high  treatment pain_score X      Z      0.00000942 ****        
## 3 male   high  treatment pain_score Y      Z      0.897      ns
## # A tibble: 3 x 9
##   gender risk  treatment emmean    se    df conf.low conf.high method      
##   <fct>  <fct> <fct>      <dbl> <dbl> <dbl>    <dbl>     <dbl> <chr>       
## 1 male   high  X           92.7  1.80    60     89.1      96.3 Emmeans test
## 2 male   high  Y           82.3  1.80    60     78.7      85.9 Emmeans test
## 3 male   high  Z           79.7  1.80    60     76.1      83.3 Emmeans test

In the the pairwise comparisons table above, we are interested only in the simple simple comparisons for males at a high risk of a migraine headache. In our example, there are three possible combinations of group differences.

For male at high risk, there was a statistically significant mean difference between treatment X and treatment Y of 10.4 (p.adj < 0.001), and between treatment X and treatment Z of 13.1 (p.adj < 0.0001).

However, the difference between treatment Y and treatment Z (2.66) was not statistically significant, p.adj = 0.897.