ANOVA is a statistical technique, commonly used to compare the mean between two or more groups. ANOVA test is centered on the different sources of variation in a typical variable. This statistical method is an extension of the statistical test for estimating how a quantitative dependent variable changes according to the levels of one or more categorical independent variables. ANOVA tests whether there is a difference in means of the groups at each level of the independent variable. Notably, these methods have seen a resurgence as a part of “A/B Testing”.
In this exercise you have to consider data set named PlantGrowth from R environment. It contains the weight of plants obtained under a control and two different treatment conditions. Please consider the following steps:
We’ll use the PlantGrowth
dataset from the datasets
package. After that, we can check our different kind treat in our group’s variable, with levels
function.
## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
## 7 5.17 ctrl
## 8 4.53 ctrl
## 9 5.33 ctrl
## 10 5.14 ctrl
## 11 4.81 trt1
## 12 4.17 trt1
## 13 4.41 trt1
## 14 3.59 trt1
## 15 5.87 trt1
## 16 3.83 trt1
## 17 6.03 trt1
## 18 4.89 trt1
## 19 4.32 trt1
## 20 4.69 trt1
## 21 6.31 trt2
## 22 5.12 trt2
## 23 5.54 trt2
## 24 5.50 trt2
## 25 5.37 trt2
## 26 5.29 trt2
## 27 4.92 trt2
## 28 6.15 trt2
## 29 5.80 trt2
## 30 5.26 trt2
## [1] "ctrl" "trt1" "trt2"
Before build the model, we must consider the statistics summary of our data (PlantGrowth
), and don’t forget to add the library that we need.
library(tidyr)
library(rstatix)
library(ggplot2)
# get the summary statistics
PlantGrowth %>%
group_by(group) %>%
get_summary_stats(weight, type = "mean_sd")
## # A tibble: 3 x 5
## group variable n mean sd
## <fct> <chr> <dbl> <dbl> <dbl>
## 1 ctrl weight 10 5.03 0.583
## 2 trt1 weight 10 4.66 0.794
## 3 trt2 weight 10 5.53 0.443
We can apply box plot for our visualization data, then we can see if any of our data are outliers or not from the box plot.
# plot (weight ~ group, data = PlantGrowth, col = 2:5)
# boxplot of `weight` by `group`
ggplot(PlantGrowth, aes(x=group, y=weight, fill=group))+
geom_boxplot()+
geom_jitter(shape=15,
color="darkgreen",
position = position_jitter(0.21)) +
theme_bw()
From the box plot we know that our data have outliers, and we can assume that each group have a difference of means. We must check are our outliers is extreme or not. From the table below there were no extreme outliers.
## # A tibble: 2 x 4
## group weight is.outlier is.extreme
## <fct> <dbl> <lgl> <lgl>
## 1 trt1 5.87 TRUE FALSE
## 2 trt1 6.03 TRUE FALSE
The aov()
function is used to obtain the relevant sums of squares. Using the summary()
function on the output from aov()
creates the desired ANOVA table.
From the table below we get p-value < a, then we reject H0, we accept H1 that claims at least one group mean is not equal to the others.
res.aov = anova_test(weight~group, data = PlantGrowth)
weight_aov = aov (weight ~ group, data = PlantGrowth)
summary(weight_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 3.766 1.8832 4.846 0.0159 *
## Residuals 27 10.492 0.3886
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here below, we’ve created a dataframe with a row for each diet. By predicting on this dataframe, we obtain the sample means of each group (ctrl,trt1, trt2).
groups = data.frame(group=unique(PlantGrowth$group))
data.frame(groups, weight=predict(weight_aov, groups))
## group weight
## 1 ctrl 5.032
## 2 trt1 4.661
## 3 trt2 5.526
The ANOVA test assumes that, the data are normally distributed and the variance across groups are homogeneous. We can check that with some diagnostic plots
In the plot below, we can see that there’s no clear relationships between residuals and fitted values (the mean of each groups), which is good. So, we can assume the homogeneity of variances. And we can see that the outliers are o17, o15, and 4o. Not much.
Check normality assumption by groups. Computing Shapiro-Wilk test for each group level. If the data is normally distributed, the p-value should be greater than 0.05.
## # A tibble: 3 x 4
## group variable statistic p
## <fct> <chr> <dbl> <dbl>
## 1 ctrl weight 0.957 0.747
## 2 trt1 weight 0.930 0.452
## 3 trt2 weight 0.941 0.564
The score that we get were normally distributed (p > 0.05) for each group, as assessed by Saphiro-Wilk’s test of normality.
Then, we can also possible to use the Levene’s test to check the homogeneity of variances. From the results below, we can see that the p-value > 0.05, which is not significant. This means that, there’s not significant difference between variances across groups. Therefore, we can assume that homogeneity of variances in the different treatment groups.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 1.1192 0.3412
## 27
##
## One-way analysis of means (not assuming equal variances)
##
## data: weight and group
## F = 5.181, num df = 2.000, denom df = 17.128, p-value = 0.01739
From the output above we can see that the p-value is less than the significance level of 0.05. This means that there is evidence to suggest that the variance across groups is statistically significantly different. Therefore, we can assume the homogeneity of variances in the different treatment groups.
## # A tibble: 3 x 9
## term group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 group ctrl trt1 0 -0.371 -1.06 0.320 0.391 ns
## 2 group ctrl trt2 0 0.494 -0.197 1.19 0.198 ns
## 3 group trt1 trt2 0 0.865 0.174 1.56 0.012 *
From the output above, we can see that only the difference between trt2 and trt1 is significant (adjusted p-value = 0.012).
Tukey’s Honest Significance difference can be applied directly to an object which was created using aov()
. It will adjust the p-values of the pairwise comparisons of the means to control the FWER, in this case, for 0.05. Notice it also gives confidence intervals for the difference of the means.
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = weight ~ group, data = PlantGrowth)
##
## $group
## diff lwr upr p adj
## trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
## trt2-ctrl 0.494 -0.1972161 1.1852161 0.1979960
## trt2-trt1 0.865 0.1737839 1.5562161 0.0120064
Based the results, we can see that there’s difference between trt2 and trt1 is significant (adjusted p-value = 0.012). Also, nicely, we can easily produce a plot of these confidence intervals.
Please apply Two-way ANOVA procedures to the rats data from the faraway package. There are two factors here: poison and treat. You can use the levels() function to extract the levels of a factor variable. Give your opinions accordingly!
We’ll use the rats
dataset from the faraway
package.
## time poison treat
## 1 0.31 I A
## 2 0.82 I B
## 3 0.43 I C
## 4 0.45 I D
## 5 0.45 I A
## 6 1.10 I B
## 7 0.45 I C
## 8 0.71 I D
## 9 0.46 I A
## 10 0.88 I B
## 11 0.63 I C
## 12 0.66 I D
## 13 0.43 I A
## 14 0.72 I B
## 15 0.76 I C
## 16 0.62 I D
## 17 0.36 II A
## 18 0.92 II B
## 19 0.44 II C
## 20 0.56 II D
## 21 0.29 II A
## 22 0.61 II B
## 23 0.35 II C
## 24 1.02 II D
## 25 0.40 II A
## 26 0.49 II B
## 27 0.31 II C
## 28 0.71 II D
## 29 0.23 II A
## 30 1.24 II B
## 31 0.40 II C
## 32 0.38 II D
## 33 0.22 III A
## 34 0.30 III B
## 35 0.23 III C
## 36 0.30 III D
## 37 0.21 III A
## 38 0.37 III B
## 39 0.25 III C
## 40 0.36 III D
## 41 0.18 III A
## 42 0.38 III B
## 43 0.24 III C
## 44 0.31 III D
## 45 0.23 III A
## 46 0.29 III B
## 47 0.22 III C
## 48 0.33 III D
Before build the model, we must consider the kind of each variable in our data and the statistics summary of our data (rats
), and don’t forget to add the library that we need.
library(rstatix) # use for basic Anova/statistical tests
set.seed(123)
data("rats", package = "faraway")
rats %>%
sample_n_by(poison, treat, size = 1)
## # A tibble: 12 x 3
## time poison treat
## <dbl> <fct> <fct>
## 1 0.46 I A
## 2 0.88 I B
## 3 0.63 I C
## 4 0.71 I D
## 5 0.4 II A
## 6 0.61 II B
## 7 0.35 II C
## 8 1.02 II D
## 9 0.18 III A
## 10 0.3 III B
## 11 0.22 III C
## 12 0.36 III D
# first get the summary statistics
rats %>%
group_by(poison, treat) %>%
get_summary_stats(time, type = "mean_sd")
## # A tibble: 12 x 6
## poison treat variable n mean sd
## <fct> <fct> <chr> <dbl> <dbl> <dbl>
## 1 I A time 4 0.412 0.069
## 2 I B time 4 0.88 0.161
## 3 I C time 4 0.568 0.157
## 4 I D time 4 0.61 0.113
## 5 II A time 4 0.32 0.075
## 6 II B time 4 0.815 0.336
## 7 II C time 4 0.375 0.057
## 8 II D time 4 0.667 0.271
## 9 III A time 4 0.21 0.022
## 10 III B time 4 0.335 0.047
## 11 III C time 4 0.235 0.013
## 12 III D time 4 0.325 0.026
We can apply box plot for our visualization data, then we can see if any of our data are outliers or not from the box plot.
# visialization
ggplot(rats, aes(x = treat, y = time, fill = poison)) +
geom_boxplot() +
geom_jitter(shape = 15,
color = "steelblue",
position = position_jitter(0.21)) +
theme_minimal()
## # A tibble: 1 x 5
## poison treat time is.outlier is.extreme
## <fct> <fct> <dbl> <lgl> <lgl>
## 1 I A 0.31 TRUE FALSE
Let’s fit each of the possible models, then investigate their estimates for each of the group means.
rats_int = aov(time ~ treat * poison, data = rats) # interaction model
rats_add = aov(time ~ treat + poison, data = rats) # additive model
rats_treat= aov(time ~ treat, data = rats) # single factor model
rats_poison = aov(time ~ poison, data = rats) # single factor model
rats_null = aov(time ~ 1, data = rats) # null model
summary(rats_int)
## Df Sum Sq Mean Sq F value Pr(>F)
## treat 3 0.9212 0.3071 13.806 3.78e-06 ***
## poison 2 1.0330 0.5165 23.222 3.33e-07 ***
## treat:poison 6 0.2501 0.0417 1.874 0.112
## Residuals 36 0.8007 0.0222
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## treat 3 0.9212 0.3071 12.27 6.7e-06 ***
## poison 2 1.0330 0.5165 20.64 5.7e-07 ***
## Residuals 42 1.0509 0.0250
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## treat 3 0.9212 0.30707 6.484 0.000992 ***
## Residuals 44 2.0839 0.04736
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## poison 2 1.033 0.5165 11.79 7.66e-05 ***
## Residuals 45 1.972 0.0438
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 47 3.005 0.06394
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 treat 3 36 13.806 3.78e-06 * 0.535
## 2 poison 2 36 23.222 3.33e-07 * 0.563
## 3 treat:poison 6 36 1.874 1.12e-01 0.238
There was a statistically significant interaction between treat and poison for time, F(6, 36) = 1.874, p = 1.12e-01
The ANOVA test assumes that, the data are normally distributed and the variance across groups are homogeneous. We can check that with some diagnostic plots.
## # A tibble: 12 x 5
## poison treat variable statistic p
## <fct> <fct> <chr> <dbl> <dbl>
## 1 I A time 0.782 0.0741
## 2 II A time 0.971 0.848
## 3 III A time 0.927 0.577
## 4 I B time 0.947 0.700
## 5 II B time 0.948 0.701
## 6 III B time 0.831 0.171
## 7 I C time 0.895 0.405
## 8 II C time 0.983 0.921
## 9 III C time 0.993 0.972
## 10 I D time 0.899 0.427
## 11 II D time 0.981 0.907
## 12 III D time 0.946 0.689
Points 24,26,30 are detected as outliers, which can severely affect normality and homogeneity of variance. It can be useful to remove outliers to meet the test assumptions. We can do Levene’s test, which is less sensitive to departures from normal distribution. The score were normally distributed (p > 0.05) for each cell, as assessed by Shapiro-Wilk’s test of normality.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 11 4.1323 0.0005833 ***
## 36
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## # A tibble: 1 x 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 11 36 4.13 0.000583
##
## One-way analysis of means (not assuming equal variances)
##
## data: time and treat * poison
## F = 14.848, num df = 11.000, denom df = 13.801, p-value = 8.107e-06
From the output above we can see that the p-value is less than the significance level of 0.05. This means that there is evidence to suggest that the variance across groups is statistically significantly different. Therefore, we can assume the homogeneity of variances in the different treatment groups.
library(emmeans)
pwc_2 <- rats %>%
group_by(treat) %>%
emmeans_test(time ~ poison, p.adjust.method = "bonferroni")
pwc_2
## # A tibble: 12 x 10
## treat term .y. group1 group2 df statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 A poison time I II 36 0.877 3.86e-1 1.00e+0 ns
## 2 A poison time I III 36 1.92 6.28e-2 1.88e-1 ns
## 3 A poison time II III 36 1.04 3.04e-1 9.12e-1 ns
## 4 B poison time I II 36 0.616 5.42e-1 1.00e+0 ns
## 5 B poison time I III 36 5.17 8.98e-6 2.70e-5 ****
## 6 B poison time II III 36 4.55 5.86e-5 1.76e-4 ***
## 7 C poison time I II 36 1.83 7.62e-2 2.29e-1 ns
## 8 C poison time I III 36 3.15 3.25e-3 9.76e-3 **
## 9 C poison time II III 36 1.33 1.93e-1 5.78e-1 ns
## 10 D poison time I II 36 -0.545 5.89e-1 1.00e+0 ns
## 11 D poison time I III 36 2.70 1.04e-2 3.13e-2 *
## 12 D poison time II III 36 3.25 2.52e-3 7.56e-3 **
All pairwise comparisons were analyzed between the different poison groups organized by treat. There was a significant difference of time between all groups for treat (p < 0.05) except treat A,I,II and B,I,II and D,I,II.
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = time ~ treat * poison, data = rats)
##
## $treat
## diff lwr upr p adj
## B-A 0.36250000 0.19852116 0.52647884 0.0000047
## C-A 0.07833333 -0.08564550 0.24231217 0.5772283
## D-A 0.22000000 0.05602116 0.38397884 0.0048556
## C-B -0.28416667 -0.44814550 -0.12018783 0.0002333
## D-B -0.14250000 -0.30647884 0.02147884 0.1077087
## D-C 0.14166667 -0.02231217 0.30564550 0.1107678
##
## $poison
## diff lwr upr p adj
## II-I -0.073125 -0.2020091 0.05575913 0.3583151
## III-I -0.341250 -0.4701341 -0.21236587 0.0000005
## III-II -0.268125 -0.3970091 -0.13924087 0.0000339
##
## $`treat:poison`
## diff lwr upr p adj
## B:I-A:I 0.4675 0.09942113 0.835578867 0.0041387
## C:I-A:I 0.1550 -0.21307887 0.523078867 0.9393871
## D:I-A:I 0.1975 -0.17057887 0.565578867 0.7673377
## A:II-A:I -0.0925 -0.46057887 0.275578867 0.9989739
## B:II-A:I 0.4025 0.03442113 0.770578867 0.0220694
## C:II-A:I -0.0375 -0.40557887 0.330578867 0.9999999
## D:II-A:I 0.2550 -0.11307887 0.623078867 0.4203275
## A:III-A:I -0.2025 -0.57057887 0.165578867 0.7395476
## B:III-A:I -0.0775 -0.44557887 0.290578867 0.9998038
## C:III-A:I -0.1775 -0.54557887 0.190578867 0.8641326
## D:III-A:I -0.0875 -0.45557887 0.280578867 0.9993833
## C:I-B:I -0.3125 -0.68057887 0.055578867 0.1618705
## D:I-B:I -0.2700 -0.63807887 0.098078867 0.3379251
## A:II-B:I -0.5600 -0.92807887 -0.191921133 0.0003212
## B:II-B:I -0.0650 -0.43307887 0.303078867 0.9999650
## C:II-B:I -0.5050 -0.87307887 -0.136921133 0.0014940
## D:II-B:I -0.2125 -0.58057887 0.155578867 0.6808722
## A:III-B:I -0.6700 -1.03807887 -0.301921133 0.0000139
## B:III-B:I -0.5450 -0.91307887 -0.176921133 0.0004903
## C:III-B:I -0.6450 -1.01307887 -0.276921133 0.0000285
## D:III-B:I -0.5550 -0.92307887 -0.186921133 0.0003699
## D:I-C:I 0.0425 -0.32557887 0.410578867 0.9999996
## A:II-C:I -0.2475 -0.61557887 0.120578867 0.4645761
## B:II-C:I 0.2475 -0.12057887 0.615578867 0.4645761
## C:II-C:I -0.1925 -0.56057887 0.175578867 0.7938454
## D:II-C:I 0.1000 -0.26807887 0.468078867 0.9979417
## A:III-C:I -0.3575 -0.72557887 0.010578867 0.0634939
## B:III-C:I -0.2325 -0.60057887 0.135578867 0.5569107
## C:III-C:I -0.3325 -0.70057887 0.035578867 0.1086625
## D:III-C:I -0.2425 -0.61057887 0.125578867 0.4949176
## A:II-D:I -0.2900 -0.65807887 0.078078867 0.2439183
## B:II-D:I 0.2050 -0.16307887 0.573078867 0.7252287
## C:II-D:I -0.2350 -0.60307887 0.133078867 0.5413031
## D:II-D:I 0.0575 -0.31057887 0.425578867 0.9999899
## A:III-D:I -0.4000 -0.76807887 -0.031921133 0.0234641
## B:III-D:I -0.2750 -0.64307887 0.093078867 0.3126107
## C:III-D:I -0.3750 -0.74307887 -0.006921133 0.0426208
## D:III-D:I -0.2850 -0.65307887 0.083078867 0.2655759
## B:II-A:II 0.4950 0.12692113 0.863078867 0.0019661
## C:II-A:II 0.0550 -0.31307887 0.423078867 0.9999936
## D:II-A:II 0.3475 -0.02057887 0.715578867 0.0790981
## A:III-A:II -0.1100 -0.47807887 0.258078867 0.9953340
## B:III-A:II 0.0150 -0.35307887 0.383078867 1.0000000
## C:III-A:II -0.0850 -0.45307887 0.283078867 0.9995291
## D:III-A:II 0.0050 -0.36307887 0.373078867 1.0000000
## C:II-B:II -0.4400 -0.80807887 -0.071921133 0.0085466
## D:II-B:II -0.1475 -0.51557887 0.220578867 0.9563176
## A:III-B:II -0.6050 -0.97307887 -0.236921133 0.0000893
## B:III-B:II -0.4800 -0.84807887 -0.111921133 0.0029569
## C:III-B:II -0.5800 -0.94807887 -0.211921133 0.0001822
## D:III-B:II -0.4900 -0.85807887 -0.121921133 0.0022537
## D:II-C:II 0.2925 -0.07557887 0.660578867 0.2335610
## A:III-C:II -0.1650 -0.53307887 0.203078867 0.9105785
## B:III-C:II -0.0400 -0.40807887 0.328078867 0.9999998
## C:III-C:II -0.1400 -0.50807887 0.228078867 0.9695827
## D:III-C:II -0.0500 -0.41807887 0.318078867 0.9999976
## A:III-D:II -0.4575 -0.82557887 -0.089421133 0.0054007
## B:III-D:II -0.3325 -0.70057887 0.035578867 0.1086625
## C:III-D:II -0.4325 -0.80057887 -0.064421133 0.0103742
## D:III-D:II -0.3425 -0.71057887 0.025578867 0.0880763
## B:III-A:III 0.1250 -0.24307887 0.493078867 0.9868993
## C:III-A:III 0.0250 -0.34307887 0.393078867 1.0000000
## D:III-A:III 0.1150 -0.25307887 0.483078867 0.9932581
## C:III-B:III -0.1000 -0.46807887 0.268078867 0.9979417
## D:III-B:III -0.0100 -0.37807887 0.358078867 1.0000000
## D:III-C:III 0.0900 -0.27807887 0.458078867 0.9992007
In this exercise you will use the headache dataset [in datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine. You have to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.
We’ll use the headache
dataset from the datarium
package.
## # A tibble: 72 x 5
## id gender risk treatment pain_score
## <int> <fct> <fct> <fct> <dbl>
## 1 1 male low X 79.3
## 2 2 male low X 76.8
## 3 3 male low X 70.8
## 4 4 male low X 81.2
## 5 5 male low X 75.1
## 6 6 male low X 73.1
## 7 7 male low Y 68.2
## 8 8 male low Y 80.7
## 9 9 male low Y 75.3
## 10 10 male low Y 73.4
## # ... with 62 more rows
Before build the model, we must consider the kind of each variable in our data and the statistics summary of our data (headache
), and don’t forget to add the library that we need.
library(rstatix)
set.seed(123)
data("headache", package = "datarium")
headache %>%
sample_n_by(gender, risk, treatment, size = 1)
## # A tibble: 12 x 5
## id gender risk treatment pain_score
## <int> <fct> <fct> <fct> <dbl>
## 1 21 male high X 92.7
## 2 30 male high Y 80.1
## 3 33 male high Z 81.3
## 4 2 male low X 76.8
## 5 8 male low Y 80.7
## 6 18 male low Z 74.7
## 7 57 female high X 68.4
## 8 65 female high Y 82.1
## 9 70 female high Z 80.4
## 10 42 female low X 78.1
## 11 48 female low Y 64.1
## 12 49 female low Z 69.0
library(tidyr)
library(rstatix)
library(ggplot2)
# first get the summary statistics
headache %>%
group_by(gender, risk, treatment) %>%
get_summary_stats(pain_score, type = "mean_sd")
## # A tibble: 12 x 7
## gender risk treatment variable n mean sd
## <fct> <fct> <fct> <chr> <dbl> <dbl> <dbl>
## 1 male high X pain_score 6 92.7 5.12
## 2 male high Y pain_score 6 82.3 5.00
## 3 male high Z pain_score 6 79.7 4.05
## 4 male low X pain_score 6 76.1 3.86
## 5 male low Y pain_score 6 73.1 4.76
## 6 male low Z pain_score 6 74.5 4.89
## 7 female high X pain_score 6 78.9 5.32
## 8 female high Y pain_score 6 81.2 4.62
## 9 female high Z pain_score 6 81.0 3.98
## 10 female low X pain_score 6 74.2 3.69
## 11 female low Y pain_score 6 68.4 4.08
## 12 female low Z pain_score 6 69.8 2.72
We can apply box plot for our visualization data, then we can see if any of our data are outliers or not from the box plot. We create a box plot of pain_score
by treatment
, color lines by gender
groups and gacet the plot by risk
.
# visialization
library(ggpubr)
head_plot <- ggboxplot(
headache, x = "treatment", y = "pain_score",
color = "gender", palette = "jco", facet.by = "risk"
)
head_plot
## # A tibble: 4 x 7
## gender risk treatment id pain_score is.outlier is.extreme
## <fct> <fct> <fct> <int> <dbl> <lgl> <lgl>
## 1 female high X 57 68.4 TRUE TRUE
## 2 female high Y 62 73.1 TRUE FALSE
## 3 female high Z 67 75.0 TRUE FALSE
## 4 female high Z 71 87.1 TRUE FALSE
In the QQ plot below, as all the points fall approximately along the reference line, we can assume normality. This conclusion is supported by the Shapiro-Wilk test. The p-value is not significant (p = 0.39), so we can assume normality.
# normality assumption
model <- lm(pain_score ~ gender*risk*treatment, data = headache)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
## # A tibble: 1 x 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.982 0.398
The pain scores were normally distributed (p > 0.05) except for one group (female at high risk of migraine taking medicine X, p = 0.0086), as assessed by Shapiro-Wilk’s test of normality.
# normality assumption by groups
headache %>%
group_by(gender, risk, treatment) %>%
shapiro_test(pain_score)
## # A tibble: 12 x 6
## gender risk treatment variable statistic p
## <fct> <fct> <fct> <chr> <dbl> <dbl>
## 1 male high X pain_score 0.958 0.808
## 2 male high Y pain_score 0.902 0.384
## 3 male high Z pain_score 0.955 0.784
## 4 male low X pain_score 0.982 0.962
## 5 male low Y pain_score 0.920 0.507
## 6 male low Z pain_score 0.924 0.535
## 7 female high X pain_score 0.714 0.00869
## 8 female high Y pain_score 0.939 0.654
## 9 female high Z pain_score 0.971 0.901
## 10 female low X pain_score 0.933 0.600
## 11 female low Y pain_score 0.927 0.555
## 12 female low Z pain_score 0.958 0.801
From the QQ plot below, we know that all the points fall approximately along the reference line, except for one group (female at high risk of migraine taking medicine X), where we already identified an extreme outlier.
# create qq plot for each
ggqqplot(headache, "pain_score", ggtheme = theme_bw()) +
facet_grid(gender + risk ~ treatment, labeller = "label_both")
The output below, show the Levene’s test is not significant (p > 0.05). Therefore, we can assume the homogeneity of variances in the different groups.
## # A tibble: 1 x 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 11 60 0.179 0.998
Then we compute the anova table, and there is a statistically significant three-way interaction between gender, risk and treatment, F(2, 60) = 7.41, p = 0.001.
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 gender 1 60 16.196 1.63e-04 * 0.213
## 2 risk 1 60 92.699 8.80e-14 * 0.607
## 3 treatment 2 60 7.318 1.00e-03 * 0.196
## 4 gender:risk 1 60 0.141 7.08e-01 0.002
## 5 gender:treatment 2 60 3.338 4.20e-02 * 0.100
## 6 risk:treatment 2 60 0.713 4.94e-01 0.023
## 7 gender:risk:treatment 2 60 7.406 1.00e-03 * 0.198
A statistically significant simple simple main effect can be followed up by multiple pairwise comparisons to determine which group means are different. This can be easily done using the function emmeans_test()
described in the previous section.
Compare the different treatments by gender and risk variables:
library(emmeans)
pwc_3 <- headache %>%
group_by(gender, risk) %>%
emmeans_test(pain_score ~ treatment, p.adjust.method = "bonferroni") %>%
select(-df,-statistic,-p) # remove details
# show comparison results for male at high risk
pwc_3 %>%
filter(gender == "male", risk == "high")
## # A tibble: 3 x 8
## gender risk term .y. group1 group2 p.adj p.adj.signif
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 male high treatment pain_score X Y 0.000386 ***
## 2 male high treatment pain_score X Z 0.00000942 ****
## 3 male high treatment pain_score Y Z 0.897 ns
# Estimated marginal means (i.e. adjusted means)
# with 95% confidence interval
get_emmeans(pwc_3) %>% filter(gender == "male", risk == "high")
## # A tibble: 3 x 9
## gender risk treatment emmean se df conf.low conf.high method
## <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 male high X 92.7 1.80 60 89.1 96.3 Emmeans test
## 2 male high Y 82.3 1.80 60 78.7 85.9 Emmeans test
## 3 male high Z 79.7 1.80 60 76.1 83.3 Emmeans test
In the the pairwise comparisons table above, we are interested only in the simple simple comparisons for males at a high risk of a migraine headache. In our example, there are three possible combinations of group differences.
For male at high risk, there was a statistically significant mean difference between treatment X and treatment Y of 10.4 (p.adj < 0.001), and between treatment X and treatment Z of 13.1 (p.adj < 0.0001).
However, the difference between treatment Y and treatment Z (2.66) was not statistically significant, p.adj = 0.897.