In this exercise you have to consider the data set named PlantGrowth from the R environment. This built-in dataset contains the weights of plants obtained under a control condition and two different treatment conditions. Please consider the following steps.
First, load the required packages:
library(tidyverse) # for data manipulation and visualization
library(tidyr) # to create tidy data
library(rstatix) # for basic ANOVA/statistical tests
library(ggplot2) # for visualizing data
library(ggpubr) # for easily creating publication-ready plots
library(emmeans) # for estimated marginal means
library(faraway) # to get the rats data
Take a look at the first rows of the data, e.g. with head(PlantGrowth):
## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
Show the levels of the grouping variable:
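A minimal one-liner for this (PlantGrowth is built in, so no loading is needed):
levels(PlantGrowth$group)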
## [1] "ctrl" "trt1" "trt2"
where ctrl = control, trt1 = treatment 1, and trt2 = treatment 2.
Compute some summary statistics (count, mean and sd) of the variable weight organized by groups:
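One way to produce the table below is with get_summary_stats() from rstatix; a minimal sketch:
PlantGrowth %>%
  group_by(group) %>%
  get_summary_stats(weight, type = "mean_sd")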
## # A tibble: 3 x 5
## group variable n mean sd
## <fct> <chr> <dbl> <dbl> <dbl>
## 1 ctrl weight 10 5.03 0.583
## 2 trt1 weight 10 4.66 0.794
## 3 trt2 weight 10 5.53 0.443
Create a box plot of weight by group:
ggplot(PlantGrowth, aes(x = group, y = weight, fill = group)) +
geom_boxplot() +
geom_jitter(shape = 15,
color = "steelblue",
position = position_jitter(0.21)) +
theme_minimal()
Outliers can be easily identified using box plot methods, implemented in the R function identify_outliers() [rstatix package].
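A sketch of the outlier check (grouping by treatment so outliers are assessed within each group):
PlantGrowth %>%
  group_by(group) %>%
  identify_outliers(weight)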
## # A tibble: 2 x 4
## group weight is.outlier is.extreme
## <fct> <dbl> <lgl> <lgl>
## 1 trt1 5.87 TRUE FALSE
## 2 trt1 6.03 TRUE FALSE
There were no extreme outliers.
The aov() function is used to obtain the relevant sums of squares. Applying the summary() function to the output of aov() produces the desired ANOVA table (without the unneeded row for the total).
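A sketch of both calls; the first uses anova_test() [rstatix], which gives the type II table with effect size, and the second uses base R's aov() with summary(). The object name plant_aov is our choice:
PlantGrowth %>% anova_test(weight ~ group) # type II ANOVA table with effect size (ges)
plant_aov <- aov(weight ~ group, data = PlantGrowth) # classical one-way ANOVA fit
summary(plant_aov) # ANOVA table without the total row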
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 group 2 27 4.846 0.016 * 0.264
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 3.766 1.8832 4.846 0.0159 *
## Residuals 27 10.492 0.3886
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the table above, the column ges corresponds to the generalized eta squared (effect size). It measures the proportion of the variability in the outcome variable (here, plant weight) that can be explained in terms of the predictor (here, treatment group). An effect size of 0.264 means that 26.4% of the variability in weight can be accounted for by the treatment conditions. From the ANOVA table above, it can be seen that there are significant differences between groups (p = 0.016), which are highlighted with "*": \(F(2, 27) = 4.85\), \(p = 0.016\), \(\eta^2_g = 0.26\).
where:
- F indicates that we are comparing to an F-distribution (F-test); (2, 27) are the degrees of freedom of the numerator (DFn) and the denominator (DFd), respectively, and 4.85 is the obtained F-statistic value;
- p specifies the p-value;
- ges is the generalized effect size (the amount of variability due to the factor).
The ANOVA test assumes that the data are normally distributed and that the variances across groups are homogeneous. We can check these assumptions with some diagnostic plots.
The normality assumption can be checked by using one of the following two approaches:
1. Analyzing the ANOVA model residuals to check normality for all groups together. This approach is easier and very handy when you have many groups or few data points per group.
2. Checking normality for each group separately. This approach might be used when you have only a few groups and many data points per group.
This section will show how to proceed for both option 1 and 2.
Check normality assumption by analyzing the model residuals. QQ plot and Shapiro-Wilk test of normality are used. QQ plot draws the correlation between a given data and the normal distribution.
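A minimal sketch of these checks (the object name model is ours); plot(model, 1) gives the residuals-vs-fitted plot discussed next:
model <- lm(weight ~ group, data = PlantGrowth) # fit the one-way model
plot(model, 1) # residuals vs fitted values (homogeneity check)
ggqqplot(residuals(model)) # QQ plot of the residuals
shapiro_test(residuals(model)) # Shapiro-Wilk test on the residuals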
In the plot above, there is no evident pattern in the relationship between the residuals and the fitted values (the mean of each group), which is good. So we can assume homogeneity of variances.
Check the normality assumption by groups, computing the Shapiro-Wilk test for each group level. If the data are normally distributed, the p-value should be greater than 0.05.
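A sketch of the grouped test:
PlantGrowth %>%
  group_by(group) %>%
  shapiro_test(weight)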
## # A tibble: 3 x 4
## group variable statistic p
## <fct> <chr> <dbl> <dbl>
## 1 ctrl weight 0.957 0.747
## 2 trt1 weight 0.930 0.452
## 3 trt2 weight 0.941 0.564
The scores were normally distributed (p > 0.05) for each group, as assessed by the Shapiro-Wilk test of normality.
Note: if the sample size is greater than 50, the normal QQ plot is preferred, because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to minor deviations from normality.
A QQ plot draws the correlation between given data and the normal distribution. Create QQ plots for each group level:
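A sketch using ggqqplot() [ggpubr] with one facet per group:
ggqqplot(PlantGrowth, "weight", facet.by = "group")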
Points 4, 15 and 17 are detected as outliers, since they lie far from the line; such points can severely affect normality and homogeneity of variance, and it can be useful to remove them to meet the test assumptions. For checking homogeneity of variances, Levene's test is recommended, as it is less sensitive to departures from the normal distribution. Note also that the Shapiro-Wilk test on the ANOVA residuals (W = 0.97, p = 0.8) finds no indication that normality is violated.
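A sketch of Levene's test with rstatix, which produced the table below:
PlantGrowth %>% levene_test(weight ~ group)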
## # A tibble: 1 x 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 2 27 1.12 0.341
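The output that follows is from Welch's one-way test, which does not assume equal variances; a minimal base R sketch:
oneway.test(weight ~ group, data = PlantGrowth)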
##
## One-way analysis of means (not assuming equal variances)
##
## data: weight and group
## F = 5.181, num df = 2.000, denom df = 17.128, p-value = 0.01739
From the output above, we can see that the p-value is > 0.05, which is not significant. This means that there is no significant difference between the variances across groups. Therefore, we can assume homogeneity of variances in the different treatment groups.
A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. Key R function: tukey_hsd() [rstatix].
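A minimal sketch:
PlantGrowth %>% tukey_hsd(weight ~ group)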
## # A tibble: 3 x 9
## term group1 group2 null.value estimate conf.low conf.high p.adj p.adj.signif
## * <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 group ctrl trt1 0 -0.371 -1.06 0.320 0.391 ns
## 2 group ctrl trt2 0 0.494 -0.197 1.19 0.198 ns
## 3 group trt1 trt2 0 0.865 0.174 1.56 0.012 *
The output contains the following columns:
- estimate: the estimated difference between the means of the two groups;
- conf.low, conf.high: the lower and upper end points of the confidence interval at 95% (default);
- p.adj: the p-value after adjustment for multiple comparisons.
It can be seen from the output that only the difference between trt2 and trt1 is significant (adjusted p-value = 0.012).
Tukey's Honest Significant Difference can be applied directly to an object created with aov(). It adjusts the p-values of the pairwise comparisons of the means to control the family-wise error rate (FWER), in this case at 0.05. Notice that it also gives confidence intervals for the differences of the means.
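A sketch, reusing the plant_aov object defined earlier (our name for the aov() fit):
TukeyHSD(plant_aov, conf.level = 0.95)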
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = weight ~ group, data = PlantGrowth)
##
## $group
## diff lwr upr p adj
## trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
## trt2-ctrl 0.494 -0.1972161 1.1852161 0.1979960
## trt2-trt1 0.865 0.1737839 1.5562161 0.0120064
Based on these results, we see a statistically significant difference between the two treatments (trt1 vs. trt2), but no significant difference between the control and either treatment.
We could report the results of one-way ANOVA as follow:
A one-way ANOVA was performed to evaluate whether plant growth was different for the 3 treatment groups: ctrl, trt1 and trt2, each with n = 10.
Data is presented as mean +/- standard deviation. Plant growth was statistically significantly different between different treatment groups, F(2, 27) = 4.85, p = 0.016, generalized eta squared = 0.26.
Plant growth decreased in the trt1 group (4.66 +/- 0.79) compared to the ctrl group (5.03 +/- 0.58). It increased in the trt2 group (5.53 +/- 0.44) compared to the trt1 and ctrl groups.
Tukey post-hoc analyses revealed that the increase from trt1 to trt2 (0.87, 95% CI (0.17 to 1.56)) was statistically significant (p = 0.012), but no other group differences were statistically significant.
Please apply the two-way ANOVA procedure to the rats data from the faraway package. There are two factors here: poison and treat. You can use the levels() function to extract the levels of a factor variable. Interpret the results accordingly!
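A sketch of the first inspection steps, which produced the output below (faraway was already loaded above):
data(rats)
summary(rats) # numeric summary of time; counts per factor level
levels(rats$poison)
levels(rats$treat)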
## time poison treat
## Min. :0.1800 I :16 A:12
## 1st Qu.:0.3000 II :16 B:12
## Median :0.4000 III:16 C:12
## Mean :0.4794 D:12
## 3rd Qu.:0.6225
## Max. :1.2400
## [1] "I" "II" "III"
## [1] "A" "B" "C" "D"
Here, 48 rats were randomly assigned to one of three poisons and one of four possible treatments. The experimenters then measured their survival time in tens of hours. This gives a total of 12 groups, each with 4 replicates.
Before running any tests, we should first look at the data. We will create interaction plots, which will help us visualize the effect of one factor, as we move through the levels of another factor.
par(mfrow = c(1, 2))
with(rats, interaction.plot(poison, treat, time, lwd = 2, col = 1:4))
with(rats, interaction.plot(treat, poison, time, lwd = 2, col = 1:3))
If there is no interaction, i.e. an additive model holds, we would expect to see parallel lines. Changing the level of one factor can still shift the response, but the difference between the levels of the other factor should stay the same.
The most obvious indication of interaction would be lines that cross while heading in different directions. We don't quite see that here, but the lines are not strictly parallel, and there is some overlap in the right panel. However, is this interaction effect significant?
Let’s fit each of the possible models, then investigate their estimates for each of the group means.
rats_int = aov(time ~ poison * treat, data = rats) # interaction model
rats_add = aov(time ~ poison + treat, data = rats) # additive model
rats_pois = aov(time ~ poison , data = rats) # single factor model
rats_treat = aov(time ~ treat, data = rats) # single factor model
rats_null = aov(time ~ 1, data = rats) # null model
To get the estimates, we'll create a table which we will predict on.
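A sketch of the prediction table and the label matrix shown below; rats_table is the name used later by get_est_means():
rats_table = expand.grid(poison = unique(rats$poison), treat = unique(rats$treat))
rats_table # all 12 poison-treatment combinations, poison varying fastest
matrix(paste(rats_table$poison, rats_table$treat, sep = "-"), nrow = 4, ncol = 3, byrow = TRUE)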
## poison treat
## 1 I A
## 2 II A
## 3 III A
## 4 I B
## 5 II B
## 6 III B
## 7 I C
## 8 II C
## 9 III C
## 10 I D
## 11 II D
## 12 III D
## [,1] [,2] [,3]
## [1,] "I-A" "II-A" "III-A"
## [2,] "I-B" "II-B" "III-B"
## [3,] "I-C" "II-C" "III-C"
## [4,] "I-D" "II-D" "III-D"
Since we will be repeating ourselves a number of times, we write a function to perform the prediction. Some housekeeping is done to keep the estimates in order and to provide row and column names. Above, it is shown where each of the estimates will be placed in the resulting matrix.
get_est_means = function(model, table) {
mat = matrix(predict(model, table), nrow = 4, ncol = 3, byrow = TRUE)
colnames(mat) = c("I", "II", "III")
rownames(mat) = c("A", "B", "C", "D")
mat
}
First, we obtain the estimates from the interaction model. Note that each cell has a different value.
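A sketch of the call that produces the interaction-model estimates below:
get_est_means(model = rats_int, table = rats_table)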
| | I | II | III |
|---|---|---|---|
| A | 0.4125 | 0.3200 | 0.210 |
| B | 0.8800 | 0.8150 | 0.335 |
| C | 0.5675 | 0.3750 | 0.235 |
| D | 0.6100 | 0.6675 | 0.325 |
Next, we obtain the estimates from the additive model. Again, each cell has a different value. We also see that these estimates are somewhat close to those from the interaction model.
| | I | II | III |
|---|---|---|---|
| A | 0.4522917 | 0.3791667 | 0.1110417 |
| B | 0.8147917 | 0.7416667 | 0.4735417 |
| C | 0.5306250 | 0.4575000 | 0.1893750 |
| D | 0.6722917 | 0.5991667 | 0.3310417 |
To understand the difference, let’s consider the effect of the treatments.
additive_means = get_est_means(model = rats_add, table = rats_table)
additive_means["A",] - additive_means["B",]## I II III
## -0.3625 -0.3625 -0.3625
interaction_means = get_est_means(model = rats_int, table = rats_table)
interaction_means["A",] - interaction_means["B",]## I II III
## -0.4675 -0.4950 -0.1250
This is the key difference between the interaction and additive models. The difference between the effect of treatments A and B is the same for each poison in the additive model. They are different in the interaction model.
The remaining three models are much simpler, having either only row or only column effects. Or no effects in the case of the null model.
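Sketches of the calls for the poison-only, treatment-only and null models, in the order of the three tables below:
get_est_means(model = rats_pois, table = rats_table)
get_est_means(model = rats_treat, table = rats_table)
get_est_means(model = rats_null, table = rats_table)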
| | I | II | III |
|---|---|---|---|
| A | 0.6175 | 0.544375 | 0.27625 |
| B | 0.6175 | 0.544375 | 0.27625 |
| C | 0.6175 | 0.544375 | 0.27625 |
| D | 0.6175 | 0.544375 | 0.27625 |
| | I | II | III |
|---|---|---|---|
| A | 0.3141667 | 0.3141667 | 0.3141667 |
| B | 0.6766667 | 0.6766667 | 0.6766667 |
| C | 0.3925000 | 0.3925000 | 0.3925000 |
| D | 0.5341667 | 0.5341667 | 0.5341667 |
| | I | II | III |
|---|---|---|---|
| A | 0.479375 | 0.479375 | 0.479375 |
| B | 0.479375 | 0.479375 | 0.479375 |
| C | 0.479375 | 0.479375 | 0.479375 |
| D | 0.479375 | 0.479375 | 0.479375 |
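To decide whether the interaction is needed, we examine the ANOVA table of the full interaction model; a minimal sketch reusing rats_int from above:
summary(rats_int)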
## Df Sum Sq Mean Sq F value Pr(>F)
## poison 2 1.0330 0.5165 23.222 3.33e-07 ***
## treat 3 0.9212 0.3071 13.806 3.78e-06 ***
## poison:treat 6 0.2501 0.0417 1.874 0.112
## Residuals 36 0.8007 0.0222
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using a significance level of \(α=0.05\), we see that the interaction is not significant. Within the additive model, both factors are significant, so we select the additive model, within which we could do further testing about the main effects.
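The Tukey comparisons below can be obtained by applying TukeyHSD() directly to the additive model object; a minimal sketch:
TukeyHSD(rats_add)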
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = time ~ poison + treat, data = rats)
##
## $poison
## diff lwr upr p adj
## II-I -0.073125 -0.2089936 0.0627436 0.3989657
## III-I -0.341250 -0.4771186 -0.2053814 0.0000008
## III-II -0.268125 -0.4039936 -0.1322564 0.0000606
##
## $treat
## diff lwr upr p adj
## B-A 0.36250000 0.18976135 0.53523865 0.0000083
## C-A 0.07833333 -0.09440532 0.25107198 0.6221729
## D-A 0.22000000 0.04726135 0.39273865 0.0076661
## C-B -0.28416667 -0.45690532 -0.11142802 0.0004090
## D-B -0.14250000 -0.31523865 0.03023865 0.1380432
## D-C 0.14166667 -0.03107198 0.31440532 0.1416151
For an example with interaction, we investigate the warpbreaks dataset, a default dataset in R.
par(mfrow = c(1, 2))
with(warpbreaks, interaction.plot(wool, tension, breaks, lwd = 2, col = 2:4))
with(warpbreaks, interaction.plot(tension, wool, breaks, lwd = 2, col = 2:3))
Either plot makes it rather clear that the wool and tension factors interact.
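A sketch of the two-way ANOVA fit that produced the table below:
summary(aov(breaks ~ wool * tension, data = warpbreaks))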
## Df Sum Sq Mean Sq F value Pr(>F)
## wool 1 451 450.7 3.765 0.058213 .
## tension 2 2034 1017.1 8.498 0.000693 ***
## wool:tension 2 1003 501.4 4.189 0.021044 *
## Residuals 48 5745 119.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using an \(α\) of \(0.05\), the ANOVA test finds that the interaction is significant, so we use the interaction model here.
We’ll use the headache dataset [datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine. We want to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.
Load the data and inspect one random row by group combinations:
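A sketch (sample_n_by() is from rstatix; the seed is an arbitrary choice of ours):
data("headache", package = "datarium")
set.seed(123)
headache %>% sample_n_by(gender, risk, treatment, size = 1)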
In this example, the effect of the treatment types is our focal variable, that is, our primary concern. It is thought that the effect of the treatments will depend on two other factors, "gender" and "risk" level of migraine, which are called moderator variables.
Compute the mean and the standard deviation (SD) of pain_score by groups:
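A sketch, analogous to the one-way case but grouping by all three factors:
headache %>%
  group_by(gender, risk, treatment) %>%
  get_summary_stats(pain_score, type = "mean_sd")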
## # A tibble: 12 x 7
## gender risk treatment variable n mean sd
## <fct> <fct> <fct> <chr> <dbl> <dbl> <dbl>
## 1 male high X pain_score 6 92.7 5.12
## 2 male high Y pain_score 6 82.3 5.00
## 3 male high Z pain_score 6 79.7 4.05
## 4 male low X pain_score 6 76.1 3.86
## 5 male low Y pain_score 6 73.1 4.76
## 6 male low Z pain_score 6 74.5 4.89
## 7 female high X pain_score 6 78.9 5.32
## 8 female high Y pain_score 6 81.2 4.62
## 9 female high Z pain_score 6 81.0 3.98
## 10 female low X pain_score 6 74.2 3.69
## 11 female low Y pain_score 6 68.4 4.08
## 12 female low Z pain_score 6 69.8 2.72
Create a box plot of pain_score by treatment, color lines by risk groups and facet the plot by gender:
bxp <- ggboxplot(
headache, x = "treatment", y = "pain_score",
color = "risk", palette = "jco", facet.by = "gender"
)
bxp
Identify outliers by groups:
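A sketch of the grouped outlier check that produced the table below:
headache %>%
  group_by(gender, risk, treatment) %>%
  identify_outliers(pain_score)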
## # A tibble: 4 x 7
## gender risk treatment id pain_score is.outlier is.extreme
## <fct> <fct> <fct> <int> <dbl> <lgl> <lgl>
## 1 female high X 57 68.4 TRUE TRUE
## 2 female high Y 62 73.1 TRUE FALSE
## 3 female high Z 67 75.0 TRUE FALSE
## 4 female high Z 71 87.1 TRUE FALSE
It can be seen that the data contain one extreme outlier (id = 57, a female at high risk of migraine taking drug X).
Check normality assumption by analyzing the model residuals. QQ plot and Shapiro-Wilk test of normality are used.
model <- lm(pain_score ~ gender*risk*treatment, data = headache)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
# Shapiro-Wilk test on the residuals (rstatix), producing the tibble below
shapiro_test(residuals(model))
## # A tibble: 1 x 3
## variable statistic p.value
## <chr> <dbl> <dbl>
## 1 residuals(model) 0.982 0.398
In the QQ plot, as all the points fall approximately along the reference line, we can assume normality. This conclusion is supported by the Shapiro-Wilk test: the p-value is not significant (p = 0.4), so we can assume normality.
Next, check the normality assumption by groups, computing the Shapiro-Wilk test for each combination of factor levels.
## # A tibble: 12 x 6
## gender risk treatment variable statistic p
## <fct> <fct> <fct> <chr> <dbl> <dbl>
## 1 male high X pain_score 0.958 0.808
## 2 male high Y pain_score 0.902 0.384
## 3 male high Z pain_score 0.955 0.784
## 4 male low X pain_score 0.982 0.962
## 5 male low Y pain_score 0.920 0.507
## 6 male low Z pain_score 0.924 0.535
## 7 female high X pain_score 0.714 0.00869
## 8 female high Y pain_score 0.939 0.654
## 9 female high Z pain_score 0.971 0.901
## 10 female low X pain_score 0.933 0.600
## 11 female low Y pain_score 0.927 0.555
## 12 female low Z pain_score 0.958 0.801
The pain scores were normally distributed (p > 0.05) except for one group (female at high risk of migraine taking drug X, p = 0.0086), as assessed by Shapiro-Wilk’s test of normality.
Create QQ plot for each cell of design:
ggqqplot(headache, "pain_score", ggtheme = theme_bw()) +
facet_grid(gender + risk ~ treatment, labeller = "label_both")approximately along the reference line, except for one group (female at high risk of migraine taking drug X), where we already identified an extreme outlier.
The homogeneity of variances can be checked using Levene's test:
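A minimal sketch:
headache %>% levene_test(pain_score ~ gender*risk*treatment)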
## # A tibble: 1 x 4
## df1 df2 statistic p
## <int> <int> <dbl> <dbl>
## 1 11 60 0.179 0.998
Levene's test is not significant (p > 0.05). Therefore, we can assume the homogeneity of variances in the different groups.
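A sketch of the three-way ANOVA call that produced the table below; we store the result as res.aov because the visualization code at the end of this section reuses that name:
res.aov <- headache %>% anova_test(pain_score ~ gender*risk*treatment)
res.aov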
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 gender 1 60 16.196 1.63e-04 * 0.213
## 2 risk 1 60 92.699 8.80e-14 * 0.607
## 3 treatment 2 60 7.318 1.00e-03 * 0.196
## 4 gender:risk 1 60 0.141 7.08e-01 0.002
## 5 gender:treatment 2 60 3.338 4.20e-02 * 0.100
## 6 risk:treatment 2 60 0.713 4.94e-01 0.023
## 7 gender:risk:treatment 2 60 7.406 1.00e-03 * 0.198
There was a statistically significant three-way interaction between gender, risk and treatment, \(F(2, 60) = 7.41\), \(p = 0.001\).
If there is a significant three-way interaction effect, you can decompose it into:
- Simple two-way interaction: run the two-way interaction at each level of the third variable;
- Simple simple main effect: run a one-way model at each level of the second variable;
- Simple simple pairwise comparisons: run pairwise or other post-hoc comparisons if necessary.
If you do not have a statistically significant three-way interaction, you need to determine whether you have any statistically significant two-way interaction from the ANOVA output. You can follow up a significant two-way interaction by simple main effects analyses and pairwise comparisons between groups if necessary.
In this section we’ll describe the procedure for a significant three-way interaction.
We are free to decide which two variables will form the simple two-way interactions and which variable will act as the third (moderator) variable. In our example, we want to evaluate the effect of the risk*treatment interaction on pain_score at each level of gender.
In the R code below, we group the data by gender and fit the treatment*risk two-way interaction. The argument error is used to specify the three-way ANOVA model from which the pooled error sum of squares and degrees of freedom are to be calculated.
# Group the data by gender and
# fit simple two-way interaction
model <- lm(pain_score ~ gender*risk*treatment, data = headache)
headache %>%
group_by(gender) %>%
anova_test(pain_score ~ risk*treatment, error = model)
## # A tibble: 6 x 8
## gender Effect DFn DFd F p `p<.05` ges
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 male risk 1 60 50.0 0.00000000187 "*" 0.455
## 2 male treatment 2 60 10.2 0.000157 "*" 0.253
## 3 male risk:treatment 2 60 5.25 0.008 "*" 0.149
## 4 female risk 1 60 42.8 0.0000000150 "*" 0.416
## 5 female treatment 2 60 0.482 0.62 "" 0.016
## 6 female risk:treatment 2 60 2.87 0.065 "" 0.087
There was a statistically significant simple two-way interaction between risk and treatment (risk:treatment) for males, F(2, 60) = 5.25, p = 0.008, but not for females, F(2, 60) = 2.87, p = 0.065.
For males, this result suggests that the effect of treatment on “pain_score” depends on one’s “risk” of migraine. In other words, the risk moderates the effect of the type of treatment on pain_score.
A statistically significant simple two-way interaction can be followed up with simple simple main effects. In our example, you could therefore investigate the effect of treatment on pain_score at every level of risk or investigate the effect of risk at every level of treatment. Group the data by gender and risk and analyze the simple simple main effects of treatment on pain_score:
# Group the data by gender and risk, and fit anova
treatment.effect <- headache %>%
group_by(gender, risk) %>%
anova_test(pain_score ~ treatment, error = model)
# Keep only the rows for males (the table below shows these; this filter is our reconstruction)
treatment.effect %>% filter(gender == "male")
## # A tibble: 2 x 9
## gender risk Effect DFn DFd F p `p<.05` ges
## <fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 male high treatment 2 60 14.8 0.0000061 "*" 0.33
## 2 male low treatment 2 60 0.66 0.521 "" 0.022
There was a statistically significant simple simple main effect of treatment for males at high risk of migraine, \(F(2, 60) = 14.8\), \(p < 0.0001\), but not for males at low risk of migraine, \(F(2, 60) = 0.66\), \(p = 0.521\).
This analysis indicates that, the type of treatment taken has a statistically significant effect on pain_score in males who are at high risk.
In other words, the mean pain_score in the treatment X, Y and Z groups was statistically significantly different for males who are at high risk, but not for males at low risk.
A statistically significant simple simple main effect can be followed up by multiple pairwise comparisons to determine which group means are different. This can be easily done using the function emmeans_test() [rstatix package] described in the previous section.
Compare the different treatments by gender and risk variables:
# Pairwise comparisons
pwc <- headache %>%
group_by(gender, risk) %>%
emmeans_test(pain_score ~ treatment, p.adjust.method = "bonferroni") %>%
select(-df, -statistic, -p) # Remove details
# Show comparison results for male at high risk
pwc %>% filter(gender == "male", risk == "high")
## # A tibble: 3 x 8
## gender risk term .y. group1 group2 p.adj p.adj.signif
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 male high treatment pain_score X Y 0.000386 ***
## 2 male high treatment pain_score X Z 0.00000942 ****
## 3 male high treatment pain_score Y Z 0.897 ns
# Estimated marginal means (i.e. adjusted means)
# with 95% confidence interval
get_emmeans(pwc) %>% filter(gender == "male", risk == "high")
## # A tibble: 3 x 9
## gender risk treatment emmean se df conf.low conf.high method
## <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 male high X 92.7 1.80 60 89.1 96.3 Emmeans test
## 2 male high Y 82.3 1.80 60 78.7 85.9 Emmeans test
## 3 male high Z 79.7 1.80 60 76.1 83.3 Emmeans test
In the pairwise comparisons table above, we are interested only in the simple simple comparisons for males at a high risk of a migraine headache. In our example, there are three possible combinations of group differences.
For males at high risk, there was a statistically significant mean difference between treatment X and treatment Y of \(10.4\) \((p.adj < 0.001)\), and between treatment X and treatment Z of \(13.1\) \((p.adj < 0.0001)\).
However, the difference between treatment Y and treatment Z \((2.66)\) was not statistically significant, \(p.adj = 0.897\).
A three-way ANOVA was conducted to determine the effects of gender, risk and treatment on migraine headache episode pain_score.
Residual analysis was performed to test for the assumptions of the three-way ANOVA. Normality was assessed using Shapiro-Wilk’s normality test and homogeneity of variances was assessed by Levene’s test.
Residuals were normally distributed (p > 0.05) and there was homogeneity of variances (p > 0.05).
There was a statistically significant three-way interaction between gender, risk and treatment, F(2, 60) = 7.41, p = 0.001.
Statistical significance was accepted at the p < 0.025 level for simple two-way interactions and simple simple main effects. There was a statistically significant simple two-way interaction between risk and treatment for males, F(2, 60) = 5.2, p = 0.008, but not for females, F(2, 60) = 2.8, p = 0.065.
There was a statistically significant simple simple main effect of treatment for males at high risk of migraine, F(2, 60) = 14.8, p < 0.0001), but not for males at low risk of migraine, F(2, 60) = 0.66, p = 0.521.
All simple simple pairwise comparisons, between the different treatment groups, were run for males at high risk of migraine with a Bonferroni adjustment applied.
There was a statistically significant mean difference between treatment X and treatment Y. However, the difference between treatment Y and treatment Z was not statistically significant.
# Visualization: box plots with p-values
pwc <- pwc %>% add_xy_position(x = "treatment")
pwc.filtered <- pwc %>% filter(gender == "male", risk == "high")
bxp +
stat_pvalue_manual(
pwc.filtered, color = "risk", linetype = "risk", hide.ns = TRUE,
tip.length = 0, step.increase = 0.1, step.group.by = "gender"
) +
labs(
subtitle = get_test_label(res.aov, detailed = TRUE),
caption = get_pwc_label(pwc)
)