3-Way ANOVA
The three-way ANOVA is an extension of the two-way ANOVA for assessing whether there is an interaction effect between three independent categorical variables on a continuous outcome variable.
We’ll use the headache dataset [datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. Males and females were further subdivided into whether they were at low or high risk of migraine.
We want to understand how each independent variable (type of treatments, risk of migraine and gender) interact to predict the pain score.
Descriptive statistics
Plots
First, we want to review the data to view possible patterns. We have three independent variables and one response = pain score value.
x <- sample(1:nrow(headache), 10)
x <- sort(x)
kbl(headache[x, ], caption = "Random 10 rows from headache dataset", col.names = c("ID", "Gender", "Risk", "Treatment", "Pain score")) %>%
kable_minimal() %>%
column_spec(5, background = spec_color(headache$pain_score[1:10], end = 0.8))
| ID | Gender | Risk | Treatment | Pain score |
|---|---|---|---|---|
| 6 | male | low | X | 73.1 |
| 10 | male | low | Y | 73.4 |
| 19 | male | high | X | 94.1 |
| 21 | male | high | X | 92.7 |
| 27 | male | high | Y | 84.5 |
| 30 | male | high | Y | 80.1 |
| 32 | male | high | Z | 82.2 |
| 36 | male | high | Z | 79.4 |
| 59 | female | high | X | 81.3 |
| 66 | female | high | Y | 86.6 |
- id => individual identifier of the participant, quantitative variable.
- gender => qualitative nominal variable ‘male’ or ‘female’
- risk => qualitative nominal variable ‘low’ or ‘high’ (describes how likely a person is to have migraine)
- treatment => qualitative nominal variable ‘X’, ‘Y’, ‘Z’ describes three drugs given to the participants
- pain_score => quantitative variable describing the pain score given by the participant.
Now we want to see which of three independent variables affect pain score mostly.
plot.design(pain_score ~ gender * risk * treatment, data = headache, col = "brown", ylab = "Mean of pain score", main = "Impact of factors on pain score")
Using plot.desing function we can observe that the biggest influence on pain_score value has risk level = whether the person is likely or less likely to have migraine. However, the impact of the treatment is much smaller - it is actually similar to the gender factor. We will look into that later with ANOVA analysis.
Another way to look into these three factors is by looking at the connection of them - for example whether gender and risk level together impact the pain score given by the participants.
interaction.plot(headache$gender, headache$risk, headache$pain_score, col = c("blue", "purple"),
lty = 1, lwd = 2, ylim = c(70, 85), trace.label = "Risk",
xlab = "Gender", ylab = "Mean of pain score")
interaction.plot(headache$gender, headache$treatment, headache$pain_score, col = c("blue", "pink", "purple"),
lty = 1, lwd = 2, ylim = c(70, 85), trace.label = "Treatment",
xlab = "Gender", ylab = "Mean of pain score")
interaction.plot(headache$risk, headache$treatment, headache$pain_score, col = c("blue", "pink", "purple"),
lty = 1, lwd = 2, ylim = c(70, 85), trace.label = "Treatment",
xlab = "Risk level", ylab = "Mean of pain score")
Here in the three plots we can observe that:
1. Gender and risk level of having migraine. are definitely correlated. The mean pain score for men was higher that for women in both cases, when participants had high and also low risk of having migraine.
2. Gender and treatment performed showed significant difference in case of only one treatment - drug X when value of mean pain score is much higher for males than foe females. When using drug Z and Y, the difference between males and females pains scores is not as big.
3. Risk level of having migraine and treatment performed also showed some collocation. All three drugs showed smaller pain score for people not as likely to have migraine.
By connecting all data on one boxplot we can observe all at one place. Hence, the distinction between low and high risk level definitely affects pain score. The difference between gender is not as visible - however, the use of drug X with high risk shows a significant difference between genders (as we saw previously on interaction plots).
ggplot(aes(x = pain_score, y = risk, color = gender), data = headache) + geom_boxplot() + facet_grid(~treatment) + xlab("Pain score") + ylab("Risk level") + ggtitle("Pain score distribution grouped by gender, risk and treatment")
Here we will add new variable that sums up the independent variables in following pattern:
male + low risk + treatment X -> mlX
Then we are able to create a boxplot of all distributions separately.
h1 <- headache %>% mutate('Total' = ifelse(gender == 'male' & risk == 'low' & treatment == 'X', 'mlX',
ifelse(gender == 'male' & risk == 'low' & treatment == 'Y', 'mlY',
ifelse(gender == 'male' & risk == 'low' & treatment == 'Z', 'mlZ',
ifelse(gender == 'male' & risk == 'high' & treatment == 'X', 'mhX',
ifelse(gender == 'male' & risk == 'high' & treatment == 'Y', 'mhY',
ifelse(gender == 'male' & risk == 'high' & treatment == 'Z', 'mhZ',
ifelse(gender == 'female' & risk == 'low' & treatment == 'X', 'flX',
ifelse(gender == 'female' & risk == 'low' & treatment == 'Y', 'flY',
ifelse(gender == 'female' & risk == 'low' & treatment == 'Z', 'flZ',
ifelse(gender == 'female' & risk == 'high' & treatment == 'X', 'fhX',
ifelse(gender == 'female' & risk == 'high' & treatment == 'Y', 'fhY',
ifelse(gender == 'female' & risk == 'high' & treatment == 'Z', 'fhZ', 0)))))))))))))
ggplot(h1, aes(x = pain_score, y = Total, fill = Total)) + geom_boxplot() + scale_fill_brewer(name = "", palette="Set3") + xlab("Pain score") + ylab("") + ggtitle("Pain score distribution grouped by gender, risk and treatment") + theme(legend.position = 'none')
Analysis
x <- summary(headache[c('gender', 'risk', 'treatment', 'pain_score')])
kbl(x)
| gender | risk | treatment | pain_score | |
|---|---|---|---|---|
| male :36 | high:36 | X:24 | Min. : 63.7 | |
| female:36 | low :36 | Y:24 | 1st Qu.: 72.9 | |
| NA | NA | Z:24 | Median : 77.9 | |
| NA | NA | NA | Mean : 77.6 | |
| NA | NA | NA | 3rd Qu.: 81.5 | |
| NA | NA | NA | Max. :100.0 |
By looking at the summary of the data we can view how many observations fall into each category. The data are equally distributed, meaning there are the same numbers of observations grouped by gender, risk and treatment in each 2 (or 3 in treatment) subgroups.
The mean, median, percentiles, minimum and maximum values of the pain score are also given.
Now too look at the descriptive statistics individually grouped into categories:
kbl(headache %>%
group_by(gender, risk, treatment) %>%
get_summary_stats(pain_score, type = "full"))
| gender | risk | treatment | variable | n | min | max | median | q1 | q3 | iqr | mad | mean | sd | se | ci |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| male | high | X | pain_score | 6 | 86.3 | 100.0 | 93.4 | 88.9 | 95.3 | 6.40 | 5.98 | 92.7 | 5.12 | 2.09 | 5.37 |
| male | high | Y | pain_score | 6 | 77.5 | 91.2 | 81.2 | 79.0 | 83.9 | 4.95 | 4.36 | 82.3 | 5.00 | 2.04 | 5.25 |
| male | high | Z | pain_score | 6 | 74.4 | 85.1 | 80.4 | 76.6 | 82.0 | 5.38 | 4.83 | 79.7 | 4.05 | 1.65 | 4.25 |
| male | low | X | pain_score | 6 | 70.8 | 81.2 | 75.9 | 73.6 | 78.7 | 5.10 | 4.60 | 76.1 | 3.85 | 1.57 | 4.04 |
| male | low | Y | pain_score | 6 | 67.9 | 80.7 | 73.4 | 69.5 | 74.9 | 5.40 | 5.30 | 73.1 | 4.76 | 1.95 | 5.00 |
| male | low | Z | pain_score | 6 | 68.3 | 80.4 | 74.8 | 70.7 | 77.9 | 7.24 | 7.06 | 74.5 | 4.89 | 2.00 | 5.13 |
| female | high | X | pain_score | 6 | 68.4 | 82.7 | 81.1 | 79.1 | 81.4 | 2.29 | 1.49 | 78.9 | 5.32 | 2.17 | 5.58 |
| female | high | Y | pain_score | 6 | 73.1 | 86.6 | 81.8 | 80.0 | 83.7 | 3.65 | 3.48 | 81.2 | 4.62 | 1.89 | 4.85 |
| female | high | Z | pain_score | 6 | 75.0 | 87.1 | 80.8 | 79.8 | 82.4 | 2.59 | 2.35 | 81.0 | 3.98 | 1.63 | 4.18 |
| female | low | X | pain_score | 6 | 68.6 | 78.1 | 74.6 | 72.0 | 77.1 | 5.05 | 4.82 | 74.2 | 3.69 | 1.51 | 3.87 |
| female | low | Y | pain_score | 6 | 63.7 | 74.7 | 68.7 | 65.2 | 69.8 | 4.56 | 4.40 | 68.4 | 4.08 | 1.67 | 4.28 |
| female | low | Z | pain_score | 6 | 65.4 | 73.1 | 69.6 | 68.9 | 71.6 | 2.78 | 2.46 | 69.8 | 2.72 | 1.11 | 2.85 |
Assumptions
To be able to calculate 3 way ANOVA with interactions we must state some assumptions at the beginning. ANOVA test can be performed if:
- samples are independent
- samples come from populations that are normally distributed - normality check
- homogeneity of variances in groups - check of homoscedasticity
Outliers
We will try by checking if there are any significant outliers it the data.
outliers_data <- headache %>%
group_by(gender, risk, treatment) %>%
identify_outliers(pain_score)
kbl(outliers_data)
| gender | risk | treatment | id | pain_score | is.outlier | is.extreme |
|---|---|---|---|---|---|---|
| female | high | X | 57 | 68.4 | TRUE | TRUE |
| female | high | Y | 62 | 73.1 | TRUE | FALSE |
| female | high | Z | 67 | 75.0 | TRUE | FALSE |
| female | high | Z | 71 | 87.1 | TRUE | FALSE |
We can observe that there are 4 values that strongly influence the data. As we could have seen on the graph above this points are easily visible.
However, before we delete those values we will check normality assumption. If it is fulfilled, there is no point in removing observations. We must take into consideration that we are dealing with really small dataset and removing observations (as done with headache1) can lead us to completely getting rid of all data in a subgroup.
Now trying a deletion of these outliers:
x <- outliers_data[, 4]
x <- unlist(x)
headache1 <- headache %>% filter(id != x)
ggplot(aes(x = pain_score, y = risk, color = gender), data = headache1) + geom_boxplot() + facet_grid(~treatment)
As we see, this still leaves us with another outliers.
Normality
First, lets look at the normality assumption. We will use Shapiro test. The null hypothesis in the test is H0 : Sample data come from normal population.
x <- mshapiro_test(headache$pain_score)
kbl(x)
| statistic | p.value |
|---|---|
| 0.97 | 0.085 |
x <- headache %>%
group_by(gender) %>%
shapiro_test(pain_score)
kbl(x)
| gender | variable | statistic | p |
|---|---|---|---|
| male | pain_score | 0.949 | 0.095 |
| female | pain_score | 0.956 | 0.156 |
x <- headache %>%
group_by(risk) %>%
shapiro_test(pain_score)
kbl(x)
| risk | variable | statistic | p |
|---|---|---|---|
| high | pain_score | 0.954 | 0.143 |
| low | pain_score | 0.969 | 0.407 |
x <- headache %>%
group_by(treatment) %>%
shapiro_test(pain_score)
kbl(x)
| treatment | variable | statistic | p |
|---|---|---|---|
| X | pain_score | 0.939 | 0.154 |
| Y | pain_score | 0.974 | 0.767 |
| Z | pain_score | 0.965 | 0.554 |
We can see that p value is higher than our significance level. Hence, we do not reject the null hypothesis and consider our data normally distributed. Below we create a plot to view residuals and how they follow normal distribution.
ggqqplot(residuals(lm(pain_score ~ gender*risk*treatment, data = headache))) + theme_minimal() + ggtitle("Residuals for checking normality assumption")
As our outliers do not disturb with normal distribution of the dataset, we will continue using headache without deleted outliers.
Homogeneity of variance
To check the homogeneity of variances we can use two tests: Bartlett’s Test and Levene’s Test. Levene’s test is more robust and not as sensitive to outliers. Hence, it is better to use in this case.
x <- leveneTest(pain_score ~ risk * treatment * gender, data = headache)
kbl(x)
| Df | F value | Pr(>F) | |
|---|---|---|---|
| group | 11 | 0.179 | 0.998 |
| 60 | NA | NA |
From the Levene’s Test we can see that p value (almost 1) is much higher than our alpha level 0.05. Hence, the assumption of homogeneity of variance is fulfilled.
Anova
Now we can get into calculating ANOVA. As our assumptions are fulfilled, we can create a model we will use later.
First, we will generate our null hypothesis that we will try to reject during the process.
H0A : Factor A = gender has no effect on the pain score reported by participants
H0B : Factor B = risk of having migraine has no effect on the pain score reported by participants
H0C : Factor C = treatment performed has no effect on the pain score reported by participants
H0AxB : There is no interaction between gender and migraine risk of participants that has an affect on the reported pain score
H0AxC : There is no interaction between gender and treatment performed on participants that has an affect on the reported pain score
H0BxC : There is no interaction between migraine risk and treatment performed on participants that has an affect on the reported pain score
H0AxBxC : There is no interaction between gender, migraine risk and treatment performed on participants that has an affect on the reported pain score
model <- aov(data = headache, pain_score ~ gender * risk * treatment)
Now, we can substitute the model into formula for ANOVA.
As we know, the calculation of 3 way ANOVA takes following components:
- Sum of Squares for Treatment A (gender), Treatment B (risk), Treatment C (treatment), Interaction AxB, Interaction AxC, Interaction BxC, Interaction AxBxC, Error (within)
- Mean Sum of Squares for all above
- Degrees of Freedom
- F values for all Treatments MS/MSE
All this parts can be computed manually. However, thanks to R function ANOVA we can also easily see the p value and quickly assess whether the null hypothesis can or cannot be rejected.
x <- anova(model)
kbl(x)
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| gender | 1 | 313.36 | 313.36 | 16.196 | 0.000 |
| risk | 1 | 1793.56 | 1793.56 | 92.699 | 0.000 |
| treatment | 2 | 283.17 | 141.58 | 7.318 | 0.001 |
| gender:risk | 1 | 2.73 | 2.73 | 0.141 | 0.708 |
| gender:treatment | 2 | 129.18 | 64.59 | 3.338 | 0.042 |
| risk:treatment | 2 | 27.59 | 13.80 | 0.713 | 0.494 |
| gender:risk:treatment | 2 | 286.60 | 143.30 | 7.406 | 0.001 |
| Residuals | 60 | 1160.89 | 19.35 | NA | NA |
Now we can assess what we managed to achieve:
- H0A (gender has no effect) was rejected. This means that men and women respond differently to migraine pains and the pain score can be partly explained by gender.
- H0B (risk has no effect) was rejected. This means that people who have higher risk level of having migraine respond differently than those who have those pains less often.
- H0C (treatment has no effect) was rejected. This means that treatments that were given to participants affected their assessment of migraine pains.
- H0AxB (gender:risk has no effect) was not rejected. This means that gender and migraine risk interact together and act on pain score from headache.
- H0AxC (gender:treatment has no effect) was rejected. This means that gender and given treatment interact together and act on pain score from headache.
- H0BxC (risk:treatment has no effect) was not rejected. This means that there is no effect between risk level and treatment given that influence pain score.
- H0AxBxC (risk:treatment:gender has no effect) was rejected. This means that all three factors together influence pain score given by the participants.
Since we managed to reject most (5 out of 7) H0 we now have to perform post-hoc tests to assess which factors / interactions where the most important in this case.
Post-hoc tests
If there is a significant 3-way interaction effect, you can decompose it into:
- Simple two-way interaction: run two-way interaction at each level of third variable,
- Simple simple main effect: run one-way model at each level of second variable,
- Simple simple pairwise comparisons: run pairwise or other post-hoc comparisons if necessary.
If you do not have a statistically significant three-way interaction, you need to determine whether you have any statistically significant two-way interaction from the ANOVA output. You can follow up a significant two-way interaction by simple main effects analyses and pairwise comparisons between groups if necessary.
We will determine:
three main effects: gender, risk level, treatment = overall determination which of three factors is most important
three two way interactions: gender:risk, gender:treatment, risk:treatment = determination whether (for first example) there is difference in males and females and their risk level taking all three treatment (together)
one three way interaction: gender:risk:treatment = for all factors
If three-way interaction is significant - and in our case it is, we know that two-way interaction gender:risk was significant for different treatments. We can check it in next step.
Two-way interactions
headacheX <- headache %>% filter(treatment == 'X')
headacheZ <- headache %>% filter(treatment == 'Z')
headacheY <- headache %>% filter(treatment == 'Y')
modelX <- aov(data = headacheX, pain_score ~ gender * risk)
kbl(anova(modelX))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| gender | 1 | 373 | 373.0 | 18.0 | 0.000 |
| risk | 1 | 687 | 686.7 | 33.1 | 0.000 |
| gender:risk | 1 | 215 | 215.2 | 10.4 | 0.004 |
| Residuals | 20 | 415 | 20.7 | NA | NA |
modelZ <- aov(data = headacheZ, pain_score ~ gender * risk)
kbl(anova(modelZ))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| gender | 1 | 16.6 | 16.6 | 1.04 | 0.320 |
| risk | 1 | 407.4 | 407.4 | 25.64 | 0.000 |
| gender:risk | 1 | 54.6 | 54.6 | 3.43 | 0.079 |
| Residuals | 20 | 317.7 | 15.9 | NA | NA |
modelY <- aov(data = headacheY, pain_score ~ gender * risk)
kbl(anova(modelY))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| gender | 1 | 53.0 | 53.0 | 2.472 | 0.132 |
| risk | 1 | 727.1 | 727.1 | 33.929 | 0.000 |
| gender:risk | 1 | 19.6 | 19.6 | 0.913 | 0.351 |
| Residuals | 20 | 428.6 | 21.4 | NA | NA |
We can now observe that significant two-way interaction was observed only for modelX being the use of drug X.
Another way to separate the three-way interaction is by dividing it into gender.
headachemen <- headache %>% filter(gender == 'male')
headachewomen <- headache %>% filter(gender == 'female')
modelmen <- aov(data = headachemen, pain_score ~ treatment * risk)
kbl(anova(modelmen))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| treatment | 2 | 394 | 196.9 | 9.15 | 0.001 |
| risk | 1 | 968 | 968.1 | 45.02 | 0.000 |
| treatment:risk | 2 | 203 | 101.6 | 4.72 | 0.016 |
| Residuals | 30 | 645 | 21.5 | NA | NA |
modelwomen <- aov(data = headachewomen, pain_score ~ treatment * risk)
kbl(anova(modelwomen))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| treatment | 2 | 18.6 | 9.32 | 0.542 | 0.587 |
| risk | 1 | 828.2 | 828.16 | 48.168 | 0.000 |
| treatment:risk | 2 | 111.0 | 55.48 | 3.227 | 0.054 |
| Residuals | 30 | 515.8 | 17.19 | NA | NA |
Here the only significant two-way interaction is with male participants.
Last division is by risk level.
headachelow <- headache %>% filter(risk == 'low')
headachehigh <- headache %>% filter(risk == 'high')
modellow <- aov(data = headachelow, pain_score ~ gender * treatment)
kbl(anova(modellow))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| gender | 1 | 129 | 128.79 | 7.794 | 0.009 |
| treatment | 2 | 119 | 59.48 | 3.600 | 0.040 |
| gender:treatment | 2 | 16 | 8.02 | 0.485 | 0.620 |
| Residuals | 30 | 496 | 16.52 | NA | NA |
modelhigh <- aov(data = headachehigh, pain_score ~ gender * treatment)
kbl(anova(modelhigh))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| gender | 1 | 187 | 187.3 | 8.45 | 0.007 |
| treatment | 2 | 192 | 95.9 | 4.33 | 0.022 |
| gender:treatment | 2 | 400 | 199.9 | 9.01 | 0.001 |
| Residuals | 30 | 665 | 22.2 | NA | NA |
Hence, the only significant two-way interaction is when patients have high risk of migraine.
Main effects
Now, to find simple main effect we will perform those significant two-way interactions from the previous step checking the second variable.
1. Significant two-way interaction gender:risk (treatment X)
headacheXlow <- headacheX %>% filter(risk == 'low')
headacheXhigh <- headacheX %>% filter(risk == 'high')
modelXlow <- aov(data = headacheXlow, pain_score ~ gender)
kbl(anova(modelXlow))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| gender | 1 | 10.8 | 10.8 | 0.757 | 0.405 |
| Residuals | 10 | 142.4 | 14.2 | NA | NA |
modelXhigh <- aov(data = headacheXhigh, pain_score ~ gender)
kbl(anova(modelXhigh))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| gender | 1 | 577 | 577.4 | 21.2 | 0.001 |
| Residuals | 10 | 272 | 27.2 | NA | NA |
Significant main effect: gender for treatment X + high risk level.
headacheXmen <- headacheX %>% filter(gender == 'male')
headacheXfem <- headacheX %>% filter(gender == 'female')
modelXmen <- aov(data = headacheXmen, pain_score ~ risk)
kbl(anova(modelXmen))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| risk | 1 | 835 | 835.4 | 40.7 | 0 |
| Residuals | 10 | 205 | 20.5 | NA | NA |
modelXfem <- aov(data = headacheXfem, pain_score ~ risk)
kbl(anova(modelXfem))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| risk | 1 | 66.5 | 66.5 | 3.18 | 0.105 |
| Residuals | 10 | 209.4 | 20.9 | NA | NA |
Significant main effect: risk for treatment X + gender male.
2. Significant two-way interaction risk:treatment (male)
headachemenX <- headachemen %>% filter(treatment == 'X')
headachemenY <- headachemen %>% filter(treatment == 'Y')
headachemenZ <- headachemen %>% filter(treatment == 'Z')
modelmenX <- aov(data = headachemenX, pain_score ~ risk)
kbl(anova(modelmenX))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| risk | 1 | 835 | 835.4 | 40.7 | 0 |
| Residuals | 10 | 205 | 20.5 | NA | NA |
modelmenY <- aov(data = headachemenY, pain_score ~ risk)
kbl(anova(modelmenY))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| risk | 1 | 254 | 254.1 | 10.6 | 0.009 |
| Residuals | 10 | 239 | 23.9 | NA | NA |
modelmenZ <- aov(data = headachemenZ, pain_score ~ risk)
kbl(anova(modelmenZ))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| risk | 1 | 81.9 | 81.9 | 4.07 | 0.071 |
| Residuals | 10 | 201.4 | 20.1 | NA | NA |
Significant main effect: risk for treatment X + gender male.
Significant main effect: risk for treatment Y + gender male.
headachemenlow <- headachemen %>% filter(risk == 'low')
headachemenhigh <- headachemen %>% filter(risk == 'high')
modelmenlow <- aov(data = headachemenlow, pain_score ~ treatment)
kbl(anova(modelmenlow))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| treatment | 2 | 25.5 | 12.8 | 0.623 | 0.55 |
| Residuals | 15 | 307.3 | 20.5 | NA | NA |
modelmenhigh <- aov(data = headachemenhigh, pain_score ~ treatment)
kbl(anova(modelmenhigh))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| treatment | 2 | 571 | 285.7 | 12.7 | 0.001 |
| Residuals | 15 | 338 | 22.5 | NA | NA |
Significant main effect: treatment for high risk + gender male.
3. Significant two-way interaction gender:treatment (high risk)
headachehighmen <- headachehigh %>% filter(gender == 'male')
headachehighfem <- headachehigh %>% filter(gender == 'female')
modelhighmen <- aov(data = headachehighmen, pain_score ~ treatment)
kbl(anova(modelhighmen))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| treatment | 2 | 571 | 285.7 | 12.7 | 0.001 |
| Residuals | 15 | 338 | 22.5 | NA | NA |
modelhighfem <- aov(data = headachehighfem, pain_score ~ treatment)
kbl(anova(modelhighfem))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| treatment | 2 | 20.1 | 10.1 | 0.461 | 0.639 |
| Residuals | 15 | 327.4 | 21.8 | NA | NA |
Significant main effect: treatment for high risk + gender male.
headachehighX <- headachehigh %>% filter(treatment == 'X')
headachehighY <- headachehigh %>% filter(treatment == 'Y')
headachehighZ <- headachehigh %>% filter(treatment == 'Z')
modelhighX <- aov(data = headachehighX, pain_score ~ gender)
kbl(anova(modelhighX))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| gender | 1 | 577 | 577.4 | 21.2 | 0.001 |
| Residuals | 10 | 272 | 27.2 | NA | NA |
modelhighY <- aov(data = headachehighY, pain_score ~ gender)
kbl(anova(modelhighY))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| gender | 1 | 4.08 | 4.08 | 0.176 | 0.684 |
| Residuals | 10 | 231.74 | 23.17 | NA | NA |
modelhighZ <- aov(data = headachehighZ, pain_score ~ gender)
kbl(anova(modelhighZ))
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| gender | 1 | 5.5 | 5.5 | 0.341 | 0.572 |
| Residuals | 10 | 161.2 | 16.1 | NA | NA |
Significant main effect: gender for high risk + treatment X.
SUMMARY: From all these ANOVA tests we can conclude that our main effects are:
- gender for high risk + treatment X
- treatment for high risk + gender male
- risk for treatment X + gender male
- risk for treatment Y + gender male
Pairwise comparisons
Lastly, we can perform TukeyHSD post-hoc test to view the summary in a different form.
pairwise_t_test(data = h1, pain_score ~ Total, p.adjust.method = "bonferroni")
## # A tibble: 66 x 9
## .y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <chr> <dbl> <chr>
## 1 pain_score fhX fhY 6 6 0.367 ns 1 e+0 ns
## 2 pain_score fhX fhZ 6 6 0.396 ns 1 e+0 ns
## 3 pain_score fhY fhZ 6 6 0.956 ns 1 e+0 ns
## 4 pain_score fhX flX 6 6 0.0686 ns 1 e+0 ns
## 5 pain_score fhY flX 6 6 0.00757 ** 5 e-1 ns
## 6 pain_score fhZ flX 6 6 0.00879 ** 5.8 e-1 ns
## 7 pain_score fhX flY 6 6 0.000112 *** 7.38e-3 **
## 8 pain_score fhY flY 6 6 0.00000448 **** 2.95e-4 ***
## 9 pain_score fhZ flY 6 6 0.00000548 **** 3.62e-4 ***
## 10 pain_score flX flY 6 6 0.0261 * 1 e+0 ns
## # ... with 56 more rows
Looking at the pairwise comparison we can observe that there are some significant correlations marked with * stars. Rest of them, marked with ‘ns’ means not significant values.
Summary
From our calculations we managed to reject 5 out of 7 null hypothesis. Hence, we now know that gender, risk level, treatment performed, as well as connection of gender + treatment, risk + treatment actually has an impact on the pain score given by the participants. From later checks, we managed to exclude main effects being high risk + treatment X, high risk + gender male, treatment X and Y + gender male.