MIXED ANOVA WITH R

Analysis of variance: ANOVA

Although the name of the technique refers to variances, the main goal of ANOVA is to investigate differences in means.

The two-way ANOVA is used to evaluate simultaneously the effect of two different grouping variables on a continuous outcome variable. Other names for it are: two-factor design, factorial ANOVA or two-way between-subjects ANOVA.

The repeated-measures ANOVA is used for analyzing data where the same subjects are measured more than once. This test is also referred to as a within-subjects ANOVA or ANOVA with repeated measures.

A Mixed Analysis of Variance (Mixed ANOVA), also known as a Split-Plot ANOVA, is a statistical technique used to analyze the effects of two or more independent variables on a dependent variable when at least one of them is a between-subjects factor and at least one is a within-subjects factor. It combines aspects of the between-subjects ANOVA and the repeated-measures ANOVA, allowing group differences (between-subjects factors) and repeated measures (within-subjects factors) to be examined in a single analysis.

DISCUSSION

Two-way mixed ANOVA
Data preparation
Summary statistics
Visualization
Check assumptions
Computation
Post-hoc tests
Conclusion

Learn how to:

Compute and interpret the ANOVA in R for comparing independent groups.
Understand what the repeated-measures ANOVA and the mixed ANOVA are.
Check ANOVA test assumptions.
Perform post-hoc tests, multiple pairwise comparisons between groups to identify which groups are different.
Visualize the data using box plots, add ANOVA and pairwise comparisons p-values to the plot.

UNDERSTANDING THE EXPERIMENT

ANOVA COMPARISON.

In a Mixed ANOVA, the independent variables can be of two types:

Between-Subjects Factor: This is similar to the independent variable in a traditional One-Way or Two-Way ANOVA. It categorizes the observations into different groups or conditions, and the interest lies in understanding the differences in means across these groups.

Within-Subjects Factor: Also known as a repeated measures factor, this variable represents factors for which measurements are taken on the same subjects under different conditions. The within-subjects factor allows you to investigate how subjects’ responses change across these conditions.

FORMULAS TO UTILIZE

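Total Sum of Squares (SST)

\(SST= \sum_{a=1}^{b}\sum_{i=1}^{n_a}\sum_{j=1}^{k}(X_{aij} - \bar{X})^2\)

Between-Groups Sum of Squares (SSB), the main effect of Group

\(SSB= k\sum_{a=1}^{b} n_a (\bar{X}_a - \bar{X})^2\)

Subjects-Within-Groups Sum of Squares (SSS), the between-subjects error

\(SSS= k\sum_{a=1}^{b}\sum_{i=1}^{n_a}(\bar{X}_{ai} - \bar{X}_a)^2\)

Within-Subjects Sum of Squares (SSW), the main effect of Condition

\(SSW= m\sum_{j=1}^{k}(\bar{X}_j - \bar{X})^2\)

Sum of Squares for Interaction (SSI)

\(SSI= \sum_{a=1}^{b} n_a \sum_{j=1}^{k}(\bar{X}_{aj} - \bar{X}_a - \bar{X}_j + \bar{X})^2\)

Within-Subjects Error Sum of Squares (SSE)

\(SSE=SST-SSB-SSS-SSW-SSI\)

where

\(b\) is the number of levels of the between-subjects factor (Group)
\(n_a\) is the number of participants in Group a
\(m = \sum_{a=1}^{b} n_a\) is the total number of participants (individuals)
\(k\) is the number of levels of the within-subjects factor (Condition), that is, the number of observations per participant
\(X_{aij}\) is the observation of participant i of Group a under Condition j
\(\bar{X}\) is the overall mean of all observations
\(\bar{X}_a\) is the mean of Group a
\(\bar{X}_{ai}\) is the mean of participant i of Group a across conditions
\(\bar{X}_j\) is the mean of Condition j
\(\bar{X}_{aj}\) is the mean of Group a under Condition j

These terms add up exactly to SST. With unequal group sizes, type III tests weight the groups equally, so the sum of squares they report for Condition can differ slightly from SSW above.

Formulas for Mean Squares and F-Statistics in Mixed ANOVA

Factor_or_Effect	Sum_of_Squares	Degrees_of_Freedom	Mean_Square	F_Statistic
Between-Groups (Group)	SSB	\(df_B = b - 1\)	\(MSB = \frac{SSB}{df_B}\)	\(F_{\text{Group}} = \frac{MSB}{MSS}\)
Subjects within Groups (between-subjects error)	SSS	\(df_S = m - b\)	\(MSS = \frac{SSS}{df_S}\)	
Within-Subjects (Condition)	SSW	\(df_W = k - 1\)	\(MSW = \frac{SSW}{df_W}\)	\(F_{\text{Condition}} = \frac{MSW}{MSE}\)
Interaction (Group * Condition)	SSI	\(df_I = (b - 1)(k - 1)\)	\(MSI = \frac{SSI}{df_I}\)	\(F_{\text{Interaction}} = \frac{MSI}{MSE}\)
Within-Subjects Error	SSE	\(df_E = (m - b)(k - 1)\)	\(MSE = \frac{SSE}{df_E}\)	

\(df_B\): Degrees of Freedom for the Group factor (number of groups minus 1).

\(df_S\): Degrees of Freedom for the between-subjects error (number of participants minus the number of groups).

\(df_W\): Degrees of Freedom for the Condition factor (number of conditions minus 1).

\(df_I\): Degrees of Freedom for the Interaction effect (product of degrees of freedom for Group and Condition factors).

\(df_E\): Degrees of Freedom for the within-subjects error (product of \(df_S\) and \(df_W\)).

The Group effect is tested against the between-subjects error, while the Condition and Interaction effects are tested against the within-subjects error. In the example below, b = 2, k = 2 and m = 21, so every F-statistic has 1 and 19 degrees of freedom.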
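As a rough numerical illustration of how the total variation in a mixed design can be split into a Group part, a subjects-within-groups part, a Condition part, an interaction part and a residual, the following base R sketch may help. It uses a small made-up balanced data set; the `toy` data frame and all variable names are hypothetical and are not part of the analysis in this tutorial.

```r
# Hypothetical balanced toy data: 2 groups x 3 subjects per group x 2 conditions
toy <- data.frame(
  id        = factor(rep(1:6, times = 2)),
  Group     = factor(rep(rep(c("A", "B"), each = 3), times = 2)),
  Condition = factor(rep(c("c1", "c2"), each = 6)),
  y         = c(4, 5, 6, 7, 8, 9, 6, 6, 8, 8, 10, 12)
)

# Means attached to every row, so that summing over rows applies the right weights
grand      <- mean(toy$y)
group_mean <- ave(toy$y, toy$Group)                 # mean of each group
subj_mean  <- ave(toy$y, toy$id)                    # mean of each subject
cond_mean  <- ave(toy$y, toy$Condition)             # mean of each condition
cell_mean  <- ave(toy$y, toy$Group, toy$Condition)  # mean of each group x condition cell

# Sums of squares
ss_total    <- sum((toy$y - grand)^2)
ss_group    <- sum((group_mean - grand)^2)
ss_subjects <- sum((subj_mean - group_mean)^2)       # between-subjects error
ss_cond     <- sum((cond_mean - grand)^2)
ss_inter    <- sum((cell_mean - group_mean - cond_mean + grand)^2)
ss_error    <- sum((toy$y - subj_mean - cell_mean + group_mean)^2)  # within-subjects error

# Degrees of freedom
b <- nlevels(toy$Group)
m <- nlevels(toy$id)
k <- nlevels(toy$Condition)
df_group    <- b - 1
df_subjects <- m - b
df_cond     <- k - 1
df_inter    <- (b - 1) * (k - 1)
df_error    <- (m - b) * (k - 1)

# F statistics: Group against the between-subjects error,
# Condition and interaction against the within-subjects error
f_stats <- c(
  Group       = (ss_group / df_group) / (ss_subjects / df_subjects),
  Condition   = (ss_cond / df_cond) / (ss_error / df_error),
  Interaction = (ss_inter / df_inter) / (ss_error / df_error)
)

# The five parts should add up to the total
all.equal(ss_total, ss_group + ss_subjects + ss_cond + ss_inter + ss_error)
f_stats
```

The design choice worth noting is the use of two different error terms: subjects are nested in groups, so Group is judged against subject-to-subject variation, whereas Condition and the interaction are judged against the variation left within subjects.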

LIBRARIES TO USE

‘rstatix’: provides a simple and intuitive pipe-friendly framework, coherent with the ‘tidyverse’ design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses.

‘ggpubr’: provides some easy-to-use functions for creating and customizing ‘ggplot2’- based publication ready plots.

##reading libraries to use

library(rstatix)

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:stats':
## 
##     filter

library(ggpubr)

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 4.2.3

The Application

Fatigue can manifest in various ways and impact an individual’s ability to perform tasks effectively.

Fatigue

Fatigue is a condition characterized by a decline in physical and/or cognitive capabilities due to sustained activity, often accompanied by feelings of tiredness, reduced energy levels, and increased effort required to perform tasks.

Groups

Groups in the dataset refer to distinct categories or experimental conditions under which measurements are taken. Each group represents a specific context or scenario that might influence the response variable (velocity).

Velocity

Velocity is a measure of how quickly an object changes its position with respect to time. In this dataset, velocity represents the rate of movement or change in position for individuals under different conditions.

This information can have implications in various fields, such as sports science, occupational health, and human performance optimization, where understanding the impact of fatigue on velocity is critical for making informed decisions and improvements.

# Demo data

hip <- data.frame(
  stringsAsFactors = FALSE,
                id = c(1L,2L,3L,4L,5L,6L,7L,8L,
                       9L,10L,11L,12L,13L,14L,15L,16L,17L,18L,19L,
                       20L,21L),
             Group = c("LOW","LOW","LOW","LOW",
                       "LOW","LOW","LOW","LOW","LOW","LOW","HIGH","HIGH",
                       "HIGH","HIGH","HIGH","HIGH","HIGH","HIGH","HIGH",
                       "HIGH","HIGH"),
       Non_Fatigue = c(0.54,0.35,0.69,0.6,0.5,
                       0.56,0.72,0.3,0.56,0.63,0.4,0.46,0.35,0.7,0.54,
                       0.46,0.35,0.39,0.62,0.52,0.45),
           Fatigue = c(0.6,0.38,0.82,0.5,0.51,
                       0.68,0.73,0.38,0.7,0.54,0.62,0.37,0.32,0.85,0.73,
                       0.49,0.56,0.29,0.79,0.54,0.48)
)

head(hip)

##   id Group Non_Fatigue Fatigue
## 1  1   LOW        0.54    0.60
## 2  2   LOW        0.35    0.38
## 3  3   LOW        0.69    0.82
## 4  4   LOW        0.60    0.50
## 5  5   LOW        0.50    0.51
## 6  6   LOW        0.56    0.68

The between-subjects factor to compare is the Group of each individual.
The repeated (within-subjects) condition for each individual is Fatigue or Non_Fatigue.

# Transform data into long format

hip <- hip %>%
  gather(key = "Condition", value = "Velocity", Non_Fatigue, Fatigue) %>%
  convert_as_factor(id, Condition)

head(hip)

##   id Group   Condition Velocity
## 1  1   LOW Non_Fatigue     0.54
## 2  2   LOW Non_Fatigue     0.35
## 3  3   LOW Non_Fatigue     0.69
## 4  4   LOW Non_Fatigue     0.60
## 5  5   LOW Non_Fatigue     0.50
## 6  6   LOW Non_Fatigue     0.56
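The same wide-to-long reshaping can also be sketched in base R with `reshape()`, which may be useful when the tidyverse helpers are not available. The example below uses only three rows copied from the data above (ids 1, 2 and 11); the names `wide_demo` and `long_demo` are hypothetical.

```r
# Three rows of the wide data: one column per condition
wide_demo <- data.frame(
  id          = c(1L, 2L, 11L),
  Group       = c("LOW", "LOW", "HIGH"),
  Non_Fatigue = c(0.54, 0.35, 0.40),
  Fatigue     = c(0.60, 0.38, 0.62)
)

# Stack the two condition columns into a single Velocity column
long_demo <- reshape(
  wide_demo,
  direction = "long",
  varying   = c("Non_Fatigue", "Fatigue"),  # columns to stack
  v.names   = "Velocity",                   # name of the stacked value column
  timevar   = "Condition",                  # name of the new key column
  times     = c("Non_Fatigue", "Fatigue"),  # key values, same order as `varying`
  idvar     = "id"
)

# Clean up row names and convert the identifiers to factors
rownames(long_demo) <- NULL
long_demo$id        <- factor(long_demo$id)
long_demo$Condition <- factor(long_demo$Condition)

long_demo
```

Each participant now contributes one row per condition, which is the layout that the repeated-measures functions expect.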

hip %>%
  group_by(Group, Condition) %>%
  get_summary_stats(Velocity, type = "mean_sd")

## # A tibble: 4 × 6
##   Group Condition   variable     n  mean    sd
##   <chr> <fct>       <fct>    <dbl> <dbl> <dbl>
## 1 HIGH  Fatigue     Velocity    11 0.549 0.186
## 2 HIGH  Non_Fatigue Velocity    11 0.476 0.111
## 3 LOW   Fatigue     Velocity    10 0.584 0.148
## 4 LOW   Non_Fatigue Velocity    10 0.545 0.134

HYPOTHESIS

Between-Groups Hypotheses (Main Effect of Group):

Null Hypothesis (H₀): The population mean of the dependent variable is the same at every level of the between-subjects factor (Group).
Alternative Hypothesis (H₁): The population mean of the dependent variable differs between at least two levels of the between-subjects factor (Group).

Within-Subjects Hypotheses (Main Effect of Condition):

Null Hypothesis (H₀): The population mean of the dependent variable is the same at every level of the within-subjects factor (Condition).
Alternative Hypothesis (H₁): The population mean of the dependent variable differs between at least two levels of the within-subjects factor (Condition).

Interaction Effect Hypotheses (Group by Condition Interaction):

Null Hypothesis (H₀): There is no interaction between the between-subjects factor (Group) and the within-subjects factor (Condition); the effect of Condition on the dependent variable is the same in every Group.
Alternative Hypothesis (H₁): There is an interaction between the between-subjects factor (Group) and the within-subjects factor (Condition); the effect of Condition on the dependent variable differs between Groups.

EXPLORATORY ANALYSIS

bxp <- ggboxplot(
  hip, x = "Group", y = "Velocity",
  color = "Condition", palette = "jco"
  )

bxp

# Create boxplot and highlight paired data points

bxp <- ggpaired(
  hip, x = "Condition", y = "Velocity", id = "id",
  line.color = "gray", linetype = "dashed"
  )

bxp

ASSUMPTIONS

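The mixed ANOVA makes the following assumptions about the data:

Independence of the observations between subjects.
No significant outliers in any cell of the design.
Normality of the outcome in each cell of the design.
Homogeneity of variances of the between-subjects factor at each level of the within-subjects factor.
Homogeneity of covariances across the levels of the between-subjects factor.
Sphericity of the within-subjects factor (automatically satisfied here, because Condition has only two levels).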

CHECK ASSUMPTIONS

hip %>%
  group_by(Group,Condition) %>%
  identify_outliers(Velocity)

## # A tibble: 1 × 6
##   Group Condition   id    Velocity is.outlier is.extreme
##   <chr> <fct>       <fct>    <dbl> <lgl>      <lgl>     
## 1 LOW   Non_Fatigue 8          0.3 TRUE       FALSE
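One observation (participant 8, LOW group, Non_Fatigue) is flagged as an outlier, but not as an extreme one. The rstatix documentation describes `identify_outliers()` as applying the boxplot rule: values more than 1.5 times the IQR beyond the quartiles are outliers, and values more than 3 times the IQR beyond them are extreme points. A minimal base R sketch of that rule for this single cell follows, assuming R's default quantile definition; the helper name `flag_outliers` is hypothetical.

```r
# Velocity of the LOW group under Non_Fatigue, in participant order (ids 1 to 10)
low_nf <- c(0.54, 0.35, 0.69, 0.60, 0.50, 0.56, 0.72, 0.30, 0.56, 0.63)

# Boxplot rule: flag values beyond 1.5 x IQR (outlier) or 3 x IQR (extreme)
flag_outliers <- function(x) {
  q   <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  data.frame(
    value      = x,
    is.outlier = x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr,
    is.extreme = x < q[1] - 3 * iqr | x > q[2] + 3 * iqr
  )
}

flags <- flag_outliers(low_nf)
flags[flags$is.outlier, ]
```

Because the point is not extreme, it is reasonable to keep it in the analysis, as is done in the remainder of this tutorial.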

hip %>%
  group_by(Group,Condition) %>%
  shapiro_test(Velocity)

## # A tibble: 4 × 5
##   Group Condition   variable statistic     p
##   <chr> <fct>       <chr>        <dbl> <dbl>
## 1 HIGH  Fatigue     Velocity     0.957 0.729
## 2 HIGH  Non_Fatigue Velocity     0.925 0.359
## 3 LOW   Fatigue     Velocity     0.952 0.690
## 4 LOW   Non_Fatigue Velocity     0.927 0.418

Velocity was normally distributed in each Group by Condition cell, as assessed by the Shapiro-Wilk test (p > 0.05).

## Homogeneity of variance assumption

hip %>%
  group_by(Condition)%>%
  levene_test(Velocity ~ Group)

## # A tibble: 2 × 5
##   Condition     df1   df2 statistic     p
##   <fct>       <int> <int>     <dbl> <dbl>
## 1 Fatigue         1    19     0.331 0.572
## 2 Non_Fatigue     1    19     0.136 0.716

Levene’s test is not significant for either condition (p > 0.05). Therefore, we can assume homogeneity of variances between the groups.
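To make clearer what this test measures, Levene's test with the median as center can be sketched in base R as a one-way ANOVA on the absolute deviations of each value from its group median. The sketch below re-enters the wide data from the beginning of the tutorial; the names `hip_wide` and `levene_median` are hypothetical, and the median centering is an assumption about the default used above.

```r
# Wide data copied from the demo data (first 10 rows LOW, last 11 rows HIGH)
hip_wide <- data.frame(
  Group       = factor(rep(c("LOW", "HIGH"), times = c(10, 11))),
  Non_Fatigue = c(0.54, 0.35, 0.69, 0.60, 0.50, 0.56, 0.72, 0.30, 0.56, 0.63,
                  0.40, 0.46, 0.35, 0.70, 0.54, 0.46, 0.35, 0.39, 0.62, 0.52, 0.45),
  Fatigue     = c(0.60, 0.38, 0.82, 0.50, 0.51, 0.68, 0.73, 0.38, 0.70, 0.54,
                  0.62, 0.37, 0.32, 0.85, 0.73, 0.49, 0.56, 0.29, 0.79, 0.54, 0.48)
)

# Levene-type test: ANOVA on absolute deviations from the group medians
levene_median <- function(y, g) {
  dev <- abs(y - ave(y, g, FUN = median))
  anova(lm(dev ~ g))[1, c("Df", "F value", "Pr(>F)")]
}

lev_fatigue     <- levene_median(hip_wide$Fatigue, hip_wide$Group)
lev_non_fatigue <- levene_median(hip_wide$Non_Fatigue, hip_wide$Group)

lev_fatigue
lev_non_fatigue
```

Under that assumption, the F values should agree with the `statistic` column above up to rounding. A large F would mean that the spread around the group center differs between the LOW and HIGH groups.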

## Homogeneity of covariances assumption
## Compute Box’s M-test:

box_m(hip[, "Velocity", drop = FALSE], hip$Group)

## # A tibble: 1 × 4
##   statistic p.value parameter method                                            
##       <dbl>   <dbl>     <dbl> <chr>                                             
## 1     0.202   0.653         1 Box's M-test for Homogeneity of Covariance Matric…

There was homogeneity of covariances, as assessed by Box’s test of equality of covariance matrices (p = 0.653, well above the 0.001 threshold commonly used for this highly sensitive test).

ANOVA MODEL

# Compute ANOVA
res.aov <- anova_test (data = hip, dv = Velocity, wid = id, 
                       between = Group, within = Condition)
res.aov

## ANOVA Table (type III tests)
## 
##            Effect DFn DFd     F     p p<.05   ges
## 1           Group   1  19 0.735 0.402       0.033
## 2       Condition   1  19 5.975 0.024     * 0.038
## 3 Group:Condition   1  19 0.545 0.470       0.004

where,

F indicates that the statistic is compared to an F-distribution (F-test); (1, 19) are the degrees of freedom in the numerator (DFn) and the denominator (DFd), respectively; 5.975 is the obtained F-statistic value for Condition
p specifies the p-value
ges is the generalized eta squared, an effect size giving the proportion of variability in the outcome attributable to each effect

From the output above, there is no statistically significant two-way interaction between Group and Condition on Velocity, F(1, 19) = 0.545, p = 0.470. The main effect of Condition is statistically significant, F(1, 19) = 5.975, p = 0.024, whereas the main effect of Group is not, F(1, 19) = 0.735, p = 0.402.

## Alternatively it is possible to rewrite it as:

res.aov3<- hip%>%
           anova_test( Velocity ~ Group*Condition + Error(id/Condition) )
                       
res.aov3

## ANOVA Table (type III tests)
## 
##            Effect DFn DFd     F     p p<.05   ges
## 1           Group   1  19 0.735 0.402       0.033
## 2       Condition   1  19 5.975 0.024     * 0.038
## 3 Group:Condition   1  19 0.545 0.470       0.004
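Because Condition has only two levels, the three F-tests in this table can be cross-checked with ordinary linear models on one number per participant: the average of the two conditions (for Group) and the Fatigue minus Non_Fatigue difference (for Condition and the interaction). The base R sketch below re-enters the wide data; `hip_wide` and the other object names are hypothetical, and sum-to-zero contrasts are used so that the intercept is the unweighted mean of the two group means, which is an assumption meant to mimic the type III tests.

```r
# Wide data copied from the demo data (first 10 rows LOW, last 11 rows HIGH)
hip_wide <- data.frame(
  Group       = factor(rep(c("LOW", "HIGH"), times = c(10, 11))),
  Non_Fatigue = c(0.54, 0.35, 0.69, 0.60, 0.50, 0.56, 0.72, 0.30, 0.56, 0.63,
                  0.40, 0.46, 0.35, 0.70, 0.54, 0.46, 0.35, 0.39, 0.62, 0.52, 0.45),
  Fatigue     = c(0.60, 0.38, 0.82, 0.50, 0.51, 0.68, 0.73, 0.38, 0.70, 0.54,
                  0.62, 0.37, 0.32, 0.85, 0.73, 0.49, 0.56, 0.29, 0.79, 0.54, 0.48)
)

# One summary value per participant
hip_wide$avg  <- (hip_wide$Fatigue + hip_wide$Non_Fatigue) / 2  # between-subjects part
hip_wide$diff <- hip_wide$Fatigue - hip_wide$Non_Fatigue        # within-subjects part

# Group main effect: one-way ANOVA on the participant averages
fit_between <- lm(avg ~ Group, data = hip_wide)
f_group     <- anova(fit_between)["Group", "F value"]

# Condition and interaction: model the differences with sum-to-zero contrasts.
# Coefficient 1 (intercept) tests whether the mean difference is zero (Condition);
# coefficient 2 tests whether the difference depends on Group (interaction).
fit_within <- lm(diff ~ Group, data = hip_wide,
                 contrasts = list(Group = "contr.sum"))
t_within      <- summary(fit_within)$coefficients[, "t value"]
f_condition   <- unname(t_within[1]^2)
f_interaction <- unname(t_within[2]^2)

round(c(Group = f_group, Condition = f_condition, Interaction = f_interaction), 3)
```

The squared t statistics should reproduce the F column of the table up to rounding, with 1 and 19 degrees of freedom in each case.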

## If the data were analyzed as a two-way between-subjects ANOVA, ignoring the repeated measures

res.aov2 <- hip %>% anova_test(Velocity ~ Group * Condition)
res.aov2

## ANOVA Table (type II tests)
## 
##            Effect DFn DFd     F     p p<.05   ges
## 1           Group   1  38 1.286 0.264       0.033
## 2       Condition   1  38 1.544 0.222       0.039
## 3 Group:Condition   1  38 0.136 0.714       0.004

# Visual report
# Show the report for the within-subject variable, here "Condition"
# Corresponding to the row number 2 in the ANOVA table output

bxp +
  labs(subtitle = get_test_label(res.aov, row = 2, detailed = TRUE))

Post-hoc Test

Performing pairwise paired t-tests

# pairwise comparisons
pwc <- hip %>%
  group_by(Group) %>%
  pairwise_t_test(
    Velocity ~ Condition, paired = TRUE,
    p.adjust.method = "bonferroni"
    )

pwc

## # A tibble: 2 × 11
##   Group .y.      group1  group2       n1    n2 stati…¹    df     p p.adj p.adj…²
## * <chr> <chr>    <chr>   <chr>     <int> <int>   <dbl> <dbl> <dbl> <dbl> <chr>  
## 1 HIGH  Velocity Fatigue Non_Fati…    11    11    2.02    10 0.071 0.071 ns     
## 2 LOW   Velocity Fatigue Non_Fati…    10    10    1.45     9 0.18  0.18  ns     
## # … with abbreviated variable names ¹statistic, ²p.adj.signif

However, because the interaction was not significant, the group-wise comparisons are not needed; it is sufficient to follow up the main effect of Condition across all participants.

# pairwise comparisons for condition only
pwc <- hip %>%
  pairwise_t_test(
    Velocity ~ Condition, paired = TRUE,
    p.adjust.method = "bonferroni"
    )

pwc

## # A tibble: 1 × 10
##   .y.      group1  group2         n1    n2 statistic    df     p p.adj p.adj.s…¹
## * <chr>    <chr>   <chr>       <int> <int>     <dbl> <dbl> <dbl> <dbl> <chr>    
## 1 Velocity Fatigue Non_Fatigue    21    21      2.51    20 0.021 0.021 *        
## # … with abbreviated variable name ¹p.adj.signif
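In the group-wise output above, the adjusted p-values equal the raw ones because each group contains a single comparison, so the Bonferroni correction has nothing to adjust within a group. If one wanted to control the error rate across both group-wise comparisons, the raw p-values could be adjusted together. The base R sketch below shows one way to do that, alongside the overall paired t-test; `hip_wide`, `p_raw`, `p_adj` and `overall` are hypothetical names, and the data are re-entered from the demo data.

```r
# Wide data copied from the demo data (first 10 rows LOW, last 11 rows HIGH)
hip_wide <- data.frame(
  Group       = factor(rep(c("LOW", "HIGH"), times = c(10, 11))),
  Non_Fatigue = c(0.54, 0.35, 0.69, 0.60, 0.50, 0.56, 0.72, 0.30, 0.56, 0.63,
                  0.40, 0.46, 0.35, 0.70, 0.54, 0.46, 0.35, 0.39, 0.62, 0.52, 0.45),
  Fatigue     = c(0.60, 0.38, 0.82, 0.50, 0.51, 0.68, 0.73, 0.38, 0.70, 0.54,
                  0.62, 0.37, 0.32, 0.85, 0.73, 0.49, 0.56, 0.29, 0.79, 0.54, 0.48)
)

# Paired t-test of Fatigue vs Non_Fatigue within each group
p_raw <- sapply(split(hip_wide, hip_wide$Group), function(d) {
  t.test(d$Fatigue, d$Non_Fatigue, paired = TRUE)$p.value
})

# Bonferroni adjustment across the two group-wise comparisons
p_adj <- p.adjust(p_raw, method = "bonferroni")

# Paired t-test across all participants (follow-up of the Condition main effect)
overall <- t.test(hip_wide$Fatigue, hip_wide$Non_Fatigue, paired = TRUE)

round(rbind(raw = p_raw, bonferroni = p_adj), 3)
overall
```

With only two comparisons, the Bonferroni adjustment simply doubles each raw p-value (capped at 1), so neither group-wise comparison becomes significant, which is in line with the non-significant interaction.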

CONCLUSION

The key features of Mixed ANOVA are:

It handles both between-subjects and within-subjects factors in the same analysis.
It helps assess main effects (effects of individual factors) and interaction effects (effects of the combined factors).
It takes into account the potential correlations between measurements taken from the same subjects under different conditions.
It is useful when a study combines a grouping (between-subjects) factor with a repeated measures (within-subjects) factor.

In summary, Mixed ANOVA is a powerful tool for analyzing data with both between-subjects and within-subjects factors, making it particularly suitable for experiments where you want to examine how different conditions affect participants over time or across different groups. It allows researchers to uncover nuanced insights into the combined effects of various factors on the dependent variable, facilitating a more comprehensive understanding of the underlying relationships in the data.