Analysis of variance: ANOVA


Although the name of the technique refers to variances, the main goal of ANOVA is to investigate differences in means.

The two-way ANOVA is used to evaluate simultaneously the effect of two different grouping variables on a continuous outcome variable. It is also known as a two-factor factorial design, factorial ANOVA, or two-way between-subjects ANOVA.

The repeated-measures ANOVA is used for analyzing data where the same subjects are measured more than once. This test is also referred to as a within-subjects ANOVA or ANOVA with repeated measures.

A Mixed Analysis of Variance (Mixed ANOVA), also known as a Split-Plot ANOVA, is a statistical technique used to analyze the effects of two or more independent variables on a dependent variable when at least one of them is a between-subjects factor and at least one is a within-subjects (repeated measures) factor. It combines aspects of the between-subjects ANOVA and the repeated-measures ANOVA, so that both kinds of factor, and their interaction, can be examined in a single analysis.

DISCUSSION


Learn how to:


UNDERSTANDING THE EXPERIMENT


ANOVA COMPARISON.


In a Mixed ANOVA, the independent variables can be of two types:

Between-Subjects Factor: This is similar to the independent variable in a traditional One-Way or Two-Way ANOVA. It categorizes the observations into different groups or conditions, and the interest lies in understanding the differences in means across these groups.

Within-Subjects Factor: Also known as a repeated measures factor, this variable represents factors for which measurements are taken on the same subjects under different conditions. The within-subjects factor allows you to investigate how subjects’ responses change across these conditions.
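The distinction between the two kinds of factor shows up directly in the layout of the data. The following is a minimal sketch in base R, using a small hypothetical design (four subjects, not the data analyzed in this document): a between-subjects factor places each subject in exactly one level, while a within-subjects factor is observed at every level for each subject.

```r
# Hypothetical long-format design: 4 subjects, 2 groups, 2 conditions
design <- data.frame(
  id        = factor(rep(1:4, times = 2)),
  Group     = rep(c("LOW", "LOW", "HIGH", "HIGH"), times = 2),
  Condition = rep(c("Non_Fatigue", "Fatigue"), each = 4)
)

# Between-subjects factor: each subject appears in exactly one group
groups_per_subject <- rowSums(table(design$id, design$Group) > 0)

# Within-subjects factor: each subject is measured under every condition
conditions_per_subject <- rowSums(table(design$id, design$Condition) > 0)

groups_per_subject      # 1 for every subject
conditions_per_subject  # 2 for every subject
```

A check of this kind is a quick way to confirm which role each variable plays before fitting the model.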

FORMULAS TO UTILIZE


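Total Sum of Squares (SST)

\(SST = \sum_{a=1}^{b}\sum_{i=1}^{n_a}\sum_{j=1}^{k}(X_{aij} - \bar{X})^2\)

Between-Groups Sum of Squares (SSB), for the Group factor

\(SSB = k\sum_{a=1}^{b} n_a (\bar{X}_{a\cdot\cdot} - \bar{X})^2\)

Within-Subjects Sum of Squares (SSW), for the Condition factor

\(SSW = N\sum_{j=1}^{k} (\bar{X}_{\cdot\cdot j} - \bar{X})^2\)

Sum of Squares for Interaction (SSI)

\(SSI = \sum_{a=1}^{b}\sum_{j=1}^{k} n_a (\bar{X}_{a\cdot j} - \bar{X}_{a\cdot\cdot} - \bar{X}_{\cdot\cdot j} + \bar{X})^2\)

Error Sums of Squares

\(SSE_B = k\sum_{a=1}^{b}\sum_{i=1}^{n_a}(\bar{X}_{ai\cdot} - \bar{X}_{a\cdot\cdot})^2\) (subjects within groups)

\(SSE_W = SST - SSB - SSE_B - SSW - SSI\) (Condition by subjects within groups)

where \(X_{aij}\) is the observation for subject \(i\) of group \(a\) under condition \(j\), \(b\) is the number of groups, \(n_a\) is the number of subjects in group \(a\), \(N = \sum_{a} n_a\) is the total number of subjects, \(k\) is the number of conditions, \(\bar{X}\) is the grand mean, and \(\bar{X}_{a\cdot\cdot}\), \(\bar{X}_{\cdot\cdot j}\), \(\bar{X}_{a\cdot j}\) and \(\bar{X}_{ai\cdot}\) are the means of group \(a\), condition \(j\), the cell (group \(a\), condition \(j\)) and subject \(i\) of group \(a\). These expressions hold for equal group sizes; with unequal group sizes the effects are no longer orthogonal and software reports adjusted (for example type III) sums of squares.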



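Formulas for Mean Squares and F-Statistics in Mixed ANOVA

| Factor_or_Effect | Sum_of_Squares | Degrees_of_Freedom | Mean_Square | F_Statistic |
|---|---|---|---|---|
| Between-Groups (Group) | \(SSB\) | \(df_B\) | \(MSB = \frac{SSB}{df_B}\) | \(F_{\text{Group}} = \frac{MSB}{MSE_B}\) |
| Error between (subjects within groups) | \(SSE_B\) | \(df_{E_B}\) | \(MSE_B = \frac{SSE_B}{df_{E_B}}\) | |
| Within-Subjects (Condition) | \(SSW\) | \(df_W\) | \(MSW = \frac{SSW}{df_W}\) | \(F_{\text{Condition}} = \frac{MSW}{MSE_W}\) |
| Interaction (Group * Condition) | \(SSI\) | \(df_I\) | \(MSI = \frac{SSI}{df_I}\) | \(F_{\text{Interaction}} = \frac{MSI}{MSE_W}\) |
| Error within (Condition by subjects within groups) | \(SSE_W\) | \(df_{E_W}\) | \(MSE_W = \frac{SSE_W}{df_{E_W}}\) | |

The Group effect is tested against the variation between subjects within groups, whereas the Condition effect and the interaction are tested against the within-subjects error. With \(b\) groups, \(k\) conditions and \(N\) subjects in total:

\(df_B = b - 1\): Degrees of Freedom for the Group factor (number of groups minus 1).

\(df_W = k - 1\): Degrees of Freedom for the Condition factor (number of conditions minus 1).

\(df_I = (b - 1)(k - 1)\): Degrees of Freedom for the Interaction effect (product of the degrees of freedom for the Group and Condition factors).

\(df_{E_B} = N - b\): Degrees of Freedom for the between-subjects error (number of subjects minus the number of groups).

\(df_{E_W} = (N - b)(k - 1)\): Degrees of Freedom for the within-subjects error.

For the data analyzed in this document (\(b = 2\), \(k = 2\), \(N = 21\)), every effect therefore has 1 numerator and 19 denominator degrees of freedom.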
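To make the partition concrete, the sketch below computes the sums of squares of a mixed design by hand in base R. It assumes a balanced design and uses a small hypothetical data set (`toy`, two groups of three subjects, two conditions); all variable names are illustrative. The total variation is split into Group, subjects within groups, Condition, Group by Condition, and a within-subjects residual. The Group effect is tested against the subjects-within-groups term, the other two effects against the residual.

```r
# Hypothetical balanced toy data: 2 groups x 3 subjects x 2 conditions
toy <- data.frame(
  id        = factor(rep(1:6, times = 2)),
  Group     = factor(rep(rep(c("LOW", "HIGH"), each = 3), times = 2)),
  Condition = factor(rep(c("Non_Fatigue", "Fatigue"), each = 6)),
  Velocity  = c(0.50, 0.60, 0.55, 0.40, 0.45, 0.50,
                0.58, 0.62, 0.66, 0.52, 0.47, 0.60)
)

b <- nlevels(toy$Group)      # number of groups
k <- nlevels(toy$Condition)  # number of conditions
N <- nlevels(toy$id)         # number of subjects

# ave() returns, for every row, the mean of the group that row belongs to
grand   <- mean(toy$Velocity)
m_group <- ave(toy$Velocity, toy$Group)
m_cond  <- ave(toy$Velocity, toy$Condition)
m_cell  <- ave(toy$Velocity, toy$Group, toy$Condition)
m_subj  <- ave(toy$Velocity, toy$id)

# Summing over rows supplies the multipliers (conditions or subjects per mean)
ss_total <- sum((toy$Velocity - grand)^2)
ss_group <- sum((m_group - grand)^2)                         # Group
ss_subj  <- sum((m_subj - m_group)^2)                        # subjects within groups
ss_cond  <- sum((m_cond - grand)^2)                          # Condition
ss_int   <- sum((m_cell - m_group - m_cond + grand)^2)       # Group x Condition
ss_resid <- sum((toy$Velocity - m_cell - m_subj + m_group)^2) # within-subjects error

df_group <- b - 1
df_subj  <- N - b
df_cond  <- k - 1
df_int   <- (b - 1) * (k - 1)
df_resid <- (N - b) * (k - 1)

# F-ratios: each effect over its own error term
f_group <- (ss_group / df_group) / (ss_subj / df_subj)
f_cond  <- (ss_cond / df_cond) / (ss_resid / df_resid)
f_int   <- (ss_int / df_int) / (ss_resid / df_resid)
c(Group = f_group, Condition = f_cond, Interaction = f_int)

# Cross-check against base R's stratified ANOVA
summary(aov(Velocity ~ Group * Condition + Error(id / Condition), data = toy))
```

Because the toy design is balanced, the five components add up to the total sum of squares. With unequal group sizes, as in the data analyzed in this document, the hand computation no longer matches the type III tests reported by `anova_test()`.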

LIBRARIES TO USE


rstatix: provides a simple and intuitive pipe-friendly framework, coherent with the ‘tidyverse’ design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses.

ggpubr: provides easy-to-use functions for creating and customizing ‘ggplot2’-based publication-ready plots.

## Load the libraries used in this analysis

library(rstatix)
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter
library(ggpubr)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.2.3

The Application

Fatigue can manifest in various ways and impact an individual’s ability to perform tasks effectively.

Fatigue

Fatigue is a condition characterized by a decline in physical and/or cognitive capabilities due to sustained activity, often accompanied by feelings of tiredness, reduced energy levels, and increased effort required to perform tasks.

Groups

Groups in the dataset refer to distinct categories or experimental conditions under which measurements are taken. Each group represents a specific context or scenario that might influence the response variable (velocity).

Velocity

Velocity is a measure of how quickly an object changes its position with respect to time. In this dataset, velocity represents the rate of movement or change in position for individuals under different conditions.

This information can have implications in various fields, such as sports science, occupational health, and human performance optimization, where understanding the impact of fatigue on velocity is critical for making informed decisions and improvements.

# Demo data

hip <- data.frame(
  stringsAsFactors = FALSE,
                id = c(1L,2L,3L,4L,5L,6L,7L,8L,
                       9L,10L,11L,12L,13L,14L,15L,16L,17L,18L,19L,
                       20L,21L),
             Group = c("LOW","LOW","LOW","LOW",
                       "LOW","LOW","LOW","LOW","LOW","LOW","HIGH","HIGH",
                       "HIGH","HIGH","HIGH","HIGH","HIGH","HIGH","HIGH",
                       "HIGH","HIGH"),
       Non_Fatigue = c(0.54,0.35,0.69,0.6,0.5,
                       0.56,0.72,0.3,0.56,0.63,0.4,0.46,0.35,0.7,0.54,
                       0.46,0.35,0.39,0.62,0.52,0.45),
           Fatigue = c(0.6,0.38,0.82,0.5,0.51,
                       0.68,0.73,0.38,0.7,0.54,0.62,0.37,0.32,0.85,0.73,
                       0.49,0.56,0.29,0.79,0.54,0.48)
)

head(hip)
##   id Group Non_Fatigue Fatigue
## 1  1   LOW        0.54    0.60
## 2  2   LOW        0.35    0.38
## 3  3   LOW        0.69    0.82
## 4  4   LOW        0.60    0.50
## 5  5   LOW        0.50    0.51
## 6  6   LOW        0.56    0.68
# Transform data into long format

hip <- hip %>%
  gather(key = "Condition", value = "Velocity", Non_Fatigue, Fatigue) %>%
  convert_as_factor(id, Condition)

head(hip)
##   id Group   Condition Velocity
## 1  1   LOW Non_Fatigue     0.54
## 2  2   LOW Non_Fatigue     0.35
## 3  3   LOW Non_Fatigue     0.69
## 4  4   LOW Non_Fatigue     0.60
## 5  5   LOW Non_Fatigue     0.50
## 6  6   LOW Non_Fatigue     0.56
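For readers who prefer not to depend on `gather()`, the same wide-to-long reshaping can be sketched with base R's `reshape()`. The example below is illustrative only: it uses four subjects copied from the demo data (ids 1, 2, 11 and 12), and the object names `wide` and `long` are not part of the original analysis.

```r
# Four subjects taken from the demo data (ids 1, 2, 11, 12), in wide format
wide <- data.frame(
  id          = c(1L, 2L, 11L, 12L),
  Group       = c("LOW", "LOW", "HIGH", "HIGH"),
  Non_Fatigue = c(0.54, 0.35, 0.40, 0.46),
  Fatigue     = c(0.60, 0.38, 0.62, 0.37)
)

# Stack the two measurement columns into a single Velocity column and
# record the name of the source column in Condition
long <- reshape(
  wide,
  direction = "long",
  varying   = c("Non_Fatigue", "Fatigue"),
  v.names   = "Velocity",
  timevar   = "Condition",
  times     = c("Non_Fatigue", "Fatigue"),
  idvar     = "id"
)

# Tidy up: drop the generated row names and convert to factors
rownames(long) <- NULL
long$id        <- factor(long$id)
long$Condition <- factor(long$Condition)

long
```

Each subject now contributes one row per condition, which is the layout the ANOVA functions expect: one column for the response, one for the subject identifier, and one per factor.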
hip %>%
  group_by(Group, Condition) %>%
  get_summary_stats(Velocity, type = "mean_sd")
## # A tibble: 4 × 6
##   Group Condition   variable     n  mean    sd
##   <chr> <fct>       <fct>    <dbl> <dbl> <dbl>
## 1 HIGH  Fatigue     Velocity    11 0.549 0.186
## 2 HIGH  Non_Fatigue Velocity    11 0.476 0.111
## 3 LOW   Fatigue     Velocity    10 0.584 0.148
## 4 LOW   Non_Fatigue Velocity    10 0.545 0.134

HYPOTHESIS


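Between-Groups Hypotheses (Main Effect of Group):

  • H0: mean Velocity is the same in the LOW and HIGH groups.
  • H1: mean Velocity differs between the LOW and HIGH groups.

Within-Subjects Hypotheses (Main Effect of Condition):

  • H0: mean Velocity is the same in the Non_Fatigue and Fatigue conditions.
  • H1: mean Velocity differs between the Non_Fatigue and Fatigue conditions.

Interaction Effect Hypotheses (Between-Groups and Within-Subjects Interaction):

  • H0: the effect of Condition on mean Velocity is the same in both groups (no Group by Condition interaction).
  • H1: the effect of Condition on mean Velocity differs between the groups.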

EXPLORATORY ANALYSIS


bxp <- ggboxplot(
  hip, x = "Group", y = "Velocity",
  color = "Condition", palette = "jco"
  )

bxp

# Create boxplot and highlight paired data points

bxp <- ggpaired(
  hip, x = "Condition", y = "Velocity", id = "id",
  line.color = "gray", linetype = "dashed"
  )

bxp


ASSUMPTIONS


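The mixed ANOVA makes the following assumptions about the data:

  • Independence of the observations between subjects.
  • No significant outliers in any cell of the design.
  • Normality of the outcome in each cell of the design.
  • Homogeneity of variances of the between-subjects factor at each level of the within-subjects factor.
  • Sphericity of the within-subjects factor (met automatically here, because Condition has only two levels).
  • Homogeneity of covariances across the groups.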

CHECK ASSUMPTIONS


hip %>%
  group_by(Group,Condition) %>%
  identify_outliers(Velocity)
## # A tibble: 1 × 6
##   Group Condition   id    Velocity is.outlier is.extreme
##   <chr> <fct>       <fct>    <dbl> <lgl>      <lgl>     
## 1 LOW   Non_Fatigue 8          0.3 TRUE       FALSE
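The flagged value can be checked by hand with the usual boxplot rule: a value is an outlier when it lies more than 1.5 times the interquartile range beyond the quartiles, and extreme when it lies more than 3 times that range beyond them. The base R sketch below applies this rule to the LOW / Non_Fatigue cell. It assumes that the default `quantile()` definition matches the quartiles used by `identify_outliers()`; the variable names are illustrative.

```r
# Non_Fatigue velocities of the LOW group (ids 1 to 10), copied from the demo data
x <- c(0.54, 0.35, 0.69, 0.60, 0.50, 0.56, 0.72, 0.30, 0.56, 0.63)

q   <- quantile(x, probs = c(0.25, 0.75))  # first and third quartiles
iqr <- unname(q[2] - q[1])                 # interquartile range

# Boxplot rule: beyond 1.5 * IQR is an outlier, beyond 3 * IQR is extreme
is_outlier <- x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
is_extreme <- x < q[1] - 3 * iqr   | x > q[2] + 3 * iqr

x[is_outlier]      # 0.3, the value of subject 8
which(is_outlier)  # 8
any(is_extreme)    # FALSE
```

Since the point is an outlier but not an extreme one, it is reasonable to keep it in the analysis, which is what the remainder of this document does.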
hip %>%
  group_by(Group,Condition) %>%
  shapiro_test(Velocity)
## # A tibble: 4 × 5
##   Group Condition   variable statistic     p
##   <chr> <fct>       <chr>        <dbl> <dbl>
## 1 HIGH  Fatigue     Velocity     0.957 0.729
## 2 HIGH  Non_Fatigue Velocity     0.925 0.359
## 3 LOW   Fatigue     Velocity     0.952 0.690
## 4 LOW   Non_Fatigue Velocity     0.927 0.418

Velocity was normally distributed in every cell of the design (each combination of Group and Condition), as assessed by the Shapiro-Wilk test (p > 0.05).

## Homogeneity of variance assumption

hip %>%
  group_by(Condition)%>%
  levene_test(Velocity ~ Group)
## # A tibble: 2 × 5
##   Condition     df1   df2 statistic     p
##   <fct>       <int> <int>     <dbl> <dbl>
## 1 Fatigue         1    19     0.331 0.572
## 2 Non_Fatigue     1    19     0.136 0.716

Levene’s test is not significant for either condition (p > 0.05). Therefore, we can assume homogeneity of variances between the groups.

## Homogeneity of covariances assumption
## Compute Box’s M-test:

box_m(hip[, "Velocity", drop = FALSE], hip$Group)
## # A tibble: 1 × 4
##   statistic p.value parameter method                                            
##       <dbl>   <dbl>     <dbl> <chr>                                             
## 1     0.202   0.653         1 Box's M-test for Homogeneity of Covariance Matric…

There was homogeneity of covariances, as assessed by Box’s test of equality of covariance matrices (p = 0.653).

ANOVA MODEL


# Compute ANOVA
res.aov <- anova_test(data = hip, dv = Velocity, wid = id,
                       between = Group, within = Condition)
res.aov
## ANOVA Table (type III tests)
## 
##            Effect DFn DFd     F     p p<.05   ges
## 1           Group   1  19 0.735 0.402       0.033
## 2       Condition   1  19 5.975 0.024     * 0.038
## 3 Group:Condition   1  19 0.545 0.470       0.004

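where DFn and DFd are the numerator and denominator degrees of freedom, F is the F-statistic, p is the p-value, and ges is the generalized eta squared (an effect size).

From the output above, there is no statistically significant two-way interaction between Group and Condition on Velocity, F(1, 19) = 0.545, p = 0.470. The main effect of Condition is significant, F(1, 19) = 5.975, p = 0.024, whereas the main effect of Group is not, F(1, 19) = 0.735, p = 0.402.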

## Alternatively, the same model can be written with a formula:

res.aov3<- hip%>%
           anova_test( Velocity ~ Group*Condition + Error(id/Condition) )
                       
res.aov3
## ANOVA Table (type III tests)
## 
##            Effect DFn DFd     F     p p<.05   ges
## 1           Group   1  19 0.735 0.402       0.033
## 2       Condition   1  19 5.975 0.024     * 0.038
## 3 Group:Condition   1  19 0.545 0.470       0.004
## For comparison: an ordinary two-way ANOVA, which ignores the repeated measures

res.aov2 <- hip %>% anova_test(Velocity ~ Group * Condition)
res.aov2
## ANOVA Table (type II tests)
## 
##            Effect DFn DFd     F     p p<.05   ges
## 1           Group   1  38 1.286 0.264       0.033
## 2       Condition   1  38 1.544 0.222       0.039
## 3 Group:Condition   1  38 0.136 0.714       0.004
# Visual report
# Show the report for the within-subject variable, here "Condition"
# Corresponding to the row number 2 in the ANOVA table output

bxp +
  labs(subtitle = get_test_label(res.aov, row = 2, detailed = TRUE))

Post-hoc Test


Performing pairwise paired t-tests

# pairwise comparisons
pwc <- hip %>%
  group_by(Group) %>%
  pairwise_t_test(
    Velocity ~ Condition, paired = TRUE,
    p.adjust.method = "bonferroni"
    )

pwc
## # A tibble: 2 × 11
##   Group .y.      group1  group2       n1    n2 stati…¹    df     p p.adj p.adj…²
## * <chr> <chr>    <chr>   <chr>     <int> <int>   <dbl> <dbl> <dbl> <dbl> <chr>  
## 1 HIGH  Velocity Fatigue Non_Fati…    11    11    2.02    10 0.071 0.071 ns     
## 2 LOW   Velocity Fatigue Non_Fati…    10    10    1.45     9 0.18  0.18  ns     
## # … with abbreviated variable names ¹​statistic, ²​p.adj.signif

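However, because the interaction was not significant, the comparisons within each group need not be interpreted; it is enough to follow up the significant main effect of Condition, pooling over the two groups.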

# pairwise comparisons for condition only
pwc <- hip %>%
  pairwise_t_test(
    Velocity ~ Condition, paired = TRUE,
    p.adjust.method = "bonferroni"
    )

pwc
## # A tibble: 1 × 10
##   .y.      group1  group2         n1    n2 statistic    df     p p.adj p.adj.s…¹
## * <chr>    <chr>   <chr>       <int> <int>     <dbl> <dbl> <dbl> <dbl> <chr>    
## 1 Velocity Fatigue Non_Fatigue    21    21      2.51    20 0.021 0.021 *        
## # … with abbreviated variable name ¹​p.adj.signif
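As a cross-check, the pooled comparison can be reproduced with base R's `t.test()`. The sketch below copies the velocities from the demo data in wide form and contrasts a paired test with an unpaired (Welch) test. The paired test should reproduce the statistic of about 2.51 on 20 degrees of freedom reported above. The unpaired test treats the two conditions as independent samples, which is essentially what the ordinary two-way ANOVA did, and illustrates why ignoring the repeated measures loses power. The object names are illustrative.

```r
# Velocities copied from the demo data, one entry per subject (ids 1 to 21)
non_fatigue <- c(0.54, 0.35, 0.69, 0.60, 0.50, 0.56, 0.72, 0.30, 0.56, 0.63,
                 0.40, 0.46, 0.35, 0.70, 0.54, 0.46, 0.35, 0.39, 0.62, 0.52,
                 0.45)
fatigue     <- c(0.60, 0.38, 0.82, 0.50, 0.51, 0.68, 0.73, 0.38, 0.70, 0.54,
                 0.62, 0.37, 0.32, 0.85, 0.73, 0.49, 0.56, 0.29, 0.79, 0.54,
                 0.48)

# Paired test: based on each subject's own difference (Fatigue - Non_Fatigue)
paired <- t.test(fatigue, non_fatigue, paired = TRUE)

# Unpaired (Welch) test: treats the two conditions as independent samples
unpaired <- t.test(fatigue, non_fatigue)

# Test statistic and degrees of freedom of the paired test
round(c(t = unname(paired$statistic), df = unname(paired$parameter)), 2)

# p-values side by side
c(paired = paired$p.value, unpaired = unpaired$p.value)
```

The paired test removes the stable differences between subjects from the error term, so the same mean difference is judged against much less noise.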

CONCLUSION


The key features of Mixed ANOVA are:

In summary, Mixed ANOVA is a powerful tool for analyzing data with both between-subjects and within-subjects factors, making it particularly suitable for experiments where you want to examine how different conditions affect participants over time or across different groups. It allows researchers to uncover nuanced insights into the combined effects of various factors on the dependent variable, facilitating a more comprehensive understanding of the underlying relationships in the data.