Analysis of Variance (ANOVA) is a powerful statistical tool used to determine whether categorical factors significantly influence a continuous numerical outcome. It helps identify whether differences in the outcome are statistically significant across various factor levels or treatments. For example, ANOVA can be applied to assess whether student gender or computer ownership impacts educational achievement.
A fundamental assumption of ANOVA is the independence of observations. In our student example, each student is surveyed once, ensuring that their responses are independent of one another. However, standard ANOVA does not account for changes over time. When the same individuals are measured repeatedly, their responses become correlated-meaning a student’s response at a later time depends on their previous responses. This violates the ANOVA assumption of independence.
To analyze time-dependent data, Repeated Measures ANOVA is used. This approach accounts for correlations within subjects over time, allowing for a more accurate analysis of longitudinal data. This tutorial demonstrates how to perform Repeated Measures ANOVA using the R programming language. A basic understanding of statistics, regression modeling and R is recommended. To get started, watch the introductory video.
Researched and engineered at Decision Analytics
Hub
by Mauricio Claudio | DecisionAnalyticsHub@proton.me
The data for this tutorial is a simulated weight loss study involving 100 participants with one half or fifty participants dieting, and one half or fifty participants fasting over a period of eight weeks. The Weight of each Participant was measured four times, at points Week 0, 2, 4 and 8, during the study. The primary research question is whether the two Regimen treatments - Dieting or Fasting - result in different weight loss, and if so, which treatment is more effective.
The data are summarized and plotted on the right.
No | Variable | Stats / Values | Freqs (% of Valid) | Valid | Missing | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Week [factor] |
|
|
400 (100.0%) | 0 (0.0%) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | Regimen [factor] |
|
|
400 (100.0%) | 0 (0.0%) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | Participant [factor] |
|
|
400 (100.0%) | 0 (0.0%) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
4 | Weight [numeric] |
|
400 distinct values | 400 (100.0%) | 0 (0.0%) |
Generated by summarytools 1.0.1 (R version 4.1.0)
2025-02-26
The assumptions of Repeated Measures ANOVA are:
The boxplots and density plots in the Data page show no substantial number of outliers in the data, satisfying this assumption.
To assess the normality of the response variable Weight, we show QQ-plots for each of the eight factor combinations. No deviation from normality is noted in any of the plots.
Sphericity refers to the variance of the differences in time in the response variable. To test for it visually, we take the difference of the response variable across time periods. Since there are four time periods, Week 0, 2, 4 and 8, we have to calculate this difference for 4 choose 2 or six periods. We draw a boxplot for each period corresponding to differences in the response variable for the period, and compare the height or IQR of the boxplots to see whether variance is constant across periods. In this case, we see that variance is not constant, as a couple of boxplots have short, squat IQR while others have tall, extended IQR, pointing to unequal variance in time or non-sphericity.
We can also test for sphericity using Mauchly’s test. To do
so, we use the anova_test()
function from the rstatix library. A p-value below
0.05 means rejection of the null hypothesis that variance is constant
across differences of the response variabel in time. Here, Mauchly’s
test confirms the earlier finding from the boxplots that the data
is not spheric, and therefore that this assumption of Repeated Measures
ANOVA is not satisfied.
anova_test(data = WeightLossData,
dv = Weight, # Response variable
wid = Participant, # Random effect
within = Week)$`Mauchly's Test for Sphericity`
Effect W p p<.05
1 Week 0.046 8.56e-63 *
So, is absence of sphericity a show-stopper in Repeated Measures ANOVA? No. Non-sphericity can be corrected in one of two ways. The first is via corrections in the degrees of freedom in the ANOVA calculation, resulting in adjustments to F-values, after fitting the model. These post hoc corrections are beyond the scope of this tutorial, and have been increasingly superseded by a second way to account for non-sphericity. This second way is to correct for non-sphericity by specifying the model covariate structure prior to fitting the model. This is the method described in this tutorial.
Before we continue, we must understand that, regardless of the method employed, non-sphericity needs to be corrected lest the resulting parameter estimates increase Type I errors (‘false positives’), leading us to reject the null hypothesis when we ought to admit it.
To fit a Repeated Measures ANOVA model in R, we use the
gls()
function from the nmle library. A first,
essential step in a fitting the model is to determine which time
correlation structure or covariate structure best fits the data. To do so, we
fit the model with a few covariate structures and chose the best one
based on model fit measured by AIC, BIC and LogLikelihood. A list of the
correlation structure classes available with the gls()
function is found here.
In specifying the model, note two things. One, the ‘subject’ or
experimental unit for which repeated measures are taken is specified in
the corr parameter. Two, the experimental units, the
participants, are modelled as a Random Effect, denoted by the numeral
one followed by bar, in the 1 | Participant
specification.
Why? One reason is that we are dealing with a sample of participants
rather than the population. Another reason is that we are not interested
in the individual change in weight of each participant, but rather we
are interested in accounting for the variability of the
participants as a collective or group. To do so in the model, they are
specified as a Random Effect.
### Covariate structure with no correlation
mod.nostruct = gls(Weight ~ Regimen + Week + Regimen*Week,
corr=corSymm(, form= ~ 1 | Participant),
weights = varIdent(form = ~ 1 | Week))
### Covariate structure with 'compound symmetry' or constant, non-zero correlation
mod.compsym = gls(Weight ~ Regimen + Week + Regimen*Week,
corr=corCompSymm(, form= ~ 1 | Participant))
### Covariate structure with time-series AR(1) or correlation of order 1
mod.ar1 = gls(Weight ~ Regimen + Week + Regimen*Week,
corr=corAR1(, form= ~ 1 | Participant))
anova(mod.nostruct, mod.compsym, mod.ar1) ### Compare the three models
corrSymm()
specifies a covariate structure with no
structure or no correlation along measures. This is identical to
specifying an OLS model that does not take time repeated measures into
account. corrCompSymm
specifies a covariate structure with
compound symmetry or constant, non-zero correlation along
measures.
corAR1
specifies a covariate structure with correlation
of order one (1) along measures, meaning that correlations decay to the
power of time difference. The greater the time difference along
measures, the smaller the correlation. For example, if the correlation
between measures at times 1 and 2 is, \(0.80^{2-1=1}\), then the correlation
between measures at times 1 and 3 is \(0.80^{3-1=2} = 0.64\), and the correlation
between measures at times 1 and 4 is \(0.80^{4-1=3} = 0.51\). This decay in
correlation along measures is usually a more realistic assumption that
either no correlation or constant correlation. The anova()
function compares the three models.
In the model comparison, the AR(1) covariance structure model returns both lower AIC and BIC scores than the two competitors, suggesting that it offers the best fit among the models tested. The differences among the models are not stark, but confirm the earlier finding from the OLS model ACF/PACF plots that the residuals have an AR(1) structure. Hence, we select the AR(1) covariate structure as the best fit for the data.
Model df AIC BIC logLik Test L.Ratio p-value
mod.nostruct 1 18 2220.687 2292.170 -1092.344
mod.compsym 2 10 2283.248 2322.960 -1131.624 1 vs 2 78.56038 <.0001
mod.ar1 3 10 2209.081 2248.794 -1094.541
The ANOVA table for the AR(1) model shows that all terms, including the interaction term, are significant at 95% confidence, pointing to the statistically significant impact of the factors of time and regimen, as well as the interaction of the two, on weight.
Model diagnostics are satisfactory, satisfying the assumptions of constant variance and normality of the residuals. We are good to go and interpret the results.
numDF | F-value | p-value | |
---|---|---|---|
(Intercept) | 1 | 6030.930 | 0.000 |
Regimen | 1 | 6.026 | 0.015 |
Week | 3 | 1614.084 | 0.000 |
Regimen:Week | 3 | 157.393 | 0.000 |
The significance of the Regimen:Week interaction term means that changes in weight among participants depends on both the regimen type and duration. Therefore, we will not interpret the single, main effect terms separately.
The results show that mean weights between dieting and fasting are not statistically different at weeks 0, 2 and 4. At week 8, there is a statistically significant difference in mean weights among participants between the two regimens. Statistical differences can be identified by non-overlapping estimate confidence intervals. The Means tab displays these results in graphical and tabular form.
Two factor levels for Regimen and four factor levels for Week means that there are 8 choose 2 or 28 pair-wise comparisons or contrasts for the Regimen:Week interaction term. Ten of 28 constrasts are not significant at 95% confidence. The 18 significant constrasts are shown in graphical and tabular form in the Constrasts tab. Among other results, the data shows that there is a statistically significant difference of 18.2 pounds between an eight-week diet and an eight-week fast.
Regimen | Week | Mean | lower.CL | upper.CL |
---|---|---|---|---|
Dieting | 0 | 181 | 175 | 187 |
Fasting | 0 | 178 | 172 | 184 |
Dieting | 2 | 177 | 171 | 183 |
Fasting | 2 | 170 | 164 | 177 |
Dieting | 4 | 173 | 167 | 179 |
Fasting | 4 | 162 | 156 | 168 |
Dieting | 8 | 165 | 158 | 171 |
Fasting | 8 | 146 | 140 | 152 |
contrast | Mean | lower.CL | upper.CL | p.value |
---|---|---|---|---|
Dieting Week0 - Fasting Week8 | 34.708 | 20.463 | 48.954 | 0.000 |
Fasting Week0 - Fasting Week8 | 31.660 | 29.984 | 33.336 | 0.000 |
Dieting Week2 - Fasting Week8 | 30.756 | 16.510 | 45.001 | 0.000 |
Dieting Week4 - Fasting Week8 | 26.460 | 12.214 | 40.705 | 0.000 |
Fasting Week2 - Fasting Week8 | 24.024 | 22.653 | 25.394 | 0.000 |
Dieting Week0 - Fasting Week4 | 18.938 | 4.692 | 33.183 | 0.001 |
Dieting Week8 - Fasting Week8 | 18.155 | 3.909 | 32.401 | 0.002 |
Dieting Week0 - Dieting Week8 | 16.553 | 14.877 | 18.230 | 0.000 |
Fasting Week0 - Fasting Week4 | 15.889 | 14.519 | 17.260 | 0.000 |
Fasting Week4 - Fasting Week8 | 15.771 | 14.801 | 16.741 | 0.000 |
Dieting Week2 - Fasting Week4 | 14.985 | 0.739 | 29.230 | 0.029 |
Dieting Week2 - Dieting Week8 | 12.601 | 11.230 | 13.971 | 0.000 |
Dieting Week4 - Dieting Week8 | 8.305 | 7.335 | 9.275 | 0.000 |
Fasting Week2 - Fasting Week4 | 8.253 | 7.283 | 9.223 | 0.000 |
Dieting Week0 - Dieting Week4 | 8.249 | 6.878 | 9.619 | 0.000 |
Fasting Week0 - Fasting Week2 | 7.637 | 6.667 | 8.607 | 0.000 |
Dieting Week2 - Dieting Week4 | 4.296 | 3.326 | 5.266 | 0.000 |
Dieting Week0 - Dieting Week2 | 3.953 | 2.983 | 4.923 | 0.000 |