Intro

Column

Intro

Analysis of Variance (ANOVA) is a powerful statistical tool used to determine whether categorical factors significantly influence a continuous numerical outcome. It helps identify whether differences in the outcome are statistically significant across various factor levels or treatments. For example, ANOVA can be applied to assess whether student gender or computer ownership impacts educational achievement.

A fundamental assumption of ANOVA is the independence of observations. In our student example, each student is surveyed once, ensuring that their responses are independent of one another. However, standard ANOVA does not account for changes over time. When the same individuals are measured repeatedly, their responses become correlated-meaning a student’s response at a later time depends on their previous responses. This violates the ANOVA assumption of independence.

To analyze time-dependent data, Repeated Measures ANOVA is used. This approach accounts for correlations within subjects over time, allowing for a more accurate analysis of longitudinal data. This tutorial demonstrates how to perform Repeated Measures ANOVA using the R programming language. A basic understanding of statistics, regression modeling and R is recommended. To get started, watch the introductory video.

Researched and engineered at Decision Analytics Hub
by Mauricio Claudio | DecisionAnalyticsHub@proton.me

Column

Data

Column

Data

The data for this tutorial is a simulated weight loss study involving 100 participants with one half or fifty participants dieting, and one half or fifty participants fasting over a period of eight weeks. The Weight of each Participant was measured four times, at points Week 0, 2, 4 and 8, during the study. The primary research question is whether the two Regimen treatments - Dieting or Fasting - result in different weight loss, and if so, which treatment is more effective.

The data are summarized and plotted on the right.

Column

Data summary

Data Frame Summary

WeightLossData

Dimensions: 400 x 4
Duplicates: 0

Variable

Stats / Values

Freqs (% of Valid)

Valid

Missing

Week [factor]

1. 0

2. 2

3. 4

4. 8

100	(	25.0%	)
100	(	25.0%	)
100	(	25.0%	)
100	(	25.0%	)

400 (100.0%)

0 (0.0%)

Regimen [factor]

1. Dieting

2. Fasting

200	(	50.0%	)
200	(	50.0%	)

400 (100.0%)

0 (0.0%)

Participant [factor]

1. 1
2. 2
3. 3
4. 4
5. 5
6. 6
7. 7
8. 8
9. 9
10. 10
[ 90 others ]

4	(	1.0%	)
4	(	1.0%	)
4	(	1.0%	)
4	(	1.0%	)
4	(	1.0%	)
4	(	1.0%	)
4	(	1.0%	)
4	(	1.0%	)
4	(	1.0%	)
4	(	1.0%	)
360	(	90.0%	)

400 (100.0%)

0 (0.0%)

Weight [numeric]

Mean (sd) : 169.1 (23.9)

min ≤ med ≤ max:

104.4 ≤ 170.3 ≤ 232.2

IQR (CV) : 31.2 (0.1)

400 distinct values

400 (100.0%)

0 (0.0%)

Generated by summarytools 1.0.1 (R version 4.1.0)
2025-02-26

Weight per Week by Regimen I

Weight per Week by Regimen II

Model

Column

1. Assumptions

Assumptions

The assumptions of Repeated Measures ANOVA are:

Absence of outliers
Normality of the response variable
Constant variance in time or sphericity

Absence of outliers

The boxplots and density plots in the Data page show no substantial number of outliers in the data, satisfying this assumption.

Normality of the response variable

To assess the normality of the response variable Weight, we show QQ-plots for each of the eight factor combinations. No deviation from normality is noted in any of the plots.

Sphericity

Sphericity refers to the variance of the differences in time in the response variable. To test for it visually, we take the difference of the response variable across time periods. Since there are four time periods, Week 0, 2, 4 and 8, we have to calculate this difference for 4 choose 2 or six periods. We draw a boxplot for each period corresponding to differences in the response variable for the period, and compare the height or IQR of the boxplots to see whether variance is constant across periods. In this case, we see that variance is not constant, as a couple of boxplots have short, squat IQR while others have tall, extended IQR, pointing to unequal variance in time or non-sphericity.

We can also test for sphericity using Mauchly’s test. To do so, we use the anova_test() function from the rstatix library. A p-value below 0.05 means rejection of the null hypothesis that variance is constant across differences of the response variabel in time. Here, Mauchly’s test confirms the earlier finding from the boxplots that the data is not spheric, and therefore that this assumption of Repeated Measures ANOVA is not satisfied.

anova_test(data = WeightLossData,
           dv = Weight, # Response variable
           wid = Participant, # Random effect
           within = Week)$`Mauchly's Test for Sphericity`

  Effect     W        p p<.05
1   Week 0.046 8.56e-63     *

So, is absence of sphericity a show-stopper in Repeated Measures ANOVA? No. Non-sphericity can be corrected in one of two ways. The first is via corrections in the degrees of freedom in the ANOVA calculation, resulting in adjustments to F-values, after fitting the model. These post hoc corrections are beyond the scope of this tutorial, and have been increasingly superseded by a second way to account for non-sphericity. This second way is to correct for non-sphericity by specifying the model covariate structure prior to fitting the model. This is the method described in this tutorial.

Before we continue, we must understand that, regardless of the method employed, non-sphericity needs to be corrected lest the resulting parameter estimates increase Type I errors (‘false positives’), leading us to reject the null hypothesis when we ought to admit it.

2. Correlated residuals

Correlated residuals

Recall that for each participant in the study four measures of weight in time were taken, and therefore they are correlated. Let’s check whether this is true. To do so we model the data with an Ordinary Least Squares (OLS) regression as if the observations were all independent and check the residuals for correlation or a time-series structure. The ACF and PACF plots for the residuals show that there are significant lags or correlations. The tall spike at lag1 on the PACF and the tapering sinusoidal on the ACF hint strongly at an AR(1) time-series structure in the residuals, pointing to the unsuitability of an OLS model that does not account for time or repeated measures.

Residual ACF / PACF plot

Before moving on, we need to understand why correlated residuals are problematic. One reason is that they skew our parameter estimates, returning models with less reliable coefficients. A second reason is that patterns in the residuals point to poor model fit. For if the model explained the variation in the data, it would not leave behind significant patterns or discernible variation in the residuals. It would leave instead white noise.

Code

attach(WeightLossData)

### Fit OLS regression model
model.OLS = lm(Weight ~ Regimen + Week + Regimen*Week)

### Extract and plot ACF/PACF of residuals
acf(residuals(model.OLS)) ; pacf(residuals(model.OLS)))

3. Covariate structure

Covariate structure

To fit a Repeated Measures ANOVA model in R, we use the gls() function from the nmle library. A first, essential step in a fitting the model is to determine which time correlation structure or covariate structure best fits the data. To do so, we fit the model with a few covariate structures and chose the best one based on model fit measured by AIC, BIC and LogLikelihood. A list of the correlation structure classes available with the gls() function is found here.

Code

In specifying the model, note two things. One, the ‘subject’ or experimental unit for which repeated measures are taken is specified in the corr parameter. Two, the experimental units, the participants, are modelled as a Random Effect, denoted by the numeral one followed by bar, in the 1 | Participant specification. Why? One reason is that we are dealing with a sample of participants rather than the population. Another reason is that we are not interested in the individual change in weight of each participant, but rather we are interested in accounting for the variability of the participants as a collective or group. To do so in the model, they are specified as a Random Effect.

### Covariate structure with no correlation
mod.nostruct = gls(Weight ~ Regimen + Week + Regimen*Week,
                   corr=corSymm(, form= ~ 1 | Participant),
                   weights = varIdent(form = ~ 1 | Week))

### Covariate structure with 'compound symmetry' or constant, non-zero correlation
mod.compsym = gls(Weight ~ Regimen + Week + Regimen*Week,
                  corr=corCompSymm(, form= ~ 1 | Participant))

### Covariate structure with time-series AR(1) or correlation of order 1
mod.ar1 = gls(Weight ~ Regimen + Week + Regimen*Week,
              corr=corAR1(, form= ~ 1 | Participant))

anova(mod.nostruct, mod.compsym, mod.ar1) ### Compare the three models

corrSymm() specifies a covariate structure with no structure or no correlation along measures. This is identical to specifying an OLS model that does not take time repeated measures into account. corrCompSymm specifies a covariate structure with compound symmetry or constant, non-zero correlation along measures.

corAR1 specifies a covariate structure with correlation of order one (1) along measures, meaning that correlations decay to the power of time difference. The greater the time difference along measures, the smaller the correlation. For example, if the correlation between measures at times 1 and 2 is, \(0.80^{2-1=1}\), then the correlation between measures at times 1 and 3 is \(0.80^{3-1=2} = 0.64\), and the correlation between measures at times 1 and 4 is \(0.80^{4-1=3} = 0.51\). This decay in correlation along measures is usually a more realistic assumption that either no correlation or constant correlation. The anova() function compares the three models.

Model comparison

In the model comparison, the AR(1) covariance structure model returns both lower AIC and BIC scores than the two competitors, suggesting that it offers the best fit among the models tested. The differences among the models are not stark, but confirm the earlier finding from the OLS model ACF/PACF plots that the residuals have an AR(1) structure. Hence, we select the AR(1) covariate structure as the best fit for the data.

             Model df      AIC      BIC    logLik   Test  L.Ratio p-value
mod.nostruct     1 18 2220.687 2292.170 -1092.344                        
mod.compsym      2 10 2283.248 2322.960 -1131.624 1 vs 2 78.56038  <.0001
mod.ar1          3 10 2209.081 2248.794 -1094.541

4. Model diagnostics

Model diagnostics

The ANOVA table for the AR(1) model shows that all terms, including the interaction term, are significant at 95% confidence, pointing to the statistically significant impact of the factors of time and regimen, as well as the interaction of the two, on weight.

Model diagnostics are satisfactory, satisfying the assumptions of constant variance and normality of the residuals. We are good to go and interpret the results.

AR(1) model ANOVA

	numDF	F-value	p-value
(Intercept)	1	6030.930	0.000
Regimen	1	6.026	0.015
Week	3	1614.084	0.000
Regimen:Week	3	157.393	0.000

Model fit diagnostics

Column

Results

Column

Results

The significance of the Regimen:Week interaction term means that changes in weight among participants depends on both the regimen type and duration. Therefore, we will not interpret the single, main effect terms separately.

The results show that mean weights between dieting and fasting are not statistically different at weeks 0, 2 and 4. At week 8, there is a statistically significant difference in mean weights among participants between the two regimens. Statistical differences can be identified by non-overlapping estimate confidence intervals. The Means tab displays these results in graphical and tabular form.

Two factor levels for Regimen and four factor levels for Week means that there are 8 choose 2 or 28 pair-wise comparisons or contrasts for the Regimen:Week interaction term. Ten of 28 constrasts are not significant at 95% confidence. The 18 significant constrasts are shown in graphical and tabular form in the Constrasts tab. Among other results, the data shows that there is a statistically significant difference of 18.2 pounds between an eight-week diet and an eight-week fast.

Column

Means

Regimen	Week	Mean	lower.CL	upper.CL
Dieting	0	181	175	187
Fasting	0	178	172	184
Dieting	2	177	171	183
Fasting	2	170	164	177
Dieting	4	173	167	179
Fasting	4	162	156	168
Dieting	8	165	158	171
Fasting	8	146	140	152

Contrasts

contrast	Mean	lower.CL	upper.CL	p.value
Dieting Week0 - Fasting Week8	34.708	20.463	48.954	0.000
Fasting Week0 - Fasting Week8	31.660	29.984	33.336	0.000
Dieting Week2 - Fasting Week8	30.756	16.510	45.001	0.000
Dieting Week4 - Fasting Week8	26.460	12.214	40.705	0.000
Fasting Week2 - Fasting Week8	24.024	22.653	25.394	0.000
Dieting Week0 - Fasting Week4	18.938	4.692	33.183	0.001
Dieting Week8 - Fasting Week8	18.155	3.909	32.401	0.002
Dieting Week0 - Dieting Week8	16.553	14.877	18.230	0.000
Fasting Week0 - Fasting Week4	15.889	14.519	17.260	0.000
Fasting Week4 - Fasting Week8	15.771	14.801	16.741	0.000
Dieting Week2 - Fasting Week4	14.985	0.739	29.230	0.029
Dieting Week2 - Dieting Week8	12.601	11.230	13.971	0.000
Dieting Week4 - Dieting Week8	8.305	7.335	9.275	0.000
Fasting Week2 - Fasting Week4	8.253	7.283	9.223	0.000
Dieting Week0 - Dieting Week4	8.249	6.878	9.619	0.000
Fasting Week0 - Fasting Week2	7.637	6.667	8.607	0.000
Dieting Week2 - Dieting Week4	4.296	3.326	5.266	0.000
Dieting Week0 - Dieting Week2	3.953	2.983	4.923	0.000