Crossover experiment

CLIENT

Brooke Fujioka

DATE

August 6, 2025

From client:

Our study looks at the effects of creatine, carbohydrates and creatine-carbohydrate mix on endurance performance. We have 10 participants take a placebo, and the four treatments listed above with a 3 week washout [period] between each trial. Each trial was a ride to exhaustion, meaning a fixed workload of 65% of Vo2 max until they couldn’t ride any more.

Data provided:

id	group	time
1	P	6546
1	CR	5264
1	CHO	6370
1	CR-CHO	6240
2	P	4515
2	CR	7320

Request:

Assistance with model specification to test for a treatment effect
Review provided analysis codes for accuracy

Assessment

The treatments have factorial structure, so can be encoded in terms of two factors (instead of a single “treatment” variable). This will facilitate estimating effects of CR and CHO separately as well as their interaction. I’d also suggest recording time in minutes instead of seconds for ease of interpretation.

subj.id	cr	cho	time.min
1	N	N	109.1
1	Y	N	87.73
1	N	Y	106.2
1	Y	Y	104
2	N	N	75.25
2	Y	N	122

Visually there’s no obvious treatment effect – most rides last around 120 minutes regardless of treatment group.

Ignoring crossover

The usual model without considering the crossover design would be:

\[ \text{time}_{ijk} = \mu + \text{cr}_i + \text{cho}_j + (\text{cr}\times\text{cho})_{ij} + \epsilon_{ijk} \quad \begin{cases} \text{treatments } i, j = 1, 2 \\ \text{subjects } k = 1, \dots, 10 \\ \epsilon_{ijk} \stackrel{\text{iid}}{\sim} N(0, \sigma^2) \end{cases} \] It’s often helpful to fit this for assessment purposes.

Code

# fit 
fit.lm <- lm(time.min ~ cr*cho, data = clean)
fit.lm


Call:
lm(formula = time.min ~ cr * cho, data = clean)

Coefficients:
(Intercept)          crY         choY     crY:choY  
    116.028       -9.208        2.905       10.220

The coefficient estimates give comparisons with the control:

(Intercept): mean ride time for the control group
crY: the difference in mean ride time on the Creatine treatment relative to the control
choY: the difference in mean ride time on the Carbohydrate treatment relative to the control
crY:choY: the difference in mean ride time on the combination treatment relative to the control

Coding the treatments in a two-way structure allows one to check factorial contrasts, which are the average effects of each treatment.

cr: estimated effect of CR averaged over levels of CHO
cho: estimated effect of CHO, averaged over levels of CR
cr.cho: estimated difference in the estimated effect of CHO between levels of CR

Code

# factorial contrasts
fac.eff.emmc <- function(levels, ...){
  data_grid(clean, cr, cho) |>
    bind_cols(cr.contr = 0.5*rep(c(-1, 1), each = 2),
              cho.contr = 0.5*rep(c(-1, 1), 2)) |>
    mutate(cr.cho.contr = (2*cr.contr)*(2*cho.contr)) |>
    arrange(cho, cr) |>
    select(ends_with('contr')) |>
    rename_with(~str_remove_all(.x, '.contr'))
}

fit.lm |>
  emmeans(spec = ~ cr*cho) |>
  contrast('fac.eff') |> 
  broom::tidy() |>
  select(contrast, estimate, std.error, df, statistic, p.value) |>
  pander()

contrast	estimate	std.error	df	statistic	p.value
cr	-4.098	13.19	36	-0.3107	0.7578
cho	8.015	13.19	36	0.6077	0.5472
cr.cho	10.22	26.38	36	0.3875	0.7007

The standard errors don’t account for the crossover design so these tests aren’t appropriate, but they give a preliminary indication that there are likely no treatment effects. Creatine is associated with a slight decrease in mean ride time (estimate: -4m) and carbohydrate with a slight decrease (estimate: +8m). The interaction effect suggests that if carbohydrate is added to the creatine treatment, then the effect on mean ride time changes from -11m to -1m.

Adjusting for within-subject correlations

As a result of the crossover design, one would expect that trials on the same subject would be correlated. We can check this by looking at residuals by subject using the assessment model above. It’s fairly clear that there is some within-subject correlation, since the residuals for each subject are fairly consistent across treatments – e.g., subjects 4 and 5 are consistently riding about 50min faster than average regardless of treatment.

The usual approach here is to add subject as a random effect:

\[ \text{time}_{ijk} = \mu + \text{cr}_i + \text{cho}_j + (\text{cr}\times\text{cho})_{ij} + \gamma_k + \epsilon_{ijk} \quad \begin{cases} \text{treatments } i, j = 1, 2 \\ \text{subjects } k = 1, \dots, 10 \\ \gamma_k \stackrel{\text{iid}}{\sim} N(0, \sigma^2_\gamma) \\ \epsilon_{ijk} \stackrel{\text{iid}}{\sim} N(0, \sigma^2) \\ \gamma_k \perp \epsilon_{ijk} \end{cases} \]

This is the model that the GPT-generated code provided by the client fits. In this model the variance component $\sigma^2_\gamma$ captures within-subject correlations.

Code

# client-provided GPT-generated code
fit.lme <- lmer(time.min ~ cr*cho + (1|subj.id), data = clean)
summary(fit.lme)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: time.min ~ cr * cho + (1 | subj.id)
   Data: clean

REML criterion at convergence: 346.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.7197 -0.4575 -0.0550  0.6184  1.9747 

Random effects:
 Groups   Name        Variance Std.Dev.
 subj.id  (Intercept) 1411.0   37.56   
 Residual              328.4   18.12   
Number of obs: 40, groups:  subj.id, 10

Fixed effects:
            Estimate Std. Error      df t value Pr(>|t|)    
(Intercept)  116.028     13.189  12.104   8.798 1.32e-06 ***
crY           -9.208      8.104  27.000  -1.136    0.266    
choY           2.905      8.104  27.000   0.358    0.723    
crY:choY      10.220     11.461  27.000   0.892    0.380    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
         (Intr) crY    choY  
crY      -0.307              
choY     -0.307  0.500       
crY:choY  0.217 -0.707 -0.707

What you see under Fixed effects in the output provides model estimates that match lm but with different standard errors that have now been adjusted for the within-subject correlation. What you see under Random effects are the variance component estimates $\hat{\sigma}^2_\gamma$ (subj.id) and $\hat{\sigma}^2$ (Residual). The estimated within-subject correlation is:

\[ \frac{\hat{\sigma}^2_\gamma}{\hat{\sigma}^2_\gamma + \hat{\sigma}^2} = 0.81 \]

Notice that without adjusting for within-subject correlations the estimated error standard deviation is $\hat{\sigma} = 41.71$ but that after adjustment using the mixed-effects model the estimate is $\hat{\sigma} = 18.12$. Since these estimates form the denominators for $F$ tests in the ANOVA, including the random effect makes a considerable difference for the statistical inference.

Nonetheless, the proper analysis still does not identify any treatment effects:

Code

# anova using mixed effects model
anova(fit.lme) |>
  pander()

Type III Analysis of Variance Table with Satterthwaite’s method
	Sum Sq	Mean Sq	NumDF	DenDF	F value	Pr(>F)
cr	168	168	1	27	0.5115	0.4806
cho	642.4	642.4	1	27	1.956	0.1733
cr:cho	261.1	261.1	1	27	0.7952	0.3804

The factorial effects are similarly insignificant:

contrast	estimate	SE	df	t.ratio	p.value
cr	-4.098	5.73	27	-0.7152	0.4806
cho	8.015	5.73	27	1.399	0.1733
cr.cho	10.22	11.46	27	0.8917	0.3804

The only thing that comes out significant in this analysis is the effect of subject in the crossover design – data indicate a strong within-subject correlation ($\chi^2_1 = 33.91$, $p < 0.0001$).

Code

# test for effect of subject
ranova(fit.lme)

ANOVA-like table for random-effects: Single term deletions

Model:
time.min ~ cr + cho + (1 | subj.id) + cr:cho
              npar  logLik    AIC   LRT Df Pr(>Chisq)    
<none>           6 -173.03 358.07                        
(1 | subj.id)    5 -189.99 389.98 33.91  1  5.772e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Summary

The codes provided fit a reasonable model that should indeed be used instead of a standard ANOVA assuming independent observations. The specification is explained in detail above. None of the treatment effects are significant, but there is a strong effect of subject in the crossover design.

--- title: "Crossover experiment" author: "Brooke Fujioka" author-title: "CLIENT" date: "8/6/25" published-title: "DATE" format: html: code-fold: true code-tools: true toc: true execute: warning: false message: false echo: false --- From client: > Our study looks at the effects of creatine, carbohydrates and creatine-carbohydrate mix on endurance performance. We have 10 participants take a placebo, and the four treatments listed above with a 3 week washout [period] between each trial. Each trial was a ride to exhaustion, meaning a fixed workload of 65% of Vo2 max until they couldn’t ride any more. Data provided: ```{r} library(tidyverse) library(lme4) library(lmerTest) library(emmeans) library(modelr) library(pander) # read in data raw <- readxl::read_excel('CreatineCarbohydrate-EnduranceTest Data_2025 - Copy.xlsx', sheet = 'Sheet1') |> select(ID, Group, Time) |> rename_with(tolower) # preview raw |> head() |> pander() ``` Request: 1. Assistance with model specification to test for a treatment effect 2. Review provided analysis codes for accuracy ## Assessment The treatments have factorial structure, so can be encoded in terms of *two* factors (instead of a single "treatment" variable). This will facilitate estimating effects of CR and CHO separately as well as their interaction. I'd also suggest recording time in minutes instead of seconds for ease of interpretation. ```{r} # reformat data clean <- raw |> rename(subj.id = id, time.sec = time, trt = group) |> mutate(time.min = time.sec/60, cr = str_detect(trt, 'CR') |> factor(labels = c('N', 'Y')), cho = str_detect(trt, 'CHO') |> factor(labels = c('N', 'Y'))) |> select(subj.id, cr, cho, time.min) # preview clean |> head() |> pander() ``` Visually there's no obvious treatment effect -- most rides last around 120 minutes regardless of treatment group. ```{r} # no obvious treatment effects clean |> ggplot(aes(x = interaction(cr, cho), y = time.min)) + geom_boxplot() + theme_bw() ``` ### Ignoring crossover The usual model without considering the crossover design would be: $$ \text{time}_{ijk} = \mu + \text{cr}_i + \text{cho}_j + (\text{cr}\times\text{cho})_{ij} + \epsilon_{ijk} \quad \begin{cases} \text{treatments } i, j = 1, 2 \\ \text{subjects } k = 1, \dots, 10 \\ \epsilon_{ijk} \stackrel{\text{iid}}{\sim} N(0, \sigma^2) \end{cases} $$ It's often helpful to fit this for assessment purposes. ```{r, echo = T} # fit fit.lm <- lm(time.min ~ cr*cho, data = clean) fit.lm ``` The coefficient estimates give comparisons with the control: - `(Intercept)`: mean ride time for the control group - `crY`: the difference in mean ride time on the Creatine treatment relative to the control - `choY`: the difference in mean ride time on the Carbohydrate treatment relative to the control - `crY:choY`: the difference in mean ride time on the combination treatment relative to the control Coding the treatments in a two-way structure allows one to check factorial contrasts, which are the *average* effects of each treatment. - `cr`: estimated effect of CR averaged over levels of CHO - `cho`: estimated effect of CHO, averaged over levels of CR - `cr.cho`: estimated difference in the estimated effect of CHO between levels of CR ```{r, echo = T} # factorial contrasts fac.eff.emmc <- function(levels, ...){ data_grid(clean, cr, cho) |> bind_cols(cr.contr = 0.5*rep(c(-1, 1), each = 2), cho.contr = 0.5*rep(c(-1, 1), 2)) |> mutate(cr.cho.contr = (2*cr.contr)*(2*cho.contr)) |> arrange(cho, cr) |> select(ends_with('contr')) |> rename_with(~str_remove_all(.x, '.contr')) } fit.lm |> emmeans(spec = ~ cr*cho) |> contrast('fac.eff') |> broom::tidy() |> select(contrast, estimate, std.error, df, statistic, p.value) |> pander() ``` The standard errors don't account for the crossover design so these tests aren't appropriate, but they give a preliminary indication that there are likely no treatment effects. Creatine is associated with a slight decrease in mean ride time (estimate: -4m) and carbohydrate with a slight decrease (estimate: +8m). The interaction effect suggests that if carbohydrate is added to the creatine treatment, then the effect on mean ride time changes from -11m to -1m. ### Adjusting for within-subject correlations As a result of the crossover design, one would expect that trials on the same subject would be correlated. We can check this by looking at residuals by subject using the assessment model above. It's fairly clear that there is some within-subject correlation, since the residuals for each subject are fairly consistent across treatments -- *e.g.*, subjects 4 and 5 are consistently riding about 50min faster than average regardless of treatment. ```{r} # check for residual correlation clean |> add_residuals(fit.lm) |> add_predictions(fit.lm) |> arrange(cho, cr, subj.id) |> ggplot(aes(x = interaction(cr, cho), y = resid, color = factor(subj.id))) + geom_path(aes(group = factor(subj.id))) + theme_bw() ``` The usual approach here is to add subject as a random effect: $$ \text{time}_{ijk} = \mu + \text{cr}_i + \text{cho}_j + (\text{cr}\times\text{cho})_{ij} + \gamma_k + \epsilon_{ijk} \quad \begin{cases} \text{treatments } i, j = 1, 2 \\ \text{subjects } k = 1, \dots, 10 \\ \gamma_k \stackrel{\text{iid}}{\sim} N(0, \sigma^2_\gamma) \\ \epsilon_{ijk} \stackrel{\text{iid}}{\sim} N(0, \sigma^2) \\ \gamma_k \perp \epsilon_{ijk} \end{cases} $$ This is the model that the GPT-generated code provided by the client fits. In this model the variance component $\sigma^2_\gamma$ captures within-subject correlations. ```{r, echo = T} # client-provided GPT-generated code fit.lme <- lmer(time.min ~ cr*cho + (1|subj.id), data = clean) summary(fit.lme) ``` What you see under `Fixed effects` in the output provides model estimates that match `lm` but with different standard errors that have now been adjusted for the within-subject correlation. What you see under `Random effects` are the variance component estimates $\hat{\sigma}^2_\gamma$ (`subj.id`) and $\hat{\sigma}^2$ (`Residual`). The estimated within-subject correlation is: ```{r} # estimated variance components var.comps <- VarCorr(fit.lme) |> as.data.frame() |> pull(vcov) ``` $$ \frac{\hat{\sigma}^2_\gamma}{\hat{\sigma}^2_\gamma + \hat{\sigma}^2} = `r round(var.comps[1]/sum(var.comps), 2)` $$ Notice that without adjusting for within-subject correlations the estimated error standard deviation is $\hat{\sigma} = `r round(sigma(fit.lm), 2)`$ but that after adjustment using the mixed-effects model the estimate is $\hat{\sigma} = `r round(sigma(fit.lme), 2)`$. Since these estimates form the denominators for $F$ tests in the ANOVA, including the random effect makes a considerable difference for the statistical inference. Nonetheless, the proper analysis still does not identify any treatment effects: ```{r, echo = T} # anova using mixed effects model anova(fit.lme) |> pander() ``` The factorial effects are similarly insignificant: ```{r} # factorial effects using mixed effects model fit.lme |> emmeans(specs = ~cr*cho) |> contrast('fac.eff') |> as_tibble() |> pander() ``` The only thing that comes out significant in this analysis is the effect of subject in the crossover design -- data indicate a strong within-subject correlation ($\chi^2_1 = `r ranova(fit.lme) |> broom::tidy() |> slice(2) |> pull(LRT) |> round(2)`$, $p < 0.0001$). ```{r, echo = T} # test for effect of subject ranova(fit.lme) ```         ## Summary The codes provided fit a reasonable model that should indeed be used instead of a standard ANOVA assuming independent observations. The specification is explained in detail above. None of the treatment effects are significant, but there is a strong effect of subject in the crossover design.