PEFR - A crossover study example to test analysis

library(tidyverse)

── Attaching core tidyverse packages ──── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(janitor)


Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test

library(lme4)

Loading required package: Matrix

Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack

library(lmerTest)


Attaching package: 'lmerTest'

The following object is masked from 'package:lme4':

    lmer

The following object is masked from 'package:stats':

    step

library(geepack)
library(CrossCarry)

Question - what is the best way to analyze a crossover study?

Naive approaches use the difference between the two periods for each subject. However, this approach does not account for the correlation between the two periods for each subject. One way to analyze a crossover study is to use a mixed model with a random effect for the subject, and to include an interaction term between period and treatment. This approach accounts for the correlation between the two periods for each subject and provides more accurate estimates of the treatment effect by detecting and partitioning variance to this interaction.

There are several approaches for mixed models, including the generalized estimating equations (GEE) and the linear mixed model (LMM). The GEE approach is more robust to misspecification of the correlation structure, while the LMM approach provides more efficient estimates when the correlation structure is correctly specified. Both approaches can be used to analyze crossover studies, but the GEE approach is more commonly used in practice.

However, the (relatively new) CrossCarry package actually tries to estimate the carryover effect, and (I will show below) provides different effect sizes and p values with the same data. This raises the question - which is correct?

The mixed models (which mostly agree) or the CrossCarry package (which is quite different)?

We will start with the PEFR dataset from Jones and Kenward (chapter 2 in 2nd edition of Design and Analysis of Cross-Over Trials). We will first create the dataset, then check the dataset, then use lmer to analyse the data, followed by geepack, then CrossCarry.

Creating the PEFR dataset from Jones and Kenward

This book describes in chapter 2 a crossover study than randomized 27 subjects to the AB sequence and 29 subjects to the BA sequence and measured PEFR in 112 distinct measurements. A = drug, B = placebo. After eligibility was confirmed, subjects went through a 14 day run-in period. They were then randomized to receive the inhaled drug (A) or placebo (B) for 4 weeks. After 4 weeks, they switched to the other treatment (with no washout) for another 4 weeks. Actually these PEFR values presented in the book are averages of PEFR measurements by subjects over 4 weeks. The subjects were supposed to measure PEFR (3 times in AM, 3 times in PM, highest measurement recorded on record cards) on each of 28 days = 56 recorded measurements, which is why many of the PEFR values end in decimals indicating a fraction with a denominator of 7 (but many do not, suggesting missing data). No attempt was made to wait for drug to take effect (or reach 5 half-lives). Data were summarized for all 28 days, and the investigators made the assumption that maximal effect is reached immediately and sustained for 4 weeks. We will use the data (average period PEFR values for 4 weeks based on this assumption) provided in Chapter 2, edition 2 to create a dataset with Subject, Period, Treatment, and PEFR.

pefr_value <- c(121.905, 116.667, 218.500, 200.500, 235.000, 217.143, 
    250.000, 196.429, 186.190, 185.500, 231.563, 221.842, 
    443.250, 420.500, 198.421, 207.692, 270.500, 213.158, 
    360.476, 384.000, 229.750, 188.250, 159.091, 221.905, 
    255.882, 253.571, 279.048, 267.619, 160.556, 163.000,
    172.105, 182.381, 267.000, 313.000, 230.750, 211.111, 
    271.190, 257.619, 276.250, 222.105, 398.750, 404.000, 
    67.778, 70.278, 195.000, 223.158, 325.000, 306.667, 
    368.077, 362.500, 228.947, 227.895, 236.667, 220.000,
    138.333, 138.571, 225.000, 256.250, 392.857, 381.429, 
    190.000, 233.333, 191.429, 228.000, 226.190, 267.143, 
    201.905, 193.500, 134.286, 128.947, 238.000, 248.500, 
    159.500, 140.000, 232.750, 276.563, 172.308, 170.000, 
    266.000, 305.000, 171.333, 186.333, 194.737, 191.429, 
    200.000, 222.619, 146.667, 183.810, 208.000, 241.667, 
    208.750, 218.810, 271.429, 225.000, 143.810, 188.500,
    104.444, 135.280, 145.238, 152.857, 215.385, 240.476, 
    306.000, 288.333, 160.526, 150.476, 353.810, 369.048, 
    293.889, 308.095, 371.190, 404.762)

subject <- rep(c(7,8,9,13,14,15,17,21,22,28,35,36,37,38,41,44,58,66,71,76,79,80,81,82,86,89,90,3,10,11,16,18,23,24,26,27,29,30,32,33,39,43,47,51,52,55,59,68,70,74,77,78,83,84,85,99), each = 2) |> 
  as_factor() 

sequence <- c(rep(x = c("AB"), 54), rep(x = c("BA"), 58)) |>
  as_factor() |> 
  fct_relevel("AB", "BA")

period <- c(rep(x = c(1, 2), 56)) |>
  as_factor() |> 
  fct_relevel("1", "2")
  
treatment <- c(rep(x = c("drug", "placebo"), 27),  rep(x = c("placebo", "drug"), 29)) |> 
  as_factor() |> 
  fct_relevel("placebo", "drug")


pefr <- data.frame(subject, period, sequence, treatment, pefr_value)

Evaluate and check the data

pefr |> 
  tabyl(sequence)

 sequence  n   percent
       AB 54 0.4821429
       BA 58 0.5178571

pefr |> 
  tabyl(period)

 period  n percent
      1 56     0.5
      2 56     0.5

pefr |> 
  tabyl(treatment)

 treatment  n percent
   placebo 56     0.5
      drug 56     0.5

These match up to the data in the book.

Reproduce data in table 2.3

options(pillar.sigfig = 5)

pefr |> 
  summarise(mean = mean(pefr_value))

      mean
1 232.4521

pefr |> 
  group_by(period, sequence) |> 
  summarise(mean = mean(pefr_value))

`summarise()` has grouped output by 'period'. You can
override using the `.groups` argument.

# A tibble: 4 × 3
# Groups:   period [2]
  period sequence   mean
  <fct>  <fct>     <dbl>
1 1      AB       245.84
2 1      BA       215.99
3 2      AB       239.20
4 2      BA       230.16

pefr |> 
  group_by(sequence) |> 
  summarise(mean = mean(pefr_value))

# A tibble: 2 × 2
  sequence   mean
  <fct>     <dbl>
1 AB       242.52
2 BA       223.08

pefr |> 
  group_by(period) |> 
  summarise(mean = mean(pefr_value))

# A tibble: 2 × 2
  period   mean
  <fct>   <dbl>
1 1      230.38
2 2      234.52

These all match up with the table, with one (pair of) exceptions. Kenward and Jones appear to have switched the means for Period 2 between sequence AB and sequence BA. Sequence AB for Period 2 should be 239.20, and sequence BA for Period 2 should be 230.16. This error is only in this table, and not elsewhere in the book.

Reproduce plots in Fig 2.1

We will now test the data by trying to reproduce the plots in the book.

pefr |> 
  pivot_wider(names_from = period,
              names_prefix = "pefr_",
              values_from = pefr_value,
              id_cols = c(subject, sequence)) |> 
  ggplot(aes(x = pefr_1, y = pefr_2)) + 
  geom_point() +
  geom_smooth(method='lm', formula= y~x) +
  facet_wrap(vars(sequence)) +
  labs(title = "Figure 2.1: Period 2 vs Period 1 plots")

These look identical to the plots in the book.

Reproduce plot in Fig 2.2

pefr |> 
  pivot_wider(names_from = period,
              names_prefix = "pefr_",
              values_from = pefr_value,
              id_cols = c(subject, sequence)) |> 
  ggplot(aes(x = pefr_1, y = pefr_2)) + 
  geom_point() +
  geom_smooth(method='lm', formula= y~x) +
  labs(title = "Figure 2.2: Period 2 vs Period 1 plot overall")

This looks identical to the plot in the book.

Reproduce plot in Fig 2.3

pefr |> 
  ggplot(aes(x = period, y = pefr_value)) + 
  geom_point() +
  geom_line(aes(group = subject)) +
  facet_wrap(vars(sequence)) +
  labs(title = "Figure 2.3: Profiles plot for PEFR data",
       x = "Period")

These plots look identical to the plots in the book.

Reproduce plot in Fig 2.4

pefr |> 
  mutate(label = paste0(as.character(period),"_", treatment)) |> 
  group_by(period, treatment) |> 
  mutate(mean_pefr = mean(pefr_value)) |> 
  ungroup() |>
  ggplot(aes(x = period, 
             y = mean_pefr)) + 
  geom_point() +
  geom_text(aes(label = label),
            hjust = 0,
            nudge_x = 0.05) +
  geom_line(aes(group = subject)) +
  labs(title = "Figure 2.4: Group by Periods plot for PEFR data",
       x = "Period")

This plot shows a few suspicious things.

the drug effect appears much bigger (higher pefr vs. PBO) in period 1 vs period 2
the placebo appears much better in period 2 (after drug) vs period 1. Placebo in Period 2 even looks numerically better than Drug (uh-oh).
the upward line for BA sequence is much steeper (+15) than the downward line (-6) for the AB sequence.
This suggests that there may be a fair amount of carryover (less loss in response) from drug to placebo in the AB sequence (upper line, drug then placebo), while the folks in the BA (placebo then drug, lower line) sequence improve a lot in period 2. This makes sense, as there was no washout.
these sloped lines with a steeper BA slope suggest a sequence * period interaction, also known as a carryover effect.
In theory, with no carryover, and drug being consistently better than placebo, this plot should look like an X (see below for example).

Example of consistent drug effect of +30 on PEFR, no carryover

data <- tibble::tribble(
  ~period, ~sequence, ~mean_pefr, ~label,
  '1', 'AB', 245, "1_drug",
  '1', 'BA', 215, "1_placebo",
  '2', 'AB', 215, "2_placebo",
  '2', 'BA', 245, "2_drug"
) |> 
  mutate(period = factor(period)) |> 
  mutate(sequence = factor(sequence))

data |> 
  ggplot(aes(x = period, 
             y = mean_pefr)) + 
  geom_point() +
  geom_text(aes(label = label),
            hjust = 0,
            nudge_x = 0.05) +
  geom_line(aes(group = sequence)) +
  labs(title = "Example: Group by Periods plot for Example PEFR data",
       x = "Period",
       subtitle = "Assuming no Carryover")

Interpretation:

This shows what you should expect from this kind of plot if

There is a consistent +30 effect on PEFR by drug
There is no effect of placebo
The baseline is 215 and consistent over time.
There is no carryover effect.

Drug should win in both periods if it actually works. By roughly the same amount over placebo (considering statistical noise) in each period. Now go back (up) and look at the actual plot in Figure 2.4. Not very close to this, right?

Hmmm. There seems to be a lot of carryover to sort out here.

Off to model with lmer

Let’s see what the lmer package analysis shows.

Analysis of crossover study with lmer

lmodel1

Start with an attempt at an interaction model with lmer, using an example from stack overflow

https://stats.stackexchange.com/questions/617092/interpret-results-of-a-mixed-effects-model-analyzing-a-2x2-cross-over-study

lmodel1 <- lmer(pefr_value ~ treatment * period + 
                  sequence + (1|subject), 
           data = pefr)

fixed-effect model matrix is rank deficient so dropping 1 column / coefficient

summary(lmodel1)

Linear mixed model fit by REML. t-tests use
  Satterthwaite's method [lmerModLmerTest]
Formula: 
pefr_value ~ treatment * period + sequence + (1 | subject)
   Data: pefr

REML criterion at convergence: 1139.8

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.00240 -0.50464  0.02372  0.40878  1.84254 

Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 5715.2   75.60   
 Residual              326.3   18.06   
Number of obs: 112, groups:  subject, 56

Fixed effects:
              Estimate Std. Error      df t value
(Intercept)    235.435     14.945  56.789  15.754
treatmentdrug   10.403      3.416  54.000   3.046
period2          3.768      3.416  54.000   1.103
sequenceBA     -19.444     20.504  54.000  -0.948
                          Pr(>|t|)    
(Intercept)   < 0.0000000000000002 ***
treatmentdrug              0.00359 ** 
period2                    0.27487    
sequenceBA                 0.34721    
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) trtmnt perid2
treatmntdrg -0.110              
period2     -0.110 -0.036       
sequenceBA  -0.711  0.000  0.000
fit warnings:
fixed-effect model matrix is rank deficient so dropping 1 column / coefficient

Interpretation:

This shows that treatment (drug) is significantly better than placebo (p=0.0036), but period and sequence are not significant. This gives us the p values shown on page 41 on Jones and Kenward. NOTE: This model had to drop the interaction term to avoid rank deficiency. The provided warning says: “fixed-effect model matrix is rank deficient so dropping 1 column / coefficient”.

I think that this is because the subject number predicts (the assigned) sequence perfectly.

Now with lmodel2

Remove sequence from the model, as it is predicted by subject. Keep subject as a random effect, even in the interaction term.

lmodel2 <- lmer(pefr_value ~ treatment + 
                  period * (1|subject) + 
                  (1|subject), 
           data = pefr)

Warning: Model failed to converge with 1 negative
eigenvalue: -4.5e-06

summary(lmodel2)

Linear mixed model fit by REML. t-tests use
  Satterthwaite's method [lmerModLmerTest]
Formula: 
pefr_value ~ treatment + period * (1 | subject) + (1 | subject)
   Data: pefr

REML criterion at convergence: 1148.6

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.98704 -0.49028  0.02717  0.40212  1.85789 

Random effects:
 Groups    Name        Variance Std.Dev.
 subject   (Intercept) 5649.15  75.161  
 subject.1 (Intercept)   55.25   7.433  
 Residual               326.26  18.063  
Number of obs: 112, groups:  subject, 56

Fixed effects:
              Estimate Std. Error      df t value
(Intercept)    225.366     10.507  60.885  21.449
treatmentdrug   10.403      3.416  54.000   3.046
period2          3.768      3.416  54.000   1.103
                          Pr(>|t|)    
(Intercept)   < 0.0000000000000002 ***
treatmentdrug              0.00359 ** 
period2                    0.27487    
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) trtmnt
treatmntdrg -0.157       
period2     -0.157 -0.036

Interpretation:

This gives us the p values shown on page 41 on Jones and Kenward. But this model also does not show the interaction term, as we are treating the subject as a random effect, even for the random effect of interaction of period by subject. I am not sure if it is using the interaction term but not showing it, or if it is not truly included.

Now with lmodel3

Try without an interaction term of period * sequence, keeping subject as a random effect

lmodel3 <- lmer(pefr_value ~ treatment + period + 
             sequence + (1|subject), 
           data = pefr)
summary(lmodel3)

Linear mixed model fit by REML. t-tests use
  Satterthwaite's method [lmerModLmerTest]
Formula: 
pefr_value ~ treatment + period + sequence + (1 | subject)
   Data: pefr

REML criterion at convergence: 1139.8

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.00240 -0.50464  0.02372  0.40878  1.84254 

Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 5715.2   75.60   
 Residual              326.3   18.06   
Number of obs: 112, groups:  subject, 56

Fixed effects:
              Estimate Std. Error      df t value
(Intercept)    235.435     14.945  56.789  15.754
treatmentdrug   10.403      3.416  54.000   3.046
period2          3.768      3.416  54.000   1.103
sequenceBA     -19.444     20.504  54.000  -0.948
                          Pr(>|t|)    
(Intercept)   < 0.0000000000000002 ***
treatmentdrug              0.00359 ** 
period2                    0.27487    
sequenceBA                 0.34721    
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) trtmnt perid2
treatmntdrg -0.110              
period2     -0.110 -0.036       
sequenceBA  -0.711  0.000  0.000

Interpretation:

lmodel3 sacrifices the interaction term, but is able to keep subject as random effect. This still gives us the p value for treatment of 0.0036, as in the book. But this does not partition any variance to carryover. Is the treatment p value of 0.0036 correct? Probably not, if we are not measuring the carryover/the interaction between period nd sequence.

Now try geepack

A different approach to modeling.

gmodel1

Now try this with the geepack package without an interaction term with id = subject. Requires sort in order before modeling.

pefr2 <- pefr |> 
  mutate(id = paste0(subject, period)) |> 
           arrange(subject, period)
gmodel1 <- geeglm(pefr_value ~ treatment + period + sequence, 
             id = subject, waves = period,
           data = pefr2, family = "gaussian", corstr = "exchangeable")
summary(gmodel1)


Call:
geeglm(formula = pefr_value ~ treatment + period + sequence, 
    family = "gaussian", data = pefr2, id = subject, waves = period, 
    corstr = "exchangeable")

 Coefficients:
              Estimate Std.err    Wald             Pr(>|W|)
(Intercept)    235.435  15.458 231.967 < 0.0000000000000002
treatmentdrug   10.403   3.374   9.508              0.00205
period2          3.768   3.374   1.247              0.26409
sequenceBA     -19.444  20.215   0.925              0.33614
                 
(Intercept)   ***
treatmentdrug ** 
period2          
sequenceBA       
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = exchangeable 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)     5826    1146
  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha    0.946 0.01633
Number of clusters:   56  Maximum cluster size: 2

Interpretation:

This gives us a p value of 0.002 for treatment, without measuring an interaction term. Is this real?

gmodel2

Now try this with the geepack package AND an interaction term

pefr2 <- pefr |> 
  mutate(id = paste0(subject, period)) |> 
           arrange(subject, period)
gmodel2 <- geeglm(pefr_value ~ treatment + period * sequence, 
             id = subject, 
           data = pefr2, family = "gaussian", corstr = "ar1")

  (Intercept) treatmentdrug period2 sequenceBA
1           1             0       0          1
2           1             1       1          1
3           1             1       0          0
4           1             0       1          0
5           1             1       0          0
6           1             0       1          0
  period2:sequenceBA
1                  0
2                  1
3                  0
4                  0
5                  0
6                  0

Error in geeglm(pefr_value ~ treatment + period * sequence, id = subject, : Model matrix is rank deficient; geeglm can not proceed

summary(gmodel2)

Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'gmodel2' not found

Interpretation:

Model is rank deficient, geeglm can not proceed - perhaps because subject predicts sequence. No output.

gmodel3

Try without sequence, using geeglm from geepack package.

pefr2 <- pefr |> 
  mutate(id = paste0(subject, period)) |> 
           arrange(subject, period)
gmodel3 <- geeglm(pefr_value ~ treatment + period * subject, 
             id = id,
           data = pefr2, family = "gaussian", corstr = "ar1")

  (Intercept) treatmentdrug period2 subject7 subject8
1           1             0       0        0        0
2           1             1       1        0        0
3           1             1       0        1        0
  subject9 subject10 subject11 subject13 subject14
1        0         0         0         0         0
2        0         0         0         0         0
3        0         0         0         0         0
  subject15 subject16 subject17 subject18 subject21
1         0         0         0         0         0
2         0         0         0         0         0
3         0         0         0         0         0
  subject22 subject23 subject24 subject26 subject27
1         0         0         0         0         0
2         0         0         0         0         0
3         0         0         0         0         0
  subject28 subject29 subject30 subject32 subject33
1         0         0         0         0         0
2         0         0         0         0         0
3         0         0         0         0         0
  subject35 subject36 subject37 subject38 subject39
1         0         0         0         0         0
2         0         0         0         0         0
3         0         0         0         0         0
  subject41 subject43 subject44 subject47 subject51
1         0         0         0         0         0
2         0         0         0         0         0
3         0         0         0         0         0
  subject52 subject55 subject58 subject59 subject66
1         0         0         0         0         0
2         0         0         0         0         0
3         0         0         0         0         0
  subject68 subject70 subject71 subject74 subject76
1         0         0         0         0         0
2         0         0         0         0         0
3         0         0         0         0         0
  subject77 subject78 subject79 subject80 subject81
1         0         0         0         0         0
2         0         0         0         0         0
3         0         0         0         0         0
  subject82 subject83 subject84 subject85 subject86
1         0         0         0         0         0
2         0         0         0         0         0
3         0         0         0         0         0
  subject89 subject90 subject99 period2:subject7
1         0         0         0                0
2         0         0         0                0
3         0         0         0                0
  period2:subject8 period2:subject9 period2:subject10
1                0                0                 0
2                0                0                 0
3                0                0                 0
  period2:subject11 period2:subject13 period2:subject14
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject15 period2:subject16 period2:subject17
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject18 period2:subject21 period2:subject22
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject23 period2:subject24 period2:subject26
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject27 period2:subject28 period2:subject29
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject30 period2:subject32 period2:subject33
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject35 period2:subject36 period2:subject37
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject38 period2:subject39 period2:subject41
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject43 period2:subject44 period2:subject47
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject51 period2:subject52 period2:subject55
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject58 period2:subject59 period2:subject66
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject68 period2:subject70 period2:subject71
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject74 period2:subject76 period2:subject77
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject78 period2:subject79 period2:subject80
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject81 period2:subject82 period2:subject83
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject84 period2:subject85 period2:subject86
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
  period2:subject89 period2:subject90 period2:subject99
1                 0                 0                 0
2                 0                 0                 0
3                 0                 0                 0
 [ reached getOption("max.print") -- omitted 3 rows ]

Error in geeglm(pefr_value ~ treatment + period * subject, id = id, data = pefr2, : Model matrix is rank deficient; geeglm can not proceed

summary(gmodel3)

Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'gmodel3' not found

Interpretation:

This does not work. Not entirely sure why, as this has subject without sequence.

Model matrix is rank deficient; geeglm can not proceed

If sequence is assigned to each subject, you can’t have both subject and sequence in the model. But not sure why this one did not work. Maybe because id predicts subject? Note if you replace id with subject in this one, you get the same error.

Start CrossCarry Analysis

Start by calculating carryover

carryover <- createCarry(data = pefr, treatment = "treatment",
  id = "subject", period = "period", carrySimple = TRUE)

[1] "This variable was added to data: Carry_drug"

Build model

Then we can build the crossGEE model with the code below.

This results in a 2-item list named model, which contains (1) a QIC dataframe, and (2) a 28-item list of model components

model <- CrossGEE(response = "pefr_value", 
                  treatment = "treatment", 
                  correlation = "ar1",
                  period = "period", 
                  id="subject", 
                  carry=carryover$carryover,
                  data=carryover$data, 
                  family=gaussian())

Look at QIC

We can view the QIC dataframe with

model$QIC

          geepack..QIC.model1.
QIC                 652482.773
QICu                652482.741
Quasi Lik          -326237.370
CIC                      4.016
params                   4.000
QICC                652483.973

Look at the Model

You can also look at the model with

model$model


Call:
geepack::geeglm(formula = form1, family = family, data = data, 
    id = id, corstr = correlation)

Coefficients:
  (Intercept)       period2 treatmentdrug    Carry_drug 
       215.99        -15.68         29.85         38.89 

Degrees of Freedom: 112 Total (i.e. Null);  108 Residual

Scale Link:                   identity
Estimated Scale Parameters:  [1] 5826

Correlation:  Structure = ar1    Link = identity 
Estimated Correlation Parameters:
alpha 
0.946 

Number of clusters:   56   Maximum cluster size: 2

and see the summary with

summary(model$model)


Call:
geepack::geeglm(formula = form1, family = family, data = data, 
    id = id, corstr = correlation)

 Coefficients:
              Estimate Std.err   Wald            Pr(>|W|)
(Intercept)      216.0    13.3 265.64 <0.0000000000000002
period2          -15.7    20.6   0.58                0.45
treatmentdrug     29.8    20.5   2.12                0.15
Carry_drug        38.9    40.4   0.93                0.34
                 
(Intercept)   ***
period2          
treatmentdrug    
Carry_drug       
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = ar1 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)     5826    1146
  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha    0.946  0.0163
Number of clusters:   56  Maximum cluster size: 2

Interpretation:

The drug treatment increases PEFR by 29.8 points compared to placebo, but did not reach statistical significance (Ste Err 20.5, p=0.15). None of the periods, or carryover vars are significantly associated with PEFR. The period estimate is a bit puzzling, as it is -15.7 for period 2 vs period 1, but the mean PEFR for P2 was higher than P1, as seen previously in the table 2.3 (234.52 vs 230.38). The carryover for drug variable has a large (38.9) estimate, but also has a large standard error (40.4), so it is not significant (p=0.34).

Per the SAS code in the Kenward and Jones book, sequence (also known as group) and period are NS, while treatment has a p value of 0.003 with the same data, as we saw with lmer, or 0.002 with geepack.

So which is correct?

Is treatment significant or not?

The original study does not mention a washout period, so some carryover is reasonable.