<- CrossCarry::Arterial
art $Subject |> table() art
1 2 3 4 5 6 7 8 9 10 11 12
30 30 30 30 30 30 30 30 30 30 30 30
We will simulate datasets for a planned clinical trial for fatigue in IBD in the absence of a flare, iron deficiency anemia, or hypothyroidism.
We will start with a test example from the CrossCarry package, the Arterial dataset.
Let’s read this in and examine it.
<- CrossCarry::Arterial
art $Subject |> table() art
1 2 3 4 5 6 7 8 9 10 11 12
30 30 30 30 30 30 30 30 30 30 30 30
There are 12 subjects, and 10 time points (from -30 to 240 min) per subject per period, there are 3 periods, and three treatments (A,B,C). The outcome is Pressure = SBP in mm Hg. The 5 variables are subject, period, treatment, pressure, and time. The paper on the CrossCarry package is at https://arxiv.org/pdf/2304.02440.
You can estimate the carryover with this code (from the paper).
This adds variables for Carry_B and Carry_C to the 2-item list carryover
, which includes the (1) original dataset, to which has been added the new (1/0 indicator) variables Carry_B and Carry_C, and (2) the vector of carryover variables.
<- createCarry(data=Arterial, treatment = "Treatment",
carryover id = "Subject", period = "Period", carrySimple = TRUE)
[1] "This variable was added to data: Carry_B"
[2] "This variable was added to data: Carry_C"
Then we can build the crossGEE model with the code below.
This results in a 2-item list named model, which contains (1) a QIC dataframe, and (2) a 28-item list of model components
<- CrossGEE(response = "Pressure", treatment = "Treatment",
model period = "Period", id="Subject", carry=carryover$carryover,
data=carryover$data, correlation = "exchangeable", family=gaussian())
We can view the QIC dataframe with
$QIC model
geepack..QIC.model1.
QIC 49010.57675
QICu 48982.50645
Quasi Lik -24484.25322
CIC 21.03515
params 7.00000
QICC 49058.57675
The output of the QIC is displayed, which is used to compare different models for the same variable. You can make a new model by adding the Time variable for minutes.
<- CrossGEE(response = "Pressure", treatment = "Treatment", period = "Period", id="Subject", carry=carryover$carryover, data=carryover$data, correlation = "exchangeable", covar = "Time", family=gaussian()) model2
We can view the QIC, which has changed in model 2.
$QIC model2
geepack..QIC.model1.
QIC 48937.80269
QICu 48910.01738
Quasi Lik -24447.00869
CIC 21.89266
params 8.00000
QICC 49027.80269
In model 2, It is clearly observed that there is reduction in the QIC, QICu, Quasi Lik and QICC, but the CIC did increase.
You can also look at the model with
$model model2
Call:
geepack::geeglm(formula = form1, family = family, data = data,
id = id, corstr = correlation)
Coefficients:
(Intercept) Period2 Period3 TreatmentB
107.52940285 2.06062091 0.93562091 1.55792877
TreatmentC Carry_B Carry_C Time
-6.26854969 -2.12621368 -3.10564906 0.00620324
Degrees of Freedom: 360 Total (i.e. Null); 352 Residual
Scale Link: identity
Estimated Scale Parameters: [1] 135.8167
Correlation: Structure = exchangeable Link = identity
Estimated Correlation Parameters:
alpha
0.4985341
Number of clusters: 12 Maximum cluster size: 30
and see the summary with
summary(model$model)
Call:
geepack::geeglm(formula = form1, family = family, data = data,
id = id, corstr = correlation)
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 107.920 3.243 1107.606 < 0.0000000000000002
Period2 2.061 1.319 2.442 0.118159
Period3 0.936 1.122 0.695 0.404303
TreatmentB 1.558 1.780 0.766 0.381507
TreatmentC -6.269 1.664 14.196 0.000165
Carry_B -2.127 1.787 1.416 0.234078
Carry_C -3.106 2.178 2.033 0.153874
(Intercept) ***
Period2
Period3
TreatmentB
TreatmentC ***
Carry_B
Carry_C
---
Signif. codes:
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation structure = exchangeable
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 136 23.23
Link = identity
Estimated Correlation Parameters:
Estimate Std.err
alpha 0.4977 0.1115
Number of clusters: 12 Maximum cluster size: 30
Treatment C reduces BP by 6.3 mm Hg compared to treatment A, and is the only significant predictor. None of the Periods, carryover vars, nor Treatment B are significantly associated with pressure.
The trial design includes a crossover design between two treatments, thiamine at 15 mg/kg daily, and modafinil 100 mg bid, with 4 week treatment periods and 4 week washout periods. This will be followed by a 2nd 4 week washout period, then an 8 week trial of wellbutrin.
Fatigue and QoL will be measured at the beginning and end of each period with the appropriate PRO (UCPRO and CDPRO), and with the fatigue instrument, the Inflammatory Bowel Disease-Fatigue Questionnaire. Range is 0-20 from 5 questions to get score 1, with higher scores indicating more fatigue. The median for CD is 9 in remission, 11 in flare. The median for UC is 8 in remission, 11 in flare. Data from Journal of Crohn’s and Colitis, Volume 8, Issue 11, 1 November 2014, Pages 1398–1406, https://doi.org/10.1016/j.crohns.2014.04.013. Improvement is considered a reduction by 3 points.
Questions for Score 1 include
1.1. What is your fatigue level right now
1.2. What was your highest fatigue level in the past 2 weeks
1.3. What was your lowest fatigue level in the past 2 weeks
1.4. What was your average fatigue level in the past 2 weeks
1.5. How much of your waking time have you felt fatigued in the past two weeks
Responses on a 0 (none) to 4 (severe or all of the time) scale - [these are guesses - need to see legit version of IBD-F]
0 None, 1 Slight, 2 Somewhat, 3 Moderate, 4 Severe
0 Never, 1 Rarely, 2 Sometimes, 3 Very Often, 4 All of the Time
We will consider 1st order and 2nd order carryover.
There will be 7 study periods. The first six will all be of 4 weeks duration. The last will be of 8 weeks duration (wellbutrin Rx).
Sequence | Period 1 | Period 2 | Period 3 | Period 4 | Period 5 | Period 6 | Period 7 |
---|---|---|---|---|---|---|---|
1 - AAB | Thi | Washout | Thi | Washout | Mod | Washout | Wellbutrin |
2 - ABA | Thi | Washout | Mod | Washout | Thi | Washout | Wellbutrin |
3 - BAA | Mod | Washout | Thi | Washout | Thi | Washout | Wellbutrin |
Now to begin simulating a dataset of fatigue measurements (8 total).
All measurements will occur at the beginning and end of each period
We will start with 10 patients in each group, and a dataset with no carryover at all.
= 9
baseline = 2 #get mean and SD data for fatigue from Bager paper
sdev = 0 # half of previous effect size
c1 = 0 # one quarter of 2nd previous effect size
c2
# effect_T = -3
# effect_M = -4
# effect_W = -5
<- tibble(
data1 # set up patient_id
# set up f scores as single col,
# set up time point as measurement time 1-8
# total 240 measurements of fatigue (f)
subject_id = rep(1:30, each = 8),
seq = rep(rep(c(1:3), each = 8), 10),
timepoint = c(rep(1:8, times = 30)),
# add random noise
rand1 = rnorm(n = 240, mean = 0, sd = sdev),
rand2 = rnorm(n = 240, mean = 0, sd = sdev),
rand3 = rnorm(n = 240, mean = 0, sd = sdev),
rand4 = rnorm(n = 240, mean = 0, sd = sdev),
rand5 = rnorm(n = 240, mean = 0, sd = sdev),
rand6 = rnorm(n = 240, mean = 0, sd = sdev),
rand7 = rnorm(n = 240, mean = 0, sd = sdev),
rand8 = rnorm(n = 240, mean = 0, sd = sdev),
c1 = 0,
c2 = 0,
baseline = 9,
effect = case_when(timepoint == 1 ~ 0,
== 1 & timepoint == 2 ~ -3,
seq == 2 & timepoint == 2 ~ -3,
seq == 3 & timepoint == 2 ~ -4,
seq == 3 ~ 0,
timepoint == 1 & timepoint == 4 ~ -3,
seq == 2 & timepoint == 4 ~ -4,
seq == 3 & timepoint == 4 ~ -3,
seq == 5 ~ 0,
timepoint == 1 & timepoint == 6 ~ -4,
seq == 2 & timepoint == 6 ~ -3,
seq == 3 & timepoint == 6 ~ -3,
seq == 7 ~ 0,
timepoint == 8 ~ -5),
timepoint fatigue = round(baseline + effect + c1 + c2 +
case_when(timepoint == 1 ~ rand1,
== 2 ~ rand2,
timepoint == 3 ~ rand3,
timepoint == 4 ~ rand4,
timepoint == 5 ~ rand5,
timepoint == 6 ~ rand6,
timepoint == 7 ~ rand7,
timepoint == 8 ~ rand8
timepoint
))
)
<- data1 |>
data2 select(subject_id, timepoint, seq, fatigue) |>
pivot_wider(id_cols = c(subject_id, seq), names_from = timepoint, names_prefix = "tmpt_", values_from = fatigue)
<- data2 |>
data2_long pivot_longer(cols = tmpt_1:tmpt_8,names_prefix = "tmpt_", names_to = "timepoint", values_to = "fatigue") |>
mutate(timepoint = as.numeric(timepoint)) |>
mutate(period = (timepoint + 1) %/% 2 ) |>
mutate(time = rep(c(0,1), 120)) |>
mutate(rx = case_when(seq == 1 & period ==1 ~ "Thi",
== 1 & period ==2 ~ "Thi",
seq == 1 & period ==3 ~ "Mod",
seq == 4 ~ "aWell",
period == 2 & period ==1 ~ "Thi",
seq == 2 & period ==2 ~ "Mod",
seq == 2 & period ==3 ~ "Thi",
seq == 3 & period ==1 ~ "Mod",
seq == 3 & period ==2 ~ "Thi",
seq == 3 & period ==3 ~ "Thi"
seq |>
)) select(-timepoint, -seq) |>
filter(time == 1) |>
select(-time)
# define c1 depending on seq and period
# define c2 depending on seq and period
This creates 3 data sets with a 1st order carryover of 0, and a 2nd order carryover of 0.
data1 is tall, with all of the input vars, including random noise at each time point.
data2 is wide, with only subject_id, sequence, and fatigue score at each time point.
data2_long is the same as data2 but in long format.
Adeed period, time, treatment, to get the data formatted for CrossCarry
Let’s try to code this one.
First we will add the indicator (1/0) variables for carryover.
You can estimate the carryover with this code (from the paper).
This adds variables for Carry_Thi and Carry_Well to the 2-item list carryover
, which includes the (1) original dataset, to which has been added the new (1/0 indicator) variables Carry_Thi and Carry_Well, and (2) the vector of carryover variables.
<- createCarry(data = data2_long, treatment = "rx",
carryover id = "subject_id", period = "period", carrySimple = TRUE)
[1] "This variable was added to data: Carry_Mod"
[2] "This variable was added to data: Carry_Thi"
Then we can build the crossGEE model with the code below.
This results in a 2-item list named model, which contains (1) a QIC dataframe, and (2) a 28-item list of model components. Error - Error in geepack::geeglm(formula = form1, family = family, corstr = correlation, : Model matrix is rank deficient; geeglm can not proceed
Per SO ( link: https://stackoverflow.com/questions/18818964/adding-interaction-to-gee-model-matrix-is-rank-deficient), this can occur when you have a covariate that is constant across the dataset. May need to remove variables that are constant.
Tried removing seq, timepoint to no effect.
Tried removing time after filtering to timepoint == 1
Tried reordering carry vars by changing “Well” to “aWell” since these are alphabetical by default
<- CrossGEE(response = "fatigue", treatment = "rx",
model3 period = "period", id = "subject_id", carry = carryover$carryover,
data = carryover$data, correlation = "exchangeable", family = gaussian())
(Intercept) period2 period3 period4 rxMod rxThi Carry_Mod
1 1 0 0 0 0 1 0
2 1 1 0 0 0 1 0
3 1 0 1 0 1 0 0
4 1 0 0 1 0 0 1
5 1 0 0 0 0 1 0
6 1 1 0 0 1 0 0
Carry_Thi
1 0
2 1
3 1
4 0
5 0
6 1
Error in geepack::geeglm(formula = form1, family = family, corstr = correlation, : Model matrix is rank deficient; geeglm can not proceed
We can view the QIC dataframe with - did not work after model failure
$QIC model3
Error in eval(expr, envir, enclos): object 'model3' not found
The output of the QIC is displayed, which is used to compare different models for the same variable.
You can make a new model4 by adding the Time variable for minutes. Does not work after removing the time variable. Take time out of model - get Same error - Error in geepack::geeglm(formula = form1, family = family, corstr = correlation, : Model matrix is rank deficient; geeglm can not proceed
<- CrossGEE(response = "fatigue", treatment = "rx", period = "period", id="subject_id", carry=carryover$carryover, data=carryover$data, correlation = "exchangeable", family=gaussian()) model4
(Intercept) period2 period3 period4 rxMod rxThi Carry_Mod
1 1 0 0 0 0 1 0
2 1 1 0 0 0 1 0
3 1 0 1 0 1 0 0
4 1 0 0 1 0 0 1
5 1 0 0 0 0 1 0
6 1 1 0 0 1 0 0
Carry_Thi
1 0
2 1
3 1
4 0
5 0
6 1
Error in geepack::geeglm(formula = form1, family = family, corstr = correlation, : Model matrix is rank deficient; geeglm can not proceed
We can view the QIC, which has changed in model 4. - did not work after model failure
$QIC model4
Error in eval(expr, envir, enclos): object 'model4' not found
In model 4, It is clearly observed that there is reduction in the QIC, QICu, Quasi Lik and QICC, but the CIC did increase.
You can also look at the model with - did not work after model failure
$model model4
Error in eval(expr, envir, enclos): object 'model4' not found
and see the summary with - did not work after model failure
summary(model4$model)
Error in eval(expr, envir, enclos): object 'model4' not found