library(tidyverse)
library(reshape2)
library(nlme)
library(car) # apply for linearHypothesis function
library(kableExtra)
# Setup SAS run in R Markdown
require(SASmarkdown)
sasexe <- "C:/Program Files/SASHome/SASFoundation/9.4/sas.exe"
sasopts <- "-nosplash -ls 75"
options(digits = 4)

Motivation

It was two years ago when I assisted my supervisor - Dr. Michael Berbaum, at MRC-IHRP to translate SAS code to R code in the Chapter 5 FLW - Applied Longitudinal Analysis. The code was used for his teaching and the MRC chalk talk of 6 strategies of Adjustment for Baseline Response (Berbaum (2017), Mallinckrod† et al. (2008), and Allison, Williams, and Moral-Benito (2017)).

Below is the copycat the slide from my supervisor’s presentation (Berbaum (2017)) and FLW Chapter 5 (G. M. Fitzmaurice (2004)), the effort was trying to systemized how the strategies work and served for which purposes, especially in the observational study in which the baseline among groups barely about equal and missing data always be present.

Finally, I acknowledge and be grateful for the guidance of Dr. Berbaum, who has played an essential role in my biostat career development.

Purpose

Baseline adjustment tends to remove “oddities” in the shape of the response distribution: The adjustment subtracts out or cancel out the oddities to the extent that they still appear in response distributions at the post-baseline times (Berbaum (2017))
- Before randomization the pool of available cases may be “weird”, reflecting the unique character of the catchment. Randomization will allocate the weirdness equivalently amongst experimental groups
- After randomization the subjects are still free to respond as they will, not as we wish. For example, subjects may choose socially desirable high approval of a service if it shortens time to departure.
The effect of adjustment is to make the residuals more like a normal distribution, possibly avoiding the need for an inexplicable transformation.
Baseline differences between experimental groups are a perennial concern for investigators: What if randomization did not work? Randomization works! but that does not guarantee perfectly equal means of variables at baseline, including the outcome variable. Cluster randomization produces less “stirring” and “mixing”
There are many ways to improve randomization, such as stratified randomization, minimization, etc.
Either adjustment technique - differencing or regression on baseline outcome - tends to reduce baseline differences in outcome between the experimental groups. Not a full corrective of any and all problems, but goes in the right direction. Adjustment for baseline outcome is thus a kind of insurance policy that brings your data into greater conformance with the assumption of the models used for inference, and hence promotes accurate inference.
When longitudinal data arise from an observational study, the two methods of adjusting for baseline described in this section can yield discernibly different and, apparently conflicting, results. This conundrum is also known as Lord’s paradox — the interpretation of the two types of analyses and is resolved by noting that these two alternative methods of adjusting for baseline answer qualitatively different scientific questions when the data arise from an observational study.
- This can be illustrated in the simplest setting where there are two groups or sub-populations (e.g., males and females) measured at two occasions (Lord (1967)). The overall goal of such a study is to compare the changes in response for the two groups.
- The analysis that subtracts baseline response, thereby creating a simple change score, addresses the question of whether the two groups differ in terms of their mean change over time.
- In contrast, adjustment for baseline using analysis of covariance addresses the question of whether an individual belonging to one group is expected to change more (or less) than an individual belonging to the other group, given that they have the same baseline response.
The design of the longitudinal study and the research question of interest should primarily guide the choice of analytic methods for adjusting for baseline response.
- The analysis that subtracts baseline response is appropriate when the primary goal of the study is to compare distinct populations in terms of their average change over time. The analysis addresses the question: Do the populations differ in terms of their average change? and this is appropriate when the data have arisen from either an observational study or a randomized trial.
- The analysis of covariance (ANCOVA) will be appropriate in cases where individuals have been assigned to groups at random (e.g., a randomized trial) or where the population distributions of the baseline responses can reasonably be assumed to be equal (even though the sample means of the baseline responses may differ across groups). In cases where the population distributions of the baseline responses are equal, it is then meaningful to ask the question: Is the expected change the same in all groups, when we compare individuals having the same baseline response? Moreover, the ANCOVA will provide a more powerful test of group differences (G. Fitzmaurice (2001)).

Case Study

To illustrate the main ideas by conducting an analysis of response profiles of the blood lead data of the 100 children and 6 variables from the succimer and placebo groups of the Treatment of Lead-Exposed Children (TLC) Trial. The blood lead levels (outcome) at baseline were obtained prior to receiving placebo or succimer. In that case, due to randomization, we can assume that the treatment group means are equal at baseline.
The FLW textbook already modeled 4 strategies in SAS (G. M. Fitzmaurice 2004), here we try to implement in R. Whereas the R’s results must be equivalent to SAS’s output.

Notice: we executed SAS in R Markdown environment for the convenient comparison.

Questions posted: how to handle the baseline measurement in the assessment of whether patterns of change in the mean response over time are the same in the groups.

Manipulate data in R

Read TLC data into R

# Load data
tlc<-read_table(url("https://content.sph.harvard.edu/fitzmaur/ala2e/tlc-data.txt"), col_names=FALSE)
colnames(tlc)<-c('id', 'group', 'lead0', 'lead1', 'lead4', 'lead6')
#dim(tlc) #[1] 100   6

kable(head(tlc), caption = "TLC Dataset (wide format)")%>%
  kable_styling(bootstrap_options = "striped", full_width = F)

TLC Dataset (wide format)
id	group	lead0	lead1	lead4	lead6
1	P	30.8	26.9	25.8	23.8
2	A	26.5	14.8	19.5	21.0
3	A	25.8	23.0	19.1	23.2
4	P	24.7	24.5	22.0	22.5
5	A	20.4	2.8	3.2	9.4
6	A	20.4	5.4	4.5	11.9

Mean blood lead levels overtime

# Mean lead value over time
tab1 <- tlc %>%
  group_by(group) %>%
  summarise_at(vars(-id), funs(mean(., na.rm=TRUE)))
kable(tab1, caption="Changed Mean Lead Over Time")%>%
  kable_styling(bootstrap_options = "striped", full_width = F)

Changed Mean Lead Over Time
group	lead0	lead1	lead4	lead6
A	26.54	13.52	15.51	20.76
P	26.27	24.66	24.07	23.65

# Plot mean lead over time
tab1 %>% 
  melt(id.vars = "group",
            variable.name = "time", value.name = "lead") %>%
  ggplot(mapping=aes(x=time, y=lead, group=group)) + 
    geom_point() +
    geom_line(mapping=aes(color=group)) +
    scale_x_discrete(labels=c("baseline","week 1","week 4","week 6")) +
    theme_bw() + 
    ggtitle("Mean Lead Over Time Plot")

Transpose from wide to long form

# wide to long
tlc.l <- reshape(tlc,
                  varying = c("lead0", "lead1", "lead4", "lead6"), 
                  v.names = "y", 
                  timevar = "time",
                  times = c("0", "1", "4", "6"),
                  new.row.names = 1:1000,
                  direction = "long")

tlc.l.sort <- tlc.l[order(tlc.l$id),]
kable(tlc.l.sort[1:10,], caption="Dataset looks like when Transposing wide to long")%>%
  kable_styling(bootstrap_options = "striped", full_width = F)

Dataset looks like when Transposing wide to long
id	group	time	y
1	P	0	30.8
1	P	1	26.9
1	P	4	25.8
1	P	6	23.8
2	A	0	26.5
2	A	1	14.8
2	A	4	19.5
2	A	6	21.0
3	A	0	25.8
3	A	1	23.0

Spaghetti plot

# create a theme common for all the graphs
mytheme <- theme_classic() %+replace% 
        theme(axis.title.x = element_blank(), 
        axis.title.y = element_text(face="bold",angle=90))

s.base <- ggplot(data = tlc.l, aes(x = time, y = y, group = id, colour = group)) +
    mytheme
#s.base + geom_point() + geom_line(size=1)
s.base + geom_line() + stat_summary(aes(group=1), geom="point", fun.y=mean, shape=17, size=3) +
  facet_grid(.~group) + stat_smooth(aes(group=1), method="loess") + ggtitle("Spaghetti Plot")

Manipulate data in SAS

Prepare data for strategy 1

/* Data for strategy 1 */
filename tlc_url url "https://content.sph.harvard.edu/fitzmaur/ala2e/tlc-data.txt";

data tlc;
   infile tlc_url;
   input id group $ lead0 lead1 lead4 lead6;
   y=lead0; time=0; output;
   y=lead1; time=1; output;
   y=lead4; time=4; output;
   y=lead6; time=6; output;
   drop lead0 lead1 lead4 lead6;
run;

Prepare data for strategy 2

/* Data for strategy 2 */
data tlc_g2;
     set tlc;

********************************************************;
*   Create dummy variables for group and time          *;
********************************************************;

succimer=(group='A');
array t(4) t0 t1 t4 t6;
     j=0;
     do i = 0,1,4,6;
          j+1;
          t(j)=(time=i);
     end;
     drop i j;
run;

Prepare data for strategy 3 & 4

/* Data for strategy 3 & 4 */
data tlc_g34;
     infile tlc_url;
     input id group $ lead0 lead1 lead4 lead6;
     y=lead1; time=1; baseline=lead0; output;
     y=lead4; time=4; baseline=lead0; output;
     y=lead6; time=6; baseline=lead0; output;
     drop lead0 lead1 lead4 lead6;
run;

data tlc_g34;
     set tlc_g34;

change = y - baseline;
cbaseline = baseline - 26.406;

********************************************************;
*   Create dummy variables for group and time          *;
********************************************************;

succimer=(group='A');

array t(3) t1 t4 t6;
     j=0;
     do i = 1,4,6;
        j+1;
        t(j)=(time=i);
     end;
     drop i j;
run;

Strategy 1

The first strategy corresponds to a standard analysis of response profiles without incorporating any constraints on the group means at baseline.

We can retain the baseline value as part of the outcome vector and make no assumptions about group differences in the mean response at baseline. Furthermore, Standard ANOVA with Group and Time factors crossed, thus contains Group, Time, and Group x Time. Allows a group difference at time 1. Baseline means may differ.

Dependent variable $Y_{i, j}$

\[ Y_i = \beta_1 \textbf{1} + \beta_{G_2} \textbf{X}_{G_2} + \beta_{T_2} \textbf{X}_{T_2} + \beta_{T_3} \textbf{X}_{T_3} + \beta_{T_4} \textbf{X}_{T_4} + \beta_{G_2 T_2} \textbf{X}_{G_2 T_2} + \beta_{G_2 T_3} \textbf{X}_{G_2 T_3} + \beta_{G_2 T_4} \textbf{X}_{G_2 T_4} + \epsilon_i \]

\[ \begin{matrix} g = 1,& j = 1\\ & j = 2 \\ & j = 3 \\ & j = 4 \\ \\ \\ \\ g = 2,& j = 1\\ & j = 2 \\ & j = 3 \\ & j = 4 \\ \end{matrix} \stackrel{\mbox{ Design Matrix of $\textbf{X}$}}{% \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ . & . & . & . & . & . & . & . \\ . & . & . & . & . & . & . & . \\ . & . & . & . & . & . & . & . \\ 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix} %} \begin{bmatrix} \beta_1 \\ \beta_{G_2} \\ \beta_{T_2} \\ \beta_{T_3} \\ \beta_{T_4} \\ \beta_{G_2 T_2} \\ \beta_{G_2 T_3} \\ \beta_{G_2 T_4} \end{bmatrix} \] (Berbaum 2017)

Modeling in R

tlc.l$group <- relevel(as.factor(tlc.l$group), ref="P")
tlc.l$time <- relevel(as.factor(tlc.l$time), ref="0")

g1 <- gls(y ~ 1 + time + group + time:group,
         correlation = corSymm(form = ~1 | id),
         weights = varIdent(form = ~1 | time),method = "ML",data=tlc.l)
summary(g1)

## Generalized least squares fit by maximum likelihood
##   Model: y ~ 1 + time + group + time:group 
##   Data: tlc.l 
##    AIC  BIC logLik
##   2461 2533  -1213
## 
## Correlation Structure: General
##  Formula: ~1 | id 
##  Parameter estimate(s):
##  Correlation: 
##   1     2     3    
## 2 0.571            
## 3 0.570 0.775      
## 4 0.577 0.582 0.581
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | time 
##  Parameter estimates:
##     0     1     4     6 
## 1.000 1.326 1.370 1.525 
## 
## Coefficients:
##                Value Std.Error t-value p-value
## (Intercept)   26.272    0.7103   36.99  0.0000
## time1         -1.612    0.7919   -2.04  0.0425
## time4         -2.202    0.8149   -2.70  0.0072
## time6         -2.626    0.8885   -2.96  0.0033
## groupA         0.268    1.0045    0.27  0.7898
## time1:groupA -11.406    1.1199  -10.18  0.0000
## time4:groupA  -8.824    1.1525   -7.66  0.0000
## time6:groupA  -3.152    1.2566   -2.51  0.0125
## 
##  Correlation: 
##              (Intr) time1  time4  time6  groupA tm1:gA tm4:gA
## time1        -0.218                                          
## time4        -0.191  0.680                                   
## time6        -0.096  0.386  0.385                            
## groupA       -0.707  0.154  0.135  0.068                     
## time1:groupA  0.154 -0.707 -0.481 -0.273 -0.218              
## time4:groupA  0.135 -0.481 -0.707 -0.272 -0.191  0.680       
## time6:groupA  0.068 -0.273 -0.272 -0.707 -0.096  0.386  0.385
## 
## Standardized residuals:
##     Min      Q1     Med      Q3     Max 
## -2.1977 -0.6920 -0.1531  0.5348  5.6899 
## 
## Residual standard error: 4.972 
## Degrees of freedom: 400 total; 392 residual

coefs <- names(coef(g1))

Test for group x time interaction

linearHypothesis(g1, coefs[grep(":", coefs)], verbose=TRUE)

## 
## Hypothesis matrix:
##              (Intercept) time1 time4 time6 groupA time1:groupA time4:groupA
## time1:groupA           0     0     0     0      0            1            0
## time4:groupA           0     0     0     0      0            0            1
## time6:groupA           0     0     0     0      0            0            0
##              time6:groupA
## time1:groupA            0
## time4:groupA            0
## time6:groupA            1
## 
## Right-hand-side vector:
## [1] 0 0 0
## 
## Estimated linear function (hypothesis.matrix %*% coef - rhs)
## time1:groupA time4:groupA time6:groupA 
##      -11.406       -8.824       -3.152 
## 
## 
## Estimated variance/covariance matrix for linear function
##              time1:groupA time4:groupA time6:groupA
## time1:groupA       1.2542       0.8781       0.5437
## time4:groupA       0.8781       1.3282       0.5578
## time6:groupA       0.5437       0.5578       1.5789

#linearHypothesis(g1, matchCoefs(g1, ":"), verbose=TRUE) # equivalent

#linearHypothesis(g1, "time:group") #not working
# Look at Type 3 Tests
Anova(g1, type=3, test="Chisq") # equivalent

Modeling in SAS

title1 Analysis of Response Profiles of data on Blood Lead Levels;
title2 Treatment of Lead Exposed Children (TLC) Trial;

proc mixed noclprint=10 data=tlc order=data;
   class id group(ref=first) time(ref=first);
   model y = group time group*time / s chisq;
   repeated time / type=un subject=id r rcorr;
run;

##         Analysis of Response Profiles of data on Blood Lead Levels
##               Treatment of Lead Exposed Children (TLC) Trial
## 
##                             The Mixed Procedure
## 
##                             Model Information
## 
##           Data Set                     WORK.TLC                 
##           Dependent Variable           y                        
##           Covariance Structure         Unstructured             
##           Subject Effect               id                       
##           Estimation Method            REML                     
##           Residual Variance Method     None                     
##           Fixed Effects SE Method      Model-Based              
##           Degrees of Freedom Method    Between-Within           
## 
## 
##                           Class Level Information
##  
##              Class    Levels    Values
## 
##              id          100    not printed                   
##              group         2    A P                           
##              time          4    1 4 6 0                       
## 
## 
##                                 Dimensions
## 
##                     Covariance Parameters            10
##                     Columns in X                     15
##                     Columns in Z                      0
##                     Subjects                        100
##                     Max Obs per Subject               4
## 
## 
##                           Number of Observations
## 
##                 Number of Observations Read             400
##                 Number of Observations Used             400
##                 Number of Observations Not Used           0
## 
## 
##                              Iteration History
##  
##         Iteration    Evaluations    -2 Res Log Like       Criterion
## 
##                 0              1      2626.25517748                
##                 1              1      2416.07594087      0.00000000
## 
## 
##                         Convergence criteria met.                    
## 
## 
##                         Estimated R Matrix for id 1
##  
##             Row        Col1        Col2        Col3        Col4
## 
##               1     25.2257     19.1074     19.6995     22.2016
##               2     19.1074     44.3458     35.5351     29.6750
##               3     19.6995     35.5351     47.3778     30.6205
##               4     22.2016     29.6750     30.6205     58.6510
## 
## 
##                   Estimated R Correlation Matrix for id 1
##  
##             Row        Col1        Col2        Col3        Col4
## 
##               1      1.0000      0.5713      0.5698      0.5772
##               2      0.5713      1.0000      0.7753      0.5819
##               3      0.5698      0.7753      1.0000      0.5809
##               4      0.5772      0.5819      0.5809      1.0000
## 
## 
##                       Covariance Parameter Estimates
##  
##                       Cov Parm    Subject    Estimate
## 
##                       UN(1,1)     id          44.3458
##                       UN(2,1)     id          35.5351
##                       UN(2,2)     id          47.3778
##                       UN(3,1)     id          29.6750
##                       UN(3,2)     id          30.6205
##                       UN(3,3)     id          58.6510
##                       UN(4,1)     id          19.1074
##                       UN(4,2)     id          19.6995
##                       UN(4,3)     id          22.2016
##                       UN(4,4)     id          25.2257
## 
## 
##                               Fit Statistics
## 
##                    -2 Res Log Likelihood          2416.1
##                    AIC (Smaller is Better)        2436.1
##                    AICC (Smaller is Better)       2436.7
##                    BIC (Smaller is Better)        2462.1
## 
## 
##                      Null Model Likelihood Ratio Test
##  
##                        DF    Chi-Square      Pr > ChiSq
## 
##                         9        210.18          <.0001
## 
## 
##                         Solution for Fixed Effects
##  
##                                        Standard
## Effect       group   time   Estimate      Error     DF   t Value   Pr > |t|
## 
## Intercept                    26.2720     0.7103     98     36.99     <.0001
## group        A                0.2680     1.0045     98      0.27     0.7902
## group        P                     0          .      .       .        .    
## time                 1       -1.6120     0.7919     98     -2.04     0.0445
## time                 4       -2.2020     0.8149     98     -2.70     0.0081
## time                 6       -2.6260     0.8885     98     -2.96     0.0039
## time                 0             0          .      .       .        .    
## group*time   A       1      -11.4060     1.1199     98    -10.18     <.0001
## group*time   A       4       -8.8240     1.1525     98     -7.66     <.0001
## group*time   A       6       -3.1520     1.2566     98     -2.51     0.0138
## group*time   A       0             0          .      .       .        .    
## group*time   P       1             0          .      .       .        .    
## group*time   P       4             0          .      .       .        .    
## group*time   P       6             0          .      .       .        .    
## group*time   P       0             0          .      .       .        .    
## 
## 
##                        Type 3 Tests of Fixed Effects
##  
##                 Num    Den
##   Effect         DF     DF   Chi-Square   F Value     Pr > ChiSq   Pr > F
## 
##   group           1     98        25.43     25.43         <.0001   <.0001
##   time            3     98       184.48     61.49         <.0001   <.0001
##   group*time      3     98       107.79     35.93         <.0001   <.0001

Strategy 2

The second strategy corresponds to an analysis of response profiles where the group means at baseline are constrained to be equal.

We can retain the baseline as part of the outcome vector and assume the group means are equal at baseline, as might be appropriate in a randomized trial. In other words, ANOVA with no Group factor but include Group x Time interaction; assumes no group difference at the baseline. Baseline means constrained equal. The justification is randomization (hence expected means equal) and no treatment delivery before baseline (nothing yet to make means differ).

In other words, it was excluding the treatment group main effect but maintaining an interaction between group and timefrom the model for the response profiles. This model appears to contradict the conventional wisdom that interactions should not be included in a regression model without their main effects. However, this is an important exception to the rule. Because baseline (week 0) was chosen as the reference level for time, the exclusion of the group main effect forces the two groups to have the same mean response at baseline.

Dependent variable $Y_{i, j}$

\[ Y_i = \beta_1 \textbf{1} + \beta_{T_2} \textbf{X}_{T_2} + \beta_{T_3} \textbf{X}_{T_3} + \beta_{T_4} \textbf{X}_{T_4} + \beta_{G_2 T_2} \textbf{X}_{G_2 T_2} + \beta_{G_2 T_3} \textbf{X}_{G_2 T_3} + \beta_{G_2 T_4} \textbf{X}_{G_2 T_4} + \epsilon_i \]

\[ \begin{matrix} g = 1,& j = 1\\ & j = 2 \\ & j = 3 \\ & j = 4 \\ \\ \\ \\ g = 2,& j = 1 \\ & j = 2 \\ & j = 3 \\ & j = 4 \\ \end{matrix} \stackrel{\mbox{ Design Matrix of $\textbf{X}$}}{% \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 0 \\ . & . & . & . & . & . & . \\ . & . & . & . & . & . & . \\ . & . & . & . & . & . & . \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix} %} \begin{bmatrix} \beta_1 \\ \beta_{T_2} \\ \beta_{T_3} \\ \beta_{T_4} \\ \beta_{G_2 T_2} \\ \beta_{G_2 T_3} \\ \beta_{G_2 T_4} \end{bmatrix} \] (Berbaum 2017)

Modeling in R

tlc.l$t1 <- as.numeric(tlc.l$time == "1")
tlc.l$t4 <- as.numeric(tlc.l$time == "4")
tlc.l$t6 <- as.numeric(tlc.l$time == "6")

tlc.l$succimer <- as.numeric(tlc.l$group == "A")

g2 <- gls(y ~ 1 + t1 + t4 + t6 + t1:succimer + t4:succimer + t6:succimer,
         correlation = corSymm(form = ~1 | id),
         weights = varIdent(form = ~1 | time), method = "ML",data=tlc.l)

summary(g2)

## Generalized least squares fit by maximum likelihood
##   Model: y ~ 1 + t1 + t4 + t6 + t1:succimer + t4:succimer + t6:succimer 
##   Data: tlc.l 
##    AIC  BIC logLik
##   2459 2527  -1213
## 
## Correlation Structure: General
##  Formula: ~1 | id 
##  Parameter estimate(s):
##  Correlation: 
##   1     2     3    
## 2 0.571            
## 3 0.570 0.775      
## 4 0.577 0.582 0.581
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | time 
##  Parameter estimates:
##     0     1     4     6 
## 1.000 1.326 1.370 1.524 
## 
## Coefficients:
##               Value Std.Error t-value p-value
## (Intercept)  26.406    0.5018   52.62  0.0000
## t1           -1.645    0.7815   -2.10  0.0360
## t4           -2.231    0.8064   -2.77  0.0059
## t6           -2.642    0.8854   -2.98  0.0030
## t1:succimer -11.341    1.0917  -10.39  0.0000
## t4:succimer  -8.765    1.1298   -7.76  0.0000
## t6:succimer  -3.120    1.2492   -2.50  0.0129
## 
##  Correlation: 
##             (Intr) t1     t4     t6     t1:scc t4:scc
## t1          -0.156                                   
## t4          -0.136  0.674                            
## t6          -0.068  0.381  0.380                     
## t1:succimer  0.000 -0.698 -0.467 -0.265              
## t4:succimer  0.000 -0.466 -0.701 -0.265  0.667       
## t6:succimer  0.000 -0.263 -0.263 -0.705  0.376  0.375
## 
## Standardized residuals:
##     Min      Q1     Med      Q3     Max 
## -2.1819 -0.7071 -0.1439  0.5416  5.7048 
## 
## Residual standard error: 4.974 
## Degrees of freedom: 400 total; 393 residual

Test for group x time interaction

# Test the interactions
linearHypothesis(g2, c("t1:succimer", "t4:succimer", "t6:succimer"), test = "Chisq")

# Another way to look at the interaction test
g2.me <- gls(y ~ 1 + t1 + t4 + t6,
         correlation = corSymm(form = ~1 | id),
         weights = varIdent(form = ~1 | time), method = "ML",data=tlc.l)
anova(g2, g2.me)#, test = "Chisq") # not working with Chisq

Modeling in SAS

title1 Analysis of Response Profiles assuming Equal Mean Blood Lead Levels at Baseline;
title2 in the Succimer and Placebo Groups;
title3 Treatment of Lead Exposed Children (TLC) Trial;

proc mixed noclprint=10 data=tlc_g2 order=data;
     class id time;
     model y = t1 t4 t6 succimer*t1 succimer*t4 succimer*t6 / s chisq;
     repeated time / type=un subject=id r;
     contrast '3 DF Test of Interaction'
          succimer*t1 1, succimer*t4 1, succimer*t6 1 / chisq;
run;

## Analysis of Response Profiles assuming Equal Mean Blood Lead Levels at Base
##                     in the Succimer and Placebo Groups
##               Treatment of Lead Exposed Children (TLC) Trial
## 
##                             The Mixed Procedure
## 
##                             Model Information
## 
##           Data Set                     WORK.TLC_G2              
##           Dependent Variable           y                        
##           Covariance Structure         Unstructured             
##           Subject Effect               id                       
##           Estimation Method            REML                     
##           Residual Variance Method     None                     
##           Fixed Effects SE Method      Model-Based              
##           Degrees of Freedom Method    Between-Within           
## 
## 
##                           Class Level Information
##  
##              Class    Levels    Values
## 
##              id          100    not printed                   
##              time          4    0 1 4 6                       
## 
## 
##                                 Dimensions
## 
##                     Covariance Parameters            10
##                     Columns in X                      7
##                     Columns in Z                      0
##                     Subjects                        100
##                     Max Obs per Subject               4
## 
## 
##                           Number of Observations
## 
##                 Number of Observations Read             400
##                 Number of Observations Used             400
##                 Number of Observations Not Used           0
## 
## 
##                              Iteration History
##  
##         Iteration    Evaluations    -2 Res Log Like       Criterion
## 
##                 0              1      2628.69582475                
##                 1              2      2417.98960808      0.00000001
## 
## 
##                         Convergence criteria met.                    
## 
## 
##                         Estimated R Matrix for id 1
##  
##             Row        Col1        Col2        Col3        Col4
## 
##               1     24.9857     18.9281     19.5146     21.9933
##               2     18.9281     44.2074     35.3925     29.5142
##               3     19.5146     35.3925     47.2307     30.4547
##               4     21.9933     29.5142     30.4547     58.4642
## 
## 
##                       Covariance Parameter Estimates
##  
##                       Cov Parm    Subject    Estimate
## 
##                       UN(1,1)     id          24.9857
##                       UN(2,1)     id          18.9281
##                       UN(2,2)     id          44.2074
##                       UN(3,1)     id          19.5146
##                       UN(3,2)     id          35.3925
##                       UN(3,3)     id          47.2307
##                       UN(4,1)     id          21.9933
##                       UN(4,2)     id          29.5142
##                       UN(4,3)     id          30.4547
##                       UN(4,4)     id          58.4642
## 
## 
##                               Fit Statistics
## 
##                    -2 Res Log Likelihood          2418.0
##                    AIC (Smaller is Better)        2438.0
##                    AICC (Smaller is Better)       2438.6
##                    BIC (Smaller is Better)        2464.0
## 
## 
##                      Null Model Likelihood Ratio Test
##  
##                        DF    Chi-Square      Pr > ChiSq
## 
##                         9        210.71          <.0001
## 
## 
##                         Solution for Fixed Effects
##  
##                                Standard
##     Effect         Estimate       Error      DF    t Value    Pr > |t|
## 
##     Intercept       26.4060      0.4999      99      52.83      <.0001
##     t1              -1.6445      0.7823      99      -2.10      0.0381
##     t4              -2.2313      0.8073      99      -2.76      0.0068
##     t6              -2.6420      0.8864      99      -2.98      0.0036
##     t1*succimer    -11.3410      1.0930      99     -10.38      <.0001
##     t4*succimer     -8.7653      1.1312      99      -7.75      <.0001
##     t6*succimer     -3.1199      1.2507      99      -2.49      0.0143
## 
## 
##                       Type 3 Tests of Fixed Effects
##  
##                 Num    Den
##  Effect          DF     DF   Chi-Square   F Value     Pr > ChiSq   Pr > F
## 
##  t1               1     99         4.42      4.42         0.0356   0.0381
##  t4               1     99         7.64      7.64         0.0057   0.0068
##  t6               1     99         8.88      8.88         0.0029   0.0036
##  t1*succimer      1     99       107.66    107.66         <.0001   <.0001
##  t4*succimer      1     99        60.04     60.04         <.0001   <.0001
##  t6*succimer      1     99         6.22      6.22         0.0126   0.0143
## 
## 
##                                  Contrasts
##  
##                              Num   Den
##   Label                       DF    DF  Chi-Square  F Value    Pr > ChiSq
## 
##   3 DF Test of Interaction     3    99      111.96    37.32        <.0001
## 
##                                 Contrasts
##  
##                      Label                     Pr > F
## 
##                      3 DF Test of Interaction  <.0001

Comment (i): S1 and S2

The first 2 strategies retain the baseline measurement as part of the outcome vector, but differ in terms of assumptions about the mean response at baseline.
- The $1^{st}$ strategy corresponds to a standard analysis of response profiles without incorporating any constraints on the group means at baseline.
- The $2^{nd}$ strategy corresponds to an analysis of response profiles where the group means at baseline are constrained to be equal.
In a randomized trial, where treatment assignment is random, both strategies yield valid estimates of treatment group comparisons, but the $2^{nd}$ strategy is, in general, more powerful.
In randomized trials, or in observational studies where there is good reason to assume that the groups have the same mean response at baseline (e.g., due to matching on baseline response), the $2^{nd}$ strategy for handling baseline is preferred and should be routinely used.
In contrast, in observational studies where there is no a priori reason to assume the groups have the same mean response at baseline, the $2^{nd}$ strategy is not appropriate and only the $1^{st}$ strategy should be used.
Also the omnibus test of the group $\times$ time interaction from the model strategy 2 yields a Wald statistic of 112.2 with 3 degrees of freedom. In contrast, the analysis of response profiles without any adjustment for baseline (strategy 1) yielded a Wald statistic of 107.8, with 3 df $\Longrightarrow$ the difference between these two statistics reflects the increased power of the second method for handling baseline response.

Strategy 3

We can subtract the baseline response from all of the remaining post-baseline responses, and analyze the differences from baseline

Dependent variable $change_i$ = $Y_{i, j} - \hat{Y}_{i,1}$

\[ change_i = \beta_1 \textbf{1} + \beta_{G_2} \textbf{X}_{G_2} + \beta_{T_3} \textbf{X}_{T_3} + \beta_{T_4} \textbf{X}_{T_4} + \beta_{G_2 T_3} \textbf{X}_{G_2 T_3} + \beta_{G_2 T_4} \textbf{X}_{G_2 T_4} + \epsilon_i \]

\[ \begin{matrix} g = 1,& j = 2 \\ & j = 3 \\ & j = 4 \\ \\ \\ \\ g = 2,& j = 2 \\ & j = 3 \\ & j = 4 \\ \end{matrix} \stackrel{\mbox{ Design Matrix of $\textbf{X}$}}{% \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ . & . & . & . & . & . \\ . & . & . & . & . & . \\ . & . & . & . & . & . \\ 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 \end{bmatrix} %} \begin{bmatrix} \beta_1 \\ \beta_{G_2} \\ \beta_{T_3} \\ \beta_{T_4} \\ \beta_{G_2 T_3} \\ \beta_{G_2 T_4} \end{bmatrix} \] (Berbaum 2017)

Modeling in R

names(tlc) <- c('id', 'group', 'baseline', 'lead1', 'lead4', 'lead6')
# Wide to Long form, keep baseline (lead0) intact
tlc.l.baseline <- reshape(tlc,
             varying = c("lead1", "lead4", "lead6"), 
             v.names = "y", 
             timevar = "time",
             times = c("1", "4", "6"),
             new.row.names = 1:1000,
             direction = "long")
tlc.l.baseline.sort <- tlc.l.baseline[order(tlc.l.baseline$id),]
kable(tlc.l.baseline.sort[1:10,], caption="Initial Dataset looks like in Strategy 3")%>%
  kable_styling(bootstrap_options = "striped", full_width = F)

Initial Dataset looks like in Strategy 3
id	group	baseline	time	y
1	P	30.8	1	26.9
1	P	30.8	4	25.8
1	P	30.8	6	23.8
2	A	26.5	1	14.8
2	A	26.5	4	19.5
2	A	26.5	6	21.0
3	A	25.8	1	23.0
3	A	25.8	4	19.1
3	A	25.8	6	23.2
4	P	24.7	1	24.5

# Create dummy variables for time and group

tlc.l.baseline$t4 <- as.numeric(tlc.l.baseline$time == "4")
tlc.l.baseline$t6 <- as.numeric(tlc.l.baseline$time == "6")

tlc.l.baseline$succimer <- as.numeric(tlc.l.baseline$group == "A")

tlc.l.baseline$change <- tlc.l.baseline$y - tlc.l.baseline$baseline

tlc.l.baseline[1:10,]%>%
  kable(caption="Dataset looks like for modeling the Strategy 3")%>%
  kable_styling(bootstrap_options = c("striped", "hover","condensed"))%>%
  footnote(general =  "Modeling the differences from baseline (raw change scores) which means to subtract the baseline response from the remaining post-baseline responses")

Dataset looks like for modeling the Strategy 3
id	group	baseline	time	y	t4	t6	succimer	change
1	P	30.8	1	26.9	0	0	0	-3.9
2	A	26.5	1	14.8	0	0	1	-11.7
3	A	25.8	1	23.0	0	0	1	-2.8
4	P	24.7	1	24.5	0	0	0	-0.2
5	A	20.4	1	2.8	0	0	1	-17.6
6	A	20.4	1	5.4	0	0	1	-15.0
7	P	28.6	1	20.8	0	0	0	-7.8
8	P	33.7	1	31.6	0	0	0	-2.1
9	P	19.7	1	14.9	0	0	0	-4.8
10	P	31.1	1	31.2	0	0	0	0.1
Note:
Modeling the differences from baseline (raw change scores) which means to subtract the baseline response from the remaining post-baseline responses

# Modeling
g3 <- gls(change ~  1 + succimer + t4 + t6 + succimer:t4 + succimer:t6,
          correlation = corSymm(form = ~1 | id),
          weights = varIdent(form = ~1 | time), method = "ML", data=tlc.l.baseline)

summary(g3)

## Generalized least squares fit by maximum likelihood
##   Model: change ~ 1 + succimer + t4 + t6 + succimer:t4 + succimer:t6 
##   Data: tlc.l.baseline 
##    AIC  BIC logLik
##   1850 1894   -913
## 
## Correlation Structure: General
##  Formula: ~1 | id 
##  Parameter estimate(s):
##  Correlation: 
##   1     2    
## 2 0.680      
## 3 0.386 0.385
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | time 
##  Parameter estimates:
##     1     4     6 
## 1.000 1.029 1.122 
## 
## Coefficients:
##               Value Std.Error t-value p-value
## (Intercept)  -1.612    0.7919  -2.036  0.0427
## succimer    -11.406    1.1199 -10.185  0.0000
## t4           -0.590    0.6427  -0.918  0.3594
## t6           -1.014    0.9343  -1.085  0.2787
## succimer:t4   2.582    0.9089   2.841  0.0048
## succimer:t6   8.254    1.3213   6.247  0.0000
## 
##  Correlation: 
##             (Intr) succmr t4     t6     sccm:4
## succimer    -0.707                            
## t4          -0.369  0.261                     
## t6          -0.480  0.340  0.325              
## succimer:t4  0.261 -0.369 -0.707 -0.230       
## succimer:t6  0.340 -0.480 -0.230 -0.707  0.325
## 
## Standardized residuals:
##      Min       Q1      Med       Q3      Max 
## -3.36124 -0.45802 -0.01508  0.49682  5.78461 
## 
## Residual standard error: 5.543 
## Degrees of freedom: 300 total; 294 residual

The Modified Test of interest

The test of interest under the $3^{rd}$ would be modified: a join test that combines the main effect of group and the group $\times$ time interaction. With a $( G - 1)$ degrees of freedom test for group with a $( G - 1) \times ( n - 2)$ degrees of freedom test for group x time interaction $\Longrightarrow$ the test has total $( G - 1) \times ( n - 1)$ df
This joint test of the main effect of group and the group $\times$ time interaction is equivalent to the test of the group $\times$ time interaction (with $(G-1) \times (n-1)$ df) under the $1^{st}$ strategy $\Longrightarrow$ the $1^{st}$ and $3^{rd}$ strategies for handling baseline produce identical tests and estimates of effects $\Longrightarrow$ no efficiency gain in the $3^{rd}$ strategy.

# Test the interactions
linearHypothesis(g3, c("succimer", "succimer:t4", "succimer:t6"), test = "Chisq")

Modeling in SAS

title1 Analysis of Response Profiles of Changes from Baseline;
title2 Treatment of Lead Exposed Children (TLC) Trial;

proc mixed noclprint=10 data = tlc_g34 order=data;
     class id time;
     model change = succimer t4 t6 succimer*t4 succimer*t6 / s chisq;
     repeated time / type=un subject=id r;
     contrast '3 DF Test of Main Effect and Interaction'
          succimer 1, succimer*t4 1, succimer*t6 1 / chisq;
run;

##           Analysis of Response Profiles of Changes from Baseline
##               Treatment of Lead Exposed Children (TLC) Trial
## 
##                             The Mixed Procedure
## 
##                             Model Information
## 
##           Data Set                     WORK.TLC_G34             
##           Dependent Variable           change                   
##           Covariance Structure         Unstructured             
##           Subject Effect               id                       
##           Estimation Method            REML                     
##           Residual Variance Method     None                     
##           Fixed Effects SE Method      Model-Based              
##           Degrees of Freedom Method    Between-Within           
## 
## 
##                           Class Level Information
##  
##              Class    Levels    Values
## 
##              id          100    not printed                   
##              time          3    1 4 6                         
## 
## 
##                                 Dimensions
## 
##                     Covariance Parameters             6
##                     Columns in X                      6
##                     Columns in Z                      0
##                     Subjects                        100
##                     Max Obs per Subject               3
## 
## 
##                           Number of Observations
## 
##                 Number of Observations Read             300
##                 Number of Observations Used             300
##                 Number of Observations Not Used           0
## 
## 
##                              Iteration History
##  
##         Iteration    Evaluations    -2 Res Log Like       Criterion
## 
##                 0              1      1900.36481189                
##                 1              1      1818.91458467      0.00000000
## 
## 
##                         Convergence criteria met.                    
## 
## 
##                         Estimated R Matrix for id 1
##  
##                   Row        Col1        Col2        Col3
## 
##                     1     31.3566     21.9539     13.5917
##                     2     21.9539     33.2046     13.9451
##                     3     13.5917     13.9451     39.4735
## 
## 
##                       Covariance Parameter Estimates
##  
##                       Cov Parm    Subject    Estimate
## 
##                       UN(1,1)     id          31.3566
##                       UN(2,1)     id          21.9539
##                       UN(2,2)     id          33.2046
##                       UN(3,1)     id          13.5917
##                       UN(3,2)     id          13.9451
##                       UN(3,3)     id          39.4735
## 
## 
##                               Fit Statistics
## 
##                    -2 Res Log Likelihood          1818.9
##                    AIC (Smaller is Better)        1830.9
##                    AICC (Smaller is Better)       1831.2
##                    BIC (Smaller is Better)        1846.5
## 
## 
##                      Null Model Likelihood Ratio Test
##  
##                        DF    Chi-Square      Pr > ChiSq
## 
##                         5         81.45          <.0001
## 
## 
##                         Solution for Fixed Effects
##  
##                                Standard
##     Effect         Estimate       Error      DF    t Value    Pr > |t|
## 
##     Intercept       -1.6120      0.7919      98      -2.04      0.0445
##     succimer       -11.4060      1.1199      98     -10.18      <.0001
##     t4              -0.5900      0.6427      98      -0.92      0.3609
##     t6              -1.0140      0.9343      98      -1.09      0.2805
##     succimer*t4      2.5820      0.9089      98       2.84      0.0055
##     succimer*t6      8.2540      1.3213      98       6.25      <.0001
## 
## 
##                       Type 3 Tests of Fixed Effects
##  
##                 Num    Den
##  Effect          DF     DF   Chi-Square   F Value     Pr > ChiSq   Pr > F
## 
##  succimer         1     98       103.72    103.72         <.0001   <.0001
##  t4               1     98         0.84      0.84         0.3586   0.3609
##  t6               1     98         1.18      1.18         0.2778   0.2805
##  succimer*t4      1     98         8.07      8.07         0.0045   0.0055
##  succimer*t6      1     98        39.02     39.02         <.0001   <.0001
## 
## 
##                                  Contrasts
##  
##                                             Num   Den
##  Label                                       DF    DF  Chi-Square  F Value
## 
##  3 DF Test of Main Effect and Interaction     3    98      107.79    35.93
## 
##                                 Contrasts
##  
##      Label                                       Pr > ChiSq    Pr > F
## 
##      3 DF Test of Main Effect and Interaction        <.0001    <.0001

Comment (ii): S3 vs. S1

The strategy 3 (S3) alters the interpretation of the tests for all three effects:
- The test for group $\times$ time interaction $\longrightarrow$ a test for parallel profiles for the changes from baseline in the mean response on occasions 2 through n;
- The test for group effect $\longrightarrow$ a test that the changes from baseline at occasion 2 are the same across groups (assuming that occasion 2 is chosen as the reference level for time).
- 6 parameter estimates in S3 are linear combinations of the estimates in S1:

	Group	Week	Estimate(g1)	Estimate(g3)
Intercept			26.272	-1.612
Group	A(S)		0.268	-11.406
Week		1	-1.612
Week		4	-2.202	-0.59
Week		6	-2.626	-1.014
Group $\times$ Week	A	1	-11.406
Group $\times$ Week	A	4	-8.824	2.582
Group $\times$ Week	A	6	-3.152	8.254

From a practical standpoint, given a recommendation of the first strategy over the third for two main reasons:
- tests of interest are not routinely produced as standard output from statistical software
- when there are subjects with missing baseline response, all their data are excluded from the analysis of change scores; in contrast, the S1 incorporates all available data in the analysis.

Strategy 4

We can use the baseline value as a covariate in the analysis of the post-baseline responses

Dependent variable $Y_{i,i-1}$, additional controlling for centered baseline $centered.baseline_i (Y.c_{i,1})$ = $\hat{Y}_{i,1} - \bar{Y_{i,1}}$

\[ Y_{i,i-1} = \beta_1 \textbf{1} + \alpha \mathbf{Y.c}_{i, j=1} + \beta_{G_2} \textbf{X}_{G_2} + \beta_{T_3} \textbf{X}_{T_3} + \beta_{T_4} \textbf{X}_{T_4} + \beta_{G_2 T_3} \textbf{X}_{G_2 T_3} + \beta_{G_2 T_4} \textbf{X}_{G_2 T_4} + \epsilon_i \]

\[ \begin{matrix} g = 1,& j = 2 \\ & j = 3 \\ & j = 4 \\ \\ \\ \\ g = 2,& j = 2 \\ & j = 3 \\ & j = 4 \\ \end{matrix} \stackrel{\mbox{ Design Matrix of $\textbf{X}$}}{% \begin{bmatrix} 1 & Y.c_{i, j=1} & 0 & 0 & 0 & 0 & 0 \\ 1 & Y.c_{i, j=1} & 0 & 1 & 0 & 0 & 0 \\ 1 & Y.c_{i, j=1} & 0 & 0 & 1 & 0 & 0 \\ . & . & . & . & . & . & . \\ . & . & . & . & . & . & . \\ . & . & . & . & . & . & . \\ 1 & Y.c_{i, j=1} & 0 & 0 & 0 & 0 & 0 \\ 1 & Y.c_{i, j=1} & 0 & 1 & 0 & 1 & 0 \\ 1 & Y.c_{i, j=1} & 0 & 0 & 1 & 0 & 1 \end{bmatrix} %} \begin{bmatrix} \beta_1 \\ \beta_{Y_1} \\ \beta_{G_2} \\ \beta_{T_3} \\ \beta_{T_4} \\ \beta_{G_2 T_3} \\ \beta_{G_2 T_4} \end{bmatrix} \] (Berbaum 2017)

Modeling in R

tlc.l.adj.baseline <- tlc.l.baseline
tlc.l.adj.baseline$cbaseline <- tlc.l.adj.baseline$baseline - 26.406 

# Centering baseline response on its overall mean (26.406) 
# mean(tlc.l.baseline$baseline)
# gives the intercept a meaningful interpretation

## Next step should create dummy variables but they already existed in running g2 model above
tlc.l.adj.baseline[1:10,c(1,2,10,3,4,5,6,7,8,9)]%>%
  kable(caption="Dataset looks like for modeling the Strategy 4 and Strategy 4 alternative")%>%
  kable_styling(bootstrap_options = c("striped", "hover","condensed"))%>%
  footnote(general =  "Modeling the differences from baseline (raw change scores) which means to subtract the baseline response from the remaining post-baseline responses. We were centering baseline response on its overall mean (26.406) and treated it as a covariate.")

Dataset looks like for modeling the Strategy 4 and Strategy 4 alternative
id	group	cbaseline	baseline	time	y	t4	t6	succimer	change
1	P	4.394	30.8	1	26.9	0	0	0	-3.9
2	A	0.094	26.5	1	14.8	0	0	1	-11.7
3	A	-0.606	25.8	1	23.0	0	0	1	-2.8
4	P	-1.706	24.7	1	24.5	0	0	0	-0.2
5	A	-6.006	20.4	1	2.8	0	0	1	-17.6
6	A	-6.006	20.4	1	5.4	0	0	1	-15.0
7	P	2.194	28.6	1	20.8	0	0	0	-7.8
8	P	7.294	33.7	1	31.6	0	0	0	-2.1
9	P	-6.706	19.7	1	14.9	0	0	0	-4.8
10	P	4.694	31.1	1	31.2	0	0	0	0.1
Note:
Modeling the differences from baseline (raw change scores) which means to subtract the baseline response from the remaining post-baseline responses. We were centering baseline response on its overall mean (26.406) and treated it as a covariate.

g4 <- gls(y ~  1 + cbaseline + succimer + t4 + t6 + succimer:t4 + succimer:t6,
          correlation = corSymm(form = ~1 | id),
          weights = varIdent(form = ~1 | time),method = "ML", data=tlc.l.adj.baseline)
summary(g4)

## Generalized least squares fit by maximum likelihood
##   Model: y ~ 1 + cbaseline + succimer + t4 + t6 + succimer:t4 + succimer:t6 
##   Data: tlc.l.adj.baseline 
##    AIC  BIC logLik
##   1848 1896 -910.8
## 
## Correlation Structure: General
##  Formula: ~1 | id 
##  Parameter estimate(s):
##  Correlation: 
##   1     2    
## 2 0.667      
## 3 0.373 0.373
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | time 
##  Parameter estimates:
##     1     4     6 
## 1.000 1.034 1.145 
## 
## Coefficients:
##               Value Std.Error t-value p-value
## (Intercept)  24.768    0.7751   31.95  0.0000
## cbaseline     0.805    0.0936    8.60  0.0000
## succimer    -11.354    1.0963  -10.36  0.0000
## t4           -0.590    0.6438   -0.92  0.3602
## t6           -1.014    0.9359   -1.08  0.2795
## succimer:t4   2.582    0.9105    2.84  0.0049
## succimer:t6   8.254    1.3236    6.24  0.0000
## 
##  Correlation: 
##             (Intr) cbasln succmr t4     t6     sccm:4
## cbaseline    0.016                                   
## succimer    -0.707 -0.023                            
## t4          -0.373  0.000  0.264                     
## t6          -0.475  0.000  0.336  0.325              
## succimer:t4  0.264  0.000 -0.373 -0.707 -0.230       
## succimer:t6  0.336  0.000 -0.475 -0.230 -0.707  0.325
## 
## Standardized residuals:
##      Min       Q1      Med       Q3      Max 
## -3.05737 -0.51661 -0.06426  0.41652  6.02635 
## 
## Residual standard error: 5.416 
## Degrees of freedom: 300 total; 293 residual

The Modified Test of interest

The test of interest is a joint test of the main effect of group and the group $\times$ time interaction.
The analysis of adjusted change scores reported in Table 5.9 produced a Wald statistic of 111, with 3 df, for jointly testing the effect of group and the group x time interaction. This is larger than the corresponding statistic under the third strategy, reflecting the greater efficiency of the analysis of adjusted change scores.

# Test for the interactions 
linearHypothesis(g4, c("succimer", "succimer:t4", "succimer:t6"), test = "Chisq")

Modeling in SAS

title1 Analysis of Response Profiles of Response Profile from Occasion 2 from Baseline;
title2 Treatment of Lead Exposed Children (TLC) Trial;


proc mixed noclprint=10 data=tlc_g34 order=data;
     class id time;
     model y = cbaseline succimer t4 t6 succimer*t4 succimer*t6 / s chisq;
     repeated time / type=un subject=id r;
     contrast '3 DF Test of Main Effect and Interaction'
          succimer 1, succimer*t4 1, succimer*t6 1 / chisq;
run;

## Analysis of Response Profiles of Response Profile from Occasion 2 from Base
##               Treatment of Lead Exposed Children (TLC) Trial
## 
##                             The Mixed Procedure
## 
##                             Model Information
## 
##           Data Set                     WORK.TLC_G34             
##           Dependent Variable           y                        
##           Covariance Structure         Unstructured             
##           Subject Effect               id                       
##           Estimation Method            REML                     
##           Residual Variance Method     None                     
##           Fixed Effects SE Method      Model-Based              
##           Degrees of Freedom Method    Between-Within           
## 
## 
##                           Class Level Information
##  
##              Class    Levels    Values
## 
##              id          100    not printed                   
##              time          3    1 4 6                         
## 
## 
##                                 Dimensions
## 
##                     Covariance Parameters             6
##                     Columns in X                      7
##                     Columns in Z                      0
##                     Subjects                        100
##                     Max Obs per Subject               3
## 
## 
##                           Number of Observations
## 
##                 Number of Observations Read             300
##                 Number of Observations Used             300
##                 Number of Observations Not Used           0
## 
## 
##                              Iteration History
##  
##         Iteration    Evaluations    -2 Res Log Like       Criterion
## 
##                 0              1      1895.77152040                
##                 1              2      1817.56562756      0.00000000
## 
## 
##                         Convergence criteria met.                    
## 
## 
##                         Estimated R Matrix for id 1
##  
##                   Row        Col1        Col2        Col3
## 
##                     1     30.1509     20.8640     12.9909
##                     2     20.8640     32.2303     13.4600
##                     3     12.9909     13.4600     39.4775
## 
## 
##                       Covariance Parameter Estimates
##  
##                       Cov Parm    Subject    Estimate
## 
##                       UN(1,1)     id          30.1509
##                       UN(2,1)     id          20.8640
##                       UN(2,2)     id          32.2303
##                       UN(3,1)     id          12.9909
##                       UN(3,2)     id          13.4600
##                       UN(3,3)     id          39.4775
## 
## 
##                               Fit Statistics
## 
##                    -2 Res Log Likelihood          1817.6
##                    AIC (Smaller is Better)        1829.6
##                    AICC (Smaller is Better)       1829.9
##                    BIC (Smaller is Better)        1845.2
## 
## 
##                      Null Model Likelihood Ratio Test
##  
##                        DF    Chi-Square      Pr > ChiSq
## 
##                         5         78.21          <.0001
## 
## 
##                         Solution for Fixed Effects
##  
##                                Standard
##     Effect         Estimate       Error      DF    t Value    Pr > |t|
## 
##     Intercept       24.7678      0.7766      97      31.89      <.0001
##     cbaseline        0.8045     0.09390      97       8.57      <.0001
##     succimer       -11.3536      1.0985      97     -10.34      <.0001
##     t4              -0.5900      0.6427      97      -0.92      0.3609
##     t6              -1.0140      0.9343      97      -1.09      0.2805
##     succimer*t4      2.5820      0.9089      97       2.84      0.0055
##     succimer*t6      8.2540      1.3213      97       6.25      <.0001
## 
## 
##                       Type 3 Tests of Fixed Effects
##  
##                 Num    Den
##  Effect          DF     DF   Chi-Square   F Value     Pr > ChiSq   Pr > F
## 
##  cbaseline        1     97        73.41     73.41         <.0001   <.0001
##  succimer         1     97       106.83    106.83         <.0001   <.0001
##  t4               1     97         0.84      0.84         0.3586   0.3609
##  t6               1     97         1.18      1.18         0.2778   0.2805
##  succimer*t4      1     97         8.07      8.07         0.0045   0.0055
##  succimer*t6      1     97        39.02     39.02         <.0001   <.0001
## 
## 
##                                  Contrasts
##  
##                                             Num   Den
##  Label                                       DF    DF  Chi-Square  F Value
## 
##  3 DF Test of Main Effect and Interaction     3    97      111.13    37.04
## 
##                                 Contrasts
##  
##      Label                                       Pr > ChiSq    Pr > F
## 
##      3 DF Test of Main Effect and Interaction        <.0001    <.0001

Comments (iii): S4 vs. S2

The strategy S4 for handling baseline is appropriate when analyzing data from randomized trials
S4 for handling baseline is appropriate also for observational studies where there is good reason to assume the groups have the same mean response at baseline. However, in observational studies where there is no a priori reason to assume that the groups have the same mean response at baseline, S4 should not be used.
S4 can be implemented by conducting the analysis of response profiles (with $Y_{i1}$ included as a covariate) on either the post-baseline responses $Y_i = (Y_{i2}, Y_{i3}, ... , Y_{in})^\prime$, or the post-baseline change scores $D_i= (Y_{i2} - Y_{i1}, Y_{i3} - Y_{i1}, ... , Y_{in} - Y_{i1})^\prime$ (Strategy 4 alternative below)
$\Longrightarrow$ S4 is an analysis of the “adjusted change scores” (i.e., $D_i$ adjusted for $Y_{i1}$) in contrast to an analysis of the raw or unadjusted change scores (S3).
The interpretation of the tests for all three effects:
- test for group $\times$ time interaction: a test for parallel profiles for the adjusted changes from baseline in the mean response on occasions 2 through n
- test for group effect: a test that the adjusted changes from baseline in the mean response at occasion 2 are the same across groups (assuming that occasion 2 is chosen as the reference level for time).
Wald statistic of 111 produced by S4 is quite similar to that obtained under S2 for adjusting for baseline. But S2 is prefered:
- joint tests of main effects and interactions, and these tests are not routinely produced as standard output from statistical software
- when there are subjects with missing baseline response, all their data are excluded from S4; in contrast, S2 incorporates all available data in the analysis.
- there is an implicit assumption in the adjusted change score analysis that the regression slope relating $Y_{ij}$ to $Y_{i1}$ (for $j = 2, ... , n$) is the same at all n - l post-baseline occasions. This implies a strong assumption about the covariances among $Y_{i1} , ... , Y_{in}$. Specifically, it constrains \[Cov(Y_{i1},Y_{i2}) = Cov(Y_{i1},Y_{i3})= ... = Cov(Y_{i1},Y_{in})\] $\Longrightarrow$ if the assumption of homogeneous regression slopes (relating $Y_{ij}$ to $Y_{i1}$, for j = 2, … , n) is relaxed and n - l separate regression slopes are estimated, then S4 and S2 are completely equivalent and produce identical tests and estimates of effects.

Strategy 4 Alternative

We can use the baseline value as a covariate in the analysis of the post-baseline responses

Dependent variable $change_i \ \ (D_i)$ = $Y_{i, j} - \hat{Y}_{i,1}$, additional controlling for centered baseline $centered.baseline_i \ \ (Y.c_{i,1})$ = $\hat{Y}_{i,1} - \bar{Y_{i,1}}$

\[ change_i = \beta_1 \textbf{1} + \alpha \mathbf{Y.c}_{i, j=1} + \beta_{G_2} \textbf{X}_{G_2} + \beta_{T_3} \textbf{X}_{T_3} + \beta_{T_4} \textbf{X}_{T_4} + \beta_{G_2 T_3} \textbf{X}_{G_2 T_3} + \beta_{G_2 T_4} \textbf{X}_{G_2 T_4} + \epsilon_i \]

Modeling in R

g4.alt <- gls(change ~  1 + cbaseline + succimer + t4 + t6 + succimer:t4 + succimer:t6,
          correlation = corSymm(form = ~1 | id),
          weights = varIdent(form = ~1 | time),method = "ML", data=tlc.l.adj.baseline)
summary(g4.alt)

## Generalized least squares fit by maximum likelihood
##   Model: change ~ 1 + cbaseline + succimer + t4 + t6 + succimer:t4 + succimer:t6 
##   Data: tlc.l.adj.baseline 
##    AIC  BIC logLik
##   1848 1896 -910.8
## 
## Correlation Structure: General
##  Formula: ~1 | id 
##  Parameter estimate(s):
##  Correlation: 
##   1     2    
## 2 0.667      
## 3 0.373 0.373
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | time 
##  Parameter estimates:
##     1     4     6 
## 1.000 1.034 1.145 
## 
## Coefficients:
##               Value Std.Error t-value p-value
## (Intercept)  -1.638    0.7751  -2.114  0.0354
## cbaseline    -0.195    0.0936  -2.089  0.0376
## succimer    -11.354    1.0963 -10.356  0.0000
## t4           -0.590    0.6438  -0.916  0.3602
## t6           -1.014    0.9359  -1.083  0.2795
## succimer:t4   2.582    0.9105   2.836  0.0049
## succimer:t6   8.254    1.3236   6.236  0.0000
## 
##  Correlation: 
##             (Intr) cbasln succmr t4     t6     sccm:4
## cbaseline    0.016                                   
## succimer    -0.707 -0.023                            
## t4          -0.373  0.000  0.264                     
## t6          -0.475  0.000  0.336  0.325              
## succimer:t4  0.264  0.000 -0.373 -0.707 -0.230       
## succimer:t6  0.336  0.000 -0.475 -0.230 -0.707  0.325
## 
## Standardized residuals:
##      Min       Q1      Med       Q3      Max 
## -3.05737 -0.51661 -0.06426  0.41652  6.02635 
## 
## Residual standard error: 5.416 
## Degrees of freedom: 300 total; 293 residual

The Modified Test of interest

# Test for the interactions 
linearHypothesis(g4.alt, c("succimer", "succimer:t4", "succimer:t6"), test = "Chisq")

Modeling in SAS

title1 Analysis of Response Profiles of Adjusted Changes from Baseline;
title2 Treatment of Lead Exposed Children (TLC) Trial;


proc mixed noclprint=10 data=tlc_g34 order=data;
     class id time;
     model change = cbaseline succimer t4 t6 succimer*t4 succimer*t6 / s chisq;
     repeated time / type=un subject=id r;
     contrast '3 DF Test of Main Effect and Interaction'
          succimer 1, succimer*t4 1, succimer*t6 1 / chisq;
run;

##       Analysis of Response Profiles of Adjusted Changes from Baseline
##               Treatment of Lead Exposed Children (TLC) Trial
## 
##                             The Mixed Procedure
## 
##                             Model Information
## 
##           Data Set                     WORK.TLC_G34             
##           Dependent Variable           change                   
##           Covariance Structure         Unstructured             
##           Subject Effect               id                       
##           Estimation Method            REML                     
##           Residual Variance Method     None                     
##           Fixed Effects SE Method      Model-Based              
##           Degrees of Freedom Method    Between-Within           
## 
## 
##                           Class Level Information
##  
##              Class    Levels    Values
## 
##              id          100    not printed                   
##              time          3    1 4 6                         
## 
## 
##                                 Dimensions
## 
##                     Covariance Parameters             6
##                     Columns in X                      7
##                     Columns in Z                      0
##                     Subjects                        100
##                     Max Obs per Subject               3
## 
## 
##                           Number of Observations
## 
##                 Number of Observations Read             300
##                 Number of Observations Used             300
##                 Number of Observations Not Used           0
## 
## 
##                              Iteration History
##  
##         Iteration    Evaluations    -2 Res Log Like       Criterion
## 
##                 0              1      1895.77152040                
##                 1              2      1817.56562756      0.00000000
## 
## 
##                         Convergence criteria met.                    
## 
## 
##                         Estimated R Matrix for id 1
##  
##                   Row        Col1        Col2        Col3
## 
##                     1     30.1509     20.8640     12.9909
##                     2     20.8640     32.2303     13.4600
##                     3     12.9909     13.4600     39.4775
## 
## 
##                       Covariance Parameter Estimates
##  
##                       Cov Parm    Subject    Estimate
## 
##                       UN(1,1)     id          30.1509
##                       UN(2,1)     id          20.8640
##                       UN(2,2)     id          32.2303
##                       UN(3,1)     id          12.9909
##                       UN(3,2)     id          13.4600
##                       UN(3,3)     id          39.4775
## 
## 
##                               Fit Statistics
## 
##                    -2 Res Log Likelihood          1817.6
##                    AIC (Smaller is Better)        1829.6
##                    AICC (Smaller is Better)       1829.9
##                    BIC (Smaller is Better)        1845.2
## 
## 
##                      Null Model Likelihood Ratio Test
##  
##                        DF    Chi-Square      Pr > ChiSq
## 
##                         5         78.21          <.0001
## 
## 
##                         Solution for Fixed Effects
##  
##                                Standard
##     Effect         Estimate       Error      DF    t Value    Pr > |t|
## 
##     Intercept       -1.6382      0.7766      97      -2.11      0.0375
##     cbaseline       -0.1955     0.09390      97      -2.08      0.0400
##     succimer       -11.3536      1.0985      97     -10.34      <.0001
##     t4              -0.5900      0.6427      97      -0.92      0.3609
##     t6              -1.0140      0.9343      97      -1.09      0.2805
##     succimer*t4      2.5820      0.9089      97       2.84      0.0055
##     succimer*t6      8.2540      1.3213      97       6.25      <.0001
## 
## 
##                       Type 3 Tests of Fixed Effects
##  
##                 Num    Den
##  Effect          DF     DF   Chi-Square   F Value     Pr > ChiSq   Pr > F
## 
##  cbaseline        1     97         4.33      4.33         0.0374   0.0400
##  succimer         1     97       106.83    106.83         <.0001   <.0001
##  t4               1     97         0.84      0.84         0.3586   0.3609
##  t6               1     97         1.18      1.18         0.2778   0.2805
##  succimer*t4      1     97         8.07      8.07         0.0045   0.0055
##  succimer*t6      1     97        39.02     39.02         <.0001   <.0001
## 
## 
##                                  Contrasts
##  
##                                             Num   Den
##  Label                                       DF    DF  Chi-Square  F Value
## 
##  3 DF Test of Main Effect and Interaction     3    97      111.13    37.04
## 
##                                 Contrasts
##  
##      Label                                       Pr > ChiSq    Pr > F
## 
##      3 DF Test of Main Effect and Interaction        <.0001    <.0001

Comments (iv): S4 vs. S4alt

The intuition for why these two analyses are identical is:
- the two outcomes differ by $Y_{i1}$, and both analyses estimate effects that are adjusted for $Y_{i1}$ by holding the baseline value fixed, they produce the same regression coefficients for all effects of interest
- the two analyses differ only in terms of the estimated slope for $Y_{i1}$. (The estimated slope from the analysis based on $Y_i$; is simply one unit larger than the estimated slope from the analysis based on $D_i$)
The modified tests of interest have the same value for S4 and S4alt.

Reference

Allison, Paul D., Richard Williams, and Enrique Moral-Benito. 2017. “Maximum Likelihood for Cross-Lagged Panel Models with Fixed Effects.” Journal Article. Socius : Sociological Research for a Dynamic World 3: 237802311771057. doi:10.1177/2378023117710578.

Berbaum, Michael. 2017. “Adjustment for Baseline Outcome in Longitudinal Analyses.” Unpublished Work. Institute for Health Resereach; Policy - Methodology Research Core.

Fitzmaurice, Garrett. 2001. “A Conundrum in the Analysis of Change.” Journal Article. Nutrition 17 (4): 360–61. doi:10.1016/S0899-9007(00)00593-1.

Fitzmaurice, Garrett M. 2004. Applied Longitudinal Analysis. Book. Wiley Series in Probability and Statistics. Hoboken, N.J: Wiley-Interscience.

Lord, Frederic M. 1967. “A Paradox in the Interpretation of Group Comparisons.” Journal Article. Psychological Bulletin 68 (5): 304–5. doi:10.1037/h0025105.

Mallinckrod†, Craig H., Peter W. Lane, Dan Schnell, Yahong Peng, and James P. Mancuso. 2008. “Recommendations for the Primary Analysis of Continuous Endpoints in Longitudinal Clinical Trials.” Journal Article. Drug Information Journal 42 (4): 303–19. doi:10.1177/009286150804200402.

Adjustment for Baseline Response in Longitudinal Analyses

Hai Nguyen, Michael Berbaum

June 1, 2020

Motivation

Purpose

Case Study

Manipulate data in R

Read TLC data into R

Mean blood lead levels overtime

Transpose from wide to long form

Spaghetti plot

Manipulate data in SAS

Prepare data for strategy 1

Prepare data for strategy 2

Prepare data for strategy 3 & 4

Strategy 1

Modeling in R

Test for group x time interaction

Modeling in SAS

Strategy 2

Modeling in R

Test for group x time interaction

Modeling in SAS

Comment (i): S1 and S2

Strategy 3

Modeling in R

The Modified Test of interest

Modeling in SAS

Comment (ii): S3 vs. S1

Strategy 4

Modeling in R

The Modified Test of interest

Modeling in SAS

Comments (iii): S4 vs. S2

Strategy 4 Alternative

Modeling in R

The Modified Test of interest

Modeling in SAS

Comments (iv): S4 vs. S4alt

Reference