Amber G, “BIOS 623 (Longitudinal Data Analysis)”

Class 11 (Nov 11, 2021)

Class 11 – LAB missing data in longitudinal study (2)

Review of MNAR – missing not at random

The missing data mechanism is MNAR, if \[ Pr(R) \quad \text{depends on}\quad Y^{(m)}. \] where R is the observe data indicator, Pr(1-R) is the missing data probability.

In general,

Characteristics of MNAR

Ibrahim JG and Molenberghs G. Missing data methods in longitudinal studies: a review. Test (Madr). May 1, 2009. 18(1): 1-43.

Magnusson 2020 https://rpsychologist.com/lmm-slope-missingness

Pattern-mixture Methods (PMM)

Steps in conduting PMM

Missing data patterens

For longitudinal data, we explore the missing pattern with regarding the outcome (\(Y\)) over time.

Suppose subjects were measured at three timepoints, there are eight (\(2^3\)) possible missing data patterns:

P T1 T2 T3
1 O O O
2 O O M
3 O M O
4 M O O
5 O M M
6 M O M
7 M M O
8 M M M

Representing patterns with dummy-coded variables

Pattern D1 D2 D3 D4 D5 D6
OOO 0 0 0 0 0 0
OOM 1 0 0 0 0 0
OMO 0 1 0 0 0 0
MOO 0 0 1 0 0 0
OMM 0 0 0 1 0 0
MOM 0 0 0 0 1 0
MMO 0 0 0 0 0 1

Examples of grouping

Grouping for “complete data” vs. “incomplete data”

G T1 T2 T3 D
1 O O O 0
2 O O M 1
2 O M O 1
2 M O O 1
2 O M M 1
2 M O M 1
2 M M O 1

Examples of grouping

Grouping for “stayers” vs. “dropouts” (focus on T3, the last follow-up)

G T1 T2 T3 D
1 O O O 0
2 O O M 1
1 O M O 0
1 M O O 0
2 O M M 1
2 M O M 1
1 M M O 0

Stratified analysis and pooling

Standard analysis with longitudinal data (e.g., GLS, LME, FEM, GEE, GLMM) can be applied on the sub-groups (divided by the dummay variable).

For example, with 1 dummary variable D (where 0 for stayers and 1 for dropouts, e.g.) \[ g(E(Y_{ij}^{(D=0)})) = \beta_{0}^{(D=0)} + \beta_{1}^{(D=0)}X_{i}^{(D=0)} + \ldots + \beta_{p}^{(D=0)}X_{p}^{(D=0)} \] \[ g(E(Y_{ij}^{(D=1)})) = \beta_{0}^{(D=1)} + \beta_{1}^{(D=1)}X_{1i}^{(D=1)} + \ldots + \beta_{p}^{(D=1)}X_{pi}^{(D=1)} \]

Overall estimate can be pooled by \[ \hat{\beta} = \frac{n_{0}}{N}\hat{\beta}^{(D=0)} + \frac{n_{1}}{N}\hat{\beta}^{(D=1)} = \hat{\beta}^{(D=0)} + \frac{n_{1}}{N}\hat{\beta}^{Diff} \] \[ V(\hat{\beta}) = V(\hat{\beta}^{(D=0)}) + \frac{n_{0}n_{1}}{N^3} (\hat{\beta}^{Diff})^2 \] where \(n_{0}\), \(n_{1}\) and \(N\) are the size for \(D=0\), \(D=1\) and overall sample. \(\hat{\beta}^{Diff} = \hat{\beta}^{(D=1)} - \hat{\beta}^{(D=0)}\)

Schizophrenia Study

Schizophrenia Study (con’t)

Randimization:

Data Elements:

Read-in Data

library(haven); library(lattice)
ds <- read_sas("schizrep.sas7bdat")
head(ds)
## # A tibble: 6 × 5
##      id imps79  week    tx   sex
##   <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1  1103    5.5     0     1     1
## 2  1103    3       1     1     1
## 3  1103    2.5     3     1     1
## 4  1103    4       6     1     1
## 5  1104    6       0     1     1
## 6  1104    3       1     1     1

Examine observations by week and tx

t(table(ds$week, ds$tx))
##    
##       0   1   2   3   4   5   6
##   0 107 105   5  87   2   2  70
##   1 327 321   9 287   9   7 265

In general …

Working on a subset, 0 (baseline), 1, 3, and 6 week

ds2 <-subset(ds, week == 0 | week == 1 | week ==3 | week == 6)
xyplot(imps79 ~ week|as.character(tx), group=id, type='l', data=ds2 )

We call a smoother function …

panel.smoother <- function(x, y) {
  panel.xyplot(x, y, col = "gray50", pch=".")
  panel.loess(x, y, lwd=2, col.line="red", span=1.3)
}   
ds2$tx2 <- as.character(ds2$tx)

in order to obtain mean profile (across the two arms)…

xyplot(imps79 ~ week|tx2, group=id, type='l', col.line = "gray20", data=ds2, panel=panel.smoother )

Objective to model specification

Recall that the objective is on treatment related changes in overall severity.

Such objective dictates the following model:

\[ g(E(Y_{ij})) = \beta_{0} + TX_{i}\beta_{1} +WEEK_{ij}\beta_{2} + TX_{i}*WEEK_{ij}\beta_{3}, \]

where

Remark: \(\hat{\beta}_{1}\) is expected to be zero (due to randomization).

MCAR based analysis – GEE

library(gee)
#GEE
fit.gee <- gee(imps79 ~ 1 + tx + week + tx:week, 
              data = ds2, id=id, corstr="unstructured")
## (Intercept)          tx        week     tx:week 
##   5.2674254  -0.2007796  -0.1749109  -0.1849286
(gee.res <- summary(fit.gee)$coef[,c(1,4)])
##               Estimate Robust S.E.
## (Intercept)  5.2574541  0.08450501
## tx          -0.1592765  0.09805183
## week        -0.1471853  0.02789367
## tx:week     -0.2047821  0.03149928

MAR based analysis – LME (Linear mixed effects model)

library(lme4)
#LME
fit.lme <- lmer(imps79~1 + tx + week+ tx:week
                       +(1 + week|id),data=ds2)
(lme.res <- summary(fit.lme)$coef[,c(1,2)])
##               Estimate Std. Error
## (Intercept)  5.2424923 0.08748516
## tx          -0.1597905 0.10066873
## week        -0.1436872 0.02848115
## tx:week     -0.2378565 0.03221564

Note that a “random intercept” and “random slope (wrt week)” LME model was fitted.

A MNAR based model using PMM (pattern-mixuture model)

First, generate two subsets of data, one for “stayers” and one for “dropouts”

ids <- unique(ds2$id)
len <- length(ids)
stayer <- NULL
dropout <- NULL
for(i in 1:len){
  tt <- subset(ds2, id%in% ids[i])
  lv <- dim(tt)[1] #lv is the index for last visit
  if(tt$week[lv] != 6){ #so, if last visit is not 
                        #week 6, classify as dropout
    dropout <- rbind(dropout, tt)
  }else{
    stayer <- rbind(stayer, tt)
  }
}

Exam the subsets (and obtain \(N\), \(n_{0}\) and \(n_{1}\))

dim(ds2); (N <- length(unique(ds2$id)))
## [1] 1569    6
## [1] 437
dim(stayer); (n0 <- length(unique(stayer$id)))
## [1] 1315    6
## [1] 335
dim(dropout); (n1 <- length(unique(dropout$id)))
## [1] 254   6
## [1] 102

So, \(N=437\), \(n_{0}=335\), and \(n_{1}=102\).

Conduct standard analysis on “stayers”

fit.lme.stayer <- lmer(imps79~1 + tx + week+ tx:week
                       +(1 + week|id),data=stayer)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00275996 (tol = 0.002, component 1)
est.se.stayer <- summary(fit.lme.stayer)$coef[,c(1,2)]

est.se.stayer
##                Estimate Std. Error
## (Intercept)  5.08755892 0.10749837
## tx           0.03060813 0.12082150
## week        -0.14894690 0.02901841
## tx:week     -0.20969506 0.03261961

Remark: first column is the \[ \hat{\beta}^{(D=0)} \] and the second column is the \[ \sqrt{V(\hat{\beta}^{(D=0)}}) \]

Conduct standard analysis on “dropouts”

fit.lme.dropout <- lmer(imps79~1 + tx + week+ tx:week
                       +(1 + week|id),data=dropout)
est.se.dropout <- summary(fit.lme.dropout)$coef[,c(1,2)]
est.se.dropout
##               Estimate Std. Error
## (Intercept)  5.4980814  0.1472960
## tx          -0.3148219  0.1859374
## week        -0.0748355  0.1099229
## tx:week     -0.7757926  0.1387910

Remark: first column is the \[ \hat{\beta}^{(D=1)} \]

Pooling the results to get overall estimates/SE

Recall the overall estimates and SEs can be pooled by \[ \hat{\beta} = \hat{\beta}^{(D=0)} + \frac{n_{1}}{N}\hat{\beta}^{Diff} \] \[ SE(\hat{\beta})= \sqrt{V(\hat{\beta})} = \sqrt{ V(\hat{\beta}^{(D=0)}) + \frac{n_{0}n_{1}}{N^3} (\hat{\beta}^{Diff})^2 } \] where \(n_{0}\), \(n_{1}\) and \(N\) are the size for \(D=0\), \(D=1\) and overall sample and \(\hat{\beta}^{Diff} = \hat{\beta}^{(D=1)} - \hat{\beta}^{(D=0)}\).

beta.diff <- est.se.dropout[,1] - est.se.stayer[,1]
beta.est <- est.se.stayer[,1] + (n1/N) * beta.diff
beta.se <- sqrt(est.se.stayer[,2]^2 + (n0*n1/N^3)*(beta.diff)^2)
(pmm.res <- cbind(beta.est, beta.se))
##                beta.est    beta.se
## (Intercept)  5.18337881 0.10781884
## tx          -0.05001856 0.12102352
## week        -0.13164859 0.02905713
## tx:week     -0.34182767 0.03457245

Comparing the PMM based result with MCAR-GEE and MAR-LME

all.res <- data.frame(round(
  cbind(gee.res,lme.res, pmm.res),3))
names(all.res) <- c("g.est", "g.se", "m.est",
                    "m.se", "p.est", "p.se")
all.res
##              g.est  g.se  m.est  m.se  p.est  p.se
## (Intercept)  5.257 0.085  5.242 0.087  5.183 0.108
## tx          -0.159 0.098 -0.160 0.101 -0.050 0.121
## week        -0.147 0.028 -0.144 0.028 -0.132 0.029
## tx:week     -0.205 0.031 -0.238 0.032 -0.342 0.035

Finally,

Final Notes on PMM

Little R. Missing data assumptions. Annual Review of Statistics and Its Application. Aug 21, 2021. 8:89-107.

Lab (to be conducted during the week)

Repeat the PMM approch using the R markdown file, upload the final results to BB.