Review of MNAR – missing not at random

The missing data mechanism is MNAR, if \[ Pr(R) \quad \text{depends on}\quad Y^{(m)}. \] where R is the observe data indicator, Pr(1-R) is the missing data probability.

In general,

All methods/models would work under MCAR, although with loss of efficiency (reduced sample size, multiple imputation might help)
Only LME and GLMM work when data is MAR
No statistical models (GLS, LME, FEM, GEE and GLMM) would work when missing is MNAR

Characteristics of MNAR

When \(P(R)\) is not independent of \(Y\), especially \(Y^{(m)}\), we need to model \(\{R, Y^{(o)}, Y^{(m)}\}\) jointly
There are two general classes of models for MNAR (Ibrahim 2009, Magnusson 2020)
- Selection models: \(P(R, Y) = P(Y|R) P(R)\)
- Pattern-mixture models: \(P(R,Y)=P(R|Y)P(Y)\)
Selection models involves the modeling of missing, e.g., logistic regression for \(P(R)\). Special code must be written for the estimation and inferences. This is a hot area in statistical methodology field
Pattern-mixture models can be fitted with standard software and models, but it is problem-specific, aka, no general algorithm, due to special missing data pattern pertain to each specific study

Ibrahim JG and Molenberghs G. Missing data methods in longitudinal studies: a review. Test (Madr). May 1, 2009. 18(1): 1-43.

Magnusson 2020 https://rpsychologist.com/lmm-slope-missingness

Pattern-mixture Methods (PMM)

Steps in conduting PMM

explore missing data patterns and divide the sample into exclusive sub-group
perform stratified analysis (e,g, via dummy-variable coding and interaction terms)
combine the results (both estimates and SEs)

Missing data patterens

For longitudinal data, we explore the missing pattern with regarding the outcome (\(Y\)) over time.

Suppose subjects were measured at three timepoints, there are eight (\(2^3\)) possible missing data patterns:

P	T1	T2	T3
1	O	O	O
2	O	O	M
3	O	M	O
4	M	O	O
5	O	M	M
6	M	O	M
7	M	M	O
8	M	M	M

As the number of repeated measures increase, the pattern increase exponentially!

Representing patterns with dummy-coded variables

Pattern	D1	D2	D3	D4	D5	D6
OOO	0	0	0	0	0	0
OOM	1	0	0	0	0	0
OMO	0	1	0	0	0	0
MOO	0	0	1	0	0	0
OMM	0	0	0	1	0	0
MOM	0	0	0	0	1	0
MMO	0	0	0	0	0	1

these dummy-coded variables represent deviations from pattern OOO
stratm can then be divided using such dummy-coded variables
other coding schemes (i.e., re-grouping) are usually used to make the analysis managable or interpretable

Examples of grouping

Grouping for “complete data” vs. “incomplete data”

G	T1	T2	T3	D
1	O	O	O	0
2	O	O	M	1
2	O	M	O	1
2	M	O	O	1
2	O	M	M	1
2	M	O	M	1
2	M	M	O	1

Examples of grouping

Grouping for “stayers” vs. “dropouts” (focus on T3, the last follow-up)

G	T1	T2	T3	D
1	O	O	O	0
2	O	O	M	1
1	O	M	O	0
1	M	O	O	0
2	O	M	M	1
2	M	O	M	1
1	M	M	O	0

Stratified analysis and pooling

Standard analysis with longitudinal data (e.g., GLS, LME, FEM, GEE, GLMM) can be applied on the sub-groups (divided by the dummay variable).

For example, with 1 dummary variable D (where 0 for stayers and 1 for dropouts, e.g.) \[ g(E(Y_{ij}^{(D=0)})) = \beta_{0}^{(D=0)} + \beta_{1}^{(D=0)}X_{i}^{(D=0)} + \ldots + \beta_{p}^{(D=0)}X_{p}^{(D=0)} \] \[ g(E(Y_{ij}^{(D=1)})) = \beta_{0}^{(D=1)} + \beta_{1}^{(D=1)}X_{1i}^{(D=1)} + \ldots + \beta_{p}^{(D=1)}X_{pi}^{(D=1)} \]

Overall estimate can be pooled by \[ \hat{\beta} = \frac{n_{0}}{N}\hat{\beta}^{(D=0)} + \frac{n_{1}}{N}\hat{\beta}^{(D=1)} = \hat{\beta}^{(D=0)} + \frac{n_{1}}{N}\hat{\beta}^{Diff} \] \[ V(\hat{\beta}) = V(\hat{\beta}^{(D=0)}) + \frac{n_{0}n_{1}}{N^3} (\hat{\beta}^{Diff})^2 \] where \(n_{0}\), \(n_{1}\) and \(N\) are the size for \(D=0\), \(D=1\) and overall sample. \(\hat{\beta}^{Diff} = \hat{\beta}^{(D=1)} - \hat{\beta}^{(D=0)}\)

Schizophrenia Study

NIMH (National Institute of Mental Health) Schizophrenia Collaborative Study
Objective is on treatment related changes in overall severity
Outcome: item 79 of the Inpatient multidimensional psychatric Scale (imps79) (severity of illness; min=1, max = 7)

value coding

1 normal

2 borderline mentally ill

3 mildly ill

4 moderately ill

5 markedly ill

6 severely ill

7 among the most extremely ill

value	coding
1	normal
2	borderline mentally ill
3	mildly ill
4	moderately ill
5	markedly ill
6	severely ill
7	among the most extremely ill

Schizophrenia Study (con’t)

Randimization:

Patients were randomly assigned to receive one of four medications: placebo, chlorpromazine, fluphenazine or thioridazine.
Previous analyses revealed similar effects for the three antipsychotic drug groups, they were combined in the present analysis

Data Elements:

id identifier for individual.
imps79 outcome
week 0 (baseline), 1, 2, 3, 4, 5, and 6
treatment 1: treated; 0: placebo
sex gender where 1: female 0 for male

Read-in Data

library(haven); library(lattice)
ds <- read_sas("schizrep.sas7bdat")
head(ds)

## # A tibble: 6 × 5
##      id imps79  week    tx   sex
##   <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1  1103    5.5     0     1     1
## 2  1103    3       1     1     1
## 3  1103    2.5     3     1     1
## 4  1103    4       6     1     1
## 5  1104    6       0     1     1
## 6  1104    3       1     1     1

Examine observations by week and tx

t(table(ds$week, ds$tx))

##    
##       0   1   2   3   4   5   6
##   0 107 105   5  87   2   2  70
##   1 327 321   9 287   9   7 265

In general …

subjects measured at baseline & weekly for up to 6 wks
main measurement weeks were 0 (baseline), 1, 3, and 6
some subjects also measured at weeks 2, 4, and 5
almost all subjects were measured at baseline (434 of 437) and no subjects were only measured at baseline (i.e., all subjects had at least one post-baseline measurement)
some intermittent missingness, but dropout was a much more common pattern of missingness
102 of 437 subjects did not complete the trial (i.e., were not measured at week 6)

Working on a subset, 0 (baseline), 1, 3, and 6 week

ds2 <-subset(ds, week == 0 | week == 1 | week ==3 | week == 6)
xyplot(imps79 ~ week|as.character(tx), group=id, type='l', data=ds2 )

We call a smoother function …

panel.smoother <- function(x, y) {
  panel.xyplot(x, y, col = "gray50", pch=".")
  panel.loess(x, y, lwd=2, col.line="red", span=1.3)
}   
ds2$tx2 <- as.character(ds2$tx)

in order to obtain mean profile (across the two arms)…

xyplot(imps79 ~ week|tx2, group=id, type='l', col.line = "gray20", data=ds2, panel=panel.smoother )

Objective to model specification

Recall that the objective is on treatment related changes in overall severity.

Such objective dictates the following model:

\[ g(E(Y_{ij})) = \beta_{0} + TX_{i}\beta_{1} +WEEK_{ij}\beta_{2} + TX_{i}*WEEK_{ij}\beta_{3}, \]

where

\(\beta_{0}\) (intercept) measure placebo effect at baseline
\(\beta_{1}\) estimates the baseline treatment effect
\(\beta_{2}\) is the effect of change for placebo
\(\beta_{3}\) estimates the treatment effect changes.

Remark: \(\hat{\beta}_{1}\) is expected to be zero (due to randomization).

MCAR based analysis – GEE

library(gee)
#GEE
fit.gee <- gee(imps79 ~ 1 + tx + week + tx:week, 
              data = ds2, id=id, corstr="unstructured")

## (Intercept)          tx        week     tx:week 
##   5.2674254  -0.2007796  -0.1749109  -0.1849286

(gee.res <- summary(fit.gee)$coef[,c(1,4)])

##               Estimate Robust S.E.
## (Intercept)  5.2574541  0.08450501
## tx          -0.1592765  0.09805183
## week        -0.1471853  0.02789367
## tx:week     -0.2047821  0.03149928

MAR based analysis – LME (Linear mixed effects model)

library(lme4)
#LME
fit.lme <- lmer(imps79~1 + tx + week+ tx:week
                       +(1 + week|id),data=ds2)
(lme.res <- summary(fit.lme)$coef[,c(1,2)])

##               Estimate Std. Error
## (Intercept)  5.2424923 0.08748516
## tx          -0.1597905 0.10066873
## week        -0.1436872 0.02848115
## tx:week     -0.2378565 0.03221564

Note that a “random intercept” and “random slope (wrt week)” LME model was fitted.

A MNAR based model using PMM (pattern-mixuture model)

First, generate two subsets of data, one for “stayers” and one for “dropouts”

ids <- unique(ds2$id)
len <- length(ids)
stayer <- NULL
dropout <- NULL
for(i in 1:len){
  tt <- subset(ds2, id%in% ids[i])
  lv <- dim(tt)[1] #lv is the index for last visit
  if(tt$week[lv] != 6){ #so, if last visit is not 
                        #week 6, classify as dropout
    dropout <- rbind(dropout, tt)
  }else{
    stayer <- rbind(stayer, tt)
  }
}

Exam the subsets (and obtain \(N\), \(n_{0}\) and \(n_{1}\))

dim(ds2); (N <- length(unique(ds2$id)))

## [1] 1569    6

## [1] 437

dim(stayer); (n0 <- length(unique(stayer$id)))

## [1] 1315    6

## [1] 335

dim(dropout); (n1 <- length(unique(dropout$id)))

## [1] 254   6

## [1] 102

So, \(N=437\), \(n_{0}=335\), and \(n_{1}=102\).

Conduct standard analysis on “stayers”

fit.lme.stayer <- lmer(imps79~1 + tx + week+ tx:week
                       +(1 + week|id),data=stayer)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00275996 (tol = 0.002, component 1)

est.se.stayer <- summary(fit.lme.stayer)$coef[,c(1,2)]

est.se.stayer

##                Estimate Std. Error
## (Intercept)  5.08755892 0.10749837
## tx           0.03060813 0.12082150
## week        -0.14894690 0.02901841
## tx:week     -0.20969506 0.03261961

Remark: first column is the \[ \hat{\beta}^{(D=0)} \] and the second column is the \[ \sqrt{V(\hat{\beta}^{(D=0)}}) \]

Conduct standard analysis on “dropouts”

fit.lme.dropout <- lmer(imps79~1 + tx + week+ tx:week
                       +(1 + week|id),data=dropout)
est.se.dropout <- summary(fit.lme.dropout)$coef[,c(1,2)]
est.se.dropout

##               Estimate Std. Error
## (Intercept)  5.4980814  0.1472960
## tx          -0.3148219  0.1859374
## week        -0.0748355  0.1099229
## tx:week     -0.7757926  0.1387910

Remark: first column is the \[ \hat{\beta}^{(D=1)} \]

Pooling the results to get overall estimates/SE

Recall the overall estimates and SEs can be pooled by \[ \hat{\beta} = \hat{\beta}^{(D=0)} + \frac{n_{1}}{N}\hat{\beta}^{Diff} \] \[ SE(\hat{\beta})= \sqrt{V(\hat{\beta})} = \sqrt{ V(\hat{\beta}^{(D=0)}) + \frac{n_{0}n_{1}}{N^3} (\hat{\beta}^{Diff})^2 } \] where \(n_{0}\), \(n_{1}\) and \(N\) are the size for \(D=0\), \(D=1\) and overall sample and \(\hat{\beta}^{Diff} = \hat{\beta}^{(D=1)} - \hat{\beta}^{(D=0)}\).

beta.diff <- est.se.dropout[,1] - est.se.stayer[,1]
beta.est <- est.se.stayer[,1] + (n1/N) * beta.diff
beta.se <- sqrt(est.se.stayer[,2]^2 + (n0*n1/N^3)*(beta.diff)^2)
(pmm.res <- cbind(beta.est, beta.se))

##                beta.est    beta.se
## (Intercept)  5.18337881 0.10781884
## tx          -0.05001856 0.12102352
## week        -0.13164859 0.02905713
## tx:week     -0.34182767 0.03457245

Comparing the PMM based result with MCAR-GEE and MAR-LME

all.res <- data.frame(round(
  cbind(gee.res,lme.res, pmm.res),3))
names(all.res) <- c("g.est", "g.se", "m.est",
                    "m.se", "p.est", "p.se")
all.res

##              g.est  g.se  m.est  m.se  p.est  p.se
## (Intercept)  5.257 0.085  5.242 0.087  5.183 0.108
## tx          -0.159 0.098 -0.160 0.101 -0.050 0.121
## week        -0.147 0.028 -0.144 0.028 -0.132 0.029
## tx:week     -0.205 0.031 -0.238 0.032 -0.342 0.035

Finally,

treatment’s effect at baseline is estimated near zero!
effect change for the treatment is estimated with larger magnitude than GEE and LME!

Final Notes on PMM

Taking care of the MNAR!
Still accessible with standard software (comparing to Selection Model)
Sensitivity analysis is important (both within the modeling and across with other model under different missing data mechanism)
Problem-specific, i.e., no general algorithm/code, users need to define sub-group for desired (or more interpretable or manageable) missing patterns
Handle missing in outcome variable only, extending to covariate missing is non-trivial (Little 1992, 2021)

Little R. Missing data assumptions. Annual Review of Statistics and Its Application. Aug 21, 2021. 8:89-107.

Amber G, “BIOS 623 (Longitudinal Data Analysis)”

Class 11 – LAB missing data in longitudinal study (2)

Review of MNAR – missing not at random

Characteristics of MNAR

Pattern-mixture Methods (PMM)

Missing data patterens

Representing patterns with dummy-coded variables

Examples of grouping

Examples of grouping

Stratified analysis and pooling

Schizophrenia Study

Schizophrenia Study (con’t)

Read-in Data

Examine observations by week and tx

In general …

Working on a subset, 0 (baseline), 1, 3, and 6 week

We call a smoother function …

Objective to model specification

MCAR based analysis – GEE

MAR based analysis – LME (Linear mixed effects model)

A MNAR based model using PMM (pattern-mixuture model)

Exam the subsets (and obtain \(N\), \(n_{0}\) and \(n_{1}\))

Conduct standard analysis on “stayers”

Conduct standard analysis on “dropouts”

Pooling the results to get overall estimates/SE

Comparing the PMM based result with MCAR-GEE and MAR-LME

Final Notes on PMM

Lab (to be conducted during the week)

Pattern	D1	D2	D3	D4	D5	D6
OOO	0	0	0	0	0	0
OOM	1	0	0	0	0	0
OMO	0	1	0	0	0	0
MOO	0	0	1	0	0	0
OMM	0	0	0	1	0	0
MOM	0	0	0	0	1	0
MMO	0	0	0	0	0	1

Pattern	D1	D2	D3	D4	D5	D6
OOO	0	0	0	0	0	0
OOM	1	0	0	0	0	0
OMO	0	1	0	0	0	0
MOO	0	0	1	0	0	0
OMM	0	0	0	1	0	0
MOM	0	0	0	0	1	0
MMO	0	0	0	0	0	1

Pattern	D1	D2	D3	D4	D5	D6
OOO	0	0	0	0	0	0
OOM	1	0	0	0	0	0
OMO	0	1	0	0	0	0
MOO	0	0	1	0	0	0
OMM	0	0	0	1	0	0
MOM	0	0	0	0	1	0
MMO	0	0	0	0	0	1