Class 11 (Nov 11, 2021)
The missing data mechanism is MNAR, if \[ Pr(R) \quad \text{depends on}\quad Y^{(m)}. \] where R is the observe data indicator, Pr(1-R) is the missing data probability.
In general,
All methods/models would work under MCAR, although with loss of efficiency (reduced sample size, multiple imputation might help)
Only LME and GLMM work when data is MAR
No statistical models (GLS, LME, FEM, GEE and GLMM) would work when missing is MNAR
When \(P(R)\) is not independent of \(Y\), especially \(Y^{(m)}\), we need to model \(\{R, Y^{(o)}, Y^{(m)}\}\) jointly
There are two general classes of models for MNAR (Ibrahim 2009, Magnusson 2020)
Selection models involves the modeling of missing, e.g., logistic regression for \(P(R)\). Special code must be written for the estimation and inferences. This is a hot area in statistical methodology field
Pattern-mixture models can be fitted with standard software and models, but it is problem-specific, aka, no general algorithm, due to special missing data pattern pertain to each specific study
Ibrahim JG and Molenberghs G. Missing data methods in longitudinal studies: a review. Test (Madr). May 1, 2009. 18(1): 1-43.
Magnusson 2020 https://rpsychologist.com/lmm-slope-missingness
Steps in conduting PMM
explore missing data patterns and divide the sample into exclusive sub-group
perform stratified analysis (e,g, via dummy-variable coding and interaction terms)
combine the results (both estimates and SEs)
For longitudinal data, we explore the missing pattern with regarding the outcome (\(Y\)) over time.
Suppose subjects were measured at three timepoints, there are eight (\(2^3\)) possible missing data patterns:
| P | T1 | T2 | T3 |
|---|---|---|---|
| 1 | O | O | O |
| 2 | O | O | M |
| 3 | O | M | O |
| 4 | M | O | O |
| 5 | O | M | M |
| 6 | M | O | M |
| 7 | M | M | O |
| 8 | M | M | M |
| Pattern | D1 | D2 | D3 | D4 | D5 | D6 |
|---|---|---|---|---|---|---|
| OOO | 0 | 0 | 0 | 0 | 0 | 0 |
| OOM | 1 | 0 | 0 | 0 | 0 | 0 |
| OMO | 0 | 1 | 0 | 0 | 0 | 0 |
| MOO | 0 | 0 | 1 | 0 | 0 | 0 |
| OMM | 0 | 0 | 0 | 1 | 0 | 0 |
| MOM | 0 | 0 | 0 | 0 | 1 | 0 |
| MMO | 0 | 0 | 0 | 0 | 0 | 1 |
these dummy-coded variables represent deviations from pattern OOO
stratm can then be divided using such dummy-coded variables
other coding schemes (i.e., re-grouping) are usually used to make the analysis managable or interpretable
Grouping for “complete data” vs. “incomplete data”
| G | T1 | T2 | T3 | D |
|---|---|---|---|---|
| 1 | O | O | O | 0 |
| 2 | O | O | M | 1 |
| 2 | O | M | O | 1 |
| 2 | M | O | O | 1 |
| 2 | O | M | M | 1 |
| 2 | M | O | M | 1 |
| 2 | M | M | O | 1 |
Grouping for “stayers” vs. “dropouts” (focus on T3, the last follow-up)
| G | T1 | T2 | T3 | D |
|---|---|---|---|---|
| 1 | O | O | O | 0 |
| 2 | O | O | M | 1 |
| 1 | O | M | O | 0 |
| 1 | M | O | O | 0 |
| 2 | O | M | M | 1 |
| 2 | M | O | M | 1 |
| 1 | M | M | O | 0 |
Standard analysis with longitudinal data (e.g., GLS, LME, FEM, GEE, GLMM) can be applied on the sub-groups (divided by the dummay variable).
For example, with 1 dummary variable D (where 0 for stayers and 1 for dropouts, e.g.) \[ g(E(Y_{ij}^{(D=0)})) = \beta_{0}^{(D=0)} + \beta_{1}^{(D=0)}X_{i}^{(D=0)} + \ldots + \beta_{p}^{(D=0)}X_{p}^{(D=0)} \] \[ g(E(Y_{ij}^{(D=1)})) = \beta_{0}^{(D=1)} + \beta_{1}^{(D=1)}X_{1i}^{(D=1)} + \ldots + \beta_{p}^{(D=1)}X_{pi}^{(D=1)} \]
Overall estimate can be pooled by \[ \hat{\beta} = \frac{n_{0}}{N}\hat{\beta}^{(D=0)} + \frac{n_{1}}{N}\hat{\beta}^{(D=1)} = \hat{\beta}^{(D=0)} + \frac{n_{1}}{N}\hat{\beta}^{Diff} \] \[ V(\hat{\beta}) = V(\hat{\beta}^{(D=0)}) + \frac{n_{0}n_{1}}{N^3} (\hat{\beta}^{Diff})^2 \] where \(n_{0}\), \(n_{1}\) and \(N\) are the size for \(D=0\), \(D=1\) and overall sample. \(\hat{\beta}^{Diff} = \hat{\beta}^{(D=1)} - \hat{\beta}^{(D=0)}\)
NIMH (National Institute of Mental Health) Schizophrenia Collaborative Study
Objective is on treatment related changes in overall severity
Outcome: item 79 of the Inpatient multidimensional psychatric Scale (imps79) (severity of illness; min=1, max = 7)
| value | coding |
|---|---|
| 1 | normal |
| 2 | borderline mentally ill |
| 3 | mildly ill |
| 4 | moderately ill |
| 5 | markedly ill |
| 6 | severely ill |
| 7 | among the most extremely ill |
Randimization:
Patients were randomly assigned to receive one of four medications: placebo, chlorpromazine, fluphenazine or thioridazine.
Previous analyses revealed similar effects for the three antipsychotic drug groups, they were combined in the present analysis
Data Elements:
id identifier for individual.
imps79 outcome
week 0 (baseline), 1, 2, 3, 4, 5, and 6
treatment 1: treated; 0: placebo
sex gender where 1: female 0 for male
## # A tibble: 6 × 5
## id imps79 week tx sex
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1103 5.5 0 1 1
## 2 1103 3 1 1 1
## 3 1103 2.5 3 1 1
## 4 1103 4 6 1 1
## 5 1104 6 0 1 1
## 6 1104 3 1 1 1
##
## 0 1 2 3 4 5 6
## 0 107 105 5 87 2 2 70
## 1 327 321 9 287 9 7 265
subjects measured at baseline & weekly for up to 6 wks
main measurement weeks were 0 (baseline), 1, 3, and 6
some subjects also measured at weeks 2, 4, and 5
almost all subjects were measured at baseline (434 of 437) and no subjects were only measured at baseline (i.e., all subjects had at least one post-baseline measurement)
some intermittent missingness, but dropout was a much more common pattern of missingness
102 of 437 subjects did not complete the trial (i.e., were not measured at week 6)
ds2 <-subset(ds, week == 0 | week == 1 | week ==3 | week == 6)
xyplot(imps79 ~ week|as.character(tx), group=id, type='l', data=ds2 )panel.smoother <- function(x, y) {
panel.xyplot(x, y, col = "gray50", pch=".")
panel.loess(x, y, lwd=2, col.line="red", span=1.3)
}
ds2$tx2 <- as.character(ds2$tx)in order to obtain mean profile (across the two arms)…
Recall that the objective is on treatment related changes in overall severity.
Such objective dictates the following model:
\[ g(E(Y_{ij})) = \beta_{0} + TX_{i}\beta_{1} +WEEK_{ij}\beta_{2} + TX_{i}*WEEK_{ij}\beta_{3}, \]
where
Remark: \(\hat{\beta}_{1}\) is expected to be zero (due to randomization).
library(gee)
#GEE
fit.gee <- gee(imps79 ~ 1 + tx + week + tx:week,
data = ds2, id=id, corstr="unstructured")## (Intercept) tx week tx:week
## 5.2674254 -0.2007796 -0.1749109 -0.1849286
## Estimate Robust S.E.
## (Intercept) 5.2574541 0.08450501
## tx -0.1592765 0.09805183
## week -0.1471853 0.02789367
## tx:week -0.2047821 0.03149928
library(lme4)
#LME
fit.lme <- lmer(imps79~1 + tx + week+ tx:week
+(1 + week|id),data=ds2)
(lme.res <- summary(fit.lme)$coef[,c(1,2)])## Estimate Std. Error
## (Intercept) 5.2424923 0.08748516
## tx -0.1597905 0.10066873
## week -0.1436872 0.02848115
## tx:week -0.2378565 0.03221564
Note that a “random intercept” and “random slope (wrt week)” LME model was fitted.
First, generate two subsets of data, one for “stayers” and one for “dropouts”
ids <- unique(ds2$id)
len <- length(ids)
stayer <- NULL
dropout <- NULL
for(i in 1:len){
tt <- subset(ds2, id%in% ids[i])
lv <- dim(tt)[1] #lv is the index for last visit
if(tt$week[lv] != 6){ #so, if last visit is not
#week 6, classify as dropout
dropout <- rbind(dropout, tt)
}else{
stayer <- rbind(stayer, tt)
}
}## [1] 1569 6
## [1] 437
## [1] 1315 6
## [1] 335
## [1] 254 6
## [1] 102
So, \(N=437\), \(n_{0}=335\), and \(n_{1}=102\).
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00275996 (tol = 0.002, component 1)
## Estimate Std. Error
## (Intercept) 5.08755892 0.10749837
## tx 0.03060813 0.12082150
## week -0.14894690 0.02901841
## tx:week -0.20969506 0.03261961
Remark: first column is the \[ \hat{\beta}^{(D=0)} \] and the second column is the \[ \sqrt{V(\hat{\beta}^{(D=0)}}) \]
fit.lme.dropout <- lmer(imps79~1 + tx + week+ tx:week
+(1 + week|id),data=dropout)
est.se.dropout <- summary(fit.lme.dropout)$coef[,c(1,2)]
est.se.dropout## Estimate Std. Error
## (Intercept) 5.4980814 0.1472960
## tx -0.3148219 0.1859374
## week -0.0748355 0.1099229
## tx:week -0.7757926 0.1387910
Remark: first column is the \[ \hat{\beta}^{(D=1)} \]
Recall the overall estimates and SEs can be pooled by \[ \hat{\beta} = \hat{\beta}^{(D=0)} + \frac{n_{1}}{N}\hat{\beta}^{Diff} \] \[ SE(\hat{\beta})= \sqrt{V(\hat{\beta})} = \sqrt{ V(\hat{\beta}^{(D=0)}) + \frac{n_{0}n_{1}}{N^3} (\hat{\beta}^{Diff})^2 } \] where \(n_{0}\), \(n_{1}\) and \(N\) are the size for \(D=0\), \(D=1\) and overall sample and \(\hat{\beta}^{Diff} = \hat{\beta}^{(D=1)} - \hat{\beta}^{(D=0)}\).
beta.diff <- est.se.dropout[,1] - est.se.stayer[,1]
beta.est <- est.se.stayer[,1] + (n1/N) * beta.diff
beta.se <- sqrt(est.se.stayer[,2]^2 + (n0*n1/N^3)*(beta.diff)^2)
(pmm.res <- cbind(beta.est, beta.se))## beta.est beta.se
## (Intercept) 5.18337881 0.10781884
## tx -0.05001856 0.12102352
## week -0.13164859 0.02905713
## tx:week -0.34182767 0.03457245
all.res <- data.frame(round(
cbind(gee.res,lme.res, pmm.res),3))
names(all.res) <- c("g.est", "g.se", "m.est",
"m.se", "p.est", "p.se")
all.res## g.est g.se m.est m.se p.est p.se
## (Intercept) 5.257 0.085 5.242 0.087 5.183 0.108
## tx -0.159 0.098 -0.160 0.101 -0.050 0.121
## week -0.147 0.028 -0.144 0.028 -0.132 0.029
## tx:week -0.205 0.031 -0.238 0.032 -0.342 0.035
Finally,
treatment’s effect at baseline is estimated near zero!
effect change for the treatment is estimated with larger magnitude than GEE and LME!
Taking care of the MNAR!
Still accessible with standard software (comparing to Selection Model)
Sensitivity analysis is important (both within the modeling and across with other model under different missing data mechanism)
Problem-specific, i.e., no general algorithm/code, users need to define sub-group for desired (or more interpretable or manageable) missing patterns
Handle missing in outcome variable only, extending to covariate missing is non-trivial (Little 1992, 2021)
Little R. Missing data assumptions. Annual Review of Statistics and Its Application. Aug 21, 2021. 8:89-107.
Repeat the PMM approch using the R markdown file, upload the final results to BB.