This is a report focusing on Hierarchal Modeling of notes containing phrases associated with end of life decisions and their implementation by the clinical care team. In particular, we will focus on the National Quality Forum Measure #1626:
Percentage of vulnerable adults admitted to ICU who survive at least 48 hours who have their care preferences documented within 48 hours OR documentation as to why this was not done.
Generalized Linear Mixed Effects Models (or Hierarchal Models) will be used for this analysis. We will use this method, as the Generalized Linear Mixed-Effects Model allows us to predict a binary outcome, which in our case is the implementation of the NQF Measure. This analysis will draw heavily from:
The general form of the model, in matrix notation, is given:
\[\mathbf{y} = \boldsymbol{X\beta} + \boldsymbol{Zu} + \boldsymbol{\varepsilon}\]
Where:
Stated formally:
\[\overbrace{\mathbf{y}}^{\mbox{N x 1}} \quad = \quad \overbrace{\underbrace{\mathbf{X}}_{\mbox{N x p}} \quad \underbrace{\boldsymbol{\beta}}_{\mbox{p x 1}}}^{\mbox{N x 1}} \quad + \quad \overbrace{\underbrace{\mathbf{Z}}_{\mbox{N x q}} \quad \underbrace{\boldsymbol{u}}_{\mbox{q x 1}}}^{\mbox{N x 1}} \quad + \quad \overbrace{\boldsymbol{\varepsilon}}^{\mbox{N x 1}}\]
These data were pulled from MIMIC-III, and represent all patients aged > 75 at the time of admission who had Physicians' Notes logged within that timeframe.
library("ggplot2") ## Graphing
library("GGally") ## Extension to ggplot
library("reshape2") ## for melt()
library("lme4") ## for Hierachal Models
library("sjPlot") ## Lovely Presentation of Model Output
These data were generated as part of NQF #1626 Care Providers Summary.
dat <- read.csv("~/nqf_caregivers/data/NQF_Att_Res_03Jun18.csv",
header = T, stringsAsFactors = F)
colnames(dat)
## [1] "SUBJECT_ID" "HADM_ID" "CGID"
## [4] "ROW_ID" "CHARTDATE" "CHARTTIME"
## [7] "STORETIME" "CATEGORY" "NOTE_DESCRIPTION"
## [10] "ISERROR" "TEXT" "FAM.machine"
## [13] "CIM.machine" "LIM.machine" "CAR.machine"
## [16] "COD.machine" "CIM_post.machine" "LABEL"
## [19] "CG_DESCRIPTION" "ADMITTIME" "DISCHTIME"
## [22] "DEATHTIME" "ADMISSION_TYPE" "ADMISSION_LOCATION"
## [25] "DISCHARGE_LOCATION" "INSURANCE" "LANGUAGE"
## [28] "RELIGION" "MARITAL_STATUS" "ETHNICITY"
## [31] "EDREGTIME" "EDOUTTIME" "DIAGNOSIS"
## [34] "HOSPITAL_EXPIRE_FLAG" "HAS_CHARTEVENTS_DATA" "GENDER"
## [37] "DOB" "DOD" "DOD_HOSP"
## [40] "DOD_SSN" "EXPIRE_FLAG" "ICUSTAY_ID"
## [43] "DBSOURCE" "FIRST_CAREUNIT" "LAST_CAREUNIT"
## [46] "FIRST_WARDID" "LAST_WARDID" "INTIME"
## [49] "OUTTIME" "LOS" "WORD_COUNT"
## [52] "NCHAR" "TIME_SINCE_ADMIT"
TEXT as the column is unnecessary but very large.## Remove TEXT
dat$TEXT <- NULL
## Factor
dat <- within(dat, {
FIRST_CAREUNIT <- factor(FIRST_CAREUNIT)
CGID <- factor(CGID)
CG_DESCRIPTION <- factor(CG_DESCRIPTION)
SUBJECT_ID <- factor(SUBJECT_ID)
})
Check number of patients, care providers, and hospital admissions.
cat("We have", length(unique(dat$SUBJECT_ID)), "unique patients.\n")
## We have 1075 unique patients.
cat("We have", length(unique(dat$CGID)), "unique care providers.\n")
## We have 464 unique care providers.
cat("We have", length(unique(dat$HADM_ID)), "unique hospital admissions.\n")
## We have 1268 unique hospital admissions.
Thus care providers (\(q = 464\)) indexed by the j subscript each see \(n_{j}\) patients. Our grouping variable is the doctor, and the total number of patients is given by the sum of the patients seen by each doctor: \(N = \sum_{j}^q n_j\). Our patients (\(N = 1075\)) are considered in terms of the binary outcome of having their care preferences documented or not given the following fixed effect predictors:
TIME_SINCE_ADMIT (discrete)FIRST_CAREUNIT (categorical)CG_DESCRIPTION (categorical)We will also assume a random effect for care providers, CGID.
Formally:
\[\overbrace{\mathbf{y}}^{\mbox{1075 x 1}} \quad = \quad \overbrace{\underbrace{\mathbf{X}}_{\mbox{1075 x 4}} \quad \underbrace{\boldsymbol{\beta}}_{\mbox{4 x 1}}}^{\mbox{1075 x 1}} \quad + \quad \overbrace{\underbrace{\mathbf{Z}}_{\mbox{1075 x 464}} \quad \underbrace{\boldsymbol{u}}_{\mbox{464 x 1}}}^{\mbox{1075 x 1}} \quad + \quad \overbrace{\boldsymbol{\varepsilon}}^{\mbox{1075 x 1}}\]
ggpairs(dat[ , c("TIME_SINCE_ADMIT", "FIRST_CAREUNIT", "CG_DESCRIPTION")])
Below we use the glmer command to estimate a mixed effects logistic regression model with FIRST_CAREUNIT, CG_DESCRIPTION as hospital-level categorical preditors, TIME_SINCE_ADMIT a patient level continuous predictor, and a random intercept by CGID, or provider ID.
## Initial model to inform prior probabilities
m <- glmer(CIM.machine ~
TIME_SINCE_ADMIT + ## Fixed Effect (No Pooling)
FIRST_CAREUNIT + ## Fixed Effect (No Pooling)
CG_DESCRIPTION + ## Fixed Effect (No Pooling)
(1 | CGID), ## Random Effect (Partial Pooling)
data = dat,
family = binomial,
control = glmerControl(optimizer = "bobyqa"), ## Good optimizer to avoid non-convergence
nAGQ = 10) ## Default value 1, higher values increase estimate accuracy
## View
summary(m)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 10) [glmerMod]
## Family: binomial ( logit )
## Formula:
## CIM.machine ~ TIME_SINCE_ADMIT + FIRST_CAREUNIT + CG_DESCRIPTION +
## (1 | CGID)
## Data: dat
## Control: glmerControl(optimizer = "bobyqa")
##
## AIC BIC logLik deviance df.resid
## 14215.0 14273.5 -7099.5 14199.0 11096
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.2801 -0.7880 -0.5058 0.9837 3.5087
##
## Random effects:
## Groups Name Variance Std.Dev.
## CGID (Intercept) 0.8456 0.9196
## Number of obs: 11104, groups: CGID, 464
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.79524 0.14836 -5.360 8.31e-08
## TIME_SINCE_ADMIT -0.07629 0.03181 -2.398 0.0165
## FIRST_CAREUNITCSRU -0.34878 0.16977 -2.054 0.0399
## FIRST_CAREUNITMICU 0.33219 0.07358 4.515 6.34e-06
## FIRST_CAREUNITSICU 0.42726 0.10691 3.997 6.43e-05
## FIRST_CAREUNITTSICU 0.58015 0.12860 4.511 6.44e-06
## CG_DESCRIPTIONResident/Fellow/PA/NP 0.08846 0.14334 0.617 0.5372
##
## (Intercept) ***
## TIME_SINCE_ADMIT *
## FIRST_CAREUNITCSRU *
## FIRST_CAREUNITMICU ***
## FIRST_CAREUNITSICU ***
## FIRST_CAREUNITTSICU ***
## CG_DESCRIPTIONResident/Fellow/PA/NP
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) TIME_S FIRST_CAREUNITC FIRST_CAREUNITM
## TIME_SINCE_ -0.261
## FIRST_CAREUNITC -0.174 -0.014
## FIRST_CAREUNITM -0.362 -0.008 0.296
## FIRST_CAREUNITS -0.297 0.005 0.273 0.564
## FIRST_CAREUNITT -0.271 0.006 0.251 0.469
## CG_DESCRIPT -0.795 0.012 0.003 -0.036
## FIRST_CAREUNITS FIRST_CAREUNITT
## TIME_SINCE_
## FIRST_CAREUNITC
## FIRST_CAREUNITM
## FIRST_CAREUNITS
## FIRST_CAREUNITT 0.529
## CG_DESCRIPT -0.026 -0.014
sjt.glmer(m)
| Â | Â | CIM.machine | ||
| Â | Â | Odds Ratio | CI | p |
| Fixed Parts | ||||
| (Intercept) |  | 0.45 | 0.34 – 0.60 | <.001 |
| TIME_SINCE_ADMIT |  | 0.93 | 0.87 – 0.99 | .016 |
| FIRST_CAREUNIT (CSRU) |  | 0.71 | 0.51 – 0.98 | .040 |
| FIRST_CAREUNIT (MICU) |  | 1.39 | 1.21 – 1.61 | <.001 |
| FIRST_CAREUNIT (SICU) |  | 1.53 | 1.24 – 1.89 | <.001 |
| FIRST_CAREUNIT (TSICU) |  | 1.79 | 1.39 – 2.30 | <.001 |
| CG_DESCRIPTION (Resident/Fellow/PA/NP) |  | 1.09 | 0.82 – 1.45 | .537 |
| Random Parts | ||||
| τ00, CGID |  | 0.846 | ||
| NCGID | Â | 464 | ||
| ICCCGID | Â | 0.204 | ||
| Observations | Â | 11104 | ||
| Deviance | Â | 13296.641 | ||
Note: this model suggests that the care provider description (Attending or Resident/etc.) does not have a significant effect on care preference documentation
## Subset a portion of data
tmpdat <- dat[, c("TIME_SINCE_ADMIT",
"FIRST_CAREUNIT",
"CG_DESCRIPTION",
"CGID")]
## Generate a sequence using TIME_SINCE_ADMIT for plotting
j_values <- with(dat,
seq(from = min(TIME_SINCE_ADMIT),
to = max(TIME_SINCE_ADMIT), length.out = 100))
## Calculate predicted probabilities and store in a list
pp <- lapply(j_values, function(j) {
## Store sequence values
tmpdat$TIME_SINCE_ADMIT <- j
## Predict from first model
predict(m, newdata = tmpdat, type = "response")
})
## Get the means with lower and upper quartiles
plotdat <- t(sapply(pp, function(x) {
c(M = mean(x), quantile(x, c(0.25, 0.75)))
}))
## Add TIME_SINCE_ADMIT values and convert to data frame
plotdat <- as.data.frame(cbind(plotdat, j_values))
## Apply better names
colnames(plotdat) <- c("Predicted_Probability", "Lower", "Upper", "TIME_SINCE_ADMIT")
head(plotdat)
## Predicted_Probability Lower Upper TIME_SINCE_ADMIT
## 1 0.4210280 0.3122511 0.5225425 0.00000000
## 2 0.4206883 0.3119203 0.5221580 0.02020202
## 3 0.4203487 0.3115896 0.5217734 0.04040404
## 4 0.4200091 0.3112591 0.5213889 0.06060606
## 5 0.4196696 0.3109288 0.5210043 0.08080808
## 6 0.4193301 0.3105987 0.5206196 0.10101010
# plot average marginal predicted probabilities
ggplot(plotdat, aes(x = TIME_SINCE_ADMIT, y = Predicted_Probability)) + geom_ribbon(aes(ymin = Lower, ymax = Upper), alpha = 0.25) + geom_line(size = 2) + ylim(c(0, 1)) +
ggtitle("Care Measure Implementation as a Function of\nTime Since Admission (All ICUs)")
## Calculate predicted probabilities and store in a list
biprobs <- lapply(levels(dat$FIRST_CAREUNIT), function(stage) {
tmpdat$FIRST_CAREUNIT[] <- stage
lapply(j_values, function(j) {
tmpdat$TIME_SINCE_ADMIT <- j
predict(m, newdata = tmpdat, type = "response")
})
})
## Get means and quartiles for all jvalues for each level of FIRST_CAREUNIT
plotdat2 <- lapply(biprobs, function(X) {
temp <- t(sapply(X, function(x) {
c(M=mean(x), quantile(x, c(.25, .75)))
}))
temp <- as.data.frame(cbind(temp, j_values))
colnames(temp) <- c("Predicted_Probability", "Lower", "Upper", "TIME_SINCE_ADMIT")
return(temp)
})
## Collapse to one data frame
plotdat2 <- do.call(rbind, plotdat2)
## Add Care Unit
plotdat2$FIRST_CAREUNIT <- factor(rep(levels(dat$FIRST_CAREUNIT), each = length(j_values)))
## Show Head
head(plotdat2)
## Predicted_Probability Lower Upper TIME_SINCE_ADMIT
## 1 0.3577822 0.2497234 0.4501221 0.00000000
## 2 0.3574594 0.2494347 0.4497406 0.02020202
## 3 0.3571367 0.2491463 0.4493593 0.04040404
## 4 0.3568141 0.2488581 0.4489780 0.06060606
## 5 0.3564917 0.2485701 0.4485967 0.08080808
## 6 0.3561693 0.2482824 0.4482155 0.10101010
## FIRST_CAREUNIT
## 1 CCU
## 2 CCU
## 3 CCU
## 4 CCU
## 5 CCU
## 6 CCU
colnames(plotdat2)[1] <- "Probability_of_CIM_Caremeasure_Implementation"
## Graph
ggplot(plotdat2, aes(x = TIME_SINCE_ADMIT, y = Probability_of_CIM_Caremeasure_Implementation)) +
geom_ribbon(aes(ymin = Lower, ymax = Upper, fill = FIRST_CAREUNIT), alpha = 0.15) +
geom_line(aes(colour = FIRST_CAREUNIT), size = 2) +
ylim(c(0, 1)) + facet_wrap( ~ FIRST_CAREUNIT) + ggtitle("Probability of Caremeasure Implementation by ICU\n as Function of Time Since Admission") + xlab("Time Since Admission") + ylab("Probability of Care Measure Implementation")
## Generate Residual Data
resid_dat <- data.frame(tmpdat,
resid = residuals(m, type = "pearson"),
fitted = fitted(m))
ggplot(resid_dat, aes(x = FIRST_CAREUNIT, y = resid)) + geom_boxplot() + coord_flip()
ggplot(resid_dat, aes(x = CG_DESCRIPTION, y = resid)) + geom_boxplot() + coord_flip()
##lattice::dotplot(ranef(m, which = "CGID", condVar = TRUE), scales = list(y = list(alternating = 0)))
Residuals appear similar between ICUs except that, for CSRU they are very closely clustered, yielding a number of outliers.
Note: for more diagnostics, many permutations of the model would be necessary to generate distributions of errors