Background

This is a report focusing on Hierarchal Modeling of notes containing phrases associated with end of life decisions and their implementation by the clinical care team. In particular, we will focus on the National Quality Forum Measure #1626:

Percentage of vulnerable adults admitted to ICU who survive at least 48 hours who have their care preferences documented within 48 hours OR documentation as to why this was not done.

Generalized Linear Mixed Effects Models (or Hierarchal Models) will be used for this analysis. We will use this method, as the Generalized Linear Mixed-Effects Model allows us to predict a binary outcome, which in our case is the implementation of the NQF Measure. This analysis will draw heavily from:

  1. Introduction to Generalized Linear Mixed Models
  2. Mixed Effects Logistic Regression
  3. Agresti, A. (2013). Categorical Data Analysis, 3rd ed. John Wiley & Sons.
  4. Gelman & Hill (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models, 1st ed. Cambridge University Press.

The general form of the model, in matrix notation, is given:

\[\mathbf{y} = \boldsymbol{X\beta} + \boldsymbol{Zu} + \boldsymbol{\varepsilon}\]

Where:

Stated formally:

\[\overbrace{\mathbf{y}}^{\mbox{N x 1}} \quad = \quad \overbrace{\underbrace{\mathbf{X}}_{\mbox{N x p}} \quad \underbrace{\boldsymbol{\beta}}_{\mbox{p x 1}}}^{\mbox{N x 1}} \quad + \quad \overbrace{\underbrace{\mathbf{Z}}_{\mbox{N x q}} \quad \underbrace{\boldsymbol{u}}_{\mbox{q x 1}}}^{\mbox{N x 1}} \quad + \quad \overbrace{\boldsymbol{\varepsilon}}^{\mbox{N x 1}}\]

Data Sources

These data were pulled from MIMIC-III, and represent all patients aged > 75 at the time of admission who had Physicians' Notes logged within that timeframe.

Preprocessing

  1. Consider only patients over 18 years of age
  2. For patients over the age of 89, impute the median age of patients over 89 from MIMIC

Libraries

library("ggplot2") ## Graphing
library("GGally") ## Extension to ggplot
library("reshape2") ## for melt()
library("lme4") ## for Hierachal Models
library("sjPlot") ## Lovely Presentation of Model Output

Data

These data were generated as part of NQF #1626 Care Providers Summary.

dat <- read.csv("~/nqf_caregivers/data/NQF_Att_Res_03Jun18.csv", 
                header = T, stringsAsFactors = F)
colnames(dat)
##  [1] "SUBJECT_ID"           "HADM_ID"              "CGID"                
##  [4] "ROW_ID"               "CHARTDATE"            "CHARTTIME"           
##  [7] "STORETIME"            "CATEGORY"             "NOTE_DESCRIPTION"    
## [10] "ISERROR"              "TEXT"                 "FAM.machine"         
## [13] "CIM.machine"          "LIM.machine"          "CAR.machine"         
## [16] "COD.machine"          "CIM_post.machine"     "LABEL"               
## [19] "CG_DESCRIPTION"       "ADMITTIME"            "DISCHTIME"           
## [22] "DEATHTIME"            "ADMISSION_TYPE"       "ADMISSION_LOCATION"  
## [25] "DISCHARGE_LOCATION"   "INSURANCE"            "LANGUAGE"            
## [28] "RELIGION"             "MARITAL_STATUS"       "ETHNICITY"           
## [31] "EDREGTIME"            "EDOUTTIME"            "DIAGNOSIS"           
## [34] "HOSPITAL_EXPIRE_FLAG" "HAS_CHARTEVENTS_DATA" "GENDER"              
## [37] "DOB"                  "DOD"                  "DOD_HOSP"            
## [40] "DOD_SSN"              "EXPIRE_FLAG"          "ICUSTAY_ID"          
## [43] "DBSOURCE"             "FIRST_CAREUNIT"       "LAST_CAREUNIT"       
## [46] "FIRST_WARDID"         "LAST_WARDID"          "INTIME"              
## [49] "OUTTIME"              "LOS"                  "WORD_COUNT"          
## [52] "NCHAR"                "TIME_SINCE_ADMIT"

Preprocessing

  1. Remove TEXT as the column is unnecessary but very large.
  2. Factor variables for later modeling
## Remove TEXT
dat$TEXT <- NULL

## Factor
dat <- within(dat, {
    FIRST_CAREUNIT <- factor(FIRST_CAREUNIT)
    CGID <- factor(CGID)
    CG_DESCRIPTION <- factor(CG_DESCRIPTION)
    SUBJECT_ID <- factor(SUBJECT_ID)
})

Initial Investigation

Check number of patients, care providers, and hospital admissions.

cat("We have", length(unique(dat$SUBJECT_ID)), "unique patients.\n")
## We have 1075 unique patients.
cat("We have", length(unique(dat$CGID)), "unique care providers.\n")
## We have 464 unique care providers.
cat("We have", length(unique(dat$HADM_ID)), "unique hospital admissions.\n")
## We have 1268 unique hospital admissions.

Thus care providers (\(q = 464\)) indexed by the j subscript each see \(n_{j}\) patients. Our grouping variable is the doctor, and the total number of patients is given by the sum of the patients seen by each doctor: \(N = \sum_{j}^q n_j\). Our patients (\(N = 1075\)) are considered in terms of the binary outcome of having their care preferences documented or not given the following fixed effect predictors:

  1. Time Since Admission, TIME_SINCE_ADMIT (discrete)
  2. First Intensive Careunit, FIRST_CAREUNIT (categorical)
  3. Care Provider Description, CG_DESCRIPTION (categorical)

We will also assume a random effect for care providers, CGID.

Formally:

\[\overbrace{\mathbf{y}}^{\mbox{1075 x 1}} \quad = \quad \overbrace{\underbrace{\mathbf{X}}_{\mbox{1075 x 4}} \quad \underbrace{\boldsymbol{\beta}}_{\mbox{4 x 1}}}^{\mbox{1075 x 1}} \quad + \quad \overbrace{\underbrace{\mathbf{Z}}_{\mbox{1075 x 464}} \quad \underbrace{\boldsymbol{u}}_{\mbox{464 x 1}}}^{\mbox{1075 x 1}} \quad + \quad \overbrace{\boldsymbol{\varepsilon}}^{\mbox{1075 x 1}}\]

Initial Plotting

ggpairs(dat[ , c("TIME_SINCE_ADMIT", "FIRST_CAREUNIT", "CG_DESCRIPTION")])

Generalized Linear Mixed Effects Model

Below we use the glmer command to estimate a mixed effects logistic regression model with FIRST_CAREUNIT, CG_DESCRIPTION as hospital-level categorical preditors, TIME_SINCE_ADMIT a patient level continuous predictor, and a random intercept by CGID, or provider ID.

## Initial model to inform prior probabilities
m <- glmer(CIM.machine ~ 
               TIME_SINCE_ADMIT + ## Fixed Effect (No Pooling)
               FIRST_CAREUNIT + ## Fixed Effect (No Pooling)
               CG_DESCRIPTION + ## Fixed Effect (No Pooling)
               (1 | CGID), ## Random Effect (Partial Pooling)
           data = dat, 
           family = binomial, 
           control = glmerControl(optimizer = "bobyqa"), ## Good optimizer to avoid non-convergence
           nAGQ = 10) ## Default value 1, higher values increase estimate accuracy

## View
summary(m)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 10) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## CIM.machine ~ TIME_SINCE_ADMIT + FIRST_CAREUNIT + CG_DESCRIPTION +  
##     (1 | CGID)
##    Data: dat
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##  14215.0  14273.5  -7099.5  14199.0    11096 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2801 -0.7880 -0.5058  0.9837  3.5087 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  CGID   (Intercept) 0.8456   0.9196  
## Number of obs: 11104, groups:  CGID, 464
## 
## Fixed effects:
##                                     Estimate Std. Error z value Pr(>|z|)
## (Intercept)                         -0.79524    0.14836  -5.360 8.31e-08
## TIME_SINCE_ADMIT                    -0.07629    0.03181  -2.398   0.0165
## FIRST_CAREUNITCSRU                  -0.34878    0.16977  -2.054   0.0399
## FIRST_CAREUNITMICU                   0.33219    0.07358   4.515 6.34e-06
## FIRST_CAREUNITSICU                   0.42726    0.10691   3.997 6.43e-05
## FIRST_CAREUNITTSICU                  0.58015    0.12860   4.511 6.44e-06
## CG_DESCRIPTIONResident/Fellow/PA/NP  0.08846    0.14334   0.617   0.5372
##                                        
## (Intercept)                         ***
## TIME_SINCE_ADMIT                    *  
## FIRST_CAREUNITCSRU                  *  
## FIRST_CAREUNITMICU                  ***
## FIRST_CAREUNITSICU                  ***
## FIRST_CAREUNITTSICU                 ***
## CG_DESCRIPTIONResident/Fellow/PA/NP    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##                 (Intr) TIME_S FIRST_CAREUNITC FIRST_CAREUNITM
## TIME_SINCE_     -0.261                                       
## FIRST_CAREUNITC -0.174 -0.014                                
## FIRST_CAREUNITM -0.362 -0.008  0.296                         
## FIRST_CAREUNITS -0.297  0.005  0.273           0.564         
## FIRST_CAREUNITT -0.271  0.006  0.251           0.469         
## CG_DESCRIPT     -0.795  0.012  0.003          -0.036         
##                 FIRST_CAREUNITS FIRST_CAREUNITT
## TIME_SINCE_                                    
## FIRST_CAREUNITC                                
## FIRST_CAREUNITM                                
## FIRST_CAREUNITS                                
## FIRST_CAREUNITT  0.529                         
## CG_DESCRIPT     -0.026          -0.014

Summary with Odds Ratios

sjt.glmer(m)
    CIM.machine
    Odds Ratio CI p
Fixed Parts
(Intercept)   0.45 0.34 – 0.60 <.001
TIME_SINCE_ADMIT   0.93 0.87 – 0.99 .016
FIRST_CAREUNIT (CSRU)   0.71 0.51 – 0.98 .040
FIRST_CAREUNIT (MICU)   1.39 1.21 – 1.61 <.001
FIRST_CAREUNIT (SICU)   1.53 1.24 – 1.89 <.001
FIRST_CAREUNIT (TSICU)   1.79 1.39 – 2.30 <.001
CG_DESCRIPTION (Resident/Fellow/PA/NP)   1.09 0.82 – 1.45 .537
Random Parts
τ00, CGID   0.846
NCGID   464
ICCCGID   0.204
Observations   11104
Deviance   13296.641

Note: this model suggests that the care provider description (Attending or Resident/etc.) does not have a significant effect on care preference documentation

Graphing

## Subset a portion of data
tmpdat <- dat[, c("TIME_SINCE_ADMIT", 
                  "FIRST_CAREUNIT", 
                  "CG_DESCRIPTION", 
                  "CGID")]

## Generate a sequence using TIME_SINCE_ADMIT for plotting
j_values <- with(dat, 
                seq(from = min(TIME_SINCE_ADMIT), 
                    to = max(TIME_SINCE_ADMIT), length.out = 100))

## Calculate predicted probabilities and store in a list
pp <- lapply(j_values, function(j) {
     ## Store sequence values
     tmpdat$TIME_SINCE_ADMIT <- j
     ## Predict from first model
     predict(m, newdata = tmpdat, type = "response")
})

## Get the means with lower and upper quartiles
plotdat <- t(sapply(pp, function(x) {
    c(M = mean(x), quantile(x, c(0.25, 0.75)))
}))

## Add TIME_SINCE_ADMIT values and convert to data frame
plotdat <- as.data.frame(cbind(plotdat, j_values))

## Apply better names
colnames(plotdat) <- c("Predicted_Probability", "Lower", "Upper", "TIME_SINCE_ADMIT")
head(plotdat)
##   Predicted_Probability     Lower     Upper TIME_SINCE_ADMIT
## 1             0.4210280 0.3122511 0.5225425       0.00000000
## 2             0.4206883 0.3119203 0.5221580       0.02020202
## 3             0.4203487 0.3115896 0.5217734       0.04040404
## 4             0.4200091 0.3112591 0.5213889       0.06060606
## 5             0.4196696 0.3109288 0.5210043       0.08080808
## 6             0.4193301 0.3105987 0.5206196       0.10101010
# plot average marginal predicted probabilities
ggplot(plotdat, aes(x = TIME_SINCE_ADMIT, y = Predicted_Probability)) + geom_ribbon(aes(ymin = Lower, ymax = Upper), alpha = 0.25) + geom_line(size = 2) + ylim(c(0, 1)) +
    ggtitle("Care Measure Implementation as a Function of\nTime Since Admission (All ICUs)")

## Calculate predicted probabilities and store in a list
biprobs <- lapply(levels(dat$FIRST_CAREUNIT), function(stage) {
    tmpdat$FIRST_CAREUNIT[] <- stage
    lapply(j_values, function(j) {
        tmpdat$TIME_SINCE_ADMIT <- j
        predict(m, newdata = tmpdat, type = "response")
    })
})

## Get means and quartiles for all jvalues for each level of FIRST_CAREUNIT
plotdat2 <- lapply(biprobs, function(X) {
    temp <- t(sapply(X, function(x) {
        c(M=mean(x), quantile(x, c(.25, .75)))
    }))
    temp <- as.data.frame(cbind(temp, j_values))
    colnames(temp) <- c("Predicted_Probability", "Lower", "Upper", "TIME_SINCE_ADMIT")
    return(temp)
})

## Collapse to one data frame
plotdat2 <- do.call(rbind, plotdat2)

## Add Care Unit
plotdat2$FIRST_CAREUNIT <- factor(rep(levels(dat$FIRST_CAREUNIT), each = length(j_values)))

## Show Head
head(plotdat2)
##   Predicted_Probability     Lower     Upper TIME_SINCE_ADMIT
## 1             0.3577822 0.2497234 0.4501221       0.00000000
## 2             0.3574594 0.2494347 0.4497406       0.02020202
## 3             0.3571367 0.2491463 0.4493593       0.04040404
## 4             0.3568141 0.2488581 0.4489780       0.06060606
## 5             0.3564917 0.2485701 0.4485967       0.08080808
## 6             0.3561693 0.2482824 0.4482155       0.10101010
##   FIRST_CAREUNIT
## 1            CCU
## 2            CCU
## 3            CCU
## 4            CCU
## 5            CCU
## 6            CCU
colnames(plotdat2)[1] <- "Probability_of_CIM_Caremeasure_Implementation"

## Graph
ggplot(plotdat2, aes(x = TIME_SINCE_ADMIT, y = Probability_of_CIM_Caremeasure_Implementation)) +
    geom_ribbon(aes(ymin = Lower, ymax = Upper, fill = FIRST_CAREUNIT), alpha = 0.15) +
    geom_line(aes(colour = FIRST_CAREUNIT), size = 2) +
    ylim(c(0, 1)) + facet_wrap( ~  FIRST_CAREUNIT) + ggtitle("Probability of Caremeasure Implementation by ICU\n as Function of Time Since Admission") + xlab("Time Since Admission") + ylab("Probability of Care Measure Implementation")

Diagnostics (Residuals)

## Generate Residual Data
resid_dat <- data.frame(tmpdat,
                        resid = residuals(m, type = "pearson"),
                        fitted = fitted(m))

ggplot(resid_dat, aes(x = FIRST_CAREUNIT, y = resid)) + geom_boxplot() + coord_flip()

ggplot(resid_dat, aes(x = CG_DESCRIPTION, y = resid)) + geom_boxplot() + coord_flip()

##lattice::dotplot(ranef(m, which = "CGID", condVar = TRUE), scales = list(y = list(alternating = 0)))

Residuals appear similar between ICUs except that, for CSRU they are very closely clustered, yielding a number of outliers.

Note: for more diagnostics, many permutations of the model would be necessary to generate distributions of errors