Background

This report focuses on patient notes containing phrases associated with end-of-life decisions and their implementation by the clinical care team. In particular, we focus on National Quality Forum (NQF) Measure #1626:

Percentage of vulnerable adults admitted to ICU who survive at least 48 hours who have their care preferences documented within 48 hours OR documentation as to why this was not done.

Per the NQF document, this is particularly important as:

Many patients would prefer to die rather than live permanently comatose, mechanically ventilated, or tube fed (Pearlman 1993; Wenger 1998), yet physicians and surrogate decision makers often do not know patients' preferences concerning life-sustaining treatment (Wenger 1998; Guidelines 1987; AMA 1994, Wenger 2000; Kish 2000). Patients entering ICUs are likely to receive invasive care, making the elicitation and documentation of preferences necessary to guide these potentially burdensome treatments. (Lorenz 2007) Care in United States hospitals tends to be aggressive. Even patients with lung and colorectal cancer enrolled in hospice receive aggressive care when brought to the hospital. (Cintron 2003) In a study of Medicare claims that evaluated patients who died within one year of a diagnosis of lung, breast, colorectal or other gastrointestinal cancer, patients receiving chemotherapy within two weeks of death increased from 13.8% in 1993 to 18.5% in 1996, and patients had more hospitalizations, ER visits, and ICU stays during the latter time period. (Earle 2004) Another retrospective study of 335 breast cancer patients who died in the 1990s found that within approximately two months prior to death, 64% continued to receive endocrine therapy and 20% continued to receive chemotherapy. (Asola 2006)

Data Sources

These data were pulled from MIMIC-III, and represent all patients aged > 75 at the time of admission who had Physicians' Notes logged within that timeframe.

Preprocessing

  1. Consider only patients over 18 years of age
  2. For patients over the age of 89, impute the median age of patients over 89 from MIMIC

Libraries

library("corrr") # Correlation Matrices
library("rcompanion") # Pairwise nominal test

Utility Functions

attending_check

attending_check ensures that every hospital admission has at least one attending physician on the care team. It goes through each hospital admission and keeps only those admissions in which an attending logged a patient note that was captured by MIMIC-III.

attending_check <- function(dat){
    ## Temporary data frame
    tmp_frame <- data.frame()
    ## Results frame
    res <- data.frame()
    ## For each hospital admission
    for (name in unique(dat$HADM_ID)){
        ## Subset admission
        tmp_frame <- dat[dat$HADM_ID == name, ]
        ## If any care providers are "Attending"
        if ("Attending" %in% tmp_frame$CG_DESCRIPTION){
            ## add hospital admission to results
            res <- rbind(res, tmp_frame)
        }
    }
    ## Return only the admissions that include an attending
    return(res)
}
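
As an illustration of what this filter does, the sketch below applies the same rule to a toy data frame. It is not the function used in this report: attending_check_vec is a hypothetical, loop-free variant, the toy values are made up, and it assumes that the row order of the result does not matter.

```r
# Hypothetical vectorized variant of attending_check (illustration only)
attending_check_vec <- function(dat){
    ## Admissions in which at least one note was logged by an attending
    keep <- unique(dat$HADM_ID[dat$CG_DESCRIPTION %in% "Attending"])
    ## Keep every note belonging to those admissions
    dat[dat$HADM_ID %in% keep, ]
}

## Toy data (made up): admission 2 has no attending note
toy <- data.frame(
    HADM_ID = c(1, 1, 2, 3, 3),
    CG_DESCRIPTION = c("Attending", "Resident/Fellow/PA/NP",
                       "Resident/Fellow/PA/NP",
                       "Resident/Fellow/PA/NP", "Attending"),
    stringsAsFactors = FALSE
)

res <- attending_check_vec(toy)
unique(res$HADM_ID)  # admissions 1 and 3 remain
```

Admission 2 is dropped entirely, while the resident notes in admissions 1 and 3 are kept because an attending also wrote a note during those admissions.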

allDup

allDup flags every occurrence of a duplicated value, including the first, and returns the result as a logical vector. That is, the directionality of duplicated() is removed by performing the operation on the vector from both directions.

allDup <- function(val){
    duplicated(val) | duplicated(val, fromLast = TRUE)
}
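
A short sketch of the difference between duplicated() and allDup, using a made-up vector of identifiers (the values are illustrative only):

```r
allDup <- function(val){
    duplicated(val) | duplicated(val, fromLast = TRUE)
}

## Made-up identifiers: 101 and 102 each appear twice
ids <- c(101, 102, 101, 103, 102)

## duplicated() skips the first occurrence of each repeated value
fwd <- duplicated(ids)

## allDup() flags every occurrence of a repeated value
both <- allDup(ids)

ids[both]  # all four repeated entries
```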

plotDat

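plotDat() is a convenience function that draws a bar plot of the proportions of one categorical variable (column) within each level of another (x_col). The remaining arguments are passed to barplot(): bs (beside), mn (main title), xl (x-axis label) and yl (y-axis label).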

plotDat <- function(dat, column, x_col, bs, mn, xl, yl){
  tmp <- as.matrix(table(dat[[column]], dat[[x_col]]))
  prop <- prop.table(tmp, margin = 2)#2 for column-wise proportions
  par(mar = c(5.0, 4.0, 4.0, 15), xpd = TRUE)
  barplot(prop, col = cm.colors(length(rownames(prop))), beside = bs, width = 2, main = mn, xlab = xl, ylab = yl)
  legend("topright", inset = c(-0.90,0), fill = cm.colors(length(rownames(prop))), legend=rownames(prop))
}
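
The core of plotDat() is the column-wise proportion table. The sketch below shows that step alone on made-up data; the column names label and group are hypothetical and stand in for whatever is passed as column and x_col.

```r
## Made-up data: a binary label recorded for two provider groups
toy <- data.frame(
    label = c(1, 0, 1, 1, 0, 1),
    group = c("Attending", "Attending", "Resident",
              "Resident", "Resident", "Attending"),
    stringsAsFactors = FALSE
)

## Rows are levels of label, columns are levels of group
counts <- as.matrix(table(toy[["label"]], toy[["group"]]))

## margin = 2 normalises within each column, so each column sums to 1
prop <- prop.table(counts, margin = 2)
col_totals <- colSums(prop)
```

Because the proportions are normalised per column, bars for different groups are comparable even when the groups contain different numbers of notes.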

detach_packages

detach_packages detaches every attached package except the base R packages, to avoid function name conflicts.

detach_packages <- function(){
    
    basic.packages <- c("package:stats",
                        "package:graphics",
                        "package:grDevices",
                        "package:utils",
                        "package:datasets",
                        "package:methods",
                        "package:base")
    
    package.list <- search()[startsWith(search(), "package:")]
    
    package.list <- setdiff(package.list, basic.packages)
    
    if (length(package.list) > 0 )  for (package in package.list) detach(package, character.only=TRUE)
    
}

Load Data & Initial Cleaning

Load the NQF care measure cohort used for a previous manuscript (ROW_ID refers to NOTEEVENTS), then load CAREGIVERS, ADMISSIONS, PATIENTS, and ICUSTAYS for additional data.

Note: FAM, CIM, CIM_post, LIM, CAR and COD refer to human annotations and will be dropped, as we are interested in the .machine annotations from NeuroNER.

## Load Labeled Note Data for NQF Caremeasure Cohort (From NOTEEVENTS table)
dat <- read.csv("~/nqf_caregivers/data/note_labels_over75.csv", header = TRUE, stringsAsFactors = FALSE)

## Remove X and note_name (artifact indexing columns) from dat
dat$X <- NULL
dat$note_name <- NULL

## Remove manual annotations
dat$FAM <- NULL
dat$CIM <- NULL
dat$CIM_post <- NULL
dat$LIM <- NULL
dat$CAR <- NULL
dat$COD <- NULL

## Load CAREGIVERS Table for join on CGID
cg <- read.csv("~/nqf_caregivers/data/mimic/CAREGIVERS.csv", 
               header = TRUE, stringsAsFactors = FALSE)

## Load ADMISSIONS Table to join on HADM_ID
adm <- read.csv("~/nqf_caregivers/data/mimic/ADMISSIONS.csv", 
                header = TRUE, stringsAsFactors = FALSE)

## Load PATIENTS Table to join on SUBJECT_ID
pat <- read.csv("~/nqf_caregivers/data/mimic/PATIENTS.csv", 
                header = TRUE, stringsAsFactors = FALSE)

## Load ICUSTAYS Table to join on SUBJECT_ID, HADM_ID
stays <- read.csv("~/nqf_caregivers/data/mimic/ICUSTAYS.csv", 
                  header = TRUE, stringsAsFactors = FALSE)

Clean and Merge

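In MIMIC-III, ROW_ID is a row index that is specific to each table, and DESCRIPTION appears in more than one table (here, NOTEEVENTS and CAREGIVERS). Both must be handled before merging so that the columns do not collide.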

  1. Remove ROW_ID from all tables except dat (from NOTEEVENTS)
  2. Join dat to CAREGIVERS on CGID
  3. Join dat to ADMISSIONS on HADM_ID
  4. Join dat to PATIENTS on SUBJECT_ID
  5. Join dat to ICUSTAYS on SUBJECT_ID and HADM_ID
  6. Remove duplicate rows created by the joins
  7. Restrict analysis to notes written by Attending and Resident/Fellow/PA/NP care providers
  8. Ensure all hospital admissions have an attending physician logged in MIMIC

## Change column name of "NOTEEVENTS.DESCRIPTION" to explicitly mention that it describes the note
colnames(dat)[which(colnames(dat) == "DESCRIPTION")] <- "NOTE_DESCRIPTION"

## Change column name of "CAREGIVERS.DESCRIPTION" to explicitly mention that it describes the care provider
colnames(cg)[which(colnames(cg) == "DESCRIPTION")] <- "CG_DESCRIPTION"

## (1)
cg$ROW_ID <- NULL
adm$ROW_ID <- NULL
pat$ROW_ID <- NULL
stays$ROW_ID <- NULL

dim(dat)
## [1] 11575    17
## (2)
dat <- merge(dat, cg, by = "CGID")
dim(dat)
## [1] 11575    19
## (3)
dat <- merge(dat, adm, by = "HADM_ID")
dim(dat)
## [1] 11575    36
## This has duplicated SUBJECT_ID
identical(dat$SUBJECT_ID.x, dat$SUBJECT_ID.y)
## [1] TRUE
## Remove one
dat$SUBJECT_ID.y <- NULL

## Rename the other
colnames(dat)[which(colnames(dat) == "SUBJECT_ID.x")] <- "SUBJECT_ID"
dim(dat)
## [1] 11575    35
## (4)
dat <- merge(dat, pat, by = "SUBJECT_ID")
dim(dat)
## [1] 11575    41
## (5)
dat <- merge(dat, stays, by = c("SUBJECT_ID", "HADM_ID"))
dim(dat)
## [1] 13369    50
## (6)
dat <- dat[!duplicated(dat), ]
dim(dat)
## [1] 11575    50
## (7)
dat <- dat[(dat$CG_DESCRIPTION == "Attending" | 
                dat$CG_DESCRIPTION == "Resident/Fellow/PA/NP"), ]
dim(dat)
## [1] 11461    50
## (8)
dat <- attending_check(dat)
dim(dat)
## [1] 11104    50
## Cleaning Environment
rm(adm, pat, stays)#, cg)
gc()
##            used (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells  1643612 87.8    3886542 207.6  2663435 142.3
## Vcells 11348603 86.6   29517438 225.3 29517423 225.3

Note: by default, merge() performs an inner join. It keeps only rows whose key values appear in both data frames and returns one row for every matching pair of rows, which is not the same as a full Cartesian product of the two tables. The data frame expands when joined to ICUSTAYS because a single hospital admission can be associated with several ICU stays, for example if the patient is transferred to the floor and back.
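
The row expansion can be reproduced on a small scale. The sketch below uses two made-up data frames (the identifiers and note labels are illustrative, not MIMIC-III values): one admission has two ICU stays, and one admission has no ICU stay at all.

```r
## Made-up notes: admission 1 has two notes, admissions 2 and 3 have one each
notes <- data.frame(HADM_ID = c(1, 1, 2, 3),
                    NOTE = c("a", "b", "c", "d"),
                    stringsAsFactors = FALSE)

## Made-up ICU stays: admission 1 has two stays, admission 3 has none
icu <- data.frame(HADM_ID = c(1, 1, 2),
                  ICUSTAY_ID = c(10, 11, 20))

## Inner join (the default for merge): one row per matching pair
joined <- merge(notes, icu, by = "HADM_ID")

nrow(notes)   # 4 rows before the join
nrow(joined)  # 5 rows after: 2 notes x 2 stays, plus 1 note x 1 stay
```

Each note from admission 1 appears once per ICU stay, and the note from admission 3 disappears because it has no match in the ICU stay table.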

Count Words and Characters

## Word count, approximated by counting runs of non-word characters (the separators between words)
dat$WORD_COUNT <- stringr::str_count(dat$TEXT, "\\W+")

## Count characters
dat$NCHAR <- nchar(dat$TEXT)

plot(dat$WORD_COUNT, dat$NCHAR, 
     main = "Character Count as a Function of Word Count",
     xlab = "Word Count",
     ylab = "Character Count")

cor(dat$WORD_COUNT, dat$NCHAR)
## [1] 0.9915004

Unsurprisingly, word count and character count are strongly correlated (r ≈ 0.99).
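
The pattern "\\W+" matches runs of non-word characters, so the word count above is really a count of separators. The sketch below compares it with "\\w+" (runs of word characters) on two made-up strings. It uses a hypothetical base R helper, count_matches, in place of stringr::str_count so that it runs without extra packages.

```r
## Made-up note text (not from MIMIC-III)
notes <- c("Pt is full code", "DNR/DNI discussed with family.")

## Hypothetical helper: number of non-overlapping matches per string
count_matches <- function(x, pattern){
    lengths(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
}

sep_runs  <- count_matches(notes, "\\W+")  # runs of non-word characters
word_runs <- count_matches(notes, "\\w+")  # runs of word characters
```

For the first string the separator count is one less than the number of words, while for the second the trailing period makes the two counts equal. Over long notes the difference is small, but "\\w+" is the more direct definition if an exact word count is needed.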

Time Conversion

We convert CHARTDATE, the date on which the note was charted, from YYYY-MM-DD format to numeric. ADMISSIONS.ADMITTIME and ICUSTAYS.INTIME are converted in the same way, which discards their time-of-day component. The resulting number is the count of days since 1970-01-01, per ?as.Date.

## From ADMISSIONS
dat$ADMITTIME <- as.numeric(as.Date(dat$ADMITTIME, "%Y-%m-%d %H:%M:%S"))

## From NOTEEVENTS
dat$CHARTDATE <- as.numeric(as.Date(dat$CHARTDATE, "%Y-%m-%d"))

## from ICUSTAYS
dat$INTIME <- as.numeric(as.Date(dat$INTIME, "%Y-%m-%d %H:%M:%S"))

## TIME_SINCE_ADMIT from `NOTEEVENTS.CHARTDATE` and `ADMISSIONS.ADMITTIME`
## Days between hospital admission and the date the note was charted
dat$TIME_SINCE_ADMIT <- dat$CHARTDATE - dat$ADMITTIME
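
A minimal sketch of the same conversion on two made-up timestamps (the dates are illustrative only) shows that the difference comes out in whole days:

```r
## Made-up timestamps in the same formats as ADMITTIME and CHARTDATE
admit <- as.numeric(as.Date("2150-03-01 22:15:00", "%Y-%m-%d %H:%M:%S"))
chart <- as.numeric(as.Date("2150-03-03", "%Y-%m-%d"))

## Days since 1970-01-01: the day after the epoch is 1
epoch_check <- as.numeric(as.Date("1970-01-02", "%Y-%m-%d"))

## Whole days between admission and charting; the time of day is dropped
days_since_admit <- chart - admit
```

Because as.Date() drops the time of day, a note charted on the calendar day of admission has a TIME_SINCE_ADMIT of 0 regardless of the hour of admission.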

SUBJECT_ID and CGID Conversion

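SUBJECT_ID and CGID are integer values that may overlap: the same integer can identify a patient in one column and a care provider in the other. To keep the two kinds of identifier distinguishable, we add a character prefix to their integer values.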

dat$SUBJECT_ID <- paste0("SUBJECT_", dat$SUBJECT_ID)
dat$CGID <- paste0("CG_", dat$CGID)

## Change CAREGIVERS as well
cg$CGID <- paste0("CG_", cg$CGID)
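
The effect of the prefixes can be seen on a pair of made-up identifier vectors (the values are illustrative only):

```r
## Made-up identifiers: 27 appears as both a patient and a care provider
subject_ids <- c(14, 27, 35)
cg_ids <- c(27, 40)

## The raw integers collide
raw_overlap <- intersect(subject_ids, cg_ids)

## After prefixing, the two identifier spaces are disjoint
subject_keys <- paste0("SUBJECT_", subject_ids)
cg_keys <- paste0("CG_", cg_ids)
key_overlap <- intersect(subject_keys, cg_keys)
```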

Write

Save data for later analyses.

## Write
#write.csv(dat, file = "~/nqf_caregivers/data/NQF_Att_Res_03Jun18.csv", row.names = F)