End of Life Goals of Care Physician Note Exploration

Background

This report examines the quantity and type of patient notes that contain phrases associated with end-of-life decisions and their implementation by the clinical care team.

Data Sources

Notes consist of all patient notes from MIMIC-III where the patient died within 30 days of the clinical encounter. These notes result from the PostgreSQL query found in "dataset_icu_deaths_2017_10_04_notes_queries.txt", and were preprocessed according to preprocessing.R. The query and preprocessing were conducted by Daniele Ramazotti.

Preprocessing

Data preprocessing was performed by Daniele Ramazotti according to the following method:

Consider only patients over 18 years of age
For patients over the age of 89, impute the median age of patients over 89 from MIMIC
Replace missing values with NA
Exclude any patients who did not expire within 30 days (inclusive) of their ICU admission.
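
The preprocessing rules above can be sketched on a toy cohort. This is only an illustration under stated assumptions, not the contents of preprocessing.R: the data frame `pts`, its columns `AGE` and `DAYS_TO_DEATH`, the helper `preprocess_cohort()`, and the placeholder median of 91.4 are all hypothetical.

```r
# Hypothetical toy cohort; column names are assumptions, not the preprocessing.R schema
pts <- data.frame(
  SUBJECT_ID    = 1:5,
  AGE           = c(17, 45, 300, 72, 91),
  DAYS_TO_DEATH = c(3, 30, 12, 45, NA)
)

preprocess_cohort <- function(df, median_over_89) {
  # Consider only patients over 18 years of age
  df <- df[df$AGE > 18, ]
  # Impute a single median age for everyone recorded as older than 89
  df$AGE[df$AGE > 89] <- median_over_89
  # Keep patients who died within 30 days (inclusive); NA means no recorded death
  df[!is.na(df$DAYS_TO_DEATH) & df$DAYS_TO_DEATH <= 30, ]
}

# 91.4 is a placeholder value used only for this illustration
cohort <- preprocess_cohort(pts, median_over_89 = 91.4)
print(cohort)
```

In this toy run only subjects 2 and 3 remain: subject 1 is under 18, subject 4 died after day 30, and subject 5 has no recorded death.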

Contextual information comes from sources provided by Charlotta Lindvall, including Keywords_and_definitions.docx and the following phrases:

Domains and keywords

Code status: “DNR,do not resuscitate,no resuscitation, DNI,DNR/DNI,CPR,ventilator,breathing machine,breathing tube,full code,full resuscitation,life-sustaining treatments,chest compressions,intubation,shocks,feeding tube”
Goals of care: “goals of care,GOC,life-sustaining treatment,comfort measures,comfort care,family meeting,hospice,quality of life,end of life, understanding of illness,understanding of prognosis,priorities,quality of life,extending life,comfort-focused care, supportive care”
Illness severity: “advanced cancer,progressing cancer,poor function,poor functional status,worsening cancer,widely metastatic,functional decline,frail,ill-appearing,poor prognosis,no treatment,end of life,progressive cancer”
Advance care planning: “health care proxy,HCP,living will,MOLST,advance directives,advance care planning,ACP,durable power of attorney”

Further:

Code Status
Limitations on code status:
- Phrases: “dnr,dnrdni,dni,do not resuscitate,do-not-resuscitate,do not intubate,do-not-intubate,chest compressions,no defibrillation,no endotracheal intubation,no mechanical intubation,shocks,cmo,comfort measures”
Full code status discussed/confirmed:
- Phrases: “Full code confirmed,full code d/w,full code discussed,full code verified,would like to be full code,wishes to be full code,would like to remain full code,wishes to remain full code,wish to be full code,remaining full code”
- Phrases: “full code per,full code as per”
- Not Phrases “Full code per np admit note,full code per admission,full code per orders,full code per review of chart,full code per LMR,full code per chart,full code per recent,full code per last admission,full code per order set,full code per records,full code per pepl,full code per cas,full code per team,Full code as per np admit note,full code as per admission,full code as per orders,full code as per review of chart,full code as per LMR,full code as per chart,full code as per recent,full code as per last admission,full code as per order set,full code as per records,full code as per pepl,full code as per cas,full code as per team”
Goals of care
- Phrases “goals of care,goc,goals for care,goals of treatment,goals for treatment,treatment goals,family meeting,family discussion,family discussions”
Palliative care
- phrases “pallcare,palliative care,pall care,pallcare,palliative medicine”
Hospice
- Phrases: “hospice”

Method

We have four categories within our note annotation GUI:

Care Preferences
Family Meetings
Code Status Limitations
Palliative Care Involvement

Direct Matching (Regex)

To generate a subset of notes containing inclusive phrases, we will use regex according to the following strategy:

Convert all notes and phrases to lowercase
Concatenate all phrases and split on ',' for n unique phrases
Form a union of all input phrases
Use grepl() to generate a logical vector to capture all TRUE evaluations
Bind notes together for all instances of TRUE
Remove duplicates
Remove all texts that match any phrase in the Not (exclusion) dictionary
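
The direct matching steps above can be sketched end to end on a few toy notes. This is a minimal illustration, not the report's own code: the example texts and the names `inc_string`, `exc_string`, `hits_inc`, and `hits_exc` are hypothetical, and `fixed = TRUE` is an assumption made here so that each phrase is matched literally.

```r
# Toy notes and phrase dictionaries (illustrative only)
texts      <- c("Pt is DNR/DNI per family meeting.",
                "Full code per chart.",
                "No acute events overnight.")
inc_string <- "DNR,family meeting,full code per"
exc_string <- "full code per chart"

# Lowercase everything, split on ',' and take the union
inc   <- unique(strsplit(tolower(inc_string), ",")[[1]])
exc   <- unique(strsplit(tolower(exc_string), ",")[[1]])
lower <- tolower(texts)

# One logical column per phrase, one row per note
hits_inc <- vapply(inc, function(p) grepl(p, lower, fixed = TRUE),
                   logical(length(lower)))
hits_exc <- vapply(exc, function(p) grepl(p, lower, fixed = TRUE),
                   logical(length(lower)))

# Keep notes with any inclusive hit and no exclusionary hit, then de-duplicate
keep     <- rowSums(hits_inc) > 0 & rowSums(hits_exc) == 0
selected <- unique(texts[keep])
print(selected)
```

Here the second note matches "full code per" but is dropped again because it also matches the exclusion phrase "full code per chart".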

Fuzzy Matching (Levenshtein Distance)

To generate a larger subset of notes likely to contain inclusive phrases, we will use Levenshtein Distance: the minimum number of single-character insertions, deletions, or substitutions required to transform one string into another.
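
As a quick worked example of the definition, base R's adist() returns these distances directly. The word pairs below are chosen only for illustration.

```r
# adist() returns a matrix of edit distances; [1, 1] extracts the single value
d1 <- adist("kitten", "sitting")[1, 1]   # k->s, e->i, insert g
d2 <- adist("hospice", "hospise")[1, 1]  # one substitution
d3 <- adist("dnr", "dni")[1, 1]          # one substitution
print(c(d1, d2, d3))
```

Short abbreviations sit very close to each other under this measure ("dnr" and "dni" differ by a single edit), which is one reason very short strings may need special handling in a fuzzy search.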

Our strategy will follow:

Convert all notes and phrases to lowercase
Concatenate all phrases and split on ',' as well as ' ', for n unique phrases
Form a union of all input phrases
Remove special characters and numbers from notes, replace '\n' with ' '
Use all notes to create a vocabulary of unique words contained therein
Use adist() to find Levenshtein Distance of all input phrases relative to each word in the note vocabulary
For each input phrase, subset all words in the note vocabulary with the least distance from the input phrase (note: there could be multiple for any single phrase)
Use grepl() to generate a logical vector to capture all TRUE evaluations
Bind notes together for all instances of TRUE
Keep only results with 3 or more characters
Remove duplicates
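
The fuzzy matching steps above can be sketched on toy notes with deliberate misspellings. This is an illustrative outline only: the texts and the names `words`, `clean`, `vocab`, `nearest`, and `flagged` are hypothetical, and the sketch computes the whole distance matrix with a single adist() call for brevity.

```r
# Toy inputs (illustrative only, with deliberate misspellings)
texts   <- c("Family meetng held; pt made DNR.\nHospise referral placed.",
             "No acute events overnight.")
phrases <- "family meeting,hospice,DNR"

# Lowercase, split phrases on ',' and ' ', and take the union
words <- unique(unlist(strsplit(tolower(phrases), "[, ]")))

# Lowercase notes, turn newlines into spaces, drop digits and special characters
clean <- gsub("[^[:alpha:] ]", "", gsub("\n", " ", tolower(texts), fixed = TRUE))

# Vocabulary of unique words across all notes
vocab <- unique(unlist(strsplit(clean, " +")))
vocab <- vocab[vocab != ""]

# Distance matrix: rows are vocabulary words, columns are phrase words
d <- adist(vocab, words)
d[d == 0] <- NA   # set exact matches aside

# For each phrase word keep the closest vocabulary word(s), ties included
nearest <- unique(unlist(lapply(seq_along(words), function(j) {
  vocab[which(d[, j] == min(d[, j], na.rm = TRUE))]
})))
nearest <- nearest[nchar(nearest) >= 3]

# Flag notes containing any of the near-miss spellings
flagged <- Reduce(`|`, lapply(nearest, function(w) grepl(w, clean, fixed = TRUE)))
print(nearest)
print(flagged)
```

Because the closest word is kept however far away it is, every phrase word contributes at least one candidate, so the resulting list mixes genuine misspellings such as "meetng" and "hospise" with unrelated words and likely needs manual review.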

Load Data

We will load "dataset_icu_deaths_notes_processed.txt", as well as the ADMISSIONS and PATIENTS tables from MIMIC-III, to generate a DAYS_UNTIL_DEATH variable (relative to admission date).

#Load notes
notes <- read.csv("dataset_icu_deaths_notes_processed.txt", 
                              header = T, 
                              stringsAsFactors = F, 
                              sep = '\t')

#Load ADMISSIONS table from MIMIC for admittime/dischtime
adm <- read.csv("ADMISSIONS.csv",
    header = T, stringsAsFactors = F)

#Load PATIENTS table from MIMIC for date of death
pat <- read.csv("PATIENTS.csv",
                 header = T, stringsAsFactors = F)

#Convert dates for easier manipulation
adm$ADMITTIME <- as.numeric(as.Date(adm$ADMITTIME, "%Y-%m-%d %H:%M:%S"))
adm$DISCHTIME <- as.numeric(as.Date(adm$DISCHTIME, "%Y-%m-%d %H:%M:%S"))
pat$DOD <- as.numeric(as.Date(pat$DOD, "%Y-%m-%d %H:%M:%S"))

#Drop ROW_ID variables from each table
adm$ROW_ID <- NULL
pat$ROW_ID <- NULL

#Merge adm and pat tables on SUBJECT_ID
dat <- merge(adm, pat, by = "SUBJECT_ID")

#Clean environment of admissions and patient tables
rm(pat)
rm(adm)

#merge notes to other data on hadm_id for time data
notes <- merge(notes, dat, by = c("SUBJECT_ID","HADM_ID"))
rm(dat)

#Generate DAYS_UNTIL_DEATH [from admission date] variable
notes$DAYS_UNTIL_DEATH <-  notes$DOD - notes$ADMITTIME


colnames(notes)

##  [1] "SUBJECT_ID"           "HADM_ID"              "ICUSTAY_ID"          
##  [4] "CATEGORY"             "DESCRIPTION"          "TEXT"                
##  [7] "ADMITTIME"            "DISCHTIME"            "DEATHTIME"           
## [10] "ADMISSION_TYPE"       "ADMISSION_LOCATION"   "DISCHARGE_LOCATION"  
## [13] "INSURANCE"            "LANGUAGE"             "RELIGION"            
## [16] "MARITAL_STATUS"       "ETHNICITY"            "EDREGTIME"           
## [19] "EDOUTTIME"            "DIAGNOSIS"            "HOSPITAL_EXPIRE_FLAG"
## [22] "HAS_CHARTEVENTS_DATA" "GENDER"               "DOB"                 
## [25] "DOD"                  "DOD_HOSP"             "DOD_SSN"             
## [28] "EXPIRE_FLAG"          "DAYS_UNTIL_DEATH"
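
The DAYS_UNTIL_DEATH calculation above can be checked on a couple of toy timestamps. The dates below are invented for illustration; only the "YYYY-MM-DD HH:MM:SS" format string is taken from the code above.

```r
# Toy admission and death timestamps in the "YYYY-MM-DD HH:MM:SS" format
admittime <- c("2101-10-20 19:08:00", "2150-03-01 02:15:00")
dod       <- c("2101-10-31 00:00:00", "2150-05-23 00:00:00")

# as.Date() drops the time of day, so the difference is in whole calendar days
admit_day <- as.numeric(as.Date(admittime, "%Y-%m-%d %H:%M:%S"))
death_day <- as.numeric(as.Date(dod, "%Y-%m-%d %H:%M:%S"))
days_until_death <- death_day - admit_day
print(days_until_death)

# Flag rows that fall inside a 30-day window
within_30 <- days_until_death <= 30
print(within_30)
```

Because the time of day is discarded, a patient admitted late in the evening who dies shortly after midnight is counted as dying one day after admission.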

#nrow(notes)
#length(unique(notes$SUBJECT_ID))

par(mai=c(1,2,1,1))
barplot(table(factor(notes$CATEGORY)),
        horiz = T, 
        names.arg = attr(table(factor(notes$CATEGORY)), "names"),
        main = "Note Count by Type (Entire Cohort, All Time Points)",
        las=1)

Focus only on Physician’s Notes

notes <- notes[(notes$CATEGORY == "Physician"),]
cat("We have", nrow(notes), "observations after keeping only physician's notes")

## We have 7214 observations after keeping only physician's notes

length(unique(notes$SUBJECT_ID))

## [1] 368

cat("We have",length(unique(notes$SUBJECT_ID)),"unique patients in the cohort")

## We have 368 unique patients in the cohort

#notes <- notes[which((notes$DOD - notes$ADMITTIME) <= 30),]
#nrow(notes)
#Because of duplicated notes, add column to count characters
notes$CHARS <- nchar(notes$TEXT)

#Order by subject_ID and note size
notes <- notes[with(notes, order(SUBJECT_ID, -CHARS)), ]

#Remove duplicates via ICUSTAY_ID, keeping the first (longest) note for each ICU stay
notes <- notes[!duplicated(notes$ICUSTAY_ID),]
cat("We have", nrow(notes), "notes after removing duplicates")

## We have 368 notes after removing duplicates
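
The order-then-deduplicate idiom used above can be seen on a toy data frame. The rows below are invented; the column names mirror those in the notes table.

```r
# Toy notes: two notes for each of two ICU stays
toy <- data.frame(
  SUBJECT_ID = c(1, 1, 2, 2),
  ICUSTAY_ID = c(10, 10, 20, 20),
  TEXT       = c("short", "a much longer note", "medium note", "tiny"),
  stringsAsFactors = FALSE
)

# Sort so the longest note of each subject comes first
toy$CHARS <- nchar(toy$TEXT)
toy <- toy[order(toy$SUBJECT_ID, -toy$CHARS), ]

# duplicated() marks every row after the first per ICUSTAY_ID, so negating keeps the longest
longest <- toy[!duplicated(toy$ICUSTAY_ID), ]
print(longest$TEXT)
```

Note that the key is ICUSTAY_ID, so a patient with more than one ICU stay would keep one note per stay rather than one note overall.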

Direct Regex

We will narrow the notes down to those within 30 days of patient expiration.

hist(notes$DAYS_UNTIL_DEATH, 
     breaks = 50, 
     main = "Note Event Frequency by Days Until Death From Admission", 
     xlab = "Days Until Death", 
     ylab = "Note Event Frequency")

summary(notes$DAYS_UNTIL_DEATH)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    2.00    7.00   10.21   16.00   83.00

Create phrase dictionary & Convert text to lowercase

Phrases will be searched for in text according to the dictionary of terms generated earlier.

#Store text for replacement later
txtHolder <- notes$TEXT

#convert all text to lowercase
notes$TEXT <- tolower(notes$TEXT)

#Convert escaped newlines (a literal backslash followed by n) to real newlines
notes$TEXT <- gsub("\\\\n", '\n', notes$TEXT)

#Replace newlines with spaces to split on
notes$TEXT <- gsub('\n', ' ', notes$TEXT)

#Use phrases from above, convert to lower
phrases <- tolower(c("DNR,do not resuscitate,no resuscitation,DNI,DNR/DNI,CPR,ventilator,breathing machine,breathing tube,full code,full resuscitation,life-sustaining treatments,chest compressions,intubation,shocks,feeding tube,",
             "goals of care,GOC,life-sustaining treatment,comfort measures,comfort care,family meeting,hospice,quality of life,end of life,understanding of illness,understanding of prognosis,priorities,quality of life,extending life,comfort-focused care,supportive care,",
             "advanced cancer,progressing cancer,poor function,poor functional status,worsening cancer,widely metastatic,functional decline,frail,ill-appearing,poor prognosis,no treatment,end of life,progressive cancer,",
             "health care proxy,HCP,living will,MOLST,advance directives,advance care planning,ACP,durable power of attorney,",
             "dnr,dnrdni,dni,do not resuscitate,do-not-resuscitate,do not intubate,do-not-intubate,chest compressions,no defibrillation,no endotracheal intubation,no mechanical intubation,shocks,cmo,comfort measures,",
             "Full code confirmed,full code d/w,full code discussed,full code verified,would like to be full code,wishes to be full code,would like to remain full code,wishes to remain full code,wish to be full code,remaining full code,",
             "full code per,full code as per,",
             "goals of care,goc,goals for care,goals of treatment,goals for treatment,treatment goals,family meeting,family discussion,family discussions,",
             "pallcare,palliative care,pall care,pallcare,palliative medicine,",
             "hospice"))

#Paste phrases together
incPhrases <- paste(phrases, sep = ',', collapse = '')

#Split strings on ',', use unique() for a union
incPhrases <- unique(strsplit(incPhrases, ',')[[1]])

#Display inclusive phrases
print(incPhrases)

##  [1] "dnr"                            "do not resuscitate"            
##  [3] "no resuscitation"               "dni"                           
##  [5] "dnr/dni"                        "cpr"                           
##  [7] "ventilator"                     "breathing machine"             
##  [9] "breathing tube"                 "full code"                     
## [11] "full resuscitation"             "life-sustaining treatments"    
## [13] "chest compressions"             "intubation"                    
## [15] "shocks"                         "feeding tube"                  
## [17] "goals of care"                  "goc"                           
## [19] "life-sustaining treatment"      "comfort measures"              
## [21] "comfort care"                   "family meeting"                
## [23] "hospice"                        "quality of life"               
## [25] "end of life"                    "understanding of illness"      
## [27] "understanding of prognosis"     "priorities"                    
## [29] "extending life"                 "comfort-focused care"          
## [31] "supportive care"                "advanced cancer"               
## [33] "progressing cancer"             "poor function"                 
## [35] "poor functional status"         "worsening cancer"              
## [37] "widely metastatic"              "functional decline"            
## [39] "frail"                          "ill-appearing"                 
## [41] "poor prognosis"                 "no treatment"                  
## [43] "progressive cancer"             "health care proxy"             
## [45] "hcp"                            "living will"                   
## [47] "molst"                          "advance directives"            
## [49] "advance care planning"          "acp"                           
## [51] "durable power of attorney"      "dnrdni"                        
## [53] "do-not-resuscitate"             "do not intubate"               
## [55] "do-not-intubate"                "no defibrillation"             
## [57] "no endotracheal intubation"     "no mechanical intubation"      
## [59] "cmo"                            "full code confirmed"           
## [61] "full code d/w"                  "full code discussed"           
## [63] "full code verified"             "would like to be full code"    
## [65] "wishes to be full code"         "would like to remain full code"
## [67] "wishes to remain full code"     "wish to be full code"          
## [69] "remaining full code"            "full code per"                 
## [71] "full code as per"               "goals for care"                
## [73] "goals of treatment"             "goals for treatment"           
## [75] "treatment goals"                "family discussion"             
## [77] "family discussions"             "pallcare"                      
## [79] "palliative care"                "pall care"                     
## [81] "palliative medicine"

#Use exclusionary phrases from above
excPhrases <- tolower("Full code per np admit note,full code per admission,full code per orders,full code per review of chart,full code per LMR,full code per chart,full code per recent,full code per last admission,full code per order set,full code per records,full code per pepl,full code per cas,full code per team,Full code as per np admit note,full code as per admission,full code as per orders,full code as per review of chart,full code as per LMR,full code as per chart,full code as per recent,full code as per last admission,full code as per order set,full code as per records,full code as per pepl,full code as per cas,full code as per team")

#Split strings on ',', use unique() for a union
excPhrases <- unique(strsplit(excPhrases, ',')[[1]])

#Display exclusionary phrases
print(excPhrases)

##  [1] "full code per np admit note"      "full code per admission"         
##  [3] "full code per orders"             "full code per review of chart"   
##  [5] "full code per lmr"                "full code per chart"             
##  [7] "full code per recent"             "full code per last admission"    
##  [9] "full code per order set"          "full code per records"           
## [11] "full code per pepl"               "full code per cas"               
## [13] "full code per team"               "full code as per np admit note"  
## [15] "full code as per admission"       "full code as per orders"         
## [17] "full code as per review of chart" "full code as per lmr"            
## [19] "full code as per chart"           "full code as per recent"         
## [21] "full code as per last admission"  "full code as per order set"      
## [23] "full code as per records"         "full code as per pepl"           
## [25] "full code as per cas"             "full code as per team"

Run Regex

strictRegex() accepts a vector of phrases, kwds, and a vector of note texts, texts. It uses grepl() to search every text for each phrase and returns a list with one logical vector per phrase.

strictRegex <- function(kwds, texts){
  #Create a list to store results
  tmpList <- list()
  
  #Loop through all keywords
  for (i in seq_along(kwds)){
    #Store results as a logical vector in its respective list entry position
    tmpList[[i]] <- grepl(kwds[i], texts)
  }
  
  #Return list and control to environment
  return(tmpList)
}


system.time(hold <- strictRegex(incPhrases, notes$TEXT))

##    user  system elapsed 
##   31.78    0.00   31.84

system.time(excHold <- strictRegex(excPhrases, notes$TEXT))

##    user  system elapsed 
##   15.32    0.00   15.31
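
One property of grepl() worth keeping in mind when reading the counts that follow is that it matches substrings, so a short phrase is also found inside longer tokens. The sketch below illustrates this on invented texts; the whole-word pattern using `\\b` with `perl = TRUE` is shown only as a possible refinement and is not part of the analysis above.

```r
# Toy texts: the second contains "dni" only as part of the word "midnight"
texts <- c("pt is dnr/dni", "glucose checked at midnight")

# Substring match, as in strictRegex()
loose <- grepl("dni", texts)

# Whole-word match using word boundaries (a hypothetical alternative)
strict <- grepl("\\bdni\\b", texts, perl = TRUE)

print(loose)
print(strict)
```

The same substring behaviour means a note containing "dnr/dni" is counted under "dnr", "dni", and "dnr/dni".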

Generate a table showing the number of notes matching each phrase

#Convert from list entries to dataframe columns
hold <- as.data.frame(hold)

#Each column corresponds to a phrase in the incPhrases vector
colnames(hold) <- incPhrases

#Multiply logicals by 1 for binary numeric
hold <- hold*1

#Sum each column (phrase) to count the notes in which the phrase occurs
posTable <- apply(hold[,1:length(colnames(hold))],2, FUN = sum)

#Print matches and count, omit phrases where no matches were found
posTable[posTable > 0]

##                        dnr         do not resuscitate 
##                        155                         50 
##           no resuscitation                        dni 
##                          2                        129 
##                    dnr/dni                        cpr 
##                         77                         44 
##                 ventilator          breathing machine 
##                        226                          1 
##             breathing tube                  full code 
##                          4                        201 
##         full resuscitation         chest compressions 
##                          1                         16 
##                 intubation                     shocks 
##                        115                         20 
##               feeding tube              goals of care 
##                         15                         54 
##                        goc           comfort measures 
##                         10                         30 
##               comfort care             family meeting 
##                         22                         88 
##                    hospice            quality of life 
##                         23                          9 
##                end of life             extending life 
##                          9                          2 
##       comfort-focused care            supportive care 
##                          5                         28 
##              poor function     poor functional status 
##                          4                          3 
##          widely metastatic         functional decline 
##                         10                          1 
##                      frail              ill-appearing 
##                         17                          6 
##             poor prognosis               no treatment 
##                         38                          5 
##         progressive cancer          health care proxy 
##                          1                         21 
##                        hcp                living will 
##                         92                          6 
##         advance directives            do not intubate 
##                          1                          2 
##                        cmo        full code confirmed 
##                         54                          3 
##        full code discussed     wishes to be full code 
##                          1                          1 
## wishes to remain full code       wish to be full code 
##                          1                          1 
##        remaining full code              full code per 
##                          1                          3 
##             goals for care            treatment goals 
##                          2                          4 
##          family discussion         family discussions 
##                         21                          5 
##            palliative care                  pall care 
##                         28                          1

excHold <- as.data.frame(excHold)
colnames(excHold) <- excPhrases
excHold <- excHold*1
excTable <- apply(excHold[,1:length(colnames(excHold))],2, FUN = sum)
excTable[excTable > 0]

## named numeric(0)

No exclusionary phrases were found.

Note Selection

Strategy: Include any patient note that contained any inclusive phrase.

#Create a vector
inc <- vector()
for (i in 1:nrow(hold)){
  #Populate vector with logical value if note contains any concepts associated with inclusion
  inc[length(inc)+1] <- any(hold[i,] == 1)
}
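
The row-wise loop above has a vectorised equivalent, sketched here on a toy 0/1 match table. The data frame `hold_toy` is hypothetical and stands in for hold.

```r
# Toy 0/1 match table: rows are notes, columns are phrases
hold_toy <- data.frame(dnr = c(1, 0, 0), hospice = c(0, 0, 1))

# Loop version, as in the text
inc_loop <- vector()
for (i in seq_len(nrow(hold_toy))) {
  inc_loop[i] <- any(hold_toy[i, ] == 1)
}

# Vectorised version: a note is included if its row sum is positive
inc_vec <- rowSums(hold_toy) > 0

# The per-phrase note counts can be obtained the same way
notes_per_phrase <- colSums(hold_toy)

print(inc_loop)
print(unname(inc_vec))
print(notes_per_phrase)
```

Both versions give the same logical vector; the vectorised form avoids growing a vector inside a loop.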

Note Subsetting

#Restore the original (mixed-case) text saved before tolower()
notes$TEXT <- txtHolder

#Clean txtHolder from environment
rm(txtHolder)

#Subset all positive notes
results <- notes[inc,]
nrow(results)

## [1] 360

#Subset negatives
negatives <- notes[!inc,]
nrow(negatives)

## [1] 8

results$COHORT <- rep(1, each = nrow(results))
negatives$COHORT <- rep(0, each = nrow(negatives))
strictResults <- rbind(results, negatives)

write.csv(strictResults, file = "strict_regex_results05Nov17.csv", row.names = F)

Fuzzy Matching (Levenshtein Distance)

Reload Phrases

phrases <- tolower(c("DNR,do not resuscitate,no resuscitation,DNI,DNR/DNI,CPR,ventilator,breathing machine,breathing tube,full code,full resuscitation,life-sustaining treatments,chest compressions,intubation,shocks,feeding tube,",
             "goals of care,GOC,life-sustaining treatment,comfort measures,comfort care,family meeting,hospice,quality of life,end of life,understanding of illness,understanding of prognosis,priorities,quality of life,extending life,comfort-focused care,supportive care,",
             "advanced cancer,progressing cancer,poor function,poor functional status,worsening cancer,widely metastatic,functional decline,frail,ill-appearing,poor prognosis,no treatment,end of life,progressive cancer,",
             "health care proxy,HCP,living will,MOLST,advance directives,advance care planning,ACP,durable power of attorney,",
             "dnr,dnrdni,dni,do not resuscitate,do-not-resuscitate,do not intubate,do-not-intubate,chest compressions,no defibrillation,no endotracheal intubation,no mechanical intubation,shocks,cmo,comfort measures,",
             "Full code confirmed,full code d/w,full code discussed,full code verified,would like to be full code,wishes to be full code,would like to remain full code,wishes to remain full code,wish to be full code,remaining full code,",
             "full code per,full code as per,",
             "goals of care,goc,goals for care,goals of treatment,goals for treatment,treatment goals,family meeting,family discussion,family discussions,",
             "pallcare,palliative care,pall care,pallcare,palliative medicine,",
             "hospice"))


#Create a union of all phrases split between spaces
phrases <- unique(unlist(strsplit(unlist(strsplit(phrases, ',')), ' ')))
print(phrases)

##  [1] "dnr"                "do"                 "not"               
##  [4] "resuscitate"        "no"                 "resuscitation"     
##  [7] "dni"                "dnr/dni"            "cpr"               
## [10] "ventilator"         "breathing"          "machine"           
## [13] "tube"               "full"               "code"              
## [16] "life-sustaining"    "treatments"         "chest"             
## [19] "compressions"       "intubation"         "shocks"            
## [22] "feeding"            "goals"              "of"                
## [25] "care"               "goc"                "treatment"         
## [28] "comfort"            "measures"           "family"            
## [31] "meeting"            "hospice"            "quality"           
## [34] "life"               "end"                "understanding"     
## [37] "illness"            "prognosis"          "priorities"        
## [40] "extending"          "comfort-focused"    "supportive"        
## [43] "advanced"           "cancer"             "progressing"       
## [46] "poor"               "function"           "functional"        
## [49] "status"             "worsening"          "widely"            
## [52] "metastatic"         "decline"            "frail"             
## [55] "ill-appearing"      "progressive"        "health"            
## [58] "proxy"              "hcp"                "living"            
## [61] "will"               "molst"              "advance"           
## [64] "directives"         "planning"           "acp"               
## [67] "durable"            "power"              "attorney"          
## [70] "dnrdni"             "do-not-resuscitate" "intubate"          
## [73] "do-not-intubate"    "defibrillation"     "endotracheal"      
## [76] "mechanical"         "cmo"                "confirmed"         
## [79] "d/w"                "discussed"          "verified"          
## [82] "would"              "like"               "to"                
## [85] "be"                 "wishes"             "remain"            
## [88] "wish"               "remaining"          "per"               
## [91] "as"                 "for"                "discussion"        
## [94] "discussions"        "pallcare"           "palliative"        
## [97] "pall"               "medicine"

Preprocessing

#Store text for replacement later
txtHolder <- notes$TEXT

#convert all text to lowercase
notes$TEXT <- tolower(notes$TEXT)

#Convert escaped newlines (a literal backslash followed by n) to real newlines
notes$TEXT <- gsub("\\\\n", '\n', notes$TEXT)

#Replace newlines with spaces to split on
notes$TEXT <- gsub('\n', ' ', notes$TEXT)

#Split notes on spaces, create union, then create a dictionary of all unique words
dict <- unique(gsub("[^[:alpha:]]", '',unique(unlist(strsplit(notes$TEXT, ' ')))))

#Remove empty strings
dict <- dict[dict != ""]

cat("Our phrase library has",length(phrases),"unique words in it.")

## Our phrase library has 98 unique words in it.

cat("Our vocabulary has", length(dict), "unique words in it.")

## Our vocabulary has 36706 unique words in it.

Levenshtein Distance Calculations

lDist() accepts dict, the vocabulary of unique words from the notes; phrase, a single word from the phrase list; and res, a data frame that already holds at least one column of results. For each word in dict it computes the Levenshtein Distance to phrase, then appends the distances to res as a new column named after phrase.

#lDist() will compute the Levenshtein Distances for each
#phrase/word pair and insert them into a dataframe column-wise
lDist <- function(dict, phrase, res){
  tmp <- vector()
  for (i in seq_along(dict)){
    #adist applies the Levenshtein distance algorithm; select only the first element of the resulting matrix
    tmp[i] <- adist(dict[i], phrase)[1]
  }
  res <- cbind(res, tmp)
  colnames(res)[length(colnames(res))] <- phrase
  return(res)
}

#Create an initial data frame using the dictionary and first phrase
v <- vector()
for (i in 1:length(dict)){
  v[length(v)+1] <- adist(phrases[1], dict[i])[1]
}

res <- data.frame(dict,v)
colnames(res) <- c("Dict", phrases[1])


#For each phrase/word pair, calculate Levenshtein Distance
for (phrase in phrases[2:length(phrases)]){
  res <- lDist(dict, phrase, res)
}

#Replace all identical matches with NA
res[res == 0] <- NA

head(res)

##        Dict dnr do not resuscitate no resuscitation dni dnr/dni cpr
## 1     chief   5  5   5          10  5            11   4       7   4
## 2 complaint   8  8   7           9  8            11   8       8   7
## 3   altered   6  7   6          10  7            12   7       7   6
## 4    mental   5  6   5           8  5            10   5       7   6
## 5    status   6  6   5           8  6             9   6       7   6
## 6     fever   4  5   5          10  5            12   5       7   4
##   ventilator breathing machine tube full code life-sustaining treatments
## 1          9         7       4    4    5    3              14          9
## 2          8         7       6    9    8    7              12          8
## 3          9         8       6    5    7    6              14          8
## 4          6         7       6    6    5    6              12          7
## 5          7         7       7    5    6    6              12          7
## 6          8         8       7    4    4    4              13          8
##   chest compressions intubation shocks feeding goals of care goc treatment
## 1     3           10          9      5       6     5  4    3   5         8
## 2     7            6          8      9       7     7  8    7   8         7
## 3     6           10          9      7       7     7  7    5   7         8
## 4     5           10          8      6       6     5  6    6   6         7
## 5     6           10          7      4       7     4  6    5   6         7
## 6     5           11         10      6       5     5  5    4   5         7
##   comfort measures family meeting hospice quality life end understanding
## 1       6        7      5       6       6       6    4   5            12
## 2       5        9      7       6       6       7    7   8            10
## 3       7        6      7       7       7       7    5   5            10
## 4       7        6      6       4       7       7    6   4            10
## 5       7        5      6       6       7       6    6   6            10
## 6       6        6      5       6       7       7    4   4            11
##   illness prognosis priorities extending comfort-focused supportive
## 1       6         9          8         8              13          9
## 2       8         8          9         7              12          8
## 3       6         9          8         7              12          9
## 4       6         8          9         7              13          9
## 5       6         8          8         8              12          8
## 6       6         9          9         8              13          9
##   advanced cancer progressing poor function functional status worsening
## 1        7      4          10    5        6          8      6         8
## 2        9      8           8    8        8          9      8         6
## 3        5      5           9    6        8          9      6         8
## 4        7      5          10    6        6          6      5         7
## 5        7      6          10    6        7          9     NA         8
## 6        6      4          10    4        7          9      6         8
##   widely metastatic decline frail ill-appearing progressive health proxy
## 1      5         10       5     5            12          10      6     5
## 2      8          9       6     7             9           9      8     8
## 3      6          8       7     7            10           9      6     7
## 4      6          6       6     5            12          10      5     6
## 5      6          6       7     5            12          10      5     6
## 6      5          9       6     4            11           9      5     5
##   hcp living will molst advance directives planning acp durable power
## 1   4      5    4     5       7          7        7   5       7     4
## 2   8      7    8     6       8          9        6   8       8     8
## 3   7      6    7     7       6          8        7   6       7     5
## 4   6      6    5     5       7          8        7   6       6     6
## 5   6      6    6     6       6          8        7   5       6     6
## 6   5      5    5     5       6          7        8   5       7     3
##   attorney dnrdni do-not-resuscitate intubate do-not-intubate
## 1        7      6                 17        8              14
## 2        9      8                 15        8              11
## 3        4      7                 15        7              14
## 4        8      6                 15        6              12
## 5        7      6                 15        6              12
## 6        7      6                 17        8              15
##   defibrillation endotracheal mechanical cmo confirmed d/w discussed
## 1             13           10          7   4         6   5         7
## 2             11           11          9   7         7   9         9
## 3             13           10         10   7         6   7         7
## 4             12            8          5   6         8   6         9
## 5             12           10          9   6         9   6         7
## 6             12           11          9   5         7   5         8
##   verified would like to be wishes remain wish remaining per as for
## 1        6     5    4  5  4      5      6    4         8   4  5   5
## 2        8     7    7  8  9      9      5    8         6   8  8   8
## 3        6     6    5  6  6      6      7    7         9   5  6   6
## 4        7     6    6  5  5      6      5    6         7   5  5   6
## 5        8     6    6  5  6      5      6    6         8   6  4   6
## 6        6     5    4  5  4      5      5    5         8   3  5   3
##   discussion discussions pallcare palliative pall medicine
## 1          8           9        7          9    5        7
## 2          9           9        8          8    7        7
## 3         10          11        5          7    6        8
## 4         10          11        7          9    5        6
## 5          8           9        8          8    5        8
## 6         10          11        7          9    5        7

Fuzzy Matches

Now that we have the results of our Levenshtein distance search, we will build a new dictionary of keywords. For each phrase, the dictionary includes every word in our patient note vocabulary that lies at the minimum distance from that phrase.

#Create a data frame for holding results
tmpFrame <- data.frame()

#Seed it with the row(s) tied for the minimum distance in the first phrase column (column 2 of res)
tmpFrame <- res[which(res[[colnames(res)[2]]] == min(res[[colnames(res)[2]]], na.rm = TRUE)),]

#Populate the data frame with all other Levenshtein Distance minima
for (phrase in colnames(res)[3:ncol(res)]){
  tmpFrame <- rbind(tmpFrame,res[which(res[[phrase]] == min(res[[phrase]], na.rm = TRUE)),])
}

#Convert results from factor to character
tmpFrame$Dict <- as.character(tmpFrame$Dict)

#Remove duplicates
keywords <- unique(tmpFrame$Dict)

print(rev(keywords[order(nchar(keywords))]))

##   [1] "dnrdonotreintubate" "postresuscitative"  "eventsintubated"   
##   [4] "comfortfocused"     "undertstanding"     "lifesustaining"    
##   [7] "resusciatation"     "resuscitations"     "rescuscitation"    
##  [10] "defibrillator"      "afibrillation"      "resuscutation"     
##  [13] "resuccitation"      "resussitation"      "nasotracheal"      
##  [16] "fibrillation"       "proigressive"       "illappearing"      
##  [19] "progressings"       "understading"       "resucitation"      
##  [22] "resusciation"       "resusitation"       "resuscitated"      
##  [25] "wpalliative"        "discussionw"        "discussions"       
##  [28] "mechanincal"        "wmechanical"        "orotracheal"       
##  [31] "downtitrate"        "resuscitate"        "progressove"       
##  [34] "mestastatic"        "metastastic"        "metatstatic"       
##  [37] "wmetastatic"        "intuabation"        "intubations"       
##  [40] "compresions"        "compression"        "ventillator"       
##  [43] "ventilatory"        "rususcitate"        "resussitate"       
##  [46] "discussion"         "confirmned"         "planningdc"        
##  [49] "kgplanning"         "progresive"         "metestatic"        
##  [52] "metastatis"         "worsesning"         "woresening"        
##  [55] "worsensing"         "wworsening"         "worseining"        
##  [58] "worstening"         "junctional"         "supportuve"        
##  [61] "edtrending"         "wtreatment"         "treatement"        
##  [64] "treatjment"         "treatments"         "incubation"        
##  [67] "brearthing"         "breathinig"         "ventilaton"        
##  [70] "resusitate"         "medicines"          "pallative"         
##  [73] "paliative"          "palliaive"          "discusion"         
##  [76] "regaining"          "retaining"          "terrified"         
##  [79] "discusses"          "comfirmed"          "conformed"         
##  [82] "mechnical"          "endotrach"          "intubated"         
##  [85] "directive"          "metatatic"          "metasatic"         
##  [88] "metastaic"          "functions"          "suportive"         
##  [91] "cxpending"          "intending"          "exceeding"         
##  [94] "expanding"          "attending"          "nitrities"         
##  [97] "measurses"          "measueres"          "intubatin"         
## [100] "treatment"          "pericare"           "selfcare"          
## [103] "ballpark"           "palpbale"           "remining"          
## [106] "dicussed"           "discused"           "attained"          
## [109] "plaquing"           "plannied"           "spanning"          
## [112] "scanning"           "advanced"           "declinie"          
## [115] "declines"           "declined"           "worsenng"          
## [118] "worsenin"           "wosening"           "worseing"          
## [121] "junction"           "advamced"           "stending"          
## [124] "purities"           "procitis"           "pruritis"          
## [127] "priority"           "prognois"           "progosis"          
## [130] "illnessi"           "illiness"           "hospiceo"          
## [133] "meetings"           "measured"           "treatmen"          
## [136] "treament"           "feedings"           "machines"          
## [139] "beathing"           "breating"           "pancake"           
## [142] "daycare"            "palpate"            "calcarb"           
## [145] "remaine"            "remaind"            "remains"           
## [148] "whishes"            "intubat"            "actonel"           
## [151] "atropne"            "autoreg"            "durably"           
## [154] "landing"            "wanning"            "pinning"           
## [157] "plating"            "playing"            "placing"           
## [160] "liveing"            "shealth"            "healthy"           
## [163] "widedly"            "statusi"            "statuso"           
## [166] "statius"            "funtion"            "fuction"           
## [169] "cancers"            "advance"            "tending"           
## [172] "qualify"            "meetins"            "seeting"           
## [175] "sfamily"            "wfamily"            "familys"           
## [178] "measure"            "comforl"            "confort"           
## [181] "beeding"            "seeding"            "needing"           
## [184] "feeling"            "paular"             "palate"            
## [187] "pallor"             "palmar"             "retain"            
## [190] "regain"             "dishes"             "wished"            
## [193] "washes"             "wiould"             "wouldn"            
## [196] "dnrdnr"             "dnidni"             "plower"            
## [199] "powers"             "powder"             "diving"            
## [202] "loving"             "lining"             "giving"            
## [205] "proxys"             "declin"             "statis"            
## [208] "states"             "cancel"             "dancer"            
## [211] "ilness"             "meting"             "famlly"            
## [214] "familt"             "feding"             "shocke"            
## [217] "xchest"             "cheast"             "cheest"            
## [220] "dnrdni"             "palce"              "swish"             
## [223] "wihes"              "liked"              "likes"             
## [226] "likey"              "woulf"              "wpuld"             
## [229] "world"              "wound"              "could"             
## [232] "lower"              "moist"              "wills"             
## [235] "willl"              "trail"              "flail"             
## [238] "satus"              "stats"              "staus"             
## [241] "fmily"              "faily"              "famiy"             
## [244] "famil"              "amily"              "famly"             
## [247] "wcare"              "cares"              "dcare"             
## [250] "carea"              "cared"              "socks"             
## [253] "shock"              "chesp"              "crest"             
## [256] "coded"              "codes"              "fulll"             
## [259] "sfull"              "fully"              "tubee"             
## [262] "tubed"              "ttube"              "jtube"             
## [265] "gtube"              "tubes"              "pala"              
## [268] "ball"               "paln"               "tall"              
## [271] "gall"               "call"               "palp"              
## [274] "pale"               "mfor"               "fior"              
## [277] "fore"               "fort"               "form"              
## [280] "four"               "perm"               "sper"              
## [283] "pver"               "perf"               "perc"              
## [286] "peri"               "perl"               "wich"              
## [289] "dish"               "wise"               "wash"              
## [292] "fish"               "with"               "bike"              
## [295] "life"               "wold"               "cmho"              
## [298] "ower"               "most"               "wiol"              
## [301] "wild"               "wiil"               "pill"              
## [304] "till"               "wall"               "well"              
## [307] "hcps"               "hcpo"               "hcap"              
## [310] "prox"               "rail"               "fail"              
## [313] "pour"               "pool"               "poon"              
## [316] "porr"               "door"               "popr"              
## [319] "pend"               "bend"               "wend"              
## [322] "iend"               "tend"               "ends"              
## [325] "endo"               "send"               "lift"              
## [328] "live"               "wife"               "like"              
## [331] "line"               "carb"               "fare"              
## [334] "cahe"               "cape"               "bare"              
## [337] "cage"               "card"               "cure"              
## [340] "cart"               "cane"               "nare"              
## [343] "case"               "rare"               "came"              
## [346] "goal"               "bode"               "cope"              
## [349] "core"               "node"               "mode"              
## [352] "come"               "foll"               "fuli"              
## [355] "dull"               "fill"               "pull"              
## [358] "fell"               "fall"               "tubs"              
## [361] "wcpr"               "nots"               "note"              
## [364] "dnrd"               "dnri"               "pll"               
## [367] "pal"                "all"                "fcr"               
## [370] "fur"                "fob"                "vor"               
## [373] "ror"                "fqr"                "fos"               
## [376] "fbr"                "mor"                "fow"               
## [379] "foe"                "fom"                "far"               
## [382] "fxr"                "aks"                "asu"               
## [385] "das"                "qas"                "cas"               
## [388] "ahs"                "pas"                "asf"               
## [391] "asx"                "asc"                "aos"               
## [394] "abs"                "nas"                "als"               
## [397] "ras"                "asm"                "asd"               
## [400] "ass"                "vas"                "asl"               
## [403] "ans"                "bas"                "ams"               
## [406] "gas"                "ask"                "las"               
## [409] "asa"                "ast"                "has"               
## [412] "was"                "pep"                "pef"               
## [415] "ner"                "pel"                "pej"               
## [418] "par"                "wer"                "pjr"               
## [421] "pmr"                "ped"                "pen"               
## [424] "peg"                "pcr"                "pes"               
## [427] "pvr"                "pfr"                "pea"               
## [430] "pet"                "her"                "wih"               
## [433] "ish"                "sbe"                "bef"               
## [436] "abe"                "bue"                "ble"               
## [439] "bee"                "bed"                "tow"               
## [442] "tio"                "ton"                "teo"               
## [445] "tro"                "tpo"                "toe"               
## [448] "tob"                "top"                "tol"               
## [451] "too"                "two"                "tco"               
## [454] "tox"                "cmc"                "cml"               
## [457] "cme"                "cmp"                "cmb"               
## [460] "cmf"                "cmg"                "cmy"               
## [463] "cmh"                "cvo"                "cms"               
## [466] "cco"                "cmv"                "acx"               
## [469] "aep"                "axp"                "act"               
## [472] "acd"                "aca"                "ach"               
## [475] "atp"                "acc"                "aip"               
## [478] "avp"                "ace"                "acv"               
## [481] "alp"                "acl"                "abp"               
## [484] "afp"                "amp"                "acs"               
## [487] "asp"                "hcp"                "wil"               
## [490] "ill"                "hcb"                "ncp"               
## [493] "ccp"                "hyp"                "hop"               
## [496] "tcp"                "ocp"                "hlp"               
## [499] "hbp"                "hcl"                "hco"               
## [502] "hep"                "hcc"                "hcv"               
## [505] "icp"                "hap"                "hip"               
## [508] "hct"                "pcp"                "por"               
## [511] "und"                "evd"                "ind"               
## [514] "gnd"                "snd"                "ent"               
## [517] "eed"                "enc"                "wnd"               
## [520] "egd"                "pnd"                "and"               
## [523] "ife"                "lie"                "woc"               
## [526] "gcc"                "gtc"                "gdc"               
## [529] "gsc"                "god"                "glc"               
## [532] "soc"                "goo"                "poc"               
## [535] "loc"                "gpc"                "cae"               
## [538] "cre"                "are"                "odf"               
## [541] "oft"                "ofa"                "ofr"               
## [544] "sof"                "wof"                "eof"               
## [547] "mof"                "ofm"                "oof"               
## [550] "iof"                "tof"                "yof"               
## [553] "off"                "ube"                "tub"               
## [556] "tue"                "cmr"                "cpb"               
## [559] "cpm"                "cpa"                "cpd"               
## [562] "cpx"                "cpf"                "fpr"               
## [565] "ccr"                "cpp"                "ctr"               
## [568] "clr"                "cpv"                "chr"               
## [571] "rpr"                "cpk"                "cpt"               
## [574] "gpr"                "cor"                "car"               
## [577] "cxr"                "ini"                "tni"               
## [580] "uni"                "dwi"                "ddi"               
## [583] "dri"                "dli"                "dti"               
## [586] "dmi"                "dnr"                "nto"               
## [589] "nco"                "bno"                "ino"               
## [592] "wno"                "nwo"                "neo"               
## [595] "npo"                "not"                "vot"               
## [598] "nom"                "nit"                "nok"               
## [601] "noz"                "npt"                "cot"               
## [604] "noa"                "nox"                "fot"               
## [607] "nod"                "nos"                "hot"               
## [610] "noc"                "pot"                "tot"               
## [613] "ngt"                "nor"                "net"               
## [616] "got"                "lot"                "non"               
## [619] "now"                "dfo"                "doa"               
## [622] "dto"                "dow"                "dob"               
## [625] "dos"                "dop"                "dog"               
## [628] "doe"                "dng"                "drr"               
## [631] "dnh"                "dur"                "dna"               
## [634] "dtr"                "dir"                "gnr"               
## [637] "dns"                "dni"                "inr"               
## [640] "fr"                 "ah"                 "ay"                
## [643] "au"                 "rs"                 "aj"                
## [646] "ks"                 "ws"                 "ad"                
## [649] "ak"                 "qs"                 "es"                
## [652] "ax"                 "av"                 "hs"                
## [655] "ai"                 "gs"                 "ag"                
## [658] "ls"                 "xs"                 "cs"                
## [661] "aa"                 "aw"                 "ar"                
## [664] "ms"                 "fs"                 "us"                
## [667] "ab"                 "ss"                 "ps"                
## [670] "am"                 "vs"                 "at"                
## [673] "an"                 "is"                 "er"                
## [676] "bn"                 "bj"                 "bd"                
## [679] "bu"                 "bk"                 "ba"                
## [682] "ae"                 "br"                 "ke"                
## [685] "bw"                 "bt"                 "ce"                
## [688] "se"                 "bg"                 "ue"                
## [691] "ee"                 "fe"                 "ie"                
## [694] "bm"                 "ge"                 "re"                
## [697] "bc"                 "bx"                 "bb"                
## [700] "bl"                 "bs"                 "pe"                
## [703] "le"                 "he"                 "we"                
## [706] "ve"                 "bp"                 "me"                
## [709] "by"                 "ti"                 "tj"                
## [712] "tu"                 "tk"                 "tz"                
## [715] "te"                 "ta"                 "tt"                
## [718] "tl"                 "tn"                 "td"                
## [721] "tg"                 "tw"                 "ts"                
## [724] "tp"                 "th"                 "tx"                
## [727] "tr"                 "tb"                 "tv"                
## [730] "tm"                 "tc"                 "cm"                
## [733] "ac"                 "ap"                 "hc"                
## [736] "hp"                 "en"                 "ed"                
## [739] "gc"                 "oc"                 "oe"                
## [742] "gf"                 "oj"                 "oi"                
## [745] "wf"                 "mf"                 "os"                
## [748] "oy"                 "kf"                 "xf"                
## [751] "oq"                 "qf"                 "yf"                
## [754] "sf"                 "ow"                 "ff"                
## [757] "ou"                 "oh"                 "cf"                
## [760] "ov"                 "hf"                 "om"                
## [763] "oa"                 "od"                 "ol"                
## [766] "oz"                 "vf"                 "ob"                
## [769] "ox"                 "rf"                 "ok"                
## [772] "pf"                 "og"                 "lf"                
## [775] "ef"                 "op"                 "af"                
## [778] "uf"                 "tf"                 "if"                
## [781] "or"                 "on"                 "cp"                
## [784] "pr"                 "cr"                 "nm"                
## [787] "nk"                 "nn"                 "nz"                
## [790] "nw"                 "ne"                 "ny"                
## [793] "nb"                 "nx"                 "nf"                
## [796] "nv"                 "np"                 "nu"                
## [799] "na"                 "nh"                 "nj"                
## [802] "nd"                 "do"                 "nl"                
## [805] "ng"                 "ns"                 "nc"                
## [808] "ni"                 "ot"                 "nt"                
## [811] "du"                 "de"                 "dh"                
## [814] "dv"                 "dg"                 "db"                
## [817] "lo"                 "dq"                 "vo"                
## [820] "da"                 "dj"                 "bo"                
## [823] "df"                 "ds"                 "oo"                
## [826] "dl"                 "dk"                 "dz"                
## [829] "dw"                 "di"                 "ao"                
## [832] "fo"                 "dd"                 "io"                
## [835] "dt"                 "go"                 "uo"                
## [838] "mo"                 "wo"                 "dx"                
## [841] "dc"                 "co"                 "ro"                
## [844] "yo"                 "dm"                 "ho"                
## [847] "eo"                 "dp"                 "po"                
## [850] "so"                 "no"                 "to"                
## [853] "dn"                 "nr"                 "dr"                
## [856] "s"                  "a"                  "e"                 
## [859] "b"                  "t"                  "f"                 
## [862] "n"                  "d"                  "o"
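
The selection rule used above (keep every vocabulary word tied for the minimum distance to a phrase) can be illustrated on a toy example. This is only a sketch: the vocabulary and phrases below are made up, and it assumes base R's adist() as the distance function, which may differ from the function used earlier in this report.

```r
# Toy vocabulary and target phrases (hypothetical, not taken from MIMIC-III)
vocab   <- c("pallative", "paliative", "palpate", "hospiceo", "hospital")
phrases <- c("palliative", "hospice")

# utils::adist() returns a matrix of Levenshtein distances:
# one row per vocabulary word, one column per phrase
distances <- adist(vocab, phrases)
dimnames(distances) <- list(vocab, phrases)

# For each phrase, keep every vocabulary word tied for the minimum distance
closest <- lapply(phrases, function(p) {
  vocab[which(distances[, p] == min(distances[, p], na.rm = TRUE))]
})

# Flatten and drop duplicates, as in the keyword dictionary above
toyKeywords <- unique(unlist(closest))
print(toyKeywords)
```

Because ties are kept and no upper bound is placed on the distance, a short phrase can pull in many unrelated short words, which is consistent with the long tail of two- and three-letter entries in the list above.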

Regex with Fuzzy Matching

Now that a dictionary of keywords has been discovered, we can run another strict regex search, this time for the fuzzy matches found through the Levenshtein distance algorithm.
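
As a rough illustration of this kind of per-keyword search, the sketch below counts how many toy notes contain each keyword. The notes, the keywords, and the helper matchWholeWord() are all hypothetical; matchWholeWord() is a stand-in built on grepl() with word boundaries and is not the strictRegex() function defined earlier in this report, whose exact matching rules may differ.

```r
# Toy notes and keywords (hypothetical; not MIMIC-III data)
toyNotes <- c("Pt is DNR/DNI per family meeting.",
              "Pallative care consulted; family wishes comfort.",
              "Full code. INR 1.2.")
toyKeywords <- c("dnr", "pallative", "inr", "family")

# Hypothetical stand-in for strictRegex(): TRUE where the keyword appears
# as a whole word (case-insensitive) in the note
matchWholeWord <- function(keyword, text) {
  grepl(paste0("\\b", keyword, "\\b"), text, ignore.case = TRUE)
}

# Logical matrix: one row per note, one column per keyword
toyHold <- sapply(toyKeywords, matchWholeWord, text = toyNotes)

# Number of notes containing each keyword, omitting keywords with no match
toyCounts <- colSums(toyHold)
print(toyCounts[toyCounts > 0])
```

Keywords are pasted into the pattern unescaped here, so this sketch is only safe for plain alphanumeric keywords such as those in the dictionary above.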

fuzzyHold <- strictRegex(keywords, notes$TEXT)

#Convert from list entries to dataframe columns
fuzzyHold <- as.data.frame(fuzzyHold)

#Each column corresponds to a keyword in the keywords vector
colnames(fuzzyHold) <- keywords

#Multiply logicals by 1 for binary numeric
fuzzyHold <- fuzzyHold*1

#Sum each column (keyword) to count the notes in which the keyword occurs
posFuzzyTable <- colSums(fuzzyHold)

#Print matches and count, omit phrases where no matches were found
posFuzzyTable[posFuzzyTable > 0]

##            inr            dni             dr            gnr            dir 
##            289            129            367             29             74 
##            dtr            dna             nr             dn            dur 
##             17             13            346            226            196 
##            dnh            drr            dng             to             no 
##              2              1              4            368            368 
##             so              o             po             dp             eo 
##            367            368            368            165            328 
##              d             ho             dm             yo             ro 
##            368            368            337            223            368 
##             co             dc             dx             wo             mo 
##            368             54             68            333            368 
##             uo             go             dt             io             dd 
##            199            318             41            368            262 
##             fo             ao            doe             di             dw 
##            367            239            145            368             43 
##             dz            dog             dk             dl            dop 
##             18             22             17            338            106 
##             oo             ds             df            dos             bo 
##            367            365             32            364            359 
##            dob             dj             da             vo             dq 
##             40             65            366            320              5 
##             lo             db            dow             dg            dto 
##            368             34            178             56              2 
##             dv            doa             dh             de             du 
##            353             12            182            367            367 
##            dfo            now           note            non             nt 
##              2            360            246            317            368 
##            lot            got            net            nor            ngt 
##             52             47             64            293            111 
##            tot             ot            pot            noc            hot 
##            365            367            271             49             34 
##            nos            nod            fot            nox            noa 
##            333             82              4             92              5 
##            cot            npt            noz            nok            nit 
##             40              5              4             11            367 
##           nots            nom            vot   resuscitated     resusitate 
##              1            100             62             18              6 
##    resussitate    rususcitate            not             ni             nc 
##              2              1            345            368            367 
##             ns             ng             nl            npo             do 
##            368            368            273            230            367 
##             nd            neo              n             nj             nh 
##            368            263            368            185             97 
##             na             nu             np             nv            nwo 
##            368            367            259            150              1 
##             nf             nx             nb             ny            wno 
##            367            107             16            260              1 
##             ne             nw            ino             nz             nn 
##            368             45            249            118            235 
##             nk             nm            bno            nco            nto 
##            238             31             90            245            184 
##   resusitation rescuscitation  resussitation   resusciation   resucitation 
##              6              3              5              2              8 
##  resuccitation resuscitations resusciatation  resuscutation            dnr 
##              2              1              1              1            155 
##            dmi            dti            dli            dri            ddi 
##            329             27             64            145            129 
##            dwi            uni            tni            ini            cxr 
##              5            358              7            344            245 
##             cr            car             pr             cp            cor 
##            364            367            368            215            262 
##            gpr            cpt            cpk            rpr            chr 
##              6             12             16             28            177 
##            cpv            clr            ctr            cpp            ccr 
##              3              1            158              3              2 
##            fpr            cpx            cpd            cpa            cpm 
##              2              1              1             82              3 
##            cpb            cmr    ventilatory    ventillator     ventilaton 
##              1              1             25              4              1 
##       breating       beathing     breathinig     brearthing       machines 
##              2              1              1              1              5 
##          tubes            tue          gtube          jtube          ttube 
##            106             23              6              4              2 
##          tubed            tub           tubs            ube          tubee 
##              7            300              1            267              1 
##           fall           fell          fully           pull           fill 
##            102             58             49             34             43 
##           dull           fuli           foll          sfull          fulll 
##             37              2            275             24              3 
##           come           mode           node           core          codes 
##             72            269             34             26             12 
##          coded           cope           bode      treatment          crest 
##             10             45              2             94             11 
##         cheest         cheast          chesp    compression    compresions 
##              1              1              1             48              1 
##    intubations      intubatin     incubation    intuabation          shock 
##              1              3              2              1            113 
##          socks         shocke        feeling       feedings        needing 
##              2             17             36             12             13 
##        seeding        beeding         feding           goal             on 
##              6              1              1            198            368 
##             or             if             tf             uf              f 
##            368            345            296            164            368 
##             af            off             op             ef             lf 
##            314            183            367            368            252 
##             og            yof             pf             ok            tof 
##            367             17             48            217            134 
##             rf             ox             ob             vf             oz 
##            208            315            348            279             38 
##             ol             od             oa            iof            oof 
##            368            367            329              5              4 
##             om             hf             ov            ofm            mof 
##            368            101            367              2              7 
##             cf            eof             oh             ou             ff 
##             15              9            221            368            341 
##             ow             sf             yf             qf             oq 
##            368            277             18              4             21 
##             xf             kf             oy             os             mf 
##             10             15             29            368            120 
##             wf            sof            ofr             oi            ofa 
##             20            311             38            333              8 
##             oj             gf            oft             oe            odf 
##             18             25            311            193              4 
##             oc           came            are           rare           case 
##            364             85            366             38             43 
##           nare           cane           cart           cure            cre 
##             16             13             10             32            336 
##          cared           card           cage          cares           bare 
##              7            352              5              3             16 
##           cape           cahe           fare           carb            cae 
##             12              1              2            109              4 
##            gpc            loc            poc            goo            soc 
##             30            248             35             85            229 
##            glc            god            gsc            gdc            gtc 
##             14              4              2              1              2 
##             gc     treatments     treatjment     treatement       treament 
##             33             16              1              5              1 
##       treatmen        confort        comforl        measure      measueres 
##             94              3              1             57              1 
##       measured      measurses          famly         familt          amily 
##             17              1              4              2            317 
##          famil          famiy          faily         famlly          fmily 
##            317              2              2              1              1 
##       meetings        seeting         meting        meetins        qualify 
##             14              2              2              1              4 
##           line           like           wife           live            lie 
##            366            274             76            353             64 
##           lift            ife            and             ed           send 
##             25            119            368            368             79 
##            pnd            egd           endo           ends             en 
##             31             46            270             35            368 
##            enc            eed            ent            snd            gnd 
##            272            328            368              7              1 
##            ind            evd           tend           iend           wend 
##            215             10            306             46              1 
##           bend            und           pend undertstanding   understading 
##             10            358            225              1              1 
##       illiness         ilness       progosis       prognois       priority 
##              1              1              1              1              7 
##       pruritis       procitis       purities      nitrities      attending 
##              7              1              1              1             69 
##      expanding      exceeding        tending       stending      intending 
##              7              2             81              2              1 
##     supportuve      suportive        advance       advamced        cancers 
##              1              1             40              1             10 
##         dancer         cancel   progressings           popr           door 
##              1             10              1              2              9 
##            por           porr           poon           pool           pour 
##            346              1              7             16              2 
##       junction      functions        fuction        funtion     junctional 
##             27              4              3              2             11 
##         states          staus        statius          stats          satus 
##             49              4              1              2              1 
##         statis     worstening     worseining       worseing     worsensing 
##              6              1              1              3              1 
##       wosening     woresening       worsenin     worsesning       worsenng 
##              2              1            115              1              1 
##        widedly      metastaic     metastatis      metasatic     metestatic 
##              1              2              3              1              1 
##      metatatic    metatstatic    metastastic    mestastatic       declined 
##              2              1              1              1             31 
##       declines         declin       declinie           fail           rail 
##              7             62              1            224             21 
##          flail          trail   proigressive    progressove     progresive 
##              5              7              1              1              1 
##        healthy        shealth           prox         proxys            pcp 
##             18              1             93              1             62 
##            hct            hip            hap            icp            hcv 
##            324             53             67             19             25 
##            hcc             hp            hep            hco            hcl 
##             12            316            261             42             47 
##            hbp           hcap           hcpo            hlp            ocp 
##              2             13              1              2              4 
##            tcp            hop             hc            hyp            ccp 
##              3             80            341            335              3 
##           hcps            ncp            hcb         giving         lining 
##              1              2              1             29             25 
##         loving        liveing         diving           well            ill 
##              3              1              1            238            358 
##           wall           till           pill          willl          wills 
##            114             91             28              2              1 
##           wiil           wild            wil           wiol           most 
##              1              3            276              2            144 
##          moist       advanced      directive        placing        playing 
##             15             33              5             17             17 
##       scanning        plating        pinning       spanning        wanning 
##              9              2              3              2              1 
##       plannied        landing       plaquing            hcp            asp 
##              1              1              1             92            186 
##             ap            acs            amp            afp             ac 
##            367             34             96             14            366 
##            abp            acl            alp            acv            ace 
##             16             55            116              7            336 
##            avp            aip            acc            atp            ach 
##              9              2            272              6            326 
##            aca            acd            act            aep            acx 
##             43             18            346              2              3 
##        durably          lower         powder         powers           ower 
##              1            150             16              1            153 
##         plower       attained        autoreg        atropne        actonel 
##              1              3             10              1              1 
##    resuscitate      intubated        intubat    downtitrate   fibrillation 
##             61            214            242              2            112 
##  defibrillator      endotrach   nasotracheal    orotracheal    mechanincal 
##             10            203              6              2              1 
##      mechnical             cm            cmv            cco            cms 
##              1            270            164            200              4 
##            cvo            cmh            cmy            cmg            cmf 
##              8            222              1              1              2 
##            cmb            cmp            cme            cml            cmc 
##              1              5              9              1              2 
##      conformed     confirmned      comfirmed       discused      discusses 
##              1              1              1              1              3 
##       dicussed      terrified          could          wound         wouldn 
##              4              1            124            124              6 
##         wiould          world          wpuld          woulf           life 
##              1              1              1              1             36 
##          likey          likes          liked           bike              t 
##              5              3              1              1            368 
##             tc             tm            tox            tco            two 
##            330            352             95            135             70 
##            too             tv             tb            tol             tr 
##            364             40             67            302            368 
##             tx             th             tp             ts             tw 
##             69            368            167            367            132 
##            top             tg            tob             td             tn 
##            246             37            191             54            285 
##             tl            toe             tt             ta             te 
##            318             49            361            367            368 
##            tpo            tro             tz            teo             tk 
##             13            365             29             58             15 
##            ton             tu             tj            tio            tow 
##            228            368              4            368             29 
##             ti             by             me             bp             ve 
##            368            276            368            363            368 
##             we             he             le             pe            bed 
##            353            368            368            367            126 
##             bs             bl             bb             bx             bc 
##            367            367            110            106            328 
##             re             ge            bee              b             bm 
##            368            368            184            368             51 
##              e             ie             fe             ee             ue 
##            368            368            366            367            348 
##             bg            ble             se             ce             bt 
##            334            356            368            367            220 
##             bw            bue            abe             ke             br 
##             11              9            107            360            362 
##             ae             ba             bk             bu             bd 
##             85            368             11            355            355 
##             bj             bn            bef            sbe         washes 
##             25            122             62              8              3 
##         wished        whishes          wihes         dishes        remains 
##              9              1              1              1             70 
##         regain        remaind        remaine         retain           with 
##             22             38             55             15            364 
##           fish           wash           wise            ish           dish 
##             11             24             86            179              8 
##          swish            wih           wich      retaining       remining 
##              6              7              4              5              1 
##      regaining            her            pet             er            pea 
##              2            367             58            368            338 
##            pfr           perl            pvr            pes            pcr 
##              4             86              3             21             13 
##           peri            peg           perc           perf            pen 
##            278             25            128            168            340 
##           pver            ped            pmr            wer            par 
##              1            200              6            234            315 
##           sper            pej            pel            ner           perm 
##             19              1             75            311             23 
##            pef            pep             is             an              a 
##             18             77            367            368            368 
##            was            has             at             vs             am 
##            307            260            368            190            368 
##             ps              s            ast             ss            asa 
##            307            368            366            367            215 
##             ab             us             fs            las            ask 
##            367            368             92            366             51 
##            gas             ms             ar             aw             aa 
##            215            292            367            223             60 
##             cs             xs             ls             ag             gs 
##            364             36            349            368            275 
##             ai             hs             av             ax            ams 
##            368            163            330            365             59 
##            bas             es            ans            asl            vas 
##            244            368            339             12            315 
##             qs            ass            asd            asm            ras 
##             13            367             44             52            194 
##             ak             ad            als            nas            abs 
##            308            368            282            152            367 
##            aos             ws            asc             ks             aj 
##              4            364            301            220             27 
##             rs             au             ay             ah            asx 
##            366            358            343             68              1 
##            asf            pas            ahs            cas            qas 
##              3            281              1            108              3 
##            das            asu            aks            fxr            far 
##             11             69             17              2            110 
##           four           form            fom             fr            foe 
##             26            164              8            343              1 
##            fow            mor            fbr           fort            fos 
##              2            331              6            130             25 
##            ror           fore            vor            fob           fior 
##             20             82             31              2              4 
##           mfor            fur            fcr    discussions      discusion 
##            112            220              1             17              1 
##    discussionw     discussion         palmar        calcarb        palpate 
##              1             77              6              1             17 
##         pallor         palate       palpbale       ballpark       selfcare 
##             30              8              1              1              1 
##          palce        daycare         paular      palliaive        pancake 
##              6              2              1              1              1 
##       pericare      paliative      pallative           pale            all 
##              1              3              3             35            368 
##           palp           call           gall            pal           tall 
##            102            258             66            152             36 
##           paln           ball            pll           pala      medicines 
##              1             30              2             11              8

#Flag each note that contains at least one fuzzy-matched concept
incFuzzy <- logical(nrow(fuzzyHold))
for (i in seq_len(nrow(fuzzyHold))){
  #TRUE if any concept column for note i equals 1
  incFuzzy[i] <- any(fuzzyHold[i,] == 1)
}
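
The row-wise check above can also be written without an explicit loop. The following is a minimal sketch on a toy matrix; `toyHold` is a hypothetical stand-in for `fuzzyHold`, assumed here to hold 0/1 indicators with one row per note, one column per concept, and no missing values.

```r
# Hypothetical stand-in for fuzzyHold: 3 notes x 3 concepts, 1 = concept found
toyHold <- matrix(c(0, 1, 0,
                    0, 0, 0,
                    1, 1, 0),
                  nrow = 3, byrow = TRUE)

# Loop form, mirroring the approach used in this report
incLoop <- logical(nrow(toyHold))
for (i in seq_len(nrow(toyHold))) {
  incLoop[i] <- any(toyHold[i, ] == 1)
}

# Vectorized form: a row is included when at least one entry equals 1
incVec <- rowSums(toyHold == 1) > 0

# Both forms flag the same notes
identical(incLoop, incVec)
```

If `fuzzyHold` is a data frame rather than a matrix, the vectorized form returns a named vector, so `unname()` may be needed before comparing it with the loop result. The two forms can also differ when a row contains `NA`, which is why the sketch assumes complete data.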

Note Subsetting

#Restore the original (non-lowercased) note text from its temporary holder
notes$TEXT <- txtHolder

#Remove txtHolder from the environment
rm(txtHolder)

#Subset all positive notes
fuzzyResults <- notes[incFuzzy,]
nrow(fuzzyResults)

## [1] 368

#Subset negatives
fuzzyNegatives <- notes[!incFuzzy,]
nrow(fuzzyNegatives)

## [1] 0
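
Every note falls into the positive group and none into the negative group. The match table above lists very short tokens such as "a", "e", and "t" with counts of 368, so one plausible reading (an assumption, not something this report verifies) is that short dictionary words admit common English words as fuzzy matches. The sketch below uses base R's `adist()` to show how an edit-distance threshold of 1 behaves on a two-letter word; the target word, candidate list, and threshold are illustrative choices, not the report's settings.

```r
# Illustrative only: the target word, candidates, and threshold are assumptions
target <- "of"
candidates <- c("on", "or", "if", "f", "off", "hospice")

# adist() returns a matrix of generalized Levenshtein distances
d <- as.vector(adist(target, candidates))
names(d) <- candidates

# Tokens within one edit of the target
nearTokens <- names(d)[d <= 1]
nearTokens
```

Under these assumptions, every short candidate is within one edit of "of", while the longer clinical term is not. This suggests that short phrase components may account for much of the fuzzy cohort's breadth.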

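#Label cohorts: 1 = note matched at least one fuzzy concept, 0 = no match
fuzzyResults$COHORT <- rep(1, nrow(fuzzyResults))
fuzzyNegatives$COHORT <- rep(0, nrow(fuzzyNegatives))
#Recombine positives and negatives into a single labeled data frame
fuzzyResults <- rbind(fuzzyResults, fuzzyNegatives)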

#Combine strict and fuzzy results, keeping one row per ICUSTAY_ID
allResults <- rbind(strictResults, fuzzyResults)
allResults <- allResults[!(duplicated(allResults$ICUSTAY_ID)),]
nrow(allResults)

## [1] 368
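
Because `strictResults` is bound first and `duplicated()` marks only the second and later occurrences of a value, the de-duplication above keeps the strict-regex row whenever an ICU stay appears in both sets. A minimal sketch with hypothetical toy data frames (the `SOURCE` column is added only for illustration and is not part of the report's data):

```r
# Hypothetical toy data; only ICUSTAY_ID mirrors a column used in this report
strictToy <- data.frame(ICUSTAY_ID = c(1, 2), SOURCE = "strict",
                        stringsAsFactors = FALSE)
fuzzyToy  <- data.frame(ICUSTAY_ID = c(2, 3), SOURCE = "fuzzy",
                        stringsAsFactors = FALSE)

# Bind strict rows first so they win ties
allToy <- rbind(strictToy, fuzzyToy)

# duplicated() flags later repeats, so the first row per ICUSTAY_ID is kept
allToy <- allToy[!duplicated(allToy$ICUSTAY_ID), ]
allToy$SOURCE
```

In this toy case, stay 2 appears in both inputs and survives only as a strict row. Reversing the `rbind()` order would instead favor the fuzzy row.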

write.csv(fuzzyResults, file = "fuzzy_regex_results05Nov17.csv", row.names = FALSE)

write.csv(allResults, file = "all_EOL_Note_Results05Nov17.csv", row.names = FALSE)

End of Life Goals of Care Physician Note Exploration

Edward T. Moseley

November 5, 2017
