This report focuses on the quantity and type of patient notes containing phrases associated with end-of-life decisions and their implementation by the clinical care team.
Notes consist of all patient notes from MIMIC-III where the patient died within 30 days of the clinical encounter. These notes result from the PostgreSQL query found in "dataset_icu_deaths_2017_10_04_notes_queries.txt" and were preprocessed according to preprocessing.R; both the query and the preprocessing were performed by Daniele Ramazotti.
Contextual information was provided by Charlotta Lindvall and includes Keywords_and_definitions.docx as well as the following phrases:
Code status: “DNR,do not resuscitate,no resuscitation,DNI,DNR/DNI,CPR,ventilator,breathing machine,breathing tube,full code,full resuscitation,life-sustaining treatments,chest compressions,intubation,shocks,feeding tube”
Goals of care: “goals of care,GOC,life-sustaining treatment,comfort measures,comfort care,family meeting,hospice,quality of life,end of life,understanding of illness,understanding of prognosis,priorities,quality of life,extending life,comfort-focused care,supportive care”
Illness severity: “advanced cancer,progressing cancer,poor function,poor functional status,worsening cancer,widely metastatic,functional decline,frail,ill-appearing,poor prognosis,no treatment,end of life,progressive cancer”
Advance care planning: “health care proxy,HCP,living will,MOLST,advance directives,advance care planning,ACP,durable power of attorney”
Further, our note annotation GUI has four categories, corresponding to the phrase groups above: code status, goals of care, illness severity, and advance care planning.

To generate a subset of notes containing inclusive phrases, we will use regex according to the following strategy:

- Split the phrase strings on ',' to obtain n unique phrases
- Use grepl() to generate a logical vector for each phrase
- Capture all TRUE evaluations

This pass matches exact strings only and does not use a dictionary of the note vocabulary. To generate a larger subset of notes likely to contain inclusive phrases, we will use Levenshtein Distance: a count of the number of insertions, deletions, or substitutions required to transform one string into another.
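For example, base R's adist() implements this distance; "resusitation", a misspelling that surfaces in the note vocabulary below, is one deletion away from "resuscitation":

#Minimal example of Levenshtein Distance via adist()
adist("resuscitation", "resusitation")
##      [,1]
## [1,]    1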
Our strategy will follow:

- Split the phrase strings on ',' as well as ' ' to obtain n unique words
- Replace '\n' with ' ' in the note text
- Use adist() to find the Levenshtein Distance of all input phrases relative to each word in the note vocabulary
- Use grepl() to generate a logical vector from the resulting keyword matches
- Capture all TRUE evaluations

We will include "dataset_icu_deaths_notes_processed.txt", as well as the ADMISSIONS and PATIENTS tables from MIMIC-III, to generate a DAYS_UNTIL_DEATH variable [relative to admission date].
#Load notes
notes <- read.csv("dataset_icu_deaths_notes_processed.txt",
                  header = T,
                  stringsAsFactors = F,
                  sep = '\t')
#Load ADMISSIONS table from MIMIC for admittime/dischtime
adm <- read.csv("ADMISSIONS.csv",
                header = T, stringsAsFactors = F)
#Load PATIENTS table from MIMIC for date of death
pat <- read.csv("PATIENTS.csv",
                header = T, stringsAsFactors = F)
#Convert dates for easier manipulation
adm$ADMITTIME <- as.numeric(as.Date(adm$ADMITTIME, "%Y-%m-%d %H:%M:%S"))
adm$DISCHTIME <- as.numeric(as.Date(adm$DISCHTIME, "%Y-%m-%d %H:%M:%S"))
pat$DOD <- as.numeric(as.Date(pat$DOD, "%Y-%m-%d %H:%M:%S"))
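#Note: as.numeric(as.Date(...)) yields days since 1970-01-01,
#so the date differences computed below are in whole days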
#Drop ROW_ID variables from each table
adm$ROW_ID <- NULL
pat$ROW_ID <- NULL
#Merge adm and pat tables on SUBJECT_ID
dat <- merge(adm, pat, by = "SUBJECT_ID")
#Clean environment of admissions and patient tables
rm(pat)
rm(adm)
#Merge notes with admission/patient data on SUBJECT_ID and HADM_ID for time data
notes <- merge(notes, dat, by = c("SUBJECT_ID","HADM_ID"))
rm(dat)
#Generate DAYS_UNTIL_DEATH [from admission date] variable
notes$DAYS_UNTIL_DEATH <- notes$DOD - notes$ADMITTIME
colnames(notes)
## [1] "SUBJECT_ID" "HADM_ID" "ICUSTAY_ID"
## [4] "CATEGORY" "DESCRIPTION" "TEXT"
## [7] "ADMITTIME" "DISCHTIME" "DEATHTIME"
## [10] "ADMISSION_TYPE" "ADMISSION_LOCATION" "DISCHARGE_LOCATION"
## [13] "INSURANCE" "LANGUAGE" "RELIGION"
## [16] "MARITAL_STATUS" "ETHNICITY" "EDREGTIME"
## [19] "EDOUTTIME" "DIAGNOSIS" "HOSPITAL_EXPIRE_FLAG"
## [22] "HAS_CHARTEVENTS_DATA" "GENDER" "DOB"
## [25] "DOD" "DOD_HOSP" "DOD_SSN"
## [28] "EXPIRE_FLAG" "DAYS_UNTIL_DEATH"
#nrow(notes)
#length(unique(notes$SUBJECT_ID))
par(mai = c(1, 2, 1, 1))
barplot(table(factor(notes$CATEGORY)),
        horiz = T,
        names.arg = attr(table(factor(notes$CATEGORY)), "names"),
        main = "Note Count by Type (Entire Cohort, All Time Points)",
        las = 1)
notes <- notes[(notes$CATEGORY == "Physician"),]
cat("We have", nrow(notes), "observations after keeping only physician's notes")
## We have 7214 observations after keeping only physician's notes
length(unique(notes$SUBJECT_ID))
## [1] 368
cat("We have",length(unique(notes$SUBJECT_ID)),"unique patients in the cohort")
## We have 368 unique patients in the cohort
#notes <- notes[which((notes$DOD - notes$ADMITTIME) <= 30),]
#nrow(notes)
#Because of duplicated notes, add column to count characters
notes$CHARS <- nchar(notes$TEXT)
#Order by subject_ID and note size
notes <- notes[with(notes, order(SUBJECT_ID, -CHARS)), ]
#Remove duplicates via ICUSTAY_ID; given the descending CHARS order, this keeps the largest note per ICU stay
notes <- notes[!duplicated(notes$ICUSTAY_ID),]
cat("We have", nrow(notes), "notes after removing duplicates")
## We have 368 notes after removing duplicates
We will narrow the notes down to those within 30 days of patient expiration.
hist(notes$DAYS_UNTIL_DEATH,
breaks = 50,
main = "Note Event Frequency by Days Until Death From Admission",
xlab = "Days Until Death",
ylab = "Note Event Frequency")
summary(notes$DAYS_UNTIL_DEATH)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 2.00 7.00 10.21 16.00 83.00
Phrases will be searched for in the text according to the dictionary of terms generated earlier.
#Store text for replacement later
txtHolder <- notes$TEXT
#convert all text to lowercase
notes$TEXT <- tolower(notes$TEXT)
#Convert escaped "\n" sequences into real newlines
notes$TEXT <- gsub("\\\\n", '\n', notes$TEXT)
#Replace newlines with spaces to split on
notes$TEXT <- gsub('\n', ' ', notes$TEXT)
#Use phrases from above, convert to lower
phrases <- tolower(c("DNR,do not resuscitate,no resuscitation,DNI,DNR/DNI,CPR,ventilator,breathing machine,breathing tube,full code,full resuscitation,life-sustaining treatments,chest compressions,intubation,shocks,feeding tube,",
"goals of care,GOC,life-sustaining treatment,comfort measures,comfort care,family meeting,hospice,quality of life,end of life,understanding of illness,understanding of prognosis,priorities,quality of life,extending life,comfort-focused care,supportive care,",
"advanced cancer,progressing cancer,poor function,poor functional status,worsening cancer,widely metastatic,functional decline,frail,ill-appearing,poor prognosis,no treatment,end of life,progressive cancer,",
"health care proxy,HCP,living will,MOLST,advance directives,advance care planning,ACP,durable power of attorney,",
"dnr,dnrdni,dni,do not resuscitate,do-not-resuscitate,do not intubate,do-not-intubate,chest compressions,no defibrillation,no endotracheal intubation,no mechanical intubation,shocks,cmo,comfort measures,",
"Full code confirmed,full code d/w,full code discussed,full code verified,would like to be full code,wishes to be full code,would like to remain full code,wishes to remain full code,wish to be full code,remaining full code,",
"full code per,full code as per,",
"goals of care,goc,goals for care,goals of treatment,goals for treatment,treatment goals,family meeting,family discussion,family discussions,",
"pallcare,palliative care,pall care,pallcare,palliative medicine,",
"hospice"))
#Paste phrases together
incPhrases <- paste(phrases, sep = ',', collapse = '')
#Split strings on ',', use unique() for a union
incPhrases <- unique(strsplit(incPhrases, ',')[[1]])
#Display inclusive phrases
print(incPhrases)
## [1] "dnr" "do not resuscitate"
## [3] "no resuscitation" "dni"
## [5] "dnr/dni" "cpr"
## [7] "ventilator" "breathing machine"
## [9] "breathing tube" "full code"
## [11] "full resuscitation" "life-sustaining treatments"
## [13] "chest compressions" "intubation"
## [15] "shocks" "feeding tube"
## [17] "goals of care" "goc"
## [19] "life-sustaining treatment" "comfort measures"
## [21] "comfort care" "family meeting"
## [23] "hospice" "quality of life"
## [25] "end of life" "understanding of illness"
## [27] "understanding of prognosis" "priorities"
## [29] "extending life" "comfort-focused care"
## [31] "supportive care" "advanced cancer"
## [33] "progressing cancer" "poor function"
## [35] "poor functional status" "worsening cancer"
## [37] "widely metastatic" "functional decline"
## [39] "frail" "ill-appearing"
## [41] "poor prognosis" "no treatment"
## [43] "progressive cancer" "health care proxy"
## [45] "hcp" "living will"
## [47] "molst" "advance directives"
## [49] "advance care planning" "acp"
## [51] "durable power of attorney" "dnrdni"
## [53] "do-not-resuscitate" "do not intubate"
## [55] "do-not-intubate" "no defibrillation"
## [57] "no endotracheal intubation" "no mechanical intubation"
## [59] "cmo" "full code confirmed"
## [61] "full code d/w" "full code discussed"
## [63] "full code verified" "would like to be full code"
## [65] "wishes to be full code" "would like to remain full code"
## [67] "wishes to remain full code" "wish to be full code"
## [69] "remaining full code" "full code per"
## [71] "full code as per" "goals for care"
## [73] "goals of treatment" "goals for treatment"
## [75] "treatment goals" "family discussion"
## [77] "family discussions" "pallcare"
## [79] "palliative care" "pall care"
## [81] "palliative medicine"
#Use exclusionary phrases from above
excPhrases <- tolower("Full code per np admit note,full code per admission,full code per orders,full code per review of chart,full code per LMR,full code per chart,full code per recent,full code per last admission,full code per order set,full code per records,full code per pepl,full code per cas,full code per team,Full code as per np admit note,full code as per admission,full code as per orders,full code as per review of chart,full code as per LMR,full code as per chart,full code as per recent,full code as per last admission,full code as per order set,full code as per records,full code as per pepl,full code as per cas,full code as per team")
#Split strings on ',', use unique() for a union
excPhrases <- unique(strsplit(excPhrases, ',')[[1]])
#Display exclusionary phrases
print(excPhrases)
## [1] "full code per np admit note" "full code per admission"
## [3] "full code per orders" "full code per review of chart"
## [5] "full code per lmr" "full code per chart"
## [7] "full code per recent" "full code per last admission"
## [9] "full code per order set" "full code per records"
## [11] "full code per pepl" "full code per cas"
## [13] "full code per team" "full code as per np admit note"
## [15] "full code as per admission" "full code as per orders"
## [17] "full code as per review of chart" "full code as per lmr"
## [19] "full code as per chart" "full code as per recent"
## [21] "full code as per last admission" "full code as per order set"
## [23] "full code as per records" "full code as per pepl"
## [25] "full code as per cas" "full code as per team"
strictRegex() accepts all phrases, kwds, and all note texts, texts; it uses grepl() to find direct matches in the text and returns a list of logical vectors, one per phrase.
strictRegex <- function(kwds, texts){
  #Create a list to store results
  tmpList <- list()
  #Loop through all keywords
  for (i in 1:length(kwds)){
    #Store results as a logical vector in its respective list entry position
    tmpList[[i]] <- grepl(kwds[i], texts)
  }
  #Return list and control to environment
  return(tmpList)
}
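As an aside, grepl() treats each keyword as a regular expression; the phrase list here happens to contain no regex metacharacters, but fixed = TRUE would make the matching explicitly literal (and faster). A sketch of an equivalent vectorized form:

#Sketch: sapply() builds the note-by-phrase logical matrix directly,
#with fixed = TRUE matching each phrase as a literal string
holdMat <- sapply(incPhrases, function(k) grepl(k, notes$TEXT, fixed = TRUE))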
system.time(hold <- strictRegex(incPhrases, notes$TEXT))
## user system elapsed
## 31.78 0.00 31.84
system.time(excHold <- strictRegex(excPhrases, notes$TEXT))
## user system elapsed
## 15.32 0.00 15.31
#Convert from list entries to dataframe columns
hold <- as.data.frame(hold)
#Each column corresponds to a phrase in the phrases vector
colnames(hold) <- incPhrases
#Multiply logicals by 1 for binary numeric
hold <- hold*1
#Sum each column (phrase) to count the notes containing that phrase
posTable <- apply(hold[,1:length(colnames(hold))],2, FUN = sum)
#Print matches and count, omit phrases where no matches were found
posTable[posTable > 0]
## dnr do not resuscitate
## 155 50
## no resuscitation dni
## 2 129
## dnr/dni cpr
## 77 44
## ventilator breathing machine
## 226 1
## breathing tube full code
## 4 201
## full resuscitation chest compressions
## 1 16
## intubation shocks
## 115 20
## feeding tube goals of care
## 15 54
## goc comfort measures
## 10 30
## comfort care family meeting
## 22 88
## hospice quality of life
## 23 9
## end of life extending life
## 9 2
## comfort-focused care supportive care
## 5 28
## poor function poor functional status
## 4 3
## widely metastatic functional decline
## 10 1
## frail ill-appearing
## 17 6
## poor prognosis no treatment
## 38 5
## progressive cancer health care proxy
## 1 21
## hcp living will
## 92 6
## advance directives do not intubate
## 1 2
## cmo full code confirmed
## 54 3
## full code discussed wishes to be full code
## 1 1
## wishes to remain full code wish to be full code
## 1 1
## remaining full code full code per
## 1 3
## goals for care treatment goals
## 2 4
## family discussion family discussions
## 21 5
## palliative care pall care
## 28 1
excHold <- as.data.frame(excHold)
colnames(excHold) <- excPhrases
excHold <- excHold*1
excTable <- apply(excHold[,1:length(colnames(excHold))],2, FUN = sum)
excTable[excTable > 0]
## named numeric(0)
No exclusionary phrases are found.
Strategy: Include any patient note that contained any inclusive phrase.
#Create a vector
inc <- vector()
for (i in 1:nrow(hold)){
  #Set TRUE if the note contains any phrase associated with inclusion
  inc[length(inc)+1] <- any(hold[i,] == 1)
}
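An equivalent vectorized one-liner (a sketch; summing the binary rows flags any note with at least one match):

#Equivalent to the loop above
inc <- rowSums(hold) > 0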
#Restore the original (non-lowercased) text
notes$TEXT <- txtHolder
#Clean txtHolder from environment
rm(txtHolder)
#Subset all positive notes
results <- notes[inc,]
nrow(results)
## [1] 360
#Subset negatives
negatives <- notes[!inc,]
nrow(negatives)
## [1] 8
results$COHORT <- rep(1, each = nrow(results))
negatives$COHORT <- rep(0, each = nrow(negatives))
strictResults <- rbind(results, negatives)
write.csv(strictResults, file = "strict_regex_results05Nov17.csv", row.names = F)
phrases <- tolower(c("DNR,do not resuscitate,no resuscitation,DNI,DNR/DNI,CPR,ventilator,breathing machine,breathing tube,full code,full resuscitation,life-sustaining treatments,chest compressions,intubation,shocks,feeding tube,",
"goals of care,GOC,life-sustaining treatment,comfort measures,comfort care,family meeting,hospice,quality of life,end of life,understanding of illness,understanding of prognosis,priorities,quality of life,extending life,comfort-focused care,supportive care,",
"advanced cancer,progressing cancer,poor function,poor functional status,worsening cancer,widely metastatic,functional decline,frail,ill-appearing,poor prognosis,no treatment,end of life,progressive cancer,",
"health care proxy,HCP,living will,MOLST,advance directives,advance care planning,ACP,durable power of attorney,",
"dnr,dnrdni,dni,do not resuscitate,do-not-resuscitate,do not intubate,do-not-intubate,chest compressions,no defibrillation,no endotracheal intubation,no mechanical intubation,shocks,cmo,comfort measures,",
"Full code confirmed,full code d/w,full code discussed,full code verified,would like to be full code,wishes to be full code,would like to remain full code,wishes to remain full code,wish to be full code,remaining full code,",
"full code per,full code as per,",
"goals of care,goc,goals for care,goals of treatment,goals for treatment,treatment goals,family meeting,family discussion,family discussions,",
"pallcare,palliative care,pall care,pallcare,palliative medicine,",
"hospice"))
#Create a union of all phrase words by splitting on ',' and then on ' '
phrases <- unique(unlist(strsplit(unlist(strsplit(phrases, ',')), ' ')))
print(phrases)
## [1] "dnr" "do" "not"
## [4] "resuscitate" "no" "resuscitation"
## [7] "dni" "dnr/dni" "cpr"
## [10] "ventilator" "breathing" "machine"
## [13] "tube" "full" "code"
## [16] "life-sustaining" "treatments" "chest"
## [19] "compressions" "intubation" "shocks"
## [22] "feeding" "goals" "of"
## [25] "care" "goc" "treatment"
## [28] "comfort" "measures" "family"
## [31] "meeting" "hospice" "quality"
## [34] "life" "end" "understanding"
## [37] "illness" "prognosis" "priorities"
## [40] "extending" "comfort-focused" "supportive"
## [43] "advanced" "cancer" "progressing"
## [46] "poor" "function" "functional"
## [49] "status" "worsening" "widely"
## [52] "metastatic" "decline" "frail"
## [55] "ill-appearing" "progressive" "health"
## [58] "proxy" "hcp" "living"
## [61] "will" "molst" "advance"
## [64] "directives" "planning" "acp"
## [67] "durable" "power" "attorney"
## [70] "dnrdni" "do-not-resuscitate" "intubate"
## [73] "do-not-intubate" "defibrillation" "endotracheal"
## [76] "mechanical" "cmo" "confirmed"
## [79] "d/w" "discussed" "verified"
## [82] "would" "like" "to"
## [85] "be" "wishes" "remain"
## [88] "wish" "remaining" "per"
## [91] "as" "for" "discussion"
## [94] "discussions" "pallcare" "palliative"
## [97] "pall" "medicine"
#Store text for replacement later
txtHolder <- notes$TEXT
#convert all text to lowercase
notes$TEXT <- tolower(notes$TEXT)
#Convert escaped "\n" sequences into real newlines
notes$TEXT <- gsub("\\\\n", '\n', notes$TEXT)
#Replace newlines with spaces to split on
notes$TEXT <- gsub('\n', ' ', notes$TEXT)
#Split notes on spaces, create union, then create a dictionary of all unique words
dict <- unique(gsub("[^[:alpha:]]", '',unique(unlist(strsplit(notes$TEXT, ' ')))))
#Remove empty strings
dict <- dict[dict != ""]
cat("Our phrase library has",length(phrases),"unique words in it.")
## Our phrase library has 98 unique words in it.
cat("Our vocabulary has", length(dict), "unique words in it.")
## Our vocabulary has 36706 unique words in it.
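As a sketch of what the tokenization above does (using hypothetical note text): stripping non-alphabetic characters collapses, e.g., "dnr/dni" to "dnrdni", which is why that form appears in the phrase list.

#Hypothetical example of the dictionary tokenization
example <- "pt is dnr/dni; family meeting held"
unique(gsub("[^[:alpha:]]", '', unlist(strsplit(example, ' '))))
#Returns: "pt" "is" "dnrdni" "family" "meeting" "held"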
lDist() accepts dict, a dictionary of the entire vocabulary; phrase, a single phrase; and res, a data frame of accumulated results. lDist() computes the Levenshtein Distance for each dict/phrase pair and appends the results to the data frame as a new column.
#lDist() will compute the Levenshtein Distances for each
#phrase/word pair and insert them into a dataframe column-wise
lDist <- function(dict, phrase, res){
  tmp <- vector()
  for (i in 1:length(dict)){
    #adist applies the Levenshtein Distance algorithm; select only the first element of the resulting matrix
    tmp[i] <- adist(dict[i], phrase)[1]
  }
  res <- cbind(res, tmp)
  colnames(res)[length(colnames(res))] <- phrase
  return(res)
}
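Worth noting: adist() is vectorized over its first argument, so the inner loop can be replaced by a single call. A sketch of an equivalent, faster version:

#Sketch: one adist() call computes the whole column at once
lDistVec <- function(dict, phrase, res){
  res <- cbind(res, adist(dict, phrase)[, 1])
  colnames(res)[ncol(res)] <- phrase
  return(res)
}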
#Create an initial data frame using the dictionary and first phrase
v <- vector()
for (i in 1:length(dict)){
  v[length(v)+1] <- adist(phrases[1], dict[i])[1]
}
res <- data.frame(dict,v)
colnames(res) <- c("Dict", phrases[1])
#For each phrase/word pair, calculate Levenshtein Distance
for (phrase in phrases[2:length(phrases)]){
  res <- lDist(dict, phrase, res)
}
#Replace exact matches (distance 0) with NA so the minima below capture near-misses rather than exact hits
res[res == 0] <- NA
head(res)
## Dict dnr do not resuscitate no resuscitation dni dnr/dni cpr
## 1 chief 5 5 5 10 5 11 4 7 4
## 2 complaint 8 8 7 9 8 11 8 8 7
## 3 altered 6 7 6 10 7 12 7 7 6
## 4 mental 5 6 5 8 5 10 5 7 6
## 5 status 6 6 5 8 6 9 6 7 6
## 6 fever 4 5 5 10 5 12 5 7 4
## ventilator breathing machine tube full code life-sustaining treatments
## 1 9 7 4 4 5 3 14 9
## 2 8 7 6 9 8 7 12 8
## 3 9 8 6 5 7 6 14 8
## 4 6 7 6 6 5 6 12 7
## 5 7 7 7 5 6 6 12 7
## 6 8 8 7 4 4 4 13 8
## chest compressions intubation shocks feeding goals of care goc treatment
## 1 3 10 9 5 6 5 4 3 5 8
## 2 7 6 8 9 7 7 8 7 8 7
## 3 6 10 9 7 7 7 7 5 7 8
## 4 5 10 8 6 6 5 6 6 6 7
## 5 6 10 7 4 7 4 6 5 6 7
## 6 5 11 10 6 5 5 5 4 5 7
## comfort measures family meeting hospice quality life end understanding
## 1 6 7 5 6 6 6 4 5 12
## 2 5 9 7 6 6 7 7 8 10
## 3 7 6 7 7 7 7 5 5 10
## 4 7 6 6 4 7 7 6 4 10
## 5 7 5 6 6 7 6 6 6 10
## 6 6 6 5 6 7 7 4 4 11
## illness prognosis priorities extending comfort-focused supportive
## 1 6 9 8 8 13 9
## 2 8 8 9 7 12 8
## 3 6 9 8 7 12 9
## 4 6 8 9 7 13 9
## 5 6 8 8 8 12 8
## 6 6 9 9 8 13 9
## advanced cancer progressing poor function functional status worsening
## 1 7 4 10 5 6 8 6 8
## 2 9 8 8 8 8 9 8 6
## 3 5 5 9 6 8 9 6 8
## 4 7 5 10 6 6 6 5 7
## 5 7 6 10 6 7 9 NA 8
## 6 6 4 10 4 7 9 6 8
## widely metastatic decline frail ill-appearing progressive health proxy
## 1 5 10 5 5 12 10 6 5
## 2 8 9 6 7 9 9 8 8
## 3 6 8 7 7 10 9 6 7
## 4 6 6 6 5 12 10 5 6
## 5 6 6 7 5 12 10 5 6
## 6 5 9 6 4 11 9 5 5
## hcp living will molst advance directives planning acp durable power
## 1 4 5 4 5 7 7 7 5 7 4
## 2 8 7 8 6 8 9 6 8 8 8
## 3 7 6 7 7 6 8 7 6 7 5
## 4 6 6 5 5 7 8 7 6 6 6
## 5 6 6 6 6 6 8 7 5 6 6
## 6 5 5 5 5 6 7 8 5 7 3
## attorney dnrdni do-not-resuscitate intubate do-not-intubate
## 1 7 6 17 8 14
## 2 9 8 15 8 11
## 3 4 7 15 7 14
## 4 8 6 15 6 12
## 5 7 6 15 6 12
## 6 7 6 17 8 15
## defibrillation endotracheal mechanical cmo confirmed d/w discussed
## 1 13 10 7 4 6 5 7
## 2 11 11 9 7 7 9 9
## 3 13 10 10 7 6 7 7
## 4 12 8 5 6 8 6 9
## 5 12 10 9 6 9 6 7
## 6 12 11 9 5 7 5 8
## verified would like to be wishes remain wish remaining per as for
## 1 6 5 4 5 4 5 6 4 8 4 5 5
## 2 8 7 7 8 9 9 5 8 6 8 8 8
## 3 6 6 5 6 6 6 7 7 9 5 6 6
## 4 7 6 6 5 5 6 5 6 7 5 5 6
## 5 8 6 6 5 6 5 6 6 8 6 4 6
## 6 6 5 4 5 4 5 5 5 8 3 5 3
## discussion discussions pallcare palliative pall medicine
## 1 8 9 7 9 5 7
## 2 9 9 8 8 7 7
## 3 10 11 5 7 6 8
## 4 10 11 7 9 5 6
## 5 8 9 8 8 5 8
## 6 10 11 7 9 5 7
Now that we have the results of our Levenshtein Distance search, we will create a new dictionary of keywords: the words from our patient note vocabulary with minimal distance to each phrase.
#Create a data frame for holding results
tmpFrame <- data.frame()
#Seed it with the rows achieving the minimum distance for the first phrase
tmpFrame <- res[which(res[[colnames(res)[2]]] == min(res[[colnames(res)[2]]], na.rm = TRUE)),]
#Populate the data frame with all other Levenshtein Distance minima
for (phrase in colnames(res)[3:ncol(res)]){
  tmpFrame <- rbind(tmpFrame,res[which(res[[phrase]] == min(res[[phrase]], na.rm = TRUE)),])
}
#Convert results from factor to character
tmpFrame$Dict <- as.character(tmpFrame$Dict)
#Remove duplicates
keywords <- unique(tmpFrame$Dict)
print(rev(keywords[order(nchar(keywords))]))
## [1] "dnrdonotreintubate" "postresuscitative" "eventsintubated"
## [4] "comfortfocused" "undertstanding" "lifesustaining"
## [7] "resusciatation" "resuscitations" "rescuscitation"
## [10] "defibrillator" "afibrillation" "resuscutation"
## [13] "resuccitation" "resussitation" "nasotracheal"
## [16] "fibrillation" "proigressive" "illappearing"
## [19] "progressings" "understading" "resucitation"
## [22] "resusciation" "resusitation" "resuscitated"
## [25] "wpalliative" "discussionw" "discussions"
## [28] "mechanincal" "wmechanical" "orotracheal"
## [31] "downtitrate" "resuscitate" "progressove"
## [34] "mestastatic" "metastastic" "metatstatic"
## [37] "wmetastatic" "intuabation" "intubations"
## [40] "compresions" "compression" "ventillator"
## [43] "ventilatory" "rususcitate" "resussitate"
## [46] "discussion" "confirmned" "planningdc"
## [49] "kgplanning" "progresive" "metestatic"
## [52] "metastatis" "worsesning" "woresening"
## [55] "worsensing" "wworsening" "worseining"
## [58] "worstening" "junctional" "supportuve"
## [61] "edtrending" "wtreatment" "treatement"
## [64] "treatjment" "treatments" "incubation"
## [67] "brearthing" "breathinig" "ventilaton"
## [70] "resusitate" "medicines" "pallative"
## [73] "paliative" "palliaive" "discusion"
## [76] "regaining" "retaining" "terrified"
## [79] "discusses" "comfirmed" "conformed"
## [82] "mechnical" "endotrach" "intubated"
## [85] "directive" "metatatic" "metasatic"
## [88] "metastaic" "functions" "suportive"
## [91] "cxpending" "intending" "exceeding"
## [94] "expanding" "attending" "nitrities"
## [97] "measurses" "measueres" "intubatin"
## [100] "treatment" "pericare" "selfcare"
## [103] "ballpark" "palpbale" "remining"
## [106] "dicussed" "discused" "attained"
## [109] "plaquing" "plannied" "spanning"
## [112] "scanning" "advanced" "declinie"
## [115] "declines" "declined" "worsenng"
## [118] "worsenin" "wosening" "worseing"
## [121] "junction" "advamced" "stending"
## [124] "purities" "procitis" "pruritis"
## [127] "priority" "prognois" "progosis"
## [130] "illnessi" "illiness" "hospiceo"
## [133] "meetings" "measured" "treatmen"
## [136] "treament" "feedings" "machines"
## [139] "beathing" "breating" "pancake"
## [142] "daycare" "palpate" "calcarb"
## [145] "remaine" "remaind" "remains"
## [148] "whishes" "intubat" "actonel"
## [151] "atropne" "autoreg" "durably"
## [154] "landing" "wanning" "pinning"
## [157] "plating" "playing" "placing"
## [160] "liveing" "shealth" "healthy"
## [163] "widedly" "statusi" "statuso"
## [166] "statius" "funtion" "fuction"
## [169] "cancers" "advance" "tending"
## [172] "qualify" "meetins" "seeting"
## [175] "sfamily" "wfamily" "familys"
## [178] "measure" "comforl" "confort"
## [181] "beeding" "seeding" "needing"
## [184] "feeling" "paular" "palate"
## [187] "pallor" "palmar" "retain"
## [190] "regain" "dishes" "wished"
## [193] "washes" "wiould" "wouldn"
## [196] "dnrdnr" "dnidni" "plower"
## [199] "powers" "powder" "diving"
## [202] "loving" "lining" "giving"
## [205] "proxys" "declin" "statis"
## [208] "states" "cancel" "dancer"
## [211] "ilness" "meting" "famlly"
## [214] "familt" "feding" "shocke"
## [217] "xchest" "cheast" "cheest"
## [220] "dnrdni" "palce" "swish"
## [223] "wihes" "liked" "likes"
## [226] "likey" "woulf" "wpuld"
## [229] "world" "wound" "could"
## [232] "lower" "moist" "wills"
## [235] "willl" "trail" "flail"
## [238] "satus" "stats" "staus"
## [241] "fmily" "faily" "famiy"
## [244] "famil" "amily" "famly"
## [247] "wcare" "cares" "dcare"
## [250] "carea" "cared" "socks"
## [253] "shock" "chesp" "crest"
## [256] "coded" "codes" "fulll"
## [259] "sfull" "fully" "tubee"
## [262] "tubed" "ttube" "jtube"
## [265] "gtube" "tubes" "pala"
## [268] "ball" "paln" "tall"
## [271] "gall" "call" "palp"
## [274] "pale" "mfor" "fior"
## [277] "fore" "fort" "form"
## [280] "four" "perm" "sper"
## [283] "pver" "perf" "perc"
## [286] "peri" "perl" "wich"
## [289] "dish" "wise" "wash"
## [292] "fish" "with" "bike"
## [295] "life" "wold" "cmho"
## [298] "ower" "most" "wiol"
## [301] "wild" "wiil" "pill"
## [304] "till" "wall" "well"
## [307] "hcps" "hcpo" "hcap"
## [310] "prox" "rail" "fail"
## [313] "pour" "pool" "poon"
## [316] "porr" "door" "popr"
## [319] "pend" "bend" "wend"
## [322] "iend" "tend" "ends"
## [325] "endo" "send" "lift"
## [328] "live" "wife" "like"
## [331] "line" "carb" "fare"
## [334] "cahe" "cape" "bare"
## [337] "cage" "card" "cure"
## [340] "cart" "cane" "nare"
## [343] "case" "rare" "came"
## [346] "goal" "bode" "cope"
## [349] "core" "node" "mode"
## [352] "come" "foll" "fuli"
## [355] "dull" "fill" "pull"
## [358] "fell" "fall" "tubs"
## [361] "wcpr" "nots" "note"
## [364] "dnrd" "dnri" "pll"
## [367] "pal" "all" "fcr"
## [370] "fur" "fob" "vor"
## [373] "ror" "fqr" "fos"
## [376] "fbr" "mor" "fow"
## [379] "foe" "fom" "far"
## [382] "fxr" "aks" "asu"
## [385] "das" "qas" "cas"
## [388] "ahs" "pas" "asf"
## [391] "asx" "asc" "aos"
## [394] "abs" "nas" "als"
## [397] "ras" "asm" "asd"
## [400] "ass" "vas" "asl"
## [403] "ans" "bas" "ams"
## [406] "gas" "ask" "las"
## [409] "asa" "ast" "has"
## [412] "was" "pep" "pef"
## [415] "ner" "pel" "pej"
## [418] "par" "wer" "pjr"
## [421] "pmr" "ped" "pen"
## [424] "peg" "pcr" "pes"
## [427] "pvr" "pfr" "pea"
## [430] "pet" "her" "wih"
## [433] "ish" "sbe" "bef"
## [436] "abe" "bue" "ble"
## [439] "bee" "bed" "tow"
## [442] "tio" "ton" "teo"
## [445] "tro" "tpo" "toe"
## [448] "tob" "top" "tol"
## [451] "too" "two" "tco"
## [454] "tox" "cmc" "cml"
## [457] "cme" "cmp" "cmb"
## [460] "cmf" "cmg" "cmy"
## [463] "cmh" "cvo" "cms"
## [466] "cco" "cmv" "acx"
## [469] "aep" "axp" "act"
## [472] "acd" "aca" "ach"
## [475] "atp" "acc" "aip"
## [478] "avp" "ace" "acv"
## [481] "alp" "acl" "abp"
## [484] "afp" "amp" "acs"
## [487] "asp" "hcp" "wil"
## [490] "ill" "hcb" "ncp"
## [493] "ccp" "hyp" "hop"
## [496] "tcp" "ocp" "hlp"
## [499] "hbp" "hcl" "hco"
## [502] "hep" "hcc" "hcv"
## [505] "icp" "hap" "hip"
## [508] "hct" "pcp" "por"
## [511] "und" "evd" "ind"
## [514] "gnd" "snd" "ent"
## [517] "eed" "enc" "wnd"
## [520] "egd" "pnd" "and"
## [523] "ife" "lie" "woc"
## [526] "gcc" "gtc" "gdc"
## [529] "gsc" "god" "glc"
## [532] "soc" "goo" "poc"
## [535] "loc" "gpc" "cae"
## [538] "cre" "are" "odf"
## [541] "oft" "ofa" "ofr"
## [544] "sof" "wof" "eof"
## [547] "mof" "ofm" "oof"
## [550] "iof" "tof" "yof"
## [553] "off" "ube" "tub"
## [556] "tue" "cmr" "cpb"
## [559] "cpm" "cpa" "cpd"
## [562] "cpx" "cpf" "fpr"
## [565] "ccr" "cpp" "ctr"
## [568] "clr" "cpv" "chr"
## [571] "rpr" "cpk" "cpt"
## [574] "gpr" "cor" "car"
## [577] "cxr" "ini" "tni"
## [580] "uni" "dwi" "ddi"
## [583] "dri" "dli" "dti"
## [586] "dmi" "dnr" "nto"
## [589] "nco" "bno" "ino"
## [592] "wno" "nwo" "neo"
## [595] "npo" "not" "vot"
## [598] "nom" "nit" "nok"
## [601] "noz" "npt" "cot"
## [604] "noa" "nox" "fot"
## [607] "nod" "nos" "hot"
## [610] "noc" "pot" "tot"
## [613] "ngt" "nor" "net"
## [616] "got" "lot" "non"
## [619] "now" "dfo" "doa"
## [622] "dto" "dow" "dob"
## [625] "dos" "dop" "dog"
## [628] "doe" "dng" "drr"
## [631] "dnh" "dur" "dna"
## [634] "dtr" "dir" "gnr"
## [637] "dns" "dni" "inr"
## [640] "fr" "ah" "ay"
## [643] "au" "rs" "aj"
## [646] "ks" "ws" "ad"
## [649] "ak" "qs" "es"
## [652] "ax" "av" "hs"
## [655] "ai" "gs" "ag"
## [658] "ls" "xs" "cs"
## [661] "aa" "aw" "ar"
## [664] "ms" "fs" "us"
## [667] "ab" "ss" "ps"
## [670] "am" "vs" "at"
## [673] "an" "is" "er"
## [676] "bn" "bj" "bd"
## [679] "bu" "bk" "ba"
## [682] "ae" "br" "ke"
## [685] "bw" "bt" "ce"
## [688] "se" "bg" "ue"
## [691] "ee" "fe" "ie"
## [694] "bm" "ge" "re"
## [697] "bc" "bx" "bb"
## [700] "bl" "bs" "pe"
## [703] "le" "he" "we"
## [706] "ve" "bp" "me"
## [709] "by" "ti" "tj"
## [712] "tu" "tk" "tz"
## [715] "te" "ta" "tt"
## [718] "tl" "tn" "td"
## [721] "tg" "tw" "ts"
## [724] "tp" "th" "tx"
## [727] "tr" "tb" "tv"
## [730] "tm" "tc" "cm"
## [733] "ac" "ap" "hc"
## [736] "hp" "en" "ed"
## [739] "gc" "oc" "oe"
## [742] "gf" "oj" "oi"
## [745] "wf" "mf" "os"
## [748] "oy" "kf" "xf"
## [751] "oq" "qf" "yf"
## [754] "sf" "ow" "ff"
## [757] "ou" "oh" "cf"
## [760] "ov" "hf" "om"
## [763] "oa" "od" "ol"
## [766] "oz" "vf" "ob"
## [769] "ox" "rf" "ok"
## [772] "pf" "og" "lf"
## [775] "ef" "op" "af"
## [778] "uf" "tf" "if"
## [781] "or" "on" "cp"
## [784] "pr" "cr" "nm"
## [787] "nk" "nn" "nz"
## [790] "nw" "ne" "ny"
## [793] "nb" "nx" "nf"
## [796] "nv" "np" "nu"
## [799] "na" "nh" "nj"
## [802] "nd" "do" "nl"
## [805] "ng" "ns" "nc"
## [808] "ni" "ot" "nt"
## [811] "du" "de" "dh"
## [814] "dv" "dg" "db"
## [817] "lo" "dq" "vo"
## [820] "da" "dj" "bo"
## [823] "df" "ds" "oo"
## [826] "dl" "dk" "dz"
## [829] "dw" "di" "ao"
## [832] "fo" "dd" "io"
## [835] "dt" "go" "uo"
## [838] "mo" "wo" "dx"
## [841] "dc" "co" "ro"
## [844] "yo" "dm" "ho"
## [847] "eo" "dp" "po"
## [850] "so" "no" "to"
## [853] "dn" "nr" "dr"
## [856] "s" "a" "e"
## [859] "b" "t" "f"
## [862] "n" "d" "o"
Now that a dictionary of keywords has been discovered, we can complete another strict regex pass using the fuzzy matches discovered through the Levenshtein Distance algorithm.
fuzzyHold <- strictRegex(keywords, notes$TEXT)
#Convert from list entries to dataframe columns
fuzzyHold <- as.data.frame(fuzzyHold)
#Each column corresponds to a keyword in the keywords vector
colnames(fuzzyHold) <- keywords
#Multiply logicals by 1 for binary numeric
fuzzyHold <- fuzzyHold*1
#Sum each column (keyword) to count the notes containing that keyword
posFuzzyTable <- apply(fuzzyHold[,1:length(colnames(fuzzyHold))],2, FUN = sum)
#Print matches and count, omit phrases where no matches were found
posFuzzyTable[posFuzzyTable > 0]
## inr dni dr gnr dir
## 289 129 367 29 74
## dtr dna nr dn dur
## 17 13 346 226 196
## dnh drr dng to no
## 2 1 4 368 368
## so o po dp eo
## 367 368 368 165 328
## d ho dm yo ro
## 368 368 337 223 368
## co dc dx wo mo
## 368 54 68 333 368
## uo go dt io dd
## 199 318 41 368 262
## fo ao doe di dw
## 367 239 145 368 43
## dz dog dk dl dop
## 18 22 17 338 106
## oo ds df dos bo
## 367 365 32 364 359
## dob dj da vo dq
## 40 65 366 320 5
## lo db dow dg dto
## 368 34 178 56 2
## dv doa dh de du
## 353 12 182 367 367
## dfo now note non nt
## 2 360 246 317 368
## lot got net nor ngt
## 52 47 64 293 111
## tot ot pot noc hot
## 365 367 271 49 34
## nos nod fot nox noa
## 333 82 4 92 5
## cot npt noz nok nit
## 40 5 4 11 367
## nots nom vot resuscitated resusitate
## 1 100 62 18 6
## resussitate rususcitate not ni nc
## 2 1 345 368 367
## ns ng nl npo do
## 368 368 273 230 367
## nd neo n nj nh
## 368 263 368 185 97
## na nu np nv nwo
## 368 367 259 150 1
## nf nx nb ny wno
## 367 107 16 260 1
## ne nw ino nz nn
## 368 45 249 118 235
## nk nm bno nco nto
## 238 31 90 245 184
## resusitation rescuscitation resussitation resusciation resucitation
## 6 3 5 2 8
## resuccitation resuscitations resusciatation resuscutation dnr
## 2 1 1 1 155
## dmi dti dli dri ddi
## 329 27 64 145 129
## dwi uni tni ini cxr
## 5 358 7 344 245
## cr car pr cp cor
## 364 367 368 215 262
## gpr cpt cpk rpr chr
## 6 12 16 28 177
## cpv clr ctr cpp ccr
## 3 1 158 3 2
## fpr cpx cpd cpa cpm
## 2 1 1 82 3
## cpb cmr ventilatory ventillator ventilaton
## 1 1 25 4 1
## breating beathing breathinig brearthing machines
## 2 1 1 1 5
## tubes tue gtube jtube ttube
## 106 23 6 4 2
## tubed tub tubs ube tubee
## 7 300 1 267 1
## fall fell fully pull fill
## 102 58 49 34 43
## dull fuli foll sfull fulll
## 37 2 275 24 3
## come mode node core codes
## 72 269 34 26 12
## coded cope bode treatment crest
## 10 45 2 94 11
## cheest cheast chesp compression compresions
## 1 1 1 48 1
## intubations intubatin incubation intuabation shock
## 1 3 2 1 113
## socks shocke feeling feedings needing
## 2 17 36 12 13
## seeding beeding feding goal on
## 6 1 1 198 368
## or if tf uf f
## 368 345 296 164 368
## af off op ef lf
## 314 183 367 368 252
## og yof pf ok tof
## 367 17 48 217 134
## rf ox ob vf oz
## 208 315 348 279 38
## ol od oa iof oof
## 368 367 329 5 4
## om hf ov ofm mof
## 368 101 367 2 7
## cf eof oh ou ff
## 15 9 221 368 341
## ow sf yf qf oq
## 368 277 18 4 21
## xf kf oy os mf
## 10 15 29 368 120
## wf sof ofr oi ofa
## 20 311 38 333 8
## oj gf oft oe odf
## 18 25 311 193 4
## oc came are rare case
## 364 85 366 38 43
## nare cane cart cure cre
## 16 13 10 32 336
## cared card cage cares bare
## 7 352 5 3 16
## cape cahe fare carb cae
## 12 1 2 109 4
## gpc loc poc goo soc
## 30 248 35 85 229
## glc god gsc gdc gtc
## 14 4 2 1 2
## gc treatments treatjment treatement treament
## 33 16 1 5 1
## treatmen confort comforl measure measueres
## 94 3 1 57 1
## measured measurses famly familt amily
## 17 1 4 2 317
## famil famiy faily famlly fmily
## 317 2 2 1 1
## meetings seeting meting meetins qualify
## 14 2 2 1 4
## line like wife live lie
## 366 274 76 353 64
## lift ife and ed send
## 25 119 368 368 79
## pnd egd endo ends en
## 31 46 270 35 368
## enc eed ent snd gnd
## 272 328 368 7 1
## ind evd tend iend wend
## 215 10 306 46 1
## bend und pend undertstanding understading
## 10 358 225 1 1
## illiness ilness progosis prognois priority
## 1 1 1 1 7
## pruritis procitis purities nitrities attending
## 7 1 1 1 69
## expanding exceeding tending stending intending
## 7 2 81 2 1
## supportuve suportive advance advamced cancers
## 1 1 40 1 10
## dancer cancel progressings popr door
## 1 10 1 2 9
## por porr poon pool pour
## 346 1 7 16 2
## junction functions fuction funtion junctional
## 27 4 3 2 11
## states staus statius stats satus
## 49 4 1 2 1
## statis worstening worseining worseing worsensing
## 6 1 1 3 1
## wosening woresening worsenin worsesning worsenng
## 2 1 115 1 1
## widedly metastaic metastatis metasatic metestatic
## 1 2 3 1 1
## metatatic metatstatic metastastic mestastatic declined
## 2 1 1 1 31
## declines declin declinie fail rail
## 7 62 1 224 21
## flail trail proigressive progressove progresive
## 5 7 1 1 1
## healthy shealth prox proxys pcp
## 18 1 93 1 62
## hct hip hap icp hcv
## 324 53 67 19 25
## hcc hp hep hco hcl
## 12 316 261 42 47
## hbp hcap hcpo hlp ocp
## 2 13 1 2 4
## tcp hop hc hyp ccp
## 3 80 341 335 3
## hcps ncp hcb giving lining
## 1 2 1 29 25
## loving liveing diving well ill
## 3 1 1 238 358
## wall till pill willl wills
## 114 91 28 2 1
## wiil wild wil wiol most
## 1 3 276 2 144
## moist advanced directive placing playing
## 15 33 5 17 17
## scanning plating pinning spanning wanning
## 9 2 3 2 1
## plannied landing plaquing hcp asp
## 1 1 1 92 186
## ap acs amp afp ac
## 367 34 96 14 366
## abp acl alp acv ace
## 16 55 116 7 336
## avp aip acc atp ach
## 9 2 272 6 326
## aca acd act aep acx
## 43 18 346 2 3
## durably lower powder powers ower
## 1 150 16 1 153
## plower attained autoreg atropne actonel
## 1 3 10 1 1
## resuscitate intubated intubat downtitrate fibrillation
## 61 214 242 2 112
## defibrillator endotrach nasotracheal orotracheal mechanincal
## 10 203 6 2 1
## mechnical cm cmv cco cms
## 1 270 164 200 4
## cvo cmh cmy cmg cmf
## 8 222 1 1 2
## cmb cmp cme cml cmc
## 1 5 9 1 2
## conformed confirmned comfirmed discused discusses
## 1 1 1 1 3
## dicussed terrified could wound wouldn
## 4 1 124 124 6
## wiould world wpuld woulf life
## 1 1 1 1 36
## likey likes liked bike t
## 5 3 1 1 368
## tc tm tox tco two
## 330 352 95 135 70
## too tv tb tol tr
## 364 40 67 302 368
## tx th tp ts tw
## 69 368 167 367 132
## top tg tob td tn
## 246 37 191 54 285
## tl toe tt ta te
## 318 49 361 367 368
## tpo tro tz teo tk
## 13 365 29 58 15
## ton tu tj tio tow
## 228 368 4 368 29
## ti by me bp ve
## 368 276 368 363 368
## we he le pe bed
## 353 368 368 367 126
## bs bl bb bx bc
## 367 367 110 106 328
## re ge bee b bm
## 368 368 184 368 51
## e ie fe ee ue
## 368 368 366 367 348
## bg ble se ce bt
## 334 356 368 367 220
## bw bue abe ke br
## 11 9 107 360 362
## ae ba bk bu bd
## 85 368 11 355 355
## bj bn bef sbe washes
## 25 122 62 8 3
## wished whishes wihes dishes remains
## 9 1 1 1 70
## regain remaind remaine retain with
## 22 38 55 15 364
## fish wash wise ish dish
## 11 24 86 179 8
## swish wih wich retaining remining
## 6 7 4 5 1
## regaining her pet er pea
## 2 367 58 368 338
## pfr perl pvr pes pcr
## 4 86 3 21 13
## peri peg perc perf pen
## 278 25 128 168 340
## pver ped pmr wer par
## 1 200 6 234 315
## sper pej pel ner perm
## 19 1 75 311 23
## pef pep is an a
## 18 77 367 368 368
## was has at vs am
## 307 260 368 190 368
## ps s ast ss asa
## 307 368 366 367 215
## ab us fs las ask
## 367 368 92 366 51
## gas ms ar aw aa
## 215 292 367 223 60
## cs xs ls ag gs
## 364 36 349 368 275
## ai hs av ax ams
## 368 163 330 365 59
## bas es ans asl vas
## 244 368 339 12 315
## qs ass asd asm ras
## 13 367 44 52 194
## ak ad als nas abs
## 308 368 282 152 367
## aos ws asc ks aj
## 4 364 301 220 27
## rs au ay ah asx
## 366 358 343 68 1
## asf pas ahs cas qas
## 3 281 1 108 3
## das asu aks fxr far
## 11 69 17 2 110
## four form fom fr foe
## 26 164 8 343 1
## fow mor fbr fort fos
## 2 331 6 130 25
## ror fore vor fob fior
## 20 82 31 2 4
## mfor fur fcr discussions discusion
## 112 220 1 17 1
## discussionw discussion palmar calcarb palpate
## 1 77 6 1 17
## pallor palate palpbale ballpark selfcare
## 30 8 1 1 1
## palce daycare paular palliaive pancake
## 6 2 1 1 1
## pericare paliative pallative pale all
## 1 3 3 35 368
## palp call gall pal tall
## 102 258 66 152 36
## paln ball pll pala medicines
## 1 30 2 11 8
#Create a vector
incFuzzy <- vector()
for (i in 1:nrow(fuzzyHold)){
  #Set TRUE if the note contains any keyword associated with inclusion
  incFuzzy[length(incFuzzy)+1] <- any(fuzzyHold[i,] == 1)
}
#Restore the original (non-lowercased) text
notes$TEXT <- txtHolder
#Clean txtHolder from environment
rm(txtHolder)
#Subset all positive notes
fuzzyResults <- notes[incFuzzy,]
nrow(fuzzyResults)
## [1] 368
#Subset negatives
fuzzyNegatives <- notes[!incFuzzy,]
nrow(fuzzyNegatives)
## [1] 0
fuzzyResults$COHORT <- rep(1, each = nrow(fuzzyResults))
fuzzyNegatives$COHORT <- rep(0, each = nrow(fuzzyNegatives))
fuzzyResults <- rbind(fuzzyResults, fuzzyNegatives)
allResults <- rbind(strictResults, fuzzyResults)
allResults <- allResults[!(duplicated(allResults$ICUSTAY_ID)),]
nrow(allResults)
## [1] 368
write.csv(fuzzyResults, file = "fuzzy_regex_results05Nov17.csv", row.names = F)
write.csv(allResults, file = "all_EOL_Note_Results05Nov17.csv", row.names = F)