Background & Data:
This demonstration is part of an ongoing systematic literature review. The text data are article titles and abstracts that contain key terms of interest, downloaded from Web of Science. I show how to extract a combined frequency table of individual words and n-grams. I mostly rely on base R commands. I use the built-in stop words list from WordDistance. I also use the str_view_all command in the stringr package to highlight the matched patterns in a long text.
In text analysis, it may be necessary to create a project-specific stop words list.
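As a minimal sketch of what a project-specific list could look like, the base R snippet below extends a general stop words vector with a few extra terms and filters a tokenized sentence. The vectors `base_stopwords` and `project_stopwords` are hypothetical stand-ins; in the workflow here the base list would come from the `stopwords` object in WordDistance, whose exact structure is not assumed.

```r
# Hypothetical general-purpose list (stand-in for the WordDistance `stopwords` object).
base_stopwords <- c("the", "a", "of", "and", "in", "this", "we")

# Hypothetical project-specific additions: boilerplate words common in abstracts.
project_stopwords <- c("study", "paper", "article", "findings")

# Combine both lists, lowercase them, and drop duplicates.
all_stopwords <- unique(tolower(c(base_stopwords, project_stopwords)))

# Tokenize a sample sentence on non-letters and remove empty tokens.
text <- "This paper studies the governance of collaborative networks"
tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
tokens <- tokens[nzchar(tokens)]

# Keep only tokens that are not stop words (%in% is a base R operator).
kept <- tokens[!tokens %in% all_stopwords]
print(kept)
# "studies" "governance" "collaborative" "networks"
```

Which words belong in the project-specific list depends on the corpus; inspecting the most frequent words first is one way to decide.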
Packages Preparation
library(readr) # import .csv files
library(dbplyr) # note: %in% is a base R operator and needs no package
library(DT) # show tables in R Markdown
library(WordDistance) # a package currently available for macOS (still being tested on Windows and Linux); contains a relatively comprehensive stop words list
data("stopwords") # a list of stop words
library(writexl) # export multiple Excel (.xlsx) worksheets
library(stringr) # str_view_all highlights all matches of a pattern in a string (deprecated since stringr 1.5.0 in favor of str_view)
CGN <- read_csv("CGN.csv")
PADM <- CGN[grepl("public administration", CGN$`WoS Categories`, ignore.case = TRUE), ]
#names(CGN)
Index a list of abstracts by keywords.
PADM_text <- paste(PADM$`Article Title`, PADM$Abstract)
PADM$text <- PADM_text
ID.collab <- PADM$ID[grepl("collaborative governance", PADM$text, ignore.case = TRUE)]
ID.netgov <- PADM$ID[grepl("governance network|network(|ed) governance", PADM$text, ignore.case = TRUE)] # the previous Excel coding did not capture the spelling variation "networked governance"
ID.complex <- PADM$ID[grepl("complex(|ity) (theory|network|governance)|complex(|ity)(| adaptive) system", PADM$text, ignore.case = TRUE)] # search multiple patterns
ID.self <- PADM$ID[grepl("self(| |-)govern|self(| |-)organi", PADM$text, ignore.case = TRUE)]
ID.collabnet <- PADM$ID[grepl("collaborative network", PADM$text, ignore.case = TRUE)]
ID.collabPAPM <- PADM$ID[grepl("collaborative public (administration|management)", PADM$text, ignore.case = TRUE)]
It is important to search for as many spelling variations as possible. When searching for multiple terms or spelling variations in a string, use grouping (), alternation |, and the wildcard .* to express the search as a regular expression in R. For example, the pattern "governance network|network(|ed) governance" matches “governance network(s)”, “network governance”, and “networked governance”.
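To see how such a pattern behaves, it can be tested on a few short strings before running it on the full data. The sketch below uses made-up sample sentences (not from the Web of Science data) and also compares the `(|ed)` form with the equivalent optional-group form `(ed)?`.

```r
# Pattern from the text, plus an equivalent spelling using an optional group.
pattern     <- "governance network|network(|ed) governance"
pattern_alt <- "governance network|network(ed)? governance"

# Hypothetical sample strings covering each spelling variation and one non-match.
samples <- c(
  "Governance networks in local policy",
  "A theory of network governance",
  "Networked governance and accountability",
  "Networks and governance"
)

hits     <- grepl(pattern, samples, ignore.case = TRUE)
hits_alt <- grepl(pattern_alt, samples, ignore.case = TRUE)

print(data.frame(text = samples, match = hits))
# match column: TRUE TRUE TRUE FALSE

# Both spellings of the pattern select the same strings here.
same_result <- identical(hits, hits_alt)
print(same_result)
```

The last sample is not matched because "Networks" is followed by "and" rather than directly by "governance".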
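setdiff() shows the discrepancies between two sets of search results. setdiff(x, y) returns only the elements of x that are not in y, so it should be run in both directions. If both calls return an empty vector such as numeric(0), the two coding methods produced the same results.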
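A small illustration of this comparison, using hypothetical ID vectors in place of the real search results:

```r
# Hypothetical article IDs found by the regular expression search
# and by an earlier manual (spreadsheet) coding.
ids_regex  <- c(3, 7, 12, 18)
ids_manual <- c(3, 7, 18, 21)

# setdiff() is not symmetric, so check both directions.
only_regex  <- setdiff(ids_regex, ids_manual)   # found by the regex only
only_manual <- setdiff(ids_manual, ids_regex)   # found by manual coding only

print(only_regex)   # 12
print(only_manual)  # 21

# setequal() gives a single TRUE/FALSE answer for "same set of IDs".
print(setequal(ids_regex, ids_manual))  # FALSE
```

The IDs that appear in only one direction are the ones worth inspecting by hand, for example with a pattern highlighter such as str_view.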
Count all occurrences of a pattern, then aggregate them.
If a pattern occurs multiple times in the same text, count only once.
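The two counting rules can be sketched in base R with gregexpr() and regmatches(). The texts below are invented, and the names `raw_freq`, `adjusted_freq`, and `term_table` are illustrative only; this is one possible way to produce such tables, not necessarily the code used in the project.

```r
# Hypothetical texts standing in for the combined title + abstract strings.
texts <- c(
  "Complex systems and complexity theory; complex systems again.",
  "A complex adaptive system view of governance.",
  "No relevant term here."
)
pattern <- "complex(|ity) (theory|network|governance)|complex(|ity)(| adaptive) system"

# Extract every match in every text; the result is a list with one
# character vector per text (empty when nothing matches).
matches <- regmatches(texts, gregexpr(pattern, texts, ignore.case = TRUE))
matches <- lapply(matches, tolower)

# Helper (hypothetical name): turn a character vector into a Term/Freq data frame.
term_table <- function(x) {
  as.data.frame(table(Term = x), stringsAsFactors = FALSE)
}

# Rule 1: count all occurrences, then aggregate across texts.
raw_freq <- term_table(unlist(matches))

# Rule 2: count each matched term at most once per text.
adjusted_freq <- term_table(unlist(lapply(matches, unique)))

print(raw_freq)       # "complex system" is counted twice
print(adjusted_freq)  # "complex system" is counted once
```

Under the second rule the frequency of a term equals the number of texts in which it appears.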
network(ed) governance/governance network(s)
If a pattern occurs multiple times in the same text, count only once.
Count all occurrences of a pattern, then aggregate them.
If a pattern occurs multiple times in the same text, count only once.
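The same two rules can also be applied to a combined table of individual words and n-grams, as mentioned in the background. The following is a rough base R sketch for words and bigrams only. The sample texts, the short `stop_words` vector, and the helper names are all hypothetical, and dropping every bigram that contains a stop word is one design choice among several.

```r
# Hypothetical texts and a tiny stand-in stop words list.
texts <- c(
  "Collaborative governance and network governance in cities",
  "Network governance shapes collaborative governance outcomes"
)
stop_words <- c("and", "in", "the", "of")

# Split a string into lowercase word tokens.
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z]+")[[1]]
  tokens[nzchar(tokens)]
}

# Return single words plus bigrams, skipping anything that involves a stop word.
words_and_bigrams <- function(x) {
  tokens <- tokenize(x)
  words <- tokens[!tokens %in% stop_words]
  if (length(tokens) < 2) return(words)
  first  <- head(tokens, -1)
  second <- tail(tokens, -1)
  keep <- !(first %in% stop_words | second %in% stop_words)
  c(words, paste(first, second)[keep])
}

terms_per_text <- lapply(texts, words_and_bigrams)

# Raw: every occurrence counts. Adjusted: at most one count per text.
raw_tab <- table(unlist(terms_per_text))
adj_tab <- table(unlist(lapply(terms_per_text, unique)))

freq <- data.frame(
  Term = names(raw_tab),
  Raw = as.integer(raw_tab),
  Adjusted = as.integer(adj_tab[names(raw_tab)]),
  stringsAsFactors = FALSE
)
freq <- freq[order(-freq$Raw, freq$Term), ]
print(freq)
```

In this toy example "governance" has a raw count of 4 and an adjusted count of 2, because it appears twice in each of the two texts.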
Write Multiple Sheets to .xlsx
write_xlsx(
  list(
    "Complex_Term_Raw" = Complex_Term_Freq_Raw,
    "Complex_Term_Adjusted" = Complex_Term_Freq_Adjusted,
    "Collaborative_Term_Raw" = Collab_Term_Freq_Raw,
    "Collaborative_Term_Adjusted" = Collab_Term_Freq_Adjusted,
    "NetGov_Term_Raw" = netgov_Term_Freq_Raw,
    "NetGov_Term_Adjusted" = netgov_Term_Freq_Adjusted,
    "Self_Term_Raw" = self_Term_Freq_Raw,
    "Self_Term_Adjusted" = self_Term_Freq_Adjusted
  ),
  "Term_Freq_CGN.xlsx"
)