Background

Natural language processing (NLP) is a branch of artificial intelligence that enables computers to work with human language. Topic modeling, the NLP task we focus on here, is a statistical method for discovering the topics that occur in a collection of documents. For example, words such as treatment, bed, and service would appear more frequently in documents about medical services.

Here we cover unsupervised topic modeling, which helps us identify the major topics in unlabeled text. Latent Dirichlet allocation (LDA) is one of the most commonly used methods for this. LDA treats each document as a mixture of topics and each topic as a probability distribution over words, so the words in a document determine which topics it is assigned to.
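
The arithmetic behind this can be sketched in base R. Under LDA, the probability of seeing a word in a document is the topic-weighted sum of that word's probability in each topic. The two topics, three words, and every number below are made-up values for illustration, not estimates from the review data.

```r
# Hypothetical per-topic word probabilities (each row sums to 1)
beta <- rbind(
  topic1 = c(treatment = 0.6, bed = 0.3, parking = 0.1),
  topic2 = c(treatment = 0.1, bed = 0.1, parking = 0.8)
)

# Hypothetical topic proportions for a single document (sum to 1)
theta <- c(topic1 = 0.75, topic2 = 0.25)

# Probability of each word in that document: weighted sum over topics
word_probs <- drop(theta %*% beta)
print(word_probs)
```

Fitting LDA runs this logic in reverse: it starts from the observed word counts and estimates the topic proportions and per-topic word probabilities that best explain them.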

We will also preprocess the text in some detail to get better results.

Data

We collected a sample of Yale New Haven Hospital reviews from Yelp.

library(readxl)
## Warning: package 'readxl' was built under R version 3.6.2
ynhhreviews <- read_excel("ynhhreviews.xlsx")

Load Packages

library(tidyverse) # general utility & workflow functions
## Warning: package 'tidyverse' was built under R version 3.6.3
## -- Attaching packages ------------------------------------------------------------------ tidyverse 1.3.0 --
## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.4
## v tidyr   1.0.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'tibble' was built under R version 3.6.2
## Warning: package 'tidyr' was built under R version 3.6.2
## Warning: package 'readr' was built under R version 3.6.3
## Warning: package 'purrr' was built under R version 3.6.2
## Warning: package 'dplyr' was built under R version 3.6.2
## Warning: package 'stringr' was built under R version 3.6.2
## Warning: package 'forcats' was built under R version 3.6.3
## -- Conflicts --------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(tidytext) # tidy implementation of NLP methods
library(topicmodels) # for LDA topic modeling
## Warning: package 'topicmodels' was built under R version 3.6.3
library(tm) # general text mining functions, making document-term matrices
## Warning: package 'tm' was built under R version 3.6.3
## Loading required package: NLP
## 
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
## 
##     annotate
library(SnowballC) # word stemming (used by tm's stemDocument)
library(stringi) # string utilities (stri_trans_tolower)
## Warning: package 'stringi' was built under R version 3.6.2

Unsupervised data processing and visualization

# function to get & plot the most informative terms for a specified number
# of topics, using LDA
top_terms_by_topic_LDA <- function(text, # should be a column from a data frame
                                   plot = TRUE, # plot the top terms (TRUE by default)
                                   number_of_topics = 4) # number of topics (4 by default)
{    
    # create a corpus (type of object expected by tm) and document term matrix
    corpus <- Corpus(VectorSource(text)) # make a corpus object
    DTM <- DocumentTermMatrix(corpus) # get the count of words/document

    # remove any empty rows in our document term matrix (if there are any 
    # we'll get an error when we try to run our LDA)
    unique_indexes <- unique(DTM$i) # row indexes of documents that contain at least one term
    DTM <- DTM[unique_indexes,] # keep only those non-empty documents
    
    # perform LDA & get the words/topic in a tidy text format
    lda <- LDA(DTM, k = number_of_topics, control = list(seed = 1234))
    topics <- tidy(lda, matrix = "beta") # beta is the estimated probability of each term within each topic

    # get the top ten terms for each topic
    top_terms <- topics  %>% # take the topics data frame and..
      group_by(topic) %>% # treat each topic as a different group
      top_n(10, beta) %>% # get the top 10 most informative words
      ungroup() %>% # ungroup
      arrange(topic, -beta) # arrange words in descending informativeness

    # if the user asks for a plot (TRUE by default)
    if (plot) {
        # plot the top ten terms for each topic in order
        top_terms %>% # take the top terms
          mutate(term = reorder(term, beta)) %>% # sort terms by beta value
          ggplot(aes(term, beta, fill = factor(topic))) + # plot beta by topic
          geom_col(show.legend = FALSE) + # as a bar plot
          facet_wrap(~ topic, scales = "free") + # with each topic in a separate plot
          labs(x = NULL, y = "Beta") + # no x label, change y label
          coord_flip() # turn bars sideways
    }else{ 
        # if the user does not request a plot
        # return a list of sorted terms instead
        return(top_terms)
    }
}
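
The "top terms per topic" step inside the function can be illustrated without any packages. The sketch below uses a made-up table of per-topic word probabilities (the terms and beta values are hypothetical) and a helper named `top_n_per_topic`, which is introduced here for illustration only. Unlike `top_n()`, this helper does not keep ties.

```r
# Made-up per-topic word probabilities (illustrative values only)
toy_topics <- data.frame(
  topic = c(1, 1, 1, 2, 2, 2),
  term  = c("wait", "park", "nurs", "love", "nurs", "park"),
  beta  = c(0.50, 0.30, 0.20, 0.60, 0.30, 0.10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n highest-beta rows within each topic
top_n_per_topic <- function(df, n) {
  per_topic <- lapply(split(df, df$topic), function(g) {
    head(g[order(-g$beta), ], n) # sort by descending beta, keep first n
  })
  out <- do.call(rbind, per_topic)
  rownames(out) <- NULL
  out
}

toy_top <- top_n_per_topic(toy_topics, n = 2)
print(toy_top)
```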

Because we expect the reviews to be a mix of positive and negative ones, I will ask for two topics.

top_terms_by_topic_LDA(ynhhreviews$text, number_of_topics = 2)

As we can see, the topics are dominated by common, uninformative words, which is not useful. Next, we will clean these words out of the text.
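
Before uninformative words are removed, the text is normalized: punctuation and digits are stripped and everything is lower-cased. Here is a rough base-R sketch of what that does to a single string. The example sentence is made up, and the regular expressions only approximate tm's `removePunctuation` and `removeNumbers`; they are not a substitute for them.

```r
# A made-up review sentence, for illustration only
raw <- "We waited 3 hours!! The ER was LOUD, and room #12 wasn't clean."

no_punct  <- gsub("[[:punct:]]+", "", raw)       # strip punctuation
no_digits <- gsub("[[:digit:]]+", "", no_punct)  # strip digits
lowered   <- tolower(no_digits)                  # convert to lower case

# Split on runs of whitespace to get word tokens
tokens <- strsplit(trimws(lowered), "\\s+")[[1]]
print(tokens)
```

Stemming, which reduces words such as "waited" and "waiting" to a shared root, is left out of this sketch because it relies on the SnowballC package.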

# original text cleaning: drop reviews that contain standalone numbers
# (grepl is used so that nothing is dropped when no review matches)
ynhhreviews <- ynhhreviews[!grepl("\\b\\d+\\b", ynhhreviews$text), ]
# remove non-ASCII characters; this is especially important when working with Twitter text
ynhhreviews$text <- iconv(ynhhreviews$text, "latin1", "ASCII", sub = "")
# create a corpus to clean
reviewsCorpus <- Corpus(VectorSource(ynhhreviews$text))
# remove punctuation and numbers, convert to lower case, then stem the words
ynhh_corpus <- tm_map(reviewsCorpus, content_transformer(removePunctuation))
## Warning in tm_map.SimpleCorpus(reviewsCorpus,
## content_transformer(removePunctuation)): transformation drops documents
ynhh_corpus2 <- tm_map(ynhh_corpus, content_transformer(removeNumbers))
## Warning in tm_map.SimpleCorpus(ynhh_corpus, content_transformer(removeNumbers)):
## transformation drops documents
ynhh_corpus3 <- tm_map(ynhh_corpus2, content_transformer(stri_trans_tolower))
## Warning in tm_map.SimpleCorpus(ynhh_corpus2,
## content_transformer(stri_trans_tolower)): transformation drops documents
ynhh_corpus4  <-  tm_map(ynhh_corpus3,stemDocument)
## Warning in tm_map.SimpleCorpus(ynhh_corpus3, stemDocument): transformation drops
## documents
inspect(ynhh_corpus2)
## <<SimpleCorpus>>
## Metadata:  corpus specific: 1, document level (indexed): 0
## Content:  documents: 16
## 
##  [1] TRASH If you want to find a place to come so yu can leave with the same condition you came in with or to come close to DEATH this is the hospital for you Its always over crowded and instead of trying to find a solution they mask the issue for as long as they can and send yu on your way feeling no different just to clear your bed for someone else AVOID IF YOU CAN TAKE THE TRIP TO STAMFORD HOSPITAL WHICH IS OUT OF THIS NETWORK Nurses take hours to come to the room Ridiculous smh                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
##  [2] We went to the emergency room around am so my boyfriend can get stitches He was treated immediately The surgeon doing the stitches took his time and was precise He didnt say much but when he did speak it was all the right things I wish I knew his name Definitely happy with the results                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
##  [3] The valet parking here is horrendous It took over an hour to get our car There is no excuse for such incompetence                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
##  [4] Time and time again this place has proven to be subpar Nurses that cant get an IV using patients as teaching tools without consent and BS charges that never occurred Then they basically try to take your first born for payment\r\nDO NOT GO HERE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
##  [5] My family and I have spent the last week here at Yale New Haven As you can imagine no one spends a week at Yale New Haven unless it is a difficult and life changing moment While the most devastating moment of my life and my familys life took place here this week we are overwhelmed with the kindness generosity and love directed towards us from the many staff we have interacted with The Ninja Nurses are the backbone of this facility I will never forget how these ladies in Labor  Delivery treated us Tiffany Alyssa Jamie Sydney  Lori were above amazing and were an answer to prayer We are thankful for the love you directed our way and will never forget your kindness Thank You                                                                                                                                                                                                                                                                             
##  [6] If you can avoid this place I have been to the ER for myself and friends and family and it is always several hours wait even for something as serious as a spine fracture and the staff seems determined to make the experience as demeaning and traumatizing as possible It is not clean it is understaffed and it is always chaotic and loud even if there are only a few people waiting                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
##  [7] Even a one is being generous tonight They cant get their story straight on patient care Ive had a few great nurses who really cared and did everything they could and a few that were clearly just collecting their paycheck Patient services are a joke They may listen to your complaints but they do nothing to make anything better Save your breath  They told me I should leave my moms side while she cant communicate but have shown no effort to make me or her feel comfortable with that happening They transferred us down to a floor and we were greeted with visiting hours are over The chair Im trying to sleep in is a broken piece of shit and the they ran out of fucking pillows Pillows  How is that even a thing in a hospital this size  Walk your lazy asses over to the next unit and borrow a few to make your patient comfortable They may have some great doctors but patient care seems to be low on the list of priorities for hospital administration
##  [8] Horrible horrible experiences with this hospital overall                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
##  [9] Staff sit watching Netflix and talking amongst other coworker while I waiting patiently for a doctor Dont know why Yale buying up other hospitals Its really unprofessional to have to have to hear other patients information while being transferred                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
## [10] Great hospital with great doctors and nurses This hospital gets two stars for accessibility Parking is terrible You either have to pay for expensive valet parking or try to find street parking and feed the meters If you have to visit a loved one on multiple days parking gets very expensive Also there is a no smoking blue zone and I always see employees smoking within the zone Who ever came up with that idea and had the lines painted wasted their and someones money                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
## [11] Great doctors run under a disgustingly incompetent bureaucratic machine It took us over two hours to get discharged and an hour to find an available wheelchair Would not recommend but theres nowhere else to go                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
## [12] Yale has saved my life a few times I know it would not have been the same outcome in some other hospitals They have all of the equipment and talent to save you                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
## [13] Avoid the ER I think a hospital run by Yale is better in theory than in practice I imagine their staff is quite intelligent but are completely unable to communicate their knowledge with suffering patients Complete chaos in the ER Sick people litter the hallways on gurneys and health providers make a one time appearance and then dash off in distress hoping the entourage of heavily armed security will save them from having to engage with a real person Seek out other options Yale is not the only gig in town nor are they the best nor are or the cheapest The Gods must come down from Olympus and see the mess theyve made                                                                                                                                                                                                                                                                                                                                       
## [14] Have had excellent people working on my boyfriend while he has been here after his second liver surgery Dr Cha and his team have done a fantastic job and the nurses and hospital in Smilow have been outstanding Were partial to Shane on the th floor but theyre all excellent                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
## [15] I walked in with a pregnant wife and walked out with a much less pregnant wife and a baby What can I say this is a place where magic happens  Joking aside based on my very limited experience YNNH is a pretty good hospital At the maternity ward we mostly had amazing nurses and a couple of very inexperienced ones We were given some options along the way and they were well communicated Their equipment seemed very new and the facilities were well kept However if you are here for the emergency room you may have a completely different perspective Downtown New Haven is a hot bed for emergency visits so you may get served quicker if you visit a different location like Branford Good luck                                                                                                                                                                                                                                                                     
## [16] Came here for late night food poisoning The emergency service was prompt but didnt like the atmosphere And they definitely need to hire some cleaners Hope Ill never revisit this place
reviewsDTM <- DocumentTermMatrix(ynhh_corpus4)

# convert the document term matrix to a tidy data frame (one row per document-term pair)
reviewsDTM_tidy <- tidy(reviewsDTM)

# I'm going to add my own custom stop words that I don't think will be
# very informative in hospital reviews
custom_stop_words <- tibble(word = c("hospit", "room", "week", "patient", "life", "day", "bed", "someon", "hour", "befor", "time", "floor", "peopl", "becaus", "tri", "star", "alway", "els", "realli", "veri", "pillow"))
# remove stopwords
reviewsDTM_tidy_cleaned <- reviewsDTM_tidy %>% # take our tidy dtm and...
    anti_join(stop_words, by = c("term" = "word")) %>% # remove English stopwords and...
    anti_join(custom_stop_words, by = c("term" = "word")) # remove my custom stopwords

# reconstruct cleaned documents (so that each word shows up the correct number of times)
cleaned_documents <- reviewsDTM_tidy_cleaned %>%
    group_by(document) %>% 
    mutate(terms = toString(rep(term, count))) %>%
    select(document, terms) %>%
    unique()
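
# The stop-word filtering and document reconstruction steps above can be
# sketched in base R on a toy table. All documents, terms, counts, and stop
# words below are made up for illustration.

```r
# Toy tidy document-term counts (hypothetical values)
toy_counts <- data.frame(
  document = c("1", "1", "1", "2", "2"),
  term     = c("nurs", "the", "wait", "park", "hospit"),
  count    = c(2, 3, 1, 2, 1),
  stringsAsFactors = FALSE
)
toy_stop_words <- c("the", "hospit") # hypothetical stop-word list

# Same effect as anti_join(): keep rows whose term is not a stop word
kept <- toy_counts[!(toy_counts$term %in% toy_stop_words), ]

# Rebuild each document by repeating every term `count` times
rebuilt <- vapply(
  split(kept, kept$document),
  function(d) toString(rep(d$term, d$count)),
  character(1)
)
print(rebuilt)
```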

# check out what the cleaned documents look like (should just be a bunch of content words)
# in alphabetic order
head(cleaned_documents)
## # A tibble: 6 x 2
## # Groups:   document [6]
##   document terms                                                                
##   <chr>    <chr>                                                                
## 1 1        avoid, close, condit, crowd, death, feel, issu, leav, mask, network,~
## 2 2        boyfriend, definit, didnt, emerg, happi, immedi, precis, result, spe~
## 3 3        car, excus, horrend, incompet, park, valet                           
## 4 4        basic, born, charg, consent, nurs, occur, payment, proven, subpar, t~
## 5 5        abov, alyssa, amaz, answer, backbon, chang, deliveri, devast, diffic~
## 6 6        avoid, chaotic, clean, demean, determin, experi, famili, fractur, fr~
# now let's look at the new most informative terms
top_terms_by_topic_LDA(cleaned_documents$terms, number_of_topics = 2)

In topic 1, keywords such as “joke” and “horrible” point to negative reviews, while “love” in topic 2 points to positive reviews.