The dataset includes 1,000 reviews of three Disneyland branches (Paris, California, and Hong Kong) posted by visitors on Tripadvisor.

The data is read from the following Kaggle dataset: https://www.kaggle.com/datasets/arushchillar/disneyland-reviews

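We will use the ‘Review_ID’ column as the document identifier and the ‘Review_Text’ column as the text to analyse. The remaining columns (‘Rating’, ‘Year_Month’, ‘Reviewer_Location’ and ‘Branch’) are kept as document-level variables.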

The code snippet

#### Loading the libraries required in this assignment

library("tidytext")
library("topicmodels")
library("quanteda")
## Warning in .recacheSubclasses(def@className, def, env): undefined subclass
## "packedMatrix" of class "mMatrix"; definition not updated
## Warning in .recacheSubclasses(def@className, def, env): undefined subclass
## "packedMatrix" of class "replValueSp"; definition not updated
## Package version: 3.2.2
## Unicode version: 13.0
## ICU version: 69.1
## Parallel computing: 8 of 8 threads used.
## See https://quanteda.io for tutorials and examples.
library("seededlda")
## Loading required package: proxyC
## 
## Attaching package: 'proxyC'
## The following object is masked from 'package:stats':
## 
##     dist
## 
## Attaching package: 'seededlda'
## The following objects are masked from 'package:topicmodels':
## 
##     terms, topics
## The following object is masked from 'package:stats':
## 
##     terms
library("topicdoc")
library("ldatuning")
library("LDAvis")
library("broom")
library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library("ggplot2")
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble  3.1.6     ✓ purrr   0.3.4
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library("readtext")
library("textrank")
library("stm")
## stm v1.3.6 successfully loaded. See ?stm for help. 
##  Papers, resources, and other materials at structuraltopicmodel.com
library("keyATM")
## keyATM 0.4.1 successfully loaded.
##  Papers, examples, resources, and other materials are at
##  https://keyatm.github.io/keyATM/
library("servr") 
library("kableExtra")
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows

#### 1. Describe the data and its statistics by importing it into a data frame

Build the required data frame from the CSV file:

disneyReviews_df <- readtext("Assignment2.csv",docid_field="Review_ID",text_field = "Review_Text")
head(disneyReviews_df, n=5) %>%
  kbl() %>%
  kable_styling()

| doc_id | text | Rating | Year_Month | Reviewer_Location | Branch |
|---|---|---|---|---|---|
| 670772142 | If you’ve ever been to Disneyland anywhere you’ll find Disneyland Hong Kong very similar in the layout when you walk into main street! It has a very familiar feel. One of the rides its a Small World is absolutely fabulous and worth doing. The day we visited was fairly hot and relatively busy but the queues moved fairly well. | 4 | 2019-4 | Australia | Disneyland_HongKong |
| 670682799 | Its been a while since d last time we visit HK Disneyland .. Yet, this time we only stay in Tomorrowland .. AKA Marvel land!Now they have Iron Man Experience n d Newly open Ant Man n d Wasp!!Ironman .. Great feature n so Exciting, especially d whole scenery of HK (HK central area to Kowloon)!Antman .. Changed by previous Buzz lightyear! More or less d same, but I’m expecting to have something most!!However, my boys like it!!Space Mountain .. Turns into Star Wars!! This 1 is Great!!!For cast members (staffs) .. Felt bit MINUS point from before!!! Just dun feel like its a Disney brand!! Seems more local like Ocean Park or even worst!!They got no SMILING face, but just wanna u to enter n attraction n leave!!Hello this is supposed to be Happiest Place on Earth brand!! But, just really Dont feel it!!Bakery in Main Street now have more attractive delicacies n Disney theme sweets .. These are Good Points!!Last, they also have Starbucks now inside the theme park!! | 4 | 2019-5 | Philippines | Disneyland_HongKong |
| 670623270 | Thanks God it wasn t too hot or too humid when I was visiting the park otherwise it would be a big issue (there is not a lot of shade).I have arrived around 10:30am and left at 6pm. Unfortunately I didn t last until evening parade, but 8.5 hours was too much for me.There is plenty to do and everyone will find something interesting for themselves to enjoy.It wasn t extremely busy and the longest time I had to queue for certain attractions was 45 minutes (which is really not that bad).Although I had an amazing time, I felt a bit underwhelmed with choice of rides and attractions. The park itself is quite small (I was really expecting something grand even the main castle which was closed by the way was quite small).The food options are good, few coffee shops (including Starbucks) and plenty of gift shops. There was no issue with toilets as they are everywhere.All together it was a great day out and I really enjoyed it. | 4 | 2019-4 | United Arab Emirates | Disneyland_HongKong |
| 670607911 | HK Disneyland is a great compact park. Unfortunately there is quite a bit of maintenance work going on at present so a number of areas are closed off (including the famous castle) If you go midweek, it is not too crowded and certainly no where near as bus as LA Disneyland. We did notice on this visit that prices for food, drinks etc have really gone through the roof so be prepared to pay top dollar for snacks (and avoid the souvenir shops if you can) Regardless, kids will love it. | 4 | 2019-4 | Australia | Disneyland_HongKong |
| 670607296 | the location is not in the city, took around 1 hour from Kowlon, my kids like disneyland so much, everything is fine. but its really crowded and hot in Hong Kong | 4 | 2019-4 | United Kingdom | Disneyland_HongKong |

#### 3.1. Corpus created from Assignment2.csv

disneyReviews_corp <- corpus(disneyReviews_df)
summary(disneyReviews_corp, n = 5)%>%
  kbl() %>%
  kable_styling()

| Text | Types | Tokens | Sentences | Rating | Year_Month | Reviewer_Location | Branch |
|---|---|---|---|---|---|---|---|
| 670772142 | 54 | 63 | 4 | 4 | 2019-4 | Australia | Disneyland_HongKong |
| 670682799 | 141 | 230 | 23 | 4 | 2019-5 | Philippines | Disneyland_HongKong |
| 670623270 | 119 | 192 | 7 | 4 | 2019-4 | United Arab Emirates | Disneyland_HongKong |
| 670607911 | 80 | 101 | 3 | 4 | 2019-4 | Australia | Disneyland_HongKong |
| 670607296 | 30 | 35 | 1 | 4 | 2019-4 | United Kingdom | Disneyland_HongKong |

#### 3.2. Generate tokens

disneyReviews_toks_orig <- tokens(
  disneyReviews_corp,
  remove_punct = TRUE,
  remove_numbers = TRUE,
  remove_symbols = TRUE,
  remove_url = TRUE,
  split_hyphens = FALSE
)
disneyReviews_toks_orig
## Tokens consisting of 1,000 documents and 4 docvars.
## 670772142 :
##  [1] "If"         "you've"     "ever"       "been"       "to"        
##  [6] "Disneyland" "anywhere"   "you'll"     "find"       "Disneyland"
## [11] "Hong"       "Kong"      
## [ ... and 47 more ]
## 
## 670682799 :
##  [1] "Its"        "been"       "a"          "while"      "since"     
##  [6] "d"          "last"       "time"       "we"         "visit"     
## [11] "HK"         "Disneyland"
## [ ... and 161 more ]
## 
## 670623270 :
##  [1] "Thanks" "God"    "it"     "wasn"   "t"      "too"    "hot"    "or"    
##  [9] "too"    "humid"  "when"   "I"     
## [ ... and 158 more ]
## 
## 670607911 :
##  [1] "HK"            "Disneyland"    "is"            "a"            
##  [5] "great"         "compact"       "park"          "Unfortunately"
##  [9] "there"         "is"            "quite"         "a"            
## [ ... and 79 more ]
## 
## 670607296 :
##  [1] "the"      "location" "is"       "not"      "in"       "the"     
##  [7] "city"     "took"     "around"   "hour"     "from"     "Kowlon"  
## [ ... and 18 more ]
## 
## 670591897 :
##  [1] "Have"       "been"       "to"         "Disney"     "World"     
##  [6] "Disneyland" "Anaheim"    "and"        "Tokyo"      "Disneyland"
## [11] "but"        "I"         
## [ ... and 162 more ]
## 
## [ reached max_ndoc ... 994 more documents ]

Remove unnecessary words and regenerate tokens

# Custom stop words (duplicates removed). tokens_remove() matches
# case-insensitively by default, so "A" and "The" also remove "a" and "the".
myStopWords <- c("A", "are", "Jul", "Mon", "Apr", "but",
                 "Wed", "Aug", "i", "for", "at", "has",
                 "Tue", "doesnt", "from", "The",
                 "have", "been", "had", "than", "with",
                 "use", "who", "of", "to", "show", "and",
                 "on", "said", "were", "by", "that", "is",
                 "as", "was", "an", "it", "which", "its", "if",
                 "they", "he", "be", "us")

dis_toks <- tokens_remove(disneyReviews_toks_orig, pattern = myStopWords)
dis_toks
## Tokens consisting of 1,000 documents and 4 docvars.
## 670772142 :
##  [1] "you've"     "ever"       "Disneyland" "anywhere"   "you'll"    
##  [6] "find"       "Disneyland" "Hong"       "Kong"       "very"      
## [11] "similar"    "in"        
## [ ... and 29 more ]
## 
## 670682799 :
##  [1] "while"      "since"      "d"          "last"       "time"      
##  [6] "we"         "visit"      "HK"         "Disneyland" "Yet"       
## [11] "this"       "time"      
## [ ... and 130 more ]
## 
## 670623270 :
##  [1] "Thanks"   "God"      "wasn"     "t"        "too"      "hot"     
##  [7] "or"       "too"      "humid"    "when"     "visiting" "park"    
## [ ... and 92 more ]
## 
## 670607911 :
##  [1] "HK"            "Disneyland"    "great"         "compact"      
##  [5] "park"          "Unfortunately" "there"         "quite"        
##  [9] "bit"           "maintenance"   "work"          "going"        
## [ ... and 50 more ]
## 
## 670607296 :
##  [1] "location"   "not"        "in"         "city"       "took"      
##  [6] "around"     "hour"       "Kowlon"     "my"         "kids"      
## [11] "like"       "disneyland"
## [ ... and 10 more ]
## 
## 670591897 :
##  [1] "Disney"     "World"      "Disneyland" "Anaheim"    "Tokyo"     
##  [6] "Disneyland" "feel"       "Disneyland" "Hong"       "Kong"      
## [11] "really"     "too"       
## [ ... and 98 more ]
## 
## [ reached max_ndoc ... 994 more documents ]
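
To make the stop-word step concrete, here is a minimal base-R sketch of case-insensitive token removal. The helper `remove_stopwords_sketch()` is hypothetical and only illustrates the idea; it does exact matching, whereas `tokens_remove()` also supports glob patterns.

```r
# Hypothetical helper: drop every token that matches a stop word,
# ignoring case (this mimics the default behaviour of tokens_remove()).
remove_stopwords_sketch <- function(tokens, stop_words) {
  tokens[!(tolower(tokens) %in% tolower(stop_words))]
}

# First tokens of review 670772142, as printed above.
toks <- c("If", "you've", "ever", "been", "to", "Disneyland", "anywhere")

# "if" in the stop list removes "If" because matching ignores case.
kept <- remove_stopwords_sketch(toks, c("A", "The", "if", "been", "to"))
print(kept)
```

The surviving tokens start with "you've", "ever", "Disneyland", "anywhere", which agrees with the first tokens of the same review in the `dis_toks` output.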

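#### 3.3. Create document-feature matrix

dis_dfmat <- dfm(dis_toks, tolower = TRUE) %>%
  dfm_trim(min_termfreq = 5, min_docfreq = 10)
# Piping the dfm into kbl() coerces it through the deprecated
# as.data.frame.dfm() method, which triggers the warning below;
# convert(x, to = "data.frame") is the replacement the warning recommends.
head(dis_dfmat, n = 5) %>%
  kbl() %>%
  kable_styling()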
## Warning: 'as.data.frame.dfm' is deprecated.
## Use 'convert(x, to = "data.frame")' instead.
## See help("Deprecated")

| doc_id | ever | disneyland | anywhere | you’ll | find | hong | kong | very | … |
|---|---|---|---|---|---|---|---|---|---|
| 670772142 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | … |
| 670682799 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | … |
| 670623270 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | … |
| 670607911 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | … |
| 670607296 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | … |

[ ... remaining feature columns omitted; the matrix is wide and mostly zeros ]

Top features of DFM

topfeatures(dis_dfmat, 30)
##         we         in        you disneyland      rides       park     disney 
##       1309       1220        918        777        697        688        677 
##        day        not      there        all         so       this       time 
##        662        660        655        531        519        503        442 
##       very         my      great        one       kids       good       food 
##        421        389        348        340        312        299        298 
##        get      visit        our       hong       kong      place        can 
##        298        297        295        290        285        283        270 
##       ride          s 
##        261        244

#### 4. Keyword-in-context analysis using quanteda::kwic()

The three keywords used here are disneyland, kids, and rides.

kw_disneyland <- kwic(disneyReviews_toks_orig, pattern = "disneyland*", window = 3)
head(kw_disneyland, 5)%>%
  kbl() %>%
  kable_styling()

| docname | from | to | pre | keyword | post | pattern |
|---|---|---|---|---|---|---|
| 670772142 | 6 | 6 | ever been to | Disneyland | anywhere you’ll find | disneyland* |
| 670772142 | 10 | 10 | anywhere you’ll find | Disneyland | Hong Kong very | disneyland* |
| 670682799 | 12 | 12 | we visit HK | Disneyland | Yet this time | disneyland* |
| 670607911 | 2 | 2 | HK | Disneyland | is a great | disneyland* |
| 670607911 | 51 | 51 | bus as LA | Disneyland | We did notice | disneyland* |

kw_kids <- kwic(disneyReviews_toks_orig, pattern = "kids*", window = 3)
head(kw_kids, 5)%>%
  kbl() %>%
  kable_styling()

| docname | from | to | pre | keyword | post | pattern |
|---|---|---|---|---|---|---|
| 670607911 | 88 | 88 | you can Regardless | kids | will love it | kids* |
| 670607296 | 14 | 14 | from Kowlon my | kids | like disneyland so | kids* |
| 670435886 | 9 | 9 | with our grown | kids | and I have | kids* |
| 670435886 | 23 | 23 | It seems the | kids | never tire of | kids* |
| 670435886 | 83 | 83 | of course The | kids | will love the | kids* |

kw_rides <- kwic(disneyReviews_toks_orig, pattern = "rides*", window = 3)
head(kw_rides, 5)%>%
  kbl() %>%
  kable_styling()

| docname | from | to | pre | keyword | post | pattern |
|---|---|---|---|---|---|---|
| 670772142 | 33 | 33 | One of the | rides | its a Small | rides* |
| 670623270 | 106 | 106 | with choice of | rides | and attractions The | rides* |
| 670591897 | 32 | 32 | way too few | rides | and attractions Souvenirs | rides* |
| 670591897 | 131 | 131 | lines for the | rides | gift shops food | rides* |
| 670571027 | 35 | 35 | was rainning and | rides | were not working | rides* |
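
As a rough illustration of what a keyword-in-context lookup does, the following base-R sketch finds a keyword in one tokenised document and returns up to `window` tokens on each side. `kwic_sketch()` is a hypothetical helper, not part of quanteda, and it uses exact case-insensitive matching rather than the glob patterns (`"disneyland*"`) used above.

```r
# Hypothetical helper: keyword-in-context over a single vector of tokens.
kwic_sketch <- function(tokens, keyword, window = 3) {
  hits <- which(tolower(tokens) == tolower(keyword))
  rows <- lapply(hits, function(i) {
    # Up to `window` tokens before the hit.
    pre_idx <- tail(seq_len(i - 1), window)
    # Up to `window` tokens after the hit, without running past the end.
    post_idx <- if (i < length(tokens)) {
      (i + 1):min(i + window, length(tokens))
    } else {
      integer(0)
    }
    data.frame(
      from    = i,
      pre     = paste(tokens[pre_idx], collapse = " "),
      keyword = tokens[i],
      post    = paste(tokens[post_idx], collapse = " "),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# First 12 tokens of review 670772142, as printed in section 3.2.
toks <- c("If", "you've", "ever", "been", "to", "Disneyland", "anywhere",
          "you'll", "find", "Disneyland", "Hong", "Kong")
kwic_result <- kwic_sketch(toks, "disneyland", window = 3)
print(kwic_result)
```

The two hits fall at positions 6 and 10, the same positions `kwic()` reports for this review. The second hit has a shorter right-hand context here only because the example stops after 12 tokens.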

#### 5. Perform LDA topic modeling

Assumptions in LDA:

1. Each document is just a “bag of words”.
2. Each document has a mixture of topics; words are generated by topics.
3. Topics are uncorrelated (this is a strong assumption; why?).
4. We know beforehand how many topics we want (what if we don’t?).

#### 5.1 Number of topics set to 5, with the top 5 keywords in each topic visualized

dis_dtmat = quanteda::convert(dis_dfmat, to="topicmodels")
dis_lda5 <- LDA(dis_dtmat, k = 5, control = list(seed = 123))
dis_lda5_betas <- broom::tidy(dis_lda5)
top_terms_in_topics <- dis_lda5_betas %>%
  group_by(topic) %>%
  top_n(5, beta) %>%
  ungroup() %>%
  arrange(topic, -beta)
top_terms_in_topics %>%
  mutate(term = reorder(term, beta)) %>%
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()
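
The bag-of-words and topic-mixture assumptions listed at the start of this section can be sketched as a toy generative process in base R. Everything here (the vocabulary, the two topics, the Dirichlet parameters, and the helper `rdirichlet_one()`) is an assumption made up for illustration; it is not the model fitted above. Because the topic proportions come from a single Dirichlet draw, this setup has no way to express that two topics tend to co-occur, which is one way to read the "uncorrelated topics" assumption.

```r
set.seed(123)

# Hypothetical helper: one Dirichlet draw via normalised gamma variates.
rdirichlet_one <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

vocab <- c("rides", "queue", "food", "price", "kids", "castle")  # toy vocabulary
k <- 2                                                            # toy number of topics

# Each topic is a probability distribution over the vocabulary (one row per topic).
topic_word <- t(replicate(k, rdirichlet_one(rep(0.5, length(vocab)))))
colnames(topic_word) <- vocab

# Each document has its own mixture of topics.
doc_topic <- rdirichlet_one(rep(0.5, k))

# Generate a 20-word document: pick a topic for each word, then a word from that topic.
n_words <- 20
z <- sample(k, n_words, replace = TRUE, prob = doc_topic)
words <- vapply(z, function(topic) sample(vocab, 1, prob = topic_word[topic, ]),
                character(1))

# Bag of words: only the counts survive, word order is discarded.
bag <- table(factor(words, levels = vocab))
print(round(doc_topic, 3))
print(bag)
```

Fitting LDA runs this process in reverse: given only the word counts, it estimates the topic-word distributions and the per-document topic mixtures.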

#### 5.2 Find the best number of topics based on perplexity

Perplexity measures how well a trained topic model predicts new data. In LDA topic modeling of text documents, perplexity is a decreasing function of the likelihood of new documents, so a lower perplexity score indicates a better-fitting topic model.
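
As a rough illustration of that definition, perplexity can be written as the exponential of the negative average log-likelihood per held-out token. The sketch below uses made-up per-token probabilities and a hypothetical helper `perplexity_sketch()`; it is not how `topicmodels::perplexity()` is implemented internally, only the same quantity on toy numbers.

```r
# Hypothetical helper: perplexity from per-token log-likelihoods.
perplexity_sketch <- function(log_lik) {
  exp(-sum(log_lik) / length(log_lik))
}

# Made-up probabilities that some model assigns to five held-out tokens.
log_lik_per_token <- log(c(0.010, 0.002, 0.005, 0.020, 0.001))
print(perplexity_sketch(log_lik_per_token))

# Sanity check: a model that spreads probability uniformly over V words
# has a perplexity equal to V.
V <- 500
uniform_perplexity <- perplexity_sketch(rep(log(1 / V), 10))
print(uniform_perplexity)

# Higher likelihood of the held-out tokens gives lower perplexity.
better <- perplexity_sketch(log(c(0.1, 0.1)))
worse  <- perplexity_sketch(log(c(0.01, 0.01)))
```

Under this reading, a perplexity of a few hundred means the model is, on average, about as uncertain as a uniform choice among a few hundred words for each held-out token.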

# Build a topicmodels-compatible document-term matrix from a corpus,
# applying the same preprocessing to the training and test halves.
build_dis_dtmat <- function(corp) {
  corp %>%
    tokens(remove_punct = TRUE, remove_numbers = TRUE,
           remove_symbols = TRUE, remove_url = TRUE) %>%
    dfm(tolower = TRUE) %>%
    dfm_remove(myStopWords) %>%
    dfm_trim(min_termfreq = 5, min_docfreq = 10) %>%
    quanteda::convert(to = "topicmodels")
}

train_dis_dtmat <- build_dis_dtmat(disneyReviews_corp[1:500])     # first 500 reviews
test_dis_dtmat  <- build_dis_dtmat(disneyReviews_corp[501:1000])  # remaining 500 reviews

train_dis_lda5 <- LDA(train_dis_dtmat, k = 5, control = list(seed = 123))
perplexity(train_dis_lda5, test_dis_dtmat)
## [1] 263.2279

#### 5.3 Using ldatuning to find the best number of topics based on the CaoJuan2009, Arun2010, and Deveaud2014 measures

n_topics_vec = 2:5 # try different number of topics: 2, 3, 4, 5
lda_ldatuning_result <- FindTopicsNumber(
  dis_dtmat, topics = n_topics_vec,
  metrics = c("CaoJuan2009", "Arun2010", "Deveaud2014"),
  method = "VEM", control = list(seed = 123), mc.cores = 4L, verbose = TRUE
)
## fit models... done.
## calculate metrics:
##   CaoJuan2009... done.
##   Arun2010... done.
##   Deveaud2014... done.
FindTopicsNumber_plot(lda_ldatuning_result)
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

Observation: according to these measures, 5 is the best number of topics among the candidate values tried (2 to 5).
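
One way to read the tuning output without the plot is to pick the best candidate per metric: CaoJuan2009 and Arun2010 are minimised, while Deveaud2014 is maximised. The sketch below assumes a data frame shaped like the `FindTopicsNumber()` result (a `topics` column plus one column per metric). The metric values are hypothetical placeholders, not the values from the run above.

```r
# Hypothetical metric values for illustration only; they are NOT the
# values produced by the FindTopicsNumber() call above.
tuning <- data.frame(
  topics      = 2:5,
  CaoJuan2009 = c(0.30, 0.24, 0.19, 0.15),
  Arun2010    = c(410, 395, 382, 371),
  Deveaud2014 = c(0.80, 0.95, 1.05, 1.12)
)

# CaoJuan2009 and Arun2010: lower is better. Deveaud2014: higher is better.
best_k <- c(
  CaoJuan2009 = tuning$topics[which.min(tuning$CaoJuan2009)],
  Arun2010    = tuning$topics[which.min(tuning$Arun2010)],
  Deveaud2014 = tuning$topics[which.max(tuning$Deveaud2014)]
)
print(best_k)

# If the metrics disagree, fall back to the candidate with the fewest topics.
chosen_k <- min(best_k)
print(chosen_k)
```

With the real result, `tuning` would be replaced by `lda_ldatuning_result`; the fallback to the smallest candidate mirrors the tie-breaking rule stated in 5.4.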

#### 5.4 Use the best number of topics (if the results are inconsistent, pick the one with fewer topics) and fit an LDA model

dep_lda5 <- LDA(dis_dtmat, k = 5, control = list(seed = 123))
topicmodels::terms(dep_lda5, 10)
##       Topic 1  Topic 2      Topic 3      Topic 4      Topic 5     
##  [1,] "we"     "rides"      "in"         "in"         "rides"     
##  [2,] "day"    "in"         "you"        "disney"     "you"       
##  [3,] "so"     "you"        "day"        "not"        "disneyland"
##  [4,] "time"   "park"       "not"        "we"         "great"     
##  [5,] "this"   "there"      "all"        "disneyland" "there"     
##  [6,] "just"   "disneyland" "disneyland" "there"      "we"        
##  [7,] "disney" "this"       "place"      "park"       "very"      
##  [8,] "my"     "very"       "we"         "so"         "so"        
##  [9,] "fun"    "great"      "park"       "kids"       "park"      
## [10,] "all"    "not"        "disney"     "my"         "this"

#### 5.5 Show topic-specific diagnostics from the topicdoc package for the best model

topicdoc_result = topic_diagnostics(dep_lda5, dis_dtmat)
topicdoc_result
##   topic_num topic_size mean_token_length dist_from_corpus tf_df_dist
## 1         1   166.8771               3.3        0.2635935   13.32312
## 2         2   163.6613               4.5        0.2242519   14.17444
## 3         3   169.1313               4.1        0.2372721   17.41762
## 4         4   168.2446               4.0        0.2320088   16.50516
## 5         5   162.0857               4.4        0.2402626   15.57001
##   doc_prominence topic_coherence topic_exclusivity
## 1            434       -72.57032          8.726018
## 2            508       -57.87248          7.981235
## 3            513       -59.35122          8.252114
## 4            496       -62.79265          8.095828
## 5            476       -59.42356          8.682567
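
To read these diagnostics, one option is to rank the topics by coherence and exclusivity, where higher (less negative) coherence and higher exclusivity are generally taken as better. The sketch below copies two columns from the output above into a plain data frame; the choice of these two columns and the ranking approach are assumptions for illustration.

```r
# Values copied from the topic_diagnostics() output above.
diag_df <- data.frame(
  topic_num         = 1:5,
  topic_coherence   = c(-72.57032, -57.87248, -59.35122, -62.79265, -59.42356),
  topic_exclusivity = c(8.726018, 7.981235, 8.252114, 8.095828, 8.682567)
)

# Sort topics from most to least coherent (coherence closest to zero first).
by_coherence <- diag_df[order(-diag_df$topic_coherence), ]
print(by_coherence)

most_coherent  <- by_coherence$topic_num[1]
most_exclusive <- diag_df$topic_num[which.max(diag_df$topic_exclusivity)]
print(c(most_coherent = most_coherent, most_exclusive = most_exclusive))
```

On these numbers topic 2 is the most coherent and topic 1 the least, while topic 1 has the highest exclusivity, so the two criteria do not point to the same topic.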

#### 6. Fit a Structural Topic Model (STM)

#### 6.1 Include the document-level variable(s) to fit an STM that has 5 topics

# Keep only the columns needed for the STM.
dep_v2_df <- disneyReviews_df %>%
  select(doc_id, Rating, Year_Month, Reviewer_Location, Branch, text)

dep_v2_corp = corpus(
  dep_v2_df,
  docid_field = "doc_id",
  text_field = "text")

vars <- docvars(dep_v2_corp)
head(vars)
##   Rating Year_Month    Reviewer_Location              Branch
## 1      4     2019-4            Australia Disneyland_HongKong
## 2      4     2019-5          Philippines Disneyland_HongKong
## 3      4     2019-4 United Arab Emirates Disneyland_HongKong
## 4      4     2019-4            Australia Disneyland_HongKong
## 5      4     2019-4       United Kingdom Disneyland_HongKong
## 6      3     2019-4            Singapore Disneyland_HongKong
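# Tokenize, dropping punctuation, numbers, symbols, URLs, custom stop words, and one-character tokens
dep_v2_toks <- tokens(dep_v2_corp, remove_punct = TRUE, remove_numbers = TRUE,
                      remove_symbols = TRUE, remove_url = TRUE) %>%
  tokens_remove(pattern = myStopWords) %>%
  tokens_keep(min_nchar = 2)
# Build the document-feature matrix and trim rare terms
dep_v2_dfmat <- dfm(dep_v2_toks, tolower = TRUE) %>%
  dfm_trim(min_termfreq = 5, min_docfreq = 10)
# Convert to the stm input format, then prepare documents, vocabulary, and metadata
stm_dep_v2_dfmat <- quanteda::convert(dep_v2_dfmat, to = "stm")
out <- prepDocuments(
  stm_dep_v2_dfmat$documents, stm_dep_v2_dfmat$vocab, stm_dep_v2_dfmat$meta)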

6.2.1 Fit STM model

dep_tmob_stm <- stm(
  out$documents, out$vocab, K=5,
  prevalence = ~s(Rating),
  data=out$meta,
  init.type= "Spectral",
  max.em.its=75,
  seed=123)
## Beginning Spectral Initialization 
##   Calculating the gram matrix...
##   Finding anchor words...
##      .....
##   Recovering initialization...
##      ........
## Initialization complete.
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 1 (approx. per word bound = -5.956) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 2 (approx. per word bound = -5.909, relative change = 7.839e-03) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 3 (approx. per word bound = -5.891, relative change = 3.093e-03) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 4 (approx. per word bound = -5.883, relative change = 1.377e-03) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 5 (approx. per word bound = -5.878, relative change = 7.634e-04) 
## Topic 1: in, disneyland, we, hong, kong 
##  Topic 2: not, very, this, place, you 
##  Topic 3: you, in, there, time, will 
##  Topic 4: rides, in, park, we, kids 
##  Topic 5: we, disney, day, in, park 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 6 (approx. per word bound = -5.876, relative change = 4.863e-04) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 7 (approx. per word bound = -5.874, relative change = 3.419e-04) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 8 (approx. per word bound = -5.872, relative change = 2.560e-04) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 9 (approx. per word bound = -5.871, relative change = 2.000e-04) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 10 (approx. per word bound = -5.870, relative change = 1.618e-04) 
## Topic 1: in, disneyland, hong, kong, we 
##  Topic 2: not, this, very, place, you 
##  Topic 3: you, in, time, will, there 
##  Topic 4: rides, kids, all, park, in 
##  Topic 5: we, disney, day, in, park 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 11 (approx. per word bound = -5.869, relative change = 1.360e-04) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 12 (approx. per word bound = -5.868, relative change = 1.173e-04) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 13 (approx. per word bound = -5.868, relative change = 1.034e-04) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 14 (approx. per word bound = -5.867, relative change = 9.258e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 15 (approx. per word bound = -5.867, relative change = 8.378e-05) 
## Topic 1: in, disneyland, hong, kong, hk 
##  Topic 2: not, this, very, good, place 
##  Topic 3: you, in, will, can, time 
##  Topic 4: rides, kids, all, great, in 
##  Topic 5: we, disney, day, in, park 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 16 (approx. per word bound = -5.866, relative change = 7.660e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 17 (approx. per word bound = -5.866, relative change = 7.101e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 18 (approx. per word bound = -5.866, relative change = 6.713e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 19 (approx. per word bound = -5.865, relative change = 6.502e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 20 (approx. per word bound = -5.865, relative change = 6.408e-05) 
## Topic 1: in, disneyland, hong, kong, disney 
##  Topic 2: not, this, very, good, park 
##  Topic 3: you, can, will, in, your 
##  Topic 4: rides, kids, great, all, my 
##  Topic 5: we, day, disney, in, so 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 21 (approx. per word bound = -5.864, relative change = 6.414e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 22 (approx. per word bound = -5.864, relative change = 6.398e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 23 (approx. per word bound = -5.864, relative change = 6.282e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 24 (approx. per word bound = -5.863, relative change = 6.054e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 25 (approx. per word bound = -5.863, relative change = 5.712e-05) 
## Topic 1: in, disneyland, hong, kong, disney 
##  Topic 2: not, this, very, good, park 
##  Topic 3: you, can, will, your, in 
##  Topic 4: rides, kids, great, all, my 
##  Topic 5: we, day, in, so, our 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 26 (approx. per word bound = -5.863, relative change = 5.286e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 27 (approx. per word bound = -5.862, relative change = 4.865e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 28 (approx. per word bound = -5.862, relative change = 4.427e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 29 (approx. per word bound = -5.862, relative change = 4.023e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 30 (approx. per word bound = -5.862, relative change = 3.688e-05) 
## Topic 1: in, disneyland, disney, hong, kong 
##  Topic 2: not, very, this, park, good 
##  Topic 3: you, can, will, your, in 
##  Topic 4: rides, my, kids, great, all 
##  Topic 5: we, day, our, so, in 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 31 (approx. per word bound = -5.862, relative change = 3.385e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 32 (approx. per word bound = -5.861, relative change = 3.147e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 33 (approx. per word bound = -5.861, relative change = 2.915e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 34 (approx. per word bound = -5.861, relative change = 2.730e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 35 (approx. per word bound = -5.861, relative change = 2.542e-05) 
## Topic 1: in, disneyland, disney, hong, kong 
##  Topic 2: not, very, this, park, good 
##  Topic 3: you, can, will, your, in 
##  Topic 4: rides, my, great, kids, all 
##  Topic 5: we, day, our, so, in 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 36 (approx. per word bound = -5.861, relative change = 2.393e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 37 (approx. per word bound = -5.861, relative change = 2.156e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 38 (approx. per word bound = -5.860, relative change = 2.166e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 39 (approx. per word bound = -5.860, relative change = 2.053e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 40 (approx. per word bound = -5.860, relative change = 1.971e-05) 
## Topic 1: in, disney, disneyland, hong, kong 
##  Topic 2: not, very, this, park, good 
##  Topic 3: you, can, your, will, get 
##  Topic 4: rides, my, great, all, kids 
##  Topic 5: we, day, our, so, in 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 41 (approx. per word bound = -5.860, relative change = 1.914e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 42 (approx. per word bound = -5.860, relative change = 1.895e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 43 (approx. per word bound = -5.860, relative change = 1.858e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 44 (approx. per word bound = -5.860, relative change = 1.865e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 45 (approx. per word bound = -5.860, relative change = 1.810e-05) 
## Topic 1: in, disney, disneyland, hong, kong 
##  Topic 2: not, very, this, park, in 
##  Topic 3: you, can, your, will, get 
##  Topic 4: rides, my, great, all, kids 
##  Topic 5: we, day, our, so, in 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 46 (approx. per word bound = -5.860, relative change = 1.722e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 47 (approx. per word bound = -5.859, relative change = 1.604e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 48 (approx. per word bound = -5.859, relative change = 1.487e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 49 (approx. per word bound = -5.859, relative change = 1.354e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 50 (approx. per word bound = -5.859, relative change = 1.219e-05) 
## Topic 1: in, disney, hong, disneyland, kong 
##  Topic 2: not, very, park, this, in 
##  Topic 3: you, can, your, will, get 
##  Topic 4: rides, my, great, all, kids 
##  Topic 5: we, day, our, so, in 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 51 (approx. per word bound = -5.859, relative change = 1.113e-05) 
## ....................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Model Converged
toLDAvis(mod=dep_tmob_stm, docs=out$documents)
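
The fitting log above reports a relative change in the bound after every EM iteration and ends with "Model Converged" well before `max.em.its = 75`. The sketch below illustrates that kind of stopping rule in base R, using the per-word bounds printed for iterations 1 to 5. It rests on two assumptions: that the relative change is computed as (new - old) / |old|, and that the tolerance is 1e-5, which I believe is the default of the `emtol` argument of `stm()`. The helper name `relative_change` is hypothetical. Because the printed bounds are rounded to three decimals, the results only approximate the logged values.

```r
# Per-word bounds copied from iterations 1 to 5 of the log above (rounded in the log).
bounds <- c(-5.956, -5.909, -5.891, -5.883, -5.878)

# Hypothetical helper: change in the bound relative to the previous iteration.
relative_change <- function(b) diff(b) / abs(head(b, -1))

rel <- relative_change(bounds)
print(signif(rel, 3))

# Assumed tolerance; fitting would stop at the first iteration where this is TRUE.
emtol <- 1e-5
converged <- rel < emtol
print(converged)
```

None of these early iterations meets the tolerance, which is consistent with the log continuing past iteration 5.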

6.2.2 Plot summary for the model

plot(dep_tmob_stm, type="summary", n=5)

6.3 Use stm::topicQuality() to visualize the quality of the topics

topicQuality(dep_tmob_stm, out$documents)
## [1] -56.48935 -42.91375 -46.06511 -53.19774 -50.92442
## [1] 9.110252 8.531176 8.741227 8.839551 8.574772

6.4 Interpret what you see from these results

There are two ways of measuring topic “interpretability”:

Semantic coherence measures how consistently a topic's top words appear together in the same documents. Larger values (closer to zero, since the scores are negative) are better and mean the topic is more consistent.

Exclusivity measures how distinctive the top words are to that topic. Here, a larger or smaller value is not necessarily better or worse; it indicates whether the topic is unique (high value) or broad (low value).

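Reading the first output vector as semantic coherence and the second as exclusivity, Topic 2 has the highest semantic coherence (-42.91), which makes it the most consistent topic, and Topic 1 has the highest exclusivity (9.11), which makes it the most unique. Conversely, Topic 1 has the lowest coherence (-56.49) and Topic 2 the lowest exclusivity (8.53).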
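
The two vectors printed by `topicQuality()` can be ranked side by side to make the comparison explicit. The base-R sketch below copies the values from the output in 6.3. It assumes, as the signs and magnitudes suggest, that the first vector is semantic coherence and the second is exclusivity. The names `semcoh`, `exclus`, and `quality` are illustrative.

```r
# Values copied from the topicQuality() output in 6.3.
semcoh <- c(-56.48935, -42.91375, -46.06511, -53.19774, -50.92442)  # assumed: semantic coherence
exclus <- c(9.110252, 8.531176, 8.741227, 8.839551, 8.574772)       # assumed: exclusivity

# Rank 1 = highest value on each measure.
quality <- data.frame(
  topic            = 1:5,
  coherence_rank   = rank(-semcoh),
  exclusivity_rank = rank(-exclus)
)
print(quality)

# Topics that lead on each measure.
most_coherent  <- which.max(semcoh)
most_exclusive <- which.max(exclus)
```

On these numbers the two rankings run nearly opposite to each other, which is the usual tension between coherent topics and exclusive ones.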

####7. Fitting a Keyword Assisted Topic Model

7.1 Come up with at least 4 sets of keywords, each associated with a topic

keyATM_docs <- keyATM_read(texts = dis_dfmat)
## Using quanteda dfm.
disney_list = list(
  Features = c("rides", "kids", "park", "food"),
  Disney = c("disneyland", "castle", "experience", "mountain")
)
dep_key_viz <- visualize_keywords(docs = keyATM_docs, keywords = disney_list)
dep_key_viz

7.2 Fit a keyATM Base model with the keyword sets and allow 2 topics to be outside the scope of the provided keyword sets

disney_tmod_keyatm_base <- keyATM(
  docs = keyATM_docs, 
  no_keyword_topics = 2, 
  keywords = disney_list, 
  model = "base", 
  options = list(seed = 123))
## Initializing the model...
## Fitting the model. 1500 iterations...
## Creating an output object. It may take time...

7.3 Show the top 5 words in each topic

top_words(disney_tmod_keyatm_base, 5)
##   1_Features       2_Disney Other_1  Other_2
## 1        you             we     bay       we
## 2   park [✓] disneyland [✓] station      our
## 3        day             in   sunny  tickets
## 4        not             my    line priority
## 5      there           hong   train     pass
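
One quick sanity check on a keyword-assisted fit is whether the seed keywords actually surface among a topic's top words; in the output above only `park` and `disneyland` carry the `[✓]` marker. The base-R sketch below runs that check. The top-word columns of the two keyword topics are re-typed by hand from the output with the markers stripped, and the names `top5`, `hits`, and `hit_share` are illustrative rather than part of keyATM.

```r
# Seed keywords as defined in disney_list.
keywords <- list(
  Features = c("rides", "kids", "park", "food"),
  Disney   = c("disneyland", "castle", "experience", "mountain")
)

# Top 5 words of the two keyword topics, copied from the top_words() output
# above with the "[✓]" markers removed.
top5 <- list(
  Features = c("you", "park", "day", "not", "there"),
  Disney   = c("we", "disneyland", "in", "my", "hong")
)

# Seed keywords that appear among each topic's top words.
hits <- Map(intersect, keywords, top5[names(keywords)])
print(hits)

# Share of each keyword set recovered in the top 5.
hit_share <- lengths(hits) / lengths(keywords)
print(hit_share)
```

A low share for a topic may suggest that its keywords are too rare or too generic in this corpus to anchor the topic, which is worth checking against the `visualize_keywords()` plot.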