The dataset contains 1,000 reviews of three Disneyland branches (Paris, California and Hong Kong), posted by visitors on TripAdvisor.
The dataset is available at: https://www.kaggle.com/datasets/arushchillar/disneyland-reviews
We will use the ‘Review_ID’, ‘Label’ and ‘Review_Text’ columns to analyse the text.
The code snippet
####Loading the libraries required in this Assignment
library("tidytext")
library("topicmodels")
library("quanteda")
## Warning in .recacheSubclasses(def@className, def, env): undefined subclass
## "packedMatrix" of class "mMatrix"; definition not updated
## Warning in .recacheSubclasses(def@className, def, env): undefined subclass
## "packedMatrix" of class "replValueSp"; definition not updated
## Package version: 3.2.2
## Unicode version: 13.0
## ICU version: 69.1
## Parallel computing: 8 of 8 threads used.
## See https://quanteda.io for tutorials and examples.
library("seededlda")
## Loading required package: proxyC
##
## Attaching package: 'proxyC'
## The following object is masked from 'package:stats':
##
## dist
##
## Attaching package: 'seededlda'
## The following objects are masked from 'package:topicmodels':
##
## terms, topics
## The following object is masked from 'package:stats':
##
## terms
library("topicdoc")
library("ldatuning")
library("LDAvis")
library("broom")
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("ggplot2")
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble 3.1.6 ✓ purrr 0.3.4
## ✓ tidyr 1.2.0 ✓ stringr 1.4.0
## ✓ readr 2.1.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library("readtext")
library("textrank")
library("stm")
## stm v1.3.6 successfully loaded. See ?stm for help.
## Papers, resources, and other materials at structuraltopicmodel.com
library("keyATM")
## keyATM 0.4.1 successfully loaded.
## Papers, examples, resources, and other materials are at
## https://keyatm.github.io/keyATM/
library("servr")
library("kableExtra")
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
####1. Describe the data and its statistics by importing it into a data frame
disneyReviews_df <- readtext("Assignment2.csv", docid_field = "Review_ID", text_field = "Review_Text")
head(disneyReviews_df, n = 5) %>%
  kbl() %>%
  kable_styling()
| doc_id | text | Rating | Year_Month | Reviewer_Location | Branch |
|---|---|---|---|---|---|
| 670772142 | If you’ve ever been to Disneyland anywhere you’ll find Disneyland Hong Kong very similar in the layout when you walk into main street! It has a very familiar feel. One of the rides its a Small World is absolutely fabulous and worth doing. The day we visited was fairly hot and relatively busy but the queues moved fairly well. | 4 | 2019-4 | Australia | Disneyland_HongKong |
| 670682799 | Its been a while since d last time we visit HK Disneyland .. Yet, this time we only stay in Tomorrowland .. AKA Marvel land!Now they have Iron Man Experience n d Newly open Ant Man n d Wasp!!Ironman .. Great feature n so Exciting, especially d whole scenery of HK (HK central area to Kowloon)!Antman .. Changed by previous Buzz lightyear! More or less d same, but I’m expecting to have something most!!However, my boys like it!!Space Mountain .. Turns into Star Wars!! This 1 is Great!!!For cast members (staffs) .. Felt bit MINUS point from before!!! Just dun feel like its a Disney brand!! Seems more local like Ocean Park or even worst!!They got no SMILING face, but just wanna u to enter n attraction n leave!!Hello this is supposed to be Happiest Place on Earth brand!! But, just really Dont feel it!!Bakery in Main Street now have more attractive delicacies n Disney theme sweets .. These are Good Points!!Last, they also have Starbucks now inside the theme park!! | 4 | 2019-5 | Philippines | Disneyland_HongKong |
| 670623270 | Thanks God it wasn t too hot or too humid when I was visiting the park otherwise it would be a big issue (there is not a lot of shade).I have arrived around 10:30am and left at 6pm. Unfortunately I didn t last until evening parade, but 8.5 hours was too much for me.There is plenty to do and everyone will find something interesting for themselves to enjoy.It wasn t extremely busy and the longest time I had to queue for certain attractions was 45 minutes (which is really not that bad).Although I had an amazing time, I felt a bit underwhelmed with choice of rides and attractions. The park itself is quite small (I was really expecting something grand even the main castle which was closed by the way was quite small).The food options are good, few coffee shops (including Starbucks) and plenty of gift shops. There was no issue with toilets as they are everywhere.All together it was a great day out and I really enjoyed it. | 4 | 2019-4 | United Arab Emirates | Disneyland_HongKong |
| 670607911 | HK Disneyland is a great compact park. Unfortunately there is quite a bit of maintenance work going on at present so a number of areas are closed off (including the famous castle) If you go midweek, it is not too crowded and certainly no where near as bus as LA Disneyland. We did notice on this visit that prices for food, drinks etc have really gone through the roof so be prepared to pay top dollar for snacks (and avoid the souvenir shops if you can) Regardless, kids will love it. | 4 | 2019-4 | Australia | Disneyland_HongKong |
| 670607296 | the location is not in the city, took around 1 hour from Kowlon, my kids like disneyland so much, everything is fine. but its really crowded and hot in Hong Kong | 4 | 2019-4 | United Kingdom | Disneyland_HongKong |
####3.1. Corpus created from Assignment2.csv
disneyReviews_corp <- corpus(disneyReviews_df)
summary(disneyReviews_corp, n = 5) %>%
  kbl() %>%
  kable_styling()
| Text | Types | Tokens | Sentences | Rating | Year_Month | Reviewer_Location | Branch |
|---|---|---|---|---|---|---|---|
| 670772142 | 54 | 63 | 4 | 4 | 2019-4 | Australia | Disneyland_HongKong |
| 670682799 | 141 | 230 | 23 | 4 | 2019-5 | Philippines | Disneyland_HongKong |
| 670623270 | 119 | 192 | 7 | 4 | 2019-4 | United Arab Emirates | Disneyland_HongKong |
| 670607911 | 80 | 101 | 3 | 4 | 2019-4 | Australia | Disneyland_HongKong |
| 670607296 | 30 | 35 | 1 | 4 | 2019-4 | United Kingdom | Disneyland_HongKong |
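As a rough illustration of what the Types and Tokens columns count, the sketch below tokenises a made-up sentence pair (not taken from the dataset) and counts total tokens and distinct types with quanteda's `ntoken()` and `ntype()`. The object names are hypothetical. Punctuation is deliberately kept, since the token counts in the summary above appear to include it.

```r
library("quanteda")

# Made-up example text (not from the Disneyland dataset)
toy_text <- c(toy = "the park is small. the park is busy.")

# Default tokenisation keeps punctuation, so each "." counts as a token
toy_toks <- tokens(toy_text)

n_tokens <- ntoken(toy_toks)  # total number of tokens in the document
n_types  <- ntype(toy_toks)   # number of distinct tokens in the document

print(c(tokens = unname(n_tokens), types = unname(n_types)))
```

Here the repeated words and the repeated full stop are each counted once as types, which is why the type count is lower than the token count.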
####3.2. Generate tokens
disneyReviews_toks_orig <- tokens(
  disneyReviews_corp,
  remove_punct = TRUE,
  remove_numbers = TRUE,
  remove_symbols = TRUE,
  remove_url = TRUE,
  split_hyphens = FALSE)
disneyReviews_toks_orig
## Tokens consisting of 1,000 documents and 4 docvars.
## 670772142 :
## [1] "If" "you've" "ever" "been" "to"
## [6] "Disneyland" "anywhere" "you'll" "find" "Disneyland"
## [11] "Hong" "Kong"
## [ ... and 47 more ]
##
## 670682799 :
## [1] "Its" "been" "a" "while" "since"
## [6] "d" "last" "time" "we" "visit"
## [11] "HK" "Disneyland"
## [ ... and 161 more ]
##
## 670623270 :
## [1] "Thanks" "God" "it" "wasn" "t" "too" "hot" "or"
## [9] "too" "humid" "when" "I"
## [ ... and 158 more ]
##
## 670607911 :
## [1] "HK" "Disneyland" "is" "a"
## [5] "great" "compact" "park" "Unfortunately"
## [9] "there" "is" "quite" "a"
## [ ... and 79 more ]
##
## 670607296 :
## [1] "the" "location" "is" "not" "in" "the"
## [7] "city" "took" "around" "hour" "from" "Kowlon"
## [ ... and 18 more ]
##
## 670591897 :
## [1] "Have" "been" "to" "Disney" "World"
## [6] "Disneyland" "Anaheim" "and" "Tokyo" "Disneyland"
## [11] "but" "I"
## [ ... and 162 more ]
##
## [ reached max_ndoc ... 994 more documents ]
Remove custom stop words and regenerate the tokens
# Custom stop words (duplicates removed); tokens_remove() matches
# case-insensitively by default, so "A" and "The" also remove "a" and "the"
myStopWords <- c("A", "are", "Jul", "Mon", "Apr", "but",
                 "Wed", "Aug", "i", "for", "at", "has",
                 "Tue", "doesnt", "from", "The",
                 "have", "been", "had", "than", "with",
                 "use", "who", "of", "to", "show", "and",
                 "on", "said", "were", "by", "that", "is",
                 "as", "was", "an", "it", "which", "its", "if",
                 "they", "he", "be", "us")
dis_toks <- tokens_remove(
  disneyReviews_toks_orig, pattern = myStopWords)
dis_toks
## Tokens consisting of 1,000 documents and 4 docvars.
## 670772142 :
## [1] "you've" "ever" "Disneyland" "anywhere" "you'll"
## [6] "find" "Disneyland" "Hong" "Kong" "very"
## [11] "similar" "in"
## [ ... and 29 more ]
##
## 670682799 :
## [1] "while" "since" "d" "last" "time"
## [6] "we" "visit" "HK" "Disneyland" "Yet"
## [11] "this" "time"
## [ ... and 130 more ]
##
## 670623270 :
## [1] "Thanks" "God" "wasn" "t" "too" "hot"
## [7] "or" "too" "humid" "when" "visiting" "park"
## [ ... and 92 more ]
##
## 670607911 :
## [1] "HK" "Disneyland" "great" "compact"
## [5] "park" "Unfortunately" "there" "quite"
## [9] "bit" "maintenance" "work" "going"
## [ ... and 50 more ]
##
## 670607296 :
## [1] "location" "not" "in" "city" "took"
## [6] "around" "hour" "Kowlon" "my" "kids"
## [11] "like" "disneyland"
## [ ... and 10 more ]
##
## 670591897 :
## [1] "Disney" "World" "Disneyland" "Anaheim" "Tokyo"
## [6] "Disneyland" "feel" "Disneyland" "Hong" "Kong"
## [11] "really" "too"
## [ ... and 98 more ]
##
## [ reached max_ndoc ... 994 more documents ]
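The output above shows capitalised words such as "If" and "Its" disappearing even though the stop-word list only contains "if" and "its". A minimal sketch of why, using a made-up sentence and a hypothetical stop-word list (neither is from the dataset): `tokens_remove()` matches patterns case-insensitively unless `case_insensitive = FALSE` is passed.

```r
library("quanteda")

# Made-up review text (not from the dataset)
toy_toks <- tokens(c(toy = "The park is great but The queues are long"))

# Hypothetical custom stop-word list, all lower case
toy_stop <- c("the", "is", "but", "are")

# Default behaviour: case-insensitive matching, so "The" is removed as well
removed_default <- tokens_remove(toy_toks, pattern = toy_stop)

# Case-sensitive matching leaves the capitalised "The" in place
removed_strict <- tokens_remove(toy_toks, pattern = toy_stop,
                                case_insensitive = FALSE)

print(as.list(removed_default)$toy)
print(as.list(removed_strict)$toy)
```

With the default setting, listing both "A" and "a" (or "The" and "the") in a stop-word vector is therefore redundant.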
####3.3. Create document-feature matrix
dis_dfmat <- dfm(dis_toks, tolower = TRUE) %>%
  dfm_trim(min_termfreq = 5, min_docfreq = 10)
head(dis_dfmat, n = 5) %>%
  kbl() %>%
  kable_styling()
## Warning: 'as.data.frame.dfm' is deprecated.
## Use 'convert(x, to = "data.frame")' instead.
## See help("Deprecated")
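The warning above points to `convert()` as the supported way to turn a dfm into a data frame. The sketch below, on three made-up reviews with hypothetical object names, shows the effect of `dfm_trim()` on a toy scale and then converts the trimmed matrix explicitly. For the real data, something like `convert(head(dis_dfmat, n = 5), to = "data.frame")` before `kbl()` would be one possible way to avoid the deprecated coercion. That is an assumption and has not been run against this dataset.

```r
library("quanteda")

# Made-up reviews (not from the dataset)
toy_text <- c(r1 = "great rides great park",
              r2 = "long queues great park",
              r3 = "great food")

toy_dfmat <- dfm(tokens(toy_text))

# Keep only features that occur at least twice overall
# and appear in at least two documents
toy_trim <- dfm_trim(toy_dfmat, min_termfreq = 2, min_docfreq = 2)

# Explicit conversion instead of relying on the deprecated
# as.data.frame() method for dfm objects
toy_df <- convert(toy_trim, to = "data.frame")
print(toy_df)
```

Only "great" and "park" survive the trimming here. The converted data frame has a leading `doc_id` column followed by one count column per remaining feature.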
| doc_id | ever | disneyland | anywhere | you’ll | find | hong | kong | very | similar | in | … |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 670772142 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | … |
| 670682799 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | … |
| 670623270 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | … |
| 670607911 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … |
| 670607296 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | … |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Top features of DFM
topfeatures(dis_dfmat, 30)
##         we         in        you disneyland      rides       park     disney
##       1309       1220        918        777        697        688        677
##        day        not      there        all         so       this       time
##        662        660        655        531        519        503        442
##       very         my      great        one       kids       good       food
##        421        389        348        340        312        299        298
##        get      visit        our       hong       kong      place        can
##        298        297        295        290        285        283        270
##       ride          s
##        261        244
####4. Keyword-in-context analysis using quanteda::kwic()
The three keywords used here are disneyland, kids and rides.
kw_disneyland <- kwic(disneyReviews_toks_orig, pattern = "disneyland*", window = 3)
head(kw_disneyland, 5) %>%
  kbl() %>%
  kable_styling()
| docname | from | to | pre | keyword | post | pattern |
|---|---|---|---|---|---|---|
| 670772142 | 6 | 6 | ever been to | Disneyland | anywhere you’ll find | disneyland* |
| 670772142 | 10 | 10 | anywhere you’ll find | Disneyland | Hong Kong very | disneyland* |
| 670682799 | 12 | 12 | we visit HK | Disneyland | Yet this time | disneyland* |
| 670607911 | 2 | 2 | HK | Disneyland | is a great | disneyland* |
| 670607911 | 51 | 51 | bus as LA | Disneyland | We did notice | disneyland* |
kw_kids <- kwic(disneyReviews_toks_orig, pattern = "kids*", window = 3)
head(kw_kids, 5) %>%
  kbl() %>%
  kable_styling()
| docname | from | to | pre | keyword | post | pattern |
|---|---|---|---|---|---|---|
| 670607911 | 88 | 88 | you can Regardless | kids | will love it | kids* |
| 670607296 | 14 | 14 | from Kowlon my | kids | like disneyland so | kids* |
| 670435886 | 9 | 9 | with our grown | kids | and I have | kids* |
| 670435886 | 23 | 23 | It seems the | kids | never tire of | kids* |
| 670435886 | 83 | 83 | of course The | kids | will love the | kids* |
kw_rides <- kwic(disneyReviews_toks_orig, pattern = "rides*", window = 3)
head(kw_rides, 5) %>%
  kbl() %>%
  kable_styling()
| docname | from | to | pre | keyword | post | pattern |
|---|---|---|---|---|---|---|
| 670772142 | 33 | 33 | One of the | rides | its a Small | rides* |
| 670623270 | 106 | 106 | with choice of | rides | and attractions The | rides* |
| 670591897 | 32 | 32 | way too few | rides | and attractions Souvenirs | rides* |
| 670591897 | 131 | 131 | lines for the | rides | gift shops food | rides* |
| 670571027 | 35 | 35 | was rainning and | rides | were not working | rides* |
Assumptions in LDA
1. Each document is just a “bag of words”.
2. Each document has a mixture of topics; words are generated by topics.
3. Topics are uncorrelated (this is a strong assumption; why?).
4. We know beforehand how many topics we want (what if we don’t?).
5.1 Number of topics set to 5 and the top 5 keywords in each topic are visualized
# Convert the quanteda dfm into the document-term matrix format used by topicmodels
dis_dtmat <- quanteda::convert(dis_dfmat, to = "topicmodels")
dis_lda5 <- LDA(dis_dtmat, k = 5, control = list(seed = 123))
# Per-topic word probabilities (beta) in tidy form
dis_lda5_betas <- broom::tidy(dis_lda5)
# Keep the 5 highest-beta terms within each topic
top_terms_in_topics <- dis_lda5_betas %>%
  group_by(topic) %>%
  top_n(5, beta) %>%
  ungroup() %>%
  arrange(topic, -beta)
top_terms_in_topics %>%
  mutate(term = reorder(term, beta)) %>%
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()
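The LDA assumptions listed earlier (bag of words, a mixture of topics per document, words generated by topics) can be made concrete by simulating the generative process they describe. The sketch below uses base R only and is not part of the assignment pipeline. The six-word vocabulary, the two topics, the Dirichlet parameter 0.5 and the helper names `rdirichlet1` and `generate_doc` are all illustrative assumptions.

```r
set.seed(123)

# Hypothetical toy vocabulary and number of topics (not taken from the reviews)
vocab <- c("rides", "queue", "food", "price", "castle", "parade")
n_topics <- 2

# Draw one sample from a Dirichlet distribution via normalised gamma variates
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha)
  g / sum(g)
}

# Each topic is a probability distribution over the vocabulary (rows sum to 1)
topic_word <- t(replicate(n_topics, rdirichlet1(rep(0.5, length(vocab)))))
colnames(topic_word) <- vocab

# Generate one document: draw its topic mixture, then a topic and a word per token
generate_doc <- function(n_words, alpha = rep(0.5, n_topics)) {
  doc_topics <- rdirichlet1(alpha)
  z <- sample(seq_len(n_topics), n_words, replace = TRUE, prob = doc_topics)
  words <- vapply(
    z,
    function(topic) sample(vocab, 1, prob = topic_word[topic, ]),
    character(1)
  )
  # Word order is discarded: only the counts survive (the "bag of words")
  list(theta = doc_topics, counts = table(factor(words, levels = vocab)))
}

doc <- generate_doc(20)
doc$theta
doc$counts
```

Fitting LDA runs this process in reverse: given only the counts, it estimates the topic-word distributions and each document's topic mixture.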
5.2 Find the best number of topics based on perplexity
Perplexity measures how well a trained topic model predicts held-out documents. It is a decreasing function of the log-likelihood of the new documents, so a lower perplexity score indicates a better-fitting model.
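To make the relationship between likelihood and perplexity concrete, here is a minimal base R sketch. It assumes the usual definition, the exponential of the negative mean log-likelihood per held-out token. The probabilities are invented for illustration and the function name `perplexity_from_probs` is hypothetical, not part of topicmodels.

```r
# Perplexity as exp(-mean log-probability) over held-out tokens
perplexity_from_probs <- function(p) exp(-mean(log(p)))

# Hypothetical probabilities two models assign to the same six held-out tokens
confident_model <- c(0.10, 0.05, 0.20, 0.10, 0.02, 0.08)
weaker_model    <- confident_model / 2

perplexity_from_probs(confident_model)
perplexity_from_probs(weaker_model)

# Sanity check: a model spreading probability uniformly over 50 words
# has perplexity 50 (up to floating-point error)
perplexity_from_probs(rep(1 / 50, 6))
```

Halving every token probability doubles the perplexity, which is why the model with the lower score is preferred.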
# The first 500 reviews form the training set
train_dis_dtmat <- disneyReviews_corp[1:500] %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE,
         remove_symbols = TRUE, remove_url = TRUE) %>%
  dfm(tolower = TRUE) %>%
  dfm_remove(myStopWords) %>%
  dfm_trim(min_termfreq = 5, min_docfreq = 10) %>%
  quanteda::convert(to = "topicmodels")
# The remaining 500 reviews are held out as the test set
test_dis_dtmat <- disneyReviews_corp[501:1000] %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE,
         remove_symbols = TRUE, remove_url = TRUE) %>%
  dfm(tolower = TRUE) %>%
  dfm_remove(myStopWords) %>%
  dfm_trim(min_termfreq = 5, min_docfreq = 10) %>%
  quanteda::convert(to = "topicmodels")
train_dis_lda5 <- LDA(train_dis_dtmat, k = 5, control = list(seed = 123))
# Perplexity of the training-set model on the held-out reviews
perplexity(train_dis_lda5, test_dis_dtmat)
## [1] 263.2279
5.3 Using ldatuning to find the best number of topics based on the CaoJuan2009, Arun2010, and Deveaud2014 measures
n_topics_vec <- 2:5 # try different numbers of topics: 2, 3, 4, 5
lda_ldatuning_result <- FindTopicsNumber(
  dis_dtmat, topics = n_topics_vec,
  metrics = c("CaoJuan2009", "Arun2010", "Deveaud2014"),
  method = "VEM", control = list(seed = 123), mc.cores = 4L, verbose = TRUE
)
## fit models... done.
## calculate metrics:
## CaoJuan2009... done.
## Arun2010... done.
## Deveaud2014... done.
FindTopicsNumber_plot(lda_ldatuning_result)
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
Observation: Across the tested range of 2 to 5 topics, the measures indicate that 5 is the best number of topics.
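The plot is read by looking for the minimum of CaoJuan2009 and Arun2010 and the maximum of Deveaud2014. The same reading can be done programmatically. The sketch below is a hedged illustration on a toy table laid out like the FindTopicsNumber() result. The metric values are invented and are not the results obtained on the Disneyland reviews.

```r
# Toy metric table with the same columns as a FindTopicsNumber() result.
# The numbers are made up for illustration only.
toy_result <- data.frame(
  topics      = 2:5,
  CaoJuan2009 = c(0.30, 0.25, 0.22, 0.18),
  Arun2010    = c(12.0, 10.5, 9.8, 9.1),
  Deveaud2014 = c(0.40, 0.45, 0.47, 0.52)
)

# CaoJuan2009 and Arun2010 are minimised; Deveaud2014 is maximised
best_k <- c(
  CaoJuan2009 = toy_result$topics[which.min(toy_result$CaoJuan2009)],
  Arun2010    = toy_result$topics[which.min(toy_result$Arun2010)],
  Deveaud2014 = toy_result$topics[which.max(toy_result$Deveaud2014)]
)
best_k

# If the metrics disagree, fall back to the smallest suggested number of topics
chosen_k <- min(best_k)
chosen_k
```

Replacing `toy_result` with `lda_ldatuning_result` should apply the same rule to the real output, assuming its column names match the metric names passed to FindTopicsNumber().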
5.4 Use the best number of topics (if the results are inconsistent, pick the one with fewer topics) and fit an LDA model
dep_lda5 <- LDA(dis_dtmat, k = 5, control = list(seed = 123))
topicmodels::terms(dep_lda5, 10)
##       Topic 1  Topic 2      Topic 3      Topic 4      Topic 5
##  [1,] "we"     "rides"      "in"         "in"         "rides"
##  [2,] "day"    "in"         "you"        "disney"     "you"
##  [3,] "so"     "you"        "day"        "not"        "disneyland"
##  [4,] "time"   "park"       "not"        "we"         "great"
##  [5,] "this"   "there"      "all"        "disneyland" "there"
##  [6,] "just"   "disneyland" "disneyland" "there"      "we"
##  [7,] "disney" "this"       "place"      "park"       "very"
##  [8,] "my"     "very"       "we"         "so"         "so"
##  [9,] "fun"    "great"      "park"       "kids"       "park"
## [10,] "all"    "not"        "disney"     "my"         "this"
5.5 Show topic-specific diagnostics from the topicdoc package for the best model
topicdoc_result = topic_diagnostics(dep_lda5, dis_dtmat)
topicdoc_result
##   topic_num topic_size mean_token_length dist_from_corpus tf_df_dist
## 1         1   166.8771               3.3        0.2635935   13.32312
## 2         2   163.6613               4.5        0.2242519   14.17444
## 3         3   169.1313               4.1        0.2372721   17.41762
## 4         4   168.2446               4.0        0.2320088   16.50516
## 5         5   162.0857               4.4        0.2402626   15.57001
##   doc_prominence topic_coherence topic_exclusivity
## 1            434       -72.57032          8.726018
## 2            508       -57.87248          7.981235
## 3            513       -59.35122          8.252114
## 4            496       -62.79265          8.095828
## 5            476       -59.42356          8.682567
6.1 Include the document-level variable(s) to fit a STM that has 5 topics
# Only selected columns are considered for the STM
dep_v2_df <- disneyReviews_df %>%
  select(doc_id, Rating, Year_Month, Reviewer_Location, Branch, text)
dep_v2_corp <- corpus(
  dep_v2_df,
  docid_field = "doc_id",
  text_field = "text")
vars <- docvars(dep_v2_corp)
head(vars)
##   Rating Year_Month    Reviewer_Location              Branch
## 1      4     2019-4            Australia Disneyland_HongKong
## 2      4     2019-5          Philippines Disneyland_HongKong
## 3      4     2019-4 United Arab Emirates Disneyland_HongKong
## 4      4     2019-4            Australia Disneyland_HongKong
## 5      4     2019-4       United Kingdom Disneyland_HongKong
## 6      3     2019-4            Singapore Disneyland_HongKong
dep_v2_toks <- tokens(dep_v2_corp, remove_punct = TRUE, remove_numbers = TRUE,
                      remove_symbols = TRUE, remove_url = TRUE) %>%
  tokens_remove(pattern = myStopWords) %>%
  tokens_keep(min_nchar = 2)
dep_v2_dfmat <- dfm(dep_v2_toks, tolower = TRUE) %>%
  dfm_trim(min_termfreq = 5, min_docfreq = 10)
# Convert to the documents/vocab/meta list structure expected by stm
stm_dep_v2_dfmat <- quanteda::convert(dep_v2_dfmat, to = "stm")
out <- prepDocuments(
  stm_dep_v2_dfmat$documents, stm_dep_v2_dfmat$vocab, stm_dep_v2_dfmat$meta)
6.2.1 Fit STM model
dep_tmob_stm <- stm(
  out$documents, out$vocab, K = 5,
  prevalence = ~s(Rating),
  data = out$meta,
  init.type = "Spectral",
  max.em.its = 75,
  seed = 123)
## Beginning Spectral Initialization
## Calculating the gram matrix...
## Finding anchor words...
## .....
## Recovering initialization...
## ........
## Initialization complete.
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 1 (approx. per word bound = -5.956)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 2 (approx. per word bound = -5.909, relative change = 7.839e-03)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 3 (approx. per word bound = -5.891, relative change = 3.093e-03)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 4 (approx. per word bound = -5.883, relative change = 1.377e-03)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 5 (approx. per word bound = -5.878, relative change = 7.634e-04)
## Topic 1: in, disneyland, we, hong, kong
## Topic 2: not, very, this, place, you
## Topic 3: you, in, there, time, will
## Topic 4: rides, in, park, we, kids
## Topic 5: we, disney, day, in, park
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 6 (approx. per word bound = -5.876, relative change = 4.863e-04)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 7 (approx. per word bound = -5.874, relative change = 3.419e-04)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 8 (approx. per word bound = -5.872, relative change = 2.560e-04)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 9 (approx. per word bound = -5.871, relative change = 2.000e-04)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 10 (approx. per word bound = -5.870, relative change = 1.618e-04)
## Topic 1: in, disneyland, hong, kong, we
## Topic 2: not, this, very, place, you
## Topic 3: you, in, time, will, there
## Topic 4: rides, kids, all, park, in
## Topic 5: we, disney, day, in, park
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 11 (approx. per word bound = -5.869, relative change = 1.360e-04)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 12 (approx. per word bound = -5.868, relative change = 1.173e-04)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 13 (approx. per word bound = -5.868, relative change = 1.034e-04)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 14 (approx. per word bound = -5.867, relative change = 9.258e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 15 (approx. per word bound = -5.867, relative change = 8.378e-05)
## Topic 1: in, disneyland, hong, kong, hk
## Topic 2: not, this, very, good, place
## Topic 3: you, in, will, can, time
## Topic 4: rides, kids, all, great, in
## Topic 5: we, disney, day, in, park
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 16 (approx. per word bound = -5.866, relative change = 7.660e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 17 (approx. per word bound = -5.866, relative change = 7.101e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 18 (approx. per word bound = -5.866, relative change = 6.713e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 19 (approx. per word bound = -5.865, relative change = 6.502e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 20 (approx. per word bound = -5.865, relative change = 6.408e-05)
## Topic 1: in, disneyland, hong, kong, disney
## Topic 2: not, this, very, good, park
## Topic 3: you, can, will, in, your
## Topic 4: rides, kids, great, all, my
## Topic 5: we, day, disney, in, so
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 21 (approx. per word bound = -5.864, relative change = 6.414e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 22 (approx. per word bound = -5.864, relative change = 6.398e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 23 (approx. per word bound = -5.864, relative change = 6.282e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 24 (approx. per word bound = -5.863, relative change = 6.054e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 25 (approx. per word bound = -5.863, relative change = 5.712e-05)
## Topic 1: in, disneyland, hong, kong, disney
## Topic 2: not, this, very, good, park
## Topic 3: you, can, will, your, in
## Topic 4: rides, kids, great, all, my
## Topic 5: we, day, in, so, our
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 26 (approx. per word bound = -5.863, relative change = 5.286e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 27 (approx. per word bound = -5.862, relative change = 4.865e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 28 (approx. per word bound = -5.862, relative change = 4.427e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 29 (approx. per word bound = -5.862, relative change = 4.023e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 30 (approx. per word bound = -5.862, relative change = 3.688e-05)
## Topic 1: in, disneyland, disney, hong, kong
## Topic 2: not, very, this, park, good
## Topic 3: you, can, will, your, in
## Topic 4: rides, my, kids, great, all
## Topic 5: we, day, our, so, in
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 31 (approx. per word bound = -5.862, relative change = 3.385e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 32 (approx. per word bound = -5.861, relative change = 3.147e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 33 (approx. per word bound = -5.861, relative change = 2.915e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 34 (approx. per word bound = -5.861, relative change = 2.730e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 35 (approx. per word bound = -5.861, relative change = 2.542e-05)
## Topic 1: in, disneyland, disney, hong, kong
## Topic 2: not, very, this, park, good
## Topic 3: you, can, will, your, in
## Topic 4: rides, my, great, kids, all
## Topic 5: we, day, our, so, in
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 36 (approx. per word bound = -5.861, relative change = 2.393e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 37 (approx. per word bound = -5.861, relative change = 2.156e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 38 (approx. per word bound = -5.860, relative change = 2.166e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 39 (approx. per word bound = -5.860, relative change = 2.053e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 40 (approx. per word bound = -5.860, relative change = 1.971e-05)
## Topic 1: in, disney, disneyland, hong, kong
## Topic 2: not, very, this, park, good
## Topic 3: you, can, your, will, get
## Topic 4: rides, my, great, all, kids
## Topic 5: we, day, our, so, in
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 41 (approx. per word bound = -5.860, relative change = 1.914e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 42 (approx. per word bound = -5.860, relative change = 1.895e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 43 (approx. per word bound = -5.860, relative change = 1.858e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 44 (approx. per word bound = -5.860, relative change = 1.865e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 45 (approx. per word bound = -5.860, relative change = 1.810e-05)
## Topic 1: in, disney, disneyland, hong, kong
## Topic 2: not, very, this, park, in
## Topic 3: you, can, your, will, get
## Topic 4: rides, my, great, all, kids
## Topic 5: we, day, our, so, in
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 46 (approx. per word bound = -5.860, relative change = 1.722e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 47 (approx. per word bound = -5.859, relative change = 1.604e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 48 (approx. per word bound = -5.859, relative change = 1.487e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 49 (approx. per word bound = -5.859, relative change = 1.354e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 50 (approx. per word bound = -5.859, relative change = 1.219e-05)
## Topic 1: in, disney, hong, disneyland, kong
## Topic 2: not, very, park, this, in
## Topic 3: you, can, your, will, get
## Topic 4: rides, my, great, all, kids
## Topic 5: we, day, our, so, in
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 51 (approx. per word bound = -5.859, relative change = 1.113e-05)
## ....................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Model Converged
toLDAvis(mod=dep_tmob_stm, docs=out$documents)
6.2.2 Plot summary for the model
plot(dep_tmob_stm, type="summary", n=5)
6.3 Use stm::topicQuality() to visualize the quality of the topics
topicQuality(dep_tmob_stm, out$documents)
## [1] -56.48935 -42.91375 -46.06511 -53.19774 -50.92442
## [1] 9.110252 8.531176 8.741227 8.839551 8.574772
6.4 Interpret what you see from these results
There are two ways of measuring topic “interpretability”:
Semantic coherence measures the consistency of the words used within the topic. Larger values are better and mean the topic is more consistent.
Exclusivity measures how distinctive the top words are to that topic. For this, larger or smaller is not necessarily better or worse, but indicates whether the topic is unique (high value) or broad (low value).
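Reading the first printed vector as semantic coherence and the second as exclusivity, Topic 2 has the highest semantic coherence (-42.91), which makes it the most internally consistent topic, while Topic 1 has the highest exclusivity (9.11), which makes it the most unique. Topic 1 also has the lowest semantic coherence (-56.49), so its words are distinctive but co-occur less consistently.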
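The topics can be ranked on both measures directly from the two vectors printed by topicQuality(). This is a minimal sketch that assumes the first vector is semantic coherence and the second is exclusivity, both in topic order 1 to 5. The data frame name `quality` is introduced here for illustration.

```r
# Values copied from the topicQuality() output shown earlier
semantic_coherence <- c(-56.48935, -42.91375, -46.06511, -53.19774, -50.92442)
exclusivity        <- c(9.110252, 8.531176, 8.741227, 8.839551, 8.574772)

quality <- data.frame(topic = 1:5, semantic_coherence, exclusivity)

# Topic with the highest (least negative) semantic coherence
most_coherent <- quality$topic[which.max(quality$semantic_coherence)]
# Topic with the highest exclusivity
most_exclusive <- quality$topic[which.max(quality$exclusivity)]

most_coherent
most_exclusive

# Full ranking, most coherent topic first
quality[order(-quality$semantic_coherence), ]
```

With these values the most coherent topic is 2 and the most exclusive topic is 1, which illustrates the usual trade-off between the two measures.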
####7. Fitting a Keyword-Assisted Topic Model
7.1 Come up with at least 4 sets of keywords, each associated with a topic
keyATM_docs <- keyATM_read(texts = dis_dfmat)
## Using quanteda dfm.
disney_list <- list(
  Features = c("rides", "kids", "park", "food"),
  Disney = c("disneyland", "castle", "experience", "mountain")
)
dep_key_viz <- visualize_keywords(docs = keyATM_docs, keywords = disney_list)
dep_key_viz
7.2 Fit a keyATM Base model with the keyword sets and allow 2 topics to be outside the scope of the provided keyword sets
disney_tmod_keyatm_base <- keyATM(
  docs = keyATM_docs,
  no_keyword_topics = 2,
  keywords = disney_list,
  model = "base",
  options = list(seed = 123))
## Initializing the model...
## Fitting the model. 1500 iterations...
## Creating an output object. It may take time...
7.3 Showing top 5 keywords in each topic
top_words(disney_tmod_keyatm_base, 5)
##   1_Features       2_Disney Other_1  Other_2
## 1        you             we     bay       we
## 2   park [✓] disneyland [✓] station      our
## 3        day             in   sunny  tickets
## 4        not             my    line priority
## 5      there           hong   train     pass