Text, Sentiment and Topic Modelling Analyses
This report examines three text datasets using word-frequency, sentiment and topic-modelling analysis. The datasets are:
Hawaiian Hotel Reviews
McDonald’s Reviews
PDC (Professional Darts Corporation) YouTube Comments
Hawaiian Hotel Reviews
The following are the most frequently used words in hotel reviews:
The top five words are “hotel”, “beach”, “tower”, “resort” and “hilton”. These describe the hotel itself rather than its amenities.
The following is how sentiments have changed over time:
Positive sentiments significantly outnumber negative sentiments, and both distributions are positively skewed.
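A sentiment-over-time count like the one described above can be sketched as follows. This is a minimal illustration rather than the original pipeline: the `reviews` tibble, its `review_date` and `text` columns, the sample rows and the choice of the Bing lexicon are all assumptions, since the source does not state how sentiment was scored.

```r
library(dplyr)
library(tidytext)

# Hypothetical sample data; column names and rows are assumptions.
reviews <- tibble(
  review_date = as.Date(c("2015-01-10", "2015-01-20", "2015-02-05")),
  text = c("Beautiful beach and friendly staff",
           "Rude staff and dirty room",
           "Amazing view, wonderful stay")
)

sentiment_by_month <- reviews %>%
  mutate(month = format(review_date, "%Y-%m")) %>%         # bucket reviews by month
  unnest_tokens(word, text) %>%                            # one lower-cased word per row
  inner_join(get_sentiments("bing"), by = "word") %>%      # keep words found in the lexicon
  count(month, sentiment)                                  # positive/negative words per month

print(sentiment_by_month)
```

Counting words per month, rather than per review, means long reviews weigh more heavily; dividing by the number of reviews in each month would be one way to normalise.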
The following are the top 30 bigrams:
# A tibble: 30 × 2
bigram n
<chr> <int>
1 rainbow tower 3567
2 hawaiian village 2909
3 hilton hawaiian 2821
4 ocean view 2332
5 diamond head 2180
6 waikiki beach 1710
7 tapa tower 1625
8 ali'i tower 1583
9 front desk 1328
10 resort fee 992
# ℹ 20 more rows
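A bigram count of this kind is commonly produced with `tidytext::unnest_tokens()`. The sketch below assumes a hypothetical `reviews` tibble with `review_id` and `text` columns and invented sample rows; the original code is not shown in the source, so any stop-word filtering it applied is not reproduced here.

```r
library(dplyr)
library(tidytext)

# Hypothetical sample data; column names and rows are assumptions.
reviews <- tibble(
  review_id = 1:3,
  text = c("Loved the Rainbow Tower ocean view",
           "Our Rainbow Tower room had an ocean view",
           "Front desk was slow")
)

bigram_counts <- reviews %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%  # n = 3 would give trigrams
  filter(!is.na(bigram)) %>%                                # drop texts too short for a bigram
  count(bigram, sort = TRUE)

top_bigrams <- bigram_counts %>%
  slice_max(n, n = 30)                                      # keeps ties by default

print(top_bigrams)
```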
The following are the top 30 trigrams:
# A tibble: 30 × 2
trigram n
<chr> <int>
1 hilton hawaiian village 2614
2 diamond head tower 575
3 partial ocean view 389
4 ala moana shopping 365
5 friday night fireworks 358
6 round table pizza 205
7 moana shopping centre 171
8 ala moana mall 147
9 front desk staff 144
10 10 minute walk 137
# ℹ 20 more rows
Summarising the context in which reviewers are referring to “lagoon”:
The resort is often described as a beautiful place to stay in Waikiki, especially for families. It is close to many amenities such as restaurants and shops. Many of the reviews mention the beach and the hotel's easy access to the waterfront. However, it can feel overpriced, and some guests say they would consider other resorts for better value in the future.
Summarising the context in which reviewers are referring to “rainbow tower”:
While the Rainbow Tower offers excellent views and access to resort amenities, guests should be prepared for high costs and occasional service issues. Most guests felt the experience was enjoyable and memorable, especially for those who value location and scenic views.
Summarising the context in which reviewers are referring to “ala moana shopping”:
The Ala Moana Shopping Center is a must-visit for anyone staying in the Waikiki area, particularly for its wide array of shops, restaurants, and easy accessibility from the Hilton Hawaiian Village.
The following are Positive & Negative word-clouds:
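Separate positive and negative word clouds can be drawn with the `wordcloud` package, as in the sketch below. The `sentiment_counts` data frame and its counts are invented for illustration and are not taken from the reviews. Note that `wordcloud()` skips, with a warning, any word it cannot fit on the page; a smaller `scale` range or a larger plotting device usually avoids this.

```r
library(wordcloud)

# Hypothetical word counts by sentiment; values are illustrative only.
sentiment_counts <- data.frame(
  word = c("beautiful", "friendly", "amazing", "expensive", "crowded", "rude"),
  n = c(40, 25, 18, 30, 22, 9),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative")
)

positive <- subset(sentiment_counts, sentiment == "positive")
negative <- subset(sentiment_counts, sentiment == "negative")

set.seed(42)  # word placement is random, so fix the seed for reproducibility

# scale = c(largest, smallest) font size; min.freq drops rarer words.
wordcloud(positive$word, positive$n, min.freq = 1,
          scale = c(3, 0.5), colors = "darkgreen")
wordcloud(negative$word, negative$n, min.freq = 1,
          scale = c(3, 0.5), colors = "firebrick")
```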
McDonald’s Reviews
Top 10 terms for each topic:
- Topic 1: This section is about the premises itself.
- Topic 2: This section is about the Drive-Thru.
- Topic 3: This section is about coffee and hot drinks.
- Topic 4: This section is about the premises and its staff.
- Topic 5: This section is about the wait times for food.
- Topic 6: This section is about their lunch offering.
- Topic 7: This section is about their breakfast offering.
- Topic 8: This section is about the quality of staff.
- Topic 9: This section is about various topics, generally the aesthetic.
- Topic 10: This section is about people having extra food, or missing food from their orders.
- Topic 11: This section is about the cleanliness of the premises.
- Topic 12: This section is about the ice-cream and shakes on offer.
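Topic labels like those above are usually assigned by reading the top terms of each fitted topic. The sketch below shows one way to obtain such terms, assuming an LDA model fitted with `topicmodels::LDA()`; the source does not say which package or settings were used. The `reviews` tibble is hypothetical, and `k = 2` is used only because the toy corpus is tiny (the report itself has 12 topics).

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical sample data; column names and rows are assumptions.
reviews <- tibble(
  doc_id = 1:4,
  text = c("drive thru order was missing fries",
           "coffee was cold and the staff were rude",
           "clean restaurant and friendly staff",
           "long wait in the drive thru for coffee")
)

# Build a document-term matrix after removing common stop words.
dtm <- reviews %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(doc_id, word) %>%
  cast_dtm(doc_id, word, n)

lda_fit <- LDA(dtm, k = 2, control = list(seed = 1234))  # the report uses 12 topics

top_terms <- terms(lda_fit, 3)  # matrix: one column per topic, top 3 terms each
print(top_terms)
```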
Topic Size, Mean Token Length, Topic Coherence and Topic Exclusivity:
topic_num  topic_size  mean_token_length  dist_from_corpus  tf_df_dist  doc_prominence  topic_coherence  topic_exclusivity
        1    779.7256                4.2         0.6028429    2.914661               6        -156.1456           9.877102
        2    688.9757                4.6         0.6075362    4.305746              14        -147.1414           9.981737
        3    686.1744                4.4         0.6048537    5.681193               6        -154.8087           9.847392
        4    702.9968                5.9         0.5993128    5.714118               7        -150.3175           9.863179
        5    638.0657                4.6         0.6038621    3.902900              25        -116.2328           9.933044
        6    707.5507                5.5         0.6160123    3.456265              32        -150.5144           9.987292
        7    771.1063                6.1         0.5985176    3.314990              12        -152.3027           9.900737
        8    665.5197                6.4         0.6101866    2.812243              10        -144.3929           9.954512
        9    712.5003                5.7         0.6122982    5.365571               8        -162.1767           9.904884
       10    701.3305                3.0         0.6071290    2.902181              12        -141.2059           9.987295
       11    790.9018                5.1         0.6019169    4.517528              14        -169.6840           9.916091
       12    783.1524                4.7         0.6208016    2.495807               8        -189.7840           9.987106
Topic Size: The largest topic by topic_size is topic 11 (Cleanliness, 790.9), while the smallest is topic 5 (Wait Times, 638.1).
Mean Token Length: The longest terms on average are in topic 8 (Quality of Staff, 6.4), while the shortest are in topic 10 (Extra or Missing Food, 3.0).
Topic Coherence: The most coherent topic is topic 5 (Wait Times, -116.2), possibly because wait-time terms tend to co-occur with numbers, while the least coherent topic is topic 12 (Ice-cream and Shakes, -189.8).
Topic Exclusivity: All topics have very similar exclusivity scores (between 9.85 and 9.99). This is likely because many topics share the same terms, such as “McDonalds” and “Location”.
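The highest and lowest topic for each metric can be read off the diagnostics table programmatically, which helps avoid misreading a wide table by eye. The sketch below re-enters four columns of the McDonald’s table in base R; `metric_extremes()` is a hypothetical helper written for this illustration. The column names resemble the output of `topicdoc::topic_diagnostics()`, though the source does not confirm which function produced the table.

```r
# Values copied from the McDonald's diagnostics table.
diagnostics <- data.frame(
  topic_num = 1:12,
  topic_size = c(779.7256, 688.9757, 686.1744, 702.9968, 638.0657, 707.5507,
                 771.1063, 665.5197, 712.5003, 701.3305, 790.9018, 783.1524),
  mean_token_length = c(4.2, 4.6, 4.4, 5.9, 4.6, 5.5,
                        6.1, 6.4, 5.7, 3.0, 5.1, 4.7),
  topic_coherence = c(-156.1456, -147.1414, -154.8087, -150.3175, -116.2328,
                      -150.5144, -152.3027, -144.3929, -162.1767, -141.2059,
                      -169.6840, -189.7840),
  topic_exclusivity = c(9.877102, 9.981737, 9.847392, 9.863179, 9.933044,
                        9.987292, 9.900737, 9.954512, 9.904884, 9.987295,
                        9.916091, 9.987106)
)

# Hypothetical helper: topic numbers holding the max and min of one metric.
metric_extremes <- function(df, metric) {
  c(highest = df$topic_num[which.max(df[[metric]])],
    lowest  = df$topic_num[which.min(df[[metric]])])
}

size_ext      <- metric_extremes(diagnostics, "topic_size")
length_ext    <- metric_extremes(diagnostics, "mean_token_length")
coherence_ext <- metric_extremes(diagnostics, "topic_coherence")
excl_range    <- range(diagnostics$topic_exclusivity)

print(rbind(size_ext, length_ext, coherence_ext))
print(excl_range)
```

The same helper can be applied to any other diagnostics table that has a `topic_num` column.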
Recommendations to McDonald’s
Based on these topics, we recommend placing greater emphasis on staff, because the terms associated with staff are generally negative, such as “worst”, “rude” and “horrible”. This reflects poorly on the business and could be addressed through more staff training and a more rigorous interview process.
Similarly, we recommend a stronger focus on cleanliness. This may also be related to staffing, so cleanliness may improve as staff performance improves.
PDC (Professional Darts Corporation) Comments
The following are the most frequently occurring words in comments:
Here we can see that Gerwyn Price is mentioned significantly more often than Gary Anderson. This was due to Price's behaviour during the game, which caused a stir in the darting community.
Frequency of Positive and Negative Sentiments:
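Here we can see that the vast majority of sentiments are negative. This aligns with the previous bar chart of player-name frequency.
The following are the top 10 bigrams. Thirteen rows are shown because several bigrams tie at n = 7, and the “NA NA” row is most likely an artifact of comments too short to form a bigram rather than a real bigram: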
# A tibble: 13 × 2
bigram n
<chr> <int>
1 NA NA 43
2 gary anderson 22
3 gerwyn price 22
4 darts player 10
5 rugby player 10
6 sore loser 10
7 gameover ns1gz 9
8 darts match 7
9 eric bristow 7
10 grand slam 7
11 hate price 7
12 rent free 7
13 world championship 7
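The “NA NA” row and the extra rows can both be handled in the counting step. The sketch below assumes the bigrams were split into `word1` and `word2` columns and re-joined with `tidyr::unite()`, which is a common tidytext pattern but is not confirmed by the source; the sample rows are invented.

```r
library(dplyr)
library(tidyr)

# Hypothetical separated bigrams; NA rows stand in for very short comments.
bigrams_separated <- tibble(
  word1 = c("gary", "gary", "gerwyn", "gerwyn", "sore", "hate", NA, NA),
  word2 = c("anderson", "anderson", "price", "price", "loser", "price", NA, NA)
)

bigram_counts <- bigrams_separated %>%
  filter(!is.na(word1), !is.na(word2)) %>%   # without this, unite() yields "NA NA"
  unite(bigram, word1, word2, sep = " ") %>%
  count(bigram, sort = TRUE)

# slice_max() keeps ties by default, so it can return more rows than requested.
top_with_ties    <- bigram_counts %>% slice_max(n, n = 3)
top_without_ties <- bigram_counts %>% slice_max(n, n = 3, with_ties = FALSE)

print(top_with_ties)
print(top_without_ties)
```

Keeping ties is often the more honest choice for small counts, since the cut-off between equally frequent bigrams would otherwise be arbitrary.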
Wordcloud for Positive & Negative Sentiments:
- Topic 1: This section is about the players.
- Topic 2: This section is about Gary Anderson.
- Topic 3: This section is negative comments about Gerwyn Price.
- Topic 4: This section is about shots.
- Topic 5: This section is about Gerwyn Price.
- Topic 6: This section is about Gerwyn Price.
- Topic 7: This section is about darts as a sport.
- Topic 8: This section is about the players.
- Topic 9: This section is about Price.
- Topic 10: This section is about the match itself.
- Topic 11: This section is about the world championships.
- Topic 12: This section is about Gerwyn Price.
Topic Size, Mean Token Length, Topic Coherence and Topic Exclusivity:
topic_num  topic_size  mean_token_length  dist_from_corpus  tf_df_dist  doc_prominence  topic_coherence  topic_exclusivity
        1    224.0378                5.9         0.5060513   1.8903695               1        -149.1469           9.845839
        2    240.9044                5.2         0.5146160   1.3673107               0        -180.0638           9.819651
        3    266.3303                5.1         0.5417097   1.0622111               0        -185.5597           9.866684
        4    260.7487                3.8         0.5423954   1.0377930               2        -185.2446           9.824622
        5    230.8146                6.3         0.5064391   1.4403794               0        -188.5819           9.888964
        6    226.4763                4.8         0.5120326   1.7074386               0        -180.0251           9.868008
        7    247.5478                4.9         0.5250443   0.5570867               0        -205.0634           9.761256
        8    256.3129                4.8         0.5258618   1.1894412               0        -200.4252           9.845263
        9    258.9091                5.6         0.5107810   1.5692879               0        -193.1287           9.890726
       10    258.7263                5.2         0.5199350   0.8925300               0        -187.5346           9.823945
       11    238.9732                4.8         0.5047046   1.5034089               1        -175.2336           9.840425
       12    261.2186                4.2         0.4987187   1.5353122               0        -199.0026           9.739584
From the above we can see that people are frustrated about Gerwyn Price and how he played.
Topic Size: The largest topic by topic_size is topic 3 (negative comments about Gerwyn Price, 266.3), while the smallest is topic 1 (Players, 224.0).
Mean Token Length: The longest terms on average are in topic 5 (Gerwyn Price, 6.3), while the shortest are in topic 4 (Shots, 3.8).
Topic Coherence: The most coherent topic is topic 1 (Players, -149.1), possibly because its terms are often paired with numbers, while the least coherent topic is topic 7 (Darts as a sport, -205.1).
Topic Exclusivity: All topics have very similar exclusivity scores (between 9.74 and 9.89). This is likely because many topics share the same terms, such as “Price” and “Gerwyn”.