Data Assignment
Assessment 3 - Text & Sentiment Analysis
Q1
a)
b)
i.
ii.
Positive sentiment counts are much higher than negative counts, peaking above 1,500.
Positive counts start high and then trend downward.
Negative counts are spread fairly evenly across the blocks, with some peaks, but never exceed 1,000.
From this, we can tell that the reviews of the Hawaii hotel are mostly positive.
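The block-level comparison described above can be sketched as follows. This is a minimal illustration rather than the code used for the assessment: the toy `reviews` tibble, the six-word `toy_lexicon`, and the block size of two reviews are all assumptions made up for the example. A full analysis would use the complete review data and a standard lexicon such as Bing.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical toy data standing in for the hotel reviews
reviews <- tibble(
  id = 1:4,
  review = c("nice clean room", "crowded and expensive",
             "beautiful view nice staff", "friendly staff")
)

# Hypothetical mini lexicon (assumption); a real analysis would use a full lexicon
toy_lexicon <- tibble(
  word = c("nice", "clean", "beautiful", "friendly", "crowded", "expensive"),
  sentiment = c("positive", "positive", "positive", "positive",
                "negative", "negative")
)

block_sentiment <- reviews %>%
  unnest_tokens(word, review) %>%           # one row per word
  inner_join(toy_lexicon, by = "word") %>%  # keep only words found in the lexicon
  mutate(block = (id - 1) %/% 2) %>%        # assumed block size: 2 reviews per block
  count(block, sentiment) %>%               # sentiment word counts per block
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0)

block_sentiment
```

Plotting `positive` and `negative` against `block` gives the kind of chart the comments above describe.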
c)
d)
# A tibble: 130,344 × 2
bigram n
<chr> <int>
1 rainbow tower 3567
2 hawaiian village 2909
3 hilton hawaiian 2823
4 ocean view 2332
5 diamond head 2182
6 waikiki beach 1710
7 tapa tower 1625
8 ali'i tower 1584
9 front desk 1330
10 resort fee 992
11 walking distance 973
12 friday night 934
13 abc store 914
14 ala moana 894
15 kalia tower 894
16 hilton honors 714
17 ocean front 648
18 head tower 581
19 highly recommend 580
20 abc stores 539
21 super pool 517
22 minute walk 485
23 alii tower 476
24 tropics bar 459
25 customer service 450
26 partial ocean 437
27 private pool 424
28 north shore 422
29 breakfast buffet 412
30 moana shopping 390
31 ice cream 384
32 pearl harbor 381
33 night fireworks 375
34 time share 375
35 ali tower 373
36 double beds 347
37 short walk 337
38 live music 308
39 lounge chairs 283
40 beach chairs 279
# ℹ 130,304 more rows
e)
# A tibble: 73,652 × 2
trigram n
<chr> <int>
1 hilton hawaiian village 2616
2 diamond head tower 576
3 partial ocean view 389
4 ala moana shopping 365
5 friday night fireworks 358
6 round table pizza 205
7 moana shopping centre 171
8 ala moana mall 147
9 front desk staff 145
10 10 minute walk 137
11 hawaiian village waikiki 136
12 15 minute walk 135
13 moana shopping center 126
14 wailana coffee house 121
15 day resort fee 115
16 village waikiki beach 109
17 waikiki beach resort 108
18 hawaiian village resort 95
19 rainbow tower ocean 90
20 2 double beds 89
21 king size bed 88
22 facing diamond head 85
23 tropics bar grill 81
24 hilton hawaiin village 79
25 diamond head view 75
26 daily resort fee 70
27 hawaii 5 0 70
28 30 resort fee 69
29 easy walking distance 69
30 ice cream shop 69
31 ocean front view 68
32 hilton grand vacations 65
33 15 min walk 59
34 5 minute walk 56
35 tower ocean front 56
36 20 minute walk 55
37 moana shopping mall 55
38 ala moana center 54
39 tower ocean view 54
40 2 abc stores 52
# ℹ 73,612 more rows
Reviewers describe the lagoon as the beach near the hotel.
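Trigram counts like those listed above can be produced along these lines. This is a sketch under assumptions: the three toy reviews are invented for the example, and the stop-word filtering step (using the `stop_words` data set shipped with tidytext) is one common approach that may differ from what was actually run.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical toy reviews (not taken from the real data)
reviews <- tibble(
  id = 1:3,
  review = c("Stayed at the Hilton Hawaiian Village",
             "Hilton Hawaiian Village was great",
             "Partial ocean view from the room")
)

trigram_counts <- reviews %>%
  unnest_tokens(trigram, review, token = "ngrams", n = 3) %>%   # overlapping 3-word sequences
  separate(trigram, c("word1", "word2", "word3"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,                           # drop trigrams containing
         !word2 %in% stop_words$word,                           # any stop word
         !word3 %in% stop_words$word) %>%
  unite(trigram, word1, word2, word3, sep = " ") %>%
  count(trigram, sort = TRUE)                                   # most frequent first

trigram_counts
```

Setting `n = 2` and separating into two columns gives the bigram version.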
f)
i.
# A tibble: 2,952 × 3
review_date id review
<chr> <dbl> <chr>
1 06/02/2003 9 "Loved the hotel and the staff. Had a upper floor room in …
2 23/02/2003 11 "We stayed at the Rainbow Tower and the view was amazing! …
3 24/07/2003 26 "We just returned from a 7 day, 6 night stay at the Hilton…
4 12/08/2003 31 "Our Hawaii Family vacation (July 28th, 2003) to Oahu incl…
5 16/08/2003 32 "Our dream vacation at the Hilton Hawaiin on June 22 was n…
6 10/09/2003 38 "My husband and I just returned from the wonderful island …
7 11/10/2003 44 "We made reservations 3 months in advanced for an ocean vi…
8 30/11/2003 57 "My husband and I enjoyed our first three days of our hone…
9 14/12/2003 59 "Since we frequently travel with our young children (2 and…
10 24/12/2003 60 "In Dec'02, I stayed at the HHV for 2 weeks. I stayed at t…
# ℹ 2,942 more rows
The towers are buildings within the resort that house some of the accommodation.
They seem to be the more premium accommodation, with great views.
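Pulling out the reviews that mention a given term, as done for "tower" here, can be sketched like this. The toy `reviews` tibble is invented for the example, and the case-insensitive match is an assumption about how the filter was set up.

```r
library(dplyr)
library(stringr)

# Hypothetical toy reviews; the real data also has a review_date column
reviews <- tibble(
  id = c(1, 2, 3),
  review = c("We stayed in the Rainbow Tower and loved the view",
             "Breakfast buffet was crowded",
             "Our TOWER room faced Diamond Head")
)

# Case-insensitive match so "Tower", "tower" and "TOWER" are all found
tower_reviews <- reviews %>%
  filter(str_detect(review, regex("tower", ignore_case = TRUE)))

tower_reviews$id
```

The same pattern works for a phrase such as "ala moana" by changing the search string.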
ii.
# A tibble: 362 × 3
review_date id review
<chr> <dbl> <chr>
1 10/09/2003 38 "My husband and I just returned from the wonderful island …
2 04/03/2004 82 "I won our holiday in a competition with a local radio sta…
3 11/07/2004 124 "Stayed at the Hilton Hawaiian Village from 7/3/04-7/9/04 …
4 08/05/2005 258 "My Husband and I just came back from HHV after staying fo…
5 14/07/2005 287 "My wife, two boys (12 & 16) and I stayed at in the Ali'i …
6 01/08/2005 300 "My family and I stayed at HHV for our first trip to Hawai…
7 06/10/2005 352 "pros: hotel right on beach! this is not so common on Waik…
8 07/11/2005 367 "We stayed in the Ali'i tower - definately a good move. Fr…
9 26/12/2005 391 "My husband, myself and our 10 year old son just returned …
10 17/01/2006 404 "We just returned...good trip. We have a 14, 11 yr old plu…
# ℹ 352 more rows
The Ala Moana shopping centre is a 10-minute walk from the hotel and has many shops and a food court.
iii.
# A tibble: 2,825 × 3
word sentiment n
<chr> <chr> <int>
1 nice positive 7274
2 clean positive 3574
3 beautiful positive 3561
4 expensive negative 2809
5 friendly positive 2753
6 free positive 2564
7 crowded negative 2454
8 recommend positive 2355
9 loved positive 2052
10 amazing positive 1940
# ℹ 2,815 more rows
g)
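# Requires dplyr, wordcloud and RColorBrewer to be loaded
# Keep only the positive words, then plot those used at least 1,000 times
hr_pos_sentiments <- filter(hr_word_sentiments, sentiment == "positive")
wordcloud(hr_pos_sentiments$word,
          hr_pos_sentiments$n,
          min.freq = 1000,
          colors = brewer.pal(8, "Set2"))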
Q2
a)
b)
# A tibble: 49,825 × 2
id word
<dbl> <chr>
1 1 huge
2 1 mcds
3 1 lover
4 1 worst
5 1 filthy
6 1 inside
7 1 drive
8 1 completely
9 1 screw
10 1 time
# ℹ 49,815 more rows
# A tibble: 43,352 × 3
id word n
<dbl> <chr> <int>
1 245 mcdonald's 14
2 856 north 12
3 1223 mcdonald's 12
4 742 coffee 11
5 684 window 10
6 1174 price 10
7 245 mcwrap 9
8 246 mcdonald's 9
9 400 breakfast 9
10 742 burned 9
# ℹ 43,342 more rows
<<DocumentTermMatrix (documents: 1525, terms: 8612)>>
Non-/sparse entries: 43352/13089948
Sparsity : 100%
Maximal term length: 22
Weighting : term frequency (tf)
A LDA_Gibbs topic model with 12 topics.
# A tibble: 103,344 × 3
topic term beta
<int> <chr> <dbl>
1 1 mcdonald's 0.00553
2 2 mcdonald's 0.000211
3 3 mcdonald's 0.0000200
4 4 mcdonald's 0.151
5 5 mcdonald's 0.0000203
6 6 mcdonald's 0.0000200
7 7 mcdonald's 0.0000207
8 8 mcdonald's 0.0000205
9 9 mcdonald's 0.0000205
10 10 mcdonald's 0.0000185
# ℹ 103,334 more rows
# A tibble: 120 × 3
topic term beta
<int> <chr> <dbl>
1 1 food 0.169
2 1 fast 0.0474
3 1 friendly 0.0182
4 1 quick 0.0149
5 1 lunch 0.0121
6 1 love 0.0102
7 1 mcd's 0.00941
8 1 stop 0.00880
9 1 restaurant 0.00819
10 1 mickey 0.00798
# ℹ 110 more rows
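The sequence of outputs above (per-document word counts, a document-term matrix, a Gibbs LDA model, and the top terms per topic) can be reproduced in miniature as below. This is a sketch under assumptions: the `word_counts` tibble is invented, `k = 2` replaces the 12 topics used in the analysis so the toy data stays small, and the seed value is arbitrary.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical per-document word counts (same shape as the id / word / n output)
word_counts <- tibble(
  id   = c(1, 1, 2, 2, 3, 3, 4, 4),
  word = c("coffee", "breakfast", "coffee", "morning",
           "fries", "burger", "burger", "nuggets"),
  n    = c(3, 2, 2, 1, 4, 2, 3, 1)
)

# Cast the tidy counts into a DocumentTermMatrix
dtm <- cast_dtm(word_counts, document = id, term = word, value = n)

# Fit LDA with Gibbs sampling; k = 2 and the seed are assumptions for the toy data
lda <- LDA(dtm, k = 2, method = "Gibbs", control = list(seed = 1234))

# beta = per-topic word probabilities; keep the 3 highest per topic
top_terms <- tidy(lda, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 3, with_ties = FALSE) %>%
  ungroup()

top_terms
```

With 12 topics and the top 10 terms per topic, the same steps give the 120-row table shown above.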
c)
i) ii)
# Requires dplyr, ggplot2, tidytext and topicdoc to be loaded
# Visualise the top terms for each topic using a bar chart
mr_lda_top_terms %>%
  mutate(term = reorder_within(term, beta, topic)) %>%
  group_by(topic, term) %>%
  arrange(desc(beta)) %>%
  ungroup() %>%
  ggplot(aes(beta, term, fill = as.factor(topic))) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  labs(title = "Top 10 terms in each LDA topic", x = expression(beta), y = NULL) +
  facet_wrap(~ topic, ncol = 5, scales = "free")
# Assess the quality of the topics using more numerical methods
topic_quality <- topic_diagnostics(mr_lda, mr_dtm)
Topic 1: Food quality and service at McDonald’s. Words like “food,” “fast,” “friendly,” “quick,” and “lunch” point to fast, positive customer experiences.
Topic 2: Waiting time. Words like “minutes,” “line,” “time,” “wait,” and “customers” point to concerns about waiting and delays.
Topic 3: Drive-thru service. Words like “drive,” “window,” “inside,” and “car” relate to drive-thru experiences.
Topic 4: General reviews of McDonald’s. Words like “location,” “nice,” “busy,” and “stars” suggest customers commenting on their overall satisfaction.
Topic 5: Cleanliness and customer behavior. Words like “kids,” “clean,” “dirty,” and “staff” point to hygiene, staff interactions, and customer behavior.
Topic 6: Negative customer experiences with staff or service. Words like “service,” “manager,” “worst,” and “slow” point to bad experiences at McDonald’s.
Topic 7: Value for money and dessert items. Words like “dollar,” “menu,” and “sweet” suggest reviews about affordable food options or dessert items.
Topic 8: Late-night. Words like “night,” “late,” and “hours” point to experiences during less busy or nighttime hours.
Topic 9: People, atmosphere, and parking. Words like “people,” “parking,” and “street” point to experiences with other people, parking, and possibly the location.
Topic 10: Food items. Words like “fries,” “chicken,” “burger,” and “nuggets” show a topic centered on popular menu items.
Topic 11: Missing or incorrect orders. Words like “location,” “wrong,” “home,” and “missing” point to complaints about orders or delivery issues.
Topic 12: Breakfast menu. Words like “coffee,” “breakfast,” “morning,” and “sausage” suggest reviews of breakfast menu items.
Q3
# A tibble: 1,149 × 2
word lexicon
<chr> <chr>
1 a SMART
2 a's SMART
3 able SMART
4 about SMART
5 above SMART
6 according SMART
7 accordingly SMART
8 across SMART
9 actually SMART
10 after SMART
# ℹ 1,139 more rows
# A tibble: 20 × 2
word n
<chr> <int>
1 tays 474
2 season 230
3 won 176
4 jack 152
5 win 147
6 heather 117
7 pie 93
8 scouse 92
9 adeola 71
10 spoiler 53
11 love 48
12 deserved 43
13 blocker 42
14 it’s 38
15 locked 38
16 final 37
17 footasylum 37
18 happy 36
19 watch 34
20 congrats 33
# A tibble: 2 × 2
sentiment n
<chr> <int>
1 positive 851
2 negative 372
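To put the two counts above on a common scale, the share of each sentiment can be worked out directly. The snippet below simply re-enters the two counts from the table; the `sentiment_counts` name is used here for illustration only.

```r
# Counts copied from the sentiment table above
sentiment_counts <- data.frame(
  sentiment = c("positive", "negative"),
  n = c(851, 372)
)

# Share of all sentiment-bearing words that fall in each class
sentiment_counts$share <- sentiment_counts$n / sum(sentiment_counts$n)

round(sentiment_counts$share, 3)
```

Roughly 70% of the sentiment-bearing words in the comments are positive.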
# A tibble: 2,431 × 4
parent_comment author id bigram
<chr> <chr> <dbl> <chr>
1 <NA> @ensee. 1 jack robbed
2 <NA> @bobbifranklin432 2 ebery penny
3 <NA> @bobbifranklin432 2 genuinely lovely
4 <NA> @bobbifranklin432 2 lovely guy
5 <NA> @bobbifranklin432 2 glad tays
6 <NA> @bobbifranklin432 2 tays won
7 <NA> @bobbifranklin432 2 seasons love
8 <NA> @khabebomallie9467 3 na scouse
9 <NA> @berry7191 4 journey loved
10 <NA> @therealrantroom 6 dry lips
# ℹ 2,421 more rows
# A tibble: 1,693 × 2
bigram n
<chr> <int>
1 NA NA 214
2 tays won 83
3 spoiler blocker 41
4 tays wins 18
5 charlie domingo 16
6 scouse mali 15
7 season 6 13
8 congrats tays 12
9 congratulations tays 10
10 happy tays 10
11 tays deserved 10
12 loose rizz 9
13 rizz podcast 8
14 tays win 8
15 tays winning 8
16 can’t wait 7
17 funniest moment 7
18 amazing season 6
19 fair play 6
20 top 5 6
21 day 1 5
22 didn’t win 5
23 favourite season 5
24 gonna miss 5
25 hope tays 5
26 sad it’s 5
27 season 5 5
28 tays heather 5
29 14 days 4
30 2 weeks 4
31 didnt win 4
32 don’t read 4
33 footasylum cooked 4
34 it’s upside 4
35 i’ll miss 4
36 jack wins 4
37 love tays 4
38 season 1 4
39 should've won 4
40 tays carried 4
# ℹ 1,653 more rows
# A tibble: 1,094 × 4
parent_comment author id trigram
<chr> <chr> <dbl> <chr>
1 <NA> @ensee. 1 NA NA NA
2 <NA> @bobbifranklin432 2 genuinely lovely guy
3 <NA> @bobbifranklin432 2 glad tays won
4 <NA> @elliekoleosho 9 NA NA NA
5 <NA> @Bwoii786 15 max’s season hands
6 <NA> @harissahab2743 16 loose rizz podcast
7 <NA> @harissahab2743 16 rizz podcast charlie
8 <NA> @harissahab2743 16 podcast charlie demingo
9 <NA> @babatunde1966 19 didd ava cry
10 <NA> @Lenny-y8z 22 NA NA NA
# ℹ 1,084 more rows
# A tibble: 603 × 2
trigram n
<chr> <int>
1 NA NA NA 461
2 loose rizz podcast 8
3 happy tays won 6
4 hope tays wins 4
5 glad tays won 3
6 hugging tays heather 3
7 hope jack wins 2
8 im guessing adeola 2
9 indieplayssthen don’t read 2
10 jack 2 tays 2
11 jack didn’t win 2
12 pie didnt win 2
13 pie should've won 2
14 scouse should’ve won 2
15 season 6 coming 2
16 season footasylum cooked 2
17 tays won btw 2
18 whys adeola 5th 2
19 00 tays realising 1
20 05 love tays 1
21 09 love house 1
22 1 21 tays 1
23 1 adeola scouse 1
24 1 jack 2 1
25 1 minute late 1
26 100 tays reminds 1
27 11.11 heathers alt 1
28 15 20 housemates 1
29 16 27 jack 1
30 16 36 adeola's 1
31 16 36 jack 1
32 17 50 19 1
33 19 05 love 1
34 19 32 lmaoo 1
35 1st kaci jay 1
36 2 housemates low 1
37 2 minutes ago 1
38 2 scouse adeola 1
39 2 tays 1 1
40 2 tays 3 1
# ℹ 563 more rows
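The `NA NA` and `NA NA NA` rows that top the bigram and trigram counts are not real phrases. A likely cause, assuming the n-grams were built with `unnest_tokens()`, `separate()` and `unite()` (the code is not shown, so this is a guess), is that comments with too few words yield a missing n-gram, which `unite()` then pastes into a literal string. Comment 1 ("jack robbed") fits this pattern: it has a bigram but its trigram is `NA NA NA`. The sketch below uses an invented `bigrams_separated` tibble to show the effect and one way to drop such rows before counting.

```r
library(dplyr)
library(tidyr)

# Hypothetical separated bigrams; row 2 stands for a comment too short to form a bigram
bigrams_separated <- tibble(
  id    = c(1, 2, 3),
  word1 = c("tays", NA, "spoiler"),
  word2 = c("won", NA, "blocker")
)

# Default behaviour: missing words are pasted in as the text "NA"
with_na <- bigrams_separated %>%
  unite(bigram, word1, word2, sep = " ")

# Dropping incomplete rows first keeps the placeholder out of the counts
without_na <- bigrams_separated %>%
  drop_na(word1, word2) %>%
  unite(bigram, word1, word2, sep = " ")

with_na$bigram
without_na$bigram
```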
# A tibble: 485 × 4
comments parent_comment author id
<chr> <chr> <chr> <dbl>
1 "I didn't cry I did so well then he said ebery p… <NA> @bobb… 2
2 "was either Jack or Tays for the win for me and … <NA> @abdu… 8
3 "honestly one of the best seasons, i'm pumped fo… <NA> @rosi… 17
4 "Still not a fan of public voting, tays was alwa… <NA> @mang… 23
5 "I think the dark horse was jack, scouse and tom… <NA> @moha… 32
6 "What is that song tays abd jack sung when jack.… <NA> @ishi… 37
7 "I’m acc so happy tays one ( one thing that did … <NA> @Amel… 43
8 "Go on tays glad you won ma bro \U0001f389" <NA> @Dann… 45
9 "Tays realizing the check is upside down\U0001f6… <NA> @Kuuj… 46
10 "Someone tell me how pie didnt win and tays did … <NA> @Icom… 54
# ℹ 475 more rows
# A tibble: 159 × 4
comments parent_comment author id
<chr> <chr> <chr> <dbl>
1 "jack robbed" <NA> @ense… 1
2 "was either Jack or Tays for the win for me and … <NA> @abdu… 8
3 "JACK LEAVING TO JME IS GOLDDDDDD" <NA> @hiin… 20
4 "Jack leaving with a hijab on is sending me lmao… <NA> @KB-u… 27
5 "I think the dark horse was jack, scouse and tom… <NA> @moha… 32
6 "What is that song tays abd jack sung when jack.… <NA> @ishi… 37
7 "thats not ferrari thats a ampon - jack joceph 2… <NA> @AyaA… 40
8 "I wanted jack to winn when he left I paused the… <NA> @Eves… 47
9 "Jack's walk out was the best moment in this who… <NA> @mike… 55
10 "i think Jacks funniest moment was getting stran… <NA> @sime… 56
# ℹ 149 more rows
# A tibble: 124 × 4
comments parent_comment author id
<chr> <chr> <chr> <dbl>
1 "I’m acc so happy tays one ( one thing that did … <NA> @Amel… 43
2 "So happy tays won! Was heather happy for him, I… <NA> @saad… 89
3 "Can’t wait for the reunion awkwardness with Hea… <NA> @user… 98
4 "Heather looks to Tays is like someone who knows… <NA> @jose… 107
5 "Heather looking at tays like if I didn't say an… <NA> @Keen… 111
6 "tays and heather tension was SO AWKWARD" <NA> @sabr… 115
7 "Jacks reaction to Tays and Heather was everythi… <NA> @brum… 133
8 "tays ignored heather in the final but then toda… <NA> @edua… 140
9 "48:14 That side eye from Heather... well, its a… <NA> @Boxe… 143
10 "Heather.....hold that!!" <NA> @moha… 152
# ℹ 114 more rows
# A tibble: 256 × 3
word sentiment n
<chr> <chr> <int>
1 won positive 176
2 win positive 147
3 love positive 48
4 happy positive 36
5 wins positive 32
6 loved positive 26
7 winner positive 26
8 winning positive 26
9 congratulations positive 23
10 top positive 20
# ℹ 246 more rows
# A tibble: 6,881 × 4
parent_comment author id word
<chr> <chr> <dbl> <chr>
1 <NA> @ensee. 1 jack
2 <NA> @ensee. 1 robbed
3 <NA> @bobbifranklin432 2 cry
4 <NA> @bobbifranklin432 2 ebery
5 <NA> @bobbifranklin432 2 penny
6 <NA> @bobbifranklin432 2 mum
7 <NA> @bobbifranklin432 2 bawled
8 <NA> @bobbifranklin432 2 eyes
9 <NA> @bobbifranklin432 2 genuinely
10 <NA> @bobbifranklin432 2 lovely
# ℹ 6,871 more rows
# A tibble: 6,720 × 3
id word n
<dbl> <chr> <int>
1 1762 season 8
2 15 season 4
3 370 tays 4
4 849 jesus 4
5 14 season 3
6 57 loved 3
7 95 season 3
8 123 season 3
9 432 season 3
10 432 tays 3
# ℹ 6,710 more rows
<<DocumentTermMatrix (documents: 1793, terms: 1901)>>
Non-/sparse entries: 6720/3401773
Sparsity : 100%
Maximal term length: 59
Weighting : term frequency (tf)
A LDA_Gibbs topic model with 12 topics.
# A tibble: 22,812 × 3
topic term beta
<int> <chr> <dbl>
1 1 season 0.00270
2 2 season 0.0127
3 3 season 0.000129
4 4 season 0.244
5 5 season 0.00415
6 6 season 0.000131
7 7 season 0.000128
8 8 season 0.00674
9 9 season 0.00291
10 10 season 0.00288
# ℹ 22,802 more rows
# A tibble: 120 × 3
topic term beta
<int> <chr> <dbl>
1 1 tays 0.197
2 1 won 0.0709
3 1 tomisin 0.0246
4 1 jack 0.0194
5 1 mali 0.0181
6 1 finished 0.0117
7 1 honestly 0.00914
8 1 should've 0.00785
9 1 rooting 0.00785
10 1 agree 0.00785
# ℹ 110 more rows
topic_num topic_size mean_token_length dist_from_corpus tf_df_dist
1 1 745.0357 5.5 0.6133915 4.340266
2 2 660.7231 4.8 0.6150681 3.715850
3 3 686.8135 4.6 0.6227291 4.365357
4 4 724.3162 5.3 0.6116201 5.263585
5 5 767.0095 5.2 0.6069927 2.100443
6 6 653.8940 6.2 0.6067371 2.750221
7 7 765.4174 3.4 0.6144320 3.175521
8 8 756.8352 5.0 0.6022709 3.506083
9 9 725.1158 5.7 0.6019310 3.952223
10 10 701.1900 5.9 0.6181650 3.753697
11 11 691.3012 5.0 0.6052477 3.477146
12 12 734.3482 5.1 0.6117773 3.788973
doc_prominence topic_coherence topic_exclusivity
1 2 -175.1399 9.910177
2 17 -121.1465 9.946191
3 15 -142.6420 9.986594
4 12 -161.4261 9.893236
5 13 -171.7802 9.914006
6 9 -141.6627 9.985252
7 12 -145.2791 9.911277
8 8 -163.2761 9.937527
9 8 -177.4131 9.939806
10 30 -147.9750 9.977953
11 10 -147.5398 9.935434
12 24 -148.0567 9.942916
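One way to read the diagnostics above is to rank the topics by coherence. The sketch below re-enters the `topic_coherence` column from the table and sorts it, assuming the usual reading of this measure, in which values closer to zero mean that a topic's top terms co-occur more often. The `diagnostics` and `ranked` names are illustrative only.

```r
# topic_coherence values copied from the diagnostics table above
diagnostics <- data.frame(
  topic_num = 1:12,
  topic_coherence = c(-175.1399, -121.1465, -142.6420, -161.4261,
                      -171.7802, -141.6627, -145.2791, -163.2761,
                      -177.4131, -147.9750, -147.5398, -148.0567)
)

# Sort so the most coherent topics (closest to zero) come first
ranked <- diagnostics[order(diagnostics$topic_coherence, decreasing = TRUE), ]

head(ranked, 3)   # most coherent topics
tail(ranked, 1)   # least coherent topic
```

On these figures topic 2 is the most coherent and topic 9 the least, although the exclusivity scores are so close to one another that they separate the topics far less.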