Advanced Data Analysis Assignment
Question 1 - Text and Sentiment Analysis (Hawaiian hotel reviews)
A
The most frequent terms in the reviews show what visitors discuss most often, giving a quick overview of what matters most to guests.
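A minimal sketch of counting top terms, written in Python and independent of the R pipeline that produced the tables in this report. It assumes simple lowercase tokenisation and uses a tiny hypothetical stop-word list in place of a full one.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis would use a full list.
STOP_WORDS = {"the", "a", "an", "and", "was", "were", "but", "we", "at", "to", "of", "is", "in"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def top_terms(reviews, n=10):
    """Count non-stop-word tokens across all reviews and return the n most common."""
    counts = Counter(
        token
        for review in reviews
        for token in tokenize(review)
        if token not in STOP_WORDS
    )
    return counts.most_common(n)

reviews = [
    "The pool was great and the staff were friendly.",
    "Great view, great beach, but the pool was crowded.",
]
print(top_terms(reviews, 2))
```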
B,i) Sentiment Using Bing
# A tibble: 1,044 × 2
word n
<chr> <int>
1 great 10697
2 nice 7274
3 good 6503
4 like 5513
5 well 3997
6 clean 3574
7 beautiful 3561
8 right 3282
9 best 2813
10 friendly 2753
# ℹ 1,034 more rows
# A tibble: 1,815 × 2
word n
<chr> <int>
1 expensive 2809
2 crowded 2454
3 bad 1156
4 complex 1011
5 problem 850
6 pricey 835
7 noise 790
8 disappointed 769
9 hard 729
10 cheap 575
# ℹ 1,805 more rows
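Using the Bing lexicon, I identified the most frequent positive and negative words. The positive words (great, nice, good, clean, beautiful, friendly) show that visitors often praise cleanliness, friendly staff and attractive surroundings. The negative words point to recurring complaints about cost (expensive, pricey), crowding and noise. Some negative matches are probably false positives: “complex” most likely refers to the resort complex rather than a complaint. Overall, the sentiment is more favourable than unfavourable: the top positive word, “great”, appears 10,697 times, compared with 2,809 for the top negative word, “expensive”.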
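The Bing step amounts to an inner join between review tokens and a word-to-sentiment lexicon, followed by a count per sentiment. The sketch below uses a hypothetical eight-word stand-in for the Bing lexicon; it illustrates the join, not the lexicon's actual contents.

```python
import re
from collections import Counter

# Hypothetical miniature stand-in for the Bing lexicon (word -> "positive"/"negative").
# The real Bing lexicon has several thousand entries.
BING_SAMPLE = {
    "great": "positive", "nice": "positive", "clean": "positive",
    "friendly": "positive", "expensive": "negative", "crowded": "negative",
    "noise": "negative", "pricey": "negative",
}

def sentiment_word_counts(reviews, lexicon):
    """Join tokens with the lexicon and count each matched word under its sentiment."""
    counts = {"positive": Counter(), "negative": Counter()}
    for review in reviews:
        for token in re.findall(r"[a-z']+", review.lower()):
            label = lexicon.get(token)
            if label is not None:  # tokens absent from the lexicon are dropped, as in an inner join
                counts[label][token] += 1
    return counts

reviews = [
    "Great location and friendly staff, but expensive parking.",
    "Nice room, great view, very crowded pool.",
]
counts = sentiment_word_counts(reviews, BING_SAMPLE)
print(counts["positive"].most_common(1))
print(sum(counts["positive"].values()), "positive vs", sum(counts["negative"].values()), "negative")
```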
B,ii)
# A tibble: 133,682 × 4
review_date id word sentiment
<chr> <dbl> <chr> <chr>
1 21/03/2002 1 awesome positive
2 21/03/2002 1 beautiful positive
3 21/03/2002 1 worth positive
4 21/03/2002 1 entertain positive
5 21/03/2002 1 spacious positive
6 21/03/2002 1 comfortable positive
7 21/03/2002 1 clean positive
8 21/03/2002 1 free positive
9 21/03/2002 1 expensive negative
10 21/03/2002 1 annoyed negative
# ℹ 133,672 more rows
# A tibble: 184 × 3
# Groups: block [92]
block sentiment n
<dbl> <chr> <int>
1 0 negative 425
2 0 positive 1116
3 1 negative 516
4 1 positive 1346
5 2 negative 755
6 2 positive 1716
7 3 negative 713
8 3 positive 1535
9 4 negative 706
10 4 positive 1587
# ℹ 174 more rows
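The second table counts Bing-tagged words per block of reviews. How the blocks were defined is not stated here; this sketch assumes consecutive review ids are grouped by integer division with a hypothetical `block_size`, and adds a net score (positive minus negative) per block.

```python
from collections import defaultdict

def sentiment_by_block(tagged_words, block_size):
    """Count positive/negative words per block of consecutive review ids.

    tagged_words: iterable of (review_id, word, sentiment) tuples, such as the
    result of joining tokens with the Bing lexicon.
    block_size: review ids per block (hypothetical; the value used in the
    analysis is not stated).
    Returns {block: (positive, negative, net)} sorted by block number.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for review_id, _word, sentiment in tagged_words:
        counts[review_id // block_size][sentiment] += 1
    return {
        block: (c["positive"], c["negative"], c["positive"] - c["negative"])
        for block, c in sorted(counts.items())
    }

tagged = [
    (1, "awesome", "positive"), (1, "expensive", "negative"), (2, "clean", "positive"),
    (5, "crowded", "negative"), (6, "nice", "positive"),
]
print(sentiment_by_block(tagged, block_size=5))
```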
C,i) Sentiment with the NRC lexicon
# A tibble: 13,872 × 2
word sentiment
<chr> <chr>
1 abacus trust
2 abandon fear
3 abandon negative
4 abandon sadness
5 abandoned anger
6 abandoned fear
7 abandoned negative
8 abandoned sadness
9 abandonment anger
10 abandonment fear
# ℹ 13,862 more rows
# A tibble: 10 × 2
sentiment n
<chr> <int>
1 anger 1245
2 anticipation 837
3 disgust 1056
4 fear 1474
5 joy 687
6 negative 3316
7 positive 2308
8 sadness 1187
9 surprise 532
10 trust 1230
# A tibble: 468,106 × 4
review_date id word sentiment
<chr> <dbl> <chr> <chr>
1 21/03/2002 1 time anticipation
2 21/03/2002 1 tower positive
3 21/03/2002 1 forget negative
4 21/03/2002 1 mountain anticipation
5 21/03/2002 1 diamond joy
6 21/03/2002 1 diamond positive
7 21/03/2002 1 beach joy
8 21/03/2002 1 beautiful joy
9 21/03/2002 1 beautiful positive
10 21/03/2002 1 blue sadness
# ℹ 468,096 more rows
# A tibble: 100 × 3
# Groups: sentiment [10]
sentiment word n
<chr> <chr> <int>
1 joy beach 14167
2 positive tower 12737
3 positive pool 7882
4 anticipation time 7081
5 joy food 4844
6 positive food 4844
7 trust food 4844
8 joy clean 3574
9 positive clean 3574
10 trust clean 3574
# ℹ 90 more rows
# A tibble: 10 × 3
# Groups: sentiment [1]
sentiment word n
<chr> <chr> <int>
1 joy beach 14167
2 joy food 4844
3 joy clean 3574
4 joy beautiful 3561
5 joy diamond 3025
6 joy shopping 2977
7 joy friendly 2753
8 joy found 1994
9 joy helpful 1898
10 joy vacation 1876
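Unlike Bing, the NRC lexicon can assign one word to several categories, which is why “food” (4,844) and “clean” (3,574) appear with identical counts under joy, positive and trust. The sketch shows this multi-label join using a hypothetical miniature lexicon and made-up word counts.

```python
from collections import Counter, defaultdict

# Hypothetical miniature NRC-style lexicon: one word can map to several categories.
NRC_SAMPLE = [
    ("beach", "joy"), ("food", "joy"), ("food", "positive"), ("food", "trust"),
    ("clean", "joy"), ("clean", "positive"), ("clean", "trust"), ("time", "anticipation"),
]

def top_words_per_sentiment(word_counts, lexicon, n=3):
    """Join word frequencies with a multi-label lexicon and rank words within each category."""
    by_sentiment = defaultdict(Counter)
    for word, sentiment in lexicon:
        if word in word_counts:
            # The word's full count is credited to every category it belongs to.
            by_sentiment[sentiment][word] = word_counts[word]
    return {s: c.most_common(n) for s, c in by_sentiment.items()}

word_counts = Counter({"beach": 14, "food": 5, "clean": 3, "time": 7, "pool": 8})
result = top_words_per_sentiment(word_counts, NRC_SAMPLE)
print(result["joy"])
print(result["trust"])
```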
D Top 30 bigrams
# A tibble: 304,236 × 3
review_date id bigram
<chr> <dbl> <chr>
1 21/03/2002 1 time staying
2 21/03/2002 1 ocean view
3 21/03/2002 1 24th floor
4 21/03/2002 1 31st floor
5 21/03/2002 1 lanai balcony
6 21/03/2002 1 diamond head
7 21/03/2002 1 head beach
8 21/03/2002 1 beautiful blue
9 21/03/2002 1 blue ocean
10 21/03/2002 1 worth staying
# ℹ 304,226 more rows
# A tibble: 130,344 × 2
bigram n
<chr> <int>
1 rainbow tower 3567
2 hawaiian village 2909
3 hilton hawaiian 2823
4 ocean view 2332
5 diamond head 2182
6 waikiki beach 1710
7 tapa tower 1625
8 ali'i tower 1584
9 front desk 1330
10 resort fee 992
# ℹ 130,334 more rows
# A tibble: 30 × 2
bigram n
<chr> <int>
1 rainbow tower 3567
2 hawaiian village 2909
3 hilton hawaiian 2823
4 ocean view 2332
5 diamond head 2182
6 waikiki beach 1710
7 tapa tower 1625
8 ali'i tower 1584
9 front desk 1330
10 resort fee 992
# ℹ 20 more rows
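One common way to obtain bigrams like these is to pair adjacent tokens and then discard any pair that contains a stop word. The sketch assumes that approach and a tiny hypothetical stop-word list; the exact tokenisation behind the table may differ.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "at", "and", "we", "in", "was", "of", "to", "have"}

def bigrams(text):
    """Return adjacent word pairs, dropping any pair that contains a stop word."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [
        f"{w1} {w2}"
        for w1, w2 in zip(tokens, tokens[1:])
        if w1 not in STOP_WORDS and w2 not in STOP_WORDS
    ]

def top_bigrams(reviews, n=30):
    """Count bigrams across all reviews and return the n most common."""
    counts = Counter(b for review in reviews for b in bigrams(review))
    return counts.most_common(n)

reviews = [
    "We loved the Rainbow Tower and the ocean view.",
    "Rainbow Tower rooms have an ocean view of Diamond Head.",
]
print(top_bigrams(reviews, 2))
```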
E Top 30 most frequently occurring trigrams
# A tibble: 95,007 × 3
review_date id trigram
<chr> <dbl> <chr>
1 21/03/2002 1 diamond head beach
2 21/03/2002 1 beautiful blue ocean
3 21/03/2002 1 water coffee tea
4 21/03/2002 1 tiny palm size
5 21/03/2002 1 palm size bottle
6 02/08/2002 2 hilton hawaiian village
7 02/08/2002 2 bit overpriced relative
8 02/08/2002 2 mai tai bar
9 02/08/2002 2 choose outrigger waikiki
10 02/08/2002 2 hilton hawaiian village
# ℹ 94,997 more rows
# A tibble: 73,652 × 2
trigram n
<chr> <int>
1 hilton hawaiian village 2616
2 diamond head tower 576
3 partial ocean view 389
4 ala moana shopping 365
5 friday night fireworks 358
6 round table pizza 205
7 moana shopping centre 171
8 ala moana mall 147
9 front desk staff 145
10 10 minute walk 137
# ℹ 73,642 more rows
# A tibble: 30 × 2
trigram n
<chr> <int>
1 hilton hawaiian village 2616
2 diamond head tower 576
3 partial ocean view 389
4 ala moana shopping 365
5 friday night fireworks 358
6 round table pizza 205
7 moana shopping centre 171
8 ala moana mall 147
9 front desk staff 145
10 10 minute walk 137
# ℹ 20 more rows
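The trigram table extends the same idea to three-word sequences. This sketch generalises to any n, keeps digits so that phrases such as “10 minute walk” survive, and again relies on a hypothetical stop-word list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "at", "and", "we", "is", "it", "to", "of", "from", "was"}

def ngrams(text, n):
    """Return every run of n consecutive tokens that contains no stop word."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    windows = zip(*(tokens[i:] for i in range(n)))  # (t0..t(n-1)), (t1..tn), ...
    return [" ".join(g) for g in windows if not any(t in STOP_WORDS for t in g)]

def top_ngrams(reviews, n, k=30):
    """Count n-grams across all reviews and return the k most common."""
    counts = Counter(g for review in reviews for g in ngrams(review, n))
    return counts.most_common(k)

reviews = [
    "We stayed at the Hilton Hawaiian Village, a 10 minute walk from Ala Moana.",
    "Hilton Hawaiian Village is close to the Ala Moana shopping centre.",
]
print(top_ngrams(reviews, 3, k=1))
```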
F,i) Find all reviews that contain the word “lagoon”
# A tibble: 2,706 × 3
review_date id review
<chr> <dbl> <chr>
1 17/06/2003 20 "Stayed at HHV on recent June trip to Hawaii. I am an owne…
2 15/07/2003 24 "Great stay at Hilton Hawaiin Village1. Spent 6 nights the…
3 11/10/2003 44 "We made reservations 3 months in advanced for an ocean vi…
4 20/11/2003 54 "I goto Hawai'i twice a year, and every time I go, I stay …
5 14/12/2003 59 "Since we frequently travel with our young children (2 and…
6 20/02/2004 72 "We stayed at the Hilton Hawaiian Village the week prior t…
7 16/04/2004 93 "Just returned from a nine night stay at the Lagoon Tower …
8 13/06/2004 112 "It was PARADISE! Our family stayed at the Hilton Hawaiian…
9 20/06/2004 116 "We honeymooned in Hawaii for two weeks the first being at…
10 29/06/2004 121 "We booked a partial ocean view room for $205 and were all…
# ℹ 2,696 more rows
# A tibble: 2,952 × 3
review_date id review
<chr> <dbl> <chr>
1 06/02/2003 9 "Loved the hotel and the staff. Had a upper floor room in …
2 23/02/2003 11 "We stayed at the Rainbow Tower and the view was amazing! …
3 24/07/2003 26 "We just returned from a 7 day, 6 night stay at the Hilton…
4 12/08/2003 31 "Our Hawaii Family vacation (July 28th, 2003) to Oahu incl…
5 16/08/2003 32 "Our dream vacation at the Hilton Hawaiin on June 22 was n…
6 10/09/2003 38 "My husband and I just returned from the wonderful island …
7 11/10/2003 44 "We made reservations 3 months in advanced for an ocean vi…
8 30/11/2003 57 "My husband and I enjoyed our first three days of our hone…
9 14/12/2003 59 "Since we frequently travel with our young children (2 and…
10 24/12/2003 60 "In Dec'02, I stayed at the HHV for 2 weeks. I stayed at t…
# ℹ 2,942 more rows
# A tibble: 362 × 3
review_date id review
<chr> <dbl> <chr>
1 10/09/2003 38 "My husband and I just returned from the wonderful island …
2 04/03/2004 82 "I won our holiday in a competition with a local radio sta…
3 11/07/2004 124 "Stayed at the Hilton Hawaiian Village from 7/3/04-7/9/04 …
4 08/05/2005 258 "My Husband and I just came back from HHV after staying fo…
5 14/07/2005 287 "My wife, two boys (12 & 16) and I stayed at in the Ali'i …
6 01/08/2005 300 "My family and I stayed at HHV for our first trip to Hawai…
7 06/10/2005 352 "pros: hotel right on beach! this is not so common on Waik…
8 07/11/2005 367 "We stayed in the Ali'i tower - definately a good move. Fr…
9 26/12/2005 391 "My husband, myself and our 10 year old son just returned …
10 17/01/2006 404 "We just returned...good trip. We have a 14, 11 yr old plu…
# ℹ 352 more rows
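Finding reviews that mention a word is a simple text filter. This sketch matches case-insensitively and lets you choose between whole-word and substring matching. Which one the analysis used is not stated, and the choice changes the counts (a substring match also catches “lagoons”). The review texts and ids below are made up.

```python
import re

def reviews_containing(reviews, word, whole_word=True):
    """Return (id, review) pairs whose text mentions `word`, ignoring case.

    With whole_word=True the match uses word boundaries, so "lagoon" does not
    match "lagoons"; with whole_word=False any substring match counts.
    """
    pattern = rf"\b{re.escape(word)}\b" if whole_word else re.escape(word)
    regex = re.compile(pattern, re.IGNORECASE)
    return [(rid, text) for rid, text in reviews if regex.search(text)]

reviews = [
    (1, "Just returned from a nine night stay at the Lagoon Tower."),
    (2, "The lagoons next to the hotel were calm."),
    (3, "Great pool, great beach."),
]
print([rid for rid, _ in reviews_containing(reviews, "lagoon")])
print([rid for rid, _ in reviews_containing(reviews, "lagoon", whole_word=False)])
```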
G Word clouds for positive and negative words (Bing)
# A tibble: 2,825 × 3
word sentiment n
<chr> <chr> <int>
1 nice positive 7274
2 clean positive 3574
3 beautiful positive 3561
4 expensive negative 2809
5 friendly positive 2753
6 free positive 2564
7 crowded negative 2454
8 recommend positive 2355
9 loved positive 2052
10 amazing positive 1940
# ℹ 2,815 more rows
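A comparison-style word cloud (for example `comparison.cloud()` in the R `wordcloud` package) is drawn from a word-by-sentiment count matrix. The sketch pivots the (word, sentiment, n) rows above into that shape; the helper names are hypothetical and only the counts come from the table.

```python
def comparison_matrix(rows):
    """Pivot (word, sentiment, n) rows into {word: {"positive": n, "negative": n}}.

    Word/sentiment combinations that never occur are filled with 0.
    """
    matrix = {}
    for word, sentiment, n in rows:
        matrix.setdefault(word, {"positive": 0, "negative": 0})[sentiment] += n
    return matrix

def top_per_sentiment(matrix, sentiment, k):
    """Words with the highest count for one sentiment, e.g. to size one side of the cloud."""
    ranked = sorted(
        ((word, counts[sentiment]) for word, counts in matrix.items() if counts[sentiment] > 0),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]

# Counts taken from the first rows of the table above.
rows = [
    ("nice", "positive", 7274), ("clean", "positive", 3574), ("beautiful", "positive", 3561),
    ("expensive", "negative", 2809), ("friendly", "positive", 2753),
    ("free", "positive", 2564), ("crowded", "negative", 2454),
]
m = comparison_matrix(rows)
print(top_per_sentiment(m, "negative", 2))
```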
Question 2 - Topic Modelling Analysis (mcdonalds_reviews.csv)
A
# A tibble: 49,825 × 2
id word
<dbl> <chr>
1 1 huge
2 1 mcds
3 1 lover
4 1 worst
5 1 filthy
6 1 inside
7 1 drive
8 1 completely
9 1 screw
10 1 time
# ℹ 49,815 more rows
# A tibble: 43,352 × 3
id word n
<dbl> <chr> <int>
1 245 mcdonald's 14
2 856 north 12
3 1223 mcdonald's 12
4 742 coffee 11
5 684 window 10
6 1174 price 10
7 245 mcwrap 9
8 246 mcdonald's 9
9 400 breakfast 9
10 742 burned 9
# ℹ 43,342 more rows
<<DocumentTermMatrix (documents: 1525, terms: 8612)>>
Non-/sparse entries: 43352/13089948
Sparsity : 100%
Maximal term length: 22
Weighting : term frequency (tf)
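The reported sparsity follows directly from the matrix summary: 1 - 43,352 / (1,525 × 8,612) = 1 - 43,352 / 13,133,300 ≈ 0.9967, i.e. about 99.67%, which the summary displays rounded to 100%. The non-sparse and sparse entries (43,352 + 13,089,948) add up to the same 13,133,300 cells. The sketch checks this arithmetic and shows a minimal dictionary-based sparse DTM built from (id, word, n) triples; it is an illustration, not the `tm` implementation.

```python
from collections import defaultdict

def build_dtm(triples):
    """Build a sparse document-term matrix from (doc_id, term, count) triples.

    Stored as {doc_id: {term: count}}, so only non-zero cells take memory.
    """
    dtm = defaultdict(dict)
    for doc_id, term, count in triples:
        dtm[doc_id][term] = dtm[doc_id].get(term, 0) + count
    return dict(dtm)

def sparsity(n_docs, n_terms, n_nonzero):
    """Fraction of document-term cells that are zero."""
    return 1 - n_nonzero / (n_docs * n_terms)

# Figures from the McDonald's DTM summary: 1525 documents, 8612 terms, 43352 non-zero cells.
s = sparsity(1525, 8612, 43352)
print(f"exact: {s:.4%}, rounded: {round(s * 100)}%")

toy = build_dtm([(1, "huge", 1), (1, "mcds", 1), (2, "coffee", 2), (1, "huge", 1)])
print(toy)
```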
B Create the LDA model (Collapsed Gibbs, seed 1234, k ≥ 10)
A LDA_Gibbs topic model with 12 topics.
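“Collapsed Gibbs” means the topic-word and document-topic distributions are integrated out, and only each token's topic assignment is resampled, conditioned on all the other assignments. A minimal pure-Python sampler is sketched below for illustration. The hyperparameters and the toy corpus in the usage example are hypothetical, and Python's random number generator differs from R's, so seed 1234 here will not reproduce the 12-topic model above.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha, beta, n_iter=200, seed=1234):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids in range(vocab_size).
    Returns (theta, phi): per-document topic proportions and per-topic word
    probabilities, both estimated from the counts of the final sample.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # tokens of topic k in document d
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # times word w is assigned to topic k
    n_k = [0] * n_topics                                # total tokens assigned to topic k
    z = []                                              # current topic of every token

    # Initialise: give each token a uniformly random topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Record the new assignment and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta) for w in range(vocab_size)]
        for t in topics
    ]
    return theta, phi

# Hypothetical toy corpus: word ids index into this vocabulary.
vocab = ["burger", "fries", "slow", "wait", "drive"]
docs = [[0, 1, 0, 1], [2, 3, 4, 3], [0, 1, 1], [3, 4, 2, 4]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=len(vocab), alpha=0.1, beta=0.01, n_iter=100)
for t, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)[:2]
    print(f"topic {t}:", [vocab[w] for w in top])
```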
C,i) Visually
C,ii) Numerically
By reading the reviews most strongly associated with a chosen topic (e.g., “service speed”), we see that customers often mention long waiting times or slow drive-thru service. From this, McDonald's can take actions such as:
Improving staffing during busy periods
Optimising drive-thru processes
Improving order accuracy and speed
Each topic points to a specific area where McDonald's can make decisions and improvements.
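Choosing which reviews to read for a topic, and which terms describe it, can be done from the two fitted matrices: per-topic word probabilities (often called beta) and per-document topic proportions (often called gamma). The sketch assumes both are available as plain nested lists; the vocabulary and values are hypothetical.

```python
def top_terms_per_topic(beta, vocab, n=5):
    """For each topic row of beta (word probabilities), return the n highest-probability terms."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)[:n]]
        for row in beta
    ]

def top_docs_for_topic(gamma, topic, n=3):
    """Return row indices of the n documents with the highest proportion of `topic`."""
    return sorted(range(len(gamma)), key=lambda d: gamma[d][topic], reverse=True)[:n]

vocab = ["burger", "fries", "slow", "wait", "drive"]
# Hypothetical fitted values for a 2-topic model (each row sums to 1).
beta = [[0.45, 0.40, 0.08, 0.04, 0.03],
        [0.02, 0.03, 0.30, 0.40, 0.25]]
gamma = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.05, 0.95]]

print(top_terms_per_topic(beta, vocab, 3))
print(top_docs_for_topic(gamma, topic=1, n=2))  # reviews to read for the "slow service" topic
```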