Advanced Data Analysis Assignment

Author

Cephora Miburo

Question 1 - Text and Sentiment Analysis (Hawaiian hotel reviews)

A

The most frequent terms in the reviews show what visitors discuss most often. They give a quick overview of what matters most to guests.

B,i) Sentiment Using Bing

# A tibble: 1,044 × 2
   word          n
   <chr>     <int>
 1 great     10697
 2 nice       7274
 3 good       6503
 4 like       5513
 5 well       3997
 6 clean      3574
 7 beautiful  3561
 8 right      3282
 9 best       2813
10 friendly   2753
# ℹ 1,034 more rows
# A tibble: 1,815 × 2
   word             n
   <chr>        <int>
 1 expensive     2809
 2 crowded       2454
 3 bad           1156
 4 complex       1011
 5 problem        850
 6 pricey         835
 7 noise          790
 8 disappointed   769
 9 hard           729
10 cheap          575
# ℹ 1,805 more rows

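Using the Bing sentiment dictionary, I identified the most frequent positive and negative words. The positive words (for example great, nice, clean, beautiful and friendly) show that visitors commonly compliment the surroundings, the cleanliness and the friendliness of the staff. The negative words (for example expensive, pricey, crowded, noise and disappointed) point to recurring complaints about cost, crowding and noise. Some flagged words need care: “complex” may refer to the hotel complex rather than to a complaint. Overall, sentiment appears more favourable than unfavourable: the most frequent positive word, great (10,697), occurs almost four times as often as the most frequent negative word, expensive (2,809).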
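The counting step behind the two tables can be sketched in a few lines. This is a minimal Python illustration, not the code used for the report (whose output is R console output); the lexicon is a handful of words copied from the tables, and the two reviews are made up.

```python
from collections import Counter

# Toy lexicon: a few words taken from the tables above, not the full Bing list.
LEXICON = {
    "great": "positive", "nice": "positive", "clean": "positive",
    "friendly": "positive", "expensive": "negative", "crowded": "negative",
    "noise": "negative",
}

def sentiment_word_counts(reviews, lexicon=LEXICON):
    """Count lexicon words across all reviews, split by sentiment label."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text in reviews:
        for token in text.lower().split():
            word = token.strip(".,!?;:\"'()")  # drop surrounding punctuation
            label = lexicon.get(word)
            if label is not None:
                counts[label][word] += 1
    return counts

# Hypothetical reviews, made up for illustration.
reviews = [
    "Great view and friendly staff, but expensive.",
    "Nice pool. Crowded beach, great food!",
]
counts = sentiment_word_counts(reviews)
print(counts["positive"].most_common(3))
print(counts["negative"].most_common(3))
```

Words that are not in the lexicon are simply ignored, which is why the choice of dictionary drives the result.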

B,ii)

# A tibble: 133,682 × 4
   review_date    id word        sentiment
   <chr>       <dbl> <chr>       <chr>    
 1 21/03/2002      1 awesome     positive 
 2 21/03/2002      1 beautiful   positive 
 3 21/03/2002      1 worth       positive 
 4 21/03/2002      1 entertain   positive 
 5 21/03/2002      1 spacious    positive 
 6 21/03/2002      1 comfortable positive 
 7 21/03/2002      1 clean       positive 
 8 21/03/2002      1 free        positive 
 9 21/03/2002      1 expensive   negative 
10 21/03/2002      1 annoyed     negative 
# ℹ 133,672 more rows
# A tibble: 184 × 3
# Groups:   block [92]
   block sentiment     n
   <dbl> <chr>     <int>
 1     0 negative    425
 2     0 positive   1116
 3     1 negative    516
 4     1 positive   1346
 5     2 negative    755
 6     2 positive   1716
 7     3 negative    713
 8     3 positive   1535
 9     4 negative    706
10     4 positive   1587
# ℹ 174 more rows
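The second table groups sentiment words into blocks of reviews. A minimal Python sketch of that grouping follows, assuming blocks are formed by integer division of the review id; the block size of 100 is hypothetical, since the report does not state it. The net sentiment per block (positive minus negative) is then computed from the first two blocks of the table.

```python
from collections import defaultdict

BLOCK_SIZE = 100  # hypothetical: the report does not state the block size

def sentiment_by_block(rows, block_size=BLOCK_SIZE):
    """rows: iterable of (review_id, sentiment) pairs.

    Returns {block: {"positive": n, "negative": n}}.
    """
    table = defaultdict(lambda: {"positive": 0, "negative": 0})
    for review_id, sentiment in rows:
        block = review_id // block_size  # consecutive ids share a block
        table[block][sentiment] += 1
    return dict(table)

def net_sentiment(table):
    """Positive minus negative word count for each block."""
    return {b: c["positive"] - c["negative"] for b, c in sorted(table.items())}

# Made-up (id, sentiment) rows to show the grouping.
rows = [(1, "positive"), (1, "negative"), (99, "positive"),
        (100, "positive"), (250, "negative")]
grouped = sentiment_by_block(rows)

# Counts for blocks 0 and 1 as reported in the table above.
reported = {0: {"negative": 425, "positive": 1116},
            1: {"negative": 516, "positive": 1346}}
print(net_sentiment(reported))  # {0: 691, 1: 830}
```

A positive net value for every block would indicate that favourable words outnumber unfavourable ones throughout the review history.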

C,i) Sentiment with NRC dictionary

# A tibble: 13,872 × 2
   word        sentiment
   <chr>       <chr>    
 1 abacus      trust    
 2 abandon     fear     
 3 abandon     negative 
 4 abandon     sadness  
 5 abandoned   anger    
 6 abandoned   fear     
 7 abandoned   negative 
 8 abandoned   sadness  
 9 abandonment anger    
10 abandonment fear     
# ℹ 13,862 more rows
# A tibble: 10 × 2
   sentiment        n
   <chr>        <int>
 1 anger         1245
 2 anticipation   837
 3 disgust       1056
 4 fear          1474
 5 joy            687
 6 negative      3316
 7 positive      2308
 8 sadness       1187
 9 surprise       532
10 trust         1230
# A tibble: 468,106 × 4
   review_date    id word      sentiment   
   <chr>       <dbl> <chr>     <chr>       
 1 21/03/2002      1 time      anticipation
 2 21/03/2002      1 tower     positive    
 3 21/03/2002      1 forget    negative    
 4 21/03/2002      1 mountain  anticipation
 5 21/03/2002      1 diamond   joy         
 6 21/03/2002      1 diamond   positive    
 7 21/03/2002      1 beach     joy         
 8 21/03/2002      1 beautiful joy         
 9 21/03/2002      1 beautiful positive    
10 21/03/2002      1 blue      sadness     
# ℹ 468,096 more rows
# A tibble: 100 × 3
# Groups:   sentiment [10]
   sentiment    word      n
   <chr>        <chr> <int>
 1 joy          beach 14167
 2 positive     tower 12737
 3 positive     pool   7882
 4 anticipation time   7081
 5 joy          food   4844
 6 positive     food   4844
 7 trust        food   4844
 8 joy          clean  3574
 9 positive     clean  3574
10 trust        clean  3574
# ℹ 90 more rows
# A tibble: 10 × 3
# Groups:   sentiment [1]
   sentiment word          n
   <chr>     <chr>     <int>
 1 joy       beach     14167
 2 joy       food       4844
 3 joy       clean      3574
 4 joy       beautiful  3561
 5 joy       diamond    3025
 6 joy       shopping   2977
 7 joy       friendly   2753
 8 joy       found      1994
 9 joy       helpful    1898
10 joy       vacation   1876
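In the output above, food and clean appear with identical counts under joy, positive and trust. This happens because one word can carry several labels in the NRC dictionary, so every occurrence is counted once per label. The Python sketch below illustrates this with only the word-label pairs visible in the output; the token list is made up.

```python
from collections import Counter

# Subset of word-to-label pairs visible in the output above.
NRC_SUBSET = {
    "beach": ["joy"],
    "food": ["joy", "positive", "trust"],
    "clean": ["joy", "positive", "trust"],
    "time": ["anticipation"],
}

def emotion_word_counts(tokens, lexicon=NRC_SUBSET):
    """Count (label, word) pairs; a word with several labels counts under each."""
    counts = Counter()
    for word in tokens:
        for label in lexicon.get(word, []):
            counts[(label, word)] += 1
    return counts

# Made-up tokens; "lobby" is not in the subset and is ignored.
tokens = ["beach", "food", "food", "clean", "lobby"]
counts = emotion_word_counts(tokens)
print(counts[("joy", "food")], counts[("trust", "food")])  # 2 2
```

Because of this one-to-many matching, totals summed across labels overstate the number of distinct word occurrences.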

D Top 30 bigrams

# A tibble: 304,236 × 3
   review_date    id bigram        
   <chr>       <dbl> <chr>         
 1 21/03/2002      1 time staying  
 2 21/03/2002      1 ocean view    
 3 21/03/2002      1 24th floor    
 4 21/03/2002      1 31st floor    
 5 21/03/2002      1 lanai balcony 
 6 21/03/2002      1 diamond head  
 7 21/03/2002      1 head beach    
 8 21/03/2002      1 beautiful blue
 9 21/03/2002      1 blue ocean    
10 21/03/2002      1 worth staying 
# ℹ 304,226 more rows
# A tibble: 130,344 × 2
   bigram               n
   <chr>            <int>
 1 rainbow tower     3567
 2 hawaiian village  2909
 3 hilton hawaiian   2823
 4 ocean view        2332
 5 diamond head      2182
 6 waikiki beach     1710
 7 tapa tower        1625
 8 ali'i tower       1584
 9 front desk        1330
10 resort fee         992
# ℹ 130,334 more rows
# A tibble: 30 × 2
   bigram               n
   <chr>            <int>
 1 rainbow tower     3567
 2 hawaiian village  2909
 3 hilton hawaiian   2823
 4 ocean view        2332
 5 diamond head      2182
 6 waikiki beach     1710
 7 tapa tower        1625
 8 ali'i tower       1584
 9 front desk        1330
10 resort fee         992
# ℹ 20 more rows
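A minimal Python sketch of n-gram counting is given below. It assumes that n-grams containing a stop word are dropped, which is one common approach and not necessarily the one used for the report; the stop-word list and the reviews are made up. Setting n=3 gives trigrams.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list, not a full one.
STOP_WORDS = {"the", "a", "an", "of", "and", "at", "in", "to",
              "was", "is", "we", "had"}

def ngrams(text, n=2, stop_words=STOP_WORDS):
    """Return the n-grams of `text` that contain no stop word."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    grams = zip(*(tokens[i:] for i in range(n)))  # sliding window of size n
    return [" ".join(g) for g in grams if not any(w in stop_words for w in g)]

# Hypothetical reviews, made up for illustration.
reviews = [
    "We had an ocean view in the Rainbow Tower.",
    "The Rainbow Tower was close to Waikiki Beach.",
]
bigram_counts = Counter(g for r in reviews for g in ngrams(r, n=2))
print(bigram_counts.most_common(1))  # [('rainbow tower', 2)]
print(ngrams("Hilton Hawaiian Village", n=3))
```

Place names dominate the real output for the same reason they do here: they are fixed multi-word expressions that recur across many reviews.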

E Top 30 most frequently occurring trigrams

# A tibble: 95,007 × 3
   review_date    id trigram                 
   <chr>       <dbl> <chr>                   
 1 21/03/2002      1 diamond head beach      
 2 21/03/2002      1 beautiful blue ocean    
 3 21/03/2002      1 water coffee tea        
 4 21/03/2002      1 tiny palm size          
 5 21/03/2002      1 palm size bottle        
 6 02/08/2002      2 hilton hawaiian village 
 7 02/08/2002      2 bit overpriced relative 
 8 02/08/2002      2 mai tai bar             
 9 02/08/2002      2 choose outrigger waikiki
10 02/08/2002      2 hilton hawaiian village 
# ℹ 94,997 more rows
# A tibble: 73,652 × 2
   trigram                     n
   <chr>                   <int>
 1 hilton hawaiian village  2616
 2 diamond head tower        576
 3 partial ocean view        389
 4 ala moana shopping        365
 5 friday night fireworks    358
 6 round table pizza         205
 7 moana shopping centre     171
 8 ala moana mall            147
 9 front desk staff          145
10 10 minute walk            137
# ℹ 73,642 more rows
# A tibble: 30 × 2
   trigram                     n
   <chr>                   <int>
 1 hilton hawaiian village  2616
 2 diamond head tower        576
 3 partial ocean view        389
 4 ala moana shopping        365
 5 friday night fireworks    358
 6 round table pizza         205
 7 moana shopping centre     171
 8 ala moana mall            147
 9 front desk staff          145
10 10 minute walk            137
# ℹ 20 more rows

F,i) Find all reviews that contain the word “lagoon”

# A tibble: 2,706 × 3
   review_date    id review                                                     
   <chr>       <dbl> <chr>                                                      
 1 17/06/2003     20 "Stayed at HHV on recent June trip to Hawaii. I am an owne…
 2 15/07/2003     24 "Great stay at Hilton Hawaiin Village1. Spent 6 nights the…
 3 11/10/2003     44 "We made reservations 3 months in advanced for an ocean vi…
 4 20/11/2003     54 "I goto Hawai'i twice a year, and every time I go, I stay …
 5 14/12/2003     59 "Since we frequently travel with our young children (2 and…
 6 20/02/2004     72 "We stayed at the Hilton Hawaiian Village the week prior t…
 7 16/04/2004     93 "Just returned from a nine night stay at the Lagoon Tower …
 8 13/06/2004    112 "It was PARADISE! Our family stayed at the Hilton Hawaiian…
 9 20/06/2004    116 "We honeymooned in Hawaii for two weeks the first being at…
10 29/06/2004    121 "We booked a partial ocean view room for $205 and were all…
# ℹ 2,696 more rows
# A tibble: 2,952 × 3
   review_date    id review                                                     
   <chr>       <dbl> <chr>                                                      
 1 06/02/2003      9 "Loved the hotel and the staff. Had a upper floor room in …
 2 23/02/2003     11 "We stayed at the Rainbow Tower and the view was amazing! …
 3 24/07/2003     26 "We just returned from a 7 day, 6 night stay at the Hilton…
 4 12/08/2003     31 "Our Hawaii Family vacation (July 28th, 2003) to Oahu incl…
 5 16/08/2003     32 "Our dream vacation at the Hilton Hawaiin on June 22 was n…
 6 10/09/2003     38 "My husband and I just returned from the wonderful island …
 7 11/10/2003     44 "We made reservations 3 months in advanced for an ocean vi…
 8 30/11/2003     57 "My husband and I enjoyed our first three days of our hone…
 9 14/12/2003     59 "Since we frequently travel with our young children (2 and…
10 24/12/2003     60 "In Dec'02, I stayed at the HHV for 2 weeks. I stayed at t…
# ℹ 2,942 more rows
# A tibble: 362 × 3
   review_date    id review                                                     
   <chr>       <dbl> <chr>                                                      
 1 10/09/2003     38 "My husband and I just returned from the wonderful island …
 2 04/03/2004     82 "I won our holiday in a competition with a local radio sta…
 3 11/07/2004    124 "Stayed at the Hilton Hawaiian Village from 7/3/04-7/9/04 …
 4 08/05/2005    258 "My Husband and I just came back from HHV after staying fo…
 5 14/07/2005    287 "My wife, two boys (12 & 16) and I stayed at in the Ali'i …
 6 01/08/2005    300 "My family and I stayed at HHV for our first trip to Hawai…
 7 06/10/2005    352 "pros: hotel right on beach! this is not so common on Waik…
 8 07/11/2005    367 "We stayed in the Ali'i tower - definately a good move. Fr…
 9 26/12/2005    391 "My husband, myself and our 10 year old son just returned …
10 17/01/2006    404 "We just returned...good trip. We have a 14, 11 yr old plu…
# ℹ 352 more rows
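Filtering reviews by keyword can be sketched as follows in Python. The sketch assumes a case-insensitive whole-word match; the report does not say whether whole words or substrings were matched, and the reviews are made up.

```python
import re

def reviews_containing(reviews, word):
    """Return the reviews whose text contains `word` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [r for r in reviews if pattern.search(r["review"])]

# Hypothetical reviews, made up for illustration.
reviews = [
    {"id": 1, "review": "Stayed at the Lagoon Tower."},
    {"id": 2, "review": "Great pool and beach."},
    {"id": 3, "review": "The lagoons were calm."},
]
matches = reviews_containing(reviews, "lagoon")
print([r["id"] for r in matches])  # [1]
```

With a whole-word match, "lagoons" is excluded; dropping the `\b` anchors would turn this into a substring match and also return review 3, so the choice changes the row count.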

G Word clouds for Positive & Negative words (Bing)

# A tibble: 2,825 × 3
   word      sentiment     n
   <chr>     <chr>     <int>
 1 nice      positive   7274
 2 clean     positive   3574
 3 beautiful positive   3561
 4 expensive negative   2809
 5 friendly  positive   2753
 6 free      positive   2564
 7 crowded   negative   2454
 8 recommend positive   2355
 9 loved     positive   2052
10 amazing   positive   1940
# ℹ 2,815 more rows

Question 2 - Topic Modelling Analysis (mcdonalds_reviews.csv)

A

# A tibble: 49,825 × 2
      id word      
   <dbl> <chr>     
 1     1 huge      
 2     1 mcds      
 3     1 lover     
 4     1 worst     
 5     1 filthy    
 6     1 inside    
 7     1 drive     
 8     1 completely
 9     1 screw     
10     1 time      
# ℹ 49,815 more rows
# A tibble: 43,352 × 3
      id word           n
   <dbl> <chr>      <int>
 1   245 mcdonald's    14
 2   856 north         12
 3  1223 mcdonald's    12
 4   742 coffee        11
 5   684 window        10
 6  1174 price         10
 7   245 mcwrap         9
 8   246 mcdonald's     9
 9   400 breakfast      9
10   742 burned         9
# ℹ 43,342 more rows
<<DocumentTermMatrix (documents: 1525, terms: 8612)>>
Non-/sparse entries: 43352/13089948
Sparsity           : 100%
Maximal term length: 22
Weighting          : term frequency (tf)
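The reported sparsity of 100% is a rounded figure. The short Python calculation below reproduces it from the numbers printed in the summary: the share of empty cells is about 99.67%, which rounds to 100%.

```python
# Figures taken from the DocumentTermMatrix summary above.
documents, terms = 1525, 8612
non_sparse = 43352

total_cells = documents * terms        # every document-term combination
sparse = total_cells - non_sparse      # cells with a zero count
sparsity = sparse / total_cells

print(total_cells)          # 13133300
print(sparse)               # 13089948
print(f"{sparsity:.2%}")    # 99.67%
```

In other words, each review uses only a tiny fraction of the 8,612 distinct terms, which is typical for short texts.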

B Create the LDA model (Collapsed Gibbs, seed 1234, k ≥ 10)

A LDA_Gibbs topic model with 12 topics.
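To show what collapsed Gibbs sampling does, here is a minimal Python sketch on toy data. It is not the fitting code behind the model above, which appears to come from an R topic-modelling package. The seed 1234 follows the heading; the hyperparameters alpha and beta, the iteration count, the two topics and the four tiny documents are all assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=1234):
    """Collapsed Gibbs sampling for LDA on a list of token lists.

    Returns (doc_topic, topic_word) count tables.
    alpha, beta and iters are illustrative defaults, not the report's settings.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    v = len(vocab)
    doc_topic = [[0] * k for _ in docs]                        # n(doc, topic)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(k)]   # n(topic, word)
    topic_total = [0] * k                                      # n(topic)
    assign = []                                                # topic of each token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z = [rng.randrange(k) for _ in doc]
        assign.append(z)
        for w, t in zip(doc, z):
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta) / (topic_total[j] + v * beta)
                    for j in range(k)
                ]
                # Sample a new topic and add the token back.
                t = rng.choices(range(k), weights=weights)[0]
                assign[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return doc_topic, topic_word

# Made-up tokenised reviews.
docs = [
    ["slow", "drive", "thru", "wait"],
    ["cold", "fries", "burger"],
    ["wait", "slow", "service", "drive"],
    ["burger", "fries", "cold", "food"],
]
doc_topic, topic_word = lda_gibbs(docs, k=2)
print(doc_topic)
```

Fixing the seed makes the run reproducible, which is why the assignment specifies one. Normalising the two count tables gives per-document topic shares and per-topic word probabilities.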

C,i) Visually

C,ii) Numerically

By reading reviews within a chosen topic (e.g., “service speed”), we see that customers often mention long waiting times or slow drive-thru service. From this, McDonald's can take actions such as:

Improving staffing during busy periods

Optimising drive-thru processes

Improving order accuracy and speed

Each topic points to a specific area where McDonald's can make decisions and improvements.
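One way to choose which reviews to read for a topic is to rank documents by their share of that topic. The Python sketch below assumes per-document topic proportions are available from the fitted model; the ids and proportions shown are hypothetical.

```python
def top_documents(doc_topic_share, topic, n=3):
    """Return the ids of the n documents with the largest share of `topic`."""
    ranked = sorted(doc_topic_share.items(),
                    key=lambda kv: kv[1][topic], reverse=True)
    return [doc_id for doc_id, _ in ranked[:n]]

# Hypothetical per-document topic proportions (one value per topic).
shares = {
    101: [0.80, 0.15, 0.05],
    102: [0.10, 0.70, 0.20],
    103: [0.55, 0.30, 0.15],
    104: [0.05, 0.05, 0.90],
}
print(top_documents(shares, topic=0, n=2))  # [101, 103]
```

Reading the highest-ranked reviews for each topic is what turns a list of top words into a label such as “service speed”.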