Data Assignment

Author

Zoe Keating

Assessment 3 - Text & Sentiment Analysis

Q1

a)

b)

i.

ii.

  • Positive sentiment counts are much higher than negative counts, peaking above 1,500.

  • Positive sentiment counts start high and then trend downward.

  • Negative sentiment counts are spread fairly evenly across the blocks, with some peaks, but never exceed 1,000.

  • From this, we can tell that the reviews for the Hawaii hotel are mostly positive.
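
The per-block counts discussed above can be reproduced in outline as follows. This is only a minimal sketch: the toy `reviews` data, the `block_size` value and the choice of the Bing lexicon are assumptions for illustration, not details taken from the assignment data.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data: one row per review, numbered in date order
reviews <- tibble(
  id = 1:4,
  review = c("nice clean room",
             "crowded and expensive",
             "beautiful friendly staff",
             "amazing view but expensive")
)

block_size <- 2  # assumed; a real analysis would use far larger blocks

block_sentiment <- reviews %>%
  unnest_tokens(word, review) %>%                         # one word per row
  inner_join(get_sentiments("bing"), by = "word") %>%     # keep only lexicon words
  mutate(block = (id - 1) %/% block_size + 1) %>%         # group reviews into blocks
  count(block, sentiment)                                 # positive/negative per block

block_sentiment
```

Plotting `n` against `block`, split by `sentiment`, would give the kind of chart the bullet points describe.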

c)

d)

# A tibble: 130,344 × 2
   bigram               n
   <chr>            <int>
 1 rainbow tower     3567
 2 hawaiian village  2909
 3 hilton hawaiian   2823
 4 ocean view        2332
 5 diamond head      2182
 6 waikiki beach     1710
 7 tapa tower        1625
 8 ali'i tower       1584
 9 front desk        1330
10 resort fee         992
11 walking distance   973
12 friday night       934
13 abc store          914
14 ala moana          894
15 kalia tower        894
16 hilton honors      714
17 ocean front        648
18 head tower         581
19 highly recommend   580
20 abc stores         539
21 super pool         517
22 minute walk        485
23 alii tower         476
24 tropics bar        459
25 customer service   450
26 partial ocean      437
27 private pool       424
28 north shore        422
29 breakfast buffet   412
30 moana shopping     390
31 ice cream          384
32 pearl harbor       381
33 night fireworks    375
34 time share         375
35 ali tower          373
36 double beds        347
37 short walk         337
38 live music         308
39 lounge chairs      283
40 beach chairs       279
# ℹ 130,304 more rows
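
A bigram table like the one above could be built roughly as sketched here. The toy `reviews` data is hypothetical, and removing stop words before counting is an assumption about how the table was produced.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical toy reviews
reviews <- tibble(
  id = 1:3,
  review = c("Stayed in the Rainbow Tower with an ocean view",
             "The Rainbow Tower was great",
             "Ocean view from the Tapa Tower")
)

bigram_counts <- reviews %>%
  unnest_tokens(bigram, review, token = "ngrams", n = 2) %>%    # split into word pairs
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,                           # drop pairs containing
         !word2 %in% stop_words$word) %>%                       # a stop word
  unite(bigram, word1, word2, sep = " ") %>%
  count(bigram, sort = TRUE)                                    # most frequent first

bigram_counts
```

Changing `n = 2` to `n = 3` and separating into three word columns gives the trigram version.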

e)

# A tibble: 73,652 × 2
   trigram                      n
   <chr>                    <int>
 1 hilton hawaiian village   2616
 2 diamond head tower         576
 3 partial ocean view         389
 4 ala moana shopping         365
 5 friday night fireworks     358
 6 round table pizza          205
 7 moana shopping centre      171
 8 ala moana mall             147
 9 front desk staff           145
10 10 minute walk             137
11 hawaiian village waikiki   136
12 15 minute walk             135
13 moana shopping center      126
14 wailana coffee house       121
15 day resort fee             115
16 village waikiki beach      109
17 waikiki beach resort       108
18 hawaiian village resort     95
19 rainbow tower ocean         90
20 2 double beds               89
21 king size bed               88
22 facing diamond head         85
23 tropics bar grill           81
24 hilton hawaiin village      79
25 diamond head view           75
26 daily resort fee            70
27 hawaii 5 0                  70
28 30 resort fee               69
29 easy walking distance       69
30 ice cream shop              69
31 ocean front view            68
32 hilton grand vacations      65
33 15 min walk                 59
34 5 minute walk               56
35 tower ocean front           56
36 20 minute walk              55
37 moana shopping mall         55
38 ala moana center            54
39 tower ocean view            54
40 2 abc stores                52
# ℹ 73,612 more rows

Reviewers refer to the lagoon as the beach near the hotel.

f)

i.

# A tibble: 2,952 × 3
   review_date    id review                                                     
   <chr>       <dbl> <chr>                                                      
 1 06/02/2003      9 "Loved the hotel and the staff. Had a upper floor room in …
 2 23/02/2003     11 "We stayed at the Rainbow Tower and the view was amazing! …
 3 24/07/2003     26 "We just returned from a 7 day, 6 night stay at the Hilton…
 4 12/08/2003     31 "Our Hawaii Family vacation (July 28th, 2003) to Oahu incl…
 5 16/08/2003     32 "Our dream vacation at the Hilton Hawaiin on June 22 was n…
 6 10/09/2003     38 "My husband and I just returned from the wonderful island …
 7 11/10/2003     44 "We made reservations 3 months in advanced for an ocean vi…
 8 30/11/2003     57 "My husband and I enjoyed our first three days of our hone…
 9 14/12/2003     59 "Since we frequently travel with our young children (2 and…
10 24/12/2003     60 "In Dec'02, I stayed at the HHV for 2 weeks. I stayed at t…
# ℹ 2,942 more rows

The tower is one of the buildings that house some of the hotel's accommodation.

It seems to be a more premium type of accommodation with great views.
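
Reading the reviews that mention a phrase is one way to work out what it refers to. The sketch below shows one possible filter; the toy data and the search phrase "rainbow tower" are assumptions for illustration only.

```r
library(dplyr)
library(stringr)

# Hypothetical toy reviews
reviews <- tibble(
  id = 1:3,
  review = c("We stayed at the Rainbow Tower and the view was amazing!",
             "The breakfast buffet was expensive.",
             "Upper floor room in the rainbow tower, great ocean view.")
)

# Keep only reviews that mention the phrase, ignoring case
tower_reviews <- reviews %>%
  filter(str_detect(review, regex("rainbow tower", ignore_case = TRUE)))

tower_reviews
```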

ii.

# A tibble: 362 × 3
   review_date    id review                                                     
   <chr>       <dbl> <chr>                                                      
 1 10/09/2003     38 "My husband and I just returned from the wonderful island …
 2 04/03/2004     82 "I won our holiday in a competition with a local radio sta…
 3 11/07/2004    124 "Stayed at the Hilton Hawaiian Village from 7/3/04-7/9/04 …
 4 08/05/2005    258 "My Husband and I just came back from HHV after staying fo…
 5 14/07/2005    287 "My wife, two boys (12 & 16) and I stayed at in the Ali'i …
 6 01/08/2005    300 "My family and I stayed at HHV for our first trip to Hawai…
 7 06/10/2005    352 "pros: hotel right on beach! this is not so common on Waik…
 8 07/11/2005    367 "We stayed in the Ali'i tower - definately a good move. Fr…
 9 26/12/2005    391 "My husband, myself and our 10 year old son just returned …
10 17/01/2006    404 "We just returned...good trip. We have a 14, 11 yr old plu…
# ℹ 352 more rows

The Ala Moana shopping centre is a 10-minute walk from the hotel and has many shops and a food court.

iii.

# A tibble: 2,825 × 3
   word      sentiment     n
   <chr>     <chr>     <int>
 1 nice      positive   7274
 2 clean     positive   3574
 3 beautiful positive   3561
 4 expensive negative   2809
 5 friendly  positive   2753
 6 free      positive   2564
 7 crowded   negative   2454
 8 recommend positive   2355
 9 loved     positive   2052
10 amazing   positive   1940
# ℹ 2,815 more rows

g)

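library(dplyr)         # filter()
library(wordcloud)     # wordcloud()
library(RColorBrewer)  # brewer.pal()

# Keep only the positive-sentiment words
hr_pos_sentiments <- filter(hr_word_sentiments, sentiment == "positive")

# Plot words that occur at least 1,000 times, sized by frequency
wordcloud(hr_pos_sentiments$word, 
          hr_pos_sentiments$n, 
          min.freq = 1000, 
          colors = brewer.pal(8, "Set2"))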

Q2

a)

b)

# A tibble: 49,825 × 2
      id word      
   <dbl> <chr>     
 1     1 huge      
 2     1 mcds      
 3     1 lover     
 4     1 worst     
 5     1 filthy    
 6     1 inside    
 7     1 drive     
 8     1 completely
 9     1 screw     
10     1 time      
# ℹ 49,815 more rows
# A tibble: 43,352 × 3
      id word           n
   <dbl> <chr>      <int>
 1   245 mcdonald's    14
 2   856 north         12
 3  1223 mcdonald's    12
 4   742 coffee        11
 5   684 window        10
 6  1174 price         10
 7   245 mcwrap         9
 8   246 mcdonald's     9
 9   400 breakfast      9
10   742 burned         9
# ℹ 43,342 more rows
<<DocumentTermMatrix (documents: 1525, terms: 8612)>>
Non-/sparse entries: 43352/13089948
Sparsity           : 100%
Maximal term length: 22
Weighting          : term frequency (tf)
A LDA_Gibbs topic model with 12 topics.
# A tibble: 103,344 × 3
   topic term            beta
   <int> <chr>          <dbl>
 1     1 mcdonald's 0.00553  
 2     2 mcdonald's 0.000211 
 3     3 mcdonald's 0.0000200
 4     4 mcdonald's 0.151    
 5     5 mcdonald's 0.0000203
 6     6 mcdonald's 0.0000200
 7     7 mcdonald's 0.0000207
 8     8 mcdonald's 0.0000205
 9     9 mcdonald's 0.0000205
10    10 mcdonald's 0.0000185
# ℹ 103,334 more rows
# A tibble: 120 × 3
   topic term          beta
   <int> <chr>        <dbl>
 1     1 food       0.169  
 2     1 fast       0.0474 
 3     1 friendly   0.0182 
 4     1 quick      0.0149 
 5     1 lunch      0.0121 
 6     1 love       0.0102 
 7     1 mcd's      0.00941
 8     1 stop       0.00880
 9     1 restaurant 0.00819
10     1 mickey     0.00798
# ℹ 110 more rows

c)

i) ii)

# Visualise the top terms for each topic using a bar chart
# (reorder_within() and scale_y_reordered() come from tidytext)
mr_lda_top_terms %>%
  mutate(term = reorder_within(term, beta, topic)) %>%
  ggplot(aes(beta, term, fill = as.factor(topic))) +
    geom_col(show.legend = FALSE) +
    scale_y_reordered() +
    labs(title = "Top 10 terms in each LDA topic", x = expression(beta), y = NULL) +
    facet_wrap(~ topic, ncol = 5, scales = "free")

# Assess the quality of the topics numerically
# (topic_diagnostics() comes from the topicdoc package)
topic_quality <- topic_diagnostics(mr_lda, mr_dtm)

  • Topic 1: Food quality and service at McDonald’s. Words like “food,” “fast,” “friendly,” “quick,” and “lunch” point to fast, positive customer experiences.

  • Topic 2: Waiting time. Words like “minutes,” “line,” “time,” “wait,” and “customers” point to concerns about timing and delays.

  • Topic 3: Drive-thru service. Words like “drive,” “window,” “inside,” and “car” relate to drive-thru experiences.

  • Topic 4: General reviews of McDonald’s. Words like “location,” “nice,” “busy,” and “stars” reflect customers commenting on their overall satisfaction.

  • Topic 5: Cleanliness and customer behavior. Words like “kids,” “clean,” “dirty,” and “staff” indicate mentions of hygiene, staff interactions, and customer behavior.

  • Topic 6: Negative customer experiences with staff or service. Words like “service,” “manager,” “worst,” and “slow” point to bad experiences at McDonald’s.

  • Topic 7: Value for money and dessert items. Words like “dollar,” “menu,” and “sweet” suggest reviews about affordable food options or dessert items.

  • Topic 8: Late-night visits. Words like “night,” “late,” and “hours” reflect experiences during less busy or nighttime hours.

  • Topic 9: People, atmosphere, and parking. Words like “people,” “parking,” and “street” reflect experiences with other people, parking, and possibly the location.

  • Topic 10: Food items. Words like “fries,” “chicken,” “burger,” and “nuggets” show topics centered around popular menu items.

  • Topic 11: Missing or incorrect orders. Words like “location,” “wrong,” “home,” and “missing” point to complaints about orders or delivery issues.

  • Topic 12: Breakfast menu. Words like “coffee,” “breakfast,” “morning,” and “sausage” suggest reviews of breakfast menu items.
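
The words quoted for each topic are the terms with the highest beta (per-topic word probability). The sketch below shows one way such a list could be extracted, assuming the topicmodels and tidytext packages; the toy `word_counts` data, `k = 2`, the seed and the variable names are all hypothetical, whereas the report itself uses a 12-topic Gibbs model.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical per-review word counts (document id, word, frequency)
word_counts <- tibble(
  id   = c(1, 1, 2, 2, 3, 3, 4, 4),
  word = c("coffee", "breakfast", "fries", "burger",
           "coffee", "morning", "burger", "nuggets"),
  n    = c(2, 1, 2, 1, 1, 2, 1, 2)
)

# Cast the tidy counts into a document-term matrix
toy_dtm <- cast_dtm(word_counts, id, word, n)

# Fit a small Gibbs-sampled LDA model; k and seed are illustrative only
toy_lda <- LDA(toy_dtm, k = 2, method = "Gibbs", control = list(seed = 1234))

# Extract per-topic word probabilities and keep the top 3 terms per topic
toy_top_terms <- tidy(toy_lda, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 3, with_ties = FALSE) %>%
  ungroup()

toy_top_terms
```

With so little data the topics themselves are not meaningful; the point is only the shape of the pipeline from word counts to a top-terms table.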

Q3

# A tibble: 1,149 × 2
   word        lexicon
   <chr>       <chr>  
 1 a           SMART  
 2 a's         SMART  
 3 able        SMART  
 4 about       SMART  
 5 above       SMART  
 6 according   SMART  
 7 accordingly SMART  
 8 across      SMART  
 9 actually    SMART  
10 after       SMART  
# ℹ 1,139 more rows
# A tibble: 20 × 2
   word           n
   <chr>      <int>
 1 tays         474
 2 season       230
 3 won          176
 4 jack         152
 5 win          147
 6 heather      117
 7 pie           93
 8 scouse        92
 9 adeola        71
10 spoiler       53
11 love          48
12 deserved      43
13 blocker       42
14 it’s          38
15 locked        38
16 final         37
17 footasylum    37
18 happy         36
19 watch         34
20 congrats      33

# A tibble: 2 × 2
  sentiment     n
  <chr>     <int>
1 positive    851
2 negative    372

# A tibble: 2,431 × 4
   parent_comment author                id bigram          
   <chr>          <chr>              <dbl> <chr>           
 1 <NA>           @ensee.                1 jack robbed     
 2 <NA>           @bobbifranklin432      2 ebery penny     
 3 <NA>           @bobbifranklin432      2 genuinely lovely
 4 <NA>           @bobbifranklin432      2 lovely guy      
 5 <NA>           @bobbifranklin432      2 glad tays       
 6 <NA>           @bobbifranklin432      2 tays won        
 7 <NA>           @bobbifranklin432      2 seasons love    
 8 <NA>           @khabebomallie9467     3 na scouse       
 9 <NA>           @berry7191             4 journey loved   
10 <NA>           @therealrantroom       6 dry lips        
# ℹ 2,421 more rows
# A tibble: 1,693 × 2
   bigram                   n
   <chr>                <int>
 1 NA NA                  214
 2 tays won                83
 3 spoiler blocker         41
 4 tays wins               18
 5 charlie domingo         16
 6 scouse mali             15
 7 season 6                13
 8 congrats tays           12
 9 congratulations tays    10
10 happy tays              10
11 tays deserved           10
12 loose rizz               9
13 rizz podcast             8
14 tays win                 8
15 tays winning             8
16 can’t wait               7
17 funniest moment          7
18 amazing season           6
19 fair play                6
20 top 5                    6
21 day 1                    5
22 didn’t win               5
23 favourite season         5
24 gonna miss               5
25 hope tays                5
26 sad it’s                 5
27 season 5                 5
28 tays heather             5
29 14 days                  4
30 2 weeks                  4
31 didnt win                4
32 don’t read               4
33 footasylum cooked        4
34 it’s upside              4
35 i’ll miss                4
36 jack wins                4
37 love tays                4
38 season 1                 4
39 should've won            4
40 tays carried             4
# ℹ 1,653 more rows
# A tibble: 1,094 × 4
   parent_comment author               id trigram                
   <chr>          <chr>             <dbl> <chr>                  
 1 <NA>           @ensee.               1 NA NA NA               
 2 <NA>           @bobbifranklin432     2 genuinely lovely guy   
 3 <NA>           @bobbifranklin432     2 glad tays won          
 4 <NA>           @elliekoleosho        9 NA NA NA               
 5 <NA>           @Bwoii786            15 max’s season hands     
 6 <NA>           @harissahab2743      16 loose rizz podcast     
 7 <NA>           @harissahab2743      16 rizz podcast charlie   
 8 <NA>           @harissahab2743      16 podcast charlie demingo
 9 <NA>           @babatunde1966       19 didd ava cry           
10 <NA>           @Lenny-y8z           22 NA NA NA               
# ℹ 1,084 more rows
# A tibble: 603 × 2
   trigram                        n
   <chr>                      <int>
 1 NA NA NA                     461
 2 loose rizz podcast             8
 3 happy tays won                 6
 4 hope tays wins                 4
 5 glad tays won                  3
 6 hugging tays heather           3
 7 hope jack wins                 2
 8 im guessing adeola             2
 9 indieplayssthen don’t read     2
10 jack 2 tays                    2
11 jack didn’t win                2
12 pie didnt win                  2
13 pie should've won              2
14 scouse should’ve won           2
15 season 6 coming                2
16 season footasylum cooked       2
17 tays won btw                   2
18 whys adeola 5th                2
19 00 tays realising              1
20 05 love tays                   1
21 09 love house                  1
22 1 21 tays                      1
23 1 adeola scouse                1
24 1 jack 2                       1
25 1 minute late                  1
26 100 tays reminds               1
27 11.11 heathers alt             1
28 15 20 housemates               1
29 16 27 jack                     1
30 16 36 adeola's                 1
31 16 36 jack                     1
32 17 50 19                       1
33 19 05 love                     1
34 19 32 lmaoo                    1
35 1st kaci jay                   1
36 2 housemates low               1
37 2 minutes ago                  1
38 2 scouse adeola                1
39 2 tays 1                       1
40 2 tays 3                       1
# ℹ 563 more rows
# A tibble: 485 × 4
   comments                                          parent_comment author    id
   <chr>                                             <chr>          <chr>  <dbl>
 1 "I didn't cry I did so well then he said ebery p… <NA>           @bobb…     2
 2 "was either Jack or Tays for the win for me and … <NA>           @abdu…     8
 3 "honestly one of the best seasons, i'm pumped fo… <NA>           @rosi…    17
 4 "Still not a fan of public voting, tays was alwa… <NA>           @mang…    23
 5 "I think the dark horse was jack, scouse and tom… <NA>           @moha…    32
 6 "What is that song tays abd jack sung when jack.… <NA>           @ishi…    37
 7 "I’m acc so happy tays one ( one thing that did … <NA>           @Amel…    43
 8 "Go on tays glad you won ma bro \U0001f389"       <NA>           @Dann…    45
 9 "Tays realizing the check is upside down\U0001f6… <NA>           @Kuuj…    46
10 "Someone tell me how pie didnt win and tays did … <NA>           @Icom…    54
# ℹ 475 more rows
# A tibble: 159 × 4
   comments                                          parent_comment author    id
   <chr>                                             <chr>          <chr>  <dbl>
 1 "jack robbed"                                     <NA>           @ense…     1
 2 "was either Jack or Tays for the win for me and … <NA>           @abdu…     8
 3 "JACK LEAVING TO JME IS GOLDDDDDD"                <NA>           @hiin…    20
 4 "Jack leaving with a hijab on is sending me lmao… <NA>           @KB-u…    27
 5 "I think the dark horse was jack, scouse and tom… <NA>           @moha…    32
 6 "What is that song tays abd jack sung when jack.… <NA>           @ishi…    37
 7 "thats not ferrari thats a ampon - jack joceph 2… <NA>           @AyaA…    40
 8 "I wanted jack to winn when he left I paused the… <NA>           @Eves…    47
 9 "Jack's walk out was the best moment in this who… <NA>           @mike…    55
10 "i think Jacks funniest moment was getting stran… <NA>           @sime…    56
# ℹ 149 more rows
# A tibble: 124 × 4
   comments                                          parent_comment author    id
   <chr>                                             <chr>          <chr>  <dbl>
 1 "I’m acc so happy tays one ( one thing that did … <NA>           @Amel…    43
 2 "So happy tays won! Was heather happy for him, I… <NA>           @saad…    89
 3 "Can’t wait for the reunion awkwardness with Hea… <NA>           @user…    98
 4 "Heather looks to Tays is like someone who knows… <NA>           @jose…   107
 5 "Heather looking at tays like if I didn't say an… <NA>           @Keen…   111
 6 "tays and heather tension was SO AWKWARD"         <NA>           @sabr…   115
 7 "Jacks reaction to Tays and Heather was everythi… <NA>           @brum…   133
 8 "tays ignored heather in the final but then toda… <NA>           @edua…   140
 9 "48:14 That side eye from Heather... well, its a… <NA>           @Boxe…   143
10 "Heather.....hold that!!"                         <NA>           @moha…   152
# ℹ 114 more rows
# A tibble: 256 × 3
   word            sentiment     n
   <chr>           <chr>     <int>
 1 won             positive    176
 2 win             positive    147
 3 love            positive     48
 4 happy           positive     36
 5 wins            positive     32
 6 loved           positive     26
 7 winner          positive     26
 8 winning         positive     26
 9 congratulations positive     23
10 top             positive     20
# ℹ 246 more rows

# A tibble: 6,881 × 4
   parent_comment author               id word     
   <chr>          <chr>             <dbl> <chr>    
 1 <NA>           @ensee.               1 jack     
 2 <NA>           @ensee.               1 robbed   
 3 <NA>           @bobbifranklin432     2 cry      
 4 <NA>           @bobbifranklin432     2 ebery    
 5 <NA>           @bobbifranklin432     2 penny    
 6 <NA>           @bobbifranklin432     2 mum      
 7 <NA>           @bobbifranklin432     2 bawled   
 8 <NA>           @bobbifranklin432     2 eyes     
 9 <NA>           @bobbifranklin432     2 genuinely
10 <NA>           @bobbifranklin432     2 lovely   
# ℹ 6,871 more rows
# A tibble: 6,720 × 3
      id word       n
   <dbl> <chr>  <int>
 1  1762 season     8
 2    15 season     4
 3   370 tays       4
 4   849 jesus      4
 5    14 season     3
 6    57 loved      3
 7    95 season     3
 8   123 season     3
 9   432 season     3
10   432 tays       3
# ℹ 6,710 more rows
<<DocumentTermMatrix (documents: 1793, terms: 1901)>>
Non-/sparse entries: 6720/3401773
Sparsity           : 100%
Maximal term length: 59
Weighting          : term frequency (tf)
A LDA_Gibbs topic model with 12 topics.
# A tibble: 22,812 × 3
   topic term       beta
   <int> <chr>     <dbl>
 1     1 season 0.00270 
 2     2 season 0.0127  
 3     3 season 0.000129
 4     4 season 0.244   
 5     5 season 0.00415 
 6     6 season 0.000131
 7     7 season 0.000128
 8     8 season 0.00674 
 9     9 season 0.00291 
10    10 season 0.00288 
# ℹ 22,802 more rows
# A tibble: 120 × 3
   topic term         beta
   <int> <chr>       <dbl>
 1     1 tays      0.197  
 2     1 won       0.0709 
 3     1 tomisin   0.0246 
 4     1 jack      0.0194 
 5     1 mali      0.0181 
 6     1 finished  0.0117 
 7     1 honestly  0.00914
 8     1 should've 0.00785
 9     1 rooting   0.00785
10     1 agree     0.00785
# ℹ 110 more rows

   topic_num topic_size mean_token_length dist_from_corpus tf_df_dist
1          1   745.0357               5.5        0.6133915   4.340266
2          2   660.7231               4.8        0.6150681   3.715850
3          3   686.8135               4.6        0.6227291   4.365357
4          4   724.3162               5.3        0.6116201   5.263585
5          5   767.0095               5.2        0.6069927   2.100443
6          6   653.8940               6.2        0.6067371   2.750221
7          7   765.4174               3.4        0.6144320   3.175521
8          8   756.8352               5.0        0.6022709   3.506083
9          9   725.1158               5.7        0.6019310   3.952223
10        10   701.1900               5.9        0.6181650   3.753697
11        11   691.3012               5.0        0.6052477   3.477146
12        12   734.3482               5.1        0.6117773   3.788973
   doc_prominence topic_coherence topic_exclusivity
1               2       -175.1399          9.910177
2              17       -121.1465          9.946191
3              15       -142.6420          9.986594
4              12       -161.4261          9.893236
5              13       -171.7802          9.914006
6               9       -141.6627          9.985252
7              12       -145.2791          9.911277
8               8       -163.2761          9.937527
9               8       -177.4131          9.939806
10             30       -147.9750          9.977953
11             10       -147.5398          9.935434
12             24       -148.0567          9.942916