ADA Final Project

Question 1 – Text and Sentiment Analysis

A

B

Fast is a frequently used word in positive reviews, which is good for a chain such as McDonald’s, where getting the food out quickly is an important selling point. Other positive terms such as friendly, clean, sweet and pretty show that it is not just the food that brings people back, but also the staff, the service and the building.

Worst, bad and wrong are strongly negative terms that point to problems with how staff prepare the food. This information can be used to train staff and improve the overall customer experience.
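
The comparison of frequent words in positive and negative reviews can be sketched as follows. This is a minimal illustration in Python, not the code used for the project: the reviews, their sentiment labels, and the tiny stop-word list are all hypothetical and exist only to show the counting step.

```python
from collections import Counter
import re

# Hypothetical toy reviews labelled by sentiment; not taken from the project data.
reviews = [
    ("positive", "Fast service and friendly staff, very clean."),
    ("positive", "Fast drive-thru and the staff were friendly."),
    ("negative", "Worst visit ever, the order was wrong."),
    ("negative", "Bad service and the wrong order again."),
]

# A deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"and", "the", "was", "were", "very", "ever", "again"}


def top_terms(rows, label, n=3):
    """Count non-stop-word tokens in the reviews that carry the given label."""
    counts = Counter()
    for sentiment, text in rows:
        if sentiment != label:
            continue
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


positive_top = top_terms(reviews, "positive")
negative_top = top_terms(reviews, "negative")
print(positive_top)
print(negative_top)
```

On real data, the same counts would feed a word cloud or bar chart for each sentiment class.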

D

E

F

F.i

“Waiting” is frequently brought up in the reviews in connection with delays or subpar service at the McDonald’s restaurants being reviewed. Whether in-store or at the drive-thru, reviewers complain about unreasonably long wait times for both food and service. Many customers express displeasure with the overall experience after waiting for five to twenty minutes without any support or communication from the staff.

F.ii

In the reviews, the “Shamrock Shake” draws a mix of emotions. Since the Shamrock Shake is a seasonal drink, many reviewers talk about how excited they are for it, especially around St. Patrick’s Day. Some say they miss or like the shake, while others say they were disappointed when the place they went to didn’t have it or didn’t make it properly. The shake’s flavour and texture are criticised; some reviewers say it’s too sugary, artificial, or badly mixed, which leaves them unhappy. Others complain that staff members are unaware of the shake’s availability or that locations are running out of it.

F.iii

The McDonald’s “ice cream machine” is commonly cited in these reviews as a source of annoyance and discontent. Reviewers frequently describe the machine as “down,” “locked,” or inaccessible, especially later in the evening or during late-night trips. Customers complain that the machine frequently breaks down without any explanation being given, which means they miss out on items made with ice cream, such as cones or sundaes.

Long wait times, variable food quality, and bad customer service are among the other grievances, in addition to the problems with the ice cream machine.
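
Findings like those in F.i to F.iii typically come from reading each phrase in its surrounding text. A minimal keyword-in-context sketch in Python is shown below. The function name `keyword_in_context`, the `window` parameter, and the sample reviews are hypothetical and are not part of the project data or code.

```python
import re


def keyword_in_context(texts, phrase, window=30):
    """Return a snippet of up to `window` characters either side of each match."""
    pattern = re.compile(re.escape(phrase), flags=re.IGNORECASE)
    snippets = []
    for text in texts:
        for match in pattern.finditer(text):
            start = max(match.start() - window, 0)
            end = min(match.end() + window, len(text))
            snippets.append(text[start:end].strip())
    return snippets


# Hypothetical example reviews, written for illustration only.
sample_reviews = [
    "We were left waiting for twenty minutes at the drive-thru.",
    "The ice cream machine was down again, so no sundae.",
    "Quick service today.",
]

waiting_hits = keyword_in_context(sample_reviews, "waiting")
machine_hits = keyword_in_context(sample_reviews, "ice cream machine")
print(waiting_hits)
print(machine_hits)
```

Reading the returned snippets side by side makes it easier to tell whether a phrase appears in complaints, praise, or both.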

G

Question 2 – Topic Modelling Analysis

<<DocumentTermMatrix (documents: 4682, terms: 9600)>>
Non-/sparse entries: 67601/44879599
Sparsity           : 100%
Maximal term length: 27
Weighting          : term frequency (tf)
# A tibble: 101 × 3
   topic term         beta
   <int> <chr>       <dbl>
 1     1 amazing    0.0357
 2     1 buy        0.0314
 3     1 controller 0.0299
 4     1 recommend  0.0279
 5     1 love       0.0221
 6     1 switch     0.0217
 7     1 awesome    0.0199
 8     1 xbox       0.0199
 9     1 fan        0.0186
10     1 feel       0.0141
# ℹ 91 more rows
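
The “Sparsity : 100%” line in the document-term matrix summary is a rounded figure. The short Python check below recomputes it from the printed counts; only the numbers shown in the summary are used, and the variable names are chosen for illustration.

```python
# Counts copied from the DocumentTermMatrix summary above.
documents, terms = 4682, 9600
non_sparse, sparse = 67601, 44879599

# Every cell of the matrix is either zero (sparse) or non-zero.
total_cells = documents * terms
assert total_cells == non_sparse + sparse

sparsity = sparse / total_cells
terms_per_doc = non_sparse / documents

print(f"sparsity: {sparsity:.4%}")  # 99.8496%
print(f"average distinct terms per document: {terms_per_doc:.1f}")  # 14.4
```

So roughly 0.15% of the cells are non-zero, and each document contains about 14 distinct terms on average after preprocessing.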

  1. Likely focuses on gaming consoles and accessories, particularly positive experiences with Xbox and Switch. Words like “amazing,” “controller,” and “recommend” suggest user satisfaction.

  2. Focuses on batteries and power-related products. Terms like “batteries,” “Energizer,” and “life” indicate discussions about battery performance.

  3. Likely about headsets, controllers or console products, with an emphasis on sound quality, price, and value.

  4. Centres on monitors or TVs, particularly gaming-related screens.

  5. Dedicated to gaming, particularly Zelda and similar games. Words like “gameplay,” “story,” and “fans” indicate a focus on game mechanics and narratives.

  6. Likely focused on the “Fallout” game series and gameplay experience.

  7. Highlights positive feelings about products or experiences. Words like “love,” “perfect,” and “beautiful” suggest a focus on satisfaction.

  8. Focuses on gaming broadly, with mentions of “story,” “graphics,” and “characters,” likely highlighting the importance of game design and storytelling.

  9. Discusses Pokémon and related games. Words like “awesome,” “series,” and “graphics” suggest enthusiasm for the franchise.

  10. Likely about gaming-related purchases and family enjoyment.
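
Topic labels like these are usually assigned by reading the highest-beta terms in each topic. The Python sketch below shows that selection step using only the ten Topic 1 rows printed in the tibble above; the function name `top_terms_by_topic` is hypothetical, and the remaining 91 rows are not reproduced here.

```python
from collections import defaultdict

# (topic, term, beta) rows copied from the first ten rows of the tibble above.
rows = [
    (1, "amazing", 0.0357),
    (1, "buy", 0.0314),
    (1, "controller", 0.0299),
    (1, "recommend", 0.0279),
    (1, "love", 0.0221),
    (1, "switch", 0.0217),
    (1, "awesome", 0.0199),
    (1, "xbox", 0.0199),
    (1, "fan", 0.0186),
    (1, "feel", 0.0141),
]


def top_terms_by_topic(rows, n=5):
    """Group rows by topic and keep the n terms with the largest beta."""
    grouped = defaultdict(list)
    for topic, term, beta in rows:
        grouped[topic].append((term, beta))
    return {
        topic: [term for term, _ in sorted(pairs, key=lambda p: p[1], reverse=True)[:n]]
        for topic, pairs in grouped.items()
    }


top_terms = top_terms_by_topic(rows, n=3)
print(top_terms)
```

With the full set of rows, the same function would return one short term list per topic to support each label.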

   topic_num topic_size mean_token_length dist_from_corpus tf_df_dist
1          1   972.4367               5.7        0.6279051   3.210947
2          2   823.9564               5.2        0.6511510   8.311598
3          3  1046.5921               4.9        0.6495077   2.281502
4          4   956.3455               5.9        0.6515969   5.142078
5          5   947.4389               4.6        0.6090892  12.403928
6          6  1020.4683               4.9        0.6049524  12.280879
7          7  1039.1180               5.0        0.6363035   2.536874
8          8   914.6177               6.2        0.6011984  12.503753
9          9   949.5422               5.8        0.6034798  14.711659
10        10   929.4843               4.9        0.6366817   2.438540
   doc_prominence topic_coherence topic_exclusivity
1              39       -180.6611          9.843655
2             295       -131.6040          9.940545
3              66       -197.4257          9.902327
4             147       -151.3177          9.863284
5              75       -158.3137          9.771083
6              54       -181.5100          9.794148
7              62       -204.0749          9.860458
8              65       -137.4739          9.766163
9             103       -148.9620          9.738676
10             25       -198.1260          9.868270
  • Topic 2 has the highest doc_prominence (295) and the strongest topic_coherence value (-131.6040, the closest to zero), which indicates that its material is consistent and well represented. Its mean token length (5.2) sits in the middle of the observed range (4.6 to 6.2), so it does not set the topic apart.

  • Topic 7 stands out for having the lowest topic_coherence (-204.0749), suggesting that it may not be cohesive. Its doc_prominence (62) is slightly below the median and its dist_from_corpus (0.6363035) is mid-range, so neither value marks it as unusual on its own. The low coherence is consistent with its top terms (“love,” “perfect,” “beautiful”), which express general satisfaction rather than a specific product.

  • Topics with a larger topic_size (Topics 3, 6, and 7, all exceeding 1000) may cover a broader range of content or represent major themes in the corpus. Topic 1 (972.4367) falls just below that threshold.

  • Topic exclusivity values are consistently high and fall within a narrow range (about 9.74 to 9.94), which is a good indicator that the topics are generally distinct from one another.
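
Observations like these can be checked directly against the diagnostics table. The Python sketch below copies three columns from the table above and ranks the topics; the dictionary layout and variable names are chosen for illustration only.

```python
# Values copied from the diagnostics table above:
# topic number -> (topic_size, doc_prominence, topic_coherence)
diagnostics = {
    1: (972.4367, 39, -180.6611),
    2: (823.9564, 295, -131.6040),
    3: (1046.5921, 66, -197.4257),
    4: (956.3455, 147, -151.3177),
    5: (947.4389, 75, -158.3137),
    6: (1020.4683, 54, -181.5100),
    7: (1039.1180, 62, -204.0749),
    8: (914.6177, 65, -137.4739),
    9: (949.5422, 103, -148.9620),
    10: (929.4843, 25, -198.1260),
}

# Topics whose topic_size exceeds 1000.
large_topics = sorted(t for t, (size, _, _) in diagnostics.items() if size > 1000)

# Topic that is prominent in the most documents.
most_prominent = max(diagnostics, key=lambda t: diagnostics[t][1])

# Coherence is negative here; values closer to zero are treated as more coherent.
most_coherent = max(diagnostics, key=lambda t: diagnostics[t][2])
least_coherent = min(diagnostics, key=lambda t: diagnostics[t][2])

print(large_topics, most_prominent, most_coherent, least_coherent)
```

Ranking the columns this way helps keep the written interpretation in step with the numbers in the table.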