Version 2 Assignment based on McDonald's Reviews & GameStop Product Reviews

Author

Rosemary Francis

0.1 Question 1 - Text and Sentiment Analysis

1 a)

This bar chart visualises the 20 most frequently occurring words. Food and McDonalds appear most often, while customer and times appear least often among these 20.

2 b) i.

# A tibble: 441 × 2
   word         n
   <chr>    <int>
 1 fast       232
 2 pretty     146
 3 hot        132
 4 nice       132
 5 clean      110
 6 friendly    99
 7 sweet       86
 8 love        71
 9 fresh       69
10 free        64
# ℹ 431 more rows

# A tibble: 813 × 2
   word         n
   <chr>    <int>
 1 worst      215
 2 bad        185
 3 wrong      179
 4 slow       137
 5 rude       120
 6 cold       113
 7 horrible    81
 8 dirty       71
 9 hard        66
10 terrible    60
# ℹ 803 more rows

These are the most common words associated with “positive” and “negative” sentiment in the reviews. The tables contain 441 distinct positive words and 813 distinct negative words, so the negative vocabulary is almost twice as large as the positive one.
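The counting step behind tables like these can be sketched as follows. This is a minimal Python illustration, not the code used for the report: the six-word `LEXICON`, the `sentiment_word_counts` helper and the sample reviews are all hypothetical stand-ins for the real lexicon and data.

```python
from collections import Counter
import re

# Hypothetical mini-lexicon (assumption); a real sentiment lexicon is far larger.
LEXICON = {
    "fast": "positive", "nice": "positive", "clean": "positive",
    "worst": "negative", "rude": "negative", "cold": "negative",
}

def sentiment_word_counts(reviews):
    """Count how often each lexicon word occurs, grouped by its sentiment."""
    counts = {"positive": Counter(), "negative": Counter()}
    for review in reviews:
        # Lowercase and split into simple word tokens.
        for word in re.findall(r"[a-z']+", review.lower()):
            label = LEXICON.get(word)
            if label is not None:
                counts[label][word] += 1
    return counts

# Made-up reviews, for illustration only.
sample = [
    "Fast service and a clean store.",
    "Worst visit ever, rude staff and cold fries.",
    "Nice staff, fast drive thru.",
]
counts = sentiment_word_counts(sample)
print(counts["positive"].most_common())   # words ranked by frequency
print(len(counts["negative"]))            # number of distinct negative words
```

The number of rows in each table corresponds to `len(...)` here, that is, the number of distinct words, while the `n` column corresponds to the per-word counts.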

3 b) ii.

This bar chart shows how the negative and positive sentiments change over time. There are more negative than positive sentiments throughout the McDonald’s reviews. Each block contains a noticeably higher number of negative words, indicating that the overall tone of the reviews is predominantly negative. The positive sentiments remain significantly lower.

Overall, this suggests that reviewers’ experiences with McDonald’s are largely negative, with relatively few positive reviews.

4 c) i.

# A tibble: 2,308 × 2
   word               n
   <chr>          <int>
 1 abba               1
 2 ability            1
 3 abovementioned     1
 4 absolute           1
 5 absolution         1
 6 absorbed           1
 7 abundance          1
 8 abundant           1
 9 academic           1
10 academy            1
# ℹ 2,298 more rows

# A tibble: 3,316 × 2
   word            n
   <chr>       <int>
 1 abandon         1
 2 abandoned       1
 3 abandonment     1
 4 abduction       1
 5 aberrant        1
 6 aberration      1
 7 abhor           1
 8 abhorrent       1
 9 abject          1
10 abnormal        1
# ℹ 3,306 more rows

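These tables show the first 10 words associated with the positive and negative sentiments when the “nrc” lexicon is applied to the McDonald’s reviews. Every word shown has a count of 1, so the rows appear in alphabetical order rather than ranked by frequency.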

5 c) ii.

Each of the 10 words shown for a sentiment has a count of 1. Using the NRC sentiment dictionary, each word in the McDonald’s reviews was classified into one or more of 10 sentiment categories. My outputs show 3,316 distinct words tagged as negative and 2,308 distinct words tagged as positive.

6 d)

Bigram	Count
fast food	153
customer service	116
ice cream	61
worst mcdonalds	52
10 minutes	49
parking lot	43
worst mcdonald's	42
15 minutes	39
chicken nuggets	38
french fries	34
mickey d's	33
20 minutes	32
5 minutes	29
iced coffee	29
dollar menu	28
late night	28
sweet tea	27
24 hours	25
chicken sandwich	23
quarter pounder	23

Here are the top 20 most frequently occurring bigrams. They show key themes and phrases that appear repeatedly throughout the McDonald’s reviews. Fast food appears the most (153 times), while chicken sandwich and quarter pounder are tied for the lowest count in the top 20 (23 each).
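As an illustration of how such counts can be produced, the sketch below forms every run of `n` adjacent words and drops any run that contains a stop word. It is a hedged Python sketch under stated assumptions: `STOP_WORDS` is a tiny hypothetical list, `ngram_counts` is a made-up helper name, and the sample reviews are invented.

```python
from collections import Counter
import re

# Tiny hypothetical stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"the", "a", "is", "was", "and", "at", "for", "my", "i", "in", "of"}

def ngram_counts(reviews, n=2):
    """Count n-grams (runs of n adjacent words) that contain no stop word."""
    counts = Counter()
    for review in reviews:
        words = re.findall(r"[a-z0-9']+", review.lower())
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if not any(w in STOP_WORDS for w in gram):
                counts[" ".join(gram)] += 1
    return counts

# Made-up reviews, for illustration only.
sample = [
    "The ice cream machine was broken",
    "Fast food and ice cream",
    "Worst customer service in fast food",
]
bigrams = ngram_counts(sample, n=2)
trigrams = ngram_counts(sample, n=3)
print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

Filtering after the n-grams are formed keeps only phrases whose words were genuinely adjacent in the review; removing stop words first would instead join words that were never next to each other.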

7 e)

Trigram	Count
ice cream machine	10
worst customer service	10
24 hour drive	9
eat fast food	8
fast food restaurants	8
ice cream cone	8
10 piece chicken	7
fast food restaurant	7
sausage egg mcmuffin	7
terrible customer service	7
free wi fi	6
ice cream cones	6
piece chicken nugget	5
piece chicken nuggets	5
worst fast food	5
2 apple pies	4
5 10 minutes	4
bad customer service	4
double cheese burger	4
fast food chain	4

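Here are the top 20 most frequently occurring trigrams. A trigram is a sequence of three consecutive words, so these reveal longer, more detailed phrases that customers commonly use in their reviews. Ice cream machine and worst customer service appear the most (10 times each), while five trigrams, including fast food chain, are tied for the lowest count in the top 20 (4 each).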

8 f) i.

# A tibble: 0 × 2
# ℹ 2 variables: word <chr>, n <int>

Export file: write_csv(waiting_reviews, "waiting_review.csv")

Reviewers mention “waiting” mostly in the context of slow service, long drive thru delays, and extended waits for orders that were incorrect or poorly managed.

9 f) ii.

Export file: write.csv(shamrock_shake_reviews, "shamrock_shake_reviews.csv")

Reviewers mention “shamrock shake” mainly when complaining about its poor quality, artificial taste, or unavailability despite expecting to buy one.

10 f) iii.

Export file: write.csv(icecream_machine_reviews, "icecream_machine_reviews.csv")

Reviewers mention the “ice cream machine” mostly to complain that it is constantly broken, shut off early, or unavailable, leading to frustration when trying to order desserts.
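The filter-then-export pattern used for these three phrases can be sketched as below. This is a minimal Python sketch, not the report's own code: `reviews_mentioning` and `to_csv_text` are hypothetical helper names, the sample reviews are invented, and the CSV is written to an in-memory string rather than to a file.

```python
import csv
import io

def reviews_mentioning(reviews, phrase):
    """Return the reviews that contain `phrase`, ignoring letter case."""
    needle = phrase.lower()
    return [r for r in reviews if needle in r.lower()]

def to_csv_text(reviews):
    """Serialise the reviews as one-column CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["review"])
    for r in reviews:
        writer.writerow([r])
    return buffer.getvalue()

# Made-up reviews, for illustration only.
sample = [
    "Been WAITING 20 minutes in the drive thru.",
    "The ice cream machine is broken again.",
    "No Shamrock Shake today, very disappointing.",
    "Friendly staff and hot fries.",
]
waiting = reviews_mentioning(sample, "waiting")
ice_cream = reviews_mentioning(sample, "ice cream machine")
csv_text = to_csv_text(waiting)
print(csv_text)
```

Lowercasing both the review and the search phrase avoids missing matches that differ only in capitalisation, such as “Shamrock Shake” versus “shamrock shake”.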

11 g)

The first word cloud shows random words, while the last two word clouds highlight the most frequent non-stop words associated with positive and negative sentiments in the reviews. The positive word cloud shows terms that appear often in positive contexts, typically reflecting satisfaction with service, food quality, or overall experience. In contrast, the negative word cloud displays words commonly used in complaints or criticisms, revealing the main issues customers talk about.

The word clouds make it easier to see at a glance which sentiment words dominate the McDonald’s reviews.

11.1 Question 2 - Topic Modelling Analysis

12 a)

<<DocumentTermMatrix (documents: 4682, terms: 9599)>>
Non-/sparse entries: 67545/44874973
Sparsity           : 100%
Maximal term length: 27
Weighting          : term frequency (tf)

This shows the document term matrix, which contains 4,682 documents and 9,599 terms. The sparsity of the matrix is the percentage of cells that contain 0, where each such cell represents a word that does not appear in a review. The matrix has 44,942,518 cells in total (4,682 x 9,599). Of these, 44,874,973 are zero and only 67,545 have a non-zero value, so the sparsity is about 99.85%, which the output rounds to 100%.
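The sparsity figure can be checked directly from the numbers printed in the output. The short Python sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Figures copied from the DocumentTermMatrix summary above.
documents, terms = 4682, 9599
non_zero_cells, zero_cells = 67545, 44874973

total_cells = documents * terms
# The non-zero and zero entries together must account for every cell.
assert total_cells == non_zero_cells + zero_cells

sparsity = zero_cells / total_cells
print(total_cells)                 # total number of cells in the matrix
print(f"{sparsity:.2%}")           # share of cells that are zero
```

Because fewer than 0.2% of the cells are non-zero, the percentage rounds to 100% when printed without decimals.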

13 b) i/ii/iii

A LDA_Gibbs topic model with 10 topics.

This tells us that the LDA model, fitted with Gibbs sampling, contains 10 topics.

14 c) i

Here are the bar charts for the 10 topics that come up in the GameStop reviews. I will evaluate the topics to see whether they are meaningful and useful, and look for any patterns:

Topic 1 focuses on TV and monitor quality issues such as sound, picture, pricing, and comfort.

Topic 2 appears to be about Pokémon games and graphics performance.

Topic 3 could be describing buying gaming accessories, such as Xbox or Nintendo controllers, for a son.

Topic 4 talks about batteries and flashlights, including power, battery life, and time.

Topic 5 seems to talk about monitor and screen performance, including image quality and user experience.

Topic 6 could be about general gaming enjoyment, including graphics, gameplay, and fun.

Topic 7 seems to focus on story based games like Zelda, with emphasis on graphics, characters, and gameplay.

Topic 8 talks about positive product experiences, describing ease of use and customer recommendations.

Topic 9 seems to talk about time spent playing games and the value people get from their gaming hours.

Topic 10 appears to focus on Fallout and similar games, including controls, system, and characters.

15 c) ii

   topic_num topic_size mean_token_length dist_from_corpus tf_df_dist
1          1  1023.1595               5.7        0.6512766   4.433123
2          2   979.5251               6.2        0.6104892  14.769769
3          3   925.7857               4.3        0.6440713   2.417453
4          4   839.6504               6.1        0.6518475   8.366660
5          5   992.0127               7.4        0.6527836   3.237674
6          6   908.6871               4.9        0.6177458  12.166879
7          7  1017.1848               5.5        0.6008149  12.544420
8          8   996.7261               5.2        0.6476142   2.455808
9          9   857.5360               4.9        0.6018382  12.461685
10        10  1058.7326               4.5        0.6276783   3.281093
   doc_prominence topic_coherence topic_exclusivity
1             125       -174.7860          9.959100
2             115       -136.9226          9.680701
3              34       -209.6722          9.979300
4             276       -123.5221          9.953439
5              87       -185.8496          9.969981
6              47       -162.3809          9.818089
7              66       -155.5687          9.641824
8              61       -208.0815          9.931432
9              29       -148.6571          9.802699
10             39       -175.3660          9.816286

The topic quality measures help to assess which LDA topics are useful and more interpretable.

The topic size shows the weighted number of terms per topic. All topics have similar sizes, between roughly 840 (Topic 4) and 1,059 (Topic 10) tokens, suggesting the model distributed tokens fairly evenly across topics.

The mean token length shows the average number of characters for the top terms per topic. Topics with a longer mean token length, such as Topic 5 (7.4), Topic 2 (6.2) or Topic 4 (6.1), may include more descriptive words and may be more meaningful. Shorter word lengths, such as Topic 3 (4.3), may indicate simpler or less informative vocabulary.

The topic coherence is a measure of how often the top terms in each topic appear together in the same document. All values in the table are negative, and the closer to zero the better. The topics with the best coherence are Topic 4 (-123.5), Topic 2 (-136.9) and Topic 9 (-148.7). These topics are the most semantically consistent and likely represent clear themes.

The topics with the lowest coherence are Topic 3 (-209.7), Topic 8 (-208.1) and Topic 5 (-185.8). These topics are less coherent, meaning the top words may be more mixed or harder to interpret.
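The report does not state which formula produced the topic_coherence column. One widely used definition that matches the description above is the UMass-style score, which sums log((D(w_i, w_j) + 1) / D(w_j)) over pairs of top terms, where D counts the documents containing the given terms. The Python sketch below implements that definition as an assumption; the `umass_coherence` name and the toy documents are hypothetical.

```python
import math

def umass_coherence(top_terms, documents):
    """UMass-style coherence: sum of log((co-doc count + 1) / doc count).

    Assumes every term in `top_terms` occurs in at least one document,
    otherwise the denominator would be zero.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*terms):
        # Number of documents that contain all of the given terms.
        return sum(all(t in d for t in terms) for d in doc_sets)

    score = 0.0
    for i in range(1, len(top_terms)):
        for j in range(i):
            co_occurrences = doc_freq(top_terms[i], top_terms[j])
            score += math.log((co_occurrences + 1) / doc_freq(top_terms[j]))
    return score

# Toy tokenised documents, for illustration only.
docs = [
    ["battery", "life", "flashlight"],
    ["battery", "life"],
    ["game", "story"],
    ["battery", "flashlight"],
]
coherent = umass_coherence(["battery", "life", "flashlight"], docs)
mixed = umass_coherence(["battery", "game", "story"], docs)
print(coherent, mixed)
```

Terms that rarely share a document contribute large negative logarithms, which is why a score further below zero signals a more mixed topic.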

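The topic exclusivity measures how unique the top terms in each topic are compared to other topics. Topic 3 (9.98), Topic 5 (9.97) and Topic 1 (9.96) have the strongest exclusivity, closely followed by Topic 4 (9.95), while Topic 7 (9.64) and Topic 2 (9.68) have the lowest. All values fall in a narrow range between 9.64 and 9.98, so the differences are small. Topics with lower exclusivity have more words shared across topics, which reduces their distinctiveness.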
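The rankings for coherence and exclusivity can be reproduced by sorting the values in the table. The Python sketch below copies the two columns from the output above; the dictionary and variable names are illustrative.

```python
# Values copied from the topic diagnostics table (topic number -> value).
coherence = {
    1: -174.7860, 2: -136.9226, 3: -209.6722, 4: -123.5221, 5: -185.8496,
    6: -162.3809, 7: -155.5687, 8: -208.0815, 9: -148.6571, 10: -175.3660,
}
exclusivity = {
    1: 9.959100, 2: 9.680701, 3: 9.979300, 4: 9.953439, 5: 9.969981,
    6: 9.818089, 7: 9.641824, 8: 9.931432, 9: 9.802699, 10: 9.816286,
}

# Sort topic numbers from best to worst on each metric
# (closer to zero is better for coherence, higher is better for exclusivity).
by_coherence = sorted(coherence, key=coherence.get, reverse=True)
by_exclusivity = sorted(exclusivity, key=exclusivity.get, reverse=True)

print(by_coherence[:3])     # [4, 2, 9]  most coherent
print(by_coherence[-3:])    # [5, 8, 3]  least coherent
print(by_exclusivity[:3])   # [3, 5, 1]  most exclusive
print(by_exclusivity[-2:])  # [2, 7]     least exclusive
```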

16 d)

Based on my analysis, the topic quality results suggest several opportunities for GameStop to identify consumer pain points and to improve customer satisfaction and business performance. The high quality topics, focused on gameplay experience, product quality, and accessories, indicate the areas where customers engage the most, so GameStop should continue prioritising items such as consoles, controllers, and popular game titles. Lower quality topics, especially those with mixed or unclear themes, likely reflect inconsistent customer experiences, particularly around pricing, delivery, and product reliability. GameStop could address these concerns by improving product descriptions, offering clearer return policies, and ensuring better quality control for used or refurbished items. The prominence of terms such as “recommend,” “easy,” and “value” suggests that enhancing customer support, loyalty programmes, and bundle promotions could further increase positive sentiment and repeat purchases. Overall, this analysis points to the importance of focusing on reliability, competitive pricing, and better communication to strengthen customer trust and retention.