Version 2 Assignment based on McDonald's Reviews & GameStop Product Reviews
0.1 Question 1 - Text and Sentiment Analysis
1 a)
This bar chart visualises the 20 most frequently occurring words. "Food" and "McDonalds" appear most often, while "customer" and "times" appear least often among the top 20.
2 b) i.
# A tibble: 441 × 2
word n
<chr> <int>
1 fast 232
2 pretty 146
3 hot 132
4 nice 132
5 clean 110
6 friendly 99
7 sweet 86
8 love 71
9 fresh 69
10 free 64
# ℹ 431 more rows
# A tibble: 813 × 2
word n
<chr> <int>
1 worst 215
2 bad 185
3 wrong 179
4 slow 137
5 rude 120
6 cold 113
7 horrible 81
8 dirty 71
9 hard 66
10 terrible 60
# ℹ 803 more rows
These are the most common words associated with “positive” and “negative” sentiment. The tibble dimensions show 441 distinct positive words and 813 distinct negative words in the reviews. These are counts of unique words, not of individual occurrences.
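A minimal sketch of how word counts like these can be produced, assuming the "bing" lexicon bundled with tidytext was the one used. The `reviews` tibble and its column names are hypothetical stand-ins for the real data.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the McDonald's review data
reviews <- tibble(
  review_id = 1:3,
  text = c("Fast service and friendly staff",
           "Worst experience, cold fries and rude staff",
           "Clean restaurant but slow drive thru")
)

# One row per word, keeping only words found in the lexicon
sentiment_counts <- reviews %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment, word, sort = TRUE)

# Number of distinct words per sentiment (the "441" and "813" style figures)
distinct_words <- sentiment_counts %>%
  count(sentiment, name = "distinct_words")

print(sentiment_counts)
print(distinct_words)
```

Summing `n` within each sentiment, rather than counting rows, would give the number of occurrences instead of the number of distinct words.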
3 b) ii.
This bar chart shows how the negative and positive sentiments change over time. There are more negative than positive sentiments in the McDonald's reviews. Each block contains a noticeably higher number of negative words, indicating that the overall tone of the reviews is predominantly negative. The positive sentiments remain significantly lower.
Overall, this suggests that reviewers' experiences with McDonald's are largely negative, with relatively few positive reviews.
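The per-block comparison can be sketched as follows. This assumes blocks are formed by grouping consecutive reviews; the block size, the `tagged` table and its column names are all hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical sentiment-tagged words: one row per matched word,
# with the row number of the review it came from
tagged <- tibble(
  review_row = c(1, 2, 2, 3, 4, 5, 5, 6),
  sentiment  = c("positive", "negative", "negative", "negative",
                 "positive", "negative", "negative", "positive")
)

block_size <- 3  # assumed value, chosen only for this toy example

block_counts <- tagged %>%
  mutate(block = (review_row - 1) %/% block_size) %>%  # consecutive reviews share a block
  count(block, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(net = positive - negative)                     # below zero means more negative words

print(block_counts)
```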
4 c) i.
# A tibble: 2,308 × 2
word n
<chr> <int>
1 abba 1
2 ability 1
3 abovementioned 1
4 absolute 1
5 absolution 1
6 absorbed 1
7 abundance 1
8 abundant 1
9 academic 1
10 academy 1
# ℹ 2,298 more rows
# A tibble: 3,316 × 2
word n
<chr> <int>
1 abandon 1
2 abandoned 1
3 abandonment 1
4 abduction 1
5 aberrant 1
6 aberration 1
7 abhor 1
8 abhorrent 1
9 abject 1
10 abnormal 1
# ℹ 3,306 more rows
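These are the first 10 rows of the positive and negative word tables produced with the “nrc” lexicon for the McDonald's reviews. The rows are in alphabetical order and every count shown is 1, so they are not ranked by frequency.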
5 c) ii.
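Using the NRC sentiment dictionary, each word can be classified into one or more of 10 sentiment categories. My outputs show 3,316 distinct words tagged negative and 2,308 tagged positive, and each word shown has a count of 1. Because every count is 1 and these totals match the number of negative and positive entries in the NRC lexicon itself, the tables appear to count lexicon entries rather than words matched in the reviews, so they should not be read as review frequencies.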
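A minimal sketch of the difference between counting a lexicon and counting review words matched against it. A tiny hypothetical lexicon stands in for NRC here, since the real one requires a separate download; the `reviews` tibble is also made up.

```r
library(dplyr)
library(tidytext)

# Hypothetical miniature lexicon standing in for NRC
toy_lexicon <- tibble(
  word      = c("love", "fresh", "abandon", "dirty", "rude"),
  sentiment = c("positive", "positive", "negative", "negative", "negative")
)

# Hypothetical reviews
reviews <- tibble(
  review_id = 1:3,
  text = c("Rude staff and a dirty table",
           "Love the fresh fries",
           "So rude, never again")
)

# Counting the lexicon alone: alphabetical order, every n equals 1
lexicon_only <- toy_lexicon %>% count(sentiment, word)

# Counting review words that match the lexicon: n reflects actual usage
review_counts <- reviews %>%
  unnest_tokens(word, text) %>%
  inner_join(toy_lexicon, by = "word") %>%
  count(sentiment, word, sort = TRUE)

print(lexicon_only)
print(review_counts)
```

In the second table, lexicon words that never occur in the reviews (here "abandon") drop out, and the most frequent matches rise to the top.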
6 d)
| Bigram | Count |
|---|---|
| fast food | 153 |
| customer service | 116 |
| ice cream | 61 |
| worst mcdonalds | 52 |
| 10 minutes | 49 |
| parking lot | 43 |
| worst mcdonald's | 42 |
| 15 minutes | 39 |
| chicken nuggets | 38 |
| french fries | 34 |
| mickey d's | 33 |
| 20 minutes | 32 |
| 5 minutes | 29 |
| iced coffee | 29 |
| dollar menu | 28 |
| late night | 28 |
| sweet tea | 27 |
| 24 hours | 25 |
| chicken sandwich | 23 |
| quarter pounder | 23 |
These are the 20 most frequent bigrams. They show key themes and word pairs that appear repeatedly throughout the McDonald's reviews. "Fast food" appears most often (153), while "chicken sandwich" and "quarter pounder" are tied for the lowest count in the top 20 (23).
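A minimal sketch of bigram counting with stop words removed, assuming the tidytext `stop_words` list was used. The `reviews` tibble is a hypothetical stand-in.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical stand-in for the review data
reviews <- tibble(
  review_id = 1:3,
  text = c("The fast food was cold",
           "Fast food and bad customer service",
           "Customer service was slow at this fast food restaurant")
)

bigram_counts <- reviews %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%    # overlapping word pairs
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%                     # drop pairs containing a stop word
  unite(bigram, word1, word2, sep = " ") %>%
  count(bigram, sort = TRUE)

print(bigram_counts)
```

The same pattern extends to trigrams by setting `n = 3` and separating into three word columns.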
7 e)
| Trigram | Count |
|---|---|
| ice cream machine | 10 |
| worst customer service | 10 |
| 24 hour drive | 9 |
| eat fast food | 8 |
| fast food restaurants | 8 |
| ice cream cone | 8 |
| 10 piece chicken | 7 |
| fast food restaurant | 7 |
| sausage egg mcmuffin | 7 |
| terrible customer service | 7 |
| free wi fi | 6 |
| ice cream cones | 6 |
| piece chicken nugget | 5 |
| piece chicken nuggets | 5 |
| worst fast food | 5 |
| 2 apple pies | 4 |
| 5 10 minutes | 4 |
| bad customer service | 4 |
| double cheese burger | 4 |
| fast food chain | 4 |
These are the 20 most frequent trigrams. They reveal longer, more detailed three-word phrases that customers commonly use in their reviews. "Ice cream machine" and "worst customer service" appear most often (10 each), while the lowest count in the top 20 is 4, shared by five trigrams including "fast food chain".
8 f) i.
# A tibble: 0 × 2
# ℹ 2 variables: word <chr>, n <int>
Export file: write_csv(waiting_reviews, "waiting_review.csv")
Reviewers mention “waiting” mostly in the context of slow service, long drive-thru delays, and extended waits for orders that were incorrect or poorly managed.
9 f) ii.
Export file: write.csv(shamrock_shake_reviews, "shamrock_shake_reviews.csv")
Reviewers mention “shamrock shake” mainly when complaining about its poor quality, artificial taste, or unavailability despite expecting to buy one.
10 f) iii.
Export file: write.csv(icecream_machine_reviews, "icecream_machine_reviews.csv")
Reviewers mention the “ice cream machine” mostly to complain that it is constantly broken, shut off early, or unavailable, leading to frustration when trying to order desserts.
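A minimal sketch of selecting reviews that mention a phrase and exporting them. The `reviews` tibble and its columns are hypothetical, and the file is written to a temporary directory here rather than the working directory.

```r
library(dplyr)
library(stringr)
library(readr)

# Hypothetical stand-in for the review data
reviews <- tibble(
  review_id = 1:4,
  text = c("Been waiting 20 minutes in the drive-thru",
           "Ice cream machine broken again",
           "Still WAITING for my order",
           "Quick and friendly")
)

# Match against the full review text, ignoring case, so that
# multi-word phrases such as "ice cream machine" can also be found
waiting_reviews <- reviews %>%
  filter(str_detect(str_to_lower(text), fixed("waiting")))

# Check that something matched before exporting
stopifnot(nrow(waiting_reviews) > 0)

out_path <- file.path(tempdir(), "waiting_review.csv")
write_csv(waiting_reviews, out_path)
```

Swapping `"waiting"` for `"shamrock shake"` or `"ice cream machine"` gives the other two exports.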
11 g)
The first word cloud shows a general selection of words, while the last two word clouds highlight the most frequent non-stopword terms associated with positive and negative sentiments in the reviews. The positive word cloud shows terms that appear often in positive contexts, typically reflecting satisfaction with service, food quality, or overall experience. In contrast, the negative word cloud displays words commonly used in complaints or criticisms, revealing the main issues customers talk about.
The word clouds make the sentiments in the McDonald's reviews easy to see at a glance.
11.1 Question 2 - Topic Modelling Analysis
12 a)
<<DocumentTermMatrix (documents: 4682, terms: 9599)>>
Non-/sparse entries: 67545/44874973
Sparsity : 100%
Maximal term length: 27
Weighting : term frequency (tf)
This shows the document-term matrix, which contains 4,682 documents and 9,599 terms. The sparsity of the matrix is the percentage of cells that contain 0, where each such cell represents a word that does not appear in a review. The matrix has 44,942,518 cells in total (4,682 x 9,599). Of these, 44,874,973 are zero and only 67,545 have a non-zero value, so the sparsity is about 99.85%, which the output rounds to 100%.
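The sparsity figure can be checked by hand from the numbers in the output above. This is plain arithmetic in base R; the variable names are only illustrative.

```r
# Values taken from the DocumentTermMatrix summary
n_docs  <- 4682
n_terms <- 9599
nonzero <- 67545

total_cells <- n_docs * n_terms        # every document-term combination
zero_cells  <- total_cells - nonzero   # combinations where the word is absent
sparsity    <- zero_cells / total_cells

print(total_cells)                 # 44942518
print(zero_cells)                  # 44874973
print(round(100 * sparsity, 2))    # 99.85
```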
13 b) i/ii/iii
A LDA_Gibbs topic model with 10 topics.
This tells us that the LDA model was fitted with Gibbs sampling and has 10 topics.
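A minimal sketch of fitting a Gibbs-sampled LDA model with the topicmodels package. The word counts are hypothetical, `k = 2` is used only because the toy data is tiny (the model above uses 10 topics), and the seed value is an assumption.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical per-document word counts
word_counts <- tibble(
  document = c("r1", "r1", "r2", "r2", "r3", "r3", "r4", "r4"),
  term     = c("controller", "xbox", "battery", "flashlight",
               "controller", "nintendo", "battery", "power"),
  n        = c(2, 1, 3, 1, 1, 2, 2, 1)
)

# Cast the tidy counts into a document-term matrix
dtm <- word_counts %>% cast_dtm(document, term, n)

# Fit LDA with Gibbs sampling; the seed makes the run repeatable
lda_model <- LDA(dtm, k = 2, method = "Gibbs", control = list(seed = 1234))

# Highest-probability terms per topic
top_terms <- tidy(lda_model, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 3, with_ties = FALSE) %>%
  ungroup()

print(lda_model)
print(top_terms)
```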
14 c) i
These bar charts show the top terms for each of the 10 topics in the GameStop reviews. I evaluate each topic below to see whether it is meaningful and useful, and to look for any patterns:
Topic 1 focuses on TV and monitor quality issues such as sound, picture, pricing, and comfort.
Topic 2 appears to be about Pokémon games and graphics performance.
Topic 3 could be describing purchases of gaming accessories, such as Xbox or Nintendo controllers, for a son.
Topic 4 talks about batteries and flashlights, including power, battery life, and time.
Topic 5 seems to talk about monitor and screen performance, including image quality and user experience.
Topic 6 could be about general gaming enjoyment, including graphics, gameplay, and fun.
Topic 7 seems to focus on story based games like Zelda, with emphasis on graphics, characters, and gameplay.
Topic 8 talks about positive product experiences, describing ease of use and customer recommendations.
Topic 9 seems to talk about time spent playing games and the value people get from their gaming hours.
Topic 10 appears to focus on Fallout and similar games, including controls, system, and characters.
15 c) ii
| topic_num | topic_size | mean_token_length | dist_from_corpus | tf_df_dist | doc_prominence | topic_coherence | topic_exclusivity |
|---|---|---|---|---|---|---|---|
| 1 | 1023.1595 | 5.7 | 0.6512766 | 4.433123 | 125 | -174.7860 | 9.959100 |
| 2 | 979.5251 | 6.2 | 0.6104892 | 14.769769 | 115 | -136.9226 | 9.680701 |
| 3 | 925.7857 | 4.3 | 0.6440713 | 2.417453 | 34 | -209.6722 | 9.979300 |
| 4 | 839.6504 | 6.1 | 0.6518475 | 8.366660 | 276 | -123.5221 | 9.953439 |
| 5 | 992.0127 | 7.4 | 0.6527836 | 3.237674 | 87 | -185.8496 | 9.969981 |
| 6 | 908.6871 | 4.9 | 0.6177458 | 12.166879 | 47 | -162.3809 | 9.818089 |
| 7 | 1017.1848 | 5.5 | 0.6008149 | 12.544420 | 66 | -155.5687 | 9.641824 |
| 8 | 996.7261 | 5.2 | 0.6476142 | 2.455808 | 61 | -208.0815 | 9.931432 |
| 9 | 857.5360 | 4.9 | 0.6018382 | 12.461685 | 29 | -148.6571 | 9.802699 |
| 10 | 1058.7326 | 4.5 | 0.6276783 | 3.281093 | 39 | -175.3660 | 9.816286 |
The topic quality metrics help to assess which LDA topics are useful and interpretable.
The topic size shows the weighted number of terms per topic. All topics have similar sizes, between about 840 and 1,059 tokens, suggesting the model distributed reviews fairly evenly.
The mean token length shows the average number of characters for the top terms per topic. Topics with a longer mean token length, such as Topic 5 (7.4), Topic 2 (6.2) or Topic 4 (6.1), may include more descriptive words and may be more meaningful, while shorter word lengths, such as Topic 3 (4.3), may indicate simpler or less informative vocabulary.
The topic coherence is a measure of how often the top terms in each topic appear together in the same document. The values in the table are negative, and the closer to zero the better. The topics with the best coherence are Topic 4 (-123.5), Topic 2 (-136.9) and Topic 9 (-148.7). These topics are the most semantically consistent and likely represent clear themes.
The lowest coherence belongs to Topic 3 (-209.7), Topic 8 (-208.1) and Topic 5 (-185.8). These topics are less coherent, meaning the top words may be more mixed or harder to interpret.
The topic exclusivity measures how unique the top terms in each topic are compared to other topics. Topic 3, Topic 5 and Topic 1 have the highest exclusivity, with Topic 4 close behind, while Topic 7 and Topic 2 have the lowest. Topics with lower exclusivity share more words with other topics, which reduces their distinctiveness. All values fall in a narrow range (about 9.64 to 9.98), so the differences are small.
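The rankings can be reproduced directly from the table. This sketch copies the coherence and exclusivity columns into a base R data frame; the object names are illustrative.

```r
# Coherence and exclusivity values copied from the topic quality table
topic_quality <- data.frame(
  topic = 1:10,
  coherence = c(-174.7860, -136.9226, -209.6722, -123.5221, -185.8496,
                -162.3809, -155.5687, -208.0815, -148.6571, -175.3660),
  exclusivity = c(9.959100, 9.680701, 9.979300, 9.953439, 9.969981,
                  9.818089, 9.641824, 9.931432, 9.802699, 9.816286)
)

# Sort from best to worst on each metric (higher is better for both)
by_coherence   <- topic_quality[order(-topic_quality$coherence), ]
by_exclusivity <- topic_quality[order(-topic_quality$exclusivity), ]

print(head(by_coherence$topic, 3))    # most coherent topics
print(tail(by_coherence$topic, 3))    # least coherent topics
print(head(by_exclusivity$topic, 3))  # most exclusive topics
print(tail(by_exclusivity$topic, 2))  # least exclusive topics
```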
16 d)
Based on my analysis, the topic quality results suggest several opportunities for GameStop to identify customer pain points and to improve customer satisfaction and business performance. The high-quality topics focus on gameplay experience, product quality, and accessories, which indicates that these are the areas customers engage with most, so GameStop should continue prioritising items such as consoles, controllers, and popular game titles. Lower-quality topics, especially those with mixed or unclear themes, likely reflect inconsistent customer experiences, particularly around pricing, delivery, and product reliability. GameStop could address these concerns by improving product descriptions, offering clearer return policies, and ensuring better quality control for used or refurbished items. The strong sentiment around “recommend,” “easy,” and “value” suggests that enhancing customer support, loyalty programmes, and bundle promotions could further increase positive sentiment and repeat purchases. Overall, this analysis points to the importance of focusing on reliability, competitive pricing, and better communication to strengthen customer trust and retention.