Text Sentiment & Topic Modeling
Setup
First, we set up our Quarto file by loading our libraries.
library(tidyverse)
library(tidytext)
library(wordcloud)
Question 1 a)
First, we are going to look at the 20 most used words in our reviews to get a sense of what people are mostly talking about. This will give us a rough idea of the topics in the reviews.
data(stop_words)
mcd <- read_csv("mcdonalds_reviews.csv")
mcd %>%
  unnest_tokens(word, review, token = "words") %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  top_n(20, n) %>%
  ggplot(aes(x = n, y = reorder(word, n))) +
  geom_col() +
  labs(
    x = "Frequency",
    y = NULL,
    title = "Top 20 Most Frequently Occurring Words in McDonald's Reviews")
Findings
Some of the relevant words are service, people, fries, time and minutes. These words show a pattern of comments about McDonald’s service; the important parts seem to be waiting time, quality of food and staff behaviour. Nevertheless, these are just assumptions, and we will need to conduct a more thorough analysis to get to the bottom of the meaning behind these words.
Question 1 b) i. & ii.
In this analysis we look at the sentiment of the words used in reviews over time, with the goal of finding a trend. We want to see whether McDonald’s has improved, worsened or stayed the same over the course of these reviews. The reviews are grouped into blocks of 150 to give us an approximate timeline.
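The blocking step can be tried on a toy example first. The sketch below uses base R only; the data frame `toy` and the block size of 3 are illustrative assumptions, not part of the review data. It also shows a detail worth checking: if ids start at 1, `id %/% size` leaves the first block one review short, while `(id - 1) %/% size` gives equal blocks.

```r
# Toy data: 7 reviews with sequential ids (illustrative, not the real reviews)
toy <- data.frame(
  id = 1:7,
  sentiment = c("negative", "positive", "negative", "negative",
                "positive", "positive", "negative")
)
block_size <- 3  # stands in for the 150 used in the analysis

# Integer division assigns each id to a block
toy$block <- toy$id %/% block_size

# Alternative that keeps blocks equal in size when ids start at 1
toy$block_even <- (toy$id - 1) %/% block_size

# Count sentiments per block
block_counts <- table(block = toy$block, sentiment = toy$sentiment)
print(toy)
print(block_counts)
```

Whether the offset matters depends on how the ids in the review file are numbered, so it is worth inspecting `range(mcd$id)` before choosing one form.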
sentiments <- get_sentiments("bing")
data(stop_words)
mcd_words <- mcd %>%
  unnest_tokens(word, review, token = "words") %>%
  anti_join(stop_words)
mcd_sentiments <- inner_join(mcd_words, sentiments)
mcd_sentiments %>%
  filter(sentiment == "positive") %>%
  count(word, sort = TRUE)
# A tibble: 441 × 2
   word         n
   <chr>    <int>
 1 fast       232
 2 pretty     146
 3 hot        132
 4 nice       132
 5 clean      110
 6 friendly    99
 7 sweet       86
 8 love        71
 9 fresh       69
10 free        64
# ℹ 431 more rows
mcd_sentiments %>%
  filter(sentiment == "negative") %>%
  count(word, sort = TRUE)
# A tibble: 813 × 2
   word         n
   <chr>    <int>
 1 worst      215
 2 bad        185
 3 wrong      179
 4 slow       137
 5 rude       120
 6 cold       113
 7 horrible    81
 8 dirty       71
 9 hard        66
10 terrible    60
# ℹ 803 more rows
mcd_sentiments <- mcd %>%
  unnest_tokens(word, review, token = "words") %>%
  anti_join(stop_words) %>%
  inner_join(sentiments)
mcd_sentiments <- mutate(mcd_sentiments, block = id %/% 150)
mcd_blocks <- mcd_sentiments %>%
  group_by(block) %>%
  count(sentiment)
ggplot(mcd_blocks) +
  geom_col(aes(x = block, y = n)) +
  facet_wrap(~ sentiment, nrow = 1) +
  ylab("No. Sentiments")
Findings
From the bar charts above we can see that reviewers tend to lean negative. That is to be expected: as a McDonald’s manager, I handle customer service daily, and with prices increasing and food quality stagnating, customers are often unhappy. Consumers are also around 3-10 times more likely to leave a negative review after a negative experience than a positive review after a positive one.
Question 1 c) i. & ii.
Next we will focus on the emotions behind some of the words used in our reviews. It is important to find the sentiment behind these words because by themselves they don’t tell us much.
library(textdata)
data(stop_words)
nrc_sentiments <- get_sentiments("nrc")
mcd_words <- mcd %>%
  unnest_tokens(word, review, token = "words") %>%
  anti_join(stop_words)
mcd_nrc <- inner_join(mcd_words, nrc_sentiments)
mcd_nrc %>%
  count(sentiment, word, sort = TRUE) %>%
  group_by(sentiment) %>%
  top_n(10, n)
# A tibble: 101 × 3
# Groups: sentiment [10]
   sentiment    word         n
   <chr>        <chr>    <int>
 1 joy          food       866
 2 positive     food       866
 3 trust        food       866
 4 anticipation time       522
 5 positive     customer   186
 6 anger        bad        185
 7 disgust      bad        185
 8 fear         bad        185
 9 negative     bad        185
10 sadness      bad        185
# ℹ 91 more rows
mcd_nrc %>%
  count(sentiment, sort = TRUE)
# A tibble: 10 × 2
   sentiment        n
   <chr>        <int>
 1 positive      5896
 2 negative      4245
 3 trust         3526
 4 anticipation  2978
 5 joy           2820
 6 fear          1910
 7 anger         1902
 8 sadness       1760
 9 disgust       1672
10 surprise      1126
Findings
i. Interestingly, many of the most used words are associated with positive emotions, which slightly contradicts our previous findings. It is important to note that while there are many words to describe good experiences, there are many more words to describe bad ones. It is likely that the negative experiences are spread across more words, while the positive ones are concentrated at the top. Food is generally described as positive, which is a good sign for McDonald’s, as its food quality is likely not the main reason for the large number of negative reviews.
ii. Words with positive emotions are more prevalent when using the NRC lexicon. However, I would argue that the emotions felt by customers with negative experiences are much stronger: disgust, anger and fear are stronger than emotions like joy, surprise or anticipation.
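One thing to keep in mind when reading the NRC counts is that a single word can belong to several categories, so each occurrence is counted once per category. The base R sketch below illustrates this with the two words and counts visible in the output above (food and bad); the data frame `nrc_subset` is only the subset of lexicon rows implied by that output, not the full NRC lexicon.

```r
# Word counts as they appear in the output above
word_counts <- data.frame(word = c("food", "bad"), n = c(866, 185))

# Subset of word-to-category rows implied by the output above (assumption:
# the full lexicon may contain further rows for these words)
nrc_subset <- data.frame(
  word = c("food", "food", "food", "bad", "bad", "bad", "bad", "bad"),
  sentiment = c("joy", "positive", "trust",
                "anger", "disgust", "fear", "negative", "sadness")
)

# Joining repeats each word once per category it belongs to
joined <- merge(word_counts, nrc_subset, by = "word")

tokens_total <- sum(word_counts$n)  # occurrences of the two words
pairs_total <- sum(joined$n)        # word-category pairs after the join
print(c(tokens = tokens_total, pairs = pairs_total))
```

Because the join multiplies counts, category totals describe word-category pairs rather than distinct words, and a frequent neutral noun such as food can raise several positive categories at once.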
Question 1 d)
Finding the top bigrams is another way to get an idea of what our customers are talking about.
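As a rough illustration of what n-gram tokenization and stop-word filtering do, here is a base R sketch on a single made-up sentence. The helper `make_ngrams()`, the example review and the short `toy_stop_words` vector are all hypothetical stand-ins; the analysis itself relies on `unnest_tokens()` and `stop_words` from tidytext.

```r
# Build n-grams from a character vector of tokens (hypothetical helper)
make_ngrams <- function(tokens, n = 2) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

review <- "The customer service at this fast food place was slow"
tokens <- strsplit(tolower(review), "[^a-z']+")[[1]]

# Illustrative stand-in for stop_words$word
toy_stop_words <- c("the", "at", "this", "was")

bigrams <- make_ngrams(tokens, 2)

# Keep a bigram only when neither of its words is a stop word
parts <- strsplit(bigrams, " ")
keep <- vapply(parts, function(p) !any(p %in% toy_stop_words), logical(1))
kept_bigrams <- bigrams[keep]
print(kept_bigrams)
```

Filtering after the n-grams are formed means only word pairs that were actually adjacent in the review survive; removing stop words first would glue together words that never appeared next to each other. The same helper with `n = 3` produces trigrams.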
mcd_bigrams <- mcd %>%
  unnest_tokens(bigram, review, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word) %>%
  unite(bigram, word1, word2, sep = " ")
bigram_counts <- count(mcd_bigrams, bigram, sort = TRUE)
top_bigrams <- top_n(bigram_counts, 20, n)
top_bigrams
# A tibble: 20 × 2
   bigram                n
   <chr>             <int>
 1 fast food           153
 2 customer service    116
 3 ice cream            61
 4 worst mcdonalds      52
 5 10 minutes           49
 6 parking lot          43
 7 worst mcdonald's     42
 8 15 minutes           39
 9 chicken nuggets      38
10 french fries         34
11 mickey d's           33
12 20 minutes           32
13 5 minutes            29
14 iced coffee          29
15 dollar menu          28
16 late night           28
17 sweet tea            27
18 24 hours             25
19 chicken sandwich     23
20 quarter pounder      23
Findings
The most interesting bigrams are customer service, which could be either positive or negative; worst McDonald’s, which is highly negative and very often used, pointing to a mostly negative experience; and 10, 15 and 20 minutes, which are likely negative. McDonald’s headquarters expects an order to take 135 seconds from the time of ordering to the time it is dropped off at the customer’s table, and while these targets aren’t shared with customers, they create an expectation in the customer’s mind. The waiting times mentioned far exceed that target, pointing to the conclusion that customers’ expectations aren’t being met.
Question 1 e)
Finding the top trigrams is another way to get an idea of what our customers are talking about.
mcd_trigrams <- mcd %>%
  unnest_tokens(trigram, review, token = "ngrams", n = 3) %>%
  separate(trigram, c("word1", "word2", "word3"), sep = " ") %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(!word3 %in% stop_words$word) %>%
  unite(trigram, word1, word2, word3, sep = " ")
trigram_counts <- count(mcd_trigrams, trigram, sort = TRUE)
top_trigrams <- top_n(trigram_counts, 20, n)
top_trigrams
# A tibble: 26 × 2
   trigram                       n
   <chr>                     <int>
 1 ice cream machine            10
 2 worst customer service       10
 3 24 hour drive                 9
 4 eat fast food                 8
 5 fast food restaurants         8
 6 ice cream cone                8
 7 10 piece chicken              7
 8 fast food restaurant          7
 9 sausage egg mcmuffin          7
10 terrible customer service     7
# ℹ 16 more rows
Findings
Some of the interesting trigrams are free wi-fi, which is clearly positive and something customers appreciate. Among the negative ones is terrible customer service, which is clearly something to work on, as customer service is a core value for McDonald’s that they don’t seem to be delivering on. A more ambiguous one is ice cream machine, but it’s unlikely customers would mention it unless there was an issue with it, such as being broken or serving unsatisfactory cones.
Question 1 f) i. & ii. & iii.
waiting_reviews <- filter(mcd, str_detect(review, regex("waiting", ignore_case = TRUE)))
write_csv(waiting_reviews, "waiting_reviews.csv")
head(waiting_reviews, 10)
# A tibble: 10 × 2
id review
<dbl> <chr>
1 2 "Terrible customer service. I came in at 9:30pm and stood in front of …
2 3 "First they \"lost\" my order, actually they gave it to someone one el…
3 8 "One Star and I'm beng kind. I blame management. last day of free coff…
4 9 "Never been upset about any fast food drive thru service till I came t…
5 22 "GHETTO!! went in yesterday just to get a soda and could not even park…
6 31 "It had been a while since I had stopped at this particular one. They …
7 40 "TOXIC DUMP! In food quality and employee humanity/work effortTypicall…
8 53 "Sometimes, you just need a Mickey D's fix. Usually, for me anyway, th…
9 66 "On my way to Curry Honda for my scheduled maintenance appointment, I …
10 69 "I purchased a specialty coffee in the drive through, but soon after I…
shamrock_reviews <- filter(mcd, str_detect(review, regex("shamrock shake", ignore_case = TRUE)))
write_csv(shamrock_reviews, "shamrock_reviews.csv")
head(shamrock_reviews, 10)
# A tibble: 10 × 2
id review
<dbl> <chr>
1 359 "I left the Hilton late last night and I was really thirsty. This was …
2 414 "I stop here now and then as it's the closest to where I live. Custome…
3 479 "I have to tell you it's been 2 years since I've been at a McDonalds a…
4 776 "Worst shamrock shake ever. The new shakes are brutal. They didn't eve…
5 786 "This is probably the worst McDonald's ever. It doesn't matter what ti…
6 970 "THIS REVIEW IS FOR THE SHAMROCK SHAKE ONLYIt's brighter green than I …
7 1113 "Went here on March 10th, and there NO Shamrock Shakes. They weren't s…
8 1334 "What is a Shamrock Shake? It's a seasonal shake (milk?) by McDonald's…
9 1401 "I can't comment on the food because when I went to grab a Shamrock Sh…
10 1455 "I don't really eat fast food, let alone Yelp about it. I haven't eate…
icecream_reviews <- filter(mcd, str_detect(review, regex("ice cream machine", ignore_case = TRUE)))
write_csv(icecream_reviews, "icecream_reviews.csv")
head(icecream_reviews, 10)
# A tibble: 9 × 2
id review
<dbl> <chr>
1 36 "The ice cream machine is always \"down\" after 11 p.m. If you want a h…
2 108 "Ice cream machine is always down, staff is rude and ghetto, food is al…
3 195 "This is the worst McDonald's I have ever been to.Yes, there ARE better…
4 260 "Every time I go their ice cream machine is down. It's a hang out for a…
5 377 "This place is a joke! It's disgusting enough of a fact that the only t…
6 382 "This is probably the worst McDonald's ever.. They don't know what they…
7 385 "Couldn't get a chocolate-dipped cone because they shut off the ice cre…
8 1120 "This is the McDonald's that my friends and I always go to since it's t…
9 1456 "I have never in my life wrote a corporation to complain about the busi…
Findings
i. Reviews containing waiting are, in the vast majority, very negative. That is to be expected, as the word is rarely used in a positive context; customers were outraged at the waiting times.
ii. Reviews containing shamrock shake were mostly about its taste. A few people were fans, but a larger number were not impressed with its “fake” flavour. Many fans of the original are put off by the unnatural taste of the current one.
iii. Reviews containing ice cream machine include a lot of complaints about the machine always being broken or turned off in the later hours of the night. This seems to be a common issue. Unfortunately, other than replacing these machines there is not much McDonald’s can do, as the machines are very difficult to maintain and often break.
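The keyword filters in this question match substrings, so a search for waiting would also pick up a word such as awaiting. The base R sketch below compares a plain case-insensitive match with a whole-word match on three made-up reviews; `grepl()` is used here as a package-free stand-in for `str_detect()`, and the example sentences are invented.

```r
# Made-up reviews for illustration only
reviews <- c("Waiting forever in the drive thru",
             "I was awaiting my order",
             "Great fries")

# Case-insensitive substring match, similar in spirit to the filters above
substring_match <- grepl("waiting", reviews, ignore.case = TRUE)

# Whole-word match using word boundaries
whole_word_match <- grepl("\\bwaiting\\b", reviews,
                          ignore.case = TRUE, perl = TRUE)

print(data.frame(reviews, substring_match, whole_word_match))
```

For a word like waiting the difference is probably small in practice, but it is a quick check to run before interpreting the filtered reviews.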
Question 1 g)
Last but not least, we are going to create word clouds for positive and negative words to see which words customers associate with good and bad experiences, and hopefully to find out what we do well and what we can improve upon.
sentiments <- get_sentiments("bing")
mcd_words <- mcd %>%
  unnest_tokens(word, review, token = "words") %>%
  anti_join(stop_words)
mcd_word_sentiments <- inner_join(mcd_words, sentiments)
mcd_word_sentiments_count <- mcd_word_sentiments %>%
  count(word, sentiment, sort = TRUE)
mcd_pos <- filter(mcd_word_sentiments_count, sentiment == "positive")
wordcloud(mcd_pos$word, mcd_pos$n, min.freq = 50, colors = brewer.pal(8, "Dark2"))
mcd_neg <- filter(mcd_word_sentiments_count, sentiment == "negative")
wordcloud(mcd_neg$word, mcd_neg$n, min.freq = 50, colors = brewer.pal(8, "Dark2"))
Findings
Some of the positive words include hot, friendly, love, fresh, nice, fast, free, clean and others. These show us that the way to the customer’s heart is through a friendly, clean atmosphere and fresh, hot food made in a timely manner. Some of the negative words are worst, wrong, hate, terrible, bad, cold, slow, rude, dirty and others. This shows that a negative experience is a dirty restaurant with rude staff, slow service and bad, cold food.
Question 2 a)
First we are going to prepare our data by counting the number of times each word appears in each review and converting to a Document Term Matrix.
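To make the shape of a document-term matrix concrete, here is a small base R sketch: one row per review, one column per word, and each cell holding the number of times that word appears in that review. The toy `tokens` data frame is invented, and the dense table built with `xtabs()` is only an illustration; `cast_dtm()` produces a sparse matrix object instead.

```r
# Invented tokens: one row per (review id, word) occurrence
tokens <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  word = c("controller", "great", "controller", "battery", "great", "battery")
)

# Count how often each word appears in each review
word_counts <- as.data.frame(table(id = tokens$id, word = tokens$word),
                             stringsAsFactors = FALSE)
word_counts <- word_counts[word_counts$Freq > 0, ]

# Spread the counts into a documents-by-terms table
dtm <- xtabs(Freq ~ id + word, data = word_counts)
print(dtm)
```

Most cells in a real document-term matrix are zero, which is why the sparse representation matters once there are thousands of reviews and words.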
library(topicmodels)
library(reshape2)
library(topicdoc)
gs <- read_csv("gamestop_product_reviews.csv")
data(stop_words)
gs_tokens <- gs %>%
  unnest_tokens(output = word, input = review, token = "words") %>%
  anti_join(stop_words)
gs_word_counts <- count(gs_tokens, id, word, sort = TRUE)
gs_dtm <- cast_dtm(gs_word_counts, document = id, term = word, value = n)
Question 2 b) i. & ii. & iii.
We are using the collapsed Gibbs sampling method with a seed of 1234 and k = 8.
gs_lda <- LDA(gs_dtm, method = "Gibbs", k = 8, control = list(seed = 1234))
Question 2 c) i. & ii.
gs_lda_beta <- tidy(gs_lda, matrix = "beta")
gs_lda_top_terms <- gs_lda_beta %>%
  group_by(topic) %>%
  slice_max(beta, n = 8, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(topic, -beta)
gs_lda_top_terms
# A tibble: 64 × 3
   topic term        beta
   <int> <chr>      <dbl>
 1     1 love      0.0734
 2     1 monitor   0.0410
 3     1 amazing   0.0332
 4     1 gaming    0.0314
 5     1 easy      0.0204
 6     1 perfect   0.0195
 7     1 quality   0.0175
 8     1 screen    0.0140
 9     2 batteries 0.0943
10     2 energizer 0.0368
# ℹ 54 more rows
gs_lda_top_terms %>%
  mutate(term = reorder_within(term, beta, topic)) %>%
  group_by(topic, term) %>%
  arrange(desc(beta)) %>%
  ungroup() %>%
  ggplot(aes(beta, term, fill = as.factor(topic))) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +
  # Title matches the n = 8 terms selected per topic above
  labs(title = "Top 8 terms in each LDA topic",
       x = expression(beta),
       y = NULL) +
  facet_wrap(~ topic, ncol = 4, scales = "free")
topic_quality <- topic_diagnostics(gs_lda, gs_dtm)
topic_quality
  topic_num topic_size mean_token_length dist_from_corpus tf_df_dist
1         1   1197.873               5.7        0.6365788   3.955873
2         2   1077.594               5.6        0.6399117   8.255278
3         3   1222.095               6.0        0.6230645   3.006603
4         4   1169.795               5.8        0.5922493  14.769346
5         5   1254.497               4.7        0.6211464   4.031065
6         6   1083.415               5.7        0.6154689  12.155766
7         7   1343.169               4.3        0.5895155  12.305062
8         8   1255.563               5.2        0.5896247  12.695102
  doc_prominence topic_coherence topic_exclusivity
1            208       -174.7356          9.867745
2            405       -135.2671          9.921846
3            168       -206.8343          9.915403
4            166       -158.7330          9.695110
5            160       -174.0846          9.800511
6            142       -158.0921          9.844414
7            128       -166.8587          9.529543
8            118       -154.3059          9.508313
Findings
i.
Topic 1: Focuses on quality and love for displays and screens.
Topic 2: Relates to batteries and battery life, suggesting reviews about battery longevity.
Topic 3: About gaming accessories and consoles, with reviews of gaming gear purchases.
Topic 4: Associated with the video game Pokemon; words like loved and awesome point to players being happy with the game.
Topic 5: Focused on TVs and their picture quality; words like price suggest a focus on the price and quality of TVs.
Topic 6: Captures a general enjoyment of games and discussion about the games sold in Gamestop.
Topic 7: Related to the game franchise Fallout, suggesting a discussion about the games.
Topic 8: Related to the game franchise Zelda, suggesting a discussion about the games.
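ii. Topics 4, 7, and 8 have clearly interpretable word lists, and topics 4 and 8 have above-median coherence (-158.7 and -154.3), although topics 7 and 8 have the lowest exclusivity (about 9.5). Topic 2 scores best on both coherence (-135.3) and exclusivity (9.92). Topics 1 and 5, while still meaningful, are more general product-review topics with lower coherence (about -174).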
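One way to read the diagnostics table is to rank the topics on the two measures discussed here. The base R sketch below copies the coherence and exclusivity values from the printed output; the data frame name `topic_scores` is illustrative, and the ranking assumes that higher values (coherence closer to zero) are better on both measures.

```r
# Values copied from the topic_quality output above
topic_scores <- data.frame(
  topic = 1:8,
  coherence = c(-174.7356, -135.2671, -206.8343, -158.7330,
                -174.0846, -158.0921, -166.8587, -154.3059),
  exclusivity = c(9.867745, 9.921846, 9.915403, 9.695110,
                  9.800511, 9.844414, 9.529543, 9.508313)
)

# Rank 1 = best, assuming higher is better for both measures
topic_scores$coherence_rank <- rank(-topic_scores$coherence)
topic_scores$exclusivity_rank <- rank(-topic_scores$exclusivity)

# Topics ordered from most to least coherent
ranked <- topic_scores[order(topic_scores$coherence_rank), ]
print(ranked)
```

The exclusivity values span a narrow range (roughly 9.5 to 9.9), so differences in rank on that measure should be read with some caution.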
Question 2 d)
From the analysis it appears that Gamestop customers are mostly gamers looking for a good combination of price and quality. They are often looking for peripherals, so bundle deals could help Gamestop upsell, for example a Playstation bundle with two controllers and a headset. Gamestop customers are also big fans of vintage games like Pokemon and Zelda. A focus on marketing vintage games with massive fanbases would help Gamestop gain more traction, as it is often the only place outside online marketplaces where one can get a vintage game. Gamestop should also look into creating an online forum where players can interact and share experiences with the games they’ve purchased; this could raise customer loyalty and strengthen the community feel of Gamestop.