library(tidyverse)
library(lubridate)
library(tidytext)
library(textdata)
library(widyr)
library(igraph)
library(ggraph)
Comparing Apples to Oranges, a Steam Game Text Analysis
Introduction
There are many analyses of Steam games out there. Steam is one of the most commonly used apps, let alone gaming platforms, and it is very consumer-friendly with its data access. Being the person that I am, I did not want to do the usual Steam game analysis, so in this document we will dive deep into comparing Apples to Oranges.
You’re Doing What?
Yup! That’s what I’m doing! I originally wanted to analyze the reviews of the two games “Apple” and “Orange”, cheap clicker games that are most certainly cash-grabs or first-time projects, seen here:
https://store.steampowered.com/app/3066470/Apple/
https://store.steampowered.com/app/3095450/Orange/
However, I ran into two big problems:
Orange has not been released yet!
Apple doesn’t have enough coherent reviews to analyze anything!
To salvage something from this one-in-a-million idea, I pivoted to two other games in a similar vein: Apple Slash, a top-down sword-fighting adventure game, and Orange Season, a farming game very similar to Stardew Valley:
https://store.steampowered.com/app/1127850/Apple_Slash/
https://store.steampowered.com/app/416000/Orange_Season/
The Objective
In this project, I have a single, primary goal that I want to achieve:
“Which game do players prefer - ironically or not - and why?”
I will do this by analyzing the reviews of both Apple Slash and Orange Season and comparing the average sentiment of those reviews to see what people think of each game!
Preparing The Data
I have already done the back-end work of collecting all 144 English-language Apple Slash reviews and 200 Orange Season reviews directly from Steam, which are loaded here:
appleReviews = read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/campisec_xavier_edu/IQBt4dOsURcMRLYOe5iBhHOXAX1sWyT3wQ4aMQpWsMjYKOA?download=1")
orangeReviews = read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/campisec_xavier_edu/IQCnnthTv_nHRqwyhGz12F3KAZDcvTBXtF5bEmib5-_KgBE?download=1")
head(appleReviews)
# A tibble: 6 × 1
review_text
<chr>
1 "Posted: February 5, 2020\nProduct received for free\n\n\nJust a little taste…
2 "Posted: February 5, 2020\nWhoao, very cool little game! Love the music and s…
3 "Posted: July 9, 2023\nEntertaining but also kind of depressing....."
4 "Posted: February 12, 2021\nA jaunty but somewhat melancholic micro-epic set …
5 "Posted: February 25, 2020\nApple Splash is a fun, quick game where you play …
6 "Posted: March 11, 2021\nThere's something to be said for a game that FEELS g…
head(orangeReviews)
# A tibble: 6 × 1
review_text
<chr>
1 "Posted: October 24, 2024\nI have put 500 hours into the game. Back when Hude…
2 "Posted: October 24, 2024\nIt's still half-done. There's so much missing from…
3 "Posted: November 1, 2024\nThis game is genuinely NOT finished. Do not let th…
4 "Posted: January 13, 2020\nEarly Access Review\n\nFirst off - yes, this game …
5 "Posted: October 24, 2024\nThis game look AMAZING when i initially bought it.…
6 "Posted: December 15, 2019\nEarly Access Review\n\nI really want to like this…
We can see that there are some extra bits near the start of each review. Let’s separate the date posted into its own column, and then combine the data sets!
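# Pull the "Posted: <date>" header into its own column and tag each game
appleReviews = appleReviews %>%
  mutate(review_date = mdy(sub("\n.*", "", review_text)),  # first line holds the date
         review_text = sub(".*\n", "", review_text),       # greedy: keeps only the text after the last newline
         game = "Apple")
orangeReviews = orangeReviews %>%
  mutate(review_date = mdy(sub("\n.*", "", review_text)),
         review_text = sub(".*\n", "", review_text),
         game = "Orange")
# Stack both games into a single data set
allReviews = bind_rows(appleReviews, orangeReviews)
head(allReviews)
# A tibble: 6 × 3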
review_text review_date game
<chr> <date> <chr>
1 If you found this review helpful, please consider following… 2020-02-05 Apple
2 Whoao, very cool little game! Love the music and sound effe… 2020-02-05 Apple
3 Entertaining but also kind of depressing..... 2023-07-09 Apple
4 At the end, the developer asks players if a more extensive … 2021-02-12 Apple
5 Apple Splash is a fun, quick game where you play a little a… 2020-02-25 Apple
6 My only gripe - what happens next?? 2021-03-11 Apple
There we are! Now we are ready to start!
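One thing worth flagging in the cleaning step: judging by the `head(allReviews)` output, the `sub(".*\n", "", review_text)` call seems to keep only the final line of each multi-line review, because `.` also matches newlines in R's default regex engine and `.*` is greedy. The sketch below uses a made-up review string (an assumption, not real data) to show that behavior next to one possible alternative that strips only the first line. It is not the pipeline used for the results in this document.

```r
# Toy review in the same shape as the raw data (hypothetical text)
raw <- "Posted: February 5, 2020\nFirst line of the review.\nSecond line."

# Pattern used in the cleaning step: the greedy `.*\n` runs up to the LAST newline
last_line_only <- sub(".*\n", "", raw)

# Possible alternative: drop only the first line
# (any run of non-newline characters, followed by a single newline)
full_body <- sub("^[^\n]*\n", "", raw)

print(last_line_only)
print(full_body)
```

Switching patterns would change every word count downstream, so the numbers and plots below would need to be regenerated if the alternative were adopted.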
Analyzing The Data
Let’s start off simple: what are the most common words we can find?
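# Give every review an id, ordered by posting date
wordAnalysis = allReviews %>%
  arrange(review_date) %>%
  mutate(review_id = row_number())
# One row per word, with common stop words removed
wordAnalysis = wordAnalysis %>%
  unnest_tokens(word, review_text) %>%
  anti_join(stop_words, by = "word")
# Most frequent words per game
wordAnalysis %>%
  group_by(game, word) %>%
  summarize(n = n()) %>%
  arrange(-n) %>%
  head()
# A tibble: 6 × 3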
# Groups: game [2]
game word n
<chr> <chr> <int>
1 Orange game 260
2 Apple game 115
3 Apple short 41
4 Apple fun 39
5 Orange farming 39
6 Orange play 34
This makes sense, as most if not all reviews refer to their game in some way. Let’s make a graph to get a better grip on what we’re looking at, along with a first look at sentiment.
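sentimentBing = get_sentiments("bing")
# Word counts per game, keeping only words found in the Bing lexicon
review_counts = wordAnalysis %>%
  group_by(game, word) %>%
  summarize(n = n()) %>%
  inner_join(sentimentBing, by = "word")
# checking for positive and negative words
# (review_counts has one row per game/word, so n() counts distinct words here)
review_counts %>%
  group_by(game, sentiment) %>%
  summarize(n = n()) %>%
  arrange(-n)
# A tibble: 4 × 3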
# Groups: game [2]
game sentiment n
<chr> <chr> <int>
1 Orange negative 128
2 Orange positive 118
3 Apple positive 69
4 Apple negative 49
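Because `review_counts` holds one row per game and word, the table above counts distinct sentiment words rather than how often they were used. A minimal sketch of the difference, using a made-up stand-in for `review_counts` (only the 39 for "fun" comes from the output earlier; every other value is hypothetical):

```r
library(dplyr)

# Hypothetical stand-in for review_counts: one row per game/word pair
toy_counts <- tibble(
  game      = c("Apple", "Apple", "Orange", "Orange"),
  word      = c("fun", "hack", "buggy", "fun"),
  n         = c(39, 6, 20, 10),
  sentiment = c("positive", "negative", "negative", "positive")
)

sentiment_totals <- toy_counts %>%
  group_by(game, sentiment) %>%
  summarize(distinct_words = n(),     # rows per group, i.e. unique words
            occurrences    = sum(n),  # total times those words were used
            .groups = "drop")

print(sentiment_totals)
```

Either number can be the right one depending on the question. Distinct words show how varied the vocabulary is, while occurrences show how much of the text carries that sentiment.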
# Barplot for positive and negative words
review_counts %>%
  group_by(game) %>%
  filter(n > 5) %>%
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%  # negative words plot below zero
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col() +
  coord_flip() +
  facet_wrap(~game, ncol = 2) +
  geom_text(aes(label = signif(n, digits = 3)), nudge_y = 8) +
  labs(title = "Positive and Negative Words for Apple Slash and Orange Season",
       subtitle = "Only words appearing more than 5 times are shown")
Excellent! Right off the bat, we can see that many reviews label Apple Slash as “fun”, “worth”, etc., while many reviews of Orange Season refer to it as “buggy” and “bad”. Let’s look into this further with another sentiment lexicon that captures specific emotions rather than just positive/negative.
sentimentNRC <- get_sentiments("nrc")
# Count emotive words per NRC sentiment and game
wordAnalysis %>%
  inner_join(sentimentNRC, by = "word", relationship = "many-to-many") %>%
  group_by(sentiment, game) %>%
  summarize(n = n()) %>%
  ggplot(aes(x = sentiment, y = n, fill = game)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("red", "orange")) +
  labs(title = "Game Sentiment Scores",
       subtitle = "Total number of emotive words scored",
       y = "Total Number of Words",
       x = "Emotional Sentiment",
       fill = "Game")
Now, this graph may seem very pro-Orange Season, but that is because Orange Season has more words in every category! Let’s standardize the counts to see what people really think.
wordAnalysis %>%
  inner_join(sentimentNRC, by = "word", relationship = "many-to-many") %>%
  group_by(game) %>%
  mutate(total_words = n()) %>%  # emotive word matches per game
  group_by(sentiment, game, total_words) %>%
  summarize(n = n(), .groups = "drop") %>%
  mutate(prop = n / total_words) %>%  # share of each sentiment within its game
  ggplot(aes(x = sentiment, y = prop, fill = game)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("red", "orange")) +
  labs(title = "Game Sentiment Scores",
       subtitle = "Proportion of emotive words",
       y = "Proportion of Words",
       x = "Emotional Sentiment",
       fill = "Game")
This looks better! We can see that Orange Season has a larger share of negative-leaning sentiments, like disgust, fear, negative, and sadness, while Apple Slash is the opposite, with larger shares of joy, positive, and anticipation!
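To see why standardizing matters, here is a small sketch with made-up counts (the `toy_nrc` data is hypothetical, not taken from the reviews). The "Orange" group has more words in both categories, yet a smaller share of joy once each game is divided by its own total:

```r
library(dplyr)

# Hypothetical sentiment matches: 4 for Apple, 10 for Orange
toy_nrc <- tibble(
  game      = c(rep("Apple", 4), rep("Orange", 10)),
  sentiment = c(rep("joy", 3), "sadness",
                rep("joy", 4), rep("sadness", 6))
)

sentiment_props <- toy_nrc %>%
  count(game, sentiment) %>%         # raw counts per game and sentiment
  group_by(game) %>%
  mutate(prop = n / sum(n)) %>%      # divide by that game's own total
  ungroup()

print(sentiment_props)
```

In this toy case the raw counts favor Orange for joy (4 vs. 3), while the proportions favor Apple (0.75 vs. 0.4), which is the same kind of reversal the standardized plot is meant to reveal.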
Answering The Question
Now, the answer you’ve all been waiting for… which game do the players prefer?
….Drumroll please….
…
It’s Apple Slash by a large margin.
Now, while the two games are in completely different genres, there is a very apparent difference between their sentiments. In the first graph, we can already see that Apple Slash carries far more positive sentiment than Orange Season. In the same graph, there are also no truly negative words on the Apple Slash side. While there are technically two, “hack” and “enemies”, these are ordinary gaming terms that aren’t inherently negative! “Hack” most likely refers to the “Hack ’n Slash” genre, where games feature fast-paced melee combat. “Enemies” is the more obvious one: the reviews are almost certainly talking about the in-game enemies that the player must defeat. We can also see a clear pattern in the third graph, as explained earlier: Orange Season leads in the negative-leaning sentiments, while Apple Slash leads in the positive ones. Overall, the winner is clear!
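For anyone who wants a single number to go with the graphs, one option is a per-review net score (positive words minus negative words) averaged by game. The sketch below is only an illustration on made-up tokens with a tiny hypothetical lexicon standing in for `get_sentiments("bing")`. It was not run on the real reviews, so it says nothing about the actual margin.

```r
library(dplyr)

# Hypothetical tokens: one row per word, as unnest_tokens() would produce
toy_tokens <- tibble(
  game      = c("Apple", "Apple", "Apple", "Orange", "Orange", "Orange"),
  review_id = c(1, 1, 2, 3, 3, 4),
  word      = c("fun", "short", "fun", "buggy", "bad", "fun")
)

# Hypothetical mini-lexicon standing in for the Bing lexicon
toy_lexicon <- tibble(
  word      = c("fun", "buggy", "bad"),
  sentiment = c("positive", "negative", "negative")
)

# Net score per review: positive matches minus negative matches
# (reviews with no lexicon words drop out of the join)
review_scores <- toy_tokens %>%
  inner_join(toy_lexicon, by = "word") %>%
  group_by(game, review_id) %>%
  summarize(score = sum(sentiment == "positive") - sum(sentiment == "negative"),
            .groups = "drop")

# Average net score per game
game_scores <- review_scores %>%
  group_by(game) %>%
  summarize(mean_score = mean(score), .groups = "drop")

print(game_scores)
```

With the real data, `toy_tokens` would be replaced by `wordAnalysis` and `toy_lexicon` by `sentimentBing`. The caveat about “hack” and “enemies” applies here too, since the lexicon would still score them as negative.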