Rows: 2,999
Columns: 13
$ appid <dbl> 1808500, 1808500, 1808500, 1808500, 1808500, 180850…
$ title <chr> "ARC Raiders", "ARC Raiders", "ARC Raiders", "ARC R…
$ has_discount <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ review_id <dbl> 212582419, 212582393, 212582291, 212582197, 2125821…
$ review_text <chr> ",", "Best multiplayer shoot n loot I have ever pla…
$ timestamp_created <dttm> 2025-12-05 00:37:00, 2025-12-05 00:36:32, 2025-12-…
$ voted_up <lgl> TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, T…
$ votes_up <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, …
$ votes_funny <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ playtime_at_review <dbl> 13558, 7701, 9945, 2736, 7879, 7953, 19019, 24054, …
$ price_current <dbl> 39.99, 39.99, 39.99, 39.99, 39.99, 39.99, 39.99, 39…
$ original_price <dbl> 39.99, 39.99, 39.99, 39.99, 39.99, 39.99, 39.99, 39…
$ discount_percent <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
Steam Games Sentiment Analysis
Introduction
Research focus
In this analysis, I examine whether Steam game reviews differ in emotional tone and sentiment depending on whether a game is discounted or sold at full price.
Data
Data Source
The dataset consists of Steam user reviews collected using the Steam reviews JSON endpoint in a separate script.
The dataset includes:
- Review text
- Timestamp of review submission
- Whether the associated game is discounted
- Basic game-level information (title and price)
Data wrangling
Before analysis, the following steps are performed:
- Convert timestamps to date
- Create a price group label (Discounted vs Full Price)
- Tokenize text into individual words
- Remove stop words
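The first two steps can be sketched as follows. This is a minimal illustration on a toy tibble, not the original script: the column names `timestamp_created`, `has_discount`, `review_date`, and `price_group` are taken from the printed output in this report, while the object names `reviews` and `reviews_clean` and the example values are assumptions.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the scraped review data (values are illustrative only)
reviews <- tibble(
  review_text = c("Great game, loved the story", "Crap grind, not fun"),
  timestamp_created = as.POSIXct(
    c("2025-11-27 23:17:22", "2025-11-28 23:10:06"),
    tz = "UTC"
  ),
  has_discount = c(TRUE, FALSE)
)

reviews_clean <- reviews %>%
  mutate(
    # Keep only the calendar date of the review
    review_date = as.Date(timestamp_created, tz = "UTC"),
    # Human-readable label used for grouping in later steps
    price_group = if_else(has_discount, "Discounted", "Full Price")
  )

print(reviews_clean)
```

Passing `tz = "UTC"` explicitly avoids a date shifting by one day when the session time zone differs from the one the timestamps were stored in.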
# A tibble: 6 × 15
appid title has_discount review_id review_text timestamp_created voted_up
<dbl> <chr> <lgl> <int> <chr> <dttm> <lgl>
1 306130 The El… FALSE 1 "This game… 2025-11-27 23:17:22 TRUE
2 306130 The El… FALSE 2 "Still try… 2025-11-28 23:48:01 TRUE
3 306130 The El… FALSE 3 "crap grin… 2025-11-28 23:10:06 FALSE
4 306130 The El… FALSE 4 "Played th… 2025-11-28 21:50:38 TRUE
5 306130 The El… FALSE 5 "I origina… 2025-11-28 21:19:35 FALSE
6 306130 The El… FALSE 6 "I recomen… 2025-11-28 19:54:58 TRUE
# ℹ 8 more variables: votes_up <dbl>, votes_funny <dbl>,
# playtime_at_review <dbl>, price_current <dbl>, original_price <dbl>,
# discount_percent <dbl>, review_date <date>, price_group <chr>
Tokenize reviews and remove stop words
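A minimal sketch of this step, assuming the tidytext package was used (the one-word-per-row `word` column in the output suggests `unnest_tokens()`, but the original code is not shown). The toy data and the object name `tidy_reviews` are illustrative assumptions.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Toy input in the shape of the wrangled review table
reviews_clean <- tibble(
  review_id = 1:2,
  price_group = c("Discounted", "Full Price"),
  review_text = c("This game is fun", "The story was boring")
)

tidy_reviews <- reviews_clean %>%
  # One lowercase token per row; the review_text column is dropped by default
  unnest_tokens(word, review_text) %>%
  # Remove common English stop words shipped with tidytext
  anti_join(stop_words, by = "word")

print(tidy_reviews)
```

Each review now contributes one row per remaining word, which is why the same `review_id` repeats in the output.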
# A tibble: 6 × 15
appid title has_discount review_id timestamp_created voted_up votes_up
<dbl> <chr> <lgl> <int> <dttm> <lgl> <dbl>
1 306130 The Elder… FALSE 1 2025-11-27 23:17:22 TRUE 0
2 306130 The Elder… FALSE 1 2025-11-27 23:17:22 TRUE 0
3 306130 The Elder… FALSE 1 2025-11-27 23:17:22 TRUE 0
4 306130 The Elder… FALSE 1 2025-11-27 23:17:22 TRUE 0
5 306130 The Elder… FALSE 1 2025-11-27 23:17:22 TRUE 0
6 306130 The Elder… FALSE 1 2025-11-27 23:17:22 TRUE 0
# ℹ 8 more variables: votes_funny <dbl>, playtime_at_review <dbl>,
# price_current <dbl>, original_price <dbl>, discount_percent <dbl>,
# review_date <date>, price_group <chr>, word <chr>
Word count
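The table shows the most common words in Steam reviews by pricing group, after stop-word removal. Full-price games have higher raw counts for the top words (for example, “game” appears 1,813 times in full-price reviews versus 615 times in discounted ones), indicating greater review volume or verbosity in that group. Because these are raw counts rather than proportions, they do not by themselves show a difference in vocabulary between the groups. The most frequent terms emphasize gameplay experience and enjoyment, such as “game,” “fun,” “play,” and “love.”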
# A tibble: 10,720 × 3
price_group word n
<chr> <chr> <int>
1 Full Price game 1813
2 Discounted game 615
3 Full Price fun 344
4 Full Price play 339
5 Full Price 10 255
6 Full Price time 246
7 Full Price games 194
8 Full Price story 180
9 Full Price love 144
10 Full Price hours 141
# ℹ 10,710 more rows
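A table of this shape can be produced with `count()`. The sketch below also adds a within-group share, which makes the two pricing groups comparable despite their different sizes. The toy data and the `share` column are illustrative assumptions, not part of the original analysis.

```r
library(dplyr)
library(tibble)

# Toy tokenized data: one row per word occurrence
tidy_reviews <- tibble(
  price_group = c("Full Price", "Full Price", "Full Price",
                  "Discounted", "Discounted"),
  word = c("game", "game", "fun", "game", "fun")
)

word_counts <- tidy_reviews %>%
  # Raw frequency of each word within each pricing group, largest first
  count(price_group, word, sort = TRUE) %>%
  group_by(price_group) %>%
  # Share of the group's total tokens accounted for by each word
  mutate(share = n / sum(n)) %>%
  ungroup()

print(word_counts)
```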
Visualization: Positive and Negative words
Question 1: Are reviews for discounted games more positive or negative, on average, than reviews for full-price games?
# A tibble: 4 × 3
# Groups: price_group [2]
price_group sentiment n
<chr> <chr> <int>
1 Discounted negative 351
2 Discounted positive 238
3 Full Price negative 704
4 Full Price positive 452
In both groups, negative words outnumber positive ones. Discounted titles show a slightly higher share of positive words (238 of 589, about 40.4%) than full-price titles (452 of 1,156, about 39.1%), so discounted games receive somewhat more praise and full-price games somewhat more criticism. The gap is small, but its direction is consistent with consumer expectations scaling with cost and influencing word choice in reviews.
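The comparison can be made explicit by converting the counts in the table to within-group shares. The sketch below first shows how such counts might be obtained, assuming the Bing lexicon from tidytext was used (the positive/negative labels suggest this, but the source does not say so), and then computes the positive share from the counts reported above. Object names and the toy words are illustrative.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Toy tokenized data, joined to the Bing lexicon (assumed lexicon)
tidy_reviews <- tibble(
  price_group = c("Discounted", "Discounted", "Full Price", "Full Price"),
  word = c("love", "boring", "fun", "crap")
)

sentiment_counts <- tidy_reviews %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(price_group, sentiment)

# Counts copied from the table in this report
reported <- tribble(
  ~price_group, ~sentiment, ~n,
  "Discounted", "negative", 351,
  "Discounted", "positive", 238,
  "Full Price", "negative", 704,
  "Full Price", "positive", 452
)

positive_share <- reported %>%
  group_by(price_group) %>%
  mutate(share = n / sum(n)) %>%   # proportion within each pricing group
  ungroup() %>%
  filter(sentiment == "positive")

print(positive_share)
```

Comparing shares rather than raw counts removes the effect of the full-price group simply containing more sentiment-bearing words.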
Emotional Tone Analysis
Question 2: Are Steam reviews for discounted and full-price games associated with different emotional profiles?
The NRC sentiment analysis shows that full-price Steam games generate more emotion-laden language than discounted titles across both positive and negative categories. This suggests that pricing is related not only to player satisfaction but also to the emotional intensity with which experiences are evaluated.
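An emotion profile of this kind can be sketched as below. To keep the example self-contained, it uses a hypothetical mini-lexicon (`toy_nrc`) in the same two-column shape as the NRC lexicon. In a real analysis this would be replaced by `get_sentiments("nrc")`, which requires the textdata package and a one-time interactive download. All object names and values here are assumptions.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for get_sentiments("nrc"): a word can map to
# several emotion categories, as in the real lexicon
toy_nrc <- tibble(
  word = c("love", "love", "crap", "fun", "boring"),
  sentiment = c("joy", "positive", "disgust", "joy", "negative")
)

# Toy tokenized reviews
tidy_reviews <- tibble(
  price_group = c("Discounted", "Discounted",
                  "Full Price", "Full Price", "Full Price"),
  word = c("love", "boring", "fun", "crap", "love")
)

emotion_profile <- tidy_reviews %>%
  # Words repeat on both sides, so the join is many-to-many by design
  # (the relationship argument needs dplyr >= 1.1.0)
  inner_join(toy_nrc, by = "word", relationship = "many-to-many") %>%
  count(price_group, sentiment) %>%
  group_by(price_group) %>%
  # Share of each emotion within the group, so groups of different
  # sizes can be compared
  mutate(share = n / sum(n)) %>%
  ungroup()

print(emotion_profile)
```

Plotting `share` rather than `n` by emotion keeps the larger full-price group from appearing more emotional purely because it contains more words.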
Conclusion
The results suggest that pricing is associated with differences in both sentiment and emotional tone in Steam reviews. Reviews of discounted games lean slightly more positive, whereas full-price games draw more emotionally charged language, in praise as well as criticism. This is consistent with consumer expectations scaling with price and shaping how players emotionally evaluate their experiences.