Steam Games Sentiment Analysis

Introduction

Research focus

In this analysis, I examine whether Steam game reviews differ in emotional tone and sentiment depending on whether a game is discounted or sold at full price.

Data

Data Source

The dataset consists of Steam user reviews collected using the Steam reviews JSON endpoint in a separate script.

The dataset includes:

  • Review text

  • Timestamp of review submission

  • Whether the associated game is discounted

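  • Basic game-level information (title, current price, original price, and discount percentage)

  • Review metadata (recommendation flag, helpful and funny vote counts, and playtime at review)

The structure of the raw data: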

Rows: 2,999
Columns: 13
$ appid              <dbl> 1808500, 1808500, 1808500, 1808500, 1808500, 180850…
$ title              <chr> "ARC Raiders", "ARC Raiders", "ARC Raiders", "ARC R…
$ has_discount       <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ review_id          <dbl> 212582419, 212582393, 212582291, 212582197, 2125821…
$ review_text        <chr> ",", "Best multiplayer shoot n loot I have ever pla…
$ timestamp_created  <dttm> 2025-12-05 00:37:00, 2025-12-05 00:36:32, 2025-12-…
$ voted_up           <lgl> TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, T…
$ votes_up           <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, …
$ votes_funny        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ playtime_at_review <dbl> 13558, 7701, 9945, 2736, 7879, 7953, 19019, 24054, …
$ price_current      <dbl> 39.99, 39.99, 39.99, 39.99, 39.99, 39.99, 39.99, 39…
$ original_price     <dbl> 39.99, 39.99, 39.99, 39.99, 39.99, 39.99, 39.99, 39…
$ discount_percent   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
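The collection script is not included in this report. The sketch below shows one way to fetch and flatten reviews, assuming Steam's public endpoint `store.steampowered.com/appreviews/<appid>?json=1` with cursor-based paging. The chosen parameters (`filter`, `language`, `num_per_page`) and the helper names `build_review_url`, `parse_reviews`, and `fetch_page` are illustrative assumptions, not the author's code. Price fields (`has_discount`, `price_current`, and so on) would come from a separate store lookup that is not sketched here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: reviews come from Steam's public review endpoint; the original
# script's exact parameters are not shown in the report.
BASE_URL = "https://store.steampowered.com/appreviews/{appid}"


def build_review_url(appid, cursor="*", per_page=100):
    """Build the URL for one page of reviews; Steam caps num_per_page at 100."""
    params = {
        "json": 1,
        "filter": "recent",      # newest reviews first
        "language": "english",
        "num_per_page": per_page,
        "cursor": cursor,        # "*" for the first page, then the returned cursor
    }
    return BASE_URL.format(appid=appid) + "?" + urlencode(params)


def parse_reviews(payload, appid, title):
    """Flatten one decoded JSON page into rows shaped like the glimpse above."""
    rows = []
    for r in payload.get("reviews", []):
        rows.append({
            "appid": appid,
            "title": title,
            "review_id": int(r["recommendationid"]),
            "review_text": r["review"],
            "timestamp_created": r["timestamp_created"],  # Unix seconds
            "voted_up": r["voted_up"],
            "votes_up": r["votes_up"],
            "votes_funny": r["votes_funny"],
            "playtime_at_review": r["author"].get("playtime_at_review"),
        })
    # The next-page cursor is returned alongside the reviews.
    return rows, payload.get("cursor")


def fetch_page(appid, title, cursor="*"):
    """Network call (not exercised offline): download and parse one page."""
    with urlopen(build_review_url(appid, cursor)) as resp:
        return parse_reviews(json.load(resp), appid, title)


# Hypothetical payload with the same field layout as a real response.
sample = {
    "success": 1,
    "cursor": "next-page-token",
    "reviews": [{
        "recommendationid": "212582419",
        "review": "Example review text",
        "timestamp_created": 1764895020,
        "voted_up": True,
        "votes_up": 0,
        "votes_funny": 0,
        "author": {"playtime_at_review": 13558},
    }],
}
rows, cursor = parse_reviews(sample, 1808500, "ARC Raiders")
print(rows[0]["review_id"], cursor)
```

Keeping parsing separate from the network call makes the field mapping testable without hitting Steam.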

Data wrangling

Before analysis, the following steps are performed:

  1. Convert timestamps to date
  2. Create a price group label (Discounted vs Full Price)
  3. Tokenize text into individual words
  4. Remove stop words

The first rows after steps 1 and 2, with the new review_date and price_group columns:
# A tibble: 6 × 15
   appid title   has_discount review_id review_text timestamp_created   voted_up
   <dbl> <chr>   <lgl>            <int> <chr>       <dttm>              <lgl>   
1 306130 The El… FALSE                1 "This game… 2025-11-27 23:17:22 TRUE    
2 306130 The El… FALSE                2 "Still try… 2025-11-28 23:48:01 TRUE    
3 306130 The El… FALSE                3 "crap grin… 2025-11-28 23:10:06 FALSE   
4 306130 The El… FALSE                4 "Played th… 2025-11-28 21:50:38 TRUE    
5 306130 The El… FALSE                5 "I origina… 2025-11-28 21:19:35 FALSE   
6 306130 The El… FALSE                6 "I recomen… 2025-11-28 19:54:58 TRUE    
# ℹ 8 more variables: votes_up <dbl>, votes_funny <dbl>,
#   playtime_at_review <dbl>, price_current <dbl>, original_price <dbl>,
#   discount_percent <dbl>, review_date <date>, price_group <chr>
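The R code for steps 1 and 2 is not shown. As a minimal Python sketch of the same two transformations, assuming `timestamp_created` is stored as Unix seconds and interpreted in UTC (the report does not state a time zone), the hypothetical helper `add_date_and_group` derives `review_date` and the `price_group` label from `has_discount`:

```python
from datetime import datetime, timezone


def add_date_and_group(rows):
    """Steps 1-2: add review_date and a Discounted / Full Price label to each row."""
    out = []
    for row in rows:
        # Assumption: timestamps are Unix seconds interpreted in UTC.
        created = datetime.fromtimestamp(row["timestamp_created"], tz=timezone.utc)
        out.append({
            **row,
            "review_date": created.date(),
            "price_group": "Discounted" if row["has_discount"] else "Full Price",
        })
    return out


example = add_date_and_group([
    {"review_id": 1, "timestamp_created": 0, "has_discount": False},
    {"review_id": 2, "timestamp_created": 86400, "has_discount": True},
])
for row in example:
    print(row["review_id"], row["review_date"], row["price_group"])
# 1 1970-01-01 Full Price
# 2 1970-01-02 Discounted
```

The time-zone choice matters near midnight: the same review can land on different dates in UTC and in local time.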

Tokenize reviews and remove stop words

# A tibble: 6 × 15
   appid title      has_discount review_id timestamp_created   voted_up votes_up
   <dbl> <chr>      <lgl>            <int> <dttm>              <lgl>       <dbl>
1 306130 The Elder… FALSE                1 2025-11-27 23:17:22 TRUE            0
2 306130 The Elder… FALSE                1 2025-11-27 23:17:22 TRUE            0
3 306130 The Elder… FALSE                1 2025-11-27 23:17:22 TRUE            0
4 306130 The Elder… FALSE                1 2025-11-27 23:17:22 TRUE            0
5 306130 The Elder… FALSE                1 2025-11-27 23:17:22 TRUE            0
6 306130 The Elder… FALSE                1 2025-11-27 23:17:22 TRUE            0
# ℹ 8 more variables: votes_funny <dbl>, playtime_at_review <dbl>,
#   price_current <dbl>, original_price <dbl>, discount_percent <dbl>,
#   review_date <date>, price_group <chr>, word <chr>
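The output above has one row per word, with `review_text` dropped and a `word` column added. The sketch below approximates steps 3 and 4 in Python with a simple regex tokenizer. The report does not say which tokenizer or stop-word list was used, so both the regex and the short `STOP_WORDS` set are hypothetical stand-ins; a real run would use a complete stop-word list.

```python
import re

# Hypothetical stand-in for a full stop-word list of several hundred words.
STOP_WORDS = {"a", "an", "and", "the", "i", "it", "is", "this", "to", "of", "have", "ever"}

# Lowercase runs of letters, digits, and apostrophes count as tokens.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(rows):
    """Steps 3-4: one output row per non-stop-word token, keeping review metadata."""
    out = []
    for row in rows:
        meta = {k: v for k, v in row.items() if k != "review_text"}
        for word in TOKEN_RE.findall(row["review_text"].lower()):
            if word not in STOP_WORDS:
                out.append({**meta, "word": word})
    return out


tokens = tokenize([
    {"review_id": 1, "price_group": "Full Price",
     "review_text": "Best multiplayer shoot n loot I have ever played"},
])
print([t["word"] for t in tokens])
# ['best', 'multiplayer', 'shoot', 'n', 'loot', 'played']
```

Each token row repeats its review's metadata, which is why the tokenized table above shows identical leading rows for review 1.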

Word count

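The table lists the most frequent words in Steam reviews by pricing group. Full-price games account for far more word occurrences than discounted games (for example, “game” appears 1,813 times versus 615). Raw counts cannot separate review volume from verbosity: the gap may reflect more full-price reviews in the sample, longer reviews, or both, so counts need to be normalized by the number of reviews in each group before drawing conclusions about verbosity. Frequent terms center on gameplay and enjoyment, such as “fun,” “play,” “story,” and “love.”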

# A tibble: 10,720 × 3
   price_group word      n
   <chr>       <chr> <int>
 1 Full Price  game   1813
 2 Discounted  game    615
 3 Full Price  fun     344
 4 Full Price  play    339
 5 Full Price  10      255
 6 Full Price  time    246
 7 Full Price  games   194
 8 Full Price  story   180
 9 Full Price  love    144
10 Full Price  hours   141
# ℹ 10,710 more rows
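Raw counts favor whichever group has more reviews. A minimal sketch of the normalization suggested above, using toy data rather than the real reviews: `count_words` reproduces the shape of the table, and the hypothetical `per_review_rate` divides each group's occurrences of a word by that group's number of reviews.

```python
from collections import Counter


def count_words(tokens):
    """Count (price_group, word) pairs, most frequent first, like the table above."""
    return Counter((t["price_group"], t["word"]) for t in tokens).most_common()


def per_review_rate(tokens, word, reviews_per_group):
    """Occurrences of `word` per review in each group, so unequal group sizes compare fairly."""
    hits = Counter(t["price_group"] for t in tokens if t["word"] == word)
    return {group: hits[group] / n for group, n in reviews_per_group.items()}


# Toy tokens (not the real data): three full-price reviews, one discounted review.
toy = [
    {"price_group": "Full Price", "word": "game"},
    {"price_group": "Full Price", "word": "game"},
    {"price_group": "Full Price", "word": "fun"},
    {"price_group": "Discounted", "word": "game"},
]
# Review totals should come from the review-level table, not the tokens,
# so reviews left with no words after stop-word removal are still counted.
reviews_per_group = {"Full Price": 3, "Discounted": 1}

print(count_words(toy)[0])  # (('Full Price', 'game'), 2)
# The raw count favors Full Price, but the per-review rate favors Discounted.
print(per_review_rate(toy, "game", reviews_per_group))
```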

Visualization: Positive and Negative words

Question 1: Are reviews for discounted games more positive or negative, on average, than reviews for full-price games?

# A tibble: 4 × 3
# Groups:   price_group [2]
  price_group sentiment     n
  <chr>       <chr>     <int>
1 Discounted  negative    351
2 Discounted  positive    238
3 Full Price  negative    704
4 Full Price  positive    452

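Both groups use more negative than positive sentiment words. Positive words make up 40.4% of sentiment-bearing words in reviews of discounted games (238 of 589) and 39.1% in reviews of full-price games (452 of 1,156). Full-price games show larger counts in both categories, but this largely reflects their larger overall volume of sentiment words rather than harsher criticism: the negative share is 60.9% for full-price games versus 59.6% for discounted games. On these counts, reviews of discounted titles are only marginally more positive. The small gap is consistent with, but does not by itself establish, the idea that consumer expectations scale with cost.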
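The proportions above follow directly from the sentiment table. The sketch below reproduces them and adds a Pearson chi-square test on the 2 × 2 table as a rough check of whether the gap is larger than chance. This test is an addition, not part of the original analysis, and it treats every sentiment word as an independent observation. Words cluster within reviews, so the p-value is optimistic; a review-level comparison would be more defensible.

```python
import math

# Counts copied from the sentiment table above.
counts = {
    "Discounted": {"negative": 351, "positive": 238},
    "Full Price": {"negative": 704, "positive": 452},
}


def positive_share(group):
    """Fraction of a group's sentiment-bearing words that are positive."""
    c = counts[group]
    return c["positive"] / (c["positive"] + c["negative"])


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


for group in counts:
    print(f"{group}: {positive_share(group):.1%} positive")
# Discounted: 40.4% positive
# Full Price: 39.1% positive

# Rows: Discounted, Full Price; columns: negative, positive.
stat, p = chi_square_2x2(351, 238, 704, 452)
print(f"chi-square = {stat:.3f}, p = {p:.2f}")
# chi-square = 0.279, p = 0.60
```

Even under this favorable independence assumption, the difference between the two groups is well within what chance alone would produce.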

Emotional Tone Analysis

Question 2: Are Steam reviews for discounted and full-price games associated with different emotional profiles?

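The NRC emotion lexicon analysis shows higher emotion-word counts for full-price Steam games than for discounted titles across both positive and negative categories. Because full-price reviews also contribute more words overall, these raw counts partly reflect volume. They suggest that pricing relates to the emotional intensity with which experiences are evaluated, but comparing each emotion's share within a group is needed to confirm that the emotional profiles themselves differ.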
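A share-based profile removes the volume effect by expressing each emotion as a fraction of a group's lexicon matches. The sketch below assumes a word-to-emotions mapping in the style of the NRC lexicon; `TOY_LEXICON` and its entries are hypothetical and are not taken from the NRC Emotion Lexicon, which a real analysis would load in full.

```python
from collections import Counter, defaultdict

# Toy, NRC-style lexicon for illustration only; these entries are NOT taken
# from the NRC Emotion Lexicon.
TOY_LEXICON = {
    "love": {"joy", "positive"},
    "fun": {"joy", "positive"},
    "crap": {"disgust", "negative"},
    "boring": {"negative"},
}


def emotion_profile(tokens, lexicon):
    """Each emotion's share of a group's lexicon matches, removing raw volume effects."""
    counts = defaultdict(Counter)
    for t in tokens:
        # A word can carry several labels, so it may add to more than one emotion.
        for emotion in lexicon.get(t["word"], ()):
            counts[t["price_group"]][emotion] += 1
    return {
        group: {emotion: n / sum(c.values()) for emotion, n in c.items()}
        for group, c in counts.items()
    }


toy_tokens = [
    {"price_group": "Full Price", "word": "love"},
    {"price_group": "Full Price", "word": "crap"},
    {"price_group": "Discounted", "word": "fun"},
    {"price_group": "Discounted", "word": "game"},  # not in the lexicon, ignored
]
profile = emotion_profile(toy_tokens, TOY_LEXICON)
print(profile["Full Price"]["joy"])  # 0.25
```

Comparing these shares between groups shows whether full-price reviews are more emotional in kind, not just more numerous.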

Conclusion

The results suggest that pricing is associated with differences in sentiment and emotional tone in Steam reviews, although the differences are modest once counts are compared as proportions. Reviews of discounted games are marginally more positive and less emotionally intense, whereas full-price games generate more emotion words overall, in both praise and criticism. This is consistent with the idea that consumer expectations scale with price and shape how players emotionally evaluate their experiences, though the word-level counts shown here do not establish that relationship on their own.