Exploring Steam Game Prices and Discounts

Author

Oma Ugwu

Introduction and Research Question

Digital game platforms like Steam constantly feature a mix of full-price, discounted, and free-to-play titles. From a player’s perspective, it is useful to understand what the overall price landscape looks like and how often games are on sale. From a business perspective, discounts and free-to-play models are central to how games attract attention and players.

In this project, I scraped data from Steam’s public search results pages for PC games. For each game, I collected the title, release date, raw price text, the current numeric price, whether the game was discounted, and the page of the search results where it appeared. Using this dataset, I investigate the following question:

How are Steam game prices distributed, and how do discounts and free-to-play titles shape the price landscape?

To answer this, I will summarize the distribution of current prices, compare discounted vs non-discounted games, and quantify the magnitude of discounts where they occur.

Data Collection

The data for this project was collected from the Steam Store search results for PC games.

base_url : https://store.steampowered.com/search/?category1=998&page=

I collected the data with a custom R script that sends an HTTP GET request to the Steam Store’s search results URL for PC games, iterating over pages 1–6 of the results. I used the rvest library to extract the relevant HTML elements and process them into structured data. Because Steam uses different text formats for free games, regular prices, and discounted prices, I used regular expressions to extract the current price and identify whether each game is on sale. To scrape responsibly, the script sends a clear, identifying user-agent string and waits a randomized 2–3 seconds between requests.
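The request loop and price parsing described above could look roughly like the sketch below. It is not the original script: the function names, the user-agent string, and the CSS selectors are assumptions for illustration, and Steam's markup may differ. The parsing rule follows the description in this report: the last dollar amount is taken as the current price, and more than one amount marks a discount.

```r
# Hypothetical helper: turn one raw price string into a current price and a
# discount flag, using base R regular expressions.
parse_price_text <- function(price_raw) {
  if (is.na(price_raw)) {
    return(list(price_current = NA_real_, has_discount = FALSE))
  }
  # Find every dollar amount, e.g. "$69.99" and "$59.49".
  hits <- regmatches(price_raw, gregexpr("\\$[0-9]+(\\.[0-9]{1,2})?", price_raw))[[1]]
  amounts <- as.numeric(sub("\\$", "", hits))
  if (length(amounts) == 0) {
    # No dollar amount: text containing "free" means price 0, anything else is missing.
    is_free <- grepl("free", price_raw, ignore.case = TRUE)
    return(list(price_current = if (is_free) 0 else NA_real_, has_discount = FALSE))
  }
  list(price_current = amounts[length(amounts)], has_discount = length(amounts) > 1)
}

# Hypothetical fetcher for one results page. The selectors below are assumed,
# not taken from the original script.
fetch_search_page <- function(page,
                              base_url = "https://store.steampowered.com/search/?category1=998&page=") {
  resp <- httr::GET(
    paste0(base_url, page),
    httr::user_agent("steam-price-project (student research; contact: you@example.edu)")
  )
  httr::stop_for_status(resp)
  html <- rvest::read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
  rows <- rvest::html_elements(html, "a.search_result_row")
  data.frame(
    title        = rvest::html_text2(rvest::html_element(rows, ".title")),
    release_date = rvest::html_text2(rvest::html_element(rows, ".search_released")),
    price_raw    = rvest::html_text2(rvest::html_element(rows, ".search_price_discount_combined")),
    page         = rep(page, length(rows)),
    stringsAsFactors = FALSE
  )
}

# Visit pages 1-6 with a randomized 2-3 second pause after each request.
scrape_pages <- function(pages = 1:6) {
  out <- vector("list", length(pages))
  for (i in seq_along(pages)) {
    out[[i]] <- fetch_search_page(pages[i])
    Sys.sleep(runif(1, min = 2, max = 3))
  }
  do.call(rbind, out)
}

# Not run here: needs network access plus the httr and rvest packages.
# steam <- scrape_pages()
```

Keeping the parsing in its own function means it can be checked on strings such as "Free" or "$69.99 $59.49" without sending any requests.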

The dataset includes each game’s title, release date, raw price text, numeric current price, and whether the game is discounted. This data is suitable for my question because it directly reflects how Steam displays prices and discounts to users and captures the real-time mix of free, full-price, and discounted games. By scraping multiple pages of results instead of only the first page, I obtain a more representative sample across a wide range of price points and game types.

Importing Dataset

library(httr)
library(rvest)
library(dplyr)
library(stringr)
library(purrr)
library(readr)
library(tidyverse)

steam_raw <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/ugwuo_xavier_edu/IQCUYr00la_rRb3kxX6u0J2CAXuH9rluCxUhwvhCy8YL614?download=1")

Data Overview

steam_raw %>%
  head(10)
# A tibble: 10 × 10
   title      release_date price_raw price_current has_discount  page price_nums
   <chr>      <chr>        <chr>             <dbl> <lgl>        <dbl> <lgl>     
 1 ARC Raide… Oct 30, 2025 $39.99            40.0  FALSE            1 NA        
 2 Counter-S… Aug 21, 2012 Free               0    FALSE            1 NA        
 3 Dispatch   Oct 22, 2025 $29.99            30.0  FALSE            1 NA        
 4 Where Win… Nov 14, 2025 Free               0    FALSE            1 NA        
 5 Battlefie… Oct 10, 2025 $69.99 $…         59.5  TRUE             1 NA        
 6 Risk of R… Aug 11, 2020 $24.99 $…          8.24 TRUE             1 NA        
 7 Warframe   Mar 25, 2013 Free               0    FALSE            1 NA        
 8 Clair Obs… Apr 24, 2025 $49.99            50.0  FALSE            1 NA        
 9 Umamusume… Jun 24, 2025 Free               0    FALSE            1 NA        
10 Ghost of … May 16, 2024 $59.99 $…         36.0  TRUE             1 NA        
# ℹ 3 more variables: original_price <dbl>, discount_amount <dbl>,
#   discount_percent <dbl>

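The scraped data contains six core columns (the file loaded above also carries four derived columns: price_nums, original_price, discount_amount, and discount_percent):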

  • title – game title

  • release_date – release date as a character string (e.g., “Oct 30, 2025”)

  • price_raw – raw text from the row, such as “Free” or “$69.99 $59.49”

  • price_current – numeric current price (0 for free games)

  • has_discount – TRUE if multiple numeric prices were detected

  • page – search results page number

I performed the following cleaning and transformation steps:

1. Filtered out any rows missing price_current.
2. Extracted the release year from release_date.
3. Created a categorical price band variable for easier comparison.
4. Calculated the original price, discount amount, and discount percentage for discounted games.
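The four steps above can be sketched in base R as follows. This is an illustration on made-up rows, not the original cleaning code: the price band cutoffs, the band labels, and the choice to treat the first dollar amount in price_raw as the original price are all assumptions.

```r
# Toy rows standing in for the scraped data (values are made up).
steam_toy <- data.frame(
  title         = c("Game A", "Game B", "Game C", "Game D"),
  release_date  = c("Oct 30, 2025", "Aug 21, 2012", "Oct 10, 2025", "Coming soon"),
  price_raw     = c("$39.99", "Free", "$69.99 $59.49", ""),
  price_current = c(39.99, 0, 59.49, NA),
  has_discount  = c(FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# 1. Drop rows with no current price.
steam_clean <- steam_toy[!is.na(steam_toy$price_current), ]

# 2. Release year: the four digits at the end of release_date, NA if absent.
has_year <- grepl("[0-9]{4}$", steam_clean$release_date)
year_txt <- sub(".*([0-9]{4})$", "\\1", steam_clean$release_date)
year_txt[!has_year] <- NA
steam_clean$release_year <- as.integer(year_txt)

# 3. Price bands (cutoffs are assumed; intervals are closed on the right).
steam_clean$price_band <- cut(
  steam_clean$price_current,
  breaks = c(-Inf, 0, 10, 30, 60, Inf),
  labels = c("Free", "Under $10", "$10-$30", "$30-$60", "Over $60")
)

# 4. Original price: first dollar amount for discounted rows, else the current price.
steam_clean$original_price <- steam_clean$price_current
disc <- steam_clean$has_discount
steam_clean$original_price[disc] <- as.numeric(
  sub("^[^0-9]*([0-9]+(\\.[0-9]+)?).*$", "\\1", steam_clean$price_raw[disc])
)
steam_clean$discount_amount <- steam_clean$original_price - steam_clean$price_current
# Free games have an original price of 0, so their discount is set to 0 percent.
steam_clean$discount_percent <- ifelse(
  steam_clean$original_price > 0,
  100 * steam_clean$discount_amount / steam_clean$original_price,
  0
)

print(steam_clean[, c("title", "release_year", "price_band", "discount_percent")])
```

Guarding the division for free games avoids NaN values in discount_percent when the original price is 0.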

Summary Statistics

This summary gives us:

  • Minimum/maximum prices

  • Mean and median price of the games in the sample

  • Range of discount percentages

This helps us understand the spread of the pricing data.

summary(select(steam_raw, price_current, original_price, discount_percent))
 price_current    original_price  discount_percent   
 Min.   : 0.000   Min.   : 0.00   Min.   :-1566.000  
 1st Qu.: 8.428   1st Qu.: 9.99   1st Qu.:    0.000  
 Median :19.990   Median :39.99   Median :    8.759  
 Mean   :23.834   Mean   :35.08   Mean   :   13.205  
 3rd Qu.:34.990   3rd Qu.:59.99   3rd Qu.:   50.008  
 Max.   :69.990   Max.   :99.99   Max.   :   95.032  
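The minimum discount_percent of -1566 in the summary above cannot be a real discount, since a discount percentage should lie between 0 and 100. If discount_percent was computed as 100 × (original_price − price_current) / original_price, a negative value means the parsed original price was below the current price in at least one row. A minimal sketch for isolating such rows before interpreting the summary is shown here on made-up values; the helper name is hypothetical.

```r
# Toy data with one deliberately broken row (Game B), values made up.
toy <- data.frame(
  title          = c("Game A", "Game B", "Game C"),
  price_current  = c(59.49, 20.00, 0),
  original_price = c(69.99, 5.00, 0),
  stringsAsFactors = FALSE
)
toy$discount_percent <- ifelse(
  toy$original_price > 0,
  100 * (toy$original_price - toy$price_current) / toy$original_price,
  0
)

# Hypothetical helper: return rows whose discount lies outside the 0-100 range.
flag_implausible_discounts <- function(df) {
  bad <- !is.na(df$discount_percent) &
    (df$discount_percent < 0 | df$discount_percent > 100)
  df[bad, , drop = FALSE]
}

suspect <- flag_implausible_discounts(toy)
print(suspect)
```

Running the same helper on steam_raw and reading the price_raw text of the flagged rows should show where the price parsing went wrong.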

Analysis and Visualization

Distribution of Game Prices

This histogram shows that most games fall into lower price ranges, with many being free or under $20: the median current price is $19.99 and the third quartile is $34.99. Only a small number of games are priced above $40.
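The shares behind these statements can be computed directly rather than read off the plot. The sketch below uses a made-up price vector, and the helper name is hypothetical; the $20 and $40 thresholds are the ones used in the text.

```r
# Hypothetical helper: share of games that are free, under $20, or above $40.
# "under_20" includes free games; missing prices are ignored.
price_shares <- function(prices) {
  prices <- prices[!is.na(prices)]
  c(
    free     = mean(prices == 0),
    under_20 = mean(prices < 20),
    above_40 = mean(prices > 40)
  )
}

# Made-up prices for illustration only.
toy_prices <- c(0, 0, 4.99, 8.24, 19.99, 29.99, 39.99, 59.49, 69.99, NA)
shares <- price_shares(toy_prices)
print(round(shares, 2))
```

Calling price_shares(steam_raw$price_current) would give the corresponding shares for the scraped sample.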

Free vs Paid Games

Steam features a noticeable portion of free games. Paid games make up the majority, but free titles are still a sizable segment of the sample (roughly a quarter at most, since the first quartile of price_current is above zero).

Discounts vs No Discounts

A large number of games are discounted in this snapshot, showing that discounting is a major part of Steam’s pricing ecosystem.
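A simple count makes this comparison concrete. The sketch below tabulates a made-up has_discount vector; the helper name is hypothetical.

```r
# Hypothetical helper: counts and shares of discounted vs non-discounted games.
discount_summary <- function(has_discount) {
  has_discount <- has_discount[!is.na(has_discount)]
  counts <- table(factor(
    has_discount,
    levels = c(FALSE, TRUE),
    labels = c("No discount", "Discounted")
  ))
  list(counts = counts, share = prop.table(counts))
}

# Made-up flags for illustration only.
toy_flags <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, NA)
res <- discount_summary(toy_flags)
print(res$counts)
print(round(res$share, 2))
```

Because free games cannot be discounted, it may be more informative to pass only the flags of paid games, for example steam_raw$has_discount[steam_raw$price_current > 0].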

Free-to-Play vs Paid

Even with free titles, games are inexpensive; most fall under $20, which points to Steam’s affordable catalog.

Price vs Discount Percentage

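Higher-priced games show more variation in discount percentage. There is no clear linear relationship between price and discount percentage, which may reflect different discounting strategies between publishers, although publisher was not collected in this dataset.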
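Both observations can be checked numerically: a Pearson correlation for the linear relationship, and the standard deviation of discount percentage within price tiers for the spread. The sketch below uses made-up discounted games; the use of original_price as the price axis and the $40 tier cutoff are assumptions.

```r
# Made-up discounted games for illustration only.
toy <- data.frame(
  original_price   = c(9.99, 14.99, 19.99, 39.99, 49.99, 59.99, 69.99, 59.99),
  discount_percent = c(50, 40, 45, 10, 80, 25, 15, 67)
)

# Strength of the linear relationship between price and discount percentage.
r <- cor(toy$original_price, toy$discount_percent, method = "pearson")

# Spread of discount percentage within two price tiers (cutoff is assumed).
tier <- ifelse(toy$original_price >= 40, "40 and up", "under 40")
spread <- tapply(toy$discount_percent, tier, sd)

print(round(r, 2))
print(round(spread, 1))
```

A correlation near zero together with a larger standard deviation in the upper tier would match the pattern described for the scatter plot.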

Conclusion

The visualizations show that Steam game prices are heavily skewed toward the low end, with many titles priced under $20 and a sizable cluster of free-to-play games. Once free games are removed, most paid titles still fall below $30, indicating that cheaper games dominate this sample of the marketplace.

Discounts also play a major role: discounted and non-discounted games appear in almost equal numbers, and discounts occur across all price levels. Although price and discount percentage show no clear linear relationship, higher-priced games span a wider range of discounts, and the deeper ones make some premium titles more affordable.

Overall, free-to-play games anchor the bottom of the price distribution, while widespread discounts lower and reshape the entire price landscape, creating a market where most games are inexpensive or temporarily made inexpensive.