library(httr)
library(rvest)
library(dplyr)
library(stringr)
library(purrr)
library(readr)
library(tidyverse)
steam_raw <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/ugwuo_xavier_edu/IQCUYr00la_rRb3kxX6u0J2CAXuH9rluCxUhwvhCy8YL614?download=1")
Exploring Steam Game Prices and Discounts
Introduction and Research Question
Digital game platforms like Steam constantly feature a mix of full-price, discounted, and free-to-play titles. From a player’s perspective, it is useful to understand what the overall price landscape looks like and how often games are on sale. From a business perspective, discounts and free-to-play models are central to how games attract attention and players.
In this project, I scraped data from Steam’s public search results pages for PC games. For each game, I collected the title, release date, raw price text, the current numeric price, whether the game was discounted, and the page of the search results where it appeared. Using this dataset, I investigate the following question:
How are Steam game prices distributed, and how do discounts and free-to-play titles shape the price landscape?
To answer this, I summarize the distribution of current prices, compare discounted and non-discounted games, and quantify the size of discounts where they occur.
Data Collection
The data for this project was collected from the Steam Store search results for PC games.
base_url : https://store.steampowered.com/search/?category1=998&page=
I collected the data by writing a custom R script that sends an HTTP GET request to the Steam Store’s search results URL for PC games, rotating through pages 1–6 of the results. I used the rvest library to extract the relevant HTML elements and process them into structured data. Because Steam uses different text formats for free games, regular prices, and discounted prices, I used regular expressions to extract the current price and identify whether each game is on sale. The script includes a clear identifying user-agent string and a randomized 2–3 second delay between requests to ensure ethical scraping behavior.
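The price-parsing step described above can be sketched roughly as follows. This is a minimal illustration in base R, not the original scraping script: the helper name `parse_price` is hypothetical, and it assumes that when several dollar amounts appear in the text, the last one is the current price.

```r
# Hypothetical helper: parse one raw price string from a search-result row.
# Assumption: when several amounts appear, the last one is the current price.
parse_price <- function(price_raw) {
  # Pull every number such as "69.99" out of the text
  matches <- regmatches(price_raw, gregexpr("[0-9]+(\\.[0-9]+)?", price_raw))[[1]]
  nums <- as.numeric(matches)

  if (length(nums) == 0) {
    # No digits at all: treat labels such as "Free" as a price of 0
    is_free <- grepl("free", price_raw, ignore.case = TRUE)
    return(list(
      price_current = if (is_free) 0 else NA_real_,
      has_discount = FALSE
    ))
  }

  list(
    price_current = nums[length(nums)],
    has_discount = length(nums) > 1  # more than one price means a sale
  )
}

parse_price("Free")
parse_price("$39.99")
parse_price("$69.99 $59.49")
```

Keeping the parser as a small pure function makes it easy to test against the text formats Steam uses before running it over every scraped row.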
The dataset includes each game’s title, release date, raw price text, numeric current price, and whether the game is discounted. This data is suitable for my question because it directly reflects how Steam displays prices and discounts to users and captures the real-time mix of free, full-price, and discounted games. By scraping multiple pages of results instead of only the first page, I obtain a more representative sample across a wide range of price points and game types.
Importing Dataset
Data Overview
steam_raw %>%
  head(10)
# A tibble: 10 × 10
   title      release_date price_raw  price_current has_discount  page price_nums
   <chr>      <chr>        <chr>              <dbl> <lgl>        <dbl> <lgl>
 1 ARC Raide… Oct 30, 2025 $39.99             40.0  FALSE            1 NA
 2 Counter-S… Aug 21, 2012 Free                0    FALSE            1 NA
 3 Dispatch   Oct 22, 2025 $29.99             30.0  FALSE            1 NA
 4 Where Win… Nov 14, 2025 Free                0    FALSE            1 NA
 5 Battlefie… Oct 10, 2025 $69.99 $…          59.5  TRUE             1 NA
 6 Risk of R… Aug 11, 2020 $24.99 $…           8.24 TRUE             1 NA
 7 Warframe   Mar 25, 2013 Free                0    FALSE            1 NA
 8 Clair Obs… Apr 24, 2025 $49.99             50.0  FALSE            1 NA
 9 Umamusume… Jun 24, 2025 Free                0    FALSE            1 NA
10 Ghost of … May 16, 2024 $59.99 $…          36.0  TRUE             1 NA
# ℹ 3 more variables: original_price <dbl>, discount_amount <dbl>,
#   discount_percent <dbl>
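The scraped dataset contains 6 core columns (the file loaded above has 10 in total, because it also carries price_nums and the derived columns original_price, discount_amount, and discount_percent):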
title – game title
release_date – release date as a character string (e.g., “Oct 30, 2025”)
price_raw – raw text from the row, such as “Free” or “$69.99 $59.49”
price_current – numeric current price (0 for free games)
has_discount – TRUE if multiple numeric prices were detected
page – search results page number
I performed the following cleaning and transformation steps:
1. Filtered out any rows missing price_current.
2. Extracted the release year from release_date.
3. Created a categorical price band variable for easier comparison.
4. Calculated the original price, discount amount, and discount percentage for discounted games.
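The four steps above could look roughly like the following sketch. It runs on a small made-up tibble rather than the real data, and several details are assumptions rather than the original code: the price band cut-offs, the name `steam_clean`, and the rule that the first dollar amount in price_raw is the original price.

```r
library(dplyr)
library(stringr)

# Toy rows mimicking the scraped columns (illustrative values, not real data)
steam_toy <- tibble(
  title         = c("Game A", "Game B", "Game C", "Game D"),
  release_date  = c("Oct 30, 2025", "Aug 21, 2012", "Aug 11, 2020", "Coming soon"),
  price_raw     = c("$39.99", "Free", "$24.99 $8.24", NA),
  price_current = c(39.99, 0, 8.24, NA),
  has_discount  = c(FALSE, FALSE, TRUE, FALSE)
)

steam_clean <- steam_toy %>%
  # 1. Drop rows with no usable current price
  filter(!is.na(price_current)) %>%
  mutate(
    # 2. Release year: the first four-digit run in the date string
    release_year = as.integer(str_extract(release_date, "[0-9]{4}")),
    # 3. Price band (cut-offs are an assumption, chosen for illustration)
    price_band = case_when(
      price_current == 0 ~ "Free",
      price_current < 10 ~ "Under $10",
      price_current < 30 ~ "$10 to $29.99",
      TRUE               ~ "$30 and up"
    ),
    # 4. Original price: assumed to be the first amount in price_raw
    original_price = if_else(
      has_discount,
      as.numeric(str_extract(price_raw, "[0-9]+\\.[0-9]+")),
      price_current
    ),
    discount_amount  = original_price - price_current,
    # Guard against dividing by zero for free games
    discount_percent = if_else(
      original_price > 0,
      100 * discount_amount / original_price,
      0
    )
  )

steam_clean
```

Guarding the percentage against a zero original price keeps free games at a 0% discount instead of producing NaN.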
Summary Statistics
This summary gives us:
Minimum/maximum prices
Average price of Steam games
Range of discount percentages
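This helps us understand the spread of the pricing data. One caveat: the minimum discount percentage in the output is negative (-1566), which is not a valid discount. It means at least one row has a current price far above its recorded original price, most likely because of a price-parsing error, so such rows should be checked or excluded before interpreting the discount statistics.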
summary(select(steam_raw, price_current, original_price, discount_percent))
 price_current    original_price  discount_percent
 Min.   : 0.000   Min.   : 0.00   Min.   :-1566.000
 1st Qu.: 8.428   1st Qu.: 9.99   1st Qu.:    0.000
 Median :19.990   Median :39.99   Median :    8.759
 Mean   :23.834   Mean   :35.08   Mean   :   13.205
 3rd Qu.:34.990   3rd Qu.:59.99   3rd Qu.:   50.008
 Max.   :69.990   Max.   :99.99   Max.   :   95.032
Analysis and Visualization
Distribution of Game Prices
This histogram shows how most games fall into lower price ranges, with many being free or under $20. Only a small number of games are priced above $40.
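The counts behind a histogram like this can be checked numerically. The sketch below uses a handful of made-up prices, not the scraped data, and an assumed bin width of $10. It bins the prices with base R's `hist()` without drawing anything, then computes the share of games under $20 and over $40.

```r
# Illustrative prices only (not taken from the scraped dataset)
prices <- c(0, 0, 4.99, 9.99, 19.99, 39.99, 59.49)

# Bin into $10-wide buckets without drawing the plot
bins <- hist(prices, breaks = seq(0, 70, by = 10), plot = FALSE)

# One row per bucket: lower edge, upper edge, number of games
bin_table <- data.frame(
  from  = head(bins$breaks, -1),
  to    = tail(bins$breaks, -1),
  games = bins$counts
)
bin_table

# Shares that back up statements such as "most games are under $20"
share_under_20 <- mean(prices < 20)
share_over_40  <- mean(prices > 40)
```

Applied to the real data, `prices` would be replaced by `steam_raw$price_current`.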
Free vs Paid Games
Steam features a noticeable portion of free games. Paid games make up the majority, but free titles represent a significant segment of the marketplace.
Discounts vs No Discounts
A large number of games are discounted at any given time, showing that discounting is a major part of Steam’s pricing ecosystem.
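The free vs paid and discounted vs not-discounted splits reduce to simple proportions. As a small sketch, the vectors below are typed in from the ten rows printed in the Data Overview, so the resulting shares describe only those ten rows, not the full dataset. The variable names are illustrative.

```r
# Flags for the ten rows shown in the Data Overview (first results page only)
is_free      <- c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
has_discount <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

# The mean of a logical vector is the proportion of TRUE values
free_share     <- mean(is_free)
discount_share <- mean(has_discount)

# Discount share among paid games only, since free games cannot be discounted
paid_discount_share <- mean(has_discount[!is_free])

c(free = free_share, discounted = discount_share, discounted_paid = paid_discount_share)
```

On the full dataset the same calculation would use `steam_raw$price_current == 0` and `steam_raw$has_discount`.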
Free-to-Play vs Paid
Even with free titles, games are inexpensive; most fall under $20, showing Steam’s affordable catalog.
Price vs Discount Percentage
Higher-priced games tend to show more variable discount percentages. There is no clear linear relationship between price and discount size, suggesting that discounting practices differ between publishers.
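One way to put a number on "no clear linear relationship" is a rank correlation between original price and discount percentage among discounted games. The sketch below uses made-up values, not the scraped data. Spearman's coefficient is an assumed choice here, since it does not require the relationship to be linear.

```r
# Illustrative values only (not taken from the scraped dataset)
original_price   <- c(9.99, 19.99, 24.99, 39.99, 59.99, 69.99)
discount_percent <- c(50, 20, 67, 10, 40, 15)

# Keep only rows with a valid positive discount; negative values would be parsing errors
valid <- discount_percent > 0 & discount_percent <= 100

# Spearman rank correlation: values near 0 indicate no monotonic trend
rho <- cor(original_price[valid], discount_percent[valid], method = "spearman")
rho
```

A coefficient close to 0 would support the reading above, while a clearly positive value would indicate that pricier games tend to get deeper discounts.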
Conclusion
The visualizations show that Steam game prices are heavily skewed toward the low end, with many titles priced under $20 and a significant cluster of free-to-play games. Once free games are removed, most paid titles still fall below $30, indicating that cheaper games dominate the marketplace.
Discounts also play a major role: discounted and non-discounted games appear in almost equal numbers, and discounts occur across all price levels. Higher-priced games show a wider spread of discount sizes, including some deep cuts, which can make premium titles more affordable even though price alone does not predict discount depth.
Overall, free-to-play games anchor the bottom of the price distribution, while widespread discounts lower and reshape the entire price landscape, creating a market where most games are inexpensive or temporarily made inexpensive.