Introduction

This study applies association rule mining to analyze the factors influencing game pricing. The primary objective is to identify key determinants of game prices and explore potential relationships between game characteristics and pricing strategies.

Here are some pre-analysis expectations:

Indie games and older games are priced lower than their counterparts.
Games classified as free-to-play are likely to contain in-app purchases, since they still have to be financed.
Titles with minimum hardware and software requirements should have higher prices, reflecting increased development complexity and quality.

This study uses association rules to reveal relevant insights into price patterns in the gaming business.

Dataset

The games-features dataset was published by https://data.world/craigkelly. The data was modified for this study, and only the variables relevant to finding associations between game characteristics and price were kept. The key retained variables and their justifications are outlined below:

ReleaseDate: Converted into a categorical variable (Timeframe) representing the game’s age.
IsFree: A binary variable used to define price categories.
PriceInitial: The game’s initial price, serving as the baseline for the PriceGroup classification.
Categories: Indicates whether a game is single-player, multiplayer, coop, MMO, etc., serving as potential pricing determinants.
InAppPurchase: A binary indicator of in-app purchases, often associated with free-to-play monetization models.
GenreIsIndie: A binary variable used to evaluate whether indie games tend to be priced lower.
GenreIsNonGame & PriceCurrency: Retained for data integrity, ensuring that non-game items and games without valid pricing data are excluded.

Data Preparation

The following section outlines the preprocessing steps applied to the dataset. Since most operations are straightforward, detailed explanations are provided only for the binning of the key variables. Release dates range from 1997 to 2017, but most observations are concentrated in the most recent years, so the bins were chosen to keep the Recent and New groups roughly comparable in size; the Old group is of less interest. Note that the Old bin starts at 1998, so the single title released in 1997 receives no Timeframe value. For the PriceGroup variable, Average covers prices between the first and third quartiles, i.e. the middle 50% of the observations. Cheap lies at or below the first quartile and Expensive above the third.
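As a rough illustration of the binning described above, the sketch below applies the same cut points to a handful of made-up values using base R's cut(). The vectors years and prices are hypothetical and not taken from the dataset, and the lower bound of the Old bin is left open here, which is an assumption made for illustration rather than the rule used in the pipeline.

```r
# Hypothetical toy values, NOT taken from games-features.csv
years  <- c(1997, 2005, 2010, 2011, 2015, 2016, 2017)
prices <- c(0.99, 4.99, 5.00, 8.99, 12.99, 13.00, 19.99)

# cut() uses right-closed intervals by default:
# (-Inf, 2010] -> Old, (2010, 2015] -> Recent, (2015, Inf) -> New
# Assumption: the Old bin has no lower bound in this sketch.
timeframe <- cut(years,
                 breaks = c(-Inf, 2010, 2015, Inf),
                 labels = c("Old", "Recent", "New"))

# (0, 5] -> Cheap, (5, 13] -> Average, (13, Inf) -> Expensive
# The cut points 5 and 13 approximate the quartiles 4.99 and 12.99.
price_group <- cut(prices,
                   breaks = c(0, 5, 13, Inf),
                   labels = c("Cheap", "Average", "Expensive"))

print(data.frame(years, timeframe, prices, price_group))
```

With an open lower bound, a title from 1997 would fall into the Old group instead of being left without a Timeframe.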

library(dplyr)
library(arules)
library(kableExtra)
data <- read.csv("games-features.csv")
data[data == ""] <- NA
data[data == " "] <- NA
data<-na.omit(data)
data <- data[data$GenreIsNonGame == FALSE, ]
str(data)
## 'data.frame':    10476 obs. of  13 variables:
##  $ ReleaseDate          : chr  "Nov 1 2000" "Apr 1 1999" "May 1 2003" "Jun 1 2001" ...
##  $ IsFree               : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ PCReqsHaveRec        : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ CategorySinglePlayer : logi  FALSE FALSE FALSE FALSE TRUE FALSE ...
##  $ CategoryMultiplayer  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ CategoryCoop         : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ CategoryMMO          : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ CategoryInAppPurchase: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ CategoryVRSupport    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ GenreIsNonGame       : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ GenreIsIndie         : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ PriceCurrency        : chr  "USD" "USD" "USD" "USD" ...
##  $ PriceInitial         : num  9.99 4.99 4.99 4.99 4.99 4.99 9.99 9.99 9.99 4.99 ...
##  - attr(*, "na.action")= 'omit' Named int [1:2636] 16 21 24 26 47 67 75 123 163 164 ...
##   ..- attr(*, "names")= chr [1:2636] "16" "21" "24" "26" ...
data$ReleaseDate<-as.Date(data$ReleaseDate,"%b %d %Y")
unique(format(data$ReleaseDate, "%Y"))
##  [1] "2000" "1999" "2003" "2001" "1998" "2004" "2010" "2006" "2007" "2008"
## [11] "2009" "2011" "2012" "2005" "2013" NA     "2014" "2015" "2016" "1997"
## [21] "2017"
data<-na.omit(data)
summary(data$ReleaseDate)
##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## "1997-06-30" "2014-05-01" "2015-08-05" "2014-10-19" "2016-05-20" "2017-01-27"
data <- data %>%
  mutate(Year = as.numeric(format(ReleaseDate, "%Y")),
         Timeframe = case_when(
           Year >= 1998 & Year <= 2010 ~ "Old",
           Year >= 2011 & Year <= 2015 ~ "Recent",
           Year >= 2016 ~ "New"
         )) %>%
  mutate(Timeframe = factor(Timeframe, levels = c("Old", "Recent", "New")))
table(data$Timeframe)
## 
##    Old Recent    New 
##   1053   5387   4002
summary(data$PriceInitial)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.50    4.99    8.99   10.50   12.99  234.99
data <- data %>%
  mutate(PriceGroup = case_when(
           IsFree == TRUE ~ "Free",
           PriceInitial >= 0.1 & PriceInitial <= 5 ~ "Cheap",
           PriceInitial > 5 & PriceInitial <= 13 ~ "Average",
           PriceInitial > 13 ~ "Expensive"
         )) %>%
  mutate(PriceGroup = factor(PriceGroup, levels = c("Free","Cheap", "Average", "Expensive")))
table(data$PriceGroup)
## 
##      Free     Cheap   Average Expensive 
##        40      3822      4044      2537
# Recode the binary variables from TRUE/FALSE to Name/NA so that the rules
# show only the items that are present, giving a cleaner representation
binary_cols <- c("CategorySinglePlayer", "CategoryMultiplayer", "CategoryCoop", "CategoryMMO", "CategoryInAppPurchase", "CategoryVRSupport")
for (col in binary_cols) {
  data[[col]] <- ifelse(data[[col]] == TRUE, substr(col,9,25), NA)
}
data[["GenreIsIndie"]] <-ifelse(data[["GenreIsIndie"]] == TRUE, "Indie", NA)
data[["PCReqsHaveRec"]] <-ifelse(data[["PCReqsHaveRec"]] == TRUE, "Rec", NA)

for (col in colnames(data)) {
  data[[col]] <- as.factor(data[[col]])
}
str(data)
## 'data.frame':    10443 obs. of  16 variables:
##  $ ReleaseDate          : Factor w/ 1806 levels "1997-06-30","1998-11-08",..: 5 3 9 7 4 5 2 12 12 7 ...
##  $ IsFree               : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
##  $ PCReqsHaveRec        : Factor w/ 1 level "Rec": NA NA NA NA NA NA NA NA NA NA ...
##  $ CategorySinglePlayer : Factor w/ 1 level "SinglePlayer": NA NA NA NA 1 NA 1 1 1 1 ...
##  $ CategoryMultiplayer  : Factor w/ 1 level "Multiplayer": 1 1 1 1 1 1 1 1 1 NA ...
##  $ CategoryCoop         : Factor w/ 1 level "Coop": NA NA NA NA NA NA NA NA NA NA ...
##  $ CategoryMMO          : Factor w/ 1 level "MMO": NA NA NA NA NA NA NA NA NA NA ...
##  $ CategoryInAppPurchase: Factor w/ 1 level "InAppPurchase": NA NA NA NA NA NA NA NA NA NA ...
##  $ CategoryVRSupport    : Factor w/ 1 level "VRSupport": NA NA NA NA NA NA NA NA NA NA ...
##  $ GenreIsNonGame       : Factor w/ 1 level "FALSE": 1 1 1 1 1 1 1 1 1 1 ...
##  $ GenreIsIndie         : Factor w/ 1 level "Indie": NA NA NA NA NA NA NA NA NA NA ...
##  $ PriceCurrency        : Factor w/ 1 level "USD": 1 1 1 1 1 1 1 1 1 1 ...
##  $ PriceInitial         : Factor w/ 73 levels "0.5","0.9","0.99",..: 32 20 20 20 20 20 32 32 32 20 ...
##  $ Year                 : Factor w/ 20 levels "1997","1998",..: 4 3 6 5 3 4 2 7 7 5 ...
##  $ Timeframe            : Factor w/ 3 levels "Old","Recent",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ PriceGroup           : Factor w/ 4 levels "Free","Cheap",..: 3 2 2 2 2 2 3 3 3 2 ...
##  - attr(*, "na.action")= 'omit' Named int [1:33] 278 556 622 632 687 843 959 1112 1156 1181 ...
##   ..- attr(*, "names")= chr [1:33] "306" "600" "669" "688" ...
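# Drop columns that should not become items:
# ReleaseDate, IsFree, GenreIsNonGame, PriceCurrency, PriceInitial, Year
data <- data[, -c(1,2,10,12:14)]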

data_trans <- as(data, "transactions")
rules <- apriori(data_trans, parameter = list(supp = 0.02, conf = 0.45),minlen=2)
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.45    0.1    1 none FALSE            TRUE       5    0.02      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 208 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[15 item(s), 10443 transaction(s)] done [0.00s].
## sorting and recoding items ... [11 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [401 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules_supp <- sort(rules, by = "support", decreasing = TRUE)
rules_supp_dt <- inspect(head(rules_supp), linebreak = FALSE)
##     lhs                                    rhs                                
## [1] {GenreIsIndie=Indie}                => {CategorySinglePlayer=SinglePlayer}
## [2] {CategorySinglePlayer=SinglePlayer} => {GenreIsIndie=Indie}               
## [3] {Timeframe=Recent}                  => {CategorySinglePlayer=SinglePlayer}
## [4] {CategorySinglePlayer=SinglePlayer} => {Timeframe=Recent}                 
## [5] {PCReqsHaveRec=Rec}                 => {CategorySinglePlayer=SinglePlayer}
## [6] {PriceGroup=Average}                => {CategorySinglePlayer=SinglePlayer}
##     support   confidence coverage  lift     count
## [1] 0.5902518 0.9637273  0.6124677 1.019573 6164 
## [2] 0.5902518 0.6244555  0.9452265 1.019573 6164 
## [3] 0.5000479 0.9693707  0.5158479 1.025543 5222 
## [4] 0.5000479 0.5290244  0.9452265 1.025543 5222 
## [5] 0.4227712 0.9554209  0.4424974 1.010785 4415 
## [6] 0.3738389 0.9653808  0.3872450 1.021322 3904
kable(rules_supp_dt, "html") %>% kable_styling("striped")

Association Rule Mining

Gaming market overview

Looking first at the apriori rules for the whole dataset, with a minimum support of 2% and a minimum confidence of 45%, we can see that single-player indie games constitute about 59% of the whole market. This is within expectations: indie games are often smaller-scale projects produced by individuals or small teams, and they are released far more often than so-called triple-A games with long development times. It also seems that the Recent and New timeframes are dominated by single-player games, which often list system requirements.

These rules should not be interpreted blindly, however. Their lift is close to 1 (between 1.01 and 1.03), so the high confidence mostly reflects the fact that about 95% of all games in the data are single-player. Moreover, although far more single-player games are available, the dataset contains no information about player counts, which are likely to be much smaller for these titles than for multiplayer games (Marvel Rivals or Counter Strike 2) or MMOs (World of Warcraft). Unfortunately, because indie and single-player games are so abundant in the data, most of the rules with very high confidence and support concern just these two variables. For further analysis the rhs is therefore restricted to PriceGroup, as the object of interest is pricing.
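To make the metrics in the rule listing concrete, the following sketch recomputes support, confidence and lift for the rule {GenreIsIndie=Indie} => {CategorySinglePlayer=SinglePlayer} by hand. All inputs are copied from the output above; the variable names are chosen here for illustration only.

```r
# Figures read from the rule listing above
n_total  <- 10443       # number of transactions
n_both   <- 6164        # count: indie AND single-player
cov_lhs  <- 0.6124677   # coverage of rule [1]: share of indie games
supp_rhs <- 0.9452265   # coverage of rule [2]: share of single-player games

support    <- n_both / n_total       # P(indie and single-player)
confidence <- support / cov_lhs      # P(single-player | indie)
lift       <- confidence / supp_rhs  # confidence relative to the base rate

print(round(c(support = support, confidence = confidence, lift = lift), 4))
```

A lift this close to 1 suggests that knowing a game is indie changes the chance of it being single-player very little; the rule ranks highly mainly because both items are very common.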

table(data$CategoryInAppPurchase)
## 
## InAppPurchase 
##            29
rules <- apriori(data_trans, parameter = list(supp = 0.02, conf = 0.45),minlen=2,appearance = list(default = "lhs", rhs = c("PriceGroup=Free","PriceGroup=Cheap", "PriceGroup=Average", "PriceGroup=Expensive")))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.45    0.1    1 none FALSE            TRUE       5    0.02      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 208 
## 
## set item appearances ...[4 item(s)] done [0.00s].
## set transactions ...[15 item(s), 10443 transaction(s)] done [0.00s].
## sorting and recoding items ... [11 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [14 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules_supp <- sort(rules, by = "support", decreasing = TRUE)
rules_supp_dt <- inspect(rules_supp, linebreak = FALSE)
##      lhs                                                                                                       
## [1]  {GenreIsIndie=Indie, Timeframe=New}                                                                       
## [2]  {CategorySinglePlayer=SinglePlayer, GenreIsIndie=Indie, Timeframe=New}                                    
## [3]  {Timeframe=Old}                                                                                           
## [4]  {CategorySinglePlayer=SinglePlayer, Timeframe=Old}                                                        
## [5]  {PCReqsHaveRec=Rec, CategoryMultiplayer=Multiplayer, Timeframe=Recent}                                    
## [6]  {PCReqsHaveRec=Rec, CategorySinglePlayer=SinglePlayer, CategoryMultiplayer=Multiplayer, Timeframe=Recent} 
## [7]  {PCReqsHaveRec=Rec, CategoryCoop=Coop}                                                                    
## [8]  {CategoryCoop=Coop, Timeframe=Recent}                                                                     
## [9]  {PCReqsHaveRec=Rec, CategoryMultiplayer=Multiplayer, CategoryCoop=Coop}                                   
## [10] {PCReqsHaveRec=Rec, CategorySinglePlayer=SinglePlayer, CategoryCoop=Coop}                                 
## [11] {CategorySinglePlayer=SinglePlayer, CategoryCoop=Coop, Timeframe=Recent}                                  
## [12] {CategoryMultiplayer=Multiplayer, CategoryCoop=Coop, Timeframe=Recent}                                    
## [13] {PCReqsHaveRec=Rec, CategorySinglePlayer=SinglePlayer, CategoryMultiplayer=Multiplayer, CategoryCoop=Coop}
## [14] {CategorySinglePlayer=SinglePlayer, CategoryMultiplayer=Multiplayer, CategoryCoop=Coop, Timeframe=Recent} 
##         rhs                    support    confidence coverage   lift     count
## [1]  => {PriceGroup=Cheap}     0.12534712 0.4700180  0.26668582 1.284248 1309 
## [2]  => {PriceGroup=Cheap}     0.12237863 0.4793698  0.25529063 1.309801 1278 
## [3]  => {PriceGroup=Average}   0.04912381 0.4871795  0.10083309 1.258065  513 
## [4]  => {PriceGroup=Average}   0.04835775 0.4941292  0.09786460 1.276012  505 
## [5]  => {PriceGroup=Expensive} 0.03906923 0.4689655  0.08330939 1.930393  408 
## [6]  => {PriceGroup=Expensive} 0.03514316 0.4699104  0.07478694 1.934282  367 
## [7]  => {PriceGroup=Expensive} 0.02891889 0.5067114  0.05707172 2.085766  302 
## [8]  => {PriceGroup=Expensive} 0.02642919 0.4833625  0.05467777 1.989655  276 
## [9]  => {PriceGroup=Expensive} 0.02604616 0.5084112  0.05123049 2.092762  272 
## [10] => {PriceGroup=Expensive} 0.02585464 0.4954128  0.05218807 2.039258  270 
## [11] => {PriceGroup=Expensive} 0.02470554 0.4751381  0.05199655 1.955801  258 
## [12] => {PriceGroup=Expensive} 0.02346069 0.4803922  0.04883654 1.977428  245 
## [13] => {PriceGroup=Expensive} 0.02298190 0.4938272  0.04653835 2.032730  240 
## [14] => {PriceGroup=Expensive} 0.02183281 0.4701031  0.04644259 1.935076  228
kable(rules_supp_dt, "html") %>% kable_styling("striped")

Game Pricing

The main research question of this paper is “What makes up a game’s price?”. Restricting the consequent to PriceGroup yields 14 rules, none of which concern the Free group. (I hoped to see a relation with in-app purchases, but only 29 games in the sample have them, so any such rule is drowned out at the 2% support level, which corresponds to a minimum count of 208 games. I also tried lowering the support, but it yielded no results.)

From the rules we can see that New indie games have a 47% chance of being cheap. This may be due to the market saturation mentioned earlier, and it is in line with the general expectation that indie games are not too expensive. An interesting finding is rule 3: old games have a 49% chance of falling into the Average price group. Perhaps an old game that is still on the market has to maintain fairly stable demand, or general price increases in the gaming sector have moved once-expensive games into the average range.

From the remaining rules we can conclude that games in the coop category tend to be on the expensive side (confidence of 47% to 51%, lift of around 2). This may be because traditional coop games are a rarity on the market, which allows developers to charge more. The expectation that games with minimum hardware and software requirements are more expensive seems to be confirmed, although not directly: PCReqsHaveRec shows up in 6 different rules, always in combination with a category or timeframe. This is perfectly within reason, as there are very few games that have requirements but no categories.
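The absence of Free and in-app-purchase rules can be checked with a little arithmetic. The sketch below uses the counts reported in the tables above (40 free games, 29 games with in-app purchases, 10443 transactions); the variable names are illustrative.

```r
# Counts taken from the tables above
n_total  <- 10443
min_supp <- 0.02   # minimum support used in apriori()

counts <- c(Free = 40, InAppPurchase = 29)

# An item's support is the largest support any rule containing it can reach
max_supp <- counts / n_total
reachable <- max_supp >= min_supp

print(round(max_supp, 4))
print(reachable)

# A rule linking the two items can cover at most min(counts) games
max_joint_supp <- min(counts) / n_total
print(round(max_joint_supp, 4))
```

Under these counts, a rule linking in-app purchases to the Free group could only surface with a minimum support below roughly 0.28%, and it would still have to pass the confidence threshold.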

Conclusion

This study applied association rule mining to identify key determinants of game pricing. The findings support several expected trends, namely the affordability of indie games and the higher prices of coop games and games with listed requirements, and contradict another: older games, surprisingly, fall mostly into the Average price group instead of the Cheap one. The expected link between free-to-play games and in-app purchases could not be tested because too few such games are in the sample. Future research could explore alternative pricing models (e.g., in-app purchases, DLCs), integrate player engagement metrics, and expand the sample to include data newer than 2017.