INTRODUCTION

In this project, I use the Games dataset to find strong relationships between the game genres it contains, using the Apriori algorithm and association rules. The goal is to gain useful insights into the purchasing patterns of customers in the gaming industry. By understanding these patterns, we can make better decisions about which games to stock, how to market them, and how to bundle them together to increase sales.

Dataset source: https://www.kaggle.com/datasets/khaiid/most-selling-pc-games

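First, we load the required packages. If they are not installed yet, install them once with install.packages(c("arules", "arulesViz")).

# load the association-rule mining and visualization packages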
library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)

Now we read the Games dataset with the read.csv() function and assign it to the df variable.

getwd()
## [1] "C:/Users/Maryam/Downloads"
df <- read.csv("Games.csv", sep = ",")
str(df)
## 'data.frame':    175 obs. of  7 variables:
##  $ Name     : chr  "PlayerUnknown's Battlegrounds" "Minecraft" "Diablo III" "Garry's Mod" ...
##  $ Sales    : num  42 33 20 20 17.2 14 12 12 11 11 ...
##  $ Series   : chr  "" "Minecraft" "Diablo" "" ...
##  $ Release  : chr  "Dec-17" "Nov-11" "May-12" "Nov-06" ...
##  $ Genre    : chr  "Battle royale" "Sandbox, survival" "Action role-playing" "Sandbox" ...
##  $ Developer: chr  "PUBG Studios" "Mojang Studios" "Blizzard Entertainment" "Facepunch Studios" ...
##  $ Publisher: chr  "Krafton" "Mojang Studios" "Blizzard Entertainment" "Valve" ...

Then we keep only the “Genre” and “Sales” columns of “df” and remove duplicate Genre/Sales rows, which leaves 114 of the 175 rows. Below is the structure of the new dataset.

data <- df[, c("Genre", "Sales")]
data <- data[!duplicated(data),]
str(data) 
## 'data.frame':    114 obs. of  2 variables:
##  $ Genre: chr  "Battle royale" "Sandbox, survival" "Action role-playing" "Sandbox" ...
##  $ Sales: num  42 33 20 20 17.2 14 12 12 11 11 ...

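In this section, I use the “split()” function to group the “Genre” column by the “Sales” column. Every distinct Sales value becomes one transaction, and its items are the genres of all games with that Sales value. As a result, two genres appear in the same transaction when their games happen to share the same sales figure. I then convert the resulting list to a “transactions” object with the “as()” function.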
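The grouping that split() performs can be illustrated on a small example. This is a base-R sketch using a hypothetical toy data frame (`toy`, `trans_list`) with the same two columns as `data`. The values are not taken from Games.csv.

```r
# Hypothetical toy data with the same columns as `data` (not from Games.csv)
toy <- data.frame(
  Genre = c("Sandbox", "Action role-playing", "Real-time strategy",
            "Construction and management simulation", "MMORPG"),
  Sales = c(2, 2, 1.1, 1.1, 1),
  stringsAsFactors = FALSE
)

# split() converts Sales to a factor, so each distinct Sales value becomes one
# list element: a "transaction" holding the genres of the games with that value.
# The element names (the sorted Sales values) later serve as transaction IDs.
trans_list <- split(toy$Genre, toy$Sales)
str(trans_list)
```

With arules loaded, as(trans_list, "transactions") turns such a list into the same kind of object used here. Multi-genre strings such as "Sandbox, survival" remain a single item unless they are first split on the commas.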

transaction <- as(split(data[,"Genre"], data[, "Sales"]),
                  "transactions")

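The summary below shows that the data is stored in a sparse format with a density of 0.06229508. The density is the share of filled cells in the 30-by-61 transaction-item matrix. There are 30 transactions (one per distinct Sales value) and 61 items (distinct Genre strings). The most frequent item is “Action role-playing”, with a count of 9.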

summary(transaction)
## transactions as itemMatrix in sparse format with
##  30 rows (elements/itemsets/transactions) and
##  61 columns (items) and a density of 0.06229508 
## 
## most frequent items:
##                    Action role-playing Construction and management simulation 
##                                      9                                      7 
##                     Real-time strategy                       Action-adventure 
##                                      6                                      5 
##                   First-person shooter                                (Other) 
##                                      5                                     82 
## 
## element (itemset/transaction) length distribution:
## sizes
##  1  2  3  4  6  7 16 38 
## 13  8  2  3  1  1  1  1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     1.0     2.0     3.8     3.0    38.0 
## 
## includes extended item information - examples:
##                         labels
## 1                       Action
## 2             Action-adventure
## 3 Action-adventure, platformer
## 
## includes extended transaction information - examples:
##   transactionID
## 1             1
## 2           1.1
## 3           1.2
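The density and the mean transaction length in this summary can be checked by hand from the size distribution. The following sketch recomputes both using only numbers copied from the output above.

```r
# Numbers copied from the summary() output above
n_transactions <- 30
n_items <- 61
# "sizes" table: transaction length -> number of transactions with that length
sizes <- c(1, 2, 3, 4, 6, 7, 16, 38)
freq  <- c(13, 8, 2, 3, 1, 1, 1, 1)

# Total number of filled cells (one per Genre/Sales row left after deduplication)
total_items <- sum(sizes * freq)
# Density: filled cells divided by all cells of the transaction-item matrix
density <- total_items / (n_transactions * n_items)
# Mean transaction length
mean_len <- total_items / n_transactions

c(total_items = total_items, density = density, mean_len = mean_len)
```

The 114 filled cells match the 114 rows of `data`, because each Genre/Sales row contributes exactly one item to one transaction.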

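In this chunk, we use the inspect() function to list the first 10 transactions and their items. The transaction IDs are the Sales values that split() used as group names. For example, transaction 1.1 contains the genres of all games with a Sales value of 1.1.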

inspect(transaction[1:10])
##      items                                         transactionID
## [1]  {Action,                                                   
##       Action-adventure,                                         
##       Action-adventure, platformer,                             
##       Action-adventure, Survival,                               
##       Action role-playing,                                      
##       Adventure,                                                
##       Amateur flight simulation,                                
##       Beat 'em up, run-and-gun,                                 
##       Business simulation,                                      
##       City-building game,                                       
##       Construction and management simulation,                   
##       First-person shooter,                                     
##       Grand strategy,                                           
##       Graphic adventure,                                        
##       Graphic adventure, puzzle,                                
##       Interactive fiction,                                      
##       Interactive movie,                                        
##       Metroidvania,                                             
##       MMORPG,                                                   
##       Platform,                                                 
##       Point-and-click,                                          
##       Puzzle,                                                   
##       Rail shooter,                                             
##       Real-time strategy,                                       
##       Real-time strategy, grand strategy wargame,               
##       Real-time tactics,                                        
##       Role-playing game,                                        
##       Run and gun,                                              
##       Sim racing,                                               
##       Simulation, role-playing game,                            
##       Space combat simulation,                                  
##       Sports,                                                   
##       Survival,                                                 
##       Tactical shooter,                                         
##       Third-person shooter, survival horror,                    
##       Trivia game,                                              
##       Turn-based strategy, 4X,                                  
##       Visual novel, adventure}                               1  
## [2]  {Action role-playing game,                                 
##       Maze, arcade}                                          1.1
## [3]  {Action role-playing, hack and slash,                      
##       Third-person shooter, survival horror}                 1.2
## [4]  {Construction and management simulation,                   
##       Fighting,                                                 
##       Third-person shooter, survival horror}                 1.3
## [5]  {Compilation,                                              
##       Fighting,                                                 
##       RTS, 4X, Grand Strategy,                                  
##       Vehicle simulation}                                    1.5
## [6]  {Action-adventure, stealth}                             1.8
## [7]  {Action-adventure,                                         
##       Action-adventure, roguelike,                              
##       Action role-playing,                                      
##       City-building,                                            
##       Computer role-playing game,                               
##       Construction and management simulation,                   
##       Fighting,                                                 
##       First-person shooter,                                     
##       God game,                                                 
##       Racing game,                                              
##       Real-time strategy,                                       
##       Role-playing game,                                        
##       Simulation,                                               
##       Survival horror,                                          
##       Third-person shooter,                                     
##       Turn-based strategy, 4X}                               2  
## [8]  {Role-playing game}                                     2.1
## [9]  {Adventure, puzzle,                                        
##       City-building game,                                       
##       Construction and management simulation,                   
##       Turn-based strategy}                                   2.5
## [10] {Action role-playing}                                   2.7

Next, I plot the 15 most frequent items in the transactions. Action role-playing, Construction and management simulation, and Real-time strategy are the most frequent items.

itemFrequencyPlot(transaction, topN = 15, type="absolute", main="Item Frequency", col ="#733C3C") 

The same plot with relative frequencies (the share of transactions that contain each item):

itemFrequencyPlot(transaction, topN = 15, type="relative", main="Item Frequency", col = "#8FBDD3") 
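The two plots differ only in scale. The absolute frequency is the number of transactions that contain an item, and the relative frequency divides that number by the number of transactions; in this data, Action role-playing has 9 / 30 = 0.3. The following base-R sketch computes both on hypothetical toy transactions (`toy_trans`). In arules, itemFrequency(transaction, type = "relative") returns the relative values directly.

```r
# Hypothetical toy transactions (genre lists), not the Games data
toy_trans <- list(
  c("Action role-playing", "Real-time strategy"),
  c("Action role-playing"),
  c("Real-time strategy", "Construction and management simulation"),
  c("Action role-playing", "Construction and management simulation")
)

# Absolute frequency: number of transactions containing each item
# (valid because no item appears twice within one transaction)
abs_freq <- sort(table(unlist(toy_trans)), decreasing = TRUE)
# Relative frequency: share of transactions containing each item
rel_freq <- abs_freq / length(toy_trans)

abs_freq
rel_freq
```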

We can also display the sparse matrix itself as an image. This plot shows its first 5 rows (transactions).

image(transaction[1:5])

Next, we plot a random sample of 10 transactions from the sparse matrix.

image(sample(transaction,10))

APRIORI ALGORITHM

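Yeah! It’s the right time for association rule mining!

- Support is set to 0.025, so an itemset must appear in at least 2.5% of the transactions to be considered frequent. With only 30 transactions this equals 0.75 of a transaction, so any itemset that occurs even once passes the threshold. The log below reports an absolute minimum support count of 0, and this is why Apriori returns 2,862,434 rules.
- Confidence is set to 0.25, so a rule must have a confidence of at least 25% to be kept.
- minlen is set to 2, so only rules with at least 2 items (left-hand side plus right-hand side) are considered.
- maxlen is set to 5, so rules can contain up to 5 items. The warning in the log means that mining stopped at this length.
- maxtime = 120 limits the time spent checking subsets to 120 seconds.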

rules <- apriori(transaction, parameter = list(support = 0.025, confidence = 0.25, minlen = 2, 
                                               maxtime = 120, maxlen = 5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.25    0.1    1 none FALSE            TRUE     120   0.025      2
##  maxlen target  ext
##       5  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 0 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[61 item(s), 30 transaction(s)] done [0.00s].
## sorting and recoding items ... [61 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5
## Warning in apriori(transaction, parameter = list(support = 0.025, confidence =
## 0.25, : Mining stopped (maxlen reached). Only patterns up to a length of 5
## returned!
##  done [0.04s].
## writing ... [2862434 rule(s)] done [0.34s].
## creating S4 object  ... done [1.42s].
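The level-wise search that the log reports ("checking subsets of size 1 2 3 4 5") can be sketched in base R. This is a simplified, hypothetical implementation (`apriori_sketch`, `support_of`) that only mines frequent itemsets on toy data. RTS and CMS abbreviate Real-time strategy and Construction and management simulation. The arules package instead uses an optimized C implementation and also derives rules from the frequent itemsets.

```r
# Hypothetical toy transactions (RTS = Real-time strategy,
# CMS = Construction and management simulation); not the Games data
toy_trans <- list(
  c("RTS", "CMS", "MMORPG"),
  c("RTS", "CMS"),
  c("MMORPG", "Survival"),
  c("RTS", "MMORPG", "Survival"),
  c("CMS")
)

# Support = share of transactions that contain every item of the itemset
support_of <- function(itemset, trans) {
  mean(vapply(trans, function(t) all(itemset %in% t), logical(1)))
}

apriori_sketch <- function(trans, min_support, maxlen = 5) {
  key <- function(s) paste(s, collapse = "|")
  # Level 1: frequent single items
  level <- as.list(sort(unique(unlist(trans))))
  level <- Filter(function(s) support_of(s, trans) >= min_support, level)
  frequent <- level
  k <- 1
  while (length(level) > 1 && k < maxlen) {
    # Join step: union pairs of frequent k-itemsets into (k+1)-item candidates
    cand <- list()
    for (a in seq_len(length(level) - 1)) {
      for (b in (a + 1):length(level)) {
        u <- sort(union(level[[a]], level[[b]]))
        if (length(u) == k + 1) cand[[length(cand) + 1]] <- u
      }
    }
    cand <- unique(cand)
    # Prune step: every k-item subset of a candidate must itself be frequent
    level_keys <- vapply(level, key, character(1))
    cand <- Filter(function(s) {
      subs <- combn(s, k, simplify = FALSE)
      all(vapply(subs, key, character(1)) %in% level_keys)
    }, cand)
    # Count step: keep only candidates that reach the support threshold
    level <- Filter(function(s) support_of(s, trans) >= min_support, cand)
    frequent <- c(frequent, level)
    k <- k + 1
  }
  frequent
}

res <- apriori_sketch(toy_trans, min_support = 0.4)
found <- vapply(res, paste, character(1), collapse = ", ")
found
```

With min_support = 0.4 (2 of the 5 toy transactions), the 4 single genres and 3 pairs are frequent. Both 3-item candidates are pruned because each contains an infrequent pair.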

In this code chunk, we use the inspect() function to display the first 10 association rules in the order in which the Apriori algorithm returned them.

inspect(rules[1:10])
##      lhs                                        rhs                                         support confidence   coverage      lift count
## [1]  {Sandbox}                               => {Action role-playing}                    0.03333333  1.0000000 0.03333333  3.333333     1
## [2]  {Action role-playing game}              => {Maze, arcade}                           0.03333333  1.0000000 0.03333333 30.000000     1
## [3]  {Maze, arcade}                          => {Action role-playing game}               0.03333333  1.0000000 0.03333333 30.000000     1
## [4]  {Action role-playing, hack and slash}   => {Third-person shooter, survival horror}  0.03333333  1.0000000 0.03333333 10.000000     1
## [5]  {Third-person shooter, survival horror} => {Action role-playing, hack and slash}    0.03333333  0.3333333 0.10000000 10.000000     1
## [6]  {Battle royale}                         => {Construction and management simulation} 0.03333333  0.5000000 0.06666667  2.142857     1
## [7]  {Turn-based strategy}                   => {Adventure, puzzle}                      0.03333333  1.0000000 0.03333333 30.000000     1
## [8]  {Adventure, puzzle}                     => {Turn-based strategy}                    0.03333333  1.0000000 0.03333333 30.000000     1
## [9]  {Turn-based strategy}                   => {City-building game}                     0.03333333  1.0000000 0.03333333 15.000000     1
## [10] {City-building game}                    => {Turn-based strategy}                    0.03333333  0.5000000 0.06666667 15.000000     1

Next, we sort the rules with sort(), which orders them by support by default, and look at the top five. Rules [1] and [2] link Real-time strategy and Construction and management simulation. The two genres occur together in 4 of the 30 transactions (support 0.133), and 67% of the transactions that contain Real-time strategy also contain Construction and management simulation. Rules [3] and [4] show that MMORPG and Survival occur together in 3 transactions, with a confidence of 0.6 in both directions. Rule [5] shows that 60% of the transactions containing MMORPG also contain Real-time strategy.

inspect(sort(rules)[1:5])
##     lhs                                         rhs                                        support confidence  coverage     lift count
## [1] {Real-time strategy}                     => {Construction and management simulation} 0.1333333  0.6666667 0.2000000 2.857143     4
## [2] {Construction and management simulation} => {Real-time strategy}                     0.1333333  0.5714286 0.2333333 2.857143     4
## [3] {MMORPG}                                 => {Survival}                               0.1000000  0.6000000 0.1666667 3.600000     3
## [4] {Survival}                               => {MMORPG}                                 0.1000000  0.6000000 0.1666667 3.600000     3
## [5] {MMORPG}                                 => {Real-time strategy}                     0.1000000  0.6000000 0.1666667 3.000000     3
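The measures of rule [1] can be reproduced from plain counts, which makes their definitions concrete. The following sketch uses only counts from the outputs above: Real-time strategy occurs in 6 transactions, Construction and management simulation in 7, and both in 4.

```r
# Counts taken from the outputs above (30 transactions in total)
n          <- 30
count_rts  <- 6  # transactions containing Real-time strategy
count_cms  <- 7  # transactions containing Construction and management simulation
count_both <- 4  # transactions containing both (the rule's count)

support    <- count_both / n          # P(RTS and CMS)
coverage   <- count_rts / n           # P(RTS): support of the left-hand side
confidence <- count_both / count_rts  # P(CMS | RTS)
lift       <- confidence / (count_cms / n)  # confidence relative to P(CMS)

# The reverse rule {CMS} => {RTS} has a different confidence but the same lift
confidence_rev <- count_both / count_cms
lift_rev       <- confidence_rev / (count_rts / n)

round(c(support = support, coverage = coverage, confidence = confidence,
        lift = lift, confidence_rev = confidence_rev, lift_rev = lift_rev), 7)
```

Lift is symmetric in the left- and right-hand sides, which is why rules [1] and [2] share the value 2.857143 while their confidences differ.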

We can also sort the rules by each quality measure separately (support, confidence, lift and count) and look at the first 5 rules for each one.

Sorted by lift, the top rules involve genres such as Action role-playing game, “Maze, arcade” (a single genre label), and Turn-based strategy. All five have a lift of 30 but a count of 1, so each is based on a single transaction.

inspect(sort(rules, by = "lift")[1:5])
##     lhs                           rhs                        support   
## [1] {Action role-playing game} => {Maze, arcade}             0.03333333
## [2] {Maze, arcade}             => {Action role-playing game} 0.03333333
## [3] {Turn-based strategy}      => {Adventure, puzzle}        0.03333333
## [4] {Adventure, puzzle}        => {Turn-based strategy}      0.03333333
## [5] {Compilation}              => {RTS, 4X, Grand Strategy}  0.03333333
##     confidence coverage   lift count
## [1] 1          0.03333333 30   1    
## [2] 1          0.03333333 30   1    
## [3] 1          0.03333333 30   1    
## [4] 1          0.03333333 30   1    
## [5] 1          0.03333333 30   1
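A simple upper bound explains why so many rules share a lift of exactly 30. Because confidence is at most 1, lift = confidence / P(rhs) can be at most n / count(rhs). When the right-hand side occurs in only one of the 30 transactions, the maximum possible lift is therefore 30. The sketch below recomputes two lifts from the outputs above using a hypothetical helper, `lift_from_counts`.

```r
n <- 30  # number of transactions

# lift = confidence / P(rhs) = (count_both / count_lhs) / (count_rhs / n)
lift_from_counts <- function(count_both, count_lhs, count_rhs, n) {
  (count_both / count_lhs) / (count_rhs / n)
}

# {Action role-playing game} => {Maze, arcade}: both labels occur once, together
lift_maze <- lift_from_counts(count_both = 1, count_lhs = 1, count_rhs = 1, n = n)

# {Third-person shooter, survival horror} => {Action role-playing, hack and slash}:
# lhs in 3 transactions (coverage 0.1), rhs in 1, both together in 1
lift_tps <- lift_from_counts(count_both = 1, count_lhs = 3, count_rhs = 1, n = n)

c(lift_maze = lift_maze, lift_tps = lift_tps)
```

A lift of 30 here therefore only says that two labels that each occur once happen to occur in the same transaction.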

Sorted by confidence, the top rules involve genres such as Sandbox, Action role-playing, and “Maze, arcade”. All five have a confidence of 1, and each is again based on a single transaction (count 1).

inspect(sort(rules, by = "confidence")[1:5])
##     lhs                                      rhs                                        support confidence   coverage      lift count
## [1] {Sandbox}                             => {Action role-playing}                   0.03333333          1 0.03333333  3.333333     1
## [2] {Action role-playing game}            => {Maze, arcade}                          0.03333333          1 0.03333333 30.000000     1
## [3] {Maze, arcade}                        => {Action role-playing game}              0.03333333          1 0.03333333 30.000000     1
## [4] {Action role-playing, hack and slash} => {Third-person shooter, survival horror} 0.03333333          1 0.03333333 10.000000     1
## [5] {Turn-based strategy}                 => {Adventure, puzzle}                     0.03333333          1 0.03333333 30.000000     1

Sorted by support, the top rules involve Real-time strategy, Construction and management simulation, and MMORPG. These are the same five rules that sort() returned with its default ordering.

inspect(sort(rules, by = "support")[1:5])
##     lhs                                         rhs                                        support confidence  coverage     lift count
## [1] {Real-time strategy}                     => {Construction and management simulation} 0.1333333  0.6666667 0.2000000 2.857143     4
## [2] {Construction and management simulation} => {Real-time strategy}                     0.1333333  0.5714286 0.2333333 2.857143     4
## [3] {MMORPG}                                 => {Survival}                               0.1000000  0.6000000 0.1666667 3.600000     3
## [4] {Survival}                               => {MMORPG}                                 0.1000000  0.6000000 0.1666667 3.600000     3
## [5] {MMORPG}                                 => {Real-time strategy}                     0.1000000  0.6000000 0.1666667 3.000000     3

Sorting by count returns the same five rules in the same order, so the most frequently observed associations again involve Real-time strategy, Construction and management simulation, and MMORPG.

inspect(sort(rules, by = "count")[1:5])
##     lhs                                         rhs                                        support confidence  coverage     lift count
## [1] {Real-time strategy}                     => {Construction and management simulation} 0.1333333  0.6666667 0.2000000 2.857143     4
## [2] {Construction and management simulation} => {Real-time strategy}                     0.1333333  0.5714286 0.2333333 2.857143     4
## [3] {MMORPG}                                 => {Survival}                               0.1000000  0.6000000 0.1666667 3.600000     3
## [4] {Survival}                               => {MMORPG}                                 0.1000000  0.6000000 0.1666667 3.600000     3
## [5] {MMORPG}                                 => {Real-time strategy}                     0.1000000  0.6000000 0.1666667 3.000000     3
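Support is the count divided by the number of transactions (30 here), so ranking rules by either measure gives the same order. The following base-R sketch checks this on a hand-built data frame (`rules_df`) that uses the counts of five rules reported above. In arules itself, head(quality(sort(rules, by = "count")), 5) shows the measures of the top rules as a data frame.

```r
# Counts of five rules reported above; n = 30 transactions
rules_df <- data.frame(
  rule = c("{Sandbox} => {Action role-playing}",
           "{Real-time strategy} => {Construction and management simulation}",
           "{MMORPG} => {Survival}",
           "{Battle royale} => {Construction and management simulation}",
           "{MMORPG} => {Real-time strategy}"),
  count = c(1, 4, 3, 1, 3),
  stringsAsFactors = FALSE
)
rules_df$support <- rules_df$count / 30

# Decreasing order by support and by count: identical, because support is
# count divided by a constant
by_support <- order(rules_df$support, decreasing = TRUE)
by_count   <- order(rules_df$count, decreasing = TRUE)
rules_df[by_support, ]
```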
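
The grouped matrix plot below clusters the left-hand sides of the rules into groups and shows them against the right-hand sides.

plot(rules, method = "grouped")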

Additionally, we draw the 100 rules with the highest confidence as a graph, in which items and rules appear as connected vertices.

plot(sort(rules, by = "confidence")[1:100], method="graph")

Finally, I created a two-key plot. It places each rule by its support (x-axis) and confidence (y-axis) and colors the points by the order of the rule, that is, the number of items it contains. The max.overlaps argument is removed here because it is not a valid control parameter for this plot; passing it only produced an "Unknown control parameters" warning.

plot(rules, shading = "order", control = list(main = "Two-key plot"), jitter = 0)

MAIN INSIGHTS

Based on the association rules analysis, there are several interesting insights that we can gain. Here are some of them:

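The most frequent itemsets in the rules combine Real-time strategy, Construction and management simulation, and MMORPG. For example, Real-time strategy and Construction and management simulation occur together in 4 of the 30 transactions. This suggests that these genres tend to appear at the same sales levels among the best-selling PC games in this dataset.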

The highest lift values (30) are observed in rules that involve genres such as Action role-playing game, “Maze, arcade”, and Turn-based strategy. However, each of these rules rests on a single transaction, so the high lift reflects how rare these labels are rather than a reliable association.

The highest confidence values (1.0) are seen in rules that involve genres such as Sandbox, Action role-playing, and “Maze, arcade”. These rules also have a count of 1, so a confidence of 100% only means that the two genres co-occurred the one time the left-hand side appeared.

The highest support values are observed in rules that involve Real-time strategy, Construction and management simulation, and MMORPG. Each of these rules is backed by 3 or 4 transactions, which makes them the most reliable patterns in this analysis.

CONCLUSION

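These patterns can be used to make recommendations for game developers and marketers. For example, the results suggest that some genres, such as Real-time strategy and Construction and management simulation, tend to appear together. These pairings could inform stocking, marketing, and bundling decisions. Because association rules describe co-occurrence rather than cause and effect, and most rules here rest on very few transactions, these patterns are best treated as hypotheses to check against actual purchase data.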