INTRODUCTION
In this project, I use a dataset of best-selling PC games to look for
strong relationships between genres, using the Apriori algorithm and
association rules. The goal is to gain useful insight into the
purchasing patterns of customers in the gaming industry. Understanding
these patterns can support better decisions about which games to stock,
how to market them, and how to bundle them together to increase sales.
Dataset source: https://www.kaggle.com/datasets/khaiid/most-selling-pc-games
First, we load the required packages.
# load the required packages
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
Next, we read the Games dataset with the read.csv() function and assign
it to the df variable.
getwd()
## [1] "C:/Users/Maryam/Downloads"
df <- read.csv("Games.csv", sep = ",")
str(df)
## 'data.frame': 175 obs. of 7 variables:
## $ Name : chr "PlayerUnknown's Battlegrounds" "Minecraft" "Diablo III" "Garry's Mod" ...
## $ Sales : num 42 33 20 20 17.2 14 12 12 11 11 ...
## $ Series : chr "" "Minecraft" "Diablo" "" ...
## $ Release : chr "Dec-17" "Nov-11" "May-12" "Nov-06" ...
## $ Genre : chr "Battle royale" "Sandbox, survival" "Action role-playing" "Sandbox" ...
## $ Developer: chr "PUBG Studios" "Mojang Studios" "Blizzard Entertainment" "Facepunch Studios" ...
## $ Publisher: chr "Krafton" "Mojang Studios" "Blizzard Entertainment" "Valve" ...
Then we take a subset of “df” containing only the “Genre” and “Sales”
columns, drop duplicated rows, and look at the structure of the
resulting data frame.
data <- df[, c("Genre", "Sales")]
data <- data[!duplicated(data),]
str(data)
## 'data.frame': 114 obs. of 2 variables:
## $ Genre: chr "Battle royale" "Sandbox, survival" "Action role-playing" "Sandbox" ...
## $ Sales: num 42 33 20 20 17.2 14 12 12 11 11 ...
In this section, I use the “split()” function to split the “Genre”
column by the “Sales” column. The result is a list in which each
distinct sales figure forms one transaction containing the genres of
the games with that sales figure. Then I convert the list to a
“transactions” object using the “as()” function.
transaction <- as(split(data[,"Genre"], data[, "Sales"]),
"transactions")
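To see what this grouping does, here is a minimal sketch that uses only the first four rows visible in the str(df) output above. The toy data frame and the name `baskets` are illustrations, not part of the original analysis.

```r
# First four (Genre, Sales) rows as shown in the str() output; a toy subset only.
toy <- data.frame(
  Genre = c("Battle royale", "Sandbox, survival", "Action role-playing", "Sandbox"),
  Sales = c(42, 33, 20, 20),
  stringsAsFactors = FALSE
)

# split() groups the genres by sales figure: one list element per distinct value.
baskets <- split(toy[, "Genre"], toy[, "Sales"])
print(baskets)
```

The two games with sales of 20 land in the same transaction, which is consistent with the rule {Sandbox} => {Action role-playing} that shows up later in the rule list.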
The summary shows that the data is stored in sparse format with a
density of 0.06229508. There are 30 transactions and 61 items, and the
most frequent item is “Action role-playing” with a count of 9.
Transaction sizes range from 1 to 38 items, with a median of 2.
summary(transaction)
## transactions as itemMatrix in sparse format with
## 30 rows (elements/itemsets/transactions) and
## 61 columns (items) and a density of 0.06229508
##
## most frequent items:
## Action role-playing Construction and management simulation
## 9 7
## Real-time strategy Action-adventure
## 6 5
## First-person shooter (Other)
## 5 82
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 6 7 16 38
## 13 8 2 3 1 1 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 1.0 2.0 3.8 3.0 38.0
##
## includes extended item information - examples:
## labels
## 1 Action
## 2 Action-adventure
## 3 Action-adventure, platformer
##
## includes extended transaction information - examples:
## transactionID
## 1 1
## 2 1.1
## 3 1.2
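The density and mean transaction size in this summary can be checked by hand. The sketch below uses only numbers printed above (114 unique Genre/Sales rows, 30 transactions, 61 items). The variable names are made up for illustration.

```r
# Figures taken from the str() and summary() output above.
n_pairs        <- 114  # unique (Genre, Sales) rows, i.e. filled cells of the matrix
n_transactions <- 30
n_items        <- 61

density   <- n_pairs / (n_transactions * n_items)
mean_size <- n_pairs / n_transactions

# Transaction length distribution from the summary: sizes and how often each occurs.
sizes  <- c(1, 2, 3, 4, 6, 7, 16, 38)
counts <- c(13, 8, 2, 3, 1, 1, 1, 1)

print(round(density, 8))     # 0.06229508
print(mean_size)             # 3.8
print(sum(sizes * counts))   # 114
```

The length distribution adds up to the same 114 filled cells, so the three parts of the summary agree with each other.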
Here we look at the first 10 transactions and their items using the
inspect() function.
inspect(transaction[1:10])
## items transactionID
## [1] {Action,
## Action-adventure,
## Action-adventure, platformer,
## Action-adventure, Survival,
## Action role-playing,
## Adventure,
## Amateur flight simulation,
## Beat 'em up, run-and-gun,
## Business simulation,
## City-building game,
## Construction and management simulation,
## First-person shooter,
## Grand strategy,
## Graphic adventure,
## Graphic adventure, puzzle,
## Interactive fiction,
## Interactive movie,
## Metroidvania,
## MMORPG,
## Platform,
## Point-and-click,
## Puzzle,
## Rail shooter,
## Real-time strategy,
## Real-time strategy, grand strategy wargame,
## Real-time tactics,
## Role-playing game,
## Run and gun,
## Sim racing,
## Simulation, role-playing game,
## Space combat simulation,
## Sports,
## Survival,
## Tactical shooter,
## Third-person shooter, survival horror,
## Trivia game,
## Turn-based strategy, 4X,
## Visual novel, adventure} 1
## [2] {Action role-playing game,
## Maze, arcade} 1.1
## [3] {Action role-playing, hack and slash,
## Third-person shooter, survival horror} 1.2
## [4] {Construction and management simulation,
## Fighting,
## Third-person shooter, survival horror} 1.3
## [5] {Compilation,
## Fighting,
## RTS, 4X, Grand Strategy,
## Vehicle simulation} 1.5
## [6] {Action-adventure, stealth} 1.8
## [7] {Action-adventure,
## Action-adventure, roguelike,
## Action role-playing,
## City-building,
## Computer role-playing game,
## Construction and management simulation,
## Fighting,
## First-person shooter,
## God game,
## Racing game,
## Real-time strategy,
## Role-playing game,
## Simulation,
## Survival horror,
## Third-person shooter,
## Turn-based strategy, 4X} 2
## [8] {Role-playing game} 2.1
## [9] {Adventure, puzzle,
## City-building game,
## Construction and management simulation,
## Turn-based strategy} 2.5
## [10] {Action role-playing} 2.7
Next, I plot the 15 most frequent items in the transactions. Action
role-playing, Construction and management simulation, and Real-time
strategy occur more often than the other genres.
itemFrequencyPlot(transaction, topN = 15, type="absolute", main="Item Frequency", col ="#733C3C")

The same plot with relative frequencies:
itemFrequencyPlot(transaction, topN = 15, type="relative", main="Item Frequency", col = "#8FBDD3")
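The relative plot only rescales the absolute one by the number of transactions. A small sketch using the five counts printed by summary() above (the vector names are illustrative):

```r
# Absolute counts of the five most frequent items, from the summary() output.
abs_freq <- c(
  "Action role-playing" = 9,
  "Construction and management simulation" = 7,
  "Real-time strategy" = 6,
  "Action-adventure" = 5,
  "First-person shooter" = 5
)
n_transactions <- 30

# Relative frequency = share of transactions that contain the item.
rel_freq <- abs_freq / n_transactions
print(round(rel_freq, 3))
```

These shares reappear later as the coverage of rules whose left-hand side is a single item, for example 0.2 for Real-time strategy.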
We can also display an image plot of the first 5 rows of the sparse
matrix.
image(transaction[1:5])

And we can plot a random sample of 10 transactions from the sparse
matrix.
image(sample(transaction,10))

APRIORI ALGORITHM
It is the right time for association rule mining! I set “Support” to
0.025, meaning that an itemset must appear in at least 2.5% of the
transactions to be considered frequent. “Confidence” is set to 0.25,
meaning that a rule must have a confidence of at least 25% to be
considered interesting. “Minlen” is set to 2, so only rules with at
least 2 items are considered, and “Maxlen” is set to 5, so rules
contain at most 5 items. “Maxtime” limits the time spent checking
subsets to 120 seconds.
rules <- apriori(transaction, parameter = list(support = 0.025, confidence = 0.25, minlen = 2,
maxtime = 120, maxlen = 5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.25 0.1 1 none FALSE TRUE 120 0.025 2
## maxlen target ext
## 5 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 0
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[61 item(s), 30 transaction(s)] done [0.00s].
## sorting and recoding items ... [61 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5
## Warning in apriori(transaction, parameter = list(support = 0.025, confidence =
## 0.25, : Mining stopped (maxlen reached). Only patterns up to a length of 5
## returned!
## done [0.04s].
## writing ... [2862434 rule(s)] done [0.34s].
## creating S4 object ... done [1.42s].
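The log above reports an absolute minimum support count of 0 and almost 2.9 million rules. A rough sketch of the arithmetic behind this, using only figures from the output (30 transactions, a largest transaction of 38 items):

```r
n_transactions <- 30
min_support    <- 0.025

# The threshold expressed as a number of transactions: less than one.
print(min_support * n_transactions)        # 0.75

# Support of an itemset that occurs exactly once.
single_support <- 1 / n_transactions
print(single_support >= min_support)       # TRUE

# Itemsets of size 2 to 5 contained in the largest transaction (38 items).
print(sum(choose(38, 2:5)))                # 584896
```

Every one of those itemsets occurs at least once and therefore passes the support threshold, which likely explains much of the size of the rule set. A higher support value would shrink it considerably.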
Here we display the first 10 association rules generated by the
Apriori algorithm using the “inspect()” function.
inspect(rules[1:10])
## lhs rhs support confidence coverage lift count
## [1] {Sandbox} => {Action role-playing} 0.03333333 1.0000000 0.03333333 3.333333 1
## [2] {Action role-playing game} => {Maze, arcade} 0.03333333 1.0000000 0.03333333 30.000000 1
## [3] {Maze, arcade} => {Action role-playing game} 0.03333333 1.0000000 0.03333333 30.000000 1
## [4] {Action role-playing, hack and slash} => {Third-person shooter, survival horror} 0.03333333 1.0000000 0.03333333 10.000000 1
## [5] {Third-person shooter, survival horror} => {Action role-playing, hack and slash} 0.03333333 0.3333333 0.10000000 10.000000 1
## [6] {Battle royale} => {Construction and management simulation} 0.03333333 0.5000000 0.06666667 2.142857 1
## [7] {Turn-based strategy} => {Adventure, puzzle} 0.03333333 1.0000000 0.03333333 30.000000 1
## [8] {Adventure, puzzle} => {Turn-based strategy} 0.03333333 1.0000000 0.03333333 30.000000 1
## [9] {Turn-based strategy} => {City-building game} 0.03333333 1.0000000 0.03333333 15.000000 1
## [10] {City-building game} => {Turn-based strategy} 0.03333333 0.5000000 0.06666667 15.000000 1
Sorting the rules (by support, the default) and looking at the top
five, we can see some interesting associations between video game
genres. For example, rule [1] has the highest support (0.133) together
with a confidence of 0.667 for the association between Real-time
strategy and Construction and management simulation. Rule [3] shows
that Survival appears in 60% of the transactions that contain MMORPG.
Rule [5] shows an equally strong association between MMORPG and
Real-time strategy.
inspect(sort(rules)[1:5])
## lhs rhs support confidence coverage lift count
## [1] {Real-time strategy} => {Construction and management simulation} 0.1333333 0.6666667 0.2000000 2.857143 4
## [2] {Construction and management simulation} => {Real-time strategy} 0.1333333 0.5714286 0.2333333 2.857143 4
## [3] {MMORPG} => {Survival} 0.1000000 0.6000000 0.1666667 3.600000 3
## [4] {Survival} => {MMORPG} 0.1000000 0.6000000 0.1666667 3.600000 3
## [5] {MMORPG} => {Real-time strategy} 0.1000000 0.6000000 0.1666667 3.000000 3
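The measures in this table follow directly from counts. As a worked check for rule [1], the sketch below uses the rule count (4) and the item frequencies of its two sides (6 and 7) from the outputs above. The variable names are illustrative.

```r
# Counts read from the output: rule [1] has count 4; Real-time strategy occurs
# in 6 transactions, Construction and management simulation in 7.
n          <- 30
count_both <- 4
count_lhs  <- 6
count_rhs  <- 7

support    <- count_both / n             # share of transactions with both sides
confidence <- count_both / count_lhs     # share of lhs transactions that also have rhs
coverage   <- count_lhs / n              # support of the left-hand side
lift       <- confidence / (count_rhs / n)

print(round(c(support = support, confidence = confidence,
              coverage = coverage, lift = lift), 7))
```

The results match the first row of the table: 0.1333333, 0.6666667, 0.2, and 2.857143.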
We can also look at the top 5 rules for each measure separately (lift,
confidence, support, and count).
The highest lift values are observed in rules that involve genres such
as Action role-playing game, “Maze, arcade”, and Turn-based strategy.
Each of these rules has a count of 1, so its lift of 30 rests on a
single transaction.
inspect(sort(rules, by = "lift")[1:5])
## lhs rhs support
## [1] {Action role-playing game} => {Maze, arcade} 0.03333333
## [2] {Maze, arcade} => {Action role-playing game} 0.03333333
## [3] {Turn-based strategy} => {Adventure, puzzle} 0.03333333
## [4] {Adventure, puzzle} => {Turn-based strategy} 0.03333333
## [5] {Compilation} => {RTS, 4X, Grand Strategy} 0.03333333
## confidence coverage lift count
## [1] 1 0.03333333 30 1
## [2] 1 0.03333333 30 1
## [3] 1 0.03333333 30 1
## [4] 1 0.03333333 30 1
## [5] 1 0.03333333 30 1
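A lift of exactly 30 is no coincidence with 30 transactions. The sketch below rewrites lift in terms of counts. `lift_from_counts` is a hypothetical helper written for this illustration, not an arules function.

```r
# Lift written in terms of counts: n * count(X and Y) / (count(X) * count(Y)).
lift_from_counts <- function(count_both, count_lhs, count_rhs, n) {
  n * count_both / (count_lhs * count_rhs)
}

n <- 30

# Both items occur once, and they occur together: lift equals n.
print(lift_from_counts(1, 1, 1, n))   # 30

# {Action role-playing, hack and slash} => {Third-person shooter, survival horror}:
# lhs occurs once, rhs three times (coverage 0.1 in the reverse rule).
print(lift_from_counts(1, 1, 3, n))   # 10
```

So a lift of 30 is the largest value possible here, and it is reached whenever two items each appear in only one transaction and that transaction is the same.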
This output shows that the highest confidence values are seen in rules
that involve genres such as Sandbox, Action role-playing, and “Maze,
arcade”.
inspect(sort(rules, by = "confidence")[1:5])
## lhs rhs support confidence coverage lift count
## [1] {Sandbox} => {Action role-playing} 0.03333333 1 0.03333333 3.333333 1
## [2] {Action role-playing game} => {Maze, arcade} 0.03333333 1 0.03333333 30.000000 1
## [3] {Maze, arcade} => {Action role-playing game} 0.03333333 1 0.03333333 30.000000 1
## [4] {Action role-playing, hack and slash} => {Third-person shooter, survival horror} 0.03333333 1 0.03333333 10.000000 1
## [5] {Turn-based strategy} => {Adventure, puzzle} 0.03333333 1 0.03333333 30.000000 1
The highest support values are observed in rules that involve genres
such as Real-time strategy, Construction and management simulation, and
MMORPG.
inspect(sort(rules, by = "support")[1:5])
## lhs rhs support confidence coverage lift count
## [1] {Real-time strategy} => {Construction and management simulation} 0.1333333 0.6666667 0.2000000 2.857143 4
## [2] {Construction and management simulation} => {Real-time strategy} 0.1333333 0.5714286 0.2333333 2.857143 4
## [3] {MMORPG} => {Survival} 0.1000000 0.6000000 0.1666667 3.600000 3
## [4] {Survival} => {MMORPG} 0.1000000 0.6000000 0.1666667 3.600000 3
## [5] {MMORPG} => {Real-time strategy} 0.1000000 0.6000000 0.1666667 3.000000 3
The section below shows that the rules with the highest count also
involve Real-time strategy, Construction and management simulation, and
MMORPG. The ordering matches the one by support, since count is support
multiplied by the number of transactions.
inspect(sort(rules, by = "count")[1:5])
## lhs rhs support confidence coverage lift count
## [1] {Real-time strategy} => {Construction and management simulation} 0.1333333 0.6666667 0.2000000 2.857143 4
## [2] {Construction and management simulation} => {Real-time strategy} 0.1333333 0.5714286 0.2333333 2.857143 4
## [3] {MMORPG} => {Survival} 0.1000000 0.6000000 0.1666667 3.600000 3
## [4] {Survival} => {MMORPG} 0.1000000 0.6000000 0.1666667 3.600000 3
## [5] {MMORPG} => {Real-time strategy} 0.1000000 0.6000000 0.1666667 3.000000 3
plot(rules, method="grouped")
Additionally, this plot is a graph of the top 100 association rules
based on their confidence.
plot(sort(rules, by = "confidence")[1:100], method="graph")

I also created a two-key plot, which shows the support and confidence
of each rule in a scatter plot. The points are shaded according to the
order of the rule, that is, the number of items it contains.
plot(rules, shading = "order", control = list(main = "Two-key plot", jitter = 0))

MAIN INSIGHTS
Based on the association rule analysis, we can gain several interesting
insights. Here are some of them:
The most frequent itemsets include games from the genres of real-time
strategy, construction and management simulation, and MMORPG. This
suggests that these genres are popular among gamers.
The highest lift values are observed in rules that involve genres such
as Action role-playing game, “Maze, arcade”, and Turn-based strategy.
This indicates a strong positive association between these genres,
although each of these rules is based on a single transaction, so the
evidence that they are purchased together is limited.
The highest confidence values are seen in rules that involve genres
such as Sandbox, Action role-playing, and “Maze, arcade”. This
indicates that when a customer buys games from these genres, they are
highly likely to also purchase games from the associated genres.
The highest support values are observed in the rules that involve
genres such as real-time strategy, construction and management
simulation, and MMORPG. This suggests that these genres are popular
among gamers and are frequently purchased.
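Because many high-lift and high-confidence rules above have a count of 1, one possible follow-up is to keep only rules seen in at least two transactions. The sketch below shows the idea on hypothetical toy baskets, not on the Games data. It assumes the arules package loaded at the start, and the thresholds are chosen only to suit four transactions.

```r
library(arules)

# Hypothetical toy baskets, not taken from the Games dataset.
baskets <- list(c("A", "B"), c("A", "B"), c("A", "C"), c("D"))
trans   <- as(baskets, "transactions")

# Same kind of call as in the analysis, with thresholds suited to 4 transactions.
toy_rules <- apriori(
  trans,
  parameter = list(support = 0.25, confidence = 0.5, minlen = 2),
  control   = list(verbose = FALSE)
)

# Keep only rules observed in at least two transactions.
solid_rules <- subset(toy_rules, subset = count >= 2)
inspect(solid_rules)
```

Applied to the real rule set, the same subset() call would leave rules such as {Real-time strategy} => {Construction and management simulation} and drop those that rest on a single transaction.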
CONCLUSION
These patterns can be used to make recommendations for game
developers and marketers. For example, the results indicate that games
in certain genres are more likely to be played together, and that some
games have a stronger influence on the choice of other games.