In this project, we mine association rules with the Apriori algorithm on market basket data from Kaggle. Market basket analysis uncovers relationships between products based on customer transactions. By analyzing these relationships, we can identify items that frequently occur together and use this information to improve product placement and marketing strategies, and ultimately to increase sales. Apriori is a popular algorithm for mining association rules, and we use it here to find interesting and meaningful associations between items. The dataset contains transactional data from a retail store.
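To make "frequently co-occurring items" concrete, here is a minimal base-R sketch on four made-up baskets. The baskets and the names `baskets`, `pairs`, and `pair_support` are illustrative assumptions and are not taken from the Kaggle data. It only counts item pairs, whereas Apriori extends the same idea to larger itemsets and prunes candidates that fall below a minimum support.

```r
# Hypothetical toy baskets (NOT the Kaggle data), one character vector per transaction
baskets <- list(
  c("coffee", "frozen meals", "butter"),
  c("coffee", "frozen meals"),
  c("butter", "baking powder"),
  c("coffee", "frozen meals", "fish")
)

# Build every unordered item pair inside each basket
pairs <- unlist(lapply(baskets, function(b) {
  combn(sort(b), 2, FUN = paste, collapse = " + ")
}))

# Support of a pair = share of baskets that contain both items
pair_support <- sort(table(pairs) / length(baskets), decreasing = TRUE)
pair_support
```

Pairs with high support are the candidates a rule-mining algorithm would keep; the `supp` parameter used later plays the role of that cut-off.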
library(arules)
library(arulesViz)
library(arulesCBA)
library(tidyverse)
library(readxl)
library(RColorBrewer)
library(gridExtra)
library(cowplot)
library(ggpubr)
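MBA <- read_excel("Data/MBA/market basket analysist.xlsx",
                  sheet = "Worksheet",
                  range = "A1:M1864")
# Note: write.csv() also writes the row names, so each row number becomes an
# extra "item" when the file is read back as baskets (the 1, 10, 100 labels and
# the leading numbers in the output below). Pass row.names = FALSE to avoid this.
write.csv(MBA, "Data/MBA/market basket analysis.csv")
MBA
Trans <- read.transactions("Data/MBA/market basket analysis.csv",
                           format = "basket", sep = ",", header = TRUE)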
summary(Trans)
## transactions as itemMatrix in sparse format with
## 1863 rows (elements/itemsets/transactions) and
## 1875 columns (items) and a density of 0.002464269
##
## most frequent items:
## frozen meals butter baking powder coffee fish
## 1002 840 663 606 563
## (Other)
## 4934
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8
## 14 210 256 360 421 391 173 38
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 3.000 5.000 4.621 6.000 8.000
##
## includes extended item information - examples:
## labels
## 1 1
## 2 10
## 3 100
inspect(tail(Trans))
## items
## [1] {1858,
## baking powder,
## butter,
## frozen meals,
## frozen vegetables}
## [2] {1859,
## baking powder,
## butter,
## frozen meals,
## frozen vegetables,
## grapes}
## [3] {1860,
## abrasive cleaner,
## butter,
## coffee,
## fish,
## frozen vegetables,
## ice cream}
## [4] {1861,
## coffee,
## fish,
## frozen meals,
## frozen vegetables,
## grapes}
## [5] {1862,
## butter,
## fish,
## frozen meals,
## ice cream}
## [6] {1863,
## baking powder,
## butter,
## cake bar,
## coffee,
## frozen meals,
## grapes}
arules::itemFrequencyPlot(Trans,
                          topN = 10,
                          col = brewer.pal(10, "BrBG"),
                          main = "Absolute Item Frequency Plot",
                          type = "absolute",
                          ylab = "Item Frequency (Absolute)",
                          xlab = "Retail items")
MBA_AR <- apriori(Trans, parameter = list(supp = 0.01, conf = 0.75))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.75 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 18
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[1875 item(s), 1863 transaction(s)] done [0.00s].
## sorting and recoding items ... [12 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.00s].
## writing ... [148 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(MBA_AR[1:2])
## lhs rhs support confidence
## [1] {coffee} => {frozen meals} 0.31186259 0.9587459
## [2] {baking powder, honey} => {frozen vegetables} 0.01663983 0.7560976
## coverage lift count
## [1] 0.32528180 1.782578 581
## [2] 0.02200751 2.603715 31
MBA_AR_data <- as(MBA_AR,"data.frame")
head(MBA_AR_data, 6)
inspect(sort(MBA_AR, by = "support",decreasing = TRUE )[1:5])
## lhs rhs support confidence
## [1] {coffee} => {frozen meals} 0.3118626 0.9587459
## [2] {baking powder, frozen meals} => {butter} 0.1669351 0.7585366
## [3] {butter, coffee} => {frozen meals} 0.1556629 0.9324759
## [4] {baking powder, coffee} => {frozen meals} 0.1336554 0.9576923
## [5] {coffee, fish} => {frozen meals} 0.1143317 0.9424779
## coverage lift count
## [1] 0.3252818 1.782578 581
## [2] 0.2200751 1.682326 311
## [3] 0.1669351 1.733735 290
## [4] 0.1395598 1.780620 249
## [5] 0.1213097 1.752332 213
The rule {coffee} => {frozen meals} has the highest support: coffee and frozen meals appear together in 581 transactions, which is about 31.2% of all transactions in the dataset.
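As a quick check, the support of a rule is the number of transactions containing every item in the rule divided by the total number of transactions. The sketch below recomputes it from the numbers printed above; the variable names are illustrative.

```r
# Values copied from the output above
n_transactions <- 1863   # rows reported by summary(Trans)
rule_count     <- 581    # "count" column for {coffee} => {frozen meals}

# Support = share of all transactions that contain both coffee and frozen meals
support <- rule_count / n_transactions
round(support, 4)
```

The result should agree with the `support` column reported by `inspect()` for this rule.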
inspect(sort(MBA_AR, by = "confidence", decreasing = FALSE )[1:5])
## lhs rhs support
## [1] {butter, frozen vegetables, honey} => {frozen meals} 0.01771337
## [2] {domestic eggs, frozen meals, ice cream} => {coffee} 0.01610306
## [3] {coffee, frozen vegetables, ice cream} => {butter} 0.01449275
## [4] {butter, coffee, fish, ice cream} => {frozen meals} 0.01288245
## [5] {baking powder, honey} => {frozen vegetables} 0.01663983
## confidence coverage lift count
## [1] 0.7500000 0.02361782 1.394461 33
## [2] 0.7500000 0.02147075 2.305693 30
## [3] 0.7500000 0.01932367 1.663393 27
## [4] 0.7500000 0.01717660 1.394461 24
## [5] 0.7560976 0.02200751 2.603715 31
Because the rules are sorted in ascending order here, this list shows the rules closest to the 0.75 confidence threshold. For example, the rule {baking powder, honey} => {frozen vegetables} has a confidence of 0.7561, meaning that 75.61% of the transactions containing baking powder and honey also contain frozen vegetables.
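Confidence can be recomputed by hand as the rule count divided by the number of transactions that contain the antecedent. In this sketch the antecedent count is not printed directly, so it is recovered from the `coverage` column (an assumption that coverage times the number of transactions rounds to a whole count); the variable names are illustrative.

```r
# Values copied from the output above for {baking powder, honey} => {frozen vegetables}
n_transactions <- 1863
rule_count     <- 31           # transactions with baking powder, honey AND frozen vegetables
coverage       <- 0.02200751   # share of transactions with baking powder and honey

# Recover the antecedent count from coverage (assumed to round to an integer)
lhs_count <- round(coverage * n_transactions)

# Confidence = P(rhs | lhs) estimated from counts
confidence <- rule_count / lhs_count
round(confidence, 4)
```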
inspect(sort(MBA_AR, by = "lift", decreasing = TRUE)[1:5])
## lhs rhs support confidence coverage lift count
## [1] {baking powder,
## coffee,
## honey} => {frozen vegetables} 0.01234568 0.9200000 0.01341922 3.168133 23
## [2] {baking powder,
## butter,
## coffee,
## honey} => {frozen vegetables} 0.01127214 0.9130435 0.01234568 3.144177 21
## [3] {baking powder,
## coffee,
## frozen meals,
## honey} => {frozen vegetables} 0.01127214 0.9130435 0.01234568 3.144177 21
## [4] {baking powder,
## butter,
## coffee,
## frozen meals,
## honey} => {frozen vegetables} 0.01019860 0.9047619 0.01127214 3.115659 19
## [5] {abrasive cleaner,
## butter,
## fish,
## ice cream} => {coffee} 0.01180891 1.0000000 0.01180891 3.074257 22
The association rule {baking powder, coffee, honey} => {frozen vegetables} has a lift of 3.17, meaning that frozen vegetables appear about 3.17 times as often in transactions containing baking powder, coffee, and honey as they do across all transactions. A lift above 1 indicates a positive association between the antecedent and the consequent.
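Lift is the rule's confidence divided by the overall support of the consequent. The sketch below works through the {coffee} => {frozen meals} rule rather than the top-lift rule, because every count it needs is printed earlier in the output; the variable names are illustrative.

```r
# Counts copied from the earlier output
n_transactions <- 1863
count_both <- 581    # {coffee, frozen meals}, from the rule's "count" column
count_lhs  <- 606    # coffee, from the item frequencies in summary(Trans)
count_rhs  <- 1002   # frozen meals, from the same summary

confidence  <- count_both / count_lhs        # P(frozen meals | coffee)
rhs_support <- count_rhs / n_transactions    # P(frozen meals)

# Lift > 1 means the consequent is more common among antecedent buyers than overall
lift <- confidence / rhs_support
round(lift, 4)
```

The same calculation applies to any rule in the table once the antecedent and consequent counts are known.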
The scatter plot below places each rule by its support and confidence, with shading indicating lift.
plot(MBA_AR, measure=c("support","confidence"),
shading="lift", engine = "plotly")
goods <- unique(MBA$item1)[1:12]
goods_rules_list = list()
goods_rules_plots = list()
for (g in goods) {
  # Rules whose consequent (rhs) is the current item
  goods_rules <- apriori(data = Trans,
                         parameter = list(supp = 0.001, conf = 0.75),
                         appearance = list(default = "lhs", rhs = g),
                         control = list(verbose = FALSE))
  goods_rules_list[[g]] <- sort(goods_rules, by = "support", decreasing = TRUE)
  # theme_bw() goes first so it does not reset the title size set in theme()
  goods_rules_plots[[g]] <- plot(head(goods_rules_list[[g]]), method = "graph") +
    labs(title = paste(g, "as a consequent item")) +
    theme_bw() +
    theme(plot.title = element_text(size = 9))
}
ggarrange(plotlist = goods_rules_plots,
common.legend = TRUE, ncol = 3)
## $`1`
##
## $`2`
##
## $`3`
##
## $`4`
##
## attr(,"class")
## [1] "list" "ggarrange"
goods <- unique(MBA$item1)[1:12]
goods_ant_rules_list = list()
goods_ant_rules_plots = list()
for (g in goods) {
  # Rules whose antecedent (lhs) is the current item
  goods_rules <- apriori(data = Trans,
                         parameter = list(supp = 0.01, conf = 0.075, minlen = 2),
                         appearance = list(default = "rhs", lhs = g),
                         control = list(verbose = FALSE))
  goods_ant_rules_list[[g]] <- sort(goods_rules, by = "confidence", decreasing = TRUE)
  # theme_bw() goes first so it does not reset the title size set in theme()
  goods_ant_rules_plots[[g]] <- plot(head(goods_ant_rules_list[[g]]), method = "graph") +
    labs(title = paste(g, "as an antecedent item")) +
    theme_bw() +
    theme(plot.title = element_text(size = 9))
}
ggarrange(plotlist = goods_ant_rules_plots,
common.legend = TRUE, ncol = 3)
## $`1`
##
## $`2`
##
## $`3`
##
## $`4`
##
## attr(,"class")
## [1] "list" "ggarrange"
The graphs above show the items most likely to be bought together with the item named in each chart title. For instance, when a customer buys baking powder, there is also a high likelihood that they buy butter, which is in line with expectations.
Association rules are important in uncovering how products are related based on transactional data. They aid in identifying how frequently certain items should be stocked in a retail shop or supermarket, taking into account how the purchase of one product influences the likelihood of purchasing another. As a result, these products can be displayed together to facilitate easier purchases for customers.