Market basket analysis (MBA) is a set of statistical affinity measures used in retail to help management understand, and ultimately serve, their customers better. In the simplest terms, MBA shows which products are most often bought together in the same order. These associations can be exploited through cross-selling, recommendations, promotions, or even the placement of products on a menu or in a shop to increase profitability. The underlying hypothesis is that customers who buy a particular item (or set of items) are more likely to buy another specific item (or set of items).
library(kableExtra)
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
## Loading required package: grid
First we need to read the data, which I found on Kaggle.
transactions = read.transactions("Groceries.csv",format = "basket",sep = ",",skip = 0,header = TRUE)
Let’s have a look at the item frequency plot
itemFrequencyPlot(transactions,topN = 20,type = "absolute", main = "Item frequency",cex.names = 0.85)
We use cat() to print some basic information about the data.
cat("Number of baskets:", length(transactions))
## Number of baskets: 9001
print("/")
## [1] "/"
cat("Total number of items across all baskets:", sum(size(transactions)))
## Total number of items across all baskets: 34456
print("/")
## [1] "/"
cat("Number of unique items:", ncol(transactions))
## Number of unique items: 170
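The number of baskets, the total number of item occurrences, the number of distinct items, and the size of the largest basket are easy to mix up. The following is a minimal base-R sketch on made-up baskets (the `baskets` list is hypothetical, not the Groceries file) that computes each of them separately. With an arules transactions object, `size()` returns the per-basket item counts, so `max(size(transactions))` should be the counterpart of `largest` below.

```r
# Toy baskets (hypothetical data, not the Groceries file)
baskets <- list(
  c("milk", "bread"),
  c("milk", "butter", "eggs"),
  c("bread"),
  c("milk", "bread", "butter")
)

n_baskets    <- length(baskets)                  # number of transactions
basket_sizes <- lengths(baskets)                 # items per basket
total_items  <- sum(basket_sizes)                # item occurrences over all baskets
unique_items <- length(unique(unlist(baskets)))  # distinct products
largest      <- max(basket_sizes)                # size of the biggest basket

cat("Baskets:", n_baskets, "\n")
cat("Item occurrences:", total_items, "\n")
cat("Unique items:", unique_items, "\n")
cat("Largest basket:", largest, "\n")
```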
R. Agrawal and R. Srikant proposed the Apriori algorithm in 1994 to find frequent itemsets for Boolean association rules. The algorithm is called Apriori because it uses prior knowledge of the properties of frequent itemsets. It performs an iterative, level-wise search in which frequent k-itemsets are used to generate candidate (k+1)-itemsets. An essential property, the Apriori property, states that every non-empty subset of a frequent itemset must also be frequent; it shrinks the search space and makes the level-wise generation of frequent itemsets more efficient. Below we keep rules with a support of at least 0.01 and a confidence of at least 0.45.
rules = apriori(transactions, parameter = list(supp = 0.01, conf = 0.45))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.45 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 90
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[170 item(s), 9001 transaction(s)] done [0.00s].
## sorting and recoding items ... [77 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [22 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
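The level-wise search described above can be sketched in a few lines of base R. This is only an illustration on made-up baskets, not how arules implements apriori() internally; the `baskets` list, the `min_count` threshold and the helper `count_of()` are all assumptions made for this example.

```r
# Toy baskets (hypothetical data, not the Groceries file)
baskets <- list(
  c("milk", "bread", "butter"),
  c("milk", "bread"),
  c("milk", "eggs"),
  c("bread", "butter"),
  c("milk", "bread", "butter", "eggs")
)
min_count <- 2  # assumed absolute minimum support count for this toy example

# Number of baskets that contain every item of `itemset`
count_of <- function(itemset) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Level 1: frequent single items
items <- sort(unique(unlist(baskets)))
frequent <- Filter(function(s) count_of(s) >= min_count, as.list(items))
all_frequent <- frequent

k <- 1
while (length(frequent) > 1) {
  # Join step: merge pairs of frequent k-itemsets into (k+1)-item candidates
  candidates <- list()
  for (i in seq_along(frequent)) {
    for (j in seq_along(frequent)) {
      if (i < j) {
        u <- sort(union(frequent[[i]], frequent[[j]]))
        if (length(u) == k + 1) candidates[[paste(u, collapse = ",")]] <- u
      }
    }
  }
  # Prune step (Apriori property): every k-subset must itself be frequent
  frequent_keys <- vapply(frequent, paste, character(1), collapse = ",")
  candidates <- Filter(function(cand) {
    subsets <- combn(cand, k, simplify = FALSE)
    all(vapply(subsets, function(s) paste(s, collapse = ",") %in% frequent_keys,
               logical(1)))
  }, candidates)
  # Count step: keep the candidates that reach the minimum support count
  frequent <- Filter(function(s) count_of(s) >= min_count, unname(candidates))
  all_frequent <- c(all_frequent, frequent)
  k <- k + 1
}

# Print every frequent itemset found, one per line
for (s in all_frequent) cat("{", paste(s, collapse = ", "), "}\n")
```

The prune step is where the Apriori property pays off: a candidate such as {bread, eggs, milk} is discarded without scanning the baskets because its subset {bread, eggs} was already found to be infrequent.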
Support: the proportion of transactions that contain all items in a rule (both sides together). The higher the support, the more often the itemset occurs. Rules with high support are preferred because they are likely to apply to a large number of future transactions.
support = sort(rules, by = "support", decreasing = TRUE)
support_tab = inspect(head(support), linebreak = FALSE)
## lhs rhs support confidence
## [1] {root vegetables} => {whole milk} 0.04343962 0.4535963
## [2] {whipped/sour cream} => {whole milk} 0.02855238 0.4508772
## [3] {domestic eggs} => {whole milk} 0.02721920 0.4900000
## [4] {butter} => {whole milk} 0.02433063 0.5011442
## [5] {curd} => {whole milk} 0.02266415 0.4892086
## [6] {other vegetables,root vegetables} => {whole milk} 0.02066437 0.4946809
## coverage lift count
## [1] 0.09576714 2.039371 391
## [2] 0.06332630 2.027146 257
## [3] 0.05554938 2.203042 245
## [4] 0.04855016 2.253146 219
## [5] 0.04632819 2.199484 204
## [6] 0.04177314 2.224087 186
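To make the support and count columns concrete, here is a minimal base-R sketch that computes both by hand. The `baskets` list and the helper `itemset_count()` are hypothetical names introduced for illustration; the last line only re-derives the support of the first rule printed above from its count (391) and the number of baskets (9001).

```r
# Toy baskets (hypothetical data, not the Groceries file)
baskets <- list(
  c("milk", "bread", "butter"),
  c("milk", "bread"),
  c("milk", "eggs"),
  c("bread", "butter"),
  c("milk", "bread", "butter", "eggs")
)

# Count the baskets that contain every item of `itemset`
itemset_count <- function(itemset, baskets) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

count   <- itemset_count(c("bread", "milk"), baskets)  # baskets with both items
support <- count / length(baskets)                     # share of all baskets

# Same relation on the first rule in the table above: count / number of baskets
printed_support <- 391 / 9001

cat("count:", count, " support:", support, "\n")
cat("support of rule [1] from its count:", round(printed_support, 8), "\n")
```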
Confidence: the probability that a transaction containing the items on the left-hand side of a rule also contains the items on the right-hand side. The higher the confidence, the more likely it is that the right-hand-side item is bought as well, or, in other words, the higher the return rate you can expect for a given rule.
confidence = sort(rules, by = "confidence", decreasing = TRUE)
confidence_tab = inspect(head(confidence), linebreak = FALSE)
## lhs rhs support
## [1] {root vegetables,yogurt} => {whole milk} 0.01366515
## [2] {domestic eggs,other vegetables} => {whole milk} 0.01110988
## [3] {root vegetables,tropical fruit} => {other vegetables} 0.01133207
## [4] {root vegetables,tropical fruit} => {whole milk} 0.01122098
## [5] {tropical fruit,yogurt} => {whole milk} 0.01344295
## [6] {rolls/buns,root vegetables} => {whole milk} 0.01133207
## confidence coverage lift count
## [1] 0.5885167 0.02321964 2.645974 123
## [2] 0.5847953 0.01899789 2.629242 100
## [3] 0.5828571 0.01944228 3.483597 102
## [4] 0.5771429 0.01944228 2.594837 101
## [5] 0.5401786 0.02488612 2.428645 121
## [6] 0.5340314 0.02121986 2.401007 102
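The columns reported by inspect() are tied together by simple ratios. The base-R sketch below computes them by hand for a single rule on made-up baskets; the `baskets` list and the helper `rule_metrics()` are hypothetical and only meant to illustrate the definitions (coverage is the support of the left-hand side, and lift is confidence divided by the support of the right-hand side).

```r
# Toy baskets (hypothetical data, not the Groceries file)
baskets <- list(
  c("milk", "bread", "butter"),
  c("milk", "bread"),
  c("milk", "eggs"),
  c("bread", "butter"),
  c("milk", "bread", "butter", "eggs")
)

# Metrics of the rule lhs => rhs, computed directly from the baskets
rule_metrics <- function(lhs, rhs, baskets) {
  has <- function(itemset) {
    vapply(baskets, function(b) all(itemset %in% b), logical(1))
  }
  n      <- length(baskets)
  n_lhs  <- sum(has(lhs))
  n_rhs  <- sum(has(rhs))
  n_both <- sum(has(c(lhs, rhs)))
  c(
    support    = n_both / n,                   # P(lhs and rhs)
    confidence = n_both / n_lhs,               # P(rhs | lhs)
    coverage   = n_lhs / n,                    # P(lhs)
    lift       = (n_both / n_lhs) / (n_rhs / n) # P(rhs | lhs) / P(rhs)
  )
}

m <- rule_metrics("bread", "milk", baskets)
print(m)
```

A lift above 1 means the right-hand side is bought more often together with the left-hand side than on its own baseline; in this toy data the lift of {bread} => {milk} is below 1, so bread buyers are slightly less likely than average to buy milk.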
plot(rules, engine="plotly")
set.seed(987)
plot(support, method = "graph", cex = 0.7, control = list(layout=igraph::in_circle()))
plot(support, method = "paracoord", measure = "support", lty = "dotted")
set.seed(987)
plot(confidence, method = "graph", cex = 0.7, control = list(layout=igraph::in_circle()))
plot(confidence, method = "paracoord", measure = "confidence", lty = "dotted")