Market Basket Analysis

Market basket analysis (MBA) is a set of statistical affinity measures used in retail to help management understand, and ultimately better serve, their customers. Put simply, MBA shows which products are most often bought together in the same order. These associations can be exploited through cross-selling, recommendations, promotions, or the placement of products on a menu or in a store to maximize profitability. The underlying hypothesis is that customers who buy a particular item (or set of items) are more likely to buy another specific item (or set of items).
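To make "bought together" concrete, here is a minimal base-R sketch that is not part of the original analysis. It counts how often each pair of items shares a basket. The baskets, item names and variable names are all hypothetical. The sketch builds a 0/1 basket-by-item matrix and then uses crossprod(), which returns pairwise co-occurrence counts for a matrix of this kind.

```r
# Hypothetical toy baskets, for illustration only.
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("beer", "chips"),
  c("bread", "butter"),
  c("milk", "butter", "bread", "eggs")
)

# Binary basket-by-item matrix: 1 if the basket contains the item, else 0.
items <- sort(unique(unlist(baskets)))
incidence <- t(vapply(baskets, function(b) as.integer(items %in% b),
                      integer(length(items))))
colnames(incidence) <- items

# crossprod(X) = t(X) %*% X: entry [i, j] counts baskets containing both items i and j.
co_occurrence <- crossprod(incidence)
diag(co_occurrence) <- 0  # ignore an item "co-occurring" with itself

# One row per unordered item pair, sorted from most to least often bought together.
pairs <- which(upper.tri(co_occurrence), arr.ind = TRUE)
pair_counts <- data.frame(
  item_a = items[pairs[, "row"]],
  item_b = items[pairs[, "col"]],
  count  = co_occurrence[pairs],
  stringsAsFactors = FALSE
)
pair_counts <- pair_counts[order(-pair_counts$count), ]
head(pair_counts, 3)
```

In this toy data, bread and butter and also bread and milk each appear together in three baskets, so those pairs head the list. MBA builds on exactly this kind of co-occurrence count.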

library(kableExtra)
library(arules)

library(arulesViz)

First, we read the data, a groceries dataset found on Kaggle.

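# format = "basket": each line is one basket with comma-separated items; header = TRUE treats the first line as a header
transactions = read.transactions("Groceries.csv", format = "basket", sep = ",", skip = 0, header = TRUE)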

Let’s look at the frequency plot of the 20 most frequently purchased items.

itemFrequencyPlot(transactions,topN = 20,type = "absolute", main = "Item frequency",cex.names = 0.85)

We use cat() to print some basic information about the data.

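cat("Number of baskets:", length(transactions), "\n")

## Number of baskets: 9001

# size() returns the number of items in each basket, so its sum is the total item count
cat("Total number of items purchased:", sum(size(transactions)), "\n")

## Total number of items purchased: 34456

# each column of the transactions matrix is one distinct item
cat("Number of unique items:", ncol(transactions), "\n")

## Number of unique items: 170

cat("The biggest basket consists of", max(size(transactions)), "products.\n")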
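The basket statistics above are easy to confuse. The following base-R sketch computes each one directly from a list of character vectors, using a few hypothetical toy baskets. Comments note which arules call each line is meant to mirror. The variable names are illustrative only.

```r
# Hypothetical toy baskets; each element is one transaction.
baskets <- list(
  c("whole milk", "butter"),
  c("whole milk", "yogurt", "rolls/buns"),
  c("yogurt")
)

basket_sizes <- lengths(baskets)                  # items per basket, like size(transactions)
n_baskets    <- length(baskets)                   # like length(transactions)
total_items  <- sum(basket_sizes)                 # like sum(size(transactions))
unique_items <- length(unique(unlist(baskets)))   # like ncol(transactions)
biggest      <- max(basket_sizes)                 # like max(size(transactions))

cat("Baskets:", n_baskets, "| total items:", total_items,
    "| unique items:", unique_items, "| biggest basket:", biggest, "\n")
# Baskets: 3 | total items: 6 | unique items: 4 | biggest basket: 3
```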

R. Agrawal and R. Srikant proposed the Apriori algorithm in 1994 for mining frequent itemsets for Boolean association rules. It is called Apriori because it uses prior knowledge of the properties of frequent itemsets. The algorithm performs an iterative, level-wise search in which the frequent k-itemsets are used to generate candidate (k+1)-itemsets. Its efficiency comes from the Apriori property: every subset of a frequent itemset must also be frequent. Any candidate with an infrequent subset can therefore be pruned without counting it, which greatly reduces the search space.
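The level-wise search can be sketched in a few dozen lines of base R. This is an illustrative reimplementation under simplifying assumptions, not the optimized implementation that arules::apriori() uses. It works on a tiny hypothetical dataset, stores itemsets as sorted character vectors, and counts support by scanning every transaction. The function name apriori_sketch() is made up. Each pass does three things: it counts candidate k-itemsets, joins the frequent ones into (k+1)-candidates, and prunes any candidate that has an infrequent k-subset.

```r
# Hypothetical toy transactions (not the Kaggle data): one character vector per basket.
transactions_toy <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)

# Level-wise frequent itemset search (illustrative sketch only).
apriori_sketch <- function(transactions, min_support) {
  # Support = fraction of transactions that contain every item of the itemset.
  support_of <- function(itemset) {
    sum(vapply(transactions, function(t) all(itemset %in% t), logical(1))) /
      length(transactions)
  }
  key <- function(itemset) paste(itemset, collapse = "\t")

  # Level 1 candidates: every distinct item as a 1-itemset (items kept sorted).
  candidates <- as.list(sort(unique(unlist(transactions))))
  result <- data.frame(itemset = character(0), support = numeric(0),
                       stringsAsFactors = FALSE)
  k <- 1
  while (length(candidates) > 0) {
    # Count step: keep only candidates that reach the minimum support.
    supports <- vapply(candidates, support_of, numeric(1))
    keep <- supports >= min_support
    level <- candidates[keep]
    if (length(level) == 0) break
    result <- rbind(result, data.frame(
      itemset = vapply(level, function(s) paste0("{", paste(s, collapse = ","), "}"),
                       character(1)),
      support = supports[keep],
      stringsAsFactors = FALSE
    ))

    # Join step: merge two frequent k-itemsets that share their first k-1 items.
    level_keys <- vapply(level, key, character(1))
    next_candidates <- list()
    for (a in level) for (b in level) {
      if (identical(a[-k], b[-k]) && a[k] < b[k]) {
        cand <- c(a, b[k])
        # Prune step (Apriori property): every k-subset of cand must be frequent.
        all_subsets_frequent <- all(vapply(seq_along(cand),
                                           function(m) key(cand[-m]) %in% level_keys,
                                           logical(1)))
        if (all_subsets_frequent) {
          next_candidates[[length(next_candidates) + 1]] <- cand
        }
      }
    }
    candidates <- next_candidates
    k <- k + 1
  }
  result
}

frequent <- apriori_sketch(transactions_toy, min_support = 0.6)
print(frequent)
```

With min_support = 0.6 (3 of 5 baskets), the sketch reports four frequent single items (beer, bread, diapers, milk) and four frequent pairs. The only 3-item candidate, {bread,diapers,milk}, survives pruning, but it appears in just 2 baskets and is dropped.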

# keep rules with support >= 0.01 (at least 90 of the 9001 baskets) and confidence >= 0.45
rules = apriori(transactions, parameter = list(supp = 0.01, conf = 0.45))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.45    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 90 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[170 item(s), 9001 transaction(s)] done [0.00s].
## sorting and recoding items ... [77 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [22 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

Support: the fraction of transactions that contain all the items in an itemset. For a rule, that itemset is the union of the left-hand side (lhs) and the right-hand side (rhs). The higher the support, the more often the itemset occurs. Rules with high support are preferred because they are likely to apply to a large share of future purchases.
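Here is a minimal sketch of the support calculation in base R, using five hypothetical baskets. The helper support_of() is hypothetical as well. It simply divides the number of baskets that contain every item of the itemset by the total number of baskets.

```r
# Hypothetical toy baskets (not the Kaggle data).
baskets <- list(
  c("whole milk", "root vegetables"),
  c("whole milk", "butter"),
  c("root vegetables", "other vegetables", "whole milk"),
  c("yogurt"),
  c("root vegetables", "yogurt")
)

# support(X) = (baskets containing every item in X) / (number of baskets)
support_of <- function(itemset, baskets) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1))) / length(baskets)
}

# For the rule {root vegetables} => {whole milk}, support uses the union of lhs and rhs.
rule_support <- support_of(c("root vegetables", "whole milk"), baskets)
rule_support  # 0.4: two of the five baskets contain both items
```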

# sort all rules by support (highest first) and print the top six
support = sort(rules, by = "support", decreasing = TRUE)
support_tab = inspect(head(support), linebreak = FALSE)

##     lhs                                   rhs          support    confidence coverage   lift     count
## [1] {root vegetables}                  => {whole milk} 0.04343962 0.4535963  0.09576714 2.039371 391
## [2] {whipped/sour cream}               => {whole milk} 0.02855238 0.4508772  0.06332630 2.027146 257
## [3] {domestic eggs}                    => {whole milk} 0.02721920 0.4900000  0.05554938 2.203042 245
## [4] {butter}                           => {whole milk} 0.02433063 0.5011442  0.04855016 2.253146 219
## [5] {curd}                             => {whole milk} 0.02266415 0.4892086  0.04632819 2.199484 204
## [6] {other vegetables,root vegetables} => {whole milk} 0.02066437 0.4946809  0.04177314 2.224087 186
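As a sanity check on how the columns relate, the first row of the table can be recomputed by hand. This sketch only does arithmetic on numbers printed above, plus the basket count of 9001 reported earlier. It assumes the standard definitions: support = count / n, coverage = support of the lhs, confidence = support / coverage, and lift = confidence / support(rhs). The two rounded counts at the end are derived from those definitions, not read from the output.

```r
# Figures copied from row [1] of the support-sorted table:
# {root vegetables} => {whole milk}
n_transactions <- 9001
row_count      <- 391
row_support    <- 0.04343962
row_confidence <- 0.4535963
row_coverage   <- 0.09576714
row_lift       <- 2.039371

# support = count / number of transactions
row_count / n_transactions                          # about 0.04343962
# coverage is the support of the lhs, so confidence = support / coverage
row_support / row_coverage                          # about 0.4535963
# baskets containing {root vegetables}, implied by coverage
round(row_coverage * n_transactions)                # 862
# lift = confidence / support(rhs), so this is the implied share of {whole milk}
row_confidence / row_lift                           # about 0.2224
round(row_confidence / row_lift * n_transactions)   # 2002
```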

Confidence: the probability that a transaction containing the items on the left-hand side of a rule also contains the items on the right-hand side, i.e. support(lhs ∪ rhs) / support(lhs). The higher the confidence, the more likely a customer who buys the left-hand items is to also buy the right-hand item. In other words, the rule's prediction is expected to be correct more often.
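The tables also show coverage and lift, which the text does not define. These can be sketched in the same way as confidence. Assuming the standard definitions (coverage = support of the lhs, confidence = support(lhs ∪ rhs) / coverage, lift = confidence / support(rhs)), a base-R version on five hypothetical baskets looks like this. The helper support_of() is illustrative only.

```r
# Hypothetical toy baskets (not the Kaggle data).
baskets <- list(
  c("whole milk", "root vegetables"),
  c("whole milk", "butter"),
  c("root vegetables", "other vegetables", "whole milk"),
  c("yogurt"),
  c("root vegetables", "yogurt")
)

# Fraction of baskets that contain every item of the itemset.
support_of <- function(itemset, baskets) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1))) / length(baskets)
}

lhs <- "root vegetables"
rhs <- "whole milk"

# confidence(lhs => rhs) = support(lhs and rhs) / support(lhs), i.e. P(rhs | lhs)
rule_confidence <- support_of(c(lhs, rhs), baskets) / support_of(lhs, baskets)

# lift = confidence / support(rhs); values above 1 mean rhs is bought more often
# together with lhs than it is overall
rule_lift <- rule_confidence / support_of(rhs, baskets)

round(c(confidence = rule_confidence, lift = rule_lift), 3)
# confidence 0.667, lift 1.111
```

Here the lift is about 1.11. In this toy data, baskets with root vegetables contain whole milk slightly more often than baskets overall. By comparison, every rule shown in the tables has a lift above 2.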

# sort all rules by confidence (highest first) and print the top six
confidence = sort(rules, by = "confidence", decreasing = TRUE)
confidence_tab = inspect(head(confidence), linebreak = FALSE)

##     lhs                                 rhs                support    confidence coverage   lift     count
## [1] {root vegetables,yogurt}         => {whole milk}       0.01366515 0.5885167  0.02321964 2.645974 123
## [2] {domestic eggs,other vegetables} => {whole milk}       0.01110988 0.5847953  0.01899789 2.629242 100
## [3] {root vegetables,tropical fruit} => {other vegetables} 0.01133207 0.5828571  0.01944228 3.483597 102
## [4] {root vegetables,tropical fruit} => {whole milk}       0.01122098 0.5771429  0.01944228 2.594837 101
## [5] {tropical fruit,yogurt}          => {whole milk}       0.01344295 0.5401786  0.02488612 2.428645 121
## [6] {rolls/buns,root vegetables}     => {whole milk}       0.01133207 0.5340314  0.02121986 2.401007 102

# interactive scatter plot of all rules: support vs. confidence, shaded by lift
plot(rules, engine = "plotly")

set.seed(987)

plot(support, method = "graph", cex = 0.7, control = list(layout=igraph::in_circle()))

plot(support, method = "paracoord", measure = "support", lty = "dotted")

set.seed(987)

plot(confidence, method = "graph", cex = 0.7, control = list(layout=igraph::in_circle()))

plot(confidence, method = "paracoord", measure = "confidence", lty = "dotted")

Gahraman Akbarov

3/2/2021