Using R for Market Basket Analysis

Author

Karani Keith

Market Basket Analysis

Market Basket Analysis (MBA) is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that frequently occur together in transactions.

In a retail store, many purchases are made on impulse. MBA gives store owners clues as to what a customer might have bought if the idea had occurred to them. It can therefore be used to decide the location and promotion of goods inside the store.

If, as has been observed, purchasers of yogurt are more likely to buy a short cake, then high-margin short cakes can be placed near the yogurt aisle. In simple terms, the outcome of this technique is a set of rules that can be read as “if this, then that”.

Imagine 10,000 receipts sitting on your table. Each receipt represents a transaction with the items that were purchased. The receipt is a record of what went into a customer's basket, hence the name 'Market Basket Analysis'.
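To make an "if this, then that" rule concrete, it helps to see how such a rule is usually scored. The sketch below uses five made-up receipts (toy data, not the Groceries data set) and a hypothetical helper called `support()` to compute the three standard measures for the rule {yogurt} => {short cake}: support, confidence and lift.

```r
# Five hypothetical receipts (toy data, not taken from the Groceries data set)
receipts <- list(
  c("yogurt", "short cake", "milk"),
  c("yogurt", "short cake"),
  c("yogurt", "bread"),
  c("milk", "bread"),
  c("short cake", "milk")
)

# Hypothetical helper: fraction of receipts that contain every item in `items`
support <- function(items) {
  mean(vapply(receipts, function(r) all(items %in% r), logical(1)))
}

# Rule under study: {yogurt} => {short cake}
supp_rule <- support(c("yogurt", "short cake"))  # share of baskets with both items
conf_rule <- supp_rule / support("yogurt")       # share of yogurt baskets that also hold short cake
lift_rule <- conf_rule / support("short cake")   # confidence relative to the baseline rate

print(c(support = supp_rule, confidence = conf_rule, lift = lift_rule))
```

A lift above 1 suggests the two items appear together more often than they would if they were bought independently.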

Load the libraries to use.

library(arules)
Warning: package 'arules' was built under R version 4.2.3
Loading required package: Matrix
Warning: package 'Matrix' was built under R version 4.3.1

Attaching package: 'arules'
The following objects are masked from 'package:base':

    abbreviate, write
library(datasets)
# Load the Groceries data set (it ships with the arules package)
data("Groceries")

Let's explore the data before generating any rules.

# Create an item frequency plot of the top 20 items

itemFrequencyPlot(Groceries, topN = 20, type = "absolute")
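The same information can also be read as numbers rather than from the plot. A minimal sketch, assuming the `arules` package is installed: `itemFrequency()` returns the share of transactions that contain each item, and `inspect()` prints individual baskets. The variable name `item_freq` is only illustrative.

```r
library(arules)
data("Groceries")

# Relative frequency (single-item support) of every item, highest first
item_freq <- sort(itemFrequency(Groceries, type = "relative"), decreasing = TRUE)
print(head(item_freq, 5))

# Look at the contents of the first three baskets
inspect(Groceries[1:3])
```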

We set the minimum support to 0.001 and the minimum confidence to 0.8, then show the first 5 rules.

#Get the rules
rules <- apriori(Groceries, parameter = list(support = 0.001, conf = 0.8))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5   0.001      1
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 9 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
sorting and recoding items ... [157 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.01s].
writing ... [410 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
list(supp = 0.001, conf = 0.8)
$supp
[1] 0.001

$conf
[1] 0.8
# Show the first 5 rules, printing only 2 significant digits
options(digits = 2)
inspect(rules[1:5])
    lhs                         rhs            support confidence coverage lift count
[1] {liquor, red/blush wine} => {bottled beer} 0.0019  0.90       0.0021   11.2 19
[2] {curd, cereals}          => {whole milk}   0.0010  0.91       0.0011    3.6 10
[3] {yogurt, cereals}        => {whole milk}   0.0017  0.81       0.0021    3.2 17
[4] {butter, jam}            => {whole milk}   0.0010  0.83       0.0012    3.3 10
[5] {soups, bottled beer}    => {whole milk}   0.0011  0.92       0.0012    3.6 11
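The columns in this table are related to one another. As a rough check, assuming the same `apriori()` settings as above, the sketch below confirms that support is the count divided by the number of transactions, and that coverage (the support of the left-hand side) equals support divided by confidence. The names `q` and `n` are only illustrative.

```r
library(arules)
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(support = 0.001, conf = 0.8),
                 control = list(verbose = FALSE))

q <- quality(rules)      # data frame of support, confidence, coverage, lift, count
n <- length(Groceries)   # number of transactions

# support = count / n, e.g. 19 / 9835 is about 0.0019 for the first rule
print(all.equal(q$support, q$count / n))

# coverage = support / confidence
print(all.equal(q$coverage, q$support / q$confidence))
```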

Sort the rules by confidence to bring the most relevant rules to the top.

rules <- sort(rules, by="confidence", decreasing = TRUE)

inspect(rules[1:10])
     lhs                      rhs                support confidence coverage lift count
[1]  {rice,                                                                            
      sugar}               => {whole milk}        0.0012          1   0.0012  3.9    12
[2]  {canned fish,                                                                     
      hygiene articles}    => {whole milk}        0.0011          1   0.0011  3.9    11
[3]  {root vegetables,                                                                 
      butter,                                                                          
      rice}                => {whole milk}        0.0010          1   0.0010  3.9    10
[4]  {root vegetables,                                                                 
      whipped/sour cream,                                                              
      flour}               => {whole milk}        0.0017          1   0.0017  3.9    17
[5]  {butter,                                                                          
      soft cheese,                                                                     
      domestic eggs}       => {whole milk}        0.0010          1   0.0010  3.9    10
[6]  {citrus fruit,                                                                    
      root vegetables,                                                                 
      soft cheese}         => {other vegetables}  0.0010          1   0.0010  5.2    10
[7]  {pip fruit,                                                                       
      butter,                                                                          
      hygiene articles}    => {whole milk}        0.0010          1   0.0010  3.9    10
[8]  {root vegetables,                                                                 
      whipped/sour cream,                                                              
      hygiene articles}    => {whole milk}        0.0010          1   0.0010  3.9    10
[9]  {pip fruit,                                                                       
      root vegetables,                                                                 
      hygiene articles}    => {whole milk}        0.0010          1   0.0010  3.9    10
[10] {cream cheese ,                                                                   
      domestic eggs,                                                                   
      sugar}               => {whole milk}        0.0011          1   0.0011  3.9    11
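Confidence on its own tends to favour rules whose right-hand side is a very common item such as whole milk. One complementary view, sketched here under the same `apriori()` settings, is to sort by lift instead. The name `rules_by_lift` is only illustrative.

```r
library(arules)
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(support = 0.001, conf = 0.8),
                 control = list(verbose = FALSE))

# Sort by lift (highest first) and look at the first five rules
rules_by_lift <- sort(rules, by = "lift", decreasing = TRUE)
inspect(rules_by_lift[1:5])
```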
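Some of these rules are long. Setting maxlen = 3 limits each rule to at most three items, which gives more concise rules.

rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.8, maxlen = 3))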
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.8    0.1    1 none FALSE            TRUE       5   0.001      1
 maxlen target  ext
      3  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 9 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
sorting and recoding items ... [157 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3
Warning in apriori(Groceries, parameter = list(supp = 0.001, conf = 0.8, :
Mining stopped (maxlen reached). Only patterns up to a length of 3 returned!
 done [0.00s].
writing ... [29 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

Remove the redundant rules that were generated.

# Use a dense logical matrix so the lower triangle can be cleared safely
# (assigning NA into the default sparse matrix is coerced to TRUE)
subset.matrix <- is.subset(rules, rules, sparse = FALSE)
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- FALSE
# A rule is flagged when an earlier rule is a subset of it
redundant <- colSums(subset.matrix) >= 1
rules.pruned <- rules[!redundant]
rules <- rules.pruned
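As an alternative to the subset-matrix approach, `arules` also provides `is.redundant()`. Note that it applies its own definition (roughly, a rule is redundant when a more general rule with the same right-hand side has equal or higher confidence), so it may not flag exactly the same rules. A minimal sketch, with `rules_pruned` as an illustrative name:

```r
library(arules)
data("Groceries")

# Same settings as above; apriori() warns that maxlen was reached, which is expected
rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.8, maxlen = 3),
                 control = list(verbose = FALSE))

# Logical vector, one entry per rule
redundant <- is.redundant(rules)
rules_pruned <- rules[!redundant]

print(c(before = length(rules), after = length(rules_pruned)))
```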

Targeting items:

  1. What are customers likely to buy before buying whole milk?

  2. What are customers likely to buy if they purchase whole milk?

::: {.cell}

```{.r .cell-code}
# Restrict the right-hand side to "whole milk" to see what leads to it
rules <- apriori(data = Groceries,
                 parameter = list(supp = 0.001, conf = 0.8),
                 appearance = list(default = "lhs", rhs = "whole milk"),
                 control = list(verbose = FALSE))

rules <- sort(rules, decreasing = TRUE, by="confidence")
inspect(rules[1:5])
```

::: {.cell-output .cell-output-stdout}
```
    lhs                      rhs          support confidence coverage lift count
[1] {rice,                                                                      
     sugar}               => {whole milk}  0.0012          1   0.0012  3.9    12
[2] {canned fish,                                                               
     hygiene articles}    => {whole milk}  0.0011          1   0.0011  3.9    11
[3] {root vegetables,                                                           
     butter,                                                                    
     rice}                => {whole milk}  0.0010          1   0.0010  3.9    10
[4] {root vegetables,                                                           
     whipped/sour cream,                                                        
     flour}               => {whole milk}  0.0017          1   0.0017  3.9    17
[5] {butter,                                                                    
     soft cheese,                                                               
     domestic eggs}       => {whole milk}  0.0010          1   0.0010  3.9    10
```
:::
:::
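Instead of mining again with an `appearance` restriction, rules that have already been mined can be filtered with `subset()`. This is a sketch under the same support and confidence settings; `all_rules` and `milk_rules` are illustrative names.

```r
library(arules)
data("Groceries")

all_rules <- apriori(Groceries,
                     parameter = list(supp = 0.001, conf = 0.8),
                     control = list(verbose = FALSE))

# Keep only the rules whose right-hand side is whole milk
milk_rules <- subset(all_rules, subset = rhs %in% "whole milk")
milk_rules <- sort(milk_rules, by = "confidence", decreasing = TRUE)

inspect(milk_rules[1:5])
```

Filtering is convenient when several target items are of interest, since the mining step runs only once.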


Likewise, we can set the left-hand side to be "whole milk" and find its consequents, that is, what customers buy along with it. Note the following:

1.  We set the confidence to 0.15 since we get no rules with 0.8.
2.  We set a minimum length of 2 to avoid rules with an empty left-hand side.
rules <- apriori(data = Groceries,
                 parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                 appearance = list(default = "rhs", lhs = "whole milk"),
                 control = list(verbose = FALSE))

rules <- sort(rules, decreasing = TRUE, by = "confidence")

inspect(rules[1:5])
    lhs             rhs                support confidence coverage lift count
[1] {whole milk} => {other vegetables} 0.075   0.29       0.26     1.5  736  
[2] {whole milk} => {rolls/buns}       0.057   0.22       0.26     1.2  557  
[3] {whole milk} => {yogurt}           0.056   0.22       0.26     1.6  551  
[4] {whole milk} => {root vegetables}  0.049   0.19       0.26     1.8  481  
[5] {whole milk} => {tropical fruit}   0.042   0.17       0.26     1.6  416  
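The figures in the first row can be reproduced by counting baskets directly, which shows what each measure means. This is a sketch that relies on the `%in%` operator `arules` defines for transactions; `has_milk` and `has_veg` are illustrative names.

```r
library(arules)
data("Groceries")

n <- length(Groceries)                         # number of transactions
has_milk <- Groceries %in% "whole milk"        # TRUE where the basket holds whole milk
has_veg  <- Groceries %in% "other vegetables"  # TRUE where it holds other vegetables

both <- sum(has_milk & has_veg)                # the "count" column
supp <- both / n                               # support of the rule
conf <- both / sum(has_milk)                   # confidence: share of milk baskets with vegetables
lift <- conf / mean(has_veg)                   # confidence relative to the vegetable baseline

print(round(c(count = both, support = supp, confidence = conf, lift = lift), 3))
```

A lift of about 1.5 means that baskets with whole milk contain other vegetables roughly one and a half times as often as baskets in general.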

Visualization

Let's map the rules in a graph.

library(arulesViz)

# The interactive argument is deprecated; select the interactive engine instead
plot(rules, method = "graph", engine = "interactive")
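Graphs become hard to read when many rules are drawn at once. One option, sketched here on the assumption that `arulesViz` is installed, is to plot only a handful of rules chosen by lift. The default engine for the graph method differs between `arulesViz` versions, so the rendering may vary; `top_rules` is an illustrative name.

```r
library(arules)
library(arulesViz)
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.8),
                 control = list(verbose = FALSE))

# Keep the ten rules with the highest lift to keep the graph legible
top_rules <- sort(rules, by = "lift", decreasing = TRUE)[1:10]

# Static graph using whatever default engine the installed version provides
plot(top_rules, method = "graph")
```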