Introduction

In this project, we mine association rules with the Apriori algorithm on a market basket dataset from Kaggle. Market basket analysis uncovers relationships between products based on customer transactions. By identifying items that frequently occur together, we can inform product placement and marketing strategies, with the aim of increasing sales. The Apriori algorithm is a popular method for mining association rules, and we use it here to find meaningful associations between items in the dataset, which contains transactional data from a retail store.

Data Source

Libraries

library(arules)
library(arulesViz)
library(arulesCBA)
library(tidyverse)
library(readxl)
library(RColorBrewer)
library(gridExtra)
library(cowplot)
library(ggpubr)

EDA

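# Read the raw basket data from the Excel workbook
MBA <- read_excel("Data/MBA/market basket analysist.xlsx",
                        sheet = "Worksheet",
                        range = "A1:M1864")
# Save a CSV copy that read.transactions() can parse
write.csv(MBA, "Data/MBA/market basket analysis.csv")
MBA
# Parse each CSV row as one basket of items
Trans <- read.transactions("Data/MBA/market basket analysis.csv",
                           format = "basket", sep = ",", header = TRUE)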
summary(Trans)
## transactions as itemMatrix in sparse format with
##  1863 rows (elements/itemsets/transactions) and
##  1875 columns (items) and a density of 0.002464269 
## 
## most frequent items:
##  frozen meals        butter baking powder        coffee          fish 
##          1002           840           663           606           563 
##       (Other) 
##          4934 
## 
## element (itemset/transaction) length distribution:
## sizes
##   1   2   3   4   5   6   7   8 
##  14 210 256 360 421 391 173  38 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   3.000   5.000   4.621   6.000   8.000 
## 
## includes extended item information - examples:
##   labels
## 1      1
## 2     10
## 3    100
inspect(tail(Trans))
##     items               
## [1] {1858,              
##      baking powder,     
##      butter,            
##      frozen meals,      
##      frozen vegetables} 
## [2] {1859,              
##      baking powder,     
##      butter,            
##      frozen meals,      
##      frozen vegetables, 
##      grapes}            
## [3] {1860,              
##      abrasive cleaner,  
##      butter,            
##      coffee,            
##      fish,              
##      frozen vegetables, 
##      ice cream}         
## [4] {1861,              
##      coffee,            
##      fish,              
##      frozen meals,      
##      frozen vegetables, 
##      grapes}            
## [5] {1862,              
##      butter,            
##      fish,              
##      frozen meals,      
##      ice cream}         
## [6] {1863,              
##      baking powder,     
##      butter,            
##      cake bar,          
##      coffee,            
##      frozen meals,      
##      grapes}
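Each basket printed above contains a numeric label (1858, 1859, ...), and the summary reports 1,875 items for 1,863 transactions, while apriori later keeps only 12 items. This pattern is consistent with `write.csv()` having written the data frame's row names as a leading column, which `read.transactions()` then reads as one extra item per basket. The sketch below uses a small hypothetical data frame (`demo` and its column names are assumptions, not the real data) to show the difference that `row.names = FALSE` makes in the written file.

```r
# Hypothetical three-row stand-in for the MBA data frame
# (the column names item1/item2 are assumptions for illustration).
demo <- data.frame(item1 = c("coffee", "butter", "fish"),
                   item2 = c("frozen meals", "coffee", "butter"))

# Default write.csv(): row names are written as a leading column.
with_rows <- tempfile(fileext = ".csv")
write.csv(demo, with_rows)
lines_with_rows <- readLines(with_rows)

# row.names = FALSE writes only the item columns.
without_rows <- tempfile(fileext = ".csv")
write.csv(demo, without_rows, row.names = FALSE)
lines_without_rows <- readLines(without_rows)

print(lines_with_rows)     # first data row starts with the row index "1"
print(lines_without_rows)  # first data row starts with "coffee"
```

If the row index is indeed being counted as an item, the reported basket sizes (mean 4.621) would each be one larger than the number of real products bought.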
arules::itemFrequencyPlot(Trans,
   topN = 10,
   col = brewer.pal(10, 'BrBG'),
   main = 'Absolute Item Frequency Plot',
   type = "absolute",
   ylab = "Item Frequency (Absolute)",
   xlab = "Retail items"
)

Association Rules

Apriori Algorithm

MBA_AR <- apriori(Trans, parameter = list(supp = 0.01, conf = 0.75))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.75    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 18 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[1875 item(s), 1863 transaction(s)] done [0.00s].
## sorting and recoding items ... [12 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.00s].
## writing ... [148 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(MBA_AR[1:2])
##     lhs                       rhs                 support    confidence
## [1] {coffee}               => {frozen meals}      0.31186259 0.9587459 
## [2] {baking powder, honey} => {frozen vegetables} 0.01663983 0.7560976 
##     coverage   lift     count
## [1] 0.32528180 1.782578 581  
## [2] 0.02200751 2.603715  31
MBA_AR_data <- as(MBA_AR,"data.frame")
head(MBA_AR_data, 6)

SUPPORT

inspect(sort(MBA_AR, by = "support",decreasing = TRUE )[1:5])
##     lhs                              rhs            support   confidence
## [1] {coffee}                      => {frozen meals} 0.3118626 0.9587459 
## [2] {baking powder, frozen meals} => {butter}       0.1669351 0.7585366 
## [3] {butter, coffee}              => {frozen meals} 0.1556629 0.9324759 
## [4] {baking powder, coffee}       => {frozen meals} 0.1336554 0.9576923 
## [5] {coffee, fish}                => {frozen meals} 0.1143317 0.9424779 
##     coverage  lift     count
## [1] 0.3252818 1.782578 581  
## [2] 0.2200751 1.682326 311  
## [3] 0.1669351 1.733735 290  
## [4] 0.1395598 1.780620 249  
## [5] 0.1213097 1.752332 213

The rule {coffee} => {frozen meals} has the highest support: coffee and frozen meals appear together in 581 transactions, about 31.2% of the 1,863 transactions in the dataset. Its confidence of 0.96 means that roughly 96% of the transactions containing coffee also contain frozen meals.
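The metrics reported for {coffee} => {frozen meals} can be reproduced from counts that already appear in the output above: 1,863 transactions, 581 occurrences of the rule, and item frequencies of 606 for coffee and 1,002 for frozen meals. The following is a minimal base R sketch of the standard definitions; the variable names are chosen here for illustration.

```r
n_transactions <- 1863   # rows reported by summary(Trans)
count_rule     <- 581    # transactions with both coffee and frozen meals
count_coffee   <- 606    # transactions containing coffee (lhs)
count_frozen   <- 1002   # transactions containing frozen meals (rhs)

support    <- count_rule / n_transactions      # share of all baskets with lhs and rhs
coverage   <- count_coffee / n_transactions    # share of all baskets with the lhs
confidence <- count_rule / count_coffee        # share of lhs baskets that also have the rhs
lift       <- confidence / (count_frozen / n_transactions)  # confidence relative to rhs base rate

print(round(c(support = support, coverage = coverage,
              confidence = confidence, lift = lift), 4))
```

The support works out to roughly 0.31, that is, about 31% of all transactions.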

CONFIDENCE

inspect(sort(MBA_AR, by = "confidence", decreasing = FALSE )[1:5])
##     lhs                                         rhs                 support   
## [1] {butter, frozen vegetables, honey}       => {frozen meals}      0.01771337
## [2] {domestic eggs, frozen meals, ice cream} => {coffee}            0.01610306
## [3] {coffee, frozen vegetables, ice cream}   => {butter}            0.01449275
## [4] {butter, coffee, fish, ice cream}        => {frozen meals}      0.01288245
## [5] {baking powder, honey}                   => {frozen vegetables} 0.01663983
##     confidence coverage   lift     count
## [1] 0.7500000  0.02361782 1.394461 33   
## [2] 0.7500000  0.02147075 2.305693 30   
## [3] 0.7500000  0.01932367 1.663393 27   
## [4] 0.7500000  0.01717660 1.394461 24   
## [5] 0.7560976  0.02200751 2.603715 31

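Because the rules are sorted with decreasing = FALSE, the five rules above are those with the lowest confidence, at or just above the 0.75 threshold. For example, the rule {baking powder, honey} => {frozen vegetables} has a confidence of 0.7561, indicating that 75.61% of the transactions containing baking powder and honey also contain frozen vegetables.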
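The confidence values in this table can be checked from the count and coverage columns: coverage times the number of transactions gives the number of baskets containing the left-hand side, and confidence is the rule count divided by that number. The sketch below copies the five rows from the output above; the vector names are illustrative.

```r
n_transactions <- 1863

# count and coverage columns of the five rules shown above
rule_count <- c(33, 30, 27, 24, 31)
coverage   <- c(0.02361782, 0.02147075, 0.01932367, 0.01717660, 0.02200751)

# number of transactions that contain each rule's left-hand side
lhs_count <- round(coverage * n_transactions)

# confidence = transactions with lhs and rhs / transactions with lhs
confidence <- rule_count / lhs_count

print(data.frame(rule_count, lhs_count, confidence = round(confidence, 4)))
```

Four of the five rules sit exactly at the minimum confidence of 0.75 passed to apriori, which is why they appear first when sorting in increasing order.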

LIFT

inspect(sort(MBA_AR, by = "lift", decreasing = TRUE)[1:5])
##     lhs                    rhs                    support confidence   coverage     lift count
## [1] {baking powder,                                                                           
##      coffee,                                                                                  
##      honey}             => {frozen vegetables} 0.01234568  0.9200000 0.01341922 3.168133    23
## [2] {baking powder,                                                                           
##      butter,                                                                                  
##      coffee,                                                                                  
##      honey}             => {frozen vegetables} 0.01127214  0.9130435 0.01234568 3.144177    21
## [3] {baking powder,                                                                           
##      coffee,                                                                                  
##      frozen meals,                                                                            
##      honey}             => {frozen vegetables} 0.01127214  0.9130435 0.01234568 3.144177    21
## [4] {baking powder,                                                                           
##      butter,                                                                                  
##      coffee,                                                                                  
##      frozen meals,                                                                            
##      honey}             => {frozen vegetables} 0.01019860  0.9047619 0.01127214 3.115659    19
## [5] {abrasive cleaner,                                                                        
##      butter,                                                                                  
##      fish,                                                                                    
##      ice cream}         => {coffee}            0.01180891  1.0000000 0.01180891 3.074257    22

The association rule {baking powder, coffee, honey} => {frozen vegetables} has the highest lift, 3.17. This means that customers who buy baking powder, coffee, and honey are about 3.17 times as likely to buy frozen vegetables as customers in general.
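Lift is the rule's confidence divided by the overall support of its right-hand side, so the values reported above can be read as a comparison against a baseline purchase rate. The sketch below backs out the baseline implied for frozen vegetables from the reported confidence and lift, and cross-checks the definition on the fifth rule, whose right-hand side (coffee, 606 of 1,863 transactions) has its frequency printed in the summary. Variable names are illustrative.

```r
# Top rule: {baking powder, coffee, honey} => {frozen vegetables}
confidence <- 0.92       # reported confidence
lift       <- 3.168133   # reported lift

# lift = confidence / support(rhs), so the implied baseline share of
# transactions that contain frozen vegetables is:
baseline <- confidence / lift

# Cross-check on {abrasive cleaner, butter, fish, ice cream} => {coffee},
# which has confidence 1.0 and a known rhs frequency of 606 out of 1863.
lift_coffee <- 1.0 / (606 / 1863)

print(round(c(baseline = baseline, lift_coffee = lift_coffee), 4))
```

Under these figures, frozen vegetables appear in roughly 29% of all baskets but in 92% of baskets that contain baking powder, coffee, and honey.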

Visualizations

Relationship between support, confidence, and lift

The scatter plot below positions each rule by its support and confidence and shades it by its lift.

plot(MBA_AR, measure=c("support","confidence"), 
     shading="lift", engine = "plotly")

Visualizations of Items as a Consequent

goods <- unique(MBA$item1)[1:12]
goods_rules_list = list()
goods_rules_plots = list()
for (g in goods){
  # Mine rules whose consequent (rhs) is the current item
  goods_rules = apriori(data = Trans,
                        parameter = list(supp = 0.001, conf = 0.75),
                        appearance = list(default = "lhs", rhs = g),
                        control = list(verbose = FALSE))
  goods_rules_list[[g]] = sort(goods_rules, by="support", decreasing=TRUE)
  # Apply theme_bw() first so it does not override the title size set in theme()
  goods_rules_plots[[g]] = plot(head(goods_rules_list[[g]]), method="graph") + 
    labs(title = paste(g, "as a consequent item")) +
    theme_bw() +
    theme(plot.title = element_text(size=9))
}

ggarrange(plotlist = goods_rules_plots,
          common.legend = TRUE, ncol = 3)
## $`1`

## 
## $`2`

## 
## $`3`

## 
## $`4`

## 
## attr(,"class")
## [1] "list"      "ggarrange"

Visualizations of Items as Antecedents

goods <- unique(MBA$item1)[1:12]
goods_ant_rules_list = list()
goods_ant_rules_plots = list()
for (g in goods){
  # Mine rules whose antecedent (lhs) is the current item
  goods_rules = apriori(data = Trans,
                        parameter = list(supp = 0.01, conf = 0.075, minlen=2),
                        appearance = list(default = "rhs", lhs = g),
                        control = list(verbose = FALSE))
  goods_ant_rules_list[[g]] = sort(goods_rules, by="confidence", decreasing=TRUE)
  # Apply theme_bw() first so it does not override the title size set in theme()
  goods_ant_rules_plots[[g]] = plot(head(goods_ant_rules_list[[g]]), method="graph") +
    labs(title = paste(g, "as an antecedent item")) +
    theme_bw() +
    theme(plot.title = element_text(size=9))
}

ggarrange(plotlist = goods_ant_rules_plots,
          common.legend = TRUE, ncol = 3)
## $`1`

## 
## $`2`

## 
## $`3`

## 
## $`4`

## 
## attr(,"class")
## [1] "list"      "ggarrange"

The graphs above show the items that tend to be bought when the item named in each chart title is purchased. For instance, when a customer buys baking powder, there is also a high likelihood of buying butter, which is in line with expectations.

Conclusion

Association rules help uncover how products are related based on transactional data. They can inform stocking decisions in a retail shop or supermarket by showing how the purchase of one product changes the likelihood of purchasing another. Related products can then be displayed together to make purchases easier for customers.