Introduction

The paper reviews algorithms and methods used for market basket analysis. The data was first explored, then two methods, Eclat and Apriori, were used to identify association rules, and the results were visualized.

Data preparation

First, the necessary libraries are loaded and the dataset is imported.

#data loading
library(gridExtra)
library(grid)
library(ggplot2)
library(lattice)
library(arules)
data("Groceries")

Let’s have a look at the Groceries dataset, which ships with the arules library.

length(Groceries)
## [1] 9835
itemFrequency(Groceries, type="absolute")
##               frankfurter                   sausage                liver loaf 
##                       580                       924                        50 
##                       ham                      meat         finished products 
##                       256                       254                        64 
##           organic sausage                   chicken                    turkey 
##                        22                       422                        80 
##                      pork                      beef            hamburger meat 
##                       567                       516                       327 
##                      fish              citrus fruit            tropical fruit 
##                        29                       814                      1032 
##                 pip fruit                    grapes                   berries 
##                       744                       220                       327 
##               nuts/prunes           root vegetables                    onions 
##                        33                      1072                       305 
##                     herbs          other vegetables packaged fruit/vegetables 
##                       160                      1903                       128 
##                whole milk                    butter                      curd 
##                      2513                       545                       524 
##                   dessert               butter milk                    yogurt 
##                       365                       275                      1372 
##        whipped/sour cream                 beverages                  UHT-milk 
##                       705                       256                       329 
##            condensed milk                     cream               soft cheese 
##                       101                        13                       168 
##             sliced cheese               hard cheese             cream cheese  
##                       241                       241                       390 
##          processed cheese             spread cheese               curd cheese 
##                       163                       110                        50 
##          specialty cheese                mayonnaise            salad dressing 
##                        84                        90                         8 
##                   tidbits         frozen vegetables             frozen fruits 
##                        23                       473                        12 
##              frozen meals               frozen fish            frozen chicken 
##                       279                       115                         6 
##                 ice cream            frozen dessert    frozen potato products 
##                       246                       106                        83 
##             domestic eggs                rolls/buns               white bread 
##                       624                      1809                       414 
##               brown bread                    pastry            roll products  
##                       638                       875                       101 
##       semi-finished bread                  zwieback           potato products 
##                       174                        68                        28 
##                     flour                      salt                      rice 
##                       171                       106                        75 
##                     pasta                   vinegar                       oil 
##                       148                        64                       276 
##                 margarine             specialty fat                     sugar 
##                       576                        36                       333 
##          artif. sweetener                     honey                   mustard 
##                        32                        15                       118 
##                   ketchup                    spices                     soups 
##                        42                        51                        67 
##               ready soups     Instant food products                    sauces 
##                        18                        79                        54 
##                   cereals          organic products             baking powder 
##                        56                        16                       174 
##     preservation products            pudding powder         canned vegetables 
##                         2                        23                       106 
##              canned fruit        pickled vegetables      specialty vegetables 
##                        32                       176                        17 
##                       jam             sweet spreads              meat spreads 
##                        53                        89                        42 
##               canned fish                  dog food                  cat food 
##                       148                        84                       229 
##                  pet care                 baby food                    coffee 
##                        93                         1                       571 
##            instant coffee                       tea              cocoa drinks 
##                        73                        38                        22 
##             bottled water                      soda           misc. beverages 
##                      1087                      1715                       279 
##     fruit/vegetable juice                     syrup              bottled beer 
##                       711                        32                       792 
##               canned beer                    brandy                    whisky 
##                       764                        41                         8 
##                    liquor                       rum                   liqueur 
##                       109                        44                         9 
##        liquor (appetizer)                white wine            red/blush wine 
##                        78                       187                       189 
##                  prosecco            sparkling wine               salty snack 
##                        20                        55                       372 
##                   popcorn                 nut snack            snack products 
##                        71                        31                        30 
##  long life bakery product                   waffles                  cake bar 
##                       368                       378                       130 
##               chewing gum                 chocolate         cooking chocolate 
##                       207                       488                        25 
##       specialty chocolate             specialty bar     chocolate marshmallow 
##                       299                       269                        89 
##                     candy         seasonal products                 detergent 
##                       294                       140                       189 
##                  softener               decalcifier              dish cleaner 
##                        54                        15                       103 
##          abrasive cleaner                   cleaner            toilet cleaner 
##                        35                        50                         7 
##          bathroom cleaner                hair spray               dental care 
##                        27                        11                        57 
##            male cosmetics           make up remover                 skin care 
##                        45                         8                        35 
##  female sanitary products            baby cosmetics                      soap 
##                        60                         6                        26 
##           rubbing alcohol          hygiene articles                   napkins 
##                        10                       324                       515 
##                    dishes                  cookware           kitchen utensil 
##                       173                        27                         4 
##           cling film/bags            kitchen towels    house keeping products 
##                       112                        59                        82 
##                   candles               light bulbs      sound storage medium 
##                        88                        41                         1 
##                newspapers                photo/film                pot plants 
##                       785                        91                       170 
##    flower soil/fertilizer            flower (seeds)             shopping bags 
##                        19                       102                       969 
##                      bags 
##                         4

The table above shows the absolute frequency of each product. Raw counts are hard to compare, so the relative frequency of each product (its share of all transactions) is shown below.

round(itemFrequency(Groceries),3)
##               frankfurter                   sausage                liver loaf 
##                     0.059                     0.094                     0.005 
##                       ham                      meat         finished products 
##                     0.026                     0.026                     0.007 
##           organic sausage                   chicken                    turkey 
##                     0.002                     0.043                     0.008 
##                      pork                      beef            hamburger meat 
##                     0.058                     0.052                     0.033 
##                      fish              citrus fruit            tropical fruit 
##                     0.003                     0.083                     0.105 
##                 pip fruit                    grapes                   berries 
##                     0.076                     0.022                     0.033 
##               nuts/prunes           root vegetables                    onions 
##                     0.003                     0.109                     0.031 
##                     herbs          other vegetables packaged fruit/vegetables 
##                     0.016                     0.193                     0.013 
##                whole milk                    butter                      curd 
##                     0.256                     0.055                     0.053 
##                   dessert               butter milk                    yogurt 
##                     0.037                     0.028                     0.140 
##        whipped/sour cream                 beverages                  UHT-milk 
##                     0.072                     0.026                     0.033 
##            condensed milk                     cream               soft cheese 
##                     0.010                     0.001                     0.017 
##             sliced cheese               hard cheese             cream cheese  
##                     0.025                     0.025                     0.040 
##          processed cheese             spread cheese               curd cheese 
##                     0.017                     0.011                     0.005 
##          specialty cheese                mayonnaise            salad dressing 
##                     0.009                     0.009                     0.001 
##                   tidbits         frozen vegetables             frozen fruits 
##                     0.002                     0.048                     0.001 
##              frozen meals               frozen fish            frozen chicken 
##                     0.028                     0.012                     0.001 
##                 ice cream            frozen dessert    frozen potato products 
##                     0.025                     0.011                     0.008 
##             domestic eggs                rolls/buns               white bread 
##                     0.063                     0.184                     0.042 
##               brown bread                    pastry            roll products  
##                     0.065                     0.089                     0.010 
##       semi-finished bread                  zwieback           potato products 
##                     0.018                     0.007                     0.003 
##                     flour                      salt                      rice 
##                     0.017                     0.011                     0.008 
##                     pasta                   vinegar                       oil 
##                     0.015                     0.007                     0.028 
##                 margarine             specialty fat                     sugar 
##                     0.059                     0.004                     0.034 
##          artif. sweetener                     honey                   mustard 
##                     0.003                     0.002                     0.012 
##                   ketchup                    spices                     soups 
##                     0.004                     0.005                     0.007 
##               ready soups     Instant food products                    sauces 
##                     0.002                     0.008                     0.005 
##                   cereals          organic products             baking powder 
##                     0.006                     0.002                     0.018 
##     preservation products            pudding powder         canned vegetables 
##                     0.000                     0.002                     0.011 
##              canned fruit        pickled vegetables      specialty vegetables 
##                     0.003                     0.018                     0.002 
##                       jam             sweet spreads              meat spreads 
##                     0.005                     0.009                     0.004 
##               canned fish                  dog food                  cat food 
##                     0.015                     0.009                     0.023 
##                  pet care                 baby food                    coffee 
##                     0.009                     0.000                     0.058 
##            instant coffee                       tea              cocoa drinks 
##                     0.007                     0.004                     0.002 
##             bottled water                      soda           misc. beverages 
##                     0.111                     0.174                     0.028 
##     fruit/vegetable juice                     syrup              bottled beer 
##                     0.072                     0.003                     0.081 
##               canned beer                    brandy                    whisky 
##                     0.078                     0.004                     0.001 
##                    liquor                       rum                   liqueur 
##                     0.011                     0.004                     0.001 
##        liquor (appetizer)                white wine            red/blush wine 
##                     0.008                     0.019                     0.019 
##                  prosecco            sparkling wine               salty snack 
##                     0.002                     0.006                     0.038 
##                   popcorn                 nut snack            snack products 
##                     0.007                     0.003                     0.003 
##  long life bakery product                   waffles                  cake bar 
##                     0.037                     0.038                     0.013 
##               chewing gum                 chocolate         cooking chocolate 
##                     0.021                     0.050                     0.003 
##       specialty chocolate             specialty bar     chocolate marshmallow 
##                     0.030                     0.027                     0.009 
##                     candy         seasonal products                 detergent 
##                     0.030                     0.014                     0.019 
##                  softener               decalcifier              dish cleaner 
##                     0.005                     0.002                     0.010 
##          abrasive cleaner                   cleaner            toilet cleaner 
##                     0.004                     0.005                     0.001 
##          bathroom cleaner                hair spray               dental care 
##                     0.003                     0.001                     0.006 
##            male cosmetics           make up remover                 skin care 
##                     0.005                     0.001                     0.004 
##  female sanitary products            baby cosmetics                      soap 
##                     0.006                     0.001                     0.003 
##           rubbing alcohol          hygiene articles                   napkins 
##                     0.001                     0.033                     0.052 
##                    dishes                  cookware           kitchen utensil 
##                     0.018                     0.003                     0.000 
##           cling film/bags            kitchen towels    house keeping products 
##                     0.011                     0.006                     0.008 
##                   candles               light bulbs      sound storage medium 
##                     0.009                     0.004                     0.000 
##                newspapers                photo/film                pot plants 
##                     0.080                     0.009                     0.017 
##    flower soil/fertilizer            flower (seeds)             shopping bags 
##                     0.002                     0.010                     0.099 
##                      bags 
##                     0.000

It is still hard to pick out the most frequent products from such a table, so the 25 most common products are shown in the plots below.

itemFrequencyPlot(Groceries, topN=25, type="relative", main="Item Frequency") 

itemFrequencyPlot(Groceries, topN=25, type="absolute", main="Item Frequency") 


The first plot shows the relative frequency of each product, while the second shows absolute counts. Whole milk, other vegetables and rolls/buns are the three most commonly bought products.
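The same ranking can also be read from a sorted table rather than a plot. The sketch below is a minimal base R illustration: it copies six absolute counts from the output above into a vector (the variable names are illustrative and not part of the original analysis) and converts them to relative frequencies by dividing by the 9835 transactions.

```r
# Absolute counts copied from the itemFrequency() output above
# (a hand-picked subset, used here only for illustration).
item_counts <- c(
  "soda"             = 1715,
  "whole milk"       = 2513,
  "bottled water"    = 1087,
  "rolls/buns"       = 1809,
  "yogurt"           = 1372,
  "other vegetables" = 1903
)
n_transactions <- 9835  # length(Groceries)

# Relative frequency = absolute count / number of transactions
item_share <- round(item_counts / n_transactions, 3)

# Sort in decreasing order so the most frequent item comes first
top_items <- sort(item_share, decreasing = TRUE)
print(top_items)
```

With arules loaded, the full ranking should be obtainable in one step with sort(itemFrequency(Groceries), decreasing = TRUE).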

Association rules analysis

Most association rule analyses rely on four fundamental measures: support, confidence, expected confidence and lift. For a rule A => B, support is the ratio of the number of transactions containing both products to the total number of transactions in the analysed database. The higher the value, the more frequently the products appear together. Confidence is the ratio of the number of transactions containing both products to the number of transactions containing A, the left-hand side of the rule. Expected confidence serves as a benchmark: it is the share of all transactions that contain B, the right-hand side, regardless of A. Finally, lift is the ratio of confidence to expected confidence. In other words, it shows whether the products occur together more often (lift above 1) than they would if they were independent.
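To make the four measures concrete, here is a minimal base R sketch on five hypothetical toy baskets. The baskets, the helper function contains() and the rule {bread} => {milk} are invented for illustration and are not taken from the Groceries data. Confidence is computed relative to the transactions containing the left-hand side of the rule.

```r
# Hypothetical toy baskets (not taken from the Groceries data)
baskets <- list(
  c("milk", "bread"),
  c("milk", "bread", "butter"),
  c("milk"),
  c("bread"),
  c("butter")
)
n <- length(baskets)

# Helper: flags the baskets that contain all of the given items
contains <- function(items) {
  vapply(baskets, function(b) all(items %in% b), logical(1))
}

# Rule under study: {bread} => {milk}
n_both <- sum(contains(c("bread", "milk")))

support             <- n_both / n                       # both items, over all baskets
confidence          <- n_both / sum(contains("bread"))  # both items, over baskets with the left-hand side
expected_confidence <- sum(contains("milk")) / n        # right-hand side alone, over all baskets
lift                <- confidence / expected_confidence

print(c(support = support, confidence = confidence,
        expected_confidence = expected_confidence, lift = lift))
```

A lift above 1 in this toy example would mean that bread buyers pick up milk more often than the average customer does.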

Firstly, the Eclat algorithm was used to find the most frequent sets of products.

#eclat algorithm
Groceries_eclat<-eclat(Groceries, parameter=list(supp=0.03, minlen=2, maxlen=5)) 
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.03      2      5 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 295 
## 
## create itemset ... 
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [44 item(s)] done [0.00s].
## creating sparse bit matrix ... [44 row(s), 9835 column(s)] done [0.00s].
## writing  ... [19 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].
inspect(Groceries_eclat) 
##      items                               support    count
## [1]  {whole milk, whipped/sour cream}    0.03223183 317  
## [2]  {pip fruit, whole milk}             0.03009659 296  
## [3]  {whole milk, pastry}                0.03324860 327  
## [4]  {citrus fruit, whole milk}          0.03050330 300  
## [5]  {sausage, rolls/buns}               0.03060498 301  
## [6]  {whole milk, bottled water}         0.03436706 338  
## [7]  {tropical fruit, whole milk}        0.04229792 416  
## [8]  {tropical fruit, other vegetables}  0.03589222 353  
## [9]  {root vegetables, whole milk}       0.04890696 481  
## [10] {root vegetables, other vegetables} 0.04738180 466  
## [11] {whole milk, soda}                  0.04006101 394  
## [12] {other vegetables, soda}            0.03274021 322  
## [13] {rolls/buns, soda}                  0.03833249 377  
## [14] {whole milk, yogurt}                0.05602440 551  
## [15] {other vegetables, yogurt}          0.04341637 427  
## [16] {yogurt, rolls/buns}                0.03436706 338  
## [17] {whole milk, rolls/buns}            0.05663447 557  
## [18] {other vegetables, rolls/buns}      0.04260295 419  
## [19] {other vegetables, whole milk}      0.07483477 736

The parameters were set to 0.03 for minimum support, 2 for minimum length and 5 for maximum length. This means that only itemsets of 2 to 5 products with support of at least 0.03 are returned. Out of all sets of products in the database, 19 meet these requirements, and all of them are pairs. Additionally, one can try to derive rules from the itemsets found. Below are the 5 rules with confidence of at least 0.4.

#rules
rules_eclat<-ruleInduction(Groceries_eclat, Groceries, confidence=0.4) 
rules_eclat
## set of 5 rules
inspect(rules_eclat)
##     lhs                     rhs                support    confidence lift    
## [1] {whipped/sour cream} => {whole milk}       0.03223183 0.4496454  1.759754
## [2] {tropical fruit}     => {whole milk}       0.04229792 0.4031008  1.577595
## [3] {root vegetables}    => {whole milk}       0.04890696 0.4486940  1.756031
## [4] {root vegetables}    => {other vegetables} 0.04738180 0.4347015  2.246605
## [5] {yogurt}             => {whole milk}       0.05602440 0.4016035  1.571735
##     itemset
## [1]  1     
## [2]  7     
## [3]  9     
## [4] 10     
## [5] 14
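Each frequent pair can produce a rule in either direction, but only directions that reach the confidence threshold are kept. The sketch below checks this by hand for the itemset {root vegetables, whole milk}, using counts copied from the outputs above. The variable names are illustrative, and the calculation is a manual cross-check rather than a description of how ruleInduction() works internally.

```r
# Counts copied from the outputs above
n_both       <- 481   # transactions with {root vegetables, whole milk}
n_root_veg   <- 1072  # transactions with root vegetables
n_whole_milk <- 2513  # transactions with whole milk
min_conf     <- 0.4   # confidence threshold used in the analysis

# Confidence = transactions with both items / transactions with the left-hand side
conf_root_to_milk <- n_both / n_root_veg    # {root vegetables} => {whole milk}
conf_milk_to_root <- n_both / n_whole_milk  # {whole milk} => {root vegetables}

# Only the first direction reaches the threshold
print(conf_root_to_milk >= min_conf)
print(conf_milk_to_root >= min_conf)
```

Because whole milk appears in so many transactions, rules with whole milk on the left-hand side tend to have low confidence, which is consistent with whole milk showing up only on the right-hand side of the rules above.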

The highest support is observed for the pair yogurt and whole milk: both products were present together in about 5.6% of all transactions. All five rules have lift above 1, and the rule from root vegetables to other vegetables has the highest lift, 2.25. This means that root vegetables are bought together with other vegetables 2.25 times more often than would be expected if these products were independent.
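The lift of 2.25 can be reproduced by hand from counts reported earlier. The following base R sketch does so with illustrative variable names. It also shows the equivalent reading of lift as observed joint support divided by the joint support expected under independence.

```r
# Counts copied from the outputs above
n_transactions <- 9835
n_both         <- 466   # transactions with {root vegetables, other vegetables}
n_root_veg     <- 1072  # transactions with root vegetables
n_other_veg    <- 1903  # transactions with other vegetables

# Lift as confidence divided by expected confidence
confidence          <- n_both / n_root_veg
expected_confidence <- n_other_veg / n_transactions
lift                <- confidence / expected_confidence

# Equivalent view: observed joint support vs. joint support under independence
support_observed    <- n_both / n_transactions
support_independent <- (n_root_veg / n_transactions) * (n_other_veg / n_transactions)
lift_alt            <- support_observed / support_independent

print(round(c(lift = lift, lift_alt = lift_alt), 2))
```

Both formulations give the same value, which matches the lift reported for this rule.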

The second algorithm used to determine rules was Apriori. To make the rules easier to assess, they were sorted by lift and by support.

#apriori

Groceries_apriori<-apriori(Groceries, parameter=list(supp=0.03, conf=0.4))  
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.4    0.1    1 none FALSE            TRUE       5    0.03      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 295 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [44 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [5 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules_apriori_lift<-sort(Groceries_apriori, by="lift", decreasing=TRUE)
inspect(rules_apriori_lift)
##     lhs                     rhs                support    confidence coverage  
## [1] {root vegetables}    => {other vegetables} 0.04738180 0.4347015  0.10899847
## [2] {whipped/sour cream} => {whole milk}       0.03223183 0.4496454  0.07168277
## [3] {root vegetables}    => {whole milk}       0.04890696 0.4486940  0.10899847
## [4] {tropical fruit}     => {whole milk}       0.04229792 0.4031008  0.10493137
## [5] {yogurt}             => {whole milk}       0.05602440 0.4016035  0.13950178
##     lift     count
## [1] 2.246605 466  
## [2] 1.759754 317  
## [3] 1.756031 481  
## [4] 1.577595 416  
## [5] 1.571735 551
rules_apriori_support<-sort(Groceries_apriori, by="support", decreasing=TRUE) 
inspect(rules_apriori_support)
##     lhs                     rhs                support    confidence coverage  
## [1] {yogurt}             => {whole milk}       0.05602440 0.4016035  0.13950178
## [2] {root vegetables}    => {whole milk}       0.04890696 0.4486940  0.10899847
## [3] {root vegetables}    => {other vegetables} 0.04738180 0.4347015  0.10899847
## [4] {tropical fruit}     => {whole milk}       0.04229792 0.4031008  0.10493137
## [5] {whipped/sour cream} => {whole milk}       0.03223183 0.4496454  0.07168277
##     lift     count
## [1] 1.571735 551  
## [2] 1.756031 481  
## [3] 2.246605 466  
## [4] 1.577595 416  
## [5] 1.759754 317

Again, root vegetables with other vegetables have the highest lift, while yogurt and whole milk have the highest support. Both algorithms return the same five rules, and the results are in line with intuition: yogurt and whipped/sour cream are often bought with whole milk, and root vegetables are often bought with other vegetables. It is worth mentioning that 4 out of 5 rules contain whole milk, which was earlier identified as the most frequent product in the whole database.

Visualization

library(arulesViz)

plot(rules_eclat, method="graph")

plot(rules_eclat, method="paracoord", control=list(reorder=TRUE))

plot(rules_eclat, method="graph",  engine="htmlwidget")

As only 5 rules were identified, the plots are fairly sparse and do not show the full potential of these visualizations. The most informative plot is the last one, which is also interactive: one can select a product or a rule and inspect its connections with other items in isolation.

Summary

The Groceries dataset from the arules library was analysed. The chosen thresholds (support 0.03 and confidence 0.4) yielded 5 rules. Whole milk appeared in these rules most frequently. However, this was most likely because whole milk was the most frequent product overall: it appeared in over 25% of all transactions.