Market Basket Analysis

The aim of this paper is to apply Market Basket Analysis (MBA), an association rule mining technique, to a case study of grocery shop retail data. This unsupervised learning method reveals patterns in which some items (products) are linked with others. An “if-then” rule describes the typical behaviour of the decision-maker: for instance, if a customer buys bread, then he/she is, with some likelihood, also going to buy butter. This example is hardly “rocket science”; however, MBA also gives deeper insight into less obvious dependencies between purchased products. MBA is therefore a suitable way to find the strongest dependencies in purchasing behaviour.

Data

The data come from https://www.kaggle.com/irfanasrullah/groceries and, loaded as a data frame, have 9834 rows and 32 columns. The dataset contains transactions, with the products purchased in each transaction.

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Attaching package: 'arules'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)
library(arulesCBA)
## 
## Attaching package: 'arulesCBA'
## The following object is masked from 'package:arules':
## 
##     rules
setwd("C:/Users/Mateusz/Documents/Studia/Master/UW/DS WNE/Unsupervised Learning/projekt UL")
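# Note: header=TRUE turns the first transaction into column names, so dim() reports 9834 rows;
# read.transactions() further below reads all 9835 transactions.
gro <- read.csv("gro_csv.csv", sep = ",", header=TRUE, skip=1)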
dim(gro)
## [1] 9834   32
head(gro)
##       citrus.fruit semi.finished.bread      margarine              ready.soups
## 1   tropical fruit              yogurt         coffee                         
## 2       whole milk                                                            
## 3        pip fruit              yogurt   cream cheese             meat spreads
## 4 other vegetables          whole milk condensed milk long life bakery product
## 5       whole milk              butter         yogurt                     rice
## 6       rolls/buns                                                            
##                  X X.1 X.2 X.3 X.4 X.5 X.6 X.7 X.8 X.9 X.10 X.11 X.12 X.13 X.14
## 1                                                                              
## 2                                                                              
## 3                                                                              
## 4                                                                              
## 5 abrasive cleaner                                                             
## 6                                                                              
##   X.15 X.16 X.17 X.18 X.19 X.20 X.21 X.22 X.23 X.24 X.25 X.26 X.27
## 1                                                                 
## 2                                                                 
## 3                                                                 
## 4                                                                 
## 5                                                                 
## 6
summary(gro)
##  citrus.fruit       semi.finished.bread  margarine         ready.soups       
##  Length:9834        Length:9834         Length:9834        Length:9834       
##  Class :character   Class :character    Class :character   Class :character  
##  Mode  :character   Mode  :character    Mode  :character   Mode  :character  
##       X                 X.1                X.2                X.3           
##  Length:9834        Length:9834        Length:9834        Length:9834       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##      X.4                X.5                X.6                X.7           
##  Length:9834        Length:9834        Length:9834        Length:9834       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##      X.8                X.9                X.10               X.11          
##  Length:9834        Length:9834        Length:9834        Length:9834       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##      X.12               X.13               X.14               X.15          
##  Length:9834        Length:9834        Length:9834        Length:9834       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##      X.16               X.17               X.18               X.19          
##  Length:9834        Length:9834        Length:9834        Length:9834       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##      X.20               X.21               X.22               X.23          
##  Length:9834        Length:9834        Length:9834        Length:9834       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##      X.24               X.25               X.26               X.27          
##  Length:9834        Length:9834        Length:9834        Length:9834       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
trans <- read.transactions("gro_csv.csv", format = "basket", sep=",", skip=1)
summary(trans)
## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46 
##   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics
round(itemFrequency(trans),3)
##          abrasive cleaner          artif. sweetener            baby cosmetics 
##                     0.004                     0.003                     0.001 
##                 baby food                      bags             baking powder 
##                     0.000                     0.000                     0.018 
##          bathroom cleaner                      beef                   berries 
##                     0.003                     0.052                     0.033 
##                 beverages              bottled beer             bottled water 
##                     0.026                     0.081                     0.111 
##                    brandy               brown bread                    butter 
##                     0.004                     0.065                     0.055 
##               butter milk                  cake bar                   candles 
##                     0.028                     0.013                     0.009 
##                     candy               canned beer               canned fish 
##                     0.030                     0.078                     0.015 
##              canned fruit         canned vegetables                  cat food 
##                     0.003                     0.011                     0.023 
##                   cereals               chewing gum                   chicken 
##                     0.006                     0.021                     0.043 
##                 chocolate     chocolate marshmallow              citrus fruit 
##                     0.050                     0.009                     0.083 
##                   cleaner           cling film/bags              cocoa drinks 
##                     0.005                     0.011                     0.002 
##                    coffee            condensed milk         cooking chocolate 
##                     0.058                     0.010                     0.003 
##                  cookware                     cream              cream cheese 
##                     0.003                     0.001                     0.040 
##                      curd               curd cheese               decalcifier 
##                     0.053                     0.005                     0.002 
##               dental care                   dessert                 detergent 
##                     0.006                     0.037                     0.019 
##              dish cleaner                    dishes                  dog food 
##                     0.010                     0.018                     0.009 
##             domestic eggs  female sanitary products         finished products 
##                     0.063                     0.006                     0.007 
##                      fish                     flour            flower (seeds) 
##                     0.003                     0.017                     0.010 
##    flower soil/fertilizer               frankfurter            frozen chicken 
##                     0.002                     0.059                     0.001 
##            frozen dessert               frozen fish             frozen fruits 
##                     0.011                     0.012                     0.001 
##              frozen meals    frozen potato products         frozen vegetables 
##                     0.028                     0.008                     0.048 
##     fruit/vegetable juice                    grapes                hair spray 
##                     0.072                     0.022                     0.001 
##                       ham            hamburger meat               hard cheese 
##                     0.026                     0.033                     0.025 
##                     herbs                     honey    house keeping products 
##                     0.016                     0.002                     0.008 
##          hygiene articles                 ice cream            instant coffee 
##                     0.033                     0.025                     0.007 
##     Instant food products                       jam                   ketchup 
##                     0.008                     0.005                     0.004 
##            kitchen towels           kitchen utensil               light bulbs 
##                     0.006                     0.000                     0.004 
##                   liqueur                    liquor        liquor (appetizer) 
##                     0.001                     0.011                     0.008 
##                liver loaf  long life bakery product           make up remover 
##                     0.005                     0.037                     0.001 
##            male cosmetics                 margarine                mayonnaise 
##                     0.005                     0.059                     0.009 
##                      meat              meat spreads           misc. beverages 
##                     0.026                     0.004                     0.028 
##                   mustard                   napkins                newspapers 
##                     0.012                     0.052                     0.080 
##                 nut snack               nuts/prunes                       oil 
##                     0.003                     0.003                     0.028 
##                    onions          organic products           organic sausage 
##                     0.031                     0.002                     0.002 
##          other vegetables packaged fruit/vegetables                     pasta 
##                     0.193                     0.013                     0.015 
##                    pastry                  pet care                photo/film 
##                     0.089                     0.009                     0.009 
##        pickled vegetables                 pip fruit                   popcorn 
##                     0.018                     0.076                     0.007 
##                      pork           potato products             potted plants 
##                     0.058                     0.003                     0.017 
##     preservation products          processed cheese                  prosecco 
##                     0.000                     0.017                     0.002 
##            pudding powder               ready soups            red/blush wine 
##                     0.002                     0.002                     0.019 
##                      rice             roll products                rolls/buns 
##                     0.008                     0.010                     0.184 
##           root vegetables           rubbing alcohol                       rum 
##                     0.109                     0.001                     0.004 
##            salad dressing                      salt               salty snack 
##                     0.001                     0.011                     0.038 
##                    sauces                   sausage         seasonal products 
##                     0.005                     0.094                     0.014 
##       semi-finished bread             shopping bags                 skin care 
##                     0.018                     0.099                     0.004 
##             sliced cheese            snack products                      soap 
##                     0.025                     0.003                     0.003 
##                      soda               soft cheese                  softener 
##                     0.174                     0.017                     0.005 
##      sound storage medium                     soups            sparkling wine 
##                     0.000                     0.007                     0.006 
##             specialty bar          specialty cheese       specialty chocolate 
##                     0.027                     0.009                     0.030 
##             specialty fat      specialty vegetables                    spices 
##                     0.004                     0.002                     0.005 
##             spread cheese                     sugar             sweet spreads 
##                     0.011                     0.034                     0.009 
##                     syrup                       tea                   tidbits 
##                     0.003                     0.004                     0.002 
##            toilet cleaner            tropical fruit                    turkey 
##                     0.001                     0.105                     0.008 
##                  UHT-milk                   vinegar                   waffles 
##                     0.033                     0.007                     0.038 
##        whipped/sour cream                    whisky               white bread 
##                     0.072                     0.001                     0.042 
##                white wine                whole milk                    yogurt 
##                     0.019                     0.256                     0.140 
##                  zwieback 
##                     0.007
itemFrequency(trans, type="absolute")
##          abrasive cleaner          artif. sweetener            baby cosmetics 
##                        35                        32                         6 
##                 baby food                      bags             baking powder 
##                         1                         4                       174 
##          bathroom cleaner                      beef                   berries 
##                        27                       516                       327 
##                 beverages              bottled beer             bottled water 
##                       256                       792                      1087 
##                    brandy               brown bread                    butter 
##                        41                       638                       545 
##               butter milk                  cake bar                   candles 
##                       275                       130                        88 
##                     candy               canned beer               canned fish 
##                       294                       764                       148 
##              canned fruit         canned vegetables                  cat food 
##                        32                       106                       229 
##                   cereals               chewing gum                   chicken 
##                        56                       207                       422 
##                 chocolate     chocolate marshmallow              citrus fruit 
##                       488                        89                       814 
##                   cleaner           cling film/bags              cocoa drinks 
##                        50                       112                        22 
##                    coffee            condensed milk         cooking chocolate 
##                       571                       101                        25 
##                  cookware                     cream              cream cheese 
##                        27                        13                       390 
##                      curd               curd cheese               decalcifier 
##                       524                        50                        15 
##               dental care                   dessert                 detergent 
##                        57                       365                       189 
##              dish cleaner                    dishes                  dog food 
##                       103                       173                        84 
##             domestic eggs  female sanitary products         finished products 
##                       624                        60                        64 
##                      fish                     flour            flower (seeds) 
##                        29                       171                       102 
##    flower soil/fertilizer               frankfurter            frozen chicken 
##                        19                       580                         6 
##            frozen dessert               frozen fish             frozen fruits 
##                       106                       115                        12 
##              frozen meals    frozen potato products         frozen vegetables 
##                       279                        83                       473 
##     fruit/vegetable juice                    grapes                hair spray 
##                       711                       220                        11 
##                       ham            hamburger meat               hard cheese 
##                       256                       327                       241 
##                     herbs                     honey    house keeping products 
##                       160                        15                        82 
##          hygiene articles                 ice cream            instant coffee 
##                       324                       246                        73 
##     Instant food products                       jam                   ketchup 
##                        79                        53                        42 
##            kitchen towels           kitchen utensil               light bulbs 
##                        59                         4                        41 
##                   liqueur                    liquor        liquor (appetizer) 
##                         9                       109                        78 
##                liver loaf  long life bakery product           make up remover 
##                        50                       368                         8 
##            male cosmetics                 margarine                mayonnaise 
##                        45                       576                        90 
##                      meat              meat spreads           misc. beverages 
##                       254                        42                       279 
##                   mustard                   napkins                newspapers 
##                       118                       515                       785 
##                 nut snack               nuts/prunes                       oil 
##                        31                        33                       276 
##                    onions          organic products           organic sausage 
##                       305                        16                        22 
##          other vegetables packaged fruit/vegetables                     pasta 
##                      1903                       128                       148 
##                    pastry                  pet care                photo/film 
##                       875                        93                        91 
##        pickled vegetables                 pip fruit                   popcorn 
##                       176                       744                        71 
##                      pork           potato products             potted plants 
##                       567                        28                       170 
##     preservation products          processed cheese                  prosecco 
##                         2                       163                        20 
##            pudding powder               ready soups            red/blush wine 
##                        23                        18                       189 
##                      rice             roll products                rolls/buns 
##                        75                       101                      1809 
##           root vegetables           rubbing alcohol                       rum 
##                      1072                        10                        44 
##            salad dressing                      salt               salty snack 
##                         8                       106                       372 
##                    sauces                   sausage         seasonal products 
##                        54                       924                       140 
##       semi-finished bread             shopping bags                 skin care 
##                       174                       969                        35 
##             sliced cheese            snack products                      soap 
##                       241                        30                        26 
##                      soda               soft cheese                  softener 
##                      1715                       168                        54 
##      sound storage medium                     soups            sparkling wine 
##                         1                        67                        55 
##             specialty bar          specialty cheese       specialty chocolate 
##                       269                        84                       299 
##             specialty fat      specialty vegetables                    spices 
##                        36                        17                        51 
##             spread cheese                     sugar             sweet spreads 
##                       110                       333                        89 
##                     syrup                       tea                   tidbits 
##                        32                        38                        23 
##            toilet cleaner            tropical fruit                    turkey 
##                         7                      1032                        80 
##                  UHT-milk                   vinegar                   waffles 
##                       329                        64                       378 
##        whipped/sour cream                    whisky               white bread 
##                       705                         8                       414 
##                white wine                whole milk                    yogurt 
##                       187                      2513                      1372 
##                  zwieback 
##                        68

The two tables above present the relative and the absolute purchase frequency of each product in the dataset.
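
The link between the two tables can be checked by hand. The sketch below is only an illustration in base R: it copies the five most frequent items and the "(Other)" count from the summary(trans) output, and the variable names are chosen here, not taken from the analysis.

```r
# Figures copied from the summary(trans) and itemFrequency() output above.
n_transactions <- 9835
n_items <- 169
absolute <- c("whole milk" = 2513, "other vegetables" = 1903,
              "rolls/buns" = 1809, "soda" = 1715, "yogurt" = 1372)

# Relative frequency = absolute count / number of transactions.
relative <- round(absolute / n_transactions, 3)
print(relative)

# Total item occurrences: the five items above plus the "(Other)" count.
total_occurrences <- sum(absolute) + 34055

# Density = item occurrences / (transactions * distinct items).
density <- total_occurrences / (n_transactions * n_items)

# Mean basket size = item occurrences / transactions.
mean_basket_size <- total_occurrences / n_transactions

print(c(density = density, mean_basket_size = mean_basket_size))
```

The results agree with the reported density of 0.02609146 and the mean transaction length of 4.409.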

itemFrequencyPlot(trans, topN=25, type="relative", main="Item Frequency")

The chart above shows, for the 25 most frequently purchased products, the share of all transactions in which each product appears.

rules.trans <- apriori(trans, parameter=list(supp=0.01, conf=0.25))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.25    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 98 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [171 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
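
The two thresholds passed to apriori() can be read as simple filters. The sketch below, in base R, shows how the minimum support of 0.01 translates into the "Absolute minimum support count: 98" line of the log. The use of floor() is an assumption that happens to reproduce the logged value, and passes_thresholds() is a hypothetical helper written for this illustration, not a function from arules.

```r
# Thresholds used in the apriori() call above.
n_transactions <- 9835
min_support <- 0.01
min_confidence <- 0.25

# 0.01 * 9835 = 98.35; truncating gives the 98 reported in the log
# (assumption: the package truncates rather than rounds up).
min_count <- floor(min_support * n_transactions)
print(min_count)

# Hypothetical helper: does a rule with these measures pass both thresholds?
passes_thresholds <- function(support, confidence) {
  support >= min_support & confidence >= min_confidence
}

# {curd, yogurt} => {whole milk}, values taken from the confidence table
# later in the document.
print(passes_thresholds(support = 0.01006609, confidence = 0.5823529))
```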

Rules

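Understanding the measures attached to each rule is crucial for handling and interpreting the results of Market Basket Analysis correctly. For a rule lhs => rhs, the following measures are reported below: support is the share of all transactions that contain every item of the rule; confidence is the share of transactions containing lhs that also contain rhs; lift is the confidence divided by the share of all transactions containing rhs; count is the number of transactions that contain every item of the rule; and coverage is the support of lhs alone.

In general, the higher the value of each measure, the stronger the rule; a lift above 1 means that the items appear together more often than would be expected if they were independent.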
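
For reference, the measures reported in the tables below can be written in symbols. The notation is an assumption introduced here for illustration: T is the set of transactions, X is the left-hand side (lhs) and Y the right-hand side (rhs) of a rule.

```latex
\begin{align*}
\operatorname{supp}(X) &= \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert} \\
\operatorname{supp}(X \Rightarrow Y) &= \operatorname{supp}(X \cup Y) \\
\operatorname{count}(X \Rightarrow Y) &= \lvert T \rvert \cdot \operatorname{supp}(X \cup Y) \\
\operatorname{coverage}(X \Rightarrow Y) &= \operatorname{supp}(X) \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
  = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}
\end{align*}
```

Lift is symmetric in X and Y, whereas confidence is not.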

Support, confidence, count, and lift are calculated below. An example interpretation, given by Hahsler, M., & Karpienko, R. (2017), is as follows:

“For example, let us assume that we find the rule {milk, bread} -> {butter} with support of 0.2, confidence of 0.9 and lift of 2. Now we know that 20 % of all transactions contain all three items together, the estimated conditional probability of seeing butter in a transaction under the condition that the transaction also contains milk and bread is 0.9, and we saw the items together in transactions at double the rate we would expect under independence between the itemsets {milk, bread} and {butter}”
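
To make the three measures in this quotation concrete, here is a minimal sketch in base R. The ten baskets are hypothetical and were made up for this illustration, so the resulting numbers differ from those in the quotation; itemset_share() is likewise a helper defined here, not part of arules.

```r
# Hypothetical toy data: ten baskets, invented only for illustration.
baskets <- list(
  c("milk", "bread", "butter"),
  c("milk", "bread", "butter"),
  c("milk", "bread"),
  c("milk", "bread", "butter", "eggs"),
  c("butter"),
  c("eggs"),
  c("milk"),
  c("bread", "eggs"),
  c("milk", "bread", "butter"),
  c("eggs", "butter")
)

# Share of baskets that contain every item in `items`.
itemset_share <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- c("milk", "bread")
rhs <- "butter"

# Support: share of baskets containing lhs and rhs together.
rule_support <- itemset_share(c(lhs, rhs), baskets)

# Confidence: among baskets containing lhs, the share that also contain rhs.
rule_confidence <- rule_support / itemset_share(lhs, baskets)

# Lift: confidence relative to the overall share of baskets containing rhs.
rule_lift <- rule_confidence / itemset_share(rhs, baskets)

print(c(support = rule_support, confidence = rule_confidence, lift = rule_lift))
```

In this toy data the rule {milk, bread} -> {butter} has a support of 0.4, a confidence of 0.8 and a lift of about 1.33.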

Therefore, with Market Basket Analysis we can find the strongest interdependencies among purchased products. Such an analysis might help sellers adjust the way products are presented in shops (including online shops), on the one hand to make buyers' lives easier, and on the other hand to encourage them to buy commonly interrelated products more frequently.

Support

rules.by.supp<-sort(rules.trans, by="support", decreasing=TRUE) 
inspect(head(rules.by.supp))
##     lhs                   rhs                support    confidence coverage 
## [1] {}                 => {whole milk}       0.25551601 0.2555160  1.0000000
## [2] {other vegetables} => {whole milk}       0.07483477 0.3867578  0.1934926
## [3] {whole milk}       => {other vegetables} 0.07483477 0.2928770  0.2555160
## [4] {rolls/buns}       => {whole milk}       0.05663447 0.3079049  0.1839349
## [5] {yogurt}           => {whole milk}       0.05602440 0.4016035  0.1395018
## [6] {root vegetables}  => {whole milk}       0.04890696 0.4486940  0.1089985
##     lift     count
## [1] 1.000000 2513 
## [2] 1.513634  736 
## [3] 1.513634  736 
## [4] 1.205032  557 
## [5] 1.571735  551 
## [6] 1.756031  481

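In the table above, the highest support value belongs to {} => {whole milk}, a rule with an empty left-hand side. It should be interpreted as follows: 26% of all transactions contain whole milk, regardless of what else is in the basket. Because nothing is conditioned on, the confidence of this rule equals its support and its lift is exactly 1.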

A much more interesting interpretation is that of the second-highest support value, for {other vegetables} => {whole milk}. The numbers should be interpreted as follows: 7% of all transactions contain both items together, the estimated conditional probability of seeing whole milk in a transaction given that the transaction also contains other vegetables is 0.39, and the items appear together at 1.5 times the rate we would expect under independence between the itemsets {other vegetables} and {whole milk}.
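
The figures in rows [2] and [3] of the table can be reproduced from three supports alone. The sketch below is a plain arithmetic check in base R using values copied from the inspect() output; the variable names are chosen here for illustration.

```r
# Values copied from the inspect() output above.
supp_both <- 0.07483477  # support of {other vegetables, whole milk}
supp_veg  <- 0.1934926   # coverage of row [2] = support of {other vegetables}
supp_milk <- 0.2555160   # coverage of row [3] = support of {whole milk}

# Confidence depends on the direction of the rule.
conf_veg_to_milk <- supp_both / supp_veg   # {other vegetables} => {whole milk}
conf_milk_to_veg <- supp_both / supp_milk  # {whole milk} => {other vegetables}

# Lift is the same in both directions.
lift <- supp_both / (supp_veg * supp_milk)

print(round(c(conf_veg_to_milk = conf_veg_to_milk,
              conf_milk_to_veg = conf_milk_to_veg,
              lift = lift), 4))
```

This is why the two rules share one support and one lift value but differ in confidence (0.39 versus 0.29).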

Confidence

rules.by.conf<-sort(rules.trans, by="confidence", decreasing=TRUE) 
inspect(head(rules.by.conf))
##     lhs                                  rhs                support   
## [1] {citrus fruit, root vegetables}   => {other vegetables} 0.01037112
## [2] {root vegetables, tropical fruit} => {other vegetables} 0.01230300
## [3] {curd, yogurt}                    => {whole milk}       0.01006609
## [4] {butter, other vegetables}        => {whole milk}       0.01148958
## [5] {root vegetables, tropical fruit} => {whole milk}       0.01199797
## [6] {root vegetables, yogurt}         => {whole milk}       0.01453991
##     confidence coverage   lift     count
## [1] 0.5862069  0.01769192 3.029608 102  
## [2] 0.5845411  0.02104728 3.020999 121  
## [3] 0.5823529  0.01728521 2.279125  99  
## [4] 0.5736041  0.02003050 2.244885 113  
## [5] 0.5700483  0.02104728 2.230969 118  
## [6] 0.5629921  0.02582613 2.203354 143

Count

rules.by.count<-sort(rules.trans, by="count", decreasing=TRUE) 
inspect(head(rules.by.count))
##     lhs                   rhs                support    confidence coverage 
## [1] {}                 => {whole milk}       0.25551601 0.2555160  1.0000000
## [2] {other vegetables} => {whole milk}       0.07483477 0.3867578  0.1934926
## [3] {whole milk}       => {other vegetables} 0.07483477 0.2928770  0.2555160
## [4] {rolls/buns}       => {whole milk}       0.05663447 0.3079049  0.1839349
## [5] {yogurt}           => {whole milk}       0.05602440 0.4016035  0.1395018
## [6] {root vegetables}  => {whole milk}       0.04890696 0.4486940  0.1089985
##     lift     count
## [1] 1.000000 2513 
## [2] 1.513634  736 
## [3] 1.513634  736 
## [4] 1.205032  557 
## [5] 1.571735  551 
## [6] 1.756031  481
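
Sorting by count gives the same order as sorting by support because count is simply the support expressed as a number of transactions. A short check in base R, with values copied from the table above (the rule labels are written by hand for this illustration):

```r
# Support values copied from the table above; 9835 is the number of
# transactions reported by summary(trans).
n_transactions <- 9835
support <- c("{} => {whole milk}"                 = 0.25551601,
             "{other vegetables} => {whole milk}" = 0.07483477,
             "{rolls/buns} => {whole milk}"       = 0.05663447,
             "{yogurt} => {whole milk}"           = 0.05602440,
             "{root vegetables} => {whole milk}"  = 0.04890696)

# count = support * number of transactions (rounded, because the printed
# support values are themselves rounded).
count <- round(support * n_transactions)
print(count)
```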

Lift

rules.by.lift<-sort(rules.trans, by="lift", decreasing=TRUE) 
inspect(head(rules.by.lift))
##     lhs                                   rhs                support   
## [1] {citrus fruit, other vegetables}   => {root vegetables}  0.01037112
## [2] {other vegetables, tropical fruit} => {root vegetables}  0.01230300
## [3] {beef}                             => {root vegetables}  0.01738688
## [4] {citrus fruit, root vegetables}    => {other vegetables} 0.01037112
## [5] {root vegetables, tropical fruit}  => {other vegetables} 0.01230300
## [6] {other vegetables, whole milk}     => {root vegetables}  0.02318251
##     confidence coverage   lift     count
## [1] 0.3591549  0.02887646 3.295045 102  
## [2] 0.3427762  0.03589222 3.144780 121  
## [3] 0.3313953  0.05246568 3.040367 171  
## [4] 0.5862069  0.01769192 3.029608 102  
## [5] 0.5845411  0.02104728 3.020999 121  
## [6] 0.3097826  0.07483477 2.842082 228

The table above shows that the highest lift belongs to the rule {citrus fruit, other vegetables} => {root vegetables}. The numbers should be interpreted as follows: about 1% of all transactions contain all three items together; the estimated conditional probability of seeing root vegetables in a transaction, given that it also contains citrus fruit and other vegetables, is 0.36; and the three items appear together about 3.3 times as often as we would expect if the itemsets {citrus fruit, other vegetables} and {root vegetables} were independent.
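The three measures for this rule can be recomputed by hand from the counts. The sketch below is only an illustration: the variable names are mine, and the antecedent and consequent counts are assumptions recovered from the printed coverage and support values (0.02887646 and 0.1089985, each multiplied by 9835 transactions).

```r
# Counts read from the output above; n_lhs and n_rhs are recovered from
# the printed coverage/support values and are therefore assumptions.
n_trans <- 9835   # all transactions
n_rule  <- 102    # baskets with citrus fruit, other vegetables and root vegetables
n_lhs   <- 284    # baskets with citrus fruit and other vegetables
n_rhs   <- 1072   # baskets with root vegetables

support    <- n_rule / n_trans                # share of all baskets containing the rule
confidence <- n_rule / n_lhs                  # estimated P(rhs | lhs)
lift       <- confidence / (n_rhs / n_trans)  # confidence relative to the rhs base rate

round(c(support = support, confidence = confidence, lift = lift), 4)
```

A lift above 1 means the consequent shows up more often in baskets containing the antecedent than in baskets overall.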

The third rule is interesting but also logical: the connection between {beef} and {root vegetables} seems natural.

It is worth mentioning that vegetables in general (both root vegetables and other vegetables) might be understood as the central object of purchases, even though whole milk was the most frequently chosen product. However, it is worth remembering that milk is usually consumed at breakfast or supper (with cereal or coffee), that is, with the “smaller meals”, whereas vegetables are usually eaten for lunch and dinner, which typically consist of many other products. Therefore, as we were looking for the most common association rules in product purchase behaviour, I would recommend that the manager of the shop in which the data was gathered place vegetables centrally in the shopping area.

Eclat Algorithm

Eclat and Apriori are among the most popular algorithms for Market Basket Analysis. Here I will focus on the Eclat algorithm.

sets<-eclat(trans, parameter = list(supp=0.05, maxlen=20))
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.05      1     20 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 491 
## 
## create itemset ... 
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [28 item(s)] done [0.02s].
## creating sparse bit matrix ... [28 row(s), 9835 column(s)] done [0.00s].
## writing  ... [31 set(s)] done [0.00s].
## Creating S4 object  ... done [0.01s].
rules_eclat<-ruleInduction(sets, trans, confidence=0.9)
inspect(sets)
##      items                          support    count
## [1]  {whole milk, yogurt}           0.05602440  551 
## [2]  {rolls/buns, whole milk}       0.05663447  557 
## [3]  {other vegetables, whole milk} 0.07483477  736 
## [4]  {whole milk}                   0.25551601 2513 
## [5]  {other vegetables}             0.19349263 1903 
## [6]  {rolls/buns}                   0.18393493 1809 
## [7]  {yogurt}                       0.13950178 1372 
## [8]  {soda}                         0.17437722 1715 
## [9]  {root vegetables}              0.10899847 1072 
## [10] {tropical fruit}               0.10493137 1032 
## [11] {bottled water}                0.11052364 1087 
## [12] {sausage}                      0.09395018  924 
## [13] {shopping bags}                0.09852567  969 
## [14] {citrus fruit}                 0.08276563  814 
## [15] {pastry}                       0.08896797  875 
## [16] {pip fruit}                    0.07564820  744 
## [17] {whipped/sour cream}           0.07168277  705 
## [18] {fruit/vegetable juice}        0.07229283  711 
## [19] {domestic eggs}                0.06344687  624 
## [20] {newspapers}                   0.07981698  785 
## [21] {butter}                       0.05541434  545 
## [22] {margarine}                    0.05856634  576 
## [23] {brown bread}                  0.06487036  638 
## [24] {bottled beer}                 0.08052872  792 
## [25] {frankfurter}                  0.05897306  580 
## [26] {pork}                         0.05765125  567 
## [27] {napkins}                      0.05236401  515 
## [28] {curd}                         0.05327911  524 
## [29] {beef}                         0.05246568  516 
## [30] {coffee}                       0.05805796  571 
## [31] {canned beer}                  0.07768175  764
freq.rules <- ruleInduction(sets, trans, confidence=0.05)
freq.rules
## set of 6 rules
inspect(freq.rules)
##     lhs                   rhs                support    confidence lift    
## [1] {yogurt}           => {whole milk}       0.05602440 0.4016035  1.571735
## [2] {whole milk}       => {yogurt}           0.05602440 0.2192598  1.571735
## [3] {whole milk}       => {rolls/buns}       0.05663447 0.2216474  1.205032
## [4] {rolls/buns}       => {whole milk}       0.05663447 0.3079049  1.205032
## [5] {whole milk}       => {other vegetables} 0.07483477 0.2928770  1.513634
## [6] {other vegetables} => {whole milk}       0.07483477 0.3867578  1.513634
##     itemset
## [1] 1      
## [2] 1      
## [3] 2      
## [4] 2      
## [5] 3      
## [6] 3
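Eclat works on a vertical layout of the data: for each item it keeps the list of transaction ids (a tid-list) in which the item occurs, and the support of an itemset is obtained by intersecting those lists. The following is a minimal base R sketch of that idea on a hypothetical toy data set; the baskets and the helper `itemset_support()` are invented for illustration and are not part of arules.

```r
# Hypothetical toy baskets (not the grocery data)
baskets <- list(
  t1 = c("milk", "bread"),
  t2 = c("milk", "yogurt"),
  t3 = c("milk", "bread", "yogurt"),
  t4 = c("bread")
)

items <- sort(unique(unlist(baskets)))

# Vertical layout: for every item, the ids of the baskets that contain it
tidlists <- lapply(setNames(items, items), function(item) {
  names(baskets)[vapply(baskets, function(b) item %in% b, logical(1))]
})

# Support of an itemset = size of the intersected tid-lists / number of baskets
itemset_support <- function(itemset) {
  tids <- Reduce(intersect, tidlists[itemset])
  length(tids) / length(baskets)
}

itemset_support(c("milk", "bread"))
```

Only itemsets whose support reaches the threshold (0.05 in the call above) are kept, which is why the listing contains 28 single items but just three pairs.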

Apriori Algorithm

Here comes the Apriori algorithm.

apr.rules<-apriori(trans, parameter=list(supp=0.001, conf=0.1, minlen=2))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.1    0.1    1 none FALSE            TRUE       5   0.001      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [32783 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules.by.conf<-sort(apr.rules, by="confidence", decreasing=TRUE)
inspect(head(rules.by.conf))
##     lhs                     rhs                    support confidence    coverage     lift count
## [1] {rice,                                                                                      
##      sugar}              => {whole milk}       0.001220132          1 0.001220132 3.913649    12
## [2] {canned fish,                                                                               
##      hygiene articles}   => {whole milk}       0.001118454          1 0.001118454 3.913649    11
## [3] {butter,                                                                                    
##      rice,                                                                                      
##      root vegetables}    => {whole milk}       0.001016777          1 0.001016777 3.913649    10
## [4] {flour,                                                                                     
##      root vegetables,                                                                           
##      whipped/sour cream} => {whole milk}       0.001728521          1 0.001728521 3.913649    17
## [5] {butter,                                                                                    
##      domestic eggs,                                                                             
##      soft cheese}        => {whole milk}       0.001016777          1 0.001016777 3.913649    10
## [6] {citrus fruit,                                                                              
##      root vegetables,                                                                           
##      soft cheese}        => {other vegetables} 0.001016777          1 0.001016777 5.168156    10
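All six rules above have a confidence of 1, and the five whole milk rules share exactly the same lift. This is not a coincidence: when confidence equals 1, lift reduces to 1 divided by the support of the consequent. A small sketch of that arithmetic follows, using the whole milk count from the Count table; the function name is my own.

```r
n_trans      <- 9835  # all transactions
n_whole_milk <- 2513  # baskets containing whole milk

# With confidence = 1, lift = 1 / support(rhs)
lift_at_full_confidence <- function(n_rhs, n_total) {
  1 / (n_rhs / n_total)
}

lift_at_full_confidence(n_whole_milk, n_trans)
```

Because each of these rules rests on only 10 to 17 baskets, a perfect confidence should be read with caution.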

Let’s now see which products lead to whole milk, the most frequently purchased product, appearing as the consequent of a rule.

rules.whole.milk<-apriori(data=trans, parameter=list(supp=0.001,conf = 0.5), 
                      appearance=list(default="lhs", rhs ="whole milk")) 
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [2679 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules.whole.milk.bysupp<-sort(rules.whole.milk, by="support", decreasing=TRUE)

inspect(head(rules.whole.milk.bysupp))
##     lhs                                       rhs          support   
## [1] {other vegetables, yogurt}             => {whole milk} 0.02226741
## [2] {tropical fruit, yogurt}               => {whole milk} 0.01514997
## [3] {other vegetables, whipped/sour cream} => {whole milk} 0.01464159
## [4] {root vegetables, yogurt}              => {whole milk} 0.01453991
## [5] {other vegetables, pip fruit}          => {whole milk} 0.01352313
## [6] {rolls/buns, root vegetables}          => {whole milk} 0.01270971
##     confidence coverage   lift     count
## [1] 0.5128806  0.04341637 2.007235 219  
## [2] 0.5173611  0.02928317 2.024770 149  
## [3] 0.5070423  0.02887646 1.984385 144  
## [4] 0.5629921  0.02582613 2.203354 143  
## [5] 0.5175097  0.02613116 2.025351 133  
## [6] 0.5230126  0.02430097 2.046888 125

The Apriori rules indicate that when the consequent is whole milk, the most common antecedent is {other vegetables, yogurt}, with a support of 2.2%, a confidence of 51% and a lift of 2.0.

Let’s now look at another product that is not purchased as frequently as whole milk, for instance newspapers.

rules.newspapers<-apriori(data=trans, parameter=list(supp=0.001,conf = 0.15), 
                      appearance=list(default="lhs", rhs ="newspapers")) 
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.15    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [183 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules.newspapers.bysupp<-sort(rules.newspapers, by="support", decreasing=TRUE)

inspect(head(rules.newspapers.bysupp))
##     lhs                             rhs          support     confidence
## [1] {other vegetables, soda}     => {newspapers} 0.004982206 0.1521739 
## [2] {soda, yogurt}               => {newspapers} 0.004270463 0.1561338 
## [3] {rolls/buns, tropical fruit} => {newspapers} 0.004168785 0.1694215 
## [4] {brown bread, whole milk}    => {newspapers} 0.004067107 0.1612903 
## [5] {bottled water, rolls/buns}  => {newspapers} 0.003863752 0.1596639 
## [6] {beef, other vegetables}     => {newspapers} 0.003253686 0.1649485 
##     coverage   lift     count
## [1] 0.03274021 1.906536 49   
## [2] 0.02735130 1.956148 42   
## [3] 0.02460600 2.122625 41   
## [4] 0.02521607 2.020752 40   
## [5] 0.02419929 2.000375 38   
## [6] 0.01972547 2.066583 32

The Apriori rules indicate that when the consequent is newspapers, the most common antecedent is {other vegetables, soda}, with a support of 0.5%, a confidence of 15% and a lift of 1.9.

Visualization

Below are some visualization plots.

rules_for_plot <- head(sort(sort(apr.rules, by ="confidence"),by="support"),15)
plot(rules_for_plot, method ="matrix", measure="lift")
## Itemsets in Antecedent (LHS)
## [1] "{root vegetables}"  "{other vegetables}" "{yogurt}"          
## [4] "{tropical fruit}"   "{whole milk}"       "{rolls/buns}"      
## Itemsets in Consequent (RHS)
## [1] "{rolls/buns}"       "{whole milk}"       "{yogurt}"          
## [4] "{other vegetables}" "{root vegetables}"

This figure shows the matrix for 15 rules. On the x axis we can see the antecedents (LHS) and on the y axis the consequents (RHS). Colour indicates the lift: the redder the cell, the higher the lift. Two rules have a very high lift (top left corner): {other vegetables} => {root vegetables} and {root vegetables} => {other vegetables}. This is logical, because those two products are usually purchased together, and lift is symmetric, so it does not matter which of the two products is the antecedent.
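That lift does not depend on the direction of a rule, while confidence does, can be checked with a pair whose counts are printed earlier: whole milk and other vegetables from the Count table. This is a sketch with variable names of my own choosing.

```r
n_trans <- 9835  # all transactions
n_milk  <- 2513  # baskets with whole milk
n_veg   <- 1903  # baskets with other vegetables
n_both  <- 736   # baskets with both

# Confidence depends on which item is the antecedent
conf_veg_to_milk <- n_both / n_veg
conf_milk_to_veg <- n_both / n_milk

# Lift divides by the base rate of the consequent, so both directions agree
lift_veg_to_milk <- conf_veg_to_milk / (n_milk / n_trans)
lift_milk_to_veg <- conf_milk_to_veg / (n_veg / n_trans)

round(c(conf_veg_to_milk, conf_milk_to_veg, lift_veg_to_milk, lift_milk_to_veg), 4)
```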

plot(rules_for_plot, method="paracoord", control=list(reorder=TRUE))

The chart above is a parallel coordinates plot: each arrow traces one rule from its antecedent item(s) to its consequent. The longer the arrow, the more items the rule contains. However, for these 15 rules all the arrows span a single interval, i.e. every rule has exactly one item in the antecedent.

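Two charts are presented below. The first shows 15 rules for whole milk buying behaviour, and the second similarly shows 15 rules for newspaper purchases. In both cases the index [1:15] takes the first 15 rules in the order returned by apriori(), which is not necessarily the order of their strength.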

plot(rules.whole.milk[1:15], method="graph", control = list(cex=0.9))
## Available control parameters (with default values):
## layout    =  stress
## circular  =  FALSE
## ggraphdots    =  NULL
## edges     =  <environment>
## nodes     =  <environment>
## nodetext  =  <environment>
## colors    =  c("#EE0000FF", "#EEEEEEFF")
## engine    =  ggplot2
## max   =  100
## verbose   =  FALSE

plot(rules.newspapers[1:15], method="graph", control = list(cex=0.9))
## Available control parameters (with default values):
## layout    =  stress
## circular  =  FALSE
## ggraphdots    =  NULL
## edges     =  <environment>
## nodes     =  <environment>
## nodetext  =  <environment>
## colors    =  c("#EE0000FF", "#EEEEEEFF")
## engine    =  ggplot2
## max   =  100
## verbose   =  FALSE

The plot below shows the first 15 rules found by the Apriori algorithm across all of the products in the dataset.

plot(apr.rules[1:15], method="grouped")

plot(apr.rules[1:15], method="graph", control=list(type="items"))
## Available control parameters (with default values):
## layout    =  stress
## circular  =  FALSE
## ggraphdots    =  NULL
## edges     =  <environment>
## nodes     =  <environment>
## nodetext  =  <environment>
## colors    =  c("#EE0000FF", "#EEEEEEFF")
## engine    =  ggplot2
## max   =  100
## verbose   =  FALSE
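The plots above index the rule sets with [1:15], which selects rules by position. If the goal is to display the strongest rules, the set can be sorted before it is truncated. The sketch below assumes the Groceries data bundled with the arules package as a stand-in for `trans`; the object name `top15` is mine.

```r
library(arules)

# Assumption: the bundled Groceries transactions stand in for `trans`
data("Groceries")

# Same thresholds as the Apriori call in the text, with progress output silenced
rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.1, minlen = 2),
                 control = list(verbose = FALSE))

# Rank by lift first, then keep the 15 highest-ranked rules
top15 <- head(sort(rules, by = "lift", decreasing = TRUE), 15)
inspect(top15)
```

The resulting `top15` object can then be passed to plot() in place of an indexed rule set.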

The crème de la crème are the interactive visualizations. Here you can play around a little with the products and their associated rules. Have fun!

plot(apr.rules, method="graph",  engine="htmlwidget")

Conclusion

This paper aimed to discover the most common product purchase behaviour. On the one hand, the analysis makes it possible to place products sensibly in the shopping area based on the dependencies in purchase behaviour, which would make shoppers' lives easier and save some of the time spent in the shop. On the other hand, market basket analysis creates the possibility of nudging people to buy more of the products that are usually purchased together, even if buyers do not really want to buy them.

To summarize the analysis, association rules are a very powerful tool for detecting relationships between items. In this paper the subject was purchasing behaviour; however, Market Basket Analysis might also be used in other fields, for instance to detect sightseeing patterns in tourism (if tourists saw one object, which place will they visit next?) or even in medical research (if a patient struggles with some illness, which illness is most likely to occur next?).