Objective: Market Basket Analysis using Association Rules

Steps to achieve:

  1. Structure of data, Read data in Transaction Class
  2. Summary, Item Frequency and Plots
  3. Association Rules using Apriori Algorithm
  4. Analysis on Rules
  5. Identifying and Removing Redundant Rules
  6. Plot different Association Graphs
  7. Association for most frequent item
  8. Recommendations

rm(list=ls())
library(arules)
library(arulesViz)

Step 1: Structure of data, Read data in Transaction Class

We’ll first look at how the data is laid out in the given CSV file, then read it into the transactions class.

products.initial <- read.csv("MarketBasketAnalysis.csv", sep=",", colClasses = "factor")

str(products.initial)
## 'data.frame':    15001 obs. of  5 variables:
##  $ A          : Factor w/ 15001 levels "30000","30001",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Quantity   : Factor w/ 68 levels "1","10","100",..: 13 13 13 13 13 13 1 1 1 1 ...
##  $ Transaction: Factor w/ 6726 levels "100001","100007",..: 5594 5594 5594 5594 5594 5595 5596 5596 5596 5596 ...
##  $ Store      : Factor w/ 10 levels "1","10","2","3",..: 7 7 7 7 7 1 7 7 7 7 ...
##  $ Product    : Factor w/ 17 levels "Bow","Candy Bar",..: 5 2 2 2 2 8 2 2 2 5 ...

read.csv gives a data frame with one row per purchased item (15001 rows spread over 6726 transactions). A data frame in this shape isn’t suitable as input to the Apriori algorithm.
We need to read the data into the transactions class, which can be done using “read.transactions” from the package “arules”.
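To make the target shape concrete, here is a minimal base R sketch of what reading in “single” format amounts to: rows that share a transaction ID are collected into one basket. The toy data frame and its values are made up for illustration, and collapsing repeated products with unique() is an assumption about the desired result, not a description of how arules works internally.

```r
# Toy long-format data: one row per purchased item (hypothetical values)
toy <- data.frame(
  Transaction = c("T1", "T1", "T1", "T2", "T3", "T3"),
  Product     = c("Candy Bar", "Candy Bar", "Magazine",
                  "Toothpaste", "Magazine", "Pencils"),
  stringsAsFactors = FALSE
)

# Group products by transaction ID and drop repeats within a basket
baskets <- lapply(split(toy$Product, toy$Transaction), unique)

print(baskets)
```

Each list element is one basket, which is the shape the transactions class stores (in sparse form) for the real file.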

rm(products.initial)

products = read.transactions("MarketBasketAnalysis.csv", format = "single", sep = ",", cols = c("Transaction", "Product"), header = TRUE)

Observe the structure of the dataset now: its class is ‘transactions’.

str(products)
## Formal class 'transactions' [package "arules"] with 3 slots
##   ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
##   .. .. ..@ i       : int [1:9629] 1 15 4 16 1 3 4 7 15 4 ...
##   .. .. ..@ p       : int [1:6727] 0 1 2 3 4 9 11 12 13 14 ...
##   .. .. ..@ Dim     : int [1:2] 17 6726
##   .. .. ..@ Dimnames:List of 2
##   .. .. .. ..$ : NULL
##   .. .. .. ..$ : NULL
##   .. .. ..@ factors : list()
##   ..@ itemInfo   :'data.frame':  17 obs. of  1 variable:
##   .. ..$ labels: chr [1:17] "Bow" "Candy Bar" "Deodorant" "Greeting Cards" ...
##   ..@ itemsetInfo:'data.frame':  6726 obs. of  1 variable:
##   .. ..$ transactionID: chr [1:6726] "100001" "100007" "100010" "100013" ...
inspect(head(products))
##     items             transactionID
## [1] {Candy Bar}              100001
## [2] {Toothpaste}             100007
## [3] {Magazine}               100010
## [4] {Wrapping Paper}         100013
## [5] {Candy Bar,                    
##      Greeting Cards,               
##      Magazine,                     
##      Pencils,                      
##      Toothpaste}             100016
## [6] {Magazine,                     
##      Pencils}                100019
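The i and p slots in the str() output store the matrix in compressed column form: p holds the 0-based offsets at which each transaction’s entries start in i, and i holds 0-based item row indices. The base R sketch below uses only the values printed above (the first nine entries of i and the first six of p) to decode the first five baskets; the variable names are illustrative.

```r
# Values copied from the str() output above (first few entries only)
i <- c(1, 15, 4, 16, 1, 3, 4, 7, 15)   # 0-based item indices
p <- c(0, 1, 2, 3, 4, 9)               # 0-based column start offsets
labels <- c("Bow", "Candy Bar", "Deodorant", "Greeting Cards", "Magazine",
            "Markers", "Pain Reliever", "Pencils", "Pens", "Perfume",
            "Photo Processing", "Prescription Med", "Shampoo", "Soap",
            "Toothbrush", "Toothpaste", "Wrapping Paper")

# Transaction k owns entries p[k] + 1 .. p[k + 1] of i (R indexing is 1-based).
# This assumes no empty transactions, which holds for the five decoded here.
decoded <- lapply(seq_len(length(p) - 1), function(k) {
  labels[i[(p[k] + 1):p[k + 1]] + 1]
})

print(decoded)
```

The decoded baskets agree with the first five rows of the inspect() listing.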

itemInfo contains the list of items present in the dataset. 17 unique items/products are listed in this dataset.

print(products@itemInfo)
##              labels
## 1               Bow
## 2         Candy Bar
## 3         Deodorant
## 4    Greeting Cards
## 5          Magazine
## 6           Markers
## 7     Pain Reliever
## 8           Pencils
## 9              Pens
## 10          Perfume
## 11 Photo Processing
## 12 Prescription Med
## 13          Shampoo
## 14             Soap
## 15       Toothbrush
## 16       Toothpaste
## 17   Wrapping Paper

itemsetInfo has the list of transactions. This dataset has 6726 unique transactions.

head(products@itemsetInfo)
##   transactionID
## 1        100001
## 2        100007
## 3        100010
## 4        100013
## 5        100016
## 6        100019
nrow(products@itemsetInfo)
## [1] 6726

data is the sparse matrix with one row per item and one column per transaction (17 x 6726). A cell is marked with ‘|’ if the transaction contains the item and ‘.’ if it doesn’t.

products@data
## 17 x 6726 sparse Matrix of class "ngCMatrix"
##                                                                               
##  [1,] . . . . . . . . . . . . . . . . . . . . . . | . . . . . . . . . . ......
##  [2,] | . . . | . . . . . | . | . | . . . . . . . . | . . . . . | . . | ......
##  [3,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
##  [4,] . . . . | . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
##  [5,] . . | . | | . | . | . . . | | . . . | . . | . . | . . | . | . . . ......
##  [6,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
##  [7,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
##  [8,] . . . . | | | . . . | . | . . . . . . . . . . . . . . . . . . . . ......
##  [9,] . . . . . . . . . . . . . . . . . . . . . . . . . . | . . . | . . ......
## [10,] . . . . . . . . . . . . . | . . . | | | . | . . . | . . . . . | | ......
## [11,] . . . . . . . . | . . . . . . . . . . . . . . . . . . . . . . . . ......
## [12,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
## [13,] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
## [14,] . . . . . . . . . . . | . . . . . . . . . . . . . . . . . . . . . ......
## [15,] . . . . . . . . . . . . . | . . | . | . . . . . . . . . . . . . . ......
## [16,] . | . . | . . . . . | . | . . | . . . . | . . . . . . . | . . . . ......
## [17,] . . . | . . . . . . . . . . . | . . . . . . . . . . . . . . . . . ......
## 
##  .....suppressing 6693 columns in show(); maybe adjust options(max.print=, width=)
##  ..............................

Step 2: Summary, Item Frequency and Plots

We’ll have a look at the summary of the dataset and do some basic EDA (Exploratory Data Analysis) on these transactions, such as the frequency of each item and frequency plots.

Summary gives the following information.

  1. No. of transactions and no. of items in the dataset.
  2. Most frequent items in the dataset. We can see Magazine is the most frequent item.
  3. Element length distribution gives the no. of transactions for each basket size.
    Ex: 4848 transactions have one item, 1192 have two items, and so on.
  4. Five number summary (plus the mean) of the basket sizes is given.
    The median is 1: more than half of the transactions (4848 of 6726, about 72%) contain a single item.
  5. Extended information gives examples of the item labels and transaction IDs.

summary(products)
## transactions as itemMatrix in sparse format with
##  6726 rows (elements/itemsets/transactions) and
##  17 columns (items) and a density of 0.08421228 
## 
## most frequent items:
##       Magazine      Candy Bar     Toothpaste Greeting Cards           Pens 
##           1560           1182           1093           1028            969 
##        (Other) 
##           3797 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9 
## 4848 1192  470  135   54   19    3    3    2 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.432   2.000   9.000 
## 
## includes extended item information - examples:
##      labels
## 1       Bow
## 2 Candy Bar
## 3 Deodorant
## 
## includes extended transaction information - examples:
##   transactionID
## 1        100001
## 2        100007
## 3        100010

Item density is the total no. of item occurrences across all transactions divided by the product of the no. of items and the no. of transactions. Put simply, it is the no. of cells marked “|” in the sparse matrix divided by the total no. of cells. We can see the calculation below.

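# Count the TRUE cells row by row (one row per item) and add them up
itemNoTrans.Total = 0
for(i in seq_len(nrow(products@data)))
{
  itemNo = products@data[i,]     # logical vector: which transactions contain item i
  itemNoTrans = sum(itemNo)      # no. of transactions containing item i
  itemNoTrans.Total = itemNoTrans.Total + itemNoTrans
}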

items.density = itemNoTrans.Total / (nrow(products) * ncol(products))
 
print(items.density) 
## [1] 0.08421228

Frequency of each item is calculated as the no. of transactions containing the item divided by the total no. of transactions. The calculation below uses row 5 of the matrix, which is Magazine.

# Transaction list for item 5 (Magazine)
Magazine.Transactions = products@data[5,]
Magazine.Freq = sum(Magazine.Transactions)/length(Magazine.Transactions)
print(Magazine.Freq)
## [1] 0.2319358
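Both quantities can also be computed without a loop. The sketch below uses a small hand-made logical matrix (items in rows, transactions in columns, values hypothetical), not the real dataset, so that it runs in base R on its own.

```r
# Toy incidence matrix: 3 items (rows) x 4 transactions (columns)
m <- matrix(c(TRUE,  FALSE, TRUE,  TRUE,
              FALSE, FALSE, TRUE,  FALSE,
              FALSE, TRUE,  FALSE, FALSE),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Magazine", "Pencils", "Soap"), NULL))

item.freq <- rowMeans(m)   # share of transactions containing each item
density   <- mean(m)       # share of TRUE cells in the whole matrix

print(item.freq)
print(density)
```

On the real data, itemFrequency(products) from arules returns the per-item figure directly.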

We can see different frequency plots below. Magazine is the most frequent item and Deodorant is the least frequent.

itemFrequencyPlot(products)

The top 5 frequent items can be seen below.

itemFrequencyPlot(products, topN=5)

#itemFrequencyPlot(products, topN=ncol(products))

A frequency plot restricted by support can be seen below.

Frequency of items with a minimum support of 0.1:

itemFrequencyPlot(products, support = 0.1)

Step 3: Association Rules using Apriori Algorithm

We pass two arguments to the apriori function.
1. Transactions dataset
2. List of parameters:
Minimum Support = 0.01 => Rules should have a minimum support of 0.01
Minimum Confidence = 0.1 => Rules should have a minimum confidence of 0.1
Minimum Length = 2 => A rule should involve at least two items (LHS and RHS combined).
Maximum Length = 5 => A rule should involve at most five items.

The default parameters are a minimum support of 0.1, a minimum confidence of 0.8 and a maxlen of 10 items.

Support: It’s the percentage of transactions that contain all of the items in an itemset (e.g., pencil, paper and rubber). The higher the support the more frequently the itemset occurs. Rules with a high support are preferred since they are likely to be applicable to a large number of future transactions.

Confidence: It’s the probability that a transaction that contains the items on the left hand side of the rule (in our example, pencil and paper) also contains the item on the right hand side (a rubber). The higher the confidence, the greater the likelihood that the item on the right hand side will be purchased or, in other words, the greater the return rate you can expect for a given rule.

Lift: It’s the probability of all of the items in a rule occurring together (otherwise known as the support) divided by the product of the probabilities of the items on the left and right hand side occurring as if there was no association between them. For example, if pencil, paper and rubber occurred together in 2.5% of all transactions, pencil and paper in 10% of transactions and rubber in 8% of transactions, then the lift would be: 0.025/(0.1*0.08) = 3.125. A lift of more than 1 suggests that the presence of pencil and paper increases the probability that a rubber will also occur in the transaction. Overall, lift summarises the strength of association between the products on the left and right hand side of the rule; the larger the lift the greater the link between the two products.
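The three definitions can be checked on a handful of made-up baskets. This is a base R sketch for illustration only: the baskets are hypothetical and the helper function support() is our own, not part of arules.

```r
# Hypothetical baskets, for illustration only
baskets <- list(
  c("pencil", "paper", "rubber"),
  c("pencil", "paper", "rubber"),
  c("pencil", "paper"),
  c("pencil"),
  c("paper"),
  c("rubber"),
  c("magazine"),
  c("magazine", "pencil"),
  c("magazine"),
  c("paper", "magazine")
)

# Support: share of baskets that contain every item in `items`
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- c("pencil", "paper")
rhs <- "rubber"

rule.support    <- support(c(lhs, rhs))            # 2 of 10 baskets
rule.confidence <- rule.support / support(lhs)     # (2/10) / (3/10)
rule.lift       <- rule.confidence / support(rhs)  # confidence / (3/10)

print(c(support = rule.support, confidence = rule.confidence, lift = rule.lift))

# The figures quoted in the text, plugged into the same lift formula
print(0.025 / (0.1 * 0.08))
```

Dividing the confidence by the support of the right hand side is the same as dividing the rule support by the product of the two sides’ supports, which is the form used in the text.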

products.apriori <- apriori(products, parameter=list(support=0.01, confidence = 0.1,  minlen=2, maxlen =5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.1    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen target  ext
##       5  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 67 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[17 item(s), 6726 transaction(s)] done [0.00s].
## sorting and recoding items ... [15 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [49 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
summary(products.apriori)
## set of 49 rules
## 
## rule length distribution (lhs + rhs):sizes
##  2  3 
## 28 21 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   2.000   2.000   2.429   3.000   3.000 
## 
## summary of quality measures:
##     support          confidence        coverage            lift       
##  Min.   :0.01011   Min.   :0.1231   Min.   :0.02275   Min.   :0.6007  
##  1st Qu.:0.01204   1st Qu.:0.1960   1st Qu.:0.03747   1st Qu.:1.0569  
##  Median :0.01710   Median :0.2596   Median :0.06765   Median :1.4774  
##  Mean   :0.02192   Mean   :0.2713   Mean   :0.09811   Mean   :1.7122  
##  3rd Qu.:0.02988   3rd Qu.:0.3309   3rd Qu.:0.15284   3rd Qu.:2.2982  
##  Max.   :0.04609   Max.   :0.4837   Max.   :0.23194   Max.   :3.0575  
##      count      
##  Min.   : 68.0  
##  1st Qu.: 81.0  
##  Median :115.0  
##  Mean   :147.4  
##  3rd Qu.:201.0  
##  Max.   :310.0  
## 
## mining info:
##      data ntransactions support confidence
##  products          6726    0.01        0.1
##                                                                                                  call
##  apriori(data = products, parameter = list(support = 0.01, confidence = 0.1, minlen = 2, maxlen = 5))

Step 4: Analysis on Rules

49 rules were created, of which 28 involve two items and 21 involve three items.

The summary of quality measures gives the five number summary (plus the mean) of support, confidence, coverage, lift and count.

We can see the list of rules produced by the algorithm below.

inspect(products.apriori)
##      lhs                             rhs              support    confidence
## [1]  {Bow}                        => {Toothbrush}     0.01011002 0.1959654 
## [2]  {Toothbrush}                 => {Bow}            0.01011002 0.1494505 
## [3]  {Photo Processing}           => {Magazine}       0.01650312 0.2975871 
## [4]  {Toothbrush}                 => {Perfume}        0.01709783 0.2527473 
## [5]  {Perfume}                    => {Toothbrush}     0.01709783 0.2068345 
## [6]  {Toothbrush}                 => {Magazine}       0.01129944 0.1670330 
## [7]  {Perfume}                    => {Magazine}       0.01397562 0.1690647 
## [8]  {Pens}                       => {Magazine}       0.02007136 0.1393189 
## [9]  {Pencils}                    => {Toothpaste}     0.02274755 0.1683168 
## [10] {Toothpaste}                 => {Pencils}        0.02274755 0.1399817 
## [11] {Pencils}                    => {Greeting Cards} 0.02988403 0.2211221 
## [12] {Greeting Cards}             => {Pencils}        0.02988403 0.1955253 
## [13] {Pencils}                    => {Candy Bar}      0.03508772 0.2596260 
## [14] {Candy Bar}                  => {Pencils}        0.03508772 0.1996616 
## [15] {Pencils}                    => {Magazine}       0.02854594 0.2112211 
## [16] {Magazine}                   => {Pencils}        0.02854594 0.1230769 
## [17] {Toothpaste}                 => {Greeting Cards} 0.03330360 0.2049405 
## [18] {Greeting Cards}             => {Toothpaste}     0.03330360 0.2178988 
## [19] {Toothpaste}                 => {Candy Bar}      0.04148082 0.2552608 
## [20] {Candy Bar}                  => {Toothpaste}     0.04148082 0.2360406 
## [21] {Toothpaste}                 => {Magazine}       0.02988403 0.1838975 
## [22] {Magazine}                   => {Toothpaste}     0.02988403 0.1288462 
## [23] {Greeting Cards}             => {Candy Bar}      0.04608980 0.3015564 
## [24] {Candy Bar}                  => {Greeting Cards} 0.04608980 0.2622673 
## [25] {Greeting Cards}             => {Magazine}       0.03746655 0.2451362 
## [26] {Magazine}                   => {Greeting Cards} 0.03746655 0.1615385 
## [27] {Candy Bar}                  => {Magazine}       0.03999405 0.2275804 
## [28] {Magazine}                   => {Candy Bar}      0.03999405 0.1724359 
## [29] {Pencils, Toothpaste}        => {Candy Bar}      0.01100208 0.4836601 
## [30] {Candy Bar, Pencils}         => {Toothpaste}     0.01100208 0.3135593 
## [31] {Candy Bar, Toothpaste}      => {Pencils}        0.01100208 0.2652330 
## [32] {Greeting Cards, Pencils}    => {Magazine}       0.01204282 0.4029851 
## [33] {Magazine, Pencils}          => {Greeting Cards} 0.01204282 0.4218750 
## [34] {Greeting Cards, Magazine}   => {Pencils}        0.01204282 0.3214286 
## [35] {Candy Bar, Pencils}         => {Magazine}       0.01040737 0.2966102 
## [36] {Magazine, Pencils}          => {Candy Bar}      0.01040737 0.3645833 
## [37] {Candy Bar, Magazine}        => {Pencils}        0.01040737 0.2602230 
## [38] {Greeting Cards, Toothpaste} => {Candy Bar}      0.01457032 0.4375000 
## [39] {Candy Bar, Toothpaste}      => {Greeting Cards} 0.01457032 0.3512545 
## [40] {Candy Bar, Greeting Cards}  => {Toothpaste}     0.01457032 0.3161290 
## [41] {Greeting Cards, Toothpaste} => {Magazine}       0.01115076 0.3348214 
## [42] {Magazine, Toothpaste}       => {Greeting Cards} 0.01115076 0.3731343 
## [43] {Greeting Cards, Magazine}   => {Toothpaste}     0.01115076 0.2976190 
## [44] {Candy Bar, Toothpaste}      => {Magazine}       0.01323223 0.3189964 
## [45] {Magazine, Toothpaste}       => {Candy Bar}      0.01323223 0.4427861 
## [46] {Candy Bar, Magazine}        => {Toothpaste}     0.01323223 0.3308550 
## [47] {Candy Bar, Greeting Cards}  => {Magazine}       0.01724651 0.3741935 
## [48] {Greeting Cards, Magazine}   => {Candy Bar}      0.01724651 0.4603175 
## [49] {Candy Bar, Magazine}        => {Greeting Cards} 0.01724651 0.4312268 
##      coverage   lift      count
## [1]  0.05159084 2.8968426  68  
## [2]  0.06764793 2.8968426  68  
## [3]  0.05545644 1.2830584 111  
## [4]  0.06764793 3.0575144 115  
## [5]  0.08266429 3.0575144 115  
## [6]  0.06764793 0.7201691  76  
## [7]  0.08266429 0.7289292  94  
## [8]  0.14406780 0.6006787 135  
## [9]  0.13514719 1.0357722 153  
## [10] 0.16250372 1.0357722 153  
## [11] 0.13514719 1.4467581 201  
## [12] 0.15283973 1.4467581 201  
## [13] 0.13514719 1.4773640 236  
## [14] 0.17573595 1.4773640 236  
## [15] 0.13514719 0.9106880 192  
## [16] 0.23193577 0.9106880 192  
## [17] 0.16250372 1.3408852 224  
## [18] 0.15283973 1.3408852 224  
## [19] 0.16250372 1.4525244 279  
## [20] 0.17573595 1.4525244 279  
## [21] 0.16250372 0.7928813 201  
## [22] 0.23193577 0.7928813 201  
## [23] 0.15283973 1.7159632 310  
## [24] 0.17573595 1.7159632 310  
## [25] 0.15283973 1.0569141 252  
## [26] 0.23193577 1.0569141 252  
## [27] 0.17573595 0.9812215 269  
## [28] 0.23193577 0.9812215 269  
## [29] 0.02274755 2.7521980  74  
## [30] 0.03508772 1.9295517  74  
## [31] 0.04148082 1.9625489  74  
## [32] 0.02988403 1.7374856  81  
## [33] 0.02854594 2.7602444  81  
## [34] 0.03746655 2.3783593  81  
## [35] 0.03508772 1.2788462  70  
## [36] 0.02854594 2.0746087  70  
## [37] 0.03999405 1.9254788  70  
## [38] 0.03330360 2.4895305  98  
## [39] 0.04148082 2.2981884  98  
## [40] 0.04608980 1.9453649  98  
## [41] 0.03330360 1.4435955  75  
## [42] 0.02988403 2.4413439  75  
## [43] 0.03746655 1.8314599  75  
## [44] 0.04148082 1.3753653  89  
## [45] 0.02988403 2.5196101  89  
## [46] 0.03999405 2.0359843  89  
## [47] 0.04608980 1.6133499 116  
## [48] 0.03746655 2.6193699 116  
## [49] 0.03999405 2.8214312 116
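As a cross-check, the quality measures of rule [23], {Greeting Cards} => {Candy Bar}, can be recomputed by hand from counts already shown: 6726 transactions, 1028 with Greeting Cards, 1182 with Candy Bar (both from the summary) and 310 with both (the rule’s count). A small base R sketch, with variable names of our own choosing:

```r
n          <- 6726   # total transactions
n.greeting <- 1028   # transactions containing Greeting Cards
n.candy    <- 1182   # transactions containing Candy Bar
n.both     <- 310    # count reported for rule [23]

support    <- n.both / n              # share of all transactions with both items
confidence <- n.both / n.greeting     # share of Greeting Cards baskets with Candy Bar
coverage   <- n.greeting / n          # support of the LHS alone
lift       <- confidence / (n.candy / n)

print(round(c(support = support, confidence = confidence,
              coverage = coverage, lift = lift), 7))
```

The results agree with the support, confidence, coverage and lift columns listed for rule [23].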

These are the top 10 rules by support (every rule already meets the minimum support of 1%).

inspect(sort(products.apriori, by="support")[1:10])
##      lhs                 rhs              support    confidence coverage 
## [1]  {Greeting Cards} => {Candy Bar}      0.04608980 0.3015564  0.1528397
## [2]  {Candy Bar}      => {Greeting Cards} 0.04608980 0.2622673  0.1757360
## [3]  {Toothpaste}     => {Candy Bar}      0.04148082 0.2552608  0.1625037
## [4]  {Candy Bar}      => {Toothpaste}     0.04148082 0.2360406  0.1757360
## [5]  {Candy Bar}      => {Magazine}       0.03999405 0.2275804  0.1757360
## [6]  {Magazine}       => {Candy Bar}      0.03999405 0.1724359  0.2319358
## [7]  {Greeting Cards} => {Magazine}       0.03746655 0.2451362  0.1528397
## [8]  {Magazine}       => {Greeting Cards} 0.03746655 0.1615385  0.2319358
## [9]  {Pencils}        => {Candy Bar}      0.03508772 0.2596260  0.1351472
## [10] {Candy Bar}      => {Pencils}        0.03508772 0.1996616  0.1757360
##      lift      count
## [1]  1.7159632 310  
## [2]  1.7159632 310  
## [3]  1.4525244 279  
## [4]  1.4525244 279  
## [5]  0.9812215 269  
## [6]  0.9812215 269  
## [7]  1.0569141 252  
## [8]  1.0569141 252  
## [9]  1.4773640 236  
## [10] 1.4773640 236

These are the top 10 rules sorted by support and then by confidence, so confidence breaks ties in support. The order matches the previous listing.

inspect(sort(products.apriori, by=c("support","confidence"))[1:10])
##      lhs                 rhs              support    confidence coverage 
## [1]  {Greeting Cards} => {Candy Bar}      0.04608980 0.3015564  0.1528397
## [2]  {Candy Bar}      => {Greeting Cards} 0.04608980 0.2622673  0.1757360
## [3]  {Toothpaste}     => {Candy Bar}      0.04148082 0.2552608  0.1625037
## [4]  {Candy Bar}      => {Toothpaste}     0.04148082 0.2360406  0.1757360
## [5]  {Candy Bar}      => {Magazine}       0.03999405 0.2275804  0.1757360
## [6]  {Magazine}       => {Candy Bar}      0.03999405 0.1724359  0.2319358
## [7]  {Greeting Cards} => {Magazine}       0.03746655 0.2451362  0.1528397
## [8]  {Magazine}       => {Greeting Cards} 0.03746655 0.1615385  0.2319358
## [9]  {Pencils}        => {Candy Bar}      0.03508772 0.2596260  0.1351472
## [10] {Candy Bar}      => {Pencils}        0.03508772 0.1996616  0.1757360
##      lift      count
## [1]  1.7159632 310  
## [2]  1.7159632 310  
## [3]  1.4525244 279  
## [4]  1.4525244 279  
## [5]  0.9812215 269  
## [6]  0.9812215 269  
## [7]  1.0569141 252  
## [8]  1.0569141 252  
## [9]  1.4773640 236  
## [10] 1.4773640 236
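The two-key sort can be reproduced in base R with order(). The sketch below takes four rules from the listing above, deliberately shuffled, and ranks them by descending support with confidence as the tie-breaker; the data frame is only a stand-in for the rules object.

```r
# Four rules from the listing above, in shuffled order
rules <- data.frame(
  rule = c("{Candy Bar} => {Greeting Cards}",
           "{Toothpaste} => {Candy Bar}",
           "{Greeting Cards} => {Candy Bar}",
           "{Candy Bar} => {Toothpaste}"),
  support    = c(0.04608980, 0.04148082, 0.04608980, 0.04148082),
  confidence = c(0.2622673, 0.2552608, 0.3015564, 0.2360406),
  stringsAsFactors = FALSE
)

# Negate both keys to sort in descending order; the second key breaks ties
ranked <- rules[order(-rules$support, -rules$confidence), ]

print(ranked, row.names = FALSE)
```

A rule and its reverse always share the same support, so the second key decides which direction is listed first.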

These are the top 5 rules sorted in descending order of lift.

inspect(sort(products.apriori, by="lift")[1:5])
##     lhs                      rhs              support    confidence coverage  
## [1] {Perfume}             => {Toothbrush}     0.01709783 0.2068345  0.08266429
## [2] {Toothbrush}          => {Perfume}        0.01709783 0.2527473  0.06764793
## [3] {Bow}                 => {Toothbrush}     0.01011002 0.1959654  0.05159084
## [4] {Toothbrush}          => {Bow}            0.01011002 0.1494505  0.06764793
## [5] {Candy Bar, Magazine} => {Greeting Cards} 0.01724651 0.4312268  0.03999405
##     lift     count
## [1] 3.057514 115  
## [2] 3.057514 115  
## [3] 2.896843  68  
## [4] 2.896843  68  
## [5] 2.821431 116

Step 5: Identifying and Removing Redundant Rules

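From the above list of rules, we can see some redundancy.
Ex: {Pencils} => {Candy Bar}; {Toothpaste} => {Candy Bar}; {Pencils, Toothpaste} => {Candy Bar} overlap in the items they cover.
The commands below flag a rule as redundant when an earlier rule in the list covers the same items or a subset of them.
Note that this check looks only at the items, not at confidence or lift. As the output shows, it drops every three-item rule and the second direction of each two-item pair, including some rules with high lift. arules also provides is.redundant(), which compares a rule’s confidence with that of its more general rules.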

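# subset.matrix[i, j] is TRUE when the items of rule i are a subset of the items of rule j
subset.matrix <- is.subset(products.apriori, products.apriori, sparse = FALSE)
# Keep only comparisons with earlier rules (upper triangle, excluding the diagonal)
subset.matrix[lower.tri(subset.matrix, diag=TRUE)] <- NA
# Rule j is flagged if at least one earlier rule is a subset of it
redundant <- colSums(subset.matrix, na.rm=TRUE) >= 1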
which(redundant)
##                      {Bow,Toothbrush}                  {Perfume,Toothbrush} 
##                                     2                                     5 
##                  {Pencils,Toothpaste}              {Greeting Cards,Pencils} 
##                                    10                                    12 
##                   {Candy Bar,Pencils}                    {Magazine,Pencils} 
##                                    14                                    16 
##           {Greeting Cards,Toothpaste}                {Candy Bar,Toothpaste} 
##                                    18                                    20 
##                 {Magazine,Toothpaste}            {Candy Bar,Greeting Cards} 
##                                    22                                    24 
##             {Greeting Cards,Magazine}                  {Candy Bar,Magazine} 
##                                    26                                    28 
##        {Candy Bar,Pencils,Toothpaste}        {Candy Bar,Pencils,Toothpaste} 
##                                    29                                    30 
##        {Candy Bar,Pencils,Toothpaste}     {Greeting Cards,Magazine,Pencils} 
##                                    31                                    32 
##     {Greeting Cards,Magazine,Pencils}     {Greeting Cards,Magazine,Pencils} 
##                                    33                                    34 
##          {Candy Bar,Magazine,Pencils}          {Candy Bar,Magazine,Pencils} 
##                                    35                                    36 
##          {Candy Bar,Magazine,Pencils} {Candy Bar,Greeting Cards,Toothpaste} 
##                                    37                                    38 
## {Candy Bar,Greeting Cards,Toothpaste} {Candy Bar,Greeting Cards,Toothpaste} 
##                                    39                                    40 
##  {Greeting Cards,Magazine,Toothpaste}  {Greeting Cards,Magazine,Toothpaste} 
##                                    41                                    42 
##  {Greeting Cards,Magazine,Toothpaste}       {Candy Bar,Magazine,Toothpaste} 
##                                    43                                    44 
##       {Candy Bar,Magazine,Toothpaste}       {Candy Bar,Magazine,Toothpaste} 
##                                    45                                    46 
##   {Candy Bar,Greeting Cards,Magazine}   {Candy Bar,Greeting Cards,Magazine} 
##                                    47                                    48 
##   {Candy Bar,Greeting Cards,Magazine} 
##                                    49
products.apriori.pruned <- products.apriori[!redundant]
inspect(sort(products.apriori.pruned, by="lift"))
##      lhs                   rhs              support    confidence coverage  
## [1]  {Toothbrush}       => {Perfume}        0.01709783 0.2527473  0.06764793
## [2]  {Bow}              => {Toothbrush}     0.01011002 0.1959654  0.05159084
## [3]  {Greeting Cards}   => {Candy Bar}      0.04608980 0.3015564  0.15283973
## [4]  {Pencils}          => {Candy Bar}      0.03508772 0.2596260  0.13514719
## [5]  {Toothpaste}       => {Candy Bar}      0.04148082 0.2552608  0.16250372
## [6]  {Pencils}          => {Greeting Cards} 0.02988403 0.2211221  0.13514719
## [7]  {Toothpaste}       => {Greeting Cards} 0.03330360 0.2049405  0.16250372
## [8]  {Photo Processing} => {Magazine}       0.01650312 0.2975871  0.05545644
## [9]  {Greeting Cards}   => {Magazine}       0.03746655 0.2451362  0.15283973
## [10] {Pencils}          => {Toothpaste}     0.02274755 0.1683168  0.13514719
## [11] {Candy Bar}        => {Magazine}       0.03999405 0.2275804  0.17573595
## [12] {Pencils}          => {Magazine}       0.02854594 0.2112211  0.13514719
## [13] {Toothpaste}       => {Magazine}       0.02988403 0.1838975  0.16250372
## [14] {Perfume}          => {Magazine}       0.01397562 0.1690647  0.08266429
## [15] {Toothbrush}       => {Magazine}       0.01129944 0.1670330  0.06764793
## [16] {Pens}             => {Magazine}       0.02007136 0.1393189  0.14406780
##      lift      count
## [1]  3.0575144 115  
## [2]  2.8968426  68  
## [3]  1.7159632 310  
## [4]  1.4773640 236  
## [5]  1.4525244 279  
## [6]  1.4467581 201  
## [7]  1.3408852 224  
## [8]  1.2830584 111  
## [9]  1.0569141 252  
## [10] 1.0357722 153  
## [11] 0.9812215 269  
## [12] 0.9106880 192  
## [13] 0.7928813 201  
## [14] 0.7289292  94  
## [15] 0.7201691  76  
## [16] 0.6006787 135
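For comparison, here is a base R sketch of a confidence-based redundancy check: a rule counts as redundant only if a more general rule (same RHS, fewer LHS items) has the same or higher confidence. It uses three rules and their confidences from the full listing in Step 4. The helper names are our own and the code only illustrates the idea; it is not the arules implementation.

```r
# Three rules taken from the full listing (rules [13], [19] and [29])
rules <- data.frame(
  lhs        = c("Pencils", "Toothpaste", "Pencils,Toothpaste"),
  rhs        = c("Candy Bar", "Candy Bar", "Candy Bar"),
  confidence = c(0.2596260, 0.2552608, 0.4836601),
  stringsAsFactors = FALSE
)
lhs_items <- strsplit(rules$lhs, ",", fixed = TRUE)

# Rule i is more general than rule j: same RHS and a strictly smaller LHS
is_more_general <- function(i, j) {
  rules$rhs[i] == rules$rhs[j] &&
    length(lhs_items[[i]]) < length(lhs_items[[j]]) &&
    all(lhs_items[[i]] %in% lhs_items[[j]])
}

# Rule j is redundant if some more general rule predicts at least as well
redundant_by_confidence <- vapply(seq_len(nrow(rules)), function(j) {
  any(vapply(seq_len(nrow(rules)), function(i) {
    i != j && is_more_general(i, j) &&
      rules$confidence[i] >= rules$confidence[j]
  }, logical(1)))
}, logical(1))

print(data.frame(rules, redundant = redundant_by_confidence))
```

Under this definition {Pencils, Toothpaste} => {Candy Bar} is kept, because its confidence (0.48) is higher than that of either single-item rule, whereas the item-based check above removes it.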

Step 6: Plot different Association Graphs

We can also plot these association rules using different graphs.

Scatter Plot: Plots all rules with support on the x-axis and confidence on the y-axis; color represents the lift. The brighter the color, the higher the lift.
The point in the top right corner has high support and high confidence but low lift.
A major disadvantage is that we can’t see which rule each point stands for.

plot(products.apriori.pruned)

Grouped Plot: Plots the LHS items against the RHS items, with connecting lines and a bubble at each intersection of an LHS and an RHS.
The size of the bubble represents support and the color represents lift.
Compared to the scatter plot this is a better view, as it shows the LHS and RHS items and their associations along with support and lift.
But confidence is not shown in this graph.

plot(products.apriori.pruned,method="group")

Graph Plot: Plots the items as nodes; each association between items is drawn as arrows passing through a bubble. The size of the bubble represents support and the color represents lift.
There is an interactive version, but it can’t be rendered in the markdown file, so it’s commented out.

#plot(products.apriori.pruned,method="graph",interactive=TRUE,shading=NA)
plot(products.apriori.pruned,method="graph")

Step 7: Association for most frequent item

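From the item frequency plot we can see Magazine is the most frequent item, so in the graph at the end of this step it’s placed in the center with the other itemsets plotted around it.

We’ll see how Magazine is associated with the other items by mining only the rules that have Magazine on the right-hand side.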

products.apriori.magazine <- apriori(products, parameter=list(support=0.01, confidence = 0.1,  minlen=2, maxlen =5),
                            appearance = list(rhs=c("Magazine"), default = "lhs"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.1    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen target  ext
##       5  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 67 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[17 item(s), 6726 transaction(s)] done [0.00s].
## sorting and recoding items ... [15 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [13 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(sort(products.apriori.magazine, by="lift"))
##      lhs                             rhs        support    confidence
## [1]  {Greeting Cards, Pencils}    => {Magazine} 0.01204282 0.4029851 
## [2]  {Candy Bar, Greeting Cards}  => {Magazine} 0.01724651 0.3741935 
## [3]  {Greeting Cards, Toothpaste} => {Magazine} 0.01115076 0.3348214 
## [4]  {Candy Bar, Toothpaste}      => {Magazine} 0.01323223 0.3189964 
## [5]  {Photo Processing}           => {Magazine} 0.01650312 0.2975871 
## [6]  {Candy Bar, Pencils}         => {Magazine} 0.01040737 0.2966102 
## [7]  {Greeting Cards}             => {Magazine} 0.03746655 0.2451362 
## [8]  {Candy Bar}                  => {Magazine} 0.03999405 0.2275804 
## [9]  {Pencils}                    => {Magazine} 0.02854594 0.2112211 
## [10] {Toothpaste}                 => {Magazine} 0.02988403 0.1838975 
## [11] {Perfume}                    => {Magazine} 0.01397562 0.1690647 
## [12] {Toothbrush}                 => {Magazine} 0.01129944 0.1670330 
## [13] {Pens}                       => {Magazine} 0.02007136 0.1393189 
##      coverage   lift      count
## [1]  0.02988403 1.7374856  81  
## [2]  0.04608980 1.6133499 116  
## [3]  0.03330360 1.4435955  75  
## [4]  0.04148082 1.3753653  89  
## [5]  0.05545644 1.2830584 111  
## [6]  0.03508772 1.2788462  70  
## [7]  0.15283973 1.0569141 252  
## [8]  0.17573595 0.9812215 269  
## [9]  0.13514719 0.9106880 192  
## [10] 0.16250372 0.7928813 201  
## [11] 0.08266429 0.7289292  94  
## [12] 0.06764793 0.7201691  76  
## [13] 0.14406780 0.6006787 135

13 rules are produced by the algorithm. We’ll plot these rules using the graph method and look for some insights. This plot differs from the previous graph plot: here the item Magazine is linked directly to itemsets, whereas the previous plot used individual items as nodes.

The width of the arrow represents support and the color represents lift. {Candy Bar} => {Magazine} has the highest support but a relatively low lift (0.98).

plot(products.apriori.magazine,method="graph",control = list(type="itemsets"))

Step 8: Recommendations

In Market Basket Analysis, it is hard to fix thresholds for support, confidence and lift in advance and simply pick the rules that fall above them.
Picking “appropriate” values for support and confidence is difficult because this is an unsupervised process: there is no target variable to validate against. It’s better to pick these values based on domain knowledge and on inspection of the rules produced.
In our case, we recommend {Photo Processing, Magazine}, {Greeting Cards, Candy Bar} and {Toothbrush, Perfume} as itemsets to be combined; each has a lift above 1 in the pruned rule list.

As said earlier, domain knowledge can help find more appropriate combinations of itemsets.

References: https://select-statistics.co.uk/blog/market-basket-analysis-understanding-customer-behaviour/