Association Rules

Introduction

This article uses Association Rules, an unsupervised learning technique, to find associations (relationships or dependencies) between items purchased by customers in a Groceries dataset provided on Kaggle, which can be retrieved from https://www.kaggle.com/heeraldedhia/groceries-dataset.

Dataset

Original Dataset

# The file is comma-separated, so read.csv() is the natural reader.
groceries_data <- read.csv("Groceries_dataset.csv", header = TRUE)
sprintf("The original dataset consists of %s observations with %s features.", nrow(groceries_data), ncol(groceries_data))

## [1] "The original dataset consists of 38765 observations with 3 features."

Packages for Analysis

library(arules)
library(arulesViz)

Data Analysis

Creating Transactions using Customer Id and Items Purchased

groceries_tran <- read.transactions("Groceries_dataset.csv", format="single", sep=",", cols=c("Member_number","itemDescription"), header=TRUE)

inspect(groceries_tran[1])

##     items                  transactionID
## [1] {canned beer,                       
##      hygiene articles,                  
##      misc. beverages,                   
##      pastry,                            
##      pickled vegetables,                
##      salty snack,                       
##      sausage,                           
##      semi-finished bread,               
##      soda,                              
##      whole milk,                        
##      yogurt}                        1000
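With format="single", each row of the CSV is one (transaction ID, item) pair, and rows sharing an ID end up in the same basket. As the inspect() output above suggests, an item is listed once per basket however often it was bought. The following base-R sketch mimics that grouping on a few invented rows; the data frame `purchases` and its values are illustrative assumptions, not rows from the dataset.

```r
# Toy long-format purchases; the values are invented for illustration only.
purchases <- data.frame(
  Member_number   = c(1000, 1000, 1000, 1001, 1001),
  Date            = c("01-01-2014", "01-01-2014", "05-03-2014",
                      "01-01-2014", "02-02-2015"),
  itemDescription = c("soda", "yogurt", "soda", "whole milk", "soda"),
  stringsAsFactors = FALSE
)

# Group item descriptions by customer and keep each item once per basket.
baskets_by_member <- lapply(
  split(purchases$itemDescription, purchases$Member_number),
  function(items) sort(unique(items))
)

baskets_by_member
```

Customer 1000 bought soda twice, yet the resulting basket holds it only once, which is why these transactions describe what a customer ever bought rather than how much.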

Creating Transactions using Transaction Date and Items Purchased

groceries_tran_dt <- read.transactions("Groceries_dataset.csv", format="single", sep=",", cols=c("Date","itemDescription"), header=TRUE)

inspect(groceries_tran_dt[1])

##     items                     transactionID
## [1] {berries,                              
##      bottled beer,                         
##      bottled water,                        
##      brown bread,                          
##      butter,                               
##      candles,                              
##      chocolate,                            
##      citrus fruit,                         
##      cleaner,                              
##      coffee,                               
##      curd,                                 
##      dishes,                               
##      domestic eggs,                        
##      flower (seeds),                       
##      frozen potato products,               
##      frozen vegetables,                    
##      hamburger meat,                       
##      Instant food products,                
##      onions,                               
##      other vegetables,                     
##      sausage,                              
##      shopping bags,                        
##      sliced cheese,                        
##      soda,                                 
##      specialty chocolate,                  
##      tropical fruit,                       
##      waffles,                              
##      whipped/sour cream,                   
##      whole milk,                           
##      yogurt}                     01-01-2014

Transaction Size

By Customer Id

sprintf("There are %s unique products purchased and a total of %s transactions recorded", dim(groceries_tran)[2], dim(groceries_tran)[1])

## [1] "There are 167 unique products purchased and a total of 3898 transactions recorded"

By Transaction Date

sprintf("There are a total of %s unique products purchased on %s transaction days recorded", dim(groceries_tran_dt)[2], dim(groceries_tran_dt)[1])

## [1] "There are a total of 167 unique products purchased on 728 transaction days recorded"
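The two counts differ because the grouping key decides what a "transaction" is: grouping by Member_number pools everything one customer bought, while grouping by Date pools what all customers bought on that day. A small sketch on invented rows (the data frame `purchases` is a hypothetical stand-in for the CSV):

```r
# Toy long-format purchases; the values are invented for illustration only.
purchases <- data.frame(
  Member_number   = c(1000, 1000, 1000, 1001, 1001),
  Date            = c("01-01-2014", "01-01-2014", "05-03-2014",
                      "01-01-2014", "02-02-2015"),
  itemDescription = c("soda", "yogurt", "soda", "whole milk", "soda"),
  stringsAsFactors = FALSE
)

# Number of transactions under each grouping key.
n_by_member <- length(unique(purchases$Member_number))
n_by_date   <- length(unique(purchases$Date))

# Basket size per date: distinct items bought by anyone on that day.
basket_sizes_by_date <- lengths(
  lapply(split(purchases$itemDescription, purchases$Date), unique)
)

n_by_member
n_by_date
basket_sizes_by_date
```

In the toy data the basket for 01-01-2014 mixes items from both customers. The same pooling is a plausible reason why the date-based baskets in this article are so large.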

Let’s proceed to analyse transactions by Customer Id.

Support level for the last five items in transactions

itemFrequency(groceries_tran[,163:167])

## white bread  white wine  whole milk      yogurt    zwieback 
##  0.08876347  0.04412519  0.45818368  0.28296562  0.01539251
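The support of a single item is simply the share of transactions that contain it. As a sketch of what itemFrequency() reports, the same figure can be computed in base R as the column mean of a binary incidence matrix; the `baskets` list below is invented for illustration.

```r
# Toy baskets; invented for illustration only.
baskets <- list(
  c("whole milk", "yogurt"),
  c("whole milk", "soda"),
  c("soda"),
  c("whole milk", "yogurt", "soda")
)

items <- sort(unique(unlist(baskets)))

# Binary incidence matrix: one row per basket, one column per item.
incidence <- t(sapply(baskets, function(b) items %in% b))
colnames(incidence) <- items

# Item support = fraction of baskets in which the item occurs.
item_support <- colMeans(incidence)
item_support
```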

Visualization of Item Frequency with minimum support of 10%

itemFrequencyPlot(groceries_tran, support = 0.10, main="Item Frequency")

Visualization of Top 15 Most Frequent Items

Relative

itemFrequencyPlot(groceries_tran, topN = 15, main="Item Frequency")

Absolute

itemFrequencyPlot(groceries_tran, type=c("absolute"), topN = 15, main="Item Frequency")

Here, it can be seen that whole milk is the most frequent item and appears in over 40% of transactions.

Sparse Matrix Visualization by Customer ID

image(groceries_tran[1:50])

itemFrequency(groceries_tran[1:50,c(120:123,164:167)])

##            rice  roll products       rolls/buns root vegetables      white wine 
##            0.02            0.02            0.52            0.30            0.04 
##      whole milk          yogurt        zwieback 
##            0.46            0.38            0.00

Frequently purchased products show up as black points clustered along a vertical line. Two items (rolls/buns and whole milk) are present in more than 40% of the first 50 transactions.

Sparse matrix Visualization by Transaction Date

image(groceries_tran_dt[1:100])

Looking at the first 100 transaction dates, which cover the first few days of each month in 2014 and 2015, we can see that some items are purchased on almost every day while others are purchased only occasionally. A deeper dive into the occasionally purchased items might reveal some seasonality in purchasing patterns.

Association Rules

For Association Rules, frequent patterns are extracted in the form of X → Y rules with two measures of support and confidence. Support represents the percentage of transactions that contain both X and Y among all transactions in the dataset. Confidence expresses the fraction of transactions containing X that also contain Y.
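The two definitions above can be made concrete with a few lines of base R. This is a minimal sketch on an invented list of baskets; the helper names `contains` and `support` are hypothetical and not part of arules.

```r
# Toy baskets; invented for illustration only.
baskets <- list(
  c("whole milk", "yogurt"),
  c("whole milk", "soda"),
  c("soda"),
  c("whole milk", "yogurt", "soda"),
  c("yogurt")
)

# TRUE if every item of `itemset` is in `basket`.
contains <- function(basket, itemset) all(itemset %in% basket)

# Support: share of baskets that contain the whole itemset.
support <- function(itemset) {
  mean(sapply(baskets, contains, itemset = itemset))
}

X <- "yogurt"
Y <- "whole milk"

supp_xy <- support(c(X, Y))       # baskets with both X and Y / all baskets
conf_xy <- supp_xy / support(X)   # baskets with both / baskets with X

supp_xy
conf_xy
```

Here yogurt and whole milk occur together in 2 of 5 baskets, and yogurt occurs in 3, so the rule {yogurt} → {whole milk} has support 2/5 and confidence 2/3.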

Let’s find rules that refer to at least two products. For transactions by customer ID, the minimum support and confidence are 3% and 30% respectively: a rule should appear in at least 3% of all 3898 transactions and hold in at least 30% of the transactions where the antecedent item (or items) occurs. For transactions by date, where baskets are much larger, both thresholds are raised to 50% and rules are limited to at most four items. A sufficiently high confidence level gives some assurance that the occurrence of the consequent item is really associated with the occurrence of the antecedent.

Apriori Algorithm

Rules

By Customer ID

g_rules <- apriori(groceries_tran, parameter = list(support = 0.03, confidence = 0.3, minlen = 2))

By Transaction Date

g_dt_rules <- apriori(groceries_tran_dt, parameter = list(support = 0.5, confidence = 0.5, minlen = 2, maxlen=4))

Rules Summary

By Customer ID

summary(g_rules)

## set of 370 rules
## 
## rule length distribution (lhs + rhs):sizes
##   2   3   4 
## 182 180   8 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00    2.00    3.00    2.53    3.00    4.00 
## 
## summary of quality measures:
##     support          confidence        coverage            lift       
##  Min.   :0.03002   Min.   :0.3015   Min.   :0.05080   Min.   :0.9697  
##  1st Qu.:0.03438   1st Qu.:0.3710   1st Qu.:0.07773   1st Qu.:1.1224  
##  Median :0.04233   Median :0.4321   Median :0.10056   Median :1.1753  
##  Mean   :0.05206   Mean   :0.4399   Mean   :0.12342   Mean   :1.1873  
##  3rd Qu.:0.05817   3rd Qu.:0.5000   3rd Qu.:0.13982   3rd Qu.:1.2462  
##  Max.   :0.19138   Max.   :0.6569   Max.   :0.45818   Max.   :1.5547  
##      count      
##  Min.   :117.0  
##  1st Qu.:134.0  
##  Median :165.0  
##  Mean   :202.9  
##  3rd Qu.:226.8  
##  Max.   :746.0  
## 
## mining info:
##            data ntransactions support confidence
##  groceries_tran          3898    0.03        0.3
##                                                                                            call
##  apriori(data = groceries_tran, parameter = list(support = 0.03, confidence = 0.3, minlen = 2))

By Transaction Date

summary(g_dt_rules)

## set of 459 rules
## 
## rule length distribution (lhs + rhs):sizes
##   2   3   4 
## 140 219 100 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   2.000   3.000   2.913   3.000   4.000 
## 
## summary of quality measures:
##     support         confidence        coverage           lift       
##  Min.   :0.5000   Min.   :0.5452   Min.   :0.5206   Min.   :0.9805  
##  1st Qu.:0.5220   1st Qu.:0.7398   1st Qu.:0.6044   1st Qu.:0.9976  
##  Median :0.5522   Median :0.8518   Median :0.6854   Median :1.0032  
##  Mean   :0.5757   Mean   :0.8276   Mean   :0.7078   Mean   :1.0044  
##  3rd Qu.:0.6044   3rd Qu.:0.9172   3rd Qu.:0.8159   3rd Qu.:1.0099  
##  Max.   :0.8750   Max.   :0.9772   Max.   :0.9574   Max.   :1.0474  
##      count      
##  Min.   :364.0  
##  1st Qu.:380.0  
##  Median :402.0  
##  Mean   :419.1  
##  3rd Qu.:440.0  
##  Max.   :637.0  
## 
## mining info:
##               data ntransactions support confidence
##  groceries_tran_dt           728     0.5        0.5
##                                                                                                          call
##  apriori(data = groceries_tran_dt, parameter = list(support = 0.5, confidence = 0.5, minlen = 2, maxlen = 4))
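The minimum counts in the two summaries above are consistent with the support thresholds passed to apriori(): a relative support threshold translates into a minimum number of transactions. A short arithmetic sketch, where `min_count` is a hypothetical helper and not an arules function:

```r
# Smallest transaction count that reaches a given relative support.
min_count <- function(min_support, n_transactions) {
  ceiling(min_support * n_transactions)
}

min_count_customer <- min_count(0.03, 3898)  # 0.03 * 3898 = 116.94
min_count_date     <- min_count(0.5, 728)    # 0.5  * 728  = 364

min_count_customer
min_count_date
```

These values agree with the smallest counts reported in the summaries (117 and 364).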

Viewing a sample of 5 rules generated

By Customer ID

options(width = 250)
inspect(g_rules[366:370])

##     lhs                                           rhs                support    confidence coverage   lift     count
## [1] {other vegetables, rolls/buns, whole milk} => {yogurt}           0.03437660 0.4187500  0.08209338 1.479862 134  
## [2] {other vegetables, rolls/buns, soda}       => {whole milk}       0.03181119 0.6048780  0.05259107 1.320165 124  
## [3] {rolls/buns, soda, whole milk}             => {other vegetables} 0.03181119 0.4881890  0.06516162 1.296295 124  
## [4] {other vegetables, soda, whole milk}       => {rolls/buns}       0.03181119 0.4592593  0.06926629 1.313421 124  
## [5] {other vegetables, rolls/buns, whole milk} => {soda}             0.03181119 0.3875000  0.08209338 1.236068 124

By Transaction Date

options(width = 250)
inspect(g_dt_rules[411:415])

##     lhs                                                rhs                support   confidence coverage  lift      count
## [1] {soda, whole milk, yogurt}                      => {root vegetables}  0.5068681 0.7515275  0.6744505 0.9947491 369  
## [2] {rolls/buns, root vegetables, yogurt}           => {other vegetables} 0.5137363 0.9166667  0.5604396 1.0020020 374  
## [3] {other vegetables, root vegetables, yogurt}     => {rolls/buns}       0.5137363 0.9055690  0.5673077 1.0158001 374  
## [4] {other vegetables, rolls/buns, root vegetables} => {yogurt}           0.5137363 0.8237885  0.6236264 0.9995301 374  
## [5] {other vegetables, rolls/buns, yogurt}          => {root vegetables}  0.5137363 0.7586207  0.6771978 1.0041379 374

Let’s proceed with analysis of Transactions by Customer ID

Rules Analysis

By Support

 options(width = 250)
inspect(sort(g_rules, by = "support")[1:5])

##     lhs                   rhs                support   confidence coverage  lift     count
## [1] {other vegetables} => {whole milk}       0.1913802 0.5081744  0.3766034 1.109106 746  
## [2] {whole milk}       => {other vegetables} 0.1913802 0.4176932  0.4581837 1.109106 746  
## [3] {rolls/buns}       => {whole milk}       0.1785531 0.5106383  0.3496665 1.114484 696  
## [4] {whole milk}       => {rolls/buns}       0.1785531 0.3896976  0.4581837 1.114484 696  
## [5] {soda}             => {whole milk}       0.1511031 0.4819967  0.3134941 1.051973 589

By Confidence

 options(width = 250)
inspect(sort(g_rules, by = "confidence")[1:5])

##     lhs                                       rhs          support    confidence coverage   lift     count
## [1] {other vegetables, rolls/buns, yogurt} => {whole milk} 0.03437660 0.6568627  0.05233453 1.433623 134  
## [2] {bottled water, yogurt}                => {whole milk} 0.04027707 0.6061776  0.06644433 1.323001 157  
## [3] {bottled beer, rolls/buns}             => {whole milk} 0.03822473 0.6056911  0.06310929 1.321939 149  
## [4] {other vegetables, rolls/buns, soda}   => {whole milk} 0.03181119 0.6048780  0.05259107 1.320165 124  
## [5] {shopping bags, yogurt}                => {whole milk} 0.03309389 0.6028037  0.05489995 1.315638 129

By Lift level

 options(width = 250)
inspect(sort(g_rules, by = "lift")[1:5])

##     lhs                                           rhs       support    confidence coverage   lift     count
## [1] {rolls/buns, yogurt}                       => {sausage} 0.03565931 0.3202765  0.11133915 1.554717 139  
## [2] {rolls/buns, sausage}                      => {yogurt}  0.03565931 0.4330218  0.08234992 1.530298 139  
## [3] {other vegetables, yogurt}                 => {sausage} 0.03719856 0.3091684  0.12031811 1.500795 145  
## [4] {sausage, whole milk}                      => {yogurt}  0.04489482 0.4196643  0.10697794 1.483093 175  
## [5] {other vegetables, rolls/buns, whole milk} => {yogurt}  0.03437660 0.4187500  0.08209338 1.479862 134

Based on the above, the rule with the highest confidence is {other vegetables, rolls/buns, yogurt} => {whole milk}: whole milk appears in about 66% of the transactions that contain other vegetables, rolls/buns and yogurt, with a lift of 1.43. The highest lift, 1.55, belongs to {rolls/buns, yogurt} => {sausage}. A lift value greater than 1 means that item Y is more likely to be bought when item X is bought than it is overall, while a value less than 1 means that item Y is less likely to be bought when item X is bought.
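The lift and coverage columns can be reproduced from the other figures in the tables. As a worked check for the rule {other vegetables, rolls/buns, yogurt} => {whole milk}, using the standard definitions lift = confidence / support(rhs) and coverage = support / confidence (the variable names are illustrative):

```r
# Figures copied from the tables above.
supp_rule <- 0.03437660   # support of the rule
conf_rule <- 0.6568627    # confidence of the rule
supp_rhs  <- 0.4581837    # support of whole milk on its own

# Lift: how much more often the rhs occurs with the lhs than overall.
lift_rule <- conf_rule / supp_rhs

# Coverage: support of the lhs alone.
coverage_rule <- supp_rule / conf_rule

round(lift_rule, 6)
round(coverage_rule, 6)
```

Both results match the reported values (lift 1.433623, coverage 0.05233453) up to rounding.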

Rules Visualization

plot(g_rules)

Based on the graph above, the rules with the highest lift all have low support, at most about 0.075.

plot(g_rules, shading="order", control=list(main="Two-key plot"))

It can be seen that rules with 3 to 4 items reach higher confidence levels than rules with 2 items.

plot(g_rules, method="paracoord", control=list(reorder=TRUE))

The graph above shows whole milk and possibly other vegetables present in a large portion of total rules.

Matrix Based Visualization of Rules

plot(g_rules, method="matrix", measure="lift")

## Itemsets in Antecedent (LHS)
##   [1] "{other vegetables,rolls/buns,yogurt}"     "{rolls/buns,sausage}"                     "{rolls/buns,whole milk,yogurt}"           "{other vegetables,whole milk,yogurt}"     "{other vegetables,rolls/buns,whole milk}"
##   [6] "{rolls/buns,yogurt}"                      "{bottled beer,rolls/buns}"                "{sausage,yogurt}"                         "{other vegetables,rolls/buns,soda}"       "{shopping bags,yogurt}"                  
##  [11] "{other vegetables,soda,whole milk}"       "{pastry,yogurt}"                          "{beef,other vegetables}"                  "{bottled water,yogurt}"                   "{sausage,whole milk}"                    
##  [16] "{other vegetables,shopping bags}"         "{frankfurter,whole milk}"                 "{other vegetables,yogurt}"                "{rolls/buns,soda,whole milk}"             "{pip fruit,yogurt}"                      
##  [21] "{rolls/buns,shopping bags}"               "{fruit/vegetable juice,whole milk}"       "{other vegetables,pastry}"                "{citrus fruit,yogurt}"                    "{domestic eggs,rolls/buns}"              
##  [26] "{butter,whole milk}"                      "{other vegetables,sausage}"               "{shopping bags,whole milk}"               "{brown bread,whole milk}"                 "{ham}"                                   
##  [31] "{canned beer,other vegetables}"           "{beef,whole milk}"                        "{rolls/buns,whipped/sour cream}"          "{bottled water,root vegetables}"          "{newspapers,whole milk}"                 
##  [36] "{shopping bags,soda}"                     "{fruit/vegetable juice,other vegetables}" "{pastry,rolls/buns}"                      "{brown bread,rolls/buns}"                 "{brown bread,other vegetables}"          
##  [41] "{bottled beer,whole milk}"                "{soda,whole milk}"                        "{pip fruit,whole milk}"                   "{bottled water,rolls/buns}"               "{other vegetables,whole milk}"           
##  [46] "{pastry,whole milk}"                      "{chocolate}"                              "{bottled beer,other vegetables}"          "{root vegetables,yogurt}"                 "{rolls/buns,whole milk}"                 
##  [51] "{sausage,soda}"                           "{domestic eggs,other vegetables}"         "{whole milk,yogurt}"                      "{sugar}"                                  "{bottled water,other vegetables}"        
##  [56] "{bottled water,whole milk}"               "{other vegetables,soda}"                  "{frankfurter,rolls/buns}"                 "{whipped/sour cream,whole milk}"          "{citrus fruit,whole milk}"               
##  [61] "{butter,other vegetables}"                "{newspapers,rolls/buns}"                  "{soda,yogurt}"                            "{citrus fruit,rolls/buns}"                "{canned beer,soda}"                      
##  [66] "{other vegetables,rolls/buns}"            "{citrus fruit,other vegetables}"          "{domestic eggs,whole milk}"               "{tropical fruit,yogurt}"                  "{rolls/buns,soda}"                       
##  [71] "{other vegetables,pip fruit}"             "{waffles}"                                "{canned beer,rolls/buns}"                 "{meat}"                                   "{bottled water,soda}"                    
##  [76] "{other vegetables,whipped/sour cream}"    "{sausage}"                                "{UHT-milk}"                               "{frankfurter,other vegetables}"           "{pip fruit,rolls/buns}"                  
##  [81] "{pork,whole milk}"                        "{other vegetables,tropical fruit}"        "{oil}"                                    "{root vegetables,sausage}"                "{other vegetables,root vegetables}"      
##  [86] "{hamburger meat}"                         "{root vegetables,whole milk}"             "{rolls/buns,root vegetables}"             "{curd}"                                   "{onions}"                                
##  [91] "{shopping bags}"                          "{other vegetables,pork}"                  "{fruit/vegetable juice}"                  "{rolls/buns,tropical fruit}"              "{ice cream}"                             
##  [96] "{napkins}"                                "{butter}"                                 "{tropical fruit,whole milk}"              "{salty snack}"                            "{frozen vegetables}"                     
## [101] "{pip fruit,soda}"                         "{root vegetables,soda}"                   "{berries}"                                "{white bread}"                            "{canned beer,whole milk}"                
## [106] "{pip fruit}"                              "{frankfurter}"                            "{frozen meals}"                           "{bottled beer}"                           "{yogurt}"                                
## [111] "{cream cheese }"                          "{bottled water}"                          "{brown bread}"                            "{newspapers}"                             "{pastry}"                                
## [116] "{domestic eggs}"                          "{coffee}"                                 "{rolls/buns}"                             "{margarine}"                              "{canned beer}"                           
## [121] "{beef}"                                   "{whole milk}"                             "{citrus fruit}"                           "{dessert}"                                "{other vegetables}"                      
## [126] "{root vegetables}"                        "{pork}"                                   "{chicken}"                                "{tropical fruit}"                         "{pastry,soda}"                           
## [131] "{whipped/sour cream}"                     "{citrus fruit,soda}"                      "{soda}"                                   "{butter milk}"                            "{soda,tropical fruit}"                   
## Itemsets in Consequent (RHS)
## [1] "{soda}"             "{other vegetables}" "{whole milk}"       "{rolls/buns}"       "{yogurt}"           "{tropical fruit}"   "{root vegetables}"  "{sausage}"

It can be seen that whole milk, other vegetables and rolls/buns dominate the rules. However, yogurt and sausage have higher lift levels in the rules where they appear.

Group Based Visualization of Rules

plot(g_rules, method="group")

Graph Based Visualization of Rules

plot(g_rules, method="graph", max = 20)

As stated earlier, yogurt and sausage have higher lift levels in the rules where they appear.

Products associated with Yogurt

yogurt_rules <- subset(g_rules, items %in% "yogurt")
yogurt_rules

## set of 85 rules

options(width = 250)
inspect(yogurt_rules[80:85])

##     lhs                                           rhs                support    confidence coverage   lift     count
## [1] {whole milk, yogurt}                       => {other vegetables} 0.07183171 0.4770017  0.15059005 1.266589 280  
## [2] {other vegetables, whole milk}             => {yogurt}           0.07183171 0.3753351  0.19138019 1.326434 280  
## [3] {other vegetables, rolls/buns, yogurt}     => {whole milk}       0.03437660 0.6568627  0.05233453 1.433623 134  
## [4] {rolls/buns, whole milk, yogurt}           => {other vegetables} 0.03437660 0.5214008  0.06593125 1.384482 134  
## [5] {other vegetables, whole milk, yogurt}     => {rolls/buns}       0.03437660 0.4785714  0.07183171 1.368651 134  
## [6] {other vegetables, rolls/buns, whole milk} => {yogurt}           0.03437660 0.4187500  0.08209338 1.479862 134

Here we can observe that yogurt is often purchased alongside whole milk, other vegetables and rolls/buns.

What purchases make people buy margarine?

Margarine Rules

options(width = 250)
# Mine only rules whose consequent (rhs) is margarine.
rules.margarine <- apriori(data = groceries_tran, parameter = list(supp = 0.006, conf = 0.2),
                           appearance = list(default = "lhs", rhs = "margarine"), control = list(verbose = FALSE))
rules.margarine.byconf <- sort(rules.margarine, by = "confidence", decreasing = TRUE)
inspect(head(rules.margarine.byconf, 3))

##     lhs                                          rhs         support     confidence coverage   lift     count
## [1] {butter, frankfurter}                     => {margarine} 0.006413545 0.2941176  0.02180605 2.514190 25   
## [2] {frankfurter, shopping bags}              => {margarine} 0.006157004 0.2500000  0.02462801 2.137061 24   
## [3] {other vegetables, shopping bags, yogurt} => {margarine} 0.006670087 0.2363636  0.02821960 2.020494 26

 options(width = 150)
inspect(supportingTransactions(rules.margarine, groceries_tran)[1:3])

##   items                                
## 1 {baking powder} => {margarine}       
## 2 {curd,sausage} => {margarine}        
## 3 {curd,root vegetables} => {margarine}
##   transactionIDs                                                                                                                
## 1 {1110,1765,1948,2601,2625,2676,3062,3180,3363,3773,3903,3931,3960,4106,4178,4199,4442,4563,4600,4635,4761,4812,4864,4872,4983}
## 2 {1309,1582,1741,1777,2171,2592,2625,2696,2794,3046,3180,3589,3827,3830,3899,3919,4312,4430,4433,4455,4573,4718,4761,4812,4966}
## 3 {1146,1234,1248,1747,2056,2601,3100,3138,3180,3289,3556,3818,3827,3830,3919,3925,3960,4113,4199,4312,4455,4485,4773,4835}

Rules Visualization

plot(rules.margarine)

Ice cream purchase analysis

Ice-cream Rules

options(width = 250)
# Mine only rules whose consequent (rhs) is ice cream, using transactions by date.
rules.ice_cream <- apriori(data = groceries_tran_dt, parameter = list(supp = 0.006, conf = 0.2),
                           appearance = list(default = "lhs", rhs = "ice cream"), control = list(verbose = FALSE))
rules.ice_cream.byconf <- sort(rules.ice_cream, by = "confidence", decreasing = TRUE)
inspect(head(rules.ice_cream.byconf, 3))

##     lhs                                         rhs         support     confidence coverage    lift     count
## [1] {herbs, organic sausage}                 => {ice cream} 0.006868132 1          0.006868132 3.516908 5    
## [2] {herbs, organic sausage, pip fruit}      => {ice cream} 0.006868132 1          0.006868132 3.516908 5    
## [3] {herbs, organic sausage, tropical fruit} => {ice cream} 0.006868132 1          0.006868132 3.516908 5

Rules Visualization

plot(rules.ice_cream)

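The top rules suggest that on days when herbs and organic sausage are purchased, ice cream is purchased as well. However, each of these rules is supported by only 5 of the 728 transaction days, and a day's transaction pools the purchases of all customers, so this association should be treated with caution.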
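A quick arithmetic check shows how thin the evidence behind these ice cream rules is. The sketch below uses only figures from the table above; the number of ice cream days is inferred from the rounded lift and is not reported directly in this article.

```r
n_days     <- 728        # transaction days
count_rule <- 5          # days on which the whole rule holds
conf_rule  <- 1          # reported confidence
lift_rule  <- 3.516908   # reported lift

# Support of the rule as a share of all days.
supp_rule <- count_rule / n_days

# Since lift = confidence / support(rhs), the support of ice cream follows.
supp_ice_cream <- conf_rule / lift_rule
days_ice_cream <- round(supp_ice_cream * n_days)  # inferred, not reported

supp_rule
supp_ice_cream
days_ice_cream
```

Under these assumptions ice cream is sold on roughly 28% of days anyway, so a confidence of 1 over five days is weak evidence of a real association.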

ECLAT Algorithm

g_items_freq <- eclat(groceries_tran, parameter=list(supp=0.03))

Top 5 Frequent Items

inspect(sort(g_items_freq, by = "support")[1:5])

##     items              support   count
## [1] {whole milk}       0.4581837 1786 
## [2] {other vegetables} 0.3766034 1468 
## [3] {rolls/buns}       0.3496665 1363 
## [4] {soda}             0.3134941 1222 
## [5] {yogurt}           0.2829656 1103

Whole milk is the most frequent item and appeared 1786 times in the 3898 transactions.
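Eclat is usually described as working on a vertical layout: for each item it keeps the list of transaction IDs that contain it, and the support of an itemset comes from intersecting those lists. The following base-R sketch illustrates that idea on invented baskets; it is not the arules implementation, and `tidlists` and `support_of` are hypothetical names.

```r
# Toy baskets keyed by transaction ID; invented for illustration only.
baskets <- list(
  t1 = c("whole milk", "yogurt"),
  t2 = c("whole milk", "soda"),
  t3 = c("soda"),
  t4 = c("whole milk", "yogurt", "soda")
)

items <- sort(unique(unlist(baskets)))

# Vertical layout: for each item, the IDs of the transactions containing it.
tidlists <- lapply(setNames(items, items), function(item) {
  names(baskets)[sapply(baskets, function(b) item %in% b)]
})

# Support of an itemset = size of the intersection of its items' ID lists.
support_of <- function(itemset) {
  tids <- Reduce(intersect, tidlists[itemset])
  length(tids) / length(baskets)
}

supp_milk        <- support_of("whole milk")
supp_milk_yogurt <- support_of(c("whole milk", "yogurt"))

tidlists
supp_milk
supp_milk_yogurt
```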

 options(width = 150)
summary(g_items_freq)

## set of 415 itemsets
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda           yogurt          (Other) 
##              107               82               75               56               51              476 
## 
## element (itemset/transaction) length distribution:sizes
##   1   2   3   4 
##  72 256  85   2 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   2.000   2.041   2.000   4.000 
## 
## summary of quality measures:
##     support            count       
##  Min.   :0.03002   Min.   : 117.0  
##  1st Qu.:0.03527   1st Qu.: 137.5  
##  Median :0.04310   Median : 168.0  
##  Mean   :0.05969   Mean   : 232.7  
##  3rd Qu.:0.06478   3rd Qu.: 252.5  
##  Max.   :0.45818   Max.   :1786.0  
## 
## includes transaction ID lists: FALSE 
## 
## mining info:
##            data ntransactions support                                                        call
##  groceries_tran          3898    0.03 eclat(data = groceries_tran, parameter = list(supp = 0.03))

g_freq_rules <- ruleInduction(g_items_freq, groceries_tran, confidence=0.3, minlen=2)
g_freq_rules

## set of 370 rules

Rules Analysis

By Support

 options(width = 150)
inspect(sort(g_freq_rules, by = "support")[1:5])

##     lhs                   rhs                support   confidence lift     itemset
## [1] {whole milk}       => {other vegetables} 0.1913802 0.4176932  1.109106 343    
## [2] {other vegetables} => {whole milk}       0.1913802 0.5081744  1.109106 343    
## [3] {whole milk}       => {rolls/buns}       0.1785531 0.3896976  1.114484 341    
## [4] {rolls/buns}       => {whole milk}       0.1785531 0.5106383  1.114484 341    
## [5] {whole milk}       => {soda}             0.1511031 0.3297872  1.051973 337
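Rule induction derives rules from frequent itemsets by dividing the itemset's count by the count of the chosen antecedent. As a worked check for the itemset {whole milk, other vegetables}, using counts reported earlier in this article (746 joint transactions, 1786 with whole milk, 1468 with other vegetables, 3898 in total):

```r
n_transactions <- 3898
count_milk     <- 1786   # transactions containing whole milk
count_veg      <- 1468   # transactions containing other vegetables
count_both     <- 746    # transactions containing both

# Support of the itemset is shared by both rules derived from it.
supp_both <- count_both / n_transactions

# Confidence depends on which item is the antecedent.
conf_milk_to_veg <- count_both / count_milk
conf_veg_to_milk <- count_both / count_veg

round(c(supp_both, conf_milk_to_veg, conf_veg_to_milk), 7)
```

Both confidences exceed the 0.3 threshold, which is why the single itemset yields the two rules at the top of the table, each with the same support.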

By Confidence

 options(width = 150)
inspect(sort(g_freq_rules, by = "confidence")[1:5])

##     lhs                                       rhs          support    confidence lift     itemset
## [1] {other vegetables, rolls/buns, yogurt} => {whole milk} 0.03437660 0.6568627  1.433623 325    
## [2] {bottled water, yogurt}                => {whole milk} 0.04027707 0.6061776  1.323001 279    
## [3] {bottled beer, rolls/buns}             => {whole milk} 0.03822473 0.6056911  1.321939 167    
## [4] {other vegetables, rolls/buns, soda}   => {whole milk} 0.03181119 0.6048780  1.320165 333    
## [5] {shopping bags, yogurt}                => {whole milk} 0.03309389 0.6028037  1.315638 182

By Lift Level

 options(width = 150)
inspect(sort(g_freq_rules, by = "lift")[1:5])

##     lhs                                           rhs       support    confidence lift     itemset
## [1] {rolls/buns, yogurt}                       => {sausage} 0.03565931 0.3202765  1.554717 263    
## [2] {rolls/buns, sausage}                      => {yogurt}  0.03565931 0.4330218  1.530298 263    
## [3] {other vegetables, yogurt}                 => {sausage} 0.03719856 0.3091684  1.500795 262    
## [4] {sausage, whole milk}                      => {yogurt}  0.04489482 0.4196643  1.483093 261    
## [5] {other vegetables, rolls/buns, whole milk} => {yogurt}  0.03437660 0.4187500  1.479862 325

Rules Visualization

plot(g_freq_rules, method="graph", max = 15)

Summary

We employed the Apriori and Eclat algorithms to find associations between items purchased by customers in a Groceries dataset. The most frequently purchased item in the dataset was whole milk. Apriori generated 370 rules from the transactions by customer ID. Eclat generated 415 frequent itemsets at the same 3% minimum support, from which 370 rules were induced. Eclat is often reported to be faster than Apriori, although processing time was not measured here. The transactions by date also hinted at possible seasonality in the purchase pattern of some items.

References

https://www.kaggle.com/heeraldedhia/groceries-dataset

https://www.sciencedirect.com/topics/computer-science/minimum-confidence

https://www.kdnuggets.com/2016/04/association-rules-apriori-algorithm-tutorial.html

Odomero Omokahfe

2/26/2022
