Introduction

The aim of this study is to apply association rule mining to identify patterns and dependencies in consumer purchasing behavior, based on market baskets from a grocery store. The dataset used for this analysis was sourced from Kaggle (https://www.kaggle.com/datasets/heeraldedhia/groceries-dataset) and contains transactional data about customer purchases.

As a bonus, at the end we compare the computation times of ECLAT and Apriori to see which algorithm is faster on this dataset.

Libraries

library(arules)   
library(arulesViz)
library(dplyr)

Data Preparation

The dataset “Groceries_dataset.csv” was preprocessed in R to structure the transactional data in a format suitable for association rule analysis. The data was grouped by customer (Member_number), and all purchased items were concatenated into a single row per customer. Then, using arules, the dataset was converted into a transactions object, where each row represents one customer's market basket.

data <- read.csv("Groceries_dataset.csv")
head(data)
##   Member_number       Date  itemDescription
## 1          1808 21-07-2015   tropical fruit
## 2          2552 05-01-2015       whole milk
## 3          2300 19-09-2015        pip fruit
## 4          1187 12-12-2015 other vegetables
## 5          3037 01-02-2015       whole milk
## 6          4941 14-02-2015       rolls/buns
summary(data)
##  Member_number      Date           itemDescription   
##  Min.   :1000   Length:38765       Length:38765      
##  1st Qu.:2002   Class :character   Class :character  
##  Median :3005   Mode  :character   Mode  :character  
##  Mean   :3004                                        
##  3rd Qu.:4007                                        
##  Max.   :5000

The current format, with one purchased item per row, is not suitable for further analysis. We need to group the products by Member_number so that each customer forms one transaction.

data_grouped <- data %>%
  group_by(Member_number) %>%
  summarise(Items = paste(itemDescription, collapse = ", "))
data_grouped <- data_grouped[,2]
write.table(data_grouped, file = "data_grouped.csv", sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE)
transactions<-read.transactions("data_grouped.csv", format="basket",
                                sep=",", skip=0, quote="", rm.duplicates = FALSE)
## Warning in asMethod(object): removing duplicated items in transactions
summary(transactions)
## transactions as itemMatrix in sparse format with
##  3898 rows (elements/itemsets/transactions) and
##  167 columns (items) and a density of 0.05340678 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             1786             1468             1363             1222 
##           yogurt          (Other) 
##             1103            27824 
## 
## element (itemset/transaction) length distribution:
## sizes
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
##   6 248  87 331 261 381 303 332 340 296 276 238 181 179 123  97  66  46  39  28 
##  21  22  23  24  25  26 
##  15  13   3   5   2   2 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   6.000   8.500   8.919  12.000  26.000 
## 
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics

The dataset consists of 3898 transactions and 167 unique items. The most frequently purchased items were: whole milk, other vegetables, rolls/buns, soda and yogurt. The transaction size varies from 1 to 26 items, with a median of 8.5 items and an average of 8.9 items per transaction.
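The grouping step can also be done without writing an intermediate CSV file. The following is a minimal sketch on a toy purchase log (hypothetical data with the same column names as the Kaggle file), assuming the list-to-transactions coercion provided by arules. A customer who buys the same product on several dates ends up with repeated items in the basket, which is what the warning above refers to, so the sketch removes repeats explicitly.

```r
library(arules)

# Toy purchase log (hypothetical data, same column names as the Kaggle file)
toy <- data.frame(
  Member_number   = c(1, 1, 1, 2, 2),
  itemDescription = c("whole milk", "soda", "whole milk", "yogurt", "soda"),
  stringsAsFactors = FALSE
)

# One character vector of items per customer
baskets <- split(toy$itemDescription, toy$Member_number)

# Drop repeated items explicitly, since a basket is a set of distinct items
baskets <- lapply(baskets, unique)

# Coerce the named list to a transactions object
toy_trans <- as(baskets, "transactions")

summary(size(toy_trans))  # distribution of basket sizes
```

On the real data the same steps would start from the data frame read with read.csv() instead of the toy log.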

We can also inspect a few baskets.

inspect(transactions[1:10])
##      items                       
## [1]  {canned beer,               
##       hygiene articles,          
##       misc. beverages,           
##       pastry,                    
##       pickled vegetables,        
##       salty snack,               
##       sausage,                   
##       semi-finished bread,       
##       soda,                      
##       whole milk,                
##       yogurt}                    
## [2]  {beef,                      
##       curd,                      
##       frankfurter,               
##       rolls/buns,                
##       sausage,                   
##       soda,                      
##       whipped/sour cream,        
##       white bread,               
##       whole milk}                
## [3]  {butter,                    
##       butter milk,               
##       frozen vegetables,         
##       other vegetables,          
##       specialty chocolate,       
##       sugar,                     
##       tropical fruit,            
##       whole milk}                
## [4]  {dental care,               
##       detergent,                 
##       frozen meals,              
##       rolls/buns,                
##       root vegetables,           
##       sausage}                   
## [5]  {canned beer,               
##       chocolate,                 
##       cling film/bags,           
##       dish cleaner,              
##       frozen fish,               
##       hygiene articles,          
##       other vegetables,          
##       packaged fruit/vegetables, 
##       pastry,                    
##       pip fruit,                 
##       red/blush wine,            
##       rolls/buns,                
##       root vegetables,           
##       shopping bags,             
##       tropical fruit,            
##       whole milk}                
## [6]  {margarine,                 
##       rolls/buns,                
##       whipped/sour cream}        
## [7]  {bottled beer,              
##       bottled water,             
##       chicken,                   
##       chocolate,                 
##       flour,                     
##       frankfurter,               
##       rice,                      
##       rolls/buns,                
##       shopping bags,             
##       skin care,                 
##       softener,                  
##       whole milk}                
## [8]  {dessert,                   
##       domestic eggs,             
##       hamburger meat,            
##       liquor (appetizer),        
##       liver loaf,                
##       photo/film,                
##       root vegetables,           
##       soda,                      
##       tropical fruit,            
##       white wine,                
##       yogurt}                    
## [9]  {canned fish,               
##       cocoa drinks,              
##       herbs,                     
##       ketchup,                   
##       newspapers,                
##       pastry,                    
##       tropical fruit,            
##       yogurt}                    
## [10] {bottled water,             
##       candles,                   
##       coffee,                    
##       frankfurter,               
##       kitchen towels,            
##       pip fruit,                 
##       rolls/buns,                
##       sliced cheese,             
##       specialty bar,             
##       UHT-milk}

Most Frequent and Least Frequent Items

Below we can see charts displaying the 25 most popular products, presented in both relative and absolute terms.

itemFrequencyPlot(transactions, topN=25, type="relative", main="Item Frequency Plot - relative", ylim=c(0, 0.5), col='orange') 

itemFrequencyPlot(transactions, topN=25, type="absolute", main="Item Frequency Plot - absolute", ylim=c(0, 2000.0), col='yellow') 

Most Frequent Items Table:

##                     Item Occurrences
## 1             whole milk        1786
## 2       other vegetables        1468
## 3             rolls/buns        1363
## 4                   soda        1222
## 5                 yogurt        1103
## 6         tropical fruit         911
## 7        root vegetables         899
## 8          bottled water         833
## 9                sausage         803
## 10          citrus fruit         723
## 11                pastry         692
## 12             pip fruit         665
## 13         shopping bags         656
## 14           canned beer         644
## 15          bottled beer         619
## 16    whipped/sour cream         603
## 17            newspapers         545
## 18           frankfurter         536
## 19           brown bread         530
## 20         domestic eggs         519
## 21                  pork         516
## 22                butter         493
## 23 fruit/vegetable juice         487
## 24                  curd         471
## 25                  beef         466

Least Frequent Items Table:

##                      Item Occurrences
## 1         kitchen utensil           1
## 2   preservation products           1
## 3          baby cosmetics           3
## 4                    bags           4
## 5          frozen chicken           5
## 6         make up remover           5
## 7         rubbing alcohol           5
## 8          toilet cleaner           5
## 9          salad dressing           6
## 10                 whisky           8
## 11            decalcifier           9
## 12             hair spray           9
## 13                liqueur           9
## 14       organic products          10
## 15          frozen fruits          11
## 16   specialty vegetables          11
## 17                  cream          12
## 18                  honey          13
## 19      cooking chocolate          15
## 20            ready soups          15
## 21           cocoa drinks          16
## 22 flower soil/fertilizer          16
## 23         pudding powder          16
## 24       bathroom cleaner          17
## 25               cookware          17
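The code that produced the two tables is not shown in this report. One possible way to build such tables, sketched here on toy baskets (hypothetical data), is to take absolute item counts with itemFrequency() from arules and sort them; the names counts and freq_table are illustrative.

```r
library(arules)

# Toy baskets (hypothetical data, only to keep the sketch self-contained)
toy_trans <- as(list(
  c("whole milk", "soda"),
  c("whole milk", "yogurt"),
  c("whole milk", "soda", "honey")
), "transactions")

# Absolute item counts, sorted from most to least frequent
counts <- sort(itemFrequency(toy_trans, type = "absolute"), decreasing = TRUE)

freq_table <- data.frame(
  Item        = names(counts),
  Occurrences = as.integer(counts),
  row.names   = NULL
)

head(freq_table, 2)  # most frequent items
tail(freq_table, 2)  # least frequent items
```

With the full transactions object, head(freq_table, 25) and tail(freq_table, 25) would give tables of the same shape as the ones above.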

Association Rules

To analyze purchasing behavior, we will now explore association rules, which reveal relationships between the occurrence of two or more products. These rules will help us identify patterns in transactions.

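Association rules are described by three key metrics:

Support - the share of all transactions that contain every item in the rule (antecedent and consequent together).

Confidence - the share of transactions containing the antecedent that also contain the consequent.

Lift - the confidence divided by the overall frequency of the consequent; values above 1 indicate that the items occur together more often than expected under independence.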

Apriori algorithm

We will apply the Apriori algorithm, which helps filter out less significant rules by setting a minimum support and confidence threshold. This algorithm follows the principle that if a combination of items is frequent, its subsets must also be frequent, and if an item is infrequent, any set containing it will be infrequent as well.
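The principle described above can be checked on a small example. This is a minimal base R sketch on toy baskets (hypothetical data); the helper support() is written for illustration and is not part of arules.

```r
# Toy baskets (hypothetical data)
baskets <- list(
  c("milk", "bread", "butter"),
  c("milk", "bread"),
  c("milk", "soda"),
  c("bread", "butter")
)

# Support = share of baskets that contain every item of the itemset
support <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

s_milk       <- support("milk", baskets)
s_bread      <- support("bread", baskets)
s_milk_bread <- support(c("milk", "bread"), baskets)

# An itemset can never be more frequent than any of its subsets
s_milk_bread <= min(s_milk, s_bread)
```

This is why Apriori can discard every candidate that contains an infrequent item without counting it: adding items to a set can only lower its support.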

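For our analysis, we extract rules that meet a minimum support of 4% and a minimum confidence of 43%. This means that the items of each rule occur together in at least 4% of all transactions and that the consequent appears in at least 43% of the transactions containing the antecedent. Note that the call below leaves minlen at its default of 1, so the result also includes one rule with an empty antecedent, {} => {whole milk}, which simply restates how often whole milk is bought.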
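To restrict the result to rules that involve at least two products, the minlen parameter can be set to 2; the call used in this study keeps the default of 1, as its parameter printout shows. Below is a minimal sketch on toy baskets (hypothetical data) with illustrative thresholds, not the ones used in the study.

```r
library(arules)

# Toy baskets (hypothetical data, only to keep the sketch self-contained)
toy_trans <- as(list(
  c("whole milk", "rolls/buns"),
  c("whole milk", "rolls/buns", "soda"),
  c("whole milk", "soda"),
  c("rolls/buns", "soda"),
  c("whole milk", "rolls/buns")
), "transactions")

# minlen = 2 excludes rules with an empty antecedent;
# supp and conf here are illustrative values for the toy data
toy_rules <- apriori(
  toy_trans,
  parameter = list(supp = 0.4, conf = 0.5, minlen = 2),
  control   = list(verbose = FALSE)
)

# Every rule now contains at least two items (lhs + rhs)
all(size(toy_rules) >= 2)
```

Applied to the real data with supp = 0.04 and conf = 0.43, this setting would only drop the rule with the empty antecedent.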

rules<-apriori(transactions, parameter=list(supp=0.04, conf=0.43)) 
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.43    0.1    1 none FALSE            TRUE       5    0.04      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 155 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[167 item(s), 3898 transaction(s)] done [0.00s].
## sorting and recoding items ... [60 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [93 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

The algorithm found 93 rules. All of them are listed below.

inspect(rules)
##      lhs                                    rhs                support   
## [1]  {}                                  => {whole milk}       0.45818368
## [2]  {hamburger meat}                    => {whole milk}       0.04540790
## [3]  {UHT-milk}                          => {whole milk}       0.04053361
## [4]  {napkins}                           => {whole milk}       0.04309903
## [5]  {dessert}                           => {whole milk}       0.04079015
## [6]  {cream cheese}                      => {whole milk}       0.04771678
## [7]  {chocolate}                         => {whole milk}       0.04797332
## [8]  {white bread}                       => {whole milk}       0.04771678
## [9]  {chicken}                           => {whole milk}       0.05002565
## [10] {frozen vegetables}                 => {other vegetables} 0.04566444
## [11] {frozen vegetables}                 => {whole milk}       0.05515649
## [12] {coffee}                            => {whole milk}       0.05515649
## [13] {margarine}                         => {whole milk}       0.05951770
## [14] {beef}                              => {whole milk}       0.06413545
## [15] {curd}                              => {whole milk}       0.06362237
## [16] {fruit/vegetable juice}             => {whole milk}       0.06233966
## [17] {butter}                            => {other vegetables} 0.05720883
## [18] {butter}                            => {whole milk}       0.06618779
## [19] {pork}                              => {whole milk}       0.06695741
## [20] {domestic eggs}                     => {whole milk}       0.07029246
## [21] {brown bread}                       => {other vegetables} 0.05977424
## [22] {brown bread}                       => {whole milk}       0.06977937
## [23] {newspapers}                        => {whole milk}       0.07234479
## [24] {frankfurter}                       => {other vegetables} 0.06105695
## [25] {frankfurter}                       => {whole milk}       0.06798358
## [26] {whipped/sour cream}                => {other vegetables} 0.06695741
## [27] {whipped/sour cream}                => {whole milk}       0.07978450
## [28] {bottled beer}                      => {other vegetables} 0.06849666
## [29] {bottled beer}                      => {whole milk}       0.08542842
## [30] {shopping bags}                     => {other vegetables} 0.07311442
## [31] {shopping bags}                     => {whole milk}       0.09132889
## [32] {canned beer}                       => {whole milk}       0.08722422
## [33] {pip fruit}                         => {whole milk}       0.08696768
## [34] {pastry}                            => {whole milk}       0.09107234
## [35] {citrus fruit}                      => {whole milk}       0.09235505
## [36] {sausage}                           => {other vegetables} 0.09286814
## [37] {sausage}                           => {whole milk}       0.10697794
## [38] {bottled water}                     => {other vegetables} 0.09389430
## [39] {bottled water}                     => {whole milk}       0.11236532
## [40] {tropical fruit}                    => {whole milk}       0.11646998
## [41] {root vegetables}                   => {whole milk}       0.11313494
## [42] {yogurt}                            => {whole milk}       0.15059005
## [43] {soda}                              => {whole milk}       0.15110313
## [44] {rolls/buns}                        => {whole milk}       0.17855310
## [45] {other vegetables}                  => {whole milk}       0.19138019
## [46] {rolls/buns, shopping bags}         => {whole milk}       0.04130323
## [47] {shopping bags, whole milk}         => {rolls/buns}       0.04130323
## [48] {other vegetables, shopping bags}   => {whole milk}       0.04284248
## [49] {shopping bags, whole milk}         => {other vegetables} 0.04284248
## [50] {pastry, rolls/buns}                => {whole milk}       0.04027707
## [51] {pastry, whole milk}                => {rolls/buns}       0.04027707
## [52] {other vegetables, pastry}          => {whole milk}       0.04181632
## [53] {pastry, whole milk}                => {other vegetables} 0.04181632
## [54] {citrus fruit, other vegetables}    => {whole milk}       0.04258594
## [55] {citrus fruit, whole milk}          => {other vegetables} 0.04258594
## [56] {sausage, yogurt}                   => {whole milk}       0.04489482
## [57] {sausage, soda}                     => {whole milk}       0.04002052
## [58] {rolls/buns, sausage}               => {other vegetables} 0.04181632
## [59] {other vegetables, sausage}         => {rolls/buns}       0.04181632
## [60] {rolls/buns, sausage}               => {whole milk}       0.04874295
## [61] {sausage, whole milk}               => {rolls/buns}       0.04874295
## [62] {other vegetables, sausage}         => {whole milk}       0.05028220
## [63] {sausage, whole milk}               => {other vegetables} 0.05028220
## [64] {bottled water, yogurt}             => {whole milk}       0.04027707
## [65] {bottled water, soda}               => {whole milk}       0.04002052
## [66] {bottled water, rolls/buns}         => {whole milk}       0.04515136
## [67] {bottled water, other vegetables}   => {whole milk}       0.05618266
## [68] {bottled water, whole milk}         => {other vegetables} 0.05618266
## [69] {tropical fruit, yogurt}            => {whole milk}       0.04232940
## [70] {rolls/buns, tropical fruit}        => {whole milk}       0.04643407
## [71] {other vegetables, tropical fruit}  => {whole milk}       0.05053874
## [72] {tropical fruit, whole milk}        => {other vegetables} 0.05053874
## [73] {root vegetables, soda}             => {whole milk}       0.04309903
## [74] {rolls/buns, root vegetables}       => {other vegetables} 0.04130323
## [75] {other vegetables, root vegetables} => {rolls/buns}       0.04130323
## [76] {rolls/buns, root vegetables}       => {whole milk}       0.04797332
## [77] {other vegetables, root vegetables} => {whole milk}       0.04540790
## [78] {soda, yogurt}                      => {rolls/buns}       0.04232940
## [79] {soda, yogurt}                      => {other vegetables} 0.04309903
## [80] {soda, yogurt}                      => {whole milk}       0.05438687
## [81] {rolls/buns, yogurt}                => {other vegetables} 0.05233453
## [82] {other vegetables, yogurt}          => {rolls/buns}       0.05233453
## [83] {rolls/buns, yogurt}                => {whole milk}       0.06593125
## [84] {whole milk, yogurt}                => {rolls/buns}       0.06593125
## [85] {other vegetables, yogurt}          => {whole milk}       0.07183171
## [86] {whole milk, yogurt}                => {other vegetables} 0.07183171
## [87] {rolls/buns, soda}                  => {other vegetables} 0.05259107
## [88] {rolls/buns, soda}                  => {whole milk}       0.06516162
## [89] {soda, whole milk}                  => {rolls/buns}       0.06516162
## [90] {other vegetables, soda}            => {whole milk}       0.06926629
## [91] {soda, whole milk}                  => {other vegetables} 0.06926629
## [92] {other vegetables, rolls/buns}      => {whole milk}       0.08209338
## [93] {rolls/buns, whole milk}            => {other vegetables} 0.08209338
##      confidence coverage   lift     count
## [1]  0.4581837  1.00000000 1.000000 1786 
## [2]  0.5654952  0.08029759 1.234211  177 
## [3]  0.5163399  0.07850180 1.126928  158 
## [4]  0.5299685  0.08132376 1.156672  168 
## [5]  0.4718101  0.08645459 1.029740  159 
## [6]  0.5391304  0.08850693 1.176669  186 
## [7]  0.5548961  0.08645459 1.211078  187 
## [8]  0.5375723  0.08876347 1.173268  186 
## [9]  0.4974490  0.10056439 1.085698  195 
## [10] 0.4450000  0.10261673 1.181614  178 
## [11] 0.5375000  0.10261673 1.173110  215 
## [12] 0.4799107  0.11493073 1.047420  215 
## [13] 0.5087719  0.11698307 1.110410  232 
## [14] 0.5364807  0.11954849 1.170886  250 
## [15] 0.5265393  0.12083120 1.149188  248 
## [16] 0.4989733  0.12493586 1.089025  243 
## [17] 0.4523327  0.12647512 1.201085  223 
## [18] 0.5233266  0.12647512 1.142176  258 
## [19] 0.5058140  0.13237558 1.103955  261 
## [20] 0.5279383  0.13314520 1.152242  274 
## [21] 0.4396226  0.13596716 1.167336  233 
## [22] 0.5132075  0.13596716 1.120091  272 
## [23] 0.5174312  0.13981529 1.129310  282 
## [24] 0.4440299  0.13750641 1.179038  238 
## [25] 0.4944030  0.13750641 1.079050  265 
## [26] 0.4328358  0.15469472 1.149315  261 
## [27] 0.5157546  0.15469472 1.125650  311 
## [28] 0.4313409  0.15879938 1.145345  267 
## [29] 0.5379645  0.15879938 1.174124  333 
## [30] 0.4344512  0.16829143 1.153604  285 
## [31] 0.5426829  0.16829143 1.184422  356 
## [32] 0.5279503  0.16521293 1.152268  340 
## [33] 0.5097744  0.17060031 1.112598  339 
## [34] 0.5130058  0.17752694 1.119651  355 
## [35] 0.4979253  0.18547973 1.086737  360 
## [36] 0.4508095  0.20600308 1.197040  362 
## [37] 0.5193026  0.20600308 1.133394  417 
## [38] 0.4393758  0.21369933 1.166680  366 
## [39] 0.5258103  0.21369933 1.147597  438 
## [40] 0.4983535  0.23370959 1.087672  454 
## [41] 0.4905451  0.23063109 1.070630  441 
## [42] 0.5321850  0.28296562 1.161510  587 
## [43] 0.4819967  0.31349410 1.051973  589 
## [44] 0.5106383  0.34966650 1.114484  696 
## [45] 0.5081744  0.37660339 1.109106  746 
## [46] 0.6007463  0.06875321 1.311147  161 
## [47] 0.4522472  0.09132889 1.293367  161 
## [48] 0.5859649  0.07311442 1.278886  167 
## [49] 0.4691011  0.09132889 1.245610  167 
## [50] 0.5793358  0.06952283 1.264418  157 
## [51] 0.4422535  0.09107234 1.264787  157 
## [52] 0.5842294  0.07157517 1.275099  163 
## [53] 0.4591549  0.09107234 1.219200  163 
## [54] 0.5496689  0.07747563 1.199669  166 
## [55] 0.4611111  0.09235505 1.224394  166 
## [56] 0.5952381  0.07542329 1.299125  175 
## [57] 0.5182724  0.07721909 1.131146  156 
## [58] 0.5077882  0.08234992 1.348337  163 
## [59] 0.4502762  0.09286814 1.287731  163 
## [60] 0.5919003  0.08234992 1.291841  190 
## [61] 0.4556355  0.10697794 1.303057  190 
## [62] 0.5414365  0.09286814 1.181702  196 
## [63] 0.4700240  0.10697794 1.248061  196 
## [64] 0.6061776  0.06644433 1.323001  157 
## [65] 0.5252525  0.07619292 1.146380  156 
## [66] 0.5695793  0.07927142 1.243124  176 
## [67] 0.5983607  0.09389430 1.305941  219 
## [68] 0.5000000  0.11236532 1.327657  219 
## [69] 0.5593220  0.07567984 1.220738  165 
## [70] 0.5261628  0.08825038 1.148366  181 
## [71] 0.5533708  0.09132889 1.207749  197 
## [72] 0.4339207  0.11646998 1.152195  197 
## [73] 0.5299685  0.08132376 1.156672  168 
## [74] 0.4548023  0.09081580 1.207643  161 
## [75] 0.4386921  0.09415085 1.254601  161 
## [76] 0.5282486  0.09081580 1.152919  187 
## [77] 0.4822888  0.09415085 1.052610  177 
## [78] 0.4342105  0.09748589 1.241785  165 
## [79] 0.4421053  0.09748589 1.173928  168 
## [80] 0.5578947  0.09748589 1.217622  212 
## [81] 0.4700461  0.11133915 1.248120  204 
## [82] 0.4349680  0.12031811 1.243951  204 
## [83] 0.5921659  0.11133915 1.292420  257 
## [84] 0.4378194  0.15059005 1.252106  257 
## [85] 0.5970149  0.12031811 1.303003  280 
## [86] 0.4770017  0.15059005 1.266589  280 
## [87] 0.4389722  0.11980503 1.165609  205 
## [88] 0.5438972  0.11980503 1.187072  254 
## [89] 0.4312394  0.15110313 1.233288  254 
## [90] 0.5578512  0.12416624 1.217528  270 
## [91] 0.4584041  0.15110313 1.217206  270 
## [92] 0.5594406  0.14674192 1.220996  320 
## [93] 0.4597701  0.17855310 1.220834  320

Five rules with highest lift levels

inspect(sort(rules, by = "lift")[1:5])
##     lhs                                  rhs                support   
## [1] {rolls/buns, sausage}             => {other vegetables} 0.04181632
## [2] {bottled water, whole milk}       => {other vegetables} 0.05618266
## [3] {bottled water, yogurt}           => {whole milk}       0.04027707
## [4] {rolls/buns, shopping bags}       => {whole milk}       0.04130323
## [5] {bottled water, other vegetables} => {whole milk}       0.05618266
##     confidence coverage   lift     count
## [1] 0.5077882  0.08234992 1.348337 163  
## [2] 0.5000000  0.11236532 1.327657 219  
## [3] 0.6061776  0.06644433 1.323001 157  
## [4] 0.6007463  0.06875321 1.311147 161  
## [5] 0.5983607  0.09389430 1.305941 219

Five rules with highest confidence levels

inspect(sort(rules, by = "confidence")[1:5])
##     lhs                                  rhs          support    confidence
## [1] {bottled water, yogurt}           => {whole milk} 0.04027707 0.6061776 
## [2] {rolls/buns, shopping bags}       => {whole milk} 0.04130323 0.6007463 
## [3] {bottled water, other vegetables} => {whole milk} 0.05618266 0.5983607 
## [4] {other vegetables, yogurt}        => {whole milk} 0.07183171 0.5970149 
## [5] {sausage, yogurt}                 => {whole milk} 0.04489482 0.5952381 
##     coverage   lift     count
## [1] 0.06644433 1.323001 157  
## [2] 0.06875321 1.311147 161  
## [3] 0.09389430 1.305941 219  
## [4] 0.12031811 1.303003 280  
## [5] 0.07542329 1.299125 175

Five rules with highest support levels

inspect(sort(rules, by = "support")[1:5])
##     lhs                   rhs          support   confidence coverage  lift    
## [1] {}                 => {whole milk} 0.4581837 0.4581837  1.0000000 1.000000
## [2] {other vegetables} => {whole milk} 0.1913802 0.5081744  0.3766034 1.109106
## [3] {rolls/buns}       => {whole milk} 0.1785531 0.5106383  0.3496665 1.114484
## [4] {soda}             => {whole milk} 0.1511031 0.4819967  0.3134941 1.051973
## [5] {yogurt}           => {whole milk} 0.1505900 0.5321850  0.2829656 1.161510
##     count
## [1] 1786 
## [2]  746 
## [3]  696 
## [4]  589 
## [5]  587
plot(rules, measure=c("support","lift"), shading="confidence", engine = "ggplot2")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

summary(rules)
## set of 93 rules
## 
## rule length distribution (lhs + rhs):sizes
##  1  2  3 
##  1 44 48 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   2.505   3.000   3.000 
## 
## summary of quality measures:
##     support          confidence        coverage            lift      
##  Min.   :0.04002   Min.   :0.4312   Min.   :0.06644   Min.   :1.000  
##  1st Qu.:0.04310   1st Qu.:0.4584   1st Qu.:0.09082   1st Qu.:1.142  
##  Median :0.05516   Median :0.5098   Median :0.11493   Median :1.177  
##  Mean   :0.06866   Mean   :0.5058   Mean   :0.13786   Mean   :1.186  
##  3rd Qu.:0.07183   3rd Qu.:0.5380   3rd Qu.:0.15110   3rd Qu.:1.243  
##  Max.   :0.45818   Max.   :0.6062   Max.   :1.00000   Max.   :1.348  
##      count       
##  Min.   : 156.0  
##  1st Qu.: 168.0  
##  Median : 215.0  
##  Mean   : 267.6  
##  3rd Qu.: 280.0  
##  Max.   :1786.0  
## 
## mining info:
##          data ntransactions support confidence
##  transactions          3898    0.04       0.43
##                                                                      call
##  apriori(data = transactions, parameter = list(supp = 0.04, conf = 0.43))

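The generated rules have support values ranging from 4% to 45.8%. The maximum belongs to the rule with an empty antecedent, {} => {whole milk}, and only reflects that whole milk appears in nearly half of all transactions; among rules with a non-empty antecedent the highest support is 19.1% ({other vegetables} => {whole milk}). Confidence varies between 43.1% and 60.6%, so in many cases the consequent appears in over half of the transactions that contain the antecedent. Lift values range from 1.000 to 1.348. Apart from the empty-antecedent rule, whose lift is 1 by definition, all lifts are above 1, which indicates positive but fairly weak associations between products.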

plot(rules, method="graph", measure="support", shading="lift", engine="html")

Plot Interpretation Example

Rule 77: {other vegetables, root vegetables} => {whole milk}

support = 0.0454 - 4.54% of all transactions contain other vegetables, root vegetables and whole milk together.

confidence = 0.482 - 48.2% of the customers who buy other vegetables and root vegetables also buy whole milk.

lift = 1.05 - This indicates a very weak positive relationship, meaning customers who buy other vegetables and root vegetables are only slightly more likely to purchase whole milk than the general population.
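The three values for rule 77 can be reproduced by hand from the counts printed earlier. This is a small base R sketch; the variable names are illustrative, and the antecedent count is not printed directly, so it is recovered from the coverage column.

```r
# Counts taken from the output above
n_transactions <- 3898   # total number of transactions
n_rule         <- 177    # transactions containing lhs and rhs (count of rule 77)
n_rhs          <- 1786   # transactions containing whole milk

# The lhs count is recovered from the coverage of rule 77
n_lhs <- round(0.09415085 * n_transactions)   # 367

support    <- n_rule / n_transactions
confidence <- n_rule / n_lhs
lift       <- confidence / (n_rhs / n_transactions)

round(c(support = support, confidence = confidence, lift = lift), 4)
```

The results agree with the support, confidence and lift that inspect(rules) reports for rule 77.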

Conclusion

In this study, the Apriori algorithm was applied to a grocery store transaction dataset to uncover association rules. With a minimum support of 0.04 and a minimum confidence of 0.43, the analysis generated 93 rules. They reveal key purchasing patterns, showing which products are commonly bought together. While these results provide useful insights into consumer behavior, further research could refine the analysis by adjusting the support and confidence thresholds to uncover more specific and potentially stronger relationships.

[BONUS]

ECLAT Algorithm vs APRIORI Algorithm - computation time comparison

Given the same minimum support, Eclat and Apriori find the same frequent itemsets, so the key difference lies in their efficiency. To test this, I will compare their computation time using identical parameters (support = 0.001, confidence = 0.1, maxlen = 20) to determine which algorithm mines association rules faster on our dataset. Because eclat() returns only frequent itemsets, the Eclat timing also includes the ruleInduction() step that turns them into rules.

start_timeECLAT <- Sys.time()
computationECLAT<-eclat(transactions, parameter=list(supp=0.001, maxlen=20))
computationECLAT2<-ruleInduction(computationECLAT, transactions, confidence=0.1)
end_timeECLAT <- Sys.time()
start_timeAPRIORI <- Sys.time()
computationAPRIORI<-apriori(transactions, parameter=list(supp=0.001, conf=0.1, maxlen=20)) 
end_timeAPRIORI <- Sys.time()
# Convert explicitly to seconds so the printed label is always correct
total_timeECLAT <- as.numeric(difftime(end_timeECLAT, start_timeECLAT, units = "secs"))
print(paste("Total computation time in seconds ECLAT:", total_timeECLAT))
## [1] "Total computation time in seconds ECLAT: 2.36619806289673"
total_timeAPRIORI <- as.numeric(difftime(end_timeAPRIORI, start_timeAPRIORI, units = "secs"))
print(paste("Total computation time in seconds APRIORI:", total_timeAPRIORI))
## [1] "Total computation time in seconds APRIORI: 0.982537984848022"

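In this single run, Apriori (about 0.98 s) was faster than Eclat followed by rule induction (about 2.37 s), making it the more efficient choice for this dataset and parameter set. Since each algorithm was timed only once, the difference should be treated as indicative rather than conclusive.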
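A single timing can be affected by whatever else the machine is doing at that moment. One way to make the comparison more robust, sketched here in base R, is to repeat each run several times and compare medians. The helper median_elapsed() and the placeholder workload are hypothetical and were not used to produce the numbers above.

```r
# Median elapsed wall-clock time (in seconds) over repeated runs of a function
median_elapsed <- function(f, reps = 5) {
  times <- vapply(
    seq_len(reps),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
  median(times)
}

# Placeholder workload (hypothetical); for the comparison above it could be
# function() apriori(transactions,
#                    parameter = list(supp = 0.001, conf = 0.1, maxlen = 20))
workload <- function() sum(sqrt(seq_len(1e5)))

t_med <- median_elapsed(workload, reps = 5)
t_med
```

Calling the helper once with the Apriori call and once with the eclat() plus ruleInduction() pair would give two medians that can be compared directly.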