Market basket analysis of a bakery

1. Market basket analysis and association rules

1.1. General

Market basket analysis is a technique used to identify patterns or associations between items in large datasets, typically retail transactions. By identifying these associations, businesses can gain insight into customer purchasing behavior. This information can be used to optimize product placement, design effective promotions, improve cross-selling strategies, and build recommendation systems for online shopping, ultimately driving sales and improving customer satisfaction and the overall shopping experience.

1.2. How do association rules work?

Association rules help uncover these patterns by defining relationships between itemsets. A rule has two parts, an antecedent (if) and a consequent (then), in the form "If A then B", which is written as {A} ➝ {B}.

An itemset is a collection of one or more items that appear together in a transaction, that is, products bought together by a customer in a single purchase. For example, {Bread} is a single-item itemset and {Bread, Coffee} is a multi-item itemset.

The antecedent (also known as the LHS - Left Hand Side) is the itemset that appears before the arrow in an association rule. It represents the item(s) that must be present for the rule to be applicable.

The consequent (also known as the RHS - Right Hand Side) is the itemset that appears after the arrow in an association rule. It represents the item(s) that are predicted to occur given the antecedent.

For example, an association rule might reveal that if a customer buys Coffee, then they are likely to also buy Bread. The rule is written as {Coffee} ➝ {Bread}, where {Coffee} is the antecedent (LHS) and {Bread} is the consequent (RHS).

Three metrics measure how strong the relationship between the itemsets in an association rule is:

  • Support: measures how frequently an itemset appears in the dataset. It is calculated as the proportion of transactions that contain the itemset.

    \[ \text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}} \]

    For example, if Coffee appears in 4528 out of 9465 transactions, the support for Coffee is: Support(Coffee) = 4528 / 9465 ≈ 0.48

  • Confidence: measures the likelihood of item B being purchased when item A is purchased.

    \[ \text{Confidence}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)} = \frac{\text{Number of transactions containing } A \text{ and }B}{\text{Number of transactions containing } A} \]

    For example, if 852 transactions contain both Coffee and Bread, and 4528 transactions in total contain Coffee, then the likelihood of Bread being purchased when Coffee is purchased is: Confidence( {Coffee} ➝ {Bread} ) = 852 / 4528 ≈ 0.19

    However, confidence does not consider how often the consequent occurs on its own, which can be misleading. A high confidence value does not necessarily mean a strong rule, especially if the consequent is very common in the dataset.

  • Lift: measures how much more often the antecedent and consequent of a rule occur together than would be expected if they were statistically independent. Because it accounts for how common the consequent is, lift avoids this pitfall and gives a more reliable measure of the rule's strength.

    \[ \text{Lift}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A) \times \text{Support}(B)} = \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)} \]

    • Lift > 1 indicates a POSITIVE association between the antecedent and consequent of a rule. The consequent items are MORE likely to be purchased when the antecedent items have been picked.

    • Lift < 1 indicates a NEGATIVE association between the antecedent and consequent of a rule. The consequent items are LESS likely to be purchased when the antecedent items have been picked.

    • Lift = 1 indicates no association between the antecedent and consequent. The presence of the antecedent items does not affect the probability of buying the consequent items.

    For example, if 852 transactions contain both Coffee and Bread, 4528 transactions contain Coffee, and 3097 transactions contain Bread, out of 9465 transactions in total, then Lift( {Coffee} ➝ {Bread} ) ≈ 0.58, which means that a customer who has picked Coffee is LESS likely to buy Bread.
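The three worked examples above can be reproduced with a few lines of base R. This is only a sketch of the arithmetic: the variable names are illustrative, and the counts are the ones reported for this dataset (4528 Coffee, 3097 Bread, 852 both, 9465 transactions).

```r
# Counts taken from the bakery dataset used in this article
n_total  <- 9465  # all transactions
n_coffee <- 4528  # transactions containing Coffee
n_bread  <- 3097  # transactions containing Bread
n_both   <- 852   # transactions containing both Coffee and Bread

# Support: share of all transactions that contain the itemset
support_coffee <- n_coffee / n_total
support_bread  <- n_bread / n_total
support_both   <- n_both / n_total

# Confidence of {Coffee} -> {Bread}: Support(A and B) / Support(A)
confidence <- support_both / support_coffee

# Lift of {Coffee} -> {Bread}: Confidence / Support(B)
lift <- confidence / support_bread

round(c(support_coffee = support_coffee,
        confidence     = confidence,
        lift           = lift), 2)
```

The rounded values are 0.48, 0.19, and 0.58. Since the lift is below 1, Coffee and Bread appear together less often than independence would predict, even though {Bread, Coffee} is the most frequent pair in absolute terms.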

2. Data

The data used in this article is the transaction data of a bakery, with records collected from 11.01.2016 until 03.12.2017 - Source: Bakery data

##    TransactionNo         Items            DateTime Daypart DayType
## 1              1         Bread 2016-10-30 09:58:11 Morning Weekend
## 2              2  Scandinavian 2016-10-30 10:05:34 Morning Weekend
## 3              2  Scandinavian 2016-10-30 10:05:34 Morning Weekend
## 4              3 Hot chocolate 2016-10-30 10:07:57 Morning Weekend
## 5              3           Jam 2016-10-30 10:07:57 Morning Weekend
## 6              3       Cookies 2016-10-30 10:07:57 Morning Weekend
## 7              4        Muffin 2016-10-30 10:08:41 Morning Weekend
## 8              5        Coffee 2016-10-30 10:13:03 Morning Weekend
## 9              5        Pastry 2016-10-30 10:13:03 Morning Weekend
## 10             5         Bread 2016-10-30 10:13:03 Morning Weekend

The arules and arulesViz libraries are used for association rule mining. The summary of the transaction data is shown below.

## transactions as itemMatrix in sparse format with
##  9465 rows (elements/itemsets/transactions) and
##  94 columns (items) and a density of 0.02122827 
## 
## most frequent items:
##  Coffee   Bread     Tea    Cake  Pastry (Other) 
##    4528    3097    1350     983     815    8114 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10 
## 3948 3059 1471  662  234   64   17    4    5    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   2.000   1.995   3.000  10.000 
## 
## includes extended item information - examples:
##                     labels
## 1               Adjustment
## 2 Afternoon with the baker
## 3                Alfajores
## 
## includes extended transaction information - examples:
##   transactionID
## 1             1
## 2            10
## 3           100

The summary report gives some notable information about the Bakery dataset:

  • There are 9465 transactions

  • 94 products were sold by the bakery

  • Coffee, Bread, Tea, Cake, and Pastry are the most purchased products. Coffee and Bread are purchased much more often than the other products. Here is a bar chart of the top 20 most purchased products

  • The number of distinct products in a transaction ranges from 1 to 10, but most transactions include just 1 or 2 products (7007 transactions - 74% of the total)
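The figures in this list can be checked against the transaction length distribution printed in the summary. The sketch below uses base R only; the vector names are illustrative, and the counts are copied from the "sizes" table above.

```r
# Transaction length distribution copied from the summary output
sizes  <- 1:10
counts <- c(3948, 3059, 1471, 662, 234, 64, 17, 4, 5, 1)
n_items_sold <- 94  # number of distinct products (columns)

n_transactions <- sum(counts)          # total number of transactions
n_entries      <- sum(sizes * counts)  # total number of (transaction, item) pairs

# Share of transactions with only 1 or 2 distinct products
share_small <- sum(counts[sizes <= 2]) / n_transactions

# Mean transaction length and density of the sparse item matrix
mean_size <- n_entries / n_transactions
density   <- n_entries / (n_transactions * n_items_sold)

c(n_transactions = n_transactions, share_small = share_small,
  mean_size = mean_size, density = density)
```

The results agree with the summary: 9465 transactions, about 74% with one or two products, a mean length of about 1.995, and a density of about 0.0212.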

3. Association Rule Mining

In this article, the Eclat algorithm is used to find frequent itemsets, from which the rules are then derived.

The Eclat algorithm is a widely-used technique in association rule mining, designed for finding frequent itemsets in transactional datasets.

Eclat transforms the transaction database into a vertical format where each item is associated with a list of transactions containing it. This allows for straightforward intersection of transaction lists to determine frequent itemsets.

By utilizing a depth-first search strategy, Eclat explores itemsets to their maximum depth, ensuring a thorough and efficient search for frequent patterns.

Thanks to these mechanisms, the algorithm reduces the need for multiple database scans and manages memory usage effectively.
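The vertical format and the intersection of transaction lists can be illustrated with a small base R sketch. The five transactions below are hypothetical toy data, and the code only shows the core idea of counting support by intersecting transaction-id lists. It is not the arules implementation and omits the depth-first search over candidate itemsets.

```r
# Hypothetical toy transactions in the usual horizontal format
transactions <- list(
  t1 = c("Bread", "Coffee"),
  t2 = c("Coffee", "Cake"),
  t3 = c("Bread", "Coffee", "Cake"),
  t4 = c("Tea"),
  t5 = c("Bread", "Coffee")
)

# Vertical format: for each item, the ids of the transactions containing it
items <- sort(unique(unlist(transactions)))
tid_lists <- lapply(items, function(item) {
  contains <- vapply(transactions, function(tr) item %in% tr, logical(1))
  names(transactions)[contains]
})
names(tid_lists) <- items

# The support count of an itemset is the size of the intersection
# of the transaction-id lists of its items
support_count <- function(itemset) {
  length(Reduce(intersect, tid_lists[itemset]))
}

support_count(c("Bread", "Coffee"))
support_count(c("Bread", "Coffee", "Cake"))
```

In this toy data {Bread, Coffee} is contained in three transactions and {Bread, Coffee, Cake} in one. Extending an itemset by one item only requires one more intersection, which is why the original transactions do not need to be rescanned.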

To run the algorithm on the Bakery dataset, a minimum support threshold is required. The itemsets whose support reaches the threshold make up the complete set of frequent itemsets. The support threshold was set to 0.0025, which means that itemsets present in at least 24 of the 9465 transactions (0.0025 × 9465 ≈ 23.7) are detected. The output below lists the frequent itemsets with their support values and the number of transactions containing them. {Coffee}, {Bread}, and {Tea} are the most frequent single-item itemsets. The most frequent multi-item itemsets are {Bread, Coffee}, {Cake, Coffee}, and {Coffee, Tea}.

## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE  0.0025      1      5 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 23 
## 
## create itemset ... 
## set transactions ...[94 item(s), 9465 transaction(s)] done [0.00s].
## sorting and recoding items ... [43 item(s)] done [0.00s].
## creating sparse bit matrix ... [43 row(s), 9465 column(s)] done [0.00s].
## writing  ... [209 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].
##       items                              support     count
## [1]   {Coffee}                           0.478394083 4528 
## [2]   {Bread}                            0.327205494 3097 
## [3]   {Tea}                              0.142630745 1350 
## [4]   {Cake}                             0.103856313  983 
## [5]   {Bread, Coffee}                    0.090015848  852 
## [6]   {Pastry}                           0.086106709  815 
## [7]   {Sandwich}                         0.071843634  680 
## [8]   {Medialuna}                        0.061806656  585 
## [9]   {Hot chocolate}                    0.058320127  552 
## [10]  {Cake, Coffee}                     0.054727945  518 
## [11]  {Cookies}                          0.054410988  515 
## [12]  {Coffee, Tea}                      0.049867934  472 
## [13]  {Coffee, Pastry}                   0.047543582  450 
## [14]  {Brownie}                          0.040042261  379 
## [15]  {Farm House}                       0.039197042  371 
## [16]  {Juice}                            0.038563127  365 
## [17]  {Muffin}                           0.038457475  364 
## [18]  {Coffee, Sandwich}                 0.038246170  362 
## [19]  {Alfajores}                        0.036344427  344 
## [20]  {Coffee, Medialuna}                0.035182250  333 
## [21]  {Scone}                            0.034548336  327 
## [22]  {Soup}                             0.034442684  326 
## [23]  {Toast}                            0.033597464  318 
## [24]  {Coffee, Hot chocolate}            0.029582673  280 
## [25]  {Bread, Pastry}                    0.029160063  276 
## [26]  {Scandinavian}                     0.029054411  275 
## [27]  {Coffee, Cookies}                  0.028209192  267 
## [28]  {Bread, Tea}                       0.028103539  266 
## [29]  {Cake, Tea}                        0.023771791  225 
## [30]  {Coffee, Toast}                    0.023666138  224 
## [31]  {Bread, Cake}                      0.023349181  221 
## [32]  {Coffee, Juice}                    0.020602219  195 
## [33]  {Truffles}                         0.020285261  192 
## [34]  {Alfajores, Coffee}                0.019651347  186 
## [35]  {Brownie, Coffee}                  0.019651347  186 
## [36]  {Coke}                             0.019440042  184 
## [37]  {Coffee, Muffin}                   0.018806128  178 
## [38]  {Spanish Brunch}                   0.018172213  172 
## [39]  {Coffee, Scone}                    0.018066561  171 
## [40]  {Bread, Sandwich}                  0.017010037  161 
## [41]  {Bread, Medialuna}                 0.016904385  160 
## [42]  {Baguette}                         0.016059165  152 
## [43]  {Coffee, Soup}                     0.015847861  150 
## [44]  {Tiffin}                           0.015425251  146 
## [45]  {Jam}                              0.015002641  142 
## [46]  {Fudge}                            0.015002641  142 
## [47]  {Bread, Cookies}                   0.014474379  137 
## [48]  {Sandwich, Tea}                    0.014368727  136 
## [49]  {Mineral water}                    0.014157422  134 
## [50]  {Bread, Hot chocolate}             0.013417855  127 
## [51]  {Jammie Dodgers}                   0.013206550  125 
## [52]  {Chicken Stew}                     0.012995246  123 
## [53]  {Cake, Hot chocolate}              0.011410460  108 
## [54]  {Bread, Coffee, Pastry}            0.011199155  106 
## [55]  {Coffee, Spanish Brunch}           0.010882198  103 
## [56]  {Bread, Brownie}                   0.010776545  102 
## [57]  {Hearty & Seasonal}                0.010565240  100 
## [58]  {Salad}                            0.010459588   99 
## [59]  {Alfajores, Bread}                 0.010353936   98 
## [60]  {Cake, Coffee, Tea}                0.010036978   95 
## [61]  {Bread, Cake, Coffee}              0.010036978   95 
## [62]  {Cookies, Tea}                     0.009825674   93 
## [63]  {Pastry, Tea}                      0.009614369   91 
## [64]  {Medialuna, Pastry}                0.009191759   87 
## [65]  {Bread, Scone}                     0.009086107   86 
## [66]  {Soup, Tea}                        0.009086107   86 
## [67]  {Frittata}                         0.008557845   81 
## [68]  {Coffee, Tiffin}                   0.008452192   80 
## [69]  {Scone, Tea}                       0.008346540   79 
## [70]  {Bread, Muffin}                    0.008135235   77 
## [71]  {Medialuna, Tea}                   0.008135235   77 
## [72]  {Smoothies}                        0.008135235   77 
## [73]  {Hot chocolate, Tea}               0.008029583   76 
## [74]  {Bread, Toast}                     0.007818278   74 
## [75]  {Coffee, Truffles}                 0.007501321   71 
## [76]  {Bread, Juice}                     0.007395668   70 
## [77]  {Bread, Coffee, Tea}               0.007395668   70 
## [78]  {Cake, Cookies}                    0.007290016   69 
## [79]  {Juice, Tea}                       0.007184363   68 
## [80]  {Bread, Coffee, Sandwich}          0.007184363   68 
## [81]  {Cake, Juice}                      0.007078711   67 
## [82]  {Cake, Coffee, Hot chocolate}      0.006867406   65 
## [83]  {Cake, Sandwich}                   0.006867406   65 
## [84]  {Alfajores, Tea}                   0.006761754   64 
## [85]  {Brownie, Tea}                     0.006761754   64 
## [86]  {Bread, Coffee, Medialuna}         0.006761754   64 
## [87]  {Coffee, Jammie Dodgers}           0.006656101   63 
## [88]  {Coffee, Farm House}               0.006656101   63 
## [89]  {Keeping It Local}                 0.006656101   63 
## [90]  {Coffee, Salad}                    0.006550449   62 
## [91]  {Bread, Scandinavian}              0.006550449   62 
## [92]  {Bread, Soup}                      0.006550449   62 
## [93]  {Muffin, Tea}                      0.006550449   62 
## [94]  {Coffee, Coke}                     0.006444797   61 
## [95]  {Tea, Toast}                       0.006444797   61 
## [96]  {Cookies, Juice}                   0.006127839   58 
## [97]  {Bread, Coffee, Hot chocolate}     0.006127839   58 
## [98]  {The Nomad}                        0.006127839   58 
## [99]  {Cookies, Hot chocolate}           0.006022187   57 
## [100] {Juice, Sandwich}                  0.005810882   55 
## [101] {Coffee, Hearty & Seasonal}        0.005705230   54 
## [102] {Hot chocolate, Pastry}            0.005705230   54 
## [103] {Focaccia}                         0.005705230   54 
## [104] {Coffee, Mineral water}            0.005599577   53 
## [105] {Sandwich, Soup}                   0.005493925   52 
## [106] {Vegan mincepie}                   0.005493925   52 
## [107] {Coffee, Keeping It Local}         0.005388273   51 
## [108] {Coffee, Sandwich, Tea}            0.005388273   51 
## [109] {Bread, Coffee, Cookies}           0.005282620   50 
## [110] {Chicken Stew, Coffee}             0.005176968   49 
## [111] {Coke, Sandwich}                   0.005176968   49 
## [112] {Cake, Pastry}                     0.005176968   49 
## [113] {Bread, Jam}                       0.005071315   48 
## [114] {Bakewell}                         0.005071315   48 
## [115] {Bread, Truffles}                  0.004965663   47 
## [116] {Bread, Farm House}                0.004965663   47 
## [117] {Bread, Tiffin}                    0.004860011   46 
## [118] {Coffee, Scandinavian}             0.004860011   46 
## [119] {Cake, Muffin}                     0.004860011   46 
## [120] {Coffee, Medialuna, Pastry}        0.004860011   46 
## [121] {Bread, Cake, Tea}                 0.004860011   46 
## [122] {Tartine}                          0.004860011   46 
## [123] {Bread, Spanish Brunch}            0.004754358   45 
## [124] {Cake, Scone}                      0.004754358   45 
## [125] {Hot chocolate, Medialuna}         0.004754358   45 
## [126] {Bread, Jammie Dodgers}            0.004648706   44 
## [127] {Spanish Brunch, Tea}              0.004648706   44 
## [128] {Cake, Coffee, Sandwich}           0.004648706   44 
## [129] {Coffee, Pastry, Tea}              0.004648706   44 
## [130] {Coffee, Frittata}                 0.004543053   43 
## [131] {Afternoon with the baker}         0.004543053   43 
## [132] {Cake, Soup}                       0.004437401   42 
## [133] {Brownie, Cake}                    0.004437401   42 
## [134] {Hot chocolate, Sandwich}          0.004437401   42 
## [135] {Alfajores, Bread, Coffee}         0.004331749   41 
## [136] {Cake, Coffee, Cookies}            0.004226096   40 
## [137] {Coffee, Jam}                      0.004120444   39 
## [138] {Alfajores, Cake}                  0.004120444   39 
## [139] {Brownie, Hot chocolate}           0.004120444   39 
## [140] {Coffee, Smoothies}                0.004014791   38 
## [141] {Extra Salami or Feta}             0.004014791   38 
## [142] {Art Tray}                         0.004014791   38 
## [143] {Cake, Coffee, Juice}              0.003909139   37 
## [144] {Coffee, Cookies, Tea}             0.003909139   37 
## [145] {Sandwich, Truffles}               0.003803487   36 
## [146] {Bread, Brownie, Coffee}           0.003803487   36 
## [147] {Coffee, Hot chocolate, Pastry}    0.003803487   36 
## [148] {Bread, Coffee, Toast}             0.003697834   35 
## [149] {Coffee, Cookies, Juice}           0.003697834   35 
## [150] {Coffee, Cookies, Hot chocolate}   0.003697834   35 
## [151] {Coffee, Medialuna, Tea}           0.003697834   35 
## [152] {Cake, Medialuna}                  0.003697834   35 
## [153] {Coffee, Sandwich, Soup}           0.003592182   34 
## [154] {Hot chocolate, Muffin}            0.003592182   34 
## [155] {Alfajores, Hot chocolate}         0.003592182   34 
## [156] {Tea, Tiffin}                      0.003486529   33 
## [157] {Bread, Coffee, Scone}             0.003486529   33 
## [158] {Coffee, Scone, Tea}               0.003380877   32 
## [159] {Coffee, Extra Salami or Feta}     0.003275225   31 
## [160] {Coffee, The Nomad}                0.003275225   31 
## [161] {Coffee, Fudge}                    0.003275225   31 
## [162] {Bread, Fudge}                     0.003275225   31 
## [163] {Mineral water, Sandwich}          0.003275225   31 
## [164] {Coffee, Tea, Toast}               0.003275225   31 
## [165] {Alfajores, Pastry}                0.003275225   31 
## [166] {Coffee, Vegan mincepie}           0.003169572   30 
## [167] {Chicken Stew, Tea}                0.003169572   30 
## [168] {Bread, Mineral water}             0.003169572   30 
## [169] {Alfajores, Medialuna}             0.003169572   30 
## [170] {Coffee, Tartine}                  0.003063920   29 
## [171] {Bakewell, Coffee}                 0.003063920   29 
## [172] {Cake, Jammie Dodgers}             0.003063920   29 
## [173] {Tea, Truffles}                    0.003063920   29 
## [174] {Hot chocolate, Toast}             0.003063920   29 
## [175] {Alfajores, Coffee, Tea}           0.003063920   29 
## [176] {Alfajores, Juice}                 0.003063920   29 
## [177] {Coffee, Hot chocolate, Medialuna} 0.003063920   29 
## [178] {Baguette, Coffee}                 0.002958267   28 
## [179] {Hot chocolate, Scone}             0.002958267   28 
## [180] {Bread, Coffee, Juice}             0.002958267   28 
## [181] {Cookies, Pastry}                  0.002958267   28 
## [182] {Granola}                          0.002958267   28 
## [183] {Eggs}                             0.002958267   28 
## [184] {Cake, Coffee, Scone}              0.002852615   27 
## [185] {Coffee, Muffin, Tea}              0.002852615   27 
## [186] {Alfajores, Brownie}               0.002852615   27 
## [187] {Cookies, Sandwich}                0.002852615   27 
## [188] {Art Tray, Coffee}                 0.002746962   26 
## [189] {Baguette, Bread}                  0.002746962   26 
## [190] {Bread, Chicken Stew}              0.002746962   26 
## [191] {Juice, Spanish Brunch}            0.002746962   26 
## [192] {Farm House, Pastry}               0.002746962   26 
## [193] {Coffee, Soup, Tea}                0.002746962   26 
## [194] {Juice, Muffin}                    0.002746962   26 
## [195] {Coffee, Juice, Sandwich}          0.002746962   26 
## [196] {Bread, Sandwich, Tea}             0.002746962   26 
## [197] {Cake, Coffee, Pastry}             0.002746962   26 
## [198] {Bread, Frittata}                  0.002641310   25 
## [199] {Bread, Hearty & Seasonal}         0.002641310   25 
## [200] {Cake, Truffles}                   0.002641310   25 
## [201] {Alfajores, Sandwich}              0.002641310   25 
## [202] {Fudge, Jam}                       0.002535658   24 
## [203] {Jammie Dodgers, Tea}              0.002535658   24 
## [204] {Muffin, Pastry}                   0.002535658   24 
## [205] {Brownie, Cookies}                 0.002535658   24 
## [206] {Brownie, Juice}                   0.002535658   24 
## [207] {Cookies, Medialuna}               0.002535658   24 
## [208] {Cake, Coffee, Medialuna}          0.002535658   24 
## [209] {Coffee, Hot chocolate, Sandwich}  0.002535658   24
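The link between the relative threshold of 0.0025 and the smallest count in the list above can be checked with a short base R sketch. The three real counts are copied from the output; the fourth entry, with a count of 23, is a hypothetical itemset added only to show what falls below the threshold.

```r
n_transactions <- 9465
min_support    <- 0.0025

# Smallest whole number of transactions that reaches the threshold
min_support * n_transactions                     # 23.6625
min_count <- ceiling(min_support * n_transactions)

# Three counts from the output above plus one hypothetical itemset
counts <- c("{Coffee}"                          = 4528,
            "{Bread, Coffee}"                   = 852,
            "{Coffee, Hot chocolate, Sandwich}" = 24,
            "hypothetical itemset"              = 23)

# An itemset is frequent when its support reaches the threshold
is_frequent <- counts / n_transactions >= min_support
is_frequent
```

An itemset with 24 transactions has a support of about 0.002536 and is kept, while one with 23 transactions (about 0.00243) would be dropped, which matches the last row of the list.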

Association rules can be created by running the ruleInduction function with the prefix-tree method (ptree). The method takes the frequent itemsets listed above and a minimum confidence threshold as inputs. The confidence threshold was set to 0.3, which means that only rules with a confidence of at least 0.3 are created. Additionally, the parameter reduce is set to TRUE, which improves speed because unused items are removed from the transaction data before the prefix tree is built.

The result contains 73 rules, 68 of which have Coffee as the RHS item, which suggests that most customers buy Coffee together with other products. However, Coffee is the most frequent item and, as mentioned before, confidence does not consider how often the consequent occurs on its own, so the confidence values of these rules do not necessarily show how strong the rules are.

Looking at the 20 rules with the highest lift gives some notable information:

{Extra Salami or Feta} ➝ {Coffee}, {Keeping It Local} ➝ {Coffee}, and {Toast} ➝ {Coffee} are notable rules because their lift values are well above 1, which means a customer is more likely to buy Coffee when they buy Extra Salami or Feta, Keeping It Local, or Toast. The rule {Toast} ➝ {Coffee} also has the highest support in the top 20, which means that many transactions actually contain both Toast and Coffee.

There are also many high-lift rules whose LHS has 2 products and whose RHS is Coffee, for example {Cake, Medialuna} ➝ {Coffee}, {Cake, Sandwich} ➝ {Coffee}, or {Hot chocolate, Pastry} ➝ {Coffee}. The 3-item combinations behind {Cake, Sandwich} ➝ {Coffee} and {Cake, Cookies} ➝ {Coffee} are fairly frequent compared to the others.

##      lhs                           rhs      support     confidence lift     itemset
## [1]  {Extra Salami or Feta}     => {Coffee} 0.003275225 0.8157895  1.705267   5
## [2]  {Keeping It Local}         => {Coffee} 0.005388273 0.8095238  1.692169   6
## [3]  {Toast}                    => {Coffee} 0.023666138 0.7044025  1.472431  52
## [4]  {Cake, Medialuna}          => {Coffee} 0.002535658 0.6857143  1.433367 123
## [5]  {Art Tray}                 => {Coffee} 0.002746962 0.6842105  1.430224   4
## [6]  {Cake, Sandwich}           => {Coffee} 0.004648706 0.6769231  1.414990 142
## [7]  {Hot chocolate, Pastry}    => {Coffee} 0.003803487 0.6666667  1.393551 133
## [8]  {Sandwich, Soup}           => {Coffee} 0.003592182 0.6538462  1.366752  64
## [9]  {Hot chocolate, Medialuna} => {Coffee} 0.003063920 0.6444444  1.347100 121
## [10] {Tartine}                  => {Coffee} 0.003063920 0.6304348  1.317815   1
## [11] {Salad}                    => {Coffee} 0.006550449 0.6262626  1.309094  18
## [12] {Cookies, Hot chocolate}   => {Coffee} 0.003697834 0.6140351  1.283534 109
## [13] {Bakewell}                 => {Coffee} 0.003063920 0.6041667  1.262906   2
## [14] {Cookies, Juice}           => {Coffee} 0.003697834 0.6034483  1.261404  99
## [15] {Cake, Hot chocolate}      => {Coffee} 0.006867406 0.6018519  1.258067 134
## [16] {Cake, Scone}              => {Coffee} 0.002852615 0.6000000  1.254196  56
## [17] {Spanish Brunch}           => {Coffee} 0.010882198 0.5988372  1.251766  38
## [18] {Cake, Cookies}            => {Coffee} 0.004226096 0.5797101  1.211784 110
## [19] {Vegan mincepie}           => {Coffee} 0.003169572 0.5769231  1.205958   3
## [20] {Hot chocolate, Sandwich}  => {Coffee} 0.002535658 0.5714286  1.194472 132
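The confidence and lift values in the rule tables follow directly from the itemset counts in the Eclat output. The base R sketch below recomputes them for a few candidate rules and applies the 0.3 confidence threshold. It is not the ruleInduction implementation: the helper rule_metrics and the choice of candidates are illustrative, and only the counts come from the output above.

```r
n_transactions <- 9465

# Itemset counts copied from the Eclat output (item names sorted alphabetically)
count <- c("Coffee" = 4528, "Bread" = 3097, "Pastry" = 815, "Toast" = 318,
           "Bread,Coffee" = 852, "Bread,Pastry" = 276, "Coffee,Toast" = 224)

# Support, confidence, and lift of a rule {lhs} -> {rhs} for single items
rule_metrics <- function(lhs, rhs) {
  both       <- paste(sort(c(lhs, rhs)), collapse = ",")
  confidence <- count[[both]] / count[[lhs]]
  lift       <- confidence / (count[[rhs]] / n_transactions)
  data.frame(lhs = lhs, rhs = rhs,
             support = count[[both]] / n_transactions,
             confidence = confidence, lift = lift)
}

candidates <- rbind(
  rule_metrics("Toast",  "Coffee"),
  rule_metrics("Coffee", "Toast"),
  rule_metrics("Bread",  "Coffee"),
  rule_metrics("Pastry", "Bread")
)

# Keep only rules that reach the confidence threshold, sorted by lift
rules <- candidates[candidates$confidence >= 0.3, ]
rules[order(-rules$lift), ]
```

Only {Toast} ➝ {Coffee} and {Pastry} ➝ {Bread} pass the threshold. {Bread} ➝ {Coffee} has a confidence of about 0.275 and is filtered out, even though {Bread, Coffee} is the most frequent pair in the data.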

The rules detected above can be put to use in practice. For example:

  • When a customer orders Cake and Cookies, we can offer a drink like Coffee for upselling

  • Create combos that include 2 bakery products (Cake, Cookies, Medialuna, Sandwich,…) and 1 Coffee

  • Build a recommendation system for online ordering that always offers Coffee if the order does not contain Coffee yet, or that uses a more customized mechanism based on the detected rules

Addition:

Coffee makes up a large portion of the transactions in the dataset, and Coffee is the RHS item of almost all the rules in the output. In this section, I look for other possible association rules that do not have Coffee as the RHS. With no change to the support and confidence thresholds, there are 5 such rules. All of them have Bread as the RHS item and are based on 2-item itemsets. 3 out of these 5 rules have a lift value higher than 1, although only slightly (1.03 to 1.08). They show that a customer is somewhat more likely to buy Bread when they buy Jam, Jammie Dodgers, or Pastry. Of the 5 rules, the combination of Pastry and Bread is the most frequent.

Based on that, we can offer Bread to customers who have already put Jam, Jammie Dodgers, or Pastry in their order, create combos, and apply the rules to an online recommendation system to improve customer experience and increase sales.

# Assuming 'freq_rules' is the rules object created with ruleInduction
# Keep only the rules whose RHS is not 'Coffee'
no_Coffee_rules <- subset(freq_rules, !(rhs %in% "Coffee"))

# Display the first few rows of the filtered rules
inspect(head(no_Coffee_rules))
##     lhs                 rhs     support     confidence lift      itemset
## [1] {Frittata}       => {Bread} 0.002641310 0.3086420  0.9432665   9    
## [2] {Jam}            => {Bread} 0.005071315 0.3380282  1.0330761  20    
## [3] {Jammie Dodgers} => {Bread} 0.004648706 0.3520000  1.0757766  25    
## [4] {Tiffin}         => {Bread} 0.004860011 0.3150685  0.9629071  32    
## [5] {Pastry}         => {Bread} 0.029160063 0.3386503  1.0349774 154

4. Similarity and Dissimilarity

Besides Support, Confidence, and Lift, there are other important metrics for measuring similarity and dissimilarity in association rule mining. The metrics considered here are the Jaccard Index and Affinity.

4.1. Jaccard Index

The Jaccard Index measures how similar two itemsets are in terms of the transactions in which they occur. It is calculated as the size of the intersection divided by the size of the union. The formula is:

\[ J(A, B) = \frac{|A \cap B|}{|A \cup B|} \]

where 𝐴 and 𝐵 are itemsets, |𝐴 ∩ 𝐵| is the number of transactions that contain both of them, and |𝐴 ∪ 𝐵| is the number of transactions that contain at least one of them. A higher Jaccard Index indicates a greater similarity between the itemsets. This metric shows how often items are purchased together relative to the total number of transactions involving either item.

Subtracting the Jaccard Index from 1 gives the Jaccard Distance. This metric measures the dissimilarity between 2 itemsets.

\[ D(A,B) = 1 - J(A,B) \]

where 𝐴 and 𝐵 are the two itemsets being compared. The Jaccard Distance ranges from 0 to 1, where 0 indicates that the sets are identical (completely similar), and 1 indicates that the sets have no common elements (completely dissimilar).
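As a worked example, the Jaccard Index and Jaccard Distance for Coffee and Bread can be computed from the counts reported earlier. This base R sketch assumes that the sets are compared through the transactions that contain each product; the variable names are illustrative.

```r
# Counts from the frequent itemset output
n_coffee <- 4528  # transactions containing Coffee
n_bread  <- 3097  # transactions containing Bread
n_both   <- 852   # transactions containing both

# Transactions containing at least one of the two (inclusion-exclusion)
n_either <- n_coffee + n_bread - n_both

jaccard_index    <- n_both / n_either
jaccard_distance <- 1 - jaccard_index

round(c(index = jaccard_index, distance = jaccard_distance), 3)
```

Of the 6773 transactions that contain Coffee or Bread, 852 contain both, so the index is about 0.126 and the distance about 0.874.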

The dendrogram below, built from the Jaccard distance, shows the similarity and dissimilarity among the products in the dataset. Many insights can be gained from it.

There are groups of products that are likely to be bought together, such as (Medialuna, Pastry, Bread, Coffee), (Fudge, Jam), or (Cake, Tea, Hot chocolate).

There are two main separate branches, which indicates that products belonging to the left branch are less likely to be bought together with products belonging to the right branch. For example, there is a low probability that a customer picks both Coffee and Coke in one order.

4.2. Affinity

Affinity, also known as Cosine Similarity, is a metric used to measure the similarity between two itemsets:

\[ \text{Affinity}(A, B) = \frac{|A \cap B|}{\sqrt{|A| \times |B|}} \]

where 𝐴 and 𝐵 are itemsets, |𝐴 ∩ 𝐵| is the number of elements common to both sets, and |𝐴| and |𝐵| are the sizes of the respective sets. A higher Affinity value indicates a stronger relationship between the itemsets, meaning they are more similar.

Affinity measures the cosine of the angle between two vectors representing the itemsets, while the Jaccard Index measures the similarity between two sets by comparing the size of their intersection to the size of their union.

In other words, Affinity considers the frequency of each item individually, whereas the Jaccard Index focuses on the presence or absence of items, not their frequency.
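To make the difference between the two formulas concrete, both can be evaluated for Coffee and Bread. This base R sketch follows the formulas exactly as written in this section and assumes that |A| and |B| count the transactions containing each product. That reading is an assumption made for illustration and may not match how a particular library defines affinity.

```r
# Counts from the frequent itemset output
n_coffee <- 4528  # transactions containing Coffee
n_bread  <- 3097  # transactions containing Bread
n_both   <- 852   # transactions containing both

# Affinity as defined in this section: |A and B| / sqrt(|A| * |B|)
affinity <- n_both / sqrt(n_coffee * n_bread)

# Jaccard Index for comparison: |A and B| / |A or B|
jaccard <- n_both / (n_coffee + n_bread - n_both)

round(c(affinity = affinity, jaccard = jaccard), 3)
```

With these counts the affinity is about 0.228 and the Jaccard Index about 0.126. The affinity is the larger of the two here because its denominator, the geometric mean of the two counts, is smaller than the size of the union.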

Insights can be gained from the Affinity heatmap below. Some notable product pairs that are frequently purchased together are Coffee - Bread, Coffee - Cake, Tea - Cake, Pastry - Bread,…

5. Summary

This article applied association rule mining to discover rules among the products sold by a bakery. Although the item frequencies are fairly unbalanced, with Coffee and Bread appearing much more often than the other products, some general rules can still be found. Many orders include 2 items (1 bakery product and 1 drink) or 3 items (2 bakery products and 1 drink), so we can create combos of 2 or 3 products. Furthermore, we should specify which drink to offer with a certain bakery product. For example, we can recommend a cup of Tea or Coffee to a customer who has ordered a Cake because, based on the discovered rules, customers who buy Cake are likely to buy Tea or Coffee.

Market basket analysis is a useful way to get better knowledge about customers' behaviour. It can help shops increase cross-selling and improve the customer experience.