Market basket analysis is a technique for identifying patterns or associations between items in large datasets, typically retail transactions. By identifying these associations, businesses can gain valuable insights into customer purchasing behavior. This information can be used to optimize product placement, design effective promotions, strengthen cross-selling strategies, and build recommendation systems for online shopping, ultimately driving sales and improving customer satisfaction and the overall shopping experience.
Association rules uncover these patterns by describing relationships between itemsets. A rule has two parts, an antecedent (if) and a consequent (then), in the form "If A then B", written as {A} ➝ {B}.
Itemsets are collections of one or more items that appear together in a transaction. In market basket analysis, they represent products bought together by a customer in a single purchase. For example, {Bread} is a single-item itemset, while {Bread, Coffee} is a multi-item itemset.
The antecedent (also known as the LHS - Left Hand Side) is the itemset that appears before the arrow in an association rule. It represents the item(s) that must be present for the rule to be applicable.
The consequent (also known as the RHS - Right Hand Side) is the itemset that appears after the arrow in an association rule. It represents the item(s) that are predicted to occur given the antecedent.
For example, an association rule might reveal that if a customer buys Coffee, they are likely to also buy Bread. The rule is written as {Coffee} ➝ {Bread}, where {Coffee} is the antecedent (LHS) and {Bread} is the consequent (RHS).
Several metrics measure how strong the relationship between the itemsets of an association rule is:
Support: measures how frequently an itemset appears in the dataset. It is calculated as the proportion of transactions that contain the itemset.
\[ \text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}} \]
For example, in the bakery dataset used in this article, Coffee appears in 4528 of the 9465 transactions, so Support(Coffee) = 4528 / 9465 ≈ 0.48.
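A minimal base-R sketch of the support calculation, assuming each transaction is stored as a character vector of item names inside a list. The five toy baskets and the helper name `support()` are hypothetical illustrations, not the bakery data or an arules function; the last lines reproduce the Coffee figure from the counts quoted above.

```r
# Support of an itemset: share of transactions that contain every item in it
support <- function(itemset, transactions) {
  contains <- vapply(transactions,
                     function(basket) all(itemset %in% basket),
                     logical(1))
  mean(contains)
}

# Hypothetical toy transactions (not the bakery data)
toy <- list(
  c("Coffee", "Bread"),
  c("Coffee", "Cake"),
  c("Tea", "Cake"),
  c("Coffee"),
  c("Bread", "Jam")
)

support("Coffee", toy)               # 3 of 5 baskets -> 0.6
support(c("Coffee", "Bread"), toy)   # 1 of 5 baskets -> 0.2

# Support of Coffee in the bakery data, from the counts quoted above
support_coffee <- 4528 / 9465
round(support_coffee, 2)             # 0.48
```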
Confidence: measures the likelihood of item B being purchased when item A is purchased.
\[ \text{Confidence}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)} = \frac{\text{Number of transactions containing } A \text{ and }B}{\text{Number of transactions containing } A} \]
For example, 852 transactions contain both Coffee and Bread, and 4528 transactions contain Coffee, so the likelihood of Bread being purchased when Coffee is purchased is Confidence({Coffee} ➝ {Bread}) = 852 / 4528 ≈ 0.19.
However, Confidence does not consider how often the consequent occurs on its own, which can be misleading: a high confidence value does not necessarily mean a strong rule, especially if the consequent is very common in the dataset.
Lift: measures how much more often the antecedent and consequent of a rule occur together than would be expected if they were statistically independent. Because it accounts for the baseline frequency of the consequent, Lift avoids this pitfall and gives a more reliable measure of the rule's strength.
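The confidence value can be checked with a few lines of R using the counts from the example (the helper name `confidence_from_counts()` is illustrative only). Computing it from supports instead of raw counts gives the same result, because the total number of transactions cancels out.

```r
# Confidence(A -> B) = count(A and B) / count(A)
confidence_from_counts <- function(n_ab, n_a) n_ab / n_a

n_total        <- 9465
n_coffee_bread <- 852    # transactions containing both Coffee and Bread
n_coffee       <- 4528   # transactions containing Coffee (see the summary output)

conf_counts   <- confidence_from_counts(n_coffee_bread, n_coffee)
conf_supports <- (n_coffee_bread / n_total) / (n_coffee / n_total)

round(conf_counts, 2)   # 0.19
```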
\[ \text{Lift}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A) \times \text{Support}(B)} = \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)} \]
Lift > 1 indicates a POSITIVE association between the antecedent and consequent of a rule: the consequent items are MORE likely to be purchased when the antecedent items have been picked.
Lift < 1 indicates a NEGATIVE association between the antecedent and consequent of a rule: the consequent items are LESS likely to be purchased when the antecedent items have been picked.
Lift = 1 indicates no association between the antecedent and consequent: the presence of the antecedent items does not affect the probability of buying the consequent items.
For example, 852 transactions contain both Coffee and Bread, 4528 contain Coffee, and 3097 contain Bread, so Lift({Coffee} ➝ {Bread}) ≈ 0.58. This means that a customer who has picked Coffee is LESS likely to buy Bread.
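A short base-R check of the lift example, using the counts stated in the text. It computes lift both from the support formula and from the equivalent form Confidence / Support(B); the helper `supp()` is a hypothetical convenience, not part of any library.

```r
n_total  <- 9465
n_both   <- 852    # transactions with both Coffee and Bread
n_coffee <- 4528   # transactions with Coffee
n_bread  <- 3097   # transactions with Bread

supp <- function(n) n / n_total   # count -> support

# Lift = Support(A and B) / (Support(A) * Support(B))
lift_coffee_bread <- supp(n_both) / (supp(n_coffee) * supp(n_bread))

# Equivalent form: Confidence(A -> B) / Support(B)
lift_alt <- (n_both / n_coffee) / supp(n_bread)

lift_coffee_bread   # about 0.575, below 1: negative association
```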
The data used in this article are the transaction records of a bakery, collected from 11.01.2016 until 03.12.2017 (source: Bakery data). The first rows of the dataset:
## TransactionNo Items DateTime Daypart DayType
## 1 1 Bread 2016-10-30 09:58:11 Morning Weekend
## 2 2 Scandinavian 2016-10-30 10:05:34 Morning Weekend
## 3 2 Scandinavian 2016-10-30 10:05:34 Morning Weekend
## 4 3 Hot chocolate 2016-10-30 10:07:57 Morning Weekend
## 5 3 Jam 2016-10-30 10:07:57 Morning Weekend
## 6 3 Cookies 2016-10-30 10:07:57 Morning Weekend
## 7 4 Muffin 2016-10-30 10:08:41 Morning Weekend
## 8 5 Coffee 2016-10-30 10:13:03 Morning Weekend
## 9 5 Pastry 2016-10-30 10:13:03 Morning Weekend
## 10 5 Bread 2016-10-30 10:13:03 Morning Weekend
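The rows above are in long format: one row per item sold, with `TransactionNo` tying items to a purchase, and an item can repeat within a transaction (Scandinavian in transaction 2). The arules package can read this layout with `read.transactions(..., format = "single")`, although the loading code used for this article is not shown. The base-R sketch below illustrates the grouping step on the ten rows printed above; a repeated item is kept only once, matching the presence/absence view of an arules transactions object.

```r
# First rows of the bakery data as printed above (TransactionNo, Items)
rows <- data.frame(
  TransactionNo = c(1, 2, 2, 3, 3, 3, 4, 5, 5, 5),
  Items = c("Bread", "Scandinavian", "Scandinavian", "Hot chocolate", "Jam",
            "Cookies", "Muffin", "Coffee", "Pastry", "Bread"),
  stringsAsFactors = FALSE
)

# Group item rows by transaction and drop repeated items within a basket
baskets <- lapply(split(rows$Items, rows$TransactionNo), unique)

length(baskets)   # 5 transactions
baskets[["2"]]    # "Scandinavian": the duplicate row collapses to one item
baskets[["3"]]    # "Hot chocolate" "Jam" "Cookies"
```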
The arules and arulesViz libraries are used for association rule mining. The summary of the resulting transactions object:
## transactions as itemMatrix in sparse format with
## 9465 rows (elements/itemsets/transactions) and
## 94 columns (items) and a density of 0.02122827
##
## most frequent items:
## Coffee Bread Tea Cake Pastry (Other)
## 4528 3097 1350 983 815 8114
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10
## 3948 3059 1471 662 234 64 17 4 5 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 1.995 3.000 10.000
##
## includes extended item information - examples:
## labels
## 1 Adjustment
## 2 Afternoon with the baker
## 3 Alfajores
##
## includes extended transaction information - examples:
## transactionID
## 1 1
## 2 10
## 3 100
The summary report reveals some notable facts about the Bakery dataset:
There are 9465 transactions
94 products were sold by the bakery
Coffee, Bread, Tea, Cake, and Pastry are the most purchased products, and Coffee and Bread are bought far more often than any other product. Here is a bar chart of the Top 20 most purchased products
The number of distinct products per transaction ranges from 1 to 10, but most transactions include just 1 or 2 products (7007 transactions, 74% of the total)
In this article, the Eclat algorithm is used to find the frequent itemsets from which the rules are then derived.
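These figures can be cross-checked against the length distribution printed in the summary. The sketch below copies the size counts from that output (the variable names are illustrative) and recomputes the share of 1- and 2-item transactions, the mean basket size, and the matrix density over 94 items.

```r
# Transaction length distribution from the summary output (size -> count)
sizes  <- 1:10
counts <- c(3948, 3059, 1471, 662, 234, 64, 17, 4, 5, 1)

n_transactions <- sum(counts)                            # 9465
share_1_to_2   <- sum(counts[sizes <= 2]) / n_transactions
mean_size      <- sum(sizes * counts) / n_transactions   # items per basket
density        <- sum(sizes * counts) / (n_transactions * 94)  # 94 items

# About 74%, 1.995 and 0.0212, matching the summary report
round(c(share_1_to_2, mean_size, density), 4)
```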
The Eclat algorithm is a widely-used technique in association rule mining, designed for finding frequent itemsets in transactional datasets.
Eclat transforms the transaction database into a vertical format where each item is associated with a list of transactions containing it. This allows for straightforward intersection of transaction lists to determine frequent itemsets.
By utilizing a depth-first search strategy, Eclat explores itemsets to their maximum depth, ensuring a thorough and efficient search for frequent patterns.
Thanks to these mechanisms, the algorithm avoids repeated database scans and manages memory usage effectively.
To run the algorithm on the Bakery dataset, a minimum support threshold is required; the itemsets whose support reaches this threshold form the complete set of frequent itemsets. A threshold of 0.0025 was set, so itemsets with a support of at least 0.0025 (itemsets present in at least 24 of the 9465 transactions) are detected. The output below lists the frequent itemsets with their support values and the number of transactions containing them. {Coffee}, {Bread}, and {Tea} are the most frequent single itemsets. The most frequent multi-item itemsets are {Bread, Coffee}, {Cake, Coffee}, and {Coffee, Tea}.
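To make the vertical format and the depth-first search concrete, here is a simplified base-R sketch of the idea. It is not the optimized implementation arules runs; the helper names `build_tidlists()` and `eclat_sketch()` and the six toy baskets are hypothetical. Each item is mapped to its tid-list (the ids of the transactions that contain it), and an itemset is extended one item at a time by intersecting tid-lists, so the size of each intersection is directly the itemset's support count and no further scan of the data is needed.

```r
# Vertical layout: each item maps to the ids of the transactions containing it
build_tidlists <- function(transactions) {
  items <- sort(unique(unlist(transactions)))
  tidlists <- lapply(items, function(item) {
    which(vapply(transactions, function(basket) item %in% basket, logical(1)))
  })
  names(tidlists) <- items
  tidlists
}

# Depth-first search: extend the current prefix with each later item and
# intersect tid-lists; the size of the intersection is the support count
eclat_sketch <- function(transactions, min_count) {
  tidlists <- build_tidlists(transactions)
  tidlists <- tidlists[lengths(tidlists) >= min_count]  # frequent single items
  found <- list()

  dfs <- function(prefix, prefix_tids, candidates) {
    for (i in seq_along(candidates)) {
      tids <- intersect(prefix_tids, candidates[[i]])
      if (length(tids) >= min_count) {
        itemset <- c(prefix, names(candidates)[i])
        found[[paste(itemset, collapse = ", ")]] <<- length(tids)
        # Only items after position i are tried next, so no set is visited twice
        dfs(itemset, tids, candidates[-seq_len(i)])
      }
    }
  }

  dfs(character(0), seq_along(transactions), tidlists)
  unlist(found)
}

# Hypothetical toy transactions (not the bakery data)
toy <- list(
  c("Coffee", "Bread"), c("Coffee", "Cake"), c("Coffee", "Bread", "Pastry"),
  c("Tea", "Cake"), c("Coffee", "Cake", "Tea"), c("Bread", "Pastry")
)
res <- eclat_sketch(toy, min_count = 2)
res
```

With `min_count = 2` the sketch returns 9 frequent itemsets, for example Coffee (4 transactions) and Bread, Coffee (2), while Coffee, Tea, which occurs in only one basket, is pruned.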
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.0025 1 5 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 23
##
## create itemset ...
## set transactions ...[94 item(s), 9465 transaction(s)] done [0.00s].
## sorting and recoding items ... [43 item(s)] done [0.00s].
## creating sparse bit matrix ... [43 row(s), 9465 column(s)] done [0.00s].
## writing ... [209 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
## items support count
## [1] {Coffee} 0.478394083 4528
## [2] {Bread} 0.327205494 3097
## [3] {Tea} 0.142630745 1350
## [4] {Cake} 0.103856313 983
## [5] {Bread, Coffee} 0.090015848 852
## [6] {Pastry} 0.086106709 815
## [7] {Sandwich} 0.071843634 680
## [8] {Medialuna} 0.061806656 585
## [9] {Hot chocolate} 0.058320127 552
## [10] {Cake, Coffee} 0.054727945 518
## [11] {Cookies} 0.054410988 515
## [12] {Coffee, Tea} 0.049867934 472
## [13] {Coffee, Pastry} 0.047543582 450
## [14] {Brownie} 0.040042261 379
## [15] {Farm House} 0.039197042 371
## [16] {Juice} 0.038563127 365
## [17] {Muffin} 0.038457475 364
## [18] {Coffee, Sandwich} 0.038246170 362
## [19] {Alfajores} 0.036344427 344
## [20] {Coffee, Medialuna} 0.035182250 333
## [21] {Scone} 0.034548336 327
## [22] {Soup} 0.034442684 326
## [23] {Toast} 0.033597464 318
## [24] {Coffee, Hot chocolate} 0.029582673 280
## [25] {Bread, Pastry} 0.029160063 276
## [26] {Scandinavian} 0.029054411 275
## [27] {Coffee, Cookies} 0.028209192 267
## [28] {Bread, Tea} 0.028103539 266
## [29] {Cake, Tea} 0.023771791 225
## [30] {Coffee, Toast} 0.023666138 224
## [31] {Bread, Cake} 0.023349181 221
## [32] {Coffee, Juice} 0.020602219 195
## [33] {Truffles} 0.020285261 192
## [34] {Alfajores, Coffee} 0.019651347 186
## [35] {Brownie, Coffee} 0.019651347 186
## [36] {Coke} 0.019440042 184
## [37] {Coffee, Muffin} 0.018806128 178
## [38] {Spanish Brunch} 0.018172213 172
## [39] {Coffee, Scone} 0.018066561 171
## [40] {Bread, Sandwich} 0.017010037 161
## [41] {Bread, Medialuna} 0.016904385 160
## [42] {Baguette} 0.016059165 152
## [43] {Coffee, Soup} 0.015847861 150
## [44] {Tiffin} 0.015425251 146
## [45] {Jam} 0.015002641 142
## [46] {Fudge} 0.015002641 142
## [47] {Bread, Cookies} 0.014474379 137
## [48] {Sandwich, Tea} 0.014368727 136
## [49] {Mineral water} 0.014157422 134
## [50] {Bread, Hot chocolate} 0.013417855 127
## [51] {Jammie Dodgers} 0.013206550 125
## [52] {Chicken Stew} 0.012995246 123
## [53] {Cake, Hot chocolate} 0.011410460 108
## [54] {Bread, Coffee, Pastry} 0.011199155 106
## [55] {Coffee, Spanish Brunch} 0.010882198 103
## [56] {Bread, Brownie} 0.010776545 102
## [57] {Hearty & Seasonal} 0.010565240 100
## [58] {Salad} 0.010459588 99
## [59] {Alfajores, Bread} 0.010353936 98
## [60] {Cake, Coffee, Tea} 0.010036978 95
## [61] {Bread, Cake, Coffee} 0.010036978 95
## [62] {Cookies, Tea} 0.009825674 93
## [63] {Pastry, Tea} 0.009614369 91
## [64] {Medialuna, Pastry} 0.009191759 87
## [65] {Bread, Scone} 0.009086107 86
## [66] {Soup, Tea} 0.009086107 86
## [67] {Frittata} 0.008557845 81
## [68] {Coffee, Tiffin} 0.008452192 80
## [69] {Scone, Tea} 0.008346540 79
## [70] {Bread, Muffin} 0.008135235 77
## [71] {Medialuna, Tea} 0.008135235 77
## [72] {Smoothies} 0.008135235 77
## [73] {Hot chocolate, Tea} 0.008029583 76
## [74] {Bread, Toast} 0.007818278 74
## [75] {Coffee, Truffles} 0.007501321 71
## [76] {Bread, Juice} 0.007395668 70
## [77] {Bread, Coffee, Tea} 0.007395668 70
## [78] {Cake, Cookies} 0.007290016 69
## [79] {Juice, Tea} 0.007184363 68
## [80] {Bread, Coffee, Sandwich} 0.007184363 68
## [81] {Cake, Juice} 0.007078711 67
## [82] {Cake, Coffee, Hot chocolate} 0.006867406 65
## [83] {Cake, Sandwich} 0.006867406 65
## [84] {Alfajores, Tea} 0.006761754 64
## [85] {Brownie, Tea} 0.006761754 64
## [86] {Bread, Coffee, Medialuna} 0.006761754 64
## [87] {Coffee, Jammie Dodgers} 0.006656101 63
## [88] {Coffee, Farm House} 0.006656101 63
## [89] {Keeping It Local} 0.006656101 63
## [90] {Coffee, Salad} 0.006550449 62
## [91] {Bread, Scandinavian} 0.006550449 62
## [92] {Bread, Soup} 0.006550449 62
## [93] {Muffin, Tea} 0.006550449 62
## [94] {Coffee, Coke} 0.006444797 61
## [95] {Tea, Toast} 0.006444797 61
## [96] {Cookies, Juice} 0.006127839 58
## [97] {Bread, Coffee, Hot chocolate} 0.006127839 58
## [98] {The Nomad} 0.006127839 58
## [99] {Cookies, Hot chocolate} 0.006022187 57
## [100] {Juice, Sandwich} 0.005810882 55
## [101] {Coffee, Hearty & Seasonal} 0.005705230 54
## [102] {Hot chocolate, Pastry} 0.005705230 54
## [103] {Focaccia} 0.005705230 54
## [104] {Coffee, Mineral water} 0.005599577 53
## [105] {Sandwich, Soup} 0.005493925 52
## [106] {Vegan mincepie} 0.005493925 52
## [107] {Coffee, Keeping It Local} 0.005388273 51
## [108] {Coffee, Sandwich, Tea} 0.005388273 51
## [109] {Bread, Coffee, Cookies} 0.005282620 50
## [110] {Chicken Stew, Coffee} 0.005176968 49
## [111] {Coke, Sandwich} 0.005176968 49
## [112] {Cake, Pastry} 0.005176968 49
## [113] {Bread, Jam} 0.005071315 48
## [114] {Bakewell} 0.005071315 48
## [115] {Bread, Truffles} 0.004965663 47
## [116] {Bread, Farm House} 0.004965663 47
## [117] {Bread, Tiffin} 0.004860011 46
## [118] {Coffee, Scandinavian} 0.004860011 46
## [119] {Cake, Muffin} 0.004860011 46
## [120] {Coffee, Medialuna, Pastry} 0.004860011 46
## [121] {Bread, Cake, Tea} 0.004860011 46
## [122] {Tartine} 0.004860011 46
## [123] {Bread, Spanish Brunch} 0.004754358 45
## [124] {Cake, Scone} 0.004754358 45
## [125] {Hot chocolate, Medialuna} 0.004754358 45
## [126] {Bread, Jammie Dodgers} 0.004648706 44
## [127] {Spanish Brunch, Tea} 0.004648706 44
## [128] {Cake, Coffee, Sandwich} 0.004648706 44
## [129] {Coffee, Pastry, Tea} 0.004648706 44
## [130] {Coffee, Frittata} 0.004543053 43
## [131] {Afternoon with the baker} 0.004543053 43
## [132] {Cake, Soup} 0.004437401 42
## [133] {Brownie, Cake} 0.004437401 42
## [134] {Hot chocolate, Sandwich} 0.004437401 42
## [135] {Alfajores, Bread, Coffee} 0.004331749 41
## [136] {Cake, Coffee, Cookies} 0.004226096 40
## [137] {Coffee, Jam} 0.004120444 39
## [138] {Alfajores, Cake} 0.004120444 39
## [139] {Brownie, Hot chocolate} 0.004120444 39
## [140] {Coffee, Smoothies} 0.004014791 38
## [141] {Extra Salami or Feta} 0.004014791 38
## [142] {Art Tray} 0.004014791 38
## [143] {Cake, Coffee, Juice} 0.003909139 37
## [144] {Coffee, Cookies, Tea} 0.003909139 37
## [145] {Sandwich, Truffles} 0.003803487 36
## [146] {Bread, Brownie, Coffee} 0.003803487 36
## [147] {Coffee, Hot chocolate, Pastry} 0.003803487 36
## [148] {Bread, Coffee, Toast} 0.003697834 35
## [149] {Coffee, Cookies, Juice} 0.003697834 35
## [150] {Coffee, Cookies, Hot chocolate} 0.003697834 35
## [151] {Coffee, Medialuna, Tea} 0.003697834 35
## [152] {Cake, Medialuna} 0.003697834 35
## [153] {Coffee, Sandwich, Soup} 0.003592182 34
## [154] {Hot chocolate, Muffin} 0.003592182 34
## [155] {Alfajores, Hot chocolate} 0.003592182 34
## [156] {Tea, Tiffin} 0.003486529 33
## [157] {Bread, Coffee, Scone} 0.003486529 33
## [158] {Coffee, Scone, Tea} 0.003380877 32
## [159] {Coffee, Extra Salami or Feta} 0.003275225 31
## [160] {Coffee, The Nomad} 0.003275225 31
## [161] {Coffee, Fudge} 0.003275225 31
## [162] {Bread, Fudge} 0.003275225 31
## [163] {Mineral water, Sandwich} 0.003275225 31
## [164] {Coffee, Tea, Toast} 0.003275225 31
## [165] {Alfajores, Pastry} 0.003275225 31
## [166] {Coffee, Vegan mincepie} 0.003169572 30
## [167] {Chicken Stew, Tea} 0.003169572 30
## [168] {Bread, Mineral water} 0.003169572 30
## [169] {Alfajores, Medialuna} 0.003169572 30
## [170] {Coffee, Tartine} 0.003063920 29
## [171] {Bakewell, Coffee} 0.003063920 29
## [172] {Cake, Jammie Dodgers} 0.003063920 29
## [173] {Tea, Truffles} 0.003063920 29
## [174] {Hot chocolate, Toast} 0.003063920 29
## [175] {Alfajores, Coffee, Tea} 0.003063920 29
## [176] {Alfajores, Juice} 0.003063920 29
## [177] {Coffee, Hot chocolate, Medialuna} 0.003063920 29
## [178] {Baguette, Coffee} 0.002958267 28
## [179] {Hot chocolate, Scone} 0.002958267 28
## [180] {Bread, Coffee, Juice} 0.002958267 28
## [181] {Cookies, Pastry} 0.002958267 28
## [182] {Granola} 0.002958267 28
## [183] {Eggs} 0.002958267 28
## [184] {Cake, Coffee, Scone} 0.002852615 27
## [185] {Coffee, Muffin, Tea} 0.002852615 27
## [186] {Alfajores, Brownie} 0.002852615 27
## [187] {Cookies, Sandwich} 0.002852615 27
## [188] {Art Tray, Coffee} 0.002746962 26
## [189] {Baguette, Bread} 0.002746962 26
## [190] {Bread, Chicken Stew} 0.002746962 26
## [191] {Juice, Spanish Brunch} 0.002746962 26
## [192] {Farm House, Pastry} 0.002746962 26
## [193] {Coffee, Soup, Tea} 0.002746962 26
## [194] {Juice, Muffin} 0.002746962 26
## [195] {Coffee, Juice, Sandwich} 0.002746962 26
## [196] {Bread, Sandwich, Tea} 0.002746962 26
## [197] {Cake, Coffee, Pastry} 0.002746962 26
## [198] {Bread, Frittata} 0.002641310 25
## [199] {Bread, Hearty & Seasonal} 0.002641310 25
## [200] {Cake, Truffles} 0.002641310 25
## [201] {Alfajores, Sandwich} 0.002641310 25
## [202] {Fudge, Jam} 0.002535658 24
## [203] {Jammie Dodgers, Tea} 0.002535658 24
## [204] {Muffin, Pastry} 0.002535658 24
## [205] {Brownie, Cookies} 0.002535658 24
## [206] {Brownie, Juice} 0.002535658 24
## [207] {Cookies, Medialuna} 0.002535658 24
## [208] {Cake, Coffee, Medialuna} 0.002535658 24
## [209] {Coffee, Hot chocolate, Sandwich} 0.002535658 24
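The support threshold can also be read as a transaction count. A quick base-R check using values from the text and output above: 0.0025 × 9465 = 23.6625, and because counts are whole numbers, an itemset needs at least 24 transactions. This matches the smallest count (24) at the bottom of the list; the Eclat log line "Absolute minimum support count: 23" shows the same product rounded down.

```r
n_transactions <- 9465
min_support    <- 0.0025

min_support * n_transactions                 # 23.6625 transactions
min_count <- ceiling(min_support * n_transactions)
min_count                                    # 24: smallest count meeting the threshold
```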
Association rules can be created by running the ruleInduction function with the prefix-tree method (ptree). The method takes the list of frequent itemsets above and a minimum confidence threshold as inputs to create rules. A confidence threshold of 0.3 was set, so rules with a confidence above 0.3 are created. Additionally, the parameter reduce is set to TRUE, which improves speed because unused items are removed from the transaction data before the prefix tree is built.
The result shows 73 rules, 68 of which have Coffee as the RHS item, meaning that many customers buy Coffee together with other products. However, Coffee is the most frequent item, and as mentioned before, Confidence does not consider how often the consequent occurs on its own, so the confidence values of these rules do not necessarily show how strong the rules are.
A look at the 20 rules with the highest lift reveals some notable information:
{Extra Salami or Feta} ➝ {Coffee}, {Keeping It Local} ➝ {Coffee}, and {Toast} ➝ {Coffee} stand out because their lift values are well above 1 (about 1.47 to 1.71), which means a customer who buys Extra Salami or Feta, Keeping It Local, or Toast is more likely to also buy Coffee. The rule {Toast} ➝ {Coffee} also has a high support value compared with the others in the Top 20 (0.024, i.e. 224 transactions), so many transactions actually contain both Toast and Coffee.
We also found many high-lift rules whose LHS has 2 products and whose RHS is Coffee, for example {Cake, Medialuna} ➝ {Coffee}, {Cake, Sandwich} ➝ {Coffee}, or {Hot chocolate, Pastry} ➝ {Coffee}. Among these 3-item rules, {Cake, Hot chocolate} ➝ {Coffee}, {Cake, Sandwich} ➝ {Coffee}, and {Cake, Cookies} ➝ {Coffee} are the most frequent (support 0.0069, 0.0046, and 0.0042).
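The ruleInduction call itself is not reproduced here. As a hedged illustration of how rules come out of frequent itemsets, the base-R sketch below handles only 2-itemsets and hard-codes support counts copied from the list above. Each split of an itemset into LHS and RHS is a candidate rule, its confidence is count(itemset) / count(LHS), and candidates below the 0.3 threshold are dropped. The helper `rules_from_pair()` is hypothetical.

```r
# Support counts copied from the frequent-itemset list above
counts <- c("Coffee" = 4528, "Cake" = 983, "Toast" = 318,
            "Cake, Coffee" = 518, "Coffee, Toast" = 224)
n_transactions <- 9465
min_conf <- 0.3

# Both rules that can be read off a 2-itemset {a, b}: a -> b and b -> a
rules_from_pair <- function(a, b, pair_key) {
  n_ab  <- counts[[pair_key]]
  n_lhs <- c(counts[[a]], counts[[b]])
  n_rhs <- c(counts[[b]], counts[[a]])
  conf  <- n_ab / n_lhs
  data.frame(lhs = c(a, b), rhs = c(b, a),
             support = n_ab / n_transactions,
             confidence = conf,
             lift = conf / (n_rhs / n_transactions),
             stringsAsFactors = FALSE)
}

candidates <- rbind(rules_from_pair("Cake", "Coffee", "Cake, Coffee"),
                    rules_from_pair("Toast", "Coffee", "Coffee, Toast"))

# Keep only the rules that pass the confidence threshold
kept <- candidates[candidates$confidence >= min_conf, ]
kept
```

Only {Cake} ➝ {Coffee} (confidence about 0.53) and {Toast} ➝ {Coffee} survive; the reverse rules have Coffee as LHS and confidences of about 0.11 and 0.05. The Toast row reproduces rule [3] in the table below: support 0.0237, confidence 0.7044, lift 1.4724. All required counts come from the frequent-itemset step, so no extra pass over the transactions is needed.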
## lhs rhs support confidence lift itemset
## [1] {Extra Salami or Feta} => {Coffee} 0.003275225 0.8157895 1.705267 5
## [2] {Keeping It Local} => {Coffee} 0.005388273 0.8095238 1.692169 6
## [3] {Toast} => {Coffee} 0.023666138 0.7044025 1.472431 52
## [4] {Cake, Medialuna} => {Coffee} 0.002535658 0.6857143 1.433367 123
## [5] {Art Tray} => {Coffee} 0.002746962 0.6842105 1.430224 4
## [6] {Cake, Sandwich} => {Coffee} 0.004648706 0.6769231 1.414990 142
## [7] {Hot chocolate, Pastry} => {Coffee} 0.003803487 0.6666667 1.393551 133
## [8] {Sandwich, Soup} => {Coffee} 0.003592182 0.6538462 1.366752 64
## [9] {Hot chocolate, Medialuna} => {Coffee} 0.003063920 0.6444444 1.347100 121
## [10] {Tartine} => {Coffee} 0.003063920 0.6304348 1.317815 1
## [11] {Salad} => {Coffee} 0.006550449 0.6262626 1.309094 18
## [12] {Cookies, Hot chocolate} => {Coffee} 0.003697834 0.6140351 1.283534 109
## [13] {Bakewell} => {Coffee} 0.003063920 0.6041667 1.262906 2
## [14] {Cookies, Juice} => {Coffee} 0.003697834 0.6034483 1.261404 99
## [15] {Cake, Hot chocolate} => {Coffee} 0.006867406 0.6018519 1.258067 134
## [16] {Cake, Scone} => {Coffee} 0.002852615 0.6000000 1.254196 56
## [17] {Spanish Brunch} => {Coffee} 0.010882198 0.5988372 1.251766 38
## [18] {Cake, Cookies} => {Coffee} 0.004226096 0.5797101 1.211784 110
## [19] {Vegan mincepie} => {Coffee} 0.003169572 0.5769231 1.205958 3
## [20] {Hot chocolate, Sandwich} => {Coffee} 0.002535658 0.5714286 1.194472 132
The detected rules can be put to practical use. For example:
When a customer orders Cake and Cookies, offer a drink such as Coffee as an upsell
Create combos of 2 bakery products (Cake, Cookies, Medialuna, Sandwich,…) and 1 Coffee
Build a recommendation system for online ordering that always offers Coffee if the order does not contain Coffee yet, or a more customized mechanism based on the detected rules
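A minimal sketch of such a rule-based recommender, assuming a handful of rules copied from the outputs in this article. The function `recommend()` is a hypothetical helper, not part of arules. A rule fires when its whole LHS is already in the basket and its RHS is not, and the suggestions are ranked by lift, which is one possible design choice.

```r
# A few rules copied from the outputs in this article
rule_lhs  <- list("Toast", c("Cake", "Cookies"), c("Cake", "Sandwich"),
                  "Jam", "Pastry")
rule_rhs  <- c("Coffee", "Coffee", "Coffee", "Bread", "Bread")
rule_lift <- c(1.472431, 1.211784, 1.414990, 1.0330761, 1.0349774)

# Suggest consequents of rules whose LHS is fully in the basket and whose
# RHS is not in it yet, strongest lift first
recommend <- function(basket) {
  applies <- vapply(rule_lhs, function(l) all(l %in% basket), logical(1)) &
    !(rule_rhs %in% basket)
  ranked <- order(rule_lift[applies], decreasing = TRUE)
  unique(rule_rhs[applies][ranked])
}

recommend(c("Cake", "Cookies"))   # "Coffee"
recommend(c("Jam", "Toast"))      # "Coffee" "Bread"
recommend(c("Coffee", "Toast"))   # character(0): Coffee is already in the order
```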
Because Coffee makes up a large share of the transactions and is the RHS item of almost every rule in the output, this section looks for other possible association rules that do not include Coffee. With the same support and confidence thresholds, 5 rules do not contain Coffee. All of them have Bread as the RHS item and involve just 2 items. Three of these 5 rules have a lift value higher than 1, showing that a customer is more likely to buy Bread when they buy Jam, Jammie Dodgers, or Pastry. The pair Pastry and Bread is the most frequent of the 5 rules.
Based on that, we can offer Bread to customers who already have Jam, Jammie Dodgers, or Pastry in their order, create combos, and apply these rules in an online recommendation system to enhance the customer experience and increase sales.
```r
library(arules)

# freq_rules holds the rules created with ruleInduction() above.
# Keep only the rules whose right-hand side is not Coffee
no_Coffee_rules <- subset(freq_rules, !(rhs %in% "Coffee"))

# Display the first few of the filtered rules
inspect(head(no_Coffee_rules))
```
## lhs rhs support confidence lift itemset
## [1] {Frittata} => {Bread} 0.002641310 0.3086420 0.9432665 9
## [2] {Jam} => {Bread} 0.005071315 0.3380282 1.0330761 20
## [3] {Jammie Dodgers} => {Bread} 0.004648706 0.3520000 1.0757766 25
## [4] {Tiffin} => {Bread} 0.004860011 0.3150685 0.9629071 32
## [5] {Pastry} => {Bread} 0.029160063 0.3386503 1.0349774 154
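Because all five rules share the consequent {Bread}, their lift values follow directly from Lift = Confidence / Support(Bread). A base-R check using the confidences printed above and Support(Bread) = 3097 / 9465 confirms which rules exceed 1:

```r
no_coffee <- data.frame(
  lhs        = c("Frittata", "Jam", "Jammie Dodgers", "Tiffin", "Pastry"),
  confidence = c(0.3086420, 0.3380282, 0.3520000, 0.3150685, 0.3386503),
  stringsAsFactors = FALSE
)
support_bread <- 3097 / 9465   # every rule has {Bread} as RHS

# Lift = confidence / support(RHS)
no_coffee$lift <- no_coffee$confidence / support_bread

no_coffee$lhs[no_coffee$lift > 1]   # "Jam" "Jammie Dodgers" "Pastry"
```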
Besides Support, Confidence, and Lift, there are other important metrics for measuring similarity and dissimilarity in association rule mining. The ones used in this article are the Jaccard Index and Affinity.
The Jaccard Index measures how likely two items are to be bought together. It is calculated as the size of the intersection of two sets divided by the size of their union:
\[ J(A, B) = \frac{|A \cap B|}{|A \cup B|} \]
where, when comparing two items (or itemsets), 𝐴 and 𝐵 are the sets of transactions containing each of them. A higher Jaccard Index indicates greater similarity. In other words, the metric shows how often the items are purchased together relative to the number of transactions involving either item.
Subtracting the Jaccard Index from 1 gives the Jaccard Distance, which measures the dissimilarity between two itemsets:
\[ D(A, B) = 1 - J(A, B) \]
where 𝐴 and 𝐵 are the two sets being compared. The Jaccard Distance ranges from 0 to 1: 0 means the sets are identical (completely similar), and 1 means they have no elements in common (completely dissimilar).
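As a worked example with the Coffee and Bread counts used earlier, the union size follows from inclusion-exclusion, |A ∪ B| = |A| + |B| - |A ∩ B|, so only three counts are needed:

```r
n_coffee <- 4528   # transactions containing Coffee
n_bread  <- 3097   # transactions containing Bread
n_both   <- 852    # transactions containing both

# |A intersect B| / |A union B|, with |A union B| = |A| + |B| - |A intersect B|
jaccard_index    <- n_both / (n_coffee + n_bread - n_both)
jaccard_distance <- 1 - jaccard_index

round(c(jaccard_index, jaccard_distance), 3)   # 0.126 and 0.874
```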
The dendrogram below, built with the Jaccard Index, shows the similarity and dissimilarity among the products in the dataset, and several insights can be drawn from it.
Some groups of products are likely to be bought together, such as (Medialuna, Pastry, Bread, Coffee), (Fudge, Jam), or (Cake, Tea, Hot Chocolate).
There are two main separate branches, which suggests that products in the left branch are less likely to be bought together with products in the right branch. For example, a customer is unlikely to pick both Coffee and Coke in the same order.
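The code behind the dendrogram is not shown. As a hedged illustration of how such a plot can be built, the sketch below uses six hypothetical toy baskets, base R's `dist(method = "binary")` (which for 0/1 data is the Jaccard distance), and `hclust()`. Average linkage is an assumption here, since the linkage used for the original plot is not stated.

```r
# Hypothetical toy baskets (not the bakery data)
toy <- list(
  c("Coffee", "Pastry", "Medialuna"), c("Coffee", "Pastry"),
  c("Coffee", "Medialuna"),           c("Coke", "Sandwich"),
  c("Coke", "Sandwich"),              c("Sandwich")
)
items <- sort(unique(unlist(toy)))

# 0/1 incidence matrix: rows = items, columns = transactions
incidence <- vapply(toy, function(b) as.numeric(items %in% b),
                    numeric(length(items)))
rownames(incidence) <- items

# "binary" distance = share of mismatches among transactions with either item,
# i.e. the Jaccard distance between items
d <- dist(incidence, method = "binary")
tree <- hclust(d, method = "average")
plot(tree, main = "Items clustered by Jaccard distance (toy data)")

groups <- cutree(tree, k = 2)   # two main branches
groups
```

Cutting the tree into two branches separates {Coffee, Medialuna, Pastry} from {Coke, Sandwich}, because no toy basket mixes the two groups and their cross distances are all 1.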
Affinity, also known as Cosine Similarity, is a metric used to measure the similarity between two itemsets:
\[ \text{Affinity}(A, B) = \frac{|A \cap B|}{\sqrt{|A| \times |B|}} \]
where 𝐴 and 𝐵 are the sets of transactions containing each item (or itemset): |𝐴 ∩ 𝐵| is the number of transactions containing both, and |𝐴| and |𝐵| are the numbers of transactions containing each. A higher Affinity value indicates a stronger relationship, meaning the items are more similar.
Affinity measures the cosine of the angle between the two binary vectors that record, transaction by transaction, whether each item is present, while the Jaccard Index compares the size of the intersection of the two transaction sets with the size of their union.
In other words, both metrics start from the same co-occurrence count and differ only in the denominator: Affinity normalizes by the geometric mean of the two items' individual counts, whereas the Jaccard Index normalizes by the number of transactions containing either item.
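A side-by-side computation for Coffee and Bread, using the counts quoted earlier, makes the difference in denominators visible. Because the union is always at least as large as the geometric mean of the two counts, the cosine value is never smaller than the Jaccard value.

```r
n_coffee <- 4528   # transactions containing Coffee
n_bread  <- 3097   # transactions containing Bread
n_both   <- 852    # transactions containing both

# Same numerator, different denominators
affinity_cosine <- n_both / sqrt(n_coffee * n_bread)        # geometric mean
jaccard_index   <- n_both / (n_coffee + n_bread - n_both)   # size of the union

round(c(affinity = affinity_cosine, jaccard = jaccard_index), 3)
# affinity 0.228, jaccard 0.126
```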
The Affinity heatmap below also yields insights. Some notable product pairs that are frequently purchased together are Coffee - Bread, Coffee - Cake, Tea - Cake, Pastry - Bread,…
This article has applied association rule mining and clustering techniques to discover associations among the products sold by a bakery. Although item frequencies are quite unbalanced (Coffee and Bread are bought far more often than the other products), some general rules emerge. Many orders include 2 items (1 bakery product and 1 drink) or 3 items (2 bakery products and 1 drink), so combos of 2 or 3 products are a natural offer. Furthermore, we should specify which drink to offer with each bakery product. For example, we can recommend that a customer who has ordered Cake additionally buy a cup of Tea or Coffee, because the discovered associations show that customers who buy Cake are likely to buy Tea or Coffee.
Market Basket Analysis is a useful way to gain a better understanding of customer behaviour. It can help shops increase cross-selling and improve the customer experience.