The key to the success of any business lies in understanding its customers. Companies need to determine what their customers like, how much they are willing to spend on products, and what product categories they are likely to purchase together.
Association rule mining is an effective tool for analyzing customer behavior patterns. This technique involves generating “if-then” statements that describe relationships between data items. For example, if a customer purchases a scarf, it is likely that they will also purchase gloves. In this scenario, the scarf is the antecedent and the gloves are the consequent.
The purpose of this project is to analyze a bakery transaction dataset using association rule mining and compare the results from morning transactions to those from afternoon transactions. The aim is to determine if there are any differences in customer behavior during these two time periods. The findings from this analysis can provide valuable insights into customer preferences and habits, which can then be used to make informed business decisions, resulting in increased sales and profits.
The dataset used in this study was collected from “The Bread Basket,” a bakery located in Edinburgh, and contains 9192 transactions. This data is accessible on the Kaggle website (https://www.kaggle.com/mittalvasu95/the-bread-basket) and covers the time frame of January 26, 2011 to December 27, 2003. The transactions in the dataset represent the items that customers ordered through online ordering.
Transaction ID: This column represents a unique identifier for each order made by a customer. It is important to have this column because it distinguishes each transaction from one another and allows for easy tracking of individual orders.
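Item: This column names the item that was ordered. Each row records a single item, so a transaction with several items spans several rows that share the same transaction ID, which is why the data is read below with format="single".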
Date and Time: This column contains the date and time of the transaction, formatted in the “dd-mm-yyyy hh:mm” format. This information is crucial as it allows for analysis of customer behavior at different times and dates.
Period of Day: This column provides information about the time period of the day during which the transaction took place, for example, morning, afternoon, or evening. This information can be useful in identifying patterns of customer behavior during different times of the day.
Weekday/Weekend: This column categorizes the day on which the transaction took place as either a weekday (Monday to Friday) or a weekend (Saturday or Sunday). This information can be used to analyze customer behavior on different days of the week and can help in planning business strategies accordingly.
library(knitr)
## Warning: package 'knitr' was built under R version 4.1.3
library(arules)
## Warning: package 'arules' was built under R version 4.1.3
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 4.1.3
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
## Warning: package 'arulesViz' was built under R version 4.1.3
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 4.1.3
data <- read.transactions("bread basket.csv", format="single", sep=",", cols=c("Transaction","Item"), header=TRUE)
summary(size(data))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 2.029 3.000 10.000
cat("Number of baskets:", length(data))
## Number of baskets: 6576
cat("Total number of items across all baskets:", sum(size(data)))
## Total number of items across all baskets: 13341
To gain familiarity with the data, I generated plots of the absolute and relative item frequency. These plots provide a visual representation of the frequency of occurrence of each item in the dataset, both in absolute terms and as a percentage of the total number of transactions. This information can be used to identify the most popular items in the bakery and to understand the overall distribution of items sold. Understanding the item frequency can provide valuable insights into customer preferences and help in making informed business decisions.
itemFrequencyPlot(data, topN = 10, type = "absolute", main = "Item frequency", cex.names = 0.75)
The item frequency plot shows that the two most commonly ordered products are coffee and bread. This is somewhat surprising, since one might expect bread to be the most popular item in a bakery; the high demand for coffee shows that customers come for more than baked goods. The bakery may be able to build on this demand, for example by promoting coffee more actively or by pairing it with other products.
The Eclat algorithm is a technique for identifying frequent itemsets in transaction data, and is often a more efficient alternative to the Apriori algorithm. Eclat searches the itemset space depth-first, while Apriori proceeds level by level, similar to a breadth-first search of a graph. Eclat works on a vertical representation of the database: instead of storing the items of each transaction, it stores for each item the list of IDs of the transactions that contain it (its tidlist). The support of a larger itemset is obtained by intersecting the tidlists of its parts, so no further scan of the original database is needed. Starting from the single items, the algorithm extends itemsets depth-first and keeps only those that meet the minimum support threshold.
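The vertical, depth-first idea can be sketched in a few lines of base R. This is a minimal illustration on five made-up baskets, not the implementation used by the arules package; the names `tidlists`, `eclat_dfs`, and `min_count` are hypothetical.

```r
# Toy transactions (hypothetical, not the bakery data)
transactions <- list(
  t1 = c("Coffee", "Bread"),
  t2 = c("Coffee", "Cake"),
  t3 = c("Coffee", "Bread", "Cake"),
  t4 = c("Tea", "Cake"),
  t5 = c("Coffee")
)
min_count <- 2  # absolute minimum support, assumed for this sketch

# Vertical layout: for each item, the IDs of the transactions that contain it
items <- sort(unique(unlist(transactions)))
tidlists <- lapply(items, function(it) {
  names(transactions)[vapply(transactions, function(tr) it %in% tr, logical(1))]
})
names(tidlists) <- items

# Depth-first search: extend a prefix itemset with each later item and
# obtain the support of the extension by intersecting tidlists
eclat_dfs <- function(prefix, prefix_tids, candidates) {
  found <- list()
  for (i in seq_along(candidates)) {
    item <- candidates[i]
    tids <- if (length(prefix) == 0) {
      tidlists[[item]]
    } else {
      intersect(prefix_tids, tidlists[[item]])
    }
    if (length(tids) >= min_count) {
      itemset <- c(prefix, item)
      found[[paste(itemset, collapse = ", ")]] <- length(tids)
      # Recurse before moving on to the next sibling (depth-first)
      found <- c(found, eclat_dfs(itemset, tids, candidates[-seq_len(i)]))
    }
  }
  found
}

frequent <- unlist(eclat_dfs(character(0), character(0), items))
print(frequent)  # named vector: itemset -> number of supporting transactions
```

The original transactions are only read once, when the tidlists are built; every later support count comes from an intersection of two ID lists.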
The key advantage of the Eclat algorithm is that it is typically more efficient than the Apriori algorithm on large databases with many transactions. This is because it avoids repeated database scans: support is computed by intersecting transaction ID lists rather than by counting candidates against the full data. Because these lists shrink as itemsets grow, long itemsets also tend to be cheap to evaluate.
To evaluate association rules, three key measures are used: support, confidence, and lift. Support is the proportion of transactions that contain an itemset (or all items of a rule). Confidence is the proportion of transactions containing the antecedent that also contain the consequent. Lift is the confidence divided by the support of the consequent, and describes the strength of the association between antecedent and consequent. A lift greater than 1 indicates a positive association, a lift of 1 suggests that the two are independent, and a lift less than 1 indicates a negative association.
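As a worked example, the three measures can be computed by hand for the rule Cake => Coffee. The counts below are taken from the Eclat output reported later in this document (6576 baskets, 694 with Cake, 3188 with Coffee, 374 with both); the variable names are only illustrative.

```r
# Counts taken from the Eclat output in this report
n_transactions <- 6576
count_cake     <- 694    # baskets containing Cake (antecedent)
count_coffee   <- 3188   # baskets containing Coffee (consequent)
count_both     <- 374    # baskets containing both Cake and Coffee

# support(Cake => Coffee): share of all baskets containing both items
support_rule <- count_both / n_transactions

# confidence(Cake => Coffee): share of Cake baskets that also contain Coffee
confidence <- count_both / count_cake

# lift(Cake => Coffee): confidence relative to the overall share of Coffee
lift <- confidence / (count_coffee / n_transactions)

print(round(c(support = support_rule, confidence = confidence, lift = lift), 4))
```

The results agree with the values that ruleInduction() reports for this rule (support 0.0569, confidence 0.5389, lift 1.1116).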
eclat_algorithm <-eclat(data, parameter=list(supp=0.05))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.05 1 10 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 328
##
## create itemset ...
## set transactions ...[102 item(s), 6576 transaction(s)] done [0.00s].
## sorting and recoding items ... [9 item(s)] done [0.00s].
## creating bit matrix ... [9 row(s), 6576 column(s)] done [0.00s].
## writing ... [12 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
eclat_algorithm_i <- inspect(head(sort(eclat_algorithm, by = "support"), 15))
## items support count
## [1] {Coffee} 0.48479319 3188
## [2] {Bread} 0.32633820 2146
## [3] {Tea} 0.14309611 941
## [4] {Cake} 0.10553528 694
## [5] {Bread, Coffee} 0.09032847 594
## [6] {Pastry} 0.08759124 576
## [7] {Sandwich} 0.07496959 493
## [8] {Medialuna} 0.05763382 379
## [9] {Cake, Coffee} 0.05687348 374
## [10] {Cookies} 0.05687348 374
## [11] {Coffee, Tea} 0.05200730 342
## [12] {Hot chocolate} 0.05200730 342
kable(eclat_algorithm_i) %>% kable_styling("striped")
| | items | support | count |
|---|---|---|---|
| [1] | {Coffee} | 0.4847932 | 3188 |
| [2] | {Bread} | 0.3263382 | 2146 |
| [3] | {Tea} | 0.1430961 | 941 |
| [4] | {Cake} | 0.1055353 | 694 |
| [5] | {Bread, Coffee} | 0.0903285 | 594 |
| [6] | {Pastry} | 0.0875912 | 576 |
| [7] | {Sandwich} | 0.0749696 | 493 |
| [8] | {Medialuna} | 0.0576338 | 379 |
| [9] | {Cake, Coffee} | 0.0568735 | 374 |
| [10] | {Cookies} | 0.0568735 | 374 |
| [11] | {Coffee, Tea} | 0.0520073 | 342 |
| [12] | {Hot chocolate} | 0.0520073 | 342 |
The table lists the 12 frequent itemsets (single items and item pairs) that meet the minimum support of 0.05, sorted in descending order of support. Support is the share of transactions that contain the itemset, and count is the corresponding absolute number of transactions. For example, {Coffee} has a support of 0.4848, meaning that it appears in 48.48% of all transactions (3188), and {Bread} has a support of 0.3263 (2146 transactions).
The most frequent pair, {Bread, Coffee}, has a support of 9.03% (594 transactions). Only three pairs reach the threshold, and all of them include Coffee: {Bread, Coffee}, {Cake, Coffee}, and {Coffee, Tea}. These results show which items are most frequently bought, alone and together, and can help the bakery make informed decisions regarding its product offerings and marketing strategies.
It is important to note that Eclat itself only finds frequent itemsets; it does not create association rules. Rules are derived from the frequent itemsets in a separate step, here with ruleInduction() and a minimum confidence threshold.
rules_eclat_algorithm<-ruleInduction(eclat_algorithm, data, confidence=0.05)
rules_eclat_algorithm_i <- inspect(head(sort(rules_eclat_algorithm, by = "confidence", decreasing = TRUE),15))
## lhs rhs support confidence lift itemset
## [1] {Cake} => {Coffee} 0.05687348 0.5389049 1.1116181 1
## [2] {Tea} => {Coffee} 0.05200730 0.3634431 0.7496870 2
## [3] {Bread} => {Coffee} 0.09032847 0.2767940 0.5709528 3
## [4] {Coffee} => {Bread} 0.09032847 0.1863237 0.5709528 3
## [5] {Coffee} => {Cake} 0.05687348 0.1173149 1.1116181 1
## [6] {Coffee} => {Tea} 0.05200730 0.1072773 0.7496870 2
kable(rules_eclat_algorithm_i) %>% kable_styling("striped")
| | lhs | | rhs | support | confidence | lift | itemset |
|---|---|---|---|---|---|---|---|
| [1] | {Cake} | => | {Coffee} | 0.0568735 | 0.5389049 | 1.1116181 | 1 |
| [2] | {Tea} | => | {Coffee} | 0.0520073 | 0.3634431 | 0.7496870 | 2 |
| [3] | {Bread} | => | {Coffee} | 0.0903285 | 0.2767940 | 0.5709528 | 3 |
| [4] | {Coffee} | => | {Bread} | 0.0903285 | 0.1863237 | 0.5709528 | 3 |
| [5] | {Coffee} | => | {Cake} | 0.0568735 | 0.1173149 | 1.1116181 | 1 |
| [6] | {Coffee} | => | {Tea} | 0.0520073 | 0.1072773 | 0.7496870 | 2 |
In this output, only the rules [1] (Cake => Coffee) and [5] (Coffee => Cake) have a lift above 1 (1.11), meaning that the presence of the item on the left-hand side (LHS) makes the item on the right-hand side (RHS) slightly more likely. The remaining four rules, which pair Coffee with Tea or Bread, have a lift below 1, so these items appear together less often than would be expected if they were bought independently. The confidence of a rule is the proportion of transactions containing the LHS that also contain the RHS. A lift of exactly 1 would indicate that the LHS and RHS occur independently of each other.
rules_eclat_algorithm<-ruleInduction(eclat(data, parameter=list(supp=0.02)) , data, confidence=0.05)
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.02 1 10 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 131
##
## create itemset ...
## set transactions ...[102 item(s), 6576 transaction(s)] done [0.00s].
## sorting and recoding items ... [21 item(s)] done [0.00s].
## creating sparse bit matrix ... [21 row(s), 6576 column(s)] done [0.00s].
## writing ... [37 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
rules_supp_i <- inspect(head(sort(rules_eclat_algorithm, by="lift", decreasing=TRUE),15), linebreak=F)
## lhs rhs support confidence lift itemset
## [1] {Tea} => {Cake} 0.02630779 0.18384697 1.742043 13
## [2] {Cake} => {Tea} 0.02630779 0.24927954 1.742043 13
## [3] {Toast} => {Coffee} 0.02585158 0.72961373 1.505000 1
## [4] {Coffee} => {Toast} 0.02585158 0.05332497 1.505000 1
## [5] {Medialuna} => {Coffee} 0.03315085 0.57519789 1.186481 6
## [6] {Coffee} => {Medialuna} 0.03315085 0.06838143 1.186481 6
## [7] {Sandwich} => {Coffee} 0.04257908 0.56795132 1.171533 8
## [8] {Coffee} => {Sandwich} 0.04257908 0.08782936 1.171533 8
## [9] {Pastry} => {Coffee} 0.04896594 0.55902778 1.153126 9
## [10] {Coffee} => {Pastry} 0.04896594 0.10100376 1.153126 9
## [11] {Alfajores} => {Coffee} 0.02250608 0.55223881 1.139122 2
## [12] {Coffee} => {Cake} 0.05687348 0.11731493 1.111618 11
## [13] {Cake} => {Coffee} 0.05687348 0.53890490 1.111618 11
## [14] {Juice} => {Coffee} 0.02144161 0.53007519 1.093405 4
## [15] {Cookies} => {Coffee} 0.02995742 0.52673797 1.086521 7
kable(rules_supp_i) %>% kable_styling("striped")
| | lhs | | rhs | support | confidence | lift | itemset |
|---|---|---|---|---|---|---|---|
| [1] | {Tea} | => | {Cake} | 0.0263078 | 0.1838470 | 1.742043 | 13 |
| [2] | {Cake} | => | {Tea} | 0.0263078 | 0.2492795 | 1.742043 | 13 |
| [3] | {Toast} | => | {Coffee} | 0.0258516 | 0.7296137 | 1.505000 | 1 |
| [4] | {Coffee} | => | {Toast} | 0.0258516 | 0.0533250 | 1.505000 | 1 |
| [5] | {Medialuna} | => | {Coffee} | 0.0331509 | 0.5751979 | 1.186481 | 6 |
| [6] | {Coffee} | => | {Medialuna} | 0.0331509 | 0.0683814 | 1.186481 | 6 |
| [7] | {Sandwich} | => | {Coffee} | 0.0425791 | 0.5679513 | 1.171533 | 8 |
| [8] | {Coffee} | => | {Sandwich} | 0.0425791 | 0.0878294 | 1.171533 | 8 |
| [9] | {Pastry} | => | {Coffee} | 0.0489659 | 0.5590278 | 1.153126 | 9 |
| [10] | {Coffee} | => | {Pastry} | 0.0489659 | 0.1010038 | 1.153126 | 9 |
| [11] | {Alfajores} | => | {Coffee} | 0.0225061 | 0.5522388 | 1.139123 | 2 |
| [12] | {Coffee} | => | {Cake} | 0.0568735 | 0.1173149 | 1.111618 | 11 |
| [13] | {Cake} | => | {Coffee} | 0.0568735 | 0.5389049 | 1.111618 | 11 |
| [14] | {Juice} | => | {Coffee} | 0.0214416 | 0.5300752 | 1.093405 | 4 |
| [15] | {Cookies} | => | {Coffee} | 0.0299574 | 0.5267380 | 1.086521 | 7 |
rules_eclat_algorithm
## set of 29 rules
Lowering the support threshold to 0.02 yields 29 rules in total. All of the 15 rules shown above have a lift higher than 1, where lift measures how much more likely the items on the right-hand side (rhs) are to be purchased together with the items on the left-hand side (lhs) than if the two were purchased independently. The highest lift is observed for Tea => Cake and Cake => Tea (1.742), followed by Toast => Coffee (1.505), Medialuna => Coffee (1.186), Sandwich => Coffee (1.172), and Pastry => Coffee (1.153), each together with its reverse rule. Coffee and Bread have far higher support than any other item, which was also evident in the item frequency plot.
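Rather than reading lift values off a printed table, the rules can be counted and filtered programmatically. The sketch below uses five made-up baskets instead of the bakery data and assumes the arules package is installed; quality() and subset() are arules functions, while the object names and thresholds are chosen only for illustration.

```r
library(arules)

# Toy baskets (hypothetical, not the bakery data)
baskets <- list(
  c("Coffee", "Bread"),
  c("Coffee", "Cake"),
  c("Coffee", "Bread", "Cake"),
  c("Tea", "Cake"),
  c("Coffee")
)
toy <- as(baskets, "transactions")

# Same two steps as in the report: frequent itemsets, then rule induction
itemsets <- eclat(toy, parameter = list(supp = 0.3),
                  control = list(verbose = FALSE))
rules <- ruleInduction(itemsets, toy, confidence = 0.05)

# quality() returns the measures as a data frame, one row per rule
n_positive <- sum(quality(rules)$lift > 1)
cat(n_positive, "of", length(rules), "rules have lift > 1\n")

# subset() keeps only the rules that satisfy the condition
positive_rules <- subset(rules, subset = lift > 1)
inspect(sort(positive_rules, by = "lift"))
```

Applied to the rule objects in this report, the same two calls would give the exact number of rules with lift above 1 for each choice of thresholds.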
In the first run, with a support threshold of 0.05, the ruleInduction() function with a minimum confidence of 0.05 generated 6 rules, of which only 2 had a lift value higher than 1.
Lowering the support threshold from 0.05 to 0.02 increased the number of rules with a lift value higher than 1, but many of these rules have very low confidence (for example, Coffee => Toast at 0.053). I therefore raised the minimum confidence to 0.3 to keep only the more meaningful rules.
rules_eclat_algorithm_2<-ruleInduction(eclat(data, parameter=list(supp=0.02)) , data, confidence=0.3)
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.02 1 10 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 131
##
## create itemset ...
## set transactions ...[102 item(s), 6576 transaction(s)] done [0.00s].
## sorting and recoding items ... [21 item(s)] done [0.00s].
## creating sparse bit matrix ... [21 row(s), 6576 column(s)] done [0.00s].
## writing ... [37 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
rules_supp_i_2 <- inspect(head(sort(rules_eclat_algorithm_2, by="lift", decreasing=TRUE),15), linebreak=F)
## lhs rhs support confidence lift itemset
## [1] {Toast} => {Coffee} 0.02585158 0.7296137 1.5050000 1
## [2] {Medialuna} => {Coffee} 0.03315085 0.5751979 1.1864810 6
## [3] {Sandwich} => {Coffee} 0.04257908 0.5679513 1.1715332 8
## [4] {Pastry} => {Coffee} 0.04896594 0.5590278 1.1531263 9
## [5] {Alfajores} => {Coffee} 0.02250608 0.5522388 1.1391225 2
## [6] {Cake} => {Coffee} 0.05687348 0.5389049 1.1116181 11
## [7] {Juice} => {Coffee} 0.02144161 0.5300752 1.0934048 4
## [8] {Cookies} => {Coffee} 0.02995742 0.5267380 1.0865210 7
## [9] {Hot chocolate} => {Coffee} 0.02737226 0.5263158 1.0856501 5
## [10] {Pastry} => {Bread} 0.02980535 0.3402778 1.0427151 10
## [11] {Brownie} => {Coffee} 0.02098540 0.4758621 0.9815775 3
## [12] {Tea} => {Coffee} 0.05200730 0.3634431 0.7496870 14
kable(rules_supp_i_2) %>% kable_styling("striped")
| | lhs | | rhs | support | confidence | lift | itemset |
|---|---|---|---|---|---|---|---|
| [1] | {Toast} | => | {Coffee} | 0.0258516 | 0.7296137 | 1.5050000 | 1 |
| [2] | {Medialuna} | => | {Coffee} | 0.0331509 | 0.5751979 | 1.1864810 | 6 |
| [3] | {Sandwich} | => | {Coffee} | 0.0425791 | 0.5679513 | 1.1715332 | 8 |
| [4] | {Pastry} | => | {Coffee} | 0.0489659 | 0.5590278 | 1.1531263 | 9 |
| [5] | {Alfajores} | => | {Coffee} | 0.0225061 | 0.5522388 | 1.1391225 | 2 |
| [6] | {Cake} | => | {Coffee} | 0.0568735 | 0.5389049 | 1.1116181 | 11 |
| [7] | {Juice} | => | {Coffee} | 0.0214416 | 0.5300752 | 1.0934048 | 4 |
| [8] | {Cookies} | => | {Coffee} | 0.0299574 | 0.5267380 | 1.0865210 | 7 |
| [9] | {Hot chocolate} | => | {Coffee} | 0.0273723 | 0.5263158 | 1.0856501 | 5 |
| [10] | {Pastry} | => | {Bread} | 0.0298054 | 0.3402778 | 1.0427151 | 10 |
| [11] | {Brownie} | => | {Coffee} | 0.0209854 | 0.4758621 | 0.9815775 | 3 |
| [12] | {Tea} | => | {Coffee} | 0.0520073 | 0.3634431 | 0.7496870 | 14 |
rules_eclat_algorithm_2
## set of 12 rules
Ten of the 12 rules have a lift value higher than 1. Nine of them have Coffee as the consequent, with Toast, Medialuna, Sandwich, Pastry, Alfajores, Cake, Juice, Cookies, and Hot chocolate as antecedents; the tenth is Pastry => Bread (1.043). The highest lift values are seen for Toast => Coffee (1.505), Medialuna => Coffee (1.186), and Sandwich => Coffee (1.172).
The highest confidence values are seen for Toast => Coffee (0.730), Medialuna => Coffee (0.575), Sandwich => Coffee (0.568), Pastry => Coffee (0.559), and Alfajores => Coffee (0.552). Two rules have a lift below 1: Brownie => Coffee (0.982) and Tea => Coffee (0.750, with a confidence of 0.363), so customers who buy tea are less likely than average to buy coffee as well. Almost every rule points to Coffee, which is expected: Coffee appears in 48.5% of all transactions, so any rule with Coffee as the consequent starts from a high baseline confidence. Because most lift values are only slightly above 1, the associations are modest rather than strong, with Toast => Coffee the clearest exception. That only one rule has Bread rather than Coffee as its consequent is consistent with Coffee being the most frequently purchased item in this dataset.
plot(rules_eclat_algorithm_2, method="graph", shading="lift")
The plots show the rules as a graph, with the shading indicating the lift of each rule.
plot(rules_eclat_algorithm_2, method="graph", measure="support", shading="lift", engine="ggplot2")
## Apriori Algorithm
Apriori is a classic algorithm for association rule mining in market basket analysis. It is used to identify frequent item sets in a large dataset, which can be used to generate association rules between items.
The algorithm operates on a database of transactions and uses a “bottom-up” approach: it starts with single items and combines frequent itemsets into progressively larger candidate itemsets. At each level it prunes the candidates whose support falls below the threshold. Because every subset of a frequent itemset must itself be frequent, a candidate with an infrequent subset can be discarded without counting it. Once all frequent itemsets have been found, the algorithm selects the rules that meet a minimum confidence threshold.
In the worst case, Apriori has a time complexity of O(2^n) in the number of distinct items n, because every subset of the items may have to be considered as a candidate. This makes it computationally expensive for large datasets, especially with a low support threshold. However, it is widely used due to its simplicity.
The algorithm starts by generating itemsets of size 1, called candidates. Then, it checks each candidate against the data to determine its support. If a candidate itemset has sufficient support, it is considered a frequent itemset and added to a set of frequent itemsets. Next, the algorithm generates new candidates by combining the frequent itemsets, and the process is repeated. The algorithm continues to generate new candidates and check their support until no more frequent itemsets can be found.
Finally, association rules are generated by splitting each frequent itemset into an antecedent and a consequent and computing the confidence of the resulting rule. Confidence is the proportion of transactions containing the antecedent (left-hand side) that also contain the consequent (right-hand side).
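The level-wise search for frequent itemsets can be sketched in base R as follows. This is a simplified illustration on five made-up baskets, not the implementation behind arules::apriori(); the helper `support_count` and the threshold `min_count` are hypothetical names, and rule generation is left out.

```r
# Toy transactions (hypothetical, not the bakery data)
transactions <- list(
  c("Coffee", "Bread"),
  c("Coffee", "Cake"),
  c("Coffee", "Bread", "Cake"),
  c("Tea", "Cake"),
  c("Coffee")
)
min_count <- 2  # absolute minimum support, assumed for this sketch

# Number of transactions that contain every item of the itemset
support_count <- function(itemset) {
  sum(vapply(transactions, function(tr) all(itemset %in% tr), logical(1)))
}

# Level 1: every distinct item is a candidate itemset of size 1
candidates <- as.list(sort(unique(unlist(transactions))))
frequent <- list()

while (length(candidates) > 0) {
  # Keep only the candidates that meet the support threshold
  level <- Filter(function(s) support_count(s) >= min_count, candidates)
  frequent <- c(frequent, level)
  if (length(level) < 2) break

  # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets
  k <- length(level[[1]])
  merged <- list()
  for (i in seq_len(length(level) - 1)) {
    for (j in (i + 1):length(level)) {
      u <- sort(union(level[[i]], level[[j]]))
      if (length(u) == k + 1) merged[[paste(u, collapse = ", ")]] <- u
    }
  }

  # Prune step: every k-subset of a candidate must itself be frequent
  level_keys <- vapply(level, paste, character(1), collapse = ", ")
  candidates <- Filter(function(u) {
    all(vapply(seq_along(u), function(d) {
      paste(u[-d], collapse = ", ") %in% level_keys
    }, logical(1)))
  }, unname(merged))
}

for (s in frequent) {
  cat("{", paste(s, collapse = ", "), "} count =", support_count(s), "\n")
}
```

In this toy example the candidate {Bread, Cake, Coffee} is discarded in the prune step without being counted, because its subset {Bread, Cake} is not frequent.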
rules_apriori_algorithm<-apriori(data, parameter=list(supp=0.02, conf=0.3))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.02 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 131
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[102 item(s), 6576 transaction(s)] done [0.00s].
## sorting and recoding items ... [21 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [14 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules_apriori_algorithm_i <- inspect(head(sort(rules_apriori_algorithm, by = "lift", decreasing = TRUE),15))
## lhs rhs support confidence coverage lift
## [1] {Toast} => {Coffee} 0.02585158 0.7296137 0.03543187 1.5050000
## [2] {Medialuna} => {Coffee} 0.03315085 0.5751979 0.05763382 1.1864810
## [3] {Sandwich} => {Coffee} 0.04257908 0.5679513 0.07496959 1.1715332
## [4] {Pastry} => {Coffee} 0.04896594 0.5590278 0.08759124 1.1531263
## [5] {Alfajores} => {Coffee} 0.02250608 0.5522388 0.04075426 1.1391225
## [6] {Cake} => {Coffee} 0.05687348 0.5389049 0.10553528 1.1116181
## [7] {Juice} => {Coffee} 0.02144161 0.5300752 0.04045012 1.0934048
## [8] {Cookies} => {Coffee} 0.02995742 0.5267380 0.05687348 1.0865210
## [9] {Hot chocolate} => {Coffee} 0.02737226 0.5263158 0.05200730 1.0856501
## [10] {Pastry} => {Bread} 0.02980535 0.3402778 0.08759124 1.0427151
## [11] {} => {Bread} 0.32633820 0.3263382 1.00000000 1.0000000
## [12] {} => {Coffee} 0.48479319 0.4847932 1.00000000 1.0000000
## [13] {Brownie} => {Coffee} 0.02098540 0.4758621 0.04409976 0.9815775
## [14] {Tea} => {Coffee} 0.05200730 0.3634431 0.14309611 0.7496870
## count
## [1] 170
## [2] 218
## [3] 280
## [4] 322
## [5] 148
## [6] 374
## [7] 141
## [8] 197
## [9] 180
## [10] 196
## [11] 2146
## [12] 3188
## [13] 138
## [14] 342
kable(rules_apriori_algorithm_i ) %>% kable_styling("striped")
| | lhs | | rhs | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|---|---|
| [1] | {Toast} | => | {Coffee} | 0.0258516 | 0.7296137 | 0.0354319 | 1.5050000 | 170 |
| [2] | {Medialuna} | => | {Coffee} | 0.0331509 | 0.5751979 | 0.0576338 | 1.1864810 | 218 |
| [3] | {Sandwich} | => | {Coffee} | 0.0425791 | 0.5679513 | 0.0749696 | 1.1715332 | 280 |
| [4] | {Pastry} | => | {Coffee} | 0.0489659 | 0.5590278 | 0.0875912 | 1.1531263 | 322 |
| [5] | {Alfajores} | => | {Coffee} | 0.0225061 | 0.5522388 | 0.0407543 | 1.1391225 | 148 |
| [6] | {Cake} | => | {Coffee} | 0.0568735 | 0.5389049 | 0.1055353 | 1.1116181 | 374 |
| [7] | {Juice} | => | {Coffee} | 0.0214416 | 0.5300752 | 0.0404501 | 1.0934048 | 141 |
| [8] | {Cookies} | => | {Coffee} | 0.0299574 | 0.5267380 | 0.0568735 | 1.0865210 | 197 |
| [9] | {Hot chocolate} | => | {Coffee} | 0.0273723 | 0.5263158 | 0.0520073 | 1.0856501 | 180 |
| [10] | {Pastry} | => | {Bread} | 0.0298054 | 0.3402778 | 0.0875912 | 1.0427151 | 196 |
| [11] | {} | => | {Bread} | 0.3263382 | 0.3263382 | 1.0000000 | 1.0000000 | 2146 |
| [12] | {} | => | {Coffee} | 0.4847932 | 0.4847932 | 1.0000000 | 1.0000000 | 3188 |
| [13] | {Brownie} | => | {Coffee} | 0.0209854 | 0.4758621 | 0.0440998 | 0.9815775 | 138 |
| [14] | {Tea} | => | {Coffee} | 0.0520073 | 0.3634431 | 0.1430961 | 0.7496870 | 342 |
rules_apriori_algorithm
## set of 14 rules
The results of the Apriori algorithm are very similar to those obtained with the Eclat algorithm; the only difference is the number of rules. Apriori generated 14 rules, while rule induction on the Eclat itemsets generated 12. The two additional Apriori rules have an empty left-hand side (lhs equal to “{}”): their confidence simply equals the support of Bread and Coffee and their lift is 1, so they carry no information about associations. The other 12 rules are identical, with the same support, confidence, and lift, so the conclusions drawn from both algorithms are the same. The choice between the two algorithms depends on the dataset being analyzed. Eclat is often the more efficient option for large datasets, while on a dataset of this size both run almost instantly.
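Rules with an empty left-hand side can be avoided by requiring at least two items per rule. The sketch below shows this with the `minlen` parameter of apriori() on five made-up baskets rather than the bakery data; it assumes the arules package is installed, and the object names and thresholds are only illustrative.

```r
library(arules)

# Toy baskets (hypothetical, not the bakery data)
baskets <- list(
  c("Coffee", "Bread"),
  c("Coffee", "Cake"),
  c("Coffee", "Bread", "Cake"),
  c("Tea", "Cake"),
  c("Coffee")
)
toy <- as(baskets, "transactions")

# minlen = 1 (the value shown in the parameter output above) allows
# rules with an empty left-hand side, such as {} => {Coffee}
rules_default <- apriori(toy, parameter = list(supp = 0.3, conf = 0.3),
                         control = list(verbose = FALSE))

# minlen = 2 requires at least one item on each side of the rule
rules_minlen2 <- apriori(toy,
                         parameter = list(supp = 0.3, conf = 0.3, minlen = 2),
                         control = list(verbose = FALSE))

cat("Rules with minlen = 1:", length(rules_default), "\n")
cat("Rules with minlen = 2:", length(rules_minlen2), "\n")
inspect(rules_minlen2)
```

With the bakery data, adding minlen = 2 to the apriori() call should likewise leave only the rules with a non-empty left-hand side.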
plot(rules_apriori_algorithm, method="graph", measure="support", shading="lift", engine="ggplot2")
In conclusion, both the Apriori and Eclat algorithms were used to analyze customer purchase data and derive association rules between products. The results were the same apart from two Apriori rules with an empty left-hand side, which are not interpretable (14 rules for Apriori versus 12 for Eclat).
Overall, the analysis shows a lack of strong associations between products: most rules have Coffee as the consequent and a lift only slightly above 1, largely reflecting the fact that Coffee appears in almost half of all transactions. The clearest associations are Toast => Coffee (lift 1.505) and, at low confidence, Tea and Cake (lift 1.742). The bakery can use this information when deciding on product offerings and promotions, but further analysis, for example with transactions split by time of day, is recommended before acting on these results.