Over the past year, retail optimization has gained a lot of attention in the data science community. The COVID-19 pandemic brought restrictions that changed customer behavior, and the retail industry was forced to close physical stores. “The Next Normal” requires adaptation to new requirements, and the winners are the retailers that prioritize online retail. For instance, in the United States, e-commerce availability and hygiene caused 17 percent of consumers to leave their primary store. One American retailer, Instacart, has already addressed the issue of optimizing e-commerce and released a prediction competition on Kaggle with several data sets (“Instacart Market Basket Analysis” 2021). Instacart is an American online retailer that uses advanced analytics algorithms to provide the best experience for its customers.
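First, all the data are loaded into the R environment.
# Packages used below (assumed to be loaded in a setup chunk of the original report)
library(dplyr)     # %>%, joins, sample_frac, tibble
library(knitr)     # kable
library(arules)    # transactions, apriori
library(arulesViz) # plot methods for rules
aisles <- read.csv("data/aisles.csv", sep=",")
departments <- read.csv("data/departments.csv", sep=",")
order_products_train <- read.csv("data/order_products__train.csv", sep=",")
orders <- read.csv("data/orders.csv", sep=",")
products <- read.csv("data/products.csv", sep=",")
random_names <- read.csv("data/random_names.csv", sep=",")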
Then, to reduce the number of rows, a random sample of the order products table is taken. We keep 50% of the rows, which still leaves more than half a million transactions.
set.seed(0402)
trans <-
order_products_train %>%
sample_frac(size = 0.5) %>%
inner_join(products, by = "product_id", keep = FALSE) %>%
inner_join(aisles, by = "aisle_id", keep = FALSE) %>%
inner_join(orders, by = "order_id", keep = FALSE) %>%
inner_join(departments, by ="department_id", keep = FALSE)
The data set is not easy to handle in its current format: it contains many ids that are hard to interpret. Several transformations are therefore performed to turn it into interpretable transactional data.
kable_head <- function(df){
df %>% kable() %>% head()
}
day_of_week <-
tibble(day_id = 0:6,
dow_name = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
)
day_of_week %>% kable_head
| day_id | dow_name |
|---|---|
| 0 | Monday |
| 1 | Tuesday |
| 2 | Wednesday |
| 3 | Thursday |
| 4 | Friday |
| 5 | Saturday |
| 6 | Sunday |
reordered_df <-
tibble(reordered_id = 0:1,
reorder_name = c("Not reordered", "Reordered")
)
reordered_df %>% kable_head
| reordered_id | reorder_name |
|---|---|
| 0 | Not reordered |
| 1 | Reordered |
order_hour_of_day <-
tibble(
hour_id = 0:23,
time_of_day = c(rep("Night",6), rep("Morning", 4), rep("Senior hours", 2),
rep("Noon", 2), rep("Afternoon", 4), rep("Early night",4), rep("Night", 2))
)
order_hour_of_day %>% kable_head
| hour_id | time_of_day |
|---|---|
| 0 | Night |
| 1 | Night |
| 2 | Night |
| 3 | Night |
| 4 | Night |
| 5 | Night |
| 6 | Morning |
| 7 | Morning |
| 8 | Morning |
| 9 | Morning |
| 10 | Senior hours |
| 11 | Senior hours |
| 12 | Noon |
| 13 | Noon |
| 14 | Afternoon |
| 15 | Afternoon |
| 16 | Afternoon |
| 17 | Afternoon |
| 18 | Early night |
| 19 | Early night |
| 20 | Early night |
| 21 | Early night |
| 22 | Night |
| 23 | Night |
Three mapping tables have been created above. The first two are self-explanatory; in the third, each hour of the day is assigned to one of six time periods for further analysis.
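As a quick sanity check on the hour mapping, the sketch below rebuilds the label vector in base R and confirms that all 24 hours are covered by six distinct labels. The vector name `time_of_day` mirrors the column above; the lookup by hour at the end is only an illustration, not part of the original pipeline.

```r
# Rebuild the hour-to-period labels exactly as in the mapping table.
time_of_day <- c(rep("Night", 6), rep("Morning", 4), rep("Senior hours", 2),
                 rep("Noon", 2), rep("Afternoon", 4), rep("Early night", 4),
                 rep("Night", 2))
names(time_of_day) <- 0:23      # index the labels by hour id

n_hours   <- length(time_of_day)           # one label per hour of the day
n_periods <- length(unique(time_of_day))   # "Night" occurs in two blocks
per_period <- table(time_of_day)           # hours assigned to each period

time_of_day[["10"]]             # label for orders placed at 10:00
```

Because "Night" covers both the late evening and the early morning, it ends up with eight hours while the other periods have two or four.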
trans_for_ar <-
trans %>%
inner_join(day_of_week, by = c("order_dow" = "day_id")) %>%
inner_join(reordered_df, by = c("reordered" = "reordered_id")) %>%
inner_join(order_hour_of_day, by = c("order_hour_of_day" = "hour_id")) %>%
inner_join(random_names, by = c("user_id"= "user_id")) %>%
select(-aisle_id, -department_id, -eval_set,
-product_id, -order_dow, - reordered,
-order_hour_of_day, -days_since_prior_order,
-order_number, -add_to_cart_order, -user_id) %>%
arrange(order_id, user_name)
The transactions are combined into one big data frame, and the id columns other than order_id are removed. The eval_set column is also redundant for the analysis. Because the chosen data set is very large and only a sample of it has been taken, not every user still has order number 1 in the data, so the column days_since_prior_order no longer makes much sense. It is therefore omitted, as is order_number.
trans <- as(trans_for_ar, "transactions")
trans_for_viz <- as(trans_for_ar %>% select(product_name, aisle, department), "transactions")
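The final transactional data frame is made of the attributes left after the select() call above, including order_id, product_name, aisle, department, dow_name, reorder_name, time_of_day, and user_name.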
Association rules are often used to summarize large numbers of transactions in databases where each row is coded by binary attributes. The method needs neither training nor labels prepared beforehand. The most common applications are market basket analysis, discovering interesting patterns in DNA and protein sequences, and finding common patterns of behavior that precede customers dropping their cell phone operator.
In transactional data analysis, association rules and frequent itemset mining play an important role in retail basket analysis. The technique is especially useful for mining patterns in large databases. The most frequently used concepts are frequent itemsets, maximal frequent itemsets, closed frequent itemsets, and association rules.
Definition:
Let \(I = \{i_1, i_2, \dots, i_n\}\) be a set of \(n\) binary attributes called items. Let \(D = \{t_1, t_2, \dots, t_m\}\) be a set of transactions called the database. Each transaction in \(D\) has a unique transaction ID and contains a subset of the items in \(I\). A rule is an implication of the form \(X \Rightarrow Y\), where \(X, Y \subseteq I\) and \(X \cap Y = \emptyset\). The sets of items \(X\) and \(Y\) are called the antecedent and the consequent of the rule.
The support \(supp(X)\) of an itemset \(X\) is defined as the ratio of the number of transactions that contain \(X\) to the number of transactions in the whole data set.
The confidence of a rule is defined as \(conf(X \Rightarrow Y) = \frac{supp(X \cup Y)}{supp(X)}\).
Another practical measure, used to address the issue of finding too many rules that satisfy the support and confidence constraints, is lift, which is defined as \(lift(X \Rightarrow Y)=\frac{supp(X \cup Y)}{supp(X)\,supp(Y)}\). It can be interpreted as the deviation of the support of the whole rule from the support expected under independence given the supports of the LHS and the RHS (Hahsler, Grün, and Hornik 2005).
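To make the three measures concrete, here is a minimal sketch in base R on a hypothetical five-basket database. The item names and the helper functions `supp`, `conf`, and `lift` are assumptions made for illustration; they are not part of arules.

```r
# Toy database: each list element is one transaction (hypothetical items).
baskets <- list(
  c("milk", "bread"),
  c("milk", "eggs"),
  c("milk", "bread", "eggs"),
  c("bread"),
  c("eggs")
)

# supp(X): share of transactions that contain every item of X.
supp <- function(items, db) {
  mean(vapply(db, function(tr) all(items %in% tr), logical(1)))
}

# conf(X => Y) = supp(X u Y) / supp(X)
conf <- function(x, y, db) supp(c(x, y), db) / supp(x, db)

# lift(X => Y) = supp(X u Y) / (supp(X) * supp(Y))
lift <- function(x, y, db) supp(c(x, y), db) / (supp(x, db) * supp(y, db))

supp_milk <- supp("milk", baskets)           # 3 of the 5 baskets contain milk
conf_rule <- conf("milk", "bread", baskets)  # 2 of the 3 milk baskets contain bread
lift_rule <- lift("milk", "bread", baskets)  # 0.4 / (0.6 * 0.6)
```

In this toy example the support of milk is 0.6, the confidence of the rule from milk to bread is 2/3, and the lift is about 1.11, so bread appears with milk slightly more often than expected under independence.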
To investigate the nature of the data set, the read.transactions function has been used. The transformations lead to a transactional matrix with 565097 rows and 132889 columns. Not surprisingly, the most frequent items are the binary feature and the transaction identifier, order_id. There are no missing values in the database.
kableRules <- function(rules, sortable){
rules.sorted<-sort(rules, by=sortable, decreasing=TRUE)
rules.sorted.df <- as(head(rules.sorted), "data.frame")
rownames(rules.sorted.df) <- NULL
rules.sorted.df %>%
as_tibble() %>%
mutate(across(.cols = c(support, confidence, coverage, lift, count),
~round(., 4))) %>%
kable()
}
rules.trans<-apriori(trans, parameter=list(supp=0.1, conf=0.5),
control=list(verbose=FALSE))
The rules are mined with two thresholds: a minimum support of 0.1 and a minimum confidence of 0.5.
kableRules(rules.trans, "confidence")
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {aisle=fresh vegetables} => {department=produce} | 0.1084 | 1.0000 | 0.1084 | 3.3928 | 61246 |
| {aisle=fresh fruits} => {department=produce} | 0.1085 | 1.0000 | 0.1085 | 3.3928 | 61332 |
| {department=dairy eggs} => {reorder_name=Reordered} | 0.1061 | 0.6752 | 0.1572 | 1.1267 | 59976 |
| {department=produce} => {reorder_name=Reordered} | 0.1961 | 0.6654 | 0.2947 | 1.1104 | 110824 |
| {dow_name=Monday} => {reorder_name=Reordered} | 0.1429 | 0.6093 | 0.2345 | 1.0168 | 80742 |
| {order_id=[2.28e+06,3.42e+06]} => {reorder_name=Reordered} | 0.2008 | 0.6025 | 0.3333 | 1.0054 | 113483 |
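All of the associations found are fairly obvious; for example, every item from the fresh vegetables aisle belongs to the produce department, hence the confidence of 1.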
kableRules(rules.trans, "lift")
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {aisle=fresh vegetables} => {department=produce} | 0.1084 | 1.0000 | 0.1084 | 3.3928 | 61246 |
| {aisle=fresh fruits} => {department=produce} | 0.1085 | 1.0000 | 0.1085 | 3.3928 | 61332 |
| {department=dairy eggs} => {reorder_name=Reordered} | 0.1061 | 0.6752 | 0.1572 | 1.1267 | 59976 |
| {department=produce} => {reorder_name=Reordered} | 0.1961 | 0.6654 | 0.2947 | 1.1104 | 110824 |
| {dow_name=Monday} => {reorder_name=Reordered} | 0.1429 | 0.6093 | 0.2345 | 1.0168 | 80742 |
| {order_id=[2.28e+06,3.42e+06]} => {reorder_name=Reordered} | 0.2008 | 0.6025 | 0.3333 | 1.0054 | 113483 |
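What is more interesting is the dependency between reordering and food shopping: items from the dairy eggs and produce departments are reordered slightly more often than expected under independence (lift of about 1.13 and 1.11).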
kableRules(rules.trans, "count")
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {} => {reorder_name=Reordered} | 0.5992 | 0.5992 | 1.0000 | 1.0000 | 338631 |
| {order_id=[2.28e+06,3.42e+06]} => {reorder_name=Reordered} | 0.2008 | 0.6025 | 0.3333 | 1.0054 | 113483 |
| {order_id=[1,1.13e+06)} => {reorder_name=Reordered} | 0.1993 | 0.5978 | 0.3333 | 0.9976 | 112602 |
| {order_id=[1.13e+06,2.28e+06)} => {reorder_name=Reordered} | 0.1992 | 0.5975 | 0.3333 | 0.9971 | 112546 |
| {department=produce} => {reorder_name=Reordered} | 0.1961 | 0.6654 | 0.2947 | 1.1104 | 110824 |
| {time_of_day=Afternoon} => {reorder_name=Reordered} | 0.1889 | 0.5882 | 0.3211 | 0.9816 | 106731 |
Not surprisingly, order_id and reorder_name are placed high. The counts suggest that the afternoon is the most popular time of day for shopping.
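Since all of these rules share the consequent reorder_name=Reordered, their lift can be recomputed from the tables as confidence divided by the support of the consequent. The sketch below does this with the rounded values printed above; the variable names are chosen for illustration, and small deviations are expected because the inputs are rounded to four decimals.

```r
# Values copied from the tables above (rounded to four decimals).
baseline <- 0.5992   # supp(reorder_name=Reordered), from the rule {} => {Reordered}
conf_reordered <- c(dairy_eggs = 0.6752, produce = 0.6654, monday = 0.6093)
reported_lift  <- c(dairy_eggs = 1.1267, produce = 1.1104, monday = 1.0168)

# With a single consequent Y: lift(X => Y) = conf(X => Y) / supp(Y).
recomputed_lift <- conf_reordered / baseline

# Largest gap between recomputed and reported lift (rounding error only).
max_gap <- max(abs(recomputed_lift - reported_lift))
round(recomputed_lift, 3)
```

The recomputed values agree with the reported lift to about three decimals, which confirms that a lift just above 1 here means only a small departure from the overall reorder rate of roughly 60%.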
What is the profile of transactions that are being made during Senior hours?
rules.Senior.hours<-apriori(data=trans,
parameter=list(supp=0.001, conf=0.08),
appearance=list(default="lhs", rhs= "time_of_day=Senior hours"),
control=list(verbose=F))
kableRules(rules.Senior.hours, "confidence")
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {department=beverages,dow_name=Tuesday,reorder_name=Reordered} => {time_of_day=Senior hours} | 0.0017 | 0.2023 | 0.0084 | 1.2501 | 965 |
| {department=snacks,dow_name=Tuesday,reorder_name=Reordered} => {time_of_day=Senior hours} | 0.0016 | 0.2012 | 0.0078 | 1.2431 | 889 |
| {department=beverages,dow_name=Tuesday} => {time_of_day=Senior hours} | 0.0024 | 0.1921 | 0.0124 | 1.1873 | 1351 |
| {department=household,dow_name=Monday} => {time_of_day=Senior hours} | 0.0010 | 0.1902 | 0.0054 | 1.1750 | 583 |
| {aisle=coffee} => {time_of_day=Senior hours} | 0.0012 | 0.1897 | 0.0061 | 1.1721 | 651 |
| {aisle=coffee,department=beverages} => {time_of_day=Senior hours} | 0.0012 | 0.1897 | 0.0061 | 1.1721 | 651 |
According to the documentation, a rule with an empty left-hand side, {}, means that no matter what other items have been chosen, the item on the right-hand side will be chosen with a confidence equal to its support.
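This can be checked directly from the definitions of confidence and lift, assuming that \(supp(\emptyset) = 1\) because every transaction contains the empty itemset:

\[
conf(\emptyset \Rightarrow Y) = \frac{supp(\emptyset \cup Y)}{supp(\emptyset)} = \frac{supp(Y)}{1} = supp(Y),
\qquad
lift(\emptyset \Rightarrow Y) = \frac{supp(\emptyset \cup Y)}{supp(\emptyset)\,supp(Y)} = \frac{supp(Y)}{supp(Y)} = 1.
\]

This matches the row {} => {reorder_name=Reordered} reported earlier, where support and confidence are both 0.5992, coverage is 1.0000, and lift is 1.0000.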
What is the profile of transactions that are being made during Senior hours?
kableRules(rules.Senior.hours, "lift")
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {department=beverages,dow_name=Tuesday,reorder_name=Reordered} => {time_of_day=Senior hours} | 0.0017 | 0.2023 | 0.0084 | 1.2501 | 965 |
| {department=snacks,dow_name=Tuesday,reorder_name=Reordered} => {time_of_day=Senior hours} | 0.0016 | 0.2012 | 0.0078 | 1.2431 | 889 |
| {department=beverages,dow_name=Tuesday} => {time_of_day=Senior hours} | 0.0024 | 0.1921 | 0.0124 | 1.1873 | 1351 |
| {department=household,dow_name=Monday} => {time_of_day=Senior hours} | 0.0010 | 0.1902 | 0.0054 | 1.1750 | 583 |
| {aisle=coffee} => {time_of_day=Senior hours} | 0.0012 | 0.1897 | 0.0061 | 1.1721 | 651 |
| {aisle=coffee,department=beverages} => {time_of_day=Senior hours} | 0.0012 | 0.1897 | 0.0061 | 1.1721 | 651 |
We can clearly see that for Senior hours there is no difference between ordering by confidence and ordering by lift. This is expected: every rule has the same consequent, so lift is simply confidence divided by the constant support of that consequent. Given these metrics, beverages appear most often, and people tend to use Senior hours on Tuesdays. Others need additional caffeine. No aisles such as “eye ear care” or “digestion” appear among the top rules, although their presence might have suggested that these hours are often used by people with serious health problems. Let us now check what the profile of such categories looks like with respect to hours.
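A minimal sketch of why the two orderings coincide, using the confidence values from the table above. The support of the consequent, 0.1618, is taken from the coverage of the rule {time_of_day=Senior hours} => {dow_name=Monday} reported later in this document; treat it as an illustrative input rather than an exact figure.

```r
# Confidence values copied from the Senior hours table above.
conf_senior <- c(0.2023, 0.2012, 0.1921, 0.1902, 0.1897, 0.1897)

# supp(time_of_day=Senior hours); illustrative value read from a later table.
supp_rhs <- 0.1618

# With one fixed consequent Y, lift = conf / supp(Y): a positive rescaling.
lift_senior <- conf_senior / supp_rhs

# Dividing by a positive constant cannot change the ranking of the rules.
same_order <- identical(order(conf_senior, decreasing = TRUE),
                        order(lift_senior, decreasing = TRUE))
round(lift_senior, 2)
```

The recomputed lift values are close to the reported ones (for example about 1.25 for the first rule), and `same_order` is `TRUE`. Sorting by lift becomes informative only when rules with different consequents are compared.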
rules.muscles.joints.pain.relief<-apriori(data=trans,
parameter=list(supp=0.0001, conf=0.000001),
appearance=list(default="lhs",
rhs= "aisle=muscles joints pain relief"),
control=list(verbose=F))
kableRules(rules.muscles.joints.pain.relief, "coverage")
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {} => {aisle=muscles joints pain relief} | 7e-04 | 0.0007 | 1.0000 | 1.0000 | 378 |
| {reorder_name=Reordered} => {aisle=muscles joints pain relief} | 2e-04 | 0.0004 | 0.5992 | 0.5739 | 130 |
| {reorder_name=Not reordered} => {aisle=muscles joints pain relief} | 4e-04 | 0.0011 | 0.4008 | 1.6371 | 248 |
| {order_id=[2.28e+06,3.42e+06]} => {aisle=muscles joints pain relief} | 2e-04 | 0.0007 | 0.3333 | 0.9841 | 124 |
| {order_id=[1.13e+06,2.28e+06)} => {aisle=muscles joints pain relief} | 2e-04 | 0.0007 | 0.3333 | 1.0794 | 136 |
| {order_id=[1,1.13e+06)} => {aisle=muscles joints pain relief} | 2e-04 | 0.0006 | 0.3333 | 0.9365 | 118 |
As the table shows, no rule containing Senior hours appears among the six rules with the highest coverage, and the support of every rule for this aisle is very low. Sorting by the other metrics does not bring the label for these particular hours to the top of the resulting table either.
How does the Friday mood affect shopping?
rules.friday<-apriori(data=trans,
parameter=list(supp=0.0001, conf=0.000001),
appearance=list(default="lhs", rhs= "dow_name=Friday"),
control=list(verbose=F))
kableRules(rules.friday, "confidence")
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {department=alcohol,time_of_day=Morning} => {dow_name=Friday} | 1e-04 | 0.2336 | 5e-04 | 2.0969 | 64 |
| {order_id=[1.13e+06,2.28e+06),aisle=beers coolers} => {dow_name=Friday} | 1e-04 | 0.2156 | 5e-04 | 1.9357 | 58 |
| {order_id=[1.13e+06,2.28e+06),aisle=beers coolers,department=alcohol} => {dow_name=Friday} | 1e-04 | 0.2156 | 5e-04 | 1.9357 | 58 |
| {order_id=[2.28e+06,3.42e+06],department=alcohol,reorder_name=Not reordered} => {dow_name=Friday} | 1e-04 | 0.2111 | 5e-04 | 1.8949 | 61 |
| {aisle=beers coolers,reorder_name=Not reordered} => {dow_name=Friday} | 1e-04 | 0.2098 | 5e-04 | 1.8838 | 64 |
| {aisle=beers coolers,department=alcohol,reorder_name=Not reordered} => {dow_name=Friday} | 1e-04 | 0.2098 | 5e-04 | 1.8838 | 64 |
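Alcohol is definitely placed very high: some form of it occurs in all of the top results. Several of these rules also contain reorder_name=Not reordered, which suggests that alcohol added to the cart on Friday is often a first-time purchase rather than a reorder. What might be interesting is that the top rule combines alcohol with the morning. It might be simply because most customers do their Friday shopping in the morning in order to have more free time during the weekend proper. Note that each of these rules rests on only about 60 transactions, so they should be read with caution.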
trans_for_ar %>%
group_by(dow_name, time_of_day) %>%
count() %>%
ungroup() %>%
group_by(dow_name) %>%
mutate(perc = n/sum(n))
The results show that on Friday most purchases are made in the afternoon, which makes it interesting that alcohol is bought in the morning rather than in the afternoon. It is also notable that people do not choose any salty food in addition to alcoholic drinks.
rules.lunch.meat<-apriori(data=trans,
parameter=list(supp=0.0001, conf=0.000001),
appearance=list(default="lhs", rhs= "aisle=lunch meat"),
control=list(verbose=F))
kableRules(rules.lunch.meat, "confidence")
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {product_name=Sliced Soppressata Salame} => {aisle=lunch meat} | 1e-04 | 1 | 1e-04 | 82.017 | 57 |
| {product_name=Rosemary Ham} => {aisle=lunch meat} | 1e-04 | 1 | 1e-04 | 82.017 | 60 |
| {product_name=Hard Salami} => {aisle=lunch meat} | 1e-04 | 1 | 1e-04 | 82.017 | 60 |
| {product_name=Deli Fresh Honey Ham, 97% Fat Free, Gluten Free} => {aisle=lunch meat} | 1e-04 | 1 | 1e-04 | 82.017 | 65 |
| {product_name=Organic Roast Beef} => {aisle=lunch meat} | 1e-04 | 1 | 1e-04 | 82.017 | 66 |
| {product_name=Uncured Diced Pancetta} => {aisle=lunch meat} | 1e-04 | 1 | 1e-04 | 82.017 | 68 |
Investigate weekdays in terms of some association metrics.
rules <- apriori(trans, parameter=list(supp=0.001, conf=0.001))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen maxlen target ext
0.001 0.1 1 none FALSE TRUE 5 0.001 1 10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 565
set item appearances …[0 item(s)] done [0.00s].
set transactions …[132889 item(s), 565097 transaction(s)] done [5.66s].
sorting and recoding items … [236 item(s)] done [0.07s].
creating transaction tree … done [0.97s].
checking subsets of size 1 2 3 4 5 6 done [0.05s].
writing … [33763 rule(s)] done [0.13s].
creating S4 object … done [0.17s].
rules.dow.tod <- subset(rules, subset = lhs %pin% "dow_name=" & rhs %pin% "time_of_day=")
kableRules(rules.dow.tod, "confidence")
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {aisle=ice cream ice,dow_name=Sunday} => {time_of_day=Afternoon} | 0.0011 | 0.3826 | 0.0029 | 1.1915 | 619 |
| {aisle=ice cream ice,department=frozen,dow_name=Sunday} => {time_of_day=Afternoon} | 0.0011 | 0.3826 | 0.0029 | 1.1915 | 619 |
| {order_id=[1.13e+06,2.28e+06),department=snacks,dow_name=Sunday} => {time_of_day=Afternoon} | 0.0015 | 0.3704 | 0.0041 | 1.1536 | 849 |
| {aisle=ice cream ice,dow_name=Monday} => {time_of_day=Afternoon} | 0.0013 | 0.3694 | 0.0035 | 1.1503 | 728 |
| {aisle=ice cream ice,department=frozen,dow_name=Monday} => {time_of_day=Afternoon} | 0.0013 | 0.3694 | 0.0035 | 1.1503 | 728 |
| {aisle=yogurt,dow_name=Monday,reorder_name=Not reordered} => {time_of_day=Afternoon} | 0.0011 | 0.3675 | 0.0030 | 1.1446 | 627 |
In the first row of the rules sorted by confidence, 38.3% of the Sunday transactions that contain an item from the ice cream ice aisle were made in the Afternoon. Its lift of 1.19 is also the highest among the rules shown, so this is the strongest association in the table.
rules.tod.dow <- subset(rules, subset = lhs %pin% "time_of_day=" & rhs %pin% "dow_name=")
kableRules(rules.tod.dow, "count")
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {time_of_day=Afternoon} => {dow_name=Monday} | 0.0780 | 0.2429 | 0.3211 | 1.0360 | 44082 |
| {time_of_day=Afternoon} => {dow_name=Sunday} | 0.0493 | 0.1537 | 0.3211 | 1.0227 | 27883 |
| {time_of_day=Afternoon} => {dow_name=Tuesday} | 0.0481 | 0.1497 | 0.3211 | 1.0062 | 27155 |
| {reorder_name=Reordered,time_of_day=Afternoon} => {dow_name=Monday} | 0.0467 | 0.2475 | 0.1889 | 1.0552 | 26411 |
| {time_of_day=Noon} => {dow_name=Monday} | 0.0407 | 0.2500 | 0.1630 | 1.0659 | 23026 |
| {time_of_day=Senior hours} => {dow_name=Monday} | 0.0396 | 0.2449 | 0.1618 | 1.0445 | 22401 |
The most frequent combination of time of day and weekday in the data set is Monday afternoon, with 44082 transactions (support 0.078).
trans.sel <-trans_for_viz[,itemFrequency(trans_for_viz)>0.05] # keep items with relative frequency above 0.05
d.jac.i<-dissimilarity(trans.sel, which="items") # Jaccard as default
plot(hclust(d.jac.i, method="ward.D2"), main="Dendrogram for items")
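To illustrate what the Jaccard dissimilarity between items measures, the sketch below applies base R's `dist(method = "binary")`, which is the Jaccard distance for 0/1 rows, to a small hypothetical item-by-transaction matrix. The item names and data are invented for illustration and are not taken from the Instacart sample.

```r
# Toy binary incidence matrix: rows are items, columns are transactions
# (hypothetical data, not the Instacart sample).
m <- rbind(
  produce    = c(1, 1, 1, 0, 0, 0),
  fresh_veg  = c(1, 1, 0, 0, 0, 0),
  dairy_eggs = c(0, 0, 0, 1, 1, 1),
  yogurt     = c(0, 0, 0, 1, 1, 0)
)

# Jaccard distance between two items A and B:
# 1 - (transactions with both) / (transactions with at least one).
d <- dist(m, method = "binary")

# Ward clustering on the Jaccard distances, cut into two groups.
hc <- hclust(d, method = "ward.D2")
groups <- cutree(hc, k = 2)

round(as.matrix(d), 2)
groups
```

Items that occur in the same transactions (here produce with fresh_veg, and dairy_eggs with yogurt) have a distance of 1/3 and are merged first, while items that never co-occur have a distance of 1. The dendrogram discussed next refers to the plot of the full data, not to this toy example.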
If we keep the three biggest clusters, the interpretation of the hclust dendrogram is as follows:
itemFrequencyPlot(trans_for_viz, topN=10, type="absolute", main="Item Frequency")
In terms of item frequency, the plot shows that most of the records have been added in the afternoon. Customers also focus on keeping in shape, since a lot of transactions are made in the produce department and others in dairy eggs.
rules_for_viz <- apriori(trans_for_viz, parameter = list(support = 0.001))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen maxlen target ext
0.8 0.1 1 none FALSE TRUE 5 0.001 1 10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 565
set item appearances …[0 item(s)] done [0.00s].
set transactions …[31726 item(s), 565097 transaction(s)] done [2.28s].
sorting and recoding items … [218 item(s)] done [0.02s].
creating transaction tree … done [0.54s].
checking subsets of size 1 2 3 done [0.12s].
writing … [489 rule(s)] done [0.00s].
creating S4 object … done [0.11s].
plot(head(rules_for_viz, 30, by = "lift"), method = "paracoord", reorder =TRUE)
The figure above shows the rules as arrows: the width of an arrow is linked to support, and the color intensity carries the confidence of the given rule. The values on consecutive dimensions are connected by a line. The y-axis holds the nominal values (the items) and the x-axis presents the position in the rule.
Association rules are a very useful technique for mining patterns in large data sets thanks to efficient algorithms. What is more, they can save a lot of time in discovering user preferences and what drives users' choices. Most of the useful functionality is available either as rule mining functions or as graphs. For the latter, arulesViz (Hahsler 2017) is widely used; it implements many new visualization techniques for exploring association rules. The main strength of arules (Hahsler, Grün, and Hornik 2005) for association rule mining is its efficient implementation based on sparse matrices.
Hahsler, Michael. 2017. “ArulesViz: Interactive Visualization of Association Rules with R.” R Journal 9 (December): 163–75. https://doi.org/10.32614/RJ-2017-047.
Hahsler, Michael, Bettina Grün, and Kurt Hornik. 2005. “Arules - a Computational Environment for Mining Association Rules and Frequent Item Sets.” Journal of Statistical Software, Articles 14 (15): 1–25. https://doi.org/10.18637/jss.v014.i15.
“Instacart Market Basket Analysis.” 2021. https://www.kaggle.com/c/instacart-market-basket-analysis. February 5, 2021.