Market Basket Analysis (Association Rules) has a huge impact on our daily lives. It is used primarily by retailers selling products through supermarkets. Analyzing customers’ shopping baskets shows which groups of products most often appear together in a basket and which specific products are related to one another. From this we can estimate how often such a grouping occurs and how likely it is that one product is bought given another.
This allows the most frequently co-purchased products to be placed rationally in specific sections of the supermarket. It also shortens the distance the customer has to walk, because related products are relatively close to each other.
An example is the dairy refrigerator, which holds many dairy products such as butter and cheese. Many analyses have shown that such products very often appear together in the basket. The photo below shows an example of such a dairy refrigerator.
Dairy Fridge
Source: https://www.justfridge.co.za/remote-cabinets/upright-dairy-m2/
The analysis of association rules begins by installing the appropriate packages and loading the libraries into the R environment.
# install.packages("arules")
# install.packages("arulesViz")
# install.packages("arulesCBA")
# install.packages("tidyverse")
# install.packages("kableExtra")
# install.packages("magrittr")
# install.packages("visNetwork")
library(arules)
library(arulesViz)
library(arulesCBA)
library(tidyverse)
library(knitr)
library(kableExtra)
library(magrittr)
library(visNetwork)
Then we load the data, which have been made available on Kaggle (data available at the following link: Kaggle Dataset Groceries). The dataset lists the items bought by customers who shopped at a grocery store. Each record represents one customer’s basket. The dataset contains 9835 transactions and 169 unique items.
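# Read the raw CSV; the "Item(s)" column holds the number of items per basket
Groceries_all <- read_csv('groceries.csv')
# Drop the item-count column so that only the item columns remain
Groceries_all %>%
  rename(noItems = "Item(s)") %>%
  select(-noItems) -> Groceries
# Save the cleaned table and re-read it in basket format (skip = 1 skips the header row)
Groceries %>%
  write.csv('groceries1.csv', row.names = FALSE)
transactions <- read.transactions('groceries1.csv', format = "basket", sep = ",", skip = 1, rm.duplicates = TRUE)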
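To make the basket format more concrete, here is a minimal base-R sketch of what a transactions object encodes conceptually: one row per basket and one logical column per item. It is an illustration only, not the arules implementation; the three baskets are the first three transactions of this dataset, and the variable names are illustrative.

```r
# Three example baskets (the first three transactions of the dataset)
baskets <- list(
  c("citrus fruit", "margarine", "ready soups", "semi-finished bread"),
  c("coffee", "tropical fruit", "yogurt"),
  c("whole milk")
)

# All distinct items, sorted alphabetically
items <- sort(unique(unlist(baskets)))

# Logical incidence matrix: rows = baskets, columns = items
incidence <- t(sapply(baskets, function(b) items %in% b))
colnames(incidence) <- items

# Basket sizes and relative item frequencies follow from row and column sums
basket_sizes <- rowSums(incidence)
item_freq <- colMeans(incidence)
```

In this picture, the basket-size barplot and the item-frequency tables used later are just row and column summaries of the same matrix.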
Let’s look at how many transactions we have for each number of groceries purchased together.
Groceries_all %>%
ggplot(mapping = aes(x = `Item(s)`)) +
geom_bar(fill = "coral2") +
xlab("Number of groceries in basket") +
ylab("Number of transactions") +
ggtitle("Barplot of number of transactions by number of food items")
We see that single-item transactions are the most common and that the number of transactions drops sharply as the number of items increases. We are probably dealing here with a small neighborhood store rather than a supermarket.
The search for association rules in the sales of RDA Hijab apparel products is addressed in the paper by Nabila et al. (2021). In today’s fast-developing technological world, the search for better algorithms is ongoing, and companies compete fiercely for market share. To maximize this share, data mining algorithms such as Apriori can be used to find dependencies and association rules in product purchases. This allows in-store displays to be arranged so that they catch customers’ attention and maximize sales. In their analysis, the authors assumed a minimum confidence level of 50% and a support value of 20. They drew many conclusions about sales and about the association between cuffs of different colors and white shirts.
Memory and time complexity play an important role in the choice of an association rule algorithm. Khedkar and Kumari (2021) showed that the Apriori algorithm has higher time complexity but can be used on larger databases, while the FP-growth algorithm has higher memory complexity. The appropriate algorithm should be chosen depending on the constraints of the environment. Their work also presents other association rule methods available in the literature, including a single-layer feedforward partially connected neural network, FP-Bonsai, the AR-Markov model, and CLoPAR.
A very different approach to association rules can be found in the article by Hruschka (2021). It analyzes households’ market baskets of products from the music and computer categories using the multivariate logit model (MVL), its finite mixture extension (FM-MVL), and the conditional restricted Boltzmann machine (CRBM). With these models, the differences between the average probabilities of buying and not buying specific products were determined, and relationships in latent variables for subcategories were uncovered. The essence of the relationship between two products is how they are connected, e.g., classical music with religious music, or a monitor with a keyboard.
Now let’s have a quick look at the 4 basic and most important measures used for association rules.
The Support measure tells us how large the overlap between sets A and B is. We count the transactions that contain both a product from set A and a product from set B, and then divide that by the total number of transactions. Support is given by the following formula:
\[Support = \frac{\text{Number of transactions with both A and B}}{\text{Total number of transactions}} = P(A \cap B) \]
The Confidence measure is simply a conditional probability: among the transactions that contain, for example, milk, what fraction also contain cottage cheese? In other words, what part of set A is shared between sets A and B? The equation for the Confidence measure can be written as follows:
\[Confidence = \frac{\text{Number of transactions with both A and B}}{\text{Total number of transactions with A}} = \frac{P(A \cap B)}{P(A)} \]
Additionally, we can calculate ExpectedConfidence, the probability that an item from set B occurs regardless of A. The measure of ExpectedConfidence is determined by the following formula:
\[ExpectedConfidence = \frac{\text{Number of transactions with B}}{\text{Total number of transactions}} = P(B) \]
The last measure we should consider is Lift. Lift compares how often milk and cottage cheese are actually bought together with how often they would be bought together if the two purchases were independent. A Lift of 3 means that milk and cottage cheese appear in the same basket 3 times as often as independence would predict. A Lift of 1 means no association, and a Lift below 1 means the products are bought together less often than expected. The formula for the Lift measure can be written as follows:
\[Lift = \frac{\text{Confidence}}{\text{ExpectedConfidence}} = \frac{P(A \cap B)}{P(A) \cdot P(B)} \]
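As a worked example of the four formulas, the sketch below computes them in base R for the rule {other vegetables} => {whole milk}, using counts that appear in this analysis (736 joint transactions, 1903 with other vegetables, 2513 with whole milk, 9835 in total). The variable names are illustrative.

```r
# Counts taken from the Groceries data used in this analysis
n_total <- 9835  # all transactions
n_A     <- 1903  # transactions with "other vegetables" (A)
n_B     <- 2513  # transactions with "whole milk" (B)
n_AB    <- 736   # transactions with both A and B

support             <- n_AB / n_total                   # P(A and B)
confidence          <- n_AB / n_A                       # P(A and B) / P(A)
expected_confidence <- n_B / n_total                    # P(B)
lift                <- confidence / expected_confidence # P(A and B) / (P(A) * P(B))

round(c(support = support, confidence = confidence,
        expected_confidence = expected_confidence, lift = lift), 4)
```

The results (support of about 0.0748, confidence of about 0.3868, lift of about 1.5136) agree with the values that Apriori reports for this rule in the rule tables.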
Before we proceed to the main analysis of association rules, we should take a bird’s-eye view of the data in order to choose suitable parameters for the Apriori algorithm.
transactions %>%
head() %>%
LIST()
## [[1]]
## [1] "citrus fruit" "margarine" "ready soups"
## [4] "semi-finished bread"
##
## [[2]]
## [1] "coffee" "tropical fruit" "yogurt"
##
## [[3]]
## [1] "whole milk"
##
## [[4]]
## [1] "cream cheese" "meat spreads" "pip fruit" "yogurt"
##
## [[5]]
## [1] "condensed milk" "long life bakery product"
## [3] "other vegetables" "whole milk"
##
## [[6]]
## [1] "abrasive cleaner" "butter" "rice" "whole milk"
## [5] "yogurt"
transactions %>%
itemFrequency() %>%
round(4) %>%
sort(decreasing = TRUE)
## whole milk other vegetables rolls/buns
## 0.2555 0.1935 0.1839
## soda yogurt bottled water
## 0.1744 0.1395 0.1104
## root vegetables tropical fruit shopping bags
## 0.1090 0.1049 0.0984
## sausage pastry citrus fruit
## 0.0940 0.0890 0.0828
## bottled beer newspapers canned beer
## 0.0803 0.0797 0.0777
## pip fruit fruit/vegetable juice whipped/sour cream
## 0.0756 0.0721 0.0717
## brown bread domestic eggs frankfurter
## 0.0649 0.0634 0.0590
## margarine coffee pork
## 0.0585 0.0580 0.0577
## butter curd beef
## 0.0554 0.0533 0.0525
## napkins chocolate frozen vegetables
## 0.0522 0.0491 0.0481
## chicken white bread cream cheese
## 0.0429 0.0421 0.0397
## waffles salty snack dessert
## 0.0381 0.0378 0.0371
## long life bakery product sugar UHT-milk
## 0.0371 0.0338 0.0335
## berries hamburger meat hygiene articles
## 0.0332 0.0332 0.0323
## onions specialty chocolate candy
## 0.0310 0.0304 0.0299
## frozen meals misc. beverages oil
## 0.0284 0.0283 0.0281
## butter milk specialty bar beverages
## 0.0280 0.0274 0.0260
## ham meat ice cream
## 0.0260 0.0258 0.0250
## hard cheese sliced cheese cat food
## 0.0245 0.0245 0.0233
## grapes chewing gum detergent
## 0.0224 0.0210 0.0192
## red/blush wine white wine pickled vegetables
## 0.0192 0.0189 0.0179
## baking powder semi-finished bread dishes
## 0.0177 0.0177 0.0176
## flour potted plants soft cheese
## 0.0173 0.0173 0.0171
## processed cheese herbs canned fish
## 0.0166 0.0163 0.0150
## pasta seasonal products cake bar
## 0.0149 0.0142 0.0131
## packaged fruit/vegetables mustard frozen fish
## 0.0130 0.0120 0.0117
## cling film/bags spread cheese liquor
## 0.0113 0.0112 0.0111
## canned vegetables frozen dessert salt
## 0.0108 0.0108 0.0108
## dish cleaner condensed milk flower (seeds)
## 0.0105 0.0103 0.0103
## roll products pet care photo/film
## 0.0102 0.0095 0.0093
## mayonnaise sweet spreads chocolate marshmallow
## 0.0092 0.0090 0.0089
## candles specialty cheese dog food
## 0.0088 0.0085 0.0084
## frozen potato products house keeping products turkey
## 0.0084 0.0081 0.0081
## Instant food products liquor (appetizer) rice
## 0.0080 0.0078 0.0076
## instant coffee popcorn zwieback
## 0.0073 0.0072 0.0069
## soups finished products vinegar
## 0.0067 0.0065 0.0065
## female sanitary products kitchen towels cereals
## 0.0060 0.0060 0.0057
## dental care sparkling wine sauces
## 0.0057 0.0056 0.0055
## softener jam spices
## 0.0055 0.0053 0.0052
## cleaner curd cheese liver loaf
## 0.0051 0.0051 0.0051
## male cosmetics rum ketchup
## 0.0046 0.0044 0.0043
## meat spreads brandy light bulbs
## 0.0043 0.0042 0.0042
## tea specialty fat abrasive cleaner
## 0.0039 0.0036 0.0035
## skin care nuts/prunes artif. sweetener
## 0.0035 0.0034 0.0033
## canned fruit syrup nut snack
## 0.0033 0.0033 0.0031
## snack products fish potato products
## 0.0031 0.0029 0.0028
## bathroom cleaner cookware soap
## 0.0027 0.0027 0.0026
## cooking chocolate pudding powder tidbits
## 0.0024 0.0023 0.0023
## cocoa drinks organic sausage prosecco
## 0.0022 0.0022 0.0020
## flower soil/fertilizer ready soups specialty vegetables
## 0.0019 0.0018 0.0017
## organic products decalcifier honey
## 0.0016 0.0015 0.0015
## cream frozen fruits hair spray
## 0.0013 0.0012 0.0011
## rubbing alcohol liqueur make up remover
## 0.0010 0.0009 0.0008
## salad dressing whisky toilet cleaner
## 0.0008 0.0008 0.0007
## baby cosmetics frozen chicken bags
## 0.0006 0.0006 0.0004
## kitchen utensil preservation products baby food
## 0.0004 0.0002 0.0001
## sound storage medium
## 0.0001
transactions %>%
itemFrequency(type = "absolute") %>%
sort(decreasing = TRUE)
## whole milk other vegetables rolls/buns
## 2513 1903 1809
## soda yogurt bottled water
## 1715 1372 1086
## root vegetables tropical fruit shopping bags
## 1072 1032 968
## sausage pastry citrus fruit
## 924 875 814
## bottled beer newspapers canned beer
## 790 784 764
## pip fruit fruit/vegetable juice whipped/sour cream
## 744 709 705
## brown bread domestic eggs frankfurter
## 638 624 580
## margarine coffee pork
## 575 570 567
## butter curd beef
## 545 524 516
## napkins chocolate frozen vegetables
## 513 483 473
## chicken white bread cream cheese
## 422 414 390
## waffles salty snack dessert
## 375 372 365
## long life bakery product sugar UHT-milk
## 365 332 329
## berries hamburger meat hygiene articles
## 327 327 318
## onions specialty chocolate candy
## 305 299 294
## frozen meals misc. beverages oil
## 279 278 276
## butter milk specialty bar beverages
## 275 269 256
## ham meat ice cream
## 256 254 246
## hard cheese sliced cheese cat food
## 241 241 229
## grapes chewing gum detergent
## 220 207 189
## red/blush wine white wine pickled vegetables
## 189 186 176
## baking powder semi-finished bread dishes
## 174 174 173
## flour potted plants soft cheese
## 170 170 168
## processed cheese herbs canned fish
## 163 160 148
## pasta seasonal products cake bar
## 147 140 129
## packaged fruit/vegetables mustard frozen fish
## 128 118 115
## cling film/bags spread cheese liquor
## 111 110 109
## canned vegetables frozen dessert salt
## 106 106 106
## dish cleaner condensed milk flower (seeds)
## 103 101 101
## roll products pet care photo/film
## 100 93 91
## mayonnaise sweet spreads chocolate marshmallow
## 90 89 88
## candles specialty cheese dog food
## 87 84 83
## frozen potato products house keeping products turkey
## 83 80 80
## Instant food products liquor (appetizer) rice
## 79 77 75
## instant coffee popcorn zwieback
## 72 71 68
## soups finished products vinegar
## 66 64 64
## female sanitary products kitchen towels cereals
## 59 59 56
## dental care sparkling wine sauces
## 56 55 54
## softener jam spices
## 54 52 51
## cleaner curd cheese liver loaf
## 50 50 50
## male cosmetics rum ketchup
## 45 43 42
## meat spreads brandy light bulbs
## 42 41 41
## tea specialty fat abrasive cleaner
## 38 35 34
## skin care nuts/prunes artif. sweetener
## 34 33 32
## canned fruit syrup nut snack
## 32 32 30
## snack products fish potato products
## 30 29 28
## bathroom cleaner cookware soap
## 27 27 26
## cooking chocolate pudding powder tidbits
## 24 23 23
## cocoa drinks organic sausage prosecco
## 22 22 20
## flower soil/fertilizer ready soups specialty vegetables
## 19 18 17
## organic products decalcifier honey
## 16 15 15
## cream frozen fruits hair spray
## 13 12 11
## rubbing alcohol liqueur make up remover
## 10 9 8
## salad dressing whisky toilet cleaner
## 8 8 7
## baby cosmetics frozen chicken bags
## 6 6 4
## kitchen utensil preservation products baby food
## 4 2 1
## sound storage medium
## 1
transactions %>%
itemFrequencyPlot(support = 0.1)
We see that the most transactions involve the following food items: whole milk, other vegetables, rolls/buns, soda, yogurt, and bottled water. Many more food items appear further down the list, but they occur in fewer transactions in this dataset.
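The `support = 0.1` argument passed to `itemFrequencyPlot` keeps only items that occur in at least 10% of transactions. A minimal base-R sketch of the same filter, applied to the ten most frequent items and their relative frequencies as listed in the output above (the vector name is illustrative):

```r
# Relative frequencies of the ten most frequent items (copied from the output above)
item_freq <- c(
  "whole milk" = 0.2555, "other vegetables" = 0.1935, "rolls/buns" = 0.1839,
  "soda" = 0.1744, "yogurt" = 0.1395, "bottled water" = 0.1104,
  "root vegetables" = 0.1090, "tropical fruit" = 0.1049,
  "shopping bags" = 0.0984, "sausage" = 0.0940
)

# Items that reach the 10% support threshold
frequent_items <- names(item_freq)[item_freq >= 0.1]
frequent_items
```

Eight items pass the threshold; shopping bags and sausage fall just below it.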
transactions %>%
crossTable(sort = TRUE) -> cross1
cross1[1:8, 1:8] %>%
kable(caption = "Relation between products by count") %>%
kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| | whole milk | other vegetables | rolls/buns | soda | yogurt | bottled water | root vegetables | tropical fruit |
|---|---|---|---|---|---|---|---|---|
| whole milk | 2513 | 736 | 557 | 394 | 551 | 337 | 481 | 416 |
| other vegetables | 736 | 1903 | 419 | 322 | 427 | 243 | 466 | 353 |
| rolls/buns | 557 | 419 | 1809 | 377 | 338 | 238 | 239 | 242 |
| soda | 394 | 322 | 377 | 1715 | 269 | 285 | 183 | 205 |
| yogurt | 551 | 427 | 338 | 269 | 1372 | 225 | 254 | 288 |
| bottled water | 337 | 243 | 238 | 285 | 225 | 1086 | 153 | 181 |
| root vegetables | 481 | 466 | 239 | 183 | 254 | 153 | 1072 | 207 |
| tropical fruit | 416 | 353 | 242 | 205 | 288 | 181 | 207 | 1032 |
transactions %>%
crossTable(measure = "support", sort = TRUE) -> cross2
round(cross2[1:8, 1:8], 3) %>%
kable(caption = "Relation between products by support") %>%
kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| | whole milk | other vegetables | rolls/buns | soda | yogurt | bottled water | root vegetables | tropical fruit |
|---|---|---|---|---|---|---|---|---|
| whole milk | 0.256 | 0.075 | 0.057 | 0.040 | 0.056 | 0.034 | 0.049 | 0.042 |
| other vegetables | 0.075 | 0.193 | 0.043 | 0.033 | 0.043 | 0.025 | 0.047 | 0.036 |
| rolls/buns | 0.057 | 0.043 | 0.184 | 0.038 | 0.034 | 0.024 | 0.024 | 0.025 |
| soda | 0.040 | 0.033 | 0.038 | 0.174 | 0.027 | 0.029 | 0.019 | 0.021 |
| yogurt | 0.056 | 0.043 | 0.034 | 0.027 | 0.140 | 0.023 | 0.026 | 0.029 |
| bottled water | 0.034 | 0.025 | 0.024 | 0.029 | 0.023 | 0.110 | 0.016 | 0.018 |
| root vegetables | 0.049 | 0.047 | 0.024 | 0.019 | 0.026 | 0.016 | 0.109 | 0.021 |
| tropical fruit | 0.042 | 0.036 | 0.025 | 0.021 | 0.029 | 0.018 | 0.021 | 0.105 |
transactions %>%
crossTable(measure = "lift", sort = TRUE) -> cross3
round(cross3[1:8, 1:8], 3) %>%
kable(caption = "Relation between products by lift") %>%
kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))
| | whole milk | other vegetables | rolls/buns | soda | yogurt | bottled water | root vegetables | tropical fruit |
|---|---|---|---|---|---|---|---|---|
| whole milk | NA | 1.514 | 1.205 | 0.899 | 1.572 | 1.214 | 1.756 | 1.578 |
| other vegetables | 1.514 | NA | 1.197 | 0.970 | 1.608 | 1.156 | 2.247 | 1.768 |
| rolls/buns | 1.205 | 1.197 | NA | 1.195 | 1.339 | 1.191 | 1.212 | 1.275 |
| soda | 0.899 | 0.970 | 1.195 | NA | 1.124 | 1.505 | 0.979 | 1.139 |
| yogurt | 1.572 | 1.608 | 1.339 | 1.124 | NA | 1.485 | 1.698 | 2.000 |
| bottled water | 1.214 | 1.156 | 1.191 | 1.505 | 1.485 | NA | 1.293 | 1.588 |
| root vegetables | 1.756 | 2.247 | 1.212 | 0.979 | 1.698 | 1.293 | NA | 1.840 |
| tropical fruit | 1.578 | 1.768 | 1.275 | 1.139 | 2.000 | 1.588 | 1.840 | NA |
Among the food pairings, the most interesting ones are those that achieve the highest values for the Support, Confidence, and Lift measures. For Lift, for example, we see that root vegetables are strongly associated with other vegetables (Lift of 2.247). This is probably because vegetables are usually located very close to each other in the same section of the store. A Lift below 1, as for soda and whole milk (0.899), indicates that the two products are bought together less often than independence would predict.
We begin the analysis proper by running the Apriori algorithm with chosen minimum Support and Confidence thresholds and a minimum number of items per rule (minlen).
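To rank product pairs by Lift programmatically rather than by eye, one option is to take the upper triangle of the symmetric Lift matrix and sort it. The sketch below does this in base R for four of the products, with values copied from the Lift table above; the object names are illustrative.

```r
# Lift values for four products, copied from the Lift table above
products <- c("whole milk", "other vegetables", "root vegetables", "tropical fruit")
lift_mat <- matrix(c(
     NA, 1.514, 1.756, 1.578,
  1.514,    NA, 2.247, 1.768,
  1.756, 2.247,    NA, 1.840,
  1.578, 1.768, 1.840,    NA
), nrow = 4, byrow = TRUE, dimnames = list(products, products))

# Keep each unordered pair once (upper triangle) and rank by lift
idx <- which(upper.tri(lift_mat), arr.ind = TRUE)
pairs <- data.frame(
  item_1 = products[idx[, "row"]],
  item_2 = products[idx[, "col"]],
  lift   = lift_mat[idx]
)
pairs <- pairs[order(-pairs$lift), ]
pairs
```

The first row of `pairs` is the pair of other vegetables and root vegetables, the strongest association among these four products.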
transactions %>%
apriori(parameter = list(support = 0.01, confidence = 0.2, minlen = 2)) -> groceriesRules
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 98
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [230 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
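The log line "checking subsets of size 1 2 3 4" reflects the level-wise search that Apriori performs: an itemset can only be frequent if all of its subsets are frequent, so candidates at each level are built from the survivors of the previous one. The following is a simplified base-R sketch of the first two levels on six made-up baskets. It illustrates the pruning idea only and is not the arules implementation; the basket contents and the threshold are assumptions.

```r
# Toy baskets (hypothetical, for illustration only)
baskets <- list(
  c("milk", "bread", "butter"),
  c("milk", "bread"),
  c("milk", "yogurt"),
  c("bread", "butter"),
  c("milk", "bread", "yogurt"),
  c("soda")
)
min_support <- 0.3  # assumed threshold for this toy example

# Support of an itemset: share of baskets containing all of its items
itemset_support <- function(itemset) {
  mean(sapply(baskets, function(b) all(itemset %in% b)))
}

# Level 1: keep single items that reach the minimum support
items <- sort(unique(unlist(baskets)))
support_1 <- sapply(items, itemset_support)
frequent_1 <- names(support_1)[support_1 >= min_support]

# Level 2: candidate pairs are built only from frequent single items
candidates_2 <- combn(frequent_1, 2, simplify = FALSE)
support_2 <- sapply(candidates_2, itemset_support)
frequent_2 <- candidates_2[support_2 >= min_support]
```

Because soda is dropped at level 1, no pair containing soda is ever counted at level 2. The same effect appears in the log above, where only 88 of the 169 items survive the first pass.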
groceriesRules %>%
sort(by = "support", decreasing=TRUE) %>%
head() %>%
inspect() %>%
kable(caption = "Groceries Rules ordered by support") %>%
kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))
## lhs rhs support confidence coverage
## [1] {other vegetables} => {whole milk} 0.07483477 0.3867578 0.1934926
## [2] {whole milk} => {other vegetables} 0.07483477 0.2928770 0.2555160
## [3] {rolls/buns} => {whole milk} 0.05663447 0.3079049 0.1839349
## [4] {whole milk} => {rolls/buns} 0.05663447 0.2216474 0.2555160
## [5] {yogurt} => {whole milk} 0.05602440 0.4016035 0.1395018
## [6] {whole milk} => {yogurt} 0.05602440 0.2192598 0.2555160
## lift count
## [1] 1.513634 736
## [2] 1.513634 736
## [3] 1.205032 557
## [4] 1.205032 557
## [5] 1.571735 551
## [6] 1.571735 551
| | lhs | | rhs | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|---|---|
| [1] | {other vegetables} | => | {whole milk} | 0.0748348 | 0.3867578 | 0.1934926 | 1.513634 | 736 |
| [2] | {whole milk} | => | {other vegetables} | 0.0748348 | 0.2928770 | 0.2555160 | 1.513634 | 736 |
| [3] | {rolls/buns} | => | {whole milk} | 0.0566345 | 0.3079049 | 0.1839349 | 1.205032 | 557 |
| [4] | {whole milk} | => | {rolls/buns} | 0.0566345 | 0.2216474 | 0.2555160 | 1.205032 | 557 |
| [5] | {yogurt} | => | {whole milk} | 0.0560244 | 0.4016035 | 0.1395018 | 1.571735 | 551 |
| [6] | {whole milk} | => | {yogurt} | 0.0560244 | 0.2192598 | 0.2555160 | 1.571735 | 551 |
groceriesRules %>%
sort(by = "confidence", decreasing=TRUE) %>%
head() %>%
inspect() %>%
kable(caption = "Groceries Rules ordered by confidence") %>%
kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))
## lhs rhs support
## [1] {citrus fruit, root vegetables} => {other vegetables} 0.01037112
## [2] {root vegetables, tropical fruit} => {other vegetables} 0.01230300
## [3] {curd, yogurt} => {whole milk} 0.01006609
## [4] {butter, other vegetables} => {whole milk} 0.01148958
## [5] {root vegetables, tropical fruit} => {whole milk} 0.01199797
## [6] {root vegetables, yogurt} => {whole milk} 0.01453991
## confidence coverage lift count
## [1] 0.5862069 0.01769192 3.029608 102
## [2] 0.5845411 0.02104728 3.020999 121
## [3] 0.5823529 0.01728521 2.279125 99
## [4] 0.5736041 0.02003050 2.244885 113
## [5] 0.5700483 0.02104728 2.230969 118
## [6] 0.5629921 0.02582613 2.203354 143
| | lhs | | rhs | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|---|---|
| [1] | {citrus fruit, root vegetables} | => | {other vegetables} | 0.0103711 | 0.5862069 | 0.0176919 | 3.029608 | 102 |
| [2] | {root vegetables, tropical fruit} | => | {other vegetables} | 0.0123030 | 0.5845411 | 0.0210473 | 3.020999 | 121 |
| [3] | {curd, yogurt} | => | {whole milk} | 0.0100661 | 0.5823529 | 0.0172852 | 2.279125 | 99 |
| [4] | {butter, other vegetables} | => | {whole milk} | 0.0114896 | 0.5736041 | 0.0200305 | 2.244885 | 113 |
| [5] | {root vegetables, tropical fruit} | => | {whole milk} | 0.0119980 | 0.5700483 | 0.0210473 | 2.230969 | 118 |
| [6] | {root vegetables, yogurt} | => | {whole milk} | 0.0145399 | 0.5629921 | 0.0258261 | 2.203354 | 143 |
groceriesRules %>%
sort(by = "lift", decreasing=TRUE) %>%
head() %>%
inspect() %>%
kable(caption = "Groceries Rules ordered by lift") %>%
kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))
## lhs rhs support
## [1] {citrus fruit, other vegetables} => {root vegetables} 0.01037112
## [2] {other vegetables, yogurt} => {whipped/sour cream} 0.01016777
## [3] {other vegetables, tropical fruit} => {root vegetables} 0.01230300
## [4] {beef} => {root vegetables} 0.01738688
## [5] {citrus fruit, root vegetables} => {other vegetables} 0.01037112
## [6] {root vegetables, tropical fruit} => {other vegetables} 0.01230300
## confidence coverage lift count
## [1] 0.3591549 0.02887646 3.295045 102
## [2] 0.2341920 0.04341637 3.267062 100
## [3] 0.3427762 0.03589222 3.144780 121
## [4] 0.3313953 0.05246568 3.040367 171
## [5] 0.5862069 0.01769192 3.029608 102
## [6] 0.5845411 0.02104728 3.020999 121
| | lhs | | rhs | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|---|---|
| [1] | {citrus fruit, other vegetables} | => | {root vegetables} | 0.0103711 | 0.3591549 | 0.0288765 | 3.295046 | 102 |
| [2] | {other vegetables, yogurt} | => | {whipped/sour cream} | 0.0101678 | 0.2341920 | 0.0434164 | 3.267062 | 100 |
| [3] | {other vegetables, tropical fruit} | => | {root vegetables} | 0.0123030 | 0.3427762 | 0.0358922 | 3.144780 | 121 |
| [4] | {beef} | => | {root vegetables} | 0.0173869 | 0.3313953 | 0.0524657 | 3.040367 | 171 |
| [5] | {citrus fruit, root vegetables} | => | {other vegetables} | 0.0103711 | 0.5862069 | 0.0176919 | 3.029608 | 102 |
| [6] | {root vegetables, tropical fruit} | => | {other vegetables} | 0.0123030 | 0.5845411 | 0.0210473 | 3.020999 | 121 |
groceriesRules %>%
sort(by = "count", decreasing=TRUE) %>%
head() %>%
inspect() %>%
kable(caption = "Groceries Rules ordered by count") %>%
kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))
## lhs rhs support confidence coverage
## [1] {other vegetables} => {whole milk} 0.07483477 0.3867578 0.1934926
## [2] {whole milk} => {other vegetables} 0.07483477 0.2928770 0.2555160
## [3] {rolls/buns} => {whole milk} 0.05663447 0.3079049 0.1839349
## [4] {whole milk} => {rolls/buns} 0.05663447 0.2216474 0.2555160
## [5] {yogurt} => {whole milk} 0.05602440 0.4016035 0.1395018
## [6] {whole milk} => {yogurt} 0.05602440 0.2192598 0.2555160
## lift count
## [1] 1.513634 736
## [2] 1.513634 736
## [3] 1.205032 557
## [4] 1.205032 557
## [5] 1.571735 551
## [6] 1.571735 551
| | lhs | | rhs | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|---|---|
| [1] | {other vegetables} | => | {whole milk} | 0.0748348 | 0.3867578 | 0.1934926 | 1.513634 | 736 |
| [2] | {whole milk} | => | {other vegetables} | 0.0748348 | 0.2928770 | 0.2555160 | 1.513634 | 736 |
| [3] | {rolls/buns} | => | {whole milk} | 0.0566345 | 0.3079049 | 0.1839349 | 1.205032 | 557 |
| [4] | {whole milk} | => | {rolls/buns} | 0.0566345 | 0.2216474 | 0.2555160 | 1.205032 | 557 |
| [5] | {yogurt} | => | {whole milk} | 0.0560244 | 0.4016035 | 0.1395018 | 1.571735 | 551 |
| [6] | {whole milk} | => | {yogurt} | 0.0560244 | 0.2192598 | 0.2555160 | 1.571735 | 551 |
With these thresholds, the Apriori algorithm produced 230 rules between food items.
summary(groceriesRules)
## set of 230 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3
## 150 80
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.000 2.000 2.348 3.000 3.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.01007 Min. :0.2000 Min. :0.01729 Min. :0.8991
## 1st Qu.:0.01202 1st Qu.:0.2465 1st Qu.:0.03437 1st Qu.:1.4496
## Median :0.01484 Median :0.3172 Median :0.05231 Median :1.7303
## Mean :0.01905 Mean :0.3324 Mean :0.06311 Mean :1.7930
## 3rd Qu.:0.02224 3rd Qu.:0.4037 3rd Qu.:0.07565 3rd Qu.:2.0800
## Max. :0.07483 Max. :0.5862 Max. :0.25552 Max. :3.2950
## count
## Min. : 99.0
## 1st Qu.:118.2
## Median :146.0
## Mean :187.3
## 3rd Qu.:218.8
## Max. :736.0
##
## mining info:
## data ntransactions support confidence
## . 9835 0.01 0.2
## call
## apriori(data = ., parameter = list(support = 0.01, confidence = 0.2, minlen = 2))
Now let’s see which products are most strongly associated with buying whole milk.
transactions %>%
apriori(parameter = list(support = 0.02, conf = 0.2), appearance=list(default="lhs", rhs="whole milk"),
control=list(verbose=F)) %>%
sort(by = "confidence", decreasing = TRUE) %>%
head(10) %>%
inspect() %>%
kable(caption = "Rules with whole milk on the right-hand side, ordered by confidence") %>%
kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))
## lhs rhs support confidence
## [1] {other vegetables, yogurt} => {whole milk} 0.02226741 0.5128806
## [2] {butter} => {whole milk} 0.02755465 0.4972477
## [3] {curd} => {whole milk} 0.02613116 0.4904580
## [4] {other vegetables, root vegetables} => {whole milk} 0.02318251 0.4892704
## [5] {domestic eggs} => {whole milk} 0.02999492 0.4727564
## [6] {whipped/sour cream} => {whole milk} 0.03223183 0.4496454
## [7] {root vegetables} => {whole milk} 0.04890696 0.4486940
## [8] {frozen vegetables} => {whole milk} 0.02043721 0.4249471
## [9] {margarine} => {whole milk} 0.02409761 0.4121739
## [10] {beef} => {whole milk} 0.02125064 0.4050388
## coverage lift count
## [1] 0.04341637 2.007235 219
## [2] 0.05541434 1.946053 271
## [3] 0.05327911 1.919481 257
## [4] 0.04738180 1.914833 228
## [5] 0.06344687 1.850203 295
## [6] 0.07168277 1.759754 317
## [7] 0.10899847 1.756031 481
## [8] 0.04809354 1.663094 201
## [9] 0.05846467 1.613104 237
## [10] 0.05246568 1.585180 209
| | lhs | | rhs | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|---|---|
| [1] | {other vegetables, yogurt} | => | {whole milk} | 0.0222674 | 0.5128806 | 0.0434164 | 2.007235 | 219 |
| [2] | {butter} | => | {whole milk} | 0.0275547 | 0.4972477 | 0.0554143 | 1.946053 | 271 |
| [3] | {curd} | => | {whole milk} | 0.0261312 | 0.4904580 | 0.0532791 | 1.919480 | 257 |
| [4] | {other vegetables, root vegetables} | => | {whole milk} | 0.0231825 | 0.4892704 | 0.0473818 | 1.914833 | 228 |
| [5] | {domestic eggs} | => | {whole milk} | 0.0299949 | 0.4727564 | 0.0634469 | 1.850203 | 295 |
| [6] | {whipped/sour cream} | => | {whole milk} | 0.0322318 | 0.4496454 | 0.0716828 | 1.759754 | 317 |
| [7] | {root vegetables} | => | {whole milk} | 0.0489070 | 0.4486940 | 0.1089985 | 1.756031 | 481 |
| [8] | {frozen vegetables} | => | {whole milk} | 0.0204372 | 0.4249471 | 0.0480935 | 1.663094 | 201 |
| [9] | {margarine} | => | {whole milk} | 0.0240976 | 0.4121739 | 0.0584647 | 1.613104 | 237 |
| [10] | {beef} | => | {whole milk} | 0.0212506 | 0.4050388 | 0.0524657 | 1.585179 | 209 |
As you can see, whole milk is most often bought together with dairy and other refrigerated staples. A customer standing at the dairy refrigerator is likely to consider whether they also need related products such as butter, yogurt, eggs, or cream.
transactions %>%
apriori(parameter = list(support = 0.01, conf = 0.1), appearance=list(default="rhs", lhs="rolls/buns"),
control=list(verbose=F)) %>%
sort(by = "confidence", decreasing = TRUE) %>%
head(10) %>%
inspect() %>%
kable(caption = "Rules with rolls/buns on the left-hand side, ordered by confidence") %>%
kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))
## lhs rhs support confidence coverage
## [1] {rolls/buns} => {whole milk} 0.05663447 0.3079049 0.1839349
## [2] {} => {whole milk} 0.25551601 0.2555160 1.0000000
## [3] {rolls/buns} => {other vegetables} 0.04260295 0.2316197 0.1839349
## [4] {rolls/buns} => {soda} 0.03833249 0.2084024 0.1839349
## [5] {} => {other vegetables} 0.19349263 0.1934926 1.0000000
## [6] {rolls/buns} => {yogurt} 0.03436706 0.1868436 0.1839349
## [7] {} => {soda} 0.17437722 0.1743772 1.0000000
## [8] {rolls/buns} => {sausage} 0.03060498 0.1663903 0.1839349
## [9] {} => {yogurt} 0.13950178 0.1395018 1.0000000
## [10] {rolls/buns} => {tropical fruit} 0.02460600 0.1337756 0.1839349
## lift count
## [1] 1.205032 557
## [2] 1.000000 2513
## [3] 1.197047 419
## [4] 1.195124 377
## [5] 1.000000 1903
## [6] 1.339363 338
## [7] 1.000000 1715
## [8] 1.771048 301
## [9] 1.000000 1372
## [10] 1.274886 242
| | lhs | | rhs | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|---|---|---|
| [1] | {rolls/buns} | => | {whole milk} | 0.0566345 | 0.3079049 | 0.1839349 | 1.205032 | 557 |
| [2] | {} | => | {whole milk} | 0.2555160 | 0.2555160 | 1.0000000 | 1.000000 | 2513 |
| [3] | {rolls/buns} | => | {other vegetables} | 0.0426029 | 0.2316197 | 0.1839349 | 1.197046 | 419 |
| [4] | {rolls/buns} | => | {soda} | 0.0383325 | 0.2084024 | 0.1839349 | 1.195124 | 377 |
| [5] | {} | => | {other vegetables} | 0.1934926 | 0.1934926 | 1.0000000 | 1.000000 | 1903 |
| [6] | {rolls/buns} | => | {yogurt} | 0.0343671 | 0.1868436 | 0.1839349 | 1.339363 | 338 |
| [7] | {} | => | {soda} | 0.1743772 | 0.1743772 | 1.0000000 | 1.000000 | 1715 |
| [8] | {rolls/buns} | => | {sausage} | 0.0306050 | 0.1663903 | 0.1839349 | 1.771048 | 301 |
| [9] | {} | => | {yogurt} | 0.1395018 | 0.1395018 | 1.0000000 | 1.000000 | 1372 |
| [10] | {rolls/buns} | => | {tropical fruit} | 0.0246060 | 0.1337756 | 0.1839349 | 1.274886 | 242 |
We would also like to see which other products customers put in the basket when they buy fresh rolls or buns. According to the rules above, they most often also reach for whole milk, other vegetables, soda, yogurt, or sausage, items commonly associated with breakfast. Rules with an empty left-hand side ({}) simply report the overall frequency of the item on the right and have a Lift of 1, so they serve only as a baseline.
The significance of the rules, assessed with Fisher’s exact test, and their redundancy are checked below.
transactions %>%
apriori(parameter = list(support = 0.02, conf = 0.2), appearance=list(default="lhs", rhs="whole milk"),
control=list(verbose=F)) %>%
sort(by = "confidence", decreasing = TRUE) %>%
is.significant()
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [25] FALSE FALSE FALSE FALSE
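To show what such a significance check involves for a single rule, the sketch below applies base R's `fisher.test` to the 2x2 contingency table for {butter} => {whole milk}, built from counts reported in this analysis (271 joint transactions, 545 with butter, 2513 with whole milk, 9835 in total). The one-sided alternative and the 0.01 threshold are assumptions made for this illustration; `is.significant()` may use a different threshold or a multiple-comparison correction depending on the arules version.

```r
n_total <- 9835  # all transactions
n_A     <- 545   # transactions with butter (lhs)
n_B     <- 2513  # transactions with whole milk (rhs)
n_AB    <- 271   # transactions with both

# 2x2 contingency table: rows = butter yes/no, columns = whole milk yes/no
tab <- matrix(
  c(n_AB,       n_A - n_AB,
    n_B - n_AB, n_total - n_A - n_B + n_AB),
  nrow = 2, byrow = TRUE,
  dimnames = list(butter = c("yes", "no"), whole_milk = c("yes", "no"))
)

# One-sided test: is whole milk over-represented among butter baskets?
p_value <- fisher.test(tab, alternative = "greater")$p.value
p_value < 0.01
```

A small p-value here means that the co-occurrence of butter and whole milk is very unlikely to be explained by chance alone.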
transactions %>%
apriori(parameter = list(support = 0.02, conf = 0.2), appearance=list(default="lhs", rhs="whole milk"),
control=list(verbose=F)) %>%
sort(by = "confidence", decreasing = TRUE) %>%
is.redundant()
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE TRUE TRUE TRUE
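A rule is usually called redundant when a more general rule (one with fewer items on the left-hand side and the same right-hand side) has the same or higher confidence. The base-R sketch below checks this for one pair of rules using counts from this analysis: soda appears in 1715 transactions, 394 of them together with whole milk. It illustrates the idea under that definition and is not the arules implementation.

```r
# Confidence of the general rule {} => {whole milk}: the base rate of whole milk
conf_general <- 2513 / 9835

# Confidence of the more specific rule {soda} => {whole milk}
conf_specific <- 394 / 1715

# The specific rule adds nothing if it does not beat the general one
is_redundant <- conf_specific <= conf_general
is_redundant
```

Knowing that a basket contains soda lowers the estimated chance of whole milk (about 0.23 against a base rate of about 0.26), so this rule carries no useful information beyond the base rate. This may be why the lowest-confidence rules in the output above are flagged as redundant.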
transactions %>%
apriori(parameter = list(support = 0.01, conf = 0.1), appearance=list(default="rhs", lhs="rolls/buns"),
control=list(verbose=F)) %>%
sort(by = "confidence", decreasing = TRUE) %>%
is.significant()
## [1] TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE
## [13] TRUE FALSE FALSE TRUE FALSE FALSE TRUE
transactions %>%
apriori(parameter = list(support = 0.01, conf = 0.1), appearance=list(default="rhs", lhs="rolls/buns"),
control=list(verbose=F)) %>%
sort(by = "confidence", decreasing = TRUE) %>%
is.redundant()
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
groceriesRules %>%
plot(measure = c("support", "lift"), shading = "confidence")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
groceriesRules %>%
plot(method = "grouped")
transactions %>%
apriori(parameter = list(support = 0.02, conf = 0.2), appearance=list(default="lhs", rhs="whole milk"),
control=list(verbose=F)) %>%
sort(by = "confidence", decreasing = TRUE) %>%
inspectDT()
transactions %>%
apriori(parameter = list(support = 0.01, conf = 0.1), appearance=list(default="rhs", lhs="rolls/buns"),
control=list(verbose=F)) %>%
sort(by = "confidence", decreasing = TRUE) %>%
inspectDT()
groceriesRules %>%
plot(method = "graph", engine = "htmlwidget") %>%
visNodes(font=list(color = "black"))
## Warning: Too many rules supplied. Only plotting the best 100 using
## 'lift' (change control parameter max if needed).
The above tools allow interactive searching and filtering of itemsets and relationships between food products. The interactive graph also makes it possible to trace the 100 strongest relations between products as measured by Lift.
Association rule analysis of customers’ market baskets is a very important tool for maximizing sales and profits. Rational placement of product shelves, grouped thematically by category, helps customers find the products they need, encourages the purchase of new products, and minimizes the time and distance they have to travel. Of course, these relationships need to be explored in more depth and more transactions collected, so that the conclusions drawn from the Apriori analysis hold up in practice.