Introduction
- Literature
Key measures in Association Rules
- Support
- Confidence
- Lift
Target analysis
Summary
References

1 Introduction

Market Basket Analysis (association rules) has a large influence on our daily lives. It is used primarily by companies that sell products through supermarkets. By analyzing the contents of customers' baskets, they can draw interesting conclusions. First, we can find out which groups of products most often appear together in a customer's basket, and which specific products are related to one another. From this, we can estimate how likely such combinations are and how often products are grouped in this way.

This allows the products that are most often bought together to be placed sensibly in specific sections of the supermarket. It also minimizes the distance a customer has to walk to buy them, because the products are located relatively close to each other.

One example is the dairy refrigerator, which holds many dairy products such as milk, butter and cheese. Many analyses have shown that these products very often appear together in the basket. The photo below shows an example of such a dairy refrigerator.

Dairy Fridge

Source: https://www.justfridge.co.za/remote-cabinets/upright-dairy-m2/

The analysis of association rules begins by installing the appropriate packages and loading the libraries into the R environment.

# install.packages("arules")
# install.packages("arulesViz")
# install.packages("arulesCBA")
# install.packages("tidyverse")
# install.packages("knitr")
# install.packages("kableExtra")
# install.packages("magrittr")
# install.packages("visNetwork")
library(arules)
library(arulesViz)
library(arulesCBA)
library(tidyverse)
library(knitr)
library(kableExtra)
library(magrittr)
library(visNetwork)

Next, we load the data, which are available on the Kaggle website (Kaggle Dataset Groceries). The dataset lists the items bought by customers of a nearby grocery store, with each record representing one transaction (one customer's basket). It contains 9835 transactions and 169 unique items.

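# Read the raw file: the "Item(s)" column holds the basket size,
# the remaining columns hold the items of each basket
Groceries_all <- read_csv('groceries.csv')

# Drop the basket-size column so that only item names remain
Groceries_all %>% 
  rename(noItems = "Item(s)") %>% 
  select(-noItems) -> Groceries

# Save the items only and read them back as baskets: one line per transaction,
# comma-separated items, header line skipped, duplicate items within a basket removed
Groceries %>% 
  write.csv('groceries1.csv', row.names = FALSE)
transactions <- read.transactions('groceries1.csv', format = "basket", sep = ",", skip = 1, rm.duplicates = TRUE)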
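The reshaping above turns a wide table (one row per basket, one item per column, empty cells for shorter baskets) into the basket format that read.transactions() expects: one transaction per line. A minimal base R sketch of the same idea, on a small hypothetical table laid out like groceries.csv (the names wide, n_items and item1 to item3 are illustrative assumptions), converts each row into a vector of unique item names:

```r
# Hypothetical toy table in the same wide layout as groceries.csv:
# n_items = basket size, item1..item3 = item names, NA where the basket is shorter
wide <- data.frame(
  n_items = c(3, 1, 2),
  item1 = c("citrus fruit", "whole milk", "yogurt"),
  item2 = c("margarine", NA, "coffee"),
  item3 = c("ready soups", NA, NA),
  stringsAsFactors = FALSE
)

# One character vector per basket: drop the size column, the NA padding and duplicates
baskets <- lapply(seq_len(nrow(wide)), function(i) {
  items <- unlist(wide[i, -1], use.names = FALSE)
  unique(items[!is.na(items) & items != ""])
})

str(baskets)
```

With arules loaded, a list like baskets can be turned into a transactions object with as(baskets, "transactions").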

Let’s look at how many transactions we have for each number of groceries purchased together.

Groceries_all %>% 
  ggplot(mapping = aes(x = `Item(s)`)) +
  geom_bar(fill = "coral2") +
  xlab("Number of groceries in basket") +
  ylab("Number of transactions") +
  ggtitle("Barplot of number of transactions by number of food items")

Single-item baskets are the most common, and the number of transactions drops sharply as the number of items in a basket grows. This suggests that the data come from a small neighborhood store rather than a supermarket.
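The barplot is simply the distribution of basket sizes. A minimal sketch on hypothetical toy baskets (base R only, not the Kaggle data) counts how many transactions have each number of items. For the real transactions object, arules provides size(), which returns the number of items in each transaction, so table(size(transactions)) gives a comparable table; it may differ slightly from the Item(s) column because duplicate items were removed when reading.

```r
# Hypothetical toy baskets (not the Kaggle data)
baskets <- list(
  c("whole milk"),
  c("coffee", "tropical fruit", "yogurt"),
  c("soda"),
  c("rolls/buns", "whole milk"),
  c("bottled water")
)

# Number of items in each basket, then number of transactions per basket size
basket_sizes <- lengths(baskets)
size_counts <- table(basket_sizes)
print(size_counts)
```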

1.1 Literature

The importance of finding association rules in the sales of RDA Hijab apparel products is discussed by Nabila et al. (2021). In today's fast-developing technological world, better algorithms are constantly being sought, and companies compete fiercely for their share of the market. To maximize this share, one should use data mining algorithms such as Apriori to find dependencies and association rules between purchased products. This makes it possible to arrange in-store displays so that they attract customers' attention and maximize sales. In their analysis, the authors assumed a minimum confidence of 50% and a support value of 20, and drew a number of conclusions about sales and about the association between cuffs of different colors and white shirts.

Memory and time complexity play an important role in choosing an association rule algorithm. In their work, Khedkar and Kumari (2021) showed that the Apriori algorithm has higher time complexity but can be used on larger databases, whereas the FP-growth algorithm has higher memory complexity. The appropriate algorithm should therefore be chosen according to the constraints of the environment. Their work also presents other association rule methods from the literature, including a single-layer feedforward partially connected neural network, FP-Bonsai, the AR-Markov model and CLoPAR.

Hruschka (2021) takes a very different approach to association rules. The article analyzes the purchase preferences of different households for products from the music and computer industries, using a multivariate logit model (MVL), its finite mixture extension (FM-MVL) and a conditional restricted Boltzmann machine (CRBM). With these models, the author determines the differences between the average probabilities of buying and not buying specific products, and also uncovers relationships in latent variables for product subcategories. The focus is on how two products are connected, e.g., classical music with religious music, or a monitor with a keyboard.

2 Key measures in Association Rules

Now let's take a quick look at the four basic measures used in association rules: support, confidence, expected confidence and lift.

2.1 Support

Support tells us how large the common part of sets A and B is: we count the transactions that contain both itemset A and itemset B, and divide that number by the total number of transactions. Support is given by the following formula:

\[Support = \frac{\text{Number of transactions with both A and B}}{\text{Total number of transactions}} = P(A \cap B) \]
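To make the formula concrete, here is a minimal base R sketch on hypothetical toy baskets (not the Kaggle data). The helper name support_of is illustrative only and is not an arules function; it returns the share of baskets that contain every item of the given itemset.

```r
# Hypothetical toy baskets used to illustrate the formulas
baskets <- list(
  c("milk", "cottage cheese", "bread"),
  c("milk", "cottage cheese"),
  c("milk", "bread"),
  c("bread", "butter"),
  c("butter")
)

# Support: share of baskets that contain every item of the itemset
support_of <- function(baskets, itemset) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

support_of(baskets, c("milk", "cottage cheese"))  # 2 of 5 baskets
```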

2.2 Confidence

Confidence is simply a conditional probability. Taking milk and cottage cheese as an example, it asks: among the transactions in which milk was bought, in what fraction was cottage cheese bought as well? In terms of sets, it tells us what part of set A is covered by the common part of A and B. Confidence is given by the following formula:

\[Confidence = \frac{\text{Number of transactions with both A and B}}{\text{Total number of transactions with A}} = \frac{P(A \cap B)}{P(A)} \]

Additionally, we can calculate the expected confidence, which is the probability that a transaction contains itemset B regardless of A. It is given by the following formula:

\[ExpectedConfidence = \frac{\text{Number of transactions with B}}{\text{Total number of transactions}} = P(B) \]
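Using hypothetical toy baskets, the two formulas can be sketched in base R as follows. The helper names confidence_of and expected_confidence_of are illustrative assumptions, not arules functions. The last line also shows that confidence is not symmetric: swapping the left- and right-hand sides changes the value.

```r
# Hypothetical toy baskets (not the Kaggle data)
baskets <- list(
  c("milk", "cottage cheese", "bread"),
  c("milk", "cottage cheese"),
  c("milk", "bread"),
  c("bread", "butter"),
  c("butter")
)

# Share of baskets that contain every item of the itemset
support_of <- function(baskets, itemset) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Confidence of A => B: P(A and B) / P(A)
confidence_of <- function(baskets, lhs, rhs) {
  support_of(baskets, c(lhs, rhs)) / support_of(baskets, lhs)
}

# Expected confidence of A => B: P(B), the support of the right-hand side alone
expected_confidence_of <- function(baskets, rhs) support_of(baskets, rhs)

confidence_of(baskets, "milk", "cottage cheese")   # 2 of the 3 milk baskets
expected_confidence_of(baskets, "cottage cheese")  # 2 of 5 baskets
confidence_of(baskets, "cottage cheese", "milk")   # every cottage cheese basket has milk
```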

2.3 Lift

The last measure we should consider is lift. Lift compares how often milk and cottage cheese are bought together with how often we would expect them to be bought together if the two purchases were independent. A lift of 3 means that milk and cottage cheese appear together three times as often as they would under independence; a lift of 1 indicates no association, and a lift below 1 means the products appear together less often than independence would predict. Lift is given by the following formula:

\[Lift = \frac{\text{Confidence}}{\text{Expected Confidence}} = \frac{P(A \cap B)}{P(A) \cdot P(B)} \]
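A minimal base R sketch of lift on hypothetical toy baskets, dividing the support of the pair by the product of the single supports. The helper name lift_of is illustrative, not an arules function. Unlike confidence, lift is symmetric: swapping A and B gives the same value.

```r
# Hypothetical toy baskets (not the Kaggle data)
baskets <- list(
  c("milk", "cottage cheese", "bread"),
  c("milk", "cottage cheese"),
  c("milk", "bread"),
  c("bread", "butter"),
  c("butter")
)

# Share of baskets that contain every item of the itemset
support_of <- function(baskets, itemset) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Lift of A => B: P(A and B) / (P(A) * P(B)); a value of 1 means independence
lift_of <- function(baskets, lhs, rhs) {
  support_of(baskets, c(lhs, rhs)) /
    (support_of(baskets, lhs) * support_of(baskets, rhs))
}

lift_of(baskets, "milk", "cottage cheese")  # 0.4 / (0.6 * 0.4)
lift_of(baskets, "cottage cheese", "milk")  # same value: lift is symmetric
```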

3 Target analysis

Before we proceed to the target analysis of association rules, we should take a bird's-eye view of the data in order to choose suitable parameters for the Apriori algorithm.

3.1 Basic 1D information

3.1.1 First 6 transactions

transactions %>% 
  head() %>% 
  LIST()

## [[1]]
## [1] "citrus fruit"        "margarine"           "ready soups"        
## [4] "semi-finished bread"
## 
## [[2]]
## [1] "coffee"         "tropical fruit" "yogurt"        
## 
## [[3]]
## [1] "whole milk"
## 
## [[4]]
## [1] "cream cheese" "meat spreads" "pip fruit"    "yogurt"      
## 
## [[5]]
## [1] "condensed milk"           "long life bakery product"
## [3] "other vegetables"         "whole milk"              
## 
## [[6]]
## [1] "abrasive cleaner" "butter"           "rice"             "whole milk"      
## [5] "yogurt"

3.1.2 Relative frequency

transactions %>% 
  itemFrequency() %>% 
  round(4) %>% 
  sort(decreasing = TRUE)

##                whole milk          other vegetables                rolls/buns 
##                    0.2555                    0.1935                    0.1839 
##                      soda                    yogurt             bottled water 
##                    0.1744                    0.1395                    0.1104 
##           root vegetables            tropical fruit             shopping bags 
##                    0.1090                    0.1049                    0.0984 
##                   sausage                    pastry              citrus fruit 
##                    0.0940                    0.0890                    0.0828 
##              bottled beer                newspapers               canned beer 
##                    0.0803                    0.0797                    0.0777 
##                 pip fruit     fruit/vegetable juice        whipped/sour cream 
##                    0.0756                    0.0721                    0.0717 
##               brown bread             domestic eggs               frankfurter 
##                    0.0649                    0.0634                    0.0590 
##                 margarine                    coffee                      pork 
##                    0.0585                    0.0580                    0.0577 
##                    butter                      curd                      beef 
##                    0.0554                    0.0533                    0.0525 
##                   napkins                 chocolate         frozen vegetables 
##                    0.0522                    0.0491                    0.0481 
##                   chicken               white bread              cream cheese 
##                    0.0429                    0.0421                    0.0397 
##                   waffles               salty snack                   dessert 
##                    0.0381                    0.0378                    0.0371 
##  long life bakery product                     sugar                  UHT-milk 
##                    0.0371                    0.0338                    0.0335 
##                   berries            hamburger meat          hygiene articles 
##                    0.0332                    0.0332                    0.0323 
##                    onions       specialty chocolate                     candy 
##                    0.0310                    0.0304                    0.0299 
##              frozen meals           misc. beverages                       oil 
##                    0.0284                    0.0283                    0.0281 
##               butter milk             specialty bar                 beverages 
##                    0.0280                    0.0274                    0.0260 
##                       ham                      meat                 ice cream 
##                    0.0260                    0.0258                    0.0250 
##               hard cheese             sliced cheese                  cat food 
##                    0.0245                    0.0245                    0.0233 
##                    grapes               chewing gum                 detergent 
##                    0.0224                    0.0210                    0.0192 
##            red/blush wine                white wine        pickled vegetables 
##                    0.0192                    0.0189                    0.0179 
##             baking powder       semi-finished bread                    dishes 
##                    0.0177                    0.0177                    0.0176 
##                     flour             potted plants               soft cheese 
##                    0.0173                    0.0173                    0.0171 
##          processed cheese                     herbs               canned fish 
##                    0.0166                    0.0163                    0.0150 
##                     pasta         seasonal products                  cake bar 
##                    0.0149                    0.0142                    0.0131 
## packaged fruit/vegetables                   mustard               frozen fish 
##                    0.0130                    0.0120                    0.0117 
##           cling film/bags             spread cheese                    liquor 
##                    0.0113                    0.0112                    0.0111 
##         canned vegetables            frozen dessert                      salt 
##                    0.0108                    0.0108                    0.0108 
##              dish cleaner            condensed milk            flower (seeds) 
##                    0.0105                    0.0103                    0.0103 
##             roll products                  pet care                photo/film 
##                    0.0102                    0.0095                    0.0093 
##                mayonnaise             sweet spreads     chocolate marshmallow 
##                    0.0092                    0.0090                    0.0089 
##                   candles          specialty cheese                  dog food 
##                    0.0088                    0.0085                    0.0084 
##    frozen potato products    house keeping products                    turkey 
##                    0.0084                    0.0081                    0.0081 
##     Instant food products        liquor (appetizer)                      rice 
##                    0.0080                    0.0078                    0.0076 
##            instant coffee                   popcorn                  zwieback 
##                    0.0073                    0.0072                    0.0069 
##                     soups         finished products                   vinegar 
##                    0.0067                    0.0065                    0.0065 
##  female sanitary products            kitchen towels                   cereals 
##                    0.0060                    0.0060                    0.0057 
##               dental care            sparkling wine                    sauces 
##                    0.0057                    0.0056                    0.0055 
##                  softener                       jam                    spices 
##                    0.0055                    0.0053                    0.0052 
##                   cleaner               curd cheese                liver loaf 
##                    0.0051                    0.0051                    0.0051 
##            male cosmetics                       rum                   ketchup 
##                    0.0046                    0.0044                    0.0043 
##              meat spreads                    brandy               light bulbs 
##                    0.0043                    0.0042                    0.0042 
##                       tea             specialty fat          abrasive cleaner 
##                    0.0039                    0.0036                    0.0035 
##                 skin care               nuts/prunes          artif. sweetener 
##                    0.0035                    0.0034                    0.0033 
##              canned fruit                     syrup                 nut snack 
##                    0.0033                    0.0033                    0.0031 
##            snack products                      fish           potato products 
##                    0.0031                    0.0029                    0.0028 
##          bathroom cleaner                  cookware                      soap 
##                    0.0027                    0.0027                    0.0026 
##         cooking chocolate            pudding powder                   tidbits 
##                    0.0024                    0.0023                    0.0023 
##              cocoa drinks           organic sausage                  prosecco 
##                    0.0022                    0.0022                    0.0020 
##    flower soil/fertilizer               ready soups      specialty vegetables 
##                    0.0019                    0.0018                    0.0017 
##          organic products               decalcifier                     honey 
##                    0.0016                    0.0015                    0.0015 
##                     cream             frozen fruits                hair spray 
##                    0.0013                    0.0012                    0.0011 
##           rubbing alcohol                   liqueur           make up remover 
##                    0.0010                    0.0009                    0.0008 
##            salad dressing                    whisky            toilet cleaner 
##                    0.0008                    0.0008                    0.0007 
##            baby cosmetics            frozen chicken                      bags 
##                    0.0006                    0.0006                    0.0004 
##           kitchen utensil     preservation products                 baby food 
##                    0.0004                    0.0002                    0.0001 
##      sound storage medium 
##                    0.0001

3.1.3 Absolute frequency

transactions %>% 
  itemFrequency(type = "absolute") %>% 
  sort(decreasing = TRUE)

##                whole milk          other vegetables                rolls/buns 
##                      2513                      1903                      1809 
##                      soda                    yogurt             bottled water 
##                      1715                      1372                      1086 
##           root vegetables            tropical fruit             shopping bags 
##                      1072                      1032                       968 
##                   sausage                    pastry              citrus fruit 
##                       924                       875                       814 
##              bottled beer                newspapers               canned beer 
##                       790                       784                       764 
##                 pip fruit     fruit/vegetable juice        whipped/sour cream 
##                       744                       709                       705 
##               brown bread             domestic eggs               frankfurter 
##                       638                       624                       580 
##                 margarine                    coffee                      pork 
##                       575                       570                       567 
##                    butter                      curd                      beef 
##                       545                       524                       516 
##                   napkins                 chocolate         frozen vegetables 
##                       513                       483                       473 
##                   chicken               white bread              cream cheese 
##                       422                       414                       390 
##                   waffles               salty snack                   dessert 
##                       375                       372                       365 
##  long life bakery product                     sugar                  UHT-milk 
##                       365                       332                       329 
##                   berries            hamburger meat          hygiene articles 
##                       327                       327                       318 
##                    onions       specialty chocolate                     candy 
##                       305                       299                       294 
##              frozen meals           misc. beverages                       oil 
##                       279                       278                       276 
##               butter milk             specialty bar                 beverages 
##                       275                       269                       256 
##                       ham                      meat                 ice cream 
##                       256                       254                       246 
##               hard cheese             sliced cheese                  cat food 
##                       241                       241                       229 
##                    grapes               chewing gum                 detergent 
##                       220                       207                       189 
##            red/blush wine                white wine        pickled vegetables 
##                       189                       186                       176 
##             baking powder       semi-finished bread                    dishes 
##                       174                       174                       173 
##                     flour             potted plants               soft cheese 
##                       170                       170                       168 
##          processed cheese                     herbs               canned fish 
##                       163                       160                       148 
##                     pasta         seasonal products                  cake bar 
##                       147                       140                       129 
## packaged fruit/vegetables                   mustard               frozen fish 
##                       128                       118                       115 
##           cling film/bags             spread cheese                    liquor 
##                       111                       110                       109 
##         canned vegetables            frozen dessert                      salt 
##                       106                       106                       106 
##              dish cleaner            condensed milk            flower (seeds) 
##                       103                       101                       101 
##             roll products                  pet care                photo/film 
##                       100                        93                        91 
##                mayonnaise             sweet spreads     chocolate marshmallow 
##                        90                        89                        88 
##                   candles          specialty cheese                  dog food 
##                        87                        84                        83 
##    frozen potato products    house keeping products                    turkey 
##                        83                        80                        80 
##     Instant food products        liquor (appetizer)                      rice 
##                        79                        77                        75 
##            instant coffee                   popcorn                  zwieback 
##                        72                        71                        68 
##                     soups         finished products                   vinegar 
##                        66                        64                        64 
##  female sanitary products            kitchen towels                   cereals 
##                        59                        59                        56 
##               dental care            sparkling wine                    sauces 
##                        56                        55                        54 
##                  softener                       jam                    spices 
##                        54                        52                        51 
##                   cleaner               curd cheese                liver loaf 
##                        50                        50                        50 
##            male cosmetics                       rum                   ketchup 
##                        45                        43                        42 
##              meat spreads                    brandy               light bulbs 
##                        42                        41                        41 
##                       tea             specialty fat          abrasive cleaner 
##                        38                        35                        34 
##                 skin care               nuts/prunes          artif. sweetener 
##                        34                        33                        32 
##              canned fruit                     syrup                 nut snack 
##                        32                        32                        30 
##            snack products                      fish           potato products 
##                        30                        29                        28 
##          bathroom cleaner                  cookware                      soap 
##                        27                        27                        26 
##         cooking chocolate            pudding powder                   tidbits 
##                        24                        23                        23 
##              cocoa drinks           organic sausage                  prosecco 
##                        22                        22                        20 
##    flower soil/fertilizer               ready soups      specialty vegetables 
##                        19                        18                        17 
##          organic products               decalcifier                     honey 
##                        16                        15                        15 
##                     cream             frozen fruits                hair spray 
##                        13                        12                        11 
##           rubbing alcohol                   liqueur           make up remover 
##                        10                         9                         8 
##            salad dressing                    whisky            toilet cleaner 
##                         8                         8                         7 
##            baby cosmetics            frozen chicken                      bags 
##                         6                         6                         4 
##           kitchen utensil     preservation products                 baby food 
##                         4                         2                         1 
##      sound storage medium 
##                         1
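itemFrequency() returns, for each item, the share of transactions that contain it (its support); with type = "absolute" it returns the raw counts instead. A minimal base R sketch of both quantities on hypothetical toy baskets, followed by a check that the two outputs in sections 3.1.2 and 3.1.3 agree for whole milk (2513 of 9835 transactions):

```r
# Hypothetical toy baskets (not the Kaggle data)
baskets <- list(
  c("milk", "cottage cheese", "bread"),
  c("milk", "cottage cheese"),
  c("milk", "bread"),
  c("bread", "butter"),
  c("butter")
)
items <- sort(unique(unlist(baskets)))

# Absolute frequency: number of baskets that contain each item
abs_freq <- vapply(items, function(it) {
  sum(vapply(baskets, function(b) it %in% b, logical(1)))
}, integer(1))

# Relative frequency: the same counts divided by the number of baskets
rel_freq <- sort(abs_freq / length(baskets), decreasing = TRUE)
rel_freq

# Consistency check on the real data: 2513 of 9835 baskets contain whole milk
round(2513 / 9835, 4)  # matches the 0.2555 reported in section 3.1.2
```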

3.1.4 Basic Frequency Barplot

transactions %>% 
  itemFrequencyPlot(support = 0.1)

The items bought most often are whole milk, other vegetables, rolls/buns, soda, yogurt, bottled water, root vegetables and tropical fruit, the only items whose relative frequency reaches the 0.1 threshold used in the plot. All other items appear in fewer transactions and are therefore left out of the plot.
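itemFrequencyPlot(support = 0.1) only draws items whose support is at least 0.1. Applying the same cutoff by hand to the ten largest relative frequencies printed in section 3.1.2 (copied into the vector below) shows which items make it into the plot:

```r
# Relative frequencies of the ten most frequent items, copied from section 3.1.2
top_items <- c(
  "whole milk" = 0.2555, "other vegetables" = 0.1935, "rolls/buns" = 0.1839,
  "soda" = 0.1744, "yogurt" = 0.1395, "bottled water" = 0.1104,
  "root vegetables" = 0.1090, "tropical fruit" = 0.1049,
  "shopping bags" = 0.0984, "sausage" = 0.0940
)

# Keep only the items at or above the plotting threshold (support = 0.1)
plotted <- names(top_items)[top_items >= 0.1]
plotted
```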

3.2 Basic 2D information

3.2.1 2D CrossTable count

transactions %>% 
  crossTable(sort = TRUE) -> cross1
cross1[1:8, 1:8] %>% 
  kable(caption = "Relation between products by count") %>%
  kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))

Relation between products by count
	whole milk	other vegetables	rolls/buns	soda	yogurt	bottled water	root vegetables	tropical fruit
whole milk	2513	736	557	394	551	337	481	416
other vegetables	736	1903	419	322	427	243	466	353
rolls/buns	557	419	1809	377	338	238	239	242
soda	394	322	377	1715	269	285	183	205
yogurt	551	427	338	269	1372	225	254	288
bottled water	337	243	238	285	225	1086	153	181
root vegetables	481	466	239	183	254	153	1072	207
tropical fruit	416	353	242	205	288	181	207	1032
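crossTable() counts, for every pair of items, the transactions that contain both; the diagonal holds the count of each single item. One way to reproduce such a table in base R, sketched here on hypothetical toy baskets (the names incidence and co_counts are illustrative), is to build a 0/1 basket-by-item incidence matrix and take its cross-product:

```r
# Hypothetical toy baskets (not the Kaggle data)
baskets <- list(
  c("milk", "cottage cheese", "bread"),
  c("milk", "cottage cheese"),
  c("milk", "bread"),
  c("bread", "butter"),
  c("butter")
)
items <- sort(unique(unlist(baskets)))

# 0/1 incidence matrix: one row per basket, one column per item
incidence <- t(vapply(baskets, function(b) as.integer(items %in% b),
                      integer(length(items))))
colnames(incidence) <- items

# t(incidence) %*% incidence: off-diagonal = pair counts, diagonal = item counts
co_counts <- crossprod(incidence)
co_counts
```

Dividing co_counts by length(baskets) gives the support version of the table.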

3.2.2 2D CrossTable support

transactions %>% 
  crossTable(measure = "support", sort = TRUE) -> cross2
round(cross2[1:8, 1:8], 3) %>% 
  kable(caption = "Relation between products by support") %>%
  kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))

Relation between products by support
	whole milk	other vegetables	rolls/buns	soda	yogurt	bottled water	root vegetables	tropical fruit
whole milk	0.256	0.075	0.057	0.040	0.056	0.034	0.049	0.042
other vegetables	0.075	0.193	0.043	0.033	0.043	0.025	0.047	0.036
rolls/buns	0.057	0.043	0.184	0.038	0.034	0.024	0.024	0.025
soda	0.040	0.033	0.038	0.174	0.027	0.029	0.019	0.021
yogurt	0.056	0.043	0.034	0.027	0.140	0.023	0.026	0.029
bottled water	0.034	0.025	0.024	0.029	0.023	0.110	0.016	0.018
root vegetables	0.049	0.047	0.024	0.019	0.026	0.016	0.109	0.021
tropical fruit	0.042	0.036	0.025	0.021	0.029	0.018	0.021	0.105
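The support table is the count table divided by the number of transactions. A short check in R, using the top-left 3×3 corner of the count table in section 3.2.1 and the 9835 transactions, reproduces the corresponding corner of the table above:

```r
# Top-left 3x3 corner of the count table in section 3.2.1
counts <- matrix(c(2513, 736, 557,
                   736, 1903, 419,
                   557, 419, 1809),
                 nrow = 3, byrow = TRUE,
                 dimnames = rep(list(c("whole milk", "other vegetables", "rolls/buns")), 2))
n_transactions <- 9835

# Support = count / number of transactions, rounded as in the support table
round(counts / n_transactions, 3)
```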

3.2.3 2D CrossTable lift

transactions %>% 
  crossTable(measure = "lift", sort = TRUE) -> cross3
round(cross3[1:8, 1:8], 3) %>% 
  kable(caption = "Relation between products by lift") %>%
  kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))

Relation between products by lift
	whole milk	other vegetables	rolls/buns	soda	yogurt	bottled water	root vegetables	tropical fruit
whole milk	NA	1.514	1.205	0.899	1.572	1.214	1.756	1.578
other vegetables	1.514	NA	1.197	0.970	1.608	1.156	2.247	1.768
rolls/buns	1.205	1.197	NA	1.195	1.339	1.191	1.212	1.275
soda	0.899	0.970	1.195	NA	1.124	1.505	0.979	1.139
yogurt	1.572	1.608	1.339	1.124	NA	1.485	1.698	2.000
bottled water	1.214	1.156	1.191	1.505	1.485	NA	1.293	1.588
root vegetables	1.756	2.247	1.212	0.979	1.698	1.293	NA	1.840
tropical fruit	1.578	1.768	1.275	1.139	2.000	1.588	1.840	NA

Among the product pairs, the most interesting ones are those with high values of support, confidence and lift. In terms of lift, the strongest association among these eight items is between root vegetables and other vegetables (2.247), followed by yogurt and tropical fruit (2.000). A plausible explanation for the first pair is that vegetables are usually displayed close to each other in one section of the store.
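Since lift equals n × count(A and B) / (count(A) × count(B)), where n is the number of transactions, every entry of the lift table can be recovered from the count table. A short R check for two pairs, using the counts printed in section 3.2.1 (the helper name lift_from_counts is illustrative), also shows that whole milk and soda, with a lift below 1, are bought together slightly less often than independence would predict:

```r
n_transactions <- 9835

# Lift of a pair from counts: n * count(A and B) / (count(A) * count(B))
lift_from_counts <- function(n, count_ab, count_a, count_b) {
  n * count_ab / (count_a * count_b)
}

# root vegetables & other vegetables: 466 together, 1072 and 1903 individually
round(lift_from_counts(n_transactions, 466, 1072, 1903), 3)  # 2.247 in the lift table
# whole milk & soda: 394 together, 2513 and 1715 individually
round(lift_from_counts(n_transactions, 394, 2513, 1715), 3)  # 0.899 in the lift table
```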

3.3 Apriori

We begin the actual analysis with the Apriori algorithm, setting the minimum support (0.01), the minimum confidence (0.2) and the minimum number of items in a rule (minlen = 2, which excludes rules with an empty left-hand side).

transactions %>% 
  apriori(parameter = list(support = 0.01, confidence = 0.2, minlen = 2)) -> groceriesRules

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 98 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [230 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
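The Apriori algorithm relies on the fact that every subset of a frequent itemset must itself be frequent, so it grows candidate itemsets one item at a time and discards any candidate that has an infrequent subset. The following is a minimal, deliberately naive base R sketch of this level-wise search for frequent itemsets only (no rule generation), run on hypothetical toy baskets. It only illustrates the idea and is not how arules implements apriori(); the function name apriori_sketch is illustrative.

```r
# Hypothetical toy baskets (not the Kaggle data)
baskets <- list(
  c("milk", "bread", "butter"),
  c("milk", "bread"),
  c("milk", "butter"),
  c("bread", "butter"),
  c("milk", "bread", "butter")
)

# Share of baskets containing every item of the itemset
support_of <- function(baskets, itemset) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Naive level-wise Apriori: returns all frequent itemsets as sorted character vectors
apriori_sketch <- function(baskets, min_support) {
  is_frequent <- function(s) support_of(baskets, s) >= min_support
  items <- sort(unique(unlist(baskets)))
  current <- Filter(is_frequent, as.list(items))  # frequent 1-itemsets
  frequent <- current
  k <- 2
  while (length(current) > 1) {
    # Join step: merge two frequent (k-1)-itemsets that share their first k-2 items
    candidates <- list()
    for (i in seq_len(length(current) - 1)) {
      for (j in (i + 1):length(current)) {
        a <- current[[i]]
        b <- current[[j]]
        if (k == 2 || identical(a[1:(k - 2)], b[1:(k - 2)])) {
          candidates[[length(candidates) + 1]] <- sort(union(a, b))
        }
      }
    }
    # Prune step: drop candidates that have an infrequent (k-1)-subset
    in_current <- function(s) any(vapply(current, identical, logical(1), s))
    candidates <- Filter(function(cand) {
      all(vapply(seq_along(cand), function(d) in_current(cand[-d]), logical(1)))
    }, candidates)
    # Count support only for the surviving candidates
    current <- Filter(is_frequent, candidates)
    frequent <- c(frequent, current)
    k <- k + 1
  }
  frequent
}

itemsets <- apriori_sketch(baskets, min_support = 0.3)
vapply(itemsets, paste, character(1), collapse = ", ")
```

The real apriori() call above additionally turns the frequent itemsets into rules and filters them by confidence and minlen.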

3.4 Groceries Rules

3.4.1 Rules by Support

groceriesRules %>% 
  sort(by = "support", decreasing=TRUE) %>% 
  head() %>% 
  inspect() %>% 
  kable(caption = "Groceries Rules ordered by support") %>%
  kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))

##     lhs                   rhs                support    confidence coverage 
## [1] {other vegetables} => {whole milk}       0.07483477 0.3867578  0.1934926
## [2] {whole milk}       => {other vegetables} 0.07483477 0.2928770  0.2555160
## [3] {rolls/buns}       => {whole milk}       0.05663447 0.3079049  0.1839349
## [4] {whole milk}       => {rolls/buns}       0.05663447 0.2216474  0.2555160
## [5] {yogurt}           => {whole milk}       0.05602440 0.4016035  0.1395018
## [6] {whole milk}       => {yogurt}           0.05602440 0.2192598  0.2555160
##     lift     count
## [1] 1.513634 736  
## [2] 1.513634 736  
## [3] 1.205032 557  
## [4] 1.205032 557  
## [5] 1.571735 551  
## [6] 1.571735 551

Groceries Rules ordered by support
	lhs		rhs	support	confidence	coverage	lift	count
[1]	{other vegetables}	=>	{whole milk}	0.0748348	0.3867578	0.1934926	1.513634	736
[2]	{whole milk}	=>	{other vegetables}	0.0748348	0.2928770	0.2555160	1.513634	736
[3]	{rolls/buns}	=>	{whole milk}	0.0566345	0.3079049	0.1839349	1.205032	557
[4]	{whole milk}	=>	{rolls/buns}	0.0566345	0.2216474	0.2555160	1.205032	557
[5]	{yogurt}	=>	{whole milk}	0.0560244	0.4016035	0.1395018	1.571735	551
[6]	{whole milk}	=>	{yogurt}	0.0560244	0.2192598	0.2555160	1.571735	551
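Support is simply the number of baskets that contain both items divided by the total number of baskets. Because it counts baskets regardless of the rule's direction, a rule and its reverse (for example rows [1] and [2]) always share the same support and count. The minimal base R check below reproduces the support column from the count column and the 9835 transactions.

```r
n_transactions <- 9835

# Baskets containing both items of the three top pairs:
# other vegetables + whole milk, rolls/buns + whole milk, yogurt + whole milk
pair_counts <- c(736, 557, 551)

# support(X => Y) = count(X and Y) / N
pair_support <- pair_counts / n_transactions
print(round(pair_support, 5))   # 0.07483 0.05663 0.05602
```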

3.4.2 Rules by Confidence

groceriesRules %>% 
  sort(by = "confidence", decreasing=TRUE) %>% 
  head() %>% 
  inspect() %>% 
  kable(caption = "Groceries Rules ordered by confidence") %>%
  kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))

##     lhs                                  rhs                support   
## [1] {citrus fruit, root vegetables}   => {other vegetables} 0.01037112
## [2] {root vegetables, tropical fruit} => {other vegetables} 0.01230300
## [3] {curd, yogurt}                    => {whole milk}       0.01006609
## [4] {butter, other vegetables}        => {whole milk}       0.01148958
## [5] {root vegetables, tropical fruit} => {whole milk}       0.01199797
## [6] {root vegetables, yogurt}         => {whole milk}       0.01453991
##     confidence coverage   lift     count
## [1] 0.5862069  0.01769192 3.029608 102  
## [2] 0.5845411  0.02104728 3.020999 121  
## [3] 0.5823529  0.01728521 2.279125  99  
## [4] 0.5736041  0.02003050 2.244885 113  
## [5] 0.5700483  0.02104728 2.230969 118  
## [6] 0.5629921  0.02582613 2.203354 143

Groceries Rules ordered by confidence
	lhs		rhs	support	confidence	coverage	lift	count
[1]	{citrus fruit, root vegetables}	=>	{other vegetables}	0.0103711	0.5862069	0.0176919	3.029608	102
[2]	{root vegetables, tropical fruit}	=>	{other vegetables}	0.0123030	0.5845411	0.0210473	3.020999	121
[3]	{curd, yogurt}	=>	{whole milk}	0.0100661	0.5823529	0.0172852	2.279125	99
[4]	{butter, other vegetables}	=>	{whole milk}	0.0114896	0.5736041	0.0200305	2.244885	113
[5]	{root vegetables, tropical fruit}	=>	{whole milk}	0.0119980	0.5700483	0.0210473	2.230969	118
[6]	{root vegetables, yogurt}	=>	{whole milk}	0.0145399	0.5629921	0.0258261	2.203354	143
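Unlike support, confidence depends on the direction of the rule: it is the share of baskets containing the left-hand side that also contain the right-hand side. The base R sketch below uses counts reported in this article: 736 baskets with both items (support table), and 1903 baskets with other vegetables and 2513 with whole milk (the {} rows of the rolls/buns table further on). It reproduces the two different confidences of the most frequent pair.

```r
# Counts quoted in this article (9835 baskets in total)
count_veg_milk <- 736    # baskets with other vegetables AND whole milk
count_veg      <- 1903   # baskets with other vegetables
count_milk     <- 2513   # baskets with whole milk

# confidence(X => Y) = count(X and Y) / count(X)
conf_veg_to_milk <- count_veg_milk / count_veg    # {other vegetables} => {whole milk}
conf_milk_to_veg <- count_veg_milk / count_milk   # {whole milk} => {other vegetables}

print(round(c(conf_veg_to_milk, conf_milk_to_veg), 3))  # 0.387 0.293
```

The larger denominator (whole milk is in more baskets) is why the reverse rule has lower confidence even though both rules describe the same 736 baskets.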

3.4.3 Rules by Lift

groceriesRules %>% 
  sort(by = "lift", decreasing=TRUE) %>% 
  head() %>% 
  inspect() %>% 
  kable(caption = "Groceries Rules ordered by lift") %>%
  kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))

##     lhs                                   rhs                  support   
## [1] {citrus fruit, other vegetables}   => {root vegetables}    0.01037112
## [2] {other vegetables, yogurt}         => {whipped/sour cream} 0.01016777
## [3] {other vegetables, tropical fruit} => {root vegetables}    0.01230300
## [4] {beef}                             => {root vegetables}    0.01738688
## [5] {citrus fruit, root vegetables}    => {other vegetables}   0.01037112
## [6] {root vegetables, tropical fruit}  => {other vegetables}   0.01230300
##     confidence coverage   lift     count
## [1] 0.3591549  0.02887646 3.295045 102  
## [2] 0.2341920  0.04341637 3.267062 100  
## [3] 0.3427762  0.03589222 3.144780 121  
## [4] 0.3313953  0.05246568 3.040367 171  
## [5] 0.5862069  0.01769192 3.029608 102  
## [6] 0.5845411  0.02104728 3.020999 121

Groceries Rules ordered by lift
	lhs		rhs	support	confidence	coverage	lift	count
[1]	{citrus fruit, other vegetables}	=>	{root vegetables}	0.0103711	0.3591549	0.0288765	3.295046	102
[2]	{other vegetables, yogurt}	=>	{whipped/sour cream}	0.0101678	0.2341920	0.0434164	3.267062	100
[3]	{other vegetables, tropical fruit}	=>	{root vegetables}	0.0123030	0.3427762	0.0358922	3.144780	121
[4]	{beef}	=>	{root vegetables}	0.0173869	0.3313953	0.0524657	3.040367	171
[5]	{citrus fruit, root vegetables}	=>	{other vegetables}	0.0103711	0.5862069	0.0176919	3.029608	102
[6]	{root vegetables, tropical fruit}	=>	{other vegetables}	0.0123030	0.5845411	0.0210473	3.020999	121
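Lift compares a rule's confidence with the overall frequency of its right-hand side. Equivalently, it compares the joint probability of both items with the product of their individual probabilities. A value above 1 means the items appear together more often than if they were independent. The small helper `lift()` below is hypothetical, written only for this illustration, and reuses the counts quoted in this article for other vegetables and whole milk.

```r
n_transactions <- 9835
count_veg_milk <- 736    # other vegetables AND whole milk
count_veg      <- 1903   # other vegetables
count_milk     <- 2513   # whole milk

# lift(X => Y) = P(X and Y) / (P(X) * P(Y)) = confidence(X => Y) / support(Y)
lift <- function(count_xy, count_x, count_y, n) {
  (count_xy / n) / ((count_x / n) * (count_y / n))
}

lift_veg_to_milk <- lift(count_veg_milk, count_veg, count_milk, n_transactions)
lift_milk_to_veg <- lift(count_veg_milk, count_milk, count_veg, n_transactions)
print(round(lift_veg_to_milk, 3))   # 1.514

# Lift is symmetric, so a rule and its reverse share the same value
print(all.equal(lift_veg_to_milk, lift_milk_to_veg))  # TRUE
```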

3.4.4 Rules by Count

groceriesRules %>% 
  sort(by = "count", decreasing=TRUE) %>% 
  head() %>% 
  inspect() %>% 
  kable(caption = "Groceries Rules ordered by count") %>%
  kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))

##     lhs                   rhs                support    confidence coverage 
## [1] {other vegetables} => {whole milk}       0.07483477 0.3867578  0.1934926
## [2] {whole milk}       => {other vegetables} 0.07483477 0.2928770  0.2555160
## [3] {rolls/buns}       => {whole milk}       0.05663447 0.3079049  0.1839349
## [4] {whole milk}       => {rolls/buns}       0.05663447 0.2216474  0.2555160
## [5] {yogurt}           => {whole milk}       0.05602440 0.4016035  0.1395018
## [6] {whole milk}       => {yogurt}           0.05602440 0.2192598  0.2555160
##     lift     count
## [1] 1.513634 736  
## [2] 1.513634 736  
## [3] 1.205032 557  
## [4] 1.205032 557  
## [5] 1.571735 551  
## [6] 1.571735 551

Groceries Rules ordered by count
	lhs		rhs	support	confidence	coverage	lift	count
[1]	{other vegetables}	=>	{whole milk}	0.0748348	0.3867578	0.1934926	1.513634	736
[2]	{whole milk}	=>	{other vegetables}	0.0748348	0.2928770	0.2555160	1.513634	736
[3]	{rolls/buns}	=>	{whole milk}	0.0566345	0.3079049	0.1839349	1.205032	557
[4]	{whole milk}	=>	{rolls/buns}	0.0566345	0.2216474	0.2555160	1.205032	557
[5]	{yogurt}	=>	{whole milk}	0.0560244	0.4016035	0.1395018	1.571735	551
[6]	{whole milk}	=>	{yogurt}	0.0560244	0.2192598	0.2555160	1.571735	551
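This table is identical to the one ordered by support. Count is support multiplied by the number of transactions, which is the same for every rule, so the two measures always rank rules in the same order. A short base R check with the six counts above:

```r
n_transactions <- 9835

# Counts of the six rules in the table above
counts  <- c(736, 736, 557, 557, 551, 551)
support <- counts / n_transactions

# Dividing by a constant N does not change the ranking
print(identical(order(counts, decreasing = TRUE),
                order(support, decreasing = TRUE)))   # TRUE

# And multiplying support back by N recovers the counts
print(round(support * n_transactions))
```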

3.5 Analyzing Apriori Rules

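Applying the Apriori algorithm produced 230 rules between food items: 150 rules of length 2 (one item on each side) and 80 of length 3. The summary below also shows that lift ranges from about 0.90 to 3.30, so a few of the mined rules (those with lift below 1) link items that are bought together less often than independence would predict.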

summary(groceriesRules)

## set of 230 rules
## 
## rule length distribution (lhs + rhs):sizes
##   2   3 
## 150  80 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   2.000   2.000   2.348   3.000   3.000 
## 
## summary of quality measures:
##     support          confidence        coverage            lift       
##  Min.   :0.01007   Min.   :0.2000   Min.   :0.01729   Min.   :0.8991  
##  1st Qu.:0.01202   1st Qu.:0.2465   1st Qu.:0.03437   1st Qu.:1.4496  
##  Median :0.01484   Median :0.3172   Median :0.05231   Median :1.7303  
##  Mean   :0.01905   Mean   :0.3324   Mean   :0.06311   Mean   :1.7930  
##  3rd Qu.:0.02224   3rd Qu.:0.4037   3rd Qu.:0.07565   3rd Qu.:2.0800  
##  Max.   :0.07483   Max.   :0.5862   Max.   :0.25552   Max.   :3.2950  
##      count      
##  Min.   : 99.0  
##  1st Qu.:118.2  
##  Median :146.0  
##  Mean   :187.3  
##  3rd Qu.:218.8  
##  Max.   :736.0  
## 
## mining info:
##  data ntransactions support confidence
##     .          9835    0.01        0.2
##                                                                               call
##  apriori(data = ., parameter = list(support = 0.01, confidence = 0.2, minlen = 2))
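Because the minimum lift is below 1, some of the 230 rules describe a negative association. A sketch of how such rules could be filtered out with arules' `subset()` is shown below. It assumes that the `Groceries` dataset bundled with arules (9835 transactions, 169 items) stands in for the Kaggle file used above. If the two files differ, the counts will differ too.

```r
library(arules)

# Assumption: arules' bundled Groceries data stands in for the Kaggle file
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.2, minlen = 2),
                 control = list(verbose = FALSE))

# Keep only rules whose items co-occur more often than independence predicts
positive_rules <- subset(rules, subset = lift > 1)

length(rules)           # all mined rules
length(positive_rules)  # rules with lift above 1
```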

Next, let’s see which products are most strongly associated with buying whole milk, i.e. rules with whole milk on the right-hand side. For this search the minimum support is raised to 0.02.

transactions %>% 
  apriori(parameter = list(support = 0.02, conf = 0.2), appearance=list(default="lhs", rhs="whole milk"),
          control = list(verbose = FALSE)) %>% 
  sort(by = "confidence", decreasing = TRUE) %>% 
  head(10) %>% 
  inspect() %>% 
  kable(caption = "Rules with whole milk on the right-hand side, ordered by confidence") %>%
  kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))

##      lhs                                    rhs          support    confidence
## [1]  {other vegetables, yogurt}          => {whole milk} 0.02226741 0.5128806 
## [2]  {butter}                            => {whole milk} 0.02755465 0.4972477 
## [3]  {curd}                              => {whole milk} 0.02613116 0.4904580 
## [4]  {other vegetables, root vegetables} => {whole milk} 0.02318251 0.4892704 
## [5]  {domestic eggs}                     => {whole milk} 0.02999492 0.4727564 
## [6]  {whipped/sour cream}                => {whole milk} 0.03223183 0.4496454 
## [7]  {root vegetables}                   => {whole milk} 0.04890696 0.4486940 
## [8]  {frozen vegetables}                 => {whole milk} 0.02043721 0.4249471 
## [9]  {margarine}                         => {whole milk} 0.02409761 0.4121739 
## [10] {beef}                              => {whole milk} 0.02125064 0.4050388 
##      coverage   lift     count
## [1]  0.04341637 2.007235 219  
## [2]  0.05541434 1.946053 271  
## [3]  0.05327911 1.919481 257  
## [4]  0.04738180 1.914833 228  
## [5]  0.06344687 1.850203 295  
## [6]  0.07168277 1.759754 317  
## [7]  0.10899847 1.756031 481  
## [8]  0.04809354 1.663094 201  
## [9]  0.05846467 1.613104 237  
## [10] 0.05246568 1.585180 209

Rules with whole milk on the right-hand side, ordered by confidence
	lhs		rhs	support	confidence	coverage	lift	count
[1]	{other vegetables, yogurt}	=>	{whole milk}	0.0222674	0.5128806	0.0434164	2.007235	219
[2]	{butter}	=>	{whole milk}	0.0275547	0.4972477	0.0554143	1.946053	271
[3]	{curd}	=>	{whole milk}	0.0261312	0.4904580	0.0532791	1.919480	257
[4]	{other vegetables, root vegetables}	=>	{whole milk}	0.0231825	0.4892704	0.0473818	1.914833	228
[5]	{domestic eggs}	=>	{whole milk}	0.0299949	0.4727564	0.0634469	1.850203	295
[6]	{whipped/sour cream}	=>	{whole milk}	0.0322318	0.4496454	0.0716828	1.759754	317
[7]	{root vegetables}	=>	{whole milk}	0.0489070	0.4486940	0.1089985	1.756031	481
[8]	{frozen vegetables}	=>	{whole milk}	0.0204372	0.4249471	0.0480935	1.663094	201
[9]	{margarine}	=>	{whole milk}	0.0240976	0.4121739	0.0584647	1.613104	237
[10]	{beef}	=>	{whole milk}	0.0212506	0.4050388	0.0524657	1.585179	209

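Most of the strongest antecedents of whole milk are dairy or other chilled products (butter, curd, whipped/sour cream, yogurt) together with eggs. All ten rules have a confidence between about 0.41 and 0.51, well above the roughly 0.26 share of all baskets that contain whole milk, which is why their lift lies between 1.59 and 2.01. This supports grouping these products in one refrigerated section: a customer who walks up to the dairy refrigerator for milk is likely to also need butter, yogurt, eggs or cream.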
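The link between confidence and lift in this table can be checked directly. Dividing each rule's confidence by the baseline probability of whole milk (2513 of 9835 baskets) should reproduce the reported lift. The base R sketch below does this for the four dairy and egg rules; the data frame is typed in by hand from the table above.

```r
# Baseline: share of all baskets that contain whole milk
p_milk <- 2513 / 9835

# Confidence and reported lift copied from the table above
milk_rules <- data.frame(
  lhs           = c("butter", "curd", "domestic eggs", "whipped/sour cream"),
  confidence    = c(0.4972477, 0.4904580, 0.4727564, 0.4496454),
  lift_reported = c(1.946053, 1.919481, 1.850203, 1.759754)
)

# lift = confidence / P(rhs): how many times more likely whole milk becomes
milk_rules$lift_check <- milk_rules$confidence / p_milk
print(milk_rules)
```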

transactions %>% 
  apriori(parameter = list(support = 0.01, conf = 0.1), appearance=list(default="rhs", lhs="rolls/buns"),
          control = list(verbose = FALSE)) %>% 
  sort(by = "confidence", decreasing = TRUE) %>% 
  head(10) %>% 
  inspect() %>% 
  kable(caption = "Rules with rolls/buns on the left-hand side, ordered by confidence") %>%
  kable_styling(font_size = 12, bootstrap_options = c("striped", "hover", "condensed", "responsive"))

##      lhs             rhs                support    confidence coverage 
## [1]  {rolls/buns} => {whole milk}       0.05663447 0.3079049  0.1839349
## [2]  {}           => {whole milk}       0.25551601 0.2555160  1.0000000
## [3]  {rolls/buns} => {other vegetables} 0.04260295 0.2316197  0.1839349
## [4]  {rolls/buns} => {soda}             0.03833249 0.2084024  0.1839349
## [5]  {}           => {other vegetables} 0.19349263 0.1934926  1.0000000
## [6]  {rolls/buns} => {yogurt}           0.03436706 0.1868436  0.1839349
## [7]  {}           => {soda}             0.17437722 0.1743772  1.0000000
## [8]  {rolls/buns} => {sausage}          0.03060498 0.1663903  0.1839349
## [9]  {}           => {yogurt}           0.13950178 0.1395018  1.0000000
## [10] {rolls/buns} => {tropical fruit}   0.02460600 0.1337756  0.1839349
##      lift     count
## [1]  1.205032  557 
## [2]  1.000000 2513 
## [3]  1.197047  419 
## [4]  1.195124  377 
## [5]  1.000000 1903 
## [6]  1.339363  338 
## [7]  1.000000 1715 
## [8]  1.771048  301 
## [9]  1.000000 1372 
## [10] 1.274886  242

Rules with rolls/buns on the left-hand side, ordered by confidence
	lhs		rhs	support	confidence	coverage	lift	count
[1]	{rolls/buns}	=>	{whole milk}	0.0566345	0.3079049	0.1839349	1.205032	557
[2]	{}	=>	{whole milk}	0.2555160	0.2555160	1.0000000	1.000000	2513
[3]	{rolls/buns}	=>	{other vegetables}	0.0426029	0.2316197	0.1839349	1.197046	419
[4]	{rolls/buns}	=>	{soda}	0.0383325	0.2084024	0.1839349	1.195124	377
[5]	{}	=>	{other vegetables}	0.1934926	0.1934926	1.0000000	1.000000	1903
[6]	{rolls/buns}	=>	{yogurt}	0.0343671	0.1868436	0.1839349	1.339363	338
[7]	{}	=>	{soda}	0.1743772	0.1743772	1.0000000	1.000000	1715
[8]	{rolls/buns}	=>	{sausage}	0.0306050	0.1663903	0.1839349	1.771048	301
[9]	{}	=>	{yogurt}	0.1395018	0.1395018	1.0000000	1.000000	1372
[10]	{rolls/buns}	=>	{tropical fruit}	0.0246060	0.1337756	0.1839349	1.274886	242

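Conversely, we can look at which other products customers add to the basket when they buy fresh rolls/buns, i.e. rules with rolls/buns on the left-hand side (support of at least 0.01, confidence of at least 0.1). The most frequent companions are whole milk, other vegetables, soda, yogurt and sausage, which are typical breakfast items; sausage shows the strongest association (lift 1.77). The rules with an empty left-hand side ({} => {whole milk} and so on) are not associations with buns: their confidence equals the overall share of baskets containing the right-hand item, so their lift is exactly 1. They appear because minlen was left at its default of 1 in this call.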
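The empty-antecedent rows can be avoided by setting `minlen = 2`, as in the first Apriori call. The sketch below repeats the rolls/buns search with that setting. It again assumes that arules' bundled `Groceries` dataset stands in for the Kaggle file, so the exact figures may differ from the table above.

```r
library(arules)

# Assumption: arules' bundled Groceries data stands in for the Kaggle file
data("Groceries")

bun_rules <- apriori(Groceries,
                     parameter = list(support = 0.01, confidence = 0.1, minlen = 2),
                     appearance = list(default = "rhs", lhs = "rolls/buns"),
                     control = list(verbose = FALSE))
bun_rules <- sort(bun_rules, by = "confidence", decreasing = TRUE)

# Every remaining rule has rolls/buns on the left, so no {} => X rows are left
inspect(head(bun_rules, 5))
```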

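Next, both rule sets are checked for significance and redundancy. is.significant() applies Fisher’s exact test to each rule (by default at alpha = 0.01, without correction for multiple testing), and is.redundant() flags a rule when a more general rule with the same right-hand side (fewer items on the left) has the same or higher confidence. Because the rules are sorted by confidence, each logical value refers to the rule at that position. For whole milk, the four rules with the lowest confidence (positions 25 to 28) are not significant, and the last three are also redundant. For rolls/buns, positions 2, 5, 7 and 9 are not significant; these are exactly the rules with an empty left-hand side in the table above, whose lift is 1. None of the rolls/buns rules is redundant.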

transactions %>% 
  apriori(parameter = list(support = 0.02, conf = 0.2), appearance=list(default="lhs", rhs="whole milk"),
          control = list(verbose = FALSE)) %>% 
  sort(by = "confidence", decreasing = TRUE) %>% 
  is.significant()

##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [25] FALSE FALSE FALSE FALSE
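To see what the significance check does for a single rule, the sketch below runs base R's `fisher.test()` on the 2 × 2 contingency table of {butter} => {whole milk}. The counts of 271 baskets with both items and 2513 with whole milk come from the tables above. The butter count of 545 is back-calculated from the reported coverage (0.05541434 × 9835 ≈ 545). The one-sided alternative tests for a positive association; treat this as an approximation of the idea, not a replica of arules' internal computation.

```r
n_total  <- 9835
n_butter <- 545    # back-calculated from coverage 0.05541434
n_milk   <- 2513
n_both   <- 271

# Rows: butter yes/no, columns: whole milk yes/no
tab <- matrix(c(n_both,          n_butter - n_both,
                n_milk - n_both, n_total - n_butter - n_milk + n_both),
              nrow = 2, byrow = TRUE,
              dimnames = list(butter = c("yes", "no"),
                              whole_milk = c("yes", "no")))

# One-sided test: are butter and whole milk bought together more often than chance?
fisher_result <- fisher.test(tab, alternative = "greater")
print(fisher_result$p.value < 0.01)   # TRUE: significant at the 0.01 level
```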

transactions %>% 
  apriori(parameter = list(support = 0.02, conf = 0.2), appearance=list(default="lhs", rhs="whole milk"),
          control = list(verbose = FALSE)) %>% 
  sort(by = "confidence", decreasing = TRUE) %>% 
  is.redundant()

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE  TRUE  TRUE  TRUE

transactions %>% 
  apriori(parameter = list(support = 0.01, conf = 0.1), appearance=list(default="rhs", lhs="rolls/buns"),
          control = list(verbose = FALSE)) %>% 
  sort(by = "confidence", decreasing = TRUE) %>% 
  is.significant()

##  [1]  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE
## [13]  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE

transactions %>% 
  apriori(parameter = list(support = 0.01, conf = 0.1), appearance=list(default="rhs", lhs="rolls/buns"),
          control = list(verbose = FALSE)) %>% 
  sort(by = "confidence", decreasing = TRUE) %>% 
  is.redundant()

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
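The two checks can be combined to keep only rules that are both significant and non-redundant. The sketch below does this for the whole-milk rules. It assumes arules' bundled `Groceries` dataset in place of the Kaggle file and passes it explicitly as the `transactions` argument of `is.significant()`.

```r
library(arules)

# Assumption: arules' bundled Groceries data stands in for the Kaggle file
data("Groceries")

milk_rules <- apriori(Groceries,
                      parameter = list(support = 0.02, confidence = 0.2),
                      appearance = list(default = "lhs", rhs = "whole milk"),
                      control = list(verbose = FALSE))

# Keep rules that pass Fisher's test and have no more general, equally good rule
keep <- is.significant(milk_rules, Groceries) & !is.redundant(milk_rules)
milk_rules_pruned <- sort(milk_rules[keep], by = "confidence", decreasing = TRUE)

length(milk_rules)
length(milk_rules_pruned)
inspect(head(milk_rules_pruned, 5))
```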

groceriesRules %>% 
  plot(measure = c("support", "lift"), shading = "confidence")

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

groceriesRules %>% 
  plot(method = "grouped")

transactions %>% 
  apriori(parameter = list(support = 0.02, conf = 0.2), appearance=list(default="lhs", rhs="whole milk"),
          control = list(verbose = FALSE)) %>% 
  sort(by = "confidence", decreasing = TRUE) %>% 
  inspectDT()

transactions %>% 
  apriori(parameter = list(support = 0.01, conf = 0.1), appearance=list(default="rhs", lhs="rolls/buns"),
          control = list(verbose = FALSE)) %>% 
  sort(by = "confidence", decreasing = TRUE) %>% 
  inspectDT()

groceriesRules %>% 
  plot(method = "graph", engine = "htmlwidget") %>% 
  visNodes(font=list(color = "black"))

## Warning: Too many rules supplied. Only plotting the best 100 using
## 'lift' (change control parameter max if needed).

These interactive tools allow free searching of item sets and relationships between food products. The interactive graph lets the reader trace the connections among the 100 rules with the highest lift; as the warning above notes, only the best 100 rules by lift are plotted by default.
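Instead of relying on the default cutoff, the rules to draw can be chosen explicitly. The sketch below keeps the 20 rules with the highest lift and draws them with the same htmlwidget engine. It assumes arules' bundled `Groceries` dataset in place of the Kaggle file. The widget is stored in a variable rather than printed, so the code also runs non-interactively; print `graph_widget` to display it.

```r
library(arules)
library(arulesViz)

# Assumption: arules' bundled Groceries data stands in for the Kaggle file
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.2, minlen = 2),
                 control = list(verbose = FALSE))

# Keep only the 20 rules with the highest lift so the graph stays readable
top_lift <- head(sort(rules, by = "lift", decreasing = TRUE), 20)

graph_widget <- plot(top_lift, method = "graph", engine = "htmlwidget")
```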

4 Summary

Association rule analysis of customers’ shopping baskets is a valuable tool for increasing sales and profits. Placing shelves rationally and grouping products thematically by category helps customers find the products they need, encourages them to try new ones, and reduces the time and distance they have to cover in the store. These relationships should, of course, be examined in more depth, and more transactions collected, to confirm that the conclusions drawn from the Apriori analysis hold in practice.

References

Hruschka, Harald. 2021. “Interdependences of Products in Market Baskets: Comparing the Conditional Restricted Boltzmann Machine to the Multivariate Logit Model.” Review of Marketing Science 19 (1): 33–51.

Khedkar, Sanket Sandip, and Sangeeta Kumari. 2021. “Market Basket Analysis Using a-Priori Algorithm and FP-Tree Algorithm.” In 2021 International Conference on Artificial Intelligence and Machine Vision (AIMV), 1–6. IEEE.

Nabila, Aulia Ghassani, Intan Nurma Yulita, Ino Suryana, and Mira Suryani. 2021. “Market Basket Analysis on Sales Transactions for Micro, Small and Medium Enterprises Using Apriori Algorithm to Support Business Promotion Strategy in RDA Hijab.” In 2021 International Conference on Artificial Intelligence and Big Data Analytics, 1–6. IEEE.

Market Basket Analysis of purchasing groceries goods

Robert Kowalczyk

02/2022