Introduction

Association rule mining is an unsupervised, non-linear technique for uncovering whether and how items are associated with one another. Essentially, we use frequency analysis to show which items appear together in a transaction or relation. To begin, one needs a large transaction database. For companies like grocery stores and large chain restaurants, this is not difficult to create due to the high volume of transactions. This database is used to build a list of associations based on the transactions. Item combinations that appear together most frequently tend to imply a link between those items. Recommender systems work in a similar way: an online retailer sees that you have put an item into your cart and finds, in the database, a few other items that were frequently purchased in transactions involving that item. There are three common ways to measure association:

Support

The support of an item is the proportion of the total transactions involving that item. Thus, we can think of the support of an item (or items) \(X\) as \[\text{Support of } X = P(X).\]

Confidence

The confidence of product \(Y\) given \(X\) measures how likely product \(Y\) is to be purchased given that product \(X\) was already purchased. Therefore, \[\begin{aligned} \text{Confidence of } Y \text{ given } X &= P(Y | X)\\ \\ &= \dfrac{P(X \cap Y)}{P(X)}\\\\ &= \dfrac{\text{Support of } Y \cap X}{\text{Support of } X}. \end{aligned}\]

Lift Ratio

The lift ratio is the ratio of the confidence of \(Y\) given \(X\) to the support of \(Y\), that is, to the proportion of all transactions involving \(Y\). As a result, we have \[\begin{aligned} \text{Lift Ratio} &= \dfrac{P(Y | X)}{P(Y)}\\ \\ &= \dfrac{\text{Confidence of } Y \text{ given } X}{\text{Support of } Y}. \end{aligned}\]
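
As a quick illustration with made-up numbers (not drawn from any real data set), suppose a store records 100 transactions, of which 40 contain \(X\), 50 contain \(Y\), and 30 contain both. Under these assumed counts, the three measures work out as \[\begin{aligned} \text{Support of } X &= \dfrac{40}{100} = 0.4,\\ \text{Support of } Y &= \dfrac{50}{100} = 0.5,\\ \text{Support of } Y \cap X &= \dfrac{30}{100} = 0.3,\\ \text{Confidence of } Y \text{ given } X &= \dfrac{0.3}{0.4} = 0.75,\\ \text{Lift Ratio} &= \dfrac{0.75}{0.5} = 1.5. \end{aligned}\] In this hypothetical case, a transaction containing \(X\) is 1.5 times as likely to contain \(Y\) as a randomly chosen transaction.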

Association Rules

In marketing, analyzing consumer behavior can lead to insights regarding the placement and promotion of products. Specifically, marketers are interested in examining transaction data on customer purchases to identify the products commonly purchased together.

Probabilistic if-then statements, called association rules, convey the likelihood of certain items being purchased together. Although association rules are an important tool in market basket analysis, they are also applicable to disciplines other than marketing. For example, association rules can assist medical researchers in understanding which treatments have been commonly prescribed for certain patient symptoms (and the resulting effects).

Association rules come in the form “if \(X\), then \(Y\)”. The “if” portion of the rule is called the antecedent. The part of the rule corresponding to the “then” portion is called the consequent.

As mentioned, the support of an item set is the percentage of transactions in the data that include that item set. By only considering rules involving item sets with a support above a minimum level, inexplicable rules capturing random noise in the data can generally be avoided. A rule of thumb is to consider only association rules with a support of at least 20% of the total number of transactions. If an item set is particularly valuable and represents a lucrative opportunity, then the minimum support used to filter the rules can be lowered.

A property of a reliable association rule is that, given that a transaction contains the antecedent item set, there is a high probability that it also contains the consequent item set. As we mentioned, the conditional probability P(consequent item set | antecedent item set) is called the confidence of a rule, and is computed as \[\text{Confidence} = \dfrac{\text{Support of Antecedent } \cap \text{ Consequent}}{\text{Support of Antecedent}}.\]

Although a high value of confidence suggests a rule in which the consequent is frequently true when the antecedent is true, a high value of confidence can be misleading. For example, if the support of the consequent is high (that is, the item set corresponding to the "then" part is very frequent), then the confidence of the association rule could be high even if there is little or no association between the items. Therefore, to evaluate the efficiency of a rule, we need to compare P(consequent | antecedent) to P(consequent) to determine how much more likely the consequent item set is given the antecedent item set versus just the overall (unconditional) likelihood that a transaction contains the consequent. The ratio of P(consequent | antecedent) to P(consequent) is called the lift ratio of the rule and is computed as: \[\text{Lift Ratio} = \dfrac{\text{Confidence of Rule}}{\text{Support of Consequent}}.\]

Thus, the lift ratio represents how effective an association rule is at identifying transactions in which the consequent item set occurs versus a randomly selected transaction. A lift ratio greater than one suggests that there is some usefulness to the rule and that it is better at identifying cases when the consequent occurs than having no rule at all. Substituting the definition of confidence into the lift ratio gives \[\text{Lift Ratio} = \dfrac{\text{Support of Antecedent } \cap \text{ Consequent}}{\text{Support of Antecedent} \times \text{Support of Consequent}},\] so the denominator contains the probability of a transaction containing the consequent set multiplied by the probability of a transaction containing the antecedent set. This product of probabilities is equivalent to the expected likelihood of a transaction containing both the consequent item set and antecedent item set if these item sets were independent. In other words, a lift ratio greater than one suggests that the level of association between the antecedent and consequent is higher than would be expected if these item sets were independent.

Evaluating Association Rules

Although explicit measures such as support, confidence, and lift ratio can help filter association rules, an association rule is ultimately judged on how actionable it is and how well it explains the relationship between item sets.

For example, suppose Walmart mined its transaction data to uncover strong evidence of the association rule, “If a customer purchases a Barbie doll, then a customer also purchases a candy bar.” Walmart could leverage this relationship in product placement decisions as well as in advertisements and promotions, perhaps by placing a high-margin candy-bar display near the Barbie dolls. However, we must be aware that association rule analysis often results in obvious relationships such as “If a customer purchases hamburger patties, then a customer also purchases hamburger buns,” which may be true but provide no new insight. Association rules with a weak support measure often are inexplicable. For an association rule to be useful, it must be well supported and explain an important previously unknown relationship.

So How Does This Help Me?

Association rule mining is used when you want to find associations between different objects in a set, or to find frequent patterns in a transaction database, a relational database, or any other information repository. Its applications are found in marketing, basket data analysis (or market basket analysis) in retailing, clustering, and classification. Clients can use the resulting rules for numerous marketing strategies:

  • Changing the store layout according to trends
  • Customer behavior analysis
  • Catalogue design
  • Cross marketing on online stores
  • Identifying the trending items customers buy
  • Customized emails with add-on sales

Examples

We will create our own data for this example. One could certainly source data for this type of exercise; however, the mathematics is simple enough that a small example makes it easier to see what is going on. In the example, we have 8 purchases, each consisting of a subset of the items fruit, veggies, beer, milk, potatoes and steak. We assign a label to each transaction.

dat <- list(  
  c("fruit", "beer", "potatoes", "steak"),
  c("fruit", "beer", "potatoes"),
  c("fruit", "beer"), 
  c("fruit", "veggies"),
  c("milk", "beer", "potatoes", "steak"), 
  c("milk", "beer", "potatoes"), 
  c("milk", "beer"),
  c("milk", "veggies")
  )

# Label the transactions T1, ..., T8.
names(dat) <- paste0("T", 1:8)

head(dat,8)
## $T1
## [1] "fruit"    "beer"     "potatoes" "steak"   
## 
## $T2
## [1] "fruit"    "beer"     "potatoes"
## 
## $T3
## [1] "fruit" "beer" 
## 
## $T4
## [1] "fruit"   "veggies"
## 
## $T5
## [1] "milk"     "beer"     "potatoes" "steak"   
## 
## $T6
## [1] "milk"     "beer"     "potatoes"
## 
## $T7
## [1] "milk" "beer"
## 
## $T8
## [1] "milk"    "veggies"

We load a few required packages, too (they must already be installed).

library(arules)
library(arulesViz)
library(tidyverse)

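We turn the data into a sparse matrix in R. With one row per transaction and the columns ordered as fruit, beer, potatoes, steak, veggies, milk (arules itself orders the items alphabetically), the matrix will look something like this: \[\begin{pmatrix} 1& 1& 1& 1& 0&0 \\ 1& 1& 1& 0& 0&0 \\ 1& 1& 0& 0& 0&0 \\ 1& 0& 0& 0& 1&0 \\ 0& 1& 1& 1& 0&1 \\ 0& 1& 1& 0& 0&1 \\ 0& 1& 0& 0& 0&1 \\ 0& 0& 0& 0& 1&1 \\ \end{pmatrix} \]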

Transactions <- as(dat, "transactions")
Transactions
## transactions in sparse format with
##  8 transactions (rows) and
##  6 items (columns)
itemLabels(Transactions)
## [1] "beer"     "fruit"    "milk"     "potatoes" "steak"    "veggies"

We can look at the summary data for these transactions.

summary(Transactions)
## transactions as itemMatrix in sparse format with
##  8 rows (elements/itemsets/transactions) and
##  6 columns (items) and a density of 0.4583333 
## 
## most frequent items:
##     beer    fruit     milk potatoes    steak  (Other) 
##        6        4        4        4        2        2 
## 
## element (itemset/transaction) length distribution:
## sizes
## 2 3 4 
## 4 2 2 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00    2.00    2.50    2.75    3.25    4.00 
## 
## includes extended item information - examples:
##   labels
## 1   beer
## 2  fruit
## 3   milk
## 
## includes extended transaction information - examples:
##   transactionID
## 1            T1
## 2            T2
## 3            T3
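
The density and item counts in this summary can be reproduced by hand. The following base R sketch rebuilds the binary incidence matrix from the same eight transactions without arules; the names `items`, `incidence`, `item_counts`, and `density` are chosen here for illustration and are not part of the package.

```r
# The eight example transactions, repeated so the block runs on its own.
dat <- list(
  T1 = c("fruit", "beer", "potatoes", "steak"),
  T2 = c("fruit", "beer", "potatoes"),
  T3 = c("fruit", "beer"),
  T4 = c("fruit", "veggies"),
  T5 = c("milk", "beer", "potatoes", "steak"),
  T6 = c("milk", "beer", "potatoes"),
  T7 = c("milk", "beer"),
  T8 = c("milk", "veggies")
)

# All distinct items, in alphabetical order.
items <- sort(unique(unlist(dat)))

# One row per transaction, one column per item; 1 means the item was bought.
incidence <- t(sapply(dat, function(basket) as.integer(items %in% basket)))
colnames(incidence) <- items

# Number of transactions containing each item.
item_counts <- colSums(incidence)

# Density: share of cells in the matrix that are 1.
density <- sum(incidence) / length(incidence)

incidence
item_counts
density
```

The matrix has 8 x 6 = 48 cells, of which 22 are ones, so the density is 22/48, in line with the value reported by `summary()`.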

We can plot the frequency of each item.

itemFrequencyPlot(Transactions, topN=10, cex.names=1)

Now, in order to find some rules, we use the apriori command.

# Minimum support 0.3, minimum confidence 0.5, rules of 2 to 10 items.
rules <- apriori(Transactions, 
                 parameter = list(supp=0.3, conf=0.5, 
                                  minlen=2,
                                  maxlen=10, 
                                  target= "rules"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5     0.3      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6 item(s), 8 transaction(s)] done [0.00s].
## sorting and recoding items ... [4 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [6 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
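
The log above reports that only 4 of the 6 items are kept and that item sets of size 1 and 2 are checked. The following base R sketch illustrates the level-wise idea behind that search on the same eight transactions: first find the frequent single items, then build candidate pairs only from those. It is a simplified illustration, not the arules implementation, and the names `support`, `min_support`, `frequent_items`, and `frequent_pairs` are assumptions made for this example.

```r
# The eight example transactions, repeated so the block runs on its own.
dat <- list(
  T1 = c("fruit", "beer", "potatoes", "steak"),
  T2 = c("fruit", "beer", "potatoes"),
  T3 = c("fruit", "beer"),
  T4 = c("fruit", "veggies"),
  T5 = c("milk", "beer", "potatoes", "steak"),
  T6 = c("milk", "beer", "potatoes"),
  T7 = c("milk", "beer"),
  T8 = c("milk", "veggies")
)

min_support <- 0.3

# Proportion of transactions that contain every item in `itemset`.
support <- function(itemset) {
  mean(sapply(dat, function(basket) all(itemset %in% basket)))
}

# Level 1: keep the single items that meet the minimum support.
items <- sort(unique(unlist(dat)))
item_support <- sapply(items, support)
frequent_items <- names(item_support)[item_support >= min_support]

# Level 2: candidate pairs are built only from frequent single items.
pairs <- combn(frequent_items, 2, simplify = FALSE)
pair_support <- sapply(pairs, support)
frequent_pairs <- pairs[pair_support >= min_support]

frequent_items
frequent_pairs
```

Steak and veggies each appear in only 2 of the 8 transactions (support 0.25), so they are dropped at the first level. Three pairs survive the second level, each containing beer, and no triple can be formed whose pairs are all frequent, which is consistent with the search stopping at size 2.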

Now that the analysis is done, we can observe the summary and the rules.

summary(rules)
## set of 6 rules
## 
## rule length distribution (lhs + rhs):sizes
## 2 
## 6 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       2       2       2       2       2       2 
## 
## summary of quality measures:
##     support         confidence        coverage          lift      
##  Min.   :0.3750   Min.   :0.5000   Min.   :0.500   Min.   :1.000  
##  1st Qu.:0.3750   1st Qu.:0.5417   1st Qu.:0.500   1st Qu.:1.000  
##  Median :0.3750   Median :0.7083   Median :0.625   Median :1.000  
##  Mean   :0.4167   Mean   :0.6944   Mean   :0.625   Mean   :1.111  
##  3rd Qu.:0.4688   3rd Qu.:0.7500   3rd Qu.:0.750   3rd Qu.:1.250  
##  Max.   :0.5000   Max.   :1.0000   Max.   :0.750   Max.   :1.333  
##      count      
##  Min.   :3.000  
##  1st Qu.:3.000  
##  Median :3.000  
##  Mean   :3.333  
##  3rd Qu.:3.750  
##  Max.   :4.000  
## 
## mining info:
##          data ntransactions support confidence
##  Transactions             8     0.3        0.5
inspect(rules)
##     lhs           rhs        support confidence coverage lift     count
## [1] {fruit}    => {beer}     0.375   0.7500000  0.50     1.000000 3    
## [2] {beer}     => {fruit}    0.375   0.5000000  0.75     1.000000 3    
## [3] {milk}     => {beer}     0.375   0.7500000  0.50     1.000000 3    
## [4] {beer}     => {milk}     0.375   0.5000000  0.75     1.000000 3    
## [5] {potatoes} => {beer}     0.500   1.0000000  0.50     1.333333 4    
## [6] {beer}     => {potatoes} 0.500   0.6666667  0.75     1.333333 4

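We can see that we have 6 rules. For instance, rule 1 says that if someone buys fruit, then they buy beer. This rule has a support of 0.375 (this combination occurs in 37.5% of all transactions) and a confidence of 0.75 (meaning that P(beer|fruit) = 75%). Its lift of 1, however, shows that buying fruit does not make beer any more likely than it already is, since beer appears in 75% of all transactions. Only the two rules linking potatoes and beer have a lift above 1.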
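
To check a few rows of this table by hand, the definitions of support, confidence, and lift can be applied directly to the raw transactions. The sketch below uses only base R; the helper names `support` and `rule_measures` are hypothetical and introduced here for illustration, and only single-item antecedents and consequents are handled.

```r
# The eight example transactions, repeated so the block runs on its own.
dat <- list(
  T1 = c("fruit", "beer", "potatoes", "steak"),
  T2 = c("fruit", "beer", "potatoes"),
  T3 = c("fruit", "beer"),
  T4 = c("fruit", "veggies"),
  T5 = c("milk", "beer", "potatoes", "steak"),
  T6 = c("milk", "beer", "potatoes"),
  T7 = c("milk", "beer"),
  T8 = c("milk", "veggies")
)

# Proportion of transactions that contain every item in `itemset`.
support <- function(itemset) {
  mean(sapply(dat, function(basket) all(itemset %in% basket)))
}

# Support, confidence, coverage, and lift for the rule {lhs} => {rhs}.
rule_measures <- function(lhs, rhs) {
  supp <- support(c(lhs, rhs))
  conf <- supp / support(lhs)
  data.frame(
    lhs = lhs,
    rhs = rhs,
    support = supp,
    confidence = conf,
    coverage = support(lhs),        # support of the antecedent
    lift = conf / support(rhs)
  )
}

checks <- rbind(
  rule_measures("fruit", "beer"),
  rule_measures("potatoes", "beer"),
  rule_measures("beer", "potatoes")
)
checks
```

The three rows correspond to rules 1, 5, and 6 in the `inspect()` output. Note that rules 5 and 6 share the same lift, because lift is symmetric in the antecedent and consequent, while their confidences differ.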

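We can restrict our analysis to only include certain items. For example, the following call keeps only rules with beer as the consequent. Because minlen is set to 1, the result also includes a rule with an empty antecedent, whose confidence is simply the support of beer: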

beer_rules_rhs <- apriori(Transactions, 
                          parameter = list(supp=0.3, conf=0.5, 
                                         maxlen=10, 
                                         minlen=1),
                          appearance = list(default="lhs", rhs="beer"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5     0.3      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[6 item(s), 8 transaction(s)] done [0.00s].
## sorting and recoding items ... [4 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [4 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(beer_rules_rhs)
##     lhs           rhs    support confidence coverage lift     count
## [1] {}         => {beer} 0.750   0.75       1.0      1.000000 6    
## [2] {fruit}    => {beer} 0.375   0.75       0.5      1.000000 3    
## [3] {milk}     => {beer} 0.375   0.75       0.5      1.000000 3    
## [4] {potatoes} => {beer} 0.500   1.00       0.5      1.333333 4
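
Rules can also be narrowed down after mining rather than during it. The sketch below assumes the arules package is installed and uses its `subset()` and `sort()` methods for rule sets to keep only rules whose lift exceeds a threshold; the cut-off of 1.1 and the name `strong_rules` are arbitrary choices made for this illustration.

```r
library(arules)

# The eight example transactions, repeated so the block runs on its own.
dat <- list(
  T1 = c("fruit", "beer", "potatoes", "steak"),
  T2 = c("fruit", "beer", "potatoes"),
  T3 = c("fruit", "beer"),
  T4 = c("fruit", "veggies"),
  T5 = c("milk", "beer", "potatoes", "steak"),
  T6 = c("milk", "beer", "potatoes"),
  T7 = c("milk", "beer"),
  T8 = c("milk", "veggies")
)
Transactions <- as(dat, "transactions")

# Same thresholds as before; verbose = FALSE suppresses the progress log.
rules <- apriori(Transactions,
                 parameter = list(supp = 0.3, conf = 0.5, minlen = 2),
                 control = list(verbose = FALSE))

# Keep only rules whose lift is clearly above 1, then order by confidence.
strong_rules <- subset(rules, subset = lift > 1.1)
inspect(sort(strong_rules, by = "confidence"))
```

With these settings only the two rules linking potatoes and beer remain, since every other rule has a lift of 1.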

Finally, in order to visualize the associations, we can plot the information.

subrules <- head(rules, n=10, by="confidence")
plot(subrules, method ="graph", engine = "htmlwidget")

Citations

Camm, Jeffrey D. Business Analytics. Third edition. Boston, MA, USA: Cengage, 2019.

GeeksforGeeks. “Association Rule Mining in R Programming,” June 19, 2020.

Kirenz, Jan. “Introduction to Association Rule Mining in R.” Jan Kirenz, May 14, 2020.