Association Rule Mining (ARM)

Introduction with R

Dr. Mustafa Hameed

2026-03-31

What is Association Rule Mining?

  • Finds interesting relationships between items in a dataset
  • Most famous use case: Market Basket Analysis
  • Used in retail, healthcare, web mining, and recommendation systems

“Customers who buy bread and butter also tend to buy milk.”

{bread, butter} → {milk}

Why Does It Matter?

Retail

  • Product placement
  • Bundle promotions
  • Cross-selling

Other Domains

  • Medical co-diagnoses
  • Web click-stream patterns
  • Fraud detection

Three Key Measures

Measure      Formula                  What it tells us
Support      freq(A ∪ B) / N          How common is the itemset?
Confidence   Supp(A ∪ B) / Supp(A)    How reliable is the rule?
Lift         Conf(A → B) / Supp(B)    Is it better than chance?

Rule of Thumb

Lift > 1 → positive association (potentially useful rule) · Lift = 1 → independence (no association) · Lift < 1 → negative association

Support — Illustrated

5 transactions:

TID Items
T1 bread, butter, milk
T2 bread, butter
T3 bread, milk
T4 butter, milk
T5 bread, butter, milk

\[\text{Support}(\{bread, butter\}) = \frac{3}{5} = 0.60\]

Confidence — Illustrated

Rule: {bread, butter} → {milk}

  • Transactions with {bread, butter}: T1, T2, T5 → 3
  • Transactions with {bread, butter, milk}: T1, T5 → 2

\[\text{Confidence} = \frac{2/5}{3/5} = \frac{2}{3} \approx 0.67\]

67% of the time when someone buys bread + butter, they also buy milk.

Lift — Illustrated

\[\text{Lift} = \frac{\text{Confidence}}{\text{Support(milk)}} = \frac{0.67}{0.80} = 0.83\]

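  • Lift < 1 → in this toy data, baskets with bread & butter contain milk less often (67%) than baskets overall (80%), so the rule does not beat the baseline
  • Five transactions are too few for stable estimates: use more data, and on small datasets lower the support threshold so that candidate rules are not pruned away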
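The three illustrated calculations can be reproduced in a few lines of base R. This is a minimal sketch: the helper `support()` and the object names `toy`, `lhs` and `rhs` are illustrative and are not part of arules.

```r
# The five toy transactions from the "Support" slide
toy <- list(
  T1 = c("bread", "butter", "milk"),
  T2 = c("bread", "butter"),
  T3 = c("bread", "milk"),
  T4 = c("butter", "milk"),
  T5 = c("bread", "butter", "milk")
)

# Share of transactions that contain every item in `items`
# (hypothetical helper, written only for this illustration)
support <- function(items, transactions) {
  mean(vapply(transactions, function(t) all(items %in% t), logical(1)))
}

lhs <- c("bread", "butter")
rhs <- "milk"

supp_lhs  <- support(lhs, toy)           # 3 of 5 transactions
supp_rule <- support(c(lhs, rhs), toy)   # 2 of 5 transactions
conf      <- supp_rule / supp_lhs        # 2/3
lift      <- conf / support(rhs, toy)    # (2/3) / (4/5)

round(c(support_lhs = supp_lhs, confidence = conf, lift = lift), 2)
```

The rounded values should match the slides: 0.60, 0.67 and 0.83.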

The Apriori Algorithm

  1. Set minimum support — prune rare itemsets early
  2. Find frequent 1-itemsets {bread}, {milk}, …
  3. Build frequent 2-itemsets by joining 1-itemsets
  4. Continue growing until no new frequent itemsets are found
  5. Generate rules from frequent itemsets that meet min-confidence

Note

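The key insight: every subset of a frequent itemset must also be frequent,
so a candidate that contains an infrequent subset can be discarded without
counting it (the anti-monotone property of support)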
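The level-wise search in steps 1 to 4 can be sketched in base R. This is a teaching sketch under simplifying assumptions (brute-force support counting, a naive pairwise join), not how arules implements Apriori. The names `apriori_sketch()` and `support()` are hypothetical.

```r
# Toy data: the five transactions used in the illustrated slides
toy <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "milk"),
  c("butter", "milk"),
  c("bread", "butter", "milk")
)

# Share of transactions containing all of `items` (illustrative helper)
support <- function(items, transactions) {
  mean(vapply(transactions, function(t) all(items %in% t), logical(1)))
}

key <- function(items) paste(items, collapse = ",")

apriori_sketch <- function(transactions, min_supp) {
  items <- sort(unique(unlist(transactions)))
  # Level 1: single items that meet the minimum support
  level <- Filter(function(s) support(s, transactions) >= min_supp,
                  as.list(items))
  frequent <- list()
  k <- 1
  while (length(level) > 0) {
    frequent <- c(frequent, level)
    k <- k + 1
    # Join: merge pairs of frequent (k-1)-itemsets into k-item candidates
    candidates <- list()
    if (length(level) >= 2) {
      for (pair in combn(length(level), 2, simplify = FALSE)) {
        cand <- sort(union(level[[pair[1]]], level[[pair[2]]]))
        if (length(cand) == k) candidates[[key(cand)]] <- cand
      }
    }
    # Prune: drop candidates with any infrequent (k-1)-subset
    prev_keys <- vapply(level, key, character(1))
    candidates <- Filter(function(cand) {
      subsets <- combn(cand, k - 1, simplify = FALSE)
      all(vapply(subsets, function(s) key(s) %in% prev_keys, logical(1)))
    }, candidates)
    # Count: keep only candidates that meet the minimum support
    level <- unname(Filter(
      function(s) support(s, transactions) >= min_supp, candidates))
  }
  frequent
}

found <- apriori_sketch(toy, min_supp = 0.5)
vapply(found, key, character(1))
```

With a minimum support of 0.5, the three single items and the three pairs are frequent, while {bread, butter, milk} (support 0.4) is counted once and rejected, which ends the search.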

Step 1 — Load Packages

if (!requireNamespace("arules",    quietly = TRUE)) install.packages("arules")
if (!requireNamespace("arulesViz", quietly = TRUE)) install.packages("arulesViz")

library(arules)
library(arulesViz)
library(ggplot2)

cat("Ready!\n")
Ready!

Step 2 — Create a Small Dataset

baskets <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "milk"),
  c("butter", "milk"),
  c("bread", "butter", "milk"),
  c("bread", "eggs"),
  c("bread", "butter", "eggs"),
  c("milk", "yogurt"),
  c("milk", "yogurt", "butter"),
  c("bread", "yogurt"),
  c("cheese", "bread", "butter"),
  c("cheese", "milk"),
  c("bread", "butter", "milk", "eggs"),
  c("yogurt", "cheese"),
  c("bread", "milk", "yogurt"),
  c("butter", "eggs"),
  c("bread", "cheese"),
  c("milk", "butter", "cheese"),
  c("bread", "butter", "yogurt"),
  c("eggs", "milk")
)

txns <- as(baskets, "transactions")
cat(length(txns), "transactions,", length(itemLabels(txns)), "items\n")
20 transactions, 6 items

Step 3 — Inspect the Data

inspect(txns[1:5])
    items                
[1] {bread, butter, milk}
[2] {bread, butter}      
[3] {bread, milk}        
[4] {butter, milk}       
[5] {bread, butter, milk}

Step 3 — Item Frequency Plot

itemFrequencyPlot(
  txns,
  type = "relative",
  col  = "steelblue",
  main = "Relative Item Frequency"
)

Step 4 — Find Frequent Itemsets

frequent_sets <- apriori(
  txns,
  parameter = list(supp = 0.20, target = "frequent itemsets")
)
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
         NA    0.1    1 none FALSE            TRUE       5     0.2      1
 maxlen            target  ext
     10 frequent itemsets TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 4 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[6 item(s), 20 transaction(s)] done [0.00s].
sorting and recoding items ... [6 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
sorting transactions ... done [0.00s].
writing ... [9 set(s)] done [0.00s].
creating S4 object  ... done [0.00s].
cat("Frequent itemsets:", length(frequent_sets), "\n\n")
Frequent itemsets: 9 
inspect(sort(frequent_sets, by = "support")[1:8])
    items           support count
[1] {bread}         0.60    12   
[2] {milk}          0.55    11   
[3] {butter}        0.55    11   
[4] {bread, butter} 0.35     7   
[5] {yogurt}        0.30     6   
[6] {butter, milk}  0.30     6   
[7] {cheese}        0.25     5   
[8] {eggs}          0.25     5   

Step 5 — Generate Rules

rules <- apriori(
  txns,
  parameter = list(supp = 0.20, conf = 0.50, minlen = 2)
)
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5     0.2      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 4 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[6 item(s), 20 transaction(s)] done [0.00s].
sorting and recoding items ... [6 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [4 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
cat("Rules generated:", length(rules), "\n")
Rules generated: 4 
inspect(sort(rules, by = "lift", decreasing = TRUE))
    lhs         rhs      support confidence coverage lift      count
[1] {bread}  => {butter} 0.35    0.5833333  0.60     1.0606061 7    
[2] {butter} => {bread}  0.35    0.6363636  0.55     1.0606061 7    
[3] {milk}   => {butter} 0.30    0.5454545  0.55     0.9917355 6    
[4] {butter} => {milk}   0.30    0.5454545  0.55     0.9917355 6    
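As a sanity check, the first row of the rules table can be recomputed from the counts reported in Step 4. The variable names below are illustrative; the counts (12, 11, 7 out of 20) are taken from the frequent-itemset output.

```r
n            <- 20   # transactions
count_bread  <- 12   # {bread}
count_butter <- 11   # {butter}
count_both   <- 7    # {bread, butter}

supp <- count_both / n                # support of the rule
conf <- count_both / count_bread      # confidence of {bread} => {butter}
lift <- conf / (count_butter / n)     # confidence relative to baseline

round(c(support = supp, confidence = conf, lift = lift), 4)

# Lift is symmetric: swapping the two sides gives the same value,
# which is why rules [1] and [2] share a lift of 1.0606
lift_reverse <- (count_both / count_butter) / (count_bread / n)
```

A lift of about 1.06 is only marginally above 1, so even the strongest rule here indicates a weak association.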

Step 6 — Scatter Plot

plot(rules, method = "scatterplot",
     measure = c("support", "confidence"),
     shading = "lift")

Step 6 — Network Graph

# The graph method (ggplot2 engine) has no "main" control parameter; passing it
# only prints the list of available controls. Add the title to the returned
# ggplot object instead.
plot(sort(rules, by = "lift"), method = "graph") +
  ggtitle("Association Rules Network")

Step 7 — Filter for a Specific Item

What items predict buying milk?

rules_milk <- apriori(
  txns,
  parameter  = list(supp = 0.15, conf = 0.50, minlen = 2),
  appearance = list(rhs = "milk", default = "lhs")
)
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5    0.15      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 3 

set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[6 item(s), 20 transaction(s)] done [0.00s].
sorting and recoding items ... [6 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [2 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
cat("Rules predicting milk:", length(rules_milk), "\n\n")
Rules predicting milk: 2 
inspect(sort(rules_milk, by = "lift", decreasing = TRUE))
    lhs         rhs    support confidence coverage lift      count
[1] {butter} => {milk} 0.30    0.5454545  0.55     0.9917355 6    
[2] {yogurt} => {milk} 0.15    0.5000000  0.30     0.9090909 3    
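An alternative to the `appearance` argument is to mine all rules once and filter afterwards with `subset()`. The sketch below rebuilds the Step 2 baskets in compact form so that it runs on its own; the object names `all_rules` and `milk_rules` are illustrative.

```r
library(arules)

# Same 20 baskets as in Step 2, written compactly as comma-separated strings
baskets <- strsplit(c(
  "bread,butter,milk", "bread,butter", "bread,milk", "butter,milk",
  "bread,butter,milk", "bread,eggs", "bread,butter,eggs", "milk,yogurt",
  "milk,yogurt,butter", "bread,yogurt", "cheese,bread,butter", "cheese,milk",
  "bread,butter,milk,eggs", "yogurt,cheese", "bread,milk,yogurt", "butter,eggs",
  "bread,cheese", "milk,butter,cheese", "bread,butter,yogurt", "eggs,milk"
), ",")
txns <- as(baskets, "transactions")

# Mine every rule at the Step 7 thresholds, without the progress output
all_rules <- apriori(
  txns,
  parameter = list(supp = 0.15, conf = 0.50, minlen = 2),
  control   = list(verbose = FALSE)
)

# Keep only the rules whose right-hand side is milk
milk_rules <- subset(all_rules, subset = rhs %in% "milk")
inspect(sort(milk_rules, by = "lift"))
```

Filtering after mining is convenient when several target items are of interest, because the rule set is computed only once. Restricting `appearance` up front is the leaner choice on large datasets.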

Step 8 — Export to Data Frame

rules_df <- as(rules, "data.frame")
rules_df$support    <- round(rules_df$support,    3)
rules_df$confidence <- round(rules_df$confidence, 3)
rules_df$lift       <- round(rules_df$lift,       3)

head(rules_df[order(-rules_df$lift), ], 6)
                rules support confidence coverage  lift count
3 {butter} => {bread}    0.35      0.636     0.55 1.061     7
4 {bread} => {butter}    0.35      0.583     0.60 1.061     7
1  {milk} => {butter}    0.30      0.545     0.55 0.992     6
2  {butter} => {milk}    0.30      0.545     0.55 0.992     6

Summary

Step What we did
1 Loaded arules and arulesViz
2 Built a small transaction dataset
3 Explored item frequencies
4 Found frequent itemsets with Apriori
5 Generated rules (support + confidence filters)
6 Visualised rules (scatter & graph)
7 Filtered rules for a specific product
8 Exported rules to a data frame

Key Takeaways

  • Support — how common is the pattern?
  • Confidence — how reliable is the rule?
  • Lift > 1 — the rule is better than guessing
  • Apriori is efficient because it prunes low-support candidates early
  • arules + arulesViz make ARM easy in R

Try it yourself!

Change supp and conf thresholds and see how the number of rules changes.
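One way to start the exercise is to sweep a small grid of thresholds and count the rules at each setting. This is a sketch: the grid values and the names `grid` and `n_rules` are arbitrary choices, and the Step 2 baskets are rebuilt compactly so the block runs on its own.

```r
library(arules)

# Same 20 baskets as in Step 2, written compactly as comma-separated strings
baskets <- strsplit(c(
  "bread,butter,milk", "bread,butter", "bread,milk", "butter,milk",
  "bread,butter,milk", "bread,eggs", "bread,butter,eggs", "milk,yogurt",
  "milk,yogurt,butter", "bread,yogurt", "cheese,bread,butter", "cheese,milk",
  "bread,butter,milk,eggs", "yogurt,cheese", "bread,milk,yogurt", "butter,eggs",
  "bread,cheese", "milk,butter,cheese", "bread,butter,yogurt", "eggs,milk"
), ",")
txns <- as(baskets, "transactions")

# Every combination of the chosen support and confidence thresholds
grid <- expand.grid(
  supp = c(0.10, 0.15, 0.20, 0.25),
  conf = c(0.40, 0.50, 0.60)
)

# Number of rules found at each threshold pair
grid$n_rules <- mapply(function(s, cf) {
  length(apriori(
    txns,
    parameter = list(supp = s, conf = cf, minlen = 2),
    control   = list(verbose = FALSE)
  ))
}, grid$supp, grid$conf)

grid
```

The row for supp = 0.20 and conf = 0.50 should reproduce the 4 rules from Step 5. Raising either threshold can only keep or reduce the rule count.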

References

  1. Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. VLDB.
  2. Hahsler, M., Grün, B., & Hornik, K. (2005). arules — A computational environment for mining association rules. Journal of Statistical Software, 14(15).
  3. Hahsler, M. (2017). arulesViz: Interactive visualization of association rules with R. The R Journal, 9(2), 163–175.