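Data Set Description
Consider a small supermarket whose data set records customer transactions. Each transaction lists the items purchased together. We'll use a simple data set with the following five transactions:
Transaction 1: Bread, Milk
Transaction 2: Bread, Diapers, Beer, Eggs
Transaction 3: Milk, Diapers, Beer, Cola
Transaction 4: Bread, Milk, Diapers, Beer
Transaction 5: Bread, Milk, Cola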
Manual Calculations.
Let’s calculate some association rules manually using the concept of support, confidence, and lift.
Definitions:
Support: The frequency with which an item set appears in the data set.
Confidence: A measure of the likelihood of an item Y being purchased when item X is purchased.
Lift: The ratio of the observed support to that expected if X and Y were independent.
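The three definitions above can be sketched as small base R functions. This is a minimal illustration, not part of the arules package: the helper names `support`, `confidence`, and `lift` are chosen here for the example, and the transactions are the same five that the arules code in this document uses.

```r
# The five supermarket transactions used throughout this example
transactions <- list(
  c("Bread", "Milk"),
  c("Bread", "Diapers", "Beer", "Eggs"),
  c("Milk", "Diapers", "Beer", "Cola"),
  c("Bread", "Milk", "Diapers", "Beer"),
  c("Bread", "Milk", "Cola")
)

# Support: share of transactions that contain every item in `items`
support <- function(items, trans) {
  mean(vapply(trans, function(tr) all(items %in% tr), logical(1)))
}

# Confidence of X -> Y: Support(X and Y) / Support(X)
confidence <- function(x, y, trans) {
  support(c(x, y), trans) / support(x, trans)
}

# Lift of X -> Y: Confidence(X -> Y) / Support(Y)
lift <- function(x, y, trans) {
  confidence(x, y, trans) / support(y, trans)
}

# Example rule: Bread -> Milk
support("Bread", transactions)
confidence("Bread", "Milk", transactions)
lift("Bread", "Milk", transactions)
```

Because `items`, `x`, and `y` are character vectors, the same helpers also work for multi-item rules such as `confidence(c("Bread", "Diapers"), "Beer", transactions)`.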
Calculations for Rule: Bread → Milk.
Support of Bread:
Transactions containing Bread: {1, 2, 4, 5}.
Support(Bread) = 4/5 = 0.8.
Support of Milk:
Transactions containing Milk: {1, 3, 4, 5}.
Support(Milk) = 4/5 = 0.8.
Support of Bread and Milk:
Transactions containing Bread and Milk: {1, 4, 5}.
Support(Bread and Milk) = 3/5 = 0.6.
Confidence (Bread → Milk):
Confidence = Support(Bread and Milk) / Support(Bread).
Confidence = 0.6 / 0.8 = 0.75.
Lift (Bread → Milk):
Lift = Confidence(Bread → Milk) / Support(Milk).
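Lift = 0.75 / 0.8 = 0.9375.
A lift below 1 means Bread and Milk appear together slightly less often than would be expected if they were purchased independently.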
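The definition of lift is phrased as observed support over the support expected under independence, while the calculation above divides confidence by Support(Milk). The two forms are algebraically the same, as this short check with the numbers from the manual calculation suggests. The variable names are illustrative only.

```r
# Supports taken from the manual calculation for Bread -> Milk
supp_bread <- 4 / 5
supp_milk  <- 4 / 5
supp_both  <- 3 / 5

# Form 1: confidence divided by the support of the consequent
lift_from_confidence <- (supp_both / supp_bread) / supp_milk

# Form 2: observed joint support over the support expected under independence
lift_from_independence <- supp_both / (supp_bread * supp_milk)

c(lift_from_confidence, lift_from_independence)
```

Form 2 also makes it clear that lift is symmetric: Bread → Milk and Milk → Bread share the same value.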
R Code for Market Basket Analysis.
Below is the R code using the arules package to perform market basket analysis on the same data set.
# Load necessary library
if (!require(arules)) install.packages("arules", dependencies=TRUE)
## Loading required package: arules
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arules)
# Define the transactions based on the provided dataset
transactions <- list(
c("Bread", "Milk"),
c("Bread", "Diapers", "Beer", "Eggs"),
c("Milk", "Diapers", "Beer", "Cola"),
c("Bread", "Milk", "Diapers", "Beer"),
c("Bread", "Milk", "Cola")
)
# Convert the list to a transaction class
trans <- as(transactions, "transactions")
# Apply the apriori algorithm to find association rules
rules <- apriori(trans, parameter=list(supp=0.1, conf=0.6))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 0
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6 item(s), 5 transaction(s)] done [0.00s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [41 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
# Sort rules by confidence
sorted_rules <- sort(rules, by="confidence", decreasing=TRUE)
# Print all rules, highest confidence first
inspect(sorted_rules)
##      lhs                       rhs       support confidence coverage lift      count
## [1]  {Eggs}                 => {Beer}    0.2     1.0000000  0.2      1.6666667 1
## [2]  {Eggs}                 => {Diapers} 0.2     1.0000000  0.2      1.6666667 1
## [3]  {Eggs}                 => {Bread}   0.2     1.0000000  0.2      1.2500000 1
## [4]  {Cola}                 => {Milk}    0.4     1.0000000  0.4      1.2500000 2
## [5]  {Beer}                 => {Diapers} 0.6     1.0000000  0.6      1.6666667 3
## [6]  {Diapers}              => {Beer}    0.6     1.0000000  0.6      1.6666667 3
## [7]  {Beer, Eggs}           => {Diapers} 0.2     1.0000000  0.2      1.6666667 1
## [8]  {Diapers, Eggs}        => {Beer}    0.2     1.0000000  0.2      1.6666667 1
## [9]  {Beer, Eggs}           => {Bread}   0.2     1.0000000  0.2      1.2500000 1
## [10] {Bread, Eggs}          => {Beer}    0.2     1.0000000  0.2      1.6666667 1
## [11] {Diapers, Eggs}        => {Bread}   0.2     1.0000000  0.2      1.2500000 1
## [12] {Bread, Eggs}          => {Diapers} 0.2     1.0000000  0.2      1.6666667 1
## [13] {Beer, Cola}           => {Diapers} 0.2     1.0000000  0.2      1.6666667 1
## [14] {Cola, Diapers}        => {Beer}    0.2     1.0000000  0.2      1.6666667 1
## [15] {Beer, Cola}           => {Milk}    0.2     1.0000000  0.2      1.2500000 1
## [16] {Cola, Diapers}        => {Milk}    0.2     1.0000000  0.2      1.2500000 1
## [17] {Bread, Cola}          => {Milk}    0.2     1.0000000  0.2      1.2500000 1
## [18] {Beer, Milk}           => {Diapers} 0.4     1.0000000  0.4      1.6666667 2
## [19] {Diapers, Milk}        => {Beer}    0.4     1.0000000  0.4      1.6666667 2
## [20] {Beer, Bread}          => {Diapers} 0.4     1.0000000  0.4      1.6666667 2
## [21] {Bread, Diapers}       => {Beer}    0.4     1.0000000  0.4      1.6666667 2
## [22] {Beer, Diapers, Eggs}  => {Bread}   0.2     1.0000000  0.2      1.2500000 1
## [23] {Beer, Bread, Eggs}    => {Diapers} 0.2     1.0000000  0.2      1.6666667 1
## [24] {Bread, Diapers, Eggs} => {Beer}    0.2     1.0000000  0.2      1.6666667 1
## [25] {Beer, Cola, Diapers}  => {Milk}    0.2     1.0000000  0.2      1.2500000 1
## [26] {Beer, Cola, Milk}     => {Diapers} 0.2     1.0000000  0.2      1.6666667 1
## [27] {Cola, Diapers, Milk}  => {Beer}    0.2     1.0000000  0.2      1.6666667 1
## [28] {Beer, Bread, Milk}    => {Diapers} 0.2     1.0000000  0.2      1.6666667 1
## [29] {Bread, Diapers, Milk} => {Beer}    0.2     1.0000000  0.2      1.6666667 1
## [30] {}                     => {Milk}    0.8     0.8000000  1.0      1.0000000 4
## [31] {}                     => {Bread}   0.8     0.8000000  1.0      1.0000000 4
## [32] {Milk}                 => {Bread}   0.6     0.7500000  0.8      0.9375000 3
## [33] {Bread}                => {Milk}    0.6     0.7500000  0.8      0.9375000 3
## [34] {Beer}                 => {Milk}    0.4     0.6666667  0.6      0.8333333 2
## [35] {Beer}                 => {Bread}   0.4     0.6666667  0.6      0.8333333 2
## [36] {Diapers}              => {Milk}    0.4     0.6666667  0.6      0.8333333 2
## [37] {Diapers}              => {Bread}   0.4     0.6666667  0.6      0.8333333 2
## [38] {Beer, Diapers}        => {Milk}    0.4     0.6666667  0.6      0.8333333 2
## [39] {Beer, Diapers}        => {Bread}   0.4     0.6666667  0.6      0.8333333 2
## [40] {}                     => {Beer}    0.6     0.6000000  1.0      1.0000000 3
## [41] {}                     => {Diapers} 0.6     0.6000000  1.0      1.0000000 3
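The full listing mixes informative rules with ones that carry little information, such as rules with an empty left-hand side or a lift at or below 1. One possible way to narrow it down is sketched here, assuming the arules package is installed. The choices of `minlen = 2` and a lift threshold of 1 are illustrative, not a recommendation from the original analysis.

```r
library(arules)

# Same five transactions as in the analysis above
transactions <- list(
  c("Bread", "Milk"),
  c("Bread", "Diapers", "Beer", "Eggs"),
  c("Milk", "Diapers", "Beer", "Cola"),
  c("Bread", "Milk", "Diapers", "Beer"),
  c("Bread", "Milk", "Cola")
)
trans <- as(transactions, "transactions")

# minlen = 2 requires at least two items per rule,
# which drops rules with an empty left-hand side
rules <- apriori(
  trans,
  parameter = list(supp = 0.1, conf = 0.6, minlen = 2),
  control = list(verbose = FALSE)
)

# Keep only rules whose lift exceeds 1, then rank them by lift
strong_rules <- sort(subset(rules, subset = lift > 1), by = "lift")

# Show the first five of the remaining rules
inspect(strong_rules[1:5])
```

On a data set this small, many rules rest on a single transaction (count 1), so raising `supp` is another filter worth considering.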
If-Then Rules.
Rule 1:
If: Bread Then: Milk.
Support: 0.6 (3 out of 5 transactions).
Confidence: 0.75 (3 out of 4 transactions with Bread also contain Milk).
Lift: 0.9375.
Rule 2:
If: Milk Then: Bread.
Support: 0.6 (3 out of 5 transactions).
Confidence: 0.75 (3 out of 4 transactions with Milk also contain Bread).
Lift: 0.9375.
Rule 3:
If: Diapers Then: Beer.
Support: 0.6 (3 out of 5 transactions).
Confidence: 1.0 (3 out of 3 transactions with Diapers also contain Beer).
Lift: 1.6667.
Rule 4:
If: Beer Then: Diapers.
Support: 0.6 (3 out of 5 transactions).
Confidence: 1.0 (3 out of 3 transactions with Beer also contain Diapers).
Lift: 1.6667.
Rule 5:
If: Bread, Diapers Then: Beer.
Support: 0.4 (2 out of 5 transactions).
Confidence: 1.0 (2 out of 2 transactions with Bread and Diapers also contain Beer).
Lift: 1.6667.
Rule 6:
If: Beer, Diapers Then: Bread.
Support: 0.4 (2 out of 5 transactions).
Confidence: 0.6667 (2 out of 3 transactions with Beer and Diapers also contain Bread).
Lift: 0.8333.
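These rules help identify patterns in the purchasing behavior of customers in the supermarket. For instance, every transaction in this data set that contains Diapers also contains Beer (confidence 1.0, lift 1.6667), which points to a positive association. Bread and Milk appear together in 3 of the 5 transactions, but the lift of 0.9375 is slightly below 1, so buying Bread does not make Milk more likely than its baseline support of 0.8.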
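As a cross-check, the metrics for the six rules can be recomputed directly from the five transactions and compared with the apriori output. The sketch below uses base R only. The incidence-matrix approach and the helper name `rule_metrics` are choices made for this illustration, not something arules requires.

```r
# Same five transactions as in the analysis above
transactions <- list(
  c("Bread", "Milk"),
  c("Bread", "Diapers", "Beer", "Eggs"),
  c("Milk", "Diapers", "Beer", "Cola"),
  c("Bread", "Milk", "Diapers", "Beer"),
  c("Bread", "Milk", "Cola")
)

# Logical incidence matrix: one row per transaction, one column per item
items <- sort(unique(unlist(transactions)))
incidence <- t(sapply(transactions, function(tr) items %in% tr))
colnames(incidence) <- items

# Support, confidence, and lift for the rule lhs -> rhs
rule_metrics <- function(lhs, rhs) {
  has_lhs <- rowSums(incidence[, lhs, drop = FALSE]) == length(lhs)
  has_rhs <- rowSums(incidence[, rhs, drop = FALSE]) == length(rhs)
  supp <- mean(has_lhs & has_rhs)
  conf <- supp / mean(has_lhs)
  c(support = supp, confidence = conf, lift = conf / mean(has_rhs))
}

# One row per rule, in the order Rule 1 to Rule 6
checks <- rbind(
  "Bread -> Milk"          = rule_metrics("Bread", "Milk"),
  "Milk -> Bread"          = rule_metrics("Milk", "Bread"),
  "Diapers -> Beer"        = rule_metrics("Diapers", "Beer"),
  "Beer -> Diapers"        = rule_metrics("Beer", "Diapers"),
  "Bread, Diapers -> Beer" = rule_metrics(c("Bread", "Diapers"), "Beer"),
  "Beer, Diapers -> Bread" = rule_metrics(c("Beer", "Diapers"), "Bread")
)
round(checks, 4)
```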