Market Basket Analysis

Data set Description

Consider a small supermarket with a data set containing transactions made by customers. Each transaction lists the items purchased together. We’ll use a simple data set with the following transactions:

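Transaction ID    Items Purchased
1                 Bread, Milk
2                 Bread, Diapers, Beer, Eggs
3                 Milk, Diapers, Beer, Cola
4                 Bread, Milk, Diapers, Beer
5                 Bread, Milk, Cola

Figure 1 - Transaction ID and Items Purchased.

Manual Calculations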

Let’s calculate some association rules manually using the concepts of support, confidence, and lift.

Definitions:

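Support: The proportion of transactions in the data set that contain a given item set.
Confidence: The proportion of transactions containing item X that also contain item Y, that is, how likely Y is to be purchased when X is purchased.
Lift: The ratio of the observed support of X and Y together to the support expected if X and Y were independent. A lift above 1 indicates a positive association, a lift below 1 a negative one.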
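
The three definitions can be written compactly as formulas. The notation here is an assumption made for illustration: T is the set of all transactions, and X and Y are item sets.

```latex
\begin{align*}
\mathrm{Support}(X) &= \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert} \\[4pt]
\mathrm{Confidence}(X \rightarrow Y) &= \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)} \\[4pt]
\mathrm{Lift}(X \rightarrow Y) &= \frac{\mathrm{Confidence}(X \rightarrow Y)}{\mathrm{Support}(Y)}
  = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)\,\mathrm{Support}(Y)}
\end{align*}
```

The last form shows that lift is symmetric: swapping X and Y gives the same value.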

Calculations for Rule: Bread → Milk.

Support of Bread:

Transactions containing Bread: {1, 2, 4, 5}.

Support(Bread) = 4/5 = 0.8.

Support of Milk:

Transactions containing Milk: {1, 3, 4, 5}.

Support(Milk) = 4/5 = 0.8.

Support of Bread and Milk:

Transactions containing Bread and Milk: {1, 4, 5}.

Support(Bread and Milk) = 3/5 = 0.6.

Confidence (Bread → Milk):

Confidence = Support(Bread and Milk) / Support(Bread).

Confidence = 0.6 / 0.8 = 0.75.

Lift (Bread → Milk):

Lift = Confidence(Bread → Milk) / Support(Milk).

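Lift = 0.75 / 0.8 = 0.9375.

A lift slightly below 1 means Bread and Milk appear together a little less often than would be expected if they were purchased independently.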
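
The manual calculation can be reproduced in a few lines of base R. This is a minimal sketch that assumes the same five transactions used in the R code for this data set; the variable names (has_bread, support_both, and so on) are chosen here for illustration only.

```r
# The five transactions, in transaction ID order (1 to 5)
transactions <- list(
  c("Bread", "Milk"),
  c("Bread", "Diapers", "Beer", "Eggs"),
  c("Milk", "Diapers", "Beer", "Cola"),
  c("Bread", "Milk", "Diapers", "Beer"),
  c("Bread", "Milk", "Cola")
)

# One TRUE/FALSE per transaction: does it contain the item?
has_bread <- sapply(transactions, function(basket) "Bread" %in% basket)
has_milk  <- sapply(transactions, function(basket) "Milk" %in% basket)

# Support is the share of transactions containing the item(s)
support_bread <- mean(has_bread)             # 4 of 5 transactions
support_milk  <- mean(has_milk)              # 4 of 5 transactions
support_both  <- mean(has_bread & has_milk)  # 3 of 5 transactions

# Confidence and lift for the rule Bread -> Milk
confidence <- support_both / support_bread
lift       <- confidence / support_milk

print(c(support = support_both, confidence = confidence, lift = lift))
```

Taking the mean of a logical vector gives the proportion of TRUE values, which is exactly the support.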

R Code for Market Basket Analysis.

Below is the R code using the arules package to perform market basket analysis on the same data set.

# Load necessary library
if (!require(arules)) install.packages("arules", dependencies=TRUE)
## Loading required package: arules
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arules)

# Define the transactions based on the provided dataset
transactions <- list(
  c("Bread", "Milk"),
  c("Bread", "Diapers", "Beer", "Eggs"),
  c("Milk", "Diapers", "Beer", "Cola"),
  c("Bread", "Milk", "Diapers", "Beer"),
  c("Bread", "Milk", "Cola")
)

# Convert the list to a transaction class
trans <- as(transactions, "transactions")

# Apply the apriori algorithm to find association rules
rules <- apriori(trans, parameter=list(supp=0.1, conf=0.6))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.6    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 0 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[6 item(s), 5 transaction(s)] done [0.00s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [41 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
# Sort rules by confidence
sorted_rules <- sort(rules, by="confidence", decreasing=TRUE)

# Print all rules, highest confidence first
inspect(sorted_rules)
##      lhs                       rhs       support confidence coverage lift     
## [1]  {Eggs}                 => {Beer}    0.2     1.0000000  0.2      1.6666667
## [2]  {Eggs}                 => {Diapers} 0.2     1.0000000  0.2      1.6666667
## [3]  {Eggs}                 => {Bread}   0.2     1.0000000  0.2      1.2500000
## [4]  {Cola}                 => {Milk}    0.4     1.0000000  0.4      1.2500000
## [5]  {Beer}                 => {Diapers} 0.6     1.0000000  0.6      1.6666667
## [6]  {Diapers}              => {Beer}    0.6     1.0000000  0.6      1.6666667
## [7]  {Beer, Eggs}           => {Diapers} 0.2     1.0000000  0.2      1.6666667
## [8]  {Diapers, Eggs}        => {Beer}    0.2     1.0000000  0.2      1.6666667
## [9]  {Beer, Eggs}           => {Bread}   0.2     1.0000000  0.2      1.2500000
## [10] {Bread, Eggs}          => {Beer}    0.2     1.0000000  0.2      1.6666667
## [11] {Diapers, Eggs}        => {Bread}   0.2     1.0000000  0.2      1.2500000
## [12] {Bread, Eggs}          => {Diapers} 0.2     1.0000000  0.2      1.6666667
## [13] {Beer, Cola}           => {Diapers} 0.2     1.0000000  0.2      1.6666667
## [14] {Cola, Diapers}        => {Beer}    0.2     1.0000000  0.2      1.6666667
## [15] {Beer, Cola}           => {Milk}    0.2     1.0000000  0.2      1.2500000
## [16] {Cola, Diapers}        => {Milk}    0.2     1.0000000  0.2      1.2500000
## [17] {Bread, Cola}          => {Milk}    0.2     1.0000000  0.2      1.2500000
## [18] {Beer, Milk}           => {Diapers} 0.4     1.0000000  0.4      1.6666667
## [19] {Diapers, Milk}        => {Beer}    0.4     1.0000000  0.4      1.6666667
## [20] {Beer, Bread}          => {Diapers} 0.4     1.0000000  0.4      1.6666667
## [21] {Bread, Diapers}       => {Beer}    0.4     1.0000000  0.4      1.6666667
## [22] {Beer, Diapers, Eggs}  => {Bread}   0.2     1.0000000  0.2      1.2500000
## [23] {Beer, Bread, Eggs}    => {Diapers} 0.2     1.0000000  0.2      1.6666667
## [24] {Bread, Diapers, Eggs} => {Beer}    0.2     1.0000000  0.2      1.6666667
## [25] {Beer, Cola, Diapers}  => {Milk}    0.2     1.0000000  0.2      1.2500000
## [26] {Beer, Cola, Milk}     => {Diapers} 0.2     1.0000000  0.2      1.6666667
## [27] {Cola, Diapers, Milk}  => {Beer}    0.2     1.0000000  0.2      1.6666667
## [28] {Beer, Bread, Milk}    => {Diapers} 0.2     1.0000000  0.2      1.6666667
## [29] {Bread, Diapers, Milk} => {Beer}    0.2     1.0000000  0.2      1.6666667
## [30] {}                     => {Milk}    0.8     0.8000000  1.0      1.0000000
## [31] {}                     => {Bread}   0.8     0.8000000  1.0      1.0000000
## [32] {Milk}                 => {Bread}   0.6     0.7500000  0.8      0.9375000
## [33] {Bread}                => {Milk}    0.6     0.7500000  0.8      0.9375000
## [34] {Beer}                 => {Milk}    0.4     0.6666667  0.6      0.8333333
## [35] {Beer}                 => {Bread}   0.4     0.6666667  0.6      0.8333333
## [36] {Diapers}              => {Milk}    0.4     0.6666667  0.6      0.8333333
## [37] {Diapers}              => {Bread}   0.4     0.6666667  0.6      0.8333333
## [38] {Beer, Diapers}        => {Milk}    0.4     0.6666667  0.6      0.8333333
## [39] {Beer, Diapers}        => {Bread}   0.4     0.6666667  0.6      0.8333333
## [40] {}                     => {Beer}    0.6     0.6000000  1.0      1.0000000
## [41] {}                     => {Diapers} 0.6     0.6000000  1.0      1.0000000
##      count
## [1]  1    
## [2]  1    
## [3]  1    
## [4]  2    
## [5]  3    
## [6]  3    
## [7]  1    
## [8]  1    
## [9]  1    
## [10] 1    
## [11] 1    
## [12] 1    
## [13] 1    
## [14] 1    
## [15] 1    
## [16] 1    
## [17] 1    
## [18] 2    
## [19] 2    
## [20] 2    
## [21] 2    
## [22] 1    
## [23] 1    
## [24] 1    
## [25] 1    
## [26] 1    
## [27] 1    
## [28] 1    
## [29] 1    
## [30] 4    
## [31] 4    
## [32] 3    
## [33] 3    
## [34] 2    
## [35] 2    
## [36] 2    
## [37] 2    
## [38] 2    
## [39] 2    
## [40] 3    
## [41] 3
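
The output above includes rules with an empty left-hand side, such as {} => {Milk}, which only restate item frequencies. One possible way to trim the list is sketched below. It assumes the arules package is installed; the object names rules2, strong, and bread_milk are illustrative, and the thresholds are the same ones used above.

```r
library(arules)

transactions <- list(
  c("Bread", "Milk"),
  c("Bread", "Diapers", "Beer", "Eggs"),
  c("Milk", "Diapers", "Beer", "Cola"),
  c("Bread", "Milk", "Diapers", "Beer"),
  c("Bread", "Milk", "Cola")
)
trans <- as(transactions, "transactions")

# minlen = 2 requires at least two items per rule (one on each side),
# which drops the empty-LHS rules; verbose = FALSE silences the log
rules2 <- apriori(
  trans,
  parameter = list(supp = 0.1, conf = 0.6, minlen = 2),
  control   = list(verbose = FALSE)
)

# Keep only rules whose items co-occur more often than expected (lift > 1)
strong <- subset(rules2, subset = lift > 1)
inspect(head(sort(strong, by = "lift"), 5))

# Extract the single rule {Bread} => {Milk} to compare with the manual result
bread_milk <- rules2[size(lhs(rules2)) == 1 &
                     lhs(rules2) %in% "Bread" &
                     rhs(rules2) %in% "Milk"]
quality(bread_milk)
```

The quality() call returns the support, confidence, and lift columns for the selected rule, which can be checked against the manual calculation for Bread → Milk.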

Market Basket Analysis Output

If-Then Rules.

Rule 1:

If: Bread Then: Milk.

Support: 0.6 (3 out of 5 transactions).

Confidence: 0.75 (3 out of 4 transactions with Bread also contain Milk).

Lift: 0.9375.

Rule 2:

If: Milk Then: Bread.

Support: 0.6 (3 out of 5 transactions).

Confidence: 0.75 (3 out of 4 transactions with Milk also contain Bread).

Lift: 0.9375.

Rule 3:

If: Diapers Then: Beer.

Support: 0.6 (3 out of 5 transactions).

Confidence: 1.0 (3 out of 3 transactions with Diapers also contain Beer).

Lift: 1.6667.

Rule 4:

If: Beer Then: Diapers.

Support: 0.6 (3 out of 5 transactions).

Confidence: 1.0 (3 out of 3 transactions with Beer also contain Diapers).

Lift: 1.6667.

Rule 5:

If: Bread, Diapers Then: Beer.

Support: 0.4 (2 out of 5 transactions).

Confidence: 1.0 (2 out of 2 transactions with Bread and Diapers also contain Beer).

Lift: 1.6667.

Rule 6:

If: Beer, Diapers Then: Bread.

Support: 0.4 (2 out of 5 transactions).

Confidence: 0.6667 (2 out of 3 transactions with Beer and Diapers also contain Bread).

Lift: 0.8333.
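
The metrics for any of these rules can be cross-checked against the apriori output with a small base R helper. This is a sketch only: the functions support() and rule_metrics() are hypothetical helpers written for this example, not part of arules, and the transactions are the same five used in the R code above.

```r
transactions <- list(
  c("Bread", "Milk"),
  c("Bread", "Diapers", "Beer", "Eggs"),
  c("Milk", "Diapers", "Beer", "Cola"),
  c("Bread", "Milk", "Diapers", "Beer"),
  c("Bread", "Milk", "Cola")
)

# Hypothetical helper: share of transactions containing every item in `items`
support <- function(items) {
  mean(sapply(transactions, function(basket) all(items %in% basket)))
}

# Hypothetical helper: support, confidence and lift for the rule lhs -> rhs
rule_metrics <- function(lhs, rhs) {
  supp <- support(c(lhs, rhs))
  conf <- supp / support(lhs)
  data.frame(
    rule       = paste0("{", paste(lhs, collapse = ", "), "} => {",
                        paste(rhs, collapse = ", "), "}"),
    support    = supp,
    confidence = conf,
    lift       = conf / support(rhs)
  )
}

# Recompute the six rules listed in this section
checks <- rbind(
  rule_metrics("Bread", "Milk"),
  rule_metrics("Milk", "Bread"),
  rule_metrics("Diapers", "Beer"),
  rule_metrics("Beer", "Diapers"),
  rule_metrics(c("Bread", "Diapers"), "Beer"),
  rule_metrics(c("Beer", "Diapers"), "Bread")
)
print(checks, digits = 4)
```

Because the helper works on item sets of any size, the same call handles both single-item and multi-item left-hand sides.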

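These rules help identify patterns in the purchasing behavior of customers in the supermarket. For instance, in the apriori output every transaction that contains Diapers also contains Beer (confidence 1.0, lift 1.6667), which is a strong positive association. Customers who buy Bread also buy Milk in 75% of cases, but the lift of 0.9375 is slightly below 1, so this pairing occurs no more often than would be expected if the two items were purchased independently.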