In this analysis, we will perform a Market Basket Analysis on a retail dataset from https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis using the Apriori algorithm for association rule mining. We will analyze item co-occurrence, generate rules, filter them, and visualize the results.
The analysis follows four steps:
1. Data loading and cleaning
2. Market Basket Analysis using Apriori
3. Rules focused on a specific item
4. Visualization of the rules
Load the required libraries for data manipulation, mining, and visualization.
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
library(arulesCBA)
library(ggrepel)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.4.3
library(ggplot2)
library(readxl)
Load the dataset and clean it by removing return orders (where the quantity is less than or equal to 0).
#setwd("C:/Users/32204/Desktop/USLfinalpapers/Association")
data <- read_excel("Assignment-1_Data.xlsx", sheet = "retaildata")
## Warning: Expecting numeric in A288774 / R288774C1: got 'A563185'
## Warning: Expecting numeric in A288775 / R288775C1: got 'A563186'
## Warning: Expecting numeric in A288776 / R288776C1: got 'A563187'
data <- data[data$Quantity > 0, ] # Remove return orders where Quantity is less than or equal to 0
data <- data[, c("BillNo", "Itemname")] # Keep only BillNo and Itemname columns
head(data)
## # A tibble: 6 × 2
## BillNo Itemname
## <dbl> <chr>
## 1 536365 WHITE HANGING HEART T-LIGHT HOLDER
## 2 536365 WHITE METAL LANTERN
## 3 536365 CREAM CUPID HEARTS COAT HANGER
## 4 536365 KNITTED UNION FLAG HOT WATER BOTTLE
## 5 536365 RED WOOLLY HOTTIE WHITE HEART.
## 6 536365 SET 7 BABUSHKA NESTING BOXES
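The warnings above show that some BillNo values (for example 'A563185') could not be parsed as numbers, so they are read in as NA. The following is a minimal base R sketch of how such NA keys behave in the later steps. It uses made-up toy rows, not the real dataset, and the `which()` guard is an assumption rather than part of the original code.

```r
# Toy rows (hypothetical values, not taken from the dataset)
toy <- data.frame(
  BillNo   = c(536365, 536365, NA, 536366),
  Itemname = c("LANTERN", "MUG", "ADJUSTMENT", "HAND WARMER"),
  Quantity = c(6, -1, 1, 2),
  stringsAsFactors = FALSE
)

# which() keeps only rows where the condition is TRUE (never NA)
kept <- toy[which(toy$Quantity > 0), c("BillNo", "Itemname")]

# Count rows whose bill number could not be parsed
n_unparsed <- sum(is.na(kept$BillNo))

# split() silently drops elements whose grouping key is NA
baskets <- split(kept$Itemname, kept$BillNo)

print(n_unparsed)
print(names(baskets))
```

In this sketch the row with the NA bill number survives the quantity filter but never reaches a basket, so counting such rows before splitting makes the loss visible.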
Convert the data into a transaction format that the arules package can work with.
transactions <- as(split(data$Itemname, data$BillNo), "transactions")
## Warning in asMethod(object): removing duplicated items in transactions
inspect(transactions[1:10]) # Inspect the first 10 transactions to verify
## items transactionID
## [1] {CREAM CUPID HEARTS COAT HANGER,
## GLASS STAR FROSTED T-LIGHT HOLDER,
## KNITTED UNION FLAG HOT WATER BOTTLE,
## RED WOOLLY HOTTIE WHITE HEART.,
## SET 7 BABUSHKA NESTING BOXES,
## WHITE HANGING HEART T-LIGHT HOLDER,
## WHITE METAL LANTERN} 536365
## [2] {HAND WARMER RED POLKA DOT,
## HAND WARMER UNION JACK} 536366
## [3] {ASSORTED COLOUR BIRD ORNAMENT,
## BOX OF 6 ASSORTED COLOUR TEASPOONS,
## BOX OF VINTAGE ALPHABET BLOCKS,
## BOX OF VINTAGE JIGSAW BLOCKS,
## DOORMAT NEW ENGLAND,
## FELTCRAFT PRINCESS CHARLOTTE DOLL,
## HOME BUILDING BLOCK WORD,
## IVORY KNITTED MUG COSY,
## LOVE BUILDING BLOCK WORD,
## POPPY'S PLAYHOUSE BEDROOM,
## POPPY'S PLAYHOUSE KITCHEN,
## RECIPE BOX WITH METAL HEART} 536367
## [4] {BLUE COAT RACK PARIS FASHION,
## JAM MAKING SET WITH JARS,
## RED COAT RACK PARIS FASHION,
## YELLOW COAT RACK PARIS FASHION} 536368
## [5] {BATH BUILDING BLOCK WORD} 536369
## [6] {ALARM CLOCK BAKELIKE GREEN,
## ALARM CLOCK BAKELIKE PINK,
## ALARM CLOCK BAKELIKE RED,
## CHARLOTTE BAG DOLLY GIRL DESIGN,
## CIRCUS PARADE LUNCH BOX,
## INFLATABLE POLITICAL GLOBE,
## LUNCH BOX I LOVE LONDON,
## MINI JIGSAW CIRCUS PARADE,
## MINI JIGSAW SPACEBOY,
## MINI PAINT SET VINTAGE,
## PANDA AND BUNNIES STICKER SHEET,
## POSTAGE,
## RED TOADSTOOL LED NIGHT LIGHT,
## ROUND SNACK BOXES SET OF4 WOODLAND,
## SET 2 TEA TOWELS I LOVE LONDON,
## SET/2 RED RETROSPOT TEA TOWELS,
## SPACEBOY LUNCH BOX,
## STARS GIFT TAPE,
## VINTAGE HEADS AND TAILS CARD GAME,
## VINTAGE SEASIDE JIGSAW PUZZLES} 536370
## [7] {PAPER CHAIN KIT 50'S CHRISTMAS} 536371
## [8] {HAND WARMER RED POLKA DOT,
## HAND WARMER UNION JACK} 536372
## [9] {CREAM CUPID HEARTS COAT HANGER,
## EDWARDIAN PARASOL RED,
## GLASS STAR FROSTED T-LIGHT HOLDER,
## KNITTED UNION FLAG HOT WATER BOTTLE,
## RED WOOLLY HOTTIE WHITE HEART.,
## RETRO COFFEE MUGS ASSORTED,
## SAVE THE PLANET MUG,
## SET 7 BABUSHKA NESTING BOXES,
## VINTAGE BILLBOARD DRINK ME MUG,
## VINTAGE BILLBOARD LOVE/HATE MUG,
## WHITE HANGING HEART T-LIGHT HOLDER,
## WHITE METAL LANTERN,
## WOOD 2 DRAWER CABINET WHITE FINISH,
## WOOD S/3 CABINET ANT WHITE FINISH,
## WOODEN FRAME ANTIQUE WHITE,
## WOODEN PICTURE FRAME WHITE FINISH} 536373
## [10] {VICTORIAN SEWING BOX LARGE} 536374
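The "removing duplicated items" warning reflects that a transaction is a set: an item listed several times on one bill counts once. A small base R sketch of the same idea on hypothetical rows, using `unique()` as a stand-in for what the coercion does:

```r
# Hypothetical long-format rows: bill 1 lists "MUG" twice
rows <- data.frame(
  BillNo   = c(1, 1, 1, 2),
  Itemname = c("MUG", "MUG", "LANTERN", "MUG"),
  stringsAsFactors = FALSE
)

# Group item names by bill, then keep each item once per bill
baskets <- lapply(split(rows$Itemname, rows$BillNo), unique)

# Number of distinct items per basket
basket_sizes <- lengths(baskets)
print(baskets)
```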
Use a cross table to analyze item co-occurrence, and compute the relative frequency of each item.
# Create a cross table to show how often items co-occur
cross_tab <- crossTable(transactions, measure = "count", sort = TRUE)
# Calculate the relative frequency (support) of each item
item_freq <- itemFrequency(transactions, type = "relative")
head(item_freq) # Show the first six items in label order (not sorted by frequency)
## *Boombox Ipod Classic *USB Office Mirror Ball
## 4.920291e-05 9.840583e-05
## ? 10 COLOUR SPACEBOY PEN
## 2.952175e-04 1.535131e-02
## 12 COLOURED PARTY BALLOONS 12 DAISY PEGS IN WOOD BOX
## 7.872466e-03 3.591813e-03
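To make the two quantities concrete, here is a base R sketch on four hypothetical baskets. It builds a transaction-by-item incidence matrix, derives pairwise co-occurrence counts with `crossprod()`, and sorts relative item frequencies so the most frequent items come first. The basket contents and item names are invented for illustration.

```r
# Four hypothetical baskets
baskets <- list(
  t1 = c("bread", "milk"),
  t2 = c("bread", "butter"),
  t3 = c("bread", "milk", "butter"),
  t4 = c("milk")
)

# Incidence matrix: one row per basket, one column per item
items <- sort(unique(unlist(baskets)))
incidence <- t(sapply(baskets, function(b) items %in% b))
colnames(incidence) <- items

# Co-occurrence counts: entry [i, j] = number of baskets containing both i and j
co_counts <- crossprod(incidence * 1)

# Relative item frequency, most frequent first
rel_freq <- sort(colMeans(incidence), decreasing = TRUE)

print(co_counts)
print(rel_freq)
```

The diagonal of `co_counts` holds each item's own count, and sorting `rel_freq` is what turns a label-ordered frequency vector into a list of the most frequent items.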
Use the Apriori algorithm to mine association rules with a minimum support of 1%, a minimum confidence of 30%, and rule lengths between 2 and 4 items.
# Apply Apriori algorithm to generate association rules
rules <- apriori(transactions, parameter = list(supp = 0.01, conf = 0.3, minlen = 2, maxlen = 4))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 4 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 203
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[4055 item(s), 20324 transaction(s)] done [0.08s].
## sorting and recoding items ... [779 item(s)] done [0.01s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4
## Warning in apriori(transactions, parameter = list(supp = 0.01, conf = 0.3, :
## Mining stopped (maxlen reached). Only patterns up to a length of 4 returned!
## done [0.02s].
## writing ... [1370 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
# Sort the rules by lift to prioritize stronger rules
rules <- sort(rules, by = "lift", decreasing = TRUE)
# Remove redundant rules
rules_cleaned <- rules[!is.redundant(rules)]
# Inspect the top 10 cleaned rules
inspect(head(rules_cleaned, 10))
## lhs rhs support confidence
## [1] {HERB MARKER ROSEMARY} => {HERB MARKER THYME} 0.01067703 0.9273504
## [2] {HERB MARKER THYME} => {HERB MARKER ROSEMARY} 0.01067703 0.9313305
## [3] {HERB MARKER THYME} => {HERB MARKER PARSLEY} 0.01033261 0.9012876
## [4] {HERB MARKER PARSLEY} => {HERB MARKER THYME} 0.01033261 0.9051724
## [5] {HERB MARKER ROSEMARY} => {HERB MARKER PARSLEY} 0.01033261 0.8974359
## [6] {HERB MARKER PARSLEY} => {HERB MARKER ROSEMARY} 0.01033261 0.9051724
## [7] {HERB MARKER MINT} => {HERB MARKER PARSLEY} 0.01023421 0.8813559
## [8] {HERB MARKER PARSLEY} => {HERB MARKER MINT} 0.01023421 0.8965517
## [9] {HERB MARKER ROSEMARY} => {HERB MARKER BASIL} 0.01028341 0.8931624
## [10] {HERB MARKER BASIL} => {HERB MARKER ROSEMARY} 0.01028341 0.8855932
## coverage lift count
## [1] 0.01151348 80.89043 217
## [2] 0.01146428 80.89043 217
## [3] 0.01146428 78.95590 210
## [4] 0.01141508 78.95590 210
## [5] 0.01151348 78.61848 210
## [6] 0.01141508 78.61848 210
## [7] 0.01161189 77.20982 208
## [8] 0.01141508 77.20982 208
## [9] 0.01151348 76.91793 209
## [10] 0.01161189 76.91793 209
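The measures in this table are related by simple arithmetic, which can be checked by hand for rule [1] ({HERB MARKER ROSEMARY} => {HERB MARKER THYME}). The sketch below uses only numbers printed above: the rule's count, the total number of transactions, and the coverage values of rules [1] and [2], which equal the support of ROSEMARY and THYME respectively. The variable names are illustrative.

```r
n_transactions <- 20324      # from the apriori log
count_both     <- 217        # count of rule [1]
supp_lhs       <- 0.01151348 # coverage of rule [1] = support of ROSEMARY
supp_rhs       <- 0.01146428 # coverage of rule [2] = support of THYME

# support = share of transactions containing both items
support <- count_both / n_transactions

# confidence = support(lhs and rhs) / support(lhs)
confidence <- support / supp_lhs

# lift = confidence / support(rhs)
lift <- confidence / supp_rhs

print(c(support = support, confidence = confidence, lift = lift))
```

The results agree with the printed values up to rounding: support of about 0.0107, confidence of about 0.927, and lift of about 80.9. A lift this far above 1 means the pair occurs together far more often than it would if the two items were bought independently.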
Generate rules where “WHITE METAL LANTERN” appears in the right-hand side of the rule.
# Generate rules where "WHITE METAL LANTERN" is in the rhs of the rule
rules_target <- apriori(transactions,
                        parameter = list(supp = 0.001, conf = 0.3, minlen = 2),
                        appearance = list(rhs = c("WHITE METAL LANTERN"), default = "lhs"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.001 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 20
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[4055 item(s), 20324 transaction(s)] done [0.08s].
## sorting and recoding items ... [2812 item(s)] done [0.01s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4
## Warning in apriori(transactions, parameter = list(supp = 0.001, conf = 0.3, :
## Mining stopped (time limit reached). Only patterns up to a length of 4
## returned!
## done [24.88s].
## writing ... [13 rule(s)] done [2.43s].
## creating S4 object ... done [0.66s].
# Inspect the first 10 of the 13 rules (rules_target is not sorted, so these are not the top rules by lift)
inspect(head(rules_target, 10))
## lhs rhs support confidence coverage lift count
## [1] {GLASS STAR FROSTED T-LIGHT HOLDER,
## WHITE HANGING HEART T-LIGHT HOLDER} => {WHITE METAL LANTERN} 0.001180870 0.3428571 0.003444204 23.30511 24
## [2] {KNITTED UNION FLAG HOT WATER BOTTLE,
## RETRO COFFEE MUGS ASSORTED} => {WHITE METAL LANTERN} 0.001033261 0.5526316 0.001869711 37.56416 21
## [3] {RETRO COFFEE MUGS ASSORTED,
## WHITE HANGING HEART T-LIGHT HOLDER} => {WHITE METAL LANTERN} 0.001180870 0.4615385 0.002558551 31.37227 24
## [4] {SAVE THE PLANET MUG,
## VINTAGE BILLBOARD DRINK ME MUG} => {WHITE METAL LANTERN} 0.001131667 0.3898305 0.002902972 26.49804 23
## [5] {KNITTED UNION FLAG HOT WATER BOTTLE,
## VINTAGE BILLBOARD DRINK ME MUG} => {WHITE METAL LANTERN} 0.001033261 0.6000000 0.001722102 40.78395 21
## [6] {VINTAGE BILLBOARD DRINK ME MUG,
## WHITE HANGING HEART T-LIGHT HOLDER} => {WHITE METAL LANTERN} 0.001230073 0.3472222 0.003542610 23.60182 25
## [7] {KNITTED UNION FLAG HOT WATER BOTTLE,
## VINTAGE BILLBOARD LOVE/HATE MUG} => {WHITE METAL LANTERN} 0.001131667 0.6969697 0.001623696 47.37529 23
## [8] {KNITTED UNION FLAG HOT WATER BOTTLE,
## SET 7 BABUSHKA NESTING BOXES} => {WHITE METAL LANTERN} 0.001082464 0.5000000 0.002164928 33.98662 22
## [9] {KNITTED UNION FLAG HOT WATER BOTTLE,
## VINTAGE BILLBOARD DRINK ME MUG,
## WHITE HANGING HEART T-LIGHT HOLDER} => {WHITE METAL LANTERN} 0.001033261 0.6562500 0.001574493 44.60744 21
## [10] {KNITTED UNION FLAG HOT WATER BOTTLE,
## RED WOOLLY HOTTIE WHITE HEART.,
## WHITE HANGING HEART T-LIGHT HOLDER} => {WHITE METAL LANTERN} 0.001180870 0.4000000 0.002952175 27.18930 24
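Because these ten rules are listed in mining order, it helps to rank them before reading them. The sketch below copies the confidence, lift, and count columns from the output above into a plain data frame and orders it by lift. The data frame and its column names are an illustration only. On the actual rules object, the same effect should be achievable with `sort(rules_target, by = "lift")`, as was done for `rules` earlier.

```r
# Values copied from the inspect() output above; rule = row number in that output
target_rules <- data.frame(
  rule = 1:10,
  confidence = c(0.3428571, 0.5526316, 0.4615385, 0.3898305, 0.6000000,
                 0.3472222, 0.6969697, 0.5000000, 0.6562500, 0.4000000),
  lift = c(23.30511, 37.56416, 31.37227, 26.49804, 40.78395,
           23.60182, 47.37529, 33.98662, 44.60744, 27.18930),
  count = c(24, 21, 24, 23, 21, 25, 23, 22, 21, 24)
)

# Rank the rules from highest to lowest lift
ranked <- target_rules[order(target_rules$lift, decreasing = TRUE), ]
print(head(ranked, 3))
```

Ranked this way, the strongest rule among the ten shown is rule [7], followed by rules [9] and [5]. All three have KNITTED UNION FLAG HOT WATER BOTTLE on the left-hand side.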
Visualize the association rules using three methods: a graph, a grouped matrix, and a scatter plot of support against confidence.
# Plot the rules as a graph (only top 20 rules)
plot(rules_cleaned, method = "graph", max = 20, engine = "igraph")
## Warning: Too many rules supplied. Only plotting the best 20 using 'lift'
## (change control parameter max if needed).
# Grouped visualization; max.overlaps is not a control parameter of this method,
# so set its own parameters instead (k = 20 and rhs_max = 10 are the defaults)
plot(rules_cleaned, method = "grouped", control = list(k = 20, rhs_max = 10))
# Plot support vs. confidence with shading based on lift
plot(rules_cleaned, measure = c("support", "confidence"), shading = "lift", jitter = 0)
Strongly associated products: HERB MARKER ROSEMARY is strongly associated with HERB MARKER THYME and HERB MARKER PARSLEY. These rules have a confidence of about 0.90 to 0.93 and a lift of about 79 to 81, indicating that the items are very often purchased together.
Products with weaker association: The relationship between REGENCY TEA PLATE ROSES, REGENCY TEA PLATE GREEN, and REGENCY TEA PLATE PINK is relatively weak, as suggested by the comparatively low lift of the rules that connect them.
Strength of the rule: Certain products, such as the HERB MARKER series, have very high lift. This means these combinations are purchased together far more often than would be expected if the items were bought independently, so they may have high commercial value.
The association between HERB MARKER ROSEMARY and HERB MARKER THYME is very strong: this pair has the highest lift (about 80.9) and therefore appears in dark red in the plot.
Items such as REGENCY TEA PLATE PINK and REGENCY TEA PLATE GREEN are also connected to multiple other items, and a large rule network forms around them.
Some products (such as REGENCY TEA PLATE PINK and REGENCY TEA PLATE GREEN) appear in multiple rules, but the support and lift of those rules are relatively low, so their association is less pronounced than that of products whose rules have higher support or lift.
In the upper-left part of the scatter plot, a few rules have low support (close to 0.01) but high confidence (close to 0.9) and high lift. These rules indicate strong associations: the items appear in relatively few transactions overall, but when one of them is purchased, the others are very likely to be purchased with it.
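The upper-left region of the scatter plot can also be isolated numerically. The sketch below does this for the ten highest-lift rules, using the support, confidence, and lift values printed in the earlier rule table. The thresholds (support below 0.011, confidence of at least 0.9) are assumptions chosen for illustration, not values used in the analysis.

```r
# Values copied from the top-10 rule table; rule = row number in that table
top_rules <- data.frame(
  rule = 1:10,
  support = c(0.01067703, 0.01067703, 0.01033261, 0.01033261, 0.01033261,
              0.01033261, 0.01023421, 0.01023421, 0.01028341, 0.01028341),
  confidence = c(0.9273504, 0.9313305, 0.9012876, 0.9051724, 0.8974359,
                 0.9051724, 0.8813559, 0.8965517, 0.8931624, 0.8855932),
  lift = c(80.89043, 80.89043, 78.95590, 78.95590, 78.61848,
           78.61848, 77.20982, 77.20982, 76.91793, 76.91793)
)

# Hypothetical thresholds marking the "low support, high confidence" region
upper_left <- top_rules[top_rules$support < 0.011 & top_rules$confidence >= 0.9, ]
print(upper_left)
```

With these thresholds the selection contains rules [1], [2], [3], [4], and [6], all of which link HERB MARKER items. On the rules object itself, a comparable filter could likely be written as `subset(rules_cleaned, support < 0.011 & confidence >= 0.9)`.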