This report presents a comprehensive market basket analysis using association rule mining techniques. The analysis identifies product bundles and purchasing patterns that can inform strategic business decisions such as product placement, cross-selling strategies, and promotional bundle creation. Using the Apriori algorithm, I extracted meaningful associations between products and visualized these relationships through multiple interactive and static visualizations.
Key Findings:
Market Basket Analysis (MBA) is a data mining technique used to discover associations between items that customers purchase together. The primary goal is to identify which products are frequently bought in combination, enabling businesses to:
Association rules are expressed in the form: {A, B} → {C}, which reads as “if a customer purchases products A and B, they are likely to purchase product C.”
Three fundamental metrics evaluate association rules:
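The three metrics conventionally used for this purpose are support, confidence, and lift. As a minimal sketch, assuming a hypothetical six-basket dataset (the items A, B, C and the data are invented for illustration), they can be computed in base R as follows:

```r
# Toy basket matrix (hypothetical data): rows are transactions, columns are items
baskets <- matrix(
  c(1, 1, 1,
    1, 1, 0,
    1, 0, 1,
    0, 1, 1,
    1, 1, 1,
    0, 0, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
) == 1

n <- nrow(baskets)

# Rule {A, B} -> {C}
lhs <- baskets[, "A"] & baskets[, "B"]   # baskets containing the antecedent
rhs <- baskets[, "C"]                    # baskets containing the consequent

support    <- sum(lhs & rhs) / n           # share of baskets with A, B and C
confidence <- sum(lhs & rhs) / sum(lhs)    # share of {A, B} baskets that also hold C
lift       <- confidence / (sum(rhs) / n)  # confidence relative to the base rate of C

c(support = support, confidence = confidence, lift = lift)
```

In this toy data the lift is below 1, so buying A and B together makes C slightly less likely than its overall base rate.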
The Apriori algorithm (Agrawal & Srikant, 1994) is one of the most widely used methods for mining association rules. It operates on the principle that:
“If an itemset is frequent, then all of its subsets must also be frequent”
This property allows the algorithm to efficiently prune the search space by eliminating infrequent itemsets early in the process.
Algorithm Steps:
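The level-wise search can be sketched in base R as below. This is an illustrative sketch only, not the implementation used by the arules package; the function name `frequent_itemsets`, the helper `count_of`, and the toy data are all hypothetical. It covers the frequent-itemset stage (generate candidates, prune by the subset property, count) and leaves rule generation out.

```r
# Level-wise frequent itemset search (illustrative sketch, not the arules implementation)
frequent_itemsets <- function(baskets, min_count) {
  items <- colnames(baskets)
  # Number of baskets that contain every item in `set`
  count_of <- function(set) sum(rowSums(baskets[, set, drop = FALSE]) == length(set))

  # Level 1: frequent single items
  level <- Filter(function(s) count_of(s) >= min_count, as.list(items))
  found <- level

  while (length(level) > 1) {
    k <- length(level[[1]]) + 1
    # Candidate generation: unions of two frequent sets that have exactly k items
    cand <- list()
    for (i in seq_along(level)) for (j in seq_along(level)) {
      if (i < j) {
        u <- sort(union(level[[i]], level[[j]]))
        if (length(u) == k) cand[[paste(u, collapse = ",")]] <- u
      }
    }
    # Pruning: every (k-1)-subset of a candidate must itself be frequent
    keys <- vapply(level, paste, character(1), collapse = ",")
    cand <- Filter(function(u) {
      all(vapply(seq_along(u),
                 function(d) paste(u[-d], collapse = ",") %in% keys,
                 logical(1)))
    }, cand)
    # Counting: keep candidates that meet the minimum count
    level <- unname(Filter(function(u) count_of(u) >= min_count, cand))
    found <- c(found, level)
  }
  found
}

# Hypothetical data: item D appears once and is dropped at level 1
baskets <- matrix(
  c(1, 1, 1, 0,
    1, 1, 0, 0,
    1, 0, 1, 0,
    0, 1, 1, 0,
    1, 1, 1, 0,
    0, 0, 1, 1),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C", "D"))
) == 1

result <- frequent_itemsets(baskets, min_count = 3)
vapply(result, paste, character(1), collapse = ",")
```

Because D is infrequent on its own, no candidate containing D is ever generated or counted, which is the pruning effect described above.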
The dataset comprises 464 transactions and 22 product variables, structured in a binary market-basket format in which each row represents an individual customer purchase event and each column corresponds to a distinct product category. Item incidence is encoded dichotomously (1 = purchased, 0 = not purchased), thereby capturing the presence or absence of products within each shopping basket rather than purchase quantities.
##
## === STEP 1: Loading Data ===
# Loading the binary transaction matrix
data_raw <- read.csv("market.csv", sep = ";", header = TRUE, stringsAsFactors = FALSE)
The product space spans a mixed assortment of grocery and household goods, including staple foods (e.g., bread, bacon, cheese), fresh produce (e.g., banana, carrot, apple), pantry items (e.g., flour, sugar, salt), and non-food household products (e.g., toothpaste, shampoo, shaving foam). This composition reflects typical supermarket consumption patterns and provides a heterogeneous item universe suitable for co-occurrence analysis.
##
## Dataset dimensions: 464 22
## Number of transactions: 464
## Number of products: 22
##
## First few rows:
## Bread Honey Bacon Toothpaste Banana Apple Hazelnut Cheese Meat Carrot
## 1 1 0 1 0 1 1 1 0 0 1
## 2 1 1 1 0 1 1 1 0 0 0
## 3 0 1 1 1 1 1 1 1 1 0
## Cucumber Onion Milk Butter ShavingFoam Salt Flour HeavyCream Egg Olive
## 1 0 0 0 0 0 0 0 1 1 0
## 2 1 0 1 1 0 0 1 0 0 1
## 3 1 1 1 0 1 1 1 1 1 0
## Shampoo Sugar
## 1 0 1
## 2 1 0
## 3 0 1
The transformation yields a dataset comprising 464 discrete transactions, each representing a unique shopping basket or customer purchase event. This preprocessed structure enables the Apriori algorithm to efficiently identify co-occurrence patterns by rapidly querying which items appear together across transactions, thereby forming the foundation for subsequent association rule discovery and business intelligence generation.
# Converting to transaction format
# Method 1: If data has transaction IDs in first column
if (!all(data_raw[,1] %in% c(0,1))) {
trans_data <- as.matrix(data_raw[,-1])
rownames(trans_data) <- data_raw[,1]
} else {
# Method 2: All columns are products
trans_data <- as.matrix(data_raw)
}
# Converting to logical matrix then to transactions
trans_logical <- trans_data == 1
trans <- as(trans_logical, "transactions")
cat("\nTransactions object created successfully!\n")
##
## Transactions object created successfully!
## Total transactions: 464
## Total unique items: 22
The resulting transactional dataset comprises 22 unique items, each corresponding to a distinct product category captured in the original data. Each item is treated as a binary attribute, indicating whether a given product was purchased in a particular transaction or not.
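To make the encoding concrete, the sketch below uses a hypothetical three-row table with the same 0/1 layout and shows, in base R only, what the logical matrix represents. The data and object names (`toy`, `toy_baskets`) are invented for illustration. The arules `transactions` class stores the same information in sparse form.

```r
# Hypothetical 0/1 purchase table with the same layout as the report's data
toy <- data.frame(
  Bread  = c(1, 1, 0),
  Honey  = c(0, 1, 1),
  Butter = c(1, 1, 1)
)

toy_logical <- as.matrix(toy) == 1   # TRUE where the item was purchased

# Each row becomes the set of items present in that basket
toy_baskets <- lapply(seq_len(nrow(toy_logical)),
                      function(i) colnames(toy_logical)[toy_logical[i, ]])
toy_baskets

colSums(toy_logical)   # absolute item frequency
rowSums(toy_logical)   # basket size per transaction
```

The column sums correspond to the absolute item frequencies, and the row sums to the basket sizes, examined in the following sections.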
The exploratory frequency analysis identified the ten most frequently purchased products by computing absolute item occurrence counts across all transactions.
# Calculating item frequencies
item_freq <- itemFrequency(trans, type = "absolute")
item_freq_pct <- itemFrequency(trans, type = "relative")
# Top 10 most popular items
cat("\nTop 10 Most Purchased Products:\n")
##
## Top 10 Most Purchased Products:
## Banana Cheese Bacon Hazelnut Honey HeavyCream
## 208 206 200 195 193 193
## Carrot Bread Apple ShavingFoam
## 192 189 188 188
The results indicate that Banana was the most popular item, appearing in 208 transactions, closely followed by Cheese (206) and Bacon (200). A second tier of high-frequency products includes Hazelnut (195), Honey (193), and Heavy Cream (193), each demonstrating strong but slightly lower purchase prevalence. The remaining items within the top ten comprise Carrot (192), Bread (189), Apple (188), and Shaving Foam (188). Collectively, these quantities reflect the highest-demand products within the transactional dataset, as determined by absolute purchase frequency, and therefore represent the core items driving co-occurrence patterns in subsequent association rule analysis.
# Visualizing Top 10 items
par(mar = c(10, 4, 4, 2))
itemFrequencyPlot(trans, topN = 10, type = "absolute",
col = brewer.pal(8, "Set2"),
main = "Top 10 Most Frequent Items",
ylab = "Frequency (Absolute Count)",
cex.names = 0.8, las = 2)
The frequency range is narrow, spanning 188 to 208 occurrences, which indicates that demand is spread fairly evenly across the leading products rather than dominated by a single item. The small gap between the first and tenth rank means that the ordering among these items is not sharply differentiated. Substantively, the graph indicates that everyday consumables (particularly fresh fruit, dairy, and breakfast-related products) occupy most of the top positions, making them natural candidates for promotional bundling and association rule generation.
# Additional plot - relative frequency
itemFrequencyPlot(trans, topN = 10, type = "relative",
col = brewer.pal(8, "Pastel1"),
main = "Top 10 Items - Relative Frequency",
ylab = "Frequency (% of Transactions)",
cex.names = 0.8, las = 2)
Relative frequency indicates the share of transactions that contain each item.
Banana, Cheese, and Bacon exhibit the highest relative frequencies (≈43–45%), identifying them as primary demand drivers. These products should receive priority in inventory planning, including higher stock levels, frequent replenishment cycles, and strategic shelf placement, as stockouts in these categories would affect nearly half of all transactions.
A second investment tier comprises Hazelnut, Honey, and Heavy Cream (≈41–42%). Their strong but slightly lower penetration indicates reliable, repeat demand, making them suitable candidates for promotional bundling—particularly with complementary staples such as bakery or breakfast items.
The remaining high-frequency goods—Carrot, Bread, Apple, and Shaving Foam (≈40–41%)—also warrant sustained inventory commitment. Notably, the inclusion of both fresh produce and personal care items suggests cross-category purchasing routines, supporting mixed-product bundling strategies.
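The percentage bands quoted above follow directly from the absolute counts divided by the 464 transactions. A short check, using the counts from the top-10 output (the object names are illustrative):

```r
# Absolute counts as reported in the top-10 output
counts <- c(Banana = 208, Cheese = 206, Bacon = 200, Hazelnut = 195, Honey = 193,
            HeavyCream = 193, Carrot = 192, Bread = 189, Apple = 188, ShavingFoam = 188)
n_transactions <- 464

# Relative frequency = count / number of transactions
relative <- counts / n_transactions
round(100 * relative, 1)
```

All ten items fall between roughly 40% and 45% of baskets, which is why the tiers described above are separated by only a few percentage points.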
##
## Transaction Statistics:
## Average items per transaction: 8.79
## Median items per transaction: 9
## Max items in single transaction: 17
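# Basket size per transaction (assumed definition; the chunk that creates it is not shown)
trans_sizes <- size(trans)
# Plot transaction size distribution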
hist(trans_sizes,
breaks = 20,
col = "steelblue",
main = "Distribution of Basket Sizes",
xlab = "Number of Items per Transaction",
ylab = "Frequency")
abline(v = mean(trans_sizes), col = "red", lwd = 2, lty = 2)
legend("topright", legend = paste("Mean =", round(mean(trans_sizes), 2)),
col = "red", lty = 2, lwd = 2)
The histogram depicts the distribution of basket sizes, measured as the
number of items purchased per transaction, thereby illustrating customer
purchasing intensity across the 464 observed shopping events. The
distribution appears moderately bell-shaped, with the highest
concentration of transactions clustered between approximately 8 and 12
items, indicating that mid-sized baskets dominate purchasing
behaviour.
The mean basket size is 8.79 items, as indicated by the red dashed reference line. This value represents the average number of products purchased per shopping trip and suggests that a typical customer buys roughly nine items per transaction. The proximity of the mean to the central mass of the histogram indicates a relatively balanced distribution without extreme skewness.
Parameter Selection Rationale:
Lift is the most critical metric for identifying meaningful associations, as it accounts for the popularity of individual items.
Why Lift > 1.2?
##
## === STEP 3: Running Apriori Algorithm ===
# Generating rules with specified parameters
rules_all <- apriori(trans,
parameter = list(supp = 0.005, # Low support for niche bundles
conf = 0.5, # 50% confidence threshold
minlen = 2, # At least 2 items
maxlen = 10),
control = list(verbose = TRUE))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.005 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[22 item(s), 464 transaction(s)] done [0.00s].
## sorting and recoding items ... [22 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10
## done [0.13s].
## writing ... [1301947 rule(s)] done [0.34s].
## creating S4 object ... done [1.31s].
##
## Total rules generated: 1301947
# Filtering for high-lift rules (Lift > 1.2)
rules_high_lift <- subset(rules_all, lift > 1.2)
cat("Rules with Lift > 1.2:", length(rules_high_lift), "\n")
## Rules with Lift > 1.2: 1270920
Under these parameters, the algorithm generated a total of 1,301,947 association rules, reflecting the extensive combinatorial structure of product co-occurrence within the dataset. Each rule represents a probabilistic implication of the form {Item A, Item B} → {Item C}, indicating that customers purchasing the antecedent set are statistically likely to also purchase the consequent item.
To refine analytical relevance, the rule set was subsequently filtered using a lift threshold greater than 1.2, which removed 31,027 rules (about 2.4%) and left 1,270,920. Lift measures the strength of association relative to co-occurrence under independence; values above 1 indicate positive dependence, and 1.2 is used here as a working cut-off for a substantively meaningful purchase affinity. Because so few rules fall below the threshold, this step removes only the weakest associations. It does not by itself guard against coincidental patterns among rules that are supported by a handful of transactions.
Products that repeatedly appear together in high-support, high-confidence, and high-lift rules can be interpreted as complementary goods, suitable for bundling, co-promotion, or adjacency placement in retail layouts.
##
## --- Rule Quality Metrics ---
## set of 1270920 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3 4 5 6 7 8 9 10
## 38 1826 14700 71082 244432 471191 340910 107946 18795
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 6.000 7.000 7.138 8.000 10.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.006466 Min. :0.5000 Min. :0.006466 Min. :1.200
## 1st Qu.:0.006466 1st Qu.:0.5833 1st Qu.:0.008621 1st Qu.:1.442
## Median :0.008621 Median :0.6667 Median :0.012931 Median :1.673
## Mean :0.010774 Mean :0.7085 Mean :0.016474 Mean :1.764
## 3rd Qu.:0.010776 3rd Qu.:0.8000 3rd Qu.:0.017241 3rd Qu.:1.985
## Max. :0.241379 Max. :1.0000 Max. :0.448276 Max. :2.729
## count
## Min. : 3.000
## 1st Qu.: 3.000
## Median : 4.000
## Mean : 4.999
## 3rd Qu.: 5.000
## Max. :112.000
##
## mining info:
## data ntransactions support confidence
## trans 464 0.005 0.5
## call
## apriori(data = trans, parameter = list(supp = 0.005, conf = 0.5, minlen = 2, maxlen = 10), control = list(verbose = TRUE))
A total of 1,270,920 association rules were retained after lift filtering, with rule lengths most commonly between 6 and 8 items (median = 7; mean ≈ 7.14), indicating that most retained rules involve long antecedents. Support values are generally low (mean ≈ 0.0108; median count = 4 transactions), reflecting that most rules describe niche rather than mass purchasing patterns. Confidence is relatively high (mean ≈ 0.71), but because a typical rule rests on only three to five baskets, these estimates should be read with caution. Lift statistics (mean ≈ 1.76; max = 2.73) indicate positive product affinities, where co-occurrence exceeds what independence would predict.
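Since support is simply count divided by the number of transactions, the support quartiles in the summary can be converted back to basket counts. The sketch below does this with the reported figures (object names are illustrative) and also shows the smallest whole count that satisfies the 0.005 support threshold.

```r
n_transactions <- 464

# Support values reported in the summary (min, Q1, median, Q3, max)
support <- c(min = 0.006466, q1 = 0.006466, median = 0.008621,
             q3 = 0.010776, max = 0.241379)

# Support is count / n, so count = support * n
counts <- round(support * n_transactions)
counts

# Smallest whole count with support >= 0.005 (0.005 * 464 = 2.32)
min_count <- ceiling(0.005 * n_transactions)
min_count
```

The recovered counts agree with the `count` column of the summary, and the minimum of three baskets explains why the lowest reported support is 0.006466 rather than 0.005.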
For business implementation, I focused on simple bundle structures that are easy to communicate and operationalize.
Bundle Format Rationale:
##
## === STEP 4: Identifying Optimal Product Bundles ===
# Filtering for simple bundles: LHS (1-2 items) => RHS (1 item)
# This format is best for "Buy A+B, Get C" bundles
# Getting LHS and RHS sizes
lhs_sizes <- size(lhs(rules_high_lift))
rhs_sizes <- size(rhs(rules_high_lift))
# Filter criteria
bundle_rules <- rules_high_lift[lhs_sizes >= 1 & lhs_sizes <= 2 & rhs_sizes == 1]
cat("\nFiltered bundle rules (LHS: 1-2 items, RHS: 1 item):", length(bundle_rules), "\n")
##
## Filtered bundle rules (LHS: 1-2 items, RHS: 1 item): 1864
# Sort by lift to find strongest associations
bundle_rules_sorted <- sort(bundle_rules, by = "lift", decreasing = TRUE)
cat("\n--- TOP 20 PRODUCT BUNDLE OPPORTUNITIES ---\n")
##
## --- TOP 20 PRODUCT BUNDLE OPPORTUNITIES ---
## lhs rhs support confidence coverage
## [1] {Bacon, Cheese} => {Butter} 0.13793103 0.6153846 0.2241379
## [2] {Bacon, Sugar} => {Meat} 0.11853448 0.6321839 0.1875000
## [3] {Cheese, Onion} => {Butter} 0.11637931 0.6067416 0.1918103
## [4] {Meat, Salt} => {Sugar} 0.10344828 0.5925926 0.1745690
## [5] {Hazelnut, Shampoo} => {Butter} 0.10560345 0.6049383 0.1745690
## [6] {Bacon, Toothpaste} => {Butter} 0.10560345 0.6049383 0.1745690
## [7] {Carrot, Butter} => {Toothpaste} 0.09913793 0.6133333 0.1616379
## [8] {Bread, Onion} => {Butter} 0.09913793 0.5974026 0.1659483
## [9] {Milk, ShavingFoam} => {HeavyCream} 0.09698276 0.6617647 0.1465517
## [10] {Honey, Bacon} => {Meat} 0.12500000 0.6170213 0.2025862
## [11] {Meat, Salt} => {Shampoo} 0.10129310 0.5802469 0.1745690
## [12] {Hazelnut, Cheese} => {Butter} 0.12284483 0.5937500 0.2068966
## [13] {HeavyCream, Sugar} => {Salt} 0.11422414 0.6309524 0.1810345
## [14] {Onion, Sugar} => {Toothpaste} 0.09913793 0.6052632 0.1637931
## [15] {Bacon, ShavingFoam} => {Butter} 0.12715517 0.5900000 0.2155172
## [16] {Milk, Salt} => {HeavyCream} 0.09698276 0.6521739 0.1487069
## [17] {Bacon, HeavyCream} => {Milk} 0.11637931 0.5806452 0.2004310
## [18] {Banana, Butter} => {Bacon} 0.12068966 0.6746988 0.1788793
## [19] {Bacon, Onion} => {Banana} 0.13146552 0.7011494 0.1875000
## [20] {Bacon, Onion} => {ShavingFoam} 0.11853448 0.6321839 0.1875000
## lift count
## [1] 1.641026 64
## [2] 1.629630 55
## [3] 1.617978 54
## [4] 1.617429 48
## [5] 1.613169 49
## [6] 1.613169 49
## [7] 1.598801 46
## [8] 1.593074 46
## [9] 1.590978 45
## [10] 1.590544 58
## [11] 1.583733 47
## [12] 1.583333 57
## [13] 1.582497 53
## [14] 1.577765 46
## [15] 1.573333 59
## [16] 1.567921 45
## [17] 1.566392 54
## [18] 1.565301 56
## [19] 1.564103 61
## [20] 1.560284 55
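As a worked check of how the columns relate, the top rule {Bacon, Cheese} => {Butter} can be recomputed from counts. Only the rule count (64) is printed in the output; the antecedent count (104) and the Butter count (174) are inferred here from the reported coverage and lift, so treat them as derived values rather than figures taken from the data.

```r
n_transactions <- 464
count_rule <- 64   # baskets with Bacon, Cheese and Butter (reported)

# Inferred from reported coverage and lift; not printed in the output
count_lhs <- round(0.2241379 * n_transactions)               # Bacon and Cheese
count_rhs <- round(0.6153846 / 1.641026 * n_transactions)    # Butter

support    <- count_rule / n_transactions
coverage   <- count_lhs / n_transactions
confidence <- count_rule / count_lhs
lift       <- confidence / (count_rhs / n_transactions)

round(c(support = support, confidence = confidence,
        coverage = coverage, lift = lift), 6)
```

The recomputed values reproduce the first row of the table, which confirms that lift here is confidence divided by the overall share of baskets containing the consequent.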
Raw association rules often contain redundant or statistically insignificant patterns. I applied three refinement techniques:
##
## === STEP 5: Cleaning Rules ===
# Removing redundant rules
bundle_rules_clean <- bundle_rules_sorted[!is.redundant(bundle_rules_sorted)]
cat("After removing redundant rules:", length(bundle_rules_clean), "\n")
## After removing redundant rules: 1782
# Keeping only significant rules (Fisher's exact test)
bundle_rules_clean <- bundle_rules_clean[is.significant(bundle_rules_clean, trans)]
cat("After filtering for statistical significance:", length(bundle_rules_clean), "\n")
## After filtering for statistical significance: 1187
# Keeping only maximal rules
bundle_rules_clean <- bundle_rules_clean[is.maximal(bundle_rules_clean)]
cat("After keeping only maximal rules:", length(bundle_rules_clean), "\n")
## After keeping only maximal rules: 1149
##
## --- FINAL TOP 15 BUNDLE RECOMMENDATIONS ---
## lhs rhs support confidence coverage
## [1] {Bacon, Cheese} => {Butter} 0.13793103 0.6153846 0.2241379
## [2] {Bacon, Sugar} => {Meat} 0.11853448 0.6321839 0.1875000
## [3] {Cheese, Onion} => {Butter} 0.11637931 0.6067416 0.1918103
## [4] {Meat, Salt} => {Sugar} 0.10344828 0.5925926 0.1745690
## [5] {Hazelnut, Shampoo} => {Butter} 0.10560345 0.6049383 0.1745690
## [6] {Bacon, Toothpaste} => {Butter} 0.10560345 0.6049383 0.1745690
## [7] {Carrot, Butter} => {Toothpaste} 0.09913793 0.6133333 0.1616379
## [8] {Bread, Onion} => {Butter} 0.09913793 0.5974026 0.1659483
## [9] {Milk, ShavingFoam} => {HeavyCream} 0.09698276 0.6617647 0.1465517
## [10] {Honey, Bacon} => {Meat} 0.12500000 0.6170213 0.2025862
## [11] {Meat, Salt} => {Shampoo} 0.10129310 0.5802469 0.1745690
## [12] {Hazelnut, Cheese} => {Butter} 0.12284483 0.5937500 0.2068966
## [13] {HeavyCream, Sugar} => {Salt} 0.11422414 0.6309524 0.1810345
## [14] {Onion, Sugar} => {Toothpaste} 0.09913793 0.6052632 0.1637931
## [15] {Bacon, ShavingFoam} => {Butter} 0.12715517 0.5900000 0.2155172
## lift count
## [1] 1.641026 64
## [2] 1.629630 55
## [3] 1.617978 54
## [4] 1.617429 48
## [5] 1.613169 49
## [6] 1.613169 49
## [7] 1.598801 46
## [8] 1.593074 46
## [9] 1.590978 45
## [10] 1.590544 58
## [11] 1.583733 47
## [12] 1.583333 57
## [13] 1.582497 53
## [14] 1.577765 46
## [15] 1.573333 59
Refinement Techniques Explained:
Redundancy Removal: Eliminates rules that add no information beyond a more general rule. For example, if {A,B,C} → {D} has a confidence no higher than that of {A,B} → {D}, the former is redundant.
Statistical Significance: Applies Fisher’s exact test to verify that the association is unlikely to occur by random chance. The cut-off is the test's significance level; the call above does not set it explicitly, so the package default applies rather than a hand-picked p < 0.05.
Maximal Rules: Retains only rules whose item set is not a proper subset of the item set of another rule in the collection. This prevents listing multiple rules that essentially describe the same pattern.
Business Value: These refinements ensure that marketing teams focus on unique, statistically validated insights rather than redundant information.
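To illustrate the kind of test involved, the sketch below runs a one-sided Fisher's exact test in base R on the 2 x 2 table for {Bacon, Cheese} => {Butter}. It is not a reproduction of the arules internals. The margins 104 (antecedent) and 174 (Butter) are inferred from the reported coverage and lift rather than printed in the output, and the object names are illustrative.

```r
# Counts for {Bacon, Cheese} -> {Butter}; 104 and 174 are inferred, 64 is reported
n    <- 464
both <- 64    # antecedent and Butter
lhs  <- 104   # antecedent (Bacon and Cheese)
rhs  <- 174   # Butter

# Rows: antecedent present / absent; columns: Butter present / absent
tab <- matrix(c(both,       lhs - both,
                rhs - both, n - lhs - rhs + both),
              nrow = 2, byrow = TRUE,
              dimnames = list(antecedent = c("yes", "no"),
                              butter     = c("yes", "no")))
tab

# One-sided test: is Butter more likely when the antecedent is present?
test <- fisher.test(tab, alternative = "greater")
test$p.value
```

A small p-value indicates that a co-occurrence count this high would be unlikely if the antecedent and Butter were purchased independently.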
The network graph represents products as nodes and associations as directed edges, with edge thickness and color indicating rule strength.
##
## === STEP 6: Creating Visualizations ===
# Select top rules for visualization (to avoid clutter)
top_rules_viz <- head(bundle_rules_clean, 50)
# 1. Network Graph - Interactive
cat("\nGenerating interactive network graph...\n")
##
## Generating interactive network graph...
## Available control parameters (with default values):
## itemCol = #CBD2FC
## nodeCol = c("#EE0000", "#EE0303", "#EE0606", "#EE0909", "#EE0C0C", "#EE0F0F", "#EE1212", "#EE1515", "#EE1818", "#EE1B1B", "#EE1E1E", "#EE2222", "#EE2525", "#EE2828", "#EE2B2B", "#EE2E2E", "#EE3131", "#EE3434", "#EE3737", "#EE3A3A", "#EE3D3D", "#EE4040", "#EE4444", "#EE4747", "#EE4A4A", "#EE4D4D", "#EE5050", "#EE5353", "#EE5656", "#EE5959", "#EE5C5C", "#EE5F5F", "#EE6262", "#EE6666", "#EE6969", "#EE6C6C", "#EE6F6F", "#EE7272", "#EE7575", "#EE7878", "#EE7B7B", "#EE7E7E", "#EE8181", "#EE8484", "#EE8888", "#EE8B8B", "#EE8E8E", "#EE9191", "#EE9494", "#EE9797", "#EE9999", "#EE9B9B", "#EE9D9D", "#EE9F9F", "#EEA0A0", "#EEA2A2", "#EEA4A4", "#EEA5A5", "#EEA7A7", "#EEA9A9", "#EEABAB", "#EEACAC", "#EEAEAE", "#EEB0B0", "#EEB1B1", "#EEB3B3", "#EEB5B5", "#EEB7B7", "#EEB8B8", "#EEBABA", "#EEBCBC", "#EEBDBD", "#EEBFBF", "#EEC1C1", "#EEC3C3", "#EEC4C4", "#EEC6C6", "#EEC8C8", "#EEC9C9", "#EECBCB", "#EECDCD", "#EECFCF", "#EED0D0", "#EED2D2", "#EED4D4", "#EED5D5", "#EED7D7", "#EED9D9", "#EEDBDB", "#EEDCDC", "#EEDEDE", "#EEE0E0", "#EEE1E1", "#EEE3E3", "#EEE5E5", "#EEE7E7", "#EEE8E8", "#EEEAEA", "#EEECEC", "#EEEEEE")
## precision = 3
## igraphLayout = layout_nicely
## interactive = TRUE
## engine = visNetwork
## max = 100
## selection_menu = TRUE
## degree_highlight = 1
## verbose = FALSE
Interpretation of the Interactive Association Network Graph
This visualization reveals the relationship between rule frequency (support) and reliability (confidence), with lift as a third dimension.
plot(bundle_rules_clean,
measure = c("support", "confidence"),
shading = "lift",
main = "Product Bundle Rules: Support vs Confidence (colored by Lift)")
The grouped matrix organizes rules by clustering similar antecedents and consequents, revealing structural patterns in customer behavior.
plot(head(bundle_rules_clean, 30),
method = "grouped",
control = list(k = 5),
main = "Grouped Rules - Top 30 Bundles")
## Available control parameters (with default values):
## k = 20
## aggr.fun = function (x, ...) UseMethod("mean")
## rhs_max = 10
## lhs_label_items = 2
## col = c("#EE0000FF", "#EEEEEEFF")
## groups = NULL
## engine = ggplot2
## verbose = FALSE
From a purchasing standpoint, several staple-driven clusters are evident. Antecedent groups containing everyday consumables—such as dairy, breakfast, and pantry items—show strong lift relationships with complementary goods like Milk, Butter, Heavy Cream, and Bacon, indicating routine household replenishment missions. These diagonal concentrations suggest coherent within-cluster shopping patterns where customers repeatedly purchase functionally related staples in the same trip. Conversely, clusters featuring mixed grocery–household antecedents display off-diagonal hotspots linking items such as Toothpaste, Shampoo, and Meat, revealing cross-category baskets that combine food shopping with personal care restocking.
plot(head(bundle_rules_clean, 20),
method = "graph",
control = list(type = "items"),
main = "Top 20 Bundle Rules - Network View")
## Available control parameters (with default values):
## layout = stress
## circular = FALSE
## ggraphdots = NULL
## edges = <environment>
## nodes = <environment>
## nodetext = <environment>
## colors = c("#EE0000FF", "#EEEEEEFF")
## engine = ggplot2
## max = 100
## verbose = FALSE
The network highlights Butter, Bacon, and Cheese as the strongest bundle anchors, evidenced by the darkest red nodes and edges (highest lift ≈ 1.62–1.64) and relatively large support, indicating strong and frequent co-purchase. Mid-tier red associations connect Toothpaste–Shampoo and Carrot–Hazelnut, suggesting cross-category and specialty bundles with moderate but actionable lift. Paler, weakly connected items such as Salt, Sugar, and Honey show lower lift and sparser linkages in this view, although Sugar and Salt also appear in several top-ranked rules in the table above (for example {Bacon, Sugar} => {Meat}, lift 1.63), so they should not be dismissed as low-synergy add-ons on the basis of the graph alone.
plot(head(bundle_rules_clean, 20),
method = "paracoord",
control = list(reorder = TRUE),
main = "Bundle Rule Patterns - Parallel Coordinates")
The most commercially advantageous transitions converge on Butter, Milk, and Meat, marking them as high-impact add-on products when paired with baskets containing staples such as Cheese, Bread, or Bacon. Conversely, lighter, fragmented paths linked to items like Sugar, Salt, and Shampoo reflect weaker, less monetizable associations, signalling limited bundling leverage and lower incremental revenue potential.
The Interactive Rule Explorer enables real-time filtering, sorting, and querying of rules by key quality metrics—such as support, confidence, and lift—allowing marketing teams to rapidly isolate the most commercially relevant product combinations.
##
## === STEP 7: Launching Interactive Rule Explorer ===
## This will open an interactive dashboard in your browser...
##
## === STEP 8: BUSINESS RECOMMENDATIONS ===
## lhs rhs support confidence coverage lift count
## [1] {Bacon, Cheese} => {Butter} 0.137931 0.6153846 0.2241379 1.641026 64
## ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
## BUNDLE RECOMMENDATION # 1
## ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
##
## 📦 Bundle Name Suggestion:
## 'Bacon,Cheese + Butter Value Pack'
##
## 🛒 Bundle Components:
## Base Items: Bacon,Cheese
## Add-on Item: Butter
##
## 📊 Performance Metrics:
## • Lift: 1.64 (64.1% stronger than random)
## • Confidence: 61.5% (customers who buy base items also buy add-on)
## • Support: 13.79% (occurs in 64 transactions)
##
## 💡 Marketing Strategy:
## RECOMMENDED - Strong complementary relationship
## → Feature as 'Frequently Bought Together'
## → Offer 5-10% bundle discount
## → Add to product recommendation widgets
##
## 🎯 Expected Impact:
## If this bundle converts even 20% of applicable carts,
## you could influence ~ 13 transactions.
##
## lhs rhs support confidence coverage lift count
## [1] {Bacon, Sugar} => {Meat} 0.1185345 0.6321839 0.1875 1.62963 55
## ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
## BUNDLE RECOMMENDATION # 2
## ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
##
## 📦 Bundle Name Suggestion:
## 'Bacon,Sugar + Meat Value Pack'
##
## 🛒 Bundle Components:
## Base Items: Bacon,Sugar
## Add-on Item: Meat
##
## 📊 Performance Metrics:
## • Lift: 1.63 (63% stronger than random)
## • Confidence: 63.2% (customers who buy base items also buy add-on)
## • Support: 11.85% (occurs in 55 transactions)
##
## 💡 Marketing Strategy:
## RECOMMENDED - Strong complementary relationship
## → Feature as 'Frequently Bought Together'
## → Offer 5-10% bundle discount
## → Add to product recommendation widgets
##
## 🎯 Expected Impact:
## If this bundle converts even 20% of applicable carts,
## you could influence ~ 11 transactions.
##
## lhs rhs support confidence coverage lift count
## [1] {Cheese, Onion} => {Butter} 0.1163793 0.6067416 0.1918103 1.617978 54
## ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
## BUNDLE RECOMMENDATION # 3
## ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
##
## 📦 Bundle Name Suggestion:
## 'Cheese,Onion + Butter Value Pack'
##
## 🛒 Bundle Components:
## Base Items: Cheese,Onion
## Add-on Item: Butter
##
## 📊 Performance Metrics:
## • Lift: 1.62 (61.8% stronger than random)
## • Confidence: 60.7% (customers who buy base items also buy add-on)
## • Support: 11.64% (occurs in 54 transactions)
##
## 💡 Marketing Strategy:
## RECOMMENDED - Strong complementary relationship
## → Feature as 'Frequently Bought Together'
## → Offer 5-10% bundle discount
## → Add to product recommendation widgets
##
## 🎯 Expected Impact:
## If this bundle converts even 20% of applicable carts,
## you could influence ~ 11 transactions.
##
## === STEP 9: Exporting Results ===
# Convert rules to dataframe
rules_df <- as(bundle_rules_clean, "data.frame")
rules_df <- rules_df[order(-rules_df$lift), ]
# Add business-friendly names
rules_df$bundle_name <- paste0("Bundle_", seq_len(nrow(rules_df)))
# Save to CSV
# Note: this working directory is machine-specific; adjust it before running elsewhere
setwd("C:/Users/Asus/Downloads")
dir.create("outputs", showWarnings = FALSE)
write.csv(
rules_df,
"outputs/product_bundles_recommendations.csv",
row.names = FALSE
)
# Create summary report
summary_stats <- data.frame(
Metric = c("Total Transactions Analyzed",
"Total Products",
"Average Basket Size",
"Rules Generated (Initial)",
"High-Lift Rules (>1.2)",
"Final Bundle Recommendations",
"Top Bundle Lift",
"Average Confidence of Top 10"),
Value = c(length(trans),
length(itemLabels(trans)),
round(mean(size(trans)), 2),
length(rules_all),
length(rules_high_lift),
length(bundle_rules_clean),
round(max(quality(bundle_rules_clean)$lift), 2),
round(mean(quality(head(bundle_rules_clean, 10))$confidence) * 100, 1))
)
write.csv(summary_stats, "analysis_summary.csv", row.names = FALSE)
cat("✓ Saved: analysis_summary.csv\n")
## ✓ Saved: analysis_summary.csv
##
## === BONUS: Product Affinity Matrix ===
# Cross-tabulation of products
cross_tab <- crossTable(trans, measure = "lift", sort = TRUE)
cat("\nTop 5x5 Product Affinities (Lift):\n")
##
## Top 5x5 Product Affinities (Lift):
## Banana Cheese Bacon Hazelnut Honey
## Banana NA 1.13 1.25 1.18 1.14
## Cheese 1.13 NA 1.17 1.11 1.18
## Bacon 1.25 1.17 NA 1.23 1.13
## Hazelnut 1.18 1.11 1.23 NA 1.10
## Honey 1.14 1.18 1.13 1.10 NA
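A pairwise lift table of this kind can be derived from a binary basket matrix with a single cross-product. The sketch below uses a hypothetical four-basket matrix and base R only; it illustrates the calculation and is not the `crossTable` implementation.

```r
# Hypothetical binary basket matrix: rows are transactions, columns are items
m <- matrix(
  c(1, 1, 0,
    1, 1, 1,
    0, 1, 1,
    1, 0, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)

n      <- nrow(m)
joint  <- crossprod(m) / n                 # share of baskets with both items; diagonal = item share
p_item <- diag(joint)
lift   <- joint / outer(p_item, p_item)    # observed joint share / share expected under independence
diag(lift) <- NA                           # self-pairs are not meaningful
round(lift, 2)
```

Read in reverse, the reported Banana–Bacon lift of 1.25 implies roughly 1.25 × 208 × 200 / 464 ≈ 112 shared baskets, which is consistent with the maximum rule count of 112 in the rule summary.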
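# Find items that are frequently bought together
# Note: no minlen is set, so single items are included and, having the highest
# support, fill the top of the list shown below
freq_itemsets <- eclat(trans,
parameter = list(supp = 0.01, maxlen = 3),
control = list(verbose = FALSE))
cat("\nTop 10 Frequent Itemsets (2-3 products):\n")
##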
## Top 10 Frequent Itemsets (2-3 products):
## items support count
## [1] {Banana} 0.4482759 208
## [2] {Cheese} 0.4439655 206
## [3] {Bacon} 0.4310345 200
## [4] {Hazelnut} 0.4202586 195
## [5] {HeavyCream} 0.4159483 193
## [6] {Honey} 0.4159483 193
## [7] {Carrot} 0.4137931 192
## [8] {Bread} 0.4073276 189
## [9] {ShavingFoam} 0.4051724 188
## [10] {Apple} 0.4051724 188
##
## === Product Clustering Analysis ===
# Dissimilarity matrix for products
item_dissim <- dissimilarity(trans, which = "items", method = "jaccard")
# Hierarchical clustering
hc_items <- hclust(item_dissim, method = "ward.D2")
# Plot dendrogram
plot(hc_items,
main = "Product Clustering Dendrogram",
xlab = "Products",
sub = "Based on Co-occurrence Patterns")
rect.hclust(hc_items, k = 5, border = "red")
##
## Products have been clustered into natural groups.
## Recommendation for businesses: use these clusters for category-based bundle strategies
The dendrogram reveals five hierarchical product clusters formed on the basis of Jaccard co-occurrence similarity, indicating how frequently items appear together within transactions. Products joined at lower linkage heights exhibit stronger co-purchase affinity—these tight sub-branches reflect natural basket companions such as dairy–bakery or meat–pantry combinations, where joint consumption drives repeated pairing.
Broader clusters formed at higher linkage distances represent looser, cross-category co-occurrence, capturing mixed shopping missions that combine grocery staples with household or personal care goods. From a commercial standpoint, the structure confirms the existence of both high-synergy core bundles (tight clusters) and peripheral add-on products (distant branches), supporting category-based merchandising and cluster-driven bundle design.
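The clustering logic can be reproduced on a small scale in base R. The sketch below assumes a hypothetical six-basket matrix with invented purchase patterns. It relies on the fact that `dist(..., method = "binary")` returns the Jaccard distance between binary vectors, and it applies the same Ward linkage as the analysis above. Ward linkage is formally defined for Euclidean distances, so its use with Jaccard distances is a heuristic.

```r
# Hypothetical binary basket matrix: rows are transactions, columns are products
m <- matrix(
  c(1, 1, 0, 0,
    1, 1, 0, 0,
    1, 1, 1, 0,
    0, 0, 1, 1,
    0, 0, 1, 1,
    0, 1, 1, 1),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("Milk", "Butter", "Shampoo", "Toothpaste"))
)

# Jaccard distance between items: 1 - (baskets with both) / (baskets with either)
d  <- dist(t(m), method = "binary")
hc <- hclust(d, method = "ward.D2")

groups <- cutree(hc, k = 2)
groups
```

In this toy example the two grocery items and the two personal care items join at a low height (Jaccard distance 0.25) and are kept apart when the tree is cut into two groups, which mirrors the tight-branch versus distant-branch reading of the dendrogram.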
This market basket analysis has revealed several actionable, product-level commercial insights:
Strong Product Associations: Identified 1,149 statistically significant bundle opportunities, with lift values above 1.2 and reaching 1.64, confirming meaningful cross-selling potential beyond random co-occurrence.
Natural Product Clusters: Hierarchical clustering revealed 5 affinity groups, with the strongest co-purchase ecosystems concentrated around dairy, breakfast staples, meat products, and household essentials.
High-Value Cross-Selling Anchors:
The most commercially influential bundle drivers include:
These products function as basket anchors capable of increasing attachment rates and bundle profitability.
Focus: High-demand, high-profit staple bundles
Focus: Cross-category and specialty margin expansion
Focus: Testing and profit maximisation
Key Performance Indicators (KPIs) to track:
Current Limitations:
Recommended Future Analysis:
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the 20th VLDB Conference, 487-499.
Hahsler, M., Grün, B., & Hornik, K. (2005). arules – A computational environment for mining association rules and frequent item sets. Journal of Statistical Software, 14(15), 1-25.
Hahsler, M., & Chelluboina, S. (2011). Visualizing association rules: Introduction to the R-extension package arulesViz. R Project Module, 223-238.