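# Load required packages
library(arules)     # apriori(), inspect(), itemFrequencyPlot()
library(arulesViz)  # plot() methods for association rules
# Load dataset (first column holds the transaction IDs)
data <- read.csv("C:/Users/USER/Downloads/W4 - retail_transactions.csv",
                 header = TRUE, row.names = 1)
# FIX DATA FORMATTING
# Convert the 0/1 indicators to logical (adjust if your dataset codes purchases differently)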
data_logical <- data == 1
# Convert to transactions
transactions <- as(data_logical, "transactions")
# Verify
summary(transactions)
## transactions as itemMatrix in sparse format with
## 500 rows (elements/itemsets/transactions) and
## 20 columns (items) and a density of 0.2027
##
## most frequent items:
## Coffee Frozen.Foods Milk Cereal Butter (Other)
## 124 115 114 108 106 1460
##
## element (itemset/transaction) length distribution:
## sizes
## 2 3 4 5 6
## 101 91 101 94 113
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 3.000 4.000 4.054 5.000 6.000
##
## includes extended item information - examples:
## labels
## 1 Milk
## 2 Bread
## 3 Butter
##
## includes extended transaction information - examples:
## transactionID
## 1 T0001
## 2 T0002
## 3 T0003
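The figures in this summary are consistent with one another, which is a quick way to confirm that the 0/1 conversion worked as intended. The sketch below, in base R, recomputes the mean basket size and the density from the size distribution printed above. The variable names are illustrative and not part of the original analysis.

```r
# Basket-size distribution copied from the summary() output above
sizes  <- c(2, 3, 4, 5, 6)
counts <- c(101, 91, 101, 94, 113)

n_transactions <- sum(counts)          # 500 transactions
n_items        <- 20                   # columns (items) in the matrix
total_items    <- sum(sizes * counts)  # total item occurrences: 2027

# Mean basket size and matrix density follow directly from the totals
mean_size <- total_items / n_transactions              # 4.054
density   <- total_items / (n_transactions * n_items)  # 0.2027

print(c(mean_size = mean_size, density = density))
```

The total of 2027 item occurrences also matches the sum of the "most frequent items" row (124 + 115 + 114 + 108 + 106 + 1460).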
inspect(transactions[1:5])
## items transactionID
## [1] {Juice,
## Yogurt,
## Chips,
## Pasta} T0001
## [2] {Coffee,
## Tea,
## Chips,
## Rice,
## Snacks,
## Cleaning.Supplies} T0002
## [3] {Cereal,
## Chips} T0003
## [4] {Butter,
## Cereal,
## Coffee,
## Chips,
## Meat,
## Pasta} T0004
## [5] {Chips,
## Vegetables,
## Meat,
## Fish,
## Snacks,
## Cleaning.Supplies} T0005
# PART A: ITEM PROFILING
# Most frequent items
itemFrequencyPlot(transactions, topN = 10, type = "absolute")
# Frequent itemsets of size 1 or 2 (support lowered to 2% so that pairs can qualify)
itemsets <- apriori(transactions,
parameter = list(supp = 0.02,
target = "frequent itemsets",
maxlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## NA 0.1 1 none FALSE TRUE 5 0.02 1
## maxlen target ext
## 2 frequent itemsets TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 10
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[20 item(s), 500 transaction(s)] done [0.00s].
## sorting and recoding items ... [20 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(transactions, parameter = list(supp = 0.02, target =
## "frequent itemsets", : Mining stopped (maxlen reached). Only patterns up to a
## length of 2 returned!
## done [0.00s].
## sorting transactions ... done [0.00s].
## writing ... [209 set(s)] done [0.00s].
## creating S4 object ... done [0.00s].
# Sort by support. Because minlen defaults to 1, single items rank above all pairs here
top_pairs <- sort(itemsets, by = "support", decreasing = TRUE)
inspect(head(top_pairs, 5))
## items support count
## [1] {Coffee} 0.248 124
## [2] {Frozen.Foods} 0.230 115
## [3] {Milk} 0.228 114
## [4] {Cereal} 0.216 108
## [5] {Butter} 0.212 106
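The five itemsets listed above are all single items, so they do not yet show which pairs are bought together. In arules, restricting the result to pairs would typically be done with `minlen = 2` in the parameter list or by filtering on `size(itemsets) == 2`. To show what pair support means, here is a minimal base R sketch on a toy matrix; the data and variable names are hypothetical and are not the retail dataset.

```r
# Toy 0/1 purchase matrix (hypothetical data, one row per transaction)
toy <- matrix(c(1, 1, 0,
                1, 0, 1,
                1, 1, 1,
                0, 1, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("T", 1:4), c("Milk", "Bread", "Butter")))

# crossprod() gives item-by-item co-occurrence counts;
# dividing by the number of transactions turns counts into support
co_counts    <- crossprod(toy)
pair_support <- co_counts / nrow(toy)

# Keep each unordered pair once (upper triangle, diagonal excluded)
idx <- which(upper.tri(pair_support), arr.ind = TRUE)
pairs <- data.frame(
  item_a  = rownames(pair_support)[idx[, "row"]],
  item_b  = colnames(pair_support)[idx[, "col"]],
  support = pair_support[idx]
)

# Rank pairs from most to least frequent
pairs <- pairs[order(-pairs$support), ]
print(pairs)
```

The same idea applies to `data_logical`: the diagonal of the co-occurrence matrix holds single-item counts and the off-diagonal cells hold pair counts.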
# -----------------------------
# PART B: ASSOCIATION RULES
# -----------------------------
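# REQUIRED: 30% support is not attainable here (the most frequent item, Coffee, has support 0.248),
# so mine rules with 1% support and 30% confidence instead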
rules <- apriori(transactions,
parameter = list(supp = 0.01, conf = 0.3))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 5
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[20 item(s), 500 transaction(s)] done [0.00s].
## sorting and recoding items ... [20 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [242 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
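A 30% minimum support cannot produce any rules on this dataset, and the reason can be checked from the summary figures alone. The sketch below uses the item counts reported earlier; the variable names are illustrative, and only the five most frequent items are needed because every other item is rarer.

```r
# Item counts copied from the summary() output (five most frequent items)
item_counts <- c(Coffee = 124, Frozen.Foods = 115, Milk = 114,
                 Cereal = 108, Butter = 106)
n_transactions <- 500

item_support <- item_counts / n_transactions
max_support  <- max(item_support)          # 0.248 (Coffee)

# An itemset can never be more frequent than its rarest item,
# so no itemset can reach a threshold above max_support
feasible_at_30 <- max_support >= 0.30      # FALSE

# Minimum absolute counts implied by the thresholds used in this analysis
min_count_1pct <- ceiling(0.01 * n_transactions)  # 5
min_count_2pct <- ceiling(0.02 * n_transactions)  # 10

print(feasible_at_30)
```

The two minimum counts agree with the "Absolute minimum support count" lines in the apriori logs (5 at 1% support, 10 at 2% support).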
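# Check if rules exist; if none were found, retry with looser thresholds
if (length(rules) == 0) {
  message("No rules found at 1% support and 30% confidence. Retrying with looser thresholds...")
  # Fallback values are illustrative; tune them for your data
  rules <- apriori(transactions,
                   parameter = list(supp = 0.005, conf = 0.2))
}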
# Summary of rules
summary(rules)
## set of 242 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3 4
## 1 237 4
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 3.000 3.000 3.012 3.000 4.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.0100 Min. :0.3000 Min. :0.01200 Min. :1.210
## 1st Qu.:0.0120 1st Qu.:0.3125 1st Qu.:0.03200 1st Qu.:1.506
## Median :0.0120 Median :0.3333 Median :0.03800 Median :1.632
## Mean :0.0135 Mean :0.3588 Mean :0.03858 Mean :1.720
## 3rd Qu.:0.0160 3rd Qu.:0.3750 3rd Qu.:0.04400 3rd Qu.:1.834
## Max. :0.0620 Max. :0.8333 Max. :0.20400 Max. :4.529
## count
## Min. : 5.000
## 1st Qu.: 6.000
## Median : 6.000
## Mean : 6.748
## 3rd Qu.: 8.000
## Max. :31.000
##
## mining info:
## data ntransactions support confidence
## transactions 500 0.01 0.3
## call
## apriori(data = transactions, parameter = list(supp = 0.01, conf = 0.3))
# Sort by lift
rules_sorted <- sort(rules, by = "lift", decreasing = TRUE)
# View top 5 rules
inspect(head(rules_sorted, 5))
## lhs rhs support confidence coverage
## [1] {Cereal, Vegetables, Fish} => {Juice} 0.010 0.8333333 0.012
## [2] {Juice, Cereal, Fish} => {Vegetables} 0.010 0.8333333 0.012
## [3] {Juice, Vegetables, Fish} => {Cereal} 0.010 0.7142857 0.014
## [4] {Butter, Cereal} => {Bread} 0.014 0.5833333 0.024
## [5] {Cereal, Cleaning.Supplies} => {Bread} 0.018 0.5625000 0.032
## lift count
## [1] 4.528986 5
## [2] 3.968254 5
## [3] 3.306878 5
## [4] 3.102837 7
## [5] 2.992021 9
# VISUALIZATION
plot(head(rules_sorted, 10), method = "graph", engine = "htmlwidget")
# Business applications:
# - Cross-selling strategies (recommend related items)
# - Product bundling (discounted combos)
# PART C: BUSINESS IMPLICATIONS & STRATEGY
1. Decision-Making Applications
Even though no rules met the high (30%) support threshold, association rule mining at lower thresholds supports key business decisions such as:
A) Product placement: items that are frequently bought together should be placed close to each other in-store. Example: beverages near snacks.
B) Bundling strategies: products with a strong association can be sold as bundles, which increases average transaction value.
C) Promotions: targeted discounts can be offered based on purchase combinations, which improves customer retention and upselling.
D) Store layout optimization: placing related items together.
2. Limitations of Association Rule Mining
A) Sparse data problem
B) Spurious correlations
C) Threshold sensitivity
D) Lack of causality
3. Role of metrics in decision making
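Support, confidence, and lift each answer a different question about a rule: how often it applies, how reliable it is, and how much better it does than chance. As a rough sketch, the base R code below recomputes these measures for rule [4], {Butter, Cereal} => {Bread}, from the figures reported by `inspect()`. The implied support of Bread is derived from the reported lift rather than read from the data, so treat it as an approximation; the variable names are illustrative.

```r
# Quality measures reported for rule [4] {Butter, Cereal} => {Bread}
n_transactions <- 500       # total transactions
rule_count     <- 7         # baskets containing Butter, Cereal and Bread
coverage       <- 0.024     # support of the left-hand side {Butter, Cereal}
reported_lift  <- 3.102837  # lift as printed by inspect()

# Support: share of all baskets in which the whole rule holds
support <- rule_count / n_transactions      # 0.014

# Confidence: share of left-hand-side baskets that also contain Bread
confidence <- support / coverage            # 7 / 12, about 0.583

# Lift = confidence / support(rhs), so the reported lift implies
# the overall support of Bread
rhs_support <- confidence / reported_lift   # about 0.188

print(round(c(support = support, confidence = confidence,
              rhs_support = rhs_support), 4))
```

A lift of about 3.1 means that baskets with Butter and Cereal contain Bread roughly three times as often as baskets overall. The rule rests on only 7 of 500 transactions, however, so support should be weighed alongside lift before acting on it.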