R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

# Load arules (provides the "transactions" class, apriori(), and inspect())
library(arules)

# Load dataset
data <- read.csv("C:/Users/USER/Downloads/W4 - retail_transactions.csv",
                 header = TRUE, row.names = 1)

# FIX DATA FORMATTING
# Convert the 0/1 purchase indicators to logical (adjust if your dataset is coded differently)
data_logical <- data == 1

# Convert to transactions
transactions <- as(data_logical, "transactions")

# Verify
summary(transactions)
## transactions as itemMatrix in sparse format with
##  500 rows (elements/itemsets/transactions) and
##  20 columns (items) and a density of 0.2027 
## 
## most frequent items:
##       Coffee Frozen.Foods         Milk       Cereal       Butter      (Other) 
##          124          115          114          108          106         1460 
## 
## element (itemset/transaction) length distribution:
## sizes
##   2   3   4   5   6 
## 101  91 101  94 113 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   4.000   4.054   5.000   6.000 
## 
## includes extended item information - examples:
##   labels
## 1   Milk
## 2  Bread
## 3 Butter
## 
## includes extended transaction information - examples:
##   transactionID
## 1         T0001
## 2         T0002
## 3         T0003
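The density and mean basket size in the summary can be cross-checked by hand: density is the share of filled cells in the 500 x 20 item matrix. A minimal base R sketch, using only figures copied from the output above (the variable names are illustrative):

```r
# Figures copied from the summary() output above
n_transactions <- 500
n_items        <- 20
item_counts    <- c(Coffee = 124, Frozen.Foods = 115, Milk = 114,
                    Cereal = 108, Butter = 106, Other = 1460)

# Basket size distribution: 101 baskets of size 2, 91 of size 3, and so on
sizes       <- c(2, 3, 4, 5, 6)
size_counts <- c(101, 91, 101, 94, 113)

# Total item occurrences, computed two independent ways
total_from_items <- sum(item_counts)
total_from_sizes <- sum(sizes * size_counts)

# Density = filled cells / all cells; mean size = items per transaction
density   <- total_from_items / (n_transactions * n_items)
mean_size <- total_from_items / n_transactions

print(c(density = density, mean_size = mean_size))
```

Both totals agree, and the resulting density and mean match the values reported by `summary(transactions)`.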
inspect(transactions[1:5])
##     items               transactionID
## [1] {Juice,                          
##      Yogurt,                         
##      Chips,                          
##      Pasta}                     T0001
## [2] {Coffee,                         
##      Tea,                            
##      Chips,                          
##      Rice,                           
##      Snacks,                         
##      Cleaning.Supplies}         T0002
## [3] {Cereal,                         
##      Chips}                     T0003
## [4] {Butter,                         
##      Cereal,                         
##      Coffee,                         
##      Chips,                          
##      Meat,                           
##      Pasta}                     T0004
## [5] {Chips,                          
##      Vegetables,                     
##      Meat,                           
##      Fish,                           
##      Snacks,                         
##      Cleaning.Supplies}         T0005
# PART A: ITEM PROFILING
# Most frequent items
itemFrequencyPlot(transactions, topN = 10, type = "absolute")

# Frequent itemsets of size 1 and 2 (lower support so that pairs qualify);
# add minlen = 2 to the parameter list to return pairs only
itemsets <- apriori(transactions,
                    parameter = list(supp = 0.02,
                                     target = "frequent itemsets",
                                     maxlen = 2))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##          NA    0.1    1 none FALSE            TRUE       5    0.02      1
##  maxlen            target  ext
##       2 frequent itemsets TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 10 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[20 item(s), 500 transaction(s)] done [0.00s].
## sorting and recoding items ... [20 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2
## Warning in apriori(transactions, parameter = list(supp = 0.02, target =
## "frequent itemsets", : Mining stopped (maxlen reached). Only patterns up to a
## length of 2 returned!
##  done [0.00s].
## sorting transactions ... done [0.00s].
## writing ... [209 set(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
# Rank by support; single items dominate the top because maxlen = 2 still admits size-1 itemsets
top_pairs <- sort(itemsets, by = "support", decreasing = TRUE)
inspect(head(top_pairs, 5))
##     items          support count
## [1] {Coffee}       0.248   124  
## [2] {Frozen.Foods} 0.230   115  
## [3] {Milk}         0.228   114  
## [4] {Cereal}       0.216   108  
## [5] {Butter}       0.212   106
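Because `maxlen = 2` also admits single items, the five most frequent itemsets above are all singletons. Pair counts can be cross-checked without arules: for a 0/1 matrix, `crossprod()` gives the co-occurrence count of every item pair. A small sketch on made-up data (the matrix `m` below is hypothetical, not the retail dataset):

```r
# Hypothetical 0/1 basket matrix: 5 transactions x 3 items
m <- matrix(c(1, 1, 0,
              1, 1, 1,
              0, 1, 1,
              1, 0, 0,
              1, 1, 0),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("Milk", "Bread", "Butter")))

# t(m) %*% m: off-diagonal cells count transactions containing both items,
# the diagonal holds the single-item counts
co_counts <- crossprod(m)

# Pair support = co-occurrence count / number of transactions
pair_support <- co_counts / nrow(m)

print(co_counts)
print(pair_support["Milk", "Bread"])
```

On the real data, the same idea would apply to `data_logical` after converting it to numeric, assuming it fits in memory as a dense matrix.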
# -----------------------------
# PART B: ASSOCIATION RULES
# -----------------------------
# Mine rules at 1% minimum support and 30% minimum confidence
rules <- apriori(transactions,
                 parameter = list(supp = 0.01, conf = 0.3))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.3    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 5 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[20 item(s), 500 transaction(s)] done [0.00s].
## sorting and recoding items ... [20 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [242 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
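The "Absolute minimum support count" lines in the two runs follow from the relative support and the number of transactions. A quick check with the values from the logs above (variable names are illustrative):

```r
n_transactions <- 500

# Relative minimum support used in the two apriori() calls above
supp_itemsets <- 0.02
supp_rules    <- 0.01

# Minimum number of transactions an itemset must appear in
min_count_itemsets <- supp_itemsets * n_transactions
min_count_rules    <- supp_rules * n_transactions

print(c(itemsets = min_count_itemsets, rules = min_count_rules))
```

With a floor of only 5 transactions, a rule can enter the result on very little evidence, which is worth keeping in mind when reading a ranking by lift.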
# Check if rules exist
if (length(rules) == 0) {
  print("No rules found at 1% support and 30% confidence. Relaxing thresholds...")
  
  # Lower both thresholds (example values; adjust for your data)
  rules <- apriori(transactions,
                   parameter = list(supp = 0.005, conf = 0.2))
}

# Summary of rules
summary(rules)
## set of 242 rules
## 
## rule length distribution (lhs + rhs):sizes
##   2   3   4 
##   1 237   4 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   3.000   3.012   3.000   4.000 
## 
## summary of quality measures:
##     support         confidence        coverage            lift      
##  Min.   :0.0100   Min.   :0.3000   Min.   :0.01200   Min.   :1.210  
##  1st Qu.:0.0120   1st Qu.:0.3125   1st Qu.:0.03200   1st Qu.:1.506  
##  Median :0.0120   Median :0.3333   Median :0.03800   Median :1.632  
##  Mean   :0.0135   Mean   :0.3588   Mean   :0.03858   Mean   :1.720  
##  3rd Qu.:0.0160   3rd Qu.:0.3750   3rd Qu.:0.04400   3rd Qu.:1.834  
##  Max.   :0.0620   Max.   :0.8333   Max.   :0.20400   Max.   :4.529  
##      count       
##  Min.   : 5.000  
##  1st Qu.: 6.000  
##  Median : 6.000  
##  Mean   : 6.748  
##  3rd Qu.: 8.000  
##  Max.   :31.000  
## 
## mining info:
##          data ntransactions support confidence
##  transactions           500    0.01        0.3
##                                                                     call
##  apriori(data = transactions, parameter = list(supp = 0.01, conf = 0.3))
# Sort by lift
rules_sorted <- sort(rules, by = "lift", decreasing = TRUE)

# View top 5 rules
inspect(head(rules_sorted, 5))
##     lhs                            rhs          support confidence coverage
## [1] {Cereal, Vegetables, Fish}  => {Juice}      0.010   0.8333333  0.012   
## [2] {Juice, Cereal, Fish}       => {Vegetables} 0.010   0.8333333  0.012   
## [3] {Juice, Vegetables, Fish}   => {Cereal}     0.010   0.7142857  0.014   
## [4] {Butter, Cereal}            => {Bread}      0.014   0.5833333  0.024   
## [5] {Cereal, Cleaning.Supplies} => {Bread}      0.018   0.5625000  0.032   
##     lift     count
## [1] 4.528986 5    
## [2] 3.968254 5    
## [3] 3.306878 5    
## [4] 3.102837 7    
## [5] 2.992021 9
# VISUALIZATION
# The plot() method for rules is provided by arulesViz
library(arulesViz)

plot(head(rules_sorted, 10), method = "graph", engine = "htmlwidget")
# Business applications:
# - Cross-selling strategies (recommend related items)
# - Product bundling (discounted combos)

PART C: BUSINESS IMPLICATIONS & STRATEGY

1. Decision-Making Applications

Although most rules found here rest on low support (the top five by lift cover only 5 to 9 of the 500 transactions), association rule mining supports key business decisions such as:

A) Product Placement: Frequently bought-together items should be placed close to each other in-store. Example: beverages near snacks.

B) Bundling Strategies: Products with a strong association can be sold as bundles, which increases average transaction value.

C) Promotions: Targeted discounts can be offered based on purchase combinations, which improves customer retention and upselling.

D) Store Layout Optimization: Related items are placed together.

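2. Limitations of Association Rule Mining

A) Sparse Data Problem: Most item combinations appear in only a few transactions, so their support estimates are unreliable.

B) Spurious Correlations: Among hundreds of candidate rules, some high-lift rules arise by chance, especially those backed by a handful of transactions.

C) Threshold Sensitivity: The number and content of the rules depend heavily on the chosen minimum support and confidence.

D) Lack of Causality: A rule shows that items co-occur, not that buying one causes the purchase of the other.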
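To illustrate how little evidence sits behind a low-count rule, the sketch below puts an interval around the confidence of the top rule by lift, whose antecedent occurs in 6 transactions (coverage 0.012 x 500) and whose full itemset occurs in 5. Treating those 6 transactions as independent binomial trials is a simplifying assumption made here, not part of the original analysis:

```r
# Top rule by lift: antecedent seen in 6 transactions, full rule in 5 of them
lhs_count  <- 6
rule_count <- 5

# Point estimate of confidence, as reported by arules
confidence <- rule_count / lhs_count

# Exact (Clopper-Pearson) 95% interval from base R's binom.test()
ci <- binom.test(rule_count, lhs_count)$conf.int

print(round(confidence, 4))
print(round(ci, 3))
```

The interval is wide, so the reported confidence of 0.83 is only loosely pinned down by six transactions.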

3. Role of Metrics in Decision-Making
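Support, confidence, and lift are the three measures reported for each rule above. Support is the share of all transactions containing the whole rule, confidence is the share of antecedent transactions that also contain the consequent, and lift is confidence divided by the consequent's overall support. The sketch below recomputes them for rule 3, {Juice, Vegetables, Fish} => {Cereal}, using counts read from the output above (variable names are illustrative):

```r
n <- 500

# Counts taken from the output above
count_rule   <- 5    # transactions containing Juice, Vegetables, Fish and Cereal
count_lhs    <- 7    # transactions containing the antecedent (coverage 0.014 * 500)
count_cereal <- 108  # transactions containing Cereal (item frequency summary)

support    <- count_rule / n                   # share of all transactions
confidence <- count_rule / count_lhs           # P(rhs | lhs)
lift       <- confidence / (count_cereal / n)  # confidence relative to rhs base rate

print(round(c(support = support, confidence = confidence, lift = lift), 4))
```

A lift above 1 means the consequent is bought more often with the antecedent than on its own, but the small counts behind this rule limit how much weight a decision should place on it.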
