Market Basket and Clusters

Imagine 10000 receipts sitting on your table. Each receipt represents a transaction with items that were purchased. The receipt is a representation of stuff that went into a customer’s basket - and therefore ‘Market Basket Analysis’. That is exactly what the Groceries Data Set contains: a collection of receipts with each line representing 1 receipt and the items purchased. Each line is called a transaction and each column in a row represents an item. The data set is attached. Your assignment is to use R to mine the data for association rules. You should report support, confidence and lift and your top 10 rules by lift. Extra credit: do a simple cluster analysis on the data as well.

8.1. Recreate the simulated data from Exercise 7.2:

#Load Libraries
library(tidyverse)
library(arules)
library(arulesViz)
library(RColorBrewer)
library(kableExtra)

#Read transactions
grocery<-read.transactions('GroceryDataSet.csv', format = 'basket', header=FALSE, sep=',')

#inspect(grocery)
summary(grocery)
## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46 
##   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics

The top five most frequent items are whole milk, other vegetables, roll/buns, soda and yogurt.

#Use RColorBrewer To Plot the Top 10 Items
library(RColorBrewer)
itemFrequencyPlot(grocery,topN=10,type="absolute",col=brewer.pal(8,'Pastel2'), main="Absolute Item Frequency Plot")

#Create rules
asoc_groc <- apriori(grocery, parameter = list(support=0.001,conf=0.8))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [410 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
summary(asoc_groc)
## set of 410 rules
## 
## rule length distribution (lhs + rhs):sizes
##   3   4   5   6 
##  29 229 140  12 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   4.000   4.000   4.329   5.000   6.000 
## 
## summary of quality measures:
##     support           confidence        coverage             lift       
##  Min.   :0.001017   Min.   :0.8000   Min.   :0.001017   Min.   : 3.131  
##  1st Qu.:0.001017   1st Qu.:0.8333   1st Qu.:0.001220   1st Qu.: 3.312  
##  Median :0.001220   Median :0.8462   Median :0.001322   Median : 3.588  
##  Mean   :0.001247   Mean   :0.8663   Mean   :0.001449   Mean   : 3.951  
##  3rd Qu.:0.001322   3rd Qu.:0.9091   3rd Qu.:0.001627   3rd Qu.: 4.341  
##  Max.   :0.003152   Max.   :1.0000   Max.   :0.003559   Max.   :11.235  
##      count      
##  Min.   :10.00  
##  1st Qu.:10.00  
##  Median :12.00  
##  Mean   :12.27  
##  3rd Qu.:13.00  
##  Max.   :31.00  
## 
## mining info:
##     data ntransactions support confidence
##  grocery          9835   0.001        0.8
##                                                                    call
##  apriori(data = grocery, parameter = list(support = 0.001, conf = 0.8))
  • Parameter Specification: min_sup=0.001 and min_confidence=0.8 values in a rule.

  • Total number of rules: The set of 410 rules

  • Distribution of rule length: A length of 4 items has the most rules: 229 and length of 6 items have the lowest number of rules:12

#Dig into the top 10 rules to report support, confidence and lift
inspect(head(asoc_groc, n=10, by='support'))
##      lhs                         rhs                    support confidence    coverage     lift count
## [1]  {citrus fruit,                                                                                  
##       root vegetables,                                                                               
##       tropical fruit,                                                                                
##       whole milk}             => {other vegetables} 0.003152008  0.8857143 0.003558719 4.577509    31
## [2]  {curd,                                                                                          
##       domestic eggs,                                                                                 
##       other vegetables}       => {whole milk}       0.002846975  0.8235294 0.003457041 3.223005    28
## [3]  {curd,                                                                                          
##       hamburger meat}         => {whole milk}       0.002541942  0.8064516 0.003152008 3.156169    25
## [4]  {herbs,                                                                                         
##       rolls/buns}             => {whole milk}       0.002440264  0.8000000 0.003050330 3.130919    24
## [5]  {herbs,                                                                                         
##       tropical fruit}         => {whole milk}       0.002338587  0.8214286 0.002846975 3.214783    23
## [6]  {citrus fruit,                                                                                  
##       other vegetables,                                                                              
##       root vegetables,                                                                               
##       yogurt}                 => {whole milk}       0.002338587  0.8214286 0.002846975 3.214783    23
## [7]  {butter,                                                                                        
##       other vegetables,                                                                              
##       pork}                   => {whole milk}       0.002236909  0.8461538 0.002643620 3.311549    22
## [8]  {rolls/buns,                                                                                    
##       root vegetables,                                                                               
##       tropical fruit,                                                                                
##       yogurt}                 => {whole milk}       0.002236909  0.8148148 0.002745297 3.188899    22
## [9]  {grapes,                                                                                        
##       tropical fruit,                                                                                
##       whole milk}             => {other vegetables} 0.002033554  0.8000000 0.002541942 4.134524    20
## [10] {fruit/vegetable juice,                                                                         
##       other vegetables,                                                                              
##       root vegetables,                                                                               
##       yogurt}                 => {whole milk}       0.002033554  0.8333333 0.002440264 3.261374    20
inspect(head(asoc_groc, n=10, by='confidence'))
##      lhs                     rhs                    support confidence    coverage     lift count
## [1]  {rice,                                                                                      
##       sugar}              => {whole milk}       0.001220132          1 0.001220132 3.913649    12
## [2]  {canned fish,                                                                               
##       hygiene articles}   => {whole milk}       0.001118454          1 0.001118454 3.913649    11
## [3]  {butter,                                                                                    
##       rice,                                                                                      
##       root vegetables}    => {whole milk}       0.001016777          1 0.001016777 3.913649    10
## [4]  {flour,                                                                                     
##       root vegetables,                                                                           
##       whipped/sour cream} => {whole milk}       0.001728521          1 0.001728521 3.913649    17
## [5]  {butter,                                                                                    
##       domestic eggs,                                                                             
##       soft cheese}        => {whole milk}       0.001016777          1 0.001016777 3.913649    10
## [6]  {citrus fruit,                                                                              
##       root vegetables,                                                                           
##       soft cheese}        => {other vegetables} 0.001016777          1 0.001016777 5.168156    10
## [7]  {butter,                                                                                    
##       hygiene articles,                                                                          
##       pip fruit}          => {whole milk}       0.001016777          1 0.001016777 3.913649    10
## [8]  {hygiene articles,                                                                          
##       root vegetables,                                                                           
##       whipped/sour cream} => {whole milk}       0.001016777          1 0.001016777 3.913649    10
## [9]  {hygiene articles,                                                                          
##       pip fruit,                                                                                 
##       root vegetables}    => {whole milk}       0.001016777          1 0.001016777 3.913649    10
## [10] {cream cheese,                                                                              
##       domestic eggs,                                                                             
##       sugar}              => {whole milk}       0.001118454          1 0.001118454 3.913649    11
inspect(head(asoc_groc, n=10, by='lift'))|> kable()|> kable_styling(bootstrap_options='striped',position='left')
##      lhs                         rhs                   support confidence    coverage      lift count
## [1]  {liquor,                                                                                        
##       red/blush wine}         => {bottled beer}    0.001931876  0.9047619 0.002135231 11.235269    19
## [2]  {citrus fruit,                                                                                  
##       fruit/vegetable juice,                                                                         
##       other vegetables,                                                                              
##       soda}                   => {root vegetables} 0.001016777  0.9090909 0.001118454  8.340400    10
## [3]  {oil,                                                                                           
##       other vegetables,                                                                              
##       tropical fruit,                                                                                
##       whole milk,                                                                                    
##       yogurt}                 => {root vegetables} 0.001016777  0.9090909 0.001118454  8.340400    10
## [4]  {citrus fruit,                                                                                  
##       fruit/vegetable juice,                                                                         
##       grapes}                 => {tropical fruit}  0.001118454  0.8461538 0.001321810  8.063879    11
## [5]  {other vegetables,                                                                              
##       rice,                                                                                          
##       whole milk,                                                                                    
##       yogurt}                 => {root vegetables} 0.001321810  0.8666667 0.001525165  7.951182    13
## [6]  {oil,                                                                                           
##       other vegetables,                                                                              
##       tropical fruit,                                                                                
##       whole milk}             => {root vegetables} 0.001321810  0.8666667 0.001525165  7.951182    13
## [7]  {ham,                                                                                           
##       other vegetables,                                                                              
##       pip fruit,                                                                                     
##       yogurt}                 => {tropical fruit}  0.001016777  0.8333333 0.001220132  7.941699    10
## [8]  {beef,                                                                                          
##       citrus fruit,                                                                                  
##       other vegetables,                                                                              
##       tropical fruit}         => {root vegetables} 0.001016777  0.8333333 0.001220132  7.645367    10
## [9]  {butter,                                                                                        
##       cream cheese,                                                                                  
##       root vegetables}        => {yogurt}          0.001016777  0.9090909 0.001118454  6.516698    10
## [10] {butter,                                                                                        
##       sliced cheese,                                                                                 
##       tropical fruit,                                                                                
##       whole milk}             => {yogurt}          0.001016777  0.9090909 0.001118454  6.516698    10
#Also remove subset rules
subset.rules <- which(colSums(is.subset(asoc_groc, asoc_groc)) > 1)
length(subset.rules)
## [1] 91
subset_asoc_groc <- asoc_groc[-subset.rules]
#Plot All Rules
plot(asoc_groc, jitter=0, method='two-key plot')

head(asoc_groc, n = 10, by = "confidence") %>% 
  plot(method = "graph", engine = "htmlwidget")
#Perform Cluster analysis
library(cluster)
library(factoextra)
library(dendextend)

#Hierarchical cluster analysis to create cluster dendrograms.

t1 <- grocery[ , itemFrequency(grocery) > 0.08]
glist <- dissimilarity(t1, which = "items")

# plot dendrogram with complete
hc <- hclust(glist, method = "complete" )

plot(hc, cex = 0.8, hang = 1)
rect.hclust(hc, k = 4, border = 2:5)