Introduction

In this project, I will analyse association rules among subcategories of supermarket products. The main aim is to give insights to marketing department for shelf design and promotions.

The data has been acquired from the Kaggle website (V. Chowdhury 2022). The data consists of several columns about customers, branches, and product. However for Market Basket Analysis we need only two columns from the data:

data <- read.csv2("Sample - Superstore.csv", sep = ",")
str(data)
## 'data.frame':    9994 obs. of  21 variables:
##  $ Row.ID       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Order.ID     : chr  "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
##  $ Order.Date   : chr  "11/8/2016" "11/8/2016" "6/12/2016" "10/11/2015" ...
##  $ Ship.Date    : chr  "11/11/2016" "11/11/2016" "6/16/2016" "10/18/2015" ...
##  $ Ship.Mode    : chr  "Second Class" "Second Class" "Second Class" "Standard Class" ...
##  $ Customer.ID  : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
##  $ Customer.Name: chr  "Claire Gute" "Claire Gute" "Darrin Van Huff" "Sean O'Donnell" ...
##  $ Segment      : chr  "Consumer" "Consumer" "Corporate" "Consumer" ...
##  $ Country      : chr  "United States" "United States" "United States" "United States" ...
##  $ City         : chr  "Henderson" "Henderson" "Los Angeles" "Fort Lauderdale" ...
##  $ State        : chr  "Kentucky" "Kentucky" "California" "Florida" ...
##  $ Postal.Code  : int  42420 42420 90036 33311 33311 90032 90032 90032 90032 90032 ...
##  $ Region       : chr  "South" "South" "West" "South" ...
##  $ Product.ID   : chr  "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
##  $ Category     : chr  "Furniture" "Furniture" "Office Supplies" "Furniture" ...
##  $ Sub.Category : chr  "Bookcases" "Chairs" "Labels" "Tables" ...
##  $ Product.Name : chr  "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
##  $ Sales        : chr  "261.96" "731.94" "14.62" "957.5775" ...
##  $ Quantity     : int  2 3 2 5 2 7 4 6 3 5 ...
##  $ Discount     : chr  "0" "0" "0" "0.45" ...
##  $ Profit       : chr  "41.9136" "219.582" "6.8714" "-383.031" ...
data <- subset(data, select = c(Customer.ID, Sub.Category))
str(data)
## 'data.frame':    9994 obs. of  2 variables:
##  $ Customer.ID : chr  "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
##  $ Sub.Category: chr  "Bookcases" "Chairs" "Labels" "Tables" ...
summary(data)
##  Customer.ID        Sub.Category      
##  Length:9994        Length:9994       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character

Data Preparation

As Apriori algorithm use special type of data (transaction), we need to convert our data.

trans <- as(split(data[,"Sub.Category"], data[,"Customer.ID"]), "transactions")
## Warning in asMethod(object): removing duplicated items in transactions
summary(trans)
## transactions as itemMatrix in sparse format with
##  793 rows (elements/itemsets/transactions) and
##  17 columns (items) and a density of 0.4452192 
## 
## most frequent items:
##     Binders       Paper Furnishings     Storage      Phones     (Other) 
##         650         611         528         514         511        3188 
## 
## element (itemset/transaction) length distribution:
## sizes
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 
##   7  15  27  46  71 103 119 114  99  94  53  22  20   1   2 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   6.000   8.000   7.569   9.000  15.000 
## 
## includes extended item information - examples:
##        labels
## 1 Accessories
## 2  Appliances
## 3         Art
## 
## includes extended transaction information - examples:
##   transactionID
## 1      AA-10315
## 2      AA-10375
## 3      AA-10480
inspect(trans[1:10])
##      items          transactionID
## [1]  {Accessories,               
##       Appliances,                
##       Binders,                   
##       Fasteners,                 
##       Furnishings,               
##       Paper,                     
##       Phones,                    
##       Storage,                   
##       Supplies}          AA-10315
## [2]  {Accessories,               
##       Art,                       
##       Binders,                   
##       Furnishings,               
##       Paper,                     
##       Phones,                    
##       Storage}           AA-10375
## [3]  {Accessories,               
##       Art,                       
##       Furnishings,               
##       Paper,                     
##       Phones,                    
##       Storage,                   
##       Tables}            AA-10480
## [4]  {Art,                       
##       Binders,                   
##       Bookcases,                 
##       Chairs,                    
##       Envelopes,                 
##       Furnishings,               
##       Paper,                     
##       Phones,                    
##       Storage}           AA-10645
## [5]  {Art,                       
##       Bookcases,                 
##       Chairs,                    
##       Phones,                    
##       Storage}           AB-10015
## [6]  {Accessories,               
##       Appliances,                
##       Binders,                   
##       Chairs,                    
##       Furnishings,               
##       Paper,                     
##       Supplies,                  
##       Tables}            AB-10060
## [7]  {Accessories,               
##       Art,                       
##       Binders,                   
##       Furnishings,               
##       Labels,                    
##       Machines,                  
##       Phones,                    
##       Storage,                   
##       Tables}            AB-10105
## [8]  {Accessories,               
##       Art,                       
##       Binders,                   
##       Furnishings,               
##       Paper,                     
##       Supplies}          AB-10150
## [9]  {Accessories,               
##       Art,                       
##       Binders,                   
##       Chairs,                    
##       Furnishings,               
##       Paper,                     
##       Storage}           AB-10165
## [10] {Accessories,               
##       Art,                       
##       Binders,                   
##       Copiers,                   
##       Fasteners,                 
##       Furnishings,               
##       Paper,                     
##       Phones,                    
##       Storage,                   
##       Supplies}          AB-10255

Before making analyzes, let’s first look for our sub-category frequency distribution.

itemFrequencyPlot(trans, topN = 17, type="absolute", main="Item Frequency") 

itemFrequencyPlot(trans, topN = 17, type="relative", main="Item Frequency") 

As you can see, the most purchased sub-categories are binders and papers.

Apriori Algorithm

rules <- apriori(trans, parameter = list(support = 0.006, confidence = 0.25, minlen = 2))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.25    0.1    1 none FALSE            TRUE       5   0.006      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 4 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[17 item(s), 793 transaction(s)] done [0.00s].
## sorting and recoding items ... [17 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10
## Warning in apriori(trans, parameter = list(support = 0.006, confidence = 0.25, :
## Mining stopped (maxlen reached). Only patterns up to a length of 10 returned!
##  done [0.02s].
## writing ... [176470 rule(s)] done [0.04s].
## creating S4 object  ... done [0.11s].
inspect(rules[1:10])
##      lhs          rhs           support    confidence coverage   lift     
## [1]  {Copiers} => {Bookcases}   0.02143758 0.265625   0.08070618 1.0802083
## [2]  {Copiers} => {Envelopes}   0.02774275 0.343750   0.08070618 1.3232706
## [3]  {Copiers} => {Tables}      0.02269861 0.281250   0.08070618 0.8545259
## [4]  {Copiers} => {Labels}      0.02648172 0.328125   0.08070618 0.9259898
## [5]  {Copiers} => {Appliances}  0.03404792 0.421875   0.08070618 0.9397384
## [6]  {Copiers} => {Chairs}      0.04665826 0.578125   0.08070618 1.1264205
## [7]  {Copiers} => {Accessories} 0.04539723 0.562500   0.08070618 0.9410601
## [8]  {Copiers} => {Art}         0.05800757 0.718750   0.08070618 1.1537829
## [9]  {Copiers} => {Phones}      0.06052963 0.750000   0.08070618 1.1638943
## [10] {Copiers} => {Storage}     0.05548550 0.687500   0.08070618 1.0606761
##      count
## [1]  17   
## [2]  22   
## [3]  18   
## [4]  21   
## [5]  27   
## [6]  37   
## [7]  36   
## [8]  46   
## [9]  48   
## [10] 44

Explanation of result:

In the below tables you can see the top 5 rules sorted for each element of result.

inspect(sort(rules, by = "lift")[1:5])
##     lhs              rhs            support confidence    coverage     lift count
## [1] {Chairs,                                                                     
##      Envelopes,                                                                  
##      Fasteners,                                                                  
##      Paper,                                                                      
##      Phones,                                                                     
##      Supplies}    => {Machines} 0.007566204  0.8571429 0.008827238 6.865801     6
## [2] {Appliances,                                                                 
##      Chairs,                                                                     
##      Envelopes,                                                                  
##      Fasteners,                                                                  
##      Paper,                                                                      
##      Phones,                                                                     
##      Supplies}    => {Machines} 0.007566204  0.8571429 0.008827238 6.865801     6
## [3] {Binders,                                                                    
##      Chairs,                                                                     
##      Envelopes,                                                                  
##      Fasteners,                                                                  
##      Paper,                                                                      
##      Phones,                                                                     
##      Supplies}    => {Machines} 0.007566204  0.8571429 0.008827238 6.865801     6
## [4] {Appliances,                                                                 
##      Binders,                                                                    
##      Chairs,                                                                     
##      Envelopes,                                                                  
##      Fasteners,                                                                  
##      Paper,                                                                      
##      Phones,                                                                     
##      Supplies}    => {Machines} 0.007566204  0.8571429 0.008827238 6.865801     6
## [5] {Appliances,                                                                 
##      Chairs,                                                                     
##      Envelopes,                                                                  
##      Fasteners,                                                                  
##      Phones,                                                                     
##      Storage,                                                                    
##      Supplies}    => {Machines} 0.006305170  0.8333333 0.007566204 6.675084     5
inspect(sort(rules, by = "confidence")[1:5])
##     lhs                            rhs       support     confidence coverage   
## [1] {Copiers, Machines}         => {Binders} 0.015132409 1          0.015132409
## [2] {Copiers, Supplies}         => {Binders} 0.017654477 1          0.017654477
## [3] {Copiers, Labels, Machines} => {Phones}  0.007566204 1          0.007566204
## [4] {Copiers, Labels, Machines} => {Paper}   0.007566204 1          0.007566204
## [5] {Copiers, Labels, Machines} => {Binders} 0.007566204 1          0.007566204
##     lift     count
## [1] 1.220000 12   
## [2] 1.220000 14   
## [3] 1.551859  6   
## [4] 1.297872  6   
## [5] 1.220000  6
inspect(sort(rules, by = "support")[1:5])
##     lhs              rhs           support   confidence coverage  lift    
## [1] {Paper}       => {Binders}     0.6506936 0.8445172  0.7704918 1.030311
## [2] {Binders}     => {Paper}       0.6506936 0.7938462  0.8196721 1.030311
## [3] {Furnishings} => {Binders}     0.5535939 0.8314394  0.6658260 1.014356
## [4] {Binders}     => {Furnishings} 0.5535939 0.6753846  0.8196721 1.014356
## [5] {Storage}     => {Binders}     0.5460277 0.8424125  0.6481715 1.027743
##     count
## [1] 516  
## [2] 516  
## [3] 439  
## [4] 439  
## [5] 433
inspect(sort(rules, by = "count")[1:5])
##     lhs              rhs           support   confidence coverage  lift    
## [1] {Paper}       => {Binders}     0.6506936 0.8445172  0.7704918 1.030311
## [2] {Binders}     => {Paper}       0.6506936 0.7938462  0.8196721 1.030311
## [3] {Furnishings} => {Binders}     0.5535939 0.8314394  0.6658260 1.014356
## [4] {Binders}     => {Furnishings} 0.5535939 0.6753846  0.8196721 1.014356
## [5] {Storage}     => {Binders}     0.5460277 0.8424125  0.6481715 1.027743
##     count
## [1] 516  
## [2] 516  
## [3] 439  
## [4] 439  
## [5] 433

The better way to see results is to use graphs.

plot(sort(rules, by = "confidence")[1:100], method="graph")

plot(rules, shading="order", control=list(main="Two-key plot"))
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.