Discover Associations Between Products For Electronic Company

Perform a market basket analysis to help the company’s board of directors better understand the clientele of an online electronics retailer and judge whether it would be a good acquisition.

Overview

Background

The board of directors is considering acquiring a start-up electronics online retailer. The board of directors has asked us to help them better understand the clientele that the retailer is serving and if it would be an optimal partnership.

They need our help to identify purchasing patterns that will provide insight into the retailer’s clientele.

Objective

To conduct a market basket analysis and discover interesting relationships/associations between the items that customers purchase together in a transaction. These associations can then be used to drive sales-oriented initiatives such as recommender systems.
To help the board of directors form a clearer picture of the online retailer’s customer buying patterns.

Dataset Information

We were provided a CSV file that contains one month’s (30 days) worth of the retailer’s online transactions and a file listing all the electronics that they currently sell. Each record lists the items purchased in one customer transaction.

Initial Exploration of the Transactions

The data contains rows that represent single transactions, with the purchased items separated by commas; this is called the “basket” format. We will import the CSV file with the read.transactions() function. It converts the dataset to a sparse matrix in which each row represents a transaction and each column represents an item that a customer might purchase. The entries are binary (1 = item purchased in that transaction, 0 = not purchased).
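As a rough illustration of what this conversion does, the base R sketch below builds the same kind of binary transaction-by-item matrix from three made-up basket lines. The baskets and the names basket_lines and incidence are hypothetical, and read.transactions() stores its result in a sparse format rather than as a dense matrix like this one.

```r
# Hypothetical basket-format lines: one transaction per line, items separated by commas
basket_lines <- c("iMac,HP Laptop",
                  "iMac",
                  "HP Laptop,Ethernet Cable,iMac")

# Split each line into its items
baskets <- strsplit(basket_lines, ",", fixed = TRUE)

# One column per distinct item
items <- sort(unique(unlist(baskets)))

# 1 = item purchased in that transaction, 0 = not purchased
incidence <- t(vapply(baskets,
                      function(b) as.integer(items %in% b),
                      integer(length(items))))
colnames(incidence) <- items
print(incidence)
```

Each row of the printed matrix is one transaction and each column is one item, which mirrors the structure described above.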

Call the libraries

# load the arules and arulesViz packages
library(arules)
library(arulesViz)

Import the dataset

# import the data
transactions <- read.transactions(file = "ElectronidexTransactions2017.csv",
                  format = "basket",
                  sep = ",",
                  rm.duplicates = TRUE)

## distribution of transactions with duplicates:
## items
##   1   2 
## 191  10

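Here, we set rm.duplicates to TRUE, which removes repeated items within a single transaction. The output above reports how many transactions contained duplicated items: 191 had one duplicate and 10 had two. A duplicated item carries no extra information in a binary purchased/not-purchased matrix and could otherwise affect the analysis later on.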
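The effect of removing duplicated items can be sketched in base R on made-up baskets. This is only an illustration of the idea, not the internal implementation of read.transactions(), and the basket contents are hypothetical.

```r
# Hypothetical baskets; the second one lists "iMac" twice
baskets <- list(c("iMac", "HP Laptop"),
                c("iMac", "Ethernet Cable", "iMac"))

# Count repeated items per basket, then keep each item only once
n_duplicates <- vapply(baskets, function(b) sum(duplicated(b)), integer(1))
deduplicated <- lapply(baskets, unique)

print(n_duplicates)
print(deduplicated)
```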

Inspect the transactions

# inspect the first 5 transactions
inspect(transactions[1:5])

##     items                    
## [1] {Acer Aspire,            
##      Belkin Mouse Pad,       
##      Brother Printer Toner,  
##      VGA Monitor Cable}      
## [2] {Apple Wireless Keyboard,
##      Dell Desktop,           
##      Lenovo Desktop Computer}
## [3] {iMac}                   
## [4] {Acer Desktop,           
##      Intel Desktop,          
##      Lenovo Desktop Computer,
##      XIBERIA Gaming Headset} 
## [5] {ASUS Desktop,           
##      Epson Black Ink,        
##      HP Laptop,              
##      iMac}

We can also use the LIST() function to inspect each transaction.

LIST(transactions[1:5])

## [[1]]
## [1] "Acer Aspire"           "Belkin Mouse Pad"      "Brother Printer Toner"
## [4] "VGA Monitor Cable"    
## 
## [[2]]
## [1] "Apple Wireless Keyboard" "Dell Desktop"           
## [3] "Lenovo Desktop Computer"
## 
## [[3]]
## [1] "iMac"
## 
## [[4]]
## [1] "Acer Desktop"            "Intel Desktop"          
## [3] "Lenovo Desktop Computer" "XIBERIA Gaming Headset" 
## 
## [[5]]
## [1] "ASUS Desktop"    "Epson Black Ink" "HP Laptop"       "iMac"

Investigate how many transactions we have

length(transactions)

## [1] 9835

Investigate how many items are in each transaction

size(transactions)[1:10]

##  [1] 4 3 1 4 4 5 1 5 1 2

Investigate how many products we have

length(itemLabels(transactions))

## [1] 125

The dataset contains 125 distinct items.

Visualization

High-frequency items (top 10).

# item frequency 
itemFrequencyPlot(transactions, type = "absolute", topN = 10, col = rainbow(10), main = "High Frequency")

Low-frequency items (bottom 10).

# create a table of low frequency items
lowFreqency <- sort(table(unlist(LIST(transactions))), decreasing = FALSE)[1:10]
lowFreqency

## 
##         Logitech Wireless Keyboard                  VGA Monitor Cable 
##                                 22                                 22 
## Panasonic On-Ear Stereo Headphones   1TB Portable External Hard Drive 
##                                 23                                 27 
##                          Canon Ink            Logitech Stereo Headset 
##                                 27                                 30 
##                     Ethernet Cable               Canon Office Printer 
##                                 32                                 35 
##          Gaming Mouse Professional                        Audio Cable 
##                                 35                                 36

# plot low frequency items 
par(mar=c(10, 17, 2, 1))
barplot(lowFreqency,
        horiz=TRUE,
        las = 1,
        col=rainbow(10),
        main = "Low Frequency")

Visualize the item distribution for randomly sampled transactions.

image(sample(transactions, 200))

In the 200 randomly sampled transactions, we do see patterns for certain items. Some items show an almost solid vertical line; those items were purchased more frequently.

Applying the rule

We will use the Apriori algorithm to perform the market basket analysis. The Apriori algorithm is helpful when working with large datasets and is used to uncover insights in transactional data. It is based on item frequency: an itemset can be frequent only if all of its subsets are frequent. For example, the itemset {Item 1, Item 2, Item 3, Item 4} can occur frequently only if {Item 1}, {Item 2}, {Item 3} and {Item 4} each occur at least as frequently.
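The frequency idea behind Apriori can be checked on a small example: an itemset can never occur more often than any of its subsets. The base R sketch below uses hypothetical baskets and a helper named support that is defined only for this illustration.

```r
# Hypothetical baskets for illustration
baskets <- list(c("iMac", "Apple Magic Keyboard", "HP Laptop"),
                c("iMac", "Apple Magic Keyboard"),
                c("iMac", "HP Laptop"),
                c("HP Laptop", "Ethernet Cable"),
                c("iMac"))

# Support: fraction of baskets that contain every item in the itemset
support <- function(itemset) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

supp_imac     <- support("iMac")                             # 4 of 5 baskets
supp_keyboard <- support("Apple Magic Keyboard")             # 2 of 5 baskets
supp_pair     <- support(c("iMac", "Apple Magic Keyboard"))  # 2 of 5 baskets

# The pair can never be more frequent than either of its items
print(c(iMac = supp_imac, keyboard = supp_keyboard, pair = supp_pair))
print(supp_pair <= min(supp_imac, supp_keyboard))
```

This property is what lets the algorithm discard any larger itemset as soon as one of its subsets falls below the minimum support.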

The Apriori algorithm assesses association rules using two measures. The first is support, which measures how frequently an itemset or rule occurs in the transactional data: the fraction of all transactions that contain every item in the rule. The second is confidence, which measures the accuracy of a rule: of the transactions that contain the left-hand side, the fraction that also contain the right-hand side.
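Both measures can be reproduced by hand for the rule {Logitech Wireless Keyboard} => {iMac}, using counts reported elsewhere in this analysis: 9835 transactions in total, 22 containing Logitech Wireless Keyboard and 13 containing both items. The variable names below are illustrative.

```r
n_transactions <- 9835  # total transactions
n_lhs          <- 22    # transactions containing Logitech Wireless Keyboard
n_both         <- 13    # transactions containing the keyboard and an iMac

support    <- n_both / n_transactions  # how often the whole rule occurs
confidence <- n_both / n_lhs           # how often the rhs accompanies the lhs

print(round(c(support = support, confidence = confidence), 7))
```

The results should agree with the support (about 0.00132) and confidence (about 0.591) that apriori() reports for this rule.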

Apply and inspect the first 10 rules

# applying apriori rule
aprioriRule <- apriori(transactions, parameter = list(supp = 0.001, conf = 0.55, minlen = 2, maxlen = 4))

# inspect the first 10 rules
inspect(aprioriRule[1:10])

##      lhs                                     rhs                     support confidence     lift count
## [1]  {Logitech Wireless Keyboard}         => {iMac}              0.001321810  0.5909091 2.307102    13
## [2]  {Panasonic On-Ear Stereo Headphones} => {iMac}              0.001321810  0.5652174 2.206794    13
## [3]  {Generic Black 3-Button}             => {iMac}              0.003660397  0.6428571 2.509925    36
## [4]  {Mackie CR Speakers}                 => {iMac}              0.004677173  0.6133333 2.394654    46
## [5]  {Backlit LED Gaming Keyboard,                                                                    
##       Large Mouse Pad}                    => {Apple MacBook Air} 0.001321810  0.8125000 5.222835    13
## [6]  {ASUS 2 Monitor,                                                                                 
##       Generic Black 3-Button}             => {iMac}              0.001016777  0.9090909 3.549388    10
## [7]  {Generic Black 3-Button,                                                                         
##       ViewSonic Monitor}                  => {iMac}              0.001016777  0.7692308 3.003329    10
## [8]  {Dell Desktop,                                                                                   
##       Generic Black 3-Button}             => {iMac}              0.001220132  0.8571429 3.346566    12
## [9]  {Generic Black 3-Button,                                                                         
##       Lenovo Desktop Computer}            => {iMac}              0.001728521  0.8095238 3.160646    17
## [10] {Generic Black 3-Button,                                                                         
##       HP Laptop}                          => {iMac}              0.001321810  0.6500000 2.537813    13

Check the statistical numbers of the rules

summary(aprioriRule)

## set of 4031 rules
## 
## rule length distribution (lhs + rhs):sizes
##    2    3    4 
##    4  837 3190 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00    4.00    4.00    3.79    4.00    4.00 
## 
## summary of quality measures:
##     support           confidence          lift            count       
##  Min.   :0.001017   Min.   :0.5500   Min.   : 2.147   Min.   : 10.00  
##  1st Qu.:0.001118   1st Qu.:0.5882   1st Qu.: 2.510   1st Qu.: 11.00  
##  Median :0.001322   Median :0.6271   Median : 2.991   Median : 13.00  
##  Mean   :0.001664   Mean   :0.6542   Mean   : 3.243   Mean   : 16.36  
##  3rd Qu.:0.001830   3rd Qu.:0.7059   3rd Qu.: 3.680   3rd Qu.: 18.00  
##  Max.   :0.015760   Max.   :1.0000   Max.   :17.697   Max.   :155.00  
## 
## mining info:
##          data ntransactions support confidence
##  transactions          9835   0.001       0.55

Improve the model

Use lift measurements

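Lift measures the importance of a rule: it is the rule’s confidence divided by the support of the right-hand side. A lift above 1 means the items are bought together more often than would be expected if they were purchased independently, and the higher the lift, the stronger the association. We will inspect the top 10 rules by lift.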
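As a worked check, the lift of {Logitech Wireless Keyboard} => {iMac} can be recomputed from figures in this report. The support of iMac is not printed directly, so the sketch below derives it from the {iMac} => {HP Laptop} rule as support divided by confidence; that derivation is an assumption of this example.

```r
# Values copied from the rule output in this report
conf_keyboard_imac <- 0.5909091   # confidence of {Logitech Wireless Keyboard} => {iMac}
supp_imac_hp       <- 0.07554652  # support of {iMac} => {HP Laptop}
conf_imac_hp       <- 0.2949583   # confidence of {iMac} => {HP Laptop}

# confidence(A => B) = support(A and B) / support(A), so support(iMac) follows
supp_imac <- supp_imac_hp / conf_imac_hp

# lift = confidence of the rule / support of its right-hand side
lift <- conf_keyboard_imac / supp_imac
print(round(lift, 3))
```

The value comes out at roughly 2.307, in line with the lift reported for this rule, so a buyer of the keyboard is a little over twice as likely to buy an iMac as a typical customer.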

We then manually adjust the support and confidence parameters in the apriori() call above while monitoring the lift values, until the top lift values stop changing. This gives our 10 best rules.
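The manual tuning described here could also be organized as a small grid of candidate thresholds. The sketch below is one possible way to do that, not the procedure used in this report: it runs on hypothetical toy transactions so that it is self-contained, and the threshold values are arbitrary. With the real data, `toy` would be replaced by `transactions`.

```r
library(arules)

# Hypothetical toy transactions; the real analysis would use `transactions`
toy <- as(list(c("iMac", "Apple Magic Keyboard"),
               c("iMac", "Apple Magic Keyboard", "HP Laptop"),
               c("HP Laptop", "Ethernet Cable"),
               c("iMac", "HP Laptop")),
          "transactions")

# Candidate thresholds to compare (illustrative values)
grid <- expand.grid(supp = c(0.25, 0.5), conf = c(0.5, 0.8))
grid$n_rules  <- NA_integer_
grid$top_lift <- NA_real_

for (i in seq_len(nrow(grid))) {
  rules <- apriori(toy,
                   parameter = list(supp = grid$supp[i], conf = grid$conf[i],
                                    minlen = 2, maxlen = 4),
                   control = list(verbose = FALSE))
  grid$n_rules[i] <- length(rules)
  if (length(rules) > 0) {
    grid$top_lift[i] <- max(quality(rules)$lift)
  }
}

# One row per threshold combination: rule count and highest lift found
print(grid)
```

Comparing the rows shows how quickly the rule set shrinks as the thresholds tighten and whether the highest-lift rules survive.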

Inspect the rules by lift measurements

inspect(sort(aprioriRule, by = "lift")[1:10])

##      lhs                                                        rhs                                 support confidence      lift count
## [1]  {Apple Earpods,                                                                                                                  
##       Logitech MK360 Wireless Keyboard and Mouse Combo}      => {Eluktronics Pro Gaming Laptop} 0.001220132  0.6315789 17.696806    12
## [2]  {Apple Earpods,                                                                                                                  
##       Microsoft Wireless Comfort Keyboard and Mouse}         => {Slim Wireless Mouse}           0.001220132  0.6315789 16.697793    12
## [3]  {Dell Wired Keyboard,                                                                                                            
##       HDMI Cable 6ft}                                        => {AOC Monitor}                   0.001931876  0.6333333 15.045491    19
## [4]  {Dell Desktop,                                                                                                                   
##       HDMI Cable 6ft,                                                                                                                 
##       HP Monitor}                                            => {AOC Monitor}                   0.001118454  0.6111111 14.517579    11
## [5]  {Computer Game,                                                                                                                  
##       iMac,                                                                                                                           
##       Intel Desktop}                                         => {Apple Magic Keyboard}          0.001118454  0.7333333 10.230260    11
## [6]  {ASUS 2 Monitor,                                                                                                                 
##       Dell Desktop,                                                                                                                   
##       Intel Desktop}                                         => {Apple Magic Keyboard}          0.001118454  0.7333333 10.230260    11
## [7]  {Dell Desktop,                                                                                                                   
##       Microsoft Office Home and Student 2016,                                                                                         
##       Rii LED Gaming Keyboard & Mouse Combo}                 => {Belkin Mouse Pad}              0.001016777  0.5882353 10.043913    10
## [8]  {Apple Magic Keyboard,                                                                                                           
##       Eluktronics Pro Gaming Laptop,                                                                                                  
##       Lenovo Desktop Computer}                               => {ASUS Monitor}                  0.001016777  0.5555556 10.025484    10
## [9]  {ASUS Chromebook,                                                                                                                
##       Computer Game,                                                                                                                  
##       Lenovo Desktop Computer}                               => {ASUS Monitor}                  0.001016777  0.5555556 10.025484    10
## [10] {ASUS Monitor,                                                                                                                   
##       iMac,                                                                                                                           
##       Logitech MK550 Wireless Wave Keyboard and Mouse Combo} => {Apple Magic Keyboard}          0.001423488  0.6666667  9.300236    14

Different ways to set or subset the rules

Subset the rules to see a specific item’s rules

# see Kindle's rule
xRules <- subset(aprioriRule, items %in% "Kindle") 
inspect(xRules)
summary(xRules)

## set of 0 rules

No rules involve Kindle at the support and confidence thresholds used above.

Apply apriori rule while specifying the left hand side item

Here is an example of how to specify which items may appear on the left-hand side or the right-hand side of the rules. The purpose of this approach is to find which items should be displayed or bundled with the item that we want to promote.

# left hand and right hand 
# lhs = iMac
iMacRules_lh <- apriori(transactions, parameter = list(supp = 0.001, conf = 0.1),
                        appearance = list(default = "rhs", lhs = "iMac"))

inspect(sample(iMacRules_lh, 5))

##     lhs       rhs                                      support    confidence
## [1] {iMac} => {HP Laptop}                              0.07554652 0.2949583 
## [2] {iMac} => {CYBERPOWER Gamer Desktop}               0.05673615 0.2215165 
## [3] {iMac} => {Microsoft Office Home and Student 2016} 0.03101169 0.1210798 
## [4] {iMac} => {3-Button Mouse}                         0.03335028 0.1302104 
## [5] {iMac} => {Acer Aspire}                            0.03070666 0.1198888 
##     lift     count
## [1] 1.519599 743  
## [2] 1.204320 558  
## [3] 1.820825 305  
## [4] 1.463565 328  
## [5] 1.448534 302

Visualize the result of the rules

The plot() function will display the rules as a scatter plot.

plot(aprioriRule, method = "scatterplot", measure = c("support","confidence"),
     shading = "lift")

Darker red points mark rules with higher lift, which are the more important ones.

The graph method can be used to visualize a subset of related rules.

# sort the rules by lift
aRule <- sort(aprioriRule, by = "lift")
# plot the sorted rules as a graph
plot(aRule, method = "graph", control = list(type = "items"), max = 10)

From the graph, we can see that Dell Desktop, ViewSonic Monitor and Acer Aspire are associated with some important rules.

Recommendations if acquiring the online retailer

Cross selling opportunities

Dell Desktop pairs well with small accessories, such as mice and keyboards.
Apple Magic Keyboard is not only purchased more often with iMac; it also sells more often alongside Computer Game, Intel Desktop, ASUS 2 Monitor and Dell Desktop.
Slim Wireless Mouse is purchased more often with Apple Earpods.

Remove low frequency items

The online retailer has very popular and profitable items, such as Apple products and Dell Desktop. But each of the ten least frequently purchased items (bottom 10) appears in fewer than 0.4 percent of all transactions (at most 36 of 9,835). Meanwhile, these products take up shelf space and inventory. We recommend removing the low-frequency items.
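The share of transactions for each bottom-10 item can be checked from the counts in the low-frequency table and the total of 9835 transactions. The names low_counts and share_pct are illustrative.

```r
n_transactions <- 9835

# Counts copied from the low-frequency table in this report
low_counts <- c("Logitech Wireless Keyboard" = 22, "VGA Monitor Cable" = 22,
                "Panasonic On-Ear Stereo Headphones" = 23,
                "1TB Portable External Hard Drive" = 27, "Canon Ink" = 27,
                "Logitech Stereo Headset" = 30, "Ethernet Cable" = 32,
                "Canon Office Printer" = 35, "Gaming Mouse Professional" = 35,
                "Audio Cable" = 36)

# Percentage of all transactions that contain each item
share_pct <- round(100 * low_counts / n_transactions, 2)
print(share_pct)
```

Every item in this group stays well under half a percent of transactions.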

Here ends our project