Discover Association Rules Between Products for an Electronics Company

Overview

Background Info

The board of directors is considering acquiring Electronidex, a start-up online electronics retailer. We were asked to help the board members better understand the clientele that Electronidex serves and to assess whether it would be a good partnership.
We need to identify purchasing patterns that will provide insight into the retailer’s clientele.

Objective

To conduct a market basket analysis and discover any interesting relationships or associations between customers’ transactions and the item(s) they purchased. These associations can be used to drive sales-oriented initiatives such as recommender systems, for example Amazon’s “Frequently bought together” feature.
To help the board of directors form a clearer picture of Electronidex’s customer buying patterns.

Dataset Info

We were provided a CSV file that contains one month’s (30 days’ worth) of online transactions, 9835 in total, and a file listing all 125 products that Electronidex currently sells. Due to lack of funding, the retailer is only able to pull data on the items that customers purchased in each transaction.

Initial Exploration

Upload the Dataset:
- Transactional data contains rows that each represent a single transaction, with the purchased item(s) separated by commas; this is also called the ‘basket’ format. Because R does not natively understand transactional data, we have to load the CSV file through the read.transactions() function.
- The read.transactions() function converts the dataset into a sparse matrix in which each row represents a transaction and each column represents an item that a customer might purchase. Electronidex sells 125 items, so the sparse matrix has 125 columns. The data also becomes binary (1 = item purchased in that transaction, 0 = not purchased).
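
As a rough illustration of that binary layout, the sketch below builds the same kind of 0/1 matrix in base R from three toy baskets. The baskets are made up for illustration (only the item names are borrowed from the catalog), and this dense matrix merely mimics the layout; arules stores the real data sparsely.

```r
# Toy baskets (hypothetical, not taken from the Electronidex file)
baskets <- list(
  c("iMac", "HP Laptop"),
  c("iMac"),
  c("HP Laptop", "Dell Desktop", "iMac")
)

# One column per distinct item, one row per transaction
items <- sort(unique(unlist(baskets)))

# 1 = item purchased in that transaction, 0 = not purchased
incidence <- t(sapply(baskets, function(b) as.integer(items %in% b)))
colnames(incidence) <- items
incidence
```

With 125 products and only a handful of items per basket, almost every cell is 0, which is why a sparse representation is the natural choice.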

Call libraries

library(arules)
library(arulesViz)

Upload and Inspect the dataset

# Read the CSV in basket format; rm.duplicates drops repeated items within a transaction
transdata <- read.transactions("ElectronidexTransactions2017.csv",
                               format = "basket",
                               rm.duplicates = TRUE, sep = ",")

## distribution of transactions with duplicates:
## items
##   1   2 
## 191  10

# Check the first 5 transactions
inspect(transdata[1:5])

##     items                    
## [1] {Acer Aspire,            
##      Belkin Mouse Pad,       
##      Brother Printer Toner,  
##      VGA Monitor Cable}      
## [2] {Apple Wireless Keyboard,
##      Dell Desktop,           
##      Lenovo Desktop Computer}
## [3] {iMac}                   
## [4] {Acer Desktop,           
##      Intel Desktop,          
##      Lenovo Desktop Computer,
##      XIBERIA Gaming Headset} 
## [5] {ASUS Desktop,           
##      Epson Black Ink,        
##      HP Laptop,              
##      iMac}

# Check the number of transactions
length(transdata)

## [1] 9835

# Check the number of items bought in the first 10 transactions
size(transdata[1:10])

##  [1] 4 3 1 4 4 5 1 5 1 2

# Use LIST() to inspect each transaction
LIST(transdata[1:5])

## [[1]]
## [1] "Acer Aspire"           "Belkin Mouse Pad"      "Brother Printer Toner"
## [4] "VGA Monitor Cable"    
## 
## [[2]]
## [1] "Apple Wireless Keyboard" "Dell Desktop"           
## [3] "Lenovo Desktop Computer"
## 
## [[3]]
## [1] "iMac"
## 
## [[4]]
## [1] "Acer Desktop"            "Intel Desktop"          
## [3] "Lenovo Desktop Computer" "XIBERIA Gaming Headset" 
## 
## [[5]]
## [1] "ASUS Desktop"    "Epson Black Ink" "HP Laptop"       "iMac"

# Count the distinct item labels
length(itemLabels(transdata))

## [1] 125

The dataset contains 125 distinct items in total.

Data Visualization

Top 10 most frequently bought items

# Create an item frequency plot of the 10 most purchased items
itemFrequencyPlot(transdata, type = "absolute", topN = 10, col = "lightblue1",
                  main = "Top 10 Products Frequency Plot", ylab = "")

The least purchased items (bottom 10)

# Create a table for low frequency items 
low_frequency <- sort(table(unlist(LIST(transdata))), decreasing = FALSE)[1:10]
low_frequency

## 
##         Logitech Wireless Keyboard                  VGA Monitor Cable 
##                                 22                                 22 
## Panasonic On-Ear Stereo Headphones   1TB Portable External Hard Drive 
##                                 23                                 27 
##                          Canon Ink            Logitech Stereo Headset 
##                                 27                                 30 
##                     Ethernet Cable               Canon Office Printer 
##                                 32                                 35 
##          Gaming Mouse Professional                        Audio Cable 
##                                 35                                 36

# Plot low frequency items
par(mar=c(10, 17, 2, 1))
barplot(low_frequency, horiz=TRUE, 
        las = 1, col=rainbow(4), main = "Bottom 10 Products Frequency Plot")

Visualize randomly sampled transactions

image(sample(transdata, 100))

The plot above shows some patterns. Each row is one of the 100 sampled transactions and each column is an item, so the near-solid vertical lines correspond to frequently purchased items.

Apply the Apriori Algorithm

We will use the Apriori algorithm to perform the market basket analysis. The Apriori algorithm is helpful when working with large datasets and is used to uncover insights in transactional data. It is based on item frequency: an itemset can be frequent only if all of its subsets are frequent. For example, the itemset {Item 1, Item 2, Item 3, Item 4} can be frequent only if {Item 1}, {Item 2}, {Item 3} and {Item 4} each occur at least as frequently.
The Apriori algorithm assesses association rules using two measures. The first is support, which measures how frequently an itemset or rule occurs in the transactional data. The second is confidence, which measures how reliable a rule is: the proportion of transactions containing the left-hand side that also contain the right-hand side. A rule that scores high on both support and confidence is known as a strong rule.
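
To make the two measures concrete, here is a minimal base-R sketch on five toy baskets. The baskets and the helper names support() and confidence() are hypothetical and exist only for this illustration; they are not part of arules.

```r
# Toy baskets (hypothetical); A, B, C stand in for product names
baskets <- list(
  c("A", "B"),
  c("A", "B", "C"),
  c("A"),
  c("B", "C"),
  c("A", "B")
)

# Support: share of transactions that contain every item in the set
support <- function(itemset) {
  mean(sapply(baskets, function(b) all(itemset %in% b)))
}

# Confidence of lhs => rhs: support of both sides over support of lhs
confidence <- function(lhs, rhs) {
  support(c(lhs, rhs)) / support(lhs)
}

support(c("A", "B"))   # 3 of the 5 baskets contain both A and B
confidence("A", "B")   # of the 4 baskets with A, 3 also contain B

# A subset is never rarer than its superset, which is what Apriori exploits
support("A") >= support(c("A", "B"))
```

Because support can only shrink as items are added, Apriori can discard any candidate itemset that contains an infrequent subset without counting it.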

Apply and Inspect the first 10 rules

# Run the Apriori algorithm with minimum support 0.001 and minimum confidence 0.6
rule <- apriori(transdata, parameter = list(supp = 0.001, conf = 0.6))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.6    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[125 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [125 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [3969 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

# Check the summary statistics of the rules
summary(rule)

## set of 3969 rules
## 
## rule length distribution (lhs + rhs):sizes
##    2    3    4    5    6 
##    2  454 2275 1141   97 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   4.000   4.000   4.221   5.000   6.000 
## 
## summary of quality measures:
##     support           confidence          lift            count      
##  Min.   :0.001017   Min.   :0.6000   Min.   : 2.343   Min.   : 10.0  
##  1st Qu.:0.001118   1st Qu.:0.6316   1st Qu.: 2.716   1st Qu.: 11.0  
##  Median :0.001220   Median :0.6875   Median : 3.225   Median : 12.0  
##  Mean   :0.001484   Mean   :0.7061   Mean   : 3.523   Mean   : 14.6  
##  3rd Qu.:0.001627   3rd Qu.:0.7647   3rd Qu.: 4.007   3rd Qu.: 16.0  
##  Max.   :0.010778   Max.   :1.0000   Max.   :17.697   Max.   :106.0  
## 
## mining info:
##       data ntransactions support confidence
##  transdata          9835   0.001        0.6

# Inspect the first 10 rules
inspect(rule[1:10])

##      lhs                              rhs                     support confidence     lift count
## [1]  {Generic Black 3-Button}      => {iMac}              0.003660397  0.6428571 2.509925    36
## [2]  {Mackie CR Speakers}          => {iMac}              0.004677173  0.6133333 2.394654    46
## [3]  {Backlit LED Gaming Keyboard,                                                             
##       Large Mouse Pad}             => {Apple MacBook Air} 0.001321810  0.8125000 5.222835    13
## [4]  {ASUS 2 Monitor,                                                                          
##       Generic Black 3-Button}      => {iMac}              0.001016777  0.9090909 3.549388    10
## [5]  {Generic Black 3-Button,                                                                  
##       ViewSonic Monitor}           => {iMac}              0.001016777  0.7692308 3.003329    10
## [6]  {Dell Desktop,                                                                            
##       Generic Black 3-Button}      => {iMac}              0.001220132  0.8571429 3.346566    12
## [7]  {Generic Black 3-Button,                                                                  
##       Lenovo Desktop Computer}     => {iMac}              0.001728521  0.8095238 3.160646    17
## [8]  {Generic Black 3-Button,                                                                  
##       HP Laptop}                   => {iMac}              0.001321810  0.6500000 2.537813    13
## [9]  {ASUS Monitor,                                                                            
##       HDMI Adapter}                => {iMac}              0.001016777  0.8333333 3.253606    10
## [10] {HDMI Adapter,                                                                            
##       ViewSonic Monitor}           => {iMac}              0.001321810  0.6842105 2.671382    13

Model Improvement

Adding the Lift Measurement

Lift measures how much more often the items in a rule are bought together than would be expected if they were purchased independently. A lift well above 1 indicates a strong positive association, which makes the rule more interesting. Unlike confidence, lift is symmetric: {Item 1} -> {Item 2} has the same lift as {Item 2} -> {Item 1}. We will inspect the top 10 rules by lift.
We will manually adjust the support and confidence parameters in the apriori() call above while monitoring the lift values until they stop changing. That gives us our top 10 rules.
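
As a small arithmetic check, the sketch below uses the standard definition lift = confidence / support(rhs) together with the values reported earlier for the rule {Generic Black 3-Button} => {iMac}. The variable names are ours; the implied iMac support is derived from those reported numbers and was not read from the data directly.

```r
# Values copied from the rule {Generic Black 3-Button} => {iMac} in the earlier output
n_transactions <- 9835
rule_count     <- 36
rule_conf      <- 0.6428571
rule_lift      <- 2.509925

# Support is the rule's count divided by the number of transactions
rule_support <- rule_count / n_transactions
rule_support

# lift = confidence / support(rhs), so the implied support of {iMac} is
imac_support <- rule_conf / rule_lift
imac_support

# Symmetric form: support(lhs and rhs) / (support(lhs) * support(rhs))
lhs_support <- rule_support / rule_conf
rule_support / (lhs_support * imac_support)
```

The last expression treats both sides of the rule identically, which is why swapping the left-hand and right-hand sides leaves lift unchanged while confidence generally differs.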

# Inspect the top 10 rules sorted by lift:
top.lift <- sort(rule, decreasing = TRUE, na.last = NA, by = "lift")
inspect(head(top.lift, 10))

##      lhs                                                   rhs                                 support confidence     lift count
## [1]  {Apple Earpods,                                                                                                            
##       Logitech MK360 Wireless Keyboard and Mouse Combo} => {Eluktronics Pro Gaming Laptop} 0.001220132  0.6315789 17.69681    12
## [2]  {Apple Earpods,                                                                                                            
##       Microsoft Wireless Comfort Keyboard and Mouse}    => {Slim Wireless Mouse}           0.001220132  0.6315789 16.69779    12
## [3]  {Dell Wired Keyboard,                                                                                                      
##       HDMI Cable 6ft}                                   => {AOC Monitor}                   0.001931876  0.6333333 15.04549    19
## [4]  {Dell Desktop,                                                                                                             
##       HDMI Cable 6ft,                                                                                                           
##       HP Monitor}                                       => {AOC Monitor}                   0.001118454  0.6111111 14.51758    11
## [5]  {Computer Game,                                                                                                            
##       iMac,                                                                                                                     
##       Microsoft Office Home and Student 2016,                                                                                   
##       ViewSonic Monitor}                                => {ASUS Monitor}                  0.001016777  0.6666667 12.03058    10
## [6]  {Computer Game,                                                                                                            
##       Dell Desktop,                                                                                                             
##       iMac,                                                                                                                     
##       Lenovo Desktop Computer}                          => {ASUS Monitor}                  0.001220132  0.6666667 12.03058    12
## [7]  {AOC Monitor,                                                                                                              
##       Dell Desktop,                                                                                                             
##       HP Laptop,                                                                                                                
##       Lenovo Desktop Computer}                          => {ASUS Monitor}                  0.001016777  0.6250000 11.27867    10
## [8]  {Computer Game,                                                                                                            
##       iMac,                                                                                                                     
##       Intel Desktop}                                    => {Apple Magic Keyboard}          0.001118454  0.7333333 10.23026    11
## [9]  {ASUS 2 Monitor,                                                                                                           
##       Dell Desktop,                                                                                                             
##       Intel Desktop}                                    => {Apple Magic Keyboard}          0.001118454  0.7333333 10.23026    11
## [10] {Apple MacBook Pro,                                                                                                        
##       HP Black & Tri-color Ink,                                                                                                 
##       HP Laptop,                                                                                                                
##       iMac}                                             => {Acer Aspire}                   0.001016777  0.8333333 10.06859    10

Subset the rules and specify the left/right hand item

Use the subset() function to see the rules for a specific item

# Check Apple MacBook Air Rule
ItemRules <- subset(rule, items %in% "Apple MacBook Air")
# Inspect the second rule that involves Apple MacBook Air
inspect(ItemRules[2])

##     lhs                                       rhs                     support confidence     lift count
## [1] {Dell KM117 Wireless Keyboard & Mouse,                                                             
##      iPhone Charger Cable}                 => {Apple MacBook Air} 0.002033554   0.952381 6.122004    20

Apply the Apriori algorithm with a specified left-hand item

Besides the subset() function mentioned above, we can also define which items should appear on the left-hand or right-hand side of the rules. The purpose of this approach is to find which items should be displayed or bundled with the item that we want to promote.

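# Restrict the left-hand side to HP Laptop; all other items may appear only on the right
HP_Laptop_Rules_lh <- apriori(transdata, parameter = list(supp = 0.001, conf = 0.1),
                              appearance = list(default = "rhs", lhs = "HP Laptop"))

# Inspect a random sample of 5 of these rules
inspect(sample(HP_Laptop_Rules_lh, 5))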

##     lhs            rhs                                                support confidence     lift count
## [1] {}          => {Apple Earpods}                                 0.17437722  0.1743772 1.000000  1715
## [2] {}          => {Dell Desktop}                                  0.13401118  0.1340112 1.000000  1318
## [3] {HP Laptop} => {Acer Aspire}                                   0.02907982  0.1498167 1.810131   286
## [4] {HP Laptop} => {Microsoft Wireless Desktop Keyboard and Mouse} 0.02318251  0.1194343 1.212215   228
## [5] {HP Laptop} => {Samsung Monitor}                               0.02755465  0.1419591 1.483707   271
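
The sampled output includes rules with an empty left-hand side ({}). Their confidence equals the support of the right-hand item and their lift is 1, so they say nothing about HP Laptop. One possible way to exclude them, sketched here on toy data rather than the Electronidex file, is to set minlen = 2 in the apriori() parameters. The baskets and the supp value below are assumptions chosen only so that the example runs on its own.

```r
library(arules)

# Toy baskets (hypothetical) so the sketch runs without the Electronidex file
baskets <- list(
  c("HP Laptop", "Acer Aspire"),
  c("HP Laptop", "Samsung Monitor"),
  c("HP Laptop", "Acer Aspire", "iMac"),
  c("iMac"),
  c("Samsung Monitor", "iMac")
)
toy_trans <- as(baskets, "transactions")

# minlen = 2 requires at least one item on each side,
# which drops rules of the form {} => {item}
toy_rules <- apriori(toy_trans,
                     parameter = list(supp = 0.2, conf = 0.1, minlen = 2),
                     appearance = list(default = "rhs", lhs = "HP Laptop"),
                     control = list(verbose = FALSE))
inspect(toy_rules)
```

Applied to transdata, the same minlen = 2 setting should leave only rules of the form {HP Laptop} => {item}.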