Electronidex’s Market Basket Analysis

Summary

Blackwell Electronics’ board of directors is considering acquiring Electronidex, and we have been asked to help them understand this company’s clientele and whether the acquisition makes sense. Analysis of one month of Electronidex transaction data suggests that its customers are mainly other businesses and retailers, whereas Blackwell sells directly to consumers. The transaction data also show that Electronidex sells items in far greater volume than Blackwell. We expect the acquisition to boost sales of Laptops and Desktops substantially, raising revenue and profit, while Blackwell’s strength in Accessory sales should in turn lift Electronidex’s sales of that product type. To confirm these results, however, a more thorough analysis covering several months of transactions is needed, so that trends and statistics can be assessed with a smaller margin of error.

Business Question

The board of directors at Blackwell Electronics is considering acquiring Electronidex, a start-up online electronics retailer. We are tasked with helping them better understand Electronidex’s clientele and decide whether or not to acquire the company. The main objective is to identify the purchasing patterns of Electronidex’s clientele and to discover any interesting relationships (or associations) between the items that customers purchase together in a transaction.

What is Market Basket Analysis?

Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items. For example, if you are in an English pub and you buy a pint of beer and don’t buy a bar meal, you are more likely to buy crisps (US: chips) at the same time than somebody who didn’t buy beer. The set of items a customer buys is referred to as an itemset, and market basket analysis seeks to find relationships between purchases. Typically the relationship will be in the form of a rule:

IF {beer, no bar meal} THEN {crisps}.

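In the arules package used below, the support of a rule is the proportion of all transactions that contain every item in the rule (antecedent and consequent together). The confidence is the conditional probability that a customer who bought the antecedent (beer without a bar meal) also purchases the consequent (crisps). Lift, which also appears in the output below, is the confidence divided by the overall proportion of transactions containing the consequent; a lift above 1 means the antecedent makes the consequent more likely than it is on average.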
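As a worked illustration of these measures, the sketch below computes support, confidence and lift in base R for a simplified version of the pub rule, {beer} => {crisps}. The eight baskets and the helper name `rule_measures` are made up for illustration only; they are not Electronidex data and not part of arules.

```r
# Hypothetical baskets (illustrative only, not Electronidex data)
baskets <- list(
  c("beer", "crisps"),
  c("beer", "crisps"),
  c("beer", "crisps"),
  c("beer", "bar meal"),
  c("bar meal"),
  c("crisps"),
  c("bar meal", "wine"),
  c("wine")
)

# Support, confidence and lift of the rule lhs => rhs
rule_measures <- function(baskets, lhs, rhs) {
  has_lhs <- vapply(baskets, function(b) all(lhs %in% b), logical(1))
  has_rhs <- vapply(baskets, function(b) all(rhs %in% b), logical(1))
  support    <- mean(has_lhs & has_rhs)     # share of baskets with lhs and rhs
  confidence <- support / mean(has_lhs)     # share of lhs baskets that also hold rhs
  lift       <- confidence / mean(has_rhs)  # confidence relative to the rhs base rate
  c(support = support, confidence = confidence, lift = lift)
}

m <- rule_measures(baskets, lhs = "beer", rhs = "crisps")
print(m)
```

In this toy data the lift comes out above 1, meaning beer buyers pick up crisps more often than the average customer does.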

Processing the Data

Data Summary

In order to analyse the data in R, we use the summary function to see useful information that will help us better understand this data.

#Load the packages used in this report
library(arules)     #read.transactions, apriori, inspect
library(arulesViz)  #plot method for rules

#Loading transactional file (no attributes)
df <- read.transactions("ElectronidexTransactions2017.csv",
                  format = "basket",
                  sep = ",",
                  rm.duplicates = FALSE,
                  cols = NULL)

#load product categories list
productCatList <- read.csv("ProductCategoryList.csv", sep=",")
#remove "" as they are not necessary and may give us wrong results
df@itemInfo$labels <- gsub("\"","",df@itemInfo$labels)
#add level1 to categorise our products
#(assumes the rows of ProductCategoryList.csv are in the same order as the item labels)
df@itemInfo$level1 <- productCatList$ProductCategory

#find transactions that contain a single item
oneCat <- df[which(size(df) == 1), ] #2163 transactions with one item

#summary giving us lots of good information
summary(df)

## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  125 columns (items) and a density of 0.03506172 
## 
## most frequent items:
##                     iMac                HP Laptop CYBERPOWER Gamer Desktop 
##                     2519                     1909                     1809 
##            Apple Earpods        Apple MacBook Air                  (Other) 
##                     1715                     1530                    33622 
## 
## element (itemset/transaction) length distribution:
## sizes
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14 
##    2 2163 1647 1294 1021  856  646  540  439  353  247  171  119   77   72 
##   15   16   17   18   19   20   21   22   23   25   26   27   29   30 
##   56   41   26   20   10   10   10    5    3    1    1    3    1    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.000   3.000   4.383   6.000  30.000 
## 
## includes extended item information - examples:
##                             labels  level1
## 1 1TB Portable External Hard Drive Laptops
## 2 2TB Portable External Hard Drive Laptops
## 3                   3-Button Mouse Laptops

#plot items frequency
itemFrequencyPlot(df,
   topN=10,
   #col=brewer.pal(8,'Pastel2'),
   main='Absolute Item Frequency Plot',
   type="absolute",
   ylab="Item Frequency (Absolute)")

Exploratory Visualisation to better understand the data

After analysing the data in R, we created the plot in figure one. It shows the number of transactions in which each product appears. The most popular product was the iMac (Desktop), followed by the HP Laptop. This is encouraging because Blackwell’s laptop and PC sales are struggling.

#Most frequent products in single-item transactions
barplot(sort(itemFrequency(oneCat, type="absolute"), decreasing=TRUE))

The most popular products that were purchased alone are:

Apple MacBook Air was purchased 383 times
iMac was purchased 121 times
CYBERPOWER Gamer Desktop was purchased 109 times

We will use an algorithm called Apriori that will analyse the data and output a set of rules with calculated support, confidence and lift. The higher the confidence and lift are, the better the rule is.

#run apriori with support at 0.01 and confidence at 0.4
basket_rules <- apriori(df, parameter = list(support = 0.01,
                                             confidence = 0.4,
                                             minlen = 2,
                                             target="rules"))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.4    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 98 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[125 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [82 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [70 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

basket_rules

## set of 70 rules

inspect(head(basket_rules))

##     lhs                                                        rhs            support confidence     lift count
## [1] {Logitech MK550 Wireless Wave Keyboard and Mouse Combo} => {iMac}      0.01006609  0.4107884 1.603852    99
## [2] {Alienware Laptop}                                      => {iMac}      0.01159126  0.4145455 1.618521   114
## [3] {HDMI Cable 6ft}                                        => {iMac}      0.01148958  0.4414062 1.723394   113
## [4] {Panasonic In-Ear Headphone}                            => {iMac}      0.01077783  0.4326531 1.689219   106
## [5] {Eluktronics Pro Gaming Laptop}                         => {HP Laptop} 0.01443823  0.4045584 2.084249   142
## [6] {Eluktronics Pro Gaming Laptop}                         => {iMac}      0.01576004  0.4415954 1.724133   155
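
The columns in this output can be cross-checked by hand. The sketch below recomputes support and lift for rule [1] using only numbers printed above (9835 transactions, 2519 of them containing an iMac, a rule count of 99 and a confidence of 0.4107884). The variable names are ours, chosen for illustration.

```r
n_transactions <- 9835       # rows reported by summary(df)
n_imac         <- 2519       # iMac frequency reported by summary(df)
rule_count     <- 99         # count column of rule [1]
rule_conf      <- 0.4107884  # confidence column of rule [1]

# share of all baskets that contain both the keyboard combo and an iMac
support <- rule_count / n_transactions

# confidence compared with how often an iMac is bought overall
lift <- rule_conf / (n_imac / n_transactions)

# implied number of baskets containing the keyboard combo (the lhs)
lhs_count <- rule_count / rule_conf

print(c(support = support, lift = lift, lhs_count = lhs_count))
```

The recomputed support and lift agree with the printed columns to the displayed precision, and the implied left-hand-side count is very close to a whole number of baskets, as expected.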

The scatterplot below shows the distribution of the 70 rules generated from all products; only a few rules combine high confidence with high lift. However, we can produce better-performing rules by generalising our data.

#scatter plot of association rules we found use apriori
plot(basket_rules)

Below are the top 6 rules sorted by confidence. We can also run a redundancy check to remove rules that add nothing over a more general rule.

rules_conf <- sort(basket_rules, by="confidence", decreasing=TRUE) # 'high-confidence' rules.
inspect(head(rules_conf)) # show the support, confidence and lift for the top 6 rules

##     lhs                                         rhs         support   
## [1] {Acer Aspire,ViewSonic Monitor}          => {HP Laptop} 0.01077783
## [2] {ASUS 2 Monitor,Lenovo Desktop Computer} => {iMac}      0.01087951
## [3] {Apple Magic Keyboard,Dell Desktop}      => {iMac}      0.01016777
## [4] {ASUS Monitor,HP Laptop}                 => {iMac}      0.01179461
## [5] {ASUS 2 Monitor,HP Laptop}               => {iMac}      0.01108287
## [6] {Dell Desktop,ViewSonic Monitor}         => {HP Laptop} 0.01525165
##     confidence lift     count
## [1] 0.6022727  3.102856 106  
## [2] 0.5911602  2.308083 107  
## [3] 0.5847953  2.283232 100  
## [4] 0.5829146  2.275889 116  
## [5] 0.5828877  2.275784 109  
## [6] 0.5747126  2.960869 150

#check for redundant rules (none were found)
is.redundant(basket_rules)

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [67] FALSE FALSE FALSE FALSE
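
For readers unfamiliar with the redundancy check: as we understand the arules default, a rule is flagged as redundant when a more general rule (same right-hand side, fewer items on the left-hand side) already has equal or higher confidence. The sketch below applies that idea in base R to three made-up rules. It illustrates the concept only; `is_redundant_sketch` is a hypothetical helper and not the arules implementation.

```r
# Hypothetical rules (illustrative only); lhs stored as character vectors
rules <- list(
  list(lhs = c("A"),      rhs = "X", confidence = 0.60),
  list(lhs = c("A", "B"), rhs = "X", confidence = 0.55),  # no better than {A} => X
  list(lhs = c("A", "C"), rhs = "X", confidence = 0.70)   # improves on {A} => X
)

# TRUE when a strictly more general rule has confidence >= this rule's
is_redundant_sketch <- function(rules) {
  vapply(seq_along(rules), function(i) {
    r <- rules[[i]]
    any(vapply(rules[-i], function(g) {
      identical(g$rhs, r$rhs) &&
        length(g$lhs) < length(r$lhs) &&
        all(g$lhs %in% r$lhs) &&
        g$confidence >= r$confidence
    }, logical(1)))
  }, logical(1))
}

redundant <- is_redundant_sketch(rules)
print(redundant)
```

Only the second rule is flagged here: adding B to the left-hand side does not raise confidence above what {A} alone achieves.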

To get a better idea of how the confidence and support thresholds affect the number of rules found, the following 2 charts were created.

library(ggplot2)

#load external csv files
changingConf <- read.csv("changingconf.csv", sep=",")
changingSupp <- read.csv("changingsupp.csv", sep=",")

#min. confidence vs rules found line graph
ggplot(data = changingConf, aes(x=minConf, y=numRules))+
  geom_point()+
  geom_smooth(se=FALSE, color="#2471A3")+
  ggtitle("Min. Confidence against Rules found (Min supp. = 0.1)")+
  xlab("Min. Conf") +
  ylab("No. of Rules")+
  theme_bw()

From the line graph, we can see a clear negative relationship between the confidence threshold and the number of rules: the higher the confidence, the fewer rules we get. Because confidence measures how often a rule holds when its left-hand side is present, we still want to keep this value high.
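
The downward trend follows from how the threshold works: raising the minimum confidence can only drop rules, never add them. The base R sketch below shows this using the twelve confidence values printed in the two inspect() outputs above as sample data. The full 70-rule set and the contents of changingconf.csv are not reproduced here, and the threshold grid is an arbitrary choice for illustration.

```r
# Confidence values copied from the two inspect() outputs above (12 of the 70 rules)
conf <- c(0.4107884, 0.4145455, 0.4414062, 0.4326531, 0.4045584, 0.4415954,
          0.6022727, 0.5911602, 0.5847953, 0.5829146, 0.5828877, 0.5747126)

# Illustrative grid of minimum-confidence thresholds
min_conf <- c(0.40, 0.45, 0.50, 0.55, 0.60)

# Number of sample rules that survive each threshold
num_rules <- vapply(min_conf, function(t) sum(conf >= t), integer(1))

print(data.frame(minConf = min_conf, numRules = num_rules))
```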

#min. support vs rules found line graph
ggplot(data = changingSupp, aes(x=minSupp, y=numRules))+
  geom_point()+
  geom_smooth(se=FALSE, color="#2471A3")+
  ggtitle("Min. Support against Rules found (Min conf. = 0.4)")+
  xlab("Min. Supp") +
  ylab("No. of Rules")+
  theme_bw()

When the support threshold is plotted against rules found, we see an even sharper decline in the number of rules as support rises. But because support only measures how many transactions contain all the items in a rule, and says nothing about how reliably the left-hand side leads to the right-hand side, we won’t put a lot of emphasis on it being a high value.
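
It can help to translate a minimum support into a number of transactions. The sketch below does so for the two support thresholds that appear in this report (0.01 and 0.005) with the 9835 transactions in the data. Rounding down is an assumption on our part; it happens to match the "Absolute minimum support count" lines printed in the Apriori logs.

```r
n_transactions <- 9835            # rows reported by summary(df)
min_support    <- c(0.01, 0.005)  # thresholds used in this report

# smallest number of baskets a rule must appear in (rounding down is assumed)
min_count <- floor(min_support * n_transactions)

print(data.frame(min_support, min_count))
```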

Ruleset Discovery for individual products

After finding the product categories that Blackwell Electronics and Electronidex have in common, we zoom in on each of the 3 common product types and analyse the rules with the highest support, confidence and lift.

Desktop

The iMac performs very well in this product type. The category as a whole is solid and could help Blackwell with its struggling PC sales in the future.

#remove redundant rules
ruleDesktop <- ruleDesktop[!is.redundant(ruleDesktop)]
summary(ruleDesktop)

## set of 214 rules
## 
## rule length distribution (lhs + rhs):sizes
##   2   3   4 
##  29 153  32 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   3.000   3.014   3.000   4.000 
## 
## summary of quality measures:
##     support           confidence          lift           count       
##  Min.   :0.005084   Min.   :0.4023   Min.   :1.571   Min.   : 50.00  
##  1st Qu.:0.005796   1st Qu.:0.4417   1st Qu.:1.809   1st Qu.: 57.00  
##  Median :0.006914   Median :0.4957   Median :2.007   Median : 68.00  
##  Mean   :0.009073   Mean   :0.5028   Mean   :2.128   Mean   : 89.23  
##  3rd Qu.:0.009761   3rd Qu.:0.5495   3rd Qu.:2.276   3rd Qu.: 96.00  
##  Max.   :0.054601   Max.   :0.7391   Max.   :3.871   Max.   :537.00  
## 
## mining info:
##  data ntransactions support confidence
##    df          9835   0.005        0.4

#Sort by top 15 support/conf/lift
inspect(sort(ruleDesktop, decreasing = TRUE, by = "support")[1:10])

##      lhs                                         rhs       support confidence     lift count
## [1]  {Dell Desktop}                           => {iMac} 0.05460092  0.4074355 1.590762   537
## [2]  {ViewSonic Monitor}                      => {iMac} 0.04941535  0.4479263 1.748851   486
## [3]  {Apple Magic Keyboard}                   => {iMac} 0.03233350  0.4510638 1.761101   318
## [4]  {Microsoft Office Home and Student 2016} => {iMac} 0.03101169  0.4663609 1.820825   305
## [5]  {ASUS 2 Monitor}                         => {iMac} 0.02806304  0.4867725 1.900519   276
## [6]  {ASUS Monitor}                           => {iMac} 0.02765633  0.4990826 1.948582   272
## [7]  {Belkin Mouse Pad}                       => {iMac} 0.02419929  0.4131944 1.613246   238
## [8]  {HP Laptop,ViewSonic Monitor}            => {iMac} 0.02369090  0.4936441 1.927348   233
## [9]  {HP Laptop,Lenovo Desktop Computer}      => {iMac} 0.02308083  0.5000000 1.952164   227
## [10] {Dell Desktop,HP Laptop}                 => {iMac} 0.02226741  0.4954751 1.934497   219

inspect(sort(ruleDesktop, decreasing = TRUE, by = "confidence")[1:10])

##      lhs                                                            rhs        support confidence     lift count
## [1]  {ASUS 2 Monitor,Dell Desktop,Lenovo Desktop Computer}       => {iMac} 0.005185562  0.7391304 2.885807    51
## [2]  {ASUS 2 Monitor,ASUS Monitor}                               => {iMac} 0.005083884  0.7142857 2.788805    50
## [3]  {ASUS 2 Monitor,Microsoft Office Home and Student 2016}     => {iMac} 0.005185562  0.6986301 2.727681    51
## [4]  {Dell Desktop,Lenovo Desktop Computer,ViewSonic Monitor}    => {iMac} 0.006914082  0.6938776 2.709125    68
## [5]  {Apple Magic Keyboard,Dell Desktop,Lenovo Desktop Computer} => {iMac} 0.005287239  0.6842105 2.671382    52
## [6]  {Apple Magic Keyboard,ASUS Monitor}                         => {iMac} 0.006812405  0.6700000 2.615899    67
## [7]  {Acer Desktop,HP Laptop,ViewSonic Monitor}                  => {iMac} 0.006405694  0.6562500 2.562215    63
## [8]  {Acer Desktop,ASUS 2 Monitor}                               => {iMac} 0.006405694  0.6428571 2.509925    63
## [9]  {ASUS Monitor,ViewSonic Monitor}                            => {iMac} 0.008235892  0.6377953 2.490161    81
## [10] {ASUS Monitor,Dell Desktop}                                 => {iMac} 0.007930859  0.6341463 2.475915    78

inspect(sort(ruleDesktop, decreasing = TRUE, by = "lift")[1:10])

##      lhs                                                      rhs                           support confidence     lift count
## [1]  {ASUS 2 Monitor,Dell Desktop,iMac}                    => {Lenovo Desktop Computer} 0.005185562  0.5730337 3.870732    51
## [2]  {Acer Aspire,HP Laptop,ViewSonic Monitor}             => {Dell Desktop}            0.005287239  0.4905660 3.660635    52
## [3]  {Acer Aspire,HP Laptop,iMac}                          => {Dell Desktop}            0.006405694  0.4772727 3.561440    63
## [4]  {ASUS 2 Monitor,iMac,Lenovo Desktop Computer}         => {Dell Desktop}            0.005185562  0.4766355 3.556685    51
## [5]  {Apple Magic Keyboard,Dell Desktop,iMac}              => {Lenovo Desktop Computer} 0.005287239  0.5200000 3.512500    52
## [6]  {Apple Magic Keyboard,iMac,Lenovo Desktop Computer}   => {Dell Desktop}            0.005287239  0.4642857 3.464530    52
## [7]  {HP Laptop,HP Monitor,iMac}                           => {Lenovo Desktop Computer} 0.005388917  0.5096154 3.442354    53
## [8]  {ASUS 2 Monitor,Dell Desktop}                         => {Lenovo Desktop Computer} 0.007015760  0.4893617 3.305544    69
## [9]  {HP Laptop,Lenovo Desktop Computer,ViewSonic Monitor} => {Dell Desktop}            0.006202339  0.4420290 3.298448    61
## [10] {iMac,Lenovo Desktop Computer,ViewSonic Monitor}      => {Dell Desktop}            0.006914082  0.4387097 3.273680    68

Laptop

The HP Laptop performs very well in this product type. The category as a whole is solid and could help Blackwell with its struggling laptop sales in the future.

#Subsetting rules with a Laptop in the right hand side 
#(means that people are more likely to purchase it based on their current purchases)
ruleLaptop <- subset(ruleGeneral, subset = rhs %in% tempLaptop)

#remove redundant rules
ruleLaptop <- ruleLaptop[!is.redundant(ruleLaptop)]
summary(ruleLaptop)

## set of 103 rules
## 
## rule length distribution (lhs + rhs):sizes
##  2  3  4 
##  9 81 13 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   3.000   3.039   3.000   4.000 
## 
## summary of quality measures:
##     support           confidence          lift           count       
##  Min.   :0.005084   Min.   :0.4028   Min.   :2.075   Min.   : 50.00  
##  1st Qu.:0.005796   1st Qu.:0.4261   1st Qu.:2.195   1st Qu.: 57.00  
##  Median :0.006711   Median :0.4651   Median :2.396   Median : 66.00  
##  Mean   :0.008772   Mean   :0.4860   Mean   :2.504   Mean   : 86.27  
##  3rd Qu.:0.009304   3rd Qu.:0.5248   3rd Qu.:2.704   3rd Qu.: 91.50  
##  Max.   :0.047992   Max.   :0.8125   Max.   :4.186   Max.   :472.00  
## 
## mining info:
##  data ntransactions support confidence
##    df          9835   0.005        0.4

inspect(sort(ruleLaptop, decreasing = TRUE, by = "support")[1:10])

##      lhs                          rhs            support confidence     lift count
## [1]  {ViewSonic Monitor}       => {HP Laptop} 0.04799187  0.4350230 2.241200   472
## [2]  {Apple Magic Keyboard}    => {HP Laptop} 0.02887646  0.4028369 2.075380   284
## [3]  {iMac,                                                                       
##       ViewSonic Monitor}       => {HP Laptop} 0.02369090  0.4794239 2.469950   233
## [4]  {Dell Desktop,                                                               
##       iMac}                    => {HP Laptop} 0.02226741  0.4078212 2.101059   219
## [5]  {Computer Game}           => {HP Laptop} 0.01799695  0.4285714 2.207962   177
## [6]  {HP Wireless Mouse}       => {HP Laptop} 0.01789527  0.4112150 2.118543   176
## [7]  {Acer Desktop,                                                               
##       iMac}                    => {HP Laptop} 0.01596340  0.4385475 2.259358   157
## [8]  {Dell Desktop,                                                               
##       ViewSonic Monitor}       => {HP Laptop} 0.01525165  0.5747126 2.960869   150
## [9]  {Dell Desktop,                                                               
##       Lenovo Desktop Computer} => {HP Laptop} 0.01504830  0.4099723 2.112141   148
## [10] {Apple Magic Keyboard,                                                       
##       iMac}                    => {HP Laptop} 0.01474326  0.4559748 2.349142   145

inspect(sort(ruleLaptop, decreasing = TRUE, by = "confidence")[1:10])

##      lhs                          rhs             support confidence     lift count
## [1]  {Acer Aspire,                                                                 
##       Dell Desktop,                                                                
##       ViewSonic Monitor}       => {HP Laptop} 0.005287239  0.8125000 4.185928    52
## [2]  {Acer Aspire,                                                                 
##       iMac,                                                                        
##       ViewSonic Monitor}       => {HP Laptop} 0.006202339  0.6630435 3.415942    61
## [3]  {Acer Desktop,                                                                
##       iMac,                                                                        
##       ViewSonic Monitor}       => {HP Laptop} 0.006405694  0.6363636 3.278489    63
## [4]  {Dell Desktop,                                                                
##       Lenovo Desktop Computer,                                                     
##       ViewSonic Monitor}       => {HP Laptop} 0.006202339  0.6224490 3.206802    61
## [5]  {Computer Game,                                                               
##       ViewSonic Monitor}       => {HP Laptop} 0.007422471  0.6186441 3.187200    73
## [6]  {Computer Game,                                                               
##       Dell Desktop}            => {HP Laptop} 0.005693950  0.6086957 3.135946    56
## [7]  {Acer Aspire,                                                                 
##       ViewSonic Monitor}       => {HP Laptop} 0.010777834  0.6022727 3.102856   106
## [8]  {Acer Desktop,                                                                
##       Apple Magic Keyboard}    => {HP Laptop} 0.006405694  0.5943396 3.061985    63
## [9]  {Dell Desktop,                                                                
##       iMac,                                                                        
##       ViewSonic Monitor}       => {HP Laptop} 0.008744281  0.5931034 3.055617    86
## [10] {ASUS Chromebook,                                                             
##       Dell Desktop}            => {HP Laptop} 0.005795628  0.5816327 2.996520    57

inspect(sort(ruleLaptop, decreasing = TRUE, by = "lift")[1:10])

##      lhs                          rhs             support confidence     lift count
## [1]  {Acer Aspire,                                                                 
##       Dell Desktop,                                                                
##       ViewSonic Monitor}       => {HP Laptop} 0.005287239  0.8125000 4.185928    52
## [2]  {Acer Aspire,                                                                 
##       iMac,                                                                        
##       ViewSonic Monitor}       => {HP Laptop} 0.006202339  0.6630435 3.415942    61
## [3]  {Acer Desktop,                                                                
##       iMac,                                                                        
##       ViewSonic Monitor}       => {HP Laptop} 0.006405694  0.6363636 3.278489    63
## [4]  {Dell Desktop,                                                                
##       Lenovo Desktop Computer,                                                     
##       ViewSonic Monitor}       => {HP Laptop} 0.006202339  0.6224490 3.206802    61
## [5]  {Computer Game,                                                               
##       ViewSonic Monitor}       => {HP Laptop} 0.007422471  0.6186441 3.187200    73
## [6]  {Computer Game,                                                               
##       Dell Desktop}            => {HP Laptop} 0.005693950  0.6086957 3.135946    56
## [7]  {Acer Aspire,                                                                 
##       ViewSonic Monitor}       => {HP Laptop} 0.010777834  0.6022727 3.102856   106
## [8]  {Acer Desktop,                                                                
##       Apple Magic Keyboard}    => {HP Laptop} 0.006405694  0.5943396 3.061985    63
## [9]  {Dell Desktop,                                                                
##       iMac,                                                                        
##       ViewSonic Monitor}       => {HP Laptop} 0.008744281  0.5931034 3.055617    86
## [10] {ASUS Chromebook,                                                             
##       Dell Desktop}            => {HP Laptop} 0.005795628  0.5816327 2.996520    57

Monitors

Looking at the Monitors, only lift is good: support and confidence are both at the low end, and only 7 rules were found. These rules will rarely be fulfilled, so we can leave them out.

#Subsetting rules with a monitor in the right hand side 
#(means that people are more likely to purchase it based on their current purchases)
ruleMonitors <- subset(ruleGeneral, subset = rhs %in% tempMonitors)

#remove redundant rules
ruleMonitors <- ruleMonitors[!is.redundant(ruleMonitors)]

summary(ruleMonitors)

## set of 7 rules
## 
## rule length distribution (lhs + rhs):sizes
## 2 3 4 
## 1 2 4 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   4.000   3.429   4.000   4.000 
## 
## summary of quality measures:
##     support           confidence          lift           count      
##  Min.   :0.005287   Min.   :0.4013   Min.   :3.637   Min.   :52.00  
##  1st Qu.:0.006202   1st Qu.:0.4112   1st Qu.:3.727   1st Qu.:61.00  
##  Median :0.006406   Median :0.4124   Median :3.738   Median :63.00  
##  Mean   :0.006667   Mean   :0.4295   Mean   :3.893   Mean   :65.57  
##  3rd Qu.:0.007219   3rd Qu.:0.4467   3rd Qu.:4.049   3rd Qu.:71.00  
##  Max.   :0.008134   Max.   :0.4771   Max.   :4.324   Max.   :80.00  
## 
## mining info:
##  data ntransactions support confidence
##    df          9835   0.005        0.4

#Sort all 7 rules by support/conf/lift
inspect(sort(ruleMonitors, decreasing = TRUE, by = "support"))

##     lhs                           rhs                     support confidence     lift count
## [1] {ASUS Chromebook,                                                                      
##      HP Laptop}                => {ViewSonic Monitor} 0.008134215  0.4102564 3.718776    80
## [2] {Computer Game,                                                                        
##      HP Laptop}                => {ViewSonic Monitor} 0.007422471  0.4124294 3.738473    73
## [3] {HP Black & Tri-color Ink} => {ViewSonic Monitor} 0.007015760  0.4312500 3.909073    69
## [4] {Acer Desktop,                                                                         
##      HP Laptop,                                                                            
##      iMac}                     => {ViewSonic Monitor} 0.006405694  0.4012739 3.637354    63
## [5] {Acer Aspire,                                                                          
##      HP Laptop,                                                                            
##      iMac}                     => {ViewSonic Monitor} 0.006202339  0.4621212 4.188905    61
## [6] {Dell Desktop,                                                                         
##      HP Laptop,                                                                            
##      Lenovo Desktop Computer}  => {ViewSonic Monitor} 0.006202339  0.4121622 3.736051    61
## [7] {Acer Aspire,                                                                          
##      Dell Desktop,                                                                         
##      HP Laptop}                => {ViewSonic Monitor} 0.005287239  0.4770642 4.324356    52

inspect(sort(ruleMonitors, decreasing = TRUE, by = "confidence"))

##     lhs                           rhs                     support confidence     lift count
## [1] {Acer Aspire,                                                                          
##      Dell Desktop,                                                                         
##      HP Laptop}                => {ViewSonic Monitor} 0.005287239  0.4770642 4.324356    52
## [2] {Acer Aspire,                                                                          
##      HP Laptop,                                                                            
##      iMac}                     => {ViewSonic Monitor} 0.006202339  0.4621212 4.188905    61
## [3] {HP Black & Tri-color Ink} => {ViewSonic Monitor} 0.007015760  0.4312500 3.909073    69
## [4] {Computer Game,                                                                        
##      HP Laptop}                => {ViewSonic Monitor} 0.007422471  0.4124294 3.738473    73
## [5] {Dell Desktop,                                                                         
##      HP Laptop,                                                                            
##      Lenovo Desktop Computer}  => {ViewSonic Monitor} 0.006202339  0.4121622 3.736051    61
## [6] {ASUS Chromebook,                                                                      
##      HP Laptop}                => {ViewSonic Monitor} 0.008134215  0.4102564 3.718776    80
## [7] {Acer Desktop,                                                                         
##      HP Laptop,                                                                            
##      iMac}                     => {ViewSonic Monitor} 0.006405694  0.4012739 3.637354    63

inspect(sort(ruleMonitors, decreasing = TRUE, by = "lift"))

##     lhs                           rhs                     support confidence     lift count
## [1] {Acer Aspire,                                                                          
##      Dell Desktop,                                                                         
##      HP Laptop}                => {ViewSonic Monitor} 0.005287239  0.4770642 4.324356    52
## [2] {Acer Aspire,                                                                          
##      HP Laptop,                                                                            
##      iMac}                     => {ViewSonic Monitor} 0.006202339  0.4621212 4.188905    61
## [3] {HP Black & Tri-color Ink} => {ViewSonic Monitor} 0.007015760  0.4312500 3.909073    69
## [4] {Computer Game,                                                                        
##      HP Laptop}                => {ViewSonic Monitor} 0.007422471  0.4124294 3.738473    73
## [5] {Dell Desktop,                                                                         
##      HP Laptop,                                                                            
##      Lenovo Desktop Computer}  => {ViewSonic Monitor} 0.006202339  0.4121622 3.736051    61
## [6] {ASUS Chromebook,                                                                      
##      HP Laptop}                => {ViewSonic Monitor} 0.008134215  0.4102564 3.718776    80
## [7] {Acer Desktop,                                                                         
##      HP Laptop,                                                                            
##      iMac}                     => {ViewSonic Monitor} 0.006405694  0.4012739 3.637354    63

Ruleset Discovery for Product Categories

If we group all products into their product types, we can focus on the product types we have in common. The frequency plot below shows how often each product category appears in the transaction records. Desktop is still at the top of the chart; Computer Mice and Active Headphones follow, but Blackwell doesn’t sell these items, so we ignore them.

#aggregate by cats
dfByType <- aggregate(df, by= df@itemInfo$level1)

#plot items frequency for categories
itemFrequencyPlot(dfByType,
                  topN=10,
                  #col=brewer.pal(8,'Pastel2'),
                  main='Absolute Item Frequency Plot',
                  type="absolute",
                  ylab="Item Frequency (Absolute)")

Running the Apriori algorithm again, this time on the categories, we get a sizable number of rules: 11,554 before any filtering. To clean these up, we first remove the redundant rules; the summary below describes the 5,284 rules that remain after filtering. Then we zoom in on the rules that have high confidence and lift. To visualise those rules, a scatterplot was created of the rules that predict a desktop purchase.

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.4    0.1    1 none FALSE            TRUE       5   0.005      3
##  maxlen target   ext
##      20  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 49 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[15 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [15 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 done [0.01s].
## writing ... [11554 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
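The call that produced the log above is not shown in the report. As a minimal sketch of the steps described (mine rules, drop redundant ones, keep those with high confidence and lift), the following uses the arules package on a tiny made-up set of baskets. The data and the names toyBaskets, toyTrans, toyRules and strongRules are hypothetical; the confidence and minlen values are read from the parameter specification printed above, while the support threshold is raised because the toy data has only eight baskets.

```r
library(arules)

# Hypothetical toy baskets (NOT Electronidex data), one vector per transaction
toyBaskets <- list(
  c("Desktop", "Monitors", "Keyboard"),
  c("Desktop", "Monitors", "Keyboard"),
  c("Desktop", "Monitors", "Printers"),
  c("Laptops", "Monitors", "Printers"),
  c("Laptops", "Printers"),
  c("Desktop", "Monitors", "Keyboard", "Printers"),
  c("Laptops", "Monitors"),
  c("Desktop", "Keyboard")
)
toyTrans <- as(toyBaskets, "transactions")

# Mine rules with at least 3 items in total (lhs + rhs), as in the report
toyRules <- apriori(toyTrans,
                    parameter = list(supp = 0.25, conf = 0.4, minlen = 3),
                    control = list(verbose = FALSE))

# Drop rules that a more general rule in the set already covers
toyRules <- toyRules[!is.redundant(toyRules)]

# Keep only rules with high confidence and lift (thresholds are illustrative)
strongRules <- subset(toyRules, subset = confidence >= 0.6 & lift > 1.1)
inspect(sort(strongRules, by = "lift"))
```

On the real data the same pattern would be applied to dfByType with the thresholds shown in the log.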

Summary

## set of 5284 rules
## 
## rule length distribution (lhs + rhs):sizes
##    3    4    5    6    7    8 
##  423 1403 1887 1229  321   21 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00    4.00    5.00    4.94    6.00    8.00 
## 
## summary of quality measures:
##     support           confidence          lift           count       
##  Min.   :0.005084   Min.   :0.4013   Min.   :1.133   Min.   :  50.0  
##  1st Qu.:0.006812   1st Qu.:0.5755   1st Qu.:1.668   1st Qu.:  67.0  
##  Median :0.009659   Median :0.6384   Median :1.944   Median :  95.0  
##  Mean   :0.013682   Mean   :0.6308   Mean   :1.901   Mean   : 134.6  
##  3rd Qu.:0.015480   3rd Qu.:0.6939   3rd Qu.:2.132   3rd Qu.: 152.2  
##  Max.   :0.106151   Max.   :0.8806   Max.   :2.937   Max.   :1044.0  
## 
## mining info:
##      data ntransactions support confidence
##  dfByType          9835   0.005        0.4

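Looking at the rulesets for categories, we see generally higher support and confidence than for individual products. This is because combining products into categories makes each item appear in far more transactions. Lift stays moderate (at most 2.94 in the summary above), since a frequent category is already likely to be in any basket.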

Expand ruleset output

#Sort by top 15 support/conf/lift to explore
inspect(sort(ruleByBWCats, decreasing = TRUE, by = "support")[1:15])

##      lhs                                        rhs           support   
## [1]  {Active Headphones,Computer Mice }      => {Desktop}     0.10615150
## [2]  {Active Headphones,Computer Mice }      => {Accessories} 0.09659380
## [3]  {Active Headphones,Computer Mice }      => {Monitors}    0.09649212
## [4]  {Accessories,Computer Mice }            => {Desktop}     0.09639044
## [5]  {Computer Mice ,Desktop}                => {Accessories} 0.09639044
## [6]  {Computer Mice ,Monitors}               => {Desktop}     0.09578038
## [7]  {Computer Mice ,Desktop}                => {Monitors}    0.09578038
## [8]  {Active Headphones,Monitors}            => {Desktop}     0.09567870
## [9]  {Active Headphones,Desktop}             => {Monitors}    0.09567870
## [10] {Computer Headphones,Computer Mice }    => {Desktop}     0.09425521
## [11] {Computer Mice ,Laptops}                => {Desktop}     0.09303508
## [12] {Computer Mice ,Desktop}                => {Laptops}     0.09303508
## [13] {Active Headphones,Computer Mice }      => {Laptops}     0.09191662
## [14] {Active Headphones,Computer Headphones} => {Desktop}     0.09140824
## [15] {Accessories,Active Headphones}         => {Desktop}     0.09130656
##      confidence lift     count
## [1]  0.5680087  1.153969 1044 
## [2]  0.5168662  1.567976  950 
## [3]  0.5163221  1.601901  949 
## [4]  0.5766423  1.171509  948 
## [5]  0.4514286  1.369463  948 
## [6]  0.5931990  1.205146  942 
## [7]  0.4485714  1.391703  942 
## [8]  0.5773006  1.172847  941 
## [9]  0.5097508  1.581514  941 
## [10] 0.5826524  1.183720  927 
## [11] 0.5683230  1.154608  915 
## [12] 0.4357143  1.390863  915 
## [13] 0.4918390  1.570021  904 
## [14] 0.5773924  1.173033  899 
## [15] 0.5756410  1.169475  898

inspect(sort(ruleByBWCats, decreasing = TRUE, by = "confidence")[1:15])

##      lhs                           rhs               support confidence     lift count
## [1]  {Accessories,                                                                    
##       Active Headphones,                                                              
##       Computer Mice ,                                                                 
##       Computer Tablets,                                                               
##       Laptops,                                                                        
##       Printers}                 => {Monitors}    0.005998983  0.8805970 2.732073    59
## [2]  {Accessories,                                                                    
##       Active Headphones,                                                              
##       Computer Mice ,                                                                 
##       Computer Tablets,                                                               
##       Desktop,                                                                        
##       Printers}                 => {Monitors}    0.006304016  0.8611111 2.671618    62
## [3]  {Accessories,                                                                    
##       Active Headphones,                                                              
##       Computer Tablets,                                                               
##       Desktop,                                                                        
##       Laptops,                                                                        
##       Printers}                 => {Monitors}    0.005490595  0.8571429 2.659306    54
## [4]  {Accessories,                                                                    
##       Computer Headphones,                                                            
##       Computer Mice ,                                                                 
##       Computer Tablets,                                                               
##       Desktop,                                                                        
##       Printers}                 => {Monitors}    0.005897306  0.8529412 2.646270    58
## [5]  {Active Headphones,                                                              
##       Computer Mice ,                                                                 
##       Printers,                                                                       
##       Smart Home Devices,                                                             
##       Speakers}                 => {Monitors}    0.005490595  0.8437500 2.617754    54
## [6]  {Active Headphones,                                                              
##       Computer Tablets,                                                               
##       Desktop,                                                                        
##       Laptops,                                                                        
##       Monitors,                                                                       
##       Printers}                 => {Accessories} 0.005490595  0.8437500 2.559618    54
## [7]  {Active Headphones,                                                              
##       Computer Mice ,                                                                 
##       Computer Tablets,                                                               
##       Laptops,                                                                        
##       Monitors,                                                                       
##       Printers}                 => {Accessories} 0.005998983  0.8428571 2.556909    59
## [8]  {Accessories,                                                                    
##       Active Headphones,                                                              
##       Computer Tablets,                                                               
##       Keyboard,                                                                       
##       Monitors}                 => {Desktop}     0.005083884  0.8333333 1.693004    50
## [9]  {Accessories,                                                                    
##       Active Headphones,                                                              
##       Computer Headphones,                                                            
##       Computer Mice ,                                                                 
##       Desktop,                                                                        
##       Printers,                                                                       
##       Speakers}                 => {Monitors}    0.005083884  0.8333333 2.585436    50
## [10] {Accessories,                                                                    
##       Active Headphones,                                                              
##       Computer Headphones,                                                            
##       Computer Mice ,                                                                 
##       Computer Tablets,                                                               
##       Printers}                 => {Monitors}    0.005998983  0.8309859 2.578153    59
## [11] {Active Headphones,                                                              
##       Computer Headphones,                                                            
##       Computer Mice ,                                                                 
##       Laptops,                                                                        
##       Mouse and Keyboard Combo,                                                       
##       Printers}                 => {Monitors}    0.005388917  0.8281250 2.569277    53
## [12] {Accessories,                                                                    
##       Computer Tablets,                                                               
##       Keyboard,                                                                       
##       Monitors}                 => {Desktop}     0.006304016  0.8266667 1.679460    62
## [13] {Accessories,                                                                    
##       Active Headphones,                                                              
##       Computer Headphones,                                                            
##       Computer Mice ,                                                                 
##       Computer Tablets,                                                               
##       Laptops,                                                                        
##       Monitors}                 => {Desktop}     0.005693950  0.8235294 1.673087    56
## [14] {Accessories,                                                                    
##       Active Headphones,                                                              
##       Computer Headphones,                                                            
##       Computer Tablets,                                                               
##       Laptops,                                                                        
##       Monitors}                 => {Desktop}     0.006609049  0.8227848 1.671574    65
## [15] {Accessories,                                                                    
##       Computer Headphones,                                                            
##       Computer Mice ,                                                                 
##       Computer Tablets,                                                               
##       Laptops,                                                                        
##       Printers}                 => {Monitors}    0.005185562  0.8225806 2.552076    51

inspect(sort(ruleByBWCats, decreasing = TRUE, by = "lift")[1:15])

##      lhs                           rhs            support confidence     lift count
## [1]  {Accessories,                                                                 
##       Computer Headphones,                                                         
##       Computer Mice ,                                                              
##       Desktop,                                                                     
##       Monitors,                                                                    
##       Speakers}                 => {Printers} 0.005795628  0.6951220 2.936651    57
## [2]  {Accessories,                                                                 
##       Computer Headphones,                                                         
##       Computer Mice ,                                                              
##       Laptops,                                                                     
##       Monitors,                                                                    
##       Speakers}                 => {Printers} 0.005490595  0.6923077 2.924762    54
## [3]  {Accessories,                                                                 
##       Computer Headphones,                                                         
##       Desktop,                                                                     
##       Laptops,                                                                     
##       Monitors,                                                                    
##       Speakers}                 => {Printers} 0.005083884  0.6849315 2.893600    50
## [4]  {Accessories,                                                                 
##       Active Headphones,                                                           
##       Computer Headphones,                                                         
##       Laptops,                                                                     
##       Monitors,                                                                    
##       Speakers}                 => {Printers} 0.005490595  0.6750000 2.851643    54
## [5]  {Accessories,                                                                 
##       Computer Mice ,                                                              
##       Desktop,                                                                     
##       Laptops,                                                                     
##       Monitors,                                                                    
##       Speakers}                 => {Printers} 0.005388917  0.6708861 2.834263    53
## [6]  {Accessories,                                                                 
##       Computer Headphones,                                                         
##       Laptops,                                                                     
##       Monitors,                                                                    
##       Speakers}                 => {Printers} 0.006914082  0.6601942 2.789094    68
## [7]  {Accessories,                                                                 
##       Active Headphones,                                                           
##       Computer Headphones,                                                         
##       Computer Mice ,                                                              
##       Desktop,                                                                     
##       Mouse and Keyboard Combo} => {Printers} 0.005897306  0.6516854 2.753147    58
## [8]  {Accessories,                                                                 
##       Active Headphones,                                                           
##       Computer Mice ,                                                              
##       Computer Tablets,                                                            
##       Laptops,                                                                     
##       Printers}                 => {Monitors} 0.005998983  0.8805970 2.732073    59
## [9]  {Computer Headphones,                                                         
##       Computer Mice ,                                                              
##       Desktop,                                                                     
##       Laptops,                                                                     
##       Monitors,                                                                    
##       Speakers}                 => {Printers} 0.005693950  0.6436782 2.719319    56
## [10] {Accessories,                                                                 
##       Active Headphones,                                                           
##       Computer Mice ,                                                              
##       Computer Tablets,                                                            
##       Laptops,                                                                     
##       Monitors}                 => {Printers} 0.005998983  0.6413043 2.709290    59
## [11] {Accessories,                                                                 
##       Computer Mice ,                                                              
##       External Hardrives,                                                          
##       Laptops,                                                                     
##       Monitors}                 => {Printers} 0.005083884  0.6410256 2.708113    50
## [12] {Accessories,                                                                 
##       Computer Tablets,                                                            
##       Speakers}                 => {Printers} 0.005592272  0.6395349 2.701815    55
## [13] {External Hardrives,                                                          
##       Laptops,                                                                     
##       Speakers}                 => {Printers} 0.006100661  0.6382979 2.696589    60
## [14] {External Hardrives,                                                          
##       Monitors,                                                                    
##       Speakers}                 => {Printers} 0.006100661  0.6382979 2.696589    60
## [15] {Accessories,                                                                 
##       Computer Headphones,                                                         
##       External Hardrives,                                                          
##       Laptops,                                                                     
##       Monitors}                 => {Printers} 0.005287239  0.6341463 2.679050    52
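The three listings above report support, confidence and lift for each rule. As a rough sanity check on how these measures relate, the sketch below recomputes the support of the top rule from its count and uses lift = confidence / support(rhs) to recover how common each right-hand-side category is. The numbers are copied from the listings; the helper name rhs_support is an assumption made for illustration.

```r
# Values copied from the top-support rule:
# {Active Headphones, Computer Mice} => {Desktop}
n_transactions <- 9835
rule_count     <- 1044

# Support is the share of all transactions that contain every item in the rule
support <- rule_count / n_transactions

# lift = confidence / support(rhs), so support(rhs) = confidence / lift
rhs_support <- function(confidence, lift) confidence / lift

desktop_share  <- rhs_support(0.5680087, 1.153969)  # rule [1] by support
monitors_share <- rhs_support(0.5163221, 1.601901)  # rule [3] by support
printers_share <- rhs_support(0.6951220, 2.936651)  # rule [1] by lift

round(c(support  = support,
        Desktop  = desktop_share,
        Monitors = monitors_share,
        Printers = printers_share), 3)
```

Desktop items appear in roughly 49% of transactions, Monitors in about 32% and Printers in about 24%. This is why rules ending in Desktop show modest lift even at good confidence, while rules ending in Printers reach the highest lift.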

ruleByCatsDesktop <- subset(ruleByType, subset = rhs %in% c("Desktop") & lift > 1.5)
#remove duplicates
ruleByCatsDesktop <- ruleByCatsDesktop[!is.redundant(ruleByCatsDesktop)]
#plot
plot(ruleByCatsDesktop, measure=c("support", "confidence"), shading="lift", main="Scatterplot of Desktop Rulesets")

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

From the above plot, we can see that most rules have high confidence but weak support. In this case we prioritise lift over support, and focus on the rules that will generate the most transactions.

#laptop rules: filter on the "Laptops" category rather than "Desktop"
ruleByCatsLaptop <- subset(ruleByType, subset = rhs %in% c("Laptops") & lift > 1.5)
#remove duplicates
ruleByCatsLaptop <- ruleByCatsLaptop[!is.redundant(ruleByCatsLaptop)]
#plot
plot(ruleByCatsLaptop, measure=c("support", "confidence"), shading="lift", main="Scatterplot of Laptop Rulesets")

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

From this plot, we can tell that laptops have a higher number of rules and better support, but lower confidence. We will pick a set of rules with the best confidence and lift from here as well, in order to boost Blackwell’s laptop and PC sales in the future.

Questions and Answers

Are there any interesting patterns or item relationships within Electronidex’s transactions?

After exploring the generated rule sets, we conclude that the transactions are mainly made by big companies, because most item sets combine multiple computers with computer necessities such as monitors, keyboards, mice and printers. Here is an example of a rule with high confidence (0.88) and a lift of 2.73:

{Accessories, Active Headphones, Computer Mice, Computer Tablets, Laptops, Printers} => {Monitors}

With this knowledge, we conclude that Electronidex is a business-to-business (B2B) company. Since Blackwell is a business-to-consumer company, merging will be challenging because our customer bases are not compatible. However, since we would gain the business customers Electronidex sells to and the experienced employees working there, this acquisition could prove very profitable and good for the future of our company and its expansion.

Would Blackwell benefit from selling any of Electronidex’s items?

Yes, mainly for the six product types the two companies have in common (accessories, monitors, printers, laptops, desktops and tablets). The table below sums up Blackwell’s sales volume for these six product types.

ProductTypes <- c('Accessories','Display','Printer', 'Tablet','Laptop','PC')
VolumeSales <- c('25,216','2,428','2,036','948','516','116')

metrics <- data.frame(ProductTypes, VolumeSales) 

kable(metrics) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),fixed_thead = T)

ProductTypes	VolumeSales
Accessories	25,216
Display	2,428
Printer	2,036
Tablet	948
Laptop	516
PC	116

After comparing our current data with that of Electronidex, we deduce that our sales of laptops and PCs (desktops) are very weak compared to most of our products. Acquiring such a company should boost the sales of our less popular products and increase our revenues and profits. Because we have no transactional data for Blackwell, we cannot yet analyse whether Blackwell’s products would benefit the sales of Electronidex’s items.

In your opinion, should Blackwell acquire Electronidex?

Based on our current conclusions, acquiring Electronidex would be risky because their main customers are other businesses. However, if we hire experts in this field, and with the aid of the experienced employees already working at Electronidex, the transition can be made easier and could be a great step towards Blackwell branching out into other markets.

If Blackwell does acquire Electronidex, do you have any recommendations for Blackwell?

A deeper analysis needs to be done on both companies’ products. We will need to remove the items that are not selling well (because we will have 125 more products after the acquisition). We can also change the locations of the items in our stores based on the market basket analysis, keeping printers, laptops, computers and computer accessories in close proximity to one another (or showing them as recommended items on the website). Furthermore, transactional data from Blackwell will be needed for analysis, so that we can find additional rulesets that give us an idea of how Blackwell’s items can benefit Electronidex. Finally, transactional data with the exact volume of each product purchased is essential to estimate how much profit and revenue this acquisition will generate.

Bayes’ Theorem

Bayes’ theorem is a formula that describes how to update the probabilities of hypotheses when given evidence. It follows simply from the axioms of conditional probability, but can be used to powerfully reason about a wide range of problems involving belief updates.

Given a hypothesis H and evidence E, Bayes’ theorem states that the relationship between the probability of the hypothesis before getting the evidence P(H) and the probability of the hypothesis after getting the evidence P(H|E) is \[P(H|E)=\frac{P(E|H)}{P(E)} \cdot P(H)\]

Many modern machine learning techniques rely on Bayes’ theorem. For instance, spam filters use Bayesian updating to determine whether an email is real or spam, given the words in the email. Additionally, many specific techniques in statistics, such as calculating p-values or interpreting medical results, are best described in terms of how they contribute to updating hypotheses using Bayes’ theorem.
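The confidence of an association rule is itself a conditional probability, so Bayes’ theorem can be checked against the figures reported earlier. The sketch below takes the top-support category rule, {Active Headphones, Computer Mice} => {Desktop}, treats the left-hand side as evidence E and the right-hand side as hypothesis H, and rebuilds P(H|E) from P(E|H), P(H) and P(E). The variable names are assumptions made for illustration; the three input numbers are copied from the rule listing.

```r
# E = basket contains {Active Headphones, Computer Mice}
# H = basket contains {Desktop}
support_both <- 0.10615150  # P(E and H): support of the rule
confidence   <- 0.5680087   # P(H | E): confidence of the rule
lift         <- 1.153969    # P(H | E) / P(H)

p_h <- confidence / lift            # prior probability of a Desktop purchase
p_e <- support_both / confidence    # probability of observing the evidence
p_e_given_h <- support_both / p_h   # likelihood of the evidence given Desktop

# Bayes' theorem: P(H | E) = P(E | H) / P(E) * P(H)
posterior <- p_e_given_h / p_e * p_h

round(c(prior = p_h, evidence = p_e,
        likelihood = p_e_given_h, posterior = posterior), 4)
```

The posterior matches the rule’s confidence, and the ratio of posterior to prior is the rule’s lift: seeing the evidence raises the probability of a Desktop purchase from about 0.49 to about 0.57.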

Conclusions

Electronidex is a business-to-business (B2B) company, meaning they sell to other retailers and to businesses that require a lot of computers. This was discovered from the transactional records, because most transactions contained multiple combinations of laptops and PCs purchased at the same time.
The laptop rule set has the most rules and the best support, so acquiring Electronidex is likely to boost our laptop sales, as those rules are the most likely to be fulfilled.
Electronidex has 9835 transactions in one month and 125 products in 15 categories. If the acquisition goes through, it will increase the number of products we offer and fulfil some of the rules we found, which should boost the sales of laptops and desktops at Blackwell.
The product types common to Blackwell and Electronidex are: Laptops, Printers, PC, Monitors, Tablets and Accessories.
We need more information from Electronidex for a more thorough analysis and additional insights if we intend to acquire the company. One month’s data can be biased by the season and time of year. For instance, more companies will be restocking electronics during October and November, ahead of Black Friday and Cyber Monday.

Electronidex’s Market Basket Analysis

Kais Kawar

29 April 2019
