Blackwell Electronics’ board of directors is considering acquiring Electronidex, and we were asked to help the board understand that company’s clientele and whether to proceed with the acquisition. Analysis of Electronidex’s transaction data for the past month indicates that its customers are mainly other businesses and retailers, whereas Blackwell sells directly to consumers. The transactional data also shows that Electronidex sells a far larger volume of items than Blackwell. We expect the acquisition to boost Blackwell’s sales of laptops and desktops substantially, with a corresponding rise in revenue and profit, and Blackwell’s strength in accessory sales should in turn lift Electronidex’s sales of that product type. To confirm these results, a more thorough analysis covering other months’ transactions is needed to assess the trends and statistics with a smaller margin of error.
The board of directors at Blackwell Electronics is considering acquiring Electronidex, a start-up online electronics retailer. We are tasked with helping the board better understand Electronidex’s clientele and decide whether to acquire the company. The main objective is to identify the purchasing patterns of Electronidex’s clientele and to discover any interesting relationships (or associations) between customers’ transactions and the items they purchased.
Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items. For example, if you are in an English pub and you buy a pint of beer and don’t buy a bar meal, you are more likely to buy crisps (US. chips) at the same time than somebody who didn’t buy beer. The set of items a customer buys is referred to as an itemset, and market basket analysis seeks to find relationships between purchases. Typically the relationship will be in the form of a rule:
IF {beer, no bar meal} THEN {crisps}.
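The proportion of all transactions that contain every item in the rule (here, beer and crisps without a bar meal) is referred to as the support for the rule. The conditional probability that a customer will purchase crisps, given that they bought beer without a bar meal (i.e. that the antecedent is true), is referred to as the confidence. Lift, which also appears in the output further down, is the confidence divided by the overall proportion of transactions that contain the consequent; a lift above 1 means the antecedent makes the consequent more likely than it is on average.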
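To make these two measures concrete, here is a minimal base-R sketch on six made-up pub baskets. The data and the helper `has()` are illustrative assumptions, not part of the Electronidex data. It follows the same convention as the arules output later in this report: support is the share of all baskets that match the whole rule, and confidence is that count divided by the number of baskets matching the left-hand side.

```r
# Six hypothetical pub baskets (illustrative data only).
baskets <- list(
  c("beer", "crisps"),
  c("beer", "crisps"),
  c("beer", "bar meal"),
  c("beer"),
  c("crisps"),
  c("bar meal", "wine")
)

# Helper (hypothetical name): TRUE for every basket that contains `item`.
has <- function(item) vapply(baskets, function(b) item %in% b, logical(1))

# Antecedent: beer without a bar meal. Consequent: crisps.
lhs <- has("beer") & !has("bar meal")
rhs <- has("crisps")

support    <- mean(lhs & rhs)            # share of all baskets matching the whole rule
confidence <- sum(lhs & rhs) / sum(lhs)  # P(crisps | beer, no bar meal)

print(c(support = support, confidence = confidence))
```

In this toy data two of the six baskets match the whole rule, and two of the three baskets that match the antecedent also contain crisps.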
To analyse the data in R, we load the transactions and use the summary function to get an overview that helps us better understand the data.
#load the arules package, which provides read.transactions() and apriori()
library(arules)
#Loading transactional file (no attributes)
df <- read.transactions("ElectronidexTransactions2017.csv",
format = "basket",
sep=",",
rm.duplicates=FALSE,
cols = NULL)
#load product categories list
productCatList <- read.csv("ProductCategoryList.csv", sep=",")
#remove stray double quotes from the item labels, as they may give us wrong results
df@itemInfo$labels <- gsub("\"","",df@itemInfo$labels)
#add the product category of each item as level1
#NOTE: this assumes the rows of productCatList are in the same order as the item labels
df@itemInfo$level1 <- productCatList$ProductCategory
#find transactions that contain a single item
oneCat <- df[which(size(df) == 1), ] #2163 single-item transactions
#summary giving us lots of good information
summary(df)
## transactions as itemMatrix in sparse format with
## 9835 rows (elements/itemsets/transactions) and
## 125 columns (items) and a density of 0.03506172
##
## most frequent items:
## iMac HP Laptop CYBERPOWER Gamer Desktop
## 2519 1909 1809
## Apple Earpods Apple MacBook Air (Other)
## 1715 1530 33622
##
## element (itemset/transaction) length distribution:
## sizes
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
## 2 2163 1647 1294 1021 856 646 540 439 353 247 171 119 77 72
## 15 16 17 18 19 20 21 22 23 25 26 27 29 30
## 56 41 26 20 10 10 10 5 3 1 1 3 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.000 3.000 4.383 6.000 30.000
##
## includes extended item information - examples:
## labels level1
## 1 1TB Portable External Hard Drive Laptops
## 2 2TB Portable External Hard Drive Laptops
## 3 3-Button Mouse Laptops
#plot items frequency
itemFrequencyPlot(df,
topN=10,
#col=brewer.pal(8,'Pastel2'),
main='Absolute Item Frequency Plot',
type="absolute",
ylab="Item Frequency (Absolute)")
After analysing the data in R, we created the plot in figure one. It shows the number of transactions in which each product appears. The most popular product was the iMac (a desktop), followed by the HP Laptop. That is encouraging, because Blackwell’s laptop and PC sales are struggling.
#item frequencies within the single-item transactions
barplot(sort(itemFrequency(oneCat, type="absolute"), decreasing=TRUE))
The most popular products that were purchased alone are:
We will use an algorithm called Apriori that will analyse the data and output a set of rules with calculated support, confidence and lift. The higher the confidence and lift are, the better the rule is.
#run apriori with sup at 0.01 and conf at 0.4
basket_rules <- apriori(df, parameter = list(sup = 0.01,
conf = 0.4,
minlen = 2,
target="rules"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.4 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 98
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[125 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [82 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [70 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
basket_rules
## set of 70 rules
inspect(head(basket_rules))
## lhs rhs support confidence lift count
## [1] {Logitech MK550 Wireless Wave Keyboard and Mouse Combo} => {iMac} 0.01006609 0.4107884 1.603852 99
## [2] {Alienware Laptop} => {iMac} 0.01159126 0.4145455 1.618521 114
## [3] {HDMI Cable 6ft} => {iMac} 0.01148958 0.4414062 1.723394 113
## [4] {Panasonic In-Ear Headphone} => {iMac} 0.01077783 0.4326531 1.689219 106
## [5] {Eluktronics Pro Gaming Laptop} => {HP Laptop} 0.01443823 0.4045584 2.084249 142
## [6] {Eluktronics Pro Gaming Laptop} => {iMac} 0.01576004 0.4415954 1.724133 155
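As a sanity check on how lift relates to confidence, the sketch below recomputes the lift of rule [3] from numbers already printed above: the iMac count from `summary(df)` and the rule's confidence from `inspect()`. It uses the standard definition, lift = confidence / share of transactions containing the right-hand side; the variable names are ours.

```r
# Values copied from the summary() and inspect() output above.
n_transactions <- 9835
count_imac     <- 2519        # transactions that contain an iMac
confidence     <- 0.4414062   # {HDMI Cable 6ft} => {iMac}

# Lift compares the rule's confidence with the baseline share of the consequent.
baseline_imac <- count_imac / n_transactions
lift          <- confidence / baseline_imac

print(round(c(baseline = baseline_imac, lift = lift), 4))
```

The result is about 1.72, in line with the lift column for that rule: buyers of the HDMI cable are roughly 1.7 times as likely to have an iMac in the same transaction as the average customer.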
The scatterplot below shows the distribution of the 70 rules generated from all products; only a few rules have both high confidence and high lift. However, we can produce better-performing rules by generalising our data.
#scatter plot of the association rules found by apriori (plot method from arulesViz)
library(arulesViz)
plot(basket_rules)
Below is the output showing the top 6 rules sorted by confidence. We can also perform a redundancy check to remove any rule that adds nothing over a more general one.
rules_conf <- sort(basket_rules, by="confidence", decreasing=TRUE) # 'high-confidence' rules.
inspect(head(rules_conf)) # show the support, confidence and lift for the top 6 rules
## lhs rhs support
## [1] {Acer Aspire,ViewSonic Monitor} => {HP Laptop} 0.01077783
## [2] {ASUS 2 Monitor,Lenovo Desktop Computer} => {iMac} 0.01087951
## [3] {Apple Magic Keyboard,Dell Desktop} => {iMac} 0.01016777
## [4] {ASUS Monitor,HP Laptop} => {iMac} 0.01179461
## [5] {ASUS 2 Monitor,HP Laptop} => {iMac} 0.01108287
## [6] {Dell Desktop,ViewSonic Monitor} => {HP Laptop} 0.01525165
## confidence lift count
## [1] 0.6022727 3.102856 106
## [2] 0.5911602 2.308083 107
## [3] 0.5847953 2.283232 100
## [4] 0.5829146 2.275889 116
## [5] 0.5828877 2.275784 109
## [6] 0.5747126 2.960869 150
#check for redundant (obsolete) rules
is.redundant(basket_rules)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [67] FALSE FALSE FALSE FALSE
To get a better idea of how the minimum confidence and support thresholds affect the number of rules found, the following two charts were created.
#load ggplot2 for the line graphs
library(ggplot2)
#load external csv files
changingConf <- read.csv("changingconf.csv", sep=",")
changingSupp <- read.csv("changingsupp.csv", sep=",")
#conf vs rules found line graph
ggplot(data = changingConf, aes(x=minConf, y=numRules))+
geom_point()+
geom_smooth(se=FALSE, color="#2471A3")+
ggtitle("Min. Confidence against Rules found (Min supp. = 0.1)")+
xlab("Min. Conf") +
ylab("No. of Rules")+
theme_bw()
From the line graph, we can see a clear negative relationship between the number of rules and the confidence threshold: the higher the confidence, the fewer rules we have. Since confidence measures how often a rule holds when its left-hand side is present, we want to keep this value high.
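The same trade-off can be seen on a small scale without re-running Apriori. The sketch below takes the confidence values of the twelve rules printed by the two `inspect()` calls above (twelve of the 70 rules, so the counts are illustrative only) and counts how many survive each minimum-confidence threshold. The column names `minConf` and `numRules` mirror the ones used in the plots; the thresholds are our own choice.

```r
# Confidence values of the twelve rules printed by inspect() above.
conf_printed <- c(
  0.4107884, 0.4145455, 0.4414062, 0.4326531, 0.4045584, 0.4415954,  # head(basket_rules)
  0.6022727, 0.5911602, 0.5847953, 0.5829146, 0.5828877, 0.5747126   # head(rules_conf)
)

# Candidate minimum-confidence thresholds (chosen for illustration).
minConf <- c(0.40, 0.45, 0.50, 0.55, 0.60)

# Number of printed rules whose confidence meets each threshold.
numRules <- vapply(minConf, function(t) sum(conf_printed >= t), integer(1))

print(data.frame(minConf, numRules))
```

The count can only stay level or fall as the threshold rises, which is why the curve in the chart slopes downward.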
#supp vs rules found line graph
ggplot(data = changingSupp, aes(x=minSupp, y=numRules))+
geom_point()+
geom_smooth(se=FALSE, color="#2471A3")+
ggtitle("Min. Support against Rules found (Min conf. = 0.4)")+
xlab("Min. Supp") +
ylab("No. of Rules")+
theme_bw()
When the minimum support is plotted against the number of rules found, we see an even sharper decline in rules as support rises. But because support only measures the share of transactions that contain all the items in a rule, and says nothing about how reliably the left-hand side predicts the right-hand side, we won’t put a lot of emphasis on it being a high value.
After finding the product categories in common between Blackwell Electronics and Electronidex, we zoom in on each of the 3 common product types and analyse the rules generated based on highest support, confidence and lift.
The iMac performs very well in this product type. The category is solid and can help Blackwell with its struggling PC sales in the future.
#remove redundant rules
ruleDesktop <- ruleDesktop[!is.redundant(ruleDesktop)]
summary(ruleDesktop)
## set of 214 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3 4
## 29 153 32
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 3.000 3.000 3.014 3.000 4.000
##
## summary of quality measures:
## support confidence lift count
## Min. :0.005084 Min. :0.4023 Min. :1.571 Min. : 50.00
## 1st Qu.:0.005796 1st Qu.:0.4417 1st Qu.:1.809 1st Qu.: 57.00
## Median :0.006914 Median :0.4957 Median :2.007 Median : 68.00
## Mean :0.009073 Mean :0.5028 Mean :2.128 Mean : 89.23
## 3rd Qu.:0.009761 3rd Qu.:0.5495 3rd Qu.:2.276 3rd Qu.: 96.00
## Max. :0.054601 Max. :0.7391 Max. :3.871 Max. :537.00
##
## mining info:
## data ntransactions support confidence
## df 9835 0.005 0.4
#Sort by top 10 support/conf/lift
inspect(sort(ruleDesktop, decreasing = TRUE, by = "support")[1:10])
## lhs rhs support
## [1] {Dell Desktop} => {iMac} 0.05460092
## [2] {ViewSonic Monitor} => {iMac} 0.04941535
## [3] {Apple Magic Keyboard} => {iMac} 0.03233350
## [4] {Microsoft Office Home and Student 2016} => {iMac} 0.03101169
## [5] {ASUS 2 Monitor} => {iMac} 0.02806304
## [6] {ASUS Monitor} => {iMac} 0.02765633
## [7] {Belkin Mouse Pad} => {iMac} 0.02419929
## [8] {HP Laptop,ViewSonic Monitor} => {iMac} 0.02369090
## [9] {HP Laptop,Lenovo Desktop Computer} => {iMac} 0.02308083
## [10] {Dell Desktop,HP Laptop} => {iMac} 0.02226741
## confidence lift count
## [1] 0.4074355 1.590762 537
## [2] 0.4479263 1.748851 486
## [3] 0.4510638 1.761101 318
## [4] 0.4663609 1.820825 305
## [5] 0.4867725 1.900519 276
## [6] 0.4990826 1.948582 272
## [7] 0.4131944 1.613246 238
## [8] 0.4936441 1.927348 233
## [9] 0.5000000 1.952164 227
## [10] 0.4954751 1.934497 219
inspect(sort(ruleDesktop, decreasing = TRUE, by = "confidence")[1:10])
## lhs rhs support confidence lift count
## [1] {ASUS 2 Monitor,
## Dell Desktop,
## Lenovo Desktop Computer} => {iMac} 0.005185562 0.7391304 2.885807 51
## [2] {ASUS 2 Monitor,
## ASUS Monitor} => {iMac} 0.005083884 0.7142857 2.788805 50
## [3] {ASUS 2 Monitor,
## Microsoft Office Home and Student 2016} => {iMac} 0.005185562 0.6986301 2.727681 51
## [4] {Dell Desktop,
## Lenovo Desktop Computer,
## ViewSonic Monitor} => {iMac} 0.006914082 0.6938776 2.709125 68
## [5] {Apple Magic Keyboard,
## Dell Desktop,
## Lenovo Desktop Computer} => {iMac} 0.005287239 0.6842105 2.671382 52
## [6] {Apple Magic Keyboard,
## ASUS Monitor} => {iMac} 0.006812405 0.6700000 2.615899 67
## [7] {Acer Desktop,
## HP Laptop,
## ViewSonic Monitor} => {iMac} 0.006405694 0.6562500 2.562215 63
## [8] {Acer Desktop,
## ASUS 2 Monitor} => {iMac} 0.006405694 0.6428571 2.509925 63
## [9] {ASUS Monitor,
## ViewSonic Monitor} => {iMac} 0.008235892 0.6377953 2.490161 81
## [10] {ASUS Monitor,
## Dell Desktop} => {iMac} 0.007930859 0.6341463 2.475915 78
inspect(sort(ruleDesktop, decreasing = TRUE, by = "lift")[1:10])
## lhs rhs support confidence lift count
## [1] {ASUS 2 Monitor,
## Dell Desktop,
## iMac} => {Lenovo Desktop Computer} 0.005185562 0.5730337 3.870732 51
## [2] {Acer Aspire,
## HP Laptop,
## ViewSonic Monitor} => {Dell Desktop} 0.005287239 0.4905660 3.660635 52
## [3] {Acer Aspire,
## HP Laptop,
## iMac} => {Dell Desktop} 0.006405694 0.4772727 3.561440 63
## [4] {ASUS 2 Monitor,
## iMac,
## Lenovo Desktop Computer} => {Dell Desktop} 0.005185562 0.4766355 3.556685 51
## [5] {Apple Magic Keyboard,
## Dell Desktop,
## iMac} => {Lenovo Desktop Computer} 0.005287239 0.5200000 3.512500 52
## [6] {Apple Magic Keyboard,
## iMac,
## Lenovo Desktop Computer} => {Dell Desktop} 0.005287239 0.4642857 3.464530 52
## [7] {HP Laptop,
## HP Monitor,
## iMac} => {Lenovo Desktop Computer} 0.005388917 0.5096154 3.442354 53
## [8] {ASUS 2 Monitor,
## Dell Desktop} => {Lenovo Desktop Computer} 0.007015760 0.4893617 3.305544 69
## [9] {HP Laptop,
## Lenovo Desktop Computer,
## ViewSonic Monitor} => {Dell Desktop} 0.006202339 0.4420290 3.298448 61
## [10] {iMac,
## Lenovo Desktop Computer,
## ViewSonic Monitor} => {Dell Desktop} 0.006914082 0.4387097 3.273680 68
The HP Laptop performs very well in this product type. The category is solid and can help Blackwell with its struggling laptop sales in the future.
#Subsetting rules with a laptop on the right-hand side
#(these rules show which current purchases make a laptop purchase more likely)
ruleLaptop <- subset(ruleGeneral, subset = rhs %in% tempLaptop)
#remove redundant rules
ruleLaptop <- ruleLaptop[!is.redundant(ruleLaptop)]
summary(ruleLaptop)
## set of 103 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3 4
## 9 81 13
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 3.000 3.000 3.039 3.000 4.000
##
## summary of quality measures:
## support confidence lift count
## Min. :0.005084 Min. :0.4028 Min. :2.075 Min. : 50.00
## 1st Qu.:0.005796 1st Qu.:0.4261 1st Qu.:2.195 1st Qu.: 57.00
## Median :0.006711 Median :0.4651 Median :2.396 Median : 66.00
## Mean :0.008772 Mean :0.4860 Mean :2.504 Mean : 86.27
## 3rd Qu.:0.009304 3rd Qu.:0.5248 3rd Qu.:2.704 3rd Qu.: 91.50
## Max. :0.047992 Max. :0.8125 Max. :4.186 Max. :472.00
##
## mining info:
## data ntransactions support confidence
## df 9835 0.005 0.4
inspect(sort(ruleLaptop, decreasing = TRUE, by = "support")[1:10])
## lhs rhs support confidence lift count
## [1] {ViewSonic Monitor} => {HP Laptop} 0.04799187 0.4350230 2.241200 472
## [2] {Apple Magic Keyboard} => {HP Laptop} 0.02887646 0.4028369 2.075380 284
## [3] {iMac,
## ViewSonic Monitor} => {HP Laptop} 0.02369090 0.4794239 2.469950 233
## [4] {Dell Desktop,
## iMac} => {HP Laptop} 0.02226741 0.4078212 2.101059 219
## [5] {Computer Game} => {HP Laptop} 0.01799695 0.4285714 2.207962 177
## [6] {HP Wireless Mouse} => {HP Laptop} 0.01789527 0.4112150 2.118543 176
## [7] {Acer Desktop,
## iMac} => {HP Laptop} 0.01596340 0.4385475 2.259358 157
## [8] {Dell Desktop,
## ViewSonic Monitor} => {HP Laptop} 0.01525165 0.5747126 2.960869 150
## [9] {Dell Desktop,
## Lenovo Desktop Computer} => {HP Laptop} 0.01504830 0.4099723 2.112141 148
## [10] {Apple Magic Keyboard,
## iMac} => {HP Laptop} 0.01474326 0.4559748 2.349142 145
inspect(sort(ruleLaptop, decreasing = TRUE, by = "confidence")[1:10])
## lhs rhs support confidence lift count
## [1] {Acer Aspire,
## Dell Desktop,
## ViewSonic Monitor} => {HP Laptop} 0.005287239 0.8125000 4.185928 52
## [2] {Acer Aspire,
## iMac,
## ViewSonic Monitor} => {HP Laptop} 0.006202339 0.6630435 3.415942 61
## [3] {Acer Desktop,
## iMac,
## ViewSonic Monitor} => {HP Laptop} 0.006405694 0.6363636 3.278489 63
## [4] {Dell Desktop,
## Lenovo Desktop Computer,
## ViewSonic Monitor} => {HP Laptop} 0.006202339 0.6224490 3.206802 61
## [5] {Computer Game,
## ViewSonic Monitor} => {HP Laptop} 0.007422471 0.6186441 3.187200 73
## [6] {Computer Game,
## Dell Desktop} => {HP Laptop} 0.005693950 0.6086957 3.135946 56
## [7] {Acer Aspire,
## ViewSonic Monitor} => {HP Laptop} 0.010777834 0.6022727 3.102856 106
## [8] {Acer Desktop,
## Apple Magic Keyboard} => {HP Laptop} 0.006405694 0.5943396 3.061985 63
## [9] {Dell Desktop,
## iMac,
## ViewSonic Monitor} => {HP Laptop} 0.008744281 0.5931034 3.055617 86
## [10] {ASUS Chromebook,
## Dell Desktop} => {HP Laptop} 0.005795628 0.5816327 2.996520 57
inspect(sort(ruleLaptop, decreasing = TRUE, by = "lift")[1:10])
## lhs rhs support confidence lift count
## [1] {Acer Aspire,
## Dell Desktop,
## ViewSonic Monitor} => {HP Laptop} 0.005287239 0.8125000 4.185928 52
## [2] {Acer Aspire,
## iMac,
## ViewSonic Monitor} => {HP Laptop} 0.006202339 0.6630435 3.415942 61
## [3] {Acer Desktop,
## iMac,
## ViewSonic Monitor} => {HP Laptop} 0.006405694 0.6363636 3.278489 63
## [4] {Dell Desktop,
## Lenovo Desktop Computer,
## ViewSonic Monitor} => {HP Laptop} 0.006202339 0.6224490 3.206802 61
## [5] {Computer Game,
## ViewSonic Monitor} => {HP Laptop} 0.007422471 0.6186441 3.187200 73
## [6] {Computer Game,
## Dell Desktop} => {HP Laptop} 0.005693950 0.6086957 3.135946 56
## [7] {Acer Aspire,
## ViewSonic Monitor} => {HP Laptop} 0.010777834 0.6022727 3.102856 106
## [8] {Acer Desktop,
## Apple Magic Keyboard} => {HP Laptop} 0.006405694 0.5943396 3.061985 63
## [9] {Dell Desktop,
## iMac,
## ViewSonic Monitor} => {HP Laptop} 0.008744281 0.5931034 3.055617 86
## [10] {ASUS Chromebook,
## Dell Desktop} => {HP Laptop} 0.005795628 0.5816327 2.996520 57
Looking at the monitors, we see good lift, but both support and confidence are at the lower end, and there are only 7 rules. Those rules will not be fulfilled often, so we can leave them out.
#Subsetting rules with a monitor on the right-hand side
#(these rules show which current purchases make a monitor purchase more likely)
ruleMonitors <- subset(ruleGeneral, subset = rhs %in% tempMonitors)
#remove redundant rules
ruleMonitors <- ruleMonitors[!is.redundant(ruleMonitors)]
summary(ruleMonitors)
## set of 7 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3 4
## 1 2 4
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 3.000 4.000 3.429 4.000 4.000
##
## summary of quality measures:
## support confidence lift count
## Min. :0.005287 Min. :0.4013 Min. :3.637 Min. :52.00
## 1st Qu.:0.006202 1st Qu.:0.4112 1st Qu.:3.727 1st Qu.:61.00
## Median :0.006406 Median :0.4124 Median :3.738 Median :63.00
## Mean :0.006667 Mean :0.4295 Mean :3.893 Mean :65.57
## 3rd Qu.:0.007219 3rd Qu.:0.4467 3rd Qu.:4.049 3rd Qu.:71.00
## Max. :0.008134 Max. :0.4771 Max. :4.324 Max. :80.00
##
## mining info:
## data ntransactions support confidence
## df 9835 0.005 0.4
#Sort all 7 rules by support/conf/lift
inspect(sort(ruleMonitors, decreasing = TRUE, by = "support"))
## lhs rhs support confidence lift count
## [1] {ASUS Chromebook,
## HP Laptop} => {ViewSonic Monitor} 0.008134215 0.4102564 3.718776 80
## [2] {Computer Game,
## HP Laptop} => {ViewSonic Monitor} 0.007422471 0.4124294 3.738473 73
## [3] {HP Black & Tri-color Ink} => {ViewSonic Monitor} 0.007015760 0.4312500 3.909073 69
## [4] {Acer Desktop,
## HP Laptop,
## iMac} => {ViewSonic Monitor} 0.006405694 0.4012739 3.637354 63
## [5] {Acer Aspire,
## HP Laptop,
## iMac} => {ViewSonic Monitor} 0.006202339 0.4621212 4.188905 61
## [6] {Dell Desktop,
## HP Laptop,
## Lenovo Desktop Computer} => {ViewSonic Monitor} 0.006202339 0.4121622 3.736051 61
## [7] {Acer Aspire,
## Dell Desktop,
## HP Laptop} => {ViewSonic Monitor} 0.005287239 0.4770642 4.324356 52
inspect(sort(ruleMonitors, decreasing = TRUE, by = "confidence"))
## lhs rhs support confidence lift count
## [1] {Acer Aspire,
## Dell Desktop,
## HP Laptop} => {ViewSonic Monitor} 0.005287239 0.4770642 4.324356 52
## [2] {Acer Aspire,
## HP Laptop,
## iMac} => {ViewSonic Monitor} 0.006202339 0.4621212 4.188905 61
## [3] {HP Black & Tri-color Ink} => {ViewSonic Monitor} 0.007015760 0.4312500 3.909073 69
## [4] {Computer Game,
## HP Laptop} => {ViewSonic Monitor} 0.007422471 0.4124294 3.738473 73
## [5] {Dell Desktop,
## HP Laptop,
## Lenovo Desktop Computer} => {ViewSonic Monitor} 0.006202339 0.4121622 3.736051 61
## [6] {ASUS Chromebook,
## HP Laptop} => {ViewSonic Monitor} 0.008134215 0.4102564 3.718776 80
## [7] {Acer Desktop,
## HP Laptop,
## iMac} => {ViewSonic Monitor} 0.006405694 0.4012739 3.637354 63
inspect(sort(ruleMonitors, decreasing = TRUE, by = "lift"))
## lhs rhs support confidence lift count
## [1] {Acer Aspire,
## Dell Desktop,
## HP Laptop} => {ViewSonic Monitor} 0.005287239 0.4770642 4.324356 52
## [2] {Acer Aspire,
## HP Laptop,
## iMac} => {ViewSonic Monitor} 0.006202339 0.4621212 4.188905 61
## [3] {HP Black & Tri-color Ink} => {ViewSonic Monitor} 0.007015760 0.4312500 3.909073 69
## [4] {Computer Game,
## HP Laptop} => {ViewSonic Monitor} 0.007422471 0.4124294 3.738473 73
## [5] {Dell Desktop,
## HP Laptop,
## Lenovo Desktop Computer} => {ViewSonic Monitor} 0.006202339 0.4121622 3.736051 61
## [6] {ASUS Chromebook,
## HP Laptop} => {ViewSonic Monitor} 0.008134215 0.4102564 3.718776 80
## [7] {Acer Desktop,
## HP Laptop,
## iMac} => {ViewSonic Monitor} 0.006405694 0.4012739 3.637354 63
If we group all products into their product types, we can focus on the items we are interested in analysing: our common product types. The frequency plot below shows the frequency of product categories in the transaction records. Desktop is still at the top of the chart; Computer Mice and Active Headphones follow, but Blackwell doesn’t sell these items, so we ignore them.
#aggregate by cats
dfByType <- aggregate(df, by= df@itemInfo$level1)
#plot items frequency for categories
itemFrequencyPlot(dfByType,
topN=10,
#col=brewer.pal(8,'Pastel2'),
main='Absolute Item Frequency Plot',
type="absolute",
ylab="Item Frequency (Absolute)")
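To illustrate what aggregating by category does to the counts, here is a minimal base-R sketch on three made-up transactions. The item-to-category lookup and the baskets are assumptions for illustration; the real mapping comes from `ProductCategoryList.csv`. Each item is replaced by its category and repeats within a basket are dropped, so a category is counted once per transaction however many of its items were bought.

```r
# Hypothetical item-to-category lookup (assumed for illustration).
category_of <- c(
  "iMac"              = "Desktop",
  "Dell Desktop"      = "Desktop",
  "HP Laptop"         = "Laptops",
  "ViewSonic Monitor" = "Monitors"
)

# Three made-up transactions at item level.
baskets <- list(
  c("iMac", "Dell Desktop", "ViewSonic Monitor"),
  c("HP Laptop", "Dell Desktop"),
  c("iMac")
)

# Replace each item by its category and drop repeats within a basket.
by_category <- lapply(baskets, function(b) unique(unname(category_of[b])))

# Share of transactions containing the item vs. containing its category.
item_support     <- mean(vapply(baskets, function(b) "iMac" %in% b, logical(1)))
category_support <- mean(vapply(by_category, function(b) "Desktop" %in% b, logical(1)))

print(c(iMac = item_support, Desktop = category_support))
```

A category can never appear in fewer transactions than any single item in it, which is why supports at category level tend to be higher than at item level.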
Running the Apriori algorithm again, this time on the categories, we get a sizable number of rules: 11,554 in total, of which 5,284 are covered by the summary below. To narrow these down, we first remove the redundant rules and then zoom in on the rules that have high confidence and lift. To visualise those rules, a scatterplot was created for the rules that predict a desktop purchase.
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.4 0.1 1 none FALSE TRUE 5 0.005 3
## maxlen target ext
## 20 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 49
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[15 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [15 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 done [0.01s].
## writing ... [11554 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
Summary
## set of 5284 rules
##
## rule length distribution (lhs + rhs):sizes
## 3 4 5 6 7 8
## 423 1403 1887 1229 321 21
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 4.00 5.00 4.94 6.00 8.00
##
## summary of quality measures:
## support confidence lift count
## Min. :0.005084 Min. :0.4013 Min. :1.133 Min. : 50.0
## 1st Qu.:0.006812 1st Qu.:0.5755 1st Qu.:1.668 1st Qu.: 67.0
## Median :0.009659 Median :0.6384 Median :1.944 Median : 95.0
## Mean :0.013682 Mean :0.6308 Mean :1.901 Mean : 134.6
## 3rd Qu.:0.015480 3rd Qu.:0.6939 3rd Qu.:2.132 3rd Qu.: 152.2
## Max. :0.106151 Max. :0.8806 Max. :2.937 Max. :1044.0
##
## mining info:
## data ntransactions support confidence
## dfByType 9835 0.005 0.4
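Looking at the rules for categories, we see higher support and confidence than at item level, because many individual products were combined into a few categories. Lift, however, is lower (at most 2.94), since the categories are common enough that they often appear together by chance.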
#Sort by top 15 support/conf/lift to explore
inspect(sort(ruleByBWCats, decreasing = TRUE, by = "support")[1:15])
## lhs rhs support
## [1] {Active Headphones,Computer Mice } => {Desktop} 0.10615150
## [2] {Active Headphones,Computer Mice } => {Accessories} 0.09659380
## [3] {Active Headphones,Computer Mice } => {Monitors} 0.09649212
## [4] {Accessories,Computer Mice } => {Desktop} 0.09639044
## [5] {Computer Mice ,Desktop} => {Accessories} 0.09639044
## [6] {Computer Mice ,Monitors} => {Desktop} 0.09578038
## [7] {Computer Mice ,Desktop} => {Monitors} 0.09578038
## [8] {Active Headphones,Monitors} => {Desktop} 0.09567870
## [9] {Active Headphones,Desktop} => {Monitors} 0.09567870
## [10] {Computer Headphones,Computer Mice } => {Desktop} 0.09425521
## [11] {Computer Mice ,Laptops} => {Desktop} 0.09303508
## [12] {Computer Mice ,Desktop} => {Laptops} 0.09303508
## [13] {Active Headphones,Computer Mice } => {Laptops} 0.09191662
## [14] {Active Headphones,Computer Headphones} => {Desktop} 0.09140824
## [15] {Accessories,Active Headphones} => {Desktop} 0.09130656
## confidence lift count
## [1] 0.5680087 1.153969 1044
## [2] 0.5168662 1.567976 950
## [3] 0.5163221 1.601901 949
## [4] 0.5766423 1.171509 948
## [5] 0.4514286 1.369463 948
## [6] 0.5931990 1.205146 942
## [7] 0.4485714 1.391703 942
## [8] 0.5773006 1.172847 941
## [9] 0.5097508 1.581514 941
## [10] 0.5826524 1.183720 927
## [11] 0.5683230 1.154608 915
## [12] 0.4357143 1.390863 915
## [13] 0.4918390 1.570021 904
## [14] 0.5773924 1.173033 899
## [15] 0.5756410 1.169475 898
inspect(sort(ruleByBWCats, decreasing = TRUE, by = "confidence")[1:15])
## lhs rhs support confidence lift count
## [1] {Accessories,
## Active Headphones,
## Computer Mice ,
## Computer Tablets,
## Laptops,
## Printers} => {Monitors} 0.005998983 0.8805970 2.732073 59
## [2] {Accessories,
## Active Headphones,
## Computer Mice ,
## Computer Tablets,
## Desktop,
## Printers} => {Monitors} 0.006304016 0.8611111 2.671618 62
## [3] {Accessories,
## Active Headphones,
## Computer Tablets,
## Desktop,
## Laptops,
## Printers} => {Monitors} 0.005490595 0.8571429 2.659306 54
## [4] {Accessories,
## Computer Headphones,
## Computer Mice ,
## Computer Tablets,
## Desktop,
## Printers} => {Monitors} 0.005897306 0.8529412 2.646270 58
## [5] {Active Headphones,
## Computer Mice ,
## Printers,
## Smart Home Devices,
## Speakers} => {Monitors} 0.005490595 0.8437500 2.617754 54
## [6] {Active Headphones,
## Computer Tablets,
## Desktop,
## Laptops,
## Monitors,
## Printers} => {Accessories} 0.005490595 0.8437500 2.559618 54
## [7] {Active Headphones,
## Computer Mice ,
## Computer Tablets,
## Laptops,
## Monitors,
## Printers} => {Accessories} 0.005998983 0.8428571 2.556909 59
## [8] {Accessories,
## Active Headphones,
## Computer Tablets,
## Keyboard,
## Monitors} => {Desktop} 0.005083884 0.8333333 1.693004 50
## [9] {Accessories,
## Active Headphones,
## Computer Headphones,
## Computer Mice ,
## Desktop,
## Printers,
## Speakers} => {Monitors} 0.005083884 0.8333333 2.585436 50
## [10] {Accessories,
## Active Headphones,
## Computer Headphones,
## Computer Mice ,
## Computer Tablets,
## Printers} => {Monitors} 0.005998983 0.8309859 2.578153 59
## [11] {Active Headphones,
## Computer Headphones,
## Computer Mice ,
## Laptops,
## Mouse and Keyboard Combo,
## Printers} => {Monitors} 0.005388917 0.8281250 2.569277 53
## [12] {Accessories,
## Computer Tablets,
## Keyboard,
## Monitors} => {Desktop} 0.006304016 0.8266667 1.679460 62
## [13] {Accessories,
## Active Headphones,
## Computer Headphones,
## Computer Mice ,
## Computer Tablets,
## Laptops,
## Monitors} => {Desktop} 0.005693950 0.8235294 1.673087 56
## [14] {Accessories,
## Active Headphones,
## Computer Headphones,
## Computer Tablets,
## Laptops,
## Monitors} => {Desktop} 0.006609049 0.8227848 1.671574 65
## [15] {Accessories,
## Computer Headphones,
## Computer Mice ,
## Computer Tablets,
## Laptops,
## Printers} => {Monitors} 0.005185562 0.8225806 2.552076 51
inspect(sort(ruleByBWCats, decreasing = TRUE, by = "lift")[1:15])
## lhs rhs support confidence lift count
## [1] {Accessories,
## Computer Headphones,
## Computer Mice ,
## Desktop,
## Monitors,
## Speakers} => {Printers} 0.005795628 0.6951220 2.936651 57
## [2] {Accessories,
## Computer Headphones,
## Computer Mice ,
## Laptops,
## Monitors,
## Speakers} => {Printers} 0.005490595 0.6923077 2.924762 54
## [3] {Accessories,
## Computer Headphones,
## Desktop,
## Laptops,
## Monitors,
## Speakers} => {Printers} 0.005083884 0.6849315 2.893600 50
## [4] {Accessories,
## Active Headphones,
## Computer Headphones,
## Laptops,
## Monitors,
## Speakers} => {Printers} 0.005490595 0.6750000 2.851643 54
## [5] {Accessories,
## Computer Mice ,
## Desktop,
## Laptops,
## Monitors,
## Speakers} => {Printers} 0.005388917 0.6708861 2.834263 53
## [6] {Accessories,
## Computer Headphones,
## Laptops,
## Monitors,
## Speakers} => {Printers} 0.006914082 0.6601942 2.789094 68
## [7] {Accessories,
## Active Headphones,
## Computer Headphones,
## Computer Mice ,
## Desktop,
## Mouse and Keyboard Combo} => {Printers} 0.005897306 0.6516854 2.753147 58
## [8] {Accessories,
## Active Headphones,
## Computer Mice ,
## Computer Tablets,
## Laptops,
## Printers} => {Monitors} 0.005998983 0.8805970 2.732073 59
## [9] {Computer Headphones,
## Computer Mice ,
## Desktop,
## Laptops,
## Monitors,
## Speakers} => {Printers} 0.005693950 0.6436782 2.719319 56
## [10] {Accessories,
## Active Headphones,
## Computer Mice ,
## Computer Tablets,
## Laptops,
## Monitors} => {Printers} 0.005998983 0.6413043 2.709290 59
## [11] {Accessories,
## Computer Mice ,
## External Hardrives,
## Laptops,
## Monitors} => {Printers} 0.005083884 0.6410256 2.708113 50
## [12] {Accessories,
## Computer Tablets,
## Speakers} => {Printers} 0.005592272 0.6395349 2.701815 55
## [13] {External Hardrives,
## Laptops,
## Speakers} => {Printers} 0.006100661 0.6382979 2.696589 60
## [14] {External Hardrives,
## Monitors,
## Speakers} => {Printers} 0.006100661 0.6382979 2.696589 60
## [15] {Accessories,
## Computer Headphones,
## External Hardrives,
## Laptops,
## Monitors} => {Printers} 0.005287239 0.6341463 2.679050 52
ruleByCatsDesktop <- subset(ruleByType, subset = rhs %in% c("Desktop") & lift > 1.5)
#remove redundant rules
ruleByCatsDesktop <- ruleByCatsDesktop[!is.redundant(ruleByCatsDesktop)]
#plot
plot(ruleByCatsDesktop, measure=c("support", "confidence"), shading="lift", main="Scatterplot of Desktop Rulesets")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
From the above plot, we can see that most rules have high confidence but weak support. In this case we will prioritise lift over support, and focus on the rules that will generate the most transactions.
ruleByCatsLaptop <- subset(ruleByType, subset = rhs %in% c("Laptops") & lift > 1.5)
#remove redundant rules
ruleByCatsLaptop <- ruleByCatsLaptop[!is.redundant(ruleByCatsLaptop)]
#plot
plot(ruleByCatsLaptop, measure=c("support", "confidence"), shading="lift", main="Scatterplot of Laptop Rulesets")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
From this plot, we can tell that laptops have a higher number of rules and better support, but lower confidence. We will pick a set of rules with the best confidence and lift from here as well, in order to boost Blackwell’s laptop and PC sales in the future.
After lots of exploration and going through the generated data sets, we can conclude that the transactions are mainly made by big companies because most item sets are multiple computers and computer necessities like monitors, keyboards, mice, printers. Here is an example of a rule set that has decent confidence:
{Accessories, Active Headphones, Computer Mice, Computer Tablets, Laptops, Printers} => {Monitors}
With this knowledge, we conclude that Electronidex is a business-to-business (B2B) company. Since Blackwell is a business-to-consumer (B2C) company that mainly serves consumers, the two companies are not compatible in terms of customers, and merging them will be challenging. But since we would gain the business customers Electronidex sells to and the experienced employees working there, this acquisition can prove very profitable and good for the future of our company and its expansion.
Yes, they would, mainly in the 6 product types the two companies have in common (accessories, monitors, printers, laptops, desktops and tablets). The table below sums up the volume sales of Blackwell’s 6 product types in common with Electronidex.
#knitr provides kable(); kableExtra provides kable_styling() and the %>% pipe
library(knitr)
library(kableExtra)
ProductTypes <- c('Accessories','Display','Printer', 'Tablet','Laptop','PC')
VolumeSales <- c('25,216','2,428','2,036','948','516','116')
metrics <- data.frame(ProductTypes, VolumeSales)
kable(metrics) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),fixed_thead = TRUE)
| ProductTypes | VolumeSales |
|---|---|
| Accessories | 25,216 |
| Display | 2,428 |
| Printer | 2,036 |
| Tablet | 948 |
| Laptop | 516 |
| PC | 116 |
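To put these volumes in proportion, the sketch below converts the figures from the table to numbers and computes each product type's share of the listed total. It only uses the values shown above; the variable names `volume` and `share` are ours.

```r
# Volume sales copied from the table above (stored as text with thousands separators).
ProductTypes <- c("Accessories", "Display", "Printer", "Tablet", "Laptop", "PC")
VolumeSales  <- c("25,216", "2,428", "2,036", "948", "516", "116")

# Strip the separators, convert to numbers, and compute percentage shares.
volume <- as.numeric(gsub(",", "", VolumeSales))
share  <- round(100 * volume / sum(volume), 2)

print(data.frame(ProductTypes, volume, share))
```

Laptops and PCs together make up about 2% of the listed volume, which is the weakness the acquisition is meant to address.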
After comparing our current data with that of Electronidex, we deduce that our sales of laptops and PCs are very weak compared to most of our products; acquiring such a company should boost the sales of our less popular products and increase our revenue and profit by a large margin. Because we do not have transactional data for Blackwell, we cannot yet analyse whether Blackwell’s products would benefit sales of Electronidex’s items.
With our current conclusions, acquiring Electronidex would be risky because their main customers are other businesses. However, if we hire experts in this field, and with the aid of the experienced employees already working at Electronidex, this transition can be made easier and be a great step towards the future of Blackwell’s in branching out to other markets.
A deeper analysis needs to be done on both companies’ products. We will need to remove the items that are not selling well (because we will have 125 more products after the acquisition). We can also change the locations of the items in our stores based on the market basket analysis, keeping printers, laptops, computers and computer accessories in close proximity to one another (or showing them as recommended items on the website). Furthermore, transactional data from Blackwell will be needed for analysis, so that we can find additional rules that give us an idea of how Blackwell’s items can benefit Electronidex. Finally, transactional data with the exact volume of each product purchased is essential to estimate how much revenue and profit this acquisition will generate.
Bayes’ theorem is a formula that describes how to update the probabilities of hypotheses when given evidence. It follows simply from the axioms of conditional probability, but can be used to powerfully reason about a wide range of problems involving belief updates.
Given a hypothesis H and evidence E, Bayes’ theorem states that the relationship between the probability of the hypothesis before getting the evidence P(H) and the probability of the hypothesis after getting the evidence P(H|E) is \[P(H|E)=\frac{P(E|H)}{P(E)} \cdot P(H)\]
Many modern machine learning techniques rely on Bayes’ theorem. For instance, spam filters use Bayesian updating to determine whether an email is real or spam, given the words in the email. Additionally, many specific techniques in statistics, such as calculating p-values or interpreting medical results, are best described in terms of how they contribute to updating hypotheses using Bayes’ theorem.
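As a small worked example of the spam-filter case, the sketch below applies the formula to a single word. All three input probabilities are made-up numbers chosen for illustration, and the variable names are ours; a real filter would estimate them from labelled mail and combine many words.

```r
# Hypothetical numbers for illustration only.
p_spam            <- 0.20  # prior P(H): share of mail that is spam
p_word_given_spam <- 0.60  # P(E|H): the word appears in a spam message
p_word_given_ham  <- 0.05  # P(E|not H): the word appears in a legitimate message

# Total probability of seeing the word, P(E).
p_word <- p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(H|E) = P(E|H) / P(E) * P(H)
p_spam_given_word <- p_word_given_spam / p_word * p_spam

print(round(p_spam_given_word, 3))
```

With these inputs, seeing the word raises the probability of spam from the 20% prior to 75%.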