To import the data, we are going to set remove duplicates = TRUE, this means removing the duplicate items in a single transaction - not duplicate transactions which would be valid.
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:arules':
##
## intersect, recode, setdiff, setequal, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringr)
library(ggplot2)
transactionsMatrix<- read.transactions(file("C:\\Users\\domsi\\OneDrive\\Documents\\M3T4\\ElectronidexTransactions2017.csv"), format='basket', sep=',',rm.duplicates = TRUE)
## Warning in readLines(file, encoding = encoding): incomplete final line found on
## 'C:\Users\domsi\OneDrive\Documents\M3T4\ElectronidexTransactions2017.csv'
## distribution of transactions with duplicates:
## items
## 1 2
## 191 10
We can see our database has 9,835 different transactions:
length(transactionsMatrix)
## [1] 9835
The below graph shows us the top 20 most frequent items in our data:
iMac’s are the most frequently bought items, lets see in what percentage of transactions iMacs are in:
itemFrequency(transactionsMatrix)['iMac']
## iMac
## 0.2561261
We can use the Apriori Algorithm to filter out irrelevant association rules and home-in rules with more relevance.
In the below rule, we are setting a support of 0.1%, meaning that all components of individual rules must be in at least 0.1% of all transactions (in this instance 0.1% = 9 counts). A confidence of 0.8 (80%) equates to the minimum probability of finding the rhs item in a transaction given the lhs items are present. Finally, the minimum length of 2 and max length of 4 sets the rules to not be too long to be over complicated but not too small to be useless.
RULE2<-apriori(transactionsMatrix, parameter=list(supp=0.001, conf = 0.8, minlen=2, maxlen=4))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.001 2
## maxlen target ext
## 4 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[125 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [125 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4
## Warning in apriori(transactionsMatrix, parameter = list(supp = 0.001, conf =
## 0.8, : Mining stopped (maxlen reached). Only patterns up to a length of 4
## returned!
## done [0.01s].
## writing ... [333 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
length(RULE2)
## [1] 333
We see there are 333 rules that fit our criteria. We can now do some filtering to find the most useful rules.
Below, we are removing the redundant rules (of which there was one), and confirming that all the rules are statistically significant:
RULE2<- RULE2[!is.redundant(RULE2)]
length(RULE2)
## [1] 332
RULE2<- RULE2[is.significant(RULE2,transactions = transactionsMatrix, method = 'fisher', alpha=0.05,adjust='bonferroni')]
length(RULE2)
## [1] 332
Now we can get a summary of our rules, showing 30 rules are length 3 and 302 are length 4. We have some rules with a lift of 8, and confidence of 1. The support are all quite low in these rules, with the maximum support .5% (or in 52 transactions).
summary(RULE2)
## set of 332 rules
##
## rule length distribution (lhs + rhs):sizes
## 3 4
## 30 302
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 4.00 4.00 3.91 4.00 4.00
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.001017 Min. :0.8000 Min. :0.001017 Min. :3.123
## 1st Qu.:0.001017 1st Qu.:0.8261 1st Qu.:0.001220 1st Qu.:3.254
## Median :0.001220 Median :0.8462 Median :0.001423 Median :3.549
## Mean :0.001306 Mean :0.8569 Mean :0.001535 Mean :3.912
## 3rd Qu.:0.001423 3rd Qu.:0.8750 3rd Qu.:0.001627 3rd Qu.:4.330
## Max. :0.005287 Max. :1.0000 Max. :0.006507 Max. :8.367
## count
## Min. :10.00
## 1st Qu.:10.00
## Median :12.00
## Mean :12.84
## 3rd Qu.:14.00
## Max. :52.00
##
## mining info:
## data ntransactions support confidence
## transactionsMatrix 9835 0.001 0.8
## call
## apriori(data = transactionsMatrix, parameter = list(supp = 0.001, conf = 0.8, minlen = 2, maxlen = 4))
Now we can sort by the highest confidence and look at the top 20 results to see the rules with the highest confidence.
RULE2 <- sort(RULE2,by='confidence', descending=TRUE)
inspect(head(RULE2,20))
## lhs rhs support confidence coverage lift count
## [1] {Brother Printer,
## Halter Acrylic Monitor Stand} => {iMac} 0.001118454 1.0000000 0.001118454 3.904327 11
## [2] {ASUS Monitor,
## Mackie CR Speakers,
## ViewSonic Monitor} => {iMac} 0.001016777 1.0000000 0.001016777 3.904327 10
## [3] {Apple Magic Keyboard,
## Rii LED Gaming Keyboard & Mouse Combo,
## ViewSonic Monitor} => {iMac} 0.001728521 1.0000000 0.001728521 3.904327 17
## [4] {ASUS Monitor,
## Koss Home Headphones,
## Microsoft Office Home and Student 2016} => {iMac} 0.001016777 1.0000000 0.001016777 3.904327 10
## [5] {Acer Aspire,
## Koss Home Headphones,
## ViewSonic Monitor} => {HP Laptop} 0.001220132 1.0000000 0.001220132 5.151912 12
## [6] {Dell Desktop,
## Koss Home Headphones,
## ViewSonic Monitor} => {HP Laptop} 0.001118454 1.0000000 0.001118454 5.151912 11
## [7] {ASUS 2 Monitor,
## Dell Desktop,
## Logitech Keyboard} => {iMac} 0.001016777 1.0000000 0.001016777 3.904327 10
## [8] {Alienware Laptop,
## ASUS Desktop,
## Lenovo Desktop Computer} => {iMac} 0.001016777 1.0000000 0.001016777 3.904327 10
## [9] {Brother Printer,
## Dell Desktop,
## Epson Printer} => {iMac} 0.001118454 1.0000000 0.001118454 3.904327 11
## [10] {Apple Magic Keyboard,
## Brother Printer,
## ViewSonic Monitor} => {iMac} 0.001016777 1.0000000 0.001016777 3.904327 10
## [11] {ASUS Desktop,
## Dell Desktop,
## Microsoft Office Home and Student 2016} => {iMac} 0.001016777 1.0000000 0.001016777 3.904327 10
## [12] {Intel Desktop,
## iPad Pro,
## Microsoft Office Home and Student 2016} => {iMac} 0.001118454 1.0000000 0.001118454 3.904327 11
## [13] {Acer Aspire,
## ASUS 2 Monitor,
## Intel Desktop} => {HP Laptop} 0.001016777 1.0000000 0.001016777 5.151912 10
## [14] {ASUS Monitor,
## Intel Desktop,
## ViewSonic Monitor} => {Lenovo Desktop Computer} 0.001118454 1.0000000 0.001118454 6.754808 11
## [15] {ASUS 2 Monitor,
## Computer Game,
## Dell Desktop} => {HP Laptop} 0.001016777 1.0000000 0.001016777 5.151912 10
## [16] {Dell KM117 Wireless Keyboard & Mouse,
## iPhone Charger Cable} => {Apple MacBook Air} 0.002033554 0.9523810 0.002135231 6.122004 20
## [17] {ASUS Desktop,
## Dell Desktop,
## iPad Pro} => {iMac} 0.001525165 0.9375000 0.001626843 3.660307 15
## [18] {3-Button Mouse,
## Acer Desktop,
## ASUS Monitor} => {HP Laptop} 0.001525165 0.9375000 0.001626843 4.829917 15
## [19] {Dell Desktop,
## Mackie CR Speakers,
## ViewSonic Monitor} => {HP Laptop} 0.001423488 0.9333333 0.001525165 4.808451 14
## [20] {Dell Desktop,
## Mackie CR Speakers,
## ViewSonic Monitor} => {iMac} 0.001423488 0.9333333 0.001525165 3.644039 14
Firstly, we see that the majority of the rules have a rhs of iMac. iMac’s are the most frequently bought item (in 25% of all transactions) so there will be more rules with iMac in, however, to achieve such a high confidence rate shows a clear trend of purchasing habits.
The top 15 rules all have a confidence of 1, and every rhs is either a desktop or laptop computer. Intuitively this makes sense, with a desktop computer you need a monitor, mouse, keyboard and accessories which we can see in the top 4 rules. After that, we start to see some interesting transactions like desktops and laptops being bought in the same transaction, or multiple desktops being bought in one transaction (Rule 8 for example).
We can look into this more by categorizing the items.
We can also explore the data when we categorize each item into their relevant label, and see if this provides more insight.
Laptops<- c('LG Touchscreen Laptop',
'Acer Aspire',
'HP Laptop',
'ASUS Chromebook',
'Apple Macbook Pro',
'Apple MacBook Air',
'Dell Laptop',
'Eluktronics Pro Gaming Laptop',
'Alienware AW17R4-7345SLV-PUS 17" Laptop',
'HP Notebook Touchscreen Laptop PC')
Desktop<-c('Lenovo Desktop Computer',
'iMac',
'HP Desktop',
'ASUS Desktop',
'Dell Desktop',
'Intel Desktop',
'Acer Desktop',
'CYBERPOWER Gamer Desktop',
'Dell 2 Desktop')
Monitors<-c('Acer Monitor',
'LG Monitor',
'ASUS Monitor',
'ASUS 2 Monitor',
'Dell Monitor',
'Samsung Monitor',
'Sceptre Monitor',
'ViewSonic Monitor',
'AOC Monitor',
'HP Monitor')
ComputerMice<-c('3-Button Mouse',
'Logitech Wireless Mouse',
'Microsoft Basic Optical Mouse',
'Logitech 3-button Mouse',
'Redragon Gaming Mouse',
'HP Wireless Mouse',
'Generic Black 3-Button',
'Wireless Portable Mouse',
'Gaming Mouse Professional',
'Slim Wireless Mouse')
Keyboards<- c('HP USB Keyboard',
'Logitech Wireless Keyboard',
'Rii LED Keyboard',
'Logitech Keyboard',
'Backlit LED Gaming Keyboard',
'Dell Wired Keyboard',
'Apple Wired Keyboard',
'Apple Wireless Keyboard',
'Apple Magic Keyboard')
MouseAndKeyboard<- c('Logitech MK550 Wireless Wave Keyboard and Mouse Combo',
'Logitech Desktop MK120 Mouse and keyboard Combo',
'Logitech MK270 Wireless Keyboard and Mouse Combo',
'Dell KM117 Wireless Keyboard & Mouse',
'EagleTec Wireless Combo Keyboard and Mouse',
'Microsoft Wireless Comfort Keyboard and Mouse',
'Microsoft Wireless Desktop Keyboard and Mouse',
'Rii LED Gaming Keyboard & Mouse Combo',
'Logitech MK360 Wireless Keyboard and Mouse Combo')
ComputerHeadphones<-c('Zombie Gaming Headset',
'Logitech ClearChat Headset',
'Panasonic On-Ear Stereo Headphones RP-HT21',
'PC Gaming Headset',
'Kensington Headphones',
'Logitech Stereo Headset',
'Koss Home Headphones',
'Microsoft Headset',
'Ailihen Stereo Headphones',
'XIBERIA Gaming Headset')
ActiveHeadphones<-c('Apple Earpods',
'Monster Beats By Dr Dre',
'Otium Wireless Sports Bluetooth Headphones',
'Panasonic In-Ear Headphone',
'APIE Bluetooth Headphones',
'Philips Flexible Earhook Headphones')
ComputerCords<-c('HDMI Cable 6ft',
'Ethernet Cable',
'Etekcity Power Extension Cord Cable',
'Audio Cable',
'VGA Monitor Cable',
'iPhone Charger Cable',
'HDMI Adapter',
'USB Cable',
'Samsung Charging Cable')
Accessories<- c('Microsoft Office Home and Student 2016',
'Computer Game',
'Belkin Mouse Pad',
'Large Mouse Pad')
Speakers<- c('Cambridge Bluetooth Speaker',
'JBL Splashproof Portable Bluetooth Speaker',
'DOSS Touch Wireless Bluetooth',
'Logitech Multimedia Speakers',
'Rokono Mini Speaker',
'Cyber Acoustics',
'Bose Companion Speakers',
'Mackie CR Speakers',
'Sonos')
Printers<-c('Epson Printer',
'HP Wireless Printer',
'Canon Office Printer',
'Brother Printer',
'DYMO Label Manker' )
PrinterInk<-c('Epson Black Ink',
'HP Black & Tri-color Ink',
'Canon Ink',
'Brother Printer Toner',
'DYMO Labeling Tape')
ComputerStand<- c('Halter Acrylic Monitor Stand',
'Height-Adjustable Standing Desk',
'Multi Media Stand',
'Halter Mesh Metal Monitor Stand',
'Full Motion Monitor Mount')
Tablets<- c('iPad',
'iPad Pro',
'Fire HD Tablet',
'Samsung Galaxy Tab',
'Kindle')
ExHarddrive<- c('1TB Portable External Hard Drive',
'2TB Portable External Hard Drive',
'5TB Desktop Hard Drive',
'Slim 2TB Portable External Hard Drive',
'3TB Portable External Hard Drive')
SmartDevices<-c('Apple TV',
'Google Home',
'Smart Light Bulb',
'Fire TV Stick',
'Roku Express')
Now we convert the items into categories and create a new dataframe, trans, which we can apply the same algorithm to explore.
path = "C:\\Users\\domsi\\OneDrive\\Documents\\M3T4\\ElectronidexTransactions2017.csv"
path2 = "C:\\Users\\domsi\\OneDrive\\Documents\\M3T4\\"
df <- read.csv(path, header=FALSE, stringsAsFactors=FALSE)
df <- df %>% mutate(across(where(is.character), str_trim))
dnew <- df %>%
mutate_all(
funs(case_when(
. %in% Accessories ~ "Accessories",
. %in% ActiveHeadphones ~ "Active Headphones",
. %in% ComputerCords ~ "Computer Chords",
. %in% ComputerHeadphones ~ "Computer Headphones",
. %in% ComputerMice ~ "Computer Mice",
. %in% ComputerStand ~ "Computer Stand",
. %in% Desktop ~ "Desktop",
. %in% ExHarddrive ~ "External Harddrive",
. %in% Keyboards ~ "Keyboards",
. %in% Laptops ~ "Laptops",
. %in% Monitors ~ "Monitors",
. %in% MouseAndKeyboard ~ "Mouse and Keyboard",
. %in% PrinterInk ~ "Printer Ink",
. %in% Printers ~ "Printers",
. %in% SmartDevices ~ "Smart Devices",
. %in% Speakers ~ "Speakers",
. %in% Tablets ~ "Tablets"
)))
## Warning: `funs()` was deprecated in dplyr 0.8.0.
## ℹ Please use a list of either functions or lambdas:
##
## # Simple named list: list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`: tibble::lst(mean, median)
##
## # Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
write.csv(dnew, paste0(path2,"new.csv"), row.names=FALSE)
trans <- read.transactions( paste0(path2,"new.csv"), sep = ',',header=TRUE)
## Warning in asMethod(object): removing duplicated items in transactions
We can see Desktops and Laptops are the standout most frequent items, with Monitors and Computer Mice behind.
We use the Apriori algorithm again to create statistically significant rules:
RULEtrans<-apriori(trans, parameter=list(supp=100/length(trans), conf = 0.8, maxlen=4))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.01016777 1
## maxlen target ext
## 4 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 99
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[17 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [17 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4
## Warning in apriori(trans, parameter = list(supp = 100/length(trans), conf =
## 0.8, : Mining stopped (maxlen reached). Only patterns up to a length of 4
## returned!
## done [0.00s].
## writing ... [194 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
length(RULEtrans)
## [1] 194
RULEtrans<- RULEtrans[!is.redundant(RULEtrans)]
length(RULEtrans)
## [1] 189
RULEtrans<- RULEtrans[is.significant(RULEtrans,transactions = trans, method = 'fisher', alpha=0.05,adjust='bonferroni')]
RULEtrans<- sort(RULEtrans,by='confidence')
summary(RULEtrans)
## set of 189 rules
##
## rule length distribution (lhs + rhs):sizes
## 3 4
## 36 153
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 4.00 4.00 3.81 4.00 4.00
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.01027 Min. :0.8000 Min. :0.01139 Min. :1.379
## 1st Qu.:0.01230 1st Qu.:0.8213 1st Qu.:0.01484 1st Qu.:1.419
## Median :0.01668 Median :0.8358 Median :0.02003 Median :1.445
## Mean :0.02048 Mean :0.8394 Mean :0.02453 Mean :1.466
## 3rd Qu.:0.02410 3rd Qu.:0.8551 3rd Qu.:0.02857 3rd Qu.:1.479
## Max. :0.10473 Max. :0.9286 Max. :0.13005 Max. :2.072
## count
## Min. : 101.0
## 1st Qu.: 121.0
## Median : 164.0
## Mean : 201.4
## 3rd Qu.: 237.0
## Max. :1030.0
##
## mining info:
## data ntransactions support confidence
## trans 9835 0.01016777 0.8
## call
## apriori(data = trans, parameter = list(supp = 100/length(trans), conf = 0.8, maxlen = 4))
inspect(head(RULEtrans,20))
## lhs rhs support confidence coverage lift count
## [1] {Computer Mice,
## Keyboards,
## Printer Ink} => {Desktop} 0.01057448 0.9285714 0.01138790 1.601069 104
## [2] {External Harddrive,
## Laptops,
## Monitors} => {Desktop} 0.01098119 0.9075630 0.01209964 1.564846 108
## [3] {Laptops,
## Printers,
## Tablets} => {Desktop} 0.01291307 0.9071429 0.01423488 1.564122 127
## [4] {Computer Headphones,
## Monitors,
## Printers} => {Desktop} 0.01331978 0.8972603 0.01484494 1.547082 131
## [5] {Accessories,
## Computer Headphones,
## Computer Mice} => {Desktop} 0.01026945 0.8938053 0.01148958 1.541125 101
## [6] {Monitors,
## Printers,
## Tablets} => {Desktop} 0.01382816 0.8888889 0.01555669 1.532648 136
## [7] {Keyboards,
## Monitors,
## Printer Ink} => {Desktop} 0.01535333 0.8882353 0.01728521 1.531521 151
## [8] {Computer Mice,
## Monitors,
## Smart Devices} => {Desktop} 0.01209964 0.8880597 0.01362481 1.531218 119
## [9] {Computer Chords,
## Computer Headphones,
## Monitors} => {Desktop} 0.01037112 0.8869565 0.01169293 1.529316 102
## [10] {Monitors,
## Speakers,
## Tablets} => {Desktop} 0.01189629 0.8863636 0.01342145 1.528294 117
## [11] {Computer Mice,
## Keyboards,
## Tablets} => {Desktop} 0.01555669 0.8843931 0.01759024 1.524896 153
## [12] {Accessories,
## Computer Headphones,
## Monitors} => {Desktop} 0.01504830 0.8809524 0.01708185 1.518963 148
## [13] {Computer Mice,
## Monitors,
## Tablets} => {Desktop} 0.02226741 0.8795181 0.02531774 1.516490 219
## [14] {Accessories,
## Laptops,
## Smart Devices} => {Desktop} 0.01037112 0.8793103 0.01179461 1.516132 102
## [15] {Accessories,
## Monitors,
## Printer Ink} => {Desktop} 0.01026945 0.8782609 0.01169293 1.514323 101
## [16] {Computer Chords,
## Computer Mice,
## Monitors} => {Desktop} 0.02247077 0.8769841 0.02562278 1.512121 221
## [17] {Computer Mice,
## Mouse and Keyboard,
## Tablets} => {Desktop} 0.01220132 0.8759124 0.01392984 1.510273 120
## [18] {Keyboards,
## Laptops,
## Printer Ink} => {Desktop} 0.01504830 0.8757396 0.01718353 1.509975 148
## [19] {Accessories,
## Computer Mice,
## Tablets} => {Desktop} 0.01067616 0.8750000 0.01220132 1.508700 105
## [20] {Monitors,
## Mouse and Keyboard,
## Tablets} => {Desktop} 0.01789527 0.8712871 0.02053889 1.502298 176
We can see that Desktop computers have a high confidence of being purchased across a variety of different rules.
Rule 1 for example, implies that given a customer has Computer Mice, Keyboard and Printer Ink in their basket, there is a 93% probability they will also buy a Desktop computer. It is important to remember that this rule is only in one direction, so (using Rule 1 again) if a customer has a Desktop in their basket the probability of there being Computer Mice, Keyboards and Printer Ink is not 93%.
Notice there are some interesting transactions where Laptops, Monitors, Desktops and Printers are being bought multiple times in the same transactions. This would indicate Businesses are also making (large) transactions with Electronidex. This would be a boost to Blackwell, adding B2B sales to their existing B2C sales.
In the below we will look into rules we can find after splitting transactions into B2B or B2C.
dnew2<- dnew
#Create columns with the summed quantity of products
dnew2$nLaptops <- rowSums(dnew2 == "Laptops", na.rm=TRUE)
dnew2$nDesktop <- rowSums(dnew2 == "Desktop", na.rm=TRUE)
dnew2$nMonitors <- rowSums(dnew2 == "Monitors", na.rm=TRUE)
dnew2$nComputer_Mice <- rowSums(dnew2 == "Computer Mice", na.rm=TRUE)
dnew2$nKeyboard <- rowSums(dnew2 == "Keyboards", na.rm=TRUE)
dnew2$nMouse_and_Keyboard_Combo <- rowSums(dnew2 == "Mouse and Keyboard", na.rm=TRUE)
dnew2$nComputer_Headphones <- rowSums(dnew2 == "Computer Headphones", na.rm=TRUE)
dnew2$nActive_Headphones <- rowSums(dnew2 == "Active Headphones", na.rm=TRUE)
dnew2$nComputer_Cords <- rowSums(dnew2 == "Computer Cords", na.rm=TRUE)
dnew2$nAccessories <- rowSums(dnew2 == "Accessories", na.rm=TRUE)
dnew2$nSpeakers <- rowSums(dnew2 == "Speakers", na.rm=TRUE)
dnew2$nPrinters <- rowSums(dnew2 == "Printers", na.rm=TRUE)
dnew2$nPrinter_Ink <- rowSums(dnew2 == "Printer Ink", na.rm=TRUE)
dnew2$nComputer_Stands <- rowSums(dnew2 == "Computer Stand", na.rm=TRUE)
dnew2$nComputer_Tablets <- rowSums(dnew2 == "Tablets", na.rm=TRUE)
dnew2$nExternal_Hardrives <- rowSums(dnew2 == "External Hardrives", na.rm=TRUE)
dnew2$nSmart_Home_Devices <- rowSums(dnew2 == "Smart Devices", na.rm=TRUE)
#Create a B2B column to populate with 1 if categorised as B2B, or 0 as B2C
dnew2<-cbind(dnew2,B2B=0)
t<-2 #Threshold
#B2B conditions
dnew2 %>%
mutate(B2B = case_when(nLaptops>t | nDesktop>t | nMonitors>t | nComputer_Mice>t | nMouse_and_Keyboard_Combo>t
|nComputer_Stands>t ~ 1,
TRUE ~ 0)) -> dnew2
#B2B dataframe into transaction matrix
B2Bdf<- subset(dnew2, B2B == 1)
B2Bdf<- B2Bdf[,1:32]
write.csv(B2Bdf, paste0(path2,"B2Bdf.csv"), row.names=FALSE)
B2Btrans <- read.transactions( paste0(path2,"B2Bdf.csv"), sep = ',',header=TRUE, rm.duplicates = TRUE)
## distribution of transactions with duplicates:
## items
## 18 19 20 21 22 23 24 25 26 27 28 29 30
## 5 11 32 53 75 111 168 210 258 224 152 69 26
#B2C transactions into matrix
B2Cdf<- subset(dnew2,B2B==0)
B2Cdf<- B2Cdf[,1:32]
write.csv(B2Cdf, paste0(path2,"B2Cdf.csv"), row.names=FALSE)
B2Ctrans <- read.transactions( paste0(path2,"B2Cdf.csv"), sep = ',',header=TRUE, rm.duplicates = TRUE)
## distribution of transactions with duplicates:
## items
## 20 21 22 23 24 25 26 27 28 29 30 31
## 16 20 54 115 267 457 675 1042 1424 1870 2394 99
With the above assumption for a B2B transaction, 1522 transactions are B2B representing 15% (9835 transactions in total) of Electronidex sales over this time period. This would prove a huge boost for Blackwell given they currently have no B2B sales.
length(B2Btrans)
## [1] 1398
We can look at the item frequency for B2B transactions, with Desktops and Laptops the two most frequent shortly followed by Monitors.
For B2C transactions, Desktops and Laptops again are the most frequent but by some margin.
We can see in the below B2B rules that we have a higher number of rules with higher confidence in finding Monitors, and Laptops as well as the expected Desktops. This would be a useful basis to boost initial sales for Blackwell should they purchase Electronidex, as we can offer package deals based on the below rules to business customers.
For example, we can offer discounts on packages over a certain amount. If we give 20% the set of {Accessories, Computer Cords, Speakers} after two sets have already been purchased we may increase sales of Monitors (Rule 1). We can then look at Rule 6 and offer discounts on Keyboards which, when paired with the Monitors, could increase sales in Desktop computers.
#B2B Rules
RULEB2B<-apriori(B2Btrans, parameter=list(supp=0.01, conf = 0.8, minlen=2, maxlen=4))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 4 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 13
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[17 item(s), 1398 transaction(s)] done [0.00s].
## sorting and recoding items ... [17 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4
## Warning in apriori(B2Btrans, parameter = list(supp = 0.01, conf = 0.8, minlen =
## 2, : Mining stopped (maxlen reached). Only patterns up to a length of 4
## returned!
## done [0.00s].
## writing ... [1597 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
RULEB2B<- RULEB2B[!is.redundant(RULEB2B)]
RULEB2B<- sort(RULEB2B,by='confidence', descending=TRUE)
RULEB2B<- RULEB2B[is.significant(RULEB2B,transactions = B2Btrans, method = 'fisher', alpha=0.05,adjust='bonferroni')]
inspect(head(RULEB2B,10))
## lhs rhs support confidence coverage lift count
## [1] {Computer Stand,
## Keyboards,
## Printers} => {Laptops} 0.02861230 1.0000000 0.02861230 1.414980 40
## [2] {Computer Headphones,
## Computer Mice,
## Printer Ink} => {Laptops} 0.02002861 1.0000000 0.02002861 1.414980 28
## [3] {Keyboards,
## Printer Ink,
## Speakers} => {Laptops} 0.02002861 1.0000000 0.02002861 1.414980 28
## [4] {Computer Mice,
## Printer Ink,
## Printers} => {Laptops} 0.02432046 1.0000000 0.02432046 1.414980 34
## [5] {Monitors,
## Printer Ink,
## Speakers} => {Laptops} 0.03004292 0.9767442 0.03075823 1.382073 42
## [6] {Computer Chords,
## Keyboards,
## Speakers} => {Monitors} 0.02932761 0.9761905 0.03004292 1.322398 41
## [7] {Accessories,
## Computer Headphones,
## Computer Mice} => {Laptops} 0.04291845 0.9677419 0.04434907 1.369335 60
## [8] {Computer Mice,
## Monitors,
## Printer Ink} => {Laptops} 0.05937053 0.9651163 0.06151645 1.365620 83
## [9] {Printer Ink,
## Speakers} => {Laptops} 0.03290415 0.9583333 0.03433476 1.356022 46
## [10] {Active Headphones,
## Computer Chords,
## Computer Mice} => {Laptops} 0.04148784 0.9508197 0.04363376 1.345391 58
The B2C rules are not dissimilar to that of the entire transaction dataset. From a Blackwell perspective, these rules provide a very good basis we can use for the B2C sales regardless of whether the acquisition takes place.
For example, setting the bundle of {External Hardrives, Monitors and Sales Devices} at a discount of x% to increase the sales of (more expensive) Desktop items (using Rule 1).
####B2C Rules####
RULEB2C<-apriori(B2Ctrans, parameter=list(supp=0.001, conf = 0.8, minlen=2, maxlen=4))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.001 2
## maxlen target ext
## 4 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 8
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[17 item(s), 8437 transaction(s)] done [0.00s].
## sorting and recoding items ... [17 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4
## Warning in apriori(B2Ctrans, parameter = list(supp = 0.001, conf = 0.8, :
## Mining stopped (maxlen reached). Only patterns up to a length of 4 returned!
## done [0.00s].
## writing ... [166 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
RULEB2C<- RULEB2C[!is.redundant(RULEB2C)]
RULEB2C<- RULEB2C[is.significant(RULEB2C,transactions = B2Ctrans, method = 'fisher', alpha=0.05, adjust='bonferroni')]
RULEB2C<- sort(RULEB2C,by='confidence', descending=TRUE)
inspect(head(RULEB2C,10))
## lhs rhs support confidence coverage lift count
## [1] {External Harddrive,
## Monitors,
## Smart Devices} => {Desktop} 0.001659358 1.0000000 0.001659358 1.910985 14
## [2] {Accessories,
## Computer Stand,
## Smart Devices} => {Desktop} 0.001777883 1.0000000 0.001777883 1.910985 15
## [3] {Active Headphones,
## Computer Headphones,
## Smart Devices} => {Desktop} 0.001777883 1.0000000 0.001777883 1.910985 15
## [4] {Accessories,
## Active Headphones,
## Computer Stand} => {Desktop} 0.003081664 0.9629630 0.003200190 1.840208 26
## [5] {Active Headphones,
## Mouse and Keyboard,
## Printer Ink} => {Desktop} 0.003081664 0.9629630 0.003200190 1.840208 26
## [6] {Computer Mice,
## Computer Stand,
## Printer Ink} => {Monitors} 0.001777883 0.9375000 0.001896409 2.790998 15
## [7] {Accessories,
## Computer Mice,
## Computer Stand} => {Desktop} 0.004029868 0.9189189 0.004385445 1.756041 34
## [8] {Accessories,
## External Harddrive,
## Mouse and Keyboard} => {Monitors} 0.001303781 0.9166667 0.001422307 2.728976 11
## [9] {Active Headphones,
## Computer Stand,
## Printer Ink} => {Monitors} 0.001185255 0.9090909 0.001303781 2.706422 10
## [10] {Computer Headphones,
## Smart Devices,
## Tablets} => {Monitors} 0.001185255 0.9090909 0.001303781 2.706422 10
If we look at the below item frequency plot, two of the top three most frequent products are made by Apple. This is something to investigate as Blackwell currently do not sell any Apple products.
The below groups all Apple products into one category ‘Apple’, and creates separate datasets for all transactions with at least one Apple purchase.
Apple<- c('Apple TV',
'iPad',
'iPad Pro',
'iPhone Charger Cable',
'Apple Earpods',
'Apple Wired Keyboard',
'Apple Wireless Keyboard',
'Apple Magic Keyboard',
'iMac',
'Apple MacBook Pro',
'Apple MacBook Air')
dnewApple <- df %>%
mutate_all(
funs(case_when(
. %in% Apple ~ "Apple",
TRUE~.
)))
## Warning: `funs()` was deprecated in dplyr 0.8.0.
## ℹ Please use a list of either functions or lambdas:
##
## # Simple named list: list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`: tibble::lst(mean, median)
##
## # Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
dnewApple$nApple<- rowSums(dnewApple == 'Apple', na.rm=TRUE)
dnewApple %>%
mutate(nApple = case_when(nApple>=1 ~ 1,
TRUE ~ 0)) -> dnewApple
Non_AppleDF<- subset(dnewApple, nApple == 0)
Non_AppleDF<-Non_AppleDF[,1:32]
write.csv(Non_AppleDF, paste0(path2,"Non_AppleDF"), row.names=FALSE)
NonAppletrans <- read.transactions( paste0(path2,"Non_AppleDF"), sep = ',',header=TRUE, rm.duplicates = TRUE)
## distribution of transactions with duplicates:
## 1
## 26
AppleDF<-subset(dnewApple, nApple==1)
AppleDF<-AppleDF[,1:32]
write.csv(AppleDF, paste0(path2,"AppleDF"), row.names=FALSE)
Appletrans <- read.transactions( paste0(path2,"AppleDF"), sep = ',',header=TRUE, rm.duplicates = TRUE)
## distribution of transactions with duplicates:
## items
## 1 2 3 4 5 6
## 1560 534 175 49 10 2
There are 6,267 transactions with at least one Apple purchase, which is 64% of the total transactions.
length(Appletrans)
## [1] 6267
We can take a look for any potential insights in the two separate datasets.
The below rules again give high confidence rates which we can use to create discount bundle deals in order to sell more of the rhs, which in most of these examples are the more expensive items.
RULEApple<-apriori(Appletrans, parameter=list(supp=0.001, conf = 0.8, minlen=2, maxlen=4))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.001 2
## maxlen target ext
## 4 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 6
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[115 item(s), 6267 transaction(s)] done [0.00s].
## sorting and recoding items ... [115 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4
## Warning in apriori(Appletrans, parameter = list(supp = 0.001, conf = 0.8, :
## Mining stopped (maxlen reached). Only patterns up to a length of 4 returned!
## done [0.01s].
## writing ... [8264 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
RULEApple<- RULEApple[!is.redundant(RULEApple)]
RULEApple<- sort(RULEApple,by='confidence', descending=TRUE)
RULEApple<- RULEApple[is.significant(RULEApple,transactions = Appletrans, method = 'fisher', alpha=0.05,adjust='bonferroni')]
inspect(head(RULEApple,10))
## lhs rhs support confidence coverage lift count
## [1] {Logitech Desktop MK120 Mouse and keyboard Combo,
## Logitech MK270 Wireless Keyboard and Mouse Combo} => {HP Laptop} 0.001116962 1 0.001116962 4.747727 7
## [2] {3-Button Mouse,
## Mackie CR Speakers} => {HP Laptop} 0.001116962 1 0.001116962 4.747727 7
## [3] {Brother Printer Toner,
## Koss Home Headphones} => {Dell Desktop} 0.001116962 1 0.001116962 6.541754 7
## [4] {Brother Printer Toner,
## Panasonic In-Ear Headphone} => {HP Laptop} 0.001276528 1 0.001276528 4.747727 8
## [5] {Epson Black Ink,
## Panasonic In-Ear Headphone} => {Lenovo Desktop Computer} 0.001116962 1 0.001116962 6.025962 7
## [6] {Dell Desktop,
## Google Home,
## ViewSonic Monitor} => {HP Laptop} 0.001276528 1 0.001276528 4.747727 8
## [7] {HP Monitor,
## Lenovo Desktop Computer,
## Mackie CR Speakers} => {HP Laptop} 0.001116962 1 0.001116962 4.747727 7
## [8] {3-Button Mouse,
## Acer Desktop,
## Sonos} => {Dell Desktop} 0.001116962 1 0.001116962 6.541754 7
## [9] {ASUS Chromebook,
## ASUS Monitor,
## HP Black & Tri-color Ink} => {ViewSonic Monitor} 0.001116962 1 0.001116962 7.756188 7
## [10] {HP Black & Tri-color Ink,
## HP Monitor,
## Lenovo Desktop Computer} => {HP Laptop} 0.001116962 1 0.001116962 4.747727 7
The non-Apple transactions is a lot smaller, meaning the the support level of .1% is giving a small count rate. However, the lift numbers are very high. As such, if Blackwell decided not to acquire Electronidex, we could use the below rules to help sell existing products.
RULENonApple<-apriori(NonAppletrans, parameter=list(supp=0.001, conf = 0.8, minlen=2, maxlen=4))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.001 2
## maxlen target ext
## 4 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 3
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[114 item(s), 3568 transaction(s)] done [0.00s].
## sorting and recoding items ... [113 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4
## Warning in apriori(NonAppletrans, parameter = list(supp = 0.001, conf = 0.8, :
## Mining stopped (maxlen reached). Only patterns up to a length of 4 returned!
## done [0.00s].
## writing ... [94 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
RULENonApple<- RULENonApple[!is.redundant(RULENonApple)]
RULENonApple<- sort(RULENonApple,by='confidence', descending=TRUE)
RULENonApple<- RULENonApple[is.significant(RULENonApple,transactions = NonAppletrans, method = 'fisher', alpha=0.05,adjust='bonferroni')]
inspect(head(RULENonApple,10))
## lhs rhs support confidence coverage lift count
## [1] {Dell Desktop,
## HP Black & Tri-color Ink} => {Microsoft Wireless Desktop Keyboard and Mouse} 0.001121076 1 0.001121076 12.432056 4
## [2] {HP Monitor,
## Smart Light Bulb} => {Dell Desktop} 0.001401345 1 0.001401345 9.911111 5
## [3] {Alienware Laptop,
## Epson Printer} => {Acer Desktop} 0.001121076 1 0.001121076 11.698361 4
## [4] {Eluktronics Pro Gaming Laptop,
## HP Wireless Mouse} => {HP Laptop} 0.001961883 1 0.001961883 6.057725 7
## [5] {Epson Printer,
## Intel Desktop} => {Bose Companion Speakers} 0.001121076 1 0.001121076 32.733945 4
## [6] {Alienware Laptop,
## ASUS 2 Monitor,
## HP Laptop} => {Acer Desktop} 0.001121076 1 0.001121076 11.698361 4
## [7] {Acer Desktop,
## Alienware Laptop,
## Belkin Mouse Pad} => {ViewSonic Monitor} 0.001121076 1 0.001121076 12.880866 4
## [8] {Alienware Laptop,
## Belkin Mouse Pad,
## ViewSonic Monitor} => {Acer Desktop} 0.001121076 1 0.001121076 11.698361 4
## [9] {Slim Wireless Mouse,
## ViewSonic Monitor,
## Wireless Portable Mouse} => {HP Laptop} 0.001401345 1 0.001401345 6.057725 5
## [10] {3-Button Mouse,
## Acer Aspire,
## Wireless Portable Mouse} => {Acer Desktop} 0.001121076 1 0.001121076 11.698361 4
After assessing Electronidex’s transactions over a month, we have gained some useful insights. Lets remind ourselves of Blackwells current sales:
## Warning: The dot-dot notation (`..prop..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(prop)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Most sales are in Accessories and Printers, with a similar level of sales between Tablets, PC’s and Laptops. While this data is not complete it gives us a picture as to what the current sales are coming from.
Below we see the sales of one month at Electronidex, with 58% of transactions containing Desktop’s, and 53% containing Laptops. This is a much higher frequency than the existing sales at Blackwell.
With the higher sales of Desktops acquiring Electronidex could provide, we can utilize the high confidence rates the below 10 rules their transaction data provides to increase sales in Desktops, by providing discounted bundle deals on the lhs item sets. This could be best encouraged with Blackwell existing high sales in Accessories to push more sales in Desktop computers:
inspect(head(RULEtrans,10))
## lhs rhs support confidence coverage lift count
## [1] {Computer Mice,
## Keyboards,
## Printer Ink} => {Desktop} 0.01057448 0.9285714 0.01138790 1.601069 104
## [2] {External Harddrive,
## Laptops,
## Monitors} => {Desktop} 0.01098119 0.9075630 0.01209964 1.564846 108
## [3] {Laptops,
## Printers,
## Tablets} => {Desktop} 0.01291307 0.9071429 0.01423488 1.564122 127
## [4] {Computer Headphones,
## Monitors,
## Printers} => {Desktop} 0.01331978 0.8972603 0.01484494 1.547082 131
## [5] {Accessories,
## Computer Headphones,
## Computer Mice} => {Desktop} 0.01026945 0.8938053 0.01148958 1.541125 101
## [6] {Monitors,
## Printers,
## Tablets} => {Desktop} 0.01382816 0.8888889 0.01555669 1.532648 136
## [7] {Keyboards,
## Monitors,
## Printer Ink} => {Desktop} 0.01535333 0.8882353 0.01728521 1.531521 151
## [8] {Computer Mice,
## Monitors,
## Smart Devices} => {Desktop} 0.01209964 0.8880597 0.01362481 1.531218 119
## [9] {Computer Chords,
## Computer Headphones,
## Monitors} => {Desktop} 0.01037112 0.8869565 0.01169293 1.529316 102
## [10] {Monitors,
## Speakers,
## Tablets} => {Desktop} 0.01189629 0.8863636 0.01342145 1.528294 117
We also notice Electronidex offer a new consumer market to increase revenues, with some transactions being Business to Business as well as, like Blackwell’s existing market, Business to Consumer. This opens up a brand new revenue stream for Blackwell should they acquire Electronidex. As previously mentioned, we calculate the B2B to be 15% of all transactions which, when combined with the fact B2B transactions tend to be higher volume, could create large profits.
Another new revenue stream is the fact that Blackwell currently do not stock any Apple products. A shown below, 4 of the top 10 most frequently purchased items at Electronidex are made by Apple. We also found out that 64% of all transactions have at least one Apple product in them.
itemFrequencyPlot(transactionsMatrix, topN=10, type ='absolute', horiz=TRUE, cex.names=0.6)
100 * (length(Appletrans)/length(trans))
## [1] 63.7214
It is for the above reasons that I believe there are huge potential upsides for Blackwell acquiring Electronidex. Not only does it provide multiple new revenue streams with different products, but also a new potential customer base. Should the acquisition take place, we have lots of rules we can implement with a focus of increasing sales in Desktop computers by creating discounts in some of the lhs sets in the above rules.