The board of directors is considering acquiring Electronidex, a start-up online electronics retailer. We were asked to help the board members better understand the clientele that Electronidex serves and to assess whether it would be a suitable partnership.
We need to identify purchasing patterns that will provide insight into the retailer’s clientele.
To conduct a market basket analysis and discover any interesting relationships or associations between customers’ transactions and the items they purchased. These associations can be used to drive sales-oriented initiatives such as recommender systems like Amazon’s “Frequently Bought Together” feature.
To help the board of directors form a clearer picture of Electronidex’s customer buying patterns.
Transactional data contains rows that each represent a single transaction, with the purchased items separated by commas; this is called the ‘basket’ format. Because R does not natively read transactional data, we load the CSV file with the read.transactions() function from the arules package.
The read.transactions() function converts the dataset into a sparse matrix in which each row represents a transaction and each column represents an item that a customer might purchase. Electronidex sells 125 items, so the sparse matrix has 125 columns. Each cell is binary: 1 if the item was purchased in that transaction, 0 if it was not.
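To make the binary layout concrete, here is a minimal base-R sketch that builds the same kind of item matrix by hand. The four baskets are hypothetical and are not taken from the Electronidex file; read.transactions() stores the equivalent information in a memory-efficient sparse form rather than a dense matrix like this one.

```r
# Hypothetical toy baskets (illustration only, not Electronidex data)
baskets <- list(
  c("iMac", "HP Laptop"),
  c("iMac"),
  c("HP Laptop", "Apple Earpods", "iMac"),
  c("Apple Earpods")
)

# One column per distinct item
items <- sort(unique(unlist(baskets)))

# One row per basket: 1 if the item is in the basket, 0 otherwise
incidence <- t(sapply(baskets, function(b) as.integer(items %in% b)))
colnames(incidence) <- items

print(incidence)
```

Each row is a transaction and each column an item, which is the same layout the 125-column Electronidex matrix uses at a larger scale.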
library(arules)
library(arulesViz)
transdata <- read.transactions("ElectronidexTransactions2017.csv",
                               format = "basket",
                               rm.duplicates = TRUE, sep = ",")
## distribution of transactions with duplicates:
## items
## 1 2
## 191 10
# Check the first 5 transactions
inspect (transdata[1:5])
## items
## [1] {Acer Aspire,
## Belkin Mouse Pad,
## Brother Printer Toner,
## VGA Monitor Cable}
## [2] {Apple Wireless Keyboard,
## Dell Desktop,
## Lenovo Desktop Computer}
## [3] {iMac}
## [4] {Acer Desktop,
## Intel Desktop,
## Lenovo Desktop Computer,
## XIBERIA Gaming Headset}
## [5] {ASUS Desktop,
## Epson Black Ink,
## HP Laptop,
## iMac}
# Check the length
length (transdata)
## [1] 9835
# Check the number of items bought in the first 10 transactions
size(transdata[1:10])
## [1] 4 3 1 4 4 5 1 5 1 2
# Use LIST() to inspect each transaction
LIST(transdata[1:5])
## [[1]]
## [1] "Acer Aspire" "Belkin Mouse Pad" "Brother Printer Toner"
## [4] "VGA Monitor Cable"
##
## [[2]]
## [1] "Apple Wireless Keyboard" "Dell Desktop"
## [3] "Lenovo Desktop Computer"
##
## [[3]]
## [1] "iMac"
##
## [[4]]
## [1] "Acer Desktop" "Intel Desktop"
## [3] "Lenovo Desktop Computer" "XIBERIA Gaming Headset"
##
## [[5]]
## [1] "ASUS Desktop" "Epson Black Ink" "HP Laptop" "iMac"
# Count the distinct item labels
length(itemLabels(transdata))
## [1] 125
# Plot the 10 most frequently purchased items
itemFrequencyPlot(transdata, type =c("absolute"), topN =10, col = "lightblue1",
main="Top 10 Products Frequency Plot", ylab = "")
# Create a table for low frequency items
low_frequency <- sort(table(unlist(LIST(transdata))), decreasing = FALSE)[1:10]
low_frequency
##
## Logitech Wireless Keyboard VGA Monitor Cable
## 22 22
## Panasonic On-Ear Stereo Headphones 1TB Portable External Hard Drive
## 23 27
## Canon Ink Logitech Stereo Headset
## 27 30
## Ethernet Cable Canon Office Printer
## 32 35
## Gaming Mouse Professional Audio Cable
## 35 36
# Plot low frequency items
par(mar=c(10, 17, 2, 1))
barplot(low_frequency, horiz=TRUE,
las = 1, col=rainbow(4), main = "Bottom 10 Products Frequency Plot")
# Visualize the sparse matrix for a random sample of 100 transactions
image(sample(transdata, 100))
We will use the Apriori algorithm to perform the market basket analysis. Apriori is helpful when working with large transactional datasets because it is based on item frequency: an itemset can be frequent only if all of its subsets are frequent. For example, the itemset {Item 1, Item 2, Item 3, Item 4} can occur no more often than {Item 1}, {Item 2}, {Item 3} or {Item 4} occurs on its own, so any itemset that contains an infrequent item can be skipped.
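The frequency argument above can be checked on a small example. The sketch below uses five hypothetical baskets and a helper named support() (an illustrative function written for this example, not part of arules) to show that an itemset is never more frequent than any of its subsets.

```r
# Hypothetical toy baskets (illustration only)
baskets <- list(
  c("A", "B", "C"),
  c("A", "B"),
  c("A", "C"),
  c("B"),
  c("A", "B", "C")
)

# Share of baskets that contain every item in `itemset`
support <- function(itemset, baskets) {
  mean(sapply(baskets, function(b) all(itemset %in% b)))
}

s_a   <- support("A", baskets)
s_ab  <- support(c("A", "B"), baskets)
s_abc <- support(c("A", "B", "C"), baskets)

# Adding items to an itemset can only keep or lower its support
print(c(A = s_a, AB = s_ab, ABC = s_abc))
```

Because support can only shrink as items are added, an algorithm in the Apriori style can stop extending an itemset as soon as it falls below the minimum support threshold.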
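The Apriori algorithm assesses association rules using two statistical measures. The first is support, which measures how frequently an itemset or rule appears in the transactional data. The second is confidence, which measures the accuracy of a rule: the proportion of transactions containing the left-hand side that also contain the right-hand side. A rule that scores high in both support and confidence is known as a strong rule. The rule output below also reports lift, the ratio of a rule's confidence to the support of its right-hand side; a lift above 1 means the items appear together more often than would be expected if they were purchased independently.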
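As a worked illustration of these measures, the following base-R sketch computes support and confidence, plus the lift value that appears in the rule output, for a single rule {Mouse} => {Laptop}. The baskets and the helper has_all() are assumptions made for this example only; arules computes the same quantities internally.

```r
# Hypothetical toy baskets (illustration only)
baskets <- list(
  c("Laptop", "Mouse"),
  c("Laptop", "Mouse", "Keyboard"),
  c("Laptop"),
  c("Mouse"),
  c("Keyboard", "Laptop")
)

# TRUE for each basket that contains every item in `itemset`
has_all <- function(itemset) {
  sapply(baskets, function(b) all(itemset %in% b))
}

# Rule under study: {Mouse} => {Laptop}
supp_rule <- mean(has_all(c("Mouse", "Laptop")))  # share of baskets with both items
conf_rule <- supp_rule / mean(has_all("Mouse"))   # share of Mouse baskets that also hold Laptop
lift_rule <- conf_rule / mean(has_all("Laptop"))  # confidence relative to Laptop's baseline share

print(round(c(support = supp_rule, confidence = conf_rule, lift = lift_rule), 3))
```

In this toy data the rule has a confidence of about 0.67 but a lift below 1, because Laptop already appears in 80% of the baskets. This illustrates why it can be useful to look at lift as well as confidence.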
Apply and Inspect the first 10 rules
# Apply the Apriori algorithm
rule <- apriori(transdata, parameter = list(supp = 0.001, conf = 0.6))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[125 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [125 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [3969 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
# Summarize the rules and their quality measures
summary(rule)
## set of 3969 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3 4 5 6
## 2 454 2275 1141 97
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 4.000 4.000 4.221 5.000 6.000
##
## summary of quality measures:
## support confidence lift count
## Min. :0.001017 Min. :0.6000 Min. : 2.343 Min. : 10.0
## 1st Qu.:0.001118 1st Qu.:0.6316 1st Qu.: 2.716 1st Qu.: 11.0
## Median :0.001220 Median :0.6875 Median : 3.225 Median : 12.0
## Mean :0.001484 Mean :0.7061 Mean : 3.523 Mean : 14.6
## 3rd Qu.:0.001627 3rd Qu.:0.7647 3rd Qu.: 4.007 3rd Qu.: 16.0
## Max. :0.010778 Max. :1.0000 Max. :17.697 Max. :106.0
##
## mining info:
## data ntransactions support confidence
## transdata 9835 0.001 0.6
# Inspect the first 10 rules
inspect(rule[1:10])
## lhs rhs support confidence lift count
## [1] {Generic Black 3-Button} => {iMac} 0.003660397 0.6428571 2.509925 36
## [2] {Mackie CR Speakers} => {iMac} 0.004677173 0.6133333 2.394654 46
## [3] {Backlit LED Gaming Keyboard,
## Large Mouse Pad} => {Apple MacBook Air} 0.001321810 0.8125000 5.222835 13
## [4] {ASUS 2 Monitor,
## Generic Black 3-Button} => {iMac} 0.001016777 0.9090909 3.549388 10
## [5] {Generic Black 3-Button,
## ViewSonic Monitor} => {iMac} 0.001016777 0.7692308 3.003329 10
## [6] {Dell Desktop,
## Generic Black 3-Button} => {iMac} 0.001220132 0.8571429 3.346566 12
## [7] {Generic Black 3-Button,
## Lenovo Desktop Computer} => {iMac} 0.001728521 0.8095238 3.160646 17
## [8] {Generic Black 3-Button,
## HP Laptop} => {iMac} 0.001321810 0.6500000 2.537813 13
## [9] {ASUS Monitor,
## HDMI Adapter} => {iMac} 0.001016777 0.8333333 3.253606 10
## [10] {HDMI Adapter,
## ViewSonic Monitor} => {iMac} 0.001321810 0.6842105 2.671382 13
# Inspect the top 10 rules sorted by lift:
top.lift <- sort(rule, decreasing = TRUE, na.last = NA, by = "lift")
inspect(head(top.lift, 10))
## lhs rhs support confidence lift count
## [1] {Apple Earpods,
## Logitech MK360 Wireless Keyboard and Mouse Combo} => {Eluktronics Pro Gaming Laptop} 0.001220132 0.6315789 17.69681 12
## [2] {Apple Earpods,
## Microsoft Wireless Comfort Keyboard and Mouse} => {Slim Wireless Mouse} 0.001220132 0.6315789 16.69779 12
## [3] {Dell Wired Keyboard,
## HDMI Cable 6ft} => {AOC Monitor} 0.001931876 0.6333333 15.04549 19
## [4] {Dell Desktop,
## HDMI Cable 6ft,
## HP Monitor} => {AOC Monitor} 0.001118454 0.6111111 14.51758 11
## [5] {Computer Game,
## iMac,
## Microsoft Office Home and Student 2016,
## ViewSonic Monitor} => {ASUS Monitor} 0.001016777 0.6666667 12.03058 10
## [6] {Computer Game,
## Dell Desktop,
## iMac,
## Lenovo Desktop Computer} => {ASUS Monitor} 0.001220132 0.6666667 12.03058 12
## [7] {AOC Monitor,
## Dell Desktop,
## HP Laptop,
## Lenovo Desktop Computer} => {ASUS Monitor} 0.001016777 0.6250000 11.27867 10
## [8] {Computer Game,
## iMac,
## Intel Desktop} => {Apple Magic Keyboard} 0.001118454 0.7333333 10.23026 11
## [9] {ASUS 2 Monitor,
## Dell Desktop,
## Intel Desktop} => {Apple Magic Keyboard} 0.001118454 0.7333333 10.23026 11
## [10] {Apple MacBook Pro,
## HP Black & Tri-color Ink,
## HP Laptop,
## iMac} => {Acer Aspire} 0.001016777 0.8333333 10.06859 10
# Find the rules that include Apple MacBook Air
ItemRules <- subset(rule, items %in% "Apple MacBook Air")
# Inspect the second rule that involves Apple MacBook Air
inspect(ItemRules[2])
## lhs rhs support confidence lift count
## [1] {Dell KM117 Wireless Keyboard & Mouse,
## iPhone Charger Cable} => {Apple MacBook Air} 0.002033554 0.952381 6.122004 20
Besides the functions mentioned above, we can also specify which items should appear on the left-hand or right-hand side of the rules. The purpose of this approach is to find which items should be displayed or bundled with the item that we want to promote.
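# Restrict the lhs to HP Laptop and allow any item on the rhs
# (rules with an empty lhs, shown as {}, also appear because minlen defaults to 1)
HP_Laptop_Rules_lh <- apriori(transdata, parameter = list(supp = 0.001, conf = 0.1),
                              appearance = list(default = "rhs", lhs = "HP Laptop"))
# Inspect a random sample of five of these rules
inspect(sample(HP_Laptop_Rules_lh, 5))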
## lhs rhs support confidence lift count
## [1] {} => {Apple Earpods} 0.17437722 0.1743772 1.000000 1715
## [2] {} => {Dell Desktop} 0.13401118 0.1340112 1.000000 1318
## [3] {HP Laptop} => {Acer Aspire} 0.02907982 0.1498167 1.810131 286
## [4] {HP Laptop} => {Microsoft Wireless Desktop Keyboard and Mouse} 0.02318251 0.1194343 1.212215 228
## [5] {HP Laptop} => {Samsung Monitor} 0.02755465 0.1419591 1.483707 271
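The mirror-image query fixes the promoted item on the right-hand side and lets any other item appear on the left. The sketch below is a hedged illustration on five hypothetical baskets rather than the Electronidex data; the thresholds are arbitrary, and minlen = 2 is added on the assumption that rules with an empty left-hand side, like the {} rows in the sample above, are not wanted.

```r
library(arules)

# Hypothetical toy baskets (illustration only, not Electronidex data)
baskets <- list(
  c("HP Laptop", "Mouse"),
  c("HP Laptop", "Mouse", "Keyboard"),
  c("HP Laptop", "Keyboard"),
  c("Mouse"),
  c("HP Laptop", "Mouse")
)
toy <- as(baskets, "transactions")

# Only HP Laptop may appear on the rhs; every other item defaults to the lhs.
# minlen = 2 requires at least one lhs item, which excludes {} => {...} rules.
toy_rules <- apriori(
  toy,
  parameter  = list(supp = 0.2, conf = 0.5, minlen = 2),
  appearance = list(default = "lhs", rhs = "HP Laptop"),
  control    = list(verbose = FALSE)
)

inspect(toy_rules)
```

Applied to transdata with suitable thresholds, the same pattern would list the items whose purchase most often accompanies the product being promoted.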
# Scatter plot of all rules
plot(rule, jitter = 0)
# Interactive graph-based view, limited to 10 rules
plot(rule, method = "graph", control = list(type = "items", interactive = TRUE, max = 10))
Cross-Selling
Remove Low Frequency Items
Deeper Analysis from Financial Point of View