ContextBase Logo



Table of Contents

Section 1 - The Problem
Section 2 - Our Solution

Section 3 - Data Import
Section 4 - Exploratory Data Analysis

Section 5 - Market Basket Association Rules
Section 6 - Plot of ARM Rules Distribution
Section 7 - Frequency Plot

Section 8 - Ruleset Discovery for Product Categories

Section 9 - Conclusion

Section 10 - Appendix
Section 10.1 - Required Packages
Section 10.2 - Session Information



Section 1 - The Problem

Blackwell Electronics’ board of directors is considering acquiring Electronidex, a start-up electronics online retailer. The board of directors has asked us to help them:
1) Better understand the clientele that Electronidex currently is serving and if it would be an optimal partnership.
2) Identify purchasing patterns that will provide insight into Electronidex’s clientele.



Section 2 - Our Solution

In order to solve the above problems, this document examines a CSV file that contains one month’s (30 days’ worth) of Electronidex’s online transactions and a file containing all the electronics that they currently sell. Due to their lack of funding, Electronidex is only able to pull data on the items that customers purchased per their transactions.

Via discovering any interesting relationships, (or associations), between customer’s transactions and the item(s) they’ve purchased, sales-oriented initiatives such as recommender systems are then possible to develop.

This document performs Association Rules Analysis in order to assist Blackwell’s board of directors to form a clearer picture of Electronidex’s customer buying patterns. Interesting patterns or item relationships within Electronidex’s transactions are examined. The benefit to Blackwell from selling Electronidex’s items is examined. Thereby, a conclusion of whether Blackwell should acquire Electronidex is determined, with recommedations for Blackwell concerning acquiring Electronidex. For example, cross-selling items, sale promotions, removing items, etc.



Section 3 - Data Import

# Load transactional data
df <- read.transactions("Assignment 2/Data/ElectronidexTransactions2017.csv", 
                  format = "basket", 
                  sep=",", 
                  rm.duplicates=F,
                  cols = NULL)

# Remove "" for improved processing
df@itemInfo$labels <- gsub("\"","",df@itemInfo$labels)

# Format item data into a "myLabels" variable
myLabels <- c("HardDrive_External_1TB_Portable","HardDrive_External_2TBPortable", "ComputerMouse_3Button", "HardDrive_External_3TBPortable","HardDrive_Desktop_5TB","Laptop_Acer_Aspire", "Desktop_Acer", "Monitor_Acer", "ComputerHeadPhones_Ailihen_Stereo", "Laptop_Alienware_AW17R4-7345SLV-PUS17", "Monitor_AOC", "ActiveHeadPhones_APIE_Bluetooh","ActiveHeadPhones_Earpods_Apple", "Laptop_AppleMacBookAir", "Laptop_AppleMacBookPro", "Keyboard_AppleMagic", "SmartHomeDevice_TV_Apple","Keyboard_AppleWired", "Keyboard_AppleWireless", "Monitor_ASUS2", "Laptop_ChromebookASUS", "Desktop_ASUS", "Monitor_ASUS","ComputerCord_AudioCable", "Keyboard_BacklitLED_Gaming", "Accessories_MousePad_Belkin", "Speaker_BoseCompanion", "Printer_Brother", "PrinterInk_Toner_Brother", "Speaker_Bluetooth_Cambridge","PrinterInk_Canon", "Printer_Canon_Office", "Software_ComputerGame", "Speaker_CyberAcoustics", "Desktop_CYBERPOWERGamer", "Desktop_Dell2", "Desktop_Dell", "MouseAndKeyboardCombo_Dell_KM117Wireless", "Laptop_Dell", "Monitor_Dell", "Keyboard_Dell_Wired", "Speaker_DOSS_Touch_Wireless_Bluetooth", "Printer_DYMO_LabelMaker","PrinterInk_DYMOLabelingTape", "MouseAndKeyboardCombo_EagleTec_Wireless", "Laptop_Eluktronics_ProGaming", "PrinterInk_Epson_Black", "Printer_Epson", "ComputerCord_Etekcity_PowerExtension","ComputerCord_EthernetCable", "ComputerTablet_FireHD", "SmartHomeDevice_FireTVStick", "ComputerStand_MonitorFullMotion", "ComputerMouse_GamingProfessional", "ComputerMouse_GenericBlack_3Button", "SmartHomeDevice_GoogleHome", "ComputerStand_HalterAcrylicMonitorStand", "ComputerStand_Halter_MeshMetal_MonitorStand", "ComputerCord_HDMIAdapter", "ComputerCord_HDMICable6ft", "ComputerStand_HeightAdjustable_StandingDesk", "PrinterInk_HP_BlackAndTricolor", "Desktop_HP", "Laptop_HP", "Monitor_HP", "Laptop_HP_Notebook_TouchScreen_PC", "Keyboard_HP_USB", "ComputerMouse_HP_Wireless", "Printer_HP_Wireless", "Desktop_iMac", "Desktop_Intel", "ComputerTablet_iPad", "ComputerTablet_iPadPro", "Accessories_iPhone_Charger_Cable", "Speaker_JBL_Splashproof_Portable_Bluetooth", "ComputerHeadPhones_Kensington", "ComputerTablet_Kindle","ComputerHeadPhones_Koss_Home", "Accessories_LargeMousePad", "Desktop_Lenovo_Computer", "Monitor_LG", "Laptop_LG_Touchscreen", "ComputerMouse_Logitech_3button", "ComputerHeadPhones_HeadSet_Logitech_ClearChat", "MouseAndKeyboardCombo_Logitech_Desktop_MK120", "Keyboard_Logitech", "MouseAndKeyboardCombo_Logitech_MK270_Wireless", "MouseAndKeyboardCombo_Logitech_MK360_Wireless", "MouseAndKeyboardCombo_Logitech_MK550_Wireless_Wave", "Speaker_Logitech_Multimedia", "ComputerHeadPhones_Headset_Logitech_Stereo", "Keyboard_Logitech_Wireless", "ComputerMouse_Logitech_Wireless", "Speaker_Mackie_CR", "ComputerMouse_Microsoft_BasicOptical", "ComputerHeadPhones_HeadSet_Microsoft", "Software_MicrosoftOffice_HomeandStudent2016","MouseAndKeyboardCombo_Microsoft_Wireless_Comfort","MouseAndKeyboardCombo_Microsoft_Wireless_Desktop", "ActiveHeadPhones_MonsterBeats_ByDrDre","ComputerStand_MultiMediaStand","ActiveHeadPhones_Otium_Wireless_Sports_Bluetooth", "ActiveHeadPhones_Panasonic_In-Ear", "ComputerHeadPhones_Panasonic_On-Ear_Stereo_RP-HT21", "ComputerHeadPhones_HeadSet_PC_Gaming", "ActiveHeadPhones_Philips_Flexible_Earhook","ComputerMouse_Redragon_Gaming", "MouseAndKeyboardCombo_Rii_LED_Gaming", "Keyboard_Rii_LED", "Speaker_Rokono_Mini", "SmartHomeDevice_RokuExpress", "ComputerCord_Samsung_ChargingCable", "ComputerTablet_SamsungGalaxy", "Monitor_Samsung", "Monitor_Sceptre","HardDrive_External_Slim_2TB_Portable", "ComputerMouse_Slim_Wireless", "SmartHomeDevice_SmartLightBulb", "Speaker_Sonos", "ComputerCord_USB_Cable", "ComputerCord_VGA_MonitorCable","Monitor_ViewSonic", "ComputerMouse_Wireless_Portable", "ComputerHeadPhones_HeadSet_XIBERIA_Gaming", "ComputerHeadPhones_HeadSet_Zombie_Gaming")

# Enhance the dataset with previosly formatted item labels
itemLabels(df) <- myLabels

# Create Levels
myLevel1 <- as.factor(sub("\\_.*", "", myLabels))

# Add Levels
df@itemInfo$level1 <- myLevel1

# Remove empty rows
df <- df[which(size(df)!= 0)]

# We have 9833 transactions 
dim(df)[1]
## [1] 9833
# Find items that were consumed alone
oneCat <- df[which(size(df) == 1), ]
dim(oneCat)[1]
## [1] 2163



Section 4 - Exploratory Data Analysis

The dataset is summarized to find useful information. A plot of item frequencies is displayed that shows the number of times a product was in a unique transaction. The most popular product was the iMac (Desktop), follow by HP Laptop. These products would benefit Blackwell’s struggling laptop and PC sales.

The most popular Electronidex products that were purchased alone are:
1) Apple MacBook Air (383 purchases)
2) iMac 121 purchases
3) CYBERPOWER GAMER Desktop 109 purchases

# Summary
print(paste("Table 1: Electronidex Transactions 2017 Summary Statistics"))
## [1] "Table 1: Electronidex Transactions 2017 Summary Statistics"
summary(df)
## transactions as itemMatrix in sparse format with
##  9833 rows (elements/itemsets/transactions) and
##  125 columns (items) and a density of 0.03506885 
## 
## most frequent items:
##                   Desktop_iMac                      Laptop_HP 
##                           2519                           1909 
##        Desktop_CYBERPOWERGamer ActiveHeadPhones_Earpods_Apple 
##                           1809                           1715 
##         Laptop_AppleMacBookAir                        (Other) 
##                           1530                          33622 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 2163 1647 1294 1021  856  646  540  439  353  247  171  119   77   72   56   41 
##   17   18   19   20   21   22   23   25   26   27   29   30 
##   26   20   10   10   10    5    3    1    1    3    1    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.384   6.000  30.000 
## 
## includes extended item information - examples:
##                            labels        level1
## 1 HardDrive_External_1TB_Portable     HardDrive
## 2  HardDrive_External_2TBPortable     HardDrive
## 3           ComputerMouse_3Button ComputerMouse
# Plot item frequencies
itemFrequencyPlot(df,
   topN=10,
   main='Graph 1: Item Frequency Plot',
   type="absolute",
   ylab="Item Frequency")

# Most products in one cat transactions
barplot(sort(itemFrequency(oneCat, type="absolute"), decreasing=T), main = "Graph 2: Bar Plot of One Category Item Frequencies")



Section 5 - Market Basket Association Rules

The Apriori algorithm is used to analyse the data and output a set of rules with calculated support, confidence and lift. The higher the confidence and lift are, the better the rule is.

# Apriori Rules processing, with sup at 0.01 and conf at 0.5
basket_rules <- apriori(df, parameter = list(sup = 0.01,
                                             conf = 0.4,
                                             minlen = 2,
                                             target="rules"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.4    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 98 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[125 item(s), 9833 transaction(s)] done [0.00s].
## sorting and recoding items ... [82 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [70 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
# The number of Association Rules Mining (ARM) rules
basket_rules
## set of 70 rules
# The first six ARM rules
print(paste("Table 2: The First Six Association Rules"))
## [1] "Table 2: The First Six Association Rules"
inspect(head(basket_rules))
##     lhs                                                     rhs               support confidence     lift count
## [1] {MouseAndKeyboardCombo_Logitech_MK550_Wireless_Wave} => {Desktop_iMac} 0.01006814  0.4107884 1.603526    99
## [2] {Laptop_Alienware_AW17R4-7345SLV-PUS17}              => {Desktop_iMac} 0.01159361  0.4145455 1.618192   114
## [3] {ComputerCord_HDMICable6ft}                          => {Desktop_iMac} 0.01149191  0.4414062 1.723044   113
## [4] {ActiveHeadPhones_Panasonic_In-Ear}                  => {Desktop_iMac} 0.01078003  0.4326531 1.688876   106
## [5] {Laptop_Eluktronics_ProGaming}                       => {Laptop_HP}    0.01444117  0.4045584 2.083825   142
## [6] {Laptop_Eluktronics_ProGaming}                       => {Desktop_iMac} 0.01576325  0.4415954 1.723782   155



Section 6 - Plot of ARM Rules Distribution

The scatterplot below demonstrates the distribution of 70 rules generated from all products. There are a few rules with high confidence and lift. Better performing rules are generated by generalising the data.

# Scatter plot of ARM rules found using the apriori algorithm
plot(basket_rules, main = "Graph 3: Distribution of 70 ARM rules")



Section 7 - Sort Rules By Highest Confidence

The top 6 rules are sorted by highest confidence. A redundancy check is performed to remove any low tier rulesets.

# 'high-confidence' rules.
rules_conf <- sort(basket_rules, by="confidence", decreasing=TRUE)

# Show the support, lift and confidence for all rules
print(paste("Table 3: Support, Lift and Confidence for all Rules"))
## [1] "Table 3: Support, Lift and Confidence for all Rules"
inspect(head(rules_conf))
##     lhs                          rhs               support confidence     lift count
## [1] {Laptop_Acer_Aspire,                                                            
##      Monitor_ViewSonic}       => {Laptop_HP}    0.01078003  0.6022727 3.102225   106
## [2] {Monitor_ASUS2,                                                                 
##      Desktop_Lenovo_Computer} => {Desktop_iMac} 0.01088172  0.5911602 2.307614   107
## [3] {Keyboard_AppleMagic,                                                           
##      Desktop_Dell}            => {Desktop_iMac} 0.01016984  0.5847953 2.282768   100
## [4] {Monitor_ASUS,                                                                  
##      Laptop_HP}               => {Desktop_iMac} 0.01179701  0.5829146 2.275426   116
## [5] {Monitor_ASUS2,                                                                 
##      Laptop_HP}               => {Desktop_iMac} 0.01108512  0.5828877 2.275321   109
## [6] {Desktop_Dell,                                                                  
##      Monitor_ViewSonic}       => {Laptop_HP}    0.01525475  0.5747126 2.960267   150
# Check for redundant rules
print(paste("Table 4: Redundant Rules Check"))
## [1] "Table 4: Redundant Rules Check"
is.redundant(basket_rules)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE



Section 8 - Ruleset Discovery for Product Categories

Running the Apriori algorithm again, (but this time on the categories), we get a sizable amount of rules. 5000+ total. A plot of category frequencies is displayed that shows the number of times a product category wasin a unique transaction. This demonstrates Electronidex’s high level of desktop, laptop, and monitor sales.

# Aggregate by Category
dfByType <- aggregate(df, by= df@itemInfo$level1)

# Plot frequency for categories
itemFrequencyPlot(dfByType,
                  topN=10,
                  main='Graph 4: Category Frequency Plot',
                  type="absolute",
                  ylab="Category Frequency")



Section 9 - Conclusion

Sales of Blackwell’s laptops, PCs and desktops would benefit from acquisition of Electronidex via increased sales and an improved margin of revenues and profits. Sales of accessories would also increase greatly with Electronidex’s customer base, and Blackwell Electronics would benefit from expanding into business to business sales, in addition to business to customer sales.

Electronidex is a business to business company that sells multiple laptops and PC’s simultaneously. Electronidex’s Laptop association rules mining rule set has the highest support and confidence. Therefore, Blackwell’s acquisition of Electroidex will greatly increase laptop sales.

The variety of products offered by Blackwell Electronics will increase exponentially via inclusion of Eletronidex’s 125 products within 15 categories. The product categories shared by Blackwell Electronics and Electronidex include Laptops, Printers, PC, Monitors, Tablets and Accessories. One possibly problem is the data from Electronidex being deficit in date range, possibly leading to inaccurate analysis of annual sales.



Section 10 - Appendix

Section 10.1 - Required Packages

Table 5: List of Required Packages
List of Required Packages
Required Packages ‘arules’ ‘arulesViz’ ‘dplyr’ ‘caret’ ‘treemap’ ‘reshape2’ ‘base’ ‘stats’ ‘e1071’ ‘magrittr’ ‘lattice’ ‘datasets’ ‘RColorBrewer’ ‘ggplot2’ ‘knitr’



Section 10.2 - Session Information

Table 6: Session Information
Session Information
R Version R version 3.6.2 (2019-12-12)
Platform x86_64-w64-mingw32/x64 (64-bit)
Running Windows 10 x64 (build 17763)
RStudio Citation RStudio: Integrated Development Environment for R
RStudio Version 1.0.153