Blackwell Electronics’ board of directors is considering acquiring Electronidex, a start-up online electronics retailer. We have been asked to understand the clientele of Electronidex and provide insights that support the best decision.
We have been provided with a data set that contains one month (30 days) of Electronidex's online transactions and a file listing all the electronics the company currently sells.
In this project, we will use R to conduct a Market Basket Analysis. We will look for interesting relationships (or associations) between customers' transactions and the items they have purchased. These associations can then be used to drive sales-oriented initiatives such as recommender systems like the ones used by Amazon and other eCommerce sites.
First, we need to load the packages that will be used during the project.
The next step is to load the data set that contains one month (30 days) of Electronidex's online transactions.
## distribution of transactions with duplicates:
## items
## 1 2
## 191 10
This data set contains 9835 transactions and 125 different products.
On average, each transaction contains 4 items.
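As a minimal sketch of how such an average is obtained, the base R snippet below counts the items in each basket of a small made-up data set. The `baskets` list and its contents are hypothetical and only stand in for the real Electronidex transactions.

```r
# Hypothetical toy baskets; each element is one transaction.
baskets <- list(
  c("iMac", "HP Laptop"),
  c("Apple Earpods"),
  c("iMac", "Apple Earpods", "Apple Macbook Air"),
  c("HP Laptop", "CYBERPOWER Gamer Desktop")
)

# Number of items in each transaction.
items_per_transaction <- lengths(baskets)

# Average basket size and the distribution of basket sizes.
mean_items <- mean(items_per_transaction)
size_distribution <- table(items_per_transaction)

print(mean_items)
print(size_distribution)
```

The same two quantities are what the summary of a transactions object reports, only computed over all 9835 transactions.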
par(mfrow=c(1, 2))
itemset_transaction <- summary@lengths
plot(itemset_transaction, type = "h", main = "Items per transaction", xlab = "Itemsets", ylab = "Transactions")
box_plot <- summary@lengthSummary
boxplot(box_plot)
The five best-selling products at Electronidex are iMac, HP Laptop, CYBERPOWER Gamer Desktop, Apple Earpods and Apple Macbook Air.
It's time to look for relationships among products. Market Basket Analysis usually relies on the Apriori algorithm. This algorithm is helpful when working with large data sets and is used to uncover insights in transactional data. It is based on item frequency. For example, the item set {Item 1, Item 2, Item 3, Item 4} can only be frequent if the items {Item 1}, {Item 2}, {Item 3} and {Item 4} are each at least as frequent.
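The frequency property described above can be checked by hand. The following base R sketch uses a hypothetical list of toy baskets and a hypothetical helper `support()`; neither comes from the Electronidex data or from the arules package. It only illustrates that the support of an item set never exceeds the support of its subsets.

```r
# Hypothetical toy baskets; item names are placeholders.
baskets <- list(
  c("Item 1", "Item 2", "Item 3"),
  c("Item 1", "Item 2"),
  c("Item 1", "Item 3"),
  c("Item 2", "Item 4"),
  c("Item 1", "Item 2", "Item 3", "Item 4")
)

# Support = share of transactions that contain every item in the item set.
support <- function(itemset, transactions) {
  mean(vapply(transactions, function(b) all(itemset %in% b), logical(1)))
}

supp_1   <- support("Item 1", baskets)
supp_2   <- support("Item 2", baskets)
supp_12  <- support(c("Item 1", "Item 2"), baskets)
supp_123 <- support(c("Item 1", "Item 2", "Item 3"), baskets)

# Adding items can never increase support, so any candidate item set that
# contains an infrequent subset can be discarded without counting it.
print(c(supp_1 = supp_1, supp_2 = supp_2, supp_12 = supp_12, supp_123 = supp_123))
```

This is why Apriori scales to large data sets: it builds larger item sets only from smaller ones that already passed the support threshold.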
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.0025 2
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 24
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[125 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [122 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 5 done [0.01s].
## writing ... [6 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
We have generated 6 different association rules. These associations show some interesting insights.
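To make the thresholds in the Apriori output concrete, the base R sketch below reproduces the arithmetic behind the minimum support count and applies the standard definitions of support, confidence and lift to one rule. The counts for the rule {A} => {B} are hypothetical and are not Electronidex figures. Rounding the support count down is an inference from the reported value of 24, not a documented guarantee.

```r
# Thresholds and transaction count taken from the Apriori output.
n_transactions <- 9835
min_support    <- 0.0025
min_confidence <- 0.8

# 0.0025 * 9835 = 24.5875; rounding down matches the reported count of 24.
min_support_count <- floor(min_support * n_transactions)

# Hypothetical counts for a rule {A} => {B}.
count_a  <- 30    # transactions containing A
count_b  <- 2500  # transactions containing B
count_ab <- 25    # transactions containing both A and B

rule_support    <- count_ab / n_transactions                      # share with A and B
rule_confidence <- count_ab / count_a                             # share of A baskets that hold B
rule_lift       <- rule_confidence / (count_b / n_transactions)   # versus B's baseline share

# A rule is kept only if it clears both thresholds.
rule_kept <- rule_support >= min_support && rule_confidence >= min_confidence

print(c(support = rule_support, confidence = rule_confidence, lift = rule_lift))
```

A lift above 1 means the two sides of the rule appear together more often than they would if they were independent.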
rule_supp_0.0025_conf_0.8_df <- as(rule_supp_0.0025_conf_0.8, "data.frame")
datatable(rule_supp_0.0025_conf_0.8_df, options = list(pageLength = 5))
There is a clear pattern in the Electronidex clientele: customers are very likely to buy laptops and desktop PCs in the same transaction, which would be unusual if the clientele were private individuals.
Next, we plot the association rules to look for further insights.
[Figure: "Graph for 6 rules", a network graph of the association rules drawn with the igraph engine]
library(networkD3)
# MisLinks and MisNodes are the sample data sets shipped with networkD3.
# The rules object is not a packaged data set, so it cannot be loaded with data().
data(MisLinks)
data(MisNodes)
forceNetwork(Links = MisLinks, Nodes = MisNodes, Source = "source",
             Target = "target", Value = "value", NodeID = "name",
             Group = "group", opacity = 0.4)
qrage <- function(links, width = 1400, height = 1000, distance = 6000,
                  nodeValue = NULL, nodeColor = NULL, linkColor = '#00f',
                  arrowColor = '#f00', cut = 0.25, textSize = 12,
                  linkWidth = c(1, 10), nodeSize = c(5, 20),
                  linkOpacity = c(0.1, 1)) {
  # is.data.frame() also accepts objects with several classes, such as tibbles
  if (!is.data.frame(links)) {
    stop("links must be a data frame class object.")
  }
  df <- links
  colnames(df) <- c("source", "target", "value")
  # Optional node values, stored both as given and transposed by node name
  nv <- NULL
  tnv <- NULL
  isSetNodeValue <- FALSE
  if (!is.null(nodeValue)) {
    isSetNodeValue <- TRUE
    nv <- nodeValue
    colnames(nv) <- c("name", "nodevalue")
    tnv <- data.frame(t(nv$nodevalue))
    colnames(tnv) <- nv$name
  }
  # Optional node colors, transposed so that each column is a node name
  td <- NULL
  if (!is.null(nodeColor)) {
    colnames(nodeColor) <- c("name", "color")
    td <- data.frame(t(as.character(t(nodeColor[, -1]))))
    colnames(td) <- nodeColor$name
  }
  # Keep only links whose value is at least `cut` times the largest value
  df <- subset(df, df$value >= (max(df$value) * cut))
  data <- list(
    df = df,
    distance = distance,
    nodeColor = td,
    r = nv,
    tr = tnv,
    width = width,
    height = height,
    isSetNodeValue = isSetNodeValue,
    textSize = textSize,
    linkWidth = linkWidth,
    nodeSize = nodeSize,
    linkColor = linkColor,
    linkOpacity = linkOpacity,
    arrowColor = arrowColor
  )
  # create widget
  htmlwidgets::createWidget(
    name = 'qrage',
    data,
    width = width,
    height = height,
    package = 'qrage'
  )
}
qrageOutput <- function(outputId, width = "100%", height = "400px") {
  htmlwidgets::shinyWidgetOutput(outputId, "qrage", width, height, package = "qrage")
}
renderQrage <- function(expr, env = parent.frame(), quoted = FALSE) {
  if (!quoted) { expr <- substitute(expr) } # force quoted
  htmlwidgets::shinyRenderWidget(expr, qrageOutput, env, quoted = TRUE)
}
Now it's time to analyze the differences between Blackwell and Electronidex. Blackwell's existing product list is limited to product types and does not identify individual products, whereas Electronidex records detailed information about each product.
Blackwell's best-selling product types are Accessories and Game Consoles, followed by Software.
drop_variables_products <- c("Price","x5StarReviews","x4StarReviews","x3StarReviews","x2StarReviews","x1StarReviews","PositiveServiceReview", "NegativeServiceReview" , "ProductDepth", "ProductHeight", "ProductWidth", "ProfitMargin", "ProductNum", "ShippingWeight","Recommendproduct", "BestSellersRank")
existing_products_review <- existing_products[ , !(names(existing_products) %in% drop_variables_products)]
datatable(existing_products_review, options = list(pageLength = 5))
In contrast, the Electronidex clientele is more likely to buy iMacs, HP Laptops and CYBERPOWER Gamer Desktops, as we have seen before.
However, the list of new products that Blackwell will launch soon shows some similarities.
new_products <- read_csv("C:/Users/user/Desktop/Data Analytics II/M2T3/R markdown/New_Products.csv")
## Warning: Missing column names filled in: 'X1' [1]
drop_variables_new_products <- c("Price","x5StarReviews","x4StarReviews","x3StarReviews","x2StarReviews","x1StarReviews","PositiveServiceReview", "NegativeServiceReview" , "ProductDepth", "ProductHeight", "ProductWidth", "ProfitMargin", "ProductNum", "ShippingWeight","Recommendproduct", "BestSellersRank")
new_products_review <- new_products[ , !(names(new_products) %in% drop_variables_new_products)]
datatable(new_products_review, options = list(pageLength = 5))