Blackwell Electronics’ board of directors is considering acquiring Electronidex, a start-up electronics online retailer. The board of directors has asked us to help them better understand the clientele that Electronidex currently is serving and if it would be an optimal partnership.
They need our help to identify purchasing patterns that will provide insight into Electronidex’s clientele. The CSV file contains one month’s (30 days’ worth) of Electronidexes online transactions and a file containing all the electronics that they currently sell. Due to their lack of funding, Electronidex is only able to pull data on the items that customers purchased per their transactions.
tr <- read.transactions('C:/Users/andre/OneDrive/Andres Marquez/UBIQUM/Project 3/Part 3/ElectronidexTransactions2017.csv', format = 'basket', sep=',')
## Warning in readLines(file, encoding = encoding): incomplete final line
## found on 'C:/Users/andre/OneDrive/Andres Marquez/UBIQUM/Project 3/Part 3/
## ElectronidexTransactions2017.csv'
## Warning in asMethod(object): removing duplicated items in transactions
tr
## transactions in sparse format with
## 9835 transactions (rows) and
## 125 items (columns)
summary(tr)
## transactions as itemMatrix in sparse format with
## 9835 rows (elements/itemsets/transactions) and
## 125 columns (items) and a density of 0.03506172
##
## most frequent items:
## iMac HP Laptop CYBERPOWER Gamer Desktop
## 2519 1909 1809
## Apple Earpods Apple MacBook Air (Other)
## 1715 1530 33622
##
## element (itemset/transaction) length distribution:
## sizes
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 2 2163 1647 1294 1021 856 646 540 439 353 247 171 119 77 72 56
## 16 17 18 19 20 21 22 23 25 26 27 29 30
## 41 26 20 10 10 10 5 3 1 1 3 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.000 3.000 4.383 6.000 30.000
##
## includes extended item information - examples:
## labels
## 1 1TB Portable External Hard Drive
## 2 2TB Portable External Hard Drive
## 3 3-Button Mouse
We can see 9.835 transactions made. We have 125 items.
Density: The percentage of non-empty cells in the sparse matrix. In another words, the total number of items that are purchased divided by the total number of possible items in that matrix. We can calculate how many items were purchased using density like so: 43.104 items.
The most frequent items are the iMac, HP Laptop, CYBERPOWER Gamer Desktop, Apple Earpods and the Apple McBook Air.
Looking at the size of the transactions: 2163 transactions were for just 1 item, 1647 transactions for 2 items, all the way up to the biggest transaction: 1 transaction for 30 items. This indicates that most customers buy a small number of items in each transaction. The distribution of the data is right skewed.
We can also see that we have two rows with no transactions at all.
itemFrequencyPlot(tr, topN=20, type='absolute', col="#003366")
The iMac is the top selling product in Electrodinex, followed by the HP laptop, as told before.
For future purposes, we’ll be creating a dataframe with all the products with the frequency they were bought.
df <- as.data.frame(itemFrequency(tr, type='absolute'))
df <- cbind(Product = rownames(df), df)
rownames(df) <- 1:nrow(df)
colnames(df) <- c('Products','Count')
Let’s create some rules, we’ll be first doing it by confidence, which is the probability that a rule is correct with items on the left for a new transaction. Confidence tells us what percentage of transactions with item A also have item B. (e.g., how many transactions that have bread also have butter).
rules <- apriori(tr, parameter = list(supp=0.001, conf=0.8))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[125 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [125 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [635 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules <- sort(rules, by='confidence', decreasing = TRUE)
summary(rules)
## set of 635 rules
##
## rule length distribution (lhs + rhs):sizes
## 3 4 5 6
## 30 303 269 33
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 4.00 4.00 4.48 5.00 6.00
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.001017 Min. :0.8000 Min. :0.001017 Min. : 3.123
## 1st Qu.:0.001017 1st Qu.:0.8333 1st Qu.:0.001220 1st Qu.: 3.304
## Median :0.001118 Median :0.8462 Median :0.001322 Median : 3.579
## Mean :0.001268 Mean :0.8661 Mean :0.001475 Mean : 4.042
## 3rd Qu.:0.001322 3rd Qu.:0.9091 3rd Qu.:0.001627 3rd Qu.: 4.416
## Max. :0.005287 Max. :1.0000 Max. :0.006507 Max. :10.069
## count
## Min. :10.00
## 1st Qu.:10.00
## Median :11.00
## Mean :12.47
## 3rd Qu.:13.00
## Max. :52.00
##
## mining info:
## data ntransactions support confidence
## tr 9835 0.001 0.8
## call
## apriori(data = tr, parameter = list(supp = 0.001, conf = 0.8))
rules <- rules[!is.redundant(rules)]
inspect(rules[1:10])
## lhs rhs support confidence coverage lift count
## [1] {Brother Printer,
## Halter Acrylic Monitor Stand} => {iMac} 0.001118454 1 0.001118454 3.904327 11
## [2] {ASUS Monitor,
## Mackie CR Speakers,
## ViewSonic Monitor} => {iMac} 0.001016777 1 0.001016777 3.904327 10
## [3] {Apple Magic Keyboard,
## Rii LED Gaming Keyboard & Mouse Combo,
## ViewSonic Monitor} => {iMac} 0.001728521 1 0.001728521 3.904327 17
## [4] {ASUS Monitor,
## Koss Home Headphones,
## Microsoft Office Home and Student 2016} => {iMac} 0.001016777 1 0.001016777 3.904327 10
## [5] {Acer Aspire,
## Koss Home Headphones,
## ViewSonic Monitor} => {HP Laptop} 0.001220132 1 0.001220132 5.151912 12
## [6] {Dell Desktop,
## Koss Home Headphones,
## ViewSonic Monitor} => {HP Laptop} 0.001118454 1 0.001118454 5.151912 11
## [7] {ASUS 2 Monitor,
## Dell Desktop,
## Logitech Keyboard} => {iMac} 0.001016777 1 0.001016777 3.904327 10
## [8] {Alienware Laptop,
## ASUS Desktop,
## Lenovo Desktop Computer} => {iMac} 0.001016777 1 0.001016777 3.904327 10
## [9] {Brother Printer,
## Dell Desktop,
## Epson Printer} => {iMac} 0.001118454 1 0.001118454 3.904327 11
## [10] {Apple Magic Keyboard,
## Brother Printer,
## ViewSonic Monitor} => {iMac} 0.001016777 1 0.001016777 3.904327 10
topRules <- rules[1:10]
plot(topRules, col = rainbow(4, start = 0.35, end = 0.8))
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
plot(topRules, method="graph", control=list("items"),col = rainbow(4, start = 0.35, end = 0.8))
## Warning: Unknown control parameters:
## Available control parameters (with default values):
## layout = stress
## circular = FALSE
## ggraphdots = NULL
## edges = <environment>
## nodes = <environment>
## nodetext = <environment>
## colors = c("#EE0000FF", "#EEEEEEFF")
## engine = ggplot2
## max = 100
## verbose = FALSE
plot(topRules, method = "grouped", col = rainbow(4, start = 0.35, end = 0.8))
Our results clearly show how the iMac is the top selling product and why. There are a lot of different packages of products bought that include the iMac, which make us believe that Electrodinex has a B2B business and also a B2C.
Now, we’ll be doing the same, but by lift, which is the ratio that exceeds the expected confidence by the confidence of the rule. The ratio of the number of respondents obtained with the model to the number obtained without the model is known as lift:
Lift (A => B) = 1 means that within a set of elements there is no correlation. Lift (A = > B) > 1 means a positive correlation is more commonly purchased between the products in the product set, i.e. in items A and B. Lift(A =>B) <1 means that it is unlikely to be purchased together for the negative correlation of the itemset, i.e. the products in the item set, A, and B.
rulesL <- apriori(tr, parameter = list(supp=0.001, conf=0.8))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[125 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [125 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [635 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rulesL <- sort(rules, by='lift', decreasing = TRUE)
summary(rulesL)
## set of 591 rules
##
## rule length distribution (lhs + rhs):sizes
## 3 4 5 6
## 30 302 239 20
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 4.000 4.000 4.421 5.000 6.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.001017 Min. :0.8000 Min. :0.001017 Min. : 3.123
## 1st Qu.:0.001017 1st Qu.:0.8333 1st Qu.:0.001220 1st Qu.: 3.304
## Median :0.001220 Median :0.8462 Median :0.001322 Median : 3.579
## Mean :0.001276 Mean :0.8654 Mean :0.001486 Mean : 4.024
## 3rd Qu.:0.001322 3rd Qu.:0.9091 3rd Qu.:0.001627 3rd Qu.: 4.416
## Max. :0.005287 Max. :1.0000 Max. :0.006507 Max. :10.069
## count
## Min. :10.00
## 1st Qu.:10.00
## Median :12.00
## Mean :12.55
## 3rd Qu.:13.00
## Max. :52.00
##
## mining info:
## data ntransactions support confidence
## tr 9835 0.001 0.8
## call
## apriori(data = tr, parameter = list(supp = 0.001, conf = 0.8))
rulesL <- rulesL[!is.redundant(rulesL)]
inspect(rulesL[1:10])
## lhs rhs support confidence coverage lift count
## [1] {Apple MacBook Pro,
## HP Black & Tri-color Ink,
## HP Laptop,
## iMac} => {Acer Aspire} 0.001016777 0.8333333 0.001220132 10.068591 10
## [2] {Dell Desktop,
## iMac,
## Lenovo Desktop Computer,
## Mackie CR Speakers} => {ViewSonic Monitor} 0.001118454 1.0000000 0.001118454 9.064516 11
## [3] {Dell Desktop,
## Lenovo Desktop Computer,
## Mackie CR Speakers} => {ViewSonic Monitor} 0.001220132 0.9230769 0.001321810 8.367246 12
## [4] {Dell Desktop,
## Etekcity Power Extension Cord Cable,
## HP Laptop,
## iMac,
## Lenovo Desktop Computer} => {ViewSonic Monitor} 0.001118454 0.9166667 0.001220132 8.309140 11
## [5] {Acer Aspire,
## Apple Earpods,
## HP Laptop,
## HP Monitor} => {ViewSonic Monitor} 0.001016777 0.9090909 0.001118454 8.240469 10
## [6] {HP Laptop,
## iMac,
## Lenovo Desktop Computer,
## Mackie CR Speakers} => {ViewSonic Monitor} 0.001321810 0.8666667 0.001525165 7.855914 13
## [7] {Acer Aspire,
## Dell Desktop,
## Epson Printer,
## HP Laptop} => {ViewSonic Monitor} 0.001220132 0.8571429 0.001423488 7.769585 12
## [8] {Acer Aspire,
## ASUS Chromebook,
## Dell Desktop,
## HP Laptop} => {ViewSonic Monitor} 0.001220132 0.8571429 0.001423488 7.769585 12
## [9] {Dell Desktop,
## HP Laptop,
## iMac,
## Mackie CR Speakers} => {ViewSonic Monitor} 0.001321810 0.8125000 0.001626843 7.364919 13
## [10] {Acer Aspire,
## HP Laptop,
## Koss Home Headphones} => {ViewSonic Monitor} 0.001220132 0.8000000 0.001525165 7.251613 12
Interesting results, when we see it using “lift” it looks like the iMac doesn’t show as much as before for the rhs.
topRulesL <- rulesL[1:10]
plot(topRulesL, col = rainbow(4, start = 0.35, end = 0.8))
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
plot(topRulesL, method="graph", control=list("items"),col = rainbow(4, start = 0.35, end = 0.8))
## Warning: Unknown control parameters:
## Available control parameters (with default values):
## layout = stress
## circular = FALSE
## ggraphdots = NULL
## edges = <environment>
## nodes = <environment>
## nodetext = <environment>
## colors = c("#EE0000FF", "#EEEEEEFF")
## engine = ggplot2
## max = 100
## verbose = FALSE
In this case, we have a similar story but with a different product, which is the ViewSonic Monitor. Still we can see that we have many products bought together with this monitor, this validates our previous thought of this being a B2B company with also B2C transactions. We’ll keep doing some research in order to get some more insight.
Let’s see our top 5 “Packages” bought with the iMac.
rules1 <- apriori(tr, parameter = list(supp=0.001, conf=0.05, minlen=2),
appearance = list(default="lhs",rhs="iMac"),
control = list(verbose=F))
rules1<-sort(rules1, decreasing=TRUE, by="confidence")
inspect(rules1[1:5])
## lhs rhs support confidence coverage lift count
## [1] {Brother Printer,
## Halter Acrylic Monitor Stand} => {iMac} 0.001118454 1 0.001118454 3.904327 11
## [2] {ASUS Monitor,
## Mackie CR Speakers,
## ViewSonic Monitor} => {iMac} 0.001016777 1 0.001016777 3.904327 10
## [3] {Apple Magic Keyboard,
## Rii LED Gaming Keyboard & Mouse Combo,
## ViewSonic Monitor} => {iMac} 0.001728521 1 0.001728521 3.904327 17
## [4] {ASUS Monitor,
## Koss Home Headphones,
## Microsoft Office Home and Student 2016} => {iMac} 0.001016777 1 0.001016777 3.904327 10
## [5] {ASUS 2 Monitor,
## Dell Desktop,
## Logitech Keyboard} => {iMac} 0.001016777 1 0.001016777 3.904327 10
It looks like most of them are related to business needs.
rules2 <- apriori(tr, parameter = list(supp=0.001, conf=0.15, minlen=2),
appearance = list(default="rhs",lhs="iMac"),
control = list(verbose=F))
rules2<-sort(rules2, decreasing=TRUE, by="confidence")
inspect(rules2)
## lhs rhs support confidence coverage
## [1] {iMac} => {HP Laptop} 0.07554652 0.2949583 0.2561261
## [2] {iMac} => {Lenovo Desktop Computer} 0.05876970 0.2294561 0.2561261
## [3] {iMac} => {CYBERPOWER Gamer Desktop} 0.05673615 0.2215165 0.2561261
## [4] {iMac} => {Dell Desktop} 0.05460092 0.2131798 0.2561261
## [5] {iMac} => {ViewSonic Monitor} 0.04941535 0.1929337 0.2561261
## [6] {iMac} => {Apple Earpods} 0.04026436 0.1572052 0.2561261
## lift count
## [1] 1.5195993 743
## [2] 1.5499321 578
## [3] 1.2043198 558
## [4] 1.5907615 537
## [5] 1.7488507 486
## [6] 0.9015239 396
We can also see that the HP laptop is the most frequent product bought together with the iMac and some other products that looks to be more of a business purchase than a single customer one.
It can be easily illustrated in the following plot:
plot(rules2, method="graph", control=list("items"),col = rainbow(4, start = 0.35, end = 0.8))
## Warning: Unknown control parameters:
## Available control parameters (with default values):
## layout = stress
## circular = FALSE
## ggraphdots = NULL
## edges = <environment>
## nodes = <environment>
## nodetext = <environment>
## colors = c("#EE0000FF", "#EEEEEEFF")
## engine = ggplot2
## max = 100
## verbose = FALSE
Now that we have this information, let’s try and group the products into categories in order to see if Blackwell will benefit by acquiring Electronidex.
We previously did an analysis of the products categorized that Blackwell is currently selling and made a sales volume prediction using a Random Forest Regression Model of the new four different product types (PC, Laptops, Netbooks and Smartphones). We’ll be importing the data we have on the products types that Blackwell is currently selling + the sales volume prediction of the new ones and make a new analysis in order to see if the acquisition of Electrodinex will benefit us. For us to do this, we’ll first categorized the products that Electrodinex are currently selling and then keep up with our analysis.
table1 <- pdf_text("C:/Users/andre/OneDrive/Andres Marquez/UBIQUM/Project 3/Part 3/ElectronidexItems2017.pdf") %>%
read_lines()
full_table['Category'] <- NA
full_table[1:10, ]['Category'] <- "Laptops"
full_table[11:19, ]['Category'] <- "Desktop"
full_table[20:29, ]['Category'] <- "Monitor"
full_table[30:39, ]['Category'] <- "Computer Mice"
full_table[40:48, ]['Category'] <- "Keyboard"
full_table[49:57, ]['Category'] <- "Mouse and Keyboard Combo"
full_table[58:67, ]['Category'] <- "Computer Headphones"
full_table[68:73, ]['Category'] <- "Active Headphones"
full_table[74:82, ]['Category'] <- "Computer Cords"
full_table[83:86, ]['Category'] <- "Accessories"
full_table[87:95, ]['Category'] <- "Speakers"
full_table[96:100, ]['Category'] <- "Printers"
full_table[101:105, ]['Category'] <- "Printer Ink"
full_table[106:110, ]['Category'] <- "Computer Stands"
full_table[111:115, ]['Category'] <- "Computer Tablets"
full_table[116:120, ]['Category'] <- "External Hardrives"
full_table[121:125, ]['Category'] <- "Smart Home Devices"
full_table <- full_table[order(full_table$Products), c(1,2)]
df_final <- data.frame(df, full_table)
Great, now we have our full table with the number of times the product was purchased + its category.
Now, we want to see which categories has the most number of products
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
n_categories <- tabyl(df_final, Category)
n_categories
## Category n percent
## Accessories 4 0.032
## Active Headphones 6 0.048
## Computer Cords 9 0.072
## Computer Headphones 10 0.080
## Computer Mice 10 0.080
## Computer Stands 5 0.040
## Computer Tablets 5 0.040
## Desktop 9 0.072
## External Hardrives 5 0.040
## Keyboard 9 0.072
## Laptops 10 0.080
## Monitor 10 0.080
## Mouse and Keyboard Combo 9 0.072
## Printer Ink 5 0.040
## Printers 5 0.040
## Smart Home Devices 5 0.040
## Speakers 9 0.072
n_categories <- as.data.frame(n_categories)
n_categories <- n_categories[order(n_categories$n, decreasing = TRUE), c(1,2,3)]
p <-ggplot(data=n_categories, aes(x = reorder(Category,-n), y = n, fill=n)) +
geom_bar(stat="identity") +
scale_fill_gradient(low="darkorange1",high="#003366", name="#")+
labs(title="Number of Products per Category",
x ="Category", y = "Count") +
geom_text(aes(label=n), vjust=1.5, hjust=0.5 , color="white", size=3.5)+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
p
The plot clearly shows the number of different products per category.
Now, let’s see the number of purchases per category made and see which groups stand out.
df_count <- aggregate(df_final$Count, list(df_final$Category), FUN=sum)
df_count <- df_count[order(df_count$x, decreasing = TRUE), c(1,2)]
sum(df_count$x)
## [1] 43104
p <-ggplot(data=df_count, aes(x = reorder(Group.1,x), y = x, fill=x)) +
geom_bar(stat="identity") +
scale_fill_gradient(low="darkorange1",high="#003366", name="Count")+
labs(title="Number of purchases per Category",
x ="Category", y = "Count") +
geom_text(aes(label=x), vjust=0.5, hjust=1.2 , color="white", size=3.5)+
theme_minimal() +
coord_flip()
p
We can see that Desktop, Laptops and Monitors are the products that were most bought.
Now, we’ll be importing the transactions of the products types that Blackwell is currently selling and join them with the new products with their respective sales volume prediction. This will give us a clear view on what segmentation Blackwell is failing and see if the acquisition of Electrodinex might help gain some ground that area.
ExistingProducts <- read.csv(("C:/Users/andre/OneDrive/Andres Marquez/UBIQUM/Project 3/Part 2/existingproducts.csv"))
NewProducts <- read.csv(("C:/Users/andre/OneDrive/Andres Marquez/UBIQUM/Project 3/Part 2/newproducts.csv"))
mean_volume <- aggregate(ExistingProducts$Volume, list(ExistingProducts$ProductType), FUN=mean)
mean_volume <- mean_volume[order(mean_volume$x, decreasing = TRUE), c(1,2)]
mean_volume
## Group.1 x
## 4 GameConsole 4360.00000
## 3 ExtendedWarranty 987.60000
## 1 Accessories 982.61538
## 11 Software 711.33333
## 2 Display 485.60000
## 10 Smartphone 452.00000
## 12 Tablet 316.00000
## 5 Laptop 172.00000
## 8 Printer 169.66667
## 6 Netbook 46.00000
## 7 PC 29.00000
## 9 PrinterSupplies 14.66667
ExistingProducts <- ExistingProducts %>% select(ProductType,Price, Volume, ProfitMargin)
NewProducts <- NewProducts %>% select(ProductType,Price, Volume, ProfitMargin)
prueba <- rbind(ExistingProducts, NewProducts)
“Prueba” is now the complete dataset, “Existing Products” and “New Products” with their sales volume prediction that Blackwell is currently selling.
Let’s calculate the mean sales volume per category and see which ones stand out.
mean_volume <- aggregate(prueba$Volume, list(prueba$ProductType), FUN=mean)
mean_volume <- mean_volume[order(mean_volume$x, decreasing = TRUE), c(1,2)]
mean_volume$x <- round(mean_volume$x, 2)
mean_volume
## Group.1 x
## 4 GameConsole 3628.34
## 12 Tablet 1596.81
## 1 Accessories 916.32
## 3 ExtendedWarranty 904.16
## 11 Software 626.08
## 2 Display 408.37
## 10 Smartphone 394.97
## 6 Netbook 261.23
## 8 Printer 167.35
## 5 Laptop 134.10
## 7 PC 129.95
## 9 PrinterSupplies 14.67
p <-ggplot(data=mean_volume, aes(x = reorder(Group.1,x), y = x, fill = x)) +
coord_flip() +
scale_fill_gradient(low="darkorange1",high="#003366", name="Amount")+
geom_bar(stat="identity") +
labs(title="Mean Sales Volume per Category",
x ="Category", y = "Amount") +
geom_text(aes(label=x), vjust=0.5, hjust=1.2 , color="white", size=3)+
theme_minimal()
p
This tells us that the product type with the highest sales volume mean is Game Console, but lets keep diving into it, it doesn’t seem to be telling the full story.
Let’s create a pie chart with the sum of the sales volume
sales_volume <- aggregate(prueba$Volume, list(prueba$ProductType), FUN=sum)
sales_volume <- sales_volume[order(sales_volume$x, decreasing = TRUE), c(1,2)]
sales_volume$x <- round(sales_volume$x, 2)
sales_volume
## Group.1 x
## 1 Accessories 25656.86
## 4 GameConsole 14513.37
## 3 ExtendedWarranty 9945.77
## 12 Tablet 7984.07
## 11 Software 4382.56
## 10 Smartphone 3159.76
## 2 Display 2450.21
## 8 Printer 2175.54
## 6 Netbook 1567.39
## 5 Laptop 804.58
## 7 PC 779.71
## 9 PrinterSupplies 58.67
sum(sales_volume$x)
## [1] 73478.49
p <-ggplot(data=sales_volume, aes(x = reorder(Group.1,x), y = x, fill = x)) +
coord_flip() +
scale_fill_gradient(low="darkorange1",high="#003366", name="Amount")+
geom_bar(stat="identity") +
labs(title="Sales Volume per Category",
x ="Category", y = "Amount") +
geom_text(aes(label=x), vjust=0.5, hjust=1.2 , color="white", size=3)+
theme_minimal()
p
fig <- plot_ly(sales_volume, labels = ~Group.1, values = ~x, type = 'pie')
fig <- fig %>% layout(title = 'Sales Volume per Category',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
fig
Crazy turnaround, it looks like Game Console is not the one with the highest sales despite having the highest mean. Accessories is the product type that has the highest sales volume of Blackwell at the moment and it seems that PC (Desktop) and Laptops represent just a little portion based on our prediction of our sales, situation that could be heavily improven by the acquisition of Electrodinex, given that the two most purchased product types are Laptops and Desktops, followed by Monitors and other categories that can help the company take a bigger proportion of the market.
For last, let’s calculate the mean profit margin per category in order to see which are the ones more profitable and find out if those belong to the top selling product types of Electronidex.
margin_profit <- aggregate(prueba$ProfitMargin, list(prueba$ProductType), FUN=mean)
margin_profit <- margin_profit[order(margin_profit$x, decreasing = TRUE), c(1,2)]
margin_profit$x <- round(margin_profit$x, 2)
margin_profit
## Group.1 x
## 3 ExtendedWarranty 0.40
## 9 PrinterSupplies 0.31
## 8 Printer 0.20
## 11 Software 0.19
## 7 PC 0.17
## 12 Tablet 0.16
## 2 Display 0.15
## 4 GameConsole 0.14
## 5 Laptop 0.13
## 10 Smartphone 0.12
## 6 Netbook 0.10
## 1 Accessories 0.05
p <-ggplot(data=margin_profit, aes(x = reorder(Group.1,x), y = x, fill=x)) +
geom_bar(stat="identity") +
scale_fill_gradient(low="darkorange1",high="#003366", name="Profit Margin")+
labs(title="Mean Profit Margin per Category",
x ="Category", y = "Profit Margin") +
geom_text(aes(label=x), vjust=0.5, hjust=1.2 , color="white", size=3.5)+
theme_minimal() +
coord_flip()
p
Clearly not the results we wanted, still we have one last thing we can try.
Given that we have the number of transactions made for each product type, we are going to calculate the sales volume of the categories that the acquisition of Electronidex would add to our company (for now, only the ones we share). For this calculation we’ll be using the mean price of our products types and multiply it by the count of purchases made to see an estimate of how much we are talking about.
mean_price <- aggregate(prueba$Price, list(prueba$ProductType), FUN=mean)
mean_price <- mean_price[order(mean_price$x, decreasing = TRUE), c(1,2)]
mean_price$x <- round(mean_price$x, 2)
mean_price
## Group.1 x
## 5 Laptop 1109.60
## 7 PC 961.16
## 2 Display 481.75
## 12 Tablet 449.00
## 6 Netbook 371.33
## 4 GameConsole 276.00
## 8 Printer 216.02
## 10 Smartphone 211.75
## 11 Software 162.36
## 3 ExtendedWarranty 154.48
## 1 Accessories 56.83
## 9 PrinterSupplies 43.52
p <-ggplot(data=mean_price, aes(x = reorder(Group.1,-x), y = x, fill=x)) +
geom_bar(stat="identity") +
scale_fill_gradient(low="darkorange1",high="#003366", name="Price")+
labs(title="Mean Price per Category",
x ="Category", y = "Price") +
geom_text(aes(label=x), vjust=-0.5, hjust=0.5 , color="black", size=3.5)+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
p
We can see that Laptops and PC are the ones with the highest price.
sv <- c("Category", "Count")
mp <- c("Category", "Price")
pm <- c("Category", "Profit_Margin")
colnames(mean_price) <- mp
colnames(df_count)<- sv
colnames(margin_profit) <- pm
Let’s change the names of the categories that mean the same in order to join them.
df_count[df_count$Category=="Laptops", "Category"] <- "Laptop"
df_count[df_count$Category=="Desktop", "Category"] <- "PC"
df_count[df_count$Category=="Monitor", "Category"] <- "Display"
df_count[df_count$Category=="Printers", "Category"] <- "Printer"
df_count[df_count$Category=="Printer Ink", "Category"] <- "PrinterSupplies"
intersect(df_count$Category, mean_price$Category)
## [1] "PC" "Laptop" "Display" "Accessories"
## [5] "Printer" "PrinterSupplies"
We’ll be using SQL to make it simpler.
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
df2 = sqldf("
SELECT mp.Category, mp.Price, sv.Count FROM mean_price mp JOIN df_count sv
ON mp.Category=sv.Category
")
df2 = sqldf("
SELECT Category, Price, Count, Price*Count as Sales
FROM df2
")
p <-ggplot(data=df2, aes(x = reorder(Category, -Sales), y = Sales, fill=Sales)) +
geom_bar(stat="identity") +
scale_fill_gradient(low="darkorange1",high="#003366", name="Sales")+
labs(title="Sales per Category",
x ="Category", y = "Sales") +
geom_text(aes(label=Sales), vjust=-0.5, hjust=0.5 , color="black", size=3.5)+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
p
The previous plot shows us the sales that could represent the categories in the chart.PC and Laptops are the ones that stand out the most.
sum(df2$Sales)
## [1] 20494015
The total of sales could be around 20.494.015 by just taking into account the categories we share, we are still missing some.
prueba = sqldf("
SELECT ProductType, Price, Volume, ProfitMargin, Price*Volume as Sales
FROM prueba
")
view(prueba)
sum(prueba$Sales)
## [1] 13757056
13.757.056 is the current number of sales that we currently have (taking into account the predictions we made about the new products)
sum(df2$Sales) - sum(prueba$Sales)
## [1] 6736959
Now we can definitely confirm that the acquisition of Electrodinex would be a great opportunity for Blackwell. Given a simple calculation of the mean price of certain products that we both share times the sales volume we can estimate sales of around 20.494.015, way bigger of what we currently manage at the moment which is 13.942.795 by our calculations taking into account the prediction we made about our new products. That’s 6.736.959 more in sales, just taking into account the products we already share together.
We could also calculate the profit we could be generating by the sale of this related items from Electrodinex.
df3 = sqldf("
SELECT mp.Category, mp.Profit_Margin, sv.Sales FROM margin_profit mp JOIN df2 sv
ON mp.Category=sv.Category
")
df3 = sqldf("
SELECT Category, Profit_Margin, Sales, Profit_Margin*Sales as Profit FROM df3
")
sum(df3$Profit)
## [1] 3097882
Our profit would be of 3.097.882, what’s left is to look at the financial situation and evaluation of the company to see if it would be a smart decision to acquire it. But just looking at the information we have in our hands, it looks like it would be a smart idea for Blackwell to make the acquisition of Electrodinex.
fig <- plot_ly(df3, labels = ~Category, values = ~Sales, type = 'pie')
fig <- fig %>% layout(title = 'Sales per Category',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
fig