This short report addresses the problem of customer segmentation. The approach presented here divides customers into groups based on their purchasing habits. The method used to examine the customer groups is Principal Component Analysis (PCA). PCA is particularly useful for processing large datasets because it is a dimensionality reduction algorithm: it decomposes the data into uncorrelated principal components (PCs), ordered by how much of the total variance each one explains.
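To make the dimensionality-reduction idea concrete, here is a minimal base-R sketch using `stats::prcomp` on a small synthetic matrix. The toy variables `x1`, `x2` and `x3` are made up for illustration and are unrelated to the bike-shop data. Because `x1` and `x2` are strongly correlated, most of their shared variation ends up in the first component, and with scaled inputs the component variances equal the eigenvalues of the correlation matrix.

```r
# Minimal PCA sketch (stats::prcomp) on synthetic data, not the bike-shop dataset
set.seed(42)
n <- 50
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)  # strongly correlated with x1
x3 <- rnorm(n)                        # independent of x1 and x2
X <- cbind(x1, x2, x3)

# Centre and scale each column, then decompose into principal components
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of the total variance captured by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scores: each observation expressed in PC coordinates (n x 3)
scores <- pca$x

# Keep only the first two components as the reduced representation
reduced <- scores[, 1:2]
print(dim(reduced))

# Sanity check: PC variances equal the eigenvalues of the correlation matrix
print(all.equal(pca$sdev^2, eigen(cor(X))$values))
```

In a segmentation setting, the rows of `X` would be the objects to be grouped and `reduced` their low-dimensional coordinates; how many components to keep is a judgment call, typically guided by `var_explained`.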
The data used in this study is a collection of fictional data on bike shops (customers), bikes (products), and sales orders for a bike manufacturer (https://github.com/mdancho84/orderSimulatoR/tree/master/data).
The hypothesis is that bike shops choose which bikes to buy based on features such as unit price (high-end vs. affordable), primary category (Mountain vs. Road), and frame material (aluminum vs. carbon).
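The three data sources (customers, products, orders) were read and merged. The table below shows the first six rows of the combined dataset: each row is a bike model described by its attributes, and each of the remaining 30 columns holds one numeric value per bike shop.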
# Print the first 6 rows of the combined dataset
head(customerTrends)
## # A tibble: 6 x 35
## # Groups: model, category1, category2, frame [6]
## model category1 category2 frame price
## <chr> <chr> <chr> <chr> <fctr>
## 1 Bad Habit 1 Mountain Trail Aluminum [ 415, 3500)
## 2 Bad Habit 2 Mountain Trail Aluminum [ 415, 3500)
## 3 Beast of the East 1 Mountain Trail Aluminum [ 415, 3500)
## 4 Beast of the East 2 Mountain Trail Aluminum [ 415, 3500)
## 5 Beast of the East 3 Mountain Trail Aluminum [ 415, 3500)
## 6 CAAD Disc Ultegra Road Elite Road Aluminum [ 415, 3500)
## # ... with 30 more variables: `Albuquerque Cycles` <dbl>, `Ann Arbor
## # Speed` <dbl>, `Austin Cruisers` <dbl>, `Cincinnati Speed` <dbl>,
## # `Columbus Race Equipment` <dbl>, `Dallas Cycles` <dbl>, `Denver Bike
## # Shop` <dbl>, `Detroit Cycles` <dbl>, `Indianapolis Velocipedes` <dbl>,
## # `Ithaca Mountain Climbers` <dbl>, `Kansas City 29ers` <dbl>, `Las
## # Vegas Cycles` <dbl>, `Los Angeles Cycles` <dbl>, `Louisville Race
## # Equipment` <dbl>, `Miami Race Equipment` <dbl>, `Minneapolis Bike
## # Shop` <dbl>, `Nashville Cruisers` <dbl>, `New Orleans
## # Velocipedes` <dbl>, `New York Cycles` <dbl>, `Oklahoma City Race
## # Equipment` <dbl>, `Philadelphia Bike Shop` <dbl>, `Phoenix
## # Bi-peds` <dbl>, `Pittsburgh Mountain Machines` <dbl>, `Portland
## # Bi-peds` <dbl>, `Providence Bi-peds` <dbl>, `San Antonio Bike
## # Shop` <dbl>, `San Francisco Cruisers` <dbl>, `Seattle Race
## # Equipment` <dbl>, `Tampa 29ers` <dbl>, `Wichita Speed` <dbl>
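The report does not show how `customerTrends` was built. Judging from the printed output, each row is a bike model and each numeric column belongs to one bike shop. The base-R sketch below shows one plausible way to produce such a table with `merge()` and `xtabs()`. It assumes the per-shop values are each shop's share of purchased quantity per model; the tables, the column names (`customer.id`, `product.id`, `bikeshop.name`, `quantity`) and this normalization are all hypothetical, not the orderSimulatoR schema.

```r
# Hypothetical miniature versions of the three sources (names are assumptions)
customers <- data.frame(
  customer.id = 1:3,
  bikeshop.name = c("Shop A", "Shop B", "Shop C")
)
products <- data.frame(
  product.id = 1:3,
  model = c("Model X", "Model Y", "Model Z"),
  category1 = c("Mountain", "Road", "Road")
)
orders <- data.frame(
  customer.id = c(1, 1, 2, 3, 3, 3),
  product.id  = c(1, 2, 2, 1, 3, 3),
  quantity    = c(2, 1, 4, 1, 3, 2)
)

# Join orders to customer and product attributes on the id keys
merged <- merge(orders, customers, by = "customer.id")
merged <- merge(merged, products, by = "product.id")

# Pivot to a model x shop table of total quantities (xtabs sums repeated rows)
qty <- xtabs(quantity ~ model + bikeshop.name, data = merged)

# Hypothetical normalization: each shop's column sums to 1
trends <- prop.table(qty, margin = 2)
print(round(trends, 2))
```

Normalizing by column makes shops of very different sizes comparable, so that grouping reflects what they buy rather than how much.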
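# Dendrogram of the community structure stored in `ceb`
dendPlot(ceb, mode = "hclust")
# Plot the graph `simIgraph` with vertices coloured by community membership in `ceb`
plot(x = ceb, y = simIgraph)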
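Here `ceb` is an igraph `communities` object whose construction is not shown (its name hints at edge-betweenness community detection, but that is an assumption). To illustrate how a dendrogram turns shop similarities into groups, the sketch below swaps in a different, base-R technique: average-linkage hierarchical clustering (`stats::hclust`) on cosine distance between shops' purchase profiles. The random `trends` matrix and the choice of two groups are hypothetical; this is not the method used above.

```r
# Hypothetical model x shop matrix of purchase shares (random values)
set.seed(1)
trends <- matrix(runif(5 * 6), nrow = 5,
                 dimnames = list(paste0("model_", 1:5), paste0("shop_", 1:6)))
trends <- sweep(trends, 2, colSums(trends), "/")  # each column sums to 1

# Cosine similarity between shops: scale columns to unit length,
# then take all pairwise dot products
unit <- sweep(trends, 2, sqrt(colSums(trends^2)), "/")
sim <- crossprod(unit)  # 6 x 6 shop-by-shop similarity matrix

# Turn similarity into a distance and build an average-linkage tree
d <- as.dist(1 - sim)
hc <- hclust(d, method = "average")

# Cut the tree into k groups (k = 2 is an arbitrary illustrative choice)
groups <- cutree(hc, k = 2)
print(groups)

# Draw the dendrogram only in an interactive session
if (interactive()) plot(hc, main = "Shop dendrogram (sketch)")
```

The height at which two branches merge reflects how dissimilar the shops' purchase profiles are; cutting the tree at a chosen height or number of groups yields the segments.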
# Create a force-directed network plot (networkD3);
# Source/Target must hold zero-based node indices for the JavaScript side
forceNetwork(Links = simIgraph_d3$links, Nodes = simIgraph_d3$nodes,
             Source = 'source', Target = 'target',
             NodeID = 'name', Group = 'group',
             fontSize = 16, fontFamily = 'Arial', linkDistance = 100,
             zoom = TRUE)
## Warning: It looks like Source/Target is not zero-indexed. This is required
## in JavaScript and so your plot may not render.
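The warning concerns indexing: R counts from 1, while D3 expects link endpoints to be 0-based row positions in the node table. Our understanding is that networkD3 raises this warning when the numeric Source/Target columns never contain 0, which can also happen when the first node simply has no links. The base-R sketch below builds zero-indexed `links` and `nodes` tables from a similarity matrix; the matrix values, the 0.5 threshold and the group labels are hypothetical.

```r
# Hypothetical symmetric shop-similarity matrix (values made up for illustration)
sim <- matrix(c(1.0, 0.8, 0.1, 0.0,
                0.8, 1.0, 0.2, 0.7,
                0.1, 0.2, 1.0, 0.0,
                0.0, 0.7, 0.0, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("shop_", 1:4), paste0("shop_", 1:4)))

# Keep each pair once (upper triangle) and only links above a threshold
threshold <- 0.5
idx <- which(upper.tri(sim) & sim > threshold, arr.ind = TRUE)

# R indices are 1-based; D3 expects 0-based node positions, so subtract 1
links <- data.frame(source = unname(idx[, "row"]) - 1,
                    target = unname(idx[, "col"]) - 1,
                    value  = sim[idx])

# Node table: row i of `nodes` corresponds to index i - 1 in `links`
nodes <- data.frame(name = rownames(sim),
                    group = c(1, 1, 2, 1))  # hypothetical group labels

print(links)

# Approximation of the condition behind the warning: is 0 present at all?
print(0 %in% c(links$source, links$target))
```

With networkD3 installed, these tables could be passed as `forceNetwork(Links = links, Nodes = nodes, Source = "source", Target = "target", NodeID = "name", Group = "group")`.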