I downloaded this wholesale customer dataset from the UCI Machine Learning Repository. The dataset describes clients of a wholesale distributor and records their annual spending, in monetary units, on several product categories.
My goal today is to use several clustering techniques to segment these customers. Clustering is an unsupervised learning technique that groups observations by similarity. There is no outcome to predict; the algorithm simply looks for patterns in the data.
Here is the head of the original data:
##   Channel Region Fresh Milk Grocery Frozen Detergents_Paper Delicassen
## 1       2      3 12669 9656    7561    214             2674       1338
## 2       2      3  7057 9810    9568   1762             3293       1776
## 3       2      3  6353 8808    7684   2405             3516       7844
## 4       1      3 13265 1196    4221   6404              507       1788
## 5       2      3 22615 5410    7198   3915             1777       5185
## 6       2      3  9413 8259    5126    666             1795       1451
Prepare the data for analysis. Remove any rows with missing values, and drop the “Channel” and “Region” columns because they are not useful for clustering.
Standardize the variables.
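To make the standardization step concrete, here is a minimal Python sketch (not the R code behind this post). It z-scores each column by subtracting the column mean and dividing by the sample standard deviation, which is also what R's `scale()` does by default. It uses only the six rows printed above, and the function name `standardize` is my own.

```python
from statistics import mean, stdev

# First six records shown above (Channel and Region already dropped).
columns = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicassen"]
rows = [
    [12669, 9656, 7561, 214, 2674, 1338],
    [7057, 9810, 9568, 1762, 3293, 1776],
    [6353, 8808, 7684, 2405, 3516, 7844],
    [13265, 1196, 4221, 6404, 507, 1788],
    [22615, 5410, 7198, 3915, 1777, 5185],
    [9413, 8259, 5126, 666, 1795, 1451],
]

def standardize(data):
    """Return z-scores per column: (x - column mean) / sample standard deviation."""
    cols = list(zip(*data))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]  # sample sd, i.e. n - 1 in the denominator
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in data]

scaled = standardize(rows)
for name, col in zip(columns, zip(*scaled)):
    # After scaling, every column has mean 0 and standard deviation 1.
    print(f"{name:17s} mean={mean(col):+.3f} sd={stdev(col):.3f}")
```

Without this step, a high-spending category such as Fresh would dominate the Euclidean distances that k-means relies on.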
Determine the number of clusters.
The correct choice of k is often ambiguous, but based on the plot above, I am going to try my cluster analysis with 6 clusters.
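The plot itself is not reproduced here. Assuming it is the usual "elbow" plot of total within-cluster sum of squares against k, the idea can be sketched in Python as follows. The twelve points are made up for illustration, and the helper names and the deterministic farthest-point start are my own choices, not what R's `kmeans` does.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(points, k):
    """Deterministic farthest-point start (an assumption made for this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns the final cluster centers."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its group.
        updated = [
            tuple(sum(v) / len(g) for v in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return centers

def total_wss(points, centers):
    """Total within-cluster sum of squares for the given centers."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Made-up toy data: three tight groups of four points each.
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 1), (8, 2), (9, 1), (9, 2),
          (5, 8), (5, 9), (6, 8), (6, 9)]

wss = {k: total_wss(points, kmeans(points, k)) for k in range(1, 6)}
for k, value in wss.items():
    print(k, round(value, 2))
```

On these toy points the total falls from about 235 at k = 1 to 104 at k = 2 and 6 at k = 3, and there is little left to gain after that. That bend is the "elbow". With real data the bend is usually much less sharp, which is why the choice of k is a judgment call.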
##   Group.1      Fresh       Milk    Grocery      Frozen Detergents_Paper  Delicassen
## 1       1 -0.4087717  0.4052515  0.4997472 -0.31386827        0.4558404  0.02344931
## 2       2  1.9645810  5.1696185  1.2857533  6.89275382       -0.5542311 16.45971129
## 3       3  1.0755395  5.1033075  5.6319063 -0.08979632        5.6823687  0.41981740
## 4       4 -0.1935165 -0.4404705 -0.4920530 -0.14609724       -0.4478457 -0.21418761
## 5       5  1.7313374 -0.1172609 -0.2469785  1.25849026       -0.4298107  0.38598244
## 6       6 -0.4453660  1.5449634  1.9533468 -0.25357812        2.1391487  0.38983626
Plotting the results.
Interpretation of the results: the first two components capture more than 66% of the variability in the multivariate data, so this two-dimensional plot gives a reasonable, though incomplete, picture of the clusters.
K-means can also be used to detect outliers. First, the data are partitioned into k groups by assigning each record to its closest cluster center.
##       Fresh      Milk   Grocery    Frozen Detergents_Paper Delicassen
## 1 16177.138  3123.224  4480.181  3620.595        1093.0690   1402.250
## 2 61903.375 13358.375 10448.375 21728.750        1301.2500   9270.125
## 3 33290.133  4951.200  5621.067  4209.511         955.4889   1931.000
## 4  5125.548 12509.986 19326.548  1610.630        8443.1233   1893.945
## 5 20031.286 38084.000 56126.143  2564.571       27644.5714   2548.143
## 6  4690.660  3552.749  4390.602  2266.361        1454.4188   1000.686
Then I calculate the distance between each object and its cluster center and pick the objects with the largest distances as outliers.
## [1] 1 6 6 1 1 6 1 6 6 4 6 1 3 1 3 6 4 6 1 6 1 6 3 4 1 1 6 1 4 3 1 6 1 3 6
## [36] 6 3 1 4 2 1 1 4 4 6 4 4 5 1 4 6 6 3 4 3 6 4 4 1 6 6 5 6 4 6 5 6 1 6 6
## [71] 1 1 6 1 6 1 6 4 1 6 6 4 6 1 1 5 5 3 6 3 1 1 4 1 4 6 6 6 6 6 4 4 6 2 1
## [106] 1 6 4 6 4 1 4 1 1 1 1 1 6 1 6 1 6 1 1 3 2 1 1 6 3 6 6 1 6 6 6 6 6 1 6
## [141] 1 3 3 1 1 4 6 6 6 3 1 6 1 6 6 4 4 1 6 4 6 1 1 4 6 4 6 6 6 6 4 4 6 4 6
## [176] 6 3 1 1 6 1 2 6 2 6 6 6 6 4 4 1 1 6 4 6 1 3 6 1 6 4 4 3 6 6 4 6 6 6 4
## [211] 1 5 6 6 6 4 4 1 4 6 1 6 6 6 6 1 1 6 6 6 1 6 3 6 1 6 6 1 6 3 1 1 1 1 6
## [246] 4 6 1 1 6 6 4 6 3 6 3 1 6 2 3 6 6 1 6 4 4 4 1 4 1 6 6 6 3 6 6 3 1 1 1
## [281] 6 1 3 3 2 3 6 1 1 3 6 6 6 4 1 6 1 6 6 6 1 4 6 4 4 6 4 1 6 4 6 3 4 6 1
## [316] 4 6 6 1 4 6 6 1 1 3 2 6 6 1 6 6 4 1 5 1 3 1 6 6 6 6 6 6 4 6 6 4 3 6 4
## [351] 6 4 6 4 1 6 1 4 6 6 1 6 6 6 6 6 6 6 1 6 3 1 6 1 6 6 4 3 6 6 3 1 3 6 4
## [386] 1 6 1 6 6 6 6 6 1 1 6 6 1 1 6 6 3 3 3 1 6 3 4 6 6 6 6 6 6 6 6 4 6 4 6
## [421] 4 1 3 1 1 1 4 3 6 6 4 6 1 6 1 3 3 4 6 6
## [1] 184 182 326 87 86
These are the row numbers of the outliers. Let me make that more meaningful by printing the records themselves.
##      Fresh  Milk Grocery Frozen Detergents_Paper Delicassen
## 184  36847 43950   20170  36534              239      47943
## 182 112151 29627   18148  16745             4948       8550
## 326  32717 16784   13626  60869             1272       5609
## 87   22925 73498   32114    987            20070        903
## 86   16117 46197   92780   1026            40827       2944
Much better!
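The two steps described above (assign each record to its nearest center, then rank records by distance to that center) can be sketched in a few lines of Python. This is an illustration rather than the R code I ran. It uses the rounded cluster centers printed earlier and only seven records: rows 1 to 6 from the head of the data, plus row 184 from the outlier list. The helper name `nearest_center` is my own.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Cluster centers as printed above (original units), keyed by cluster number.
centers = {
    1: (16177.138, 3123.224, 4480.181, 3620.595, 1093.0690, 1402.250),
    2: (61903.375, 13358.375, 10448.375, 21728.750, 1301.2500, 9270.125),
    3: (33290.133, 4951.200, 5621.067, 4209.511, 955.4889, 1931.000),
    4: (5125.548, 12509.986, 19326.548, 1610.630, 8443.1233, 1893.945),
    5: (20031.286, 38084.000, 56126.143, 2564.571, 27644.5714, 2548.143),
    6: (4690.660, 3552.749, 4390.602, 2266.361, 1454.4188, 1000.686),
}

# A handful of records from the tables above, keyed by row number.
records = {
    1: (12669, 9656, 7561, 214, 2674, 1338),
    2: (7057, 9810, 9568, 1762, 3293, 1776),
    3: (6353, 8808, 7684, 2405, 3516, 7844),
    4: (13265, 1196, 4221, 6404, 507, 1788),
    5: (22615, 5410, 7198, 3915, 1777, 5185),
    6: (9413, 8259, 5126, 666, 1795, 1451),
    184: (36847, 43950, 20170, 36534, 239, 47943),
}

def nearest_center(record):
    """Return (cluster number, Euclidean distance) for the closest center."""
    cluster = min(centers, key=lambda c: dist(record, centers[c]))
    return cluster, dist(record, centers[cluster])

# Step 1: assign every record to its nearest center.
assigned = {row: nearest_center(rec) for row, rec in records.items()}

# Step 2: rank rows by distance to their own center, largest first.
ranked = sorted(assigned, key=lambda row: assigned[row][1], reverse=True)
for row in ranked:
    cluster, d = assigned[row]
    print(f"row {row:>3}: cluster {cluster}, distance {d:,.0f}")
```

Rows 1 to 6 land in clusters 1, 6, 6, 1, 1, 6, which agrees with the start of the assignment vector above. Row 184 sits far from its center (cluster 2) and ranks first, consistent with its place in the outlier list.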
Next, hierarchical clustering. I first draw a sample of 40 records from the customer data so that the clustering plot is not overcrowded. As before, the variables Region and Channel are removed from the data. I then apply hierarchical clustering to the sample.
There is a wide range of hierarchical clustering methods. I have heard that Ward’s method is a good approach, so I try it here.
Let me try to interpret the dendrogram. At the bottom, I start with 40 data points, each in its own cluster. The two closest clusters are then merged, and this repeats until a single cluster remains at the top. The height at which two clusters are merged represents the distance between them in the data space. The number of clusters that best depicts the different groups can be chosen by inspecting the dendrogram.
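To show what "merge the two closest clusters" means under Ward's method, here is a small Python sketch (again, not the R code used here). Ward's criterion treats the "closest" pair as the one whose merge increases the within-cluster sum of squares the least. The six points are made up, the function names are my own, and the merge costs printed are the raw increases. The heights a library draws on a dendrogram may be scaled differently.

```python
from itertools import combinations

def centroid(cluster):
    """Mean point of a cluster (a list of equal-length tuples)."""
    return tuple(sum(v) / len(cluster) for v in zip(*cluster))

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * gap

def ward_merges(points):
    """Repeatedly merge the cheapest pair; yield (cost, merged cluster) per step."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        cost = ward_cost(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
        yield cost, merged

# Made-up toy data: two small groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

merges = list(ward_merges(points))
for step, (cost, merged) in enumerate(merges, start=1):
    print(f"step {step}: cost {cost:.2f}, cluster size {len(merged)}")
```

The early merges are cheap because they join neighbors within a group. The final merge, which joins the two groups, is far more expensive. A large jump like that is the dendrogram's hint about where to cut.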
I have reviewed k-means clustering and hierarchical clustering. As we have seen, clusters help us understand the customer portfolio better. We can then build targeted strategies using the profile of each cluster.