I downloaded this wholesale customers dataset from the UCI Machine Learning Repository. The dataset describes clients of a wholesale distributor and records their annual spending, in monetary units, on several product categories.

My goal today is to use various clustering techniques to segment customers. Clustering is an unsupervised learning technique that groups observations by their similarity. There is no outcome to predict; the algorithm only tries to find patterns in the data.

This is the head of the original data:

##   Channel Region Fresh Milk Grocery Frozen Detergents_Paper Delicassen
## 1       2      3 12669 9656    7561    214             2674       1338
## 2       2      3  7057 9810    9568   1762             3293       1776
## 3       2      3  6353 8808    7684   2405             3516       7844
## 4       1      3 13265 1196    4221   6404              507       1788
## 5       2      3 22615 5410    7198   3915             1777       5185
## 6       2      3  9413 8259    5126    666             1795       1451

K-Means Clustering

Prepare the data for analysis. Remove rows with missing values and drop the “Channel” and “Region” columns, because they are categorical codes rather than spending amounts and are not useful for this clustering.
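The preparation step can be sketched in Python as follows. This is a minimal illustration and not the code that produced the output in this post; the function name `prepare` and the last record (with a missing value) are hypothetical, while the first three records are taken from the head of the data shown above.

```python
# Columns to discard before clustering.
DROP = {"Channel", "Region"}

records = [
    {"Channel": 2, "Region": 3, "Fresh": 12669, "Milk": 9656, "Grocery": 7561,
     "Frozen": 214, "Detergents_Paper": 2674, "Delicassen": 1338},
    {"Channel": 2, "Region": 3, "Fresh": 7057, "Milk": 9810, "Grocery": 9568,
     "Frozen": 1762, "Detergents_Paper": 3293, "Delicassen": 1776},
    {"Channel": 2, "Region": 3, "Fresh": 6353, "Milk": 8808, "Grocery": 7684,
     "Frozen": 2405, "Detergents_Paper": 3516, "Delicassen": 7844},
    # Hypothetical record with a missing value, added only to show the filter.
    {"Channel": 1, "Region": 3, "Fresh": None, "Milk": 1000, "Grocery": 2000,
     "Frozen": 300, "Detergents_Paper": 400, "Delicassen": 500},
]

def prepare(rows):
    """Drop rows containing a missing value, then remove the DROP columns."""
    complete = [r for r in rows if all(v is not None for v in r.values())]
    return [{k: v for k, v in r.items() if k not in DROP} for r in complete]

prepared = prepare(records)
```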

Standardize the variables.
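Standardizing means rescaling each column to mean 0 and standard deviation 1, so that categories with large spending do not dominate the distance calculation. A minimal Python sketch, assuming the sample standard deviation (n - 1 denominator) is used; the helper name `standardize` is hypothetical and the rows are the six records from the head of the data:

```python
from statistics import mean, stdev

# Fresh, Milk, Grocery, Frozen, Detergents_Paper, Delicassen (head of the data).
rows = [
    [12669, 9656, 7561, 214, 2674, 1338],
    [7057, 9810, 9568, 1762, 3293, 1776],
    [6353, 8808, 7684, 2405, 3516, 7844],
    [13265, 1196, 4221, 6404, 507, 1788],
    [22615, 5410, 7198, 3915, 1777, 5185],
    [9413, 8259, 5126, 666, 1795, 1451],
]

def standardize(data):
    """Rescale each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*data))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # n - 1 denominator
    return [
        [(value - m) / s for value, m, s in zip(row, means, sds)]
        for row in data
    ]

scaled = standardize(rows)
```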

Determine number of clusters.

The correct choice of k is often ambiguous, but based on the plot above, I am going to run my cluster analysis with 6 clusters. The cluster means of the standardized variables are:
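One common way to choose k, and an assumption about what such a plot shows, is to run k-means for a range of k and plot the total within-cluster sum of squares, looking for the "elbow" where it stops dropping quickly. The sketch below implements plain Lloyd's algorithm in Python on made-up toy points; all function names and the data are illustrative, not the code or data behind this post.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(members):
    return [sum(col) / len(members) for col in zip(*members)]

def kmeans(data, k, rng, iterations=100):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    centers = [list(p) for p in rng.sample(data, k)]  # k distinct starting points
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in data
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = centroid(members)
    return centers, labels

def within_ss(data, centers, labels):
    """Total squared distance from each point to its assigned center."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(data, labels))

def elbow_curve(data, max_k, restarts=20, seed=0):
    """Best within-cluster sum of squares found for each k from 1 to max_k."""
    rng = random.Random(seed)
    return {
        k: min(within_ss(data, *kmeans(data, k, rng)) for _ in range(restarts))
        for k in range(1, max_k + 1)
    }

# Made-up toy points forming three loose groups.
points = [
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
    [5.0, 5.0], [5.2, 4.9], [4.9, 5.3],
    [10.0, 0.0], [10.1, 0.2], [9.8, 0.1],
]
curve = elbow_curve(points, max_k=6)
for k, wss in curve.items():
    print(k, round(wss, 3))
```

Because k-means depends on its random starting centers, the sketch keeps the best of several restarts for each k; a single run can get stuck in a poor local optimum.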

##   Group.1      Fresh       Milk    Grocery      Frozen Detergents_Paper  Delicassen
## 1       1 -0.4087717  0.4052515  0.4997472 -0.31386827        0.4558404  0.02344931
## 2       2  1.9645810  5.1696185  1.2857533  6.89275382       -0.5542311 16.45971129
## 3       3  1.0755395  5.1033075  5.6319063 -0.08979632        5.6823687  0.41981740
## 4       4 -0.1935165 -0.4404705 -0.4920530 -0.14609724       -0.4478457 -0.21418761
## 5       5  1.7313374 -0.1172609 -0.2469785  1.25849026       -0.4298107  0.38598244
## 6       6 -0.4453660  1.5449634  1.9533468 -0.25357812        2.1391487  0.38983626

Plotting the results.

Interpretation of the results: components 1 and 2 in this plot together capture more than 66% of the variability in the multivariate data.

Outlier detection with K-Means

First, the data are partitioned into k groups by assigning them to the closest cluster centers.

##       Fresh      Milk   Grocery    Frozen Detergents_Paper Delicassen
## 1 16177.138  3123.224  4480.181  3620.595        1093.0690   1402.250
## 2 61903.375 13358.375 10448.375 21728.750        1301.2500   9270.125
## 3 33290.133  4951.200  5621.067  4209.511         955.4889   1931.000
## 4  5125.548 12509.986 19326.548  1610.630        8443.1233   1893.945
## 5 20031.286 38084.000 56126.143  2564.571       27644.5714   2548.143
## 6  4690.660  3552.749  4390.602  2266.361        1454.4188   1000.686

Then calculate the distance between each object and its cluster center, and pick those with largest distances as outliers.

##   [1] 1 6 6 1 1 6 1 6 6 4 6 1 3 1 3 6 4 6 1 6 1 6 3 4 1 1 6 1 4 3 1 6 1 3 6
##  [36] 6 3 1 4 2 1 1 4 4 6 4 4 5 1 4 6 6 3 4 3 6 4 4 1 6 6 5 6 4 6 5 6 1 6 6
##  [71] 1 1 6 1 6 1 6 4 1 6 6 4 6 1 1 5 5 3 6 3 1 1 4 1 4 6 6 6 6 6 4 4 6 2 1
## [106] 1 6 4 6 4 1 4 1 1 1 1 1 6 1 6 1 6 1 1 3 2 1 1 6 3 6 6 1 6 6 6 6 6 1 6
## [141] 1 3 3 1 1 4 6 6 6 3 1 6 1 6 6 4 4 1 6 4 6 1 1 4 6 4 6 6 6 6 4 4 6 4 6
## [176] 6 3 1 1 6 1 2 6 2 6 6 6 6 4 4 1 1 6 4 6 1 3 6 1 6 4 4 3 6 6 4 6 6 6 4
## [211] 1 5 6 6 6 4 4 1 4 6 1 6 6 6 6 1 1 6 6 6 1 6 3 6 1 6 6 1 6 3 1 1 1 1 6
## [246] 4 6 1 1 6 6 4 6 3 6 3 1 6 2 3 6 6 1 6 4 4 4 1 4 1 6 6 6 3 6 6 3 1 1 1
## [281] 6 1 3 3 2 3 6 1 1 3 6 6 6 4 1 6 1 6 6 6 1 4 6 4 4 6 4 1 6 4 6 3 4 6 1
## [316] 4 6 6 1 4 6 6 1 1 3 2 6 6 1 6 6 4 1 5 1 3 1 6 6 6 6 6 6 4 6 6 4 3 6 4
## [351] 6 4 6 4 1 6 1 4 6 6 1 6 6 6 6 6 6 6 1 6 3 1 6 1 6 6 4 3 6 6 3 1 3 6 4
## [386] 1 6 1 6 6 6 6 6 1 1 6 6 1 1 6 6 3 3 3 1 6 3 4 6 6 6 6 6 6 6 6 4 6 4 6
## [421] 4 1 3 1 1 1 4 3 6 6 4 6 1 6 1 3 3 4 6 6
## [1] 184 182 326  87  86

These are the row numbers of the five outliers. Let me make it more meaningful by showing the records themselves.

##      Fresh  Milk Grocery Frozen Detergents_Paper Delicassen
## 184  36847 43950   20170  36534              239      47943
## 182 112151 29627   18148  16745             4948       8550
## 326  32717 16784   13626  60869             1272       5609
## 87   22925 73498   32114    987            20070        903
## 86   16117 46197   92780   1026            40827       2944

Much better!
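The distance-based outlier step described above can be sketched in a few lines of Python. This is an illustration on made-up toy values, not the code behind the output above; the function name `top_outliers` is hypothetical, and the indices it returns are 0-based, unlike the 1-based row numbers in the R output.

```python
import math

def top_outliers(points, centers, labels, n=5):
    """Return indices of the n points farthest from their own cluster center."""
    distances = [
        math.dist(point, centers[label])  # Euclidean distance
        for point, label in zip(points, labels)
    ]
    ranked = sorted(range(len(points)), key=lambda i: distances[i], reverse=True)
    return ranked[:n]

# Made-up toy values: two clusters, with one point far from its center.
toy_points = [[1.0, 1.0], [1.2, 0.9], [9.0, 9.0], [5.0, 5.1], [5.2, 4.9], [5.1, 5.0]]
toy_centers = [[1.1, 0.95], [5.1, 5.0]]
toy_labels = [0, 0, 1, 1, 1, 1]

outliers = top_outliers(toy_points, toy_centers, toy_labels, n=1)
print(outliers)  # [2]: the point (9.0, 9.0) lies far from its center
```

Note that a point is judged only against the center of the cluster it was assigned to, so the result depends on the clustering itself.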

Hierarchical Clustering

First, draw a sample of 40 records from the customer data so that the clustering plot will not be overcrowded. As before, the variables Region and Channel are removed from the data. After that, I apply hierarchical clustering to the data.

There is a wide range of hierarchical clustering methods. I have heard that Ward’s method is a good approach, so I will try it out.

Let me try to interpret: at the bottom, I start with 40 data points, each assigned to its own cluster. The two closest clusters are then merged, repeatedly, until just one cluster remains at the top. The height in the dendrogram at which two clusters are merged represents the distance between those two clusters in the data space. The number of clusters that best depicts the different groups can be chosen by inspecting the dendrogram.
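The merging process can be sketched in Python as below. This is a naive illustration of agglomerative clustering with Ward's criterion on made-up toy points, not the code behind the dendrogram; the function name `ward_merges` is hypothetical. Here the "distance" between two clusters is the increase in total within-cluster sum of squares caused by merging them; a dendrogram would plot some monotone function of this cost as the merge height, and the exact scaling differs between implementations.

```python
def ward_merges(points):
    """Agglomerative clustering with Ward's criterion.

    Returns one (members_a, members_b, cost) tuple per merge, where cost is
    the increase in total within-cluster sum of squares.
    """
    # cluster id -> (member indices, centroid)
    clusters = {i: ([i], list(p)) for i, p in enumerate(points)}
    next_id = len(points)
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * gap
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        (ma, ca), (mb, cb) = clusters.pop(a), clusters.pop(b)
        na, nb = len(ma), len(mb)
        merged_centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[next_id] = (ma + mb, merged_centroid)
        merges.append((ma, mb, cost))
        next_id += 1
    return merges

# Made-up toy points: two pairs that are far apart from each other.
toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 2.0]]
merges = ward_merges(toy)
for members_a, members_b, cost in merges:
    print(members_a, members_b, cost)
```

The two tight pairs merge first at low cost, and the final merge that joins them is far more expensive. That jump in cost is the kind of gap one looks for in a dendrogram when deciding where to cut.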

The End

I reviewed K-Means clustering and hierarchical clustering. As we have seen, clusters help us understand the customer portfolio better. We can then build a targeted strategy using the profile of each cluster.