Customer Segmentation with K-Means

I would like to share what I learned at DQLab about Customer Segmentation. Customer Segmentation is the process of clustering customers based on their characteristics. This process is important because it helps target advertising and makes the marketing budget more efficient.

I am using R with a dataset from Kaggle, which you can access here. Customer segmentation takes three steps: preparing the data, determining the number of clusters, and clustering and analyzing the result.

Preparing the Data

The first step is loading the data with the read.csv function. The dataset is stored in the Mall_Customers variable.

Mall_Customers<-read.csv("C:/Users/Home/Downloads/Mall_Customers.csv")
str(Mall_Customers)

## 'data.frame':    200 obs. of  5 variables:
##  $ CustomerID            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Gender                : chr  "Male" "Male" "Female" "Female" ...
##  $ Age                   : int  19 21 20 23 31 22 35 23 64 30 ...
##  $ Annual.Income..k..    : int  15 15 16 16 17 17 18 18 19 19 ...
##  $ Spending.Score..1.100.: int  39 81 6 77 40 76 6 94 3 72 ...

The dataset consists of 5 columns:
1. CustomerID: Unique ID for each customer
2. Gender: Customer’s gender, male or female
3. Age: Customer’s age
4. Annual.Income..k..: Customer’s annual income, in thousands of dollars (k$)
5. Spending.Score..1.100.: Customer’s spending score, from 1 to 100

The kmeans function takes two main arguments:
1. x: The data, which must be entirely numerical
2. centers: The desired number of clusters, which will be determined in the second step
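
To make the two arguments concrete, here is a minimal sketch on a made-up data frame (the `toy` values are invented for illustration and are not part of the mall dataset). It only shows how x and centers are passed and what comes back.

```r
# Toy data: two clearly separated groups of points (made-up values).
toy <- data.frame(
  x = c(1, 2, 1, 2, 10, 11, 10, 11),
  y = c(1, 1, 2, 2, 10, 10, 11, 11)
)

set.seed(100)  # k-means starts from random centers, so fix the seed
fit <- kmeans(x = toy, centers = 2, nstart = 25)

fit$cluster  # cluster label assigned to each row
fit$centers  # mean of x and y within each cluster
fit$size     # number of rows in each cluster
```

The nstart argument repeats the random start several times and keeps the best run, which is why it also appears in the calls later in this article.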

Since Gender is the only non-numerical column in this data, we have to convert it to a numerical type. The converted values are stored in customer_matrix and then appended to the original data as a new column, Gender.1. The customer_field variable holds the names of the columns that will be used for clustering. The data will look like this:

customer_matrix <- data.matrix(Mall_Customers[c("Gender")])
Mall_Customers <- data.frame(Mall_Customers, customer_matrix)
customer_field<-c("Gender.1", "Age", "Annual.Income..k..", "Spending.Score..1.100.")
Mall_Customers

##     CustomerID Gender Age Annual.Income..k.. Spending.Score..1.100. Gender.1
## 1            1   Male  19                 15                     39        2
## 2            2   Male  21                 15                     81        2
## 3            3 Female  20                 16                      6        1
## 4            4 Female  23                 16                     77        1
## 5            5 Female  31                 17                     40        1
## 6            6 Female  22                 17                     76        1
## 7            7 Female  35                 18                      6        1
## 8            8 Female  23                 18                     94        1
## 9            9   Male  64                 19                      3        2
## 10          10 Female  30                 19                     72        1
## 11          11   Male  67                 19                     14        2
## 12          12 Female  35                 19                     99        1
## 13          13 Female  58                 20                     15        1
## 14          14 Female  24                 20                     77        1
## 15          15   Male  37                 20                     13        2
## 16          16   Male  22                 20                     79        2
## 17          17 Female  35                 21                     35        1
## 18          18   Male  20                 21                     66        2
## 19          19   Male  52                 23                     29        2
## 20          20 Female  35                 23                     98        1
## 21          21   Male  35                 24                     35        2
## 22          22   Male  25                 24                     73        2
## 23          23 Female  46                 25                      5        1
## 24          24   Male  31                 25                     73        2
## 25          25 Female  54                 28                     14        1
## 26          26   Male  29                 28                     82        2
## 27          27 Female  45                 28                     32        1
## 28          28   Male  35                 28                     61        2
## 29          29 Female  40                 29                     31        1
## 30          30 Female  23                 29                     87        1
## 31          31   Male  60                 30                      4        2
## 32          32 Female  21                 30                     73        1
## 33          33   Male  53                 33                      4        2
## 34          34   Male  18                 33                     92        2
## 35          35 Female  49                 33                     14        1
## 36          36 Female  21                 33                     81        1
## 37          37 Female  42                 34                     17        1
## 38          38 Female  30                 34                     73        1
## 39          39 Female  36                 37                     26        1
## 40          40 Female  20                 37                     75        1
## 41          41 Female  65                 38                     35        1
## 42          42   Male  24                 38                     92        2
## 43          43   Male  48                 39                     36        2
## 44          44 Female  31                 39                     61        1
## 45          45 Female  49                 39                     28        1
## 46          46 Female  24                 39                     65        1
## 47          47 Female  50                 40                     55        1
## 48          48 Female  27                 40                     47        1
## 49          49 Female  29                 40                     42        1
## 50          50 Female  31                 40                     42        1
## 51          51 Female  49                 42                     52        1
## 52          52   Male  33                 42                     60        2
## 53          53 Female  31                 43                     54        1
## 54          54   Male  59                 43                     60        2
## 55          55 Female  50                 43                     45        1
## 56          56   Male  47                 43                     41        2
## 57          57 Female  51                 44                     50        1
## 58          58   Male  69                 44                     46        2
## 59          59 Female  27                 46                     51        1
## 60          60   Male  53                 46                     46        2
## 61          61   Male  70                 46                     56        2
## 62          62   Male  19                 46                     55        2
## 63          63 Female  67                 47                     52        1
## 64          64 Female  54                 47                     59        1
## 65          65   Male  63                 48                     51        2
## 66          66   Male  18                 48                     59        2
## 67          67 Female  43                 48                     50        1
## 68          68 Female  68                 48                     48        1
## 69          69   Male  19                 48                     59        2
## 70          70 Female  32                 48                     47        1
## 71          71   Male  70                 49                     55        2
## 72          72 Female  47                 49                     42        1
## 73          73 Female  60                 50                     49        1
## 74          74 Female  60                 50                     56        1
## 75          75   Male  59                 54                     47        2
## 76          76   Male  26                 54                     54        2
## 77          77 Female  45                 54                     53        1
## 78          78   Male  40                 54                     48        2
## 79          79 Female  23                 54                     52        1
## 80          80 Female  49                 54                     42        1
## 81          81   Male  57                 54                     51        2
## 82          82   Male  38                 54                     55        2
## 83          83   Male  67                 54                     41        2
## 84          84 Female  46                 54                     44        1
## 85          85 Female  21                 54                     57        1
## 86          86   Male  48                 54                     46        2
## 87          87 Female  55                 57                     58        1
## 88          88 Female  22                 57                     55        1
## 89          89 Female  34                 58                     60        1
## 90          90 Female  50                 58                     46        1
## 91          91 Female  68                 59                     55        1
## 92          92   Male  18                 59                     41        2
## 93          93   Male  48                 60                     49        2
## 94          94 Female  40                 60                     40        1
## 95          95 Female  32                 60                     42        1
## 96          96   Male  24                 60                     52        2
## 97          97 Female  47                 60                     47        1
## 98          98 Female  27                 60                     50        1
## 99          99   Male  48                 61                     42        2
## 100        100   Male  20                 61                     49        2
## 101        101 Female  23                 62                     41        1
## 102        102 Female  49                 62                     48        1
## 103        103   Male  67                 62                     59        2
## 104        104   Male  26                 62                     55        2
## 105        105   Male  49                 62                     56        2
## 106        106 Female  21                 62                     42        1
## 107        107 Female  66                 63                     50        1
## 108        108   Male  54                 63                     46        2
## 109        109   Male  68                 63                     43        2
## 110        110   Male  66                 63                     48        2
## 111        111   Male  65                 63                     52        2
## 112        112 Female  19                 63                     54        1
## 113        113 Female  38                 64                     42        1
## 114        114   Male  19                 64                     46        2
## 115        115 Female  18                 65                     48        1
## 116        116 Female  19                 65                     50        1
## 117        117 Female  63                 65                     43        1
## 118        118 Female  49                 65                     59        1
## 119        119 Female  51                 67                     43        1
## 120        120 Female  50                 67                     57        1
## 121        121   Male  27                 67                     56        2
## 122        122 Female  38                 67                     40        1
## 123        123 Female  40                 69                     58        1
## 124        124   Male  39                 69                     91        2
## 125        125 Female  23                 70                     29        1
## 126        126 Female  31                 70                     77        1
## 127        127   Male  43                 71                     35        2
## 128        128   Male  40                 71                     95        2
## 129        129   Male  59                 71                     11        2
## 130        130   Male  38                 71                     75        2
## 131        131   Male  47                 71                      9        2
## 132        132   Male  39                 71                     75        2
## 133        133 Female  25                 72                     34        1
## 134        134 Female  31                 72                     71        1
## 135        135   Male  20                 73                      5        2
## 136        136 Female  29                 73                     88        1
## 137        137 Female  44                 73                      7        1
## 138        138   Male  32                 73                     73        2
## 139        139   Male  19                 74                     10        2
## 140        140 Female  35                 74                     72        1
## 141        141 Female  57                 75                      5        1
## 142        142   Male  32                 75                     93        2
## 143        143 Female  28                 76                     40        1
## 144        144 Female  32                 76                     87        1
## 145        145   Male  25                 77                     12        2
## 146        146   Male  28                 77                     97        2
## 147        147   Male  48                 77                     36        2
## 148        148 Female  32                 77                     74        1
## 149        149 Female  34                 78                     22        1
## 150        150   Male  34                 78                     90        2
## 151        151   Male  43                 78                     17        2
## 152        152   Male  39                 78                     88        2
## 153        153 Female  44                 78                     20        1
## 154        154 Female  38                 78                     76        1
## 155        155 Female  47                 78                     16        1
## 156        156 Female  27                 78                     89        1
## 157        157   Male  37                 78                      1        2
## 158        158 Female  30                 78                     78        1
## 159        159   Male  34                 78                      1        2
## 160        160 Female  30                 78                     73        1
## 161        161 Female  56                 79                     35        1
## 162        162 Female  29                 79                     83        1
## 163        163   Male  19                 81                      5        2
## 164        164 Female  31                 81                     93        1
## 165        165   Male  50                 85                     26        2
## 166        166 Female  36                 85                     75        1
## 167        167   Male  42                 86                     20        2
## 168        168 Female  33                 86                     95        1
## 169        169 Female  36                 87                     27        1
## 170        170   Male  32                 87                     63        2
## 171        171   Male  40                 87                     13        2
## 172        172   Male  28                 87                     75        2
## 173        173   Male  36                 87                     10        2
## 174        174   Male  36                 87                     92        2
## 175        175 Female  52                 88                     13        1
## 176        176 Female  30                 88                     86        1
## 177        177   Male  58                 88                     15        2
## 178        178   Male  27                 88                     69        2
## 179        179   Male  59                 93                     14        2
## 180        180   Male  35                 93                     90        2
## 181        181 Female  37                 97                     32        1
## 182        182 Female  32                 97                     86        1
## 183        183   Male  46                 98                     15        2
## 184        184 Female  29                 98                     88        1
## 185        185 Female  41                 99                     39        1
## 186        186   Male  30                 99                     97        2
## 187        187 Female  54                101                     24        1
## 188        188   Male  28                101                     68        2
## 189        189 Female  41                103                     17        1
## 190        190 Female  36                103                     85        1
## 191        191 Female  34                103                     23        1
## 192        192 Female  32                103                     69        1
## 193        193   Male  33                113                      8        2
## 194        194 Female  38                113                     91        1
## 195        195 Female  47                120                     16        1
## 196        196 Female  35                120                     79        1
## 197        197 Female  45                126                     28        1
## 198        198   Male  32                126                     74        2
## 199        199   Male  32                137                     18        2
## 200        200   Male  30                137                     83        2
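
The conversion above depends on data.matrix turning text into integer codes in alphabetical order of the values. As a sketch of the same idea with the mapping written out explicitly, the following uses a small made-up `gender` vector as a stand-in for the real column:

```r
# Small made-up sample standing in for the Gender column.
gender <- c("Male", "Male", "Female", "Female", "Male")

# Explicit levels make the mapping visible: Female -> 1, Male -> 2.
gender_code <- as.integer(factor(gender, levels = c("Female", "Male")))
gender_code

# data.matrix() converts text to a factor first, with levels sorted
# alphabetically, so it arrives at the same codes.
same_as_matrix <- as.vector(data.matrix(data.frame(Gender = gender)))
all(gender_code == same_as_matrix)
```

Writing the levels out this way keeps the codes stable even if the order of values in the file changes.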

Determining the Number of Clusters

Now that the first argument is ready, we turn to the second one, centers. We will use the Elbow Method, which compares the SSE (Sum of Squared Errors) across different numbers of clusters. A good choice gives a high between_SS/total_SS percentage and a low SSE with as few clusters as possible. So, we choose the point where the curve bends most sharply, the “elbow” of the curve, after which adding clusters reduces the SSE only slightly.
To draw the Elbow Curve, we need the SSE for each number of clusters from 1 to 10. The results are as follows:

set.seed(100)
sse<-sapply(1:10,function(param_k){kmeans(Mall_Customers[customer_field],param_k,nstart=25)$tot.withinss})
sse

##  [1] 308862.06 212889.44 143391.59 104414.68  75399.62  58348.64  51130.69
##  [8]  44355.31  40615.15  37061.44
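
One way to read the elbow numerically, offered here as a rough sketch rather than a formal rule, is to look at how much of the remaining SSE each extra cluster removes. The values are copied from the output above; `drop_pct` is a name introduced for this illustration.

```r
# SSE values copied from the output above (k = 1 to 10).
sse <- c(308862.06, 212889.44, 143391.59, 104414.68, 75399.62,
         58348.64, 51130.69, 44355.31, 40615.15, 37061.44)

# Percentage of SSE removed when going from k - 1 to k clusters.
drop_pct <- -diff(sse) / head(sse, -1) * 100

data.frame(k = 2:10, drop_pct = round(drop_pct, 1))
```

With these values, every step up to k = 6 removes more than 20% of the remaining SSE, while the step to k = 7 removes about 12%, which is where the curve starts to flatten.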

Now that we have the SSE for each number of clusters, we can draw the curve. We need the ggplot2 package for this. The result will be:

library(ggplot2)
cluster_max <- 10
ssdata = data.frame(cluster=c(1:cluster_max),sse)
ggplot(ssdata, aes(x=cluster,y=sse)) +
  geom_line(color="red") + geom_point() +
  ylab("Within Cluster Sum of Squares") + xlab("Total Cluster") +
  geom_text(aes(label=format(round(sse, 2), nsmall = 2)),hjust=-0.2, vjust=-0.5) +
  scale_x_continuous(breaks=1:cluster_max) # cluster is numeric, so use a continuous scale with one tick per cluster count

Based on this curve, the optimal number of clusters appears to be 5 or 6. To choose between them, we will compare their data distribution, SSE value, and between_SS/total_SS value.

This is the result when we use 5 clusters:

set.seed(100)
kmeans(x=Mall_Customers[c(customer_field)],centers=5,nstart=25)

## K-means clustering with 5 clusters of sizes 79, 39, 36, 23, 23
## 
## Cluster means:
##   Gender.1      Age Annual.Income..k.. Spending.Score..1.100.
## 1 1.417722 43.08861           55.29114               49.56962
## 2 1.461538 32.69231           86.53846               82.12821
## 3 1.527778 40.66667           87.75000               17.58333
## 4 1.391304 25.52174           26.30435               78.56522
## 5 1.391304 45.21739           26.30435               20.91304
## 
## Clustering vector:
##   [1] 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5
##  [38] 4 5 4 5 4 5 4 5 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 1 1 1 1 1 1 1 2 3 2 1 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 1 2 3 2 3 2
## [149] 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3
## [186] 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2
## 
## Within cluster sum of squares by cluster:
## [1] 30157.266 13982.051 17678.472  4627.739  8954.087
##  (between_SS / total_SS =  75.6 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

With 5 clusters, the data isn’t evenly distributed: cluster 1 holds 79 of the 200 customers, roughly twice as many as the next largest cluster (39). The total SSE is also larger (75399.62), which results in a lower between_SS/total_SS percentage (75.6%).

Now compare this with 6 clusters. The result is as follows:

set.seed(100)
kmeans(x=Mall_Customers[c(customer_field)],centers=6,nstart=25)

## K-means clustering with 6 clusters of sizes 45, 39, 35, 22, 21, 38
## 
## Cluster means:
##   Gender.1      Age Annual.Income..k.. Spending.Score..1.100.
## 1 1.444444 56.15556           53.37778               49.08889
## 2 1.461538 32.69231           86.53846               82.12821
## 3 1.571429 41.68571           88.22857               17.28571
## 4 1.409091 25.27273           25.72727               79.36364
## 5 1.380952 44.14286           25.14286               19.52381
## 6 1.342105 27.00000           56.65789               49.13158
## 
## Clustering vector:
##   [1] 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5 4 5
##  [38] 4 5 4 1 4 1 6 5 4 1 6 6 6 1 6 6 1 1 1 1 1 6 1 1 6 1 1 1 6 1 1 6 6 1 1 1 1
##  [75] 1 6 1 6 6 1 1 6 1 1 6 1 1 6 6 1 1 6 1 6 6 6 1 6 1 6 6 1 1 6 1 6 1 1 1 1 1
## [112] 6 6 6 6 6 1 1 1 1 6 6 6 2 6 2 3 2 3 2 3 2 6 2 3 2 3 2 3 2 3 2 6 2 3 2 3 2
## [149] 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3
## [186] 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2
## 
## Within cluster sum of squares by cluster:
## [1]  8073.244 13982.051 16699.429  4105.136  7737.333  7751.447
##  (between_SS / total_SS =  81.1 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

With 6 clusters, the data is more evenly distributed (cluster sizes range from 21 to 45), and the gaps between clusters are smaller than with 5 clusters. The total SSE is also smaller (58348.64), which results in a higher between_SS/total_SS percentage (81.1%).

So, based on this comparison, we decided to use 6 clusters.
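
The comparison above can also be collected into one small table. The sketch below is only an illustration: `summarise_k` is a hypothetical helper name, and the built-in iris measurements stand in for the customer data so the example runs on its own.

```r
# Hypothetical helper that summarises one k-means fit in a single row.
summarise_k <- function(data, k, seed = 100) {
  set.seed(seed)
  fit <- kmeans(data, centers = k, nstart = 25)
  data.frame(
    k            = k,
    smallest     = min(fit$size),        # size of the smallest cluster
    largest      = max(fit$size),        # size of the largest cluster
    tot_withinss = fit$tot.withinss,     # total SSE
    between_pct  = 100 * fit$betweenss / fit$totss
  )
}

# Stand-in data: the four numeric columns of the built-in iris set.
demo <- iris[, 1:4]
comparison <- do.call(rbind, lapply(c(2, 3, 4), summarise_k, data = demo))
comparison
```

For the data in this article, the equivalent calls would be summarise_k(Mall_Customers[customer_field], 5) and summarise_k(Mall_Customers[customer_field], 6).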

Clustering

Now that we have our desired number of clusters, we can run the clustering and analyze its result. The result is stored in the segmentation variable.

We will analyze the characteristics of each cluster using its center, which is the mean of every variable within that cluster.

set.seed(100)
segmentation <-kmeans(x=Mall_Customers[c(customer_field)],centers=6,nstart=25)
segmentation$centers

##   Gender.1      Age Annual.Income..k.. Spending.Score..1.100.
## 1 1.444444 56.15556           53.37778               49.08889
## 2 1.461538 32.69231           86.53846               82.12821
## 3 1.571429 41.68571           88.22857               17.28571
## 4 1.409091 25.27273           25.72727               79.36364
## 5 1.380952 44.14286           25.14286               19.52381
## 6 1.342105 27.00000           56.65789               49.13158

Each cluster has different characteristics, broken down below. Since Gender is now in numerical form, note that 1 stands for female and 2 for male, so a Gender.1 mean below 1.5 indicates a female majority.
Cluster 1
Majority female, with an average age of 56, an average annual income of about $53k, and an average spending score of 49 out of 100.
Cluster 2
Majority female, with an average age of 33, an average annual income of about $87k, and an average spending score of 82 out of 100.
Cluster 3
Majority male, with an average age of 42, an average annual income of about $88k, and an average spending score of 17 out of 100.
Cluster 4
Majority female, with an average age of 25, an average annual income of about $26k, and an average spending score of 79 out of 100.
Cluster 5
Majority female, with an average age of 44, an average annual income of about $25k, and an average spending score of 20 out of 100.
Cluster 6
Majority female, with an average age of 27, an average annual income of about $57k, and an average spending score of 49 out of 100.

Based on these characteristics, the clusters with an average age in the 40s (clusters 3 and 5) have the lowest spending scores. Most of the female-majority clusters have moderate to high spending scores.
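
A mean Gender.1 of 1.44 means that 44% of the cluster is coded 2 (male), so the majority is often a narrow one. To see the actual shares, the cluster labels can be attached back to the data and tabulated. The sketch below uses a small made-up `customers` data frame as a stand-in; the column names and values are assumptions for illustration only.

```r
# Made-up stand-in for the prepared customer data.
customers <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female", "Male"),
  Age    = c(22, 25, 24, 55, 58, 60),
  Score  = c(80, 85, 78, 20, 25, 15)
)

set.seed(100)
fit <- kmeans(customers[c("Age", "Score")], centers = 2, nstart = 25)
customers$cluster <- fit$cluster  # attach each row's cluster label

# Mean of each numeric variable per cluster (same values as fit$centers).
aggregate(cbind(Age, Score) ~ cluster, data = customers, FUN = mean)

# Share of each gender within each cluster (each row sums to 1).
gender_share <- prop.table(table(customers$cluster, customers$Gender), margin = 1)
round(gender_share, 2)
```

On the real data, the same pattern would use segmentation$cluster and the Gender column of Mall_Customers.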