OVERVIEW

DQLab is an online data science learning platform with a browser-based Live Code Editor for learning and practice. Each module helps students improve their analytical skills so they can apply them to real industry cases.

This module, “Data Science in Marketing: Customer Segmentation”, covers analysis techniques for dividing customer data into segments that are useful for marketing and CRM. The analysis uses the k-means algorithm. The data mentor for this module is Xeratic.

Customer segmentation is the process of dividing customers into groups based on characteristics such as demographics or behavior, so that a company can market to each group effectively and appropriately. The goal is to deliver marketing messages tailored to each segment, improving business performance while keeping costs as efficient as possible.

DATASET

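The dataset contains 50 customer records with seven columns: Customer_ID, Nama.Pelanggan (customer name), Jenis.Kelamin (gender), Umur (age), Profesi (profession), Tipe.Residen (residence type), and NilaiBelanjaSetahun (annual spending value).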

KEY METRICS

PREPARATION

Load the Dataset

customer <- read.csv("https://storage.googleapis.com/dqlab-dataset/customer_segments.txt",sep="\t")
customer
##    Customer_ID        Nama.Pelanggan Jenis.Kelamin Umur          Profesi
## 1     CUST-001          Budi Anggara          Pria   58       Wiraswasta
## 2     CUST-002      Shirley Ratuwati        Wanita   14          Pelajar
## 3     CUST-003          Agus Cahyono          Pria   48     Professional
## 4     CUST-004      Antonius Winarta          Pria   53     Professional
## 5     CUST-005   Ibu Sri Wahyuni, IR        Wanita   41       Wiraswasta
## 6     CUST-006       Rosalina Kurnia        Wanita   24     Professional
## 7     CUST-007         Cahyono, Agus          Pria   64       Wiraswasta
## 8     CUST-008        Danang Santosa          Pria   52     Professional
## 9     CUST-009 Elisabeth Suryadinata        Wanita   29     Professional
## 10    CUST-010        Mario Setiawan          Pria   33     Professional
## 11    CUST-011        Maria Suryawan        Wanita   50     Professional
## 12    CUST-012       Erliana Widjaja        Wanita   49     Professional
## 13    CUST-013          Cahaya Putri        Wanita   64       Wiraswasta
## 14    CUST-014        Mario Setiawan          Pria   60       Wiraswasta
## 15    CUST-015      Shirley Ratuwati        Wanita   20       Wiraswasta
## 16    CUST-016          Bambang Rudi          Pria   35     Professional
## 17    CUST-017             Yuni Sari        Wanita   32 Ibu Rumah Tangga
## 18    CUST-018           Nelly Halim        Wanita   63 Ibu Rumah Tangga
## 19    CUST-019          Mega Pranoto        Wanita   32       Wiraswasta
## 20    CUST-020        Irene Novianto        Wanita   16          Pelajar
## 21    CUST-021      Lestari Fabianto        Wanita   38       Wiraswasta
## 22    CUST-022          Novita Purba        Wanita   52     Professional
## 23    CUST-023       Denny Amiruddin          Pria   34     Professional
## 24    CUST-024         Putri Ginting        Wanita   39       Wiraswasta
## 25    CUST-025        Julia Setiawan        Wanita   29       Wiraswasta
## 26    CUST-026     Christine Winarto        Wanita   55     Professional
## 27    CUST-027         Grace Mulyati        Wanita   35       Wiraswasta
## 28    CUST-028         Adeline Huang        Wanita   40 Ibu Rumah Tangga
## 29    CUST-029          Tia Hartanti        Wanita   56     Professional
## 30    CUST-030        Rosita Saragih        Wanita   46 Ibu Rumah Tangga
## 31    CUST-031         Eviana Handry        Wanita   19        Mahasiswa
## 32    CUST-032       Chintya Winarni        Wanita   47       Wiraswasta
## 33    CUST-033       Cecilia Kusnadi        Wanita   19        Mahasiswa
## 34    CUST-034        Deasy Arisandi        Wanita   21       Wiraswasta
## 35    CUST-035               Ida Ayu        Wanita   39     Professional
## 36    CUST-036        Ni Made Suasti        Wanita   30       Wiraswasta
## 37    CUST-037      Felicia Tandiono        Wanita   25     Professional
## 38    CUST-038          Agatha Salim        Wanita   46       Wiraswasta
## 39    CUST-039          Gina Hidayat        Wanita   20     Professional
## 40    CUST-040        Irene Darmawan        Wanita   14          Pelajar
## 41    CUST-041      Shinta Aritonang        Wanita   24 Ibu Rumah Tangga
## 42    CUST-042          Yuliana Wati        Wanita   26       Wiraswasta
## 43    CUST-043          Yenna Sumadi        Wanita   31     Professional
## 44    CUST-044                  Anna        Wanita   18       Wiraswasta
## 45    CUST-045        Rismawati Juni        Wanita   22     Professional
## 46    CUST-046          Elfira Surya        Wanita   25       Wiraswasta
## 47    CUST-047           Mira Kurnia        Wanita   55 Ibu Rumah Tangga
## 48    CUST-048      Maria Hutagalung        Wanita   45       Wiraswasta
## 49    CUST-049       Josephine Wahab        Wanita   33 Ibu Rumah Tangga
## 50    CUST-050        Lianna Nugraha        Wanita   55       Wiraswasta
##    Tipe.Residen NilaiBelanjaSetahun
## 1        Sector             9497927
## 2       Cluster             2722700
## 3       Cluster             5286429
## 4       Cluster             5204498
## 5       Cluster            10615206
## 6       Cluster             5215541
## 7        Sector             9837260
## 8       Cluster             5223569
## 9        Sector             5993218
## 10      Cluster             5257448
## 11       Sector             5987367
## 12       Sector             5941914
## 13      Cluster             9333168
## 14      Cluster             9471615
## 15      Cluster            10365668
## 16      Cluster             5262521
## 17      Cluster             5677762
## 18      Cluster             5340690
## 19      Cluster            10884508
## 20       Sector             2896845
## 21      Cluster             9222070
## 22      Cluster             5298157
## 23      Cluster             5239290
## 24      Cluster            10259572
## 25       Sector            10721998
## 26      Cluster             5269392
## 27      Cluster             9114159
## 28      Cluster             6631680
## 29      Cluster             5271845
## 30       Sector             5020976
## 31      Cluster             3042773
## 32       Sector            10663179
## 33      Cluster             3047926
## 34       Sector             9759822
## 35       Sector             5962575
## 36      Cluster             9678994
## 37       Sector             5972787
## 38       Sector            10477127
## 39      Cluster             5257775
## 40       Sector             2861855
## 41      Cluster             6820976
## 42      Cluster             9880607
## 43      Cluster             5268410
## 44      Cluster             9339737
## 45      Cluster             5211041
## 46       Sector            10099807
## 47      Cluster             6130724
## 48       Sector            10390732
## 49       Sector             4992585
## 50       Sector            10569316
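
As a minimal sketch of how read.csv handles a tab-separated file like this one, the example below parses a small in-memory string instead of the URL. The two rows reuse values shown above; the header "Nama Pelanggan" with a space is an assumption about the raw file, used here only to illustrate why the loaded column appears as Nama.Pelanggan.

```r
# Two rows in a tab-separated layout (header with a space is an assumption)
raw_text <- "Customer_ID\tNama Pelanggan\tUmur\nCUST-001\tBudi Anggara\t58\nCUST-002\tShirley Ratuwati\t14"

# text= lets read.csv parse a string instead of a file or URL
toy <- read.csv(text = raw_text, sep = "\t")

# check.names = TRUE (the default) turns spaces in headers into dots
print(names(toy))

# str() shows the type of every column, useful before any conversion step
str(toy)
```

Checking names() and str() right after loading is a cheap way to confirm that the separator was interpreted correctly and that numeric columns were not read as text.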

Convert Text Columns to Numeric

customer_matrix <- data.matrix(customer[c("Jenis.Kelamin","Profesi","Tipe.Residen")])
customer_matrix
##       Jenis.Kelamin Profesi Tipe.Residen
##  [1,]             1       5            2
##  [2,]             2       3            1
##  [3,]             1       4            1
##  [4,]             1       4            1
##  [5,]             2       5            1
##  [6,]             2       4            1
##  [7,]             1       5            2
##  [8,]             1       4            1
##  [9,]             2       4            2
## [10,]             1       4            1
## [11,]             2       4            2
## [12,]             2       4            2
## [13,]             2       5            1
## [14,]             1       5            1
## [15,]             2       5            1
## [16,]             1       4            1
## [17,]             2       1            1
## [18,]             2       1            1
## [19,]             2       5            1
## [20,]             2       3            2
## [21,]             2       5            1
## [22,]             2       4            1
## [23,]             1       4            1
## [24,]             2       5            1
## [25,]             2       5            2
## [26,]             2       4            1
## [27,]             2       5            1
## [28,]             2       1            1
## [29,]             2       4            1
## [30,]             2       1            2
## [31,]             2       2            1
## [32,]             2       5            2
## [33,]             2       2            1
## [34,]             2       5            2
## [35,]             2       4            2
## [36,]             2       5            1
## [37,]             2       4            2
## [38,]             2       5            2
## [39,]             2       4            1
## [40,]             2       3            2
## [41,]             2       1            1
## [42,]             2       5            1
## [43,]             2       4            1
## [44,]             2       5            1
## [45,]             2       4            1
## [46,]             2       5            2
## [47,]             2       1            1
## [48,]             2       5            2
## [49,]             2       1            2
## [50,]             2       5            2
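
The conversion above relies on data.matrix turning character columns into factors and then into integer codes. The sketch below reproduces that behavior on a small hypothetical data frame (the name toy and its three rows are made up for illustration) and shows the equivalent explicit conversion.

```r
# Hypothetical data frame using the same labels as the dataset
toy <- data.frame(
  Jenis.Kelamin = c("Pria", "Wanita", "Wanita"),
  Tipe.Residen  = c("Sector", "Cluster", "Sector"),
  stringsAsFactors = FALSE
)

# data.matrix() converts character columns to factors, then to integer codes
toy_matrix <- data.matrix(toy)

# The same codes obtained explicitly; factor levels are sorted alphabetically
gender_codes  <- as.integer(factor(toy$Jenis.Kelamin))
gender_levels <- levels(factor(toy$Jenis.Kelamin))

print(toy_matrix)
print(gender_levels)
```

Because the codes follow alphabetical order of the labels, they are identifiers rather than measurements; the numeric distance between two codes has no inherent meaning.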

Combine Conversion Results

customer <- data.frame(customer, customer_matrix)
customer
##    Customer_ID        Nama.Pelanggan Jenis.Kelamin Umur          Profesi
## 1     CUST-001          Budi Anggara          Pria   58       Wiraswasta
## 2     CUST-002      Shirley Ratuwati        Wanita   14          Pelajar
## 3     CUST-003          Agus Cahyono          Pria   48     Professional
## 4     CUST-004      Antonius Winarta          Pria   53     Professional
## 5     CUST-005   Ibu Sri Wahyuni, IR        Wanita   41       Wiraswasta
## 6     CUST-006       Rosalina Kurnia        Wanita   24     Professional
## 7     CUST-007         Cahyono, Agus          Pria   64       Wiraswasta
## 8     CUST-008        Danang Santosa          Pria   52     Professional
## 9     CUST-009 Elisabeth Suryadinata        Wanita   29     Professional
## 10    CUST-010        Mario Setiawan          Pria   33     Professional
## 11    CUST-011        Maria Suryawan        Wanita   50     Professional
## 12    CUST-012       Erliana Widjaja        Wanita   49     Professional
## 13    CUST-013          Cahaya Putri        Wanita   64       Wiraswasta
## 14    CUST-014        Mario Setiawan          Pria   60       Wiraswasta
## 15    CUST-015      Shirley Ratuwati        Wanita   20       Wiraswasta
## 16    CUST-016          Bambang Rudi          Pria   35     Professional
## 17    CUST-017             Yuni Sari        Wanita   32 Ibu Rumah Tangga
## 18    CUST-018           Nelly Halim        Wanita   63 Ibu Rumah Tangga
## 19    CUST-019          Mega Pranoto        Wanita   32       Wiraswasta
## 20    CUST-020        Irene Novianto        Wanita   16          Pelajar
## 21    CUST-021      Lestari Fabianto        Wanita   38       Wiraswasta
## 22    CUST-022          Novita Purba        Wanita   52     Professional
## 23    CUST-023       Denny Amiruddin          Pria   34     Professional
## 24    CUST-024         Putri Ginting        Wanita   39       Wiraswasta
## 25    CUST-025        Julia Setiawan        Wanita   29       Wiraswasta
## 26    CUST-026     Christine Winarto        Wanita   55     Professional
## 27    CUST-027         Grace Mulyati        Wanita   35       Wiraswasta
## 28    CUST-028         Adeline Huang        Wanita   40 Ibu Rumah Tangga
## 29    CUST-029          Tia Hartanti        Wanita   56     Professional
## 30    CUST-030        Rosita Saragih        Wanita   46 Ibu Rumah Tangga
## 31    CUST-031         Eviana Handry        Wanita   19        Mahasiswa
## 32    CUST-032       Chintya Winarni        Wanita   47       Wiraswasta
## 33    CUST-033       Cecilia Kusnadi        Wanita   19        Mahasiswa
## 34    CUST-034        Deasy Arisandi        Wanita   21       Wiraswasta
## 35    CUST-035               Ida Ayu        Wanita   39     Professional
## 36    CUST-036        Ni Made Suasti        Wanita   30       Wiraswasta
## 37    CUST-037      Felicia Tandiono        Wanita   25     Professional
## 38    CUST-038          Agatha Salim        Wanita   46       Wiraswasta
## 39    CUST-039          Gina Hidayat        Wanita   20     Professional
## 40    CUST-040        Irene Darmawan        Wanita   14          Pelajar
## 41    CUST-041      Shinta Aritonang        Wanita   24 Ibu Rumah Tangga
## 42    CUST-042          Yuliana Wati        Wanita   26       Wiraswasta
## 43    CUST-043          Yenna Sumadi        Wanita   31     Professional
## 44    CUST-044                  Anna        Wanita   18       Wiraswasta
## 45    CUST-045        Rismawati Juni        Wanita   22     Professional
## 46    CUST-046          Elfira Surya        Wanita   25       Wiraswasta
## 47    CUST-047           Mira Kurnia        Wanita   55 Ibu Rumah Tangga
## 48    CUST-048      Maria Hutagalung        Wanita   45       Wiraswasta
## 49    CUST-049       Josephine Wahab        Wanita   33 Ibu Rumah Tangga
## 50    CUST-050        Lianna Nugraha        Wanita   55       Wiraswasta
##    Tipe.Residen NilaiBelanjaSetahun Jenis.Kelamin.1 Profesi.1 Tipe.Residen.1
## 1        Sector             9497927               1         5              2
## 2       Cluster             2722700               2         3              1
## 3       Cluster             5286429               1         4              1
## 4       Cluster             5204498               1         4              1
## 5       Cluster            10615206               2         5              1
## 6       Cluster             5215541               2         4              1
## 7        Sector             9837260               1         5              2
## 8       Cluster             5223569               1         4              1
## 9        Sector             5993218               2         4              2
## 10      Cluster             5257448               1         4              1
## 11       Sector             5987367               2         4              2
## 12       Sector             5941914               2         4              2
## 13      Cluster             9333168               2         5              1
## 14      Cluster             9471615               1         5              1
## 15      Cluster            10365668               2         5              1
## 16      Cluster             5262521               1         4              1
## 17      Cluster             5677762               2         1              1
## 18      Cluster             5340690               2         1              1
## 19      Cluster            10884508               2         5              1
## 20       Sector             2896845               2         3              2
## 21      Cluster             9222070               2         5              1
## 22      Cluster             5298157               2         4              1
## 23      Cluster             5239290               1         4              1
## 24      Cluster            10259572               2         5              1
## 25       Sector            10721998               2         5              2
## 26      Cluster             5269392               2         4              1
## 27      Cluster             9114159               2         5              1
## 28      Cluster             6631680               2         1              1
## 29      Cluster             5271845               2         4              1
## 30       Sector             5020976               2         1              2
## 31      Cluster             3042773               2         2              1
## 32       Sector            10663179               2         5              2
## 33      Cluster             3047926               2         2              1
## 34       Sector             9759822               2         5              2
## 35       Sector             5962575               2         4              2
## 36      Cluster             9678994               2         5              1
## 37       Sector             5972787               2         4              2
## 38       Sector            10477127               2         5              2
## 39      Cluster             5257775               2         4              1
## 40       Sector             2861855               2         3              2
## 41      Cluster             6820976               2         1              1
## 42      Cluster             9880607               2         5              1
## 43      Cluster             5268410               2         4              1
## 44      Cluster             9339737               2         5              1
## 45      Cluster             5211041               2         4              1
## 46       Sector            10099807               2         5              2
## 47      Cluster             6130724               2         1              1
## 48       Sector            10390732               2         5              2
## 49       Sector             4992585               2         1              2
## 50       Sector            10569316               2         5              2

Normalize “NilaiBelanjaSetahun”

The “NilaiBelanjaSetahun” column contains values in the millions. If this column is used for clustering as is, the sum of squared errors (discussed in the k-means chapter) becomes very large.

We therefore rescale the values to keep the calculation simple and easier to read. Because this is a linear rescaling, no information is lost. Normalization can be done in many ways; in this case it is enough to divide “NilaiBelanjaSetahun” by 1,000,000.

customer$NilaiBelanjaSetahun <- customer$NilaiBelanjaSetahun / 1000000
customer$NilaiBelanjaSetahun
##  [1]  9.497927  2.722700  5.286429  5.204498 10.615206  5.215541  9.837260
##  [8]  5.223569  5.993218  5.257448  5.987367  5.941914  9.333168  9.471615
## [15] 10.365668  5.262521  5.677762  5.340690 10.884508  2.896845  9.222070
## [22]  5.298157  5.239290 10.259572 10.721998  5.269392  9.114159  6.631680
## [29]  5.271845  5.020976  3.042773 10.663179  3.047926  9.759822  5.962575
## [36]  9.678994  5.972787 10.477127  5.257775  2.861855  6.820976  9.880607
## [43]  5.268410  9.339737  5.211041 10.099807  6.130724 10.390732  4.992585
## [50] 10.569316
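
To illustrate what this rescaling does, the sketch below applies it to the first three spending values from the dataset and compares it with z-score standardization via scale(). The z-score variant is shown only as one of the "many ways" mentioned above; it is not the method used in this module.

```r
# First three NilaiBelanjaSetahun values from the dataset, in original units
spend <- c(9497927, 2722700, 5286429)

# Rescale to millions, as done in this step
spend_millions <- spend / 1000000

# Alternative for comparison: z-scores (mean 0, standard deviation 1)
spend_z <- as.numeric(scale(spend))

print(spend_millions)
print(round(spend_z, 3))

# Both transformations are linear, so the ordering of customers is unchanged
print(identical(order(spend), order(spend_millions)))
```

Dividing by a constant keeps the values interpretable (millions), whereas z-scores put every column on the same unitless scale.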

Create the Data Master

data_master <- customer[c("Jenis.Kelamin","Jenis.Kelamin.1","Profesi","Profesi.1","Tipe.Residen","Tipe.Residen.1")]
data_master
##    Jenis.Kelamin Jenis.Kelamin.1          Profesi Profesi.1 Tipe.Residen
## 1           Pria               1       Wiraswasta         5       Sector
## 2         Wanita               2          Pelajar         3      Cluster
## 3           Pria               1     Professional         4      Cluster
## 4           Pria               1     Professional         4      Cluster
## 5         Wanita               2       Wiraswasta         5      Cluster
## 6         Wanita               2     Professional         4      Cluster
## 7           Pria               1       Wiraswasta         5       Sector
## 8           Pria               1     Professional         4      Cluster
## 9         Wanita               2     Professional         4       Sector
## 10          Pria               1     Professional         4      Cluster
## 11        Wanita               2     Professional         4       Sector
## 12        Wanita               2     Professional         4       Sector
## 13        Wanita               2       Wiraswasta         5      Cluster
## 14          Pria               1       Wiraswasta         5      Cluster
## 15        Wanita               2       Wiraswasta         5      Cluster
## 16          Pria               1     Professional         4      Cluster
## 17        Wanita               2 Ibu Rumah Tangga         1      Cluster
## 18        Wanita               2 Ibu Rumah Tangga         1      Cluster
## 19        Wanita               2       Wiraswasta         5      Cluster
## 20        Wanita               2          Pelajar         3       Sector
## 21        Wanita               2       Wiraswasta         5      Cluster
## 22        Wanita               2     Professional         4      Cluster
## 23          Pria               1     Professional         4      Cluster
## 24        Wanita               2       Wiraswasta         5      Cluster
## 25        Wanita               2       Wiraswasta         5       Sector
## 26        Wanita               2     Professional         4      Cluster
## 27        Wanita               2       Wiraswasta         5      Cluster
## 28        Wanita               2 Ibu Rumah Tangga         1      Cluster
## 29        Wanita               2     Professional         4      Cluster
## 30        Wanita               2 Ibu Rumah Tangga         1       Sector
## 31        Wanita               2        Mahasiswa         2      Cluster
## 32        Wanita               2       Wiraswasta         5       Sector
## 33        Wanita               2        Mahasiswa         2      Cluster
## 34        Wanita               2       Wiraswasta         5       Sector
## 35        Wanita               2     Professional         4       Sector
## 36        Wanita               2       Wiraswasta         5      Cluster
## 37        Wanita               2     Professional         4       Sector
## 38        Wanita               2       Wiraswasta         5       Sector
## 39        Wanita               2     Professional         4      Cluster
## 40        Wanita               2          Pelajar         3       Sector
## 41        Wanita               2 Ibu Rumah Tangga         1      Cluster
## 42        Wanita               2       Wiraswasta         5      Cluster
## 43        Wanita               2     Professional         4      Cluster
## 44        Wanita               2       Wiraswasta         5      Cluster
## 45        Wanita               2     Professional         4      Cluster
## 46        Wanita               2       Wiraswasta         5       Sector
## 47        Wanita               2 Ibu Rumah Tangga         1      Cluster
## 48        Wanita               2       Wiraswasta         5       Sector
## 49        Wanita               2 Ibu Rumah Tangga         1       Sector
## 50        Wanita               2       Wiraswasta         5       Sector
##    Tipe.Residen.1
## 1               2
## 2               1
## 3               1
## 4               1
## 5               1
## 6               1
## 7               2
## 8               1
## 9               2
## 10              1
## 11              2
## 12              2
## 13              1
## 14              1
## 15              1
## 16              1
## 17              1
## 18              1
## 19              1
## 20              2
## 21              1
## 22              1
## 23              1
## 24              1
## 25              2
## 26              1
## 27              1
## 28              1
## 29              1
## 30              2
## 31              1
## 32              2
## 33              1
## 34              2
## 35              2
## 36              1
## 37              2
## 38              2
## 39              1
## 40              2
## 41              1
## 42              1
## 43              1
## 44              1
## 45              1
## 46              2
## 47              1
## 48              2
## 49              2
## 50              2

Based on data_master, the category codes for each column are:

  • Jenis.Kelamin: 1=Pria, 2=Wanita
  • Profesi: 1=Ibu Rumah Tangga, 2=Mahasiswa, 3=Pelajar, 4=Professional, 5=Wiraswasta
  • Tipe.Residen: 1=Cluster, 2=Sector

The same mapping can be summarized with the unique function:

Profesi <- unique(customer[c("Profesi","Profesi.1")])
Profesi
##             Profesi Profesi.1
## 1        Wiraswasta         5
## 2           Pelajar         3
## 3      Professional         4
## 17 Ibu Rumah Tangga         1
## 31        Mahasiswa         2
Jenis.Kelamin <- unique(customer[c("Jenis.Kelamin","Jenis.Kelamin.1")])
Jenis.Kelamin
##   Jenis.Kelamin Jenis.Kelamin.1
## 1          Pria               1
## 2        Wanita               2
Tipe.Residen <- unique(customer[c("Tipe.Residen","Tipe.Residen.1")])
Tipe.Residen
##   Tipe.Residen Tipe.Residen.1
## 1       Sector              2
## 2      Cluster              1
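
A lookup table like these can be used to translate numeric codes back into labels. The following is a minimal sketch, assuming a hypothetical vector named codes; the lookup values are copied from the Profesi output above.

```r
# Lookup table in the same shape as the Profesi mapping shown above
profesi_lookup <- data.frame(
  Profesi   = c("Wiraswasta", "Pelajar", "Professional", "Ibu Rumah Tangga", "Mahasiswa"),
  Profesi.1 = c(5, 3, 4, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical codes to translate back to labels
codes <- c(4, 1, 5)

# match() returns the row position of each code in the lookup table
labels <- profesi_lookup$Profesi[match(codes, profesi_lookup$Profesi.1)]
print(labels)
```

This kind of decoding can help when reading cluster results, where only the numeric columns are used.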

ANALYSIS

K-means Method

name_field <- c("Jenis.Kelamin.1","Umur","Profesi.1","Tipe.Residen.1", "NilaiBelanjaSetahun")

set.seed(100)
segmentation <- kmeans(x=customer[name_field], centers=5, nstart=25)
segmentation
## K-means clustering with 5 clusters of sizes 5, 12, 14, 9, 10
## 
## Cluster means:
##   Jenis.Kelamin.1     Umur Profesi.1 Tipe.Residen.1 NilaiBelanjaSetahun
## 1            1.40 61.80000  4.200000       1.400000            8.696132
## 2            1.75 31.58333  3.916667       1.250000            7.330958
## 3            2.00 20.07143  3.571429       1.357143            5.901089
## 4            2.00 42.33333  4.000000       1.555556            8.804791
## 5            1.70 52.50000  3.800000       1.300000            6.018321
## 
## Clustering vector:
##  [1] 1 3 5 5 4 3 1 5 2 2 5 5 1 1 3 2 2 1 2 3 4 5 2 4 2 5 2 4 5 4 3 4 3 3 4 2 3 4
## [39] 3 3 3 2 2 3 3 3 5 4 2 5
## 
## Within cluster sum of squares by cluster:
## [1]  58.21123 174.85164 316.73367 171.67372 108.49735
##  (between_SS / total_SS =  92.4 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

The k-means run with 5 clusters produced the following sizes: cluster 1 has 5 customers, cluster 2 has 12, cluster 3 has 14, cluster 4 has 9, and cluster 5 has 10. The ratio of between_SS to total_SS is 92.4%, which is high and indicates that most of the variation in the data is explained by the separation between clusters.
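
The percentage printed by kmeans can be recomputed from the components of the result object. The sketch below does this on a small hypothetical data set (the names toy, fit, and ratio and all values are made up for illustration), using only components listed under "Available components" above.

```r
set.seed(100)  # kmeans starts from random centers, so fix the seed

# Hypothetical data with two well-separated groups
toy <- data.frame(
  Umur    = c(20, 22, 21, 60, 62, 61),
  Belanja = c(3.0, 3.2, 2.9, 9.5, 9.8, 9.6)
)

fit <- kmeans(x = toy, centers = 2, nstart = 25)

# The ratio reported as (between_SS / total_SS)
ratio <- fit$betweenss / fit$totss

print(fit$size)
print(round(100 * ratio, 1))

# Total sum of squares splits into within-cluster and between-cluster parts
print(isTRUE(all.equal(fit$totss, fit$tot.withinss + fit$betweenss)))
```

Note that this ratio generally rises as the number of clusters increases, so a high value on its own does not show that the chosen number of clusters is the best one.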

Add segmentation results to customer data

customer$cluster <- segmentation$cluster
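
After attaching the labels, a quick sanity check is to count rows per cluster and compare the counts with the sizes reported by kmeans. The sketch below shows the idea on hypothetical data (toy and fit are illustrative names, not objects from this module); split() is included as one possible way to get all per-cluster subsets at once.

```r
set.seed(100)

# Hypothetical one-column data with two obvious groups
toy <- data.frame(Umur = c(20, 22, 21, 60, 62, 61))
fit <- kmeans(toy, centers = 2, nstart = 10)

# Attach the cluster label of each row to the data frame
toy$cluster <- fit$cluster

# Row counts per cluster should agree with fit$size
counts <- table(toy$cluster)
print(counts)

# split() returns a list with one data frame per cluster
by_cluster <- split(toy, toy$cluster)
print(names(by_cluster))
```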

To view the data in each cluster, use the following code:

customer[which(customer$cluster == 1),] #data from cluster 1
##    Customer_ID Nama.Pelanggan Jenis.Kelamin Umur          Profesi Tipe.Residen
## 1     CUST-001   Budi Anggara          Pria   58       Wiraswasta       Sector
## 7     CUST-007  Cahyono, Agus          Pria   64       Wiraswasta       Sector
## 13    CUST-013   Cahaya Putri        Wanita   64       Wiraswasta      Cluster
## 14    CUST-014 Mario Setiawan          Pria   60       Wiraswasta      Cluster
## 18    CUST-018    Nelly Halim        Wanita   63 Ibu Rumah Tangga      Cluster
##    NilaiBelanjaSetahun Jenis.Kelamin.1 Profesi.1 Tipe.Residen.1 cluster
## 1             9.497927               1         5              2       1
## 7             9.837260               1         5              2       1
## 13            9.333168               2         5              1       1
## 14            9.471615               1         5              1       1
## 18            5.340690               2         1              1       1
customer[which(customer$cluster == 2),] #data from cluster 2
##    Customer_ID        Nama.Pelanggan Jenis.Kelamin Umur          Profesi
## 9     CUST-009 Elisabeth Suryadinata        Wanita   29     Professional
## 10    CUST-010        Mario Setiawan          Pria   33     Professional
## 16    CUST-016          Bambang Rudi          Pria   35     Professional
## 17    CUST-017             Yuni Sari        Wanita   32 Ibu Rumah Tangga
## 19    CUST-019          Mega Pranoto        Wanita   32       Wiraswasta
## 23    CUST-023       Denny Amiruddin          Pria   34     Professional
## 25    CUST-025        Julia Setiawan        Wanita   29       Wiraswasta
## 27    CUST-027         Grace Mulyati        Wanita   35       Wiraswasta
## 36    CUST-036        Ni Made Suasti        Wanita   30       Wiraswasta
## 42    CUST-042          Yuliana Wati        Wanita   26       Wiraswasta
## 43    CUST-043          Yenna Sumadi        Wanita   31     Professional
## 49    CUST-049       Josephine Wahab        Wanita   33 Ibu Rumah Tangga
##    Tipe.Residen NilaiBelanjaSetahun Jenis.Kelamin.1 Profesi.1 Tipe.Residen.1
## 9        Sector            5.993218               2         4              2
## 10      Cluster            5.257448               1         4              1
## 16      Cluster            5.262521               1         4              1
## 17      Cluster            5.677762               2         1              1
## 19      Cluster           10.884508               2         5              1
## 23      Cluster            5.239290               1         4              1
## 25       Sector           10.721998               2         5              2
## 27      Cluster            9.114159               2         5              1
## 36      Cluster            9.678994               2         5              1
## 42      Cluster            9.880607               2         5              1
## 43      Cluster            5.268410               2         4              1
## 49       Sector            4.992585               2         1              2
##    cluster
## 9        2
## 10       2
## 16       2
## 17       2
## 19       2
## 23       2
## 25       2
## 27       2
## 36       2
## 42       2
## 43       2
## 49       2
customer[which(customer$cluster == 3),] #data from cluster 3
##    Customer_ID   Nama.Pelanggan Jenis.Kelamin Umur          Profesi
## 2     CUST-002 Shirley Ratuwati        Wanita   14          Pelajar
## 6     CUST-006  Rosalina Kurnia        Wanita   24     Professional
## 15    CUST-015 Shirley Ratuwati        Wanita   20       Wiraswasta
## 20    CUST-020   Irene Novianto        Wanita   16          Pelajar
## 31    CUST-031    Eviana Handry        Wanita   19        Mahasiswa
## 33    CUST-033  Cecilia Kusnadi        Wanita   19        Mahasiswa
## 34    CUST-034   Deasy Arisandi        Wanita   21       Wiraswasta
## 37    CUST-037 Felicia Tandiono        Wanita   25     Professional
## 39    CUST-039     Gina Hidayat        Wanita   20     Professional
## 40    CUST-040   Irene Darmawan        Wanita   14          Pelajar
## 41    CUST-041 Shinta Aritonang        Wanita   24 Ibu Rumah Tangga
## 44    CUST-044             Anna        Wanita   18       Wiraswasta
## 45    CUST-045   Rismawati Juni        Wanita   22     Professional
## 46    CUST-046     Elfira Surya        Wanita   25       Wiraswasta
##    Tipe.Residen NilaiBelanjaSetahun Jenis.Kelamin.1 Profesi.1 Tipe.Residen.1
## 2       Cluster            2.722700               2         3              1
## 6       Cluster            5.215541               2         4              1
## 15      Cluster           10.365668               2         5              1
## 20       Sector            2.896845               2         3              2
## 31      Cluster            3.042773               2         2              1
## 33      Cluster            3.047926               2         2              1
## 34       Sector            9.759822               2         5              2
## 37       Sector            5.972787               2         4              2
## 39      Cluster            5.257775               2         4              1
## 40       Sector            2.861855               2         3              2
## 41      Cluster            6.820976               2         1              1
## 44      Cluster            9.339737               2         5              1
## 45      Cluster            5.211041               2         4              1
## 46       Sector           10.099807               2         5              2
##    cluster
## 2        3
## 6        3
## 15       3
## 20       3
## 31       3
## 33       3
## 34       3
## 37       3
## 39       3
## 40       3
## 41       3
## 44       3
## 45       3
## 46       3
customer[which(customer$cluster == 4),] #data from cluster 4
##    Customer_ID      Nama.Pelanggan Jenis.Kelamin Umur          Profesi
## 5     CUST-005 Ibu Sri Wahyuni, IR        Wanita   41       Wiraswasta
## 21    CUST-021    Lestari Fabianto        Wanita   38       Wiraswasta
## 24    CUST-024       Putri Ginting        Wanita   39       Wiraswasta
## 28    CUST-028       Adeline Huang        Wanita   40 Ibu Rumah Tangga
## 30    CUST-030      Rosita Saragih        Wanita   46 Ibu Rumah Tangga
## 32    CUST-032     Chintya Winarni        Wanita   47       Wiraswasta
## 35    CUST-035             Ida Ayu        Wanita   39     Professional
## 38    CUST-038        Agatha Salim        Wanita   46       Wiraswasta
## 48    CUST-048    Maria Hutagalung        Wanita   45       Wiraswasta
##    Tipe.Residen NilaiBelanjaSetahun Jenis.Kelamin.1 Profesi.1 Tipe.Residen.1
## 5       Cluster           10.615206               2         5              1
## 21      Cluster            9.222070               2         5              1
## 24      Cluster           10.259572               2         5              1
## 28      Cluster            6.631680               2         1              1
## 30       Sector            5.020976               2         1              2
## 32       Sector           10.663179               2         5              2
## 35       Sector            5.962575               2         4              2
## 38       Sector           10.477127               2         5              2
## 48       Sector           10.390732               2         5              2
##    cluster
## 5        4
## 21       4
## 24       4
## 28       4
## 30       4
## 32       4
## 35       4
## 38       4
## 48       4
customer[which(customer$cluster == 5),] #data from cluster 5
##    Customer_ID    Nama.Pelanggan Jenis.Kelamin Umur          Profesi
## 3     CUST-003      Agus Cahyono          Pria   48     Professional
## 4     CUST-004  Antonius Winarta          Pria   53     Professional
## 8     CUST-008    Danang Santosa          Pria   52     Professional
## 11    CUST-011    Maria Suryawan        Wanita   50     Professional
## 12    CUST-012   Erliana Widjaja        Wanita   49     Professional
## 22    CUST-022      Novita Purba        Wanita   52     Professional
## 26    CUST-026 Christine Winarto        Wanita   55     Professional
## 29    CUST-029      Tia Hartanti        Wanita   56     Professional
## 47    CUST-047       Mira Kurnia        Wanita   55 Ibu Rumah Tangga
## 50    CUST-050    Lianna Nugraha        Wanita   55       Wiraswasta
##    Tipe.Residen NilaiBelanjaSetahun Jenis.Kelamin.1 Profesi.1 Tipe.Residen.1
## 3       Cluster            5.286429               1         4              1
## 4       Cluster            5.204498               1         4              1
## 8       Cluster            5.223569               1         4              1
## 11       Sector            5.987367               2         4              2
## 12       Sector            5.941914               2         4              2
## 22      Cluster            5.298157               2         4              1
## 26      Cluster            5.269392               2         4              1
## 29      Cluster            5.271845               2         4              1
## 47      Cluster            6.130724               2         1              1
## 50       Sector           10.569316               2         5              2
##    cluster
## 3        5
## 4        5
## 8        5
## 11       5
## 12       5
## 22       5
## 26       5
## 29       5
## 47       5
## 50       5
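
As a quick cross-check of the listings above, the cluster sizes and the row numbers in a given cluster can be recovered from the cluster labels alone. The sketch below is an illustration rather than part of the original module: it copies the clustering vector from the k-means output printed in this report into a hypothetical variable named cluster_vec.

```r
# Cluster label of each of the 50 customers, copied from the
# "Clustering vector" in the printed k-means output (cluster_vec is an assumed name)
cluster_vec <- c(1, 3, 5, 5, 4, 3, 1, 5, 2, 2, 5, 5, 1, 1, 3, 2, 2, 1, 2, 3,
                 4, 5, 2, 4, 2, 5, 2, 4, 5, 4, 3, 4, 3, 3, 4, 2, 3, 4,
                 3, 3, 3, 2, 2, 3, 3, 3, 5, 4, 2, 5)

# Number of customers in each cluster
cluster_sizes <- table(cluster_vec)
print(cluster_sizes)

# Row numbers of the customers assigned to cluster 4
rows_cluster_4 <- which(cluster_vec == 4)
print(rows_cluster_4)
```

The row numbers returned for cluster 4 should match the row labels in the cluster 4 listing above.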

Determine the Best Number of Clusters

sse <- sapply(1:10, function(param_k) {kmeans(x=customer[name_field], param_k, nstart=25)$tot.withinss})
sse
##  [1] 10990.9740  3016.5612  1550.8725  1064.4187   829.9676   625.1462
##  [7]   508.1568   431.6977   374.1095   317.9424
library(ggplot2)

jumlah_cluster_max <- 10
ssdata <- data.frame(cluster=c(1:jumlah_cluster_max),sse)
# cluster is numeric, so use a continuous x scale with one break per cluster count
ggplot(ssdata, aes(x=cluster,y=sse)) +
  geom_line(color="red") + geom_point() +
  ylab("Within Cluster Sum of Squares") + xlab("Number of Clusters") +
  geom_text(aes(label=format(round(sse, 2), nsmall = 2)),hjust=-0.2, vjust=-0.5) +
  scale_x_continuous(breaks=1:jumlah_cluster_max)

The leftmost point is the within-cluster sum of squares (SSE) for one cluster, the second point is the SSE for two clusters, and so on. Moving to the right, the drop between consecutive points becomes smaller. The line is shaped like an elbow, and the optimal number of clusters is usually taken at the bend. In this example, either 4 or 5 clusters is a reasonable choice.
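
One way to read the elbow numerically is to tabulate how much the SSE falls with each additional cluster. The sketch below is an illustration, not part of the original module: it reuses the SSE values printed above, and the names sse_printed, sse_drop, and elbow are assumptions.

```r
# SSE values copied from the printed output above (k = 1..10)
sse_printed <- c(10990.9740, 3016.5612, 1550.8725, 1064.4187, 829.9676,
                 625.1462, 508.1568, 431.6977, 374.1095, 317.9424)

# Reduction in SSE when moving from k-1 to k clusters
sse_drop <- -diff(sse_printed)

# Same reduction as a percentage of the previous SSE
pct_drop <- round(100 * sse_drop / head(sse_printed, -1), 1)

elbow <- data.frame(k = 2:10, sse_drop = round(sse_drop, 2), pct_drop = pct_drop)
print(elbow)

# Share of the total sum of squares explained by the 5-cluster solution;
# the SSE for k = 1 equals the total sum of squares
explained_k5 <- 1 - sse_printed[5] / sse_printed[1]
print(round(100 * explained_k5, 1))
```

From the fifth cluster onward, each extra cluster removes less than 250 from the SSE, which is consistent with reading the bend at 4 or 5 clusters. The last value corresponds to the between_SS / total_SS figure reported in the five-cluster k-means output.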

Naming the Optimal Clusters by Their Characteristics

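Based on the cluster means of the five-cluster model, each cluster is named as follows:

  • Cluster 1 : Diamond Senior : average age is about 62 years and average spending is above 8 million.
  • Cluster 2 : Gold Young Professional : average age is about 32 years, mostly professionals, and average spending is about 7.3 million.
  • Cluster 3 : Silver Youth Gals : average age is about 20 years, all women, professions are mixed between students and working customers, and average spending is around 6 million.
  • Cluster 4 : Diamond Professional : average age is about 42 years, all women, and the highest average spending (about 8.8 million). The mean profession code is 4.0, the code for Professional, although the listing for this cluster shows mostly Wiraswasta members.
  • Cluster 5 : Silver Mid Professional : average age is about 52 years, mostly professionals, and average spending is around 6 million.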
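
The Diamond, Gold, and Silver prefixes follow the average yearly spending of each cluster. The sketch below shows one way such tiers could be derived from the cluster means printed in this report. The cut-offs of 7 and 8 million are assumptions chosen only to reproduce the names above; the original module does not state any thresholds.

```r
# Cluster means for age and yearly spending, copied from the k-means output in this report
centers <- data.frame(
  cluster = 1:5,
  Umur = c(61.80000, 31.58333, 20.07143, 42.33333, 52.50000),
  NilaiBelanjaSetahun = c(8.696132, 7.330958, 5.901089, 8.804791, 6.018321)
)

# Hypothetical tier thresholds (in millions): below 7 = Silver, 7 to 8 = Gold, above 8 = Diamond
centers$tier <- cut(centers$NilaiBelanjaSetahun,
                    breaks = c(-Inf, 7, 8, Inf),
                    labels = c("Silver", "Gold", "Diamond"))

# Clusters ordered from highest to lowest average spending
print(centers[order(-centers$NilaiBelanjaSetahun), ])
```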

Create a data frame named Customer.Segment that maps each cluster number to its segment name

Customer.Segment <- data.frame(Cluster = c(1,2,3,4,5),
                               Name.Segment = c("Diamond Senior",
                                                "Gold Young Professional",
                                                "Silver Youth Gals", 
                                                "Diamond Professional", 
                                                "Silver Mid Professional"))
Customer.Segment
##   Cluster            Name.Segment
## 1       1          Diamond Senior
## 2       2 Gold Young Professional
## 3       3       Silver Youth Gals
## 4       4    Diamond Professional
## 5       5 Silver Mid Professional
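
A lookup table like this can be used to attach segment names to individual customers. The sketch below is an illustration only: sample_customers is a hypothetical three-row data frame whose IDs and cluster labels are copied from the cluster listings in this report.

```r
# Segment lookup table, as defined above
Customer.Segment <- data.frame(Cluster = c(1, 2, 3, 4, 5),
                               Name.Segment = c("Diamond Senior",
                                                "Gold Young Professional",
                                                "Silver Youth Gals",
                                                "Diamond Professional",
                                                "Silver Mid Professional"),
                               stringsAsFactors = FALSE)

# Three customers with their cluster labels (hypothetical helper data frame)
sample_customers <- data.frame(Customer_ID = c("CUST-002", "CUST-005", "CUST-003"),
                               cluster = c(3, 4, 5),
                               stringsAsFactors = FALSE)

# match() returns, for each customer, the row of the lookup table with the same cluster number
idx <- match(sample_customers$cluster, Customer.Segment$Cluster)
sample_customers$Name.Segment <- Customer.Segment$Name.Segment[idx]
print(sample_customers)
```

Using match() keeps the customers in their original order, whereas merge() would sort the result by the join column.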

Next, combine the reference tables, the k-means result, the segment names, and the field names into a single list named Cluster.Identity

Cluster.Identity <- list( Profession=Profesi,
                          Gender=Jenis.Kelamin, 
                          Resident_Type=Tipe.Residen,
                          Segmentation=segmentation, 
                          Customer_Segmen=Customer.Segment, 
                          Name_Field=name_field)
print(Cluster.Identity)
## $Profession
##             Profesi Profesi.1
## 1        Wiraswasta         5
## 2           Pelajar         3
## 3      Professional         4
## 17 Ibu Rumah Tangga         1
## 31        Mahasiswa         2
## 
## $Gender
##   Jenis.Kelamin Jenis.Kelamin.1
## 1          Pria               1
## 2        Wanita               2
## 
## $Resident_Type
##   Tipe.Residen Tipe.Residen.1
## 1       Sector              2
## 2      Cluster              1
## 
## $Segmentation
## K-means clustering with 5 clusters of sizes 5, 12, 14, 9, 10
## 
## Cluster means:
##   Jenis.Kelamin.1     Umur Profesi.1 Tipe.Residen.1 NilaiBelanjaSetahun
## 1            1.40 61.80000  4.200000       1.400000            8.696132
## 2            1.75 31.58333  3.916667       1.250000            7.330958
## 3            2.00 20.07143  3.571429       1.357143            5.901089
## 4            2.00 42.33333  4.000000       1.555556            8.804791
## 5            1.70 52.50000  3.800000       1.300000            6.018321
## 
## Clustering vector:
##  [1] 1 3 5 5 4 3 1 5 2 2 5 5 1 1 3 2 2 1 2 3 4 5 2 4 2 5 2 4 5 4 3 4 3 3 4 2 3 4
## [39] 3 3 3 2 2 3 3 3 5 4 2 5
## 
## Within cluster sum of squares by cluster:
## [1]  58.21123 174.85164 316.73367 171.67372 108.49735
##  (between_SS / total_SS =  92.4 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"      
## 
## $Customer_Segmen
##   Cluster            Name.Segment
## 1       1          Diamond Senior
## 2       2 Gold Young Professional
## 3       3       Silver Youth Gals
## 4       4    Diamond Professional
## 5       5 Silver Mid Professional
## 
## $Name_Field
## [1] "Jenis.Kelamin.1"     "Umur"                "Profesi.1"          
## [4] "Tipe.Residen.1"      "NilaiBelanjaSetahun"
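
One possible use of the stored cluster means and segment names is to assign a new customer to the nearest cluster centre. The sketch below is not part of the original module. The centres are copied from the printed output above, new_customer is an invented example, and the nearest-centroid rule with squared Euclidean distance is an assumption about how the list might be applied.

```r
# Field names and cluster means, copied from the printed Cluster.Identity output
name_field <- c("Jenis.Kelamin.1", "Umur", "Profesi.1",
                "Tipe.Residen.1", "NilaiBelanjaSetahun")
centers <- matrix(c(1.40, 61.80000, 4.200000, 1.400000, 8.696132,
                    1.75, 31.58333, 3.916667, 1.250000, 7.330958,
                    2.00, 20.07143, 3.571429, 1.357143, 5.901089,
                    2.00, 42.33333, 4.000000, 1.555556, 8.804791,
                    1.70, 52.50000, 3.800000, 1.300000, 6.018321),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(as.character(1:5), name_field))
segment_names <- c("Diamond Senior", "Gold Young Professional", "Silver Youth Gals",
                   "Diamond Professional", "Silver Mid Professional")

# Hypothetical new customer: female (2), age 20, Mahasiswa (2),
# residence type Cluster (1), yearly spending of 3.5 million
new_customer <- c(Jenis.Kelamin.1 = 2, Umur = 20, Profesi.1 = 2,
                  Tipe.Residen.1 = 1, NilaiBelanjaSetahun = 3.5)

# Squared Euclidean distance from the new customer to every cluster centre
dist_sq <- rowSums(sweep(centers, 2, new_customer[name_field])^2)

# The nearest centre determines the cluster and therefore the segment name
nearest <- unname(which.min(dist_sq))
segment <- segment_names[nearest]
print(segment)
```

Because the fields are used on their original scales in this sketch, age contributes far more to the distance than the coded fields do.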

CONCLUSION

Based on the k-means cluster analysis, the customers were divided into five clusters, each with a descriptive segment name. This segmentation can help the marketing team automate campaigns and deliver messages to the right target.