DQLab is an online data science learning platform with a browser-based Live Code Editor for learning and practice. Every module helps students improve their analytical skills so that they can apply them to real industry cases.
This module, “Data Science in Marketing: Customer Segmentation”, covers analysis techniques for dividing customer data into segments that are useful for marketing and CRM. The analysis uses the k-means algorithm. The data mentor of this module is Xeratic.
Customer segmentation is the process of dividing customers into groups based on characteristics such as demographics or behavior, so that a company can market to each group effectively and appropriately. The goal is to deliver marketing messages tailored to each segment, so that business performance improves and the costs incurred stay as efficient as possible.
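The dataset contains 50 customer records in seven columns: Customer_ID, Nama.Pelanggan (customer name), Jenis.Kelamin (gender), Umur (age), Profesi (profession), Tipe.Residen (residence type), and NilaiBelanjaSetahun (annual spending).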
customer <- read.csv("https://storage.googleapis.com/dqlab-dataset/customer_segments.txt",sep="\t")
customer
## Customer_ID Nama.Pelanggan Jenis.Kelamin Umur Profesi
## 1 CUST-001 Budi Anggara Pria 58 Wiraswasta
## 2 CUST-002 Shirley Ratuwati Wanita 14 Pelajar
## 3 CUST-003 Agus Cahyono Pria 48 Professional
## 4 CUST-004 Antonius Winarta Pria 53 Professional
## 5 CUST-005 Ibu Sri Wahyuni, IR Wanita 41 Wiraswasta
## 6 CUST-006 Rosalina Kurnia Wanita 24 Professional
## 7 CUST-007 Cahyono, Agus Pria 64 Wiraswasta
## 8 CUST-008 Danang Santosa Pria 52 Professional
## 9 CUST-009 Elisabeth Suryadinata Wanita 29 Professional
## 10 CUST-010 Mario Setiawan Pria 33 Professional
## 11 CUST-011 Maria Suryawan Wanita 50 Professional
## 12 CUST-012 Erliana Widjaja Wanita 49 Professional
## 13 CUST-013 Cahaya Putri Wanita 64 Wiraswasta
## 14 CUST-014 Mario Setiawan Pria 60 Wiraswasta
## 15 CUST-015 Shirley Ratuwati Wanita 20 Wiraswasta
## 16 CUST-016 Bambang Rudi Pria 35 Professional
## 17 CUST-017 Yuni Sari Wanita 32 Ibu Rumah Tangga
## 18 CUST-018 Nelly Halim Wanita 63 Ibu Rumah Tangga
## 19 CUST-019 Mega Pranoto Wanita 32 Wiraswasta
## 20 CUST-020 Irene Novianto Wanita 16 Pelajar
## 21 CUST-021 Lestari Fabianto Wanita 38 Wiraswasta
## 22 CUST-022 Novita Purba Wanita 52 Professional
## 23 CUST-023 Denny Amiruddin Pria 34 Professional
## 24 CUST-024 Putri Ginting Wanita 39 Wiraswasta
## 25 CUST-025 Julia Setiawan Wanita 29 Wiraswasta
## 26 CUST-026 Christine Winarto Wanita 55 Professional
## 27 CUST-027 Grace Mulyati Wanita 35 Wiraswasta
## 28 CUST-028 Adeline Huang Wanita 40 Ibu Rumah Tangga
## 29 CUST-029 Tia Hartanti Wanita 56 Professional
## 30 CUST-030 Rosita Saragih Wanita 46 Ibu Rumah Tangga
## 31 CUST-031 Eviana Handry Wanita 19 Mahasiswa
## 32 CUST-032 Chintya Winarni Wanita 47 Wiraswasta
## 33 CUST-033 Cecilia Kusnadi Wanita 19 Mahasiswa
## 34 CUST-034 Deasy Arisandi Wanita 21 Wiraswasta
## 35 CUST-035 Ida Ayu Wanita 39 Professional
## 36 CUST-036 Ni Made Suasti Wanita 30 Wiraswasta
## 37 CUST-037 Felicia Tandiono Wanita 25 Professional
## 38 CUST-038 Agatha Salim Wanita 46 Wiraswasta
## 39 CUST-039 Gina Hidayat Wanita 20 Professional
## 40 CUST-040 Irene Darmawan Wanita 14 Pelajar
## 41 CUST-041 Shinta Aritonang Wanita 24 Ibu Rumah Tangga
## 42 CUST-042 Yuliana Wati Wanita 26 Wiraswasta
## 43 CUST-043 Yenna Sumadi Wanita 31 Professional
## 44 CUST-044 Anna Wanita 18 Wiraswasta
## 45 CUST-045 Rismawati Juni Wanita 22 Professional
## 46 CUST-046 Elfira Surya Wanita 25 Wiraswasta
## 47 CUST-047 Mira Kurnia Wanita 55 Ibu Rumah Tangga
## 48 CUST-048 Maria Hutagalung Wanita 45 Wiraswasta
## 49 CUST-049 Josephine Wahab Wanita 33 Ibu Rumah Tangga
## 50 CUST-050 Lianna Nugraha Wanita 55 Wiraswasta
## Tipe.Residen NilaiBelanjaSetahun
## 1 Sector 9497927
## 2 Cluster 2722700
## 3 Cluster 5286429
## 4 Cluster 5204498
## 5 Cluster 10615206
## 6 Cluster 5215541
## 7 Sector 9837260
## 8 Cluster 5223569
## 9 Sector 5993218
## 10 Cluster 5257448
## 11 Sector 5987367
## 12 Sector 5941914
## 13 Cluster 9333168
## 14 Cluster 9471615
## 15 Cluster 10365668
## 16 Cluster 5262521
## 17 Cluster 5677762
## 18 Cluster 5340690
## 19 Cluster 10884508
## 20 Sector 2896845
## 21 Cluster 9222070
## 22 Cluster 5298157
## 23 Cluster 5239290
## 24 Cluster 10259572
## 25 Sector 10721998
## 26 Cluster 5269392
## 27 Cluster 9114159
## 28 Cluster 6631680
## 29 Cluster 5271845
## 30 Sector 5020976
## 31 Cluster 3042773
## 32 Sector 10663179
## 33 Cluster 3047926
## 34 Sector 9759822
## 35 Sector 5962575
## 36 Cluster 9678994
## 37 Sector 5972787
## 38 Sector 10477127
## 39 Cluster 5257775
## 40 Sector 2861855
## 41 Cluster 6820976
## 42 Cluster 9880607
## 43 Cluster 5268410
## 44 Cluster 9339737
## 45 Cluster 5211041
## 46 Sector 10099807
## 47 Cluster 6130724
## 48 Sector 10390732
## 49 Sector 4992585
## 50 Sector 10569316
customer_matrix <- data.matrix(customer[c("Jenis.Kelamin","Profesi","Tipe.Residen")])
customer_matrix
## Jenis.Kelamin Profesi Tipe.Residen
## [1,] 1 5 2
## [2,] 2 3 1
## [3,] 1 4 1
## [4,] 1 4 1
## [5,] 2 5 1
## [6,] 2 4 1
## [7,] 1 5 2
## [8,] 1 4 1
## [9,] 2 4 2
## [10,] 1 4 1
## [11,] 2 4 2
## [12,] 2 4 2
## [13,] 2 5 1
## [14,] 1 5 1
## [15,] 2 5 1
## [16,] 1 4 1
## [17,] 2 1 1
## [18,] 2 1 1
## [19,] 2 5 1
## [20,] 2 3 2
## [21,] 2 5 1
## [22,] 2 4 1
## [23,] 1 4 1
## [24,] 2 5 1
## [25,] 2 5 2
## [26,] 2 4 1
## [27,] 2 5 1
## [28,] 2 1 1
## [29,] 2 4 1
## [30,] 2 1 2
## [31,] 2 2 1
## [32,] 2 5 2
## [33,] 2 2 1
## [34,] 2 5 2
## [35,] 2 4 2
## [36,] 2 5 1
## [37,] 2 4 2
## [38,] 2 5 2
## [39,] 2 4 1
## [40,] 2 3 2
## [41,] 2 1 1
## [42,] 2 5 1
## [43,] 2 4 1
## [44,] 2 5 1
## [45,] 2 4 1
## [46,] 2 5 2
## [47,] 2 1 1
## [48,] 2 5 2
## [49,] 2 1 2
## [50,] 2 5 2
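The numeric codes above follow from how data.matrix treats text columns: per the base R documentation, character columns are first converted to factors and then to integers, so the codes follow the sorted order of the category names. A minimal sketch on a hypothetical three-row data frame (not the course dataset) illustrates the idea:

```r
# Hypothetical toy data, used only to illustrate the encoding step
toy <- data.frame(
  Jenis.Kelamin = c("Pria", "Wanita", "Pria"),
  Tipe.Residen  = c("Sector", "Cluster", "Cluster"),
  stringsAsFactors = FALSE
)

# data.matrix() turns each character column into a factor, then into integer codes
toy_matrix <- data.matrix(toy)

# The same codes obtained explicitly: levels are sorted, so "Cluster" = 1 and "Sector" = 2
manual_codes <- as.integer(factor(toy$Tipe.Residen))

print(toy_matrix)
print(manual_codes)
```

Keep in mind that these codes are only labels. K-means will still treat them as numbers, so the distance between code 1 and code 5 counts as larger than the distance between code 4 and code 5.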
customer <- data.frame(customer, customer_matrix)
customer
## Customer_ID Nama.Pelanggan Jenis.Kelamin Umur Profesi
## 1 CUST-001 Budi Anggara Pria 58 Wiraswasta
## 2 CUST-002 Shirley Ratuwati Wanita 14 Pelajar
## 3 CUST-003 Agus Cahyono Pria 48 Professional
## 4 CUST-004 Antonius Winarta Pria 53 Professional
## 5 CUST-005 Ibu Sri Wahyuni, IR Wanita 41 Wiraswasta
## 6 CUST-006 Rosalina Kurnia Wanita 24 Professional
## 7 CUST-007 Cahyono, Agus Pria 64 Wiraswasta
## 8 CUST-008 Danang Santosa Pria 52 Professional
## 9 CUST-009 Elisabeth Suryadinata Wanita 29 Professional
## 10 CUST-010 Mario Setiawan Pria 33 Professional
## 11 CUST-011 Maria Suryawan Wanita 50 Professional
## 12 CUST-012 Erliana Widjaja Wanita 49 Professional
## 13 CUST-013 Cahaya Putri Wanita 64 Wiraswasta
## 14 CUST-014 Mario Setiawan Pria 60 Wiraswasta
## 15 CUST-015 Shirley Ratuwati Wanita 20 Wiraswasta
## 16 CUST-016 Bambang Rudi Pria 35 Professional
## 17 CUST-017 Yuni Sari Wanita 32 Ibu Rumah Tangga
## 18 CUST-018 Nelly Halim Wanita 63 Ibu Rumah Tangga
## 19 CUST-019 Mega Pranoto Wanita 32 Wiraswasta
## 20 CUST-020 Irene Novianto Wanita 16 Pelajar
## 21 CUST-021 Lestari Fabianto Wanita 38 Wiraswasta
## 22 CUST-022 Novita Purba Wanita 52 Professional
## 23 CUST-023 Denny Amiruddin Pria 34 Professional
## 24 CUST-024 Putri Ginting Wanita 39 Wiraswasta
## 25 CUST-025 Julia Setiawan Wanita 29 Wiraswasta
## 26 CUST-026 Christine Winarto Wanita 55 Professional
## 27 CUST-027 Grace Mulyati Wanita 35 Wiraswasta
## 28 CUST-028 Adeline Huang Wanita 40 Ibu Rumah Tangga
## 29 CUST-029 Tia Hartanti Wanita 56 Professional
## 30 CUST-030 Rosita Saragih Wanita 46 Ibu Rumah Tangga
## 31 CUST-031 Eviana Handry Wanita 19 Mahasiswa
## 32 CUST-032 Chintya Winarni Wanita 47 Wiraswasta
## 33 CUST-033 Cecilia Kusnadi Wanita 19 Mahasiswa
## 34 CUST-034 Deasy Arisandi Wanita 21 Wiraswasta
## 35 CUST-035 Ida Ayu Wanita 39 Professional
## 36 CUST-036 Ni Made Suasti Wanita 30 Wiraswasta
## 37 CUST-037 Felicia Tandiono Wanita 25 Professional
## 38 CUST-038 Agatha Salim Wanita 46 Wiraswasta
## 39 CUST-039 Gina Hidayat Wanita 20 Professional
## 40 CUST-040 Irene Darmawan Wanita 14 Pelajar
## 41 CUST-041 Shinta Aritonang Wanita 24 Ibu Rumah Tangga
## 42 CUST-042 Yuliana Wati Wanita 26 Wiraswasta
## 43 CUST-043 Yenna Sumadi Wanita 31 Professional
## 44 CUST-044 Anna Wanita 18 Wiraswasta
## 45 CUST-045 Rismawati Juni Wanita 22 Professional
## 46 CUST-046 Elfira Surya Wanita 25 Wiraswasta
## 47 CUST-047 Mira Kurnia Wanita 55 Ibu Rumah Tangga
## 48 CUST-048 Maria Hutagalung Wanita 45 Wiraswasta
## 49 CUST-049 Josephine Wahab Wanita 33 Ibu Rumah Tangga
## 50 CUST-050 Lianna Nugraha Wanita 55 Wiraswasta
## Tipe.Residen NilaiBelanjaSetahun Jenis.Kelamin.1 Profesi.1 Tipe.Residen.1
## 1 Sector 9497927 1 5 2
## 2 Cluster 2722700 2 3 1
## 3 Cluster 5286429 1 4 1
## 4 Cluster 5204498 1 4 1
## 5 Cluster 10615206 2 5 1
## 6 Cluster 5215541 2 4 1
## 7 Sector 9837260 1 5 2
## 8 Cluster 5223569 1 4 1
## 9 Sector 5993218 2 4 2
## 10 Cluster 5257448 1 4 1
## 11 Sector 5987367 2 4 2
## 12 Sector 5941914 2 4 2
## 13 Cluster 9333168 2 5 1
## 14 Cluster 9471615 1 5 1
## 15 Cluster 10365668 2 5 1
## 16 Cluster 5262521 1 4 1
## 17 Cluster 5677762 2 1 1
## 18 Cluster 5340690 2 1 1
## 19 Cluster 10884508 2 5 1
## 20 Sector 2896845 2 3 2
## 21 Cluster 9222070 2 5 1
## 22 Cluster 5298157 2 4 1
## 23 Cluster 5239290 1 4 1
## 24 Cluster 10259572 2 5 1
## 25 Sector 10721998 2 5 2
## 26 Cluster 5269392 2 4 1
## 27 Cluster 9114159 2 5 1
## 28 Cluster 6631680 2 1 1
## 29 Cluster 5271845 2 4 1
## 30 Sector 5020976 2 1 2
## 31 Cluster 3042773 2 2 1
## 32 Sector 10663179 2 5 2
## 33 Cluster 3047926 2 2 1
## 34 Sector 9759822 2 5 2
## 35 Sector 5962575 2 4 2
## 36 Cluster 9678994 2 5 1
## 37 Sector 5972787 2 4 2
## 38 Sector 10477127 2 5 2
## 39 Cluster 5257775 2 4 1
## 40 Sector 2861855 2 3 2
## 41 Cluster 6820976 2 1 1
## 42 Cluster 9880607 2 5 1
## 43 Cluster 5268410 2 4 1
## 44 Cluster 9339737 2 5 1
## 45 Cluster 5211041 2 4 1
## 46 Sector 10099807 2 5 2
## 47 Cluster 6130724 2 1 1
## 48 Sector 10390732 2 5 2
## 49 Sector 4992585 2 1 2
## 50 Sector 10569316 2 5 2
The “NilaiBelanjaSetahun” column holds values in the millions. If this column is used for clustering as-is, the sum of squared errors computed by k-means (see the k-means section) will be very large.
We will normalize the values to make the calculation simpler and easier to read without reducing accuracy. Normalization can be done in many ways. In this case, it is enough to divide “NilaiBelanjaSetahun” by 1,000,000:
customer$NilaiBelanjaSetahun <- customer$NilaiBelanjaSetahun / 1000000
customer$NilaiBelanjaSetahun
## [1] 9.497927 2.722700 5.286429 5.204498 10.615206 5.215541 9.837260
## [8] 5.223569 5.993218 5.257448 5.987367 5.941914 9.333168 9.471615
## [15] 10.365668 5.262521 5.677762 5.340690 10.884508 2.896845 9.222070
## [22] 5.298157 5.239290 10.259572 10.721998 5.269392 9.114159 6.631680
## [29] 5.271845 5.020976 3.042773 10.663179 3.047926 9.759822 5.962575
## [36] 9.678994 5.972787 10.477127 5.257775 2.861855 6.820976 9.880607
## [43] 5.268410 9.339737 5.211041 10.099807 6.130724 10.390732 4.992585
## [50] 10.569316
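To see why the rescaling matters, the sketch below compares the Euclidean distance between the first two customers in the listing, using only age and annual spending, before and after dividing spending by 1,000,000. The variable names are illustrative and not part of the module.

```r
# Two customers taken from the listing above (rows 1 and 2): age and annual spending
a <- c(Umur = 58, NilaiBelanjaSetahun = 9497927)
b <- c(Umur = 14, NilaiBelanjaSetahun = 2722700)

# Euclidean distance on the raw values
dist_raw <- sqrt(sum((a - b)^2))

# The same distance after dividing spending by 1,000,000 (age is left unchanged)
scale_vec <- c(1, 1e6)
dist_scaled <- sqrt(sum((a / scale_vec - b / scale_vec)^2))

# Share of the squared distance contributed by age, before and after rescaling
share_age_raw    <- unname((a["Umur"] - b["Umur"])^2 / dist_raw^2)
share_age_scaled <- unname((a["Umur"] - b["Umur"])^2 / dist_scaled^2)

print(c(raw = dist_raw, scaled = dist_scaled))
print(c(raw = share_age_raw, scaled = share_age_scaled))
```

On the raw values, spending accounts for practically the whole distance. After rescaling, age carries most of it for this pair, so the choice of divisor is a modelling decision. Base R's scale() function (z-score standardization) is a common alternative that this module does not use.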
data_master <- customer[c("Jenis.Kelamin","Jenis.Kelamin.1","Profesi","Profesi.1","Tipe.Residen","Tipe.Residen.1")]
data_master
## Jenis.Kelamin Jenis.Kelamin.1 Profesi Profesi.1 Tipe.Residen
## 1 Pria 1 Wiraswasta 5 Sector
## 2 Wanita 2 Pelajar 3 Cluster
## 3 Pria 1 Professional 4 Cluster
## 4 Pria 1 Professional 4 Cluster
## 5 Wanita 2 Wiraswasta 5 Cluster
## 6 Wanita 2 Professional 4 Cluster
## 7 Pria 1 Wiraswasta 5 Sector
## 8 Pria 1 Professional 4 Cluster
## 9 Wanita 2 Professional 4 Sector
## 10 Pria 1 Professional 4 Cluster
## 11 Wanita 2 Professional 4 Sector
## 12 Wanita 2 Professional 4 Sector
## 13 Wanita 2 Wiraswasta 5 Cluster
## 14 Pria 1 Wiraswasta 5 Cluster
## 15 Wanita 2 Wiraswasta 5 Cluster
## 16 Pria 1 Professional 4 Cluster
## 17 Wanita 2 Ibu Rumah Tangga 1 Cluster
## 18 Wanita 2 Ibu Rumah Tangga 1 Cluster
## 19 Wanita 2 Wiraswasta 5 Cluster
## 20 Wanita 2 Pelajar 3 Sector
## 21 Wanita 2 Wiraswasta 5 Cluster
## 22 Wanita 2 Professional 4 Cluster
## 23 Pria 1 Professional 4 Cluster
## 24 Wanita 2 Wiraswasta 5 Cluster
## 25 Wanita 2 Wiraswasta 5 Sector
## 26 Wanita 2 Professional 4 Cluster
## 27 Wanita 2 Wiraswasta 5 Cluster
## 28 Wanita 2 Ibu Rumah Tangga 1 Cluster
## 29 Wanita 2 Professional 4 Cluster
## 30 Wanita 2 Ibu Rumah Tangga 1 Sector
## 31 Wanita 2 Mahasiswa 2 Cluster
## 32 Wanita 2 Wiraswasta 5 Sector
## 33 Wanita 2 Mahasiswa 2 Cluster
## 34 Wanita 2 Wiraswasta 5 Sector
## 35 Wanita 2 Professional 4 Sector
## 36 Wanita 2 Wiraswasta 5 Cluster
## 37 Wanita 2 Professional 4 Sector
## 38 Wanita 2 Wiraswasta 5 Sector
## 39 Wanita 2 Professional 4 Cluster
## 40 Wanita 2 Pelajar 3 Sector
## 41 Wanita 2 Ibu Rumah Tangga 1 Cluster
## 42 Wanita 2 Wiraswasta 5 Cluster
## 43 Wanita 2 Professional 4 Cluster
## 44 Wanita 2 Wiraswasta 5 Cluster
## 45 Wanita 2 Professional 4 Cluster
## 46 Wanita 2 Wiraswasta 5 Sector
## 47 Wanita 2 Ibu Rumah Tangga 1 Cluster
## 48 Wanita 2 Wiraswasta 5 Sector
## 49 Wanita 2 Ibu Rumah Tangga 1 Sector
## 50 Wanita 2 Wiraswasta 5 Sector
## Tipe.Residen.1
## 1 2
## 2 1
## 3 1
## 4 1
## 5 1
## 6 1
## 7 2
## 8 1
## 9 2
## 10 1
## 11 2
## 12 2
## 13 1
## 14 1
## 15 1
## 16 1
## 17 1
## 18 1
## 19 1
## 20 2
## 21 1
## 22 1
## 23 1
## 24 1
## 25 2
## 26 1
## 27 1
## 28 1
## 29 1
## 30 2
## 31 1
## 32 2
## 33 1
## 34 2
## 35 2
## 36 1
## 37 2
## 38 2
## 39 1
## 40 2
## 41 1
## 42 1
## 43 1
## 44 1
## 45 1
## 46 2
## 47 1
## 48 2
## 49 2
## 50 2
data_master shows each category name next to its numeric code for every encoded column.
We can also summarize these mappings with the unique function:
Profesi <- unique(customer[c("Profesi","Profesi.1")])
Profesi
## Profesi Profesi.1
## 1 Wiraswasta 5
## 2 Pelajar 3
## 3 Professional 4
## 17 Ibu Rumah Tangga 1
## 31 Mahasiswa 2
Jenis.Kelamin <- unique(customer[c("Jenis.Kelamin","Jenis.Kelamin.1")])
Jenis.Kelamin
## Jenis.Kelamin Jenis.Kelamin.1
## 1 Pria 1
## 2 Wanita 2
Tipe.Residen <- unique(customer[c("Tipe.Residen","Tipe.Residen.1")])
Tipe.Residen
## Tipe.Residen Tipe.Residen.1
## 1 Sector 2
## 2 Cluster 1
name_field <- c("Jenis.Kelamin.1","Umur","Profesi.1","Tipe.Residen.1", "NilaiBelanjaSetahun")
set.seed(100)
segmentation <- kmeans(x=customer[name_field], centers=5, nstart=25)
segmentation
## K-means clustering with 5 clusters of sizes 5, 12, 14, 9, 10
##
## Cluster means:
## Jenis.Kelamin.1 Umur Profesi.1 Tipe.Residen.1 NilaiBelanjaSetahun
## 1 1.40 61.80000 4.200000 1.400000 8.696132
## 2 1.75 31.58333 3.916667 1.250000 7.330958
## 3 2.00 20.07143 3.571429 1.357143 5.901089
## 4 2.00 42.33333 4.000000 1.555556 8.804791
## 5 1.70 52.50000 3.800000 1.300000 6.018321
##
## Clustering vector:
## [1] 1 3 5 5 4 3 1 5 2 2 5 5 1 1 3 2 2 1 2 3 4 5 2 4 2 5 2 4 5 4 3 4 3 3 4 2 3 4
## [39] 3 3 3 2 2 3 3 3 5 4 2 5
##
## Within cluster sum of squares by cluster:
## [1] 58.21123 174.85164 316.73367 171.67372 108.49735
## (between_SS / total_SS = 92.4 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
Based on k-means clustering with 5 clusters, cluster 1 has 5 records, cluster 2 has 12, cluster 3 has 14, cluster 4 has 9, and cluster 5 has 10. The ratio of between_SS to total_SS is 92.4%. This high value indicates that most of the total variation in the data is explained by the separation between the 5 clusters.
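The between_SS / total_SS percentage in the printed summary can also be computed directly from the components of the kmeans result. The sketch below uses hypothetical synthetic data rather than the customer table, only to show which components are involved:

```r
# Hypothetical toy data: two well-separated groups of 2-D points
set.seed(100)
toy <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2)
)

fit <- kmeans(toy, centers = 2, nstart = 25)

# The total sum of squares splits into a within-cluster and a between-cluster part:
# totss = tot.withinss + betweenss
ratio <- fit$betweenss / fit$totss

print(fit$size)               # number of points in each cluster
print(round(100 * ratio, 1))  # percentage shown in the kmeans summary
```

Because the within-cluster part generally shrinks as more clusters are added, this ratio tends to rise with the number of clusters, so it is best read together with the number of clusters chosen.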
Add the segmentation results to the customer data:
customer$cluster <- segmentation$cluster
To view the data in each cluster, use the following code:
customer[which(customer$cluster == 1),] #data from cluster 1
## Customer_ID Nama.Pelanggan Jenis.Kelamin Umur Profesi Tipe.Residen
## 1 CUST-001 Budi Anggara Pria 58 Wiraswasta Sector
## 7 CUST-007 Cahyono, Agus Pria 64 Wiraswasta Sector
## 13 CUST-013 Cahaya Putri Wanita 64 Wiraswasta Cluster
## 14 CUST-014 Mario Setiawan Pria 60 Wiraswasta Cluster
## 18 CUST-018 Nelly Halim Wanita 63 Ibu Rumah Tangga Cluster
## NilaiBelanjaSetahun Jenis.Kelamin.1 Profesi.1 Tipe.Residen.1 cluster
## 1 9.497927 1 5 2 1
## 7 9.837260 1 5 2 1
## 13 9.333168 2 5 1 1
## 14 9.471615 1 5 1 1
## 18 5.340690 2 1 1 1
customer[which(customer$cluster == 2),] #data from cluster 2
## Customer_ID Nama.Pelanggan Jenis.Kelamin Umur Profesi
## 9 CUST-009 Elisabeth Suryadinata Wanita 29 Professional
## 10 CUST-010 Mario Setiawan Pria 33 Professional
## 16 CUST-016 Bambang Rudi Pria 35 Professional
## 17 CUST-017 Yuni Sari Wanita 32 Ibu Rumah Tangga
## 19 CUST-019 Mega Pranoto Wanita 32 Wiraswasta
## 23 CUST-023 Denny Amiruddin Pria 34 Professional
## 25 CUST-025 Julia Setiawan Wanita 29 Wiraswasta
## 27 CUST-027 Grace Mulyati Wanita 35 Wiraswasta
## 36 CUST-036 Ni Made Suasti Wanita 30 Wiraswasta
## 42 CUST-042 Yuliana Wati Wanita 26 Wiraswasta
## 43 CUST-043 Yenna Sumadi Wanita 31 Professional
## 49 CUST-049 Josephine Wahab Wanita 33 Ibu Rumah Tangga
## Tipe.Residen NilaiBelanjaSetahun Jenis.Kelamin.1 Profesi.1 Tipe.Residen.1
## 9 Sector 5.993218 2 4 2
## 10 Cluster 5.257448 1 4 1
## 16 Cluster 5.262521 1 4 1
## 17 Cluster 5.677762 2 1 1
## 19 Cluster 10.884508 2 5 1
## 23 Cluster 5.239290 1 4 1
## 25 Sector 10.721998 2 5 2
## 27 Cluster 9.114159 2 5 1
## 36 Cluster 9.678994 2 5 1
## 42 Cluster 9.880607 2 5 1
## 43 Cluster 5.268410 2 4 1
## 49 Sector 4.992585 2 1 2
## cluster
## 9 2
## 10 2
## 16 2
## 17 2
## 19 2
## 23 2
## 25 2
## 27 2
## 36 2
## 42 2
## 43 2
## 49 2
customer[which(customer$cluster == 3),] #data from cluster 3
## Customer_ID Nama.Pelanggan Jenis.Kelamin Umur Profesi
## 2 CUST-002 Shirley Ratuwati Wanita 14 Pelajar
## 6 CUST-006 Rosalina Kurnia Wanita 24 Professional
## 15 CUST-015 Shirley Ratuwati Wanita 20 Wiraswasta
## 20 CUST-020 Irene Novianto Wanita 16 Pelajar
## 31 CUST-031 Eviana Handry Wanita 19 Mahasiswa
## 33 CUST-033 Cecilia Kusnadi Wanita 19 Mahasiswa
## 34 CUST-034 Deasy Arisandi Wanita 21 Wiraswasta
## 37 CUST-037 Felicia Tandiono Wanita 25 Professional
## 39 CUST-039 Gina Hidayat Wanita 20 Professional
## 40 CUST-040 Irene Darmawan Wanita 14 Pelajar
## 41 CUST-041 Shinta Aritonang Wanita 24 Ibu Rumah Tangga
## 44 CUST-044 Anna Wanita 18 Wiraswasta
## 45 CUST-045 Rismawati Juni Wanita 22 Professional
## 46 CUST-046 Elfira Surya Wanita 25 Wiraswasta
## Tipe.Residen NilaiBelanjaSetahun Jenis.Kelamin.1 Profesi.1 Tipe.Residen.1
## 2 Cluster 2.722700 2 3 1
## 6 Cluster 5.215541 2 4 1
## 15 Cluster 10.365668 2 5 1
## 20 Sector 2.896845 2 3 2
## 31 Cluster 3.042773 2 2 1
## 33 Cluster 3.047926 2 2 1
## 34 Sector 9.759822 2 5 2
## 37 Sector 5.972787 2 4 2
## 39 Cluster 5.257775 2 4 1
## 40 Sector 2.861855 2 3 2
## 41 Cluster 6.820976 2 1 1
## 44 Cluster 9.339737 2 5 1
## 45 Cluster 5.211041 2 4 1
## 46 Sector 10.099807 2 5 2
## cluster
## 2 3
## 6 3
## 15 3
## 20 3
## 31 3
## 33 3
## 34 3
## 37 3
## 39 3
## 40 3
## 41 3
## 44 3
## 45 3
## 46 3
customer[which(customer$cluster == 4),] #data from cluster 4
## Customer_ID Nama.Pelanggan Jenis.Kelamin Umur Profesi
## 5 CUST-005 Ibu Sri Wahyuni, IR Wanita 41 Wiraswasta
## 21 CUST-021 Lestari Fabianto Wanita 38 Wiraswasta
## 24 CUST-024 Putri Ginting Wanita 39 Wiraswasta
## 28 CUST-028 Adeline Huang Wanita 40 Ibu Rumah Tangga
## 30 CUST-030 Rosita Saragih Wanita 46 Ibu Rumah Tangga
## 32 CUST-032 Chintya Winarni Wanita 47 Wiraswasta
## 35 CUST-035 Ida Ayu Wanita 39 Professional
## 38 CUST-038 Agatha Salim Wanita 46 Wiraswasta
## 48 CUST-048 Maria Hutagalung Wanita 45 Wiraswasta
## Tipe.Residen NilaiBelanjaSetahun Jenis.Kelamin.1 Profesi.1 Tipe.Residen.1
## 5 Cluster 10.615206 2 5 1
## 21 Cluster 9.222070 2 5 1
## 24 Cluster 10.259572 2 5 1
## 28 Cluster 6.631680 2 1 1
## 30 Sector 5.020976 2 1 2
## 32 Sector 10.663179 2 5 2
## 35 Sector 5.962575 2 4 2
## 38 Sector 10.477127 2 5 2
## 48 Sector 10.390732 2 5 2
## cluster
## 5 4
## 21 4
## 24 4
## 28 4
## 30 4
## 32 4
## 35 4
## 38 4
## 48 4
customer[which(customer$cluster == 5),] #data from cluster 5
## Customer_ID Nama.Pelanggan Jenis.Kelamin Umur Profesi
## 3 CUST-003 Agus Cahyono Pria 48 Professional
## 4 CUST-004 Antonius Winarta Pria 53 Professional
## 8 CUST-008 Danang Santosa Pria 52 Professional
## 11 CUST-011 Maria Suryawan Wanita 50 Professional
## 12 CUST-012 Erliana Widjaja Wanita 49 Professional
## 22 CUST-022 Novita Purba Wanita 52 Professional
## 26 CUST-026 Christine Winarto Wanita 55 Professional
## 29 CUST-029 Tia Hartanti Wanita 56 Professional
## 47 CUST-047 Mira Kurnia Wanita 55 Ibu Rumah Tangga
## 50 CUST-050 Lianna Nugraha Wanita 55 Wiraswasta
## Tipe.Residen NilaiBelanjaSetahun Jenis.Kelamin.1 Profesi.1 Tipe.Residen.1
## 3 Cluster 5.286429 1 4 1
## 4 Cluster 5.204498 1 4 1
## 8 Cluster 5.223569 1 4 1
## 11 Sector 5.987367 2 4 2
## 12 Sector 5.941914 2 4 2
## 22 Cluster 5.298157 2 4 1
## 26 Cluster 5.269392 2 4 1
## 29 Cluster 5.271845 2 4 1
## 47 Cluster 6.130724 2 1 1
## 50 Sector 10.569316 2 5 2
## cluster
## 3 5
## 4 5
## 8 5
## 11 5
## 12 5
## 22 5
## 26 5
## 29 5
## 47 5
## 50 5
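Besides listing the members, it can help to profile each cluster by its column means. The sketch below copies the age and rescaled spending values of clusters 1 and 4 from the listings above and averages them per cluster. The data frame name members is illustrative. On the full table, the same aggregate call could presumably be applied to customer directly.

```r
# Rows copied from the cluster 1 and cluster 4 listings above
# (age and rescaled annual spending only)
members <- data.frame(
  cluster = c(rep(1, 5), rep(4, 9)),
  Umur = c(58, 64, 64, 60, 63,
           41, 38, 39, 40, 46, 47, 39, 46, 45),
  NilaiBelanjaSetahun = c(9.497927, 9.837260, 9.333168, 9.471615, 5.340690,
                          10.615206, 9.222070, 10.259572, 6.631680, 5.020976,
                          10.663179, 5.962575, 10.477127, 10.390732)
)

# Mean of each selected column per cluster
profile <- aggregate(cbind(Umur, NilaiBelanjaSetahun) ~ cluster,
                     data = members, FUN = mean)
print(profile)
```

The resulting means agree with the corresponding rows of the "Cluster means" table in the kmeans output, since a k-means center is the mean of its members.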
sse <- sapply(1:10, function(param_k) {kmeans(x=customer[name_field], param_k, nstart=25)$tot.withinss})
sse
## [1] 10990.9740 3016.5612 1550.8725 1064.4187 829.9676 625.1462
## [7] 508.1568 431.6977 374.1095 317.9424
library(ggplot2)
jumlah_cluster_max <- 10
ssdata <- data.frame(cluster=c(1:jumlah_cluster_max), sse)
# cluster is numeric, so use a continuous x scale with one break per number of clusters
ggplot(ssdata, aes(x=cluster, y=sse)) +
  geom_line(color="red") + geom_point() +
  ylab("Within Cluster Sum of Squares") + xlab("Number of Clusters") +
  geom_text(aes(label=format(round(sse, 2), nsmall = 2)), hjust=-0.2, vjust=-0.5) +
  scale_x_continuous(breaks=c(1:jumlah_cluster_max))
The leftmost point is the total within-cluster sum of squares (SS) for 1 cluster, the second point is for 2 clusters, and so on. Notice that the drop between consecutive points gets smaller as we move to the right. The line is shaped like an elbow, and the optimal number of clusters is usually taken at the bend of the elbow. In the example above, we can take 4 or 5.
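The elbow can also be inspected numerically by looking at how much the SSE drops with each additional cluster. The sketch below reuses the SSE values printed above. The names sse_drop, pct_drop, and elbow_table are illustrative.

```r
# SSE values copied from the output above (k = 1..10)
sse <- c(10990.9740, 3016.5612, 1550.8725, 1064.4187, 829.9676,
         625.1462, 508.1568, 431.6977, 374.1095, 317.9424)

# Absolute drop in SSE when moving from k-1 to k clusters
sse_drop <- -diff(sse)

# The same drop as a percentage of the SSE at k-1
pct_drop <- round(100 * sse_drop / head(sse, -1), 1)

elbow_table <- data.frame(k = 2:10, sse_drop = sse_drop, pct_drop = pct_drop)
print(elbow_table)
```

The absolute drops shrink quickly over the first few values of k and then level off, which is the numeric counterpart of the bend in the plot.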
Next, we give each cluster a name and store the names in a data frame called Customer.Segment:
Customer.Segment <- data.frame(Cluster = c(1,2,3,4,5),
                               Name.Segment = c("Diamond Senior",
                                                "Gold Young Professional",
                                                "Silver Youth Gals",
                                                "Diamond Professional",
                                                "Silver Mid Professional"))
Customer.Segment
## Cluster Name.Segment
## 1 1 Diamond Senior
## 2 2 Gold Young Professional
## 3 3 Silver Youth Gals
## 4 4 Diamond Professional
## 5 5 Silver Mid Professional
After that, we combine all the reference objects into one list named Cluster.Identity:
Cluster.Identity <- list(Profession=Profesi,
                         Gender=Jenis.Kelamin,
                         Resident_Type=Tipe.Residen,
                         Segmentation=segmentation,
                         Customer_Segmen=Customer.Segment,
                         Name_Field=name_field)
print(Cluster.Identity)
## $Profession
## Profesi Profesi.1
## 1 Wiraswasta 5
## 2 Pelajar 3
## 3 Professional 4
## 17 Ibu Rumah Tangga 1
## 31 Mahasiswa 2
##
## $Gender
## Jenis.Kelamin Jenis.Kelamin.1
## 1 Pria 1
## 2 Wanita 2
##
## $Resident_Type
## Tipe.Residen Tipe.Residen.1
## 1 Sector 2
## 2 Cluster 1
##
## $Segmentation
## K-means clustering with 5 clusters of sizes 5, 12, 14, 9, 10
##
## Cluster means:
## Jenis.Kelamin.1 Umur Profesi.1 Tipe.Residen.1 NilaiBelanjaSetahun
## 1 1.40 61.80000 4.200000 1.400000 8.696132
## 2 1.75 31.58333 3.916667 1.250000 7.330958
## 3 2.00 20.07143 3.571429 1.357143 5.901089
## 4 2.00 42.33333 4.000000 1.555556 8.804791
## 5 1.70 52.50000 3.800000 1.300000 6.018321
##
## Clustering vector:
## [1] 1 3 5 5 4 3 1 5 2 2 5 5 1 1 3 2 2 1 2 3 4 5 2 4 2 5 2 4 5 4 3 4 3 3 4 2 3 4
## [39] 3 3 3 2 2 3 3 3 5 4 2 5
##
## Within cluster sum of squares by cluster:
## [1] 58.21123 174.85164 316.73367 171.67372 108.49735
## (between_SS / total_SS = 92.4 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
##
## $Customer_Segmen
## Cluster Name.Segment
## 1 1 Diamond Senior
## 2 2 Gold Young Professional
## 3 3 Silver Youth Gals
## 4 4 Diamond Professional
## 5 5 Silver Mid Professional
##
## $Name_Field
## [1] "Jenis.Kelamin.1" "Umur" "Profesi.1"
## [4] "Tipe.Residen.1" "NilaiBelanjaSetahun"
Based on the k-means cluster analysis, five clusters were chosen as the optimal segmentation. These segments can help the marketing team automate campaigns and deliver messages to the right target.
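One possible way to put the stored segmentation to work is to assign a new customer to the segment whose center is nearest. The sketch below is an illustration under stated assumptions and not a step from the module. The centers and segment names are copied from the output above, the new customer is hypothetical, and the distance is plain squared Euclidean distance on the same five fields.

```r
# Cluster means copied from the kmeans output above
centers <- matrix(
  c(1.40, 61.80000, 4.200000, 1.400000, 8.696132,
    1.75, 31.58333, 3.916667, 1.250000, 7.330958,
    2.00, 20.07143, 3.571429, 1.357143, 5.901089,
    2.00, 42.33333, 4.000000, 1.555556, 8.804791,
    1.70, 52.50000, 3.800000, 1.300000, 6.018321),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(1:5),
                  c("Jenis.Kelamin.1", "Umur", "Profesi.1",
                    "Tipe.Residen.1", "NilaiBelanjaSetahun"))
)
segment_names <- c("Diamond Senior", "Gold Young Professional", "Silver Youth Gals",
                   "Diamond Professional", "Silver Mid Professional")

# Hypothetical new customer, encoded like the training data:
# Wanita (2), age 20, Pelajar (3), Cluster (1), annual spending 3.5 (millions)
new_customer <- c(2, 20, 3, 1, 3.5)

# Squared Euclidean distance from the new customer to each cluster center
sq_dist <- rowSums(sweep(centers, 2, new_customer)^2)

# Index of the closest center and the matching segment name
nearest <- which.min(sq_dist)
print(segment_names[nearest])
```

The new customer must be encoded and rescaled in exactly the same way as the training data. This is where the lookup tables stored in Cluster.Identity become useful.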