Customer Segmentation Analysis Using K-Means Clustering

🔰 Introduction

Customer segmentation helps identify groups of consumers based on their shopping behavior, so that a company can design better-targeted marketing strategies.
This analysis uses the K-Means Clustering algorithm with the following variables:

Income (income)
Recency (time since the last purchase)
MntWines (spending on wine)
NumWebPurchases (number of online purchases)

📌 Importing Data & Libraries

library(readxl)
library(tidyverse)

## Warning: package 'ggplot2' was built under R version 4.5.2

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(cluster)
library(factoextra)

## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

library(MASS)

## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select

library(ggplot2)
library(gridExtra)

## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine

# Import Data
data = read_excel('/Users/zaula/Downloads/marketing_campaign.csv.xlsx')
head(data)

## # A tibble: 6 × 29
##      ID Year_Birth Education  Marital_Status Income Kidhome Teenhome
##   <dbl>      <dbl> <chr>      <chr>           <dbl>   <dbl>    <dbl>
## 1  5524       1957 Graduation Single          58138       0        0
## 2  2174       1954 Graduation Single          46344       1        1
## 3  4141       1965 Graduation Together        71613       0        0
## 4  6182       1984 Graduation Together        26646       1        0
## 5  5324       1981 PhD        Married         58293       1        0
## 6  7446       1967 Master     Together        62513       0        1
## # ℹ 22 more variables: Dt_Customer <dttm>, Recency <dbl>, MntWines <dbl>,
## #   MntFruits <dbl>, MntMeatProducts <dbl>, MntFishProducts <dbl>,
## #   MntSweetProducts <dbl>, MntGoldProds <dbl>, NumDealsPurchases <dbl>,
## #   NumWebPurchases <dbl>, NumCatalogPurchases <dbl>, NumStorePurchases <dbl>,
## #   NumWebVisitsMonth <dbl>, AcceptedCmp3 <dbl>, AcceptedCmp4 <dbl>,
## #   AcceptedCmp5 <dbl>, AcceptedCmp1 <dbl>, AcceptedCmp2 <dbl>, Complain <dbl>,
## #   Z_CostContact <dbl>, Z_Revenue <dbl>, Response <dbl>

📌 Variable Selection & Standardization

customer_data <- data[, c("Income", "Recency", "MntWines", "NumWebPurchases")]
customer_data <- na.omit(customer_data)
scaled_data <- scale(customer_data)
head(scaled_data)

##          Income    Recency   MntWines NumWebPurchases
## [1,]  0.2340099  0.3104621  0.9780050       1.4282310
## [2,] -0.2345065 -0.3804236 -0.8718271      -1.1256271
## [3,]  0.7693040 -0.7949549  0.3584298       1.4282310
## [4,] -1.0170092 -0.7949549 -0.8718271      -0.7607902
## [5,]  0.2401673  1.5540562 -0.3915822       0.3337204
## [6,]  0.4078067 -1.1403978  0.6370904       0.6985572

📌 Interpretation:
Standardization puts all variables on a common scale (mean 0, standard deviation 1), so that no variable dominates the distance calculations in the clustering simply because of its units.
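
As a minimal sketch of what `scale()` does, the toy data frame below (hypothetical values, not taken from the marketing data) is standardized and compared with a manual z-score. The names `toy`, `toy_scaled`, and `manual` are illustrative only.

```r
# Toy data with very different units (hypothetical values, for illustration only)
toy <- data.frame(
  Income  = c(20000, 45000, 60000, 85000, 120000),
  Recency = c(5, 30, 60, 75, 99)
)

# scale() subtracts each column mean and divides by the column standard deviation
toy_scaled <- scale(toy)

# Manual z-score for comparison
manual <- sapply(toy, function(x) (x - mean(x)) / sd(x))

# Column means are 0 (up to floating-point error) and standard deviations are 1
round(colMeans(toy_scaled), 10)
apply(toy_scaled, 2, sd)
```

Without this step, Income (tens of thousands) would contribute far more to Euclidean distances than Recency (tens), and the clusters would mostly reflect Income.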

📌 Determining the Optimal Number of Clusters (Elbow Method)

set.seed(123)
elbow_plot <- fviz_nbclust(scaled_data, kmeans, method = "wss") +
  ggtitle("Elbow Method - Optimal Number of Clusters") +
  theme_minimal()
elbow_plot

📌 Interpretation:
The elbow plot suggests k = 3 as the number of clusters, because beyond k = 3 the within-cluster sum of squares declines more gradually.
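
The quantity plotted by the elbow method is the total within-cluster sum of squares for each k. A rough sketch of that computation in base R follows; it uses synthetic data as a stand-in for `scaled_data`, and the names `x`, `k_values`, and `wss` are assumptions made for this illustration.

```r
set.seed(123)

# Synthetic stand-in for scaled_data: three Gaussian blobs in two dimensions
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 4), ncol = 2),
  matrix(rnorm(100, mean = 8), ncol = 2)
)

# Total within-cluster sum of squares for k = 1..8
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is the k after which the curve flattens
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The elbow is a visual heuristic, so it is worth cross-checking the chosen k with another criterion when the bend is not sharp.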

📌 K-Means Clustering

k <- 3
kmeans_result <- kmeans(scaled_data, centers = k, nstart = 25)
customer_data$Cluster <- as.factor(kmeans_result$cluster)
kmeans_result

## K-means clustering with 3 clusters of sizes 897, 659, 660
## 
## Cluster means:
##       Income    Recency   MntWines NumWebPurchases
## 1  0.7520352  0.0390372  0.9633446       0.7953590
## 2 -0.5267625  0.8711907 -0.6732837      -0.5592718
## 3 -0.4961198 -0.9229258 -0.6370092      -0.5225407
## 
## Clustering vector:
##    [1] 1 3 1 3 2 1 1 3 3 2 2 2 1 3 1 2 3 1 2 3 3 1 1 2 2 3 3 1 3 2 3 3 3 2 1 2 2
##   [38] 1 1 2 3 3 2 3 2 1 3 1 2 1 1 1 1 3 1 1 1 1 1 1 2 3 1 3 1 3 1 1 2 2 1 1 3 1
##   [75] 2 3 2 2 3 2 3 3 1 2 2 2 3 1 3 1 3 3 3 1 1 1 2 3 1 3 1 1 1 1 1 3 3 1 2 2 3
##  [112] 1 3 2 2 1 2 1 2 1 3 1 1 2 1 3 2 3 2 1 1 1 1 3 1 3 2 2 2 2 1 3 3 1 1 3 2 3
##  [149] 1 3 1 2 1 1 2 3 2 3 2 2 2 3 3 2 1 1 2 2 1 2 2 1 3 2 3 2 1 1 3 2 1 2 2 3 3
##  [186] 1 1 1 2 1 1 1 1 2 3 2 2 2 1 3 1 2 1 2 2 3 1 3 1 2 1 1 3 1 3 1 1 1 2 3 1 2
##  [223] 3 1 2 3 1 2 3 1 1 2 1 1 3 1 1 1 1 2 2 1 2 1 3 1 2 2 3 2 1 2 3 3 3 1 2 3 3
##  [260] 1 3 3 2 3 1 1 3 3 1 3 1 3 2 3 2 1 2 1 2 3 2 1 2 3 1 3 3 3 1 2 1 2 3 2 1 2
##  [297] 1 1 3 2 3 1 3 2 2 3 3 3 2 3 1 1 1 3 3 3 3 2 3 1 3 2 3 1 3 1 1 1 3 1 2 3 2
##  [334] 2 3 3 2 1 1 1 1 1 3 2 3 1 2 1 1 2 3 2 1 1 2 1 1 2 2 2 1 3 2 3 3 1 2 3 2 3
##  [371] 2 3 3 3 1 2 1 1 3 1 2 1 1 3 2 2 2 2 1 2 3 1 2 3 3 3 1 2 1 1 2 1 1 3 1 1 1
##  [408] 3 3 2 1 1 3 1 1 2 3 1 1 1 1 3 2 1 1 2 2 2 2 3 3 2 2 2 1 3 2 1 1 3 2 1 2 1
##  [445] 2 2 1 3 2 3 1 3 1 1 3 1 1 1 3 1 2 2 1 2 1 1 1 2 2 3 3 1 1 1 1 3 2 1 2 1 1
##  [482] 1 2 3 1 1 3 2 3 3 1 2 1 2 1 2 1 2 3 2 1 2 1 3 2 3 1 2 1 3 1 3 2 1 2 1 2 2
##  [519] 1 1 2 2 2 3 1 2 2 3 3 3 1 2 2 1 2 3 3 2 3 1 2 3 2 1 1 3 1 3 1 1 3 3 3 1 2
##  [556] 3 3 3 2 3 3 2 1 3 2 3 2 2 2 2 2 2 3 1 2 1 2 3 1 2 3 3 3 2 3 3 2 3 1 2 1 2
##  [593] 3 2 2 3 1 2 2 3 3 1 2 2 1 3 3 1 2 1 2 1 2 1 1 2 2 1 1 3 3 1 3 1 3 1 1 1 1
##  [630] 1 3 1 2 1 3 1 1 1 3 1 2 3 1 2 2 1 2 1 2 1 3 2 2 3 3 3 2 1 2 1 1 1 3 1 2 3
##  [667] 3 1 1 3 2 2 2 1 1 1 3 2 1 2 3 3 2 2 3 1 1 1 1 1 1 3 1 3 1 1 3 3 1 3 1 2 1
##  [704] 1 3 1 3 1 1 3 1 3 2 1 1 1 1 3 1 1 2 1 1 1 3 3 1 1 3 2 2 1 1 3 3 3 1 1 2 1
##  [741] 1 1 1 1 1 3 2 3 1 1 3 2 2 1 1 2 1 3 1 3 3 2 2 2 1 2 1 1 3 3 3 3 3 2 3 1 3
##  [778] 1 1 2 3 2 3 1 1 1 3 2 2 2 1 1 1 2 3 1 1 3 3 3 1 1 1 2 2 1 3 1 3 1 2 1 1 1
##  [815] 2 1 3 2 2 1 3 2 1 3 1 3 1 3 2 3 2 3 1 1 1 2 2 3 1 1 3 3 1 3 1 2 1 2 3 2 2
##  [852] 2 3 3 2 2 2 1 1 2 3 1 1 1 3 1 2 3 3 2 2 1 1 2 3 1 1 2 2 2 2 1 1 1 1 2 3 1
##  [889] 3 2 3 3 1 1 3 2 3 1 1 1 3 1 1 1 1 3 1 3 2 2 2 1 2 1 1 1 1 1 2 1 2 2 2 1 1
##  [926] 1 1 1 1 2 1 2 1 2 2 2 1 2 3 3 2 2 2 2 2 3 1 1 1 3 2 3 2 1 3 2 1 1 3 3 1 1
##  [963] 3 1 1 2 1 2 3 2 1 1 1 1 1 1 3 1 3 1 2 2 2 1 2 1 3 1 1 1 3 2 3 1 1 3 2 1 2
## [1000] 2 3 3 2 1 2 3 3 2 2 1 2 2 3 2 2 3 1 1 1 3 2 1 2 3 3 3 2 1 2 2 2 3 3 2 1 2
## [1037] 1 1 3 1 3 2 2 2 3 1 2 1 1 3 3 3 3 1 1 2 2 2 1 1 2 3 1 1 3 3 2 2 3 1 2 1 1
## [1074] 3 1 2 3 1 2 3 3 3 2 1 2 3 1 1 3 2 2 3 1 3 3 1 2 1 1 2 1 2 1 3 3 3 1 1 3 2
## [1111] 3 3 2 1 2 2 1 1 2 3 1 1 2 2 1 3 2 1 2 2 2 1 2 3 1 1 3 1 1 2 1 3 2 3 1 1 1
## [1148] 2 1 1 1 3 1 3 3 1 1 2 2 1 1 2 2 3 3 2 1 1 2 1 2 3 2 2 1 3 3 1 1 2 2 2 1 2
## [1185] 1 1 3 3 1 3 2 2 1 1 3 2 2 2 2 1 1 2 3 3 2 1 2 3 2 1 3 2 1 1 3 2 2 2 2 3 2
## [1222] 3 1 2 1 2 3 3 2 1 3 2 2 2 2 2 1 1 1 1 1 1 1 1 1 3 1 3 1 1 3 3 1 1 2 3 1 1
## [1259] 1 3 1 2 1 3 2 1 3 1 1 2 3 2 3 1 2 1 1 3 2 3 2 3 3 2 1 1 3 1 1 2 3 1 1 1 1
## [1296] 1 1 1 1 2 3 2 3 2 2 2 2 1 3 1 3 3 1 2 2 3 1 2 3 1 1 1 2 3 3 3 3 2 3 1 3 2
## [1333] 3 3 3 1 1 1 1 3 3 1 1 2 1 1 3 1 3 3 1 3 1 1 2 3 3 2 2 2 2 1 3 3 3 3 2 2 3
## [1370] 1 3 3 1 3 2 3 3 3 1 2 2 1 3 2 1 3 2 1 2 2 1 3 3 1 1 2 1 2 2 3 3 3 3 3 3 1
## [1407] 2 2 3 2 2 2 2 2 2 1 2 3 2 2 3 2 3 2 3 2 1 1 2 1 1 1 1 3 3 1 2 3 3 2 2 1 2
## [1444] 1 3 3 3 3 3 1 1 3 1 2 2 2 1 1 3 1 3 2 1 1 1 3 2 1 1 1 1 1 2 1 3 1 1 2 2 3
## [1481] 1 1 1 2 2 1 1 1 1 1 1 3 1 1 2 1 1 3 2 1 3 2 3 1 3 2 3 2 1 1 2 1 2 1 2 1 2
## [1518] 2 2 2 3 1 1 3 1 2 2 1 3 3 2 1 3 2 3 1 2 1 2 2 2 2 1 3 1 3 3 3 3 1 1 1 1 1
## [1555] 2 1 3 1 3 3 1 2 1 3 1 1 2 3 3 2 3 3 1 3 2 3 2 2 3 2 2 1 3 1 1 1 2 2 2 1 1
## [1592] 1 3 1 3 1 2 3 3 3 3 3 1 2 3 1 1 2 3 3 3 3 2 1 2 3 3 2 1 1 3 3 3 2 1 1 2 1
## [1629] 3 1 1 1 3 2 1 3 1 2 2 3 1 1 1 1 2 1 3 3 2 1 2 1 1 2 1 1 1 1 3 3 3 3 3 1 2
## [1666] 3 2 2 2 1 1 1 1 1 1 3 2 2 2 2 1 3 1 1 2 2 1 3 3 1 2 1 2 1 1 2 1 3 2 1 3 2
## [1703] 1 2 1 1 1 3 3 3 2 1 1 3 2 2 1 1 1 1 1 1 3 2 3 3 1 1 1 2 1 1 1 1 3 3 1 3 3
## [1740] 2 3 2 1 1 3 3 1 2 1 3 3 2 3 3 2 2 1 3 3 2 2 3 3 2 2 1 1 3 3 3 2 1 1 3 3 1
## [1777] 2 2 2 3 2 1 1 1 1 2 3 2 1 1 2 1 1 2 1 1 2 1 2 1 1 2 3 2 3 3 1 1 2 2 3 1 3
## [1814] 3 3 1 3 3 3 3 2 1 1 2 2 2 2 1 1 3 3 1 2 1 2 1 3 1 1 2 2 1 1 1 3 2 1 1 1 2
## [1851] 2 1 1 2 1 2 1 2 3 3 1 1 1 1 3 2 1 2 1 3 1 1 2 1 1 2 1 2 3 2 1 1 2 3 3 2 2
## [1888] 3 1 1 2 3 2 1 1 1 1 1 3 3 2 3 2 3 1 1 1 1 3 1 1 1 1 2 2 3 3 1 3 3 1 3 2 1
## [1925] 2 2 1 2 2 3 1 2 3 2 3 1 1 3 1 2 2 1 3 1 1 3 3 3 1 1 1 1 1 1 3 3 3 2 1 3 3
## [1962] 3 2 2 3 1 3 1 3 3 3 3 3 1 2 1 1 2 1 1 3 3 2 2 3 3 3 3 2 1 2 2 1 1 3 1 1 1
## [1999] 3 3 2 3 2 3 2 2 1 1 2 3 3 1 2 1 1 2 1 2 3 1 3 2 1 2 1 3 2 2 2 2 1 1 1 3 1
## [2036] 1 3 2 2 1 1 1 2 1 2 1 1 3 1 2 2 1 1 2 3 1 3 3 2 3 1 1 1 1 1 3 3 3 2 3 1 1
## [2073] 2 1 3 1 1 3 1 3 1 3 3 3 3 3 1 1 1 2 3 2 1 3 1 1 2 3 2 3 2 2 1 1 1 1 2 2 1
## [2110] 1 2 3 1 3 3 2 3 3 3 2 3 1 1 3 3 3 3 3 1 2 3 3 3 2 3 1 2 1 1 3 1 3 2 1 2 1
## [2147] 2 1 1 1 1 1 1 1 3 1 2 2 1 2 3 3 1 1 1 3 3 1 2 3 1 1 2 3 1 3 2 2 1 3 1 3 2
## [2184] 1 2 2 3 1 1 3 1 3 2 2 1 3 2 2 1 3 3 1 1 1 3 3 2 1 2 1 2 1 1 1 1 3
## 
## Within cluster sum of squares by cluster:
## [1] 3174.2102  689.2913  727.4047
##  (between_SS / total_SS =  48.2 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

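📌 Interpretation:
The algorithm partitioned the 2,216 customers into 3 clusters (897, 659, and 660 customers) based on similarity in purchasing behavior. The partition accounts for 48.2% of the total sum of squares (between_SS / total_SS).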
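
The `between_SS / total_SS` figure in the printed output can be read directly from the fitted object. The sketch below shows this on the built-in `iris` data, used here only as a stand-in for the marketing data; `x`, `km`, and `ratio` are illustrative names.

```r
set.seed(123)

# Built-in data as a stand-in for scaled_data
x <- scale(iris[, 1:4])
km <- kmeans(x, centers = 3, nstart = 25)

# Share of the total sum of squares explained by the partition
ratio <- km$betweenss / km$totss
round(100 * ratio, 1)

# Cluster sizes, as shown in the first line of the printed kmeans object
km$size
```

A ratio closer to 100% means tighter, better separated clusters. The ratio always increases with k, so it is more useful for comparing fits with the same k than for choosing k.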

📌 Cluster Visualization

# With more than two variables, fviz_cluster() plots the observations
# on the first two principal components
cluster_plot <- fviz_cluster(kmeans_result, data = scaled_data,
                             palette = c("#2E9FDF", "#00AFBB", "#E7B800"),
                             geom = "point",
                             ellipse.type = "convex",
                             ggtheme = theme_minimal()) +
  ggtitle("K-Means Clustering Result (k = 3)")
cluster_plot

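📌 Interpretation:
The plot shows the three customer groups in a two-dimensional projection. Visual separation in this projection is suggestive, but it is not by itself evidence that the segmentation is stable; the between_SS / total_SS ratio of 48.2% in the output above points to moderate separation.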
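
One way to quantify cluster separation, rather than judging it by eye, is the average silhouette width from the `cluster` package. This is a sketch on the built-in `faithful` data, not a result for the marketing data, and `x`, `km`, `sil`, and `avg_width` are illustrative names.

```r
library(cluster)

set.seed(123)

# Built-in data as a stand-in for scaled_data
x <- scale(faithful)
km <- kmeans(x, centers = 2, nstart = 25)

# Silhouette width per observation: near 1 = well inside its cluster,
# near 0 = on a boundary, negative = probably assigned to the wrong cluster
sil <- silhouette(km$cluster, dist(x))

# Average over all observations and per cluster
avg_width <- mean(sil[, "sil_width"])
avg_width
tapply(sil[, "sil_width"], sil[, "cluster"], mean)
```

Applied to the analysis here, the same call would take `kmeans_result$cluster` and `dist(scaled_data)`.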

📌 Cluster Statistical Profile

cluster_summary <- customer_data %>%
  group_by(Cluster) %>%
  summarise(
    Jumlah_Pelanggan = n(),
    Persentase = round(n() / nrow(customer_data) * 100, 1),
    Income_Rata2 = round(mean(Income), 0),
    Recency_Rata2 = round(mean(Recency), 1),
    MntWines_Rata2 = round(mean(MntWines), 1),
    NumWebPurchases_Rata2 = round(mean(NumWebPurchases), 1),
    .groups = 'drop'
  )
cluster_summary

## # A tibble: 3 × 7
##   Cluster Jumlah_Pelanggan Persentase Income_Rata2 Recency_Rata2 MntWines_Rata2
##   <fct>              <int>      <dbl>        <dbl>         <dbl>          <dbl>
## 1 1                    897       40.5        71178          50.1          630. 
## 2 2                    659       29.7        38987          74.2           78  
## 3 3                    660       29.8        39758          22.3           90.2
## # ℹ 1 more variable: NumWebPurchases_Rata2 <dbl>

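📌 Per-cluster interpretation (based on the table above)

Cluster	Characteristic Pattern	Business Meaning
Cluster 1	High income (mean 71,178), very high wine spending (mean 630), above-average web purchases, mid-range Recency (50.1)	High-value, premium customers
Cluster 2	Lower income (mean 38,987), low wine spending (mean 78), highest Recency (74.2)	Low-spending customers who have not purchased for a long time (lapsed)
Cluster 3	Lower income (mean 39,758), low wine spending (mean 90.2), lowest Recency (22.3)	Low-spending customers who purchased recently (active)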
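
The cluster centers printed by `kmeans()` are in standardized units, while the profile table is in original units. The two are linked by undoing the z-score, as this sketch on the built-in `mtcars` data shows; `raw`, `mu`, `sigma`, `centers_raw`, and `group_means` are names assumed for the illustration.

```r
set.seed(123)

# Built-in data as a stand-in for customer_data
raw <- mtcars[, c("mpg", "hp", "wt")]
x <- scale(raw)
km <- kmeans(x, centers = 3, nstart = 25)

# scale() stores the column means and standard deviations as attributes
mu <- attr(x, "scaled:center")
sigma <- attr(x, "scaled:scale")

# Undo the z-score column by column: original = scaled * sd + mean
centers_raw <- sweep(sweep(km$centers, 2, sigma, "*"), 2, mu, "+")
centers_raw

# The same values obtained as per-cluster means of the raw data
group_means <- aggregate(raw, by = list(Cluster = km$cluster), FUN = mean)
group_means
```

Reading centers in original units makes it easier to check a business label against the numbers, for example whether two clusters really differ in income or only in Recency.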

📌 Distribution of Each Variable per Cluster

p1 <- ggplot(customer_data, aes(x = Cluster, y = Income, fill = Cluster)) +
  geom_boxplot() +
  labs(title = "Income Distribution per Cluster", y = "Income") +
  theme_minimal()

p2 <- ggplot(customer_data, aes(x = Cluster, y = MntWines, fill = Cluster)) +
  geom_boxplot() +
  labs(title = "Wine Spending per Cluster", y = "MntWines") +
  theme_minimal()

p3 <- ggplot(customer_data, aes(x = Cluster, y = NumWebPurchases, fill = Cluster)) +
  geom_boxplot() +
  labs(title = "Web Purchases per Cluster", y = "NumWebPurchases") +
  theme_minimal()

grid.arrange(p1, p2, p3, ncol = 3)

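📌 Visual interpretation:
Cluster 1 is highest on Income, MntWines, and NumWebPurchases → high-value segment
Clusters 2 and 3 are close to each other and well below Cluster 1 on these three metrics, according to the cluster means
Clusters 2 and 3 differ mainly in Recency (mean 74.2 vs. 22.3), which is not shown in these boxplots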

🧠 Conclusions & Business Implications

The analysis identifies 3 main customer segments:

⭐ Cluster 1 → High-Value Customer

High spending and high income
Suited to loyalty programs, personalized offers, premium promotions

⚠ Cluster 2 → Low-Spend, Lapsed Customer

Low spending and the longest time since the last purchase (mean Recency 74.2)
Suited to reactivation coupons, email reminders, larger promotions

🔄 Cluster 3 → Low-Spend, Recently Active Customer

Low spending but recent purchases (mean Recency 22.3)
Suited to upselling, cashback, discounted bundles

📌 With this segmentation the company can:
Improve marketing ROI
Target promotions according to customer behavior
Increase sales and customer retention

📌 Additional Notes

To add a chart of the segment distribution:

final_summary <- cluster_summary %>%
  rename(Segment = Cluster, Jumlah = Jumlah_Pelanggan)

ggplot(final_summary, aes(x = Segment, y = Jumlah, fill = Segment)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = paste0(Jumlah, " (", Persentase, "%)")), 
            vjust = -0.5, size = 4) +
  labs(title = "Customer Segment Distribution", y = "Number of customers") +
  theme_minimal()