Customer segmentation helps identify groups of consumers by their
shopping behavior, so that a company can design marketing strategies
that are better targeted.
This analysis uses the K-Means clustering algorithm with four variables:
Income, Recency, MntWines, and NumWebPurchases.
library(readxl)
library(tidyverse)
library(cluster)
library(factoextra)
library(MASS)
library(ggplot2)
library(gridExtra)
# Import Data
data = read_excel('/Users/zaula/Downloads/marketing_campaign.csv.xlsx')
head(data)
## # A tibble: 6 × 29
## ID Year_Birth Education Marital_Status Income Kidhome Teenhome
## <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 5524 1957 Graduation Single 58138 0 0
## 2 2174 1954 Graduation Single 46344 1 1
## 3 4141 1965 Graduation Together 71613 0 0
## 4 6182 1984 Graduation Together 26646 1 0
## 5 5324 1981 PhD Married 58293 1 0
## 6 7446 1967 Master Together 62513 0 1
## # ℹ 22 more variables: Dt_Customer <dttm>, Recency <dbl>, MntWines <dbl>,
## # MntFruits <dbl>, MntMeatProducts <dbl>, MntFishProducts <dbl>,
## # MntSweetProducts <dbl>, MntGoldProds <dbl>, NumDealsPurchases <dbl>,
## # NumWebPurchases <dbl>, NumCatalogPurchases <dbl>, NumStorePurchases <dbl>,
## # NumWebVisitsMonth <dbl>, AcceptedCmp3 <dbl>, AcceptedCmp4 <dbl>,
## # AcceptedCmp5 <dbl>, AcceptedCmp1 <dbl>, AcceptedCmp2 <dbl>, Complain <dbl>,
## # Z_CostContact <dbl>, Z_Revenue <dbl>, Response <dbl>
customer_data <- data[, c("Income", "Recency", "MntWines", "NumWebPurchases")]
customer_data <- na.omit(customer_data)
scaled_data <- scale(customer_data)
head(scaled_data)
## Income Recency MntWines NumWebPurchases
## [1,] 0.2340099 0.3104621 0.9780050 1.4282310
## [2,] -0.2345065 -0.3804236 -0.8718271 -1.1256271
## [3,] 0.7693040 -0.7949549 0.3584298 1.4282310
## [4,] -1.0170092 -0.7949549 -0.8718271 -0.7607902
## [5,] 0.2401673 1.5540562 -0.3915822 0.3337204
## [6,] 0.4078067 -1.1403978 0.6370904 0.6985572
📌 Interpretation:
Standardization puts every variable on the same scale (mean 0, standard
deviation 1), so that no single variable dominates the distance
calculations used in clustering.
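To make the effect of standardization concrete, here is a minimal sketch on made-up numbers (the `toy` data frame is hypothetical and not part of the marketing dataset). It shows that `scale()` produces the usual z-score, (x - mean) / sd, column by column.

```r
# Toy data, not the marketing dataset: two variables on very different scales
toy <- data.frame(
  Income  = c(20000, 40000, 60000, 80000),
  Recency = c(10, 20, 30, 40)
)

# scale() subtracts each column mean and divides by the column standard deviation
scaled_toy <- scale(toy)

# The same z-scores computed by hand for one column
manual_income <- (toy$Income - mean(toy$Income)) / sd(toy$Income)

# After scaling, every column has mean 0 and standard deviation 1
col_means <- colMeans(scaled_toy)
col_sds   <- apply(scaled_toy, 2, sd)
print(round(scaled_toy, 3))
```

Without this step, Income (tens of thousands) would outweigh Recency (tens) in every Euclidean distance.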
set.seed(123)
elbow_plot <- fviz_nbclust(scaled_data, kmeans, method = "wss") +
  ggtitle("Elbow Method - Jumlah Klaster Optimal") +
  theme_minimal()
elbow_plot
📌 Interpretation:
The elbow plot points to k = 3 as a reasonable number of clusters:
beyond k = 3 the curve flattens, so each additional cluster reduces the
within-cluster sum of squares only slightly.
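The quantity behind the elbow plot can also be computed directly. The sketch below is an illustration on synthetic data (the `toy` matrix and the range 1 to 8 are assumptions, not the author's settings): it runs `kmeans()` for each k and collects `tot.withinss`, the value the "wss" method plots.

```r
set.seed(123)
# Synthetic stand-in for scaled_data: three Gaussian blobs in two dimensions
toy <- scale(rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 5), ncol = 2),
  matrix(rnorm(100, mean = 10), ncol = 2)
))

# Total within-cluster sum of squares for k = 1..8
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

# Reduction in WSS gained by each extra cluster; the elbow is where this gain becomes small
wss_drop <- -diff(wss)
print(data.frame(k = k_values, wss = round(wss, 2)))
```

Reading the elbow off a table of `wss_drop` values can be less subjective than judging the bend of the curve by eye, though it remains a heuristic.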
k <- 3
kmeans_result <- kmeans(scaled_data, centers = k, nstart = 25)
customer_data$Cluster <- as.factor(kmeans_result$cluster)
kmeans_result
## K-means clustering with 3 clusters of sizes 897, 659, 660
##
## Cluster means:
## Income Recency MntWines NumWebPurchases
## 1 0.7520352 0.0390372 0.9633446 0.7953590
## 2 -0.5267625 0.8711907 -0.6732837 -0.5592718
## 3 -0.4961198 -0.9229258 -0.6370092 -0.5225407
##
## Clustering vector:
## [1] 1 3 1 3 2 1 1 3 3 2 2 2 1 3 1 2 3 1 2 3 3 1 1 2 2 3 3 1 3 2 3 3 3 2 1 2 2
## [38] 1 1 2 3 3 2 3 2 1 3 1 2 1 1 1 1 3 1 1 1 1 1 1 2 3 1 3 1 3 1 1 2 2 1 1 3 1
## [75] 2 3 2 2 3 2 3 3 1 2 2 2 3 1 3 1 3 3 3 1 1 1 2 3 1 3 1 1 1 1 1 3 3 1 2 2 3
## [112] 1 3 2 2 1 2 1 2 1 3 1 1 2 1 3 2 3 2 1 1 1 1 3 1 3 2 2 2 2 1 3 3 1 1 3 2 3
## [149] 1 3 1 2 1 1 2 3 2 3 2 2 2 3 3 2 1 1 2 2 1 2 2 1 3 2 3 2 1 1 3 2 1 2 2 3 3
## [186] 1 1 1 2 1 1 1 1 2 3 2 2 2 1 3 1 2 1 2 2 3 1 3 1 2 1 1 3 1 3 1 1 1 2 3 1 2
## [223] 3 1 2 3 1 2 3 1 1 2 1 1 3 1 1 1 1 2 2 1 2 1 3 1 2 2 3 2 1 2 3 3 3 1 2 3 3
## [260] 1 3 3 2 3 1 1 3 3 1 3 1 3 2 3 2 1 2 1 2 3 2 1 2 3 1 3 3 3 1 2 1 2 3 2 1 2
## [297] 1 1 3 2 3 1 3 2 2 3 3 3 2 3 1 1 1 3 3 3 3 2 3 1 3 2 3 1 3 1 1 1 3 1 2 3 2
## [334] 2 3 3 2 1 1 1 1 1 3 2 3 1 2 1 1 2 3 2 1 1 2 1 1 2 2 2 1 3 2 3 3 1 2 3 2 3
## [371] 2 3 3 3 1 2 1 1 3 1 2 1 1 3 2 2 2 2 1 2 3 1 2 3 3 3 1 2 1 1 2 1 1 3 1 1 1
## [408] 3 3 2 1 1 3 1 1 2 3 1 1 1 1 3 2 1 1 2 2 2 2 3 3 2 2 2 1 3 2 1 1 3 2 1 2 1
## [445] 2 2 1 3 2 3 1 3 1 1 3 1 1 1 3 1 2 2 1 2 1 1 1 2 2 3 3 1 1 1 1 3 2 1 2 1 1
## [482] 1 2 3 1 1 3 2 3 3 1 2 1 2 1 2 1 2 3 2 1 2 1 3 2 3 1 2 1 3 1 3 2 1 2 1 2 2
## [519] 1 1 2 2 2 3 1 2 2 3 3 3 1 2 2 1 2 3 3 2 3 1 2 3 2 1 1 3 1 3 1 1 3 3 3 1 2
## [556] 3 3 3 2 3 3 2 1 3 2 3 2 2 2 2 2 2 3 1 2 1 2 3 1 2 3 3 3 2 3 3 2 3 1 2 1 2
## [593] 3 2 2 3 1 2 2 3 3 1 2 2 1 3 3 1 2 1 2 1 2 1 1 2 2 1 1 3 3 1 3 1 3 1 1 1 1
## [630] 1 3 1 2 1 3 1 1 1 3 1 2 3 1 2 2 1 2 1 2 1 3 2 2 3 3 3 2 1 2 1 1 1 3 1 2 3
## [667] 3 1 1 3 2 2 2 1 1 1 3 2 1 2 3 3 2 2 3 1 1 1 1 1 1 3 1 3 1 1 3 3 1 3 1 2 1
## [704] 1 3 1 3 1 1 3 1 3 2 1 1 1 1 3 1 1 2 1 1 1 3 3 1 1 3 2 2 1 1 3 3 3 1 1 2 1
## [741] 1 1 1 1 1 3 2 3 1 1 3 2 2 1 1 2 1 3 1 3 3 2 2 2 1 2 1 1 3 3 3 3 3 2 3 1 3
## [778] 1 1 2 3 2 3 1 1 1 3 2 2 2 1 1 1 2 3 1 1 3 3 3 1 1 1 2 2 1 3 1 3 1 2 1 1 1
## [815] 2 1 3 2 2 1 3 2 1 3 1 3 1 3 2 3 2 3 1 1 1 2 2 3 1 1 3 3 1 3 1 2 1 2 3 2 2
## [852] 2 3 3 2 2 2 1 1 2 3 1 1 1 3 1 2 3 3 2 2 1 1 2 3 1 1 2 2 2 2 1 1 1 1 2 3 1
## [889] 3 2 3 3 1 1 3 2 3 1 1 1 3 1 1 1 1 3 1 3 2 2 2 1 2 1 1 1 1 1 2 1 2 2 2 1 1
## [926] 1 1 1 1 2 1 2 1 2 2 2 1 2 3 3 2 2 2 2 2 3 1 1 1 3 2 3 2 1 3 2 1 1 3 3 1 1
## [963] 3 1 1 2 1 2 3 2 1 1 1 1 1 1 3 1 3 1 2 2 2 1 2 1 3 1 1 1 3 2 3 1 1 3 2 1 2
## [1000] 2 3 3 2 1 2 3 3 2 2 1 2 2 3 2 2 3 1 1 1 3 2 1 2 3 3 3 2 1 2 2 2 3 3 2 1 2
## [1037] 1 1 3 1 3 2 2 2 3 1 2 1 1 3 3 3 3 1 1 2 2 2 1 1 2 3 1 1 3 3 2 2 3 1 2 1 1
## [1074] 3 1 2 3 1 2 3 3 3 2 1 2 3 1 1 3 2 2 3 1 3 3 1 2 1 1 2 1 2 1 3 3 3 1 1 3 2
## [1111] 3 3 2 1 2 2 1 1 2 3 1 1 2 2 1 3 2 1 2 2 2 1 2 3 1 1 3 1 1 2 1 3 2 3 1 1 1
## [1148] 2 1 1 1 3 1 3 3 1 1 2 2 1 1 2 2 3 3 2 1 1 2 1 2 3 2 2 1 3 3 1 1 2 2 2 1 2
## [1185] 1 1 3 3 1 3 2 2 1 1 3 2 2 2 2 1 1 2 3 3 2 1 2 3 2 1 3 2 1 1 3 2 2 2 2 3 2
## [1222] 3 1 2 1 2 3 3 2 1 3 2 2 2 2 2 1 1 1 1 1 1 1 1 1 3 1 3 1 1 3 3 1 1 2 3 1 1
## [1259] 1 3 1 2 1 3 2 1 3 1 1 2 3 2 3 1 2 1 1 3 2 3 2 3 3 2 1 1 3 1 1 2 3 1 1 1 1
## [1296] 1 1 1 1 2 3 2 3 2 2 2 2 1 3 1 3 3 1 2 2 3 1 2 3 1 1 1 2 3 3 3 3 2 3 1 3 2
## [1333] 3 3 3 1 1 1 1 3 3 1 1 2 1 1 3 1 3 3 1 3 1 1 2 3 3 2 2 2 2 1 3 3 3 3 2 2 3
## [1370] 1 3 3 1 3 2 3 3 3 1 2 2 1 3 2 1 3 2 1 2 2 1 3 3 1 1 2 1 2 2 3 3 3 3 3 3 1
## [1407] 2 2 3 2 2 2 2 2 2 1 2 3 2 2 3 2 3 2 3 2 1 1 2 1 1 1 1 3 3 1 2 3 3 2 2 1 2
## [1444] 1 3 3 3 3 3 1 1 3 1 2 2 2 1 1 3 1 3 2 1 1 1 3 2 1 1 1 1 1 2 1 3 1 1 2 2 3
## [1481] 1 1 1 2 2 1 1 1 1 1 1 3 1 1 2 1 1 3 2 1 3 2 3 1 3 2 3 2 1 1 2 1 2 1 2 1 2
## [1518] 2 2 2 3 1 1 3 1 2 2 1 3 3 2 1 3 2 3 1 2 1 2 2 2 2 1 3 1 3 3 3 3 1 1 1 1 1
## [1555] 2 1 3 1 3 3 1 2 1 3 1 1 2 3 3 2 3 3 1 3 2 3 2 2 3 2 2 1 3 1 1 1 2 2 2 1 1
## [1592] 1 3 1 3 1 2 3 3 3 3 3 1 2 3 1 1 2 3 3 3 3 2 1 2 3 3 2 1 1 3 3 3 2 1 1 2 1
## [1629] 3 1 1 1 3 2 1 3 1 2 2 3 1 1 1 1 2 1 3 3 2 1 2 1 1 2 1 1 1 1 3 3 3 3 3 1 2
## [1666] 3 2 2 2 1 1 1 1 1 1 3 2 2 2 2 1 3 1 1 2 2 1 3 3 1 2 1 2 1 1 2 1 3 2 1 3 2
## [1703] 1 2 1 1 1 3 3 3 2 1 1 3 2 2 1 1 1 1 1 1 3 2 3 3 1 1 1 2 1 1 1 1 3 3 1 3 3
## [1740] 2 3 2 1 1 3 3 1 2 1 3 3 2 3 3 2 2 1 3 3 2 2 3 3 2 2 1 1 3 3 3 2 1 1 3 3 1
## [1777] 2 2 2 3 2 1 1 1 1 2 3 2 1 1 2 1 1 2 1 1 2 1 2 1 1 2 3 2 3 3 1 1 2 2 3 1 3
## [1814] 3 3 1 3 3 3 3 2 1 1 2 2 2 2 1 1 3 3 1 2 1 2 1 3 1 1 2 2 1 1 1 3 2 1 1 1 2
## [1851] 2 1 1 2 1 2 1 2 3 3 1 1 1 1 3 2 1 2 1 3 1 1 2 1 1 2 1 2 3 2 1 1 2 3 3 2 2
## [1888] 3 1 1 2 3 2 1 1 1 1 1 3 3 2 3 2 3 1 1 1 1 3 1 1 1 1 2 2 3 3 1 3 3 1 3 2 1
## [1925] 2 2 1 2 2 3 1 2 3 2 3 1 1 3 1 2 2 1 3 1 1 3 3 3 1 1 1 1 1 1 3 3 3 2 1 3 3
## [1962] 3 2 2 3 1 3 1 3 3 3 3 3 1 2 1 1 2 1 1 3 3 2 2 3 3 3 3 2 1 2 2 1 1 3 1 1 1
## [1999] 3 3 2 3 2 3 2 2 1 1 2 3 3 1 2 1 1 2 1 2 3 1 3 2 1 2 1 3 2 2 2 2 1 1 1 3 1
## [2036] 1 3 2 2 1 1 1 2 1 2 1 1 3 1 2 2 1 1 2 3 1 3 3 2 3 1 1 1 1 1 3 3 3 2 3 1 1
## [2073] 2 1 3 1 1 3 1 3 1 3 3 3 3 3 1 1 1 2 3 2 1 3 1 1 2 3 2 3 2 2 1 1 1 1 2 2 1
## [2110] 1 2 3 1 3 3 2 3 3 3 2 3 1 1 3 3 3 3 3 1 2 3 3 3 2 3 1 2 1 1 3 1 3 2 1 2 1
## [2147] 2 1 1 1 1 1 1 1 3 1 2 2 1 2 3 3 1 1 1 3 3 1 2 3 1 1 2 3 1 3 2 2 1 3 1 3 2
## [2184] 1 2 2 3 1 1 3 1 3 2 2 1 3 2 2 1 3 3 1 1 1 3 3 2 1 2 1 2 1 1 1 1 3
##
## Within cluster sum of squares by cluster:
## [1] 3174.2102 689.2913 727.4047
## (between_SS / total_SS = 48.2 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
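📌 Interpretation:
The algorithm assigned the customers to 3 clusters (897, 659, and 660
customers) based on similarity in purchasing behavior. The ratio
between_SS / total_SS = 48.2% means the partition accounts for roughly
half of the total variance in the four standardized variables.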
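The between_SS / total_SS figure in the printed output can be pulled out of the fitted object rather than read from the console. This is a minimal sketch on synthetic data (`toy` and `fit` are hypothetical names); the components `betweenss`, `tot.withinss`, `totss`, and `size` are the ones listed under "Available components" above.

```r
set.seed(123)
# Synthetic stand-in for scaled_data: 100 rows, 4 standardized columns
toy <- scale(matrix(rnorm(400), ncol = 4))
fit <- kmeans(toy, centers = 3, nstart = 25)

# Total SS decomposes into a within-cluster and a between-cluster part
explained <- fit$betweenss / fit$totss

# Cluster sizes and the share of variance captured by the partition (in percent)
print(fit$size)
print(round(100 * explained, 1))
```

In the analysis itself the same expression would be applied to `kmeans_result`.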
cluster_plot <- fviz_cluster(kmeans_result, data = scaled_data,
                             palette = c("#2E9FDF", "#00AFBB", "#E7B800"),
                             geom = "point",
                             ellipse.type = "convex",
                             ggtheme = theme_minimal()) +
  ggtitle("Hasil K-Means Clustering (k=3)")
cluster_plot
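📌 Interpretation:
The plot shows three customer groups in the space of the first two
principal components → the segmentation looks usable. With
between_SS / total_SS = 48.2% the clusters are only moderately distinct,
and their stability was not tested in this analysis.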
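One way to put a number on how well separated the clusters are is the average silhouette width. The sketch below uses synthetic data (the `toy` matrix is an assumption, chosen to be clearly separable) together with `silhouette()` from the cluster package, which the analysis already loads.

```r
library(cluster)  # provides silhouette()

set.seed(123)
# Synthetic stand-in for scaled_data: two well-separated groups
toy <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 10), ncol = 2)
)
fit <- kmeans(toy, centers = 2, nstart = 25)

# Silhouette width per observation: near 1 = well inside its cluster,
# near 0 = on a boundary, negative = probably in the wrong cluster
sil <- silhouette(fit$cluster, dist(toy))
avg_sil <- mean(sil[, "sil_width"])
print(round(avg_sil, 2))
```

Applied to `kmeans_result$cluster` and `dist(scaled_data)`, the same call would give an average width for the three customer segments; that value is not computed in this report.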
cluster_summary <- customer_data %>%
  group_by(Cluster) %>%
  summarise(
    Jumlah_Pelanggan = n(),
    Persentase = round(n() / nrow(customer_data) * 100, 1),
    Income_Rata2 = round(mean(Income), 0),
    Recency_Rata2 = round(mean(Recency), 1),
    MntWines_Rata2 = round(mean(MntWines), 1),
    NumWebPurchases_Rata2 = round(mean(NumWebPurchases), 1),
    .groups = 'drop'
  )
cluster_summary
## # A tibble: 3 × 7
## Cluster Jumlah_Pelanggan Persentase Income_Rata2 Recency_Rata2 MntWines_Rata2
## <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 897 40.5 71178 50.1 630.
## 2 2 659 29.7 38987 74.2 78
## 3 3 660 29.8 39758 22.3 90.2
## # ℹ 1 more variable: NumWebPurchases_Rata2 <dbl>
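📌 Interpretation per cluster (based on the table above)
| Cluster | Characteristic Pattern | Business Meaning |
|---|---|---|
| Cluster 1 | High income (mean 71,178), very high wine spending (mean 630), above-average web purchases | Premium, high-value customers |
| Cluster 2 | Lower income (mean 38,987), low wine spending (mean 78), highest Recency (mean 74.2, i.e. longest time since last purchase) | Lapsed or dormant customers, candidates for re-activation |
| Cluster 3 | Lower income (mean 39,758), low wine spending (mean 90.2), lowest Recency (mean 22.3, i.e. most recent purchases) | Recently active but low-spending customers, candidates for upselling |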
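The cluster means printed by `kmeans()` are in standardized units, while the summary table is in original units. The two are linked by the centering and scaling values that `scale()` stores as attributes. A minimal sketch on made-up data (the `toy` data frame and its values are hypothetical):

```r
set.seed(123)
# Toy stand-in for customer_data (values are made up for illustration)
toy <- data.frame(
  Income   = c(rnorm(50, 70000, 5000), rnorm(50, 40000, 5000)),
  MntWines = c(rnorm(50, 600, 50),     rnorm(50, 80, 20))
)
scaled_toy <- scale(toy)
fit <- kmeans(scaled_toy, centers = 2, nstart = 25)

# scale() keeps the column means and SDs it used as attributes
mu    <- attr(scaled_toy, "scaled:center")
sigma <- attr(scaled_toy, "scaled:scale")

# Undo the z-score: original = z * sd + mean, applied column by column
centers_original <- sweep(sweep(fit$centers, 2, sigma, `*`), 2, mu, `+`)

# Same numbers as averaging the raw data within each cluster
group_means <- aggregate(toy, by = list(Cluster = fit$cluster), FUN = mean)
print(round(centers_original))
```

This is why a standardized center of 0.75 for Income in Cluster 1 corresponds to a mean of about 71,178 in the summary table.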
p1 <- ggplot(customer_data, aes(x = Cluster, y = Income, fill = Cluster)) +
  geom_boxplot() +
  labs(title = "Distribusi Income per Klaster", y = "Income") +
  theme_minimal()
p2 <- ggplot(customer_data, aes(x = Cluster, y = MntWines, fill = Cluster)) +
  geom_boxplot() +
  labs(title = "Pengeluaran Wine per Klaster", y = "MntWines") +
  theme_minimal()
p3 <- ggplot(customer_data, aes(x = Cluster, y = NumWebPurchases, fill = Cluster)) +
  geom_boxplot() +
  labs(title = "Pembelian Web per Klaster", y = "NumWebPurchases") +
  theme_minimal()
grid.arrange(p1, p2, p3, ncol = 3)
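📌 Visual interpretation:
- Cluster 1 leads on all three metrics → high-value segment
- Clusters 2 and 3 sit well below Cluster 1 and close to each other (mean Income 38,987 vs 39,758; mean MntWines 78 vs 90.2) → lower-contribution segments
- What separates Clusters 2 and 3 is Recency (mean 74.2 vs 22.3), which these boxplots do not show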
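The analysis found 3 main customer segments:
- Cluster 1 (897 customers, 40.5%): high income, high wine spending
- Cluster 2 (659 customers, 29.7%): lower income, low wine spending, high Recency
- Cluster 3 (660 customers, 29.8%): lower income, low wine spending, low Recency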
📌 With this segmentation the company can aim to:
- Improve marketing ROI
- Target promotions according to customer behavior
- Increase sales and customer retention
To add a visualization of the segment distribution:
final_summary <- cluster_summary %>%
  rename(Segment = Cluster, Jumlah = Jumlah_Pelanggan)
ggplot(final_summary, aes(x = Segment, y = Jumlah, fill = Segment)) +
  geom_col() +
  geom_text(aes(label = paste0(Jumlah, " (", Persentase, "%)")),
            vjust = -0.5, size = 4) +
  labs(title = "Distribusi Segmentasi Pelanggan", y = "Jumlah") +
  theme_minimal()