# Load required libraries
library(readr)
library(cluster)
library(factoextra)
# Load the CSV file
data <- read_csv("customer_segmentation.csv")
## Rows: 22 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (15): ID, CS_helpful, Recommend, Come_again, All_Products, Profesionalis...
# View the first few rows
head(data)
## # A tibble: 6 × 15
##      ID CS_helpful Recommend Come_again All_Products Profesionalism Limitation
##   <dbl>      <dbl>     <dbl>      <dbl>        <dbl>          <dbl>      <dbl>
## 1     1          2         2          2            2              2          2
## 2     2          1         2          1            1              1          1
## 3     3          2         1          1            1              1          2
## 4     4          3         3          2            4              1          2
## 5     5          2         1          3            5              2          1
## 6     6          1         1          3            2              1          1
## # ℹ 8 more variables: Online_grocery <dbl>, delivery <dbl>, Pick_up <dbl>,
## # Find_items <dbl>, other_shops <dbl>, Gender <dbl>, Age <dbl>,
## # Education <dbl>
# Keep only numeric columns (all 15 columns are numeric here, so this also keeps ID)
data_clean <- data[, sapply(data, is.numeric)]
# Scale the numeric data
data_scaled <- scale(data_clean)
# Use the Elbow Method to choose number of clusters
fviz_nbclust(data_scaled, kmeans, method = "wss")
# Run K-means with 3 clusters
set.seed(123) # for reproducibility
kmeans_result <- kmeans(data_scaled, centers = 3, nstart = 25)
# Visualize clusters
fviz_cluster(kmeans_result, data = data_scaled)
# Inspect the cluster centers (in standardized units, since the data were scaled)
kmeans_result$centers
##            ID  CS_helpful  Recommend  Come_again All_Products Profesionalism
## 1 -0.02566635 -0.01031923 -0.1054899 -0.50262359  -0.39835424     0.01283318
## 2  0.00000000 -0.80490011 -0.4922862  0.06154575   0.07113469    -0.69299145
## 3  0.07699905  1.23830786  1.0548991  1.41555215   1.08836068     1.00098765
##      Limitation Online_grocery   delivery    Pick_up Find_items other_shops
## 1  1.850372e-17     0.40480555  0.2373423  0.5949772 -0.1806489  -0.4212692
## 2 -4.157397e-01    -0.78986449 -1.0112848 -0.4301040 -0.1806489   0.7669260
## 3  6.236096e-01    -0.02961992  0.8049001 -1.1397755  0.8129201   0.1134186
##       Gender        Age  Education
## 1 -0.2326695 -0.3897897 -0.3688989
## 2 -0.2326695  0.5128812  0.8125115
## 3  1.0470128  0.4000473 -0.1120706
After running k-means clustering on the standardized survey data, I found three customer segments, numbered here as in the `kmeans_result$centers` output. Cluster 3 respondents were more positive overall: they rated customer service and product variety higher and were more likely to return (standardized centers of 1.24 for `CS_helpful`, 1.09 for `All_Products`, and 1.42 for `Come_again`). Cluster 1 scored at or below the average on most of these items, most clearly on likelihood to return (-0.50) and product variety (-0.40), suggesting dissatisfaction or low engagement. Cluster 2 had mixed responses: below-average customer service ratings (-0.80), but a strong preference for other shops (0.77) and higher education levels (0.81). This analysis showed me how clustering breaks customers into groups based on behavior and satisfaction, which makes it easier to target marketing or service improvements.
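
The interpretation above reads the standardized centers directly. Below is a minimal sketch of how the segments could also be profiled in the original survey units, alongside the size of each cluster. The helper `profile_clusters()` and the data frame `toy` are hypothetical and exist only to illustrate the steps; with the real data one would presumably pass `data_clean` instead. The sketch also drops `ID` before clustering, on the assumption that an identifier says nothing about behavior. That differs from the run above, where `ID` was part of the clustered columns.

```r
# Hypothetical helper (not part of the original analysis): cluster on
# standardized features, then report cluster sizes and per-cluster means
# in the original, unscaled units.
profile_clusters <- function(df, k, id_col = "ID", seed = 123) {
  # Drop the identifier column so it does not influence the distances
  features <- df[, setdiff(names(df), id_col), drop = FALSE]

  set.seed(seed)  # k-means starts from random centers
  fit <- kmeans(scale(features), centers = k, nstart = 25)

  list(
    sizes = table(fit$cluster),
    means = aggregate(features, by = list(cluster = fit$cluster), FUN = mean)
  )
}

# Toy data, made up for illustration only (this is NOT the survey data):
# two well-separated groups of 15 respondents each.
set.seed(1)
toy <- data.frame(
  ID         = 1:30,
  CS_helpful = c(rnorm(15, mean = 1.5, sd = 0.3), rnorm(15, mean = 4, sd = 0.3)),
  Come_again = c(rnorm(15, mean = 1.5, sd = 0.3), rnorm(15, mean = 4, sd = 0.3))
)

result <- profile_clusters(toy, k = 2)
print(result$sizes)  # number of respondents per cluster
print(result$means)  # per-cluster means in the original units
```

With only 22 respondents in the survey, the cluster sizes are worth checking before reading much into any one segment, since a center based on a handful of rows can shift noticeably with a different seed.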