In today’s competitive retail landscape, understanding customer behavior is essential to the success of any shopping mall. This project explores the shopping habits of customers of a small shopping mall, using a dataset of customer information collected by the mall. The data covers customers of different genders and age groups, giving a broad view of shopping patterns in the mall.
The purpose of this project is to answer the following questions:
- How can customer segmentation be achieved with a machine learning algorithm (K-means clustering)?
- Who are the target customers with whom a marketing strategy can start (the easiest to engage)?
- How would such a marketing strategy work in the real world?
K-means clustering is one of the simplest and most commonly used clustering algorithms. It tries to find cluster centers that are representative of certain regions of the data. The algorithm alternates between two steps: assigning each data point to the closest cluster center, and then setting each cluster center to the mean of the data points that are assigned to it. The algorithm finishes when the assignment of instances to clusters no longer changes.
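The two alternating steps can be sketched in base R as follows. This is a minimal, illustrative implementation of Lloyd's algorithm. It assumes Euclidean distance and random initial centers drawn from the data, and the function name `lloyd_kmeans` is hypothetical. R's own `kmeans()`, used later in this section, defaults to the Hartigan-Wong algorithm rather than this exact procedure.

```r
# Hypothetical, illustrative Lloyd-style K-means (not stats::kmeans internals)
lloyd_kmeans <- function(x, k, max_iter = 100, seed = 1) {
  x <- as.matrix(x)
  set.seed(seed)  # note: changes the global RNG state
  # Initialise the k centers with k randomly chosen rows of the data
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  assignment <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Step 1: squared Euclidean distance from every point to every center,
    # then assign each point to its closest center
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_assignment <- max.col(-d, ties.method = "first")
    # Stop once no point changes cluster
    if (all(new_assignment == assignment)) break
    assignment <- new_assignment
    # Step 2: move each center to the mean of the points assigned to it
    for (j in seq_len(k)) {
      members <- x[assignment == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = assignment, centers = centers, iterations = iter)
}

# Usage on synthetic data: two well-separated blobs of 20 points each
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2))
fit <- lloyd_kmeans(x, k = 2)
table(fit$cluster)
```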
The first ten rows of the dataset:
#>    CustomerID Gender Age Annual.Income..k.. Spending.Score..1.100.
#> 1           1   Male  19                 15                     39
#> 2           2   Male  21                 15                     81
#> 3           3 Female  20                 16                      6
#> 4           4 Female  23                 16                     77
#> 5           5 Female  31                 17                     40
#> 6           6 Female  22                 17                     76
#> 7           7 Female  35                 18                      6
#> 8           8 Female  23                 18                     94
#> 9           9   Male  64                 19                      3
#> 10         10 Female  30                 19                     72
Information about the columns in the dataset:
CustomerID: unique ID assigned to each customer
Gender: gender of the customer
Age: age of the customer, in years
Annual income (Annual.Income..k..): annual income of the customer, in thousands
Spending score (Spending.Score..1.100.): score from 1 to 100 assigned by the mall based on the customer's behavior and spending patterns
Select relevant columns for clustering:
kmeans_data <- data[, c("Annual.Income..k..", "Spending.Score..1.100.")]
Histogram of Age
Scatter plot of Annual Income vs Spending Score:
The next step is to find the appropriate number of clusters. Because the dataset is small and straightforward, the optimal number of clusters is chosen with two standard heuristics: the Elbow method and the average silhouette width.
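The plotting code that follows uses the vectors `wss` and `sil_width`, which are not defined in this section. The sketch below shows one way they could be computed, using base R's `kmeans()` and `silhouette()` from the cluster package. The helper names `compute_wss` and `compute_sil_width` are hypothetical, the seed and `nstart` defaults are assumptions, and the demo runs on synthetic data rather than the mall data.

```r
library(cluster)  # silhouette(); a recommended package shipped with R

# Hypothetical helper: total within-cluster sum of squares for K = 1..max_k
compute_wss <- function(x, max_k = 10, nstart = 25, seed = 123) {
  set.seed(seed)
  sapply(seq_len(max_k), function(k) {
    kmeans(x, centers = k, nstart = nstart)$tot.withinss
  })
}

# Hypothetical helper: average silhouette width for K = 2..max_k
# (the silhouette is not defined for a single cluster)
compute_sil_width <- function(x, max_k = 10, nstart = 25, seed = 123) {
  set.seed(seed)
  d <- dist(x)  # pairwise Euclidean distances, computed once
  sapply(2:max_k, function(k) {
    km <- kmeans(x, centers = k, nstart = nstart)
    mean(silhouette(km$cluster, d)[, "sil_width"])
  })
}

# Synthetic demo data: three well-separated 2-D blobs of 30 points each
set.seed(1)
demo <- rbind(
  cbind(rnorm(30, 0), rnorm(30, 0)),
  cbind(rnorm(30, 8), rnorm(30, 0)),
  cbind(rnorm(30, 0), rnorm(30, 8))
)
demo_wss <- compute_wss(demo)
demo_sil <- compute_sil_width(demo)
best_k <- (2:10)[which.max(demo_sil)]  # K with the highest average silhouette
```

Applied to this section's data, the calls would be `wss <- compute_wss(kmeans_data)` and `sil_width <- compute_sil_width(kmeans_data)`. These produce vectors that match the `1:10` and `2:10` x-axes of the two plots.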
plot(1:10, wss, type = "b", pch = 19, frame = FALSE,
     xlab = "Number of clusters K", ylab = "Total Within-Cluster Sum of Squares",
     main = "Elbow Method for Optimal K")
plot(2:10, sil_width, type = "b", pch = 19, frame = FALSE,
     xlab = "Number of clusters K", ylab = "Average Silhouette Width",
     main = "Silhouette Method for Optimal K")
According to the two methods, K = 5 gives the most meaningful customer segmentation. The average silhouette width is highest at K = 5, so the clusters are most clearly separated there. This agrees with the Elbow method, whose bend (the point after which the within-cluster sum of squares stops dropping sharply) also points to K = 5. Well-structured segments like these are crucial for developing marketing strategies and targeting customers.
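For reference, the two criteria are commonly defined as follows. These are standard definitions, and the notation is introduced here for illustration; it is not taken from the original text. Let C_1, ..., C_K be the clusters with means mu_k, and let d(i, j) be the distance between points i and j.

```latex
\mathrm{WSS}(K) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2

a(i) = \frac{1}{|C(i)| - 1} \sum_{j \in C(i),\, j \neq i} d(i, j), \qquad
b(i) = \min_{C \neq C(i)} \frac{1}{|C|} \sum_{j \in C} d(i, j)

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
\bar{s}(K) = \frac{1}{n} \sum_{i=1}^{n} s(i)
```

Each s(i) lies between -1 and 1. By convention, s(i) = 0 for a point that is alone in its cluster. The Elbow method looks for the K after which WSS(K) stops falling sharply. The silhouette method picks the K that maximizes the average silhouette width, which in this analysis is K = 5.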
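Apply K-means clustering with the optimal K = 5 chosen by the Elbow and Silhouette methods:
set.seed(123)  # make the random initial centers reproducible
# nstart = 25: try 25 random initializations and keep the solution with the lowest total within-cluster sum of squares
kmeans_model <- kmeans(kmeans_data, centers = 5, nstart = 25)
Visualize the clusters: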
library(factoextra)  # fviz_cluster(); attaching factoextra also attaches ggplot2, which provides labs()
fviz_cluster(kmeans_model, data = kmeans_data, geom = "point", ellipse.type = "norm") +
  labs(title = "Customer Segmentation using K-Means Clustering")
This clustering visualization segments the mall customers into distinct groups, which supports targeted marketing and strategic decision-making. To restate the project's purpose: the goal is to achieve customer segmentation with K-means and to use the resulting insights to optimize offerings and personalize the customer experience.
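To move from clusters to target customers, each segment can be described by its averages. The sketch below is illustrative only. It reuses the ten sample rows printed earlier as stand-in data and fits K = 2 because the sample is tiny. The object names (`customers`, `segment_profile`) and the `female` indicator column are hypothetical. With the full dataset, `kmeans_model$cluster` from the K = 5 model would supply the cluster labels instead.

```r
# Ten sample rows printed earlier in this section (stand-in for the full data)
customers <- data.frame(
  Gender = c("Male", "Male", "Female", "Female", "Female",
             "Female", "Female", "Female", "Male", "Female"),
  Age = c(19, 21, 20, 23, 31, 22, 35, 23, 64, 30),
  Annual.Income..k.. = c(15, 15, 16, 16, 17, 17, 18, 18, 19, 19),
  Spending.Score..1.100. = c(39, 81, 6, 77, 40, 76, 6, 94, 3, 72)
)

# Cluster on income and spending score only, as in the section above
set.seed(123)
fit <- kmeans(customers[, c("Annual.Income..k..", "Spending.Score..1.100.")],
              centers = 2, nstart = 25)

# Profile each cluster: mean age, income and score, share of female customers,
# and cluster size. Age and Gender describe the segments but were not clustered on.
customers$cluster <- fit$cluster
customers$female <- as.numeric(customers$Gender == "Female")
segment_profile <- aggregate(
  cbind(Age, Annual.Income..k.., Spending.Score..1.100., female) ~ cluster,
  data = customers, FUN = mean
)
segment_profile$size <- as.vector(
  table(customers$cluster)[as.character(segment_profile$cluster)]
)
print(segment_profile)
```

`kmeans_model$centers` and `kmeans_model$size` already give the per-cluster means and sizes of the two clustering features. The extra step here attaches Age and Gender, which were not used to form the clusters but help describe who each segment is.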