library(cluster)
library(tidyverse)
library(kableExtra)
market <- read_csv("supermarket_customers.csv")Customer Segmentation
Setup
Segmenting Supermarket Customers using Hierarchical Clustering
market_2 <- select(market, annual_income, spending_score)
market_2_scale <- scale(market_2)
d1 <- dist(market_2_scale)
h1 <- hclust(d1)
plot(h1, hang = -1)heatmap(as.matrix(d1), Rowv = as.dendrogram(h1), Colv = 'Rowv')Findings
The dataset needs to be scaled, because the values of one variable range from 1 to 70, second variable is 1 to 90 and the third is one to 130, while the difference isn’t massive just to be safe we will scale the variables
We can see distinct clusters along the diagonal of the heatmap, although I found it easier to identify the 5 clusters through the dendrogram rather than the heatmap. The heatmap makes is seem like there is 4 clusters rather than 5.
clusters1 <- cutree(h1, k = 5)
sil1 <- silhouette(clusters1, d1)
summary(sil1)Silhouette of 200 units in 5 clusters from silhouette.default(x = clusters1, dist = d1) :
Cluster sizes and average silhouette widths:
23 21 79 39 38
0.5065849 0.6199558 0.6140651 0.5094809 0.4623933
Individual silhouette widths:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.1401 0.4993 0.5965 0.5531 0.6610 0.7645
Findings
A reasonable clustering structure has been found for 2 clusters. And the 3 remaining clusters are close to the tipping point of a reasonable clustering, mean quality is 0.59 which is within the 0.51 - 0.71 reasonable structure.
ggplot(data = market_2) +
geom_point(mapping = aes(x = annual_income, y = spending_score, colour = clusters1)) +
labs(title = "Scatterplot of Annual Income and Spending Score by Cluster",
x = "Annual Income",
y = "Spending Score")Findings
Cluster 1 is made up of low income low spenders. Cluster 2 is made up of low income high spenders. Cluster 3 is made up of medium income medium spenders. Cluster 4 is made up of high income high spenders. Cluster 5 is made up of high income low spenders.
market_clus <- market %>%
mutate(clusters1 = clusters1) %>%
mutate(cluster = case_when(clusters1 == 1 ~ 'C1',
clusters1 == 2 ~ 'C2',
clusters1 == 3 ~ 'C3',
clusters1 == 4 ~ 'C4',
clusters1 == 5 ~ 'C5'))
market_clus_means <- market_clus %>%
group_by(cluster) %>%
summarise(num_custs = n(),
average_age = mean(age),
income_m = mean(annual_income),
spending_m = mean(spending_score))
knitr::kable(select(market_clus_means, cluster,num_custs, average_age, income_m, spending_m),
digits = c(0,0,0,0,0),
col.names = c("Cluster", "# Customers", "Avg. Age","Avg. Income","Avg. Spending"),
caption = "Number of Customers, Average Age, Average Income and Average Spending by Cluster") %>%
kable_styling(full_width = F) %>%
row_spec(1, color = "white", background = "royalblue") %>%
row_spec(2, color = "white", background = "#0041C2") %>%
row_spec(3, color = "white", background = "blue") %>%
row_spec(4, color = "white", background = "#123456") %>%
row_spec(5, color = "white", background = "#151B54")| Cluster | # Customers | Avg. Age | Avg. Income | Avg. Spending |
|---|---|---|---|---|
| C1 | 23 | 45 | 26 | 21 |
| C2 | 21 | 25 | 25 | 80 |
| C3 | 79 | 43 | 54 | 50 |
| C4 | 39 | 33 | 87 | 82 |
| C5 | 38 | 40 | 87 | 19 |
Findings
Cluster 1 is a smaller cluster with the average age of 45, lower income and lower spending. Cluster 2 is also a smaller cluster made up of younger customers with low income, but higher spend. Cluster 3 is a large segment made up of middle aged customers with medium income and medium spend. Cluster 4 is a medium sized cluster made up of millennial customers with high income and high spending. Cluster 5 is a medium sized cluster made up of middle aged customers with high income and low spending. Cluster
market_clus$gender <- factor(market_clus$gender,
levels = c("Male", "Female"))
ggplot(market_clus, aes(x = gender, group = cluster)) +
geom_bar(aes(y = after_stat(prop),
fill = factor(after_stat(x))),
stat = "count",
show.legend = FALSE) +
facet_grid(~ cluster) +
scale_y_continuous(labels = scales::percent) +
ylab("Percentage of Customers") +
xlab("Gender") +
ggtitle("Gender Distribution by Cluster")Findings
Cluster 1 is predominantly female close to a 60/40 split. Cluster 2 is also predominantly female close to a 57/43 split. Cluster 3 is similar to cluster 1 where female customers make up almost 60%. Cluster 4 is more balanced still mostly female, but seems like a 53/47 split cluster 5 is predominantly male with a balance of 52/48.
Cluster 1 - Smart Savers Cluster 2 - Budget Ballers Cluster 3 - Middle Marry’s Cluster 4 - Premium Patrons Cluster 5 - Prosperous Penny-Pinchers
Marketing Actions
The supermarket should focus on its middle market, the third cluster as it is the largest segment with mean spend, it will be easier to persuade them to be high spenders, then for example cluster 1. Since it is predominantly middle-aged females with medium income the supermarket should run a campaign for the average working woman, see what they buy most often and create bundles and survey them on what they would like to see added to the offering. Next target market should be cluster 5, which is predominantly middle aged men with high income, but low expenditure, They have the money to spend, but they are either saving it or spending it somewhere else, if we can get them to spend their more in this supermarket we could turn them to high spenders. This could be done by offering luxury products, they may be looking for somewhere else (eg. expensive alcohol, premium cut meats), we need to find out what they want and then offer it to them.
Segmenting Bank Customers using K-means
bank <- read_csv("bank_personal_loan.csv")
bank_2 <- select(bank, age, experience, income, cc_avg)
bank_2_scale <- scale(bank_2)
set.seed(101)
kmeans2 <- kmeans(bank_2_scale, centers = 3)
kmeans2$cluster [1] 1 1 1 1 1 1 2 2 1 3 2 1 3 2 2 2 3 1 3 2 2 2 1 1 3 1 1 3 2 3 2 1 2 1 1 2 2
[38] 2 3 1 2 1 1 1 3 2 1 1 2 1 1 2 1 3 1 3 2 2 1 3 2 3 1 1 3 3 2 2 2 2 3 2 3 3
[75] 1 3 2 2 2 2 2 2 3 1 2 1 1 2 2 1 3 1 1 2 2 1 3 3 2 2 2 2 2 1 2 1 1 1 1 1 1
[112] 2 1 2 1 2 2 2 1 3 2 2 2 1 1 2 1 1 1 1 1 3 1 1 2 2 2 2 2 2 2 1 1 1 2 3 1 2
[149] 2 2 3 3 2 2 2 1 1 1 1 2 3 2 1 1 2 1 1 1 2 1 1 2 3 2 3 1 2 1 2 2 2 1 1 3 2
[186] 1 2 3 2 2 2 2 2 2 3 1 3 2 1 3 1 1 1 2 2 1 2 1 1 3 2 1 2 3 2 1 1 1 1 2 1 1
[223] 1 2 2 1 1 3 2 2 2 1 1 2 1 1 1 2 2 1 2 2 1 3 1 1 1 2 2 1 1 3 2 2 2 2 1 2 1
[260] 2 2 1 2 1 1 2 2 2 2 1 2 1 1 1 1 2 1 1 2 3 1 2 1 2 1 1 2 1 3 1 3 1 1 1 1 2
[297] 1 2 1 3 1 3 3 3 2 2 2 1 3 2 2 3 1 1 2 1 2 3 1 2 2 3 2 2 3 2 2 2 2 3 2 1 2
[334] 2 2 2 1 2 1 1 2 1 3 1 2 2 1 1 3 1 1 3 2 2 3 1 2 1 3 1 1 3 2 1 2 3 2 1 2 1
[371] 1 2 2 2 1 1 2 1 2 1 2 2 2 1 2 1 1 1 2 3 1 2 2 2 1 2 2 1 2 1 3 1 3 2 2 3 3
[408] 2 2 2 3 2 1 1 2 1 1 2 1 2 2 1 3 3 2 1 1 1 2 1 2 1 3 2 1 2 2 1 2 2 2 2 2 2
[445] 2 2 2 2 1 2 2 1 1 2 2 1 2 1 2 3 2 2 3 3 3 2 1 1 1 2 1 2 1 2 3 3 2 2 3 3 2
[482] 1 3 1 1 2 2 1 1 2 1 1 2 3 3 1 2 2 1 2 2 2 1 1 1 1 2 2 2 3 2 1 1 1 1 1 2 2
[519] 1 1 2 2 1 2 1 2 3 1 2 1 2 1 2 1 2 2 1 3 1 2 1 3 1 2 3 1 1 1 2 2 2 2 1 2 3
[556] 1 2 1 1 2 1 2 1 2 1 2 3 1 1 1 3 1 3 2 1 2 1 3 1 2 2 1 1 1 3 1 1 3 1 3 1 1
[593] 1 1 2 1 3 3 2 1 2 2 1 2 1 2 1 1 1 1 2 2 2 2 3 3 1 2 2 2 1 1 1 1 1 2 1 1 2
[630] 1 1 1 3 2 2 2 3 2 1 2 3 3 2 3 2 1 2 2 2 1 3 1 1 2 3 2 1 1 2 2 3 2 2 2 2 2
[667] 2 2 2 2 1 2 2 1 2 1 2 3 2 2 2 3 2 1 3 1 1 2 1 2 2 1 1 1 3 1 2 3 2 1 1 1 3
[704] 3 3 2 3 2 1 1 1 2 1 1 2 2 1 2 2 2 2 2 2 2 2 3 2 2 3 2 3 1 1 2 2 1 3 2 1 2
[741] 3 2 1 2 3 1 2 2 1 2 3 2 2 2 1 2 2 2 2 2 1 2 1 2 1 2 1 2 1 1 3 1 3 3 2 2 3
[778] 2 2 3 1 3 3 3 3 3 2 3 2 1 2 2 3 1 2 2 1 1 1 1 3 2 1 2 2 3 2 3 2 2 1 2 1 2
[815] 1 2 2 1 2 2 2 1 2 1 1 1 2 2 1 2 1 2 1 2 3 3 1 1 1 1 1 3 1 2 2 1 2 1 2 1 2
[852] 1 1 1 2 2 2 2 1 3 2 2 2 2 1 2 1 2 3 2 1 2 1 1 1 2 1 1 1 2 2 3 3 2 1 1 2 3
[889] 3 1 2 3 1 2 1 1 3 2 2 3 1 2 2 1 1 2 1 2 2 3 2 2 1 2 2 3 1 3 1 2 1 1 1 2 2
[926] 1 1 2 1 2 1 1 2 2 2 3 2 1 2 2 3 3 2 1 1 2 1 2 1 1 1 2 3 2 3 2 1 2 2 2 3 1
[963] 2 1 1 3 2 2 2 1 2 3 1 3 2 2 2 2 2 2 1 3 2 3 2 3 3 2 2 1 1 1 1 3 1 1 1 2 2
[1000] 2 2 2 2 1 2 1 2 3 3 1 1 2 2 1 3 2 1 1 1 1 2 1 1 3 2 2 1 3 1 1 2 2 1 2 2 1
[1037] 2 1 3 3 1 2 2 2 2 3 2 2 2 1 3 1 1 2 2 1 1 1 2 1 2 2 3 2 3 1 1 3 1 1 1 1 3
[1074] 1 1 1 1 3 2 3 2 2 1 1 3 2 2 1 2 2 1 1 1 1 2 2 1 3 2 1 1 1 1 1 3 1 1 2 2 2
[1111] 2 3 2 1 1 2 3 3 1 1 1 1 1 2 1 3 1 1 3 1 3 3 1 1 1 2 2 3 1 2 1 3 1 1 2 1 1
[1148] 1 3 3 2 2 2 2 2 1 2 2 1 2 1 3 1 3 3 1 3 3 2 1 1 2 2 1 1 1 1 1 1 1 1 1 1 2
[1185] 1 1 2 2 1 1 3 1 3 2 1 1 1 2 1 1 1 1 1 2 1 1 3 1 2 2 3 2 1 1 2 1 2 1 2 1 3
[1222] 1 2 1 2 1 2 1 2 2 1 2 1 2 1 2 1 3 1 2 2 2 1 3 3 2 2 2 1 2 2 1 1 2 1 1 1 2
[1259] 1 2 2 2 1 1 3 1 2 2 1 1 1 1 2 3 2 1 1 3 1 2 2 1 2 1 2 1 1 1 2 2 2 2 2 2 1
[1296] 1 1 2 1 2 2 3 1 1 3 1 1 3 2 1 2 1 2 2 1 2 1 2 3 1 1 3 1 2 2 2 1 2 3 1 1 1
[1333] 1 2 2 3 1 3 2 3 1 1 1 1 2 2 1 2 1 3 1 2 2 2 3 2 1 2 2 2 3 2 1 1 1 2 2 2 2
[1370] 2 1 2 3 2 2 3 2 1 2 2 2 1 1 2 2 2 1 1 2 1 1 1 2 2 2 3 1 2 3 1 1 1 3 1 2 3
[1407] 2 2 3 1 2 3 2 2 2 1 1 1 3 1 1 1 1 2 1 2 1 1 1 1 1 3 3 2 2 1 2 1 2 2 1 2 1
[1444] 1 3 3 1 3 1 2 2 1 2 1 2 2 1 1 2 2 1 2 2 1 1 1 3 2 1 2 2 3 1 2 3 3 2 1 3 1
[1481] 2 1 2 2 2 1 3 1 1 2 1 1 1 2 2 3 1 2 3 3 2 1 2 1 3 2 2 1 1 3 2 2 2 3 3 2 1
[1518] 2 1 2 3 3 1 1 1 1 1 2 3 1 2 1 1 2 2 2 1 2 2 1 1 3 2 2 1 2 3 2 2 2 1 3 3 2
[1555] 1 2 1 2 1 2 1 2 1 2 2 1 2 2 2 2 3 1 2 1 2 2 1 1 1 1 1 2 3 3 2 2 2 2 1 2 2
[1592] 1 3 2 1 2 1 2 1 3 3 3 1 3 2 2 1 2 1 2 1 2 1 2 2 2 2 2 1 2 1 1 1 2 1 2 3 2
[1629] 1 3 1 2 1 2 2 2 2 3 1 2 1 3 1 1 2 2 2 1 2 1 1 3 3 1 2 1 3 1 2 3 1 1 2 2 2
[1666] 1 3 1 2 1 1 1 2 1 1 2 2 1 2 3 2 3 2 2 2 1 2 2 2 2 1 2 2 2 2 3 3 2 3 2 1 1
[1703] 2 2 3 2 2 2 2 2 1 3 1 1 2 1 3 1 1 3 2 2 1 1 1 2 2 2 2 2 1 3 1 1 1 2 2 1 2
[1740] 1 1 3 2 2 1 1 2 1 2 2 2 2 3 2 3 1 1 1 1 1 1 2 2 3 2 3 2 1 3 2 2 2 1 1 1 2
[1777] 2 2 1 1 2 2 1 3 2 3 1 1 1 3 1 2 3 1 2 2 2 3 3 1 2 1 1 2 1 2 2 2 2 1 2 1 3
[1814] 2 2 2 1 1 1 2 2 1 3 1 3 3 2 2 1 2 3 2 2 1 1 3 1 1 1 1 2 1 2 3 2 1 2 1 1 2
[1851] 1 1 1 2 2 2 2 1 1 2 3 2 3 2 2 1 2 2 3 2 2 1 1 1 1 1 2 2 2 2 1 1 2 3 3 1 2
[1888] 1 1 2 3 1 2 2 2 1 1 2 2 2 2 3 1 2 1 1 1 1 2 2 3 2 3 2 2 1 2 2 3 1 2 2 1 1
[1925] 2 1 1 1 2 1 2 1 2 2 1 3 2 3 1 2 2 1 2 2 2 2 2 2 1 2 1 2 1 2 1 1 1 1 1 2 1
[1962] 3 3 2 1 1 2 1 2 2 1 1 1 2 1 1 1 1 1 1 3 2 2 1 1 1 1 2 2 2 1 2 2 1 3 1 2 2
[1999] 2 2 1 3 3 3 1 3 2 2 2 1 2 2 2 1 3 1 1 1 2 1 2 3 1 2 1 2 2 1 1 1 2 2 2 2 2
[2036] 1 1 1 3 2 1 3 1 2 2 2 3 2 1 1 1 1 1 2 1 2 1 3 1 3 2 3 2 2 2 1 1 3 2 1 2 2
[2073] 1 2 2 1 3 3 1 1 2 2 1 1 1 2 1 3 1 2 2 1 2 2 2 3 2 1 2 2 1 3 1 3 1 1 2 1 2
[2110] 3 1 2 1 2 2 2 1 1 1 1 1 1 2 1 1 1 1 1 2 1 2 2 2 1 2 1 2 2 1 2 2 1 2 2 3 2
[2147] 1 1 2 3 2 1 2 3 1 2 1 1 2 2 1 2 1 1 1 1 1 2 2 2 1 1 1 1 1 3 1 3 3 2 2 1 1
[2184] 1 3 2 1 2 1 2 1 3 1 1 1 2 3 2 2 2 2 1 2 2 2 2 1 3 2 1 2 1 2 2 2 3 2 3 1 2
[2221] 2 2 1 2 1 2 3 2 2 2 3 3 2 2 1 3 2 1 2 2 1 1 1 2 2 2 3 2 2 1 3 1 2 2 2 1 2
[2258] 2 2 1 1 3 2 2 1 2 3 3 1 1 1 2 1 1 1 3 3 1 3 2 1 2 1 2 2 2 2 1 1 2 1 2 3 1
[2295] 1 2 1 2 2 2 2 1 3 2 3 3 1 2 1 1 1 2 2 2 1 3 3 3 2 3 2 1 2 1 1 2 1 2 3 1 1
[2332] 2 1 2 2 1 3 3 3 2 1 1 2 2 2 2 2 2 3 2 2 2 1 2 1 2 3 1 2 3 1 1 1 1 2 1 1 1
[2369] 2 2 1 1 1 3 1 2 2 3 1 3 1 1 3 2 2 3 1 1 2 1 1 3 3 2 3 1 1 2 2 2 3 1 2 1 1
[2406] 2 1 1 2 2 3 2 2 2 3 2 2 1 1 2 2 1 2 2 1 2 2 1 3 1 1 2 2 1 3 1 2 2 2 2 1 2
[2443] 1 1 2 2 1 3 2 1 1 3 1 1 2 3 2 1 2 2 1 1 2 1 2 2 1 1 3 1 1 1 2 2 2 2 2 3 3
[2480] 2 1 2 1 1 2 2 2 1 1 1 3 1 1 1 1 2 2 1 3 2 1 1 3 1 2 1 2 2 1 1 2 2 2 2 1 1
[2517] 1 2 2 2 2 2 3 3 2 1 1 1 2 1 2 2 2 2 2 2 3 2 3 3 2 3 2 2 2 1 2 1 2 1 1 2 1
[2554] 1 2 2 1 1 3 1 1 3 2 1 1 1 1 3 2 3 3 1 2 2 1 1 2 2 3 2 3 2 1 1 2 2 3 2 2 2
[2591] 3 1 2 3 2 3 1 3 2 1 1 2 3 2 1 2 2 2 2 3 1 1 2 3 3 2 2 1 1 1 2 1 2 3 2 2 2
[2628] 2 1 1 2 2 3 2 1 1 3 2 1 2 1 3 2 2 1 1 3 2 3 1 2 1 1 1 2 2 3 1 1 2 1 2 2 3
[2665] 3 3 1 2 2 1 2 2 1 2 1 1 1 1 3 2 2 1 2 2 1 1 2 2 2 3 2 3 2 2 1 1 2 2 3 1 1
[2702] 2 3 2 3 2 3 1 2 1 2 1 1 3 3 1 1 1 2 2 2 3 2 2 2 2 2 1 1 2 2 1 1 2 2 1 2 2
[2739] 3 1 2 1 1 1 2 2 2 1 1 2 2 2 2 3 1 1 1 2 2 2 1 1 2 2 1 2 3 2 3 3 1 3 3 3 2
[2776] 3 3 2 2 3 1 2 2 2 1 1 1 3 1 1 2 3 2 3 2 2 2 2 2 2 2 2 3 1 2 3 2 1 2 3 2 1
[2813] 3 2 2 1 2 3 1 2 1 2 1 1 2 1 1 1 1 1 2 2 3 3 1 1 1 1 3 2 1 3 1 1 2 2 2 1 1
[2850] 3 3 2 3 1 2 1 1 3 3 3 1 1 2 1 2 2 3 3 2 2 3 2 2 2 2 2 1 2 3 1 2 2 2 1 1 2
[2887] 2 1 2 2 2 2 2 1 2 2 2 1 3 1 2 2 2 2 3 2 1 1 2 1 3 1 3 1 1 3 2 2 1 1 2 2 2
[2924] 2 2 2 2 3 3 1 1 1 3 2 3 2 2 2 1 2 1 2 3 2 2 1 3 3 1 1 1 1 3 2 1 2 3 2 2 1
[2961] 2 2 1 1 1 2 1 1 3 1 2 2 1 2 1 2 1 3 2 2 1 2 3 2 2 2 3 1 3 3 2 1 2 2 3 3 3
[2998] 2 2 2 3 1 1 2 1 3 3 2 2 1 1 2 3 3 2 1 2 2 2 2 3 2 2 2 2 2 1 2 2 3 2 2 2 3
[3035] 2 2 1 1 1 3 1 1 2 2 1 2 1 3 2 2 2 2 2 1 3 1 2 1 1 2 3 1 1 2 2 3 2 1 3 2 1
[3072] 1 2 1 1 1 1 2 1 2 2 3 1 1 1 2 2 2 2 1 2 2 1 1 2 2 3 2 1 2 2 2 2 2 2 1 1 1
[3109] 1 2 1 1 2 1 1 1 1 1 2 2 1 1 1 1 3 2 2 1 1 1 1 2 1 1 2 1 2 2 3 2 1 2 1 2 1
[3146] 1 1 1 2 2 3 1 1 1 1 2 2 1 2 2 3 3 1 2 1 3 1 2 3 2 1 1 1 1 2 1 2 1 2 1 1 1
[3183] 2 1 3 3 1 1 3 1 2 1 2 3 1 2 1 1 1 1 2 1 1 3 2 2 1 2 2 3 1 3 2 1 2 1 1 2 3
[3220] 1 2 1 2 1 2 2 1 1 1 1 2 2 2 2 1 2 1 1 2 1 2 1 1 2 2 3 1 3 1 2 1 2 2 2 2 1
[3257] 1 2 1 1 2 2 3 1 2 1 2 2 3 2 3 3 1 3 1 1 3 1 3 1 2 2 3 2 1 1 2 1 2 2 2 2 1
[3294] 1 1 3 3 2 2 2 2 2 1 2 1 1 2 1 3 2 2 2 3 2 1 2 2 2 3 2 2 3 3 2 2 2 3 3 1 3
[3331] 1 2 1 1 1 3 2 2 1 3 1 1 1 2 3 1 1 2 2 3 1 3 1 2 1 2 3 1 2 1 3 1 1 2 1 1 3
[3368] 2 3 1 1 1 3 3 2 1 3 1 1 2 2 3 2 3 3 1 1 2 3 1 1 2 1 1 1 1 2 3 1 2 2 1 2 2
[3405] 1 2 1 2 2 1 1 2 3 2 2 1 2 1 2 1 2 3 2 2 1 1 1 3 2 1 2 2 2 1 2 1 2 2 1 1 1
[3442] 2 1 1 2 1 3 2 1 3 1 2 2 1 2 1 3 2 3 1 2 2 2 1 3 2 1 2 1 1 2 2 2 2 2 2 3 3
[3479] 1 1 2 2 2 3 1 1 1 1 1 3 1 2 1 2 1 1 1 2 1 2 2 2 1 1 2 2 1 2 3 1 1 1 2 1 1
[3516] 2 2 1 3 1 2 1 2 1 2 2 2 1 1 1 2 1 1 2 1 2 2 2 1 2 1 3 1 1 1 3 2 2 1 1 1 2
[3553] 2 3 1 1 1 2 2 2 1 1 3 2 1 1 2 2 3 1 2 3 1 3 2 2 2 1 3 1 1 1 2 1 2 1 3 1 2
[3590] 1 1 2 1 2 1 1 3 2 1 1 1 1 2 2 2 2 1 1 3 3 1 2 3 3 3 3 1 1 1 1 2 2 2 1 2 2
[3627] 1 1 3 2 3 2 3 2 2 2 1 1 2 3 2 2 2 2 2 1 3 1 1 2 2 3 1 2 2 3 1 2 2 1 1 3 1
[3664] 1 2 1 2 1 1 1 1 2 1 1 1 2 2 2 2 3 1 3 1 2 2 2 2 1 3 1 2 1 2 2 1 2 1 1 1 2
[3701] 3 2 3 2 3 1 2 1 1 1 2 1 2 2 2 1 2 2 1 1 2 1 1 2 1 1 1 2 1 1 1 1 1 2 1 3 2
[3738] 1 2 1 3 2 3 1 2 3 2 1 1 1 2 1 2 1 2 2 1 3 3 1 2 2 2 2 2 1 2 1 1 3 1 1 3 2
[3775] 2 1 1 2 2 2 3 2 1 2 3 2 2 1 1 2 2 1 2 3 2 2 1 2 2 1 2 1 1 1 3 1 1 3 3 1 2
[3812] 2 1 2 1 1 2 2 1 2 1 3 3 2 1 1 3 3 1 2 1 3 2 1 2 1 1 1 1 1 2 1 2 3 2 1 1 1
[3849] 2 1 2 3 1 2 1 3 2 2 1 2 1 2 2 1 2 2 2 1 1 1 1 1 2 2 1 1 1 1 1 1 2 2 1 1 1
[3886] 1 2 3 1 1 3 2 2 1 1 1 3 2 3 2 2 1 2 2 1 2 2 1 1 1 1 2 1 1 1 1 2 1 2 3 1 1
[3923] 1 1 3 1 2 2 2 1 3 2 1 1 1 2 1 1 2 2 1 2 1 3 2 3 1 3 3 1 1 1 2 2 3 2 2 1 2
[3960] 1 2 2 1 2 1 1 1 1 1 1 2 1 3 2 2 2 2 2 1 1 2 2 1 1 1 2 3 2 2 2 2 2 3 1 1 2
[3997] 2 2 1 2 2 2 2 2 2 2 2 1 3 3 1 2 1 2 2 1 3 1 3 2 3 1 1 2 1 2 3 2 2 1 2 1 2
[4034] 2 1 3 2 2 2 3 2 1 3 2 1 2 1 3 1 2 2 3 1 1 2 1 3 2 1 2 3 1 1 2 2 1 2 2 2 2
[4071] 2 1 1 2 2 1 2 1 1 2 1 2 1 2 2 1 2 3 1 1 1 1 3 3 2 1 1 2 1 2 1 2 1 1 1 3 2
[4108] 2 2 1 2 1 1 1 2 1 3 1 1 1 2 2 3 2 2 2 2 1 2 1 2 2 2 3 1 3 1 1 2 1 2 1 2 2
[4145] 3 2 2 3 2 1 2 3 1 3 3 2 3 1 2 1 1 1 2 2 1 2 2 3 2 3 1 2 2 1 1 3 1 2 2 1 1
[4182] 2 2 3 2 1 1 1 1 1 3 1 2 2 2 1 2 2 2 1 1 2 1 2 1 3 2 1 2 1 1 1 2 2 2 2 3 2
[4219] 2 2 2 2 2 2 2 3 1 3 1 2 2 2 1 3 2 1 3 2 3 2 3 1 2 2 2 3 2 2 3 1 2 1 2 2 2
[4256] 2 3 1 3 3 2 2 2 2 2 1 1 3 2 2 1 3 2 3 1 2 3 3 2 1 3 1 3 2 3 3 2 2 1 2 2 2
[4293] 3 2 3 2 1 1 1 1 2 2 2 3 2 1 1 3 3 3 3 1 1 2 1 2 1 2 2 2 3 1 1 2 2 2 1 1 3
[4330] 2 2 3 2 2 2 1 1 3 2 1 1 1 1 1 3 3 2 2 2 1 2 1 1 2 3 1 1 3 1 1 2 2 1 1 2 1
[4367] 2 3 1 2 1 2 1 3 1 1 1 3 1 1 1 1 2 1 1 2 2 1 3 2 2 3 2 1 2 2 1 2 2 2 1 2 2
[4404] 2 1 2 2 1 2 1 3 1 1 1 3 2 2 2 2 3 3 2 3 2 1 3 3 1 2 3 1 1 2 2 1 2 2 2 1 1
[4441] 1 2 2 1 1 2 2 2 2 1 1 2 2 1 2 2 1 2 2 3 2 2 1 1 2 1 1 2 2 1 3 2 2 1 2 1 2
[4478] 1 1 3 3 1 1 2 3 1 1 1 1 1 1 1 2 2 3 1 2 3 2 2 2 2 2 2 1 1 1 1 1 2 2 1 2 1
[4515] 1 1 2 1 2 2 1 2 1 1 2 3 1 1 2 1 1 1 3 2 1 1 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2
[4552] 1 2 2 1 1 2 1 1 2 1 2 2 3 2 3 3 2 1 2 3 2 1 2 3 2 2 2 1 2 3 1 1 2 1 3 2 1
[4589] 1 1 2 1 1 3 2 1 1 1 2 2 2 1 2 3 1 2 3 2 1 2 1 1 1 2 2 1 2 1 1 2 2 2 2 2 1
[4626] 3 2 1 1 3 2 3 2 1 1 1 1 1 1 2 1 1 3 1 2 1 3 2 1 3 2 2 3 3 1 3 2 1 1 3 2 3
[4663] 2 1 2 1 1 2 1 1 3 3 3 2 1 1 1 1 1 3 3 1 2 2 2 2 2 2 1 2 2 1 2 2 1 1 2 2 3
[4700] 2 3 1 1 2 2 2 2 2 2 1 1 2 1 1 1 2 2 1 1 3 1 2 1 3 1 1 1 1 2 1 2 1 1 3 2 1
[4737] 2 2 2 3 3 2 2 2 1 2 1 2 1 1 2 3 3 2 2 3 1 1 2 2 2 2 1 2 2 2 1 1 1 1 1 1 1
[4774] 2 2 1 2 1 2 1 2 1 3 1 2 1 1 2 1 2 3 2 1 2 2 2 1 1 1 1 1 1 1 2 2 3 2 1 1 1
[4811] 2 3 3 2 2 2 2 3 1 1 1 1 3 3 1 2 2 3 2 1 1 1 1 2 2 2 2 1 2 1 1 2 3 2 1 3 3
[4848] 1 2 2 2 2 1 1 3 2 2 1 2 3 2 2 1 2 1 2 1 1 2 2 2 2 1 3 1 2 3 2 1 1 2 2 1 3
[4885] 2 2 2 1 3 2 2 2 1 1 2 3 1 1 2 2 1 1 1 1 2 2 2 1 3 1 2 3 2 1 2 2 1 1 2 1 1
[4922] 1 1 1 1 2 1 1 2 2 2 2 2 3 1 2 1 3 2 2 2 1 2 1 2 1 2 1 1 1 2 2 1 2 1 2 1 1
[4959] 2 2 2 1 3 3 3 1 1 1 2 1 1 2 2 1 2 1 1 1 2 2 3 3 1 2 1 2 1 2 1 1 2 2 1 3 2
[4996] 1 1 2 2 1
kmeans2$centers age experience income cc_avg
1 -0.8912604 -0.8924191 -0.2930636 -0.3277429
2 0.8904403 0.8852371 -0.3195201 -0.3326932
3 -0.1520638 -0.1350880 1.6031701 1.7263626
kmeans2$size [1] 2029 2168 803
kmeans2$iter [1] 3
d2 <- dist(bank_2_scale)
sil_kmeans2 <- silhouette(kmeans2$cluster, d2)
summary(sil_kmeans2)Silhouette of 5000 units in 3 clusters from silhouette.default(x = kmeans2$cluster, dist = d2) :
Cluster sizes and average silhouette widths:
2029 2168 803
0.4344352 0.4359975 0.2505801
Individual silhouette widths:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.1365 0.2776 0.4538 0.4056 0.5614 0.6311
Findings
The data needs to be scaled as there are many different ranges of numbers within the variables. The quality of the clusters is weak and could be artificial.
bank_clus <- bank %>%
mutate(clusters2 = kmeans2$cluster) %>%
mutate(cluster = case_when(clusters2 == 1 ~ 'C1',
clusters2 == 2 ~ 'C2',
clusters2 == 3 ~ 'C3'))
bank_clus_means <- bank_clus %>%
group_by(cluster) %>%
summarise(num_custs = n(),
age_m = mean(age),
experience_m = mean(experience),
income_m = mean(income),
cc_avg_m = mean(cc_avg),
personal_loan_m = mean(personal_loan))
knitr::kable(select(bank_clus_means, cluster, num_custs, age_m, experience_m, income_m, cc_avg_m),
digits = c(0,0,0,0,0,0),
col.names = c("Cluster", "# Customers", "Avg. Age", "Avg. Experience","Avg. Income", "Avg. Credit Card Spend"),
caption = "Number of Customers and Average Age, Experience, Income and Credit Card Spend by Cluster")%>%
kable_styling(full_width = F) %>%
row_spec(1, color = "white", background = "royalblue") %>%
row_spec(2, color = "white", background = "#0041C2") %>%
row_spec(3, color = "white", background = "blue")| Cluster | # Customers | Avg. Age | Avg. Experience | Avg. Income | Avg. Credit Card Spend |
|---|---|---|---|---|---|
| C1 | 2029 | 35 | 10 | 60 | 1 |
| C2 | 2168 | 56 | 30 | 59 | 1 |
| C3 | 803 | 44 | 19 | 148 | 5 |
Findings
Cluster 1 is a large cluster of 2029 customers with the average age of 35, they have been working for an average of 10 years and make around €60 000 a year. They spend on average €1000 a month on their credit card.
Cluster 2 is also a large cluster made up of 2168 customers they are a bit older, on average 56 years old and they have been working for 30 years. They make on average €59 000 a year and spend €1000 a moth on their credit card.
Cluster 3 is a much smaller cluster of 803 customers, they are middle aged (44 y.o. avg.) have been working for an average of 19 years and make €148000 a year, they spend around €5000 on their credit card every month.
bank_clus$personal_loan <- factor(bank_clus$personal_loan,
levels = c(1, 0))
ggplot(bank_clus, aes(x = personal_loan, group = cluster)) +
geom_bar(aes(y = after_stat(prop),
fill = factor(after_stat(x))),
stat = "count",
show.legend = FALSE) +
facet_grid(~ cluster) +
scale_y_continuous(labels = scales::percent) +
ylab("Percentage of Customers") +
xlab("Taken out a Loan in the Past") +
ggtitle("Loan Taken out in the Past by Cluster")Findings
Segment 3 has the highest percentage of customers who have taken out a loan in the past, they are most likely to take out a loan in the future.