Customer Segmentation

Setup

library(cluster)
library(tidyverse)
library(kableExtra)
market <- read_csv("supermarket_customers.csv")

Segmenting Supermarket Customers using Hierarchical Clustering

market_2 <- select(market, annual_income, spending_score)
market_2_scale <- scale(market_2)
d1 <- dist(market_2_scale)
h1 <- hclust(d1)
plot(h1, hang = -1)

heatmap(as.matrix(d1), Rowv = as.dendrogram(h1), Colv = 'Rowv')

Findings

The dataset needs to be scaled, because the values of one variable range from 1 to 70, second variable is 1 to 90 and the third is one to 130, while the difference isn’t massive just to be safe we will scale the variables

We can see distinct clusters along the diagonal of the heatmap, although I found it easier to identify the 5 clusters through the dendrogram rather than the heatmap. The heatmap makes is seem like there is 4 clusters rather than 5.

clusters1 <- cutree(h1, k = 5)
sil1 <- silhouette(clusters1, d1)
summary(sil1)
Silhouette of 200 units in 5 clusters from silhouette.default(x = clusters1, dist = d1) :
 Cluster sizes and average silhouette widths:
       23        21        79        39        38 
0.5065849 0.6199558 0.6140651 0.5094809 0.4623933 
Individual silhouette widths:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.1401  0.4993  0.5965  0.5531  0.6610  0.7645 

Findings

A reasonable clustering structure has been found for 2 clusters. And the 3 remaining clusters are close to the tipping point of a reasonable clustering, mean quality is 0.59 which is within the 0.51 - 0.71 reasonable structure.

ggplot(data = market_2) + 
  geom_point(mapping = aes(x = annual_income, y = spending_score, colour = clusters1)) +
  labs(title = "Scatterplot of Annual Income and Spending Score by Cluster",
    x = "Annual Income",
    y = "Spending Score")

Findings

Cluster 1 is made up of low income low spenders. Cluster 2 is made up of low income high spenders. Cluster 3 is made up of medium income medium spenders. Cluster 4 is made up of high income high spenders. Cluster 5 is made up of high income low spenders.

market_clus <- market %>%
  mutate(clusters1 = clusters1) %>%
  mutate(cluster = case_when(clusters1 == 1 ~ 'C1',
                             clusters1 == 2 ~ 'C2',
                             clusters1 == 3 ~ 'C3',
                             clusters1 == 4 ~ 'C4',
                             clusters1 == 5 ~ 'C5'))

market_clus_means <- market_clus %>%
  group_by(cluster) %>%
  summarise(num_custs = n(),
            average_age = mean(age),
      income_m = mean(annual_income),
      spending_m = mean(spending_score))

knitr::kable(select(market_clus_means, cluster,num_custs, average_age, income_m, spending_m), 
             digits = c(0,0,0,0,0),
             col.names = c("Cluster", "# Customers", "Avg. Age","Avg. Income","Avg. Spending"), 
             caption = "Number of Customers, Average Age, Average Income and Average Spending by Cluster") %>%
 kable_styling(full_width = F) %>%
  row_spec(1, color = "white", background = "royalblue") %>%
  row_spec(2, color = "white", background = "#0041C2") %>%
  row_spec(3, color = "white", background = "blue") %>%
  row_spec(4, color = "white", background = "#123456") %>%
  row_spec(5, color = "white", background = "#151B54")
Number of Customers, Average Age, Average Income and Average Spending by Cluster
Cluster # Customers Avg. Age Avg. Income Avg. Spending
C1 23 45 26 21
C2 21 25 25 80
C3 79 43 54 50
C4 39 33 87 82
C5 38 40 87 19

Findings

Cluster 1 is a smaller cluster with the average age of 45, lower income and lower spending. Cluster 2 is also a smaller cluster made up of younger customers with low income, but higher spend. Cluster 3 is a large segment made up of middle aged customers with medium income and medium spend. Cluster 4 is a medium sized cluster made up of millennial customers with high income and high spending. Cluster 5 is a medium sized cluster made up of middle aged customers with high income and low spending. Cluster

market_clus$gender <- factor(market_clus$gender, 
                               levels = c("Male", "Female"))

ggplot(market_clus, aes(x = gender, group = cluster)) + 
  geom_bar(aes(y = after_stat(prop), 
               fill = factor(after_stat(x))), 
           stat = "count", 
           show.legend = FALSE) + 
  facet_grid(~ cluster) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Percentage of Customers") + 
  xlab("Gender") +
  ggtitle("Gender Distribution by Cluster")

Findings

Cluster 1 is predominantly female close to a 60/40 split. Cluster 2 is also predominantly female close to a 57/43 split. Cluster 3 is similar to cluster 1 where female customers make up almost 60%. Cluster 4 is more balanced still mostly female, but seems like a 53/47 split cluster 5 is predominantly male with a balance of 52/48.

Cluster 1 - Smart Savers Cluster 2 - Budget Ballers Cluster 3 - Middle Marry’s Cluster 4 - Premium Patrons Cluster 5 - Prosperous Penny-Pinchers

Marketing Actions

The supermarket should focus on its middle market, the third cluster as it is the largest segment with mean spend, it will be easier to persuade them to be high spenders, then for example cluster 1. Since it is predominantly middle-aged females with medium income the supermarket should run a campaign for the average working woman, see what they buy most often and create bundles and survey them on what they would like to see added to the offering. Next target market should be cluster 5, which is predominantly middle aged men with high income, but low expenditure, They have the money to spend, but they are either saving it or spending it somewhere else, if we can get them to spend their more in this supermarket we could turn them to high spenders. This could be done by offering luxury products, they may be looking for somewhere else (eg. expensive alcohol, premium cut meats), we need to find out what they want and then offer it to them.

Segmenting Bank Customers using K-means

bank <- read_csv("bank_personal_loan.csv")
bank_2 <- select(bank, age, experience, income, cc_avg)
bank_2_scale <- scale(bank_2)

set.seed(101)
kmeans2 <- kmeans(bank_2_scale, centers = 3)

kmeans2$cluster
   [1] 1 1 1 1 1 1 2 2 1 3 2 1 3 2 2 2 3 1 3 2 2 2 1 1 3 1 1 3 2 3 2 1 2 1 1 2 2
  [38] 2 3 1 2 1 1 1 3 2 1 1 2 1 1 2 1 3 1 3 2 2 1 3 2 3 1 1 3 3 2 2 2 2 3 2 3 3
  [75] 1 3 2 2 2 2 2 2 3 1 2 1 1 2 2 1 3 1 1 2 2 1 3 3 2 2 2 2 2 1 2 1 1 1 1 1 1
 [112] 2 1 2 1 2 2 2 1 3 2 2 2 1 1 2 1 1 1 1 1 3 1 1 2 2 2 2 2 2 2 1 1 1 2 3 1 2
 [149] 2 2 3 3 2 2 2 1 1 1 1 2 3 2 1 1 2 1 1 1 2 1 1 2 3 2 3 1 2 1 2 2 2 1 1 3 2
 [186] 1 2 3 2 2 2 2 2 2 3 1 3 2 1 3 1 1 1 2 2 1 2 1 1 3 2 1 2 3 2 1 1 1 1 2 1 1
 [223] 1 2 2 1 1 3 2 2 2 1 1 2 1 1 1 2 2 1 2 2 1 3 1 1 1 2 2 1 1 3 2 2 2 2 1 2 1
 [260] 2 2 1 2 1 1 2 2 2 2 1 2 1 1 1 1 2 1 1 2 3 1 2 1 2 1 1 2 1 3 1 3 1 1 1 1 2
 [297] 1 2 1 3 1 3 3 3 2 2 2 1 3 2 2 3 1 1 2 1 2 3 1 2 2 3 2 2 3 2 2 2 2 3 2 1 2
 [334] 2 2 2 1 2 1 1 2 1 3 1 2 2 1 1 3 1 1 3 2 2 3 1 2 1 3 1 1 3 2 1 2 3 2 1 2 1
 [371] 1 2 2 2 1 1 2 1 2 1 2 2 2 1 2 1 1 1 2 3 1 2 2 2 1 2 2 1 2 1 3 1 3 2 2 3 3
 [408] 2 2 2 3 2 1 1 2 1 1 2 1 2 2 1 3 3 2 1 1 1 2 1 2 1 3 2 1 2 2 1 2 2 2 2 2 2
 [445] 2 2 2 2 1 2 2 1 1 2 2 1 2 1 2 3 2 2 3 3 3 2 1 1 1 2 1 2 1 2 3 3 2 2 3 3 2
 [482] 1 3 1 1 2 2 1 1 2 1 1 2 3 3 1 2 2 1 2 2 2 1 1 1 1 2 2 2 3 2 1 1 1 1 1 2 2
 [519] 1 1 2 2 1 2 1 2 3 1 2 1 2 1 2 1 2 2 1 3 1 2 1 3 1 2 3 1 1 1 2 2 2 2 1 2 3
 [556] 1 2 1 1 2 1 2 1 2 1 2 3 1 1 1 3 1 3 2 1 2 1 3 1 2 2 1 1 1 3 1 1 3 1 3 1 1
 [593] 1 1 2 1 3 3 2 1 2 2 1 2 1 2 1 1 1 1 2 2 2 2 3 3 1 2 2 2 1 1 1 1 1 2 1 1 2
 [630] 1 1 1 3 2 2 2 3 2 1 2 3 3 2 3 2 1 2 2 2 1 3 1 1 2 3 2 1 1 2 2 3 2 2 2 2 2
 [667] 2 2 2 2 1 2 2 1 2 1 2 3 2 2 2 3 2 1 3 1 1 2 1 2 2 1 1 1 3 1 2 3 2 1 1 1 3
 [704] 3 3 2 3 2 1 1 1 2 1 1 2 2 1 2 2 2 2 2 2 2 2 3 2 2 3 2 3 1 1 2 2 1 3 2 1 2
 [741] 3 2 1 2 3 1 2 2 1 2 3 2 2 2 1 2 2 2 2 2 1 2 1 2 1 2 1 2 1 1 3 1 3 3 2 2 3
 [778] 2 2 3 1 3 3 3 3 3 2 3 2 1 2 2 3 1 2 2 1 1 1 1 3 2 1 2 2 3 2 3 2 2 1 2 1 2
 [815] 1 2 2 1 2 2 2 1 2 1 1 1 2 2 1 2 1 2 1 2 3 3 1 1 1 1 1 3 1 2 2 1 2 1 2 1 2
 [852] 1 1 1 2 2 2 2 1 3 2 2 2 2 1 2 1 2 3 2 1 2 1 1 1 2 1 1 1 2 2 3 3 2 1 1 2 3
 [889] 3 1 2 3 1 2 1 1 3 2 2 3 1 2 2 1 1 2 1 2 2 3 2 2 1 2 2 3 1 3 1 2 1 1 1 2 2
 [926] 1 1 2 1 2 1 1 2 2 2 3 2 1 2 2 3 3 2 1 1 2 1 2 1 1 1 2 3 2 3 2 1 2 2 2 3 1
 [963] 2 1 1 3 2 2 2 1 2 3 1 3 2 2 2 2 2 2 1 3 2 3 2 3 3 2 2 1 1 1 1 3 1 1 1 2 2
[1000] 2 2 2 2 1 2 1 2 3 3 1 1 2 2 1 3 2 1 1 1 1 2 1 1 3 2 2 1 3 1 1 2 2 1 2 2 1
[1037] 2 1 3 3 1 2 2 2 2 3 2 2 2 1 3 1 1 2 2 1 1 1 2 1 2 2 3 2 3 1 1 3 1 1 1 1 3
[1074] 1 1 1 1 3 2 3 2 2 1 1 3 2 2 1 2 2 1 1 1 1 2 2 1 3 2 1 1 1 1 1 3 1 1 2 2 2
[1111] 2 3 2 1 1 2 3 3 1 1 1 1 1 2 1 3 1 1 3 1 3 3 1 1 1 2 2 3 1 2 1 3 1 1 2 1 1
[1148] 1 3 3 2 2 2 2 2 1 2 2 1 2 1 3 1 3 3 1 3 3 2 1 1 2 2 1 1 1 1 1 1 1 1 1 1 2
[1185] 1 1 2 2 1 1 3 1 3 2 1 1 1 2 1 1 1 1 1 2 1 1 3 1 2 2 3 2 1 1 2 1 2 1 2 1 3
[1222] 1 2 1 2 1 2 1 2 2 1 2 1 2 1 2 1 3 1 2 2 2 1 3 3 2 2 2 1 2 2 1 1 2 1 1 1 2
[1259] 1 2 2 2 1 1 3 1 2 2 1 1 1 1 2 3 2 1 1 3 1 2 2 1 2 1 2 1 1 1 2 2 2 2 2 2 1
[1296] 1 1 2 1 2 2 3 1 1 3 1 1 3 2 1 2 1 2 2 1 2 1 2 3 1 1 3 1 2 2 2 1 2 3 1 1 1
[1333] 1 2 2 3 1 3 2 3 1 1 1 1 2 2 1 2 1 3 1 2 2 2 3 2 1 2 2 2 3 2 1 1 1 2 2 2 2
[1370] 2 1 2 3 2 2 3 2 1 2 2 2 1 1 2 2 2 1 1 2 1 1 1 2 2 2 3 1 2 3 1 1 1 3 1 2 3
[1407] 2 2 3 1 2 3 2 2 2 1 1 1 3 1 1 1 1 2 1 2 1 1 1 1 1 3 3 2 2 1 2 1 2 2 1 2 1
[1444] 1 3 3 1 3 1 2 2 1 2 1 2 2 1 1 2 2 1 2 2 1 1 1 3 2 1 2 2 3 1 2 3 3 2 1 3 1
[1481] 2 1 2 2 2 1 3 1 1 2 1 1 1 2 2 3 1 2 3 3 2 1 2 1 3 2 2 1 1 3 2 2 2 3 3 2 1
[1518] 2 1 2 3 3 1 1 1 1 1 2 3 1 2 1 1 2 2 2 1 2 2 1 1 3 2 2 1 2 3 2 2 2 1 3 3 2
[1555] 1 2 1 2 1 2 1 2 1 2 2 1 2 2 2 2 3 1 2 1 2 2 1 1 1 1 1 2 3 3 2 2 2 2 1 2 2
[1592] 1 3 2 1 2 1 2 1 3 3 3 1 3 2 2 1 2 1 2 1 2 1 2 2 2 2 2 1 2 1 1 1 2 1 2 3 2
[1629] 1 3 1 2 1 2 2 2 2 3 1 2 1 3 1 1 2 2 2 1 2 1 1 3 3 1 2 1 3 1 2 3 1 1 2 2 2
[1666] 1 3 1 2 1 1 1 2 1 1 2 2 1 2 3 2 3 2 2 2 1 2 2 2 2 1 2 2 2 2 3 3 2 3 2 1 1
[1703] 2 2 3 2 2 2 2 2 1 3 1 1 2 1 3 1 1 3 2 2 1 1 1 2 2 2 2 2 1 3 1 1 1 2 2 1 2
[1740] 1 1 3 2 2 1 1 2 1 2 2 2 2 3 2 3 1 1 1 1 1 1 2 2 3 2 3 2 1 3 2 2 2 1 1 1 2
[1777] 2 2 1 1 2 2 1 3 2 3 1 1 1 3 1 2 3 1 2 2 2 3 3 1 2 1 1 2 1 2 2 2 2 1 2 1 3
[1814] 2 2 2 1 1 1 2 2 1 3 1 3 3 2 2 1 2 3 2 2 1 1 3 1 1 1 1 2 1 2 3 2 1 2 1 1 2
[1851] 1 1 1 2 2 2 2 1 1 2 3 2 3 2 2 1 2 2 3 2 2 1 1 1 1 1 2 2 2 2 1 1 2 3 3 1 2
[1888] 1 1 2 3 1 2 2 2 1 1 2 2 2 2 3 1 2 1 1 1 1 2 2 3 2 3 2 2 1 2 2 3 1 2 2 1 1
[1925] 2 1 1 1 2 1 2 1 2 2 1 3 2 3 1 2 2 1 2 2 2 2 2 2 1 2 1 2 1 2 1 1 1 1 1 2 1
[1962] 3 3 2 1 1 2 1 2 2 1 1 1 2 1 1 1 1 1 1 3 2 2 1 1 1 1 2 2 2 1 2 2 1 3 1 2 2
[1999] 2 2 1 3 3 3 1 3 2 2 2 1 2 2 2 1 3 1 1 1 2 1 2 3 1 2 1 2 2 1 1 1 2 2 2 2 2
[2036] 1 1 1 3 2 1 3 1 2 2 2 3 2 1 1 1 1 1 2 1 2 1 3 1 3 2 3 2 2 2 1 1 3 2 1 2 2
[2073] 1 2 2 1 3 3 1 1 2 2 1 1 1 2 1 3 1 2 2 1 2 2 2 3 2 1 2 2 1 3 1 3 1 1 2 1 2
[2110] 3 1 2 1 2 2 2 1 1 1 1 1 1 2 1 1 1 1 1 2 1 2 2 2 1 2 1 2 2 1 2 2 1 2 2 3 2
[2147] 1 1 2 3 2 1 2 3 1 2 1 1 2 2 1 2 1 1 1 1 1 2 2 2 1 1 1 1 1 3 1 3 3 2 2 1 1
[2184] 1 3 2 1 2 1 2 1 3 1 1 1 2 3 2 2 2 2 1 2 2 2 2 1 3 2 1 2 1 2 2 2 3 2 3 1 2
[2221] 2 2 1 2 1 2 3 2 2 2 3 3 2 2 1 3 2 1 2 2 1 1 1 2 2 2 3 2 2 1 3 1 2 2 2 1 2
[2258] 2 2 1 1 3 2 2 1 2 3 3 1 1 1 2 1 1 1 3 3 1 3 2 1 2 1 2 2 2 2 1 1 2 1 2 3 1
[2295] 1 2 1 2 2 2 2 1 3 2 3 3 1 2 1 1 1 2 2 2 1 3 3 3 2 3 2 1 2 1 1 2 1 2 3 1 1
[2332] 2 1 2 2 1 3 3 3 2 1 1 2 2 2 2 2 2 3 2 2 2 1 2 1 2 3 1 2 3 1 1 1 1 2 1 1 1
[2369] 2 2 1 1 1 3 1 2 2 3 1 3 1 1 3 2 2 3 1 1 2 1 1 3 3 2 3 1 1 2 2 2 3 1 2 1 1
[2406] 2 1 1 2 2 3 2 2 2 3 2 2 1 1 2 2 1 2 2 1 2 2 1 3 1 1 2 2 1 3 1 2 2 2 2 1 2
[2443] 1 1 2 2 1 3 2 1 1 3 1 1 2 3 2 1 2 2 1 1 2 1 2 2 1 1 3 1 1 1 2 2 2 2 2 3 3
[2480] 2 1 2 1 1 2 2 2 1 1 1 3 1 1 1 1 2 2 1 3 2 1 1 3 1 2 1 2 2 1 1 2 2 2 2 1 1
[2517] 1 2 2 2 2 2 3 3 2 1 1 1 2 1 2 2 2 2 2 2 3 2 3 3 2 3 2 2 2 1 2 1 2 1 1 2 1
[2554] 1 2 2 1 1 3 1 1 3 2 1 1 1 1 3 2 3 3 1 2 2 1 1 2 2 3 2 3 2 1 1 2 2 3 2 2 2
[2591] 3 1 2 3 2 3 1 3 2 1 1 2 3 2 1 2 2 2 2 3 1 1 2 3 3 2 2 1 1 1 2 1 2 3 2 2 2
[2628] 2 1 1 2 2 3 2 1 1 3 2 1 2 1 3 2 2 1 1 3 2 3 1 2 1 1 1 2 2 3 1 1 2 1 2 2 3
[2665] 3 3 1 2 2 1 2 2 1 2 1 1 1 1 3 2 2 1 2 2 1 1 2 2 2 3 2 3 2 2 1 1 2 2 3 1 1
[2702] 2 3 2 3 2 3 1 2 1 2 1 1 3 3 1 1 1 2 2 2 3 2 2 2 2 2 1 1 2 2 1 1 2 2 1 2 2
[2739] 3 1 2 1 1 1 2 2 2 1 1 2 2 2 2 3 1 1 1 2 2 2 1 1 2 2 1 2 3 2 3 3 1 3 3 3 2
[2776] 3 3 2 2 3 1 2 2 2 1 1 1 3 1 1 2 3 2 3 2 2 2 2 2 2 2 2 3 1 2 3 2 1 2 3 2 1
[2813] 3 2 2 1 2 3 1 2 1 2 1 1 2 1 1 1 1 1 2 2 3 3 1 1 1 1 3 2 1 3 1 1 2 2 2 1 1
[2850] 3 3 2 3 1 2 1 1 3 3 3 1 1 2 1 2 2 3 3 2 2 3 2 2 2 2 2 1 2 3 1 2 2 2 1 1 2
[2887] 2 1 2 2 2 2 2 1 2 2 2 1 3 1 2 2 2 2 3 2 1 1 2 1 3 1 3 1 1 3 2 2 1 1 2 2 2
[2924] 2 2 2 2 3 3 1 1 1 3 2 3 2 2 2 1 2 1 2 3 2 2 1 3 3 1 1 1 1 3 2 1 2 3 2 2 1
[2961] 2 2 1 1 1 2 1 1 3 1 2 2 1 2 1 2 1 3 2 2 1 2 3 2 2 2 3 1 3 3 2 1 2 2 3 3 3
[2998] 2 2 2 3 1 1 2 1 3 3 2 2 1 1 2 3 3 2 1 2 2 2 2 3 2 2 2 2 2 1 2 2 3 2 2 2 3
[3035] 2 2 1 1 1 3 1 1 2 2 1 2 1 3 2 2 2 2 2 1 3 1 2 1 1 2 3 1 1 2 2 3 2 1 3 2 1
[3072] 1 2 1 1 1 1 2 1 2 2 3 1 1 1 2 2 2 2 1 2 2 1 1 2 2 3 2 1 2 2 2 2 2 2 1 1 1
[3109] 1 2 1 1 2 1 1 1 1 1 2 2 1 1 1 1 3 2 2 1 1 1 1 2 1 1 2 1 2 2 3 2 1 2 1 2 1
[3146] 1 1 1 2 2 3 1 1 1 1 2 2 1 2 2 3 3 1 2 1 3 1 2 3 2 1 1 1 1 2 1 2 1 2 1 1 1
[3183] 2 1 3 3 1 1 3 1 2 1 2 3 1 2 1 1 1 1 2 1 1 3 2 2 1 2 2 3 1 3 2 1 2 1 1 2 3
[3220] 1 2 1 2 1 2 2 1 1 1 1 2 2 2 2 1 2 1 1 2 1 2 1 1 2 2 3 1 3 1 2 1 2 2 2 2 1
[3257] 1 2 1 1 2 2 3 1 2 1 2 2 3 2 3 3 1 3 1 1 3 1 3 1 2 2 3 2 1 1 2 1 2 2 2 2 1
[3294] 1 1 3 3 2 2 2 2 2 1 2 1 1 2 1 3 2 2 2 3 2 1 2 2 2 3 2 2 3 3 2 2 2 3 3 1 3
[3331] 1 2 1 1 1 3 2 2 1 3 1 1 1 2 3 1 1 2 2 3 1 3 1 2 1 2 3 1 2 1 3 1 1 2 1 1 3
[3368] 2 3 1 1 1 3 3 2 1 3 1 1 2 2 3 2 3 3 1 1 2 3 1 1 2 1 1 1 1 2 3 1 2 2 1 2 2
[3405] 1 2 1 2 2 1 1 2 3 2 2 1 2 1 2 1 2 3 2 2 1 1 1 3 2 1 2 2 2 1 2 1 2 2 1 1 1
[3442] 2 1 1 2 1 3 2 1 3 1 2 2 1 2 1 3 2 3 1 2 2 2 1 3 2 1 2 1 1 2 2 2 2 2 2 3 3
[3479] 1 1 2 2 2 3 1 1 1 1 1 3 1 2 1 2 1 1 1 2 1 2 2 2 1 1 2 2 1 2 3 1 1 1 2 1 1
[3516] 2 2 1 3 1 2 1 2 1 2 2 2 1 1 1 2 1 1 2 1 2 2 2 1 2 1 3 1 1 1 3 2 2 1 1 1 2
[3553] 2 3 1 1 1 2 2 2 1 1 3 2 1 1 2 2 3 1 2 3 1 3 2 2 2 1 3 1 1 1 2 1 2 1 3 1 2
[3590] 1 1 2 1 2 1 1 3 2 1 1 1 1 2 2 2 2 1 1 3 3 1 2 3 3 3 3 1 1 1 1 2 2 2 1 2 2
[3627] 1 1 3 2 3 2 3 2 2 2 1 1 2 3 2 2 2 2 2 1 3 1 1 2 2 3 1 2 2 3 1 2 2 1 1 3 1
[3664] 1 2 1 2 1 1 1 1 2 1 1 1 2 2 2 2 3 1 3 1 2 2 2 2 1 3 1 2 1 2 2 1 2 1 1 1 2
[3701] 3 2 3 2 3 1 2 1 1 1 2 1 2 2 2 1 2 2 1 1 2 1 1 2 1 1 1 2 1 1 1 1 1 2 1 3 2
[3738] 1 2 1 3 2 3 1 2 3 2 1 1 1 2 1 2 1 2 2 1 3 3 1 2 2 2 2 2 1 2 1 1 3 1 1 3 2
[3775] 2 1 1 2 2 2 3 2 1 2 3 2 2 1 1 2 2 1 2 3 2 2 1 2 2 1 2 1 1 1 3 1 1 3 3 1 2
[3812] 2 1 2 1 1 2 2 1 2 1 3 3 2 1 1 3 3 1 2 1 3 2 1 2 1 1 1 1 1 2 1 2 3 2 1 1 1
[3849] 2 1 2 3 1 2 1 3 2 2 1 2 1 2 2 1 2 2 2 1 1 1 1 1 2 2 1 1 1 1 1 1 2 2 1 1 1
[3886] 1 2 3 1 1 3 2 2 1 1 1 3 2 3 2 2 1 2 2 1 2 2 1 1 1 1 2 1 1 1 1 2 1 2 3 1 1
[3923] 1 1 3 1 2 2 2 1 3 2 1 1 1 2 1 1 2 2 1 2 1 3 2 3 1 3 3 1 1 1 2 2 3 2 2 1 2
[3960] 1 2 2 1 2 1 1 1 1 1 1 2 1 3 2 2 2 2 2 1 1 2 2 1 1 1 2 3 2 2 2 2 2 3 1 1 2
[3997] 2 2 1 2 2 2 2 2 2 2 2 1 3 3 1 2 1 2 2 1 3 1 3 2 3 1 1 2 1 2 3 2 2 1 2 1 2
[4034] 2 1 3 2 2 2 3 2 1 3 2 1 2 1 3 1 2 2 3 1 1 2 1 3 2 1 2 3 1 1 2 2 1 2 2 2 2
[4071] 2 1 1 2 2 1 2 1 1 2 1 2 1 2 2 1 2 3 1 1 1 1 3 3 2 1 1 2 1 2 1 2 1 1 1 3 2
[4108] 2 2 1 2 1 1 1 2 1 3 1 1 1 2 2 3 2 2 2 2 1 2 1 2 2 2 3 1 3 1 1 2 1 2 1 2 2
[4145] 3 2 2 3 2 1 2 3 1 3 3 2 3 1 2 1 1 1 2 2 1 2 2 3 2 3 1 2 2 1 1 3 1 2 2 1 1
[4182] 2 2 3 2 1 1 1 1 1 3 1 2 2 2 1 2 2 2 1 1 2 1 2 1 3 2 1 2 1 1 1 2 2 2 2 3 2
[4219] 2 2 2 2 2 2 2 3 1 3 1 2 2 2 1 3 2 1 3 2 3 2 3 1 2 2 2 3 2 2 3 1 2 1 2 2 2
[4256] 2 3 1 3 3 2 2 2 2 2 1 1 3 2 2 1 3 2 3 1 2 3 3 2 1 3 1 3 2 3 3 2 2 1 2 2 2
[4293] 3 2 3 2 1 1 1 1 2 2 2 3 2 1 1 3 3 3 3 1 1 2 1 2 1 2 2 2 3 1 1 2 2 2 1 1 3
[4330] 2 2 3 2 2 2 1 1 3 2 1 1 1 1 1 3 3 2 2 2 1 2 1 1 2 3 1 1 3 1 1 2 2 1 1 2 1
[4367] 2 3 1 2 1 2 1 3 1 1 1 3 1 1 1 1 2 1 1 2 2 1 3 2 2 3 2 1 2 2 1 2 2 2 1 2 2
[4404] 2 1 2 2 1 2 1 3 1 1 1 3 2 2 2 2 3 3 2 3 2 1 3 3 1 2 3 1 1 2 2 1 2 2 2 1 1
[4441] 1 2 2 1 1 2 2 2 2 1 1 2 2 1 2 2 1 2 2 3 2 2 1 1 2 1 1 2 2 1 3 2 2 1 2 1 2
[4478] 1 1 3 3 1 1 2 3 1 1 1 1 1 1 1 2 2 3 1 2 3 2 2 2 2 2 2 1 1 1 1 1 2 2 1 2 1
[4515] 1 1 2 1 2 2 1 2 1 1 2 3 1 1 2 1 1 1 3 2 1 1 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2
[4552] 1 2 2 1 1 2 1 1 2 1 2 2 3 2 3 3 2 1 2 3 2 1 2 3 2 2 2 1 2 3 1 1 2 1 3 2 1
[4589] 1 1 2 1 1 3 2 1 1 1 2 2 2 1 2 3 1 2 3 2 1 2 1 1 1 2 2 1 2 1 1 2 2 2 2 2 1
[4626] 3 2 1 1 3 2 3 2 1 1 1 1 1 1 2 1 1 3 1 2 1 3 2 1 3 2 2 3 3 1 3 2 1 1 3 2 3
[4663] 2 1 2 1 1 2 1 1 3 3 3 2 1 1 1 1 1 3 3 1 2 2 2 2 2 2 1 2 2 1 2 2 1 1 2 2 3
[4700] 2 3 1 1 2 2 2 2 2 2 1 1 2 1 1 1 2 2 1 1 3 1 2 1 3 1 1 1 1 2 1 2 1 1 3 2 1
[4737] 2 2 2 3 3 2 2 2 1 2 1 2 1 1 2 3 3 2 2 3 1 1 2 2 2 2 1 2 2 2 1 1 1 1 1 1 1
[4774] 2 2 1 2 1 2 1 2 1 3 1 2 1 1 2 1 2 3 2 1 2 2 2 1 1 1 1 1 1 1 2 2 3 2 1 1 1
[4811] 2 3 3 2 2 2 2 3 1 1 1 1 3 3 1 2 2 3 2 1 1 1 1 2 2 2 2 1 2 1 1 2 3 2 1 3 3
[4848] 1 2 2 2 2 1 1 3 2 2 1 2 3 2 2 1 2 1 2 1 1 2 2 2 2 1 3 1 2 3 2 1 1 2 2 1 3
[4885] 2 2 2 1 3 2 2 2 1 1 2 3 1 1 2 2 1 1 1 1 2 2 2 1 3 1 2 3 2 1 2 2 1 1 2 1 1
[4922] 1 1 1 1 2 1 1 2 2 2 2 2 3 1 2 1 3 2 2 2 1 2 1 2 1 2 1 1 1 2 2 1 2 1 2 1 1
[4959] 2 2 2 1 3 3 3 1 1 1 2 1 1 2 2 1 2 1 1 1 2 2 3 3 1 2 1 2 1 2 1 1 2 2 1 3 2
[4996] 1 1 2 2 1
kmeans2$centers 
         age experience     income     cc_avg
1 -0.8912604 -0.8924191 -0.2930636 -0.3277429
2  0.8904403  0.8852371 -0.3195201 -0.3326932
3 -0.1520638 -0.1350880  1.6031701  1.7263626
kmeans2$size    
[1] 2029 2168  803
kmeans2$iter 
[1] 3
d2 <- dist(bank_2_scale)

sil_kmeans2 <- silhouette(kmeans2$cluster, d2)
summary(sil_kmeans2)
Silhouette of 5000 units in 3 clusters from silhouette.default(x = kmeans2$cluster, dist = d2) :
 Cluster sizes and average silhouette widths:
     2029      2168       803 
0.4344352 0.4359975 0.2505801 
Individual silhouette widths:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.1365  0.2776  0.4538  0.4056  0.5614  0.6311 

Findings

The data needs to be scaled as there are many different ranges of numbers within the variables. The quality of the clusters is weak and could be artificial.

bank_clus <- bank %>%
  mutate(clusters2 = kmeans2$cluster) %>%
  mutate(cluster = case_when(clusters2 == 1 ~ 'C1',
                             clusters2 == 2 ~ 'C2',
                             clusters2 == 3 ~ 'C3'))

bank_clus_means <- bank_clus %>%
  group_by(cluster) %>%
  summarise(num_custs = n(),
            age_m = mean(age),
            experience_m = mean(experience),
            income_m = mean(income),
            cc_avg_m = mean(cc_avg),
            personal_loan_m = mean(personal_loan))

knitr::kable(select(bank_clus_means, cluster, num_custs, age_m, experience_m, income_m, cc_avg_m), 
             digits = c(0,0,0,0,0,0),
             col.names = c("Cluster", "# Customers", "Avg. Age", "Avg. Experience","Avg. Income", "Avg. Credit Card Spend"), 
             caption = "Number of Customers and Average Age, Experience, Income and Credit Card Spend by Cluster")%>%
 kable_styling(full_width = F) %>%
  row_spec(1, color = "white", background = "royalblue") %>%
  row_spec(2, color = "white", background = "#0041C2") %>%
  row_spec(3, color = "white", background = "blue")
Number of Customers and Average Age, Experience, Income and Credit Card Spend by Cluster
Cluster # Customers Avg. Age Avg. Experience Avg. Income Avg. Credit Card Spend
C1 2029 35 10 60 1
C2 2168 56 30 59 1
C3 803 44 19 148 5

Findings

Cluster 1 is a large cluster of 2029 customers with the average age of 35, they have been working for an average of 10 years and make around €60 000 a year. They spend on average €1000 a month on their credit card.

Cluster 2 is also a large cluster made up of 2168 customers they are a bit older, on average 56 years old and they have been working for 30 years. They make on average €59 000 a year and spend €1000 a moth on their credit card.

Cluster 3 is a much smaller cluster of 803 customers, they are middle aged (44 y.o. avg.) have been working for an average of 19 years and make €148000 a year, they spend around €5000 on their credit card every month.

bank_clus$personal_loan <- factor(bank_clus$personal_loan, 
                               levels = c(1, 0))

ggplot(bank_clus, aes(x = personal_loan, group = cluster)) + 
  geom_bar(aes(y = after_stat(prop), 
               fill = factor(after_stat(x))), 
           stat = "count", 
           show.legend = FALSE) + 
  facet_grid(~ cluster) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Percentage of Customers") + 
  xlab("Taken out a Loan in the Past") +
  ggtitle("Loan Taken out in the Past by Cluster")

Findings

Segment 3 has the highest percentage of customers who have taken out a loan in the past, they are most likely to take out a loan in the future.