Introduction

In this analysis, we cluster credit card customers to categorize the dataset based on distinct characteristics or patterns. These segments can provide valuable insights for formulating targeted marketing strategies: since each segment exhibits a different purchase capacity, understanding these variations allows for tailored approaches that maximize effectiveness and meet the unique needs of each customer group.

Data Description

CUST_ID | Identification of credit card holder (Categorical)
BALANCE | Balance amount left in the account to make purchases
BALANCE_FREQUENCY | How frequently the balance is updated, score between 0 and 1 (1 = frequently updated, 0 = not frequently updated)
PURCHASES | Amount of purchases made from the account
ONEOFF_PURCHASES | Maximum purchase amount done in one go
INSTALLMENTS_PURCHASES | Amount of purchases done in installments
CASH_ADVANCE | Cash in advance given to the user
PURCHASES_FREQUENCY | How frequently purchases are being made, score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased)
ONEOFF_PURCHASES_FREQUENCY | How frequently purchases are happening in one go (1 = frequent, 0 = not frequent)
PURCHASES_INSTALLMENTS_FREQUENCY | How frequently purchases in installments are being done (1 = frequent, 0 = not frequent)
CASH_ADVANCE_FREQUENCY | How frequently the cash advance is being paid
CASH_ADVANCE_TRX | Number of transactions made with "Cash Advance"
PURCHASES_TRX | Number of purchase transactions made
CREDIT_LIMIT | Limit of the credit card for the user
PAYMENTS | Amount of payment done by the user
MINIMUM_PAYMENTS | Minimum amount of payments made by the user
PRC_FULL_PAYMENT | Percent of full payment paid by the user
TENURE | Tenure of credit card service for the user

load the packages

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(stats)
library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(e1071)
library(caret)
## Loading required package: lattice
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(cluster)

load the data

data <- read.csv("/Users/zihualai/Desktop/CC GENERAL.csv")

inspect the data and clean it

head(data)
##   CUST_ID    BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## 1  C10001   40.90075          0.818182     95.40             0.00
## 2  C10002 3202.46742          0.909091      0.00             0.00
## 3  C10003 2495.14886          1.000000    773.17           773.17
## 4  C10004 1666.67054          0.636364   1499.00          1499.00
## 5  C10005  817.71434          1.000000     16.00            16.00
## 6  C10006 1809.82875          1.000000   1333.28             0.00
##   INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## 1                  95.40        0.000            0.166667
## 2                   0.00     6442.945            0.000000
## 3                   0.00        0.000            1.000000
## 4                   0.00      205.788            0.083333
## 5                   0.00        0.000            0.083333
## 6                1333.28        0.000            0.666667
##   ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## 1                   0.000000                         0.083333
## 2                   0.000000                         0.000000
## 3                   1.000000                         0.000000
## 4                   0.083333                         0.000000
## 5                   0.083333                         0.000000
## 6                   0.000000                         0.583333
##   CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT  PAYMENTS
## 1               0.000000                0             2         1000  201.8021
## 2               0.250000                4             0         7000 4103.0326
## 3               0.000000                0            12         7500  622.0667
## 4               0.083333                1             1         7500    0.0000
## 5               0.000000                0             1         1200  678.3348
## 6               0.000000                0             8         1800 1400.0578
##   MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## 1         139.5098         0.000000     12
## 2        1072.3402         0.222222     12
## 3         627.2848         0.000000     12
## 4               NA         0.000000     12
## 5         244.7912         0.000000     12
## 6        2407.2460         0.000000     12
summary(data)
##    CUST_ID             BALANCE        BALANCE_FREQUENCY   PURCHASES       
##  Length:8950        Min.   :    0.0   Min.   :0.0000    Min.   :    0.00  
##  Class :character   1st Qu.:  128.3   1st Qu.:0.8889    1st Qu.:   39.63  
##  Mode  :character   Median :  873.4   Median :1.0000    Median :  361.28  
##                     Mean   : 1564.5   Mean   :0.8773    Mean   : 1003.20  
##                     3rd Qu.: 2054.1   3rd Qu.:1.0000    3rd Qu.: 1110.13  
##                     Max.   :19043.1   Max.   :1.0000    Max.   :49039.57  
##                                                                           
##  ONEOFF_PURCHASES  INSTALLMENTS_PURCHASES  CASH_ADVANCE     PURCHASES_FREQUENCY
##  Min.   :    0.0   Min.   :    0.0        Min.   :    0.0   Min.   :0.00000    
##  1st Qu.:    0.0   1st Qu.:    0.0        1st Qu.:    0.0   1st Qu.:0.08333    
##  Median :   38.0   Median :   89.0        Median :    0.0   Median :0.50000    
##  Mean   :  592.4   Mean   :  411.1        Mean   :  978.9   Mean   :0.49035    
##  3rd Qu.:  577.4   3rd Qu.:  468.6        3rd Qu.: 1113.8   3rd Qu.:0.91667    
##  Max.   :40761.2   Max.   :22500.0        Max.   :47137.2   Max.   :1.00000    
##                                                                                
##  ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
##  Min.   :0.00000            Min.   :0.0000                  
##  1st Qu.:0.00000            1st Qu.:0.0000                  
##  Median :0.08333            Median :0.1667                  
##  Mean   :0.20246            Mean   :0.3644                  
##  3rd Qu.:0.30000            3rd Qu.:0.7500                  
##  Max.   :1.00000            Max.   :1.0000                  
##                                                             
##  CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX  PURCHASES_TRX     CREDIT_LIMIT  
##  Min.   :0.0000         Min.   :  0.000   Min.   :  0.00   Min.   :   50  
##  1st Qu.:0.0000         1st Qu.:  0.000   1st Qu.:  1.00   1st Qu.: 1600  
##  Median :0.0000         Median :  0.000   Median :  7.00   Median : 3000  
##  Mean   :0.1351         Mean   :  3.249   Mean   : 14.71   Mean   : 4494  
##  3rd Qu.:0.2222         3rd Qu.:  4.000   3rd Qu.: 17.00   3rd Qu.: 6500  
##  Max.   :1.5000         Max.   :123.000   Max.   :358.00   Max.   :30000  
##                                                            NA's   :1      
##     PAYMENTS       MINIMUM_PAYMENTS   PRC_FULL_PAYMENT     TENURE     
##  Min.   :    0.0   Min.   :    0.02   Min.   :0.0000   Min.   : 6.00  
##  1st Qu.:  383.3   1st Qu.:  169.12   1st Qu.:0.0000   1st Qu.:12.00  
##  Median :  856.9   Median :  312.34   Median :0.0000   Median :12.00  
##  Mean   : 1733.1   Mean   :  864.21   Mean   :0.1537   Mean   :11.52  
##  3rd Qu.: 1901.1   3rd Qu.:  825.49   3rd Qu.:0.1429   3rd Qu.:12.00  
##  Max.   :50721.5   Max.   :76406.21   Max.   :1.0000   Max.   :12.00  
##                    NA's   :313

1. CREDIT_LIMIT and MINIMUM_PAYMENTS contain missing values, and with 313 NAs in MINIMUM_PAYMENTS the missingness is too large to simply ignore.
2. Apart from CUST_ID, all features are numeric, so the data is suitable for clustering.
3. The mean and median differ noticeably for most features, which points to skewed distributions.
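To confirm point 1, a quick count of the missing values per column (a minimal sketch):

colSums(is.na(data))  # expect 1 NA in CREDIT_LIMIT and 313 in MINIMUM_PAYMENTS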

impute the missing values

First, deal with CREDIT_LIMIT

hist(data$CREDIT_LIMIT)

Due to the right-skewed distribution of the data, it is better to fill the missing values with the median instead of the mean.
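The skew can also be quantified with skewness() from e1071 (already loaded); values well above 0 indicate right skew. A minimal sketch:

skewness(data$CREDIT_LIMIT, na.rm = TRUE)  # a clearly positive value confirms right skew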

fill the NAs in CREDIT_LIMIT with the median

data <- data %>%
  mutate(CREDIT_LIMIT 
         = replace(CREDIT_LIMIT,
                   is.na(CREDIT_LIMIT),
                   median(CREDIT_LIMIT, na.rm = TRUE)))

Then deal with MINIMUM_PAYMENTS

First, plot the histogram to inspect the distribution

hist(data$MINIMUM_PAYMENTS)

As the data is right-skewed, it's better to fill the missing values with the median instead of the mean.

fill in the NA with median values

data <- data %>%
  mutate(MINIMUM_PAYMENTS
         = replace(MINIMUM_PAYMENTS,
                   is.na(MINIMUM_PAYMENTS),
                   median(MINIMUM_PAYMENTS, na.rm = TRUE)))

Then check the NA values of the dataset again, drop the CUST_ID column, and check the structure of the data
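A one-line re-check (a sketch) that the imputation worked:

sum(is.na(data))  # should now be 0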

data <- data[,-1]
str(data)
## 'data.frame':    8950 obs. of  17 variables:
##  $ BALANCE                         : num  40.9 3202.5 2495.1 1666.7 817.7 ...
##  $ BALANCE_FREQUENCY               : num  0.818 0.909 1 0.636 1 ...
##  $ PURCHASES                       : num  95.4 0 773.2 1499 16 ...
##  $ ONEOFF_PURCHASES                : num  0 0 773 1499 16 ...
##  $ INSTALLMENTS_PURCHASES          : num  95.4 0 0 0 0 ...
##  $ CASH_ADVANCE                    : num  0 6443 0 206 0 ...
##  $ PURCHASES_FREQUENCY             : num  0.1667 0 1 0.0833 0.0833 ...
##  $ ONEOFF_PURCHASES_FREQUENCY      : num  0 0 1 0.0833 0.0833 ...
##  $ PURCHASES_INSTALLMENTS_FREQUENCY: num  0.0833 0 0 0 0 ...
##  $ CASH_ADVANCE_FREQUENCY          : num  0 0.25 0 0.0833 0 ...
##  $ CASH_ADVANCE_TRX                : int  0 4 0 1 0 0 0 0 0 0 ...
##  $ PURCHASES_TRX                   : int  2 0 12 1 1 8 64 12 5 3 ...
##  $ CREDIT_LIMIT                    : num  1000 7000 7500 7500 1200 1800 13500 2300 7000 11000 ...
##  $ PAYMENTS                        : num  202 4103 622 0 678 ...
##  $ MINIMUM_PAYMENTS                : num  140 1072 627 312 245 ...
##  $ PRC_FULL_PAYMENT                : num  0 0.222 0 0 0 ...
##  $ TENURE                          : int  12 12 12 12 12 12 12 12 12 12 ...

Now that the data is clean, we can move on to data visualisation

data visualisation

long_data <- gather(data, key = "variable", value = "value")

ggplot(long_data, aes(x = value, fill = variable)) +
    geom_density(alpha = 0.7) +
    facet_wrap(~ variable, scales = "free") +
    labs(title = "Density Plots of Variables", x = "Value", y = "Density") +
    theme_minimal()

Probably a small number of customers make a significant number of purchases while the majority make far fewer, so most of the distributions appear right-skewed.

But skewed data distorts the distance measures that k-means relies on, and it can lead to centroids that are not representative of the majority of data points in a cluster, biasing the results. So here we will deal with the skewness first.
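A tiny illustration of the centroid bias (hypothetical numbers): a single extreme spender drags the mean, which is exactly what a k-means centroid is, far from the typical customer:

x <- c(1, 2, 3, 2, 1000)  # four typical values and one extreme value
mean(x)    # 201.6 -- pulled toward the outlier, as a centroid would be
median(x)  # 2 -- representative of the majority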

deal with the skewness

Employ a logarithmic transformation to deal with the right skewness

First, select the right-skewed variables

right_skewed_variables <- data[c('BALANCE_FREQUENCY', 'ONEOFF_PURCHASES',
                                 'INSTALLMENTS_PURCHASES', 'CASH_ADVANCE',
                                 'ONEOFF_PURCHASES_FREQUENCY',
                                 'PURCHASES_INSTALLMENTS_FREQUENCY',
                                 'CASH_ADVANCE_TRX', 'PURCHASES_TRX',
                                 'CREDIT_LIMIT', 'PAYMENTS',
                                 'MINIMUM_PAYMENTS', 'PRC_FULL_PAYMENT')]

use the log transformation and check again

# log(x + 1) handles the many zero values, since log(0) is undefined
transformed_data <- as.data.frame(lapply(right_skewed_variables, function(x) log(x + 1)))

long_data <- gather(transformed_data, key = "variable", value = "value")

ggplot(long_data, aes(x = value)) +
    geom_density(alpha = 0.7) +
    facet_wrap(~ variable, scales = "free") +
    labs(title = "Density Plots of Right Skewed Variables", x = "Transformed Value", y = "Density") +
    theme_minimal()

The results are not ideal, so let's also try a square-root transformation for comparison

transformed_data1 <- as.data.frame(lapply(right_skewed_variables, sqrt))

long_data1 <- gather(transformed_data1, key = "variable", value = "value")

ggplot(long_data1, aes(x = value)) +
    geom_density(alpha = 0.7) +
    facet_wrap(~ variable, scales = "free") +
    labs(title = "Density Plots of Right Skewed Variables", x = "Transformed Value1", y = "Density") +
    theme_minimal()

Comparing the two sets of density plots, the log transformation is still the better choice
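This can also be checked numerically; a sketch comparing skewness before and after each transform (e1071 is already loaded):

sapply(right_skewed_variables, skewness)   # original skew
sapply(transformed_data, skewness)         # after log transform
sapply(transformed_data1, skewness)        # after sqrt transform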

correlation

Use a heatmap to decide whether we need to reduce the dimensions: if some features are highly correlated, we should do dimension reduction.

cor_matrix <- cor(data, use = "complete.obs") 

# Melt the correlation matrix for ggplot2
cor_melted <- melt(cor_matrix)

# Plotting the heatmap with text labels
ggplot(cor_melted, aes(Var1, Var2, fill = value)) + 
  geom_tile() +
  geom_text(aes(label = sprintf("%.2f", value)), vjust = 1) +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0) +
  theme_minimal() +
  labs(x = "Variables", y = "Variables", fill = "Correlation") +
  coord_fixed()

According to the heatmap, several variables are highly correlated (for example PURCHASES with ONEOFF_PURCHASES), so it is necessary to do dimension reduction.
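To make that concrete, a sketch that lists the variable pairs above a threshold (0.7 here is an assumed cutoff, not from the original analysis):

high_cor <- which(abs(cor_matrix) > 0.7 & upper.tri(cor_matrix), arr.ind = TRUE)
data.frame(var1 = rownames(cor_matrix)[high_cor[, 1]],
           var2 = colnames(cor_matrix)[high_cor[, 2]],
           r    = cor_matrix[high_cor])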

PCA

check the dimension of the dataset

dim(data)
## [1] 8950   17

normalization of the data

preproc1 <- preProcess(data, method=c("center", "scale"))
data_s<- predict(preproc1, data)
data_s <- as.data.frame(data_s)
summary(data_s)
##     BALANCE        BALANCE_FREQUENCY    PURCHASES        ONEOFF_PURCHASES   
##  Min.   :-0.7516   Min.   :-3.70306   Min.   :-0.46953   Min.   :-0.356914  
##  1st Qu.:-0.6900   1st Qu.: 0.04904   1st Qu.:-0.45098   1st Qu.:-0.356914  
##  Median :-0.3320   Median : 0.51806   Median :-0.30044   Median :-0.334021  
##  Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.000000  
##  3rd Qu.: 0.2352   3rd Qu.: 0.51806   3rd Qu.: 0.05004   3rd Qu.:-0.009056  
##  Max.   : 8.3970   Max.   : 0.51806   Max.   :22.48225   Max.   :24.199714  
##  INSTALLMENTS_PURCHASES  CASH_ADVANCE      PURCHASES_FREQUENCY
##  Min.   :-0.45455       Min.   :-0.46676   Min.   :-1.22169   
##  1st Qu.:-0.45455       1st Qu.:-0.46676   1st Qu.:-1.01407   
##  Median :-0.35614       Median :-0.46676   Median : 0.02404   
##  Mean   : 0.00000       Mean   : 0.00000   Mean   : 0.00000   
##  3rd Qu.: 0.06366       3rd Qu.: 0.06435   3rd Qu.: 1.06215   
##  Max.   :24.42552       Max.   :22.00989   Max.   : 1.26977   
##  ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
##  Min.   :-0.6786            Min.   :-0.9169                 
##  1st Qu.:-0.6786            1st Qu.:-0.9169                 
##  Median :-0.3993            Median :-0.4976                 
##  Mean   : 0.0000            Mean   : 0.0000                 
##  3rd Qu.: 0.3270            3rd Qu.: 0.9701                 
##  Max.   : 2.6733            Max.   : 1.5991                 
##  CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX  PURCHASES_TRX       CREDIT_LIMIT    
##  Min.   :-0.6753        Min.   :-0.4760   Min.   :-0.59176   Min.   :-1.2214  
##  1st Qu.:-0.6753        1st Qu.:-0.4760   1st Qu.:-0.55153   1st Qu.:-0.7954  
##  Median :-0.6753        Median :-0.4760   Median :-0.31016   Median :-0.4107  
##  Mean   : 0.0000        Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000  
##  3rd Qu.: 0.4351        3rd Qu.: 0.1101   3rd Qu.: 0.09213   3rd Qu.: 0.5512  
##  Max.   : 6.8201        Max.   :17.5469   Max.   :13.81024   Max.   : 7.0097  
##     PAYMENTS        MINIMUM_PAYMENTS   PRC_FULL_PAYMENT       TENURE       
##  Min.   :-0.59866   Min.   :-0.36218   Min.   :-0.52552   Min.   :-4.1225  
##  1st Qu.:-0.46626   1st Qu.:-0.28895   1st Qu.:-0.52552   1st Qu.: 0.3607  
##  Median :-0.30267   Median :-0.22829   Median :-0.52552   Median : 0.3607  
##  Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.0000  
##  3rd Qu.: 0.05803   3rd Qu.:-0.02409   3rd Qu.:-0.03712   3rd Qu.: 0.3607  
##  Max.   :16.92133   Max.   :32.39092   Max.   : 2.89329   Max.   : 0.3607
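For reference, the same centering and scaling could be done in one line of base R (a sketch; preProcess from caret is kept above since caret is already loaded):

data_s_base <- as.data.frame(scale(data))  # center to mean 0, scale to sd 1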

eigenvalues on the basis of covariance

data_s.cov<-cov(data_s)
data_s.eigen<-eigen(data_s.cov)
data_s.eigen$values
##  [1] 4.640604e+00 3.453343e+00 1.498241e+00 1.271519e+00 1.058204e+00
##  [6] 9.758096e-01 8.301823e-01 7.308646e-01 6.457040e-01 5.236004e-01
## [11] 4.032827e-01 3.014728e-01 2.427357e-01 2.068763e-01 1.721534e-01
## [16] 4.539592e-02 1.164886e-05
head(data_s.eigen$vectors)
##             [,1]        [,2]        [,3]        [,4]        [,5]        [,6]
## [1,] -0.10173497 -0.40381873 -0.16163198  0.27931641 -0.05102225  0.04275156
## [2,] -0.12049041 -0.13134414 -0.42299545  0.15576173  0.47683620  0.02204495
## [3,] -0.41156240 -0.04002462  0.25747423  0.04162083 -0.02788702  0.18045255
## [4,] -0.34633416 -0.06191772  0.39089919  0.08507909  0.16083881  0.17998375
## [5,] -0.33676092  0.01917539 -0.10926846 -0.05791521 -0.36140247  0.09600344
## [6,]  0.02280988 -0.43902651 -0.02312809 -0.25659565 -0.09756007 -0.14501811
##            [,7]        [,8]        [,9]       [,10]      [,11]       [,12]
## [1,]  0.2541816 -0.17029351  0.12851376 -0.03117406 -0.1484861  0.48066944
## [2,] -0.1008009  0.28349585  0.61625425  0.08158054  0.1189014 -0.09139969
## [3,] -0.2002205  0.02605464  0.10477733 -0.05116270 -0.2007868 -0.07487246
## [4,] -0.1206849  0.12697358  0.01400471  0.17546751 -0.4481486  0.05238789
## [5,] -0.2517713 -0.17169631  0.22202301 -0.44326909  0.3484815 -0.27316781
## [6,]  0.0487302 -0.02241577 -0.04942654  0.35921657  0.3543696  0.18362311
##             [,13]       [,14]       [,15]        [,16]         [,17]
## [1,]  0.524791875 -0.17453771  0.21891187  0.005249742 -4.216029e-05
## [2,] -0.184842432  0.04116932  0.04192939  0.010891185 -7.979220e-06
## [3,]  0.114177416  0.22513789 -0.05169797 -0.002159184  7.489748e-01
## [4,] -0.008510644  0.22524436 -0.05634003  0.004666332 -5.819431e-01
## [5,]  0.285345707  0.11843846 -0.01874770 -0.014555735 -3.168263e-01
## [6,]  0.023281010  0.60945077 -0.21008717  0.007087134 -2.220327e-05
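Each eigenvalue divided by the total (17, the number of standardized variables) gives the proportion of variance explained, which should match the PCA summary below; a quick sketch:

data_s.eigen$values / sum(data_s.eigen$values)          # proportion per component
cumsum(data_s.eigen$values) / sum(data_s.eigen$values)  # cumulative proportion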

use PCA

xxx <- data_s

xxx.pca1<-prcomp(xxx, center=FALSE, scale.=FALSE) # stats::
summary(xxx.pca1)
## Importance of components:
##                          PC1    PC2     PC3    PC4     PC5    PC6     PC7
## Standard deviation     2.154 1.8583 1.22403 1.1276 1.02869 0.9878 0.91114
## Proportion of Variance 0.273 0.2031 0.08813 0.0748 0.06225 0.0574 0.04883
## Cumulative Proportion  0.273 0.4761 0.56425 0.6390 0.70129 0.7587 0.80752
##                            PC8     PC9   PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.85491 0.80356 0.7236 0.63505 0.54907 0.49268 0.45484
## Proportion of Variance 0.04299 0.03798 0.0308 0.02372 0.01773 0.01428 0.01217
## Cumulative Proportion  0.85052 0.88850 0.9193 0.94302 0.96075 0.97503 0.98720
##                           PC15    PC16     PC17
## Standard deviation     0.41491 0.21306 0.003413
## Proportion of Variance 0.01013 0.00267 0.000000
## Cumulative Proportion  0.99733 1.00000 1.000000

now let us plot to verify

plot(xxx.pca1)

fviz_pca_var(xxx.pca1, col.var="steelblue")

visualisation of quality

fviz_eig(xxx.pca1, choice='eigenvalue') # eigenvalues on y-axis

fviz_eig(xxx.pca1) # percentage of explained variance on y-axis

eig.val<-get_eigenvalue(xxx.pca1)
a<-summary(xxx.pca1)
plot(a$importance[3,],type="l") # cumulative variance

Here we want to retain as much of the variance as possible with a manageable number of dimensions; the first five components already explain about 70% of the variance, so we keep five. We then redefine the working data (note the caveat after the code below):

data <- data[,c(1:5)]
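Note that the line above keeps the first five original columns (BALANCE through INSTALLMENTS_PURCHASES), not the component scores, and the cluster summary below reflects those raw variables. If the intent were to cluster on the PC scores themselves, a minimal sketch would be:

data_pc <- as.data.frame(xxx.pca1$x[, 1:5])  # scores on the first five PCs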

K_Means for clustering

Use the elbow method to decide the best number of clusters

set.seed(123) # Set seed for reproducibility

# Compute total within-cluster sum of square
wss <- sapply(1:5, function(k){
  kmeans(data, k, nstart = 15)$tot.withinss
})

# Plot the elbow curve
plot(1:5, wss, type="b", xlab="Number of Clusters", ylab="Total within-clusters sum of squares")

According to the plot, we choose the elbow point; here we take k = 4 clusters, consistent with the model fitted below.
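The choice can be cross-checked with the average silhouette width using the cluster package (already loaded). A sketch, computed on a subsample since the full pairwise distance matrix for 8,950 rows is large; the subsample size of 2,000 is an arbitrary choice:

set.seed(123)
idx <- sample(nrow(data), 2000)   # subsample keeps dist() tractable
d   <- dist(data[idx, ])
sil_width <- sapply(2:5, function(k) {
  km <- kmeans(data[idx, ], centers = k, nstart = 15)
  mean(silhouette(km$cluster, d)[, 3])   # column 3 holds the silhouette widths
})
plot(2:5, sil_width, type = "b",
     xlab = "Number of clusters", ylab = "Average silhouette width")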

K-means

k <- 4
km_result <- kmeans(data, centers = k, nstart = 25)

Add the cluster assignment to the data

data_with_cluster <- data
data_with_cluster$cluster <- as.factor(km_result$cluster)

To identify which variables characterize each cluster

Calculate the mean for each variable in each cluster

cluster_summary <- aggregate(. ~ cluster, data_with_cluster, mean)
print(cluster_summary)
##   cluster   BALANCE BALANCE_FREQUENCY  PURCHASES ONEOFF_PURCHASES
## 1       1 5601.6446         0.9541322 28394.1623       22858.4573
## 2       2  827.6439         0.8516878   602.5697         305.6232
## 3       3 5646.9296         0.9902468   776.1323         434.0495
## 4       4 2461.1269         0.9756786  5921.5745        3991.9298
##   INSTALLMENTS_PURCHASES
## 1              5535.7050
## 2               297.2188
## 3               342.1400
## 4              1930.8904
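The relative size of each segment, and a projection of the clusters onto the first two principal components, help sanity-check the result; a sketch with factoextra (already loaded):

table(km_result$cluster)  # customers per segment
fviz_cluster(km_result, data = data, geom = "point",
             ellipse.type = "convex") + theme_minimal()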

Conclusion

Cluster 1: This cluster has a high average balance and a high balance frequency, suggesting that these customers regularly use their accounts and maintain a high balance. They also have the highest average purchases and one-off purchases, which implies they make significant and frequent transactions, possibly indicating a group of high-spending customers. The high installment purchases suggest they also utilize installment plans.

Cluster 2: Customers in this cluster have the lowest average balance and balance frequency, along with low purchase values. This could indicate a group that uses their credit cards infrequently and for smaller amounts, suggesting they might be more conservative with credit card usage or have lower credit limits.

Cluster 3: This cluster has a very high balance frequency, almost equal to 1, which suggests that their balances are updated almost constantly, indicating regular account activity. However, the purchase amounts are relatively low, especially for one-off purchases. This might represent customers who use their cards regularly but for smaller routine expenses.

Cluster 4: Customers here have a moderate average balance and high balance frequency, with purchase amounts larger than Clusters 2 and 3 but significantly lower than Cluster 1. The one-off purchases are reasonably high, suggesting these customers engage in substantial one-time transactions, but less frequently than those in Cluster 1. The installment purchases are also moderate, indicating a balanced use of one-off and installment purchases.

We can infer:

Cluster 1 could represent affluent customers with high spending capacity.

Cluster 2 might consist of low-usage customers who could be targeted for increased engagement and usage.

Cluster 3 may represent customers who use their credit cards as a regular payment method for smaller amounts.

Cluster 4 could include a middle segment that uses credit cards for both regular and significant purchases, but not to the extent of Cluster 1.