The goal of this analysis report is to discover the Customer Segmentation of a bank, by looking through their behavior/profile while using Credit Card. Hopefully, we can get a clear segmentation of the customer, so we can deploy effective marketing campaign or sales promotion to the targeted costumer.
## 'data.frame': 8950 obs. of 18 variables:
## $ CUST_ID : Factor w/ 8950 levels "C10001","C10002",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ BALANCE : num 40.9 3202.5 2495.1 1666.7 817.7 ...
## $ BALANCE_FREQUENCY : num 0.818 0.909 1 0.636 1 ...
## $ PURCHASES : num 95.4 0 773.2 1499 16 ...
## $ ONEOFF_PURCHASES : num 0 0 773 1499 16 ...
## $ INSTALLMENTS_PURCHASES : num 95.4 0 0 0 0 ...
## $ CASH_ADVANCE : num 0 6443 0 206 0 ...
## $ PURCHASES_FREQUENCY : num 0.1667 0 1 0.0833 0.0833 ...
## $ ONEOFF_PURCHASES_FREQUENCY : num 0 0 1 0.0833 0.0833 ...
## $ PURCHASES_INSTALLMENTS_FREQUENCY: num 0.0833 0 0 0 0 ...
## $ CASH_ADVANCE_FREQUENCY : num 0 0.25 0 0.0833 0 ...
## $ CASH_ADVANCE_TRX : int 0 4 0 1 0 0 0 0 0 0 ...
## $ PURCHASES_TRX : int 2 0 12 1 1 8 64 12 5 3 ...
## $ CREDIT_LIMIT : num 1000 7000 7500 7500 1200 1800 13500 2300 7000 11000 ...
## $ PAYMENTS : num 202 4103 622 0 678 ...
## $ MINIMUM_PAYMENTS : num 140 1072 627 NA 245 ...
## $ PRC_FULL_PAYMENT : num 0 0.222 0 0 0 ...
## $ TENURE : int 12 12 12 12 12 12 12 12 12 12 ...
## CUST_ID BALANCE BALANCE_FREQUENCY PURCHASES
## C10001 : 1 Min. : 0.0 Min. :0.0000 Min. : 0.00
## C10002 : 1 1st Qu.: 128.3 1st Qu.:0.8889 1st Qu.: 39.63
## C10003 : 1 Median : 873.4 Median :1.0000 Median : 361.28
## C10004 : 1 Mean : 1564.5 Mean :0.8773 Mean : 1003.20
## C10005 : 1 3rd Qu.: 2054.1 3rd Qu.:1.0000 3rd Qu.: 1110.13
## C10006 : 1 Max. :19043.1 Max. :1.0000 Max. :49039.57
## (Other):8944
## ONEOFF_PURCHASES INSTALLMENTS_PURCHASES CASH_ADVANCE
## Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 38.0 Median : 89.0 Median : 0.0
## Mean : 592.4 Mean : 411.1 Mean : 978.9
## 3rd Qu.: 577.4 3rd Qu.: 468.6 3rd Qu.: 1113.8
## Max. :40761.2 Max. :22500.0 Max. :47137.2
##
## PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY
## Min. :0.00000 Min. :0.00000
## 1st Qu.:0.08333 1st Qu.:0.00000
## Median :0.50000 Median :0.08333
## Mean :0.49035 Mean :0.20246
## 3rd Qu.:0.91667 3rd Qu.:0.30000
## Max. :1.00000 Max. :1.00000
##
## PURCHASES_INSTALLMENTS_FREQUENCY CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX
## Min. :0.0000 Min. :0.0000 Min. : 0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 0.000
## Median :0.1667 Median :0.0000 Median : 0.000
## Mean :0.3644 Mean :0.1351 Mean : 3.249
## 3rd Qu.:0.7500 3rd Qu.:0.2222 3rd Qu.: 4.000
## Max. :1.0000 Max. :1.5000 Max. :123.000
##
## PURCHASES_TRX CREDIT_LIMIT PAYMENTS MINIMUM_PAYMENTS
## Min. : 0.00 Min. : 50 Min. : 0.0 Min. : 0.02
## 1st Qu.: 1.00 1st Qu.: 1600 1st Qu.: 383.3 1st Qu.: 169.12
## Median : 7.00 Median : 3000 Median : 856.9 Median : 312.34
## Mean : 14.71 Mean : 4494 Mean : 1733.1 Mean : 864.21
## 3rd Qu.: 17.00 3rd Qu.: 6500 3rd Qu.: 1901.1 3rd Qu.: 825.49
## Max. :358.00 Max. :30000 Max. :50721.5 Max. :76406.21
## NA's :1 NA's :313
## PRC_FULL_PAYMENT TENURE
## Min. :0.0000 Min. : 6.00
## 1st Qu.:0.0000 1st Qu.:12.00
## Median :0.0000 Median :12.00
## Mean :0.1537 Mean :11.52
## 3rd Qu.:0.1429 3rd Qu.:12.00
## Max. :1.0000 Max. :12.00
##
The explanation of every variables:
CUST_ID : Identification of Credit Card holder (Categorical)
BALANCE : Balance amount left in their account to make purchases
BALANCE_FREQUENCY : How frequently the Balance is updated, score between 0 and 1 (1 = frequently updated, 0 = not frequently updated)
PURCHASES : Amount of purchases made from account
ONEOFF_PURCHASES : Maximum purchase amount done in one-go
INSTALLMENTS_PURCHASES : Amount of purchase done in installment
CASH_ADVANCE : Cash in advance given by the user
PURCHASES_FREQUENCY : How frequently the Purchases are being made, score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased)
ONEOFFPURCHASESFREQUENCY : How frequently Purchases are happening in one-go (1 = frequently purchased, 0 = not frequently purchased)
PURCHASESINSTALLMENTSFREQUENCY : How frequently purchases in installments are being done (1 = frequently done, 0 = not frequently done)
CASHADVANCEFREQUENCY : How frequently the cash in advance being paid
CASHADVANCETRX : Number of Transactions made with “Cash in Advanced”
PURCHASES_TRX : Number of purchase transactions made
CREDIT_LIMIT : Limit of Credit Card for user
PAYMENTS : Amount of Payment done by user
MINIMUM_PAYMENTS : Minimum amount of payments made by user
PRCFULLPAYMENT : Percent of full payment paid by user
TENURE : Tenure of credit card service for user
CUST_ID since it was a unique variable and we can’t get further information from it.NA's value, because they occured in small quantity, there shouldn’t be a problem.## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## 1 40.90075 0.818182 95.40 0.00
## 2 3202.46742 0.909091 0.00 0.00
## 3 2495.14886 1.000000 773.17 773.17
## 5 817.71434 1.000000 16.00 16.00
## 6 1809.82875 1.000000 1333.28 0.00
## 7 627.26081 1.000000 7091.01 6402.63
## 8 1823.65274 1.000000 436.20 0.00
## 9 1014.92647 1.000000 861.49 661.49
## 10 152.22598 0.545455 1281.60 1281.60
## 11 1293.12494 1.000000 920.12 0.00
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## 1 95.40 0.000 0.166667
## 2 0.00 6442.945 0.000000
## 3 0.00 0.000 1.000000
## 5 0.00 0.000 0.083333
## 6 1333.28 0.000 0.666667
## 7 688.38 0.000 1.000000
## 8 436.20 0.000 1.000000
## 9 200.00 0.000 0.333333
## 10 0.00 0.000 0.166667
## 11 920.12 0.000 1.000000
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## 1 0.000000 0.083333
## 2 0.000000 0.000000
## 3 1.000000 0.000000
## 5 0.083333 0.000000
## 6 0.000000 0.583333
## 7 1.000000 1.000000
## 8 0.000000 1.000000
## 9 0.083333 0.250000
## 10 0.166667 0.000000
## 11 0.000000 1.000000
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## 1 0.00 0 2 1000
## 2 0.25 4 0 7000
## 3 0.00 0 12 7500
## 5 0.00 0 1 1200
## 6 0.00 0 8 1800
## 7 0.00 0 64 13500
## 8 0.00 0 12 2300
## 9 0.00 0 5 7000
## 10 0.00 0 3 11000
## 11 0.00 0 12 1200
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## 1 201.8021 139.5098 0.000000 12
## 2 4103.0326 1072.3402 0.222222 12
## 3 622.0667 627.2848 0.000000 12
## 5 678.3348 244.7912 0.000000 12
## 6 1400.0578 2407.2460 0.000000 12
## 7 6354.3143 198.0659 1.000000 12
## 8 679.0651 532.0340 0.000000 12
## 9 688.2786 311.9634 0.000000 12
## 10 1164.7706 100.3023 0.000000 12
## 11 1083.3010 2172.6978 0.000000 12
ggcorr(credit,
label = T,
label_size = 3,
label_round = 2,
hjust = 1,
size = 3,
color = "royalblue",
layout.exp = 5,
low = "dodgerblue",
mid = "gray95",
high = "red2",
name = "Correlation") - Since all of the variables are numerics, it will easier to see their correlation by using correlation matrix.
- Most of the variables are having positive correlation rather than negative one.
- Some of them even strongly correlated, just like PURCHASES and ONEOFF_PURCHASES, indicating most customers spend their purchase in one-go, the same thing goes to PURCHASES_FREQUENCY and PURCHASESINSTALLMENTSFREQUENCY. It indicates that there some strong multicollinearity between them. But since we are not going regression model, it won’t be a problem.
- The PURCHASES and PURCHASES_TRX has strong correlation, indicating that the amount of purchase comes along with the transaction numbers.
- Some interesting finding can be also found between CASH_ADVANCE and the BALANCE. It seems that customer who has bigger balance will tend to have payment with cash in advance.
- Customer with bigger BALANCE tends to has a bigger CREDIT_LIMIT, since their correlation is positive (0.54).
- TENURE seems has weak correlation with other variables, it seems TENURE did not affected by the customer’s behavior. We will check it again upon the PCA analysis.
Let us see whether there is some interesting finding on the data distribution, especially on Balance, Purchase, Credit Limit, and Tenure.
Before determine the cluster of Customers, we need to conduct PCA to reduce the dimensionality but maintain information as much as possible. Therefore, we need to determine which PCs that we are going to select based on the proportion of variance. But first, we will need to scale the credit dataset.
## Standard deviations (1, .., p=17):
## [1] 2.151580834 1.861083634 1.231306920 1.134686503 1.032969064
## [6] 0.985481427 0.914194694 0.846064315 0.791131113 0.723779209
## [11] 0.633913729 0.549246516 0.492484259 0.447629482 0.413976472
## [16] 0.214703643 0.003425023
##
## Rotation (n x k) = (17 x 17):
## PC1 PC2 PC3
## BALANCE 0.09198590 0.405978695 -0.174155217
## BALANCE_FREQUENCY 0.10981218 0.127738729 -0.458853339
## PURCHASES 0.41215123 0.049530298 0.242581867
## ONEOFF_PURCHASES 0.34677536 0.069929651 0.368572605
## INSTALLMENTS_PURCHASES 0.33705564 -0.011481319 -0.103753039
## CASH_ADVANCE -0.03058765 0.437246881 -0.001725939
## PURCHASES_FREQUENCY 0.32366488 -0.186581704 -0.355749762
## ONEOFF_PURCHASES_FREQUENCY 0.29476135 -0.014746578 0.104743080
## PURCHASES_INSTALLMENTS_FREQUENCY 0.27722626 -0.173576912 -0.449940259
## CASH_ADVANCE_FREQUENCY -0.09914541 0.429996888 -0.087635459
## CASH_ADVANCE_TRX -0.05696036 0.416411836 -0.087051918
## PURCHASES_TRX 0.39106653 -0.011946595 -0.079799025
## CREDIT_LIMIT 0.21005184 0.243823086 0.095181896
## PAYMENTS 0.26372547 0.264181758 0.287920713
## MINIMUM_PAYMENTS 0.05932632 0.170415769 -0.248706216
## PRC_FULL_PAYMENT 0.13056503 -0.195708905 0.184195976
## TENURE 0.07791867 -0.004565576 -0.065743191
## PC4 PC5 PC6
## BALANCE 0.25942307 0.07570040 -0.03576313
## BALANCE_FREQUENCY 0.15932011 -0.45085906 0.01465339
## PURCHASES 0.06400168 -0.01041000 -0.19599250
## ONEOFF_PURCHASES 0.12314791 -0.19702123 -0.17300614
## INSTALLMENTS_PURCHASES -0.07502838 0.33748918 -0.14543134
## CASH_ADVANCE -0.26556462 0.09942509 0.13251141
## PURCHASES_FREQUENCY -0.22173800 -0.08853801 0.08569339
## ONEOFF_PURCHASES_FREQUENCY 0.05546399 -0.52155960 0.09682787
## PURCHASES_INSTALLMENTS_FREQUENCY -0.26529256 0.17540752 0.04745999
## CASH_ADVANCE_FREQUENCY -0.26659223 -0.15993226 -0.03154756
## CASH_ADVANCE_TRX -0.33264408 -0.08974284 0.08978691
## PURCHASES_TRX -0.02410640 -0.05252272 -0.07813781
## CREDIT_LIMIT 0.12272574 0.13200136 0.31286494
## PAYMENTS -0.09751655 0.18919193 0.06565158
## MINIMUM_PAYMENTS 0.35220392 0.41681138 -0.34027647
## PRC_FULL_PAYMENT -0.41815027 0.20109740 0.28866061
## TENURE 0.42837395 0.11778693 0.74566146
## PC7 PC8 PC9
## BALANCE 0.26336950 0.199909116 -0.06197751
## BALANCE_FREQUENCY -0.09867483 -0.127517815 -0.67124591
## PURCHASES -0.20135723 0.004961326 -0.10127322
## ONEOFF_PURCHASES -0.11273384 -0.123272594 -0.06905673
## INSTALLMENTS_PURCHASES -0.26897198 0.238348234 -0.11249796
## CASH_ADVANCE 0.03854000 0.004672424 0.01914138
## PURCHASES_FREQUENCY 0.15790086 -0.026416556 0.19145556
## ONEOFF_PURCHASES_FREQUENCY 0.30574575 -0.199748785 0.36226652
## PURCHASES_INSTALLMENTS_FREQUENCY -0.04322889 0.129174475 0.08214839
## CASH_ADVANCE_FREQUENCY -0.13731595 -0.077263491 0.08684470
## CASH_ADVANCE_TRX -0.19670125 -0.179753605 0.21481677
## PURCHASES_TRX -0.10423056 0.045473232 0.25517812
## CREDIT_LIMIT 0.54388385 0.367373919 -0.09414198
## PAYMENTS -0.16879476 -0.048019266 -0.13562050
## MINIMUM_PAYMENTS 0.20417945 -0.613456344 0.14800036
## PRC_FULL_PAYMENT 0.28035918 -0.481560271 -0.39284213
## TENURE -0.40066207 -0.169092838 0.14356536
## PC10 PC11 PC12
## BALANCE 0.04507583 0.15051794 -0.47603934
## BALANCE_FREQUENCY -0.02664969 -0.13922295 0.06735340
## PURCHASES 0.05933502 0.19641144 0.07900829
## ONEOFF_PURCHASES -0.16512046 0.44648142 -0.04939272
## INSTALLMENTS_PURCHASES 0.44371023 -0.35610940 0.27747873
## CASH_ADVANCE -0.37380405 -0.35285930 -0.17402761
## PURCHASES_FREQUENCY -0.25789128 0.12579403 0.16148484
## ONEOFF_PURCHASES_FREQUENCY 0.08905699 -0.37039216 0.16552752
## PURCHASES_INSTALLMENTS_FREQUENCY -0.25554276 0.29615331 -0.01747642
## CASH_ADVANCE_FREQUENCY 0.29006015 0.21298490 0.03754873
## CASH_ADVANCE_TRX 0.20775042 0.20853964 0.20389321
## PURCHASES_TRX 0.22976565 -0.20288582 -0.59358733
## CREDIT_LIMIT 0.16063017 0.15191194 0.32059510
## PAYMENTS -0.45992657 -0.26006200 0.11797783
## MINIMUM_PAYMENTS -0.01656729 -0.02237652 0.15951141
## PRC_FULL_PAYMENT 0.26595119 0.04971503 -0.24636254
## TENURE 0.04119176 0.06555575 -0.03064761
## PC13 PC14 PC15
## BALANCE 0.53776360 -0.142931091 -0.21899594
## BALANCE_FREQUENCY -0.16874883 0.023207170 -0.04364865
## PURCHASES 0.10871129 0.224677700 0.06287646
## ONEOFF_PURCHASES -0.01078444 0.223011102 0.06812328
## INSTALLMENTS_PURCHASES 0.27661467 0.121272113 0.02347596
## CASH_ADVANCE 0.01057757 0.597409014 0.23754880
## PURCHASES_FREQUENCY 0.19351323 -0.010277748 0.02388578
## ONEOFF_PURCHASES_FREQUENCY 0.24779547 -0.043774920 0.01549951
## PURCHASES_INSTALLMENTS_FREQUENCY -0.04083117 -0.043562583 0.06734487
## CASH_ADVANCE_FREQUENCY 0.04262180 -0.337622654 0.64739670
## CASH_ADVANCE_TRX -0.09393053 0.118324846 -0.64854493
## PURCHASES_TRX -0.52975825 -0.080454848 -0.03815196
## CREDIT_LIMIT -0.40190308 0.030390689 0.05389870
## PAYMENTS -0.04245051 -0.604103939 -0.13785392
## MINIMUM_PAYMENTS -0.14112581 0.023690712 0.07201141
## PRC_FULL_PAYMENT 0.11293777 -0.008694622 -0.01067467
## TENURE 0.07727909 0.048262432 0.10400839
## PC16 PC17
## BALANCE -0.005697928 -0.000042347933
## BALANCE_FREQUENCY -0.009378625 -0.000021964107
## PURCHASES 0.001484780 0.748862215275
## ONEOFF_PURCHASES -0.005222760 -0.582112332781
## INSTALLMENTS_PURCHASES 0.013983240 -0.316781515978
## CASH_ADVANCE -0.008379976 -0.000021822273
## PURCHASES_FREQUENCY 0.678750052 0.000159543970
## ONEOFF_PURCHASES_FREQUENCY -0.341692602 -0.000142077484
## PURCHASES_INSTALLMENTS_FREQUENCY -0.632813034 -0.000168735071
## CASH_ADVANCE_FREQUENCY 0.041773789 0.000015622131
## CASH_ADVANCE_TRX -0.014367929 -0.000011225642
## PURCHASES_TRX 0.134871073 0.000082977198
## CREDIT_LIMIT 0.018635564 0.000008652891
## PAYMENTS 0.011141115 0.000015645778
## MINIMUM_PAYMENTS -0.014566076 -0.000028671620
## PRC_FULL_PAYMENT -0.020994292 -0.000020465262
## TENURE 0.020260848 0.000003486820
From the Rotation, we can see what variable which contributes the most to each PC:
- PC1: PURCHASES
- PC2: CASH_ADVANCE
- PC3: BALANCE_FREQUENCY
- PC4: TENURE
- PC5: ONEOFF_PURCHASES_FREQUENCY
- PC6: TENURE
- PC7: CREDIT_LIMIT
- PC8: PRC_FULL_PAYMENT
- PC9: BALANCE_FREQUENCY
- PC10: PAYMENTS
- PC11: INSTALLMENTS_PURCHASES
- PC12: PURCHASES_TRX
- PC13: BALANCE
- PC14: PAYMENTS
- PC15: CASH_ADVANCE_TRX
- PC16: PURCHASES_FREQUENCY
- PC17: ONEOFF_PURCHASES
Looks like all of the important variables already covered in our PC.
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 2.1516 1.8611 1.23131 1.13469 1.03297 0.98548
## Proportion of Variance 0.2723 0.2037 0.08918 0.07574 0.06277 0.05713
## Cumulative Proportion 0.2723 0.4760 0.56524 0.64097 0.70374 0.76087
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 0.91419 0.84606 0.79113 0.72378 0.63391 0.54925
## Proportion of Variance 0.04916 0.04211 0.03682 0.03082 0.02364 0.01775
## Cumulative Proportion 0.81003 0.85214 0.88895 0.91977 0.94341 0.96115
## PC13 PC14 PC15 PC16 PC17
## Standard deviation 0.49248 0.44763 0.41398 0.21470 0.003425
## Proportion of Variance 0.01427 0.01179 0.01008 0.00271 0.000000
## Cumulative Proportion 0.97542 0.98721 0.99729 1.00000 1.000000
After looking the summary, we will decide to reduce the dimensionality when the Cumulative Proportion reach 90%, which is PC10. The reason why we would like to take the 90% proportion is because that number is considerably able to describe the overall customer information.
fviz_pca_biplot(credit_pca,
axes = c(1:2),
col.var = "orange",
col.ind = "royalblue",
labelsize = 3) +
theme_igray() +
labs(title = "Biplot of PC1 and PC2") Some interpretations of biplot PC1 and PC2:
- Most of individuals are centered nearing the average.
- Data seems separated into two main directions, the one who correlated with the PURCHASES variables (right) and the one that correlated with CASH_ADVANCEvariables (left).
- Customers who spend more PURCHASES tend to have smaller amount of CASH_ADVANCE. By looking at this, we can see that their behavior is quite different and may lead into different cluster.
- Customers who have higher BALANCE and CREDIT_LIMIT tend to have bigger PURCHASES.
- Customers who have bigger CREDIT_LIMIT tend to make a bigger PAYMENTS.
fviz_pca_biplot(credit_pca,
axes = c(1:2),
col.var = "orange",
col.ind = "red",
labelsize = 3,
select.ind = list(contrib = 5)) +
theme_igray() +
labs(title = "Outlier of PC1 and PC2")As seen on the Outlier Plot, there are 5 Outliers with the following conditions:
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## 502 13479.288 1 41050.40 40624.06
## 551 11547.520 1 49039.57 40761.25
## 1257 4010.622 1 40040.71 24543.52
## 1605 3391.702 1 38902.71 33803.84
## 2160 10905.054 1 431.93 133.50
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## 502 426.34 0.0000 0.833333
## 551 8278.32 558.1669 1.000000
## 1257 15497.19 0.0000 1.000000
## 1605 5098.87 0.0000 1.000000
## 2160 298.43 47137.2118 0.583333
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## 502 0.666667 0.416667
## 551 1.000000 0.916667
## 1257 1.000000 1.000000
## 1605 1.000000 1.000000
## 2160 0.250000 0.500000
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## 502 0.000000 0 157 17000
## 551 0.083333 1 101 22500
## 1257 0.000000 0 358 18000
## 1605 0.000000 0 195 18000
## 2160 1.000000 123 21 19600
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## 502 36066.75 15914.4846 0.083333 12
## 551 46930.60 2974.0694 0.250000 12
## 1257 33994.73 703.5033 1.000000 12
## 1605 35843.63 716.0277 1.000000 12
## 2160 39048.60 5394.1737 0.000000 12
PURCHASES and ONEOFF_PUCHASES.CASH_ADVANCE.fviz_pca_var(credit_pca, col.var="orange") +
theme_igray() +
labs(title = "Variables Factor Map - PC 1 & PC2") Some Interpretations:
- As we can see in the Variable Data Plot, the PURCHASES and PURCHASES_TRX variables are close to the circle, indicating that those variables are highly contributed to the PC1 and PC2.
- On the other hand, the BALANCE_FREQUENCY, MINIMUM_PAYMENTS and TENURE are variables that less contributed to PC1 and PC2.
- INSTALLMENTS_PURCHASES, ONEOFFPURCHASESFREQUENCY and PURCHASES_TRX indicated has strong positive correlation, since they were heading into same direction. Same thing happened with variables related to the CASH_ADVANCE which heading the same direction.
- Interesting thing, variables that related with PURCHASES are less correlated with variables that related with CASH_ADVANCE, since they are forming almost 90 degrees.
Based on interepertations from above, we will decide to take only 10 dimensions and put it to new dataset called credit_new
After defining which dimensions that are going to used in Clustering, now we will use K-Means to determine how many Clusters do we need to divide Customers, which may represents their profile and hopefully we can determine what kind of tretment should be given to them.
fviz_nbclust(credit_new,
kmeans,
method = "wss",
linecolor = "green4") +
geom_vline(xintercept = c(4,7), linetype = 2, col = "red") +
theme_igray() Based on the Elbow Method, we can say that the potential number of cluster (k) that may represents the customer segmentation is lies on 4-7. Actually based on the picture we can see at the 4th K, the line already steadily declined, but since on the 7th K the line is declined in quite serious numbers, we will take it along.
Since there will be 4 K that going to be tested, it is better to run it on a loop function:
for(i in 4:7){
set.seed(289)
model <- kmeans(credit_new, i)
print(paste("WSS of K",i, "=",model$tot.withinss))
print(paste("BSS Proportion of K",i, "=", model$betweenss/model$totss))
print(paste("Cluster Size of K",i, "="))
print(paste(model$size))
print(fviz_cluster(model, credit, palette = "Set1") +
theme_igray())
}## [1] "WSS of K 4 = 90683.9635948779"
## [1] "BSS Proportion of K 4 = 0.328354376549163"
## [1] "Cluster Size of K 4 ="
## [1] "3546" "682" "3871" "537"
## [1] "WSS of K 5 = 83275.4959511983"
## [1] "BSS Proportion of K 5 = 0.383224771182373"
## [1] "Cluster Size of K 5 ="
## [1] "3352" "669" "3641" "943" "31"
## [1] "WSS of K 6 = 71417.3149301612"
## [1] "BSS Proportion of K 6 = 0.47105171509991"
## [1] "Cluster Size of K 6 ="
## [1] "2765" "616" "3472" "734" "30" "1019"
## [1] "WSS of K 7 = 66984.174207859"
## [1] "BSS Proportion of K 7 = 0.503885519956275"
## [1] "Cluster Size of K 7 ="
## [1] "2219" "599" "3180" "579" "30" "966" "1063"
As we can see, the smallest WSS and biggest R2 value is given by K = 7. But when we see the Cluster Plot, there is no significance different between K = 7 and the K = 6, therefore we will use K = 6 as the number of Clusters, since we don’t want too many Clusters and focusing our treatment to the Customers.
## BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## 1 40.90075 0.818182 95.40 0.00
## 2 3202.46742 0.909091 0.00 0.00
## 3 2495.14886 1.000000 773.17 773.17
## 5 817.71434 1.000000 16.00 16.00
## 6 1809.82875 1.000000 1333.28 0.00
## 7 627.26081 1.000000 7091.01 6402.63
## 8 1823.65274 1.000000 436.20 0.00
## 9 1014.92647 1.000000 861.49 661.49
## 10 152.22598 0.545455 1281.60 1281.60
## 11 1293.12494 1.000000 920.12 0.00
## INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## 1 95.40 0.000 0.166667
## 2 0.00 6442.945 0.000000
## 3 0.00 0.000 1.000000
## 5 0.00 0.000 0.083333
## 6 1333.28 0.000 0.666667
## 7 688.38 0.000 1.000000
## 8 436.20 0.000 1.000000
## 9 200.00 0.000 0.333333
## 10 0.00 0.000 0.166667
## 11 920.12 0.000 1.000000
## ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## 1 0.000000 0.083333
## 2 0.000000 0.000000
## 3 1.000000 0.000000
## 5 0.083333 0.000000
## 6 0.000000 0.583333
## 7 1.000000 1.000000
## 8 0.000000 1.000000
## 9 0.083333 0.250000
## 10 0.166667 0.000000
## 11 0.000000 1.000000
## CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## 1 0.00 0 2 1000
## 2 0.25 4 0 7000
## 3 0.00 0 12 7500
## 5 0.00 0 1 1200
## 6 0.00 0 8 1800
## 7 0.00 0 64 13500
## 8 0.00 0 12 2300
## 9 0.00 0 5 7000
## 10 0.00 0 3 11000
## 11 0.00 0 12 1200
## PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE CLUSTER
## 1 201.8021 139.5098 0.000000 12 2
## 2 4103.0326 1072.3402 0.222222 12 6
## 3 622.0667 627.2848 0.000000 12 3
## 5 678.3348 244.7912 0.000000 12 2
## 6 1400.0578 2407.2460 0.000000 12 3
## 7 6354.3143 198.0659 1.000000 12 4
## 8 679.0651 532.0340 0.000000 12 3
## 9 688.2786 311.9634 0.000000 12 2
## 10 1164.7706 100.3023 0.000000 12 2
## 11 1083.3010 2172.6978 0.000000 12 3
After determine the number of Clusters, now we will take a look closer to their characteristics and determine what kind of treatments that may effective:
- Cluster 1 (Red) with 2765 Customers: Customer on this Cluster tend to keep their Credit Card and avoid to use it for transactions (Low Purchases), we could give them some gimmicks as the first user, so they will attracted to use their Credit Card more often.
- Cluster 2 (Blue) with 616 Customers: This Clusters filled with Customers that tend to have low Credit Limit and Payments. Probably most of them are still on approval/review process.
- Cluster 3 (Green) with 3472 Customers: Have a similar behavior with Cluster 1 and 2, but they also have low Cash Advance. This might indicates they are non-potential customers.
- Cluster 4 (Purple) with 734 Customers: Customers in this Cluster are potential, since they spend Purchases in medium amount. A cash back or discount program may attract them to use their credit card in bigger Purchase.
- Cluster 5 (Orange) with Customers: This is our “Big Spenders”. They use their Credit Cards effectively with big Purchases. Point Rewaed/Loyalty Program may keep them on using the Credit Card.
- Cluster 6 (Yellow) with 1019 Customers: This Clusters actually a quite potential Customers. They give a lot cash in advance, but unfortunatley their spending on Purchases is still minimum. We should attract them to apply installment even with the 0% interest at the beginning.