Foreground

The goal of this analysis report is to discover the Customer Segmentation of a bank, by looking through their behavior/profile while using Credit Card. Hopefully, we can get a clear segmentation of the customer, so we can deploy effective marketing campaign or sales promotion to the targeted costumer.

Exploratory Data Analysis

Structure and Summary of Data

## 'data.frame':    8950 obs. of  18 variables:
##  $ CUST_ID                         : Factor w/ 8950 levels "C10001","C10002",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ BALANCE                         : num  40.9 3202.5 2495.1 1666.7 817.7 ...
##  $ BALANCE_FREQUENCY               : num  0.818 0.909 1 0.636 1 ...
##  $ PURCHASES                       : num  95.4 0 773.2 1499 16 ...
##  $ ONEOFF_PURCHASES                : num  0 0 773 1499 16 ...
##  $ INSTALLMENTS_PURCHASES          : num  95.4 0 0 0 0 ...
##  $ CASH_ADVANCE                    : num  0 6443 0 206 0 ...
##  $ PURCHASES_FREQUENCY             : num  0.1667 0 1 0.0833 0.0833 ...
##  $ ONEOFF_PURCHASES_FREQUENCY      : num  0 0 1 0.0833 0.0833 ...
##  $ PURCHASES_INSTALLMENTS_FREQUENCY: num  0.0833 0 0 0 0 ...
##  $ CASH_ADVANCE_FREQUENCY          : num  0 0.25 0 0.0833 0 ...
##  $ CASH_ADVANCE_TRX                : int  0 4 0 1 0 0 0 0 0 0 ...
##  $ PURCHASES_TRX                   : int  2 0 12 1 1 8 64 12 5 3 ...
##  $ CREDIT_LIMIT                    : num  1000 7000 7500 7500 1200 1800 13500 2300 7000 11000 ...
##  $ PAYMENTS                        : num  202 4103 622 0 678 ...
##  $ MINIMUM_PAYMENTS                : num  140 1072 627 NA 245 ...
##  $ PRC_FULL_PAYMENT                : num  0 0.222 0 0 0 ...
##  $ TENURE                          : int  12 12 12 12 12 12 12 12 12 12 ...
##     CUST_ID        BALANCE        BALANCE_FREQUENCY   PURCHASES       
##  C10001 :   1   Min.   :    0.0   Min.   :0.0000    Min.   :    0.00  
##  C10002 :   1   1st Qu.:  128.3   1st Qu.:0.8889    1st Qu.:   39.63  
##  C10003 :   1   Median :  873.4   Median :1.0000    Median :  361.28  
##  C10004 :   1   Mean   : 1564.5   Mean   :0.8773    Mean   : 1003.20  
##  C10005 :   1   3rd Qu.: 2054.1   3rd Qu.:1.0000    3rd Qu.: 1110.13  
##  C10006 :   1   Max.   :19043.1   Max.   :1.0000    Max.   :49039.57  
##  (Other):8944                                                         
##  ONEOFF_PURCHASES  INSTALLMENTS_PURCHASES  CASH_ADVANCE    
##  Min.   :    0.0   Min.   :    0.0        Min.   :    0.0  
##  1st Qu.:    0.0   1st Qu.:    0.0        1st Qu.:    0.0  
##  Median :   38.0   Median :   89.0        Median :    0.0  
##  Mean   :  592.4   Mean   :  411.1        Mean   :  978.9  
##  3rd Qu.:  577.4   3rd Qu.:  468.6        3rd Qu.: 1113.8  
##  Max.   :40761.2   Max.   :22500.0        Max.   :47137.2  
##                                                            
##  PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY
##  Min.   :0.00000     Min.   :0.00000           
##  1st Qu.:0.08333     1st Qu.:0.00000           
##  Median :0.50000     Median :0.08333           
##  Mean   :0.49035     Mean   :0.20246           
##  3rd Qu.:0.91667     3rd Qu.:0.30000           
##  Max.   :1.00000     Max.   :1.00000           
##                                                
##  PURCHASES_INSTALLMENTS_FREQUENCY CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX 
##  Min.   :0.0000                   Min.   :0.0000         Min.   :  0.000  
##  1st Qu.:0.0000                   1st Qu.:0.0000         1st Qu.:  0.000  
##  Median :0.1667                   Median :0.0000         Median :  0.000  
##  Mean   :0.3644                   Mean   :0.1351         Mean   :  3.249  
##  3rd Qu.:0.7500                   3rd Qu.:0.2222         3rd Qu.:  4.000  
##  Max.   :1.0000                   Max.   :1.5000         Max.   :123.000  
##                                                                           
##  PURCHASES_TRX     CREDIT_LIMIT      PAYMENTS       MINIMUM_PAYMENTS  
##  Min.   :  0.00   Min.   :   50   Min.   :    0.0   Min.   :    0.02  
##  1st Qu.:  1.00   1st Qu.: 1600   1st Qu.:  383.3   1st Qu.:  169.12  
##  Median :  7.00   Median : 3000   Median :  856.9   Median :  312.34  
##  Mean   : 14.71   Mean   : 4494   Mean   : 1733.1   Mean   :  864.21  
##  3rd Qu.: 17.00   3rd Qu.: 6500   3rd Qu.: 1901.1   3rd Qu.:  825.49  
##  Max.   :358.00   Max.   :30000   Max.   :50721.5   Max.   :76406.21  
##                   NA's   :1                         NA's   :313       
##  PRC_FULL_PAYMENT     TENURE     
##  Min.   :0.0000   Min.   : 6.00  
##  1st Qu.:0.0000   1st Qu.:12.00  
##  Median :0.0000   Median :12.00  
##  Mean   :0.1537   Mean   :11.52  
##  3rd Qu.:0.1429   3rd Qu.:12.00  
##  Max.   :1.0000   Max.   :12.00  
## 

The explanation of every variables:
CUST_ID : Identification of Credit Card holder (Categorical)
BALANCE : Balance amount left in their account to make purchases
BALANCE_FREQUENCY : How frequently the Balance is updated, score between 0 and 1 (1 = frequently updated, 0 = not frequently updated)
PURCHASES : Amount of purchases made from account
ONEOFF_PURCHASES : Maximum purchase amount done in one-go
INSTALLMENTS_PURCHASES : Amount of purchase done in installment
CASH_ADVANCE : Cash in advance given by the user
PURCHASES_FREQUENCY : How frequently the Purchases are being made, score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased)
ONEOFFPURCHASESFREQUENCY : How frequently Purchases are happening in one-go (1 = frequently purchased, 0 = not frequently purchased)
PURCHASESINSTALLMENTSFREQUENCY : How frequently purchases in installments are being done (1 = frequently done, 0 = not frequently done)
CASHADVANCEFREQUENCY : How frequently the cash in advance being paid
CASHADVANCETRX : Number of Transactions made with “Cash in Advanced”
PURCHASES_TRX : Number of purchase transactions made
CREDIT_LIMIT : Limit of Credit Card for user
PAYMENTS : Amount of Payment done by user
MINIMUM_PAYMENTS : Minimum amount of payments made by user
PRCFULLPAYMENT : Percent of full payment paid by user
TENURE : Tenure of credit card service for user

Feature Selection by Business Wise

  • All variables are in the correct structure, no need to re-adjust their structure. And eventually, all the variables are numerics.
  • We will take out the CUST_ID since it was a unique variable and we can’t get further information from it.
  • We will going to take out the NA's value, because they occured in small quantity, there shouldn’t be a problem.
  • The scale on most of the variables is different, but before scale it, we will take a closer look at the data, we will deal with the scaling at PCA Session.
##       BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## 1    40.90075          0.818182     95.40             0.00
## 2  3202.46742          0.909091      0.00             0.00
## 3  2495.14886          1.000000    773.17           773.17
## 5   817.71434          1.000000     16.00            16.00
## 6  1809.82875          1.000000   1333.28             0.00
## 7   627.26081          1.000000   7091.01          6402.63
## 8  1823.65274          1.000000    436.20             0.00
## 9  1014.92647          1.000000    861.49           661.49
## 10  152.22598          0.545455   1281.60          1281.60
## 11 1293.12494          1.000000    920.12             0.00
##    INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## 1                   95.40        0.000            0.166667
## 2                    0.00     6442.945            0.000000
## 3                    0.00        0.000            1.000000
## 5                    0.00        0.000            0.083333
## 6                 1333.28        0.000            0.666667
## 7                  688.38        0.000            1.000000
## 8                  436.20        0.000            1.000000
## 9                  200.00        0.000            0.333333
## 10                   0.00        0.000            0.166667
## 11                 920.12        0.000            1.000000
##    ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## 1                    0.000000                         0.083333
## 2                    0.000000                         0.000000
## 3                    1.000000                         0.000000
## 5                    0.083333                         0.000000
## 6                    0.000000                         0.583333
## 7                    1.000000                         1.000000
## 8                    0.000000                         1.000000
## 9                    0.083333                         0.250000
## 10                   0.166667                         0.000000
## 11                   0.000000                         1.000000
##    CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## 1                    0.00                0             2         1000
## 2                    0.25                4             0         7000
## 3                    0.00                0            12         7500
## 5                    0.00                0             1         1200
## 6                    0.00                0             8         1800
## 7                    0.00                0            64        13500
## 8                    0.00                0            12         2300
## 9                    0.00                0             5         7000
## 10                   0.00                0             3        11000
## 11                   0.00                0            12         1200
##     PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## 1   201.8021         139.5098         0.000000     12
## 2  4103.0326        1072.3402         0.222222     12
## 3   622.0667         627.2848         0.000000     12
## 5   678.3348         244.7912         0.000000     12
## 6  1400.0578        2407.2460         0.000000     12
## 7  6354.3143         198.0659         1.000000     12
## 8   679.0651         532.0340         0.000000     12
## 9   688.2786         311.9634         0.000000     12
## 10 1164.7706         100.3023         0.000000     12
## 11 1083.3010        2172.6978         0.000000     12

Data Exploration

Correlation Between Variables

- Since all of the variables are numerics, it will easier to see their correlation by using correlation matrix.
- Most of the variables are having positive correlation rather than negative one.
- Some of them even strongly correlated, just like PURCHASES and ONEOFF_PURCHASES, indicating most customers spend their purchase in one-go, the same thing goes to PURCHASES_FREQUENCY and PURCHASESINSTALLMENTSFREQUENCY. It indicates that there some strong multicollinearity between them. But since we are not going regression model, it won’t be a problem.
- The PURCHASES and PURCHASES_TRX has strong correlation, indicating that the amount of purchase comes along with the transaction numbers.
- Some interesting finding can be also found between CASH_ADVANCE and the BALANCE. It seems that customer who has bigger balance will tend to have payment with cash in advance.
- Customer with bigger BALANCE tends to has a bigger CREDIT_LIMIT, since their correlation is positive (0.54).
- TENURE seems has weak correlation with other variables, it seems TENURE did not affected by the customer’s behavior. We will check it again upon the PCA analysis.

Data Distribution

Let us see whether there is some interesting finding on the data distribution, especially on Balance, Purchase, Credit Limit, and Tenure.

Customer Balance

It seems most of the customers who use their Credit Card would maximize the usage of their credit balance, until it reach 0.

Customer Purchase

Suprisingly, most of the customers don’t use their Credit Card to purchase something. Only around 38% who ever use their Credit Card to purcahse something (Purchase value > 0).

Credit Limit

Most of the customers have Credit Limit on 1032 (there are 144 customers who has 0 credit limit, we will assume that their credit application has not approved yet). The frequency goes lower along with the increment of Credit Limit, even there are some small peaks again at 6196 - 7229.

Tenure

Most of the customers have 12 months Tenure, with significant numbers compared to other tenure.

Clustering

Principal Component Analysis (PCA)

Before determine the cluster of Customers, we need to conduct PCA to reduce the dimensionality but maintain information as much as possible. Therefore, we need to determine which PCs that we are going to select based on the proportion of variance. But first, we will need to scale the credit dataset.

PC Intrepretation

## Standard deviations (1, .., p=17):
##  [1] 2.151580834 1.861083634 1.231306920 1.134686503 1.032969064
##  [6] 0.985481427 0.914194694 0.846064315 0.791131113 0.723779209
## [11] 0.633913729 0.549246516 0.492484259 0.447629482 0.413976472
## [16] 0.214703643 0.003425023
## 
## Rotation (n x k) = (17 x 17):
##                                          PC1          PC2          PC3
## BALANCE                           0.09198590  0.405978695 -0.174155217
## BALANCE_FREQUENCY                 0.10981218  0.127738729 -0.458853339
## PURCHASES                         0.41215123  0.049530298  0.242581867
## ONEOFF_PURCHASES                  0.34677536  0.069929651  0.368572605
## INSTALLMENTS_PURCHASES            0.33705564 -0.011481319 -0.103753039
## CASH_ADVANCE                     -0.03058765  0.437246881 -0.001725939
## PURCHASES_FREQUENCY               0.32366488 -0.186581704 -0.355749762
## ONEOFF_PURCHASES_FREQUENCY        0.29476135 -0.014746578  0.104743080
## PURCHASES_INSTALLMENTS_FREQUENCY  0.27722626 -0.173576912 -0.449940259
## CASH_ADVANCE_FREQUENCY           -0.09914541  0.429996888 -0.087635459
## CASH_ADVANCE_TRX                 -0.05696036  0.416411836 -0.087051918
## PURCHASES_TRX                     0.39106653 -0.011946595 -0.079799025
## CREDIT_LIMIT                      0.21005184  0.243823086  0.095181896
## PAYMENTS                          0.26372547  0.264181758  0.287920713
## MINIMUM_PAYMENTS                  0.05932632  0.170415769 -0.248706216
## PRC_FULL_PAYMENT                  0.13056503 -0.195708905  0.184195976
## TENURE                            0.07791867 -0.004565576 -0.065743191
##                                          PC4         PC5         PC6
## BALANCE                           0.25942307  0.07570040 -0.03576313
## BALANCE_FREQUENCY                 0.15932011 -0.45085906  0.01465339
## PURCHASES                         0.06400168 -0.01041000 -0.19599250
## ONEOFF_PURCHASES                  0.12314791 -0.19702123 -0.17300614
## INSTALLMENTS_PURCHASES           -0.07502838  0.33748918 -0.14543134
## CASH_ADVANCE                     -0.26556462  0.09942509  0.13251141
## PURCHASES_FREQUENCY              -0.22173800 -0.08853801  0.08569339
## ONEOFF_PURCHASES_FREQUENCY        0.05546399 -0.52155960  0.09682787
## PURCHASES_INSTALLMENTS_FREQUENCY -0.26529256  0.17540752  0.04745999
## CASH_ADVANCE_FREQUENCY           -0.26659223 -0.15993226 -0.03154756
## CASH_ADVANCE_TRX                 -0.33264408 -0.08974284  0.08978691
## PURCHASES_TRX                    -0.02410640 -0.05252272 -0.07813781
## CREDIT_LIMIT                      0.12272574  0.13200136  0.31286494
## PAYMENTS                         -0.09751655  0.18919193  0.06565158
## MINIMUM_PAYMENTS                  0.35220392  0.41681138 -0.34027647
## PRC_FULL_PAYMENT                 -0.41815027  0.20109740  0.28866061
## TENURE                            0.42837395  0.11778693  0.74566146
##                                          PC7          PC8         PC9
## BALANCE                           0.26336950  0.199909116 -0.06197751
## BALANCE_FREQUENCY                -0.09867483 -0.127517815 -0.67124591
## PURCHASES                        -0.20135723  0.004961326 -0.10127322
## ONEOFF_PURCHASES                 -0.11273384 -0.123272594 -0.06905673
## INSTALLMENTS_PURCHASES           -0.26897198  0.238348234 -0.11249796
## CASH_ADVANCE                      0.03854000  0.004672424  0.01914138
## PURCHASES_FREQUENCY               0.15790086 -0.026416556  0.19145556
## ONEOFF_PURCHASES_FREQUENCY        0.30574575 -0.199748785  0.36226652
## PURCHASES_INSTALLMENTS_FREQUENCY -0.04322889  0.129174475  0.08214839
## CASH_ADVANCE_FREQUENCY           -0.13731595 -0.077263491  0.08684470
## CASH_ADVANCE_TRX                 -0.19670125 -0.179753605  0.21481677
## PURCHASES_TRX                    -0.10423056  0.045473232  0.25517812
## CREDIT_LIMIT                      0.54388385  0.367373919 -0.09414198
## PAYMENTS                         -0.16879476 -0.048019266 -0.13562050
## MINIMUM_PAYMENTS                  0.20417945 -0.613456344  0.14800036
## PRC_FULL_PAYMENT                  0.28035918 -0.481560271 -0.39284213
## TENURE                           -0.40066207 -0.169092838  0.14356536
##                                         PC10        PC11        PC12
## BALANCE                           0.04507583  0.15051794 -0.47603934
## BALANCE_FREQUENCY                -0.02664969 -0.13922295  0.06735340
## PURCHASES                         0.05933502  0.19641144  0.07900829
## ONEOFF_PURCHASES                 -0.16512046  0.44648142 -0.04939272
## INSTALLMENTS_PURCHASES            0.44371023 -0.35610940  0.27747873
## CASH_ADVANCE                     -0.37380405 -0.35285930 -0.17402761
## PURCHASES_FREQUENCY              -0.25789128  0.12579403  0.16148484
## ONEOFF_PURCHASES_FREQUENCY        0.08905699 -0.37039216  0.16552752
## PURCHASES_INSTALLMENTS_FREQUENCY -0.25554276  0.29615331 -0.01747642
## CASH_ADVANCE_FREQUENCY            0.29006015  0.21298490  0.03754873
## CASH_ADVANCE_TRX                  0.20775042  0.20853964  0.20389321
## PURCHASES_TRX                     0.22976565 -0.20288582 -0.59358733
## CREDIT_LIMIT                      0.16063017  0.15191194  0.32059510
## PAYMENTS                         -0.45992657 -0.26006200  0.11797783
## MINIMUM_PAYMENTS                 -0.01656729 -0.02237652  0.15951141
## PRC_FULL_PAYMENT                  0.26595119  0.04971503 -0.24636254
## TENURE                            0.04119176  0.06555575 -0.03064761
##                                         PC13         PC14        PC15
## BALANCE                           0.53776360 -0.142931091 -0.21899594
## BALANCE_FREQUENCY                -0.16874883  0.023207170 -0.04364865
## PURCHASES                         0.10871129  0.224677700  0.06287646
## ONEOFF_PURCHASES                 -0.01078444  0.223011102  0.06812328
## INSTALLMENTS_PURCHASES            0.27661467  0.121272113  0.02347596
## CASH_ADVANCE                      0.01057757  0.597409014  0.23754880
## PURCHASES_FREQUENCY               0.19351323 -0.010277748  0.02388578
## ONEOFF_PURCHASES_FREQUENCY        0.24779547 -0.043774920  0.01549951
## PURCHASES_INSTALLMENTS_FREQUENCY -0.04083117 -0.043562583  0.06734487
## CASH_ADVANCE_FREQUENCY            0.04262180 -0.337622654  0.64739670
## CASH_ADVANCE_TRX                 -0.09393053  0.118324846 -0.64854493
## PURCHASES_TRX                    -0.52975825 -0.080454848 -0.03815196
## CREDIT_LIMIT                     -0.40190308  0.030390689  0.05389870
## PAYMENTS                         -0.04245051 -0.604103939 -0.13785392
## MINIMUM_PAYMENTS                 -0.14112581  0.023690712  0.07201141
## PRC_FULL_PAYMENT                  0.11293777 -0.008694622 -0.01067467
## TENURE                            0.07727909  0.048262432  0.10400839
##                                          PC16            PC17
## BALANCE                          -0.005697928 -0.000042347933
## BALANCE_FREQUENCY                -0.009378625 -0.000021964107
## PURCHASES                         0.001484780  0.748862215275
## ONEOFF_PURCHASES                 -0.005222760 -0.582112332781
## INSTALLMENTS_PURCHASES            0.013983240 -0.316781515978
## CASH_ADVANCE                     -0.008379976 -0.000021822273
## PURCHASES_FREQUENCY               0.678750052  0.000159543970
## ONEOFF_PURCHASES_FREQUENCY       -0.341692602 -0.000142077484
## PURCHASES_INSTALLMENTS_FREQUENCY -0.632813034 -0.000168735071
## CASH_ADVANCE_FREQUENCY            0.041773789  0.000015622131
## CASH_ADVANCE_TRX                 -0.014367929 -0.000011225642
## PURCHASES_TRX                     0.134871073  0.000082977198
## CREDIT_LIMIT                      0.018635564  0.000008652891
## PAYMENTS                          0.011141115  0.000015645778
## MINIMUM_PAYMENTS                 -0.014566076 -0.000028671620
## PRC_FULL_PAYMENT                 -0.020994292 -0.000020465262
## TENURE                            0.020260848  0.000003486820

From the Rotation, we can see what variable which contributes the most to each PC:
- PC1: PURCHASES
- PC2: CASH_ADVANCE
- PC3: BALANCE_FREQUENCY
- PC4: TENURE
- PC5: ONEOFF_PURCHASES_FREQUENCY
- PC6: TENURE
- PC7: CREDIT_LIMIT
- PC8: PRC_FULL_PAYMENT
- PC9: BALANCE_FREQUENCY
- PC10: PAYMENTS
- PC11: INSTALLMENTS_PURCHASES
- PC12: PURCHASES_TRX
- PC13: BALANCE
- PC14: PAYMENTS
- PC15: CASH_ADVANCE_TRX
- PC16: PURCHASES_FREQUENCY
- PC17: ONEOFF_PURCHASES
Looks like all of the important variables already covered in our PC.

Variance Interpretation

## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6
## Standard deviation     2.1516 1.8611 1.23131 1.13469 1.03297 0.98548
## Proportion of Variance 0.2723 0.2037 0.08918 0.07574 0.06277 0.05713
## Cumulative Proportion  0.2723 0.4760 0.56524 0.64097 0.70374 0.76087
##                            PC7     PC8     PC9    PC10    PC11    PC12
## Standard deviation     0.91419 0.84606 0.79113 0.72378 0.63391 0.54925
## Proportion of Variance 0.04916 0.04211 0.03682 0.03082 0.02364 0.01775
## Cumulative Proportion  0.81003 0.85214 0.88895 0.91977 0.94341 0.96115
##                           PC13    PC14    PC15    PC16     PC17
## Standard deviation     0.49248 0.44763 0.41398 0.21470 0.003425
## Proportion of Variance 0.01427 0.01179 0.01008 0.00271 0.000000
## Cumulative Proportion  0.97542 0.98721 0.99729 1.00000 1.000000

After looking the summary, we will decide to reduce the dimensionality when the Cumulative Proportion reach 90%, which is PC10. The reason why we would like to take the 90% proportion is because that number is considerably able to describe the overall customer information.

Individual Data Plot Interpretation

Some interpretations of biplot PC1 and PC2:
- Most of individuals are centered nearing the average.
- Data seems separated into two main directions, the one who correlated with the PURCHASES variables (right) and the one that correlated with CASH_ADVANCEvariables (left).
- Customers who spend more PURCHASES tend to have smaller amount of CASH_ADVANCE. By looking at this, we can see that their behavior is quite different and may lead into different cluster.
- Customers who have higher BALANCE and CREDIT_LIMIT tend to have bigger PURCHASES.
- Customers who have bigger CREDIT_LIMIT tend to make a bigger PAYMENTS.

Individual Data Plot Interpretation - Outlier

As seen on the Outlier Plot, there are 5 Outliers with the following conditions:

##        BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## 502  13479.288                 1  41050.40         40624.06
## 551  11547.520                 1  49039.57         40761.25
## 1257  4010.622                 1  40040.71         24543.52
## 1605  3391.702                 1  38902.71         33803.84
## 2160 10905.054                 1    431.93           133.50
##      INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## 502                  426.34       0.0000            0.833333
## 551                 8278.32     558.1669            1.000000
## 1257               15497.19       0.0000            1.000000
## 1605                5098.87       0.0000            1.000000
## 2160                 298.43   47137.2118            0.583333
##      ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## 502                    0.666667                         0.416667
## 551                    1.000000                         0.916667
## 1257                   1.000000                         1.000000
## 1605                   1.000000                         1.000000
## 2160                   0.250000                         0.500000
##      CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## 502                0.000000                0           157        17000
## 551                0.083333                1           101        22500
## 1257               0.000000                0           358        18000
## 1605               0.000000                0           195        18000
## 2160               1.000000              123            21        19600
##      PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE
## 502  36066.75       15914.4846         0.083333     12
## 551  46930.60        2974.0694         0.250000     12
## 1257 33994.73         703.5033         1.000000     12
## 1605 35843.63         716.0277         1.000000     12
## 2160 39048.60        5394.1737         0.000000     12
  • Customers 502, 551, 1257, and 1605 have a very high PURCHASES and ONEOFF_PUCHASES.
  • Customer 2160: Have a very high CASH_ADVANCE.

Variable Data Plot

Some Interpretations:
- As we can see in the Variable Data Plot, the PURCHASES and PURCHASES_TRX variables are close to the circle, indicating that those variables are highly contributed to the PC1 and PC2.
- On the other hand, the BALANCE_FREQUENCY, MINIMUM_PAYMENTS and TENURE are variables that less contributed to PC1 and PC2.
- INSTALLMENTS_PURCHASES, ONEOFFPURCHASESFREQUENCY and PURCHASES_TRX indicated has strong positive correlation, since they were heading into same direction. Same thing happened with variables related to the CASH_ADVANCE which heading the same direction.
- Interesting thing, variables that related with PURCHASES are less correlated with variables that related with CASH_ADVANCE, since they are forming almost 90 degrees.

Dimensionality Reduction

Based on interepertations from above, we will decide to take only 10 dimensions and put it to new dataset called credit_new

K-Means

After defining which dimensions that are going to used in Clustering, now we will use K-Means to determine how many Clusters do we need to divide Customers, which may represents their profile and hopefully we can determine what kind of tretment should be given to them.

Elbow Method

Based on the Elbow Method, we can say that the potential number of cluster (k) that may represents the customer segmentation is lies on 4-7. Actually based on the picture we can see at the 4th K, the line already steadily declined, but since on the 7th K the line is declined in quite serious numbers, we will take it along.

K-Means

Since there will be 4 K that going to be tested, it is better to run it on a loop function:

## [1] "WSS of K 4 = 90683.9635948779"
## [1] "BSS Proportion of K 4 = 0.328354376549163"
## [1] "Cluster Size of K 4 ="
## [1] "3546" "682"  "3871" "537"

## [1] "WSS of K 5 = 83275.4959511983"
## [1] "BSS Proportion of K 5 = 0.383224771182373"
## [1] "Cluster Size of K 5 ="
## [1] "3352" "669"  "3641" "943"  "31"

## [1] "WSS of K 6 = 71417.3149301612"
## [1] "BSS Proportion of K 6 = 0.47105171509991"
## [1] "Cluster Size of K 6 ="
## [1] "2765" "616"  "3472" "734"  "30"   "1019"

## [1] "WSS of K 7 = 66984.174207859"
## [1] "BSS Proportion of K 7 = 0.503885519956275"
## [1] "Cluster Size of K 7 ="
## [1] "2219" "599"  "3180" "579"  "30"   "966"  "1063"

As we can see, the smallest WSS and biggest R2 value is given by K = 7. But when we see the Cluster Plot, there is no significance different between K = 7 and the K = 6, therefore we will use K = 6 as the number of Clusters, since we don’t want too many Clusters and focusing our treatment to the Customers.

##       BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES
## 1    40.90075          0.818182     95.40             0.00
## 2  3202.46742          0.909091      0.00             0.00
## 3  2495.14886          1.000000    773.17           773.17
## 5   817.71434          1.000000     16.00            16.00
## 6  1809.82875          1.000000   1333.28             0.00
## 7   627.26081          1.000000   7091.01          6402.63
## 8  1823.65274          1.000000    436.20             0.00
## 9  1014.92647          1.000000    861.49           661.49
## 10  152.22598          0.545455   1281.60          1281.60
## 11 1293.12494          1.000000    920.12             0.00
##    INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY
## 1                   95.40        0.000            0.166667
## 2                    0.00     6442.945            0.000000
## 3                    0.00        0.000            1.000000
## 5                    0.00        0.000            0.083333
## 6                 1333.28        0.000            0.666667
## 7                  688.38        0.000            1.000000
## 8                  436.20        0.000            1.000000
## 9                  200.00        0.000            0.333333
## 10                   0.00        0.000            0.166667
## 11                 920.12        0.000            1.000000
##    ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUENCY
## 1                    0.000000                         0.083333
## 2                    0.000000                         0.000000
## 3                    1.000000                         0.000000
## 5                    0.083333                         0.000000
## 6                    0.000000                         0.583333
## 7                    1.000000                         1.000000
## 8                    0.000000                         1.000000
## 9                    0.083333                         0.250000
## 10                   0.166667                         0.000000
## 11                   0.000000                         1.000000
##    CASH_ADVANCE_FREQUENCY CASH_ADVANCE_TRX PURCHASES_TRX CREDIT_LIMIT
## 1                    0.00                0             2         1000
## 2                    0.25                4             0         7000
## 3                    0.00                0            12         7500
## 5                    0.00                0             1         1200
## 6                    0.00                0             8         1800
## 7                    0.00                0            64        13500
## 8                    0.00                0            12         2300
## 9                    0.00                0             5         7000
## 10                   0.00                0             3        11000
## 11                   0.00                0            12         1200
##     PAYMENTS MINIMUM_PAYMENTS PRC_FULL_PAYMENT TENURE CLUSTER
## 1   201.8021         139.5098         0.000000     12       2
## 2  4103.0326        1072.3402         0.222222     12       6
## 3   622.0667         627.2848         0.000000     12       3
## 5   678.3348         244.7912         0.000000     12       2
## 6  1400.0578        2407.2460         0.000000     12       3
## 7  6354.3143         198.0659         1.000000     12       4
## 8   679.0651         532.0340         0.000000     12       3
## 9   688.2786         311.9634         0.000000     12       2
## 10 1164.7706         100.3023         0.000000     12       2
## 11 1083.3010        2172.6978         0.000000     12       3

Character of Clusters and Suggested Treatments

After determine the number of Clusters, now we will take a look closer to their characteristics and determine what kind of treatments that may effective:
- Cluster 1 (Red) with 2765 Customers: Customer on this Cluster tend to keep their Credit Card and avoid to use it for transactions (Low Purchases), we could give them some gimmicks as the first user, so they will attracted to use their Credit Card more often.
- Cluster 2 (Blue) with 616 Customers: This Clusters filled with Customers that tend to have low Credit Limit and Payments. Probably most of them are still on approval/review process.
- Cluster 3 (Green) with 3472 Customers: Have a similar behavior with Cluster 1 and 2, but they also have low Cash Advance. This might indicates they are non-potential customers.
- Cluster 4 (Purple) with 734 Customers: Customers in this Cluster are potential, since they spend Purchases in medium amount. A cash back or discount program may attract them to use their credit card in bigger Purchase.
- Cluster 5 (Orange) with Customers: This is our “Big Spenders”. They use their Credit Cards effectively with big Purchases. Point Rewaed/Loyalty Program may keep them on using the Credit Card.
- Cluster 6 (Yellow) with 1019 Customers: This Clusters actually a quite potential Customers. They give a lot cash in advance, but unfortunatley their spending on Purchases is still minimum. We should attract them to apply installment even with the 0% interest at the beginning.