The Project:

Part 1 - Classification Tree
* Split data into Development (70%) and Hold-out (30%) Sample
* Build Classification Tree using CART technique
* Do necessary pruning
* Measure Model Performance on Development Sample
* Test Model Performance on Hold Out Sample
* Ensure the model is not an overfit model

Part 2 - Random Forest
* Split data into Development (70%) and Hold-out (30%) Sample
* Build Model using Random Forest technique
* Measure Model Performance on Development Sample
* Test Model Performance on Hold Out Sample
* Ensure the model is not an overfit model

Let's set the working directory, import the dataset, check for missing values, and take an overall view of the data.

setwd("C:/Users/adminsa/Desktop/Pos Graduacao/Machine Learning/Mybank")

# Read the CSV once and keep the result (an unassigned read.csv() call would only print the data)
personal_loan <- read.table("My Bank Case Study-dataset.csv", sep = ",", header = TRUE)

View(personal_loan)

summary(personal_loan)
    CUST_ID          TARGET            AGE        GENDER       BALANCE           OCCUPATION    AGE_BKT          SCR        HOLDING_PERIOD 
 C1     :    1   Min.   :0.0000   Min.   :21.00   F: 5433   Min.   :      0   PROF    :5417   <25  :1753   Min.   :100.0   Min.   : 1.00  
 C10    :    1   1st Qu.:0.0000   1st Qu.:30.00   M:14376   1st Qu.:  64754   SAL     :5855   >50  :3035   1st Qu.:227.0   1st Qu.: 7.00  
 C100   :    1   Median :0.0000   Median :38.00   O:  191   Median : 231676   SELF-EMP:3568   26-30:3434   Median :364.0   Median :15.00  
 C1000  :    1   Mean   :0.1256   Mean   :38.42             Mean   : 511362   SENP    :5160   31-35:3404   Mean   :440.2   Mean   :14.96  
 C10000 :    1   3rd Qu.:0.0000   3rd Qu.:46.00             3rd Qu.: 653877                   36-40:2814   3rd Qu.:644.0   3rd Qu.:22.00  
 C10001 :    1   Max.   :1.0000   Max.   :55.00             Max.   :8360431                   41-45:3067   Max.   :999.0   Max.   :31.00  
 (Other):19994                                                                                46-50:2493                                  
 ACC_TYPE       ACC_OP_DATE    LEN_OF_RLTN_IN_MNTH NO_OF_L_CR_TXNS NO_OF_L_DR_TXNS  TOT_NO_OF_L_TXNS NO_OF_BR_CSH_WDL_DR_TXNS
 CA: 4241   11/16/2010:   24   Min.   : 29.0       Min.   : 0.00   Min.   : 0.000   Min.   :  0.00   Min.   : 0.000          
 SA:15759   04-03-09  :   23   1st Qu.: 79.0       1st Qu.: 6.00   1st Qu.: 2.000   1st Qu.:  9.00   1st Qu.: 1.000          
            7/25/2010 :   22   Median :125.0       Median :10.00   Median : 5.000   Median : 14.00   Median : 1.000          
            05-06-13  :   21   Mean   :125.2       Mean   :12.35   Mean   : 6.634   Mean   : 18.98   Mean   : 1.883          
            02-07-07  :   20   3rd Qu.:172.0       3rd Qu.:14.00   3rd Qu.: 7.000   3rd Qu.: 21.00   3rd Qu.: 2.000          
            8/24/2010 :   20   Max.   :221.0       Max.   :75.00   Max.   :74.000   Max.   :149.00   Max.   :15.000          
            (Other)   :19870                                                                                                 
 NO_OF_ATM_DR_TXNS NO_OF_NET_DR_TXNS NO_OF_MOB_DR_TXNS NO_OF_CHQ_DR_TXNS   FLG_HAS_CC       AMT_ATM_DR     AMT_BR_CSH_WDL_DR   AMT_CHQ_DR     
 Min.   : 0.000    Min.   : 0.000    Min.   : 0.0000   Min.   : 0.000    Min.   :0.0000   Min.   :     0   Min.   :     0    Min.   :      0  
 1st Qu.: 0.000    1st Qu.: 0.000    1st Qu.: 0.0000   1st Qu.: 0.000    1st Qu.:0.0000   1st Qu.:     0   1st Qu.:  2990    1st Qu.:      0  
 Median : 1.000    Median : 0.000    Median : 0.0000   Median : 2.000    Median :0.0000   Median :  6900   Median :340150    Median :  23840  
 Mean   : 1.029    Mean   : 1.172    Mean   : 0.4118   Mean   : 2.138    Mean   :0.3054   Mean   : 10990   Mean   :378475    Mean   : 124520  
 3rd Qu.: 1.000    3rd Qu.: 1.000    3rd Qu.: 0.0000   3rd Qu.: 4.000    3rd Qu.:1.0000   3rd Qu.: 15800   3rd Qu.:674675    3rd Qu.:  72470  
 Max.   :25.000    Max.   :22.000    Max.   :25.0000   Max.   :15.000    Max.   :1.0000   Max.   :199300   Max.   :999930    Max.   :4928640  
                                                                                                                                              
   AMT_NET_DR       AMT_MOB_DR        AMT_L_DR       FLG_HAS_ANY_CHGS AMT_OTH_BK_ATM_USG_CHGS AMT_MIN_BAL_NMC_CHGS NO_OF_IW_CHQ_BNC_TXNS
 Min.   :     0   Min.   :     0   Min.   :      0   Min.   :0.0000   Min.   :  0.000         Min.   :  0.000      Min.   :0.00000      
 1st Qu.:     0   1st Qu.:     0   1st Qu.: 237936   1st Qu.:0.0000   1st Qu.:  0.000         1st Qu.:  0.000      1st Qu.:0.00000      
 Median :     0   Median :     0   Median : 695115   Median :0.0000   Median :  0.000         Median :  0.000      Median :0.00000      
 Mean   :237308   Mean   : 22425   Mean   : 773717   Mean   :0.1106   Mean   :  1.099         Mean   :  1.292      Mean   :0.04275      
 3rd Qu.:473971   3rd Qu.:     0   3rd Qu.:1078927   3rd Qu.:0.0000   3rd Qu.:  0.000         3rd Qu.:  0.000      3rd Qu.:0.00000      
 Max.   :999854   Max.   :199667   Max.   :6514921   Max.   :1.0000   Max.   :250.000         Max.   :170.000      Max.   :2.00000      
                                                                                                                                        
 NO_OF_OW_CHQ_BNC_TXNS AVG_AMT_PER_ATM_TXN AVG_AMT_PER_CSH_WDL_TXN AVG_AMT_PER_CHQ_TXN AVG_AMT_PER_NET_TXN AVG_AMT_PER_MOB_TXN
 Min.   :0.0000        Min.   :    0       Min.   :     0          Min.   :     0      Min.   :     0      Min.   :     0     
 1st Qu.:0.0000        1st Qu.:    0       1st Qu.:  1266          1st Qu.:     0      1st Qu.:     0      1st Qu.:     0     
 Median :0.0000        Median : 6000       Median :147095          Median :  8645      Median :     0      Median :     0     
 Mean   :0.0444        Mean   : 7409       Mean   :242237          Mean   : 25093      Mean   :179059      Mean   : 20304     
 3rd Qu.:0.0000        3rd Qu.:13500       3rd Qu.:385000          3rd Qu.: 28605      3rd Qu.:257699      3rd Qu.:     0     
 Max.   :2.0000        Max.   :25000       Max.   :999640          Max.   :537842      Max.   :999854      Max.   :199667     
                                                                                                                              
 FLG_HAS_NOMINEE  FLG_HAS_OLD_LOAN     random         
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000114  
 1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:0.2481866  
 Median :1.0000   Median :0.0000   Median :0.5061214  
 Mean   :0.9012   Mean   :0.4929   Mean   :0.5019330  
 3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.7535712  
 Max.   :1.0000   Max.   :1.0000   Max.   :0.9999471  
                                                      
# Apparently there are no missing values: summary() reports no NA counts for any column
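
Reading the absence of missing values off a long summary is easy to get wrong, so an explicit count per column can serve as a cross-check. The following is a minimal sketch in base R on a small toy data frame (the data frame `toy` and its values are hypothetical, not taken from the bank dataset); the same two calls could be applied to `personal_loan`.

```r
# Toy data frame standing in for personal_loan (hypothetical values)
toy <- data.frame(
  AGE     = c(27, 47, NA, 53),
  BALANCE = c(3384, 287489, 18217, 71720),
  GENDER  = c("M", "M", NA, "F")
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# TRUE only when the whole data frame contains no NA at all
no_missing <- sum(is.na(toy)) == 0
print(no_missing)
```

A column-wise count also shows where the gaps are, which a single yes/no answer does not.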

str(personal_loan)
'data.frame':   20000 obs. of  40 variables:
 $ CUST_ID                 : Factor w/ 20000 levels "C1","C10","C100",..: 17699 16532 11027 17984 2363 11747 18115 15556 15216 12494 ...
 $ TARGET                  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ AGE                     : int  27 47 40 53 36 42 30 53 42 30 ...
 $ GENDER                  : Factor w/ 3 levels "F","M","O": 2 2 2 2 2 1 2 1 1 2 ...
 $ BALANCE                 : num  3384 287489 18217 71720 1671623 ...
 $ OCCUPATION              : Factor w/ 4 levels "PROF","SAL","SELF-EMP",..: 3 2 3 2 1 1 1 2 3 1 ...
 $ AGE_BKT                 : Factor w/ 7 levels "<25",">50","26-30",..: 3 7 5 2 5 6 3 2 6 3 ...
 $ SCR                     : int  776 324 603 196 167 493 479 562 105 170 ...
 $ HOLDING_PERIOD          : int  30 28 2 13 24 26 14 25 15 13 ...
 $ ACC_TYPE                : Factor w/ 2 levels "CA","SA": 2 2 2 1 2 2 2 1 2 2 ...
 $ ACC_OP_DATE             : Factor w/ 4869 levels "01-01-00","01-01-01",..: 3270 1806 3575 993 2861 862 4533 3160 257 334 ...
 $ LEN_OF_RLTN_IN_MNTH     : int  146 104 61 107 185 192 177 99 88 111 ...
 $ NO_OF_L_CR_TXNS         : int  7 8 10 36 20 5 6 14 18 14 ...
 $ NO_OF_L_DR_TXNS         : int  3 2 5 14 1 2 6 3 14 8 ...
 $ TOT_NO_OF_L_TXNS        : int  10 10 15 50 21 7 12 17 32 22 ...
 $ NO_OF_BR_CSH_WDL_DR_TXNS: int  0 0 1 4 1 1 0 3 6 3 ...
 $ NO_OF_ATM_DR_TXNS       : int  1 1 1 2 0 1 1 0 2 1 ...
 $ NO_OF_NET_DR_TXNS       : int  2 1 1 3 0 0 1 0 4 0 ...
 $ NO_OF_MOB_DR_TXNS       : int  0 0 0 1 0 0 0 0 1 0 ...
 $ NO_OF_CHQ_DR_TXNS       : int  0 0 2 4 0 0 4 0 1 4 ...
 $ FLG_HAS_CC              : int  0 0 0 0 0 1 0 0 1 0 ...
 $ AMT_ATM_DR              : int  13100 6600 11200 26100 0 18500 6200 0 35400 18000 ...
 $ AMT_BR_CSH_WDL_DR       : int  0 0 561120 673590 808480 379310 0 945160 198430 869880 ...
 $ AMT_CHQ_DR              : int  0 0 49320 60780 0 0 10580 0 51490 32610 ...
 $ AMT_NET_DR              : num  973557 799813 997570 741506 0 ...
 $ AMT_MOB_DR              : int  0 0 0 71388 0 0 0 0 170332 0 ...
 $ AMT_L_DR                : num  986657 806413 1619210 1573364 808480 ...
 $ FLG_HAS_ANY_CHGS        : int  0 1 1 0 0 0 1 0 0 0 ...
 $ AMT_OTH_BK_ATM_USG_CHGS : int  0 0 0 0 0 0 0 0 0 0 ...
 $ AMT_MIN_BAL_NMC_CHGS    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ NO_OF_IW_CHQ_BNC_TXNS   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ NO_OF_OW_CHQ_BNC_TXNS   : int  0 0 1 0 0 0 0 0 0 0 ...
 $ AVG_AMT_PER_ATM_TXN     : num  13100 6600 11200 13050 0 ...
 $ AVG_AMT_PER_CSH_WDL_TXN : num  0 0 561120 168398 808480 ...
 $ AVG_AMT_PER_CHQ_TXN     : num  0 0 24660 15195 0 ...
 $ AVG_AMT_PER_NET_TXN     : num  486779 799813 997570 247169 0 ...
 $ AVG_AMT_PER_MOB_TXN     : num  0 0 0 71388 0 ...
 $ FLG_HAS_NOMINEE         : int  1 1 1 1 1 1 0 1 1 0 ...
 $ FLG_HAS_OLD_LOAN        : int  1 0 1 0 0 1 1 1 1 0 ...
 $ random                  : num  1.14e-05 1.11e-04 1.20e-04 1.37e-04 1.74e-04 ...
class(personal_loan$FLG_HAS_ANY_CHGS)
[1] "integer"

Let's take a closer look at the dataset by plotting it, and identify columns that carry no useful information.

library("VIM")
package 'VIM' was built under R version 3.6.3
Loading required package: colorspace
Loading required package: grid
Loading required package: data.table
package 'data.table' was built under R version 3.6.3
Registered S3 method overwritten by 'data.table':
  method           from
  print.data.table     
data.table 1.12.8 using 2 threads (see ?getDTthreads).  Latest news: r-datatable.com
VIM is ready to use. 
 Since version 4.0.0 the GUI is in its own package VIMGUI.

          Please use the package to use the new (and old) GUI.

Suggestions and bug-reports can be submitted at: https://github.com/alexkowa/VIM/issues

Attaching package: 'VIM'

The following object is masked from 'package:datasets':

    sleep
aggr(personal_loan, prop = FALSE, cex.axis = 0.4, numbers = TRUE)

# There are no missing values

# The customer ID (CUST_ID) and the random-number column (random) carry no predictive information and could be dropped

Treating the variables: the data dictionary is imported as a guide for converting each variable to the appropriate type.

setwd("C:/Users/adminsa/Desktop/Pos Graduacao/Machine Learning/Mybank")

library(readxl)
mybank_dictionary = read_excel("My Bank Case Study-Data dictionary.xlsx")
personal_loan$FLG_HAS_CC <- as.factor(personal_loan$FLG_HAS_CC)
personal_loan$FLG_HAS_ANY_CHGS <- as.factor(personal_loan$FLG_HAS_ANY_CHGS)
personal_loan$FLG_HAS_NOMINEE <- as.factor(personal_loan$FLG_HAS_NOMINEE)
personal_loan$FLG_HAS_OLD_LOAN <- as.factor(personal_loan$FLG_HAS_OLD_LOAN)
personal_loan$ACC_OP_DATE <- as.character(personal_loan$ACC_OP_DATE)

library(lubridate)
package 'lubridate' was built under R version 3.6.3
Attaching package: 'lubridate'

The following objects are masked from 'package:data.table':

    hour, isoweek, mday, minute, month, quarter, second, wday, week, yday, year

The following objects are masked from 'package:base':

    date, intersect, setdiff, union
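# Parse as month-day-year first, then fall back to day-month-year where that fails.
# The results get their own names so they do not shadow lubridate's mdy() and dmy().
date_mdy <- mdy(personal_loan$ACC_OP_DATE)
date_dmy <- dmy(personal_loan$ACC_OP_DATE)
 12050 failed to parse.
date_mdy[is.na(date_mdy)] <- date_dmy[is.na(date_mdy)]
personal_loan$ACC_OP_DATE <- date_mdy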
View(personal_loan)
summary(personal_loan)
    CUST_ID          TARGET            AGE        GENDER       BALANCE           OCCUPATION    AGE_BKT          SCR        HOLDING_PERIOD 
 C1     :    1   Min.   :0.0000   Min.   :21.00   F: 5433   Min.   :      0   PROF    :5417   <25  :1753   Min.   :100.0   Min.   : 1.00  
 C10    :    1   1st Qu.:0.0000   1st Qu.:30.00   M:14376   1st Qu.:  64754   SAL     :5855   >50  :3035   1st Qu.:227.0   1st Qu.: 7.00  
 C100   :    1   Median :0.0000   Median :38.00   O:  191   Median : 231676   SELF-EMP:3568   26-30:3434   Median :364.0   Median :15.00  
 C1000  :    1   Mean   :0.1256   Mean   :38.42             Mean   : 511362   SENP    :5160   31-35:3404   Mean   :440.2   Mean   :14.96  
 C10000 :    1   3rd Qu.:0.0000   3rd Qu.:46.00             3rd Qu.: 653877                   36-40:2814   3rd Qu.:644.0   3rd Qu.:22.00  
 C10001 :    1   Max.   :1.0000   Max.   :55.00             Max.   :8360431                   41-45:3067   Max.   :999.0   Max.   :31.00  
 (Other):19994                                                                                46-50:2493                                  
 ACC_TYPE    ACC_OP_DATE         LEN_OF_RLTN_IN_MNTH NO_OF_L_CR_TXNS NO_OF_L_DR_TXNS  TOT_NO_OF_L_TXNS NO_OF_BR_CSH_WDL_DR_TXNS
 CA: 4241   Min.   :1999-01-02   Min.   : 29.0       Min.   : 0.00   Min.   : 0.000   Min.   :  0.00   Min.   : 0.000          
 SA:15759   1st Qu.:2003-01-26   1st Qu.: 79.0       1st Qu.: 6.00   1st Qu.: 2.000   1st Qu.:  9.00   1st Qu.: 1.000          
            Median :2006-12-23   Median :125.0       Median :10.00   Median : 5.000   Median : 14.00   Median : 1.000          
            Mean   :2006-12-25   Mean   :125.2       Mean   :12.35   Mean   : 6.634   Mean   : 18.98   Mean   : 1.883          
            3rd Qu.:2010-11-16   3rd Qu.:172.0       3rd Qu.:14.00   3rd Qu.: 7.000   3rd Qu.: 21.00   3rd Qu.: 2.000          
            Max.   :2015-01-01   Max.   :221.0       Max.   :75.00   Max.   :74.000   Max.   :149.00   Max.   :15.000          
                                                                                                                               
 NO_OF_ATM_DR_TXNS NO_OF_NET_DR_TXNS NO_OF_MOB_DR_TXNS NO_OF_CHQ_DR_TXNS FLG_HAS_CC   AMT_ATM_DR     AMT_BR_CSH_WDL_DR   AMT_CHQ_DR     
 Min.   : 0.000    Min.   : 0.000    Min.   : 0.0000   Min.   : 0.000    0:13892    Min.   :     0   Min.   :     0    Min.   :      0  
 1st Qu.: 0.000    1st Qu.: 0.000    1st Qu.: 0.0000   1st Qu.: 0.000    1: 6108    1st Qu.:     0   1st Qu.:  2990    1st Qu.:      0  
 Median : 1.000    Median : 0.000    Median : 0.0000   Median : 2.000               Median :  6900   Median :340150    Median :  23840  
 Mean   : 1.029    Mean   : 1.172    Mean   : 0.4118   Mean   : 2.138               Mean   : 10990   Mean   :378475    Mean   : 124520  
 3rd Qu.: 1.000    3rd Qu.: 1.000    3rd Qu.: 0.0000   3rd Qu.: 4.000               3rd Qu.: 15800   3rd Qu.:674675    3rd Qu.:  72470  
 Max.   :25.000    Max.   :22.000    Max.   :25.0000   Max.   :15.000               Max.   :199300   Max.   :999930    Max.   :4928640  
                                                                                                                                        
   AMT_NET_DR       AMT_MOB_DR        AMT_L_DR       FLG_HAS_ANY_CHGS AMT_OTH_BK_ATM_USG_CHGS AMT_MIN_BAL_NMC_CHGS NO_OF_IW_CHQ_BNC_TXNS
 Min.   :     0   Min.   :     0   Min.   :      0   0:17788          Min.   :  0.000         Min.   :  0.000      Min.   :0.00000      
 1st Qu.:     0   1st Qu.:     0   1st Qu.: 237936   1: 2212          1st Qu.:  0.000         1st Qu.:  0.000      1st Qu.:0.00000      
 Median :     0   Median :     0   Median : 695115                    Median :  0.000         Median :  0.000      Median :0.00000      
 Mean   :237308   Mean   : 22425   Mean   : 773717                    Mean   :  1.099         Mean   :  1.292      Mean   :0.04275      
 3rd Qu.:473971   3rd Qu.:     0   3rd Qu.:1078927                    3rd Qu.:  0.000         3rd Qu.:  0.000      3rd Qu.:0.00000      
 Max.   :999854   Max.   :199667   Max.   :6514921                    Max.   :250.000         Max.   :170.000      Max.   :2.00000      
                                                                                                                                        
 NO_OF_OW_CHQ_BNC_TXNS AVG_AMT_PER_ATM_TXN AVG_AMT_PER_CSH_WDL_TXN AVG_AMT_PER_CHQ_TXN AVG_AMT_PER_NET_TXN AVG_AMT_PER_MOB_TXN FLG_HAS_NOMINEE
 Min.   :0.0000        Min.   :    0       Min.   :     0          Min.   :     0      Min.   :     0      Min.   :     0      0: 1977        
 1st Qu.:0.0000        1st Qu.:    0       1st Qu.:  1266          1st Qu.:     0      1st Qu.:     0      1st Qu.:     0      1:18023        
 Median :0.0000        Median : 6000       Median :147095          Median :  8645      Median :     0      Median :     0                     
 Mean   :0.0444        Mean   : 7409       Mean   :242237          Mean   : 25093      Mean   :179059      Mean   : 20304                     
 3rd Qu.:0.0000        3rd Qu.:13500       3rd Qu.:385000          3rd Qu.: 28605      3rd Qu.:257699      3rd Qu.:     0                     
 Max.   :2.0000        Max.   :25000       Max.   :999640          Max.   :537842      Max.   :999854      Max.   :199667                     
                                                                                                                                              
 FLG_HAS_OLD_LOAN     random         
 0:10141          Min.   :0.0000114  
 1: 9859          1st Qu.:0.2481866  
                  Median :0.5061214  
                  Mean   :0.5019330  
                  3rd Qu.:0.7535712  
                  Max.   :0.9999471  
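
ACC_OP_DATE now summarises as a date because the mixed-format strings were parsed with one format first and a second format as a fallback. The same idea can be sketched in base R without lubridate. The sample strings and the list of formats below are assumptions chosen for illustration; ambiguous strings such as "04-03-09" are read month-first here, mirroring the month-first order used in this notebook.

```r
# Hypothetical sample of mixed-format date strings, in the style of ACC_OP_DATE
raw_dates <- c("11/16/2010", "7/25/2010", "25-12-09", "04-03-09")

# Candidate formats, tried in order (assumed, not taken from the data dictionary)
formats <- c("%m/%d/%Y", "%m-%d-%y", "%d-%m-%y")

# Start with all-NA dates and fill in whatever each format manages to parse;
# as.Date() returns NA for strings that do not match the given format
parsed <- as.Date(rep(NA_character_, length(raw_dates)))
for (fmt in formats) {
  still_missing <- is.na(parsed)
  parsed[still_missing] <- as.Date(raw_dates[still_missing], format = fmt)
}
print(parsed)
```

Any value still NA after the loop matched none of the formats and would need to be inspected by hand.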
                                     

Visualisation of all the numeric independent variables through a correlation matrix plot, followed by the response rate for the loan proposal. We found a response rate of 12.56%.

# Drop TARGET and the categorical, date and flag columns, keeping CUST_ID and the numeric variables
num_data <- personal_loan[-c(2,4,6,7,10,11,21,28,38,39)]

names(num_data)
 [1] "CUST_ID"                  "AGE"                      "BALANCE"                  "SCR"                      "HOLDING_PERIOD"          
 [6] "LEN_OF_RLTN_IN_MNTH"      "NO_OF_L_CR_TXNS"          "NO_OF_L_DR_TXNS"          "TOT_NO_OF_L_TXNS"         "NO_OF_BR_CSH_WDL_DR_TXNS"
[11] "NO_OF_ATM_DR_TXNS"        "NO_OF_NET_DR_TXNS"        "NO_OF_MOB_DR_TXNS"        "NO_OF_CHQ_DR_TXNS"        "AMT_ATM_DR"              
[16] "AMT_BR_CSH_WDL_DR"        "AMT_CHQ_DR"               "AMT_NET_DR"               "AMT_MOB_DR"               "AMT_L_DR"                
[21] "AMT_OTH_BK_ATM_USG_CHGS"  "AMT_MIN_BAL_NMC_CHGS"     "NO_OF_IW_CHQ_BNC_TXNS"    "NO_OF_OW_CHQ_BNC_TXNS"    "AVG_AMT_PER_ATM_TXN"     
[26] "AVG_AMT_PER_CSH_WDL_TXN"  "AVG_AMT_PER_CHQ_TXN"      "AVG_AMT_PER_NET_TXN"      "AVG_AMT_PER_MOB_TXN"      "random"                  
library(corrplot)
package 'corrplot' was built under R version 3.6.3
corrplot 0.84 loaded
str(num_data)
'data.frame':   20000 obs. of  30 variables:
 $ CUST_ID                 : Factor w/ 20000 levels "C1","C10","C100",..: 17699 16532 11027 17984 2363 11747 18115 15556 15216 12494 ...
 $ AGE                     : int  27 47 40 53 36 42 30 53 42 30 ...
 $ BALANCE                 : num  3384 287489 18217 71720 1671623 ...
 $ SCR                     : int  776 324 603 196 167 493 479 562 105 170 ...
 $ HOLDING_PERIOD          : int  30 28 2 13 24 26 14 25 15 13 ...
 $ LEN_OF_RLTN_IN_MNTH     : int  146 104 61 107 185 192 177 99 88 111 ...
 $ NO_OF_L_CR_TXNS         : int  7 8 10 36 20 5 6 14 18 14 ...
 $ NO_OF_L_DR_TXNS         : int  3 2 5 14 1 2 6 3 14 8 ...
 $ TOT_NO_OF_L_TXNS        : int  10 10 15 50 21 7 12 17 32 22 ...
 $ NO_OF_BR_CSH_WDL_DR_TXNS: int  0 0 1 4 1 1 0 3 6 3 ...
 $ NO_OF_ATM_DR_TXNS       : int  1 1 1 2 0 1 1 0 2 1 ...
 $ NO_OF_NET_DR_TXNS       : int  2 1 1 3 0 0 1 0 4 0 ...
 $ NO_OF_MOB_DR_TXNS       : int  0 0 0 1 0 0 0 0 1 0 ...
 $ NO_OF_CHQ_DR_TXNS       : int  0 0 2 4 0 0 4 0 1 4 ...
 $ AMT_ATM_DR              : int  13100 6600 11200 26100 0 18500 6200 0 35400 18000 ...
 $ AMT_BR_CSH_WDL_DR       : int  0 0 561120 673590 808480 379310 0 945160 198430 869880 ...
 $ AMT_CHQ_DR              : int  0 0 49320 60780 0 0 10580 0 51490 32610 ...
 $ AMT_NET_DR              : num  973557 799813 997570 741506 0 ...
 $ AMT_MOB_DR              : int  0 0 0 71388 0 0 0 0 170332 0 ...
 $ AMT_L_DR                : num  986657 806413 1619210 1573364 808480 ...
 $ AMT_OTH_BK_ATM_USG_CHGS : int  0 0 0 0 0 0 0 0 0 0 ...
 $ AMT_MIN_BAL_NMC_CHGS    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ NO_OF_IW_CHQ_BNC_TXNS   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ NO_OF_OW_CHQ_BNC_TXNS   : int  0 0 1 0 0 0 0 0 0 0 ...
 $ AVG_AMT_PER_ATM_TXN     : num  13100 6600 11200 13050 0 ...
 $ AVG_AMT_PER_CSH_WDL_TXN : num  0 0 561120 168398 808480 ...
 $ AVG_AMT_PER_CHQ_TXN     : num  0 0 24660 15195 0 ...
 $ AVG_AMT_PER_NET_TXN     : num  486779 799813 997570 247169 0 ...
 $ AVG_AMT_PER_MOB_TXN     : num  0 0 0 71388 0 ...
 $ random                  : num  1.14e-05 1.11e-04 1.20e-04 1.37e-04 1.74e-04 ...
# Correlation matrix of the numeric variables (column 1, CUST_ID, is excluded)
plt <- cor(num_data[, -1])
correlation_plot <- corrplot(plt, method = "circle", tl.cex = 0.5)


response_rate <- (sum(personal_loan$TARGET)/nrow(personal_loan))* 100
response_rate
[1] 12.56

Splitting the data into a training set (70%) and a test set (30%), which are used in the Part 2 Random Forest solution.

library(caret)
package 'caret' was built under R version 3.6.3
Loading required package: lattice
Loading required package: ggplot2
Registered S3 method overwritten by 'dplyr':
  method           from
  print.rowwise_df     
Need help getting started? Try the cookbook for R: http://www.cookbook-r.com/Graphs/
set.seed(123)
index <- createDataPartition(personal_loan$TARGET, p=0.70, list=FALSE)
train <- personal_loan [ index,]
test  <- personal_loan [-index,]
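
With a response rate of about 12.56%, it matters that both parts of the split keep a similar share of responders. The following base R sketch shows one way to draw a 70/30 split separately within each class. It is an illustration on hypothetical toy data (`toy`, with a 12% response rate), not a description of what `createDataPartition` does internally.

```r
set.seed(123)

# Toy 0/1 target with a 12% response rate (hypothetical data)
toy <- data.frame(id = 1:1000, TARGET = rep(c(0, 1), times = c(880, 120)))

# Sample 70% of the row indices within each class, so both parts keep the class mix
rows_by_class <- split(seq_len(nrow(toy)), toy$TARGET)
train_idx <- unlist(lapply(rows_by_class, function(rows) {
  rows[sample.int(length(rows), size = round(0.70 * length(rows)))]
}), use.names = FALSE)

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Response rate in each part
print(c(train = mean(toy_train$TARGET), test = mean(toy_test$TARGET)))
```

Comparing the two response rates after any split is a quick sanity check before fitting a model.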

Part 2 - Random Forest

From the output below, the Out-Of-Bag (OOB) error rate is estimated at 12.36%, which is the model's OOB misclassification rate. Almost all of that error comes from class 1: the class-1 error is about 99%, so the overall figure is close to the 12.56% response rate. Around trees 16 to 20 there is no further significant reduction in the OOB error rate (rows of Random_Forest$err.rate; the columns are OOB, class 0 and class 1):

[16,] 0.1226766 3.267173e-03 0.9604585
[17,] 0.1221412 2.612885e-03 0.9610315
[18,] 0.1224009 2.285714e-03 0.9656160
[19,] 0.1231781 2.204082e-03 0.9719359
[20,] 0.1227406 1.632520e-03 0.9725086
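
Judging where the error rate stops improving can also be done numerically instead of by eye. The sketch below uses the OOB column of the first 20 rows of Random_Forest$err.rate as printed in this section. The tolerance of 0.001 is an assumed threshold for "no significant reduction", not a value from the original analysis.

```r
# OOB error after 1..20 trees, copied from the first 20 rows of Random_Forest$err.rate
oob <- c(0.1303419, 0.1313422, 0.1270649, 0.1266016, 0.1273536,
         0.1255144, 0.1248513, 0.1248717, 0.1233125, 0.1227918,
         0.1225223, 0.1221243, 0.1219477, 0.1226753, 0.1222651,
         0.1226766, 0.1221412, 0.1224009, 0.1231781, 0.1227406)

# Number of trees with the lowest OOB error in this window
best_n <- which.min(oob)

# First tree count whose error is within the tolerance of that minimum
tolerance <- 0.001  # assumed threshold
first_close <- which(oob <= min(oob) + tolerance)[1]

print(c(best_n = best_n, first_close = first_close))
```

On a fitted model the same two lines could be applied to `Random_Forest$err.rate[, "OOB"]` over all trees.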

library(randomForest)
package 'randomForest' was built under R version 3.6.3
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.

Attaching package: 'randomForest'

The following object is masked from 'package:ggplot2':

    margin
Random_Forest <- randomForest(as.factor(TARGET) ~ ., data = train[,-1], 
                   ntree=501, mtry = 7, nodesize = 140,
                   importance=TRUE)

print(Random_Forest)

Call:
 randomForest(formula = as.factor(TARGET) ~ ., data = train[,      -1], ntree = 501, mtry = 7, nodesize = 140, importance = TRUE) 
               Type of random forest: classification
                     Number of trees: 501
No. of variables tried at each split: 7

        OOB estimate of  error rate: 12.36%
Confusion matrix:
      0  1 class.error
0 12254  0   0.0000000
1  1731 15   0.9914089
plot(Random_Forest,main = "")
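
The figures in the printed confusion matrix can be reproduced by hand, which makes it clearer what the OOB error does and does not say. The sketch below re-enters the four counts from the print(Random_Forest) output (rows are actual classes, columns are predicted classes) and derives the overall error, the per-class errors and the share of responders that were caught. The variable names are chosen here for illustration.

```r
# OOB confusion matrix as printed above: rows = actual class, columns = predicted class
conf <- matrix(c(12254,  0,
                  1731, 15),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Overall OOB misclassification rate: everything off the diagonal
oob_error <- 1 - sum(diag(conf)) / sum(conf)

# Error within each actual class (the class.error column)
class_error <- 1 - diag(conf) / rowSums(conf)

# Share of actual responders (class 1) that the model predicts as responders
sensitivity <- conf["1", "1"] / sum(conf["1", ])

print(round(c(oob_error = oob_error, class_error, sensitivity = sensitivity), 4))
```

An overall error near 12% alongside a class-1 error near 99% means the forest is predicting class 0 for almost every customer.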


Random_Forest$err.rate
             OOB            0         1
  [1,] 0.1303419 2.213859e-02 0.9049128
  [2,] 0.1313422 2.162902e-02 0.9079457
  [3,] 0.1270649 1.883487e-02 0.8992187
  [4,] 0.1266016 1.637280e-02 0.9043062
  [5,] 0.1273536 1.423907e-02 0.9263293
  [6,] 0.1255144 1.226940e-02 0.9239264
  [7,] 0.1248513 9.512485e-03 0.9360812
  [8,] 0.1248717 8.791025e-03 0.9433294
  [9,] 0.1233125 8.615690e-03 0.9343878
 [10,] 0.1227918 6.751194e-03 0.9408009
 [11,] 0.1225223 5.004102e-03 0.9486736
 [12,] 0.1221243 4.176220e-03 0.9494543
 [13,] 0.1219477 3.763397e-03 0.9512055
 [14,] 0.1226753 3.841438e-03 0.9558739
 [15,] 0.1222651 2.287395e-03 0.9638968
 [16,] 0.1226766 3.267173e-03 0.9604585
 [17,] 0.1221412 2.612885e-03 0.9610315
 [18,] 0.1224009 2.285714e-03 0.9656160
 [19,] 0.1231781 2.204082e-03 0.9719359
 [20,] 0.1227406 1.632520e-03 0.9725086
 [21,] 0.1228033 1.714006e-03 0.9725086
 [22,] 0.1231605 1.877245e-03 0.9742268
 [23,] 0.1222317 1.061051e-03 0.9725086
 [24,] 0.1225088 1.142577e-03 0.9742268
 [25,] 0.1222230 9.793520e-04 0.9730813
 [26,] 0.1221516 5.712887e-04 0.9753723
 [27,] 0.1222944 6.529013e-04 0.9759450
 [28,] 0.1221516 7.345140e-04 0.9742268
 [29,] 0.1224373 4.896760e-04 0.9782360
 [30,] 0.1222857 4.080300e-04 0.9776632
 [31,] 0.1225714 4.896360e-04 0.9793814
 [32,] 0.1226429 4.080300e-04 0.9805269
 [33,] 0.1221429 4.080300e-04 0.9765178
 [34,] 0.1224286 4.896360e-04 0.9782360
 [35,] 0.1222857 6.528480e-04 0.9759450
 [36,] 0.1222857 4.896360e-04 0.9770905
 [37,] 0.1225714 4.896360e-04 0.9793814
 [38,] 0.1224286 5.712420e-04 0.9776632
 [39,] 0.1222857 4.896360e-04 0.9770905
 [40,] 0.1225714 4.896360e-04 0.9793814
 [41,] 0.1225714 4.080300e-04 0.9799542
 [42,] 0.1223571 4.896360e-04 0.9776632
 [43,] 0.1222857 4.896360e-04 0.9770905
 [44,] 0.1223571 4.896360e-04 0.9776632
 [45,] 0.1225000 4.896360e-04 0.9788087
 [46,] 0.1225714 4.896360e-04 0.9793814
 [47,] 0.1226429 4.896360e-04 0.9799542
 [48,] 0.1223571 4.080300e-04 0.9782360
 [49,] 0.1220714 4.080300e-04 0.9759450
 [50,] 0.1221429 4.080300e-04 0.9765178
 [51,] 0.1223571 4.080300e-04 0.9782360
 [52,] 0.1221429 3.264240e-04 0.9770905
 [53,] 0.1223571 3.264240e-04 0.9788087
 [54,] 0.1223571 3.264240e-04 0.9788087
 [55,] 0.1223571 2.448180e-04 0.9793814
 [56,] 0.1223571 3.264240e-04 0.9788087
 [57,] 0.1224286 2.448180e-04 0.9799542
 [58,] 0.1225000 3.264240e-04 0.9799542
 [59,] 0.1225714 4.080300e-04 0.9799542
 [60,] 0.1225000 4.080300e-04 0.9793814
 [61,] 0.1225000 2.448180e-04 0.9805269
 [62,] 0.1226429 3.264240e-04 0.9810997
 [63,] 0.1227857 3.264240e-04 0.9822451
 [64,] 0.1226429 3.264240e-04 0.9810997
 [65,] 0.1224286 2.448180e-04 0.9799542
 [66,] 0.1223571 2.448180e-04 0.9793814
 [67,] 0.1226429 2.448180e-04 0.9816724
 [68,] 0.1227143 2.448180e-04 0.9822451
 [69,] 0.1227857 2.448180e-04 0.9828179
 [70,] 0.1227857 2.448180e-04 0.9828179
 [71,] 0.1230714 3.264240e-04 0.9845361
 [72,] 0.1230714 1.632120e-04 0.9856816
 [73,] 0.1230000 2.448180e-04 0.9845361
 [74,] 0.1229286 2.448180e-04 0.9839633
 [75,] 0.1227143 2.448180e-04 0.9822451
 [76,] 0.1228571 2.448180e-04 0.9833906
 [77,] 0.1226429 2.448180e-04 0.9816724
 [78,] 0.1226429 3.264240e-04 0.9810997
 [79,] 0.1227857 3.264240e-04 0.9822451
 [80,] 0.1227857 3.264240e-04 0.9822451
 [81,] 0.1227857 3.264240e-04 0.9822451
 [82,] 0.1226429 2.448180e-04 0.9816724
 [83,] 0.1228571 2.448180e-04 0.9833906
 [84,] 0.1230714 2.448180e-04 0.9851088
 [85,] 0.1230714 2.448180e-04 0.9851088
 [86,] 0.1231429 1.632120e-04 0.9862543
 [87,] 0.1230000 2.448180e-04 0.9845361
 [88,] 0.1230714 2.448180e-04 0.9851088
 [89,] 0.1229286 2.448180e-04 0.9839633
 [90,] 0.1229286 2.448180e-04 0.9839633
 [91,] 0.1230000 1.632120e-04 0.9851088
 [92,] 0.1230000 1.632120e-04 0.9851088
 [93,] 0.1230714 1.632120e-04 0.9856816
 [94,] 0.1229286 1.632120e-04 0.9845361
 [95,] 0.1230714 2.448180e-04 0.9851088
 [96,] 0.1231429 1.632120e-04 0.9862543
 [97,] 0.1232857 2.448180e-04 0.9868270
 [98,] 0.1232143 1.632120e-04 0.9868270
 [99,] 0.1232143 2.448180e-04 0.9862543
[100,] 0.1232143 1.632120e-04 0.9868270
[101,] 0.1231429 2.448180e-04 0.9856816
[102,] 0.1230714 1.632120e-04 0.9856816
[103,] 0.1232143 1.632120e-04 0.9868270
[104,] 0.1233571 1.632120e-04 0.9879725
[105,] 0.1233571 1.632120e-04 0.9879725
[106,] 0.1233571 8.160601e-05 0.9885452
[107,] 0.1231429 8.160601e-05 0.9868270
[108,] 0.1232143 1.632120e-04 0.9868270
[109,] 0.1232143 8.160601e-05 0.9873998
[110,] 0.1232143 1.632120e-04 0.9868270
[111,] 0.1231429 8.160601e-05 0.9868270
[112,] 0.1230714 8.160601e-05 0.9862543
[113,] 0.1230000 8.160601e-05 0.9856816
[114,] 0.1231429 8.160601e-05 0.9868270
[115,] 0.1232143 8.160601e-05 0.9873998
[116,] 0.1232857 8.160601e-05 0.9879725
[117,] 0.1233571 8.160601e-05 0.9885452
[118,] 0.1232857 8.160601e-05 0.9879725
[119,] 0.1233571 8.160601e-05 0.9885452
[120,] 0.1232857 8.160601e-05 0.9879725
[121,] 0.1232857 8.160601e-05 0.9879725
[122,] 0.1232857 8.160601e-05 0.9879725
[123,] 0.1233571 8.160601e-05 0.9885452
[124,] 0.1232857 8.160601e-05 0.9879725
[125,] 0.1232143 8.160601e-05 0.9873998
[126,] 0.1232143 8.160601e-05 0.9873998
[127,] 0.1233571 8.160601e-05 0.9885452
[128,] 0.1232857 0.000000e+00 0.9885452
[129,] 0.1234286 0.000000e+00 0.9896907
[130,] 0.1233571 0.000000e+00 0.9891180
[131,] 0.1234286 0.000000e+00 0.9896907
[132,] 0.1235000 0.000000e+00 0.9902635
[133,] 0.1233571 0.000000e+00 0.9891180
[134,] 0.1233571 0.000000e+00 0.9891180
[135,] 0.1233571 0.000000e+00 0.9891180
[136,] 0.1232857 0.000000e+00 0.9885452
[137,] 0.1233571 0.000000e+00 0.9891180
[138,] 0.1234286 0.000000e+00 0.9896907
[139,] 0.1232143 8.160601e-05 0.9873998
[140,] 0.1233571 1.632120e-04 0.9879725
[141,] 0.1232143 0.000000e+00 0.9879725
[142,] 0.1231429 0.000000e+00 0.9873998
[143,] 0.1230714 0.000000e+00 0.9868270
[144,] 0.1232857 8.160601e-05 0.9879725
[145,] 0.1232857 0.000000e+00 0.9885452
[146,] 0.1232143 0.000000e+00 0.9879725
[147,] 0.1232143 0.000000e+00 0.9879725
[148,] 0.1231429 0.000000e+00 0.9873998
[149,] 0.1231429 0.000000e+00 0.9873998
[150,] 0.1231429 0.000000e+00 0.9873998
[151,] 0.1232857 0.000000e+00 0.9885452
[152,] 0.1232857 0.000000e+00 0.9885452
[153,] 0.1233571 0.000000e+00 0.9891180
[154,] 0.1235000 0.000000e+00 0.9902635
[155,] 0.1234286 0.000000e+00 0.9896907
[156,] 0.1232857 0.000000e+00 0.9885452
[157,] 0.1232857 0.000000e+00 0.9885452
[158,] 0.1232857 0.000000e+00 0.9885452
[159,] 0.1232143 0.000000e+00 0.9879725
[160,] 0.1232143 0.000000e+00 0.9879725
[161,] 0.1232143 0.000000e+00 0.9879725
[162,] 0.1232857 0.000000e+00 0.9885452
[163,] 0.1232857 0.000000e+00 0.9885452
[164,] 0.1232857 0.000000e+00 0.9885452
[165,] 0.1232857 0.000000e+00 0.9885452
[166,] 0.1232143 0.000000e+00 0.9879725
[167,] 0.1232143 0.000000e+00 0.9879725
[168,] 0.1232857 0.000000e+00 0.9885452
[169,] 0.1232857 0.000000e+00 0.9885452
[170,] 0.1234286 0.000000e+00 0.9896907
[171,] 0.1234286 0.000000e+00 0.9896907
[172,] 0.1234286 0.000000e+00 0.9896907
[173,] 0.1234286 0.000000e+00 0.9896907
[174,] 0.1233571 0.000000e+00 0.9891180
[175,] 0.1233571 0.000000e+00 0.9891180
[176,] 0.1232857 0.000000e+00 0.9885452
[177,] 0.1232857 0.000000e+00 0.9885452
[178,] 0.1233571 0.000000e+00 0.9891180
[179,] 0.1234286 0.000000e+00 0.9896907
[180,] 0.1235000 0.000000e+00 0.9902635
[181,] 0.1234286 0.000000e+00 0.9896907
[182,] 0.1235000 0.000000e+00 0.9902635
[183,] 0.1234286 0.000000e+00 0.9896907
[184,] 0.1234286 0.000000e+00 0.9896907
[185,] 0.1234286 0.000000e+00 0.9896907
[186,] 0.1234286 0.000000e+00 0.9896907
[187,] 0.1233571 0.000000e+00 0.9891180
[188,] 0.1234286 0.000000e+00 0.9896907
[189,] 0.1235000 0.000000e+00 0.9902635
[190,] 0.1235714 0.000000e+00 0.9908362
[191,] 0.1235714 0.000000e+00 0.9908362
[192,] 0.1235714 0.000000e+00 0.9908362
[193,] 0.1234286 0.000000e+00 0.9896907
[194,] 0.1236429 0.000000e+00 0.9914089
[195,] 0.1235714 0.000000e+00 0.9908362
[196,] 0.1235714 0.000000e+00 0.9908362
[197,] 0.1235714 0.000000e+00 0.9908362
[198,] 0.1235714 0.000000e+00 0.9908362
[199,] 0.1235714 0.000000e+00 0.9908362
[200,] 0.1235714 0.000000e+00 0.9908362
[201,] 0.1235714 0.000000e+00 0.9908362
[202,] 0.1235714 0.000000e+00 0.9908362
[203,] 0.1235714 0.000000e+00 0.9908362
[204,] 0.1235714 0.000000e+00 0.9908362
[205,] 0.1235714 0.000000e+00 0.9908362
[206,] 0.1235714 0.000000e+00 0.9908362
[207,] 0.1235714 0.000000e+00 0.9908362
[208,] 0.1235000 0.000000e+00 0.9902635
[209,] 0.1235000 0.000000e+00 0.9902635
[210,] 0.1235000 0.000000e+00 0.9902635
[211,] 0.1234286 0.000000e+00 0.9896907
[212,] 0.1234286 0.000000e+00 0.9896907
[213,] 0.1235000 0.000000e+00 0.9902635
[214,] 0.1235000 0.000000e+00 0.9902635
[215,] 0.1235000 0.000000e+00 0.9902635
[216,] 0.1235000 0.000000e+00 0.9902635
[217,] 0.1234286 0.000000e+00 0.9896907
[218,] 0.1235000 0.000000e+00 0.9902635
[219,] 0.1235714 0.000000e+00 0.9908362
[220,] 0.1235000 0.000000e+00 0.9902635
[221,] 0.1235000 0.000000e+00 0.9902635
[222,] 0.1235714 0.000000e+00 0.9908362
[223,] 0.1235714 0.000000e+00 0.9908362
[224,] 0.1235714 0.000000e+00 0.9908362
[225,] 0.1235714 0.000000e+00 0.9908362
[226,] 0.1235000 0.000000e+00 0.9902635
[227,] 0.1235000 0.000000e+00 0.9902635
[228,] 0.1235714 0.000000e+00 0.9908362
[229,] 0.1235714 0.000000e+00 0.9908362
[230,] 0.1235714 0.000000e+00 0.9908362
[231,] 0.1235714 0.000000e+00 0.9908362
[232,] 0.1235714 0.000000e+00 0.9908362
[233,] 0.1234286 0.000000e+00 0.9896907
[234,] 0.1234286 0.000000e+00 0.9896907
[235,] 0.1235000 0.000000e+00 0.9902635
[236,] 0.1235000 0.000000e+00 0.9902635
[237,] 0.1235000 0.000000e+00 0.9902635
[238,] 0.1235000 0.000000e+00 0.9902635
[239,] 0.1235714 0.000000e+00 0.9908362
[240,] 0.1234286 0.000000e+00 0.9896907
[241,] 0.1235000 0.000000e+00 0.9902635
[242,] 0.1235714 0.000000e+00 0.9908362
[243,] 0.1235714 0.000000e+00 0.9908362
[244,] 0.1235714 0.000000e+00 0.9908362
[245,] 0.1235714 0.000000e+00 0.9908362
[246,] 0.1235714 0.000000e+00 0.9908362
[247,] 0.1235714 0.000000e+00 0.9908362
[248,] 0.1235000 0.000000e+00 0.9902635
[249,] 0.1235714 0.000000e+00 0.9908362
[250,] 0.1235714 0.000000e+00 0.9908362
[251,] 0.1235714 0.000000e+00 0.9908362
[252,] 0.1235714 0.000000e+00 0.9908362
[253,] 0.1235000 0.000000e+00 0.9902635
[254,] 0.1235714 0.000000e+00 0.9908362
[255,] 0.1235000 0.000000e+00 0.9902635
[256,] 0.1235000 0.000000e+00 0.9902635
[257,] 0.1235714 0.000000e+00 0.9908362
[258,] 0.1235000 0.000000e+00 0.9902635
[259,] 0.1235000 0.000000e+00 0.9902635
[260,] 0.1235000 0.000000e+00 0.9902635
[261,] 0.1235000 0.000000e+00 0.9902635
[262,] 0.1235000 0.000000e+00 0.9902635
[263,] 0.1235000 0.000000e+00 0.9902635
[264,] 0.1235000 0.000000e+00 0.9902635
[265,] 0.1235000 0.000000e+00 0.9902635
[266,] 0.1235000 0.000000e+00 0.9902635
[267,] 0.1235000 0.000000e+00 0.9902635
[268,] 0.1235000 0.000000e+00 0.9902635
[269,] 0.1235000 0.000000e+00 0.9902635
[270,] 0.1235000 0.000000e+00 0.9902635
[271,] 0.1235000 0.000000e+00 0.9902635
[272,] 0.1235000 0.000000e+00 0.9902635
[273,] 0.1235000 0.000000e+00 0.9902635
[274,] 0.1235000 0.000000e+00 0.9902635
[275,] 0.1235000 0.000000e+00 0.9902635
[276,] 0.1235000 0.000000e+00 0.9902635
[277,] 0.1235000 0.000000e+00 0.9902635
[278,] 0.1235000 0.000000e+00 0.9902635
[279,] 0.1235000 0.000000e+00 0.9902635
[280,] 0.1235714 0.000000e+00 0.9908362
[281,] 0.1235714 0.000000e+00 0.9908362
[282,] 0.1234286 0.000000e+00 0.9896907
[283,] 0.1235000 0.000000e+00 0.9902635
[284,] 0.1235714 0.000000e+00 0.9908362
[285,] 0.1235714 0.000000e+00 0.9908362
[286,] 0.1235714 0.000000e+00 0.9908362
[287,] 0.1235000 0.000000e+00 0.9902635
[288,] 0.1235714 0.000000e+00 0.9908362
[289,] 0.1235714 0.000000e+00 0.9908362
[290,] 0.1235714 0.000000e+00 0.9908362
[291,] 0.1235714 0.000000e+00 0.9908362
[292,] 0.1235714 0.000000e+00 0.9908362
[293,] 0.1235714 0.000000e+00 0.9908362
[294,] 0.1236429 0.000000e+00 0.9914089
[295,] 0.1236429 0.000000e+00 0.9914089
[296,] 0.1236429 0.000000e+00 0.9914089
[297,] 0.1236429 0.000000e+00 0.9914089
[298,] 0.1235714 0.000000e+00 0.9908362
[299,] 0.1235714 0.000000e+00 0.9908362
[300,] 0.1236429 0.000000e+00 0.9914089
[301,] 0.1236429 0.000000e+00 0.9914089
[302,] 0.1236429 0.000000e+00 0.9914089
[303,] 0.1236429 0.000000e+00 0.9914089
[304,] 0.1236429 0.000000e+00 0.9914089
[305,] 0.1236429 0.000000e+00 0.9914089
[306,] 0.1235714 0.000000e+00 0.9908362
[307,] 0.1236429 0.000000e+00 0.9914089
[308,] 0.1236429 0.000000e+00 0.9914089
[309,] 0.1236429 0.000000e+00 0.9914089
[310,] 0.1235714 0.000000e+00 0.9908362
[311,] 0.1235714 0.000000e+00 0.9908362
[312,] 0.1235714 0.000000e+00 0.9908362
[313,] 0.1235714 0.000000e+00 0.9908362
[314,] 0.1235714 0.000000e+00 0.9908362
[315,] 0.1235000 0.000000e+00 0.9902635
[316,] 0.1235000 0.000000e+00 0.9902635
[317,] 0.1235000 0.000000e+00 0.9902635
[318,] 0.1234286 0.000000e+00 0.9896907
[319,] 0.1235714 0.000000e+00 0.9908362
[320,] 0.1235000 0.000000e+00 0.9902635
[321,] 0.1235714 0.000000e+00 0.9908362
[322,] 0.1236429 0.000000e+00 0.9914089
[323,] 0.1236429 0.000000e+00 0.9914089
[324,] 0.1236429 0.000000e+00 0.9914089
[325,] 0.1236429 0.000000e+00 0.9914089
[326,] 0.1237143 0.000000e+00 0.9919817
[327,] 0.1237143 0.000000e+00 0.9919817
[328,] 0.1237143 0.000000e+00 0.9919817
[329,] 0.1237143 0.000000e+00 0.9919817
[330,] 0.1237143 0.000000e+00 0.9919817
[331,] 0.1237143 0.000000e+00 0.9919817
[332,] 0.1237143 0.000000e+00 0.9919817
[333,] 0.1237143 0.000000e+00 0.9919817
 [ reached getOption("max.print") -- omitted 168 rows ]

The OOB error reaches its lowest value at about 16 trees. Next we tune the random forest with tuneRF, then look at the variable importance and plot the mean decrease in accuracy and in Gini.

library(randomForest)
tRandom_Forest <- tuneRF(x = train[,-c(1,2)], 
              y=as.factor(train$TARGET),
              mtryStart = 7, 
              ntreeTry=13, 
              stepFactor = 1.5, 
              improve = 0.001, 
              trace=TRUE, 
              plot = TRUE,
              doBest = TRUE,
              nodesize = 140, 
              importance=TRUE
)
mtry = 7  OOB error = 12.21% 
Searching left ...
mtry = 5    OOB error = 12.37% 
-0.01245447 0.001 
Searching right ...
mtry = 10   OOB error = 12.25% 
-0.002571925 0.001 

print(tRandom_Forest)

Call:
 randomForest(x = x, y = y, mtry = res[which.min(res[, 2]), 1],      nodesize = 140, importance = TRUE) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 7

        OOB estimate of  error rate: 12.39%
Confusion matrix:
      0  1 class.error
0 12254  0   0.0000000
1  1735 11   0.9936999
tRandom_Forest$importance
                                     0             1 MeanDecreaseAccuracy MeanDecreaseGini
AGE                       5.932247e-04  3.238225e-03         9.223429e-04       17.5608787
GENDER                    5.076717e-04  1.480950e-03         6.288553e-04        7.8265449
BALANCE                   1.187693e-03  1.674465e-02         3.126277e-03       50.4338403
OCCUPATION                1.838211e-03  1.386206e-02         3.336996e-03       34.7108695
AGE_BKT                   8.259261e-04  6.532289e-03         1.534123e-03       25.2933806
SCR                       1.255491e-03  1.351321e-02         2.781627e-03       50.6646445
HOLDING_PERIOD            9.190341e-03  1.494535e-02         9.907399e-03       49.3860194
ACC_TYPE                  1.365082e-03 -1.066990e-03         1.064522e-03        2.3964117
ACC_OP_DATE               6.561090e-03  1.189101e-03         5.886271e-03       30.5775669
LEN_OF_RLTN_IN_MNTH       6.273823e-03 -1.036106e-03         5.357710e-03       21.5287976
NO_OF_L_CR_TXNS           2.096151e-02 -8.529834e-03         1.729329e-02       45.6320700
NO_OF_L_DR_TXNS           3.791543e-02 -1.248248e-02         3.164258e-02       26.9084114
TOT_NO_OF_L_TXNS          2.686632e-02 -1.570285e-02         2.156086e-02       46.5631157
NO_OF_BR_CSH_WDL_DR_TXNS  1.377455e-03  1.798422e-03         1.431976e-03        9.4886590
NO_OF_ATM_DR_TXNS         3.164534e-02 -1.461622e-02         2.589405e-02       13.7124619
NO_OF_NET_DR_TXNS         4.762323e-03 -1.544224e-03         3.969718e-03        6.2904795
NO_OF_MOB_DR_TXNS         2.167643e-03 -1.400384e-03         1.717745e-03        2.5774314
NO_OF_CHQ_DR_TXNS         6.725825e-03  1.396583e-03         6.059643e-03       12.6928994
FLG_HAS_CC                1.111514e-03  9.385764e-03         2.141909e-03       18.5910504
AMT_ATM_DR                1.560734e-02 -4.870681e-03         1.304836e-02       24.5654222
AMT_BR_CSH_WDL_DR         5.175306e-03  1.161031e-03         4.675804e-03       23.2062579
AMT_CHQ_DR                1.016798e-02 -2.699261e-03         8.561548e-03       20.2982371
AMT_NET_DR                5.580747e-03 -1.469755e-04         4.864232e-03       16.0770231
AMT_MOB_DR                2.662413e-03  1.095472e-03         2.463750e-03       12.7798973
AMT_L_DR                  1.145189e-02 -3.329270e-03         9.610738e-03       31.6845867
FLG_HAS_ANY_CHGS          8.915067e-05  4.378423e-04         1.324044e-04        3.7201612
AMT_OTH_BK_ATM_USG_CHGS  -1.512057e-05  1.070626e-04         9.683170e-10        0.3906525
AMT_MIN_BAL_NMC_CHGS      1.965656e-05  4.391212e-04         7.223683e-05        1.4938466
NO_OF_IW_CHQ_BNC_TXNS     4.750908e-05  2.046811e-04         6.689298e-05        1.8395124
NO_OF_OW_CHQ_BNC_TXNS     3.420120e-05  2.333444e-04         5.900232e-05        2.0296702
AVG_AMT_PER_ATM_TXN       1.308650e-02 -4.772692e-03         1.086068e-02       24.6949837
AVG_AMT_PER_CSH_WDL_TXN   4.300335e-03  7.484263e-04         3.861247e-03       20.5845711
AVG_AMT_PER_CHQ_TXN       1.011593e-02 -3.807532e-03         8.376979e-03       20.2305099
AVG_AMT_PER_NET_TXN       4.613202e-03 -1.970163e-04         4.009455e-03       16.5712449
AVG_AMT_PER_MOB_TXN       3.257375e-03  7.712126e-04         2.951839e-03       13.4358719
FLG_HAS_NOMINEE           1.238161e-05  3.042300e-05         1.471734e-05        1.0512241
FLG_HAS_OLD_LOAN          6.629774e-05  3.634775e-04         1.030548e-04        1.7875054
random                   -5.384687e-05 -1.090484e-05        -4.836985e-05       11.0100596
varImpPlot(tRandom_Forest,
           sort = T,
           main="Variable Importance",
           n.var=37)

Scoring: create columns for the predicted class and the predicted score (class probabilities)

train$predict.class <- predict(tRandom_Forest, train, type="class")
train$predict.score <- predict(tRandom_Forest, train, type="prob")
head(train)

Model Performance Measures - Rank ordering - TrainSet

library("StatMeasures")
package ‘StatMeasures’ was built under R version 3.6.3
Attaching package: ‘StatMeasures’

The following object is masked _by_ ‘.GlobalEnv’:

    decile

The following object is masked from ‘package:VIM’:

    mape
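# Assign each value of x to a decile bucket (1 = lowest values, 10 = highest)
decile <- function(x){
  # Cut points at the 10th, 20th, ..., 100th percentiles
  deciles <- quantile(x, probs = seq(0.1, 1, 0.1), na.rm = TRUE, names = FALSE)
  return (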
    ifelse(x<deciles[1], 1,
           ifelse(x<deciles[2], 2,
                  ifelse(x<deciles[3], 3,
                         ifelse(x<deciles[4], 4,
                                ifelse(x<deciles[5], 5,
                                       ifelse(x<deciles[6], 6,
                                              ifelse(x<deciles[7], 7,
                                                     ifelse(x<deciles[8], 8,
                                                            ifelse(x<deciles[9], 9, 10
                                                            ))))))))))
}


train$deciles <- decile(train$predict.score[,2])

library(data.table)
tmp_DT_rf = data.table(train)
rank_rf <- tmp_DT_rf[, list(
  cnt = length(TARGET), 
  cnt_resp = sum(TARGET), 
  cnt_non_resp = sum(TARGET == 0)), 
  by=deciles][order(-deciles)]
rank_rf$rrate <- round (rank_rf$cnt_resp / rank_rf$cnt,2);
rank_rf$cum_resp <- cumsum(rank_rf$cnt_resp)
rank_rf$cum_non_resp <- cumsum(rank_rf$cnt_non_resp)
rank_rf$cum_rel_resp <- round(rank_rf$cum_resp / sum(rank_rf$cnt_resp),2);
rank_rf$cum_rel_non_resp <- round(rank_rf$cum_non_resp / sum(rank_rf$cnt_non_resp),2);
rank_rf$ks <- abs(rank_rf$cum_rel_resp - rank_rf$cum_rel_non_resp)

library(scales)
package ‘scales’ was built under R version 3.6.3
rank_rf$rrate <- percent(rank_rf$rrate)
rank_rf$cum_rel_resp <- percent(rank_rf$cum_rel_resp)
rank_rf$cum_rel_non_resp <- percent(rank_rf$cum_rel_non_resp)
rank_rf
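To make the KS column of the rank-ordering table easier to follow, here is a minimal sketch in base R on made-up data. The scores, the responses and the use of 5 buckets instead of 10 deciles are all assumptions for illustration; nothing here comes from the bank dataset.

```r
# Toy data (made up): 20 customers with a model score and a 0/1 response
score  <- c(0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50,
            0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.01)
target <- c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

# Put the scores into 5 equal-sized buckets, bucket 5 = highest scores
bucket <- ceiling(rank(score, ties.method = "first") * 5 / length(score))

# Count responders and non-responders, from the top bucket down
buckets  <- 5:1
resp     <- sapply(buckets, function(b) sum(target[bucket == b]))
non_resp <- sapply(buckets, function(b) sum(target[bucket == b] == 0))

# Cumulative share of responders and non-responders captured so far
cum_rel_resp     <- cumsum(resp) / sum(resp)
cum_rel_non_resp <- cumsum(non_resp) / sum(non_resp)

# KS per bucket is the gap between the two cumulative shares
ks_by_bucket <- abs(cum_rel_resp - cum_rel_non_resp)
names(ks_by_bucket) <- buckets
ks <- max(ks_by_bucket)
print(round(ks_by_bucket, 2))
```

The model's KS is the largest of these gaps. A model that ranks well captures most responders in the top buckets, so the gap peaks early and is large.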

Receiver Operating Characteristic (ROC): a ROC curve is a way to compare diagnostic tests. It plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) across all score cutoffs, so it shows the trade-off between sensitivity and specificity.

library(ROCR)
pred_rf <- prediction(train$predict.score[,2],train$TARGET)
perf_rf <- performance(pred_rf, "tpr", "fpr")
plot(perf_rf)
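As a rough illustration of what the ROC curve is made of, the sketch below builds the true positive and false positive rates by hand in base R and computes the area under the curve (AUC) with the trapezoidal rule. The scores and labels are made up and have no ties; this is not the ROCR implementation.

```r
# Toy scores and 0/1 labels (made up for illustration, no tied scores)
score  <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2)
target <- c(1,   1,   0,   1,   0,   0,   1,   0)

# Sort by descending score; each step classifies one more case as positive
ord <- order(score, decreasing = TRUE)
tpr <- c(0, cumsum(target[ord] == 1) / sum(target == 1))
fpr <- c(0, cumsum(target[ord] == 0) / sum(target == 0))

# Area under the curve by the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect ranking. With ROCR, the same quantity should be available from performance(pred_rf, "auc"), stored in the y.values slot.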

Validating the model on the test set with the rank-ordering technique: as with the train set, we first create two columns, the predicted class and the predicted score.

test$predict.class <- predict(tRandom_Forest, test, type="class")
test$predict.score <- predict(tRandom_Forest, test, type="prob")

test$deciles <- decile(test$predict.score[,2])

Model Performance Measures - Rank ordering - Test Set

tmp_DT_rf2 = data.table(test)
h_rank_rf2 <- tmp_DT_rf2[, list(
  cnt = length(TARGET), 
  cnt_resp = sum(TARGET), 
  cnt_non_resp = sum(TARGET == 0)) , 
  by=deciles][order(-deciles)]
h_rank_rf2$rrate <- round (h_rank_rf2$cnt_resp / h_rank_rf2$cnt,2);
h_rank_rf2$cum_resp <- cumsum(h_rank_rf2$cnt_resp)
h_rank_rf2$cum_non_resp <- cumsum(h_rank_rf2$cnt_non_resp)
h_rank_rf2$cum_rel_resp <- round(h_rank_rf2$cum_resp / sum(h_rank_rf2$cnt_resp),2);
h_rank_rf2$cum_rel_non_resp <- round(h_rank_rf2$cum_non_resp / sum(h_rank_rf2$cnt_non_resp),2);
h_rank_rf2$ks <- abs(h_rank_rf2$cum_rel_resp - h_rank_rf2$cum_rel_non_resp)

library(scales)
h_rank_rf2$rrate <- percent(h_rank_rf2$rrate)
h_rank_rf2$cum_rel_resp <- percent(h_rank_rf2$cum_rel_resp)
h_rank_rf2$cum_rel_non_resp <- percent(h_rank_rf2$cum_rel_non_resp)

h_rank_rf2

ROC curve (test set)

library(ROCR)
pred_rf2 <- prediction(test$predict.score[,2],test$TARGET)
perf_rf2 <- performance(pred_rf2, "tpr", "fpr")
plot(perf_rf2)

Confusion matrices: confusion_matrix1 is for the train set and confusion_matrix2 is for the test set.

library(caret)
library(e1071)
package ‘e1071’ was built under R version 3.6.3
train$TARGET = as.factor(train$TARGET)
class(train$TARGET)
[1] "factor"
test$TARGET  = as.factor(test$TARGET)
class(test$predict.class)
[1] "factor"
#Train
confusion_matrix1 = confusionMatrix(train$predict.class, train$TARGET, positive = "1")
print(confusion_matrix1)
Confusion Matrix and Statistics

          Reference
Prediction     0     1
         0 12254  1725
         1     0    21
                                          
               Accuracy : 0.8768          
                 95% CI : (0.8712, 0.8822)
    No Information Rate : 0.8753          
    P-Value [Acc > NIR] : 0.3008          
                                          
                  Kappa : 0.0209          
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.01203         
            Specificity : 1.00000         
         Pos Pred Value : 1.00000         
         Neg Pred Value : 0.87660         
             Prevalence : 0.12471         
         Detection Rate : 0.00150         
   Detection Prevalence : 0.00150         
      Balanced Accuracy : 0.50601         
                                          
       'Positive' Class : 1               
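The statistics printed above can be checked by hand from the four cell counts. This is a small sketch in base R; the variable names are mine, and the counts are copied from the training-set matrix.

```r
# Counts copied from the training-set confusion matrix (positive class = 1)
tn <- 12254; fn <- 1725; fp <- 0; tp <- 21
n  <- tn + fn + fp + tp

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)             # share of actual 1s that were caught
specificity <- tn / (tn + fp)             # share of actual 0s that were kept as 0
balanced    <- (sensitivity + specificity) / 2
nir         <- max(tp + fn, tn + fp) / n  # accuracy of always predicting the majority class

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, balanced = balanced, nir = nir), 5))
```

The accuracy is barely above the no-information rate, and sensitivity is close to 1%. At the default cutoff the forest labels almost every customer as class 0, which is why the rank-ordering and ROC views are more informative here than accuracy.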
                                          
#Test
confusion_matrix2 = confusionMatrix(test$predict.class, test$TARGET, positive = "1")
print(confusion_matrix2)
