The Project

Part 1 - Classification Tree
* Split data into Development (70%) and Hold-out (30%) samples
* Build Classification Tree using CART technique
* Do necessary pruning
* Measure Model Performance on Development Sample
* Test Model Performance on Hold-out Sample
* Ensure the model is not an overfit model

Part 2 - Random Forest
* Split data into Development (70%) and Hold-out (30%) samples
* Build Model using Random Forest technique
* Measure Model Performance on Development Sample
* Test Model Performance on Hold-out Sample
* Ensure the model is not an overfit model
Let's set the working directory, import the dataset, look for missing values, and take an overall view of the data.
setwd("C:/Users/adminsa/Desktop/Pos Graduacao/Machine Learning/Mybank")
# Read the CSV once and keep the result (an unassigned read.csv() call only prints the data)
personal_loan <- read.table("My Bank Case Study-dataset.csv", sep = ",", header = TRUE)
View(personal_loan)
summary(personal_loan)
CUST_ID TARGET AGE GENDER BALANCE OCCUPATION AGE_BKT SCR HOLDING_PERIOD
C1 : 1 Min. :0.0000 Min. :21.00 F: 5433 Min. : 0 PROF :5417 <25 :1753 Min. :100.0 Min. : 1.00
C10 : 1 1st Qu.:0.0000 1st Qu.:30.00 M:14376 1st Qu.: 64754 SAL :5855 >50 :3035 1st Qu.:227.0 1st Qu.: 7.00
C100 : 1 Median :0.0000 Median :38.00 O: 191 Median : 231676 SELF-EMP:3568 26-30:3434 Median :364.0 Median :15.00
C1000 : 1 Mean :0.1256 Mean :38.42 Mean : 511362 SENP :5160 31-35:3404 Mean :440.2 Mean :14.96
C10000 : 1 3rd Qu.:0.0000 3rd Qu.:46.00 3rd Qu.: 653877 36-40:2814 3rd Qu.:644.0 3rd Qu.:22.00
C10001 : 1 Max. :1.0000 Max. :55.00 Max. :8360431 41-45:3067 Max. :999.0 Max. :31.00
(Other):19994 46-50:2493
ACC_TYPE ACC_OP_DATE LEN_OF_RLTN_IN_MNTH NO_OF_L_CR_TXNS NO_OF_L_DR_TXNS TOT_NO_OF_L_TXNS NO_OF_BR_CSH_WDL_DR_TXNS
CA: 4241 11/16/2010: 24 Min. : 29.0 Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 0.000
SA:15759 04-03-09 : 23 1st Qu.: 79.0 1st Qu.: 6.00 1st Qu.: 2.000 1st Qu.: 9.00 1st Qu.: 1.000
7/25/2010 : 22 Median :125.0 Median :10.00 Median : 5.000 Median : 14.00 Median : 1.000
05-06-13 : 21 Mean :125.2 Mean :12.35 Mean : 6.634 Mean : 18.98 Mean : 1.883
02-07-07 : 20 3rd Qu.:172.0 3rd Qu.:14.00 3rd Qu.: 7.000 3rd Qu.: 21.00 3rd Qu.: 2.000
8/24/2010 : 20 Max. :221.0 Max. :75.00 Max. :74.000 Max. :149.00 Max. :15.000
(Other) :19870
NO_OF_ATM_DR_TXNS NO_OF_NET_DR_TXNS NO_OF_MOB_DR_TXNS NO_OF_CHQ_DR_TXNS FLG_HAS_CC AMT_ATM_DR AMT_BR_CSH_WDL_DR AMT_CHQ_DR
Min. : 0.000 Min. : 0.000 Min. : 0.0000 Min. : 0.000 Min. :0.0000 Min. : 0 Min. : 0 Min. : 0
1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.: 0 1st Qu.: 2990 1st Qu.: 0
Median : 1.000 Median : 0.000 Median : 0.0000 Median : 2.000 Median :0.0000 Median : 6900 Median :340150 Median : 23840
Mean : 1.029 Mean : 1.172 Mean : 0.4118 Mean : 2.138 Mean :0.3054 Mean : 10990 Mean :378475 Mean : 124520
3rd Qu.: 1.000 3rd Qu.: 1.000 3rd Qu.: 0.0000 3rd Qu.: 4.000 3rd Qu.:1.0000 3rd Qu.: 15800 3rd Qu.:674675 3rd Qu.: 72470
Max. :25.000 Max. :22.000 Max. :25.0000 Max. :15.000 Max. :1.0000 Max. :199300 Max. :999930 Max. :4928640
AMT_NET_DR AMT_MOB_DR AMT_L_DR FLG_HAS_ANY_CHGS AMT_OTH_BK_ATM_USG_CHGS AMT_MIN_BAL_NMC_CHGS NO_OF_IW_CHQ_BNC_TXNS
Min. : 0 Min. : 0 Min. : 0 Min. :0.0000 Min. : 0.000 Min. : 0.000 Min. :0.00000
1st Qu.: 0 1st Qu.: 0 1st Qu.: 237936 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:0.00000
Median : 0 Median : 0 Median : 695115 Median :0.0000 Median : 0.000 Median : 0.000 Median :0.00000
Mean :237308 Mean : 22425 Mean : 773717 Mean :0.1106 Mean : 1.099 Mean : 1.292 Mean :0.04275
3rd Qu.:473971 3rd Qu.: 0 3rd Qu.:1078927 3rd Qu.:0.0000 3rd Qu.: 0.000 3rd Qu.: 0.000 3rd Qu.:0.00000
Max. :999854 Max. :199667 Max. :6514921 Max. :1.0000 Max. :250.000 Max. :170.000 Max. :2.00000
NO_OF_OW_CHQ_BNC_TXNS AVG_AMT_PER_ATM_TXN AVG_AMT_PER_CSH_WDL_TXN AVG_AMT_PER_CHQ_TXN AVG_AMT_PER_NET_TXN AVG_AMT_PER_MOB_TXN
Min. :0.0000 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0
1st Qu.:0.0000 1st Qu.: 0 1st Qu.: 1266 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0
Median :0.0000 Median : 6000 Median :147095 Median : 8645 Median : 0 Median : 0
Mean :0.0444 Mean : 7409 Mean :242237 Mean : 25093 Mean :179059 Mean : 20304
3rd Qu.:0.0000 3rd Qu.:13500 3rd Qu.:385000 3rd Qu.: 28605 3rd Qu.:257699 3rd Qu.: 0
Max. :2.0000 Max. :25000 Max. :999640 Max. :537842 Max. :999854 Max. :199667
FLG_HAS_NOMINEE FLG_HAS_OLD_LOAN random
Min. :0.0000 Min. :0.0000 Min. :0.0000114
1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.2481866
Median :1.0000 Median :0.0000 Median :0.5061214
Mean :0.9012 Mean :0.4929 Mean :0.5019330
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.7535712
Max. :1.0000 Max. :1.0000 Max. :0.9999471
# Apparently there are no missing values (summary() reports no NA counts)
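The absence of NA counts in summary() can also be checked directly. The following is a minimal base R sketch on a small made-up data frame; `toy_df` and its values are illustrative assumptions, not rows of the My Bank data.

```r
# Toy data frame standing in for personal_loan (values are made up for illustration)
toy_df <- data.frame(
  AGE     = c(27, 47, NA, 53),
  BALANCE = c(3384, 287489, 18217, 71720),
  GENDER  = c("M", "M", NA, "F"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy_df))
print(na_per_column)

# TRUE only when no cell in the data frame is NA
has_no_missing <- !any(is.na(toy_df))
print(has_no_missing)
```

On the real data, `colSums(is.na(personal_loan))` returning zero for every column would confirm the comment above.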
str(personal_loan)
'data.frame': 20000 obs. of 40 variables:
$ CUST_ID : Factor w/ 20000 levels "C1","C10","C100",..: 17699 16532 11027 17984 2363 11747 18115 15556 15216 12494 ...
$ TARGET : int 0 0 0 0 0 0 0 0 0 0 ...
$ AGE : int 27 47 40 53 36 42 30 53 42 30 ...
$ GENDER : Factor w/ 3 levels "F","M","O": 2 2 2 2 2 1 2 1 1 2 ...
$ BALANCE : num 3384 287489 18217 71720 1671623 ...
$ OCCUPATION : Factor w/ 4 levels "PROF","SAL","SELF-EMP",..: 3 2 3 2 1 1 1 2 3 1 ...
$ AGE_BKT : Factor w/ 7 levels "<25",">50","26-30",..: 3 7 5 2 5 6 3 2 6 3 ...
$ SCR : int 776 324 603 196 167 493 479 562 105 170 ...
$ HOLDING_PERIOD : int 30 28 2 13 24 26 14 25 15 13 ...
$ ACC_TYPE : Factor w/ 2 levels "CA","SA": 2 2 2 1 2 2 2 1 2 2 ...
$ ACC_OP_DATE : Factor w/ 4869 levels "01-01-00","01-01-01",..: 3270 1806 3575 993 2861 862 4533 3160 257 334 ...
$ LEN_OF_RLTN_IN_MNTH : int 146 104 61 107 185 192 177 99 88 111 ...
$ NO_OF_L_CR_TXNS : int 7 8 10 36 20 5 6 14 18 14 ...
$ NO_OF_L_DR_TXNS : int 3 2 5 14 1 2 6 3 14 8 ...
$ TOT_NO_OF_L_TXNS : int 10 10 15 50 21 7 12 17 32 22 ...
$ NO_OF_BR_CSH_WDL_DR_TXNS: int 0 0 1 4 1 1 0 3 6 3 ...
$ NO_OF_ATM_DR_TXNS : int 1 1 1 2 0 1 1 0 2 1 ...
$ NO_OF_NET_DR_TXNS : int 2 1 1 3 0 0 1 0 4 0 ...
$ NO_OF_MOB_DR_TXNS : int 0 0 0 1 0 0 0 0 1 0 ...
$ NO_OF_CHQ_DR_TXNS : int 0 0 2 4 0 0 4 0 1 4 ...
$ FLG_HAS_CC : int 0 0 0 0 0 1 0 0 1 0 ...
$ AMT_ATM_DR : int 13100 6600 11200 26100 0 18500 6200 0 35400 18000 ...
$ AMT_BR_CSH_WDL_DR : int 0 0 561120 673590 808480 379310 0 945160 198430 869880 ...
$ AMT_CHQ_DR : int 0 0 49320 60780 0 0 10580 0 51490 32610 ...
$ AMT_NET_DR : num 973557 799813 997570 741506 0 ...
$ AMT_MOB_DR : int 0 0 0 71388 0 0 0 0 170332 0 ...
$ AMT_L_DR : num 986657 806413 1619210 1573364 808480 ...
$ FLG_HAS_ANY_CHGS : int 0 1 1 0 0 0 1 0 0 0 ...
$ AMT_OTH_BK_ATM_USG_CHGS : int 0 0 0 0 0 0 0 0 0 0 ...
$ AMT_MIN_BAL_NMC_CHGS : int 0 0 0 0 0 0 0 0 0 0 ...
$ NO_OF_IW_CHQ_BNC_TXNS : int 0 0 0 0 0 0 0 0 0 0 ...
$ NO_OF_OW_CHQ_BNC_TXNS : int 0 0 1 0 0 0 0 0 0 0 ...
$ AVG_AMT_PER_ATM_TXN : num 13100 6600 11200 13050 0 ...
$ AVG_AMT_PER_CSH_WDL_TXN : num 0 0 561120 168398 808480 ...
$ AVG_AMT_PER_CHQ_TXN : num 0 0 24660 15195 0 ...
$ AVG_AMT_PER_NET_TXN : num 486779 799813 997570 247169 0 ...
$ AVG_AMT_PER_MOB_TXN : num 0 0 0 71388 0 ...
$ FLG_HAS_NOMINEE : int 1 1 1 1 1 1 0 1 1 0 ...
$ FLG_HAS_OLD_LOAN : int 1 0 1 0 0 1 1 1 1 0 ...
$ random : num 1.14e-05 1.11e-04 1.20e-04 1.37e-04 1.74e-04 ...
class(personal_loan$FLG_HAS_ANY_CHGS)
[1] "integer"
Let's take a closer look at the dataset through plotting and identify columns that carry no useful information.
library("VIM")
package 'VIM' was built under R version 3.6.3
Loading required package: colorspace
Loading required package: grid
Loading required package: data.table
package 'data.table' was built under R version 3.6.3
Registered S3 method overwritten by 'data.table':
method from
print.data.table
data.table 1.12.8 using 2 threads (see ?getDTthreads). Latest news: r-datatable.com
VIM is ready to use.
Since version 4.0.0 the GUI is in its own package VIMGUI.
Please use the package to use the new (and old) GUI.
Suggestions and bug-reports can be submitted at: https://github.com/alexkowa/VIM/issues
Attaching package: 'VIM'
The following object is masked from 'package:datasets':
sleep
aggr(personal_loan, prop = FALSE, cex.axis = 0.4, numbers = TRUE)
# There are no missing values
# The ID numbers (CUST_ID) and the random column could be removed
Variable treatment: the data dictionary is imported as a support guide, the flag variables are converted to factors, and the account opening date is parsed as a date.
setwd("C:/Users/adminsa/Desktop/Pos Graduacao/Machine Learning/Mybank")
library(readxl)
mybank_dictionary = read_excel("My Bank Case Study-Data dictionary.xlsx")
personal_loan$FLG_HAS_CC <- as.factor(personal_loan$FLG_HAS_CC)
personal_loan$FLG_HAS_ANY_CHGS <- as.factor(personal_loan$FLG_HAS_ANY_CHGS)
personal_loan$FLG_HAS_NOMINEE <- as.factor(personal_loan$FLG_HAS_NOMINEE)
personal_loan$FLG_HAS_OLD_LOAN <- as.factor(personal_loan$FLG_HAS_OLD_LOAN)
personal_loan$ACC_OP_DATE <- as.character(personal_loan$ACC_OP_DATE)
library(lubridate)
package 'lubridate' was built under R version 3.6.3
Attaching package: 'lubridate'
The following objects are masked from 'package:data.table':
hour, isoweek, mday, minute, month, quarter, second, wday, week, yday, year
The following objects are masked from 'package:base':
date, intersect, setdiff, union
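# The dates come in two layouts, so parse with both functions;
# results are stored under new names to avoid masking mdy() and dmy()
parsed_mdy <- mdy(personal_loan$ACC_OP_DATE)
parsed_dmy <- dmy(personal_loan$ACC_OP_DATE)
12050 failed to parse.
# Where month-day-year parsing failed, fall back to the day-month-year result
parsed_mdy[is.na(parsed_mdy)] <- parsed_dmy[is.na(parsed_mdy)]
personal_loan$ACC_OP_DATE <- parsed_mdy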
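The same "try one format, then fill the gaps with another" idea can be sketched in base R without lubridate. The helper name `parse_mixed_dates`, the sample strings, and the three format strings are assumptions made for illustration; the actual layouts in ACC_OP_DATE should be checked against the data dictionary.

```r
# Hypothetical helper: try each format in order and only fill entries still NA
parse_mixed_dates <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

# Sample strings in the two layouts seen in the summary, plus one day-first value
raw_dates <- c("11/16/2010", "04-03-09", "25-12-10")

# Assumed order of preference: month/day/4-digit year, month-day-year, day-month-year
parsed <- parse_mixed_dates(raw_dates,
                            formats = c("%m/%d/%Y", "%m-%d-%y", "%d-%m-%y"))
print(parsed)
```

The order of the formats matters: an ambiguous string such as "04-03-09" is resolved by whichever format is tried first, which is also true of the mdy-then-dmy approach used above.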
View(personal_loan)
summary(personal_loan)
CUST_ID TARGET AGE GENDER BALANCE OCCUPATION AGE_BKT SCR HOLDING_PERIOD
C1 : 1 Min. :0.0000 Min. :21.00 F: 5433 Min. : 0 PROF :5417 <25 :1753 Min. :100.0 Min. : 1.00
C10 : 1 1st Qu.:0.0000 1st Qu.:30.00 M:14376 1st Qu.: 64754 SAL :5855 >50 :3035 1st Qu.:227.0 1st Qu.: 7.00
C100 : 1 Median :0.0000 Median :38.00 O: 191 Median : 231676 SELF-EMP:3568 26-30:3434 Median :364.0 Median :15.00
C1000 : 1 Mean :0.1256 Mean :38.42 Mean : 511362 SENP :5160 31-35:3404 Mean :440.2 Mean :14.96
C10000 : 1 3rd Qu.:0.0000 3rd Qu.:46.00 3rd Qu.: 653877 36-40:2814 3rd Qu.:644.0 3rd Qu.:22.00
C10001 : 1 Max. :1.0000 Max. :55.00 Max. :8360431 41-45:3067 Max. :999.0 Max. :31.00
(Other):19994 46-50:2493
ACC_TYPE ACC_OP_DATE LEN_OF_RLTN_IN_MNTH NO_OF_L_CR_TXNS NO_OF_L_DR_TXNS TOT_NO_OF_L_TXNS NO_OF_BR_CSH_WDL_DR_TXNS
CA: 4241 Min. :1999-01-02 Min. : 29.0 Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 0.000
SA:15759 1st Qu.:2003-01-26 1st Qu.: 79.0 1st Qu.: 6.00 1st Qu.: 2.000 1st Qu.: 9.00 1st Qu.: 1.000
Median :2006-12-23 Median :125.0 Median :10.00 Median : 5.000 Median : 14.00 Median : 1.000
Mean :2006-12-25 Mean :125.2 Mean :12.35 Mean : 6.634 Mean : 18.98 Mean : 1.883
3rd Qu.:2010-11-16 3rd Qu.:172.0 3rd Qu.:14.00 3rd Qu.: 7.000 3rd Qu.: 21.00 3rd Qu.: 2.000
Max. :2015-01-01 Max. :221.0 Max. :75.00 Max. :74.000 Max. :149.00 Max. :15.000
NO_OF_ATM_DR_TXNS NO_OF_NET_DR_TXNS NO_OF_MOB_DR_TXNS NO_OF_CHQ_DR_TXNS FLG_HAS_CC AMT_ATM_DR AMT_BR_CSH_WDL_DR AMT_CHQ_DR
Min. : 0.000 Min. : 0.000 Min. : 0.0000 Min. : 0.000 0:13892 Min. : 0 Min. : 0 Min. : 0
1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000 1: 6108 1st Qu.: 0 1st Qu.: 2990 1st Qu.: 0
Median : 1.000 Median : 0.000 Median : 0.0000 Median : 2.000 Median : 6900 Median :340150 Median : 23840
Mean : 1.029 Mean : 1.172 Mean : 0.4118 Mean : 2.138 Mean : 10990 Mean :378475 Mean : 124520
3rd Qu.: 1.000 3rd Qu.: 1.000 3rd Qu.: 0.0000 3rd Qu.: 4.000 3rd Qu.: 15800 3rd Qu.:674675 3rd Qu.: 72470
Max. :25.000 Max. :22.000 Max. :25.0000 Max. :15.000 Max. :199300 Max. :999930 Max. :4928640
AMT_NET_DR AMT_MOB_DR AMT_L_DR FLG_HAS_ANY_CHGS AMT_OTH_BK_ATM_USG_CHGS AMT_MIN_BAL_NMC_CHGS NO_OF_IW_CHQ_BNC_TXNS
Min. : 0 Min. : 0 Min. : 0 0:17788 Min. : 0.000 Min. : 0.000 Min. :0.00000
1st Qu.: 0 1st Qu.: 0 1st Qu.: 237936 1: 2212 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:0.00000
Median : 0 Median : 0 Median : 695115 Median : 0.000 Median : 0.000 Median :0.00000
Mean :237308 Mean : 22425 Mean : 773717 Mean : 1.099 Mean : 1.292 Mean :0.04275
3rd Qu.:473971 3rd Qu.: 0 3rd Qu.:1078927 3rd Qu.: 0.000 3rd Qu.: 0.000 3rd Qu.:0.00000
Max. :999854 Max. :199667 Max. :6514921 Max. :250.000 Max. :170.000 Max. :2.00000
NO_OF_OW_CHQ_BNC_TXNS AVG_AMT_PER_ATM_TXN AVG_AMT_PER_CSH_WDL_TXN AVG_AMT_PER_CHQ_TXN AVG_AMT_PER_NET_TXN AVG_AMT_PER_MOB_TXN FLG_HAS_NOMINEE
Min. :0.0000 Min. : 0 Min. : 0 Min. : 0 Min. : 0 Min. : 0 0: 1977
1st Qu.:0.0000 1st Qu.: 0 1st Qu.: 1266 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0 1:18023
Median :0.0000 Median : 6000 Median :147095 Median : 8645 Median : 0 Median : 0
Mean :0.0444 Mean : 7409 Mean :242237 Mean : 25093 Mean :179059 Mean : 20304
3rd Qu.:0.0000 3rd Qu.:13500 3rd Qu.:385000 3rd Qu.: 28605 3rd Qu.:257699 3rd Qu.: 0
Max. :2.0000 Max. :25000 Max. :999640 Max. :537842 Max. :999854 Max. :199667
FLG_HAS_OLD_LOAN random
0:10141 Min. :0.0000114
1: 9859 1st Qu.:0.2481866
Median :0.5061214
Mean :0.5019330
3rd Qu.:0.7535712
Max. :0.9999471
Visualisation of the numeric independent variables through a correlation matrix plot, followed by the response rate for the loan offer. We found a response rate of 12.56%.
# Drop TARGET and the factor/date columns; CUST_ID is kept here and excluded later
num_data <- personal_loan[, -c(2, 4, 6, 7, 10, 11, 21, 28, 38, 39)]
names(num_data)
[1] "CUST_ID" "AGE" "BALANCE" "SCR" "HOLDING_PERIOD"
[6] "LEN_OF_RLTN_IN_MNTH" "NO_OF_L_CR_TXNS" "NO_OF_L_DR_TXNS" "TOT_NO_OF_L_TXNS" "NO_OF_BR_CSH_WDL_DR_TXNS"
[11] "NO_OF_ATM_DR_TXNS" "NO_OF_NET_DR_TXNS" "NO_OF_MOB_DR_TXNS" "NO_OF_CHQ_DR_TXNS" "AMT_ATM_DR"
[16] "AMT_BR_CSH_WDL_DR" "AMT_CHQ_DR" "AMT_NET_DR" "AMT_MOB_DR" "AMT_L_DR"
[21] "AMT_OTH_BK_ATM_USG_CHGS" "AMT_MIN_BAL_NMC_CHGS" "NO_OF_IW_CHQ_BNC_TXNS" "NO_OF_OW_CHQ_BNC_TXNS" "AVG_AMT_PER_ATM_TXN"
[26] "AVG_AMT_PER_CSH_WDL_TXN" "AVG_AMT_PER_CHQ_TXN" "AVG_AMT_PER_NET_TXN" "AVG_AMT_PER_MOB_TXN" "random"
library(corrplot)
package 'corrplot' was built under R version 3.6.3
corrplot 0.84 loaded
str(num_data)
'data.frame': 20000 obs. of 30 variables:
$ CUST_ID : Factor w/ 20000 levels "C1","C10","C100",..: 17699 16532 11027 17984 2363 11747 18115 15556 15216 12494 ...
$ AGE : int 27 47 40 53 36 42 30 53 42 30 ...
$ BALANCE : num 3384 287489 18217 71720 1671623 ...
$ SCR : int 776 324 603 196 167 493 479 562 105 170 ...
$ HOLDING_PERIOD : int 30 28 2 13 24 26 14 25 15 13 ...
$ LEN_OF_RLTN_IN_MNTH : int 146 104 61 107 185 192 177 99 88 111 ...
$ NO_OF_L_CR_TXNS : int 7 8 10 36 20 5 6 14 18 14 ...
$ NO_OF_L_DR_TXNS : int 3 2 5 14 1 2 6 3 14 8 ...
$ TOT_NO_OF_L_TXNS : int 10 10 15 50 21 7 12 17 32 22 ...
$ NO_OF_BR_CSH_WDL_DR_TXNS: int 0 0 1 4 1 1 0 3 6 3 ...
$ NO_OF_ATM_DR_TXNS : int 1 1 1 2 0 1 1 0 2 1 ...
$ NO_OF_NET_DR_TXNS : int 2 1 1 3 0 0 1 0 4 0 ...
$ NO_OF_MOB_DR_TXNS : int 0 0 0 1 0 0 0 0 1 0 ...
$ NO_OF_CHQ_DR_TXNS : int 0 0 2 4 0 0 4 0 1 4 ...
$ AMT_ATM_DR : int 13100 6600 11200 26100 0 18500 6200 0 35400 18000 ...
$ AMT_BR_CSH_WDL_DR : int 0 0 561120 673590 808480 379310 0 945160 198430 869880 ...
$ AMT_CHQ_DR : int 0 0 49320 60780 0 0 10580 0 51490 32610 ...
$ AMT_NET_DR : num 973557 799813 997570 741506 0 ...
$ AMT_MOB_DR : int 0 0 0 71388 0 0 0 0 170332 0 ...
$ AMT_L_DR : num 986657 806413 1619210 1573364 808480 ...
$ AMT_OTH_BK_ATM_USG_CHGS : int 0 0 0 0 0 0 0 0 0 0 ...
$ AMT_MIN_BAL_NMC_CHGS : int 0 0 0 0 0 0 0 0 0 0 ...
$ NO_OF_IW_CHQ_BNC_TXNS : int 0 0 0 0 0 0 0 0 0 0 ...
$ NO_OF_OW_CHQ_BNC_TXNS : int 0 0 1 0 0 0 0 0 0 0 ...
$ AVG_AMT_PER_ATM_TXN : num 13100 6600 11200 13050 0 ...
$ AVG_AMT_PER_CSH_WDL_TXN : num 0 0 561120 168398 808480 ...
$ AVG_AMT_PER_CHQ_TXN : num 0 0 24660 15195 0 ...
$ AVG_AMT_PER_NET_TXN : num 486779 799813 997570 247169 0 ...
$ AVG_AMT_PER_MOB_TXN : num 0 0 0 71388 0 ...
$ random : num 1.14e-05 1.11e-04 1.20e-04 1.37e-04 1.74e-04 ...
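# Correlation matrix of the numeric columns (column 1, CUST_ID, is excluded)
plt <- cor(num_data[, -1])
# One call is enough: corrplot() draws the plot and returns its result invisibly
correlation_plot <- corrplot(plt, method = "circle", tl.cex = 0.5)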
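A correlation plot with 29 variables is hard to read at a glance, so it can help to list the strongly correlated pairs explicitly. The sketch below uses only the first ten values of three transaction counts shown in the str() output; the 0.9 threshold and the names `txns` and `high_pairs` are illustrative assumptions.

```r
# First ten values of two transaction counts, copied from the str() output
txns <- data.frame(
  NO_OF_L_CR_TXNS = c(7, 8, 10, 36, 20, 5, 6, 14, 18, 14),
  NO_OF_L_DR_TXNS = c(3, 2, 5, 14, 1, 2, 6, 3, 14, 8)
)
# In these rows the total equals credits plus debits
txns$TOT_NO_OF_L_TXNS <- txns$NO_OF_L_CR_TXNS + txns$NO_OF_L_DR_TXNS

cm <- cor(txns)

# Keep each pair once (upper triangle) and only those above the threshold
idx <- which(abs(cm) > 0.9 & upper.tri(cm), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx],
  stringsAsFactors = FALSE
)
print(high_pairs)
```

Running the same steps on `plt` would list every pair of predictors above the chosen threshold. Strong correlation matters less for tree-based models than for regression, but it does spread importance across the correlated variables.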
response_rate <- (sum(personal_loan$TARGET)/nrow(personal_loan))* 100
response_rate
[1] 12.56
Splitting the data into a training set (70%) and a test set (30%), which are used in the Part 2 random forest solution.
library(caret)
package 'caret' was built under R version 3.6.3
Loading required package: lattice
Loading required package: ggplot2
Registered S3 method overwritten by 'dplyr':
method from
print.rowwise_df
Need help getting started? Try the cookbook for R: http://www.cookbook-r.com/Graphs/
set.seed(123)
index <- createDataPartition(personal_loan$TARGET, p=0.70, list=FALSE)
train <- personal_loan [ index,]
test <- personal_loan [-index,]
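With a response rate of about 12.5%, it is worth checking that both samples keep roughly the same share of responders. The following base R sketch shows a stratified 70/30 split on a made-up target vector; the vector, the 12% rate and all variable names are assumptions for illustration and do not reproduce what createDataPartition does internally.

```r
set.seed(123)

# Toy target: 880 non-responders and 120 responders (12% response rate)
target <- rep(c(0, 1), times = c(880, 120))

# Sample 70% of the row indices within each class separately
idx_by_class <- split(seq_along(target), target)
train_idx <- unlist(lapply(idx_by_class, function(i) {
  sample(i, size = round(0.7 * length(i)))
}), use.names = FALSE)
test_idx <- setdiff(seq_along(target), train_idx)

# Response rate in each sample
train_rate <- mean(target[train_idx])
test_rate  <- mean(target[test_idx])
print(c(train = train_rate, test = test_rate))
```

For the split above, the equivalent check would be to compare `mean(train$TARGET)` with `mean(test$TARGET)`.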
Part 2 - Random Forest

From the output below, the out-of-bag (OOB) error rate, which is the model's estimated misclassification rate, is about 12.4% (12.39% was reported here; the run printed below shows 12.36%, and the figure varies slightly from run to run). Around trees 16 to 20 there is no further significant reduction in the error rate. The rows of Random_Forest$err.rate quoted here come from a separate run, so they differ slightly from the full listing below:

OOB 0 1
[16,] 0.1221500 2.776190e-03 0.9604358
[17,] 0.1214633 1.959024e-03 0.9604585
[18,] 0.1212489 1.387642e-03 0.9627507
[19,] 0.1212489 1.550894e-03 0.9616046
[20,] 0.1214372 1.305803e-03 0.9644903
library(randomForest)
package 'randomForest' was built under R version 3.6.3
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
Attaching package: 'randomForest'
The following object is masked from 'package:ggplot2':
margin
Random_Forest <- randomForest(as.factor(TARGET) ~ ., data = train[,-1],
ntree=501, mtry = 7, nodesize = 140,
importance=TRUE)
print(Random_Forest)
Call:
randomForest(formula = as.factor(TARGET) ~ ., data = train[, -1], ntree = 501, mtry = 7, nodesize = 140, importance = TRUE)
Type of random forest: classification
Number of trees: 501
No. of variables tried at each split: 7
OOB estimate of error rate: 12.36%
Confusion matrix:
0 1 class.error
0 12254 0 0.0000000
1 1731 15 0.9914089
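The overall OOB error hides a large difference between the two classes. As a worked check, the figures in the confusion matrix above can be recomputed by hand; the sketch below copies the four counts from the printout, and the "always predict 0" baseline is added here only for comparison.

```r
# OOB confusion matrix copied from print(Random_Forest): rows = actual, columns = predicted
conf_mat <- matrix(c(12254, 0,
                     1731, 15),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

n_total <- sum(conf_mat)

# Overall OOB error: share of off-diagonal cases
oob_error <- (conf_mat["0", "1"] + conf_mat["1", "0"]) / n_total

# Per-class error: share of each actual class that is misclassified
class_error <- unname(1 - diag(conf_mat) / rowSums(conf_mat))

# Error of a trivial rule that predicts 0 for everyone
baseline_error <- sum(conf_mat["1", ]) / n_total

print(round(c(oob = oob_error, class0 = class_error[1],
              class1 = class_error[2], baseline = baseline_error), 4))
```

Only 15 of the 1746 responders in the training sample are identified, so the 12.36% OOB error is only slightly better than the 12.47% obtained by predicting 0 for every customer. The overall error rate alone is therefore a weak measure of performance on this imbalanced target.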
plot(Random_Forest,main = "")
Random_Forest$err.rate
OOB 0 1
[1,] 0.1303419 2.213859e-02 0.9049128
[2,] 0.1313422 2.162902e-02 0.9079457
[3,] 0.1270649 1.883487e-02 0.8992187
[4,] 0.1266016 1.637280e-02 0.9043062
[5,] 0.1273536 1.423907e-02 0.9263293
[6,] 0.1255144 1.226940e-02 0.9239264
[7,] 0.1248513 9.512485e-03 0.9360812
[8,] 0.1248717 8.791025e-03 0.9433294
[9,] 0.1233125 8.615690e-03 0.9343878
[10,] 0.1227918 6.751194e-03 0.9408009
[11,] 0.1225223 5.004102e-03 0.9486736
[12,] 0.1221243 4.176220e-03 0.9494543
[13,] 0.1219477 3.763397e-03 0.9512055
[14,] 0.1226753 3.841438e-03 0.9558739
[15,] 0.1222651 2.287395e-03 0.9638968
[16,] 0.1226766 3.267173e-03 0.9604585
[17,] 0.1221412 2.612885e-03 0.9610315
[18,] 0.1224009 2.285714e-03 0.9656160
[19,] 0.1231781 2.204082e-03 0.9719359
[20,] 0.1227406 1.632520e-03 0.9725086
[21,] 0.1228033 1.714006e-03 0.9725086
[22,] 0.1231605 1.877245e-03 0.9742268
[23,] 0.1222317 1.061051e-03 0.9725086
[24,] 0.1225088 1.142577e-03 0.9742268
[25,] 0.1222230 9.793520e-04 0.9730813
[26,] 0.1221516 5.712887e-04 0.9753723
[27,] 0.1222944 6.529013e-04 0.9759450
[28,] 0.1221516 7.345140e-04 0.9742268
[29,] 0.1224373 4.896760e-04 0.9782360
[30,] 0.1222857 4.080300e-04 0.9776632
[31,] 0.1225714 4.896360e-04 0.9793814
[32,] 0.1226429 4.080300e-04 0.9805269
[33,] 0.1221429 4.080300e-04 0.9765178
[34,] 0.1224286 4.896360e-04 0.9782360
[35,] 0.1222857 6.528480e-04 0.9759450
[36,] 0.1222857 4.896360e-04 0.9770905
[37,] 0.1225714 4.896360e-04 0.9793814
[38,] 0.1224286 5.712420e-04 0.9776632
[39,] 0.1222857 4.896360e-04 0.9770905
[40,] 0.1225714 4.896360e-04 0.9793814
[41,] 0.1225714 4.080300e-04 0.9799542
[42,] 0.1223571 4.896360e-04 0.9776632
[43,] 0.1222857 4.896360e-04 0.9770905
[44,] 0.1223571 4.896360e-04 0.9776632
[45,] 0.1225000 4.896360e-04 0.9788087
[46,] 0.1225714 4.896360e-04 0.9793814
[47,] 0.1226429 4.896360e-04 0.9799542
[48,] 0.1223571 4.080300e-04 0.9782360
[49,] 0.1220714 4.080300e-04 0.9759450
[50,] 0.1221429 4.080300e-04 0.9765178
[51,] 0.1223571 4.080300e-04 0.9782360
[52,] 0.1221429 3.264240e-04 0.9770905
[53,] 0.1223571 3.264240e-04 0.9788087
[54,] 0.1223571 3.264240e-04 0.9788087
[55,] 0.1223571 2.448180e-04 0.9793814
[56,] 0.1223571 3.264240e-04 0.9788087
[57,] 0.1224286 2.448180e-04 0.9799542
[58,] 0.1225000 3.264240e-04 0.9799542
[59,] 0.1225714 4.080300e-04 0.9799542
[60,] 0.1225000 4.080300e-04 0.9793814
[61,] 0.1225000 2.448180e-04 0.9805269
[62,] 0.1226429 3.264240e-04 0.9810997
[63,] 0.1227857 3.264240e-04 0.9822451
[64,] 0.1226429 3.264240e-04 0.9810997
[65,] 0.1224286 2.448180e-04 0.9799542
[66,] 0.1223571 2.448180e-04 0.9793814
[67,] 0.1226429 2.448180e-04 0.9816724
[68,] 0.1227143 2.448180e-04 0.9822451
[69,] 0.1227857 2.448180e-04 0.9828179
[70,] 0.1227857 2.448180e-04 0.9828179
[71,] 0.1230714 3.264240e-04 0.9845361
[72,] 0.1230714 1.632120e-04 0.9856816
[73,] 0.1230000 2.448180e-04 0.9845361
[74,] 0.1229286 2.448180e-04 0.9839633
[75,] 0.1227143 2.448180e-04 0.9822451
[76,] 0.1228571 2.448180e-04 0.9833906
[77,] 0.1226429 2.448180e-04 0.9816724
[78,] 0.1226429 3.264240e-04 0.9810997
[79,] 0.1227857 3.264240e-04 0.9822451
[80,] 0.1227857 3.264240e-04 0.9822451
[81,] 0.1227857 3.264240e-04 0.9822451
[82,] 0.1226429 2.448180e-04 0.9816724
[83,] 0.1228571 2.448180e-04 0.9833906
[84,] 0.1230714 2.448180e-04 0.9851088
[85,] 0.1230714 2.448180e-04 0.9851088
[86,] 0.1231429 1.632120e-04 0.9862543
[87,] 0.1230000 2.448180e-04 0.9845361
[88,] 0.1230714 2.448180e-04 0.9851088
[89,] 0.1229286 2.448180e-04 0.9839633
[90,] 0.1229286 2.448180e-04 0.9839633
[91,] 0.1230000 1.632120e-04 0.9851088
[92,] 0.1230000 1.632120e-04 0.9851088
[93,] 0.1230714 1.632120e-04 0.9856816
[94,] 0.1229286 1.632120e-04 0.9845361
[95,] 0.1230714 2.448180e-04 0.9851088
[96,] 0.1231429 1.632120e-04 0.9862543
[97,] 0.1232857 2.448180e-04 0.9868270
[98,] 0.1232143 1.632120e-04 0.9868270
[99,] 0.1232143 2.448180e-04 0.9862543
[100,] 0.1232143 1.632120e-04 0.9868270
[101,] 0.1231429 2.448180e-04 0.9856816
[102,] 0.1230714 1.632120e-04 0.9856816
[103,] 0.1232143 1.632120e-04 0.9868270
[104,] 0.1233571 1.632120e-04 0.9879725
[105,] 0.1233571 1.632120e-04 0.9879725
[106,] 0.1233571 8.160601e-05 0.9885452
[107,] 0.1231429 8.160601e-05 0.9868270
[108,] 0.1232143 1.632120e-04 0.9868270
[109,] 0.1232143 8.160601e-05 0.9873998
[110,] 0.1232143 1.632120e-04 0.9868270
[111,] 0.1231429 8.160601e-05 0.9868270
[112,] 0.1230714 8.160601e-05 0.9862543
[113,] 0.1230000 8.160601e-05 0.9856816
[114,] 0.1231429 8.160601e-05 0.9868270
[115,] 0.1232143 8.160601e-05 0.9873998
[116,] 0.1232857 8.160601e-05 0.9879725
[117,] 0.1233571 8.160601e-05 0.9885452
[118,] 0.1232857 8.160601e-05 0.9879725
[119,] 0.1233571 8.160601e-05 0.9885452
[120,] 0.1232857 8.160601e-05 0.9879725
[121,] 0.1232857 8.160601e-05 0.9879725
[122,] 0.1232857 8.160601e-05 0.9879725
[123,] 0.1233571 8.160601e-05 0.9885452
[124,] 0.1232857 8.160601e-05 0.9879725
[125,] 0.1232143 8.160601e-05 0.9873998
[126,] 0.1232143 8.160601e-05 0.9873998
[127,] 0.1233571 8.160601e-05 0.9885452
[128,] 0.1232857 0.000000e+00 0.9885452
[129,] 0.1234286 0.000000e+00 0.9896907
[130,] 0.1233571 0.000000e+00 0.9891180
[131,] 0.1234286 0.000000e+00 0.9896907
[132,] 0.1235000 0.000000e+00 0.9902635
[133,] 0.1233571 0.000000e+00 0.9891180
[134,] 0.1233571 0.000000e+00 0.9891180
[135,] 0.1233571 0.000000e+00 0.9891180
[136,] 0.1232857 0.000000e+00 0.9885452
[137,] 0.1233571 0.000000e+00 0.9891180
[138,] 0.1234286 0.000000e+00 0.9896907
[139,] 0.1232143 8.160601e-05 0.9873998
[140,] 0.1233571 1.632120e-04 0.9879725
[141,] 0.1232143 0.000000e+00 0.9879725
[142,] 0.1231429 0.000000e+00 0.9873998
[143,] 0.1230714 0.000000e+00 0.9868270
[144,] 0.1232857 8.160601e-05 0.9879725
[145,] 0.1232857 0.000000e+00 0.9885452
[146,] 0.1232143 0.000000e+00 0.9879725
[147,] 0.1232143 0.000000e+00 0.9879725
[148,] 0.1231429 0.000000e+00 0.9873998
[149,] 0.1231429 0.000000e+00 0.9873998
[150,] 0.1231429 0.000000e+00 0.9873998
[151,] 0.1232857 0.000000e+00 0.9885452
[152,] 0.1232857 0.000000e+00 0.9885452
[153,] 0.1233571 0.000000e+00 0.9891180
[154,] 0.1235000 0.000000e+00 0.9902635
[155,] 0.1234286 0.000000e+00 0.9896907
[156,] 0.1232857 0.000000e+00 0.9885452
[157,] 0.1232857 0.000000e+00 0.9885452
[158,] 0.1232857 0.000000e+00 0.9885452
[159,] 0.1232143 0.000000e+00 0.9879725
[160,] 0.1232143 0.000000e+00 0.9879725
[161,] 0.1232143 0.000000e+00 0.9879725
[162,] 0.1232857 0.000000e+00 0.9885452
[163,] 0.1232857 0.000000e+00 0.9885452
[164,] 0.1232857 0.000000e+00 0.9885452
[165,] 0.1232857 0.000000e+00 0.9885452
[166,] 0.1232143 0.000000e+00 0.9879725
[167,] 0.1232143 0.000000e+00 0.9879725
[168,] 0.1232857 0.000000e+00 0.9885452
[169,] 0.1232857 0.000000e+00 0.9885452
[170,] 0.1234286 0.000000e+00 0.9896907
[171,] 0.1234286 0.000000e+00 0.9896907
[172,] 0.1234286 0.000000e+00 0.9896907
[173,] 0.1234286 0.000000e+00 0.9896907
[174,] 0.1233571 0.000000e+00 0.9891180
[175,] 0.1233571 0.000000e+00 0.9891180
[176,] 0.1232857 0.000000e+00 0.9885452
[177,] 0.1232857 0.000000e+00 0.9885452
[178,] 0.1233571 0.000000e+00 0.9891180
[179,] 0.1234286 0.000000e+00 0.9896907
[180,] 0.1235000 0.000000e+00 0.9902635
[181,] 0.1234286 0.000000e+00 0.9896907
[182,] 0.1235000 0.000000e+00 0.9902635
[183,] 0.1234286 0.000000e+00 0.9896907
[184,] 0.1234286 0.000000e+00 0.9896907
[185,] 0.1234286 0.000000e+00 0.9896907
[186,] 0.1234286 0.000000e+00 0.9896907
[187,] 0.1233571 0.000000e+00 0.9891180
[188,] 0.1234286 0.000000e+00 0.9896907
[189,] 0.1235000 0.000000e+00 0.9902635
[190,] 0.1235714 0.000000e+00 0.9908362
[191,] 0.1235714 0.000000e+00 0.9908362
[192,] 0.1235714 0.000000e+00 0.9908362
[193,] 0.1234286 0.000000e+00 0.9896907
[194,] 0.1236429 0.000000e+00 0.9914089
[195,] 0.1235714 0.000000e+00 0.9908362
[196,] 0.1235714 0.000000e+00 0.9908362
[197,] 0.1235714 0.000000e+00 0.9908362
[198,] 0.1235714 0.000000e+00 0.9908362
[199,] 0.1235714 0.000000e+00 0.9908362
[200,] 0.1235714 0.000000e+00 0.9908362
[201,] 0.1235714 0.000000e+00 0.9908362
[202,] 0.1235714 0.000000e+00 0.9908362
[203,] 0.1235714 0.000000e+00 0.9908362
[204,] 0.1235714 0.000000e+00 0.9908362
[205,] 0.1235714 0.000000e+00 0.9908362
[206,] 0.1235714 0.000000e+00 0.9908362
[207,] 0.1235714 0.000000e+00 0.9908362
[208,] 0.1235000 0.000000e+00 0.9902635
[209,] 0.1235000 0.000000e+00 0.9902635
[210,] 0.1235000 0.000000e+00 0.9902635
[211,] 0.1234286 0.000000e+00 0.9896907
[212,] 0.1234286 0.000000e+00 0.9896907
[213,] 0.1235000 0.000000e+00 0.9902635
[214,] 0.1235000 0.000000e+00 0.9902635
[215,] 0.1235000 0.000000e+00 0.9902635
[216,] 0.1235000 0.000000e+00 0.9902635
[217,] 0.1234286 0.000000e+00 0.9896907
[218,] 0.1235000 0.000000e+00 0.9902635
[219,] 0.1235714 0.000000e+00 0.9908362
[220,] 0.1235000 0.000000e+00 0.9902635
[221,] 0.1235000 0.000000e+00 0.9902635
[222,] 0.1235714 0.000000e+00 0.9908362
[223,] 0.1235714 0.000000e+00 0.9908362
[224,] 0.1235714 0.000000e+00 0.9908362
[225,] 0.1235714 0.000000e+00 0.9908362
[226,] 0.1235000 0.000000e+00 0.9902635
[227,] 0.1235000 0.000000e+00 0.9902635
[228,] 0.1235714 0.000000e+00 0.9908362
[229,] 0.1235714 0.000000e+00 0.9908362
[230,] 0.1235714 0.000000e+00 0.9908362
[231,] 0.1235714 0.000000e+00 0.9908362
[232,] 0.1235714 0.000000e+00 0.9908362
[233,] 0.1234286 0.000000e+00 0.9896907
[234,] 0.1234286 0.000000e+00 0.9896907
[235,] 0.1235000 0.000000e+00 0.9902635
[236,] 0.1235000 0.000000e+00 0.9902635
[237,] 0.1235000 0.000000e+00 0.9902635
[238,] 0.1235000 0.000000e+00 0.9902635
[239,] 0.1235714 0.000000e+00 0.9908362
[240,] 0.1234286 0.000000e+00 0.9896907
[241,] 0.1235000 0.000000e+00 0.9902635
[242,] 0.1235714 0.000000e+00 0.9908362
[243,] 0.1235714 0.000000e+00 0.9908362
[244,] 0.1235714 0.000000e+00 0.9908362
[245,] 0.1235714 0.000000e+00 0.9908362
[246,] 0.1235714 0.000000e+00 0.9908362
[247,] 0.1235714 0.000000e+00 0.9908362
[248,] 0.1235000 0.000000e+00 0.9902635
[249,] 0.1235714 0.000000e+00 0.9908362
[250,] 0.1235714 0.000000e+00 0.9908362
[251,] 0.1235714 0.000000e+00 0.9908362
[252,] 0.1235714 0.000000e+00 0.9908362
[253,] 0.1235000 0.000000e+00 0.9902635
[254,] 0.1235714 0.000000e+00 0.9908362
[255,] 0.1235000 0.000000e+00 0.9902635
[256,] 0.1235000 0.000000e+00 0.9902635
[257,] 0.1235714 0.000000e+00 0.9908362
[258,] 0.1235000 0.000000e+00 0.9902635
[259,] 0.1235000 0.000000e+00 0.9902635
[260,] 0.1235000 0.000000e+00 0.9902635
[261,] 0.1235000 0.000000e+00 0.9902635
[262,] 0.1235000 0.000000e+00 0.9902635
[263,] 0.1235000 0.000000e+00 0.9902635
[264,] 0.1235000 0.000000e+00 0.9902635
[265,] 0.1235000 0.000000e+00 0.9902635
[266,] 0.1235000 0.000000e+00 0.9902635
[267,] 0.1235000 0.000000e+00 0.9902635
[268,] 0.1235000 0.000000e+00 0.9902635
[269,] 0.1235000 0.000000e+00 0.9902635
[270,] 0.1235000 0.000000e+00 0.9902635
[271,] 0.1235000 0.000000e+00 0.9902635
[272,] 0.1235000 0.000000e+00 0.9902635
[273,] 0.1235000 0.000000e+00 0.9902635
[274,] 0.1235000 0.000000e+00 0.9902635
[275,] 0.1235000 0.000000e+00 0.9902635
[276,] 0.1235000 0.000000e+00 0.9902635
[277,] 0.1235000 0.000000e+00 0.9902635
[278,] 0.1235000 0.000000e+00 0.9902635
[279,] 0.1235000 0.000000e+00 0.9902635
[280,] 0.1235714 0.000000e+00 0.9908362
[281,] 0.1235714 0.000000e+00 0.9908362
[282,] 0.1234286 0.000000e+00 0.9896907
[283,] 0.1235000 0.000000e+00 0.9902635
[284,] 0.1235714 0.000000e+00 0.9908362
[285,] 0.1235714 0.000000e+00 0.9908362
[286,] 0.1235714 0.000000e+00 0.9908362
[287,] 0.1235000 0.000000e+00 0.9902635
[288,] 0.1235714 0.000000e+00 0.9908362
[289,] 0.1235714 0.000000e+00 0.9908362
[290,] 0.1235714 0.000000e+00 0.9908362
[291,] 0.1235714 0.000000e+00 0.9908362
[292,] 0.1235714 0.000000e+00 0.9908362
[293,] 0.1235714 0.000000e+00 0.9908362
[294,] 0.1236429 0.000000e+00 0.9914089
[295,] 0.1236429 0.000000e+00 0.9914089
[296,] 0.1236429 0.000000e+00 0.9914089
[297,] 0.1236429 0.000000e+00 0.9914089
[298,] 0.1235714 0.000000e+00 0.9908362
[299,] 0.1235714 0.000000e+00 0.9908362
[300,] 0.1236429 0.000000e+00 0.9914089
[301,] 0.1236429 0.000000e+00 0.9914089
[302,] 0.1236429 0.000000e+00 0.9914089
[303,] 0.1236429 0.000000e+00 0.9914089
[304,] 0.1236429 0.000000e+00 0.9914089
[305,] 0.1236429 0.000000e+00 0.9914089
[306,] 0.1235714 0.000000e+00 0.9908362
[307,] 0.1236429 0.000000e+00 0.9914089
[308,] 0.1236429 0.000000e+00 0.9914089
[309,] 0.1236429 0.000000e+00 0.9914089
[310,] 0.1235714 0.000000e+00 0.9908362
[311,] 0.1235714 0.000000e+00 0.9908362
[312,] 0.1235714 0.000000e+00 0.9908362
[313,] 0.1235714 0.000000e+00 0.9908362
[314,] 0.1235714 0.000000e+00 0.9908362
[315,] 0.1235000 0.000000e+00 0.9902635
[316,] 0.1235000 0.000000e+00 0.9902635
[317,] 0.1235000 0.000000e+00 0.9902635
[318,] 0.1234286 0.000000e+00 0.9896907
[319,] 0.1235714 0.000000e+00 0.9908362
[320,] 0.1235000 0.000000e+00 0.9902635
[321,] 0.1235714 0.000000e+00 0.9908362
[322,] 0.1236429 0.000000e+00 0.9914089
[323,] 0.1236429 0.000000e+00 0.9914089
[324,] 0.1236429 0.000000e+00 0.9914089
[325,] 0.1236429 0.000000e+00 0.9914089
[326,] 0.1237143 0.000000e+00 0.9919817
[327,] 0.1237143 0.000000e+00 0.9919817
[328,] 0.1237143 0.000000e+00 0.9919817
[329,] 0.1237143 0.000000e+00 0.9919817
[330,] 0.1237143 0.000000e+00 0.9919817
[331,] 0.1237143 0.000000e+00 0.9919817
[332,] 0.1237143 0.000000e+00 0.9919817
[333,] 0.1237143 0.000000e+00 0.9919817
[ reached getOption("max.print") -- omitted 168 rows ]
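Rather than reading the listing by eye, the tree count with the lowest OOB error can be located programmatically. The sketch below uses the first 20 OOB values copied from the listing above; the 0.001 tolerance and the variable names are illustrative assumptions.

```r
# OOB error for trees 1 to 20, copied from the Random_Forest$err.rate listing
oob <- c(0.1303419, 0.1313422, 0.1270649, 0.1266016, 0.1273536,
         0.1255144, 0.1248513, 0.1248717, 0.1233125, 0.1227918,
         0.1225223, 0.1221243, 0.1219477, 0.1226753, 0.1222651,
         0.1226766, 0.1221412, 0.1224009, 0.1231781, 0.1227406)

# Number of trees at which the OOB error is lowest
best_ntree <- which.min(oob)

# Tree counts whose OOB error is within 0.001 of that minimum
near_best <- which(oob - min(oob) < 0.001)

print(best_ntree)
print(near_best)
```

On the fitted object the same idea would be `which.min(Random_Forest$err.rate[, "OOB"])`. Because differences this small are within run-to-run noise, the exact position of the minimum should not be over-interpreted.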
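The OOB error reaches its lowest level after only a few trees (around tree 16 in the run discussed above) and does not improve meaningfully afterwards. Next we tune mtry for the random forest with tuneRF, and then look at variable importance by mean decrease in accuracy and mean decrease in Gini. Note that the call uses ntreeTry = 13 trees per trial and that, with doBest = TRUE, the final forest is refit with the default 500 trees, as its printout shows.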
library(randomForest)
tRandom_Forest <- tuneRF(x = train[,-c(1,2)],
y=as.factor(train$TARGET),
mtryStart = 7,
ntreeTry=13,
stepFactor = 1.5,
improve = 0.001,
trace=TRUE,
plot = TRUE,
doBest = TRUE,
nodesize = 140,
importance=TRUE
)
mtry = 7 OOB error = 12.21%
Searching left ...
mtry = 5 OOB error = 12.37%
-0.01245447 0.001
Searching right ...
mtry = 10 OOB error = 12.25%
-0.002571925 0.001
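The search trace above can be read as follows, assuming tuneRF moves mtry by stepFactor in each direction and compares the relative improvement 1 - new/old against improve. The sketch reproduces that arithmetic from the rounded percentages in the trace, so the improvements differ slightly from the printed -0.01245447 and -0.002571925, which are based on unrounded errors. All variable names are illustrative.

```r
mtry_start  <- 7
step_factor <- 1.5
improve_min <- 0.001

# Candidate mtry values one step to the left and to the right
mtry_left  <- max(1, ceiling(mtry_start / step_factor))
mtry_right <- floor(mtry_start * step_factor)

# OOB errors as printed in the trace (rounded to two decimals of a percent)
oob_start <- 0.1221
oob_left  <- 0.1237
oob_right <- 0.1225

# Relative improvement of a new error over the current best
rel_improve <- function(err_new, err_old) 1 - err_new / err_old

improve_left  <- rel_improve(oob_left, oob_start)
improve_right <- rel_improve(oob_right, oob_start)

print(c(mtry_left = mtry_left, mtry_right = mtry_right))
print(c(left = improve_left, right = improve_right))
```

Both improvements are negative, so neither direction beats the threshold of 0.001 and the search stays at mtry = 7, which matches the tuned model printed below the trace.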
print(tRandom_Forest)
Call:
randomForest(x = x, y = y, mtry = res[which.min(res[, 2]), 1], nodesize = 140, importance = TRUE)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 7
OOB estimate of error rate: 12.39%
Confusion matrix:
0 1 class.error
0 12254 0 0.0000000
1 1735 11 0.9936999
tRandom_Forest$importance
0 1 MeanDecreaseAccuracy MeanDecreaseGini
AGE 5.932247e-04 3.238225e-03 9.223429e-04 17.5608787
GENDER 5.076717e-04 1.480950e-03 6.288553e-04 7.8265449
BALANCE 1.187693e-03 1.674465e-02 3.126277e-03 50.4338403
OCCUPATION 1.838211e-03 1.386206e-02 3.336996e-03 34.7108695
AGE_BKT 8.259261e-04 6.532289e-03 1.534123e-03 25.2933806
SCR 1.255491e-03 1.351321e-02 2.781627e-03 50.6646445
HOLDING_PERIOD 9.190341e-03 1.494535e-02 9.907399e-03 49.3860194
ACC_TYPE 1.365082e-03 -1.066990e-03 1.064522e-03 2.3964117
ACC_OP_DATE 6.561090e-03 1.189101e-03 5.886271e-03 30.5775669
LEN_OF_RLTN_IN_MNTH 6.273823e-03 -1.036106e-03 5.357710e-03 21.5287976
NO_OF_L_CR_TXNS 2.096151e-02 -8.529834e-03 1.729329e-02 45.6320700
NO_OF_L_DR_TXNS 3.791543e-02 -1.248248e-02 3.164258e-02 26.9084114
TOT_NO_OF_L_TXNS 2.686632e-02 -1.570285e-02 2.156086e-02 46.5631157
NO_OF_BR_CSH_WDL_DR_TXNS 1.377455e-03 1.798422e-03 1.431976e-03 9.4886590
NO_OF_ATM_DR_TXNS 3.164534e-02 -1.461622e-02 2.589405e-02 13.7124619
NO_OF_NET_DR_TXNS 4.762323e-03 -1.544224e-03 3.969718e-03 6.2904795
NO_OF_MOB_DR_TXNS 2.167643e-03 -1.400384e-03 1.717745e-03 2.5774314
NO_OF_CHQ_DR_TXNS 6.725825e-03 1.396583e-03 6.059643e-03 12.6928994
FLG_HAS_CC 1.111514e-03 9.385764e-03 2.141909e-03 18.5910504
AMT_ATM_DR 1.560734e-02 -4.870681e-03 1.304836e-02 24.5654222
AMT_BR_CSH_WDL_DR 5.175306e-03 1.161031e-03 4.675804e-03 23.2062579
AMT_CHQ_DR 1.016798e-02 -2.699261e-03 8.561548e-03 20.2982371
AMT_NET_DR 5.580747e-03 -1.469755e-04 4.864232e-03 16.0770231
AMT_MOB_DR 2.662413e-03 1.095472e-03 2.463750e-03 12.7798973
AMT_L_DR 1.145189e-02 -3.329270e-03 9.610738e-03 31.6845867
FLG_HAS_ANY_CHGS 8.915067e-05 4.378423e-04 1.324044e-04 3.7201612
AMT_OTH_BK_ATM_USG_CHGS -1.512057e-05 1.070626e-04 9.683170e-10 0.3906525
AMT_MIN_BAL_NMC_CHGS 1.965656e-05 4.391212e-04 7.223683e-05 1.4938466
NO_OF_IW_CHQ_BNC_TXNS 4.750908e-05 2.046811e-04 6.689298e-05 1.8395124
NO_OF_OW_CHQ_BNC_TXNS 3.420120e-05 2.333444e-04 5.900232e-05 2.0296702
AVG_AMT_PER_ATM_TXN 1.308650e-02 -4.772692e-03 1.086068e-02 24.6949837
AVG_AMT_PER_CSH_WDL_TXN 4.300335e-03 7.484263e-04 3.861247e-03 20.5845711
AVG_AMT_PER_CHQ_TXN 1.011593e-02 -3.807532e-03 8.376979e-03 20.2305099
AVG_AMT_PER_NET_TXN 4.613202e-03 -1.970163e-04 4.009455e-03 16.5712449
AVG_AMT_PER_MOB_TXN 3.257375e-03 7.712126e-04 2.951839e-03 13.4358719
FLG_HAS_NOMINEE 1.238161e-05 3.042300e-05 1.471734e-05 1.0512241
FLG_HAS_OLD_LOAN 6.629774e-05 3.634775e-04 1.030548e-04 1.7875054
random -5.384687e-05 -1.090484e-05 -4.836985e-05 11.0100596
varImpPlot(tRandom_Forest,
sort = T,
main="Variable Importance",
n.var=37)
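One way to read the importance table is to rank variables by MeanDecreaseGini and compare them with the `random` column. The sketch below does this for a handful of rows retyped from the output above, on the assumption that `random` is a pure noise column (for example, the random number used to split the data). Gini importance tends to favour continuous variables, so treat the comparison as a rough screen rather than a test.

```r
# A few rows copied from tRandom_Forest$importance (MeanDecreaseGini column only).
imp <- data.frame(
  variable = c("SCR", "BALANCE", "HOLDING_PERIOD", "TOT_NO_OF_L_TXNS",
               "ACC_TYPE", "FLG_HAS_NOMINEE", "random"),
  MeanDecreaseGini = c(50.6646445, 50.4338403, 49.3860194, 46.5631157,
                       2.3964117, 1.0512241, 11.0100596),
  stringsAsFactors = FALSE
)

# Rank from most to least important.
imp <- imp[order(-imp$MeanDecreaseGini), ]

# Use the noise column as a baseline (an assumption about what "random" is).
baseline <- imp$MeanDecreaseGini[imp$variable == "random"]
imp$above_noise <- imp$MeanDecreaseGini > baseline

print(imp, row.names = FALSE)
```

On the full fitted object the same ranking could start from `importance(tRandom_Forest)` instead of retyped values.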
Scoring: create columns for the predicted class and the predicted score
train$predict.class <- predict(tRandom_Forest, train, type="class")
train$predict.score <- predict(tRandom_Forest, train, type="prob")
head(train)
Model Performance Measures - Rank ordering - TrainSet
library("StatMeasures")
package 'StatMeasures' was built under R version 3.6.3
Attaching package: 'StatMeasures'
The following object is masked _by_ '.GlobalEnv':
decile
The following object is masked from 'package:VIM':
mape
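# Assign each score to a decile: 1 = lowest scores, 10 = highest scores.
decile <- function(x){
  # Cut points at the 10th, 20th, ..., 100th percentiles.
  # An integer loop index avoids relying on floating-point values such as 0.3 * 10.
  deciles <- vector(length=10)
  for (i in 1:10){
    deciles[i] <- quantile(x, i/10, na.rm=TRUE)
  }
  return (
    ifelse(x<deciles[1], 1,
    ifelse(x<deciles[2], 2,
    ifelse(x<deciles[3], 3,
    ifelse(x<deciles[4], 4,
    ifelse(x<deciles[5], 5,
    ifelse(x<deciles[6], 6,
    ifelse(x<deciles[7], 7,
    ifelse(x<deciles[8], 8,
    ifelse(x<deciles[9], 9, 10
    ))))))))))
}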
train$deciles <- decile(train$predict.score[,2])
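The nested `ifelse` chain can also be written with `findInterval`, which counts how many cut points are less than or equal to each value. The sketch below is an alternative under that reading, not the function used in this project. The name `decile_fi` and the toy scores are made up for illustration.

```r
# Compact decile assignment: 1 = lowest scores, 10 = highest scores.
decile_fi <- function(x) {
  # Nine cut points at the 10th, 20th, ..., 90th percentiles.
  cuts <- quantile(x, probs = (1:9) / 10, na.rm = TRUE, names = FALSE)
  # findInterval returns 0 for values below the first cut point, 9 at or above the last.
  findInterval(x, cuts) + 1L
}

# Toy scores: 100 distinct values should fall evenly into 10 groups.
scores <- 1:100
print(table(decile_fi(scores)))
```

When many scores are tied (for example, a large block of identical predicted probabilities), the groups will not be equal in size with either version.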
library(data.table)
tmp_DT_rf = data.table(train)
rank_rf <- tmp_DT_rf[, list(
cnt = length(TARGET),
cnt_resp = sum(TARGET),
cnt_non_resp = sum(TARGET == 0)),
by=deciles][order(-deciles)]
rank_rf$rrate <- round (rank_rf$cnt_resp / rank_rf$cnt,2);
rank_rf$cum_resp <- cumsum(rank_rf$cnt_resp)
rank_rf$cum_non_resp <- cumsum(rank_rf$cnt_non_resp)
rank_rf$cum_rel_resp <- round(rank_rf$cum_resp / sum(rank_rf$cnt_resp),2);
rank_rf$cum_rel_non_resp <- round(rank_rf$cum_non_resp / sum(rank_rf$cnt_non_resp),2);
rank_rf$ks <- abs(rank_rf$cum_rel_resp - rank_rf$cum_rel_non_resp)
library(scales)
package 'scales' was built under R version 3.6.3
rank_rf$rrate <- percent(rank_rf$rrate)
rank_rf$cum_rel_resp <- percent(rank_rf$cum_rel_resp)
rank_rf$cum_rel_non_resp <- percent(rank_rf$cum_rel_non_resp)
rank_rf
Receiver Operating Characteristic (ROC): A ROC curve is a way to compare classifiers (it was originally used to compare diagnostic tests). It plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) across all score cutoffs, so it shows the trade-off between sensitivity and specificity.
library(ROCR)
pred_rf <- prediction(train$predict.score[,2],train$TARGET)
perf_rf <- performance(pred_rf, "tpr", "fpr")
plot(perf_rf)
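The ROC curve can be summarised by two numbers: the area under the curve (AUC) and the KS statistic, the largest gap between the true positive rate and the false positive rate. The sketch below computes both in base R on simulated data. The vectors `target` and `score` are synthetic stand-ins for `train$TARGET` and `train$predict.score[,2]`, and the 12.5% response rate is only chosen to resemble this dataset.

```r
set.seed(42)

# Synthetic data: roughly 12.5% responders, with somewhat higher scores for responders.
n      <- 1000
target <- rbinom(n, 1, 0.125)
score  <- ifelse(target == 1, rbeta(n, 3, 5), rbeta(n, 2, 6))

n_pos <- sum(target == 1)
n_neg <- sum(target == 0)

# AUC via the rank-sum (Mann-Whitney) formula: the probability that a randomly
# chosen responder scores higher than a randomly chosen non-responder.
auc <- (sum(rank(score)[target == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# ROC points: walk down the scores from highest to lowest.
ord <- order(score, decreasing = TRUE)
tpr <- cumsum(target[ord] == 1) / n_pos
fpr <- cumsum(target[ord] == 0) / n_neg

# KS: maximum vertical distance between the ROC curve and the diagonal.
ks <- max(tpr - fpr)

print(c(auc = auc, ks = ks))
```

With the ROCR objects already created above, `performance(pred_rf, "auc")@y.values[[1]]` should give the AUC directly, and `max(perf_rf@y.values[[1]] - perf_rf@x.values[[1]])` the KS.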
Validation of the model on the test set, using the rank ordering technique. As with the train set, we first create two columns: the predicted class and the predicted score.
test$predict.class <- predict(tRandom_Forest, test, type="class")
test$predict.score <- predict(tRandom_Forest, test, type="prob")
test$deciles <- decile(test$predict.score[,2])
Model Performance Measures - Rank ordering - Test Set
tmp_DT_rf2 = data.table(test)
h_rank_rf2 <- tmp_DT_rf2[, list(
cnt = length(TARGET),
cnt_resp = sum(TARGET),
cnt_non_resp = sum(TARGET == 0)) ,
by=deciles][order(-deciles)]
h_rank_rf2$rrate <- round (h_rank_rf2$cnt_resp / h_rank_rf2$cnt,2);
h_rank_rf2$cum_resp <- cumsum(h_rank_rf2$cnt_resp)
h_rank_rf2$cum_non_resp <- cumsum(h_rank_rf2$cnt_non_resp)
h_rank_rf2$cum_rel_resp <- round(h_rank_rf2$cum_resp / sum(h_rank_rf2$cnt_resp),2);
h_rank_rf2$cum_rel_non_resp <- round(h_rank_rf2$cum_non_resp / sum(h_rank_rf2$cnt_non_resp),2);
h_rank_rf2$ks <- abs(h_rank_rf2$cum_rel_resp - h_rank_rf2$cum_rel_non_resp)
library(scales)
h_rank_rf2$rrate <- percent(h_rank_rf2$rrate)
h_rank_rf2$cum_rel_resp <- percent(h_rank_rf2$cum_rel_resp)
h_rank_rf2$cum_rel_non_resp <- percent(h_rank_rf2$cum_rel_non_resp)
h_rank_rf2
ROC curve (test set)
library(ROCR)
pred_rf2 <- prediction(test$predict.score[,2],test$TARGET)
perf_rf2 <- performance(pred_rf2, "tpr", "fpr")
plot(perf_rf2)
Confusion matrix: confusion_matrix1 is for the train set and confusion_matrix2 is for the test set
library(caret)
library(e1071)
package 'e1071' was built under R version 3.6.3
train$TARGET = as.factor(train$TARGET)
class(train$TARGET)
[1] "factor"
test$TARGET = as.factor(test$TARGET)
class(test$predict.class)
[1] "factor"
#Train
confusion_matrix1 = confusionMatrix(train$predict.class, train$TARGET, positive = "1")
print(confusion_matrix1)
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 12254 1725
1 0 21
Accuracy : 0.8768
95% CI : (0.8712, 0.8822)
No Information Rate : 0.8753
P-Value [Acc > NIR] : 0.3008
Kappa : 0.0209
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.01203
Specificity : 1.00000
Pos Pred Value : 1.00000
Neg Pred Value : 0.87660
Prevalence : 0.12471
Detection Rate : 0.00150
Detection Prevalence : 0.00150
Balanced Accuracy : 0.50601
'Positive' Class : 1
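An accuracy of 0.8768 looks good until it is set against the no information rate of 0.8753. The sketch below recomputes the key statistics from the four counts printed above, to show where Kappa and balanced accuracy come from. The counts are retyped by hand and the variable names are illustrative.

```r
# Train confusion matrix copied from the caret output (rows = prediction, columns = reference).
cm <- matrix(c(12254, 1725,
                   0,   21),
             nrow = 2, byrow = TRUE,
             dimnames = list(prediction = c("0", "1"), reference = c("0", "1")))

n  <- sum(cm)
tp <- cm["1", "1"]; tn <- cm["0", "0"]
fn <- cm["0", "1"]; fp <- cm["1", "0"]

accuracy    <- (tp + tn) / n
nir         <- max(colSums(cm)) / n        # accuracy of always predicting the majority class
sensitivity <- tp / (tp + fn)              # share of actual responders that were caught
specificity <- tn / (tn + fp)
balanced_accuracy <- (sensitivity + specificity) / 2

# Cohen's kappa: agreement beyond what the row and column totals alone would produce.
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, nir = nir, sensitivity = sensitivity,
              specificity = specificity, balanced_accuracy = balanced_accuracy,
              kappa = cohen_kappa), 5))
```

The model predicts class 1 for only 21 of 14,000 customers, so at the default cutoff it behaves almost like a rule that always predicts 0. This is why Kappa is close to 0 and balanced accuracy is close to 0.5.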
#Test
confusion_matrix2 = confusionMatrix(test$predict.class, test$TARGET, positive = "1")
print(confusion_matrix2)