The data set HMEQ reports characteristics and delinquency information for 5,960 home equity loans. A home equity loan is a loan where the obligor uses the equity of his or her home as the underlying collateral. The data set has the following characteristics:
◾ BAD: 1 = applicant defaulted on loan or seriously delinquent; 0 = applicant paid loan
◾ LOAN: Amount of the loan request
◾ MORTDUE: Amount due on existing mortgage
◾ VALUE: Value of current property
◾ REASON: DebtCon = debt consolidation; HomeImp = home improvement
◾ JOB: Occupational categories
◾ YOJ: Years at present job
◾ DEROG: Number of major derogatory reports
◾ DELINQ: Number of delinquent credit lines
◾ CLAGE: Age of oldest credit line in months
◾ NINQ: Number of recent credit inquiries
◾ CLNO: Number of credit lines
◾ DEBTINC: Debt-to-income ratio
library(ff)
## Loading required package: bit
##
## Attaching package: 'bit'
## The following object is masked from 'package:base':
##
## xor
## Attaching package ff
## - getOption("fftempdir")=="C:/Users/HP/AppData/Local/Temp/RtmpiKgMyf/ff"
## - getOption("ffextension")=="ff"
## - getOption("ffdrop")==TRUE
## - getOption("fffinonexit")==TRUE
## - getOption("ffpagesize")==65536
## - getOption("ffcaching")=="mmnoflush" -- consider "ffeachflush" if your system stalls on large writes
## - getOption("ffbatchbytes")==38514196.48 -- consider a different value for tuning your system
## - getOption("ffmaxbytes")==1925709824 -- consider a different value for tuning your system
##
## Attaching package: 'ff'
## The following objects are masked from 'package:utils':
##
## write.csv, write.csv2
## The following objects are masked from 'package:base':
##
## is.factor, is.ordered
library(vroom)
library(bigmemory)
##
## Attaching package: 'bigmemory'
## The following object is masked from 'package:ff':
##
## is.readonly
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
library(readr)
## Registered S3 methods overwritten by 'readr':
## method from
## format.col_spec vroom
## print.col_spec vroom
## print.collector vroom
## print.date_names vroom
## print.locale vroom
## str.col_spec vroom
##
## Attaching package: 'readr'
## The following objects are masked from 'package:vroom':
##
## as.col_spec, col_character, col_date, col_datetime, col_double,
## col_factor, col_guess, col_integer, col_logical, col_number,
## col_skip, col_time, cols, cols_condense, cols_only, date_names,
## date_names_lang, date_names_langs, default_locale, fwf_cols,
## fwf_empty, fwf_positions, fwf_widths, locale, output_column,
## problems, spec
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(DataExplorer)
library(VIM)
## Loading required package: colorspace
## Loading required package: grid
## VIM is ready to use.
## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
##
## Attaching package: 'VIM'
## The following object is masked from 'package:datasets':
##
## sleep
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
# Read the HMEQ data; empty strings are treated as missing values (NA)
Home=read.csv("hmeq.csv",na.strings = "", stringsAsFactors = TRUE)
str(Home)
## 'data.frame': 5960 obs. of 13 variables:
## $ BAD : int 1 1 1 1 0 1 1 1 1 1 ...
## $ LOAN : int 1100 1300 1500 1500 1700 1700 1800 1800 2000 2000 ...
## $ MORTDUE: num 25860 70053 13500 NA 97800 ...
## $ VALUE : num 39025 68400 16700 NA 112000 ...
## $ REASON : Factor w/ 2 levels "DebtCon","HomeImp": 2 2 2 NA 2 2 2 2 2 2 ...
## $ JOB : Factor w/ 6 levels "Mgr","Office",..: 3 3 3 NA 2 3 3 3 3 5 ...
## $ YOJ : num 10.5 7 4 NA 3 9 5 11 3 16 ...
## $ DEROG : int 0 0 0 NA 0 0 3 0 0 0 ...
## $ DELINQ : int 0 2 0 NA 0 0 2 0 2 0 ...
## $ CLAGE : num 94.4 121.8 149.5 NA 93.3 ...
## $ NINQ : int 1 0 1 NA 0 1 1 0 1 0 ...
## $ CLNO : int 9 14 10 NA 14 8 17 8 12 13 ...
## $ DEBTINC: num NA NA NA NA NA ...
summary(Home)
## BAD LOAN MORTDUE VALUE
## Min. :0.0000 Min. : 1100 Min. : 2063 Min. : 8000
## 1st Qu.:0.0000 1st Qu.:11100 1st Qu.: 46276 1st Qu.: 66076
## Median :0.0000 Median :16300 Median : 65019 Median : 89236
## Mean :0.1995 Mean :18608 Mean : 73761 Mean :101776
## 3rd Qu.:0.0000 3rd Qu.:23300 3rd Qu.: 91488 3rd Qu.:119824
## Max. :1.0000 Max. :89900 Max. :399550 Max. :855909
## NA's :518 NA's :112
## REASON JOB YOJ DEROG
## DebtCon:3928 Mgr : 767 Min. : 0.000 Min. : 0.0000
## HomeImp:1780 Office : 948 1st Qu.: 3.000 1st Qu.: 0.0000
## NA's : 252 Other :2388 Median : 7.000 Median : 0.0000
## ProfExe:1276 Mean : 8.922 Mean : 0.2546
## Sales : 109 3rd Qu.:13.000 3rd Qu.: 0.0000
## Self : 193 Max. :41.000 Max. :10.0000
## NA's : 279 NA's :515 NA's :708
## DELINQ CLAGE NINQ CLNO
## Min. : 0.0000 Min. : 0.0 Min. : 0.000 Min. : 0.0
## 1st Qu.: 0.0000 1st Qu.: 115.1 1st Qu.: 0.000 1st Qu.:15.0
## Median : 0.0000 Median : 173.5 Median : 1.000 Median :20.0
## Mean : 0.4494 Mean : 179.8 Mean : 1.186 Mean :21.3
## 3rd Qu.: 0.0000 3rd Qu.: 231.6 3rd Qu.: 2.000 3rd Qu.:26.0
## Max. :15.0000 Max. :1168.2 Max. :17.000 Max. :71.0
## NA's :580 NA's :308 NA's :510 NA's :222
## DEBTINC
## Min. : 0.5245
## 1st Qu.: 29.1400
## Median : 34.8183
## Mean : 33.7799
## 3rd Qu.: 39.0031
## Max. :203.3121
## NA's :1267
colSums(is.na(Home))
## BAD LOAN MORTDUE VALUE REASON JOB YOJ DEROG DELINQ CLAGE
## 0 0 518 112 252 279 515 708 580 308
## NINQ CLNO DEBTINC
## 510 222 1267
colSums(Home=="")
## BAD LOAN MORTDUE VALUE REASON JOB YOJ DEROG DELINQ CLAGE
## 0 0 NA NA NA NA NA NA NA NA
## NINQ CLNO DEBTINC
## NA NA NA
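The `NA` results above do not mean the check failed. `Home == ""` evaluates to `NA` wherever a cell is already missing, and `colSums()` propagates `NA` unless told to skip it. A minimal sketch on a toy data frame (the `toy` object and its values are made up for illustration and are not part of HMEQ):

```r
# Toy data frame (hypothetical values, not HMEQ) with one empty string and one NA
toy <- data.frame(
  a = c("x", "", NA),
  b = c("y", "z", "w"),
  stringsAsFactors = FALSE
)

# Comparing NA with "" yields NA, so the sum for column `a` becomes NA
with_na <- colSums(toy == "")
with_na

# Skipping NAs counts only the genuinely empty strings
empty_counts <- colSums(toy == "", na.rm = TRUE)
empty_counts
```

Applied to the real data, `colSums(Home == "", na.rm = TRUE)` would be the corresponding call; since `na.strings = ""` already turned empty strings into `NA` on import, it would be expected to return zeros.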
Home$BAD=as.factor(Home$BAD)
Home1=kNN(Home)
summary(Home1)
## BAD LOAN MORTDUE VALUE REASON
## 0:4771 Min. : 1100 Min. : 2063 Min. : 8000 DebtCon:4072
## 1:1189 1st Qu.:11100 1st Qu.: 44863 1st Qu.: 66256 HomeImp:1888
## Median :16300 Median : 63466 Median : 89003
## Mean :18608 Mean : 71819 Mean :101468
## 3rd Qu.:23300 3rd Qu.: 89317 3rd Qu.:119361
## Max. :89900 Max. :399550 Max. :855909
## JOB YOJ DEROG DELINQ
## Mgr : 872 Min. : 0.000 Min. : 0.0000 Min. : 0.0000
## Office : 984 1st Qu.: 3.000 1st Qu.: 0.0000 1st Qu.: 0.0000
## Other :2467 Median : 7.000 Median : 0.0000 Median : 0.0000
## ProfExe:1333 Mean : 8.949 Mean : 0.2574 Mean : 0.4921
## Sales : 110 3rd Qu.:13.000 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Self : 194 Max. :41.000 Max. :10.0000 Max. :15.0000
## CLAGE NINQ CLNO DEBTINC
## Min. : 0.0 Min. : 0.000 Min. : 0.00 Min. : 0.5245
## 1st Qu.: 115.8 1st Qu.: 0.000 1st Qu.:15.00 1st Qu.: 29.9500
## Median : 171.8 Median : 1.000 Median :20.00 Median : 35.4095
## Mean : 180.0 Mean : 1.159 Mean :21.35 Mean : 34.3276
## 3rd Qu.: 230.7 3rd Qu.: 2.000 3rd Qu.:26.00 3rd Qu.: 39.3376
## Max. :1168.2 Max. :17.000 Max. :71.00 Max. :203.3122
## BAD_imp LOAN_imp MORTDUE_imp VALUE_imp
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:5960 FALSE:5960 FALSE:5442 FALSE:5848
## TRUE :518 TRUE :112
##
##
##
## REASON_imp JOB_imp YOJ_imp DEROG_imp
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:5708 FALSE:5681 FALSE:5445 FALSE:5252
## TRUE :252 TRUE :279 TRUE :515 TRUE :708
##
##
##
## DELINQ_imp CLAGE_imp NINQ_imp CLNO_imp
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:5380 FALSE:5652 FALSE:5450 FALSE:5738
## TRUE :580 TRUE :308 TRUE :510 TRUE :222
##
##
##
## DEBTINC_imp
## Mode :logical
## FALSE:4693
## TRUE :1267
##
##
##
Home2=subset(Home1,select=BAD:DEBTINC)
str(Home2)
## 'data.frame': 5960 obs. of 13 variables:
## $ BAD : Factor w/ 2 levels "0","1": 2 2 2 2 1 2 2 2 2 2 ...
## $ LOAN : int 1100 1300 1500 1500 1700 1700 1800 1800 2000 2000 ...
## $ MORTDUE: num 25860 70053 13500 52663 97800 ...
## $ VALUE : num 39025 68400 16700 70400 112000 ...
## $ REASON : Factor w/ 2 levels "DebtCon","HomeImp": 2 2 2 2 2 2 2 2 2 2 ...
## $ JOB : Factor w/ 6 levels "Mgr","Office",..: 3 3 3 3 2 3 3 3 3 5 ...
## $ YOJ : num 10.5 7 4 18 3 9 5 11 3 16 ...
## $ DEROG : int 0 0 0 1 0 0 3 0 0 0 ...
## $ DELINQ : int 0 2 0 0 0 0 2 0 2 0 ...
## $ CLAGE : num 94.4 121.8 149.5 122.8 93.3 ...
## $ NINQ : int 1 0 1 2 0 1 1 0 1 0 ...
## $ CLNO : int 9 14 10 19 14 8 17 8 12 13 ...
## $ DEBTINC: num 36.9 40.6 33.2 36.8 29.9 ...
summary(Home2)
## BAD LOAN MORTDUE VALUE REASON
## 0:4771 Min. : 1100 Min. : 2063 Min. : 8000 DebtCon:4072
## 1:1189 1st Qu.:11100 1st Qu.: 44863 1st Qu.: 66256 HomeImp:1888
## Median :16300 Median : 63466 Median : 89003
## Mean :18608 Mean : 71819 Mean :101468
## 3rd Qu.:23300 3rd Qu.: 89317 3rd Qu.:119361
## Max. :89900 Max. :399550 Max. :855909
## JOB YOJ DEROG DELINQ
## Mgr : 872 Min. : 0.000 Min. : 0.0000 Min. : 0.0000
## Office : 984 1st Qu.: 3.000 1st Qu.: 0.0000 1st Qu.: 0.0000
## Other :2467 Median : 7.000 Median : 0.0000 Median : 0.0000
## ProfExe:1333 Mean : 8.949 Mean : 0.2574 Mean : 0.4921
## Sales : 110 3rd Qu.:13.000 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Self : 194 Max. :41.000 Max. :10.0000 Max. :15.0000
## CLAGE NINQ CLNO DEBTINC
## Min. : 0.0 Min. : 0.000 Min. : 0.00 Min. : 0.5245
## 1st Qu.: 115.8 1st Qu.: 0.000 1st Qu.:15.00 1st Qu.: 29.9500
## Median : 171.8 Median : 1.000 Median :20.00 Median : 35.4095
## Mean : 180.0 Mean : 1.159 Mean :21.35 Mean : 34.3276
## 3rd Qu.: 230.7 3rd Qu.: 2.000 3rd Qu.:26.00 3rd Qu.: 39.3376
## Max. :1168.2 Max. :17.000 Max. :71.00 Max. :203.3122
dim(Home2)
## [1] 5960 13
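`kNN()` appended one logical `*_imp` indicator per column, and `subset(Home1, select = BAD:DEBTINC)` dropped them again by position. A name-based filter is one alternative that does not depend on column order. The sketch below uses a small invented data frame (`imputed` is a stand-in, not the real `Home1`):

```r
# Toy stand-in for an imputed data frame (values are illustrative only)
imputed <- data.frame(
  BAD = c(0, 1),
  LOAN = c(1100, 1300),
  BAD_imp = c(FALSE, FALSE),
  LOAN_imp = c(FALSE, TRUE)
)

# Keep every column whose name does not end in "_imp"
clean <- imputed[, !grepl("_imp$", names(imputed)), drop = FALSE]
names(clean)
```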
table(Home2$BAD)
##
## 0 1
## 4771 1189
plot_str(Home2)
plot_intro(Home2)
plot_missing(Home2)
plot_bar(Home2)
fig <- Home2 %>% plot_ly(labels = ~REASON, values = ~LOAN)
fig <- fig %>% add_pie(hole = 0.6)
fig <- fig %>% layout(title = "Donut charts Showing Reason for Loan",showlegend = F,
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
fig
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
set.seed(1234)
Train=createDataPartition(Home2$BAD,p=0.8,list=F)
Training=Home2[Train,]
Testing=Home2[-Train,]
dim(Training)
## [1] 4769 13
dim(Testing)
## [1] 1191 13
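`createDataPartition()` samples within each level of `BAD`, so both parts keep roughly the same share of defaults. The following is a base-R sketch of that idea, not caret's actual implementation. The label vector `y` is synthetic and only mimics the class counts reported by `table(Home2$BAD)`:

```r
set.seed(1234)

# Synthetic labels with the class counts reported above (not the HMEQ rows)
y <- factor(c(rep("0", 4771), rep("1", 1189)))

# Sample 80% of the row indices within each class, then combine them
idx_by_class <- split(seq_along(y), y)
train_idx <- unlist(lapply(idx_by_class, function(i) {
  i[sample.int(length(i), size = ceiling(0.8 * length(i)))]
}), use.names = FALSE)

# Class proportions should be nearly identical in both parts
train_share <- prop.table(table(y[train_idx]))
test_share  <- prop.table(table(y[-train_idx]))
train_share
test_share
```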
Model=glm(BAD~.,data=Training,family="binomial")
summary(Model)
##
## Call:
## glm(formula = BAD ~ ., family = "binomial", data = Training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9729 -0.5828 -0.3815 -0.1973 3.7714
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.760e+00 3.265e-01 -14.578 < 2e-16 ***
## LOAN -2.099e-05 4.840e-06 -4.337 1.45e-05 ***
## MORTDUE -6.189e-06 1.669e-06 -3.709 0.000208 ***
## VALUE 3.755e-06 1.181e-06 3.179 0.001475 **
## REASONHomeImp 1.158e-01 9.747e-02 1.188 0.234902
## JOBOffice -1.672e-01 1.675e-01 -0.998 0.318284
## JOBOther 5.411e-01 1.334e-01 4.057 4.97e-05 ***
## JOBProfExe 5.393e-01 1.556e-01 3.466 0.000528 ***
## JOBSales 1.329e+00 3.024e-01 4.394 1.11e-05 ***
## JOBSelf 9.112e-01 2.560e-01 3.560 0.000371 ***
## YOJ -9.198e-03 6.376e-03 -1.442 0.149170
## DEROG 5.682e-01 5.527e-02 10.280 < 2e-16 ***
## DELINQ 6.676e-01 3.869e-02 17.255 < 2e-16 ***
## CLAGE -5.971e-03 6.336e-04 -9.424 < 2e-16 ***
## NINQ 1.875e-01 2.399e-02 7.815 5.49e-15 ***
## CLNO -2.292e-02 5.002e-03 -4.583 4.58e-06 ***
## DEBTINC 1.143e-01 7.569e-03 15.098 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4767.8 on 4768 degrees of freedom
## Residual deviance: 3523.2 on 4752 degrees of freedom
## AIC: 3557.2
##
## Number of Fisher Scoring iterations: 5
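The estimates above are on the log-odds scale, which makes their size hard to read directly. Exponentiating a coefficient gives an odds ratio: the multiplicative change in the odds of `BAD = 1` for a one-unit increase in that variable, holding the others fixed. A short sketch using three estimates copied from the table above:

```r
# Estimates copied from the summary above (log-odds scale)
log_odds <- c(DEROG = 0.5682, DELINQ = 0.6676, DEBTINC = 0.1143)

# exp() converts each coefficient to a multiplicative change in the odds
odds_ratio <- exp(log_odds)
round(odds_ratio, 2)
```

By this reading, each additional delinquent credit line roughly doubles the odds of default. With the fitted object at hand, `exp(coef(Model))` gives the same quantity for every term.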
varImp(Model)
## Overall
## LOAN 4.3366175
## MORTDUE 3.7086457
## VALUE 3.1794843
## REASONHomeImp 1.1878257
## JOBOffice 0.9979909
## JOBOther 4.0571441
## JOBProfExe 3.4663113
## JOBSales 4.3942196
## JOBSelf 3.5600124
## YOJ 1.4424682
## DEROG 10.2798163
## DELINQ 17.2545078
## CLAGE 9.4240578
## NINQ 7.8151774
## CLNO 4.5829456
## DEBTINC 15.0975999
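For this model the importance scores coincide with the absolute z values in the coefficient table (compare LOAN: z = -4.337, score 4.3366). The sketch below recomputes them for three terms from the printed estimates and standard errors. Because those printed values are rounded, the results agree with `varImp()` only to about two decimals:

```r
# Estimates and standard errors copied from the coefficient table above
est <- c(LOAN = -2.099e-05, DELINQ = 6.676e-01, DEBTINC = 1.143e-01)
se  <- c(LOAN =  4.840e-06, DELINQ = 3.869e-02, DEBTINC = 7.569e-03)

# Absolute Wald z statistic per term
abs_z <- abs(est / se)
sort(abs_z, decreasing = TRUE)
```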
P=predict(Model,Training,type="response")
P=ifelse(P>0.5,"1","0")
table(P,Training$BAD)
##
## P 0 1
## 0 3707 602
## 1 110 350
confusionMatrix(table(P,Training$BAD))
## Confusion Matrix and Statistics
##
##
## P 0 1
## 0 3707 602
## 1 110 350
##
## Accuracy : 0.8507
## 95% CI : (0.8403, 0.8607)
## No Information Rate : 0.8004
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.4204
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.9712
## Specificity : 0.3676
## Pos Pred Value : 0.8603
## Neg Pred Value : 0.7609
## Prevalence : 0.8004
## Detection Rate : 0.7773
## Detection Prevalence : 0.9035
## Balanced Accuracy : 0.6694
##
## 'Positive' Class : 0
##
P1=predict(Model,Testing,type="response")
P1=ifelse(P1>0.5,"1","0")
table(P1,Testing$BAD)
##
## P1 0 1
## 0 910 142
## 1 44 95
confusionMatrix(table(P1,Testing$BAD))
## Confusion Matrix and Statistics
##
##
## P1 0 1
## 0 910 142
## 1 44 95
##
## Accuracy : 0.8438
## 95% CI : (0.8219, 0.864)
## No Information Rate : 0.801
## P-Value [Acc > NIR] : 8.275e-05
##
## Kappa : 0.42
##
## Mcnemar's Test P-Value : 1.141e-12
##
## Sensitivity : 0.9539
## Specificity : 0.4008
## Pos Pred Value : 0.8650
## Neg Pred Value : 0.6835
## Prevalence : 0.8010
## Detection Rate : 0.7641
## Detection Prevalence : 0.8833
## Balanced Accuracy : 0.6774
##
## 'Positive' Class : 0
##
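Because the 'Positive' class in the output above is 0, the reported sensitivity describes how well non-defaulters are recognized. For a lender the class of interest is usually default (`BAD = 1`). The sketch below recomputes the key figures from the test-set table with 1 as the class of interest, using only the four counts printed above:

```r
# Test-set confusion matrix copied from above: rows = predicted, columns = actual
cm <- matrix(c(910, 44, 142, 95), nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))

accuracy <- sum(diag(cm)) / sum(cm)

# Treat default (BAD = 1) as the class of interest
recall_bad    <- cm["1", "1"] / sum(cm[, "1"])  # share of actual defaults caught
precision_bad <- cm["1", "1"] / sum(cm["1", ])  # share of predicted defaults that are real

round(c(accuracy = accuracy, recall_bad = recall_bad,
        precision_bad = precision_bad), 4)
```

At the 0.5 cutoff the model catches only about 40% of the actual defaults, even though overall accuracy is about 84%. Passing `positive = "1"` to `confusionMatrix()` should report the same two figures as sensitivity and positive predictive value.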
library(ROCR)
predictions <- predict(Model, newdata=Testing, type="response")
ROCRpred <- prediction(predictions, Testing$BAD)
ROCRperf <- performance(ROCRpred, measure = "tpr", x.measure = "fpr")
plot(ROCRperf, colorize = TRUE, text.adj = c(-0.2,1.7), print.cutoffs.at = seq(0,1,0.1))
auc <- performance(ROCRpred, measure = "auc")
auc <- auc@y.values[[1]]
auc
## [1] 0.816584
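The AUC can be read as the probability that a randomly chosen defaulter receives a higher predicted score than a randomly chosen non-defaulter. A small base-R sketch of that rank-based (Mann-Whitney) calculation follows. The helper name `auc_rank` and the four scores and labels are invented for illustration, and this is not how ROCR computes the value internally:

```r
# AUC from the rank-sum identity; `auc_rank` is a hypothetical helper
auc_rank <- function(score, label) {
  r  <- rank(score)          # average ranks handle tied scores
  n1 <- sum(label == 1)      # number of positives
  n0 <- sum(label == 0)      # number of negatives
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Made-up scores and labels (not model output)
score <- c(0.1, 0.4, 0.35, 0.8)
label <- c(0, 0, 1, 1)
auc_rank(score, label)
```

Here three of the four positive-negative pairs are ordered correctly, giving 0.75.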
Accuracy is about 84% on the held-out test set, against 85% on the training set, so the model loses very little accuracy on unseen data.
Debt-to-income ratio, number of delinquent credit lines, number of major derogatory reports, and age of the oldest credit line in months play the largest role in the bank's decision to grant a loan.
The model fits reasonably well, with an AUC of about 0.82 on the test set.
Number of delinquent credit lines: delinquencies weigh heavily on a credit score, and a missed payment on the home loan itself lowers the score sharply. To stay on the safe side, pay the home loan every month, on time, without missing a payment.
Debt-to-income ratio: banks treat applicants with a high ratio as high risk and describe them as "highly leveraged", so it is better to keep the debt-to-income ratio low.
Number of major derogatory reports: the bank checks the applicant's credit background for overdue credit card balances or missed payments, and whether the applicant is able to repay the loan. It is better to keep this record clean.
Age of oldest credit line in months: the length of the credit history affects the credit score, and a short history can make it harder to get a home loan approved. Using credit regularly and lightly, and paying your bills on time every month, are the two essential things for a good score.
Number of recent credit inquiries and job category: keep credit inquiries to a minimum, because a large number of them can seriously affect your FICO score. Applicants in the Sales category should be able to show a good sales record, or enough cash or equity to keep paying the loan when sales are low, to avoid penalties from the bank.