The data set HMEQ reports characteristics and delinquency information for 5,960 home equity loans. A home equity loan is a loan where the obligor uses the equity of his or her home as the underlying collateral. The data set has the following characteristics:

◾ BAD: 1 = applicant defaulted on loan or seriously delinquent; 0 = applicant paid loan
◾ LOAN: Amount of the loan request
◾ MORTDUE: Amount due on existing mortgage
◾ VALUE: Value of current property
◾ REASON: DebtCon = debt consolidation; HomeImp = home improvement
◾ JOB: Occupational categories
◾ YOJ: Years at present job
◾ DEROG: Number of major derogatory reports
◾ DELINQ: Number of delinquent credit lines
◾ CLAGE: Age of oldest credit line in months
◾ NINQ: Number of recent credit inquiries
◾ CLNO: Number of credit lines
◾ DEBTINC: Debt-to-income ratio

library(caret)         # createDataPartition(), confusionMatrix(), varImp()
library(DataExplorer)  # plot_str(), plot_intro(), plot_missing(), plot_bar()
library(VIM)           # kNN() imputation
library(plotly)        # interactive donut chart; also re-exports the %>% pipe
Home=read.csv("hmeq.csv", na.strings = "", stringsAsFactors = TRUE)
str(Home)
## 'data.frame':    5960 obs. of  13 variables:
##  $ BAD    : int  1 1 1 1 0 1 1 1 1 1 ...
##  $ LOAN   : int  1100 1300 1500 1500 1700 1700 1800 1800 2000 2000 ...
##  $ MORTDUE: num  25860 70053 13500 NA 97800 ...
##  $ VALUE  : num  39025 68400 16700 NA 112000 ...
##  $ REASON : Factor w/ 2 levels "DebtCon","HomeImp": 2 2 2 NA 2 2 2 2 2 2 ...
##  $ JOB    : Factor w/ 6 levels "Mgr","Office",..: 3 3 3 NA 2 3 3 3 3 5 ...
##  $ YOJ    : num  10.5 7 4 NA 3 9 5 11 3 16 ...
##  $ DEROG  : int  0 0 0 NA 0 0 3 0 0 0 ...
##  $ DELINQ : int  0 2 0 NA 0 0 2 0 2 0 ...
##  $ CLAGE  : num  94.4 121.8 149.5 NA 93.3 ...
##  $ NINQ   : int  1 0 1 NA 0 1 1 0 1 0 ...
##  $ CLNO   : int  9 14 10 NA 14 8 17 8 12 13 ...
##  $ DEBTINC: num  NA NA NA NA NA ...
summary(Home)
##       BAD              LOAN          MORTDUE           VALUE       
##  Min.   :0.0000   Min.   : 1100   Min.   :  2063   Min.   :  8000  
##  1st Qu.:0.0000   1st Qu.:11100   1st Qu.: 46276   1st Qu.: 66076  
##  Median :0.0000   Median :16300   Median : 65019   Median : 89236  
##  Mean   :0.1995   Mean   :18608   Mean   : 73761   Mean   :101776  
##  3rd Qu.:0.0000   3rd Qu.:23300   3rd Qu.: 91488   3rd Qu.:119824  
##  Max.   :1.0000   Max.   :89900   Max.   :399550   Max.   :855909  
##                                   NA's   :518      NA's   :112     
##      REASON          JOB            YOJ             DEROG        
##  DebtCon:3928   Mgr    : 767   Min.   : 0.000   Min.   : 0.0000  
##  HomeImp:1780   Office : 948   1st Qu.: 3.000   1st Qu.: 0.0000  
##  NA's   : 252   Other  :2388   Median : 7.000   Median : 0.0000  
##                 ProfExe:1276   Mean   : 8.922   Mean   : 0.2546  
##                 Sales  : 109   3rd Qu.:13.000   3rd Qu.: 0.0000  
##                 Self   : 193   Max.   :41.000   Max.   :10.0000  
##                 NA's   : 279   NA's   :515      NA's   :708      
##      DELINQ            CLAGE             NINQ             CLNO     
##  Min.   : 0.0000   Min.   :   0.0   Min.   : 0.000   Min.   : 0.0  
##  1st Qu.: 0.0000   1st Qu.: 115.1   1st Qu.: 0.000   1st Qu.:15.0  
##  Median : 0.0000   Median : 173.5   Median : 1.000   Median :20.0  
##  Mean   : 0.4494   Mean   : 179.8   Mean   : 1.186   Mean   :21.3  
##  3rd Qu.: 0.0000   3rd Qu.: 231.6   3rd Qu.: 2.000   3rd Qu.:26.0  
##  Max.   :15.0000   Max.   :1168.2   Max.   :17.000   Max.   :71.0  
##  NA's   :580       NA's   :308      NA's   :510      NA's   :222   
##     DEBTINC        
##  Min.   :  0.5245  
##  1st Qu.: 29.1400  
##  Median : 34.8183  
##  Mean   : 33.7799  
##  3rd Qu.: 39.0031  
##  Max.   :203.3121  
##  NA's   :1267

Identifying missing values

colSums(is.na(Home))
##     BAD    LOAN MORTDUE   VALUE  REASON     JOB     YOJ   DEROG  DELINQ   CLAGE 
##       0       0     518     112     252     279     515     708     580     308 
##    NINQ    CLNO DEBTINC 
##     510     222    1267
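# Blanks were already turned into NA by na.strings = "", and NA == "" is NA,
# so every column that contains NAs reports NA here
colSums(Home=="")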
##     BAD    LOAN MORTDUE   VALUE  REASON     JOB     YOJ   DEROG  DELINQ   CLAGE 
##       0       0      NA      NA      NA      NA      NA      NA      NA      NA 
##    NINQ    CLNO DEBTINC 
##      NA      NA      NA
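The NA counts returned by `colSums(Home=="")` come from how R compares missing values: `NA == ""` is itself `NA`. The sketch below uses a small hypothetical data frame (`toy` is illustrative, not part of HMEQ) to show the effect of `na.rm = TRUE` and one way to express missingness as a percentage per column:

```r
# Hypothetical toy data with one missing value per column
toy <- data.frame(
  REASON = factor(c("DebtCon", NA, "HomeImp")),
  YOJ    = c(10.5, 7, NA)
)

# NA == "" evaluates to NA, so colSums() returns NA for both columns
colSums(toy == "")
# na.rm = TRUE counts only literal empty strings; there are none here
colSums(toy == "", na.rm = TRUE)
# Percentage of missing values per column (33.3 for both columns)
round(100 * colMeans(is.na(toy)), 1)
```

On the real data, `round(100 * colMeans(is.na(Home)), 1)` would express the counts above as percentages; DEBTINC has the largest share of missing values.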

Imputing missing values with the k-nearest-neighbours (kNN) method

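# Convert the target to a factor with class labels "0" and "1"
Home$BAD=as.factor(Home$BAD)
# VIM::kNN with default settings: k = 5 neighbours, Gower distance, median for
# numeric and most frequent category for factor columns. It also appends a
# logical <variable>_imp column that flags every imputed value.
# Note: imputing before the train/test split, with BAD among the distance
# variables, lets the target and the future test rows influence imputed values.
Home1=kNN(Home)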
summary(Home1)
##  BAD           LOAN          MORTDUE           VALUE            REASON    
##  0:4771   Min.   : 1100   Min.   :  2063   Min.   :  8000   DebtCon:4072  
##  1:1189   1st Qu.:11100   1st Qu.: 44863   1st Qu.: 66256   HomeImp:1888  
##           Median :16300   Median : 63466   Median : 89003                 
##           Mean   :18608   Mean   : 71819   Mean   :101468                 
##           3rd Qu.:23300   3rd Qu.: 89317   3rd Qu.:119361                 
##           Max.   :89900   Max.   :399550   Max.   :855909                 
##       JOB            YOJ             DEROG             DELINQ       
##  Mgr    : 872   Min.   : 0.000   Min.   : 0.0000   Min.   : 0.0000  
##  Office : 984   1st Qu.: 3.000   1st Qu.: 0.0000   1st Qu.: 0.0000  
##  Other  :2467   Median : 7.000   Median : 0.0000   Median : 0.0000  
##  ProfExe:1333   Mean   : 8.949   Mean   : 0.2574   Mean   : 0.4921  
##  Sales  : 110   3rd Qu.:13.000   3rd Qu.: 0.0000   3rd Qu.: 0.0000  
##  Self   : 194   Max.   :41.000   Max.   :10.0000   Max.   :15.0000  
##      CLAGE             NINQ             CLNO          DEBTINC        
##  Min.   :   0.0   Min.   : 0.000   Min.   : 0.00   Min.   :  0.5245  
##  1st Qu.: 115.8   1st Qu.: 0.000   1st Qu.:15.00   1st Qu.: 29.9500  
##  Median : 171.8   Median : 1.000   Median :20.00   Median : 35.4095  
##  Mean   : 180.0   Mean   : 1.159   Mean   :21.35   Mean   : 34.3276  
##  3rd Qu.: 230.7   3rd Qu.: 2.000   3rd Qu.:26.00   3rd Qu.: 39.3376  
##  Max.   :1168.2   Max.   :17.000   Max.   :71.00   Max.   :203.3122  
##   BAD_imp         LOAN_imp       MORTDUE_imp     VALUE_imp      
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:5960      FALSE:5960      FALSE:5442      FALSE:5848     
##                                  TRUE :518       TRUE :112      
##                                                                 
##                                                                 
##                                                                 
##  REASON_imp       JOB_imp         YOJ_imp        DEROG_imp      
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:5708      FALSE:5681      FALSE:5445      FALSE:5252     
##  TRUE :252       TRUE :279       TRUE :515       TRUE :708      
##                                                                 
##                                                                 
##                                                                 
##  DELINQ_imp      CLAGE_imp        NINQ_imp        CLNO_imp      
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:5380      FALSE:5652      FALSE:5450      FALSE:5738     
##  TRUE :580       TRUE :308       TRUE :510       TRUE :222      
##                                                                 
##                                                                 
##                                                                 
##  DEBTINC_imp    
##  Mode :logical  
##  FALSE:4693     
##  TRUE :1267     
##                 
##                 
## 
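To make the idea behind `kNN()` concrete, here is a simplified sketch for a single numeric column. It is an illustration under stated assumptions, not VIM's implementation: it uses Euclidean distance on standardised, fully observed numeric predictors instead of Gower distance, and the function `knn_impute_col` and the toy data are hypothetical. Like VIM's default, it fills each gap with the median of the k nearest donors.

```r
# Simplified kNN imputation for one numeric column (illustration only;
# VIM::kNN uses Gower distance and also handles factor columns)
knn_impute_col <- function(df, target, k = 5) {
  features <- setdiff(names(df), target)
  # Standardise the predictors so no single scale dominates the distance
  X <- scale(as.matrix(df[features]))
  y <- df[[target]]
  donors <- which(!is.na(y))
  for (i in which(is.na(y))) {
    # Euclidean distance from row i to every donor row
    d <- sqrt(colSums((t(X[donors, , drop = FALSE]) - X[i, ])^2))
    nearest <- donors[order(d)[seq_len(min(k, length(donors)))]]
    # Fill the gap with the median of the k nearest donors' values
    y[i] <- median(y[nearest])
  }
  df[[target]] <- y
  df
}

# Hypothetical applicants: row 6 is missing YOJ
toy <- data.frame(
  LOAN  = c(1000, 1100, 1200, 5000, 5100, 1150),
  VALUE = c(40000, 42000, 44000, 200000, 210000, 41000),
  YOJ   = c(2, 3, 4, 20, 22, NA)
)
imputed <- knn_impute_col(toy, "YOJ", k = 3)
imputed$YOJ[6]  # 3: median of rows 1-3, the three closest applicants
```

Using the median rather than the mean keeps a single extreme neighbour from pulling the imputed value, which matters for skewed columns such as MORTDUE and VALUE.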
# Keep only the 13 original columns (drop the *_imp indicator columns)
Home2=subset(Home1,select=BAD:DEBTINC)
str(Home2)
## 'data.frame':    5960 obs. of  13 variables:
##  $ BAD    : Factor w/ 2 levels "0","1": 2 2 2 2 1 2 2 2 2 2 ...
##  $ LOAN   : int  1100 1300 1500 1500 1700 1700 1800 1800 2000 2000 ...
##  $ MORTDUE: num  25860 70053 13500 52663 97800 ...
##  $ VALUE  : num  39025 68400 16700 70400 112000 ...
##  $ REASON : Factor w/ 2 levels "DebtCon","HomeImp": 2 2 2 2 2 2 2 2 2 2 ...
##  $ JOB    : Factor w/ 6 levels "Mgr","Office",..: 3 3 3 3 2 3 3 3 3 5 ...
##  $ YOJ    : num  10.5 7 4 18 3 9 5 11 3 16 ...
##  $ DEROG  : int  0 0 0 1 0 0 3 0 0 0 ...
##  $ DELINQ : int  0 2 0 0 0 0 2 0 2 0 ...
##  $ CLAGE  : num  94.4 121.8 149.5 122.8 93.3 ...
##  $ NINQ   : int  1 0 1 2 0 1 1 0 1 0 ...
##  $ CLNO   : int  9 14 10 19 14 8 17 8 12 13 ...
##  $ DEBTINC: num  36.9 40.6 33.2 36.8 29.9 ...
summary(Home2)
##  BAD           LOAN          MORTDUE           VALUE            REASON    
##  0:4771   Min.   : 1100   Min.   :  2063   Min.   :  8000   DebtCon:4072  
##  1:1189   1st Qu.:11100   1st Qu.: 44863   1st Qu.: 66256   HomeImp:1888  
##           Median :16300   Median : 63466   Median : 89003                 
##           Mean   :18608   Mean   : 71819   Mean   :101468                 
##           3rd Qu.:23300   3rd Qu.: 89317   3rd Qu.:119361                 
##           Max.   :89900   Max.   :399550   Max.   :855909                 
##       JOB            YOJ             DEROG             DELINQ       
##  Mgr    : 872   Min.   : 0.000   Min.   : 0.0000   Min.   : 0.0000  
##  Office : 984   1st Qu.: 3.000   1st Qu.: 0.0000   1st Qu.: 0.0000  
##  Other  :2467   Median : 7.000   Median : 0.0000   Median : 0.0000  
##  ProfExe:1333   Mean   : 8.949   Mean   : 0.2574   Mean   : 0.4921  
##  Sales  : 110   3rd Qu.:13.000   3rd Qu.: 0.0000   3rd Qu.: 0.0000  
##  Self   : 194   Max.   :41.000   Max.   :10.0000   Max.   :15.0000  
##      CLAGE             NINQ             CLNO          DEBTINC        
##  Min.   :   0.0   Min.   : 0.000   Min.   : 0.00   Min.   :  0.5245  
##  1st Qu.: 115.8   1st Qu.: 0.000   1st Qu.:15.00   1st Qu.: 29.9500  
##  Median : 171.8   Median : 1.000   Median :20.00   Median : 35.4095  
##  Mean   : 180.0   Mean   : 1.159   Mean   :21.35   Mean   : 34.3276  
##  3rd Qu.: 230.7   3rd Qu.: 2.000   3rd Qu.:26.00   3rd Qu.: 39.3376  
##  Max.   :1168.2   Max.   :17.000   Max.   :71.00   Max.   :203.3122
dim(Home2)
## [1] 5960   13
table(Home2$BAD)
## 
##    0    1 
## 4771 1189
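The class counts above fix a baseline for the models that follow: a rule that always predicts "0" (paid) is right about 80% of the time, which is the No Information Rate that `confusionMatrix()` reports later. The arithmetic, using the counts printed above (in the session, `prop.table(table(Home2$BAD))` gives the same shares):

```r
# Class counts of BAD after imputation, copied from table(Home2$BAD)
counts <- c(`0` = 4771, `1` = 1189)
shares <- prop.table(counts)
round(shares, 4)   # 0 = 0.8005, 1 = 0.1995
# Accuracy of always predicting the majority class ("0")
max(shares)
```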

EXPLORATORY DATA ANALYSIS

plot_str(Home2)
plot_intro(Home2)

plot_missing(Home2)

plot_bar(Home2)

fig <- Home2 %>% plot_ly(labels = ~REASON, values = ~LOAN)
fig <- fig %>% add_pie(hole = 0.6)
fig <- fig %>% layout(title = "Donut Chart: Total Loan Amount by Reason", showlegend = FALSE,
                      xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
                      yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))

fig
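Because `values = ~LOAN` is passed, the donut does not count applications: as we understand plotly's pie trace, rows that share a label are combined by summing their `values`, so each slice shows the share of the total requested loan amount. A base-R sketch of the same aggregation on hypothetical toy loans:

```r
# Hypothetical loans (illustrative values, not from HMEQ)
toy <- data.frame(
  REASON = factor(c("DebtCon", "HomeImp", "DebtCon", "DebtCon")),
  LOAN   = c(10000, 5000, 20000, 15000)
)
# Total loan amount per reason as a share of the overall amount
share <- prop.table(tapply(toy$LOAN, toy$REASON, sum))
round(100 * share, 1)   # DebtCon 90, HomeImp 10
```

Applied to `Home2`, `prop.table(tapply(Home2$LOAN, Home2$REASON, sum))` should reproduce the slice percentages; dropping `values` would instead show the share of applications.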

Data Partition (80% Training / 20% Testing)

set.seed(1234)
# Stratified 80/20 split: createDataPartition() samples within each level of BAD
Train=createDataPartition(Home2$BAD,p=0.8,list=FALSE)
Training=Home2[Train,]
Testing=Home2[-Train,]
dim(Training)
## [1] 4769   13
dim(Testing)
## [1] 1191   13
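`createDataPartition()` samples within each class of BAD, so both sets keep the roughly 80/20 class mix. The training confusion matrix further down (3,817 actual zeros, 952 actual ones) is consistent with rounding 80% up within each class. A simplified base-R sketch of such a stratified split; the function `stratified_split` is hypothetical and does not reproduce caret's exact row selection:

```r
# Simplified stratified split (illustrative; not caret's code):
# sample ceiling(p * n) row indices within each class of y
stratified_split <- function(y, p = 0.8) {
  by_class <- split(seq_along(y), y)
  picked <- lapply(by_class, function(idx) {
    idx[sample.int(length(idx), ceiling(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1234)
y <- factor(rep(c("0", "1"), times = c(4771, 1189)))  # BAD class counts from above
train_idx <- stratified_split(y)
length(train_idx)     # 4769, the same size as Training above
table(y[train_idx])   # 3817 zeros and 952 ones
```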

Logistic Regression Model

Model=glm(BAD~.,data=Training,family="binomial")
summary(Model)
## 
## Call:
## glm(formula = BAD ~ ., family = "binomial", data = Training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9729  -0.5828  -0.3815  -0.1973   3.7714  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -4.760e+00  3.265e-01 -14.578  < 2e-16 ***
## LOAN          -2.099e-05  4.840e-06  -4.337 1.45e-05 ***
## MORTDUE       -6.189e-06  1.669e-06  -3.709 0.000208 ***
## VALUE          3.755e-06  1.181e-06   3.179 0.001475 ** 
## REASONHomeImp  1.158e-01  9.747e-02   1.188 0.234902    
## JOBOffice     -1.672e-01  1.675e-01  -0.998 0.318284    
## JOBOther       5.411e-01  1.334e-01   4.057 4.97e-05 ***
## JOBProfExe     5.393e-01  1.556e-01   3.466 0.000528 ***
## JOBSales       1.329e+00  3.024e-01   4.394 1.11e-05 ***
## JOBSelf        9.112e-01  2.560e-01   3.560 0.000371 ***
## YOJ           -9.198e-03  6.376e-03  -1.442 0.149170    
## DEROG          5.682e-01  5.527e-02  10.280  < 2e-16 ***
## DELINQ         6.676e-01  3.869e-02  17.255  < 2e-16 ***
## CLAGE         -5.971e-03  6.336e-04  -9.424  < 2e-16 ***
## NINQ           1.875e-01  2.399e-02   7.815 5.49e-15 ***
## CLNO          -2.292e-02  5.002e-03  -4.583 4.58e-06 ***
## DEBTINC        1.143e-01  7.569e-03  15.098  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4767.8  on 4768  degrees of freedom
## Residual deviance: 3523.2  on 4752  degrees of freedom
## AIC: 3557.2
## 
## Number of Fisher Scoring iterations: 5
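Logistic regression coefficients are changes in the log-odds of default, so exponentiating them gives odds ratios that are easier to read. The sketch below hard-codes a few estimates from the summary above so it runs on its own; in the live session `exp(coef(Model))` returns the same quantities for every term.

```r
# Estimates copied from summary(Model) above
est <- c(DEROG = 5.682e-01, DELINQ = 6.676e-01, NINQ = 1.875e-01,
         DEBTINC = 1.143e-01, CLAGE = -5.971e-03)

# Multiplicative change in the odds of default per one-unit increase,
# holding the other predictors fixed
odds_ratio <- exp(est)
round(odds_ratio, 3)

# CLAGE is measured in months: odds multiplier for one extra year of credit history
exp(12 * est[["CLAGE"]])
```

For example, each additional delinquent credit line multiplies the odds of default by about 1.95, while each extra year of credit history multiplies them by about 0.93.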
varImp(Model)
##                  Overall
## LOAN           4.3366175
## MORTDUE        3.7086457
## VALUE          3.1794843
## REASONHomeImp  1.1878257
## JOBOffice      0.9979909
## JOBOther       4.0571441
## JOBProfExe     3.4663113
## JOBSales       4.3942196
## JOBSelf        3.5600124
## YOJ            1.4424682
## DEROG         10.2798163
## DELINQ        17.2545078
## CLAGE          9.4240578
## NINQ           7.8151774
## CLNO           4.5829456
## DEBTINC       15.0975999
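For a `glm` model, caret's `varImp()` reports the absolute value of each coefficient's z statistic (Estimate / Std. Error), which is why these numbers match the `z value` column of the summary. A sketch that recomputes and ranks a subset of them from the printed estimates and standard errors:

```r
# Estimates and standard errors copied from summary(Model) above
est <- c(DELINQ = 6.676e-01, DEBTINC = 1.143e-01, DEROG = 5.682e-01,
         CLAGE = -5.971e-03, NINQ = 1.875e-01, YOJ = -9.198e-03)
se  <- c(DELINQ = 3.869e-02, DEBTINC = 7.569e-03, DEROG = 5.527e-02,
         CLAGE = 6.336e-04, NINQ = 2.399e-02, YOJ = 6.376e-03)

# Absolute z statistic, ranked from most to least important
importance <- sort(abs(est / se), decreasing = TRUE)
round(importance, 2)
```

Small differences from the `varImp()` output come from the rounding of the printed estimates. Because z statistics are scale-free, this ranking is not affected by the very different units of LOAN, CLAGE and DEBTINC.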

Predictions on the training data (threshold 0.5)

P=predict(Model,Training,type="response")
P=ifelse(P>0.5,"1","0")
table(P,Training$BAD)
##    
## P      0    1
##   0 3707  602
##   1  110  350
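# caret treats the first level, "0" (paid), as the positive class, so
# Sensitivity below is the hit rate for non-defaulters and Specificity the
# hit rate for defaulters; add positive = "1" to report metrics for defaulters
confusionMatrix(table(P,Training$BAD))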
## Confusion Matrix and Statistics
## 
##    
## P      0    1
##   0 3707  602
##   1  110  350
##                                           
##                Accuracy : 0.8507          
##                  95% CI : (0.8403, 0.8607)
##     No Information Rate : 0.8004          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.4204          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.9712          
##             Specificity : 0.3676          
##          Pos Pred Value : 0.8603          
##          Neg Pred Value : 0.7609          
##              Prevalence : 0.8004          
##          Detection Rate : 0.7773          
##    Detection Prevalence : 0.9035          
##       Balanced Accuracy : 0.6694          
##                                           
##        'Positive' Class : 0               
## 
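Since the business question is about catching defaulters, it helps to recompute the metrics with "1" as the positive class. This sketch copies the training confusion matrix printed above; the resulting recall, precision and specificity are caret's Specificity, Neg Pred Value and Sensitivity with the roles of the classes swapped.

```r
# Training confusion matrix from above (rows = predicted, columns = actual)
cm <- matrix(c(3707, 110, 602, 350), nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))

# Cell counts with "1" (default) as the positive class
tp <- cm["1", "1"]; fn <- cm["0", "1"]
fp <- cm["1", "0"]; tn <- cm["0", "0"]

m <- c(accuracy    = (tp + tn) / sum(cm),
       recall      = tp / (tp + fn),   # share of actual defaulters flagged
       precision   = tp / (tp + fp),   # share of flagged applicants who defaulted
       specificity = tn / (tn + fp))   # share of good payers correctly passed
round(m, 4)
```

Only about 37% of the defaulters in the training data are flagged at the 0.5 threshold, even though overall accuracy is 85%.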

Predictions on the test data (threshold 0.5)

P1=predict(Model,Testing,type="response")
P1=ifelse(P1>0.5,"1","0")
table(P1,Testing$BAD)
##    
## P1    0   1
##   0 910 142
##   1  44  95
# The positive class is again "0"; Specificity is the share of defaulters caught
confusionMatrix(table(P1,Testing$BAD))
## Confusion Matrix and Statistics
## 
##    
## P1    0   1
##   0 910 142
##   1  44  95
##                                          
##                Accuracy : 0.8438         
##                  95% CI : (0.8219, 0.864)
##     No Information Rate : 0.801          
##     P-Value [Acc > NIR] : 8.275e-05      
##                                          
##                   Kappa : 0.42           
##                                          
##  Mcnemar's Test P-Value : 1.141e-12      
##                                          
##             Sensitivity : 0.9539         
##             Specificity : 0.4008         
##          Pos Pred Value : 0.8650         
##          Neg Pred Value : 0.6835         
##              Prevalence : 0.8010         
##          Detection Rate : 0.7641         
##    Detection Prevalence : 0.8833         
##       Balanced Accuracy : 0.6774         
##                                          
##        'Positive' Class : 0              
## 
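On the test data the model catches about 40% of the defaulters at the 0.5 cutoff. The cutoff is a modelling choice rather than a fixed rule: lowering it flags more defaulters at the cost of more false alarms among good payers. A sketch of that trade-off on hypothetical predicted probabilities (the values in `prob` and `actual` and the helper `rates_at` are illustrative):

```r
# Hypothetical predicted default probabilities and true labels (1 = default)
prob   <- c(0.05, 0.20, 0.35, 0.45, 0.60, 0.10, 0.30, 0.55, 0.80, 0.40)
actual <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

# Share of defaulters flagged and share of good payers wrongly flagged
rates_at <- function(threshold, prob, actual) {
  pred <- as.integer(prob > threshold)
  c(recall_default = sum(pred == 1 & actual == 1) / sum(actual == 1),
    false_alarm    = sum(pred == 1 & actual == 0) / sum(actual == 0))
}

rates_at(0.5, prob, actual)   # recall 0.4, false alarms 0.2
rates_at(0.3, prob, actual)   # recall 0.6, false alarms 0.6
```

With the fitted model, the same helper could be called as `rates_at(0.3, predict(Model, Testing, type = "response"), as.integer(as.character(Testing$BAD)))`; the ROC curve below summarises this trade-off over all cutoffs.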

ROC CURVE WITH AUC

library(ROCR)
predictions <- predict(Model, newdata=Testing, type="response")
ROCRpred <- prediction(predictions, Testing$BAD)
ROCRperf <- performance(ROCRpred, measure = "tpr", x.measure = "fpr")

plot(ROCRperf, colorize = TRUE, text.adj = c(-0.2,1.7), print.cutoffs.at = seq(0,1,0.1))

auc <- performance(ROCRpred, measure = "auc")
auc <- auc@y.values[[1]]
auc
## [1] 0.816584
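The AUC has a direct interpretation: it is the probability that a randomly chosen defaulter receives a higher predicted probability than a randomly chosen non-defaulter, with ties counted as one half. The rank-based (Mann-Whitney) formula below computes the same quantity as the area under the empirical ROC curve; the helper `auc_rank` and the toy scores are hypothetical.

```r
# AUC via ranks: probability that a random defaulter outscores a random
# non-defaulter (tied scores count 1/2 through average ranks)
auc_rank <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical scores and labels (1 = default)
score <- c(0.10, 0.40, 0.35, 0.80)
label <- c(0, 0, 1, 1)
auc_rank(score, label)   # 0.75: 3 of the 4 defaulter/non-defaulter pairs are ordered correctly
```

In the session, `auc_rank(predictions, Testing$BAD)` should agree with the ROCR value above, i.e. a defaulter outranks a non-defaulter in about 82% of such pairs.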

Conclusion

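Accuracy is 85.1% on the training data and 84.4% on the held-out test data, so there is little loss of accuracy on unseen data. This is a single 80/20 hold-out split rather than cross-validation, and always predicting "paid" would already be right about 80% of the time (the No Information Rate). On the test data the model identifies only about 40% of the actual defaulters (reported as Specificity 0.4008, because "0" is the positive class).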

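The number of delinquent credit lines (DELINQ), the debt-to-income ratio (DEBTINC), the number of major derogatory reports (DEROG) and the age of the oldest credit line in months (CLAGE) have the largest absolute z-values, so they play the biggest role in the bank's decision to grant a loan. Higher DELINQ, DEBTINC and DEROG raise the predicted probability of default, while an older credit line lowers it.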

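The AUC on the test data is about 0.82 (81.7%), which indicates good, though far from perfect, discrimination between defaulters and non-defaulters.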

Recommendations

Number of delinquent credit lines: delinquent accounts lower the credit score, and a missed payment on a home loan has a particularly large impact. To stay on the safe side, pay the home loan every month on time without missing a payment.

Debt-to-income ratio: banks regard people with a high ratio as "highly leveraged" and therefore high risk, so it is better to keep the debt-to-income ratio low.
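A common definition of the debt-to-income ratio is total monthly debt payments divided by gross monthly income, expressed as a percentage. We assume DEBTINC follows this convention, which the data description does not state. The sketch computes the ratio for a hypothetical applicant and uses the DEBTINC coefficient from `summary(Model)` to show how much a 10-point higher ratio multiplies the odds of default.

```r
# Hypothetical applicant (figures are illustrative, not from HMEQ)
monthly_debt_payments <- 1500
gross_monthly_income  <- 5000
dti <- 100 * monthly_debt_payments / gross_monthly_income
dti   # 30

# DEBTINC estimate copied from summary(Model) above
b_debtinc <- 1.143e-01
# Odds multiplier for a ratio 10 points higher, other predictors held fixed
exp(10 * b_debtinc)   # about 3.14
```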

Number of major derogatory reports: the bank checks this background, such as unpaid credit card dues or missed payments, to judge whether the applicant is capable of repaying the loan. It is better to keep this record clean.

Age of oldest credit line in months: a short credit history can hurt the credit score and make a home loan harder to get approved; in the model, an older credit line lowers the predicted risk of default. If you use credit regularly and lightly, and pay your bills on time every month, you are doing the two essential things to have a good score.

Number of recent credit inquiries and job category: keep the number of recent credit inquiries to a minimum, because many inquiries can seriously hurt your FICO score. Applicants in the Sales category, which has the largest JOB coefficient in the model, should be able to show a record of good sales or have equity or cash to keep paying the loan in months with fewer sales, to avoid penalties from the bank.