This analysis uses LoanData.csv to examine loans that were late or in default. The techniques used here build on the topics and concepts covered in chapter 7.
library(readr)
LoanData <- read_csv("LoanData.csv")
## Parsed with column specification:
## cols(
## Status = col_character(),
## Credit.Grade = col_character(),
## Amount = col_double(),
## Age = col_double(),
## Borrower.Rate = col_double(),
## Debt.To.Income.Ratio = col_double()
## )
head(LoanData)
## # A tibble: 6 x 6
## Status Credit.Grade Amount Age Borrower.Rate Debt.To.Income.Ratio
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Current C 5000 4 0.15 0.04
## 2 Current HR 1900 6 0.265 0.02
## 3 Current HR 1000 3 0.15 0.02
## 4 Current HR 1000 5 0.290 0.02
## 5 Current AA 2550 8 0.0795 0.033
## 6 Current NC 1500 2 0.26 0.03
LoanData<-LoanData[,1:6]
table(LoanData$Status)
##
## Current Default Late
## 5186 75 350
table(LoanData$Credit.Grade)
##
## A AA B C D E HR NC
## 424 451 553 843 927 1129 1217 67
# Flag a loan as bad (1) if it is late or in default, otherwise 0
LoanData$BadLoanType = ifelse(LoanData$Status %in% c('Default', 'Late'), 1, 0)
head(LoanData)
## # A tibble: 6 x 7
## Status Credit.Grade Amount Age Borrower.Rate Debt.To.Income.Ra~ BadLoanType
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Current C 5000 4 0.15 0.04 0
## 2 Current HR 1900 6 0.265 0.02 0
## 3 Current HR 1000 3 0.15 0.02 0
## 4 Current HR 1000 5 0.290 0.02 0
## 5 Current AA 2550 8 0.0795 0.033 0
## 6 Current NC 1500 2 0.26 0.03 0
table(LoanData$BadLoanType,LoanData$Credit.Grade)
##
## A AA B C D E HR NC
## 0 413 446 530 812 881 1018 1033 53
## 1 11 5 23 31 46 111 184 14
table(LoanData$BadLoanType)
##
## 0 1
## 5186 425
# Share of loans that are late or in default (425 of 5,611)
BadLoans = mean(LoanData$BadLoanType)
BadLoans
## [1] 0.07574407
m1=glm(BadLoanType~Credit.Grade,family=binomial,data=LoanData)
summary(m1)
##
## Call:
## glm(formula = BadLoanType ~ Credit.Grade, family = binomial,
## data = LoanData)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6847 -0.4550 -0.3190 -0.2737 3.0007
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.6256 0.3055 -11.868 < 2e-16 ***
## Credit.GradeAA -0.8653 0.5437 -1.592 0.1115
## Credit.GradeB 0.4882 0.3724 1.311 0.1899
## Credit.GradeC 0.3600 0.3561 1.011 0.3120
## Credit.GradeD 0.6731 0.3409 1.975 0.0483 *
## Credit.GradeE 1.4095 0.3214 4.385 1.16e-05 ***
## Credit.GradeHR 1.9003 0.3158 6.017 1.77e-09 ***
## Credit.GradeNC 2.2943 0.4285 5.354 8.60e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3010.3 on 5610 degrees of freedom
## Residual deviance: 2808.2 on 5603 degrees of freedom
## AIC: 2824.2
##
## Number of Fisher Scoring iterations: 7
exp(m1$coef[2])
## Credit.GradeAA
## 0.4209132
exp(m1$coef[3])
## Credit.GradeB
## 1.629331
exp(m1$coef[4])
## Credit.GradeC
## 1.433386
exp(m1$coef[5])
## Credit.GradeD
## 1.960376
exp(m1$coef[6])
## Credit.GradeE
## 4.093856
exp(m1$coef[7])
## Credit.GradeHR
## 6.687671
exp(m1$coef[8])
## Credit.GradeNC
## 9.917667
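As a check on the odds ratios above, they can be recomputed by hand from the BadLoanType by Credit.Grade table, because a model with a single categorical predictor reproduces the observed odds in each group. This is only an illustrative sketch: the names `good`, `bad`, `odds`, and `odds_ratio_vs_A` are introduced here and are not part of the original analysis, the counts are copied from the table, and grade A is assumed to be the reference level because it is the only grade without its own coefficient in the summary.

```r
# Counts copied from table(LoanData$BadLoanType, LoanData$Credit.Grade)
# (assumption: row 0 = good loans, row 1 = late or default)
good <- c(A = 413, AA = 446, B = 530, C = 812, D = 881, E = 1018, HR = 1033, NC = 53)
bad  <- c(A = 11,  AA = 5,   B = 23,  C = 31,  D = 46,  E = 111,  HR = 184,  NC = 14)

# Observed odds of a bad loan within each credit grade
odds <- bad / good

# Odds ratio of each grade relative to grade A (the assumed baseline)
odds_ratio_vs_A <- odds / odds["A"]
round(odds_ratio_vs_A, 4)

# The log odds for grade A should agree with the model intercept
intercept_check <- log(odds[["A"]])
round(intercept_check, 4)
```

The values for AA through NC should line up with the `exp(m1$coef[k])` results, which is a useful reminder that each odds ratio compares a grade with grade A rather than with the overall default rate.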
m2=glm(BadLoanType~Amount+Borrower.Rate,family=binomial,data=LoanData)
summary(m2)
##
## Call:
## glm(formula = BadLoanType ~ Amount + Borrower.Rate, family = binomial,
## data = LoanData)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4187 -0.4539 -0.3058 -0.2105 3.2723
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.374e+00 2.374e-01 -22.641 <2e-16 ***
## Amount -1.163e-06 1.356e-05 -0.086 0.932
## Borrower.Rate 1.317e+01 8.912e-01 14.779 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3010.3 on 5610 degrees of freedom
## Residual deviance: 2737.9 on 5608 degrees of freedom
## AIC: 2743.9
##
## Number of Fisher Scoring iterations: 6
exp(m2$coef[2])
## Amount
## 0.9999988
exp(m2$coef[3])
## Borrower.Rate
## 525262.8
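The Borrower.Rate odds ratio is hard to read on its own because it describes a one-unit change in the rate (from 0 to 1), while the rates in the data are values such as 0.15. The sketch below rescales it to a change of 0.01 and turns the fitted model into predicted probabilities. It is a rough illustration under stated assumptions: the coefficients are the rounded values printed in the summary above, and the names `b0`, `b_amount`, `b_rate`, `or_per_point`, and `predicted_prob` are hypothetical, as are the example loan amount and rates.

```r
# Coefficients copied (rounded) from the summary(m2) output
b0       <- -5.374
b_amount <- -1.163e-06
b_rate   <- 13.17

# Odds ratio for a 0.01 (one percentage point) increase in Borrower.Rate
or_per_point <- exp(b_rate * 0.01)

# Hypothetical helper: predicted probability of a bad loan from the
# logistic model, using the inverse logit (plogis)
predicted_prob <- function(amount, rate) {
  plogis(b0 + b_amount * amount + b_rate * rate)
}

# Example: the same 5000 loan at a 10% and a 25% borrower rate
p_low  <- predicted_prob(amount = 5000, rate = 0.10)
p_high <- predicted_prob(amount = 5000, rate = 0.25)
round(c(or_per_point = or_per_point, p_low = p_low, p_high = p_high), 3)
```

With these rounded coefficients, each additional percentage point of borrower rate multiplies the odds of a bad loan by roughly 1.14, and the predicted probability for the example loan rises from about 2% to about 11%.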
plot(LoanData$Amount,LoanData$BadLoanType)
plot(LoanData$Borrower.Rate,LoanData$BadLoanType)
Of the 5,611 loans, 425 (7.57%) were late or in default. The plots of Amount and Borrower.Rate against BadLoanType look very similar: both suggest that the amount owed and the borrowing rate depend on the type of loan, with bad loans (late or in default) occurring at large amounts owed or high borrowing rates. Credit grade is in turn related to the borrowing rate and the amount of a particular loan, since the credit grade differs depending on their values.
In the credit grade model, grades E, HR, and NC were significant, with p-values of 1.16e-05, 1.77e-09, and 8.60e-08, all ≤ 0.001. Grade D was significant only at the 0.05 level (p = 0.0483). Exponentiating the coefficients gives odds ratios relative to the baseline grade A: 4.093856, 6.687671, and 9.917667 for E, HR, and NC. These stand out against the other grades, whose odds ratios were all under 2.
In the model with Amount and Borrower.Rate, Borrower.Rate was significant, as its p-value of <2e-16 was ≤ 0.001. Its odds ratio of 525262.8 is very large because it corresponds to a one-unit increase in the rate, that is, from 0 to 1. Amount, by contrast, had an odds ratio of 0.9999988, just under 1, and was not significant (p = 0.932), so the model gives no evidence of an Amount effect once Borrower.Rate is included. Overall, it was shown that borrower rate impacts the credit grade and influenced grades E, HR, and N