We study the loan status of a sample of 499 borrowers and analyze the factors associated with whether they fully repay their loans or default. We examine the effect of each factor and, at the end, fit a regression model to predict the loan status.
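# stringsAsFactors = TRUE keeps text columns as factors (no longer the default from R 4.0), so summary() shows level counts
loan.df<-read.csv("loan2.csv", stringsAsFactors=TRUE)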
summary(loan.df)
## id member_id loan_amnt funded_amnt
## Min. : 822464 Min. : 943135 Min. : 1000 Min. : 1000
## 1st Qu.:1063592 1st Qu.:1295787 1st Qu.: 7000 1st Qu.: 7000
## Median :1065348 Median :1299063 Median :10000 Median :10000
## Mean :1062700 Mean :1294483 Mean :12080 Mean :11653
## 3rd Qu.:1067576 3rd Qu.:1301633 3rd Qu.:15425 3rd Qu.:15000
## Max. :1077501 Max. :1314167 Max. :35000 Max. :35000
##
## funded_amnt_inv term_months int_rate installment grade
## Min. : 1000 Min. :36.00 Min. : 6.03 Min. : 34.5 A:112
## 1st Qu.: 7000 1st Qu.:36.00 1st Qu.: 9.91 1st Qu.: 217.4 B:184
## Median :10000 Median :36.00 Median :12.42 Median : 330.8 C:100
## Mean :11546 Mean :41.96 Mean :12.64 Mean : 351.9 D: 62
## 3rd Qu.:15000 3rd Qu.:36.00 3rd Qu.:15.27 3rd Qu.: 452.4 E: 31
## Max. :35000 Max. :60.00 Max. :23.91 Max. :1140.1 F: 9
## G: 1
## grade_num sub_grade emp_title
## Min. :1.000 B3 : 43 : 24
## 1st Qu.:2.000 B4 : 40 United States Air Force: 3
## Median :2.000 B5 : 39 American Airlines : 2
## Mean :2.493 B1 : 36 Best Buy : 2
## 3rd Qu.:3.000 A4 : 33 cardinal logistics : 2
## Max. :7.000 C1 : 33 Multiband : 2
## (Other):275 (Other) :464
## emp_length home_ownership ownership annual_inc
## 10+ years:118 MORTGAGE:164 Min. :1.000 Min. : 12000
## 2 years : 56 OWN : 38 1st Qu.:1.000 1st Qu.: 40000
## 5 years : 54 RENT :297 Median :3.000 Median : 53000
## 3 years : 48 Mean :2.267 Mean : 60555
## 1 year : 46 3rd Qu.:3.000 3rd Qu.: 75000
## 4 years : 40 Max. :3.000 Max. :276000
## (Other) :137
## verification_status issue_d loan_status status_numeric
## Not Verified :199 Dec-11:499 Charged Off: 89 Min. :1.000
## Source Verified:132 Current : 40 1st Qu.:2.000
## Verified :168 Default : 1 Median :4.000
## Fully Paid :369 Mean :3.303
## 3rd Qu.:4.000
## Max. :4.000
##
## pymnt_plan purpose title
## n:499 debt_consolidation:276 Debt Consolidation Loan : 46
## credit_card :116 Debt Consolidation : 42
## other : 29 Credit Card Consolidation: 13
## home_improvement : 20 Credit Card : 11
## major_purchase : 15 Consolidation : 10
## small_business : 15 Debt consolidation : 9
## (Other) : 28 (Other) :368
## zip_code addr_state dti delinq_2yrs
## 900xx : 10 CA :111 Min. : 0.72 Min. :0.00000
## 921xx : 8 NY : 44 1st Qu.: 9.55 1st Qu.:0.00000
## 070xx : 7 FL : 38 Median :14.43 Median :0.00000
## 330xx : 7 TX : 36 Mean :14.44 Mean :0.07415
## 606xx : 7 NJ : 26 3rd Qu.:19.70 3rd Qu.:0.00000
## 850xx : 7 IL : 19 Max. :29.85 Max. :3.00000
## (Other):453 (Other):225
## earliest_cr_line inq_last_6mths open_acc pub_rec
## Oct-02 : 10 Min. :0.0000 Min. : 2.000 Min. :0.00000
## May-00 : 9 1st Qu.:0.0000 1st Qu.: 7.000 1st Qu.:0.00000
## Sep-98 : 9 Median :1.0000 Median : 8.000 Median :0.00000
## Apr-00 : 8 Mean :0.8457 Mean : 9.152 Mean :0.02204
## Oct-00 : 8 3rd Qu.:1.0000 3rd Qu.:11.000 3rd Qu.:0.00000
## Aug-00 : 6 Max. :5.0000 Max. :30.000 Max. :1.00000
## (Other):449
## revol_bal revol_util total_acc initial_list_status
## Min. : 0 Min. : 0.60 Min. : 3.00 f:499
## 1st Qu.: 7340 1st Qu.:49.10 1st Qu.:13.00
## Median :11638 Median :66.90 Median :18.00
## Mean :13824 Mean :63.02 Mean :20.23
## 3rd Qu.:17670 3rd Qu.:81.65 3rd Qu.:26.00
## Max. :93718 Max. :99.80 Max. :79.00
##
## out_prncp out_prncp_inv total_pymnt total_pymnt_inv
## Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 6857 1st Qu.: 6857
## Median : 0.0 Median : 0.0 Median :11123 Median :11082
## Mean : 343.3 Mean : 342.5 Mean :12449 Mean :12286
## 3rd Qu.: 0.0 3rd Qu.: 0.0 3rd Qu.:16061 3rd Qu.:15917
## Max. :9928.6 Max. :9921.5 Max. :45755 Max. :43294
##
## total_rec_prncp total_rec_int total_rec_late_fee recoveries
## Min. : 0 Min. : 0.0 Min. : 0.000 Min. : 0.0
## 1st Qu.: 5363 1st Qu.: 912.1 1st Qu.: 0.000 1st Qu.: 0.0
## Median : 9000 Median : 1659.3 Median : 0.000 Median : 0.0
## Mean : 9911 Mean : 2428.8 Mean : 1.001 Mean : 108.6
## 3rd Qu.:13229 3rd Qu.: 2898.5 3rd Qu.: 0.000 3rd Qu.: 0.0
## Max. :35000 Max. :18875.7 Max. :36.247 Max. :3842.1
##
## collection_recovery_fee last_pymnt_d last_pymnt_amnt next_pymnt_d
## Min. : 0.000 Jan-15 :125 Min. : 0.0 :458
## 1st Qu.: 0.000 Dec-14 : 32 1st Qu.: 281.9 Feb-16: 24
## Median : 0.000 Dec-15 : 30 Median : 520.2 Jan-16: 17
## Mean : 8.893 Dec-13 : 20 Mean : 2808.1
## 3rd Qu.: 0.000 Oct-13 : 18 3rd Qu.: 3431.5
## Max. :687.780 Oct-12 : 17 Max. :28412.4
## (Other):257
## last_credit_pull_d collections_12_mths_ex_med policy_code
## Jan-16 :209 Min. :0 Min. :1
## Dec-14 : 59 1st Qu.:0 1st Qu.:1
## Jan-15 : 22 Median :0 Median :1
## Dec-15 : 11 Mean :0 Mean :1
## Jul-14 : 11 3rd Qu.:0 3rd Qu.:1
## Sep-15 : 11 Max. :0 Max. :1
## (Other):176
## application_type
## INDIVIDUAL:499
CHARGE-OFF: The declaration by a creditor (usually on a credit card account) that an amount of debt is unlikely to be collected. This occurs when a consumer becomes severely delinquent on a debt. Traditionally, creditors make this declaration after six months without payment.
DEFAULT: This occurs when a debtor is unable to meet the legal obligation of debt repayment. The term also refers to cases in which one party fails to perform on a futures contract as required by an exchange.
CURRENT: The loan is still within its term, so there is still time for it to be paid.
Here, the loan status is cross-tabulated by home ownership. Borrowers with a mortgage have the best repayment record: 79.9% of them fully paid, compared with 71.0% of those who own or rent, and only 9.1% were charged off, against 23.7% of owners and 21.9% of renters. The single default in the sample (0.6% of mortgage holders) also falls in the mortgage group. The chi-squared test gives p = 0.017, although R warns that the approximation may be unreliable because some cells have very small counts.
mytable<-xtabs(~loan_status+home_ownership, data=loan.df)
prop.table(mytable,2)
## home_ownership
## loan_status MORTGAGE OWN RENT
## Charged Off 0.091463415 0.236842105 0.218855219
## Current 0.103658537 0.052631579 0.070707071
## Default 0.006097561 0.000000000 0.000000000
## Fully Paid 0.798780488 0.710526316 0.710437710
chisq.test(mytable)
## Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: mytable
## X-squared = 15.427, df = 6, p-value = 0.01718
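The warning above arises because some expected cell counts are very small. One possible way to check the result, shown here only as a sketch, is a Monte Carlo p-value via the `simulate.p.value` argument of `chisq.test`. The counts below are not read from `loan2.csv`; they are reconstructed from the column proportions and group sizes printed above, and the seed is arbitrary.

```r
# Counts reconstructed from the printed proportions and group sizes
# (MORTGAGE = 164, OWN = 38, RENT = 297); an assumption, not the raw file.
tab <- matrix(
  c( 15,  9,  65,
     17,  2,  21,
      1,  0,   0,
    131, 27, 211),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    loan_status    = c("Charged Off", "Current", "Default", "Fully Paid"),
    home_ownership = c("MORTGAGE", "OWN", "RENT")
  )
)

# Asymptotic test as in the text; the warning is suppressed here on purpose.
asym <- suppressWarnings(chisq.test(tab))
round(asym$expected, 2)   # the Default row has expected counts far below 5

# Monte Carlo p-value, which does not rely on the large-sample approximation.
set.seed(1)               # arbitrary seed, for reproducibility only
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

If the simulated p-value is close to the asymptotic one, the conclusion drawn from the original test is less likely to be an artifact of the sparse Default row.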
The histogram below makes this comparison easier to see.
library(lattice)
## Warning: package 'lattice' was built under R version 3.4.3
histogram(~loan_status | home_ownership, data=loan.df)
There doesn’t seem to be a wide variation in the mean amount borrowed by these three groups, though people with mortgaged homes took the highest loan amounts on average.
aggregate(loan.df$loan_amnt, by=list(salary=loan.df$home_ownership), mean)
## salary x
## 1 MORTGAGE 13584.76
## 2 OWN 12509.21
## 3 RENT 11194.02
The means might be misleading, since the maximum loan amount is noticeably higher for borrowers with a mortgage.
boxplot(loan_amnt~home_ownership, data=loan.df)
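Since a few large loans can pull the mean upward, it may help to report the median and maximum next to the mean. The sketch below uses the formula interface of `aggregate` on a small made-up data frame; the `toy` values are purely illustrative and are not taken from `loan2.csv`.

```r
# Hypothetical toy data, for illustration only.
toy <- data.frame(
  home_ownership = c("MORTGAGE", "MORTGAGE", "MORTGAGE",
                     "OWN", "OWN",
                     "RENT", "RENT", "RENT"),
  loan_amnt      = c(8000, 12000, 35000,
                     10000, 15000,
                     5000, 9000, 16000)
)

# One row per group; the loan_amnt column is a matrix with three statistics.
stats <- aggregate(
  loan_amnt ~ home_ownership, data = toy,
  FUN = function(x) c(mean = mean(x), median = median(x), max = max(x))
)
stats
```

On the real data the same call would be `aggregate(loan_amnt ~ home_ownership, data = loan.df, FUN = ...)`, which also labels the grouping column with its actual name.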
Comparing loan amounts across loan statuses, we see that people who fully paid their loans borrowed the least on average, while the defaulted loan had the highest amount.
aggregate(loan.df$loan_amnt, by=list(salary=loan.df$loan_status), mean)
## salary x
## 1 Charged Off 12977.25
## 2 Current 16645.00
## 3 Default 18000.00
## 4 Fully Paid 11352.57
People who charged off had the lowest average income.
aggregate(loan.df$annual_inc, by=list(salary=loan.df$loan_status), mean)
## salary x
## 1 Charged Off 54217.61
## 2 Current 70203.60
## 3 Default 62000.00
## 4 Fully Paid 61034.14
boxplot(loan_amnt~loan_status, data=loan.df)
The interest rates follow a similar trend: the rate is highest for the single defaulted loan and lowest on average for those who fully paid.
aggregate(loan.df$int_rate, by=list(salary=loan.df$loan_status), mean)
## salary x
## 1 Charged Off 13.90461
## 2 Current 15.79625
## 3 Default 17.27000
## 4 Fully Paid 11.97553
boxplot(int_rate~loan_status, data=loan.df)
The mean last payment amount received was highest for those who fully paid the loan and similar across the other three categories.
aggregate(loan.df$last_pymnt_amnt, by=list(salary=loan.df$loan_status), mean)
## salary x
## 1 Charged Off 407.6597
## 2 Current 373.6968
## 3 Default 449.9700
## 4 Fully Paid 3657.3260
Those who fully paid their loans not only had a higher mean but also a much higher maximum last payment, with outliers in that category rising even further.
boxplot(last_pymnt_amnt~loan_status, data=loan.df)
When we analyze the loan status with respect to the grade, we see that the grading is broadly accurate. The share of charged-off loans rises from 10.7% in grade A to 33.3% in grade F, while the share of fully paid loans falls from 89.3% in A to 44.4% in F. Grade G looks like an anomaly, since everyone in G paid in full, but the G subset contains just one person, so that single data point carries little weight.
mytable2<-xtabs(~loan_status+grade, data=loan.df)
mytable2
## grade
## loan_status A B C D E F G
## Charged Off 12 30 18 18 8 3 0
## Current 0 11 10 10 7 2 0
## Default 0 0 0 1 0 0 0
## Fully Paid 100 143 72 33 16 4 1
prop.table(mytable2,2)
## grade
## loan_status A B C D E
## Charged Off 0.10714286 0.16304348 0.18000000 0.29032258 0.25806452
## Current 0.00000000 0.05978261 0.10000000 0.16129032 0.22580645
## Default 0.00000000 0.00000000 0.00000000 0.01612903 0.00000000
## Fully Paid 0.89285714 0.77717391 0.72000000 0.53225806 0.51612903
## grade
## loan_status F G
## Charged Off 0.33333333 0.00000000
## Current 0.22222222 0.00000000
## Default 0.00000000 0.00000000
## Fully Paid 0.44444444 1.00000000
chisq.test(mytable2)
## Warning in chisq.test(mytable2): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: mytable2
## X-squared = 54.202, df = 18, p-value = 1.707e-05
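The chi-squared test above treats the grades as unordered. Because grades are ordered from A to G, a test for a trend in proportions is one possible complement. The sketch below applies `prop.trend.test` from base R to the charged-off counts and grade totals copied from the count table above; the equally spaced default scores 1 to 7 are an assumption.

```r
# Charged-off counts and grade totals, copied from the count table above.
charged_off <- c(A = 12,  B = 30,  C = 18,  D = 18, E = 8,  F = 3, G = 0)
total       <- c(A = 112, B = 184, C = 100, D = 62, E = 31, F = 9, G = 1)

# Charge-off rate per grade.
rate <- charged_off / total
round(rate, 3)

# Chi-squared test for a linear trend in proportions across the ordered grades
# (default scores 1, 2, ..., 7).
trend <- prop.trend.test(x = charged_off, n = total)
trend$p.value
```

A small p-value here would support the reading that the charge-off rate rises with worsening grade, rather than merely differing between grades.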
The p-value of 0.4215 gives no evidence of an association between the number of inquiries in the last 6 months and the loan status.
mytable5<-xtabs(~loan_status+inq_last_6mths, data=loan.df)
prop.table(mytable5,2)
## inq_last_6mths
## loan_status 0 1 2 3 4
## Charged Off 0.15720524 0.16025641 0.22891566 0.28000000 0.20000000
## Current 0.07423581 0.08333333 0.08433735 0.08000000 0.20000000
## Default 0.00000000 0.00000000 0.01204819 0.00000000 0.00000000
## Fully Paid 0.76855895 0.75641026 0.67469880 0.64000000 0.60000000
## inq_last_6mths
## loan_status 5
## Charged Off 1.00000000
## Current 0.00000000
## Default 0.00000000
## Fully Paid 0.00000000
chisq.test(mytable5)
## Warning in chisq.test(mytable5): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: mytable5
## X-squared = 15.422, df = 15, p-value = 0.4215
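Several columns and one row of this table hold very few loans, which is what triggers the warning. A common remedy, sketched here under assumptions, is to merge sparse categories before testing. The counts are reconstructed from the proportions printed above, and the choice to merge 3, 4 and 5 inquiries into "3+" and to fold the single Default into Charged Off is an illustrative one, not the author's.

```r
# Counts reconstructed from the printed column proportions (an assumption).
tab <- matrix(
  c( 36,  25, 19,  7, 1, 1,
     17,  13,  7,  2, 1, 0,
      0,   0,  1,  0, 0, 0,
    176, 118, 56, 16, 3, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    loan_status    = c("Charged Off", "Current", "Default", "Fully Paid"),
    inq_last_6mths = as.character(0:5)
  )
)

# Merge the sparse inquiry counts 3, 4 and 5 into a single "3+" column.
merged_cols <- cbind(tab[, c("0", "1", "2")],
                     "3+" = rowSums(tab[, c("3", "4", "5")]))

# Fold the single Default into Charged Off.
collapsed <- rbind(
  "Charged Off/Default" = merged_cols["Charged Off", ] + merged_cols["Default", ],
  merged_cols[c("Current", "Fully Paid"), ]
)
collapsed

# Re-run the test and inspect the expected counts of the smaller table.
res <- suppressWarnings(chisq.test(collapsed))
round(res$expected, 2)
res$p.value
```

Merging reduces the number of sparse cells but need not remove them all, so the expected counts are still worth checking.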
The purpose of the loan shows no significant association with the loan status (p-value 0.8977).
mytable5<-xtabs(~loan_status+purpose, data=loan.df)
prop.table(mytable5,2)
## purpose
## loan_status car credit_card debt_consolidation home_improvement
## Charged Off 0.272727273 0.103448276 0.199275362 0.150000000
## Current 0.090909091 0.060344828 0.076086957 0.150000000
## Default 0.000000000 0.000000000 0.003623188 0.000000000
## Fully Paid 0.636363636 0.836206897 0.721014493 0.700000000
## purpose
## loan_status house major_purchase medical moving
## Charged Off 0.000000000 0.133333333 0.000000000 0.333333333
## Current 0.000000000 0.133333333 0.000000000 0.000000000
## Default 0.000000000 0.000000000 0.000000000 0.000000000
## Fully Paid 1.000000000 0.733333333 1.000000000 0.666666667
## purpose
## loan_status other small_business vacation wedding
## Charged Off 0.241379310 0.400000000 0.000000000 0.000000000
## Current 0.137931034 0.066666667 0.200000000 0.000000000
## Default 0.000000000 0.000000000 0.000000000 0.000000000
## Fully Paid 0.620689655 0.533333333 0.800000000 1.000000000
chisq.test(mytable5)
## Warning in chisq.test(mytable5): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: mytable5
## X-squared = 23.193, df = 33, p-value = 0.8977
The p-value of 0.7292 indicates no significant association between loan status and the number of 30+ days past-due incidences of delinquency in the borrower’s credit file over the past 2 years.
mytable6<-xtabs(~loan_status+delinq_2yrs, data=loan.df)
prop.table(mytable6,2)
## delinq_2yrs
## loan_status 0 1 2 3
## Charged Off 0.179104478 0.125000000 0.200000000 1.000000000
## Current 0.078891258 0.125000000 0.000000000 0.000000000
## Default 0.002132196 0.000000000 0.000000000 0.000000000
## Fully Paid 0.739872068 0.750000000 0.800000000 0.000000000
chisq.test(mytable6)
## Warning in chisq.test(mytable6): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: mytable6
## X-squared = 6.107, df = 9, p-value = 0.7292
The p-value of 0.4621 indicates no significant association between employment length and loan status.
mytable3<-xtabs(~loan_status+emp_length, data=loan.df)
prop.table(mytable3,2)
## emp_length
## loan_status < 1 year 1 year 10+ years 2 years 3 years
## Charged Off 0.24324324 0.17391304 0.22033898 0.14285714 0.18750000
## Current 0.05405405 0.06521739 0.10169492 0.08928571 0.08333333
## Default 0.00000000 0.02173913 0.00000000 0.00000000 0.00000000
## Fully Paid 0.70270270 0.73913043 0.67796610 0.76785714 0.72916667
## emp_length
## loan_status 4 years 5 years 6 years 7 years 8 years
## Charged Off 0.12500000 0.12962963 0.10344828 0.11538462 0.28571429
## Current 0.05000000 0.01851852 0.10344828 0.15384615 0.19047619
## Default 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## Fully Paid 0.82500000 0.85185185 0.79310345 0.73076923 0.52380952
## emp_length
## loan_status 9 years n/a
## Charged Off 0.15789474 0.40000000
## Current 0.00000000 0.00000000
## Default 0.00000000 0.00000000
## Fully Paid 0.84210526 0.60000000
chisq.test(mytable3)
## Warning in chisq.test(mytable3): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: mytable3
## X-squared = 33.105, df = 33, p-value = 0.4621
The term is associated with the loan status, since the p-value is below 0.01. Among loans with a 36-month term, 85.6% were fully paid, compared with just 38.7% of those with a 60-month term. Part of this gap arises because 32.3% of the 60-month loans are still current, whereas none of the 36-month loans are.
mytable3<-xtabs(~loan_status+term_months, data=loan.df)
prop.table(mytable3,2)
## term_months
## loan_status 36 60
## Charged Off 0.144000000 0.282258065
## Current 0.000000000 0.322580645
## Default 0.000000000 0.008064516
## Fully Paid 0.856000000 0.387096774
chisq.test(mytable3)
## Warning in chisq.test(mytable3): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: mytable3
## X-squared = 161.69, df = 3, p-value < 2.2e-16
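Loans that are still current have not yet reached an outcome, so they lower the fully paid share of the 60-month group. A rough way to compare like with like is to look only at loans that are no longer current. The counts below are reconstructed from the proportions above, assuming 375 loans with a 36-month term and 124 with a 60-month term, which is what the mean of `term_months` implies.

```r
# Counts reconstructed from the printed proportions (an assumption):
# 375 loans at 36 months, 124 loans at 60 months.
term_tab <- matrix(
  c( 54, 35,
      0, 40,
      0,  1,
    321, 48),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    loan_status = c("Charged Off", "Current", "Default", "Fully Paid"),
    term_months = c("36", "60")
  )
)

# Keep only loans that have reached an outcome.
resolved <- term_tab[rownames(term_tab) != "Current", ]

# Share of fully paid loans among resolved loans, per term.
fully_paid_share <- resolved["Fully Paid", ] / colSums(resolved)
round(fully_paid_share, 3)
```

Under these assumptions the 60-month share rises to about 57%, still well below the 85.6% of the 36-month loans, so the term effect narrows but does not disappear.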
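The column status_numeric encodes the loan status numerically: 1 = Charged Off, 2 = Current, 3 = Default, 4 = Fully Paid. Each Welch t-test below compares the mean of this code with the mean of another column. Because the columns are measured on different scales, the small p-values show only that the two means differ; they do not by themselves measure how strongly a variable is associated with loan status.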
t.test(loan.df$status_numeric,loan.df$int_rate)
##
## Welch Two Sample t-test
##
## data: loan.df$status_numeric and loan.df$int_rate
## t = -51.929, df = 595.08, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -9.686873 -8.980862
## sample estimates:
## mean of x mean of y
## 3.302605 12.636473
t.test(loan.df$status_numeric,loan.df$term_months)
##
## Welch Two Sample t-test
##
## data: loan.df$status_numeric and loan.df$term_months
## t = -82.635, df = 511.35, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -39.58048 -37.74216
## sample estimates:
## mean of x mean of y
## 3.302605 41.963928
t.test(loan.df$status_numeric,loan.df$last_pymnt_amnt)
##
## Welch Two Sample t-test
##
## data: loan.df$status_numeric and loan.df$last_pymnt_amnt
## t = -13.384, df = 498, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3216.514 -2393.044
## sample estimates:
## mean of x mean of y
## 3.302605 2808.081844
t.test(loan.df$status_numeric,loan.df$loan_amnt)
##
## Welch Two Sample t-test
##
## data: loan.df$status_numeric and loan.df$loan_amnt
## t = -37.931, df = 498, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -12702.15 -11451.07
## sample estimates:
## mean of x mean of y
## 3.302605 12079.909820
The column ownership uses numbers to denote home ownership: 1 = mortgage, 2 = own, 3 = rent.
t.test(loan.df$status_numeric,loan.df$ownership)
##
## Welch Two Sample t-test
##
## data: loan.df$status_numeric and loan.df$ownership
## t = 15.264, df = 934.48, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.9028668 1.1692775
## sample estimates:
## mean of x mean of y
## 3.302605 2.266533
t.test(loan.df$status_numeric,loan.df$inq_last_6mths)
##
## Welch Two Sample t-test
##
## data: loan.df$status_numeric and loan.df$inq_last_6mths
## t = 35.648, df = 950.53, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.321658 2.592170
## sample estimates:
## mean of x mean of y
## 3.3026052 0.8456914
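The t-tests above compare the mean of the status code with the mean of a second column. To ask instead whether a numeric variable differs between loan status groups, one option is a Kruskal-Wallis test or a Welch one-way ANOVA of that variable by status. The sketch below runs both on simulated data: the `toy` data frame, its group sizes and the standard deviation are assumptions, and only the group means loosely follow the interest rate means reported earlier.

```r
# Simulated data for illustration only; not drawn from loan2.csv.
set.seed(42)  # arbitrary seed
toy <- data.frame(
  loan_status = factor(rep(c("Charged Off", "Current", "Fully Paid"),
                           times = c(30, 15, 100))),
  int_rate    = c(rnorm(30,  mean = 13.9, sd = 3.5),
                  rnorm(15,  mean = 15.8, sd = 3.5),
                  rnorm(100, mean = 12.0, sd = 3.5))
)

# Rank-based test: does the distribution of int_rate differ across statuses?
kw <- kruskal.test(int_rate ~ loan_status, data = toy)
kw$p.value

# Welch one-way ANOVA: compares group means without assuming equal variances.
welch <- oneway.test(int_rate ~ loan_status, data = toy)
welch$p.value
```

On the real data the formula would be applied to `loan.df`, for example `kruskal.test(int_rate ~ loan_status, data = loan.df)`. The single Default loan may need to be merged with another group or left out.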
The number of inquiries in the past 6 months (excluding auto and mortgage inquiries) is very difficult to predict with the linear regression model below, which explains only 7.6% of the variation.
fit<-lm(inq_last_6mths~ownership+loan_amnt+int_rate+term_months+grade_num+annual_inc, data = loan.df)
summary(fit)
##
## Call:
## lm(formula = inq_last_6mths ~ ownership + loan_amnt + int_rate +
## term_months + grade_num + annual_inc, data = loan.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4639 -0.7096 -0.1544 0.5267 3.8540
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.897e-01 3.086e-01 0.939 0.3482
## ownership -4.886e-02 4.671e-02 -1.046 0.2961
## loan_amnt -1.260e-05 7.077e-06 -1.781 0.0755 .
## int_rate 9.722e-02 4.005e-02 2.428 0.0156 *
## term_months -2.904e-03 5.120e-03 -0.567 0.5708
## grade_num -7.559e-02 1.223e-01 -0.618 0.5368
## annual_inc -1.639e-06 1.468e-06 -1.116 0.2648
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9307 on 492 degrees of freedom
## Multiple R-squared: 0.0758, Adjusted R-squared: 0.06453
## F-statistic: 6.725 on 6 and 492 DF, p-value: 7.355e-07
The loan status is modeled next with a larger set of predictors. This regression explains 68.6% of the variation in status_numeric (multiple R-squared 0.6861).
fit<-lm(status_numeric~ownership+loan_amnt+int_rate+term_months+grade_num+annual_inc+delinq_2yrs+inq_last_6mths+last_pymnt_amnt+funded_amnt+out_prncp+total_rec_prncp+total_rec_late_fee+total_rec_int, data = loan.df)
summary(fit)
##
## Call:
## lm(formula = status_numeric ~ ownership + loan_amnt + int_rate +
## term_months + grade_num + annual_inc + delinq_2yrs + inq_last_6mths +
## last_pymnt_amnt + funded_amnt + out_prncp + total_rec_prncp +
## total_rec_late_fee + total_rec_int, data = loan.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6268 -0.1358 0.1759 0.3499 2.5940
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.346e+00 2.763e-01 15.733 < 2e-16 ***
## ownership -7.747e-02 3.477e-02 -2.228 0.026344 *
## loan_amnt -7.537e-06 1.794e-05 -0.420 0.674663
## int_rate -3.466e-02 2.974e-02 -1.165 0.244464
## term_months -1.340e-02 4.614e-03 -2.905 0.003840 **
## grade_num 8.857e-02 9.098e-02 0.973 0.330801
## annual_inc 2.683e-07 1.103e-06 0.243 0.807860
## delinq_2yrs -5.124e-02 1.002e-01 -0.511 0.609353
## inq_last_6mths -4.432e-02 3.348e-02 -1.324 0.186211
## last_pymnt_amnt 1.609e-05 1.011e-05 1.593 0.111902
## funded_amnt -2.037e-04 2.036e-05 -10.007 < 2e-16 ***
## out_prncp -1.161e-04 3.446e-05 -3.370 0.000812 ***
## total_rec_prncp 2.348e-04 1.139e-05 20.619 < 2e-16 ***
## total_rec_late_fee -8.101e-03 6.602e-03 -1.227 0.220428
## total_rec_int 3.186e-05 2.976e-05 1.071 0.284800
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6831 on 484 degrees of freedom
## Multiple R-squared: 0.6861, Adjusted R-squared: 0.677
## F-statistic: 75.55 on 14 and 484 DF, p-value: < 2.2e-16
While the regression model explains 68.61% of the variation in loan status, a higher R-squared might be achieved with a more exhaustive set of predictors. The key findings are as follows:
1) The total received principal, funded amount, outstanding principal, loan term and home ownership type are the statistically significant predictors (p < 0.05), with total received principal and funded amount the strongest.
2) The loan amount and annual income have the highest p-values (0.67 and 0.81), showing they are the least associated with loan status in this model.
3) The adjusted R-squared is 0.677, close to the multiple R-squared of 0.6861, which suggests that the fit is not inflated by a large number of unnecessary predictors.
The earlier findings tell us that:
1) Higher interest rates and larger loan amounts are associated with charge-off and default.
2) People whose loans were charged off had the lowest average income.
3) The grades are a reasonably good predictor of a person’s ability to fully pay the loan.
4) People who fully paid had a much higher last payment amount.
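Because loan status is a category rather than a quantity, a logistic regression on a binary "fully paid" indicator is a common alternative to a linear model on the numeric status code. The following is only a sketch on simulated data: the `toy` data frame, the coefficients used to generate it and the choice of predictors are all assumptions, not results from `loan2.csv`.

```r
# Simulated data for illustration only.
set.seed(7)  # arbitrary seed
n <- 300
toy <- data.frame(
  int_rate    = runif(n, min = 6, max = 24),
  term_months = sample(c(36, 60), n, replace = TRUE, prob = c(0.75, 0.25))
)

# Assumed data-generating process: higher rates and longer terms
# lower the chance of full repayment.
lin <- 3 - 0.12 * toy$int_rate - 0.03 * (toy$term_months - 36)
toy$fully_paid <- rbinom(n, size = 1, prob = plogis(lin))

# Logistic regression of the binary outcome on the two predictors.
fit_logit <- glm(fully_paid ~ int_rate + term_months,
                 data = toy, family = binomial)
summary(fit_logit)$coefficients

# Predicted probability of full repayment for a hypothetical borrower.
pred <- predict(fit_logit,
                newdata = data.frame(int_rate = 12, term_months = 36),
                type = "response")
pred
```

On the real data the indicator could be built as `loan.df$loan_status == "Fully Paid"`, ideally after setting aside loans that are still current.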