We study the loan status of a sample of 499 borrowers and analyze the factors that correlate with whether they fully repay their loans or default. We examine the effect of each factor and, at the end, fit a regression model to predict loan status.

# stringsAsFactors = TRUE keeps the factor counts in the summary below on R >= 4.0
loan.df <- read.csv("loan2.csv", stringsAsFactors = TRUE)
summary(loan.df)
##        id            member_id         loan_amnt      funded_amnt   
##  Min.   : 822464   Min.   : 943135   Min.   : 1000   Min.   : 1000  
##  1st Qu.:1063592   1st Qu.:1295787   1st Qu.: 7000   1st Qu.: 7000  
##  Median :1065348   Median :1299063   Median :10000   Median :10000  
##  Mean   :1062700   Mean   :1294483   Mean   :12080   Mean   :11653  
##  3rd Qu.:1067576   3rd Qu.:1301633   3rd Qu.:15425   3rd Qu.:15000  
##  Max.   :1077501   Max.   :1314167   Max.   :35000   Max.   :35000  
##                                                                     
##  funded_amnt_inv  term_months       int_rate      installment     grade  
##  Min.   : 1000   Min.   :36.00   Min.   : 6.03   Min.   :  34.5   A:112  
##  1st Qu.: 7000   1st Qu.:36.00   1st Qu.: 9.91   1st Qu.: 217.4   B:184  
##  Median :10000   Median :36.00   Median :12.42   Median : 330.8   C:100  
##  Mean   :11546   Mean   :41.96   Mean   :12.64   Mean   : 351.9   D: 62  
##  3rd Qu.:15000   3rd Qu.:36.00   3rd Qu.:15.27   3rd Qu.: 452.4   E: 31  
##  Max.   :35000   Max.   :60.00   Max.   :23.91   Max.   :1140.1   F:  9  
##                                                                   G:  1  
##    grade_num       sub_grade                     emp_title  
##  Min.   :1.000   B3     : 43                          : 24  
##  1st Qu.:2.000   B4     : 40   United States Air Force:  3  
##  Median :2.000   B5     : 39   American Airlines      :  2  
##  Mean   :2.493   B1     : 36   Best Buy               :  2  
##  3rd Qu.:3.000   A4     : 33   cardinal logistics     :  2  
##  Max.   :7.000   C1     : 33   Multiband              :  2  
##                  (Other):275   (Other)                :464  
##      emp_length   home_ownership   ownership       annual_inc    
##  10+ years:118   MORTGAGE:164    Min.   :1.000   Min.   : 12000  
##  2 years  : 56   OWN     : 38    1st Qu.:1.000   1st Qu.: 40000  
##  5 years  : 54   RENT    :297    Median :3.000   Median : 53000  
##  3 years  : 48                   Mean   :2.267   Mean   : 60555  
##  1 year   : 46                   3rd Qu.:3.000   3rd Qu.: 75000  
##  4 years  : 40                   Max.   :3.000   Max.   :276000  
##  (Other)  :137                                                   
##       verification_status   issue_d         loan_status  status_numeric 
##  Not Verified   :199      Dec-11:499   Charged Off: 89   Min.   :1.000  
##  Source Verified:132                   Current    : 40   1st Qu.:2.000  
##  Verified       :168                   Default    :  1   Median :4.000  
##                                        Fully Paid :369   Mean   :3.303  
##                                                          3rd Qu.:4.000  
##                                                          Max.   :4.000  
##                                                                         
##  pymnt_plan               purpose                          title    
##  n:499      debt_consolidation:276   Debt Consolidation Loan  : 46  
##             credit_card       :116   Debt Consolidation       : 42  
##             other             : 29   Credit Card Consolidation: 13  
##             home_improvement  : 20   Credit Card              : 11  
##             major_purchase    : 15   Consolidation            : 10  
##             small_business    : 15   Debt consolidation       :  9  
##             (Other)           : 28   (Other)                  :368  
##     zip_code     addr_state       dti         delinq_2yrs     
##  900xx  : 10   CA     :111   Min.   : 0.72   Min.   :0.00000  
##  921xx  :  8   NY     : 44   1st Qu.: 9.55   1st Qu.:0.00000  
##  070xx  :  7   FL     : 38   Median :14.43   Median :0.00000  
##  330xx  :  7   TX     : 36   Mean   :14.44   Mean   :0.07415  
##  606xx  :  7   NJ     : 26   3rd Qu.:19.70   3rd Qu.:0.00000  
##  850xx  :  7   IL     : 19   Max.   :29.85   Max.   :3.00000  
##  (Other):453   (Other):225                                    
##  earliest_cr_line inq_last_6mths      open_acc         pub_rec       
##  Oct-02 : 10      Min.   :0.0000   Min.   : 2.000   Min.   :0.00000  
##  May-00 :  9      1st Qu.:0.0000   1st Qu.: 7.000   1st Qu.:0.00000  
##  Sep-98 :  9      Median :1.0000   Median : 8.000   Median :0.00000  
##  Apr-00 :  8      Mean   :0.8457   Mean   : 9.152   Mean   :0.02204  
##  Oct-00 :  8      3rd Qu.:1.0000   3rd Qu.:11.000   3rd Qu.:0.00000  
##  Aug-00 :  6      Max.   :5.0000   Max.   :30.000   Max.   :1.00000  
##  (Other):449                                                         
##    revol_bal       revol_util      total_acc     initial_list_status
##  Min.   :    0   Min.   : 0.60   Min.   : 3.00   f:499              
##  1st Qu.: 7340   1st Qu.:49.10   1st Qu.:13.00                      
##  Median :11638   Median :66.90   Median :18.00                      
##  Mean   :13824   Mean   :63.02   Mean   :20.23                      
##  3rd Qu.:17670   3rd Qu.:81.65   3rd Qu.:26.00                      
##  Max.   :93718   Max.   :99.80   Max.   :79.00                      
##                                                                     
##    out_prncp      out_prncp_inv     total_pymnt    total_pymnt_inv
##  Min.   :   0.0   Min.   :   0.0   Min.   :    0   Min.   :    0  
##  1st Qu.:   0.0   1st Qu.:   0.0   1st Qu.: 6857   1st Qu.: 6857  
##  Median :   0.0   Median :   0.0   Median :11123   Median :11082  
##  Mean   : 343.3   Mean   : 342.5   Mean   :12449   Mean   :12286  
##  3rd Qu.:   0.0   3rd Qu.:   0.0   3rd Qu.:16061   3rd Qu.:15917  
##  Max.   :9928.6   Max.   :9921.5   Max.   :45755   Max.   :43294  
##                                                                   
##  total_rec_prncp total_rec_int     total_rec_late_fee   recoveries    
##  Min.   :    0   Min.   :    0.0   Min.   : 0.000     Min.   :   0.0  
##  1st Qu.: 5363   1st Qu.:  912.1   1st Qu.: 0.000     1st Qu.:   0.0  
##  Median : 9000   Median : 1659.3   Median : 0.000     Median :   0.0  
##  Mean   : 9911   Mean   : 2428.8   Mean   : 1.001     Mean   : 108.6  
##  3rd Qu.:13229   3rd Qu.: 2898.5   3rd Qu.: 0.000     3rd Qu.:   0.0  
##  Max.   :35000   Max.   :18875.7   Max.   :36.247     Max.   :3842.1  
##                                                                       
##  collection_recovery_fee  last_pymnt_d last_pymnt_amnt   next_pymnt_d
##  Min.   :  0.000         Jan-15 :125   Min.   :    0.0         :458  
##  1st Qu.:  0.000         Dec-14 : 32   1st Qu.:  281.9   Feb-16: 24  
##  Median :  0.000         Dec-15 : 30   Median :  520.2   Jan-16: 17  
##  Mean   :  8.893         Dec-13 : 20   Mean   : 2808.1               
##  3rd Qu.:  0.000         Oct-13 : 18   3rd Qu.: 3431.5               
##  Max.   :687.780         Oct-12 : 17   Max.   :28412.4               
##                          (Other):257                                 
##  last_credit_pull_d collections_12_mths_ex_med  policy_code
##  Jan-16 :209        Min.   :0                  Min.   :1   
##  Dec-14 : 59        1st Qu.:0                  1st Qu.:1   
##  Jan-15 : 22        Median :0                  Median :1   
##  Dec-15 : 11        Mean   :0                  Mean   :1   
##  Jul-14 : 11        3rd Qu.:0                  3rd Qu.:1   
##  Sep-15 : 11        Max.   :0                  Max.   :1   
##  (Other):176                                               
##    application_type
##  INDIVIDUAL:499    
##                    
##                    
##                    
##                    
##                    
## 

CHARGE-OFF: A creditor's declaration (usually on a credit card account) that an amount of debt is unlikely to be collected. It occurs when a consumer becomes severely delinquent on a debt; traditionally, creditors make this declaration after six months without payment.

DEFAULT: A debtor's failure to meet the legal obligation of debt repayment. The term also covers cases in which one party fails to perform on a futures contract as required by an exchange.

CURRENT: The loan is still within its term, so there is still time for it to be paid.

Here, loan status is cross-tabulated by home ownership. Borrowers with a mortgage are the most likely to fully pay off their loans (79.9%, compared with about 71% for both owners and renters) and the least likely to be charged off (9.1%, compared with 23.7% and 21.9%). The only default in the sample, which amounts to 0.6% of the mortgagers, also comes from this group. The chi-squared test gives p = 0.017, although R warns that the approximation may be unreliable because some cell counts are small.

mytable<-xtabs(~loan_status+home_ownership, data=loan.df)
prop.table(mytable,2)
##              home_ownership
## loan_status      MORTGAGE         OWN        RENT
##   Charged Off 0.091463415 0.236842105 0.218855219
##   Current     0.103658537 0.052631579 0.070707071
##   Default     0.006097561 0.000000000 0.000000000
##   Fully Paid  0.798780488 0.710526316 0.710437710
chisq.test(mytable)
## Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mytable
## X-squared = 15.427, df = 6, p-value = 0.01718
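
The warning arises because some expected cell counts are small (the Default row holds a single loan). The following is a minimal sketch of one way to check this. It rebuilds the table from the counts implied by the column proportions above (column totals 164, 38 and 297); the object name `status_by_home` is introduced here for illustration, and the Monte Carlo p-value is an alternative that the original analysis did not use.

```r
# Counts reconstructed from the column proportions above (column totals 164, 38, 297).
status_by_home <- matrix(
  c(15, 17, 1, 131,    # MORTGAGE
     9,  2, 0,  27,    # OWN
    65, 21, 0, 211),   # RENT
  nrow = 4,
  dimnames = list(
    loan_status    = c("Charged Off", "Current", "Default", "Fully Paid"),
    home_ownership = c("MORTGAGE", "OWN", "RENT")
  )
)

# Expected counts under independence; cells below 5 trigger the warning.
res <- suppressWarnings(chisq.test(status_by_home))
round(res$expected, 2)

# Monte Carlo p-value, which does not rely on the large-sample approximation.
set.seed(1)
sim <- chisq.test(status_by_home, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

The same check applies to the other cross-tabulations in this report, all of which raise the same warning.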

A histogram of loan status within each home ownership group makes the comparison easier to see.

library(lattice)
## Warning: package 'lattice' was built under R version 3.4.3
histogram(~loan_status | home_ownership, data=loan.df)

The mean loan amount does not vary widely across these three groups, though people with mortgaged homes borrowed the largest amounts on average.

aggregate(loan.df$loan_amnt, by=list(salary=loan.df$home_ownership), mean)
##     salary        x
## 1 MORTGAGE 13584.76
## 2      OWN 12509.21
## 3     RENT 11194.02

The means alone might be misleading, since the box plot shows that the maximum loan amount is considerably higher for mortgagers.

boxplot(loan_amnt~home_ownership, data=loan.df)

Comparing mean loan amounts by loan status, we see that people who fully paid their loans had borrowed the least on average, while the single defaulter had borrowed the most.

aggregate(loan.df$loan_amnt, by=list(salary=loan.df$loan_status), mean)
##        salary        x
## 1 Charged Off 12977.25
## 2     Current 16645.00
## 3     Default 18000.00
## 4  Fully Paid 11352.57

People who charged off had the lowest average income.

aggregate(loan.df$annual_inc, by=list(salary=loan.df$loan_status), mean)
##        salary        x
## 1 Charged Off 54217.61
## 2     Current 70203.60
## 3     Default 62000.00
## 4  Fully Paid 61034.14
boxplot(loan_amnt~loan_status, data=loan.df)

The interest rates follow a similar trend, with the highest for defaulters and the lowest for those who fully paid.

aggregate(loan.df$int_rate, by=list(salary=loan.df$loan_status), mean)
##        salary        x
## 1 Charged Off 13.90461
## 2     Current 15.79625
## 3     Default 17.27000
## 4  Fully Paid 11.97553
boxplot(int_rate~loan_status, data=loan.df)

The mean last payment amount was far higher for those who fully paid the loan (about 3,657) and similar across the other three categories (roughly 370 to 450).

aggregate(loan.df$last_pymnt_amnt, by=list(salary=loan.df$loan_status), mean)
##        salary         x
## 1 Charged Off  407.6597
## 2     Current  373.6968
## 3     Default  449.9700
## 4  Fully Paid 3657.3260

Those who fully paid their loans not only had a higher mean last payment; they also had a much higher maximum, with outliers in that category rising even further.

boxplot(last_pymnt_amnt~loan_status, data=loan.df)

When we analyze loan status with respect to grade, we see that the grading has been accurate to a good extent. The percentage of people who charged off rises from 10.7% in grade A to 33.3% in grade F, while the percentage who fully paid their loans falls from 89.3% in A to 44.4% in F. Grade G looks like an anomaly because everyone in it fully paid, but the grade contains just one borrower, so that single data point carries no weight.

mytable2<-xtabs(~loan_status+grade, data=loan.df)
mytable2
##              grade
## loan_status     A   B   C   D   E   F   G
##   Charged Off  12  30  18  18   8   3   0
##   Current       0  11  10  10   7   2   0
##   Default       0   0   0   1   0   0   0
##   Fully Paid  100 143  72  33  16   4   1
prop.table(mytable2,2)
##              grade
## loan_status            A          B          C          D          E
##   Charged Off 0.10714286 0.16304348 0.18000000 0.29032258 0.25806452
##   Current     0.00000000 0.05978261 0.10000000 0.16129032 0.22580645
##   Default     0.00000000 0.00000000 0.00000000 0.01612903 0.00000000
##   Fully Paid  0.89285714 0.77717391 0.72000000 0.53225806 0.51612903
##              grade
## loan_status            F          G
##   Charged Off 0.33333333 0.00000000
##   Current     0.22222222 0.00000000
##   Default     0.00000000 0.00000000
##   Fully Paid  0.44444444 1.00000000
chisq.test(mytable2)
## Warning in chisq.test(mytable2): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mytable2
## X-squared = 54.202, df = 18, p-value = 1.707e-05
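
The general chi-squared test above ignores the ordering of the grades. As a hedged sketch of how the downward trend in full repayment could be tested directly, the block below applies `prop.trend.test` from base R's `stats` package to the counts in the table above. Scoring the grades 1 to 7 as equally spaced is an assumption made here, not something stated in the original analysis.

```r
# Counts taken from the loan_status x grade table above.
fully_paid <- c(A = 100, B = 143, C = 72, D = 33, E = 16, F = 4, G = 1)
n_grade    <- c(A = 112, B = 184, C = 100, D = 62, E = 31, F = 9, G = 1)

# Share of fully paid loans per grade.
round(fully_paid / n_grade, 3)

# Chi-squared test for a linear trend in proportions across grades,
# scoring the grades 1..7 (assumption: equally spaced grades).
trend <- prop.trend.test(fully_paid, n_grade, score = seq_along(n_grade))
trend$p.value
```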

With a p-value of 0.42, the chi-squared test shows no statistically significant association between the number of inquiries in the last 6 months and loan status.

mytable5<-xtabs(~loan_status+inq_last_6mths, data=loan.df)
prop.table(mytable5,2)
##              inq_last_6mths
## loan_status            0          1          2          3          4
##   Charged Off 0.15720524 0.16025641 0.22891566 0.28000000 0.20000000
##   Current     0.07423581 0.08333333 0.08433735 0.08000000 0.20000000
##   Default     0.00000000 0.00000000 0.01204819 0.00000000 0.00000000
##   Fully Paid  0.76855895 0.75641026 0.67469880 0.64000000 0.60000000
##              inq_last_6mths
## loan_status            5
##   Charged Off 1.00000000
##   Current     0.00000000
##   Default     0.00000000
##   Fully Paid  0.00000000
chisq.test(mytable5)
## Warning in chisq.test(mytable5): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mytable5
## X-squared = 15.422, df = 15, p-value = 0.4215
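
Several columns in this table hold very few loans (five borrowers with 4 inquiries, one with 5), which is what makes the approximation unreliable. One common remedy is to merge sparse categories before testing. The sketch below is illustrative only: the counts are reconstructed from the column proportions above, the "3+" grouping is a choice made here, and "Not Fully Paid" lumps Current loans together with Charged Off and Default.

```r
# Counts reconstructed from the column proportions above
# (column totals 229, 156, 83, 25, 5, 1 for 0..5 inquiries).
fully_paid <- c(176, 118, 56, 16, 3, 0)
total      <- c(229, 156, 83, 25, 5, 1)

# Collapse 3, 4 and 5 inquiries into "3+" and status into paid / not paid.
group  <- c("0", "1", "2", "3+", "3+", "3+")
paid   <- tapply(fully_paid, group, sum)
unpaid <- tapply(total - fully_paid, group, sum)
collapsed <- rbind("Fully Paid" = paid, "Not Fully Paid" = unpaid)
collapsed

# Every expected count is now at least 5, so the approximation is reasonable.
res <- chisq.test(collapsed)
res$expected
res$p.value
```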

Purpose also shows no statistically significant association with loan status (p = 0.90).

mytable5<-xtabs(~loan_status+purpose, data=loan.df)
prop.table(mytable5,2)
##              purpose
## loan_status           car credit_card debt_consolidation home_improvement
##   Charged Off 0.272727273 0.103448276        0.199275362      0.150000000
##   Current     0.090909091 0.060344828        0.076086957      0.150000000
##   Default     0.000000000 0.000000000        0.003623188      0.000000000
##   Fully Paid  0.636363636 0.836206897        0.721014493      0.700000000
##              purpose
## loan_status         house major_purchase     medical      moving
##   Charged Off 0.000000000    0.133333333 0.000000000 0.333333333
##   Current     0.000000000    0.133333333 0.000000000 0.000000000
##   Default     0.000000000    0.000000000 0.000000000 0.000000000
##   Fully Paid  1.000000000    0.733333333 1.000000000 0.666666667
##              purpose
## loan_status         other small_business    vacation     wedding
##   Charged Off 0.241379310    0.400000000 0.000000000 0.000000000
##   Current     0.137931034    0.066666667 0.200000000 0.000000000
##   Default     0.000000000    0.000000000 0.000000000 0.000000000
##   Fully Paid  0.620689655    0.533333333 0.800000000 1.000000000
chisq.test(mytable5)
## Warning in chisq.test(mytable5): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mytable5
## X-squared = 23.193, df = 33, p-value = 0.8977

The p-value of 0.73 indicates no statistically significant association between loan status and the number of 30+ days past-due incidences of delinquency in the borrower’s credit file for the past 2 years.

mytable6<-xtabs(~loan_status+delinq_2yrs, data=loan.df)
prop.table(mytable6,2)
##              delinq_2yrs
## loan_status             0           1           2           3
##   Charged Off 0.179104478 0.125000000 0.200000000 1.000000000
##   Current     0.078891258 0.125000000 0.000000000 0.000000000
##   Default     0.002132196 0.000000000 0.000000000 0.000000000
##   Fully Paid  0.739872068 0.750000000 0.800000000 0.000000000
chisq.test(mytable6)
## Warning in chisq.test(mytable6): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mytable6
## X-squared = 6.107, df = 9, p-value = 0.7292

The p-value of 0.4621 indicates no statistically significant association between employment length and loan status.

mytable3<-xtabs(~loan_status+emp_length, data=loan.df)
prop.table(mytable3,2)
##              emp_length
## loan_status     < 1 year     1 year  10+ years    2 years    3 years
##   Charged Off 0.24324324 0.17391304 0.22033898 0.14285714 0.18750000
##   Current     0.05405405 0.06521739 0.10169492 0.08928571 0.08333333
##   Default     0.00000000 0.02173913 0.00000000 0.00000000 0.00000000
##   Fully Paid  0.70270270 0.73913043 0.67796610 0.76785714 0.72916667
##              emp_length
## loan_status      4 years    5 years    6 years    7 years    8 years
##   Charged Off 0.12500000 0.12962963 0.10344828 0.11538462 0.28571429
##   Current     0.05000000 0.01851852 0.10344828 0.15384615 0.19047619
##   Default     0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
##   Fully Paid  0.82500000 0.85185185 0.79310345 0.73076923 0.52380952
##              emp_length
## loan_status      9 years        n/a
##   Charged Off 0.15789474 0.40000000
##   Current     0.00000000 0.00000000
##   Default     0.00000000 0.00000000
##   Fully Paid  0.84210526 0.60000000
chisq.test(mytable3)
## Warning in chisq.test(mytable3): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mytable3
## X-squared = 33.105, df = 33, p-value = 0.4621

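Term is strongly associated with loan status, since the p-value is < 0.01. Among those with a term of 36 months, 85.6% fully paid their loans, compared with just 38.7% of those with a term of 60 months. Part of this gap reflects timing: 32.3% of the 60-month loans are still Current, whereas none of the 36-month loans are.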

mytable3<-xtabs(~loan_status+term_months, data=loan.df)
prop.table(mytable3,2)
##              term_months
## loan_status            36          60
##   Charged Off 0.144000000 0.282258065
##   Current     0.000000000 0.322580645
##   Default     0.000000000 0.008064516
##   Fully Paid  0.856000000 0.387096774
chisq.test(mytable3)
## Warning in chisq.test(mytable3): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mytable3
## X-squared = 161.69, df = 3, p-value < 2.2e-16
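
Because many 60-month loans are still Current, it can help to compare only loans whose outcome is already known. The block below is a rough sketch of that comparison. It uses counts reconstructed from the column proportions above (375 loans at 36 months, 124 at 60 months); `status_by_term` and `resolved` are names introduced here for illustration.

```r
# Counts reconstructed from the column proportions above
# (375 loans with a 36-month term, 124 with a 60-month term).
status_by_term <- matrix(
  c(54,  0, 0, 321,    # 36 months
    35, 40, 1,  48),   # 60 months
  nrow = 4,
  dimnames = list(
    loan_status = c("Charged Off", "Current", "Default", "Fully Paid"),
    term_months = c("36", "60")
  )
)

# Drop loans that are still Current and recompute the column proportions.
resolved <- status_by_term[rownames(status_by_term) != "Current", ]
round(prop.table(resolved, 2), 3)
```

Among resolved 60-month loans, 48 of 84 (57.1%) were fully paid, still well below the 85.6% for 36-month loans but a smaller gap than the raw table suggests.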

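We have a column, status_numeric, which denotes the status in numerical terms: 1 = Charged Off, 2 = Current, 3 = Default, 4 = Fully Paid. The Welch t-tests below compare the mean of this code with the mean of another column, so their very small p-values show only that the two columns have different means (they are measured on different scales); they do not measure how strongly a variable is related to loan status.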

t.test(loan.df$status_numeric,loan.df$int_rate)
## 
##  Welch Two Sample t-test
## 
## data:  loan.df$status_numeric and loan.df$int_rate
## t = -51.929, df = 595.08, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.686873 -8.980862
## sample estimates:
## mean of x mean of y 
##  3.302605 12.636473
t.test(loan.df$status_numeric,loan.df$term_months)
## 
##  Welch Two Sample t-test
## 
## data:  loan.df$status_numeric and loan.df$term_months
## t = -82.635, df = 511.35, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -39.58048 -37.74216
## sample estimates:
## mean of x mean of y 
##  3.302605 41.963928
t.test(loan.df$status_numeric,loan.df$last_pymnt_amnt)
## 
##  Welch Two Sample t-test
## 
## data:  loan.df$status_numeric and loan.df$last_pymnt_amnt
## t = -13.384, df = 498, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3216.514 -2393.044
## sample estimates:
##   mean of x   mean of y 
##    3.302605 2808.081844
t.test(loan.df$status_numeric,loan.df$loan_amnt)
## 
##  Welch Two Sample t-test
## 
## data:  loan.df$status_numeric and loan.df$loan_amnt
## t = -37.931, df = 498, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -12702.15 -11451.07
## sample estimates:
##    mean of x    mean of y 
##     3.302605 12079.909820

The column ownership uses numbers to denote home ownership: 1 = mortgage, 2 = own, 3 = rent.

t.test(loan.df$status_numeric,loan.df$ownership)
## 
##  Welch Two Sample t-test
## 
## data:  loan.df$status_numeric and loan.df$ownership
## t = 15.264, df = 934.48, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.9028668 1.1692775
## sample estimates:
## mean of x mean of y 
##  3.302605  2.266533
t.test(loan.df$status_numeric,loan.df$inq_last_6mths)
## 
##  Welch Two Sample t-test
## 
## data:  loan.df$status_numeric and loan.df$inq_last_6mths
## t = 35.648, df = 950.53, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.321658 2.592170
## sample estimates:
## mean of x mean of y 
## 3.3026052 0.8456914
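
To see why a two-sample t-test between two different columns says little about their relationship, consider the sketch below. It uses simulated, independent data (not the loan data; `status_code` and `rate` are hypothetical stand-ins) and still produces a tiny p-value, simply because the two columns have different means.

```r
set.seed(42)
n <- 500

# Two independent, simulated columns (illustrative only, not the loan data):
# a 1..4 status-like code and an interest-rate-like value.
status_code <- sample(1:4, n, replace = TRUE)
rate        <- rnorm(n, mean = 12.6, sd = 4)

# The two-sample t-test only compares the two means, so it comes out
# "significant" because the columns live on different scales.
t.test(status_code, rate)$p.value

# A correlation, by contrast, measures whether the two move together.
cor(status_code, rate)
```

On the loan data, a more informative check for a numeric variable might be something along the lines of `t.test(int_rate ~ loan_status, data = subset(loan.df, loan_status %in% c("Charged Off", "Fully Paid")))`, which compares the same variable between two status groups.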

The number of inquiries in the past 6 months (excluding auto and mortgage inquiries) is very difficult to predict using the linear regression model below, which explains only 7.6% of the variation.

fit<-lm(inq_last_6mths~ownership+loan_amnt+int_rate+term_months+grade_num+annual_inc, data = loan.df)
summary(fit)
## 
## Call:
## lm(formula = inq_last_6mths ~ ownership + loan_amnt + int_rate + 
##     term_months + grade_num + annual_inc, data = loan.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4639 -0.7096 -0.1544  0.5267  3.8540 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  2.897e-01  3.086e-01   0.939   0.3482  
## ownership   -4.886e-02  4.671e-02  -1.046   0.2961  
## loan_amnt   -1.260e-05  7.077e-06  -1.781   0.0755 .
## int_rate     9.722e-02  4.005e-02   2.428   0.0156 *
## term_months -2.904e-03  5.120e-03  -0.567   0.5708  
## grade_num   -7.559e-02  1.223e-01  -0.618   0.5368  
## annual_inc  -1.639e-06  1.468e-06  -1.116   0.2648  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9307 on 492 degrees of freedom
## Multiple R-squared:  0.0758, Adjusted R-squared:  0.06453 
## F-statistic: 6.725 on 6 and 492 DF,  p-value: 7.355e-07

Loan status is modeled next with a larger set of predictors; this regression explains 68.6% of the variation (multiple R-squared of 0.6861).

fit<-lm(status_numeric~ownership+loan_amnt+int_rate+term_months+grade_num+annual_inc+delinq_2yrs+inq_last_6mths+last_pymnt_amnt+funded_amnt+out_prncp+total_rec_prncp+total_rec_late_fee+total_rec_int, data = loan.df)
summary(fit)
## 
## Call:
## lm(formula = status_numeric ~ ownership + loan_amnt + int_rate + 
##     term_months + grade_num + annual_inc + delinq_2yrs + inq_last_6mths + 
##     last_pymnt_amnt + funded_amnt + out_prncp + total_rec_prncp + 
##     total_rec_late_fee + total_rec_int, data = loan.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6268 -0.1358  0.1759  0.3499  2.5940 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         4.346e+00  2.763e-01  15.733  < 2e-16 ***
## ownership          -7.747e-02  3.477e-02  -2.228 0.026344 *  
## loan_amnt          -7.537e-06  1.794e-05  -0.420 0.674663    
## int_rate           -3.466e-02  2.974e-02  -1.165 0.244464    
## term_months        -1.340e-02  4.614e-03  -2.905 0.003840 ** 
## grade_num           8.857e-02  9.098e-02   0.973 0.330801    
## annual_inc          2.683e-07  1.103e-06   0.243 0.807860    
## delinq_2yrs        -5.124e-02  1.002e-01  -0.511 0.609353    
## inq_last_6mths     -4.432e-02  3.348e-02  -1.324 0.186211    
## last_pymnt_amnt     1.609e-05  1.011e-05   1.593 0.111902    
## funded_amnt        -2.037e-04  2.036e-05 -10.007  < 2e-16 ***
## out_prncp          -1.161e-04  3.446e-05  -3.370 0.000812 ***
## total_rec_prncp     2.348e-04  1.139e-05  20.619  < 2e-16 ***
## total_rec_late_fee -8.101e-03  6.602e-03  -1.227 0.220428    
## total_rec_int       3.186e-05  2.976e-05   1.071 0.284800    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6831 on 484 degrees of freedom
## Multiple R-squared:  0.6861, Adjusted R-squared:  0.677 
## F-statistic: 75.55 on 14 and 484 DF,  p-value: < 2.2e-16

While the regression model explains 68.61% of the variation in loan status, a higher R-squared might be achieved with a more exhaustive set of predictors. The key findings are as follows: 1) Total principal received, funded amount, outstanding principal, term length and home ownership type are the statistically significant predictors (p < 0.05), with total principal received and funded amount the strongest. 2) Annual income and loan amount have the two highest p-values (0.81 and 0.67), showing they contribute least to the model. 3) The adjusted R-squared is 0.677, which is close to 0.6861, showing that the R-squared is not inflated by a lot of unnecessary predictors.
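
The relationship between R-squared and adjusted R-squared can be checked by hand. The sketch below uses the standard adjustment formula with the figures reported in the two `summary(fit)` outputs; `adj_r2` is a helper name introduced here, and n = 499 is inferred from the residual degrees of freedom (484 + 14 + 1 and 492 + 6 + 1).

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Loan-status model: R^2 = 0.6861, n = 499 rows, p = 14 predictors.
adj_r2(0.6861, n = 499, p = 14)

# Inquiries model: R^2 = 0.0758, n = 499 rows, p = 6 predictors.
adj_r2(0.0758, n = 499, p = 6)
```

Both results agree, up to rounding, with the adjusted values reported by R (0.677 and 0.06453). The penalty is small in both cases because the number of predictors is small relative to the 499 observations.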

The previous findings tell us that: 1) Higher interest rates and larger loan amounts are associated with a higher likelihood of charge-off or default. 2) People who charged off had the lowest average income. 3) The grades are a good enough predictor of a person’s ability to fully pay the loan. 4) People who fully paid had a much higher last payment amount.