1. Explanation

About Loan Data

This data set is about loan data from LendingClub.com that publish for comercial in Kaggle. LendingClub.com is platform that connect people who need money (borrower) and who have money (investor). As an investor, we need to manage our risk. That’s why our priority is to determine good criteria that have high probility of paying us back. Other that we can find another criteria for borrower that we avoid.

Data Column

Data set contain : credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.

purpose: The purpose of the loan (takes values “creditcard”, “debtconsolidation”, “educational”, “majorpurchase”, “smallbusiness”, and “all_other”).

int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.

installment: The monthly installments owed by the borrower if the loan is funded.

log.annual.inc: The natural log of the self-reported annual income of the borrower.

dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).

fico: The FICO credit score of the borrower.

days.with.cr.line: The number of days the borrower has had a credit line.

revol.bal: The borrower’s revolving balance (amount unpaid at the end of the credit card billing cycle).

revol.util: The borrower’s revolving line utilization rate (the amount of the credit line used relative to total credit available).

inq.last.6mths: The borrower’s number of inquiries by creditors in the last 6 months.

delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.

pub.rec: The borrower’s number of derogatory public records (bankruptcy filings, tax liens, or judgments).

not.fully.paid : The borrower can’t paid their loan.

2. Open and Checking Data

loan <- read.csv("loan_data.csv")
head(loan)
anyNA(loan)
## [1] FALSE

not find missing value before cleansing

Cleansing Data

Checking the data type’s

str(loan)
## 'data.frame':    9578 obs. of  14 variables:
##  $ credit.policy    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ purpose          : chr  "debt_consolidation" "credit_card" "debt_consolidation" "debt_consolidation" ...
##  $ int.rate         : num  0.119 0.107 0.136 0.101 0.143 ...
##  $ installment      : num  829 228 367 162 103 ...
##  $ log.annual.inc   : num  11.4 11.1 10.4 11.4 11.3 ...
##  $ dti              : num  19.5 14.3 11.6 8.1 15 ...
##  $ fico             : int  737 707 682 712 667 727 667 722 682 707 ...
##  $ days.with.cr.line: num  5640 2760 4710 2700 4066 ...
##  $ revol.bal        : int  28854 33623 3511 33667 4740 50807 3839 24220 69909 5630 ...
##  $ revol.util       : num  52.1 76.7 25.6 73.2 39.5 51 76.8 68.6 51.1 23 ...
##  $ inq.last.6mths   : int  0 0 1 1 0 0 0 0 1 1 ...
##  $ delinq.2yrs      : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ pub.rec          : int  0 0 0 0 0 0 1 0 0 0 ...
##  $ not.fully.paid   : int  0 0 0 0 0 0 1 1 0 0 ...

Loan data contain 9578 row

Change

credit.policy, pupose, inq.last.6mths, delinq.2yrs, pub.rec, and not.fully.paid as factor

loan[,c("credit.policy", "purpose", "inq.last.6mths", "delinq.2yrs", "pub.rec", "not.fully.paid")] <- lapply(loan[,c("credit.policy", "purpose", "inq.last.6mths", "delinq.2yrs", "pub.rec", "not.fully.paid")], as.factor)
str(loan)
## 'data.frame':    9578 obs. of  14 variables:
##  $ credit.policy    : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
##  $ purpose          : Factor w/ 7 levels "all_other","credit_card",..: 3 2 3 3 2 2 3 1 5 3 ...
##  $ int.rate         : num  0.119 0.107 0.136 0.101 0.143 ...
##  $ installment      : num  829 228 367 162 103 ...
##  $ log.annual.inc   : num  11.4 11.1 10.4 11.4 11.3 ...
##  $ dti              : num  19.5 14.3 11.6 8.1 15 ...
##  $ fico             : int  737 707 682 712 667 727 667 722 682 707 ...
##  $ days.with.cr.line: num  5640 2760 4710 2700 4066 ...
##  $ revol.bal        : int  28854 33623 3511 33667 4740 50807 3839 24220 69909 5630 ...
##  $ revol.util       : num  52.1 76.7 25.6 73.2 39.5 51 76.8 68.6 51.1 23 ...
##  $ inq.last.6mths   : Factor w/ 28 levels "0","1","2","3",..: 1 1 2 2 1 1 1 1 2 2 ...
##  $ delinq.2yrs      : Factor w/ 11 levels "0","1","2","3",..: 1 1 1 1 2 1 1 1 1 1 ...
##  $ pub.rec          : Factor w/ 6 levels "0","1","2","3",..: 1 1 1 1 1 1 2 1 1 1 ...
##  $ not.fully.paid   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 1 1 ...
anyNA(loan)
## [1] FALSE

Not find missing value after cleansing

head(loan)

3. Data Explanation

We can read the summary of loan data

summary(loan)
##  credit.policy               purpose        int.rate       installment    
##  0:1868        all_other         :2331   Min.   :0.0600   Min.   : 15.67  
##  1:7710        credit_card       :1262   1st Qu.:0.1039   1st Qu.:163.77  
##                debt_consolidation:3957   Median :0.1221   Median :268.95  
##                educational       : 343   Mean   :0.1226   Mean   :319.09  
##                home_improvement  : 629   3rd Qu.:0.1407   3rd Qu.:432.76  
##                major_purchase    : 437   Max.   :0.2164   Max.   :940.14  
##                small_business    : 619                                    
##  log.annual.inc        dti              fico       days.with.cr.line
##  Min.   : 7.548   Min.   : 0.000   Min.   :612.0   Min.   :  179    
##  1st Qu.:10.558   1st Qu.: 7.213   1st Qu.:682.0   1st Qu.: 2820    
##  Median :10.929   Median :12.665   Median :707.0   Median : 4140    
##  Mean   :10.932   Mean   :12.607   Mean   :710.8   Mean   : 4561    
##  3rd Qu.:11.291   3rd Qu.:17.950   3rd Qu.:737.0   3rd Qu.: 5730    
##  Max.   :14.528   Max.   :29.960   Max.   :827.0   Max.   :17640    
##                                                                     
##    revol.bal         revol.util    inq.last.6mths  delinq.2yrs   pub.rec 
##  Min.   :      0   Min.   :  0.0   0      :3637   0      :8458   0:9019  
##  1st Qu.:   3187   1st Qu.: 22.6   1      :2462   1      : 832   1: 533  
##  Median :   8596   Median : 46.3   2      :1384   2      : 192   2:  19  
##  Mean   :  16914   Mean   : 46.8   3      : 864   3      :  65   3:   5  
##  3rd Qu.:  18250   3rd Qu.: 70.9   4      : 475   4      :  19   4:   1  
##  Max.   :1207359   Max.   :119.0   5      : 278   5      :   6   5:   1  
##                                    (Other): 478   (Other):   6           
##  not.fully.paid
##  0:8045        
##  1:1533        
##                
##                
##                
##                
## 

Summary :

1.From 9578 borrower, 7710 borrower had approval for criteria policy from LendingClub.com

2.The most purpose of loan from borrower is debt consolidation

3.The intrest rate is between 0.06 (the lower risk) and 0.2164 (the highest risk)

4.Lowest installment is 15.67 and the highest installment is 940.14

5.Debt devide by annual income mean is 12.607 point

6.Fico score mean is 710.8

7.The borrower days line’s to pay their loan is ussually between 2820 - 5730 days (Q1-Q3)

8.The borrower’s remaining payment ussually between 3187-18250

9.Some of borrower use more than available total credit amount more than 100 %

10.Many borrower never late paying installment

11.Many borrower have a good record (dont have public derogatory records)

12.From 9578 borrower, 8045 fully paid their loan

4. Data Manipulation

From this summary we want to determine our criteria for borrower.

1.We want use borrower that have credit policy from LendingClub.com.

2.Our borrower must fully piad their last loan

3.Never late paying their installment (before 31 days)

4.Never have a derogatory public records (bankruptcy filings, tax liens, or judgments).

loan_good <- loan[loan$credit.policy == 1 & loan$not.fully.paid == 0 & loan$delinq.2yrs == 0 & loan$pub.rec %in% 0,]
head(loan_good)
dim(loan_good)
## [1] 5695   14
table(loan_good$purpose)
## 
##          all_other        credit_card debt_consolidation        educational 
##               1357                796               2359                181 
##   home_improvement     major_purchase     small_business 
##                381                294                327

Back to the column interest rate of the loan says Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates. Question : How much the median of interest rate of the loan? (we want to know it because at that point we have many borrower (50%) type with low risk)

quantile(loan_good$int.rate)
##     0%    25%    50%    75%   100% 
## 0.0600 0.0932 0.1166 0.1322 0.2121

Half of all borrower have good credit policy is in under mean (50%) int.rate = 0.1166. So we still have many option and have a low risk.

boxplot(loan_good$fico)

loan_g_r <- loan_good[loan_good$int.rate <= 0.1166,]
dim(loan_g_r)
## [1] 2863   14

some source says that a good fico score is more than 740. Question : 1. How much the range of Fico point? 2. That We have a good Fico point on borrower group that good policy criteri from LendingClub.com and have low interest rate of the loan?

quantile(loan_g_r$fico)
##   0%  25%  50%  75% 100% 
##  662  722  747  767  827

Answer : 1. Range for this group is 662-827. half of them have a good fico score (>740) 2. A half of this group have a good Fico point (more than 740) and the median is 747

To make sure this answer we want to see how the borrower finished their loan or installment on their days credit line.

plot(loan_g_r$revol.bal, loan_g_r$days.with.cr.line)

At this group we know that many borrower have finished to pay their loan before their days credit line done.

boxplot(loan_g_r$installment)

quantile(loan_g_r$installment)
##     0%    25%    50%    75%   100% 
##  15.69 155.38 231.08 347.72 826.31
Max_box = 347.72 + ((347.72 - 155.38)*1.5)
Max_box
## [1] 636.23

As an Investor, we want to get high return from our invest. So we will see group of borrower that have high installments (more the max boxplot)

loan_g_i <- loan_g_r[loan_g_r$installment >= Max_box,]
head(loan_g_i)
plot(loan_g_i$revol.bal, loan_g_i$days.with.cr.line)

From this plot we can know that with high installment, most of them can paid fully their loan (revol.bal point at 0). This borrower group have a good financial.

Who is the most purpose at that group and how much they have the unpaid loan?

xtabs(revol.bal ~ purpose, loan_g_i)
## purpose
##          all_other        credit_card debt_consolidation        educational 
##             351590             686406            1084834                  0 
##   home_improvement     major_purchase     small_business 
##             327173              84936             146660
table(loan_g_i$purpose)
## 
##          all_other        credit_card debt_consolidation        educational 
##                 25                 19                 57                  0 
##   home_improvement     major_purchase     small_business 
##                 27                  6                 12

The highest total unpaid loan and purpose is from debt consolidation.

aggregate(fico ~ purpose, loan_g_i, FUN = mean)

5. Summary

From loan data set from LendingClub.com, most of the borrower in this website have a good record. We know the good record from approval credit policy by LendingClub.com and dont have deragotary record. At this point we want small group with a good criteria to be borrower to help investor to decide their invest in this website. After we determine variable from data set, we can get some criteria for good borrower.

  • Fico score minimal at 662, better if the borrower have 747 fico score (median of good borrorwer’s group)
  • Interest rate of loan’s score less than 0.1166 still have many option and we can use that score for investor standard
  • At that dico socre and interest rate of loan’s score standard we still have many option in debt consolidation, home improvment, and credit card purpose.