This data set is about loan data from LendingClub.com that publish for comercial in Kaggle. LendingClub.com is platform that connect people who need money (borrower) and who have money (investor). As an investor, we need to manage our risk. That’s why our priority is to determine good criteria that have high probility of paying us back. Other that we can find another criteria for borrower that we avoid.
Data set contain : credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.
purpose: The purpose of the loan (takes values “creditcard”, “debtconsolidation”, “educational”, “majorpurchase”, “smallbusiness”, and “all_other”).
int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.
installment: The monthly installments owed by the borrower if the loan is funded.
log.annual.inc: The natural log of the self-reported annual income of the borrower.
dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).
fico: The FICO credit score of the borrower.
days.with.cr.line: The number of days the borrower has had a credit line.
revol.bal: The borrower’s revolving balance (amount unpaid at the end of the credit card billing cycle).
revol.util: The borrower’s revolving line utilization rate (the amount of the credit line used relative to total credit available).
inq.last.6mths: The borrower’s number of inquiries by creditors in the last 6 months.
delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
pub.rec: The borrower’s number of derogatory public records (bankruptcy filings, tax liens, or judgments).
not.fully.paid : The borrower can’t paid their loan.
loan <- read.csv("loan_data.csv")
head(loan)anyNA(loan)## [1] FALSE
not find missing value before cleansing
Checking the data type’s
str(loan)## 'data.frame': 9578 obs. of 14 variables:
## $ credit.policy : int 1 1 1 1 1 1 1 1 1 1 ...
## $ purpose : chr "debt_consolidation" "credit_card" "debt_consolidation" "debt_consolidation" ...
## $ int.rate : num 0.119 0.107 0.136 0.101 0.143 ...
## $ installment : num 829 228 367 162 103 ...
## $ log.annual.inc : num 11.4 11.1 10.4 11.4 11.3 ...
## $ dti : num 19.5 14.3 11.6 8.1 15 ...
## $ fico : int 737 707 682 712 667 727 667 722 682 707 ...
## $ days.with.cr.line: num 5640 2760 4710 2700 4066 ...
## $ revol.bal : int 28854 33623 3511 33667 4740 50807 3839 24220 69909 5630 ...
## $ revol.util : num 52.1 76.7 25.6 73.2 39.5 51 76.8 68.6 51.1 23 ...
## $ inq.last.6mths : int 0 0 1 1 0 0 0 0 1 1 ...
## $ delinq.2yrs : int 0 0 0 0 1 0 0 0 0 0 ...
## $ pub.rec : int 0 0 0 0 0 0 1 0 0 0 ...
## $ not.fully.paid : int 0 0 0 0 0 0 1 1 0 0 ...
Loan data contain 9578 row
credit.policy, pupose, inq.last.6mths, delinq.2yrs, pub.rec, and not.fully.paid as factor
loan[,c("credit.policy", "purpose", "inq.last.6mths", "delinq.2yrs", "pub.rec", "not.fully.paid")] <- lapply(loan[,c("credit.policy", "purpose", "inq.last.6mths", "delinq.2yrs", "pub.rec", "not.fully.paid")], as.factor)
str(loan)## 'data.frame': 9578 obs. of 14 variables:
## $ credit.policy : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ purpose : Factor w/ 7 levels "all_other","credit_card",..: 3 2 3 3 2 2 3 1 5 3 ...
## $ int.rate : num 0.119 0.107 0.136 0.101 0.143 ...
## $ installment : num 829 228 367 162 103 ...
## $ log.annual.inc : num 11.4 11.1 10.4 11.4 11.3 ...
## $ dti : num 19.5 14.3 11.6 8.1 15 ...
## $ fico : int 737 707 682 712 667 727 667 722 682 707 ...
## $ days.with.cr.line: num 5640 2760 4710 2700 4066 ...
## $ revol.bal : int 28854 33623 3511 33667 4740 50807 3839 24220 69909 5630 ...
## $ revol.util : num 52.1 76.7 25.6 73.2 39.5 51 76.8 68.6 51.1 23 ...
## $ inq.last.6mths : Factor w/ 28 levels "0","1","2","3",..: 1 1 2 2 1 1 1 1 2 2 ...
## $ delinq.2yrs : Factor w/ 11 levels "0","1","2","3",..: 1 1 1 1 2 1 1 1 1 1 ...
## $ pub.rec : Factor w/ 6 levels "0","1","2","3",..: 1 1 1 1 1 1 2 1 1 1 ...
## $ not.fully.paid : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 1 1 ...
anyNA(loan)## [1] FALSE
Not find missing value after cleansing
head(loan)We can read the summary of loan data
summary(loan)## credit.policy purpose int.rate installment
## 0:1868 all_other :2331 Min. :0.0600 Min. : 15.67
## 1:7710 credit_card :1262 1st Qu.:0.1039 1st Qu.:163.77
## debt_consolidation:3957 Median :0.1221 Median :268.95
## educational : 343 Mean :0.1226 Mean :319.09
## home_improvement : 629 3rd Qu.:0.1407 3rd Qu.:432.76
## major_purchase : 437 Max. :0.2164 Max. :940.14
## small_business : 619
## log.annual.inc dti fico days.with.cr.line
## Min. : 7.548 Min. : 0.000 Min. :612.0 Min. : 179
## 1st Qu.:10.558 1st Qu.: 7.213 1st Qu.:682.0 1st Qu.: 2820
## Median :10.929 Median :12.665 Median :707.0 Median : 4140
## Mean :10.932 Mean :12.607 Mean :710.8 Mean : 4561
## 3rd Qu.:11.291 3rd Qu.:17.950 3rd Qu.:737.0 3rd Qu.: 5730
## Max. :14.528 Max. :29.960 Max. :827.0 Max. :17640
##
## revol.bal revol.util inq.last.6mths delinq.2yrs pub.rec
## Min. : 0 Min. : 0.0 0 :3637 0 :8458 0:9019
## 1st Qu.: 3187 1st Qu.: 22.6 1 :2462 1 : 832 1: 533
## Median : 8596 Median : 46.3 2 :1384 2 : 192 2: 19
## Mean : 16914 Mean : 46.8 3 : 864 3 : 65 3: 5
## 3rd Qu.: 18250 3rd Qu.: 70.9 4 : 475 4 : 19 4: 1
## Max. :1207359 Max. :119.0 5 : 278 5 : 6 5: 1
## (Other): 478 (Other): 6
## not.fully.paid
## 0:8045
## 1:1533
##
##
##
##
##
Summary :
1.From 9578 borrower, 7710 borrower had approval for criteria policy from LendingClub.com
2.The most purpose of loan from borrower is debt consolidation
3.The intrest rate is between 0.06 (the lower risk) and 0.2164 (the highest risk)
4.Lowest installment is 15.67 and the highest installment is 940.14
5.Debt devide by annual income mean is 12.607 point
6.Fico score mean is 710.8
7.The borrower days line’s to pay their loan is ussually between 2820 - 5730 days (Q1-Q3)
8.The borrower’s remaining payment ussually between 3187-18250
9.Some of borrower use more than available total credit amount more than 100 %
10.Many borrower never late paying installment
11.Many borrower have a good record (dont have public derogatory records)
12.From 9578 borrower, 8045 fully paid their loan
From this summary we want to determine our criteria for borrower.
1.We want use borrower that have credit policy from LendingClub.com.
2.Our borrower must fully piad their last loan
3.Never late paying their installment (before 31 days)
4.Never have a derogatory public records (bankruptcy filings, tax liens, or judgments).
loan_good <- loan[loan$credit.policy == 1 & loan$not.fully.paid == 0 & loan$delinq.2yrs == 0 & loan$pub.rec %in% 0,]
head(loan_good)dim(loan_good)## [1] 5695 14
table(loan_good$purpose)##
## all_other credit_card debt_consolidation educational
## 1357 796 2359 181
## home_improvement major_purchase small_business
## 381 294 327
Back to the column interest rate of the loan says Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates. Question : How much the median of interest rate of the loan? (we want to know it because at that point we have many borrower (50%) type with low risk)
quantile(loan_good$int.rate)## 0% 25% 50% 75% 100%
## 0.0600 0.0932 0.1166 0.1322 0.2121
Half of all borrower have good credit policy is in under mean (50%) int.rate = 0.1166. So we still have many option and have a low risk.
boxplot(loan_good$fico)loan_g_r <- loan_good[loan_good$int.rate <= 0.1166,]
dim(loan_g_r)## [1] 2863 14
some source says that a good fico score is more than 740. Question : 1. How much the range of Fico point? 2. That We have a good Fico point on borrower group that good policy criteri from LendingClub.com and have low interest rate of the loan?
quantile(loan_g_r$fico)## 0% 25% 50% 75% 100%
## 662 722 747 767 827
Answer : 1. Range for this group is 662-827. half of them have a good fico score (>740) 2. A half of this group have a good Fico point (more than 740) and the median is 747
To make sure this answer we want to see how the borrower finished their loan or installment on their days credit line.
plot(loan_g_r$revol.bal, loan_g_r$days.with.cr.line)At this group we know that many borrower have finished to pay their loan before their days credit line done.
boxplot(loan_g_r$installment)quantile(loan_g_r$installment)## 0% 25% 50% 75% 100%
## 15.69 155.38 231.08 347.72 826.31
Max_box = 347.72 + ((347.72 - 155.38)*1.5)
Max_box## [1] 636.23
As an Investor, we want to get high return from our invest. So we will see group of borrower that have high installments (more the max boxplot)
loan_g_i <- loan_g_r[loan_g_r$installment >= Max_box,]
head(loan_g_i)plot(loan_g_i$revol.bal, loan_g_i$days.with.cr.line)From this plot we can know that with high installment, most of them can paid fully their loan (revol.bal point at 0). This borrower group have a good financial.
Who is the most purpose at that group and how much they have the unpaid loan?
xtabs(revol.bal ~ purpose, loan_g_i)## purpose
## all_other credit_card debt_consolidation educational
## 351590 686406 1084834 0
## home_improvement major_purchase small_business
## 327173 84936 146660
table(loan_g_i$purpose)##
## all_other credit_card debt_consolidation educational
## 25 19 57 0
## home_improvement major_purchase small_business
## 27 6 12
The highest total unpaid loan and purpose is from debt consolidation.
aggregate(fico ~ purpose, loan_g_i, FUN = mean)From loan data set from LendingClub.com, most of the borrower in this website have a good record. We know the good record from approval credit policy by LendingClub.com and dont have deragotary record. At this point we want small group with a good criteria to be borrower to help investor to decide their invest in this website. After we determine variable from data set, we can get some criteria for good borrower.