Analysis report of Lending Club loans

Introduction

This report identifies and quantifies the associations between the interest rate and the other variables listed below. Because the FICO score is the industry standard, the report also attempts to identify the importance of the other variables after accounting for the applicant's FICO score. The data cover 2,500 loans issued through the Lending Club.

Methods & Analysis

First, the data are reviewed for missing values. Two observations have missing data. Both have Loan purpose 'other' and are missing Open credit lines, Revolving credit balance, and Inquiries in the last 6 months; one of them is also missing Home ownership and Monthly income. Because these observations are a very small share of the data, it is safe to remove them.

# Remove rows that contain NA values (there are 2 such rows)
data <- loansData[complete.cases(loansData), ]
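
The missing-value review described above can be sketched as follows. This is a minimal illustration on a made-up data frame (`toy` and its values are hypothetical, not the Lending Club data); it counts incomplete rows, shows which columns are affected, and then drops those rows.

```r
# Hypothetical toy data frame standing in for loansData
toy <- data.frame(
  Monthly.Income    = c(5000, NA, 4200, 6100),
  Open.CREDIT.Lines = c(10, NA, 7, NA),
  Loan.Purpose      = c("car", "other", "house", "other"),
  stringsAsFactors  = FALSE
)

# Number of rows with at least one missing value
n_missing_rows <- sum(!complete.cases(toy))

# Missing values per column, to see which variables are affected
missing_per_column <- colSums(is.na(toy))

# Keep only the fully observed rows
toy_clean <- toy[complete.cases(toy), ]
```

Checking the per-column counts before dropping rows makes it easier to judge whether the removal is harmless, as it is argued to be here.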

Second, the columns stored as percentages (interest rate and debt-to-income ratio) are converted to numeric values, and each FICO range is replaced by its mean, so that they can be used in the linear regression analysis later.

# Convert percentage columns into numeric class.
# Columns are referenced through data$ rather than attach(), so later
# code never picks up a stale, unconverted copy.
data$Interest.Rate <- as.numeric(sub("%", "", as.character(data$Interest.Rate)))
data$Debt.To.Income.Ratio <- as.numeric(sub("%", "", as.character(data$Debt.To.Income.Ratio)))

# Calculate the mean of each FICO range (e.g. "735-739" becomes 737)
data$FICO <- sapply(strsplit(as.character(data$FICO.Range), "-"), function(bounds) {
    mean(as.numeric(bounds))
})
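
To make the effect of the conversion concrete, here is a small sketch on a few made-up values. The interest rate strings are hypothetical and assume the "number followed by %" format; the FICO ranges use the same "low-high" format that appears in the model output later in this report.

```r
# Hypothetical raw values in the assumed source formats
interest_raw <- factor(c("8.90%", "12.12%", "21.98%"))
fico_raw     <- factor(c("735-739", "715-719", "690-694"))

# Strip the percent sign, then convert to numeric
interest_num <- as.numeric(sub("%", "", as.character(interest_raw), fixed = TRUE))

# Split each range on "-" and average the two bounds
fico_mid <- vapply(
  strsplit(as.character(fico_raw), "-", fixed = TRUE),
  function(bounds) mean(as.numeric(bounds)),
  numeric(1)
)

# A failed conversion would show up as NA, so this is a cheap sanity check
conversion_ok <- !anyNA(interest_num) && !anyNA(fico_mid)
```

Going through `as.character()` first matters when the column is a factor: calling `as.numeric()` directly on a factor returns the internal level codes, not the printed values.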

We then fit a regression model of interest rate on all other variables to get an overview of the associations.

# Overview of the associations in the data
lmAll <- lm(Interest.Rate ~ Amount.Requested + Amount.Funded.By.Investors + 
    as.factor(Loan.Length) + as.factor(Loan.Purpose) + Debt.To.Income.Ratio + 
    as.factor(State) + as.factor(Home.Ownership) + Monthly.Income + as.factor(FICO.Range) + 
    Open.CREDIT.Lines + Revolving.CREDIT.Balance + Inquiries.in.the.Last.6.Months + 
    as.factor(Employment.Length), data = data)

Based on the p-values in the summary of this model, the interest rate is strongly associated with amount requested, amount funded, monthly income, FICO range, open credit lines, and inquiries in the last 6 months.
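
One way to read the p-values off a fitted model programmatically is sketched below. It uses synthetic data, and the names `x1`, `x2`, and `y` are hypothetical; the same calls would apply to `lmAll`, whose summary is not reproduced in this report.

```r
set.seed(1)

# Synthetic data: x1 drives y, x2 is pure noise (hypothetical names)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 3 * x1 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Coefficient table: estimate, standard error, t value, p-value
coef_table <- coef(summary(fit))
p_values   <- coef_table[, "Pr(>|t|)"]

# Terms significant at the 5% level, smallest p-value first
significant <- sort(p_values[p_values < 0.05])
print(significant)
```

With many factor levels in the model (states, loan purposes, FICO ranges), sorting the p-values this way is easier than scanning the full printed summary.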

Let's look at the association between interest rate and individual variables.

Amount requested

lmRequest <- lm(Interest.Rate ~ Amount.Requested, data = data)
lmRequest$coefficients
##      (Intercept) Amount.Requested 
##        1.086e+01        1.777e-04

[Figure: plot for Amount requested (chunk unnamed-chunk-7)]

The histograms below show that Amount requested and Amount funded by investors are highly similar, so we consider only Amount requested.

[Figure: histograms of Amount requested and Amount funded by investors]

Monthly income

Three outliers with monthly income greater than $30,000 are excluded from this model.

lmIncome <- lm(Interest.Rate[Monthly.Income < 30000] ~ Monthly.Income[Monthly.Income < 
    30000], data = data)
lmIncome$coefficients
##                            (Intercept) 
##                              1.278e+01 
## Monthly.Income[Monthly.Income < 30000] 
##                              5.269e-05
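
The exclusion above indexes the variables inside the formula, which is why the coefficient label is so long. As a sketch of an alternative, `lm()` also accepts a `subset` argument that gives the same fit with a plain label. The data below are synthetic and the values are made up.

```r
set.seed(2)

# Synthetic stand-in for the loan data, with two high-income outliers
toy <- data.frame(Monthly.Income = c(runif(50, 1000, 15000), 40000, 65000))
toy$Interest.Rate <- 13 + 5e-05 * toy$Monthly.Income + rnorm(nrow(toy))

# Indexing inside the formula, as in the report
fit_index <- lm(Interest.Rate[Monthly.Income < 30000] ~
                  Monthly.Income[Monthly.Income < 30000], data = toy)

# Same fit using the subset argument; the coefficient name stays readable
fit_subset <- lm(Interest.Rate ~ Monthly.Income, data = toy,
                 subset = Monthly.Income < 30000)

# The estimates agree; only the coefficient labels differ
same_fit <- isTRUE(all.equal(unname(coef(fit_index)), unname(coef(fit_subset))))
```

A model fitted with `subset` can also be passed to `predict()` with new data, because its term is named `Monthly.Income` rather than an indexing expression.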

[Figure: plot for Monthly income (chunk unnamed-chunk-10)]

Open credit lines

Four outliers with more than 30 open credit lines are excluded from this model.

lmCrLines <- lm(Interest.Rate[Open.CREDIT.Lines < 30] ~ Open.CREDIT.Lines[Open.CREDIT.Lines < 
    30], data = data)
lmCrLines$coefficients
##                               (Intercept) 
##                                  12.29041 
## Open.CREDIT.Lines[Open.CREDIT.Lines < 30] 
##                                   0.07691

[Figure: plot for Open credit lines (chunk unnamed-chunk-12)]

Inquiries in the last 6 months

lmCrInquiries <- lm(Interest.Rate ~ as.factor(Inquiries.in.the.Last.6.Months), 
    data = data)
lmCrInquiries$coefficients
##                                (Intercept) 
##                                    12.2402 
## as.factor(Inquiries.in.the.Last.6.Months)1 
##                                     1.4599 
## as.factor(Inquiries.in.the.Last.6.Months)2 
##                                     1.8004 
## as.factor(Inquiries.in.the.Last.6.Months)3 
##                                     2.4334 
## as.factor(Inquiries.in.the.Last.6.Months)4 
##                                     0.5320 
## as.factor(Inquiries.in.the.Last.6.Months)5 
##                                     0.8712 
## as.factor(Inquiries.in.the.Last.6.Months)6 
##                                     4.0185 
## as.factor(Inquiries.in.the.Last.6.Months)7 
##                                     0.5669 
## as.factor(Inquiries.in.the.Last.6.Months)8 
##                                     4.1448 
## as.factor(Inquiries.in.the.Last.6.Months)9 
##                                     3.0418

FICO range mean

Let's plot Interest rate against FICO range mean to visualize their association.

par(mfrow=c(1,2))
lmFICO <- lm(Interest.Rate ~ FICO, data = data)
plot(data$FICO, data$Interest.Rate, pch=20, col='#00ff0050', xlab='FICO range mean', ylab='Interest rate (%)')
abline(lmFICO, lwd=3, col='blue')
title("Plot between Interest rate \nand FICO range mean", cex.main=1)
plot(lmFICO$residuals, pch=20, col='#00ff0050', ylab='Residuals')
abline(c(0,0), lwd=2, col='blue')
title("Residuals of linear model", cex.main=1)

[Figure: Interest rate against FICO range mean with the fitted line (left) and residuals of the linear model (right)]

lmFICO$coefficients
## (Intercept)        FICO 
##    73.00811    -0.08467
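
To see what these coefficients mean in practice, the short sketch below plugs them into the fitted line. The intercept and slope are the values printed above; the scores 700 and 750 are hypothetical examples. The fitted rate comes out at about 13.7% for a score of 700 and about 9.5% for a score of 750, roughly 4.2 percentage points lower.

```r
# Coefficients as printed in the report for Interest.Rate ~ FICO
intercept <- 73.00811
slope     <- -0.08467

# Fitted interest rate (%) at a given FICO range mean
fitted_rate <- function(fico) intercept + slope * fico

# Hypothetical example scores
rate_700 <- fitted_rate(700)
rate_750 <- fitted_rate(750)

# Change in the fitted rate for a score that is 50 points higher
rate_change <- rate_750 - rate_700
```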

Comparing the slope of the regression line for FICO range mean with those for the other variables suggests that the FICO score has a strong influence on the interest rate. However, if we look at each range individually, relative to the reference range absorbed in the intercept, the association is strongly significant only for FICO scores of 715 and above.

lmFICO.Range <- lm(Interest.Rate ~ FICO.Range, data = data)
summary(lmFICO.Range)
## 
## Call:
## lm(formula = Interest.Rate ~ FICO.Range, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.147 -2.037 -0.493  1.668 10.403 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        15.2120     1.2696   11.98  < 2e-16 ***
## FICO.Range645-649  -0.3287     2.0732   -0.16  0.87405    
## FICO.Range650-654  -0.0820     3.1098   -0.03  0.97897    
## FICO.Range655-659  -0.2820     1.9044   -0.15  0.88229    
## FICO.Range660-664   3.2805     1.2947    2.53  0.01135 *  
## FICO.Range665-669   2.2361     1.2913    1.73  0.08346 .  
## FICO.Range670-674   1.0365     1.2880    0.80  0.42106    
## FICO.Range675-679   0.6427     1.2886    0.50  0.61798    
## FICO.Range680-684  -0.0853     1.2896   -0.07  0.94726    
## FICO.Range685-689  -0.5252     1.2925   -0.41  0.68453    
## FICO.Range690-694  -0.4799     1.2920   -0.37  0.71034    
## FICO.Range695-699  -1.0651     1.2902   -0.83  0.40912    
## FICO.Range700-704  -1.8551     1.2936   -1.43  0.15167    
## FICO.Range705-709  -2.5521     1.2930   -1.97  0.04852 *  
## FICO.Range710-714  -2.7789     1.2976   -2.14  0.03233 *  
## FICO.Range715-719  -4.0301     1.3033   -3.09  0.00201 ** 
## FICO.Range720-724  -4.1762     1.2971   -3.22  0.00130 ** 
## FICO.Range725-729  -4.5590     1.3029   -3.50  0.00048 ***
## FICO.Range730-734  -5.2558     1.3029   -4.03  5.7e-05 ***
## FICO.Range735-739  -5.5885     1.3175   -4.24  2.3e-05 ***
## FICO.Range740-744  -5.6197     1.3281   -4.23  2.4e-05 ***
## FICO.Range745-749  -5.3103     1.3271   -4.00  6.5e-05 ***
## FICO.Range750-754  -6.7443     1.3206   -5.11  3.5e-07 ***
## FICO.Range755-759  -6.2159     1.3368   -4.65  3.5e-06 ***
## FICO.Range760-764  -6.5844     1.3368   -4.93  9.0e-07 ***
## FICO.Range765-769  -7.4287     1.3549   -5.48  4.6e-08 ***
## FICO.Range770-774  -8.4591     1.4443   -5.86  5.3e-09 ***
## FICO.Range775-779  -6.4679     1.4065   -4.60  4.5e-06 ***
## FICO.Range780-784  -7.6227     1.3783   -5.53  3.5e-08 ***
## FICO.Range785-789  -6.7141     1.4269   -4.71  2.7e-06 ***
## FICO.Range790-794  -7.6515     1.4194   -5.39  7.7e-08 ***
## FICO.Range795-799  -6.8205     1.4939   -4.57  5.2e-06 ***
## FICO.Range800-804  -7.5562     1.5111   -5.00  6.1e-07 ***
## FICO.Range805-809  -7.7856     1.5312   -5.08  4.0e-07 ***
## FICO.Range810-814  -6.9183     1.6184   -4.27  2.0e-05 ***
## FICO.Range815-819  -8.2870     1.7190   -4.82  1.5e-06 ***
## FICO.Range820-824  -7.3120     3.1098   -2.35  0.01879 *  
## FICO.Range830-834  -7.5920     3.1098   -2.44  0.01470 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 2.84 on 2460 degrees of freedom
## Multiple R-squared: 0.545,   Adjusted R-squared: 0.538 
## F-statistic: 79.7 on 37 and 2460 DF,  p-value: <2e-16

I believe the other variables also affect the FICO score. However, simple linear models of FICO range, the other variables, and their interactions do not show this clearly, so they are not included in this report. Further exploratory analysis should show these effects more clearly.
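
One possible starting point for that further analysis is to compare nested models: FICO alone, FICO plus another variable, and the two with their interaction. The sketch below is only an illustration on synthetic data (all values are made up, and `toy` is a hypothetical stand-in for the loan data); it is not a result from this report.

```r
set.seed(3)

# Synthetic stand-in for the loan data (made-up values)
n <- 300
toy <- data.frame(
  FICO             = runif(n, 640, 830),
  Amount.Requested = runif(n, 1000, 35000)
)
toy$Interest.Rate <- 70 - 0.08 * toy$FICO +
  1.5e-04 * toy$Amount.Requested + rnorm(n)

# Nested models: FICO only, plus amount requested, plus their interaction
fit_fico  <- lm(Interest.Rate ~ FICO, data = toy)
fit_both  <- lm(Interest.Rate ~ FICO + Amount.Requested, data = toy)
fit_inter <- lm(Interest.Rate ~ FICO * Amount.Requested, data = toy)

# F-tests for whether each added term improves on the previous model
comparison <- anova(fit_fico, fit_both, fit_inter)
print(comparison)
```

Each row of the comparison after the first tests whether the added terms explain variation beyond the model before it, which is one way to ask what a variable contributes after FICO has been taken into account.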

Conclusions

The linear model of Interest rate on FICO range mean shows the most significant association. Other variables that affect the Interest rate are amount requested, monthly income, open credit lines, and inquiries in the last 6 months. Further exploratory analysis is needed to reveal how the other variables relate to the FICO score.