Part 1 - Introduction
Lending Club (LC) is a peer-to-peer lending platform that allows borrowers to submit loan requests and lets investors fund those loans. Borrowers can ask for up to $40K, and according to LC they can be funded within days. Based on the borrower's information (income, credit history, debt, etc.), each loan is given a grade from A (best) to G (worst). Investors can use this grade to decide whether they would like to help fund the loan.
In this study we look at whether the average amount funded differs by grade. Just like bank loans, LC loans carry higher interest payments as the loan grade worsens. Knowing that, should a borrower with a good grade ask for more or less money? And if the amount funded does not differ by grade, should a borrower with a grade of G even worry about the grade? We will try to answer those questions here.
Part 2 - Data
Data Collection
LC posts all of its funded and rejected loan information on its site here: https://www.lendingclub.com/info/download-data.action For this study we will look at the funded loan data, which contains details about the borrower, including annual income, state, job, loan amount, etc.
Cases
We will take a sample of 1,000 loans from 2014.
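One way such a sample could be drawn is sketched below. The original does not show how the 1,000 rows were selected, so this is an assumption: `all_loans_2014` is a hypothetical stand-in filled with simulated values, and with the real download it would be replaced by the 2014 funded-loan file. Fixing the seed keeps the sample reproducible.

```r
# Hypothetical stand-in for the full 2014 funded-loan file (simulated values,
# not Lending Club data); only the sampling step is the point here.
set.seed(2014)
n_total <- 5000
all_loans_2014 <- data.frame(
  loan_amnt = sample(seq(1000, 35000, by = 25), n_total, replace = TRUE),
  grade     = sample(LETTERS[1:7], n_total, replace = TRUE),
  stringsAsFactors = FALSE
)

# Simple random sample of 1,000 loans, drawn without replacement
sample_rows <- sample(nrow(all_loans_2014), size = 1000)
loan_sample <- all_loans_2014[sample_rows, ]

nrow(loan_sample)
```

Sampling row indices rather than rows directly makes it easy to check afterwards that no loan was picked twice.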
Variables
We will focus on the fields grade and funded_amnt_inv, the latter being the amount that investors have put toward the loan.
Grade: The explanatory variable that is categorical
Funded Amount: The response variable that is numerical
Type of Study and Scope of Inference
This will be an observational study, so we will not be able to establish causal links between the variables. LC lends across the U.S., so our population of interest could be anyone looking to borrow or lend money in the United States.
library(ggplot2)
library(dplyr)
library(DT)
loangrades <- read.csv("loangrades.csv", header=TRUE, sep=",", stringsAsFactors=FALSE)
loangrades <- as.data.frame(loangrades)
datatable(head(loangrades,10))
Part 3 - Exploratory Data Analysis
Loan Grade Summary
summary(loangrades)
##    loan_amnt      funded_amnt    funded_amnt_inv    grade
##  Min.   : 1000   Min.   : 1000   Min.   :    0   Length:1000
##  1st Qu.: 6000   1st Qu.: 5900   1st Qu.: 5000   Class :character
##  Median : 9900   Median : 9600   Median : 8991   Mode  :character
##  Mean   :11188   Mean   :10900   Mean   :10327
##  3rd Qu.:15000   3rd Qu.:15000   3rd Qu.:14000
##  Max.   :35000   Max.   :35000   Max.   :35000
Looking at the summary of the loangrades table we see that the mean of the funded amount by investors (funded_amnt_inv) is $10.3K. That’s less than the mean of the total loan funded amount (funded_amnt) which is $10.9K and less than the average requested amount (loan_amnt) which is $11.2K.
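The gaps between those three means can be worked out directly. The short sketch below only re-uses the rounded means printed by summary(); the variable names are illustrative and not part of the original analysis.

```r
# Means copied from the summary() output (rounded to whole dollars)
mean_requested  <- 11188  # loan_amnt
mean_funded     <- 10900  # funded_amnt
mean_funded_inv <- 10327  # funded_amnt_inv

# Average shortfall of investor funding relative to each benchmark
gap_vs_requested <- mean_requested - mean_funded_inv
gap_vs_funded    <- mean_funded - mean_funded_inv

# Share of the average requested amount covered by investors
share_funded_inv <- mean_funded_inv / mean_requested

gap_vs_requested
gap_vs_funded
round(share_funded_inv, 3)
```

On average, investors funded about $861 less than was requested and about $573 less than the total funded amount, which works out to roughly 92% of the average request.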
# Group the loans by grade
loangrades_grp <- loangrades %>%
  group_by(grade)
# One row per grade: loan count, funded-amount summaries, and average shortfall
loangrades_grp <- summarise(
  loangrades_grp,
  loans = n(),
  mean_funded = mean(funded_amnt_inv),
  sd_funded = sd(funded_amnt_inv),
  max_funded = max(funded_amnt_inv),
  mean_funding_diff = mean(loan_amnt-funded_amnt_inv)
)
datatable(loangrades_grp)
When we group the loans by grade we see that the group with the most loans is A, while G has the fewest. When comparing the amount funded to the amount requested, G grade loans have the smallest difference.
ggplot(loangrades,aes(x=funded_amnt_inv)) + geom_histogram(aes(fill=grade),bins=5) + ggtitle("Funded Amount Histogram")
Our histogram shows us that most loans are funded around $10K.
Part 4 - Inference
Hypothesis
\[ H_0: \mu_A = \mu_B = \mu_C = \mu_D = \mu_E = \mu_F = \mu_G \]
H0: There is no difference in the average funded amounts between loan grades
HA: There is a difference in the average funded amounts between loan grades
Check conditions for ANOVA
ggplot(loangrades,aes(x=grade,y=funded_amnt_inv)) + geom_boxplot(aes(fill=grade)) + xlab("") + theme(axis.title.x=element_blank(),
  axis.text.x=element_blank(),
  axis.ticks.x=element_blank())
With the observations being independent of one another and the distributions within each grade being close to normal, we take the conditions for the ANOVA test to be met. ANOVA also assumes roughly equal variability across the groups, which the spread of the boxes gives a rough view of.
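The conditions can also be checked numerically rather than by eye. The sketch below is not part of the original analysis: it runs on simulated stand-in data (the data frame `toy` is hypothetical) and uses three base R tools, per-group standard deviations, Bartlett's test for equal variances, and the Shapiro-Wilk test on the residuals. With the real data, `toy` would be replaced by loangrades.

```r
# Simulated stand-in data (not Lending Club data): seven grades, 50 loans each
set.seed(1)
grades <- rep(LETTERS[1:7], each = 50)
toy <- data.frame(
  grade = grades,
  funded_amnt_inv = rnorm(length(grades), mean = 10000, sd = 6000)
)

# Spread within each grade: ANOVA assumes these are roughly similar
sd_by_grade <- tapply(toy$funded_amnt_inv, toy$grade, sd)
sd_ratio <- max(sd_by_grade) / min(sd_by_grade)

# Bartlett's test of equal variances across grades
# (note: this test is itself sensitive to non-normal data)
bt <- bartlett.test(funded_amnt_inv ~ grade, data = toy)

# Shapiro-Wilk normality test on the residuals of the one-way fit
fit <- aov(funded_amnt_inv ~ grade, data = toy)
sw <- shapiro.test(residuals(fit))

sd_by_grade
sd_ratio
bt$p.value
sw$p.value
```

A ratio of largest to smallest standard deviation far above 1, or small p values from either test, would suggest looking more closely before relying on the ANOVA result.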
ANOVA Test
anova <- aov(loangrades$funded_amnt_inv ~ loangrades$grade)
summary(anova)
##                   Df    Sum Sq   Mean Sq F value   Pr(>F)
## loangrades$grade   6 3.938e+09 656266473   14.48 6.88e-16 ***
## Residuals        993 4.500e+10  45319714
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With a p value this low (6.88e-16) we can reject H0 and conclude that the average funded amount differs between at least some of the loan grades.
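As a check on how the table fits together, the F value is the ratio of the two mean squares, and the p value is the upper tail of the F distribution with 6 and 993 degrees of freedom. The sketch below recomputes both from the numbers printed in the ANOVA table; the variable names are illustrative.

```r
# Values copied from the ANOVA table
ms_grade <- 656266473  # Mean Sq for grade
ms_resid <- 45319714   # Mean Sq for residuals
df_grade <- 6
df_resid <- 993

# F statistic: between-group mean square over within-group mean square
f_stat <- ms_grade / ms_resid

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df_grade, df_resid, lower.tail = FALSE)

round(f_stat, 2)
p_value
```

The ratio rounds to the 14.48 reported by summary(anova), and the tail probability is far below any conventional significance level.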
Tukey Honest Significant Difference
thsd <- TukeyHSD(anova, ordered=TRUE)
thsd
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
##     factor levels have been ordered
##
## Fit: aov(formula = loangrades$funded_amnt_inv ~ loangrades$grade)
##
## $`loangrades$grade`
##            diff         lwr       upr     p adj
## B-A  1885.05254   240.02845  3530.077 0.0130194
## C-A  1913.34057    14.88406  3811.797 0.0467810
## D-A  3527.17222  1433.26487  5621.080 0.0000157
## E-A  5875.03106  3129.12666  8620.935 0.0000000
## F-A  7186.80087  2682.25198 11691.350 0.0000569
## G-A 13899.04127  6285.61364 21512.469 0.0000018
## C-B    28.28802 -1816.06539  1872.641 1.0000000
## D-B  1642.11968  -402.86209  3687.101 0.2114425
## E-B  3989.97852  1281.19772  6698.759 0.0002992
## F-B  5301.74833   819.73281  9783.764 0.0089409
## G-B 12013.98872  4413.87145 19614.106 0.0000699
## D-C  1613.83165  -640.05016  3867.713 0.3442363
## E-C  3961.69049  1091.92983  6831.451 0.0009607
## F-C  5273.46030   692.35826  9854.562 0.0123568
## G-C 11985.70070  4326.73109 19644.670 0.0000871
## E-D  2347.85884  -654.77952  5350.497 0.2400224
## F-D  3659.62865 -1005.86241  8325.120 0.2365462
## G-D 10371.86905  2662.12675 18081.611 0.0014680
## F-E  1311.76981 -3680.45914  6303.999 0.9871724
## G-E  8024.01020   112.26771 15935.753 0.0443046
## G-F  6712.24040 -1968.00378 15392.485 0.2524488
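Tukey's Honest Significant Difference method compares the average funded amount for every pair of loan grades. The diff column is the difference in means between the two grades, lwr and upr give the 95% family-wise confidence interval for that difference, and p adj is the adjusted p value. Every grade from B through G has a significantly higher average funded amount than grade A, and grade G differs significantly from every grade except F. Neighboring grades are harder to tell apart: B and C, for example, have a p adj of 1.00.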
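To make the pairwise results easier to scan, the adjusted p values can be split at the 0.05 level. The sketch below copies the p adj column from the Tukey output into a small data frame; the names `tukey` and `alpha` are illustrative and not part of the original analysis.

```r
# Adjusted p values copied from the TukeyHSD output
tukey <- data.frame(
  pair = c("B-A", "C-A", "D-A", "E-A", "F-A", "G-A",
           "C-B", "D-B", "E-B", "F-B", "G-B",
           "D-C", "E-C", "F-C", "G-C",
           "E-D", "F-D", "G-D",
           "F-E", "G-E",
           "G-F"),
  p_adj = c(0.0130194, 0.0467810, 0.0000157, 0.0000000, 0.0000569, 0.0000018,
            1.0000000, 0.2114425, 0.0002992, 0.0089409, 0.0000699,
            0.3442363, 0.0009607, 0.0123568, 0.0000871,
            0.2400224, 0.2365462, 0.0014680,
            0.9871724, 0.0443046,
            0.2524488),
  stringsAsFactors = FALSE
)

alpha <- 0.05  # family-wise significance level used by TukeyHSD's 95% intervals

# Pairs whose difference in mean funded amount is / is not significant
significant     <- tukey$pair[tukey$p_adj < alpha]
not_significant <- tukey$pair[tukey$p_adj >= alpha]

length(significant)
not_significant
```

Of the 21 pairs, 14 differ significantly at the 5% level. The seven that do not (C-B, D-B, D-C, E-D, F-D, F-E, G-F) are mostly pairs of grades that sit next to each other.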
Part 5 - Conclusion
The results show that the grade of a Lending Club loan is associated with the amount investors fund: the average funded amount rises as the grade moves from A toward G. Because this is an observational study we cannot say that the grade itself causes the difference. One possible explanation is that lower grade loans yield a higher return, so investors may be more inclined to give more funds to them. However, investors must keep things like charge-off rates in mind when making these decisions.
Lending Club's data is very interesting and useful. With previous loans' performance available to investors, many more tests like the one done here can help investors know what to look for before helping borrowers. Tests looking at certain locations and income levels may help someone find the best area to invest in. Using statistics and probability in addition to data analysis, we can not only help others looking for funding but also make good investments ourselves.