Research question
Lending Club (LC) is a peer-to-peer (P2P) lending company that lets investors choose which borrowers they lend to. In this study we will examine whether the grade of a loan has any effect on how much funding it receives.
Cases
Each case represents a loan from LC's Q1 2017 data, with a grade from A (best) to G (worst). We will look at 1000 randomly selected loans.
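The random selection described above could be sketched as follows. This is only an illustration under stated assumptions: `all_loans` is a hypothetical, simulated stand-in for the full Q1 2017 file (its size and values are made up), and the seed is arbitrary. The study does not say how its sample was actually drawn.

```r
# Hypothetical stand-in for the full Q1 2017 loan file; the values below are
# simulated only so that the sketch runs on its own.
set.seed(2017)  # arbitrary seed so the same sample is drawn on every run
all_loans <- data.frame(
  loan_amnt = sample(seq(1000, 35000, by = 500), 5000, replace = TRUE),
  grade     = sample(LETTERS[1:7], 5000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Draw 1000 row indices without replacement, so every loan is equally likely
# to be chosen and no loan is chosen twice.
sampled_rows <- sample(nrow(all_loans), size = 1000)
loan_sample  <- all_loans[sampled_rows, ]

nrow(loan_sample)          # number of sampled loans
table(loan_sample$grade)   # how many sampled loans fall in each grade
```

Fixing the seed makes the sample reproducible, which matters if the summary statistics are to be regenerated later.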
Data collection
Since 2007, LC has posted its loan and rejected-application information online. The data cover funding, rates, merchant information, etc. LC posts loan data on its site at: https://www.lendingclub.com/info/download-data.action
Type of study
This will be an observational study.
Data Source
LC posts loan data on its site at: https://www.lendingclub.com/info/download-data.action
Response and Explanatory
The response variable is the average amount funded by investors (funded_amnt_inv), which is numerical. The explanatory variable is loan grade, which is categorical.
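With a numerical response and a categorical explanatory variable, one common way to compare group means is a one-way ANOVA. The sketch below is an assumption about how such a comparison might be run, not the method this study commits to. The data frame `toy` is simulated purely so the code runs, and its three grades and their values are made up.

```r
# Simulated stand-in data; these numbers are invented for illustration and
# are not Lending Club figures.
set.seed(1)
toy <- data.frame(
  grade = rep(c("A", "B", "C"), each = 30),
  funded_amnt_inv = c(rnorm(30, mean = 8000,  sd = 2000),
                      rnorm(30, mean = 10000, sd = 2000),
                      rnorm(30, mean = 12000, sd = 2000))
)

# Treat grade as a categorical variable (factor) rather than plain text
toy$grade <- factor(toy$grade)

# One-way ANOVA: does mean funded amount differ across grades?
fit <- aov(funded_amnt_inv ~ grade, data = toy)
summary(fit)
```

The formula `funded_amnt_inv ~ grade` puts the response on the left and the explanatory variable on the right, mirroring the roles described above.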
Relevant summary statistics
library(ggplot2)
library(dplyr)
Load File
Grades
Loan Amount: Amount Applied
Funded Amount: Amount Funded
Funded Amount Inv: Amount Funded by Investors
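# Read the 1000-loan sample; keep grade as character rather than factor
loangrades <- read.csv("loangrades.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
# read.csv already returns a data frame, so this conversion is a no-op safeguard
loangrades <- as.data.frame(loangrades)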
summary(loangrades)
##    loan_amnt      funded_amnt    funded_amnt_inv    grade
##  Min.   : 1000   Min.   : 1000   Min.   :    0   Length:1000
##  1st Qu.: 6000   1st Qu.: 5900   1st Qu.: 5000   Class :character
##  Median : 9900   Median : 9600   Median : 8991   Mode  :character
##  Mean   :11188   Mean   :10900   Mean   :10327
##  3rd Qu.:15000   3rd Qu.:15000   3rd Qu.:14000
##  Max.   :35000   Max.   :35000   Max.   :35000
Box Plot Funded Amount Inv by Loan Grade
ggplot(loangrades, aes(x = grade, y = funded_amnt_inv)) + geom_boxplot()
Summarize Data
By loan grade we’ll show the count of loans, average funded amount by investors, the standard deviation of funded amount by investors, the max funded amount by investors, and the average difference between amount applied for and the amount funded by investors.
# Group by grade, then compute per-grade summaries.
# Note: this overwrites loangrades with the 7-row summary table.
loangrades <- loangrades %>%
  group_by(grade) %>%
  summarise(
    n_loans = n(),                                   # count of loans in the grade
    mean_funded = mean(funded_amnt_inv),             # average amount funded by investors
    sd_funded = sd(funded_amnt_inv),                 # spread of investor-funded amounts
    max_funded = max(funded_amnt_inv),               # largest investor-funded amount
    funding_diff = mean(loan_amnt - funded_amnt_inv) # average shortfall vs. amount applied for
  )
loangrades
## # A tibble: 7 × 6
##   grade n_loans mean_funded sd_funded max_funded funding_diff
##   <chr>   <int>       <dbl>     <dbl>      <dbl>        <dbl>
## 1     A     272    8273.469  5006.255   34950.00    366.05314
## 2     B     316   10158.521  6647.293   33386.40    870.67158
## 3     C     184   10186.809  7626.370   35000.00   1183.43508
## 4     D     135   11800.641  6801.370   34975.00    926.58109
## 5     E      65   14148.500  9306.285   35000.00   1246.88464
## 6     F      21   15460.270  8608.370   31522.87   2945.68259
## 7     G       7   22172.510  8412.548   31813.20     84.63267
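As a quick consistency check on the table, the per-grade counts should add up to the 1000 sampled loans, and the count-weighted average of the per-grade means should reproduce the overall mean of funded_amnt_inv reported by summary() (10327). The sketch below hard-codes the counts and means from the table. The vector names are chosen here for illustration only.

```r
# Counts and per-grade means copied from the summary table
n_loans <- c(A = 272, B = 316, C = 184, D = 135, E = 65, F = 21, G = 7)
mean_funded <- c(A = 8273.469, B = 10158.521, C = 10186.809, D = 11800.641,
                 E = 14148.500, F = 15460.270, G = 22172.510)

# Total number of loans across all grades
total_loans <- sum(n_loans)

# Overall mean recovered as the count-weighted mean of the group means
overall_mean <- weighted.mean(mean_funded, w = n_loans)

total_loans
round(overall_mean)
```

The counts sum to 1000 and the weighted mean rounds to 10327, so the grouped summary agrees with the ungrouped one. Note that grades F and G contain only 21 and 7 loans, so their means rest on far fewer cases than those of grades A to D.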