Data 606 Project Proposal

David Quarshie

October 28, 2017

Research question

Lending Club (LC) is a peer-to-peer (P2P) lending company that allows investors to choose which borrowers they lend to. In this study we will examine whether the grade of a loan has any effect on how much of it has been funded.

Cases

Each case represents a loan from LC's Q1 2017 data, with a grade from A (best) to G (worst). We will look at 1,000 randomly selected loans.
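The sampling step could be sketched as below. This is a minimal illustration rather than the exact procedure used for this proposal: the helper name `sample_loans`, the seed value, and the file name in the usage comment are all hypothetical.

```r
library(dplyr)

# Hypothetical helper: draw a reproducible simple random sample of loans,
# keeping only the four columns used in this proposal.
sample_loans <- function(loans, n = 1000, seed = 606) {
  set.seed(seed)  # fixed seed (arbitrary value) so the same sample is drawn each run
  loans %>%
    select(loan_amnt, funded_amnt, funded_amnt_inv, grade) %>%
    slice_sample(n = n)
}

# Usage (file name is an assumption, not the actual LC download name):
# full_q1 <- read.csv("lc_2017_q1.csv", stringsAsFactors = FALSE)
# write.csv(sample_loans(full_q1), "loangrades.csv", row.names = FALSE)
```

Fixing the seed makes the sample reproducible, so the summary statistics can be regenerated from the same 1,000 loans.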

Data collection

Since 2007 LC has posted its loan and rejected application information online. The data covers funding, rates, merchant information, etc.

Type of study

This will be an observational study.

Data Source

LC posts loan data on its site at: https://www.lendingclub.com/info/download-data.action

Response and Explanatory

The amount funded by investors (funded_amnt_inv) is the response variable, which is numerical; we will compare its average across grades. Loan grade is the explanatory variable, which is categorical.

Relevant summary statistics

library(ggplot2)
library(dplyr)

Load File

Grades
Loan Amount: Amount Applied
Funded Amount: Amount Funded
Funded Amount Inv: Amount Funded by Investors

# Read the 1,000-loan sample; read.csv() already returns a data frame
loangrades <- read.csv("loangrades.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)

summary(loangrades)
##    loan_amnt      funded_amnt    funded_amnt_inv    grade          
##  Min.   : 1000   Min.   : 1000   Min.   :    0   Length:1000       
##  1st Qu.: 6000   1st Qu.: 5900   1st Qu.: 5000   Class :character  
##  Median : 9900   Median : 9600   Median : 8991   Mode  :character  
##  Mean   :11188   Mean   :10900   Mean   :10327                     
##  3rd Qu.:15000   3rd Qu.:15000   3rd Qu.:14000                     
##  Max.   :35000   Max.   :35000   Max.   :35000

Box Plot Funded Amount Inv by Loan Grade

ggplot(loangrades,aes(x=grade,y=funded_amnt_inv)) + geom_boxplot()

Summarize Data

For each loan grade we show the number of loans, the mean, standard deviation, and maximum of the amount funded by investors, and the average difference between the amount applied for and the amount funded by investors.

# Summarise into a new object so the loan-level data in loangrades is kept
grade_summary <- loangrades %>%
  group_by(grade) %>%
  summarise(
    loan_count = n(),                                 # number of loans in the grade
    mean_funded = mean(funded_amnt_inv),
    sd_funded = sd(funded_amnt_inv),
    max_funded = max(funded_amnt_inv),
    funding_diff = mean(loan_amnt - funded_amnt_inv)  # average gap between applied and funded
  )

grade_summary
## # A tibble: 7 × 6
##   grade loan_count mean_funded sd_funded max_funded funding_diff
##   <chr>      <int>       <dbl>     <dbl>      <dbl>        <dbl>
## 1     A        272    8273.469  5006.255   34950.00    366.05314
## 2     B        316   10158.521  6647.293   33386.40    870.67158
## 3     C        184   10186.809  7626.370   35000.00   1183.43508
## 4     D        135   11800.641  6801.370   34975.00    926.58109
## 5     E         65   14148.500  9306.285   35000.00   1246.88464
## 6     F         21   15460.270  8608.370   31522.87   2945.68259
## 7     G          7   22172.510  8412.548   31813.20     84.63267
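One way the grade means above could be compared formally is a one-way ANOVA of funded_amnt_inv on grade. The sketch below is an assumption about a possible next step, not a method stated in this proposal; the helper name `grade_anova` is hypothetical, and the usage comment re-reads loangrades.csv so that it works on the loan-level data.

```r
# Hypothetical helper: one-way ANOVA of investor-funded amount on loan grade.
# Expects loan-level data with columns funded_amnt_inv (numeric) and grade.
grade_anova <- function(loans) {
  # grade is stored as character, so convert it to a factor for the model
  aov(funded_amnt_inv ~ factor(grade), data = loans)
}

# Usage:
# loans <- read.csv("loangrades.csv", stringsAsFactors = FALSE)
# summary(grade_anova(loans))
```

With only 7 loans in grade G and 21 in grade F, and standard deviations that differ across grades, the ANOVA conditions would need to be checked before relying on the result.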