Preface

Credit Card Transactions

Introduction

Literature Review

Exploratory Data Analysis

Variable Description
loan_amnt The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
funded_amnt The total amount committed to that loan at that point in time.
disbursement_method The method by which the borrower receives their loan. Possible values are: CASH, DIRECT_PAY.
term The number of payments on the loan. Values are in months and can be either 36 or 60.
int_rate Interest rate on the loan.
installment The monthly payment owed by the borrower if the loan originates.
grade Lending Club assigned loan grade.
emp_title The job title supplied by the borrower when applying for the loan.
emp_length Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
home_ownership The home ownership status provided by the borrower during registration or obtained from the credit report. Possible values are: RENT, OWN, MORTGAGE, OTHER.
annual_inc The self-reported annual income provided by the borrower during registration.
annual_inc_joint The self-reported joint annual income provided by the borrower during registration.
loan_status Current status of the loan: fully-paid, current, charged-off
pymnt_plan Indicates if a payment plan has been put in place for the loan
purpose A category provided by the borrower for the loan request.
title The loan title provided by the borrower
addr_state The state provided by the borrower in the loan application
dti The ratio of the borrower’s total monthly debt payments, excluding mortgage and the requested LC loan, to the borrower’s self-reported monthly income.
dti_joint Joint debt to income ratio.
delinq_2yrs The number of 30+ days past-due incidences of delinquency in the borrower’s credit file for the past 2 years.
delinq_amnt The past-due amount owed for the accounts on which the borrower is now delinquent.
fico_range_low The lower boundary range the borrower’s FICO at loan origination belongs to.
fico_range_high The upper boundary range the borrower’s FICO at loan origination belongs to.
inq_last_6mths The number of inquiries in the past 6 months (excluding auto and mortgage inquiries).
open_acc The number of open credit lines in the borrower’s credit file.
pub_rec Number of derogatory public records.
revol_bal Total credit revolving balance.
revol_util Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
total_acc The total number of credit lines currently in the borrower’s credit file.
total_rev_hi_lim Total revolving high credit/credit limit.
total_rec_late_fee Late fees received to date.
collections_12_mths_ex_med Number of collections in 12 months excluding medical collections.
application_type Indicates whether the loan is an individual application or a joint application with two co-borrowers.
max_bal_bc Maximum current balance owed on all revolving accounts.
inq_fi Number of personal finance inquiries.
avg_cur_bal Average current balance of all accounts.
tax_liens Number of tax liens.
hardship_flag Flags whether or not the borrower is on a hardship plan.
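
The dti definition in the table can be made concrete with a small worked example. This is only a sketch: the helper `compute_dti` and the borrower figures are hypothetical, and the ratio is assumed to be expressed as a percentage, which is consistent with the scale of the dti values in the data summary.

```r
# Hypothetical helper: debt-to-income ratio expressed as a percentage.
# monthly_debt is assumed to exclude mortgage and the requested loan.
compute_dti <- function(monthly_debt, annual_income) {
  monthly_income <- annual_income / 12
  # Guard against zero income, which would otherwise produce Inf or NaN.
  ifelse(monthly_income > 0, 100 * monthly_debt / monthly_income, NA_real_)
}

# Illustrative borrower: 975 in monthly debt payments, 65000 annual income.
dti_example <- compute_dti(monthly_debt = 975, annual_income = 65000)
print(round(dti_example, 2))
```

Returning NA for zero income is a design choice for this sketch; annual_inc has a minimum of 0 in the data, so some guard of this kind is needed before dividing.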
summary(accepted_loans)
##    loan_amnt      funded_amnt        term              int_rate    
##  Min.   :  500   Min.   :  500   Length:2260701     Min.   : 5.31  
##  1st Qu.: 8000   1st Qu.: 8000   Class :character   1st Qu.: 9.49  
##  Median :12900   Median :12875   Mode  :character   Median :12.62  
##  Mean   :15047   Mean   :15042                      Mean   :13.09  
##  3rd Qu.:20000   3rd Qu.:20000                      3rd Qu.:15.99  
##  Max.   :40000   Max.   :40000                      Max.   :30.99  
##  NA's   :33      NA's   :33                         NA's   :33     
##   installment         grade            emp_title          emp_length       
##  Min.   :   4.93   Length:2260701     Length:2260701     Length:2260701    
##  1st Qu.: 251.65   Class :character   Class :character   Class :character  
##  Median : 377.99   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 445.81                                                           
##  3rd Qu.: 593.32                                                           
##  Max.   :1719.83                                                           
##  NA's   :33                                                                
##  home_ownership       annual_inc        annual_inc_joint  loan_status       
##  Length:2260701     Min.   :        0   Min.   :   5694   Length:2260701    
##  Class :character   1st Qu.:    46000   1st Qu.:  83400   Class :character  
##  Mode  :character   Median :    65000   Median : 110000   Mode  :character  
##                     Mean   :    77992   Mean   : 123625                     
##                     3rd Qu.:    93000   3rd Qu.: 147995                     
##                     Max.   :110000000   Max.   :7874821                     
##                     NA's   :37          NA's   :2139991                     
##   pymnt_plan          purpose             title            addr_state       
##  Length:2260701     Length:2260701     Length:2260701     Length:2260701    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##       dti           dti_joint        delinq_2yrs       delinq_amnt       
##  Min.   : -1.00   Min.   : 0.0      Min.   : 0.0000   Min.   :     0.00  
##  1st Qu.: 11.89   1st Qu.:13.5      1st Qu.: 0.0000   1st Qu.:     0.00  
##  Median : 17.84   Median :18.8      Median : 0.0000   Median :     0.00  
##  Mean   : 18.82   Mean   :19.3      Mean   : 0.3069   Mean   :    12.37  
##  3rd Qu.: 24.49   3rd Qu.:24.6      3rd Qu.: 0.0000   3rd Qu.:     0.00  
##  Max.   :999.00   Max.   :69.5      Max.   :58.0000   Max.   :249925.00  
##  NA's   :1744     NA's   :2139995   NA's   :62        NA's   :62         
##  fico_range_low  fico_range_high inq_last_6mths       open_acc     
##  Min.   :610.0   Min.   :614.0   Min.   : 0.0000   Min.   :  0.00  
##  1st Qu.:675.0   1st Qu.:679.0   1st Qu.: 0.0000   1st Qu.:  8.00  
##  Median :690.0   Median :694.0   Median : 0.0000   Median : 11.00  
##  Mean   :698.6   Mean   :702.6   Mean   : 0.5768   Mean   : 11.61  
##  3rd Qu.:715.0   3rd Qu.:719.0   3rd Qu.: 1.0000   3rd Qu.: 14.00  
##  Max.   :845.0   Max.   :850.0   Max.   :33.0000   Max.   :101.00  
##  NA's   :33      NA's   :33      NA's   :63        NA's   :62      
##     pub_rec          revol_bal         revol_util       total_acc     
##  Min.   : 0.0000   Min.   :      0   Min.   :  0.00   Min.   :  1.00  
##  1st Qu.: 0.0000   1st Qu.:   5950   1st Qu.: 31.50   1st Qu.: 15.00  
##  Median : 0.0000   Median :  11324   Median : 50.30   Median : 22.00  
##  Mean   : 0.1975   Mean   :  16658   Mean   : 50.34   Mean   : 24.16  
##  3rd Qu.: 0.0000   3rd Qu.:  20246   3rd Qu.: 69.40   3rd Qu.: 31.00  
##  Max.   :86.0000   Max.   :2904836   Max.   :892.30   Max.   :176.00  
##  NA's   :62        NA's   :33        NA's   :1835     NA's   :62      
##  total_rev_hi_lim  total_rec_late_fee collections_12_mths_ex_med
##  Min.   :      0   Min.   :   0.000   Min.   : 0.00000          
##  1st Qu.:  14700   1st Qu.:   0.000   1st Qu.: 0.00000          
##  Median :  25400   Median :   0.000   Median : 0.00000          
##  Mean   :  34574   Mean   :   1.518   Mean   : 0.01815          
##  3rd Qu.:  43200   3rd Qu.:   0.000   3rd Qu.: 0.00000          
##  Max.   :9999999   Max.   :1484.340   Max.   :20.00000          
##  NA's   :70309     NA's   :33         NA's   :178               
##  application_type     max_bal_bc          inq_fi        avg_cur_bal    
##  Length:2260701     Min.   :      0   Min.   : 0       Min.   :     0  
##  Class :character   1st Qu.:   2284   1st Qu.: 0       1st Qu.:  3080  
##  Mode  :character   Median :   4413   Median : 1       Median :  7335  
##                     Mean   :   5806   Mean   : 1       Mean   : 13548  
##                     3rd Qu.:   7598   3rd Qu.: 1       3rd Qu.: 18783  
##                     Max.   :1170668   Max.   :48       Max.   :958084  
##                     NA's   :866162    NA's   :866162   NA's   :70379   
##    tax_liens        hardship_flag      disbursement_method
##  Min.   : 0.00000   Length:2260701     Length:2260701     
##  1st Qu.: 0.00000   Class :character   Class :character   
##  Median : 0.00000   Mode  :character   Mode  :character   
##  Mean   : 0.04677                                         
##  3rd Qu.: 0.00000                                         
##  Max.   :85.00000                                         
##  NA's   :138
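
The summary reports character columns only by length and class, and it shows dti values of -1 and 999. A minimal cleaning sketch on a toy data frame follows. The data frame `loans_toy` is hypothetical, the grade levels A to G are assumed, and treating -1 and 999 as placeholder codes is an assumption rather than something the data dictionary states.

```r
# Toy stand-in for accepted_loans; all values are illustrative only.
loans_toy <- data.frame(
  loan_amnt = c(5000, 12000, NA, 20000),
  grade     = c("B", "A", "C", "B"),
  dti       = c(17.8, -1, 24.5, 999),
  stringsAsFactors = FALSE
)

# Count missing values per column, mirroring the NA's rows of summary().
na_counts <- colSums(is.na(loans_toy))

# Store grade as an ordered factor so summary() tabulates its levels
# instead of reporting only Length/Class/Mode. Levels A..G are assumed.
loans_toy$grade <- factor(loans_toy$grade, levels = LETTERS[1:7], ordered = TRUE)

# Assumption: dti values of -1 and 999 are placeholders, not real ratios.
loans_toy$dti[loans_toy$dti %in% c(-1, 999)] <- NA

print(na_counts)
print(summary(loans_toy))
```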
# Number of accepted loans in each grade
ggplot(accepted_loans, aes(grade, fill = grade)) +
    geom_bar(color = "white", size = 0.25)

Methodology

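# Regress the requested loan amount on the funded amount and annual income.
# The formula goes inside lm(); stargazer() needs the fitted model object.
factors_affecting_approved_balance_1 <- lm(loan_amnt ~ funded_amnt + annual_inc, data = accepted_loans)
stargazer(factors_affecting_approved_balance_1, type = "text")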
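
As a hedged illustration of this regression, the sketch below fits the same formula on simulated data with base R only. The data frame `toy` and its values are hypothetical. It also shows the difference between a bare formula and a fitted model, since a table function such as stargazer() expects the latter.

```r
set.seed(1)
n <- 50

# Simulated stand-in for accepted_loans; values are illustrative only.
toy <- data.frame(
  funded_amnt = runif(n, 1000, 40000),
  annual_inc  = runif(n, 20000, 150000)
)
# Requested amount tracks the funded amount closely, plus noise.
toy$loan_amnt <- toy$funded_amnt + rnorm(n, sd = 500)

# A formula on its own is just a description of the model.
spec <- loan_amnt ~ funded_amnt + annual_inc
print(class(spec))

# Passing the formula to lm() returns the fitted model object.
fit <- lm(spec, data = toy)
print(class(fit))
print(coef(summary(fit)))
```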
# Data Exploration

# 1. Geographical distribution of loans:
#    number of loans funded in each state
library(data.table)  # data.table(), setnames()
a <- data.table(table(accepted_loans$addr_state))
setnames(a, c("region", "count"))
# Convert state abbreviations to the lowercase full names used by map_data();
# abbreviations absent from state.abb become NA and drop out of the merge.
a$region <- tolower(state.name[match(a$region, state.abb)])
all_states <- map_data("state")  # requires the maps package
total <- merge(all_states, a, by = "region")
ggplot(total, aes(x = long, y = lat, map_id = region)) +
  geom_map(aes(fill = count), map = all_states) +
  labs(title = "Loan counts by state", x = "", y = "") +
  scale_fill_gradientn("", colours = terrain.colors(10), guide = "legend") +
  theme_bw()
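
The abbreviation lookup used for the map can be checked in isolation. The sketch below uses only the built-in `state.abb` and `state.name` vectors, and the input vector `abbr` is hypothetical. It shows that an abbreviation outside the 50 states, such as DC, maps to NA and would need separate handling if it occurs in addr_state.

```r
# Hypothetical sample of state abbreviations, including one non-state code.
abbr <- c("CA", "NY", "TX", "DC")

# match() returns the position in state.abb, or NA when there is no match.
region <- tolower(state.name[match(abbr, state.abb)])
print(region)

# List the abbreviations that could not be converted.
unmatched <- abbr[is.na(region)]
print(unmatched)
```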

# Share of accepted loans in each grade, most frequent grade first
accepted_loans %>%
  ggplot(aes(x = forcats::fct_infreq(grade), fill = grade)) +
  geom_bar(show.legend = FALSE) +
  geom_text(stat = "count",
            aes(label = paste0(round(after_stat(prop * 100), digits = 1), "%"), group = 1),
            vjust = -0.4,
            size = 4) +
  labs(x = "Grade",
       y = "Count",
       title = "Distribution of Applicants Across Grades") +
  theme(
    plot.title = element_text(size = 20),
    axis.text.x = element_text(size = 16),
    axis.text.y = element_text(size = 16))
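
The percentage labels on this chart are each grade's share of all loans. The same numbers can be tabulated directly, which gives a quick check on the plot. The sketch uses a hypothetical `grade` vector and base R only.

```r
# Hypothetical grades for ten loans.
grade <- c("B", "C", "B", "A", "C", "B", "D", "A", "B", "C")

# Count loans per grade and order from most to least frequent,
# matching the bar order produced by fct_infreq().
counts <- sort(table(grade), decreasing = TRUE)

# Convert counts to percentages of the total.
pct <- round(100 * prop.table(counts), 1)
print(pct)
```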

# Composition of loan purposes within each grade
accepted_loans %>%
  ggplot(aes(x = grade, fill = purpose)) +
  # position = "fill" rescales every bar to 1, so bars show proportions
  geom_bar(position = "fill") +
  # labs() names aesthetics, so x stays "Grade" even after coord_flip()
  labs(x = "Grade",
       y = "Proportion",
       title = "Loan Purposes by Grade") +
  coord_flip() +
  theme(
    plot.title = element_text(size = 20),
    axis.text.x = element_text(size = 16, angle = 45, hjust = 1),
    axis.text.y = element_text(size = 16))
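
With position set to fill, each bar is rescaled so that the purposes within a grade sum to 1. A cross-tabulation with row proportions gives the same quantities as numbers. The data frame `toy` below is hypothetical.

```r
# Hypothetical loans with a grade and a purpose each.
toy <- data.frame(
  grade   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  purpose = c("car", "car", "car", "house", "car", "house", "house", "house")
)

# margin = 1 normalises within each row, i.e. within each grade.
shares <- prop.table(table(toy$grade, toy$purpose), margin = 1)
print(shares)
```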

# Density of interest rates by home ownership status

accepted_loans %>%
  ggplot(aes(x = int_rate, color = home_ownership)) +
  geom_density(adjust = 2) +  # adjust = 2 doubles the default bandwidth
  theme_wsj()                 # from the ggthemes package
## Warning: Removed 33 rows containing non-finite values (stat_density).
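
The warning reflects the rows whose int_rate is missing, which are dropped before the density is estimated. The sketch below reproduces both steps with base R's `density()` on a hypothetical `int_rate` vector: it counts the non-finite values and then estimates the density with the bandwidth doubled.

```r
# Hypothetical interest rates with one missing value.
int_rate <- c(5.31, 9.49, 12.62, NA, 15.99, 30.99, 13.09)

# Number of values that a density estimate has to discard.
n_missing <- sum(!is.finite(int_rate))
print(n_missing)

# adjust = 2 multiplies the default bandwidth by 2, giving a smoother curve;
# na.rm = TRUE drops the missing value instead of raising an error.
d <- density(int_rate, adjust = 2, na.rm = TRUE)
print(d$bw)
```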