1 Analysis of Lending Data

1.1 Data Input and Properties

  1. Import Data
library(readr)
loan <- read_csv("loan.csv")
loan
  1. Checking the dimensions of the data
dim(loan)
## [1] 2260668     145
  1. Checking the structure of the data
str(loan)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 2260668 obs. of  145 variables:
##  $ id                                        : logi  NA NA NA NA NA NA ...
##  $ member_id                                 : logi  NA NA NA NA NA NA ...
##  $ loan_amnt                                 : num  2500 30000 5000 4000 30000 5550 2000 6000 5000 6000 ...
##  $ funded_amnt                               : num  2500 30000 5000 4000 30000 5550 2000 6000 5000 6000 ...
##  $ funded_amnt_inv                           : num  2500 30000 5000 4000 30000 5550 2000 6000 5000 6000 ...
##  $ term                                      : chr  "36 months" "60 months" "36 months" "36 months" ...
##  $ int_rate                                  : num  13.6 18.9 18 18.9 16.1 ...
##  $ installment                               : num  84.9 777.2 180.7 146.5 731.8 ...
##  $ grade                                     : chr  "C" "D" "D" "D" ...
##  $ sub_grade                                 : chr  "C1" "D2" "D1" "D2" ...
##  $ emp_title                                 : chr  "Chef" "Postmaster" "Administrative" "IT Supervisor" ...
##  $ emp_length                                : chr  "10+ years" "10+ years" "6 years" "10+ years" ...
##  $ home_ownership                            : chr  "RENT" "MORTGAGE" "MORTGAGE" "MORTGAGE" ...
##  $ annual_inc                                : num  55000 90000 59280 92000 57250 ...
##  $ verification_status                       : chr  "Not Verified" "Source Verified" "Source Verified" "Source Verified" ...
##  $ issue_d                                   : chr  "Dec-2018" "Dec-2018" "Dec-2018" "Dec-2018" ...
##  $ loan_status                               : chr  "Current" "Current" "Current" "Current" ...
##  $ pymnt_plan                                : chr  "n" "n" "n" "n" ...
##  $ url                                       : logi  NA NA NA NA NA NA ...
##  $ desc                                      : logi  NA NA NA NA NA NA ...
##  $ purpose                                   : chr  "debt_consolidation" "debt_consolidation" "debt_consolidation" "debt_consolidation" ...
##  $ title                                     : chr  "Debt consolidation" "Debt consolidation" "Debt consolidation" "Debt consolidation" ...
##  $ zip_code                                  : chr  "109xx" "713xx" "490xx" "985xx" ...
##  $ addr_state                                : chr  "NY" "LA" "MI" "WA" ...
##  $ dti                                       : num  18.2 26.5 10.5 16.7 26.4 ...
##  $ delinq_2yrs                               : num  0 0 0 0 0 0 0 0 0 1 ...
##  $ earliest_cr_line                          : chr  "Apr-2001" "Jun-1987" "Apr-2011" "Feb-2006" ...
##  $ inq_last_6mths                            : num  1 0 0 0 0 3 1 0 1 1 ...
##  $ mths_since_last_delinq                    : num  NA 71 NA NA NA NA NA NA 32 17 ...
##  $ mths_since_last_record                    : num  45 75 NA NA NA NA NA NA NA NA ...
##  $ open_acc                                  : num  9 13 8 10 12 18 1 19 8 38 ...
##  $ pub_rec                                   : num  1 1 0 0 0 0 0 0 0 0 ...
##  $ revol_bal                                 : num  4341 12315 4599 5468 829 ...
##  $ revol_util                                : num  10.3 24.2 19.1 78.1 3.6 48.1 NA 69.3 35.2 49.8 ...
##  $ total_acc                                 : num  34 44 13 13 26 44 9 37 38 58 ...
##  $ initial_list_status                       : chr  "w" "w" "w" "w" ...
##  $ out_prncp                                 : num  2386 29388 4787 3832 29339 ...
##  $ out_prncp_inv                             : num  2386 29388 4787 3832 29339 ...
##  $ total_pymnt                               : num  167 1507 354 287 1423 ...
##  $ total_pymnt_inv                           : num  167 1507 354 287 1423 ...
##  $ total_rec_prncp                           : num  114 612 213 168 661 ...
##  $ total_rec_int                             : num  53 895 141 119 762 ...
##  $ total_rec_late_fee                        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ recoveries                                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ collection_recovery_fee                   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ last_pymnt_d                              : chr  "Feb-2019" "Feb-2019" "Feb-2019" "Feb-2019" ...
##  $ last_pymnt_amnt                           : num  84.9 777.2 180.7 146.5 731.8 ...
##  $ next_pymnt_d                              : chr  "Mar-2019" "Mar-2019" "Mar-2019" "Mar-2019" ...
##  $ last_credit_pull_d                        : chr  "Feb-2019" "Feb-2019" "Feb-2019" "Feb-2019" ...
##  $ collections_12_mths_ex_med                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mths_since_last_major_derog               : num  NA NA NA NA NA NA NA NA 45 NA ...
##  $ policy_code                               : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ application_type                          : chr  "Individual" "Individual" "Individual" "Individual" ...
##  $ annual_inc_joint                          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ dti_joint                                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ verification_status_joint                 : chr  NA NA NA NA ...
##  $ acc_now_delinq                            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ tot_coll_amt                              : num  0 1208 0 686 0 ...
##  $ tot_cur_bal                               : num  16901 321915 110299 305049 116007 ...
##  $ open_acc_6m                               : num  2 4 0 1 3 1 0 0 5 1 ...
##  $ open_act_il                               : num  2 4 1 5 5 7 0 5 2 4 ...
##  $ open_il_12m                               : num  1 2 0 3 3 2 2 0 5 1 ...
##  $ open_il_24m                               : num  2 3 2 5 5 3 3 1 5 3 ...
##  $ mths_since_rcnt_il                        : num  2 3 14 5 4 4 7 23 3 7 ...
##  $ total_bal_il                              : num  12560 87153 7150 30683 28845 ...
##  $ il_util                                   : num  69 88 72 68 89 72 NA 87 98 45 ...
##  $ open_rv_12m                               : num  2 4 0 0 2 1 0 0 1 1 ...
##  $ open_rv_24m                               : num  7 5 2 0 4 4 1 2 6 12 ...
##  $ max_bal_bc                                : num  2137 998 0 3761 516 ...
##  $ all_util                                  : num  28 57 35 70 54 58 100 74 73 48 ...
##  $ total_rev_hi_lim                          : num  42000 50800 24100 7000 23100 ...
##  $ inq_fi                                    : num  1 2 1 2 1 2 0 1 2 2 ...
##  $ total_cu_tl                               : num  11 15 5 4 0 4 0 2 1 2 ...
##  $ inq_last_12m                              : num  2 2 0 3 0 6 1 0 4 2 ...
##  $ acc_open_past_24mths                      : num  9 10 4 5 9 8 4 3 12 15 ...
##  $ avg_cur_bal                               : num  1878 24763 18383 30505 9667 ...
##  $ bc_open_to_buy                            : num  34360 13761 13800 1239 8471 ...
##  $ bc_util                                   : num  5.9 8.3 0 75.2 8.9 64 NA 90.8 35.9 60.6 ...
##  $ chargeoff_within_12_mths                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ delinq_amnt                               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mo_sin_old_il_acct                        : num  140 163 87 62 53 195 169 169 145 166 ...
##  $ mo_sin_old_rev_tl_op                      : num  212 378 92 154 216 176 40 253 244 200 ...
##  $ mo_sin_rcnt_rev_tl_op                     : num  1 4 15 64 2 10 23 13 6 4 ...
##  $ mo_sin_rcnt_tl                            : num  1 3 14 5 2 4 7 13 3 4 ...
##  $ mort_acc                                  : num  0 3 2 3 2 6 0 1 3 1 ...
##  $ mths_since_recent_bc                      : num  1 4 77 64 2 20 NA 14 6 4 ...
##  $ mths_since_recent_bc_dlq                  : num  NA NA NA NA NA NA NA NA 33 NA ...
##  $ mths_since_recent_inq                     : num  2 4 14 5 13 3 1 13 2 4 ...
##  $ mths_since_recent_revol_delinq            : num  NA NA NA NA NA NA NA NA 32 17 ...
##  $ num_accts_ever_120_pd                     : num  0 0 0 0 0 0 0 0 2 0 ...
##  $ num_actv_bc_tl                            : num  2 2 0 1 2 4 0 7 4 16 ...
##  $ num_actv_rev_tl                           : num  5 4 3 2 2 6 0 12 5 20 ...
##  $ num_bc_sats                               : num  3 4 3 1 3 6 0 8 5 19 ...
##  $ num_bc_tl                                 : num  3 9 3 2 8 10 3 10 10 26 ...
##  $ num_il_tl                                 : num  16 27 4 7 9 23 5 15 20 9 ...
##  $ num_op_rev_tl                             : num  7 8 6 2 6 9 0 14 6 33 ...
##  $ num_rev_accts                             : num  18 14 7 3 15 15 3 20 15 48 ...
##  $ num_rev_tl_bal_gt_0                       : num  5 4 3 2 2 7 0 12 5 20 ...
##  $ num_sats                                  : num  9 13 8 10 12 18 1 19 8 38 ...
##   [list output truncated]
##  - attr(*, "problems")=Classes 'tbl_df', 'tbl' and 'data.frame': 462349 obs. of  5 variables:
##   ..$ row     : int  92797 92797 92797 92797 92797 92797 95386 95386 95386 95386 ...
##   ..$ col     : chr  "debt_settlement_flag_date" "settlement_status" "settlement_date" "settlement_amount" ...
##   ..$ expected: chr  "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" ...
##   ..$ actual  : chr  "Feb-2019" "ACTIVE" "Feb-2019" "5443" ...
##   ..$ file    : chr  "'loan.csv'" "'loan.csv'" "'loan.csv'" "'loan.csv'" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   id = col_logical(),
##   ..   member_id = col_logical(),
##   ..   loan_amnt = col_double(),
##   ..   funded_amnt = col_double(),
##   ..   funded_amnt_inv = col_double(),
##   ..   term = col_character(),
##   ..   int_rate = col_double(),
##   ..   installment = col_double(),
##   ..   grade = col_character(),
##   ..   sub_grade = col_character(),
##   ..   emp_title = col_character(),
##   ..   emp_length = col_character(),
##   ..   home_ownership = col_character(),
##   ..   annual_inc = col_double(),
##   ..   verification_status = col_character(),
##   ..   issue_d = col_character(),
##   ..   loan_status = col_character(),
##   ..   pymnt_plan = col_character(),
##   ..   url = col_logical(),
##   ..   desc = col_logical(),
##   ..   purpose = col_character(),
##   ..   title = col_character(),
##   ..   zip_code = col_character(),
##   ..   addr_state = col_character(),
##   ..   dti = col_double(),
##   ..   delinq_2yrs = col_double(),
##   ..   earliest_cr_line = col_character(),
##   ..   inq_last_6mths = col_double(),
##   ..   mths_since_last_delinq = col_double(),
##   ..   mths_since_last_record = col_double(),
##   ..   open_acc = col_double(),
##   ..   pub_rec = col_double(),
##   ..   revol_bal = col_double(),
##   ..   revol_util = col_double(),
##   ..   total_acc = col_double(),
##   ..   initial_list_status = col_character(),
##   ..   out_prncp = col_double(),
##   ..   out_prncp_inv = col_double(),
##   ..   total_pymnt = col_double(),
##   ..   total_pymnt_inv = col_double(),
##   ..   total_rec_prncp = col_double(),
##   ..   total_rec_int = col_double(),
##   ..   total_rec_late_fee = col_double(),
##   ..   recoveries = col_double(),
##   ..   collection_recovery_fee = col_double(),
##   ..   last_pymnt_d = col_character(),
##   ..   last_pymnt_amnt = col_double(),
##   ..   next_pymnt_d = col_character(),
##   ..   last_credit_pull_d = col_character(),
##   ..   collections_12_mths_ex_med = col_double(),
##   ..   mths_since_last_major_derog = col_double(),
##   ..   policy_code = col_double(),
##   ..   application_type = col_character(),
##   ..   annual_inc_joint = col_double(),
##   ..   dti_joint = col_double(),
##   ..   verification_status_joint = col_character(),
##   ..   acc_now_delinq = col_double(),
##   ..   tot_coll_amt = col_double(),
##   ..   tot_cur_bal = col_double(),
##   ..   open_acc_6m = col_double(),
##   ..   open_act_il = col_double(),
##   ..   open_il_12m = col_double(),
##   ..   open_il_24m = col_double(),
##   ..   mths_since_rcnt_il = col_double(),
##   ..   total_bal_il = col_double(),
##   ..   il_util = col_double(),
##   ..   open_rv_12m = col_double(),
##   ..   open_rv_24m = col_double(),
##   ..   max_bal_bc = col_double(),
##   ..   all_util = col_double(),
##   ..   total_rev_hi_lim = col_double(),
##   ..   inq_fi = col_double(),
##   ..   total_cu_tl = col_double(),
##   ..   inq_last_12m = col_double(),
##   ..   acc_open_past_24mths = col_double(),
##   ..   avg_cur_bal = col_double(),
##   ..   bc_open_to_buy = col_double(),
##   ..   bc_util = col_double(),
##   ..   chargeoff_within_12_mths = col_double(),
##   ..   delinq_amnt = col_double(),
##   ..   mo_sin_old_il_acct = col_double(),
##   ..   mo_sin_old_rev_tl_op = col_double(),
##   ..   mo_sin_rcnt_rev_tl_op = col_double(),
##   ..   mo_sin_rcnt_tl = col_double(),
##   ..   mort_acc = col_double(),
##   ..   mths_since_recent_bc = col_double(),
##   ..   mths_since_recent_bc_dlq = col_double(),
##   ..   mths_since_recent_inq = col_double(),
##   ..   mths_since_recent_revol_delinq = col_double(),
##   ..   num_accts_ever_120_pd = col_double(),
##   ..   num_actv_bc_tl = col_double(),
##   ..   num_actv_rev_tl = col_double(),
##   ..   num_bc_sats = col_double(),
##   ..   num_bc_tl = col_double(),
##   ..   num_il_tl = col_double(),
##   ..   num_op_rev_tl = col_double(),
##   ..   num_rev_accts = col_double(),
##   ..   num_rev_tl_bal_gt_0 = col_double(),
##   ..   num_sats = col_double(),
##   ..   num_tl_120dpd_2m = col_double(),
##   ..   num_tl_30dpd = col_double(),
##   ..   num_tl_90g_dpd_24m = col_double(),
##   ..   num_tl_op_past_12m = col_double(),
##   ..   pct_tl_nvr_dlq = col_double(),
##   ..   percent_bc_gt_75 = col_double(),
##   ..   pub_rec_bankruptcies = col_double(),
##   ..   tax_liens = col_double(),
##   ..   tot_hi_cred_lim = col_double(),
##   ..   total_bal_ex_mort = col_double(),
##   ..   total_bc_limit = col_double(),
##   ..   total_il_high_credit_limit = col_double(),
##   ..   revol_bal_joint = col_double(),
##   ..   sec_app_earliest_cr_line = col_character(),
##   ..   sec_app_inq_last_6mths = col_double(),
##   ..   sec_app_mort_acc = col_double(),
##   ..   sec_app_open_acc = col_double(),
##   ..   sec_app_revol_util = col_double(),
##   ..   sec_app_open_act_il = col_double(),
##   ..   sec_app_num_rev_accts = col_double(),
##   ..   sec_app_chargeoff_within_12_mths = col_double(),
##   ..   sec_app_collections_12_mths_ex_med = col_double(),
##   ..   sec_app_mths_since_last_major_derog = col_double(),
##   ..   hardship_flag = col_character(),
##   ..   hardship_type = col_logical(),
##   ..   hardship_reason = col_logical(),
##   ..   hardship_status = col_logical(),
##   ..   deferral_term = col_logical(),
##   ..   hardship_amount = col_logical(),
##   ..   hardship_start_date = col_logical(),
##   ..   hardship_end_date = col_logical(),
##   ..   payment_plan_start_date = col_logical(),
##   ..   hardship_length = col_logical(),
##   ..   hardship_dpd = col_logical(),
##   ..   hardship_loan_status = col_logical(),
##   ..   orig_projected_additional_accrued_interest = col_logical(),
##   ..   hardship_payoff_balance_amount = col_logical(),
##   ..   hardship_last_payment_amount = col_logical(),
##   ..   disbursement_method = col_character(),
##   ..   debt_settlement_flag = col_character(),
##   ..   debt_settlement_flag_date = col_logical(),
##   ..   settlement_status = col_logical(),
##   ..   settlement_date = col_logical(),
##   ..   settlement_amount = col_logical(),
##   ..   settlement_percentage = col_logical(),
##   ..   settlement_term = col_logical()
##   .. )
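
The `problems` attribute in the output above records 462,349 parsing failures: readr guessed `col_logical()` for columns such as `settlement_status` because their first rows are empty (by default only the first 1,000 rows are used for guessing), so later values like "ACTIVE" or "Feb-2019" could not be parsed and were read as NA. The sketch below shows one possible way to avoid this by declaring column types explicitly. It uses a hypothetical three-row file in place of loan.csv, and the object names `tmp` and `demo` are illustrative only.

```r
library(readr)

# Hypothetical miniature of loan.csv: the settlement column is empty in the
# first rows and only filled in later, as in the real file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("loan_amnt,settlement_status",
             "2500,",
             "30000,",
             "5000,ACTIVE"), tmp)

# Declare the column types explicitly instead of relying on type guessing.
demo <- read_csv(tmp, col_types = cols(
  loan_amnt = col_double(),
  settlement_status = col_character()
))

problems(demo)            # a tibble with zero rows when every value parsed
demo$settlement_status    # empty fields are read as NA, "ACTIVE" is kept
```

For the full file, passing a larger `guess_max` to `read_csv()` is an alternative when listing every column type by hand is impractical. In this analysis the affected settlement and hardship columns are dropped in the next step anyway, so the failures do not change the results that follow.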
  1. Deleting unnecessary columns (selected by column position)
loan1 <- loan[,-c(1,2,19,20,29,30,36,43,44,45,50,51,54,55,57,87,89,112,114:137,139:145)]
  1. Checking the dimensions of the data after deleting the unnecessary columns
dim(loan1)
## [1] 2260668      96

Insight: The original data (loan.csv) contains 145 variables. After the 49 unnecessary columns are removed, the reduced data (loan1) contains 96 variables; the number of rows (2,260,668) is unchanged.
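
Dropping columns by position works, but it silently removes the wrong columns if the column order of the file ever changes. As an alternative sketch, the same kind of deletion can be written with column names. The small data frame `loan_demo` and the vector `drop_cols` below are hypothetical stand-ins, not the full list of 49 columns removed above.

```r
# Toy stand-in for `loan`; the real data set has 145 columns.
loan_demo <- data.frame(
  id        = NA,
  member_id = NA,
  loan_amnt = c(2500, 30000),
  url       = NA,
  purpose   = c("debt_consolidation", "car")
)

# Columns to remove, listed by name rather than by position.
drop_cols <- c("id", "member_id", "url")

# Keep every column whose name is not in drop_cols.
loan1_demo <- loan_demo[, !(names(loan_demo) %in% drop_cols), drop = FALSE]

dim(loan1_demo)
names(loan1_demo)
```

Listing names also documents which variables were considered unnecessary, which the index vector alone does not show.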

  1. Checking the summary of the data
summary(loan1)
##    loan_amnt      funded_amnt    funded_amnt_inv     term          
##  Min.   :  500   Min.   :  500   Min.   :    0   Length:2260668    
##  1st Qu.: 8000   1st Qu.: 8000   1st Qu.: 8000   Class :character  
##  Median :12900   Median :12875   Median :12800   Mode  :character  
##  Mean   :15047   Mean   :15042   Mean   :15023                     
##  3rd Qu.:20000   3rd Qu.:20000   3rd Qu.:20000                     
##  Max.   :40000   Max.   :40000   Max.   :40000                     
##                                                                    
##     int_rate      installment         grade            sub_grade        
##  Min.   : 5.31   Min.   :   4.93   Length:2260668     Length:2260668    
##  1st Qu.: 9.49   1st Qu.: 251.65   Class :character   Class :character  
##  Median :12.62   Median : 377.99   Mode  :character   Mode  :character  
##  Mean   :13.09   Mean   : 445.81                                        
##  3rd Qu.:15.99   3rd Qu.: 593.32                                        
##  Max.   :30.99   Max.   :1719.83                                        
##                                                                         
##   emp_title          emp_length        home_ownership    
##  Length:2260668     Length:2260668     Length:2260668    
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##    annual_inc        verification_status   issue_d         
##  Min.   :        0   Length:2260668      Length:2260668    
##  1st Qu.:    46000   Class :character    Class :character  
##  Median :    65000   Mode  :character    Mode  :character  
##  Mean   :    77992                                         
##  3rd Qu.:    93000                                         
##  Max.   :110000000                                         
##  NA's   :4                                                 
##  loan_status         pymnt_plan          purpose         
##  Length:2260668     Length:2260668     Length:2260668    
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##     title             zip_code          addr_state             dti        
##  Length:2260668     Length:2260668     Length:2260668     Min.   : -1.00  
##  Class :character   Class :character   Class :character   1st Qu.: 11.89  
##  Mode  :character   Mode  :character   Mode  :character   Median : 17.84  
##                                                           Mean   : 18.82  
##                                                           3rd Qu.: 24.49  
##                                                           Max.   :999.00  
##                                                           NA's   :1711    
##   delinq_2yrs      earliest_cr_line   inq_last_6mths       open_acc     
##  Min.   : 0.0000   Length:2260668     Min.   : 0.0000   Min.   :  0.00  
##  1st Qu.: 0.0000   Class :character   1st Qu.: 0.0000   1st Qu.:  8.00  
##  Median : 0.0000   Mode  :character   Median : 0.0000   Median : 11.00  
##  Mean   : 0.3069                      Mean   : 0.5768   Mean   : 11.61  
##  3rd Qu.: 0.0000                      3rd Qu.: 1.0000   3rd Qu.: 14.00  
##  Max.   :58.0000                      Max.   :33.0000   Max.   :101.00  
##  NA's   :29                           NA's   :30        NA's   :29      
##     pub_rec          revol_bal         revol_util       total_acc     
##  Min.   : 0.0000   Min.   :      0   Min.   :  0.00   Min.   :  1.00  
##  1st Qu.: 0.0000   1st Qu.:   5950   1st Qu.: 31.50   1st Qu.: 15.00  
##  Median : 0.0000   Median :  11324   Median : 50.30   Median : 22.00  
##  Mean   : 0.1975   Mean   :  16658   Mean   : 50.34   Mean   : 24.16  
##  3rd Qu.: 0.0000   3rd Qu.:  20246   3rd Qu.: 69.40   3rd Qu.: 31.00  
##  Max.   :86.0000   Max.   :2904836   Max.   :892.30   Max.   :176.00  
##  NA's   :29                          NA's   :1802     NA's   :29      
##    out_prncp     out_prncp_inv    total_pymnt    total_pymnt_inv
##  Min.   :    0   Min.   :    0   Min.   :    0   Min.   :    0  
##  1st Qu.:    0   1st Qu.:    0   1st Qu.: 4273   1st Qu.: 4258  
##  Median :    0   Median :    0   Median : 9061   Median : 9043  
##  Mean   : 4446   Mean   : 4445   Mean   :11824   Mean   :11806  
##  3rd Qu.: 6713   3rd Qu.: 6710   3rd Qu.:16708   3rd Qu.:16683  
##  Max.   :40000   Max.   :40000   Max.   :63297   Max.   :63297  
##                                                                 
##  total_rec_prncp total_rec_int     last_pymnt_d       last_pymnt_amnt  
##  Min.   :    0   Min.   :    0.0   Length:2260668     Min.   :    0.0  
##  1st Qu.: 2846   1st Qu.:  693.6   Class :character   1st Qu.:  308.6  
##  Median : 6823   Median : 1485.3   Mode  :character   Median :  588.5  
##  Mean   : 9300   Mean   : 2386.4                      Mean   : 3364.0  
##  3rd Qu.:13398   3rd Qu.: 3052.2                      3rd Qu.: 3535.0  
##  Max.   :40000   Max.   :28192.5                      Max.   :42192.1  
##                                                                        
##  next_pymnt_d       last_credit_pull_d  policy_code application_type  
##  Length:2260668     Length:2260668     Min.   :1    Length:2260668    
##  Class :character   Class :character   1st Qu.:1    Class :character  
##  Mode  :character   Mode  :character   Median :1    Mode  :character  
##                                        Mean   :1                      
##                                        3rd Qu.:1                      
##                                        Max.   :1                      
##                                                                       
##  verification_status_joint  tot_coll_amt      tot_cur_bal     
##  Length:2260668            Min.   :      0   Min.   :      0  
##  Class :character          1st Qu.:      0   1st Qu.:  29092  
##  Mode  :character          Median :      0   Median :  79240  
##                            Mean   :    233   Mean   : 142492  
##                            3rd Qu.:      0   3rd Qu.: 213204  
##                            Max.   :9152545   Max.   :9971659  
##                            NA's   :70276     NA's   :70276    
##   open_acc_6m      open_act_il      open_il_12m      open_il_24m    
##  Min.   : 0.0     Min.   : 0.0     Min.   : 0.0     Min.   : 0.0    
##  1st Qu.: 0.0     1st Qu.: 1.0     1st Qu.: 0.0     1st Qu.: 0.0    
##  Median : 1.0     Median : 2.0     Median : 0.0     Median : 1.0    
##  Mean   : 0.9     Mean   : 2.8     Mean   : 0.7     Mean   : 1.6    
##  3rd Qu.: 1.0     3rd Qu.: 3.0     3rd Qu.: 1.0     3rd Qu.: 2.0    
##  Max.   :18.0     Max.   :57.0     Max.   :25.0     Max.   :51.0    
##  NA's   :866130   NA's   :866129   NA's   :866129   NA's   :866129  
##  mths_since_rcnt_il  total_bal_il        il_util         open_rv_12m    
##  Min.   :  0.0      Min.   :      0   Min.   :   0.0    Min.   : 0.0    
##  1st Qu.:  7.0      1st Qu.:   8695   1st Qu.:  55.0    1st Qu.: 0.0    
##  Median : 13.0      Median :  23127   Median :  72.0    Median : 1.0    
##  Mean   : 21.2      Mean   :  35507   Mean   :  69.1    Mean   : 1.3    
##  3rd Qu.: 24.0      3rd Qu.:  46095   3rd Qu.:  86.0    3rd Qu.: 2.0    
##  Max.   :511.0      Max.   :1837038   Max.   :1000.0    Max.   :28.0    
##  NA's   :909924     NA's   :866129    NA's   :1068850   NA's   :866129  
##   open_rv_24m       max_bal_bc         all_util      total_rev_hi_lim 
##  Min.   : 0.0     Min.   :      0   Min.   :  0      Min.   :      0  
##  1st Qu.: 1.0     1st Qu.:   2284   1st Qu.: 43      1st Qu.:  14700  
##  Median : 2.0     Median :   4413   Median : 58      Median :  25400  
##  Mean   : 2.7     Mean   :   5806   Mean   : 57      Mean   :  34574  
##  3rd Qu.: 4.0     3rd Qu.:   7598   3rd Qu.: 72      3rd Qu.:  43200  
##  Max.   :60.0     Max.   :1170668   Max.   :239      Max.   :9999999  
##  NA's   :866129   NA's   :866129    NA's   :866348   NA's   :70276    
##      inq_fi        total_cu_tl      inq_last_12m    acc_open_past_24mths
##  Min.   : 0       Min.   :  0.0    Min.   : 0       Min.   : 0.00       
##  1st Qu.: 0       1st Qu.:  0.0    1st Qu.: 0       1st Qu.: 2.00       
##  Median : 1       Median :  0.0    Median : 1       Median : 4.00       
##  Mean   : 1       Mean   :  1.5    Mean   : 2       Mean   : 4.52       
##  3rd Qu.: 1       3rd Qu.:  2.0    3rd Qu.: 3       3rd Qu.: 6.00       
##  Max.   :48       Max.   :111.0    Max.   :67       Max.   :64.00       
##  NA's   :866129   NA's   :866130   NA's   :866130   NA's   :50030       
##   avg_cur_bal     bc_open_to_buy      bc_util     
##  Min.   :     0   Min.   :     0   Min.   :  0.0  
##  1st Qu.:  3080   1st Qu.:  1722   1st Qu.: 35.4  
##  Median :  7335   Median :  5442   Median : 60.2  
##  Mean   : 13548   Mean   : 11394   Mean   : 57.9  
##  3rd Qu.: 18783   3rd Qu.: 14187   3rd Qu.: 83.1  
##  Max.   :958084   Max.   :711140   Max.   :339.6  
##  NA's   :70346    NA's   :74935    NA's   :76071  
##  chargeoff_within_12_mths  delinq_amnt        mo_sin_old_il_acct
##  Min.   : 0.00000         Min.   :     0.00   Min.   :  0.0     
##  1st Qu.: 0.00000         1st Qu.:     0.00   1st Qu.: 96.0     
##  Median : 0.00000         Median :     0.00   Median :130.0     
##  Mean   : 0.00846         Mean   :    12.37   Mean   :125.7     
##  3rd Qu.: 0.00000         3rd Qu.:     0.00   3rd Qu.:154.0     
##  Max.   :10.00000         Max.   :249925.00   Max.   :999.0     
##  NA's   :145              NA's   :29          NA's   :139071    
##  mo_sin_old_rev_tl_op mo_sin_rcnt_rev_tl_op mo_sin_rcnt_tl 
##  Min.   :  1.0        Min.   :  0.00        Min.   :  0.0  
##  1st Qu.:116.0        1st Qu.:  4.00        1st Qu.:  3.0  
##  Median :164.0        Median :  8.00        Median :  6.0  
##  Mean   :181.5        Mean   : 14.02        Mean   :  8.3  
##  3rd Qu.:232.0        3rd Qu.: 17.00        3rd Qu.: 11.0  
##  Max.   :999.0        Max.   :547.00        Max.   :382.0  
##  NA's   :70277        NA's   :70277         NA's   :70276  
##     mort_acc     mths_since_recent_bc mths_since_recent_inq
##  Min.   : 0.00   Min.   :  0.00       Min.   : 0.00        
##  1st Qu.: 0.00   1st Qu.:  6.00       1st Qu.: 2.00        
##  Median : 1.00   Median : 14.00       Median : 5.00        
##  Mean   : 1.56   Mean   : 24.84       Mean   : 7.02        
##  3rd Qu.: 3.00   3rd Qu.: 30.00       3rd Qu.:11.00        
##  Max.   :94.00   Max.   :661.00       Max.   :25.00        
##  NA's   :50030   NA's   :73412        NA's   :295435       
##  num_accts_ever_120_pd num_actv_bc_tl  num_actv_rev_tl  num_bc_sats   
##  Min.   : 0.0          Min.   : 0.00   Min.   : 0.00   Min.   : 0.00  
##  1st Qu.: 0.0          1st Qu.: 2.00   1st Qu.: 3.00   1st Qu.: 3.00  
##  Median : 0.0          Median : 3.00   Median : 5.00   Median : 4.00  
##  Mean   : 0.5          Mean   : 3.68   Mean   : 5.63   Mean   : 4.77  
##  3rd Qu.: 0.0          3rd Qu.: 5.00   3rd Qu.: 7.00   3rd Qu.: 6.00  
##  Max.   :58.0          Max.   :50.00   Max.   :72.00   Max.   :71.00  
##  NA's   :70276         NA's   :70276   NA's   :70276   NA's   :58590  
##    num_bc_tl       num_il_tl      num_op_rev_tl   num_rev_accts  
##  Min.   : 0.00   Min.   :  0.00   Min.   : 0.00   Min.   :  0    
##  1st Qu.: 4.00   1st Qu.:  3.00   1st Qu.: 5.00   1st Qu.:  8    
##  Median : 7.00   Median :  6.00   Median : 7.00   Median : 12    
##  Mean   : 7.73   Mean   :  8.41   Mean   : 8.25   Mean   : 14    
##  3rd Qu.:10.00   3rd Qu.: 11.00   3rd Qu.:10.00   3rd Qu.: 18    
##  Max.   :86.00   Max.   :159.00   Max.   :91.00   Max.   :151    
##  NA's   :70276   NA's   :70276    NA's   :70276   NA's   :70277  
##  num_rev_tl_bal_gt_0    num_sats      num_tl_120dpd_2m  num_tl_30dpd  
##  Min.   : 0.00       Min.   :  0.00   Min.   :0        Min.   :0      
##  1st Qu.: 3.00       1st Qu.:  8.00   1st Qu.:0        1st Qu.:0      
##  Median : 5.00       Median : 11.00   Median :0        Median :0      
##  Mean   : 5.58       Mean   : 11.63   Mean   :0        Mean   :0      
##  3rd Qu.: 7.00       3rd Qu.: 14.00   3rd Qu.:0        3rd Qu.:0      
##  Max.   :65.00       Max.   :101.00   Max.   :7        Max.   :4      
##  NA's   :70276       NA's   :58590    NA's   :153657   NA's   :70276  
##  num_tl_90g_dpd_24m num_tl_op_past_12m pct_tl_nvr_dlq   percent_bc_gt_75
##  Min.   : 0.00      Min.   : 0.00      Min.   :  0.00   Min.   :  0.00  
##  1st Qu.: 0.00      1st Qu.: 1.00      1st Qu.: 91.30   1st Qu.:  0.00  
##  Median : 0.00      Median : 2.00      Median :100.00   Median : 37.50  
##  Mean   : 0.08      Mean   : 2.08      Mean   : 94.11   Mean   : 42.44  
##  3rd Qu.: 0.00      3rd Qu.: 3.00      3rd Qu.:100.00   3rd Qu.: 71.40  
##  Max.   :58.00      Max.   :32.00      Max.   :100.00   Max.   :100.00  
##  NA's   :70276      NA's   :70276      NA's   :70431    NA's   :75379   
##  pub_rec_bankruptcies   tax_liens        tot_hi_cred_lim  
##  Min.   : 0.0000      Min.   : 0.00000   Min.   :      0  
##  1st Qu.: 0.0000      1st Qu.: 0.00000   1st Qu.:  50731  
##  Median : 0.0000      Median : 0.00000   Median : 114298  
##  Mean   : 0.1282      Mean   : 0.04677   Mean   : 178243  
##  3rd Qu.: 0.0000      3rd Qu.: 0.00000   3rd Qu.: 257755  
##  Max.   :12.0000      Max.   :85.00000   Max.   :9999999  
##  NA's   :1365         NA's   :105        NA's   :70276    
##  total_bal_ex_mort total_bc_limit    total_il_high_credit_limit
##  Min.   :      0   Min.   :      0   Min.   :      0           
##  1st Qu.:  20892   1st Qu.:   8300   1st Qu.:  15000           
##  Median :  37864   Median :  16300   Median :  32696           
##  Mean   :  51023   Mean   :  23194   Mean   :  43732           
##  3rd Qu.:  64350   3rd Qu.:  30300   3rd Qu.:  58804           
##  Max.   :3408095   Max.   :1569000   Max.   :2118996           
##  NA's   :50030     NA's   :50030     NA's   :70276             
##  sec_app_earliest_cr_line disbursement_method
##  Length:2260668           Length:2260668     
##  Class :character         Class :character   
##  Mode  :character         Mode  :character   
##                                              
##                                              
##                                              
## 
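
The summary above reports NA counts column by column (for example, 866,129 missing values in `open_il_12m` and 1,068,850 in `il_util`), which is hard to compare across 96 variables. A minimal sketch for ranking columns by their share of missing values follows. The data frame `na_demo` is a hypothetical stand-in for `loan1`.

```r
# Toy stand-in for `loan1` with different amounts of missing data per column.
na_demo <- data.frame(
  loan_amnt = c(2500, 30000, 5000, 4000),
  il_util   = c(69, NA, 72, NA),
  purpose   = c("car", "other", NA, "house")
)

# is.na() gives a logical matrix; column means are the share of NA per column.
na_share <- sort(colMeans(is.na(na_demo)), decreasing = TRUE)
na_share
```

Applied to the real data, the same two lines would be `sort(colMeans(is.na(loan1)), decreasing = TRUE)`.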

1.2 Loan Distribution

Data to be observed: the loan amount and loan purpose columns

loan1[,c("loan_amnt","purpose")]
data1 <- loan1[,c("loan_amnt","purpose")]

1.2.1 Analyze the three most frequently requested loan amounts

  1. Loading the required package
library(ggplot2)
library(scales)
  1. Plotting the distribution of the requested loan amounts as a histogram
ggplot(data1, aes(loan_amnt)) + geom_histogram(bins=40, color="yellow", fill="blue") +
  scale_y_continuous(labels=comma) +
  scale_x_continuous(labels=comma)

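Analysis result based on the histogram: The most frequently requested loan amounts are approximately: 1. USD 10,000 2. USD 20,000 3. USD 15,000. These values are read from the tallest bars; with 40 bins over a range of USD 500 to USD 40,000, each bar covers roughly USD 1,000, so the histogram shows the peaks only to that precision.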
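
Because a histogram groups amounts into bins, the exact most frequent values can be confirmed by counting them directly. The sketch below does this with `table()` on a small hypothetical vector, `loan_amnt_demo`, which is made-up data used only to show the technique.

```r
# Hypothetical loan amounts; in the analysis this would be data1$loan_amnt.
loan_amnt_demo <- c(10000L, 10000L, 10000L, 20000L, 20000L,
                    15000L, 15000L, 5000L, 20000L, 10000L)

# Count each distinct amount, sort by frequency, keep the three largest counts.
top3 <- head(sort(table(loan_amnt_demo), decreasing = TRUE), 3)
top3
```

On the real data the equivalent call is `head(sort(table(data1$loan_amnt), decreasing = TRUE), 3)`.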

1.2.2 The purpose of the loan

  1. Plotting the distribution of the loan purpose as a pie chart (geom_bar with polar coordinates)
ggplot(data1, aes(x = factor(""), fill = purpose) ) +  geom_bar() + coord_polar(theta = "y") +
     scale_x_discrete("")

  1. Computing the proportion of loans for each purpose
prop.table(table(data1$purpose))
## 
##                car        credit_card debt_consolidation 
##       0.0106220816       0.2286806378       0.5652652225 
##        educational   home_improvement              house 
##       0.0001875552       0.0665542220       0.0062530190 
##     major_purchase            medical             moving 
##       0.0223142009       0.0121592379       0.0068134728 
##              other   renewable_energy     small_business 
##       0.0616808837       0.0006391916       0.0109211083 
##           vacation            wedding 
##       0.0068674392       0.0010417275

Analysis result: The most common loan purpose is debt consolidation (56.5%), followed by credit card (22.9%) and home improvement (6.7%).
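
The proportions above are printed in alphabetical order with ten decimal places, which makes the ranking hard to read. A minimal sketch of a sorted, rounded percentage table follows. The vector `purpose_demo` is hypothetical data standing in for `data1$purpose`.

```r
# Hypothetical purposes; in the analysis this would be data1$purpose.
purpose_demo <- c(rep("debt_consolidation", 5), rep("credit_card", 3),
                  "car", "other")

# Proportions per purpose, sorted from most to least common, as percentages.
purpose_pct <- round(100 * sort(prop.table(table(purpose_demo)),
                                decreasing = TRUE), 1)
purpose_pct
```

On the real data the same expression with `data1$purpose` would list debt consolidation first.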