The dataset is loaded using the read_csv(). It is found that the dataset has 10000 rows and 55 columns, that is, 55 features. Given below is a glimpse of the dataset. The dataset has two types of values in it, namely, character type and double class.

## Rows: 10000 Columns: 55
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): emp_title, state, homeownership, verified_income, verification_inc...
## dbl (42): emp_length, annual_income, debt_to_income, annual_income_joint, de...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The first 10 rows of the dataset are given below:

emp_title emp_length state homeownership annual_income verified_income debt_to_income annual_income_joint verification_income_joint debt_to_income_joint delinq_2y months_since_last_delinq earliest_credit_line inquiries_last_12m total_credit_lines open_credit_lines total_credit_limit total_credit_utilized num_collections_last_12m num_historical_failed_to_pay months_since_90d_late current_accounts_delinq total_collection_amount_ever current_installment_accounts accounts_opened_24m months_since_last_credit_inquiry num_satisfactory_accounts num_accounts_120d_past_due num_accounts_30d_past_due num_active_debit_accounts total_debit_limit num_total_cc_accounts num_open_cc_accounts num_cc_carrying_balance num_mort_accounts account_never_delinq_percent tax_liens public_record_bankrupt loan_purpose application_type loan_amount term interest_rate installment grade sub_grade issue_month loan_status initial_listing_status disbursement_method balance paid_total paid_principal paid_interest paid_late_fees
global config engineer 3 NJ MORTGAGE 90000 Verified 18.01 NA NA NA 0 38 2001 6 28 10 70795 38767 0 0 38 0 1250 2 5 5 10 0 0 2 11100 14 8 6 1 92.9 0 0 moving individual 28000 60 14.07 652.53 C C3 Mar-2018 Current whole Cash 27015.86 1999.33 984.14 1015.19 0
warehouse office clerk 10 HI RENT 40000 Not Verified 5.04 NA NA NA 0 NA 1996 1 30 14 28800 4321 0 1 NA 0 0 0 11 8 14 0 0 3 16500 24 14 4 0 100.0 0 1 debt_consolidation individual 5000 36 12.61 167.54 C C1 Feb-2018 Current whole Cash 4651.37 499.12 348.63 150.49 0
assembly 3 WI RENT 40000 Source Verified 21.15 NA NA NA 0 28 2006 4 31 10 24193 16000 0 0 28 0 432 1 13 7 10 0 0 3 4300 14 8 6 0 93.5 0 0 other individual 2000 36 17.09 71.40 D D1 Feb-2018 Current fractional Cash 1824.63 281.80 175.37 106.43 0
customer service 1 PA RENT 30000 Not Verified 10.16 NA NA NA 0 NA 2007 0 4 4 25400 4997 0 1 NA 0 0 1 1 15 4 0 0 2 19400 3 3 2 0 100.0 1 0 debt_consolidation individual 21600 36 6.72 664.19 A A3 Jan-2018 Current whole Cash 18853.26 3312.89 2746.74 566.15 0
security supervisor 10 CA RENT 35000 Verified 57.96 57000 Verified 37.66 0 NA 2008 7 22 16 69839 52722 0 0 NA 0 0 1 6 4 16 0 0 10 32700 20 15 13 0 100.0 0 0 credit_card joint 23000 36 14.07 786.87 C C3 Mar-2018 Current whole Cash 21430.15 2324.65 1569.85 754.80 0
NA NA KY OWN 34000 Not Verified 6.46 NA NA NA 1 3 1990 6 32 12 42100 3898 0 0 60 0 0 0 2 5 12 0 0 1 27200 27 12 5 3 78.1 0 0 other individual 5000 36 6.72 153.75 A A3 Jan-2018 Current whole Cash 4256.71 873.13 743.29 129.84 0
hr 10 MI MORTGAGE 35000 Source Verified 23.66 155000 Not Verified 13.12 0 NA 2004 1 12 10 291852 18916 0 0 NA 0 0 2 1 9 10 0 0 3 9100 8 7 6 2 100.0 0 0 credit_card joint 24000 60 13.59 553.35 C C2 Jan-2018 Current whole Cash 22560.00 2730.51 1440.00 1290.51 0
police 10 AZ MORTGAGE 110000 Source Verified 16.19 NA NA NA 1 19 2005 1 30 15 342336 60805 0 0 71 0 0 2 4 7 15 NA 0 5 22250 16 12 10 7 93.0 0 0 debt_consolidation individual 20000 60 11.99 444.79 B B5 Jan-2018 Current whole Cash 19005.39 1765.84 994.61 771.23 0
parts 10 NV MORTGAGE 65000 Source Verified 36.48 NA NA NA 1 18 1998 3 35 21 283190 69561 0 0 18 0 0 6 10 4 21 0 0 11 36700 19 14 14 2 97.1 0 0 home_improvement individual 20000 36 13.59 679.58 C C2 Feb-2018 Current whole Cash 18156.66 2703.22 1843.34 859.88 0
4th person 3 IL RENT 30000 Not Verified 18.91 NA NA NA 0 NA 2001 0 9 6 33114 17768 0 0 NA 0 0 1 5 17 6 0 0 3 14700 7 5 3 0 100.0 0 0 credit_card individual 6400 36 6.71 196.77 A A3 Mar-2018 Current fractional Cash 6077.13 391.15 322.87 68.28 0

A more detailed description of the dataset is given below:

## Rows: 10,000
## Columns: 55
## $ emp_title                        <chr> "global config engineer", "warehouse …
## $ emp_length                       <dbl> 3, 10, 3, 1, 10, NA, 10, 10, 10, 3, 1…
## $ state                            <chr> "NJ", "HI", "WI", "PA", "CA", "KY", "…
## $ homeownership                    <chr> "MORTGAGE", "RENT", "RENT", "RENT", "…
## $ annual_income                    <dbl> 90000, 40000, 40000, 30000, 35000, 34…
## $ verified_income                  <chr> "Verified", "Not Verified", "Source V…
## $ debt_to_income                   <dbl> 18.01, 5.04, 21.15, 10.16, 57.96, 6.4…
## $ annual_income_joint              <dbl> NA, NA, NA, NA, 57000, NA, 155000, NA…
## $ verification_income_joint        <chr> NA, NA, NA, NA, "Verified", NA, "Not …
## $ debt_to_income_joint             <dbl> NA, NA, NA, NA, 37.66, NA, 13.12, NA,…
## $ delinq_2y                        <dbl> 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0…
## $ months_since_last_delinq         <dbl> 38, NA, 28, NA, NA, 3, NA, 19, 18, NA…
## $ earliest_credit_line             <dbl> 2001, 1996, 2006, 2007, 2008, 1990, 2…
## $ inquiries_last_12m               <dbl> 6, 1, 4, 0, 7, 6, 1, 1, 3, 0, 4, 4, 8…
## $ total_credit_lines               <dbl> 28, 30, 31, 4, 22, 32, 12, 30, 35, 9,…
## $ open_credit_lines                <dbl> 10, 14, 10, 4, 16, 12, 10, 15, 21, 6,…
## $ total_credit_limit               <dbl> 70795, 28800, 24193, 25400, 69839, 42…
## $ total_credit_utilized            <dbl> 38767, 4321, 16000, 4997, 52722, 3898…
## $ num_collections_last_12m         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ num_historical_failed_to_pay     <dbl> 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0…
## $ months_since_90d_late            <dbl> 38, NA, 28, NA, NA, 60, NA, 71, 18, N…
## $ current_accounts_delinq          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ total_collection_amount_ever     <dbl> 1250, 0, 432, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ current_installment_accounts     <dbl> 2, 0, 1, 1, 1, 0, 2, 2, 6, 1, 2, 1, 2…
## $ accounts_opened_24m              <dbl> 5, 11, 13, 1, 6, 2, 1, 4, 10, 5, 6, 7…
## $ months_since_last_credit_inquiry <dbl> 5, 8, 7, 15, 4, 5, 9, 7, 4, 17, 3, 4,…
## $ num_satisfactory_accounts        <dbl> 10, 14, 10, 4, 16, 12, 10, 15, 21, 6,…
## $ num_accounts_120d_past_due       <dbl> 0, 0, 0, 0, 0, 0, 0, NA, 0, 0, 0, 0, …
## $ num_accounts_30d_past_due        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ num_active_debit_accounts        <dbl> 2, 3, 3, 2, 10, 1, 3, 5, 11, 3, 2, 2,…
## $ total_debit_limit                <dbl> 11100, 16500, 4300, 19400, 32700, 272…
## $ num_total_cc_accounts            <dbl> 14, 24, 14, 3, 20, 27, 8, 16, 19, 7, …
## $ num_open_cc_accounts             <dbl> 8, 14, 8, 3, 15, 12, 7, 12, 14, 5, 8,…
## $ num_cc_carrying_balance          <dbl> 6, 4, 6, 2, 13, 5, 6, 10, 14, 3, 5, 3…
## $ num_mort_accounts                <dbl> 1, 0, 0, 0, 0, 3, 2, 7, 2, 0, 2, 3, 3…
## $ account_never_delinq_percent     <dbl> 92.9, 100.0, 93.5, 100.0, 100.0, 78.1…
## $ tax_liens                        <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ public_record_bankrupt           <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0…
## $ loan_purpose                     <chr> "moving", "debt_consolidation", "othe…
## $ application_type                 <chr> "individual", "individual", "individu…
## $ loan_amount                      <dbl> 28000, 5000, 2000, 21600, 23000, 5000…
## $ term                             <dbl> 60, 36, 36, 36, 36, 36, 60, 60, 36, 3…
## $ interest_rate                    <dbl> 14.07, 12.61, 17.09, 6.72, 14.07, 6.7…
## $ installment                      <dbl> 652.53, 167.54, 71.40, 664.19, 786.87…
## $ grade                            <chr> "C", "C", "D", "A", "C", "A", "C", "B…
## $ sub_grade                        <chr> "C3", "C1", "D1", "A3", "C3", "A3", "…
## $ issue_month                      <chr> "Mar-2018", "Feb-2018", "Feb-2018", "…
## $ loan_status                      <chr> "Current", "Current", "Current", "Cur…
## $ initial_listing_status           <chr> "whole", "whole", "fractional", "whol…
## $ disbursement_method              <chr> "Cash", "Cash", "Cash", "Cash", "Cash…
## $ balance                          <dbl> 27015.86, 4651.37, 1824.63, 18853.26,…
## $ paid_total                       <dbl> 1999.330, 499.120, 281.800, 3312.890,…
## $ paid_principal                   <dbl> 984.14, 348.63, 175.37, 2746.74, 1569…
## $ paid_interest                    <dbl> 1015.19, 150.49, 106.43, 566.15, 754.…
## $ paid_late_fees                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…

The dataset has the following (55) columns:

##  [1] "emp_title"                        "emp_length"                      
##  [3] "state"                            "homeownership"                   
##  [5] "annual_income"                    "verified_income"                 
##  [7] "debt_to_income"                   "annual_income_joint"             
##  [9] "verification_income_joint"        "debt_to_income_joint"            
## [11] "delinq_2y"                        "months_since_last_delinq"        
## [13] "earliest_credit_line"             "inquiries_last_12m"              
## [15] "total_credit_lines"               "open_credit_lines"               
## [17] "total_credit_limit"               "total_credit_utilized"           
## [19] "num_collections_last_12m"         "num_historical_failed_to_pay"    
## [21] "months_since_90d_late"            "current_accounts_delinq"         
## [23] "total_collection_amount_ever"     "current_installment_accounts"    
## [25] "accounts_opened_24m"              "months_since_last_credit_inquiry"
## [27] "num_satisfactory_accounts"        "num_accounts_120d_past_due"      
## [29] "num_accounts_30d_past_due"        "num_active_debit_accounts"       
## [31] "total_debit_limit"                "num_total_cc_accounts"           
## [33] "num_open_cc_accounts"             "num_cc_carrying_balance"         
## [35] "num_mort_accounts"                "account_never_delinq_percent"    
## [37] "tax_liens"                        "public_record_bankrupt"          
## [39] "loan_purpose"                     "application_type"                
## [41] "loan_amount"                      "term"                            
## [43] "interest_rate"                    "installment"                     
## [45] "grade"                            "sub_grade"                       
## [47] "issue_month"                      "loan_status"                     
## [49] "initial_listing_status"           "disbursement_method"             
## [51] "balance"                          "paid_total"                      
## [53] "paid_principal"                   "paid_interest"                   
## [55] "paid_late_fees"

There are a few problems with the dataset: 1. It has missing values, which can be counted column-wise as follows:

##                        emp_title                       emp_length 
##                              833                              817 
##                            state                    homeownership 
##                                0                                0 
##                    annual_income                  verified_income 
##                                0                                0 
##                   debt_to_income              annual_income_joint 
##                               24                             8505 
##        verification_income_joint             debt_to_income_joint 
##                             8545                             8505 
##                        delinq_2y         months_since_last_delinq 
##                                0                             5658 
##             earliest_credit_line               inquiries_last_12m 
##                                0                                0 
##               total_credit_lines                open_credit_lines 
##                                0                                0 
##               total_credit_limit            total_credit_utilized 
##                                0                                0 
##         num_collections_last_12m     num_historical_failed_to_pay 
##                                0                                0 
##            months_since_90d_late          current_accounts_delinq 
##                             7715                                0 
##     total_collection_amount_ever     current_installment_accounts 
##                                0                                0 
##              accounts_opened_24m months_since_last_credit_inquiry 
##                                0                             1271 
##        num_satisfactory_accounts       num_accounts_120d_past_due 
##                                0                              318 
##        num_accounts_30d_past_due        num_active_debit_accounts 
##                                0                                0 
##                total_debit_limit            num_total_cc_accounts 
##                                0                                0 
##             num_open_cc_accounts          num_cc_carrying_balance 
##                                0                                0 
##                num_mort_accounts     account_never_delinq_percent 
##                                0                                0 
##                        tax_liens           public_record_bankrupt 
##                                0                                0 
##                     loan_purpose                 application_type 
##                                0                                0 
##                      loan_amount                             term 
##                                0                                0 
##                    interest_rate                      installment 
##                                0                                0 
##                            grade                        sub_grade 
##                                0                                0 
##                      issue_month                      loan_status 
##                                0                                0 
##           initial_listing_status              disbursement_method 
##                                0                                0 
##                          balance                       paid_total 
##                                0                                0 
##                   paid_principal                    paid_interest 
##                                0                                0 
##                   paid_late_fees 
##                                0

We find that the column emp_title has 833 missing values, emp_length has 817 missing values, debt_to_income, annual_income_joint, verification_income_joint have 24, 8505 and 8545 missing values respectively, and so on. We first drop all the rows with missing values in the emp_title.

##                        emp_title                       emp_length 
##                                0                                0 
##                            state                    homeownership 
##                                0                                0 
##                    annual_income                  verified_income 
##                                0                                0 
##                   debt_to_income              annual_income_joint 
##                                1                             7875 
##        verification_income_joint             debt_to_income_joint 
##                             7909                             7875 
##                        delinq_2y         months_since_last_delinq 
##                                0                             5183 
##             earliest_credit_line               inquiries_last_12m 
##                                0                                0 
##               total_credit_lines                open_credit_lines 
##                                0                                0 
##               total_credit_limit            total_credit_utilized 
##                                0                                0 
##         num_collections_last_12m     num_historical_failed_to_pay 
##                                0                                0 
##            months_since_90d_late          current_accounts_delinq 
##                             7090                                0 
##     total_collection_amount_ever     current_installment_accounts 
##                                0                                0 
##              accounts_opened_24m months_since_last_credit_inquiry 
##                                0                             1122 
##        num_satisfactory_accounts       num_accounts_120d_past_due 
##                                0                              295 
##        num_accounts_30d_past_due        num_active_debit_accounts 
##                                0                                0 
##                total_debit_limit            num_total_cc_accounts 
##                                0                                0 
##             num_open_cc_accounts          num_cc_carrying_balance 
##                                0                                0 
##                num_mort_accounts     account_never_delinq_percent 
##                                0                                0 
##                        tax_liens           public_record_bankrupt 
##                                0                                0 
##                     loan_purpose                 application_type 
##                                0                                0 
##                      loan_amount                             term 
##                                0                                0 
##                    interest_rate                      installment 
##                                0                                0 
##                            grade                        sub_grade 
##                                0                                0 
##                      issue_month                      loan_status 
##                                0                                0 
##           initial_listing_status              disbursement_method 
##                                0                                0 
##                          balance                       paid_total 
##                                0                                0 
##                   paid_principal                    paid_interest 
##                                0                                0 
##                   paid_late_fees 
##                                0

We also change the column name of “emp_title” as “Job” and “emp_length” as “employment_len”

We replace the missing values in the rest of the columns with character class type as “Not Applicable” and for double class type as -1

We now take a look at the dataset, which has zero missing values

##                              job                   employment_len 
##                                0                                0 
##                            state                    homeownership 
##                                0                                0 
##                    annual_income                  verified_income 
##                                0                                0 
##                   debt_to_income              annual_income_joint 
##                                0                                0 
##        verification_income_joint             debt_to_income_joint 
##                                0                                0 
##                        delinq_2y         months_since_last_delinq 
##                                0                                0 
##             earliest_credit_line               inquiries_last_12m 
##                                0                                0 
##               total_credit_lines                open_credit_lines 
##                                0                                0 
##               total_credit_limit            total_credit_utilized 
##                                0                                0 
##         num_collections_last_12m     num_historical_failed_to_pay 
##                                0                                0 
##            months_since_90d_late          current_accounts_delinq 
##                                0                                0 
##     total_collection_amount_ever     current_installment_accounts 
##                                0                                0 
##              accounts_opened_24m months_since_last_credit_inquiry 
##                                0                                0 
##        num_satisfactory_accounts       num_accounts_120d_past_due 
##                                0                                0 
##        num_accounts_30d_past_due        num_active_debit_accounts 
##                                0                                0 
##                total_debit_limit            num_total_cc_accounts 
##                                0                                0 
##             num_open_cc_accounts          num_cc_carrying_balance 
##                                0                                0 
##                num_mort_accounts     account_never_delinq_percent 
##                                0                                0 
##                        tax_liens           public_record_bankrupt 
##                                0                                0 
##                     loan_purpose                 application_type 
##                                0                                0 
##                      loan_amount                             term 
##                                0                                0 
##                    interest_rate                      installment 
##                                0                                0 
##                            grade                        sub_grade 
##                                0                                0 
##                      issue_month                      loan_status 
##                                0                                0 
##           initial_listing_status              disbursement_method 
##                                0                                0 
##                          balance                       paid_total 
##                                0                                0 
##                   paid_principal                    paid_interest 
##                                0                                0 
##                   paid_late_fees 
##                                0

#1 Visualization - Interest Rate VS Grade on term of loan

We find that as the value of Grade increases from A to G, the interest rate also increases linearly. Therefore, a linear relationship between the Grade and Interest Rate is seen, which is applicable for both terms of 36 months and 60 months. Which is in compliance with the fact that Higher-grade loans (i.e. A,B,C) indicate better credit and lower risk while lower grade loans (i.e. E, F, G) indicate the opposite, and thus lower interest rate for Higher-grade loans and higher interest rates for lower grade loans.

#2 Visualization - Purpose of Loan VS Loan Amount highlighting the status of Loan on term of loan

We find that for term of 36 months, the highest loan amounts correspond to the purposes like Renewable Energy, Moving, Major Purchases, Debt Consolidation, Home Improvement, Car and Credit Cards, all of whose loan status is currently going on. We also found that people take higher loans for all purposes for the 60 month term, and there are less number of late loan statuses for 60 month term, which shows that there are lesser instances where people are not able to pay back their loan on time for a longer loan period.

#3 Visualization - Loan Amount VS Grade highlighting Sub Grade on term of loan

Here we do not observe a linear relationship between Grade and Loan Amount, rather higher loan amounts are observed for 60 month term than the 30 month term for all the Grades (A-G). We also observe that Grade A in the 60 month term has the highest Loan Amount, and that Grade G does not have a significant Loan Amount for 36 month term.

#4 Visualization - State VS Percentage of Loans

Here we observe (in decreasing order) the state-wise trends in the form of loan percentage for loan percentage greated than 1%. We find that people in California takes the majority of loans with 13.39%.

#5 Visualization - Purpose VS Loan Amount on Type of Application

The relationship between the purpose and loan amount is observed here, and we can also see the type of application for the loan here. It is seen that a joint application type corresponds to higher loan amount, and this relationship is extremely visible for purposes like Small Businesses and House.

#6 Visualization - Loan Amount VS Annual Income highlighting Rate of Interest on Term of Loan

Here we observe the trends between Loan Amount, Annual Income and Rate of Interest over 36 month and 60 month terms. It can be clearly seen that people prefer to take loans of higher amounts for a 60 term period for lesser Rate of Interests.

## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 81 rows containing non-finite values (stat_smooth).
## Warning: Removed 81 rows containing missing values (geom_point).