Insights

This dashboard communicates the ABZ business Loan Default Analysis for loans issued from 2007 to 2013:

  • The Metadata Info pane shows the data structure, before and after cleaning the data set, it shows visualization of missing values, data distribution and major statistics to look at before deciding on how to further manipulate/ analyse the data set. A clean data function was written to handle the data cleaning and other feature engineering.

  • The Loan Overview pane shows various information into current loan status and general loan status which is formulated to contain just 2 factors of Default and non default. Finding shows that:

    • 13,158 loans are in outstanding with 7,260 defaulting or overdue loans and 5,898 maintaining good standing as normal loans. 36,841 were already fully paid. The pie charts displays more info on the distribution.
    • Distribution of Payment Plan among all loan status category showed no difference whereas their verification status are distinguished in the chart.
    • The Distribution of Loan Amounts and interest rate by Current Loan Status was plotted as an histogram. The Loan amount distribution showed a slightly right skewed distribution, showing lots of lower amounts and some higher amount in default. The interest rate distribution shows us an almost normal distribution and more rate in the middle defaulting.
  • The Numeric Factors Correlation pane focused on numeric factors visual engaging Default/ loan overdue and more statistic. Finding shows that:

    • 36,149,044 loans volume principal are in outstanding with 2,054,452 overdue principal amount. 36 months shows the loan term of most defaulter.
    • The Portfolio at Risk report 6% volume for overdue >= 1 month and 55% customer counts for loan overdue >= 1 month which is a pointer that high number of clients are in overdue.
    • Correlation of Money and otherVariables with Default was displyed on the heat map to show their correlation with each other and the point biseral to show correlation with default, blue reflects very high correlation.
  • The Categorical Factors Correlation pane focused on categorical factors visual engaging Default/ loan overdue. Finding shows that:

    • The crosstabs or stack bar charts shows the proportion of loans in overdue compared with normal loans and shows pattern in some of the factors sub-category distribution.
    • Crosstabs that has shown no difference is payment plan, the visual is a continum from Loan overview where Y payment plan was absent, this is because of its insufficiency in the dataset, and hence cannot be seen as a contributor to default.
    • Crosstabs that has shown patterns is the grade and subgrade factor, showing as grade rises so does default.
    • Others that may show some effect on default are Loan purpose, house ownership, employment length, verisfication status and loan term
    • While 100% overdue doesnt implies high impact, because sufficiency for those attribute may be low, hence the need for statistical test to determine their true impact
  • The Statistical pane gives more information on the variables that have shown some influence on default. Finding shows that:

    • The Chisquare test measures the difference between the observed and expected frequencies of a categorical variable. Larger values indicate a greater discrepancy between the observed and expected distributions. The Logistic regression analysis was used to determine the best factors of high influence on default.
    • The Tukey HSD test results indicate that several numerical variables show significant differences between “Overdue” and “Normal” loans. This includes total accounts, open accounts, loan amount, outstanding principal, annual income, debt-to-income ratio, and revolving balance. The logistic regression will be used to emphasize what factor has high influence on default.
    • The logistic regression results indicate that several numerical variables, such as loan amount, outstanding principal, interest rate, annual income, debt-to-income ratio, revolving balance, and revolving utilization, significantly impact the likelihood of loan default.
    • Also, Higher loan grades and some sub-grades significantly increase the odds of default.Renting a home and certain loan terms or months of issuance are associated with higher or lower default odds. Verification status affects the likelihood of default, with verified loans being less risky.
  • The Default Prediction pane focused on trend and geographical visual engaging Default/ loan overdue. Finding shows that:

    • The line chart showing default trend by loan issued date date and last payment date reveals that as loan were issued accross the years, default rises and visible straight line of probably a disbursement stop between 2011 and 2013. Conversely default lowers with last payment date while repayments continued till 2016, this is an expected nuance.
    • The forcast line plot of default count by their issued date has an Mean Absolute Error MAE of 14.92 this means that the typical prediction error (the average amount by which the forecasted default counts deviate from the actual counts) is about 14.92 defaults. While the RMSE i.e. Root Mean Squared Error indicates that the standard deviation of the prediction errors (residuals) is about 18.39 defaults .
    • The presence of an RMSE higher than the MAE but still relatively close in value suggests that while the model is generally accurate, there are instances where it can significantly deviate.
    • For decision-making or strategic planning regarding loan defaults, understanding the magnitude of these errors helps in assessing the level of confidence to be placed in the forecasts. An MAE and RMSE in the range provided might be acceptable depending on the total volume of defaults and the operational thresholds for risk management.
    • The geographical Distribution of Loan Default by state shows the proportion of default counts from different regions in the USA. The plot reviews that Nebraska with 55% has more than half of the defaulter settled there. Operations needs to find out if there was a recent economic issue or events that might have contributed to them defaulting their loans.

Metadata Info

Row

Raw Data Metadata Info

Cleaned Data Metadata Info

Loan Overview

Row

Number of Current Outstanding Loans

13,158

Number of Current Overdue Loans

7,260

Number of Normal Outstanding Loans

5,898

Number of Fully Paid Loans

36,841

Row

Distribution of Customers by Current Loan Status

Distribution of Customers by General Loan Status

Row

Distribution of Payment Plans by Current Loan Status

Distribution of Verification by Current Loan Status

Row

Distribution of Loan Amounts by Current Loan Status

Distribution of Interest Rate by Current Loan Status

Numeric Factors Correlation

Row

Outstanding Principal Amount

36,149,044

Overdue Principal Amount

2,054,452

Not Overdue Principal Amount

34,094,592

Row

Portfolio at Risk of Outstanding Principal Volume >= 1 Month

Portfolio at Risk of Number of Outstanding Loans >= 1 Month

Row

Correlation of Money Variables

Correlation of Money Variables with Default

Row

Correlation of Other Numeric variables

Correlation of Other Numeric with Default

Categorical Factors Correlation

Row

Proportion of Loan Status by PaymentPlan

Proportion of Loan Status by Loan Purpose

Row

Proportion of Loan Status by Loan Grade

Proportion of Loan Status by Loan Sub-Grade

Row

Proportion of Loan Status by Home Ownership

Proportion of Loan Status by Home Ownership

Row

Proportion of Loan Status by Employment Length

Proportion of Loan Status by Term

Statistical Tests

Row

Chi-Squared Test with Categorical Variables

ChiSquare Model Coefficients
Variable Chi_squared_statistic p_value df
X-squared…1 pymnt_plan 0.0000 1 1
X-squared…2 grade 250.1126 0 6
X-squared…3 sub_grade 310.6010 0 34
X-squared…4 emp_length 218.1491 0 11
X-squared…5 home_ownership 207.8676 0 4
X-squared…6 verification_status 313.0192 0 1
X-squared…7 term 226.7492 0 1
X-squared…8 issue_d_month 3329.5434 0 11
X-squared…9 purpose 691.0797 0 13
X-squared…10 issue_d_quarter 1942.7961 0 3

Logistic Regression for Categorical Variables

Logistic Regression Model Coefficients
term estimate std.error statistic p.value Significant
(Intercept) 1.6527058 0.3499907 4.7221420 0.0000023 Yes
pymnt_plany 14.4606042 1455.3975412 0.0099358 0.9920725 No
gradeB 0.7124852 0.2857476 2.4934074 0.0126524 Yes
gradeC 0.9955667 0.2946349 3.3789842 0.0007275 Yes
gradeD 1.3629641 0.2988294 4.5610113 0.0000051 Yes
gradeE 1.4295385 0.3183308 4.4907331 0.0000071 Yes
gradeF 2.6006351 0.4164791 6.2443350 0.0000000 Yes
gradeG 2.4942394 0.7772359 3.2091148 0.0013314 Yes
sub_gradeA2 0.3828415 0.3348361 1.1433698 0.2528851 No
sub_gradeA3 0.3274452 0.3167883 1.0336403 0.3013043 No
sub_gradeA4 0.3030784 0.2999851 1.0103115 0.3123461 No
sub_gradeA5 0.2643993 0.2933961 0.9011683 0.3674989 No
sub_gradeB1 -0.5951774 0.1459064 -4.0791725 0.0000452 Yes
sub_gradeB2 -0.3879491 0.1313180 -2.9542722 0.0031341 Yes
sub_gradeB3 -0.3789249 0.1172080 -3.2329262 0.0012253 Yes
sub_gradeB4 0.0250711 0.1210175 0.2071695 0.8358775 No
sub_gradeB5 NA NA NA NA NA
sub_gradeC1 -0.4419554 0.1385400 -3.1900934 0.0014223 Yes
sub_gradeC2 -0.2232151 0.1394812 -1.6003236 0.1095268 No
sub_gradeC3 0.0186992 0.1450954 0.1288754 0.8974562 No
sub_gradeC4 0.0981383 0.1533159 0.6401054 0.5221041 No
sub_gradeC5 NA NA NA NA NA
sub_gradeD1 0.1816620 0.1800501 1.0089523 0.3129975 No
sub_gradeD2 -0.1536750 0.1547450 -0.9930853 0.3206684 No
sub_gradeD3 -0.0719586 0.1577842 -0.4560569 0.6483491 No
sub_gradeD4 -0.0611497 0.1632772 -0.3745148 0.7080213 No
sub_gradeD5 NA NA NA NA NA
sub_gradeE1 0.0852105 0.2031977 0.4193477 0.6749620 No
sub_gradeE2 0.0300125 0.2077623 0.1444561 0.8851403 No
sub_gradeE3 -0.1859306 0.2157402 -0.8618265 0.3887830 No
sub_gradeE4 0.5787950 0.2294200 2.5228622 0.0116404 Yes
sub_gradeE5 NA NA NA NA NA
sub_gradeF1 -1.2404505 0.3546911 -3.4972699 0.0004700 Yes
sub_gradeF2 -0.7764628 0.3886273 -1.9979624 0.0457207 Yes
sub_gradeF3 -1.0875014 0.3853991 -2.8217537 0.0047762 Yes
sub_gradeF4 -1.1513175 0.3885268 -2.9632897 0.0030437 Yes
sub_gradeF5 NA NA NA NA NA
sub_gradeG1 -0.8577060 0.7907521 -1.0846711 0.2780673 No
sub_gradeG2 -0.5485866 0.8198077 -0.6691649 0.5033903 No
sub_gradeG3 0.3270608 0.8794660 0.3718857 0.7099780 No
sub_gradeG4 0.3225451 1.0034121 0.3214483 0.7478707 No
sub_gradeG5 NA NA NA NA NA
emp_length< 1 year 0.2323862 0.1270072 1.8297092 0.0672934 No
emp_length1 year -0.0251927 0.1338577 -0.1882052 0.8507158 No
emp_length10+ years -0.1392011 0.1116754 -1.2464790 0.2125886 No
emp_length2 years 0.0220134 0.1263624 0.1742085 0.8617016 No
emp_length3 years 0.0345993 0.1283296 0.2696126 0.7874583 No
emp_length4 years -0.0082924 0.1341254 -0.0618256 0.9507017 No
emp_length5 years 0.0182283 0.1306379 0.1395329 0.8890290 No
emp_length6 years -0.1503224 0.1385617 -1.0848768 0.2779762 No
emp_length7 years -0.0691292 0.1421295 -0.4863817 0.6266965 No
emp_length8 years -0.1312769 0.1491507 -0.8801632 0.3787709 No
emp_length9 years -0.1802490 0.1557792 -1.1570802 0.2472396 No
home_ownershipNONE 15.3871139 1455.3975557 0.0105724 0.9915646 No
home_ownershipOTHER 13.3728978 252.0283816 0.0530611 0.9576832 No
home_ownershipOWN 0.0341118 0.0817470 0.4172849 0.6764700 No
home_ownershipRENT 0.2460572 0.0480739 5.1183108 0.0000003 Yes
verification_statusVerified -0.2062906 0.0504321 -4.0904602 0.0000431 Yes
term 60 months -1.6436077 0.0572637 -28.7024116 0.0000000 Yes
issue_d_month2 -0.7534895 0.1943474 -3.8770246 0.0001057 Yes
issue_d_month3 -0.7896859 0.1844267 -4.2818409 0.0000185 Yes
issue_d_month4 -0.7413206 0.1837779 -4.0337849 0.0000549 Yes
issue_d_month5 -0.7959124 0.1748894 -4.5509459 0.0000053 Yes
issue_d_month6 -0.1981516 0.1838645 -1.0777046 0.2811656 No
issue_d_month7 -0.4641927 0.1802142 -2.5757826 0.0100014 Yes
issue_d_month8 -0.5415381 0.1793230 -3.0199025 0.0025286 Yes
issue_d_month9 -0.3199730 0.1766392 -1.8114499 0.0700712 No
issue_d_month10 -0.4581047 0.1745120 -2.6250610 0.0086633 Yes
issue_d_month11 -0.4759382 0.1743484 -2.7298105 0.0063371 Yes
issue_d_month12 -3.1376292 0.1559773 -20.1159328 0.0000000 Yes
purposecredit_card -0.0287635 0.1446732 -0.1988170 0.8424059 No
purposedebt_consolidation 0.1924760 0.1369301 1.4056514 0.1598277 No
purposeeducational 13.7260310 148.1673699 0.0926387 0.9261906 No
purposehome_improvement 0.1686611 0.1584450 1.0644768 0.2871128 No
purposehouse 0.4597431 0.2826579 1.6265002 0.1038433 No
purposemajor_purchase 0.1564864 0.1786961 0.8757128 0.3811862 No
purposemedical 0.1244118 0.2140970 0.5810999 0.5611731 No
purposemoving 0.8203359 0.2786686 2.9437685 0.0032424 Yes
purposeother 0.3010665 0.1526723 1.9719786 0.0486120 Yes
purposerenewable_energy 0.9189327 0.5994389 1.5329881 0.1252788 No
purposesmall_business 0.6752288 0.1682018 4.0143980 0.0000596 Yes
purposevacation 0.4531766 0.2987755 1.5167794 0.1293224 No
purposewedding 0.0947866 0.2438763 0.3886669 0.6975226 No
issue_d_quarterQ2 NA NA NA NA NA
issue_d_quarterQ3 NA NA NA NA NA
issue_d_quarterQ4 NA NA NA NA NA

Row

Anova Of Numerical Variables Impact on Default

Tukey HSD Test Results
Comparison diff lwr upr p adj Variable
Overdue-Normal -2.619879e+00 -3.018276e+00 -2.2214831 0.0000000 total_acc
Overdue-Normal -1.419187e+00 -1.582460e+00 -1.2559145 0.0000000 open_acc
Overdue-Normal -2.265000e-03 -3.579300e-03 -0.0009508 0.0007317 acc_now_delinq
Overdue-Normal 6.270124e-01 5.732695e-01 0.6807553 0.0000000 inq_last_6mths
Overdue-Normal -3.107890e+03 -3.394621e+03 -2821.1594815 0.0000000 loan_amnt
Overdue-Normal -5.495482e+03 -5.625112e+03 -5365.8516433 0.0000000 out_prncp
Overdue-Normal -1.279968e-01 -2.623189e-01 0.0063253 0.0618065 int_rate
Overdue-Normal -1.142136e+04 -1.309231e+04 -9750.4080317 0.0000000 annual_inc
Overdue-Normal -2.041774e+00 -2.287704e+00 -1.7958444 0.0000000 dti
Overdue-Normal -1.256104e+03 -2.052050e+03 -460.1578888 0.0019832 revol_bal
Overdue-Normal -1.639959e-01 -1.055646e+00 0.7276539 0.7184656 revol_util
Note:
Tukey HSD results for numerical variables impacting default status.

Logistic Regression On Numerical Variables Impact

Logistic Regression Model Coefficients
term estimate std.error statistic p.value Significant
(Intercept) 1.0174557 0.1289445 7.8906478 0.0000000 Yes
loan_amnt 0.0000736 0.0000045 16.4315970 0.0000000 Yes
out_prncp -0.0010657 0.0000204 -52.3335657 0.0000000 Yes
int_rate -0.0307713 0.0081177 -3.7906518 0.0001503 Yes
annual_inc -0.0000032 0.0000007 -4.4663762 0.0000080 Yes
dti -0.0090563 0.0043186 -2.0970384 0.0359902 Yes
revol_bal 0.0000042 0.0000020 2.1301550 0.0331588 Yes
revol_util 0.0084652 0.0011602 7.2965356 0.0000000 Yes
total_acc -0.0040641 0.0034812 -1.1674484 0.2430293 No
open_acc -0.0010083 0.0083322 -0.1210132 0.9036806 No
acc_now_delinq -1.0496846 1.1554438 -0.9084688 0.3636306 No
inq_last_6mths 0.3051685 0.0236848 12.8845781 0.0000000 Yes

Default Prediction

Column

Forecast of Default Loans Counts by Loan Issue Dates

MAE: 14.92 RMSE: 18.39

Geographic Distribution of Loan Defaults by State