What Predicts Loan Amount?

ECON 465 – Stage 3 Presentation

Efe Şahin - Ömer Faruk Yılmaz

Economic Question

What factors predict the loan amount a borrower requests?

We tested whether income and interest rate alone are sufficient — or whether loan purpose and a broader borrower profile add independent predictive power.

  • Dataset: Credit Risk Dataset (Kaggle, 32,581 observations)
  • Target variable: loan amount in USD — continuous
  • Predictors: income, interest rate, age, employment length, loan intent, loan grade, home ownership, loan percent income, credit history length

Probability Analysis

The original distribution is right-skewed: most borrowers request small loans, while a few request very large amounts. After a log transformation, the distribution is approximately normal. This is consistent with a log-normal distribution, a common shape for strictly positive financial quantities. The shape of the distribution informs our understanding of borrower behavior and guides our modeling choices.
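The effect of the log transformation can be checked numerically by comparing skewness before and after. The sketch below is only an illustration: it uses synthetic log-normal draws rather than the Kaggle data, and the `skewness` helper and the parameters 9.0 and 0.6 are assumptions chosen for the example. Python's seeded generator also does not reproduce R's `set.seed(465)` stream.

```python
import math
import random


def skewness(values):
    """Population skewness: the mean cubed deviation divided by variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / variance ** 1.5


rng = random.Random(465)
# Synthetic stand-in for loan amounts (NOT the Kaggle data): log-normal draws.
loan_amounts = [rng.lognormvariate(9.0, 0.6) for _ in range(5000)]
log_amounts = [math.log(a) for a in loan_amounts]

raw_skew = skewness(loan_amounts)
log_skew = skewness(log_amounts)
print(f"skewness raw: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

A clearly positive skewness on the raw scale and a value near zero on the log scale is what a log-normal variable would show. A near-zero skewness alone does not prove normality, so a histogram or Q-Q plot of the logged values remains a useful complement.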

Loan Amount by Intent

This boxplot shows clear and systematic differences across loan intent categories. Home improvement and debt consolidation loans are consistently larger than personal or medical loans, even before controlling for income. This pattern is economically meaningful — borrowers with different purposes have different financing needs. It directly motivates including loan intent as a key predictor in our models.
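The comparison a boxplot makes can be reduced to three numbers per group: the first quartile, the median, and the third quartile. The sketch below computes them for two intent categories. The loan amounts are invented toy values, not figures from the dataset, and `box_summary` is a hypothetical helper name. Python's default quartile method may also differ slightly from the hinges an R boxplot draws.

```python
import statistics

# Toy loan amounts in USD, invented for illustration only (not the Kaggle data).
loans_by_intent = {
    "HOMEIMPROVEMENT": [8000, 10000, 12000, 15000, 20000],
    "MEDICAL": [3000, 5000, 6000, 8000, 12000],
}


def box_summary(values):
    """Return (Q1, median, Q3), the three values a boxplot's box is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return q1, median, q3


summaries = {intent: box_summary(v) for intent, v in loans_by_intent.items()}
for intent, (q1, median, q3) in summaries.items():
    print(f"{intent}: Q1={q1:.0f}, median={median:.0f}, Q3={q3:.0f}")
```

Comparing the medians and the overlap of the Q1 to Q3 ranges across categories gives a numeric version of what the plot shows visually.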

Two Models

We built two linear regression models trained on 80% of the data and tested on the remaining 20%.

           | Model 1 | Model 2
Predictors | Income, Interest Rate | Model 1 predictors + Age, Employment, Intent, Grade, Home Ownership, Loan % Income, Credit History
Goal       | Baseline: are basic financial indicators enough? | Extended: does a richer borrower profile improve predictions?

Model 1 tests whether standard financial metrics alone are sufficient. Model 2 tests whether a broader borrower profile adds genuine predictive power. Both models are run with set.seed(465) so that the random steps, such as the train/test split, are reproducible.
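A seeded 80/20 split can be sketched as follows. This is a Python illustration of the idea, not the code behind the slides: `train_test_split` is a hypothetical helper, the rows are placeholders, and seeding Python with 465 does not reproduce the split that R's `set.seed(465)` yields. The row count of 32,581 is the figure quoted for the dataset, and the real count may be smaller if rows with missing values were dropped.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=465):
    """Shuffle row indices with a fixed seed, then split into train and test."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


rows = list(range(32581))  # placeholder rows, one per observation
train, test = train_test_split(rows)
print(len(train), len(test))
```

Fitting both models on the same `train` rows and scoring them on the same `test` rows is what makes their RMSE and R-squared directly comparable.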

Model Comparison

Model | RMSE | R2
Model 1: Simple (2 vars) | 6603.73 | 0.0409
Model 2: Extended (9 vars) ✓ | 5822.53 | 0.3468

Model 2 achieves a lower RMSE (5822.53 vs. 6603.73, about 12% lower) and a much higher R-squared (0.3468 vs. 0.0409) on the test set. RMSE measures the typical size of prediction errors in USD; lower is better. R-squared measures the share of variation in loan amounts the model explains; higher is better. Loan percent income and loan grade were the variables that added the most predictive power. A richer borrower profile therefore leads to meaningfully better predictions, although Model 2 still leaves roughly 65% of the variation unexplained. Model 2 is selected.
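For reference, the two metrics can be computed as in the sketch below. It assumes R-squared is defined as 1 minus the ratio of residual to total sum of squares; the slides do not say which definition was used, and some tools report the squared correlation instead. The four actual and predicted values are toy numbers, while the two RMSE figures in the last step come from the comparison table.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target (USD here)."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Share of variance explained: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Toy values for illustration only.
actual = [5000.0, 12000.0, 8000.0, 20000.0]
predicted = [6000.0, 10000.0, 9000.0, 17000.0]
toy_rmse = rmse(actual, predicted)
toy_r2 = r_squared(actual, predicted)

# Relative RMSE reduction from Model 1 to Model 2, using the table's values.
rmse_reduction = (6603.73 - 5822.53) / 6603.73
print(f"toy RMSE={toy_rmse:.2f}, toy R2={toy_r2:.4f}, reduction={rmse_reduction:.1%}")
```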

Key Coefficients

Variable | Estimate | Meaning
person_income | 0.059 | Higher income → larger loan
loan_int_rate | 169.848 | Higher rate → larger loan
person_emp_length | 78.842 | Longer employment (more stable) → larger loan
loan_intentHOMEIMPROVEMENT | 494.338 | Home improvement → larger loan than the reference intent
loan_gradeC | -653.336 | Grade C → smaller loan than the reference grade
loan_gradeF | 1465.729 | Grade F → larger loan than the reference grade
loan_gradeG | 2668.278 | Grade G → larger loan than the reference grade
loan_percent_income | 43031.403 | Higher debt burden → larger loan

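Each coefficient shows how much the loan amount changes in USD for a one-unit increase in that predictor, holding everything else constant. Coefficients on categorical variables (loan intent, loan grade) are differences relative to the omitted reference category. Income is positive as expected: higher earners request larger loans. The positive interest rate coefficient is consistent with adverse selection, in which riskier borrowers face higher rates and tend to request larger loans, but a regression of this kind cannot establish that mechanism. The loan intent and loan grade coefficients suggest that borrower purpose and lender risk assessment each relate to loan size after controlling for the other predictors.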
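Because the predictors are on very different scales, it helps to translate each coefficient into a realistic change. The sketch below does that arithmetic with estimates copied from the coefficient table. It assumes that `loan_int_rate` is measured in percentage points and that `loan_percent_income` is a fraction between 0 and 1; the slides do not state the units, and `effect` is a hypothetical helper name.

```python
# Estimates copied from the Key Coefficients table (USD per one-unit change).
coefficients = {
    "person_income": 0.059,
    "loan_int_rate": 169.848,
    "person_emp_length": 78.842,
    "loan_percent_income": 43031.403,
}


def effect(variable, delta):
    """Predicted change in loan amount (USD) when `variable` changes by `delta`."""
    return coefficients[variable] * delta


income_effect = effect("person_income", 10_000)      # 10,000 USD more income
rate_effect = effect("loan_int_rate", 1)             # one more unit of interest rate
share_effect = effect("loan_percent_income", 0.01)   # share rises by 0.01 (assumed scale)

print(f"{income_effect:.2f} {rate_effect:.2f} {share_effect:.2f}")
```

Under these assumptions the large `loan_percent_income` estimate is less dramatic than it looks: a full one-unit change would mean moving from a loan worth 0% of income to one worth 100% of income, so a 0.01 change is the more natural unit to report.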

Cross-Validation

Metric | CV_Mean | Test_Set | Difference (Test_Set - CV_Mean)
RMSE | 3953.7578 | 5822.5308 | 1868.7729
R2 | 0.6115 | 0.3468 | -0.2647

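We performed 5-fold cross-validation on Model 2 to check whether the results generalize to new data. The CV and test-set results are not close: the test-set RMSE is about 1,869 USD higher than the CV mean (5822.53 vs. 3953.76), and the test-set R-squared is about 0.26 lower (0.3468 vs. 0.6115). A gap of this size does not support a claim that the model is stable. It should be traced before drawing conclusions, for example by checking that the CV and test-set evaluations use the same preprocessing and the same metric definitions. Only the CV mean is reported here, so consistency across the five folds cannot be judged from this table. Until the gap is explained, the test-set figures are the more cautious estimate of performance on new borrower data.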
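A 5-fold procedure that reports every fold's score, not only the mean, makes the spread across folds visible. The sketch below is a minimal illustration under stated assumptions: it fits a one-predictor least-squares line on synthetic data rather than the nine-variable Model 2, and `fit_line`, `rmse`, and `k_fold_rmse` are hypothetical helper names, not functions from the original analysis.

```python
import math
import random


def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def k_fold_rmse(xs, ys, k=5, seed=465):
    """Return one held-out RMSE per fold."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predictions = [intercept + slope * xs[i] for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], predictions))
    return scores


rng = random.Random(465)
# Synthetic data for illustration: y = 3 + 2x plus unit-variance noise.
xs = [rng.uniform(0, 10) for _ in range(500)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 1.0) for x in xs]

fold_scores = k_fold_rmse(xs, ys)
cv_mean = sum(fold_scores) / len(fold_scores)
print([round(s, 3) for s in fold_scores], round(cv_mean, 3))
```

Reporting the per-fold values alongside the mean, and computing the test-set metric with the same `rmse` function, is one way to check that a CV versus test gap is not caused by differing metric definitions.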

Economic Interpretation

The answer to our research question is that it is not just income.

  • Loan intent independently predicts loan size — home improvement and debt consolidation borrowers request larger amounts even at the same income level, reflecting genuine differences in financing needs
  • Loan percent income and loan grade add substantial predictive signal beyond basic financial indicators, confirming that lenders’ own risk assessments and debt burden matter
  • A lender relying only on income may consistently offer the wrong loan size to borrowers with specific purposes
  • Incorporating loan purpose and borrower profile leads to more accurate, better-matched lending decisions and more efficient credit allocation

Limitations

What is missing from our analysis:

  • No credit score variable: credit score is one of the most important real-world determinants of loan terms, and its absence limits predictive power and may introduce omitted variable bias
  • Cross-sectional data — the dataset covers a single point in time and cannot capture how borrowing behavior changes over the economic cycle; during recessions, borrowers may shift toward smaller, necessity-driven loans

If we had more time or better data, we would include credit score as a predictor and explore interaction terms between income and loan intent to test whether the income-loan size relationship differs across borrowing purposes.
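The proposed interaction terms would enter the model as extra columns: one intent dummy and one income-times-dummy product for each non-reference intent. The sketch below builds such a row. Only `HOMEIMPROVEMENT` appears as a level name in the coefficient table; the other level spellings, the choice of reference category, and the `design_row` helper are assumptions made for illustration.

```python
# Illustrative subset of intent levels; spellings other than HOMEIMPROVEMENT are assumed.
INTENT_LEVELS = ["DEBTCONSOLIDATION", "HOMEIMPROVEMENT", "MEDICAL", "PERSONAL"]
REFERENCE = "DEBTCONSOLIDATION"  # assumed reference category


def design_row(income, intent, levels=INTENT_LEVELS, reference=REFERENCE):
    """Build one regression row with intent dummies and income x intent terms."""
    row = {"intercept": 1.0, "income": float(income)}
    for level in levels:
        if level == reference:
            continue  # the reference level is absorbed by the intercept
        dummy = 1.0 if intent == level else 0.0
        row[f"intent_{level}"] = dummy
        row[f"income_x_{level}"] = float(income) * dummy
    return row


home_row = design_row(50_000, "HOMEIMPROVEMENT")
reference_row = design_row(50_000, "DEBTCONSOLIDATION")
print(home_row)
```

In a model fitted on rows like these, the coefficient on `income` is the income slope for the reference intent, and each `income_x_...` coefficient measures how much that slope differs for the corresponding intent.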

Conclusion

Finding
01 Loan amount predicted using 9 variables including loan grade and loan percent income
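02 Extended profile substantially outperforms the two-variable baseline (income and interest rate)
03 Loan percent income and loan grade add the most predictive power; loan intent also contributes
04 5-fold CV mean and test-set results differ (RMSE 3953.76 vs. 5822.53; R2 0.6115 vs. 0.3468), so generalization needs further checking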

The key insight is that loan purpose and borrower profile matter independently — lenders that ignore them may systematically mismatch loan offers. Future question: Does income predict loan size differently across intent categories?

Thank you — we welcome your questions.