ECON 465 – Stage 3 Presentation
What factors predict the loan amount a borrower requests?
We tested whether income and interest rate alone are sufficient — or whether loan purpose and a broader borrower profile add independent predictive power.
The original distribution is right-skewed: most borrowers request small loans, while a few request very large amounts. After a log transformation, the distribution is approximately normal. This suggests that loan amounts are roughly log-normally distributed, a pattern common for strictly positive financial quantities. The skew also matters for modeling, because a linear model fit to raw loan amounts is sensitive to the few very large loans, which produce large residuals and weigh heavily in squared-error measures such as RMSE.
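One quick numerical check of the log transform is to compare skewness before and after it. The sketch below is a minimal Python illustration on simulated log-normal data. The parameters (9.0, 0.7) and the sample size are arbitrary assumptions, not estimates from our dataset. It uses the moment-based skewness m3 / m2^1.5.

```python
import math
import random

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (0 for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Simulated, hypothetical loan amounts drawn from a log-normal distribution.
# These are NOT the project's data; mu=9.0 and sigma=0.7 are illustration values.
rng = random.Random(465)
loans = [rng.lognormvariate(9.0, 0.7) for _ in range(5000)]
log_loans = [math.log(x) for x in loans]

skew_raw = skewness(loans)
skew_log = skewness(log_loans)
print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```

A strongly positive value before the transform and a value near zero after it is the numerical counterpart of the two histograms. On real data the same check would be run on the actual loan amounts.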
This boxplot shows clear, systematic differences in loan amount across loan intent categories. Home improvement and debt consolidation loans tend to be larger than personal or medical loans. The comparison does not control for income or any other borrower characteristic, so it is descriptive. The pattern is still economically meaningful, because borrowers with different purposes have different financing needs. It directly motivates including loan intent as a key predictor in our models.
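The centre line of each box is the group median. The comparison the boxplot makes can therefore be sketched by computing the median loan amount per intent category. The records below are made-up numbers for illustration. Only the category labels echo the loan intent values, and their exact spellings are an assumption.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (intent, loan_amount) records; amounts are invented for illustration.
records = [
    ("HOMEIMPROVEMENT", 12000), ("HOMEIMPROVEMENT", 15000), ("HOMEIMPROVEMENT", 9000),
    ("DEBTCONSOLIDATION", 10000), ("DEBTCONSOLIDATION", 14000), ("DEBTCONSOLIDATION", 8000),
    ("PERSONAL", 5000), ("PERSONAL", 7000), ("PERSONAL", 6000),
    ("MEDICAL", 4000), ("MEDICAL", 6500), ("MEDICAL", 5500),
]

def medians_by_group(rows):
    """Return {group: median amount}, i.e. the centre line of each box in a boxplot."""
    groups = defaultdict(list)
    for group, amount in rows:
        groups[group].append(amount)
    return {g: median(v) for g, v in groups.items()}

med = medians_by_group(records)
# Print groups from largest to smallest median.
for intent, m in sorted(med.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{intent:<18} {m:>8,.0f}")
```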
We built two linear regression models trained on 80% of the data and tested on the remaining 20%.
| | Model 1 | Model 2 |
|---|---|---|
| Predictors | Income, Interest Rate | Model 1 predictors + Age, Employment Length, Intent, Grade, Home Ownership, Loan % Income, Credit History |
| Goal | Baseline: are basic financial indicators enough? | Extended: does a richer borrower profile improve predictions? |
Model 1 tests whether standard financial metrics alone are sufficient. Model 2 tests whether a broader borrower profile adds genuine predictive power. The random train/test split uses set.seed(465), so the results are reproducible.
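Fixing the seed is what makes the 80/20 split repeatable. Below is a minimal Python sketch of the same idea. train_test_split here is a hypothetical helper written for this illustration, not a library call. Python's generator will also not reproduce the exact split that set.seed(465) gives in R.

```python
import random

def train_test_split(rows, train_frac=0.8, seed=465):
    """Shuffle a copy of rows with a fixed seed, then cut it into train/test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed -> same shuffle every run
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-in for 100 borrower records
train, test = train_test_split(rows)
print(len(train), len(test))  # 80 20
```

Using a local random.Random(seed) instead of the global generator keeps the split unaffected by any other random calls in the script.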
| Model | Test RMSE (USD) | Test R² |
|---|---|---|
| Model 1: Simple (2 vars) | 6603.73 | 0.0409 |
| Model 2: Extended (9 vars) ✓ | 5822.53 | 0.3468 |
Model 2 achieves substantially lower RMSE and much higher R² on the test set. RMSE falls by about $781 (from $6,604 to $5,823) and R² rises from 0.04 to 0.35. RMSE is the typical prediction error in USD. It is the square root of the mean squared error, so large misses weigh more, and lower is better. R² is the share of the variation in loan amounts that the model explains, and higher is better. Loan percent income and loan grade were the variables that added the most predictive power. A richer borrower profile therefore leads to meaningfully better predictions, and Model 2 is selected.
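Both metrics take only a few lines to compute. The sketch below applies the textbook definitions to a tiny made-up example: RMSE is the square root of the mean squared error, and R² = 1 - SSE/SST. Some tools instead report R² as the squared correlation between predictions and actual values, which can give a different number on a test set.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared miss (same units as y)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """1 - SSE/SST: share of the variance in actual values that the predictions explain."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1 - sse / sst

# Small made-up example (USD), not the project's test set.
actual = [5000, 10000, 15000, 20000]
predicted = [6000, 9000, 16000, 19000]
print(rmse(actual, predicted))                 # 1000.0
print(round(r_squared(actual, predicted), 3))  # 0.968
```

With this definition, test-set R² can fall below zero when a model predicts worse than simply using the mean.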
| Variable | Estimate (USD) | Meaning |
|---|---|---|
| person_income | 0.059 | Higher income → larger loan (about $59 per extra $1,000 of income) |
| loan_int_rate | 169.848 | Higher rate → larger loan |
| person_emp_length | 78.842 | Longer employment (more stable) → larger loan |
| loan_intentHOMEIMPROVEMENT | 494.338 | Home improvement loans larger than the baseline intent |
| loan_gradeC | -653.336 | Grade C loans smaller than the baseline grade |
| loan_gradeF | 1465.729 | Grade F loans larger than the baseline grade |
| loan_gradeG | 2668.278 | Grade G loans larger than the baseline grade |
| loan_percent_income | 43031.403 | Loan is a larger share of income → larger loan (partly mechanical) |
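The table shows selected Model 2 coefficients. Each coefficient is the change in the predicted loan amount, in USD, for a one-unit increase in that predictor, holding the other predictors constant. For the intent and grade indicators, it is the difference from the baseline category. Income is positive as expected: higher earners request larger loans. The positive interest rate coefficient is consistent with adverse selection, where riskier borrowers face higher rates and also tend to request larger loans. The regression alone, however, cannot establish that mechanism. Loan intent and loan grade remain associated with loan size after controlling for the other predictors. This suggests that borrower purpose and lender risk assessment each carry independent information about loan demand. The large loan percent income coefficient needs care. This variable is the loan amount expressed as a share of income, so part of its relationship with loan amount is mechanical rather than behavioral.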
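The sketch below makes "holding everything else constant" concrete. It multiplies selected coefficients from the table by hypothetical changes in each predictor and sums the results. It assumes loan_int_rate is recorded in percent, so that 2 units are 2 percentage points. It also assumes loan_percent_income is a proportion, so that 0.05 means 5% of income. If the data use other units, the changes must be rescaled.

```python
# Selected Model 2 coefficients copied from the table (USD per one-unit change).
coef = {
    "person_income": 0.059,
    "loan_int_rate": 169.848,
    "loan_percent_income": 43031.403,
}

def predicted_change(changes):
    """Change in predicted loan amount for the given predictor changes, others held constant."""
    return sum(coef[name] * delta for name, delta in changes.items())

# Hypothetical scenarios (units are assumptions, see lead-in).
print(round(predicted_change({"person_income": 10_000}), 2))      # 590.0
print(round(predicted_change({"loan_int_rate": 2}), 2))           # 339.7
print(round(predicted_change({"loan_percent_income": 0.05}), 2))  # 2151.57
```

Because the model is linear, the effects of simultaneous changes simply add up.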
| Metric | CV Mean (5-fold) | Test Set | Difference (Test minus CV) |
|---|---|---|---|
| RMSE (USD) | 3953.7578 | 5822.5308 | 1868.7729 |
| R² | 0.6115 | 0.3468 | -0.2647 |
We performed 5-fold cross-validation on Model 2 to check whether the results generalize to new data. Cross-validated performance is noticeably better than test-set performance. The CV mean RMSE is about $1,870 lower, and the CV mean R² is about 0.26 higher (0.61 versus 0.35). Performance was consistent across the five folds, so the CV result does not depend on any particular subset of the training data. However, a gap this large means the CV estimate alone would likely overstate accuracy on new borrowers. The held-out test-set figures (RMSE $5,823, R² 0.35) are the more conservative guide to how the model would perform on entirely new borrower data.
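The mechanics of k-fold cross-validation can be sketched without any libraries. Shuffle the row indices, deal them into five folds, fit on four folds, score RMSE on the fifth, and average the five scores. For brevity, this minimal Python version fits a one-predictor linear model to simulated data, and the data-generating numbers are arbitrary assumptions. Model 2 itself has nine predictors.

```python
import math
import random

def k_fold_indices(n, k=5, seed=465):
    """Shuffle 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_simple_ols(xs, ys):
    """Closed-form intercept and slope for y = a + b*x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def cross_val_rmse(xs, ys, k=5, seed=465):
    """Fit on k-1 folds, score RMSE on the held-out fold; return the per-fold RMSEs."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held = set(fold)
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_simple_ols([xs[i] for i in train], [ys[i] for i in train])
        sse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
        scores.append(math.sqrt(sse / len(fold)))
    return scores

# Simulated data: y = 2000 + 0.1*x + noise (sd 500). Not the project's data.
rng = random.Random(0)
xs = [rng.uniform(20_000, 120_000) for _ in range(200)]
ys = [2000 + 0.1 * x + rng.gauss(0, 500) for x in xs]

fold_rmse = cross_val_rmse(xs, ys)
print([round(s) for s in fold_rmse])           # per-fold RMSE
print(round(sum(fold_rmse) / len(fold_rmse)))  # CV mean RMSE
```

The CV table above compares the average of the fold RMSEs with the RMSE on a separately held-out test set.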
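The answer to our research question is that it is not just income. On the test set, income and interest rate alone explain only about 4% of the variation in loan amounts, while the extended borrower profile explains about 35%.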
What is missing from our analysis:
If we had more time or better data, we would include credit score as a predictor and explore interaction terms between income and loan intent to test whether the income-loan size relationship differs across borrowing purposes.
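An interaction between income and loan intent lets the income slope differ by borrowing purpose. Consider the simplest case: a model with only income, intent, and their interaction. There, the fitted income slope for each intent category equals the slope from a separate regression within that category, so the idea can be sketched by fitting one slope per group. The data below are hypothetical and built so that the slopes differ. With Model 2's other predictors included, the interaction would instead have to be added to the full regression, for example with an R formula term such as person_income * loan_intent.

```python
from collections import defaultdict

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def income_slope_by_intent(rows):
    """rows: (intent, income, loan_amount). Returns {intent: income slope within that intent}."""
    xs, ys = defaultdict(list), defaultdict(list)
    for intent, income, amount in rows:
        xs[intent].append(income)
        ys[intent].append(amount)
    return {g: ols_slope(xs[g], ys[g]) for g in xs}

# Hypothetical data: loan size rises with income faster for home improvement loans.
incomes = (20_000, 40_000, 60_000, 80_000)
rows = [("PERSONAL", inc, 1000 + 0.05 * inc) for inc in incomes]
rows += [("HOMEIMPROVEMENT", inc, 2000 + 0.15 * inc) for inc in incomes]

slopes = income_slope_by_intent(rows)
print({g: round(s, 3) for g, s in slopes.items()})
# {'PERSONAL': 0.05, 'HOMEIMPROVEMENT': 0.15}
```

In an interaction model with PERSONAL as the reference category, the difference between the two slopes (0.10 here) is what the income-by-intent interaction coefficient would estimate.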
| # | Finding |
|---|---|
| 01 | Loan amount predicted using 9 variables including loan grade and loan percent income |
| 02 | Extended profile substantially outperforms the income and interest rate model |
| 03 | Loan percent income, loan grade, and loan intent are among the strongest predictors |
| 04 | CV results stable across 5 folds, but better than test-set results (R² 0.61 vs. 0.35) |
The key insight is that loan purpose and the broader borrower profile each carry predictive information beyond income. Lenders that ignore them may systematically mismatch loan offers to borrower needs. Future question: does income predict loan size differently across intent categories?
Thank you — we welcome your questions.