---
title: "Credit Risk Analysis Presentation"
author: "Miguel"
format:
  revealjs:
    theme: sky
    transition: fade
    slide-number: true
---

1. Introduction & Objectives

Main Goal: Build a model that predicts whether a client will repay their loan or default (default_status).
Why it matters: Banks need a clear, automatic system to check risk instead of just guessing.
The Balance: We want to reject risky clients (to avoid losses) while still accepting good clients.

---

2. Economic Question

“How do annual income, housing status, and the loan amount together determine a borrower’s financial constraint and change their risk of default?”

The Problem: Looking at variables separately is not enough. We need to see the whole picture.

Resources vs Obligations: Income and housing are resources; the loan amount is the obligation.

Hypothesis: When a client has a low income, rents their home, and requests a large loan, these factors compound, so their risk of default rises much more sharply than any single factor would suggest.
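One way to make this hypothesis testable is a logistic model with interaction terms. The specification below is only an illustrative sketch, not the model estimated in this presentation; the coefficient symbols and the renter indicator $\text{rent}_i$ are assumed notation, and $p_i$ stands for the probability that default_status equals 1 for client $i$.

```latex
\[
\log\frac{p_i}{1-p_i}
  = \beta_0
  + \beta_1\,\text{income}_i
  + \beta_2\,\text{rent}_i
  + \beta_3\,\text{loan\_amount}_i
  + \beta_4\,(\text{rent}_i \times \text{loan\_amount}_i)
  + \beta_5\,(\text{income}_i \times \text{loan\_amount}_i)
\]
```

Under this sketch, the hypothesis corresponds to $\beta_4 > 0$ and $\beta_5 < 0$: a larger loan raises the log-odds of default by more for renters and for lower-income borrowers than for other clients.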

---

3. Data Description & Structure

Data Source: Financial dataset containing 32,581 historical loan applications.
Outcome Variable (Target): default_status (Factor: 1 for Default, 0 for Non-Default).
Main Socioeconomic Predictors:
- income: Borrower’s annual income (numerical).
- home_ownership: Housing status (categorical: Rent, Mortgage, Own).
- loan_amount: Total size of the requested loan (numerical).
Control Variables: interest_rate (numerical) and age (numerical).
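The variables listed above could be represented as one record per application. The Python sketch below is an assumption about how such a record might look, not the structure actually used in the analysis; the class name `LoanApplication` and the example values are hypothetical.

```python
from dataclasses import dataclass

# Categories listed on this slide.
HOME_OWNERSHIP_LEVELS = ("Rent", "Mortgage", "Own")


@dataclass(frozen=True)
class LoanApplication:
    """Hypothetical record type mirroring the variables described above."""
    income: float          # borrower's annual income
    home_ownership: str    # one of HOME_OWNERSHIP_LEVELS
    loan_amount: float     # total size of the requested loan
    interest_rate: float   # control variable
    age: int               # control variable
    default_status: int    # 1 = default, 0 = non-default

    def __post_init__(self):
        # Reject values outside the documented categories.
        if self.home_ownership not in HOME_OWNERSHIP_LEVELS:
            raise ValueError(f"unknown home_ownership: {self.home_ownership!r}")
        if self.default_status not in (0, 1):
            raise ValueError("default_status must be 0 or 1")


# Made-up example values, for illustration only.
example = LoanApplication(
    income=48_000, home_ownership="Rent", loan_amount=12_000,
    interest_rate=11.5, age=27, default_status=0,
)
print(example)
```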

---

4. Key Findings: Distributions

Target Distribution: Around 78% of the clients in our dataset are safe (0), while 22% defaulted (1).
The Imbalance Challenge: Because default is less common, the model has to work harder to detect the risky profiles.
Loan Amount Insights: The average loan size in our data is around $9,600, but it ranges from small amounts up to $35,000.
Economic Reality: Most borrowers ask for standard amounts, but the high-risk “tail” (large loans) is where the danger lies.
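To see why the imbalance matters, consider a naive rule that labels every client as safe. The sketch below uses 100 stylized clients in the 78/22 proportion reported above; the rule itself is a hypothetical benchmark, not one of the models in this presentation.

```python
# 100 stylized clients in the reported proportion: 78 safe (0), 22 defaulted (1).
labels = [0] * 78 + [1] * 22

# Naive benchmark: predict "non-default" for everyone.
predictions = [0] * len(labels)

# Accuracy: share of predictions that match the true label.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall: share of actual defaulters that the rule flags.
caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = caught / sum(labels)

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")  # accuracy=0.78, recall=0.00
```

A rule that never detects a single default still reaches about 78% accuracy, which is why accuracy alone is a weak yardstick for this dataset and why recall and precision deserve attention.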

---

5. Modeling Strategy

The Data Split: We divided our dataset into two parts using a fixed seed (465).
- Training Set (80%): 26,065 observations to build and train our models.
- Testing Set (20%): 6,516 observations to test how the models perform with new data.
Model 1 (Baseline): A simple model using only loan_amount and income.
Model 2 (Advanced): Our main model. It combines the 3 key pillars (loan_amount, income, home_ownership) plus control variables (interest_rate and age).
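The split described above can be sketched with Python's standard library. This is an illustration of a seeded 80/20 split, not the code used for the analysis: a simple random shuffle without stratification is an assumption, and reusing seed 465 here will not reproduce the exact partition from the original software.

```python
import random

N_APPLICATIONS = 32_581   # dataset size from the data description
SEED = 465                # seed reported on this slide
TRAIN_SHARE = 0.80

# Shuffle row indices reproducibly, then cut at the 80% mark.
indices = list(range(N_APPLICATIONS))
rng = random.Random(SEED)
rng.shuffle(indices)

n_train = round(TRAIN_SHARE * N_APPLICATIONS)
train_idx = indices[:n_train]
test_idx = indices[n_train:]

print(len(train_idx), len(test_idx))  # 26065 6516
```

Fixing the seed means the same rows land in the training and testing sets on every run, so both models are compared on identical data.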

---

6. Model Validation (Cross-Validation)

The Technique: We used 5-fold cross-validation on our training data.
How it works: The training data is split into 5 parts. The model trains on 4 parts and is tested on the remaining part. This repeats 5 times, so each part serves as the test part exactly once.
Why we do it: To check for overfitting and confirm that the model performs consistently across different random samples of clients.
The Result: Model 2 achieved an average accuracy of 83.1%, with stable results across all folds.
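The fold mechanics described above can be sketched as follows. The helper name `k_fold_indices` is hypothetical, and reusing seed 465 for the shuffle is an assumption; only the index bookkeeping is shown, not the model fitting.

```python
import random


def k_fold_indices(n_rows, k=5, seed=465):
    """Yield (train_idx, validation_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Slice with a stride so fold sizes differ by at most one row.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        validation_idx = folds[held_out]
        # Train on every fold except the one currently held out.
        train_idx = [row for j, fold in enumerate(folds) if j != held_out for row in fold]
        yield train_idx, validation_idx


# 26,065 is the training-set size reported in the modeling strategy.
splits = list(k_fold_indices(26_065, k=5))
for train_idx, validation_idx in splits:
    print(len(train_idx), len(validation_idx))  # 20852 5213 on each of the 5 rounds
```

Each row is used for validation exactly once and for training in the other four rounds, and the reported accuracy is the average over the five validation parts.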

---

7. Model Performance Comparison

Model 2 (Advanced): Accuracy 0.831, Precision 0.716, Recall 0.396

| Metric | Model 1 (Baseline) | Model 2 (Advanced) | Why it matters |
|---|---|---|---|
| Accuracy (Total Correct) | 78.2% | 83.1% | Model 2 makes fewer total mistakes. |
| Recall (Detecting Defaults) | 12.4% | 39.6% | Model 2 catches about 3 times as many risky clients. |
| Precision (True Alarms) | 52.1% | 71.6% | When Model 2 flags a client, it is more reliable. |
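For reference, the three metrics in the table are computed from the confusion-matrix counts as sketched below. The counts in the example are hypothetical and chosen only to make the arithmetic easy to follow; they are not the counts behind the reported results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    tp: defaulters correctly flagged      fp: good clients wrongly flagged
    fn: defaulters missed                 tn: good clients correctly accepted
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts for 500 clients, 100 of whom defaulted.
metrics = classification_metrics(tp=40, fp=10, fn=60, tn=390)
print(metrics)  # accuracy 0.86, precision 0.8, recall 0.4
```

In this made-up example the model is right 86% of the time overall, yet it finds only 40 of the 100 defaulters, which mirrors the gap between accuracy and recall in the table.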

---

8. Main Results: Error Analysis

The Alternative Rejected (Model 1): Left the bank blind to risk, with too many False Negatives (recall of only 12.4%).
The Selected Model (Model 2): Reduces the most dangerous errors compared with Model 1.
Economic Interpretation of Errors:
- False Positives (Precision = 71.6%): Rejecting a good client. About 28% of the clients Model 2 flags would actually have repaid.
- False Negatives (Recall = 39.6%): Accepting a bad client. About 60% of actual defaulters still go undetected.
The Decision: Model 2 is chosen because it offers the better trade-off of the two models, giving the bank more protection against losses from defaulted loans.

---

9. Recommendations & Limitations

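Strategic Recommendation: Lower the classification threshold to 0.3, so that more borderline clients are flagged as risky. This should raise recall, at the cost of more false alarms.
- Rationale: Losing the money from a defaulted loan is much more expensive than a false alarm.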
Study Limitations:
- Data Constraints: We only have a snapshot of historical data, not real-time financial tracking.
- Missing Variables: The dataset lacks macroeconomic indicators (like inflation or unemployment rates) that also affect a client’s risk of default.
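The threshold recommendation on this slide can be connected to error costs. The sketch below assumes the model's predicted probabilities are well calibrated and uses the standard expected-cost rule: flag a client when `p * cost_fn >= (1 - p) * cost_fp`. The cost ratio, the example probabilities, the function names, and the 0.5 comparison cutoff are all hypothetical and not taken from the analysis.

```python
def cost_minimizing_threshold(cost_false_positive, cost_false_negative):
    """Probability cutoff at which flagging becomes cheaper in expectation.

    Flagging costs (1 - p) * cost_fp on average; accepting costs p * cost_fn.
    Flagging is cheaper once p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def classify(probabilities, threshold):
    """Label a client 1 (default) when the predicted probability meets the cutoff."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical costs: a missed default is 7/3 (about 2.3) times as costly as a false alarm.
threshold = cost_minimizing_threshold(cost_false_positive=3, cost_false_negative=7)

# Hypothetical predicted default probabilities for five clients.
probs = [0.10, 0.28, 0.35, 0.45, 0.62]
print(threshold)                    # 0.3
print(classify(probs, 0.5))         # [0, 0, 0, 0, 1]
print(classify(probs, threshold))   # [0, 0, 1, 1, 1]
```

Read this way, a threshold of 0.3 is the cost-minimizing choice when a missed default costs roughly 2.3 times as much as a false alarm; a larger cost ratio would argue for an even lower cutoff.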

---

10. Final Reflections & Future Research

Proposed Improvement: Incorporate borrowers’ Debt-to-Income Ratio (DTI) and Credit Score/Past Delinquency History to capture behavioral data and improve the model’s low recall.
Future Economic Question Inspired: “How do macroeconomic shocks (inflation and rate hikes) alter the financial constraints of low-income renters compared to wealthy homeowners, and how does this asymmetry impact a bank’s default rate?”
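The Debt-to-Income Ratio proposed above is commonly defined as total monthly debt payments divided by gross monthly income. The sketch below assumes that definition; the function name, the `monthly_debt_payments` input (which the current dataset does not contain), and the example figures are hypothetical.

```python
def debt_to_income(monthly_debt_payments, annual_income):
    """Debt-to-income ratio: monthly debt payments over gross monthly income."""
    if annual_income <= 0:
        raise ValueError("annual_income must be positive")
    monthly_income = annual_income / 12
    return monthly_debt_payments / monthly_income


# Hypothetical borrower: $1,200 in monthly debt payments on a $48,000 annual income.
dti = debt_to_income(monthly_debt_payments=1_200, annual_income=48_000)
print(f"{dti:.0%}")  # 30%
```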