# Core stack
import numpy as np
import pandas as pd
# Modeling
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
# Metrics & diagnostics
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import statsmodels.api as sm
Credit Risk
FZ2024 Financial Modeling and Programming
0.1 Before you begin: important instructions for all Workshops
Welcome to our workshop series! Please read these instructions carefully before starting any activity. Following these guidelines will make your work smoother and ensure that your submissions are graded without issues.
0.1.1 Working environment
We will use Google Colab for all workshops. Colab runs Python in the cloud — you don’t need to install anything locally.
- Access Colab at: https://colab.research.google.com/
- Sign in with your institutional Google account for access to all features.
- Always save a copy of the notebook to your Google Drive:
- Go to File → Save a copy in Drive.
0.1.2 Loading data
You may work with datasets provided by the instructor or public datasets online. You will receive instructions each time to load the data with Python code. However, it is a good idea to store files, like data or your own notes, in a dedicated Google Drive folder:
- Create a folder in your Google Drive named fz2024_workshops (or similar).
- Upload your datasets there.
0.1.3 Output and submission format
- After completing the workshop, export your notebook as PDF:
- In Colab: File → Print → Save as PDF.
- Submit the PDF file through Canvas, as well as the .ipynb file.
- Include all outputs, tables, and graphs in your PDF; make sure you run all cells before exporting.
- Name your PDF file using the following format:
Lastname_Firstname_WorkshopX.pdf
0.1.4 Deadlines
All assignments must be uploaded to Canvas before the stated deadline. Late submissions are not accepted. Once you have read and understood these instructions, you are ready to begin the workshop!
1 Overview
In this workshop you will build credit risk models to estimate the Probability of Default (PD) using Logistic Regression, Linear Discriminant Analysis (LDA), Decision Tree Classifier, and Ridge Classifier. We work with a LendingClub sample and follow a simple Internal Ratings-Based (IRB)-style flow:
- Define the dependent variable
- Prepare features
- Split data
- Train models
- Evaluate with Accuracy and Recall, and
- Extract PD from the best model.
Learning goals: understand credit risk and PD, translate Basel IRB intuition into a modeling workflow, train/compare 4 classifiers, evaluate with Accuracy & Recall, and produce PD estimates and a new-applicant prediction.
2 Credit Risk & Basel in a nutshell
Credit risk: the risk that a borrower will not meet their obligations. More broadly, it is the probability that the counterparty in any agreement will not fulfill its commitment. Basel expected loss (EL) is the product of three components:
\text{EL} = \text{PD} \times \text{LGD} \times \text{EAD}.
- Probability of Default (PD): likelihood of default within a horizon (often 12 months).
- Loss Given Default (LGD): fraction of exposure lost when default occurs (after recoveries).
- Exposure at Default (EAD): amount ($) outstanding at the moment of default.
Takeaway: PD is the probability lever; LGD is the severity lever; EAD is the exposure lever. Changing any lever changes EL — and capital, pricing, and strategy follow.
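As a minimal numeric sketch of the EL identity, the snippet below multiplies the three levers for a single loan. The inputs (a 2% PD, a 45% LGD, and a $10,000 EAD) are illustrative assumptions, not values from the workshop data, and `expected_loss` is a hypothetical helper name.

```python
# Illustrative inputs only; none of these values come from the LendingClub sample.
pd_12m = 0.02     # probability of default over a 12-month horizon
lgd = 0.45        # fraction of exposure lost if default occurs
ead = 10_000.0    # exposure at default, in dollars


def expected_loss(pd_, lgd_, ead_):
    """Expected loss as the product PD x LGD x EAD."""
    return pd_ * lgd_ * ead_


el = expected_loss(pd_12m, lgd, ead)
print(f"EL = {el:.2f}")

# Doubling PD doubles EL when LGD and EAD are held fixed.
el_stressed = expected_loss(2 * pd_12m, lgd, ead)
print(f"Stressed EL = {el_stressed:.2f}")
```

Because EL is a simple product, a proportional change in any one lever produces the same proportional change in EL.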
2.1 Basel Accords: why PD modeling matters
Banks don’t model default just for sport; they do it because capital, provisioning, pricing, and strategy depend on it. Basel’s Expected Loss (EL) identity is the “bridge” between risk measurement and business decisions.
Basel I → II → III (and now IV in some jurisdictions):
- Basel I (1988): first global minimum capital rules; simple risk weights, mostly about credit risk.
- Basel II (2004): three pillars (Pillar 1 capital, Pillar 2 supervisory review, Pillar 3 market discipline), plus Operational Risk; crucially, allows IRB so banks can model PD, LGD, EAD.
- Basel III (2010+): post-GFC tightening — higher quality capital, leverage ratio, liquidity, and buffers (capital conservation, countercyclical).
How we study credit risk in practice for retail/SME lending:
- Standardized Approach (SA) uses regulator-set risk weights — easy, conservative, less sensitive to borrower risk.
- IRB Approach estimates PD/LGD/EAD internally — more risk-sensitive but requires model governance: data lineage, validation, backtesting, stability, and conservatism.
We focus on PD — mapping borrower features X to \Pr(\text{Default}=1 \mid X).
Even if you’re not building regulatory capital models, a good PD model informs loan approvals, risk-based pricing, limits, and collections strategy. Later, PDs roll up into portfolio EL, stress testing, and capital planning.
A PD model is a probability map from borrower features X to the chance of default. With logistic regression:
\Pr(\text{Default}=1\mid X)=\frac{1}{1+e^{-(\beta_0+X\beta)}}.
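The logistic formula above can be sketched directly in NumPy. The coefficients and the two borrowers below are hypothetical values chosen only to show how a linear score becomes a probability, and `pd_from_logit` is an illustrative helper, not part of any library.

```python
import numpy as np


def pd_from_logit(X, beta0, beta):
    """Map the linear score beta0 + X @ beta to a probability with the logistic function."""
    score = beta0 + X @ beta
    return 1.0 / (1.0 + np.exp(-score))


# Hypothetical coefficients and two hypothetical borrowers (two features each).
beta0 = -2.0
beta = np.array([0.8, -0.5])
X = np.array([[0.0, 0.0],
              [3.0, 1.0]])

pd_hat = pd_from_logit(X, beta0, beta)
print(pd_hat)
```

Whatever the size of the score, the output stays strictly between 0 and 1, which is what allows it to be read as a PD.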
3 Setup
We’ll use libraries available in Google Colab and standard Python environments. If you’re in Colab, no special installs should be required.
3.1 Core Python libraries for Financial Modeling and Machine Learning
In this workshop, we use a small ecosystem of Python libraries that together form the foundation of most data science and machine learning projects.
Each one has a specific role in the workflow:
| Library | Import Name | Main Purpose |
|---|---|---|
| NumPy | import numpy as np | Provides fast mathematical operations, array manipulation, and linear algebra tools. It is the numerical backbone of all modern Python data libraries. |
| pandas | import pandas as pd | Used for data loading, cleaning, and manipulation. It introduces the DataFrame, a spreadsheet-like structure ideal for tabular data. |
| scikit-learn | from sklearn import ... | The main machine learning library in Python. It includes algorithms for classification, regression, and clustering, as well as tools for preprocessing, model validation, and performance metrics. |
| matplotlib | import matplotlib.pyplot as plt | The standard library for data visualization. We use it to create simple charts, histograms, and diagnostic plots. |
| statsmodels | import statsmodels.api as sm | Focused on statistical modeling. It complements scikit-learn by providing detailed regression summaries, significance tests, and econometric-style analysis. |
Together, these libraries let us:
- Load and explore data (pandas, NumPy),
- Prepare and transform variables (scikit-learn preprocessors),
- Build and validate models (scikit-learn, statsmodels),
- Visualize results (matplotlib).
Think of them as a pipeline: NumPy handles raw numbers → pandas organizes data → scikit-learn learns patterns → statsmodels interprets results → matplotlib visualizes findings.
3.2 Data Workflow Summary
Before any modeling, it’s essential to understand the full data workflow — from loading the dataset to connecting preprocessing with the model. Each step below ensures that the data entering the model is structured, scaled, and encoded correctly. This guarantees consistency, reproducibility, and fair comparison among different algorithms.
| Step | Tool / Function | Purpose |
|---|---|---|
| Load data | pd.read_csv(url_or_path) | Imports the dataset from a file or URL into a pandas DataFrame. |
| Inspect data | .head(), .info(), .describe() | Quickly review the first rows, data types, and summary statistics to detect anomalies or missing values. |
| Split dataset | train_test_split(X, y, test_size=..., random_state=#) | Divides data into training and testing sets with reproducible results. Pass stratify=y as well if both sets must keep the same class distribution. |
Together, these steps form the foundation of any supervised learning project. They standardize how data is handled before reaching the model, making experiments replicable and minimizing manual mistakes.
Quick reminders
- Load data: pd.read_csv(url_or_path); inspect with .head(), .info(), .describe().
- Split: train_test_split(X, y, test_size=..., random_state=#).
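The split step can be sketched on synthetic data as follows. The DataFrame, its column names, and the 15% default share are made-up stand-ins rather than the workshop file. The `stratify=y` argument is shown because it is what makes scikit-learn preserve the class shares in both subsets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a credit dataset: 1,000 rows, 15% "Charged Off".
rng = np.random.default_rng(0)
X = pd.DataFrame({"income": rng.normal(70_000, 20_000, 1_000),
                  "dti": rng.uniform(0, 40, 1_000)})
y = pd.Series(["Charged Off"] * 150 + ["Fully Paid"] * 850, name="Default")

# stratify=y keeps the class shares (almost) identical in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=43, stratify=y
)

print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))
```

Without `stratify`, the shares in the two subsets match only approximately, which matters more when the minority class is small.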
4 Data: LendingClub Sample
We use a cleaned sample hosted online; the variable descriptions appear below. The original data set has at least 2 million observations and 150 variables. The workshop file (“credit.xlsx”; loaded below as credit.csv) contains only 873 observations (rows) and 71 columns: the target Default plus 70 predictors. Each row represents a LendingClub client. The data have already been cleaned (missing values, correlated variables, and zero- and near-zero-variance predictors).
The goal is to predict whether the loan will default (Charged Off) or be fully repaid (Fully Paid).
| Variable | Type | Meaning / Description | Interpretation in Credit Risk |
|---|---|---|---|
| Default | Categorical (target) | Loan outcome: "Fully Paid" (0) or "Charged Off" (1). | Dependent variable: whether the client defaulted. |
| term | Numeric | Length of the loan in years (1 = short-term, 2 = long-term). | Longer terms increase uncertainty and usually risk. |
| installment | Numeric | Monthly payment amount due on the loan. | Higher installments may strain borrower capacity. |
| grade | Categorical (A–G mapped to numeric) | Internal credit grade assigned by LendingClub. | Proxy for borrower credit quality (lower = riskier). |
| emp_length | Numeric | Years of employment at current job. | Longer employment often signals stability and lower risk. |
| home_ownership | Categorical | Borrower’s housing status (e.g., rent, mortgage, own). | Homeowners may be more stable; renters slightly riskier. |
| annual_inc | Numeric | Annual income of the borrower in U.S. dollars. | Higher income improves repayment capacity. |
| verification_status | Categorical | Indicates whether the borrower’s income was verified. | Verified income reduces uncertainty about true earnings. |
| purpose | Categorical | Purpose of the loan (e.g., debt consolidation, car, credit card). | Some purposes (like debt consolidation) historically riskier. |
| num_il_tl | Numeric | Number of installment accounts the borrower has. | Many existing loans may signal leverage. |
| num_rev_accts | Numeric | Number of revolving (credit card) accounts. | Too many revolving accounts can raise risk. |
| percent_bc_gt_75 | Numeric (percentage) | % of bankcard accounts where balance > 75% of limit. | High utilization indicates potential over-indebtedness. |
| pub_rec_bankruptcies | Numeric (count) | Number of public bankruptcy records. | Any record increases default risk significantly. |
| total_bc_limit | Numeric | Total bankcard credit limit available. | Higher limits may show creditworthiness or exposure risk. |
These features capture capacity, character, and credit behavior — the core “3 Cs” of credit analysis. Models use them jointly to estimate the Probability of Default (PD) for new applicants.
url = "https://raw.githubusercontent.com/abernal30/ml_book/main/credit.csv"
credit = pd.read_csv(url)
credit.info()
credit.head()
credit.describe()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 873 entries, 0 to 872
Data columns (total 71 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Default 873 non-null object
1 term 873 non-null int64
2 installment 873 non-null float64
3 grade 873 non-null int64
4 emp_title 873 non-null float64
5 emp_length 873 non-null int64
6 home_ownership 873 non-null int64
7 annual_inc 873 non-null float64
8 verification_status 873 non-null int64
9 purpose 873 non-null int64
10 title 873 non-null int64
11 zip_code 873 non-null int64
12 addr_state 873 non-null int64
13 dti 873 non-null float64
14 delinq_2yrs 873 non-null int64
15 earliest_cr_line 873 non-null int64
16 fico_range_high 873 non-null int64
17 inq_last_6mths 873 non-null int64
18 pub_rec 873 non-null int64
19 revol_bal 873 non-null int64
20 revol_util 873 non-null float64
21 total_acc 873 non-null int64
22 total_rec_int 873 non-null float64
23 recoveries 873 non-null float64
24 last_pymnt_d 873 non-null int64
25 last_pymnt_amnt 873 non-null float64
26 last_credit_pull_d 873 non-null int64
27 last_fico_range_high 873 non-null int64
28 last_fico_range_low 873 non-null int64
29 tot_coll_amt 873 non-null int64
30 tot_cur_bal 873 non-null int64
31 open_acc_6m 873 non-null int64
32 open_act_il 873 non-null int64
33 open_il_12m 873 non-null int64
34 open_il_24m 873 non-null int64
35 mths_since_rcnt_il 873 non-null int64
36 total_bal_il 873 non-null int64
37 il_util 873 non-null int64
38 open_rv_12m 873 non-null int64
39 open_rv_24m 873 non-null int64
40 max_bal_bc 873 non-null int64
41 all_util 873 non-null int64
42 total_rev_hi_lim 873 non-null int64
43 inq_fi 873 non-null int64
44 total_cu_tl 873 non-null int64
45 inq_last_12m 873 non-null int64
46 acc_open_past_24mths 873 non-null int64
47 avg_cur_bal 873 non-null int64
48 bc_open_to_buy 873 non-null int64
49 bc_util 873 non-null float64
50 mo_sin_old_il_acct 873 non-null float64
51 mo_sin_old_rev_tl_op 873 non-null int64
52 mo_sin_rcnt_rev_tl_op 873 non-null int64
53 mo_sin_rcnt_tl 873 non-null int64
54 mort_acc 873 non-null int64
55 mths_since_recent_bc 873 non-null int64
56 mths_since_recent_inq 873 non-null int64
57 num_accts_ever_120_pd 873 non-null int64
58 num_actv_bc_tl 873 non-null int64
59 num_bc_sats 873 non-null int64
60 num_bc_tl 873 non-null int64
61 num_il_tl 873 non-null int64
62 num_op_rev_tl 873 non-null int64
63 num_rev_accts 873 non-null int64
64 num_rev_tl_bal_gt_0 873 non-null int64
65 num_sats 873 non-null int64
66 num_tl_op_past_12m 873 non-null int64
67 pct_tl_nvr_dlq 873 non-null float64
68 percent_bc_gt_75 873 non-null float64
69 pub_rec_bankruptcies 873 non-null int64
70 total_bc_limit 873 non-null int64
dtypes: float64(12), int64(58), object(1)
memory usage: 484.4+ KB
| | term | installment | grade | emp_title | emp_length | home_ownership | annual_inc | verification_status | purpose | title | ... | num_il_tl | num_op_rev_tl | num_rev_accts | num_rev_tl_bal_gt_0 | num_sats | num_tl_op_past_12m | pct_tl_nvr_dlq | percent_bc_gt_75 | pub_rec_bankruptcies | total_bc_limit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 873.000000 | 873.000000 | 873.000000 | 873.000000 | 873.000000 | 873.000000 | 873.000000 | 873.000000 | 873.000000 | 873.000000 | ... | 873.000000 | 873.000000 | 873.000000 | 873.000000 | 873.000000 | 873.000000 | 873.000000 | 873.000000 | 873.000000 | 873.000000 |
| mean | 1.219931 | 447.911707 | 2.538373 | 334.707331 | 4.298969 | 1.792669 | 79421.277010 | 1.768614 | 3.306987 | 4.223368 | ... | 8.798396 | 8.749141 | 15.305842 | 5.840779 | 12.406644 | 2.463918 | 94.175716 | 39.036197 | 0.135166 | 24827.071019 |
| std | 0.414437 | 258.247786 | 1.271135 | 181.498130 | 2.513489 | 0.915885 | 42055.600733 | 0.779070 | 1.814736 | 1.667774 | ... | 7.385868 | 4.635333 | 8.239209 | 3.217439 | 5.543059 | 1.944503 | 9.026038 | 35.194738 | 0.355253 | 24568.268359 |
| min | 1.000000 | 32.970000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 13000.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 0.000000 | 1.000000 | 2.000000 | 0.000000 | 2.000000 | 0.000000 | 39.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 1.000000 | 256.050000 | 2.000000 | 191.000000 | 3.000000 | 1.000000 | 50000.000000 | 1.000000 | 2.000000 | 3.000000 | ... | 4.000000 | 5.000000 | 10.000000 | 4.000000 | 9.000000 | 1.000000 | 91.700000 | 0.000000 | 0.000000 | 8800.000000 |
| 50% | 1.000000 | 391.620000 | 2.000000 | 335.500000 | 3.000000 | 1.000000 | 70000.000000 | 2.000000 | 3.000000 | 4.000000 | ... | 7.000000 | 8.000000 | 14.000000 | 5.000000 | 11.000000 | 2.000000 | 97.900000 | 33.300000 | 0.000000 | 17500.000000 |
| 75% | 1.000000 | 612.890000 | 3.000000 | 487.000000 | 6.000000 | 3.000000 | 100000.000000 | 2.000000 | 3.000000 | 4.000000 | ... | 11.000000 | 11.000000 | 19.000000 | 7.000000 | 15.000000 | 3.000000 | 100.000000 | 66.700000 | 0.000000 | 33200.000000 |
| max | 2.000000 | 1252.560000 | 7.000000 | 648.000000 | 11.000000 | 3.000000 | 450000.000000 | 3.000000 | 11.000000 | 11.000000 | ... | 59.000000 | 30.000000 | 80.000000 | 28.000000 | 46.000000 | 12.000000 | 100.000000 | 100.000000 | 2.000000 | 281300.000000 |
8 rows × 70 columns
The target Default has two labels: Charged Off (default) and Fully Paid (no default).
credit['Default'].value_counts(dropna=False)
Default
Fully Paid 728
Charged Off 145
Name: count, dtype: int64
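The class counts above imply a default rate and a naive accuracy benchmark. The short calculation below uses only the two counts shown in the output and gives a useful reference point before reading any model's accuracy.

```python
# Class counts copied from the value_counts() output above.
n_fully_paid = 728
n_charged_off = 145
n_total = n_fully_paid + n_charged_off

default_rate = n_charged_off / n_total
# Accuracy of a naive rule that labels every loan "Fully Paid".
baseline_accuracy = n_fully_paid / n_total

print(f"Default rate:      {default_rate:.1%}")
print(f"Baseline accuracy: {baseline_accuracy:.1%}")
```

Any model worth keeping should beat this baseline, and it should also catch a meaningful share of the Charged Off loans.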
Define dependent and independent variables
y = credit["Default"]  # select the Default variable (the target)
X = credit.drop(columns=["Default"])  # drop the dependent variable, Default, to keep only the predictors
5 Train / Test Split
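Use an 80/20 split with the assignment seed (random_state=43 in the code below). The call below does not pass stratify=y, so the default rate is only approximately preserved across the two subsets; add stratify=y if you want the split stratified by the target. The results reported in this section come from the call as written.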
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=43
)
6 Estimate models
import warnings
warnings.filterwarnings('ignore')
model = LogisticRegression()
model.fit(X_train,y_train)
LogisticRegression()
Parameters
| Parameter | Value |
|---|---|
| penalty | 'l2' |
| dual | False |
| tol | 0.0001 |
| C | 1.0 |
| fit_intercept | True |
| intercept_scaling | 1 |
| class_weight | None |
| random_state | None |
| solver | 'lbfgs' |
| max_iter | 100 |
| multi_class | 'deprecated' |
| verbose | 0 |
| warm_start | False |
| n_jobs | None |
| l1_ratio | None |
For now, we will not focus on causal analysis; that is, we are not asking what effect the independent variables have on the dependent variable (that will come later). Instead, we focus on the predictive power of the models, so we need to generate predictions.
predict_logit=model.predict(X_test)
predict_logit[:5]
array(['Fully Paid', 'Fully Paid', 'Charged Off', 'Fully Paid',
       'Charged Off'], dtype=object)
Now, to measure the performance of our prediction we will use the accuracy measure (see more below).
accuracy_score(y_test, predict_logit)
0.9314285714285714
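This means that in 93.14% of the test cases the algorithm predicted the same outcome as the observed data. A higher number is better, but read it against the class balance: 728 of the 873 loans (about 83.4%) are Fully Paid, so a rule that always predicts “Fully Paid” would already be right for roughly 83% of the sample.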
6.1 LinearDiscriminantAnalysis
We will now estimate the predictions with a different model. For explanations on this and the other models involved in the exercise, please read the next section.
# Define the model
model_LDA=LDA()
# Follow the same steps as before
model_LDA.fit(X_train,y_train)
predict_LDA=model_LDA.predict(X_test)
accuracy_score(y_test, predict_LDA)
0.9428571428571428
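This model achieved a higher accuracy (94.29%) than the Logit model (93.14%). With a test set of 175 loans (20% of 873, rounded up), the gap corresponds to two additional correct predictions: 165 versus 163.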
6.2 Probability of Default
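To estimate the probability of default for each observation in the test set (data that was not used to train the model), we run the following code. The columns of predict_proba follow the order of model_LDA.classes_, which scikit-learn sorts alphabetically, so column 0 is the probability of Charged Off (the PD) and column 1 is the probability of Fully Paid.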
pd.DataFrame(model_LDA.predict_proba(X_test)).head()
| | 0 | 1 |
|---|---|---|
| 0 | 0.000036 | 0.999964 |
| 1 | 0.999554 | 0.000446 |
| 2 | 0.489148 | 0.510852 |
| 3 | 0.000002 | 0.999998 |
| 4 | 0.999953 | 0.000047 |
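To make the probability table self-describing, one option is to name its columns with the fitted model's `classes_` attribute and then pick the default column by label. The sketch below does this on synthetic data (two made-up features, not the LendingClub sample); the variable names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic two-feature data: defaulters are shifted relative to non-defaulters.
rng = np.random.default_rng(1)
X_good = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_bad = rng.normal(loc=2.0, scale=1.0, size=(50, 2))
X = np.vstack([X_good, X_bad])
y = np.array(["Fully Paid"] * 200 + ["Charged Off"] * 50)

lda = LinearDiscriminantAnalysis().fit(X, y)

# Columns of predict_proba follow lda.classes_, which holds the sorted labels.
proba = pd.DataFrame(lda.predict_proba(X), columns=lda.classes_)
pd_hat = proba["Charged Off"]  # the PD is the "Charged Off" column

print(list(lda.classes_))
print(proba.head())
```

Selecting the column by label rather than by position avoids mistaking the probability of repayment for the PD.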
7 Model Families
7.1 Logistic Regression (Logit)
What it is:
A classic statistical model that predicts the probability that a borrower will default.
It models the log-odds of default as a linear combination of borrower characteristics:
\text{logit}(p) = \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
How it works:
It finds the line (or surface) that best separates “good” and “bad” borrowers, then uses the logistic (sigmoid) function to turn that into probabilities between 0 and 1.
Why it’s good:
- Produces probabilities, not just yes/no labels.
- Coefficients can be interpreted as odds ratios, great for explainability.
- Transparent, simple, and widely accepted under Basel frameworks.
Why it’s popular:
- The standard for credit scoring.
- Easy to implement and explain.
- Performs well even with limited data.
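The odds-ratio reading of the coefficients can be sketched as follows. The data come from scikit-learn's `make_classification`, so the features are synthetic and carry no credit meaning. They are standardized first, which is an assumption of this sketch and makes each odds ratio refer to a one-standard-deviation change.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for borrower features (not the LendingClub sample).
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)
X = StandardScaler().fit_transform(X)

logit = LogisticRegression().fit(X, y)

# exp(beta_j) is the multiplicative change in the odds of class 1
# for a one-unit (here: one standard deviation) increase in feature j.
odds_ratios = np.exp(logit.coef_[0])
print(odds_ratios)

# Check: predict_proba equals the sigmoid of the linear score.
score = logit.intercept_[0] + X @ logit.coef_[0]
manual_p = 1.0 / (1.0 + np.exp(-score))
print(np.allclose(manual_p, logit.predict_proba(X)[:, 1]))
```

An odds ratio above 1 means the feature raises the odds of class 1; below 1 means it lowers them.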
7.2 Linear Discriminant Analysis (LDA)
What it is:
A method that finds the line (or plane) that best separates two groups (defaulters vs non-defaulters).
How it works:
It assumes each group follows a normal distribution and that both groups share the same covariance matrix.
Then it computes a linear boundary that maximizes the separation between the two groups.
Why it’s good:
- Fast to compute.
- Works well if features are roughly continuous and normally distributed.
- A useful benchmark model.
Why it’s less used now:
- Assumes data are Gaussian and continuous.
- Doesn’t handle categorical variables or non-linear effects easily.
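The shared-covariance assumption and the linear boundary can be illustrated on simulated data. In this sketch the two groups are drawn from Gaussians with the same covariance matrix (chosen arbitrarily). `store_covariance=True` asks scikit-learn to keep its pooled covariance estimate so it can be inspected.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two synthetic Gaussian groups that share one covariance matrix (LDA's assumption).
rng = np.random.default_rng(42)
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X0 = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=300)  # non-defaulters
X1 = rng.multivariate_normal(mean=[2.0, 1.0], cov=cov, size=300)  # defaulters
X = np.vstack([X0, X1])
y = np.array([0] * 300 + [1] * 300)

lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)

print("Class means:\n", lda.means_)
print("Pooled covariance estimate:\n", lda.covariance_)

# The boundary is linear: the score is X @ coef + intercept, and its sign gives the class.
score = X @ lda.coef_[0] + lda.intercept_[0]
manual_pred = (score > 0).astype(int)
print("Matches lda.predict:", np.array_equal(manual_pred, lda.predict(X)))
```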
7.3 Decision Tree Classifier
What it is:
A model that makes predictions by asking a sequence of if/else questions, such as:
“Is income < 30,000?” → “Does the borrower own a house?” → “Is the loan > 10,000?”
How it works:
It recursively splits the data into smaller and smaller groups to make each leaf node as “pure” as possible — meaning mostly all good or all bad borrowers.
Why it’s good:
- Handles both numeric and categorical data.
- Captures non-linear relationships naturally.
- Easy to visualize and explain to managers and regulators.
Why it’s tricky:
- Can overfit if not pruned (too deep or too many leaves).
- Needs regularization to generalize well.
Why it’s popular:
- Intuitive and visual.
- Foundation for advanced ensemble models (Random Forest, XGBoost).
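A small tree can be fitted and printed as readable rules with scikit-learn. The sketch below uses synthetic data, and the feature names are hypothetical labels attached only for display. `max_depth` and `min_samples_leaf` are the regularization controls referred to above; their values here are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data; the feature names are hypothetical labels.
X, y = make_classification(n_samples=600, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
feature_names = ["annual_inc", "installment", "dti"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=43, stratify=y
)

# max_depth and min_samples_leaf limit tree growth to reduce overfitting.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20, random_state=43)
tree.fit(X_train, y_train)

# Print the learned if/else rules as text.
print(export_text(tree, feature_names=feature_names))
print("Test accuracy:", tree.score(X_test, y_test))
```

Removing the depth limit typically raises training accuracy while test accuracy stalls or falls, which is the overfitting pattern described above.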
7.4 Ridge Classifier
What it is:
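A linear model related to Logistic Regression that includes a penalty discouraging large coefficients; this is called L2 regularization. In scikit-learn, RidgeClassifier fits a ridge (penalized least-squares) regression to the class labels coded as -1 and +1, whereas LogisticRegression maximizes a log-likelihood (and, as the parameter list in Section 6 shows, also applies an L2 penalty by default).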
How it works:
It still tries to draw a linear boundary but shrinks less important coefficients toward zero to reduce overfitting and noise sensitivity.
Why it’s good:
- Handles many correlated features well (useful after one-hot encoding).
- More stable than plain logistic regression.
Why it’s limited:
- Doesn’t directly output probabilities (needs calibration).
- Slightly harder to interpret.
Why it’s popular:
- Fast, robust, and good when you have many features or little data.
- Often used as a strong baseline in machine learning competitions.
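The effect of the L2 penalty, and the lack of direct probabilities, can be seen in a short sketch. The data are synthetic and the two `alpha` values are arbitrary, chosen only to contrast weak and strong shrinkage.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier

# Synthetic data with correlated (redundant) features.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_redundant=2, random_state=0)

# alpha sets the strength of the L2 penalty (larger alpha = more shrinkage).
weak = RidgeClassifier(alpha=0.01).fit(X, y)
strong = RidgeClassifier(alpha=1000.0).fit(X, y)

print("Coefficient norm, alpha=0.01:", np.linalg.norm(weak.coef_))
print("Coefficient norm, alpha=1000:", np.linalg.norm(strong.coef_))

# RidgeClassifier exposes signed scores, not probabilities.
scores = strong.decision_function(X)
print("Has predict_proba:", hasattr(strong, "predict_proba"))
```

Turning these scores into PDs requires a separate calibration step, so a model with `predict_proba` is the more direct choice for PD extraction.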
7.5 Why these models matter for PD estimation
In credit risk, we don’t just care about overall accuracy — we care about the type of error:
| Error Type | Meaning | Cost |
|---|---|---|
| False Negative | Predict “good” but borrower defaults | 💸 Very costly |
| False Positive | Predict “bad” but borrower pays | ❌ Lost opportunity |
Therefore, banks often prioritize Recall for class 1 (defaults): it’s better to be conservative and reject a few safe borrowers than to approve one who defaults.
7.6 Model Evaluation Metrics: Accuracy, Recall & Precision
Once a model predicts who will default and who will not, we must evaluate how good those predictions are. In credit risk, we usually care about both overall performance and the type of mistakes the model makes.
7.6.1 The Confusion Matrix
A confusion matrix summarizes model predictions versus actual outcomes:
| Predicted: No Default (0) | Predicted: Default (1) | |
|---|---|---|
| Actual: No Default (0) | ✅ True Negative (TN) – borrower pays | ❌ False Positive (FP) – model wrongly predicts default |
| Actual: Default (1) | ❌ False Negative (FN) – borrower defaults but model missed it | ✅ True Positive (TP) – borrower defaults and model predicted it |
We can think of it as a simple 2×2 grid of model outcomes.
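The 2×2 grid can be computed with scikit-learn's `confusion_matrix`. The ten labels below are a made-up toy example (1 = default, 0 = no default). Passing `labels=[0, 1]` fixes the row and column order to match the table above.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical): 1 = default, 0 = no default.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# With labels=[0, 1], rows are actual classes and columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

With string targets such as "Fully Paid" and "Charged Off", the same call works if `labels` lists the strings in the desired order.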
7.6.2 Accuracy
Accuracy measures the percentage of total predictions that the model got right.
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
It answers: “Out of all cases, how many did we classify correctly?”
✅ Good for: general model performance
⚠️ Limitation: misleading with imbalanced data (e.g., if only 5% of loans default, a model that always predicts “no default” gets 95% accuracy but is useless).
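The 5% example above can be reproduced in a few lines. The portfolio is hypothetical: 1,000 loans with 50 defaults, scored by a "model" that always predicts no default.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical portfolio: 1,000 loans, 5% of which default (label 1).
y_true = np.array([1] * 50 + [0] * 950)

# A "model" that always predicts no default.
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred, pos_label=1)
print(f"Accuracy: {acc:.2%}")   # share of correct predictions
print(f"Recall:   {rec:.2%}")   # share of defaulters caught
```

The accuracy looks strong while recall for defaults is zero, which is why accuracy alone is not enough for PD models.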
7.6.3 Recall (Sensitivity or True Positive Rate)
Recall focuses on the default (1) class and measures how many actual defaulters we correctly identified.
\text{Recall} = \frac{TP}{TP + FN}
It answers: “Of all borrowers who actually defaulted, how many did we catch?”
✅ Good for: credit risk and fraud detection, where missing a defaulter is costly.
⚠️ Limitation: increasing Recall can reduce Precision — catching more defaults may mean more false alarms.
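When the target is stored as strings, as in this workshop, scikit-learn needs to be told which label is the positive class. The sketch below uses eight made-up labels and the `pos_label` argument of `recall_score`.

```python
from sklearn.metrics import recall_score

# Toy string labels in the same style as the workshop's target (values are made up).
y_true = ["Charged Off", "Charged Off", "Charged Off", "Fully Paid", "Fully Paid",
          "Fully Paid", "Fully Paid", "Fully Paid"]
y_pred = ["Charged Off", "Fully Paid", "Charged Off", "Fully Paid", "Fully Paid",
          "Charged Off", "Fully Paid", "Fully Paid"]

# pos_label tells scikit-learn which label counts as the positive (default) class.
recall_default = recall_score(y_true, y_pred, pos_label="Charged Off")
print(f"Recall for defaults: {recall_default:.3f}")
```

Here two of the three actual defaulters are caught, so recall is 2/3.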
7.6.4 Precision
Precision tells us how many of the borrowers predicted as defaulters actually were defaulters.
\text{Precision} = \frac{TP}{TP + FP}
It answers: “When we predict someone will default, how often are we correct?”
✅ Good for: when false positives (rejecting good clients) are costly.
⚠️ Trade-off: a model can have high Recall but low Precision, or vice versa.
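The Precision and Recall trade-off appears when the PD cut-off for flagging a borrower is moved. The ten PD estimates and outcomes below are hypothetical, as are the two thresholds compared.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical PD estimates and true outcomes (1 = default) for ten borrowers.
y_true = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
pd_hat = np.array([0.90, 0.80, 0.60, 0.55, 0.40, 0.35, 0.30, 0.20, 0.10, 0.05])

results = {}
for threshold in (0.5, 0.3):
    # Flag a borrower as a predicted default when the PD reaches the threshold.
    y_pred = (pd_hat >= threshold).astype(int)
    results[threshold] = (
        precision_score(y_true, y_pred),
        recall_score(y_true, y_pred),
    )
    print(threshold, results[threshold])
```

Lowering the threshold from 0.5 to 0.3 catches every defaulter in this toy set but flags more good borrowers, so Recall rises while Precision falls.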
7.6.5 Putting it Together
| Metric | Focus | Measures | Ideal Use Case |
|---|---|---|---|
| Accuracy | Overall performance | Correct predictions over total | Balanced datasets |
| Recall (Sensitivity) | Defaults caught | TP / (TP + FN) | Credit risk, fraud, medical tests |
| Precision | Correct alarms | TP / (TP + FP) | Lending profitability, marketing |
| F1-Score | Balance of Precision & Recall | 2 × Precision × Recall / (Precision + Recall) | When both types of error matter |
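The F1-Score is the harmonic mean of Precision and Recall. The sketch below computes it by hand on a toy set of labels (made up for illustration) and compares the result with scikit-learn's `f1_score`.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy labels (hypothetical): 1 = default, 0 = no default.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

# F1 is the harmonic mean of Precision and Recall.
f1_manual = 2 * precision * recall / (precision + recall)
f1_sklearn = f1_score(y_true, y_pred)
print(precision, recall, f1_manual, f1_sklearn)
```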
Why this matters for credit risk
- A bank that misses many defaulters (low Recall) loses money.
- A bank that flags too many safe borrowers (low Precision) loses business.
- The right balance depends on strategy:
- Retail lending: prioritize Recall (avoid defaults).
- Marketing or growth: prioritize Precision (avoid false alarms).
In practice, Recall for class 1 (Default) is usually emphasized in PD models under Basel and regulatory frameworks.
Summary:
- Use Accuracy for a quick overview.
- Use Recall to measure how many defaulters the model finds.
- Use Precision to measure how often predicted defaulters really default.
- Combine both in F1-Score when you need balance.
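To tie the metrics back to the workflow, the sketch below trains the four model families on the same split and tabulates Accuracy and Recall for the default class. It runs on synthetic, imbalanced data from `make_classification`, not the LendingClub sample, and the hyperparameters (`max_iter=1000`, `max_depth=4`) are assumptions made for the example. The "new applicant" is simply the first test row.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data: roughly 17% of rows are class 1 (default).
X, y = make_classification(n_samples=900, n_features=8, n_informative=5,
                           weights=[0.83, 0.17], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=43, stratify=y
)

models = {
    "Logit": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "Tree": DecisionTreeClassifier(max_depth=4, random_state=43),
    "Ridge": RidgeClassifier(),
}

rows = []
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    rows.append({"model": name,
                 "accuracy": accuracy_score(y_test, pred),
                 "recall_default": recall_score(y_test, pred, pos_label=1)})

comparison = pd.DataFrame(rows).set_index("model")
print(comparison)

# PD for one hypothetical new applicant, taken from a model that outputs probabilities.
new_applicant = X_test[:1]
pd_new = models["Logit"].predict_proba(new_applicant)[0, 1]
print(f"Estimated PD: {pd_new:.3f}")
```

On the workshop data, the same loop would use the string labels with `pos_label="Charged Off"`, and the PD would be read from the "Charged Off" column of `predict_proba`.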