Default risk specifically refers to the risk of non-payment or failure to meet contractual obligations by a borrower or issuer of debt. It focuses on the probability of a borrower defaulting on their financial obligations, such as interest payments or principal repayments. Default risk is typically assessed by credit ratings agencies, which assign ratings reflecting the likelihood of default.
Counterparty risk pertains to the risk of loss arising from the potential default or failure of a trading partner or counterparty in a financial transaction. It is particularly relevant in derivatives trading, where parties are exposed to the risk of their counterparties not fulfilling their contractual obligations. Counterparty risk extends beyond default risk, as it includes other risks like settlement risk, operational risk, and legal risk associated with the counterparty.2
EAD = Exposure At Default represents the amount of exposure or outstanding balance at the time of a borrower’s default. It captures the potential maximum loss if default occurs.
PD = Probability of Default represents the likelihood of a borrower or counterparty defaulting on their financial obligations within a specific time horizon.
LGD = Loss Given Default represents the expected loss that a lender or investor would incur in the event of a borrower’s default. It is the proportion of the exposure that is not recoverable after default and is typically expressed as a percentage of the exposure amount.
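Putting the three together gives the standard expected-loss identity: EL = PD * LGD * EAD. A toy illustration in R (the numbers below are invented for the example, not taken from any dataset):
# Hypothetical loan: 2% one-year PD, 45% LGD, 1,000,000 outstanding at default
pd <- 0.02
lgd <- 0.45
ead <- 1000000
el <- pd * lgd * ead
el # expected loss = 9,000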
The Data Case: The German Credit Dataset
Build your first machine learning model!
Can we do better?
Regularization
K-Fold Cross Validation
Neural Networks Model
A brief background in classic credit risk models: Structural Model (Merton, 1974) and Reduced-form Model (Jarrow & Turnbull, 1995)
More Models & Techniques
Support Vector Machine (SVM)
Decision Tree (DT) and Random Forest (RF)
Clustering Techniques
Boosting, Bagging and Stacking Techniques
Hyperparameter Tuning!
https://machinelearningmastery.com/ by Jason Brownlee
ChatGPT (really!)
Link to the dataset: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
This dataset is widely used in academic papers: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=credit+risk+neural+network+german+data&btnG=
… and also in case competitions: https://www.kaggle.com/datasets/uciml/german-credit
# Clear Working Space
rm(list = ls())
# Import Dataset (you can also import the data manually using the "Import Dataset" button in the upper-right Environment pane)
data <- read.table("~/Downloads/german.data")
# If you prefer the manual route, first export a CSV copy:
# write.csv(data, file = "german.csv")
# ... then import german.csv via the button (this creates an object "german" whose first column is the row index) and drop that column:
# data <- german[,2:22]
# Data Description
# See: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
# Define column names
colnames(data) <- c("CHK", "DUR", "CRH", "PUR", "CRA",
"SAB", "EMY", "INT", "STA", "GUA",
"RES", "PRP", "AGE", "OIP", "HOU",
"CRD", "JOB", "PPL", "TEL", "FOR", "TYP")
# Data Structure
str(data)
# Attribute Information:
#
# Attribute 1: (qualitative)
# Status of existing checking account - CHK
# A11 : ... < 0 DM
# A12 : 0 <= ... < 200 DM
# A13 : ... >= 200 DM / salary assignments for at least 1 year
# A14 : no checking account
#
# Attribute 2: (numerical)
# Duration in month - DUR
#
# Attribute 3: (qualitative)
# Credit history - CRH
# A30 : no credits taken/ all credits paid back duly
# A31 : all credits at this bank paid back duly
# A32 : existing credits paid back duly till now
# A33 : delay in paying off in the past
# A34 : critical account/ other credits existing (not at this bank)
#
# Attribute 4: (qualitative)
# Purpose - PUR
# A40 : car (new)
# A41 : car (used)
# A42 : furniture/equipment
# A43 : radio/television
# A44 : domestic appliances
# A45 : repairs
# A46 : education
# A47 : (vacation - does not exist?)
# A48 : retraining
# A49 : business
# A410 : others
#
# Attribute 5: (numerical)
# Credit amount - CRA
#
# Attribute 6: (qualitative)
# Savings account/bonds - SAB
# A61 : ... < 100 DM
# A62 : 100 <= ... < 500 DM
# A63 : 500 <= ... < 1000 DM
# A64 : ... >= 1000 DM
# A65 : unknown/ no savings account
#
# Attribute 7: (qualitative)
# Present employment since - EMY
# A71 : unemployed
# A72 : ... < 1 year
# A73 : 1 <= ... < 4 years
# A74 : 4 <= ... < 7 years
# A75 : ... >= 7 years
#
# Attribute 8: (numerical)
# Installment rate in percentage of disposable income - INT
#
# Attribute 9: (qualitative)
# Personal status and sex - STA
# A91 : male : divorced/separated
# A92 : female : divorced/separated/married
# A93 : male : single
# A94 : male : married/widowed
# A95 : female : single
#
# Attribute 10: (qualitative)
# Other debtors / guarantors - GUA
# A101 : none
# A102 : co-applicant
# A103 : guarantor
#
# Attribute 11: (numerical)
# Present residence since - RES
#
# Attribute 12: (qualitative)
# Property - PRP
# A121 : real estate
# A122 : if not A121 : building society savings agreement/ life insurance
# A123 : if not A121/A122 : car or other, not in attribute 6
# A124 : unknown / no property
#
# Attribute 13: (numerical)
# Age in years - AGE
#
# Attribute 14: (qualitative)
# Other installment plans - OIP
# A141 : bank
# A142 : stores
# A143 : none
#
# Attribute 15: (qualitative)
# Housing - HOU
# A151 : rent
# A152 : own
# A153 : for free
#
# Attribute 16: (numerical)
# Number of existing credits at this bank - CRD
#
# Attribute 17: (qualitative)
# Job - JOB
# A171 : unemployed/ unskilled - non-resident
# A172 : unskilled - resident
# A173 : skilled employee / official
# A174 : management/ self-employed/
# highly qualified employee/ officer
#
# Attribute 18: (numerical)
# Number of people being liable to provide maintenance for - PPL
#
# Attribute 19: (qualitative)
# Telephone - TEL
# A191 : none
# A192 : yes, registered under the customer's name
#
# Attribute 20: (qualitative)
# Foreign worker - FOR
# A201 : yes
# A202 : no
# Data Cleaning
# Define: Good Type = 0 and Bad Type = 1
data$TYP <- data$TYP -1
summary(data$TYP)
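As a quick sanity check, the label counts should match the documented class balance of this dataset (700 good vs 300 bad applicants):
table(data$TYP) # expect 700 zeros (good) and 300 ones (bad)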
# Expand the dataset (include dummy variables )
x.con <- data[,c(2,5,8,11,13,16,18)]
x.dum.1 <- model.matrix(~ CHK - 1, data)
x.dum.2 <- model.matrix(~ CRH - 1, data)
x.dum.3 <- model.matrix(~ PUR - 1, data)
x.dum.4 <- model.matrix(~ SAB - 1, data)
x.dum.5 <- model.matrix(~ EMY - 1, data)
x.dum.6 <- model.matrix(~ STA - 1, data)
x.dum.7 <- model.matrix(~ GUA - 1, data)
x.dum.8 <- model.matrix(~ PRP - 1, data)
x.dum.9 <- model.matrix(~ OIP - 1, data)
x.dum.10 <- model.matrix(~ HOU - 1, data)
x.dum.11 <- model.matrix(~ JOB - 1, data)
x.dum.12 <- model.matrix(~ TEL - 1, data)
x.dum.13 <- model.matrix(~ FOR - 1, data)
x.dum <- cbind(x.dum.1, x.dum.2, x.dum.3, x.dum.4, x.dum.5,
x.dum.6, x.dum.7, x.dum.8, x.dum.9, x.dum.10,
x.dum.11, x.dum.12, x.dum.13)
x <- cbind(x.con, x.dum)
rm(x.con, x.dum, x.dum.1, x.dum.2, x.dum.3, x.dum.4, x.dum.5,
x.dum.6, x.dum.7, x.dum.8, x.dum.9, x.dum.10, x.dum.11,
x.dum.12, x.dum.13)
y <- data[,21]
# Define a function: min-max normalization by column
factor_normalize <- function(x) {
  # apply() already returns one normalized column per input column,
  # so no transposing is needed
  apply(x, 2, function(col) (col - min(col)) / (max(col) - min(col)))
}
# Apply factor normalization to the data (the result is a numeric matrix)
x <- factor_normalize(x)
### Now we have two datasets:
### "data" is the original data: 20 factors and 1 label ("TYP")
### "x" expands the original 20 factors into 61 columns (7 numeric factors plus 54 dummy variables), and "y" is the label
# Data Exploration
# install.packages("psych")
psych::describe(data)
psych::describe(x)
psych::describe(y)
# Data Visualization
# install.packages("scatterPlotMatrix")
library(scatterPlotMatrix)
scatterPlotMatrix(data)
scatterPlotMatrix(data, zAxisDim = "TYP")
scatterPlotMatrix(data, zAxisDim = "TYP", regressionType = 1)
# Model estimation using Logistic regression
mod_1 <- glm(TYP ~ 1, data = data, family = "binomial")
summary(mod_1)
mod_full <- glm(TYP ~ ., data = data, family = "binomial")
summary(mod_full)
# Type Prediction
prob.pre <- predict.glm(mod_full, type = "response")
type.pre <- ifelse(prob.pre <= 0.5, 0, 1)
type.act <- data$TYP
# Compare the Prediction with the Actual Data
# We will compute the following metrics to evaluate the model:
# 1. Confusion matrix
# 2. Accuracy
# 3. F1 Score
# 4. Receiver Operating Characteristic (ROC) curve
# 5. Area Under the ROC curve (AUROC or AUC)
# I will create a function called "accuracyFun" - that takes prob.pre, type.pre and type.act as inputs - to compute and return all of the above metrics as outputs
# install.packages("pROC")
library(pROC)
accuracyFun <- function(prob.pre, type.pre, type.act){
  # Confusion matrix: rows = predicted type, columns = actual type (1 = bad)
  mod.tab <- table(type.pre, type.act)
  tp <- mod.tab[2,2]
  tn <- mod.tab[1,1]
  fp <- mod.tab[2,1]
  fn <- mod.tab[1,2]
  tot <- tp + tn + fp + fn
  mod.acc <- (tp + tn) / tot
  mod.f1 <- (2 * tp) / (2 * tp + fp + fn)
  # Draw the ROC curve and compute the area under it (AUROC)
  mod.roc <- roc(response = type.act, predictor = prob.pre)
  plot(mod.roc)
  ans <- list(confusion = mod.tab, accuracy = mod.acc, f1 = mod.f1, auc = auc(mod.roc))
  return(ans)
}
accuracyFun(prob.pre, type.pre, type.act)
# Import the data
data <- read.table("~/Downloads/german.data")
# ... maybe a bit of cleaning
colnames(data) <- c("CHK", "DUR", "CRH", "PUR", "CRA",
"SAB", "EMY", "INT", "STA", "GUA",
"RES", "PRP", "AGE", "OIP", "HOU",
"CRD", "JOB", "PPL", "TEL", "FOR", "TYP")
data$TYP <- data$TYP -1
# Let's throw all the factors in and estimate the model
mod_full <- glm(TYP ~ ., data = data, family = "binomial")
# ... and predict
prob.pre <- predict.glm(mod_full, type = "response")
type.pre <- ifelse(prob.pre <= 0.5, 0, 1)
type.act <- data$TYP
# ... and evaluate!
accuracyFun(prob.pre, type.pre, type.act)
Stepwise Regression and Regularization
K-fold Cross Validation
Use other types of Machine Learning Models
Artificial Neural Nets (ANN) and Deep Neural Nets (DNN)
[Next Class] Support Vector Machine (SVM)
[Next Class] Decision Tree (DT) and Random Forest (RF)
[Next Class] Boosting, Bagging and Stacking Techniques
[Sort of covered in this class] Hyperparameter Tuning
Stepwise regression is a technique used to select a subset of predictor variables from a larger set of variables based on their statistical significance or their ability to improve a model’s fit. The goal is to find the best subset of variables that minimizes a chosen criterion, such as the Akaike Information Criterion (AIC).
Forward Stepwise Regression:
Start with an empty model (no predictors included).
Fit simple linear regression models for each predictor variable and select the one with the lowest AIC.
Add the selected predictor variable to the model.
Iterate through the remaining predictor variables, selecting the one that results in the lowest AIC when added to the current model.
Continue adding variables one by one until no candidate variable lowers the AIC any further.
Stop and select the final model with the lowest AIC.
Backward Stepwise Regression:
Start with a full model including all predictor variables.
Fit a multiple regression model with all predictors included and calculate the AIC.
Iterate through the predictors and remove the one that, when removed from the model, results in the lowest AIC.
Continue removing variables one by one until no removal lowers the AIC any further.
Stop and select the final model with the lowest AIC.
# Stepwise GLM (without a scope argument, step() performs backward elimination)
mod_step <- step(mod_full)
formula(mod_step)
summary(mod_step)
# ... and predict
prob.pre.step <- predict.glm(mod_step, type = "response")
type.pre.step <- ifelse(prob.pre.step <= 0.5, 0, 1)
type.act.step <- data$TYP
# ... and evaluate!
accuracyFun(prob.pre.step, type.pre.step, type.act.step)
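step(mod_full) only works backward from the full model; forward selection needs an explicit scope. A sketch reusing the intercept-only mod_1 fitted earlier (the helper full_formula below is introduced just for this example):
# Build the upper scope explicitly from the column names
full_formula <- as.formula(paste("TYP ~", paste(setdiff(names(data), "TYP"), collapse = " + ")))
# Forward selection: start from the intercept-only model and let step() add terms
mod_fwd <- step(mod_1, scope = full_formula, direction = "forward")
formula(mod_fwd)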
Too many inputs can lead to an overfitting problem
We already applied a stepwise regression technique, but that’s not enough. We can do better!
LASSO - L1 regularization
Ridge Regression - L2 regularization
Elastic Net - A blending between LASSO and ridge
See reference: https://glmnet.stanford.edu/articles/glmnet.html
In lasso regression, the objective is to minimize the sum of squared residuals, similar to ordinary linear regression, while simultaneously adding a penalty term based on the absolute values of the regression coefficients. This penalty term encourages sparsity in the coefficient estimates, effectively shrinking some coefficients to zero, resulting in automatic feature selection.3
The lasso regression model can be represented mathematically as:
minimize: (1 / (2 * n)) * ||Y - X * β||^2 + λ * ||β||_1
Where:
Y represents the response variable.
X represents the matrix of predictor variables.
β represents the vector of regression coefficients.
λ (lambda) is the tuning parameter that controls the strength of the regularization.
By varying the value of λ, you can control the amount of regularization applied. Higher values of λ lead to more coefficients being set to zero, resulting in a sparser model.
Ridge regression is another type of regularized linear regression technique that helps mitigate the issue of multicollinearity and overfitting in predictive models. It adds a penalty term to the least squares objective function, which is based on the sum of squared regression coefficients.
The ridge regression model can be represented mathematically as:
minimize: (1 / (2 * n)) * ||Y - X * β||^2 + λ * ||β||_2^2
Where:
Y represents the response variable.
X represents the matrix of predictor variables.
β represents the vector of regression coefficients.
λ (lambda) is the tuning parameter that controls the strength of the regularization.
By varying the value of λ, you can control the amount of regularization applied. Higher values of λ increase the amount of shrinkage applied to the coefficients, resulting in a more constrained model.4
Elastic Net is a regularized regression technique that combines both L1 (Lasso) and L2 (Ridge) regularization penalties. It is useful when dealing with high-dimensional datasets and situations where there are many correlated predictor variables.
The elastic net regression model can be represented mathematically as:
minimize: (1 / (2 * n)) * ||Y - X * β||^2 + λ1 * ||β||_1 + λ2 * ||β||_2^2
Where:
Y represents the response variable.
X represents the matrix of predictor variables.
β represents the vector of regression coefficients.
λ1 and λ2 are the tuning parameters that control the strength of the L1 and L2 regularization, respectively.
The λ1 parameter controls the amount of L1 regularization (sparsity-inducing) applied, similar to the Lasso penalty. The λ2 parameter controls the amount of L2 regularization (shrinkage) applied, similar to the Ridge penalty.
By adjusting the values of λ1 and λ2, you can control the trade-off between sparsity and shrinkage in the model.5
From a practical standpoint, L1 (LASSO) tends to shrink coefficients to zero whereas L2 (RIDGE) tends to shrink coefficients evenly.
L1 is therefore useful for feature selection, as we can drop any variables associated with coefficients that go to zero. L2, on the other hand, is useful when you have collinear/codependent features.
# install.packages("glmnet")
library(glmnet)
# We will use "x" and "y" for glmnet. See https://glmnet.stanford.edu/ for the manual.
#############################
# Regularization: LASSO
#############################
# Just set the alpha to "1"
mod.lasso <- glmnet(x, y, family = "binomial", alpha = 1, lambda = NULL)
print(mod.lasso)
plot(mod.lasso)
coef(mod.lasso, s = 0.01)
# Once we decide on the value of lambda (or "s") in the formula, we can proceed to the prediction and evaluation parts
# ... and predict
prob.pre.lasso <- as.vector(predict(mod.lasso, x, s = 0.01, type = "response"))
type.pre.lasso <- ifelse(prob.pre.lasso <= 0.5, 0, 1)
type.act.lasso <- y
# ... and evaluate!
accuracyFun(prob.pre.lasso, type.pre.lasso, type.act.lasso)
### You can vary/experiment with the s (or lambda) and see if the results improve!
#############################
# Regularization: RIDGE
#############################
# Just set the alpha to "0"
mod.ridge <- glmnet(x, y, family = "binomial", alpha = 0, lambda = NULL)
print(mod.ridge)
plot(mod.ridge)
coef(mod.ridge, s = 1)
# Once we decide on the value of lambda (or "s") in the formula, we can proceed to the prediction and evaluation parts
# ... and predict
prob.pre.ridge <- as.vector(predict(mod.ridge, x, s = 0.01, type = "response"))
type.pre.ridge <- ifelse(prob.pre.ridge <= 0.5, 0, 1)
type.act.ridge <- y
# ... and evaluate!
accuracyFun(prob.pre.ridge, type.pre.ridge, type.act.ridge)
#############################
# Regularization: ELASTIC NET
#############################
# Now alpha can vary between 0 and 1. Let's say alpha = 0.25
mod.elas <- glmnet(x, y, family = "binomial", alpha = 0.25, lambda = NULL)
print(mod.elas)
plot(mod.elas)
coef(mod.elas, s = 0.08)
# Once we decide on the value of lambda (or "s") in the formula, we can proceed to the prediction and evaluation parts
# ... and predict
prob.pre.elas <- as.vector(predict(mod.elas, x, s = 0.08, type = "response"))
type.pre.elas <- ifelse(prob.pre.elas <= 0.5, 0, 1)
type.act.elas <- y
# ... and evaluate!
accuracyFun(prob.pre.elas, type.pre.elas, type.act.elas)
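Hand-picking s is ad hoc; glmnet also ships cv.glmnet(), which chooses lambda by k-fold cross-validation (more on cross-validation below). A sketch for the LASSO case, scoring folds by AUROC:
set.seed(123)
cv.lasso <- cv.glmnet(x, y, family = "binomial", alpha = 1, type.measure = "auc")
plot(cv.lasso) # cross-validated AUC as a function of log(lambda)
cv.lasso$lambda.min # lambda with the best cross-validated AUC
cv.lasso$lambda.1se # a sparser, more conservative choice
coef(cv.lasso, s = "lambda.min")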
The model must perform well on the “never-before-seen” dataset
We need to do “cross validation”
Train the model on the “train” dataset and test the model on the “test” dataset
We can do cross validation “k” times - called “K-fold cross validation”
For example, in 5-fold cross-validation we randomly split the data into 5 equal-sized folds, train the model on 4 of them, use the fitted model to predict the left-out fold, and repeat 5 times so that each fold is held out exactly once.
#####################################
# A Simple Cross Validation
#####################################
# Randomly split data into train vs test datasets: 80% vs 20%
set.seed(123)
train_percentage <- 0.8
train_indices <- sample(seq_len(nrow(data)), floor(train_percentage * nrow(data)))
train_data <- data[train_indices, ]
test_data <- data[-train_indices, ]
# Train the model on the train dataset
# Let's throw all the factors in and estimate the model
mod_full_train <- glm(TYP ~ ., data = train_data, family = "binomial")
# ... and predict (in-sample, on the train dataset)
prob.pre.train <- predict.glm(mod_full_train, type = "response")
type.pre.train <- ifelse(prob.pre.train <= 0.5, 0, 1)
type.act.train <- train_data$TYP
# ... and evaluate!
accuracyFun(prob.pre.train, type.pre.train, type.act.train)
# Test the model on the test dataset
# Use the trained model to predict the default probability on the test data
prob.pre.test <- predict.glm(mod_full_train, type = "response", newdata = test_data)
type.pre.test <- ifelse(prob.pre.test <= 0.5, 0, 1)
type.act.test <- test_data$TYP
# ... and evaluate
accuracyFun(prob.pre.test, type.pre.test, type.act.test)
#############################################
# 5-fold Cross Validation (The Short Version)
#############################################
# Define the number of folds for cross-validation
numFold <- 5
# Perform k-fold cross-validation
# install.packages("caret")
library(caret)
set.seed(123)
folds <- caret::createFolds(data$TYP, k = numFold, list = TRUE)
accuracy <- numeric(numFold)
for (i in 1:numFold) {
  # Split data into training and testing sets based on the fold indices
  train_indices <- unlist(folds[-i])
  test_indices <- folds[[i]]
  train <- data[train_indices, ]
  x_test <- data[test_indices, 1:20]
  y_test <- data[test_indices, 21]
  # Build and train the model on this fold's training data
  model <- glm(TYP ~ ., data = train, family = "binomial")
  # Make predictions on the held-out test set
  y_pred <- ifelse(predict(model, x_test, type = "response") <= 0.5, 0, 1)
  # Evaluate the model's performance
  accuracy[i] <- sum(y_pred == y_test) / length(y_test)
}
mean(accuracy)
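The mean alone hides the fold-to-fold spread, which is worth a quick look:
accuracy # per-fold accuracy
sd(accuracy) # variability across the 5 folds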
Logistic regression is good enough in many applications, especially on tabular datasets. See 6
Use other types of Machine Learning Models
Artificial Neural Nets (ANN) and Deep Neural Nets (DNN)
Support Vector Machine (SVM)
Decision Tree (DT) and Random Forest (RF)
Boosting, Bagging and Stacking Techniques
A deep neural network (DNN) is a type of artificial neural network (ANN) that consists of multiple layers of interconnected nodes (neurons). It is designed to learn and model complex patterns and relationships in data by leveraging a hierarchical representation of features. Here's a quick introduction to the model's key features:
Architecture: A DNN typically consists of an input layer, one or more hidden layers, and an output layer. Each layer is composed of multiple neurons, and the neurons are interconnected with weights.
Deep Learning: The “deep” in deep neural networks refers to the presence of multiple hidden layers. Having multiple layers allows the network to learn hierarchical representations of the input data, enabling it to capture increasingly abstract features as the information flows through the network.
Activation Functions: Neurons within each layer apply an activation function to the weighted sum of their inputs to introduce non-linearities into the network. Common activation functions include ReLU (Rectified Linear Unit), sigmoid, and tanh.
Forward Propagation: During the forward propagation phase, data is fed into the network, and computations are performed layer by layer. Each layer applies a weighted sum followed by the activation function to generate the output.
Backpropagation: The backpropagation algorithm is used to train the DNN. It involves computing the prediction error (or loss) and propagating it backward through the network to update the weights using gradient descent optimization. This process iteratively adjusts the weights to minimize the prediction error.
Here is a list of popular activation functions commonly used in neural networks:
Sigmoid Function (Logistic Function): The sigmoid function is defined as f(x) = 1 / (1 + e^(-x)). It maps the input to a value between 0 and 1, which is useful for binary classification problems or as a smooth approximation of a step function.
Hyperbolic Tangent Function (Tanh): The tanh function is defined as f(x) = (e^x - e^(-x)) / (e^x + e^(-x)). It maps the input to a value between -1 and 1, and it is symmetric around the origin. It is commonly used in hidden layers of neural networks.
Rectified Linear Unit (ReLU): The ReLU function is defined as f(x) = max(0, x). It returns 0 for negative inputs and the input value for positive inputs. ReLU has become widely used in deep learning due to its simplicity and ability to alleviate the vanishing gradient problem.
Softmax Function: The softmax function is commonly used in the output layer of a neural network for multi-class classification tasks. It takes a vector of real numbers as input and converts them into a probability distribution, where the sum of the probabilities is equal to 1. It is defined as f(x_i) = e^(x_i) / (Σ(e^(x_j))) for each element x_i in the input vector.
There are other variants such as leaky ReLU, parametric ReLU, exponential linear units (ELU), etc.
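To make the formulas above concrete, here is a quick base-R sketch (tanh is built into R; sigmoid, relu, and softmax are defined here just for illustration):
sigmoid <- function(x) 1 / (1 + exp(-x))
relu <- function(x) pmax(0, x)
softmax <- function(x) exp(x) / sum(exp(x))
z <- c(-2, 0, 2)
sigmoid(z) # values in (0, 1)
tanh(z) # values in (-1, 1), symmetric around 0
relu(z) # 0 0 2
softmax(z) # probabilities summing to 1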
# install.packages("keras")
library(keras)
# Prepare the dataset for the deep neural nets
# I will use the datasets "x" and "y" here. keras expects the features as a numeric matrix and the labels as a numeric vector, which is how we organized them
# See https://tensorflow.rstudio.com/guides/keras/basics
# Split the data into training and testing sets
train_percentage <- 0.6
train_indices <- sample(1:nrow(data), floor(train_percentage * nrow(data)))
x_train <- x[train_indices,]
y_train <- y[train_indices]
x_test <- x[-train_indices,]
y_test <- y[-train_indices]
# Define the neural network model
mod.nn <- keras_model_sequential()
mod.nn %>%
layer_dense(units = 10, activation = "relu", input_shape = 61) %>%
layer_dense(units = 4, activation = "relu") %>%
layer_dense(units = 1, activation = "sigmoid")
# Compile the model
mod.nn %>% compile(
loss = "binary_crossentropy",
optimizer = "adam",
metrics = c("accuracy")
)
# Train the model
history <- mod.nn %>% fit(
x_train, y_train,
epochs = 20,
batch_size = 32,
validation_data = list(x_test, y_test)
)
# Evaluate the model
evaluation <- mod.nn %>% evaluate(x_test, y_test)
print(evaluation)
# predicted probability
prob.pre.nn <- as.vector(predict(mod.nn, x_test))
type.pre.nn <- ifelse(prob.pre.nn <= 0.5, 0, 1)
type.act.nn <- y_test
# ... and evaluate!
accuracyFun(prob.pre.nn,type.pre.nn,type.act.nn)
We built our first machine learning model using logistic regression and evaluated it with the ROC curve, F1 score, and AUROC
Stepwise Regression and Regularization
K-fold Cross Validation
Artificial Neural Nets (ANN) and Deep Neural Nets (DNN)
A brief background in credit risk: Structural Model (Merton, 1974) and Reduced-form Model (Jarrow & Turnbull, 1995)
More Models & Techniques
Support Vector Machine (SVM)
Decision Tree (DT) and Random Forest (RF)
Clustering Techniques
Boosting, Bagging and Stacking Techniques
Hyperparameter Tuning!
International Monetary Fund (IMF). (2018). Global Financial Stability Report: A Bumpy Road Ahead. Retrieved from https://www.imf.org/en/Publications/GFSR/Issues/2018/09/25/global-financial-stability-report-october-2018
Financial Stability Board (FSB). (2013). Key Attributes of Effective Resolution Regimes for Financial Institutions. Retrieved from https://www.fsb.org/wp-content/uploads/r_131015.pdf
Federal Reserve Bank of San Francisco. (2011). Credit Risk. Retrieved from https://www.frbsf.org/education/publications/doctor-econ/2005/november/credit-risk/
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 58(1), 267-288.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 58(1), 267-288.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22.