Abstract

The standard Vasicek credit loss model assumes a constant probability of default. We extend it to allow the default probability to depend on macroeconomic factors and show that the maximum likelihood estimates of all parameters, including the asset value correlation, reduce to an OLS regression on probit-transformed default rates followed by a simple algebraic recovery step. The estimator is exact, closed-form, and extends naturally to any number of factors. We illustrate the approach with an example and discuss bias correction for small samples.

1 Background

The Vasicek model is the single-factor Gaussian copula model commonly used, for example, in the Basel Advanced Internal Ratings-Based (A-IRB) approach. It assumes a constant probability of default, but some applications require modeling tail risk when the underlying default probability changes with external factors such as macroeconomic conditions. The Vasicek model can be made sensitive to external common factors and calibrated using closed-form maximum likelihood parameter estimates.

The original Vasicek model [1] has two parameters, \(p\) (PD, probability of default) and \(\rho\) (AVC, asset value correlation). Its probability density function (PDF) is

\[f_{p,\rho}(x)=\sqrt{\frac{1-\rho}{\rho}}\exp\left\{\frac{1}{2}\left(\Phi^{-1}(x)^2-\left[\frac{\sqrt{1-\rho}\Phi^{-1}(x)-\Phi^{-1}(p)}{\sqrt{\rho}}\right]^2\right)\right\},\]

where \(\Phi\) is the cumulative distribution function of the standard normal distribution, and \(\Phi^{-1}\) is its inverse.
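As a quick illustration, the density above can be evaluated directly in R. The sketch below is not part of the original analysis: the function name `vasicek_pdf` and the parameter values are illustrative assumptions. It checks numerically that the density integrates to one and that its mean equals \(p\).

```r
# Vasicek density of the default rate x in (0, 1), with PD p and asset value correlation rho.
# The function name is illustrative, not taken from the paper.
vasicek_pdf <- function(x, p, rho) {
  q <- qnorm(x)
  sqrt((1 - rho) / rho) *
    exp(0.5 * (q^2 - ((sqrt(1 - rho) * q - qnorm(p)) / sqrt(rho))^2))
}

# Sanity checks with illustrative parameter values
p <- 0.03
rho <- 0.1
total  <- integrate(vasicek_pdf, 0, 1, p = p, rho = rho)$value              # should be close to 1
mean_x <- integrate(function(x) x * vasicek_pdf(x, p, rho), 0, 1)$value     # should be close to p
```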

2 Model Extension

A natural way to make the model sensitive to external common factors is to replace the constant \(p\) with \[\Phi\left(\Phi^{-1}(p_0)+\sum_{j=1}^{m}\kappa_j u_j\right),\]

where \(\kappa_1,\kappa_2,\ldots,\kappa_m\) are the new model parameters, \(u_1,u_2,\ldots,u_m\) are external common factors, and \(p_0\) is the baseline default probability. After this change, the PDF becomes

\[f_{p_0,\rho, \kappa_{1 \ldots m}}(x, u_{1\ldots m})=\sqrt{\frac{1-\rho}{\rho}}\exp\left\{\frac{1}{2}\left(\Phi^{-1}(x)^2-\left[\frac{\sqrt{1-\rho}\Phi^{-1}(x)-\sum_{j=1}^{m}\kappa_j u_j-\Phi^{-1}(p_0)}{\sqrt{\rho}}\right]^2\right)\right\}.\]

We write \(p\) for \(p_0\) hereafter.

We define \(\kappa_j\) with a sign convention such that positive \(\kappa_j\) represents a factor that increases default risk, e.g., high unemployment.

The standard Vasicek model is a special case where \(\kappa_1=\kappa_2=\ldots=\kappa_m=0\).

The default rate at confidence level \(\alpha\), given macroeconomic conditions, is:

\[Q_\alpha = \Phi\!\left(\frac{\Phi^{-1}(p)+\sum_{j}\kappa_j u_j+\sqrt{\rho}\,\Phi^{-1}(\alpha)}{\sqrt{1-\rho}}\right)\]

This is the formula used for the 99th percentile projections below, and to generate random samples using inverse transform sampling.
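A minimal R sketch of the quantile formula and of inverse transform sampling is given below. The function name `vasicek_quantile`, the parameter values, and the factor values are illustrative assumptions, not the values used in the example that follows.

```r
# Conditional quantile of the default rate at confidence level alpha.
# kappa and u are vectors of equal length (one entry per factor);
# with the defaults the function reduces to the standard Vasicek quantile.
vasicek_quantile <- function(alpha, p, rho, kappa = numeric(0), u = numeric(0)) {
  shift <- sum(kappa * u)
  pnorm((qnorm(p) + shift + sqrt(rho) * qnorm(alpha)) / sqrt(1 - rho))
}

# Illustrative parameters and a stressed scenario (factor 1 high, factor 2 low)
set.seed(1)
p <- 0.03
rho <- 0.05
kappa <- c(0.13, -0.07)
u <- c(1, -1)

q99 <- vasicek_quantile(0.99, p, rho, kappa, u)

# Inverse transform sampling: pass uniform draws through the quantile function
draws <- vasicek_quantile(runif(1e5), p, rho, kappa, u)
share_below <- mean(draws <= q99)   # should be near 0.99
```

Because the quantile function is vectorized over `alpha`, the same function serves both for percentile projections and for simulation.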

3 Parameter Estimation

Given a sample of \(N\) observations \((X_1, X_2, \ldots, X_N)\), the Vasicek model can be fitted using maximum likelihood estimation with the following closed-form expressions:

\[\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N}\Phi^{-1}(X_i),\quad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}\Phi^{-1}(X_i)^2 - \hat{\mu}^2,\quad \hat{p} = \Phi\left(\frac{\hat{\mu}}{\sqrt{1 + \hat{\sigma}^2}}\right),\quad \hat{\rho} = \frac{\hat{\sigma}^2}{1 + \hat{\sigma}^2}.\]
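These expressions translate into a few lines of R. The following is a sketch under assumed parameter values; the function name `vasicek_mle` is hypothetical, and the data are simulated from the model rather than observed.

```r
# Closed-form ML estimates for the standard Vasicek model.
# x: vector of observed default rates strictly between 0 and 1.
vasicek_mle <- function(x) {
  y <- qnorm(x)
  mu_hat <- mean(y)
  sigma2_hat <- mean(y^2) - mu_hat^2   # divides by N (MLE), unlike var()
  list(p   = pnorm(mu_hat / sqrt(1 + sigma2_hat)),
       rho = sigma2_hat / (1 + sigma2_hat))
}

# Simulated check with illustrative parameters
set.seed(42)
p <- 0.03
rho <- 0.1
x <- pnorm((qnorm(p) + sqrt(rho) * rnorm(5000)) / sqrt(1 - rho))
fit <- vasicek_mle(x)   # fit$p and fit$rho should be close to p and rho
```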

Theorem 4.2 in [2] establishes that closed-form MLEs exist for the extended model as well. Here we provide an explicit derivation that makes the connection to OLS transparent and yields directly computable expressions for all parameters.

Let \(Y_i=\Phi^{-1}(X_i)\). The PDF of \(Y\) then is

\[f_Y(y)=f_X(\Phi(y))\cdot\phi(y)=\frac{1}{\sqrt{2\pi}}\sqrt{\frac{1-\rho}{\rho}}\exp{\left\{-\frac{1}{2\rho}\left(y\sqrt{1-\rho}-\sum_{j=1}^m\kappa_j u_j-\Phi^{-1}(p)\right)^2 \right\}},\] where \(\phi\) is the standard normal PDF. This is a Gaussian PDF, so

\[Y \sim \mathcal{N}\left(\frac{\Phi^{-1}(p)+\sum_{j=1}^m\kappa_j u_j}{\sqrt{1-\rho}},\frac{\rho}{1-\rho}\right).\]

With the new parameters

\[\beta_0=\frac{\Phi^{-1}(p)}{\sqrt{1-\rho}},\quad \beta_j=\frac{\kappa_j}{\sqrt{1-\rho}},\quad j=1\ldots m,\quad \sigma^2=\frac{\rho}{1-\rho},\]

the model becomes a standard normal linear regression:

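\[Y_i = \beta_0 + \beta_1u_{1i} + \beta_2u_{2i}+\ldots+\beta_mu_{mi}+\epsilon_i,\quad \epsilon_i\sim\mathcal{N}(0,\sigma^2).\]

Parameters of this model can be estimated using OLS. For Gaussian errors, the OLS coefficient estimates coincide with the maximum likelihood estimates, and the MLE of the error variance is the mean squared residual. In matrix form, where \(\boldsymbol{U}\) is the \(N \times (m+1)\) design matrix whose \(i\)-th row is \([1,u_{1i},u_{2i},\ldots,u_{mi}]\),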

\[\hat{\boldsymbol{\beta}}=(\boldsymbol{U}^T\boldsymbol{U})^{-1}\boldsymbol{U}^T\boldsymbol{Y},\quad \hat{\sigma}^2=\frac{1}{N}||\boldsymbol{Y}-\boldsymbol{U}\hat{\boldsymbol{\beta}}||^2,\]

and parameter estimates are

\[\hat{p} = \Phi\left(\frac{\hat{\beta}_0}{\sqrt{1+\hat{\sigma}^2}}\right),\quad \hat{\rho} = \frac{\hat{\sigma}^2}{1+\hat{\sigma}^2},\quad \hat{\kappa}_j = \frac{\hat{\beta}_j}{\sqrt{1+\hat{\sigma}^2}},\quad j=1\ldots m.\]
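For completeness, the recovery step can be spelled out by inverting the reparameterization. Starting from the definition of \(\sigma^2\),

\[\sigma^2=\frac{\rho}{1-\rho}\;\Longrightarrow\;1+\sigma^2=\frac{1}{1-\rho}\;\Longrightarrow\;\rho=\frac{\sigma^2}{1+\sigma^2},\quad \sqrt{1-\rho}=\frac{1}{\sqrt{1+\sigma^2}},\]

and substituting into the definitions of \(\beta_0\) and \(\beta_j\) gives

\[\Phi^{-1}(p)=\beta_0\sqrt{1-\rho}=\frac{\beta_0}{\sqrt{1+\sigma^2}},\quad \kappa_j=\beta_j\sqrt{1-\rho}=\frac{\beta_j}{\sqrt{1+\sigma^2}}.\]

The map between \((p,\rho,\kappa_1,\ldots,\kappa_m)\) and \((\beta_0,\beta_1,\ldots,\beta_m,\sigma^2)\) is one-to-one, so by the invariance property of maximum likelihood the estimates of the original parameters are obtained by plugging \(\hat{\boldsymbol{\beta}}\) and \(\hat{\sigma}^2\) into these expressions.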

4 Example

Consider a model predicting the delinquency rate on all loans across all US commercial banks based on macroeconomic conditions. In this illustrative example, we made several choices to keep the model as simple as possible.

Figure 4.1: Time series plots of macroeconomic drivers, with recessions shaded

Figure 4.2: Observed delinquency rate vs. 99th percentiles, with recessions shaded.

Figure 4.2 shows how the 99th percentile of the extended Vasicek model changes over time compared to the flat 99th percentile of the traditional Vasicek model.

5 Comparing to Numeric Optimization

To check that the maximum likelihood estimates recover known model parameters, we generate 10,000 random datasets by drawing from the distribution with known parameters. Each dataset has 152 observations and standard normal \(u_j\), because the independent variables were standardized before being used in the OLS regression.

For each dataset, we estimate parameters using the closed-form formulas and numerically with the BFGS method. The curves in Figure 5.1 overlap, as the results match within numerical tolerance.

Figure 5.1: Comparison of known values, closed-form ML, and numeric ML estimates.
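The comparison can be sketched for a single simulated dataset as follows. This is an illustration rather than the code behind Figure 5.1: the parameter values, the starting point, and the probit/logit parameterization used to keep the optimizer unconstrained are all assumptions. The likelihood is written for \(Y=\Phi^{-1}(X)\); it differs from the likelihood of \(X\) only by a Jacobian term that does not depend on the parameters.

```r
set.seed(123)
N <- 152
p <- 0.028; rho <- 0.021; kappa <- c(0.128, -0.072)   # illustrative values
U <- cbind(1, matrix(rnorm(N * 2), N, 2))             # intercept plus standard normal factors

# Simulate probit-transformed default rates: Y ~ N(U beta, sigma^2)
beta  <- c(qnorm(p), kappa) / sqrt(1 - rho)
sigma <- sqrt(rho / (1 - rho))
Y <- as.vector(U %*% beta) + sigma * rnorm(N)

# Closed-form ML: OLS followed by the algebraic recovery step
beta_hat   <- solve(crossprod(U), crossprod(U, Y))
sigma2_hat <- mean((Y - U %*% beta_hat)^2)
closed <- c(pnorm(beta_hat[1] / sqrt(1 + sigma2_hat)),
            sigma2_hat / (1 + sigma2_hat),
            beta_hat[-1] / sqrt(1 + sigma2_hat))

# Numeric ML: minimize the negative log-likelihood with BFGS.
# theta = (qnorm(p), qlogis(rho), kappa_1, kappa_2), all unconstrained.
negloglik <- function(theta) {
  rho_t <- plogis(theta[2])
  mu <- (theta[1] + as.vector(U[, -1] %*% theta[3:4])) / sqrt(1 - rho_t)
  -sum(dnorm(Y, mean = mu, sd = sqrt(rho_t / (1 - rho_t)), log = TRUE))
}

# Start from the standard Vasicek fit (no factors)
s2_0 <- mean((Y - mean(Y))^2)
start <- c(mean(Y) / sqrt(1 + s2_0), qlogis(s2_0 / (1 + s2_0)), 0, 0)
opt <- optim(start, negloglik, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 1000))
numeric_ml <- c(pnorm(opt$par[1]), plogis(opt$par[2]), opt$par[3:4])

max_diff <- max(abs(closed - numeric_ml))   # expected to be small
```

Repeating this over many simulated datasets gives the kind of comparison shown in Figure 5.1.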

6 Bias Reduction

Figure 5.1 shows a slight bias in the maximum likelihood estimate of \(\rho\), which arises primarily from the bias of the variance estimator. The bias vanishes asymptotically, so the MLE is still consistent, but for small samples it helps to adjust \(\hat{\sigma}^2\) for the number of observations and regressors:

\[\hat{\sigma}_{adj}^2=\frac{N}{N-m-1}\hat{\sigma}^2\]

Figure 6.1: Comparison of known values and closed-form ML after bias adjustment

Table 1: Parameter Estimates Before and After Bias Adjustment

| | \(p\) | \(\rho\) | \(\kappa_1\) (Unemployment) | \(\kappa_2\) (HPI) |
|---|---|---|---|---|
| Known target value | 0.0281862 | 0.0209138 | 0.128099 | -0.071587 |
| Mean estimate before adjustment | 0.0281675 | 0.0205021 | 0.128152 | -0.071754 |
| Mean estimate after adjustment | 0.028193 | 0.0209061 | 0.128126 | -0.071739 |

Figure 6.1 shows that the adjustment reduced the bias in the estimate of \(\rho\). Simulation results are summarized in Table 1. The remaining bias may be due to Jensen’s inequality, as the recovery transformation \(\hat{\rho}=\hat{\sigma}^2/(1+\hat{\sigma}^2)\) is a concave function of \(\hat{\sigma}^2\).
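The effect of the adjustment is easier to see in a deliberately small sample. The sketch below is a separate Monte Carlo illustration, not the simulation behind Table 1; the sample size, number of regressors, and \(\rho\) are assumed values chosen to make the bias visible.

```r
set.seed(7)
N <- 30; m <- 2; rho <- 0.05; n_sim <- 5000   # illustrative settings
sigma2 <- rho / (1 - rho)
U <- cbind(1, matrix(rnorm(N * m), N, m))

# Residual-maker matrix: the OLS residuals of Y on U are M %*% Y
M <- diag(N) - U %*% solve(crossprod(U), t(U))

rho_hats <- replicate(n_sim, {
  eps <- sqrt(sigma2) * rnorm(N)        # the regression coefficients do not affect residuals
  s2_mle <- mean((M %*% eps)^2)         # divides by N
  s2_adj <- s2_mle * N / (N - m - 1)    # divides by N - m - 1
  c(mle = s2_mle / (1 + s2_mle), adj = s2_adj / (1 + s2_adj))
})

means <- rowMeans(rho_hats)   # compare both entries with rho
```

In this setup the adjusted estimate should sit closer to the true \(\rho\) on average, with a small residual downward bias consistent with the concavity argument above.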

Figure 6.2: 99th percentiles over time before and after bias adjustment.

Figure 6.2 shows that, in this example, bias adjustment has a relatively small effect compared to other sources of error.

7 Finite Portfolio Size

In a finite portfolio, it is possible to observe no defaults in a period. The probit transform \(\Phi^{-1}(X_i)\) is \(-\infty\) at \(X_i=0\), so the likelihood is undefined there and maximum likelihood estimation cannot handle zero values of \(X_i\). Simply excluding zeros may bias the estimates.

To avoid that, a correction for the variance of \(x\) can be applied as described in section 4.3 of [2]:

\[v_r=\operatorname{Var}(x),\quad v_0 = v_r - \frac{\bar{x} (1 - \bar{x}) - v_r}{s - 1},\quad x_{corrected}=\bar{x} + (x - \bar{x})\sqrt{\frac{v_0}{v_r}},\]

where \(x\) is the observed default rate before the probit transform, \(\bar{x}\) is its sample mean, and \(s\) is the portfolio size. This correction is not needed in the example above because \(s\) is large.
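A minimal R sketch of this correction follows. The function name, the toy data, and the portfolio size are hypothetical, and the guard against a non-positive \(v_0\) is an added assumption rather than part of the formula.

```r
# Shrink observed default rates toward their mean to remove sampling variance
# that comes from the finite number of obligors.
# x: observed default rates; s: portfolio size, assumed constant over time.
finite_portfolio_correction <- function(x, s) {
  x_bar <- mean(x)
  v_r <- var(x)
  v_0 <- v_r - (x_bar * (1 - x_bar) - v_r) / (s - 1)
  # Added guard (an assumption): the rescaling is undefined if v_0 is not positive
  if (v_0 <= 0) stop("estimated systematic variance is not positive")
  x_bar + (x - x_bar) * sqrt(v_0 / v_r)
}

# Toy data with one period without defaults
x <- c(0, 0.01, 0.02, 0.04, 0.03, 0.01)
x_corr <- finite_portfolio_correction(x, s = 500)
```

The correction preserves the sample mean and reduces the sample variance from \(v_r\) to \(v_0\). In this toy example the zero observation moves to a small positive value, so the probit transform is finite for every period.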

8 Source Code

The source code of the R function that was used to generate the graphs is shown below.

# Dependent variable is the first one in dataframe df
extended_vasicek_ml <- function(df, biascorr = FALSE, portfolio_size = NULL)
{
  names(df)[1] <- 'Y'

  # Step 1: Correction for the variance of portfolio level PD
  if (!is.null(portfolio_size) && portfolio_size > 1) {
    p0 <- mean(df$Y)
    vr <- var(df$Y)
    v0 <- vr - (p0 * (1 - p0) - vr) / (portfolio_size - 1)
    df$Y <- p0 + (df$Y - p0) * sqrt(v0 / vr)
  }

  # Step 2: Probit-transform
  df$Y <- qnorm(df$Y)

  # Step 3: OLS regression of Y on everything else
  fit <- lm(Y~., data=df)
  beta0_hat <- coef(fit)[1]
  betas_hat <- coef(fit)[-1] # all other betas except beta0

  # MLE variance (divides by N, not by the residual degrees of freedom)
  sigma2_hat <- mean(residuals(fit)^2)
  if (biascorr) {
    N <- nrow(df) # number of observations
    m <- length(betas_hat) # number of regressors (kept separate from the lm object)
    sigma2_hat <- sigma2_hat * N / (N - m - 1)
  }

  # Step 4: Recover original parameters
  p_hat      <- pnorm(beta0_hat / sqrt(1 + sigma2_hat))
  rho_hat    <- sigma2_hat / (1 + sigma2_hat)
  kappa_hat  <- betas_hat / sqrt(1 + sigma2_hat)
  
  c(list(p=p_hat, rho=rho_hat), kappa_hat)
}

Data Sources

References

  1. Vasicek, O. A. (2002), The distribution of loan portfolio value. Risk, 15(12), 160-162.

  2. Yang, B. H. (2014), Estimating Long-Run PD, Asset Correlation, and Portfolio Level PD by Vasicek Models. MPRA Paper No. 57244. https://mpra.ub.uni-muenchen.de/57244/1/MPRA_paper_57244.pdf