Modeling Frailty Correlated Defaults

Why?

\[ \definecolor{gray}{RGB}{192,192,192} \def\vect#1{\boldsymbol #1} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1 \mid #2\right)} \def\diff{{\mathop{}\!\mathrm{d}}} \]

Motivation

Want to model the loss distribution of \(b\) banks.

Loss is given by

\[ L_{bt} = \sum_{i\in R_{bt}} \color{gray}{E_{bit}G_{bit}}Y_{it} \]

\(R_{bt}\): risk set, \(E_{bit}\in (0,\infty)\): exposure, \(G_{bit}\in[0,1]\): loss-given-default, and \(Y_{it}\in\{0,1\}\): default indicator.

Focus on \(Y_{it}\).

First Idea

Assume conditional independence and, e.g., let the default intensity be

\[\log\lambda_{it} = \vect\beta^\top \vect x_{it} + \vect\gamma^\top \vect z_t\]

So the probability of default is

\[ \begin{multline*} P(Y_{i,t}=1\mid Y_{i,1}=\cdots=Y_{i,t-1}=0, \\ \lambda_{it} = \lambda) = 1 - \exp\left(-\lambda\right) \end{multline*} \]

A poor choice for tail risk if the independence assumption is invalid.
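
As a tiny illustration, the implied one-period default probability can be computed directly from the log-intensity (all coefficient and covariate values below are made up):

beta  <- c(-4, .5)  # made-up coefficients for the firm covariates
x_it  <- c(1, 2)    # made-up firm covariates
gamma <- .25        # made-up coefficient for the macro covariate
z_t   <- -1         # made-up macro covariate
lambda <- exp(sum(beta * x_it) + gamma * z_t)
1 - exp(-lambda)    # P(default this period | survived so far)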

First Idea

In-sample predicted minus realized default rate. Black bars are outside the 90% confidence intervals.

Add Frailty

Duffie et al. (2009) suggest generalizing to

\[ \begin{aligned} \log\lambda_{it} &= \vect\beta^\top \vect x_{it} + \vect\gamma^\top \vect z_t + A_t \\ A_t &= \theta A_{t-1} + \epsilon_t \\ \epsilon_t&\sim N(0,\sigma^2) \end{aligned} \]

The auto-regressive frailty, \(A_t\), captures clustering.
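
A small simulation sketch of why the frailty yields clustering: the common \(A_t\) moves every firm's default probability up and down together (all parameter values below are made up):

set.seed(1)
d     <- 100 # number of periods
theta <- .9  # made-up AR(1) coefficient
sigma <- .3  # made-up innovation standard deviation
A <- numeric(d)
for(t in 2:d)
  A[t] <- theta * A[t - 1] + rnorm(1, sd = sigma)
# per-period default probabilities with a common baseline intensity
p_default <- 1 - exp(-.01 * exp(A))
plot(p_default, type = "l", xlab = "t", ylab = "P(default)")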

Thoughts

Very nice paper!

But is only the intercept time-varying?

Findings in Lando et al. (2013), Filipe, Grammatikos, and Michala (2016), and Jensen, Lando, and Medhat (2017) suggest not.

And are all the effects linear on the hazard scale?

Findings in Berg (2007), Christoffersen, Matin, and Mølgaard (2018), and the ML literature suggest not.

Generalize

\[ \begin{aligned} \log\lambda_{it} &= \vect\beta^{(1)\top}\vect x_{it}^{(1)} + \vect\gamma^\top \vect z_t + \vect\beta^{(2)\top} \vect f(\vect x_{it}^{(2)}) + \vect A_t^\top\vect u_{it} \\ \vect A_t &= F\vect A_{t-1} + \vect \epsilon_t \\ \vect \epsilon_t&\sim N(\vect 0, Q) \\ \vect x_{it} &= \left(\vect x_{it}^{(1)\top}, \vect x_{it}^{(2)\top}\right)^\top \end{aligned} \]

\(\vect A_t \in \mathbb{R}^p\) is low dimensional and some elements in \(\vect u_{it}\) and \(\vect x_{it}\) may match.
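
An illustrative simulation of the generalized state equation with \(p = 2\), say a random intercept and a random slope (again, all parameter values are made up):

set.seed(2)
d    <- 100
Fmat <- diag(c(.9, .8)) # made-up VAR(1) coefficient matrix
sds  <- c(.3, .1)       # made-up innovation standard deviations
A <- matrix(0., 2L, d)
for(t in 2:d)
  A[, t] <- Fmat %*% A[, t - 1] + rnorm(2, sd = sds)
# A[1, t] enters firm i's log-intensity as a random intercept and
# A[2, t] as a random slope on an element of u_it
matplot(t(A), type = "l", lty = 1, xlab = "t", ylab = expression(A[t]))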

Need to Evaluate

\[ \begin{aligned} L &= \int_{\mathbb R^{pd}} \mu_0(\vect A_1)g_1\Cond{\vect y_1}{\vect A_1} \\ &\hspace{25pt} \cdot\prod_{t=2}^d g_t\Cond{\vect y_t}{\vect A_t} f\Cond{\vect A_t}{\vect A_{t-1}}\diff\vect A_{1:d} \\ \vect y_t &= \{y_{it}\}_{i\in O_t} \end{aligned} \]

\(O_t\) is the risk set.
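
As a hedged sketch of how such a likelihood can be approximated by sequential Monte Carlo, below is a minimal bootstrap particle filter for the univariate frailty model with a common covariate part and a fixed risk set (illustrative code written for this summary, not the implementation in the packages below):

boot_filter <- function(y, eta, theta, sigma, N = 1000L){
  # y is a (firm x period) 0/1 matrix, eta the common part of the
  # log-intensity, and theta and sigma the AR(1) parameters
  d <- ncol(y); n <- nrow(y)
  A  <- rnorm(N, sd = sigma / sqrt(1 - theta^2)) # stationary initial distribution
  ll <- 0.
  for(t in 1:d){
    if(t > 1)
      A <- theta * A + rnorm(N, sd = sigma)           # draw from f(A_t | A_t-1)
    p     <- 1 - exp(-exp(eta + A))                   # default prob. per particle
    n_def <- sum(y[, t])
    lg    <- n_def * log(p) + (n - n_def) * log1p(-p) # log g_t(y_t | A_t)
    w     <- exp(lg - max(lg))
    ll    <- ll + log(mean(w)) + max(lg)              # update log-likelihood estimate
    A     <- sample(A, N, replace = TRUE, prob = w)   # resample
  }
  ll
}

# toy data just to run the function (defaults are not absorbing here)
set.seed(3)
y <- matrix(rbinom(100L * 20L, 1L, .02), 100L, 20L)
boot_filter(y, eta = log(.02), theta = .9, sigma = .3)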

Talk Overview

How?

Introduction to the dynamichazard package.

Example

Summary of paper with application.

Why another package?

Motivation for the mssm package.

How?

Fast Approximation

We need to estimate the parameters. One option is to use a fast approximation.

E.g., an extended Kalman filter, an unscented Kalman filter, a pseudo-likelihood approximation, a Laplace approximation, etc.

dynamichazard contains an extended Kalman filter and an unscented Kalman filter for the random walk model.

The former in particular is extremely fast. Try dynamichazard::ddhazard_app().
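
A hedged sketch of an extended Kalman filter fit with ddhazard on the unmodified survival::lung data (the interval length, max_T, and the Q_0 and Q matrices below are our illustrative choices, not recommendations):

library(dynamichazard)
library(survival)
dd_fit <- ddhazard(
  Surv(time, status == 2) ~ ph.ecog + age, data = lung,
  by = 50, max_T = 1000,              # 50-day intervals up to day 1000
  Q_0 = diag(1, 3), Q = diag(.01, 3), # time-zero and state covariance matrices
  control = ddhazard_control(method = "EKF"))
plot(dd_fit)                          # smoothed coefficient paths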

Monte Carlo Method

Use Monte Carlo expectation maximization.

Approximate the E-step with a particle smoother.

This yields arbitrary precision.

Particle Smoother

The package contains an implementation of the generalized two-filter smoother suggested by Briers, Doucet, and Maskell (2009).

This method is \(\bigO{N^2}\), where \(N\) is the number of particles. That is not a problem for \(N<2000\).

It also contains an implementation of the particle smoother suggested by Fearnhead, Wyncoll, and Tawn (2010).

This is \(\bigO{N}\) with some extra overhead per particle.

Small Example

head(lung)
##   time status ph.ecog age id
## 1  306   TRUE       1  74  1
## 2  455   TRUE       0  68  2
## 3  800  FALSE       0  56  3
## 4  210   TRUE       1  57  4
## 5  800  FALSE       0  60  5
## 6  800  FALSE       1  74  6
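
The data set appears to be a modified version of survival::lung. A hedged reconstruction of the preprocessing, inferred from the output above (the 800-day cap and the recoded status are assumptions):

library(survival)
lung <- survival::lung[, c("time", "status", "ph.ecog", "age")]
lung$status <- lung$status == 2 & lung$time <= 800 # TRUE if a death is observed by day 800
lung$time   <- pmin(lung$time, 800)                # censor follow-up at day 800
lung$id     <- seq_len(nrow(lung))                 # one record per subject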

Fit Model

set.seed(59366336)
system.time(pf_fit <- PF_EM(
  # fixed effects and a random intercept
  fixed = Surv(time, status) ~ ph.ecog + age, random = ~ 1,
  # exponential model in continuous time with 50-day intervals; Q and
  # Fmat are starting values for the VAR(1) state equation's covariance
  # and coefficient matrices
  model = "exponential", Q = as.matrix(.0001), type = "VAR", 
  by = 50L, Fmat = as.matrix(.0001),
  control = PF_control(
    # number of particles, the proposal t-distribution's degrees of
    # freedom, the smoother, and the number of threads
    N_fw_n_bw = 500L, N_smooth = 1000L, N_first = 1000L, 
    nu = 6, smoother = "Fearnhead_O_N", n_threads = 6, 
    # start averaging the estimates at iteration 200; at most 300 EM
    # iterations
    averaging_start = 200L, n_max = 300L, eps = 1e-4),
  data = lung, max_T = max(lung$time), id = lung$id, 
  # Q_0 is the covariance matrix of the first state
  Q_0 = as.matrix(4)))
##    user  system elapsed 
## 110.291   3.327  18.604

Smoothed Predicted Value

par(mar = c(6, 4, 1, 1))
plot(pf_fit) # plot smoothed estimates of the random intercept

Compare Fits

# fit models without the random intercept for comparison
const_fit <- survreg(Surv(time, status) ~ ph.ecog + age, 
                     data = lung, dist = "exponential")
coxf <- coxph(Surv(time, status) ~ ph.ecog + age, data = lung)

# survreg uses an accelerated failure time parameterization, so its
# coefficients are negated to get them on the log-hazard scale
rbind(
  survreg = -const_fit$coefficients,
  coxph   = c(NA_real_, coef(coxf)), 
  PF_EM   = pf_fit$fixed_effects)
##         (Intercept)   ph.ecog        age
## survreg   -7.141724 0.3793578 0.01163572
## coxph            NA 0.4351896 0.01192587
## PF_EM     -7.108086 0.4187690 0.01148562

Log-Likelihood Approximations

par(mar = c(6, 4, 1, 1))
plot(pf_fit$log_likes, type = "l", ylab = "Log-Likelihood")
# dashed line: log-likelihood of the model without the random effect
abline(h = logLik(const_fit), lty = 2)

Features

A few options are available for the conditional model given the state variables.

There are discrete-time models with logit and cloglog link functions, and a log link in continuous time.

Approximations of the gradient and the observed information matrix are available.

Both the method suggested by Poyiadjis, Doucet, and Singh (2011) and the method mentioned in Cappé and Moulines (2005). See the dynamichazard::PF_get_score_n_hess function.

Example

Paper

I will show an example from Christoffersen and Matin (2019).

Rastin Matin

Danmarks Nationalbank, rma@nationalbanken.dk

Summary

We add covariates, non-linear effects, and a random slope to the model in Duffie, Saita, and Wang (2007) and Duffie et al. (2009).

We find less evidence of a time-varying intercept.

As also shown by Lando and Nielsen (2010).

We provide evidence of a time-varying size slope.

We show improved firm-level and industry-level performance.

Shameless Plug

We need to compute distance-to-default and perform rolling regressions.

We use the dtd and rollRegres packages. The latter is a fast alternative:

#R Unit: milliseconds
#R                     expr       mean     median
#R             roll_regress   5.007243   5.027944
#R          roll_regress_df   5.786401   5.539363
#R         roll_regress_zoo 513.787995 512.832266
#R  roll_regress_R_for_loop 300.358362 301.748222
#R                  roll_lm  63.389449  63.475249

https://cran.r-project.org/web/packages/rollRegres/vignettes/Comparisons.html
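
A minimal usage sketch of rollRegres::roll_regres on simulated data (the toy data and the 50-observation window are our own choices):

library(rollRegres)
set.seed(1)
n <- 250L
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + .5 * dat$x + rnorm(n)
fit <- roll_regres(y ~ x, dat, width = 50L) # rolling 50-observation window
tail(fit$coefs)                             # rolling coefficient estimates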

Smoothed Predicted Random Effect

Log market size is defined as in Shumway (2001). Shown is just the zero-mean random effect \(A_{tj}\).

Out-of-Sample AUCs

Blue: lowest, black: highest. ◇: model as in Duffie, Saita, and Wang (2007), ▽: + covariates and non-linear effects, ▲: + random intercept, and ◆: + random size slope.

Out-of-Sample Industry Default Rate

Bars: 90% prediction interval, ○: realized rate. ◇: model as in Duffie, Saita, and Wang (2007), ▽: + covariates and non-linear effects, ▲: + random intercept, and ◆: + random size slope.

Risk Set's Size and PD Effect

From Christoffersen, Matin, and Mølgaard (2018) with iid random effects and a much larger sample.

Why Another Package?

Cons

The \(\bigO{N}\) smoother has some extra overhead per particle.

fit <- PF_EM(
  ..., control = PF_control(N_fw_n_bw = 200L, N_smooth = 1000L))

The additional smoothing particles require extra evaluations of \(g_t\Cond{\vect y_t}{\vect A_t}\). Expensive!

And What If…

… I have non-binary outcomes or data that are not time-to-event?

E.g., Poisson or gamma distributed outcomes.

Solution

Use the methods suggested by Lin et al. (2005) and Poyiadjis, Doucet, and Singh (2011).

These yield a lower variance of the estimates as a function of time but are \(\bigO{N^2}\).

Use the approximation shown in Klaas et al. (2006), like the one suggested in Gray and Moore (2003).

This reduces the computational complexity to \(\bigO{N\log N}\).

Fast Approximation

##          method
## N         Dual-tree      Naive
##   12288    0.039704  0.7137595
##   24576    0.062140  2.6771194
##   49152    0.115722 10.9796716
##   98304    0.227879         NA
##   196608   0.450209         NA
##   393216   0.913649         NA
##   786432   1.844256         NA
##   1572864  4.057902         NA

https://github.com/boennecd/mssm
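
A hedged sketch of what a Poisson model with mssm may look like, based on our reading of the README at the repository above (the rng, ti, mssm_control, N_part, and n_threads argument names are assumptions to verify there):

library(mssm)
# toy data: counts y with a covariate x over 50 periods
set.seed(4)
dat <- data.frame(
  y = rpois(500L, 2), x = rnorm(500L), ti = rep(1:50, each = 10L))
ll_func <- mssm(
  fixed = y ~ x, family = poisson(), data = dat,
  rng = ~ 1, # random intercept
  ti = ti,   # time index of each observation
  control = mssm_control(N_part = 1000L, n_threads = 4L))
# ll_func can then be used to run the particle filter; see the README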

Starting Values

The package contains a fast Laplace approximation to get starting values.

Cons

Higher memory usage.

It takes some extra work to fit the models from dynamichazard.

Thank You!

Paper is at ssrn.com/abstract=3339981.

Slides are at rpubs.com/boennecd/R-Fin19.

Markdown is at github.com/boennecd/Talks.

More examples at github.com/boennecd/dynamichazard/tree/master/examples.

References on next slide.

References

Berg, Daniel. 2007. “Bankruptcy Prediction by Generalized Additive Models.” Applied Stochastic Models in Business and Industry 23 (2). John Wiley & Sons, Ltd.: 129–43. doi:10.1002/asmb.658.

Briers, Mark, Arnaud Doucet, and Simon Maskell. 2009. “Smoothing Algorithms for State-Space Models.” Annals of the Institute of Statistical Mathematics 62 (1): 61. doi:10.1007/s10463-009-0236-2.

Cappé, O., and E. Moulines. 2005. “Recursive Computation of the Score and Observed Information Matrix in Hidden Markov Models.” In IEEE/SP 13th Workshop on Statistical Signal Processing, 2005, 703–8. doi:10.1109/SSP.2005.1628685.

Christoffersen, Benjamin, and Rastin Matin. 2019. “Modeling Frailty Correlated Defaults with Multivariate Latent Factors.”

Christoffersen, Benjamin, Rastin Matin, and Pia Mølgaard. 2018. “Can Machine Learning Models Capture Correlations in Corporate Distresses?”

Duffie, Darrell, Andreas Eckner, Guillaume Horel, and Leandro Saita. 2009. “Frailty Correlated Default.” The Journal of Finance 64 (5). Blackwell Publishing Inc: 2089–2123. doi:10.1111/j.1540-6261.2009.01495.x.

Duffie, Darrell, Leandro Saita, and Ke Wang. 2007. “Multi-Period Corporate Default Prediction with Stochastic Covariates.” Journal of Financial Economics 83 (3): 635–65. doi:10.1016/j.jfineco.2005.10.011.

Fearnhead, Paul, David Wyncoll, and Jonathan Tawn. 2010. “A Sequential Smoothing Algorithm with Linear Computational Cost.” Biometrika 97 (2). [Oxford University Press, Biometrika Trust]: 447–64. http://www.jstor.org/stable/25734097.

Filipe, Sara Ferreira, Theoharry Grammatikos, and Dimitra Michala. 2016. “Forecasting Distress in European SME Portfolios.” Journal of Banking & Finance 64: 112–35. doi:10.1016/j.jbankfin.2015.12.007.

Gray, A., and A. Moore. 2003. “Rapid Evaluation of Multiple Density Models.” In Artificial Intelligence and Statistics.

Jensen, Thais, David Lando, and Mamdouh Medhat. 2017. “Cyclicality and Firm-Size in Private Firm Defaults.” International Journal of Central Banking 13 (4): 97–145.

Klaas, Mike, Mark Briers, Nando de Freitas, Arnaud Doucet, Simon Maskell, and Dustin Lang. 2006. “Fast Particle Smoothing: If I Had a Million Particles.” In Proceedings of the 23rd International Conference on Machine Learning, 481–88. ICML ’06. New York, NY, USA: ACM. doi:10.1145/1143844.1143905.

Lando, David, and Mads Stenbo Nielsen. 2010. “Correlation in Corporate Defaults: Contagion or Conditional Independence?” Journal of Financial Intermediation 19 (3): 355–72. doi:10.1016/j.jfi.2010.03.002.

Lando, David, Mamdouh Medhat, Mads Stenbo Nielsen, and Søren Feodor Nielsen. 2013. “Additive Intensity Regression Models in Corporate Default Analysis.” Journal of Financial Econometrics 11 (3): 443–85. doi:10.1093/jjfinec/nbs018.

Lin, Ming T, Junni L Zhang, Qiansheng Cheng, and Rong Chen. 2005. “Independent Particle Filters.” Journal of the American Statistical Association 100 (472). Taylor & Francis: 1412–21. doi:10.1198/016214505000000349.

Poyiadjis, George, Arnaud Doucet, and Sumeetpal S. Singh. 2011. “Particle Approximations of the Score and Observed Information Matrix in State Space Models with Application to Parameter Estimation.” Biometrika 98 (1). Biometrika Trust: 65–80. http://www.jstor.org/stable/29777165.

Shumway, Tyler. 2001. “Forecasting Bankruptcy More Accurately: A Simple Hazard Model.” The Journal of Business 74 (1). The University of Chicago Press: 101–24. http://www.jstor.org/stable/10.1086/209665.