source("Pre-Processing.R")
source("Cox Proportional Hazard Modeling.R")

Since the survival package is relatively straightforward and predicting bank failure is a natural fit for Cox-PH modeling, I first want to play around with fitting all the data we have for a particular pre-crisis date. Comparing univariate and multivariate models, as we’ve seen, might show (1) that some of the CAMEL variables are more statistically significant predictors of failure than others, and (2) that there are interactions at play between these variables.

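Some conventions I’m asserting about the data at first are that:

* Our time variable measures the days between the measurement date and the event date. For example, the San Joaquin Bank (rssd_id = 23266), which failed on 6/26/2009, would have a (rounded integer) time of 727 if we collected data from 2007-06-30.
* The cutoff for the data in the Cox model, which I’ll call the “censoring date”, is December 31, 2012. We don’t consider any data points after that date, so if a bank failed after that point we have no visibility into it. Such a bank is “censored”, in the language of the Cox model, and the date of its “event” is just 2012-12-31.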
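As a rough illustration of these conventions, here is a minimal sketch of how a days-based time column and a failed indicator with right-censoring might be built. The data frame, the failure_date column, and the second bank are hypothetical, and the measurement date (2007-06-30) and censoring date (2012-12-31) are taken as assumptions; the actual logic lives in the sourced pre-processing script and may differ.

```r
# Hypothetical input: one row per bank; failure_date is NA if no failure was observed.
measure_date   <- as.Date("2007-06-30")  # assumed zero date
censoring_date <- as.Date("2012-12-31")  # assumed end of the observation window

banks <- data.frame(
  rssd_id      = c(23266, 99999),        # the second id is made up
  failure_date = as.Date(c("2009-06-26", NA))
)

# A bank counts as failed only if the failure falls inside the window.
is_failed <- !is.na(banks$failure_date) & banks$failure_date <= censoring_date

# Event date: the failure date for failed banks, the censoring date otherwise.
event_date <- banks$failure_date
event_date[!is_failed] <- censoring_date

# Time at risk in whole days since the measurement date, plus a 0/1 event flag.
banks$time   <- as.numeric(event_date - measure_date)
banks$failed <- as.integer(is_failed)

print(banks)
```

The resulting time and failed columns are the pair that Surv(time, failed) expects.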

Building Univariate Models

build_univariate_models()

Chart 1: Measuring predictors on Q2 2007

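Some custom code I’ve adapted produces the table above for the five CAMEL variables in question. Each row represents a CoxPH model fitted with only the CAMEL variable in that row, along with the beta and the hazard ratio, the Wald test statistic, and the p-value for that beta’s significance. This is a sanity check, since the results are very intuitive:

* Here, negative betas imply a decreased likelihood of the event (failure) as the variable increases, and the magnitude describes the size of that effect. Since ROA and Tier 1 capital ratio are both signs of a healthy business when they are high, the negatives here are unsurprising. The same is true for the positive beta on the non-performing loan ratio.
* Cost-to-income barely matters. We get a sense of which variables seem to be valuable and which aren’t.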
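For readers without the sourced script, here is a minimal sketch of how a table like this could be assembled. It is not the actual build_univariate_models(); the helper name univariate_table and the simulated data are assumptions for illustration, and only coxph() and summary() from the survival package are real.

```r
library(survival)

# Simulated stand-in data: column names mimic the CAMEL variables, values are made up.
set.seed(1)
n <- 500
sim <- data.frame(
  rb_tier_1_capital_ratio   = rnorm(n, 0.12, 0.03),
  non_performing_loan_ratio = rexp(n, 50),
  return_on_assets          = rnorm(n, 0.01, 0.01)
)
# Exponential failure times: hazard falls with capital and rises with bad loans.
rate <- 0.001 * exp(-10 * sim$rb_tier_1_capital_ratio +
                      20 * sim$non_performing_loan_ratio)
raw_time   <- rexp(n, rate)
sim$failed <- as.integer(raw_time <= 2000)  # censor everything at day 2000
sim$time   <- pmin(raw_time, 2000)

# Hypothetical helper: fit one Cox model per predictor and collect one row each.
univariate_table <- function(data, predictors) {
  rows <- lapply(predictors, function(v) {
    fit <- coxph(as.formula(paste("Surv(time, failed) ~", v)), data = data)
    s   <- summary(fit)
    data.frame(
      variable     = v,
      beta         = unname(s$coefficients[1, "coef"]),
      hazard_ratio = unname(s$coefficients[1, "exp(coef)"]),
      wald_test    = unname(s$waldtest["test"]),
      p_value      = unname(s$coefficients[1, "Pr(>|z|)"])
    )
  })
  do.call(rbind, rows)
}

result <- univariate_table(sim, c("rb_tier_1_capital_ratio",
                                  "non_performing_loan_ratio",
                                  "return_on_assets"))
print(result)
```

The hazard ratio column is just exp(beta), so a negative beta always corresponds to a hazard ratio below 1.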

build_univariate_models()

Chart 2: Measuring predictors on Q4 2006

Interesting finding: these betas, and their significance levels, are highly dependent on the selected timeframe. In Chart 1, 4/5 variables are significant by themselves at the 0.01 level. When we consider measurements as of 12/31/2006, only 2/5 are.

Finally, out of curiosity, I wonder if this type of model is sensitive to the scaling of the time variable. Instead of days, let’s bucket the failures (or lack of failures) by the number of weeks since the zero date. With the Q2 2007 zero-date again:

build_univariate_models()

Chart 3: Measuring predictors on Q2 2007, weeks (instead of days)

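Literally identical results! Forget I said anything. This makes sense in hindsight: the Cox partial likelihood depends only on the order of the event times, so rescaling the time axis leaves the coefficients unchanged. Rounding into weekly buckets can only matter through the extra ties it creates.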
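The invariance can be checked directly on toy data. The sketch below uses simulated values (all names and parameters are made up) and fits the same model with time in days, in fractional weeks, and in whole weeks.

```r
library(survival)

set.seed(42)
n <- 300
x <- rnorm(n)
days   <- ceiling(rexp(n, 0.002 * exp(0.5 * x)))  # integer days to event
failed <- as.integer(days <= 1500)                # censor at day 1500
days   <- pmin(days, 1500)

fit_days        <- coxph(Surv(days, failed) ~ x)
fit_weeks       <- coxph(Surv(days / 7, failed) ~ x)           # same ordering, same ties
fit_whole_weeks <- coxph(Surv(ceiling(days / 7), failed) ~ x)  # bucketing adds ties

print(c(days        = unname(coef(fit_days)),
        weeks       = unname(coef(fit_weeks)),
        whole_weeks = unname(coef(fit_whole_weeks))))
```

Dividing by 7 preserves the ordering and the ties exactly, so the first two coefficients agree. The whole-week version can differ slightly, because failures on different days of the same week become tied.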

Building Multivariate Models

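Here, we’ll try the same zero-date of Q2 2007 and run the Cox-PH model on all 5 variables to see how they behave together. Note that this additive model adjusts each coefficient for the other variables; it does not include explicit interaction terms. The summary() function provides most of what we’d want to know: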

all_variable_model = coxph(Surv(time, failed) ~ ., data = dataset)
summary(all_variable_model)
Call:
coxph(formula = Surv(time, failed) ~ ., data = dataset)

  n= 7731, number of events= 462 

                                    coef  exp(coef)   se(coef)      z Pr(>|z|)    
rb_tier_1_capital_ratio       -5.391e+00  4.558e-03  7.248e-01 -7.438 1.02e-13 ***
non_performing_loan_ratio      2.998e+00  2.004e+01  9.023e-01  3.322 0.000893 ***
cost_to_income_ratio          -4.731e-05  1.000e+00  8.924e-04 -0.053 0.957716    
return_on_assets              -3.846e+01  1.982e-17  4.151e+00 -9.265  < 2e-16 ***
liquid_assets_on_total_assets -1.908e+00  1.484e-01  3.845e-01 -4.962 6.97e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                              exp(coef) exp(-coef) lower .95 upper .95
rb_tier_1_capital_ratio       4.558e-03  2.194e+02 1.101e-03 1.887e-02
non_performing_loan_ratio     2.004e+01  4.990e-02 3.419e+00 1.175e+02
cost_to_income_ratio          1.000e+00  1.000e+00 9.982e-01 1.002e+00
return_on_assets              1.982e-17  5.045e+16 5.806e-21 6.768e-14
liquid_assets_on_total_assets 1.484e-01  6.738e+00 6.986e-02 3.153e-01

Concordance= 0.681  (se = 0.013 )
Rsquare= 0.018   (max possible= 0.656 )
Likelihood ratio test= 138.2  on 5 df,   p=<2e-16
Wald test            = 187.1  on 5 df,   p=<2e-16
Score (logrank) test = 99.52  on 5 df,   p=<2e-16

Chart 4: Measuring predictors all at once, from Q2 2007 to 2011

Interesting to note how, compared with the univariate fits, the coefficients remain roughly constant for some variables (Tier 1 capital ratio) and shift for others (ROA).
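The hazard ratios in the summary look extreme (ROA’s is about 2e-17) because exp(coef) describes a one-unit change in each variable. Assuming the ratios are stored as fractions, which is my reading of the magnitudes rather than something the data dictionary confirms, a one-unit change is 100 percentage points. A sketch of rescaling to a one-percentage-point change, using the coefficients printed above:

```r
# Coefficients copied from the summary() output above.
coefs <- c(
  rb_tier_1_capital_ratio       = -5.391,
  non_performing_loan_ratio     =  2.998,
  cost_to_income_ratio          = -4.731e-05,
  return_on_assets              = -38.46,
  liquid_assets_on_total_assets = -1.908
)

# Hazard ratio for a 0.01 increase (one percentage point, if the ratios are fractions).
hr_per_point <- exp(coefs * 0.01)
print(round(hr_per_point, 4))
```

On that scale, one extra percentage point of ROA multiplies the failure hazard by roughly 0.68, and one extra point of Tier 1 capital by roughly 0.95.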

Predicting Failure

It appears that MSE and other traditional regression metrics are trickier to quantify for Cox-PH models, but there appears to be some value in comparing the likelihood-ratio tests produced by the different fits as a measure of goodness of fit.
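One metric that does carry over to held-out data is the concordance index, the same quantity summary() reports for the fitted model. The sketch below is an illustration on simulated data, not the analysis in this notebook: the variables x1 and x2, the split sizes, and the effect sizes are all made up. It assumes a reasonably recent version of the survival package, where concordance() accepts a formula and a reverse argument.

```r
library(survival)

set.seed(7)
n <- 2000
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
raw_time   <- rexp(n, 0.001 * exp(1.0 * sim$x1 - 0.5 * sim$x2))
sim$failed <- as.integer(raw_time <= 1000)  # censor at day 1000
sim$time   <- pmin(raw_time, 1000)

# Random 70/30 split into training and held-out rows.
train_idx <- sample(n, 1400)
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

fit <- coxph(Surv(time, failed) ~ x1 + x2, data = train)

# Linear predictor (risk score) for rows the model has not seen.
test$risk <- predict(fit, newdata = test, type = "lp")

# reverse = TRUE because a higher risk score should mean a shorter time to failure.
held_out <- concordance(Surv(time, failed) ~ risk, data = test, reverse = TRUE)
print(held_out$concordance)
```

A value of 0.5 means the risk score orders failures no better than chance, and 1 means it orders them perfectly.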

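Below are the likelihood-ratio (LR) test statistics for models fitted on an n = 100 training set drawn at random from the full set available at that time. I’m a little unsure how to interpret them. The full model does have the highest LR statistic (4.47), but that is guaranteed for a model that nests the others, and it spends 5 degrees of freedom to get there: its p-value (0.5) is no better than any single-variable model’s. With only 10 failures in the sample, none of these fits is significant at the 0.05 level on the LR test, so the comparison says more about the small training set than about the variables.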

just_rb_tier_1_capital_ratio = coxph(Surv(time, failed) ~ rb_tier_1_capital_ratio, data = training_set)
just_non_performing_loan_ratio = coxph(Surv(time, failed) ~ non_performing_loan_ratio, data = training_set)
just_cost_to_income_ratio = coxph(Surv(time, failed) ~ cost_to_income_ratio, data = training_set)
just_return_on_assets = coxph(Surv(time, failed) ~ return_on_assets, data = training_set)
just_liquid_assets_on_total_assets = coxph(Surv(time, failed) ~ liquid_assets_on_total_assets, data = training_set)
full_model = coxph(Surv(time, failed) ~ ., data = training_set)
print(just_rb_tier_1_capital_ratio)
Call:
coxph(formula = Surv(time, failed) ~ rb_tier_1_capital_ratio, 
    data = training_set)

                         coef exp(coef) se(coef)    z    p
rb_tier_1_capital_ratio 1.148     3.153    0.585 1.96 0.05

Likelihood ratio test=2.28  on 1 df, p=0.1
n= 100, number of events= 10 
print(just_non_performing_loan_ratio)
Call:
coxph(formula = Surv(time, failed) ~ non_performing_loan_ratio, 
    data = training_set)

                             coef exp(coef) se(coef)    z    p
non_performing_loan_ratio    8.82   6787.10    12.39 0.71 0.48

Likelihood ratio test=0.4  on 1 df, p=0.5
n= 100, number of events= 10 
print(just_cost_to_income_ratio)
Call:
coxph(formula = Surv(time, failed) ~ cost_to_income_ratio, data = training_set)

                        coef exp(coef) se(coef)     z    p
cost_to_income_ratio -0.0315    0.9690   0.0266 -1.18 0.24

Likelihood ratio test=1.09  on 1 df, p=0.3
n= 100, number of events= 10 
print(just_return_on_assets)
Call:
coxph(formula = Surv(time, failed) ~ return_on_assets, data = training_set)

                      coef exp(coef)  se(coef)     z    p
return_on_assets -4.46e+01  4.40e-20  3.39e+01 -1.31 0.19

Likelihood ratio test=1.4  on 1 df, p=0.2
n= 100, number of events= 10 
print(just_liquid_assets_on_total_assets)
Call:
coxph(formula = Surv(time, failed) ~ liquid_assets_on_total_assets, 
    data = training_set)

                                 coef exp(coef) se(coef)     z     p
liquid_assets_on_total_assets -2.8173    0.0598   1.3932 -2.02 0.043

Likelihood ratio test=2.8  on 1 df, p=0.09
n= 100, number of events= 10 
print(full_model)
Call:
coxph(formula = Surv(time, failed) ~ ., data = training_set)

                                   coef exp(coef)  se(coef)     z    p
rb_tier_1_capital_ratio        3.06e-02  1.03e+00  1.45e+00  0.02 0.98
non_performing_loan_ratio      1.28e+01  3.51e+05  1.54e+01  0.83 0.41
cost_to_income_ratio          -3.26e-02  9.68e-01  3.32e-02 -0.98 0.33
return_on_assets              -3.40e+00  3.33e-02  6.28e+01 -0.05 0.96
liquid_assets_on_total_assets -2.68e+00  6.87e-02  3.15e+00 -0.85 0.40

Likelihood ratio test=4.47  on 5 df, p=0.5
n= 100, number of events= 10 
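The LR statistics printed above are on different degrees of freedom, so they are not directly comparable. A minimal sketch of putting them on a common footing follows. The numbers are copied from the output above; the AIC difference against the null model uses the standard identity 2 * df - LR, which is my addition and not something the fits print.

```r
# LR statistics and degrees of freedom copied from the printed fits above.
lr <- c(
  rb_tier_1_capital_ratio       = 2.28,
  non_performing_loan_ratio     = 0.40,
  cost_to_income_ratio          = 1.09,
  return_on_assets              = 1.40,
  liquid_assets_on_total_assets = 2.80,
  full_model                    = 4.47
)
df <- c(1, 1, 1, 1, 1, 5)

# p-value of each LR test against the null (no-covariate) model.
p_value <- pchisq(lr, df = df, lower.tail = FALSE)

# AIC relative to the null model: negative means better than no covariates at all.
delta_aic <- 2 * df - lr

print(data.frame(lr, df, p_value = round(p_value, 3), delta_aic))
```

By this yardstick the full model is penalized for its five parameters, and the single-variable liquid-assets model comes out ahead on this particular training sample.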