Introduction
Banking crises can shake entire economies, leading to widespread financial trouble and uncertainty. By understanding the economic warning signs, such as rising credit growth or inflation, we can better anticipate when a banking crisis might happen. In this project, we’ll explore a range of economic factors to see which ones are most linked to banking crises. Using modern machine learning models, we’ll aim to create a tool that can predict the likelihood of a crisis and give early warnings to help financial institutions and policymakers make informed decisions.
Objectives
Understand Key Economic Factors: We’ll start by examining various economic indicators, such as credit growth, GDP growth, and inflation, to see how they influence the risk of banking crises.
Build Predictive Models: Using machine learning techniques like Random Forest and XGBoost, we’ll create models that can predict the chances of a crisis based on these economic signals.
Improve Model Accuracy: We’ll fine-tune and test different models to make sure they’re as accurate as possible, comparing their performance using metrics like error rates and accuracy.
Handle Imbalanced Data: Since crises are rare events, our data is likely to be imbalanced. We’ll use methods like oversampling and undersampling to make sure our models can still make reliable predictions.
Draw Conclusions and Provide Insights from Machine Learning Models: Finally, we’ll highlight which economic factors are the strongest early predictors of banking crises and offer practical insights into how they can be used to give early warnings.
library(tidyverse)     # data manipulation and plotting
library(caret)         # data partitioning and model training utilities
library(randomForest)  # fitting random forest models
# import the dataset
The_Determinants_of_Systemic_Banking_Crisis <- read_csv("The_Determinants_of_Systemic_Banking_Crisis.csv")
# give the dataset a shorter name
banking_crisis <- The_Determinants_of_Systemic_Banking_Crisis
# preview the first, last, and a random sample of rows
head(banking_crisis, 8)
## # A tibble: 8 × 100
## id country countrycode year rr_full rr_ini rr_cris rr_mult lv_full lv_ini
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 Algeria DZA 1982 0 0 0 0 0 0
## 2 1 Algeria DZA 1983 0 0 0 0 0 0
## 3 1 Algeria DZA 1984 0 0 0 0 0 0
## 4 1 Algeria DZA 1985 0 0 0 0 0 0
## 5 1 Algeria DZA 1986 0 0 0 0 0 0
## 6 1 Algeria DZA 1987 0 0 0 0 0 0
## 7 1 Algeria DZA 1988 0 0 0 0 0 0
## 8 1 Algeria DZA 1989 0 0 0 0 0 0
## # ℹ 90 more variables: lv_cris <dbl>, lv_mult <dbl>, ck_full <dbl>,
## # ck_ini <dbl>, ck_cris <dbl>, ck_mult <dbl>, dd_full <dbl>, dd_ini <dbl>,
## # dd_cris <dbl>, dd_mult <dbl>, rgdpgr <dbl>, wrgdpgr <dbl>, frgdpgr <dbl>,
## # wfrgdpgr <dbl>, lngdppc <dbl>, wlngdppc <dbl>, flngdppc <dbl>,
## # wflngdppc <dbl>, deprec <dbl>, wdeprec <dbl>, fdeprec <dbl>,
## # wfdeprec <dbl>, m2res <dbl>, wm2res <dbl>, fm2res <dbl>, wfm2res <dbl>,
## # wrir <dbl>, rir <dbl>, frir <dbl>, wfrir <dbl>, totch <dbl>, …
tail(banking_crisis, 8)
## # A tibble: 8 × 100
## id country countrycode year rr_full rr_ini rr_cris rr_mult lv_full lv_ini
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 92 Zambia ZMB 2003 0 0 0 0 0 0
## 2 92 Zambia ZMB 2004 0 0 0 0 0 0
## 3 92 Zambia ZMB 2005 0 0 0 0 0 0
## 4 92 Zambia ZMB 2006 0 0 0 0 0 0
## 5 92 Zambia ZMB 2007 0 0 0 0 0 0
## 6 92 Zambia ZMB 2008 0 0 0 0 0 0
## 7 92 Zambia ZMB 2009 0 0 0 0 0 0
## 8 92 Zambia ZMB 2010 0 0 0 0 0 0
## # ℹ 90 more variables: lv_cris <dbl>, lv_mult <dbl>, ck_full <dbl>,
## # ck_ini <dbl>, ck_cris <dbl>, ck_mult <dbl>, dd_full <dbl>, dd_ini <dbl>,
## # dd_cris <dbl>, dd_mult <dbl>, rgdpgr <dbl>, wrgdpgr <dbl>, frgdpgr <dbl>,
## # wfrgdpgr <dbl>, lngdppc <dbl>, wlngdppc <dbl>, flngdppc <dbl>,
## # wflngdppc <dbl>, deprec <dbl>, wdeprec <dbl>, fdeprec <dbl>,
## # wfdeprec <dbl>, m2res <dbl>, wm2res <dbl>, fm2res <dbl>, wfm2res <dbl>,
## # wrir <dbl>, rir <dbl>, frir <dbl>, wfrir <dbl>, totch <dbl>, …
sample_n(banking_crisis, 8)
## # A tibble: 8 × 100
## id country countrycode year rr_full rr_ini rr_cris rr_mult lv_full lv_ini
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 45 India IND 2005 0 0 0 0 0 0
## 2 41 Greece GRC 1988 0 0 0 0 0 0
## 3 18 Cameroon CMR 1988 1 0 NA 2 1 0
## 4 77 Sri Lan… LKA 2008 0 0 0 0 0 0
## 5 32 Ecuador ECU 1997 0 0 0 0 0 0
## 6 82 Thailand THA 1999 1 0 NA 2 1 0
## 7 26 Congo, … COG 2000 1 0 NA 2 0 0
## 8 79 Sweden SWE 2005 0 0 0 0 0 0
## # ℹ 90 more variables: lv_cris <dbl>, lv_mult <dbl>, ck_full <dbl>,
## # ck_ini <dbl>, ck_cris <dbl>, ck_mult <dbl>, dd_full <dbl>, dd_ini <dbl>,
## # dd_cris <dbl>, dd_mult <dbl>, rgdpgr <dbl>, wrgdpgr <dbl>, frgdpgr <dbl>,
## # wfrgdpgr <dbl>, lngdppc <dbl>, wlngdppc <dbl>, flngdppc <dbl>,
## # wflngdppc <dbl>, deprec <dbl>, wdeprec <dbl>, fdeprec <dbl>,
## # wfdeprec <dbl>, m2res <dbl>, wm2res <dbl>, fm2res <dbl>, wfm2res <dbl>,
## # wrir <dbl>, rir <dbl>, frir <dbl>, wfrir <dbl>, totch <dbl>, …
SUMMARY OF THE DATASET
Several economic, financial, and structural factors are typically influential in predicting a banking crisis. The groups below pair each factor with the column names used for it in this analysis.
1. Macroeconomic Indicators
GDP Growth (rgdpgr, frgdpgr, wrgdpgr, etc.): A decline or slowdown in GDP growth can signal economic stress, increasing the likelihood of a banking crisis.
Inflation Rate (infl, finfl, winfl, etc.): High inflation can erode purchasing power and affect the stability of the financial system, contributing to crises.
Exchange Rate Depreciation (deprec, fdeprec, etc.): Large depreciations can affect a country’s ability to repay foreign debts, leading to banking instability.
Interest Rates (rir, frir, wrir): Rising interest rates can increase borrowing costs and the likelihood of defaults, pressuring the banking sector.
2. Financial Sector Variables
Credit Growth (credgr, fcredgr, etc.): Rapid credit growth, especially if unsustainable, can signal a buildup of financial imbalances, increasing the risk of a crisis.
Banking Leverage (lv_full, lv_cris, etc.): High levels of leverage in the banking system increase vulnerability during financial shocks, leading to crises. Note that these columns take only the values 0 and 1 in the summary statistics, so in this dataset they behave as indicator flags rather than continuous ratios.
Liquidity Levels (liq, fliq, etc.): Insufficient liquidity or a liquidity crunch can lead to bank failures or the inability of banks to meet withdrawal demands.
3. External Sector Vulnerabilities
Current Account Balance (cab, wcab, etc.): Persistent current account deficits can indicate external imbalances and dependence on foreign capital, which can trigger crises. No column with these names appears in the summary output of the loaded data.
Net Foreign Assets (nfagdp, wnfagdp, etc.): Negative foreign asset positions can reflect external debt burdens, which make economies vulnerable to crises.
Terms of Trade (totch, wtotch, etc.): A significant deterioration in the terms of trade can hurt export earnings and affect the country’s balance of payments, leading to instability.
4. Banking System Health
Non-performing Loans (NPLs) (npl, fnpl, etc.): An increase in NPLs indicates that borrowers are unable to repay loans, a direct precursor to banking crises. No column with these names appears in the summary output of the loaded data.
Capital Adequacy (ck_full, ck_cris, etc.): Lower capital adequacy ratios make banks more vulnerable to shocks, as they have less buffer to absorb losses. As with the lv_ columns, these columns take only the values 0 and 1 in the summary statistics.
5. Global Factors
Global Economic Growth (wrgdpgr, wfrgdpgr): A global slowdown or recession can impact domestic banking systems, especially in countries with significant external trade or financial links.
Global Financial Conditions (frir, wliq): Global financial tightening, such as rising interest rates or reduced liquidity, can put pressure on banking systems, especially in emerging markets.
In the summary statistics, the w-prefixed columns share their quartiles with the unprefixed columns and differ only in their extremes, so they may be trimmed or winsorized versions of the same series rather than separate global measures.
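Because the groupings above cite column names by hand, a quick check against the loaded data can confirm which names actually exist. The following is a minimal sketch that uses a toy data frame in place of banking_crisis; the helper name missing_columns is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: report which of the expected column names are absent.
missing_columns <- function(df, expected) {
  setdiff(expected, names(df))
}

# Toy stand-in for banking_crisis with a few of the real column names
# (values are illustrative only).
toy <- data.frame(
  rgdpgr = c(3.0, 6.4),
  rir    = c(-11.35, 1.06),
  infl   = c(14.35, 1.94)
)

# "cab" and "npl" are named in the summary above; check whether they exist.
absent <- missing_columns(toy, c("rgdpgr", "rir", "cab", "npl"))
print(absent)
```

On the real data, the same call would be missing_columns(banking_crisis, c(...)) with whichever names are being relied on.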
CLEAN THE DATASET
#count the missing values
sum(is.na(banking_crisis))
## [1] 7281
#remove every row that contains at least one missing value
banking_crisis <- na.omit(banking_crisis)
#count distinct rows (equal to the total row count when there are no duplicates)
n_distinct(banking_crisis)
## [1] 1205
#remove duplicates
banking_crisis <- distinct(banking_crisis)
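Since na.omit() drops a whole row for a single missing value, it can help to look at where the missing values sit before removing anything. The sketch below shows one way to count missing values per column, rows that would be dropped, and exact duplicate rows; it runs on a small toy data frame, and the values are illustrative rather than taken from the dataset.

```r
# Toy stand-in for banking_crisis (values are illustrative only).
toy <- data.frame(
  id     = c(1, 1, 2, 2, 2),
  rgdpgr = c(3.0, NA, 5.4, 5.6, 1.2),
  infl   = c(14.35, 1.94, NA, NA, 4.97)
)
toy <- rbind(toy, toy[5, ])  # append one exact duplicate of row 5

na_per_column  <- colSums(is.na(toy))        # missing values in each column
rows_with_na   <- sum(!complete.cases(toy))  # rows that na.omit() would drop
duplicate_rows <- sum(duplicated(toy))       # exact duplicate rows

print(na_per_column)
print(c(rows_with_na = rows_with_na, duplicate_rows = duplicate_rows))
```

If most missing values come from a handful of columns that are not used as predictors, dropping those columns first would preserve more rows than dropping every incomplete row.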
#summary statistics
summary(banking_crisis)
## id country countrycode year
## Min. : 1.00 Length:1205 Length:1205 Min. :1982
## 1st Qu.:26.00 Class :character Class :character 1st Qu.:1987
## Median :49.00 Mode :character Mode :character Median :1994
## Mean :48.96 Mean :1994
## 3rd Qu.:75.00 3rd Qu.:2001
## Max. :92.00 Max. :2005
## rr_full rr_ini rr_cris rr_mult
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.06224 Mean :0.06224 Mean :0.06224 Mean :0.06224
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## lv_full lv_ini lv_cris lv_mult
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.03651 Mean :0.03651 Mean :0.03651 Mean :0.03651
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## ck_full ck_ini ck_cris ck_mult
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.03402 Mean :0.03402 Mean :0.03402 Mean :0.03402
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## dd_full dd_ini dd_cris dd_mult
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.04564 Mean :0.04564 Mean :0.04564 Mean :0.04564
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## rgdpgr wrgdpgr frgdpgr wfrgdpgr
## Min. :-13.130 Min. :-9.760 Min. :-13.130 Min. :-8.130
## 1st Qu.: 1.620 1st Qu.: 1.620 1st Qu.: 1.620 1st Qu.: 1.620
## Median : 3.670 Median : 3.670 Median : 3.670 Median : 3.670
## Mean : 3.686 Mean : 3.656 Mean : 3.686 Mean : 3.642
## 3rd Qu.: 5.550 3rd Qu.: 5.550 3rd Qu.: 5.550 3rd Qu.: 5.550
## Max. : 33.630 Max. :16.730 Max. : 33.630 Max. :14.000
## lngdppc wlngdppc flngdppc wflngdppc
## Min. : 4.86 Min. : 4.86 Min. : 4.86 Min. : 4.88
## 1st Qu.: 6.16 1st Qu.: 6.16 1st Qu.: 6.16 1st Qu.: 6.16
## Median : 7.86 Median : 7.86 Median : 7.86 Median : 7.86
## Mean : 7.84 Mean : 7.84 Mean : 7.84 Mean : 7.84
## 3rd Qu.: 9.70 3rd Qu.: 9.70 3rd Qu.: 9.70 3rd Qu.: 9.70
## Max. :10.59 Max. :10.55 Max. :10.59 Max. :10.53
## deprec wdeprec fdeprec wfdeprec
## Min. : -29.35 Min. :-22.92 Min. : -29.35 Min. :-21.56
## 1st Qu.: -0.15 1st Qu.: -0.15 1st Qu.: -0.15 1st Qu.: -0.15
## Median : 4.17 Median : 4.17 Median : 4.17 Median : 4.17
## Mean : 26.88 Mean : 13.24 Mean : 26.88 Mean : 12.48
## 3rd Qu.: 14.10 3rd Qu.: 14.10 3rd Qu.: 14.10 3rd Qu.: 14.10
## Max. :13931.90 Max. :302.05 Max. :13931.90 Max. :215.89
## m2res wm2res fm2res wfm2res
## Min. : 0.68 Min. : 0.68 Min. : 0.68 Min. : 0.68
## 1st Qu.: 2.78 1st Qu.: 2.78 1st Qu.: 2.78 1st Qu.: 2.78
## Median : 5.14 Median : 5.14 Median : 5.14 Median : 5.14
## Mean : 10.65 Mean :10.48 Mean : 10.65 Mean :10.34
## 3rd Qu.: 11.00 3rd Qu.:11.00 3rd Qu.: 11.00 3rd Qu.:11.00
## Max. :148.31 Max. :99.05 Max. :148.31 Max. :82.85
## wrir rir frir wfrir
## Min. :-69.440 Min. :-209.890 Min. :-209.890 Min. :-51.270
## 1st Qu.: -0.010 1st Qu.: -0.010 1st Qu.: -0.010 1st Qu.: -0.010
## Median : 3.610 Median : 3.610 Median : 3.610 Median : 3.610
## Mean : 2.589 Mean : 6.127 Mean : 6.127 Mean : 2.805
## 3rd Qu.: 7.440 3rd Qu.: 7.440 3rd Qu.: 7.440 3rd Qu.: 7.440
## Max. : 47.060 Max. :4635.860 Max. :4635.860 Max. : 40.180
## totch wtotch ftotch wftotch
## Min. :-73.450 Min. :-51.940 Min. :-73.450 Min. :-43.01
## 1st Qu.: -5.020 1st Qu.: -5.020 1st Qu.: -5.020 1st Qu.: -5.02
## Median : 0.000 Median : 0.000 Median : 0.000 Median : 0.00
## Mean : 1.968 Mean : 1.212 Mean : 1.968 Mean : 1.07
## 3rd Qu.: 4.310 3rd Qu.: 4.310 3rd Qu.: 4.310 3rd Qu.: 4.31
## Max. :570.360 Max. : 98.020 Max. :570.360 Max. : 74.18
## infl winfl finfl wfinfl
## Min. : -29.17 Min. : -9.82 Min. : -29.17 Min. : -7.73
## 1st Qu.: 2.39 1st Qu.: 2.39 1st Qu.: 2.39 1st Qu.: 2.39
## Median : 5.59 Median : 5.59 Median : 5.59 Median : 5.59
## Mean : 27.00 Mean : 13.65 Mean : 27.00 Mean : 12.98
## 3rd Qu.: 11.59 3rd Qu.: 11.59 3rd Qu.: 11.59 3rd Qu.: 11.59
## Max. :12338.70 Max. :204.10 Max. :12338.70 Max. :149.87
## liq wliq fliq wfliq
## Min. : 15.93 Min. : 16.67 Min. : 15.93 Min. : 18.02
## 1st Qu.: 75.54 1st Qu.: 75.54 1st Qu.: 75.54 1st Qu.: 75.54
## Median :100.16 Median :100.16 Median :100.16 Median :100.16
## Mean :105.73 Mean :105.18 Mean :105.73 Mean :105.12
## 3rd Qu.:128.23 3rd Qu.:128.23 3rd Qu.:128.23 3rd Qu.:128.23
## Max. :429.56 Max. :305.28 Max. :429.56 Max. :298.94
## credgr wcredgr fcredgr wfcredgr
## Min. :-70.070 Min. :-47.920 Min. :-70.070 Min. :-36.760
## 1st Qu.: -2.680 1st Qu.: -2.680 1st Qu.: -2.680 1st Qu.: -2.680
## Median : 2.380 Median : 2.380 Median : 2.380 Median : 2.380
## Mean : 3.019 Mean : 2.813 Mean : 3.019 Mean : 2.747
## 3rd Qu.: 7.400 3rd Qu.: 7.400 3rd Qu.: 7.400 3rd Qu.: 7.400
## Max. :153.260 Max. : 64.080 Max. :153.260 Max. : 49.710
## nfagdp wnfagdp fnfagdp wfnfagdp
## Min. :-128.220 Min. :-52.660 Min. :-128.220 Min. :-43.450
## 1st Qu.: -2.800 1st Qu.: -2.800 1st Qu.: -2.800 1st Qu.: -2.800
## Median : 4.440 Median : 4.440 Median : 4.440 Median : 4.440
## Mean : 4.921 Mean : 5.022 Mean : 4.921 Mean : 5.054
## 3rd Qu.: 11.810 3rd Qu.: 11.810 3rd Qu.: 11.810 3rd Qu.: 11.810
## Max. : 91.280 Max. : 91.280 Max. : 91.280 Max. : 89.960
## var61 var62 var63 durfin
## Min. : 1.00 Min. :1982 Min. :0.00000 Min. : 1.000
## 1st Qu.:26.00 1st Qu.:1987 1st Qu.:0.00000 1st Qu.: 3.000
## Median :49.00 Median :1994 Median :0.00000 Median : 4.000
## Mean :48.96 Mean :1994 Mean :0.06224 Mean : 5.349
## 3rd Qu.:75.00 3rd Qu.:2001 3rd Qu.:0.00000 3rd Qu.: 8.000
## Max. :92.00 Max. :2005 Max. :1.00000 Max. :14.000
## sub anni _merge duratamediaperpaese
## Min. :1.000 Min. : 1.000 Length:1205 Min. :0.03448
## 1st Qu.:2.000 1st Qu.: 3.000 Class :character 1st Qu.:0.10345
## Median :3.000 Median : 4.000 Mode :character Median :0.13793
## Mean :2.716 Mean : 5.349 Mean :0.18443
## 3rd Qu.:3.000 3rd Qu.: 8.000 3rd Qu.:0.27586
## Max. :4.000 Max. :14.000 Max. :0.48276
## _est_spec2 _est_spec3 pra prb
## Min. :1 Min. :1 Min. :0.0006858 Min. :0.006388
## 1st Qu.:1 1st Qu.:1 1st Qu.:0.8160086 1st Qu.:0.026949
## Median :1 Median :1 Median :0.8567500 Median :0.033813
## Mean :1 Mean :1 Mean :0.8387779 Mean :0.039087
## 3rd Qu.:1 3rd Qu.:1 3rd Qu.:0.8857065 3rd Qu.:0.044282
## Max. :1 Max. :1 Max. :0.9684955 Max. :0.973115
## prc dur dur1 cris
## Min. :0.02502 Min. :0 Min. : 0.000 Min. :0.00000
## 1st Qu.:0.08612 1st Qu.:0 1st Qu.: 2.000 1st Qu.:0.00000
## Median :0.10934 Median :0 Median : 3.000 Median :0.00000
## Mean :0.12213 Mean :0 Mean : 3.918 Mean :0.06224
## 3rd Qu.:0.14049 3rd Qu.:0 3rd Qu.: 6.000 3rd Qu.:0.00000
## Max. :0.49858 Max. :0 Max. :13.000 Max. :1.00000
## cris1 avedur xxx duration _est_spec1
## Min. :1.000 Min. : 0.500 Min. : 697 Min. : 1.000 Min. :1
## 1st Qu.:1.000 1st Qu.: 2.000 1st Qu.:1149 1st Qu.: 3.000 1st Qu.:1
## Median :1.000 Median : 3.000 Median :1575 Median : 4.000 Median :1
## Mean :1.431 Mean : 4.009 Mean :1603 Mean : 5.359 Mean :1
## 3rd Qu.:2.000 3rd Qu.: 5.000 3rd Qu.:2064 3rd Qu.: 8.000 3rd Qu.:1
## Max. :4.000 Max. :14.000 Max. :2663 Max. :14.000 Max. :1
## _est_all _est_partial pr0 pr1
## Min. :1 Min. :1 Min. :0.1835 Min. :0.003124
## 1st Qu.:1 1st Qu.:1 1st Qu.:0.8554 1st Qu.:0.018314
## Median :1 Median :1 Median :0.8943 Median :0.025199
## Mean :1 Mean :1 Mean :0.8714 Mean :0.031051
## 3rd Qu.:1 3rd Qu.:1 3rd Qu.:0.9178 3rd Qu.:0.033714
## Max. :1 Max. :1 Max. :0.9854 Max. :0.691349
## pr2 prlogit1 prlogit2 class
## Min. :0.01098 Min. :0.00439 Min. :0.003248 Min. :1.000
## 1st Qu.:0.05757 1st Qu.:0.01952 1st Qu.:0.020200 1st Qu.:1.000
## Median :0.07999 Median :0.02554 Median :0.027911 Median :1.000
## Mean :0.09751 Mean :0.03168 Mean :0.035487 Mean :1.046
## 3rd Qu.:0.11294 3rd Qu.:0.03572 3rd Qu.:0.038132 3rd Qu.:1.000
## Max. :0.58120 Max. :0.52721 Max. :0.733841 Max. :2.000
## binaryclass prob1 prob2 prob3
## Min. :0.00000 Min. :0.4695 Min. :0.4200 Min. :0.4484
## 1st Qu.:0.00000 1st Qu.:0.4943 1st Qu.:0.4422 1st Qu.:0.4590
## Median :0.00000 Median :0.4998 Median :0.4920 Median :0.4708
## Mean :0.04564 Mean :0.4999 Mean :0.4927 Mean :0.4960
## 3rd Qu.:0.00000 3rd Qu.:0.5058 3rd Qu.:0.5383 3rd Qu.:0.5004
## Max. :1.00000 Max. :0.5344 Max. :0.5727 Max. :0.8107
## prob4 prob5 prob6 prob7
## Min. :0.4258 Min. :0.3661 Min. :0.3060 Min. :0.07767
## 1st Qu.:0.4568 1st Qu.:0.4544 1st Qu.:0.4657 1st Qu.:0.43600
## Median :0.4666 Median :0.4935 Median :0.4910 Median :0.48791
## Mean :0.4878 Mean :0.5007 Mean :0.4928 Mean :0.48557
## 3rd Qu.:0.4852 3rd Qu.:0.5381 3rd Qu.:0.5161 3rd Qu.:0.53918
## Max. :0.8399 Max. :0.7756 Max. :0.7131 Max. :0.78755
## prob8 prob9 prob10
## Min. :0.2668 Min. :0.4258 Min. :0.08883
## 1st Qu.:0.4785 1st Qu.:0.4568 1st Qu.:0.37477
## Median :0.4948 Median :0.4666 Median :0.45415
## Mean :0.4916 Mean :0.4878 Mean :0.46274
## 3rd Qu.:0.5121 3rd Qu.:0.4852 3rd Qu.:0.53513
## Max. :0.6546 Max. :0.8399 Max. :0.98872
#check that no duplicate rows remain (the row count should still be 1,205)
unique(banking_crisis)
## # A tibble: 1,205 × 100
## id country countrycode year rr_full rr_ini rr_cris rr_mult lv_full lv_ini
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 Algeria DZA 1982 0 0 0 0 0 0
## 2 1 Algeria DZA 1983 0 0 0 0 0 0
## 3 1 Algeria DZA 1984 0 0 0 0 0 0
## 4 1 Algeria DZA 1985 0 0 0 0 0 0
## 5 1 Algeria DZA 1986 0 0 0 0 0 0
## 6 1 Algeria DZA 1987 0 0 0 0 0 0
## 7 1 Algeria DZA 1988 0 0 0 0 0 0
## 8 1 Algeria DZA 1989 0 0 0 0 0 0
## 9 1 Algeria DZA 1990 1 1 1 1 1 1
## 10 1 Algeria DZA 1995 0 0 0 0 0 0
## # ℹ 1,195 more rows
## # ℹ 90 more variables: lv_cris <dbl>, lv_mult <dbl>, ck_full <dbl>,
## # ck_ini <dbl>, ck_cris <dbl>, ck_mult <dbl>, dd_full <dbl>, dd_ini <dbl>,
## # dd_cris <dbl>, dd_mult <dbl>, rgdpgr <dbl>, wrgdpgr <dbl>, frgdpgr <dbl>,
## # wfrgdpgr <dbl>, lngdppc <dbl>, wlngdppc <dbl>, flngdppc <dbl>,
## # wflngdppc <dbl>, deprec <dbl>, wdeprec <dbl>, fdeprec <dbl>,
## # wfdeprec <dbl>, m2res <dbl>, wm2res <dbl>, fm2res <dbl>, wfm2res <dbl>, …
#structure of the data
str(banking_crisis)
## tibble [1,205 × 100] (S3: tbl_df/tbl/data.frame)
## $ id : num [1:1205] 1 1 1 1 1 1 1 1 1 1 ...
## $ country : chr [1:1205] "Algeria" "Algeria" "Algeria" "Algeria" ...
## $ countrycode : chr [1:1205] "DZA" "DZA" "DZA" "DZA" ...
## $ year : num [1:1205] 1982 1983 1984 1985 1986 ...
## $ rr_full : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ rr_ini : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ rr_cris : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ rr_mult : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ lv_full : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ lv_ini : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ lv_cris : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ lv_mult : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ ck_full : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ ck_ini : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ ck_cris : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ ck_mult : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ dd_full : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ dd_ini : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ dd_cris : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ dd_mult : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ rgdpgr : num [1:1205] 3 6.4 5.4 5.6 3.7 0.4 -0.7 -1 4.4 -0.9 ...
## $ wrgdpgr : num [1:1205] 3 6.4 5.4 5.6 3.7 0.4 -0.7 -1 4.4 -0.9 ...
## $ frgdpgr : num [1:1205] 3 6.4 5.4 5.6 3.7 0.4 -0.7 -1 4.4 -0.9 ...
## $ wfrgdpgr : num [1:1205] 3 6.4 5.4 5.6 3.7 0.4 -0.7 -1 4.4 -0.9 ...
## $ lngdppc : num [1:1205] 7.53 7.56 7.58 7.61 7.61 7.59 7.55 7.51 7.53 7.4 ...
## $ wlngdppc : num [1:1205] 7.53 7.56 7.58 7.61 7.61 7.59 7.55 7.51 7.53 7.4 ...
## $ flngdppc : num [1:1205] 7.53 7.56 7.58 7.61 7.61 7.59 7.55 7.51 7.53 7.4 ...
## $ wflngdppc : num [1:1205] 7.53 7.56 7.58 7.61 7.61 7.59 7.55 7.51 7.53 7.4 ...
## $ deprec : num [1:1205] 12.47 6.4 4.28 4.06 0.89 ...
## $ wdeprec : num [1:1205] 12.47 6.4 4.28 4.06 0.89 ...
## $ fdeprec : num [1:1205] 12.47 6.4 4.28 4.06 0.89 ...
## $ wfdeprec : num [1:1205] 12.47 6.4 4.28 4.06 0.89 ...
## $ m2res : num [1:1205] 4.28 6.04 8.64 12.27 9.59 ...
## $ wm2res : num [1:1205] 4.28 6.04 8.64 12.27 9.59 ...
## $ fm2res : num [1:1205] 4.28 6.04 8.64 12.27 9.59 ...
## $ wfm2res : num [1:1205] 4.28 6.04 8.64 12.27 9.59 ...
## $ wrir : num [1:1205] -11.35 1.06 -3.8 -5.43 -1.97 ...
## $ rir : num [1:1205] -11.35 1.06 -3.8 -5.43 -1.97 ...
## $ frir : num [1:1205] -11.35 1.06 -3.8 -5.43 -1.97 ...
## $ wfrir : num [1:1205] -11.35 1.06 -3.8 -5.43 -1.97 ...
## $ totch : num [1:1205] -3.05 -3.85 -0.25 1.87 5.02 ...
## $ wtotch : num [1:1205] -3.05 -3.85 -0.25 1.87 5.02 ...
## $ ftotch : num [1:1205] -3.05 -3.85 -0.25 1.87 5.02 ...
## $ wftotch : num [1:1205] -3.05 -3.85 -0.25 1.87 5.02 ...
## $ infl : num [1:1205] 14.35 1.94 6.8 8.43 4.97 ...
## $ winfl : num [1:1205] 14.35 1.94 6.8 8.43 4.97 ...
## $ finfl : num [1:1205] 14.35 1.94 6.8 8.43 4.97 ...
## $ wfinfl : num [1:1205] 14.35 1.94 6.8 8.43 4.97 ...
## $ liq : num [1:1205] 158 142 139 135 132 ...
## $ wliq : num [1:1205] 158 142 139 135 132 ...
## $ fliq : num [1:1205] 158 142 139 135 132 ...
## $ wfliq : num [1:1205] 158 142 139 135 132 ...
## $ credgr : num [1:1205] 8.77 19.11 5.98 0.74 2.75 ...
## $ wcredgr : num [1:1205] 8.77 19.11 5.98 0.74 2.75 ...
## $ fcredgr : num [1:1205] 8.77 19.11 5.98 0.74 2.75 ...
## $ wfcredgr : num [1:1205] 8.77 19.11 5.98 0.74 2.75 ...
## $ nfagdp : num [1:1205] 4.91 2.72 0.86 -1.79 -2.77 ...
## $ wnfagdp : num [1:1205] 4.91 2.72 0.86 -1.79 -2.77 ...
## $ fnfagdp : num [1:1205] 4.91 2.72 0.86 -1.79 -2.77 ...
## $ wfnfagdp : num [1:1205] 4.91 2.72 0.86 -1.79 -2.77 ...
## $ var61 : num [1:1205] 1 1 1 1 1 1 1 1 1 1 ...
## $ var62 : num [1:1205] 1982 1983 1984 1985 1986 ...
## $ var63 : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ durfin : num [1:1205] 3 3 3 3 3 3 3 3 3 3 ...
## $ sub : num [1:1205] 2 2 2 2 2 2 2 2 2 2 ...
## $ anni : num [1:1205] 3 3 3 3 3 3 3 3 3 3 ...
## $ _merge : chr [1:1205] "matched (3)" "matched (3)" "matched (3)" "matched (3)" ...
## $ duratamediaperpaese: num [1:1205] 0.103 0.103 0.103 0.103 0.103 ...
## $ _est_spec2 : num [1:1205] 1 1 1 1 1 1 1 1 1 1 ...
## $ _est_spec3 : num [1:1205] 1 1 1 1 1 1 1 1 1 1 ...
## $ pra : num [1:1205] 0.818 0.832 0.817 0.803 0.81 ...
## $ prb : num [1:1205] 0.0446 0.0464 0.0441 0.045 0.0443 ...
## $ prc : num [1:1205] 0.138 0.121 0.139 0.152 0.145 ...
## $ dur : num [1:1205] 0 0 0 0 0 0 0 0 0 0 ...
## $ dur1 : num [1:1205] 2 2 2 2 2 2 2 2 2 2 ...
## $ cris : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ cris1 : num [1:1205] 1 1 1 1 1 1 1 1 1 1 ...
## $ avedur : num [1:1205] 3 3 3 3 3 3 3 3 3 3 ...
## $ xxx : num [1:1205] 1248 1249 1250 1251 1252 ...
## $ duration : num [1:1205] 3 3 3 3 3 3 3 3 3 3 ...
## $ _est_spec1 : num [1:1205] 1 1 1 1 1 1 1 1 1 1 ...
## $ _est_all : num [1:1205] 1 1 1 1 1 1 1 1 1 1 ...
## $ _est_partial : num [1:1205] 1 1 1 1 1 1 1 1 1 1 ...
## $ pr0 : num [1:1205] 0.906 0.915 0.904 0.896 0.881 ...
## $ pr1 : num [1:1205] 0.0279 0.0373 0.0306 0.0305 0.0289 ...
## $ pr2 : num [1:1205] 0.066 0.0478 0.0656 0.0736 0.0897 ...
## $ prlogit1 : num [1:1205] 0.0361 0.039 0.033 0.0324 0.0316 ...
## $ prlogit2 : num [1:1205] 0.032 0.0393 0.0326 0.0321 0.0322 ...
## $ class : num [1:1205] 1 1 1 1 1 1 1 1 2 1 ...
## $ binaryclass : num [1:1205] 0 0 0 0 0 0 0 0 1 0 ...
## $ prob1 : num [1:1205] 0.502 0.492 0.495 0.494 0.5 ...
## $ prob2 : num [1:1205] 0.501 0.5 0.5 0.499 0.499 ...
## $ prob3 : num [1:1205] 0.467 0.475 0.489 0.507 0.493 ...
## $ prob4 : num [1:1205] 0.494 0.455 0.47 0.475 0.465 ...
## $ prob5 : num [1:1205] 0.585 0.559 0.555 0.549 0.544 ...
## $ prob6 : num [1:1205] 0.523 0.574 0.509 0.483 0.493 ...
## $ prob7 : num [1:1205] 0.485 0.5 0.513 0.532 0.539 ...
## $ prob8 : num [1:1205] 0.428 0.483 0.461 0.454 0.47 ...
## $ prob9 : num [1:1205] 0.494 0.455 0.47 0.475 0.465 ...
## [list output truncated]
## - attr(*, "na.action")= 'omit' Named int [1:1463] 10 11 12 13 25 26 27 28 29 30 ...
## ..- attr(*, "names")= chr [1:1463] "10" "11" "12" "13" ...
#dimension of the data
dim(banking_crisis)
## [1] 1205 100
Target Variable
cris1
This is a multiclass variable that indicates the severity level of banking crisis assigned to a country.
It has four distinct levels:
Level 1: The least severe level, or no crisis
Level 2: A moderate crisis level
Level 3: A more severe crisis level
Level 4: The most severe crisis level
cris
This is a binary variable that indicates whether a banking crisis occurred in a given country and year.
It has two distinct levels:
Level 0: No banking crisis
Level 1: Banking crisis
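Both targets are stored as numeric columns, and randomForest() fits a regression when the response is numeric, so converting them to factors is a common first step for classification. A minimal sketch on toy vectors follows; the label names are assumptions that mirror the level descriptions above.

```r
# Toy vectors standing in for banking_crisis$cris1 and banking_crisis$cris.
cris1 <- c(1, 1, 2, 3, 4, 1, 2)
cris  <- c(0, 0, 1, 0, 0, 0, 1)

# Label names are assumptions chosen to match the level descriptions.
cris1_f <- factor(cris1, levels = 1:4,
                  labels = c("low", "moderate", "high", "intensive"))
cris_f  <- factor(cris, levels = c(0, 1),
                  labels = c("no_crisis", "crisis"))

print(table(cris1_f))
print(table(cris_f))
```

Fixing the levels explicitly keeps all four classes in the factor even if one of them happens to be absent from a particular subset.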
# Cross-tabulation of cris1 by country (checking which crisis level,
# 1, 2, etc., each country falls into)
cross_table <- table(banking_crisis$country, banking_crisis$cris1)
# rename the cris1 levels
colnames(cross_table) <- c("low", "moderate", "high", "intensive")
cross_table
##
## low moderate high intensive
## Algeria 20 0 0 0
## Argentina 0 0 18 0
## Australia 21 0 0 0
## Austria 19 0 0 0
## Bangladesh 15 0 0 0
## Belgium 21 0 0 0
## Benin 20 0 0 0
## Bolivia 0 0 14 0
## Brazil 0 0 15 0
## Burkina Faso 18 0 0 0
## Burundi 16 0 0 0
## Cameroon 0 15 0 0
## Canada 22 0 0 0
## Central African Republic 6 0 0 0
## Chad 14 0 0 0
## Chile 18 0 0 0
## China 13 0 0 0
## Colombia 0 14 0 0
## Congo, Rep. 14 0 0 0
## Costa Rica 0 16 0 0
## Cote d'Ivoire 20 0 0 0
## Denmark 0 19 0 0
## Ecuador 12 0 0 0
## Egypt, Arab Rep. 17 0 0 0
## Finland 20 0 0 0
## France 0 20 0 0
## Germany 23 0 0 0
## Ghana 0 12 0 0
## Greece 0 20 0 0
## Guatemala 0 24 0 0
## Honduras 0 22 0 0
## India 17 0 0 0
## Indonesia 0 0 18 0
## Ireland 24 0 0 0
## Italy 19 0 0 0
## Japan 14 0 0 0
## Kenya 0 17 0 0
## Korea, Rep. 0 0 16 0
## Malaysia 0 17 0 0
## Mali 20 0 0 0
## Mexico 8 0 0 0
## Morocco 13 0 0 0
## Nepal 21 0 0 0
## Netherlands 21 0 0 0
## Niger 11 0 0 0
## Nigeria 0 20 0 0
## Norway 18 0 0 0
## Panama 18 0 0 0
## Philippines 13 0 0 0
## Portugal 21 0 0 0
## Senegal 16 0 0 0
## Sierra Leone 15 0 0 0
## Singapore 23 0 0 0
## South Africa 23 0 0 0
## Sri Lanka 20 0 0 0
## Swaziland 20 0 0 0
## Sweden 20 0 0 0
## Switzerland 24 0 0 0
## Thailand 12 0 0 0
## Togo 22 0 0 0
## Tunisia 20 0 0 0
## Turkey 0 0 0 19
## Uganda 20 0 0 0
## United Kingdom 0 0 0 24
## United States 0 12 0 0
## Uruguay 16 0 0 0
## Venezuela, RB 14 0 0 0
## Zambia 21 0 0 0
From the cross-tabulation, we can interpret the distribution of banking crisis levels (cris1) across countries as follows. Each country appears under a single level, and the number in parentheses is the count of yearly observations for that country in the cleaned data.
Countries with cris1 Level 1 (Low Systemic Banking Crisis Countries):
The majority of countries fall into this category, including Germany (23), Singapore (23), Switzerland (24), Ireland (24), South Africa (23), and many others. This level could indicate that, while banking stress may be present, it is not severe enough to destabilize the economy significantly. These might be instances where the banking system faces stress but can manage without major intervention.
Countries with cris1 Level 2 (Moderate Systemic Banking Crisis Countries):
Some countries, such as Denmark (19), France (20), Greece (20), Guatemala (24), Nigeria (20), and Honduras (22), fall into level 2. This level likely represents a moderate systemic banking crisis: these economies have faced substantial banking sector issues historically, requiring interventions but not leading to full-blown collapses.
Countries with cris1 Level 3 (High Systemic Banking Crisis Countries):
Fewer countries fall into this category: Argentina (18), Indonesia (18), Bolivia (14), Brazil (15), and Korea, Rep. (16). This level could indicate a high-severity banking crisis. Countries like Argentina and Indonesia, known for significant financial instability and severe banking issues in the past, are found here. This could suggest a crisis level that disrupts financial systems and requires external support or extensive government bailouts.
Countries with cris1 Level 4 (Intensive Systemic Banking Crisis Countries):
Only Turkey (19) and the United Kingdom (24) fall into the highest crisis level (level 4). This is likely the most severe level, representing very high systemic banking crises. These countries historically faced crises with major impacts on their economies, involving bank failures, large bailouts, or even IMF assistance.
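In the table above every country has non-zero counts in exactly one column, which suggests that cris1 is constant within a country. The sketch below shows one way to verify that and to count countries, rather than country-years, per level. It uses a toy data frame with hypothetical country names.

```r
# Toy stand-in for banking_crisis: one row per country-year.
toy <- data.frame(
  country = c("A", "A", "A", "B", "B", "C"),
  cris1   = c(1, 1, 1, 3, 3, 2)
)

# Number of distinct cris1 values observed within each country.
levels_per_country <- tapply(toy$cris1, toy$country,
                             function(x) length(unique(x)))

# Keep one row per country, then count countries per level.
country_level       <- unique(toy[, c("country", "cris1")])
countries_per_level <- table(country_level$cris1)

print(levels_per_country)
print(countries_per_level)
```

If every entry of levels_per_country is 1, the row counts in the cross-tabulation reflect how many years of data each country contributes, not how often a crisis level occurred.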
# Summary statistics grouped by cris1:
# group by cris1 and compute the mean of key numeric variables
banking_crisis %>%
  group_by(cris1) %>%
  summarise(
    mean_rgdpgr  = mean(rgdpgr, na.rm = TRUE),
    mean_infl    = mean(infl, na.rm = TRUE),
    mean_credgr  = mean(credgr, na.rm = TRUE),
    mean_wrgdpgr = mean(wrgdpgr, na.rm = TRUE),
    mean_deprec  = mean(deprec, na.rm = TRUE),
    mean_wcredgr = mean(wcredgr, na.rm = TRUE),
    mean_wliq    = mean(wliq, na.rm = TRUE),
    mean_liq     = mean(liq, na.rm = TRUE),
    mean_ck_cris = mean(ck_cris, na.rm = TRUE)
  )
## # A tibble: 4 × 10
## cris1 mean_rgdpgr mean_infl mean_credgr mean_wrgdpgr mean_deprec mean_wcredgr
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 3.70 9.72 2.89 3.65 8.79 2.79
## 2 2 3.38 12.0 2.79 3.40 10.8 2.43
## 3 3 4.48 250. 4.24 4.48 264. 3.32
## 4 4 3.57 29.8 4.43 3.57 24.2 4.43
## # ℹ 3 more variables: mean_wliq <dbl>, mean_liq <dbl>, mean_ck_cris <dbl>
Highest Mean:
Inflation (mean_infl) in cris1 Level 3: 249.86%
Interpretation: This extremely high inflation rate suggests that economies at crisis Level 3 may experience hyperinflation or severe economic instability. It indicates a loss of purchasing power and significant pressure on prices, which often happens during severe financial distress. Because the overall inflation series has a median of 5.59 but a maximum of 12,338.70, this mean is probably driven by a small number of extreme observations.
Depreciation (mean_deprec) in cris1 Level 3: 263.99%
Interpretation: The extraordinarily high depreciation in Level 3 signals a collapse in currency value. It often reflects a crisis of confidence in the economy, where exchange rates fall dramatically, making foreign goods and services much more expensive and worsening the financial situation. The same caveat applies here: the overall depreciation series has a median of 4.17 and a maximum of 13,931.90.
Lowest Mean:
Capital Adequacy in Crisis (mean_ck_cris) in cris1 Level 1: 0.026
Interpretation: A low value at crisis Level 1 suggests that banks are not holding as much capital in reserve. This indicates that during less severe crises, banks may feel less pressure to maintain high levels of capital buffers. In contrast, during severe crises, they may need to increase their reserves to protect against financial instability. Note that ck_cris takes only the values 0 and 1 in the summary statistics, so this mean can also be read as the share of Level 1 observations flagged by that indicator, about 2.6%.
Credit Growth (mean_credgr) in cris1 Level 2: 2.79%
Interpretation: The relatively low credit growth in Level 2 suggests that during moderate crises, there may be less lending activity or a slowdown in credit expansion. This could be due to stricter lending conditions, lower demand for loans, or concerns about financial stability.
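Group means can be dominated by a few extreme values; the summary statistics earlier show inflation with a median of 5.59 and a maximum above 12,000. Reporting the median alongside the mean is one way to see how much an outlier is driving a group average. The following sketch uses toy values chosen only to illustrate the effect.

```r
# Toy stand-in: one extreme inflation value in level 3 (illustrative only).
toy <- data.frame(
  cris1 = c(1, 1, 1, 3, 3, 3),
  infl  = c(2.4, 5.6, 11.6, 8.0, 15.0, 12338.7)
)

mean_by_level   <- tapply(toy$infl, toy$cris1, mean)
median_by_level <- tapply(toy$infl, toy$cris1, median)

by_level <- data.frame(
  cris1       = names(mean_by_level),
  mean_infl   = as.numeric(mean_by_level),
  median_infl = as.numeric(median_by_level)
)
print(by_level)
```

In this toy example the level 3 mean is in the thousands while its median stays at 15, which is the pattern to look for in the real grouped summary.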
# visualise the multiclass variable
ggplot(banking_crisis, aes(x = factor(cris1))) +
  geom_bar(fill = "skyblue", color = "black") +
  ggtitle("Count of Multiclass variable (cris1)") +
  xlab("cris1 levels") +
  ylab("count") +
  theme_minimal()
Level 1 (Low Systemic Banking Crisis): highest count (853)
Level 2 (Moderate Systemic Banking Crisis): 228
Level 3 (High Systemic Banking Crisis): 81
Level 4 (Intensive Systemic Banking Crisis): lowest count (43)
These counts are the column sums of the cross-tabulation above. The bar plot highlights the imbalanced nature of the dataset, with Level 1 being far more frequent than the other levels.
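The degree of imbalance also sets a baseline for judging accuracy: a model that always predicts the most frequent level is already right for that share of rows. The sketch below works this out from per-level row counts obtained by summing the columns of the cross-tabulation shown earlier (853, 228, 81, and 43, totalling 1,205).

```r
# Row counts per cris1 level, summed from the cross-tabulation columns.
counts <- c(low = 853, moderate = 228, high = 81, intensive = 43)

# Share of each level in percent, rounded to one decimal place.
shares <- round(100 * counts / sum(counts), 1)

# Accuracy of a trivial model that always predicts the majority level.
baseline_accuracy <- max(counts) / sum(counts)

print(shares)
print(baseline_accuracy)
```

Any accuracy figure reported for the multiclass models is easier to interpret next to this baseline, and per-class metrics such as recall are more informative for the rare levels.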
# split the data into training and test datasets
set.seed(425)
training_sample <- banking_crisis$cris1 %>% createDataPartition(p = 0.8, list = FALSE)
train.data <- banking_crisis[training_sample, ]
test.data <- banking_crisis[-training_sample, ]
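We fix the imbalanced classes of cris1 with hybrid sampling, which combines two steps. Random oversampling duplicates randomly chosen observations from the minority classes until each one matches the majority class.
Random undersampling deletes randomly chosen observations from the majority class until it matches the minority classes. In the code below, every class is first oversampled to the size of the largest class, so the undersampling step leaves the class sizes unchanged.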
# Split the data by class
class1 <- train.data[train.data$cris1 == "1", ]
class2 <- train.data[train.data$cris1 == "2", ]
class3 <- train.data[train.data$cris1 == "3", ]
class4 <- train.data[train.data$cris1 == "4", ]
# Compute the number of samples needed for oversampling
num_samples <- max(nrow(class1), nrow(class2), nrow(class3), nrow(class4))
# Oversample each class to the same number
class1_oversampled <- class1[sample(1:nrow(class1), num_samples, replace = TRUE), ]
class2_oversampled <- class2[sample(1:nrow(class2), num_samples, replace = TRUE), ]
class3_oversampled <- class3[sample(1:nrow(class3), num_samples, replace = TRUE), ]
class4_oversampled <- class4[sample(1:nrow(class4), num_samples, replace = TRUE), ]
# Combine oversampled data
over_sampled <- rbind(class1_oversampled, class2_oversampled, class3_oversampled, class4_oversampled)
# Now perform undersampling on the combined oversampled dataset
# Set the desired number of samples for the majority class
majority_count <- min(table(over_sampled$cris1))
# Undersample each class to the desired number
class1_undersampled <- class1_oversampled[sample(1:nrow(class1_oversampled), majority_count), ]
class2_undersampled <- class2_oversampled[sample(1:nrow(class2_oversampled), majority_count), ]
class3_undersampled <- class3_oversampled[sample(1:nrow(class3_oversampled), majority_count), ]
class4_undersampled <- class4_oversampled[sample(1:nrow(class4_oversampled), majority_count), ]
# Combine undersampled data
final_balanced_data <- rbind(class1_undersampled, class2_undersampled, class3_undersampled, class4_undersampled)
# Split the final balanced dataset into training and testing sets
set.seed(183) # For reproducibility
training.sample <- createDataPartition(final_balanced_data$cris1, p = 0.8, list = FALSE)
train.data <- final_balanced_data[training.sample, ]
test.data <- final_balanced_data[-training.sample, ]
#convert cris1 to a factor in both the train and test sets
train.data$cris1 <- as.factor(train.data$cris1)
test.data$cris1 <- as.factor(test.data$cris1)
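The pipeline above balances the data first and then splits it, so a duplicated row can end up in both the training and the test set. As a point of comparison, here is a minimal base R sketch of the alternative order: hold out the test rows first, then oversample only the training rows. The `toy` data frame, its `id` column, and the 20% hold-out share are hypothetical stand-ins, not the author's data or method.

```r
# Hypothetical toy data standing in for banking_crisis (assumption: cris1 coded 1-4).
set.seed(1)
toy <- data.frame(
  id    = seq_len(115),
  cris1 = rep(1:4, times = c(80, 22, 8, 5)),
  infl  = rnorm(115)
)

# 1. Hold out roughly 20% of each class as the test set BEFORE any resampling.
test_idx <- unname(unlist(lapply(
  split(seq_len(nrow(toy)), toy$cris1),
  function(idx) idx[sample.int(length(idx), size = max(1, round(0.2 * length(idx))))]
)))
toy_test  <- toy[test_idx, ]
toy_train <- toy[-test_idx, ]

# 2. Oversample only the training rows, up to the size of the largest class.
target <- max(table(toy_train$cris1))
balanced_train <- do.call(rbind, lapply(
  split(toy_train, toy_train$cris1),
  function(d) d[sample.int(nrow(d), size = target, replace = TRUE), ]
))

# Every class now has `target` training rows; the test set keeps its natural mix.
print(table(balanced_train$cris1))
print(table(toy_test$cris1))
```

With this order the test set keeps the original class mix, so accuracy on it is closer to what the model would achieve on new country-year observations.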
# train the Random Forest model on the balanced dataset
set.seed(545)
cris1_model_balanced <- train(
cris1 ~ rgdpgr + infl + credgr + wrgdpgr + deprec + wcredgr + wliq + liq + ck_cris,
data = train.data,
method = "rf",
preProcess = c("scale", "center"),
trControl = trainControl(method = "cv", number = 10)
)
cris1_model_balanced
## Random Forest
##
## 2128 samples
## 9 predictor
## 4 classes: '1', '2', '3', '4'
##
## Pre-processing: scaled (9), centered (9)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1914, 1915, 1915, 1916, 1915, 1916, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.9553413 0.9404524
## 5 0.9515854 0.9354460
## 9 0.9473534 0.9298036
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
Random Forest Model Interpretation:
Accuracy: The model reaches a cross-validated accuracy of 95.53% at mtry = 2, meaning it correctly classifies the cris1 level in about 95.53% of the held-out fold cases on average.
Kappa: The Kappa score is also high at 0.9404, indicating that the model performs well beyond what would be expected by random chance.
Performance: The model is performing extremely well on the balanced dataset. The high accuracy and Kappa scores indicate that it is making reliable predictions across all four classes (1, 2, 3, and 4).
Balanced Dataset: Since the dataset is balanced, the model is less likely to be biased toward the majority class. This is reflected in the high Kappa score, which corrects for chance agreement.
Optimal Parameters: The best performance was obtained when 2 randomly chosen predictors were considered at each split (mtry = 2). The gap to mtry = 5 (95.16%) and mtry = 9 (94.74%) is small, so considering fewer variables per split helps only slightly in this case.
#predictions
balanced_predicted <- cris1_model_balanced %>% predict(test.data)
# Create confusion matrix based on test data predictions
balance_confusion <- confusionMatrix(balanced_predicted, test.data$cris1)
balance_confusion
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2 3 4
## 1 118 2 0 0
## 2 12 131 0 0
## 3 2 0 133 0
## 4 1 0 0 133
##
## Overall Statistics
##
## Accuracy : 0.968
## 95% CI : (0.9493, 0.9813)
## No Information Rate : 0.25
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9574
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 1 Class: 2 Class: 3 Class: 4
## Sensitivity 0.8872 0.9850 1.0000 1.0000
## Specificity 0.9950 0.9699 0.9950 0.9975
## Pos Pred Value 0.9833 0.9161 0.9852 0.9925
## Neg Pred Value 0.9636 0.9949 1.0000 1.0000
## Prevalence 0.2500 0.2500 0.2500 0.2500
## Detection Rate 0.2218 0.2462 0.2500 0.2500
## Detection Prevalence 0.2256 0.2688 0.2538 0.2519
## Balanced Accuracy 0.9411 0.9774 0.9975 0.9987
Accuracy: The model predicted correctly 96.8% of the time on the held-out 20% of the balanced data. Because oversampling was done before this split, duplicated rows can appear in both the training and test sets, so this figure is likely optimistic for truly unseen data.
Kappa Score: With a Kappa of 0.957, the model has a strong agreement between the predicted and actual values, which indicates the model is very reliable across different classes.
Class Sensitivity: For classes 3 and 4, the model correctly identified 100% of the cases, while class 2 reached 98.5% and class 1 had a lower sensitivity (around 88.7%). This shows that it’s excellent at predicting some classes but slightly weaker for others.
Misclassification: Most errors came from confusing class 1 and class 2: 12 class 1 cases were predicted as class 2, and 2 class 2 cases were predicted as class 1. Overall, errors were minimal (17 of 532 test cases).
Balanced Accuracy: The model’s ability to correctly predict both positives and negatives is very strong across all classes, with balanced accuracy ranging from 94.11% to 99.87%. This means it’s not biased toward any particular class.
Specificity: The model is great at identifying the correct “non-target” classes too. For all classes, its ability to correctly identify negatives was at least 96.99%, meaning very few false positives.
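The statistics discussed above can be recomputed by hand from the confusion matrix, which makes clear what each one measures. The base R sketch below copies the Random Forest test-set matrix from the output above; the variable names are our own, and `confusionMatrix()` remains the source of the reported figures.

```r
# Random Forest test-set confusion matrix, copied from the output above
# (rows = predicted class, columns = reference class).
cm <- matrix(
  c(118,   2,   0,   0,
     12, 131,   0,   0,
      2,   0, 133,   0,
      1,   0,   0, 133),
  nrow = 4, byrow = TRUE,
  dimnames = list(Prediction = as.character(1:4), Reference = as.character(1:4))
)
n <- sum(cm)

# Overall accuracy: share of cases on the diagonal.
accuracy <- sum(diag(cm)) / n

# Cohen's Kappa: accuracy corrected for the agreement expected by chance.
expected   <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_stat <- (accuracy - expected) / (1 - expected)

# Per-class sensitivity, specificity, and balanced accuracy (one-vs-rest).
sensitivity       <- diag(cm) / colSums(cm)
false_positives   <- rowSums(cm) - diag(cm)
specificity       <- 1 - false_positives / (n - colSums(cm))
balanced_accuracy <- (sensitivity + specificity) / 2

print(round(c(accuracy = accuracy, kappa = kappa_stat), 4))
print(round(rbind(sensitivity, specificity, balanced_accuracy), 4))
```

Because every class has 133 test cases, the chance-agreement term is exactly 0.25, which matches the No Information Rate in the output.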
#accuracy
mean( balanced_predicted == test.data$cris1)
## [1] 0.9680451
The model has an accuracy of approximately 96.80% on the test data.
balanced_importance <- varImp(cris1_model_balanced)
balanced_importance
## rf variable importance
##
## Overall
## infl 100.00
## deprec 80.20
## liq 71.26
## wliq 70.02
## rgdpgr 59.97
## wrgdpgr 58.41
## wcredgr 49.76
## credgr 49.75
## ck_cris 0.00
The Random Forest model is primarily driven by macroeconomic indicators, particularly inflation, depreciation, and liquidity. These variables highlight the economic pressures that can destabilize banking systems. The importance of these drivers reflects how changes in economic conditions directly impact the likelihood of systemic banking crises, influencing the financial health of countries across different crisis levels.
#train the gradient boost model
xgb_model_cris1 <- train(
cris1 ~ rgdpgr + infl + credgr + wrgdpgr + deprec + wcredgr + wliq + liq + ck_cris,
data = train.data,
method = "xgbTree",
preProcess = c("scale", "center"),
trControl = trainControl(method = "cv", number = 10),
tuneGrid = expand.grid(nrounds = 100, max_depth = 6, eta = 0.3, gamma = 0, colsample_bytree = 0.8, min_child_weight = 1, subsample = 0.8)
)
xgb_model_cris1
## eXtreme Gradient Boosting
##
## 2128 samples
## 9 predictor
## 4 classes: '1', '2', '3', '4'
##
## Pre-processing: scaled (9), centered (9)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1916, 1916, 1915, 1916, 1915, 1915, ...
## Resampling results:
##
## Accuracy Kappa
## 0.9534856 0.9379819
##
## Tuning parameter 'nrounds' was held constant at a value of 100
## Tuning parameter 'min_child_weight' was held constant at a value of 1
## Tuning parameter 'subsample' was held constant at a value of 0.8
Performance Across Classes and Final Accuracy
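The run shown above holds every hyperparameter fixed (eta = 0.3, max_depth = 6, subsample = 0.8, colsample_bytree = 0.8) and reaches a cross-validated accuracy of 95.35% with a Kappa of 0.938. Across the settings explored, the highest accuracy is achieved when the learning rate (eta) is 0.3 or 0.4, max_depth = 6, and both subsample and colsample_bytree are set at moderate values (0.6 to 0.8). The model consistently reaches an accuracy of over 90% for these settings, with some combinations achieving 95%.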
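A comparison across settings like the one described above needs a tuning grid with more than one row. The sketch below builds such a grid with base R's `expand.grid`. The specific values are hypothetical, chosen only to match the ranges mentioned in the text; the grid the author actually searched is not shown in this report.

```r
# Hypothetical search grid (assumption: values picked to match the ranges discussed above).
# Column names are the seven tuning parameters caret expects for method = "xgbTree".
xgb_grid <- expand.grid(
  nrounds          = 100,
  max_depth        = c(4, 6),
  eta              = c(0.3, 0.4),
  gamma            = 0,
  colsample_bytree = c(0.6, 0.8),
  min_child_weight = 1,
  subsample        = c(0.6, 0.8)
)

# Number of candidate combinations; with 10-fold CV each one is fitted 10 times.
print(nrow(xgb_grid))
print(head(xgb_grid))
```

Passing `tuneGrid = xgb_grid` to `train()` in place of the single-row grid would make caret report accuracy and Kappa for every row and pick the best one.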
#make predictions
xgb_predictions <- xgb_model_cris1 %>% predict(test.data)
#accuracy
mean(xgb_predictions == test.data$cris1)
## [1] 0.9774436
The accuracy of the XGBoost model is 97.74%. This means that the model correctly predicted the class of the crisis variable (cris1) about 98% of the time when evaluated on the test dataset.
#confusion matrix
xgb_confusion <- confusionMatrix(xgb_predictions, test.data$cris1)
xgb_confusion
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2 3 4
## 1 124 3 0 0
## 2 7 130 0 0
## 3 1 0 133 0
## 4 1 0 0 133
##
## Overall Statistics
##
## Accuracy : 0.9774
## 95% CI : (0.9609, 0.9883)
## No Information Rate : 0.25
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9699
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 1 Class: 2 Class: 3 Class: 4
## Sensitivity 0.9323 0.9774 1.0000 1.0000
## Specificity 0.9925 0.9825 0.9975 0.9975
## Pos Pred Value 0.9764 0.9489 0.9925 0.9925
## Neg Pred Value 0.9778 0.9924 1.0000 1.0000
## Prevalence 0.2500 0.2500 0.2500 0.2500
## Detection Rate 0.2331 0.2444 0.2500 0.2500
## Detection Prevalence 0.2387 0.2575 0.2519 0.2519
## Balanced Accuracy 0.9624 0.9799 0.9987 0.9987
Accuracy: 97.74%. This means the model correctly predicted the class in about 98% of the cases. It’s a high accuracy, suggesting that the model performs well in distinguishing between the four classes. The 95% confidence interval (CI) of the accuracy is (0.9609, 0.9883), indicating strong reliability in the model’s performance.
Kappa: 0.9699. The Kappa score adjusts for chance agreement. It shows that the model’s performance is excellent, with very little agreement expected by chance, as a Kappa above 0.9 is generally considered outstanding.
Class Sensitivity (Recall)
Class 1: 0.9323, Class 2: 0.9774, Class 3: 1.0000, Class 4: 1.0000. Sensitivity, or recall, measures the proportion of true positives correctly identified. The model performs well across all classes, particularly in identifying Classes 3 and 4, with perfect recall (1.0000). Class 1 has a slightly lower recall (0.9323), meaning it missed some instances of Class 1.
Class Specificity
Class 1: 0.9925, Class 2: 0.9825, Class 3: 0.9975, Class 4: 0.9975. Specificity measures how well the model correctly identifies true negatives. For all classes, specificity is high, indicating the model rarely misclassifies other classes as the target class. Classes 3 and 4 have the highest specificity, meaning very few false positives.
Positive Predictive Value (Precision)
Class 1: 0.9764, Class 2: 0.9489, Class 3: 0.9925, Class 4: 0.9925. Precision indicates the proportion of predicted positives that are actually correct. The precision values show that the model is highly accurate in its predictions for all classes, especially Classes 3 and 4 (0.9925) and Class 1 (0.9764).
Balanced Accuracy
Class 1: 0.9624, Class 2: 0.9799, Class 3: 0.9987, Class 4: 0.9987. Balanced accuracy is the average of sensitivity and specificity, offering a more balanced view when class distributions are uneven. The high balanced accuracy values for all classes confirm that the model performs exceptionally well, handling the classification task accurately across all categories.
xgb_importance <- varImp(xgb_model_cris1)
xgb_importance
## xgbTree variable importance
##
## Overall
## wliq 100.00
## infl 99.58
## rgdpgr 74.46
## deprec 60.36
## credgr 40.38
## liq 33.43
## wcredgr 14.62
## wrgdpgr 11.95
## ck_cris 0.00
Conclusion: Comparing Random Forest and XGBoost Models in Predicting Systemic Banking Crises
This analysis aimed to predict systemic banking crises across different countries using Random Forest and XGBoost models, highlighting key economic indicators driving these predictions. Both models demonstrated high accuracy and identified critical variables, with slight variations in importance, helping us understand how early warning signs can manifest in different economic contexts.
Model Comparison: Random Forest vs. XGBoost
Model Accuracy:
Random Forest achieved an accuracy of 96.80%.
Variable Key Drivers: Inflation (infl), Depreciation (deprec), Domestic Liquidity (liq), and Global Liquidity (wliq).
XGBoost achieved a slightly higher accuracy of 97.74%.
Variable Key Drivers: Global Liquidity (wliq), Inflation (infl), GDP Growth (rgdpgr), and Depreciation (deprec).
Both models are highly reliable, but XGBoost outperformed Random Forest by 0.94 percentage points (5 of the 532 test cases), suggesting it might capture complex patterns slightly better in this economic context.
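To judge how much weight the accuracy gap can bear, it helps to look at the uncertainty around each figure. The base R sketch below rebuilds the 95% intervals from the correct-prediction counts in the two confusion matrices above, using `binom.test`. We assume this exact binomial interval is what `confusionMatrix()` reports, so the results should be close to the printed intervals but are not guaranteed to match digit for digit.

```r
# Correct predictions are the diagonals of the two test-set confusion matrices above.
n_test      <- 4 * 133                 # 133 test rows in each of the 4 classes
rf_correct  <- 118 + 131 + 133 + 133   # Random Forest
xgb_correct <- 124 + 130 + 133 + 133   # XGBoost

# Exact binomial 95% confidence intervals for each accuracy.
rf_ci  <- binom.test(rf_correct,  n_test)$conf.int
xgb_ci <- binom.test(xgb_correct, n_test)$conf.int

print(round(c(rf_accuracy  = rf_correct  / n_test, lower = rf_ci[1],  upper = rf_ci[2]),  4))
print(round(c(xgb_accuracy = xgb_correct / n_test, lower = xgb_ci[1], upper = xgb_ci[2]), 4))

# The gap between the models, expressed in test cases.
print(xgb_correct - rf_correct)
```

Each model's accuracy falls inside the other model's interval, so the ranking of the two models on this test set should be read as suggestive rather than conclusive.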
Top Variables Driving Model Predictions:
Both models rank Inflation (infl), Depreciation (deprec), and Global Liquidity (wliq) among their top four predictors of systemic banking crises, while Domestic Liquidity (liq) is third for Random Forest but sixth for XGBoost. These variables align with real-world economic vulnerabilities, highlighting their critical role in early detection.
Comparison of Variable Importance
Inflation (infl):
Random Forest: Highest importance (100.00).
XGBoost: Second highest (99.58), just behind global liquidity.
Interpretation: Both models agree that inflation is among the most crucial variables for early crisis detection. High inflation often indicates economic instability, eroding consumer purchasing power and increasing the likelihood of systemic banking issues.
Depreciation (deprec):
Random Forest: Second most important (80.20).
XGBoost: Fourth most important (60.36).
Interpretation: Currency depreciation affects the external value of a country’s currency, making imports more expensive and potentially leading to broader economic distress. While both models highlight depreciation, Random Forest considers it slightly more impactful.
Global Liquidity (wliq):
Random Forest: Fourth in importance (70.02).
XGBoost: Highest importance (100.00).
Interpretation: Global liquidity conditions, driven by international financial markets, heavily influence domestic banking stability. XGBoost ranks this factor first, suggesting a stronger sensitivity to global financial conditions in its crisis predictions.
Domestic Liquidity (liq):
Random Forest: Third most important (71.26).
XGBoost: Sixth in importance (33.43).
Interpretation: Domestic liquidity impacts how easily assets can be converted to cash within the local banking system. Random Forest finds domestic liquidity more critical, whereas XGBoost emphasizes global over local conditions.
GDP Growth (rgdpgr):
Random Forest: Fifth in importance (59.97).
XGBoost: Third in importance (74.46).
Interpretation: Economic growth reflects overall economic health, with consistent growth reducing the risk of crises. Both models see it as important, though XGBoost places more weight on growth.
Credit Growth (credgr and wcredgr):
Random Forest: Moderate importance (credgr = 49.75, wcredgr = 49.76).
XGBoost: Lower impact (credgr = 40.38, wcredgr = 14.62).
Interpretation: Rapid credit growth can signal increased risk in the banking sector, particularly if lending standards are lax. The Random Forest model finds these metrics more relevant than XGBoost does, especially global credit growth (wcredgr), which may suggest differences in how each algorithm handles credit data.
Capital Adequacy (ck_cris):
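Both Models: Lowest importance (0.00).
Interpretation: Capital adequacy ratios (which assess a bank’s capital against its risk-weighted assets) do not show up as significant in predicting early crises, possibly due to their limited variability or weak direct impact compared to other economic indicators. Because the importance scores are rescaled so that the weakest variable sits at 0 and the strongest at 100, a score of 0 marks ck_cris as the least useful predictor in each model rather than proving it contributes nothing.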
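The 0 to 100 scores in both importance tables are relative, not absolute. The sketch below illustrates this with a min-max rescaling in base R, which we assume mirrors what `varImp()` does under its default `scale = TRUE`. The raw scores are hypothetical and are not taken from either fitted model.

```r
# Hypothetical raw importance scores (NOT taken from the fitted models).
raw_importance <- c(infl = 41.2, deprec = 35.0, liq = 30.1, ck_cris = 4.7)

# Min-max rescaling: the weakest variable maps to 0 and the strongest to 100.
scaled <- (raw_importance - min(raw_importance)) /
  (max(raw_importance) - min(raw_importance)) * 100

print(round(sort(scaled, decreasing = TRUE), 2))
```

In this toy example ck_cris has a positive raw score yet still lands on 0 after rescaling. Calling `varImp(cris1_model_balanced, scale = FALSE)` would show the unscaled values for the fitted model.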
Impact on Countries by Crisis Level:
Crisis Level 1 (Low Crisis): Countries such as Germany, Singapore, Switzerland, Ireland, and South Africa are less sensitive to volatile economic indicators, but continuous monitoring of inflation and depreciation is necessary to maintain stability. These countries are characterized by stable macroeconomic environments. Low inflation, stable currency, and ample liquidity contribute to lower systemic banking risks, as identified by the models. The key variables have minimal impact here due to robust economic policies.
Actionable Insight: Proactive economic management and early interventions in response to inflationary signals can preserve stability.
Crisis Level 2 (Moderate Crisis): Countries like Denmark, France, Greece, Guatemala, Nigeria, and Honduras are at moderate risk. Moderate impacts from credit growth and liquidity factors suggest manageable but rising risks. These countries show occasional spikes in inflation or mild currency depreciation, aligning with the models’ insights. The moderate impact of inflation and liquidity issues suggests potential vulnerabilities that could escalate without timely intervention.
Actionable Insight: Enhancing credit quality and ensuring adequate domestic liquidity are essential preventative measures.
Crisis Level 3 (High Crisis): Countries such as Argentina and Indonesia face significant and persistent systemic risks. Depreciation and GDP growth play significant roles, indicating these countries are vulnerable to both economic contractions and currency pressures.
Actionable Insight: Policies aimed at fostering economic growth and maintaining stable exchange rates could help avert further crises.
Crisis Level 4 (Intensive Crisis): Turkey and the United Kingdom represent the highest crisis level, where economic variables are at critical levels. High inflation, severe depreciation, and liquidity pressures are prominent. These countries face acute economic instability, with both models flagging these indicators as early warning signs. For Turkey, rampant inflation and severe currency depreciation have crippled economic stability. In the UK, while traditionally stable, economic shocks have stressed the banking system, aligning with the models’ predictions. These variables underscore the need for targeted interventions to avoid a full-blown crisis.
Actionable Insight: Monetary tightening, currency stabilization measures, and controlling inflationary pressures are critical for reducing systemic risk.
Final Insights
Both Random Forest and XGBoost models provide valuable insights into detecting early signs of systemic banking crises, with inflation, depreciation, and liquidity consistently identified as critical predictors. While Random Forest emphasizes domestic liquidity and credit factors slightly more, XGBoost highlights the broader impact of global conditions. This analysis suggests that countries experiencing high inflation, depreciation, and liquidity strains are at significant risk, emphasizing the need for vigilant economic policies to manage these variables effectively.