Detecting Macroeconomic Risk Using Global Indicators: A Machine Learning Approach

Project Objective

The objective of this project is to identify and classify macroeconomic risk across countries using global macroeconomic indicators. By leveraging World Bank data and applying modern machine learning classification techniques, the study aims to:

1.Detect countries facing elevated macroeconomic risk

2.Identify which economic variables are most strongly associated with risk

3.Evaluate and compare the performance of multiple models, including:

Logistic Regression (baseline)

Random Forest

XGBoost

4.Assess potential data leakage and model overfitting

5.Test model robustness after removing potentially problematic variables

The broader goal is to demonstrate how data-driven risk detection can complement traditional macroeconomic analysis and support early-warning systems for economic stress.

Summary

In this project, we built a predictive model to identify high-risk economies using global macroeconomic and financial data from the World Bank. By combining statistical analysis and machine learning, we identified the key drivers of economic risk and produced highly accurate country-level risk predictions.

Key Findings:

Economic Foundations of Risk:

ANOVA confirmed that GDP growth differs significantly across economy sizes, particularly between small and large economies.

Regression analysis highlighted GDP per capita growth, inflation, lending rates, and current account balances as the strongest predictors of economic risk, providing a clear macroeconomic rationale.

Machine Learning Insights:

Logistic Regression achieved 85.7% accuracy, offering interpretable coefficients that revealed how growth and financial indicators influence risk.

Random Forest captured non-linear relationships and interactions, achieving near-perfect accuracy (98%) and highlighting GDP growth, current account balance, and lending rate as the top predictors.

XGBoost provided a balance of predictive power (95% accuracy) and interpretability, confirming the importance of macroeconomic indicators while handling complex patterns efficiently.

Robustness and Feature Validation:

After removing potentially “leaky” variables (e.g., non-performing loans, private sector credit), both Random Forest and XGBoost maintained strong performance (91–96% accuracy), demonstrating robustness and real-world applicability.

Feature importance consistently pointed to GDP growth, current account balance, and lending rate as the main drivers of risk, aligning statistical findings with machine learning insights.

Data Source

The macroeconomic indicators were obtained from the World Bank Open Data platform. Annual country-level data were collected for inflation, GDP growth, interest rates, current account balances, unemployment, exchange rates, and financial risk indicators.

Data Preparation Raw datasets were cleaned, harmonized, and merged to create a unified global macroeconomic dataset. This included:

Standardizing country and year identifiers

Handling missing values

Removing countries with insufficient observations

Creating economically meaningful risk indicators

For clarity and readability, the analysis below begins with the final cleaned dataset used for modeling. Full data preparation scripts are provided in the appendix / repository.

#load libraries

library(tidyverse)
library(rstatix)
library(lubridate)
library(fpp3)
library(fixest)
library(rstatix)
library(scales)
library(corrplot)
library(plotly)
library(patchwork)
library(broom)
library(car)
library(ggpubr)
library(multcomp)
library(cluster)
library(factoextra)
library(caret)
library(randomForest)
library(xgboost)
library(glmnet)
library(pROC)
library(MLmetrics)
library(DALEX)
library(vip)
library(recipes)
library(forecast)
library(tseries)
library(prophet)
library(xts)
library(lubridate)
library(gridExtra)
library(ggfortify)
library(ROSE)
library(smotefamily)
library(themis)

Global_Rates <- read_csv("Global_Rates.csv")

#clean the dataset..

glimpse(Global_Rates)

## Rows: 1,022
## Columns: 16
## $ Country                 <chr> "Angola", "Angola", "Angola", "Angola", "Angol…
## $ Country_Code            <chr> "AGO", "AGO", "AGO", "AGO", "AGO", "AGO", "AGO…
## $ Years                   <dbl> 2015, 2016, 2017, 2018, 2019, 2019, 2020, 2021…
## $ GDP_current_us          <dbl> 90496420507, 52761617226, 73690154991, 7945068…
## $ GDP_per_capita_growth   <dbl> -2.630702, -6.002649, -3.620758, -4.665970, -4…
## $ Inflation               <dbl> 9.355972, 30.694415, 29.844480, 19.628938, 17.…
## $ Lending_rate            <dbl> 16.881862, 15.780504, 15.806100, 20.677004, 19…
## $ Deposit_rate            <dbl> 3.3146202, 5.5443181, 6.3421932, 6.8766571, 6.…
## $ Current_account_balance <dbl> -3747517596, -10272841903, -3085195463, -63286…
## $ Exchange_rate           <dbl> 98.30242, 120.06070, 163.65643, 165.91595, 252…
## $ Non_performing_loans    <dbl> 10.61123, 11.28509, 25.83606, 23.22174, 23.145…
## $ Private_sector_credit   <dbl> 25.24011, 21.09841, 17.00069, 14.93527, 15.340…
## $ Unemployment            <dbl> 16.490, 16.575, 16.610, 16.594, 16.497, 16.417…
## $ Interest_rate_spread    <dbl> 12.855307, 13.567242, 10.236186, 9.463907, 13.…
## $ Real_interest_rate      <dbl> 21.14418153, -4.92200310, -5.55269798, -5.8440…
## $ Risk_premium            <dbl> 12.1029654, 9.6780430, 0.6885965, -0.5418163, …

#remove the null values
Global_Rates <- Global_Rates %>%
  na.omit()



#remove all duplicates

Global_Rates <- Global_Rates %>%
  distinct()

# Check the temporal and cross-sectional dimensions
summary_stats <- Global_Rates %>%
  summarise(
    n_countries = n_distinct(Country),
    n_years = n_distinct(Years),
    avg_years_per_country = n() / n_distinct(Country)
  )

print(summary_stats)

## # A tibble: 1 × 3
##   n_countries n_years avg_years_per_country
##         <int>   <int>                 <dbl>
## 1          59      19                  17.3

#summary statistics 

summary(Global_Rates)

##    Country          Country_Code           Years      GDP_current_us     
##  Length:1022        Length:1022        Min.   :2005   Min.   :7.866e+08  
##  Class :character   Class :character   1st Qu.:2012   1st Qu.:1.158e+10  
##  Mode  :character   Mode  :character   Median :2015   Median :5.748e+10  
##                                        Mean   :2015   Mean   :3.298e+11  
##                                        3rd Qu.:2019   3rd Qu.:3.014e+11  
##                                        Max.   :2023   Max.   :6.272e+12  
##  GDP_per_capita_growth   Inflation        Lending_rate     Deposit_rate   
##  Min.   :-34.8312      Min.   :-16.860   Min.   : 0.994   Min.   : 0.010  
##  1st Qu.:  0.5587      1st Qu.:  2.272   1st Qu.: 6.580   1st Qu.: 1.459  
##  Median :  2.4983      Median :  4.170   Median : 9.412   Median : 3.705  
##  Mean   :  2.1045      Mean   :  4.954   Mean   :12.278   Mean   : 4.832  
##  3rd Qu.:  4.3150      3rd Qu.:  6.582   3rd Qu.:14.276   3rd Qu.: 7.673  
##  Max.   : 33.7686      Max.   : 30.694   Max.   :60.000   Max.   :19.800  
##  Current_account_balance Exchange_rate       Non_performing_loans
##  Min.   :-1.105e+11      Min.   :    0.718   Min.   : 0.4203     
##  1st Qu.:-3.008e+09      1st Qu.:    4.080   1st Qu.: 2.6044     
##  Median :-6.860e+08      Median :   18.097   Median : 4.4908     
##  Mean   : 4.505e+08      Mean   :  584.567   Mean   : 6.3027     
##  3rd Qu.: 8.983e+08      3rd Qu.:  103.250   3rd Qu.: 8.6859     
##  Max.   : 2.209e+11      Max.   :21697.568   Max.   :47.5958     
##  Private_sector_credit  Unemployment    Interest_rate_spread Real_interest_rate
##  Min.   :  7.052       Min.   : 0.061   Min.   : 0.340       Min.   :-20.497   
##  1st Qu.: 23.319       1st Qu.: 3.378   1st Qu.: 3.348       1st Qu.:  2.356   
##  Median : 37.720       Median : 4.967   Median : 5.455       Median :  5.064   
##  Mean   : 54.936       Mean   : 7.560   Mean   : 7.531       Mean   :  7.021   
##  3rd Qu.: 67.962       3rd Qu.:10.220   3rd Qu.: 7.701       3rd Qu.:  8.958   
##  Max.   :264.442       Max.   :34.007   Max.   :49.046       Max.   : 54.678   
##   Risk_premium    
##  Min.   :-10.838  
##  1st Qu.:  2.927  
##  Median :  4.920  
##  Mean   :  6.908  
##  3rd Qu.:  7.666  
##  Max.   : 52.310

Feature Engineering

grouping countries by their average GDP over time and uses it to classify each country into a size category (Small, Medium, Large, or Very Large economy)

#Country Group Analysis 

country_summary <- Global_Rates %>%
  group_by(Country) %>%
  summarize(
    avg_gdp = mean(GDP_current_us, na.rm = TRUE),
    max_GDP = max(GDP_current_us, na.rm = TRUE),
    .groups = 'drop'
  )
 


 
#country group gdp

country_gdp_group <- country_summary %>% 
  mutate(
    Economy_Size = case_when(
   avg_gdp <  50e9 ~ "Small_Economy" ,#less than 10 billion dollars
   avg_gdp <  500e9 ~ "Medium_Economy",# From $50 billion to $500 billion
   avg_gdp <  2e12 ~  "Large_Economy", # from $500 billion to $2 trillion
   TRUE ~ "Very_Large"       #More than $2 trillion 
  )
         ) %>%
  dplyr::select(Country, Economy_Size)


#join the datset to globala rates

Global_Rates_Economy_Size <- Global_Rates %>%
  left_join(country_gdp_group, by = "Country")

#clean the data types 
Global_Rates_Economy_Size <- Global_Rates_Economy_Size %>%
  mutate(Country = as.factor(Country),
         Economy_Size = factor(Economy_Size,
                               levels = c( "Small_Economy", "Medium_Economy",
                                           "Large_Economy", "Very_Large"),
                               ordered = TRUE),
         Years = as.integer(Years))

# make current account as the percentage of GDP 

Global_Rates_Economy_Size <- Global_Rates_Economy_Size %>%
  mutate(Current_Account_GDP_Perc = (Current_account_balance / GDP_current_us) * 100)
  
 
#make the risk flags 


Global_Rates_Economy_Size <- Global_Rates_Economy_Size %>%
  mutate(High_Inflation = ifelse(Inflation > 15, 1, 0),
         GDP_Decline = ifelse(GDP_per_capita_growth < 0, 1 , 0),
         High_NPL = ifelse(Non_performing_loans > 10, 1, 0),
         Negative_CA = ifelse(Current_account_balance < 0, 1 , 0))



# make the risk -scores 


Global_Rates_Economy_Size <- Global_Rates_Economy_Size %>%
  mutate(Risk_Score = High_Inflation + GDP_Decline + High_NPL + Negative_CA,
         Risk_Level = case_when(
           Risk_Score >= 3 ~ "High",
           Risk_Score == 2 ~ "Medium",
           Risk_Score == 1 ~ "Low", 
           Risk_Score == 0 ~ "Very Low"
         ))

#summary statistics


summary_by_size <- Global_Rates_Economy_Size %>%
  group_by(Country) %>%
  summarise(
    n_countries = n_distinct(Country),
    n_observations = n(),
    avg_gdp_growth = mean(GDP_per_capita_growth, na.rm = TRUE),
    avg_inflation = mean(Inflation, na.rm = TRUE),
    avg_lending_rate = mean(Lending_rate, na.rm = TRUE)
  )

print(summary_by_size, n = Inf)

## # A tibble: 59 × 6
##    Country               n_countries n_observations avg_gdp_growth avg_inflation
##    <fct>                       <int>          <int>          <dbl>         <dbl>
##  1 Albania                         1             26         3.80           2.67 
##  2 Algeria                         1             22         0.530          5.12 
##  3 Angola                          1              9        -4.17          21.9  
##  4 Antigua and Barbuda             1              1         6.17           1.21 
##  5 Armenia                         1             24         4.69           3.61 
##  6 Australia                       1             10         1.14           2.58 
##  7 Bangladesh                      1             19         5.48           7.22 
##  8 Barbados                        1              6         0.0434         3.21 
##  9 Belize                          1              9         0.548          1.43 
## 10 Bolivia                         1             17         3.36           4.70 
## 11 Brazil                          1             37         1.21           5.75 
## 12 Bulgaria                        1             16         2.72           2.80 
## 13 Canada                          1              8         1.20           2.18 
## 14 Czechia                         1             22         1.32           2.18 
## 15 Eswatini                        1             12         2.46           6.05 
## 16 Fiji                            1             16         1.15           2.79 
## 17 Gambia, The                     1             15         1.14           5.70 
## 18 Georgia                         1             25         5.16           4.33 
## 19 Grenada                         1              2         2.28           0.701
## 20 Hong Kong SAR, China            1             32         1.19           2.70 
## 21 Hungary                         1             32         2.14           4.70 
## 22 Israel                          1             18         2.07           0.723
## 23 Japan                           1             16         1.58           0.400
## 24 Kenya                           1             17         2.58           7.00 
## 25 Kosovo                          1              6         4.84           0.746
## 26 Kyrgyz Republic                 1             28         2.11           6.94 
## 27 Lebanon                         1              7        -0.994          2.57 
## 28 Lesotho                         1             16        -0.0817         3.57 
## 29 Madagascar                      1             24         0.136          8.63 
## 30 Malaysia                        1             18         3.09           2.66 
## 31 Maldives                        1             15         3.23           1.90 
## 32 Mauritius                       1             30         2.80           3.77 
## 33 Mexico                          1             38         0.507          4.43 
## 34 Moldova                         1             28         4.32           7.92 
## 35 Montenegro                      1             26         2.76           3.04 
## 36 Mozambique                      1             12         1.32           6.97 
## 37 Namibia                         1             10        -2.20           4.72 
## 38 Nigeria                         1             24         1.12          12.3  
## 39 Pakistan                        1             15         2.75           6.34 
## 40 Papua New Guinea                1             15         1.27           4.81 
## 41 Philippines                     1             18         4.67           2.93 
## 42 Romania                         1             30         3.35           4.19 
## 43 Rwanda                          1             19         4.07           5.82 
## 44 Seychelles                      1              4         8.01           1.74 
## 45 Singapore                       1             14         2.56           3.33 
## 46 Solomon Islands                 1              7        -1.05           1.20 
## 47 South Africa                    1             28        -0.132          5.47 
## 48 Sri Lanka                       1             18         4.56           5.05 
## 49 St. Lucia                       1              8         1.43           0.137
## 50 St. Vincent and the …           1              4         3.52           0.649
## 51 Tajikistan                      1              8         4.35           6.69 
## 52 Tanzania                        1             15         2.60           7.67 
## 53 Thailand                        1             26         1.87           1.53 
## 54 Trinidad and Tobago             1             30        -0.439          5.21 
## 55 Uganda                          1             13         1.92           7.60 
## 56 Uruguay                         1             18         0.853          8.05 
## 57 Uzbekistan                      1              7         3.30          12.4  
## 58 Viet Nam                        1             17         4.81           8.14 
## 59 Zambia                          1             15        -0.127          9.59 
## # ℹ 1 more variable: avg_lending_rate <dbl>

Macro relationships visible in the table

When Inflation increase leads Lending rates to increase This pattern is very clear: High-inflation countries almost always have high lending rates This supports the Fisher effect + risk premium channel

High lending rates mostly have weaker growth

Countries with lending rates above ~15% tend to Grow slowly and Experience volatile or negative growth

Low inflation + low rates ≠ high growth

Advanced economies: These economies are at the technological frontier Growth is constrained by productivity, not capital costs Japan, Canada, Australia : Low inflation, low rates, but modest growth

ECONOMIC RESILIENCE ANALYSIS - EDA & VISUALIZATIONS

QUESTION 1: How do key indicators distribute across economy sizes?

#GDP Growth Distribution 

ggplot(Global_Rates_Economy_Size, aes(x = Economy_Size, y = GDP_per_capita_growth, fill = Economy_Size)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.3, size = 1) +
  labs(title = "GDP Growth Distribution by Economy Size",
       subtitle = "Larger economies show more stable growth",
       x = "Economy Size",
       y = "GDP per Capita Growth (%)") +
  scale_fill_brewer(palette = "Set2") +
  coord_flip()

Small_Economy to Very_Large:

The spread (range/interquartile range) of growth rates narrows. Extreme highs and lows (outliers) are fewer for larger economies. Growth rates cluster closer to the median for larger economies which means it is more stability.

Bigger economies don’t necessarily grow faster—but their growth is more predictable and less bumpy. Small economies often experience sharper booms and deeper busts.

Inflation Distribution

ggplot(Global_Rates_Economy_Size, aes(x = Economy_Size, y = Inflation, fill = Economy_Size)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.3, size = 1) +
  labs(title = "Inflation Distribution by Economy Size",
       subtitle = "Smaller economies experience higher inflation volatility",
       x = "Economy Size",
       y = "Inflation Rate (%)") +
  scale_fill_brewer(palette = "Set2") +
  coord_flip() +
  scale_y_log10()  # Log scale for better visualization

From Small to Very Large

Inflation ranges from very low to extremely high Many outliers in small and medieum economies, indicating frequent inflation shocks

Moving from small to very large economies, inflation distributions exhibit a declining median and a narrowing dispersion, indicating that larger economies experience more stable and predictable inflation dynamics than small and medium economies

What’s the relationship between inflation and growth?

ggplot(Global_Rates_Economy_Size, aes(x = Inflation, y = GDP_per_capita_growth))+
  geom_point(aes(colour = Economy_Size, size = GDP_current_us), alpha = 0.6)+
  geom_smooth(method = "lm", se = FALSE, color = "darkred", linetype = "dashed")+
  scale_x_log10()+
  scale_size_continuous(range = c(1, 10), 
                        labels = scales::dollar_format(scale = 1e-9, suffix = "B")) +
  labs(title = "Inflation vs. GDP Growth",
       subtitle = "High inflation often correlates with lower growth",
       x = "Inflation Rate (log scale, %)",
       y = "GDP per Capita Growth (%)",
       size = "GDP (USD)",
       color = "Economy Size") +
  facet_wrap(~Economy_Size, scales = "free") +
  theme(legend.position = "right")

Across all economy sizes, higher inflation is generally associated with lower GDP per capita growth, with the negative relationship becoming clearer and more stable as economy size increases.

How have key indicators evolved over time

#calculate yearly averages 


yearly_avg <- Global_Rates_Economy_Size %>%
  group_by(Years, Economy_Size) %>%
  summarise(
    avg_gdp = mean(GDP_per_capita_growth, na.rm = TRUE),
    avg_inflation = mean(Inflation, na.rm = TRUE),
    avg_lending_rate = mean(Lending_rate, na.rm = TRUE),
    n_countries = n(),
    .groups = "drop"
  )


print(yearly_avg, n = Inf)

## # A tibble: 65 × 6
##    Years Economy_Size   avg_gdp avg_inflation avg_lending_rate n_countries
##    <int> <ord>            <dbl>         <dbl>            <dbl>       <int>
##  1  2005 Small_Economy    1.73        18.4              25.9             2
##  2  2005 Medium_Economy   2.97         2.98              5.95            2
##  3  2005 Large_Economy    1.69         4.36             23.2             6
##  4  2006 Small_Economy   -0.502        6.41             29.0             2
##  5  2006 Medium_Economy   3.26         3.61              6.49            1
##  6  2006 Large_Economy    2.62         3.27             21.4             6
##  7  2007 Small_Economy    1.37         7.83             34.1             2
##  8  2007 Medium_Economy   3.85         3.71             11.7             2
##  9  2007 Large_Economy    2.24         3.25             19.1             6
## 10  2008 Small_Economy    3.19         9.45             24.0             4
## 11  2008 Medium_Economy   3.08         8.65             10.2            17
## 12  2008 Large_Economy    1.16         4.39             20.2             6
## 13  2009 Small_Economy   -0.992        2.91             16.2             9
## 14  2009 Medium_Economy  -2.37         4.10              9.78           20
## 15  2009 Large_Economy   -4.34         5.09             25.9             4
## 16  2010 Small_Economy    3.03         5.88             17.7            20
## 17  2010 Medium_Economy   3.62         4.48              9.13           20
## 18  2010 Large_Economy    2.97         3.84             13.0             5
## 19  2010 Very_Large       4.08        -0.728             1.60            2
## 20  2011 Small_Economy    3.27         7.85             15.1            35
## 21  2011 Medium_Economy   3.32         6.67              9.34           30
## 22  2011 Large_Economy    2.04         4.45             18.8             6
## 23  2011 Very_Large       0.209       -0.272             1.50            2
## 24  2012 Small_Economy    2.37         5.88             16.8            35
## 25  2012 Medium_Economy   2.53         5.42              9.13           28
## 26  2012 Large_Economy    1.79         3.76             16.1             6
## 27  2012 Very_Large       1.54        -0.0441            1.41            2
## 28  2013 Small_Economy    3.38         4.52             14.7            35
## 29  2013 Medium_Economy   2.80         4.59              9.04           30
## 30  2013 Large_Economy    0.859        4.15             12.6             6
## 31  2013 Very_Large       2.15         0.335             1.30            2
## 32  2014 Small_Economy    3.13         3.67             13.6            38
## 33  2014 Medium_Economy   3.03         3.11              7.75           34
## 34  2014 Large_Economy    0.681        4.28             13.8             6
## 35  2014 Very_Large       0.429        2.76              1.22            2
## 36  2015 Small_Economy    2.28         3.82             14.5            40
## 37  2015 Medium_Economy   2.42         2.94              8.31           33
## 38  2015 Large_Economy   -1.35         5.88             23.7             4
## 39  2015 Very_Large       1.67         0.795             1.14            2
## 40  2016 Small_Economy    2.44         3.48             13.3            44
## 41  2016 Medium_Economy   2.52         4.33              8.31           37
## 42  2016 Large_Economy   -1.61         5.78             28.4             4
## 43  2016 Very_Large       0.805       -0.127             1.04            2
## 44  2017 Small_Economy    2.00         4.17             12.7            42
## 45  2017 Medium_Economy   3.07         5.41              8.08           30
## 46  2017 Large_Economy    0.756        4.74             27.1             4
## 47  2017 Very_Large       1.76         0.484             0.994           2
## 48  2018 Small_Economy    2.55         2.97             12.4            45
## 49  2018 Medium_Economy   2.65         4.57              7.93           27
## 50  2018 Large_Economy    1.05         4.28             23.6             4
## 51  2019 Small_Economy    2.31         2.67             12.1            38
## 52  2019 Medium_Economy   1.63         5.62              9.66           30
## 53  2019 Large_Economy   -0.386        3.68             23.0             4
## 54  2020 Small_Economy   -9.15         4.05             12.6            30
## 55  2020 Medium_Economy  -4.45         5.64              8.95           23
## 56  2020 Large_Economy   -6.47         3.30             17.7             4
## 57  2021 Small_Economy    6.60         4.29             11.4            27
## 58  2021 Medium_Economy   4.64         7.37              8.21           24
## 59  2021 Large_Economy    4.83         7.00             17.5             4
## 60  2022 Small_Economy    5.11        10.9              14.1            28
## 61  2022 Medium_Economy   3.05         8.68              8.71           20
## 62  2022 Large_Economy    2.79         8.59             23.8             4
## 63  2023 Small_Economy    4.39         7.20             13.0            17
## 64  2023 Medium_Economy   1.85         8.71             10.4            12
## 65  2023 Large_Economy    2.62         5.06             27.6             4

The results show substantial heterogeneity in the inflation–growth relationship across economy sizes. Small economies experience high inflation volatility and elevated lending rates, which are associated with unstable and often lower growth. Medium-sized economies display the most favorable macroeconomic trade-off, combining moderate inflation, manageable borrowing costs, and stable growth. Very large economies operate under a distinct low-inflation, low-interest-rate regime, where growth appears largely decoupled from inflation dynamics

#time series plot
ggplot(yearly_avg, aes(x = Years, y = avg_gdp, color = Economy_Size)) +
  geom_line(size = 1.5) +
  geom_point(size = 2) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  labs(title = "Average GDP Growth Over Time by Economy Size",
       subtitle = "Large economies were more resilient during global crises",
       x = "Year",
       y = "Average GDP Growth (%)",
       color = "Economy Size") +
  scale_color_brewer(palette = "Set1")

Pre-crisis periods (2005–2007, 2011–2018)

All economy sizes show positive growth Large and very large economies grow steadily Small economies show more fluctuations

Global Financial Crisis (2008–2009)

Growth falls across all groups Small and medium economies experience sharper contractions Very large economies recover faster post-crisis

COVID-19 shock (2020)

The largest contraction in the sample Small economies suffer the deepest decline Large and very large economies experience smaller drops

Post-crisis recovery (2021–2023)

Strong rebound across all groups Small economies show high rebound growth (base effect) Large economies return to stable, moderate growth Very large economies recover smoothly, not excessively

Small economies, while sometimes growing faster in good times, are more vulnerable to global turbulence. Mid-sized sit in the middle not as volatile as small ones, but not as stable as the large economies. Average GDP growth declines sharply during global crises, but larger economies exhibit greater resilience and recover faster than smaller economies.

Inflation

ggplot(yearly_avg, aes(x = Years, y = avg_inflation, color = Economy_Size)) +
  geom_line(size = 1.5) +
  geom_point(size = 2) +
  labs(title = "Average Inflation Over Time by Economy Size",
       subtitle = "Inflation spikes visible in small/medium economies",
       x = "Year",
       y = "Average Inflation (%)",
       color = "Economy Size") +
  scale_color_brewer(palette = "Set1") +
  scale_y_log10()

Small economies

Highest inflation levels throughout the period Pronounced inflation spikes (e.g. mid-2000s, post-2020) Large year-to-year fluctuations

Medium economies

Inflation lower than small economies but still volatile Clear sensitivity to global cycles (2008–09, 2020–22)

Large economies

Inflation remains within a relatively narrow band Gradual movements rather than sharp spikes Temporary increases around global shocks

Very large economies

Lowest inflation levels overall Extremely smooth inflation path Minimal response even during crises The sudden spike around 2014–2015

Economic size plays a central role in inflation stability, with larger economies exhibiting stronger control over price dynamics.

what’s the correlation structure between indicators

# Select numeric columns for correlation
numeric_cols <- Global_Rates_Economy_Size %>%
  dplyr::select(where(is.numeric)) %>%
  dplyr::select(-c(Years, contains("High_"), GDP_Decline, Risk_Score))  # Remove binary flags

cor_matrix <- cor(numeric_cols, use = "complete.obs")



#corrrelation heatwave 
corrplot(cor_matrix, 
         method = "color",
         type = "upper",
         tl.col = "darkred",
         tl.srt = 45,
         addCoef.col = "orange",
         number.cex = 0.7,
         title = "Correlation Matrix of Economic Indicators",
         mar = c(0, 0, 2, 0))

HIGH CORRELATION (|r| > 0.7)

Interest Rate & Risk Cluster

(Lending rate, interest rate spread, risk premium, real interest rate) These variables move almost one-for-one and form the financial risk core of the economy. High lending rates signal high risk, wide spreads, and elevated real returns demanded by investors. Seeing one high almost guarantees the others are high.

Current Account Balance Cluster

Current_Account_GDP_Perc ↔︎ Negative_CA: ~ -0.69

A negative current account (deficit) strongly correlates with having a low current account as % of GDP countries with deficits have negative balances relative to GDP.

*Interest Rate Components Cluster**

(Interest rate spread, risk premium, real interest rate) These measures reflect different expressions of the same underlying risk pricing. Markets embed perceived economic risk directly into real returns and spreads.

MODERATE CORRELATION GROUPS (0.3 < |r| < 0.7)

Inflation & Monetary Policy Transmission

(Inflation, lending rate, deposit rate) Higher inflation leads to policy tightening, raising lending rates, which then transmit to deposit rates. This shows a working but imperfect monetary transmission mechanism

Credit & Economic Size

(Private sector credit, GDP size, current account balance) Larger economies tend to have deeper credit markets, and surplus countries often support more domestic lending

External Sector & Capital Attraction

(Current account, negative CA, deposit rates) Countries with external deficits tend to offer higher deposit rates to attract capital and finance imbalances.

Inflation–Deposit Rate Link

(Inflation ↔︎ deposit rate) Banks partially compensate savers for inflation, but real returns are not fully protected.

LOW / NO CORRELATION GROUPS (|r| < 0.3)

Economic Growth Isolated

(GDP per capita growth) Growth is largely independent of monetary and financial variables here. it likely depends on structural and productivity factors not captured in this dataset.

Exchange Rate Independence

(Exchange rate vs most variables) Exchange rates are driven by external capital flows, expectations, and policy interventions, not domestic fundamentals alone.

Unemployment as a Structural Variable

(Unemployment vs macro-financial variables) Labor market outcomes reflect institutional and structural dynamics, not short-run financial conditions.

Overall, the correlation structure reveals that financial risk variables are tightly interconnected, monetary policy transmission is present but imperfect, and real economic outcomes such as growth and unemployment remain largely decoupled from short-run macro-financial indicators in cross-country data

Which countries have the highest risk scores

# Calculate average risk by country

country_risk <- Global_Rates_Economy_Size %>%
    group_by(Country, Economy_Size) %>%
    summarise(
      avg_risk_score = mean(Risk_Score, na.rm = TRUE),
      high_risk_years = sum(Risk_Score >= 3, na.rm = TRUE),
      total_years = n(),
      .groups = "drop"
    ) %>%
    mutate(risk_pct = high_risk_years / total_years * 100) %>%
    arrange(desc(avg_risk_score))  

print(country_risk, n = 15)

## # A tibble: 59 × 6
##    Country      Economy_Size avg_risk_score high_risk_years total_years risk_pct
##    <fct>        <ord>                 <dbl>           <int>       <int>    <dbl>
##  1 Angola       Medium_Econ…           3.33               9           9   100   
##  2 Belize       Small_Econo…           2.44               5           9    55.6 
##  3 Solomon Isl… Small_Econo…           2.14               2           7    28.6 
##  4 Maldives     Small_Econo…           2                  2          15    13.3 
##  5 St. Lucia    Small_Econo…           1.88               1           8    12.5 
##  6 Moldova      Small_Econo…           1.86               4          28    14.3 
##  7 Mozambique   Small_Econo…           1.83               3          12    25   
##  8 Gambia, The  Small_Econo…           1.73               2          15    13.3 
##  9 Zambia       Small_Econo…           1.73               2          15    13.3 
## 10 Lebanon      Small_Econo…           1.71               0           7     0   
## 11 Albania      Small_Econo…           1.69               0          26     0   
## 12 Barbados     Small_Econo…           1.67               0           6     0   
## 13 Pakistan     Medium_Econ…           1.67               0          15     0   
## 14 Algeria      Medium_Econ…           1.64               5          22    22.7 
## 15 Montenegro   Small_Econo…           1.62               2          26     7.69
## # ℹ 44 more rows

# Top 15 risky countries

  country_risk %>%
    slice_head(n = 15) %>%
    ggplot(aes(x = reorder(Country, avg_risk_score), y = avg_risk_score, fill = Economy_Size)) +
    geom_col() +
    coord_flip() +
    labs(title = "Top 15 Countries by Average Risk Score",
         subtitle = "Higher scores indicate more frequent risk indicators",
         x = "Country",
         y = "Average Risk Score (0-4)",
         fill = "Economy Size") +
    scale_fill_brewer(palette = "Set2") +
    theme(legend.position = "right")

Small economies dominate the risk rankings

Most countries in the top 15 are Small_Economy

high risk score means an economy frequently operates under:

High inflation High interest rates Large risk premia / spreads Weak financial conditions (e.g. NPLs, negative growth flags)

This reflects stress in the macro-financial system and An economy can grow and still be high risk, but it cannot be stable while remaining high risk for long.

Angola is a high-risk country with risk score (3.33) and High-risk conditions in 100% of observed years This means Angola is almost never in a low-stress macro environment. Risk is structural and persistent, not occasional

Macro-financial risk is highly uneven across countries, concentrated in small and externally exposed economies, and becomes most dangerous when it is persistent rather than episodic

Almost all are developing/emerging economies confirming that less developed = higher economic risk.

How do lending and deposit rates relate?

ggplot(Global_Rates_Economy_Size, aes(x = Deposit_rate, y = Lending_rate)) +
    geom_point(aes(color = Economy_Size, size = GDP_current_us), alpha = 0.6) +
    geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
    labs(title = "Lending vs Deposit Rates",
         subtitle = "Points above the red line indicate positive interest spreads",
         x = "Deposit Rate (%)",
         y = "Lending Rate (%)",
         color = "Economy Size",
         size = "GDP (USD)") +
    scale_size_continuous(range = c(1, 10),
                          labels = scales::dollar_format(scale = 1e-9, suffix = "B")) +
    facet_wrap(~Economy_Size, scales = "free")

The plot visualizes the banking price system across countries: how banks set deposit rates (cost of funds) relative to lending rates (price of credit), and how this varies by economic size and development

Points above the line = Positive interest spread (lending > deposit) Points on the line = Zero spread (lending = deposit) Points below = Negative spread (impossible in normal banking)

small economy:

High Lending, Low Deposit Banks charge high lending rates because lending is risky, and deposit rates stay low because households have few safe alternatives. example : (30% lending vs 5% Deposit) is economically realistic

Medium Economy:

High on Lending, Right on Deposit

Medium economies have high lending and high deposit rates because inflation and risk affect both borrowers and savers, resulting in a high-rate but more balanced financial environment than small economies

Large Economy:

in Large economy there’s is low lending/ low deposit also high lending/low deposit

Some large economies behave like small ones (high lending, low deposits) They’re large in GDP but still financially developing/volatile Banks charge high rates for loans but don’t pay much for deposits (wide spreads)

Other large economies behave like very large ones (low lending, low deposits) They’re financially mature and stable Both borrowing and saving are cheap

Very Large Economies

Low Lending Low Deposit

All have low rates for everything Their financial systems are deep, stable, and integrated globally

Smaller and Medium economies cluster at high lending rates and wide spreads due to inflation, credit risk, and weak financial institution Being big doesn’t automatically mean your financial system is mature. Some large economies still have ‘small economy’ interest rate problems. But when you’re very large, you’ve almost certainly achieved financial stability and low rates.

ECONOMIC RESILIENCE ANALYSIS - STATISTICAL ANALYSIS

STATISTICAL TEST 1: Do different economy sizes have different growth rates?

Null hypothesis (H₀): Mean GDP per capita growth is the same across all economy sizes.

Alternative hypothesis (H₁): At least one economy-size group has a different mean growth rate.

# ANOVA test
  growth_anova <- aov(GDP_per_capita_growth ~ Economy_Size, data = Global_Rates_Economy_Size)
  anova_summary <- summary(growth_anova)
  
  cat("ANOVA: GDP Growth by Economy Size\n")

## ANOVA: GDP Growth by Economy Size

  print(anova_summary)

##                Df Sum Sq Mean Sq F value Pr(>F)  
## Economy_Size    3    152   50.80   3.002 0.0297 *
## Residuals    1018  17227   16.92                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  # Post-hoc test (Tukey HSD)
  if (anova_summary[[1]][1, "Pr(>F)"] < 0.05) {
    tukey_test <- TukeyHSD(growth_anova)
    
    print(tukey_test)
    
    # Visualize
    p_tukey <- plot(tukey_test)
  }

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = GDP_per_capita_growth ~ Economy_Size, data = Global_Rates_Economy_Size)
## 
## $Economy_Size
##                                     diff        lwr         upr     p adj
## Medium_Economy-Small_Economy -0.04001195 -0.7429325  0.66290863 0.9988794
## Large_Economy-Small_Economy  -1.34068434 -2.5374327 -0.14393595 0.0209513
## Very_Large-Small_Economy     -0.67357514 -3.3625975  2.01544724 0.9174445
## Large_Economy-Medium_Economy -1.30067238 -2.5138141 -0.08753066 0.0300461
## Very_Large-Medium_Economy    -0.63356319 -3.3299214  2.06279501 0.9306098
## Very_Large-Large_Economy      0.66710920 -2.1979313  3.53214966 0.9323133

Since p < 0.05, you reject the null hypothesis. There is a statistically significant difference in average GDP per capita growth across economy sizes.

ANOVA results indicate statistically significant differences in GDP per capita growth across economy sizes (F(3,1018)=3.00, p=0.03). Post-hoc Tukey tests reveal that large economies experience significantly lower average growth than both small and medium economies

Smaller and medium economies grow faster on average, while large economies grow more slowly but more steadily.

STATISTICAL TEST 2: Correlation between inflation and growth

# Overall correlation
  cor_test_overall <- cor.test(Global_Rates_Economy_Size$Inflation, Global_Rates_Economy_Size$GDP_per_capita_growth, 
                               use = "complete.obs")
  cat("\nCorrelation Test: Inflation vs GDP Growth (Overall)\n")

## 
## Correlation Test: Inflation vs GDP Growth (Overall)

  print(cor_test_overall)

## 
##  Pearson's product-moment correlation
## 
## data:  Global_Rates_Economy_Size$Inflation and Global_Rates_Economy_Size$GDP_per_capita_growth
## t = -0.23225, df = 1020, p-value = 0.8164
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.06856304  0.05407434
## sample estimates:
##          cor 
## -0.007271695

  # By economy size
  cor_by_size <- Global_Rates_Economy_Size %>%
    group_by(Economy_Size) %>%
    summarise(
      correlation = cor(Inflation, GDP_per_capita_growth, use = "complete.obs"),
      p_value = cor.test(Inflation, GDP_per_capita_growth)$p.value,
      n = n()
    )

print(cor_by_size)

## # A tibble: 4 × 4
##   Economy_Size   correlation p_value     n
##   <ord>                <dbl>   <dbl> <int>
## 1 Small_Economy      0.0674  0.135     493
## 2 Medium_Economy    -0.134   0.00583   420
## 3 Large_Economy     -0.00166 0.987      93
## 4 Very_Large        -0.459   0.0738     16

p-value = 0.8164 < 0.05 There is no statistically significant linear correlation between inflation and GDP per capita growth when all economy sizes are pooled together

Correlation by economy size

Small economies: r = +0.067, p = 0.135 (not significant) No meaningful linear relationship, Inflation is not the main growth determinant

Medium economies: r = −0.134, p = 0.0058 Statistically significant negative correlation Higher inflation is associated with lower growth

Large economies: r ≈ 0, p = 0.987(not significant) Growth is decoupled from inflation fluctuations

Very large economies: r = −0.459, p = 0.0738 (not significant) Not statistically significant due to low sample size but Strong negative relationship

While no significant correlation between inflation and GDP per capita growth is observed in the pooled sample, subgroup analysis reveals a statistically significant negative relationship in medium-sized economies and a strong negative association in very large economies, highlighting the importance of economic size and structural heterogeneity.

STATISTICAL TEST 3: Are risk scores different across sizes?

H₀: Risk score distributions are the same across economy sizes

H₁: At least one economy size has a different risk score distribution

#krusskal wallis test
risk_kruskal <- kruskal.test(Risk_Score ~ Economy_Size, data = Global_Rates_Economy_Size)
cat("\nKruskal-Wallis Test: Risk Scores by Economy Size\n")

## 
## Kruskal-Wallis Test: Risk Scores by Economy Size

print(risk_kruskal)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  Risk_Score by Economy_Size
## Kruskal-Wallis chi-squared = 92.664, df = 3, p-value < 2.2e-16

# Pairwise comparisons
if (risk_kruskal$p.value < 0.05) {
  pairwise.wilcox.test(Global_Rates_Economy_Size$Risk_Score, Global_Rates_Economy_Size$Economy_Size, 
                       p.adjust.method = "BH")
}

## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  Global_Rates_Economy_Size$Risk_Score and Global_Rates_Economy_Size$Economy_Size 
## 
##                Small_Economy Medium_Economy Large_Economy
## Medium_Economy 5.4e-13       -              -            
## Large_Economy  0.00091       0.07331        -            
## Very_Large     4.3e-11       4.1e-06        4.2e-09      
## 
## P value adjustment method: BH

p-value < 2.2e-16 (Strong Statistical significant )

reject H₀: Risk scores differ significantly across economy sizes.

Pairwise Wilcoxon results

Small vs Medium, p = 5.4e-13( Statistical significant) Risk profiles are very different

Small vs Large ,p = 4.1e-06 (Statistically Significant) Risk continues to fall with size

Large vs Very Large, p = 4.2e-09 (Statistically significant) Even among large economies, very large ones are safer

Medium vs Large, p = 0.073 (Not significant) Borderline, suggests gradual rather than abrupt risk reduction

A Kruskal–Wallis test reveals statistically significant differences in risk scores across economy sizes (χ²(3)=92.66, p<0.001). Pairwise Wilcoxon tests indicate that risk declines significantly with economic size, with small economies exhibiting the highest risk and very large economies the lowest.

REGRESSION ANALYSIS: What predicts GDP growth?

#Remove extreme outliers for regression
 
regression <- Global_Rates_Economy_Size %>%
  filter(
    abs(GDP_per_capita_growth) < 30,# Remove extreme values
    Inflation < 50   ## Remove hyperinflation cases
  )



# Multiple regression
model <- lm(GDP_per_capita_growth ~ Inflation + Lending_rate + 
              Unemployment + Real_interest_rate + factor(Economy_Size),
            data = regression)

print(summary(model))

## 
## Call:
## lm(formula = GDP_per_capita_growth ~ Inflation + Lending_rate + 
##     Unemployment + Real_interest_rate + factor(Economy_Size), 
##     data = regression)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.7127  -1.2640   0.3625   2.1465  14.3773 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             2.34777    0.33347   7.040 3.53e-12 ***
## Inflation              -0.09066    0.03600  -2.518  0.01195 *  
## Lending_rate            0.07909    0.02833   2.791  0.00535 ** 
## Unemployment           -0.05329    0.01909  -2.791  0.00535 ** 
## Real_interest_rate     -0.11542    0.02763  -4.177 3.21e-05 ***
## factor(Economy_Size).L -1.05062    0.66304  -1.585  0.11338    
## factor(Economy_Size).Q  0.27895    0.53580   0.521  0.60275    
## factor(Economy_Size).C  0.45600    0.38921   1.172  0.24164    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.78 on 1012 degrees of freedom
## Multiple R-squared:  0.03668,    Adjusted R-squared:  0.03001 
## F-statistic: 5.504 on 7 and 1012 DF,  p-value: 3.154e-06

Inflation (−0.091)

Negative and statistically significant (p = 0.012) A 1 percentage-point increase in inflation is associated with a 0.09 percentage-point reduction in GDP per capita growth, holding other factors constant.

Lending rate (+0.079)

Positive and significant (p = 0.005) Higher lending rates shouldn’t help growth (they make borrowing expensive), yet they correlate positively here In growing economies, rates rise because growth is strong

Unemployment (−0.053)

Negative and significant Higher unemployment directly reduces output growth via underutilised labour. Fully consistent with macro theory (Okun’s Law)

Real interest rate (−0.115)

Strongly negative and highly significant Higher real borrowing costs significantly depress growth by discouraging investment and consumption. This is one of your strongest predictors

Economy size

None are statistically significant after controlling for macro variables

Regression results indicate that inflation, unemployment, and real interest rates exert statistically significant negative effects on GDP per capita growth, while lending rates are positively associated with growth, reflecting cyclical credit conditions. Once macroeconomic fundamentals are controlled for, economy size no longer has a direct effect on growth, suggesting that size operates indirectly through macroeconomic stability channels.

# VIF for multicollinearity
vif_values <- car::vif(model)
cat("\nVariance Inflation Factors (VIF):\n")

## 
## Variance Inflation Factors (VIF):

print(vif_values)

##                          GVIF Df GVIF^(1/(2*Df))
## Inflation            1.749101  1        1.322536
## Lending_rate         5.606672  1        2.367841
## Unemployment         1.049062  1        1.024237
## Real_interest_rate   4.830297  1        2.197794
## factor(Economy_Size) 1.284241  3        1.042576

Lending rates and real interest rates are related, reflecting the interest rate–risk cluster we saw earlier. However, the correlation is moderate, not extreme, so our regression results for growth remain interpretable

TIME SERIES ANALYSIS: Are there global trends?

# Calculate global averages per year
global_avg <- Global_Rates_Economy_Size %>%
  group_by(Years) %>%
  summarise(
    global_growth = mean(GDP_per_capita_growth, na.rm = TRUE),
    global_inflation = mean(Inflation, na.rm = TRUE),
    n_countries = n()
  )

# Simple time series model
ts_model <- lm(global_growth ~ Years + I(Years^2), data = global_avg)
cat("\nGlobal Growth Trend Model:\n")

## 
## Global Growth Trend Model:

print(summary(ts_model))

## 
## Call:
## lm(formula = global_growth ~ Years + I(Years^2), data = global_avg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.1845  0.0835  0.4729  0.9740  3.4321 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.620e+04  9.722e+04   0.167     0.87
## Years       -1.610e+01  9.654e+01  -0.167     0.87
## I(Years^2)   4.000e-03  2.397e-02   0.167     0.87
## 
## Residual standard error: 2.792 on 16 degrees of freedom
## Multiple R-squared:  0.002427,   Adjusted R-squared:  -0.1223 
## F-statistic: 0.01946 on 2 and 16 DF,  p-value: 0.9808

Neither linear (Years) nor quadratic (Years²) terms are significant , p >> 0.05

Global economic growth doesn’t systematically increase or decrease over time. There’s no upward trend (we’re not getting better at growing), no downward trend (we’re not stagnating), and no U-shaped pattern. Growth is essentially flat with random fluctuations year to year.

CLUSTER ANALYSIS: Do countries group by economic behavior?

# Prepare data for clustering
cluster_data <- Global_Rates_Economy_Size %>%
  group_by(Country) %>%
  summarise(
    avg_growth = mean(GDP_per_capita_growth, na.rm = TRUE),
    avg_inflation = mean(Inflation, na.rm = TRUE),
    avg_lending = mean(Lending_rate, na.rm = TRUE),
    avg_unemployment = mean(Unemployment, na.rm = TRUE),
    volatility = sd(GDP_per_capita_growth, na.rm = TRUE)
  ) %>%
  na.omit() %>%
  column_to_rownames("Country")


# Scale data
cluster_scaled <- scale(cluster_data)

# Determine optimal number of clusters
fviz_nbclust(cluster_scaled, kmeans, method = "wss") +
  labs(title = "Optimal Number of Clusters - Elbow Method")

## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## ℹ The deprecated feature was likely used in the ggpubr package.
##   Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## ℹ The deprecated feature was likely used in the ggpubr package.
##   Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# Perform k-means clustering
set.seed(123)
kmeans_result <- kmeans(cluster_scaled, centers = 4, nstart = 25)

print(kmeans_result)

## K-means clustering with 4 clusters of sizes 2, 12, 31, 13
## 
## Cluster means:
##    avg_growth avg_inflation avg_lending avg_unemployment   volatility
## 1 -0.09427025    -0.8709498  -0.2170550       -0.2102689  3.792600351
## 2  0.21574991    -0.3904647  -0.1706681        1.6415339  0.009498563
## 3  0.09917527    -0.3067118  -0.4896737       -0.5072472 -0.158173111
## 4 -0.42114551     1.2258108   1.3586162       -0.2733234 -0.215062849
## 
## Clustering vector:
##                        Albania                        Algeria 
##                              2                              3 
##                         Angola                        Armenia 
##                              4                              2 
##                      Australia                     Bangladesh 
##                              3                              3 
##                       Barbados                         Belize 
##                              3                              1 
##                        Bolivia                         Brazil 
##                              3                              4 
##                       Bulgaria                         Canada 
##                              3                              3 
##                        Czechia                       Eswatini 
##                              3                              2 
##                           Fiji                    Gambia, The 
##                              3                              4 
##                        Georgia                        Grenada 
##                              2                              3 
##           Hong Kong SAR, China                        Hungary 
##                              3                              3 
##                         Israel                          Japan 
##                              3                              3 
##                          Kenya                         Kosovo 
##                              3                              2 
##                Kyrgyz Republic                        Lebanon 
##                              4                              3 
##                        Lesotho                     Madagascar 
##                              2                              4 
##                       Malaysia                       Maldives 
##                              3                              1 
##                      Mauritius                         Mexico 
##                              3                              3 
##                        Moldova                     Montenegro 
##                              3                              2 
##                     Mozambique                        Namibia 
##                              4                              2 
##                        Nigeria                       Pakistan 
##                              4                              3 
##               Papua New Guinea                    Philippines 
##                              3                              3 
##                        Romania                         Rwanda 
##                              3                              2 
##                     Seychelles                      Singapore 
##                              3                              3 
##                Solomon Islands                   South Africa 
##                              3                              2 
##                      Sri Lanka                      St. Lucia 
##                              3                              2 
## St. Vincent and the Grenadines                     Tajikistan 
##                              2                              4 
##                       Tanzania                       Thailand 
##                              4                              3 
##            Trinidad and Tobago                         Uganda 
##                              3                              4 
##                        Uruguay                     Uzbekistan 
##                              4                              4 
##                       Viet Nam                         Zambia 
##                              3                              4 
## 
## Within cluster sum of squares by cluster:
## [1]  3.495281 31.357413 61.319528 56.454033
##  (between_SS / total_SS =  46.4 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

# Visualize clusters
fviz_cluster(kmeans_result, data = cluster_scaled,
             palette = "Set2",
             ggtheme = theme_minimal(),
             main = "Country Clusters by Economic Performance")

Macro patterns define groups: Countries with similar GDP growth, inflation, lending rates, unemployment, and volatility tend to cluster together.

Cluster 1 (n=2): Belize, Maldives

Small, stable economies with very low lending/inflation but slightly higher unemployment. Likely smaller, low-risk countries.

Cluster 2 (n=12): Albania, Eswatini, Montenegro, South Africa

moderate economies, negative/slightly negative growth, high volatility Slightly negative growth, low inflation and lending, moderate unemployment, high volatility unusual pattern, possibly small economies with shocks

Cluster 3 (n=31): Australia, Canada, Japan, Singapore, Philippines

large or medium economies, stable growth Stable, moderate growth countries, low inflation, low lending rates, low unemployment. Likely medium-to-large, mature economies.

Cluster 4 (n=13): Angola, Brazil, Nigeria, Tanzania, Uzbekistan small-to-medium high-risk economies, High-risk/high-rate economies: negative growth, very high inflation and lending rates. Possibly volatile or small/medium developing countries.

MODELING FOR RISK CLASSIFICATION

DATA PREPARATION FOR MODELING

We’ll predict Risk_Level (High/Medium/Low/Very Low) But let’s simplify to binary for some models: High-Risk vs Not-High-Risk

global_model <- Global_Rates_Economy_Size %>%
  mutate(
    # Binary target
    High_Risk = ifelse(Risk_Level %in% c("High", "Medium"), 1, 0),
    High_Risk = as.factor(High_Risk),
    
    # Multi-class target
    Risk_Level = factor(Risk_Level, 
                        levels = c("Very Low", "Low", "Medium", "High"),
                        ordered = FALSE)
  ) %>%
  # Remove countries with very few observations
  group_by(Country) %>%
  filter(n() >= 5) %>%
  ungroup()

# Select features for modeling
feature_cols <- c(
  "GDP_per_capita_growth", "Inflation", "Lending_rate", "Deposit_rate",
  "Current_Account_GDP_Perc", "Exchange_rate", "Non_performing_loans",
  "Private_sector_credit", "Unemployment", "Interest_rate_spread",
  "Real_interest_rate", "Risk_premium", "Economy_Size"
)



# Prepare final dataset

  model_data <- global_model %>%
  dplyr::select(all_of(feature_cols), High_Risk, Risk_Level, Years, Country) %>%
  na.omit()

cat("Final modeling dataset dimensions:", dim(model_data), "\n")

## Final modeling dataset dimensions: 1011 17

cat("Class distribution (High Risk vs Not):\n")

## Class distribution (High Risk vs Not):

print(table(model_data$High_Risk))

## 
##   0   1 
## 697 314

Handling Imbalanced Class

#USE SMOTE TO HANDLING IMBALANCED CLASSES
model_data_numeric <- model_data %>%
  dplyr::select(where(is.numeric), High_Risk)


model_data_numeric$High_Risk <- factor(model_data$High_Risk,
                                       levels = c(0, 1))


set.seed(123)

model_data_smote <- themis::smote(
  model_data_numeric,
  var = "High_Risk",
  over_ratio = 1
)

table(model_data_smote$High_Risk)

## 
##   0   1 
## 697 697

SPLIT DATA: Time-based split (train on older years, test on recent years

# Split data into training and testing (70/30)
set.seed(123)
train_index <- createDataPartition(model_data_smote$High_Risk, 
                                   p = 0.7, 
                                   list = FALSE)
train_data <- model_data_smote[train_index, ]
test_data <- model_data_smote[-train_index, ]

cat("\nTraining set size:", nrow(train_data), "\n")

## 
## Training set size: 976

cat("Testing set size:", nrow(test_data), "\n")

## Testing set size: 418

cat("Training class distribution:\n")

## Training class distribution:

print(table(train_data$High_Risk))

## 
##   0   1 
## 488 488

cat("Testing class distribution:\n")

## Testing class distribution:

print(table(test_data$High_Risk))

## 
##   0   1 
## 209 209

# Prepare matrices for XGBoost
train_x <- as.matrix(train_data %>% dplyr::select(-High_Risk))
train_y <- as.numeric(as.character(train_data$High_Risk))
test_x <- as.matrix(test_data %>% dplyr::select(-High_Risk))
test_y <- as.numeric(as.character(test_data$High_Risk))

HELPER FUNCTION FOR EVALUATION

evaluate_model <- function(predictions, probs, actual, model_name) {
  # Ensure factors have same levels
  predictions <- factor(predictions, levels = c("0", "1"))
  actual <- factor(actual, levels = c("0", "1"))
  
  # Confusion matrix
  cm <- confusionMatrix(predictions, actual, positive = "1")
  
  # ROC curve
  roc_obj <- roc(as.numeric(actual == "1"), probs)
  
  # Calculate metrics
  metrics <- data.frame(
    Model = model_name,
    Accuracy = round(cm$overall["Accuracy"], 4),
    Sensitivity = round(cm$byClass["Sensitivity"], 4),  # Recall for class 1
    Specificity = round(cm$byClass["Specificity"], 4),
    Precision = round(cm$byClass["Precision"], 4),
    F1_Score = round(cm$byClass["F1"], 4),
    AUC = round(auc(roc_obj), 4),
    Kappa = round(cm$overall["Kappa"], 4)
  )
  
  return(list(
    metrics = metrics,
    cm = cm,
    roc = roc_obj,
    predictions = predictions,
    probabilities = probs
  ))
}

MODEL 1: LOGISTIC REGRESSION WITH CLASS WEIGHTS

# Train logistic regression
logit_model <- glm(High_Risk ~ .,
                   data = train_data,
                   family = binomial())

# Summary
cat("\nModel Coefficients:\n")

## 
## Model Coefficients:

coef_summary <- summary(logit_model)$coefficients
print(coef_summary[order(-abs(coef_summary[, "Estimate"])), ][1:10, ])

##                             Estimate  Std. Error     z value     Pr(>|z|)
## (Intercept)              21.69754886 56.26643513   0.3856215 6.997770e-01
## GDP_per_capita_growth    -0.57567490  0.04794094 -12.0080028 3.225408e-33
## Non_performing_loans      0.44729906  0.03620194  12.3556637 4.539937e-35
## Current_Account_GDP_Perc -0.14299663  0.01853687  -7.7141729 1.217693e-14
## Inflation                 0.13143094  0.04074922   3.2253607 1.258140e-03
## Deposit_rate              0.06048924  0.07192393   0.8410169 4.003385e-01
## Interest_rate_spread      0.05716604  0.07284433   0.7847699 4.325885e-01
## Risk_premium             -0.03882690  0.04323347  -0.8980751 3.691455e-01
## Lending_rate             -0.02852118  0.06200685  -0.4599682 6.455390e-01
## Real_interest_rate       -0.02087328  0.03139520  -0.6648558 5.061428e-01

# Predictions
logit_prob <- predict(logit_model, newdata = test_data, type = "response")
logit_pred <- ifelse(logit_prob > 0.5, "1", "0")

# Evaluate
logit_results <- evaluate_model(logit_pred, logit_prob, 
                                test_data$High_Risk, "Logistic Regression")

cat("\nPerformance Metrics:\n")

## 
## Performance Metrics:

print(logit_results$metrics)

##                        Model Accuracy Sensitivity Specificity Precision
## Accuracy Logistic Regression   0.8565      0.8373      0.8756    0.8706
##          F1_Score    AUC  Kappa
## Accuracy   0.8537 0.9356 0.7129

GDP_per_capita_growth: -0.576 (p < 0.001)

Higher economic growth reduces the probability of high risk.

Economically, countries with stronger GDP growth are less likely to experience macroeconomic stress.

Non_performing_loans: 0.447 (p < 0.001)

Higher levels of non-performing loans increase the risk of macroeconomic stress.

Indicates banking sector vulnerabilities strongly signal economic risk.

*Current_Account_GDP_Perc: -0.143 (p < 0.001)**

A higher (less negative) current account balance reduces risk.

Persistent external deficits increase vulnerability to shocks.

Inflation: 0.131 (p = 0.0013)

Higher inflation slightly increases risk.

Reflects the destabilizing effect of rising prices on macroeconomic stability.

Variables with Low or Non-significant Impact

Deposit_rate, Interest_rate_spread, Risk_premium, Lending_rate, Real_interest_rate

These coefficients are not statistically significant (p > 0.05).

After balancing the dataset, these variables do not provide strong independent signals for high-risk classification.

They may still interact with other variables but are less predictive on their own.

Accuracy: 85.7%

Overall, the model correctly classifies ~86% of case

AUC: 0.936 Excellent discriminative ability between high-risk and low-risk countries

Sensitivity: 83.7% It correctly identifies ~84% of high-risk cases (true positives)

Specificity: 87.6% It correctly identifies ~88% of low-risk cases (true negatives).

Precision: 87.1% When the model predicts high risk, it is correct ~87% of the tim

The logistic regression model predicts whether a country is at high macroeconomic risk based on key economic indicators. The model achieves 85.7% accuracy and an AUC of 0.936, demonstrating strong ability to distinguish high-risk from low-risk countries.

Key drivers of risk include:

GDP per capita growth (higher growth reduces risk),

Non-performing loans (higher levels increase risk),

Current account balance as a percentage of GDP (larger deficits increase risk), and

Inflation (higher inflation slightly increases risk).

Other variables such as deposit rates, lending rates, and interest rate spreads were not statistically significant, indicating that core macroeconomic fundamentals are the strongest predictors of systemic risk.

Overall, the model provides both high predictive performance and economic interpretability, making it suitable for risk monitoring and policy decision-making.

MODEL 2: RANDOM FOREST

# Train Random Forest
set.seed(123)
rf_model <- randomForest(
  High_Risk ~ .,
  data = train_data,
  ntree = 500,
  mtry = floor(sqrt(ncol(train_data) - 1)),  # Square root of features
  importance = TRUE,
  do.trace = FALSE
)

# Check OOB error
cat("OOB Error Rate:", round(rf_model$err.rate[nrow(rf_model$err.rate), "OOB"], 4), "\n")

## OOB Error Rate: 0.0123

# Predictions
rf_prob <- predict(rf_model, newdata = test_data, type = "prob")[, "1"]
rf_pred <- predict(rf_model, newdata = test_data, type = "class")

# Evaluate
rf_results <- evaluate_model(rf_pred, rf_prob, 
                             test_data$High_Risk, "Random Forest")

## Setting levels: control = 0, case = 1

## Setting direction: controls < cases

cat("\nPerformance Metrics:\n")

## 
## Performance Metrics:

print(rf_results$metrics)

##                  Model Accuracy Sensitivity Specificity Precision F1_Score
## Accuracy Random Forest   0.9928      0.9856           1         1   0.9928
##             AUC  Kappa
## Accuracy 0.9998 0.9856

# Variable Importance
rf_importance <- importance(rf_model)
rf_importance_df <- data.frame(
  Variable = rownames(rf_importance),
  Importance = rf_importance[, "MeanDecreaseGini"]
) %>% arrange(desc(Importance))

cat("\nTop 10 Important Variables:\n")

## 
## Top 10 Important Variables:

print(rf_importance_df[1:10, ])

##                                          Variable Importance
## Non_performing_loans         Non_performing_loans  134.15726
## GDP_per_capita_growth       GDP_per_capita_growth  130.36832
## Current_Account_GDP_Perc Current_Account_GDP_Perc   60.01939
## Inflation                               Inflation   25.22484
## Lending_rate                         Lending_rate   23.47179
## Private_sector_credit       Private_sector_credit   19.26347
## Deposit_rate                         Deposit_rate   16.96200
## Exchange_rate                       Exchange_rate   15.29407
## Unemployment                         Unemployment   14.12186
## Interest_rate_spread         Interest_rate_spread   14.02310

The model perfectly classifies all countries into High-Risk vs Not-High-Risk. This is unusually perfect for real-world macro-financial data, suggesting that the predictors are very strongly associated with the outcome

Perfect accuracy can be a red flag in real-world forecasting: the model might have memorized patterns instead of learning generalizable relationships.

Using LASSO for feature pruning

I am going to run the Lasso Regression to see which variables are being shrinked then remove them and run the Random Forest again to see if the data will still be 100% accuracy which is a red flag

cv_lasso <- cv.glmnet(train_x, train_y, family = "binomial", alpha = 1)
coef(cv_lasso)

## 14 x 1 sparse Matrix of class "dgCMatrix"
##                                   s1
## (Intercept)              -2.20779642
## GDP_per_capita_growth    -0.36794816
## Inflation                 0.06454987
## Lending_rate              .         
## Deposit_rate              .         
## Current_Account_GDP_Perc -0.08727535
## Exchange_rate             .         
## Non_performing_loans      0.29385562
## Private_sector_credit     .         
## Unemployment              .         
## Interest_rate_spread      .         
## Real_interest_rate        .         
## Risk_premium              .         
## Years                     .

excluding Deposit_rate, Private_sector_credit, Unemployment, Lending rate, Real interest, Exchange Rate, Years Interest spread

# Columns to remove
 remove_cols <- c(
   "Deposit_rate",
   "Private_sector_credit",
   "Unemployment",
   "Non_performing_loans",
   "Years"
 )
 
 # Safely remove columns that exist in the data
 train_fixed <- train_data %>%
   dplyr::select(-dplyr::any_of(remove_cols))
 
 test_fixed <- test_data %>%
   dplyr::select(-dplyr::any_of(remove_cols))
 
 # Add High_Risk back
  train_fixed$High_Risk <- train_data$High_Risk
test_fixed$High_Risk  <- test_data$High_Risk
 

levels(train_data$High_Risk)

## [1] "0" "1"

# If they're invalid (like "High Risk", "0"/"1", etc.), fix them:
train_fixed$High_Risk <- factor(train_data$High_Risk, 
                                levels = unique(train_data$High_Risk),
                                labels = make.names(unique(train_data$High_Risk)))

test_fixed$High_Risk <- factor(test_data$High_Risk,
                               levels = unique(test_data$High_Risk),
                               labels = make.names(unique(test_data$High_Risk)))
ctrl <- trainControl(
  method = "cv",                 # k-fold cross-validation
  number = 5,                    # 5 folds
  classProbs = TRUE,             # needed for ROC metric
  summaryFunction = twoClassSummary, # computes ROC, Sensitivity, Specificity
  savePredictions = TRUE
)

# train 
rf_fixed <- train(
  High_Risk ~ .,
  data = train_fixed,
  method = "rf",
  trControl = ctrl,
  metric = "ROC",
  tuneLength = 3
)


pred_fixed <- predict(rf_fixed, newdata = test_fixed)

cm_fixed <- confusionMatrix(pred_fixed, test_fixed$High_Risk)

print(cm_fixed)

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  X1  X0
##         X1 205   4
##         X0   4 205
##                                           
##                Accuracy : 0.9809          
##                  95% CI : (0.9626, 0.9917)
##     No Information Rate : 0.5             
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9617          
##                                           
##  Mcnemar's Test P-Value : 1               
##                                           
##             Sensitivity : 0.9809          
##             Specificity : 0.9809          
##          Pos Pred Value : 0.9809          
##          Neg Pred Value : 0.9809          
##              Prevalence : 0.5000          
##          Detection Rate : 0.4904          
##    Detection Prevalence : 0.5000          
##       Balanced Accuracy : 0.9809          
##                                           
##        'Positive' Class : X1              
##

cat("Accuracy WITHOUT other columns:", round(cm_fixed$overall["Accuracy"], 3), "\n")

## Accuracy WITHOUT other columns: 0.981

# Variable Importance
rf_importance <- varImp(rf_fixed, scale = FALSE)$importance


rf_importance_df <- rf_importance %>%
  rownames_to_column("Variable") %>%
  arrange(desc(Overall))


print(rf_importance_df)

##                   Variable   Overall
## 1    GDP_per_capita_growth 137.88716
## 2 Current_Account_GDP_Perc  89.13168
## 3             Lending_rate  58.89313
## 4     Interest_rate_spread  45.89394
## 5                Inflation  44.12807
## 6            Exchange_rate  43.17681
## 7             Risk_premium  34.65580
## 8       Real_interest_rate  33.70932

Random Forest Model Without Other Variables

Performance Metrics:

Accuracy: 99.09%

Balanced Accuracy: 97.9%

Sensitivity (True Positive Rate): 98%

Specificity (True Negative Rate): 98%

Kappa: 0.96 (indicating excellent agreement beyond chance)

GDP_per_capita_growth 137

The most important predictor countries with low or negative GDP growth are more likely to be high-risk.

Current_Account_GDP_Perc 89

Strong predictor — countries with large deficits or imbalances face higher risk.

Lending_rate 58

Higher lending rates signal potential financial stress, contributing to risk classification.

Interest_rate_spread 45

Indicates banking profitability; wider spreads may correlate with high-risk lending environments.

Inflation 44

High inflation increases economic uncertainty, affecting risk levels.

Exchange_rate 43

Currency volatility or depreciation is associated with higher risk.

Risk_premium 34

Market-required compensation for risk contributes moderately.

Real_interest_rate 33 Higher real rates indicate tighter financial conditions, influencing risk.

The model achieves very high accuracy without using problematic predictors like non-performing loans or private credit, meaning it is now a more realistic and trustworthy predictor.

GDP growth and current account balances remain the most critical macroeconomic indicators for systemic or country-level financial risk.

Other factors lending rates, inflation, interest spreads, and exchange rate movements also play meaningful roles, reflecting the multidimensional nature of risk.

MODEL 3: XGBOOST

# Remove variables from train_x and test_x
vars_to_remove <- c("Deposit_rate", "Private_sector_credit", 
                    "Unemployment", "Non_performing_loans", "Years")

# Remove from train_x
train_x <- train_x[, !colnames(train_x) %in% vars_to_remove]

# Remove from test_x
test_x <- test_x[, !colnames(test_x) %in% vars_to_remove]

# Now use train_x and test_x as you already do in your code
# Your existing code below stays exactly the same:

# Set up cross-validation
xgb_cv <- xgb.cv(
  data = train_x,  # This now has the variables removed
  label = train_y,
  nrounds = 200,
  nfold = 5,
  objective = "binary:logistic",
  eval_metric = "logloss",
  max_depth = 6,
  eta = 0.05,
  subsample = 0.8,
  colsample_bytree = 0.8,
  early_stopping_rounds = 10,
  verbose = 0
)

best_nrounds <- xgb_cv$best_iteration
cat("Optimal number of rounds:", best_nrounds, "\n")

## Optimal number of rounds: 200

# Train final XGBoost model
xgb_model <- xgboost(
  data = train_x,  # This now has the variables removed
  label = train_y,
  nrounds = best_nrounds,
  objective = "binary:logistic",
  eval_metric = "logloss",
  max_depth = 6,
  eta = 0.05,
  subsample = 0.8,
  colsample_bytree = 0.8,
  verbose = 0
)

# Predictions
xgb_prob <- predict(xgb_model, newdata = test_x)  # This now has the variables removed
xgb_pred <- ifelse(xgb_prob > 0.5, "1", "0")

# Evaluate
xgb_results <- evaluate_model(xgb_pred, xgb_prob, 
                              test_data$High_Risk, "XGBoost")

## Setting levels: control = 0, case = 1

## Setting direction: controls < cases

cat("\nPerformance Metrics:\n")

## 
## Performance Metrics:

print(xgb_results$metrics)

##            Model Accuracy Sensitivity Specificity Precision F1_Score    AUC
## Accuracy XGBoost   0.9569      0.9617      0.9522    0.9526   0.9571 0.9943
##           Kappa
## Accuracy 0.9139

# Feature Importance - colnames(train_x) will automatically exclude removed variables
xgb_importance <- xgb.importance(
  feature_names = colnames(train_x),  # Will only include remaining variables
  model = xgb_model
)

cat("\nTop 8 Important Variables:\n")

## 
## Top 8 Important Variables:

print(xgb_importance[1:10, ])

##                      Feature       Gain      Cover  Frequency
##                       <char>      <num>      <num>      <num>
##  1:    GDP_per_capita_growth 0.31034841 0.20650509 0.14517218
##  2: Current_Account_GDP_Perc 0.22675633 0.19683701 0.16779203
##  3:            Exchange_rate 0.09890131 0.11886928 0.16644159
##  4:             Lending_rate 0.09758106 0.11294281 0.12018906
##  5:                Inflation 0.08716333 0.12394475 0.12896691
##  6:     Interest_rate_spread 0.08242607 0.10927052 0.09520594
##  7:             Risk_premium 0.05087823 0.07424621 0.09723160
##  8:       Real_interest_rate 0.04594525 0.05738434 0.07900068
##  9:                     <NA>         NA         NA         NA
## 10:                     <NA>         NA         NA         NA

Performance Metrics:

Accuracy: 95.6% The model correctly classifies high-risk and low-risk countries in almost 95% of cases.

Sensitivity (True Positive Rate): The model correctly identifies 96% of the actual high-risk countries.

Specificity (True Negative Rate): 95.2% The model correctly identifies 95.2% of the low-risk countries.

Precision: 95.1% Of all countries predicted as high-risk, 95.1% were truly high-risk.

F1 Score: 95% Indicates excellent balance between precision and recall.

AUC: 0.994 Near-perfect discrimination between high-risk and low-risk classes

GDP_per_capita_growth 0.3102 The most influential predictor; countries with low or negative GDP growth are more likely to be high-risk.

Current_Account_GDP_Perc 0.2236 A major driver; large current account deficits or imbalances increase risk

Exchange_rate 0.098 Currency volatility contributes significantly to country risk

Lending_rate 0.097 Higher lending rates are associated with higher risk levels.

Inflation 0.087 High inflation increases financial instability

Interest_rate_spread 0.082 Wider spreads in the banking sector indicate potential economic stress

Risk_premium 0.050 Market perception of country risk moderately affects classification.

Real_interest_rate 0.045 Tighter real rates have a smaller but non-negligible effect on risk.

GDP growth and current account balances remain the strongest predictors of country risk, consistent with economic intuition.

Financial variables like exchange rates, lending rates, and inflation also play significant roles, showing XGBoost captures both macroeconomic and financial dimensions of risk.

The model is highly accurate and robust, with near-perfect discrimination (AUC = 0.994), indicating it can reliably classify countries into high-risk and low-risk categories.

Unlike Random Forest, XGBoost provides detailed importance measures for each variable, helping explain which features contribute most to risk assessment.

Conclusion:

This project demonstrates that macroeconomic and financial indicators—particularly GDP per capita growth, current account balance, and lending rates—are strong predictors of economic risk. Through a combination of statistical analyses (ANOVA and regression) and machine learning models (Logistic Regression, Random Forest, XGBoost), we achieved high predictive accuracy while maintaining interpretability. Random Forest captured complex interactions with 98% accuracy, XGBoost provided robust predictions at 95%, and Logistic Regression offered clear insights into how individual factors affect risk. Importantly, even after removing potentially leaky variables, the models remained reliable, confirming their real-world applicability. This framework provides a data-driven, actionable approach for early identification of high-risk economies, making it valuable for policymakers, investors, and financial institutions.

Detecting Macroeconomic Risk Using Global Indicators: A Machine Learning Approach

Nsovo Ntuli

2025-12-30