INTRODUCTION

Emerging economies face distinctive macroeconomic challenges, including unpredictable growth, persistent inflationary pressures, and limited room for policy maneuver. Gaining a clear understanding of how fiscal policy, public debt, structural features, and broader economic outcomes interact is essential for crafting strategies that support sustainable development. This research draws on extensive IMF data to explore these interactions across a wide range of emerging market contexts.

The study tackles key policy questions: At what levels does public debt threaten macroeconomic stability? How does the effectiveness of fiscal policy differ across regions? Which structural characteristics help economies withstand shocks and sustain growth? Using advanced multivariate statistical methods on a dataset covering over 4,000 country-year observations, the analysis provides robust, evidence-based insights into the factors driving economic performance in emerging economies.

This research provides a comprehensive examination of the complex interplay between fiscal policy, debt sustainability, structural characteristics, and macroeconomic performance across emerging economies. Through advanced statistical analyses, including multivariate ANOVA, non-parametric tests, and regression modeling, several key patterns emerge that have significant theoretical and policy implications.

The findings of this research have practical significance for policymakers. They can inform decisions on debt management, fiscal policy design, and structural reform priorities. In a post-pandemic world marked by high public debt and ongoing inflationary pressures, understanding these complex relationships is vital for planning policies that promote long-term, resilient growth.

Specific Objectives:

Fiscal Policy Analysis:

Examine the relationship between fiscal positions (deficit, balanced, surplus) and inflation rates

Investigate continental variations in fiscal policy effectiveness

Identify optimal fiscal stances for different regional contexts

Debt Sustainability Assessment:

Determine critical public debt thresholds for macroeconomic stability

Analyze the differential impacts of debt on growth versus inflation

Assess debt overhang effects on economic performance

Savings-Investment Dynamics:

Investigate the savings-investment-current account nexus

Identify the primary drivers of external imbalances

Examine continental patterns in capital formation

Structural Characteristics:

Analyze how population size affects economic performance

Identify optimal country size ranges for different economic outcomes

Assess scale advantages in domestic resource mobilization

Methodological Contribution:

Demonstrate robust statistical approaches for macroeconomic data analysis

Address the challenges of outliers, non-normal distributions, and heteroscedasticity

Provide a framework for evidence-based macroeconomic policy analysis

Statistical Approach:

Two-way ANOVA (Analysis of Variance) for testing the effects of two categorical independent variables on one continuous dependent variable

Multivariate Analysis of Variance (MANOVA) for testing multiple dependent variables

Welch ANOVA and Games-Howell post-hoc tests for handling unequal variances

Non-parametric alternatives (Kruskal-Wallis, Scheirer-Ray-Hare) for robust inference

Interaction effects analysis to examine contextual variations

Linear regression analysis to understand relationships between macroeconomic variables

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stringr)
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(countrycode)
library(rstatix)
## 
## Attaching package: 'rstatix'
## 
## The following object is masked from 'package:janitor':
## 
##     make_clean_names
## 
## The following object is masked from 'package:stats':
## 
##     filter
library(ggpubr)
library(rcompanion)
library(broom)
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following object is masked from 'package:purrr':
## 
##     some
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
#check the dataset
head(emerging_economies)
## # A tibble: 6 × 51
##   Country  `Subject Descriptor` Units Scale Country/Series-speci…¹ `1980` `1981`
##   <chr>    <chr>                <chr> <chr> <chr>                  <chr>  <chr> 
## 1 Afghani… Gross domestic prod… Perc… Units See notes for:  Gross… n/a    n/a   
## 2 Afghani… Total investment     Perc… Units Source: National Stat… n/a    n/a   
## 3 Afghani… Gross national savi… Perc… Units Source: National Stat… n/a    n/a   
## 4 Afghani… Inflation, average … Perc… Units See notes for:  Infla… n/a    n/a   
## 5 Afghani… Inflation, end of p… Perc… Units See notes for:  Infla… n/a    n/a   
## 6 Afghani… Volume of imports o… Perc… Units Source: Central Bank … n/a    n/a   
## # ℹ abbreviated name: ¹​`Country/Series-specific Notes`
## # ℹ 44 more variables: `1982` <chr>, `1983` <chr>, `1984` <chr>, `1985` <chr>,
## #   `1986` <chr>, `1987` <chr>, `1988` <chr>, `1989` <chr>, `1990` <chr>,
## #   `1991` <chr>, `1992` <chr>, `1993` <chr>, `1994` <chr>, `1995` <chr>,
## #   `1996` <chr>, `1997` <chr>, `1998` <chr>, `1999` <chr>, `2000` <chr>,
## #   `2001` <chr>, `2002` <chr>, `2003` <chr>, `2004` <chr>, `2005` <chr>,
## #   `2006` <chr>, `2007` <chr>, `2008` <chr>, `2009` <chr>, `2010` <chr>, …
sample_n(emerging_economies, 10)
## # A tibble: 10 × 51
##    Country `Subject Descriptor` Units Scale Country/Series-speci…¹ `1980` `1981`
##    <chr>   <chr>                <chr> <chr> <chr>                  <chr>  <chr> 
##  1 Mexico  Unemployment rate    Perc… Units Source: National Stat… 1.200  0.900 
##  2 Togo    Inflation, end of p… Perc… Units See notes for:  Infla… n/a    10.966
##  3 Lesotho Unemployment rate    Perc… Units <NA>                   <NA>   <NA>  
##  4 Comoros General government … Perc… Units See notes for:  Gener… n/a    n/a   
##  5 Trinid… Population           Pers… Mill… Source: International… 1.085  1.102 
##  6 Haiti   Volume of imports o… Perc… Units Source: Central Bank … 23.095 5.854 
##  7 Benin   Current account bal… Perc… Units See notes for:  Gross… -4.672 -16.9…
##  8 Marsha… Current account bal… Perc… Units See notes for:  Gross… n/a    n/a   
##  9 Liberia Total investment     Perc… Units <NA>                   <NA>   <NA>  
## 10 Peru    Unemployment rate    Perc… Units Source: National Stat… 7.326  6.800 
## # ℹ abbreviated name: ¹​`Country/Series-specific Notes`
## # ℹ 44 more variables: `1982` <chr>, `1983` <chr>, `1984` <chr>, `1985` <chr>,
## #   `1986` <chr>, `1987` <chr>, `1988` <chr>, `1989` <chr>, `1990` <chr>,
## #   `1991` <chr>, `1992` <chr>, `1993` <chr>, `1994` <chr>, `1995` <chr>,
## #   `1996` <chr>, `1997` <chr>, `1998` <chr>, `1999` <chr>, `2000` <chr>,
## #   `2001` <chr>, `2002` <chr>, `2003` <chr>, `2004` <chr>, `2005` <chr>,
## #   `2006` <chr>, `2007` <chr>, `2008` <chr>, `2009` <chr>, `2010` <chr>, …
#clean the dataset

emerging_economies <- emerging_economies %>%
  pivot_longer(cols = matches("^198\\d|^199\\d|^200\\d|^201\\d|^202\\d"),
               names_to = "Years",
               values_to = "Values") %>%
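  mutate(Years = as.numeric(Years),
         # drop any thousands separators first so values such as "1,234.5" are not
         # coerced to NA; the "n/a" placeholders still become NA
         Values = as.numeric(str_remove_all(Values, ","))) %>%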
  rename("Economic_Indicator" = "Subject Descriptor")



#count NA rows then remove them 
sum(is.na(emerging_economies$Economic_Indicator))
## [1] 90
emerging_economies <- emerging_economies %>%
  filter(!is.na(Economic_Indicator))

sum(is.na(emerging_economies$Values))
## [1] 25542
# Check NA distribution by country and indicator
na_analysis <- emerging_economies %>%
  group_by(Country, Economic_Indicator) %>%
  summarise(
    total_obs = n(),
    na_count = sum(is.na(Values)),
    na_percentage = (na_count / total_obs) * 100,
    .groups = "drop"
  )

# View worst cases
na_analysis %>% 
  arrange(desc(na_percentage)) %>%
  head(10)
## # A tibble: 10 × 5
##    Country             Economic_Indicator       total_obs na_count na_percentage
##    <chr>               <chr>                        <int>    <int>         <dbl>
##  1 Afghanistan         General government net …        45       45           100
##  2 Afghanistan         Unemployment rate               45       45           100
##  3 Angola              General government net …        45       45           100
##  4 Angola              Unemployment rate               45       45           100
##  5 Antigua and Barbuda General government net …        45       45           100
##  6 Antigua and Barbuda Unemployment rate               45       45           100
##  7 Argentina           General government net …        45       45           100
##  8 Armenia             General government net …        45       45           100
##  9 Aruba               General government net …        45       45           100
## 10 Aruba               Total investment                45       45           100
# Check which indicators are commonly missing
missing_pattern <- na_analysis %>%
  group_by(Economic_Indicator) %>%
  summarise(
    total_countries = n(),
    countries_100_na = sum(na_percentage == 100),
    percent_100_na = (countries_100_na / total_countries) * 100
  ) %>%
  arrange(desc(percent_100_na))

print(missing_pattern)
## # A tibble: 13 × 4
##    Economic_Indicator            total_countries countries_100_na percent_100_na
##    <chr>                                   <int>            <int>          <dbl>
##  1 General government net debt               155               98          63.2 
##  2 Unemployment rate                         155               79          51.0 
##  3 Gross national savings                    155               20          12.9 
##  4 Total investment                          155               20          12.9 
##  5 Volume of imports of goods a…             155               15           9.68
##  6 General government gross debt             155                2           1.29
##  7 Current account balance                   155                0           0   
##  8 General government revenue                155                0           0   
##  9 General government total exp…             155                0           0   
## 10 Gross domestic product, cons…             155                0           0   
## 11 Inflation, average consumer …             155                0           0   
## 12 Inflation, end of period con…             155                0           0   
## 13 Population                                155                0           0
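The two tables above are built in two steps. The first counts the share of missing years in each country-indicator series. The second counts the share of countries whose series is entirely missing. Below is a base-R sketch of the same calculation on a small hypothetical data set; the countries, indicators, and values are made up for illustration.

```r
# Toy long-format data: 2 hypothetical countries x 2 indicators x 3 years
toy <- data.frame(
  Country   = rep(c("A", "B"), each = 6),
  Indicator = rep(rep(c("Net debt", "Inflation"), each = 3), times = 2),
  Values    = c(NA, NA, NA, 2.1, 3.4, 5.0,
                NA, 1.0, NA, 4.2, NA, 6.1)
)

# Step 1: percentage of missing values per country-indicator pair (like na_analysis);
# na.pass keeps the NA rows so they can be counted
na_pct <- aggregate(Values ~ Country + Indicator, data = toy,
                    FUN = function(v) mean(is.na(v)) * 100,
                    na.action = na.pass)

# Step 2: percentage of countries whose series is 100% missing (like missing_pattern)
pct_fully_missing <- tapply(na_pct$Values == 100, na_pct$Indicator, mean) * 100
print(na_pct)
print(pct_fully_missing)
```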
# General government net debt and the unemployment rate have the most missing data
# (fully missing for 63% and 51% of countries), so drop these two indicators

emerging_economies <- emerging_economies %>%
  mutate(Economic_Indicator = trimws(Economic_Indicator)) %>%  #remove spaces 
  filter(!grepl("^General government net debt$|^Unemployment rate$", Economic_Indicator, ignore.case = TRUE))

unique(emerging_economies$Economic_Indicator)
##  [1] "Gross domestic product, constant prices" 
##  [2] "Total investment"                        
##  [3] "Gross national savings"                  
##  [4] "Inflation, average consumer prices"      
##  [5] "Inflation, end of period consumer prices"
##  [6] "Volume of imports of goods and services" 
##  [7] "Population"                              
##  [8] "General government revenue"              
##  [9] "General government total expenditure"    
## [10] "General government gross debt"           
## [11] "Current account balance"
#remove all the NA in the dataset

emerging_economies <-  na.omit(emerging_economies)

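# Remove exact duplicate rows (assign the result so the change is kept), then print
emerging_economies <- distinct(emerging_economies)
emerging_economies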
## # A tibble: 61,092 × 8
##    Country     Economic_Indicator             Units Scale Country/Series-speci…¹
##    <chr>       <chr>                          <chr> <chr> <chr>                 
##  1 Afghanistan Gross domestic product, const… Perc… Units See notes for:  Gross…
##  2 Afghanistan Gross domestic product, const… Perc… Units See notes for:  Gross…
##  3 Afghanistan Gross domestic product, const… Perc… Units See notes for:  Gross…
##  4 Afghanistan Gross domestic product, const… Perc… Units See notes for:  Gross…
##  5 Afghanistan Gross domestic product, const… Perc… Units See notes for:  Gross…
##  6 Afghanistan Gross domestic product, const… Perc… Units See notes for:  Gross…
##  7 Afghanistan Gross domestic product, const… Perc… Units See notes for:  Gross…
##  8 Afghanistan Gross domestic product, const… Perc… Units See notes for:  Gross…
##  9 Afghanistan Gross domestic product, const… Perc… Units See notes for:  Gross…
## 10 Afghanistan Gross domestic product, const… Perc… Units See notes for:  Gross…
## # ℹ 61,082 more rows
## # ℹ abbreviated name: ¹​`Country/Series-specific Notes`
## # ℹ 3 more variables: `Estimates Start After` <dbl>, Years <dbl>, Values <dbl>
#map countries to their continent 


emerging_economies <- emerging_economies %>% 
  mutate(Continent = countrycode(Country,"country.name", "continent"))

#check if there is unmatched values 

emerging_economies %>%
  filter(is.na(Continent)) %>%
  distinct(Country)
## # A tibble: 2 × 1
##   Country   
##   <chr>     
## 1 Kosovo    
## 2 Micronesia
# Manually assign continents for the two unmatched countries, Kosovo and Micronesia

emerging_economies <- emerging_economies %>%
  mutate(
    Continent = countrycode(Country, "country.name", "continent"),
    Continent = case_when(
      Country == "Kosovo" ~ "Europe",
      Country == "Micronesia" ~ "Oceania",
      TRUE ~ Continent
    )
  )

Research question: Does the average GDP growth rate differ significantly across continents?

Hypothesis:

H0 (null): Mean GDP growth is the same across all continents.

H1 (alternative): At least one continent has a different mean GDP growth.

#compute summary statistics 

emerging_economies %>% 
  filter(Economic_Indicator == "Gross domestic product, constant prices") %>%
  group_by(Continent) %>%
  get_summary_stats(Values, type = "mean_sd")
## # A tibble: 5 × 5
##   Continent variable     n  mean    sd
##   <chr>     <fct>    <dbl> <dbl> <dbl>
## 1 Africa    Values    2259  3.60  7.29
## 2 Americas  Values    1478  2.80  5.50
## 3 Asia      Values    1641  4.63  6.74
## 4 Europe    Values     487  2.18  5.62
## 5 Oceania   Values     439  2.33  5.56

On average, Asia and Africa are the fastest-growing continents, but Africa's growth is more volatile (SD 7.29 vs 6.74 for Asia), suggesting less consistent year-to-year performance. Europe and Oceania grow more slowly, with somewhat smaller dispersion (SD about 5.6).

# Visualise the distribution with a boxplot to check for outliers


emerging_economies %>%
  filter(Economic_Indicator == "Gross domestic product, constant prices") %>%
  ggboxplot(x = "Continent", y = "Values",
            color = "Continent",
            palette = "jco",
            add = "jitter",
            ylab = "GDP Growth (%)",
            xlab = "Continent",
            title = "Distribution of GDP Growth by Continent (Emerging Economies)")

#check for outliers

outliers <- emerging_economies %>%
  filter(Economic_Indicator == "Gross domestic product, constant prices") %>%
  group_by(Continent) %>%
  identify_outliers(Values)
  

# Count the extreme outliers (is.extreme is already logical)
sum(outliers$is.extreme)
## [1] 147
table(outliers$Continent, outliers$is.extreme)
##           
##            FALSE TRUE
##   Africa     122   56
##   Americas    63   29
##   Asia        92   47
##   Europe      33    9
##   Oceania     17    6
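rstatix documents identify_outliers() as applying the boxplot rule. Values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are outliers, and values beyond 3 × IQR are extreme. The sketch below applies that rule in base R. It assumes R's default quantile type (type 7) and uses made-up growth values, so its results may differ slightly from the package's.

```r
# Flag outliers with the 1.5*IQR / 3*IQR boxplot rule
flag_outliers <- function(x) {
  q1  <- quantile(x, 0.25, na.rm = TRUE)
  q3  <- quantile(x, 0.75, na.rm = TRUE)
  iqr <- q3 - q1
  data.frame(
    value      = x,
    is.outlier = x < q1 - 1.5 * iqr | x > q3 + 1.5 * iqr,
    is.extreme = x < q1 - 3 * iqr   | x > q3 + 3 * iqr
  )
}

# Hypothetical growth rates (%) containing one mild and one extreme outlier
growth <- c(2, 3, 3, 4, 4, 5, 5, 6, 10, 25)
flags  <- flag_outliers(growth)
flags[flags$is.outlier, ]
```

Applied per continent (for example with split() or a grouped summary), this reproduces the grouping used in the table above.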

Africa, Asia, and the Americas have the largest numbers of extreme outliers, which makes sense given their larger samples and more volatile growth.

Europe and Oceania have fewer extreme outliers, consistent with their slower, steadier GDP growth.

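Proportionally, about 31% of Africa's flagged outliers are extreme (56 of 178), compared with about 21% for Europe (9 of 42). Relative to all observations, extreme outliers make up about 2.5% of Africa's GDP growth data (56 of 2,259) and 1.8% of Europe's (9 of 487).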

This indicates which continents' data are more volatile and could distort parametric analyses, such as ANOVA or regression, if not addressed.
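The outlier proportions for Africa and Europe depend on the denominator: the number of flagged outliers or the full GDP growth sample. Below is a short check using the counts from the outlier table and the group sizes from the summary statistics.

```r
# Counts from the identify_outliers() table and group sizes from get_summary_stats()
extreme  <- c(Africa = 56, Europe = 9)
flagged  <- c(Africa = 56 + 122, Europe = 9 + 33)   # extreme + non-extreme outliers
n_growth <- c(Africa = 2259, Europe = 487)          # GDP growth observations

share_of_outliers <- round(100 * extreme / flagged, 1)   # % of flagged points that are extreme
share_of_sample   <- round(100 * extreme / n_growth, 1)  # % of all observations that are extreme
rbind(share_of_outliers, share_of_sample)
```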

#Check normality


# Extract the GDP growth series
gdp_growth <- emerging_economies %>%
  filter(Economic_Indicator == "Gross domestic product, constant prices")

# Fit a one-way linear model; its residuals are used to check normality


gdp_shapiro_model <- lm(Values ~ Continent, data = gdp_growth)
gdp_shapiro_model
## 
## Call:
## lm(formula = Values ~ Continent, data = gdp_growth)
## 
## Coefficients:
##       (Intercept)  ContinentAmericas      ContinentAsia    ContinentEurope  
##            3.5957            -0.7927             1.0381            -1.4130  
##  ContinentOceania  
##           -1.2668
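# shapiro_test() accepts at most 5000 values, so test a random subsample of the residuals
set.seed(123)  # make the subsample reproducible
shapiro_gdp_sample <- sample(residuals(gdp_shapiro_model), 5000)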

#shapiro test
shapiro_test(shapiro_gdp_sample)
## # A tibble: 1 × 3
##   variable           statistic  p.value
##   <chr>                  <dbl>    <dbl>
## 1 shapiro_gdp_sample     0.733 3.47e-67
#check normality by groups
gdp_growth %>%
  group_by(Continent) %>%
  shapiro_test(Values)
## # A tibble: 5 × 4
##   Continent variable statistic        p
##   <chr>     <chr>        <dbl>    <dbl>
## 1 Africa    Values       0.649 8.80e-56
## 2 Americas  Values       0.852 4.25e-35
## 3 Asia      Values       0.805 1.24e-40
## 4 Europe    Values       0.863 3.08e-20
## 5 Oceania   Values       0.906 8.22e-16
#create a qqplot for normality
ggqqplot(residuals(gdp_shapiro_model))

# QQ plots by continent
ggqqplot(gdp_growth, "Values", facet.by = "Continent")

Every Shapiro-Wilk p-value is below 0.05, so the normality assumption is not met. The QQ plots agree: the points are roughly linear for some continents, but Africa, the Americas, and Asia, the continents with the most outliers, show heavy tails.
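Base R's shapiro.test(), which rstatix's shapiro_test() wraps, accepts only 3 to 5000 observations, which is why the residuals are subsampled. With samples this large, the test rejects normality even for moderate departures, so the QQ plots are the more informative check. Below is a minimal sketch that uses simulated heavy-tailed data (a t distribution with 3 degrees of freedom, chosen only for illustration) in place of the GDP growth residuals.

```r
set.seed(42)  # reproducible simulation and subsample

# Simulated heavy-tailed "residuals", about the size of the GDP growth sample
resid_sim <- rt(6304, df = 3)

# shapiro.test() stops with an error above 5000 values, so test a random subsample
sub <- sample(resid_sim, 5000)
sw  <- shapiro.test(sub)
sw$statistic
sw$p.value
```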

# Use Levene's test to check homogeneity of variance
gdp_growth %>%
  levene_test(Values ~ Continent)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     4  6299      1.83 0.120

Because the p-value (0.120) is greater than 0.05, we fail to reject the null hypothesis of equal variances: there is no evidence that the variance of GDP growth differs across continents, so the homogeneity assumption is met.
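By default, both car::leveneTest() and rstatix::levene_test() centre each group on its median, which is the Brown-Forsythe variant of Levene's test. The statistic is an ordinary one-way ANOVA F test on the absolute deviations from the group medians. Below is a base-R sketch on simulated data; the group labels and spreads are chosen purely for illustration.

```r
# Median-centred Levene (Brown-Forsythe) test: one-way ANOVA on |y - group median|
levene_median <- function(y, g) {
  g   <- factor(g)
  dev <- abs(y - ave(y, g, FUN = median))
  fit <- anova(lm(dev ~ g))
  c(F = fit[["F value"]][1], p = fit[["Pr(>F)"]][1])
}

set.seed(1)
g         <- rep(c("A", "B", "C"), each = 200)
y_equal   <- rnorm(600, sd = 5)                              # same spread in every group
y_unequal <- rnorm(600, sd = rep(c(2, 5, 10), each = 200))   # spread differs by group

levene_median(y_equal, g)
levene_median(y_unequal, g)
```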

Kruskal-Wallis Test

# Normality is violated (even though variances are homogeneous), so use the
# Kruskal-Wallis test instead of a standard one-way ANOVA


gdp_kruskal_test <- kruskal_test(Values ~ Continent, data = gdp_growth) 
gdp_kruskal_test
## # A tibble: 1 × 6
##   .y.        n statistic    df        p method        
## * <chr>  <int>     <dbl> <int>    <dbl> <chr>         
## 1 Values  6304      263.     4 1.06e-55 Kruskal-Wallis

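The Kruskal-Wallis p-value (1.06e-55) is far below 0.05, so I reject H0. GDP growth is not distributed identically across continents, and at least one continent differs from the others. Strictly, Kruskal-Wallis compares mean ranks rather than means, so the result concerns the overall location of each continent's growth distribution.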
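The Kruskal-Wallis statistic works on ranks. All observations are ranked together, and H = 12 / (N(N + 1)) × Σ R_i² / n_i - 3(N + 1), where R_i is the rank sum and n_i the size of group i. Below is a base-R sketch on simulated data for three hypothetical regions. Because the data have no ties, no tie correction is needed, and the hand-computed H matches kruskal.test().

```r
set.seed(7)
# Simulated growth rates for three hypothetical regions
growth <- c(rnorm(40, mean = 2), rnorm(40, mean = 3), rnorm(40, mean = 4.5))
region <- factor(rep(c("R1", "R2", "R3"), each = 40))

r  <- rank(growth)                # rank all observations together
N  <- length(growth)
Ri <- tapply(r, region, sum)      # rank sum per group
ni <- tapply(r, region, length)   # group sizes

H_manual <- 12 / (N * (N + 1)) * sum(Ri^2 / ni) - 3 * (N + 1)
H_base   <- unname(kruskal.test(growth ~ region)$statistic)
c(manual = H_manual, kruskal.test = H_base)
```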

#Effect size
gdp_growth %>%
  kruskal_effsize(Values ~ Continent)
## # A tibble: 1 × 5
##   .y.        n effsize method  magnitude
## * <chr>  <int>   <dbl> <chr>   <ord>    
## 1 Values  6304  0.0411 eta2[H] small

The effect size (eta², based on the H statistic) is 0.0411, which is small: only about 4.1% of the variation in GDP growth ranks is explained by which continent a country is in.
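The eta² that kruskal_effsize() reports is based on the H statistic: eta²[H] = (H - k + 1) / (n - k), with k groups and n observations. Plugging in the Kruskal-Wallis output above (H ≈ 263, k = 5 continents, n = 6,304) recovers the reported value. H is rounded in the printed table, so expect a small discrepancy in later decimals.

```r
H <- 263    # Kruskal-Wallis statistic (rounded in the printed output)
k <- 5      # number of continents
n <- 6304   # country-year observations

eta2_H <- (H - k + 1) / (n - k)
round(eta2_H, 4)
```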

Post-Hoc Test

# Post-hoc analysis: Dunn's test (Bonferroni-adjusted) to identify which continents differ in GDP growth
gdp_dunns_test <- gdp_growth %>%
  dunn_test(Values ~ Continent , p.adjust.method = "bonferroni")
gdp_dunns_test
## # A tibble: 10 × 9
##    .y.    group1   group2      n1    n2 statistic        p    p.adj p.adj.signif
##  * <chr>  <chr>    <chr>    <int> <int>     <dbl>    <dbl>    <dbl> <chr>       
##  1 Values Africa   Americas  2259  1478    -5.19  2.14e- 7 2.14e- 6 ****        
##  2 Values Africa   Asia      2259  1641     9.26  2.08e-20 2.08e-19 ****        
##  3 Values Africa   Europe    2259   487    -3.87  1.07e- 4 1.07e- 3 **          
##  4 Values Africa   Oceania   2259   439    -6.61  3.78e-11 3.78e-10 ****        
##  5 Values Americas Asia      1478  1641    13.2   7.41e-40 7.41e-39 ****        
##  6 Values Americas Europe    1478   487    -0.384 7.01e- 1 1   e+ 0 ns          
##  7 Values Americas Oceania   1478   439    -3.15  1.62e- 3 1.62e- 2 *           
##  8 Values Asia     Europe    1641   487    -9.57  1.06e-21 1.06e-20 ****        
##  9 Values Asia     Oceania   1641   439   -12.0   3.25e-33 3.25e-32 ****        
## 10 Values Europe   Oceania    487   439    -2.30  2.15e- 2 2.15e- 1 ns

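The differences in GDP growth between continents are statistically significant, though modest in size (eta²[H] = 0.041).

Asia has the highest growth of the five continental groups and is significantly ahead of every other group.

Africa grows significantly faster than the Americas, Europe, and Oceania, but less consistently.

Europe and Oceania have the slowest growth and are not significantly different from each other.

The Americas sit between the faster-growing (Asia, Africa) and slower-growing (Europe, Oceania) groups. They are significantly below Asia and Africa, significantly above Oceania (adjusted p = 0.016), and not significantly different from Europe.

This analysis stratifies emerging economies by continental region in terms of GDP growth performance. Asian economies show the strongest growth and African economies a solid but secondary performance, while the Americas, Europe, and Oceania grow more slowly. Within this slower group, only the gap between the Americas and Oceania is significant after Bonferroni correction.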
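The Bonferroni adjustment in the Dunn table multiplies each raw p-value by the number of pairwise comparisons, choose(5, 2) = 10 for five continents, and caps the result at 1. Below is a check on three rows of the table.

```r
# Raw p-values copied from three rows of the Dunn table
p_raw <- c(americas_vs_oceania = 1.62e-3,
           europe_vs_oceania   = 2.15e-2,
           americas_vs_europe  = 7.01e-1)

# n = total number of comparisons in the family, not just the three shown here
p_bonf <- p.adjust(p_raw, method = "bonferroni", n = choose(5, 2))
p_bonf
```

The Europe vs Oceania comparison (raw p = 0.0215) is nominally significant but becomes non-significant after adjustment. This is why the text treats those two continents as similar.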

Research Question: Is there a significant difference in inflation rates based on government fiscal discipline (revenue vs. expenditure balance) and continent?

H0(NULL): There is no significant difference in mean inflation rates among countries with different fiscal balance categories (Surplus, Balanced, Deficit).

H0: μSurplus = μBalanced = μDeficit

H1 (alternative): Mean inflation differs for at least one pair of fiscal groups.

H1: At least one μ differs among fiscal groups

Feature Engineering

fiscal_inflation <- emerging_economies %>%
  filter(Economic_Indicator %in% c("General government revenue",
                                   "General government total expenditure", 
                                   "Inflation, average consumer prices")) %>%
  select(Continent, Country, Years, Economic_Indicator, Values) %>%
  pivot_wider(names_from = Economic_Indicator,
              values_from = Values) %>%
  # compute fiscal balance = revenue - expenditure
  mutate(
    fiscal_balance = `General government revenue` - `General government total expenditure`
  ) %>%
  # Remove NA fiscal_balance values before case_when
  filter(!is.na(fiscal_balance)) %>%
  mutate(
    fiscal_group = case_when(
      fiscal_balance > 5                ~ "surplus (>5%)",
      fiscal_balance >= -5 & fiscal_balance <= 5 ~ "balanced (-5% to 5%)",
      fiscal_balance < -5               ~ "deficit (<-5%)",
      TRUE                              ~ NA_character_
    )
  ) %>%
  rename(inflation = `Inflation, average consumer prices`) %>%
  select(Continent, Country, Years, fiscal_balance, fiscal_group, inflation) %>%
  drop_na()
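The classification above labels each country-year by the gap between revenue and expenditure (both in % of GDP), with cut-offs at ±5. Below is a base-R sketch of the same rule. The revenue and expenditure pairs are hypothetical, and the function name classify_fiscal is illustrative rather than part of the analysis. Note that a balance of exactly ±5 falls in the balanced group, as in the case_when() above.

```r
# Classify fiscal balance (revenue minus expenditure, % of GDP) with +/-5 cut-offs
classify_fiscal <- function(revenue, expenditure) {
  balance <- revenue - expenditure
  ifelse(balance > 5, "surplus (>5%)",
         ifelse(balance < -5, "deficit (<-5%)", "balanced (-5% to 5%)"))
}

# Hypothetical revenue/expenditure pairs (% of GDP); balances are 8, -2, -8, -5, 5
revenue_pct <- c(30, 25, 18, 20, 20)
expend_pct  <- c(22, 27, 26, 25, 15)
classify_fiscal(revenue_pct, expend_pct)
```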
#Summary statistics 

fiscal_inflation %>%
  group_by(Continent, fiscal_group) %>%
  get_summary_stats(inflation, type = "mean_sd")
## # A tibble: 15 × 6
##    Continent fiscal_group         variable      n  mean    sd
##    <chr>     <chr>                <fct>     <dbl> <dbl> <dbl>
##  1 Africa    balanced (-5% to 5%) inflation  1196 12.9  44.5 
##  2 Africa    deficit (<-5%)       inflation   423 14.3  41.9 
##  3 Africa    surplus (>5%)        inflation   108  9.14 21.4 
##  4 Americas  balanced (-5% to 5%) inflation   887  8.74 19.4 
##  5 Americas  deficit (<-5%)       inflation   196 14.9  45.8 
##  6 Americas  surplus (>5%)        inflation    21 11.0  22.9 
##  7 Asia      balanced (-5% to 5%) inflation   808  8.83 21.8 
##  8 Asia      deficit (<-5%)       inflation   375 12.4  37.1 
##  9 Asia      surplus (>5%)        inflation    99  5.14  7.18
## 10 Europe    balanced (-5% to 5%) inflation   302 12.4  33.3 
## 11 Europe    deficit (<-5%)       inflation    81 10.9  19.1 
## 12 Europe    surplus (>5%)        inflation     5  7.81  3.89
## 13 Oceania   balanced (-5% to 5%) inflation   229  4.62  4.08
## 14 Oceania   deficit (<-5%)       inflation    68  4.07  4.75
## 15 Oceania   surplus (>5%)        inflation    62  3.58  3.48

Fiscal balance and inflation relationship

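In Africa, the Americas, and Asia, deficit periods show higher mean inflation than balanced periods. In the Americas and Asia they are also more volatile (larger SD). Europe and Oceania do not follow this pattern.

In four of the five continents, surplus periods have the lowest mean inflation. The Americas are the exception, with only 21 surplus observations. This suggests, but does not establish, that fiscal discipline is associated with more stable prices.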

Continental differences

Africa: Consistently high average inflation (about 9 to 14% across fiscal groups) and very large variation, suggesting frequent shocks.

Asia: Inflation moderates under surplus conditions (about 5%).

Europe: Mixed results; inflation is lowest in surplus periods (7.8%), but that group has only 5 observations.

Oceania: Consistently low and stable inflation in all fiscal conditions (around 3–5%).

Volatility (standard deviation)

Very large SD values (especially in Africa and the Americas) mean that inflation varies greatly across time and countries, possibly reflecting hyperinflation episodes or unstable monetary environments.

# Outlier checks

#boxplot 
boxplot_infla <- ggboxplot(fiscal_inflation, x = "fiscal_group", y = "inflation",
                           color = "Continent", palette = "jco")

boxplot_infla

#check for outliers

fiscal_outliers <- fiscal_inflation %>%
  group_by(Continent, fiscal_group) %>%
  identify_outliers(inflation)

sum(fiscal_outliers$is.extreme)
## [1] 231
table(fiscal_outliers$Continent, fiscal_outliers$is.extreme)
##           
##            FALSE TRUE
##   Africa      82   83
##   Americas    65   65
##   Asia        53   62
##   Europe      10   20
##   Oceania     10    1

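Africa has the most inflation outliers (165, of which 83 are extreme), followed by the Americas (130; 65 extreme) and Asia (115; 62 extreme), while Oceania has only 11 (1 extreme). Asia's deficit group also shows pronounced peaks in the boxplot.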

# Check the normality assumption
# Build a linear model with the fiscal group x continent interaction
fiscal_model <- lm(inflation ~ fiscal_group * Continent , data = fiscal_inflation)

#shapiro test
shapiro_test(residuals(fiscal_model))
## # A tibble: 1 × 3
##   variable                statistic  p.value
##   <chr>                       <dbl>    <dbl>
## 1 residuals(fiscal_model)     0.260 3.97e-88
#shapiro by groups
fiscal_inflation %>%
  group_by(Continent, fiscal_group) %>%
  shapiro_test(inflation)
## # A tibble: 15 × 5
##    Continent fiscal_group         variable  statistic        p
##    <chr>     <chr>                <chr>         <dbl>    <dbl>
##  1 Africa    balanced (-5% to 5%) inflation     0.227 7.73e-57
##  2 Africa    deficit (<-5%)       inflation     0.269 1.13e-37
##  3 Africa    surplus (>5%)        inflation     0.338 6.71e-20
##  4 Americas  balanced (-5% to 5%) inflation     0.319 4.96e-49
##  5 Americas  deficit (<-5%)       inflation     0.319 1.81e-26
##  6 Americas  surplus (>5%)        inflation     0.555 6.82e- 7
##  7 Asia      balanced (-5% to 5%) inflation     0.264 1.48e-48
##  8 Asia      deficit (<-5%)       inflation     0.288 1.60e-35
##  9 Asia      surplus (>5%)        inflation     0.667 1.14e-13
## 10 Europe    balanced (-5% to 5%) inflation     0.327 8.87e-32
## 11 Europe    deficit (<-5%)       inflation     0.465 1.15e-15
## 12 Europe    surplus (>5%)        inflation     0.926 5.67e- 1
## 13 Oceania   balanced (-5% to 5%) inflation     0.922 1.26e- 9
## 14 Oceania   deficit (<-5%)       inflation     0.947 5.63e- 3
## 15 Oceania   surplus (>5%)        inflation     0.919 5.78e- 4
#qqplot
ggqqplot(residuals(fiscal_model))

#qqplot by groups 
ggqqplot(fiscal_inflation, "inflation", ggtheme = theme_bw()) +
  facet_grid(Continent ~ fiscal_group)

The Shapiro-Wilk p-values are far below 0.05 for the model residuals (W = 0.260) and for every group except Europe's surplus group (n = 5, p = 0.567), so the normality assumption is not met. The QQ plots agree: only the surplus groups look roughly linear on most continents, while the balanced and deficit groups are strongly skewed.

# Test homogeneity of variance across the continent x fiscal group cells
fiscal_inflation %>%
  levene_test(inflation ~ Continent * fiscal_group)
## # A tibble: 1 × 4
##     df1   df2 statistic        p
##   <int> <int>     <dbl>    <dbl>
## 1    14  4845      2.85 0.000273

The p-value (0.000273) is below 0.001, so homogeneity of variance is violated.

Because both the normality and homogeneity-of-variance assumptions of two-way ANOVA are violated, I use its non-parametric counterpart, the Scheirer-Ray-Hare test.

# Robust two-way ANOVA

# Scheirer-Ray-Hare test (non-parametric 2-way ANOVA)
srh_result <- scheirerRayHare(inflation ~ fiscal_group + Continent + fiscal_group:Continent,
                              data = fiscal_inflation)
## 
## DV:  inflation 
## Observations:  4860 
## D:  1 
## MS total:  1968705
srh_result
##                          Df     Sum Sq      H   p.value
## fiscal_group              2   21671665 11.008 0.0040703
## Continent                 4  125682592 63.840 0.0000000
## fiscal_group:Continent    8   57727839 29.323 0.0002783
## Residuals              4845 9351275487

The Scheirer-Ray-Hare test indicates significant main effects of both fiscal balance and continent on inflation, as well as a significant interaction effect. Therefore, we reject the null hypotheses and conclude that inflation levels are associated with fiscal stance, geographic region, and the interaction between them.
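In the Scheirer-Ray-Hare output, each H is the effect's sum of squares of ranks divided by MS total, the variance of the ranks. With no ties, MS total equals N(N + 1)/12, which for N = 4,860 gives exactly the 1,968,705 printed above. Each p-value is the upper tail of a chi-squared distribution with the effect's degrees of freedom. Below is a check using the printed sums of squares.

```r
N <- 4860
ms_total <- N * (N + 1) / 12   # variance of the ranks 1..N when there are no ties

# Sums of squares and degrees of freedom copied from the scheirerRayHare() table
ss  <- c(fiscal_group = 21671665, Continent = 125682592, interaction = 57727839)
dfs <- c(fiscal_group = 2, Continent = 4, interaction = 8)

H <- ss / ms_total
p <- pchisq(H, df = dfs, lower.tail = FALSE)
round(cbind(H, p), 7)
```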

Fiscal group effect (p = 0.0041)

Inflation differs significantly across fiscal balance categories (deficit, balanced, surplus) overall: at least one category has a different inflation distribution from the others.

Continent effect (p < 0.0001)

Inflation levels differ significantly across continents. For example, mean inflation is much higher in Africa (about 9 to 14% across fiscal groups) than in Oceania (about 3.6 to 4.6%), consistent with the summary statistics.

Interaction effect (p = 0.00028)

The association between fiscal balance and inflation depends on the continent, so the relationship between fiscal discipline and inflation is not uniform across regions. The same fiscal stance (e.g., running a deficit) may coincide with high inflation in one region but matter less in another.

For example:

In Africa, deficits may be associated with strong inflation spikes.

In Oceania, fiscal position appears to make little difference.

In Asia, surpluses may coincide with markedly lower and more stable inflation.

Post-Hoc Tests for the Scheirer-Ray-Hare Test (Dunn's Test)

# Post-hoc pairwise comparisons using Dunn's test

#Post-Hoc for Fiscal_group
fiscal_inflation %>%
  dunn_test(inflation ~ fiscal_group, p.adjust.method = "bonferroni")
## # A tibble: 3 × 9
##   .y.       group1     group2    n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>     <chr>      <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 inflation balanced … defic…  3422  1143     0.367 7.14e-1 1   e+0 ns          
## 2 inflation balanced … surpl…  3422   295    -3.88  1.06e-4 3.19e-4 ***         
## 3 inflation deficit (… surpl…  1143   295    -3.79  1.49e-4 4.46e-4 ***
#Post-hoc for Continent
fiscal_inflation %>%
  dunn_test(inflation ~ Continent, p.adjust.method = "bonferroni")
## # A tibble: 10 × 9
##    .y.       group1  group2    n1    n2 statistic        p    p.adj p.adj.signif
##  * <chr>     <chr>   <chr>  <int> <int>     <dbl>    <dbl>    <dbl> <chr>       
##  1 inflation Africa  Ameri…  1727  1104    -4.35  1.39e- 5 1.39e- 4 ***         
##  2 inflation Africa  Asia    1727  1282    -3.03  2.41e- 3 2.41e- 2 *           
##  3 inflation Africa  Europe  1727   388    -1.50  1.34e- 1 1   e+ 0 ns          
##  4 inflation Africa  Ocean…  1727   359    -7.93  2.25e-15 2.25e-14 ****        
##  5 inflation Americ… Asia    1104  1282     1.35  1.76e- 1 1   e+ 0 ns          
##  6 inflation Americ… Europe  1104   388     1.41  1.59e- 1 1   e+ 0 ns          
##  7 inflation Americ… Ocean…  1104   359    -4.81  1.50e- 6 1.50e- 5 ****        
##  8 inflation Asia    Europe  1282   388     0.477 6.33e- 1 1   e+ 0 ns          
##  9 inflation Asia    Ocean…  1282   359    -5.83  5.67e- 9 5.67e- 8 ****        
## 10 inflation Europe  Ocean…   388   359    -5.13  2.93e- 7 2.93e- 6 ****
# Post-hoc for the interaction: fiscal groups compared within each continent
fiscal_inflation %>%
  group_by(Continent) %>%
  dunn_test(inflation ~ fiscal_group , p.adjust.method = "bonferroni")
## # A tibble: 15 × 10
##    Continent .y.       group1       group2    n1    n2 statistic       p   p.adj
##  * <chr>     <chr>     <chr>        <chr>  <int> <int>     <dbl>   <dbl>   <dbl>
##  1 Africa    inflation balanced (-… defic…  1196   423    3.05   2.29e-3 6.86e-3
##  2 Africa    inflation balanced (-… surpl…  1196   108   -0.0457 9.64e-1 1   e+0
##  3 Africa    inflation deficit (<-… surpl…   423   108   -1.64   1.00e-1 3.01e-1
##  4 Americas  inflation balanced (-… defic…   887   196   -3.18   1.46e-3 4.37e-3
##  5 Americas  inflation balanced (-… surpl…   887    21   -1.76   7.77e-2 2.33e-1
##  6 Americas  inflation deficit (<-… surpl…   196    21   -0.602  5.47e-1 1   e+0
##  7 Asia      inflation balanced (-… defic…   808   375   -1.30   1.93e-1 5.79e-1
##  8 Asia      inflation balanced (-… surpl…   808    99   -3.84   1.21e-4 3.63e-4
##  9 Asia      inflation deficit (<-… surpl…   375    99   -2.90   3.70e-3 1.11e-2
## 10 Europe    inflation balanced (-… defic…   302    81    1.41   1.57e-1 4.72e-1
## 11 Europe    inflation balanced (-… surpl…   302     5    0.922  3.56e-1 1   e+0
## 12 Europe    inflation deficit (<-… surpl…    81     5    0.519  6.04e-1 1   e+0
## 13 Oceania   inflation balanced (-… defic…   229    68   -1.34   1.80e-1 5.39e-1
## 14 Oceania   inflation balanced (-… surpl…   229    62   -1.93   5.39e-2 1.62e-1
## 15 Oceania   inflation deficit (<-… surpl…    68    62   -0.516  6.06e-1 1   e+0
## # ℹ 1 more variable: p.adj.signif <chr>
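The p.adj column above is three times the raw p-value (for example, 2.29e-3 becomes 6.86e-3 for Africa), so the Bonferroni correction was applied within each continent's three comparisons rather than across all 15. The sketch below takes the raw p-values copied from that output and compares the within-continent correction with a family-wide Bonferroni or Holm correction, using base R's `p.adjust()`.

```r
# Raw (unadjusted) p-values copied from the within-continent Dunn output,
# in row order: Africa (3), Americas (3), Asia (3), Europe (3), Oceania (3)
raw_p <- c(2.29e-3, 9.64e-1, 1.00e-1,
           1.46e-3, 7.77e-2, 5.47e-1,
           1.93e-1, 1.21e-4, 3.70e-3,
           1.57e-1, 3.56e-1, 6.04e-1,
           1.80e-1, 5.39e-2, 6.06e-1)
continent_id <- rep(1:5, each = 3)

# Bonferroni within each continent (3 comparisons each), as in the output
within_bonf <- ave(raw_p, continent_id,
                   FUN = function(p) p.adjust(p, method = "bonferroni"))

# Corrections across the whole family of 15 comparisons
family_bonf <- p.adjust(raw_p, method = "bonferroni")
family_holm <- p.adjust(raw_p, method = "holm")

print(signif(cbind(raw_p, within_bonf, family_bonf, family_holm), 3))
```

With these inputs, the within-continent correction keeps four comparisons significant at 0.05, a family-wide Bonferroni correction keeps three (the Asia deficit-vs-surplus comparison, raw p = 3.70e-3, rises to about 0.056), and Holm keeps all four.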
# Post-hoc for the interaction: continents compared within each fiscal group
fiscal_inflation %>%
  group_by(fiscal_group) %>%
  dunn_test(inflation ~ Continent, p.adjust.method = "bonferroni")
## # A tibble: 30 × 10
##    fiscal_group        .y.   group1 group2    n1    n2 statistic       p   p.adj
##  * <chr>               <chr> <chr>  <chr>  <int> <int>     <dbl>   <dbl>   <dbl>
##  1 balanced (-5% to 5… infl… Africa Ameri…  1196   887   -1.91   5.67e-2 5.67e-1
##  2 balanced (-5% to 5… infl… Africa Asia    1196   808   -0.301  7.63e-1 1   e+0
##  3 balanced (-5% to 5… infl… Africa Europe  1196   302   -1.33   1.83e-1 1   e+0
##  4 balanced (-5% to 5… infl… Africa Ocean…  1196   229   -5.03   5.01e-7 5.01e-6
##  5 balanced (-5% to 5… infl… Ameri… Asia     887   808    1.45   1.46e-1 1   e+0
##  6 balanced (-5% to 5… infl… Ameri… Europe   887   302   -0.0193 9.85e-1 1   e+0
##  7 balanced (-5% to 5… infl… Ameri… Ocean…   887   229   -3.75   1.76e-4 1.76e-3
##  8 balanced (-5% to 5… infl… Asia   Europe   808   302   -1.07   2.86e-1 1   e+0
##  9 balanced (-5% to 5… infl… Asia   Ocean…   808   229   -4.66   3.17e-6 3.17e-5
## 10 balanced (-5% to 5… infl… Europe Ocean…   302   229   -3.16   1.58e-3 1.58e-2
## # ℹ 20 more rows
## # ℹ 1 more variable: p.adj.signif <chr>

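The pairwise post-hoc analysis shows that:

Pooled across continents, surpluses are associated with significantly lower inflation than either deficits or balanced budgets, while deficits and balanced budgets do not differ significantly (p.adj = 1).

Within continents, Asia is the only region where surpluses differ significantly from the other fiscal positions, with lower inflation than both balanced budgets and deficits.

In Africa, deficits are associated with significantly higher inflation than balanced budgets. In the Americas the direction is reversed: deficits are associated with significantly lower inflation than balanced budgets.

In Europe and Oceania, no fiscal comparison is significant, although the surplus groups are small (5 and 62 observations).

Across continents, Oceania has the lowest inflation and Africa the highest, although Africa does not differ significantly from Europe.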
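To make the signs in these tables easier to read, here is a minimal base-R sketch of Dunn's z statistic with the usual correction for ties. It assumes, consistent with the Africa-vs-Oceania result and the summary statistics, that rstatix reports the mean rank of group2 minus that of group1, so a negative statistic means group2 has lower inflation. The function name `dunn_sketch` is hypothetical and the toy data are illustrative only.

```r
# Dunn's pairwise z statistics on ranks, with the usual tie correction.
# statistic = (mean rank of group2 - mean rank of group1) / standard error
dunn_sketch <- function(x, g, method = "bonferroni") {
  g <- factor(g)
  N <- length(x)
  r <- rank(x)                                   # average ranks for ties
  groups <- split(r, g)
  mean_r <- vapply(groups, mean, numeric(1))
  n <- lengths(groups)
  tie_sizes <- as.numeric(table(x))
  s2 <- N * (N + 1) / 12 - sum(tie_sizes^3 - tie_sizes) / (12 * (N - 1))
  pairs <- combn(levels(g), 2)
  z <- apply(pairs, 2, function(pr) {
    (mean_r[[pr[2]]] - mean_r[[pr[1]]]) /
      sqrt(s2 * (1 / n[[pr[1]]] + 1 / n[[pr[2]]]))
  })
  p <- 2 * pnorm(-abs(z))                        # two-sided p-values
  data.frame(group1 = pairs[1, ], group2 = pairs[2, ], statistic = z,
             p = p, p.adj = p.adjust(p, method = method))
}

# Toy data: values increase from group "a" to group "c"
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
g <- rep(c("a", "b", "c"), each = 4)
res <- dunn_sketch(x, g)
print(res)
```

All three statistics are positive here because each group2 has higher values than its group1. Read the same way, the negative statistics against Oceania in the continent table indicate lower inflation in Oceania.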

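Policy Implications: A one-size-fits-all fiscal policy approach appears inappropriate. Asian economies show the clearest association between fiscal surpluses and lower inflation; in African economies, deficit control looks most relevant for inflation; in the Americas, deficits are not associated with higher inflation; and in Europe and Oceania, inflation shows no detectable link to fiscal stance, which may leave more fiscal room, although the small surplus samples limit that conclusion.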

This analysis suggests that the fiscal-inflation relationship is not universal but varies by region, plausibly reflecting differences in economic structures, institutions, and development contexts that this test does not measure directly.

Research Question: Savings-Investment Relationship

Question: Do countries with different levels of current account balance show significant differences in national savings and investment rates?

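H0 (Null): The vector of mean responses (savings, investment) is the same across current account groups:

$H_0: \boldsymbol{\mu}_{\text{Deficit}} = \boldsymbol{\mu}_{\text{Balanced}} = \boldsymbol{\mu}_{\text{Surplus}}$, where $\boldsymbol{\mu}_g = (\mu_{\text{Savings},g}, \mu_{\text{Investment},g})^\top$

H1 (Alternative): At least one group has a different mean vector.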

FEATURE ENGINEERING

saving_investment <- emerging_economies %>%
  filter(Economic_Indicator %in% c("Current account balance",
                                   "Gross national savings", 
                                   "Total investment")) %>%
  select(Country, Continent, Economic_Indicator, Values, Years) %>%
  pivot_wider(names_from = Economic_Indicator, values_from = Values) %>%
  mutate(
    current_account_group = case_when(
      `Current account balance` > 3 ~ "Surplus (>3%)",
      `Current account balance` >= -3 & `Current account balance` <= 3 ~ "Balanced (-3% to 3%)",
      `Current account balance` < -3 ~ "Deficit (<-3%)",
      TRUE ~ NA_character_
    )
  ) %>%
  rename("Total_investment" = "Total investment",
         "Gross_national_savings" = "Gross national savings") %>%
  # Convert current_account_group to factor
  mutate(current_account_group = as.factor(current_account_group)) %>%
  drop_na()
#check sample size assumptions 

saving_investment %>% 
  group_by(current_account_group) %>%
  summarise(N = n())
## # A tibble: 3 × 2
##   current_account_group     N
##   <fct>                 <int>
## 1 Balanced (-3% to 3%)   1686
## 2 Deficit (<-3%)         2574
## 3 Surplus (>3%)           718
#summary Statistics

saving_investment %>%
  group_by(current_account_group, Continent) %>%
  get_summary_stats(`Total_investment`, `Gross_national_savings`, type = "mean_sd")
## # A tibble: 30 × 6
##    Continent current_account_group variable                   n  mean    sd
##    <chr>     <fct>                 <fct>                  <dbl> <dbl> <dbl>
##  1 Africa    Balanced (-3% to 3%)  Total_investment         591  19.9 10.2 
##  2 Africa    Balanced (-3% to 3%)  Gross_national_savings   591  16.7  9.87
##  3 Americas  Balanced (-3% to 3%)  Total_investment         442  19.8  5.46
##  4 Americas  Balanced (-3% to 3%)  Gross_national_savings   442  18.5  5.62
##  5 Asia      Balanced (-3% to 3%)  Total_investment         488  26.1  8.17
##  6 Asia      Balanced (-3% to 3%)  Gross_national_savings   488  24.8  8.41
##  7 Europe    Balanced (-3% to 3%)  Total_investment         151  25.0  9.45
##  8 Europe    Balanced (-3% to 3%)  Gross_national_savings   151  23.0  6.63
##  9 Oceania   Balanced (-3% to 3%)  Total_investment          14  11.0  5.22
## 10 Oceania   Balanced (-3% to 3%)  Gross_national_savings    14  16.1 10.0 
## # ℹ 20 more rows

Savings rise sharply with current account surpluses across all continents.

Investment patterns are more stable, which suggests that surpluses are associated mainly with higher savings rather than with lower investment.

Africa and Asia show the biggest difference in savings between deficit and surplus groups indicating stronger links between current account balance and saving behavior.

Oceania is a small sample but follows the same general pattern (higher savings → surplus).

#outliers assumption for savings
saving_outliers <- saving_investment %>%
  group_by(current_account_group) %>%
  identify_outliers(`Gross_national_savings`)

sum(saving_outliers$is.extreme == TRUE)
## [1] 21
table(saving_outliers$Continent, saving_outliers$is.extreme)
##           
##            FALSE TRUE
##   Africa      52    8
##   Americas    11    3
##   Asia        54    9
##   Europe       6    1
##   Oceania      1    0
# For savings, Asia (9) and Africa (8) have the most extreme outliers


#OUTLIERS ASSUMPTIONS FOR INVESTMENT

invest_outlier <- saving_investment%>%
  group_by(current_account_group) %>%
  identify_outliers(`Total_investment`)

sum(invest_outlier$is.extreme == TRUE)
## [1] 18
table(invest_outlier$Continent, invest_outlier$is.extreme)
##           
##            FALSE TRUE
##   Africa      70   17
##   Americas     7    0
##   Asia        39    1
##   Europe      14    0

Savings: Outliers mainly in Africa and Asia → high variability in saving behavior.

Investment: Outliers mainly in Africa → high variability in investment levels.

# Multivariate outliers using the Mahalanobis distance

saving_investment %>%
  group_by(current_account_group) %>%
  mahalanobis_distance() %>%
  filter(is.outlier == TRUE) 
## # A tibble: 129 × 6
##    Years Total_investment Gross_national_savings `Current account balance`
##    <dbl>            <dbl>                  <dbl>                     <dbl>
##  1  2004            35.4                    72.6                     37.2 
##  2  2005            37.0                    67.3                     30.2 
##  3  2007            55.9                   119.                      63.4 
##  4  2008            30.2                    64.0                     33.8 
##  5  2009            36.5                    78.1                     41.6 
##  6  1990             4.35                   48.6                     -1.42
##  7  1991             9.5                    52.6                     -3.62
##  8  1992            16.0                    56.2                     -5.95
##  9  1993            16.4                    58.5                     -7.46
## 10  1994            16.0                    60.3                     -5.20
## # ℹ 119 more rows
## # ℹ 2 more variables: mahal.dist <dbl>, is.outlier <lgl>
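In the output above, the columns next to mahal.dist (Years, Total_investment, Gross_national_savings and Current account balance) suggest that the distance was computed over every numeric column, including Years and the balance used to define the groups. A minimal base-R alternative restricts the distance to the two dependent variables and applies a chi-square cutoff. The 0.001 level is a common convention, and `flag_mahalanobis` is a hypothetical helper, not an rstatix function.

```r
# Flag multivariate outliers with the Mahalanobis distance on chosen columns,
# optionally within each level of a grouping column (hypothetical helper).
flag_mahalanobis <- function(data, vars, group = NULL, alpha = 0.001) {
  one_group <- function(d) {
    x <- as.matrix(d[, vars])
    d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))  # squared distance
    d$mahal.dist <- d2
    # Flag points beyond the chi-square (1 - alpha) quantile, df = no. of variables
    d$is.outlier <- d2 > qchisq(1 - alpha, df = length(vars))
    d
  }
  if (is.null(group)) return(one_group(data))
  pieces <- lapply(split(data, data[[group]]), one_group)
  do.call(rbind, unname(pieces))
}

# Simulated data: 100 correlated savings/investment pairs plus one point
# with very high savings but very low investment
set.seed(1)
savings <- rnorm(100, mean = 20, sd = 5)
investment <- savings + rnorm(100, sd = 3)
sim <- data.frame(savings = c(savings, 60), investment = c(investment, 5))

flagged <- flag_mahalanobis(sim, vars = c("savings", "investment"))
print(flagged[flagged$is.outlier, ])
```

For the savings data, the call would pass the two dependent variables as `vars` and `current_account_group` as `group`.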
#univariate normality assumption

#Shapiro test 

saving_investment %>%
  group_by(current_account_group) %>%
  shapiro_test(`Total_investment`, `Gross_national_savings`) %>%
  arrange(variable)
## # A tibble: 6 × 4
##   current_account_group variable               statistic        p
##   <fct>                 <chr>                      <dbl>    <dbl>
## 1 Balanced (-3% to 3%)  Gross_national_savings     0.985 3.54e-12
## 2 Deficit (<-3%)        Gross_national_savings     0.907 4.78e-37
## 3 Surplus (>3%)         Gross_national_savings     0.946 1.43e-15
## 4 Balanced (-3% to 3%)  Total_investment           0.966 2.26e-19
## 5 Deficit (<-3%)        Total_investment           0.917 2.13e-35
## 6 Surplus (>3%)         Total_investment           0.991 3.20e- 4
#QQPLOTS investment

ggqqplot(saving_investment, "Total_investment", facet.by = "current_account_group",
         ylab = "Total_investment", ggtheme = theme_bw())

#QQPLOTS for savings

ggqqplot(saving_investment, "Gross_national_savings", facet.by = "current_account_group",
         ylab = "Gross_national_savings", ggtheme = theme_bw())

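The Shapiro-Wilk p-values are all below 0.05, so the normality assumption is formally not met.

However, the Q-Q plots of investment and savings show most points close to the reference line, suggesting that the departure from normality is moderate. With groups this large (718 to 2,574 observations), the Shapiro-Wilk test flags even minor deviations, so the plots are the more informative check here.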

#Multivariate normality

saving_investment %>%
  select(Gross_national_savings, Total_investment) %>%
  mshapiro_test()
## # A tibble: 1 × 2
##   statistic  p.value
##       <dbl>    <dbl>
## 1     0.803 3.13e-61

The p-value of the multivariate Shapiro-Wilk test is below 0.05, so the multivariate normality assumption is not met.

#Identify multicollinearity

saving_investment %>%
  cor_test(Gross_national_savings, Total_investment)
## # A tibble: 1 × 8
##   var1                 var2    cor statistic         p conf.low conf.high method
##   <chr>                <chr> <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr> 
## 1 Gross_national_savi… Tota…  0.48      38.5 1.62e-284    0.458     0.501 Pears…

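The correlation coefficient (r = 0.48, p < 0.001) indicates a moderate positive relationship between gross national savings and total investment: as national savings increase, total investment tends to increase too. The correlation is well below the levels usually taken to signal multicollinearity (a common rule of thumb is r > 0.9), so both variables can enter the MANOVA together.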

#linearity assumption

saving_linearity <- saving_investment %>%
  select(Gross_national_savings, Total_investment, current_account_group) %>%
  group_by(current_account_group) %>%
  doo(~ggpairs(.) + theme_bw(), result = "plots")

saving_linearity$plots
[Figure: ggpairs scatter-plot matrices of Gross_national_savings and Total_investment, one per current account group]

The scatter plots show an approximately linear relationship between savings and investment within each current account group.

#Check the homogeneity of covariances assumption

box_m(saving_investment[, c("Gross_national_savings", "Total_investment")], 
      saving_investment$current_account_group)
## # A tibble: 1 × 4
##   statistic   p.value parameter method                                          
##       <dbl>     <dbl>     <dbl> <chr>                                           
## 1     1139. 7.31e-243         6 Box's M-test for Homogeneity of Covariance Matr…

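The test is statistically significant (p < 0.001), so the data violate the assumption of homogeneous variance-covariance matrices. Box's M is known to be highly sensitive in large samples, which is one reason to rely on a robust MANOVA test statistic.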
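Box's M compares the log-determinants of the group covariance matrices with that of the pooled covariance matrix. The sketch below implements the usual chi-square approximation in base R on the built-in iris data (not the IMF data). Its degrees of freedom, p(p + 1)(k - 1)/2, give 6 for two variables and three groups, matching the parameter column above. The function name is hypothetical.

```r
# Box's M test for equality of covariance matrices (chi-square approximation)
box_m_sketch <- function(X, group) {
  X <- as.matrix(X)
  group <- factor(group)
  p <- ncol(X); k <- nlevels(group); N <- nrow(X)
  n_i <- as.vector(table(group))
  S_i <- lapply(split.data.frame(X, group), cov)       # group covariance matrices
  S_pooled <- Reduce(`+`, Map(function(S, n) (n - 1) * S, S_i, n_i)) / (N - k)
  M <- (N - k) * log(det(S_pooled)) -
    sum(mapply(function(S, n) (n - 1) * log(det(S)), S_i, n_i))
  # Scaling factor for the chi-square approximation
  c1 <- (2 * p^2 + 3 * p - 1) / (6 * (p + 1) * (k - 1)) *
    (sum(1 / (n_i - 1)) - 1 / (N - k))
  stat <- M * (1 - c1)
  dof <- p * (p + 1) * (k - 1) / 2
  c(statistic = stat, parameter = dof,
    p.value = pchisq(stat, dof, lower.tail = FALSE))
}

res_box <- box_m_sketch(iris[, c("Sepal.Length", "Petal.Length")], iris$Species)
print(res_box)
```

Because M scales with N - k, even small differences between covariance matrices produce large statistics in samples of several thousand observations.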

# Check the homogeneity of variance assumption

saving_investment %>%
  gather(key = "variable", value = "value", Gross_national_savings, Total_investment ) %>%
  group_by(variable) %>%
  levene_test(value ~ current_account_group)
## # A tibble: 2 × 5
##   variable                 df1   df2 statistic        p
##   <chr>                  <int> <int>     <dbl>    <dbl>
## 1 Gross_national_savings     2  4975      34.9 8.95e-16
## 2 Total_investment           2  4975      26.7 2.97e-12

Both p-values are far below 0.05 (p < 0.001): the spread (variability) of gross national savings and total investment differs significantly between the current account groups, so the homogeneity of variance assumption is violated.

MANOVA Model for Savings and Investment

Because most of the assumptions are violated, we use Pillai's trace, the MANOVA test statistic generally considered most robust to violations of normality and of homogeneity of covariance matrices.

Manova_savings <- lm(cbind(Gross_national_savings, Total_investment) ~ 
                       current_account_group, saving_investment)


# car::Manova(); the argument is test.statistic (Pillai is also the default)
Manova(Manova_savings, test.statistic = 'Pillai')
## 
## Type II MANOVA Tests: Pillai test statistic
##                       Df test stat approx F num Df den Df    Pr(>F)    
## current_account_group  2   0.34961   526.94      4   9950 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

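The MANOVA using Pillai's trace shows a highly significant effect of current account group on the combination of savings and investment (Pillai's trace = 0.350, approx. F(4, 9950) = 526.94, p < 2.2e-16), so we reject the null hypothesis. The model contains a single factor, so there are no other main effects or interactions to test.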
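The standard approximation converts Pillai's trace V to an F statistic using s = min(p, k - 1), m = (|p - (k - 1)| - 1)/2 and n = (N - k - p - 1)/2, where p is the number of response variables, k the number of groups and N the number of observations. The sketch below plugs in the values for this model (V = 0.34961, p = 2, k = 3, N = 4978) and recovers the printed approx F and degrees of freedom. `pillai_F` is a hypothetical helper name.

```r
# Approximate F test for Pillai's trace V
# p = number of response variables, k = number of groups, N = observations
pillai_F <- function(V, p, k, N) {
  s <- min(p, k - 1)
  m <- (abs(p - (k - 1)) - 1) / 2
  n <- (N - k - p - 1) / 2
  df1 <- s * (2 * m + s + 1)
  df2 <- s * (2 * n + s + 1)
  F_stat <- (df2 / df1) * V / (s - V)
  c(approx_F = F_stat, num_df = df1, den_df = df2,
    p_value = pf(F_stat, df1, df2, lower.tail = FALSE))
}

# Savings/investment MANOVA: V = 0.34961, 2 responses, 3 groups, N = 4978
print(pillai_F(V = 0.34961, p = 2, k = 3, N = 4978))
```

Applied to the debt MANOVA (V = 0.041293, k = 4, N = 4309), the same function gives F(6, 8610) of about 30.25.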

Countries with different current account positions (surplus, deficit, or balanced) tend to have distinct combinations of savings and investment levels

Surplus countries save the most, deficit countries the least

Post-Hoc Welch Test

#Post-hoc test

post_saving <- saving_investment %>%
  gather(key = "variable", value = "value", Gross_national_savings, Total_investment) %>%
  group_by(variable) 

# Univariate follow-up: Welch ANOVA for each variable
post_saving %>% welch_anova_test(value ~ current_account_group)
## # A tibble: 2 × 8
##   variable               .y.       n statistic   DFn   DFd         p method     
## * <chr>                  <chr> <int>     <dbl> <dbl> <dbl>     <dbl> <chr>      
## 1 Gross_national_savings value  4978     564.      2 1862. 2.86e-192 Welch ANOVA
## 2 Total_investment       value  4978      45.1     2 2061. 6.93e- 20 Welch ANOVA

Welch ANOVAs revealed that both gross national savings and total investment significantly differed across current account groups, but the effect was substantially stronger for savings (F(2, 1862) = 564.0, p < .001) than for investment (F(2, 2061) = 45.1, p < .001)
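Welch's ANOVA weights each group mean by n_j / s_j^2, so groups with noisier data count less, and it uses an approximate denominator degrees of freedom, which is why the DFd values above (1862 and 2061) are not whole numbers. The sketch below computes the statistic by hand on the built-in iris data and checks it against base R's `oneway.test(var.equal = FALSE)`. It is not necessarily how rstatix computes it internally, and `welch_anova_sketch` is a hypothetical name.

```r
# Welch's one-way ANOVA by hand (precision-weighted group means)
welch_anova_sketch <- function(y, g) {
  g <- factor(g)
  k <- nlevels(g)
  n_i <- tapply(y, g, length)
  m_i <- tapply(y, g, mean)
  w_i <- n_i / tapply(y, g, var)                 # weights n_j / s_j^2
  W <- sum(w_i)
  m_w <- sum(w_i * m_i) / W                      # weighted grand mean
  lambda <- sum((1 - w_i / W)^2 / (n_i - 1))
  F_stat <- (sum(w_i * (m_i - m_w)^2) / (k - 1)) /
    (1 + 2 * (k - 2) * lambda / (k^2 - 1))
  df1 <- k - 1
  df2 <- (k^2 - 1) / (3 * lambda)                # non-integer denominator df
  c(statistic = F_stat, DFn = df1, DFd = df2,
    p = pf(F_stat, df1, df2, lower.tail = FALSE))
}

manual <- welch_anova_sketch(iris$Sepal.Length, iris$Species)
builtin <- oneway.test(Sepal.Length ~ Species, data = iris, var.equal = FALSE)
print(manual)
print(builtin)
```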

Gross National Savings:

There are very clear differences in savings between Deficit, Balanced, and Surplus countries.

Surplus countries save the most, deficit countries save the least, and balanced countries fall in between.

Total Investment:

Investment also differs significantly between the three current account groups. The differences are smaller than for savings but are still statistically strong.

# Pairwise comparison analysis (Games-Howell)


post_saving %>%
  games_howell_test(value ~ current_account_group)
## # A tibble: 6 × 9
##   variable .y.   group1 group2 estimate conf.low conf.high    p.adj p.adj.signif
## * <chr>    <chr> <chr>  <chr>     <dbl>    <dbl>     <dbl>    <dbl> <chr>       
## 1 Gross_n… value Balan… Defic…    -5.07   -5.80     -4.34  3.22e- 8 ****        
## 2 Gross_n… value Balan… Surpl…    12.4    11.1      13.6   3.82e-13 ****        
## 3 Gross_n… value Defic… Surpl…    17.4    16.2      18.7   2.81e-13 ****        
## 4 Total_i… value Balan… Defic…     3.02    2.27      3.76  3.35e- 8 ****        
## 5 Total_i… value Balan… Surpl…     1.66    0.682     2.63  2.07e- 4 ***         
## 6 Total_i… value Defic… Surpl…    -1.36   -2.35     -0.373 4   e- 3 **

Gross national savings

All three comparisons are highly significant (****), meaning savings levels differ sharply between every pair of groups.

Balanced vs Deficit:

Estimate = -5.07: countries with current account deficits have, on average, about 5 percentage points lower savings than those with balanced accounts.

Balanced vs Surplus:

Estimate = 12.4: surplus countries have about 12.4 percentage points higher savings than balanced ones.

Deficit vs Surplus:

Estimate = 17.4: surplus countries save about 17.4 percentage points more than deficit countries, a very large difference.

Gross national savings increase strongly as we move from deficit → balanced → surplus groups. In other words, countries running current account surpluses save far more than deficit countries, consistent with macroeconomic theory

Total investment

All comparisons are also significant, though the differences are much smaller than for savings. As for savings, each estimate is the mean of the second group minus the mean of the first.

Balanced vs Deficit:

Estimate = 3.02: deficit countries invest about 3 percentage points more than balanced ones.

Balanced vs Surplus:

Estimate = 1.66: surplus countries invest about 1.7 points more than balanced ones.

Deficit vs Surplus:

Estimate = -1.36: surplus countries invest about 1.4 points less than deficit countries.

Unlike savings, investment does not rise steadily from deficit to surplus: it is highest in the deficit group, lowest in the balanced group, and intermediate in the surplus group, with all gaps between 1.4 and 3 points. Savings therefore vary far more across current account groups than investment does.
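The direction of each Games-Howell estimate is easy to misread. Below is a minimal base-R sketch of one comparison in the standard formulation (Welch standard error, Welch-Satterthwaite degrees of freedom, studentized range distribution for k groups). It assumes, consistent with the savings results, that rstatix reports the estimate as mean(group2) - mean(group1), so a positive Balanced-vs-Deficit estimate for investment means deficit countries invest more. The example uses iris, the helper name is hypothetical, and rstatix's internals may differ in detail.

```r
# One Games-Howell comparison between two groups out of k (base R only)
games_howell_pair <- function(y1, y2, k, conf.level = 0.95) {
  n1 <- length(y1); n2 <- length(y2)
  v1 <- var(y1) / n1; v2 <- var(y2) / n2
  estimate <- mean(y2) - mean(y1)                  # group2 minus group1
  se <- sqrt(v1 + v2)                              # Welch standard error
  nu <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Welch-Satterthwaite df
  q <- abs(estimate) / se * sqrt(2)                # studentized range scale
  half_width <- qtukey(conf.level, nmeans = k, df = nu) / sqrt(2) * se
  c(estimate = estimate,
    conf.low = estimate - half_width,
    conf.high = estimate + half_width,
    p.adj = ptukey(q, nmeans = k, df = nu, lower.tail = FALSE))
}

setosa <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]
print(games_howell_pair(setosa, versicolor, k = 3))
```

The estimate is positive here because versicolor (group2) has longer sepals on average than setosa (group1).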

Economically, this means that countries with current account surpluses tend to have much higher savings, but slightly lower investment, than countries with deficits. This pattern reflects a fundamental macroeconomic relationship: since the current account roughly equals national savings minus investment, high-saving countries tend to lend abroad (surpluses), while low-saving countries with relatively high investment borrow from abroad (deficits).

Research Question:

Do countries with different government debt levels show significant differences in GDP growth and inflation?

H0(Null):

There are no significant multivariate differences in GDP growth and inflation among countries with different government debt levels.

H1(Alternative):

There are significant multivariate differences in GDP growth and/or inflation among countries with different government debt levels.

FEATURE ENGINEERING

debt_analysis <- emerging_economies %>%
  filter(Economic_Indicator %in% c("Inflation, average consumer prices",
                                   "Gross domestic product, constant prices",
                                   "General government gross debt")) %>%
  
  select(Country, Years, Continent, Economic_Indicator, Values)%>%
  pivot_wider(names_from = Economic_Indicator, values_from = Values) %>%
  mutate(debt_group = case_when(
    `General government gross debt` < 30 ~ "Low Debt (<30%)",
    `General government gross debt` >= 30 & `General government gross debt` < 60 ~ "Medium Debt (30-60%)",
    `General government gross debt` >= 60 & `General government gross debt` < 90 ~ "High Debt (60-90%)",
    `General government gross debt` >= 90 ~ "Very High Debt (>90%)"
  ))  %>%
  rename("Inflation_average_consumer_prices" = "Inflation, average consumer prices",
         "General_government_gross_debt" = "General government gross debt",
         "GDP_rate" = "Gross domestic product, constant prices") %>%
  mutate(debt_group = as.factor(debt_group)) %>% # convert to a factor (levels sort alphabetically)
  drop_na()
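Because `as.factor()` sorts levels alphabetically, the summary tables list the debt groups as High, Low, Medium and Very High. One alternative, sketched below in base R, builds the bands with `cut()` so that the levels follow the debt ordering. `right = FALSE` makes each band closed on the left, reproducing the case_when() rules (<30, 30 to <60, 60 to <90, 90 and above). The helper name `debt_band` is hypothetical.

```r
# Ordered debt bands; right = FALSE gives [a, b) intervals
debt_band <- function(debt) {
  cut(debt,
      breaks = c(-Inf, 30, 60, 90, Inf),
      right  = FALSE,
      labels = c("Low Debt (<30%)", "Medium Debt (30-60%)",
                 "High Debt (60-90%)", "Very High Debt (>90%)"))
}

example_debt <- c(12.5, 29.9, 30, 59.99, 60, 89.9, 90, 271)
bands <- debt_band(example_debt)
print(data.frame(debt = example_debt, band = bands))
print(levels(bands))   # levels follow the debt ordering, not the alphabet
```

As in the original labels, exactly 90% falls in the "Very High Debt (>90%)" band.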
#summary statistics 

debt_analysis %>%
  group_by(debt_group) %>%
  get_summary_stats(Inflation_average_consumer_prices, GDP_rate, type = "mean_sd")
## # A tibble: 8 × 5
##   debt_group            variable                              n  mean    sd
##   <fct>                 <fct>                             <dbl> <dbl> <dbl>
## 1 High Debt (60-90%)    Inflation_average_consumer_prices   861  8.64 24.5 
## 2 High Debt (60-90%)    GDP_rate                            861  3.20  4.93
## 3 Low Debt (<30%)       Inflation_average_consumer_prices  1179  6.38 13.5 
## 4 Low Debt (<30%)       GDP_rate                           1179  4.47  5.63
## 5 Medium Debt (30-60%)  Inflation_average_consumer_prices  1752  6.5   9.51
## 6 Medium Debt (30-60%)  GDP_rate                           1752  3.98  6.14
## 7 Very High Debt (>90%) Inflation_average_consumer_prices   517 23.0  68.6 
## 8 Very High Debt (>90%) GDP_rate                            517  2.87  7.88

Inflation

Low and Medium debt countries have the lowest average inflation (about 6.4% to 6.5%), and the Medium group is also the least dispersed (sd = 9.51).

High debt countries see slightly higher inflation (~8.6%) and much more variability (sd = 24.5).

Very high debt countries have extremely high and unstable inflation (mean = 23%, sd = 68.6).

Inflation rises only modestly up to the High debt band but jumps sharply, and becomes far more volatile, above 90% of GDP. This could reflect weak monetary control, currency depreciation, or loss of investor confidence in highly indebted economies.

GDP growth

Low debt countries grow the fastest (4.47% average).

Growth gradually declines as debt increases:

Medium debt → 3.98%

High debt → 3.20%

Very high debt → 2.87%

Higher debt levels are associated with slower economic growth. This aligns with economic arguments that excessive debt can crowd out private investment, raise interest costs, and reduce fiscal flexibility.

The >90% debt-to-GDP group stands out, with much higher inflation and the lowest average growth. Because the debt bands were fixed in advance, this is consistent with, but does not by itself establish, 90% as a critical threshold for macroeconomic stability.

#SIZE ASSUMPTION 
debt_analysis %>%
  group_by(debt_group) %>%
  summarise(N= n())
## # A tibble: 4 × 2
##   debt_group                N
##   <fct>                 <int>
## 1 High Debt (60-90%)      861
## 2 Low Debt (<30%)        1179
## 3 Medium Debt (30-60%)   1752
## 4 Very High Debt (>90%)   517
#univariate outliers

#inflation outliers 

inflation_outliers <- debt_analysis %>%
  group_by(debt_group) %>%
  identify_outliers(Inflation_average_consumer_prices)

sum(inflation_outliers$is.extreme == TRUE)
## [1] 150
table(inflation_outliers$Continent, inflation_outliers$is.extreme)
##           
##            FALSE TRUE
##   Africa      66   68
##   Americas    34   34
##   Asia        43   34
##   Europe      16   14
##   Oceania      1    0
# Extreme inflation outliers appear in every continent except Oceania, most of all in Africa


#gdp outliers

GDP_rate_outliers <- debt_analysis %>%
  group_by(debt_group) %>%
  identify_outliers(GDP_rate)

sum(GDP_rate_outliers$is.extreme == TRUE)
## [1] 80
table(GDP_rate_outliers$Continent, GDP_rate_outliers$is.extreme)
##           
##            FALSE TRUE
##   Africa      55   31
##   Americas    42   19
##   Asia        55   23
##   Europe      19    2
##   Oceania     22    5
# Africa has the most extreme GDP-growth outliers (31); Europe has the fewest (2)



# Detect multivariate outliers using the Mahalanobis distance

debt_analysis %>%
  group_by(debt_group) %>%
  mahalanobis_distance()
## # A tibble: 4,309 × 6
##    Years GDP_rate Inflation_average_consumer…¹ General_government_g…² mahal.dist
##    <dbl>    <dbl>                        <dbl>                  <dbl>      <dbl>
##  1  2003    8.69                         35.7                  271.       25.1  
##  2  2004    0.671                        16.4                  245.       18.6  
##  3  2005   11.8                          10.6                  206.       14.5  
##  4  2006    5.36                          6.78                  23.0       0.755
##  5  2007   13.3                           8.68                  20.1       2.98 
##  6  2008    3.86                         26.4                   19.1       1.32 
##  7  2009   20.6                          -6.81                  16.2       8.09 
##  8  2010    8.44                          2.18                   7.70      1.55 
##  9  2011    6.48                         11.8                    7.50      1.35 
## 10  2012   14.0                           6.44                   6.76      3.82 
## # ℹ 4,299 more rows
## # ℹ abbreviated names: ¹​Inflation_average_consumer_prices,
## #   ²​General_government_gross_debt
## # ℹ 1 more variable: is.outlier <lgl>
# univariate normality assumption

#shapiro test

debt_analysis %>%
  group_by(debt_group)%>%
  shapiro_test(Inflation_average_consumer_prices, GDP_rate) %>%
  arrange(variable )
## # A tibble: 8 × 4
##   debt_group            variable                          statistic        p
##   <fct>                 <chr>                                 <dbl>    <dbl>
## 1 High Debt (60-90%)    GDP_rate                              0.922 1.22e-20
## 2 Low Debt (<30%)       GDP_rate                              0.778 3.44e-37
## 3 Medium Debt (30-60%)  GDP_rate                              0.560 9.93e-55
## 4 Very High Debt (>90%) GDP_rate                              0.871 2.87e-20
## 5 High Debt (60-90%)    Inflation_average_consumer_prices     0.236 2.66e-50
## 6 Low Debt (<30%)       Inflation_average_consumer_prices     0.265 1.08e-55
## 7 Medium Debt (30-60%)  Inflation_average_consumer_prices     0.567 1.87e-54
## 8 Very High Debt (>90%) Inflation_average_consumer_prices     0.335 1.66e-39
#QQPLOTS

#QQPLOT FOR INFLATION

ggqqplot(debt_analysis, "Inflation_average_consumer_prices", facet.by = "debt_group",
         ylab = "Inflation_average_consumer_prices", ggtheme = theme_bw())

#QQPLOT FOR GDP

ggqqplot(debt_analysis, "GDP_rate", facet.by = "debt_group",
         ylab = "GDP_rate", ggtheme = theme_bw())

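The Q-Q plots for inflation and GDP growth should be read together with the Shapiro-Wilk results above: W statistics between 0.24 and 0.57 for inflation, and between 0.56 and 0.92 for GDP growth, indicate strong departures from normality, consistent with the many extreme outliers identified earlier. The normality assumption is therefore not met, even where the central points follow the reference line.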

#Multivariate normality

debt_analysis %>%
  select(Inflation_average_consumer_prices, GDP_rate) %>%
  mshapiro_test()
## # A tibble: 1 × 2
##   statistic  p.value
##       <dbl>    <dbl>
## 1     0.228 5.17e-86

The p-value is below 0.05, so the multivariate normality assumption is not met.

#Identify multicollinearity

debt_analysis %>% 
  cor_test(Inflation_average_consumer_prices, GDP_rate)
## # A tibble: 1 × 8
##   var1                 var2     cor statistic        p conf.low conf.high method
##   <chr>                <chr>  <dbl>     <dbl>    <dbl>    <dbl>     <dbl> <chr> 
## 1 Inflation_average_c… GDP_… -0.099     -6.56 6.01e-11   -0.129   -0.0698 Pears…

The correlation between inflation and GDP growth is r = -0.099 (p < 0.001): a very weak negative relationship, meaning countries with slightly higher inflation tend to have marginally lower growth, but the effect is very small. A correlation this weak raises no multicollinearity concern.

# Check linearity assumption

debt_linear <- debt_analysis %>%
  select(Inflation_average_consumer_prices,GDP_rate, debt_group) %>%
  group_by(debt_group) %>%
  doo( ~ggpairs(.) + theme_bw(), result = "plots")

debt_linear$plots
[Figure: ggpairs scatter-plot matrices of inflation and GDP growth, one per debt group]

#the homogeneity of covariances assumption
box_m(debt_analysis[, c("Inflation_average_consumer_prices","GDP_rate")],
      debt_analysis$debt_group)
## # A tibble: 1 × 4
##   statistic p.value parameter method                                            
##       <dbl>   <dbl>     <dbl> <chr>                                             
## 1     4893.       0         9 Box's M-test for Homogeneity of Covariance Matric…

The p-value is reported as 0 (far below 0.001), so the assumption of homogeneous covariance matrices is violated.

#Check the homogeneity of variance assumption

debt_analysis %>%
  gather(key = "variable", value = "value", Inflation_average_consumer_prices, GDP_rate) %>%
  group_by(variable) %>%
  levene_test(value ~ debt_group)
## # A tibble: 2 × 5
##   variable                            df1   df2 statistic        p
##   <chr>                             <int> <int>     <dbl>    <dbl>
## 1 GDP_rate                              3  4305      18.9 3.69e-12
## 2 Inflation_average_consumer_prices     3  4305      57.1 3.64e-36

Levene's test is significant for both variables (p < 0.001), so the assumption of homogeneity of variances is violated.

Manova Model

manova_debt <- lm(cbind(Inflation_average_consumer_prices, GDP_rate) ~ debt_group, debt_analysis)

Manova(manova_debt, test.statistic = "Pillai")
## 
## Type II MANOVA Tests: Pillai test statistic
##            Df test stat approx F num Df den Df    Pr(>F)    
## debt_group  3  0.041293   30.253      6   8610 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The MANOVA rejects the null hypothesis (Pillai's trace = 0.041, approx. F(6, 8610) = 30.25, p < 0.001): there are statistically significant multivariate differences in inflation and GDP growth between the four debt groups.

Countries with different debt levels (low, medium, high, very high) show different combinations of inflation and GDP growth. Because the groups are observational, this indicates an association between debt and macroeconomic outcomes rather than proof that debt drives them, and the small Pillai's trace points to a modest effect.
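Statistical significance here partly reflects the large sample. One common way to gauge magnitude is the multivariate partial eta squared for Pillai's trace, V / s with s = min(p, k - 1). The sketch below applies it to the two printed Pillai values: the debt grouping accounts for roughly 2% of the multivariate variation, compared with about 17.5% for the current account grouping, so the debt effect, while significant, is small. `pillai_eta2` is a hypothetical helper name.

```r
# Multivariate partial eta squared from Pillai's trace: V / s,
# where s = min(number of responses, number of groups - 1)
pillai_eta2 <- function(V, p, k) V / min(p, k - 1)

eta2_debt    <- pillai_eta2(V = 0.041293, p = 2, k = 4)  # debt MANOVA
eta2_savings <- pillai_eta2(V = 0.34961,  p = 2, k = 3)  # savings MANOVA
print(round(c(debt = eta2_debt, savings = eta2_savings), 3))
```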

Post-Hoc Analysis

# Post-hoc analysis

debt_post <- debt_analysis %>%
  gather(key = "variable", value = "value", Inflation_average_consumer_prices, GDP_rate) %>%
  group_by(variable)

# Univariate follow-up: Welch ANOVA for each variable
debt_post %>%
  welch_anova_test(value ~ debt_group)
## # A tibble: 2 × 8
##   variable                      .y.       n statistic   DFn   DFd       p method
## * <chr>                         <chr> <int>     <dbl> <dbl> <dbl>   <dbl> <chr> 
## 1 GDP_rate                      value  4309      12.7     3 1692. 3.24e-8 Welch…
## 2 Inflation_average_consumer_p… value  4309      11.9     3 1424. 1.01e-7 Welch…

After the significant MANOVA, the Welch ANOVAs confirm that GDP growth and inflation each differ significantly across debt groups (both p < 0.001).

GDP growth declines as debt increases, but the differences are moderate. Inflation rises with debt and becomes more volatile in high-debt countries.

These results support the view that government debt is associated with measurable differences in both growth and price stability.

As the descriptive statistics showed:

Very High Debt (>90%) countries had very high inflation (≈23%) and low GDP growth (≈2.9%).

Low Debt (<30%) countries had lower inflation (~6%) and higher GDP growth (~4.5%).

# Games-Howell pairwise comparisons
debt_post %>%
  games_howell_test(value ~ debt_group)
## # A tibble: 12 × 9
##    variable .y.   group1 group2 estimate conf.low conf.high   p.adj p.adj.signif
##  * <chr>    <chr> <chr>  <chr>     <dbl>    <dbl>     <dbl>   <dbl> <chr>       
##  1 GDP_rate value High … Low D…    1.27     0.668    1.88   4.06e-7 ****        
##  2 GDP_rate value High … Mediu…    0.785    0.211    1.36   2   e-3 **          
##  3 GDP_rate value High … Very …   -0.330   -1.32     0.662  8.27e-1 ns          
##  4 GDP_rate value Low D… Mediu…   -0.487   -1.05     0.0784 1.2 e-1 ns          
##  5 GDP_rate value Low D… Very …   -1.60    -2.59    -0.615  1.92e-4 ***         
##  6 GDP_rate value Mediu… Very …   -1.12    -2.08    -0.146  1.7 e-2 *           
##  7 Inflati… value High … Low D…   -2.27    -4.64     0.108  6.8 e-2 ns          
##  8 Inflati… value High … Mediu…   -2.14    -4.37     0.0824 6.4 e-2 ns          
##  9 Inflati… value High … Very …   14.3      6.26    22.4    3.43e-5 ****        
## 10 Inflati… value Low D… Mediu…    0.122   -1.05     1.29   9.93e-1 ns          
## 11 Inflati… value Low D… Very …   16.6      8.75    24.4    4.58e-7 ****        
## 12 Inflati… value Mediu… Very …   16.5      8.67    24.3    4.87e-7 ****

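GDP growth: Growth does not differ significantly between the low (<30%) and medium (30–60%) debt groups, but it is significantly lower in both the high (60–90%) and very high (>90%) groups. Low-debt countries grow about 1.3 percentage points faster than high-debt countries and about 1.6 points faster than very-high-debt countries, while the high and very high groups do not differ significantly from each other.

Inflation: Very high debt countries experience much higher inflation (about 14 to 17 percentage points) than all other groups, while low, medium, and high debt countries have statistically similar inflation rates.

Between the medium (30–60%) and high (60–90%) debt groups, inflation does not differ significantly, although growth is about 0.8 percentage points lower in the high-debt group.

Moderate debt appears compatible with economic activity, but the costs build up as debt rises: growth is already significantly slower above 60% of GDP, and once debt exceeds 90% of GDP inflation also becomes markedly higher. This pattern is consistent with empirical work (e.g., Reinhart & Rogoff) linking high public debt to weaker growth and macroeconomic instability.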
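The Games-Howell procedure behind `games_howell_test()` combines Welch's unequal-variance standard error with the studentized range distribution. The minimal sketch below shows the calculation for one pair; `games_howell_pair` is a hypothetical helper and the data are simulated, not the IMF series. As the outputs in this section imply, the estimate is the mean of group2 minus the mean of group1, so a negative value means the second group is lower.

```r
# Hypothetical helper: Games-Howell comparison for one pair of groups.
# x1, x2: observations in group1 and group2; k: total number of groups compared.
games_howell_pair <- function(x1, x2, k, conf.level = 0.95) {
  n1 <- length(x1)
  n2 <- length(x2)
  v1 <- var(x1) / n1                      # squared standard error, group1
  v2 <- var(x2) / n2                      # squared standard error, group2
  estimate <- mean(x2) - mean(x1)         # group2 minus group1
  se <- sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Welch-Satterthwaite df
  t_stat <- abs(estimate) / se
  p_adj <- ptukey(t_stat * sqrt(2), nmeans = k, df = df, lower.tail = FALSE)
  margin <- qtukey(conf.level, nmeans = k, df = df) / sqrt(2) * se
  c(estimate = estimate, conf.low = estimate - margin,
    conf.high = estimate + margin, p.adj = p_adj)
}

# Simulated (hypothetical) growth rates loosely echoing the low and
# very-high debt group means reported above
set.seed(1)
low <- rnorm(200, mean = 4.5, sd = 3)
very_high <- rnorm(200, mean = 2.9, sd = 5)
res <- games_howell_pair(low, very_high, k = 4)
round(res, 4)
```

Passing `k = 4` makes the adjustment account for all four debt groups, not only the two being compared.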

Research Question: Population Size and Economic Structure

Question: Do countries with different population sizes show significant differences in their economic structure (investment, savings, trade)?

FEATURE ENGINEERING

population_economic <- emerging_economies %>%
  filter(Economic_Indicator %in% c("Gross national savings",
                                   "Total investment",
                                   "Volume of imports of goods and services",
                                   "Population")) %>%
  select(Years, Continent, Country, Economic_Indicator, Values) %>%
  pivot_wider(names_from  = Economic_Indicator,
              values_from = Values) %>%
  # create population groups from the Population column (millions of people);
  # strict upper bounds put each boundary value (10, 50, 100) in exactly one group
  mutate(
    Population_Group = case_when(
      Population < 10                     ~ "Small (<10M)",
      Population >= 10 & Population < 50  ~ "Medium (10–50M)",
      Population >= 50 & Population < 100 ~ "Large (50–100M)",
      Population >= 100                   ~ "Very large (>=100M)")) %>%
  # rename columns to syntactic names
  rename(
    Gross_national_savings = `Gross national savings`,
    Total_investment = `Total investment`,
    Volume_of_imports = `Volume of imports of goods and services`
  ) %>%
  # convert Population_Group to a factor for grouping and modelling
  mutate(Population_Group = as.factor(Population_Group)) %>%
  drop_na()
# Summary statistics
population_economic %>%
  group_by(Population_Group) %>%
  get_summary_stats(Gross_national_savings, Total_investment, Volume_of_imports,
                    type = "mean_sd")
## # A tibble: 12 × 5
##    Population_Group    variable                   n  mean    sd
##    <fct>               <fct>                  <dbl> <dbl> <dbl>
##  1 Large (50–100M)     Gross_national_savings   362 23.1   9.52
##  2 Large (50–100M)     Total_investment         362 25.0   8.79
##  3 Large (50–100M)     Volume_of_imports        362  7.97 19.0 
##  4 Medium (10–50M)     Gross_national_savings  1581 19.7  11.1 
##  5 Medium (10–50M)     Total_investment        1581 22.9   8.07
##  6 Medium (10–50M)     Volume_of_imports       1581  5.84 16.1 
##  7 Small (<10M)        Gross_national_savings  2422 17.6  13.3 
##  8 Small (<10M)        Total_investment        2422 24.3  12.3 
##  9 Small (<10M)        Volume_of_imports       2422  5.19 18.4 
## 10 Very large (>=100M) Gross_national_savings   291 21.7   6.51
## 11 Very large (>=100M) Total_investment         291 22.4   6.91
## 12 Very large (>=100M) Volume_of_imports        291  5.40 13.4
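Base R's `cut()` offers an alternative way to build the population groups. With `right = FALSE` the intervals are closed on the left and open on the right, so the boundary values 10, 50, and 100 million each land in exactly one group, and the factor levels come out in size order rather than alphabetically. The helper name `population_group` and the example populations are hypothetical.

```r
# Hypothetical helper: population-size groups with intervals [lower, upper)
population_group <- function(pop_millions) {
  cut(pop_millions,
      breaks = c(-Inf, 10, 50, 100, Inf),
      labels = c("Small (<10M)", "Medium (10–50M)",
                 "Large (50–100M)", "Very large (>=100M)"),
      right = FALSE)
}

# Illustrative populations in millions, including the exact boundary values
groups <- population_group(c(3.2, 10, 49.9, 50, 100, 1400))
as.character(groups)
levels(groups)   # ordered from smallest to largest
```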

Large (50M–100M population)

Large-population countries have the highest average investment (25.0% of GDP) and savings (23.1% of GDP), indicating strong domestic capital accumulation. They also show the fastest average import growth (7.97; the IMF indicator measures the annual percentage change in import volume), but the large SD (19.0) means import behaviour varies widely across these countries.

Medium (10M–50M population)

Medium-sized countries have moderate investment (22.9%) and savings (19.7%). Their average import volume growth (5.84) is somewhat lower than that of large countries. The SDs remain high, so economic behaviour differs widely across medium-sized economies.

Small (<10M population)

Smaller countries show relatively high investment (24.3%), surprisingly close to large economies, but the lowest savings (17.6%), which suggests they may rely more on foreign capital or borrowing to finance investment. Their high SDs again point to economic heterogeneity (some are very open or rich, others small and closed).

Very Large (>=100M population)

Very large economies have moderate import growth (5.40), slightly lower investment (22.4%), and fairly strong savings (21.7%). Their lower SDs indicate more stability and less variation across countries, consistent with large economies that have more diversified sectors.

# Univariate outliers by population group
# Gross national savings outliers

gross_outliers <- population_economic %>%
  group_by(Population_Group) %>%
  identify_outliers(Gross_national_savings)

sum(gross_outliers$is.extreme == TRUE)
## [1] 22
table(gross_outliers$Continent, gross_outliers$is.extreme)
##           
##            FALSE TRUE
##   Africa      71    8
##   Americas    17    1
##   Asia        37   13
##   Europe       1    0
##   Oceania      1    0
# Asia has the most extreme outliers (13), followed by Africa (8)


#investment outliers

investment_outliers <- population_economic %>%
  group_by(Population_Group) %>%
  identify_outliers(Total_investment)


sum(investment_outliers$is.extreme == TRUE)
## [1] 18
table(investment_outliers$Continent, investment_outliers$is.extreme)
##           
##            FALSE TRUE
##   Africa      70   17
##   Americas     7    0
##   Asia        39    1
##   Europe      14    0
# Africa has the most extreme outliers (17)


#import outliers

import_outliers <- population_economic %>%
  group_by(Population_Group) %>%
  identify_outliers(Volume_of_imports)

sum(import_outliers$is.extreme == TRUE)
## [1] 45
table(import_outliers$Continent, import_outliers$is.extreme)
##           
##            FALSE TRUE
##   Africa     102   29
##   Americas    43    4
##   Asia        49   10
##   Europe      15    2
##   Oceania      7    0
# Africa has the most extreme outliers (29)

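# Detect multivariate outliers using the Mahalanobis distance, computed within
# each population group on the three outcome variables only (Years is excluded)

popu_mahalabonis <- population_economic %>%
  select(Population_Group, Gross_national_savings, Total_investment, Volume_of_imports) %>%
  group_by(Population_Group) %>%
  doo(~ mahalanobis_distance(.)) %>%
  filter(is.outlier == TRUE)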
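To show what the Mahalanobis screen computes, the sketch below uses base R's `mahalanobis()` to obtain squared distances from the column means and flags rows above a chi-square quantile. The 0.999 cutoff, the helper name `flag_mahalanobis`, and the simulated data are assumptions for illustration, not necessarily the exact rule used by rstatix. In the analysis above the same calculation would be run separately within each `Population_Group`.

```r
# Hypothetical helper: squared Mahalanobis distance of each row from the
# column means, flagged against an assumed chi-square cutoff.
flag_mahalanobis <- function(X, prob = 0.999) {
  X <- as.matrix(X)
  d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))
  cutoff <- qchisq(prob, df = ncol(X))    # df = number of variables
  data.frame(mahal.dist = d2, is.outlier = d2 > cutoff)
}

# Simulated data: three variables, 100 rows, one planted extreme row
set.seed(42)
X <- matrix(rnorm(300), ncol = 3)
X[1, ] <- c(8, -8, 8)
res <- flag_mahalanobis(X)
head(res, 3)
```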
# Univariate normality assumption: Shapiro-Wilk test
# by population group and variable

population_economic %>%
  group_by(Population_Group) %>%
  shapiro_test(Volume_of_imports, Total_investment, Gross_national_savings) %>%
  arrange(variable)
## # A tibble: 12 × 4
##    Population_Group    variable               statistic        p
##    <fct>               <chr>                      <dbl>    <dbl>
##  1 Large (50–100M)     Gross_national_savings     0.975 7.52e- 6
##  2 Medium (10–50M)     Gross_national_savings     0.934 7.78e-26
##  3 Small (<10M)        Gross_national_savings     0.919 2.93e-34
##  4 Very large (>=100M) Gross_national_savings     0.982 9.51e- 4
##  5 Large (50–100M)     Total_investment           0.971 1.50e- 6
##  6 Medium (10–50M)     Total_investment           0.982 3.37e-13
##  7 Small (<10M)        Total_investment           0.921 6.58e-34
##  8 Very large (>=100M) Total_investment           0.948 1.31e- 8
##  9 Large (50–100M)     Volume_of_imports          0.848 3.14e-18
## 10 Medium (10–50M)     Volume_of_imports          0.943 3.18e-24
## 11 Small (<10M)        Volume_of_imports          0.823 1.05e-45
## 12 Very large (>=100M) Volume_of_imports          0.992 1.03e- 1
# QQ plots

# QQ plots for imports

ggqqplot(population_economic, "Volume_of_imports", facet.by = "Population_Group",
         ylab = "Volume_of_imports", ggtheme = theme_bw())

# QQ plots for gross national savings

ggqqplot(population_economic, "Gross_national_savings", facet.by = "Population_Group",
         ylab = "Gross_national_savings", ggtheme = theme_bw())

# QQ plots for total investment

ggqqplot(population_economic, "Total_investment", facet.by = "Population_Group",
         ylab = "Total_investment", ggtheme = theme_bw())

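The Shapiro-Wilk tests reject normality in every group except imports in the very large group (p = 0.103). With several hundred to several thousand observations per group, however, the test flags even small departures from normality. The QQ plots show the points lying mostly along the reference line, so approximate univariate normality is assumed, although the skew in import volumes (W as low as 0.82) calls for caution.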

#Multivariate normality
population_economic %>%
  select(Total_investment, Gross_national_savings, Volume_of_imports) %>%
  mshapiro_test()
## # A tibble: 1 × 2
##   statistic  p.value
##       <dbl>    <dbl>
## 1     0.871 2.58e-52

The p-value (2.58e-52) is far below 0.05, so the assumption of multivariate normality is violated.

#Identify multicollinearity

population_economic %>%
  cor_test(Gross_national_savings, Total_investment, Volume_of_imports)
## # A tibble: 9 × 8
##   var1                 var2    cor statistic         p conf.low conf.high method
##   <chr>                <chr> <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr> 
## 1 Gross_national_savi… Gros… 1      Inf      0          1          1      Pears…
## 2 Gross_national_savi… Tota… 0.48     3.76e1 4.95e-271  0.461      0.505  Pears…
## 3 Gross_national_savi… Volu… 0.019    1.28e0 2.01e-  1 -0.00997    0.0475 Pears…
## 4 Total_investment     Gros… 0.48     3.76e1 4.95e-271  0.461      0.505  Pears…
## 5 Total_investment     Tota… 1        4.58e9 0          1          1      Pears…
## 6 Total_investment     Volu… 0.092    6.32e0 2.92e- 10  0.0636     0.121  Pears…
## 7 Volume_of_imports    Gros… 0.019    1.28e0 2.01e-  1 -0.00997    0.0475 Pears…
## 8 Volume_of_imports    Tota… 0.092    6.32e0 2.92e- 10  0.0636     0.121  Pears…
## 9 Volume_of_imports    Volu… 1      Inf      0          1          1      Pears…

Savings and investment (r = 0.48, p < 0.001)

A moderate positive correlation: higher gross national savings are associated with higher total investment. This aligns with macroeconomic theory (the savings–investment identity): countries that save more tend to invest more domestically.

Savings and imports (r = 0.019, p = 0.20)

Essentially no relationship, and not statistically significant: a country's imports are unrelated to its savings rate, which suggests that trade (imports) does not directly influence how much countries save.

Investment and imports (r = 0.092, p < 0.001)

A very weak but significant positive relationship: higher investment slightly coincides with higher imports, possibly because investment often requires importing capital goods or machinery.

None of the correlations approaches the 0.9 level usually taken to indicate multicollinearity, so multicollinearity among the outcome variables is not a concern.
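The multicollinearity judgement can be made concrete with variance inflation factors. The sketch below assumes the Pearson correlations reported above. Each VIF is a diagonal element of the inverse correlation matrix, equal to 1 / (1 - R²) from regressing that variable on the other two, and values near 1 indicate that the three outcomes are not redundant.

```r
# Correlation matrix built from the cor_test() output above
vars <- c("Gross_national_savings", "Total_investment", "Volume_of_imports")
R <- matrix(c(1,     0.48,  0.019,
              0.48,  1,     0.092,
              0.019, 0.092, 1),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# VIF_j = j-th diagonal element of the inverse correlation matrix
vif <- diag(solve(R))
round(vif, 2)   # approximately 1.30, 1.31, 1.01
```

All three values sit far below the usual warning levels of 5 or 10.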

#linearity assumption

population_linerity <- population_economic %>%
  select(Volume_of_imports, Total_investment, Gross_national_savings, Population_Group) %>%
  group_by(Population_Group) %>%
  doo( ~ggpairs(.) + theme_bw(), result = "plots")

population_linerity$plots

# the homogeneity of covariances assumption
box_m(population_economic[, c("Volume_of_imports", "Total_investment",
                              "Gross_national_savings")], population_economic$Population_Group)
## # A tibble: 1 × 4
##   statistic   p.value parameter method                                          
##       <dbl>     <dbl>     <dbl> <chr>                                           
## 1     1382. 9.89e-283        18 Box's M-test for Homogeneity of Covariance Matr…

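Box's M test is statistically significant (p < 0.001), so the data violate the assumption of homogeneity of variance-covariance matrices. Box's M is highly sensitive in large samples, which is one reason to rely on Pillai's trace in the MANOVA.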

# Check the homogeneity of variance assumption (Levene's test)

population_economic %>% 
  gather(key = "variable", value = "value", Total_investment, Gross_national_savings,
         Volume_of_imports) %>%
  group_by(variable) %>%
  levene_test(value ~ Population_Group)
## # A tibble: 3 × 5
##   variable                 df1   df2 statistic        p
##   <chr>                  <int> <int>     <dbl>    <dbl>
## 1 Gross_national_savings     3  4652     28.5  2.99e-18
## 2 Total_investment           3  4652     53.1  9.20e-34
## 3 Volume_of_imports          3  4652      1.98 1.14e- 1

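Levene's test is significant for gross national savings and total investment (p < 0.001), so their variances differ across population groups. It is not significant for the volume of imports (p = 0.114), so equal variances can be assumed for that variable.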

MANOVA Model

# MANOVA using Pillai's trace, the most robust test statistic when assumptions are violated

manova_population <- lm(cbind(Total_investment, Gross_national_savings, Volume_of_imports) ~
                          Population_Group, data = population_economic)
Manova(manova_population, test.statistic = "Pillai")
## 
## Type II MANOVA Tests: Pillai test statistic
##                  Df test stat approx F num Df den Df    Pr(>F)    
## Population_Group  3  0.039903   20.903      9  13956 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pillai's trace = 0.0399 (F(9, 13956) = 20.90, p < 0.001), so we reject the null hypothesis: the combined economic indicators (total investment, gross national savings, and volume of imports) differ significantly across population size groups.

This suggests that population size plays a role in shaping a country's economic structure, although the small Pillai's trace indicates that the effect, while highly significant, is modest in size.
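Because the samples are large, a small Pillai's trace can still be highly significant. One common effect-size convention divides Pillai's trace by s = min(number of outcomes, effect degrees of freedom), giving a multivariate partial eta-squared. The sketch below applies that convention to both MANOVAs in this section. The function name `pillai_eta2` is hypothetical.

```r
# Hypothetical helper: multivariate partial eta-squared from Pillai's trace
pillai_eta2 <- function(V, n_outcomes, df_effect) {
  V / min(n_outcomes, df_effect)
}

# Population model: 3 outcomes, 4 groups (df = 3)
pillai_eta2(0.039903, n_outcomes = 3, df_effect = 3)   # about 0.0133
# Debt model: 2 outcomes, 4 groups (df = 3)
pillai_eta2(0.041293, n_outcomes = 2, df_effect = 3)   # about 0.021
```

Under this convention population size accounts for roughly 1.3% of the multivariate variation and debt level for roughly 2%, which supports reading both effects as real but small.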

# Post-hoc analysis: reshape to long format and group by outcome

post_hoc_population <- population_economic %>%
  gather(key = "variable", value = "value", Gross_national_savings, Volume_of_imports,
         Total_investment) %>%
  group_by(variable)

# Post-hoc: Welch ANOVA for each outcome
post_hoc_population %>%
  welch_anova_test(value ~ Population_Group)
## # A tibble: 3 × 8
##   variable               .y.       n statistic   DFn   DFd        p method     
## * <chr>                  <chr> <int>     <dbl> <dbl> <dbl>    <dbl> <chr>      
## 1 Gross_national_savings value  4656      43.6     3 1010. 1.66e-26 Welch ANOVA
## 2 Total_investment       value  4656      12.4     3  939. 5.6 e- 8 Welch ANOVA
## 3 Volume_of_imports      value  4656       2.4     3  898. 6.6 e- 2 Welch ANOVA

Gross National Savings (p < 0.001)

The differences in average savings across population size groups are highly significant: small, medium, large, and very large countries save at different average rates.

Total Investment (p < 0.001)

The differences in average investment are also statistically significant: countries with different population sizes invest different shares of GDP on average.

Volume of Imports (p = 0.066 > 0.05)

Not statistically significant: imports behave similarly across population groups, so population size does not systematically affect import volumes.

The Welch ANOVA results show that population size has a significant effect on gross national savings and total investment, but not on the volume of imports. Larger and smaller countries differ in how much they save and invest, but not significantly in how much they import.
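Welch's ANOVA, used here as the univariate follow-up, replaces the pooled variance with group-specific precision weights. The sketch below computes the Welch F statistic and its fractional denominator degrees of freedom by hand and compares the result with base R's `oneway.test(var.equal = FALSE)`. The helper `welch_anova` and the simulated groups are hypothetical.

```r
# Hypothetical helper: Welch's one-way ANOVA computed by hand
welch_anova <- function(y, g) {
  g <- factor(g)
  k <- nlevels(g)
  n_i <- tapply(y, g, length)
  m_i <- tapply(y, g, mean)
  w_i <- n_i / tapply(y, g, var)          # precision weights n_i / s_i^2
  W <- sum(w_i)
  m_w <- sum(w_i * m_i) / W               # variance-weighted grand mean
  lambda <- sum((1 - w_i / W)^2 / (n_i - 1))
  F_stat <- (sum(w_i * (m_i - m_w)^2) / (k - 1)) /
    (1 + 2 * (k - 2) * lambda / (k^2 - 1))
  df2 <- (k^2 - 1) / (3 * lambda)
  c(F = F_stat, df1 = k - 1, df2 = df2,
    p = pf(F_stat, k - 1, df2, lower.tail = FALSE))
}

# Simulated groups with unequal sizes and unequal variances
set.seed(7)
sizes <- c(40, 80, 120, 60)
g <- rep(c("A", "B", "C", "D"), times = sizes)
y <- rnorm(300, mean = rep(c(20, 22, 18, 21), times = sizes),
           sd = rep(c(5, 9, 12, 6), times = sizes))

res <- welch_anova(y, g)
ot <- oneway.test(y ~ g, var.equal = FALSE)
res
ot
```

The fractional `df2` is why the Welch output above reports denominators such as 1010. and 939. instead of N minus the number of groups.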

# Pairwise comparisons: Games-Howell test

post_hoc_population %>%
  games_howell_test(value ~ Population_Group)
## # A tibble: 18 × 9
##    variable             .y.   group1 group2 estimate conf.low conf.high    p.adj
##  * <chr>                <chr> <chr>  <chr>     <dbl>    <dbl>     <dbl>    <dbl>
##  1 Gross_national_savi… value Large… Mediu…   -3.45    -4.92    -1.97   1.86e- 8
##  2 Gross_national_savi… value Large… Small…   -5.49    -6.95    -4.02   4.74e-10
##  3 Gross_national_savi… value Large… Very …   -1.48    -3.10     0.144  8.9 e- 2
##  4 Gross_national_savi… value Mediu… Small…   -2.04    -3.04    -1.04   9.05e- 7
##  5 Gross_national_savi… value Mediu… Very …    1.97     0.753    3.19   2.02e- 4
##  6 Gross_national_savi… value Small… Very …    4.01     2.81     5.22   0       
##  7 Total_investment     value Large… Mediu…   -2.06    -3.36    -0.763  2.91e- 4
##  8 Total_investment     value Large… Small…   -0.652   -2.01     0.702  6.01e- 1
##  9 Total_investment     value Large… Very …   -2.62    -4.20    -1.04   1.35e- 4
## 10 Total_investment     value Mediu… Small…    1.41     0.583    2.24   7.29e- 5
## 11 Total_investment     value Mediu… Very …   -0.556   -1.72     0.612  6.1 e- 1
## 12 Total_investment     value Small… Very …   -1.97    -3.19    -0.741  2.42e- 4
## 13 Volume_of_imports    value Large… Mediu…   -2.13    -4.91     0.646  1.97e- 1
## 14 Volume_of_imports    value Large… Small…   -2.78    -5.53    -0.0302 4.6 e- 2
## 15 Volume_of_imports    value Large… Very …   -2.57    -5.84     0.701  1.8 e- 1
## 16 Volume_of_imports    value Mediu… Small…   -0.648   -2.07     0.771  6.44e- 1
## 17 Volume_of_imports    value Mediu… Very …   -0.434   -2.71     1.84   9.61e- 1
## 18 Volume_of_imports    value Small… Very …    0.213   -2.02     2.45   9.95e- 1
## # ℹ 1 more variable: p.adj.signif <chr>

(Savings) (p < 0.001)

Savings rise with population size up to the large group. Small countries (under 10 million people) have the lowest gross national savings (17.6%), significantly below medium (19.7%), large (23.1%), and very large (21.7%) countries. Large countries save the most, though they do not differ significantly from very large countries (p = 0.089). Very large countries save about 2 percentage points more than medium-sized ones.

(Investment) (p < 0.001)

Investment does not follow a simple size gradient. Large (25.0%) and small (24.3%) countries invest the most and do not differ significantly from each other.

Medium (22.9%) and very large (22.4%) countries invest significantly less than both large and small countries, and they are statistically similar to each other.

(Imports) (p > 0.05)

There is no strong evidence that population size influences import levels. The only marginal pairwise difference is that large countries show about 2.8 percentage points higher import growth than small countries (p = 0.046), and this should be read cautiously because the overall Welch test was not significant.

Trade volume (imports) therefore does not differ much by population size, suggesting that trade patterns are driven by other economic factors (such as openness, industrial structure, or policy) rather than by population.

CONCLUSION

This analysis highlights that macroeconomic outcomes in emerging economies are shaped by intricate interactions between policy decisions, structural features, and regional contexts. By identifying critical debt thresholds, optimal population ranges, and patterns of regional policy effectiveness, the study offers actionable insights for policymakers. The results emphasize that one-size-fits-all approaches are insufficient, and that tailored, evidence-based strategies are vital for fostering sustainable development in these economies.

Beyond its policy relevance, the research advances academic understanding by demonstrating how sophisticated statistical analysis of comprehensive datasets can illuminate the complex dynamics of emerging market development. It shows that rigorous, data-driven insights can guide more informed economic decision-making in environments characterized by uncertainty and structural diversity.

Regression Analysis

How does a country’s fiscal balance position (surplus, balanced, or deficit) influence its inflation rate, and does this relationship differ across continents?

#linear regression model 

fiscal_model_1 <- lm(inflation ~ fiscal_group + Continent + fiscal_group:Continent,
                   data = fiscal_inflation)


print(summary(fiscal_model_1))
## 
## Call:
## lm(formula = inflation ~ fiscal_group + Continent + fiscal_group:Continent, 
##     data = fiscal_inflation)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -85.65  -8.73  -5.00  -0.31 723.19 
## 
## Coefficients:
##                                              Estimate Std. Error t value
## (Intercept)                                  12.91989    0.94572  13.661
## fiscal_groupdeficit (<-5%)                    1.33266    1.85019   0.720
## fiscal_groupsurplus (>5%)                    -3.78329    3.28616  -1.151
## ContinentAmericas                            -4.18075    1.44926  -2.885
## ContinentAsia                                -4.08605    1.48938  -2.743
## ContinentEurope                              -0.52025    2.10627  -0.247
## ContinentOceania                             -8.30415    2.35913  -3.520
## fiscal_groupdeficit (<-5%):ContinentAmericas  4.85708    3.17596   1.529
## fiscal_groupsurplus (>5%):ContinentAmericas   6.04164    7.93361   0.762
## fiscal_groupdeficit (<-5%):ContinentAsia      2.27979    2.75673   0.827
## fiscal_groupsurplus (>5%):ContinentAsia       0.08756    4.78828   0.018
## fiscal_groupdeficit (<-5%):ContinentEurope   -2.87664    4.49123  -0.641
## fiscal_groupsurplus (>5%):ContinentEurope    -0.80354   15.10886  -0.053
## fiscal_groupdeficit (<-5%):ContinentOceania  -1.87754    4.88108  -0.385
## fiscal_groupsurplus (>5%):ContinentOceania    2.74912    5.72040   0.481
##                                              Pr(>|t|)    
## (Intercept)                                   < 2e-16 ***
## fiscal_groupdeficit (<-5%)                   0.471385    
## fiscal_groupsurplus (>5%)                    0.249674    
## ContinentAmericas                            0.003934 ** 
## ContinentAsia                                0.006102 ** 
## ContinentEurope                              0.804917    
## ContinentOceania                             0.000436 ***
## fiscal_groupdeficit (<-5%):ContinentAmericas 0.126249    
## fiscal_groupsurplus (>5%):ContinentAmericas  0.446381    
## fiscal_groupdeficit (<-5%):ContinentAsia     0.408283    
## fiscal_groupsurplus (>5%):ContinentAsia      0.985412    
## fiscal_groupdeficit (<-5%):ContinentEurope   0.521878    
## fiscal_groupsurplus (>5%):ContinentEurope    0.957588    
## fiscal_groupdeficit (<-5%):ContinentOceania  0.700508    
## fiscal_groupsurplus (>5%):ContinentOceania   0.630835    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.71 on 4845 degrees of freedom
## Multiple R-squared:  0.00784,    Adjusted R-squared:  0.004974 
## F-statistic: 2.735 on 14 and 4845 DF,  p-value: 0.000487

For an African country with a near-balanced budget (the reference group), the model predicts an average inflation rate of about 12.92%.

fiscal_group deficit (<-5%) = +1.33 (p > 0.05): African countries running large fiscal deficits tend to have about 1.3 percentage points higher inflation, but this difference is not statistically significant.

fiscal_group surplus (>5%) = -3.78 (p > 0.05): African countries with large surpluses tend to have about 3.8 points lower inflation, but this difference is also not statistically significant.

Continent

Americas (p < 0.01): balanced-budget countries in the Americas have about 4.18 points lower inflation than balanced-budget countries in Africa. This difference is statistically significant.

Asia (p < 0.01): inflation in balanced-budget Asian countries is about 4.09 points lower than in Africa. This difference is statistically significant.

Europe (p > 0.05): inflation in balanced-budget European countries is about 0.52 points lower than in Africa, but this difference is not statistically significant.

Oceania (p < 0.001): balanced-budget countries in Oceania have about 8.30 points lower inflation than balanced-budget countries in Africa. This difference is statistically significant.

Interaction (p > 0.05): none of the individual interaction terms is statistically significant (all p > 0.12), so there is no statistical evidence that the effect of the fiscal stance differs by continent. The model as a whole is significant (F(14, 4845) = 2.74, p < 0.001) but explains less than 1% of the variation in inflation (R² = 0.008).

Adding each interaction term to the corresponding main effect gives the deficit and surplus effects, relative to a balanced budget, within each continent (in percentage points). Because none of the interaction terms is significant, these continent-specific effects are descriptive point estimates rather than established differences.

For the Americas: the effect of a deficit is 1.33266 + 4.85708 ≈ +6.19, so deficits appear especially inflationary in the Americas. The effect of a surplus is -3.78329 + 6.04164 ≈ +2.26; surprisingly, surpluses are also associated with higher inflation there, contrary to the general trend.

For Europe: the effect of a deficit is 1.33266 - 2.87664 ≈ -1.54, so in Europe a deficit is associated with slightly lower inflation (again contrary to the general trend). The effect of a surplus is -3.78329 - 0.80354 ≈ -4.59, so surpluses are associated with noticeably lower inflation.

For Asia: the effect of a deficit is 1.33266 + 2.27979 ≈ +3.61, so large deficits are associated with higher inflation. The effect of a surplus is -3.78329 + 0.08756 ≈ -3.70, so large surpluses are associated with lower inflation.

For Oceania: the effect of a deficit is 1.33266 - 1.87754 ≈ -0.54, and the effect of a surplus is -3.78329 + 2.74912 ≈ -1.03; both fiscal stances are associated with slightly lower inflation than a balanced budget.
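The continent-specific effects are sums of coefficients from `summary(fiscal_model_1)`. The sketch below recomputes them from the printed estimates, with Africa and the balanced-budget group as reference levels. It yields point estimates only. Standard errors for these sums would require the coefficient covariance matrix, e.g. from `vcov(fiscal_model_1)`.

```r
# Estimates copied from summary(fiscal_model_1)
main <- c(deficit = 1.33266, surplus = -3.78329)          # Africa (reference)
interaction <- rbind(
  Americas = c(deficit =  4.85708, surplus =  6.04164),
  Asia     = c(deficit =  2.27979, surplus =  0.08756),
  Europe   = c(deficit = -2.87664, surplus = -0.80354),
  Oceania  = c(deficit = -1.87754, surplus =  2.74912)
)

# Effect of each fiscal stance relative to a balanced budget, by continent:
# main effect plus the continent's interaction term
simple_effects <- rbind(Africa = main, sweep(interaction, 2, main, "+"))
round(simple_effects, 2)
```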

The point estimates suggest that the relationship between fiscal balance and inflation may not be uniform across regions, although the non-significant interaction terms mean this heterogeneity is not statistically established. If such differences exist, they are plausibly shaped by the strength of monetary institutions, financial market development, and macroeconomic credibility.

In other words, the same fiscal stance (e.g., a deficit larger than 5% of GDP) may be associated with different inflation outcomes across regions. In Africa, the deficit estimate is positive (+1.33 points), consistent with governments relying on central bank financing or external debt that weakens the currency. In Asia, the deficit estimate is larger (+3.61 points), so these data do not indicate that deficits are less inflationary in Asia than in Africa.

In Oceania, the deficit estimate is close to zero (-0.54 points), so deficits there have not been associated with higher inflation. If the sample includes economies such as Australia and New Zealand, their highly credible, independent central banks and strong capital markets could allow borrowing without destabilizing prices; note, however, that the IMF classifies both as advanced rather than emerging economies.

How does a country’s fiscal balance position influence its inflation rate, after controlling for GDP growth, investment, government revenue and expenditure, government debt, and national savings?

#FEATURE ENGINEERING

fiscal_inflation_analysis <- emerging_economies %>%
  filter(Economic_Indicator %in% c("General government revenue",
                                   "General government total expenditure",
                                   "Inflation, average consumer prices",
                                   "Total investment",
                                   "General government gross debt",
                                   "Gross national savings",
                                   "Population",
                                   "Gross domestic product, constant prices")) %>%
  select(Continent, Country, Years, Economic_Indicator, Values) %>%
  pivot_wider(names_from = Economic_Indicator,
              values_from = Values) %>%
  # compute fiscal balance = revenue - expenditure (assumes Values are % of GDP or same units)
  mutate(
    fiscal_balance = `General government revenue` - `General government total expenditure`,
    fiscal_group = case_when(
      fiscal_balance > 5                       ~ "surplus (>5%)",
      between(fiscal_balance, -5, 5)           ~ "balanced (-5% to 5%)",
      fiscal_balance < -5                      ~ "deficit (<-5%)",
      TRUE                                     ~ NA_character_
    )
  ) %>%
  # keep relevant cols and rename inflation column for convenience
  rename("inflation" = "Inflation, average consumer prices",
         "GDP_rate" = "Gross domestic product, constant prices",
         "Total_investment" = "Total investment",
         "General_government_revenue" = "General government revenue",
         "General_government_total_expenditure" = "General government total expenditure",
         "General_government_gross_debt" = "General government gross debt",
         "Gross_national_savings" = "Gross national savings") %>%
  select(Continent, Country, Years, fiscal_balance, fiscal_group, inflation, GDP_rate,
         Total_investment, General_government_revenue, General_government_total_expenditure,
         General_government_gross_debt, Gross_national_savings) %>%
  mutate(fiscal_group = as.factor(fiscal_group)) %>%
  drop_na()

# Remove duplicate rows, then print the cleaned data
fiscal_inflation_analysis <- distinct(fiscal_inflation_analysis)
fiscal_inflation_analysis
## # A tibble: 3,470 × 12
##    Continent Country     Years fiscal_balance fiscal_group    inflation GDP_rate
##    <chr>     <chr>       <dbl>          <dbl> <fct>               <dbl>    <dbl>
##  1 Asia      Afghanistan  2003         -2.10  balanced (-5% …     35.7     8.69 
##  2 Asia      Afghanistan  2004         -2.39  balanced (-5% …     16.4     0.671
##  3 Asia      Afghanistan  2005         -0.918 balanced (-5% …     10.6    11.8  
##  4 Asia      Afghanistan  2006          0.684 balanced (-5% …      6.78    5.36 
##  5 Asia      Afghanistan  2007         -2.46  balanced (-5% …      8.68   13.3  
##  6 Asia      Afghanistan  2008         -3.86  balanced (-5% …     26.4     3.86 
##  7 Asia      Afghanistan  2009         -1.76  balanced (-5% …     -6.81   20.6  
##  8 Asia      Afghanistan  2010          0.933 balanced (-5% …      2.18    8.44 
##  9 Asia      Afghanistan  2011         -0.672 balanced (-5% …     11.8     6.48 
## 10 Asia      Afghanistan  2012          0.182 balanced (-5% …      6.44   14.0  
## # ℹ 3,460 more rows
## # ℹ 5 more variables: Total_investment <dbl>, General_government_revenue <dbl>,
## #   General_government_total_expenditure <dbl>,
## #   General_government_gross_debt <dbl>, Gross_national_savings <dbl>
fiscal_model <- lm(inflation ~ fiscal_group  +  GDP_rate + Total_investment + General_government_revenue +
                     General_government_total_expenditure +
                     General_government_gross_debt + Gross_national_savings, 
                   data = fiscal_inflation_analysis)



print(summary(fiscal_model))
## 
## Call:
## lm(formula = inflation ~ fiscal_group + GDP_rate + Total_investment + 
##     General_government_revenue + General_government_total_expenditure + 
##     General_government_gross_debt + Gross_national_savings, data = fiscal_inflation_analysis)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -43.97  -6.19  -2.77   1.85 528.42 
## 
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           8.948276   1.262243   7.089 1.63e-12 ***
## fiscal_groupdeficit (<-5%)           -0.732105   0.904113  -0.810   0.4181    
## fiscal_groupsurplus (>5%)             1.933407   1.776481   1.088   0.2765    
## GDP_rate                             -0.361357   0.059699  -6.053 1.57e-09 ***
## Total_investment                     -0.073780   0.044196  -1.669   0.0951 .  
## General_government_revenue           -0.092410   0.042281  -2.186   0.0289 *  
## General_government_total_expenditure  0.006860   0.022630   0.303   0.7618    
## General_government_gross_debt         0.072799   0.009742   7.473 9.91e-14 ***
## Gross_national_savings                0.042947   0.036601   1.173   0.2407    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.55 on 3461 degrees of freedom
## Multiple R-squared:  0.03241,    Adjusted R-squared:  0.03017 
## F-statistic: 14.49 on 8 and 3461 DF,  p-value: < 2.2e-16

The expected inflation rate (~9%) when a country’s fiscal balance is roughly neutral and all other economic variables are constant.

GDP_rate (-0.36) (p < 0.05) For every 1% increase in GDP growth, inflation decreases by 0.5% suggesting faster growth may coincide with price stability. These is statistically significant

Total_investment (-0.07) (p > 0.05) Higher investment slightly reduces inflation, possibly through productivity gains. This is not statistically significant

General_government_revenue (-0.09) (p > 0.05) Higher revenue correlates with slightly lower inflation, perhaps reflecting stronger fiscal control.not statistical significant

General_government_total_expenditure (0.006) (p > 0.05) Practically no effect the coefficient is extremely close to zero. Government spending alone does not predict inflation

General_government_gross_debt (+0.073) (p < 0.05) Higher debt is associated with higher inflation: each additional percentage point of debt corresponds to about 0.07 percentage points more inflation, supporting the debt-inflation link. This is statistically significant (p < 0.001).
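To give the debt coefficient a sense of scale, compare two otherwise identical economies with debt at 30% and 90% of GDP. These are illustrative values, and the calculation assumes debt is measured in percent of GDP:

```latex
\Delta\hat{\pi} = 0.0728 \times (90 - 30) = 0.0728 \times 60 \approx 4.37 \text{ percentage points}
```

This is a model-based comparison that holds all other predictors fixed, not a forecast for any particular country.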

Gross_national_savings (+0.043) (p > 0.05) Higher savings correlate with slightly higher inflation, possibly because savings rise when nominal interest rates are higher. This is not statistically significant (p = 0.24).

fiscal_group: Deficit (< -5%) (-0.73) (p > 0.05) Countries running deficits have about 0.73 percentage points lower inflation on average than those with balanced budgets, holding all else constant, but the difference is not statistically significant (p = 0.42).

fiscal_group: Surplus (> 5%) (1.93) (p > 0.05) Countries running surpluses have about 1.93 percentage points higher inflation than balanced-budget countries, possibly due to overheated economies or post-surplus spending, but the difference is not statistically significant (p = 0.28).

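Neither the deficit nor the surplus coefficient is statistically significant. The model as a whole is significant (F = 14.49, p < 0.001) but explains only about 3% of the variation in inflation (R² = 0.032), so most inflation differences across countries and years are driven by factors outside the model.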
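One way to see why the fiscal-group dummies are not significant is an approximate 95% confidence interval for each. This sketch uses the normal approximation with a critical value of 1.96, which is reasonable given the 3,461 residual degrees of freedom:

```latex
\begin{aligned}
\text{CI}_{95\%} &\approx \hat{\beta} \pm 1.96 \times \text{SE}(\hat{\beta}) \\
\text{Deficit: } & -0.732 \pm 1.96 \times 0.904 = -0.732 \pm 1.772 \;\Rightarrow\; [-2.50,\ 1.04] \\
\text{Surplus: } & 1.933 \pm 1.96 \times 1.776 = 1.933 \pm 3.481 \;\Rightarrow\; [-1.55,\ 5.41]
\end{aligned}
```

Both intervals include zero. In R, calling `confint()` on the fitted `lm` object returns the exact t-based intervals, which should be nearly identical at this sample size.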

How does the level of government debt influence a country’s economic growth rate when controlling for inflation and total investment?

#FEATURE ENGINEERING
library(dplyr)
library(tidyr)

debt_prediction_analysis <- emerging_economies %>%
  filter(Economic_Indicator %in% c("Inflation, average consumer prices",
                                   "Gross domestic product, constant prices",
                                   "General government gross debt",
                                   "Gross national savings",
                                   "Total investment")) %>%
  select(Country, Years, Continent, Economic_Indicator, Values) %>%
  # one column per indicator, one row per country-year
  pivot_wider(names_from = Economic_Indicator, values_from = Values) %>%
  rename(Inflation_average_consumer_prices = `Inflation, average consumer prices`,
         General_government_gross_debt     = `General government gross debt`,
         GDP_rate                          = `Gross domestic product, constant prices`,
         Gross_national_savings            = `Gross national savings`,
         Total_investment                  = `Total investment`) %>%
  # case_when() checks conditions in order, so each band only needs its upper bound;
  # factor() is needed so lm() creates dummy variables, and because levels are sorted
  # alphabetically, "High Debt (60-90%)" becomes the reference category
  mutate(debt_group = factor(case_when(
    General_government_gross_debt < 30  ~ "Low Debt (<30%)",
    General_government_gross_debt < 60  ~ "Medium Debt (30-60%)",
    General_government_gross_debt < 90  ~ "High Debt (60-90%)",
    General_government_gross_debt >= 90 ~ "Very High Debt (>90%)"
  ))) %>%
  drop_na()

# GDP growth regressed on debt group, controlling for inflation and investment (full sample, no train/test split)
growth_model <- lm(GDP_rate ~ debt_group + Inflation_average_consumer_prices + Total_investment,
                   data = debt_prediction_analysis)

summary(growth_model)
## 
## Call:
## lm(formula = GDP_rate ~ debt_group + Inflation_average_consumer_prices + 
##     Total_investment, data = debt_prediction_analysis)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -52.333  -2.051   0.095   2.141 137.482 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        0.514192   0.326979   1.573  0.11591    
## debt_groupLow Debt (<30%)          1.391021   0.287932   4.831 1.42e-06 ***
## debt_groupMedium Debt (30-60%)     0.893264   0.263000   3.396  0.00069 ***
## debt_groupVery High Debt (>90%)   -0.097300   0.365176  -0.266  0.78991    
## Inflation_average_consumer_prices -0.029226   0.004846  -6.031 1.80e-09 ***
## Total_investment                   0.125438   0.010033  12.502  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.886 on 3486 degrees of freedom
## Multiple R-squared:  0.06547,    Adjusted R-squared:  0.06413 
## F-statistic: 48.84 on 5 and 3486 DF,  p-value: < 2.2e-16

Intercept (0.51) (p > 0.05) The predicted growth rate is about 0.5% per year for a country with High Public Debt (60–90% of GDP, the reference category) when inflation and investment are both zero. It is not statistically significant (p = 0.12).
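Writing out the fitted growth model makes the role of the reference category concrete. The inflation rate of 5% and investment rate of 25% of GDP used below are hypothetical values chosen for illustration, not sample averages; D_low, D_med, and D_vhigh are shorthand introduced here for the debt-group dummies:

```latex
\begin{aligned}
\hat{g} &= 0.514 + 1.391\,D_{\text{low}} + 0.893\,D_{\text{med}} - 0.097\,D_{\text{vhigh}} - 0.0292\,\pi + 0.1254\,I \\
\text{Low debt: } \hat{g} &= 0.514 + 1.391 - 0.0292(5) + 0.1254(25) = 1.905 - 0.146 + 3.135 \approx 4.89 \\
\text{High debt: } \hat{g} &= 0.514 - 0.0292(5) + 0.1254(25) = 0.514 - 0.146 + 3.135 \approx 3.50
\end{aligned}
```

The two predictions differ by exactly the Low Debt coefficient (about 1.39 points), because the dummy shifts the intercept while the inflation and investment terms are identical.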

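Low Debt (<30%) (+1.39) (p < 0.05) Low-debt countries grow about 1.4 percentage points faster than high-debt countries, holding inflation and investment constant. One possible channel is that low debt leaves more fiscal space, which can support private activity and growth. This is statistically significant (p < 0.001).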

Medium Debt (30–60%) (+0.89) (p < 0.05) Moderate-debt countries grow about 0.9 percentage points faster than high-debt economies, suggesting that growth holds up better where debt remains at sustainable levels. This is statistically significant (p < 0.001).

Very High Debt (>90%) (-0.10) (p > 0.05) Growth is not significantly different from the high-debt group (p = 0.79). One possible explanation is that at extreme levels the debt-growth relationship becomes non-linear and depends on institutional quality and monetary credibility, which this model does not measure.

Inflation rate (-0.03) (p < 0.05) For every 1 percentage point increase in inflation, GDP growth falls by about 0.03 percentage points on average. This is highly significant (p < 0.001).

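This is consistent with Monetarist Theory (Friedman) and the Price Stability & Growth Framework, in which high inflation creates uncertainty, discourages investment, and slows growth. Because total investment is already held constant in this model, the inflation coefficient captures the association with growth beyond the investment channel.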

Total investment (+0.13) (p < 0.05) For every 1 percentage point increase in the investment rate, GDP growth rises by about 0.13 percentage points on average. This is highly significant (p < 0.001).
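Comparing the two continuous coefficients on illustrative changes (a 5-point rise in the investment rate and a 10-point rise in inflation, both chosen arbitrarily) shows their relative size:

```latex
\begin{aligned}
\text{Investment } +5 \text{ points: } & 0.1254 \times 5 \approx +0.63 \text{ points of growth} \\
\text{Inflation } +10 \text{ points: } & -0.0292 \times 10 \approx -0.29 \text{ points of growth}
\end{aligned}
```

In this sample, a modest increase in investment is associated with roughly twice the growth difference of a much larger increase in inflation, in the opposite direction.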

This matches the Solow Growth Model (1956):

Investment increases the capital stock; higher capital accumulation raises productive capacity and therefore economic output. In this model, investment has the largest t-statistic (12.5) of any predictor, making it the strongest single correlate of growth.
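A minimal textbook version of the Solow capital-accumulation mechanism shows how a higher investment share raises output. It assumes Cobb-Douglas production, a constant labor force L, constant technology A, a savings (investment) rate s, and a depreciation rate δ, with k = K/L; these are standard textbook symbols, not variables from the dataset:

```latex
\begin{aligned}
Y_t &= A\,K_t^{\alpha} L^{1-\alpha}, \qquad I_t = s\,Y_t \\
K_{t+1} &= (1-\delta)\,K_t + I_t \\
\text{steady state: } s\,A\,(k^{*})^{\alpha} &= \delta\,k^{*}
\;\Rightarrow\; k^{*} = \left(\frac{sA}{\delta}\right)^{\frac{1}{1-\alpha}},
\qquad y^{*} = A\,(k^{*})^{\alpha}
\end{aligned}
```

A higher s raises both k* and y*. Strictly, in the Solow model a higher investment rate permanently raises the level of output per worker but raises growth only during the transition to the new steady state, which is worth keeping in mind when reading the investment coefficient as a growth effect.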

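In this sample, economies grow faster when they maintain low-to-moderate debt, keep inflation stable, and encourage investment. The model explains only about 6.5% of the variation in growth (R² = 0.065), however, so these factors matter but are far from the whole story.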

What drives the gap between national savings and investment in emerging markets?

# Create the savings-investment gap variable
population_economic <- population_economic %>%
  mutate(savings_investment_gap = Gross_national_savings - Total_investment,
         gap_ratio = Gross_national_savings / Total_investment)
# Fit the model to see what drives the gap
gap_model <- lm(savings_investment_gap ~ Population_Group + Continent + 
                  Volume_of_imports + Population,
                data = population_economic)

summary(gap_model)
## 
## Call:
## lm(formula = savings_investment_gap ~ Population_Group + Continent + 
##     Volume_of_imports + Population, data = population_economic)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -159.086   -3.619    0.360    4.503   92.612 
## 
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         -2.4570761  0.7127128  -3.447 0.000571 ***
## Population_GroupMedium (10–50M)     -1.2160714  0.7021831  -1.732 0.083368 .  
## Population_GroupSmall (<10M)        -4.8888594  0.7107321  -6.879 6.85e-12 ***
## Population_GroupVery large (>=100M)  0.8958927  1.0319715   0.868 0.385364    
## ContinentAmericas                    1.6045080  0.4300098   3.731 0.000193 ***
## ContinentAsia                        1.4423532  0.4414184   3.268 0.001093 ** 
## ContinentEurope                      0.9663614  0.6209608   1.556 0.119720    
## ContinentOceania                     5.8961573  1.7170279   3.434 0.000600 ***
## Volume_of_imports                   -0.0453035  0.0095335  -4.752 2.07e-06 ***
## Population                          -0.0005566  0.0039568  -0.141 0.888142    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.33 on 4646 degrees of freedom
## Multiple R-squared:  0.04334,    Adjusted R-squared:  0.04148 
## F-statistic: 23.38 on 9 and 4646 DF,  p-value: < 2.2e-16

Intercept (-2.46) (p < 0.05) For an African emerging economy in the large-population group (the reference categories), with import volume and population both at zero, investment exceeds savings by about 2.5 percentage points. This is statistically significant (p < 0.001).
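The fitted savings–investment gap model can be written out as follows. Coefficients are rounded from the output above; SI denotes Gross_national_savings minus Total_investment, M is Volume_of_imports, N is Population, and the D terms are shorthand introduced here for the population-group and continent dummies, with the large-population group and Africa as the omitted reference categories:

```latex
\widehat{SI} = -2.457 - 1.216\,D_{\text{med}} - 4.889\,D_{\text{small}} + 0.896\,D_{\text{vlarge}}
 + 1.605\,D_{\text{Am}} + 1.442\,D_{\text{As}} + 0.966\,D_{\text{Eu}} + 5.896\,D_{\text{Oc}}
 - 0.0453\,M - 0.00056\,N
```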

Population

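Small (<10M) (-4.89) (p < 0.05) The savings–investment gap of small economies is about 4.9 percentage points more negative than that of large economies, meaning they invest considerably more than they save and likely depend more heavily on foreign capital. Small economies generally struggle to generate enough domestic savings to fund investment. This is highly significant (p < 0.001).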
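Combining the intercept (-2.457) with the group coefficients (Small: -4.889, Asia: +1.442) gives the predicted gap for a few reference profiles. These are evaluated with import volume and population set to zero, purely to isolate the group effects, so they are illustrative rather than forecasts:

```latex
\begin{aligned}
\text{Large, Africa: } & -2.457 \\
\text{Small, Africa: } & -2.457 - 4.889 = -7.346 \\
\text{Small, Asia: } & -2.457 - 4.889 + 1.442 = -5.904
\end{aligned}
```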

Medium (10–50M) (-1.22) (p > 0.05) Medium-sized countries also tend to have a more negative gap than large countries, but less severely than small countries. The effect is significant only at the 10% level (p = 0.083).

Very Large (≥100M) (+0.90) (p > 0.05) Very large countries have roughly the same gap as large ones (p = 0.39), so size alone does not guarantee stronger savings, although they are far less dependent on foreign capital than small countries.

Americas (+1.60) (p < 0.05) Emerging economies in the Americas save more relative to investment than those in Africa, suggesting they rely less on external borrowing. This is statistically significant (p < 0.001).

Asia (+1.44) (p < 0.05) Asian emerging countries also save more relative to investment than their African counterparts, consistent with the Asian high-savings model. This is statistically significant (p = 0.001).

Europe (+0.97) (p > 0.05) Not meaningfully different from Africa (p = 0.12), possibly because of the region's transition economies.

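Oceania (+5.90) (p < 0.05) Emerging countries in Oceania have much higher savings relative to investment, possibly due to resource-export savings surpluses. This is statistically significant (p < 0.001), although the large standard error (1.72) means the estimate is imprecise.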
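Approximate 95% confidence intervals (normal approximation, critical value 1.96) show how much less precise the Oceania estimate is than, for example, the Asia estimate:

```latex
\begin{aligned}
\text{Oceania: } & 5.896 \pm 1.96 \times 1.717 = 5.896 \pm 3.365 \;\Rightarrow\; [2.53,\ 9.26] \\
\text{Asia: } & 1.442 \pm 1.96 \times 0.441 = 1.442 \pm 0.864 \;\Rightarrow\; [0.58,\ 2.31]
\end{aligned}
```

The Oceania interval excludes zero but is roughly four times as wide as Asia's, so the size of the Oceania effect should be read cautiously.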

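Africa is the reference continent, and every other continent has a positive coefficient, so African emerging economies show the most negative savings–investment gap in this model. This is consistent with chronic savings shortages and a higher dependence on foreign capital.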

Volume of imports (-0.045) (p < 0.05) Higher import volumes are associated with a more negative savings–investment gap, meaning countries that import more tend to invest more than they save. This is highly significant (p < 0.001).

Population (-0.0006) (p > 0.05) Once population groups are included, the continuous population count adds nothing (p = 0.89). What matters is whether a country is structurally small or large, not its exact headcount.

In emerging markets, the gap between national savings and investment is shaped mainly by population size (small economies save less relative to what they invest), regional savings behavior, and the degree of dependence on imports. Together, however, these variables explain only about 4% of the variation in the gap (R² = 0.043).

Regression Analysis Conclusion

Public Debt and GDP Growth

In the growth model, countries with low (<30% of GDP) and medium (30–60%) public debt grew significantly faster than countries with high debt (60–90%). Countries with very high debt (>90%) did not grow significantly differently from the high-debt group.

This is consistent with the Debt Overhang Theory, under which high debt leads governments to divert resources toward interest repayments instead of productive investment, lowering long-term growth. However, the insignificant difference between the very-high-debt and high-debt groups suggests that some heavily indebted countries may still sustain growth if supported by strong institutions or external financing.

Inflation and Growth

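Inflation showed a negative and statistically significant relationship with growth: each additional percentage point of inflation was associated with about 0.03 percentage points lower growth. Plausible mechanisms are that higher inflation erodes purchasing power, raises uncertainty, and makes investment decisions more cautious.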

This aligns with Classical and Monetarist economic theory, particularly Milton Friedman's argument that sustained inflation weakens economic efficiency and reduces long-term output.

Investment and Growth

Total investment had a strong positive association with GDP growth. Economies that invest more in capital formation, such as infrastructure and industrial capacity, showed faster output expansion.

This is consistent with the Solow Growth Model, which identifies capital accumulation as a primary driver of economic growth, especially in developing countries.

Fiscal Position and Inflation

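The fiscal-inflation model found no statistically significant difference in inflation between countries running deficits, surpluses, or balanced budgets: the deficit coefficient was in fact negative and the surplus coefficient positive, both with p > 0.2. Fiscal pressure showed up instead through public debt. Higher government gross debt was significantly associated with higher inflation (p < 0.001), while higher government revenue was associated with lower inflation (p < 0.05).

This debt-inflation link is consistent with the Fiscal Theory of the Price Level, under which the price level adjusts when expected future primary surpluses are insufficient to back the outstanding stock of government debt, and with the related concern that governments may finance spending through debt monetization, increasing the money supply.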
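The government debt valuation equation at the core of the Fiscal Theory of the Price Level can be written as below. This is a simplified textbook form, assuming one-period nominal debt B_{t-1} maturing at t, price level P_t, real primary surplus s_t, and a constant real discount factor β; these symbols are introduced here for illustration and do not correspond to dataset variables:

```latex
\frac{B_{t-1}}{P_t} = \mathbb{E}_t \sum_{j=0}^{\infty} \beta^{j}\, s_{t+j}
```

If expected surpluses fall while the nominal debt stock is fixed, the right-hand side shrinks and the equation can only hold with a higher P_t, which is the channel through which higher debt can translate into higher inflation.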

Savings–Investment Gap

The savings–investment gap was significantly associated with two factors: small populations, which were linked to larger savings shortfalls, and higher import volumes, which were linked to a more negative gap (investment exceeding savings).

Countries in Oceania, Asia, and the Americas tended to have higher savings relative to investment than African countries.

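These patterns are consistent with the Open-Economy IS–LM Model and its underlying national accounting identity, under which national savings minus investment equals the current account balance: countries that import more than they export run investment surpluses relative to domestic savings and must finance them with foreign capital inflows.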
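The accounting identity behind this interpretation can be derived in three lines. This is a standard textbook sketch, assuming S is national saving (private plus government), NFI is net factor income from abroad, and net transfers are ignored; these symbols are introduced here and are not variables in the dataset:

```latex
\begin{aligned}
Y &= C + I + G + (X - M) \\
S &= (Y + NFI) - C - G \\
S - I &= (X - M) + NFI = CA
\end{aligned}
```

Reading the last line: when imports exceed exports and NFI is small, CA < 0 and therefore I > S. Note that the regression uses import volume rather than net exports, so its link to this identity is indirect.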

Overall, the results show that managing an economy in an emerging country is not simple or one-size-fits-all. Economic performance depends on the balance between government spending, debt levels, inflation control, investment decisions, and the population and trade structure of a country. Different regions and country sizes respond differently to these factors, which means policies have to be adapted to each country’s unique situation rather than copied from others.

This project does more than run statistical models; it connects the numbers to real economic meaning. By using data to understand how growth, inflation, savings, and investment interact, the study provides practical insights that can help guide better economic policy. In a world where economies are closely connected and constantly changing, this kind of data-driven analysis is essential for making informed decisions that support stable growth and development.