Load required packages
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
Load the World Bank Dataset
# Read the CSV, treating the World Bank's '..' missing-value marker as NA (applies to every column)
world_bank <- read_csv("C:/Users/SP KHALID/Downloads/WDI- World Bank Dataset.csv", na = c(".."))
## Rows: 1675 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Time Code, Country Name, Country Code, Region, Income Group
## dbl (14): Time, GDP (constant 2015 US$), GDP growth (annual %), GDP (current...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
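A quick way to see why the `na` argument matters is to read a tiny inline CSV with and without it. This is a minimal sketch: the three rows below are made up to mimic the `..` placeholder used in the WDI export, and the literal text is passed to `read_csv()` wrapped in `I()`.

```r
library(readr)

# Hypothetical three-row CSV using ".." for a missing value, mimicking the WDI export
csv_text <- "country,gdp_growth\nA,1.5\nB,..\nC,2.0\n"

# Default na = c("", "NA"): ".." is not recognised, so the column is read as text
raw <- read_csv(I(csv_text), show_col_types = FALSE)

# With na = "..", the placeholder becomes NA and the column is parsed as numeric
clean <- read_csv(I(csv_text), na = "..", show_col_types = FALSE)

class(raw$gdp_growth)    # "character"
class(clean$gdp_growth)  # "numeric"
```

Without `na = ".."`, every indicator column containing a missing value would arrive as character and would have to be converted before modelling.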
world_bank
## # A tibble: 1,675 × 19
## Time `Time Code` `Country Name` `Country Code` Region `Income Group`
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 2000 YR2000 Brazil BRA Latin America… Upper middle …
## 2 2000 YR2000 China CHN East Asia & P… Upper middle …
## 3 2000 YR2000 France FRA Europe & Cent… High income
## 4 2000 YR2000 Germany DEU Europe & Cent… High income
## 5 2000 YR2000 India IND South Asia Lower middle …
## 6 2000 YR2000 Indonesia IDN East Asia & P… Upper middle …
## 7 2000 YR2000 Italy ITA Europe & Cent… High income
## 8 2000 YR2000 Japan JPN East Asia & P… High income
## 9 2000 YR2000 Korea, Rep. KOR East Asia & P… High income
## 10 2000 YR2000 Mexico MEX Latin America… Upper middle …
## # ℹ 1,665 more rows
## # ℹ 13 more variables: `GDP (constant 2015 US$)` <dbl>,
## # `GDP growth (annual %)` <dbl>, `GDP (current US$)` <dbl>,
## # `Unemployment, total (% of total labor force)` <dbl>,
## # `Inflation, consumer prices (annual %)` <dbl>, `Labor force, total` <dbl>,
## # `Population, total` <dbl>,
## # `Exports of goods and services (% of GDP)` <dbl>, …
dim(world_bank)
## [1] 1675 19
# Check column data types
glimpse(world_bank)
## Rows: 1,675
## Columns: 19
## $ Time <dbl> 2000, 20…
## $ `Time Code` <chr> "YR2000"…
## $ `Country Name` <chr> "Brazil"…
## $ `Country Code` <chr> "BRA", "…
## $ Region <chr> "Latin A…
## $ `Income Group` <chr> "Upper m…
## $ `GDP (constant 2015 US$)` <dbl> 1.18642e…
## $ `GDP growth (annual %)` <dbl> 4.387949…
## $ `GDP (current US$)` <dbl> 6.554482…
## $ `Unemployment, total (% of total labor force)` <dbl> NA, 3.70…
## $ `Inflation, consumer prices (annual %)` <dbl> 7.044141…
## $ `Labor force, total` <dbl> 80295093…
## $ `Population, total` <dbl> 17401828…
## $ `Exports of goods and services (% of GDP)` <dbl> 10.18805…
## $ `Imports of goods and services (% of GDP)` <dbl> 12.45171…
## $ `General government final consumption expenditure (% of GDP)` <dbl> 18.76784…
## $ `Foreign direct investment, net inflows (% of GDP)` <dbl> 5.033917…
## $ `Gross savings (% of GDP)` <dbl> 13.99170…
## $ `Current account balance (% of GDP)` <dbl> -4.04774…
# Convert Time column to integer
world_bank$Time <- as.integer(world_bank$Time)
# Clean column names
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
df <- world_bank |> clean_names()
colnames(df)
## [1] "time"
## [2] "time_code"
## [3] "country_name"
## [4] "country_code"
## [5] "region"
## [6] "income_group"
## [7] "gdp_constant_2015_us"
## [8] "gdp_growth_annual_percent"
## [9] "gdp_current_us"
## [10] "unemployment_total_percent_of_total_labor_force"
## [11] "inflation_consumer_prices_annual_percent"
## [12] "labor_force_total"
## [13] "population_total"
## [14] "exports_of_goods_and_services_percent_of_gdp"
## [15] "imports_of_goods_and_services_percent_of_gdp"
## [16] "general_government_final_consumption_expenditure_percent_of_gdp"
## [17] "foreign_direct_investment_net_inflows_percent_of_gdp"
## [18] "gross_savings_percent_of_gdp"
## [19] "current_account_balance_percent_of_gdp"
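The mapping from the original headers to these names can be reproduced with `janitor::make_clean_names()`, which applies the same renaming rules to a plain character vector. A minimal sketch using four headers taken from the `glimpse()` output; exact results could differ slightly across janitor versions.

```r
library(janitor)

# A few of the original WDI column headers, copied from the glimpse() output
original <- c(
  "Time Code",
  "GDP (constant 2015 US$)",
  "GDP growth (annual %)",
  "Unemployment, total (% of total labor force)"
)

# Same rules clean_names() uses on a data frame: lower snake_case,
# "%" spelled out as "percent", other punctuation dropped
cleaned <- make_clean_names(original)
cleaned
```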
inflation_consumer_prices_annual_percent (continuous)
Inflation is included as an indicator of macroeconomic stability. High or volatile inflation can reduce purchasing power, create uncertainty, and discourage investment, potentially slowing economic growth.
gross_savings_percent_of_gdp (continuous)
Gross savings is included because it reflects the amount of resources available for investment in an economy. Higher savings can finance capital formation and infrastructure, which are key drivers of economic growth.
exports_of_goods_and_services_percent_of_gdp (continuous)
Exports are included to capture a country’s level of trade openness. Economies that are more integrated into global markets may experience higher growth due to increased demand, specialization, and efficiency gains.
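income_group (categorical)
Income group is included as a categorical moderator to test whether the relationship between savings and growth differs across the World Bank income classifications (high, upper middle, lower middle, and low income).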
# Keep 2024 rows, select the analysis variables, and drop incomplete rows
wdi_clean <- df |>
  filter(time == 2024) |>
  select(
    country_name,
    income_group,
    gdp_growth_annual_percent,
    exports_of_goods_and_services_percent_of_gdp,
    inflation_consumer_prices_annual_percent,
    population_total,
    gross_savings_percent_of_gdp
  ) |>
  drop_na()
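Because `drop_na()` is called with no arguments, it removes a row if any selected column is missing, including `population_total`, which is not used in either model. A minimal sketch on an invented tibble shows the difference between dropping on all columns and dropping only on the model variables; the values are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Invented example: country C has complete model data but a missing population
toy <- tibble(
  country_name = c("A", "B", "C", "D"),
  gdp_growth_annual_percent = c(2.1, 3.4, 1.8, NA),
  gross_savings_percent_of_gdp = c(20, 25, 18, 22),
  population_total = c(1e6, 2e6, NA, 3e6)
)

# No arguments: an NA in any column removes the row (only A and B survive)
all_cols <- toy |> drop_na()

# Restricting the check to the regression variables keeps country C
model_vars <- c("gdp_growth_annual_percent", "gross_savings_percent_of_gdp")
model_only <- toy |> drop_na(all_of(model_vars))

nrow(all_cols)    # 2
nrow(model_only)  # 3
```

In the report, comparing `nrow()` for the two versions of `wdi_clean` would show whether `population_total` costs any observations; the regression output implies 57 complete rows (53 residual degrees of freedom plus 4 estimated coefficients).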
# Multiple regression of 2024 GDP growth on savings, inflation, and exports
lm_model2 <- lm(
  gdp_growth_annual_percent ~
    gross_savings_percent_of_gdp +
    inflation_consumer_prices_annual_percent +
    exports_of_goods_and_services_percent_of_gdp,
  data = wdi_clean
)
summary(lm_model2)
##
## Call:
## lm(formula = gdp_growth_annual_percent ~ gross_savings_percent_of_gdp +
## inflation_consumer_prices_annual_percent + exports_of_goods_and_services_percent_of_gdp,
## data = wdi_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.6469 -1.5431 0.1496 1.3035 6.0281
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 1.009729 0.911967 1.107
## gross_savings_percent_of_gdp 0.098257 0.036285 2.708
## inflation_consumer_prices_annual_percent -0.016359 0.010493 -1.559
## exports_of_goods_and_services_percent_of_gdp -0.006132 0.012069 -0.508
## Pr(>|t|)
## (Intercept) 0.27321
## gross_savings_percent_of_gdp 0.00909 **
## inflation_consumer_prices_annual_percent 0.12495
## exports_of_goods_and_services_percent_of_gdp 0.61351
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.294 on 53 degrees of freedom
## Multiple R-squared: 0.1718, Adjusted R-squared: 0.1249
## F-statistic: 3.664 on 3 and 53 DF, p-value: 0.01788
Interpretation:
The regression results show that gross savings is positively and statistically significantly associated with GDP growth (estimate = 0.098, p = 0.009): holding inflation and exports constant, a savings rate one percentage point of GDP higher is associated with growth about 0.1 percentage points higher. Inflation (p = 0.125) and exports (p = 0.614) have negative coefficients, but neither is statistically significant, so the model provides little evidence of their effect. The overall F-test is significant (F = 3.66 on 3 and 53 degrees of freedom, p = 0.018), so the null hypothesis that all three slopes are zero can be rejected. However, the R² of 0.172 (adjusted R² 0.125) indicates that the model explains only about 17% of the cross-country variation in growth, implying that other important factors are missing. Because the data are a single 2024 cross-section, these associations should not be read as causal effects.
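The savings coefficient can be put on a more interpretable scale with a little arithmetic. The sketch below hard-codes the estimates printed above (it does not refit the model) and assumes the usual t-based 95% interval that `confint()` uses for `lm` fits; in the report, `confint(lm_model2)` would return the same interval directly.

```r
# Estimates copied from summary(lm_model2) above
b_savings  <- 0.098257   # coefficient on gross_savings_percent_of_gdp
se_savings <- 0.036285   # its standard error
df_resid   <- 53         # residual degrees of freedom

# Predicted difference in GDP growth (percentage points) for a savings rate
# 10 points of GDP higher, holding inflation and exports fixed
effect_10pp <- 10 * b_savings

# 95% confidence interval: estimate +/- t(0.975, df) * SE
t_crit <- qt(0.975, df = df_resid)
ci_95  <- b_savings + c(-1, 1) * t_crit * se_savings

# Overall F-test p-value recovered from the reported F statistic
p_F <- pf(3.664, df1 = 3, df2 = df_resid, lower.tail = FALSE)

round(effect_10pp, 2)  # 0.98
round(ci_95, 3)
round(p_F, 4)
```

The interval excludes zero, consistent with p = 0.009, but its width (roughly 0.03 to 0.17) shows the size of the association is estimated imprecisely from 57 countries.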
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:purrr':
##
## some
## The following object is masked from 'package:dplyr':
##
## recode
vif(lm_model2)
## gross_savings_percent_of_gdp
## 1.104725
## inflation_consumer_prices_annual_percent
## 1.037578
## exports_of_goods_and_services_percent_of_gdp
## 1.113485
All VIF values are close to 1 (between 1.04 and 1.11), indicating very low multicollinearity: the predictors are only weakly correlated with one another, so collinearity is not inflating the standard errors of their coefficients.
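The VIF for predictor j is 1 / (1 - R_j²), where R_j² comes from regressing that predictor on the other predictors. The sketch below checks this on simulated predictors (not the WDI data) against the equivalent shortcut of inverting the correlation matrix, then reads the reported VIFs backwards: a VIF of about 1.11 means only about 10% of a predictor's variance is shared with the other two.

```r
set.seed(1)

# Simulated predictors (not the WDI data); x3 is built to be mildly related to x1
n  <- 57
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 0.3 * x1 + rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2) from an auxiliary regression of x_j on the others
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  aux    <- lm(reformulate(others, response = v), data = X)
  1 / (1 - summary(aux)$r.squared)
})

# Equivalent shortcut: the diagonal of the inverse correlation matrix
vif_corr <- diag(solve(cor(X)))

# Reported VIFs for lm_model2, converted back to the implied R_j^2
reported_vif <- c(savings = 1.104725, inflation = 1.037578, exports = 1.113485)
r2_implied   <- 1 - 1 / reported_vif
round(r2_implied, 3)
```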
An interaction between gross savings and income group is included to examine whether the effect of savings on growth differs across income groups. Economic theory suggests that the impact of savings may be stronger in developing economies than in developed ones.
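Written out, the interaction model below has the form shown here. This assumes R's default treatment coding, under which the alphabetically first level, High income, is the omitted reference category; S_i is gross savings (% of GDP) for country i, and D_gi is a 0/1 indicator for income group g (Low, Lower middle, or Upper middle income). The savings slope for group g is then beta_1 + delta_g, while beta_1 alone is the slope for high-income countries.

```latex
\text{growth}_i = \beta_0 + \beta_1 S_i
  + \sum_{g} \gamma_g D_{gi}
  + \sum_{g} \delta_g \, S_i D_{gi} + \varepsilon_i,
\qquad
\frac{\partial\, \text{growth}_i}{\partial S_i} =
\begin{cases}
  \beta_1 & \text{if } i \text{ is high income},\\
  \beta_1 + \delta_g & \text{if } i \text{ is in group } g.
\end{cases}
```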
lm_model3 <- lm(
gdp_growth_annual_percent ~
gross_savings_percent_of_gdp * income_group,
data = wdi_clean
)
summary(lm_model3)
##
## Call:
## lm(formula = gdp_growth_annual_percent ~ gross_savings_percent_of_gdp *
## income_group, data = wdi_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.5530 -0.9471 -0.0180 1.0495 3.9433
##
## Coefficients:
## Estimate
## (Intercept) 1.583898
## gross_savings_percent_of_gdp 0.009002
## income_groupLow income 0.648024
## income_groupLower middle income -1.821792
## income_groupUpper middle income -2.150628
## gross_savings_percent_of_gdp:income_groupLow income 0.224818
## gross_savings_percent_of_gdp:income_groupLower middle income 0.162431
## gross_savings_percent_of_gdp:income_groupUpper middle income 0.144685
## Std. Error t value
## (Intercept) 1.478651 1.071
## gross_savings_percent_of_gdp 0.057567 0.156
## income_groupLow income 2.408601 0.269
## income_groupLower middle income 1.885745 -0.966
## income_groupUpper middle income 1.908860 -1.127
## gross_savings_percent_of_gdp:income_groupLow income 0.114905 1.957
## gross_savings_percent_of_gdp:income_groupLower middle income 0.072914 2.228
## gross_savings_percent_of_gdp:income_groupUpper middle income 0.076170 1.899
## Pr(>|t|)
## (Intercept) 0.2893
## gross_savings_percent_of_gdp 0.8764
## income_groupLow income 0.7890
## income_groupLower middle income 0.3387
## income_groupUpper middle income 0.2654
## gross_savings_percent_of_gdp:income_groupLow income 0.0561 .
## gross_savings_percent_of_gdp:income_groupLower middle income 0.0305 *
## gross_savings_percent_of_gdp:income_groupUpper middle income 0.0634 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.789 on 49 degrees of freedom
## Multiple R-squared: 0.5343, Adjusted R-squared: 0.4678
## F-statistic: 8.031 on 7 and 49 DF, p-value: 1.783e-06
Interpretation:
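The interaction model indicates that the association between gross savings and GDP growth varies across income groups. For the reference group (high-income countries) the savings slope is essentially zero (0.009, p = 0.876). Adding the interaction terms gives estimated slopes of about 0.23 for low-income, 0.17 for lower-middle-income, and 0.15 for upper-middle-income countries. Only the lower-middle-income difference is significant at the 5% level (p = 0.031); the low-income (p = 0.056) and upper-middle-income (p = 0.063) differences are significant only at the 10% level. The R² rises to 0.534 (adjusted R² 0.468, versus 0.125 for the previous model), a substantial improvement in fit, although the two models use different predictors and are not nested, and eight coefficients are estimated from only 57 countries.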
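Because each interaction coefficient is a difference from the high-income slope, the slope for a given group, with its own standard error, is easiest to obtain by refitting with that group as the reference level. A minimal sketch on simulated data (not the WDI sample); in the report, `income_group` is a character column, so it would first need converting with `factor()` before `relevel()` can be applied.

```r
set.seed(42)

# Simulated data (not the WDI sample): the savings slope is built to differ by group
n <- 120
income_group <- factor(sample(c("High income", "Low income", "Lower middle income"),
                              n, replace = TRUE))
savings <- runif(n, 5, 40)
true_slope <- c("High income" = 0, "Low income" = 0.2, "Lower middle income" = 0.15)
growth <- 1 + unname(true_slope[as.character(income_group)]) * savings + rnorm(n)
sim <- data.frame(growth, savings, income_group)

# Default coding: "High income" is the reference, so the group slope is a sum
fit_ref_high <- lm(growth ~ savings * income_group, data = sim)
b <- coef(fit_ref_high)
slope_lmi <- b["savings"] + b["savings:income_groupLower middle income"]

# Refit with "Lower middle income" as the reference: the main savings
# coefficient is now that group's slope, with its own SE and p-value
sim$income_group_lmi <- relevel(sim$income_group, ref = "Lower middle income")
fit_ref_lmi <- lm(growth ~ savings * income_group_lmi, data = sim)

coef(summary(fit_ref_lmi))["savings", ]
```

Both fits are the same model under a different parameterisation, so the fitted values are identical; only which slope is reported directly changes.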
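# Standard lm diagnostic plots for the multiple regression model, in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(lm_model2)
par(mfrow = c(1, 1))  # restore the default single-panel layout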
Interpretation:
1. Residuals vs Fitted
The residuals are scattered fairly randomly around zero, suggesting that the linearity assumption is reasonably satisfied. There is no strong pattern, although slight clustering may indicate minor model misspecification.
2. Normal Q-Q
Most points lie close to the reference line, indicating that the residuals are approximately normally distributed. Some deviation in the tails suggests mild, but not severe, non-normality.
3. Scale-Location
The spread of the residuals appears roughly constant, although there is a slight downward trend. This suggests mild heteroscedasticity, but the problem does not appear severe.
4. Residuals vs Leverage
Most observations have low leverage; a few have moderate leverage, but none lies beyond the Cook's distance contour lines. There is therefore no evidence of a highly influential observation distorting the model.
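The visual reading of the diagnostic plots can be complemented with formal tests. The sketch below runs on a simulated regression standing in for `lm_model2`; to apply it in the report, replace `fit` with `lm_model2` and the predictor names with the WDI variables. The Breusch-Pagan statistic is computed by hand here in Koenker's studentized form (`lmtest::bptest()` provides a packaged version), and the 4/n Cook's distance cutoff is a common rule of thumb, stricter than the 0.5 and 1 contours drawn by `plot.lm()`.

```r
set.seed(7)

# Simulated regression standing in for lm_model2 (57 rows, three predictors)
n   <- 57
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 0.1 * sim$x1 - 0.02 * sim$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3, data = sim)

# Normality: Shapiro-Wilk test on the residuals (H0: residuals are normal)
sw <- shapiro.test(residuals(fit))

# Heteroscedasticity: Koenker's studentized Breusch-Pagan statistic,
# n * R^2 from regressing squared residuals on the predictors, ~ chi-square(3)
sim$e2 <- residuals(fit)^2
aux    <- lm(e2 ~ x1 + x2 + x3, data = sim)
bp     <- n * summary(aux)$r.squared
bp_p   <- pchisq(bp, df = 3, lower.tail = FALSE)

# Influence: Cook's distance, flagged against the 4/n rule of thumb
cd      <- cooks.distance(fit)
flagged <- which(cd > 4 / n)

c(shapiro_p = sw$p.value, bp_p = bp_p)
flagged
```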
The multiple regression model improves upon the simple model by adding further macroeconomic variables, but gross savings is the only statistically significant predictor. Although inflation and exports are theoretically relevant, they show no significant effect in this 2024 cross-section, suggesting that their influence may be context-dependent or captured indirectly by other factors. The diagnostic plots for the multiple regression model indicate that its assumptions are largely satisfied, with only minor concerns about heteroscedasticity and non-normality. The interaction model further shows that the relationship between savings and growth differs across income groups, and it raises adjusted R² from 0.125 to 0.468. Overall, the analysis suggests that economic growth is shaped by multiple factors, and additional variables such as the labor force or investment may be needed for a more comprehensive model.