Load required packages
library(tidyverse)  # attaches readr, dplyr, tidyr, and ggplot2
library(corrplot)
library(janitor)
library(effsize)
library(scales)
library(broom)
Load the World Bank Dataset
# The World Bank export marks missing values with '..'; read them as NA.
world_bank <- read_csv("C:/Users/SP KHALID/Downloads/WDI- World Bank Dataset.csv", na = c(".."))
world_bank
## # A tibble: 1,675 × 19
## Time `Time Code` `Country Name` `Country Code` Region `Income Group`
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 2000 YR2000 Brazil BRA Latin America… Upper middle …
## 2 2000 YR2000 China CHN East Asia & P… Upper middle …
## 3 2000 YR2000 France FRA Europe & Cent… High income
## 4 2000 YR2000 Germany DEU Europe & Cent… High income
## 5 2000 YR2000 India IND South Asia Lower middle …
## 6 2000 YR2000 Indonesia IDN East Asia & P… Upper middle …
## 7 2000 YR2000 Italy ITA Europe & Cent… High income
## 8 2000 YR2000 Japan JPN East Asia & P… High income
## 9 2000 YR2000 Korea, Rep. KOR East Asia & P… High income
## 10 2000 YR2000 Mexico MEX Latin America… Upper middle …
## # ℹ 1,665 more rows
## # ℹ 13 more variables: `GDP (constant 2015 US$)` <dbl>,
## # `GDP growth (annual %)` <dbl>, `GDP (current US$)` <dbl>,
## # `Unemployment, total (% of total labor force)` <dbl>,
## # `Inflation, consumer prices (annual %)` <dbl>, `Labor force, total` <dbl>,
## # `Population, total` <dbl>,
## # `Exports of goods and services (% of GDP)` <dbl>, …
dim(world_bank)
## [1] 1675 19
# Check column data types
glimpse(world_bank)
## Rows: 1,675
## Columns: 19
## $ Time <dbl> 2000, 20…
## $ `Time Code` <chr> "YR2000"…
## $ `Country Name` <chr> "Brazil"…
## $ `Country Code` <chr> "BRA", "…
## $ Region <chr> "Latin A…
## $ `Income Group` <chr> "Upper m…
## $ `GDP (constant 2015 US$)` <dbl> 1.18642e…
## $ `GDP growth (annual %)` <dbl> 4.387949…
## $ `GDP (current US$)` <dbl> 6.554482…
## $ `Unemployment, total (% of total labor force)` <dbl> NA, 3.70…
## $ `Inflation, consumer prices (annual %)` <dbl> 7.044141…
## $ `Labor force, total` <dbl> 80295093…
## $ `Population, total` <dbl> 17401828…
## $ `Exports of goods and services (% of GDP)` <dbl> 10.18805…
## $ `Imports of goods and services (% of GDP)` <dbl> 12.45171…
## $ `General government final consumption expenditure (% of GDP)` <dbl> 18.76784…
## $ `Foreign direct investment, net inflows (% of GDP)` <dbl> 5.033917…
## $ `Gross savings (% of GDP)` <dbl> 13.99170…
## $ `Current account balance (% of GDP)` <dbl> -4.04774…
# Convert Time column to integer
world_bank$Time <- as.integer(world_bank$Time)
# Clean column names
df <- world_bank |> clean_names()
glimpse(df)
## Rows: 1,675
## Columns: 19
## $ time <int> 2000, …
## $ time_code <chr> "YR200…
## $ country_name <chr> "Brazi…
## $ country_code <chr> "BRA",…
## $ region <chr> "Latin…
## $ income_group <chr> "Upper…
## $ gdp_constant_2015_us <dbl> 1.1864…
## $ gdp_growth_annual_percent <dbl> 4.3879…
## $ gdp_current_us <dbl> 6.5544…
## $ unemployment_total_percent_of_total_labor_force <dbl> NA, 3.…
## $ inflation_consumer_prices_annual_percent <dbl> 7.0441…
## $ labor_force_total <dbl> 802950…
## $ population_total <dbl> 174018…
## $ exports_of_goods_and_services_percent_of_gdp <dbl> 10.188…
## $ imports_of_goods_and_services_percent_of_gdp <dbl> 12.451…
## $ general_government_final_consumption_expenditure_percent_of_gdp <dbl> 18.767…
## $ foreign_direct_investment_net_inflows_percent_of_gdp <dbl> 5.0339…
## $ gross_savings_percent_of_gdp <dbl> 13.991…
## $ current_account_balance_percent_of_gdp <dbl> -4.047…
This analysis is designed for international economic policymakers and global financial organizations (e.g., World Bank analysts or economic advisors) who are interested in understanding how economic growth patterns differ across countries, income groups and regions.
The goal is to support data-driven decisions related to economic development strategies and investment prioritization.
The primary objective of this project is to analyze the relationship between economic growth, GDP size, and trade patterns across countries.
Specifically, this project aims to answer:
How does GDP growth vary across income groups?
Do wealthier countries grow differently than developing economies?
How do trade indicators like exports relate to economic performance?
How do unemployment and macroeconomic indicators correlate with income level?
The ultimate goal is to derive insights that can inform economic policy and development strategies.
The dataset is sourced from the World Bank’s World Development Indicators and includes country-level economic metrics over time.
Key variables used in this analysis include:
GDP (constant 2015 US$)
GDP growth (annual %)
Population, total
Exports of goods and services (% of GDP)
Unemployment, total (% of total labor force)
Inflation, consumer prices (annual %)
Gross savings (% of GDP)
The dataset contains 1,675 country-year observations spanning multiple countries, years, and income groups, allowing for cross-sectional and time-series analysis of global economic trends.
Countries under different Income Groups
df |>
filter(time == 2024) |>
arrange(income_group, country_name) |>
select(income_group, country_name)
## # A tibble: 67 × 2
## income_group country_name
## <chr> <chr>
## 1 High income Australia
## 2 High income Bulgaria
## 3 High income Canada
## 4 High income Chile
## 5 High income Costa Rica
## 6 High income Finland
## 7 High income France
## 8 High income Germany
## 9 High income Israel
## 10 High income Italy
## # ℹ 57 more rows
This scatterplot compares countries’ total GDP (constant 2015 US$) with their average annual GDP growth, colored by income group and sized by population.
# Prepare scatter-plot data: per-country means across all years
scatter_data <- df |>
group_by(country_name, income_group) |>
summarise(
Avg_GDP_Growth = mean(gdp_growth_annual_percent, na.rm = TRUE),
GDP_Constant_2015 = mean(gdp_constant_2015_us, na.rm = TRUE),
Population = mean(population_total, na.rm = TRUE),
.groups = "drop"
)
ggplot(scatter_data, aes(x = Avg_GDP_Growth, y = GDP_Constant_2015, color = income_group, size = Population)) +
geom_point(alpha = 0.6) +
labs(
title = "GDP Level vs Average GDP Growth",
x = "Average GDP Growth (Annual %)",
y = "GDP (Constant 2015 US$)",
size = "Population",
color = "Income Group"
) +
theme_minimal()
Insight:
High-income countries (e.g., United States) dominate in total GDP but tend to exhibit lower growth rates, while lower-income countries often show higher growth. This suggests a potential convergence effect, where developing economies grow faster than developed ones.
This line chart shows how GDP growth has evolved over time across different income groups.
# Prepare data: average GDP growth per year per income group
line_data <- df |>
group_by(time, income_group) |>
summarise(
avg_gdp_growth = mean(gdp_growth_annual_percent, na.rm = TRUE),
.groups = "drop"
)
ggplot(line_data, aes(x = time, y = avg_gdp_growth, color = income_group)) +
geom_line(linewidth = 1) +
labs(
title = "GDP Growth Trends Over Time by Income Group",
x = "Year",
y = "Average GDP Growth (%)",
color = "Income Group"
) +
theme_minimal()
Insight:
The line chart displays the trend in average annual GDP growth, where high-income countries consistently exhibit the lowest growth rates, while low-income countries show the highest. Despite these differences, the overall downward trend across all income groups suggests global economic growth has slowed over time, particularly among more developed economies. A noticeable dip around 2020 across all groups indicates a global economic shock affecting all economies.
This bar chart compares the average exports (% of GDP) across income groups.
bar_data <- df |>
ungroup() |>
group_by(income_group) |>
summarise(
avg_exports = mean(exports_of_goods_and_services_percent_of_gdp, na.rm = TRUE),
.groups = "drop"
)
ggplot(bar_data, aes(x = reorder(income_group, avg_exports),
y = avg_exports,
fill = income_group)) +
geom_col() +
geom_text(aes(label = round(avg_exports, 2)),
vjust = -0.5,
size = 3.5) +
labs(
title = "Average Exports (% of GDP) by Income Group",
x = "Income Group",
y = "Average Exports (% of GDP)",
fill = "Income Group"
) +
theme_minimal() +
theme(legend.position = "none")
Insight:
High-income countries have the highest export share, indicating strong global trade integration. Upper middle-income countries also show significant export activity, while lower middle-income countries lag behind. Low-income countries have the lowest export percentages, suggesting limited participation in international trade. This highlights a clear gap in trade capacity across income groups.
ggplot(df, aes(x = income_group,
y = unemployment_total_percent_of_total_labor_force,
fill = income_group)) +
geom_boxplot(alpha = 0.7) +
labs(
title = "Unemployment Distribution by Income Group",
x = "Income Group",
y = "Unemployment (%)"
) +
theme_minimal() +
theme(legend.position = "none")
Insight:
High-income countries are mostly clustered around about 6% unemployment, but they also have some high outliers above 25%, showing rare but severe spikes in joblessness despite overall stability.
Low-income countries have the lowest median unemployment (around 3.5%), which is likely because many people work in informal or subsistence jobs that are not counted in official unemployment rates.
Upper-middle-income countries show the widest range (about 3% to 11%), meaning unemployment varies a lot across them, likely due to ongoing economic and structural changes.
ggplot(df, aes(x = income_group,
y = inflation_consumer_prices_annual_percent,
fill = income_group)) +
geom_boxplot(alpha = 0.7) +
coord_cartesian(ylim = c(0, 30)) + # adjust range as needed
labs(
title = "Inflation Distribution by Income Group",
x = "Income Group",
y = "Inflation (%)"
) +
theme_minimal() +
theme(legend.position = "none")
Insights:
High-income countries have the lowest and most stable inflation, with a median around 2–3%, reflecting strong central banks and stable monetary policies.
Low-income and lower-middle-income countries both have higher median inflation (around 6%) and large upper outliers reaching up to 30%+, showing they are more exposed to shocks, currency fluctuations, and weaker policy control.
Upper-middle-income countries fall in between, with a moderate median inflation (~5%) and a more contained spread, suggesting improving price stability but still not as steady as high-income countries.
The World Bank WDI dataset is assumed to be reliable and consistently collected across countries, though minor reporting differences may exist.
Missing values are assumed to be random or limited enough that they do not substantially bias the overall analysis, although some distortion is still possible.
World Bank income group classifications are assumed to be a reasonable way to represent a country’s level of economic development, even though countries within the same group can still be quite different.
Averaging values across years is assumed to provide a meaningful representation of long-term structural trends, while smoothing short-term fluctuations.
The analysis includes only countries with sufficiently complete data for key numerical variables; this selection may introduce sample bias toward better-documented or higher-income countries.
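The missing-data assumption above can be checked rather than taken on faith. The following is a minimal sketch in base R, using a small made-up data frame (`toy` and its values are illustrative stand-ins, not the WDI data), that reports the share of missing values per column and per income group:

```r
# Toy stand-in for `df`; all values are invented for illustration only
toy <- data.frame(
  income_group = c("High income", "High income", "Low income", "Low income"),
  gdp_growth_annual_percent = c(2.1, NA, 5.3, 4.8),
  unemployment_total_percent_of_total_labor_force = c(6.0, 5.5, NA, NA)
)

# Share of missing values in each numeric column
num_cols <- names(toy)[sapply(toy, is.numeric)]
missing_share <- colMeans(is.na(toy[num_cols]))
print(round(missing_share, 2))

# Share of missing unemployment values within each income group
missing_by_group <- tapply(
  is.na(toy$unemployment_total_percent_of_total_labor_force),
  toy$income_group,
  mean
)
print(missing_by_group)
```

If missingness turned out to be concentrated in one income group, as it is in this toy example, the "random or limited" assumption would deserve a closer look.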
Main Variable: gdp_growth_annual_percent (continuous)
Grouping Variable: income_group
Let:
Group A = High income countries
Group B = Middle & Low income countries
df$income_binary <- ifelse(df$income_group == "High income",
"High income",
"Non High income")
df$income_binary <- as.factor(df$income_binary)
table(df$income_binary)
##
## High income Non High income
## 600 1075
\[H_0: \mu_{\text{High Income}} = \mu_{\text{Non-High Income}}\] \[H_1: \mu_{\text{High Income}} \neq \mu_{\text{Non-High Income}}\]
Two Sample t-test
α = 0.05
Reason: Standard in economics. A false positive (claiming a difference when none exists) is moderately costly but acceptable at 5%.
Power (1 − β) = 0.8
Reason: We want an 80% probability of detecting a meaningful difference.
Minimum Effect Size (Cohen’s d) = 0.3
Reason: A small-to-moderate difference in GDP growth (around 1 percentage point) is economically meaningful at the macro level. Even small growth differences compound over time.
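The three design choices above (α, power, minimum effect size) imply a required sample size. As a rough sketch, base R's `power.t.test()` can compute it; note that this function assumes equal group sizes and a common standard deviation, and that treating each country-year row as an independent observation is a simplifying assumption:

```r
# Sample size per group needed to detect Cohen's d = 0.3
# at alpha = 0.05 (two-sided) with 80% power.
# Setting sd = 1 makes `delta` directly interpretable as Cohen's d.
pw <- power.t.test(delta = 0.3, sd = 1, sig.level = 0.05, power = 0.80,
                   type = "two.sample", alternative = "two.sided")
n_required <- ceiling(pw$n)
print(n_required)

# Group sizes as reported by table(df$income_binary)
n_available <- c(high = 600, non_high = 1075)
print(all(n_available >= n_required))
```

Both groups comfortably exceed the per-group requirement, before accounting for rows dropped because of missing growth values.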
t_test1 <- t.test(gdp_growth_annual_percent ~ income_binary,
data = df,
var.equal = FALSE)
t_test1
##
## Welch Two Sample t-test
##
## data: gdp_growth_annual_percent by income_binary
## t = -11.271, df = 1593.5, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group High income and group Non High income is not equal to 0
## 95 percent confidence interval:
## -2.439632 -1.716378
## sample estimates:
## mean in group High income mean in group Non High income
## 2.565952 4.643957
The mean GDP growth rate is 2.57% for high-income countries and 4.64% for non-high-income countries, a gap of about 2.1 percentage points.
Because the p-value is far below the chosen significance level (α = 0.05), we reject the null hypothesis that the two groups have equal mean GDP growth rates.
cohen.d(gdp_growth_annual_percent ~ income_binary, data = df)
##
## Cohen's d
##
## d estimate: -0.5206472 (medium)
## 95 percent confidence interval:
## lower upper
## -0.6221788 -0.4191155
This represents a medium effect size, indicating that the difference is not only statistically significant but also practically meaningful.
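For readers who want to see what the effect size measures, here is a minimal sketch of Cohen's d with a pooled standard deviation, written in base R. The helper `cohens_d()` and the toy vectors are hypothetical illustrations; the last lines use only the means and d reported above to back out the implied pooled standard deviation.

```r
# Cohen's d with pooled SD: (mean_x - mean_y) / s_pooled
cohens_d <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  n1 <- length(x)
  n2 <- length(y)
  pooled_sd <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Toy growth rates (invented): both groups have variance 1, means differ by 1
high     <- c(1, 2, 3, NA)
non_high <- c(2, 3, 4)
d <- cohens_d(high, non_high)
print(d)

# Pooled SD implied by the reported group means and d estimate
implied_sd <- (2.565952 - 4.643957) / -0.5206472
print(round(implied_sd, 2))
```

An implied pooled standard deviation of roughly 4 percentage points means the planning value d = 0.3 corresponds to a growth gap of a little over 1 percentage point.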
alpha <- 0.05
# Report very small p-values as a bound instead of rounding them to 0
p_text <- ifelse(t_test1$p.value < 1e-4,
                 "p < 0.0001",
                 paste0("p = ", round(t_test1$p.value, 4)))
result_text <- ifelse(t_test1$p.value < alpha,
                      paste0("We **reject** the null hypothesis (", p_text, ", α = ", alpha, ")."),
                      paste0("We **fail to reject** the null hypothesis (", p_text, ", α = ", alpha, ")."))
cat(result_text)
## We **reject** the null hypothesis (p < 0.0001, α = 0.05).
The negative sign of the test statistic and effect size indicates that high-income countries experience, on average, lower GDP growth rates than middle- and low-income countries. Economically, this finding is consistent with growth theory: developing economies often grow faster due to industrial expansion, capital accumulation, and structural transformation, whereas high-income economies tend to grow more slowly because they are already near the technological and productivity frontier.
ggplot(df, aes(x = income_binary,
y = gdp_growth_annual_percent)) +
geom_boxplot() +
labs(title = "GDP Growth by Income Group",
x = "Income Group",
y = "GDP Growth (%)")
The boxplot shows that non-high-income countries have a higher median GDP growth rate than high-income countries. Non-high-income economies also display greater variability and more extreme growth outliers, indicating more volatile growth patterns. In contrast, high-income countries exhibit lower but more stable growth rates.
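The test above treats every country-year row as a separate observation, although rows from the same country are related. One possible sensitivity check, sketched here in base R on an invented toy panel (the data frame `toy` and its values are assumptions for illustration), is to collapse the data to one mean per country and rerun the Welch test on those means:

```r
# Toy panel: 6 countries x 2 years (values invented for illustration)
toy <- data.frame(
  country_name  = rep(c("A", "B", "C", "D", "E", "F"), each = 2),
  income_binary = rep(c("High income", "High income", "High income",
                        "Non High income", "Non High income", "Non High income"),
                      each = 2),
  gdp_growth_annual_percent = c(2, 3,  1, 2,  2, 2,  5, 4,  6, 5,  3, 4)
)

# Collapse to one mean growth value per country
country_means <- aggregate(
  gdp_growth_annual_percent ~ country_name + income_binary,
  data = toy, FUN = mean
)

# Welch t-test on country means instead of country-year rows
t_country <- t.test(gdp_growth_annual_percent ~ income_binary,
                    data = country_means, var.equal = FALSE)
print(t_country$estimate)
```

With far fewer observations the test is more conservative, so a difference that survives this check is harder to attribute to repeated measurements alone.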
Main Variable: Binary indicator of whether gdp_growth_annual_percent is at or above its median (“high growth” = success)
Group A: Countries with exports ≥ median exports share of GDP (“High Exports”)
Group B: Countries with exports < median (“Low Exports”)
\[H_0: P(\text{high growth} \mid \text{high exports}) = P(\text{high growth} \mid \text{low exports})\] \[H_1: P(\text{high growth} \mid \text{high exports}) \neq P(\text{high growth} \mid \text{low exports})\]
median_exports <- median(df$exports_of_goods_and_services_percent_of_gdp,
na.rm = TRUE)
df$high_exports <- ifelse(
df$exports_of_goods_and_services_percent_of_gdp >= median_exports,
"High Exports",
"Low Exports"
)
df$high_exports <- as.factor(df$high_exports)
median_growth <- median(df$gdp_growth_annual_percent, na.rm = TRUE)
df$high_growth <- ifelse(
df$gdp_growth_annual_percent >= median_growth,
"High Growth",
"Low Growth"
)
contingency_table <- table(df$high_exports, df$high_growth)
print(contingency_table)
##
## High Growth Low Growth
## High Exports 372 425
## Low Exports 416 380
# Use chi-squared if all expected counts >= 5, else Fisher's Exact
expected_counts <- chisq.test(contingency_table)$expected
use_fisher <- any(expected_counts < 5)
if (use_fisher) {
cat("\nUsing Fisher's Exact Test (some expected counts < 5)\n")
h2_result <- fisher.test(contingency_table)
} else {
cat("\nUsing Chi-Squared Test\n")
h2_result <- chisq.test(contingency_table)
}
##
## Using Chi-Squared Test
print(h2_result)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: contingency_table
## X-squared = 4.7508, df = 1, p-value = 0.02928
cat("p-value:", round(h2_result$p.value, 5), "\n\n")
## p-value: 0.02928
if (h2_result$p.value < 0.05) {
cat("The p-value is below 0.05. Under Fisher's framework, this is strong evidence against the null hypothesis.\n",
"The data suggest that export intensity and GDP growth classification are NOT independent.\n")
} else {
cat("The p-value is above 0.05. Under Fisher's framework, the data do not provide strong evidence against the null.\n",
"We cannot confidently claim export intensity is associated with GDP growth classification.\n")
}
## The p-value is below 0.05. Under Fisher's framework, this is strong evidence against the null hypothesis.
## The data suggest that export intensity and GDP growth classification are NOT independent.
The chi-squared test indicates a statistically significant association between export intensity and GDP growth classification (χ²(1) = 4.75, p = 0.029), so we reject the null hypothesis of independence at the 0.05 level. Country-year observations with above-median and below-median export shares are therefore not equally likely to show above-median GDP growth. Note the direction: high growth is somewhat less common in the high-export group (372 of 797) than in the low-export group (416 of 796), so the result does not by itself show that trade openness raises growth.
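A p-value says nothing about how large the association is. As a hedged sketch, the counts printed in the contingency table above can be re-entered by hand to compute the phi coefficient (the square root of the uncorrected chi-squared statistic divided by the total count) and the share of high-growth observations in each export group:

```r
# Counts copied from the contingency table printed above
tab <- matrix(c(372, 416, 425, 380), nrow = 2,
              dimnames = list(c("High Exports", "Low Exports"),
                              c("High Growth", "Low Growth")))

# Phi coefficient from the uncorrected chi-squared statistic
chi_raw <- chisq.test(tab, correct = FALSE)
phi <- sqrt(unname(chi_raw$statistic) / sum(tab))
print(round(phi, 3))

# Share of high-growth observations within each export group
share_high_growth <- tab[, "High Growth"] / rowSums(tab)
print(round(100 * share_high_growth, 1))
```

By the usual rule of thumb a phi below 0.1 is a very small effect, which fits a gap of only a few percentage points between the two groups.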
df_clean <- df[!is.na(df$high_exports) & !is.na(df$high_growth), ]
prop_table <- df_clean |>
group_by(high_exports, high_growth) |>
summarise(n = n(), .groups = "drop") |>
group_by(high_exports) |>
mutate(prop = n / sum(n))
ggplot(prop_table, aes(x = high_exports, y = prop, fill = high_growth)) +
geom_col() +
geom_text(aes(label = paste0(round(prop * 100, 1), "%")),
position = position_stack(vjust = 0.5),
color = "white", fontface = "bold", size = 4) +
scale_y_continuous(labels = percent_format()) +
labs(title = "Percentage of High Growth by Export Intensity",
x = "Export Group",
y = "Percentage",
fill = "Growth Category") +
theme_minimal()
The chart shows that 52.3% of low-export observations fall in the high-growth category, compared with 46.7% of high-export observations. Although the difference is modest (about 5.6 percentage points), the chi-squared test indicates that it is statistically significant (p = 0.029). Export intensity and GDP growth classification are therefore related, though the effect is small in magnitude.
# Use the most recent year in the data (2024)
wdi_clean <- df |>
filter(time == 2024) |>
select(
country_name,
income_group,
gdp_growth_annual_percent,
exports_of_goods_and_services_percent_of_gdp,
inflation_consumer_prices_annual_percent,
population_total,
gross_savings_percent_of_gdp
) |>
drop_na()
lm_model <- lm(
gdp_growth_annual_percent ~ gross_savings_percent_of_gdp,
data = wdi_clean
)
summary(lm_model)
##
## Call:
## lm(formula = gdp_growth_annual_percent ~ gross_savings_percent_of_gdp,
## data = wdi_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7063 -1.5161 0.0876 1.3936 6.1868
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.58657 0.86787 0.676 0.50195
## gross_savings_percent_of_gdp 0.10046 0.03469 2.896 0.00541 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.305 on 55 degrees of freedom
## Multiple R-squared: 0.1323, Adjusted R-squared: 0.1166
## F-statistic: 8.389 on 1 and 55 DF, p-value: 0.005409
Linear Regression Form
\[ \text{GDP Growth}_i = 0.5866 + 0.1005 \cdot (\text{Gross Savings}_i) + \epsilon_i \]
Interpretation
The linear regression model examines the relationship between gross savings (% of GDP) and GDP growth. The intercept (0.587) represents the predicted GDP growth when the gross savings rate is zero; it serves mainly as a baseline and has little practical meaning in this context. The coefficient for gross_savings_percent_of_gdp (0.10046) indicates that each additional percentage point of gross savings as a share of GDP is associated with GDP growth that is about 0.10 percentage points higher on average. Because this is a single-predictor model fitted to 2024 data only, no other factors are held constant.
Evaluation
The p-value for gross savings (0.00541) is less than 0.05, indicating that the relationship between gross savings and GDP growth is statistically significant. This suggests that higher savings rates are associated with higher economic growth. The R² value of 0.132 means that gross savings explain about 13.2% of the variation in GDP growth. While this shows some explanatory power, it also suggests that many other factors such as investment, trade, labor markets, and policy conditions also influence economic growth.
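The point estimate is easier to judge alongside its uncertainty. A small sketch, using only the estimate, standard error, and residual degrees of freedom printed in the summary above, reconstructs the 95% confidence interval for the savings slope; with the fitted object available, `confint(lm_model)` would be the direct route:

```r
# Values copied from summary(lm_model) above
est      <- 0.10046
se       <- 0.03469
df_resid <- 55

# 95% confidence interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df_resid)
ci <- est + c(-1, 1) * t_crit * se
print(round(ci, 3))

# t statistic implied by the estimate and standard error
print(round(est / se, 3))
```

The interval excludes zero but is wide relative to the estimate, so the size of the savings effect is only loosely pinned down by 57 countries.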
inflation_consumer_prices_annual_percent (continuous)
Inflation is included as an indicator of macroeconomic stability. High or volatile inflation can reduce purchasing power, create uncertainty, and discourage investment, potentially slowing economic growth.
gross_savings_percent_of_gdp (continuous)
Gross savings is included because it reflects the amount of resources available for investment in an economy. Higher savings can finance capital formation and infrastructure, which are key drivers of economic growth.
exports_of_goods_and_services_percent_of_gdp (continuous)
Exports are included to capture a country’s level of trade openness. Economies that are more integrated into global markets may experience higher growth due to increased demand, specialization, and efficiency gains.
income_group (categorical)
lm_model2 <- lm(
gdp_growth_annual_percent ~
gross_savings_percent_of_gdp +
inflation_consumer_prices_annual_percent +
exports_of_goods_and_services_percent_of_gdp,
data = wdi_clean
)
summary(lm_model2)
##
## Call:
## lm(formula = gdp_growth_annual_percent ~ gross_savings_percent_of_gdp +
## inflation_consumer_prices_annual_percent + exports_of_goods_and_services_percent_of_gdp,
## data = wdi_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.6469 -1.5431 0.1496 1.3035 6.0281
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 1.009729 0.911967 1.107
## gross_savings_percent_of_gdp 0.098257 0.036285 2.708
## inflation_consumer_prices_annual_percent -0.016359 0.010493 -1.559
## exports_of_goods_and_services_percent_of_gdp -0.006132 0.012069 -0.508
## Pr(>|t|)
## (Intercept) 0.27321
## gross_savings_percent_of_gdp 0.00909 **
## inflation_consumer_prices_annual_percent 0.12495
## exports_of_goods_and_services_percent_of_gdp 0.61351
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.294 on 53 degrees of freedom
## Multiple R-squared: 0.1718, Adjusted R-squared: 0.1249
## F-statistic: 3.664 on 3 and 53 DF, p-value: 0.01788
Linear Regression Form
\[ \text{GDP Growth}_i = 1.0097 + 0.0983 \cdot (\text{Gross Savings}_i) - 0.0164 \cdot (\text{Inflation}_i) - 0.0061 \cdot (\text{Exports}_i) + \epsilon_i \]
Interpretation:
The regression results show that gross savings has a positive and statistically significant association with GDP growth (p = 0.009): higher savings rates go with higher growth. Inflation and exports have negative coefficients, but neither is statistically significant (p = 0.125 and p = 0.614), so this model offers limited evidence of their impact. The overall model is statistically significant (p = 0.0179), meaning at least one predictor contributes to explaining GDP growth. However, the R² of 0.1718 (adjusted R² of 0.1249) indicates that the model explains only about 17% of the variation, implying that other important factors are not included.
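Whether inflation and exports jointly add anything beyond savings can be tested with a nested-model F-test. The sketch below reconstructs that test from the two R² values and the residual degrees of freedom printed in the summaries above, since both models are fitted to the same 57 rows; with the fitted objects in memory, `anova(lm_model, lm_model2)` should give the same comparison up to rounding.

```r
# Values copied from the two model summaries above
r2_simple <- 0.1323   # growth ~ savings
r2_multi  <- 0.1718   # growth ~ savings + inflation + exports
q         <- 2        # number of predictors added
df_resid  <- 53       # residual degrees of freedom of the larger model

# F statistic for the joint contribution of the added predictors
f_stat <- ((r2_multi - r2_simple) / q) / ((1 - r2_multi) / df_resid)
p_val  <- pf(f_stat, q, df_resid, lower.tail = FALSE)
print(round(c(F = f_stat, p = p_val), 3))
```

A p-value well above 0.05 here indicates that the two added predictors do not significantly improve on the savings-only model.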
wdi_clean$pred <- predict(lm_model2)
ggplot(wdi_clean, aes(x = pred, y = gdp_growth_annual_percent)) +
geom_point(alpha = 0.6) +
geom_abline(slope = 1, intercept = 0, color = "red") +
labs(title = "Actual vs Predicted GDP Growth",
x = "Predicted",
y = "Actual")
The plot shows the relationship between actual and predicted GDP growth values from the regression model. Points closer to the red 45-degree line indicate more accurate predictions, while larger deviations reflect prediction errors. Overall, the model captures general trends but shows noticeable dispersion, suggesting moderate predictive accuracy.
library(car)
vif(lm_model2)
## gross_savings_percent_of_gdp
## 1.104725
## inflation_consumer_prices_annual_percent
## 1.037578
## exports_of_goods_and_services_percent_of_gdp
## 1.113485
All VIF values are close to 1, indicating very low multicollinearity among the predictors. This suggests that the variables are not highly correlated and can be reliably included in the model.
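To make the VIF numbers less of a black box: the VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on all the others. A minimal base-R sketch follows; the helper `vif_manual()` and the toy predictors are hypothetical, chosen so that two columns are deliberately collinear.

```r
# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the rest
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    aux <- lm(reformulate(others, response = v), data = X)
    1 / (1 - summary(aux)$r.squared)
  })
}

# Toy predictors (invented): x2 closely tracks x1, x3 does not
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(2, 1, 4, 3, 6, 5, 8, 7),
  x3 = c(5, 3, 8, 1, 7, 2, 6, 4)
)
vifs <- vif_manual(toy)
print(round(vifs, 2))
```

Here the collinear pair produces VIFs well above the values near 1 reported for the actual model, which is what a multicollinearity problem would look like.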
par(mfrow = c(2, 2))
plot(lm_model2)
Interpretation:
The residuals are fairly randomly scattered around zero, suggesting that the linearity assumption is reasonably satisfied. There is no strong visible pattern, although slight clustering may indicate minor model misspecification.
Most points lie close to the reference line, indicating that residuals are approximately normally distributed. Some deviation at the extremes suggests mild non-normality, but not severe.
The spread of residuals appears relatively constant, although there is a slight downward trend. This suggests mild heteroscedasticity, but the issue does not appear severe.
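Most observations have low leverage, with a few moderate points, and none appears to cross the Cook’s distance contours drawn in this panel. By this visual check no single observation dominates the fit, although the stricter 4/n rule applied in the Cook’s distance plot further on does flag individual points.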
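Visual diagnostics like the four panels above can be complemented with formal tests. The sketch below runs on simulated data (the data frame `toy`, its coefficients, and the seed are assumptions for illustration, not the WDI sample). It applies the Shapiro-Wilk test to the residuals and computes a studentized Breusch-Pagan statistic by hand, as n times the R² from regressing squared residuals on the predictors:

```r
set.seed(1)  # simulated data, for illustration only
n <- 57
toy <- data.frame(savings = runif(n, 10, 40), inflation = runif(n, 0, 15))
toy$growth <- 1 + 0.1 * toy$savings - 0.02 * toy$inflation + rnorm(n, sd = 2)
fit <- lm(growth ~ savings + inflation, data = toy)

# Normality of residuals: Shapiro-Wilk test
sw <- shapiro.test(resid(fit))

# Constant variance: studentized Breusch-Pagan statistic (n * R^2),
# compared with a chi-squared distribution with 2 degrees of freedom
toy$sq_resid <- resid(fit)^2
aux <- lm(sq_resid ~ savings + inflation, data = toy)
lm_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(lm_stat, df = 2, lower.tail = FALSE)

print(c(shapiro_p = sw$p.value, breusch_pagan_p = bp_p))
```

Small p-values would signal non-normal residuals or non-constant variance; large ones are consistent with the "mild, not severe" reading of the plots.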
# Add residuals to dataset
wdi_clean$residuals <- resid(lm_model2)
# Residuals vs Savings
ggplot(wdi_clean, aes(x = gross_savings_percent_of_gdp, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(title = "Residuals vs Gross Savings", x = "Gross Savings (% GDP)", y = "Residuals")
# Residuals vs Inflation
ggplot(wdi_clean, aes(x = inflation_consumer_prices_annual_percent, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(title = "Residuals vs Inflation", x = "Inflation (%)", y = "Residuals")
# Residuals vs Exports
ggplot(wdi_clean, aes(x = exports_of_goods_and_services_percent_of_gdp, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(title = "Residuals vs Exports", x = "Exports (% GDP)", y = "Residuals")
numeric_data <- wdi_clean |>
select(
gdp_growth_annual_percent,
gross_savings_percent_of_gdp,
inflation_consumer_prices_annual_percent,
exports_of_goods_and_services_percent_of_gdp
)
cor_matrix <- cor(numeric_data)
labels <- c("GDP growth", "Gross savings", "Inflation", "Exports % GDP")
colnames(cor_matrix) <- rownames(cor_matrix) <- labels
# Plot
corrplot(cor_matrix,
method = "color",
addCoef.col = "black",
tl.col = "black",
tl.srt = 45,
tl.cex = 0.85,
number.cex = 0.8
)
The strongest relationship is between GDP growth and gross savings (r = 0.36), suggesting countries that save more tend to grow faster. Inflation has a weak negative correlation with both GDP growth (−0.24) and gross savings (−0.14); these are associations rather than causal effects, and they are too weak to conclude that inflation dampens either. Exports % GDP shows very weak correlations with the other three variables, implying it varies largely independently of them.
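With only 57 countries in `wdi_clean`, weak correlations may not be distinguishable from zero. The following sketch converts the rounded coefficients quoted above into two-sided p-values using the standard t transformation; the helper `r_to_p()` is a hypothetical name, and `cor.test()` on the raw columns would be the direct alternative.

```r
# Two-sided p-value for a Pearson correlation r based on n observations
r_to_p <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
}

n <- 57  # rows in wdi_clean (55 residual df + 2 coefficients in lm_model)
r_vals <- c(growth_savings    = 0.36,
            growth_inflation  = -0.24,
            savings_inflation = -0.14)
p_vals <- sapply(r_vals, r_to_p, n = n)
print(round(p_vals, 3))
```

Only the growth-savings correlation clears the 0.05 level at this sample size; the two inflation correlations do not.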
ggplot(wdi_clean, aes(x = residuals)) +
geom_histogram(bins = 20) +
labs(title = "Histogram of Residuals", x = "Residuals", y = "Frequency")
The residuals are roughly bell-shaped and centered near zero, which is a good sign that the linear regression assumptions are reasonably met. However, the distribution is slightly irregular with some gaps and a few outliers beyond ±4, suggesting mild non-normality. Overall the model’s errors are acceptably distributed but not perfectly normal.
cooks_d <- cooks.distance(lm_model2)
plot(cooks_d, type = "h", main = "Cook's Distance", ylab = "Cook's D")
abline(h = 4/length(cooks_d), col = "red")
Observation ~21 has a Cook’s D of ~0.52, far exceeding the red threshold line (4/n ≈ 0.07), flagging it as a highly influential point that could be distorting the regression coefficients. Observation ~56 also crosses the threshold slightly and warrants attention. The vast majority of observations sit well below the cutoff, meaning one or two data points account for most of the influence concern.
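A natural follow-up is to refit the model without the flagged observations and compare coefficients. The sketch below demonstrates the idea on a small invented data set with one planted outlier (the data frame `toy` is an assumption for illustration); it is not a refit of `lm_model2`.

```r
# Toy data: 20 points close to y = 2 + 0.5x, plus one planted outlier at x = 40
x <- c(1:20, 40)
y <- c(2 + 0.5 * (1:20) + 0.2 * sin(1:20), 0)
toy <- data.frame(x = x, y = y)

# Fit on all rows and flag observations with Cook's D above 4/n
fit_all <- lm(y ~ x, data = toy)
cd <- cooks.distance(fit_all)
flagged <- which(cd > 4 / nrow(toy))
print(flagged)

# Refit without the flagged rows and compare coefficients
fit_trim <- lm(y ~ x, data = toy[-flagged, ])
print(rbind(all = coef(fit_all), trimmed = coef(fit_trim)))
```

If the coefficients of the real model shifted noticeably after dropping the flagged observations, the reported savings effect would depend heavily on one or two countries.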
The multiple regression model adds further macroeconomic variables to the simple model, but the gain in fit is small (adjusted R² rises from 0.117 to 0.125), and gross savings remains the only statistically significant predictor. While inflation and exports were theoretically relevant, they did not show significant effects in this dataset, suggesting their impact may be context-dependent or captured indirectly. Diagnostic plots indicate that model assumptions are largely satisfied, with minor concerns such as slight heteroscedasticity and non-normality, plus one highly influential observation flagged by Cook’s distance. The interaction model further reveals that the relationship between savings and growth differs across income groups, significantly improving model fit. Overall, the analysis highlights that economic growth is influenced by multiple factors, but additional variables such as labor force or investment may be needed for a more comprehensive model.
This analysis provides evidence that economic growth patterns differ significantly across countries based on income level, structural factors, and macroeconomic conditions.
High-income countries experience significantly lower GDP growth rates than middle- and low-income countries, which is consistent with the economic theory of convergence, where developing economies grow faster through industrialization, capital accumulation, and structural transformation.
The association between export intensity and GDP growth classification is statistically significant, but the effect size is modest, suggesting that trade openness alone is not a dominant driver of whether a country achieves high growth.
Regression analysis highlights gross savings as a consistent and significant predictor of GDP growth: higher savings rates are associated with higher economic growth, indicating the importance of domestic resource mobilization for investment and long-term development.
Inflation and export share were not statistically significant in the multivariate model, suggesting their effects may be indirect, context-specific, or captured through other economic channels.
Overall, while structural factors like income level explain broad growth differences, internal economic capacity, particularly savings and investment, plays a more direct role in driving growth outcomes.
Based on the findings, the following recommendations are proposed for policymakers and international economic organizations:
Boost Savings and Investment
Since savings is the strongest predictor of growth, governments should strengthen financial systems, encourage household savings, and expand access to investment. More savings means more capital for infrastructure and long-term development.
Support Structural Growth in Poorer Countries
Lower and middle-income countries grow faster but less steadily. Governments should focus on building stronger institutions, diversifying their economies, and improving governance to make growth more stable and sustainable.
Don’t Rely Only on Exports
Export intensity is linked to growth, but the effect is small. Trade promotion alone is not enough; countries need a balanced approach that also includes domestic investment, industrial policy, and education.
Keep Inflation Under Control
Inflation was not significant in the model, but low-income countries still show signs of price instability. Stable monetary policy reduces uncertainty and creates a better environment for investment and growth.
Collect Better Data
The models explain only a limited share of growth variation, meaning key factors like education, labor productivity, and institutional quality are missing. Better data collection and monitoring would lead to more accurate analysis and smarter policy decisions.