Load required packages
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(broom)
Load the World Bank Dataset
# Treat the World Bank's '..' placeholder as NA when reading the file.
world_bank <- read_csv("C:/Users/SP KHALID/Downloads/WDI- World Bank Dataset.csv", na = "..")
## Rows: 1675 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Time Code, Country Name, Country Code, Region, Income Group
## dbl (14): Time, GDP (constant 2015 US$), GDP growth (annual %), GDP (current...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
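To see what `na = ".."` does, here is a minimal sketch on an inline two-row CSV rather than the real file. The column names and values are made up for illustration, and `I()` tells readr to treat the string as literal data instead of a file path.

```r
library(readr)

# Hypothetical mini-extract in the World Bank export style,
# where ".." marks a missing value.
csv_text <- "Country Name,GDP growth (annual %)\nBrazil,4.39\nChina,.."

toy <- read_csv(I(csv_text), na = "..", show_col_types = FALSE)

# With ".." mapped to NA, the column parses as numeric instead of character.
is.numeric(toy$`GDP growth (annual %)`)
sum(is.na(toy$`GDP growth (annual %)`))
```

Supplying `na` replaces readr's default of `c("", "NA")`, so if a file also contains empty cells it may be safer to pass `na = c("", "NA", "..")`.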
world_bank
## # A tibble: 1,675 × 19
## Time `Time Code` `Country Name` `Country Code` Region `Income Group`
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 2000 YR2000 Brazil BRA Latin America… Upper middle …
## 2 2000 YR2000 China CHN East Asia & P… Upper middle …
## 3 2000 YR2000 France FRA Europe & Cent… High income
## 4 2000 YR2000 Germany DEU Europe & Cent… High income
## 5 2000 YR2000 India IND South Asia Lower middle …
## 6 2000 YR2000 Indonesia IDN East Asia & P… Upper middle …
## 7 2000 YR2000 Italy ITA Europe & Cent… High income
## 8 2000 YR2000 Japan JPN East Asia & P… High income
## 9 2000 YR2000 Korea, Rep. KOR East Asia & P… High income
## 10 2000 YR2000 Mexico MEX Latin America… Upper middle …
## # ℹ 1,665 more rows
## # ℹ 13 more variables: `GDP (constant 2015 US$)` <dbl>,
## # `GDP growth (annual %)` <dbl>, `GDP (current US$)` <dbl>,
## # `Unemployment, total (% of total labor force)` <dbl>,
## # `Inflation, consumer prices (annual %)` <dbl>, `Labor force, total` <dbl>,
## # `Population, total` <dbl>,
## # `Exports of goods and services (% of GDP)` <dbl>, …
dim(world_bank)
## [1] 1675 19
# Check column data types
glimpse(world_bank)
## Rows: 1,675
## Columns: 19
## $ Time <dbl> 2000, 20…
## $ `Time Code` <chr> "YR2000"…
## $ `Country Name` <chr> "Brazil"…
## $ `Country Code` <chr> "BRA", "…
## $ Region <chr> "Latin A…
## $ `Income Group` <chr> "Upper m…
## $ `GDP (constant 2015 US$)` <dbl> 1.18642e…
## $ `GDP growth (annual %)` <dbl> 4.387949…
## $ `GDP (current US$)` <dbl> 6.554482…
## $ `Unemployment, total (% of total labor force)` <dbl> NA, 3.70…
## $ `Inflation, consumer prices (annual %)` <dbl> 7.044141…
## $ `Labor force, total` <dbl> 80295093…
## $ `Population, total` <dbl> 17401828…
## $ `Exports of goods and services (% of GDP)` <dbl> 10.18805…
## $ `Imports of goods and services (% of GDP)` <dbl> 12.45171…
## $ `General government final consumption expenditure (% of GDP)` <dbl> 18.76784…
## $ `Foreign direct investment, net inflows (% of GDP)` <dbl> 5.033917…
## $ `Gross savings (% of GDP)` <dbl> 13.99170…
## $ `Current account balance (% of GDP)` <dbl> -4.04774…
# Convert Time column to integer
world_bank$Time <- as.integer(world_bank$Time)
# Clean column names
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
df <- world_bank |> clean_names()
glimpse(df)
## Rows: 1,675
## Columns: 19
## $ time <int> 2000, …
## $ time_code <chr> "YR200…
## $ country_name <chr> "Brazi…
## $ country_code <chr> "BRA",…
## $ region <chr> "Latin…
## $ income_group <chr> "Upper…
## $ gdp_constant_2015_us <dbl> 1.1864…
## $ gdp_growth_annual_percent <dbl> 4.3879…
## $ gdp_current_us <dbl> 6.5544…
## $ unemployment_total_percent_of_total_labor_force <dbl> NA, 3.…
## $ inflation_consumer_prices_annual_percent <dbl> 7.0441…
## $ labor_force_total <dbl> 802950…
## $ population_total <dbl> 174018…
## $ exports_of_goods_and_services_percent_of_gdp <dbl> 10.188…
## $ imports_of_goods_and_services_percent_of_gdp <dbl> 12.451…
## $ general_government_final_consumption_expenditure_percent_of_gdp <dbl> 18.767…
## $ foreign_direct_investment_net_inflows_percent_of_gdp <dbl> 5.0339…
## $ gross_savings_percent_of_gdp <dbl> 13.991…
## $ current_account_balance_percent_of_gdp <dbl> -4.047…
Response: gdp_growth_annual_percent
Categorical: income_group
Explanation
The response variable selected for this analysis is GDP growth (annual %). GDP growth is an important indicator of economic performance because it reflects how quickly an economy is expanding or contracting. The categorical explanatory variable chosen is income group. Countries are classified into different income groups by the World Bank, and economic growth may differ across these categories due to differences in development levels, industrial structure, and economic policy.
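# Keep the 2024 cross-section and the variables used below.
# `time` is an integer column, so compare it with a number, not a string.
wdi_clean <- df |>
  filter(time == 2024L) |>
  select(
    country_name,
    income_group,
    gdp_growth_annual_percent,
    exports_of_goods_and_services_percent_of_gdp,
    inflation_consumer_prices_annual_percent,
    population_total,
    gross_savings_percent_of_gdp
  ) |>
  # Drop countries with a missing value in any selected column.
  drop_na()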
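Because `drop_na()` removes a country if any selected column is missing, it is worth checking how many rows survive and how they are spread across income groups before running the ANOVA. The sketch below uses a small made-up tibble (the countries `A` to `F` and their values are hypothetical), not the World Bank data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with two incomplete rows.
toy <- tibble(
  country_name = c("A", "B", "C", "D", "E", "F"),
  income_group = c("High income", "High income", "Low income",
                   "Low income", "Upper middle income", "Upper middle income"),
  gdp_growth_annual_percent = c(1.2, 2.0, 5.1, NA, 3.3, 4.0),
  gross_savings_percent_of_gdp = c(22, NA, 15, 12, 30, 28)
)

toy_clean <- toy |> drop_na()

# How many rows were lost, and how many countries remain per group?
rows_dropped <- nrow(toy) - nrow(toy_clean)
group_counts <- toy_clean |> count(income_group, name = "n_countries")

rows_dropped
group_counts
```

Applied to the real data, the same two checks (`nrow(wdi_clean)` and `count(wdi_clean, income_group)`) would show whether any income group is left with only a handful of countries.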
Null Hypothesis (H₀):
The mean GDP growth rate is the same across all income groups.
\[H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k\]
Alternative Hypothesis (H₁):
At least one income group has a different mean GDP growth rate.
\[H_1: \mu_i \neq \mu_j \text{ for at least one pair } i \neq j\]
ggplot(wdi_clean, aes(x = income_group, y = gdp_growth_annual_percent)) +
  geom_boxplot(fill = "skyblue") +
  labs(
    title = "GDP Growth by Income Group",
    x = "Income Group",
    y = "GDP Growth (%)"
  ) +
  theme_minimal()
The box plot shows how GDP growth differs across World Bank income groups. Low-income countries appear to have the highest median GDP growth, while high-income countries tend to have lower median growth rates. This suggests that developing economies may experience faster growth compared to more developed economies.
anova_model <- aov(
  gdp_growth_annual_percent ~ income_group,
  data = wdi_clean
)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## income_group 3 84.78 28.259 5.946 0.00142 **
## Residuals 53 251.87 4.752
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation
The one-way ANOVA tests whether mean GDP growth differs across the four income groups.
From the output:
F(3, 53) = 5.946
p-value = 0.00142
Since the p-value is less than 0.05, we reject the null hypothesis that all income groups have the same mean GDP growth. The test does not identify which groups differ; that requires a post-hoc comparison.
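As a check on the table, the F statistic can be rebuilt from the sums of squares, with k = 4 income groups and N = 57 countries (both implied by the degrees of freedom 3 and 53). The effect size η² is not part of the original output; it is added here only as a rough gauge of how much variation the grouping accounts for.

\[F = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)} = \frac{84.78/3}{251.87/53} \approx \frac{28.26}{4.752} \approx 5.95\]
\[\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{between}} + SS_{\text{within}}} = \frac{84.78}{84.78 + 251.87} \approx 0.25\]

So income group accounts for roughly a quarter of the variation in GDP growth in this sample.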
Significance and Conclusion
The results indicate that mean GDP growth differs significantly across income groups in the 2024 sample (57 countries after dropping rows with missing values). This is consistent with a country's level of economic development being related to its growth rate. Read together with the box plot, it fits the view that developing economies often grow faster as they catch up with more developed countries, although the ANOVA alone does not show which groups differ or why.
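A significant ANOVA says only that at least one group mean differs. One common follow-up, not part of the original analysis, is Tukey's HSD, which compares every pair of groups. The sketch below runs it on simulated data (the group means and spread are invented), so the numbers it prints say nothing about the real countries.

```r
# Simulated data only: group means and spread are invented for illustration.
set.seed(42)
groups <- c("High income", "Upper middle income",
            "Lower middle income", "Low income")
sim <- data.frame(
  income_group = factor(rep(groups, each = 15), levels = groups),
  gdp_growth_annual_percent = rnorm(60, mean = rep(c(1.5, 3, 4, 5), each = 15), sd = 2)
)

fit <- aov(gdp_growth_annual_percent ~ income_group, data = sim)

# Tukey's HSD compares every pair of groups while controlling
# the family-wise error rate.
tukey <- TukeyHSD(fit, conf.level = 0.95)
pairwise <- tukey$income_group  # matrix with columns diff, lwr, upr, p adj
print(round(pairwise, 3))
```

On the real model the equivalent call would be `TukeyHSD(anova_model)`. Pairs whose adjusted p-value is below 0.05 are the ones driving the overall result.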
Continuous: gross_savings_percent_of_gdp
ggplot(wdi_clean, aes(
  x = gross_savings_percent_of_gdp,
  y = gdp_growth_annual_percent
)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "GDP Growth vs Gross Savings (% of GDP)",
    x = "Gross Savings (%)",
    y = "GDP Growth (%)"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
Explanation
The scatter plot shows the relationship between gross savings (% of GDP) and GDP growth. The upward-sloping regression line suggests a positive relationship, meaning countries with higher savings rates may experience higher economic growth. However, the points are somewhat scattered, indicating that other factors may also influence GDP growth.
lm_model <- lm(
  gdp_growth_annual_percent ~ gross_savings_percent_of_gdp,
  data = wdi_clean
)
summary(lm_model)
##
## Call:
## lm(formula = gdp_growth_annual_percent ~ gross_savings_percent_of_gdp,
## data = wdi_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7063 -1.5161 0.0876 1.3936 6.1868
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.58657 0.86787 0.676 0.50195
## gross_savings_percent_of_gdp 0.10046 0.03469 2.896 0.00541 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.305 on 55 degrees of freedom
## Multiple R-squared: 0.1323, Adjusted R-squared: 0.1166
## F-statistic: 8.389 on 1 and 55 DF, p-value: 0.005409
Interpretation
The linear regression model examines the relationship between gross savings (% of GDP) and GDP growth. The intercept (0.587) is the predicted GDP growth when the gross savings rate is zero; it is not statistically different from zero (p = 0.502) and has little practical meaning, since it serves mainly as a baseline for the fitted line. The coefficient for gross_savings_percent_of_gdp (0.10046) indicates that each 1 percentage point increase in gross savings as a share of GDP is associated with an average increase of about 0.10 percentage points in the GDP growth rate. Because savings is the only predictor, this estimate does not control for any other factor.
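To make the coefficients concrete, the fitted line can be evaluated by hand. The sketch below plugs the reported estimates into the regression equation. The savings rates of 20% and 30% are arbitrary illustrative inputs, and `predict_growth` is a hypothetical helper, not something from the original script.

```r
# Estimates copied from the summary(lm_model) output.
intercept <- 0.58657
slope     <- 0.10046

# Hypothetical helper: fitted GDP growth (%) for a given savings rate (% of GDP).
predict_growth <- function(savings_pct) {
  intercept + slope * savings_pct
}

pred <- predict_growth(c(20, 30))
pred        # fitted growth at 20% and 30% savings
diff(pred)  # a 10-point rise in savings shifts the prediction by 10 * slope
```

With the model object available, `predict(lm_model, newdata = data.frame(gross_savings_percent_of_gdp = c(20, 30)))` should give the same values up to rounding. Predictions are only trustworthy within the range of savings rates observed in the data.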
Evaluation
The p-value for gross savings (0.00541) is less than 0.05, indicating that the relationship between gross savings and GDP growth is statistically significant at the 5% level. This suggests that higher savings rates are associated with higher economic growth. The R² value of 0.132 means that gross savings explain about 13.2% of the variation in GDP growth. This is modest explanatory power: most of the variation is left to other factors, such as investment, trade, labor markets, and policy conditions.
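A confidence interval conveys the uncertainty in the slope more directly than the p-value alone. The sketch below rebuilds an approximate 95% interval from the reported estimate, standard error, and residual degrees of freedom, and also derives the correlation implied by R². Neither quantity appears in the original output, so treat them as a supplementary check.

```r
# Values copied from the summary(lm_model) output.
slope     <- 0.10046
std_error <- 0.03469
resid_df  <- 55

# 95% confidence interval: estimate +/- t critical value * standard error.
t_crit <- qt(0.975, df = resid_df)
ci <- slope + c(-1, 1) * t_crit * std_error

# Pearson correlation implied by R-squared (positive because the slope is positive).
r <- sqrt(0.1323)

round(ci, 3)
round(r, 3)
```

The interval runs from roughly 0.03 to 0.17, so the data are compatible with both a weak and a fairly substantial association. On the real model, `confint(lm_model)` should reproduce it up to rounding.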
Recommendation
The regression results show a positive and statistically significant association between gross savings and GDP growth: countries with higher savings rates tend to have higher growth in this sample. Because this is a single-year, single-predictor regression, it does not by itself show that raising savings causes faster growth. With that caveat, policymakers may benefit from encouraging higher national savings through financial incentives, stable banking systems, or investment-friendly policies, since increased savings can provide more capital for investment and economic expansion.