Data Dive 8

Load required packages

library(readr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(broom)

Load the World Bank Dataset

# Treat the World Bank's '..' placeholder as NA in every column while reading the CSV.
world_bank <- read_csv("C:/Users/SP KHALID/Downloads/WDI- World Bank Dataset.csv", na = c(".."))

## Rows: 1675 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): Time Code, Country Name, Country Code, Region, Income Group
## dbl (14): Time, GDP (constant 2015 US$), GDP growth (annual %), GDP (current...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

world_bank

## # A tibble: 1,675 × 19
##     Time `Time Code` `Country Name` `Country Code` Region         `Income Group`
##    <dbl> <chr>       <chr>          <chr>          <chr>          <chr>         
##  1  2000 YR2000      Brazil         BRA            Latin America… Upper middle …
##  2  2000 YR2000      China          CHN            East Asia & P… Upper middle …
##  3  2000 YR2000      France         FRA            Europe & Cent… High income   
##  4  2000 YR2000      Germany        DEU            Europe & Cent… High income   
##  5  2000 YR2000      India          IND            South Asia     Lower middle …
##  6  2000 YR2000      Indonesia      IDN            East Asia & P… Upper middle …
##  7  2000 YR2000      Italy          ITA            Europe & Cent… High income   
##  8  2000 YR2000      Japan          JPN            East Asia & P… High income   
##  9  2000 YR2000      Korea, Rep.    KOR            East Asia & P… High income   
## 10  2000 YR2000      Mexico         MEX            Latin America… Upper middle …
## # ℹ 1,665 more rows
## # ℹ 13 more variables: `GDP (constant 2015 US$)` <dbl>,
## #   `GDP growth (annual %)` <dbl>, `GDP (current US$)` <dbl>,
## #   `Unemployment, total (% of total labor force)` <dbl>,
## #   `Inflation, consumer prices (annual %)` <dbl>, `Labor force, total` <dbl>,
## #   `Population, total` <dbl>,
## #   `Exports of goods and services (% of GDP)` <dbl>, …

dim(world_bank)

## [1] 1675   19

# Check column data types
glimpse(world_bank)

## Rows: 1,675
## Columns: 19
## $ Time                                                          <dbl> 2000, 20…
## $ `Time Code`                                                   <chr> "YR2000"…
## $ `Country Name`                                                <chr> "Brazil"…
## $ `Country Code`                                                <chr> "BRA", "…
## $ Region                                                        <chr> "Latin A…
## $ `Income Group`                                                <chr> "Upper m…
## $ `GDP (constant 2015 US$)`                                     <dbl> 1.18642e…
## $ `GDP growth (annual %)`                                       <dbl> 4.387949…
## $ `GDP (current US$)`                                           <dbl> 6.554482…
## $ `Unemployment, total (% of total labor force)`                <dbl> NA, 3.70…
## $ `Inflation, consumer prices (annual %)`                       <dbl> 7.044141…
## $ `Labor force, total`                                          <dbl> 80295093…
## $ `Population, total`                                           <dbl> 17401828…
## $ `Exports of goods and services (% of GDP)`                    <dbl> 10.18805…
## $ `Imports of goods and services (% of GDP)`                    <dbl> 12.45171…
## $ `General government final consumption expenditure (% of GDP)` <dbl> 18.76784…
## $ `Foreign direct investment, net inflows (% of GDP)`           <dbl> 5.033917…
## $ `Gross savings (% of GDP)`                                    <dbl> 13.99170…
## $ `Current account balance (% of GDP)`                          <dbl> -4.04774…

# Convert Time column to integer
world_bank$Time <- as.integer(world_bank$Time)

# Clean column names
library(janitor)

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

df <- world_bank |> clean_names()
glimpse(df)

## Rows: 1,675
## Columns: 19
## $ time                                                            <int> 2000, …
## $ time_code                                                       <chr> "YR200…
## $ country_name                                                    <chr> "Brazi…
## $ country_code                                                    <chr> "BRA",…
## $ region                                                          <chr> "Latin…
## $ income_group                                                    <chr> "Upper…
## $ gdp_constant_2015_us                                            <dbl> 1.1864…
## $ gdp_growth_annual_percent                                       <dbl> 4.3879…
## $ gdp_current_us                                                  <dbl> 6.5544…
## $ unemployment_total_percent_of_total_labor_force                 <dbl> NA, 3.…
## $ inflation_consumer_prices_annual_percent                        <dbl> 7.0441…
## $ labor_force_total                                               <dbl> 802950…
## $ population_total                                                <dbl> 174018…
## $ exports_of_goods_and_services_percent_of_gdp                    <dbl> 10.188…
## $ imports_of_goods_and_services_percent_of_gdp                    <dbl> 12.451…
## $ general_government_final_consumption_expenditure_percent_of_gdp <dbl> 18.767…
## $ foreign_direct_investment_net_inflows_percent_of_gdp            <dbl> 5.0339…
## $ gross_savings_percent_of_gdp                                    <dbl> 13.991…
## $ current_account_balance_percent_of_gdp                          <dbl> -4.047…

Variables

Response: gdp_growth_annual_percent

Categorical: income_group

Explanation

The response variable selected for this analysis is GDP growth (annual %). GDP growth is an important indicator of economic performance because it reflects how quickly an economy is expanding or contracting. The categorical explanatory variable chosen is income group. Countries are classified into different income groups by the World Bank, and economic growth may differ across these categories due to differences in development levels, industrial structure, and economic policy.

wdi_clean <- df |>
  filter(time == 2024L) |>
  select(
    country_name,
    income_group,
    gdp_growth_annual_percent,
    exports_of_goods_and_services_percent_of_gdp,
    inflation_consumer_prices_annual_percent,
    population_total,
    gross_savings_percent_of_gdp
  ) |>
  drop_na()
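
Because `drop_na()` removes any row with a missing value in any of the selected columns, the number of countries per income group can shrink unevenly, which matters when group means are compared later. A minimal sketch of how the group sizes could be checked before and after that step is shown below. The data frame `wdi_toy` is a hypothetical stand-in for the filtered data, and its values and group labels are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the filtered 2024 data (values are illustrative only)
wdi_toy <- data.frame(
  country_name = c("A", "B", "C", "D", "E", "F"),
  income_group = c("High income", "High income", "High income",
                   "Low income", "Low income", "Low income"),
  gdp_growth_annual_percent = c(1.2, NA, 2.1, 5.4, 4.8, NA),
  gross_savings_percent_of_gdp = c(22.5, 24.0, NA, 15.3, 18.1, 12.9)
)

# Rows per income group before and after removing incomplete cases
n_before <- count(wdi_toy, income_group, name = "n_before")
n_after  <- wdi_toy |> drop_na() |> count(income_group, name = "n_after")

group_sizes <- left_join(n_before, n_after, by = "income_group")
group_sizes
```

Applied to the real data, the same comparison would show how many countries each income group contributes to the ANOVA and the regression.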

Hypothesis for ANOVA

Null Hypothesis (H₀):

The mean GDP growth rate is the same across all income groups.

\[H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k\]

Alternative Hypothesis (H₁):

At least one income group has a different mean GDP growth rate.

\[H_1: \text{At least one } \mu_i \neq \mu_j \text{ for some } i \neq j\]

Box Plot Visualization

ggplot(wdi_clean, aes(x = income_group, y = gdp_growth_annual_percent)) +
  geom_boxplot(fill = "skyblue") +
  labs(
    title = "GDP Growth by Income Group",
    x = "Income Group",
    y = "GDP Growth (%)"
  ) +
  theme_minimal()

The box plot shows how GDP growth differs across World Bank income groups. Low-income countries appear to have the highest median GDP growth, while high-income countries tend to have lower median growth rates. This suggests that developing economies may experience faster growth compared to more developed economies.
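
Since `income_group` is a character column, ggplot2 places the boxes in alphabetical order, which can make the medians harder to compare. One possible adjustment, sketched below with base R's `reorder()`, is to order the groups by their median growth. The data frame `wdi_toy` and its group labels are hypothetical and serve only to make the example self-contained.

```r
library(ggplot2)

# Hypothetical stand-in for wdi_clean (values are illustrative only)
wdi_toy <- data.frame(
  income_group = rep(c("Group A", "Group B", "Group C"), each = 3),
  gdp_growth_annual_percent = c(4.0, 5.0, 6.0,   # median 5
                                1.0, 2.0, 3.0,   # median 2
                                2.5, 3.5, 4.5)   # median 3.5
)

# Turn the character column into a factor whose levels are sorted by median growth
wdi_toy$income_group <- reorder(
  wdi_toy$income_group,
  wdi_toy$gdp_growth_annual_percent,
  FUN = median
)

levels(wdi_toy$income_group)

# Same plot as before, now with boxes running from lowest to highest median
p <- ggplot(wdi_toy, aes(x = income_group, y = gdp_growth_annual_percent)) +
  geom_boxplot(fill = "skyblue") +
  theme_minimal()
```

An alternative would be to set the factor levels explicitly in the World Bank's income order, which requires the exact group labels used in the dataset.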

ANOVA

anova_model <- aov(
  gdp_growth_annual_percent ~ income_group,
  data = wdi_clean
)

summary(anova_model)

##              Df Sum Sq Mean Sq F value  Pr(>F)   
## income_group  3  84.78  28.259   5.946 0.00142 **
## Residuals    53 251.87   4.752                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation

The ANOVA test examines whether the mean GDP growth rate differs across income groups.

From the output:

F-value = 5.946 (on 3 and 53 degrees of freedom)
p-value = 0.00142

Since the p-value is less than 0.05, we reject the null hypothesis that all income groups have the same mean GDP growth.
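
As a check on the reported values, the F-value can be reproduced by hand from the ANOVA table: it is the ratio of the between-group mean square to the residual mean square, where each mean square is a sum of squares divided by its degrees of freedom.

\[F = \frac{SS_{\text{income group}} / df_{\text{income group}}}{SS_{\text{residual}} / df_{\text{residual}}} = \frac{84.78 / 3}{251.87 / 53} \approx \frac{28.26}{4.752} \approx 5.95\]

The same table also gives a rough effect size, the share of total variation attributable to income group:

\[\eta^2 = \frac{SS_{\text{income group}}}{SS_{\text{income group}} + SS_{\text{residual}}} = \frac{84.78}{84.78 + 251.87} \approx 0.25\]

So income group accounts for roughly a quarter of the variation in GDP growth in this sample.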

Significance and Conclusion

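The results indicate that mean GDP growth differs significantly across income groups among the 57 countries with complete 2024 data. The ANOVA shows only that at least one group mean differs; it does not identify which groups differ or in which direction. Together with the box plot, the result suggests that a country’s level of economic development is associated with its growth rate. For policymakers and economists, this is consistent with the idea that developing economies often grow faster as they catch up with more developed countries.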
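
A significant ANOVA indicates only that some group means differ, so a post-hoc comparison such as Tukey's HSD is one common way to see which pairs of income groups differ. The sketch below illustrates the call on simulated data. The data frame `wdi_toy`, the group labels, and the simulated means are all hypothetical, and this follow-up step was not part of the original analysis.

```r
set.seed(42)

# Hypothetical stand-in for wdi_clean: four groups with simulated growth rates
wdi_toy <- data.frame(
  income_group = factor(rep(c("Group A", "Group B", "Group C", "Group D"), each = 15)),
  gdp_growth_annual_percent = c(
    rnorm(15, mean = 2.0, sd = 2),
    rnorm(15, mean = 3.0, sd = 2),
    rnorm(15, mean = 4.0, sd = 2),
    rnorm(15, mean = 5.0, sd = 2)
  )
)

toy_model <- aov(gdp_growth_annual_percent ~ income_group, data = wdi_toy)

# Pairwise differences in group means with family-wise 95% confidence intervals
tukey_result <- TukeyHSD(toy_model, conf.level = 0.95)
tukey_result$income_group
```

With the real data, the analogous call would be `TukeyHSD(anova_model)`. Pairs whose adjusted p-value is below 0.05, or whose interval excludes zero, would be the ones driving the overall result.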

Continuous Variable for regression

Explanatory (continuous): gross_savings_percent_of_gdp

ggplot(wdi_clean, aes(
  x = gross_savings_percent_of_gdp,
  y = gdp_growth_annual_percent
)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "GDP Growth vs Gross Savings (% of GDP)",
    x = "Gross Savings (%)",
    y = "GDP Growth (%)"
  ) +
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

Explanation

The scatter plot shows the relationship between gross savings (% of GDP) and GDP growth. The upward-sloping regression line suggests a positive relationship, meaning countries with higher savings rates may experience higher economic growth. However, the points are somewhat scattered, indicating that other factors may also influence GDP growth.

Linear Regression Model

lm_model <- lm(
  gdp_growth_annual_percent ~ gross_savings_percent_of_gdp,
  data = wdi_clean
)

summary(lm_model)

## 
## Call:
## lm(formula = gdp_growth_annual_percent ~ gross_savings_percent_of_gdp, 
##     data = wdi_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7063 -1.5161  0.0876  1.3936  6.1868 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                   0.58657    0.86787   0.676  0.50195   
## gross_savings_percent_of_gdp  0.10046    0.03469   2.896  0.00541 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.305 on 55 degrees of freedom
## Multiple R-squared:  0.1323, Adjusted R-squared:  0.1166 
## F-statistic: 8.389 on 1 and 55 DF,  p-value: 0.005409

Interpretation

The linear regression model examines the relationship between gross savings (% of GDP) and GDP growth. The intercept (0.587) represents the predicted GDP growth when the gross savings rate is zero; it serves mainly as a baseline and is not statistically distinguishable from zero (p = 0.502), so it has little practical meaning in this context. The coefficient for gross_savings_percent_of_gdp (0.100) indicates that each 1 percentage point increase in gross savings as a share of GDP is associated with an increase of approximately 0.10 percentage points in the predicted GDP growth rate. Because the model contains only this one predictor, the estimate does not control for any other factors.
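
To make the coefficients concrete, the fitted line can be written out and evaluated at two savings rates. The rates of 20% and 30% are chosen only for illustration and are not taken from the data.

\[\widehat{\text{GDP growth}} = 0.58657 + 0.10046 \times \text{gross savings}\]

\[\text{At } 20\%: \; 0.58657 + 0.10046 \times 20 \approx 2.60 \qquad \text{At } 30\%: \; 0.58657 + 0.10046 \times 30 \approx 3.60\]

A 10 percentage point difference in the savings rate therefore corresponds to a difference of about 1 percentage point in predicted GDP growth.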

Evaluation

The p-value for gross savings (0.00541) is less than 0.05, indicating that the relationship between gross savings and GDP growth is statistically significant. This suggests that higher savings rates are associated with higher economic growth. The R² value of 0.132 means that gross savings explain about 13.2% of the variation in GDP growth. While this shows some explanatory power, it leaves about 87% of the variation unexplained, which suggests that other factors such as investment, trade, labor markets, and policy conditions also influence economic growth.
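
The quantities in the regression output are linked, and they can be checked by hand. The t value is the estimate divided by its standard error, and in a one-predictor model the F-statistic is the square of that t value:

\[t = \frac{0.10046}{0.03469} \approx 2.896, \qquad F = t^2 \approx 8.39\]

An approximate 95% confidence interval for the slope follows from the same two numbers, assuming a critical value of about 2.004 for a t distribution with 55 degrees of freedom:

\[0.10046 \pm 2.004 \times 0.03469 \approx (0.031,\ 0.170)\]

The interval excludes zero, which agrees with the significant p-value, but it is wide relative to the estimate. The correlation implied by the R² value is modest, \(r = \sqrt{0.1323} \approx 0.36\).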

Recommendation

The regression results suggest a positive and statistically significant relationship between gross savings and GDP growth: in this 2024 cross-section, countries with higher savings rates tend to have higher economic growth. Policymakers may therefore consider encouraging national savings through financial incentives, stable banking systems, or investment-friendly policies, since increased savings can provide more capital for investment and economic expansion. Because the model is a single-predictor, single-year association with an R² of about 0.13, it does not by itself establish that raising savings would cause faster growth.