Final_Project

Load required packages

# tidyverse attaches readr, dplyr, and ggplot2, so they do not need separate library() calls
library(tidyverse)

Load the World Bank Dataset

# Read '..' entries (used for missing values in this file) as NA in every column.
world_bank <- read_csv("C:/Users/SP KHALID/Downloads/WDI- World Bank Dataset.csv", na = c(".."))
world_bank
# A tibble: 1,675 × 19
    Time `Time Code` `Country Name` `Country Code` Region         `Income Group`
   <dbl> <chr>       <chr>          <chr>          <chr>          <chr>         
 1  2000 YR2000      Brazil         BRA            Latin America… Upper middle …
 2  2000 YR2000      China          CHN            East Asia & P… Upper middle …
 3  2000 YR2000      France         FRA            Europe & Cent… High income   
 4  2000 YR2000      Germany        DEU            Europe & Cent… High income   
 5  2000 YR2000      India          IND            South Asia     Lower middle …
 6  2000 YR2000      Indonesia      IDN            East Asia & P… Upper middle …
 7  2000 YR2000      Italy          ITA            Europe & Cent… High income   
 8  2000 YR2000      Japan          JPN            East Asia & P… High income   
 9  2000 YR2000      Korea, Rep.    KOR            East Asia & P… High income   
10  2000 YR2000      Mexico         MEX            Latin America… Upper middle …
# ℹ 1,665 more rows
# ℹ 13 more variables: `GDP (constant 2015 US$)` <dbl>,
#   `GDP growth (annual %)` <dbl>, `GDP (current US$)` <dbl>,
#   `Unemployment, total (% of total labor force)` <dbl>,
#   `Inflation, consumer prices (annual %)` <dbl>, `Labor force, total` <dbl>,
#   `Population, total` <dbl>,
#   `Exports of goods and services (% of GDP)` <dbl>, …
dim(world_bank)
[1] 1675   19
# Check column data types
glimpse(world_bank)
Rows: 1,675
Columns: 19
$ Time                                                          <dbl> 2000, 20…
$ `Time Code`                                                   <chr> "YR2000"…
$ `Country Name`                                                <chr> "Brazil"…
$ `Country Code`                                                <chr> "BRA", "…
$ Region                                                        <chr> "Latin A…
$ `Income Group`                                                <chr> "Upper m…
$ `GDP (constant 2015 US$)`                                     <dbl> 1.18642e…
$ `GDP growth (annual %)`                                       <dbl> 4.387949…
$ `GDP (current US$)`                                           <dbl> 6.554482…
$ `Unemployment, total (% of total labor force)`                <dbl> NA, 3.70…
$ `Inflation, consumer prices (annual %)`                       <dbl> 7.044141…
$ `Labor force, total`                                          <dbl> 80295093…
$ `Population, total`                                           <dbl> 17401828…
$ `Exports of goods and services (% of GDP)`                    <dbl> 10.18805…
$ `Imports of goods and services (% of GDP)`                    <dbl> 12.45171…
$ `General government final consumption expenditure (% of GDP)` <dbl> 18.76784…
$ `Foreign direct investment, net inflows (% of GDP)`           <dbl> 5.033917…
$ `Gross savings (% of GDP)`                                    <dbl> 13.99170…
$ `Current account balance (% of GDP)`                          <dbl> -4.04774…
# Convert Time column to integer
world_bank$Time <- as.integer(world_bank$Time)
# Clean column names
library(janitor)
df <- world_bank |> clean_names()
glimpse(df)
Rows: 1,675
Columns: 19
$ time                                                            <int> 2000, …
$ time_code                                                       <chr> "YR200…
$ country_name                                                    <chr> "Brazi…
$ country_code                                                    <chr> "BRA",…
$ region                                                          <chr> "Latin…
$ income_group                                                    <chr> "Upper…
$ gdp_constant_2015_us                                            <dbl> 1.1864…
$ gdp_growth_annual_percent                                       <dbl> 4.3879…
$ gdp_current_us                                                  <dbl> 6.5544…
$ unemployment_total_percent_of_total_labor_force                 <dbl> NA, 3.…
$ inflation_consumer_prices_annual_percent                        <dbl> 7.0441…
$ labor_force_total                                               <dbl> 802950…
$ population_total                                                <dbl> 174018…
$ exports_of_goods_and_services_percent_of_gdp                    <dbl> 10.188…
$ imports_of_goods_and_services_percent_of_gdp                    <dbl> 12.451…
$ general_government_final_consumption_expenditure_percent_of_gdp <dbl> 18.767…
$ foreign_direct_investment_net_inflows_percent_of_gdp            <dbl> 5.0339…
$ gross_savings_percent_of_gdp                                    <dbl> 13.991…
$ current_account_balance_percent_of_gdp                          <dbl> -4.047…

Audience

This analysis is designed for international economic policymakers and global financial organizations (e.g., World Bank analysts or economic advisors) who want to understand how economic growth patterns differ across countries, income groups, and regions.

The goal is to support data-driven decisions related to economic development strategies and investment prioritization.

Objective

The primary objective of this project is to analyze the relationship between economic growth, GDP size, and trade patterns across countries.

Specifically, this project aims to answer:

  • How does GDP growth vary across income groups?

  • Do wealthier countries grow differently than developing economies?

  • How do trade indicators like exports relate to economic performance?

The ultimate goal is to derive insights that can inform economic policy and development strategies.

Data Description

The dataset is sourced from the World Bank’s World Development Indicators and includes country-level economic metrics over time.

Key variables used in this analysis include:

  • GDP (constant 2015 US$)

  • GDP growth (annual %)

  • Population

  • Income group classification

  • Exports of goods and services (% of GDP)

The dataset spans multiple countries and years, allowing for cross-sectional and time-series analysis of global economic trends.

Exploratory Data Analysis

1. GDP Level vs Growth (Scatter Plot)

This scatterplot compares each country’s average GDP (constant 2015 US$) with its average annual GDP growth over the sample period, colored by income group and sized by average population.

# Prepare data for the scatter plot: per-country means across all years
scatter_data <- df |>
  group_by(country_name, income_group) |>
  summarise(
    Avg_GDP_Growth = mean(gdp_growth_annual_percent, na.rm = TRUE),
    GDP_Constant_2015 = mean(gdp_constant_2015_us, na.rm = TRUE),
    Population = mean(population_total, na.rm = TRUE),
    .groups = "drop"
  )
ggplot(scatter_data, aes(x = Avg_GDP_Growth, y = GDP_Constant_2015, color = income_group, size = Population)) +
  geom_point(alpha = 0.6) +
  labs(
    title = "GDP Level vs Average GDP Growth",
    x = "Average GDP Growth (Annual %)",
    y = "GDP (Constant 2015 US$)",
    size = "Population",
    color = "Income Group"
  ) +
  theme_minimal()

Insight:

High-income countries (e.g., the United States) dominate in total GDP but tend to exhibit lower growth rates, while lower-income countries often show higher growth. This pattern is consistent with a convergence effect, in which developing economies grow faster than developed ones.
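
One way to put a number on this pattern is a rank correlation between average GDP level and average growth. The sketch below is only illustrative: `toy_scatter` is a hypothetical table with invented values (not WDI figures) that reuses the column names of `scatter_data`. With the real data, `scatter_data` would be passed instead. A negative coefficient would be consistent with larger economies growing more slowly, although it would not by itself establish convergence.

```r
# Hypothetical stand-in for scatter_data; all values are invented for illustration.
toy_scatter <- data.frame(
  country_name      = c("A", "B", "C", "D", "E", "F"),
  Avg_GDP_Growth    = c(1.2, 1.8, 2.5, 4.0, 5.5, 6.1),
  GDP_Constant_2015 = c(1.8e13, 4.0e12, 2.5e12, 9.0e11, 3.0e11, 1.0e11)
)

# Spearman's rank correlation is less sensitive to the heavy right skew of GDP levels
# than Pearson's correlation.
convergence_check <- cor.test(
  toy_scatter$GDP_Constant_2015,
  toy_scatter$Avg_GDP_Growth,
  method = "spearman"
)

convergence_check$estimate  # rho; negative when larger economies grow more slowly
convergence_check$p.value
```

Rank correlation is used here because a few very large economies would otherwise dominate a linear measure.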

2. Export Patterns by Income Group (Bar Chart)

This bar chart compares the average exports (% of GDP) across income groups.

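# Average export share per income group, pooled over all country-year rows
# (df is not grouped at this point, so no ungroup() is needed)
bar_data <- df |>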
  group_by(income_group) |>
  summarise(
    avg_exports = mean(exports_of_goods_and_services_percent_of_gdp, na.rm = TRUE),
    .groups = "drop"
  )
ggplot(bar_data, aes(x = reorder(income_group, avg_exports), y = avg_exports, fill = income_group)) +
  geom_col() +
  labs(
    title = "Average Exports (% of GDP) by Income Group",
    x = "Income Group",
    y = "Average Exports (% of GDP)",
    fill = "Income Group"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

Insight:

High-income countries have the highest export share, indicating strong global trade integration. Upper middle-income countries also show significant export activity, while lower middle-income countries lag behind. Low-income countries have the lowest export percentages, suggesting limited participation in international trade. This highlights a clear gap in trade capacity across income groups.
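
Because the bar chart averages over all country-year rows, countries with more non-missing years carry more weight in each group mean. A possible robustness check, sketched below in base R on a hypothetical table (`toy_exports`, invented values, not WDI figures), averages within each country first and then across countries in a group. With the real data, `df` would take the place of `toy_exports`.

```r
# Hypothetical country-year rows; all values are invented for illustration.
toy_exports <- data.frame(
  country_name = c("A", "A", "A", "A", "B", "C", "C"),
  income_group = c("High income", "High income", "High income", "High income",
                   "High income", "Low income", "Low income"),
  exports_of_goods_and_services_percent_of_gdp = c(40, 40, 40, 40, 10, 20, NA)
)

# Step 1: one mean per country, so a country with many years does not dominate.
# The formula interface of aggregate() drops rows with NA by default.
country_means <- aggregate(
  exports_of_goods_and_services_percent_of_gdp ~ country_name + income_group,
  data = toy_exports,
  FUN = mean
)

# Step 2: average the country means within each income group.
group_means <- aggregate(
  exports_of_goods_and_services_percent_of_gdp ~ income_group,
  data = country_means,
  FUN = mean
)
group_means
```

In this toy example the pooled mean for the high-income group would be pulled toward country A, which contributes four of the five rows, whereas the two-stage mean weights A and B equally.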

Key Observations

  • High-income countries have large economies but relatively stable and lower growth rates.
  • Lower and middle-income countries tend to grow faster but with greater variability.
  • Trade (exports as % of GDP) varies significantly across income groups, suggesting different economic structures.
  • Development level and growth appear to be inversely related; the planned analysis will test this formally.

Planned Analysis

To build on the exploratory findings, the following analyses are planned:

  • Hypothesis Testing: Test whether GDP growth differs significantly across income groups.

  • Regression Analysis: Model GDP growth as a function of:

    • GDP size
    • Exports (% of GDP)
    • Population

  • Comparative Analysis: Examine whether trade intensity influences economic growth differently across income groups.
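
The three planned analyses could be set up roughly as follows. This is a minimal sketch under stated assumptions, not the final specification: the data frame `toy` is simulated, its short column names (`gdp`, `exports_pct`, `population`, `gdp_growth`) are hypothetical stand-ins for the cleaned WDI columns, and the choice of a Kruskal-Wallis test and of log-transformed GDP and population is one reasonable option among several.

```r
# Hypothetical country-level data, simulated for illustration (not WDI figures).
set.seed(42)
n_per_group <- 20
n_total <- 4 * n_per_group
groups <- c("Low income", "Lower middle income", "Upper middle income", "High income")
toy <- data.frame(
  income_group = factor(rep(groups, each = n_per_group), levels = groups),
  gdp          = exp(rnorm(n_total, mean = rep(c(23, 25, 26, 28), each = n_per_group))),
  exports_pct  = runif(n_total, min = 10, max = 60),
  population   = exp(rnorm(n_total, mean = 16, sd = 1))
)
# Simulated growth: higher on average in lower-income groups, plus noise.
toy$gdp_growth <- rep(c(5, 4.5, 4, 2), each = n_per_group) + rnorm(n_total, sd = 1.5)

# Hypothesis test: does GDP growth differ across income groups?
# Kruskal-Wallis avoids the normality assumption of one-way ANOVA.
kw <- kruskal.test(gdp_growth ~ income_group, data = toy)
kw$p.value

# Regression: growth as a function of GDP size, exports, and population.
# Logs are used for GDP and population because both are strongly right-skewed.
fit <- lm(gdp_growth ~ log(gdp) + exports_pct + log(population), data = toy)
summary(fit)$coefficients

# Comparative analysis: allow the intercept and the export slope to differ by income group.
fit_int <- lm(gdp_growth ~ exports_pct * income_group + log(gdp) + log(population),
              data = toy)
anova(fit, fit_int)  # F-test: do the group-specific terms improve the fit?
```

If the analysis is run on country-year rows rather than country averages, repeated observations per country are not independent, so the standard errors from `lm()` would likely be too small and a panel or clustered approach may be more appropriate.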

Assumptions

  • The World Bank WDI dataset is assumed to be reliable and consistently collected across countries, though minor reporting differences may exist.

  • Missing values are assumed to be random or limited enough that they do not substantially bias the overall analysis, although some distortion is still possible.

  • World Bank income group classifications are assumed to be a reasonable way to represent a country’s level of economic development, even though countries within the same group can still be quite different.

  • Averaging values across years is assumed to provide a meaningful representation of long-term structural trends, while smoothing short-term fluctuations.

  • The analysis includes only countries with sufficiently complete data for key numerical variables; this selection may introduce sample bias toward better-documented or higher-income countries.
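
The missing-data and sample-bias assumptions above can be checked directly by comparing the share of missing values across income groups. The sketch below uses a hypothetical table (`toy_na`, invented values, not WDI figures) with the same column names as `df`. With the real data, `df` would be used instead.

```r
# Hypothetical country-year rows; all values are invented for illustration.
toy_na <- data.frame(
  income_group = c("High income", "High income", "High income", "High income",
                   "Low income", "Low income", "Low income", "Low income"),
  gdp_growth_annual_percent = c(2.1, 1.8, NA, 2.4, NA, NA, 5.0, 4.2)
)

# Share of missing growth values within each income group.
missing_share <- tapply(
  is.na(toy_na$gdp_growth_annual_percent),
  toy_na$income_group,
  mean
)
missing_share
```

Markedly different shares across groups would suggest that values are not missing at random and that group averages may be biased toward better-documented countries.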

Next Steps

  • Perform statistical tests to validate observed differences between income groups.
  • Develop regression models to quantify relationships between variables.
  • Refine visualizations to better communicate insights.
  • Translate findings into actionable economic recommendations.

Slides

View Slide Deck