Final_Project

Load required packages

library(tidyverse)  # attaches readr, dplyr, and ggplot2, among others

Load the World Bank Dataset

# Read the CSV, treating the World Bank's '..' placeholder as NA
world_bank <- read_csv("C:/Users/SP KHALID/Downloads/WDI- World Bank Dataset.csv", na = c(".."))
world_bank

# A tibble: 1,675 × 19
    Time `Time Code` `Country Name` `Country Code` Region         `Income Group`
   <dbl> <chr>       <chr>          <chr>          <chr>          <chr>         
 1  2000 YR2000      Brazil         BRA            Latin America… Upper middle …
 2  2000 YR2000      China          CHN            East Asia & P… Upper middle …
 3  2000 YR2000      France         FRA            Europe & Cent… High income   
 4  2000 YR2000      Germany        DEU            Europe & Cent… High income   
 5  2000 YR2000      India          IND            South Asia     Lower middle …
 6  2000 YR2000      Indonesia      IDN            East Asia & P… Upper middle …
 7  2000 YR2000      Italy          ITA            Europe & Cent… High income   
 8  2000 YR2000      Japan          JPN            East Asia & P… High income   
 9  2000 YR2000      Korea, Rep.    KOR            East Asia & P… High income   
10  2000 YR2000      Mexico         MEX            Latin America… Upper middle …
# ℹ 1,665 more rows
# ℹ 13 more variables: `GDP (constant 2015 US$)` <dbl>,
#   `GDP growth (annual %)` <dbl>, `GDP (current US$)` <dbl>,
#   `Unemployment, total (% of total labor force)` <dbl>,
#   `Inflation, consumer prices (annual %)` <dbl>, `Labor force, total` <dbl>,
#   `Population, total` <dbl>,
#   `Exports of goods and services (% of GDP)` <dbl>, …

dim(world_bank)

[1] 1675   19

# Check column data types
glimpse(world_bank)

Rows: 1,675
Columns: 19
$ Time                                                          <dbl> 2000, 20…
$ `Time Code`                                                   <chr> "YR2000"…
$ `Country Name`                                                <chr> "Brazil"…
$ `Country Code`                                                <chr> "BRA", "…
$ Region                                                        <chr> "Latin A…
$ `Income Group`                                                <chr> "Upper m…
$ `GDP (constant 2015 US$)`                                     <dbl> 1.18642e…
$ `GDP growth (annual %)`                                       <dbl> 4.387949…
$ `GDP (current US$)`                                           <dbl> 6.554482…
$ `Unemployment, total (% of total labor force)`                <dbl> NA, 3.70…
$ `Inflation, consumer prices (annual %)`                       <dbl> 7.044141…
$ `Labor force, total`                                          <dbl> 80295093…
$ `Population, total`                                           <dbl> 17401828…
$ `Exports of goods and services (% of GDP)`                    <dbl> 10.18805…
$ `Imports of goods and services (% of GDP)`                    <dbl> 12.45171…
$ `General government final consumption expenditure (% of GDP)` <dbl> 18.76784…
$ `Foreign direct investment, net inflows (% of GDP)`           <dbl> 5.033917…
$ `Gross savings (% of GDP)`                                    <dbl> 13.99170…
$ `Current account balance (% of GDP)`                          <dbl> -4.04774…

# Convert Time column to integer
world_bank$Time <- as.integer(world_bank$Time)

# Clean column names
library(janitor)
df <- world_bank |> clean_names()
glimpse(df)

Rows: 1,675
Columns: 19
$ time                                                            <int> 2000, …
$ time_code                                                       <chr> "YR200…
$ country_name                                                    <chr> "Brazi…
$ country_code                                                    <chr> "BRA",…
$ region                                                          <chr> "Latin…
$ income_group                                                    <chr> "Upper…
$ gdp_constant_2015_us                                            <dbl> 1.1864…
$ gdp_growth_annual_percent                                       <dbl> 4.3879…
$ gdp_current_us                                                  <dbl> 6.5544…
$ unemployment_total_percent_of_total_labor_force                 <dbl> NA, 3.…
$ inflation_consumer_prices_annual_percent                        <dbl> 7.0441…
$ labor_force_total                                               <dbl> 802950…
$ population_total                                                <dbl> 174018…
$ exports_of_goods_and_services_percent_of_gdp                    <dbl> 10.188…
$ imports_of_goods_and_services_percent_of_gdp                    <dbl> 12.451…
$ general_government_final_consumption_expenditure_percent_of_gdp <dbl> 18.767…
$ foreign_direct_investment_net_inflows_percent_of_gdp            <dbl> 5.0339…
$ gross_savings_percent_of_gdp                                    <dbl> 13.991…
$ current_account_balance_percent_of_gdp                          <dbl> -4.047…

Audience

This analysis is designed for international economic policymakers and global financial organizations (e.g., World Bank analysts or economic advisors) who are interested in understanding how economic growth patterns differ across countries, income groups and regions.

The goal is to support data-driven decisions related to economic development strategies and investment prioritization.

Objective

The primary objective of this project is to analyze the relationship between economic growth, GDP size, and trade patterns across countries.

Specifically, this project aims to answer:

How does GDP growth vary across income groups?
Do wealthier countries grow differently than developing economies?
How do trade indicators like exports relate to economic performance?

The ultimate goal is to derive insights that can inform economic policy and development strategies.

Data Description

The dataset is sourced from the World Bank’s World Development Indicators and includes country-level economic metrics over time.

Key variables used in this analysis include:

GDP (constant 2015 US$)
GDP growth (annual %)
Population
Income group classification
Exports of goods and services (% of GDP)

The dataset spans multiple countries and years, allowing for cross-sectional and time-series analysis of global economic trends.
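
Before relying on the panel for time-series comparisons, it can help to confirm how many countries and years it actually covers and whether every country has the same number of observations. The following is a minimal sketch on a small made-up data frame (`toy_panel`, a hypothetical stand-in that only mimics two of the cleaned column names); on the real data, `df` would take its place.

```r
library(dplyr)

# Toy stand-in for the cleaned data frame `df` (values are illustrative only)
toy_panel <- data.frame(
  country_name = c("Brazil", "Brazil", "India", "India", "India"),
  time         = c(2000L, 2001L, 2000L, 2001L, 2002L)
)

# Overall coverage: number of countries and the span of years
coverage <- toy_panel |>
  summarise(
    n_countries = n_distinct(country_name),
    first_year  = min(time),
    last_year   = max(time)
  )

# Observations per country, to spot an unbalanced panel
obs_per_country <- toy_panel |>
  count(country_name, name = "n_years")
```

If `n_years` differs across countries, per-country averages are computed over different periods, which is worth keeping in mind when comparing them.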

Exploratory Data Analysis

1. GDP Level vs Growth (Scatter Plot)

This scatterplot plots each country’s average GDP (constant 2015 US$) against its average annual GDP growth over the sample period, with points colored by income group and sized by average population.

# Prepare data for the scatter plot: per-country means across all years
scatter_data <- df |>
  group_by(country_name, income_group) |>
  summarise(
    Avg_GDP_Growth = mean(gdp_growth_annual_percent, na.rm = TRUE),
    GDP_Constant_2015 = mean(gdp_constant_2015_us, na.rm = TRUE),
    Population = mean(population_total, na.rm = TRUE),
    .groups = "drop"
  )

ggplot(scatter_data, aes(x = Avg_GDP_Growth, y = GDP_Constant_2015, color = income_group, size = Population)) +
  geom_point(alpha = 0.6) +
  labs(
    title = "GDP Level vs Average GDP Growth",
    x = "Average GDP Growth (Annual %)",
    y = "GDP (Constant 2015 US$)",
    size = "Population",
    color = "Income Group"
  ) +
  theme_minimal()

Insight:

High-income countries (e.g., the United States) dominate in total GDP but tend to grow more slowly, while lower-income countries often grow faster. This pattern is consistent with a convergence effect, in which developing economies grow faster than developed ones.
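
Convergence is usually assessed with GDP per capita rather than total GDP, because total GDP also reflects population size. One way to probe the pattern is a rank correlation between average GDP per capita and average growth. The sketch below uses a made-up data frame (`toy_scatter`, hypothetical values) with the same column names as `scatter_data`; the derived `gdp_per_capita` column is an assumption of this example and not part of the original analysis.

```r
# Toy country-level averages standing in for `scatter_data` (illustrative values only)
toy_scatter <- data.frame(
  country_name      = c("A", "B", "C", "D", "E"),
  Avg_GDP_Growth    = c(1.5, 2.0, 4.5, 6.0, 5.0),
  GDP_Constant_2015 = c(2e13, 4e12, 2e12, 1e12, 3e11),
  Population        = c(3e8, 8e7, 2e8, 1.2e9, 1e8)
)

# Derive GDP per capita from the averaged GDP and population columns
toy_scatter$gdp_per_capita <- toy_scatter$GDP_Constant_2015 / toy_scatter$Population

# Spearman rank correlation: robust to the heavy right skew of GDP figures.
# A negative value would be in line with poorer countries growing faster.
rho <- cor(
  toy_scatter$gdp_per_capita,
  toy_scatter$Avg_GDP_Growth,
  method = "spearman"
)
```

A rank correlation is used here because GDP levels span several orders of magnitude, so a Pearson correlation on raw values would be dominated by the largest economies.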

2. GDP Growth Trends Over Time (Line Chart)

This line chart shows how GDP growth has evolved over time across different income groups.

# Prepare data: average GDP growth per year per income group
line_data <- df |>
  group_by(time, income_group) |>
  summarise(
    avg_gdp_growth = mean(gdp_growth_annual_percent, na.rm = TRUE),
    .groups = "drop"
  )

ggplot(line_data, aes(x = time, y = avg_gdp_growth, color = income_group)) +
  geom_line(linewidth = 1) +  # `size` is deprecated for lines since ggplot2 3.4.0
  labs(
    title = "GDP Growth Trends Over Time by Income Group",
    x = "Year",
    y = "Average GDP Growth (%)",
    color = "Income Group"
  ) +
  theme_minimal()

Insight:

High-income countries consistently show the lowest average annual GDP growth, while low-income countries show the highest. Growth nonetheless trends downward in every income group, which suggests that global economic growth has slowed over time, particularly among more developed economies. All groups dip sharply around 2020, pointing to a global shock that hit every economy (the timing coincides with the COVID-19 pandemic).
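
The downward trend can be quantified by fitting a simple linear trend of average growth on year within each income group and comparing the slopes. This is only a sketch: `toy_line` is a made-up stand-in for `line_data` with the same column names, and a straight-line fit is an assumption that ignores shocks such as the 2020 dip.

```r
# Toy yearly averages standing in for `line_data` (illustrative values only)
toy_line <- data.frame(
  time           = rep(2000:2004, times = 2),
  income_group   = rep(c("High income", "Low income"), each = 5),
  avg_gdp_growth = c(3.0, 2.8, 2.6, 2.4, 2.2, 6.0, 6.1, 5.9, 6.0, 6.0)
)

# Fit avg_gdp_growth ~ time separately for each income group and keep the
# slope, i.e. the average change in growth (percentage points) per year
trend_slopes <- sapply(
  split(toy_line, toy_line$income_group),
  function(g) coef(lm(avg_gdp_growth ~ time, data = g))[["time"]]
)
```

A negative slope for a group indicates that its average growth fell over the period; comparing slopes across groups shows where the slowdown is most pronounced.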

3. Export Patterns by Income Group (Bar Chart)

This bar chart compares the average exports (% of GDP) across income groups.

# Prepare data: average exports (% of GDP) per income group
bar_data <- df |>
  group_by(income_group) |>
  summarise(
    avg_exports = mean(exports_of_goods_and_services_percent_of_gdp, na.rm = TRUE),
    .groups = "drop"
  )

ggplot(bar_data, aes(x = reorder(income_group, avg_exports), y = avg_exports, fill = income_group)) +
  geom_col() +
  labs(
    title = "Average Exports (% of GDP) by Income Group",
    x = "Income Group",
    y = "Average Exports (% of GDP)",
    fill = "Income Group"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

Insight:

High-income countries have the highest export share, indicating strong global trade integration. Upper middle-income countries also show significant export activity, while lower middle-income countries lag behind. Low-income countries have the lowest export percentages, suggesting limited participation in international trade. This highlights a clear gap in trade capacity across income groups.
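
Group means of export shares can be pulled upward by a few very open economies, so it may be worth reporting the median and the number of non-missing observations alongside the mean. A minimal sketch, using a made-up data frame (`toy_exports`, hypothetical values) that reuses the cleaned column names:

```r
library(dplyr)

# Toy stand-in for `df` (illustrative values only; note the outlier and the NA)
toy_exports <- data.frame(
  income_group = c("High income", "High income", "High income",
                   "Low income", "Low income", "Low income"),
  exports_of_goods_and_services_percent_of_gdp = c(25, 30, 180, 15, 20, NA)
)

# Mean, median, and count of non-missing values per income group
export_summary <- toy_exports |>
  group_by(income_group) |>
  summarise(
    mean_exports   = mean(exports_of_goods_and_services_percent_of_gdp, na.rm = TRUE),
    median_exports = median(exports_of_goods_and_services_percent_of_gdp, na.rm = TRUE),
    n_obs          = sum(!is.na(exports_of_goods_and_services_percent_of_gdp)),
    .groups = "drop"
  )
```

A large gap between the mean and the median within a group signals that the ranking in the bar chart may depend on a handful of observations.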

Key Observations

High-income countries have large economies but relatively stable and lower growth rates.
Lower and middle-income countries tend to grow faster but with greater variability.
Trade (exports as % of GDP) varies significantly across income groups, suggesting different economic structures.
Growth appears to decline as income level rises, which points to a possible link between development level and growth potential; this has not yet been formally tested.

Planned Analysis

To build on the exploratory findings, the following analyses are planned:

Hypothesis Testing: Test whether GDP growth differs significantly across income groups.
Regression Analysis: Model GDP growth as a function of:
- GDP size
- Exports (% of GDP)
- Population
Comparative Analysis: Examine whether trade intensity influences economic growth differently across income groups.
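
The three planned analyses could be sketched as follows. Everything here is an assumption for illustration: the data are simulated (`toy_country`, with hypothetical columns `gdp`, `exports`, `population`, and `growth`), the Kruskal-Wallis test is one possible choice among several (a one-way ANOVA would be another), and the log transforms of GDP and population are a modeling choice, not something the plan above specifies.

```r
set.seed(42)  # reproducible simulated data

# Simulated country-level data (not real World Bank figures)
groups <- c("High income", "Upper middle income", "Lower middle income")
toy_country <- data.frame(
  income_group = rep(groups, each = 10),
  gdp          = exp(rnorm(30, mean = 27, sd = 1)),
  exports      = runif(30, min = 10, max = 60),
  population   = exp(rnorm(30, mean = 17, sd = 1))
)
toy_country$growth <- rep(c(1.5, 4, 5), each = 10) + rnorm(30, sd = 1)

# 1. Hypothesis test: does growth differ across income groups?
#    Kruskal-Wallis is rank-based and does not assume normal residuals.
kw <- kruskal.test(growth ~ income_group, data = toy_country)

# 2. Regression: growth as a function of GDP size, exports, and population.
#    GDP and population are logged because both are heavily right-skewed.
fit <- lm(growth ~ log(gdp) + exports + log(population), data = toy_country)

# 3. Comparative analysis: let the exports slope vary by income group
#    through an interaction term.
fit_int <- lm(growth ~ exports * income_group, data = toy_country)
```

`kw$p.value`, `summary(fit)`, and `summary(fit_int)` would then give the test result and coefficient tables. With panel data, country-level averages (as simulated here) sidestep repeated observations per country; working on the full country-year panel would call for methods that account for that dependence.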

Assumptions

The World Bank WDI dataset is assumed to be reliable and consistently collected across countries, though minor reporting differences may exist.
Missing values are assumed to be random or limited enough that they do not substantially bias the overall analysis, although some distortion is still possible.
World Bank income group classifications are assumed to be a reasonable way to represent a country’s level of economic development, even though countries within the same group can still be quite different.
Averaging values across years is assumed to provide a meaningful representation of long-term structural trends, while smoothing short-term fluctuations.
The analysis includes only countries with sufficiently complete data for key numerical variables; this selection may introduce sample bias toward better-documented or higher-income countries.
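
The assumption about missing values can be checked directly by measuring how much is missing per column and whether missingness is concentrated in particular income groups. A minimal sketch on a made-up data frame (`toy_missing`, hypothetical values) that reuses the cleaned column names:

```r
# Toy stand-in for `df` (illustrative values only)
toy_missing <- data.frame(
  income_group = c("High income", "High income", "Low income", "Low income"),
  gdp_growth_annual_percent = c(2.1, 1.8, NA, 5.2),
  exports_of_goods_and_services_percent_of_gdp = c(30, NA, NA, 18)
)

# Share of missing values in each column
na_share <- colMeans(is.na(toy_missing))

# Share of missing GDP growth values within each income group
na_by_group <- tapply(
  is.na(toy_missing$gdp_growth_annual_percent),
  toy_missing$income_group,
  mean
)
```

If the share of missing values differs markedly across income groups, the missing-at-random assumption becomes harder to defend and group comparisons should be read with more caution.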

Next Steps

Perform statistical tests to validate observed differences between income groups.
Develop regression models to quantify relationships between variables.
Refine visualizations to better communicate insights.
Translate findings into actionable economic recommendations.

Slides

View Slide Deck