Final_Project
Load required packages
library(readr)
library(dplyr)
library(tidyverse)
Load the World Bank Dataset
# Read '..' (the marker used for missing values in this file) as NA in every column.
world_bank <- read_csv("C:/Users/SP KHALID/Downloads/WDI- World Bank Dataset.csv", na = c(".."))
world_bank
# A tibble: 1,675 × 19
Time `Time Code` `Country Name` `Country Code` Region `Income Group`
<dbl> <chr> <chr> <chr> <chr> <chr>
1 2000 YR2000 Brazil BRA Latin America… Upper middle …
2 2000 YR2000 China CHN East Asia & P… Upper middle …
3 2000 YR2000 France FRA Europe & Cent… High income
4 2000 YR2000 Germany DEU Europe & Cent… High income
5 2000 YR2000 India IND South Asia Lower middle …
6 2000 YR2000 Indonesia IDN East Asia & P… Upper middle …
7 2000 YR2000 Italy ITA Europe & Cent… High income
8 2000 YR2000 Japan JPN East Asia & P… High income
9 2000 YR2000 Korea, Rep. KOR East Asia & P… High income
10 2000 YR2000 Mexico MEX Latin America… Upper middle …
# ℹ 1,665 more rows
# ℹ 13 more variables: `GDP (constant 2015 US$)` <dbl>,
# `GDP growth (annual %)` <dbl>, `GDP (current US$)` <dbl>,
# `Unemployment, total (% of total labor force)` <dbl>,
# `Inflation, consumer prices (annual %)` <dbl>, `Labor force, total` <dbl>,
# `Population, total` <dbl>,
# `Exports of goods and services (% of GDP)` <dbl>, …
dim(world_bank)
[1] 1675 19
# Check column data types
glimpse(world_bank)
Rows: 1,675
Columns: 19
$ Time <dbl> 2000, 20…
$ `Time Code` <chr> "YR2000"…
$ `Country Name` <chr> "Brazil"…
$ `Country Code` <chr> "BRA", "…
$ Region <chr> "Latin A…
$ `Income Group` <chr> "Upper m…
$ `GDP (constant 2015 US$)` <dbl> 1.18642e…
$ `GDP growth (annual %)` <dbl> 4.387949…
$ `GDP (current US$)` <dbl> 6.554482…
$ `Unemployment, total (% of total labor force)` <dbl> NA, 3.70…
$ `Inflation, consumer prices (annual %)` <dbl> 7.044141…
$ `Labor force, total` <dbl> 80295093…
$ `Population, total` <dbl> 17401828…
$ `Exports of goods and services (% of GDP)` <dbl> 10.18805…
$ `Imports of goods and services (% of GDP)` <dbl> 12.45171…
$ `General government final consumption expenditure (% of GDP)` <dbl> 18.76784…
$ `Foreign direct investment, net inflows (% of GDP)` <dbl> 5.033917…
$ `Gross savings (% of GDP)` <dbl> 13.99170…
$ `Current account balance (% of GDP)` <dbl> -4.04774…
# Convert Time column to integer
world_bank$Time <- as.integer(world_bank$Time)
# Clean column names
library(janitor)
df <- world_bank |> clean_names()
glimpse(df)
Rows: 1,675
Columns: 19
$ time <int> 2000, …
$ time_code <chr> "YR200…
$ country_name <chr> "Brazi…
$ country_code <chr> "BRA",…
$ region <chr> "Latin…
$ income_group <chr> "Upper…
$ gdp_constant_2015_us <dbl> 1.1864…
$ gdp_growth_annual_percent <dbl> 4.3879…
$ gdp_current_us <dbl> 6.5544…
$ unemployment_total_percent_of_total_labor_force <dbl> NA, 3.…
$ inflation_consumer_prices_annual_percent <dbl> 7.0441…
$ labor_force_total <dbl> 802950…
$ population_total <dbl> 174018…
$ exports_of_goods_and_services_percent_of_gdp <dbl> 10.188…
$ imports_of_goods_and_services_percent_of_gdp <dbl> 12.451…
$ general_government_final_consumption_expenditure_percent_of_gdp <dbl> 18.767…
$ foreign_direct_investment_net_inflows_percent_of_gdp <dbl> 5.0339…
$ gross_savings_percent_of_gdp <dbl> 13.991…
$ current_account_balance_percent_of_gdp <dbl> -4.047…
Audience
This analysis is designed for international economic policymakers and global financial organizations (e.g., World Bank analysts or economic advisors) who are interested in understanding how economic growth patterns differ across countries, income groups and regions.
The goal is to support data-driven decisions related to economic development strategies and investment prioritization.
Objective
The primary objective of this project is to analyze the relationship between economic growth, GDP size, and trade patterns across countries.
Specifically, this project aims to answer:
- How does GDP growth vary across income groups?
- Do wealthier countries grow differently than developing economies?
- How do trade indicators like exports relate to economic performance?
The ultimate goal is to derive insights that can inform economic policy and development strategies.
Data Description
The dataset is sourced from the World Bank’s World Development Indicators and includes country-level economic metrics over time.
Key variables used in this analysis include:
- GDP (constant 2015 US$)
- GDP growth (annual %)
- Population
- Income group classification
- Exports of goods and services (% of GDP)
The dataset spans multiple countries and years, allowing for cross-sectional and time-series analysis of global economic trends.
Exploratory Data Analysis
1. GDP Level vs Growth (Scatter Plot)
This scatterplot compares each country's average GDP (constant 2015 US$) with its average annual GDP growth over the sample period, with points colored by income group and sized by average population.
# Prepare data for the scatter plot: per-country means across all years
scatter_data <- df |>
  group_by(country_name, income_group) |>
  summarise(
    Avg_GDP_Growth = mean(gdp_growth_annual_percent, na.rm = TRUE),
    GDP_Constant_2015 = mean(gdp_constant_2015_us, na.rm = TRUE),
    Population = mean(population_total, na.rm = TRUE),
    .groups = "drop"
  )
ggplot(scatter_data, aes(x = Avg_GDP_Growth, y = GDP_Constant_2015, color = income_group, size = Population)) +
  geom_point(alpha = 0.6) +
  labs(
    title = "GDP Level vs Average GDP Growth",
    x = "Average GDP Growth (Annual %)",
    y = "GDP (Constant 2015 US$)",
    size = "Population",
    color = "Income Group"
  ) +
  theme_minimal()
Insight:
High-income countries (e.g., the United States) dominate in total GDP but tend to exhibit lower growth rates, while lower-income countries often show higher growth. This pattern is consistent with a convergence effect, in which developing economies grow faster than developed ones.
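One way to put a number on the pattern described in this insight is a rank correlation between GDP level and average growth. The sketch below is illustrative only: `toy_scatter` and `convergence_check()` are hypothetical names, the values are made up, and the choice of Spearman's rho is an assumption rather than the method used in this project. Note also that convergence is usually assessed on per-capita income, whereas this plot uses total GDP.

```r
# Hypothetical stand-in for `scatter_data`; all values are invented for illustration.
toy_scatter <- data.frame(
  country_name      = c("A", "B", "C", "D", "E", "F"),
  Avg_GDP_Growth    = c(1.5, 1.8, 3.9, 4.4, 5.6, 6.1),
  GDP_Constant_2015 = c(4e12, 2e12, 9e11, 5e11, 2e11, 8e10)
)

# Rank-based correlation between GDP level and average growth.
# Spearman is chosen here (an assumption) because GDP levels are heavily right-skewed.
convergence_check <- function(data) {
  cor.test(data$GDP_Constant_2015, data$Avg_GDP_Growth, method = "spearman")
}

result <- convergence_check(toy_scatter)
result$estimate  # a negative rho is consistent with larger economies growing more slowly
result$p.value
```

Applied to the real data, the same function could be called as `convergence_check(scatter_data)`.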
2. GDP Growth Trends Over Time (Line Chart)
This line chart shows how GDP growth has evolved over time across different income groups.
# Prepare data: average GDP growth per year per income group
line_data <- df |>
  group_by(time, income_group) |>
  summarise(
    avg_gdp_growth = mean(gdp_growth_annual_percent, na.rm = TRUE),
    .groups = "drop"
  )
ggplot(line_data, aes(x = time, y = avg_gdp_growth, color = income_group)) +
  # `linewidth` replaces the deprecated `size` argument for lines (ggplot2 >= 3.4.0)
  geom_line(linewidth = 1) +
  labs(
    title = "GDP Growth Trends Over Time by Income Group",
    x = "Year",
    y = "Average GDP Growth (%)",
    color = "Income Group"
  ) +
  theme_minimal()
Insight:
High-income countries consistently exhibit the lowest average annual GDP growth, while low-income countries show the highest. Despite these differences, growth trends downward in all income groups, suggesting that global economic growth has slowed over time, particularly among more developed economies. A noticeable dip around 2020 in every group points to a global economic shock.
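The downward trend read off the chart could be checked by fitting a straight line to each income group's yearly averages and inspecting the slope. This is a minimal sketch under stated assumptions: `toy_line` is a hypothetical stand-in for `line_data` with invented values, and a simple linear trend is only one of several reasonable summaries (it ignores one-off shocks such as the 2020 dip).

```r
library(dplyr)

# Hypothetical stand-in for `line_data`; all values are invented for illustration.
toy_line <- data.frame(
  time           = rep(2000:2004, times = 2),
  income_group   = rep(c("High income", "Low income"), each = 5),
  avg_gdp_growth = c(3.0, 2.8, 2.5, 2.4, 2.0, 6.0, 5.9, 5.5, 5.6, 5.1)
)

# Ordinary least squares slope of average growth on year, one per income group.
# A negative slope means average growth fell over the period.
trend_by_group <- toy_line |>
  group_by(income_group) |>
  summarise(
    slope_per_year = coef(lm(avg_gdp_growth ~ time))[["time"]],
    .groups = "drop"
  )

trend_by_group
```

The slope is expressed in percentage points of growth per year, so groups can be compared directly.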
3. Export Patterns by Income Group (Bar Chart)
This bar chart compares the average exports (% of GDP) across income groups.
bar_data <- df |>
  ungroup() |>
  group_by(income_group) |>
  summarise(
    avg_exports = mean(exports_of_goods_and_services_percent_of_gdp, na.rm = TRUE),
    .groups = "drop"
  )
ggplot(bar_data, aes(x = reorder(income_group, avg_exports), y = avg_exports, fill = income_group)) +
  geom_col() +
  labs(
    title = "Average Exports (% of GDP) by Income Group",
    x = "Income Group",
    y = "Average Exports (% of GDP)",
    fill = "Income Group"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
Insight:
High-income countries have the highest export share, indicating strong global trade integration. Upper middle-income countries also show significant export activity, while lower middle-income countries lag behind. Low-income countries have the lowest export percentages, suggesting limited participation in international trade. This highlights a clear gap in trade capacity across income groups.
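The group averages above pool all country-year rows, so a country with more reported years carries more weight. As a possible robustness check, the sketch below averages within each country first and then across countries. The data frame `toy_df` is hypothetical and its values are invented; only the column names follow the cleaned dataset.

```r
library(dplyr)

# Hypothetical data: country A reports three years, country B only one.
toy_df <- data.frame(
  country_name = c("A", "A", "A", "B"),
  income_group = "High income",
  exports_of_goods_and_services_percent_of_gdp = c(20, 22, 24, 60)
)

# Pooled mean over country-years (the approach used for the bar chart).
pooled <- toy_df |>
  group_by(income_group) |>
  summarise(
    avg_exports = mean(exports_of_goods_and_services_percent_of_gdp, na.rm = TRUE),
    .groups = "drop"
  )

# Two-step mean: average within each country, then across countries,
# so every country counts once regardless of how many years it reports.
country_level <- toy_df |>
  group_by(income_group, country_name) |>
  summarise(
    country_avg = mean(exports_of_goods_and_services_percent_of_gdp, na.rm = TRUE),
    .groups = "drop"
  ) |>
  group_by(income_group) |>
  summarise(avg_exports = mean(country_avg), n_countries = n(), .groups = "drop")

pooled
country_level
```

If the two versions rank the income groups the same way on the real data, the conclusion does not depend on this weighting choice.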
Key Observations
- High-income countries have large economies but relatively stable and lower growth rates.
- Lower- and middle-income countries tend to grow faster but with greater variability.
- Trade (exports as % of GDP) varies significantly across income groups, suggesting different economic structures.
- There may be a relationship between economic development level and growth potential.
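The observation about variability can be backed by a simple summary of spread per income group. The following is a sketch, not part of the original analysis: `toy_df` is a hypothetical stand-in for `df` with invented values, and the standard deviation is just one possible measure of variability.

```r
library(dplyr)

# Hypothetical stand-in for `df`; all values are invented for illustration.
toy_df <- data.frame(
  income_group = rep(c("High income", "Lower middle income"), each = 4),
  gdp_growth_annual_percent = c(1.5, 2.0, 1.8, NA, 7.0, 2.0, 5.5, -1.0)
)

# Mean, standard deviation, and number of non-missing observations per group.
growth_spread <- toy_df |>
  group_by(income_group) |>
  summarise(
    mean_growth = mean(gdp_growth_annual_percent, na.rm = TRUE),
    sd_growth   = sd(gdp_growth_annual_percent, na.rm = TRUE),
    n_obs       = sum(!is.na(gdp_growth_annual_percent)),
    .groups = "drop"
  )

growth_spread
```

Reporting `n_obs` alongside the spread helps flag groups whose estimates rest on few observations.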
Planned Analysis
To build on the exploratory findings, the following analyses are planned:
Hypothesis Testing: Test whether GDP growth differs significantly across income groups.
Regression Analysis: Model GDP growth as a function of:
- GDP size
- Exports (% of GDP)
- Population
Comparative Analysis: Examine whether trade intensity influences economic growth differently across income groups.
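The regression and comparative analysis outlined above could take roughly the following shape. This is a sketch under stated assumptions, not the final model: `toy_panel` is a hypothetical data frame with invented values, the log transforms of GDP and population are an assumption to tame skew, and the interaction term is one possible way to let the export effect differ by income group.

```r
# Hypothetical stand-in for `df`; all values are invented for illustration.
toy_panel <- data.frame(
  income_group = rep(c("High income", "Lower middle income"), each = 6),
  gdp_growth_annual_percent = c(1.2, 2.1, 0.8, 1.9, 2.5, 1.4,
                                5.1, 6.3, 4.2, 7.0, 3.8, 5.5),
  gdp_constant_2015_us = c(3.1e12, 1.8e12, 4.0e12, 2.2e12, 9.5e11, 2.9e12,
                           3.0e11, 1.2e11, 5.5e11, 8.0e10, 7.1e11, 2.4e11),
  exports_of_goods_and_services_percent_of_gdp = c(28, 45, 22, 31, 52, 30,
                                                   19, 25, 15, 33, 12, 21),
  population_total = c(6.0e7, 8.2e7, 1.3e8, 5.1e7, 1.7e7, 6.6e7,
                       2.1e8, 9.0e7, 1.4e9, 3.0e7, 1.6e8, 5.0e7)
)

# GDP growth as a function of GDP size, population, and exports.
# `exports * income_group` expands to both main effects plus their interaction,
# so the export slope is allowed to differ between income groups.
fit <- lm(
  gdp_growth_annual_percent ~
    log(gdp_constant_2015_us) +
    log(population_total) +
    exports_of_goods_and_services_percent_of_gdp * income_group,
  data = toy_panel
)

summary(fit)
```

On the real data, repeated observations per country are not independent, so standard errors from a plain `lm()` fit should be interpreted with caution.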
Assumptions
The World Bank WDI dataset is assumed to be reliable and consistently collected across countries, though minor reporting differences may exist.
Missing values are assumed to be random or limited enough that they do not substantially bias the overall analysis, although some distortion is still possible.
World Bank income group classifications are assumed to be a reasonable way to represent a country’s level of economic development, even though countries within the same group can still be quite different.
Averaging values across years is assumed to provide a meaningful representation of long-term structural trends, while smoothing short-term fluctuations.
The analysis includes only countries with sufficiently complete data for key numerical variables; this selection may introduce sample bias toward better-documented or higher-income countries.
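The assumptions about missing values and sample bias can be examined directly by comparing the share of missing values across income groups. The sketch below uses a hypothetical data frame `toy_df` with invented values; only the column names follow the cleaned dataset.

```r
library(dplyr)

# Hypothetical stand-in for `df`; all values are invented for illustration.
toy_df <- data.frame(
  income_group = c("High income", "High income",
                   "Low income", "Low income", "Low income"),
  gdp_growth_annual_percent = c(1.2, 2.0, NA, 4.5, NA),
  exports_of_goods_and_services_percent_of_gdp = c(30, NA, NA, 18, 20)
)

# Share of missing values in every numeric column, per income group.
# Large differences between groups would suggest missingness is not random.
missing_share <- toy_df |>
  group_by(income_group) |>
  summarise(across(where(is.numeric), ~ mean(is.na(.x))), .groups = "drop")

missing_share
```

If lower-income groups show markedly higher missing shares on the real data, the averages reported for those groups rest on a narrower and possibly unrepresentative sample.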
Next Steps
- Perform statistical tests to validate observed differences between income groups.
- Develop regression models to quantify relationships between variables.
- Refine visualizations to better communicate insights.
- Translate findings into actionable economic recommendations.
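For the first step listed above, a test of whether growth differs across income groups might look like the sketch below. The choice of a Kruskal-Wallis test is an assumption (it avoids the normality requirement of a one-way ANOVA), and `toy_countries` is a hypothetical stand-in for the per-country averages with invented values.

```r
# Hypothetical per-country averages; all values are invented for illustration.
toy_countries <- data.frame(
  income_group = rep(
    c("High income", "Upper middle income", "Lower middle income"),
    each = 4
  ),
  Avg_GDP_Growth = c(1.4, 1.9, 2.2, 1.1,
                     3.5, 4.1, 2.9, 3.8,
                     5.2, 6.0, 4.7, 5.5)
)

# Kruskal-Wallis rank-sum test: the null hypothesis is that average growth
# has the same distribution in every income group.
kw <- kruskal.test(Avg_GDP_Growth ~ income_group, data = toy_countries)

kw$statistic
kw$p.value
```

Testing per-country averages rather than raw country-year rows keeps one observation per country, which is closer to the independence the test assumes.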