Load required packages
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Load the World Bank Dataset
# Read '..' entries as NA so that the numeric columns parse as numbers.
world_bank <- read_csv("C:/Users/SP KHALID/Downloads/WDI- World Bank Dataset.csv", na = c(".."))
## Rows: 1675 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Time Code, Country Name, Country Code, Region, Income Group
## dbl (14): Time, GDP (constant 2015 US$), GDP growth (annual %), GDP (current...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
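To see why the `na` argument matters, the following minimal sketch parses a tiny made-up CSV twice. It uses base R's `read.csv()` with `na.strings` as a dependency-free stand-in for `read_csv(na = ...)`; the column names and values are invented for illustration and are not taken from the dataset.

```r
# Toy CSV text (hypothetical values), with ".." marking a missing entry.
csv_text <- "country,unemployment\nA,..\nB,3.7\n"

# Without declaring ".." as missing, the whole column is read as character.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(raw$unemployment)

# Declaring ".." as NA lets the column parse as numeric.
clean <- read.csv(text = csv_text, na.strings = "..", stringsAsFactors = FALSE)
class(clean$unemployment)
is.na(clean$unemployment)
```

A character column would make later arithmetic on the indicator fail, which is presumably why the placeholder is mapped to NA at read time.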
world_bank
## # A tibble: 1,675 × 19
## Time `Time Code` `Country Name` `Country Code` Region `Income Group`
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 2000 YR2000 Brazil BRA Latin America… Upper middle …
## 2 2000 YR2000 China CHN East Asia & P… Upper middle …
## 3 2000 YR2000 France FRA Europe & Cent… High income
## 4 2000 YR2000 Germany DEU Europe & Cent… High income
## 5 2000 YR2000 India IND South Asia Lower middle …
## 6 2000 YR2000 Indonesia IDN East Asia & P… Upper middle …
## 7 2000 YR2000 Italy ITA Europe & Cent… High income
## 8 2000 YR2000 Japan JPN East Asia & P… High income
## 9 2000 YR2000 Korea, Rep. KOR East Asia & P… High income
## 10 2000 YR2000 Mexico MEX Latin America… Upper middle …
## # ℹ 1,665 more rows
## # ℹ 13 more variables: `GDP (constant 2015 US$)` <dbl>,
## # `GDP growth (annual %)` <dbl>, `GDP (current US$)` <dbl>,
## # `Unemployment, total (% of total labor force)` <dbl>,
## # `Inflation, consumer prices (annual %)` <dbl>, `Labor force, total` <dbl>,
## # `Population, total` <dbl>,
## # `Exports of goods and services (% of GDP)` <dbl>, …
dim(world_bank)
## [1] 1675 19
# Check column data types
glimpse(world_bank)
## Rows: 1,675
## Columns: 19
## $ Time <dbl> 2000, 20…
## $ `Time Code` <chr> "YR2000"…
## $ `Country Name` <chr> "Brazil"…
## $ `Country Code` <chr> "BRA", "…
## $ Region <chr> "Latin A…
## $ `Income Group` <chr> "Upper m…
## $ `GDP (constant 2015 US$)` <dbl> 1.18642e…
## $ `GDP growth (annual %)` <dbl> 4.387949…
## $ `GDP (current US$)` <dbl> 6.554482…
## $ `Unemployment, total (% of total labor force)` <dbl> NA, 3.70…
## $ `Inflation, consumer prices (annual %)` <dbl> 7.044141…
## $ `Labor force, total` <dbl> 80295093…
## $ `Population, total` <dbl> 17401828…
## $ `Exports of goods and services (% of GDP)` <dbl> 10.18805…
## $ `Imports of goods and services (% of GDP)` <dbl> 12.45171…
## $ `General government final consumption expenditure (% of GDP)` <dbl> 18.76784…
## $ `Foreign direct investment, net inflows (% of GDP)` <dbl> 5.033917…
## $ `Gross savings (% of GDP)` <dbl> 13.99170…
## $ `Current account balance (% of GDP)` <dbl> -4.04774…
# Convert Time column to integer
world_bank$Time <- as.integer(world_bank$Time)
# Clean column names
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
df <- world_bank |> clean_names()
glimpse(df)
## Rows: 1,675
## Columns: 19
## $ time <int> 2000, …
## $ time_code <chr> "YR200…
## $ country_name <chr> "Brazi…
## $ country_code <chr> "BRA",…
## $ region <chr> "Latin…
## $ income_group <chr> "Upper…
## $ gdp_constant_2015_us <dbl> 1.1864…
## $ gdp_growth_annual_percent <dbl> 4.3879…
## $ gdp_current_us <dbl> 6.5544…
## $ unemployment_total_percent_of_total_labor_force <dbl> NA, 3.…
## $ inflation_consumer_prices_annual_percent <dbl> 7.0441…
## $ labor_force_total <dbl> 802950…
## $ population_total <dbl> 174018…
## $ exports_of_goods_and_services_percent_of_gdp <dbl> 10.188…
## $ imports_of_goods_and_services_percent_of_gdp <dbl> 12.451…
## $ general_government_final_consumption_expenditure_percent_of_gdp <dbl> 18.767…
## $ foreign_direct_investment_net_inflows_percent_of_gdp <dbl> 5.0339…
## $ gross_savings_percent_of_gdp <dbl> 13.991…
## $ current_account_balance_percent_of_gdp <dbl> -4.047…
Q1: Is there a relationship between economic growth and unemployment in the United States?
gdp_growth_annual_percent (explanatory variable)
unemployment_change (response variable)
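df_one <- df |>
  filter(country_code == "USA") |>
  arrange(time) |>
  mutate(
    # Year-over-year change in the unemployment rate (percentage points);
    # NA for the first year, which has no previous value.
    unemployment_change = unemployment_total_percent_of_total_labor_force -
      lag(unemployment_total_percent_of_total_labor_force)
  )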
df_one
## # A tibble: 25 × 20
## time time_code country_name country_code region income_group
## <int> <chr> <chr> <chr> <chr> <chr>
## 1 2000 YR2000 United States USA North America High income
## 2 2001 YR2001 United States USA North America High income
## 3 2002 YR2002 United States USA North America High income
## 4 2003 YR2003 United States USA North America High income
## 5 2004 YR2004 United States USA North America High income
## 6 2005 YR2005 United States USA North America High income
## 7 2006 YR2006 United States USA North America High income
## 8 2007 YR2007 United States USA North America High income
## 9 2008 YR2008 United States USA North America High income
## 10 2009 YR2009 United States USA North America High income
## # ℹ 15 more rows
## # ℹ 14 more variables: gdp_constant_2015_us <dbl>,
## # gdp_growth_annual_percent <dbl>, gdp_current_us <dbl>,
## # unemployment_total_percent_of_total_labor_force <dbl>,
## # inflation_consumer_prices_annual_percent <dbl>, labor_force_total <dbl>,
## # population_total <dbl>, exports_of_goods_and_services_percent_of_gdp <dbl>,
## # imports_of_goods_and_services_percent_of_gdp <dbl>, …
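The `unemployment_change` column is a first difference of the unemployment rate. As a minimal sketch of what that computation does, the block below repeats it in base R on a short made-up series (the `rate` values are hypothetical, not from the dataset), shifting the vector by one position in the way `dplyr::lag()` does by default.

```r
# Hypothetical unemployment rates (%) for five consecutive years.
rate <- c(4.0, 4.7, 5.8, 6.0, 5.5)

# Shift the series down by one position; the first slot has no predecessor.
previous <- c(NA, head(rate, -1))

# Year-over-year change; the first entry is NA.
change <- rate - previous
change

# The non-missing part matches base R's diff().
all.equal(change[-1], diff(rate))
```

With 25 yearly rows this leaves 24 usable changes, which is consistent with the one-row warnings from ggplot and with df = 22 in the correlation test.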
ggplot(df_one, aes(x = gdp_growth_annual_percent,
                   y = unemployment_change)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "GDP Growth vs Unemployment Change"
  )
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
According to economic theory (Okun’s law), GDP growth and the change in unemployment are negatively correlated, and the graph is consistent with this: in years with higher GDP growth, the change in unemployment tends to be lower and is often negative, indicating a strong inverse relationship. Outliers: There appears to be one extreme negative GDP growth observation, likely a recession year, and one high positive GDP growth observation. These points reinforce the negative trend but may have a disproportionate influence on the correlation.
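One way to gauge how much extreme points drive a correlation is to compare the Pearson coefficient with the rank-based Spearman coefficient, which is less sensitive to outliers. The sketch below is an illustration only: `compare_correlations()` is a hypothetical helper, and the six data points are made up rather than taken from the dataset.

```r
# Hypothetical helper: Pearson and Spearman correlations side by side,
# dropping incomplete pairs (such as the first-year NA change).
compare_correlations <- function(x, y) {
  c(
    pearson  = cor(x, y, method = "pearson",  use = "complete.obs"),
    spearman = cor(x, y, method = "spearman", use = "complete.obs")
  )
}

# Made-up data: a decreasing pattern with one extreme low-growth year.
growth <- c(-2.6, 1.0, 1.7, 2.3, 2.9, 3.8)
change <- c( 3.5, 0.4, 0.1, -0.3, -0.6, -0.9)

res <- compare_correlations(growth, change)
res

# In the report this could be called as:
# compare_correlations(df_one$gdp_growth_annual_percent,
#                      df_one$unemployment_change)
```

If the two coefficients are similar on the real data, the outliers are probably not the main source of the negative association.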
cor.test(df_one$gdp_growth_annual_percent,
         df_one$unemployment_change,
         method = "pearson")
##
## Pearson's product-moment correlation
##
## data: df_one$gdp_growth_annual_percent and df_one$unemployment_change
## t = -8.8083, df = 22, p-value = 1.152e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.9483755 -0.7442675
## sample estimates:
## cor
## -0.882659
r = -0.8827
95% CI: (-0.9484, -0.7443)
p-value = 1.152e-08 (extremely small, i.e., statistically significant)
The Pearson correlation coefficient between GDP growth and unemployment change is -0.883, indicating a strong negative linear relationship. Years with higher economic growth are strongly associated with decreases in unemployment. The 95% confidence interval (-0.948, -0.744) lies entirely below zero and well away from it, so the data are consistent with a strongly negative population correlation.
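The test statistic and the confidence interval can be reconstructed from the reported correlation alone. The following sketch assumes the usual formulas (a t statistic with n - 2 degrees of freedom and a Fisher z-transform interval); the variable names are illustrative, and the inputs are the rounded values printed by `cor.test()`.

```r
# Values reported by cor.test() for GDP growth vs unemployment change.
r   <- -0.882659
dof <- 22          # degrees of freedom, so n = dof + 2 = 24 pairs
n   <- dof + 2

# t statistic for H0: the population correlation is zero.
t_stat <- r * sqrt(dof) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transform.
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(t_stat, 4)
round(ci, 4)
```

The results should agree with the printed t statistic and interval up to rounding of the inputs.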
Why this makes sense
When economic output grows faster, unemployment tends to fall: firms expand production during periods of growth, which raises labor demand and lowers unemployment.
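This inverse relationship can also be summarized as a slope: the change in unemployment (in percentage points) associated with one additional point of GDP growth. The sketch below fits a simple linear regression for that purpose. `fit_okun()` is a hypothetical helper, and the `toy` data frame holds made-up values so that the block runs on its own; the column names match those in `df_one`.

```r
# Hypothetical helper: regress unemployment change on GDP growth.
# lm() drops rows with missing values by default.
fit_okun <- function(data) {
  lm(unemployment_change ~ gdp_growth_annual_percent, data = data)
}

# Made-up data for illustration only.
toy <- data.frame(
  gdp_growth_annual_percent = c(-2.6, 1.0, 1.7, 2.3, 2.9, 3.8),
  unemployment_change       = c( 3.5, 0.4, 0.1, -0.3, -0.6, -0.9)
)

fit <- fit_okun(toy)
coef(fit)

# In the report this could be called as: coef(fit_okun(df_one))
```

A negative slope corresponds to the downward-sloping line drawn by `geom_smooth(method = "lm")` in the scatter plot.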
t.test(df_one$unemployment_change)
##
## One Sample t-test
##
## data: df_one$unemployment_change
## t = 0.0041487, df = 23, p-value = 0.9967
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.6220297 0.6245297
## sample estimates:
## mean of x
## 0.00125
mean unemployment_change = 0.00125
95% CI: (-0.6220, 0.6245)
p-value = 0.9967
The 95% confidence interval for the mean annual change in unemployment is (-0.622, 0.625) percentage points. Because this interval contains zero, we fail to reject the null hypothesis that the average annual change in unemployment is zero. This is consistent with unemployment in the United States fluctuating around a stable long-term level over the observed period rather than following a consistent upward or downward trend.
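The interval follows from the sample mean, its standard error, and a t quantile. As a rough check, the sketch below recovers the standard error from the printed mean and t statistic and rebuilds the interval; the variable names are illustrative and the inputs are the rounded values from the `t.test()` output.

```r
# Values reported by t.test() for unemployment_change.
mean_change <- 0.00125
t_stat      <- 0.0041487
dof         <- 23

# Standard error implied by t = mean / se.
se <- mean_change / t_stat

# 95% confidence interval: mean +/- t quantile * se.
half_width <- qt(0.975, dof) * se
ci <- mean_change + c(-1, 1) * half_width

round(ci, 3)
```

The interval is wide relative to the mean because the standard error (about 0.3 percentage points) is far larger than the mean itself.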
Q2: How does the trade balance relate to economic growth in the United States?
net_trade_percentage_gdp (explanatory variable)
gdp_growth_annual_percent (response variable)
df_two <- df |>
  filter(country_code == "USA") |>
  arrange(time) |>
  mutate(
    # Net trade balance as a share of GDP: exports minus imports.
    net_trade_percentage_gdp = exports_of_goods_and_services_percent_of_gdp -
      imports_of_goods_and_services_percent_of_gdp
  )
df_two
## # A tibble: 25 × 20
## time time_code country_name country_code region income_group
## <int> <chr> <chr> <chr> <chr> <chr>
## 1 2000 YR2000 United States USA North America High income
## 2 2001 YR2001 United States USA North America High income
## 3 2002 YR2002 United States USA North America High income
## 4 2003 YR2003 United States USA North America High income
## 5 2004 YR2004 United States USA North America High income
## 6 2005 YR2005 United States USA North America High income
## 7 2006 YR2006 United States USA North America High income
## 8 2007 YR2007 United States USA North America High income
## 9 2008 YR2008 United States USA North America High income
## 10 2009 YR2009 United States USA North America High income
## # ℹ 15 more rows
## # ℹ 14 more variables: gdp_constant_2015_us <dbl>,
## # gdp_growth_annual_percent <dbl>, gdp_current_us <dbl>,
## # unemployment_total_percent_of_total_labor_force <dbl>,
## # inflation_consumer_prices_annual_percent <dbl>, labor_force_total <dbl>,
## # population_total <dbl>, exports_of_goods_and_services_percent_of_gdp <dbl>,
## # imports_of_goods_and_services_percent_of_gdp <dbl>, …
ggplot(df_two,
       aes(x = net_trade_percentage_gdp,
           y = gdp_growth_annual_percent)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Net Trade (% GDP) vs GDP Growth")
## `geom_smooth()` using formula = 'y ~ x'
The graph shows a slight downward trend in GDP growth as net trade (% of GDP) increases. The points are widely scattered, indicating a weak relationship. Outliers: There is one strongly negative GDP growth year (likely a recession), and there is some clustering of observations around moderate trade deficits.
cor.test(df_two$net_trade_percentage_gdp,
         df_two$gdp_growth_annual_percent,
         method = "pearson")
##
## Pearson's product-moment correlation
##
## data: df_two$net_trade_percentage_gdp and df_two$gdp_growth_annual_percent
## t = -0.96441, df = 23, p-value = 0.3449
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5494740 0.2147101
## sample estimates:
## cor
## -0.1971464
r = -0.197
95% CI: (-0.549, 0.215)
p-value = 0.3449
The Pearson correlation between net trade (% of GDP) and GDP growth is -0.197, indicating a weak negative association. However, the relationship is not statistically significant (p = 0.345 > 0.05), and the 95% confidence interval (-0.549, 0.215) includes zero. Therefore, we do not have sufficient evidence to conclude that a linear relationship exists between trade balance and GDP growth over this period.
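The non-significant result can be checked directly from the printed t statistic and degrees of freedom. The sketch below computes the two-sided p-value and compares the statistic with the 5% critical value; the variable names are illustrative and the inputs are taken from the `cor.test()` output.

```r
# Values reported by cor.test() for net trade vs GDP growth.
t_stat <- -0.96441
dof    <- 23

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), dof)

# Critical value for a two-sided test at the 5% level.
t_crit <- qt(0.975, dof)

round(p_value, 4)
abs(t_stat) < t_crit   # TRUE means we fail to reject H0
```

Because the absolute t statistic is well below the critical value, the observed correlation is within the range expected from sampling variation alone.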
Why this makes sense
The U.S. runs a persistent trade deficit and GDP growth is driven largely by domestic consumption and investment. Trade balance fluctuations may not strongly determine short-run growth.
t.test(df_two$net_trade_percentage_gdp)
##
## One Sample t-test
##
## data: df_two$net_trade_percentage_gdp
## t = -19.12, df = 24, p-value = 4.957e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -4.087839 -3.291320
## sample estimates:
## mean of x
## -3.68958
Mean = -3.6896
95% CI = (-4.0878, -3.2913)
p-value = 4.957e-16
The mean net trade balance over the observed period is approximately -3.69% of GDP. The 95% confidence interval (-4.09%, -3.29%) lies entirely below zero, so we reject the null hypothesis of a zero mean and conclude that the United States ran a statistically significant average trade deficit over this period.