Load required packages
library(tidyverse)  # attaches readr, dplyr, ggplot2 and the other core tidyverse packages
Load the World Bank Dataset
# Treat the World Bank's '..' missing-value marker as NA (and keep readr's default "" and "NA")
world_bank <- read_csv("C:/Users/SP KHALID/Downloads/WDI- World Bank Dataset.csv", na = c("..", "", "NA"))
world_bank
## # A tibble: 1,675 × 19
## Time `Time Code` `Country Name` `Country Code` Region `Income Group`
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 2000 YR2000 Brazil BRA Latin America… Upper middle …
## 2 2000 YR2000 China CHN East Asia & P… Upper middle …
## 3 2000 YR2000 France FRA Europe & Cent… High income
## 4 2000 YR2000 Germany DEU Europe & Cent… High income
## 5 2000 YR2000 India IND South Asia Lower middle …
## 6 2000 YR2000 Indonesia IDN East Asia & P… Upper middle …
## 7 2000 YR2000 Italy ITA Europe & Cent… High income
## 8 2000 YR2000 Japan JPN East Asia & P… High income
## 9 2000 YR2000 Korea, Rep. KOR East Asia & P… High income
## 10 2000 YR2000 Mexico MEX Latin America… Upper middle …
## # ℹ 1,665 more rows
## # ℹ 13 more variables: `GDP (constant 2015 US$)` <dbl>,
## # `GDP growth (annual %)` <dbl>, `GDP (current US$)` <dbl>,
## # `Unemployment, total (% of total labor force)` <dbl>,
## # `Inflation, consumer prices (annual %)` <dbl>, `Labor force, total` <dbl>,
## # `Population, total` <dbl>,
## # `Exports of goods and services (% of GDP)` <dbl>, …
dim(world_bank)
## [1] 1675 19
# Check column data types
glimpse(world_bank)
## Rows: 1,675
## Columns: 19
## $ Time <dbl> 2000, 20…
## $ `Time Code` <chr> "YR2000"…
## $ `Country Name` <chr> "Brazil"…
## $ `Country Code` <chr> "BRA", "…
## $ Region <chr> "Latin A…
## $ `Income Group` <chr> "Upper m…
## $ `GDP (constant 2015 US$)` <dbl> 1.18642e…
## $ `GDP growth (annual %)` <dbl> 4.387949…
## $ `GDP (current US$)` <dbl> 6.554482…
## $ `Unemployment, total (% of total labor force)` <dbl> NA, 3.70…
## $ `Inflation, consumer prices (annual %)` <dbl> 7.044141…
## $ `Labor force, total` <dbl> 80295093…
## $ `Population, total` <dbl> 17401828…
## $ `Exports of goods and services (% of GDP)` <dbl> 10.18805…
## $ `Imports of goods and services (% of GDP)` <dbl> 12.45171…
## $ `General government final consumption expenditure (% of GDP)` <dbl> 18.76784…
## $ `Foreign direct investment, net inflows (% of GDP)` <dbl> 5.033917…
## $ `Gross savings (% of GDP)` <dbl> 13.99170…
## $ `Current account balance (% of GDP)` <dbl> -4.04774…
# Convert Time column to integer
world_bank$Time <- as.integer(world_bank$Time)
# Clean column names
library(janitor)
df <- world_bank |> clean_names()
glimpse(df)
## Rows: 1,675
## Columns: 19
## $ time <int> 2000, …
## $ time_code <chr> "YR200…
## $ country_name <chr> "Brazi…
## $ country_code <chr> "BRA",…
## $ region <chr> "Latin…
## $ income_group <chr> "Upper…
## $ gdp_constant_2015_us <dbl> 1.1864…
## $ gdp_growth_annual_percent <dbl> 4.3879…
## $ gdp_current_us <dbl> 6.5544…
## $ unemployment_total_percent_of_total_labor_force <dbl> NA, 3.…
## $ inflation_consumer_prices_annual_percent <dbl> 7.0441…
## $ labor_force_total <dbl> 802950…
## $ population_total <dbl> 174018…
## $ exports_of_goods_and_services_percent_of_gdp <dbl> 10.188…
## $ imports_of_goods_and_services_percent_of_gdp <dbl> 12.451…
## $ general_government_final_consumption_expenditure_percent_of_gdp <dbl> 18.767…
## $ foreign_direct_investment_net_inflows_percent_of_gdp <dbl> 5.0339…
## $ gross_savings_percent_of_gdp <dbl> 13.991…
## $ current_account_balance_percent_of_gdp <dbl> -4.047…
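# Binary outcome: 1 if unemployment exceeds 6% of the labor force, 0 otherwise (missing stays NA)
df$high_unemployment <- ifelse(df$unemployment_total_percent_of_total_labor_force > 6, 1, 0)
1 = High unemployment (above 6% of the total labor force)
0 = Low unemployment (6% or below)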
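The coding rule above treats a rate of exactly 6% as low. It leaves a missing rate as NA, so glm() later drops that country-year under its default na.action. Here is a small check with a few made-up rates that are not taken from the dataset:

```r
# Made-up unemployment rates (% of labor force), for illustration only
u <- c(3.7, NA, 8.2, 6, 6.1)

# Same rule as above: strictly greater than 6 -> 1, otherwise 0
high <- ifelse(u > 6, 1, 0)
print(high)  # the NA is kept, and exactly 6 is coded 0
```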
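Predictors:
- Inflation (consumer prices, annual %)
- GDP growth (annual %)
- Foreign direct investment (FDI), net inflows (% of GDP)

This logistic regression model estimates how inflation, economic growth, and foreign investment are associated with the likelihood of high unemployment.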
model2 <- glm(high_unemployment ~ inflation_consumer_prices_annual_percent +
gdp_growth_annual_percent +
foreign_direct_investment_net_inflows_percent_of_gdp,
data = df,
family = binomial)
summary(model2)
##
## Call:
## glm(formula = high_unemployment ~ inflation_consumer_prices_annual_percent +
## gdp_growth_annual_percent + foreign_direct_investment_net_inflows_percent_of_gdp,
## family = binomial, data = df)
##
## Coefficients:
## Estimate Std. Error
## (Intercept) -0.212657 0.092867
## inflation_consumer_prices_annual_percent 0.039924 0.009282
## gdp_growth_annual_percent -0.049628 0.016254
## foreign_direct_investment_net_inflows_percent_of_gdp 0.003460 0.008311
## z value Pr(>|z|)
## (Intercept) -2.290 0.02203 *
## inflation_consumer_prices_annual_percent 4.301 1.7e-05 ***
## gdp_growth_annual_percent -3.053 0.00226 **
## foreign_direct_investment_net_inflows_percent_of_gdp 0.416 0.67721
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1696.1 on 1228 degrees of freedom
## Residual deviance: 1662.8 on 1225 degrees of freedom
## (446 observations deleted due to missingness)
## AIC: 1670.8
##
## Number of Fisher Scoring iterations: 4
The logistic regression results show that inflation and GDP growth are statistically significant predictors of high unemployment (p < 0.001 and p ≈ 0.002). Foreign direct investment (FDI) is not (p ≈ 0.68). Inflation has a positive coefficient, so higher inflation is associated with a greater likelihood of high unemployment. GDP growth has a negative coefficient, so stronger economic growth is associated with a lower probability of high unemployment.
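These significance statements come from Wald z-tests. Each z value is the estimate divided by its standard error, and the two-sided p-value is 2 × P(Z > |z|) under a standard normal. The sketch below reproduces the printed z values and p-values from the estimates and standard errors in summary(model2). The numbers are copied by hand, so they are rounded.

```r
# Estimates and standard errors copied from summary(model2) above
est <- c(inflation = 0.039924, gdp_growth = -0.049628, fdi = 0.003460)
se  <- c(inflation = 0.009282, gdp_growth = 0.016254, fdi = 0.008311)

# Wald z statistic and two-sided p-value
z <- est / se
p <- 2 * pnorm(-abs(z))

round(cbind(z = z, p = p), 5)
```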
Logistic Regression Form
\[ \log\left(\frac{p}{1-p}\right) = -0.213 + 0.0399 \cdot \text{Inflation} - 0.0496 \cdot \text{GDP Growth} + 0.0035 \cdot \text{FDI} \]

Here p is the probability that a country-year has high unemployment (above 6%). Each coefficient is the change in the log-odds for a one-percentage-point increase in that predictor, holding the others fixed. Exponentiating gives odds ratios:
- Inflation: about 1.041, so the odds rise roughly 4% per point.
- GDP growth: about 0.952, so the odds fall roughly 5% per point.
- FDI: about 1.003, which is negligible and not statistically significant.
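The odds ratios and a predicted probability can be computed directly from the printed coefficients. The country-year used below is hypothetical: inflation, GDP growth and FDI are all set to 2%, which is illustrative and not taken from the data. plogis() is the inverse logit, which turns the log-odds into a probability.

```r
# Coefficients copied from summary(model2) above (rounded)
b <- c(intercept = -0.212657, inflation = 0.039924,
       gdp_growth = -0.049628, fdi = 0.003460)

# Odds ratios for a one-percentage-point increase in each predictor
odds_ratios <- exp(b[-1])
round(odds_ratios, 3)

# Hypothetical country-year: inflation 2%, GDP growth 2%, FDI 2% of GDP
x <- c(intercept = 1, inflation = 2, gdp_growth = 2, fdi = 2)
eta <- sum(b * x)   # log-odds (linear predictor)
p   <- plogis(eta)  # inverse logit: 1 / (1 + exp(-eta))
round(c(log_odds = eta, probability = p), 3)
```

With the fitted object available, predict(model2, newdata, type = "response") gives the same kind of probability without copying coefficients by hand. Here newdata must be a data frame with the three predictor columns named as in df.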
confint(model2)
## 2.5 % 97.5 %
## (Intercept) -0.39511562 -0.03078128
## inflation_consumer_prices_annual_percent 0.02278612 0.05919044
## gdp_growth_annual_percent -0.08196189 -0.01815042
## foreign_direct_investment_net_inflows_percent_of_gdp -0.01310510 0.01998241
The confidence intervals show that inflation has a positive and statistically significant effect, because its interval (0.0228, 0.0592) does not include zero. GDP growth has a negative and significant effect, with its interval (-0.0820, -0.0182) lying entirely below zero. In contrast, the interval for FDI (-0.0131, 0.0200) includes zero, so its effect is not statistically significant. Overall, the confidence intervals agree with the z-tests: the effects of inflation and GDP growth are clearly distinguishable from zero, while the data are consistent with no effect of FDI.
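For a glm object, confint() returns profile-likelihood intervals, which is why the bounds above are not exactly symmetric around the estimates. A quick cross-check is the Wald interval, estimate ± 1.96 × SE, computed below from the printed summary values. Exponentiating either interval puts it on the odds-ratio scale, where "includes zero" becomes "includes 1".

```r
# Estimates and standard errors copied from summary(model2) above
est <- c(inflation = 0.039924, gdp_growth = -0.049628, fdi = 0.003460)
se  <- c(inflation = 0.009282, gdp_growth = 0.016254, fdi = 0.008311)

# 95% Wald intervals on the log-odds scale
zcrit <- qnorm(0.975)
wald <- cbind(lower = est - zcrit * se, upper = est + zcrit * se)
round(wald, 4)

# Same intervals as odds ratios: an interval excluding 1 is significant at 5%
round(exp(wald), 4)

# Does each interval exclude zero on the log-odds scale?
excludes_zero <- wald[, "lower"] > 0 | wald[, "upper"] < 0
excludes_zero
```

With the fitted model, confint.default(model2) returns these Wald intervals directly, and exp(confint(model2)) gives the profile intervals as odds ratios.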
plot(model2$fitted.values,
residuals(model2, type = "deviance"),
xlab = "Fitted Values",
ylab = "Deviance Residuals")
abline(h = 0, col = "red")
The two-band pattern is expected for a binary outcome. Each band corresponds to one observed value (0 or 1), so the pattern reflects the data type rather than confirming that the model fits well. The single extreme residual near a fitted value of 1.0 marks an observation whose observed outcome is far from its fitted probability, and it may warrant further inspection.
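The two bands follow from how deviance residuals are defined for a 0/1 outcome:
- An observation with y = 1 has residual sqrt(-2 log p̂).
- An observation with y = 0 has residual -sqrt(-2 log(1 - p̂)).

Every point therefore lies on one of two curves determined by its fitted probability. The sketch below checks this formula against residuals(type = "deviance") on simulated data, where the data and variable names are made up for illustration. It also shows why a y = 0 observation with p̂ close to 1 produces a large negative residual.

```r
set.seed(42)

# Simulated binary data from a known logistic model (illustration only)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.2 + 0.8 * x))
fit <- glm(y ~ x, family = binomial)

p_hat <- fitted(fit)

# Deviance residuals by hand: one curve for y = 1, another for y = 0
r_manual <- ifelse(y == 1,  sqrt(-2 * log(p_hat)),
                           -sqrt(-2 * log(1 - p_hat)))

r_glm <- residuals(fit, type = "deviance")
all.equal(unname(r_manual), unname(r_glm))

# A y = 0 point with fitted probability 0.99 sits far below the lower band
-sqrt(-2 * log(1 - 0.99))  # about -3.03
```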
qqnorm(residuals(model2))
qqline(residuals(model2), col = "red")
The S-shaped deviation from the diagonal is expected in logistic regression. Deviance residuals from a binary outcome are far from normally distributed, which makes this plot largely uninformative here. The two arcs simply reflect the two outcome groups (0 and 1) rather than a genuine model violation.
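If a normal QQ plot is wanted for a binary model, one option (not used in the original analysis) is randomized quantile residuals (Dunn and Smyth, 1996). For each observation, draw u uniformly between the model CDF just below the observed y and at y, then take qnorm(u). Under a correctly specified model these residuals are approximately standard normal, so departures in a QQ plot become meaningful.

The sketch below implements this by hand on simulated data. For the real model, the same steps would use fitted(model2) and the stored response model2$y. The statmod package's qresid() offers a packaged version.

```r
set.seed(1)

# Simulated binary data from a known logistic model (illustration only)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.3 + 1.0 * x))
fit <- glm(y ~ x, family = binomial)
p_hat <- fitted(fit)

# Randomized quantile residuals for a Bernoulli outcome:
# u is drawn between P(Y <= y - 1) and P(Y <= y), where P(Y <= 0) = 1 - p_hat
lower <- ifelse(y == 1, 1 - p_hat, 0)
upper <- ifelse(y == 1, 1, 1 - p_hat)
u  <- runif(n, min = lower, max = upper)
rq <- qnorm(u)

qqnorm(rq, main = "Randomized quantile residuals")
qqline(rq, col = "red")
c(mean = mean(rq), sd = sd(rq))
```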
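# Cook's distance for each observation used in the fit, with the 4/n rule-of-thumb line
plot(cooks.distance(model2), type = "h")
abline(h = 4 / nobs(model2), col = "red")  # n = observations in the fit, not nrow(df)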
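The observation at index about 1200 in this plot has a Cook's distance close to 1.0, far above the 4/n reference line. It may therefore be strongly influencing the coefficient estimates. The x-axis is the position among the complete cases used in the fit, not the row number of df. To locate it: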
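# Observations with Cook's distance above 0.5. The printed name (1629) is the
# row label in df; the value (1209) is the position among the complete cases
which(cooks.distance(model2) > 0.5)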
## 1629
## 1209
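which() keeps the names of the Cook's distance vector. Those names are the row labels of df for the rows glm() actually used, while the unnamed value is the position within the complete-case vector. Because rows with missing values are dropped before fitting, the two differ: in the output above, position 1209 corresponds to row 1629 of df.

The sketch below reproduces this on simulated data, with made-up names and values. The first 50 rows have a missing predictor, and one influential point is planted at row 150. To inspect or drop that point, you must use the row label, not the position. The sketch also computes the 4/n cutoff with nobs(fit), the number of observations actually in the fit.

```r
set.seed(7)

# Simulated data (illustration only): first 50 rows have a missing predictor
n <- 200
sim <- data.frame(x = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(1.5 * sim$x))
sim$x[1:50] <- NA

# Plant one influential point at row 150: large x but y = 0
sim$x[150] <- 4
sim$y[150] <- 0

fit <- glm(y ~ x, family = binomial, data = sim)  # rows 1-50 are dropped
cd  <- cooks.distance(fit)

# Rule-of-thumb cutoff uses the number of observations in the fit
cutoff <- 4 / nobs(fit)

idx    <- which.max(cd)           # named integer
pos    <- unname(idx)             # position among complete cases
df_row <- as.integer(names(idx))  # row label in sim
c(position = pos, row_label = df_row)

# Inspect and drop the point by its row label, not its position
sim[df_row, ]
fit_drop <- glm(y ~ x, family = binomial, data = sim[-df_row, ])
rbind(full = coef(fit), dropped = coef(fit_drop))
```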
# Inspect row 1209 of df (the flagged observation itself has row label 1629, i.e. df[1629, ])
df[1209, ]
## # A tibble: 1 × 20
## time time_code country_name country_code region income_group
## <int> <chr> <chr> <chr> <chr> <chr>
## 1 2018 YR2018 France FRA Europe & Central Asia High income
## # ℹ 14 more variables: gdp_constant_2015_us <dbl>,
## # gdp_growth_annual_percent <dbl>, gdp_current_us <dbl>,
## # unemployment_total_percent_of_total_labor_force <dbl>,
## # inflation_consumer_prices_annual_percent <dbl>, labor_force_total <dbl>,
## # population_total <dbl>, exports_of_goods_and_services_percent_of_gdp <dbl>,
## # imports_of_goods_and_services_percent_of_gdp <dbl>,
## # general_government_final_consumption_expenditure_percent_of_gdp <dbl>, …
# Refit without row 1209 of df and compare with model2 (df[-1629, ] would drop the flagged observation)
model2_clean <- glm(high_unemployment ~ inflation_consumer_prices_annual_percent +
gdp_growth_annual_percent +
foreign_direct_investment_net_inflows_percent_of_gdp,
data = df[-1209, ],
family = binomial)
summary(model2_clean)
##
## Call:
## glm(formula = high_unemployment ~ inflation_consumer_prices_annual_percent +
## gdp_growth_annual_percent + foreign_direct_investment_net_inflows_percent_of_gdp,
## family = binomial, data = df[-1209, ])
##
## Coefficients:
## Estimate Std. Error
## (Intercept) -0.216047 0.092927
## inflation_consumer_prices_annual_percent 0.040057 0.009291
## gdp_growth_annual_percent -0.049407 0.016252
## foreign_direct_investment_net_inflows_percent_of_gdp 0.003484 0.008312
## z value Pr(>|z|)
## (Intercept) -2.325 0.02008 *
## inflation_consumer_prices_annual_percent 4.311 1.62e-05 ***
## gdp_growth_annual_percent -3.040 0.00237 **
## foreign_direct_investment_net_inflows_percent_of_gdp 0.419 0.67512
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1694.5 on 1227 degrees of freedom
## Residual deviance: 1661.2 on 1224 degrees of freedom
## (446 observations deleted due to missingness)
## AIC: 1669.2
##
## Number of Fisher Scoring iterations: 4
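After removing row 1209 of df (France, 2018), the coefficients and significance levels are virtually unchanged. Inflation and GDP growth remain significant with the same signs, and FDI remains non-significant. However, this refit does not test the observation flagged by Cook's distance. which() returned position 1209 among the complete cases, and its name shows that this is row 1629 of df, so df[1209, ] and df[-1209, ] refer to a different country-year. Before treating model2 as the final model, its robustness to the flagged point should be confirmed by refitting with data = df[-1629, ].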
- Higher inflation is associated with a higher likelihood of high unemployment.
- Stronger economic growth is associated with a lower likelihood of high unemployment.
- FDI alone does not significantly explain differences in unemployment.
- Labor market conditions are linked to macroeconomic performance.
These results highlight the importance of stable economic growth for keeping unemployment low. Growth and inflation are both significantly associated with labor market outcomes, so policymakers may benefit from promoting growth while keeping inflation under control. The findings also suggest that relying solely on foreign investment may not directly improve employment conditions.
The model has several limitations that affect its reliability:
- The model improves on the intercept-only baseline, but the deviance falls only from 1696.1 to 1662.8 (about 2% of the null deviance), so its explanatory power is limited.
- Foreign direct investment (FDI) is not statistically significant, so it may not contribute meaningfully to the model.
- 446 of the 1,675 observations (about 27%) were dropped because of missing data, which could introduce bias and reduce the generalizability of the results.
- The data contain repeated observations of the same countries over time, which the model treats as independent; this can make the standard errors too small.
- The model excludes potentially important variables such as population, labor force, or government expenditure.
- The model assumes a linear relationship in the log-odds, which may oversimplify complex real-world economic relationships.
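The small reduction in deviance can be quantified from the summary output alone. The drop from the null to the residual deviance is a likelihood-ratio chi-square statistic, with degrees of freedom equal to the number of slopes (3). One minus the ratio of the two deviances is McFadden's pseudo-R². The sketch uses the rounded deviances printed for model2.

```r
# Deviances and degrees of freedom copied from summary(model2) above
null_dev  <- 1696.1; null_df  <- 1228
resid_dev <- 1662.8; resid_df <- 1225

# Likelihood-ratio test of the three predictors jointly
lr_stat <- null_dev - resid_dev     # 33.3
lr_df   <- null_df - resid_df       # 3
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo-R^2: share of the null deviance explained
mcfadden_r2 <- 1 - resid_dev / null_dev

c(LR = lr_stat, df = lr_df, p = lr_p, McFadden_R2 = round(mcfadden_r2, 4))
```

The predictors are jointly significant (p far below 0.001), yet they account for only about 2% of the null deviance. With the fitted object, anova(model2, test = "Chisq") gives sequential versions of these likelihood-ratio tests.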