Load required packages

library(tidyverse)  # also attaches readr and dplyr

Load the World Bank Dataset

# Treat the '..' placeholder as missing (NA) in every column when reading the file
world_bank <- read_csv("C:/Users/SP KHALID/Downloads/WDI- World Bank Dataset.csv", na = "..")
world_bank
## # A tibble: 1,675 × 19
##     Time `Time Code` `Country Name` `Country Code` Region         `Income Group`
##    <dbl> <chr>       <chr>          <chr>          <chr>          <chr>         
##  1  2000 YR2000      Brazil         BRA            Latin America… Upper middle …
##  2  2000 YR2000      China          CHN            East Asia & P… Upper middle …
##  3  2000 YR2000      France         FRA            Europe & Cent… High income   
##  4  2000 YR2000      Germany        DEU            Europe & Cent… High income   
##  5  2000 YR2000      India          IND            South Asia     Lower middle …
##  6  2000 YR2000      Indonesia      IDN            East Asia & P… Upper middle …
##  7  2000 YR2000      Italy          ITA            Europe & Cent… High income   
##  8  2000 YR2000      Japan          JPN            East Asia & P… High income   
##  9  2000 YR2000      Korea, Rep.    KOR            East Asia & P… High income   
## 10  2000 YR2000      Mexico         MEX            Latin America… Upper middle …
## # ℹ 1,665 more rows
## # ℹ 13 more variables: `GDP (constant 2015 US$)` <dbl>,
## #   `GDP growth (annual %)` <dbl>, `GDP (current US$)` <dbl>,
## #   `Unemployment, total (% of total labor force)` <dbl>,
## #   `Inflation, consumer prices (annual %)` <dbl>, `Labor force, total` <dbl>,
## #   `Population, total` <dbl>,
## #   `Exports of goods and services (% of GDP)` <dbl>, …
dim(world_bank)
## [1] 1675   19
# Check column data types
glimpse(world_bank)
## Rows: 1,675
## Columns: 19
## $ Time                                                          <dbl> 2000, 20…
## $ `Time Code`                                                   <chr> "YR2000"…
## $ `Country Name`                                                <chr> "Brazil"…
## $ `Country Code`                                                <chr> "BRA", "…
## $ Region                                                        <chr> "Latin A…
## $ `Income Group`                                                <chr> "Upper m…
## $ `GDP (constant 2015 US$)`                                     <dbl> 1.18642e…
## $ `GDP growth (annual %)`                                       <dbl> 4.387949…
## $ `GDP (current US$)`                                           <dbl> 6.554482…
## $ `Unemployment, total (% of total labor force)`                <dbl> NA, 3.70…
## $ `Inflation, consumer prices (annual %)`                       <dbl> 7.044141…
## $ `Labor force, total`                                          <dbl> 80295093…
## $ `Population, total`                                           <dbl> 17401828…
## $ `Exports of goods and services (% of GDP)`                    <dbl> 10.18805…
## $ `Imports of goods and services (% of GDP)`                    <dbl> 12.45171…
## $ `General government final consumption expenditure (% of GDP)` <dbl> 18.76784…
## $ `Foreign direct investment, net inflows (% of GDP)`           <dbl> 5.033917…
## $ `Gross savings (% of GDP)`                                    <dbl> 13.99170…
## $ `Current account balance (% of GDP)`                          <dbl> -4.04774…
# Convert Time column to integer
world_bank$Time <- as.integer(world_bank$Time)
# Clean column names
library(janitor)
df <- world_bank |> clean_names()
glimpse(df)
## Rows: 1,675
## Columns: 19
## $ time                                                            <int> 2000, …
## $ time_code                                                       <chr> "YR200…
## $ country_name                                                    <chr> "Brazi…
## $ country_code                                                    <chr> "BRA",…
## $ region                                                          <chr> "Latin…
## $ income_group                                                    <chr> "Upper…
## $ gdp_constant_2015_us                                            <dbl> 1.1864…
## $ gdp_growth_annual_percent                                       <dbl> 4.3879…
## $ gdp_current_us                                                  <dbl> 6.5544…
## $ unemployment_total_percent_of_total_labor_force                 <dbl> NA, 3.…
## $ inflation_consumer_prices_annual_percent                        <dbl> 7.0441…
## $ labor_force_total                                               <dbl> 802950…
## $ population_total                                                <dbl> 174018…
## $ exports_of_goods_and_services_percent_of_gdp                    <dbl> 10.188…
## $ imports_of_goods_and_services_percent_of_gdp                    <dbl> 12.451…
## $ general_government_final_consumption_expenditure_percent_of_gdp <dbl> 18.767…
## $ foreign_direct_investment_net_inflows_percent_of_gdp            <dbl> 5.0339…
## $ gross_savings_percent_of_gdp                                    <dbl> 13.991…
## $ current_account_balance_percent_of_gdp                          <dbl> -4.047…

Create Response Variable

df$high_unemployment <- ifelse(df$unemployment_total_percent_of_total_labor_force > 6, 1, 0)

1 = High unemployment (rate above 6%)
0 = Low unemployment (rate of 6% or below)
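To make the coding rule concrete, the following minimal sketch applies the same ifelse() call to a small made-up vector (the values are hypothetical, not taken from the dataset). It illustrates that a rate of exactly 6 is coded 0 because the comparison is strict, and that a missing rate stays NA rather than being coded 0.

```r
# Hypothetical unemployment rates (percent), including a missing value
unemployment <- c(3.7, 6.0, 9.5, NA)

# Same rule as above: 1 if the rate is strictly above 6, otherwise 0
high_unemployment <- ifelse(unemployment > 6, 1, 0)
print(high_unemployment)

# Count each outcome, keeping NA visible
print(table(high_unemployment, useNA = "ifany"))
```

Rows whose outcome is NA are dropped by glm() under its default missing-value handling, which is one source of the observations deleted due to missingness in the model output.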

Explanatory Variables

Inflation
GDP growth
Foreign Direct Investment (FDI)

Build Model

This logistic regression (binomial family, with the default logit link) estimates how inflation, economic growth, and foreign investment relate to the likelihood of high unemployment.

model2 <- glm(high_unemployment ~ inflation_consumer_prices_annual_percent +
                                   gdp_growth_annual_percent +
                                   foreign_direct_investment_net_inflows_percent_of_gdp,
              data = df,
              family = binomial)

summary(model2)
## 
## Call:
## glm(formula = high_unemployment ~ inflation_consumer_prices_annual_percent + 
##     gdp_growth_annual_percent + foreign_direct_investment_net_inflows_percent_of_gdp, 
##     family = binomial, data = df)
## 
## Coefficients:
##                                                       Estimate Std. Error
## (Intercept)                                          -0.212657   0.092867
## inflation_consumer_prices_annual_percent              0.039924   0.009282
## gdp_growth_annual_percent                            -0.049628   0.016254
## foreign_direct_investment_net_inflows_percent_of_gdp  0.003460   0.008311
##                                                      z value Pr(>|z|)    
## (Intercept)                                           -2.290  0.02203 *  
## inflation_consumer_prices_annual_percent               4.301  1.7e-05 ***
## gdp_growth_annual_percent                             -3.053  0.00226 ** 
## foreign_direct_investment_net_inflows_percent_of_gdp   0.416  0.67721    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1696.1  on 1228  degrees of freedom
## Residual deviance: 1662.8  on 1225  degrees of freedom
##   (446 observations deleted due to missingness)
## AIC: 1670.8
## 
## Number of Fisher Scoring iterations: 4

The logistic regression results show that inflation (p < 0.001) and GDP growth (p = 0.002) are significant predictors of high unemployment, while foreign direct investment (FDI) is not (p = 0.68). Inflation has a positive coefficient (0.040), indicating that higher inflation is associated with an increased likelihood of high unemployment. In contrast, GDP growth has a negative coefficient (-0.050), suggesting that stronger economic growth is associated with a lower probability of high unemployment. These are associations in observational data, not estimates of causal effects.
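Because the coefficients are on the log-odds scale, exponentiating them gives odds ratios, which are often easier to read. The sketch below uses estimates copied from the summary(model2) output rather than the model object, so it runs on its own; the short names are illustrative labels only. With the fitted model in the session, exp(coef(model2)) should give the same values.

```r
# Estimates copied from the summary(model2) output (log-odds scale)
log_odds <- c(
  inflation  = 0.039924,
  gdp_growth = -0.049628,
  fdi        = 0.003460
)

# Exponentiate to get odds ratios per one-unit increase in each predictor
odds_ratios <- exp(log_odds)
print(round(odds_ratios, 3))

# Percentage change in the odds of high unemployment per one-unit increase
pct_change <- (odds_ratios - 1) * 100
print(round(pct_change, 1))
```

Under these estimates, each additional percentage point of inflation multiplies the odds of high unemployment by about 1.04, while each additional percentage point of GDP growth multiplies them by about 0.95.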

Model Equation

Logistic Regression Form

\[ \log\left(\frac{p}{1-p}\right) = -0.2127 + 0.0399 \cdot \text{Inflation} - 0.0496 \cdot \text{GDP Growth} + 0.0035 \cdot \text{FDI} \]

Here p is the probability of high unemployment, and each coefficient is the change in the log-odds for a one-unit increase in that predictor (one percentage point for inflation and GDP growth, one percent of GDP for FDI). The signs match the model summary: positive for inflation, negative for GDP growth, and a very small, statistically insignificant coefficient for FDI.
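As a worked example of the equation, the sketch below plugs in values for a hypothetical country-year (5% inflation, 2% GDP growth, FDI inflows of 3% of GDP; these inputs are assumptions chosen for illustration) and converts the log-odds to a probability with the inverse logit. The coefficients are copied from the summary(model2) output so the block runs on its own.

```r
# Coefficients copied from the summary(model2) output (log-odds scale)
b <- c(intercept = -0.212657, inflation = 0.039924,
       gdp_growth = -0.049628, fdi = 0.003460)

# Hypothetical country-year: 5% inflation, 2% GDP growth, FDI at 3% of GDP
x <- c(intercept = 1, inflation = 5, gdp_growth = 2, fdi = 3)

# Linear predictor (log-odds), then inverse logit to get a probability
log_odds <- sum(b * x)
p_high <- plogis(log_odds)

print(round(log_odds, 3))
print(round(p_high, 3))
```

For these inputs the predicted probability of high unemployment is a little under 0.5. With the fitted model available, predict(model2, newdata = ..., type = "response") performs the same calculation for a data frame of new observations.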

confint(model2)
##                                                            2.5 %      97.5 %
## (Intercept)                                          -0.39511562 -0.03078128
## inflation_consumer_prices_annual_percent              0.02278612  0.05919044
## gdp_growth_annual_percent                            -0.08196189 -0.01815042
## foreign_direct_investment_net_inflows_percent_of_gdp -0.01310510  0.01998241

The 95% confidence intervals, reported on the log-odds scale, show that inflation has a positive and statistically significant effect, as its interval (0.0228, 0.0592) does not include zero. Similarly, GDP growth has a negative and significant effect, with its interval (-0.0820, -0.0182) entirely below zero. In contrast, the interval for FDI (-0.0131, 0.0200) includes zero, indicating that its effect is not statistically significant. Overall, the confidence intervals agree with the p-values: the coefficients for inflation and GDP growth are distinguishable from zero, while the coefficient for FDI is not.
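For a glm object, confint() computes profile-likelihood intervals. A quick cross-check is the simpler Wald interval, the estimate plus or minus about 1.96 standard errors. The sketch below computes it from values copied out of the summary(model2) output; the short names are illustrative labels, and this is an approximation for comparison rather than a replacement for the intervals above.

```r
# Estimates and standard errors copied from the summary(model2) output
est <- c(inflation = 0.039924, gdp_growth = -0.049628, fdi = 0.003460)
se  <- c(inflation = 0.009282, gdp_growth = 0.016254, fdi = 0.008311)

# 95% Wald interval: estimate +/- z * standard error
z <- qnorm(0.975)
wald <- cbind(lower = est - z * se, upper = est + z * se)
print(round(wald, 4))

# An interval excludes zero when both bounds share the same sign
excludes_zero <- wald[, "lower"] * wald[, "upper"] > 0
print(excludes_zero)

# The same intervals expressed as odds ratios
print(round(exp(wald), 3))
```

The Wald bounds land close to the profile-likelihood bounds and lead to the same conclusion: the intervals for inflation and GDP growth exclude zero, and the interval for FDI does not.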

Insights

Higher inflation is associated with a greater likelihood of high unemployment.
Stronger GDP growth is associated with a lower likelihood of high unemployment.
FDI alone does not significantly explain differences in unemployment.
Labor market conditions are linked to macroeconomic performance, although the model's explanatory power is modest.

Significance of findings

These results highlight the importance of maintaining stable economic growth to control unemployment. Policymakers should focus on promoting growth and managing inflation effectively, as both variables are significantly associated with labor market outcomes in this model. The findings also suggest that relying solely on foreign investment may not directly improve employment conditions.

Issues with Model

The model has several limitations that affect its overall reliability. Although it improves on the intercept-only baseline, the reduction in deviance is small (from 1696.1 to 1662.8, a drop of 33.3 on 3 degrees of freedom), suggesting limited explanatory power. One of the predictors, foreign direct investment (FDI), is not statistically significant, indicating it may not meaningfully contribute to the model. Additionally, 446 of the 1,675 observations (about 27%) were removed due to missing data, which could introduce bias and reduce the generalizability of the results. The model also excludes potentially important variables such as population, labor force, or government expenditure, and assumes a linear relationship in the log-odds, which may oversimplify complex real-world economic relationships.
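The size of the deviance reduction can be quantified from the numbers in the summary(model2) output. The sketch below computes a likelihood-ratio test against the intercept-only model, McFadden's pseudo R-squared (which for a binary outcome equals one minus the ratio of residual to null deviance), and the share of rows dropped for missingness. All inputs are copied from the output shown earlier; the variable names are illustrative.

```r
# Deviances and degrees of freedom copied from the summary(model2) output
null_dev  <- 1696.1
null_df   <- 1228
resid_dev <- 1662.8
resid_df  <- 1225

# Likelihood-ratio test of the fitted model against the intercept-only model
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value))

# McFadden's pseudo R-squared for a binary outcome
pseudo_r2 <- 1 - resid_dev / null_dev
print(round(pseudo_r2, 4))

# Share of rows dropped because of missing values (446 of 1,675)
share_dropped <- 446 / 1675
print(round(share_dropped, 3))
```

The test indicates that the predictors jointly improve on the baseline, yet the pseudo R-squared of roughly 0.02 shows that the improvement is small in practical terms.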