Import and Clean Data
We import data for the period 2000–2020 for the following World Bank indicators:
1. Outcome: Poverty headcount ratio at national poverty lines, % of population (SI.POV.NAHC)
2. Variable of interest: GDP per capita, constant US$ (NY.GDP.PCAP.KD)
3. Controls: Primary school enrollment, % gross (SE.PRM.ENRR), and urban population, % of total population (SP.URB.TOTL.IN.ZS)
Note: On initial runs, we found substantial missing data in the control variables. We therefore fill missing control values using nearby years. We group by country so that values are never carried from one nation to another. Within each country, if 2010 is missing but 2009 exists, the 2009 value is used. The ‘downup’ direction first carries earlier years forward and then, for gaps with no earlier value (for example at the start of a country's series), carries later years backward. Finally, we drop rows that are still missing poverty data, because we cannot impute the main outcome we are studying.
## [1] "Observations available for analysis: 1095"
## [1] "Number of observations: 1095"
## [1] "Number of unique countries: 157"
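The cleaning code itself is not shown above. Below is a minimal sketch of the steps just described, using `tidyr::fill()`. It assumes the data are already in a long country-year data frame with columns named `country`, `year`, `poverty`, `education` and `urban_pop`, matching the model formulas below. The toy values are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy panel: two countries, 2009-2012, with gaps in every variable
raw <- data.frame(
  country   = rep(c("A", "B"), each = 4),
  year      = rep(2009:2012, times = 2),
  poverty   = c(30, NA, 28, 27, 50, 49, NA, 47),
  education = c(NA, 95, NA, 97, 80, NA, NA, 83),
  urban_pop = c(40, 41, NA, NA, NA, 20, 21, 22)
)

panel_toy <- raw %>%
  arrange(country, year) %>%    # fill() follows row order, so sort by year first
  group_by(country) %>%         # never carry values across countries
  fill(education, urban_pop, .direction = "downup") %>%  # earlier years first, then later
  ungroup() %>%
  filter(!is.na(poverty))       # the outcome itself is never imputed

print(panel_toy)
```

Sorting by year before `fill()` matters: `fill()` works in row order, so an unsorted panel would carry values from the wrong years. Filling before filtering also lets a year with a missing outcome still donate its control values to neighboring years.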
Visualizing Poverty Evolution
Here we plot the evolution of poverty over time, with each line
representing a country.
ggplot(panel_data, aes(x = year, y = poverty, group = country)) +
  geom_line(alpha = 0.4, color = "steelblue") +
  labs(title = "Evolution of Poverty Rates (2000-2020)",
       subtitle = "Each line represents one country",
       y = "Poverty Headcount Ratio (%)",
       x = "Year") +
  theme_minimal()

Main Effects (Simple Model)
We start with a simple model explaining poverty by GDP per capita.
The prompt asks us to assess non-linear associations, which we do by taking the logarithm of GDP per capita.
Reasoning: The relationship between money and poverty is usually
non-linear (an extra $100 matters a lot more to a poor country than a
rich one). Therefore, we use log_gdp.
# Model 1: Poverty ~ Log(GDP)
m1_simple <- lm(poverty ~ log_gdp, data = panel_data)
# Display results
kable(tidy(m1_simple), caption = "Model 1: Simple Effect of GDP on Poverty")
Model 1: Simple Effect of GDP on Poverty
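| term | estimate | std.error | statistic | p.value |
|:------------|----------:|---------:|----------:|--------:|
| (Intercept) | 87.718205 | 2.282971 | 38.42283 | 0 |
| log_gdp | -7.180391 | 0.256893 | -27.95090 | 0 |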
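To make the log-level coefficient concrete, the arithmetic below reuses the Model 1 estimate. When poverty is in percentage points and GDP enters in logs, moving GDP per capita from g to g_new shifts predicted poverty by b × log(g_new / g). The income levels are hypothetical values chosen for illustration.

```r
# Coefficient on log_gdp taken from the Model 1 table
b_log_gdp <- -7.180391

# Change in predicted poverty (percentage points) when GDP per capita
# moves from g to g_new is b * log(g_new / g)
effect_10pct  <- b_log_gdp * log(1.10)  # GDP per capita 10% higher
effect_double <- b_log_gdp * log(2)     # GDP per capita doubled

# The same extra $100 at different (hypothetical) income levels, in US$
gdp_levels <- c(1000, 5000, 20000)
effect_100 <- b_log_gdp * log((gdp_levels + 100) / gdp_levels)

round(c(ten_percent = effect_10pct, doubling = effect_double), 2)
round(setNames(effect_100, gdp_levels), 3)
```

A 10% rise in GDP per capita is associated with roughly 0.68 points less poverty, and a doubling with roughly 4.98 points less. The same extra $100 is worth about 0.68 points at $1,000 but only about 0.04 points at $20,000. This is the diminishing-returns pattern the log specification is meant to capture.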
Visualize Predicted Values
We visualize how poverty is predicted to change as GDP
increases.
pred_data <- panel_data %>%
  select(log_gdp) %>%
  # newdata keeps the predictions aligned with the rows of panel_data
  mutate(predicted_poverty = predict(m1_simple, newdata = panel_data))
ggplot(panel_data, aes(x = log_gdp, y = poverty)) +
  geom_point(alpha = 0.3) +
  geom_line(data = pred_data, aes(y = predicted_poverty), color = "red", linewidth = 1.2) +
  labs(title = "Predicted Poverty vs. Log GDP",
       x = "Log(GDP per Capita)",
       y = "Poverty Rate") +
  theme_minimal()
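The plot above draws the fitted line through the observed data points. An alternative is to predict on an evenly spaced grid and add a 95% confidence band with `predict(..., interval = "confidence")`. The sketch below does this on simulated data standing in for `panel_data`; the data and the object name `grid` are assumptions for illustration.

```r
set.seed(123)

# Hypothetical stand-in for panel_data with a log-linear poverty relationship
sim <- data.frame(log_gdp = runif(300, min = 6, max = 11))
sim$poverty <- 88 - 7.2 * sim$log_gdp + rnorm(300, sd = 10)

fit <- lm(poverty ~ log_gdp, data = sim)

# Evenly spaced grid over the observed range of log GDP
grid <- data.frame(
  log_gdp = seq(min(sim$log_gdp), max(sim$log_gdp), length.out = 100)
)

# interval = "confidence" returns a matrix with columns fit, lwr and upr
# for the mean prediction at each grid point
pred <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
grid <- cbind(grid, as.data.frame(pred))

head(grid)
```

In the ggplot call, `grid` could be passed as `data` to `geom_line()` and `geom_ribbon()`, mapping `y = fit`, `ymin = lwr` and `ymax = upr`. Because `grid` has no `poverty` column, `y` must be mapped explicitly. This approach does not depend on the row order of the original data.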
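Interpretation: The negative coefficient on log_gdp (-7.18) indicates that richer country-years have markedly lower poverty rates. A 1% higher GDP per capita is associated with about 0.07 percentage points less poverty. Because the model is linear in log GDP, each additional dollar is associated with a smaller reduction at higher income levels, which captures the diminishing returns of wealth on poverty reduction. Log GDP alone explains about 42% of the variation in poverty (R² = 0.417).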
Incorporating Control Variables
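We now add education and urban population to see whether the association between GDP and poverty holds up.
Because we filled gaps in the control variables during cleaning, this model uses the same 1,095 observations as Model 1. Without that step, lm() would silently drop every row with a missing control.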
m2_controls <- lm(poverty ~ log_gdp + education + urban_pop,
                  data = panel_data)
kable(tidy(m2_controls), caption = "Model 2: With Controls")
Model 2: With Controls
| term | estimate | std.error | statistic | p.value |
|:------------|-----------:|----------:|-----------:|----------:|
| (Intercept) | 79.0423426 | 4.0449527 | 19.540981 | 0.0000000 |
| log_gdp | -7.8824963 | 0.3777768 | -20.865484 | 0.0000000 |
| education | 0.1054796 | 0.0299129 | 3.526223 | 0.0004390 |
| urban_pop | 0.0678805 | 0.0255375 | 2.658072 | 0.0079738 |
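Assessment: We compare the coefficient on log_gdp in Model 1 and Model 2. If it shrinks toward zero, part of the apparent “effect” of GDP was actually capturing the fact that richer countries are also more educated and urbanized. Here it moves slightly away from zero (from -7.18 to -7.88). Adding the controls therefore does not weaken the GDP association; if anything, omitting them masked part of it.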
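The comparison above follows from the omitted-variable-bias identity for OLS. On the same sample, the short-regression slope equals the long-regression slope plus, for each omitted control, that control's coefficient (γ) times the slope from regressing the control on log GDP (δ). When γ and δ have the same sign, omitting the control pulls the GDP coefficient upward, toward zero for a negative slope. The sketch below checks the identity on simulated data; all values are hypothetical and not the document's data.

```r
set.seed(2024)
n <- 500

# Hypothetical data in which both controls rise with GDP and raise poverty
log_gdp   <- rnorm(n, mean = 8, sd = 1)
education <- 60 + 5 * log_gdp + rnorm(n, sd = 5)
urban_pop <- 10 + 6 * log_gdp + rnorm(n, sd = 8)
poverty   <- 80 - 8 * log_gdp + 0.1 * education + 0.07 * urban_pop + rnorm(n, sd = 10)
sim <- data.frame(poverty, log_gdp, education, urban_pop)

short <- lm(poverty ~ log_gdp, data = sim)                          # like Model 1
long  <- lm(poverty ~ log_gdp + education + urban_pop, data = sim)  # like Model 2

# Auxiliary regressions (delta): how each control moves with log GDP
delta_edu   <- coef(lm(education ~ log_gdp, data = sim))[["log_gdp"]]
delta_urban <- coef(lm(urban_pop ~ log_gdp, data = sim))[["log_gdp"]]

b <- coef(long)
# Identity: short slope = long slope + sum(gamma * delta)
reconstructed <- b[["log_gdp"]] + b[["education"]] * delta_edu + b[["urban_pop"]] * delta_urban

c(short = coef(short)[["log_gdp"]], reconstructed = reconstructed)
```

The two numbers agree to machine precision. This explains the pattern in Models 1 and 2: positive coefficients on education and urban_pop combined with a positive relationship to GDP would make the simple model's GDP slope less negative.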
Temporal Dynamics (Time Fixed Effects)
We control for “contemporaneous effects”: external shocks common to all countries in a given year. Adding factor(year) estimates a separate intercept for each year. This absorbs global events (like the 2008 crisis) that affected all countries simultaneously.
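One way to check whether the year intercepts matter jointly is a nested-model F-test with `anova()`. The sketch below runs it on a simulated balanced panel with an assumed common shock in 2008–2009. The panel, the shock size and the object names are hypothetical.

```r
set.seed(42)

# Hypothetical balanced panel: 50 countries observed every year 2000-2020
panel <- expand.grid(country = paste0("C", 1:50), year = 2000:2020,
                     stringsAsFactors = FALSE)
panel$log_gdp <- rnorm(nrow(panel), mean = 8, sd = 1)

# Assumed common shock: poverty is 5 points higher everywhere in 2008-2009
shock <- ifelse(panel$year %in% c(2008, 2009), 5, 0)
panel$poverty <- 80 - 7 * panel$log_gdp + shock + rnorm(nrow(panel), sd = 3)

m_pooled <- lm(poverty ~ log_gdp, data = panel)
m_year   <- lm(poverty ~ log_gdp + factor(year), data = panel)  # 20 year dummies

# Nested-model F-test: are the year intercepts jointly different from zero?
f_test <- anova(m_pooled, m_year)
print(f_test)
p_year <- f_test[["Pr(>F)"]][2]
```

Applied to this analysis, `anova(m2_controls, m3_time)` would run the same test. Model 2 is nested in Model 3, and both use the same 1,095 observations.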
# Model 3: Adding Year Fixed Effects
m3_time <- lm(poverty ~ log_gdp + education + urban_pop + factor(year),
              data = panel_data)
# We filter the output to show main variables only (hiding the list of years)
tidy(m3_time) %>%
  filter(!str_detect(term, "factor")) %>%
  kable(caption = "Model 3: With Time Fixed Effects")
Model 3: With Time Fixed Effects
| term | estimate | std.error | statistic | p.value |
|:------------|-----------:|----------:|-----------:|----------:|
| (Intercept) | 84.7143817 | 4.5053517 | 18.803056 | 0.0000000 |
| log_gdp | -7.5440910 | 0.3782989 | -19.942144 | 0.0000000 |
| education | 0.0916230 | 0.0298273 | 3.071780 | 0.0021816 |
| urban_pop | 0.0675215 | 0.0253697 | 2.661505 | 0.0078957 |
Model Comparison
Here we compare all three models side-by-side to see how the
coefficient for GDP changes as we add controls.
# We use modelsummary to create a professional comparison table
models <- list(
  "Simple"   = m1_simple,
  "Controls" = m2_controls,
  "Time FE"  = m3_time
)
modelsummary(models,
             stars = TRUE,
             coef_map = c("log_gdp" = "Log GDP",
                          "education" = "Education",
                          "urban_pop" = "Urban Pop"),
             title = "Comparison of Poverty Models")
Comparison of Poverty Models
| | Simple | Controls | Time FE |
|:----------|----------:|----------:|----------:|
| Log GDP | -7.180*** | -7.882*** | -7.544*** |
| | (0.257) | (0.378) | (0.378) |
| Education | | 0.105*** | 0.092** |
| | | (0.030) | (0.030) |
| Urban Pop | | 0.068** | 0.068** |
| | | (0.026) | (0.025) |
| Num.Obs. | 1095 | 1095 | 1095 |
| R2 | 0.417 | 0.428 | 0.453 |
| R2 Adj. | 0.416 | 0.426 | 0.441 |
| AIC | 8329.2 | 8312.2 | 8302.6 |
| BIC | 8344.2 | 8337.1 | 8427.6 |
| Log.Lik. | -4161.597 | -4151.076 | -4126.324 |
| RMSE | 10.82 | 10.72 | 10.48 |

Note: + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
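The AIC and BIC rows disagree about Model 3. AIC favors it, but BIC rates it worst, because BIC charges log(1095) ≈ 7.0 per parameter instead of 2. The check below recomputes both criteria from the table's log-likelihoods using AIC = -2·logLik + 2k and BIC = -2·logLik + log(n)·k. The parameter counts k are our inference, counting the residual variance as R does for `lm` fits. The value 25 for Model 3 assumes 20 year dummies, i.e. all 21 years from 2000 to 2020 present in the sample.

```r
# Log-likelihoods from the comparison table
log_lik <- c(Simple = -4161.597, Controls = -4151.076, TimeFE = -4126.324)

# Estimated parameters, counting the residual variance (assumed counts):
# Simple:   intercept + log_gdp + sigma          = 3
# Controls: + education + urban_pop              = 5
# Time FE:  + 20 year dummies (2001-2020)        = 25
k <- c(Simple = 3, Controls = 5, TimeFE = 25)
n <- 1095

aic <- -2 * log_lik + 2 * k
bic <- -2 * log_lik + log(n) * k

round(rbind(AIC = aic, BIC = bic), 1)
```

The recomputed values match the table. So the difference between the criteria is entirely a matter of how heavily each penalizes the 20 extra year dummies.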
Bonus: Omitted Variable Bias (Country Fixed Effects)
Finally, we add country fixed effects. These absorb unobserved characteristics that make a country unique but do not change over time (like geography, culture, or history).
This is the most rigorous test in our analysis. It asks: “When a specific country gets richer than it was before, does its poverty go down?”
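The within-country question above is what the dummy-variable model answers. By the Frisch-Waugh-Lovell theorem, including country and year dummies gives the same log_gdp coefficient as regressing poverty on GDP after removing each country's average and each year's average. The sketch below checks this on a simulated balanced panel; the data and the helper `demean2()` are hypothetical. In a balanced panel this simple double demeaning is exact. Unbalanced panels such as this one (1,095 observations across 157 countries) need iterative demeaning, which dedicated packages such as fixest perform.

```r
set.seed(7)
ids <- paste0("C", 1:30)

# Hypothetical balanced panel: 30 countries x 21 years
panel <- expand.grid(country = ids, year = 2000:2020, stringsAsFactors = FALSE)

# Time-invariant country trait that raises poverty AND is correlated with GDP
alpha <- setNames(rnorm(30, sd = 10), ids)
panel$alpha   <- unname(alpha[panel$country])
panel$log_gdp <- 8 + 0.1 * panel$alpha + rnorm(nrow(panel), sd = 0.5)
panel$poverty <- 60 - 5 * panel$log_gdp + panel$alpha +
  0.3 * (panel$year - 2000) + rnorm(nrow(panel), sd = 2)

# (1) Dummy-variable estimator, as in Model 4
m_dummies <- lm(poverty ~ log_gdp + factor(country) + factor(year), data = panel)

# (2) Within estimator: subtract country and year means, add back the grand mean
demean2 <- function(v, g1, g2) v - ave(v, g1) - ave(v, g2) + mean(v)
panel$pov_dm <- demean2(panel$poverty, panel$country, panel$year)
panel$gdp_dm <- demean2(panel$log_gdp, panel$country, panel$year)
m_within <- lm(pov_dm ~ 0 + gdp_dm, data = panel)

c(pooled  = coef(lm(poverty ~ log_gdp, data = panel))[["log_gdp"]],
  dummies = coef(m_dummies)[["log_gdp"]],
  within  = coef(m_within)[["gdp_dm"]])
```

The dummy and within slopes coincide, while the pooled slope is pushed upward by the country trait that the fixed effects remove. The standard errors from `m_within` are not valid as printed, because that regression ignores the degrees of freedom used up by the demeaning.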
# Model 4: Country Fixed Effects (and Time FE)
# factor(country) adds a dummy for every country except the reference category
# Note: a country observed in only one year is fit exactly by its own dummy
# and contributes nothing to the log_gdp estimate
m4_country_fe <- lm(poverty ~ log_gdp + education + urban_pop + factor(year) + factor(country),
                    data = panel_data)
# Compare the GDP coefficient (The "Main Effect")
coef_simple <- coef(m1_simple)["log_gdp"]
coef_fe <- coef(m4_country_fe)["log_gdp"]
print(paste("GDP Effect (Simple Model):", round(coef_simple, 2)))
## [1] "GDP Effect (Simple Model): -7.18"
print(paste("GDP Effect (Country FE Model):", round(coef_fe, 2)))
## [1] "GDP Effect (Country FE Model): -24.16"
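Interpretation: Adding country fixed effects often shrinks the GDP coefficient, which would suggest that simply comparing rich countries to poor countries (Simple Model) overstates the role of income. Here the opposite happens: the coefficient grows in magnitude from -7.18 to -24.16. Within a given country, a 1% rise in GDP per capita is therefore associated with roughly 0.24 percentage points less poverty. The code above prints only point estimates, so the standard error of the fixed-effects coefficient (for example from tidy(m4_country_fe)) should be checked before calling it significant. Even if it is significant, the result is consistent with within-country growth driving poverty reduction but does not by itself establish causality. Fixed effects do not remove time-varying confounders or reverse causation.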