Panel Data Analysis: Economic Development & Poverty (Task 11)

Import and Clean Data

We import data for the period 2000–2020 for the required variables.

1. Outcome: Poverty Headcount Ratio (SI.POV.NAHC)

2. Interest: GDP per Capita (NY.GDP.PCAP.KD)

3. Controls: Primary School Enrollment (SE.PRM.ENRR), Urban Population (SP.URB.TOTL.IN.ZS)

Note: Initial runs showed substantial missing data in the control variables, so we fill gaps in the controls using nearby years. We group by country so that values are never carried across nations, then fill with the ‘downup’ direction: each gap takes the most recent earlier value (if 2010 is missing but 2009 exists, we use 2009) and falls back to the next later value only when no earlier one exists. Finally, we drop rows that are still missing poverty data, because we cannot impute the main outcome we are studying.
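The cleaning steps described in the note can be sketched on a toy panel. The data frame toy and all of its values are hypothetical; only the grouping, the ‘downup’ fill direction, and the final drop of rows without poverty data follow the description above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy panel with gaps in one control and in the outcome
toy <- tibble(
  country   = c("A", "A", "A", "B", "B", "B"),
  year      = c(2009, 2010, 2011, 2009, 2010, 2011),
  poverty   = c(30, NA, 28, 12, 11, 10),
  education = c(95, NA, 97, NA, 101, NA)
)

toy_clean <- toy %>%
  arrange(country, year) %>%                  # fill() depends on row order
  group_by(country) %>%                       # never borrow values across countries
  fill(education, .direction = "downup") %>%  # earlier year first, then later year
  ungroup() %>%
  drop_na(poverty)                            # the outcome is never imputed

toy_clean
```

Country A's 2010 row is removed because its poverty value is missing, while every remaining row has a filled education value.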

## [1] "Observations available for analysis: 1095"

## [1] "Number of observations: 1095"

## [1] "Number of unique countries: 157"

Visualizing Poverty Evolution

Here we plot the evolution of poverty over time, with each line representing a country.

ggplot(panel_data, aes(x = year, y = poverty, group = country)) +
  geom_line(alpha = 0.4, color = "steelblue") +
  labs(title = "Evolution of Poverty Rates (2000-2020)",
       subtitle = "Each line represents one country",
       y = "Poverty Headcount Ratio (%)",
       x = "Year") +
  theme_minimal()

Main Effects (Simple Model)

We start with a simple model explaining poverty by GDP per capita. The prompt asks to assess non-linear associations (using logarithm).

Reasoning: The relationship between income and poverty is usually non-linear: an extra $100 of GDP per capita matters far more in a poor country than in a rich one. Therefore, we use log_gdp, the logarithm of GDP per capita, as the regressor.

# Model 1: Poverty ~ Log(GDP)
m1_simple <- lm(poverty ~ log_gdp, data = panel_data)

# Display results
kable(tidy(m1_simple), caption = "Model 1: Simple Effect of GDP on Poverty")

Model 1: Simple Effect of GDP on Poverty
term	estimate	std.error	statistic	p.value
(Intercept)	87.718205	2.282971	38.42283	0
log_gdp	-7.180391	0.256893	-27.95090	0

Visualize Predicted Values

We visualize how poverty is predicted to change as GDP increases.

pred_data <- panel_data %>%
  select(log_gdp) %>%
  mutate(predicted_poverty = predict(m1_simple))

ggplot(panel_data, aes(x = log_gdp, y = poverty)) +
  geom_point(alpha = 0.3) +
  geom_line(data = pred_data, aes(y = predicted_poverty), color = "red", linewidth = 1.2) +
  labs(title = "Predicted Poverty vs. Log GDP",
       x = "Log(GDP per Capita)",
       y = "Poverty Rate") +
  theme_minimal()

Interpretation: The negative coefficient on log_gdp indicates that higher GDP per capita is associated with significantly lower poverty rates. Because GDP enters in logs, the same absolute gain in income corresponds to a larger predicted drop in poverty in a poor country than in a rich one, which captures the diminishing returns of income on poverty reduction.
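To make the size of the coefficient concrete, the following sketch works through the arithmetic using the Model 1 estimates from the table above. It assumes that log_gdp is the natural logarithm of GDP per capita (R's default for log()), which the report does not state explicitly; the GDP levels of 1,000 and 50,000 are illustrative choices.

```r
# Estimates copied from the Model 1 table
b0 <- 87.718205
b1 <- -7.180391

# Predicted poverty rate at a given GDP per capita (natural log assumed)
predict_poverty <- function(gdp) b0 + b1 * log(gdp)

# Change in predicted poverty (percentage points) for a 1% rise and a doubling of GDP
effect_1pct     <- b1 * log(1.01)   # about -0.07
effect_doubling <- b1 * log(2)      # about -4.98

# The same extra $100 at two very different income levels
gain_poor <- predict_poverty(1100)  - predict_poverty(1000)   # about -0.68
gain_rich <- predict_poverty(50100) - predict_poverty(50000)  # about -0.014

round(c(effect_1pct, effect_doubling, gain_poor, gain_rich), 3)
```

Under these assumptions, an extra $100 is associated with a far larger predicted reduction in poverty at a GDP per capita of 1,000 than at 50,000.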

Incorporating Control Variables

We now add Education and Urban Population to see if the effect of GDP holds up.

Because we filled gaps in the controls with fill() in the first step, this model keeps all 1095 observations instead of dropping rows with missing controls.

m2_controls <- lm(poverty ~ log_gdp + education + urban_pop, 
                  data = panel_data)

kable(tidy(m2_controls), caption = "Model 2: With Controls")

Model 2: With Controls
term	estimate	std.error	statistic	p.value
(Intercept)	79.0423426	4.0449527	19.540981	0.0000000
log_gdp	-7.8824963	0.3777768	-20.865484	0.0000000
education	0.1054796	0.0299129	3.526223	0.0004390
urban_pop	0.0678805	0.0255375	2.658072	0.0079738

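Assessment: We compare the coefficient of log_gdp in Model 1 and Model 2. Had it moved closer to zero, part of the apparent “effect” of GDP would have been capturing the fact that richer countries are also more educated and urbanized. Here the opposite happens: the coefficient moves from -7.18 to -7.88, so the association with GDP holds up and is slightly stronger once education and urbanization are held constant. Both controls enter with small positive coefficients.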
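The comparison can be quantified with a short calculation. This is a minimal sketch that copies the two log_gdp estimates from the tables above rather than re-fitting the models; the variable names are illustrative.

```r
# log_gdp estimates copied from the Model 1 and Model 2 tables
b_m1 <- -7.180391
b_m2 <- -7.8824963

# Relative change in the magnitude of the coefficient after adding controls
pct_change <- 100 * (abs(b_m2) - abs(b_m1)) / abs(b_m1)
round(pct_change, 1)
```

A positive value means the coefficient moved away from zero once the controls were added; a negative value would mean it shrank.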

Temporal Dynamics (Time Fixed Effects)

We control for “contemporaneous effects” (external shocks). By adding factor(year), we account for global events (like the 2008 crisis) that affected all countries simultaneously.

# Model 3: Adding Year Fixed Effects
m3_time <- lm(poverty ~ log_gdp + education + urban_pop + factor(year), 
              data = panel_data)

# We filter the output to show main variables only (hiding the list of years)
tidy(m3_time) %>% 
  filter(!str_detect(term, "factor")) %>% 
  kable(caption = "Model 3: With Time Fixed Effects")

Model 3: With Time Fixed Effects
term	estimate	std.error	statistic	p.value
(Intercept)	84.7143817	4.5053517	18.803056	0.0000000
log_gdp	-7.5440910	0.3782989	-19.942144	0.0000000
education	0.0916230	0.0298273	3.071780	0.0021816
urban_pop	0.0675215	0.0253697	2.661505	0.0078957
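One way to see what factor(year) does is to note that it is equivalent to subtracting each year's average from every variable before running the regression. The sketch below checks this on simulated data; the data frame sim, the shock sizes, and the true slope of 2 are all hypothetical and unrelated to the poverty data.

```r
set.seed(1)

# Hypothetical balanced panel: 30 countries observed over 10 years
years <- 2000:2009
sim <- expand.grid(country = paste0("C", 1:30), year = years)

# A common shock hits every country in a given year
shock <- rnorm(length(years), sd = 5)
sim$x <- rnorm(nrow(sim))
sim$y <- 2 * sim$x + shock[match(sim$year, years)] + rnorm(nrow(sim), sd = 0.5)

# Year fixed effects via dummy variables
fit_fe <- lm(y ~ x + factor(year), data = sim)

# Same slope from year-demeaned variables (ave() returns the group mean)
sim$y_dm <- sim$y - ave(sim$y, sim$year)
sim$x_dm <- sim$x - ave(sim$x, sim$year)
fit_dm <- lm(y_dm ~ x_dm - 1, data = sim)

c(dummies = unname(coef(fit_fe)["x"]), demeaned = unname(coef(fit_dm)["x_dm"]))
```

Both approaches return the same slope, which is why year dummies remove anything that shifts all countries by the same amount in a given year.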

Model Comparison

Here we compare all three models side-by-side to see how the coefficient for GDP changes as we add controls.

# We use modelsummary to create a professional comparison table
models <- list(
  "Simple" = m1_simple,
  "Controls" = m2_controls,
  "Time FE" = m3_time
)

modelsummary(models, 
             stars = TRUE, 
             coef_map = c("log_gdp" = "Log GDP", 
                          "education" = "Education", 
                          "urban_pop" = "Urban Pop"),
             title = "Comparison of Poverty Models")

Comparison of Poverty Models
	Simple	Controls	Time FE
Log GDP	-7.180***	-7.882***	-7.544***
	(0.257)	(0.378)	(0.378)
Education		0.105***	0.092**
		(0.030)	(0.030)
Urban Pop		0.068**	0.068**
		(0.026)	(0.025)
Num.Obs.	1095	1095	1095
R2	0.417	0.428	0.453
R2 Adj.	0.416	0.426	0.441
AIC	8329.2	8312.2	8302.6
BIC	8344.2	8337.1	8427.6
Log.Lik.	-4161.597	-4151.076	-4126.324
RMSE	10.82	10.72	10.48
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Bonus: Omitted Variable Bias (Country Fixed Effects)

Finally, we add Country Fixed Effects. These absorb unobserved characteristics that make a country unique but don’t change over time (like geography, culture, or history).

This is the most rigorous test. It asks: “When a specific country gets richer than it was before, does its poverty go down?”

# Model 4: Country Fixed Effects (and Time FE)
# We use factor(country) to create a dummy variable for every country
# Note: This requires sufficient data points per country to run effectively
m4_country_fe <- lm(poverty ~ log_gdp + education + urban_pop + factor(year) + factor(country), 
                    data = panel_data)

# Compare the GDP coefficient (The "Main Effect")
coef_simple <- coef(m1_simple)["log_gdp"]
coef_fe <- coef(m4_country_fe)["log_gdp"]

print(paste("GDP Effect (Simple Model):", round(coef_simple, 2)))

## [1] "GDP Effect (Simple Model): -7.18"

print(paste("GDP Effect (Country FE Model):", round(coef_fe, 2)))

## [1] "GDP Effect (Country FE Model): -24.16"

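Interpretation: Adding country fixed effects often shrinks the GDP coefficient, which would suggest that simply comparing rich countries to poor countries (Simple Model) overstates the association. Here the opposite happens: the coefficient grows in magnitude from -7.18 to -24.16, so the cross-country comparison appears to understate how strongly poverty falls when a given country becomes richer than it was before. The coefficient remains negative, which is consistent with economic growth within a country reducing poverty; its standard error is not reported above, so its statistical significance still needs to be checked.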
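A small simulation illustrates how a pooled slope can sit closer to zero than the within-country slope. Everything in it is hypothetical (the number of countries, the true slope of -3, and a country effect that rises with a country's average x); it shows only that the pattern is possible, not that it explains the estimates above.

```r
set.seed(42)

n_countries <- 40
n_years     <- 10
n_obs       <- n_countries * n_years

country <- rep(paste0("C", seq_len(n_countries)), each = n_years)

# Each country has its own average level of x
x_bar <- rep(rnorm(n_countries), each = n_years)
x     <- x_bar + rnorm(n_obs)

# Time-invariant country effect that is correlated with x
alpha <- 2 * x_bar

# True within-country slope is -3
y   <- alpha - 3 * x + rnorm(n_obs, sd = 0.5)
sim <- data.frame(country, x, y)

# Pooled OLS ignores the country effect; the FE model absorbs it with dummies
pooled_slope <- unname(coef(lm(y ~ x, data = sim))["x"])
within_slope <- unname(coef(lm(y ~ x + factor(country), data = sim))["x"])

round(c(pooled = pooled_slope, within = within_slope), 2)
```

In this setup the fixed-effects model recovers a slope near -3, while the pooled slope is pulled toward zero because the omitted country effect is positively correlated with x.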