Spatial Panel Analysis of Regional Economic Development in Poland: The Warsaw Proximity Effect

Introduction and Research Question

This project examines the spatial dimensions of regional economic development in Poland, focusing on the hypothesis that proximity to Warsaw positively influences regional economic performance. Specifically, we investigate whether regions closer to Warsaw (measured by distance in kilometers) exhibit higher GDP per capita and lower unemployment rates, while controlling for population density effects.

The analysis utilizes spatial panel data econometrics to capture both spatial dependencies and temporal dynamics across Polish NUTS-2 regions from 2014-2022. This approach allows us to account for spatial spillovers between neighboring regions and test whether Warsaw’s economic dominance creates a measurable proximity effect throughout Poland. ## Data and Methodology

Required Libraries

# Install packages if needed
#install.packages(c("eurostat", "dplyr", "tidyr", "sf", "spdep", 
#                   "splm", "plm", "car", "ggplot2", "tmap", "spatialreg"))

library(eurostat)  
library(dplyr)    
library(tidyr)
library(sf)
library(spdep)
library(splm)       
library(plm) 
library(car)       
library(ggplot2)
library(tmap) 
library(spatialreg)
library(Matrix)

Data Sources

Geographic Data: NUTS-2 administrative boundaries from Eurostat (2024)
Economic Indicators: Eurostat regional statistics (2014-2022)
- GDP (PPS, EU27=100 base). Since GDP per capita was not available at NUTS-2 level, the PPS, EU27=100 base was used to create a quasi gdp-per-capita
- Population density (inhabitants per km²)
- Employment rate (total)
Spatial Reference: Distance from region centroids to Warsaw coordinates (21.0122, 52.2297) The data was taken from the ‘eurostat’ package with the usage of get_eurostat()

# get nuts2 shapefile for Poland
nuts2 <- get_eurostat_geospatial(
  nuts_level = 2,
  year = 2024,
  resolution = "01"
) %>% 
  st_transform(3035) %>%  
  filter(CNTR_CODE == "PL") 

print(paste("Number of Polish NUTS-2 regions:", nrow(nuts2)))

## [1] "Number of Polish NUTS-2 regions: 17"

#calculate distances to Warsaw
centroids <- st_centroid(nuts2)
warsaw_pt <- st_sfc(
  st_point(c(21.0122, 52.2297)), 
  crs = 4326
) %>% 
  st_transform(3035)
common_years <- 2014:2022

nuts2$dist_to_warsaw_km <- as.numeric(st_distance(centroids, warsaw_pt)) / 1000

summary(nuts2$dist_to_warsaw_km)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.74  172.40  254.40  222.70  279.10  396.23

Panel Data Structure

The dataset consists of 17 Polish NUTS-2 regions observed over 9 years (2014-2022). The result of a panel data was that NUTS-2 creates only 17 variables, which is a very small sample that might introduce errors. As a result of extending the dataset , the result was the creation of a balanced panel of 153 observations. Bigger time period was planned, however due to the data unavailability, the period 2014-2022 was the optimal one. Key variables include:

Dependent Variable: gdp_pc - GDP per capita in purchasing power standards
Independent Variables:
- pop_density - Population density
- unemp_rate - Unemployment rate
- dist_to_warsaw_km - Distance to Warsaw in kilometers

# GDP per capita (PPS, EU27=100)
gdp <- get_eurostat("tgs00006", time_format = "num")
gdp_pl <- gdp %>%
  filter(startsWith(geo, "PL"), nchar(geo) == 4, 
         unit == "PPS_HAB_EU27_2020", 
         TIME_PERIOD %in% common_years) %>%
  rename(NUTS_ID = geo, gdp_pc = values, time = TIME_PERIOD)

print(paste("GDP observations:", nrow(gdp_pl)))

## [1] "GDP observations: 153"

# Population density
popdens <- get_eurostat("demo_r_d3dens", time_format = "num")
popdens_pl <- popdens %>%
  filter(startsWith(geo, "PL"), nchar(geo) == 4, 
         TIME_PERIOD %in% common_years) %>%
  rename(NUTS_ID = geo, pop_density = values, time = TIME_PERIOD)

print(paste("Population density observations:", nrow(popdens_pl)))

## [1] "Population density observations: 153"

# Unemployment rate
unemp <- get_eurostat("tgs00007", time_format = "num", filters = list(sex = "T"))
unemp_pl <- unemp %>%
  filter(startsWith(geo, "PL"), nchar(geo) == 4, 
         time %in% common_years) %>%
  rename(NUTS_ID = geo, unemp_rate = values)

print(paste("Unemployment rate observations:", nrow(unemp_pl)))

## [1] "Unemployment rate observations: 153"

And now we can construct our panel

#build a panel dataset
panel <- expand.grid(
  NUTS_ID = unique(nuts2$NUTS_ID),
  time = common_years
) %>%
  left_join(
    st_drop_geometry(nuts2) %>% select(NUTS_ID, NAME_LATN, dist_to_warsaw_km),
    by = "NUTS_ID"
  ) %>%
  left_join(gdp_pl, by = c("NUTS_ID", "time")) %>%
  left_join(popdens_pl, by = c("NUTS_ID", "time")) %>%
  left_join(unemp_pl, by = c("NUTS_ID", "time")) %>%
  select(NUTS_ID, NAME_LATN, time, dist_to_warsaw_km,
         gdp_pc, pop_density, unemp_rate) %>%
  arrange(NUTS_ID, time)

# Display panel structure
print(paste("Total observations:", nrow(panel)))

## [1] "Total observations: 153"

print(paste("Number of regions:", length(unique(panel$NUTS_ID))))

## [1] "Number of regions: 17"

print(paste("Time periods:", length(unique(panel$time))))

## [1] "Time periods: 9"

# Summary statistics
summary(panel[, c("gdp_pc", "pop_density", "unemp_rate", "dist_to_warsaw_km")])

##      gdp_pc        pop_density      unemp_rate    dist_to_warsaw_km
##  Min.   : 48.00   Min.   : 54.3   Min.   :55.60   Min.   :  7.74   
##  1st Qu.: 57.00   1st Qu.: 79.1   1st Qu.:63.30   1st Qu.:172.40   
##  Median : 63.00   Median :116.8   Median :66.20   Median :254.40   
##  Mean   : 68.56   Mean   :146.8   Mean   :66.43   Mean   :222.70   
##  3rd Qu.: 73.00   3rd Qu.:136.7   3rd Qu.:69.30   3rd Qu.:279.10   
##  Max.   :160.00   Max.   :542.7   Max.   :79.60   Max.   :396.23

Additionally, variables like :

-rail transportation (tran_r_net)

-R&D (rd_e_gerdreg)

-education secondary, 25 and over (edat_lfse_04)

-Gross fixed capital formation (nama_10r_2gfcf)

Were initially used as well, but through various configurations some ended up very collinear (R&D, education) with very high VIF scores (approx. 15) and a correlation with other predictors above 0.7 or ended up being insignificant, inflating errors. ### Spatial Weights Matrix A queen contiguity spatial weights matrix (W) was constructed, where regions sharing borders or corners are considered neighbors. The matrix is row-standardized for interpretability.

  coords <- st_coordinates(st_centroid(nuts2))
  nb <- poly2nb(nuts2, queen = TRUE)
  lw <- nb2listw(nb, style = "W", zero.policy = TRUE)

#Spatial neighbors summary
print(summary(nb))

## Neighbour list object:
## Number of regions: 17 
## Number of nonzero links: 70 
## Percentage nonzero weights: 24.22145 
## Average number of links: 4.117647 
## Link number distribution:
## 
## 1 3 4 5 6 7 
## 1 6 5 1 2 2 
## 1 least connected region:
## 15 with 1 link
## 2 most connected regions:
## 2 13 with 7 links

  plot(st_geometry(nuts2), border = "grey")
  plot(nb, coords, add = TRUE, col = "red")
  title("Queen Contiguity Spatial Weights Matrix")

The matrix has 70 neighbour links, meaning about 24% of all possible region pairs are their neighbours. so on average a region has 4 neighbours but this varies across regions, for example region 15 is the least connected with only 1 neighbor (so quite isolated), while the regions 2 and 13 are the most connected with 7 neighbours as they sit in the middle of clusters. The rest of the regions typically have 3–4 neighbours.

Model Specifications

To analyze regional economic performance in Poland, we employ three complementary econometric models that allow us to capture both local determinants and spatial interdependencies:

Ordinary Least Squares (OLS) - provides a simple baseline, estimating the direct relationships between GDP per capita, population density, and employment rate without considering spatial effects. While straightforward, OLS may produce biased or inefficient estimates if neighboring regions influence each other.

Spatial Autoregressive (SAR) - Fixed Effects Model accounts for the possibility that a region’s GDP per Capita is affected by the economic outcomes of its neighbors. The spatial lag term (ρ × W × gdp_pc) captures a spillove” effect, while individual fixed effects control for unobserved time-invariant regional characteristics.

Spatial Error Model (SEM) - Fixed Effects assumes that unobserved factors influencing GDP per Capita are spatially correlated. The spatial error term (λ × W × uᵢₜ) captures these dependencies, while fixed effects control for regional heterogeneity.

1. Ordinary Least Squares (Baseline)

gdp_pc = β₀ + β₁ × pop_density + β₂ × unemp_rate + ε

2. Spatial Autoregressive (SAR) Fixed Effects Model

gdp_pc = ρ × W × gdp_pc + β₁ × pop_density + β₂ × unemp_rate + αᵢ + uᵢₜ

3. Spatial Error Model (SEM) Fixed Effects

gdp_pc = β₁ × pop_density + β₂ × unemp_rate + αᵢ + uᵢₜ
uᵢₜ = λ × W × uᵢₜ + εᵢₜ

Where:

ρ = spatial autoregressive parameter
λ = spatial error parameter
αᵢ = individual fixed effects
W = spatial weights matrix

Results and Analysis

1. Ordinary Least Squares (OLS) Baseline Results

To test our research question, we started with an OLS model. Its role was to mostly show OLS’s inability to account for spatial dependencies and cross-regional spillovers, which are likely present in regional economic data.

Estimate OLS model

#estimate OLS model
ols <- lm(gdp_pc ~ pop_density + unemp_rate, data = panel)

###OLS BASELINE MODEL
print(summary(ols))

## 
## Call:
## lm(formula = gdp_pc ~ pop_density + unemp_rate, data = panel)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.0738  -5.9790  -0.8424   5.6945  20.9142 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -90.723876  11.580447  -7.834 7.99e-13 ***
## pop_density   0.133032   0.007197  18.485  < 2e-16 ***
## unemp_rate    2.103752   0.180220  11.673  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.264 on 150 degrees of freedom
## Multiple R-squared:  0.8448, Adjusted R-squared:  0.8427 
## F-statistic: 408.3 on 2 and 150 DF,  p-value: < 2.2e-16

#Variance Inflation Factors
print(car::vif(ols))

## pop_density  unemp_rate 
##    1.229121    1.229121

#Correlation Matrix
print(cor(model.matrix(ols)[, -1]))

##             pop_density unemp_rate
## pop_density   1.0000000  0.4317525
## unemp_rate    0.4317525  1.0000000

OLS estimation results:

Model Fit: Excellent explanatory power with R² = 0.845, indicating that 84.5% of variation in regional GDP per capita is explained by population density and unemployment rate
Population Density: Strong positive effect (β = 0.133, p < 0.001), suggesting that each additional inhabitant per km² increases GDP per capita by 0.133 PPS units
Unemployment Rate: positive coefficient (β = 2.104, p < 0.001), indicating higher employment regions show higher GDP per capita in the pooled OLS

Diagnostic Results:

Multicollinearity: VIF values of 1.23 indicate no concerning multicollinearity
Correlation: Moderate correlation (r = 0.43) between population density and unemployment rate
Model Limitations: The OLS results likely suffer from omitted variable bias and ignore spatial dependencies

While OLS provides a simple baseline and highlights the relationships between GDP per capita, population density, and unemployment rate, it ignores the influence of neighboring regions. As a result, the OLS estimates may suffer from omitted variable bias and can produce misleading coefficients, particularly when regional outcomes are spatially correlated.

Building on this baseline, we then employed spatial panel models (SAR and SEM) with fixed effects to properly capture both the temporal dynamics and spatial interdependencies in Polish regions, allowing for a more accurate analysis of the Warsaw proximity effect.

2. Spatial Autoregressive (SAR) Fixed Effects Results

The following SAR and SEM models were estimated using ‘spml’ package. Sadly, due to problems with panel data and the package, Spatial Durbin Model was not estimated.

# SAR Fixed Effects Model
lag_panel <- spml(
  gdp_pc ~ pop_density + unemp_rate,
  data = panel,
  listw = lw,
  index = c("NUTS_ID", "time"),
  model = "within",
  effect = "individual",
  spatial.error = "none",
  lag = TRUE,
  zero.policy = TRUE,
  reweight = FALSE
)

#SAR Fixed Effects Results
print(summary(lag_panel))

## Spatial panel fixed effects lag model
##  
## 
## Call:
## spml(formula = gdp_pc ~ pop_density + unemp_rate, data = panel, 
##     index = c("NUTS_ID", "time"), listw = lw, model = "within", 
##     effect = "individual", lag = TRUE, spatial.error = "none", 
##     zero.policy = TRUE, reweight = FALSE)
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -3.564794 -1.046776 -0.077414  0.684769  6.917427 
## 
## Spatial autoregressive coefficient:
##        Estimate Std. Error t-value  Pr(>|t|)    
## lambda 0.642725   0.057556  11.167 < 2.2e-16 ***
## 
## Coefficients:
##              Estimate Std. Error t-value  Pr(>|t|)    
## pop_density -0.066476   0.030451 -2.1831   0.02903 *  
## unemp_rate   0.341894   0.067167  5.0902 3.576e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# SEM Fixed Effects Model
error_panel <- spml(
  gdp_pc ~ pop_density + unemp_rate,
  data = panel,
  listw = lw,
  index = c("NUTS_ID", "time"),
  model = "within",
  effect = "individual",
  spatial.error = "b",
  lag = FALSE,
  zero.policy = TRUE,
  reweight = FALSE
)

# SEM Fixed Effects Results
print(summary(error_panel))

## Spatial panel fixed effects error model
##  
## 
## Call:
## spml(formula = gdp_pc ~ pop_density + unemp_rate, data = panel, 
##     index = c("NUTS_ID", "time"), listw = lw, model = "within", 
##     effect = "individual", lag = FALSE, spatial.error = "b", 
##     zero.policy = TRUE, reweight = FALSE)
## 
## Residuals:
##     Min.  1st Qu.   Median  3rd Qu.     Max. 
## -5.13609 -1.91389 -0.50342  1.48022  6.66659 
## 
## Spatial error parameter:
##     Estimate Std. Error t-value  Pr(>|t|)    
## rho 0.721319   0.053032  13.601 < 2.2e-16 ***
## 
## Coefficients:
##              Estimate Std. Error t-value  Pr(>|t|)    
## pop_density -0.077197   0.029486 -2.6181  0.008842 ** 
## unemp_rate   0.616581   0.102442  6.0188 1.757e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

SAR Fixed Effects estimation results:

Spatial Autoregressive Parameter (λ = 0.643):

Highly significant (p < 0.001) which confirms strong spatial spillovers
The result means that a 1% increase in neighbouring regions GDP per Capita leads to a 0.64% increase in own region’s GDP, which validates the spatial nature of economic development in Poland

Population Density Effect:

Coefficient becomes negative (-0.066) after controlling for fixed effects and spatial spillovers
The result means that Within regions over time, higher population density is associated with lower GDP per Capita

Employment Rate Effect:

Positive coefficient (0.342): statistically significant
Higher employment reflects stronger local economies, which supports higher GDP per capita.

SEM Fixed Effects estimation results:

Spatial Error Parameter (ρ = 0.721):

Very high significance (p < 0.001), indicating substantial spatial correlation in unobserved factors
72% of spatial correlation in unobserved determinants of GDP per capita is captured
Suggests strong spatial clustering of unmeasured economic factors

After estimating both SAR and SEM fixed effects models, we focus on the SAR model for impact analysis because it explicitly models the spatial spillovers in GDP, which is central to studying the Warsaw proximity effect.

We choose the SAR model over the SEM model because the SAR incorporates spatial lag of the dependent variable (GDP per capita), which allows us to directly quantify how economic outcomes in one region influence the neighbouring regions

4. Model Selection: Hausman Test Results

Additionally, a Hausman Test was conducted to see if in our current specification, fixed or random effects should be used. What’s worth noting, including distance_to_warsaw variable in fixed effects resulted in an error - we can’t use time invariant variables in fixed effects, only random effects.

# Random Effects models (for Hausman test comparison)

lag_re <- spml(
  gdp_pc ~ dist_to_warsaw_km + pop_density + unemp_rate,
  data = panel,
  listw = lw,
  index = c("NUTS_ID", "time"),
  model = "random",
  spatial.error = "none",
  lag = TRUE,
  zero.policy = TRUE,
  reweight = FALSE
)

err_re <- spml(
  gdp_pc ~ dist_to_warsaw_km + pop_density + unemp_rate,
  data = panel,
  listw = lw,
  index = c("NUTS_ID", "time"),
  model = "random",
  spatial.error = "b",
  lag = FALSE,
  zero.policy = TRUE,
  reweight = FALSE
)

# Hausman Tests

#SAR Fixed vs Random Effects 
print(sphtest(lag_panel, lag_re))

## 
##  Hausman test for spatial models
## 
## data:  gdp_pc ~ pop_density + unemp_rate
## chisq = 42.52, df = 2, p-value = 5.846e-10
## alternative hypothesis: one model is inconsistent

#SEM Fixed vs Random Effects 
print(sphtest(error_panel, err_re))

## 
##  Hausman test for spatial models
## 
## data:  gdp_pc ~ pop_density + unemp_rate
## chisq = 54.086, df = 2, p-value = 1.801e-12
## alternative hypothesis: one model is inconsistent

Hausman test interpretation:

Both tests decisively reject random effects in favor of fixed effects.Unobserved regional heterogeneity is correlated with the explanatory variables

-Fixed effects method is advised

5. Residual Spatial Autocorrelation Tests

After estimating spatial models, it is crucial to check whether residuals still exhibit spatial dependence. Ignoring residual spatial autocorrelation can lead to biased or inefficient estimates.

We use Moran’s I statistic, a standard measure of spatial autocorrelation, to test whether the residuals from our SAR and SEM models are randomly distributed across space.

#calculate residuals for SAR and SEM models
resid_lag <- residuals(lag_panel)
resid_err <- residuals(error_panel)

#average residuals by region for spatial testing
resid_lag_avg <- panel %>%
  mutate(resid = resid_lag) %>%
  group_by(NUTS_ID) %>%
  summarise(resid_mean = mean(resid, na.rm = TRUE)) %>%
  pull(resid_mean)

resid_err_avg <- panel %>%
  mutate(resid = resid_err) %>%
  group_by(NUTS_ID) %>%
  summarise(resid_mean = mean(resid, na.rm = TRUE)) %>%
  pull(resid_mean)

# Moran's I tests
moran_lag <- moran.test(resid_lag_avg, lw, zero.policy = TRUE)
moran_err <- moran.test(resid_err_avg, lw, zero.policy = TRUE)

print(moran_lag)

## 
##  Moran I test under randomisation
## 
## data:  resid_lag_avg  
## weights: lw    
## 
## Moran I statistic standard deviate = -0.10376, p-value = 0.5413
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       -0.07883061       -0.06250000        0.02477126

print(moran_err)

## 
##  Moran I test under randomisation
## 
## data:  resid_err_avg  
## weights: lw    
## 
## Moran I statistic standard deviate = -0.095996, p-value = 0.5382
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       -0.07764164       -0.06250000        0.02487948

Model Validation:

Both SAR and SEM residuals are not significantly spatially autocorrelated (p > 0.5).
This indicates that both models successfully capture the spatial dependencies in GDP per capita.
The lack of residual spatial clustering confirms that the model specification is adequate and that spatial spillovers have been properly accounted for.

Spatial Visualization

Regional GDP per Capita Map (2022)

#we use a single, latest 2022 year
sf2022 <- nuts2 %>%
  left_join(
    panel %>% filter(time == 2022) %>% select(NUTS_ID, gdp_pc),
    by = "NUTS_ID"
  )
tmap_mode("plot")

# Create choropleth map
gdp_map <- tm_shape(sf2022) +
  tm_fill(
    "gdp_pc",
    palette = "plasma",
    style = "quantile",
    n = 5,
    title = "GDP per capita\n(2022, PPS EU27=100)"
  ) +
  tm_borders(col = "white", lwd = 0.5) +
  tm_layout(
    main.title = "Regional GDP per Capita in Poland (2022)",
    main.title.size = 1.2,
    legend.outside = TRUE,
    frame = FALSE
  )

print(gdp_map)

The map clearly shows the economic dominance of the Warsaw region (Mazowieckie) and relatively higher performance in western and northern regions compared to eastern areas.

6. Impact Analysis: Direct, Indirect, and Total Effects

Spatial panel models allow us to to quantify how these effects of regional characteristics on GDP per Capita spill over to neighboring regions. Direct effects measure changes within a region, indirect effects capture spillovers to adjacent regions via the spatial weights matrix, and total effects combine both to reflect the overall influence on regional economic performance.

attr(lag_panel, "have_factor_preds") <- FALSE #allows us to skip the limitations of impacts for spatial panel in from 'spml'

W_sp <- as(lw, "CsparseMatrix")
trMatc <- trW(W_sp, type = "mult")

#calculate impacts
imp <- impacts(lag_panel, tr = trMatc, R = 200)


#impact results
print(summary(imp, zstats = TRUE, short = TRUE))

## Impact measures (lag, trace):
##                      Direct   Indirect      Total
## pop_density dy/dx -0.067656 -0.1184091 -0.1860651
## unemp_rate dy/dx   0.347960  0.6089872  0.9569472
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##                       Direct   Indirect     Total
## pop_density dy/dx 0.03306116 0.07369394 0.1041119
## unemp_rate dy/dx  0.07078948 0.23734072 0.2907846
## 
## Simulated z-values:
##                      Direct  Indirect     Total
## pop_density dy/dx -1.942772 -1.669775 -1.798859
## unemp_rate dy/dx   5.013710  2.837852  3.536830
## 
## Simulated p-values:
##                   Direct     Indirect  Total     
## pop_density dy/dx 0.052044   0.0949639 0.07204098
## unemp_rate dy/dx  5.3391e-07 0.0045418 0.00040496

Comprehensive Effect Decomposition:

Population Density Effects:

-Direct Effect (-0.077): Measures the within-region impact of population density on GDP per capita. Denser regions, after controlling for fixed effects and spatial spillovers, tend to have slightly lower GDP per capita.

-Indirect Effect (-0.109): Captures the influence of neighboring regions’ population density through spatial spillovers. High density in surrounding regions slightly reduces GDP in a given region.

-Total Effect (-0.186): Sum of direct and indirect effects, representing the overall influence of population density including both local and neighboring regions.

A 100 inhabitants/km² increase reduces GDP per capita by 7.7 PPS locally and 18.6 PPS overall when considering spillover effects.

Employment Rate Effects: - Direct Effect (0.396): Shows the positive effect of within-region employment on GDP per Capita. Higher employment reflects stronger local economic activity.

= Indirect Effect (0.560): Indicates that higher employment in neighboring regions boosts GDP in the region of interest, demonstrating positive spatial spillovers.

-Total Effect (0.957): The sum of direct and indirect effects, almost 1:1 impact, which shows substantial amplification from regional interactions.

Spatial Multiplier: Total effect is 2.4 times larger than the direct effect (0.957 / 0.396), showing that employment growth in one region not only improves local GDP but also positively propagates to neighbouring regions.

Economic Interpretation: The decomposition confirms that employment is a key driver of regional economic performance, and its impact is spatially amplified.

Policies that increase local employment can have both direct benefits and strong positive spillovers to adjacent regions, emphasizing the need for coordinated regional labor policies.

Conversely, population density has a nuanced, sometimes negative effect, likely reflecting congestion or resource competition after controlling for regional fixed effects.

Discussion and Policy Implications

The spatial panel analysis demonstrates that regional economic performance in Poland is strongly shaped by both local factors and the outcomes of neighboring regions. The SAR fixed effects model highlights that GDP per capita in a region is not only determined by its own employment rate and population density, but also by the economic conditions of adjacent regions. This confirms the presence of spatial spillovers, indicating that growth and development are interconnected across regions.

The SAR spatial autoregressive parameter (ρ = 0.643, p < 0.001) indicates that a 1% increase in GDP per capita in neighboring regions leads to a 0.64% increase locally, proving that policy interventions in one region can have meaningful effects on neighboring areas.
Direct and indirect effects of employment are positive and significant (total effect = 0.957), showing that higher employment not only benefits the local economy but also stimulates growth in surrounding regions.

-Population density exhibits slightly negative direct (-0.077) and total (-0.186) effects on GDP per capita, which might bea result of a potential congestion or resource constraints.

-SEM results (ρ = 0.721) reveal that unobserved factors affecting GDP per capita—like infrastructure quality, local institutions, or regional policies—tend to cluster spatially. Even after accounting for employment and density, regional disparities persist, highlighting the importance of coordinated economic development strategies.

Model Limitations and Future Research

Conclusions and Model Limitations

This spatial panel analysis provides a strong evidence of significant spatial interdependencies in Polish regional economic development. Our hypothesis that regions closer to Warsaw exhibit higher GDP per capita due to proximity sadly could not be directly tested in the fixed effects framework because the distance to Warsaw is time-invariant. Including it would require a random effects specification, which the Hausman test rejected.

Despite this limitation, the SAR and SEM fixed effects models successfully capture the spatial dynamics of regional economies. The SAR model demonstrates that economic performance in one region significantly influences neighboring regions, with total employment effects amplified by positive spatial spillovers. Similarly, the SEM model reveals that unobserved determinants of GDP per capita cluster spatially.

Additional limitations include:

Lack of a Spatial Durbin Model: The Durbin specification, which would allow testing for indirect effects of time-varying covariates like population density and employment in neighboring regions, could not be estimated due to software and panel size constraints.
Small panel size: Only 17 NUTS-2 regions over 9 years, which may reduce estimation precision.
Potential omitted variables: Certain variables, such as R&D, education, or infrastructure quality, were excluded due to multicollinearity, high Variance inflation factor or insignificance, which could bias estimates.

Nonetheless, the models are robust for examining spatial spillovers and regional clustering, and the residuals do not exhibit significant spatial autocorrelation, confirming that the main spatial dependencies have been captured.

Data Sources and References

Eurostat Regional Statistics Database: GDP per capita (tgs00006), Population density (demo_r_d3dens), Unemployment rate (tgs00007)
R Packages: eurostat, sf, spdep, splm, tmap
Geographic Data: NUTS-2 boundaries from Eurostat/giscoR

Spatial Panel Analysis of Regional Economic Development in Poland: The Warsaw Proximity Effect

Piotr Bugajski

2025-09-12

Spatial Panel Analysis of Regional Economic Development in Poland: The Warsaw Proximity Effect

Introduction and Research Question

Required Libraries

Data Sources

Panel Data Structure

Model Specifications

1. Ordinary Least Squares (Baseline)

2. Spatial Autoregressive (SAR) Fixed Effects Model

3. Spatial Error Model (SEM) Fixed Effects

Results and Analysis

1. Ordinary Least Squares (OLS) Baseline Results

Estimate OLS model

2. Spatial Autoregressive (SAR) Fixed Effects Results

4. Model Selection: Hausman Test Results

5. Residual Spatial Autocorrelation Tests

Spatial Visualization

Regional GDP per Capita Map (2022)

6. Impact Analysis: Direct, Indirect, and Total Effects

Discussion and Policy Implications

Model Limitations and Future Research

Conclusions and Model Limitations

Data Sources and References