Introduction

This document serves as an appendix to the article titled “The impact of public transport accessibility on the labor market in Poland – an econometric analysis”. It contains the R code, methodology, and visualizations used to analyze the spatial relationship between public transport accessibility and the registered unemployment rate across Polish municipalities (gminas).

The core of the analysis utilizes Geographically Weighted Regression (GWR) to capture the spatial heterogeneity of this relationship. The code is divided into three main sections:

Data Preparation & Accessibility Index: Construction of the transport accessibility indicator based on various variables.
Exploratory Spatial Data Analysis: Visualizations of the indicator and unemployment rates.
GWR Modeling: Implementation and results of the local regression models.

1. Data Preparation & Accessibility Index

In this section, spatial boundaries (shapefiles) and the data are imported. To quantify public transport accessibility, a composite index is constructed at the gmina level.

Creating the Accesibility Index

The index consists of five key components, which are added together with appropriate weight in order to create the composite accesibility index:

Bus stops density (per sq km at the gmina level).
Transport expenditure (per capita at the gmina level).
Transport lines density (per sq km at the voivodeship level).
Rail stops density (per sq km at the poviat level).
Transport companies (per 100k people at the voivodeship level).

The index relies on the percent_rank transformation to normalize variables with different scales. First all the variables were individually normalized to 0-1 scale and later added together with appropriate weight in order to create the composite accesibility index, which was later also normalized to 0-100 percentile scale for the purpose of easier interpretation.

# Importing the data
dane.gmina <- read_excel("gminy.xlsx")

# Calculating the variables for the accessibility index and normalizing them using percent_rank to account for varying scales and distributions
dane.gmina <- dane.gmina %>%
  mutate(
    # 1. Bus stops per sq km (gmina level)
    indx_bus_stops = percent_rank(bus_stops / area_gmina),
    
    # 2. Transport expenditure per capita (gmina level)
    indx_trans_expen = percent_rank(trans_expen / population),
    
    # 3. Transport lines per sq km (voivodeship level)
    indx_trans_lines = percent_rank(trans_lines / area_wojew),
    
    # 4. Rail stops per sq km (powiat level)
    indx_rail_stops = percent_rank(rail_stops / area_powiat),
    
    # 5. Transport companies per 100k people (voivodeship level)
    indx_trans_comp = percent_rank(trans_comp)
  )

# Calculating the final composite accessibility index for each gmina
# Weights reflect the relative importance of each variable
dane.gmina <- dane.gmina %>%
  mutate(
    access = percent_rank(0.5 * indx_bus_stops + 0.3 * indx_rail_stops +
              0.3 * indx_trans_expen + 0.15 * indx_trans_lines + 0.15 * indx_trans_comp) * 100,
  )

# Calculating the average accessibility score for each poviat (county level)
dane.gmina <- dane.gmina %>%
  group_by(powiat) %>%
  mutate(access_pow = mean(access, na.rm = TRUE)) %>%
  ungroup()

# Displaying a snapshot of the calculated data
head(dane.gmina) %>% 
  kable(caption = "Sample of the data (all variables)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
  scroll_box(width = "100%", height = "300px")

Sample of the data (all variables)
code_match	gmina	powiat	wojewodztwo	unemployment	income	investment	urban	urban_rural	rural	edu_high	bus_stops	trans_expen	population	area_gmina	area_powiat	area_wojew	trans_comp	trans_lines	rail_stops	indx_bus_stops	indx_trans_expen	indx_trans_lines	indx_rail_stops	indx_trans_comp	access	access_pow
201011	Bolesławiec (1)	bolesławiecki	dolnośląskie	1.8	7278.72	3481.2	1	0	0	0.195	162	17280184	37055	23	1303	19947	1.528	783	8	0.9951535	0.5214055	0.7475767	0.3873183	0.4030695	86.71244	33.06408
201022	Bolesławiec (2)	bolesławiecki	dolnośląskie	1.5	5907.56	3481.2	0	0	1	0.149	98	4813471	15267	289	1303	19947	1.528	783	8	0.3937803	0.2265751	0.7475767	0.3873183	0.4030695	28.67528	33.06408
201032	Gromadka (2)	bolesławiecki	dolnośląskie	1.5	8665.13	3481.2	0	0	1	0.108	23	3750357	5003	268	1303	19947	1.528	783	8	0.0306947	0.8037157	0.7475767	0.3873183	0.4030695	27.34249	33.06408
201043	Nowogrodziec (3)	bolesławiecki	dolnośląskie	1.7	7061.37	3481.2	0	1	0	0.101	78	2439519	14622	176	1303	19947	1.528	783	8	0.5016155	0.0222132	0.7475767	0.3873183	0.4030695	27.54443	33.06408
201052	Osiecznica (2)	bolesławiecki	dolnośląskie	2.0	7251.63	3481.2	0	0	1	0.120	34	3078527	7284	437	1303	19947	1.528	783	8	0.0230210	0.4402262	0.7475767	0.3873183	0.4030695	11.34895	33.06408
201062	Warta Bolesławiecka (2)	bolesławiecki	dolnośląskie	2.0	6565.37	3481.2	0	0	1	0.119	29	2469107	8411	110	1303	19947	1.528	783	8	0.2726171	0.1857835	0.7475767	0.3873183	0.4030695	16.76090	33.06408

Merging Spatial and Attribute Data

To perform Geographically Weighted Regression (GWR), polygon data (gmina borders) must be converted into point geometries (centroids). These points allow the GWR algorithm to calculate distance matrices between observations.

# Unifying the ID column format to enable merging
gmina <- gmina %>% mutate(code_match = as.numeric(JPT_KOD_JE),
                            pov_ID = 1:nrow(gmina))

# Merging spatial borders (gmina) with attribute data (dane.gmina)
gmina_data <- merge(gmina, dane.gmina, by = "code_match", all.y = TRUE)

# Converting the sf object to a Spatial object (required by GWmodel package)
gmina_data_sp <- as_Spatial(gmina_data)

# Extracting centroids (geometric centers) of each gmina
gmina_data_point <- gmina_data
gmina_data_point$point_geometry <- st_centroid(st_make_valid(st_geometry(gmina_data_point)))

# Setting the active geometry to points instead of polygons
st_geometry(gmina_data_point) <- "point_geometry" 

# Final conversion to SpatialPointsDataFrame for GWR modeling
gmina_data_sp_point <- as_Spatial(gmina_data_point)

2. Exploratory Spatial Data Analysis

To visualize the spatial heterogeneity of public transport accessibility and the labor market situation, a series of maps were generated.

Gminas and Poviats Accesibility

The following maps present the calculated transport accessibility index on a continuous percentile based scale (0 for worst accesibility and 100 for best).

The left map visualizes the average index values aggregated at the poviat level, highlighting broader regional disparities. The right map shows the original values at the lowest administrative level (gminas), revealing local spatial variations.

plot_poviats <- ggplot() +
  geom_sf(gmina_data, mapping = aes(geometry = geometry, fill = access_pow), inherit.aes = FALSE, color = NA) +
  scale_fill_viridis_c(option = "plasma", n.breaks = 6) +  
  labs(fill = "Transport accesibility\nin poviats") +
  geom_sf(data = poviat, color = "black", fill = NA, linewidth = 0.4) +
  geom_sf(data = wojewodztwa, color = "black", fill = NA, linewidth = 0.6) +
  plot_theme + legend_style_cont 

plot_gminas <- ggplot() +
  geom_sf(gmina_data, mapping = aes(geometry = geometry, fill = access), inherit.aes = FALSE) +
  scale_fill_viridis_c(option = "plasma", n.breaks = 6) +  
  labs(fill = "Transport accesibility\nin gminas") +
  plot_theme + legend_style_cont

plots_access <- plot_poviats | plot_gminas

plots_access

Best and Worst connected Gminas

To better understand the extremes, the continuous accessibility index was transformed into binary categorical variables. We identified the top and bottom percentiles to pinpoint the municipalities with the absolute highest and lowest access to public transport.

10% Threshold Analysis

p_top10 <- ggplot() +
  geom_sf(data = gmina_data, aes(fill = as.factor(top10)), color = "gray70", size = 0.1) +
  geom_sf(data = wojewodztwa, color = "black", fill = NA, linewidth = 0.6) +
  scale_fill_manual(values = c("0" = "lightgray", "1" = "red"), 
                    name = "10% best connected",
                    labels = c("below 90th perc.", "above 90th perc.")) +
  plot_theme + legend_style 

p_bot10 <- ggplot() +
  geom_sf(data = gmina_data, aes(fill = as.factor(dolne10)), color = "gray70", size = 0.1) +
  geom_sf(data = wojewodztwa, color = "black", fill = NA, linewidth = 0.6) +
  scale_fill_manual(values = c("0" = "lightgray", "1" = "blue"), 
                    name = "10% worst connected",
                    labels = c("above 10th perc.", "below 10th perc.")) +
  plot_theme + legend_style

plots_10 <- p_top10 | p_bot10
plots_10

25% Threshold Analysis

Gminas with extreme unemployment rates

Similarly to the accessibility index, we map the municipalities dealing with the highest and lowest unemployment rates. By comparing these spatial patterns with the accessibility maps, preliminary hypotheses regarding their relationship can be formulated before running the GWR model.

3. Geographically Weighted Regression (GWR)

In this section, we estimate the Geographically Weighted Regression (GWR) models to investigate spatial non-stationarity in the relationship between public transport accessibility and the unemployment rate.

Model 1: Individual Gmina’s Accessibility

The first model analyzes the impact of the accessibility index calculated at the gmina level. The formula controls for income, municipal characteristics (urban/rural type), local investments, and the higher education rate. We use an adaptive bandwidth with a bisquare kernel function to account for the varying density of administrative units.

# Setting the seed for reproducibility of results
set.seed(123)

# Converting categorical variables to factors
dane.gmina$rural <- factor(dane.gmina$rural)
dane.gmina$urban_rural <- factor(dane.gmina$urban_rural)
dane.gmina$urban <- factor(dane.gmina$urban)


# Model 1 specification
formula <- unemployment ~ access + income + urban_rural + rural + investment + log(edu_high)

# Finding optimal adaptive bandwidth
bw.poly.adapt <- bw.gwr(formula, data = gmina_data_sp, adaptive = TRUE)

# Estimating the GWR model
gwr.model.adapt <- gwr.basic(formula, data = gmina_data_sp,
                             bw = bw.poly.adapt, kernel = "bisquare", 
                             adaptive = TRUE)

# Displaying the basic output (Global results and GWR summary)
gwr.model.adapt

##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2026-03-25 15:30:53.70849 
##    Call:
##    gwr.basic(formula = formula, data = gmina_data_sp, bw = bw.poly.adapt, 
##     kernel = "bisquare", adaptive = TRUE)
## 
##    Dependent (y) variable:  unemployment
##    Independent variables:  access income urban_rural rural investment edu_high
##    Number of data points: 2477
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0248 -1.4997 -0.3551  1.1828 12.6319 
## 
##    Coefficients:
##                     Estimate  Std. Error t value             Pr(>|t|)    
##    (Intercept)    3.29996577  0.41251590   8.000   0.0000000000000019 ***
##    access        -0.01708787  0.00176031  -9.707 < 0.0000000000000002 ***
##    income         0.00011914  0.00002345   5.081   0.0000004033455282 ***
##    urban_rural   -0.69774935  0.16671339  -4.185   0.0000294719965215 ***
##    rural         -1.03074998  0.16034241  -6.428   0.0000000001542249 ***
##    investment    -0.00024003  0.00001982 -12.111 < 0.0000000000000002 ***
##    log(edu_high) -1.35311561  0.18827205  -7.187   0.0000000000008732 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 2.161 on 2470 degrees of freedom
##    Multiple R-squared: 0.1702
##    Adjusted R-squared: 0.1682 
##    F-statistic: 84.45 on 6 and 2470 DF,  p-value: < 0.00000000000000022 
##    ***Extra Diagnostic information
##    Residual sum of squares: 11536.73
##    Sigma(hat): 2.159006
##    AIC:  10856.26
##    AICc:  10856.31
##    BIC:  8488.293
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: bisquare 
##    Adaptive bandwidth: 78 (number of nearest neighbours)
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                            Min.        1st Qu.         Median        3rd Qu.
##    Intercept     -13.1136589093   0.6963776296   2.6621670551   5.0732127824
##    access         -0.0673682515  -0.0157506366  -0.0066535010   0.0036619286
##    income         -0.0008084058  -0.0000841011   0.0000211585   0.0001194731
##    urban_rural    -7.4813988676  -1.3462162985  -0.5842736180   0.1012914585
##    rural          -7.3460975245  -1.9181949747  -1.1249682998  -0.4566211140
##    investment     -0.0026808654  -0.0005507011  -0.0001862082   0.0000066661
##    log(edu_high)  -8.1821903470  -2.7196588827  -1.5015866845  -0.3950199250
##                     Max.
##    Intercept     15.3134
##    access         0.0547
##    income         0.0009
##    urban_rural    2.9855
##    rural          4.1520
##    investment     0.0020
##    log(edu_high)  4.9910
##    ************************Diagnostic information*************************
##    Number of data points: 2477 
##    Effective number of parameters (2trace(S) - trace(S'S)): 687.4203 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 1789.58 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 9394.778 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 8582.755 
##    BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 9685.71 
##    Residual sum of squares: 3751.184 
##    R-square value:  0.7301964 
##    Adjusted R-square value:  0.6265004 
## 
##    ***********************************************************************
##    Program stops at: 2026-03-25 15:30:55.54248

Maps of Model 1 Results

To fully interpret the GWR results, the local parameters must be mapped. The following figures present the statistical significance of the accessibility coefficient, the Local \(R^2\) values, and the spatial distribution of the accessibility parameter itself.

Statistical Significance of Accessibility

This map shows where the transport accessibility indicator is statistically significant at the 95% confidence level (\(|t| > 1.96\)).

# Mapping local t-values for the accessibility variable
ggplot()+
  geom_sf(
    data = gmina_data, 
    mapping = aes(geometry = geometry, 
      fill = ifelse(abs(gwr.model.adapt[["SDF"]]@data[["access_TV"]]) > 1.96, "|t| > 1.96", "|t| < 1.96")) 
    , inherit.aes = FALSE, linewidth = 0.1)+
  geom_sf(data = wojewodztwa, color = "black", fill = NA, linewidth = 0.3) +
  scale_fill_manual(values = c("|t| > 1.96" = "#08306b", "|t| < 1.96" = "#f7fbff"),
                    name = "Significance level") + 
  plot_theme_reg + legend_style

Local R Squared

The Local \(R^2\) map illustrates the goodness-of-fit of the model across different regions. Higher values indicate areas where the model explains a larger portion of the variance in the unemployment rate.

# Mapping Local R Squared
ggplot() +
  geom_sf(
    data = gmina_data, 
    aes(geometry = geometry, 
        fill = cut(gwr.model.adapt[["SDF"]]@data[["Local_R2"]],
                   breaks = c(-Inf, 0.6, 0.69, Inf), 
                   labels = c("< 0.6", "0.6 - 0.69", "> 0.69"))
    ), 
    inherit.aes = FALSE, linewidth = 0.1) +
  geom_sf(data = poviat, color = "black", fill = NA, linewidth = 0.25) +
  scale_fill_manual(values = c("< 0.6" = "#f7fbff", "0.6 - 0.69" = "#6baed6", "> 0.69" = "#08306b"),
                    name = "Local R-Squared") +
  plot_theme_reg + legend_style

Local Accessibility Coefficients

Finally, this map displays the actual values of the local coefficients for the transport accessibility indicator. Negative values (darker colors) indicate areas where higher accessibility is associated with a decrease in the unemployment rate.

# Extracting the local regression coefficients for the accessibility variable
gmina_data$gwr.adapt.access <- gwr.model.adapt$SDF$access


# Mapping the regression coefficients
ggplot() +
  geom_sf(
    data = gmina_data, 
    mapping = aes(
      geometry = geometry, 
      fill = cut(gwr.adapt.access,
                 breaks = c(-0.08, -0.016, -0.007, 0.003, 0.06),
                 labels = c("-0.08 - -0.016", "-0.016 - -0.007", "-0.007 - 0.003", "0.003 - 0.06")) 
    ), 
    inherit.aes = FALSE, linewidth = 0.1) +
  geom_sf(data = poviat, color = "black", fill = NA, linewidth = 0.25) +
  scale_fill_manual(values = c("-0.08 - -0.016" = "#08306b", "-0.016 - -0.007" = "#436EEE", "-0.007 - 0.003" = "#BFEFFF", "0.003 - 0.06" = "#f7fbff"), name = "Coefficient value") +
  plot_theme_reg + legend_style

Model 2: Poviat’s Average Accessibility

The second model shifts the analytical focus from the individual municipality to the broader regional context. By utilizing the average accessibility index calculated at the poviat level (access_pow), we aim to determine whether regional transport infrastructure networks have a significant spillover effect on local unemployment rates within the constituent gminas.

# Setting the seed for reproducibility of results
set.seed(123)

# Model 2 specification - using poviat-level accessibility (access_pow)
formulapow <- unemployment ~ access_pow + income + urban_rural + rural + investment + log(edu_high)

# Finding optimal adaptive bandwidth
bw.poly.adaptpow <- bw.gwr(formulapow, data = gmina_data_sp, adaptive = TRUE)

# Estimating the Model
gwr.model.adapt.pow <- gwr.basic(formulapow, data = gmina_data_sp,
                             bw = bw.poly.adaptpow, kernel = "bisquare", 
                             adaptive = TRUE)
# Displaying the summary of Model 2
gwr.model.adapt.pow

##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2026-03-25 15:32:25.487889 
##    Call:
##    gwr.basic(formula = formulapow, data = gmina_data_sp, bw = bw.poly.adaptpow, 
##     kernel = "bisquare", adaptive = TRUE)
## 
##    Dependent (y) variable:  unemployment
##    Independent variables:  access_pow income urban_rural rural investment edu_high
##    Number of data points: 2477
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0993 -1.4785 -0.3321  1.1633 12.9537 
## 
##    Coefficients:
##                     Estimate  Std. Error t value             Pr(>|t|)    
##    (Intercept)    3.26494936  0.39911921   8.180 0.000000000000000448 ***
##    access_pow    -0.02311402  0.00217983 -10.604 < 0.0000000000000002 ***
##    income         0.00010921  0.00002337   4.674 0.000003114444719130 ***
##    urban_rural   -0.59761883  0.16424329  -3.639              0.00028 ***
##    rural         -0.92611134  0.15851125  -5.843 0.000000005817051206 ***
##    investment    -0.00022248  0.00001998 -11.134 < 0.0000000000000002 ***
##    log(edu_high) -1.49049622  0.18186931  -8.195 0.000000000000000396 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 2.154 on 2470 degrees of freedom
##    Multiple R-squared: 0.1761
##    Adjusted R-squared: 0.1741 
##    F-statistic: 87.97 on 6 and 2470 DF,  p-value: < 0.00000000000000022 
##    ***Extra Diagnostic information
##    Residual sum of squares: 11455.41
##    Sigma(hat): 2.151383
##    AIC:  10838.73
##    AICc:  10838.79
##    BIC:  8470.771
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: bisquare 
##    Adaptive bandwidth: 70 (number of nearest neighbours)
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                           Min.       1st Qu.        Median       3rd Qu.
##    Intercept     -11.902411856   0.463767201   2.666398305   5.209141136
##    access_pow     -0.167696221  -0.033186084  -0.008455192   0.014506060
##    income         -0.000852440  -0.000093966   0.000012272   0.000106859
##    urban_rural    -9.659389332  -1.283372987  -0.550733673   0.107293104
##    rural         -14.279864176  -1.831290615  -1.096676074  -0.466332331
##    investment     -0.002915733  -0.000541022  -0.000154112   0.000038355
##    log(edu_high)  -8.038200881  -2.929895931  -1.623132093  -0.450546529
##                     Max.
##    Intercept     18.2824
##    access_pow     0.1402
##    income         0.0009
##    urban_rural    3.1964
##    rural          3.6835
##    investment     0.0046
##    log(edu_high)  5.0283
##    ************************Diagnostic information*************************
##    Number of data points: 2477 
##    Effective number of parameters (2trace(S) - trace(S'S)): 705.3493 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 1771.651 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 9202.661 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 8315.042 
##    BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 9643.689 
##    Residual sum of squares: 3322.183 
##    R-square value:  0.7610522 
##    Adjusted R-square value:  0.6658659 
## 
##    ***********************************************************************
##    Program stops at: 2026-03-25 15:32:28.657104

Maps of Model 2 Results

The following spatial visualizations provide insights into the local performance and parameter estimates of Model 2.

Statistical Significance of Regional Accessibility

This map highlights the areas where the poviat-level transport accessibility indicator is statistically significant (\(|t| > 1.96\)) in explaining the local unemployment rate.

Local R-Squared

The Local \(R^2\) map evaluates the goodness-of-fit for Model 2. Comparing this spatial distribution with Model 1 allows us to assess whether regional or local accessibility is a better predictor in specific parts of Poland.

Regional Accessibility Coefficients

This figure illustrates the spatial variation of the access_pow coefficient. It shows the varying magnitude and direction of the relationship between regional transport accessibility and local unemployment.

Model 3: Interaction with Municipality Type

The third model introduces interaction variables between the transport accessibility index and the type of municipality (rural or urban-rural, with urban as the baseline category).

# Setting the seed for reproducibility of results
set.seed(123)

# Model 3 specification - variant of model 1 with interaction variables (accessibility x type of gmina)
formulatyp <- unemployment ~  income + investment + edu_high + access*rural + access*urban_rural

# Finding optimal adaptive bandwidth
bw.poly.adapttyp <- bw.gwr(formulatyp, data = gmina_data_sp, adaptive = TRUE)
bw.poly.adapttyp

# Estimating the GWR model
gwr.model.adapt.typ <- gwr.basic(formulatyp, data = gmina_data_sp,
                                 bw = bw.poly.adapttyp, kernel = "bisquare", 
                                 adaptive = TRUE)

# Displaying the summary of Model 3
gwr.model.adapt.typ

##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2026-03-25 15:34:30.92119 
##    Call:
##    gwr.basic(formula = formulatyp, data = gmina_data_sp, bw = bw.poly.adapttyp, 
##     kernel = "bisquare", adaptive = TRUE)
## 
##    Dependent (y) variable:  unemployment
##    Independent variables:  income investment edu_high access rural urban_rural
##    Number of data points: 2477
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1747 -1.5156 -0.3687  1.1885 12.5953 
## 
##    Coefficients:
##                          Estimate  Std. Error t value             Pr(>|t|)    
##    (Intercept)         7.30885075  0.50126884  14.581 < 0.0000000000000002 ***
##    income              0.00012392  0.00002342   5.292     0.00000013157335 ***
##    investment         -0.00023668  0.00001991 -11.890 < 0.0000000000000002 ***
##    edu_high           -8.07448861  1.15484245  -6.992     0.00000000000348 ***
##    access             -0.01968158  0.00544004  -3.618             0.000303 ***
##    rural              -1.13117636  0.45753402  -2.472             0.013490 *  
##    urban_rural        -1.16802992  0.46840449  -2.494             0.012709 *  
##    access:rural        0.00034583  0.00575910   0.060             0.952122    
##    access:urban_rural  0.00786190  0.00613908   1.281             0.200444    
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 2.161 on 2468 degrees of freedom
##    Multiple R-squared: 0.1711
##    Adjusted R-squared: 0.1684 
##    F-statistic: 63.68 on 8 and 2468 DF,  p-value: < 0.00000000000000022 
##    ***Extra Diagnostic information
##    Residual sum of squares: 11524.57
##    Sigma(hat): 2.157868
##    AIC:  10857.64
##    AICc:  10857.73
##    BIC:  8516.94
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: bisquare 
##    Adaptive bandwidth: 155 (number of nearest neighbours)
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                                Min.       1st Qu.        Median       3rd Qu.
##    Intercept          -13.055651470   4.208773448   6.672402856  10.249684332
##    income              -0.000446642  -0.000035269   0.000057312   0.000155440
##    investment          -0.001974385  -0.000498653  -0.000208541  -0.000047293
##    edu_high           -38.648363614 -15.853434522  -9.093526914  -2.827677400
##    access              -0.221849608  -0.020274444  -0.003951915   0.009761134
##    rural              -21.527236520  -2.666913916  -0.679488672   0.634964621
##    urban_rural        -22.724820885  -2.323391997  -0.518733164   1.060192015
##    access:rural        -0.229179290  -0.023638306  -0.006819297   0.013308949
##    access:urban_rural  -0.209350598  -0.020901005   0.001611900   0.020822015
##                          Max.
##    Intercept          29.6560
##    income              0.0007
##    investment          0.0008
##    edu_high           17.6287
##    access              0.2238
##    rural              17.9252
##    urban_rural        18.1472
##    access:rural        0.1885
##    access:urban_rural  0.2091
##    ************************Diagnostic information*************************
##    Number of data points: 2477 
##    Effective number of parameters (2trace(S) - trace(S'S)): 448.606 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2028.394 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 9657.408 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 9203.522 
##    BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 9054.178 
##    Residual sum of squares: 5190.713 
##    R-square value:  0.6266584 
##    Adjusted R-square value:  0.5440482 
## 
##    ***********************************************************************
##    Program stops at: 2026-03-25 15:34:33.21276

This specification aimed to test whether the degree of urbanization significantly alters the strength or direction of the relationship between accessibility and unemployment. However, as discussed in the main text, these interaction variables proved to be statistically insignificant in both the global (OLS) and local (GWR) models.

Diagnostic Tests for the Models

To ensure the methodological rigor of the spatial analysis, a series of diagnostic tests proposed by Leung et al. (2000) were conducted.

# Note: eval=FALSE is set to prevent long execution times during document rendering
# the full results were calculated separetely

# 1. Bandwidth selection and model fitting using the 'spgwr' package
# Model 1: Individual gmina accessibility
bw.a.spgwr <- gwr.sel(formula, data=gmina_data_sp, adapt=TRUE, RMSE=TRUE) 
gwr.model.test <- gwr(formula, data = gmina_data_sp, adapt = bw.a.spgwr, hatmatrix = TRUE) 

# Model 2: Average poviat accessibility
bw.a.spgwr.pow <- gwr.sel(formulapow, data=gmina_data_sp, adapt=TRUE, RMSE=TRUE) 
gwr.model.testpow <- gwr(formulapow, data = gmina_data_sp, adapt = bw.a.spgwr.pow, hatmatrix = TRUE) 

# 2. Conducting F1 and F2 tests (GWR vs OLS goodness-of-fit)
LMZ.F1GWR.test(gwr.model.test) 
LMZ.F2GWR.test(gwr.model.test) 
LMZ.F1GWR.test(gwr.model.testpow) 
LMZ.F2GWR.test(gwr.model.testpow) 

# 3. Conducting F3 test (Spatial non-stationarity of coefficients)
LMZ.F3GWR.test(gwr.model.test) 
LMZ.F3GWR.test(gwr.model.testpow) 

# 4. Moran's I test for spatial autocorrelation of GWR residuals
# Creating a spatial weights matrix based on contiguous neighbors
cont.nb <- poly2nb(st_make_valid(gmina))
cont.listw <- nb2listw(cont.nb, style="W")

# Running the Moran's I test on the residuals
spgwr::gwr.morantest(gwr.model.test, cont.listw)

The key findings from these tests are as follows:

GWR vs. OLS Goodness-of-Fit (F1 and F2 tests): Both \(F_1\) and \(F_2\) tests evaluate whether the GWR models describe the data significantly better than the global OLS models. For both the gmina-level and poviat-level accessibility models, the p-values were close to 0 (\(p < 0.001\)). We reject the null hypothesis, confirming that the local GWR models provide a significantly better fit.
Spatial Non-stationarity of Parameters (F3 test): This test checks whether the regression coefficients vary significantly across space. For the primary model, all variables (including transport accessibility) yielded p-values below 0.01. This confirms significant spatial heterogeneity and justifies the use of a geographically weighted approach. Similar results were found for Model 2.
Spatial Autocorrelation of Residuals (Moran’s I): Testing the residuals of the GWR model resulted in a Moran’s I statistic of 0.01 with a p-value of 0.1295. Since we cannot reject the null hypothesis of no spatial autocorrelation, we conclude that the residuals are randomly distributed. The GWR model successfully eliminated the influence of unobserved spatial dependencies.

Summary and Conclusions

This appendix has detailed the methodological workflow, spatial data preparation, and empirical estimation of the Geographically Weighted Regression (GWR) models used in this thesis.

The spatial visualizations and diagnostic tests presented above collectively demonstrate that the relationship between public transport accessibility and the unemployment rate in Poland is highly non-stationary. The GWR approach proved to be vastly superior to traditional global models (OLS), effectively capturing the local variations and eliminating spatial autocorrelation in the residuals.

Appendix: The impact of public transport accessibility on the labor market in Poland

Grzegorz Pakier

Kateryna Zabarina

March 2025