This document serves as an appendix to the article titled “The impact of public transport accessibility on the labor market in Poland – an econometric analysis”. It contains the R code, methodology, and visualizations used to analyze the spatial relationship between public transport accessibility and the registered unemployment rate across Polish municipalities (gminas).
The core of the analysis utilizes Geographically Weighted Regression (GWR) to capture the spatial heterogeneity of this relationship. The code is divided into three main sections:
In this section, spatial boundaries (shapefiles) and the data are imported. To quantify public transport accessibility, a composite index is constructed at the gmina level.
The index consists of five key components, which are added together with appropriate weight in order to create the composite accesibility index:
The index relies on the percent_rank transformation to
normalize variables with different scales. First all the variables were
individually normalized to 0-1 scale and later added together with
appropriate weight in order to create the composite accesibility index,
which was later also normalized to 0-100 percentile scale for the
purpose of easier interpretation.
# Importing the data
dane.gmina <- read_excel("gminy.xlsx")
# Calculating the variables for the accessibility index and normalizing them using percent_rank to account for varying scales and distributions
dane.gmina <- dane.gmina %>%
mutate(
# 1. Bus stops per sq km (gmina level)
indx_bus_stops = percent_rank(bus_stops / area_gmina),
# 2. Transport expenditure per capita (gmina level)
indx_trans_expen = percent_rank(trans_expen / population),
# 3. Transport lines per sq km (voivodeship level)
indx_trans_lines = percent_rank(trans_lines / area_wojew),
# 4. Rail stops per sq km (powiat level)
indx_rail_stops = percent_rank(rail_stops / area_powiat),
# 5. Transport companies per 100k people (voivodeship level)
indx_trans_comp = percent_rank(trans_comp)
)
# Calculating the final composite accessibility index for each gmina
# Weights reflect the relative importance of each variable
dane.gmina <- dane.gmina %>%
mutate(
access = percent_rank(0.5 * indx_bus_stops + 0.3 * indx_rail_stops +
0.3 * indx_trans_expen + 0.15 * indx_trans_lines + 0.15 * indx_trans_comp) * 100,
)
# Calculating the average accessibility score for each poviat (county level)
dane.gmina <- dane.gmina %>%
group_by(powiat) %>%
mutate(access_pow = mean(access, na.rm = TRUE)) %>%
ungroup()
# Displaying a snapshot of the calculated data
head(dane.gmina) %>%
kable(caption = "Sample of the data (all variables)") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "300px")
| code_match | gmina | powiat | wojewodztwo | unemployment | income | investment | urban | urban_rural | rural | edu_high | bus_stops | trans_expen | population | area_gmina | area_powiat | area_wojew | trans_comp | trans_lines | rail_stops | indx_bus_stops | indx_trans_expen | indx_trans_lines | indx_rail_stops | indx_trans_comp | access | access_pow |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 201011 | Bolesławiec (1) | bolesławiecki | dolnośląskie | 1.8 | 7278.72 | 3481.2 | 1 | 0 | 0 | 0.195 | 162 | 17280184 | 37055 | 23 | 1303 | 19947 | 1.528 | 783 | 8 | 0.9951535 | 0.5214055 | 0.7475767 | 0.3873183 | 0.4030695 | 86.71244 | 33.06408 |
| 201022 | Bolesławiec (2) | bolesławiecki | dolnośląskie | 1.5 | 5907.56 | 3481.2 | 0 | 0 | 1 | 0.149 | 98 | 4813471 | 15267 | 289 | 1303 | 19947 | 1.528 | 783 | 8 | 0.3937803 | 0.2265751 | 0.7475767 | 0.3873183 | 0.4030695 | 28.67528 | 33.06408 |
| 201032 | Gromadka (2) | bolesławiecki | dolnośląskie | 1.5 | 8665.13 | 3481.2 | 0 | 0 | 1 | 0.108 | 23 | 3750357 | 5003 | 268 | 1303 | 19947 | 1.528 | 783 | 8 | 0.0306947 | 0.8037157 | 0.7475767 | 0.3873183 | 0.4030695 | 27.34249 | 33.06408 |
| 201043 | Nowogrodziec (3) | bolesławiecki | dolnośląskie | 1.7 | 7061.37 | 3481.2 | 0 | 1 | 0 | 0.101 | 78 | 2439519 | 14622 | 176 | 1303 | 19947 | 1.528 | 783 | 8 | 0.5016155 | 0.0222132 | 0.7475767 | 0.3873183 | 0.4030695 | 27.54443 | 33.06408 |
| 201052 | Osiecznica (2) | bolesławiecki | dolnośląskie | 2.0 | 7251.63 | 3481.2 | 0 | 0 | 1 | 0.120 | 34 | 3078527 | 7284 | 437 | 1303 | 19947 | 1.528 | 783 | 8 | 0.0230210 | 0.4402262 | 0.7475767 | 0.3873183 | 0.4030695 | 11.34895 | 33.06408 |
| 201062 | Warta Bolesławiecka (2) | bolesławiecki | dolnośląskie | 2.0 | 6565.37 | 3481.2 | 0 | 0 | 1 | 0.119 | 29 | 2469107 | 8411 | 110 | 1303 | 19947 | 1.528 | 783 | 8 | 0.2726171 | 0.1857835 | 0.7475767 | 0.3873183 | 0.4030695 | 16.76090 | 33.06408 |
To perform Geographically Weighted Regression (GWR), polygon data (gmina borders) must be converted into point geometries (centroids). These points allow the GWR algorithm to calculate distance matrices between observations.
# Unifying the ID column format to enable merging
gmina <- gmina %>% mutate(code_match = as.numeric(JPT_KOD_JE),
pov_ID = 1:nrow(gmina))
# Merging spatial borders (gmina) with attribute data (dane.gmina)
gmina_data <- merge(gmina, dane.gmina, by = "code_match", all.y = TRUE)
# Converting the sf object to a Spatial object (required by GWmodel package)
gmina_data_sp <- as_Spatial(gmina_data)
# Extracting centroids (geometric centers) of each gmina
gmina_data_point <- gmina_data
gmina_data_point$point_geometry <- st_centroid(st_make_valid(st_geometry(gmina_data_point)))
# Setting the active geometry to points instead of polygons
st_geometry(gmina_data_point) <- "point_geometry"
# Final conversion to SpatialPointsDataFrame for GWR modeling
gmina_data_sp_point <- as_Spatial(gmina_data_point)
To visualize the spatial heterogeneity of public transport accessibility and the labor market situation, a series of maps were generated.
The following maps present the calculated transport accessibility index on a continuous percentile based scale (0 for worst accesibility and 100 for best).
The left map visualizes the average index values aggregated at the poviat level, highlighting broader regional disparities. The right map shows the original values at the lowest administrative level (gminas), revealing local spatial variations.
plot_poviats <- ggplot() +
geom_sf(gmina_data, mapping = aes(geometry = geometry, fill = access_pow), inherit.aes = FALSE, color = NA) +
scale_fill_viridis_c(option = "plasma", n.breaks = 6) +
labs(fill = "Transport accesibility\nin poviats") +
geom_sf(data = poviat, color = "black", fill = NA, linewidth = 0.4) +
geom_sf(data = wojewodztwa, color = "black", fill = NA, linewidth = 0.6) +
plot_theme + legend_style_cont
plot_gminas <- ggplot() +
geom_sf(gmina_data, mapping = aes(geometry = geometry, fill = access), inherit.aes = FALSE) +
scale_fill_viridis_c(option = "plasma", n.breaks = 6) +
labs(fill = "Transport accesibility\nin gminas") +
plot_theme + legend_style_cont
plots_access <- plot_poviats | plot_gminas
plots_access
To better understand the extremes, the continuous accessibility index was transformed into binary categorical variables. We identified the top and bottom percentiles to pinpoint the municipalities with the absolute highest and lowest access to public transport.
p_top10 <- ggplot() +
geom_sf(data = gmina_data, aes(fill = as.factor(top10)), color = "gray70", size = 0.1) +
geom_sf(data = wojewodztwa, color = "black", fill = NA, linewidth = 0.6) +
scale_fill_manual(values = c("0" = "lightgray", "1" = "red"),
name = "10% best connected",
labels = c("below 90th perc.", "above 90th perc.")) +
plot_theme + legend_style
p_bot10 <- ggplot() +
geom_sf(data = gmina_data, aes(fill = as.factor(dolne10)), color = "gray70", size = 0.1) +
geom_sf(data = wojewodztwa, color = "black", fill = NA, linewidth = 0.6) +
scale_fill_manual(values = c("0" = "lightgray", "1" = "blue"),
name = "10% worst connected",
labels = c("above 10th perc.", "below 10th perc.")) +
plot_theme + legend_style
plots_10 <- p_top10 | p_bot10
plots_10
Similarly to the accessibility index, we map the municipalities dealing with the highest and lowest unemployment rates. By comparing these spatial patterns with the accessibility maps, preliminary hypotheses regarding their relationship can be formulated before running the GWR model.
In this section, we estimate the Geographically Weighted Regression (GWR) models to investigate spatial non-stationarity in the relationship between public transport accessibility and the unemployment rate.
The first model analyzes the impact of the accessibility index calculated at the gmina level. The formula controls for income, municipal characteristics (urban/rural type), local investments, and the higher education rate. We use an adaptive bandwidth with a bisquare kernel function to account for the varying density of administrative units.
# Setting the seed for reproducibility of results
set.seed(123)
# Converting categorical variables to factors
dane.gmina$rural <- factor(dane.gmina$rural)
dane.gmina$urban_rural <- factor(dane.gmina$urban_rural)
dane.gmina$urban <- factor(dane.gmina$urban)
# Model 1 specification
formula <- unemployment ~ access + income + urban_rural + rural + investment + log(edu_high)
# Finding optimal adaptive bandwidth
bw.poly.adapt <- bw.gwr(formula, data = gmina_data_sp, adaptive = TRUE)
# Estimating the GWR model
gwr.model.adapt <- gwr.basic(formula, data = gmina_data_sp,
bw = bw.poly.adapt, kernel = "bisquare",
adaptive = TRUE)
# Displaying the basic output (Global results and GWR summary)
gwr.model.adapt
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2026-03-25 15:30:53.70849
## Call:
## gwr.basic(formula = formula, data = gmina_data_sp, bw = bw.poly.adapt,
## kernel = "bisquare", adaptive = TRUE)
##
## Dependent (y) variable: unemployment
## Independent variables: access income urban_rural rural investment edu_high
## Number of data points: 2477
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0248 -1.4997 -0.3551 1.1828 12.6319
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.29996577 0.41251590 8.000 0.0000000000000019 ***
## access -0.01708787 0.00176031 -9.707 < 0.0000000000000002 ***
## income 0.00011914 0.00002345 5.081 0.0000004033455282 ***
## urban_rural -0.69774935 0.16671339 -4.185 0.0000294719965215 ***
## rural -1.03074998 0.16034241 -6.428 0.0000000001542249 ***
## investment -0.00024003 0.00001982 -12.111 < 0.0000000000000002 ***
## log(edu_high) -1.35311561 0.18827205 -7.187 0.0000000000008732 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 2.161 on 2470 degrees of freedom
## Multiple R-squared: 0.1702
## Adjusted R-squared: 0.1682
## F-statistic: 84.45 on 6 and 2470 DF, p-value: < 0.00000000000000022
## ***Extra Diagnostic information
## Residual sum of squares: 11536.73
## Sigma(hat): 2.159006
## AIC: 10856.26
## AICc: 10856.31
## BIC: 8488.293
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: bisquare
## Adaptive bandwidth: 78 (number of nearest neighbours)
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu.
## Intercept -13.1136589093 0.6963776296 2.6621670551 5.0732127824
## access -0.0673682515 -0.0157506366 -0.0066535010 0.0036619286
## income -0.0008084058 -0.0000841011 0.0000211585 0.0001194731
## urban_rural -7.4813988676 -1.3462162985 -0.5842736180 0.1012914585
## rural -7.3460975245 -1.9181949747 -1.1249682998 -0.4566211140
## investment -0.0026808654 -0.0005507011 -0.0001862082 0.0000066661
## log(edu_high) -8.1821903470 -2.7196588827 -1.5015866845 -0.3950199250
## Max.
## Intercept 15.3134
## access 0.0547
## income 0.0009
## urban_rural 2.9855
## rural 4.1520
## investment 0.0020
## log(edu_high) 4.9910
## ************************Diagnostic information*************************
## Number of data points: 2477
## Effective number of parameters (2trace(S) - trace(S'S)): 687.4203
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 1789.58
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 9394.778
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 8582.755
## BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 9685.71
## Residual sum of squares: 3751.184
## R-square value: 0.7301964
## Adjusted R-square value: 0.6265004
##
## ***********************************************************************
## Program stops at: 2026-03-25 15:30:55.54248
To fully interpret the GWR results, the local parameters must be mapped. The following figures present the statistical significance of the accessibility coefficient, the Local \(R^2\) values, and the spatial distribution of the accessibility parameter itself.
Statistical Significance of Accessibility
This map shows where the transport accessibility indicator is statistically significant at the 95% confidence level (\(|t| > 1.96\)).
# Mapping local t-values for the accessibility variable
ggplot()+
geom_sf(
data = gmina_data,
mapping = aes(geometry = geometry,
fill = ifelse(abs(gwr.model.adapt[["SDF"]]@data[["access_TV"]]) > 1.96, "|t| > 1.96", "|t| < 1.96"))
, inherit.aes = FALSE, linewidth = 0.1)+
geom_sf(data = wojewodztwa, color = "black", fill = NA, linewidth = 0.3) +
scale_fill_manual(values = c("|t| > 1.96" = "#08306b", "|t| < 1.96" = "#f7fbff"),
name = "Significance level") +
plot_theme_reg + legend_style
Local R Squared
The Local \(R^2\) map illustrates the goodness-of-fit of the model across different regions. Higher values indicate areas where the model explains a larger portion of the variance in the unemployment rate.
# Mapping Local R Squared
ggplot() +
geom_sf(
data = gmina_data,
aes(geometry = geometry,
fill = cut(gwr.model.adapt[["SDF"]]@data[["Local_R2"]],
breaks = c(-Inf, 0.6, 0.69, Inf),
labels = c("< 0.6", "0.6 - 0.69", "> 0.69"))
),
inherit.aes = FALSE, linewidth = 0.1) +
geom_sf(data = poviat, color = "black", fill = NA, linewidth = 0.25) +
scale_fill_manual(values = c("< 0.6" = "#f7fbff", "0.6 - 0.69" = "#6baed6", "> 0.69" = "#08306b"),
name = "Local R-Squared") +
plot_theme_reg + legend_style
Local Accessibility Coefficients
Finally, this map displays the actual values of the local coefficients for the transport accessibility indicator. Negative values (darker colors) indicate areas where higher accessibility is associated with a decrease in the unemployment rate.
# Extracting the local regression coefficients for the accessibility variable
gmina_data$gwr.adapt.access <- gwr.model.adapt$SDF$access
# Mapping the regression coefficients
ggplot() +
geom_sf(
data = gmina_data,
mapping = aes(
geometry = geometry,
fill = cut(gwr.adapt.access,
breaks = c(-0.08, -0.016, -0.007, 0.003, 0.06),
labels = c("-0.08 - -0.016", "-0.016 - -0.007", "-0.007 - 0.003", "0.003 - 0.06"))
),
inherit.aes = FALSE, linewidth = 0.1) +
geom_sf(data = poviat, color = "black", fill = NA, linewidth = 0.25) +
scale_fill_manual(values = c("-0.08 - -0.016" = "#08306b", "-0.016 - -0.007" = "#436EEE", "-0.007 - 0.003" = "#BFEFFF", "0.003 - 0.06" = "#f7fbff"), name = "Coefficient value") +
plot_theme_reg + legend_style
The second model shifts the analytical focus from the individual
municipality to the broader regional context. By utilizing the average
accessibility index calculated at the poviat level
(access_pow), we aim to determine whether regional
transport infrastructure networks have a significant spillover effect on
local unemployment rates within the constituent gminas.
# Setting the seed for reproducibility of results
set.seed(123)
# Model 2 specification - using poviat-level accessibility (access_pow)
formulapow <- unemployment ~ access_pow + income + urban_rural + rural + investment + log(edu_high)
# Finding optimal adaptive bandwidth
bw.poly.adaptpow <- bw.gwr(formulapow, data = gmina_data_sp, adaptive = TRUE)
# Estimating the Model
gwr.model.adapt.pow <- gwr.basic(formulapow, data = gmina_data_sp,
bw = bw.poly.adaptpow, kernel = "bisquare",
adaptive = TRUE)
# Displaying the summary of Model 2
gwr.model.adapt.pow
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2026-03-25 15:32:25.487889
## Call:
## gwr.basic(formula = formulapow, data = gmina_data_sp, bw = bw.poly.adaptpow,
## kernel = "bisquare", adaptive = TRUE)
##
## Dependent (y) variable: unemployment
## Independent variables: access_pow income urban_rural rural investment edu_high
## Number of data points: 2477
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0993 -1.4785 -0.3321 1.1633 12.9537
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.26494936 0.39911921 8.180 0.000000000000000448 ***
## access_pow -0.02311402 0.00217983 -10.604 < 0.0000000000000002 ***
## income 0.00010921 0.00002337 4.674 0.000003114444719130 ***
## urban_rural -0.59761883 0.16424329 -3.639 0.00028 ***
## rural -0.92611134 0.15851125 -5.843 0.000000005817051206 ***
## investment -0.00022248 0.00001998 -11.134 < 0.0000000000000002 ***
## log(edu_high) -1.49049622 0.18186931 -8.195 0.000000000000000396 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 2.154 on 2470 degrees of freedom
## Multiple R-squared: 0.1761
## Adjusted R-squared: 0.1741
## F-statistic: 87.97 on 6 and 2470 DF, p-value: < 0.00000000000000022
## ***Extra Diagnostic information
## Residual sum of squares: 11455.41
## Sigma(hat): 2.151383
## AIC: 10838.73
## AICc: 10838.79
## BIC: 8470.771
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: bisquare
## Adaptive bandwidth: 70 (number of nearest neighbours)
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu.
## Intercept -11.902411856 0.463767201 2.666398305 5.209141136
## access_pow -0.167696221 -0.033186084 -0.008455192 0.014506060
## income -0.000852440 -0.000093966 0.000012272 0.000106859
## urban_rural -9.659389332 -1.283372987 -0.550733673 0.107293104
## rural -14.279864176 -1.831290615 -1.096676074 -0.466332331
## investment -0.002915733 -0.000541022 -0.000154112 0.000038355
## log(edu_high) -8.038200881 -2.929895931 -1.623132093 -0.450546529
## Max.
## Intercept 18.2824
## access_pow 0.1402
## income 0.0009
## urban_rural 3.1964
## rural 3.6835
## investment 0.0046
## log(edu_high) 5.0283
## ************************Diagnostic information*************************
## Number of data points: 2477
## Effective number of parameters (2trace(S) - trace(S'S)): 705.3493
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 1771.651
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 9202.661
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 8315.042
## BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 9643.689
## Residual sum of squares: 3322.183
## R-square value: 0.7610522
## Adjusted R-square value: 0.6658659
##
## ***********************************************************************
## Program stops at: 2026-03-25 15:32:28.657104
The following spatial visualizations provide insights into the local performance and parameter estimates of Model 2.
Statistical Significance of Regional Accessibility
This map highlights the areas where the poviat-level transport accessibility indicator is statistically significant (\(|t| > 1.96\)) in explaining the local unemployment rate.
Local R-Squared
The Local \(R^2\) map evaluates the goodness-of-fit for Model 2. Comparing this spatial distribution with Model 1 allows us to assess whether regional or local accessibility is a better predictor in specific parts of Poland.
Regional Accessibility Coefficients
This figure illustrates the spatial variation of the access_pow coefficient. It shows the varying magnitude and direction of the relationship between regional transport accessibility and local unemployment.
The third model introduces interaction variables between the transport accessibility index and the type of municipality (rural or urban-rural, with urban as the baseline category).
# Setting the seed for reproducibility of results
set.seed(123)
# Model 3 specification - variant of model 1 with interaction variables (accessibility x type of gmina)
formulatyp <- unemployment ~ income + investment + edu_high + access*rural + access*urban_rural
# Finding optimal adaptive bandwidth
bw.poly.adapttyp <- bw.gwr(formulatyp, data = gmina_data_sp, adaptive = TRUE)
bw.poly.adapttyp
# Estimating the GWR model
gwr.model.adapt.typ <- gwr.basic(formulatyp, data = gmina_data_sp,
bw = bw.poly.adapttyp, kernel = "bisquare",
adaptive = TRUE)
# Displaying the summary of Model 3
gwr.model.adapt.typ
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2026-03-25 15:34:30.92119
## Call:
## gwr.basic(formula = formulatyp, data = gmina_data_sp, bw = bw.poly.adapttyp,
## kernel = "bisquare", adaptive = TRUE)
##
## Dependent (y) variable: unemployment
## Independent variables: income investment edu_high access rural urban_rural
## Number of data points: 2477
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1747 -1.5156 -0.3687 1.1885 12.5953
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.30885075 0.50126884 14.581 < 0.0000000000000002 ***
## income 0.00012392 0.00002342 5.292 0.00000013157335 ***
## investment -0.00023668 0.00001991 -11.890 < 0.0000000000000002 ***
## edu_high -8.07448861 1.15484245 -6.992 0.00000000000348 ***
## access -0.01968158 0.00544004 -3.618 0.000303 ***
## rural -1.13117636 0.45753402 -2.472 0.013490 *
## urban_rural -1.16802992 0.46840449 -2.494 0.012709 *
## access:rural 0.00034583 0.00575910 0.060 0.952122
## access:urban_rural 0.00786190 0.00613908 1.281 0.200444
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 2.161 on 2468 degrees of freedom
## Multiple R-squared: 0.1711
## Adjusted R-squared: 0.1684
## F-statistic: 63.68 on 8 and 2468 DF, p-value: < 0.00000000000000022
## ***Extra Diagnostic information
## Residual sum of squares: 11524.57
## Sigma(hat): 2.157868
## AIC: 10857.64
## AICc: 10857.73
## BIC: 8516.94
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: bisquare
## Adaptive bandwidth: 155 (number of nearest neighbours)
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu.
## Intercept -13.055651470 4.208773448 6.672402856 10.249684332
## income -0.000446642 -0.000035269 0.000057312 0.000155440
## investment -0.001974385 -0.000498653 -0.000208541 -0.000047293
## edu_high -38.648363614 -15.853434522 -9.093526914 -2.827677400
## access -0.221849608 -0.020274444 -0.003951915 0.009761134
## rural -21.527236520 -2.666913916 -0.679488672 0.634964621
## urban_rural -22.724820885 -2.323391997 -0.518733164 1.060192015
## access:rural -0.229179290 -0.023638306 -0.006819297 0.013308949
## access:urban_rural -0.209350598 -0.020901005 0.001611900 0.020822015
## Max.
## Intercept 29.6560
## income 0.0007
## investment 0.0008
## edu_high 17.6287
## access 0.2238
## rural 17.9252
## urban_rural 18.1472
## access:rural 0.1885
## access:urban_rural 0.2091
## ************************Diagnostic information*************************
## Number of data points: 2477
## Effective number of parameters (2trace(S) - trace(S'S)): 448.606
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2028.394
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 9657.408
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 9203.522
## BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 9054.178
## Residual sum of squares: 5190.713
## R-square value: 0.6266584
## Adjusted R-square value: 0.5440482
##
## ***********************************************************************
## Program stops at: 2026-03-25 15:34:33.21276
This specification aimed to test whether the degree of urbanization significantly alters the strength or direction of the relationship between accessibility and unemployment. However, as discussed in the main text, these interaction variables proved to be statistically insignificant in both the global (OLS) and local (GWR) models.
To ensure the methodological rigor of the spatial analysis, a series of diagnostic tests proposed by Leung et al. (2000) were conducted.
# Note: eval=FALSE is set to prevent long execution times during document rendering
# the full results were calculated separetely
# 1. Bandwidth selection and model fitting using the 'spgwr' package
# Model 1: Individual gmina accessibility
bw.a.spgwr <- gwr.sel(formula, data=gmina_data_sp, adapt=TRUE, RMSE=TRUE)
gwr.model.test <- gwr(formula, data = gmina_data_sp, adapt = bw.a.spgwr, hatmatrix = TRUE)
# Model 2: Average poviat accessibility
bw.a.spgwr.pow <- gwr.sel(formulapow, data=gmina_data_sp, adapt=TRUE, RMSE=TRUE)
gwr.model.testpow <- gwr(formulapow, data = gmina_data_sp, adapt = bw.a.spgwr.pow, hatmatrix = TRUE)
# 2. Conducting F1 and F2 tests (GWR vs OLS goodness-of-fit)
LMZ.F1GWR.test(gwr.model.test)
LMZ.F2GWR.test(gwr.model.test)
LMZ.F1GWR.test(gwr.model.testpow)
LMZ.F2GWR.test(gwr.model.testpow)
# 3. Conducting F3 test (Spatial non-stationarity of coefficients)
LMZ.F3GWR.test(gwr.model.test)
LMZ.F3GWR.test(gwr.model.testpow)
# 4. Moran's I test for spatial autocorrelation of GWR residuals
# Creating a spatial weights matrix based on contiguous neighbors
cont.nb <- poly2nb(st_make_valid(gmina))
cont.listw <- nb2listw(cont.nb, style="W")
# Running the Moran's I test on the residuals
spgwr::gwr.morantest(gwr.model.test, cont.listw)
The key findings from these tests are as follows:
This appendix has detailed the methodological workflow, spatial data preparation, and empirical estimation of the Geographically Weighted Regression (GWR) models used in this thesis.
The spatial visualizations and diagnostic tests presented above collectively demonstrate that the relationship between public transport accessibility and the unemployment rate in Poland is highly non-stationary. The GWR approach proved to be vastly superior to traditional global models (OLS), effectively capturing the local variations and eliminating spatial autocorrelation in the residuals.