Welcome to the RPubs companion for the article Urban greening and support for environmental parties: Evidence from Germany by Eliza HaĆatek and Katarzyna Kopczewska (2026).
This document provides the main R code used to reproduce the empirical results presented in the article. The analysis is based on a constituency-level dataset constructed from multiple sources, including land-use indicators, electoral results, and environmental funding data for Germany.
The analytical framework of the study is presented below. It distinguishes three key dimensions shaping urban greening outcomes: political demand, financial capacity, and structural context. The framework highlights both direct and indirect pathways through which these factors interact.
In particular, the framework emphasizes a novel politicalâfinancial pathway, where electoral support for environmental parties may influence greening outcomes indirectly through increased access to environmental funding.
To reproduce the empirical results, please download the replication package here and place all files in your working directory.
All file paths in this document are defined relative to the project folder. It is therefore recommended to open the R Markdown file within an RStudio Project and to preserve the folder structure provided in the replication package.
The replication package includes the following input datasets:
projects.csv â project-level dataset with harmonised
postal codes and grant informationplz_unique.rds â postal-code geometriesLUCAS_DE.csv â land-use data (LUCAS, Germany only) used
to construct greening indicatorsgerman_2017_constituency_results.csv â electoral data
(Green Party vote share)Geometrie_Wahlkreise_19DBT_VG250.shp (and associated
files: .dbf, .shx, etc.) â constituency
boundaries for GermanyPlease ensure that all shapefile components are stored in the same folder.
The shapefile is processed using the sf package and
merged with the analytical dataset via constituency identifiers
(WKR_NR).
The code provided in this document reproduces the full data preparation pipeline starting from the input datasets listed above.
The objects projects.csv and
plz_unique.rds are pre-processed to reduce computation time
and ensure consistency of spatial matching.
Please download all files, keep them in a single directory (without nested subfolders), and set this directory as your working directory before running the code.
This section documents how the final analytical dataset was constructed from raw spatial, electoral, land-use, and project-level data. The pipeline includes data cleaning, harmonisation of postal codes, spatial matching of projects to constituencies, aggregation of land-use transitions, and merging of electoral indicators.
# Constituency geometries
const <- st_read("Geometrie_Wahlkreise_19DBT_VG250.shp", quiet = TRUE)
# Cached project data (cleaned)
projects_all <- readr::read_csv("projects.csv", show_col_types = FALSE)
# Postal-code geometries
plz_unique <- readRDS("plz_unique.rds")
# Land-use data (LUCAS)
landtake_raw <- read.csv("LUCAS_DE.csv", header = TRUE, sep = ",", dec = ".")
# Election results
election <- readr::read_csv("german_2017_constituency_results.csv", show_col_types = FALSE)# Ensure valid geometries
const <- st_make_valid(const)
plz_unique <- st_make_valid(plz_unique)
# Ensure numeric variables
projects_all <- projects_all %>%
mutate(
Grant_numeric = as.numeric(Grant_numeric),
StartYear = as.numeric(StartYear),
EndYear = as.numeric(EndYear)
)
# Convert LUCAS points to sf
landtake <- st_as_sf(
landtake_raw,
coords = c("X_WGS84", "Y_WGS84"),
crs = 4326
) %>%
st_transform(25832)projects_sf <- projects_all %>%
left_join(plz_unique, by = c("Postal_clean_final" = "plz_code5")) %>%
st_as_sf() %>%
st_make_valid() %>%
mutate(
project_id = row_number(),
plz_area = st_area(geometry)
)
assignments <- st_intersection(
projects_sf %>% dplyr::select(project_id, plz_area),
const %>% dplyr::select(WKR_NR, WKR_NAME)
) %>%
mutate(
overlap_area = st_area(geometry),
overlap_share = as.numeric(overlap_area / plz_area)
)%>%
st_drop_geometry() %>%
group_by(project_id) %>%
slice_max(overlap_share, n = 1, with_ties = FALSE) %>%
ungroup()
projects_assigned <- projects_sf %>%
left_join(assignments, by = "project_id")
project_counts <- projects_assigned %>%
st_drop_geometry() %>%
group_by(WKR_NR) %>%
summarise(
num_projects = n(),
sum_grant = sum(Grant_numeric, na.rm = TRUE),
.groups = "drop"
)landtake_summary <- landtake_with_const %>%
group_by(WKR_NR) %>%
summarise(
num_greened_cells = sum(change_from_6 == 1, na.rm = TRUE),
total_valid_cells = sum(!is.na(change_from_6)),
sum_TOT_P_2018 = sum(TOT_P_2018, na.rm = TRUE),
sum_TOT_P_2006 = sum(TOT_P_2006, na.rm = TRUE),
mean_DIST_COAST = mean(DIST_COAST, na.rm = TRUE),
mean_DIST_BORD = mean(DIST_BORD, na.rm = TRUE),
mean_ELEV = mean(ELEV, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
fraction_greened = num_greened_cells / total_valid_cells,
percent_greened = 100 * fraction_greened
) %>%
rename(geometry_land_take = geometry)inhab_metrics <- landtake_with_const %>%
mutate(
inhab_any = as.integer(
replace_na(TOT_P_2006, 0) > 0 | replace_na(TOT_P_2018, 0) > 0
)
) %>%
group_by(WKR_NR) %>%
summarise(
num_greened_cells_pop = sum((change_from_6 == 1) & (inhab_any == 1), na.rm = TRUE),
total_inhab_cells = sum(inhab_any == 1, na.rm = TRUE),
perc_greened_inhab = ifelse(
total_inhab_cells > 0,
100 * num_greened_cells_pop / total_inhab_cells,
NA_real_
),
sum_TOT_P_2006_pop = sum(
if_else(inhab_any == 1, replace_na(TOT_P_2006, 0), 0),
na.rm = TRUE
),
sum_TOT_P_2018_pop = sum(
if_else(inhab_any == 1, replace_na(TOT_P_2018, 0), 0),
na.rm = TRUE
),
.groups = "drop"
) %>%
mutate(
popul.rel.change_pop = ifelse(
sum_TOT_P_2006_pop > 0,
sum_TOT_P_2018_pop / sum_TOT_P_2006_pop,
NA_real_
)
)
thr_pop <- median(inhab_metrics$perc_greened_inhab, na.rm = TRUE)
inhab_metrics <- inhab_metrics %>%
mutate(
greened_binary_pop = as.integer(perc_greened_inhab >= thr_pop)
)baseline_overall <- landtake_with_const %>%
mutate(
is_artificial_2006 = as.integer(STR05 == 6),
is_green_2006 = as.integer(STR05 %in% c(3, 4, 5)),
is_artificial_2018 = as.integer(STR18a == 6),
is_green_2018 = as.integer(STR18a %in% c(3, 4, 5))
) %>%
group_by(WKR_NR) %>%
summarise(
pct_artificial_2006 = 100 * mean(is_artificial_2006, na.rm = TRUE),
pct_green_2006 = 100 * mean(is_green_2006, na.rm = TRUE),
pct_artificial_2018 = 100 * mean(is_artificial_2018, na.rm = TRUE),
pct_green_2018 = 100 * mean(is_green_2018, na.rm = TRUE),
.groups = "drop"
)
baseline_inhab <- landtake_with_const %>%
mutate(
inhab_any = as.integer(
replace_na(TOT_P_2006, 0) > 0 | replace_na(TOT_P_2018, 0) > 0
),
is_artificial_2006 = as.integer(STR05 == 6),
is_green_2006 = as.integer(STR05 %in% c(3, 4, 5)),
is_artificial_2018 = as.integer(STR18a == 6),
is_green_2018 = as.integer(STR18a %in% c(3, 4, 5))
) %>%
filter(inhab_any == 1) %>%
group_by(WKR_NR) %>%
summarise(
pct_artificial_2006_pop = 100 * mean(is_artificial_2006, na.rm = TRUE),
pct_green_2006_pop = 100 * mean(is_green_2006, na.rm = TRUE),
pct_artificial_2018_pop = 100 * mean(is_artificial_2018, na.rm = TRUE),
pct_green_2018_pop = 100 * mean(is_green_2018, na.rm = TRUE),
.groups = "drop"
)project_years <- projects_assigned %>%
mutate(
StartYear_num = as.numeric(StartYear),
EndYear_num = as.numeric(EndYear)
) %>%
group_by(WKR_NR) %>%
summarise(
first_project_year = suppressWarnings(min(StartYear_num, na.rm = TRUE)),
last_project_year = suppressWarnings(max(EndYear_num, na.rm = TRUE)),
mean_project_year = suppressWarnings(mean(StartYear_num, na.rm = TRUE)),
.groups = "drop"
)cons_summary <- landtake_summary %>%
left_join(st_drop_geometry(inhab_metrics), by = "WKR_NR") %>%
left_join(st_drop_geometry(baseline_overall), by = "WKR_NR") %>%
left_join(st_drop_geometry(baseline_inhab), by = "WKR_NR") %>%
left_join(st_drop_geometry(project_years), by = "WKR_NR")
thr_uncond <- median(cons_summary$percent_greened, na.rm = TRUE)
cons_summary <- cons_summary %>%
mutate(
greened_binary = as.integer(percent_greened >= thr_uncond)
)
rm(thr_pop, thr_uncond)const_data <- const %>%
rename(geometry_const = geometry) %>% # keep constituency geometry
left_join(election_processed, by = "WKR_NR") %>% # add Green Party vote share
left_join(st_drop_geometry(cons_summary), by = "WKR_NR") %>% # add greening and structural indicators
left_join(st_drop_geometry(project_counts), by = "WKR_NR") %>% # add project counts and funding
mutate(
green_vote_share_dec = green_vote_share / 100, # convert vote share to proportion
sum_TOT_P_2006_K = sum_TOT_P_2006 / 1000, # population in thousands
sum_TOT_P_2018_K = sum_TOT_P_2018 / 1000,
sum_grant_K = sum_grant / 1000, # grants in thousands of euro
mean_DIST_COAST_K = mean_DIST_COAST / 1000, # distance rescaled
mean_DIST_BORD_K = mean_DIST_BORD / 1000,
popul.change.06.18 = sum_TOT_P_2018 - sum_TOT_P_2006,
popul.rel.change = ifelse(
sum_TOT_P_2006 > 0,
sum_TOT_P_2018 / sum_TOT_P_2006,
NA_real_
),
grant_percap = ifelse(
sum_TOT_P_2018 > 0,
sum_grant / sum_TOT_P_2018,
NA_real_
),
proj_percap = ifelse(
sum_TOT_P_2018 > 0,
num_projects / sum_TOT_P_2018,
NA_real_
),
funding_index = as.numeric(scale(grant_percap)) + as.numeric(scale(proj_percap))
)const_data <- const_data %>%
mutate(
ELEV_class = case_when(
mean_ELEV < 0 ~ "elev_<0",
mean_ELEV >= 0 & mean_ELEV < 250 ~ "elev_0-250",
mean_ELEV >= 250 & mean_ELEV < 500 ~ "elev_250-500",
mean_ELEV >= 500 & mean_ELEV < 1000 ~ "elev_500-1000",
mean_ELEV >= 1000 & mean_ELEV < 2000 ~ "elev_1000-2000",
mean_ELEV >= 2000 ~ "elev_>2000",
TRUE ~ NA_character_
),
ELEV_class = factor(
ELEV_class,
levels = c(
"elev_<0", "elev_0-250", "elev_250-500",
"elev_500-1000", "elev_1000-2000", "elev_>2000"
)
)
)
# Replace only project-related missing (NA) values with zero
const_data <- const_data %>%
mutate(
across(
c(
green_vote_share_dec,
num_projects,
sum_grant,
sum_grant_K,
grant_percap,
proj_percap,
funding_index
),
~ ifelse(is.na(.), 0, .)
)
)
dat <- st_drop_geometry(const_data)dat <- dat %>%
# POLITICAL VARIABLES
mutate(
green_vote_share_dec = ifelse(
!is.na(green_vote_share_dec),
green_vote_share_dec,
green_vote_share / 100
)
) %>%
# FINANCIAL VARIABLES
mutate(
# Log transformation of total grants (avoid log(0))
log_grantsK = log(pmax(sum_grant_K, 0) + 1),
# Ensure finite values only
grant_percap = ifelse(is.finite(grant_percap), grant_percap, NA_real_),
proj_percap = ifelse(is.finite(proj_percap), proj_percap, NA_real_),
# Log projects per capita (small offset to avoid -Inf)
log_proj_percap = log(proj_percap + 1e-6)
) %>%
# CONTROL VARIABLES (only those requiring transformation)
mutate(
# Elevation as categorical factor
ELEV_class = as.factor(ELEV_class)
)| Variable | Level | Description |
|---|---|---|
| Land use class (STR05) | 1 | Arable land |
| 2 | Permanent crops | |
| 3 | Grass | |
| 4 | Wooded and shrub areas | |
| 5 | Bare land / low vegetation | |
| 6 | Artificial land | |
| 7 | Water | |
| Land use class (STR18a) | 1 | Arable land |
| 2 | Permanent crops | |
| 3 | Grass | |
| 4 | Wooded areas and shrubs (merged) | |
| 5 | Bare surface / low vegetation | |
| 6 | Artificial / sealed areas | |
| 7 | Inland and coastal waters | |
| Land-use change 2005-2018 | Change | STR05 â STR18a |
| No change | STR05 = STR18a | |
| Total | All categories combined | |
| Urban greening | Binary | 1 if artificial â non-artificial â„ median; 0 otherwise |
| Percent greened | Continuous | Fraction of artificial â other land-use transitions |
| Elevation class | Categorical | <0; 0-250; 250-500; 500-1000; 1000-2000; >2000 m ASL |
| % artificial (2006, inhab.) | Continuous | Share of artificial land in 2006 (inhabited) |
| % green (2006, inhab.) | Continuous | Share of green land in 2006 (inhabited) |
| Green vote share | Continuous | Green Party vote share (0-1) |
| Grant per capita | Continuous | DBU funding per capita (âŹ) |
| Log projects per capita | Continuous | Log of DBU projects per capita |
| Population change | Continuous | Population ratio (2018 / 2006) |
| Mean distance to border | Continuous | Average distance to national border (km) |
| mean | sd | min | max | n | |
|---|---|---|---|---|---|
| greened_binary_pop | 0.505 | 0.501 | 0.000 | 1.000 | 299 |
| perc_greened_inhab | 10.259 | 10.424 | 1.572 | 68.421 | 299 |
| green_vote_share_dec | 0.088 | 0.039 | 0.022 | 0.212 | 299 |
| grant_percap | 4.883 | 5.831 | 0.000 | 46.500 | 299 |
| log_proj_percap | -10.763 | 1.052 | -13.816 | -7.592 | 299 |
| log_grantsK | 6.479 | 1.640 | 0.000 | 9.453 | 299 |
| popul.rel.change | 1.006 | 0.084 | 0.786 | 1.569 | 299 |
| pct_artificial_2006_pop | 21.057 | 20.616 | 3.020 | 93.333 | 299 |
| pct_green_2006_pop | 44.060 | 18.386 | 3.846 | 93.130 | 299 |
| mean_DIST_BORD_K | 0.082 | 0.053 | 0.004 | 0.236 | 299 |
| greened_binary_pop | perc_greened_inhab | green_vote_share_dec | grant_percap | log_proj_percap | log_grantsK | popul.rel.change | pct_artificial_2006_pop | pct_green_2006_pop | mean_DIST_BORD_K | |
|---|---|---|---|---|---|---|---|---|---|---|
| greened_binary_pop | 1.00 | 0.60 | 0.30 | 0.12 | 0.09 | 0.07 | 0.27 | 0.63 | -0.41 | -0.03 |
| perc_greened_inhab | 0.60 | 1.00 | 0.33 | 0.13 | 0.13 | 0.11 | 0.25 | 0.90 | -0.47 | -0.03 |
| green_vote_share_dec | 0.30 | 0.33 | 1.00 | 0.31 | 0.31 | 0.31 | 0.68 | 0.40 | -0.14 | 0.01 |
| grant_percap | 0.12 | 0.13 | 0.31 | 1.00 | 0.75 | 0.64 | 0.21 | 0.22 | -0.13 | 0.07 |
| log_proj_percap | 0.09 | 0.13 | 0.31 | 0.75 | 1.00 | 0.90 | 0.21 | 0.18 | -0.14 | 0.10 |
| log_grantsK | 0.07 | 0.11 | 0.31 | 0.64 | 0.90 | 1.00 | 0.23 | 0.14 | -0.08 | 0.06 |
| popul.rel.change | 0.27 | 0.25 | 0.68 | 0.21 | 0.21 | 0.23 | 1.00 | 0.38 | -0.21 | -0.08 |
| pct_artificial_2006_pop | 0.63 | 0.90 | 0.40 | 0.22 | 0.18 | 0.14 | 0.38 | 1.00 | -0.54 | -0.06 |
| pct_green_2006_pop | -0.41 | -0.47 | -0.14 | -0.13 | -0.14 | -0.08 | -0.21 | -0.54 | 1.00 | -0.17 |
| mean_DIST_BORD_K | -0.03 | -0.03 | 0.01 | 0.07 | 0.10 | 0.06 | -0.08 | -0.06 | -0.17 | 1.00 |
ggplot(plot_data_inhab, aes(x = x, y = y)) +
geom_point(aes(color = role), alpha = 0.5, size = 1.8) +
geom_smooth(
method = "lm",
se = FALSE,
aes(color = role),
linewidth = 0.5
) +
facet_wrap(
~ var,
scales = "free_x",
labeller = as_labeller(var_labels_inhab)
) +
scale_color_manual(
values = role_colors,
name = "Variable type"
) +
labs(
x = NULL,
y = "% greened area (inhabited cells)"
) +
theme_bw() +
theme(
strip.background = element_rect(fill = "grey95"),
axis.text.x = element_text(angle = 30, hjust = 1),
legend.position = "bottom"
)ggplot(const_plot) +
geom_sf(aes(fill = green_vote_share)) +
scale_fill_gradientn(
colours = c("grey85", "#A8E6A3", "forestgreen", "darkgreen"),
name = "Vote share (%)"
) +
theme_minimal() +
labs(
title = "Percentage of votes for the Green Party (second votes)"
)ggplot(const_plot) +
geom_sf(aes(fill = as.factor(greened_binary_pop))) +
scale_fill_manual(
values = c("lightgray", "forestgreen"),
labels = c("Below median", "Above median"),
name = "Urban greening"
) +
theme_minimal() +
labs(
title = "Fraction of areas showing urban greening"
)ggplot(const_plot) +
geom_sf(aes(fill = grant_per_cap_cat)) +
scale_fill_manual(
values = c("#edf8e9", "#bae4b3", "#74c476", "#31a354", "#006d2c")
) +
theme_minimal() +
labs(
title = "Green-related grants per capita",
fill = "Grants per capita (âŹ)"
)ggplot(const_plot) +
geom_sf(aes(fill = project_cat)) +
scale_fill_manual(
values = c("#e5f5e0", "#a1d99b", "#74c476", "#31a354", "#006d2c")
) +
theme_minimal() +
labs(
title = "Number of green-related projects",
fill = "Project count"
)The empirical strategy combines three complementary approaches to investigate the relationship between political preferences and urban greening.
The model decomposes the total effect of political support into a direct
component and an indirect component operating through financial
capacity. This allows us to assess whether funding acts as a mechanism
translating political preferences into environmental outcomes.
We begin with a parametric approach to estimate the baseline relationship between political support and urban greening.
##
## Call:
## glm(formula = greened_binary_pop ~ green_vote_share_dec + grant_percap +
## log_proj_percap + popul.rel.change + pct_artificial_2006_pop +
## pct_green_2006_pop + I(green_vote_share_dec * pct_artificial_2006_pop) +
## mean_DIST_BORD_K + ELEV_class, family = binomial(link = "probit"),
## data = dat)
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) -6.875571 3.481814 -1.975
## green_vote_share_dec 17.002029 8.645218 1.967
## grant_percap -0.025647 0.035288 -0.727
## log_proj_percap 0.112229 0.182040 0.617
## popul.rel.change 3.418891 2.999100 1.140
## pct_artificial_2006_pop 0.455643 0.075873 6.005
## pct_green_2006_pop -0.010111 0.008125 -1.244
## I(green_vote_share_dec * pct_artificial_2006_pop) -1.991461 0.549469 -3.624
## mean_DIST_BORD_K 2.273279 2.315583 0.982
## ELEV_classelev_250-500 0.259277 0.293854 0.882
## ELEV_classelev_500-1000 0.386481 0.434666 0.889
## Pr(>|z|)
## (Intercept) 0.04830 *
## green_vote_share_dec 0.04922 *
## grant_percap 0.46735
## log_proj_percap 0.53756
## popul.rel.change 0.25430
## pct_artificial_2006_pop 1.91e-09 ***
## pct_green_2006_pop 0.21335
## I(green_vote_share_dec * pct_artificial_2006_pop) 0.00029 ***
## mean_DIST_BORD_K 0.32623
## ELEV_classelev_250-500 0.37760
## ELEV_classelev_500-1000 0.37393
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 414.47 on 298 degrees of freedom
## Residual deviance: 145.46 on 288 degrees of freedom
## AIC: 167.46
##
## Number of Fisher Scoring iterations: 11
##
## Call:
## glm(formula = greened_binary_pop ~ green_vote_share_dec + grant_percap +
## popul.rel.change + pct_artificial_2006_pop + I(green_vote_share_dec *
## pct_artificial_2006_pop) + mean_DIST_BORD_K + ELEV_class,
## family = binomial(link = "probit"), data = dat)
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) -8.676029 2.858275 -3.035
## green_vote_share_dec 17.204170 8.752413 1.966
## grant_percap -0.008384 0.026672 -0.314
## popul.rel.change 3.446442 2.986048 1.154
## pct_artificial_2006_pop 0.451803 0.075168 6.011
## I(green_vote_share_dec * pct_artificial_2006_pop) -1.956454 0.553040 -3.538
## mean_DIST_BORD_K 3.007147 2.237581 1.344
## ELEV_classelev_250-500 0.083448 0.260160 0.321
## ELEV_classelev_500-1000 0.197624 0.400364 0.494
## Pr(>|z|)
## (Intercept) 0.002402 **
## green_vote_share_dec 0.049339 *
## grant_percap 0.753250
## popul.rel.change 0.248426
## pct_artificial_2006_pop 1.85e-09 ***
## I(green_vote_share_dec * pct_artificial_2006_pop) 0.000404 ***
## mean_DIST_BORD_K 0.178972
## ELEV_classelev_250-500 0.748396
## ELEV_classelev_500-1000 0.621581
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 414.47 on 298 degrees of freedom
## Residual deviance: 147.61 on 290 degrees of freedom
## AIC: 165.61
##
## Number of Fisher Scoring iterations: 11
| model | Moran_I | p_value |
|---|---|---|
| M1 | 0.039 | 0.117 |
| M2 | 0.030 | 0.177 |
To complement the regression analysis, we apply machine learning methods that allow for flexible functional forms and interactions between predictors.
| model | AUC | Accuracy | Kappa |
|---|---|---|---|
| Random Forest | 0.966 | 0.898 | 0.796 |
| XGBoost | 0.963 | 0.881 | 0.762 |
ggplot(rf_imp_df, aes(x = reorder(Variable, Importance), y = Importance)) +
geom_col() +
coord_flip() +
theme_minimal() +
labs(
x = "Predictor",
y = "Importance"
)This section tests whether environmental funding mediates the relationship between Green Party support and urban greening in inhabited areas. The preferred specification includes the baseline share of artificial land in 2006 within inhabited areas as a structural control.
| Mediator | ACME | ACME 95% CI | ADE | ADE 95% CI | Total Effect |
|---|---|---|---|---|---|
| Grants per capita | -0.0125 | [-0.0449; 0.0143] | -0.0785 | [-0.1831; 0.0258] | -0.0889 |
| Projects per capita | -0.0054 | [-0.0377; 0.0258] | -0.0875 | [-0.1973; 0.0179] | -0.0921 |