Introduction

Welcome to the RPubs companion for the article "Urban greening and support for environmental parties: Evidence from Germany" by Eliza Hałatek and Katarzyna Kopczewska (2026).

This document provides the main R code used to reproduce the empirical results presented in the article. The analysis is based on a constituency-level dataset constructed from multiple sources, including land-use indicators, electoral results, and environmental funding data for Germany.

The analytical framework of the study is presented below. It distinguishes three key dimensions shaping urban greening outcomes: political demand, financial capacity, and structural context. The framework highlights both direct and indirect pathways through which these factors interact.

In particular, the framework emphasizes a novel political–financial pathway, where electoral support for environmental parties may influence greening outcomes indirectly through increased access to environmental funding.

Data access and replication

To reproduce the empirical results, download the replication package and place all files in your working directory.

All file paths in this document are relative to the working directory, and the code reads every input by its bare file name. It is therefore recommended to open the R Markdown file within an RStudio Project created in the folder that holds the replication files.

Replication package contents

The replication package includes the following input datasets:

  • projects.csv → project-level dataset with harmonised postal codes and grant information
  • plz_unique.rds → postal-code geometries
  • LUCAS_DE.csv → land-use data (LUCAS, Germany only) used to construct greening indicators
  • german_2017_constituency_results.csv → electoral data (Green Party vote share)
  • Geometrie_Wahlkreise_19DBT_VG250.shp (and associated files: .dbf, .shx, etc.) → constituency boundaries for Germany

Please ensure that all shapefile components are stored in the same folder.

The shapefile is processed using the sf package and merged with the analytical dataset via constituency identifiers (WKR_NR).

Replication notes

  • The code provided in this document reproduces the full data preparation pipeline starting from the input datasets listed above.

  • The files projects.csv and plz_unique.rds are supplied pre-processed to reduce computation time and to keep the spatial matching consistent.

Please download all files, keep them in a single directory (without nested subfolders), and set this directory as your working directory before running the code.
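Before running the pipeline, it can help to confirm that the inputs are actually present in the working directory. The sketch below is one possible check, not part of the original replication code. The file names come from the package contents listed above, the helper `missing_inputs()` is hypothetical, and only the shapefile components named explicitly in the list (.shp, .dbf, .shx) are checked.

```r
# Input files named in the replication package contents.
required_files <- c(
  "projects.csv",
  "plz_unique.rds",
  "LUCAS_DE.csv",
  "german_2017_constituency_results.csv",
  "Geometrie_Wahlkreise_19DBT_VG250.shp",
  "Geometrie_Wahlkreise_19DBT_VG250.dbf",
  "Geometrie_Wahlkreise_19DBT_VG250.shx"
)

# Hypothetical helper: returns the required files that are absent from `dir`.
missing_inputs <- function(dir = getwd(), files = required_files) {
  files[!file.exists(file.path(dir, files))]
}

absent <- missing_inputs()
if (length(absent) > 0) {
  message("Missing input files: ", paste(absent, collapse = ", "))
} else {
  message("All listed input files were found.")
}
```

If anything is reported as missing, the later `st_read()` and `read_csv()` calls would fail, so it is cheaper to catch this up front.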

Data preparation

This section documents how the final analytical dataset was constructed from raw spatial, electoral, land-use, and project-level data. The pipeline includes data cleaning, harmonisation of postal codes, spatial matching of projects to constituencies, aggregation of land-use transitions, and merging of electoral indicators.

Universal packages

library(sf)
library(dplyr)
library(readr)
library(readxl)
library(tidyr)
library(stringr)
library(writexl)

Reading data

# Constituency geometries
const <- st_read("Geometrie_Wahlkreise_19DBT_VG250.shp", quiet = TRUE)

# Cached project data (cleaned)
projects_all <- readr::read_csv("projects.csv", show_col_types = FALSE)

# Postal-code geometries
plz_unique <- readRDS("plz_unique.rds")

# Land-use data (LUCAS)
landtake_raw <- read.csv("LUCAS_DE.csv", header = TRUE, sep = ",", dec = ".")

# Election results
election <- readr::read_csv("german_2017_constituency_results.csv", show_col_types = FALSE)

Preparation

# Ensure valid geometries
const <- st_make_valid(const)
plz_unique <- st_make_valid(plz_unique)

# Ensure numeric variables
projects_all <- projects_all %>%
  mutate(
    Grant_numeric = as.numeric(Grant_numeric),
    StartYear = as.numeric(StartYear),
    EndYear = as.numeric(EndYear)
  )

# Convert LUCAS points to sf
landtake <- st_as_sf(
  landtake_raw,
  coords = c("X_WGS84", "Y_WGS84"),
  crs = 4326
) %>%
  st_transform(25832)

Assign projects to constituencies

projects_sf <- projects_all %>%
  left_join(plz_unique, by = c("Postal_clean_final" = "plz_code5")) %>%
  st_as_sf() %>%
  st_make_valid() %>%
  mutate(
    project_id = row_number(),
    plz_area = st_area(geometry)
  )

assignments <- st_intersection(
  projects_sf %>% dplyr::select(project_id, plz_area),
  const %>% dplyr::select(WKR_NR, WKR_NAME)
) %>%
  mutate(
    overlap_area = st_area(geometry),
    overlap_share = as.numeric(overlap_area / plz_area)
  ) %>%
  st_drop_geometry() %>%
  group_by(project_id) %>%
  slice_max(overlap_share, n = 1, with_ties = FALSE) %>%
  ungroup()

projects_assigned <- projects_sf %>%
  left_join(assignments, by = "project_id")

project_counts <- projects_assigned %>%
  st_drop_geometry() %>%
  group_by(WKR_NR) %>%
  summarise(
    num_projects = n(),
    sum_grant = sum(Grant_numeric, na.rm = TRUE),
    .groups = "drop"
  )
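The assignment step above gives each project the constituency that covers the largest share of its postal-code polygon. The toy example below illustrates that rule on two invented rectangles. All geometries and the helper `toy_rect()` are made up for illustration. It also checks that both layers share a CRS, which `st_intersection()` requires; whether the real inputs already share one is an assumption worth verifying.

```r
library(sf)

# Hypothetical helper: builds an axis-aligned rectangle.
toy_rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Two invented constituencies lying side by side.
const_toy <- st_sf(
  WKR_NR   = c(1, 2),
  geometry = st_sfc(toy_rect(0, 0, 10, 10), toy_rect(10, 0, 20, 10), crs = 25832)
)

# One invented postal-code polygon straddling the boundary.
plz_toy <- st_sf(
  project_id = 1L,
  geometry   = st_sfc(toy_rect(7, 2, 11, 6), crs = 25832)
)
plz_toy$plz_area <- st_area(plz_toy)

# Both layers must be in the same CRS before they are intersected.
stopifnot(st_crs(plz_toy) == st_crs(const_toy))

# Split the postal-code polygon by constituency and compute overlap shares.
pieces <- suppressWarnings(st_intersection(plz_toy, const_toy))
pieces$overlap_share <- as.numeric(st_area(pieces) / pieces$plz_area)

# Keep the constituency with the largest overlap.
assignment_toy <- st_drop_geometry(pieces)
assignment_toy <- assignment_toy[
  which.max(assignment_toy$overlap_share),
  c("project_id", "WKR_NR", "overlap_share")
]
print(assignment_toy)
```

Here three quarters of the polygon fall in constituency 1, so the project is assigned there. With `with_ties = FALSE` in the original code, exact ties are broken by row order.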

Construct land-use indicators

landtake <- landtake %>%
  mutate(
    change_to_6   = as.integer(change %in% c("1to6","2to6","3to6","4to6","5to6","7to6")),
    change_from_6 = as.integer(change %in% c("6to1","6to2","6to3","6to4","6to5","6to7"))
  )

landtake_with_const <- st_join(landtake, const)

Aggregate greening measures

landtake_summary <- landtake_with_const %>%
  group_by(WKR_NR) %>%
  summarise(
    num_greened_cells = sum(change_from_6 == 1, na.rm = TRUE),
    total_valid_cells = sum(!is.na(change_from_6)),
    sum_TOT_P_2018    = sum(TOT_P_2018, na.rm = TRUE),
    sum_TOT_P_2006    = sum(TOT_P_2006, na.rm = TRUE),
    mean_DIST_COAST   = mean(DIST_COAST, na.rm = TRUE),
    mean_DIST_BORD    = mean(DIST_BORD, na.rm = TRUE),
    mean_ELEV         = mean(ELEV, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    fraction_greened = num_greened_cells / total_valid_cells,
    percent_greened  = 100 * fraction_greened
  ) %>%
  rename(geometry_land_take = geometry)
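To make the greening measure concrete, the sketch below recomputes `percent_greened` on a tiny invented table. The `change` codes follow the "XtoY" pattern used above; the rows themselves, including the "6to6" and missing codes, are hypothetical. One detail it shows is that `%in%` maps a missing code to FALSE, so `change_from_6` is never NA and `total_valid_cells` counts every point in the constituency.

```r
# Invented LUCAS-like points in two constituencies.
lucas_toy <- data.frame(
  WKR_NR = c(1, 1, 1, 1, 2, 2),
  change = c("6to3", "1to6", "6to6", NA, "6to4", "6to1"),
  stringsAsFactors = FALSE
)

# Transitions away from artificial land (class 6), as in the pipeline above.
from_6 <- c("6to1", "6to2", "6to3", "6to4", "6to5", "6to7")
lucas_toy$change_from_6 <- as.integer(lucas_toy$change %in% from_6)

# A missing change code yields 0 rather than NA.
print(lucas_toy$change_from_6)

# Per-constituency counts and the resulting percentage.
num_greened <- tapply(lucas_toy$change_from_6 == 1, lucas_toy$WKR_NR, sum)
total_valid <- tapply(!is.na(lucas_toy$change_from_6), lucas_toy$WKR_NR, sum)
percent_greened_toy <- 100 * num_greened / total_valid
print(percent_greened_toy)
```

In this toy case the denominator is all points in the constituency, not only points that were artificial at baseline.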

Inhabited-area greening indicators

inhab_metrics <- landtake_with_const %>%
  mutate(
    inhab_any = as.integer(
      replace_na(TOT_P_2006, 0) > 0 | replace_na(TOT_P_2018, 0) > 0
    )
  ) %>%
  group_by(WKR_NR) %>%
  summarise(
    num_greened_cells_pop = sum((change_from_6 == 1) & (inhab_any == 1), na.rm = TRUE),
    total_inhab_cells     = sum(inhab_any == 1, na.rm = TRUE),
    perc_greened_inhab    = ifelse(
      total_inhab_cells > 0,
      100 * num_greened_cells_pop / total_inhab_cells,
      NA_real_
    ),
    sum_TOT_P_2006_pop = sum(
      if_else(inhab_any == 1, replace_na(TOT_P_2006, 0), 0),
      na.rm = TRUE
    ),
    sum_TOT_P_2018_pop = sum(
      if_else(inhab_any == 1, replace_na(TOT_P_2018, 0), 0),
      na.rm = TRUE
    ),
    .groups = "drop"
  ) %>%
  mutate(
    popul.rel.change_pop = ifelse(
      sum_TOT_P_2006_pop > 0,
      sum_TOT_P_2018_pop / sum_TOT_P_2006_pop,
      NA_real_
    )
  )

thr_pop <- median(inhab_metrics$perc_greened_inhab, na.rm = TRUE)

inhab_metrics <- inhab_metrics %>%
  mutate(
    greened_binary_pop = as.integer(perc_greened_inhab >= thr_pop)
  )
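The two rules used above, "inhabited in either year" and "at or above the median", can be checked on a few invented values. The numbers below are hypothetical. Because the comparison is `>=`, constituencies tied with the median are coded 1, so the share of ones can exceed one half slightly.

```r
# Inhabited flag: population > 0 in 2006 or 2018, with NA treated as 0.
pop_2006 <- c(NA, 0, 12)
pop_2018 <- c(NA, 3, 0)
na_to_zero <- function(x) ifelse(is.na(x), 0, x)
inhab_any <- as.integer(na_to_zero(pop_2006) > 0 | na_to_zero(pop_2018) > 0)
print(inhab_any)

# Median split on an invented constituency-level greening percentage.
perc_greened_toy <- c(2, 5, 5, 9, NA)
thr_toy <- median(perc_greened_toy, na.rm = TRUE)
greened_binary_toy <- as.integer(perc_greened_toy >= thr_toy)
print(greened_binary_toy)
```

A constituency with no inhabited cells has `perc_greened_inhab = NA` and therefore an NA binary indicator, which drops it from any model fitted on that outcome.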

Baseline land-use structure

baseline_overall <- landtake_with_const %>%
  mutate(
    is_artificial_2006 = as.integer(STR05 == 6),
    is_green_2006      = as.integer(STR05 %in% c(3, 4, 5)),
    is_artificial_2018 = as.integer(STR18a == 6),
    is_green_2018      = as.integer(STR18a %in% c(3, 4, 5))
  ) %>%
  group_by(WKR_NR) %>%
  summarise(
    pct_artificial_2006 = 100 * mean(is_artificial_2006, na.rm = TRUE),
    pct_green_2006      = 100 * mean(is_green_2006, na.rm = TRUE),
    pct_artificial_2018 = 100 * mean(is_artificial_2018, na.rm = TRUE),
    pct_green_2018      = 100 * mean(is_green_2018, na.rm = TRUE),
    .groups = "drop"
  )

baseline_inhab <- landtake_with_const %>%
  mutate(
    inhab_any = as.integer(
      replace_na(TOT_P_2006, 0) > 0 | replace_na(TOT_P_2018, 0) > 0
    ),
    is_artificial_2006 = as.integer(STR05 == 6),
    is_green_2006      = as.integer(STR05 %in% c(3, 4, 5)),
    is_artificial_2018 = as.integer(STR18a == 6),
    is_green_2018      = as.integer(STR18a %in% c(3, 4, 5))
  ) %>%
  filter(inhab_any == 1) %>%
  group_by(WKR_NR) %>%
  summarise(
    pct_artificial_2006_pop = 100 * mean(is_artificial_2006, na.rm = TRUE),
    pct_green_2006_pop      = 100 * mean(is_green_2006, na.rm = TRUE),
    pct_artificial_2018_pop = 100 * mean(is_artificial_2018, na.rm = TRUE),
    pct_green_2018_pop      = 100 * mean(is_green_2018, na.rm = TRUE),
    .groups = "drop"
  )

Project timing variables

project_years <- projects_assigned %>%
  mutate(
    StartYear_num = as.numeric(StartYear),
    EndYear_num   = as.numeric(EndYear)
  ) %>%
  group_by(WKR_NR) %>%
  summarise(
    first_project_year = suppressWarnings(min(StartYear_num, na.rm = TRUE)),
    last_project_year  = suppressWarnings(max(EndYear_num, na.rm = TRUE)),
    mean_project_year  = suppressWarnings(mean(StartYear_num, na.rm = TRUE)),
    .groups = "drop"
  )
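One behaviour of the timing summary is worth knowing: when every year in a group is missing, `min(..., na.rm = TRUE)` returns `Inf` (and `max` returns `-Inf`), and `suppressWarnings()` hides the warning that would flag it. The sketch below shows this and a possible guard. The helpers `safe_min()` and `safe_max()` are hypothetical and are not used in the original code.

```r
# A group in which all start years are missing.
years_all_missing <- c(NA_real_, NA_real_)

# What the summary above would store for such a group.
raw_min <- suppressWarnings(min(years_all_missing, na.rm = TRUE))
print(raw_min)

# Hypothetical guards that return NA instead of +/-Inf.
safe_min <- function(x) if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
safe_max <- function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)

print(safe_min(years_all_missing))
print(safe_min(c(2015, NA, 2011)))
print(safe_max(c(2015, NA, 2011)))
```

Whether any constituency in the real data has only missing years is not stated in the source, so this matters only if such groups exist.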

Consolidate constituency-level summaries

cons_summary <- landtake_summary %>%
  left_join(st_drop_geometry(inhab_metrics), by = "WKR_NR") %>%
  left_join(st_drop_geometry(baseline_overall), by = "WKR_NR") %>%
  left_join(st_drop_geometry(baseline_inhab), by = "WKR_NR") %>%
  left_join(st_drop_geometry(project_years), by = "WKR_NR")

thr_uncond <- median(cons_summary$percent_greened, na.rm = TRUE)

cons_summary <- cons_summary %>%
  mutate(
    greened_binary = as.integer(percent_greened >= thr_uncond)
  )

rm(thr_pop, thr_uncond)

Election data

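election_processed <- election %>%
  # Constituency number = everything before the first colon in Wahlkreis
  mutate(WKR_NR = as.numeric(sub(":.*", "", Wahlkreis))) %>%
  # Keep only the Green Party rows
  filter(Specification %in% c("GRÜNE", "GRÜNE/B 90")) %>%
  # Explicit namespace, as elsewhere in this document, to avoid masking by other packages
  dplyr::select(WKR_NR, green_vote_share = `Second votes %`) %>%
  mutate(green_vote_share = as.numeric(green_vote_share)) %>%
  distinct()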
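The constituency identifier is recovered by deleting everything from the first colon onward. The short check below shows what that pattern does. The example labels are invented, since the exact format of the `Wahlkreis` column is not shown in this document.

```r
# Invented labels in an assumed "number: name" format.
wahlkreis_toy <- c("001: Flensburg - Schleswig", "75: Berlin-Mitte")

# Same pattern as in the pipeline: drop the colon and everything after it.
wkr_nr_toy <- as.numeric(sub(":.*", "", wahlkreis_toy))
print(wkr_nr_toy)

# After distinct(), each constituency should appear at most once;
# anyDuplicated() returns 0 when that holds.
print(anyDuplicated(wkr_nr_toy))
```

Leading zeros disappear in the numeric conversion, which is what makes the identifier comparable with `WKR_NR` in the shapefile, provided that column is numeric as well.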

Build final dataset

const_data <- const %>%
  rename(geometry_const = geometry) %>%   # keep constituency geometry
  left_join(election_processed, by = "WKR_NR") %>%   # add Green Party vote share
  left_join(st_drop_geometry(cons_summary), by = "WKR_NR") %>%   # add greening and structural indicators
  left_join(project_counts, by = "WKR_NR") %>%   # add project counts and funding (project_counts already has no geometry)
  mutate(
    green_vote_share_dec = green_vote_share / 100,   # convert vote share to proportion
    sum_TOT_P_2006_K     = sum_TOT_P_2006 / 1000,    # population in thousands
    sum_TOT_P_2018_K     = sum_TOT_P_2018 / 1000,
    sum_grant_K          = sum_grant / 1000,         # grants in thousands of euro
    mean_DIST_COAST_K    = mean_DIST_COAST / 1000,   # distance rescaled
    mean_DIST_BORD_K     = mean_DIST_BORD / 1000,
    popul.change.06.18   = sum_TOT_P_2018 - sum_TOT_P_2006,
    popul.rel.change     = ifelse(
      sum_TOT_P_2006 > 0,
      sum_TOT_P_2018 / sum_TOT_P_2006,
      NA_real_
    ),
    grant_percap = ifelse(
      sum_TOT_P_2018 > 0,
      sum_grant / sum_TOT_P_2018,
      NA_real_
    ),
    proj_percap = ifelse(
      sum_TOT_P_2018 > 0,
      num_projects / sum_TOT_P_2018,
      NA_real_
    ),
    funding_index = as.numeric(scale(grant_percap)) + as.numeric(scale(proj_percap))
  )

# Classify mean elevation into bands
const_data <- const_data %>%
  mutate(
    ELEV_class = case_when(
      mean_ELEV < 0                         ~ "elev_<0",
      mean_ELEV >= 0   & mean_ELEV < 250    ~ "elev_0-250",
      mean_ELEV >= 250 & mean_ELEV < 500    ~ "elev_250-500",
      mean_ELEV >= 500 & mean_ELEV < 1000   ~ "elev_500-1000",
      mean_ELEV >= 1000 & mean_ELEV < 2000  ~ "elev_1000-2000",
      mean_ELEV >= 2000                     ~ "elev_>2000",
      TRUE                                  ~ NA_character_
    ),
    ELEV_class = factor(
      ELEV_class,
      levels = c(
        "elev_<0", "elev_0-250", "elev_250-500",
        "elev_500-1000", "elev_1000-2000", "elev_>2000"
      )
    )
  )

# Replace missing (NA) values with zero in the project-related variables and in the Green vote share
const_data <- const_data %>%
  mutate(
    across(
      c(
        green_vote_share_dec,
        num_projects,
        sum_grant,
        sum_grant_K,
        grant_percap,
        proj_percap,
        funding_index
      ),
      ~ ifelse(is.na(.), 0, .)
    )
  )

dat <- st_drop_geometry(const_data)
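The order of operations in the funding index is easy to miss: the z-scores are computed first, over the non-missing constituencies only, and the remaining NA values are set to zero afterwards. The sketch below reproduces that order on invented numbers. The helper `z()` is hypothetical shorthand for `as.numeric(scale(x))`.

```r
# Invented per-capita values; the last constituency has no projects.
grant_percap_toy <- c(2, 4, 6, NA)
proj_percap_toy  <- c(1e-5, 3e-5, 2e-5, NA)

# Hypothetical shorthand for the standardisation used above.
z <- function(x) as.numeric(scale(x))

# Step 1: sum of z-scores (NA where inputs are missing).
funding_index_toy <- z(grant_percap_toy) + z(proj_percap_toy)

# Step 2: missing values replaced by zero, as in the final across() call.
funding_index_toy[is.na(funding_index_toy)] <- 0
print(funding_index_toy)
```

Under this ordering, a constituency without projects receives an index of 0, which equals the mean of the standardised index among constituencies with projects rather than its minimum.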

Setup packages

library(sf)
library(dplyr)
library(readr)
library(tidyr)
library(stringr)
library(ggplot2)
library(psych)
library(DescTools)
library(car)
library(spdep)
library(mediation)
library(caret)
library(vip)
library(ranger)
library(xgboost)
library(pROC)
library(shapviz)
library(broom)

Variable preparation

dat <- dat %>%
  # POLITICAL VARIABLES
  mutate(
    green_vote_share_dec = ifelse(
      !is.na(green_vote_share_dec),
      green_vote_share_dec,
      green_vote_share / 100
    )
  ) %>%
  # FINANCIAL VARIABLES
  mutate(
    # Log transformation of total grants (avoid log(0))
    log_grantsK = log(pmax(sum_grant_K, 0) + 1),

    # Ensure finite values only
    grant_percap = ifelse(is.finite(grant_percap), grant_percap, NA_real_),
    proj_percap  = ifelse(is.finite(proj_percap),  proj_percap,  NA_real_),

    # Log projects per capita (small offset to avoid -Inf)
    log_proj_percap = log(proj_percap + 1e-6)
  ) %>%
  # CONTROL VARIABLES (only those requiring transformation)
  mutate(
    # Elevation as categorical factor
    ELEV_class = as.factor(ELEV_class)
  )
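The two log transformations can be sanity-checked on a few invented values. The sketch below shows that `log(pmax(x, 0) + 1)` matches `log1p(x)` for non-negative inputs and that a constituency with zero projects per capita maps to `log(1e-6)`.

```r
# Invented grant totals (thousands of euro), including a zero.
sum_grant_K_toy <- c(0, 1, 999)
log_grantsK_toy <- log(pmax(sum_grant_K_toy, 0) + 1)

# Equivalent to log1p() when inputs are non-negative.
print(all.equal(log_grantsK_toy, log1p(sum_grant_K_toy)))

# Invented projects per capita, including a zero.
proj_percap_toy <- c(0, 1e-5)
log_proj_percap_toy <- log(proj_percap_toy + 1e-6)
print(round(log_proj_percap_toy, 3))
```

The floor of the second transformation, about -13.816, is the same value that appears as the minimum of `log_proj_percap` in the summary statistics, and the zero floor of the first matches the minimum of `log_grantsK`.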

Variable definitions

Variable                       Level         Description
Land use class (STR05)         1             Arable land
                               2             Permanent crops
                               3             Grass
                               4             Wooded and shrub areas
                               5             Bare land / low vegetation
                               6             Artificial land
                               7             Water
Land use class (STR18a)        1             Arable land
                               2             Permanent crops
                               3             Grass
                               4             Wooded areas and shrubs (merged)
                               5             Bare surface / low vegetation
                               6             Artificial / sealed areas
                               7             Inland and coastal waters
Land-use change 2005-2018      Change        STR05 ≠ STR18a
                               No change     STR05 = STR18a
                               Total         All categories combined
Urban greening                 Binary        1 if artificial → non-artificial ≥ median; 0 otherwise
Percent greened                Continuous    Fraction of artificial → other land-use transitions
Elevation class                Categorical   <0; 0-250; 250-500; 500-1000; 1000-2000; >2000 m ASL
% artificial (2006, inhab.)    Continuous    Share of artificial land in 2006 (inhabited)
% green (2006, inhab.)         Continuous    Share of green land in 2006 (inhabited)
Green vote share               Continuous    Green Party vote share (0-1)
Grant per capita               Continuous    DBU funding per capita (€)
Log projects per capita        Continuous    Log of DBU projects per capita
Population change              Continuous    Population ratio (2018 / 2006)
Mean distance to border        Continuous    Average distance to national border (km)

Descriptive statistics

Summary statistics

Variable                      mean       sd       min      max     n
greened_binary_pop           0.505    0.501     0.000    1.000   299
perc_greened_inhab          10.259   10.424     1.572   68.421   299
green_vote_share_dec         0.088    0.039     0.022    0.212   299
grant_percap                 4.883    5.831     0.000   46.500   299
log_proj_percap            -10.763    1.052   -13.816   -7.592   299
log_grantsK                  6.479    1.640     0.000    9.453   299
popul.rel.change             1.006    0.084     0.786    1.569   299
pct_artificial_2006_pop     21.057   20.616     3.020   93.333   299
pct_green_2006_pop          44.060   18.386     3.846   93.130   299
mean_DIST_BORD_K             0.082    0.053     0.004    0.236   299

Correlation matrix

Variable                           (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)   (10)
(1)  greened_binary_pop           1.00   0.60   0.30   0.12   0.09   0.07   0.27   0.63  -0.41  -0.03
(2)  perc_greened_inhab           0.60   1.00   0.33   0.13   0.13   0.11   0.25   0.90  -0.47  -0.03
(3)  green_vote_share_dec         0.30   0.33   1.00   0.31   0.31   0.31   0.68   0.40  -0.14   0.01
(4)  grant_percap                 0.12   0.13   0.31   1.00   0.75   0.64   0.21   0.22  -0.13   0.07
(5)  log_proj_percap              0.09   0.13   0.31   0.75   1.00   0.90   0.21   0.18  -0.14   0.10
(6)  log_grantsK                  0.07   0.11   0.31   0.64   0.90   1.00   0.23   0.14  -0.08   0.06
(7)  popul.rel.change             0.27   0.25   0.68   0.21   0.21   0.23   1.00   0.38  -0.21  -0.08
(8)  pct_artificial_2006_pop      0.63   0.90   0.40   0.22   0.18   0.14   0.38   1.00  -0.54  -0.06
(9)  pct_green_2006_pop          -0.41  -0.47  -0.14  -0.13  -0.14  -0.08  -0.21  -0.54   1.00  -0.17
(10) mean_DIST_BORD_K            -0.03  -0.03   0.01   0.07   0.10   0.06  -0.08  -0.06  -0.17   1.00

Selected plots

ggplot(plot_data_inhab, aes(x = x, y = y)) +
  geom_point(aes(color = role), alpha = 0.5, size = 1.8) +
  geom_smooth(
    method = "lm",
    se = FALSE,
    aes(color = role),
    linewidth = 0.5
  ) +
  facet_wrap(
    ~ var,
    scales = "free_x",
    labeller = as_labeller(var_labels_inhab)
  ) +
  scale_color_manual(
    values = role_colors,
    name = "Variable type"
  ) +
  labs(
    x = NULL,
    y = "% greened area (inhabited cells)"
  ) +
  theme_bw() +
  theme(
    strip.background = element_rect(fill = "grey95"),
    axis.text.x = element_text(angle = 30, hjust = 1),
    legend.position = "bottom"
  )

ggplot(const_plot) +
  geom_sf(aes(fill = green_vote_share)) +
  scale_fill_gradientn(
    colours = c("grey85", "#A8E6A3", "forestgreen", "darkgreen"),
    name = "Vote share (%)"
  ) +
  theme_minimal() +
  labs(
    title = "Percentage of votes for the Green Party (second votes)"
  )

ggplot(const_plot) +
  geom_sf(aes(fill = as.factor(greened_binary_pop))) +
  scale_fill_manual(
    values = c("lightgray", "forestgreen"),
    labels = c("Below median", "Above median"),
    name = "Urban greening"
  ) +
  theme_minimal() +
  labs(
    title = "Fraction of areas showing urban greening"
  )

ggplot(const_plot) +
  geom_sf(aes(fill = grant_per_cap_cat)) +
  scale_fill_manual(
    values = c("#edf8e9", "#bae4b3", "#74c476", "#31a354", "#006d2c")
  ) +
  theme_minimal() +
  labs(
    title = "Green-related grants per capita",
    fill = "Grants per capita (€)"
  )

ggplot(const_plot) +
  geom_sf(aes(fill = project_cat)) +
  scale_fill_manual(
    values = c("#e5f5e0", "#a1d99b", "#74c476", "#31a354", "#006d2c")
  ) +
  theme_minimal() +
  labs(
    title = "Number of green-related projects",
    fill = "Project count"
  )

The empirical strategy combines three complementary approaches to investigate the relationship between political preferences and urban greening: probit models, machine learning models, and mediation analysis.

The mediation model decomposes the total effect of political support into a direct component and an indirect component operating through financial capacity. This allows us to assess whether funding acts as a mechanism translating political preferences into environmental outcomes.

Probit models

We begin with a parametric approach to estimate the baseline relationship between political support and urban greening.

Model comparison

summary(mod1_inhab)
## 
## Call:
## glm(formula = greened_binary_pop ~ green_vote_share_dec + grant_percap + 
##     log_proj_percap + popul.rel.change + pct_artificial_2006_pop + 
##     pct_green_2006_pop + I(green_vote_share_dec * pct_artificial_2006_pop) + 
##     mean_DIST_BORD_K + ELEV_class, family = binomial(link = "probit"), 
##     data = dat)
## 
## Coefficients:
##                                                    Estimate Std. Error z value
## (Intercept)                                       -6.875571   3.481814  -1.975
## green_vote_share_dec                              17.002029   8.645218   1.967
## grant_percap                                      -0.025647   0.035288  -0.727
## log_proj_percap                                    0.112229   0.182040   0.617
## popul.rel.change                                   3.418891   2.999100   1.140
## pct_artificial_2006_pop                            0.455643   0.075873   6.005
## pct_green_2006_pop                                -0.010111   0.008125  -1.244
## I(green_vote_share_dec * pct_artificial_2006_pop) -1.991461   0.549469  -3.624
## mean_DIST_BORD_K                                   2.273279   2.315583   0.982
## ELEV_classelev_250-500                             0.259277   0.293854   0.882
## ELEV_classelev_500-1000                            0.386481   0.434666   0.889
##                                                   Pr(>|z|)    
## (Intercept)                                        0.04830 *  
## green_vote_share_dec                               0.04922 *  
## grant_percap                                       0.46735    
## log_proj_percap                                    0.53756    
## popul.rel.change                                   0.25430    
## pct_artificial_2006_pop                           1.91e-09 ***
## pct_green_2006_pop                                 0.21335    
## I(green_vote_share_dec * pct_artificial_2006_pop)  0.00029 ***
## mean_DIST_BORD_K                                   0.32623    
## ELEV_classelev_250-500                             0.37760    
## ELEV_classelev_500-1000                            0.37393    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 414.47  on 298  degrees of freedom
## Residual deviance: 145.46  on 288  degrees of freedom
## AIC: 167.46
## 
## Number of Fisher Scoring iterations: 11
summary(mod2_inhab)
## 
## Call:
## glm(formula = greened_binary_pop ~ green_vote_share_dec + grant_percap + 
##     popul.rel.change + pct_artificial_2006_pop + I(green_vote_share_dec * 
##     pct_artificial_2006_pop) + mean_DIST_BORD_K + ELEV_class, 
##     family = binomial(link = "probit"), data = dat)
## 
## Coefficients:
##                                                    Estimate Std. Error z value
## (Intercept)                                       -8.676029   2.858275  -3.035
## green_vote_share_dec                              17.204170   8.752413   1.966
## grant_percap                                      -0.008384   0.026672  -0.314
## popul.rel.change                                   3.446442   2.986048   1.154
## pct_artificial_2006_pop                            0.451803   0.075168   6.011
## I(green_vote_share_dec * pct_artificial_2006_pop) -1.956454   0.553040  -3.538
## mean_DIST_BORD_K                                   3.007147   2.237581   1.344
## ELEV_classelev_250-500                             0.083448   0.260160   0.321
## ELEV_classelev_500-1000                            0.197624   0.400364   0.494
##                                                   Pr(>|z|)    
## (Intercept)                                       0.002402 ** 
## green_vote_share_dec                              0.049339 *  
## grant_percap                                      0.753250    
## popul.rel.change                                  0.248426    
## pct_artificial_2006_pop                           1.85e-09 ***
## I(green_vote_share_dec * pct_artificial_2006_pop) 0.000404 ***
## mean_DIST_BORD_K                                  0.178972    
## ELEV_classelev_250-500                            0.748396    
## ELEV_classelev_500-1000                           0.621581    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 414.47  on 298  degrees of freedom
## Residual deviance: 147.61  on 290  degrees of freedom
## AIC: 165.61
## 
## Number of Fisher Scoring iterations: 11
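The fitting calls for `mod1_inhab` and `mod2_inhab` are not shown, but the `Call:` lines in the output suggest they were estimated with `glm()` and a probit link. The sketch below fits the second specification on simulated data so that it runs without the replication files. The data frame `sim`, its distributions, and the coefficients used to generate the outcome are invented and carry no empirical meaning.

```r
set.seed(1)
n <- 299

# Simulated stand-in for `dat`, using the variable names from the Call above.
sim <- data.frame(
  green_vote_share_dec    = runif(n, 0.02, 0.21),
  grant_percap            = rexp(n, rate = 1 / 5),
  popul.rel.change        = rnorm(n, mean = 1, sd = 0.08),
  pct_artificial_2006_pop = runif(n, 3, 93),
  mean_DIST_BORD_K        = runif(n, 0.004, 0.236),
  ELEV_class = factor(sample(
    c("elev_0-250", "elev_250-500", "elev_500-1000"), n, replace = TRUE
  ))
)

# Arbitrary latent index used only to generate a binary outcome.
latent <- -1.5 + 0.03 * sim$pct_artificial_2006_pop +
  5 * sim$green_vote_share_dec + rnorm(n)
sim$greened_binary_pop <- as.integer(latent > 0)

# Same formula and family as reported for mod2_inhab.
mod2_sim <- glm(
  greened_binary_pop ~ green_vote_share_dec + grant_percap + popul.rel.change +
    pct_artificial_2006_pop + I(green_vote_share_dec * pct_artificial_2006_pop) +
    mean_DIST_BORD_K + ELEV_class,
  family = binomial(link = "probit"),
  data = sim
)

print(round(summary(mod2_sim)$coefficients[, c("Estimate", "Std. Error")], 3))
```

Because the specification includes an interaction, the vote-share coefficient alone is not the full slope. On the probit index scale, the reported `mod2_inhab` estimates imply a vote-share slope of 17.20 - 1.956 × `pct_artificial_2006_pop`, which changes sign at roughly 8.8 percent artificial land.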

Spatial diagnostics

Moran’s I for residuals

model Moran_I p_value
M1 0.039 0.117
M2 0.030 0.177
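For readers unfamiliar with the statistic, the sketch below computes Moran's I by hand on a toy example. It is a plain base-R illustration of the textbook formula with binary weights, not the routine used to produce the table above. The function `morans_i()`, the four-unit layout, and the residual values are all invented.

```r
# Hypothetical helper: Moran's I for values x and a spatial weights matrix W.
morans_i <- function(x, W) {
  z <- x - mean(x)
  n <- length(x)
  (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Four units on a line; each unit neighbours the units next to it.
W_toy <- matrix(0, 4, 4)
W_toy[cbind(1:3, 2:4)] <- 1
W_toy <- W_toy + t(W_toy)

# Smoothly increasing values: neighbours are similar, so I is positive.
i_smooth <- morans_i(c(1, 2, 3, 4), W_toy)

# Alternating values: neighbours are dissimilar, so I is negative.
i_alternating <- morans_i(c(1, -1, 1, -1), W_toy)

print(c(smooth = i_smooth, alternating = i_alternating))
```

Values near the null expectation of -1/(n - 1), as reported for both models above together with p-values above 0.1, indicate little evidence of residual spatial autocorrelation.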

Machine learning models

To complement the regression analysis, we apply machine learning methods that allow for flexible functional forms and interactions between predictors.

ML performance

model AUC Accuracy Kappa
Random Forest 0.966 0.898 0.796
XGBoost 0.963 0.881 0.762
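The three metrics in the table can be computed from observed classes and predicted probabilities alone. The base-R sketch below does so on eight invented observations with a 0.5 cut-off. It does not reproduce the cross-validation setup of the original models, which is not shown in this document.

```r
# Invented observed classes and predicted probabilities.
obs  <- c(1, 1, 1, 0, 0, 0, 1, 0)
prob <- c(0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1)
pred <- as.integer(prob >= 0.5)

# Accuracy: share of correct classifications.
accuracy <- mean(pred == obs)

# Cohen's kappa: agreement beyond what class frequencies alone would produce.
p_expected <- mean(obs == 1) * mean(pred == 1) + mean(obs == 0) * mean(pred == 0)
cohen_kappa <- (accuracy - p_expected) / (1 - p_expected)

# AUC via the rank (Mann-Whitney) formula.
n_pos <- sum(obs == 1)
n_neg <- sum(obs == 0)
auc <- (sum(rank(prob)[obs == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(c(AUC = auc, Accuracy = accuracy, Kappa = cohen_kappa))
```

When observed and predicted classes are both split evenly, kappa reduces to 2 × accuracy - 1. The reported pairs (0.898 and 0.796; 0.881 and 0.762) follow that pattern, which is what a median-split outcome would be expected to give.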

Random Forest: variable importance

ggplot(rf_imp_df, aes(x = reorder(Variable, Importance), y = Importance)) +
  geom_col() +
  coord_flip() +
  theme_minimal() +
  labs(
    x = "Predictor",
    y = "Importance"
  )

XGBoost: variable importance

vip(xgb_model$finalModel, num_features = 10)

SHAP values (XGBoost)

sv_importance(sv, show_numbers = TRUE, max_display = 8)

sv_importance(sv, kind = "bee", show_numbers = TRUE, max_display = 8)

autoplot(pdp_artificial) +
  labs(
    x = "Share of artificial land in 2006 (inhabited areas)",
    y = "Predicted probability of urban greening"
  ) +
  theme_minimal()

Mediation analysis

This section tests whether environmental funding mediates the relationship between Green Party support and urban greening in inhabited areas. The preferred specification includes the baseline share of artificial land in 2006 within inhabited areas as a structural control.

Mediation results

Mediator               ACME      ACME 95% CI          ADE       ADE 95% CI           Total Effect
Grants per capita      -0.0125   [-0.0449; 0.0143]    -0.0785   [-0.1831; 0.0258]    -0.0889
Projects per capita    -0.0054   [-0.0377; 0.0258]    -0.0875   [-0.1973; 0.0179]    -0.0921
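The mediation code itself is not shown, so the sketch below only illustrates how ACME, ADE, and total effect estimates of this kind can be obtained with the `mediation` package, which is loaded in the setup above. Everything specific is an assumption: the simulated data, the linear mediator model, the reduced set of controls, the treatment contrast (`control.value`, `treat.value`), and the number of simulations.

```r
library(mediation)

set.seed(42)
n <- 299

# Simulated stand-in data; coefficients are arbitrary.
sim <- data.frame(
  green_vote_share_dec    = runif(n, 0.02, 0.21),
  pct_artificial_2006_pop = runif(n, 3, 93)
)
sim$grant_percap <- 2 + 20 * sim$green_vote_share_dec + rnorm(n, sd = 3)
latent <- -1.5 + 0.03 * sim$pct_artificial_2006_pop +
  0.05 * sim$grant_percap + rnorm(n)
sim$greened_binary_pop <- as.integer(latent > 0)

# Mediator model: funding as a function of Green support and the structural control.
med_fit <- lm(
  grant_percap ~ green_vote_share_dec + pct_artificial_2006_pop,
  data = sim
)

# Outcome model: probit for greening, including treatment and mediator.
out_fit <- glm(
  greened_binary_pop ~ green_vote_share_dec + grant_percap + pct_artificial_2006_pop,
  family = binomial(link = "probit"),
  data = sim
)

# Assumed contrast: vote share of 0.05 versus 0.15.
med <- mediate(
  med_fit, out_fit,
  treat = "green_vote_share_dec", mediator = "grant_percap",
  control.value = 0.05, treat.value = 0.15,
  sims = 200
)

print(c(ACME = med$d.avg, ADE = med$z.avg, Total = med$tau.coef))
```

In the table above, both ACME confidence intervals include zero, so the reported estimates do not by themselves establish an indirect effect through funding.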