ValuGaps Survey
Data Guide

Maximilian Föhl

German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig

Leipzig University

2025-12-04

Outline

About the project
Survey Summary
Related Materials
Data Structure and DCE Variables

ValuGaps Project

The survey is the product of joint work within the ValuGaps consortium (2020–2025), involving several institutions and collaborators.
Main topic: investigating gaps in the valuation of natural capital and developing practical, scientifically robust methods to scale and transfer biodiversity and natural capital values across space, time, user groups, and ecosystem types.
Visit https://valugaps.de/en/ for more information

Institutions and Names

Deutsches Zentrum für integrative Biodiversitätsforschung (iDiv) Halle-Jena-Leipzig: Julian Sagebiel, Nino Cavallaro, Maximilian Föhl, Martin Quaas, Aletta Bonn, Maria Schnabel, Marie Meemken, Kevin Rozario
Albert-Ludwigs-Universität Freiburg (ALU): Stefan Baumgärtner
Universität Hamburg (UHH): Moritz Drupp, Björn Bos, Pier Basaglia, Simon Disque, Jonas Grunau
Freie Universität Berlin (FU): Britta Tietjen, Fanny Langerwisch
Bundesamt für Naturschutz (BfN): Harry Gölz, Beyhan Ekinci, Veronika Liebelt
Umweltbundesamt (UBA): Björn Bünger, Astrid Matthey, Jan Philipp Schägner
Burkhard Schweppe-Kraft
University of Warsaw: Katarzyna Skrzypek, Katarzyna Zagórska

Survey Summary

Survey Abstract

Online survey conducted in the first quarter of 2025 in Germany with a core sample of around 15,000 respondents
Eliciting values for changes in protected natural areas and high‑nature‑value farmland, accounting for respondents’ current endowment of natural capital.
Methods: Discrete choice experiment, life‑satisfaction valuation, and travel‑cost analysis
The full dataset will be made publicly available as open data

Life Satisfaction

Agricultural Intensification

Basic background information on the impacts of intensified agriculture on biodiversity
Illustrating examples of intensification measures and their effects
Explaining the purpose of landscape changes to help restore biodiversity (next slides)

Characteristics HNV

Characteristics PA

HNV

PA

Choice Experiment Attribute Levels

| Attribute | Levels | Description |
|---|---|---|
| Protected areas* | Status quo, +100, +200, +300, +500, +800 (hectares) | The total area designated as protected area. Levels indicate the expansion in hectares from the current status. |
| Accessibility of the new protected areas | Not accessible, Half accessible, Fully accessible | The extent to which the public can access newly designated protected areas, ranging from no access to full access. |
| High nature value farmland areas* | Status quo, +100, +200, +300, +500, +800 (hectares) | The total area of high nature value farmland. Levels indicate the expansion in hectares from the current status. |
| Visibility of high nature value farmland areas | Barely visible, Clearly visible | Indicates how visible the new areas of high nature value farmland are from public roads or paths. |
| Mandatory levy to environmental funds (annual sum) | 5, 10, 20, 40, 60, 80, 120, 150, 200, 250 (euros) | The amount each household pays annually into a fund dedicated to nature conservation efforts. |

*Note: For the size of protected areas and high nature value farmland, half of the respondents received the attribute levels listed above. The other half received levels twice as high (+200, +400, +600, +1000, +1600 hectares).
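The two scale versions can be made concrete with a small R sketch. It treats the "Status quo" level as an expansion of 0 ha, which is our reading of the table. The sketch builds both level vectors and checks that together they give exactly the `[extension]` values used in the variable names further down. The names `low_scale`, `high_scale` and `all_extensions` are illustrative only.

```r
# Expansion levels (hectares) for the low-scale half of the sample;
# the "Status quo" level is treated as an expansion of 0 ha (assumption)
low_scale  <- c(0, 100, 200, 300, 500, 800)
# The other half of the sample saw levels twice as high
high_scale <- 2 * low_scale
high_scale

# All non-zero expansions that can appear across both halves
all_extensions <- sort(unique(c(low_scale, high_scale)))
all_extensions <- all_extensions[all_extensions > 0]
all_extensions
```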

Choice Set Example

Each respondent answers 10 choice sets, each comparing a landscape-change option with their status quo

Each set shows different combinations of attributes
Respondents pick their preferred option in each set

Choice Set Map

Choice Set Attributes

Data Downloads

| Source | File/Folder | Description | Type |
|---|---|---|---|
| OSF | `all_datasets.RData` | Cleaned dataset (`all_data`, `complete_data`, `database`) | Dataset |
| Nextcloud | `datapublication.pdf` | Manuscript explaining the study and dataset | Suppl. materials |
| Nextcloud | `codebook.html` | Description of survey variables | Suppl. materials |
| OSF | Survey Video Documentation | Respondent-view video screencasts of the survey | Suppl. materials |

RStudio Setup

Open RStudio.
Create a new folder on your computer (e.g. NV_Data).
In RStudio, go to File > New Project > Existing Directory and select the folder you created. RStudio then creates a project file (e.g. NV_Data.Rproj) in that folder.
- Benefit: Opening the .Rproj file automatically sets the working directory to the project folder. Data, scripts, and outputs stay in one place and can be referenced with relative paths.
Optional: Create subfolders for a better organization
- data/ (for raw datasets)
- scripts/ (for R scripts)
- output/ (for results, plots, and processed data)
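The optional subfolders can also be created from the R console. This is a minimal sketch that assumes the RStudio project is open, so the working directory is the project folder.

```r
# Create the optional subfolders inside the project folder
# (assumes the working directory is the project root, as in an open .Rproj)
subfolders <- c("data", "scripts", "output")
for (folder in subfolders) {
  # dir.create() takes one path at a time; no warning if the folder exists
  dir.create(folder, showWarnings = FALSE)
}
dir.exists(subfolders)  # TRUE for each folder that now exists
```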

Data structure `all_datasets.RData`

Download all_datasets.RData and store it in your newly created folder (e.g. in the data/ subfolder).
In RStudio, load all_datasets.RData (e.g. by clicking it in the Files pane).
After loading, the Environment pane shows that the file contains three data frames:
- all_data has one observation per respondent sampled (incl. dropouts and screenouts)
- complete_data is a subsample of all_data that excludes dropouts and screenouts
- database has ten observations per respondent (excl. dropouts and screenouts)
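Loading the file in code instead of the Files pane keeps your script reproducible. `load()` restores every object stored in an .RData file and returns their names.

The sketch below shows this with small toy stand-ins for the three data frames. They are not the survey data, and the `id` column is hypothetical. With the real file, only the `load()` line is needed, with the path adjusted to where you stored the file.

```r
# Toy stand-ins for the three data frames (NOT the ValuGaps data):
# 4 sampled respondents, 3 of whom completed the survey, 10 choice tasks each
all_data      <- data.frame(id = 1:4)
complete_data <- data.frame(id = 1:3)
database      <- data.frame(id = rep(1:3, each = 10))

toy_file <- file.path(tempdir(), "toy_all_datasets.RData")
save(all_data, complete_data, database, file = toy_file)
rm(all_data, complete_data, database)

# With the real data, e.g.: loaded <- load("data/all_datasets.RData")
loaded <- load(toy_file)
loaded                       # names of the restored objects
sapply(mget(loaded), nrow)   # number of rows in each data frame
```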

Data frame database

This is the data frame required for analyzing the DCE
It differs from complete_data in that it contains one observation for each of the 10 choice sets a respondent answered.
All other variables remain constant across the 10 observations per respondent.
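You can confirm that a respondent-level variable is constant by counting its distinct values within each respondent. The sketch uses a toy long-format data frame. The respondent identifier `id` and the task counter `task` are hypothetical names, so look up the actual ID variable in the codebook. `pref1` varies across choice tasks, while `equity` is one of the respondent-level split-sample variables described in the variable table.

```r
# Toy long-format data (NOT the ValuGaps data): 3 respondents x 10 choice tasks
toy_db <- data.frame(
  id     = rep(1:3, each = 10),            # hypothetical respondent ID
  task   = rep(1:10, times = 3),           # hypothetical choice-task counter
  pref1  = rep(c(1, 2), length.out = 30),  # varies across choice tasks
  equity = rep(c(1, 2, 1), each = 10)      # constant within each respondent
)

# Number of distinct values of `var` within each respondent
distinct_per_id <- function(df, var, id = "id") {
  tapply(df[[var]], df[[id]], function(x) length(unique(x)))
}

distinct_per_id(toy_db, "equity")  # 1 for every respondent: respondent-level
distinct_per_id(toy_db, "pref1")   # 2 for every respondent: varies by task
```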

DCE Variables

Codebook Excerpt

Data Exploration I

The file all_datasets.RData is already cleaned, but there are two things to keep in mind.

1. Different numbers of observations

In the Environment pane we see different numbers of observations, as explained before.

However, database should contain exactly ten times as many observations as complete_data. Checking the difference:
```
nrow(database) - 10 * nrow(complete_data)
```
Reason: database still contains observations whose geolocations (database$lat, database$lon) were removed (set to NA) after anonymization because they lay outside Germany.
We can remove these observations to exclude them from our sample:
```
database_cleaned <- database[!is.na(database$lat), ]
```
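After cleaning, the row count should reconcile, and every remaining respondent should contribute exactly ten choice tasks. The sketch runs both checks on toy data that mimic the situation described above. The data are not the survey data, and `id` is again a hypothetical respondent identifier.

```r
# Toy data (NOT the ValuGaps data): complete_data holds 2 respondents, but
# database also holds 10 rows of a third respondent with removed coordinates
complete_data <- data.frame(id = 1:2)
database <- data.frame(
  id  = rep(1:3, each = 10),
  lat = rep(c(51.3, 50.9, NA), each = 10),
  lon = rep(c(12.4, 11.6, NA), each = 10)
)

nrow(database) - 10 * nrow(complete_data)          # 10 surplus rows

database_cleaned <- database[!is.na(database$lat), ]
nrow(database_cleaned) - 10 * nrow(complete_data)  # 0 after cleaning

# Distribution of choice tasks per respondent (all should be 10)
table(table(database_cleaned$id))
```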

Data Exploration II

2. Different numbers of variables

We also see different numbers of variables across the three data frames. Inspecting variables manually via names() is impractical with 450+ variables.

We first determine which variables exist in all three data frames:

```
commonvars <- Reduce(intersect,
  list(names(all_data), names(complete_data), names(database))
)
```

With 453 variables shared across all three data frames, we can use setdiff() to retrieve the variables in complete_data that are missing from at least one of the other two data frames:
```
setdiff(names(complete_data), commonvars)
```
All of these are supplementary variables derived from the respondent's geolocation, so their absence from the other data frames is not a problem.
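A membership matrix shows at a glance which data frame lacks which variable. The sketch uses toy data frames with invented column names; `geo_extra`, for example, stands in for a geolocation-derived variable. It extends the `Reduce(intersect, ...)` and `setdiff()` steps above.

```r
# Toy data frames (NOT the ValuGaps data); all column names are invented
all_data      <- data.frame(id = 1, age = 30, equity = 1)
complete_data <- data.frame(id = 1, age = 30, equity = 1, geo_extra = 0.5)
database      <- data.frame(id = 1, age = 30, equity = 1, pref1 = 2)

frames <- list(all_data = all_data, complete_data = complete_data,
               database = database)

# Variables present in all three data frames
commonvars <- Reduce(intersect, lapply(frames, names))

# Every variable that is missing from at least one data frame
not_shared <- setdiff(unique(unlist(lapply(frames, names))), commonvars)

# Membership matrix: rows = variables, columns = data frames,
# TRUE if the data frame contains the variable
membership <- do.call(cbind, lapply(frames, function(df) not_shared %in% names(df)))
rownames(membership) <- not_shared
membership
```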

Data Exploration III

Understand the Data Structure

To get a first idea of the data structure, use dplyr::glimpse():

```
install.packages("dplyr")  # only needed once
library(dplyr)
dplyr::glimpse(complete_data)
```

Relevant Variables for DCE Analysis

| Variable | Type | Description |
|---|---|---|
| **DCE variables** | | |
| `pref1` | bin. | [ONLY IN DATABASE] Selected option in a choice task of the discrete choice experiment (DCE) (1 = Status quo, 2 = Landscape modifications) |
| `Dummy_pa_no` | bin. | [ONLY IN DATABASE] Dummy: New protected area is inaccessible (`a2_x2 == 1`) |
| `Dummy_pa_half` | bin. | [ONLY IN DATABASE] Dummy: New protected area is partly accessible (`a2_x2 == 2`) |
| `Dummy_pa_full` | bin. | [ONLY IN DATABASE] Dummy: New protected area is fully accessible (`a2_x2 == 3`) |
| `Dummy_hnv_no` | bin. | [ONLY IN DATABASE] Dummy: New HNV farmland is barely visible (`a2_x4 == 1`) |
| `Dummy_hnv_visible` | bin. | [ONLY IN DATABASE] Dummy: New HNV farmland is clearly visible (`a2_x4 == 2`) |
| `hnv_att` | num. | [ONLY IN DATABASE] Total size of new + existing HNV farmland (in hectares) |
| `pa_att` | num. | [ONLY IN DATABASE] Total size of new + existing protected areas (in hectares) |
| `cost_att` | num. | [ONLY IN DATABASE] Annual mandatory conservation levy (€) |
| `getZoom` | num. | [ONLY IN DATABASE] GETZOOM: Final zoom level on the map in the respective choice task; NAs result from respondent behavior |
| `getZoomMax` | num. | [ONLY IN DATABASE] GETZOOMMAX: Maximum zoom level on the map in the respective choice task; NAs result from respondent behavior |
| `getTime` | num. | [ONLY IN DATABASE] GETTIME: Time spent zooming in and out on the map in the respective choice task; NAs result from respondent behavior |
| `sq_hnv_area` | num. | Status quo: Total area of HNV farmland near residence (within a 30 km radius, in hectares) |
| `sq_hnv_share` | num. | Status quo: Share of HNV on non-agricultural land near residence (within a 30 km radius, in %) |
| `sq_pa_area` | num. | Status quo: Total area of protected areas near residence (within a 30 km radius, in hectares) |
| `sq_pa_share` | num. | Status quo: Share of protected areas on non-sealed land near residence (within a 30 km radius, in %) |
| `hnv[extension]_area`* | num. | Area of proposed total HNV farmland near residence with extension measures applied (within a 30 km radius, in hectares): `hnv[extension]_area = sq_hnv_area + extension` |
| `hnv[extension]_share`* | num. | Share of HNV on non-agricultural land near residence with extension measures applied (within a 30 km radius, in %) |
| `pa[extension]_area`* | num. | Area of proposed total protected areas near residence with extension measures applied (within a 30 km radius, in hectares): `pa[extension]_area = sq_pa_area + extension` |
| `pa[extension]_share`* | num. | Share of PA on non-agricultural land near residence with extension measures applied (within a 30 km radius, in %) |
| **Other relevant variables: split samples for between-subject experiments** | | |
| `equity` | const. | Cost equity split sample: 1 = Equal contribution, 2 = Progressive contribution; NAs result from a coding error before the survey waves of the main study |
| `radius` | const. | Radius (in meters) of the area affected by landscape changes in the map shown to the respondent |
| `arm` | const. | Split sample payment duration (1 = 5 years, 2 = 10 years, 3 = 20 years, 4 = indefinitely) |
| `dce_version` | const. | DCE version assignment (1 = Opt-out left/Low-scale, 2 = Opt-out right/Low-scale, 3 = Left/High-scale, 4 = Right/High-scale) |

*[extension] = {100, 200, 300, 400, 500, 600, 800, 1000, 1600}
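The accessibility dummies are indicators for the three levels of `a2_x2`, and the visibility dummies do the same for `a2_x4`. The toy sketch below rebuilds the accessibility dummies to show the coding. It is not taken from the survey's processing scripts.

Because the three dummies always sum to one, a model with an alternative-specific constant can include only two of them. The omitted level serves as the reference, and the same applies to one of the two visibility dummies.

```r
# Toy accessibility codes for six choice tasks (NOT the ValuGaps data)
a2_x2 <- c(1, 2, 3, 3, 1, 2)  # 1 = not, 2 = half, 3 = fully accessible

Dummy_pa_no   <- as.integer(a2_x2 == 1)
Dummy_pa_half <- as.integer(a2_x2 == 2)
Dummy_pa_full <- as.integer(a2_x2 == 3)

cbind(a2_x2, Dummy_pa_no, Dummy_pa_half, Dummy_pa_full)

# The three dummies sum to 1 in every row, so together with a constant
# they are perfectly collinear; leave one out as the reference level
rowSums(cbind(Dummy_pa_no, Dummy_pa_half, Dummy_pa_full))
```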

Seminar Work: The setting

Friedrich Merz wants to assess the potential gains of expanding PA and HNV in Germany. Several lobby organisations argue for and against an expansion, and he also faces a tight budget. Further, several counties are arguing about how much money they stand to gain or will have to pay. To find out whether a law on PA and HNV expansion is a good idea, he asks an engineering office to assess the potential costs. The costs are given and amount to 1,500 euros per ha. But Merz also wants to know the benefits: how much do people actually benefit from the expansion? Veronika Grimm tells him about the potential of economic valuation and recommends you as an expert. Merz calls you and asks you to analyse the ValuGaps dataset and advise him on the decision. Unfortunately, he is not an economist and is quite busy, so he does not give you any further details. He expects a presentation from you in one month.

Your Tasks

Prepare a presentation to guide the German government on a new law to expand PA and HNV
Use the ValuGaps dataset to derive economic values for the expansion
Estimate the willingness to pay for an additional 100 ha of PA and of HNV
- Using choice experiments
- Using life satisfaction
Use different functional forms of the utility function and compare model performance
Only based on your results from the choice experiment and the life satisfaction, what would you recommend?
Hint: A recommendation is not necessarily a statement on whether or not to do it. It can also relate to the validity of the methods, or a reminder of taking a specific argument into consideration.
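For a utility function that is linear in the attributes, the WTP for 100 additional hectares is a marginal rate of substitution between area and money: WTP = -100 × β_area / β_cost. Each choice task contrasts one landscape-change option with the status quo. A binary logit on the attribute differences is therefore equivalent to a conditional logit with two alternatives.

The sketch illustrates the calculation with base R's `glm()` on simulated data with made-up coefficients, so it produces no ValuGaps result. Two mappings to the real variables are assumptions to verify against the codebook: `ext_pa = pa_att - sq_pa_area` and `chose_change = (pref1 == 2)`. The cost of the status quo is also assumed to be 0. A full model would add the accessibility and visibility dummies.

```r
set.seed(1)
n <- 5000  # number of simulated choice tasks

levels_ha   <- c(0, 100, 200, 300, 500, 800)
levels_cost <- c(5, 10, 20, 40, 60, 80, 120, 150, 200, 250)

sim <- data.frame(
  ext_pa  = sample(levels_ha, n, replace = TRUE),   # PA expansion vs. status quo (ha)
  ext_hnv = sample(levels_ha, n, replace = TRUE),   # HNV expansion vs. status quo (ha)
  cost    = sample(levels_cost, n, replace = TRUE)  # annual levy (EUR), status quo = 0
)

# Made-up "true" utility difference (change option minus status quo)
v <- 0.3 + 0.002 * sim$ext_pa + 0.0015 * sim$ext_hnv - 0.01 * sim$cost
sim$chose_change <- rbinom(n, 1, plogis(v))  # 1 = landscape change chosen

# Binary logit; the intercept plays the role of the alternative-specific constant
fit <- glm(chose_change ~ ext_pa + ext_hnv + cost,
           family = binomial(link = "logit"), data = sim)
b <- coef(fit)

# WTP for 100 additional hectares = -100 * beta_area / beta_cost
wtp_100_pa  <- -100 * b[["ext_pa"]]  / b[["cost"]]
wtp_100_hnv <- -100 * b[["ext_hnv"]] / b[["cost"]]
# Simulated true values are 20 and 15 EUR; the estimates will differ slightly
round(c(PA = wtp_100_pa, HNV = wtp_100_hnv), 2)
```

In a life-satisfaction regression where both area and income enter linearly, the same logic uses the income coefficient instead: WTP = 100 × β_area / β_income. There is no minus sign because income raises life satisfaction, whereas the levy lowers utility.

With log income, or a non-linear area term such as log(1 + ext_pa), the WTP expression changes. The choice of functional form therefore directly affects the values you report.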

Some hints

Use the scripts from the tutorials as templates
For some inspiration on the utility functions, see https://mpra.ub.uni-muenchen.de/125429/1/MPRA_paper_125429.pdf
When writing about the results, make sure to briefly summarize the data, explain the method, and then correctly interpret the results
Hand in your code alongside your paper; it should run on our machines and include enough comments for us to understand what you did
Revisit the Econometrics course material, and search online or ask ChatGPT (without uploading the dataset) if you have coding questions
