This file contains instructions to reproduce data used for the survey design and analysis in Elmendorf, Nall, and Oklobdzija (2025b). The file structure of this replication package (Elmendorf, Nall, and Oklobdzija 2025a) is organized as follows:
```
JEP_replication/
├── README.md
├── references.bib
├── License.txt
├── code/
├── data/
├── figures/
├── survey_instruments/
└── tables/
```
The file /code/folk_econ_rep_code_JEP.rmd generates all
figures and tables for the paper and online appendix, saving them to the
/figures and /tables folders, respectively.
The replicator should expect the code to run for about 6 minutes.
The file
/code/zip_replicator/rents_home_price_api_data_generator.R
creates the data table used to power the API that provided survey
respondents with home price and rent data for their zip code. These data
are also provided in the data folder as price_by_zip.csv.

## Data Availability and Provenance Statements
The figures in our paper use data from four original surveys with registered preanalysis plans. (One figure is borrowed from a previously published paper, as noted in the text.) The surveys were conducted using the Qualtrics platform. The sampling frame consists of residents of U.S. urban and suburban zip codes. We excluded zip codes with a population density of less than 500 persons per square mile. Code to generate the
The respondents are from an online panel provided by Forthright, a leading vendor. We directed the survey vendor to maintain equal proportions of homeowners and renters in the sample and to balance on age, race, and gender using the vendor’s nationally representative population quotas. See Online Appendix Section 8 for a table benchmarking the demographics of our samples against the U.S. population.
The data included with this replication package were downloaded from
Qualtrics shortly after each survey closed (download dates are noted in
folk_econ_rep_code_JEP.Rmd). These .csv files were renamed
for convenience but not altered in any way, with one exception: we
excluded two columns from PIPE paper data.csv, because
some respondents provided personally identifiable information in these
fields and the fields are not used or referenced in the present
paper.
The data are licensed under a Creative Commons/CC-BY-NC license. See LICENSE.txt for details.
| Data Name | Data Files | Location | Provided | Citation |
|---|---|---|---|---|
| “Survey 1” | Survey 1.csv | data/ | TRUE | (Elmendorf, Nall, and Oklobdzija 2025a) |
| “Survey 2” | Survey 2.csv | data/ | TRUE | (Elmendorf, Nall, and Oklobdzija 2025a) |
| “Survey 3” | Survey 3.csv | data/ | TRUE | (Elmendorf, Nall, and Oklobdzija 2025a) |
| “ZCTA, NHGIS” | nhgis0009_ds244_20195_zcta.csv | data/ | TRUE | (Manson et al. 2024) |
| “Block Group, NHGIS” | nhgis0010_ds244_20195_blck_grp.csv | data/ | TRUE | (Manson et al. 2024) |
| “ZCTA GIS, NHGIS” | US_zcta_2020.shp | data/ | TRUE | (Manson et al. 2024) |
| “Blk Grp GIS, NHGIS” | US_blck_grp_2019.shp | data/ | TRUE | (Manson et al. 2024) |
| “Zillow HVI by Zip” | zhvi.csv | data/zip_replication | TRUE | (Zillow Group n.d.) |
| “Zillow HVI by County” | zhvi_county.csv | data/zip_replication | TRUE | (Zillow Group n.d.) |
| “Zillow ORI by Zip” | zori.csv | data/zip_replication | TRUE | (Zillow Group n.d.) |
| “US Zips” | uszips.csv | data/zip_replication | TRUE | (Pareto Software, LLC 2022) |
| “Zip-County Crosswalk” | zip_county_cw.csv | data/zip_replication | TRUE | (Missouri Census Data Center 2025) |
| “City/Co. Med Rents” | alist_2021_11.csv | data/zip_replication | TRUE | (Apartment List Research Team 2022) |
Each GIS dataset contains multiple, related GIS files, usually with the same filename stem. For brevity, we refer only to the .shp file.
In addition to screening respondents by zip code, we assembled
ZCTA-level data on local demographics and prices. These data were used
to feed locally specific housing price information to respondents.
Below, we list each data file, its origin, and a website with data description or codebook.
Datafiles:
- data/zip_replication/zhvi.csv: Zip-code-level measures of the Zillow Home Value Index (ZHVI) All Homes (SFR, Condo/Co-op) Time Series, Smoothed, Seasonally Adjusted ($), obtained from Zillow at https://www.zillow.com/research/data/.
- data/zip_replication/zhvi_county.csv: County-level measures of the Zillow Home Value Index (ZHVI) All Homes (SFR, Condo/Co-op) Time Series, Smoothed, Seasonally Adjusted ($), obtained from Zillow at https://www.zillow.com/research/data/.
- data/zip_replication/zori.csv: Zip-code-level measures of the Zillow Observed Rent Index (ZORI) All Homes Plus Multifamily Time Series ($), obtained from Zillow at https://www.zillow.com/research/data/.
- data/zip_replication/uszips.csv: A list of all ZCTAs in the 50 U.S. states and the District of Columbia, obtained from https://simplemaps.com/data/us-zips with a basic subscription. A codebook is available at the listed website.
- data/zip_replication/zip_county_cw.csv: A crosswalk linking ZCTAs to counties, obtained from the University of Missouri’s Census Data Center at https://mcdc.missouri.edu/applications/geocorr2022.html. The crosswalk was obtained by selecting all 50 U.S. states and the District of Columbia, ZIP/ZCTA as the “Source Geography,” and County as the “Target Geography.” Population was selected as the weighting variable.
- data/zip_replication/alist_2021_11.csv: Data on city and county median rents were obtained from Apartment List at https://www.apartmentlist.com/research/category/data-rent-estimates. The Current Month Summary report option was selected in November 2021 from the Download Report dropdown. A description of the data is also featured on that page.
We use IPUMS NHGIS data to identify the zip codes to be included in our survey sampling frame. We calculate the population density of each block group. Then, using R spatial packages, we spatially join block-group centroids to zip code tabulation areas (ZCTAs) and calculate the average density of block groups within each ZCTA, weighting by block-group population. IPUMS does not allow redistribution of its data, except for inclusion of files in replication archives.
Datafiles:
- data/nhgis0009_ds244_20195_zcta.csv: Zip code tabulation area (ZCTA) data for the 2019 5-year ACS estimates with population attributes, used to identify rural and non-rural zip codes for purposes of survey sampling. A codebook appears in the data archive at data/nhgis0009_ds244_20195_zcta_codebook.txt.
- data/nhgis0010_ds244_20195_blck_grp.csv: ACS 2019 5-year population estimates from NHGIS. These populations are used in an overlay of block-group centroids on ZCTAs to calculate population-weighted density within each ZCTA. A codebook appears in the data archive at data/nhgis0010_ds244_20195_blck_grp_codebook.txt.
- data/US_zcta_2020.shp: The ZCTA shapefile onto which block groups are overlaid to calculate population-weighted population density. All files with the “US_zcta_2020” stem are loaded by GIS software or GIS R packages.
- data/US_blck_grp_2019.shp: Block-group shapefile, which is subsequently converted to centroids and spatially joined to the ZCTA file to calculate population-weighted density. All files with the “US_blck_grp_2019” stem are loaded by GIS software or GIS R packages.
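The population-weighted density described above can be written, for a ZCTA $z$ containing block groups $b$ with population $p_b$ and area $a_b$, as $\bar{d}_z = \sum_{b \in z} p_b \, (p_b / a_b) \big/ \sum_{b \in z} p_b$. The base-R sketch below illustrates only that arithmetic and the 500-persons-per-square-mile screen, on made-up block groups. It assumes each block group has already been assigned to a ZCTA (the step the replication scripts perform by spatially joining block-group centroids to ZCTA polygons), and the column names zcta, pop, and area_sqmi are hypothetical rather than taken from the NHGIS files.

```r
# Hypothetical toy block groups, already assigned to ZCTAs.
# In the actual pipeline, `zcta` would come from a spatial join of
# block-group centroids to ZCTA polygons.
bg <- data.frame(
  zcta      = c("00001", "00001", "00002", "00002"),
  pop       = c(1200, 800, 300, 100),
  area_sqmi = c(0.5, 2.0, 1.5, 4.0),
  stringsAsFactors = FALSE
)

# Population density of each block group (persons per square mile)
bg$density <- bg$pop / bg$area_sqmi

# Population-weighted mean of block-group densities within each ZCTA
weighted_density <- sapply(
  split(bg, bg$zcta),
  function(d) weighted.mean(d$density, w = d$pop)
)

# ZCTAs below 500 persons per square mile are treated as rural
rural <- names(weighted_density)[weighted_density < 500]
```

With these toy values, ZCTA 00001 has a weighted density of 1,600 persons per square mile and ZCTA 00002 has 156.25, so only 00002 would be screened out as rural.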
Several of our code scripts call the R tidycensus package to import Census data for analysis in R. Instructions for obtaining the Census API key needed to use this package appear in the code-execution instructions.

To access the plain-English labels for the various Census variables, use the function load_variables() in the tidycensus package. For example, to obtain the full “codebook” for the 2015-2019 5-year ACS estimates, execute `acs5_vars <- load_variables(year = 2019, dataset = "acs5")`. Then search the resulting table for the variable(s) downloaded with tidycensus functions.
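The table returned by load_variables() has name, label, and concept columns, so the search step can be done with a small filter. The helper below is a hypothetical convenience function, not part of tidycensus or the replication code; it is a minimal sketch that matches a pattern case-insensitively against those three columns.

```r
# Hypothetical helper (not part of tidycensus): return the rows of a
# variable table whose name, label, or concept matches `pattern`.
find_acs_vars <- function(vars, pattern) {
  matches <- function(col) grepl(pattern, vars[[col]], ignore.case = TRUE)
  hit <- matches("name") | matches("label") | matches("concept")
  vars[hit, c("name", "label", "concept")]
}
```

For example, after running `acs5_vars <- load_variables(year = 2019, dataset = "acs5")` (which requires tidycensus and an internet connection), `find_acs_vars(acs5_vars, "median gross rent")` lists the matching variables.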
| Data File | Source | Notes | Provided |
|---|---|---|---|
| data/Survey 1.csv | authors | raw data | Yes |
| data/Survey 2.csv | authors | raw data | Yes |
| data/Survey 3.csv | authors | raw data | Yes |
| data/PIPE paper data.csv | authors | raw data | Yes |
| data/nhgis0009_ds244_20195_zcta.csv | (Manson et al. 2024) | raw data | Yes |
| data/nhgis0010_ds244_20195_blck_grp.csv | (Manson et al. 2024) | raw data | Yes |
| data/US_zcta_2020.shp | (Manson et al. 2024) | raw data | Yes |
| data/US_blck_grp_2019.shp | (Manson et al. 2024) | raw data | Yes |
| data/zip_replication/zhvi.csv | (Zillow Group n.d.) | raw data | Yes |
| data/zip_replication/zhvi_county.csv | (Zillow Group n.d.) | raw data | Yes |
| data/zip_replication/zori.csv | (Zillow Group n.d.) | raw data | Yes |
| data/zip_replication/uszips.csv | (Pareto Software, LLC 2022) | raw data | Yes |
| data/zip_replication/zip_county_cw.csv | (Missouri Census Data Center 2025) | raw data | Yes |
| data/zip_replication/alist_2021_11.csv | (Apartment List Research Team 2022) | raw data | Yes |
| data/rural_zip_codes_for_rep.csv | authors | derived | Yes |
All survey data are provided in raw form as downloaded from
Qualtrics. The main survey questions are encoded and labeled in the
opening chunks of the analysis code,
code/folk_econ_rep_code_JEP.rmd. A comprehensive guide to
the encoded data, with text of the associated survey questions, is
provided as data/jep_codebook.md.
For each itemized data source, we provide a reference to public codebooks, or refer to a codebook stored in the data folder.
Our code will run on a typical personal computer.
The replication archive is found at https://www.openicpsr.org/openicpsr/project/233932.
Code for this project was written in R Markdown, using RStudio version 2025.5.0.496 and R version 4.5.0 (2025-04-11). All packages required to run the replication code are named and loaded with library() in the opening code chunks.
The code chunk {r setup, include=FALSE}, beginning
on line 25 of code/folk_econ_rep_code_JEP.rmd, loads
all packages on which the data cleaning and analysis code relies. If run
in RStudio, it will prompt you to install any packages that you do not
already have installed, except for two packages not available on CRAN,
which must be installed separately, as follows:

```r
install.packages('fwildclusterboot', repos = 'https://s3alfisc.r-universe.dev')
install.packages('wildrwolf', repos = 'https://s3alfisc.r-universe.dev')
```
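To avoid reinstalling these two packages on every run, a minimal sketch like the following installs only the ones that are missing. The helper name install_if_missing is hypothetical; it relies only on base R's requireNamespace(), which returns FALSE rather than raising an error when a package is absent, and on install.packages().

```r
# Hypothetical helper: install only the packages in `pkgs` that cannot be
# loaded, and return (invisibly) the names of the packages it tried to install.
install_if_missing <- function(pkgs, repos) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  invisible(missing)
}
```

For example, `install_if_missing(c("fwildclusterboot", "wildrwolf"), repos = "https://s3alfisc.r-universe.dev")`. If dependencies cannot be resolved from that repository alone, appending a CRAN mirror to repos (e.g., `c("https://s3alfisc.r-universe.dev", "https://cloud.r-project.org")`) may help; this is an assumption, not a step the replication instructions require.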
To compile the “demographics” chunk of the replication code, you
will need an API key from the U.S. Census Bureau. You can sign up for API
access and obtain a key at https://api.census.gov/data/key_signup.html. Once you
have obtained an API key, assign it to census.api.key on
line 93 of folk_econ_rep_code_JEP.rmd, per the comment “YOUR
KEY GOES HERE”.
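As an optional alternative to pasting the key into the .rmd file, a minimal sketch like the one below reads it from an environment variable. The helper name get_census_key and the variable name CENSUS_API_KEY are hypothetical choices, not part of the replication code.

```r
# Hypothetical helper: read the Census API key from an environment variable
# (e.g., set in ~/.Renviron) instead of hard-coding it in the .rmd file.
get_census_key <- function(var = "CENSUS_API_KEY") {
  key <- Sys.getenv(var)  # returns "" when the variable is unset
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set; request a key at ",
         "https://api.census.gov/data/key_signup.html")
  }
  key
}
```

With `CENSUS_API_KEY=...` added to ~/.Renviron and R restarted, the assignment on line 93 could read `census.api.key <- get_census_key()`.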
Approximate time needed to reproduce the analyses on a standard (2025) desktop machine: about 6 minutes.

Approximate storage space needed: about 4.19 GB.
The code was last run on a 24-core Mac Studio with an M2 Ultra chip, macOS Sequoia 15.5, and 440 GB of free space, with a runtime of 5.75 minutes.
The total size of the replication directory after outputting figures and tables is 4.19 GB. We assume that most replicators will have already downloaded and installed most of the packages on which the replication depends. We did not separately calculate the storage space required for those packages.
code/folk_econ_rep_code_JEP.rmd recodes the raw data and generates all
tables and figures for the paper and Online Appendix. It saves tables to
/tables and figures to /figures.

code/popweighted_for_replication_original.R is our original code used
to create the list of rural zip codes
(data/weighted_zips_pop_original.csv), that is, zip codes with
population-weighted densities under 500 persons per square mile. It no
longer runs because several of the packages it depends on were
discontinued in 2022 or 2023. The full list of zip codes generated by
this script was used to determine which zip codes to sample in all four
surveys on Qualtrics. It is provided here as a reference.

code/popweighted_for_replication_final.R contains updated code that
produces the same file for replication purposes, using contemporary GIS
packages available through CRAN. It creates a table,
data/zcta_merge_comparison.csv, that compares the population densities
of the zip codes used in our study (columns labeled _orig) with those
computed by the new GIS code. This script's results deviate very slightly
from the dataset actually used for our sample, for several possible
reasons. In our original code, we joined rows by index rather than by
the GISJOIN field, as we do in the new code. It is also possible that
the new code's spatial joins assigned slightly different block groups to
some zip codes, because centroids may have been calculated slightly
differently. The provided code shows that the weighted population
densities of zip codes calculated using the two scripts are correlated
at r = 0.96 (r = 0.87 after log transformation). On net, the updated code
identified 2.8 million more residents as rural.
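The comparison statistics reported above can be sketched in base R as follows. This is a hypothetical helper, not the code in popweighted_for_replication_final.R: it assumes three aligned vectors for the same ZCTAs (original weighted density, new weighted density, and ZCTA population) and makes no assumption about the actual column names in data/zcta_merge_comparison.csv. Zero densities are dropped before the log correlation because log(0) is undefined; how the original script handles zeros is not stated here.

```r
# Hypothetical helper: summarize agreement between two density measures for
# the same ZCTAs and the net change in population classified as rural.
compare_densities <- function(orig, new, pop, threshold = 500) {
  pos <- orig > 0 & new > 0  # log() requires strictly positive densities
  c(
    r     = cor(orig, new),
    r_log = cor(log(orig[pos]), log(new[pos])),
    # Positive values mean the new measure classifies more people as rural
    net_rural_pop = sum(pop[new < threshold]) - sum(pop[orig < threshold])
  )
}
```

For example, `compare_densities(orig = c(100, 400, 800, 3000), new = c(120, 600, 700, 2900), pop = c(1000, 2000, 3000, 4000))` returns a net rural change of -2000, because the second ZCTA moves above the 500-person threshold under the new measure.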
In addition, unweighted ZCTA population density is correlated at
r = 0.90 with the weighted density measure used in our data, and at
r = 0.93 with the weighted density measure generated by our new code. The
zip codes actually targeted in our sample had an average unweighted ZCTA
population density of 2,891 persons per square mile; had we used the new
code, the average population density of sampled zip codes would have
been 3,127 persons per square mile. Although we have been unable to
identify the coding difference that caused these minor changes, the full
list of zip codes used to exclude rural zip codes in our Qualtrics
sampling is included in the replication archive. The script
code/popweighted_for_replication_final.R also generates Q-Q plots and
scatterplots illustrating the similarity of the two datasets.

The code is licensed under a CC BY 4.0 license. See /LICENSE.txt for
details.
## Instructions to Replicators

To replicate the figures and tables in the paper and Online Appendix:

1. Open code/folk_econ_rep_code_JEP.rmd in RStudio.
2. Install any packages you are prompted to install.
3. Enter these commands in the console:

   ```r
   install.packages('fwildclusterboot', repos = 'https://s3alfisc.r-universe.dev')
   install.packages('wildrwolf', repos = 'https://s3alfisc.r-universe.dev')
   ```

4. Request a Census API key, if you don’t have one already, and assign it, quoted, to census.api.key on line 93 (bottom of the setup chunk), replacing the words “YOUR KEY GOES HERE.”
5. Click “Run All” or “Knit”.
To replicate our population-weighted zip code screen, which we provided to the survey vendor to exclude potential respondents from rural areas:

1. Open code/popweighted_for_replication_final.R in RStudio.
2. Install any packages you are prompted to install.
3. Click “Run All” or “Knit”.
To replicate our method for defining counterfactual (no-supply-shock)
home prices and rents in survey questions, i.e., the prices and rents in
price_by_zip.csv:

1. Open code/rents_home_price_api_data_generator.R in RStudio.
2. Install any packages you are prompted to install.
3. Click “Run All” or “Knit”.
The provided code reproduces the figures and tables for the paper and Online Appendix.
Code chunks are labeled with the number of the figure or table they
produce. The prefix SI indicates that a figure or table is
produced as supplemental information for the Online Appendix.
## References

Elmendorf, Christopher S., Clayton Nall, and Stan Oklobdzija. 2025a. “The Folk Economics of Housing: Replication Data.” Journal of Economic Perspectives.

Elmendorf, Christopher S., Clayton Nall, and Stan Oklobdzija. 2025b. “The Folk Economics of Housing.” Journal of Economic Perspectives.
Content on this page was adapted from the AEA Data Editor’s replication template.