R
based on the DAG.
RPubs using your temporary account.
RPubs link of your work and submit it on Canvas.
RPubs link last has copied the others. So, timely submissions are important. Own your work. I may randomly ask for your R script and .Rmd files for double-checking purposes. As a standard practice, work in a script file before creating the code chunks in your .Rmd file. Your .Rmd file and RPubs submission page MUST show the code used to produce every output you present in your answers.

Academic integrity is the pursuit of scholarly activity in an open, honest and responsible manner. Academic integrity is a basic guiding principle for all academic activity at The Pennsylvania State University, and all members of the University community are expected to act in accordance with this principle. Consistent with this expectation, the University’s Code of Conduct states that all students should act with personal integrity, respect other students’ dignity, rights and property, and help create and maintain an environment in which all can succeed through the fruits of their efforts.
Academic integrity includes a commitment by all members of the University community not to engage in or tolerate acts of falsification, misrepresentation or deception. Such acts of dishonesty violate the fundamental ethical principles of the University community and compromise the worth of work completed by others.
The primary practice data for this problem set is a housing dataset on Canvas, testdata20250121.RDS, with information on the sale price and sale date of each house, longitude (x), latitude (y), state, FIPS county code, year built, number of bedrooms, bathrooms, fireplaces, and stories, square footage, and presence or absence of AC. You will need to add secondary data to answer some of the questions in the problem set.
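```r
# Load packages: install pacman first if it is missing
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
library(pacman)
# p_load() installs any missing packages from CRAN, then attaches them all
p_load(tidyverse, lubridate, usmap, gridExtra, stringr, readxl, plot3D,
       cowplot, reshape2, scales, broom, data.table, ggplot2, stargazer,
       foreign, ggthemes, ggforce, ggridges, latex2exp, viridis, extrafont,
       kableExtra, snakecase, janitor)

# Load housing data (the .RDS file must be in the working directory)
housingdata <- readRDS("testdata20250121.RDS")
```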
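Several questions below work by state and year, so a sale-year column is handy. A minimal sketch, assuming the sale date sits in a column called `sale_date` (a hypothetical name; check `names(housingdata)` for the real one) that `as.Date()` can parse. The toy data frame only illustrates the call and is not the course dataset.

```r
# Hypothetical helper: add an integer sale year derived from a date column.
# `date_col` is an assumed column name; replace it with the real one.
add_sale_year <- function(df, date_col = "sale_date") {
  sale_date <- as.Date(df[[date_col]])
  df$sale_year <- as.integer(format(sale_date, "%Y"))
  df
}

# Toy data for illustration only (not the course dataset)
toy <- data.frame(
  sale_date = c("2008-03-15", "2012-07-01", "2019-12-31"),
  price     = c(150000, 210000, 305000),
  stringsAsFactors = FALSE
)
toy <- add_sale_year(toy)
table(toy$sale_year)                # number of sales per year
all(toy$sale_year %in% 2008:2019)   # do all sales fall in the sample period?
```

On the real data, the same two checks (sales per year, years inside 2008 to 2019) are a quick way to catch parsing problems before aggregating.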
As a student in the graduate program, you have free access to a comprehensive housing dataset covering housing attributes and prices across U.S. counties from 2008 to 2019. Using it saves you the considerable time, cost, and complications of collecting observational or experimental data in the field or purchasing secondary data.
Your task is to study the causal effect of a variable \(D\) on housing prices \(Y\), where you select \(D\) based on a literature search, your interests, and data availability. Your choice of \(D\) must meet the following criteria:
Feel free to choose any variable \(D\) as long as the data source is public, \(D\) varies over time within locations, and you can provide a plausible justification for why \(D\) may affect housing prices.
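The requirement that \(D\) varies over time within locations can be checked before committing to a variable. A minimal sketch, assuming a county-year panel with hypothetical columns `fips` (location) and `d` (candidate treatment): it reports the share of counties in which \(D\) takes more than one value. A share near zero means location fixed effects would absorb almost all of the variation in \(D\).

```r
# Share of locations in which d takes more than one value over time.
# `loc` and `d` are hypothetical column names; adapt them to your data.
within_variation <- function(df, loc = "fips", d = "d") {
  varies <- tapply(df[[d]], df[[loc]], function(x) length(unique(x)) > 1)
  mean(varies)
}

# Toy county-year panel: county 01001 changes d over time, 01003 does not
panel <- data.frame(
  fips = c("01001", "01001", "01001", "01003", "01003", "01003"),
  year = rep(2017:2019, times = 2),
  d    = c(0, 0, 1, 5, 5, 5),
  stringsAsFactors = FALSE
)
within_variation(panel)   # 0.5: d varies within one of the two counties
```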
State your research question. What is the causal relationship of interest? Clearly define the treatment and outcome variables. Provide a concise motivation for this research.
Draw a basic Directed Acyclic Graph (DAG) corresponding to your research question. Construct the figure based on your literature review, theoretical considerations, and any simplifications you’ve made. A DAG is inherently a simplified representation of the causal pathways in your study, so include all key variables and their connections while omitting less important ones. Provide a brief description of your DAG.
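The DAG itself is usually drawn with a package such as dagitty or ggdag, but it can help to first write it down as an edge list and confirm that it really is acyclic. A minimal base-R sketch using Kahn's topological sort; the nodes X, D, Y, and C are placeholders (X a common cause of D and Y, C a common effect of both), not a recommended model for any particular \(D\).

```r
# Hypothetical DAG: X -> D, X -> Y, D -> Y, D -> C, Y -> C
edges <- data.frame(
  from = c("X", "X", "D", "D", "Y"),
  to   = c("D", "Y", "Y", "C", "C"),
  stringsAsFactors = FALSE
)

# Kahn's algorithm: returns a topological order, or NULL if there is a cycle
topo_order <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  indeg <- setNames(integer(length(nodes)), nodes)   # incoming-edge counts
  for (v in edges$to) indeg[v] <- indeg[v] + 1L
  queue <- nodes[indeg == 0L]                         # nodes with no parents
  out <- character(0)
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    out <- c(out, v)
    for (w in edges$to[edges$from == v]) {            # drop v's out-edges
      indeg[w] <- indeg[w] - 1L
      if (indeg[w] == 0L) queue <- c(queue, w)
    }
  }
  if (length(out) < length(nodes)) NULL else out
}

topo_order(edges)   # "X" "D" "Y" "C": every arrow points forward, so it is a DAG
```

If you add an arrow that closes a loop (say Y -> X), the function returns `NULL`, a sign that the graph is no longer a valid DAG.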
List all causal paths from \(D\) to \(Y\).
Are there any confounders or colliders in your DAG?
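Listing paths by hand becomes error-prone once the DAG has more than a few nodes. The sketch below uses a hypothetical DAG (X a confounder, M a mediator, C a collider). It enumerates every path between D and Y while ignoring arrow direction, marks a path as causal when all arrows point from D toward Y, and flags colliders, meaning interior nodes with both neighboring arrows pointing into them. The dagitty package provides `paths()` and `adjustmentSets()` for the same job; this base-R version simply avoids the dependency.

```r
# Hypothetical DAG: X confounds D and Y, M mediates D -> Y,
# and C is caused by both D and Y (a collider)
edges <- data.frame(
  from = c("X", "X", "D", "M", "D", "D", "Y"),
  to   = c("D", "Y", "M", "Y", "Y", "C", "C"),
  stringsAsFactors = FALSE
)

# All simple paths between two nodes, ignoring edge direction (depth-first)
all_paths <- function(edges, start, end, path = start) {
  here <- path[length(path)]
  if (here == end) return(list(path))
  nbrs <- unique(c(edges$to[edges$from == here], edges$from[edges$to == here]))
  out <- list()
  for (n in setdiff(nbrs, path)) {
    out <- c(out, all_paths(edges, start, end, c(path, n)))
  }
  out
}

# Describe a path: arrow directions, whether it is causal, and its colliders
describe_path <- function(edges, p) {
  fwd <- vapply(seq_len(length(p) - 1), function(i)
    any(edges$from == p[i] & edges$to == p[i + 1]), logical(1))
  arrows <- ifelse(fwd, "->", "<-")
  colliders <- if (length(p) > 2) {
    p[2:(length(p) - 1)][fwd[-length(fwd)] & !fwd[-1]]
  } else character(0)
  label <- paste0(p[1], paste0(" ", arrows, " ", p[-1], collapse = ""))
  list(path = label, causal = all(fwd), colliders = colliders)
}

# D -> M -> Y and D -> Y are causal; D <- X -> Y is a backdoor path
# through X; D -> C <- Y contains the collider C
for (p in all_paths(edges, "D", "Y")) str(describe_path(edges, p))
```

A backdoor path such as D <- X -> Y is blocked by conditioning on X, whereas a path through a collider such as C is already blocked and is opened, not closed, by conditioning on C.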
Based on your DAG, write your estimating equation that isolates your causal path of interest: \(D \rightarrow Y\).
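For illustration only, one common form such an equation takes with county-level panel data is a two-way fixed effects regression of log sale price on \(D\). Here \(i\) indexes houses, \(c\) counties, and \(t\) years; \(X_{ict}\) collects the confounders your DAG says must be controlled (for example, house attributes); \(\alpha_c\) and \(\lambda_t\) are county and year fixed effects. Whether this specification blocks every backdoor path depends entirely on your DAG. Note that including a mediator in \(X_{ict}\) would remove part of the effect of interest, and conditioning on a collider opens a path rather than closing it.

```latex
\log(P_{ict}) = \beta D_{ct} + X_{ict}'\gamma + \alpha_c + \lambda_t + \varepsilon_{ict}
```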
Construct the dataset needed to study your research question by merging your primary dataset, testdata20250121.RDS, with publicly available data on \(D\) and any other variables necessary to estimate the equation in part (v) of Problem 1. For reproducibility, document where you downloaded the new data and every step of the merging process, including the code used.
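A minimal sketch of the merge step in base R, assuming \(D\) is available at the county-year level and that the sale year has already been extracted (adjust the keys if your \(D\) is state-year). The names `fips`, `year`, `price`, and `d` are hypothetical. One common pitfall: when a source stores FIPS codes as numbers, leading zeros are dropped (1001 instead of 01001), and the join then silently fails for those counties. `dplyr::left_join()` behaves the same way as `merge(..., all.x = TRUE)` here.

```r
# Toy stand-ins with hypothetical names and values (not real data):
# house sales keyed by a numeric county FIPS code and sale year,
# and a county-year table holding the treatment d
houses <- data.frame(
  fips  = c(1001, 1001, 6037, 36061),
  year  = c(2010, 2011, 2010, 2010),
  price = c(120000, 125000, 450000, 900000)
)
treat <- data.frame(
  fips = c("01001", "01001", "06037"),
  year = c(2010, 2011, 2010),
  d    = c(0.2, 0.3, 1.1),
  stringsAsFactors = FALSE
)

# Standardize the key: 5-character county FIPS with leading zeros
houses$fips <- sprintf("%05d", as.integer(houses$fips))

# The treatment table must have one row per county-year,
# otherwise the merge would duplicate house sales
stopifnot(anyDuplicated(treat[c("fips", "year")]) == 0)

# Left join: keep every sale; sales without treatment data get NA
merged <- merge(houses, treat, by = c("fips", "year"), all.x = TRUE)

# Record the match rate and the unmatched keys in your write-up
mean(!is.na(merged$d))
unique(merged[is.na(merged$d), c("fips", "year")])
```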
Plot the means of your treatment variable by state and year. Create state-level maps showing the data year by year. You may use the usmap package or any other package that helps you produce high-quality maps. Interpret the figure.
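A minimal sketch of the two steps, assuming sale-level data with hypothetical columns `state` (abbreviation), `year`, and `d`: `aggregate()` computes the state-year means, and a helper draws one usmap panel per year. The helper fixes the color limits across panels so that the same color means the same value in every year; with separate scales, year-to-year comparisons would be misleading. The plotting calls are namespaced so the function can be defined without attaching packages; it needs usmap, ggplot2, and gridExtra installed when called.

```r
# Toy sale-level data (hypothetical names and values, not real data)
sales <- data.frame(
  state = c("PA", "PA", "PA", "OH", "OH", "NY"),
  year  = c(2010, 2010, 2011, 2010, 2011, 2011),
  d     = c(1, 3, 4, 2, 6, 5),
  stringsAsFactors = FALSE
)

# Mean of d by state and year (one row per state-year)
d_means <- aggregate(d ~ state + year, data = sales, FUN = mean)

# One map per year with a shared color scale, arranged in a grid.
# usmap matches rows to states through the `state` column.
map_by_year <- function(df, value, ncol = 4) {
  lims <- range(df[[value]], na.rm = TRUE)
  plots <- lapply(split(df, df$year), function(x) {
    usmap::plot_usmap(data = x, values = value, regions = "states") +
      ggplot2::scale_fill_viridis_c(name = value, limits = lims,
                                    na.value = "grey90") +
      ggplot2::labs(title = unique(x$year))
  })
  gridExtra::grid.arrange(grobs = plots, ncol = ncol)
}

# Usage:
# map_by_year(d_means, "d")
```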
Plot the means of your outcome variable by state and year. Create state-level maps showing the data year by year. You may use the usmap package or any other package that helps you produce high-quality maps. Interpret the figure.
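Interpreting the outcome maps is easier with the underlying numbers at hand. A minimal sketch, using toy data with hypothetical columns `state`, `year`, and `price`: `tapply()` with two grouping factors returns a state-by-year matrix of means. Because sale prices are typically right-skewed, it also computes the mean of log price (a geometric mean after `exp()`), which a handful of very expensive sales moves far less than the arithmetic mean.

```r
# Toy sale-level data (hypothetical names and values, not the course data)
sales <- data.frame(
  state = c("PA", "PA", "PA", "OH", "OH", "OH"),
  year  = c(2010, 2010, 2011, 2010, 2011, 2011),
  price = c(100000, 300000, 220000, 150000, 160000, 1000000),
  stringsAsFactors = FALSE
)

# State-by-year matrix of mean sale prices (rows: states, columns: years)
mean_price <- tapply(sales$price, list(sales$state, sales$year), mean)

# Mean of log price by state and year
mean_log_price <- tapply(log(sales$price), list(sales$state, sales$year), mean)

round(mean_price)
round(exp(mean_log_price))   # geometric means, for comparison
```

In the toy data, one 1,000,000 sale pulls the OH 2011 arithmetic mean to 580,000, while the geometric mean is 400,000.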
Generate a table of summary statistics for all variables needed to estimate the equation in part (v) of Problem 1. You may use stargazer, xtable, knitr's kable(), or any other package that helps you produce well-formatted tables of descriptive statistics.
Estimate the equation in part (v) of Problem 1 and generate a formatted table summarizing the estimation results. You may use stargazer, texreg, fixest's etable(), or any other package that helps you produce well-formatted estimation tables. Discuss your research findings.
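A minimal sketch of estimating a two-way fixed effects version of the equation, assuming that is the specification your DAG supports. The panel is simulated (true effect 0.05, with \(D\) correlated with county effects) so the code runs on its own; with the real data, substitute your merged dataset and variable names. `lm()` with county and year factors gives the same point estimate as fixest's `feols()` with fixed effects; the standard errors differ once you cluster, which the commented fixest line does by county.

```r
set.seed(42)

# Simulated county-year panel for illustration only: the true effect
# of d on log price is 0.05, and d is correlated with county effects
n_county <- 50
years <- 2008:2019
panel <- expand.grid(county = factor(1:n_county), year = factor(years))
county_fe <- rnorm(n_county, sd = 0.3)
year_fe <- seq(0, 0.2, length.out = length(years))
c_idx <- as.integer(panel$county)
t_idx <- as.integer(panel$year)
panel$d <- rnorm(nrow(panel)) + county_fe[c_idx]
panel$log_price <- 12 + 0.05 * panel$d + county_fe[c_idx] + year_fe[t_idx] +
  rnorm(nrow(panel), sd = 0.1)

# Two-way fixed effects: county and year enter as factor dummies
fit <- lm(log_price ~ d + county + year, data = panel)
coef(summary(fit))["d", ]   # estimate, SE, t value, p value for d

# Same point estimate with fixest, clustered by county, plus a formatted table:
# est <- fixest::feols(log_price ~ d | county + year, data = panel, cluster = ~county)
# fixest::etable(est)
```

Because `d` is built to correlate with the county effects, dropping `county` from the formula biases the estimate upward; the county fixed effects absorb exactly that confounding.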
HAVE FUN AND KEEP FAITH IN THE FUN!