R
.Rpubs
using your temporary account.RPubs
link of your work and submit it on
Canvas.RPubs
link last has copied the others. So, timely
submissions are important. Own your work. I may randomly ask for your R script and .Rmd files for double-checking purposes. As a standard practice, work in a script file before making your code chunks in the .Rmd file. Your .Rmd file and RPubs submission page MUST show the code used to produce any of the outputs you present in your answers.
Academic integrity is the pursuit of scholarly activity in an open, honest and responsible manner. Academic integrity is a basic guiding principle for all academic activity at The Pennsylvania State University, and all members of the University community are expected to act in accordance with this principle. Consistent with this expectation, the University’s Code of Conduct states that all students should act with personal integrity, respect other students’ dignity, rights and property, and help create and maintain an environment in which all can succeed through the fruits of their efforts.
Academic integrity includes a commitment by all members of the University community not to engage in or tolerate acts of falsification, misrepresentation or deception. Such acts of dishonesty violate the fundamental ethical principles of the University community and compromise the worth of work completed by others.
# Load packages
library(pacman)
p_load(causalweight, lmtest, sandwich, AER, ivmodel, haven, estimatr, tidyverse,
lubridate, usmap, gridExtra, stringr, readxl,
reshape2, scales, broom, data.table, ggplot2, stargazer,
foreign, ggthemes, ggforce, ggridges, latex2exp, viridis, extrafont,
kableExtra, snakecase, janitor)
Find one paper in the applied literature that uses an IV approach in its identification strategy. Provide the following list of information:
One of the instruments commonly used in demand estimation is the Hausman-Nevo instrument, which instruments the price of a specific good (e.g., price of good \(i\) in time \(t\) in a geographical market \(m\)) with the contemporaneous price of the same good in neighboring markets (e.g., average price of good \(i\) in time \(t\) across geographical markets (\(-m\))). The identifying assumption has two parts. First, contemporaneous variation in prices in geographical markets (\(-m\)) reflects contemporaneous variation in marginal production costs for the same good, so it is correlated with the contemporaneous variation in prices in the geographical market \(m\). Second, these marginal cost shifters or shocks are uncorrelated with demand shocks. You can read more about this in Hausman (NBER 1996), Nevo (Econometrica 2001), Nevo (NBER 2012), Berry & Haile (NBER 2021), or Hahn et al. (2024).
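As a rough illustration of the construction described above, the following sketch builds a leave-one-out average price on a small simulated panel. The data frame `prices`, its columns, and the variable name `hausman_iv` are hypothetical and chosen only for this example; none of the papers cited above is being replicated.

```r
# Hypothetical toy panel: one good, 4 markets, 3 periods (all values simulated).
set.seed(1)
prices <- expand.grid(market = 1:4, period = 1:3)
prices$price <- round(runif(nrow(prices), min = 2, max = 5), 2)

# Within each period: total price across markets and number of markets.
period_sum <- ave(prices$price, prices$period, FUN = sum)
period_n   <- ave(prices$price, prices$period, FUN = length)

# Leave-one-out mean: average price of the same good in the other
# markets (-m) in the same period, excluding market m itself.
prices$hausman_iv <- (period_sum - prices$price) / (period_n - 1)

head(prices)
```

Excluding market \(m\)'s own price from the average is what keeps the instrument from being mechanically correlated with the price it instruments.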
Discussion. Despite having its issues, the use of the Hausman-Nevo instrument remains acceptable in reputable journals. In what follows, you are asked to discuss its validity or exclusion restriction in a published paper.
Pick one paper: DellaVigna & Gentzkow (QJE 2019) or Oh & Vukina (AJAE 2022) or any recent paper that relies on the Hausman-Nevo instrument in their identification strategy.
What evidence did the authors use to support their argument that the instrument satisfies the exclusion restriction?
Practice data. The practice data for this question is the dataset of the “National Job Corps (JC) Study”, JC. JC is a large education program from the US Department of Labor that enrolled disadvantaged individuals aged 16-24 in education and/or vocational training from late 1994 to early 1996, with the goal of increasing their employment and earnings some years after the program and decreasing their criminal activity. You can read more about this randomized experiment in Schochet et al. (Mathematica 2001) and Schochet et al. (AER 2008), etc. You can load the package causalweight by Bodory and Huber (2018), which contains the JC dataset.
The JC sample has a size of 9,240 observations from individuals randomly assigned to the JC treatment group (5,577 observations) and the control group (3,663 observations), which you can check by examining the assignment variable (assignment). There is a discrepancy between the random program assignment (assignment) and the actual treatment variable in the first year after assignment (trainy1) due to noncompliance. The outcome variable is weekly earnings in US dollars (USD) in the fourth year post-treatment (earny4). There are multiple other variables, which we will use later in a de-biased/double/causal machine learning (DML/CML) exercise. I suggest you call ?JC to open the help file and check the description of the dataset, even though you will only use the variables assignment, trainy1, and earny4 for this problem set.
# Load data
data(JC)
table(JC$assignment)
##
##    0    1
## 3663 5577
table(JC$trainy1)
##
##    0    1
## 2666 6574
#table(JC$trainy2)
table(JC$assignment, JC$trainy1)
##
##        0    1
##   0 1809 1854
##   1  857 4720
prop.table(table(JC$assignment, JC$trainy1))
##
##              0          1
##   0 0.19577922 0.20064935
##   1 0.09274892 0.51082251
as.data.frame(table(JC$assignment, JC$trainy1))
##   Var1 Var2 Freq
## 1    0    0 1809
## 2    1    0  857
## 3    0    1 1854
## 4    1    1 4720
#?JC
(a) Intent-To-Treat (ITT) Effect.
Without using a regression, compute the estimated ITT. Hint: This is the mean difference in earnings between individuals randomly assigned to JC treatment and those randomly assigned to control.
Use a regression to obtain the estimated ITT and its standard error. Hint: This is the effect of JC random assignment on earnings; it coincides with the effect of the treatment itself only under full compliance.
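The equivalence between the two ITT computations in (a) can be sketched on simulated data. Everything below (the data frame `toy`, the variables `z` and `y`, the effect size of 15) is made up for illustration and is not the JC data.

```r
# Simulated stand-in for a randomized assignment and an earnings outcome.
set.seed(42)
n <- 2000
z <- rbinom(n, 1, 0.6)                  # hypothetical random assignment
y <- 200 + 15 * z + rnorm(n, sd = 50)   # hypothetical outcome
toy <- data.frame(z = z, y = y)

# ITT without a regression: difference in mean outcomes by assignment.
itt_means <- mean(toy$y[toy$z == 1]) - mean(toy$y[toy$z == 0])

# ITT with a regression: the slope on the assignment dummy.
fit_itt <- lm(y ~ z, data = toy)
itt_reg <- unname(coef(fit_itt)["z"])
itt_se  <- summary(fit_itt)$coefficients["z", "Std. Error"]

c(itt_means = itt_means, itt_reg = itt_reg, itt_se = itt_se)
```

The two point estimates agree up to floating-point error. The standard error reported by `summary()` assumes homoskedastic errors; a heteroskedasticity-robust version would need an additional package such as sandwich, which is among the packages loaded at the top of this problem set.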
(b) Complier share.
Note that you are facing a two-sided noncompliance issue: some individuals assigned to treatment did not take it up, and some assigned to control did. Compute the following difference: actual treatment take-up rate among individuals randomly assigned to treatment minus actual treatment take-up rate among those randomly assigned to control.
Use a regression to obtain the same difference in actual treatment take up rates.
(c) Complier/Local Average Treatment Effect (LATE).
Without using a regression, compute the estimated LATE among compliers.
Load the AER package by Kleiber & Zeileis (2008), which contains the ivreg command for 2SLS regression. Use the ivreg command to estimate the LATE and its standard error. Alternatively, you may also load other packages for IV, such as the ivmodel package by Kang et al. (2020) and use the ivmodel command to estimate the LATE and its standard error.
Interpret ITT versus LATE estimates of the JC program effect on earnings.
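To see how the pieces in (a) through (c) fit together, here is a minimal sketch on simulated data with two-sided noncompliance. The compliance types, their shares, and the treatment effect of 30 are assumptions made purely for illustration; the variable names `z`, `d`, and `y` are hypothetical stand-ins for assignment, take-up, and outcome.

```r
set.seed(7)
n <- 5000
z <- rbinom(n, 1, 0.5)   # hypothetical random assignment

# Assumed compliance types: never-takers, compliers, always-takers.
type <- sample(c("never", "complier", "always"), n,
               replace = TRUE, prob = c(0.2, 0.5, 0.3))
d <- as.integer(type == "always" | (type == "complier" & z == 1))
y <- 200 + 30 * d + rnorm(n, sd = 40)
toy <- data.frame(z = z, d = d, y = y)

# ITT and complier share as differences in means by assignment.
itt         <- mean(toy$y[toy$z == 1]) - mean(toy$y[toy$z == 0])
first_stage <- mean(toy$d[toy$z == 1]) - mean(toy$d[toy$z == 0])

# Wald estimator of the LATE: ITT scaled by the complier share.
late_wald <- itt / first_stage

# The same point estimate from two stages of lm().
toy$d_hat <- fitted(lm(d ~ z, data = toy))
late_2sls <- unname(coef(lm(y ~ d_hat, data = toy))["d_hat"])

c(itt = itt, first_stage = first_stage,
  late_wald = late_wald, late_2sls = late_2sls)
```

The manual second stage reproduces the Wald point estimate, but its reported standard error is not valid because it ignores that `d_hat` is itself estimated. A dedicated IV routine, for example `ivreg(y ~ d | z, data = toy)` from AER, is the usual way to obtain correct standard errors.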
You will replicate and extend the work of Card (NBER 1993; in Christofides et al. 1995, Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp) on the returns to schooling using college proximity as an IV for education in a wage regression. The paper’s identifying assumption is that living closer to a college reduces the cost barrier to attending college, thereby increasing the likelihood of enrollment; however, proximity to a college is assumed not to directly affect a student’s skills or abilities and, therefore, should not directly affect their market wage.
Practice data. The data is from the National Longitudinal Survey of Young Men (NLSYM) for 1976. It is sourced from this link and posted as Card1995 on Canvas.
In this exercise, we will focus on using proximity to public college (nearc4a) and private college (nearc4b) as instruments for education (years of schooling) in 1976 (ed76). The dependent variable is the log of weekly earnings (lwage76). Other variables you will need for this exercise include age (years) in 1976 (age76); years of work experience \((exp)\), calculated as \((age76 - ed76 - 6)\); \((exp^{2}/100)\); an indicator for black (black); an indicator for residence in the southern region of the U.S. (reg76r); an indicator for urban residence in a standard metropolitan statistical area (smsa76r); and an indicator for growing up in the same county as a 4-year college (nearc4). See the description file for the variable definitions on Canvas.
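The two experience variables defined above are not raw columns and have to be constructed. A minimal sketch follows, using three made-up rows in place of the Card1995 data; the data frame name `card_toy` and the column name `exp2` are hypothetical.

```r
# Three hypothetical individuals; only age76 and ed76 are needed here.
card_toy <- data.frame(age76 = c(24, 30, 34),
                       ed76  = c(12, 16, 18))

# Potential experience: age minus years of schooling minus 6.
card_toy$exp <- card_toy$age76 - card_toy$ed76 - 6

# Squared experience, rescaled by 100 as in the model specification.
card_toy$exp2 <- card_toy$exp^2 / 100

card_toy
```

Because experience is built from ed76, any endogeneity in schooling carries over to both experience terms, which is relevant to the endogeneity discussion later in this exercise.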
OLS, IV, First-stage, and Reduced form estimations using proximity to college. Estimate and show in the same formatted table the four models described below. You may use packages like stargazer, texreg, or any other package that helps you produce well-formatted estimation tables. Interpret your results.
\(lwage76 = \beta_{0} + \beta_{1} ed76 + \beta_{2} exp + \beta_{3} (exp^{2}/100) + \beta_{4} black + \beta_{5} reg76r + \beta_{6} smsa76r + \epsilon\), which should replicate the first OLS model in Table 2 of Card (NBER 1993)’s paper.
the IV model that uses nearc4 as an instrument for ed76.
the first-stage regression.
the reduced form regression, i.e., the multivariate linear regression of lwage76 as a function of the same independent variables used in the first-stage regression.
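The four models listed above are linked: with a single instrument, the IV coefficient equals the reduced-form coefficient on the instrument divided by the first-stage coefficient on the instrument. The sketch below checks this on simulated data. The variables `lw`, `ed`, `z`, `x`, and `u` are hypothetical stand-ins (for log wage, schooling, college proximity, an exogenous control, and unobserved ability), and all coefficients are assumed values, not estimates from Card's data.

```r
set.seed(123)
n <- 3000
x <- rnorm(n)            # exogenous control (stand-in)
z <- rbinom(n, 1, 0.5)   # instrument (stand-in for nearc4)
u <- rnorm(n)            # unobserved ability, drives both ed and lw
ed <- 12 + 1.0 * z + 0.5 * x + u + rnorm(n)
lw <- 5 + 0.10 * ed + 0.2 * x + 0.5 * u + rnorm(n, sd = 0.5)
sim <- data.frame(lw = lw, ed = ed, z = z, x = x)

ols          <- lm(lw ~ ed + x, data = sim)   # biased by u
first_stage  <- lm(ed ~ z + x,  data = sim)
reduced_form <- lm(lw ~ z + x,  data = sim)

# Manual 2SLS: replace ed with its first-stage fitted values.
sim$ed_hat <- fitted(first_stage)
iv_manual  <- lm(lw ~ ed_hat + x, data = sim)

# Indirect least squares: reduced form over first stage.
iv_ratio <- unname(coef(reduced_form)["z"] / coef(first_stage)["z"])
iv_2sls  <- unname(coef(iv_manual)["ed_hat"])

c(ols = unname(coef(ols)["ed"]), iv_ratio = iv_ratio, iv_2sls = iv_2sls)
```

The manual two-step fit is shown only to expose the mechanics; its standard errors are not valid, so the reported table should come from a dedicated IV routine such as ivreg.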
IV, First-stage, and Reduced form estimations using proximity to public and private colleges as instruments. Estimate and show in the same formatted table the IV model that uses nearc4a and nearc4b as instruments for ed76, first-stage regression, and reduced form regression. You may use packages like stargazer, texreg, or any other package that helps you produce well-formatted estimation tables. Interpret your results.
Endogeneity discussion.
Is education endogenous in the wage regression? If it is, then is experience endogenous?
Create the interactions \((nearc4a * age76)\) and \((nearc4a * age76^{2}/100)\).
Estimate the structural equation by 2SLS using nearc4a, nearc4b, and the interactions above as instruments for ed76, \(exp\), and \((exp^{2}/100)\).
How do the results differ from your earlier ones instrumenting only for ed76 using nearc4a and nearc4b?
Test the hypothesis that ed76 is exogenous for the structural return to schooling.
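One common way to test exogeneity of a regressor is a regression-based Durbin-Wu-Hausman test: add the first-stage residuals to the structural equation and test whether their coefficient is zero. The sketch below applies it to simulated data with a single endogenous regressor and two instruments. It is a simplified illustration, not the full specification asked for above (which also instruments the experience terms), and all names (`lw`, `ed`, `z1`, `z2`, `x`, `v_hat`) and coefficients are assumptions.

```r
set.seed(2024)
n  <- 3000
x  <- rnorm(n)             # exogenous control (stand-in)
z1 <- rbinom(n, 1, 0.5)    # stand-in for nearc4a
z2 <- rbinom(n, 1, 0.3)    # stand-in for nearc4b
u  <- rnorm(n)             # unobserved ability
ed <- 12 + 1.0 * z1 + 0.8 * z2 + 0.5 * x + u + rnorm(n)
lw <- 5 + 0.10 * ed + 0.2 * x + 0.5 * u + rnorm(n, sd = 0.5)
sim <- data.frame(lw = lw, ed = ed, z1 = z1, z2 = z2, x = x)

# Step 1: first stage, keep the residuals.
sim$v_hat <- resid(lm(ed ~ z1 + z2 + x, data = sim))

# Step 2: structural equation augmented with the first-stage residuals.
cf <- lm(lw ~ ed + x + v_hat, data = sim)

# Under the null that ed is exogenous, the coefficient on v_hat is zero.
hausman_row <- summary(cf)$coefficients["v_hat", ]
p_value     <- unname(hausman_row["Pr(>|t|)"])

hausman_row
```

A small p-value rejects exogeneity of the schooling variable. In this augmented regression the coefficient on `ed` coincides with the 2SLS point estimate, which gives a quick consistency check against the IV table.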
HAVE FUN AND KEEP FAITH IN THE FUN!