setwd("C:/Users/Lenovo/OneDrive - The Pennsylvania State University/Penn State/Coursework/Sem IV/EEFE 530/Problem Sets/Problem Set 4")
# Load packages
library(pacman)
# Note: the package name is "MatchIt" (case-sensitive). Zelig is omitted because
# it was not available for R 4.4 and is not used below.
p_load(Matching, Jmisc, lmtest, sandwich, kdensity, haven, boot,
       cobalt, MatchIt, estimatr, cem, tidyverse,
       lubridate, usmap, gridExtra, stringr, readxl, plot3D,
       cowplot, reshape2, scales, broom, data.table, ggplot2, stargazer,
       foreign, ggthemes, ggforce, ggridges, latex2exp, viridis, extrafont,
       kableExtra, snakecase, janitor)
Citation of Paper Selected: Taraz, V. (2017). Adaptation to Climate Change: Historical Evidence from the Indian Monsoon. Environment and Development Economics, 22(5), 517–545.
The paper asks how Indian farmers adapt to medium-run variations in monsoon rainfall and to what extent adaptation can offset climate-induced income losses. It examines whether farmers adjust irrigation and crop choices in response to persistent rainfall shifts.
Outcome variable: Farm profits per hectare (annual).
Treatment variable: Wealth (measured as hectares of land owned by the household).
Endogeneity concern: Wealth is likely endogenous because it may be correlated with unobserved farmer characteristics (e.g., productivity, risk preferences) or past economic shocks that also influence profits.
IV: Inherited land (land passed down through generations rather than acquired through market transactions).
DAG
# Load libraries
library(dagitty)
## Warning: package 'dagitty' was built under R version 4.4.3
library(ggdag)
## Warning: package 'ggdag' was built under R version 4.4.2
##
## Attaching package: 'ggdag'
## The following object is masked from 'package:stats':
##
## filter
# Define the DAG
dag <- dagitty("dag {
InheritedLand -> Wealth -> Profit
Unobs -> Wealth
Unobs -> Profit
}")
# Plot the DAG
ggdag(dag, text = FALSE, use_labels = "name") +
theme_minimal() +
labs(title = "DAG for IV Strategy in Vis Taraz (2017)")
Wealth is endogenous to Profit because of unobservables such as farmer ability and productivity. “Unobs” denotes this unobserved component, which affects both Wealth and Profit.
The IV, inherited land, is relevant because it is a strong predictor of a farmer’s current landholdings (wealth). In agrarian India, most land is passed down across generations, so inherited land explains substantial variation in current wealth. The paper confirms this with a strong first-stage relationship.
The exclusion restriction is plausible because inherited land is determined by historical family transfers, not by the farmer’s recent productivity or profit outcomes. Under this assumption it affects profits only through wealth and not through any other pathway, which is what a valid instrument requires.
The paper finds that farmers adapt to medium-run rainfall declines by adjusting irrigation and crop choices (for example, by increasing irrigation), but these adaptations recover only a small share of lost profits (25-30%). Wealthier farmers adapt more effectively, and the effect of rainfall shocks on their profits is smaller. Overall, adaptation mitigates but does not fully offset the negative effects of changing monsoon patterns.
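The relevance and exclusion arguments above can be illustrated with a small simulation. This is only a sketch on made-up data: the variable names mirror the DAG, but the coefficients, sample size, and seed are assumptions chosen for illustration and are not taken from the paper.

```r
set.seed(530)
n <- 5000

# Hypothetical data-generating process mirroring the DAG:
# Unobs -> Wealth, Unobs -> Profit, InheritedLand -> Wealth -> Profit
unobs          <- rnorm(n)                  # e.g., farmer ability (never observed)
inherited_land <- rnorm(n)                  # instrument, independent of unobs
wealth         <- 0.8 * inherited_land + unobs + rnorm(n)
true_effect    <- 2
profit         <- true_effect * wealth + 3 * unobs + rnorm(n)

# OLS ignores the confounder, so it overstates the effect in this setup
ols_est <- unname(coef(lm(profit ~ wealth))["wealth"])

# IV (Wald) estimator with one instrument: cov(Z, Y) / cov(Z, D)
iv_est <- cov(inherited_land, profit) / cov(inherited_land, wealth)

# Relevance check: first-stage F statistic for the instrument
first_stage_f <- unname(summary(lm(wealth ~ inherited_land))$fstatistic["value"])

cat("True effect:  ", true_effect, "\n")
cat("OLS estimate: ", round(ols_est, 3), "\n")
cat("IV estimate:  ", round(iv_est, 3), "\n")
cat("First-stage F:", round(first_stage_f, 1), "\n")
```

In this setup the IV estimate should land near the true effect while OLS does not. The exclusion restriction holds here only because the simulation builds it in; with real data it cannot be tested directly.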
I chose the DellaVigna & Gentzkow (QJE, 2019) paper. It asks why retail chains keep prices uniform across stores despite differences in local demand. The outcome is the quantity of an item sold in a particular store at a particular time.
Problem: In a regression of quantity on price, price is endogenous because it may respond to local demand shocks (e.g., if demand unexpectedly rises, prices might increase). Simply regressing quantity on price would therefore bias the estimate of the price elasticity of demand.
Therefore, the authors use a Hausman-Nevo-style IV, with local income (household income at the zip-code level) as an instrument for prices. Their argument is that local income affects price sensitivity (elasticity) and thus indirectly influences pricing decisions (relevance), but is unlikely to be directly correlated with store-level demand shocks for specific products at specific times (exogeneity).
# install.packages("causalweight")  # run once if the package is not installed
library(causalweight)
## Loading required package: ranger
## Warning: package 'ranger' was built under R version 4.4.3
# Load data
data(JC)
table(JC$assignment)
##
##    0    1
## 3663 5577
table(JC$trainy1)
##
##    0    1
## 2666 6574
#table(JC$trainy2)
table(JC$assignment, JC$trainy1)
##
##        0    1
##   0 1809 1854
##   1  857 4720
prop.table(table(JC$assignment, JC$trainy1))
##
##              0          1
##   0 0.19577922 0.20064935
##   1 0.09274892 0.51082251
as.data.frame(table(JC$assignment, JC$trainy1))
##   Var1 Var2 Freq
## 1    0    0 1809
## 2    1    0  857
## 3    0    1 1854
## 4    1    1 4720
# Mean earnings for treatment and control groups
ITT <- mean(JC$earny4[JC$assignment == 1]) - mean(JC$earny4[JC$assignment == 0])
# ITT effect
print(ITT)
## [1] 16.05513
The ITT estimate of 16.05 means that, on average, individuals randomly assigned to the Job Corps (JC) program earned $16.05 more per week in the fourth year after assignment than those assigned to the control group, regardless of whether they actually trained.
# Run the ITT regression
ITT_model <- lm(earny4 ~ assignment, data = JC)
# Display full regression output
summary(ITT_model)
##
## Call:
## lm(formula = earny4 ~ assignment, data = JC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -213.98 -164.65 -24.02 99.25 2211.98
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 197.926 3.212 61.620 < 2e-16 ***
## assignment 16.055 4.134 3.883 0.000104 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 194.4 on 9238 degrees of freedom
## Multiple R-squared: 0.00163, Adjusted R-squared: 0.001522
## F-statistic: 15.08 on 1 and 9238 DF, p-value: 0.0001038
# Extract ITT estimate (coefficient for assignment) and standard error
ITT_estimate <- coef(ITT_model)["assignment"]
SE <- summary(ITT_model)$coefficients["assignment", "Std. Error"]
# Print results
cat("ITT Estimate:", ITT_estimate, "\n")
## ITT Estimate: 16.05513
cat("Standard Error:", SE, "\n")
## Standard Error: 4.134466
# Compute ITT on earnings
ITT_earnings <- mean(JC$earny4[JC$assignment == 1]) - mean(JC$earny4[JC$assignment == 0])
# Compute difference in take-up rates
takeup_treatment <- mean(JC$trainy1[JC$assignment == 1]) # Among assigned to treatment
takeup_control <- mean(JC$trainy1[JC$assignment == 0]) # Among assigned to control
takeup_difference <- takeup_treatment - takeup_control
# Compute LATE
LATE <- ITT_earnings / takeup_difference
# Print results
cat("ITT on Earnings:", ITT_earnings, "\n")
## ITT on Earnings: 16.05513
cat("Difference in Take-up Rates:", takeup_difference, "\n")
## Difference in Take-up Rates: 0.3401906
cat("Estimated LATE:", LATE, "\n")
## Estimated LATE: 47.1945
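The calculation above is the Wald estimator: the ITT on earnings divided by the difference in take-up rates. Written out with the values printed above (here Y = earny4, D = trainy1, Z = assignment):

```latex
\widehat{\mathrm{LATE}}
  = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{D}_{Z=1} - \bar{D}_{Z=0}}
  = \frac{16.05513}{0.3401906}
  \approx 47.19
```

The denominator can be checked against the cross-tabulation: 4720/5577 ≈ 0.846 take-up among those assigned to treatment, minus 1854/3663 ≈ 0.506 among those assigned to control.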
# install.packages("AER")  # run once if the package is not installed
library(AER)
## Warning: package 'AER' was built under R version 4.4.3
## Loading required package: car
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
## The following object is masked from 'package:boot':
##
## logit
## The following object is masked from 'package:Jmisc':
##
## recode
## Loading required package: survival
##
## Attaching package: 'survival'
## The following object is masked from 'package:boot':
##
## aml
# Estimate LATE using IV regression (2SLS)
LATE_model <- ivreg(earny4 ~ trainy1 | assignment, data = JC)
# Display regression results
summary(LATE_model)
##
## Call:
## ivreg(formula = earny4 ~ trainy1 | assignment, data = JC)
##
## Residuals:
## Min 1Q Median 3Q Max
## -221.23 -165.43 -22.55 100.01 2235.87
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 174.039 8.909 19.536 < 2e-16 ***
## trainy1 47.194 12.192 3.871 0.000109 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 195 on 9238 degrees of freedom
## Multiple R-Squared: -0.004797, Adjusted R-squared: -0.004905
## Wald test: 14.98 on 1 and 9238 DF, p-value: 0.0001092
# Using "ivmodel"
# install.packages("ivmodel")  # run once if the package is not installed
library(ivmodel)
## Warning: package 'ivmodel' was built under R version 4.4.3
# Estimate IV model
iv_result <- ivmodel(Y = JC$earny4, D = JC$trainy1, Z = JC$assignment)
# Display results
summary(iv_result)
##
## Call:
## ivmodel(Y = JC$earny4, D = JC$trainy1, Z = JC$assignment)
## sample size: 9240
## _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
##
## First Stage Regression Result:
##
## F=1440.46, df1=1, df2=9238, p-value is < 2.22e-16
## R-squared=0.1348939, Adjusted R-squared=0.1348003
## Residual standard error: 0.4214583 on 9239 degrees of freedom
## _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
##
## Coefficients of k-Class Estimators:
##
## k Estimate Std. Error t value Pr(>|t|)
## OLS 0.0000 14.2283 4.4649 3.187 0.001444 **
## Fuller 0.9999 47.1681 12.1881 3.870 0.000110 ***
## TSLS 1.0000 47.1945 12.1924 3.871 0.000109 ***
## LIML 1.0000 47.1945 12.1924 3.871 0.000109 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
##
## Alternative tests for the treatment effect under H_0: beta=0.
##
## Anderson-Rubin test (under F distribution):
## F=15.07956, df1=1, df2=9238, p-value is 0.00010379
## 95 percent confidence interval:
## [23.3644072308337, 71.2284358822774]
##
## Conditional Likelihood Ratio test (under Normal approximation):
## Test Stat=15.07956, p-value is 0.00010379
## 95 percent confidence interval:
## [23.3644368880534, 71.2284057155105]
ITT measures the effect of being assigned to treatment, regardless of whether participants actually took it up. Individuals randomly assigned to the Job Corps program earned, on average, $16.05 more per week in the fourth year than those assigned to the control group. This average covers everyone who was assigned, both those who took up training and those who did not.
LATE estimates the effect of treatment on compliers, i.e., those who participate only because they were assigned to treatment. For compliers, participating in Job Corps raised weekly earnings in the fourth year by $47.19 on average. Take-up was 84.6% in the assigned group and 50.6% in the control group, so compliers are about 34% of the sample, which is why the LATE (16.05 / 0.34) is roughly three times the ITT. Because the LATE uses only the variation in participation induced by random assignment, it has a causal interpretation for compliers, provided assignment affects earnings only through training and no one is discouraged from training by being assigned (monotonicity).
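The distinction between ITT and LATE can be made concrete with simulated compliance types. The sketch below is hypothetical: the type shares are loosely set to resemble the take-up rates in the cross-tabulation, and the treatment effects (50 for compliers, 10 for always-takers) are invented for illustration. In real data the types are never observed.

```r
set.seed(4)
n <- 50000

# Hypothetical compliance types (unobservable in practice)
type <- sample(c("complier", "always_taker", "never_taker"),
               size = n, replace = TRUE, prob = c(0.35, 0.50, 0.15))
z <- rbinom(n, 1, 0.6)                      # random assignment

# Take-up: always-takers train regardless, never-takers never train,
# compliers train only when assigned
d <- ifelse(type == "always_taker", 1,
            ifelse(type == "never_taker", 0, z))

# Effect of training differs by type; only the complier effect is identified
effect <- ifelse(type == "complier", 50,
                 ifelse(type == "always_taker", 10, 0))
y <- 200 + effect * d + rnorm(n, sd = 50)

itt         <- mean(y[z == 1]) - mean(y[z == 0])
first_stage <- mean(d[z == 1]) - mean(d[z == 0])  # estimates the complier share
late        <- itt / first_stage

cat("ITT:           ", round(itt, 2), "\n")
cat("Complier share:", round(first_stage, 3), "\n")
cat("LATE:          ", round(late, 2), "\n")
```

The Wald ratio should come out near the complier effect of 50, not the always-taker effect, because always-takers train under either assignment and so contribute nothing to the difference in means.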
data <- read_dta("Card1995.dta")
#Create exp variable
data <- data %>%
mutate(exp = age76 - ed76 - 6,
exp_sq = exp^2/100)
ols_model <- lm(lwage76 ~ ed76 + exp + exp_sq + black + reg76r + smsa76r, data = data)
summary(ols_model)
##
## Call:
## lm(formula = lwage76 ~ ed76 + exp + exp_sq + black + reg76r +
## smsa76r, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.59297 -0.22315 0.01893 0.24223 1.33190
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.733664 0.067603 70.022 < 2e-16 ***
## ed76 0.074009 0.003505 21.113 < 2e-16 ***
## exp 0.083596 0.006648 12.575 < 2e-16 ***
## exp_sq -0.224088 0.031784 -7.050 2.21e-12 ***
## black -0.189632 0.017627 -10.758 < 2e-16 ***
## reg76r -0.124862 0.015118 -8.259 < 2e-16 ***
## smsa76r 0.161423 0.015573 10.365 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3742 on 3003 degrees of freedom
## (603 observations deleted due to missingness)
## Multiple R-squared: 0.2905, Adjusted R-squared: 0.2891
## F-statistic: 204.9 on 6 and 3003 DF, p-value: < 2.2e-16
iv_model <- ivreg(lwage76 ~ ed76 + exp + exp_sq + black + reg76r + smsa76r |
nearc4 + exp + exp_sq + black + reg76r + smsa76r, data = data)
summary(iv_model)
##
## Call:
## ivreg(formula = lwage76 ~ ed76 + exp + exp_sq + black + reg76r +
## smsa76r | nearc4 + exp + exp_sq + black + reg76r + smsa76r,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.82125 -0.24065 0.02368 0.25469 1.43205
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.75278 0.82934 4.525 6.27e-06 ***
## ed76 0.13229 0.04923 2.687 0.00725 **
## exp 0.10750 0.02130 5.047 4.76e-07 ***
## exp_sq -0.22841 0.03341 -6.836 9.84e-12 ***
## black -0.13080 0.05287 -2.474 0.01342 *
## reg76r -0.10490 0.02307 -4.546 5.67e-06 ***
## smsa76r 0.13132 0.03013 4.359 1.35e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.391 on 3003 degrees of freedom
## Multiple R-Squared: 0.2252, Adjusted R-squared: 0.2237
## Wald test: 120.8 on 6 and 3003 DF, p-value: < 2.2e-16
first_stage <- lm(ed76 ~ nearc4 + exp + exp_sq + black + reg76r + smsa76r, data = data)
summary(first_stage)
##
## Call:
## lm(formula = ed76 ~ nearc4 + exp + exp_sq + black + reg76r +
## smsa76r, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.6389 -1.4325 -0.1028 1.3268 6.2332
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.53964 0.16286 101.559 < 2e-16 ***
## nearc4 0.30628 0.07666 3.995 6.59e-05 ***
## exp -0.35881 0.03040 -11.805 < 2e-16 ***
## exp_sq -0.21620 0.14590 -1.482 0.138
## black -1.03873 0.08358 -12.428 < 2e-16 ***
## reg76r -0.32964 0.07385 -4.464 8.29e-06 ***
## smsa76r 0.39091 0.07788 5.019 5.44e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.982 on 3606 degrees of freedom
## Multiple R-squared: 0.4813, Adjusted R-squared: 0.4805
## F-statistic: 557.7 on 6 and 3606 DF, p-value: < 2.2e-16
reduced_form <- lm(lwage76 ~ nearc4 + exp + exp_sq + black + reg76r + smsa76r, data = data)
summary(reduced_form)
##
## Call:
## lm(formula = lwage76 ~ nearc4 + exp + exp_sq + black + reg76r +
## smsa76r, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.56525 -0.24771 0.01465 0.27091 1.38743
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.956604 0.036371 163.775 < 2e-16 ***
## nearc4 0.044624 0.017011 2.623 0.00876 **
## exp 0.053258 0.006948 7.666 2.38e-14 ***
## exp_sq -0.218720 0.034021 -6.429 1.49e-10 ***
## black -0.263903 0.018485 -14.277 < 2e-16 ***
## reg76r -0.143458 0.016336 -8.782 < 2e-16 ***
## smsa76r 0.184752 0.017503 10.555 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4005 on 3003 degrees of freedom
## (603 observations deleted due to missingness)
## Multiple R-squared: 0.1871, Adjusted R-squared: 0.1854
## F-statistic: 115.2 on 6 and 3003 DF, p-value: < 2.2e-16
stargazer(ols_model, iv_model, first_stage, reduced_form,
type = "text",
column.labels = c("OLS", "2SLS (IV)", "First Stage", "Reduced Form"),
title = "Card (1993) Replication Results",
keep.stat = c("n", "rsq", "adj.rsq"))
##
## Card (1993) Replication Results
## ============================================================
##                          Dependent variable:
##              -----------------------------------------------
##              lwage76                ed76        lwage76
##              OLS       instrumental OLS         OLS
##                        variable
##              OLS       2SLS (IV)    First Stage Reduced Form
##              (1)       (2)          (3)         (4)
## ------------------------------------------------------------
## ed76         0.074***  0.132***
##              (0.004)   (0.049)
##
## nearc4                              0.306***    0.045***
##                                     (0.077)     (0.017)
##
## exp          0.084***  0.107***     -0.359***   0.053***
##              (0.007)   (0.021)      (0.030)     (0.007)
##
## exp_sq       -0.224*** -0.228***    -0.216      -0.219***
##              (0.032)   (0.033)      (0.146)     (0.034)
##
## black        -0.190*** -0.131**     -1.039***   -0.264***
##              (0.018)   (0.053)      (0.084)     (0.018)
##
## reg76r       -0.125*** -0.105***    -0.330***   -0.143***
##              (0.015)   (0.023)      (0.074)     (0.016)
##
## smsa76r      0.161***  0.131***     0.391***    0.185***
##              (0.016)   (0.030)      (0.078)     (0.018)
##
## Constant     4.734***  3.753***     16.540***   5.957***
##              (0.068)   (0.829)      (0.163)     (0.036)
##
## ------------------------------------------------------------
## Observations 3,010     3,010        3,613       3,010
## R2           0.291     0.225        0.481       0.187
## Adjusted R2  0.289     0.224        0.480       0.185
## ============================================================
## Note:        *p<0.1; **p<0.05; ***p<0.01
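With a single instrument, the 2SLS coefficient equals the reduced-form coefficient divided by the first-stage coefficient, provided both regressions use the same observations. In the table above the ratio is 0.045 / 0.306 ≈ 0.146 and the 2SLS estimate is 0.132; the likely reason is that the first stage uses 3,613 observations while the other columns use 3,010. The sketch below shows the equivalence on simulated data; all variable names and coefficients are hypothetical stand-ins, not the Card data.

```r
set.seed(1995)
n <- 3000

# Simulated stand-ins (hypothetical)
ability <- rnorm(n)                         # unobserved confounder
z       <- rbinom(n, 1, 0.7)                # instrument, e.g. lives near a college
x       <- rnorm(n)                         # exogenous control
educ    <- 12 + 0.5 * z + 0.3 * x + ability + rnorm(n)
lwage   <- 5 + 0.10 * educ + 0.2 * x + 0.5 * ability + rnorm(n, sd = 0.4)

# Set some outcomes to missing, mimicking missing wages
lwage[sample(n, 500)] <- NA
df <- data.frame(lwage, educ, z, x)

# Restrict every regression to the same complete-case sample
est <- df[complete.cases(df), ]

first_stage  <- lm(educ ~ z + x, data = est)
reduced_form <- lm(lwage ~ z + x, data = est)

# Indirect least squares: reduced-form over first-stage coefficient
ils <- unname(coef(reduced_form)["z"] / coef(first_stage)["z"])

# Manual 2SLS point estimate (its standard errors would NOT be valid)
est$educ_hat <- fitted(first_stage)
tsls <- unname(coef(lm(lwage ~ educ_hat + x, data = est))["educ_hat"])

# Same ratio with the first stage run on all rows, as lm() does on df
first_stage_full <- lm(educ ~ z + x, data = df)
ils_mixed <- unname(coef(reduced_form)["z"] / coef(first_stage_full)["z"])

cat("ILS (same sample):  ", ils, "\n")
cat("Manual 2SLS:        ", tsls, "\n")
cat("ILS (mixed samples):", ils_mixed, "\n")
```

One way to make the first-stage column comparable would be to fit it on rows with non-missing `lwage76`; that is a suggestion, not something the original output does.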
iv_model2 <- ivreg(lwage76 ~ ed76 + exp + exp_sq + black + reg76r + smsa76r |
nearc4a + nearc4b + exp + exp_sq + black + reg76r + smsa76r, data = data)
summary(iv_model2)
##
## Call:
## ivreg(formula = lwage76 ~ ed76 + exp + exp_sq + black + reg76r +
## smsa76r | nearc4a + nearc4b + exp + exp_sq + black + reg76r +
## smsa76r, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.93985 -0.25152 0.01722 0.27365 1.48154
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.26801 0.68718 4.756 2.07e-06 ***
## ed76 0.16109 0.04077 3.951 7.96e-05 ***
## exp 0.11931 0.01818 6.564 6.16e-11 ***
## exp_sq -0.23054 0.03503 -6.582 5.46e-11 ***
## black -0.10173 0.04531 -2.245 0.0248 *
## reg76r -0.09504 0.02165 -4.389 1.18e-05 ***
## smsa76r 0.11645 0.02705 4.305 1.73e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4108 on 3003 degrees of freedom
## Multiple R-Squared: 0.1447, Adjusted R-squared: 0.143
## Wald test: 111 on 6 and 3003 DF, p-value: < 2.2e-16
first_stage2 <- lm(ed76 ~ nearc4a + nearc4b + exp + exp_sq + black + reg76r + smsa76r, data = data)
summary(first_stage2)
##
## Call:
## lm(formula = ed76 ~ nearc4a + nearc4b + exp + exp_sq + black +
## reg76r + smsa76r, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.7266 -1.4340 -0.0617 1.3095 6.2239
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.52964 0.16266 101.621 < 2e-16 ***
## nearc4a 0.39599 0.08119 4.877 1.12e-06 ***
## nearc4b 0.09648 0.09929 0.972 0.331
## exp -0.36039 0.03036 -11.872 < 2e-16 ***
## exp_sq -0.20488 0.14573 -1.406 0.160
## black -1.03976 0.08347 -12.457 < 2e-16 ***
## reg76r -0.30427 0.07414 -4.104 4.15e-05 ***
## smsa76r 0.38804 0.07778 4.989 6.35e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.979 on 3605 degrees of freedom
## Multiple R-squared: 0.4829, Adjusted R-squared: 0.4819
## F-statistic: 481 on 7 and 3605 DF, p-value: < 2.2e-16
reduced_form2 <- lm(lwage76 ~ nearc4a + nearc4b + exp + exp_sq + black + reg76r + smsa76r, data = data)
summary(reduced_form2)
##
## Call:
## lm(formula = lwage76 ~ nearc4a + nearc4b + exp + exp_sq + black +
## reg76r + smsa76r, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.58218 -0.24720 0.01594 0.26671 1.36660
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.9562112 0.0363139 164.020 < 2e-16 ***
## nearc4a 0.0640062 0.0180132 3.553 0.000386 ***
## nearc4b -0.0001056 0.0219131 -0.005 0.996155
## exp 0.0525799 0.0069398 7.577 4.70e-14 ***
## exp_sq -0.2146431 0.0339914 -6.315 3.11e-10 ***
## black -0.2639382 0.0184556 -14.301 < 2e-16 ***
## reg76r -0.1383836 0.0163857 -8.445 < 2e-16 ***
## smsa76r 0.1838926 0.0174780 10.521 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3999 on 3002 degrees of freedom
## (603 observations deleted due to missingness)
## Multiple R-squared: 0.1899, Adjusted R-squared: 0.188
## F-statistic: 100.5 on 7 and 3002 DF, p-value: < 2.2e-16
stargazer(iv_model2, first_stage2, reduced_form2,
type = "text",
column.labels = c("IV Model", "First Stage", "Reduced Form"),
title = "IV Estimation Using Public and Private College Proximity",
keep.stat = c("n", "rsq", "adj.rsq"))
##
## IV Estimation Using Public and Private College Proximity
## ==================================================
##                       Dependent variable:
##              -------------------------------------
##              lwage76      ed76        lwage76
##              instrumental OLS         OLS
##              variable
##              IV Model     First Stage Reduced Form
##              (1)          (2)         (3)
## --------------------------------------------------
## ed76         0.161***
##              (0.041)
##
## nearc4a                   0.396***    0.064***
##                           (0.081)     (0.018)
##
## nearc4b                   0.096       -0.0001
##                           (0.099)     (0.022)
##
## exp          0.119***     -0.360***   0.053***
##              (0.018)      (0.030)     (0.007)
##
## exp_sq       -0.231***    -0.205      -0.215***
##              (0.035)      (0.146)     (0.034)
##
## black        -0.102**     -1.040***   -0.264***
##              (0.045)      (0.083)     (0.018)
##
## reg76r       -0.095***    -0.304***   -0.138***
##              (0.022)      (0.074)     (0.016)
##
## smsa76r      0.116***     0.388***    0.184***
##              (0.027)      (0.078)     (0.017)
##
## Constant     3.268***     16.530***   5.956***
##              (0.687)      (0.163)     (0.036)
##
## --------------------------------------------------
## Observations 3,010        3,613       3,010
## R2           0.145        0.483       0.190
## Adjusted R2  0.143        0.482       0.188
## ==================================================
## Note:        *p<0.1; **p<0.05; ***p<0.01
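The F-statistics that `summary()` prints for the first-stage regressions (557.7 and 481) test all regressors jointly, so they mostly reflect the controls. Instrument strength is usually judged by the F-statistic on the excluded instruments only; with the single instrument nearc4 this is the squared t-value, 3.995² ≈ 16. The sketch below computes that partial F with `anova()` on simulated data; the variables and coefficients are hypothetical.

```r
set.seed(12)
n <- 3000

# Simulated stand-ins (hypothetical): one stronger and one weaker instrument
z1   <- rbinom(n, 1, 0.5)
z2   <- rbinom(n, 1, 0.2)
x    <- rnorm(n)                            # included exogenous control
educ <- 12 + 0.4 * z1 + 0.05 * z2 + 0.8 * x + rnorm(n, sd = 2)

# First stage with instruments vs. restricted model with controls only
fs_full       <- lm(educ ~ z1 + z2 + x)
fs_restricted <- lm(educ ~ x)

# F test of H0: the coefficients on z1 and z2 are jointly zero
partial_f  <- anova(fs_restricted, fs_full)
f_excluded <- partial_f$F[2]

# Overall regression F, which also picks up the explanatory power of x
f_overall <- unname(summary(fs_full)$fstatistic["value"])

cat("F on excluded instruments:", round(f_excluded, 2), "\n")
cat("Overall regression F:     ", round(f_overall, 2), "\n")
```

Applied to the data above, the same comparison would use `first_stage2` against a model that drops `nearc4a` and `nearc4b`, ideally on the 3,010-observation estimation sample.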
Yes, education is endogenous in the wage equation: schooling is not randomly assigned but chosen by individuals, and those choices depend on factors such as ability and family background that also affect wages. To address this endogeneity, Card uses geographic proximity to a college as an instrument for education.
Here, experience is treated as exogenous. Note, however, that it is not observed directly but constructed as age76 - ed76 - 6, so any endogeneity in education can carry over to experience.
data$int1 <- data$nearc4a * data$age76
data$int2 <- data$nearc4a * (data$age76^2 / 100)
iv_structural <- ivreg(lwage76 ~ ed76 + exp + exp_sq + black + reg76r + smsa76r |
nearc4a + nearc4b + int1 + int2 + black + reg76r + smsa76r, data = data)
summary(iv_structural)
##
## Call:
## ivreg(formula = lwage76 ~ ed76 + exp + exp_sq + black + reg76r +
## smsa76r | nearc4a + nearc4b + int1 + int2 + black + reg76r +
## smsa76r, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.10210 -0.28407 0.02576 0.29699 1.49721
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.91429 0.46315 8.451 < 2e-16 ***
## ed76 0.16815 0.04408 3.815 0.000139 ***
## exp -0.02323 0.04422 -0.525 0.599389
## exp_sq 0.32689 0.22106 1.479 0.139313
## black -0.04971 0.07055 -0.705 0.481085
## reg76r -0.08832 0.02731 -3.234 0.001234 **
## smsa76r 0.08118 0.04578 1.773 0.076320 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4568 on 3003 degrees of freedom
## Multiple R-Squared: -0.0575, Adjusted R-squared: -0.05961
## Wald test: 105.3 on 6 and 3003 DF, p-value: < 2.2e-16
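In the previous estimation, where only education is instrumented, the coefficient on education is 0.161, meaning that an additional year of education raises wages by roughly 16%. When education, experience, and experience squared are all instrumented, the coefficient is 0.168 (about 16.8%), very close to the earlier value. This suggests that treating experience as exogenous has little effect on the estimated return to education. The experience coefficients themselves change considerably (exp: 0.119 to -0.023; exp_sq: -0.231 to 0.327) and are no longer significant, so this comparison alone is not a formal test of whether experience is endogenous.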
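A more direct check than comparing coefficients is a regression-based Durbin-Wu-Hausman test: save the first-stage residuals of the suspect regressor and add them to the structural equation. The sketch below uses simulated data in which one regressor is endogenous by construction; all names and coefficients are hypothetical.

```r
set.seed(76)
n <- 4000

# Simulated stand-ins (hypothetical)
u     <- rnorm(n)                           # unobserved confounder
z     <- rnorm(n)                           # instrument
x_end <- 0.7 * z + u + rnorm(n)             # regressor that IS endogenous
x_exo <- rnorm(n)                           # regressor that is exogenous
y     <- 1 + 0.5 * x_end + 0.3 * x_exo + u + rnorm(n)

# Step 1: first stage for the suspect regressor, keep the residuals
v_hat <- resid(lm(x_end ~ z + x_exo))

# Step 2: add the residuals to the structural equation
aug <- lm(y ~ x_end + x_exo + v_hat)

# A significant coefficient on v_hat is evidence against exogeneity of x_end
dwh <- summary(aug)$coefficients["v_hat", ]
print(round(dwh, 4))

# The coefficient on x_end in this regression equals the 2SLS estimate
cf_est <- unname(coef(aug)["x_end"])
cat("Control-function estimate:", round(cf_est, 3), "\n")
```

For the models fitted with `ivreg()` from AER, `summary(iv_model2, diagnostics = TRUE)` reports a Wu-Hausman test together with weak-instrument and overidentification diagnostics.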