Setting Working Directory

setwd("C:/Users/Lenovo/OneDrive - The Pennsylvania State University/Penn State/Coursework/Sem IV/EEFE 530/Problem Sets/Problem Set 4")

#Libraries

# Load packages
library(pacman)
# Note: the matching package is spelled 'MatchIt' (case-sensitive); 'Zelig' is not
# available on CRAN for R 4.4 and is not used below, so it is dropped.
p_load(Matching, Jmisc, lmtest, sandwich, kdensity, haven, boot, 
       cobalt, MatchIt, estimatr, cem, tidyverse, 
       lubridate, usmap, gridExtra, stringr, readxl, plot3D,  
       cowplot, reshape2, scales, broom, data.table, ggplot2, stargazer,  
       foreign, ggthemes, ggforce, ggridges, latex2exp, viridis, extrafont, 
       kableExtra, snakecase, janitor)

Part i. Example of an IV Paper

  1. Citation of Paper Selected: Taraz, V. (2017). Adaptation to Climate Change: Historical Evidence from the Indian Monsoon. Environment and Development Economics, 22(5), 517–545.

  2. The paper asks how Indian farmers adapt to medium-run variations in monsoon rainfall and to what extent adaptation can offset climate-induced income losses. It examines whether farmers adjust irrigation and crop choices in response to persistent rainfall shifts.

  3. Outcome variable: annual farm profits per hectare. Treatment variable: wealth, measured as the hectares of land owned by the household. Endogeneity concern: wealth is likely endogenous because it may be correlated with unobserved farmer characteristics (e.g., productivity, risk preferences) or with past economic shocks that also influence profits.

  4. IV: Inherited land (land passed down through generations and not acquired through market transactions).

  5. DAG

# Load libraries
library(dagitty)
## Warning: package 'dagitty' was built under R version 4.4.3
library(ggdag)
## Warning: package 'ggdag' was built under R version 4.4.2
## 
## Attaching package: 'ggdag'
## The following object is masked from 'package:stats':
## 
##     filter
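# Define the DAG
# Unobs: unobserved farmer characteristics (ability, productivity) that affect
# both Wealth and Profit; marked latent because it is not observed in the data
dag <- dagitty("dag {
  Unobs [latent]
  InheritedLand -> Wealth -> Profit
  Unobs -> Wealth
  Unobs -> Profit
}")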

# Plot the DAG
ggdag(dag, text = FALSE, use_labels = "name") +
  theme_minimal() +
  labs(title = "DAG for IV Strategy in Vis Taraz (2017)")

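Wealth is endogenous in the profit equation because unobservables such as farmer ability and productivity affect both wealth and profits. The node “Unobs” represents these unobservables: the path Wealth ← Unobs → Profit is what biases a simple regression of profit on wealth, and the instrument, which affects Profit only through Wealth, avoids it.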

  6. The IV, inherited land, is relevant because it is a strong predictor of a farmer’s current landholdings (wealth). In agrarian India, most land is passed down across generations, so inherited land explains substantial variation in current wealth. The paper confirms this with a strong first-stage relationship.

  7. The IV is plausibly valid because inherited land is determined by historical family transfers, not by the farmer’s recent productivity or profits. The identifying assumption (the exclusion restriction) is that inherited land affects profits only through wealth and not through any other pathway. A potential violation would be farming skills passed down along with the land. This assumption cannot be tested directly and has to be argued on institutional grounds.

  8. The paper finds that farmers adapt to medium-run rainfall declines by adjusting irrigation and crop choices (e.g., expanding irrigation), but these adaptations recover only a small share (25-30%) of lost profits. Wealthier farmers adapt more effectively, and rainfall shocks have a smaller effect on their profits. Overall, adaptation mitigates but does not fully offset the negative effects of changing monsoon patterns.

Part ii. Hausman-Nevo instrument

I chose the DellaVigna and Gentzkow (QJE 2019) paper, which asks why retail chains charge nearly uniform prices across their stores despite differences in local demand. The outcome studied is the quantity of an item sold in a particular store at a particular time.

Problem: price is endogenous in a regression of quantity sold on price, because prices may respond to local demand shocks (e.g., if demand unexpectedly rises, prices might increase). Simply regressing quantity on price would therefore bias the estimated price elasticity of demand.

The authors therefore use a Hausman-Nevo IV, with local income (household income at the ZIP-code level) as the instrument for prices. Their argument is that local income affects price sensitivity (elasticity) and thus indirectly influences pricing decisions (relevance). However, local income is unlikely to be directly correlated with store-level demand shocks for specific products at specific times (exogeneity).
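For comparison, the classic Hausman-Nevo instrument is the price of the same product in other markets. It shares common cost shocks with the local price but is plausibly uncorrelated with local demand shocks. The sketch below is a hypothetical simulation of that idea and does not use DellaVigna and Gentzkow's data or specification. Store-week log prices depend on two things: a week-level cost shock shared by all stores in a chain, and the store's own demand shock. The instrument is the leave-one-out mean log price at the chain's other stores in the same week. The true elasticity of -2 and all other parameters are assumptions chosen for illustration.

```r
# Hypothetical simulation of a Hausman-Nevo style instrument (base R only)
set.seed(530)

n_stores <- 50                # stores in one chain (assumed)
n_weeks  <- 200               # time periods (assumed)
true_elasticity <- -2         # assumed price elasticity of demand
n_obs <- n_stores * n_weeks

week <- rep(seq_len(n_weeks), each = n_stores)

cost_shock   <- rnorm(n_weeks, sd = 0.10)[week]  # common to all stores in a week
demand_shock <- rnorm(n_obs, sd = 0.10)          # store-week specific

# Prices respond to costs AND to local demand: the source of endogeneity
log_price <- 1 + cost_shock + 0.5 * demand_shock + rnorm(n_obs, sd = 0.02)
log_qty   <- 5 + true_elasticity * log_price + demand_shock + rnorm(n_obs, sd = 0.05)

# Leave-one-out mean log price at the other stores in the same week
week_sum <- ave(log_price, week, FUN = sum)
hn_iv    <- (week_sum - log_price) / (n_stores - 1)

# OLS slope: biased because log_price is correlated with demand_shock
ols_slope <- unname(coef(lm(log_qty ~ log_price))[2])

# Just-identified IV estimate: cov(y, z) / cov(x, z)
iv_slope <- cov(log_qty, hn_iv) / cov(log_price, hn_iv)

round(c(OLS = ols_slope, IV = iv_slope), 3)
```

OLS is biased toward zero because the own-store demand shock enters the price. The leave-one-out instrument moves only with the shared cost shock, so it lands close to -2. A common critique of such instruments is that the exclusion restriction fails if demand shocks are correlated across stores (e.g., chain-wide advertising).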

Part iii. ITT and LATE

# Install causalweight only if it is not already installed
if (!requireNamespace("causalweight", quietly = TRUE)) install.packages("causalweight")
library(causalweight)
## Loading required package: ranger
## Warning: package 'ranger' was built under R version 4.4.3
# Load data
data(JC)
table(JC$assignment)
## 
##    0    1 
## 3663 5577
table(JC$trainy1)
## 
##    0    1 
## 2666 6574
#table(JC$trainy2)
table(JC$assignment, JC$trainy1)
##    
##        0    1
##   0 1809 1854
##   1  857 4720
prop.table(table(JC$assignment, JC$trainy1))
##    
##              0          1
##   0 0.19577922 0.20064935
##   1 0.09274892 0.51082251
as.data.frame(table(JC$assignment, JC$trainy1))
##   Var1 Var2 Freq
## 1    0    0 1809
## 2    1    0  857
## 3    0    1 1854
## 4    1    1 4720

(a) Intent-to-Treat (ITT) Effect

(1)

# ITT: difference in mean fourth-year weekly earnings (earny4) between the assigned and control groups
ITT <- mean(JC$earny4[JC$assignment == 1]) - mean(JC$earny4[JC$assignment == 0])

# ITT effect
print(ITT)
## [1] 16.05513

The ITT estimate of 16.05 means that, on average, individuals randomly assigned to the Job Corps (JC) program earned $16.05 more per week in the fourth year after assignment than those assigned to the control group.

(2) Regression Model

# Run the ITT regression
ITT_model <- lm(earny4 ~ assignment, data = JC)

# Display full regression output
summary(ITT_model)
## 
## Call:
## lm(formula = earny4 ~ assignment, data = JC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -213.98 -164.65  -24.02   99.25 2211.98 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  197.926      3.212  61.620  < 2e-16 ***
## assignment    16.055      4.134   3.883 0.000104 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 194.4 on 9238 degrees of freedom
## Multiple R-squared:  0.00163,    Adjusted R-squared:  0.001522 
## F-statistic: 15.08 on 1 and 9238 DF,  p-value: 0.0001038
# Extract ITT estimate (coefficient for assignment) and standard error
ITT_estimate <- coef(ITT_model)["assignment"]
SE <- summary(ITT_model)$coefficients["assignment", "Std. Error"]

# Print results
cat("ITT Estimate:", ITT_estimate, "\n")
## ITT Estimate: 16.05513
cat("Standard Error:", SE, "\n")
## Standard Error: 4.134466
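With a single binary regressor, OLS fits the two group means, so the coefficient on assignment is exactly the difference in means from (1). The default summary() standard error assumes equal error variances in the two groups, and a heteroskedasticity-robust alternative is often preferred in experiments. On the actual data, one option would be lmtest::coeftest(ITT_model, vcov. = sandwich::vcovHC(ITT_model, type = "HC2")), since both packages are loaded above. The base-R sketch below uses simulated data (hypothetical variables z and y) to check two identities. First, the regression coefficient equals the difference in means. Second, the HC2 robust standard error equals the unequal-variance (Welch) standard error of that difference.

```r
# Hypothetical simulated experiment: binary assignment z and outcome y,
# with different error variances in the two arms
set.seed(1)
n <- 1000
z <- rbinom(n, 1, 0.6)
y <- 200 + 16 * z + rnorm(n, sd = ifelse(z == 1, 220, 160))

fit <- lm(y ~ z)
diff_means <- mean(y[z == 1]) - mean(y[z == 0])
b_z <- unname(coef(fit)["z"])

# Conventional (equal-variance) SE reported by summary(lm)
se_classic <- summary(fit)$coefficients["z", "Std. Error"]

# HC2 robust SE by hand: sandwich with squared residuals scaled by 1 / (1 - h_ii)
X <- model.matrix(fit)
e <- resid(fit)
h <- hatvalues(fit)
bread <- solve(crossprod(X))
meat  <- crossprod(X * (e / sqrt(1 - h)))
se_hc2 <- unname(sqrt(diag(bread %*% meat %*% bread))[2])

# Unequal-variance (Welch) SE of the difference in means
se_welch <- sqrt(var(y[z == 1]) / sum(z == 1) + var(y[z == 0]) / sum(z == 0))

round(c(coef = b_z, diff_means = diff_means,
        se_classic = se_classic, se_hc2 = se_hc2, se_welch = se_welch), 4)
```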

(b) Complier Share

(1)

# Compute actual treatment take-up rates
takeup_treatment <- mean(JC$trainy1[JC$assignment == 1])  # Among those assigned to treatment
takeup_control <- mean(JC$trainy1[JC$assignment == 0])    # Among those assigned to control

# Compute the difference
takeup_difference <- takeup_treatment - takeup_control

# Print results
cat("Take-up Rate (Treatment Group):", takeup_treatment, "\n")
## Take-up Rate (Treatment Group): 0.8463332
cat("Take-up Rate (Control Group):", takeup_control, "\n")
## Take-up Rate (Control Group): 0.5061425
cat("Difference in Take-up Rates:", takeup_difference, "\n")
## Difference in Take-up Rates: 0.3401906

(2)

# Run the regression of actual treatment on assignment
takeup_model <- lm(trainy1 ~ assignment, data = JC)

# Display regression results
summary(takeup_model)
## 
## Call:
## lm(formula = trainy1 ~ assignment, data = JC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8463 -0.5061  0.1537  0.1537  0.4939 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.506143   0.006964   72.68   <2e-16 ***
## assignment  0.340191   0.008963   37.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4215 on 9238 degrees of freedom
## Multiple R-squared:  0.1349, Adjusted R-squared:  0.1348 
## F-statistic:  1440 on 1 and 9238 DF,  p-value: < 2.2e-16
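Three standard LATE assumptions apply here: random assignment, the exclusion restriction, and monotonicity (i.e., no defiers who would take up training only if not assigned). Under these assumptions, the two take-up rates identify the population shares of the three compliance types. Noncompliance is two-sided in this sample, since about half of the control group also has trainy1 = 1. The sketch below recomputes the shares from the counts printed by table(JC$assignment, JC$trainy1) above. The complier share equals the first-stage coefficient on assignment.

```r
# Counts from table(JC$assignment, JC$trainy1) above
# rows: assignment (0, 1); columns: trainy1 (0, 1)
counts <- matrix(c(1809, 1854,
                   857,  4720),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(assignment = c("0", "1"), trainy1 = c("0", "1")))

p_take_z1 <- counts["1", "1"] / sum(counts["1", ])  # P(D = 1 | Z = 1)
p_take_z0 <- counts["0", "1"] / sum(counts["0", ])  # P(D = 1 | Z = 0)

shares <- c(
  always_takers = p_take_z0,              # take up training even if not assigned
  never_takers  = 1 - p_take_z1,          # do not take up even if assigned
  compliers     = p_take_z1 - p_take_z0   # take up only when assigned
)
round(shares, 3)
```

The shares come out to roughly 0.506 always-takers, 0.154 never-takers and 0.340 compliers. Together they sum to one by construction.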

(c) Complier/Local Average Treatment Effect (LATE)

(1) Without Regression

# Compute ITT on earnings
ITT_earnings <- mean(JC$earny4[JC$assignment == 1]) - mean(JC$earny4[JC$assignment == 0])

# Compute difference in take-up rates
takeup_treatment <- mean(JC$trainy1[JC$assignment == 1])  # Among assigned to treatment
takeup_control <- mean(JC$trainy1[JC$assignment == 0])    # Among assigned to control
takeup_difference <- takeup_treatment - takeup_control

# Compute LATE
LATE <- ITT_earnings / takeup_difference

# Print results
cat("ITT on Earnings:", ITT_earnings, "\n")
## ITT on Earnings: 16.05513
cat("Difference in Take-up Rates:", takeup_difference, "\n")
## Difference in Take-up Rates: 0.3401906
cat("Estimated LATE:", LATE, "\n")
## Estimated LATE: 47.1945
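The ratio above is the Wald estimator. A hypothetical simulation helps show why it recovers the effect for compliers specifically. Below, always-takers, never-takers and compliers have different treatment effects (10, 0 and 50) and different baseline earnings, all assumed for illustration. ITT divided by the first stage still lands near the complier effect of 50. A naive comparison of treated and untreated individuals does not, because it mixes types with different baselines and effects.

```r
# Hypothetical population with two-sided noncompliance and heterogeneous effects
set.seed(2024)
n <- 100000

type <- sample(c("always", "never", "complier"), n, replace = TRUE,
               prob = c(0.50, 0.15, 0.35))
effect   <- unname(c(always = 10, never = 0, complier = 50)[type])
baseline <- unname(c(always = 20, never = -10, complier = 0)[type])

z <- rbinom(n, 1, 0.6)                                              # random assignment
d <- as.integer(type == "always" | (type == "complier" & z == 1))   # take-up, no defiers

y0 <- 200 + baseline + rnorm(n, sd = 50)   # untreated potential outcome
y  <- y0 + d * effect                      # observed outcome

itt         <- mean(y[z == 1]) - mean(y[z == 0])
first_stage <- mean(d[z == 1]) - mean(d[z == 0])
wald        <- itt / first_stage

naive <- mean(y[d == 1]) - mean(y[d == 0])   # treated vs untreated comparison
round(c(ITT = itt, first_stage = first_stage, Wald = wald, naive = naive), 2)
```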

(2) ivreg

# Install AER only if it is not already installed
if (!requireNamespace("AER", quietly = TRUE)) install.packages("AER")
library(AER)
## Warning: package 'AER' was built under R version 4.4.3
## Loading required package: car
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
## The following object is masked from 'package:boot':
## 
##     logit
## The following object is masked from 'package:Jmisc':
## 
##     recode
## Loading required package: survival
## 
## Attaching package: 'survival'
## The following object is masked from 'package:boot':
## 
##     aml
# Estimate LATE using IV regression (2SLS)
LATE_model <- ivreg(earny4 ~ trainy1 | assignment, data = JC)

# Display regression results
summary(LATE_model)
## 
## Call:
## ivreg(formula = earny4 ~ trainy1 | assignment, data = JC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -221.23 -165.43  -22.55  100.01 2235.87 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  174.039      8.909  19.536  < 2e-16 ***
## trainy1       47.194     12.192   3.871 0.000109 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 195 on 9238 degrees of freedom
## Multiple R-Squared: -0.004797,   Adjusted R-squared: -0.004905 
## Wald test: 14.98 on 1 and 9238 DF,  p-value: 0.0001092
# Using "ivmodel"
if (!requireNamespace("ivmodel", quietly = TRUE)) install.packages("ivmodel")
library(ivmodel)
## Warning: package 'ivmodel' was built under R version 4.4.3
# Estimate IV model
iv_result <- ivmodel(Y = JC$earny4, D = JC$trainy1, Z = JC$assignment)

# Display results
summary(iv_result)
## 
## Call:
## ivmodel(Y = JC$earny4, D = JC$trainy1, Z = JC$assignment)
## sample size: 9240
## _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
## 
## First Stage Regression Result:
## 
## F=1440.46, df1=1, df2=9238, p-value is < 2.22e-16
## R-squared=0.1348939,   Adjusted R-squared=0.1348003
## Residual standard error: 0.4214583 on 9239 degrees of freedom
## _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
## 
## Coefficients of k-Class Estimators:
## 
##              k Estimate Std. Error t value Pr(>|t|)    
## OLS     0.0000  14.2283     4.4649   3.187 0.001444 ** 
## Fuller  0.9999  47.1681    12.1881   3.870 0.000110 ***
## TSLS    1.0000  47.1945    12.1924   3.871 0.000109 ***
## LIML    1.0000  47.1945    12.1924   3.871 0.000109 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
## 
## Alternative tests for the treatment effect under H_0: beta=0.
## 
## Anderson-Rubin test (under F distribution):
## F=15.07956, df1=1, df2=9238, p-value is 0.00010379
## 95 percent confidence interval:
##  [23.3644072308337, 71.2284358822774]
## 
## Conditional Likelihood Ratio test (under Normal approximation):
## Test Stat=15.07956, p-value is 0.00010379
## 95 percent confidence interval:
##  [23.3644368880534, 71.2284057155105]
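With one binary instrument and no covariates, 2SLS reduces to the Wald ratio from (1), which is why ivreg() and ivmodel()'s TSLS row both report 47.1945. The base-R sketch below uses simulated data (hypothetical variables y, d, z and an unobserved confounder u) to run the two stages by hand. The point estimate from regressing y on the first-stage fitted values equals the Wald ratio. The standard error printed by that naive second-stage lm() is not the 2SLS standard error, however. It uses residuals built from the fitted values, whereas the homoskedastic 2SLS formula (what ivreg() reports by default, to our understanding) uses residuals built from the actual treatment.

```r
# Hypothetical data with an unobserved confounder u
set.seed(42)
n <- 5000
z <- rbinom(n, 1, 0.5)
u <- rnorm(n)
d <- as.integer(0.8 * z + 0.5 * u + rnorm(n) > 0.5)
y <- 1 + 2 * d + u + rnorm(n)

# Stage 1: first-stage fitted values
dhat <- fitted(lm(d ~ z))

# Stage 2: regress y on the fitted values
second <- lm(y ~ dhat)
a_2sls <- unname(coef(second)["(Intercept)"])
b_2sls <- unname(coef(second)["dhat"])

# Wald ratio: ITT / first stage
wald <- (mean(y[z == 1]) - mean(y[z == 0])) / (mean(d[z == 1]) - mean(d[z == 0]))

# Correct homoskedastic 2SLS SE: residuals use the actual d, not dhat
resid_correct <- y - a_2sls - b_2sls * d
sigma_correct <- sqrt(sum(resid_correct^2) / (n - 2))
se_correct <- sigma_correct / sqrt(sum((dhat - mean(dhat))^2))

# Naive SE from the second-stage lm(): not valid for 2SLS
se_naive <- summary(second)$coefficients["dhat", "Std. Error"]

round(c(b_2sls = b_2sls, wald = wald, se_correct = se_correct, se_naive = se_naive), 4)
```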

(3) Interpretation

ITT measures the effect of being assigned to treatment, regardless of whether participants actually took it up. Individuals randomly assigned to the Job Corps program earned, on average, $16.05 more per week in the fourth year than those assigned to the control group. Noncompliance is two-sided here: about 51% of the control group also has trainy1 = 1. The ITT therefore averages over compliers, whose take-up follows their assignment, and over always-takers and never-takers, whose take-up does not depend on assignment.

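LATE estimates the effect of treatment on compliers, i.e., those who take up training (trainy1 = 1) when assigned to Job Corps and do not when assigned to the control group. For compliers, take-up raised weekly earnings in the fourth year by an estimated $47.19 (SE 12.19). This equals the ITT scaled up by the complier share: 16.06 / 0.340 ≈ 47.19. The estimate uses only the variation in take-up induced by random assignment. It therefore identifies a causal effect for compliers under the IV assumptions (random assignment, exclusion restriction and monotonicity), but it need not equal the average effect for always-takers, never-takers or the population as a whole.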

Part iv. Replication and Extension

data <- read_dta("Card1995.dta")

# Create potential experience (age - years of schooling - 6) and its square divided by 100
data <- data %>%
  mutate(exp = age76 - ed76 - 6,
         exp_sq = exp^2/100) 

(a) Different Estimations

(1) OLS Estimation

ols_model <- lm(lwage76 ~ ed76 + exp + exp_sq + black + reg76r + smsa76r, data = data)

summary(ols_model)
## 
## Call:
## lm(formula = lwage76 ~ ed76 + exp + exp_sq + black + reg76r + 
##     smsa76r, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.59297 -0.22315  0.01893  0.24223  1.33190 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.733664   0.067603  70.022  < 2e-16 ***
## ed76         0.074009   0.003505  21.113  < 2e-16 ***
## exp          0.083596   0.006648  12.575  < 2e-16 ***
## exp_sq      -0.224088   0.031784  -7.050 2.21e-12 ***
## black       -0.189632   0.017627 -10.758  < 2e-16 ***
## reg76r      -0.124862   0.015118  -8.259  < 2e-16 ***
## smsa76r      0.161423   0.015573  10.365  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3742 on 3003 degrees of freedom
##   (603 observations deleted due to missingness)
## Multiple R-squared:  0.2905, Adjusted R-squared:  0.2891 
## F-statistic: 204.9 on 6 and 3003 DF,  p-value: < 2.2e-16

(2) IV Estimation

iv_model <- ivreg(lwage76 ~ ed76 + exp + exp_sq + black + reg76r + smsa76r |
                    nearc4 + exp + exp_sq + black + reg76r + smsa76r, data = data)
summary(iv_model)
## 
## Call:
## ivreg(formula = lwage76 ~ ed76 + exp + exp_sq + black + reg76r + 
##     smsa76r | nearc4 + exp + exp_sq + black + reg76r + smsa76r, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.82125 -0.24065  0.02368  0.25469  1.43205 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.75278    0.82934   4.525 6.27e-06 ***
## ed76         0.13229    0.04923   2.687  0.00725 ** 
## exp          0.10750    0.02130   5.047 4.76e-07 ***
## exp_sq      -0.22841    0.03341  -6.836 9.84e-12 ***
## black       -0.13080    0.05287  -2.474  0.01342 *  
## reg76r      -0.10490    0.02307  -4.546 5.67e-06 ***
## smsa76r      0.13132    0.03013   4.359 1.35e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.391 on 3003 degrees of freedom
## Multiple R-Squared: 0.2252,  Adjusted R-squared: 0.2237 
## Wald test: 120.8 on 6 and 3003 DF,  p-value: < 2.2e-16

(3) First Stage

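# Note: this first stage keeps all 3,613 observations with non-missing ed76, whereas
# the 2SLS and reduced-form regressions use only the 3,010 with non-missing lwage76
first_stage <- lm(ed76 ~ nearc4 + exp + exp_sq + black + reg76r + smsa76r, data = data)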

summary(first_stage)
## 
## Call:
## lm(formula = ed76 ~ nearc4 + exp + exp_sq + black + reg76r + 
##     smsa76r, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.6389 -1.4325 -0.1028  1.3268  6.2332 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 16.53964    0.16286 101.559  < 2e-16 ***
## nearc4       0.30628    0.07666   3.995 6.59e-05 ***
## exp         -0.35881    0.03040 -11.805  < 2e-16 ***
## exp_sq      -0.21620    0.14590  -1.482    0.138    
## black       -1.03873    0.08358 -12.428  < 2e-16 ***
## reg76r      -0.32964    0.07385  -4.464 8.29e-06 ***
## smsa76r      0.39091    0.07788   5.019 5.44e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.982 on 3606 degrees of freedom
## Multiple R-squared:  0.4813, Adjusted R-squared:  0.4805 
## F-statistic: 557.7 on 6 and 3606 DF,  p-value: < 2.2e-16

(4) Reduced Form

reduced_form <- lm(lwage76 ~ nearc4 + exp + exp_sq + black + reg76r + smsa76r, data = data)

summary(reduced_form)
## 
## Call:
## lm(formula = lwage76 ~ nearc4 + exp + exp_sq + black + reg76r + 
##     smsa76r, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.56525 -0.24771  0.01465  0.27091  1.38743 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.956604   0.036371 163.775  < 2e-16 ***
## nearc4       0.044624   0.017011   2.623  0.00876 ** 
## exp          0.053258   0.006948   7.666 2.38e-14 ***
## exp_sq      -0.218720   0.034021  -6.429 1.49e-10 ***
## black       -0.263903   0.018485 -14.277  < 2e-16 ***
## reg76r      -0.143458   0.016336  -8.782  < 2e-16 ***
## smsa76r      0.184752   0.017503  10.555  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4005 on 3003 degrees of freedom
##   (603 observations deleted due to missingness)
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1854 
## F-statistic: 115.2 on 6 and 3003 DF,  p-value: < 2.2e-16

Display Results

stargazer(ols_model, iv_model, first_stage, reduced_form,
          type = "text",
          column.labels = c("OLS", "2SLS (IV)", "First Stage", "Reduced Form"),
          title = "Card (1993) Replication Results",
          keep.stat = c("n", "rsq", "adj.rsq"))
## 
## Card (1993) Replication Results
## ============================================================
##                            Dependent variable:              
##              -----------------------------------------------
##                     lwage76            ed76       lwage76   
##                 OLS    instrumental     OLS         OLS     
##                          variable                           
##                 OLS     2SLS (IV)   First Stage Reduced Form
##                 (1)        (2)          (3)         (4)     
## ------------------------------------------------------------
## ed76         0.074***    0.132***                           
##               (0.004)    (0.049)                            
##                                                             
## nearc4                               0.306***     0.045***  
##                                       (0.077)     (0.017)   
##                                                             
## exp          0.084***    0.107***    -0.359***    0.053***  
##               (0.007)    (0.021)      (0.030)     (0.007)   
##                                                             
## exp_sq       -0.224***  -0.228***     -0.216     -0.219***  
##               (0.032)    (0.033)      (0.146)     (0.034)   
##                                                             
## black        -0.190***   -0.131**    -1.039***   -0.264***  
##               (0.018)    (0.053)      (0.084)     (0.018)   
##                                                             
## reg76r       -0.125***  -0.105***    -0.330***   -0.143***  
##               (0.015)    (0.023)      (0.074)     (0.016)   
##                                                             
## smsa76r      0.161***    0.131***    0.391***     0.185***  
##               (0.016)    (0.030)      (0.078)     (0.018)   
##                                                             
## Constant     4.734***    3.753***    16.540***    5.957***  
##               (0.068)    (0.829)      (0.163)     (0.036)   
##                                                             
## ------------------------------------------------------------
## Observations   3,010      3,010        3,613       3,010    
## R2             0.291      0.225        0.481       0.187    
## Adjusted R2    0.289      0.224        0.480       0.185    
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01
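In a just-identified model, the 2SLS coefficient equals the reduced-form coefficient divided by the first-stage coefficient (indirect least squares). This holds only when both regressions are estimated on the same sample. In the table above, the first stage uses 3,613 observations, while the 2SLS and reduced-form regressions drop the 603 observations with missing lwage76. That is why 0.044624 / 0.30628 ≈ 0.146 differs from the 2SLS estimate of 0.132. The sketch below uses simulated data with hypothetical variables and no covariates, for brevity. It reproduces the mismatch and shows that fitting the first stage on the complete-case rows removes it. On the actual data, the analogous step would be to fit the first stage on subset(data, !is.na(lwage76)).

```r
# Hypothetical data: binary instrument z, endogenous schooling, some missing wages
set.seed(1995)
n <- 4000
z <- rbinom(n, 1, 0.7)
ability <- rnorm(n)                                  # unobserved
educ  <- 12 + 0.4 * z + 1.5 * ability + rnorm(n, sd = 2)
lwage <- 5 + 0.10 * educ + 0.2 * ability + rnorm(n, sd = 0.4)
lwage[sample(n, 600)] <- NA                          # outcome missing for some people

dat <- data.frame(z, educ, lwage)
cc  <- complete.cases(dat)

# 2SLS on the complete cases: cov(y, z) / cov(x, z)
b_2sls <- with(dat[cc, ], cov(lwage, z) / cov(educ, z))

rf     <- coef(lm(lwage ~ z, data = dat))["z"]        # lm() drops rows with NA lwage
fs_all <- coef(lm(educ ~ z, data = dat))["z"]         # uses all n rows
fs_cc  <- coef(lm(educ ~ z, data = dat[cc, ]))["z"]   # same rows as the reduced form

ils_mismatched <- unname(rf / fs_all)
ils_matched    <- unname(rf / fs_cc)
round(c(`2SLS` = b_2sls, ILS_all_rows = ils_mismatched, ILS_same_rows = ils_matched), 4)
```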

(b) Using Proximity to Public and Private College as Instruments

(1) IV Estimation

iv_model2 <- ivreg(lwage76 ~ ed76 + exp + exp_sq + black + reg76r + smsa76r |
                     nearc4a + nearc4b + exp + exp_sq + black + reg76r + smsa76r, data = data)

summary(iv_model2)
## 
## Call:
## ivreg(formula = lwage76 ~ ed76 + exp + exp_sq + black + reg76r + 
##     smsa76r | nearc4a + nearc4b + exp + exp_sq + black + reg76r + 
##     smsa76r, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.93985 -0.25152  0.01722  0.27365  1.48154 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.26801    0.68718   4.756 2.07e-06 ***
## ed76         0.16109    0.04077   3.951 7.96e-05 ***
## exp          0.11931    0.01818   6.564 6.16e-11 ***
## exp_sq      -0.23054    0.03503  -6.582 5.46e-11 ***
## black       -0.10173    0.04531  -2.245   0.0248 *  
## reg76r      -0.09504    0.02165  -4.389 1.18e-05 ***
## smsa76r      0.11645    0.02705   4.305 1.73e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4108 on 3003 degrees of freedom
## Multiple R-Squared: 0.1447,  Adjusted R-squared: 0.143 
## Wald test:   111 on 6 and 3003 DF,  p-value: < 2.2e-16

(2) First Stage

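# Note: as in (a), this first stage uses all 3,613 observations with non-missing ed76,
# not only the 3,010 with non-missing lwage76 used by the IV and reduced-form models
first_stage2 <- lm(ed76 ~ nearc4a + nearc4b + exp + exp_sq + black + reg76r + smsa76r, data = data)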

summary(first_stage2)
## 
## Call:
## lm(formula = ed76 ~ nearc4a + nearc4b + exp + exp_sq + black + 
##     reg76r + smsa76r, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.7266 -1.4340 -0.0617  1.3095  6.2239 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 16.52964    0.16266 101.621  < 2e-16 ***
## nearc4a      0.39599    0.08119   4.877 1.12e-06 ***
## nearc4b      0.09648    0.09929   0.972    0.331    
## exp         -0.36039    0.03036 -11.872  < 2e-16 ***
## exp_sq      -0.20488    0.14573  -1.406    0.160    
## black       -1.03976    0.08347 -12.457  < 2e-16 ***
## reg76r      -0.30427    0.07414  -4.104 4.15e-05 ***
## smsa76r      0.38804    0.07778   4.989 6.35e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.979 on 3605 degrees of freedom
## Multiple R-squared:  0.4829, Adjusted R-squared:  0.4819 
## F-statistic:   481 on 7 and 3605 DF,  p-value: < 2.2e-16
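With two excluded instruments, instrument strength is usually judged by the joint F-statistic on nearc4a and nearc4b in the first stage, rather than by their individual t-statistics. Here, nearc4b alone is insignificant. A common rule of thumb flags a first-stage F below about 10 as a sign of weak instruments. The sketch below runs the nested-model F-test in base R on simulated data with hypothetical variables: one included covariate w, a stronger instrument z1 and a weaker instrument z2. It checks anova() against the textbook formula.

```r
# Hypothetical first stage with one included covariate and two excluded instruments
set.seed(3)
n  <- 3600
w  <- rnorm(n)                   # included exogenous covariate
z1 <- rbinom(n, 1, 0.5)          # stronger excluded instrument
z2 <- rbinom(n, 1, 0.3)          # weaker excluded instrument
x  <- 13 + 0.4 * z1 + 0.05 * z2 + 0.5 * w + rnorm(n, sd = 2)

unrestricted <- lm(x ~ z1 + z2 + w)
restricted   <- lm(x ~ w)        # drops the excluded instruments
ftest  <- anova(restricted, unrestricted)
f_stat <- ftest$F[2]

# Same statistic from ((RSS_r - RSS_u) / q) / (RSS_u / df_u), with q = 2 restrictions
rss_r <- sum(resid(restricted)^2)
rss_u <- sum(resid(unrestricted)^2)
f_manual <- ((rss_r - rss_u) / 2) / (rss_u / df.residual(unrestricted))

c(F = f_stat, F_manual = f_manual, p = ftest$`Pr(>F)`[2])
```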

(3) Reduced Form

reduced_form2 <- lm(lwage76 ~ nearc4a + nearc4b + exp + exp_sq + black + reg76r + smsa76r, data = data)

summary(reduced_form2)
## 
## Call:
## lm(formula = lwage76 ~ nearc4a + nearc4b + exp + exp_sq + black + 
##     reg76r + smsa76r, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.58218 -0.24720  0.01594  0.26671  1.36660 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.9562112  0.0363139 164.020  < 2e-16 ***
## nearc4a      0.0640062  0.0180132   3.553 0.000386 ***
## nearc4b     -0.0001056  0.0219131  -0.005 0.996155    
## exp          0.0525799  0.0069398   7.577 4.70e-14 ***
## exp_sq      -0.2146431  0.0339914  -6.315 3.11e-10 ***
## black       -0.2639382  0.0184556 -14.301  < 2e-16 ***
## reg76r      -0.1383836  0.0163857  -8.445  < 2e-16 ***
## smsa76r      0.1838926  0.0174780  10.521  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3999 on 3002 degrees of freedom
##   (603 observations deleted due to missingness)
## Multiple R-squared:  0.1899, Adjusted R-squared:  0.188 
## F-statistic: 100.5 on 7 and 3002 DF,  p-value: < 2.2e-16

Display all models in one formatted table

stargazer(iv_model2, first_stage2, reduced_form2,
          type = "text",
          column.labels = c("IV Model", "First Stage", "Reduced Form"),
          title = "IV Estimation Using Public and Private College Proximity",
          keep.stat = c("n", "rsq", "adj.rsq"))
## 
## IV Estimation Using Public and Private College Proximity
## ==================================================
##                       Dependent variable:         
##              -------------------------------------
##                lwage76       ed76       lwage76   
##              instrumental     OLS         OLS     
##                variable                           
##                IV Model   First Stage Reduced Form
##                  (1)          (2)         (3)     
## --------------------------------------------------
## ed76           0.161***                           
##                (0.041)                            
##                                                   
## nearc4a                    0.396***     0.064***  
##                             (0.081)     (0.018)   
##                                                   
## nearc4b                      0.096      -0.0001   
##                             (0.099)     (0.022)   
##                                                   
## exp            0.119***    -0.360***    0.053***  
##                (0.018)      (0.030)     (0.007)   
##                                                   
## exp_sq        -0.231***     -0.205     -0.215***  
##                (0.035)      (0.146)     (0.034)   
##                                                   
## black          -0.102**    -1.040***   -0.264***  
##                (0.045)      (0.083)     (0.018)   
##                                                   
## reg76r        -0.095***    -0.304***   -0.138***  
##                (0.022)      (0.074)     (0.016)   
##                                                   
## smsa76r        0.116***    0.388***     0.184***  
##                (0.027)      (0.078)     (0.017)   
##                                                   
## Constant       3.268***    16.530***    5.956***  
##                (0.687)      (0.163)     (0.036)   
##                                                   
## --------------------------------------------------
## Observations    3,010        3,613       3,010    
## R2              0.145        0.483       0.190    
## Adjusted R2     0.143        0.482       0.188    
## ==================================================
## Note:                  *p<0.1; **p<0.05; ***p<0.01
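With two excluded instruments (nearc4a, nearc4b) and one endogenous regressor, this model is overidentified, so the instruments' joint validity can be partially checked. AER's summary(iv_model2, diagnostics = TRUE) reports weak-instrument, Wu-Hausman and Sargan tests. The sketch below computes the Sargan statistic by hand on simulated data, using hypothetical names and no covariates. The statistic is n times the R-squared from regressing the 2SLS residuals on all instruments. It is compared with a chi-squared distribution with (instruments - endogenous regressors) = 1 degree of freedom. Both instruments are valid by construction here, so a large statistic would be a false alarm.

```r
# Hypothetical overidentified setup: two valid instruments, one endogenous regressor
set.seed(7)
n  <- 3000
z1 <- rbinom(n, 1, 0.5)
z2 <- rbinom(n, 1, 0.3)
u  <- rnorm(n)                                   # unobserved confounder
x  <- 12 + 0.5 * z1 + 0.3 * z2 + u + rnorm(n)
y  <- 1 + 0.1 * x + 0.5 * u + rnorm(n, sd = 0.5)

# 2SLS by hand with both instruments
xhat <- fitted(lm(x ~ z1 + z2))
b    <- unname(coef(lm(y ~ xhat)))
resid_2sls <- y - b[1] - b[2] * x                # residuals use the actual x

# Sargan statistic: n * R^2 of the 2SLS residuals on all instruments
aux_r2  <- summary(lm(resid_2sls ~ z1 + z2))$r.squared
sargan  <- n * aux_r2
p_value <- pchisq(sargan, df = 1, lower.tail = FALSE)

round(c(beta_2sls = b[2], Sargan = sargan, p = p_value), 4)
```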

(c) Endogeneity Discussion

(1) Endogeneity of Education and Experience

Yes. Education is likely endogenous in the wage equation because schooling is not randomly assigned in the population. Individuals choose how much schooling to acquire, and that choice is plausibly correlated with unobserved factors such as ability and family background that also affect wages. To address this, Card uses geographic proximity to a four-year college as an instrument for education.

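Experience is not a separate choice variable. It is constructed from observed variables (exp = age76 - ed76 - 6), so it has no independent choice-based or measurement-error component of its own. However, because it is a function of ed76, it mechanically inherits any endogeneity that education has. This is the motivation for instrumenting experience (for example, with age-based variables) as well.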
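Whether education is endogenous can also be checked empirically with the regression-based Durbin-Wu-Hausman test. The idea is to add the first-stage residuals to the OLS wage equation and test whether their coefficient is zero. The sketch below uses simulated data with hypothetical names, in which unobserved ability drives both schooling and wages. It also confirms that the control-function coefficient on education equals the 2SLS estimate.

```r
# Hypothetical data: ability affects both schooling and wages
set.seed(11)
n <- 3000
z <- rbinom(n, 1, 0.6)                 # proximity-style instrument
ability <- rnorm(n)                    # unobserved
educ  <- 12 + 0.6 * z + ability + rnorm(n)
lwage <- 5 + 0.08 * educ + 0.3 * ability + rnorm(n, sd = 0.4)

# First stage and its residuals
v_hat <- resid(lm(educ ~ z))

# Control-function regression: the t-test on v_hat is the regression-based
# Durbin-Wu-Hausman test of H0: educ is exogenous
cf <- summary(lm(lwage ~ educ + v_hat))$coefficients
cf["v_hat", c("Estimate", "t value", "Pr(>|t|)")]

# The control-function coefficient on educ equals the 2SLS estimate
b_2sls <- cov(lwage, z) / cov(educ, z)
c(control_function = unname(cf["educ", "Estimate"]), two_sls = b_2sls)
```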

(2) Create Interactions

data$int1 <- data$nearc4a * data$age76
data$int2 <- data$nearc4a * (data$age76^2 / 100)

(3) Estimation

iv_structural <- ivreg(lwage76 ~ ed76 + exp + exp_sq + black + reg76r + smsa76r |
                         nearc4a + nearc4b + int1 + int2 + black + reg76r + smsa76r, data = data)

summary(iv_structural)
## 
## Call:
## ivreg(formula = lwage76 ~ ed76 + exp + exp_sq + black + reg76r + 
##     smsa76r | nearc4a + nearc4b + int1 + int2 + black + reg76r + 
##     smsa76r, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.10210 -0.28407  0.02576  0.29699  1.49721 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.91429    0.46315   8.451  < 2e-16 ***
## ed76         0.16815    0.04408   3.815 0.000139 ***
## exp         -0.02323    0.04422  -0.525 0.599389    
## exp_sq       0.32689    0.22106   1.479 0.139313    
## black       -0.04971    0.07055  -0.705 0.481085    
## reg76r      -0.08832    0.02731  -3.234 0.001234 ** 
## smsa76r      0.08118    0.04578   1.773 0.076320 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4568 on 3003 degrees of freedom
## Multiple R-Squared: -0.0575, Adjusted R-squared: -0.05961 
## Wald test: 105.3 on 6 and 3003 DF,  p-value: < 2.2e-16

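When only education is instrumented (part b), the estimated return to schooling is 0.161: each additional year of schooling raises the 1976 log wage by about 0.16, or roughly 16%. In the second specification, experience and its square are also treated as endogenous, with nearc4a, nearc4b, int1 and int2 as instruments. There, the education coefficient is 0.168, very close to the earlier value. Treating experience as endogenous therefore does not materially change the estimated return to schooling. The experience coefficients themselves, however, change sign and become statistically insignificant. This comparison alone does not establish that experience is exogenous.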