Piromalli - Problem Set 4

For this problem set we replicate a well-known instrumental-variables (IV) paper: “Using Geographic Variation in College Proximity to Estimate the Return to Schooling” by David Card. In this paper, Card estimates the economic return to schooling, using college proximity as an instrument for education.

  1. Regress the natural log of wages on education, unemp, ethnicity, gender and urban.

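#1)

library(AER)    # provides ivreg() and the CollegeDistance data
library(dplyr)  # assumed source of the %>% pipe; any package exporting it works
data("CollegeDistance")

# OLS of log wages on education and the controls
CollegeDistance %>%
  lm((log(wage)) ~ education + unemp + ethnicity + gender + urban , data = .) %>%
  summary()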
## 
## Call:
## lm(formula = (log(wage)) ~ education + unemp + ethnicity + gender + 
##     urban, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.39998 -0.08223  0.02833  0.09486  0.37945 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        2.0900860  0.0169208 123.522   <2e-16 ***
## education          0.0006723  0.0011121   0.605   0.5455    
## unemp              0.0135938  0.0007203  18.874   <2e-16 ***
## ethnicityother     0.0619139  0.0055990  11.058   <2e-16 ***
## ethnicityhispanic  0.0083934  0.0066957   1.254   0.2101    
## genderfemale      -0.0091150  0.0039785  -2.291   0.0220 *  
## urbanyes           0.0089393  0.0048005   1.862   0.0626 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1361 on 4732 degrees of freedom
## Multiple R-squared:  0.1026, Adjusted R-squared:  0.1015 
## F-statistic:  90.2 on 6 and 4732 DF,  p-value: < 2.2e-16

  2. Using the ivreg() command, re-estimate the model from question 1 by 2SLS, using distance as an IV for education.

# Regressors go before the |, instruments after it; the exogenous controls appear on both sides
reg2 <- ivreg(log(wage) ~ education + unemp + ethnicity + gender + urban |
                distance + unemp + ethnicity + gender + urban,
              data = CollegeDistance)
summary(reg2)
## 
## Call:
## ivreg(formula = log(wage) ~ education + unemp + ethnicity + gender + 
##     urban | distance + unemp + ethnicity + gender + urban, data = CollegeDistance)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.5885016 -0.1191974 -0.0001799  0.1452146  0.4576460 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.1894167  0.1946072   6.112 1.06e-09 ***
## education          0.0673242  0.0143812   4.681 2.93e-06 ***
## unemp              0.0142234  0.0009648  14.743  < 2e-16 ***
## ethnicityother     0.0277621  0.0104342   2.661  0.00782 ** 
## ethnicityhispanic -0.0057422  0.0093844  -0.612  0.54064    
## genderfemale      -0.0076101  0.0052865  -1.440  0.15007    
## urbanyes           0.0064494  0.0063892   1.009  0.31283    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1805 on 4732 degrees of freedom
## Multiple R-Squared: -0.5786, Adjusted R-squared: -0.5806 
## Wald test: 54.89 on 6 and 4732 DF,  p-value: < 2.2e-16
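
To make the mechanics of 2SLS explicit, the two stages can also be run by hand with lm(). This is only a sketch of what ivreg() does internally, not a replacement for it; the names stage1, stage2, cd, education_hat and iv_fit are introduced here for illustration.

```r
library(AER)  # provides ivreg() and the CollegeDistance data
data("CollegeDistance")

# Stage 1: regress the endogenous regressor on the instrument and the exogenous controls
stage1 <- lm(education ~ distance + unemp + ethnicity + gender + urban,
             data = CollegeDistance)

# Stage 2: replace education with its first-stage fitted values
cd <- CollegeDistance
cd$education_hat <- fitted(stage1)
stage2 <- lm(log(wage) ~ education_hat + unemp + ethnicity + gender + urban,
             data = cd)

# Reference fit with ivreg() for comparison
iv_fit <- ivreg(log(wage) ~ education + unemp + ethnicity + gender + urban |
                  distance + unemp + ethnicity + gender + urban,
                data = CollegeDistance)

# The two point estimates on education should coincide
coef(stage2)["education_hat"]
coef(iv_fit)["education"]
```

The point estimates match, but the standard errors reported by summary(stage2) are not valid because they ignore that education_hat is itself estimated; ivreg() corrects for this, which is why it is the command to report.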

  3. What is the logic behind using college distance as an IV? Do you consider it to be a valid IV for this model?

The goal here is to estimate the economic return to an additional year of schooling. OLS is unlikely to deliver this, because education is probably correlated with unobservables that also affect wages. To address this, we use college proximity as an instrumental variable. A valid instrument must satisfy two conditions: it must have an effect on education, the x variable (relevance), and it must affect the natural log of wages, the y variable, only through education, so that it is uncorrelated with the unobserved determinants of wages (exogeneity). I do think that distance is a valid instrumental variable for this model: it appears to be related to education, and I don’t think it is related to the unobservables or to wages directly.

However, the validity of college distance as an IV is somewhat challenged by the development of online degrees and virtual learning. Even in a world where not everyone can move as freely as they want, students can now enroll without living near a campus, so college distance may no longer affect which college one chooses and, ultimately, one’s wages. The MOOCs offered by widely recognized universities are already an example: they attract people regardless of location and probably have an impact on their wages.
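
The relevance half of this argument can be checked in the data, whereas exogeneity cannot be tested with a single instrument and has to be argued. As a rough sketch, the first-stage coefficient on distance and the weak-instrument diagnostic from summary() with diagnostics = TRUE can be inspected; the object names iv_fit, stage1 and diag_tab are introduced here for illustration.

```r
library(AER)  # provides ivreg() and the CollegeDistance data
data("CollegeDistance")

iv_fit <- ivreg(log(wage) ~ education + unemp + ethnicity + gender + urban |
                  distance + unemp + ethnicity + gender + urban,
                data = CollegeDistance)

# First stage: does distance shift education once the controls are held fixed?
stage1 <- lm(education ~ distance + unemp + ethnicity + gender + urban,
             data = CollegeDistance)
summary(stage1)$coefficients["distance", ]

# Diagnostic tests reported by AER, including the weak-instruments F test
diag_tab <- summary(iv_fit, diagnostics = TRUE)$diagnostics
diag_tab
```

A common rule of thumb treats a first-stage F statistic above roughly 10 as evidence against a weak instrument. The Sargan row is not informative here, since the model is exactly identified with one instrument for one endogenous regressor.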