In this problem set we replicate a well-known paper that uses instrumental variables (IV): “Using Geographic Variation in College Proximity to Estimate the Return to Schooling” by David Card. In this paper, Card estimates the economic return to schooling, using college proximity as an instrumental variable for education.
#1)
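library(AER)    # assumed source of ivreg() and the CollegeDistance data
library(dplyr)  # assumed source of the %>% pipe
data("CollegeDistance")
CollegeDistance %>%
  lm((log(wage)) ~ education + unemp + ethnicity + gender + urban, data = .) %>%
  summary()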
##
## Call:
## lm(formula = (log(wage)) ~ education + unemp + ethnicity + gender +
## urban, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.39998 -0.08223 0.02833 0.09486 0.37945
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.0900860 0.0169208 123.522 <2e-16 ***
## education 0.0006723 0.0011121 0.605 0.5455
## unemp 0.0135938 0.0007203 18.874 <2e-16 ***
## ethnicityother 0.0619139 0.0055990 11.058 <2e-16 ***
## ethnicityhispanic 0.0083934 0.0066957 1.254 0.2101
## genderfemale -0.0091150 0.0039785 -2.291 0.0220 *
## urbanyes 0.0089393 0.0048005 1.862 0.0626 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1361 on 4732 degrees of freedom
## Multiple R-squared: 0.1026, Adjusted R-squared: 0.1015
## F-statistic: 90.2 on 6 and 4732 DF, p-value: < 2.2e-16
reg2 <- ivreg(log(wage) ~ education + unemp + ethnicity + gender + urban |
                distance + unemp + ethnicity + gender + urban,
              data = CollegeDistance)
summary(reg2)
##
## Call:
## ivreg(formula = log(wage) ~ education + unemp + ethnicity + gender +
## urban | distance + unemp + ethnicity + gender + urban, data = CollegeDistance)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5885016 -0.1191974 -0.0001799 0.1452146 0.4576460
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.1894167 0.1946072 6.112 1.06e-09 ***
## education 0.0673242 0.0143812 4.681 2.93e-06 ***
## unemp 0.0142234 0.0009648 14.743 < 2e-16 ***
## ethnicityother 0.0277621 0.0104342 2.661 0.00782 **
## ethnicityhispanic -0.0057422 0.0093844 -0.612 0.54064
## genderfemale -0.0076101 0.0052865 -1.440 0.15007
## urbanyes 0.0064494 0.0063892 1.009 0.31283
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1805 on 4732 degrees of freedom
## Multiple R-Squared: -0.5786, Adjusted R-squared: -0.5806
## Wald test: 54.89 on 6 and 4732 DF, p-value: < 2.2e-16
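The IV estimate above is only informative if distance is a sufficiently strong predictor of education. A minimal sketch of one way to check this, assuming the AER package is installed: run the first stage by hand, then ask `summary()` for its built-in diagnostics. The object names `first_stage`, `first_stage_F`, `iv_fit` and `iv_diag` are introduced here for illustration and are not part of the original answer.

```r
# Sketch: first-stage relevance check (assumes the AER package is installed)
library(AER)
data("CollegeDistance", package = "AER")

# First stage: regress the endogenous regressor on the instrument and the controls
first_stage <- lm(education ~ distance + unemp + ethnicity + gender + urban,
                  data = CollegeDistance)

# With a single instrument, the first-stage F statistic is the squared t statistic
t_distance <- coef(summary(first_stage))["distance", "t value"]
first_stage_F <- t_distance^2
print(first_stage_F)

# Same specification as reg2; diagnostics = TRUE adds the weak-instrument
# and Wu-Hausman tests to the summary object
iv_fit <- ivreg(log(wage) ~ education + unemp + ethnicity + gender + urban |
                  distance + unemp + ethnicity + gender + urban,
                data = CollegeDistance)
iv_diag <- summary(iv_fit, diagnostics = TRUE)$diagnostics
print(iv_diag)
```

A common rule of thumb treats a first-stage F statistic above 10 as evidence against a weak instrument. The exclusion restriction cannot be tested this way with a single instrument, so it still has to be argued.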
The goal here is to estimate the economic return to an additional year of schooling. To do so, we use college proximity (distance) as an instrumental variable for education: a variable that has a causal effect on education (the x variable) but affects the natural log of wages (the y variable) only through education. The OLS coefficient on education is 0.0007 and not statistically significant, whereas the IV estimate is 0.067, or roughly a 7 percent higher wage per additional year of schooling. I think distance is a valid instrument for this model: it appears to be related to education, and I do not think it is related to the unobservables or to wages directly. However, the validity of college distance as an IV is somewhat challenged by the development of online degrees and virtual learning. In a world where not everyone can move as freely as they would like, distance may no longer determine which college one attends and, ultimately, one’s wages. The MOOCs offered by widely recognized universities are one example: they attract many learners and probably have an impact on their wages.
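The two conditions discussed above (relevance and exclusion) can be illustrated with a small simulation. This is a sketch on made-up data, not on CollegeDistance: every coefficient, the variable `ability`, and the true return of 0.07 are assumptions chosen for illustration. It uses base R only and runs two-stage least squares by hand.

```r
set.seed(42)
n <- 5000

# Hypothetical data-generating process (all numbers are illustrative assumptions)
ability   <- rnorm(n)                                  # unobserved confounder
distance  <- rexp(n)                                   # instrument, independent of ability
education <- 14 - 1.0 * distance + ability + rnorm(n)  # relevance: distance lowers schooling
# Exclusion: distance does not enter the wage equation directly
log_wage  <- 1 + 0.07 * education + 0.10 * ability + rnorm(n, sd = 0.2)

# OLS omits ability, so the education coefficient absorbs part of its effect
ols_est <- unname(coef(lm(log_wage ~ education))["education"])

# Manual 2SLS: the first stage predicts education from distance,
# the second stage regresses log wage on the predicted values
education_hat <- fitted(lm(education ~ distance))
iv_est <- unname(coef(lm(log_wage ~ education_hat))["education_hat"])

print(c(true = 0.07, ols = ols_est, iv = iv_est))
```

In this setup OLS overstates the return because ability raises both schooling and wages, while the IV estimate should land close to the assumed true value. The standard errors from the manual second stage are not valid, so for inference a dedicated routine such as `ivreg()` is the safer choice.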