#1)
library(AER)  # assumed source of ivreg() and the CollegeDistance data set
data("CollegeDistance")
reg1 <- ivreg(log(wage) ~ education + unemp + ethnicity + gender + urban | distance + unemp + ethnicity + gender + urban, data = CollegeDistance)
summary(reg1)
##
## Call:
## ivreg(formula = log(wage) ~ education + unemp + ethnicity + gender +
## urban | distance + unemp + ethnicity + gender + urban, data = CollegeDistance)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5885016 -0.1191974 -0.0001799 0.1452146 0.4576460
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.1894167 0.1946072 6.112 1.06e-09 ***
## education 0.0673242 0.0143812 4.681 2.93e-06 ***
## unemp 0.0142234 0.0009648 14.743 < 2e-16 ***
## ethnicityother 0.0277621 0.0104342 2.661 0.00782 **
## ethnicityhispanic -0.0057422 0.0093844 -0.612 0.54064
## genderfemale -0.0076101 0.0052865 -1.440 0.15007
## urbanyes 0.0064494 0.0063892 1.009 0.31283
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1805 on 4732 degrees of freedom
## Multiple R-Squared: -0.5786, Adjusted R-squared: -0.5806
## Wald test: 54.89 on 6 and 4732 DF, p-value: < 2.2e-16
reg2 <- ivreg(log(wage) ~ education + unemp + ethnicity + gender + urban | fcollege + unemp + ethnicity + gender + urban, data = CollegeDistance)
summary(reg2)
##
## Call:
## ivreg(formula = log(wage) ~ education + unemp + ethnicity + gender +
## urban | fcollege + unemp + ethnicity + gender + urban, data = CollegeDistance)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.40412 -0.08475 0.02403 0.09649 0.37608
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.0065381 0.0559559 35.859 <2e-16 ***
## education 0.0068550 0.0041004 1.672 0.0946 .
## unemp 0.0136522 0.0007236 18.868 <2e-16 ***
## ethnicityother 0.0587459 0.0059700 9.840 <2e-16 ***
## ethnicityhispanic 0.0070822 0.0067694 1.046 0.2955
## genderfemale -0.0089754 0.0039924 -2.248 0.0246 *
## urbanyes 0.0087083 0.0048184 1.807 0.0708 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1365 on 4732 degrees of freedom
## Multiple R-Squared: 0.09677, Adjusted R-squared: 0.09562
## Wald test: 90.02 on 6 and 4732 DF, p-value: < 2.2e-16
The goal here is to estimate the economic return to years of schooling. To do so, we instrument education with either college proximity (distance as an IV) or whether the father went to college (fcollege as an IV). A valid instrument should not affect the natural log of wages (y variable) directly; it should move wages only through its causal effect on education (x variable).
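To make the mechanics concrete, here is a minimal sketch of what two-stage least squares does with distance as the instrument. It assumes that ivreg() and CollegeDistance come from the AER package (the output above does not show which package was loaded); the names cd, stage1, stage2, education_hat and iv_fit are mine, for illustration only.

```r
library(AER)  # assumption: source of ivreg() and CollegeDistance
data("CollegeDistance")
cd <- CollegeDistance  # work on a copy so the original data stay untouched

# Stage 1: regress the endogenous regressor on the instrument and the controls.
stage1 <- lm(education ~ distance + unemp + ethnicity + gender + urban, data = cd)
cd$education_hat <- fitted(stage1)

# Stage 2: replace education with its first-stage fitted values.
stage2 <- lm(log(wage) ~ education_hat + unemp + ethnicity + gender + urban, data = cd)

# Same model estimated in one step with ivreg().
iv_fit <- ivreg(log(wage) ~ education + unemp + ethnicity + gender + urban |
                  distance + unemp + ethnicity + gender + urban, data = cd)

# The two point estimates for education should coincide.
est <- c(manual = coef(stage2)[["education_hat"]], ivreg = coef(iv_fit)[["education"]])
print(est)
```

The manual second stage should reproduce the ivreg() point estimate, but its standard errors are not valid because they ignore that education_hat is itself estimated; ivreg() accounts for this.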
I think that both are plausible instrumental variables for this model, as both seem to be related to education while probably not being related to the unobservables or to wage directly.
However, I think fcollege does a better job as an IV than distance. Let’s see why by comparing the two:
First, relevance: both are correlated with x, in this context education. Whether the father went to college seems at least as strongly related to education as the distance to the closest 4-year college (measured in units of 10 miles). In today’s world, where more and more students move far from home to study (sometimes even abroad), distance remains a potential IV candidate, but it has limitations. While fcollege is not perfect either, it seems more likely for one to go to college if one’s father attended college himself. (It would be interesting to compare the proportion of students who go to college because one is in their immediate surroundings with the proportion who go when their father attended college too; I suspect the latter to be much higher than the former.)
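The relevance claim can be checked rather than argued. The sketch below compares the first-stage F-statistic of each instrument, again assuming the AER package supplies the data; first_stage_F is a hypothetical helper written for this answer, not a library function. A common rule of thumb treats a first-stage F above 10 as a sign that the instrument is not weak.

```r
library(AER)  # assumption: source of the CollegeDistance data set
data("CollegeDistance")

# Hypothetical helper: F-test for adding one instrument to the first stage.
first_stage_F <- function(instrument) {
  controls <- "unemp + ethnicity + gender + urban"
  # First stage without the instrument (controls only).
  restricted <- lm(as.formula(paste("education ~", controls)),
                   data = CollegeDistance)
  # First stage with the instrument added.
  unrestricted <- lm(as.formula(paste("education ~", instrument, "+", controls)),
                     data = CollegeDistance)
  # Row 2 of the anova table holds the F-statistic for the added term.
  anova(restricted, unrestricted)$F[2]
}

f_stats <- c(distance = first_stage_F("distance"),
             fcollege = first_stage_F("fcollege"))
print(f_stats)
```

The larger of the two statistics identifies the instrument that is more strongly related to education once the controls are held fixed.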
Second, exogeneity, or the exclusion restriction. Both seem to be uncorrelated with the unobservables (though both are imperfect, as in some specific cases they might be correlated).
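Exogeneity cannot be tested when there is exactly one instrument for one endogenous regressor, as in reg1 and reg2. One partial check, which is my addition and not part of the two regressions above, is to use both instruments at once so the model is overidentified and then look at the Sargan test. The sketch assumes the AER version of ivreg(); iv_both and diag_tab are illustrative names.

```r
library(AER)  # assumption: source of ivreg() and CollegeDistance
data("CollegeDistance")

# Two instruments for one endogenous regressor: the model is overidentified.
iv_both <- ivreg(log(wage) ~ education + unemp + ethnicity + gender + urban |
                   distance + fcollege + unemp + ethnicity + gender + urban,
                 data = CollegeDistance)

# diagnostics = TRUE adds weak-instrument, Wu-Hausman and Sargan tests.
diag_tab <- summary(iv_both, diagnostics = TRUE)$diagnostics
print(diag_tab)
```

The null hypothesis of the Sargan test is that the instruments are jointly valid. A rejection suggests that at least one of them is correlated with the error term, without saying which one; a failure to reject is not proof that both are exogenous.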
Third, random assignment. Surprisingly, I would think that whether or not your father went to college is “more” random than distance. You don’t choose either of them as a student, but the probability that your father went to college depends on more factors than the distance from your house to the closest 4-year college (in units of 10 miles). To make this ordinal comparison, I follow this rationale:
If numerous factors affect the outcome in both cases, the variable that is determined earlier in time will have even more factors affecting it and will therefore be “more” random. Because, more often than not, your father got (or did not get) a college degree before settling his (and your) home/apartment near a college (or not), the former is “more” random than the latter for the student.
For those reasons, I would think that fcollege is more suitable as an IV for education in this context.