library(AER)
library(ggplot2)
library(dplyr)
data("PSID1976")
set.seed(2020)
Labor Econ: HW4
You should replicate my results in Quarto or Rmd. State your answers clearly wherever a question asks for one.
When submitting your HW, you should submit a ZIP file containing both the qmd (or Rmd) source and the generated HTML file.
We’ll be using the dataset PSID1976 from the AER package. PSID1976 contains cross-section data originating from the 1976 Panel Study of Income Dynamics (PSID).
You can view the dataset's documentation by running ?PSID1976. The data come from Mroz (1987), published in Econometrica.
Part I
- Use the summary() function to show a summary of the dataset PSID1976.
participation      hours          youngkids         oldkids
no :325        Min.   :   0.0   Min.   :0.0000   Min.   :0.000
yes:428        1st Qu.:   0.0   1st Qu.:0.0000   1st Qu.:0.000
               Median : 288.0   Median :0.0000   Median :1.000
               Mean   : 740.6   Mean   :0.2377   Mean   :1.353
               3rd Qu.:1516.0   3rd Qu.:0.0000   3rd Qu.:2.000
               Max.   :4950.0   Max.   :3.0000   Max.   :8.000
     age          education          wage           repwage         hhours
Min.   :30.00   Min.   : 5.00   Min.   : 0.000   Min.   :0.00   Min.   : 175
1st Qu.:36.00   1st Qu.:12.00   1st Qu.: 0.000   1st Qu.:0.00   1st Qu.:1928
Median :43.00   Median :12.00   Median : 1.625   Median :0.00   Median :2164
Mean   :42.54   Mean   :12.29   Mean   : 2.375   Mean   :1.85   Mean   :2267
3rd Qu.:49.00   3rd Qu.:13.00   3rd Qu.: 3.788   3rd Qu.:3.58   3rd Qu.:2553
Max.   :60.00   Max.   :17.00   Max.   :25.000   Max.   :9.98   Max.   :5010
     hage         heducation         hwage            fincome
Min.   :30.00   Min.   : 3.00   Min.   : 0.4121   Min.   : 1500
1st Qu.:38.00   1st Qu.:11.00   1st Qu.: 4.7883   1st Qu.:15428
Median :46.00   Median :12.00   Median : 6.9758   Median :20880
Mean   :45.12   Mean   :12.49   Mean   : 7.4822   Mean   :23081
3rd Qu.:52.00   3rd Qu.:15.00   3rd Qu.: 9.1667   3rd Qu.:28200
Max.   :60.00   Max.   :17.00   Max.   :40.5090   Max.   :96000
     tax           meducation       feducation          unemp          city
Min.   :0.4415   Min.   : 0.000   Min.   : 0.000   Min.   : 3.000   no :269
1st Qu.:0.6215   1st Qu.: 7.000   1st Qu.: 7.000   1st Qu.: 7.500   yes:484
Median :0.6915   Median :10.000   Median : 7.000   Median : 7.500
Mean   :0.6789   Mean   : 9.251   Mean   : 8.809   Mean   : 8.624
3rd Qu.:0.7215   3rd Qu.:12.000   3rd Qu.:12.000   3rd Qu.:11.000
Max.   :0.9415   Max.   :17.000   Max.   :17.000   Max.   :14.000
  experience    college   hcollege
Min.   : 0.00   no :541   no :458
1st Qu.: 4.00   yes:212   yes:295
Median : 9.00
Mean   :10.63
3rd Qu.:15.00
Max.   :45.00
- Regress log(wage) on education using lm(). You will get an error. Why? Explain using the participation variable.
- Regress log(wage) on education and state the estimated return to another year of education for women who participated in the labor force.
(Intercept) education
-0.1851968 0.1086487
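The two questions above can be explored with a short sketch. It assumes, as the summary output suggests (minimum and first quartile of wage are 0), that women who did not participate have a recorded wage of 0; the object names err and model_participants are illustrative and not part of the assignment.

```r
library(AER)
data("PSID1976")

# Cross-tabulate participation against wage == 0 to check the assumption
# that zero wages belong to non-participants.
table(PSID1976$participation, PSID1976$wage == 0)

# log(0) is -Inf, and lm() refuses a response with non-finite values.
# Capture the error message instead of stopping the script.
err <- tryCatch(
  lm(log(wage) ~ education, data = PSID1976),
  error = function(e) conditionMessage(e)
)
print(err)

# Restricting the sample to participants removes the zero wages.
model_participants <- lm(log(wage) ~ education,
                         data = subset(PSID1976, participation == "yes"))
coef(model_participants)
```

The education coefficient is the estimated proportional change in wage associated with one more year of schooling among participants.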
- Plot log(wage) against education along with the fitted regression line above. (Note: here I use ggplot2 for plotting. You may use base R plot() if you like.)
- Regress education on feducation (father’s years of education) and comment on the regression table. Specifically, if we use feducation as the IV, will it satisfy the relevance restriction? How about the as-good-as-random assignment and exclusion restrictions?
Call:
lm(formula = education ~ feducation, data = subset(PSID1976,
participation == "yes"))
Residuals:
Min 1Q Median 3Q Max
-8.4704 -1.1231 -0.1231 0.9546 5.9546
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.23705 0.27594 37.099 <2e-16 ***
feducation 0.26944 0.02859 9.426 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.081 on 426 degrees of freedom
Multiple R-squared: 0.1726, Adjusted R-squared: 0.1706
F-statistic: 88.84 on 1 and 426 DF, p-value: < 2.2e-16
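One way to read the relevance question off this table is to pull out the first-stage F-statistic directly. The sketch below is only an illustration: the names participants, first_stage, f_stat and t_stat are made up here, and the cutoff of 10 is a common rule of thumb for weak instruments, not something the assignment specifies. Relevance can be checked in the data; as-good-as-random assignment and exclusion cannot be tested this way and need an argument.

```r
library(AER)
data("PSID1976")

participants <- subset(PSID1976, participation == "yes")

# First stage: does father's education predict the woman's education?
first_stage <- lm(education ~ feducation, data = participants)

# With a single instrument, the first-stage F equals the squared t-statistic.
f_stat <- summary(first_stage)$fstatistic[["value"]]
t_stat <- coef(summary(first_stage))["feducation", "t value"]
c(F = f_stat, t_squared = t_stat^2)

# Rule-of-thumb check (assumed threshold of 10).
weak_instrument <- f_stat < 10
weak_instrument
```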
- Use feducation as an IV for education in estimating the effect of years of schooling on log(wage).
Call:
ivreg(formula = log(wage) ~ education | feducation, data = subset(PSID1976,
participation == "yes"))
Residuals:
Min 1Q Median 3Q Max
-3.0870 -0.3393 0.0525 0.4042 2.0677
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.44110 0.44610 0.989 0.3233
education 0.05917 0.03514 1.684 0.0929 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6894 on 426 degrees of freedom
Multiple R-Squared: 0.09344, Adjusted R-squared: 0.09131
Wald test: 2.835 on 1 and 426 DF, p-value: 0.09294
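To see what ivreg() is doing here, the point estimate can be reproduced by hand with two OLS regressions. This is a sketch for intuition only; the names d, first_stage, second_stage and iv_fit are illustrative. The standard errors from the manual second stage are not valid, because they ignore that education_hat is itself estimated, so use ivreg() for inference.

```r
library(AER)
data("PSID1976")

d <- subset(PSID1976, participation == "yes")

# Stage 1: project education on the instrument.
first_stage <- lm(education ~ feducation, data = d)
d$education_hat <- fitted(first_stage)

# Stage 2: regress log(wage) on the fitted values from stage 1.
second_stage <- lm(log(wage) ~ education_hat, data = d)

# The same estimate from ivreg(); instruments go after the "|".
iv_fit <- ivreg(log(wage) ~ education | feducation, data = d)

c(manual = coef(second_stage)[["education_hat"]],
  ivreg  = coef(iv_fit)[["education"]])
```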
- Repeat Q6 using mother’s education (meducation) as the IV and comment on the results.
Call:
ivreg(formula = log(wage) ~ education | meducation, data = subset(PSID1976,
participation == "yes"))
Residuals:
Min 1Q Median 3Q Max
-3.14184 -0.34291 0.05939 0.39750 2.05410
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.70217 0.48510 1.447 0.148
education 0.03855 0.03823 1.008 0.314
Residual standard error: 0.6987 on 426 degrees of freedom
Multiple R-Squared: 0.06881, Adjusted R-squared: 0.06663
Wald test: 1.017 on 1 and 426 DF, p-value: 0.3138
Part II
- Create a data frame df for all married women who were employed. This data frame will be used for the following exercises. (That is, you should filter the dataset using the condition participation=="yes". I prefer dplyr::tibble() when dealing with data frames. You can use data.frame() if you like.)
# A tibble: 428 × 21
participation hours youngkids oldkids age education wage repwage hhours
<fct> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
1 yes 1610 1 0 32 12 3.35 2.65 2708
2 yes 1656 0 2 30 12 1.39 2.65 2310
3 yes 1980 1 3 35 12 4.55 4.04 3072
4 yes 456 0 3 34 12 1.10 3.25 1920
5 yes 1568 1 2 31 14 4.59 3.6 2000
6 yes 2032 0 0 54 12 4.74 4.7 1040
7 yes 1440 0 2 37 16 8.33 5.95 2670
8 yes 1020 0 0 54 12 7.84 9.98 4120
9 yes 1458 0 2 48 12 2.13 0 1995
10 yes 1600 0 2 39 12 4.69 4.15 2100
# … with 418 more rows, and 12 more variables: hage <int>, heducation <int>,
# hwage <dbl>, fincome <int>, tax <dbl>, meducation <int>, feducation <int>,
# unemp <dbl>, city <fct>, experience <int>, college <fct>, hcollege <fct>
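A minimal sketch of one way to build df is shown below. It uses dplyr::as_tibble() to convert the existing data frame, which is an assumption about the approach; subset() on a plain data frame would work as well.

```r
library(AER)
library(dplyr)
data("PSID1976")

# Convert to a tibble, then keep only women who participated in the labor force.
df <- PSID1976 %>%
  as_tibble() %>%
  filter(participation == "yes")

dim(df)
```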
- Regress log(wage) on education, experience and experience^2. What’s the OLS estimate for return to education?
(Intercept) education experience I(experience^2)
-0.5220405591 0.1074896390 0.0415665105 -0.0008111931
- Regress education on experience, experience^2, feducation and meducation. Comment on your results.
(Intercept) experience I(experience^2) feducation meducation
9.102640110 0.045225423 -0.001009091 0.189548410 0.157597033
- Use ivreg() to estimate the return to education using both feducation and meducation as IVs.
(Intercept) education experience I(experience^2)
0.0481003046 0.0613966279 0.0441703943 -0.0008989696
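With control variables in the model, the ivreg() formula needs some care: the exogenous regressors (experience and its square) must appear on both sides of the "|", alongside the excluded instruments. The sketch below assumes df is the participants-only data built earlier and borrows the name model_iv3 from the comparison output at the end of this part.

```r
library(AER)
data("PSID1976")

df <- subset(PSID1976, participation == "yes")

# Left of "|": the structural equation.
# Right of "|": all exogenous variables, i.e. the included controls
# plus the two excluded instruments for education.
model_iv3 <- ivreg(
  log(wage) ~ education + experience + I(experience^2) |
    experience + I(experience^2) + feducation + meducation,
  data = df
)
coef(model_iv3)
```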
- Regress log(wage) on education, experience, experience^2 and the residuals from the model estimated in Q3. Use your result to test for the endogeneity of education. Can you conclude that education is endogenous?
Call:
lm(formula = log(wage) ~ education + experience + I(experience^2) +
residuals(model3), data = df)
Residuals:
Min 1Q Median 3Q Max
-3.03743 -0.30775 0.04191 0.40361 2.33303
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0481003 0.3945753 0.122 0.903033
education 0.0613966 0.0309849 1.981 0.048182 *
experience 0.0441704 0.0132394 3.336 0.000924 ***
I(experience^2) -0.0008990 0.0003959 -2.271 0.023672 *
residuals(model3) 0.0581666 0.0348073 1.671 0.095441 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.665 on 423 degrees of freedom
Multiple R-squared: 0.1624, Adjusted R-squared: 0.1544
F-statistic: 20.5 on 4 and 423 DF, p-value: 1.888e-15
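The test in this question is a regression-based Hausman (control function) test: under the null that education is exogenous, the first-stage residuals should have a zero coefficient. A sketch that extracts the relevant p-value follows. The name model3 comes from the call in the output above; v_hat, hausman and p_value are illustrative. The last line prints the diagnostics that summary() reports for ivreg objects, which include a Wu-Hausman test for comparison.

```r
library(AER)
data("PSID1976")

df <- subset(PSID1976, participation == "yes")

# Q3 first stage: education on the exogenous controls and both instruments.
model3 <- lm(education ~ experience + I(experience^2) + feducation + meducation,
             data = df)
df$v_hat <- residuals(model3)

# Augmented regression: a significant v_hat coefficient signals endogeneity.
hausman <- lm(log(wage) ~ education + experience + I(experience^2) + v_hat,
              data = df)
p_value <- coef(summary(hausman))["v_hat", "Pr(>|t|)"]
p_value
c(reject_at_5pct = p_value < 0.05, reject_at_10pct = p_value < 0.10)

# Built-in diagnostics from the IV fit, for comparison.
model_iv3 <- ivreg(
  log(wage) ~ education + experience + I(experience^2) |
    experience + I(experience^2) + feducation + meducation,
  data = df
)
summary(model_iv3, diagnostics = TRUE)$diagnostics
```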
- Compare your answers in Q2 and Q4. Which estimate is higher? Why?
model_ols$coefficients
(Intercept) education experience I(experience^2)
-0.5220405591 0.1074896390 0.0415665105 -0.0008111931
model_iv3$coefficients
(Intercept) education experience I(experience^2)
0.0481003046 0.0613966279 0.0441703943 -0.0008989696