Solution:
For a variable \(Z\) to count as an omitted variable, it must satisfy two conditions: \(Z\) must be a determinant of \(Y\), and \(Z\) must be correlated with the regressor, \(corr(Z, X)\neq 0\).
Gender can be considered a determinant of income, so if it is also correlated with years of education, it is an omitted variable. Whether it is correlated depends on the cohort we observe. For people born before roughly the 1960s, gender is most likely correlated with education, because at that time there was a gap in education between men and women; for people born after the 1960s, we would probably not see much correlation.
The first letter of a person's name is most likely not an omitted variable: it is hard to imagine how it could be correlated with years of education, or how it could be a determinant of income.
Native ability is plausibly an omitted variable. People with higher native ability tend to earn more, and they also tend to spend more years in education, because studying is less difficult for them and raises their expected earnings.
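The two conditions can be illustrated with a small simulation. This is only a sketch on made-up data: the variable names (ability, educ, wage) and all coefficients are assumptions chosen for illustration, not estimates from the CPS data.

```r
# Simulated illustration of omitted variable bias (all numbers are hypothetical).
set.seed(1)
n <- 10000

ability <- rnorm(n)                                 # omitted variable Z
educ    <- 12 + 2 * ability + rnorm(n)              # X is correlated with Z
wage    <- 5 + 1 * educ + 3 * ability + rnorm(n)    # Z is a determinant of Y

short <- lm(wage ~ educ)             # omits ability
long  <- lm(wage ~ educ + ability)   # controls for ability

b_short <- unname(coef(short)["educ"])
b_long  <- unname(coef(long)["educ"])

# The true effect of educ is 1. The short regression overstates it because
# educ also picks up the effect of ability.
print(c(short = b_short, long = b_long))
```

In this setup the short regression's slope converges to \(1 + 3 \cdot cov(educ, ability)/var(educ) = 1 + 3 \cdot 2/5 = 2.2\), while the long regression recovers a value close to 1. If either condition fails (ability does not enter wage, or educ does not depend on ability), the two slopes agree.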
Solution:
We randomly select children and randomly assign them to treatment groups, where each group receives a fixed number of years of education. The children are then placed in schools where they receive exactly the amount of education that was randomly assigned at the start; they are not allowed to obtain more or fewer years of education, even if they wish to. The participants are followed throughout their lives until retirement, and their earnings are tracked. With the resulting data we regress earnings on years of education. Because years of education were randomly assigned, they are uncorrelated with the other determinants of earnings, so, if the relationship is linear, the OLS estimator of the slope coefficient in this regression estimates the causal effect of a year of education on earnings.
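The logic of the ideal experiment can be sketched on simulated data. The data-generating process below is an assumption made for illustration: ability still affects wages and is never observed, but years of education are assigned at random, independently of ability.

```r
# Simulated ideal experiment (hypothetical numbers): education is randomly assigned.
set.seed(2)
n <- 10000

ability <- rnorm(n)                                 # unobserved determinant of wages
educ    <- sample(8:16, n, replace = TRUE)          # randomly assigned years of education
wage    <- 5 + 1 * educ + 3 * ability + rnorm(n)    # true effect of educ is 1

fit    <- lm(wage ~ educ)                           # ability is omitted on purpose
b_educ <- unname(coef(fit)["educ"])

# Random assignment makes educ uncorrelated with ability, so the OLS slope
# is close to the true effect even though ability is left out.
print(cor(educ, ability))
print(b_educ)
```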
1) Regression 1:
reg1 <- lm(ahe ~ yrseduc, data = cps99_ps1)
summary(reg1)
Call:
lm(formula = ahe ~ yrseduc, data = cps99_ps1)
Residuals:
Min 1Q Median 3Q Max
-18.415 -5.079 -1.234 3.575 38.670
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.60905 0.65816 -3.964 7.5e-05 ***
yrseduc 1.32186 0.04807 27.499 < 2e-16 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.329 on 3779 degrees of freedom
Multiple R-squared: 0.1667, Adjusted R-squared: 0.1665
F-statistic: 756.2 on 1 and 3779 DF, p-value: < 2.2e-16
coeftest(reg1, vcov = vcovHC(reg1, type = "HC1"))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.609050 0.647177 -4.0314 5.653e-05 ***
yrseduc 1.321859 0.050043 26.4145 < 2.2e-16 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
2) Regression 2:
reg2 <- lm(ahe ~ yrseduc + female, data = cps99_ps1)
summary(reg2)
Call:
lm(formula = ahe ~ yrseduc + female, data = cps99_ps1)
Residuals:
Min 1Q Median 3Q Max
-19.296 -4.818 -1.089 3.662 37.209
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.38396 0.64615 -2.142 0.0323 *
yrseduc 1.34147 0.04681 28.659 <2e-16 ***
female -3.39600 0.23388 -14.520 <2e-16 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.133 on 3778 degrees of freedom
Multiple R-squared: 0.2108, Adjusted R-squared: 0.2104
F-statistic: 504.5 on 2 and 3778 DF, p-value: < 2.2e-16
coeftest(reg2, vcov = vcovHC(reg2, type = "HC1"))
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.383958 0.630403 -2.1954 0.0282 *
yrseduc 1.341470 0.048854 27.4587 <2e-16 ***
female -3.395995 0.228019 -14.8935 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
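Solution: the coefficient on yrseduc in Regression 1 is \(1.32186\), which means that one additional year of education is associated with an increase in predicted earnings of about $1.32 per hour. In Regression 2, which holds gender constant, the coefficient is \(1.34147\), or about $1.34 per hour.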
Solution: the t-statistic on female is \(-14.520\) (\(-14.8935\) with heteroskedasticity-robust standard errors), and \(|t| > 1.96\), so we reject the null hypothesis at the 5% significance level. The null hypothesis states that there is no gap in earnings between male and female workers with the same number of years of education. The regression maintains the assumption that an additional year of education raises the earnings of men and women by the same amount.
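The test can be reproduced by hand from the robust output of Regression 2. The sketch below uses only the reported estimate and HC1 standard error; the variable names est, se, t_stat, ci and reject are introduced here for illustration, and the normal critical value is used as a large-sample approximation.

```r
# Two-sided test of H0: coefficient on female = 0, using the HC1 robust output.
est <- -3.395995   # coefficient on female, Regression 2
se  <- 0.228019    # HC1 robust standard error

t_stat <- est / se                        # robust t-statistic
crit   <- qnorm(0.975)                    # about 1.96 for a 5% two-sided test
reject <- abs(t_stat) > crit              # TRUE means H0 is rejected
ci     <- est + c(-1, 1) * crit * se      # 95% confidence interval

print(t_stat)
print(reject)
print(ci)
```

The 95% confidence interval lies entirely below zero, which is the same conclusion as the t-test: women are predicted to earn less per hour than men with the same years of education.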
Solution:
with(cps99_ps1, cor(yrseduc, female))
[1] 0.02885538
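One way to read this correlation, offered as a sketch rather than as part of the original answer: for OLS, the coefficients on yrseduc in Regression 1 (short) and Regression 2 (long) are linked by an exact in-sample identity. The symbol \(\hat{\delta}_1\) is introduced here and denotes the slope from regressing female on yrseduc.

\[
\hat{\beta}_{yrseduc}^{\,short} = \hat{\beta}_{yrseduc}^{\,long} + \hat{\beta}_{female}\,\hat{\delta}_1,
\qquad
\hat{\delta}_1 = corr(yrseduc, female)\,\frac{s_{female}}{s_{yrseduc}}
\]

Plugging in the reported estimates:

\[
1.32186 - 1.34147 = -0.01961 = (-3.39600)\,\hat{\delta}_1
\quad\Rightarrow\quad
\hat{\delta}_1 \approx 0.0058
\]

Because the correlation between yrseduc and female is only about \(0.03\), \(\hat{\delta}_1\) is small, and omitting female changes the coefficient on yrseduc by only about \(0.02\).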