This is an empirical problem set examining the cps dataset that we have often referred to in class. You may be reading this in a pdf file, which was created using RMarkdown. The RMarkdown file used to create this file is posted on BruinLearn and contains code boxes to help you get started. You may want to load the RMarkdown file in RStudio and work on it directly to obtain your answers and display your code.
To get started: (i) clear the workspace, (ii) load the PoEdata library, and (iii) import the cps dataset. The description of all the variables contained in the cps dataset can be found at the following website: http://www.principlesofeconometrics.com/poe4/data/def/cps.def
Submit only the knitted html or pdf file. Submitting the rmd file is not required.
# Tell R to clear the workspace
rm(list = ls())
# Tell R to load the PoEdata library
library(PoEdata)
# Tell R to load the cps dataset
data(cps)
(25 pts) Consider a basic model in which we regress wages on education: \[{\rm wage} = \beta_1 + \beta_2\text{educ} + \epsilon.\]
What is the description of the variable wage in the data? Earnings per hour.
What is the description of the variable educ? Years of education.
Estimate the model by linear regression. What are the estimates \(b_1\) and \(b_2\)? \(b_1 = -5.20260\) and \(b_2 = 1.15692\).
##
## Call:
## lm(formula = wage ~ educ, data = cps)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -14.282  -3.728  -1.188   2.382  63.088
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.20260    0.46549  -11.18   <2e-16 ***
## educ         1.15692    0.03446   33.58   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.585 on 4731 degrees of freedom
## Multiple R-squared: 0.1924, Adjusted R-squared: 0.1923
## F-statistic: 1127 on 1 and 4731 DF, p-value: < 2.2e-16
## (Intercept)        educ
##   -5.202605    1.156924
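The output above can be reproduced with a call to `lm`. The following is a minimal sketch rather than the code from the posted RMarkdown file; the helper name `fit_wage_educ` is hypothetical and is introduced only so the same steps can be reused on any data frame with `wage` and `educ` columns.

```r
# Fit wage = beta_1 + beta_2 * educ + e by OLS and return (b_1, b_2).
# `fit_wage_educ` is a hypothetical helper name, not part of PoEdata.
fit_wage_educ <- function(data) {
  model <- lm(wage ~ educ, data = data)
  coef(model)
}

# Run on the cps data only if the PoEdata package is installed.
if (requireNamespace("PoEdata", quietly = TRUE)) {
  data("cps", package = "PoEdata")
  print(fit_wage_educ(cps))
}
```

Calling `summary()` on the fitted `lm` object instead of `coef()` prints the full regression table, including standard errors and t values.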
(25 pts) We want to examine whether expected wages depend on
education in a different manner for men and women. To this end, let
female = 1 for women, female = 0 otherwise,
and consider the model: \[{\rm wage} =
\beta_1 + \delta_1 \text{female} + \beta_2 \text{educ} + \delta_2
\text{female}\times \text{educ} + \epsilon\]
What are the linear regression estimates for \(\delta_1\) and \(\delta_2\)? What do these estimates say about how expected wages conditional on education differ between men and women? The estimate of \(\delta_1\) is \(-3.31073\): women are predicted to earn about 3.31 dollars less per hour than men when education = 0. The estimate of \(\delta_2\) is \(0.06091\): the return to one more year of education is estimated to be about 0.061 dollars higher for women than for men. This implies that both men's and women's wages increase with education, and the estimated wage-education slope is slightly steeper for women, although the difference in slopes is not individually significant (p-value 0.3683).
Use an F-test to examine the null hypothesis that expected wages conditional on education are the same for men and women. What is the p-value for your test? (Note: This test is known as a Chow test.) What does the p-value tell you that you could not conclude from the estimates of \(\delta_1\) and \(\delta_2\) you obtained in the previous part? The null hypothesis, \(\delta_1 = 0\) and \(\delta_2 = 0\), says that men and women have the same expected wage conditional on education. The F-statistic is 124.92 with 2 and 4729 degrees of freedom, and the reported p-value is 0 to printed precision, so we reject the null hypothesis that the wage-education relationship is the same for men and women. The point estimates alone do not show whether the differences are statistically distinguishable from zero, and the individual t-test on \(\delta_2\) does not reject on its own; the joint test shows that the two coefficients together are highly significant.
##
## Call:
## lm(formula = wage ~ educ, data = cps)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -14.282  -3.728  -1.188   2.382  63.088
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.20260    0.46549  -11.18   <2e-16 ***
## educ         1.15692    0.03446   33.58   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.585 on 4731 degrees of freedom
## Multiple R-squared: 0.1924, Adjusted R-squared: 0.1923
## F-statistic: 1127 on 1 and 4731 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = wage ~ female * educ, data = cps)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -13.446  -3.372  -1.064   2.256  64.139
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.85895    0.60441  -6.385 1.88e-10 ***
## female      -3.31073    0.91506  -3.618   0.0003 ***
## educ         1.14693    0.04492  25.533  < 2e-16 ***
## female:educ  0.06091    0.06769   0.900   0.3683
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.444 on 4729 degrees of freedom
## Multiple R-squared: 0.233, Adjusted R-squared: 0.2325
## F-statistic: 478.8 on 3 and 4729 DF, p-value: < 2.2e-16
## [1] 124.9239
## [1] 0
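The two numbers printed last are the F-statistic and its p-value. One way to compute them, sketched here under the assumption that the test compares the restricted model `wage ~ educ` with the unrestricted model `wage ~ female * educ`, is \(F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)}\) with \(J = 2\) restrictions. The helper name `chow_test` is hypothetical, and this may differ from the code in the posted RMarkdown file.

```r
# Chow-type F-test of H0: delta_1 = 0 and delta_2 = 0.
# `chow_test` is a hypothetical helper name used for illustration.
chow_test <- function(data) {
  restricted   <- lm(wage ~ educ, data = data)           # same line for everyone
  unrestricted <- lm(wage ~ female * educ, data = data)  # separate intercept and slope

  sse_r <- sum(resid(restricted)^2)    # restricted sum of squared residuals
  sse_u <- sum(resid(unrestricted)^2)  # unrestricted sum of squared residuals

  J  <- 2                              # number of restrictions under H0
  df <- df.residual(unrestricted)      # N - K for the unrestricted model

  f_stat <- ((sse_r - sse_u) / J) / (sse_u / df)
  p_val  <- pf(f_stat, J, df, lower.tail = FALSE)
  c(F = f_stat, p_value = p_val)
}

# Run on the cps data only if the PoEdata package is installed.
if (requireNamespace("PoEdata", quietly = TRUE)) {
  data("cps", package = "PoEdata")
  print(chow_test(cps))
}
```

The same statistic can be obtained from `anova(restricted, unrestricted)`, which is a convenient cross-check on the manual calculation.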
(25 pts) In this question we will graphically examine whether we should be concerned about the homoskedasticity assumption not holding.
Make a plot with wages on the Y axis and education on the X axis using only observations for women. Clearly label the axes. (Hint: See RMarkdown file for a hint on how to extract observations for women only.)
Redo the plot using observations only for men.
What do the graphs suggest about the homoskedasticity assumption? In both graphs, wages appear to rise with education. However, the vertical spread of wages seems to get wider at higher levels of education. This suggests that the variance of the error term may not be constant. Therefore, the graphs suggest there may be heteroskedasticity, so the homoskedasticity assumption may not hold.
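The two scatter plots can be produced with base R graphics. This is a minimal sketch: the helper name `plot_wage_educ`, the axis labels, and the subsetting approach via `data$female == value` are assumptions made for illustration, and the hint in the RMarkdown file may extract the observations differently.

```r
# Scatter plot of wage against education for one group.
# `plot_wage_educ` is a hypothetical helper; female_value is 1 for women, 0 for men.
plot_wage_educ <- function(data, female_value, title) {
  sub <- data[data$female == female_value, ]  # keep only the requested group
  plot(sub$educ, sub$wage,
       xlab = "Years of education",
       ylab = "Hourly wage (dollars)",
       main = title)
  invisible(sub)  # return the subset so it can be inspected
}

# Run on the cps data only if the PoEdata package is installed.
if (requireNamespace("PoEdata", quietly = TRUE)) {
  data("cps", package = "PoEdata")
  plot_wage_educ(cps, 1, "Wage vs. education: women")
  plot_wage_educ(cps, 0, "Wage vs. education: men")
}
```

Under homoskedasticity the vertical spread of the points would look roughly the same at every level of education, which is the pattern to compare the plots against.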
(25 pts) For simplicity we next drop the female variable and examine the role of experience in wages.
What is the name of the variable in the cps dataset that stores the experience data? exper, measured in years.
Consider a simple model in which we suppose that \({\rm wage} = \beta_1 + \beta_2 {\rm educ} + \beta_3 {\rm experience} + \epsilon\). What does the estimate for \(\beta_3\) say about how an additional year of experience affects expected wages? Does the impact depend on the level of experience or education? The estimate is \(b_3 = 0.1223\): holding education constant, one additional year of experience is associated with about $0.122 higher expected hourly wage. Because the model is linear with no interaction or polynomial terms, this effect is the constant \(b_3\) and does not depend on the level of experience or education.
We want to examine whether the linear specification is adequate or we should be using a more complex function of education and experience. Let \(\widehat{\rm wage} = b_1 + b_2 {\rm educ} + b_3 {\rm experience}\) be the fitted value from your regression. Next run the regression \({\rm wage} = \beta_1 + \beta_2 {\rm educ} + \beta_3 {\rm experience} + \gamma (\widehat{\rm wage})^2 + \epsilon\). What is your estimate for \(\gamma\)? The estimate of \(\gamma\) is 0.047269.
What is the p-value for the null hypothesis that \(\gamma = 0\) against the alternative that \(\gamma\neq 0\)? What does this suggest about whether we should be using a more complicated functional form than the linear model? (Note: This is called a RESET test.) The t-statistic is 9.297 and the p-value is below 2e-16, so we reject the null hypothesis. This suggests that the simple linear model is misspecified and that a more complicated functional form should be considered.
##
## Call:
## lm(formula = wage ~ educ + exper, data = cps)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -14.428  -3.338  -1.011   2.262  60.235
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.69571    0.49321  -17.63   <2e-16 ***
## educ         1.24449    0.03377   36.86   <2e-16 ***
## exper        0.12230    0.00698   17.52   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.412 on 4730 degrees of freedom
## Multiple R-squared: 0.2417, Adjusted R-squared: 0.2413
## F-statistic: 753.7 on 2 and 4730 DF, p-value: < 2.2e-16
## (Intercept)        educ       exper
##  -8.6957107   1.2444864   0.1222984
##
## Call:
## lm(formula = wage ~ educ + exper + I(wage_hat^2), data = cps)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -16.867  -3.216  -0.972   2.152  57.537
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)   4.316953   1.482531   2.912  0.00361 **
## educ          0.036315   0.134189   0.271  0.78669
## exper         0.002138   0.014659   0.146  0.88405
## I(wage_hat^2) 0.047269   0.005084   9.297  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.364 on 4729 degrees of freedom
## Multiple R-squared: 0.2553, Adjusted R-squared: 0.2548
## F-statistic: 540.3 on 3 and 4729 DF, p-value: < 2.2e-16
##   (Intercept)          educ         exper I(wage_hat^2)
##   4.316953219   0.036315385   0.002138014   0.047268739
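The RESET steps described above (fit the linear model, square the fitted values, refit with the squared term) can be sketched as follows. The helper name `reset_gamma` is hypothetical; the formula matches the `lm` call shown in the output, with `wage_hat` stored as a column of a local copy of the data.

```r
# RESET-style check: add the squared fitted value to the linear model
# and return the coefficient row for gamma.
# `reset_gamma` is a hypothetical helper name used for illustration.
reset_gamma <- function(data) {
  linear <- lm(wage ~ educ + exper, data = data)
  data$wage_hat <- fitted(linear)  # fitted values from the linear model
  augmented <- lm(wage ~ educ + exper + I(wage_hat^2), data = data)
  # Row holds: Estimate, Std. Error, t value, Pr(>|t|)
  summary(augmented)$coefficients["I(wage_hat^2)", ]
}

# Run on the cps data only if the PoEdata package is installed.
if (requireNamespace("PoEdata", quietly = TRUE)) {
  data("cps", package = "PoEdata")
  print(reset_gamma(cps))
}
```

The `Pr(>|t|)` entry of the returned row is the two-sided p-value for the null hypothesis \(\gamma = 0\).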