Problem Set 4

This is an empirical problem set examining the cps dataset that we have often referred to in class. You may be reading this in a PDF file, which was created using RMarkdown. The RMarkdown file used to create it is posted on BruinLearn and contains code boxes to help you get started. You may want to load the RMarkdown file in RStudio and work on it directly to obtain your answers and display your code.

To get started: (i) clear the workspace, (ii) load the PoEdata package, and (iii) import the cps dataset. The description of all the variables contained in the cps dataset can be found at the following website: http://www.principlesofeconometrics.com/poe4/data/def/cps.def
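
The three setup steps might look like the sketch below. It assumes that PoEdata is already installed and that the dataset is exposed under the name `cps`, as the text states; the `requireNamespace()` guard is an addition so that the chunk does not fail on a machine without the package.

```r
# (i) Clear the workspace
rm(list = ls())

# (ii)-(iii) Load PoEdata and import cps. The guard is an assumption-driven
# convenience: without the package the chunk prints a message instead of failing.
if (requireNamespace("PoEdata", quietly = TRUE)) {
  data("cps", package = "PoEdata")
  str(cps)  # quick look at the variables that were imported
} else {
  message("PoEdata is not installed; install it before knitting.")
}
```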

You should only submit the knitted html or pdf file. Submitting the rmd file is not required.

Questions

(25 pts) Consider a basic model in which we regress wages on education: \[{\rm wage} = \beta_1 + \beta_2\text{educ} + \epsilon.\]
- What is the description of the variable wage in the data? Answer: earnings per hour.
- What is the description of the variable educ? Answer: years of education.
- Estimate the model by linear regression. What are the estimates $b_1$ and $b_2$? Answer: $b_1 = -5.20260$ and $b_2 = 1.15692$.

## 
## Call:
## lm(formula = wage ~ educ, data = cps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.282  -3.728  -1.188   2.382  63.088 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.20260    0.46549  -11.18   <2e-16 ***
## educ         1.15692    0.03446   33.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.585 on 4731 degrees of freedom
## Multiple R-squared:  0.1924, Adjusted R-squared:  0.1923 
## F-statistic:  1127 on 1 and 4731 DF,  p-value: < 2.2e-16

## (Intercept)        educ 
##   -5.202605    1.156924
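
Output of this kind can be produced with `lm()`. The sketch below is illustrative only: `demo` is a simulated stand-in data frame with hypothetical numbers, so the block runs without PoEdata, and its coefficients will not match the ones reported above. It also cross-checks the slope and intercept against the textbook least-squares formulas.

```r
# Simulated stand-in data (hypothetical values, NOT the cps data)
set.seed(103)
n <- 300
demo <- data.frame(educ = sample(8:18, n, replace = TRUE))
demo$wage <- -5 + 1.2 * demo$educ + rnorm(n, sd = 5)

# Least-squares fit of wage on educ
fit1 <- lm(wage ~ educ, data = demo)
summary(fit1)
b <- coef(fit1)  # named vector: "(Intercept)" is b1, "educ" is b2

# Cross-check with the simple-regression formulas:
# b2 = cov(x, y) / var(x) and b1 = mean(y) - b2 * mean(x)
b2_manual <- cov(demo$educ, demo$wage) / var(demo$educ)
b1_manual <- mean(demo$wage) - b2_manual * mean(demo$educ)
```

Replacing `demo` with `cps` in the `lm()` call gives the regression requested in the question.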

(25 pts) We want to examine whether expected wages depend on education in a different manner for men and women. To this end, let female = 1 for women, female = 0 otherwise, and consider the model: \[{\rm wage} = \beta_1 + \delta_1 \text{female} + \beta_2 \text{educ} + \delta_2 \text{female}\times \text{educ} + \epsilon\]
- What are the linear regression estimates for $\delta_1$ and $\delta_2$? What do these estimates say about how expected wage conditional on education differs between men and women? Answer: The estimate of $\delta_1$ is $-3.31073$: women are predicted to earn about 3.31 dollars less per hour than men when educ = 0. The estimate of $\delta_2$ is $0.06091$: the return to one more year of education is about 0.061 dollars per hour higher for women than for men. Wages therefore rise with education for both men and women, with a slightly steeper estimated wage-education slope for women.
- Use an F-test to examine the null hypothesis that expected wages conditional on education are the same for men and women. What is the p-value for your test? (Note: This test is known as a Chow test.) What does the p-value tell you that you could not conclude from the estimates of $\delta_1$ and $\delta_2$ you obtained in the previous part? Answer: The null hypothesis is $\delta_1 = 0$ and $\delta_2 = 0$, that is, men and women have the same expected wage conditional on education. The F-statistic is 124.9239 and the p-value is essentially zero, so we reject the null hypothesis that the wage-education relationship is the same for men and women. The point estimates alone do not show whether the differences are statistically distinguishable from zero, and the individual t-test on $\delta_2$ (p-value 0.3683) does not reject; the joint test shows that $\delta_1$ and $\delta_2$ are jointly significant.
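
For reference, the F-statistic behind this test can be written as follows. The notation is an assumption and does not come from the problem set: $SSE_R$ and $SSE_U$ are the sums of squared residuals from the restricted model (wage on educ only) and the unrestricted model (with female and female $\times$ educ), $J$ is the number of restrictions, and $N - K$ is the residual degrees of freedom of the unrestricted model.

\[F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)}\]

Here $J = 2$ and, from the unrestricted regression output, $N - K = 4729$, so under the null hypothesis the statistic is compared with an $F(2, 4729)$ distribution.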

## 
## Call:
## lm(formula = wage ~ educ, data = cps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.282  -3.728  -1.188   2.382  63.088 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.20260    0.46549  -11.18   <2e-16 ***
## educ         1.15692    0.03446   33.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.585 on 4731 degrees of freedom
## Multiple R-squared:  0.1924, Adjusted R-squared:  0.1923 
## F-statistic:  1127 on 1 and 4731 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = wage ~ female * educ, data = cps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.446  -3.372  -1.064   2.256  64.139 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.85895    0.60441  -6.385 1.88e-10 ***
## female      -3.31073    0.91506  -3.618   0.0003 ***
## educ         1.14693    0.04492  25.533  < 2e-16 ***
## female:educ  0.06091    0.06769   0.900   0.3683    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.444 on 4729 degrees of freedom
## Multiple R-squared:  0.233,  Adjusted R-squared:  0.2325 
## F-statistic: 478.8 on 3 and 4729 DF,  p-value: < 2.2e-16

## [1] 124.9239

## [1] 0
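
One way to compute the Chow test by hand is sketched below. The helper name `chow_test` and the simulated data frame `demo` are hypothetical, introduced only for illustration; the helper assumes columns named `wage`, `educ`, and `female`. It builds the F-statistic from the two sums of squared residuals and also returns the `anova()` comparison of the same two models as a cross-check.

```r
# Hypothetical helper: Chow test for equal intercept and slope across groups
chow_test <- function(data) {
  restricted   <- lm(wage ~ educ, data = data)           # imposes delta_1 = delta_2 = 0
  unrestricted <- lm(wage ~ female * educ, data = data)  # separate intercept and slope

  sse_r <- sum(resid(restricted)^2)
  sse_u <- sum(resid(unrestricted)^2)
  J     <- 2                           # number of restrictions
  df_u  <- df.residual(unrestricted)   # N - 4

  f_stat  <- ((sse_r - sse_u) / J) / (sse_u / df_u)
  p_value <- pf(f_stat, J, df_u, lower.tail = FALSE)
  list(F = f_stat, p = p_value, anova = anova(restricted, unrestricted))
}

# Simulated stand-in data (hypothetical values, NOT the cps data)
set.seed(103)
n <- 500
demo <- data.frame(female = rbinom(n, 1, 0.5),
                   educ   = sample(8:18, n, replace = TRUE))
demo$wage <- -4 - 3 * demo$female + 1.2 * demo$educ + rnorm(n, sd = 5)

res <- chow_test(demo)
res$F  # F-statistic
res$p  # p-value
```

Calling `chow_test(cps)` would apply the same steps to the course data, provided `cps` is loaded.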

(25 pts) In this question we will graphically examine whether we should be concerned about the homoskedasticity assumption not holding.
- Make a plot with wages on the Y axis and education on the X axis using only observations for women. Clearly label the axes. (Hint: See RMarkdown file for a hint on how to extract observations for women only.)
- Redo the plot using observations only for men.
- What do the graphs suggest about the homoskedasticity assumption? Answer: In both graphs, wages appear to rise with education, but the vertical spread of wages widens at higher levels of education. This suggests that the variance of the error term is not constant across education levels, so the graphs point to heteroskedasticity and the homoskedasticity assumption may not hold.
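
The two plots can be sketched as below. This is an illustration under stated assumptions: `demo` is simulated stand-in data whose error spread grows with education by construction, and `subset()` is only one way to extract the observations for each group (the hint in the RMarkdown file may use a different approach). The standard deviation of wage at each education level is added as a rough numeric companion to the visual check.

```r
# Simulated stand-in data (hypothetical values, NOT the cps data);
# the error standard deviation grows with educ, so the data are
# heteroskedastic by construction.
set.seed(103)
n <- 400
demo <- data.frame(female = rbinom(n, 1, 0.5),
                   educ   = sample(8:18, n, replace = TRUE))
demo$wage <- 2 + 1.2 * demo$educ + rnorm(n, sd = 0.4 * demo$educ)

# Split the sample by the female indicator
women <- subset(demo, female == 1)
men   <- subset(demo, female == 0)

# Scatter plots with labelled axes, one per group
plot(women$educ, women$wage,
     xlab = "Education (years)", ylab = "Wage (dollars per hour)", main = "Women")
plot(men$educ, men$wage,
     xlab = "Education (years)", ylab = "Wage (dollars per hour)", main = "Men")

# Spread of wages at each education level
spread_women <- tapply(women$wage, women$educ, sd)
spread_men   <- tapply(men$wage, men$educ, sd)
```

With the course data, `demo` would be replaced by `cps`; a fan shape that widens to the right is the pattern to look for.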

(25 pts) For simplicity we next drop the female variable and examine the role of experience in wages.
- What is the name of the variable in the cps dataset that stores the experience data? Answer: exper, measured in years.
- Consider a simple model in which we suppose that ${\rm wage} = \beta_1 + \beta_2 {\rm educ} + \beta_3 {\rm experience} + \epsilon$. What does the estimate for $\beta_3$ say about how an additional year of experience affects expected wages? Does the impact depend on the level of experience or education? Answer: The estimate is $b_3 = 0.1223$. Holding education constant, one additional year of experience is associated with an expected hourly wage that is about 0.122 dollars higher. Because the model is linear in experience, this effect does not depend on the level of experience or education.
- We want to examine whether the linear specification is adequate or whether we should be using a more complex function of education and experience. Let $\widehat{\rm wage} = b_1 + b_2 {\rm education} + b_3 {\rm experience}$ be the fitted value from your regression. Next run the regression ${\rm wage} = \beta_1 + \beta_2 {\rm education} + \beta_3 {\rm experience} + \gamma (\widehat{\rm wage})^2 + \epsilon$. What is your estimate for $\gamma$? Answer: The estimate of $\gamma$ is 0.047269.
- What is the p-value for the null hypothesis that $\gamma = 0$ against the alternative that $\gamma\neq 0$? What does this suggest about whether we should be using a more complicated functional form than the linear model? (Note: This is called a RESET test.) Answer: The p-value is less than 2e-16 (t = 9.297), so we reject the null hypothesis. This suggests that the simple linear model is misspecified and that a more complicated functional form should be considered.

## 
## Call:
## lm(formula = wage ~ educ + exper, data = cps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.428  -3.338  -1.011   2.262  60.235 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -8.69571    0.49321  -17.63   <2e-16 ***
## educ         1.24449    0.03377   36.86   <2e-16 ***
## exper        0.12230    0.00698   17.52   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.412 on 4730 degrees of freedom
## Multiple R-squared:  0.2417, Adjusted R-squared:  0.2413 
## F-statistic: 753.7 on 2 and 4730 DF,  p-value: < 2.2e-16

## (Intercept)        educ       exper 
##  -8.6957107   1.2444864   0.1222984

## 
## Call:
## lm(formula = wage ~ educ + exper + I(wage_hat^2), data = cps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.867  -3.216  -0.972   2.152  57.537 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.316953   1.482531   2.912  0.00361 ** 
## educ          0.036315   0.134189   0.271  0.78669    
## exper         0.002138   0.014659   0.146  0.88405    
## I(wage_hat^2) 0.047269   0.005084   9.297  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.364 on 4729 degrees of freedom
## Multiple R-squared:  0.2553, Adjusted R-squared:  0.2548 
## F-statistic: 540.3 on 3 and 4729 DF,  p-value: < 2.2e-16

##   (Intercept)          educ         exper I(wage_hat^2) 
##   4.316953219   0.036315385   0.002138014   0.047268739
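
The RESET steps can be sketched as follows. The data frame `demo` is simulated stand-in data with hypothetical numbers; wages are generated with a quadratic experience profile, so the linear model is misspecified by construction. The column name `wage_hat` follows the formula shown in the output above.

```r
# Simulated stand-in data (hypothetical values, NOT the cps data)
set.seed(103)
n <- 500
demo <- data.frame(educ  = sample(8:18, n, replace = TRUE),
                   exper = sample(0:40, n, replace = TRUE))
demo$wage <- -8 + 1.2 * demo$educ + 0.5 * demo$exper -
  0.01 * demo$exper^2 + rnorm(n, sd = 3)

# Step 1: fit the linear model and store its fitted values
fit_lin <- lm(wage ~ educ + exper, data = demo)
demo$wage_hat <- fitted(fit_lin)

# Step 2: add the squared fitted value and test gamma = 0
fit_reset <- lm(wage ~ educ + exper + I(wage_hat^2), data = demo)
gamma_row <- coef(summary(fit_reset))["I(wage_hat^2)", ]
gamma_row["Estimate"]   # estimate of gamma
gamma_row["Pr(>|t|)"]   # two-sided p-value for gamma = 0

# Cross-check: with a single added term, the F-test comparing the two
# models equals the squared t-statistic on gamma.
f_check <- anova(fit_lin, fit_reset)
```

Swapping `demo` for `cps` reproduces the two regressions requested in the question.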

Econ 103: Introduction to Econometrics

Due Date: 03/12/2026
