This empirical problem set examines the cps dataset that we have often referred to in class. If you are reading this as a PDF, note that it was created with RMarkdown. The RMarkdown source file is posted on CCLE and contains code boxes to help you get started. You may want to open that file in RStudio and work in it directly to produce your answers and display your code.

To get started: (i) clear the workspace, (ii) load the PoEdata package, and (iii) import the cps dataset. Descriptions of all the variables in the cps dataset can be found at the following website: http://www.principlesofeconometrics.com/poe4/data/def/cps.def

# Tell R to clear the workspace
rm(list = ls())

# Tell R to load the cps dataset directly from the PoEdata repository on GitHub
cps_url <- "https://raw.githubusercontent.com/ccolonescu/PoEdata/master/data/cps.rda"
load(url(cps_url))
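
As a quick sanity check after loading, the sketch below confirms that an object named `cps` exists and contains the variables used in the regressions that follow. It assumes the download succeeds and that the `.rda` file defines a data frame called `cps`, which is what the later `lm()` calls rely on. The name `needed` is introduced here only for illustration.

```r
# Load the cps data from the same URL used in the setup code
cps_url <- "https://raw.githubusercontent.com/ccolonescu/PoEdata/master/data/cps.rda"
load(url(cps_url))

# Variables referenced by the models in this problem set
needed <- c("wage", "educ", "exper", "female")
stopifnot(is.data.frame(cps), all(needed %in% names(cps)))

# Number of rows and columns, then a compact summary of the key variables
dim(cps)
summary(cps[, needed])
```

If `stopifnot()` raises an error, the download or the object name is the first thing to check before running any regression.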

Questions

## 
## Call:
## lm(formula = wage ~ educ, data = cps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.282  -3.728  -1.188   2.382  63.088 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.20260    0.46549  -11.18   <2e-16 ***
## educ         1.15692    0.03446   33.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.585 on 4731 degrees of freedom
## Multiple R-squared:  0.1924, Adjusted R-squared:  0.1923 
## F-statistic:  1127 on 1 and 4731 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = wage ~ educ + female + educ:female, data = cps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.446  -3.372  -1.064   2.256  64.139 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.85895    0.60441  -6.385 1.88e-10 ***
## educ         1.14693    0.04492  25.533  < 2e-16 ***
## female      -3.31073    0.91506  -3.618   0.0003 ***
## educ:female  0.06091    0.06769   0.900   0.3683    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.444 on 4729 degrees of freedom
## Multiple R-squared:  0.233,  Adjusted R-squared:  0.2325 
## F-statistic: 478.8 on 3 and 4729 DF,  p-value: < 2.2e-16
##    female 
## -3.310733
## educ:female 
##  0.06090984
## [1] 124.9239
## [1] 0
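
The question text that goes with this output is not reproduced here, so the following is only a sketch of code that appears consistent with it. The last two numbers (124.9239 and 0) match a joint F test of the hypothesis that the coefficients on `female` and `educ:female` are both zero, comparing the second model against the first. The object names `m_restricted`, `m_full`, `F_stat`, and `p_value` are assumptions made for illustration.

```r
# Load the cps data from the same URL used in the setup code
cps_url <- "https://raw.githubusercontent.com/ccolonescu/PoEdata/master/data/cps.rda"
load(url(cps_url))

# Restricted model (no gender terms) and unrestricted model (gender shifts
# both the intercept and the slope on educ)
m_restricted <- lm(wage ~ educ, data = cps)
m_full <- lm(wage ~ educ + female + educ:female, data = cps)

# Intercept and slope differences for women
coef(m_full)["female"]
coef(m_full)["educ:female"]

# Joint F test of H0: both gender coefficients are zero
ssr_r <- sum(resid(m_restricted)^2)   # restricted sum of squared residuals
ssr_u <- sum(resid(m_full)^2)         # unrestricted sum of squared residuals
J <- 2                                # number of restrictions
df_u <- df.residual(m_full)           # unrestricted residual degrees of freedom
F_stat <- ((ssr_r - ssr_u) / J) / (ssr_u / df_u)

# Upper-tail p-value; 1 - pf(F_stat, J, df_u) would print as 0 because of
# floating-point rounding
p_value <- pf(F_stat, J, df_u, lower.tail = FALSE)
F_stat
p_value

# anova() reports the same comparison in one call
anova(m_restricted, m_full)
```

Although the interaction term alone is not significant (p = 0.3683), the joint test rejects the hypothesis that gender has no effect on the wage equation.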

+ Redo the plot using observations only for men.

+ What do the graphs suggest about the homoskedasticity assumption?

The scatter plots for both men and women show that wages are tightly clustered at low levels of education and that the spread widens as education increases. Because the variance of wages grows with education, the plots point to heteroskedasticity, so the homoskedasticity assumption probably does not hold. Workers with more years of schooling have a much wider range of wages.
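
The plots themselves are not reproduced in this text, so here is a minimal sketch of how they could be drawn. It assumes `female` is coded 1 for women and 0 for men, which is how it enters the regressions above. The names `men`, `women`, and `wage_sd_by_educ` are illustrative.

```r
# Load the cps data from the same URL used in the setup code
cps_url <- "https://raw.githubusercontent.com/ccolonescu/PoEdata/master/data/cps.rda"
load(url(cps_url))

# Split the sample by gender (assumes female is a 0/1 indicator)
men <- subset(cps, female == 0)
women <- subset(cps, female == 1)

# Side-by-side scatter plots of wage against education
op <- par(mfrow = c(1, 2))
plot(wage ~ educ, data = men, main = "Men", xlab = "educ", ylab = "wage")
plot(wage ~ educ, data = women, main = "Women", xlab = "educ", ylab = "wage")
par(op)  # restore the previous plotting layout

# Numeric complement to the plots: standard deviation of wage at each
# education level (NA where a level has a single observation)
wage_sd_by_educ <- tapply(cps$wage, cps$educ, sd)
round(wage_sd_by_educ, 2)
```

If the standard deviations rise with `educ`, that agrees with the fanning-out pattern described above.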

## 
## Call:
## lm(formula = wage ~ educ + exper, data = cps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.428  -3.338  -1.011   2.262  60.235 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -8.69571    0.49321  -17.63   <2e-16 ***
## educ         1.24449    0.03377   36.86   <2e-16 ***
## exper        0.12230    0.00698   17.52   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.412 on 4730 degrees of freedom
## Multiple R-squared:  0.2417, Adjusted R-squared:  0.2413 
## F-statistic: 753.7 on 2 and 4730 DF,  p-value: < 2.2e-16
##     exper 
## 0.1222984
## [1]  8.561796 10.295476  6.152118  9.007986 14.194238  8.928692
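
The output above appears to come from fitting the model, extracting the `exper` coefficient, and printing the first six fitted values. The sketch below reproduces those steps under that assumption. The object name `m_exper` and the by-hand check are illustrative, while `wagehat` follows the variable name used in the next regression.

```r
# Load the cps data from the same URL used in the setup code
cps_url <- "https://raw.githubusercontent.com/ccolonescu/PoEdata/master/data/cps.rda"
load(url(cps_url))

# Wage regression on education and experience
m_exper <- lm(wage ~ educ + exper, data = cps)

# Estimated effect of one more year of experience, holding educ fixed
coef(m_exper)["exper"]

# Store the fitted values as a new column and show the first six
cps$wagehat <- unname(fitted(m_exper))
head(cps$wagehat)

# By-hand check for the first observation:
# wagehat = b0 + b_educ * educ + b_exper * exper
b <- coef(m_exper)
first_by_hand <- b[["(Intercept)"]] + b[["educ"]] * cps$educ[1] +
  b[["exper"]] * cps$exper[1]
first_by_hand
```

The by-hand value should agree with the first element of `wagehat`, which is a useful check that the fitted values line up with the rows of `cps`.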
## 
## Call:
## lm(formula = wage ~ educ + exper + I(wagehat^2), data = cps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.867  -3.216  -0.972   2.152  57.537 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.316953   1.482531   2.912  0.00361 ** 
## educ         0.036315   0.134189   0.271  0.78669    
## exper        0.002138   0.014659   0.146  0.88405    
## I(wagehat^2) 0.047269   0.005084   9.297  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.364 on 4729 degrees of freedom
## Multiple R-squared:  0.2553, Adjusted R-squared:  0.2548 
## F-statistic: 540.3 on 3 and 4729 DF,  p-value: < 2.2e-16
## I(wagehat^2) 
##   0.04726874
## [1] 2.148954e-20
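
The last regression adds the squared fitted values from the previous model as an extra regressor, which is the idea behind a RESET-style check of functional form. The sketch below reproduces it in base R. The names `m_base`, `m_reset`, and `reset_row` are assumptions made for illustration.

```r
# Load the cps data from the same URL used in the setup code
cps_url <- "https://raw.githubusercontent.com/ccolonescu/PoEdata/master/data/cps.rda"
load(url(cps_url))

# Step 1: fit the base model and save its fitted values
m_base <- lm(wage ~ educ + exper, data = cps)
cps$wagehat <- unname(fitted(m_base))

# Step 2: re-estimate the model with the squared fitted values added
m_reset <- lm(wage ~ educ + exper + I(wagehat^2), data = cps)

# Step 3: pull out the estimate and p-value on the added term
reset_row <- coef(summary(m_reset))["I(wagehat^2)", ]
reset_row["Estimate"]
reset_row["Pr(>|t|)"]
```

Under the null hypothesis that the linear specification is adequate, the coefficient on `I(wagehat^2)` should be zero. The very small p-value rejects that hypothesis, which suggests the base model omits some nonlinearity in `educ` or `exper`.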