Introduction

The purpose of this assignment is for you to get more hands-on experience running and interpreting regressions and doing F-tests for joint significance (hint: the command anova(mod.u, mod.r) will run that F-test… check the lecture slides for more details).
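To make the hint concrete, here is a minimal sketch of a joint-significance F-test. The data are simulated and the variable names (x1, x2, x3, sim) are made up for illustration; only anova(mod.u, mod.r) comes from the hint above. It also recomputes the statistic by hand from the two sums of squared residuals so you can see where the number comes from.

```r
# Simulated data (hypothetical, not one of the Wooldridge datasets)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)   # x3 has no true effect
sim <- data.frame(y, x1, x2, x3)

mod.u <- lm(y ~ x1 + x2 + x3, data = sim)  # unrestricted model
mod.r <- lm(y ~ x1, data = sim)            # restricted model: drops x2 and x3

# F-test of H0: the coefficients on x2 and x3 are both zero
ftest <- anova(mod.u, mod.r)
print(ftest)

# Same statistic by hand: F = [(SSR_r - SSR_u) / q] / [SSR_u / df_u]
ssr.r <- sum(resid(mod.r)^2)
ssr.u <- sum(resid(mod.u)^2)
q     <- 2                                 # number of restrictions
f.manual <- ((ssr.r - ssr.u) / q) / (ssr.u / df.residual(mod.u))
f.manual
```

The F value in the second row of the printed table should match f.manual.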

Format

Please turn in a hard copy of your script and a separate document with your answers to the questions. For questions that ask you to estimate a model, copy/paste a summary table of your model using stargazer.

Grading

This assignment will be graded on a \(\checkmark\)/\(\checkmark +\) basis. Completing the assignment gets you a \(\checkmark\) (worth 85%) and getting the hardest part right gets you a \(\checkmark +\) (worth 100%). Incomplete work is worth 0%.

The deadline is the beginning of class next Thursday (Nov 10) or by email prior to the class. (Late submissions will be docked 20 points.)

The Assignment

Step 0: set up your workspace

Breathe deeply, brew some coffee, and create a new project with its own folder named “HW4” (or whatever you want to call it). Type Ctrl + Shift + n to open up a new script and save it (name it something creative like “script.R”… something you’ll remember).

Step 1: gather the data

From the Wooldridge textbook data find the datasets below:

  • WAGE1
  • WAGE2
  • GPA2

Copy them into your working directory (the folder on your computer R is operating in). Each question below will use a different dataset. You don’t have to do this, but I’m going to copy each dataset into a new object instead of using Wooldridge’s default object data.

load("wage1.RData")
wage <- data
wage.desc <- desc
load("wage2.RData")
wage2 <- data
wage2.desc <- desc
load("gpa2.RData")
gpa <- data
gpa.desc <- desc
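Why bother copying? Each load() call recreates objects with the same names (data and desc), so the second load replaces whatever the first one put there. The sketch below shows this with two tiny made-up files written to a temporary location; the file contents and the names first and second are hypothetical stand-ins, not the real Wooldridge files.

```r
# Two tiny made-up .RData files standing in for the real datasets
tmp1 <- tempfile(fileext = ".RData")
tmp2 <- tempfile(fileext = ".RData")

data <- data.frame(wage = c(3.1, 5.5))
desc <- "first file"
save(data, desc, file = tmp1)

data <- data.frame(colgpa = c(2.9, 3.4, 3.8))
desc <- "second file"
save(data, desc, file = tmp2)
rm(data, desc)

load(tmp1)       # creates objects named data and desc
first <- data    # keep a copy under a new name
load(tmp2)       # replaces data and desc without warning
second <- data

names(first)     # columns from the first file survive in the copy
names(second)    # data now holds the second file
```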

Step 2: answer the questions

We’re doing questions C1 and C2 from chapter 5 and C2 and C3 from chapter 6.

C1. Use the wage1 data.

  1. Estimate the equation \[wage = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure + u\] Save the residuals and plot a histogram. (hint: if your model is named m1.1, then this line of code will make the histogram: qplot(m1.1$residuals).)

  2. Repeat part (i) using \(log(wage)\) as the dependent variable.

  3. Which model looks like it satisfies assumption 6 (normality of the residuals) better?
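If you want to see what the comparison looks like before touching the real data, here is a rough sketch on simulated wages. Everything here is an assumption for illustration: the data are generated so that log(wage) has normal errors by construction, so the result is built in and says nothing about what wage1 will show. It uses base R hist() instead of qplot() so it runs without extra packages, and the skew() helper is a made-up convenience function.

```r
set.seed(2)
n    <- 500
educ <- sample(8:18, n, replace = TRUE)
# Simulated wages: log(wage) is linear in educ with normal errors
wage <- exp(0.5 + 0.08 * educ + rnorm(n, sd = 0.5))
sim  <- data.frame(wage, educ)

m.level <- lm(wage ~ educ, data = sim)
m.log   <- lm(log(wage) ~ educ, data = sim)

# Side-by-side histograms of the residuals
op <- par(mfrow = c(1, 2))
hist(resid(m.level), breaks = 30, main = "Level model", xlab = "Residual")
hist(resid(m.log),   breaks = 30, main = "Log model",   xlab = "Residual")
par(op)

# Sample skewness as a rough numeric check (near 0 for a symmetric distribution)
skew <- function(e) mean((e - mean(e))^3) / sd(e)^3
skew.level <- skew(resid(m.level))
skew.log   <- skew(resid(m.log))
c(level = skew.level, log = skew.log)
```

A histogram with a long right tail and a skewness well away from zero is a sign that normality is a poor description of the residuals.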

C2. Use the gpa2 data

  1. Using all 4,137 observations, estimate the model \[colgpa = \beta_0 + \beta_1 hsperc + \beta_2 sat + u\] and report the results
  2. Repeat (i) using only the first 2,070 observations. (hint: data = gpa[1:2070,] in your call to lm() will accomplish this.)
  3. Find the ratio of the standard errors on \(hsperc\) from parts (i) and (ii). Compare this result with equation 5.10 in the textbook, which states:

\[se(\hat{\beta_j}) \approx c_j/\sqrt{n}\],

where \(c_j\) is a constant that doesn’t matter for the ratio you’re calculating since it will cancel out.
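Dividing the approximation for one sample size by the same approximation for another gives \((c_j/\sqrt{n_1})/(c_j/\sqrt{n_2}) = \sqrt{n_2/n_1}\), so the predicted ratio depends only on the two sample sizes. Here is a small sketch of that comparison on simulated data. The sample sizes (4000 and 2000) and variable names are made up for illustration; swap in your own models and observation counts.

```r
set.seed(3)
n.full <- 4000
x <- rnorm(n.full)
y <- 1 + 2 * x + rnorm(n.full)
sim <- data.frame(y, x)

m.full <- lm(y ~ x, data = sim)
m.half <- lm(y ~ x, data = sim[1:2000, ])   # first 2000 rows only

# Standard errors sit in the "Std. Error" column of the coefficient table
se.full <- coef(summary(m.full))["x", "Std. Error"]
se.half <- coef(summary(m.half))["x", "Std. Error"]

ratio.observed  <- se.full / se.half
ratio.predicted <- sqrt(2000 / n.full)      # c_j cancels out of the ratio
c(observed = ratio.observed, predicted = ratio.predicted)
```

The two numbers will not match exactly because the formula is only an approximation, but they should be close.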

Chapter 6 questions

C2. Use the wage1 data

  1. Use OLS to estimate the model \[log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 exper^2 + u \] and report the results
  2. Is \(exper^2\) statistically significant at the 1% level?
  3. Using the approximation \[\%\Delta\hat{wage} \approx 100(\hat{\beta_2} + 2\hat{\beta_3}exper)\Delta exper\], find the approximate return to the fifth year of experience. Repeat for the 12th year of experience.
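The approximation in part (iii) is easy to wrap in a small function. The sketch below uses made-up coefficient values (0.04 and -0.0007) purely for illustration; they are not estimates from wage1. It also assumes one reading of "the fifth year of experience", namely the change from 4 to 5 years, so the formula is evaluated at exper = 4.

```r
# Approximate % change in wage from d.exper more years of experience,
# starting from `exper` years: 100 * (b2 + 2 * b3 * exper) * d.exper
pct.return <- function(b2, b3, exper, d.exper = 1) {
  100 * (b2 + 2 * b3 * exper) * d.exper
}

# Hypothetical coefficients for illustration only (not estimates from wage1)
b2 <- 0.04
b3 <- -0.0007

r5  <- pct.return(b2, b3, exper = 4)    # moving from 4 to 5 years
r12 <- pct.return(b2, b3, exper = 11)   # moving from 11 to 12 years
c(fifth = r5, twelfth = r12)
```

With a negative coefficient on the squared term, the return shrinks as experience grows. For your own model, pull the two coefficients out with coef() instead of typing them in.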

C3. (with modification) Use the wage2 data

  1. Estimate the model \[log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 educ \times exper + u \] and report the results.

  2. Re-estimate the model in part (i) using demeaned data (hint: data = mutate_each(wage2,funs(scale(.,scale=F)))). Report the results alongside the original model using stargazer. What values changed? What values didn’t change? (e.g. did the \(R^2\) change? Did the intercept change? etc.)

  3. What is the effect of an additional year of experience for someone with an average amount of education?
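Here is a sketch of why demeaning helps with part (iii), using simulated data rather than wage2. All names and coefficient values are made up for illustration. It uses base R scale() on the whole data frame instead of the dplyr call in the hint, and it works with an already-logged column (lwage), because taking the log of a demeaned wage would fail for the negative values.

```r
set.seed(4)
n     <- 300
educ  <- sample(9:18, n, replace = TRUE)
exper <- sample(1:25, n, replace = TRUE)
lwage <- 5 + 0.06 * educ + 0.01 * exper + 0.002 * educ * exper +
  rnorm(n, sd = 0.4)
sim   <- data.frame(lwage, educ, exper)

# Demean every column (subtract the column mean, do not rescale)
sim.c <- as.data.frame(scale(sim, center = TRUE, scale = FALSE))

m.raw <- lm(lwage ~ educ + exper + educ:exper, data = sim)
m.ctr <- lm(lwage ~ educ + exper + educ:exper, data = sim.c)

# Effect of one more year of exper at average educ, from the raw model
effect.at.mean <- coef(m.raw)["exper"] +
  coef(m.raw)["educ:exper"] * mean(sim$educ)

# The same number is the exper coefficient in the demeaned model
effect.at.mean
coef(m.ctr)["exper"]

# The interaction coefficient and R-squared are unchanged by demeaning
c(raw = summary(m.raw)$r.squared, demeaned = summary(m.ctr)$r.squared)
```

In the raw model the coefficient on exper is the effect when educ equals zero; after demeaning it is the effect when educ is at its sample mean, which is usually the more interesting number.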