The purpose of this assignment is for you to get more hands-on experience running and interpreting regressions and doing F-tests for joint significance (hint: the command anova(mod.u, mod.r) will run that F-test; check the lecture slides for more details).
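As a rough sketch of what that hint does, the example below runs a joint-significance F-test on R's built-in mtcars data rather than on the assignment's datasets. It assumes mod.u is the unrestricted model and mod.r is the restricted model with the tested coefficients set to zero; the variables chosen here are purely illustrative.

```r
# Unrestricted model: includes the two regressors being tested (hp and qsec).
mod.u <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Restricted model: imposes H0 that the coefficients on hp and qsec are both zero.
mod.r <- lm(mpg ~ wt, data = mtcars)

# F-test for joint significance, using the same call as the hint.
ftest <- anova(mod.u, mod.r)
print(ftest)

# The same statistic by hand: F = [(SSR_r - SSR_u) / q] / [SSR_u / df_u].
ssr.r <- sum(resid(mod.r)^2)
ssr.u <- sum(resid(mod.u)^2)
q <- 2                        # number of restrictions being tested
df.u <- df.residual(mod.u)    # n - k - 1 in the unrestricted model
F.manual <- ((ssr.r - ssr.u) / q) / (ssr.u / df.u)
```

The second row of the printed table holds the F statistic and its p-value; F.manual should agree with it up to rounding.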
Please turn in a hard copy of your script and a separate document with your answers to the questions. For questions that ask you to estimate a model, copy/paste a summary table of your model using stargazer.
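One possible way to produce such a table is sketched here. It uses the built-in mtcars data and hypothetical model names (m.a, m.b), not the assignment's models, and assumes the stargazer package is installed. With type = "text" the table prints as plain text that can be pasted into a document.

```r
# Two illustrative models on built-in data (not the assignment's data).
m.a <- lm(mpg ~ wt, data = mtcars)
m.b <- lm(mpg ~ wt + hp, data = mtcars)

# Assumes the stargazer package is installed; the table is skipped otherwise.
if (requireNamespace("stargazer", quietly = TRUE)) {
  # Listing several models places them side by side, one column per model.
  stargazer::stargazer(m.a, m.b, type = "text")
}
```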
This assignment will be graded on a \(\checkmark\)/\(\checkmark +\) basis. Completing the assignment gets you a \(\checkmark\) (worth 85%) and getting the hardest part right gets you a \(\checkmark +\) (worth 100%). Incomplete work is worth 0%.
The deadline is the beginning of class next Thursday (Nov 10) or by email prior to the class. (Late submissions will be docked 20 points.)
Breathe deeply, brew some coffee, and create a new project with its own folder named “HW4” (or whatever you want to call it). Type Ctrl + Shift + N to open a new script and save it (name it something creative like “script.R”… something you’ll remember).
From the Wooldridge textbook data, find the datasets used below: wage1, wage2, and gpa2.
Copy them into your working directory (the folder on your computer that R is operating in). Each question below will use a different dataset. You don’t have to do this, but I’m going to copy each dataset into a new object instead of using Wooldridge’s default object data.
load("wage1.RData")
wage <- data
wage.desc <- desc
load("wage2.RData")
wage2 <- data
wage2.desc <- desc
load("gpa2.RData")
gpa <- data
gpa.desc <- desc
We’re doing questions C1 and C2 from chapter 5 and questions C2 and C3 from chapter 6.
C1. Use the wage1 data.
Estimate the equation \[wage = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure + u\] Save the residuals and plot a histogram. (hint: If your model is named m1.1, then this line of code will make the histogram: qplot(m1.1$residuals).)
Repeat part (i) using \(log(wage)\) as the dependent variable.
Which model looks like it satisfies assumption 6 (normality of the residuals) better?
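To illustrate the kind of comparison C1 asks for, here is a minimal sketch on simulated data, not the wage1 data. The data-generating process (log-normal wages), the model names, and the skew() helper are all assumptions made for this example, and base R's hist() stands in for qplot() so that no extra packages are needed.

```r
set.seed(1)
n <- 500
educ <- sample(8:18, n, replace = TRUE)

# Simulated wages that are log-normal given educ (an assumption of this sketch).
sim <- data.frame(educ = educ,
                  wage = exp(0.5 + 0.08 * educ + rnorm(n, sd = 0.4)))

m.lvl <- lm(wage ~ educ, data = sim)       # level model
m.log <- lm(log(wage) ~ educ, data = sim)  # log model

# Hypothetical helper: sample skewness, which is near 0 for symmetric residuals.
skew <- function(e) mean((e - mean(e))^3) / sd(e)^3
skew.lvl <- skew(resid(m.lvl))
skew.log <- skew(resid(m.log))

# Histograms of the two sets of residuals, side by side.
par(mfrow = c(1, 2))
hist(resid(m.lvl), breaks = 30, main = "wage", xlab = "residual")
hist(resid(m.log), breaks = 30, main = "log(wage)", xlab = "residual")
```

In this simulation the level residuals have a long right tail while the log residuals look roughly symmetric. Whether the same pattern holds in the wage1 data is for you to check.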
C2. Use the gpa2 data
(hint: data = gpa[1:2070,] in your call to lm() will accomplish this.)
\[se(\hat{\beta_j}) \approx c_j/\sqrt{n}\],
where \(c_j\) is a constant that doesn’t matter for the ratio you’re calculating since it will cancel out.
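The approximation above implies that the ratio of two standard errors for the same coefficient depends only on the two sample sizes, since \(c_j\) cancels. The sketch below checks this on simulated data; the sample sizes, variable names, and data-generating process are assumptions made for illustration, not the gpa2 data.

```r
set.seed(2)
n <- 4000
sim <- data.frame(x = rnorm(n))
sim$y <- 1 + 0.5 * sim$x + rnorm(n)

# Same model on the full sample and on the first half only.
m.full <- lm(y ~ x, data = sim)
m.half <- lm(y ~ x, data = sim[1:2000, ])

# Pull the standard error on x from each coefficient table.
se.full <- coef(summary(m.full))["x", "Std. Error"]
se.half <- coef(summary(m.half))["x", "Std. Error"]

ratio.obs  <- se.full / se.half   # ratio observed in the two fits
ratio.pred <- sqrt(2000 / n)      # (c/sqrt(n)) / (c/sqrt(2000)) = sqrt(2000/n)
print(c(observed = ratio.obs, predicted = ratio.pred))
```

The two numbers will not match exactly, because the formula is an approximation, but they should be close.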
C2. Use the wage1 data
C3. (with modification) Use the wage2 data
Estimate the model \[log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 educ \times exper + u \] and report the results.
Re-estimate the model in part (i) using demeaned data (hint: data = mutate_each(wage2, funs(scale(., scale=F)))). Report the results alongside the original model using stargazer. What values changed? What values didn’t change? (e.g. did the \(R^2\) change? Did the intercept change? etc.)
What is the effect of an additional year of experience for someone with an average amount of education?
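The link between the interaction model and the demeaned model can be sketched on simulated data as follows. This is not the wage2 data: the coefficients, sample size, and object names are made up for illustration, and base R's scale() is used for demeaning in place of the mutate_each() call in the hint.

```r
set.seed(3)
n <- 300
educ  <- sample(8:18, n, replace = TRUE)
exper <- sample(1:30, n, replace = TRUE)
lwage <- 5 + 0.06 * educ + 0.01 * exper + 0.002 * educ * exper +
  rnorm(n, sd = 0.3)
sim <- data.frame(lwage, educ, exper)

# Interaction model on the raw data; educ * exper expands to
# educ + exper + educ:exper.
m.raw <- lm(lwage ~ educ * exper, data = sim)

# Demean every column (base R stand-in for the hinted dplyr call).
sim.c <- as.data.frame(scale(sim, scale = FALSE))
m.c <- lm(lwage ~ educ * exper, data = sim.c)

# Partial effect of exper at average educ in the raw model:
# beta_2 + beta_3 * mean(educ).
b <- coef(m.raw)
effect.at.mean <- b["exper"] + b["educ:exper"] * mean(sim$educ)

print(unname(effect.at.mean))
print(unname(coef(m.c)["exper"]))
```

In the raw model the partial effect of exper is \(\beta_2 + \beta_3 educ\), so at average education it is \(\beta_2 + \beta_3 \overline{educ}\). The demeaned model reports that same quantity directly as its coefficient on exper, while the interaction coefficient and \(R^2\) are the same in both fits.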