Introduction

The purpose of this assignment is to give you more hands-on experience running and interpreting regressions and doing F tests for joint significance (hint: the command anova(mod.u, mod.r) will run that F test; check the lecture slides for more details).
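As a rough sketch of how that hint works in practice, the snippet below runs a joint-significance F test on R's built-in mtcars data. The dataset and the choice of variables are assumptions made purely for illustration; they are not part of this assignment. Only the names mod.u (unrestricted model) and mod.r (restricted model) come from the hint above.

```r
# Illustration only: built-in mtcars data, not one of the assignment datasets.

# Unrestricted model: includes the two regressors being tested (qsec, drat).
mod.u <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Restricted model: imposes H0 that the coefficients on qsec and drat are both zero.
mod.r <- lm(mpg ~ wt + hp, data = mtcars)

# F test of the two exclusion restrictions.
ftest <- anova(mod.u, mod.r)
print(ftest)
```

The second row of the printed table reports the F statistic and its p-value. Both models must be fit on the same observations for the comparison to be valid.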

Format

Please turn in a hard copy of your script and a separate document with your answers to the questions. For questions that ask you to estimate a model, copy/paste a summary table of your model using stargazer.

Grading

This assignment will be graded on a \(\checkmark\)/\(\checkmark +\) basis. Completing the assignment gets you a \(\checkmark\) (worth 85%) and getting the hardest part right gets you a \(\checkmark +\) (worth 100%). Incomplete work is worth 0%.

The deadline is the beginning of class next Thursday (Nov 3) or by email prior to the class. (Late submissions will be docked 20 points.)

The Assignment

Step 0: set up your workspace

Breathe deeply, brew some coffee, and create a new project with its own folder named “HW2” (or whatever you want to call it). Type Ctrl + Shift + n to open up a new script and save it (name it something creative like “script.R”… something you’ll remember).

Step 1: gather the data

From the Wooldridge textbook data, find the datasets below:

  • LAWSCH85
  • HPRICE1
  • MLB1

Copy them into your working directory (the folder on your computer R is operating in). Each question below will use a different dataset. You don’t have to do this, but I’m going to copy each dataset into a new object instead of using Wooldridge’s default object data.

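# Each Wooldridge .RData file loads objects named data and desc,
# so copy them to new names before loading the next file.
load("lawsch85.RData")
law <- data
law.desc <- desc
load("hprice1.RData")
hprice <- data
hprice.desc <- desc
load("mlb1.RData")
mlb <- data
mlb.desc <- desc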

Step 2: answer the questions

From chapter 4 of the textbook, answer questions C2, C3, and C5.

C2. Use the data in LAWSCH85 for this exercise.

  1. Using the model below, state and test the null hypothesis that the rank of law schools has no ceteris paribus effect on median starting salary.

\[log(salary) = \beta_0 + \beta_1 LSAT + \beta_2 GPA + \beta_3 log(libvol) + \beta_4 log(cost) + \beta_5 rank + u\]

  2. Are features of the incoming class of students (namely, LSAT and GPA) individually or jointly significant for explaining salary?
  3. Test whether the size of the entering class (clsize) or the size of the faculty (faculty) needs to be added to this equation; carry out a single test.
  4. What factors might influence the rank of the law school that are not included in the salary regression?

C3. Refer to Computer Exercise C2 in Chapter 3. Now, use the log of the housing price as the dependent variable:

\[log(price) = \beta_0 + \beta_1 sqrft + \beta_2 bdrms + u\]

  1. You are interested in estimating and obtaining a confidence interval for the percentage change in price when a 150-square-foot bedroom is added to a house. In decimal form, this is \(\theta_1 = 150\beta_1 + \beta_2\). Use the data in HPRICE1 to estimate \(\theta_1\).
  2. Write \(\beta_2\) in terms of \(\theta_1\) and \(\beta_1\) and plug this into the \(log(price)\) equation. (i.e. You’re doing some algebra.)
  3. Use part (ii) to obtain a standard error for \(\hat{\theta}_1\) and use this standard error to construct a 95% confidence interval.
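One possible way to sanity-check a standard error for a linear combination of coefficients is to compute it directly from the estimated coefficient covariance matrix. The sketch below does this on the built-in mtcars data with an arbitrary constant of 2; the dataset, variables, and constant are all assumptions for illustration and are not the assignment's model. It is a cross-check, not a substitute for the substitution approach the question asks for.

```r
# Illustration only: built-in mtcars data and an arbitrary constant (2).
fit <- lm(mpg ~ wt + hp, data = mtcars)
b <- coef(fit)   # estimated coefficients: (Intercept), wt, hp
V <- vcov(fit)   # estimated covariance matrix of the coefficients

# Linear combination theta = 2 * b_wt + b_hp, written as weights on each coefficient.
w <- c("(Intercept)" = 0, wt = 2, hp = 1)
theta.hat <- sum(w * b)

# Var(theta.hat) = w' V w, so the standard error is its square root.
se.theta <- sqrt(drop(t(w) %*% V %*% w))

# 95% confidence interval using the t critical value with the residual degrees of freedom.
crit <- qt(0.975, df = df.residual(fit))
ci <- c(lower = theta.hat - crit * se.theta, upper = theta.hat + crit * se.theta)
print(c(estimate = theta.hat, se = se.theta, ci))
```

If the reparameterized regression is set up correctly, the coefficient and standard error it reports for the combination should agree with these values up to rounding.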

C5. Use the data in MLB1 for this exercise.

  1. Estimate the model below. Compared to the results from equation 4.31 in the text (which also includes rbisyr; its output is included below for your convenience), what happens to the statistical significance of hrunsyr? What about the size of the coefficient on hrunsyr?

\[log(salary) = \beta_0 + \beta_1 years + \beta_2 gamesyr + \beta_3 bavg + \beta_4 hrunsyr + u\]

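library(stargazer)  # provides stargazer() for regression tables
# Equation 4.31 from the text (includes rbisyr)
fit5.0 <- lm(lsalary ~ years + gamesyr + bavg + hrunsyr + rbisyr, data = mlb)
stargazer(fit5.0, type = "text")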

===============================================
                        Dependent variable:    
                    ---------------------------
                              lsalary          
-----------------------------------------------
years                        0.069***          
                              (0.012)          
                                               
gamesyr                      0.013***          
                              (0.003)          
                                               
bavg                           0.001           
                              (0.001)          
                                               
hrunsyr                        0.014           
                              (0.016)          
                                               
rbisyr                         0.011           
                              (0.007)          
                                               
Constant                     11.192***         
                              (0.289)          
                                               
-----------------------------------------------
Observations                    353            
R2                             0.628           
Adjusted R2                    0.622           
Residual Std. Error      0.727 (df = 347)      
F Statistic          117.060*** (df = 5; 347)  
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

  2. Add the variables runsyr (runs per year), fldperc (fielding percentage), and sbasesyr (stolen bases per year) to the model from part (i). Which of these factors are individually significant?
  3. In the model from part (ii), test the joint significance of bavg, fldperc, and sbasesyr.
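
For anyone who wants to see what a joint-significance test computes under the hood, here is a minimal sketch of the F statistic built by hand from the restricted and unrestricted sums of squared residuals. It again uses the built-in mtcars data with arbitrarily chosen variables, which is an assumption for illustration only and not the MLB1 model.

```r
# Illustration only: built-in mtcars data, not the assignment's MLB1 model.
mod.u <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)  # unrestricted
mod.r <- lm(mpg ~ wt + hp, data = mtcars)                # restricted: qsec = drat = 0

# Sums of squared residuals from each model.
ssr.u <- sum(resid(mod.u)^2)
ssr.r <- sum(resid(mod.r)^2)

# q = number of restrictions; df.u = n - k - 1 from the unrestricted model.
q <- df.residual(mod.r) - df.residual(mod.u)
df.u <- df.residual(mod.u)

# F = [(SSR_r - SSR_u) / q] / [SSR_u / (n - k - 1)]
F.stat <- ((ssr.r - ssr.u) / q) / (ssr.u / df.u)
p.val <- pf(F.stat, q, df.u, lower.tail = FALSE)
print(c(F = F.stat, p = p.val))
```

The value of F.stat should match the F column reported by anova() for the same pair of models.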