Introduction

The purpose of this assignment is for you to get hands-on experience running and interpreting regressions. We will use questions from the textbook to provide some structure.

Format

Please turn in a hard copy of your script and a separate document with your answers to the questions. For questions that ask you to estimate a model, copy/paste a summary table of your model using stargazer (see below for more details).

Grading

This assignment will be graded on a \(\checkmark\)/\(\checkmark +\) basis. Completing the assignment gets you a \(\checkmark\) (worth 85%) and getting the hardest part right gets you a \(\checkmark +\) (worth 100%). Incomplete work is worth 0%.

The assignment is due at the beginning of class next Tuesday (Oct 24); you may also email it to me before class. (Late submissions will be docked 20 points.) I would give you more time, but this assignment is fairly short.

The Assignment

Step 0: set up your workspace

Breathe deeply, brew some coffee, and create a new project with its own folder named “HW2” (or whatever you want to call it). Type Ctrl + Shift + n to open up a new script and save it (name it something creative like “script.R”… something you’ll remember).

Step 1: gather the data

From the Wooldridge textbook data find the datasets below:

  • CEOSAL2
  • ATTEND
  • MEAP93

Copy them into your working directory (the folder on your computer that R is operating in). Each question below will use a different dataset.

load("ceosal2.RData")
# load("attend.RData") # no point loading this now... it will just overwrite 
# load("meap93.RData") # the other one.

Side note: R operates in a specific folder mostly because of how the Unix operating system worked long ago when R was invented. If you have a Mac, you’re technically using a Unix operating system. Some R commands also have Unix roots, like ls(), which lists the contents of a Unix directory or your R workspace, and rm(), which removes files (Unix) or objects in your workspace (R).
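
As the comments in the code above point out, loading a second file overwrites the first. One way around that is to load each file into its own environment. The sketch below is only an illustration: the temporary files stand in for the downloaded .RData files, and the names file_a, file_b, env_a, and env_b are made up. With the real files you would pass "ceosal2.RData" and so on to load().

```r
# Stand-ins for two downloaded files: each one stores a data frame called `data`.
file_a <- tempfile(fileext = ".RData")
file_b <- tempfile(fileext = ".RData")
data <- head(mtcars); save(data, file = file_a)
data <- head(iris);   save(data, file = file_b)
rm(data)                      # remove the object from the workspace

# Load each file into its own environment so nothing gets overwritten.
env_a <- new.env()
env_b <- new.env()
load(file_a, envir = env_a)   # load() invisibly returns the names it created
load(file_b, envir = env_b)

ls()                          # objects in your workspace
ls(env_a)                     # objects loaded from the first file
names(env_a$data)             # columns of the first data set
names(env_b$data)             # columns of the second data set
getwd()                       # the folder R is currently operating in
```

You can then refer to each data set as env_a$data or env_b$data, for example in the data argument of lm().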

Step 2: answer the questions

C3. The file ceosal2.RData contains data on 177 chief executive officers and can be used to examine the effects of firm performance on CEO salary.

  1. Estimate a model relating annual salary to firm sales and market value. Make the model of the constant elasticity (i.e. log-log) variety for both independent variables.
  2. Add \(profits\) to the model from part (i). Why can this variable not be included in logarithmic form? Would you say that these firm performance variables explain most of the variation in CEO salaries?
  3. Add the variable \(ceoten\) to the model in part (ii). What is the estimated percentage return for another year of CEO tenure, holding other factors fixed?
  4. Find the sample correlation coefficient between the variables \(log(mktval)\) and \(profits\). Are these variables highly correlated? What does this say about the OLS estimators?
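
As a reminder of how to read coefficients in models like these (a generic sketch with placeholder regressors \(x\) and \(z\), not the answer to any part above), consider \[log(y) = \beta_0 + \beta_1 log(x) + \beta_2 z + u.\] Here \(\beta_1\) is the elasticity of \(y\) with respect to \(x\): a 1% increase in \(x\) is associated with a \(\beta_1\)% change in \(y\), holding \(z\) fixed. For a regressor that enters in levels, a change of \(\Delta z\) shifts \(log(y)\) by \(\beta_2 \Delta z\), so \[\%\Delta y \approx 100 \cdot \beta_2 \cdot \Delta z,\] and the exact percentage change is \(100 \cdot [\exp(\beta_2 \Delta z) - 1]\). The approximation is close when \(\beta_2 \Delta z\) is small.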

Example code:

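library(dplyr)     # provides %>% and select()
library(stargazer) # provides stargazer()
load("ceosal2.RData") # creates a data frame named `data`
m3.1 <- lm(lsalary ~ comten + lsales + lmktval, data)
m3.2 <- lm(lsalary ~ comten + lsales + lmktval + profits, data)
m3.3 <- lm(lsalary ~ comten + comtensq + lsales + lmktval + profits, data)
# correlation matrix of the variables used above
data %>% select(profits, lsalary, lsales, lmktval, comten) %>% cor()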
          profits      lsalary    lsales   lmktval       comten
profits 1.0000000  0.396694779 0.6063325 0.7768976  0.143737237
lsalary 0.3966948  1.000000000 0.5299602 0.4814910 -0.002314525
lsales  0.6063325  0.529960173 1.0000000 0.7359232  0.237819855
lmktval 0.7768976  0.481490997 0.7359232 1.0000000  0.101931416
comten  0.1437372 -0.002314525 0.2378199 0.1019314  1.000000000
stargazer(m3.1,m3.2,m3.3,type = "text")

===========================================================================================
                                              Dependent variable:                          
                    -----------------------------------------------------------------------
                                                    lsalary                                
                              (1)                     (2)                     (3)          
-------------------------------------------------------------------------------------------
comten                      -0.006*                 -0.006*                 -0.001         
                            (0.003)                 (0.003)                 (0.012)        
                                                                                           
comtensq                                                                    -0.0001        
                                                                           (0.0003)        
                                                                                           
lsales                     0.180***                0.180***                0.177***        
                            (0.041)                 (0.041)                 (0.041)        
                                                                                           
lmktval                     0.096*                   0.081                   0.081         
                            (0.050)                 (0.064)                 (0.064)        
                                                                                           
profits                                             0.0001                  0.0001         
                                                   (0.0002)                (0.0002)        
                                                                                           
Constant                   4.701***                4.814***                4.784***        
                            (0.256)                 (0.383)                 (0.389)        
                                                                                           
-------------------------------------------------------------------------------------------
Observations                  177                     177                     177          
R2                           0.313                   0.314                   0.314         
Adjusted R2                  0.301                   0.298                   0.294         
Residual Std. Error    0.507 (df = 173)        0.508 (df = 172)        0.509 (df = 171)    
F Statistic         26.274*** (df = 3; 173) 19.649*** (df = 4; 172) 15.688*** (df = 5; 171)
===========================================================================================
Note:                                                           *p<0.1; **p<0.05; ***p<0.01

C4. Use the data in attend.RData for this exercise.

  1. Obtain the minimum, maximum, and average values for the variables \(atndrte\), \(priGPA\), and \(ACT\).
  2. Estimate the model: \[atndrte = \beta_0 + \beta_1priGPA + \beta_2ACT + u,\] Interpret the intercept. Does it have a useful meaning?
  3. Discuss the estimated slope coefficients. Are there any surprises?
  4. What is the predicted \(atndrte\) if \(priGPA = 3.65\) and \(ACT = 20\)? What do you make of this result? Are there any students in the sample with these values of the explanatory variables?
  5. If Student A has \(priGPA = 3.1\) and \(ACT = 21\) and Student B has \(priGPA = 2.1\) and \(ACT = 26\), what is the predicted difference in their attendance rates?
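
The R functions you will likely want for this question are summary() (or min(), max(), and mean()) and predict(). The sketch below uses R's built-in mtcars data as a stand-in, so the variables (mpg, wt, hp) and the object names fit, new_obs, and a_and_b are for illustration only. Swapping in the attendance variables is up to you.

```r
# Summary statistics (min, max, mean, quartiles) for a few variables.
summary(mtcars[, c("mpg", "wt", "hp")])

# Fit a model with two explanatory variables.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Predicted value at chosen values of the explanatory variables.
new_obs <- data.frame(wt = 3, hp = 110)
predict(fit, newdata = new_obs)

# Predicted difference between two hypothetical observations, A and B.
a_and_b <- data.frame(wt = c(2.5, 3.5), hp = c(120, 100))
preds <- predict(fit, newdata = a_and_b)
preds[1] - preds[2]

# Check whether any observation in the sample has those exact values.
subset(mtcars, wt == 3 & hp == 110)
```

Because the model is linear, the predicted difference depends only on the slope coefficients and the differences in the explanatory variables; the intercept cancels.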

C7. Use the data in MEAP93 to answer this question.

  1. Estimate the model \[ math10 = \beta_0 + \beta_1 lexpend + \beta_2 lnchprg + u,\] and report the results, including the sample size and R-squared. Are the signs of the slope coefficients what you expected? Explain.
  2. What do you make of the intercept you estimated in part (i)? In particular, does it make sense to set the two explanatory variables to zero? [Hint: Recall that \(log(1) = 0\).]
  3. Now run the simple regression of math10 on \(log(expend)\), and compare the slope coefficient with the estimate obtained in part (i). Is the estimated spending effect now larger or smaller than in part (i)?
  4. Find the correlation between \(lexpend\) (\(log(expend)\)) and \(lnchprg\). Does its sign make sense to you?
  5. Use part (iv) to explain your findings in part (iii).

Demonstration: making regression tables with stargazer

Often you’ll want to report results for multiple models. We’re going to use a new package, stargazer, to format our regression results into a table. Having a table will make it easier to make direct comparisons of models. Remember: the goal isn’t just to do some statistical voodoo to find out what’s happening in the world; the goal is to communicate to some audience.

Stargazer also has the nice feature of creating easy tables of summary statistics. Running the code stargazer(data, median = TRUE, flip = TRUE, type = "html") generated this table:

Statistic    N        Mean   St. Dev.     Min    Median      Max
----------------------------------------------------------------
lnchprg    408      25.201     13.610   1.400    23.850   79.500
enroll     408   2,663.806  2,696.821     212   1,840.5   16,793
staff      408     100.642     13.300  65.900    99.000  166.600
expend     408   4,376.578    775.790   3,332     4,145    7,419
salary     408  31,774.510  5,038.304  19,764    31,266   52,812
benefits   408   6,463.429  1,456.338       0   6,304.5   11,618
droprate   408       5.066      5.485   0.000     3.700   61.900
gradrate   408      83.652     13.368  23.500    86.300  127.100
math10     408      24.107     10.494   1.900    23.400   66.700
sci11      408      49.183     12.525   7.200    49.100   85.700
totcomp    408  38,237.940  5,985.086  24,498  37,443.5   63,518
ltotcomp   408      10.540      0.151  10.106    10.531   11.059
lexpend    408       8.370      0.162   8.111     8.330    8.912
lenroll    408       7.510      0.867   5.357     7.518    9.729
lstaff     408       4.603      0.127   4.188     4.595    5.116
bensal     408       0.205      0.038   0.000     0.202    0.450
lsalary    408      10.354      0.154   9.892    10.350   10.874

(See below for more on working with html tables in a non-html format.)
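
To preview a summary table in the console before worrying about html, you can ask for text output instead. A minimal sketch, assuming stargazer is installed and using the built-in mtcars data in place of the homework data:

```r
library(stargazer)

# Summary statistics for every numeric column, printed as plain text.
# median = TRUE adds a Median column to the default statistics.
stargazer(mtcars, type = "text", median = TRUE)
```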

Using the package is simple. Just enter the name(s) of the model(s) you want a table for, and set the type of output you want.

You have three output options: text (like you saw above), html (which we’ll discuss more below), and \(\LaTeX\). The default is \(\LaTeX\), a typesetting program that is popular among academics. (It’s pronounced “Lay-Teck” or “Lah-Teck”.) If you take the output of stargazer() and compile it as \(\LaTeX\) code you’ll get very pretty output (pdf), but that requires figuring out \(\LaTeX\).

# install.packages("stargazer") # uncomment this line if not already installed
library(stargazer, quietly = TRUE) # load the package so we can use it.
fit1 <- lm(y1 ~ x1, anscombe) # make a linear model and name it `fit1`
fit2 <- lm(y2 ~ x2, anscombe) # etc.
fit3 <- lm(y3 ~ x3, anscombe)
fit4 <- lm(y4 ~ x4, anscombe)
stargazer(fit1, fit2, fit3, fit4) # create a table using the default output option

% Table created by stargazer v.5.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
% Date and time: Mon, Oct 24, 2016 - 11:06:02 AM
\begin{table}[!htbp] \centering 
  \caption{} 
  \label{} 
\begin{tabular}{@{\extracolsep{5pt}}lcccc} 
\\[-1.8ex]\hline 
\hline \\[-1.8ex] 
 & \multicolumn{4}{c}{\textit{Dependent variable:}} \\ 
\cline{2-5} 
\\[-1.8ex] & y1 & y2 & y3 & y4 \\ 
\\[-1.8ex] & (1) & (2) & (3) & (4)\\ 
\hline \\[-1.8ex] 
 x1 & 0.500$^{***}$ &  &  &  \\ 
  & (0.118) &  &  &  \\ 
  & & & & \\ 
 x2 &  & 0.500$^{***}$ &  &  \\ 
  &  & (0.118) &  &  \\ 
  & & & & \\ 
 x3 &  &  & 0.500$^{***}$ &  \\ 
  &  &  & (0.118) &  \\ 
  & & & & \\ 
 x4 &  &  &  & 0.500$^{***}$ \\ 
  &  &  &  & (0.118) \\ 
  & & & & \\ 
 Constant & 3.000$^{**}$ & 3.001$^{**}$ & 3.002$^{**}$ & 3.002$^{**}$ \\ 
  & (1.125) & (1.125) & (1.124) & (1.124) \\ 
  & & & & \\ 
\hline \\[-1.8ex] 
Observations & 11 & 11 & 11 & 11 \\ 
R$^{2}$ & 0.667 & 0.666 & 0.666 & 0.667 \\ 
Adjusted R$^{2}$ & 0.629 & 0.629 & 0.629 & 0.630 \\ 
Residual Std. Error (df = 9) & 1.237 & 1.237 & 1.236 & 1.236 \\ 
F Statistic (df = 1; 9) & 17.990$^{***}$ & 17.966$^{***}$ & 17.972$^{***}$ & 18.003$^{***}$ \\ 
\hline 
\hline \\[-1.8ex] 
\textit{Note:}  & \multicolumn{4}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\ 
\end{tabular} 
\end{table} 

See this pdf to see what that mess looks like when compiled with \(\LaTeX\).

Text output is useful for comparing multiple models within R, but since it’s in plain text, it doesn’t look as professional as we’d want for a presentation (that means you, ECO 490W students). That said, text format is acceptable for this assignment.

The text table works because it’s being printed in a monospace font: every character is exactly as wide as the others, so 10 spaces are exactly as wide as 10 dashes, 10 letters, etc. This is also true of the usual summary(fit) output in R. If you paste it into a Word document, make sure it’s in a monospace font like Courier New or it will be very difficult to read. (Hint: if you want a \(\checkmark +\) then be sure your output is easily readable.)

stargazer(fit1, fit2, fit3, fit4,type = "text")

====================================================================
                                       Dependent variable:          
                             ---------------------------------------
                                y1        y2        y3        y4    
                                (1)       (2)       (3)       (4)   
--------------------------------------------------------------------
x1                           0.500***                               
                              (0.118)                               
                                                                    
x2                                     0.500***                     
                                        (0.118)                     
                                                                    
x3                                               0.500***           
                                                  (0.118)           
                                                                    
x4                                                         0.500*** 
                                                            (0.118) 
                                                                    
Constant                      3.000**   3.001**   3.002**   3.002** 
                              (1.125)   (1.125)   (1.124)   (1.124) 
                                                                    
--------------------------------------------------------------------
Observations                    11        11        11        11    
R2                             0.667     0.666     0.666     0.667  
Adjusted R2                    0.629     0.629     0.629     0.630  
Residual Std. Error (df = 9)   1.237     1.237     1.236     1.236  
F Statistic (df = 1; 9)      17.990*** 17.966*** 17.972*** 18.003***
====================================================================
Note:                                    *p<0.1; **p<0.05; ***p<0.01

With a bit of legwork we can use the html output to create a nice looking table that we could put into a professional presentation (e.g. a paper for ECO 490W). We’ll take the html code from stargazer(), have our browser compile it, then cut and paste that version of the table into Excel (or LibreOffice) where we can manually adjust how the table looks.

stargazer(fit1, fit2, fit3, fit4, type = "html")

<table style="text-align:center"><tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="4"><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="4" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>y1</td><td>y2</td><td>y3</td><td>y4</td></tr>
<tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td><td>(4)</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">x1</td><td>0.500<sup>***</sup></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left"></td><td>(0.118)</td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">x2</td><td></td><td>0.500<sup>***</sup></td><td></td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td>(0.118)</td><td></td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">x3</td><td></td><td></td><td>0.500<sup>***</sup></td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td>(0.118)</td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">x4</td><td></td><td></td><td></td><td>0.500<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td>(0.118)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">Constant</td><td>3.000<sup>**</sup></td><td>3.001<sup>**</sup></td><td>3.002<sup>**</sup></td><td>3.002<sup>**</sup></td></tr>
<tr><td style="text-align:left"></td><td>(1.125)</td><td>(1.125)</td><td>(1.124)</td><td>(1.124)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>11</td><td>11</td><td>11</td><td>11</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.667</td><td>0.666</td><td>0.666</td><td>0.667</td></tr>
<tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.629</td><td>0.629</td><td>0.629</td><td>0.630</td></tr>
<tr><td style="text-align:left">Residual Std. Error (df = 9)</td><td>1.237</td><td>1.237</td><td>1.236</td><td>1.236</td></tr>
<tr><td style="text-align:left">F Statistic (df = 1; 9)</td><td>17.990<sup>***</sup></td><td>17.966<sup>***</sup></td><td>17.972<sup>***</sup></td><td>18.003<sup>***</sup></td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="4" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>

You can copy/paste that html code into an html interpreter like this one. Just delete everything on the left side of the page and paste in the html code. Hit the green “Run” button and you’ll have a decent looking table.

Click anywhere in the right side pane and type Ctrl + a (a for “select all”), then Ctrl + c (c for “copy”). Now open Excel, click anywhere and type Ctrl + v (v for “paste”… p was already taken by “print”… still, it makes more sense than most of the English language). You should have something that looks like this:

Now you can modify your table as you see fit, and finally copy/paste it into Word (or whatever word processor you’re using). Alternatively, you could just use the html code to present your table on a webpage. The html version looks like this:

                                       Dependent variable:
                                y1        y2        y3        y4
                                (1)       (2)       (3)       (4)
x1                           0.500***
                              (0.118)
x2                                     0.500***
                                        (0.118)
x3                                               0.500***
                                                  (0.118)
x4                                                         0.500***
                                                            (0.118)
Constant                      3.000**   3.001**   3.002**   3.002**
                              (1.125)   (1.125)   (1.124)   (1.124)
Observations                    11        11        11        11
R2                             0.667     0.666     0.666     0.667
Adjusted R2                    0.629     0.629     0.629     0.630
Residual Std. Error (df = 9)   1.237     1.237     1.236     1.236
F Statistic (df = 1; 9)      17.990*** 17.966*** 17.972*** 18.003***
Note:                                    *p<0.1; **p<0.05; ***p<0.01
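
As an alternative to pasting the html code into an online interpreter, stargazer() can write the table straight to a file through its out argument, and your own browser can open that file. A sketch, assuming stargazer is installed; the file name table.html is arbitrary:

```r
library(stargazer)

fit1 <- lm(y1 ~ x1, anscombe)
fit2 <- lm(y2 ~ x2, anscombe)

# Write the html table to a file in the working directory.
# (The html code is still printed to the console as well.)
stargazer(fit1, fit2, type = "html", out = "table.html")

# In an interactive session, open the file in your default browser.
if (interactive()) browseURL("table.html")
```

From the browser you can select the table and copy/paste it into Excel exactly as described above.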