Computer Assignment #2: Running your first regression!

Introduction

The purpose of this assignment is for you to get hands-on experience running and interpreting regressions. We will use questions from the textbook to provide some structure.

Format

Please turn in a hard copy of your script and a separate document with your answers to the questions. For questions that ask you to estimate a model, copy/paste a summary table of your model using stargazer (see below for more details).

Grading

This assignment will be graded on a \(\checkmark\)/\(\checkmark +\) basis. Completing the assignment gets you a \(\checkmark\) (worth 85%) and getting the hardest part right gets you a \(\checkmark +\) (worth 100%). Incomplete work is worth 0%.

The deadline is the beginning of class next Tuesday (Oct 24) or by email prior to the class. (Late submissions will be docked 20 points.) I would give you more time, but this assignment is fairly short.

The Assignment

Step 0: set up your workspace

Breathe deeply, brew some coffee, and create a new project with its own folder named “HW2” (or whatever you want to call it). Type Ctrl + Shift + n to open up a new script and save it (name it something creative like “script.R”… something you’ll remember).

Step 1: gather the data

From the Wooldridge textbook data find the datasets below:

CEOSAL2
ATTEND
MEAP93

Copy them into your working directory (the folder on your computer R is operating in). Each question below will use a different dataset.

load("ceosal2.RData")
# load("attend.RData") # no point loading this now... it will just overwrite 
# load("meap93.RData") # the other one.
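The overwriting mentioned in the comments above can be seen in a small sketch. It assumes, as the example code later in this handout suggests, that each textbook file stores its data frame under the same name, `data`. The two toy data frames and the temporary files here are hypothetical stand-ins, not the real textbook files.

```r
# Stand-ins for two textbook files: each one stores an object called `data`.
file_a <- tempfile(fileext = ".RData")
file_b <- tempfile(fileext = ".RData")
data <- data.frame(salary = c(100, 200, 300))
save(data, file = file_a)
data <- data.frame(atndrte = c(75, 90))
save(data, file = file_b)
rm(data)

load(file_a)                 # creates `data` with 3 rows
rows_first <- nrow(data)
load(file_b)                 # silently replaces `data` with the 2-row version
rows_second <- nrow(data)

# One way around it: load a file into its own environment, then copy the
# data frame out under a name of your choosing.
env_a <- new.env()
load(file_a, envir = env_a)
ceo <- env_a$data            # the first dataset, kept under a distinct name
```

Loading one file per question, as the handout does, works just as well; the separate-environment approach only matters if you want several datasets available at once.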

Side note: R works out of a specific folder largely for historical reasons rooted in the Unix operating system. If you have a Mac, you’re technically using a Unix operating system. Some R commands also have Unix roots, like ls(), which lists the contents of a directory (Unix) or the objects in your workspace (R), and rm(), which removes files (Unix) or objects in your workspace (R).
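A quick way to see these commands in action is the sketch below. The object names `x` and `y` are throwaway examples, and the folder printed by getwd() will differ on your machine.

```r
# Where is R currently working?
wd <- getwd()
print(wd)

# Create two throwaway objects, then list what is in the workspace.
x <- 1:10
y <- "hello"
before <- ls()
print(before)

# Remove one object and list the workspace again.
rm(x)
after <- ls()
print(after)
```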

Step 2: answer the questions

C3. The file ceosal2.RData contains data on 177 chief executive officers and can be used to examine the effects of firm performance on CEO salary.

(i) Estimate a model relating annual salary to firm sales and market value. Make the model of the constant elasticity (i.e. log-log) variety for both independent variables.
(ii) Add \(profits\) to the model from part (i). Why can this variable not be included in logarithmic form? Would you say that these firm performance variables explain most of the variation in CEO salaries?
(iii) Add the variable \(ceoten\) to the model in part (ii). What is the estimated percentage return for another year of CEO tenure, holding other factors fixed?
(iv) Find the sample correlation coefficient between the variables \(log(mktval)\) and \(profits\). Are these variables highly correlated? What does this say about the OLS estimators?

Example code:

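library(dplyr)     # provides %>% and select()
library(stargazer) # provides stargazer(), used for the table below
load("ceosal2.RData") # creates a data frame named `data`
m3.1 <- lm(lsalary ~ comten + lsales + lmktval, data)
m3.2 <- lm(lsalary ~ comten + lsales + lmktval + profits, data)
m3.3 <- lm(lsalary ~ comten + comtensq + lsales + lmktval + profits, data)
# correlation matrix of the variables used in the models above
data %>% select(profits, lsalary, lsales, lmktval, comten) %>% cor()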

          profits      lsalary    lsales   lmktval       comten
profits 1.0000000  0.396694779 0.6063325 0.7768976  0.143737237
lsalary 0.3966948  1.000000000 0.5299602 0.4814910 -0.002314525
lsales  0.6063325  0.529960173 1.0000000 0.7359232  0.237819855
lmktval 0.7768976  0.481490997 0.7359232 1.0000000  0.101931416
comten  0.1437372 -0.002314525 0.2378199 0.1019314  1.000000000

stargazer(m3.1,m3.2,m3.3,type = "text")


===========================================================================================
                                              Dependent variable:                          
                    -----------------------------------------------------------------------
                                                    lsalary                                
                              (1)                     (2)                     (3)          
-------------------------------------------------------------------------------------------
comten                      -0.006*                 -0.006*                 -0.001         
                            (0.003)                 (0.003)                 (0.012)        
                                                                                           
comtensq                                                                    -0.0001        
                                                                           (0.0003)        
                                                                                           
lsales                     0.180***                0.180***                0.177***        
                            (0.041)                 (0.041)                 (0.041)        
                                                                                           
lmktval                     0.096*                   0.081                   0.081         
                            (0.050)                 (0.064)                 (0.064)        
                                                                                           
profits                                             0.0001                  0.0001         
                                                   (0.0002)                (0.0002)        
                                                                                           
Constant                   4.701***                4.814***                4.784***        
                            (0.256)                 (0.383)                 (0.389)        
                                                                                           
-------------------------------------------------------------------------------------------
Observations                  177                     177                     177          
R2                           0.313                   0.314                   0.314         
Adjusted R2                  0.301                   0.298                   0.294         
Residual Std. Error    0.507 (df = 173)        0.508 (df = 172)        0.509 (df = 171)    
F Statistic         26.274*** (df = 3; 173) 19.649*** (df = 4; 172) 15.688*** (df = 5; 171)
===========================================================================================
Note:                                                           *p<0.1; **p<0.05; ***p<0.01
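If you would rather pull numbers out of a fitted model than read them off the table, something like the following sketch may help. It uses simulated data, not the CEOSAL2 file: the data frame `toy`, its variables, and the coefficients used to generate them are all made up for illustration.

```r
set.seed(42)                                  # reproducible toy data
n <- 200
tenure  <- runif(n, 0, 30)                    # hypothetical regressor in levels
lsales  <- rnorm(n, mean = 7, sd = 1)         # hypothetical regressor in logs
lsalary <- 4.5 + 0.02 * tenure + 0.2 * lsales + rnorm(n, sd = 0.3)
toy <- data.frame(lsalary, tenure, lsales)

fit <- lm(lsalary ~ tenure + lsales, data = toy)

b  <- coef(fit)                               # named vector of estimates
r2 <- summary(fit)$r.squared                  # share of variation explained

# Log-level: one more unit of tenure changes salary by about 100 * b percent.
pct_approx <- 100 * b["tenure"]
pct_exact  <- 100 * (exp(b["tenure"]) - 1)    # exact percentage change

# Log-log: the coefficient is itself an elasticity (percent per percent).
elasticity <- b["lsales"]

print(round(c(pct_approx, pct_exact, elasticity, r2 = r2), 3))
```

The approximate and exact percentage changes are close when the coefficient is small, which is why the shortcut of multiplying by 100 is usually good enough.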

C4. Use the data in attend.RData for this exercise.

(i) Obtain the minimum, maximum, and average values for the variables \(atndrte\), \(priGPA\), and \(ACT\).
(ii) Estimate the model: \[atndrte = \beta_0 + \beta_1 priGPA + \beta_2 ACT + u.\] Interpret the intercept. Does it have a useful meaning?
(iii) Discuss the estimated slope coefficients. Are there any surprises?
(iv) What is the predicted \(atndrte\) if \(priGPA = 3.65\) and \(ACT = 20\)? What do you make of this result? Are there any students in the sample with these values of the explanatory variables?
(v) If Student A has \(priGPA = 3.1\) and \(ACT = 21\) and Student B has \(priGPA = 2.1\) and \(ACT = 26\), what is the predicted difference in their attendance rates?
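The mechanics behind these parts (summary statistics, predictions at chosen values, and looking for matching observations) can be sketched with R's built-in mtcars data as a stand-in. None of the variables or numbers below come from the ATTEND data; the two "cars" A and B are hypothetical.

```r
# Stand-in data: the built-in mtcars data frame (not the ATTEND data).
vars <- c("mpg", "wt", "hp")

# Minimum, maximum, and mean of each variable, one column per variable.
stats <- sapply(mtcars[vars],
                function(v) c(min = min(v), max = max(v), mean = mean(v)))
print(stats)

fit <- lm(mpg ~ wt + hp, data = mtcars)

# Predicted values for two hypothetical cars, A (row 1) and B (row 2).
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(110, 150))
pred <- predict(fit, newdata = new_cars)

# The predicted difference depends only on the slopes: the intercept cancels.
diff_pred   <- unname(pred[1] - pred[2])
diff_slopes <- unname(coef(fit)["wt"] * (2.5 - 3.5) +
                      coef(fit)["hp"] * (110 - 150))

# Is anyone in the sample close to a given combination of values?
nearby <- subset(mtcars, wt > 2.4 & wt < 2.6 & hp > 100 & hp < 120)
print(nrow(nearby))
```

Checking for nearby observations is worth doing before trusting any prediction, since a fitted line will happily produce a number for combinations that never occur in the data.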

C7. Use the data in MEAP93 to answer this question.

(i) Estimate the model \[ math10 = \beta_0 + \beta_1 lexpend + \beta_2 lnchprg + u,\] and report the results, including the sample size and R-squared. Are the signs of the slope coefficients what you expected? Explain.
(ii) What do you make of the intercept you estimated in part (i)? In particular, does it make sense to set the two explanatory variables to zero? [Hint: Recall that \(log(1) = 0\).]
(iii) Now run the simple regression of \(math10\) on \(log(expend)\), and compare the slope coefficient with the estimate obtained in part (i). Is the estimated spending effect now larger or smaller than in part (i)?
(iv) Find the correlation between \(lexpend\) (\(log(expend)\)) and \(lnchprg\). Does its sign make sense to you?
(v) Use part (iv) to explain your findings in part (iii).
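The link between parts (iii), (iv), and (v) rests on a standard textbook identity: the slope from the simple regression equals the slope from the multiple regression plus the coefficient on the omitted variable times the slope from regressing the omitted variable on the included one. The sketch below checks that identity on simulated data. The variables `x1`, `x2`, and `y` and the numbers used to generate them are invented, so the direction of the gap here says nothing about the MEAP93 results.

```r
set.seed(1)
n  <- 500
x2 <- rnorm(n)                        # hypothetical control variable
x1 <- 1 - 0.5 * x2 + rnorm(n)         # regressor, correlated with x2 by design
y  <- 2 + 3 * x1 - 4 * x2 + rnorm(n)

long  <- lm(y ~ x1 + x2)              # multiple regression
short <- lm(y ~ x1)                   # simple regression, x2 left out
aux   <- lm(x2 ~ x1)                  # how x2 moves with x1 in the sample

b1_long  <- unname(coef(long)["x1"])
b2_long  <- unname(coef(long)["x2"])
b1_short <- unname(coef(short)["x1"])
delta1   <- unname(coef(aux)["x1"])

# Identity: short slope = long slope + (coefficient on x2) * (slope of x2 on x1)
rebuilt <- b1_long + b2_long * delta1
print(c(short = b1_short, rebuilt = rebuilt))

# The sample correlation has the same sign as the auxiliary slope.
r12 <- cor(x1, x2)
print(r12)
```

The two printed slopes agree up to rounding error, so the sign of the gap between the simple and multiple regression slopes comes down to the sign of the omitted variable's coefficient and the sign of the correlation.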

Demonstration: making regression tables with `stargazer`

Often you’ll want to report results for multiple models. We’re going to use a new package, stargazer, to format our regression results into a table. Having a table makes it easier to compare models directly. Remember: the goal isn’t just to do some statistical voodoo to find out what’s happening in the world; the goal is to communicate to some audience.

Stargazer also has the nice feature of creating easy tables of summary statistics. Running the code stargazer(data, median = TRUE, flip = TRUE, type = "html") generated this table:


Statistic	N	Mean	St. Dev.	Min	Median	Max

lnchprg	408	25.201	13.610	1.400	23.850	79.500
enroll	408	2,663.806	2,696.821	212	1,840.5	16,793
staff	408	100.642	13.300	65.900	99.000	166.600
expend	408	4,376.578	775.790	3,332	4,145	7,419
salary	408	31,774.510	5,038.304	19,764	31,266	52,812
benefits	408	6,463.429	1,456.338	0	6,304.5	11,618
droprate	408	5.066	5.485	0.000	3.700	61.900
gradrate	408	83.652	13.368	23.500	86.300	127.100
math10	408	24.107	10.494	1.900	23.400	66.700
sci11	408	49.183	12.525	7.200	49.100	85.700
totcomp	408	38,237.940	5,985.086	24,498	37,443.5	63,518
ltotcomp	408	10.540	0.151	10.106	10.531	11.059
lexpend	408	8.370	0.162	8.111	8.330	8.912
lenroll	408	7.510	0.867	5.357	7.518	9.729
lstaff	408	4.603	0.127	4.188	4.595	5.116
bensal	408	0.205	0.038	0.000	0.202	0.450
lsalary	408	10.354	0.154	9.892	10.350	10.874

(See below for more on working with html tables in non-html formats.)

Using the package is simple. Just enter the name(s) of the model(s) you want a table for, and set the type of output you want.

You have three output options: text (like you saw above), html (which we’ll discuss more below), and \(\LaTeX\). The default is \(\LaTeX\), a typesetting system that is popular among academics (it’s pronounced “Lay-Teck” or “Lah-Teck”). If you take the output of stargazer() and compile it as \(\LaTeX\) code you’ll get very pretty pdf output, but that requires figuring out \(\LaTeX\).

# install.packages("stargazer") # uncomment this line if not already installed
library(stargazer, quietly = TRUE) # load the package so we can use it.
fit1 <- lm(y1 ~ x1, anscombe) # make a linear model and name it `fit1`
fit2 <- lm(y2 ~ x2, anscombe) # etc.
fit3 <- lm(y3 ~ x3, anscombe)
fit4 <- lm(y4 ~ x4, anscombe)
stargazer(fit1, fit2, fit3, fit4) # create a table using the default output option


% Table created by stargazer v.5.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
% Date and time: Mon, Oct 24, 2016 - 11:06:02 AM
\begin{table}[!htbp] \centering 
  \caption{} 
  \label{} 
\begin{tabular}{@{\extracolsep{5pt}}lcccc} 
\\[-1.8ex]\hline 
\hline \\[-1.8ex] 
 & \multicolumn{4}{c}{\textit{Dependent variable:}} \\ 
\cline{2-5} 
\\[-1.8ex] & y1 & y2 & y3 & y4 \\ 
\\[-1.8ex] & (1) & (2) & (3) & (4)\\ 
\hline \\[-1.8ex] 
 x1 & 0.500$^{***}$ &  &  &  \\ 
  & (0.118) &  &  &  \\ 
  & & & & \\ 
 x2 &  & 0.500$^{***}$ &  &  \\ 
  &  & (0.118) &  &  \\ 
  & & & & \\ 
 x3 &  &  & 0.500$^{***}$ &  \\ 
  &  &  & (0.118) &  \\ 
  & & & & \\ 
 x4 &  &  &  & 0.500$^{***}$ \\ 
  &  &  &  & (0.118) \\ 
  & & & & \\ 
 Constant & 3.000$^{**}$ & 3.001$^{**}$ & 3.002$^{**}$ & 3.002$^{**}$ \\ 
  & (1.125) & (1.125) & (1.124) & (1.124) \\ 
  & & & & \\ 
\hline \\[-1.8ex] 
Observations & 11 & 11 & 11 & 11 \\ 
R$^{2}$ & 0.667 & 0.666 & 0.666 & 0.667 \\ 
Adjusted R$^{2}$ & 0.629 & 0.629 & 0.629 & 0.630 \\ 
Residual Std. Error (df = 9) & 1.237 & 1.237 & 1.236 & 1.236 \\ 
F Statistic (df = 1; 9) & 17.990$^{***}$ & 17.966$^{***}$ & 17.972$^{***}$ & 18.003$^{***}$ \\ 
\hline 
\hline \\[-1.8ex] 
\textit{Note:}  & \multicolumn{4}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\ 
\end{tabular} 
\end{table}

See this pdf to see what that mess looks like when compiled with \(\LaTeX\).

Text output is useful for comparing multiple models within R, but since it’s in plain text, it doesn’t look as professional as we’d want for a presentation (that means you, ECO 490W students). That said, text format is acceptable for this assignment.

The text table works because it’s being printed in a monospace font: every character is exactly as wide as the others, so 10 spaces are exactly as wide as 10 dashes, 10 letters, etc. This is also true of the usual summary(fit) output in R. If you paste it into a Word document, make sure it’s in a monospace font like Courier New or it will be very difficult to read. (Hint: if you want a \(\checkmark +\) then be sure your output is easily readable.)

stargazer(fit1, fit2, fit3, fit4,type = "text")


====================================================================
                                       Dependent variable:          
                             ---------------------------------------
                                y1        y2        y3        y4    
                                (1)       (2)       (3)       (4)   
--------------------------------------------------------------------
x1                           0.500***                               
                              (0.118)                               
                                                                    
x2                                     0.500***                     
                                        (0.118)                     
                                                                    
x3                                               0.500***           
                                                  (0.118)           
                                                                    
x4                                                         0.500*** 
                                                            (0.118) 
                                                                    
Constant                      3.000**   3.001**   3.002**   3.002** 
                              (1.125)   (1.125)   (1.124)   (1.124) 
                                                                    
--------------------------------------------------------------------
Observations                    11        11        11        11    
R2                             0.667     0.666     0.666     0.667  
Adjusted R2                    0.629     0.629     0.629     0.630  
Residual Std. Error (df = 9)   1.237     1.237     1.236     1.236  
F Statistic (df = 1; 9)      17.990*** 17.966*** 17.972*** 18.003***
====================================================================
Note:                                    *p<0.1; **p<0.05; ***p<0.01

With a bit of legwork we can use the html output to create a nice looking table that we could put into a professional presentation (e.g. a paper for ECO 490W). We’ll take the html code from stargazer(), have our browser compile it, then cut and paste that version of the table into Excel (or LibreOffice) where we can manually adjust how the table looks.

stargazer(fit1, fit2, fit3, fit4, type = "html")


<table style="text-align:center"><tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="4"><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="4" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>y1</td><td>y2</td><td>y3</td><td>y4</td></tr>
<tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td><td>(4)</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">x1</td><td>0.500<sup>***</sup></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left"></td><td>(0.118)</td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">x2</td><td></td><td>0.500<sup>***</sup></td><td></td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td>(0.118)</td><td></td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">x3</td><td></td><td></td><td>0.500<sup>***</sup></td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td>(0.118)</td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">x4</td><td></td><td></td><td></td><td>0.500<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td>(0.118)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">Constant</td><td>3.000<sup>**</sup></td><td>3.001<sup>**</sup></td><td>3.002<sup>**</sup></td><td>3.002<sup>**</sup></td></tr>
<tr><td style="text-align:left"></td><td>(1.125)</td><td>(1.125)</td><td>(1.124)</td><td>(1.124)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>11</td><td>11</td><td>11</td><td>11</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.667</td><td>0.666</td><td>0.666</td><td>0.667</td></tr>
<tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.629</td><td>0.629</td><td>0.629</td><td>0.630</td></tr>
<tr><td style="text-align:left">Residual Std. Error (df = 9)</td><td>1.237</td><td>1.237</td><td>1.236</td><td>1.236</td></tr>
<tr><td style="text-align:left">F Statistic (df = 1; 9)</td><td>17.990<sup>***</sup></td><td>17.966<sup>***</sup></td><td>17.972<sup>***</sup></td><td>18.003<sup>***</sup></td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="4" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>

You can copy/paste that html code into an html interpreter like this one. Just delete everything on the left side of the page and paste in the html code. Hit the green “Run” button and you’ll have a decent looking table.

Click anywhere in the right side pane and type Ctrl + a (a for “select all”), then Ctrl + c (c for “copy”). Now open Excel, click anywhere and type Ctrl + v (v for “paste”… p was already taken by “print”… still, it makes more sense than most of the English language). You should have something that looks like this:

Now you can modify your table as you see fit, and finally copy/paste it into Word (or whatever word processor you’re using). Alternatively, you could just use the html code to present your table on a webpage. The html version looks like this:


	Dependent variable:

	y1	y2	y3	y4
	(1)	(2)	(3)	(4)

x1	0.500^***
	(0.118)

x2		0.500^***
		(0.118)

x3			0.500^***
			(0.118)

x4				0.500^***
				(0.118)

Constant	3.000^**	3.001^**	3.002^**	3.002^**
	(1.125)	(1.125)	(1.124)	(1.124)


Observations	11	11	11	11
R²	0.667	0.666	0.666	0.667
Adjusted R²	0.629	0.629	0.629	0.630
Residual Std. Error (df = 9)	1.237	1.237	1.236	1.236
F Statistic (df = 1; 9)	17.990^***	17.966^***	17.972^***	18.003^***

Note:	^*p<0.1; ^**p<0.05; ^***p<0.01
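As a possible alternative to pasting the html code into an online interpreter, stargazer can write the table straight to a file through its `out` argument, and you can then open that file in your own browser. This is only a sketch: the file location is a hypothetical choice, and the guard simply skips the example if stargazer is not installed.

```r
# Hypothetical output location; any path ending in .html should work.
html_file <- file.path(tempdir(), "table.html")

if (requireNamespace("stargazer", quietly = TRUE)) {
  suppressPackageStartupMessages(library(stargazer))

  fit1 <- lm(y1 ~ x1, anscombe)
  fit2 <- lm(y2 ~ x2, anscombe)

  # `out` saves the table to disk; the copy that stargazer also prints to
  # the console is captured and discarded here.
  invisible(capture.output(
    stargazer(fit1, fit2, type = "html", out = html_file)
  ))

  # browseURL(html_file)  # uncomment to open the saved table in your browser
}
```

From the browser you can select and copy the table into Excel exactly as described above.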

Rick Weber

October 17, 2016
