The purpose of this assignment is for you to get hands-on experience running and interpreting regressions. We will use questions from the textbook to provide some structure.
Please turn in a hard copy of your script and a separate document with your answers to the questions. For questions that ask you to estimate a model, copy/paste a summary table of your model using stargazer
(see below for more details).
This assignment will be graded on a \(\checkmark\)/\(\checkmark +\) basis. Completing the assignment gets you a \(\checkmark\) (worth 85%) and getting the hardest part right gets you a \(\checkmark +\) (worth 100%). Incomplete work is worth 0%.
The deadline is the beginning of class next Tuesday (Oct 24) or by email prior to the class. (Late submissions will be docked 20 points.) I would give you more time, but this assignment is fairly short.
Breathe deeply, brew some coffee, and create a new project with its own folder named “HW2” (or whatever you want to call it). Type Ctrl + Shift + n
to open up a new script and save it (name it something creative like “script.R”… something you’ll remember).
From the Wooldridge textbook data, find these datasets: ceosal2.RData, attend.RData, and meap93.RData.
Copy them into your working directory (the folder on your computer that R is operating in). Each question below will use a different dataset.
load("ceosal2.RData")
# load("attend.RData") # no point loading this now... it will just overwrite
# load("meap93.RData") # the other one.
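The comments above suggest that each file restores its data under the same object name, so a second load() replaces the first. One way around that, sketched here with two small stand-in files written to a temporary folder (not the real Wooldridge files), is to load each file into its own environment. The names `ceo_env`, `att_env`, the toy file names, and the toy data frames are all hypothetical.

```r
# Stand-in files: each stores a data frame under the same name, `data`.
tmp <- tempdir()
data <- data.frame(salary = c(100, 200))
save(data, file = file.path(tmp, "toy_ceosal2.RData"))
data <- data.frame(attend = c(30, 28, 25))
save(data, file = file.path(tmp, "toy_attend.RData"))
rm(data)

# Load each file into its own environment so nothing gets overwritten.
# load() invisibly returns the names of the objects it restored.
ceo_env <- new.env()
att_env <- new.env()
loaded <- load(file.path(tmp, "toy_ceosal2.RData"), envir = ceo_env)
load(file.path(tmp, "toy_attend.RData"), envir = att_env)

print(loaded)             # name(s) of the object(s) stored in the first file
ceosal2 <- ceo_env$data   # give each data frame its own name
attend  <- att_env$data
print(nrow(ceosal2))
print(nrow(attend))
```

For this assignment, loading one file per question (as in the script above) is simpler; the environment trick only matters if you want several datasets available at once.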
Side note: R operates in a specific folder mostly because of how the Unix operating system worked long ago when R was invented. If you have a Mac, you’re technically using a Unix operating system. Some R commands also have Unix roots like ls()
which lists the contents of a Unix directory or your R workspace and rm()
which removes files (Unix) or objects in your workspace (R).
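A small sketch of ls() and rm() in action. It works inside a throwaway environment (the hypothetical `ws`) so that running it does not touch your real workspace; at the console you would normally just type ls() and rm(scratch).

```r
# A scratch environment standing in for your workspace.
ws <- new.env()
assign("fit", "a model would go here", envir = ws)
assign("scratch", 1:10, envir = ws)

before <- ls(ws)                  # list the objects currently defined
rm(list = "scratch", envir = ws)  # remove one object by name
after <- ls(ws)                   # list again to confirm it is gone

print(before)
print(after)
```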
C3. The file ceosal2.RData
contains data on 177 chief executive officers and can be used to examine the effects of firm performance on CEO salary.
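library(dplyr)      # provides %>% and select()
library(stargazer)  # provides stargazer(), used for the table below
load("ceosal2.RData")  # loads the data frame referred to below as `data`
m3.1 <- lm(lsalary ~ comten + lsales + lmktval, data)
m3.2 <- lm(lsalary ~ comten + lsales + lmktval + profits, data)
m3.3 <- lm(lsalary ~ comten + comtensq + lsales + lmktval + profits, data)
# pairwise correlations among the variables used in the models
data %>% select(profits, lsalary, lsales, lmktval, comten) %>% cor()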
          profits      lsalary    lsales   lmktval       comten
profits 1.0000000  0.396694779 0.6063325 0.7768976  0.143737237
lsalary 0.3966948  1.000000000 0.5299602 0.4814910 -0.002314525
lsales  0.6063325  0.529960173 1.0000000 0.7359232  0.237819855
lmktval 0.7768976  0.481490997 0.7359232 1.0000000  0.101931416
comten  0.1437372 -0.002314525 0.2378199 0.1019314  1.000000000
stargazer(m3.1, m3.2, m3.3, type = "text")
===========================================================================================
                                              Dependent variable:
                    -----------------------------------------------------------------------
                                                    lsalary
                              (1)                     (2)                     (3)
-------------------------------------------------------------------------------------------
comten                      -0.006*                 -0.006*                 -0.001
                            (0.003)                 (0.003)                 (0.012)
comtensq                                                                    -0.0001
                                                                           (0.0003)
lsales                     0.180***                0.180***                0.177***
                            (0.041)                 (0.041)                 (0.041)
lmktval                     0.096*                   0.081                   0.081
                            (0.050)                 (0.064)                 (0.064)
profits                                             0.0001                  0.0001
                                                   (0.0002)                (0.0002)
Constant                   4.701***                4.814***                4.784***
                            (0.256)                 (0.383)                 (0.389)
-------------------------------------------------------------------------------------------
Observations                  177                     177                     177
R2                           0.313                   0.314                   0.314
Adjusted R2                  0.301                   0.298                   0.294
Residual Std. Error    0.507 (df = 173)        0.508 (df = 172)        0.509 (df = 171)
F Statistic         26.274*** (df = 3; 173) 19.649*** (df = 4; 172) 15.688*** (df = 5; 171)
===========================================================================================
Note:                                                           *p<0.1; **p<0.05; ***p<0.01
C4. Use the data in attend.RData
for this exercise.
C7. Use the data in MEAP93 to answer this question.
stargazer
Often you’ll want to report results for multiple models. We’re going to use a new package: stargazer
to format our regression results into a table. Having a table will make it easier to make direct comparisons of models. Remember: the goal isn’t just to do some statistical voodoo to find out what’s happening in the world; the goal is to communicate to some audience.
Stargazer also has the nice feature of creating easy tables of summary statistics. Running the code stargazer(data, median = TRUE, flip = TRUE, type = "html")
generated this table:
Statistic | N | Mean | St. Dev. | Min | Median | Max |
lnchprg | 408 | 25.201 | 13.610 | 1.400 | 23.850 | 79.500 |
enroll | 408 | 2,663.806 | 2,696.821 | 212 | 1,840.5 | 16,793 |
staff | 408 | 100.642 | 13.300 | 65.900 | 99.000 | 166.600 |
expend | 408 | 4,376.578 | 775.790 | 3,332 | 4,145 | 7,419 |
salary | 408 | 31,774.510 | 5,038.304 | 19,764 | 31,266 | 52,812 |
benefits | 408 | 6,463.429 | 1,456.338 | 0 | 6,304.5 | 11,618 |
droprate | 408 | 5.066 | 5.485 | 0.000 | 3.700 | 61.900 |
gradrate | 408 | 83.652 | 13.368 | 23.500 | 86.300 | 127.100 |
math10 | 408 | 24.107 | 10.494 | 1.900 | 23.400 | 66.700 |
sci11 | 408 | 49.183 | 12.525 | 7.200 | 49.100 | 85.700 |
totcomp | 408 | 38,237.940 | 5,985.086 | 24,498 | 37,443.5 | 63,518 |
ltotcomp | 408 | 10.540 | 0.151 | 10.106 | 10.531 | 11.059 |
lexpend | 408 | 8.370 | 0.162 | 8.111 | 8.330 | 8.912 |
lenroll | 408 | 7.510 | 0.867 | 5.357 | 7.518 | 9.729 |
lstaff | 408 | 4.603 | 0.127 | 4.188 | 4.595 | 5.116 |
bensal | 408 | 0.205 | 0.038 | 0.000 | 0.202 | 0.450 |
lsalary | 408 | 10.354 | 0.154 | 9.892 | 10.350 | 10.874 |
(See below for more on working with html tables in a non-html format.)
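To make the columns of that summary table concrete, here is a minimal sketch that computes the same statistics by hand and then, if the package is installed, asks stargazer for its version. It uses three columns of R's built-in mtcars data as a stand-in for the homework data; `vars` and `by_hand` are hypothetical names.

```r
# Built-in data used as a stand-in for the homework data.
vars <- mtcars[, c("mpg", "hp", "wt")]

# The statistics the summary table reports, computed by hand:
# one row per variable, one column per statistic.
by_hand <- t(sapply(vars, function(x) {
  c(N = length(x), Mean = mean(x), St.Dev = sd(x),
    Min = min(x), Median = median(x), Max = max(x))
}))
print(round(by_hand, 3))

# The stargazer version (skipped if the package is not installed).
if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(vars, type = "text", median = TRUE)
}
```

The median column only appears because of median = TRUE; stargazer leaves it out by default.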
Using the package is simple. Just enter the name(s) of the model(s) you want a table for, and set the type of output you want.
You have three output options: text (like you saw above), html (which we’ll discuss more below), and \(\LaTeX\). The default is \(\LaTeX\), a typesetting program that is popular among academics (it’s pronounced “Lay-Teck” or “Lah-Teck”). If you take the output of stargazer()
and compile it as \(\LaTeX\) code you’ll get very pretty output (pdf), but that requires figuring out \(\LaTeX\).
# install.packages("stargazer") # uncomment this line if not already installed
library(stargazer, quietly = TRUE) # load the package so we can use it.
fit1 <- lm(y1 ~ x1, anscombe) # make a linear model and name it `fit1`
fit2 <- lm(y2 ~ x2, anscombe) # etc.
fit3 <- lm(y3 ~ x3, anscombe)
fit4 <- lm(y4 ~ x4, anscombe)
stargazer(fit1, fit2, fit3, fit4) # create a table using the default output option
% Table created by stargazer v.5.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
% Date and time: Mon, Oct 24, 2016 - 11:06:02 AM
\begin{table}[!htbp] \centering
\caption{}
\label{}
\begin{tabular}{@{\extracolsep{5pt}}lcccc}
\\[-1.8ex]\hline
\hline \\[-1.8ex]
& \multicolumn{4}{c}{\textit{Dependent variable:}} \\
\cline{2-5}
\\[-1.8ex] & y1 & y2 & y3 & y4 \\
\\[-1.8ex] & (1) & (2) & (3) & (4)\\
\hline \\[-1.8ex]
x1 & 0.500$^{***}$ & & & \\
& (0.118) & & & \\
& & & & \\
x2 & & 0.500$^{***}$ & & \\
& & (0.118) & & \\
& & & & \\
x3 & & & 0.500$^{***}$ & \\
& & & (0.118) & \\
& & & & \\
x4 & & & & 0.500$^{***}$ \\
& & & & (0.118) \\
& & & & \\
Constant & 3.000$^{**}$ & 3.001$^{**}$ & 3.002$^{**}$ & 3.002$^{**}$ \\
& (1.125) & (1.125) & (1.124) & (1.124) \\
& & & & \\
\hline \\[-1.8ex]
Observations & 11 & 11 & 11 & 11 \\
R$^{2}$ & 0.667 & 0.666 & 0.666 & 0.667 \\
Adjusted R$^{2}$ & 0.629 & 0.629 & 0.629 & 0.630 \\
Residual Std. Error (df = 9) & 1.237 & 1.237 & 1.236 & 1.236 \\
F Statistic (df = 1; 9) & 17.990$^{***}$ & 17.966$^{***}$ & 17.972$^{***}$ & 18.003$^{***}$ \\
\hline
\hline \\[-1.8ex]
\textit{Note:} & \multicolumn{4}{r}{$^{*}$p$<$0.1; $^{**}$p$<$0.05; $^{***}$p$<$0.01} \\
\end{tabular}
\end{table}
See this pdf to see what that mess looks like when compiled with \(\LaTeX\).
Text output is useful for comparing multiple models within R, but since it’s in plain text, it doesn’t look as professional as we’d want for a presentation (that means you, ECO 490W students). That said, text format is acceptable for this assignment.
The text table works because it’s being printed in a monospace font: every character is exactly as wide as the others, so 10 spaces are exactly as wide as 10 dashes, 10 letters, etc. This is also true of the usual summary(fit)
output in R. If you paste it into a Word document, make sure it’s in a monospace font like Courier New or it will be very difficult to read. (Hint: if you want a \(\checkmark +\) then be sure your output is easily readable.)
stargazer(fit1, fit2, fit3, fit4, type = "text")
====================================================================
                                       Dependent variable:
                             ---------------------------------------
                                y1        y2        y3        y4
                                (1)       (2)       (3)       (4)
--------------------------------------------------------------------
x1                           0.500***
                              (0.118)
x2                                     0.500***
                                        (0.118)
x3                                               0.500***
                                                  (0.118)
x4                                                         0.500***
                                                            (0.118)
Constant                      3.000**   3.001**   3.002**   3.002**
                              (1.125)   (1.125)   (1.124)   (1.124)
--------------------------------------------------------------------
Observations                    11        11        11        11
R2                             0.667     0.666     0.666     0.667
Adjusted R2                    0.629     0.629     0.629     0.630
Residual Std. Error (df = 9)   1.237     1.237     1.236     1.236
F Statistic (df = 1; 9)      17.990*** 17.966*** 17.972*** 18.003***
====================================================================
Note:                                    *p<0.1; **p<0.05; ***p<0.01
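The monospace point can be checked directly. The sketch below builds two toy table rows by padding every cell to a fixed number of characters; the rows and the widths (10 characters per cell) are made up for illustration and are not how stargazer itself lays out its tables.

```r
# Build two table rows by padding each cell to a fixed width:
# "%-10s" left-aligns in 10 characters, "%10s" right-aligns in 10.
rows <- c(
  sprintf("%-10s%10s%10s", "x1", "0.500***", ""),
  sprintf("%-10s%10s%10s", "Constant", "3.000**", "3.001**")
)
cat(rows, sep = "\n")

# Every row has the same number of characters. The columns only line up
# on screen or paper if every character is also drawn at the same width.
widths <- nchar(rows)
print(widths)
```

In a proportional font the spaces are narrower than the digits, so rows with equal character counts end up with different printed widths and the columns drift apart.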
With a bit of legwork we can use the html output to create a nice-looking table that we could put into a professional presentation (e.g. a paper for ECO 490W). We’ll take the html code from stargazer(), have our browser render it, then cut and paste that version of the table into Excel (or LibreOffice) where we can manually adjust how the table looks.
stargazer(fit1, fit2, fit3, fit4, type = "html")
<table style="text-align:center"><tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="4"><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="4" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>y1</td><td>y2</td><td>y3</td><td>y4</td></tr>
<tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td><td>(4)</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">x1</td><td>0.500<sup>***</sup></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left"></td><td>(0.118)</td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">x2</td><td></td><td>0.500<sup>***</sup></td><td></td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td>(0.118)</td><td></td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">x3</td><td></td><td></td><td>0.500<sup>***</sup></td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td>(0.118)</td><td></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">x4</td><td></td><td></td><td></td><td>0.500<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td>(0.118)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td style="text-align:left">Constant</td><td>3.000<sup>**</sup></td><td>3.001<sup>**</sup></td><td>3.002<sup>**</sup></td><td>3.002<sup>**</sup></td></tr>
<tr><td style="text-align:left"></td><td>(1.125)</td><td>(1.125)</td><td>(1.124)</td><td>(1.124)</td></tr>
<tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>11</td><td>11</td><td>11</td><td>11</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.667</td><td>0.666</td><td>0.666</td><td>0.667</td></tr>
<tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.629</td><td>0.629</td><td>0.629</td><td>0.630</td></tr>
<tr><td style="text-align:left">Residual Std. Error (df = 9)</td><td>1.237</td><td>1.237</td><td>1.236</td><td>1.236</td></tr>
<tr><td style="text-align:left">F Statistic (df = 1; 9)</td><td>17.990<sup>***</sup></td><td>17.966<sup>***</sup></td><td>17.972<sup>***</sup></td><td>18.003<sup>***</sup></td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="4" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
You can copy/paste that html code into an html interpreter like this one. Just delete everything on the left side of the page and paste in the html code. Hit the green “Run” button and you’ll have a decent looking table.
Click anywhere in the right side pane and type Ctrl + a
(a for “select all”), then Ctrl + c
(c for “copy”). Now open Excel, click anywhere and type Ctrl + v
(v for “paste”… p was already taken by “print”… still, it makes more sense than most of the English language). You should have something that looks like this:
Now you can modify your table as you see fit, and finally copy/paste it into Word (or whatever word processor you’re using). Alternatively, you could just use the html code to present your table on a webpage. The html version looks like this:
 | Dependent variable: |
 | y1 | y2 | y3 | y4 |
 | (1) | (2) | (3) | (4) |
x1 | 0.500*** | | | |
 | (0.118) | | | |
x2 | | 0.500*** | | |
 | | (0.118) | | |
x3 | | | 0.500*** | |
 | | | (0.118) | |
x4 | | | | 0.500*** |
 | | | | (0.118) |
Constant | 3.000** | 3.001** | 3.002** | 3.002** |
 | (1.125) | (1.125) | (1.124) | (1.124) |
Observations | 11 | 11 | 11 | 11 |
R2 | 0.667 | 0.666 | 0.666 | 0.667 |
Adjusted R2 | 0.629 | 0.629 | 0.629 | 0.630 |
Residual Std. Error (df = 9) | 1.237 | 1.237 | 1.236 | 1.236 |
F Statistic (df = 1; 9) | 17.990*** | 17.966*** | 17.972*** | 18.003*** |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
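If you would rather skip the online interpreter, one alternative is to have stargazer write the html to a file and open that file in your own browser. This is a sketch under a few assumptions: the file name `models.html` and its location in a temporary folder are hypothetical, and the code only runs the stargazer step if the package is installed.

```r
# Two of the models from earlier, refit so this example stands alone.
fit1 <- lm(y1 ~ x1, anscombe)
fit2 <- lm(y2 ~ x2, anscombe)

# Hypothetical output location; use any path in your project folder.
html_file <- file.path(tempdir(), "models.html")

if (requireNamespace("stargazer", quietly = TRUE)) {
  # `out` writes the table to a file in addition to printing it;
  # capture.output() just keeps the printed html off the console.
  invisible(capture.output(
    stargazer::stargazer(fit1, fit2, type = "html", out = html_file)
  ))
  # browseURL(html_file)  # uncomment to open the saved table in a browser
}
```

From the browser, the same select-all, copy, and paste steps described above should carry the table into Excel.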