Question 1: Use the information below to provide short answers to the questions that follow


The crime dataset from the R package Ecdat contains data on the number of reported crimes (variable reports) for 90 counties in the state of North Carolina for the year 1984. The data also contain information on the number of employed police officers (variable police) and the tax revenue in millions of dollars (variable taxrev) for those counties.

Suppose we regress the number of reported crimes on the number of police officers and the tax revenue. R reports the following output, some of which has been hidden:

Multiple Regression Analysis:
    3 regressors (including intercept) and 90 observations

lm(formula = reports ~ police + taxrev, data = crime)

Coefficients:
            Estimate  Std Error  t value    p value
(Intercept)  22.0700   [hidden]     4.57      0.000
police        1.0930     0.4792  [hidden]  [hidden]
taxrev      [hidden]     0.1555  [hidden]     0.273
---
Standard Error of the Regression: [hidden]
Multiple R-squared:  0.0775  Adjusted R-squared:  0.056
Overall F stat: [hidden] on 2 and 87 DF, pvalue= 0.03


For reference, two R commands and their results are provided below:

qt(p = 0.273/2, df = 87)
## [1] -1.10316
qt(p = 0.025, df = 87)
## [1] -1.987608


1a. [1.5 points] How do you interpret the coefficient on police? In other words, write an English sentence that explains what the coefficient on police means.

The question is ambiguous as to whether it pertains to \(\beta_\text{police}\) or \(b_\text{police}\) and so answers that address either are acceptable.

Regarding \(b_\text{police}\): Holding the tax revenue constant, an increase of one employed police officer in a county (in North Carolina in 1984) is associated with an increase of 1.093 reported crimes in that county, on average.

Regarding \(\beta_\text{police}\): Holding tax revenue constant, an increase of one employed police officer in a county (in North Carolina in 1984) is associated with an expected increase of 1.093 reported crimes in that county.

Reference: Chapter 2 slide 3


1b. [1 point] Is the coefficient on police statistically significantly different from zero at the 95% confidence level? Why?

Yes, because the t-statistic for police is 1.093 / 0.4792 ≈ 2.281, which is larger in absolute value than the critical value of 1.9876 reported above.
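As a check, the t-statistic can be reproduced in R directly from the printed estimate and standard error (a minimal sketch using the values from the output above):

t_police <- 1.093 / 0.4792   # t-statistic for police
t_police
## [1] 2.280885
abs(t_police) > qt(p = 0.025, df = 87, lower.tail = FALSE)   # exceeds the 5% critical value?
## [1] TRUE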

Reference: Chapter 1 slides 86-89


1c. [1.5 points] Calculate the estimated coefficient on taxrev.

The absolute value of the t-value associated with a two-tailed p-value of 0.273 is 1.10316, as reported above. Then because \[ \frac{b_\text{taxrev} - 0}{s_\text{taxrev}} = t_{87} \hspace{1em} \Longrightarrow \hspace{1em} b_\text{taxrev} = t_{87} \times s_\text{taxrev} \]

we find that \(b_\text{taxrev} = \pm 1.10316 \times 0.1555 = \pm 0.1715\).

The question did not provide enough information to determine the sign of the coefficient, so full credit is awarded for the value up to sign (i.e., \(0.1715\) and \(-0.1715\) both receive full credit).
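The magnitude can also be recovered directly in R from the reported p-value and standard error (a minimal sketch using the printed values):

t_taxrev <- qt(p = 0.273/2, df = 87)   # -1.10316, as reported above
b_taxrev <- t_taxrev * 0.1555          # coefficient, up to sign: about -0.1715
b_taxrev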

Reference: Chapter 1 slides 86-89 and 97-98


1d. [2 points] Using the fact that the Regression Sum of Squares (often abbreviated \(SSR\)) is equal to 1940.2, calculate the standard error of the regression (often denoted \(s\)).

\[ R^2 = \frac{SSR}{TSS} \hspace{1em} \Longrightarrow \hspace{1em} TSS = \frac{SSR}{R^2} = \frac{1940.2}{0.0775} = 25034.84 \]

\[ SSE = TSS - SSR = 25034.84 - 1940.2 = 23094.64 \]

\[ s = \sqrt{\frac{SSE}{n-k-1}} = \sqrt{\frac{23094.64}{87}} = 16.29 \]
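These steps are easy to verify numerically in R (a minimal sketch using the values given above):

TSS <- 1940.2 / 0.0775           # TSS = SSR / R-squared
SSE <- TSS - 1940.2              # SSE = TSS - SSR
s   <- sqrt(SSE / (90 - 2 - 1))  # standard error of the regression
s
## [1] 16.29281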

Reference: Chapter 1 slides 40 and 52, and Chapter 2 slide 7


Question 2: R Programming


2a. [1.5 points] Suppose you have two objects in the R global environment: (1) the length-\(n\) vector \(Y\) and (2) the \(n \times k\) matrix \(X\). Write the R code that calculates the vector of least squares coefficient estimates. Store the calculated result in an object named \(b\).

b <- chol2inv(chol(crossprod(X))) %*% crossprod(X, Y)  # (X'X)^{-1} X'Y

or either of the following two commands:

b <- lm(Y ~ X[, 2])$coef  # valid when k = 2 and the first column of X is the intercept

b <- lm(Y ~ 0 + X)$coef   # valid when X already contains the intercept column
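As a sanity check, the matrix formula and lm() agree on simulated data (a minimal sketch; the seed, sample size, and coefficient values are made up for illustration):

set.seed(1)
n <- 100
X <- cbind(1, rnorm(n))               # intercept column plus one regressor
Y <- drop(X %*% c(2, 3) + rnorm(n))   # true coefficients 2 and 3, plus noise
b <- chol2inv(chol(crossprod(X))) %*% crossprod(X, Y)
all.equal(as.numeric(b), unname(lm(Y ~ 0 + X)$coef))
## [1] TRUE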

Reference: Chapter 3 slide 9


2b. [1.5 points] What does the following R code do?

rt(n=100, df=10)

Returns a vector of 100 pseudo-random draws from a t-distribution with 10 degrees of freedom.
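For example (the seed is arbitrary; with a seed set, the draws are reproducible):

set.seed(123)                   # fix the seed so the draws are reproducible
draws <- rt(n = 100, df = 10)   # 100 pseudo-random t(10) draws
length(draws)
## [1] 100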

Reference: Problem Set 1 question 5


Question 3: Least Squares in Matrix Form


Use the following matrices to answer the two sub-parts of this question:

\[\begin{align*} (X'X)^{-1} = \begin{bmatrix} 0.2 & -0.1 \\ -0.1 & 0.5 \end{bmatrix} \hspace{3em} X'Y = \begin{bmatrix} 1 \\ 5 \end{bmatrix} \end{align*}\]


3a. [2 points] Compute the least squares estimate vector \(\mathbf{b} = [b_0\ b_1]'\).

\(\mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = (X'X)^{-1}[X'Y] = \begin{bmatrix} 0.2 & -0.1 \\ -0.1 & 0.5 \end{bmatrix} \begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} 0.2 \times 1 - 0.1 \times 5 \\ -0.1 \times 1 + 0.5 \times 5 \end{bmatrix} = \begin{bmatrix} -0.3 \\ 2.4 \end{bmatrix}\)
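The same arithmetic in R (a minimal sketch using the given matrices):

XtX_inv <- matrix(c(0.2, -0.1, -0.1, 0.5), nrow = 2)  # (X'X)^{-1}
XtY <- c(1, 5)                                        # X'Y
XtX_inv %*% XtY                                       # b = (X'X)^{-1} X'Y
##      [,1]
## [1,] -0.3
## [2,]  2.4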

Reference: Chapter 3 slide 3 and standard matrix multiplication


3b. [2 points] If \(s^2 = 3\), compute the standard errors of each of the least squares coefficients.

\(\hat{\text{var}}(\mathbf{b}) = s^2(X'X)^{-1} = 3 \times \begin{bmatrix} 0.2 & -0.1 \\ -0.1 & 0.5 \end{bmatrix} = \begin{bmatrix} 0.6 & -0.3 \\ -0.3 & 1.5 \end{bmatrix}\)

This is the estimated variance-covariance matrix of the OLS estimates, with the estimated variances of the least squares coefficients on the diagonal. So \(s_{b_0} = \sqrt{0.6} \approx 0.77\) and \(s_{b_1} = \sqrt{1.5} \approx 1.22\).
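And in R (a minimal sketch using the given \((X'X)^{-1}\) and \(s^2 = 3\)):

XtX_inv <- matrix(c(0.2, -0.1, -0.1, 0.5), nrow = 2)  # (X'X)^{-1}
vcov_b <- 3 * XtX_inv        # s^2 (X'X)^{-1}
sqrt(diag(vcov_b))           # standard errors of b0 and b1
## [1] 0.7745967 1.2247449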

Reference: Chapter 3 slides 12–13


Question 4: Proofs


4a. [2 points] Let \(e_i\) be a least squares residual from the simple linear regression model. Verify that \(\sum_{i=1}^{n}e_i=0\).


\[\begin{align*} \sum_{i=1}^n e_i &= \sum_{i=1}^n \left( y_i - \hat{y}_i \right) &\\ &= \sum_{i=1}^n \left( y_i - b_0 - b_1x_i \right) &\\ &= \sum_{i=1}^n \left( y_i - (\bar{y} - b_1 \times \bar{x}) - b_1x_i \right) \hspace{3em} \text{because } b_0 = \bar{y} - b_1 \times \bar{x} &\\ &= \sum_{i=1}^n y_i - \sum_{i=1}^n \bar{y} + \sum_{i=1}^n b_1\bar{x} - b_1 \sum_{i=1}^n x_i &\\ &= \left( \sum_{i=1}^n y_i \right) - n\bar{y} + nb_1\bar{x} - b_1 \left(\sum_{i=1}^n x_i\right) &\\ &= \left( \sum_{i=1}^n y_i \right) - n\left( \frac{1}{n} \sum_{i=1}^n y_i \right) + nb_1 \left( \frac{1}{n} \sum_{i=1}^n x_i\right) - b_1 \left(\sum_{i=1}^n x_i\right) &\\ &= 0 + 0 &\\ &= 0 \end{align*}\]
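The result is easy to confirm numerically in R (a minimal sketch on simulated data; the model and seed are arbitrary):

set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)      # arbitrary simple linear model
fit <- lm(y ~ x)
all.equal(sum(resid(fit)), 0)   # zero up to floating-point error
## [1] TRUE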


4b. [1 point] Define the “hat matrix” \(P\) to be \(P = X(X'X)^{-1}X'\). Show that \(Py=\hat{y}\) where \(\hat{y}\) is the vector of least squares fitted values from the multiple regression model.


Using the fact that \(b = (X'X)^{-1}X'y\), we have that \(Py = X(X'X)^{-1}X'y = Xb = \hat{y}\).


4c. [1 point] Define the “annihilator matrix” \(M\) to be \(M = I - P\) where \(P\) is defined above and \(I\) is an \(n \times n\) identity matrix. Show that \(My=e\) where \(e\) is the vector of least squares residuals from the multiple regression model.


Using the fact that \(e = y - \hat{y}\), we have that \(My = (I-P)y = y - Py = y - \hat{y} = e\).
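Both identities can be verified numerically in R (a minimal sketch on simulated data, with \(P\) and \(M\) built exactly as defined above):

set.seed(2)
n <- 30
X <- cbind(1, rnorm(n))                  # design matrix with an intercept
y <- drop(X %*% c(1, 2) + rnorm(n))      # arbitrary true coefficients
P <- X %*% solve(crossprod(X)) %*% t(X)  # hat matrix
M <- diag(n) - P                         # annihilator matrix
fit <- lm(y ~ 0 + X)
all.equal(drop(P %*% y), unname(fitted(fit)))  # Py = y-hat
## [1] TRUE
all.equal(drop(M %*% y), unname(resid(fit)))   # My = e
## [1] TRUE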


Question 5: Two short answers are required; full proofs are not necessary


5a. [1.5 points] Geoff eats either 0, 1, or 2 powerbars each morning. He claims the number of powerbars he eats in the morning (\(X\)) positively affects his productivity during the day, measured in lines of R code written (\(Y\)). Geoff also collects some data on the amount of code he writes each day and he reports that var(\(Y\)) = var(\(Y|X\)=1 bar) = var(\(Y|X\)=2 bars). Do you agree with Geoff that his powerbar consumption affects his productivity? Why or why not?

No, I disagree with Geoff. Powerbar consumption does not affect Geoff’s productivity because his marginal and conditional variances are equal.

Recall that \(\sigma_Y^2 = \beta_1^2 \times \sigma_X^2 + \sigma_\varepsilon^2\), which is greater than \(\sigma_\varepsilon^2\) whenever \(\beta_1 \ne 0\) (and \(\sigma_X^2 > 0\)). So the marginal (\(\sigma_Y^2\)) and conditional (\(\sigma_{Y|X}^2 = \sigma_\varepsilon^2\)) variances are equal only when the linear relationship between \(X\) and \(Y\) is zero.
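A quick simulation illustrates the contrast (a minimal sketch; the slope and sample size are made up for illustration):

set.seed(3)
x <- sample(0:2, 1e5, replace = TRUE)  # powerbars eaten: 0, 1, or 2
y <- 5 * x + rnorm(1e5)                # here beta1 = 5, so X does matter
var(y)          # roughly beta1^2 * var(x) + 1, far above 1
var(y[x == 1])  # close to 1
var(y[x == 2])  # close to 1

When \(\beta_1 \ne 0\), the marginal variance clearly exceeds the conditional variances; Geoff’s data show no such gap.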

Reference: Chapter 1 slides 53–58 and 61


5b. [1.5 points] A friend in the Berkeley MFE program tells you that his professor has invented a new procedure that is better than least squares for simple linear regression. This new procedure estimates the slope of the fitted line with the formula below. Explain why your friend is wrong.

\[ b_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})X_i}{\frac{1}{n} \sum_{i=1}^{n}(X_i - \bar{X})^2} \]

The Berkeley professor’s slope estimator is the same as the OLS estimator, and so it is not better.

\[ b_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})X_i}{\frac{1}{n} \sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{s_{X,Y}}{s_X^2} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = b_{OLS} \]

This follows from the fact that you only need to “de-mean” one of the two variables to calculate the covariance:

\[\begin{align*} \hspace{2em} s_{X,Y} &= \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n} &\\ &= \frac{\sum_{i=1}^{n}\left(X_iY_i - X_i\bar{Y} - Y_i\bar{X} + \bar{X}\bar{Y}\right)}{n} &\\ &= \frac{\sum_{i=1}^{n}X_iY_i}{n} - \frac{\bar{Y} \sum_{i=1}^{n} X_i}{n} - \frac{\bar{X} \sum_{i=1}^{n} Y_i}{n} + \frac{\bar{X}\bar{Y} \sum_{i=1}^{n} 1}{n} &\\ &= \frac{1}{n}\sum_{i=1}^{n}(X_iY_i) - \bar{X}\bar{Y} - \bar{X}\bar{Y} + \bar{X}\bar{Y} &\\ &= \frac{1}{n}\sum_{i=1}^{n}(X_iY_i) - \bar{X}\bar{Y} &\\ &= \frac{1}{n}\sum_{i=1}^{n}(X_iY_i) - \bar{Y}\frac{1}{n}\sum_{i=1}^{n}X_i &\\ &= \frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})X_i & \end{align*}\]
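A numerical check in R confirms that the two formulas produce identical slopes (a minimal sketch on simulated data; the model is arbitrary):

set.seed(4)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)   # arbitrary simple linear model
b_new <- sum((y - mean(y)) * x) / sum((x - mean(x))^2)  # the "new" estimator
b_ols <- unname(coef(lm(y ~ x))["x"])                   # the OLS slope
all.equal(b_new, b_ols)
## [1] TRUE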

Reference: Chapter 1 slide 70, and Problem Set 1 question 1