1. Suppose we have data on MFE students’ GMAT scores and their overall GPA in the program. We would like to see if GMAT scores predict academic performance (as measured by GPA), and so we fit a linear regression. R reports this output:
lm(formula = GPA ~ GMAT)
Coefficients:
             Estimate   Std Error  t value   p value
(Intercept)  -1.601000  [hidden]   -0.64     [hidden]
GMAT          0.006892  0.003531   [hidden]  [hidden]
---
Standard Error of the Regression: 0.5945
Multiple R-squared: 0.05 Adjusted R-squared: 0.037
Overall F stat: 3.81 on 1 and 72 DF, p-value = 0.055
1a. [2 points] Calculate the standard error of the intercept.
1b. [2 points] Is the coefficient on GMAT statistically significantly different from zero at a 95% confidence level? For reference, the R code qt(p=0.025, df=72) returns the value -1.9934.
1c. [2 points] Kate and Kalyan are two new MFE students not included in the regression. Kate’s GMAT score is 100 points higher than Kalyan’s. How much better do we expect Kate’s GPA to be?
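One way to check the arithmetic behind 1a to 1c is sketched below in R. It relies only on the standard relationship t value = estimate / standard error and on the numbers printed in the output above; the variable names are illustrative and not part of the original problem. Because the reported intercept t value (-0.64) is rounded, the recovered standard error is approximate.

```r
# Values copied from the regression output above
b0    <- -1.601     # intercept estimate
t_b0  <- -0.64      # reported (rounded) t value for the intercept
b1    <- 0.006892   # GMAT slope estimate
se_b1 <- 0.003531   # GMAT slope standard error
df    <- 72         # residual degrees of freedom from the F-stat line

# 1a: t value = estimate / std. error, so std. error = estimate / t value
se_b0 <- b0 / t_b0

# 1b: t statistic for the GMAT slope versus the two-sided 5% critical value
t_b1   <- b1 / se_b1
t_crit <- qt(p = 0.975, df = df)        # same magnitude as qt(p = 0.025, df = 72)
p_b1   <- 2 * pt(-abs(t_b1), df = df)   # two-sided p-value
reject <- abs(t_b1) > t_crit            # TRUE would mean "significant at 5%"

# 1c: expected GPA difference for a 100-point GMAT difference
gpa_diff <- 100 * b1

print(c(se_b0 = se_b0, t_b1 = t_b1, t_crit = t_crit, p_b1 = p_b1, gpa_diff = gpa_diff))
print(reject)
```

With a single regressor, the squared slope t statistic should match the overall F statistic up to rounding, which gives a quick consistency check against the reported 3.81.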
2a. [1 point] Suppose you are working in R. Your Global Environment has a dataframe named “DF”. DF has two columns: a column named “Y” and a column named “X”. Write R code to run a regression of Y on X and store the output in an object named “out”.
2b. [2 points] Suppose your code from the last question worked. Write R code to create a scatterplot of the residuals from the regression (on the vertical axis) against the X variable (on the horizontal axis).
2c. [2 points] Fill in the blanks:
qnorm(p=0.975) = __________
pt(q=0, df=37) = __________
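A minimal sketch of how 2a to 2c could be approached in R follows. The dataframe built at the top is simulated, hypothetical data, included only so the snippet runs on its own; in the setting of the question, DF would already exist in the Global Environment. The names z_975 and t_cdf_at_0 are illustrative.

```r
# Hypothetical data, only so the sketch is self-contained
set.seed(1)
DF <- data.frame(X = rnorm(50))
DF$Y <- 1 + 2 * DF$X + rnorm(50)

# 2a: regress Y on X and store the fitted model in "out"
out <- lm(Y ~ X, data = DF)

# 2b: residuals (vertical axis) against X (horizontal axis)
plot(x = DF$X, y = resid(out), xlab = "X", ylab = "Residuals")
abline(h = 0, lty = 2)   # reference line at zero

# 2c: standard normal quantile and t-distribution CDF
z_975      <- qnorm(p = 0.975)    # 97.5th percentile of N(0, 1), about 1.96
t_cdf_at_0 <- pt(q = 0, df = 37)  # the t distribution is symmetric about 0
print(c(z_975 = z_975, t_cdf_at_0 = t_cdf_at_0))
```

The call out$residuals would serve equally well in place of resid(out).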
3a. [3 points] Let \(X\) be an \(n \times k\) matrix and \(e\) be an \(n\)-length vector of least squares residuals from a multiple regression. Show \(X'e=0\).
3b. [3 points] You regress \(Y\) on \(X\) (e.g., lm(Y~X)). The \(X\) values and the residuals are shown below. Are these residuals consistent with a linear regression model or not? Why or why not?
\[ X = \begin{bmatrix} 2 \\ 4 \\ 1 \\ 3 \end{bmatrix} \hspace{3em} e = \begin{bmatrix} -1 \\ 2 \\ -2 \\ 1 \end{bmatrix}\]
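For 3a, one standard argument, assuming \(X'X\) is invertible and writing \(b = (X'X)^{-1}X'y\) for the least squares coefficients (the symbols \(b\) and \(y\) are introduced here for illustration), is:

\[ X'e = X'(y - Xb) = X'y - X'X(X'X)^{-1}X'y = X'y - X'y = 0 \]

The R sketch below applies the same orthogonality condition to the vectors in 3b. It assumes the regression includes an intercept, as lm(Y~X) does by default, so the design matrix has a column of ones next to \(X\). The second half uses a hypothetical, simulated response purely to show what residuals from an actual fit look like.

```r
# Vectors from question 3b
X <- c(2, 4, 1, 3)
e <- c(-1, 2, -2, 1)

# Design matrix with an intercept column, as lm(Y ~ X) would use
Xmat <- cbind(1, X)

# Least squares residuals must satisfy X'e = 0 in every column
Xe <- drop(t(Xmat) %*% e)
print(Xe)   # first entry: sum of residuals; second entry: sum of X * e

# Comparison: residuals from a real fit on hypothetical, simulated y values
set.seed(1)
y_sim  <- 3 + 0.5 * X + rnorm(4)
e_sim  <- resid(lm(y_sim ~ X))
Xe_sim <- drop(t(Xmat) %*% e_sim)
print(round(Xe_sim, 10))   # both entries are zero up to floating-point error
```

Any nonzero entry in Xe indicates that the given vector cannot be a set of least squares residuals for that design matrix.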
4. [3 points] The rat dataset from the DataAnalytics package has three variables: \(y\), \(x_1\), and \(x_2\). Below are coefficient estimates for three regressions using data from the rat dataset.
First, a regression of \(y\) on \(x_1\) and \(x_2\):
             Estimate  Std. Error  t value
(Intercept)  0.178357  0.227775    0.7830
x1           0.035349  0.151375    0.2335
x2           1.232600  2.041265    0.6038
Second, a regression of \(x_1\) on \(x_2\):
             Estimate  Std. Error  t value
(Intercept)  1.18864   0.22378     5.3117
x2           6.74253   2.83235     2.3805
Third, a regression of \(x_2\) on \(x_1\):
             Estimate  Std. Error  t value
(Intercept)  0.014504  0.026834    0.5405
x1           0.037080  0.015576    2.3805
Suppose you run the simple regression of \(y\) on \(x_1\). Calculate what the estimated slope coefficient would be.
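One route to the answer is the omitted variable identity for least squares. Writing \(b_1\) and \(b_2\) for the slopes on \(x_1\) and \(x_2\) in the first regression and \(d_1\) for the slope on \(x_1\) in the third regression (notation introduced here for illustration), the slope from the simple regression of \(y\) on \(x_1\) is:

\[ b_{y \sim x_1} = b_1 + b_2 \, d_1 \]

The R sketch below plugs in the reported estimates and then checks the identity on hypothetical, simulated data, since the rat dataset itself is not loaded here.

```r
# Estimates copied from the tables above
b1 <- 0.035349   # slope on x1 in the regression of y on x1 and x2
b2 <- 1.232600   # slope on x2 in the regression of y on x1 and x2
d1 <- 0.037080   # slope on x1 in the regression of x2 on x1

slope_short <- b1 + b2 * d1
print(slope_short)

# Check of the identity on hypothetical, simulated data
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)

long  <- coef(lm(y ~ x1 + x2))   # multiple regression
aux   <- coef(lm(x2 ~ x1))       # auxiliary regression of x2 on x1
short <- coef(lm(y ~ x1))        # simple regression

implied <- long[["x1"]] + long[["x2"]] * aux[["x1"]]
print(c(short = short[["x1"]], implied = implied))
```

The identity holds exactly in sample, so the two printed values in the simulated check agree up to floating-point error. The second regression (of \(x_1\) on \(x_2\)) is not needed for this calculation.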