Question 1 [8 pts] Yes/No with Support

Answer the following six subparts of Question 1 as “Yes” or “No” and provide support for your answer. A simple “Yes” or “No” without support will be given zero credit. All questions ask about simple linear regression models of the form \(Y = \beta_0 + \beta_1 X + \varepsilon\), with the usual assumptions about \(\varepsilon\). The subparts are not related to each other.

1a) You run a simple linear regression in R on 5000 data points and then you view the output with the summary() command. You find that the t-statistic for the estimated coefficient (\(b_1\)) on the explanatory variable is more than 1.96. Will the p-value reported by R for that t-statistic definitely be less than 0.025?

No. The p-values reported in regression output are two-tailed. With 5000 observations the t distribution is essentially standard normal, so a t-statistic greater than 1.96 implies a two-tailed p-value below (approximately) 0.05, but not necessarily below 0.025.

1b) Your friend Joe runs a simple linear regression and finds that both the estimated intercept coefficient (\(b_0\)) and the estimated slope coefficient (\(b_1\)) on the explanatory variable are statistically significantly different from zero at a 99% confidence level. He concludes that the relationship between \(X\) and \(Y\) is linear. Should you agree with Joe’s reasoning?

No. Coefficient significance from a linear regression model does not indicate the true relationship between \(X\) and \(Y\), if any.

Consider a true nonlinear relationship of \(Y = \gamma_0 + \gamma_1 X^2 + \varepsilon\). If data were generated from this true nonlinear relationship, we could still fit a line to the data using linear regression (even though this might not be a very sensible thing to do) and we might even find that estimated coefficients from the linear regression are significant at a high (say, 99%) confidence level.

1c) Alyssa runs a simple linear regression. In her data, the sample standard deviation of \(X\) is equal to the sample standard deviation of \(Y\) (i.e., \(s_X = s_Y\)). For her regression, is the slope coefficient equal to the coefficient of determination (i.e., does \(b_1 = R^2\))?

In general, no. Because \(s_X = s_Y\) here, the slope coefficient equals the correlation between \(X\) and \(Y\), whereas \(R^2\) is the square of that correlation:

\[ b_1 = r_{XY} \times \frac{s_Y}{s_X} = r_{XY} \hspace{2em} \text{and} \hspace{2em} R^2 = r_{XY}^2 \]

However, if \(r_{XY} = 1\) (perfect positive correlation) or \(r_{XY} = 0\), then \(b_1 = R^2\).

1d) Suppose \(X\) can take on values from 0 to 100. In your dataset, you have \(n=1000\) observations and the average \(X\) value is 30 (i.e., \(\bar{X}=30\)). Your boss asks you to calculate a prediction interval for \(Y_\text{f}|X_\text{f}=29\). You need to do this quickly. You know \(s\), but not \(s_\text{pred}\). Can you use \(s\) as a good approximation of \(s_\text{pred}\) to calculate the requested prediction interval?

Yes. \(s_\text{pred}^2 = s^2\left( 1 + \frac{1}{n} + \frac{(X_f - \bar{X})^2}{(n-1)s_X^2} \right)\). This says that \(s_\text{pred}\) equals \(s\) times the square root of a term that will be very close to 1 if \(n\) is large, \(X_f\) is close to \(\bar{X}\), and \(s_X\) is not tiny. These are exactly the conditions in this situation, and so \(s_\text{pred} \approx s\).
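
As a quick numerical check in R (the value of \(s_X\) below is hypothetical, since the problem does not give the sample standard deviation of \(X\)), the multiplier on \(s\) is essentially 1:

n <- 1000; Xbar <- 30; Xf <- 29
s_X <- 20   # hypothetical value; not given in the problem
sqrt(1 + 1/n + (Xf - Xbar)^2 / ((n - 1) * s_X^2))   # about 1.0005, so s_pred is within ~0.05% of s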

Question 2 [12 pts] Short Answer

Question 2 has six sub-parts; they are not related to each other. Provide the required calculation or short answer for each sub-part.

2a) Suppose you create a portfolio from two stocks. The weights of the stocks included in the portfolio are provided in the vector \(w\) and the covariance matrix of the stocks is provided in the matrix \(\Sigma\). Calculate the variance of the portfolio.

\[ w = \begin{bmatrix} 0.3 \\ 0.7 \end{bmatrix} \hspace{3em} \Sigma = \begin{bmatrix} 2 & 0.5 \\ 0.5 & 1 \end{bmatrix} \]

\[ w' \Sigma w = \begin{bmatrix} 0.3 & 0.7 \end{bmatrix} \begin{bmatrix} 2 & 0.5 \\ 0.5 & 1 \end{bmatrix} \begin{bmatrix} 0.3 \\ 0.7 \end{bmatrix} = 0.3^2(2) + 0.7^2(1) + 2(0.3)(0.7)(0.5) = 0.88 \]
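
The same calculation can be verified in R with base matrix algebra (a sketch using only the numbers given above):

w <- c(0.3, 0.7)
Sigma <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)
t(w) %*% Sigma %*% w   # 0.88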

2b) Suppose your dataset has 100 observations and 6 variables (the dependent variable \(Y\) and 5 independent variables \(X_1\) through \(X_5\)). You regress \(Y\) on the 5 independent variables and find an \(R^2\) value of 0.66. Calculate the overall F-statistic for this regression.

\[ F = \frac{R^2/k}{(1-R^2)/(N-k-1)} = \frac{0.66/5}{(1-0.66)/(100-6)} = 36.49 \]
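
The same arithmetic in R, along with the corresponding p-value from the F distribution (a sketch; the numbers are those given above):

R2 <- 0.66; k <- 5; N <- 100
Fstat <- (R2 / k) / ((1 - R2) / (N - k - 1))
Fstat                                                     # about 36.49
pf(Fstat, df1 = k, df2 = N - k - 1, lower.tail = FALSE)   # overall F-test p-value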

2c) Consider a stationary AR(1) model. Is the conditional mean equal to the unconditional mean? In other words, does \(\mathbb{E}[Y_t \vert Y_{t-1}] = \mathbb{E}[Y_t]\)? Why or why not?

No.

The unconditional mean of \(Y\) is \(\mathbb{E}[Y_t] = \mathbb{E}[\beta_0 + \beta_1 Y_{t-1} + \varepsilon_t] = \beta_0 + \beta_1\mathbb{E}[Y_{t-1}]\).

Using the fact that stationarity implies \(\mathbb{E}[Y_{t-1}] = \mathbb{E}[Y_t]\), we solve for \(\mathbb{E}[Y_t] = \beta_0 / (1 - \beta_1)\).

The conditional mean is \(\mathbb{E}[Y_t \vert Y_{t-1}] = \beta_0 + \beta_1 Y_{t-1}\), which depends on the realized value of \(Y_{t-1}\) and therefore generally differs from the constant unconditional mean; the two coincide only when \(Y_{t-1} = \beta_0/(1-\beta_1)\). For example, with \(\beta_0 = 1\) and \(\beta_1 = 0.5\), the unconditional mean is 2, but the conditional mean given \(Y_{t-1} = 4\) is 3.

2d) Consider the LASSO technique. Suppose you choose a very small value for the penalty parameter \(\lambda\). Will your estimated coefficients be similar to the OLS estimates of the same set of parameters, or will they be very different from the OLS estimates? Why?

With a small value for the penalty parameter \(\lambda\), the coefficient estimates from a LASSO model will be similar to those from least squares. To see this, recall that LASSO minimizes the sum of squared errors plus a penalty. In the limit, as the penalty goes to zero, LASSO minimizes the same function as least squares and thus the estimated coefficients are the same.

\[ \hat{\beta}_\text{LASSO} = \arg\min_\beta \left\{ \sum_{i=1}^N (Y_i - \hat{Y}_i)^2 + \lambda \sum_{j=1}^J |\beta_j| \right\} \]
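
To see this numerically, one could compare LASSO with a tiny \(\lambda\) against least squares on the same data. The sketch below assumes the glmnet package and uses arbitrary simulated data; the coefficient values and \(\lambda = 10^{-4}\) are illustrative only.

library(glmnet)
set.seed(1)
n <- 200; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(1, -2, 0.5, 0, 3)) + rnorm(n)

ols   <- coef(lm(y ~ X))                               # least squares estimates
lasso <- coef(glmnet(X, y, alpha = 1, lambda = 1e-4))  # LASSO with a very small penalty
cbind(OLS = ols, LASSO = as.numeric(lasso))            # the two columns are nearly identical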

2e) Assume you would like to model \(Y\) as a linear function of \(X\), but you are concerned about heteroskedasticity. In particular, you model \(Y_i = \beta_0 + \beta_1X_i + \varepsilon_i\) with \(\varepsilon_i\) independently but not identically distributed, i.e., \(\varepsilon_i \sim \mathcal{N}(0, \sigma_i^2)\). Let \(e\) denote the vector of residuals. Calculate the least squares coefficient estimates using the following information.

\[ [X'X]^{-1} = \begin{bmatrix} 0.5 & 0.1 \\ 0.1 & 3 \end{bmatrix} \hspace{2em} X'Y = \begin{bmatrix} -4 \\ 2 \end{bmatrix} \hspace{2em} e'e = 212.91 \]

The OLS point estimates are unaffected by heteroskedasticity (it is their standard errors that change):

\[ b = (X'X)^{-1}X'Y = \begin{bmatrix} 0.5 & 0.1 \\ 0.1 & 3 \end{bmatrix} \begin{bmatrix} -4 \\ 2 \end{bmatrix} = \begin{bmatrix} (0.5)(-4) + (0.1)(2) \\ (0.1)(-4) + (3)(2) \end{bmatrix} = \begin{bmatrix} -2+0.2 \\ -0.4+6 \end{bmatrix} = \begin{bmatrix} -1.8 \\ 5.6 \end{bmatrix} \]
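
The same calculation in R (a sketch using only the matrices given above):

XtX_inv <- matrix(c(0.5, 0.1, 0.1, 3), nrow = 2)
XtY <- c(-4, 2)
XtX_inv %*% XtY   # -1.8, 5.6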

2f) Assume you have fit a multiple linear regression model with several independent variables. Suppose the quantity of interest is \(\theta = \beta_3 / \beta_2\). Explain how to find a 95% confidence interval for \(\hat{\theta}\) using bootstrapping.

Sample \(N\) rows with replacement from the original dataset, refit the regression, and calculate the quantity of interest \(\hat{\theta} = b_3/b_2\). Do this many times (say \(B\) times) and take the 0.025 and 0.975 quantiles of the resulting bootstrap distribution.

In pseudo-code:

B <- 10000
N <- nrow(orig_dataset)    # number of rows in the original dataset
result_vec <- double(B)

for(i in 1:B) {
  # sample N rows with replacement from your dataset
  boot_data <- orig_dataset[sample(N, replace=TRUE), ]
  
  # refit the model on the bootstrap dataset
  # (X1, X2, X3, ... are placeholders for your regressors)
  out <- lm(Y ~ X1 + X2 + X3 + X4 + X5, data=boot_data)
  
  # calculate and store the quantity of interest theta = b3 / b2
  result_vec[i] <- coef(out)["X3"] / coef(out)["X2"]
}

# get the 95% interval from the bootstrap distribution
quantile(result_vec, probs=c(0.025, 0.975))

Question 3 [7 pts] New Food

The newfood dataset in the DataAnalytics package contains 72 observations about a new food product. One observation is one grocery store. The variables used in the regressions below are sales, price, income, and city.

Use the information in the following two regressions to answer the four questions below.

data(newfood, package="DataAnalytics")
out1 <- lm(sales ~ price + income + I(income^2) + as.factor(city), data=newfood)
DataAnalytics::lmSumm(out1)
Coefficients:
                  Estimate Std Error t value p value
(Intercept)      -2287.000  1401.000   -1.63   0.108
price              -12.260     1.605   -7.64   0.000
income             809.700   404.800    2.00   0.050
I(income^2)        -56.990    29.370   -1.94   0.057
as.factor(city)2    60.320    22.360    2.70   0.009
as.factor(city)3    -6.305    19.270   -0.33   0.745
as.factor(city)4    97.910    74.420    1.32   0.193
---
Standard Error of the Regression:  49.44
Multiple R-squared:  0.662  Adjusted R-squared:  0.619
out2 <- lm(sales ~ price + income + I(income^2), data=newfood)
DataAnalytics::lmSumm(out2)
Coefficients:
                 Estimate Std Error t value p value
(Intercept)       1248.00    735.10    1.70   0.094
price              -12.87      1.54   -8.36   0.000
income            -185.10    201.00   -0.92   0.360
I(income^2)         13.23     13.76    0.96   0.340
---
Standard Error of the Regression:  52.09
Multiple R-squared:  0.607  Adjusted R-squared:  0.577

3a) According to the first regression above, at a 10% level of significance, does income have a non-linear effect on sales of the new food product? Why or why not?

Yes. At a 10% level of significance, we would reject the Null Hypothesis that the coefficient on squared-income is zero (because the p-value is less than 0.10) and thus conclude that there is a non-linear relationship between income and sales (holding constant price and city location).

3b) For given fixed values of price and income, which city has the highest average sales of the new food product?

City 4 because its dummy variable has the largest coefficient, and because that coefficient is positive, average sales in City 4 are greater than the default city (City 1).

3c) Test whether the set of city dummy variables significantly improved the regression at a 99% confidence level. Note the following:

qt(p=0.995, df=66) = 2.652

qf(p=0.990, df1=3, df2=65) = 4.098

qf(p=0.995, df1=3, df2=65) = 4.692

This requires a partial F test on the 3 city dummy variables.

\[ F_\text{Partial} = \frac{(R^2_\text{full}-R^2_\text{partial})/k_2}{(1-R^2_\text{full})/(N-k_1-k_2-1)} = \frac{(0.662-0.607)/3}{(1-0.662)/(72-3-3-1)} = 3.53 \]

Because 3.53 < 4.098, we fail to reject the Null hypothesis; the addition of the city dummy variables did not significantly improve the regression at a 99% confidence level.
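
In R, the partial F-statistic can be computed from the two R-squared values and compared to the critical value, or obtained directly from the two fitted models with anova(); a sketch:

F_partial <- ((0.662 - 0.607) / 3) / ((1 - 0.662) / (72 - 3 - 3 - 1))
F_partial                     # about 3.53
qf(0.99, df1 = 3, df2 = 65)   # critical value 4.098
# equivalently, using the fitted models from above:
anova(out2, out1)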

3d) What does the following R code test for? Given the result, what do you conclude?

lmtest::bptest(out1)
## 
##  studentized Breusch-Pagan test
## 
## data:  out1
## BP = 26.173, df = 8, p-value = 0.0009811

The Breusch-Pagan test is a test for heteroskedasticity. Because the p-value is quite small, we reject the Null Hypothesis of homoskedasticity at any standard confidence level and conclude that our data exhibit heteroskedasticity.

Question 4 [5 pts] Proofs

Question 4 has three subparts.

Assume the linear regression model in matrix form: \(Y = X\beta + \varepsilon\) with \(\varepsilon \sim \mathcal{N}(0, \Sigma)\). Consider \(X\) as fixed or nonstochastic, and define the least squares estimator \(b = (X'X)^{-1}X'Y\).

4a) Show that \(b = \beta + (X'X)^{-1}X'\varepsilon\).

\(b\)
\(= (X'X)^{-1}X'Y\)
\(= (X'X)^{-1}X'(X\beta + \varepsilon)\)
\(= \beta + (X'X)^{-1}X'\varepsilon\)

4b) Show that var\((b) = (X'X)^{-1}X' \Sigma X(X'X)^{-1}\).

\(\text{var}(b)\)
\(= \text{var}(\beta + (X'X)^{-1}X'\varepsilon)\)
\(= \text{var}((X'X)^{-1}X'\varepsilon)\)
\(= \mathbb{E}\{[(X'X)^{-1}X'\varepsilon][(X'X)^{-1}X'\varepsilon]'\}\) (using \(\mathbb{E}[\varepsilon] = 0\))
\(= \mathbb{E}[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}]\)
\(= (X'X)^{-1}X'\mathbb{E}[\varepsilon\varepsilon']X(X'X)^{-1}\)
\(= (X'X)^{-1}X' \Sigma X(X'X)^{-1}\)

4c) If we now assume homoskedasticity, specifically that \(\Sigma = \sigma^2 I\), show that var\((b) = \sigma^2 (X'X)^{-1}\).

\(\text{var}(b)\)
\(= (X'X)^{-1}X' \Sigma X(X'X)^{-1}\)
\(= (X'X)^{-1}X' \sigma^2 I X(X'X)^{-1}\)
\(= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1}\)
\(= \sigma^2 (X'X)^{-1}\)

Question 5 [5 pts] Time Series

The gtAuto dataset in the DataAnalytics package provides 91 observations of monthly US motor vehicles sales data (in millions of dollars). The data span the time period January 2004 through July 2011.

gtAuto$diff <- gtAuto$sales - back(gtAuto$sales)
out <- lm(diff ~ back(diff), data=gtAuto)
lmSumm(out)
Coefficients:
            Estimate Std Error t value p value
(Intercept)  73.1400  642.4000    0.11   0.910
back(diff)   -0.2094    0.1046   -2.00   0.048
---
Standard Error of the Regression:  6060
Multiple R-squared:  0.044  Adjusted R-squared:  0.033
Overall F stat: 4.01 on 1 and 87 DF, pvalue= 0.048

5a) If the sales in June 2011 were 70749 and the sales in July 2011 were 69910, what are the predicted sales in August 2011 according to the fitted ARIMA(1,1,0) model above?

Let \(\delta_t = Y_t - Y_{t-1}\) denote the first difference of sales.

\(\hat{\delta}_{T+1} = 73.14 - 0.2094 \times \delta_T = 73.14 - 0.2094 \times (69910 - 70749) = 248.8\)

\(Y_{T+1}^\text{pred} = Y_{T} + 248.8 = 69910 + 248.8 = 70158.8\)
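
The same prediction computed from the fitted model object (a sketch, assuming out is the regression fit shown above):

b <- coef(out)                        # intercept and coefficient on back(diff)
delta_T <- 69910 - 70749              # July minus June sales
delta_pred <- b[1] + b[2] * delta_T   # predicted August difference, about 248.8
69910 + delta_pred                    # predicted August sales, about 70158.8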

5b) The first six autocorrelations of the residuals are provided below. The p-value from a Box-Ljung test of these autocorrelations is \(4.5769 \times 10^{-5}\). What do these post-estimation analyses tell us?

cor(e_t, e_{t})   =  1   
cor(e_t, e_{t-1}) = -0.023
cor(e_t, e_{t-2}) = -0.002
cor(e_t, e_{t-3}) =  0.067
cor(e_t, e_{t-4}) = -0.098
cor(e_t, e_{t-5}) = -0.073
cor(e_t, e_{t-6}) = -0.528

The 6th autocorrelation is large in magnitude (about −0.53) and the Box-Ljung test p-value is below any standard threshold. These results indicate that there is remaining autocorrelation in the residuals of the ARIMA(1,1,0) model, and perhaps that a model with more lags of the differenced series (e.g., an ARIMA(6,1,0)) should be considered.
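
These diagnostics can be reproduced along the following lines (a sketch; whether the reported Box-Ljung p-value was computed with fitdf = 1, to account for the estimated AR coefficient, is an assumption):

e <- resid(out)
acf(e, lag.max = 6, plot = FALSE)                     # residual autocorrelations
Box.test(e, lag = 6, type = "Ljung-Box", fitdf = 1)   # Box-Ljung test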

Question 6 [3 pts] ARCH

Suppose you fit an ARCH-1 model to 1000 days of S&P 500 returns using R’s optimizer optim(). The call to optim() returns the following maximum likelihood estimates (\(\hat{\theta}\)) and Hessian matrix (\(H\)). Use \(\hat{\theta}\) and \(H\) to answer the following two questions.

\[ \hat{\theta} = \begin{bmatrix} 1.037 \\ 0.613 \end{bmatrix} \hspace{3em} H = \begin{bmatrix} -231 & -117 \\ -117 & -267 \end{bmatrix} \]

6a) Calculate the variance-covariance matrix.

\[ \hat{\Sigma} = -H^{-1} = \frac{1}{(231)(267)-(117)(117)} \begin{bmatrix} 267 & -117 \\ -117 & 231 \end{bmatrix} = \begin{bmatrix} 0.005563891 & -0.002438110 \\ -0.002438110 & 0.004813703 \end{bmatrix}\]
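
In R (a sketch using only the Hessian reported above):

H <- matrix(c(-231, -117, -117, -267), nrow = 2)
Sigma_hat <- solve(-H)   # variance-covariance matrix = -H^{-1}
Sigma_hat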

6b) Test whether the coefficients are each individually statistically significantly different from zero at the 95% confidence level. For reference, qnorm(p=0.975) = 1.9599.

\(s_{\theta_1} = \sqrt{0.005563891} = 0.0746\)

\(t_1 = 1.037 / 0.0746 = 13.9\)

\(13.9 > 1.9599\) \(\Rightarrow\) \(\theta_1\) is statistically significantly different from zero at the 95% confidence level.

\(s_{\theta_2} = \sqrt{0.004813703} = 0.0694\)

\(t_2 = 0.613 / 0.0694 = 8.83\)

\(8.83 > 1.9599\) \(\Rightarrow\) \(\theta_2\) is statistically significantly different from zero at the 95% confidence level.
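
Continuing the sketch above, the standard errors and z-statistics follow directly from the diagonal of \(\hat{\Sigma}\):

theta_hat <- c(1.037, 0.613)
se <- sqrt(diag(Sigma_hat))           # standard errors
theta_hat / se                        # about 13.9 and 8.8
abs(theta_hat / se) > qnorm(0.975)    # both TRUE: significant at the 95% level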