Question 1: We perform best subset, forward stepwise, and backward stepwise selection on a single data set. For each approach, we obtain p + 1 models, containing 0, 1, 2, ..., p predictors. Explain your answers:

(c) True or False:

i. The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in the (k+1)-variable model identified by forward stepwise selection.

True. Forward stepwise selection builds up models by sequentially adding predictors. Thus, the (k+1)-variable model must contain all predictors from the k-variable model plus one additional predictor.
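For illustration, here is a minimal Python sketch of forward stepwise selection on simulated data; the helper names (rss, forward_stepwise) and the data are made up for this example and are not taken from any particular library. Because each step only adds one predictor to the previous set, the printed k-variable sets are nested by construction.

```python
# Minimal sketch of forward stepwise selection; helper names and data are illustrative.
import numpy as np

def rss(X, y, cols):
    """Training RSS of an OLS fit (with intercept) on the given columns."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.sum((y - Z @ beta) ** 2)

def forward_stepwise(X, y):
    """Return the selected-column sets for k = 0, 1, ..., p."""
    p = X.shape[1]
    selected, models = [], [()]
    for _ in range(p):
        remaining = [j for j in range(p) if j not in selected]
        # Add the single predictor whose inclusion most reduces training RSS.
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        models.append(tuple(selected))
    return models

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(size=100)
for k, cols in enumerate(forward_stepwise(X, y)):
    print(k, sorted(cols))   # each printed set contains all of the previous one
```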

ii. The predictors in the k-variable model identified by backward stepwise are a subset of the predictors in the (k+1)-variable model identified by backward stepwise selection.

True. Backward stepwise selection starts with all p predictors and removes one predictor at a time. The k-variable model is obtained from the (k+1)-variable model by deleting a single predictor, so its predictors are necessarily a subset of those in the (k+1)-variable model.
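An analogous sketch for backward elimination: starting from all p predictors and deleting one at a time, the k-variable set is always obtained from the (k+1)-variable set by removing a single column, so the nesting claimed above holds by construction. Helper names and data are again illustrative.

```python
# Minimal sketch of backward stepwise selection; helper names and data are illustrative.
import numpy as np

def rss(X, y, cols):
    """Training RSS of an OLS fit (with intercept) on the given columns."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.sum((y - Z @ beta) ** 2)

def backward_stepwise(X, y):
    """Return the selected-column sets for k = p, p - 1, ..., 0."""
    selected = list(range(X.shape[1]))
    models = [tuple(selected)]
    while selected:
        # Drop the single predictor whose removal increases training RSS the least.
        worst = min(selected, key=lambda j: rss(X, y, [c for c in selected if c != j]))
        selected.remove(worst)
        models.append(tuple(selected))
    return models

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(size=100)
for cols in backward_stepwise(X, y):
    print(len(cols), sorted(cols))  # each printed set is contained in the one before it
```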

iii. The predictors in the k-variable model identified by backward stepwise are a subset of the predictors in the (k+1)-variable model identified by forward stepwise selection.

False. Forward and backward stepwise selection are different greedy procedures, and their selected models are not necessarily nested across methods.

iv. The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in the (k+1)-variable model identified by backward stepwise selection.

False. Again, models from forward and backward selection are not necessarily nested in each other.

v. The predictors in the k-variable model identified by best subset are a subset of the predictors in the (k+1)-variable model identified by best subset selection.

False. Best subset selection evaluates all possible combinations separately for each model size, so the best k-variable model need not be nested within the best (k+1)-variable model.
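For contrast, a minimal best subset sketch using itertools.combinations: for each size k the procedure searches all C(p, k) subsets independently, so nothing forces the size-k winner to lie inside the size-(k+1) winner (whether it does depends on the particular data set). Names and data are illustrative.

```python
# Minimal sketch of best subset selection; helper names and data are illustrative.
from itertools import combinations
import numpy as np

def rss(X, y, cols):
    """Training RSS of an OLS fit (with intercept) on the given columns."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.sum((y - Z @ beta) ** 2)

def best_subset(X, y):
    """For each size k, exhaustively search all C(p, k) models and keep the best."""
    p = X.shape[1]
    return [min(combinations(range(p), k), key=lambda cols: rss(X, y, cols))
            for k in range(p + 1)]

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))
y = X @ rng.normal(size=6) + rng.normal(size=60)
for k, cols in enumerate(best_subset(X, y)):
    print(k, cols)  # the size-k winner need not be contained in the size-(k+1) winner
```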

Question 2: For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is:

  i. More flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.
  iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.
  iv. Less flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.


Correct Answer: iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

Explanation: The lasso is less flexible than least squares because it constrains the sum of the absolute values of the coefficients. This increases bias but decreases variance. Improved prediction accuracy occurs when the increase in bias is less than the decrease in variance.
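As a hedged illustration of this tradeoff, the sketch below compares ordinary least squares with the lasso (scikit-learn's Lasso, whose alpha argument is the penalty weight) on simulated data in which only a few of many predictors matter; the data and the chosen alpha value are purely illustrative.

```python
# Hedged sketch: lasso vs. ordinary least squares on data with many weak predictors.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, p = 80, 40                                      # few observations relative to predictors
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = [4.0, -3.0, 2.0]    # only 3 predictors truly matter
y = X @ beta + rng.normal(size=n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

ols = LinearRegression().fit(X_tr, y_tr)
lasso = Lasso(alpha=0.5).fit(X_tr, y_tr)           # alpha is an illustrative penalty weight

print("OLS   test MSE:", mean_squared_error(y_te, ols.predict(X_te)))
print("Lasso test MSE:", mean_squared_error(y_te, lasso.predict(X_te)))
print("Nonzero lasso coefficients:", np.sum(lasso.coef_ != 0))
# The lasso's shrinkage adds some bias but cuts variance; on sparse data like this,
# the variance reduction often more than pays for the bias, so test MSE often drops.
```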

(b) Repeat (a) for ridge regression relative to least squares.

Correct Answer: iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is less than its decrease in variance.

Explanation: Like the lasso, ridge regression adds a penalty term to the least squares criterion (here, on the sum of the squared coefficients) and is therefore less flexible than least squares. The same bias-variance reasoning applies.
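A small sketch of this reduced flexibility, using scikit-learn's Ridge: as the penalty weight alpha grows, the fitted coefficients are shrunk toward zero. The data and the alpha grid are illustrative.

```python
# Hedged sketch: ridge coefficient shrinkage as the penalty weight alpha grows.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(size=100)

for alpha in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    # The norm of the fitted coefficient vector shrinks as alpha increases,
    # i.e. the fit becomes less flexible (higher bias, lower variance).
    print(f"alpha={alpha:7.1f}  ||beta||_2 = {np.linalg.norm(coef):.3f}")
```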

(c) Repeat (a) for non-linear methods relative to least squares.

Correct Answer: ii. More flexible and hence will give improved prediction accuracy when its increase in variance is less than its decrease in bias.

Explanation: Non-linear methods are more flexible than least squares; they typically reduce bias at the cost of increased variance. Prediction accuracy improves when the decrease in bias outweighs the increase in variance, which is statement ii.
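The sketch below gives one hedged illustration: on data with a genuinely non-linear signal, polynomial fits of increasing degree (a simple stand-in for "non-linear methods") reduce bias, and test error improves as long as the accompanying increase in variance stays smaller. The data and degree grid are illustrative.

```python
# Hedged sketch: a more flexible (polynomial) fit vs. plain least squares
# on data whose true relationship is non-linear.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=200)     # truly non-linear signal
x_tr, x_te, y_tr, y_te = x[:100], x[100:], y[:100], y[100:]

for degree in [1, 3, 9, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(x_tr, y_tr)
    print(f"degree {degree:2d}  test MSE = {mean_squared_error(y_te, model.predict(x_te)):.3f}")
# Moving from degree 1 to a moderate degree usually lowers test MSE (bias falls faster
# than variance rises); at very high degrees the variance increase can take over.
```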

Question 3: Suppose we estimate the regression coefficients in a linear regression model by minimizing the residual sum of squares subject to the constraint that the sum of the absolute values of the coefficients is at most s (the lasso in its constraint, or budget, form), for a particular value of s. For parts (a) through (e), indicate which of i. through v. is correct. Justify your answer.
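Since the explanations below refer to a budget s on the sum of the absolute values of the coefficients, the criterion being minimized is taken to be the lasso in its constraint form:

```latex
\underset{\beta_0,\,\beta}{\text{minimize}}
  \;\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^{2}
\quad\text{subject to}\quad
  \sum_{j=1}^{p}\lvert\beta_j\rvert \le s .
```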

(a) As we increase s from 0, the training RSS will:

  i. Increase initially, and then eventually start decreasing in an inverted U shape.
  ii. Decrease initially, and then eventually start increasing in a U shape.
  iii. Steadily increase.
  iv. Steadily decrease.
  v. Remain constant.

Correct Answer: iv. Steadily decrease.

Explanation: At s = 0 all coefficients are forced to zero (the null model). As s increases, the constraint on the sum of the absolute values of the coefficients is relaxed, the fit becomes more flexible, and eventually the least squares solution is recovered; training RSS therefore decreases steadily.

(b) Repeat (a) for test RSS.

Correct Answer: ii. Decrease initially, and then eventually start increasing in a U shape.

Explanation: Initially, increasing s improves the fit and reduces bias, lowering test RSS. Beyond a point, increased model complexity leads to over-fitting and increased variance, making test RSS increase again. This forms a U-shaped curve.
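A simulation sketch covering parts (a) and (b): a larger budget s corresponds to a weaker penalty, so decreasing scikit-learn's alpha traces out the same path as increasing s. The data, the alpha grid, and the pattern noted in the comments are illustrative.

```python
# Hedged sketch for parts (a) and (b): training vs. test RSS along the lasso path.
# A larger budget s corresponds to a weaker penalty, so decreasing alpha below
# plays the role of increasing s.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 30
X = rng.normal(size=(2 * n, p))
beta = np.zeros(p); beta[:5] = rng.normal(scale=2.0, size=5)   # sparse true signal
y = X @ beta + rng.normal(size=2 * n)
X_tr, y_tr, X_te, y_te = X[:n], y[:n], X[n:], y[n:]

for alpha in [10.0, 3.0, 1.0, 0.3, 0.1, 0.03, 0.01]:           # decreasing alpha ~ increasing s
    fit = Lasso(alpha=alpha, max_iter=50_000).fit(X_tr, y_tr)
    train_rss = np.sum((y_tr - fit.predict(X_tr)) ** 2)
    test_rss = np.sum((y_te - fit.predict(X_te)) ** 2)
    print(f"alpha={alpha:6.2f}  train RSS={train_rss:9.1f}  test RSS={test_rss:9.1f}")
# Typical pattern: train RSS decreases steadily; test RSS usually falls at first
# and then flattens or rises again as the fit starts to overfit.
```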

(c) Repeat (a) for variance.

Correct Answer: iii. Steadily increase.

Explanation: Increasing s makes the model more flexible, and a more flexible fit tracks the particular training sample more closely, so the variance steadily increases.

(d) Repeat (a) for (squared) bias.

Correct Answer: iv. Steadily decrease.

Explanation: Increasing s relaxes the constraint, so the coefficient estimates can move closer to the least squares values and the model can better capture the true relationship; squared bias therefore decreases steadily.
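A simulation sketch covering parts (c) and (d): regenerate the training set many times, refit the lasso at each penalty level, and measure the spread (variance) and the systematic error (squared bias) of the prediction at one fixed test point. All names and values are illustrative.

```python
# Hedged sketch for parts (c) and (d): Monte Carlo estimate of variance and squared
# bias of the lasso prediction at one fixed test point, as the penalty weakens.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p = 20
beta = np.zeros(p); beta[:4] = [3.0, -2.0, 1.5, 1.0]
x0 = rng.normal(size=p)                          # fixed test point
f0 = x0 @ beta                                   # true mean response at x0

for alpha in [3.0, 1.0, 0.3, 0.1, 0.03]:         # decreasing alpha ~ increasing s
    preds = []
    for _ in range(200):                         # fresh training set on every repetition
        X = rng.normal(size=(50, p))
        y = X @ beta + rng.normal(size=50)
        preds.append(Lasso(alpha=alpha, max_iter=50_000).fit(X, y).predict(x0[None, :])[0])
    preds = np.array(preds)
    print(f"alpha={alpha:5.2f}  variance={preds.var():.3f}  sq. bias={(preds.mean() - f0) ** 2:.3f}")
# Expected pattern: as alpha falls (s grows), variance increases and squared bias decreases.
```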

(e) Repeat (a) for the irreducible error.

Correct Answer: v. Remain constant.

Explanation: Irreducible error is due to inherent noise in the data and does not depend on the model. It remains constant regardless of s.