For each of parts (a) through (d), indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method. Justify your answer.
We now revisit the bias-variance decomposition.
Provide a sketch of typical (squared) bias, variance, training mean squared error, test mean squared error, and Bayes (or irreducible) error rate curves, on a single plot, as we go from less flexible statistical learning methods towards more flexible approaches. The \(x\)-axis should represent the amount of flexibility in the method, and the \(y\)-axis should represent the values for each curve. There should be five curves. Make sure to label each one.
Explain why each of the five curves has the shape displayed in part (a).
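For reference, the shapes follow from the standard decomposition of the expected test MSE at a point \(x_0\). This assumes \(Y = f(X) + \varepsilon\), where \(\varepsilon\) is mean-zero noise independent of the training data:

\[
\mathbb{E}\big[(y_0 - \hat f(x_0))^2\big] = \operatorname{Var}\big(\hat f(x_0)\big) + \big[\operatorname{Bias}\big(\hat f(x_0)\big)\big]^2 + \operatorname{Var}(\varepsilon)
\]

The term \(\operatorname{Var}(\varepsilon)\) is the irreducible (Bayes) error. It does not depend on the method, so it plots as a horizontal line, and the expected test MSE can never fall below it. As flexibility increases, squared bias typically falls and variance typically rises, and their sum gives the test MSE its U shape.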

The table below provides a training data set containing six observations, three predictors, and one qualitative response variable.
| Obs. | \(X_1\) | \(X_2\) | \(X_3\) | \(Y\) |
|---|---|---|---|---|
| 1 | 0 | 3 | 0 | Red |
| 2 | 2 | 0 | 0 | Red |
| 3 | 0 | 1 | 3 | Red |
| 4 | 0 | 1 | 2 | Green |
| 5 | -1 | 0 | 1 | Green |
| 6 | 1 | 1 | 1 | Red |
Suppose we wish to use this data set to make a prediction for \(Y\) when \(X_1 = X_2 = X_3 = 0\) using K-nearest neighbors.
(Note: the Euclidean distance of two vectors \(a = (a_1,a_2,a_3)\) and \(b = (b_1,b_2,b_3)\) is given by \(d(a,b) = \sqrt{(a_1-b_1)^2 + (a_2-b_2)^2 + (a_3-b_3)^2}\). The same idea extends to vectors with \(n\) coordinates.)
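As a worked instance of this formula, the distance from the observation \((X_1, X_2, X_3) = (0, 1, 3)\) to the test point \((0, 0, 0)\) is

\[
d\big((0,1,3),(0,0,0)\big) = \sqrt{(0-0)^2 + (1-0)^2 + (3-0)^2} = \sqrt{10} \approx 3.16.
\]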
| Obs. | \(X_1\) | \(X_2\) | \(X_3\) | \(d\big(\text{obs}, (0,0,0)\big)\) |
|---|---|---|---|---|
| 1 | 0 | 3 | 0 | |
| 2 | 2 | 0 | 0 | |
| 3 | 0 | 1 | 3 | |
| 4 | 0 | 1 | 2 | |
| 5 | -1 | 0 | 1 | |
| 6 | 1 | 1 | 1 | |
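The following is a minimal pure-Python sketch of the computation this exercise asks for. It fills in the distance column and applies majority-vote K-nearest neighbors for \(K = 1\) and \(K = 3\). The data in `TRAIN` are transcribed from the training table, with observations numbered in row order. The helper names `euclidean` and `knn_predict` are our own illustrative choices, not a library API. Ties in the vote are not handled specially.

```python
import math
from collections import Counter

# Training data transcribed from the table: ((X1, X2, X3), Y), in row order.
TRAIN = [
    ((0, 3, 0), "Red"),
    ((2, 0, 0), "Red"),
    ((0, 1, 3), "Red"),
    ((0, 1, 2), "Green"),
    ((-1, 0, 1), "Green"),
    ((1, 1, 1), "Red"),
]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train, query, k):
    """Majority vote among the k training observations closest to `query`."""
    # Rank observations by their distance to the query point.
    ranked = sorted(train, key=lambda obs: euclidean(obs[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # most_common(1) returns [(label, count)]; ties are not broken specially.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    origin = (0, 0, 0)
    for i, (x, y) in enumerate(TRAIN, start=1):
        print(f"obs {i}: d = {euclidean(x, origin):.3f}  ({y})")
    print("K=1:", knn_predict(TRAIN, origin, 1))
    print("K=3:", knn_predict(TRAIN, origin, 3))
```

Running it prints the distances 3.000, 2.000, 3.162, 2.236, 1.414, and 1.732 for observations 1 to 6. With \(K = 1\), the single nearest neighbor is observation 5, so the prediction is Green. With \(K = 3\), the three nearest neighbors are observations 5, 6, and 2 (Green, Red, Red), so the prediction is Red.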
What are the advantages and disadvantages of a very flexible (versus a less flexible) approach for regression or classification? Under what circumstances might a more flexible approach be preferred to a less flexible approach? When might a less flexible approach be preferred?
The advantage of a very flexible model is that it can fit a much wider
range of possible shapes for \(f\), which reduces bias, especially when
the true relationship is highly nonlinear. The disadvantages are that it
requires estimating more parameters, tends to overfit the training data
(following the noise too closely), has higher variance, and is harder to
interpret.
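A more flexible approach is preferred when the goal is prediction rather
than interpretation, particularly when the true relationship is highly
nonlinear and there are enough observations to keep the variance under
control. A less flexible approach is preferred when the goal is inference
or interpretation, or when the data are so few or so noisy that a
flexible fit would mostly chase the noise.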
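A small simulation can make the overfitting trade-off concrete. The sketch below is an illustrative setup of our own, not part of the exercise. It uses one-dimensional K-nearest-neighbors regression, where small \(K\) is very flexible and \(K = n\) reduces to predicting the overall mean. It assumes a true function \(f(x) = \sin x\) plus Gaussian noise with standard deviation 0.3. All function names are hypothetical helpers.

```python
import math
import random


def knn_regress(xs, ys, x0, k):
    """Predict at x0 by averaging the responses of the k nearest training points (1-D)."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k


def mse(x_eval, y_eval, x_train, y_train, k):
    """Mean squared error over an evaluation set of K-NN regression fit on the training set."""
    preds = [knn_regress(x_train, y_train, x, k) for x in x_eval]
    return sum((y - p) ** 2 for y, p in zip(y_eval, preds)) / len(y_eval)


def simulate(n=100, noise_sd=0.3, seed=1):
    """Return {k: (train MSE, test MSE)} for a few values of k.

    Assumed setup (illustrative only): X ~ Uniform(0, 2*pi), Y = sin(X) + N(0, noise_sd^2).
    """
    rng = random.Random(seed)

    def draw(m):
        xs = [rng.uniform(0, 2 * math.pi) for _ in range(m)]
        ys = [math.sin(x) + rng.gauss(0, noise_sd) for x in xs]
        return xs, ys

    x_tr, y_tr = draw(n)
    x_te, y_te = draw(n)  # independent test set from the same distribution
    results = {}
    for k in (1, 5, 20, n):  # small k = very flexible; k = n predicts the overall mean
        results[k] = (mse(x_tr, y_tr, x_tr, y_tr, k), mse(x_te, y_te, x_tr, y_tr, k))
    return results


if __name__ == "__main__":
    for k, (train_mse, test_mse) in simulate().items():
        print(f"k={k:3d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

With \(K = 1\), the training MSE is exactly zero because each training point is its own nearest neighbor. The test MSE, however, stays positive and is typically larger than the noise variance, because the fit has absorbed the noise in the training set.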
Describe the differences between a parametric and a non-parametric statistical learning approach. What are the advantages of a parametric approach to regression or classification (as opposed to a non-parametric approach)? What are its disadvantages?
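Parametric methods make an explicit assumption about the functional form
of \(f\) (for example, that it is linear), which reduces the problem of
estimating \(f\) to estimating a fixed set of parameters. Nonparametric
methods make no explicit assumption about the form of \(f\); instead,
they seek an estimate that gets close to the data points without being
too rough or wiggly.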
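The main advantage of parametric methods is that estimating a handful of
parameters is much easier than estimating an arbitrary function, so they
need far fewer observations to obtain a reasonable estimate of \(f\), and
the result is usually easier to interpret.
The disadvantage is that the assumed form of \(f\) may not match the true
\(f\). If the chosen form is far from the truth, the model will be badly
biased and its estimates will be poor, and collecting more data does not
fix this.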
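To illustrate both sides of this trade-off, here is a minimal sketch of one parametric method: simple linear regression fit by ordinary least squares (our choice of example). Assuming \(f(X) = \beta_0 + \beta_1 X\) reduces the problem to estimating two numbers. Three noise-free points from a line are therefore enough to recover it exactly. When the assumed form is wrong, no straight line can capture the curvature. Here the true function is \(f(x) = x^2\), an illustrative choice, and OLS returns a flat line for the symmetric inputs used. The function name `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sample covariance of (x, y) divided by sample variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


if __name__ == "__main__":
    # Correct form: three points from y = 1 + 2x are enough to recover both parameters.
    print(fit_line([0, 1, 2], [1, 3, 5]))  # (1.0, 2.0)
    # Wrong form: the true f is x^2, but the fitted line is flat at y = 2.
    xs = [-2, -1, 0, 1, 2]
    print(fit_line(xs, [x ** 2 for x in xs]))  # (2.0, 0.0)
```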