5.)

a.) Since QDA is more flexible, it will often achieve lower training error than LDA simply because it has more parameters and can fit the training data more closely. When the true boundary is linear, however, that extra flexibility buys nothing: QDA will often achieve worse test performance than LDA due to unnecessary variance. Thus, if the true boundary is linear, we expect QDA to outperform LDA on the training set but LDA to outperform QDA on the test set (a small simulation after part d illustrates this).

b.) If the Bayes decision boundary is nonlinear, we expect QDA to perform better on both the training set and the test set: its quadratic boundary can capture the nonlinearity, so the extra flexibility reduces bias rather than merely adding variance.

c.) QDA has more parameters to estimate and therefore higher variance when n is small. As n grows, we can estimate those extra parameters more reliably. In general, as n becomes large, QDA’s extra flexibility becomes less of a drawback (variance goes down), so if the true boundary is at all non‐linear, QDA tends to gain an advantage over LDA. Conversely, if the boundary is truly linear, LDA remains at least as good as QDA (often better).

d.) False. If the boundary is genuinely linear, then LDA already has the correct parametric form. QDA's additional flexibility will typically increase variance without a compensating reduction in bias and can hurt test-set performance. Hence QDA would not automatically achieve lower test error in the truly linear case.
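
As a quick sanity check on parts a, c, and d, here is a small simulation sketch (assuming Python with NumPy and scikit-learn; the two-Gaussian setup with a shared covariance matrix is my own illustrative choice, picked so that the Bayes boundary is linear). QDA will typically match or beat LDA on the training data while doing slightly worse on the large test set:

    import numpy as np
    from sklearn.discriminant_analysis import (
        LinearDiscriminantAnalysis,
        QuadraticDiscriminantAnalysis,
    )

    rng = np.random.default_rng(0)

    def make_data(n):
        # Two Gaussian classes sharing one covariance matrix -> linear Bayes boundary.
        y = rng.integers(0, 2, size=n)
        means = np.array([[0.0, 0.0], [1.5, 1.5]])
        X = means[y] + rng.normal(scale=1.0, size=(n, 2))
        return X, y

    X_train, y_train = make_data(50)      # small n: QDA's extra variance hurts most here
    X_test, y_test = make_data(10_000)    # large held-out set approximates true test error

    for name, model in [("LDA", LinearDiscriminantAnalysis()),
                        ("QDA", QuadraticDiscriminantAnalysis())]:
        model.fit(X_train, y_train)
        train_err = 1 - model.score(X_train, y_train)
        test_err = 1 - model.score(X_test, y_test)
        print(f"{name}: train error = {train_err:.3f}, test error = {test_err:.3f}")

Increasing the training-set size in the sketch (part c) shrinks the gap, since QDA's extra covariance parameters are then estimated more reliably.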

6.)

a.) 38%

b.) 50 hours: the predicted probability is exactly 50% when the logit is zero, which works out to 50 hours of study for this student (see the check below).
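
For reference, a quick check of both answers, assuming the coefficient values from the textbook's fitted model (intercept -6, 0.05 per hour studied, and 1 per GPA point); these numbers come from the exercise statement, not from this write-up:

    import numpy as np

    # Coefficients assumed from the exercise's fitted logistic regression.
    b0, b_hours, b_gpa = -6.0, 0.05, 1.0

    def p_A(hours, gpa):
        # P(getting an A) under the logistic model.
        z = b0 + b_hours * hours + b_gpa * gpa
        return 1 / (1 + np.exp(-z))

    print(p_A(40, 3.5))        # ~0.3775, i.e. about 38%  (part a)

    # Part b: the probability is 50% exactly when the logit is zero.
    hours_needed = (0 - b0 - b_gpa * 3.5) / b_hours
    print(hours_needed)        # 50.0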

8.)

Our two classifiers:

  1. Logistic Regression

    • Training error: 20%

    • Test error: 30%

  2. 1-Nearest Neighbor (K = 1)

    • Average error (averaged over the training and test sets): 18%

Because a 1-nearest-neighbor classifier predicts each training point from itself, its training error is 0% (barring duplicate points with different labels). An average error of 18% over the training and test sets therefore implies a test error of about 2 × 18% = 36%. In contrast, logistic regression's test error is 30%. For new observations, test error is what matters, and since 30% is lower than 36%, logistic regression is the better choice for classifying new data (a short check of the 0% training-error claim follows).
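
A minimal sketch of that key step (assuming scikit-learn; the data here are arbitrary random points, used only to show the memorization effect):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = rng.integers(0, 2, size=200)       # even random labels get memorized

    knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    print(1 - knn.score(X, y))             # training error: 0.0

    # With (train + test) / 2 = 0.18 and train = 0.0, the test error is:
    print(2 * 0.18 - 0.0)                  # 0.36 > 0.30 for logistic regression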

9.)

a.) 27%

b.) 0.19
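
Both parts are direct applications of the identity odds = p / (1 - p). A tiny conversion sketch, assuming the exercise's figures (odds of 0.37 in part a and a default probability of 0.16 in part b):

    def odds_to_prob(odds):
        # p = odds / (1 + odds)
        return odds / (1 + odds)

    def prob_to_odds(p):
        # odds = p / (1 - p)
        return p / (1 - p)

    print(odds_to_prob(0.37))   # ~0.27: about 27% of such people default (part a)
    print(prob_to_odds(0.16))   # ~0.19 (part b)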