5.)

a.) Since QDA is more flexible, it will often achieve lower training error than LDA simply because it has more parameters and can fit the training data more closely. When the true boundary is linear, however, that extra flexibility buys nothing: QDA will often achieve worse test performance than LDA due to unnecessary variance. Thus, if the true boundary is linear, we expect QDA to outperform LDA on the training set but LDA to outperform QDA on the test set (a small simulation after part d illustrates this).

b.) If the Bayes decision boundary is nonlinear, we expect QDA to perform better on both the training set and the test set: its quadratic boundary can capture the nonlinearity, so the extra flexibility reduces bias rather than merely adding variance.

c.) QDA has more parameters to estimate and therefore higher variance when n is small. As n grows, we can estimate those extra parameters more reliably. In general, as n becomes large, QDA’s extra flexibility becomes less of a drawback (variance goes down), so if the true boundary is at all non‐linear, QDA tends to gain an advantage over LDA. Conversely, if the boundary is truly linear, LDA remains at least as good as QDA (often better).

d.) False. If the boundary is genuinely linear, then LDA already has the correct parametric form. QDA's additional flexibility will typically increase variance without a compensating reduction in bias and can hurt test-set performance. Hence QDA would not automatically achieve lower test error in the truly linear case.
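
As a quick sanity check on parts a, c, and d, here is a small simulation sketch (assuming Python with NumPy and scikit-learn; the two-Gaussian setup with a shared covariance matrix is my own illustrative choice, picked so that the Bayes boundary is linear). QDA will typically match or beat LDA on the training data while doing slightly worse on the large test set:

    import numpy as np
    from sklearn.discriminant_analysis import (
        LinearDiscriminantAnalysis,
        QuadraticDiscriminantAnalysis,
    )

    rng = np.random.default_rng(0)

    def make_data(n):
        # Two Gaussian classes sharing one covariance matrix -> linear Bayes boundary.
        y = rng.integers(0, 2, size=n)
        means = np.array([[0.0, 0.0], [1.5, 1.5]])
        X = means[y] + rng.normal(scale=1.0, size=(n, 2))
        return X, y

    X_train, y_train = make_data(50)      # small n: QDA's extra variance hurts most here
    X_test, y_test = make_data(10_000)    # large held-out set approximates true test error

    for name, model in [("LDA", LinearDiscriminantAnalysis()),
                        ("QDA", QuadraticDiscriminantAnalysis())]:
        model.fit(X_train, y_train)
        train_err = 1 - model.score(X_train, y_train)
        test_err = 1 - model.score(X_test, y_test)
        print(f"{name}: train error = {train_err:.3f}, test error = {test_err:.3f}")

Increasing the training-set size in the sketch (part c) shrinks the gap, since QDA's extra covariance parameters are then estimated more reliably.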

6.)

a.) 38%

b.) 50 hours: the predicted probability is exactly 50% when the logit is zero, which works out to 50 hours of study for this student (see the check below).
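
For reference, a quick check of both answers, assuming the coefficient values from the textbook's fitted model (intercept -6, 0.05 per hour studied, and 1 per GPA point); these numbers come from the exercise statement, not from this write-up:

    import numpy as np

    # Coefficients assumed from the exercise's fitted logistic regression.
    b0, b_hours, b_gpa = -6.0, 0.05, 1.0

    def p_A(hours, gpa):
        # P(getting an A) under the logistic model.
        z = b0 + b_hours * hours + b_gpa * gpa
        return 1 / (1 + np.exp(-z))

    print(p_A(40, 3.5))        # ~0.3775, i.e. about 38%  (part a)

    # Part b: the probability is 50% exactly when the logit is zero.
    hours_needed = (0 - b0 - b_gpa * 3.5) / b_hours
    print(hours_needed)        # 50.0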

8.)

Our two classifiers:

  1. Logistic Regression

    • Training error: 20%

    • Test error: 30%

  2. 1-Nearest Neighbor (K = 1)

    • Average error (averaged over the training and test sets): 18%

Because a 1-nearest-neighbor classifier predicts each training point from itself, its training error is 0% (barring duplicate points with different labels). An average error of 18% over the training and test sets therefore implies a test error of about 2 × 18% = 36%. In contrast, logistic regression's test error is 30%. For new observations, test error is what matters, and since 30% is lower than 36%, logistic regression is the better choice for classifying new data (a short check of the 0% training-error claim follows).
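
A minimal sketch of that key step (assuming scikit-learn; the data here are arbitrary random points, used only to show the memorization effect):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = rng.integers(0, 2, size=200)       # even random labels get memorized

    knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    print(1 - knn.score(X, y))             # training error: 0.0

    # With (train + test) / 2 = 0.18 and train = 0.0, the test error is:
    print(2 * 0.18 - 0.0)                  # 0.36 > 0.30 for logistic regression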

9.)

a.) 27%

b.) 0.19
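
Both parts are direct applications of the identity odds = p / (1 - p). A tiny conversion sketch, assuming the exercise's figures (odds of 0.37 in part a and a default probability of 0.16 in part b):

    def odds_to_prob(odds):
        # p = odds / (1 + odds)
        return odds / (1 + odds)

    def prob_to_odds(p):
        # odds = p / (1 - p)
        return p / (1 - p)

    print(odds_to_prob(0.37))   # ~0.27: about 27% of such people default (part a)
    print(prob_to_odds(0.16))   # ~0.19 (part b)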