Question 1

  1. better: with many observations, a flexible method can fit the data more closely without overfitting;

  2. worse: a flexible method tends to overfit when there are few observations;

  3. better: a flexible method is better suited to non-linear relationships;

  4. worse: when the variance of the error terms is high, a flexible method fits the noise and overfits;

Question 2

  1. Regression, inference, n=500, p=3;

  2. Classification, prediction, n=20, p=13;

  3. Regression, prediction, n=52 (the number of weeks in 2012), p=3;

Question 3

3a

See Fig. 2.9 for the pattern of training MSE and test MSE as flexibility increases. See Fig. 2.12 for the pattern of variance, bias, and test MSE as flexibility increases.
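The shape of the two MSE curves can also be reproduced with a small simulation. The following is only a minimal sketch under stated assumptions: the true function `true_f`, the noise level, the sample sizes, and the use of polynomial degree as the measure of flexibility are all hypothetical choices made for illustration and are not taken from the figures.

```python
import numpy as np


def true_f(x):
    # Hypothetical non-linear truth, chosen only for illustration.
    return np.sin(2 * np.pi * x)


def mse_by_degree(degrees, n_train=50, n_test=1000, noise_sd=0.3, seed=0):
    """Fit least-squares polynomials of each degree; return (train MSEs, test MSEs)."""
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0.0, 1.0, n_train)
    y_train = true_f(x_train) + rng.normal(0.0, noise_sd, n_train)
    x_test = rng.uniform(0.0, 1.0, n_test)
    y_test = true_f(x_test) + rng.normal(0.0, noise_sd, n_test)

    train_mse, test_mse = [], []
    for d in degrees:
        # Higher degree means a more flexible model.
        coefs = np.polyfit(x_train, y_train, deg=d)
        train_mse.append(float(np.mean((np.polyval(coefs, x_train) - y_train) ** 2)))
        test_mse.append(float(np.mean((np.polyval(coefs, x_test) - y_test) ** 2)))
    return train_mse, test_mse


degrees = list(range(1, 9))
train_mse, test_mse = mse_by_degree(degrees)
for d, tr, te in zip(degrees, train_mse, test_mse):
    print(f"degree={d}  train MSE={tr:.4f}  test MSE={te:.4f}")
```

Because the polynomial models are nested, the training MSE cannot increase with the degree. The test MSE depends on the random draw, so its exact shape will vary from seed to seed.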

3b

Squared bias decreases monotonically because increases in flexibility yield a closer fit.

Variance increases monotonically because increases in flexibility make the fit more sensitive to the particular training set (overfitting).

Training error decreases monotonically because increases in flexibility yield a closer fit.

Test error follows a U-shaped (concave-up) curve because increasing flexibility yields a closer fit at first, until the model starts to overfit.

The Bayes (irreducible) error does not change with flexibility.

All five curves above are always non-negative (their values never fall below 0).
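These patterns are consistent with the standard bias-variance decomposition of the expected test MSE at a point \(x_0\). The notation here is an assumption, since these notes do not define it: \(\hat f\) is the fitted model and \(\varepsilon\) is the error term.

\[
E\big(y_0 - \hat f(x_0)\big)^2 = \operatorname{Var}\big(\hat f(x_0)\big) + \big[\operatorname{Bias}\big(\hat f(x_0)\big)\big]^2 + \operatorname{Var}(\varepsilon)
\]

As flexibility grows, the variance term rises while the squared bias falls, and \(\operatorname{Var}(\varepsilon)\) stays constant, so the test error curve lies above the irreducible error and is typically U-shaped.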

Question 5

For prediction, use a flexible method, provided there are enough observations to avoid overfitting.

For inference, use a less flexible approach, which is easier to interpret, although its bias may be high.

Question 6

A parametric approach assumes a functional form for \(f\), which reduces the problem of estimating \(f\) down to estimating a set of parameters.

A non-parametric approach does not assume a functional form for \(f\), and hence requires a very large number of observations to estimate it accurately.

The advantages of parametric methods are that modeling \(f\) is simplified and that fewer observations are required than for non-parametric methods.

The disadvantages of parametric methods are the potential to estimate \(f\) inaccurately if the assumed form of \(f\) is wrong, and the risk of overfitting the observations if a more flexible model is used.

Question 7

  1. \(3, 2, \sqrt{10}, \sqrt5, \sqrt2, \sqrt3\)

  2. Green, because the smallest distance in part 1 above is \(\sqrt2\), and that observation is Green

  3. Red, because the three nearest neighbors are 2 Red (at Euclidean distances 2 and \(\sqrt3\)) and 1 Green (at distance \(\sqrt2\))

  4. Small, because a small K gives a more flexible decision boundary, which suits highly non-linear problems.
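The distances and both K-nearest-neighbor predictions can be checked with a short script. This is a minimal sketch: the six training observations are an assumption, copied from the exercise's data table since they are not reproduced in these notes, and `knn_predict` is a hypothetical helper written for this illustration.

```python
import math
from collections import Counter

# Assumed training data from the exercise's table: ((X1, X2, X3), label).
observations = [
    ((0, 3, 0), "Red"),
    ((2, 0, 0), "Red"),
    ((0, 1, 3), "Red"),
    ((0, 1, 2), "Green"),
    ((-1, 0, 1), "Green"),
    ((1, 1, 1), "Red"),
]


def knn_predict(query, data, k):
    """Return the majority label among the k observations closest to query."""
    ranked = sorted(data, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


query = (0, 0, 0)

# Squared Euclidean distances, so the values can be compared with the square roots in part 1.
squared_distances = [
    sum((q - x) ** 2 for q, x in zip(query, point)) for point, _ in observations
]
print(squared_distances)

print(knn_predict(query, observations, k=1))
print(knn_predict(query, observations, k=3))
```

Under the assumed data, the squared distances are 9, 4, 10, 5, 2, and 3, which match part 1, and the predictions for K=1 and K=3 agree with parts 2 and 3.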
