7.2 Step Functions
For equation (7.4), note that \(X\) is a column vector (dim(X) = c(n, 1)), because sections 7.1 to 7.6 use only one predictor, that is, \(p = 1\).
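As a small illustration of how a single predictor column turns into several indicator columns, the sketch below builds the step-function design matrix by hand. The data values, the cutpoints and the helper name step_basis are all made up for this example; only the indicator definitions follow (7.4).

```python
# Hypothetical data: a single predictor (p = 1) with n = 6 observations.
x = [18, 25, 33, 41, 58, 67]
cutpoints = [30, 50]  # c_1, c_2, chosen only for illustration


def step_basis(value, cuts):
    """Return [C_1(value), ..., C_K(value)].

    C_0(value) = I(value < c_1) is left out because the intercept
    column already plays that role in the regression.
    """
    k = len(cuts)
    row = []
    for j in range(k):
        lower = cuts[j]
        upper = cuts[j + 1] if j + 1 < k else float("inf")
        row.append(1 if lower <= value < upper else 0)
    return row


# n x (K + 1) design matrix: intercept column followed by K indicator columns.
design = [[1] + step_basis(v, cutpoints) for v in x]

for v, row in zip(x, design):
    print(v, row)
```

Each observation falls in exactly one interval, so at most one indicator per row is 1; a least squares fit on these columns then estimates one constant per interval.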
7.4 Regression Splines
Since each polynomial has four parameters, we are using a total of eight degrees of freedom in fitting this piecewise polynomial model.
This sentence helps in understanding what degrees of freedom means.
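The count can be carried one step further, following the book's own argument in the same section: with one knot (\(K = 1\)) there are two cubic pieces, and the three continuity constraints at the knot each remove one degree of freedom.

\[
\underbrace{2 \times 4}_{\text{two cubic pieces}} = 8,
\qquad
8 - \underbrace{3}_{f,\ f',\ f'' \text{ continuous at the knot}} = 5 = K + 4 \quad (K = 1).
\]

In general a cubic spline with \(K\) knots uses \(K + 4\) degrees of freedom.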
The general definition of a degree-d spline is that it is a piecewise degree-d polynomial, with continuity in derivatives up to degree \(d − 1\) at each knot.
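Definitions:
Degree-\(d\) spline: a piecewise degree-\(d\) polynomial whose derivatives up to order \(d-1\) are continuous at each knot. For example, the last paragraph on page 272 says that for a degree-3 spline (cubic spline) the first and second derivatives must both be continuous at the knots.
Cubic spline: a degree-3 spline, that is, a piecewise cubic function \(f(x)\) whose first and second derivatives are continuous at the knots;
Natural spline: a spline with boundary constraints, that is, \(f(x)\) must be linear where \(X\) is smaller than the smallest knot or larger than the largest knot.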
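The continuity conditions in these definitions can be checked numerically. The sketch below is only an illustration with made-up coefficients: it takes one cubic to the left of a knot, adds a truncated power term \(\beta_4 (x - \xi)^3\) to obtain the cubic to the right, and measures the jump in each derivative at the knot. The helper names poly_derivative and poly_eval are hypothetical.

```python
def poly_derivative(coeffs, order):
    """Differentiate a polynomial [a0, a1, a2, ...] `order` times."""
    for _ in range(order):
        coeffs = [k * c for k, c in enumerate(coeffs)][1:]
    return coeffs


def poly_eval(coeffs, x):
    """Evaluate a0 + a1*x + a2*x^2 + ... at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


knot = 2.0
beta4 = 0.5                                # made-up coefficient of the truncated term
left = [1.0, -0.5, 0.25, 0.1]              # cubic used for x < knot (made-up values)

# (x - 2)^3 = -8 + 12x - 6x^2 + x^3, scaled by beta4
bump = [beta4 * c for c in [-8.0, 12.0, -6.0, 1.0]]
right = [a + b for a, b in zip(left, bump)]  # cubic used for x >= knot

# Jump of the d-th derivative at the knot, for d = 0, 1, 2, 3.
jumps = [
    poly_eval(poly_derivative(right, d), knot)
    - poly_eval(poly_derivative(left, d), knot)
    for d in range(4)
]
print(jumps)
```

The jumps for the function and its first two derivatives are zero up to rounding, while the third derivative jumps by \(6\beta_4\), which is exactly the freedom a cubic spline keeps at each knot.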
7.5 Smoothing Splines
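The smoothing spline criterion (7.11) follows the same idea as ridge regression and the lasso in Chapter 6: each adds to the training RSS a penalty term with a tuning parameter \(\lambda\) in order to avoid overfitting. The difference is that the latter two constrain the size of the coefficients, through the \(\ell_1\) norm (lasso) or the squared \(\ell_2\) norm (ridge) (the book's "budget" metaphor fits well), whereas the smoothing spline constrains the roughness of the curve.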
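Writing the three criteria side by side makes the shared "loss + \(\lambda \times\) penalty" structure visible. The ridge and lasso forms are the standard ones from Chapter 6, written here for \(p\) predictors; the last line is (7.11).

\[
\begin{aligned}
\text{ridge:} \quad & \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\text{lasso:} \quad & \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \\
\text{smoothing spline:} \quad & \sum_{i=1}^{n} \big( y_i - g(x_i) \big)^2 + \lambda \int g''(t)^2 \, dt
\end{aligned}
\]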
\(\int g''(t)^2dt\) is simply a measure of the total change in the function \(g'(t)\) over its entire range.
The larger the value of λ, the smoother g will be.
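To get a feel for what the penalty measures, the following sketch approximates \(\int g''(t)^2 dt\) numerically for three made-up functions on \([0, 1]\). The function name roughness, the finite-difference scheme and the step count are choices made for this illustration, not something taken from the book.

```python
import math


def roughness(g, a, b, steps=2000):
    """Approximate the integral of g''(t)^2 over [a, b].

    g'' is estimated with a central second difference and the integral
    with the midpoint rule, so the result is only an approximation.
    Note that g is evaluated up to half a step outside [a, b].
    """
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        second = (g(t + h) - 2.0 * g(t) + g(t - h)) / (h * h)
        total += second * second * h
    return total


line = roughness(lambda t: 3.0 * t + 1.0, 0.0, 1.0)       # g'' = 0 everywhere
parabola = roughness(lambda t: t * t, 0.0, 1.0)           # g'' = 2 everywhere
wiggly = roughness(lambda t: math.sin(10.0 * t), 0.0, 1.0)

print(line, parabola, wiggly)
```

A straight line has essentially zero penalty, the parabola has penalty close to 4, and the rapidly oscillating sine is penalized far more heavily. This is why a large \(\lambda\) pushes \(g\) toward a straight line.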
The sentence below is where the book states that the natural cubic spline minimizing (7.11) has continuous first and second derivatives:
it is a piecewise cubic polynomial with knots at the unique values of \(x_1, \dots, x_n\) and continuous first and second derivatives at each knot.
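If the first derivative \(g'(t)\) had a jump at a knot, \(g''(t)\) would be unbounded there and the penalty in (7.11) would be infinite. A jump in \(g''(t)\) alone keeps the penalty finite, so continuity of the second derivative is a property of the minimizer rather than a consequence of the penalty being finite.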
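For (7.12), it is tempting to read \(\hat g_\lambda\) as the coefficients of the cubic pieces rather than as a vector of length \(n\). In the book, however, \(\hat g_\lambda\) is the \(n\)-vector of fitted values of the smoothing spline at the training points \(x_1, \dots, x_n\), so each entry \(\hat g_i\) is a fitted value, not a cubic coefficient.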
7.6 Local Regression
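In Algorithm 7.1, the smaller the span \(s\), the less smooth (more wiggly) the fit, which corresponds to a smaller \(\lambda\) in (7.11) of the previous section; the larger \(s\), the smoother the fit, which corresponds to a larger \(\lambda\).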
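The effect of the span can be seen in a minimal sketch of the steps of Algorithm 7.1 for a single predictor. Several things here are assumptions made for the illustration: the function name local_linear_fit, the tricube weight function (the algorithm only requires that the furthest neighbour gets weight zero and the closest the highest weight), and the toy data with a single spike.

```python
import math


def local_linear_fit(xs, ys, x0, s):
    """Local linear regression estimate at x0 with span s (0 < s <= 1).

    Sketch only: degenerate neighbourhoods fall back to a plain average.
    """
    n = len(xs)
    k = max(3, math.ceil(s * n))                     # neighbourhood size
    nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
    radius = max(abs(xs[i] - x0) for i in nearest)   # distance to furthest neighbour
    if radius == 0.0:
        return sum(ys[i] for i in nearest) / len(nearest)
    # Tricube weights (assumed choice): 1 at x0, 0 at the furthest neighbour.
    w = {i: (1.0 - (abs(xs[i] - x0) / radius) ** 3) ** 3 for i in nearest}
    sw = sum(w.values())
    xbar = sum(w[i] * xs[i] for i in nearest) / sw
    ybar = sum(w[i] * ys[i] for i in nearest) / sw
    sxx = sum(w[i] * (xs[i] - xbar) ** 2 for i in nearest)
    if sxx == 0.0:
        return ybar
    sxy = sum(w[i] * (xs[i] - xbar) * (ys[i] - ybar) for i in nearest)
    beta1 = sxy / sxx                                # weighted least squares slope
    beta0 = ybar - beta1 * xbar                      # weighted least squares intercept
    return beta0 + beta1 * x0


# Toy data: flat at 0 with a single spike of height 1 in the middle.
xs = list(range(21))
ys = [1.0 if x == 10 else 0.0 for x in xs]

fit_small_span = local_linear_fit(xs, ys, 10, s=0.2)
fit_large_span = local_linear_fit(xs, ys, 10, s=0.9)
print(fit_small_span, fit_large_span)
```

With the small span the fit at the spike stays much closer to the spike than with the large span, which averages it away. That is the "less smooth" versus "smoother" behaviour described above.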
Local regression is not well suited to settings where the number of features \(p\) is larger than about 3 or 4:
However, local regression can perform poorly if \(p\) is much larger than about 3 or 4 because there will generally be very few training observations close to \(x_0\).
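A back-of-the-envelope calculation shows why neighbours become scarce. Assume, purely for illustration, that the predictors are uniformly distributed on the unit hypercube \([0, 1]^p\) (this assumption is not in the book passage). A sub-cube that contains a fraction \(s\) of the observations must then have edge length \(s^{1/p}\).

```python
def edge_length(s, p):
    """Edge of a sub-cube of [0, 1]^p that holds a fraction s of the volume."""
    return s ** (1.0 / p)


# Edge length needed to capture 10% of uniformly spread data.
edges = {p: edge_length(0.1, p) for p in (1, 2, 3, 4, 10)}

for p, edge in edges.items():
    print(p, round(edge, 3))
```

In one dimension a neighbourhood holding 10% of the data covers 10% of the range, but by \(p = 4\) it has to cover more than half of the range of every predictor, so the "local" fit is no longer local.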
7.7 Generalized Additive Models
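The first six chapters cover parametric methods, including linear regression and classification. Their advantages are low computational cost and good interpretability; their drawback is narrow applicability, since the problem must satisfy both linearity and additivity. The nonparametric methods of Chapter 8 have the opposite profile. The nonlinear regression and classification methods of this chapter sit between the two: they only require additivity, and when additivity fails, interaction terms can be added to compensate: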
For fully general models, we have to look for even more flexible approaches such as random forests and boosting, described in Chapter 8. GAMs provide a useful compromise between linear and fully nonparametric models.
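Comparing (7.17) and (7.18) shows that, in regression or classification, additivity means the estimated response \(\hat y\) (for classification, the log-odds) is obtained by summing some transformation of each original feature \(X_i \,(i \in [1..p])\). Linear and nonlinear models differ only in the form of that transformation:
Linear transformation: multiply \(X_i\) by a real number \(\beta_i\): \(\beta_i X_i \,(i \in [1..p])\);
Nonlinear transformation: apply some function to \(X_i\): \(f_i(X_i) \,(i \in [1..p])\). Linear regression is therefore the special case of this nonlinear form with \(f_i(X_i) = \beta_i X_i\).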
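The relation between the two forms can be made concrete with a few lines of code. In this sketch the function additive_predict, the coefficients and the component functions are all hypothetical; the point is only that the same "sum of per-feature transformations" routine covers both the linear model and a nonlinear additive model.

```python
def additive_predict(beta0, component_functions, x):
    """Additive model: beta0 + f_1(x_1) + ... + f_p(x_p)."""
    return beta0 + sum(f(xj) for f, xj in zip(component_functions, x))


betas = [2.0, -1.0, 0.5]

# Linear case: every component is f_j(x) = beta_j * x.
linear_components = [lambda v, b=b: b * v for b in betas]

# Nonlinear case: arbitrary (made-up) per-feature functions.
nonlinear_components = [
    lambda v: v ** 2,
    abs,
    lambda v: 3.0 if v > 1 else 0.0,   # a step function, as in section 7.2
]

x = [1.0, 2.0, 4.0]
linear_pred = additive_predict(0.5, linear_components, x)
gam_pred = additive_predict(0.5, nonlinear_components, x)
print(linear_pred, gam_pred)
```

Neither version lets one feature's contribution depend on another feature, which is the additivity restriction; capturing that would require an explicit interaction component.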