4.1 An Overview of Classification
The relationship between classification and regression:
… the methods used for classification first predict the probability of each of the categories of a qualitative variable, as the basis for making the classification. In this sense they also behave like regression methods.
What is a Credit Card Balance?
Your credit card balance is the amount of money you owe to your credit card company on your account. It could be a positive number if you owe money, a negative number if you’ve paid more than you owe or zero if you’ve paid off the balance in full.
See “What is a Credit Card Balance?” and “Credit Card Balance” for detailed explanations.
A subset of mtcars (first six rows and first six columns):

| | mpg | cyl | disp | hp | drat | wt |
|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 |
4.4 Linear Discriminant Analysis
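How LDA predicts the class of the response variable \(Y\): suppose \(Y\) can take \(K\) possible classes \(1..K\), and that within each class the features follow a (multivariate) normal distribution (see the first two paragraphs of section 4.4.3). Starting from the known \(Pr(X=x|Y=k_i),\; i \in [1..K]\), Bayes' theorem is used to work backwards to the probability that \(Y = k_i,\; i \in [1..K]\) given \(X = x\); see equation (4.10). The predicted class is the class \(k\) with the largest probability.
When to use it: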
… linear discriminant analysis is popular when we have more than two response classes.
For equation (4.10), see the detailed explanation of Bayes' formula, equation (1.9) on page 13 of “Introduction to Probability Theory” 11th edition by Sheldon M. Ross (IPT). Here \(f_k(x)\) corresponds to \(P(E\vert F_j)\), and \(\pi_k\) corresponds to \(P(F_j)\).
For equation (4.11), see section 2.3.4 of IPT.
See exercise 2 for the proof that the class \(k\) maximizing equation (4.13) is also the class maximizing equation (4.12). Note that in equation (4.12), \(x\) is held fixed and \(p_k(x)\) is viewed as a function of \(k\), not of \(x\).
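As a rough illustration of the note above, the sketch below evaluates the posterior (4.12) and the discriminant (4.13) for one-dimensional normal classes with a shared variance, and checks that both pick the same class for a fixed \(x\). The function names and the example values (`mus`, `sigma2`, `priors`, `x`) are made up for illustration and do not come from ISL.

```python
import math

def posteriors(x, mus, sigma2, priors):
    """Posterior p_k(x) via Bayes' theorem with shared-variance normal densities, as in (4.12)."""
    # The factor 1 / sqrt(2 * pi * sigma2) is the same for every class, so it cancels.
    weighted = [p * math.exp(-(x - mu) ** 2 / (2 * sigma2)) for mu, p in zip(mus, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]

def discriminants(x, mus, sigma2, priors):
    """Discriminant delta_k(x) as in (4.13); linear in x."""
    return [x * mu / sigma2 - mu ** 2 / (2 * sigma2) + math.log(p) for mu, p in zip(mus, priors)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda k: values[k])

# Hypothetical example: three classes, x held fixed.
mus, sigma2, priors = [-1.0, 0.5, 2.0], 1.0, [0.3, 0.3, 0.4]
x = 0.9
p = posteriors(x, mus, sigma2, priors)
d = discriminants(x, mus, sigma2, priors)
same_class = argmax(p) == argmax(d)
```

The values of `p` and `d` differ, but since \(x\) is fixed and only \(k\) varies, their ordering across classes is the same.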
Explanation of equation (4.14):
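The Bayes decision boundary is the set of points where \(\delta_1(x) = \delta_2(x)\). With \(K = 2\) and equal priors \(\pi_1 = \pi_2\), the \(\log \pi_k\) terms of equation (4.13) cancel, so substituting (4.13) gives: \[
\begin{aligned}
\frac{2x\mu_1 - \mu_1^2}{2\sigma^2} &= \frac{2x\mu_2 - \mu_2^2}{2\sigma^2} \\
2x(\mu_1 - \mu_2) &= \mu_1^2 - \mu_2^2 \\
x &= \frac{\mu_1 + \mu_2}{2}
\end{aligned}
\]
The last step uses \(\mu_1^2 - \mu_2^2 = (\mu_1 - \mu_2)(\mu_1 + \mu_2)\) and requires \(\mu_1 \neq \mu_2\).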
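The midpoint result relies on equal priors \(\pi_1 = \pi_2\), so that the log terms of (4.13) drop out. As a hedged extension that is not part of (4.14) itself, keeping those terms shifts the boundary to \(x = \frac{\mu_1 + \mu_2}{2} + \frac{\sigma^2 \log(\pi_2 / \pi_1)}{\mu_1 - \mu_2}\). The sketch below checks this numerically; `delta` and `boundary` are illustrative names, and the parameter values are arbitrary.

```python
import math

def delta(x, mu, sigma2, prior):
    """Discriminant (4.13) for a single class."""
    return x * mu / sigma2 - mu ** 2 / (2 * sigma2) + math.log(prior)

def boundary(mu1, mu2, sigma2, prior1=0.5, prior2=0.5):
    """Point where delta_1(x) equals delta_2(x); requires mu1 != mu2."""
    return (mu1 + mu2) / 2 + sigma2 * math.log(prior2 / prior1) / (mu1 - mu2)

# Equal priors: the boundary is the midpoint of the two means.
x_equal = boundary(-1.25, 1.25, 1.0)

# Unequal priors (arbitrary values): the boundary moves toward the mean
# of the less likely class, enlarging the region of the more likely one.
x_unequal = boundary(-1.25, 1.25, 1.0, prior1=0.8, prior2=0.2)
gap = delta(x_unequal, -1.25, 1.0, 0.8) - delta(x_unequal, 1.25, 1.0, 0.2)
```

At `x_unequal` the two discriminants agree up to floating-point error, so `gap` is numerically zero.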
The word “linear” in the name LDA comes from equation (4.17):
The word linear in the classifier’s name stems from the fact that the discriminant functions \(\hat \delta_k(x)\) in (4.17) are linear functions of \(x\) (as opposed to a more complex function of x).
For section 4.4.2: “varying the classifier threshold changes its true positive and false positive rate”, but how is the classifier threshold changed in terms of equation (4.19)?
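One possible answer to the question above, offered as a sketch rather than something stated in ISL: with a shared covariance matrix, \(\delta_k(x)\) equals \(\log(\pi_k f_k(x))\) plus a term that does not depend on \(k\). In the two-class case this makes \(p_1(x) > t\) equivalent to \(\delta_1(x) - \delta_2(x) > \log\frac{t}{1-t}\). The default \(t = 0.5\) gives the usual rule \(\delta_1(x) > \delta_2(x)\), and lowering \(t\) lowers the margin that class 1 needs. The helper names below are hypothetical, and the discriminant scores are made-up inputs, not the output of a fitted model.

```python
import math

def posterior_class1(delta1, delta2):
    """p_1(x) recovered from two discriminant scores (logistic of their difference)."""
    return 1.0 / (1.0 + math.exp(delta2 - delta1))

def predict_class1(delta1, delta2, threshold=0.5):
    """Assign class 1 when p_1(x) > threshold, expressed through the discriminants."""
    return delta1 - delta2 > math.log(threshold / (1.0 - threshold))

# Made-up discriminant scores for a single observation.
delta1, delta2 = 0.0, 1.0
p1 = posterior_class1(delta1, delta2)            # about 0.27
default_rule = predict_class1(delta1, delta2)    # threshold 0.5
lenient_rule = predict_class1(delta1, delta2, threshold=0.2)
```

Here the default rule assigns the observation to class 2, while the 0.2 threshold assigns it to class 1. Lowering the threshold in this way raises the true positive rate at the cost of a higher false positive rate.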
4.4.4 Quadratic Discriminant Analysis
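QDA no longer requires every class of the response variable to share the same covariance matrix. As Figure 4.9 shows, the Bayes decision boundary changes from a straight line to a curve, so QDA is more flexible than LDA.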
LDA is a much less flexible classifier than QDA, and so has substantially lower variance. This can potentially lead to improved prediction performance. But there is a trade-off: if LDA’s assumption that the K classes share a common covariance matrix is badly off, then LDA can suffer from high bias. Roughly speaking, LDA tends to be a better bet than QDA if there are relatively few training observations and so reducing variance is crucial. In contrast, QDA is recommended if the training set is very large, so that the variance of the classifier is not a major concern, or if the assumption of a common covariance matrix for the K classes is clearly untenable.
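To make the linear-versus-curved boundary concrete, here is a minimal one-dimensional sketch, assuming normal classes with class-specific variances. The score is \(\log(\pi_k f_k(x))\) with the shared constant \(-\frac{1}{2}\log(2\pi)\) dropped, which is quadratic in \(x\). The names `qda_delta` and `classify` and all parameter values are illustrative assumptions.

```python
import math

def qda_delta(x, mu, sigma2, prior):
    """log(prior * N(x; mu, sigma2)) without the shared constant; quadratic in x."""
    return -(x - mu) ** 2 / (2 * sigma2) - 0.5 * math.log(sigma2) + math.log(prior)

def classify(x, mus, sigma2s, priors):
    """Index of the class with the largest score."""
    scores = [qda_delta(x, m, s, p) for m, s, p in zip(mus, sigma2s, priors)]
    return max(range(len(scores)), key=lambda k: scores[k])

# Two classes with the same mean but different variances (arbitrary values).
mus, sigma2s, priors = [0.0, 0.0], [1.0, 4.0], [0.5, 0.5]
labels = [classify(x, mus, sigma2s, priors) for x in (-3.0, 0.0, 3.0)]
```

The wider class (index 1) wins on both sides of the narrow class (index 0), so the decision region has two boundary points. With a shared variance, as in LDA, the one-dimensional boundary is a single point, so a rule of this shape is not available.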