Hint: For this problem, you should follow the arguments laid out in Section 4.4.1, but without making the assumption that \(\sigma^2_1 = \dots = \sigma^2_K\).
The Bayes classifier assigns an observation to the class with the largest posterior probability:
\(p_k(x) = \frac{\pi_k \frac{1}{\sqrt{2\pi}\sigma_k} \exp \bigg( -\frac{1}{2\sigma^2_k}(x - \mu_k)^2 \bigg)}{\sum_{l=1}^{K}\pi_l\frac{1}{\sqrt{2\pi}\sigma_l} \exp \bigg( -\frac{1}{2\sigma^2_l}(x - \mu_l)^2 \bigg)}\)
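The denominator is the same for every class, so maximizing the posterior is equivalent to maximizing the log of its numerator. Taking the log, expanding the quadratic term, and dropping the constant \(-\frac{1}{2}\log(2\pi)\) gives the discriminant function:
\(\delta_k(x) = \log(\pi_k) - \log(\sigma_k) - \frac{\mu_k^2}{2\sigma^2_k} + x\frac{\mu_k}{\sigma^2_k} - \frac{x^2}{2\sigma^2_k}\)
Because \(\sigma^2_k\) differs across classes, the \(x^2\) term does not cancel when two classes are compared, so the Bayes classifier is quadratic in \(x\) rather than linear.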
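As a quick numerical sanity check, the sketch below compares the discriminant with the log of the posterior's numerator. The two-class priors, means, and standard deviations are made-up values chosen only for illustration; they are not part of the exercise.

```r
# Hypothetical parameters for K = 2 classes with unequal variances
prior <- c(0.5, 0.5)
mu    <- c(0, 2)
sigma <- c(1, 2)

# Discriminant function as written above
delta <- function(x, k) {
  log(prior[k]) - log(sigma[k]) - mu[k]^2 / (2 * sigma[k]^2) +
    x * mu[k] / sigma[k]^2 - x^2 / (2 * sigma[k]^2)
}

# Log of the numerator of the posterior: log(prior) + log normal density
log_numerator <- function(x, k) {
  log(prior[k]) + dnorm(x, mean = mu[k], sd = sigma[k], log = TRUE)
}

x <- seq(-5, 5, by = 0.5)

# The two differ only by the dropped constant -0.5 * log(2 * pi)
gap1 <- log_numerator(x, 1) - delta(x, 1)
gap2 <- log_numerator(x, 2) - delta(x, 2)

# Coefficient of x^2 in delta_1(x) - delta_2(x); non-zero when variances differ
quad_coef <- 1 / (2 * sigma[2]^2) - 1 / (2 * sigma[1]^2)
quad_coef
```

With equal standard deviations, `quad_coef` would be zero and the boundary would reduce to the linear LDA case.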
(a) If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?
On the training set, QDA would perform better because its extra flexibility lets it fit the data more closely. On the test set, however, LDA would be expected to perform better: when the true boundary is linear, QDA's extra flexibility adds variance without reducing bias, so it tends to overfit.
(b) If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on the training set? On the test set?
On the training set, QDA would perform better because it is again more flexible than LDA. It would also be expected to perform better on the test set, because LDA cannot represent a non-linear decision boundary, whereas QDA can.
(c) In general, as the sample size n increases, do we expect the test prediction accuracy of QDA relative to LDA to improve, decline, or be unchanged? Why?
Generally, both QDA and LDA will improve as n increases. However, QDA has higher variance than LDA, and that variance matters less as the number of training observations grows. We therefore expect the test accuracy of QDA relative to LDA to improve as n increases, especially when the training set is very large.
(d) True or False: Even if the Bayes decision boundary for a given problem is linear, we will probably achieve a superior test error rate using QDA rather than LDA because QDA is flexible enough to model a linear decision boundary. Justify your answer.
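False. QDA can represent a linear boundary, but it estimates more parameters than LDA, so with a limited sample its extra flexibility increases variance and can lead to overfitting. When the true boundary is linear, LDA will probably achieve the lower test error rate.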
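The bias-variance argument in parts (a) through (d) can be explored with a small simulation. The sketch below is only illustrative: the simulated data, sample sizes, and the helper names `simulate` and `error_rate` are assumptions, and the resulting error rates will vary with the seed. It uses `lda()` and `qda()` from the MASS package.

```r
library(MASS)  # provides lda() and qda()

set.seed(1)

# Simulate two classes that share a covariance matrix, so the Bayes
# decision boundary is linear (all parameter values are hypothetical)
simulate <- function(n) {
  y  <- rep(c(0, 1), each = n / 2)
  x1 <- rnorm(n, mean = ifelse(y == 1, 1, -1))
  x2 <- rnorm(n, mean = ifelse(y == 1, 1, -1))
  data.frame(y = factor(y), x1 = x1, x2 = x2)
}

train <- simulate(50)    # small training set
test  <- simulate(5000)  # large test set

# Misclassification rate of a fitted model on a data set
error_rate <- function(fit, data) mean(predict(fit, data)$class != data$y)

lda_fit <- lda(y ~ x1 + x2, data = train)
qda_fit <- qda(y ~ x1 + x2, data = train)

errors <- c(
  lda_train = error_rate(lda_fit, train),
  qda_train = error_rate(qda_fit, train),
  lda_test  = error_rate(lda_fit, test),
  qda_test  = error_rate(qda_fit, test)
)
errors
```

Rerunning with a larger training set, or with class-specific covariances in `simulate`, is one way to see how the comparison shifts toward QDA.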
(a) Estimate the probability that a student who studies for \(40 h\) and has an undergrad GPA of 3.5 gets an A in the class.
Here is the probability function:
\(p(X) = \frac{e^{\beta_0 + \beta_1X_1 + … + \beta_pX_p}}{1 + e^{\beta_0 + \beta_1X_1 + … + \beta_pX_p}}\)
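With \(\hat\beta_0 = -6\), \(\hat\beta_1 = 0.05\), \(\hat\beta_2 = 1\), \(X_1 = 40\) hours, and \(X_2 = 3.5\) GPA plugged in:
\(p = \frac{e^{-6 + 0.05 \times 40 + 1 \times 3.5}}{1 + e^{-6 + 0.05 \times 40 + 1 \times 3.5}}\)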
(exp(-6+.05*40+1*3.5))/(1+exp(-6+.05*40+1*3.5))
## [1] 0.3775407
The probability that a student who studies for 40 hours and has an undergrad GPA of 3.5 gets an A in the class is about 37.75%.
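The same calculation can be wrapped in a small reusable function. This is a minimal sketch: the name `prob_A` and the named vector `beta` are illustrative choices, and `plogis()` is base R's logistic function \(e^x / (1 + e^x)\).

```r
# Coefficients used in the calculation above
beta <- c(intercept = -6, hours = 0.05, gpa = 1)

# Predicted probability of an A for given study hours and GPA
prob_A <- function(hours, gpa) {
  eta <- beta[["intercept"]] + beta[["hours"]] * hours + beta[["gpa"]] * gpa
  plogis(eta)  # logistic function exp(eta) / (1 + exp(eta))
}

prob_A(40, 3.5)
```

Using `plogis()` avoids writing the exponential twice and is numerically stable for large linear predictors.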
(b) How many hours would the student in part (a) need to study to have a 50% chance of getting an A in the class?
Setting the probability to 0.5 gives \(\frac{e^{-6 + 0.05X_1 + 3.5}}{1 + e^{-6 + 0.05X_1 + 3.5}} = 0.5\). Multiplying both sides by the denominator and rearranging:
\(e^{-6 + 0.05X_1 + 3.5} = 1\)
Taking the log of both sides gives:
\(-6 + 0.05X_1 + 3.5 = 0\), then:
\(X_1 = (6 - 3.5)/0.05\)
(6-3.5)/.05
## [1] 50
Therefore, the student would need to study 50 hours to have a 50% chance of getting an A in the class.
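The answer can be cross-checked in two ways: with the closed form generalized to any target probability via the logit `qlogis()`, and numerically with `uniroot()`. This is a sketch only; the helper name `hours_needed` and the search interval of 0 to 200 hours are assumptions made for illustration.

```r
# Coefficients used above: intercept, hours, GPA
b0 <- -6
b1 <- 0.05
b2 <- 1

# Closed form: solve b0 + b1 * hours + b2 * gpa = logit(target) for hours
hours_needed <- function(target, gpa) {
  (qlogis(target) - b0 - b2 * gpa) / b1
}
closed_form <- hours_needed(0.5, 3.5)

# Numerical check: find the hours at which the predicted probability is 0.5
f <- function(hours) plogis(b0 + b1 * hours + b2 * 3.5) - 0.5
numeric_root <- uniroot(f, interval = c(0, 200), tol = 1e-9)$root

c(closed_form = closed_form, numeric_root = numeric_root)
```

The closed form also answers other targets directly, for example `hours_needed(0.9, 3.5)` for a 90% chance.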
(a) On average, what fraction of people with an odds of 0.37 of defaulting on their credit card payment will in fact default?
The odds are \(\frac{p}{1-p} = 0.37\),
so \(p = 0.37 - 0.37p\), then:
\(1.37p = 0.37\), which gives \(p = 0.37/1.37\)
.37/1.37
## [1] 0.270073
So, on average, about 27% of people with odds of 0.37 of defaulting on their credit card payment will in fact default.
(b) Suppose that an individual has a 16% chance of defaulting on her credit card payment. What are the odds that she will default?
The odds are \(\frac{p}{1-p} = \frac{0.16}{1 - 0.16}\):
.16/(1-.16)
## [1] 0.1904762
The odds that she will default are about \(0.19\).
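Both parts use the same relationship between odds and probability, which can be captured in a pair of helper functions. The names `odds_to_prob` and `prob_to_odds` are illustrative and not from the exercise.

```r
# Convert odds to a probability: p = odds / (1 + odds)
odds_to_prob <- function(odds) odds / (1 + odds)

# Convert a probability to odds: odds = p / (1 - p)
prob_to_odds <- function(p) p / (1 - p)

odds_to_prob(0.37)  # part (a)
prob_to_odds(0.16)  # part (b)

# The two functions are inverses of each other
odds_to_prob(prob_to_odds(0.16))
```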