3. This problem relates to the QDA model, in which the observations within each class are drawn from a normal distribution with a class-specific mean vector and a class-specific covariance matrix. We consider the simple case where \(p = 1\); i.e. there is only one feature. Suppose that we have K classes, and that if an observation belongs to the kth class then X comes from a one-dimensional normal distribution, \(X \sim N(\mu_k, \sigma^2_k)\). Recall that the density function for the one-dimensional normal distribution is given in (4.16). Prove that in this case, the Bayes classifier is not linear. Argue that it is in fact quadratic.

Hint: For this problem, you should follow the arguments laid out in Section 4.4.1, but without making the assumption that \(\sigma^2_1 = \dots = \sigma^2_K\).

This is the posterior probability used by the Bayes classifier:

\(p_k(x) = \frac{\pi_k \frac{1}{\sqrt{2\pi}\sigma_k} \exp \bigg( {-\frac{1}{2\sigma^2_k}(x - \mu_k)^2} \bigg)}{\sum_{l=1}^{K}\pi_l\frac{1}{\sqrt{2\pi}\sigma_l} \exp \bigg( {-\frac{1}{2\sigma^2_l}(x - \mu_l)^2} \bigg)}\)

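Taking the log of the numerator, expanding the quadratic term, and dropping every term that does not depend on \(k\) (the denominator, which is the same for all classes, and the constant \(-\log\sqrt{2\pi}\)) gives the discriminant function: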

\(\delta_k(x) = \log(\pi_k) - \log(\sigma_k) - \frac{\mu_k^2}{2\sigma^2_k} + x\frac{\mu_k}{\sigma^2_k} - \frac{x^2}{2\sigma^2_k}\)

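Since the coefficient of \(x^2\), \(-\frac{1}{2\sigma^2_k}\), depends on \(k\), the \(x^2\) term does not cancel when the discriminant functions of two classes are compared (unless their variances happen to be equal). The Bayes classifier assigns an observation to the class with the largest discriminant value, so its decision boundary is quadratic in \(x\), not linear.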
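As a numerical check of this argument, the R sketch below evaluates the discriminant function for two classes and extracts the coefficients of the difference between them. The parameter values (priors, means, standard deviations) and the helper names `delta` and `boundary_coefs` are hypothetical, chosen only for illustration.

```r
# Discriminant function for one class in the one-dimensional QDA setting.
delta <- function(x, prior, mu, sigma) {
  log(prior) - log(sigma) - mu^2 / (2 * sigma^2) +
    x * mu / sigma^2 - x^2 / (2 * sigma^2)
}

# Coefficients of delta_1(x) - delta_2(x), in increasing powers of x.
boundary_coefs <- function(prior, mu, sigma) {
  const <- log(prior) - log(sigma) - mu^2 / (2 * sigma^2)
  lin   <- mu / sigma^2
  quad  <- -1 / (2 * sigma^2)
  c(constant  = const[1] - const[2],
    linear    = lin[1] - lin[2],
    quadratic = quad[1] - quad[2])
}

# Hypothetical parameters for two classes (illustration only).
prior <- c(0.5, 0.5)
mu    <- c(0, 2)
sigma <- c(1, 2)

coefs <- boundary_coefs(prior, mu, sigma)
coefs

# Real roots of the quadratic are the points where the predicted class changes.
roots <- polyroot(coefs)
real_roots <- Re(roots[abs(Im(roots)) < 1e-8])
real_roots
```

With unequal standard deviations the `quadratic` coefficient is nonzero, so the boundary is defined by a quadratic equation in x. Setting both entries of `sigma` to the same value makes that coefficient zero, which is the LDA case.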

5. We now examine the differences between LDA and QDA.

(a) If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?

On the training set, we expect QDA to perform better because it is more flexible and can fit the training data more closely. On the test set, however, we expect LDA to perform better: the true boundary is linear, so QDA's extra flexibility adds variance without reducing bias and can overfit.

(b) If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on the training set? On the test set?

For the training set, QDA would perform better because it is once again more flexible than LDA. It would also perform better on the test set because LDA is not flexible enough to fit a non-linear decision boundary like QDA can.

(c) In general, as the sample size n increases, do we expect the test prediction accuracy of QDA relative to LDA to improve, decline, or be unchanged? Why?

Generally, the test accuracy of both QDA and LDA improves as n increases. We expect QDA to improve relative to LDA: QDA has higher variance because it estimates a separate covariance matrix for each class, and that variance shrinks as the number of training observations grows, so its lower bias pays off, especially when the training set is very large.

(d) True or False: Even if the Bayes decision boundary for a given problem is linear, we will probably achieve a superior test error rate using QDA rather than LDA because QDA is flexible enough to model a linear decision boundary. Justify your answer.

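False. When the Bayes decision boundary is linear, LDA's assumption of a common covariance matrix is appropriate, so QDA's extra flexibility does not reduce bias; it only adds variance, and QDA may overfit the training data. We therefore expect LDA to achieve the better test error rate, particularly when n is small.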
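The comparison in this question can be explored with a small simulation. The sketch below is a hand-rolled one-feature version of LDA and QDA in base R, not a library implementation; the function names (`fit_da`, `predict_da`, `simulate_data`, `error_rate`), the class means, the sample sizes, and the seed are all assumptions made for illustration. Both classes are simulated with the same variance, so the Bayes decision boundary is linear.

```r
set.seed(1)

# Fit a one-dimensional discriminant classifier.
# pooled = TRUE shares one variance across classes (LDA);
# pooled = FALSE estimates a separate variance per class (QDA).
fit_da <- function(x, y, pooled) {
  classes <- sort(unique(y))
  n_k   <- sapply(classes, function(k) sum(y == k))
  mu    <- sapply(classes, function(k) mean(x[y == k]))
  var_k <- sapply(classes, function(k) var(x[y == k]))
  if (pooled) {
    pooled_var <- sum((n_k - 1) * var_k) / (length(x) - length(classes))
    var_k <- rep(pooled_var, length(classes))
  }
  list(classes = classes, prior = n_k / length(x), mu = mu, sigma = sqrt(var_k))
}

# Assign each observation to the class with the largest discriminant score.
predict_da <- function(fit, x) {
  scores <- sapply(seq_along(fit$classes), function(k) {
    log(fit$prior[k]) - log(fit$sigma[k]) -
      (x - fit$mu[k])^2 / (2 * fit$sigma[k]^2)
  })
  scores <- matrix(scores, nrow = length(x))
  fit$classes[max.col(scores, ties.method = "first")]
}

# Two classes with equal variance, so the Bayes boundary is linear.
simulate_data <- function(n) {
  y <- rep(c(1, 2), each = n / 2)
  x <- rnorm(n, mean = c(0, 1.5)[y], sd = 1)
  data.frame(x = x, y = y)
}

error_rate <- function(fit, data) mean(predict_da(fit, data$x) != data$y)

train <- simulate_data(40)
test  <- simulate_data(10000)

lda_fit <- fit_da(train$x, train$y, pooled = TRUE)
qda_fit <- fit_da(train$x, train$y, pooled = FALSE)

errors <- rbind(
  LDA = c(train = error_rate(lda_fit, train), test = error_rate(lda_fit, test)),
  QDA = c(train = error_rate(qda_fit, train), test = error_rate(qda_fit, test))
)
errors
```

A single run is noisy, so the table from one seed should not be read as proof of either answer. Repeating the simulation over many seeds and several training sizes, and giving the classes unequal `sd` values for the non-linear case, would give a more reliable picture.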

6. Suppose we collect data for a group of students in a statistics class with variables \(X_1 =\) hours studied, \(X_2 =\) undergrad GPA, and \(Y =\) receive an A. We fit a logistic regression and produce estimated coefficients, \(\hat{\beta}_0 = -6\), \(\hat{\beta}_1 = 0.05\), \(\hat{\beta}_2 = 1\).

(a) Estimate the probability that a student who studies for 40 h and has an undergrad GPA of 3.5 gets an A in the class.

Here is the probability function:

\(p(X) = \frac{e^{\beta_0 + \beta_1X_1 + \dots + \beta_pX_p}}{1 + e^{\beta_0 + \beta_1X_1 + \dots + \beta_pX_p}}\)

And with the proper numbers plugged in:

\(p = \frac{e^{-6 + 0.05 \cdot 40 + 1 \cdot 3.5}}{1 + e^{-6 + 0.05 \cdot 40 + 1 \cdot 3.5}}\)

(exp(-6+.05*40+1*3.5))/(1+exp(-6+.05*40+1*3.5))
## [1] 0.3775407

The probability that a student who studies for 40 hours and has an undergrad GPA of 3.5 gets an A in the class is about 37.75%.
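The same calculation can be wrapped in a small function so other inputs are easy to try. This is a minimal sketch: the names `beta` and `prob_a` are hypothetical, and it relies on base R's `plogis()`, which evaluates the logistic function.

```r
# Estimated coefficients from the problem statement.
beta <- c(intercept = -6, hours = 0.05, gpa = 1)

# Predicted probability of an A; plogis(z) computes exp(z) / (1 + exp(z)).
prob_a <- function(hours, gpa) {
  plogis(beta[["intercept"]] + beta[["hours"]] * hours + beta[["gpa"]] * gpa)
}

prob_a(hours = 40, gpa = 3.5)
## [1] 0.3775407
```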

(b) How many hours would the student in part (a) need to study to have a 50% chance of getting an A in the class?

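Setting the probability to 0.5 with a GPA of 3.5:

\(\frac{e^{-6 + 0.05X_1 + 3.5}}{1 + e^{-6 + 0.05X_1 + 3.5}} = 0.5\)

Multiplying both sides by the denominator and collecting terms gives:

\(e^{-6 + 0.05X_1 + 3.5} = 1\)

Taking the log of both sides:

\(-6 + 0.05X_1 + 3.5 = 0\)

so:

\(X_1 = (6 - 3.5)/0.05\)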

(6-3.5)/.05
## [1] 50

Therefore, the student would need to study 50 hours to have a 50% chance of getting an A in the class.
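The algebra above generalizes to any target probability by inverting the logistic function. The sketch below assumes a hypothetical helper `hours_needed` and uses base R's `qlogis()`, which returns the log-odds of a probability; the coefficients are the ones given in the problem.

```r
# Hours of study needed to reach a target probability of an A, for a given GPA.
# qlogis(p) is the log-odds log(p / (1 - p)), the inverse of the logistic function.
hours_needed <- function(target_prob, gpa) {
  (qlogis(target_prob) - (-6) - 1 * gpa) / 0.05
}

hours_needed(0.5, gpa = 3.5)
## [1] 50
```

A 50% chance corresponds to log-odds of zero, which is why the target drops out of the calculation in this part.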

9. This problem has to do with odds.

(a) On average, what fraction of people with an odds of 0.37 of defaulting on their credit card payment will in fact default?

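The odds are defined as \(\frac{p}{1-p}\), so \(\frac{p}{1-p} = 0.37\).

Multiplying through by \(1-p\) gives \(p = 0.37 - 0.37p\), then:

\(1.37p = 0.37\), so \(p = 0.37/1.37\).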

.37/1.37
## [1] 0.270073

So, on average, about 27% of people with odds of 0.37 of defaulting on their credit card payment will in fact default.

(b) Suppose that an individual has a 16% chance of defaulting on her credit card payment. What are the odds that she will default?

With \(p = 0.16\), the odds are \(\frac{p}{1-p} = \frac{0.16}{1-0.16}\).

.16/(1-.16)
## [1] 0.1904762

The odds that she will default are about \(0.19\).
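Both parts of this problem are the same conversion run in opposite directions, which the following sketch makes explicit. The function names `odds_to_prob` and `prob_to_odds` are hypothetical helpers, not part of any package.

```r
# Convert odds to a probability: p = odds / (1 + odds).
odds_to_prob <- function(odds) odds / (1 + odds)

# Convert a probability to odds: odds = p / (1 - p).
prob_to_odds <- function(p) p / (1 - p)

odds_to_prob(0.37)
## [1] 0.270073
prob_to_odds(0.16)
## [1] 0.1904762
```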