5. We now examine the differences between LDA and QDA.

(a) If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?

Answer: On the training set we expect QDA to perform better (or at least no worse), because its greater flexibility lets it fit the training data more closely. On the test set we expect LDA to perform better: since the true boundary is linear, QDA's extra flexibility buys no reduction in bias but adds variance.

(b) If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on the training set? On the test set?

Answer: When the Bayes decision boundary is non-linear, we expect QDA to perform better on both the training set and the test set, since its quadratic decision boundary can approximate the non-linearity that LDA's linear boundary cannot capture.

(c) As the sample size (n) increases, do we expect the test prediction accuracy of QDA relative to LDA to improve, decline, or stay the same?

Answer: As the sample size increases, the test accuracy of QDA relative to LDA should improve. QDA estimates a separate covariance matrix for each class, so it needs more data to keep its variance under control. With more data, QDA can accurately model non-linear boundaries and outperform LDA, especially when the true boundary is non-linear.

(d) True or False: If the Bayes decision boundary is linear, we will likely achieve a better test error rate using QDA rather than LDA because QDA is flexible enough to model linear decision boundaries.

Answer: False. Although QDA is flexible and can model a linear decision boundary, it adds unnecessary complexity when the boundary is truly linear. This complexity can lead to overfitting, meaning LDA, with its simpler model, will generally achieve a lower test error rate for linear problems.
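To make parts (a) and (d) concrete, here is a minimal simulation sketch in Python (assuming numpy and scikit-learn are available; the sampling scheme and the helper `sample` are illustrative, not part of the exercise). Two Gaussian classes share a covariance matrix, so the Bayes boundary is linear; on typical draws QDA fits the training data slightly better while LDA attains the lower test error.

```python
# A minimal simulation sketch (assumes numpy and scikit-learn are installed).
# Two Gaussian classes share a covariance matrix, so the Bayes boundary is
# linear; QDA tends to fit the training data a bit more closely, while LDA
# tends to achieve the lower test error.
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)

def sample(n_per_class):
    """Draw n_per_class points per class from Gaussians with a common covariance."""
    cov = np.array([[1.0, 0.3], [0.3, 1.0]])
    X0 = rng.multivariate_normal([0.0, 0.0], cov, size=n_per_class)
    X1 = rng.multivariate_normal([1.0, 1.0], cov, size=n_per_class)
    return np.vstack([X0, X1]), np.repeat([0, 1], n_per_class)

X_train, y_train = sample(50)     # small training set
X_test, y_test = sample(5000)     # large test set for a stable error estimate

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("QDA", QuadraticDiscriminantAnalysis())]:
    clf.fit(X_train, y_train)
    print(name,
          "train error:", round(1 - clf.score(X_train, y_train), 3),
          "test error:", round(1 - clf.score(X_test, y_test), 3))
```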


6. Suppose we collect data for a group of students in a statistics class with variables \(X_1\) = hours studied, \(X_2\) = undergrad GPA, and \(Y\) = receive an A.

We fit a logistic regression and obtain the estimated coefficients:

\[ \hat{\beta}_0 = -6, \quad \hat{\beta}_1 = 0.05, \quad \hat{\beta}_2 = 1 \]

The logistic regression model is given by:

\[ \hat{p} = \frac{e^{\left( \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 \right)}}{1 + e^{\left( \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 \right)}} \]

(a) Estimate the probability that a student who studies for 40 hours and has an undergrad GPA of 3.5 will receive an A in the class.

Substituting \(X_1 = 40\) and \(X_2 = 3.5\) into the model:

\[ \hat{p} = \frac{e^{\left( -6 + 0.05 \cdot 40 + 1 \cdot 3.5 \right)}}{1 + e^{\left( -6 + 0.05 \cdot 40 + 1 \cdot 3.5 \right)}} \]

Simplifying the expression:

\[ \hat{p} = \frac{e^{-0.5}}{1 + e^{-0.5}} \approx \frac{0.6065}{1 + 0.6065} \approx 0.378 \]

Thus, the probability that the student will receive an A is approximately 37.8%.
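As a quick numerical check, the sketch below (standard library only; the helper name `predicted_prob` is ours, not from the text) plugs \(X_1 = 40\) and \(X_2 = 3.5\) into the fitted model.

```python
# A quick numerical check of part (a), using only the standard library.
import math

b0, b1, b2 = -6.0, 0.05, 1.0

def predicted_prob(hours, gpa):
    """Logistic model: p = exp(z) / (1 + exp(z)) with z = b0 + b1*hours + b2*gpa."""
    z = b0 + b1 * hours + b2 * gpa
    return math.exp(z) / (1 + math.exp(z))

print(round(predicted_prob(40, 3.5), 4))  # about 0.3775, i.e. roughly 37.8%
```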

(b) How many hours would the student in part (a) need to study to have a 50% chance of getting an A?

We need to find \(X_1\) such that the probability \(\hat{p} = 0.5\). Setting the model equal to 0.5:

\[ 0.5 = \frac{e^{\left( -6 + 0.05 X_1 + 1 \cdot 3.5 \right)}}{1 + e^{\left( -6 + 0.05 X_1 + 1 \cdot 3.5 \right)}} \]

Since \(\hat{p} = 0.5\) corresponds to odds of \(\frac{0.5}{1 - 0.5} = 1\), this simplifies to:

\[ e^{\left( -6 + 0.05 X_1 + 3.5 \right)} = 1 \]

Taking the natural logarithm of both sides:

\[ -6 + 0.05 X_1 + 3.5 = 0 \]

Solving for \(X_1\):

\[ 0.05 X_1 = 2.5 \quad \Rightarrow \quad X_1 = \frac{2.5}{0.05} = 50 \]

Thus, the student would need to study 50 hours to have a 50% chance of getting an A in the class.
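The same inversion can be checked numerically. The sketch below (standard library only) solves the logit equation for \(X_1\) under the fitted coefficients.

```python
# A small sketch of part (b): invert the logit. p = 0.5 means the linear
# predictor b0 + b1*X1 + b2*X2 equals log(0.5 / 0.5) = 0, so solve for X1.
import math

b0, b1, b2 = -6.0, 0.05, 1.0
gpa = 3.5
target_p = 0.5

logit = math.log(target_p / (1 - target_p))   # 0 for p = 0.5
hours = (logit - b0 - b2 * gpa) / b1
print(hours)  # 50.0
```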


Problem 8: Comparison of Logistic Regression vs. 1-Nearest Neighbors (K = 1)

We divide a data set into equally-sized training and test sets and compare two classification methods: logistic regression and 1-nearest neighbors (\(K = 1\)).

Which method should we prefer?

Answer: We should prefer logistic regression for classifying new observations. Although 1-nearest neighbors has the lower average error rate, it is likely overfitting: with \(K = 1\), each training observation is its own nearest neighbor, so the method classifies the training data perfectly yet may generalize poorly to new observations. Logistic regression, despite its higher reported error rate, is far less prone to overfitting and should generalize better.
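As an illustration of the overfitting argument (a sketch assuming numpy and scikit-learn, not the data from the problem), the following fits both methods to labels that are pure noise: 1-NN still achieves a 0% training error, so an error rate averaged over training and test data can mask a poor test error, whereas logistic regression's training and test errors stay much closer together.

```python
# An illustrative sketch (assumes numpy and scikit-learn), not the data from
# the problem: with K = 1 each training point is its own nearest neighbor, so
# 1-NN scores a 0% training error even on pure-noise labels, while its test
# error sits near chance; logistic regression's two errors stay close together.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 2))
y_train = rng.integers(0, 2, size=100)    # labels unrelated to the features
X_test = rng.normal(size=(1000, 2))
y_test = rng.integers(0, 2, size=1000)

for name, clf in [("1-NN", KNeighborsClassifier(n_neighbors=1)),
                  ("Logistic regression", LogisticRegression())]:
    clf.fit(X_train, y_train)
    print(name,
          "train error:", round(1 - clf.score(X_train, y_train), 2),
          "test error:", round(1 - clf.score(X_test, y_test), 2))
```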

Problem 9: Odds and Probabilities

Odds are related to probability through the formula:

\[ \text{Odds} = \frac{P(\text{Event Happening})}{P(\text{Event Not Happening})} = \frac{p}{1-p} \]

If the odds are given, we can compute the probability as:

\[ p = \frac{\text{Odds}}{1 + \text{Odds}} \]

(a) On average, what fraction of people with odds of 0.37 of defaulting on their credit card payment will in fact default?

Given that the odds are 0.37, the probability can be calculated as:

\[ p = \frac{0.37}{1 + 0.37} = \frac{0.37}{1.37} \approx 0.27 \]

Thus, approximately 27% of people with these odds will default on their credit card payments.

(b) Suppose an individual has a 16% chance of defaulting on her credit card payment. What are the odds that she will default?

Given that the probability of default is \(p = 0.16\), the odds can be calculated using the formula:

\[ \text{Odds} = \frac{0.16}{1 - 0.16} = \frac{0.16}{0.84} \approx 0.19 \]

So, the odds that the individual will default are approximately 0.19.
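Both conversions are easy to verify numerically; the following sketch (plain Python, helper names ours) reproduces the two answers above.

```python
# A tiny sketch of the two conversions used above (standard library only).
def odds_to_prob(odds):
    """p = odds / (1 + odds)"""
    return odds / (1 + odds)

def prob_to_odds(p):
    """odds = p / (1 - p)"""
    return p / (1 - p)

print(round(odds_to_prob(0.37), 2))  # about 0.27 -> roughly 27% default
print(round(prob_to_odds(0.16), 2))  # about 0.19 -> odds of roughly 0.19
```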