Exercise - 3

Question - 5

(a).

  • Training Set: QDA is more flexible than LDA, so it can fit the training data more closely. QDA is therefore expected to have the lower training error, even though the true boundary is linear.
  • Test Set: Since LDA assumes a linear boundary, it is likely to generalize better if the true Bayes boundary is indeed linear. QDA, being more flexible, might overfit, leading to worse test performance. Hence, LDA is expected to perform better on the test set in this case.

(b).

  • Training Set: QDA, which models quadratic decision boundaries, will have higher flexibility and should fit the training data better, leading to a lower training error.
  • Test Set: If the true decision boundary is non-linear, QDA should perform better because it can capture the curvature of the boundary. However, if the dataset is small, QDA might overfit, and LDA might still perform reasonably well.
    In general, QDA is expected to perform better on the test set if the Bayes decision boundary is truly non-linear and enough data is available.

(c).

As n increases, the estimation of class covariance matrices in QDA becomes more reliable, reducing variance in the model.

  • When n is small, QDA is prone to overfitting due to its high flexibility, and LDA performs better.

  • As n increases, QDA’s flexibility becomes beneficial because it can capture more complex decision boundaries, leading to improved test accuracy relative to LDA.

The test accuracy of QDA relative to LDA is expected to improve as n increases.

(d).

  • False.
  • While QDA can model a linear decision boundary (since a linear boundary is a special case of a quadratic one), it estimates more parameters than LDA.
  • This additional flexibility increases variance and requires more data for reliable estimation. If the true boundary is linear, LDA will be more efficient because it assumes shared covariance matrices and requires fewer parameters to estimate.
  • Therefore, using QDA when the decision boundary is linear introduces unnecessary complexity, leading to worse generalization on the test set.
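The linear-boundary case in parts (a), (c), and (d) can be explored with a small simulation. This is a minimal sketch, assuming two Gaussian classes with a shared covariance matrix so that the Bayes boundary is linear. The means, covariance, sample sizes, and helper names (simulate_data, mean_test_error) are illustrative choices, not part of the exercise. It uses lda(), qda(), and mvrnorm() from the MASS package.

```r
library(MASS)  # lda(), qda(), mvrnorm()

set.seed(1)

# Draw n_per_class observations per class from two Gaussians that share
# one covariance matrix (assumed setup: this makes the Bayes boundary linear).
simulate_data <- function(n_per_class) {
  sigma  <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
  class0 <- mvrnorm(n_per_class, mu = c(0, 0), Sigma = sigma)
  class1 <- mvrnorm(n_per_class, mu = c(1, 1), Sigma = sigma)
  data.frame(
    x1 = c(class0[, 1], class1[, 1]),
    x2 = c(class0[, 2], class1[, 2]),
    y  = factor(rep(c(0, 1), each = n_per_class))
  )
}

# Average test error of one classifier over several simulated training sets.
mean_test_error <- function(fit_fun, n_per_class, test, reps = 50) {
  errors <- replicate(reps, {
    train <- simulate_data(n_per_class)
    fit   <- fit_fun(y ~ x1 + x2, data = train)
    mean(predict(fit, test)$class != test$y)
  })
  mean(errors)
}

# One large test set, reused for every comparison.
test  <- simulate_data(5000)
sizes <- c(10, 50, 500)

results <- data.frame(
  n_per_class = sizes,
  lda = sapply(sizes, function(n) mean_test_error(lda, n, test)),
  qda = sapply(sizes, function(n) mean_test_error(qda, n, test))
)
print(results)
```

In a setup like this, any gap between QDA and LDA would be expected to be largest at small n and to shrink as n grows, though the exact numbers depend on the seed and the settings chosen.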

Question - 6

We are given a logistic regression model:

\[ \log\left(\frac{P(Y=1)}{1 - P(Y=1)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]

where the estimated coefficients are:

\[ \hat{\beta}_0 = -6, \quad \hat{\beta}_1 = 0.05, \quad \hat{\beta}_2 = 1 \]

(a).

The probability is given by the logistic function:

\[ P(Y=1) = \frac{e^{(\beta_0 + \beta_1 X_1 + \beta_2 X_2)}}{1 + e^{(\beta_0 + \beta_1 X_1 + \beta_2 X_2)}} \]

Substituting \(X_1 = 40\) (hours studied) and \(X_2 = 3.5\) (GPA):

\[ P(Y=1) = \frac{e^{-6 + (0.05 \times 40) + (1 \times 3.5)}}{1 + e^{-6 + (0.05 \times 40) + (1 \times 3.5)}} \]

# Estimated coefficients
beta_0 <- -6
beta_1 <- 0.05
beta_2 <- 1

# Given values
X1_a <- 40  # Hours studied
X2_a <- 3.5  # GPA

# Compute the probability
logit_value_a <- beta_0 + beta_1 * X1_a + beta_2 * X2_a
probability_a <- exp(logit_value_a) / (1 + exp(logit_value_a))

# Print result
probability_a
## [1] 0.3775407

The probability of getting an A with 40 hours of study and a 3.5 GPA is 0.378 (≈ 37.8%).

(b).

We set the probability to 50%, so:

\[ \log\left(\frac{0.5}{1 - 0.5}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]

Since \(\log(1) = 0\), we solve for \(X_1\):

\[ 0 = -6 + 0.05X_1 + 1(3.5) \]

\[ 0.05X_1 = 6 - 3.5 \]

\[ X_1 = \frac{2.5}{0.05} = 50 \text{ hours} \]

# Given GPA
X2_b <- 3.5

# Solve 0 = beta_0 + beta_1 * X1 + beta_2 * X2 for X1 (the log-odds are 0 at 50%)
X1_b <- -(beta_0 + beta_2 * X2_b) / beta_1

# Print result
X1_b
## [1] 50

The student would need to study 50 hours to have a 50% chance of getting an A in the class.
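The same rearrangement works for any target probability \(p\): set the log-odds equal to \(\log(p / (1 - p))\) and solve for \(X_1\). A minimal sketch, where hours_needed is a hypothetical helper name introduced here for illustration, and qlogis() and plogis() are base R's logit and inverse-logit functions:

```r
# Estimated coefficients from the question
beta_0 <- -6
beta_1 <- 0.05
beta_2 <- 1

# Hypothetical helper: hours of study needed to reach probability p of an A
# for a given GPA. qlogis(p) computes log(p / (1 - p)).
hours_needed <- function(p, gpa) {
  (qlogis(p) - beta_0 - beta_2 * gpa) / beta_1
}

# The 50% case from part (b)
hours_needed(0.5, gpa = 3.5)

# Cross-check of part (a): plogis() is the logistic function
plogis(beta_0 + beta_1 * 40 + beta_2 * 3.5)
```

Using qlogis() and plogis() avoids writing the exponentials by hand and makes it easy to try other target probabilities or GPAs.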

Question - 8

We are given two classification methods:

  1. Logistic Regression:
    • Training error rate = 20%
    • Test error rate = 30%
  2. 1-Nearest Neighbors (K = 1):
    • Average error rate (across training and test sets) = 18%

Which method should we prefer for classification of new observations?

We analyze the results:

  • Logistic regression has a training error of 20% and a test error of 30%. The test error is the relevant estimate of how it will perform on new observations.
  • KNN with K = 1 has an average error rate of 18%. With K = 1, each training observation is its own nearest neighbor, so the training error is 0%. For the two rates to average 18%, the test error must be 2 × 18% - 0% = 36%.

Since we care about generalization to new observations, we should compare test error rates rather than training or averaged error rates. The 18% average looks better than either logistic regression figure only because it includes a training error of 0%, which reflects memorization of the training data rather than predictive accuracy.

Final Conclusion:

Logistic regression has the lower test error (30% versus 36% for KNN with \(K = 1\)), so it is the better choice for classifying new observations.

If we were to consider KNN, we should try higher values of \(K\) to balance bias and variance.
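The test error of KNN with \(K = 1\) can be backed out of the reported average. With \(K = 1\), every training observation is its own nearest neighbor, so the training error is 0% (assuming no duplicated points with conflicting labels). The sketch below also assumes the 18% figure is the simple average of the training and test error rates; the variable names are chosen here for illustration.

```r
avg_error_knn   <- 0.18  # reported average of training and test error
train_error_knn <- 0     # 1-NN classifies its own training points correctly

# average = (train + test) / 2, so test = 2 * average - train
test_error_knn <- 2 * avg_error_knn - train_error_knn

test_error_logistic <- 0.30

# Side-by-side test error rates
c(knn = test_error_knn, logistic = test_error_logistic)

# TRUE when KNN does worse than logistic regression on the test set
test_error_knn > test_error_logistic
```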

Question - 9

Odds and probability are related by the formula:

\[ P = \frac{\text{Odds}}{1 + \text{Odds}} \]

Conversely, if given a probability \(P\), the odds are calculated as:

\[ \text{Odds} = \frac{P}{1 - P} \]

(a).

We are given:

\[ \text{Odds} = 0.37 \]

Using the formula:

\[ P = \frac{0.37}{1 + 0.37} \]

# Given odds
odds_a <- 0.37

# Compute probability
prob_a <- odds_a / (1 + odds_a)

# Print result
prob_a
## [1] 0.270073

The fraction of people who default is approximately 0.27 (or 27%).

(b).

The formula to compute odds from probability is:

\[ \text{Odds} = \frac{P}{1 - P} \]

We are given:

\[ P = 0.16 \]

Substituting the values:

\[ \text{Odds} = \frac{0.16}{1 - 0.16} = \frac{0.16}{0.84} \]

# Given probability
prob_b <- 0.16

# Compute odds
odds_b <- prob_b / (1 - prob_b)

# Print result
odds_b
## [1] 0.1904762

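The odds of defaulting are approximately 0.19. Odds are a ratio rather than a percentage: this corresponds to about 19 defaults for every 100 non-defaults.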
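The two conversions are inverses of each other, which gives a quick sanity check on both answers. A small sketch; odds_to_prob and prob_to_odds are illustrative helper names, not functions from any package:

```r
# Convert odds to a probability, and a probability to odds
odds_to_prob <- function(odds) odds / (1 + odds)
prob_to_odds <- function(p) p / (1 - p)

# Round trips should return the starting values (up to floating point)
all.equal(prob_to_odds(odds_to_prob(0.37)), 0.37)
all.equal(odds_to_prob(prob_to_odds(0.16)), 0.16)
```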