As \(n\) increases, the estimation of the class-specific covariance matrices in QDA becomes more reliable, reducing the variance of the model.
When \(n\) is small, QDA is prone to overfitting due to its high flexibility, and LDA performs better.
As \(n\) increases, QDA's flexibility becomes beneficial because it can capture more complex decision boundaries, leading to improved test accuracy relative to LDA.
Therefore, the test accuracy of QDA relative to LDA is expected to improve as \(n\) increases.
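As a quick illustration (not part of the original question), the sketch below simulates two classes with different covariance matrices, so the true boundary is quadratic, and compares LDA and QDA test accuracy for a small and a large training set using `lda()` and `qda()` from the MASS package. The class means, covariances, and sample sizes are arbitrary choices for illustration.

library(MASS)  # for mvrnorm(), lda(), qda()

set.seed(1)
simulate <- function(n) {
  # Class 0: uncorrelated predictors; Class 1: strongly correlated predictors
  x0 <- mvrnorm(n, mu = c(0, 0), Sigma = diag(2))
  x1 <- mvrnorm(n, mu = c(1, 1), Sigma = matrix(c(1, 0.8, 0.8, 1), 2))
  data.frame(X1 = c(x0[, 1], x1[, 1]),
             X2 = c(x0[, 2], x1[, 2]),
             Y  = factor(rep(c(0, 1), each = n)))
}

# Large test set to estimate test accuracy
test <- simulate(1000)

# Compare the two methods at a small and a large training size
for (n in c(20, 500)) {
  train <- simulate(n)
  acc_lda <- mean(predict(lda(Y ~ X1 + X2, data = train), test)$class == test$Y)
  acc_qda <- mean(predict(qda(Y ~ X1 + X2, data = train), test)$class == test$Y)
  cat("n per class =", n, " LDA accuracy:", round(acc_lda, 3),
      " QDA accuracy:", round(acc_qda, 3), "\n")
}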
We are given a logistic regression model:
\[ \log\left(\frac{P(Y=1)}{1 - P(Y=1)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]
where the estimated coefficients are:
\[ \hat{\beta}_0 = -6, \quad \hat{\beta}_1 = 0.05, \quad \hat{\beta}_2 = 1 \]
The probability is given by the logistic function:
\[ P(Y=1) = \frac{e^{(\beta_0 + \beta_1 X_1 + \beta_2 X_2)}}{1 + e^{(\beta_0 + \beta_1 X_1 + \beta_2 X_2)}} \]
Substituting \(X_1 = 40\) (hours studied) and \(X_2 = 3.5\) (GPA):
\[ P(Y=1) = \frac{e^{-6 + (0.05 \times 40) + (1 \times 3.5)}}{1 + e^{-6 + (0.05 \times 40) + (1 \times 3.5)}} \]
# Estimated coefficients
beta_0 <- -6
beta_1 <- 0.05
beta_2 <- 1
# Given values
X1_a <- 40 # Hours studied
X2_a <- 3.5 # GPA
# Compute the probability
logit_value_a <- beta_0 + beta_1 * X1_a + beta_2 * X2_a
probability_a <- exp(logit_value_a) / (1 + exp(logit_value_a))
# Print result
probability_a
## [1] 0.3775407
The probability of getting an A with 40 hours of study and a 3.5 GPA is 0.378 (≈ 37.8%).
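As a cross-check, the same probability can be computed with base R's logistic CDF `plogis()`:

# Equivalent computation using the logistic CDF
plogis(beta_0 + beta_1 * X1_a + beta_2 * X2_a)

## [1] 0.3775407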
We set the probability to 50%, so:
\[ \log\left(\frac{0.5}{1 - 0.5}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]
Since \(\log(1) = 0\), we solve for \(X_1\):
\[ 0 = -6 + 0.05X_1 + 1(3.5) \]
\[ 0.05X_1 = 6 - 3.5 \]
\[ X_1 = \frac{2.5}{0.05} = 50 \text{ hours} \]
# GPA is held fixed at 3.5
X2_b <- 3.5
# Solve for the required hours (X1) for a 50% probability:
# 0 = beta_0 + beta_1 * X1 + beta_2 * X2  =>  X1 = (-beta_0 - beta_2 * X2) / beta_1
X1_b <- (-beta_0 - beta_2 * X2_b) / beta_1
# Print result
X1_b
## [1] 50
The student would need to study 50 hours to have a 50% chance of getting an A in the class.
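A quick sanity check: plugging the 50 hours back into the logistic function gives a linear predictor of 0 and therefore a probability of exactly 0.5.

# Sanity check: -6 + 0.05 * 50 + 1 * 3.5 = 0, so the probability is 0.5
plogis(beta_0 + beta_1 * X1_b + beta_2 * X2_b)

## [1] 0.5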
We are given two classification methods: logistic regression and KNN with \(K = 1\). We analyze the results:

Since we care about generalization to new observations, we should focus on the test error rate rather than the training error.

- KNN with \(K = 1\) is likely overfitting: each training observation is its own nearest neighbor, so it memorizes the training data perfectly (0% training error) but may not perform well on unseen data.
- Logistic regression has a higher test error rate, but it may be more stable for new predictions.

Since the test error rate for logistic regression (30%) is not explicitly compared to the test error rate of KNN, we cannot definitively say that KNN generalizes better. However, given that KNN with \(K = 1\) typically overfits, logistic regression is likely the better choice for classifying new observations.

If we were to consider KNN, we should try higher values of \(K\) to balance bias and variance, as the sketch below illustrates.
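To make the \(K = 1\) point concrete, here is a minimal sketch on simulated data (not the data from the exercise; the sample size, predictors, and the alternative value \(K = 15\) are arbitrary choices) using `knn()` from the class package. With \(K = 1\) the training error is 0% by construction, while a larger \(K\) typically trades a small increase in training error for a lower test error.

library(class)  # for knn()

set.seed(1)
n <- 200
# Simulated training and test data with a noisy linear class boundary
train_x <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
train_y <- factor(ifelse(train_x$X1 + train_x$X2 + rnorm(n) > 0, 1, 0))
test_x  <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
test_y  <- factor(ifelse(test_x$X1 + test_x$X2 + rnorm(n) > 0, 1, 0))

# Compare training and test error for K = 1 and a larger K
for (k in c(1, 15)) {
  train_err <- mean(knn(train_x, train_x, train_y, k = k) != train_y)
  test_err  <- mean(knn(train_x, test_x,  train_y, k = k) != test_y)
  cat("K =", k, " training error:", train_err, " test error:", test_err, "\n")
}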
Odds and probability are related by the formula:
\[ P = \frac{\text{Odds}}{1 + \text{Odds}} \]
Conversely, if given a probability \(P\), the odds are calculated as:
\[ \text{Odds} = \frac{P}{1 - P} \]
We are given:
\[ \text{Odds} = 0.37 \]
Using the formula:
\[ P = \frac{0.37}{1 + 0.37} \]
# Given odds
odds_a <- 0.37
# Compute probability
prob_a <- odds_a / (1 + odds_a)
# Print result
prob_a
## [1] 0.270073
The fraction of people who default is approximately 0.27 (or 27%).
The formula to compute odds from probability is:
\[ \text{Odds} = \frac{P}{1 - P} \]
We are given:
\[ P = 0.16 \]
Substituting the values:
\[ \text{Odds} = \frac{0.16}{1 - 0.16} = \frac{0.16}{0.84} \]
# Given probability
prob_b <- 0.16
# Compute odds
odds_b <- prob_b / (1 - prob_b)
# Print result
odds_b
## [1] 0.1904762
The odds of defaulting are approximately 0.19, i.e., about 19 defaults for every 100 people who do not default.
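As a sanity check, the two formulas are inverses of each other, so converting each result back recovers the original input:

# Round-trip check: the conversions undo each other
odds_b / (1 + odds_b) # back to a probability: 0.16
prob_a / (1 - prob_a) # back to odds: 0.37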