question 5
We now examine the differences between LDA and QDA.
If the true (Bayes) decision boundary is linear:
Training Set Performance: QDA has more parameters and can therefore fit the training data more flexibly. As a result, QDA will often fit the training data “better” than LDA (i.e., achieve lower training error), even though the true boundary is linear.
Test Set Performance: Because the true boundary is linear, LDA matches the true data-generating process and so will tend to give better (or at least no worse) test performance. QDA, being more flexible, tends to overfit in this setting, so its test error usually ends up higher than LDA’s.
If the true decision boundary is non-linear:
Training Set Performance: QDA is more flexible because it allows each class its own covariance matrix. Consequently, when the true decision boundary is non-linear, QDA can typically fit the training data better than LDA.
Test Set Performance: Because QDA can capture non-linear boundaries while LDA cannot, we usually expect QDA to have better test performance, provided there is enough data to estimate all of its parameters well. With a sufficient sample size, QDA approximates the true, non-linear decision boundary more closely than LDA can.
Effect of the sample size n: QDA requires estimating a separate covariance matrix for each class, so it has many more parameters to estimate than LDA, which estimates a single common covariance matrix. When n is small, QDA's parameter estimates have high variance, making it prone to overfitting and often giving worse test performance than LDA.
As n increases, the QDA parameter estimates become more stable and accurate, so the test accuracy of QDA relative to LDA improves. In other words, more data mitigates the risk of overfitting in QDA and lets its more flexible form pay off whenever the true boundary deviates from linearity. A small simulation illustrating the linear-boundary case is sketched below.
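To make the trade-off concrete, here is a minimal simulation sketch in R (assuming the MASS package is available; the data-generating setup is illustrative only and not part of the exercise). It draws a two-class problem whose Bayes boundary is linear and compares the training and test errors of LDA and QDA.
library(MASS) # provides lda() and qda()
set.seed(1)
n <- 400
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# True boundary is linear: the class depends only on x1 + x2 (plus noise)
df$y <- factor(df$x1 + df$x2 + rnorm(n, sd = 0.5) > 0)
train <- sample(n, n / 2)
lda_fit <- lda(y ~ x1 + x2, data = df, subset = train)
qda_fit <- qda(y ~ x1 + x2, data = df, subset = train)
# Misclassification rate of a fitted model on a given set of rows
err <- function(fit, rows) mean(predict(fit, df[rows, ])$class != df$y[rows])
cat("LDA train/test error:", err(lda_fit, train), err(lda_fit, -train), "\n")
cat("QDA train/test error:", err(qda_fit, train), err(qda_fit, -train), "\n")
In runs like this, QDA's training error is typically no higher than LDA's, while its test error tends to be slightly worse, because the extra covariance parameters add variance without reducing bias; swapping in a non-linear class rule (e.g. one based on x1 * x2) typically reverses the test-error ranking.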
question 6
Suppose we collect data for a group of students in a statistics class with variables X1 = hours studied, X2 = undergrad GPA, and Y = receive an A. We fit a logistic regression and obtain estimated coefficients β̂0 = -6, β̂1 = 0.05, and β̂2 = 1, so the fitted probability of an A is p(X) = exp(-6 + 0.05 X1 + X2) / (1 + exp(-6 + 0.05 X1 + X2)).
# Part (a): probability of an A for a student who studies 40 hours with a GPA of 3.5
# Coefficients from the fitted logistic regression
b0 <- -6
b1 <- 0.05
b2 <- 1
# Predictors
X1 <- 40 # hours studied
X2 <- 3.5 # GPA
# Compute the linear predictor
logit_val <- b0 + b1 * X1 + b2 * X2
# Convert log-odds to probability
prob <- 1 / (1 + exp(-logit_val))
cat("The estimated probability of getting an A is:", prob, "\n")
## The estimated probability of getting an A is: 0.3775407
# Part (b): hours of study needed for a 50% chance of an A, holding GPA fixed
target_prob <- 0.5
X2_fixed <- 3.5 # assume same GPA
# Solve 0 = b0 + b1*X1_needed + b2*X2_fixed
# => X1_needed = - (b0 + b2*X2_fixed) / b1
X1_needed <- -(b0 + b2 * X2_fixed) / b1
cat("Hours needed for a 50% chance of getting an A:", X1_needed, "\n")
## Hours needed for a 50% chance of getting an A: 50
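As a quick sanity check (reusing the variables defined above), plugging the 50 hours back into the logistic function should give a probability of exactly 0.5:
# Sanity check: at X1 = 50 and GPA 3.5 the log-odds are 0, so the probability is 0.5
1 / (1 + exp(-(b0 + b1 * X1_needed + b2 * X2_fixed)))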
question 8
Although 1-nearest neighbors achieves a lower overall error rate when averaging across the training and test sets, what truly matters for classifying new observations is the test error rate alone. With K = 1, each training point is its own nearest neighbor, so 1-NN classifies the training set perfectly (0% training error); this drags its average error down, but it implies a test error of roughly 36%, as the short calculation below makes explicit. Logistic regression, by contrast, has a 30% test error, which is the more reliable measure of future performance. Consequently, logistic regression should be preferred for classifying new observations in this scenario, since it has the lower expected error on unseen data.
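A minimal sketch of that arithmetic, assuming (as in the exercise statement) that the 18% error rate quoted for 1-NN is the average of its training and test errors:
avg_err_1nn <- 0.18 # assumed: 1-NN error averaged over the training and test sets
train_err_1nn <- 0 # with K = 1, each training point is its own nearest neighbor
test_err_1nn <- 2 * avg_err_1nn - train_err_1nn
cat("Implied 1-NN test error:", test_err_1nn, "vs 0.30 for logistic regression\n")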
question 9
This problem has to do with converting between odds and probabilities: odds = p / (1 - p), or equivalently p = odds / (1 + odds).
# Convert odds of 0.37 into a probability of default
odds_1 <- 0.37
p_1 <- odds_1 / (1 + odds_1)
cat("Probability of default if odds are 0.37:", p_1, "\n")
## Probability of default if odds are 0.37: 0.270073
# Convert a default probability of 0.16 into odds
p_2 <- 0.16
odds_2 <- p_2 / (1 - p_2)
cat("Odds of default if probability is 0.16:", odds_2, "\n")
## Odds of default if probability is 0.16: 0.1904762