Q5. We now examine the differences between LDA and QDA.
(c) In general, as the sample size n increases, do we expect the
test prediction accuracy of QDA relative to LDA to improve, decline, or
be unchanged? Why?
- As the sample size n increases, we expect QDA's test prediction
accuracy relative to LDA to improve.
- With small sample sizes, QDA suffers from high variance because it
must estimate a separate covariance matrix for each class, i.e. many
more parameters than LDA (see the parameter-count sketch below).
- With large sample sizes, these estimates become stable, allowing QDA
to exploit its additional flexibility without a large variance penalty.
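For a sense of scale, here is a small illustrative computation of how many
parameters each method estimates (the choices p = 10 and K = 2 are arbitrary
assumptions, not given in the problem): LDA fits one shared covariance matrix,
QDA fits one per class.

# Parameter counts for LDA vs. QDA (class means + covariance entries;
# class priors ignored). p and K below are illustrative choices.
p <- 10                                    # number of predictors (assumed)
K <- 2                                     # number of classes (assumed)
lda_params <- K * p + p * (p + 1) / 2      # one shared covariance matrix
qda_params <- K * p + K * p * (p + 1) / 2  # a separate covariance matrix per class
lda_params
## [1] 75
qda_params
## [1] 130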
(d) True or False: Even if the Bayes decision boundary for a given
problem is linear, we will probably achieve a superior test error rate
using QDA rather than LDA because QDA is flexible enough to model a
linear decision boundary. Justify your answer.
- False. If the true boundary is linear, LDA is preferable: its
assumption of a common covariance matrix matches the truth, and it
requires estimating far fewer parameters.
- QDA is flexible enough to model a linear decision boundary, but the
extra parameters it estimates introduce unnecessary variance, which
typically leads to a higher test error rate when sample sizes are small
(the simulation sketch below illustrates this).
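A minimal simulation sketch of this point (not part of the assignment; the
class means, sample sizes, and seed are arbitrary assumptions): both classes
share a common covariance matrix, so the Bayes boundary is linear, and we
compare LDA and QDA test error using MASS::lda and MASS::qda.

# Simulate two Gaussian classes with a common covariance (linear Bayes
# boundary) and compare LDA vs. QDA test error for small and large n.
library(MASS)
set.seed(1)
gen <- function(n) {
  y <- rbinom(n, 1, 0.5)
  x <- t(sapply(y, function(cls) mvrnorm(1, mu = c(cls, cls), Sigma = diag(2))))
  data.frame(X1 = x[, 1], X2 = x[, 2], y = factor(y))
}
compare_errors <- function(n_train, n_test = 10000) {
  train <- gen(n_train)
  test  <- gen(n_test)
  lda_fit <- lda(y ~ X1 + X2, data = train)
  qda_fit <- qda(y ~ X1 + X2, data = train)
  c(LDA = mean(predict(lda_fit, test)$class != test$y),
    QDA = mean(predict(qda_fit, test)$class != test$y))
}
compare_errors(n_train = 30)    # small n: QDA's test error is typically the larger one
compare_errors(n_train = 2000)  # large n: the two test errors are nearly identical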
Q6. Suppose we collect data for a group of students in a statistics
class with variables X1 = hours studied, X2 = undergrad GPA, and Y =
receive an A. We fit a logistic regression and produce estimated
coefficients β̂0 = −6, β̂1 = 0.05, and β̂2 = 1.
(a) Estimate the probability that a student who studies for 40 h and
has an undergrad GPA of 3.5 gets an A in the class.
# Define the coefficients
beta_0 <- -6
beta_1 <- 0.05
beta_2 <- 1
# (a) Probability estimation
X1 <- 40 # hours studied
X2 <- 3.5 # undergrad GPA
logit_prob <- exp(beta_0 + beta_1 * X1 + beta_2 * X2) / (1 + exp(beta_0 + beta_1 * X1 + beta_2 * X2))
logit_prob
## [1] 0.3775407
- A student who studies 40 hours and has a GPA of 3.5 has an estimated
37.75% probability of getting an A in the class.
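Equivalently, the same probability can be computed with R's built-in logistic
distribution function plogis(), which evaluates 1 / (1 + exp(−x)):

# Same calculation using the built-in logistic CDF
plogis(beta_0 + beta_1 * X1 + beta_2 * X2)
## [1] 0.3775407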
(b) How many hours would the student in part (a) need to study to
have a 50 % chance of getting an A in the class?
# (b) Solving for X1 when probability = 0.5
X2_fixed <- 3.5
X1_needed <- (log(0.5 / (1 - 0.5)) - beta_0 - beta_2 * X2_fixed) / beta_1
X1_needed
## [1] 50
- The student would need to study 50 hours to have a 50% probability
of getting an A.
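As a quick check, plugging X1 = 50 back into the model gives a linear
predictor of −6 + 0.05 · 50 + 1 · 3.5 = 0, and plogis(0) = 0.5:

# Verify: with X1 = 50 the linear predictor is 0, so the probability is 0.5
plogis(beta_0 + beta_1 * X1_needed + beta_2 * X2_fixed)
## [1] 0.5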
Q8. Suppose that we take a data set, divide it into equally-sized
training and test sets, and then try out two different classification
procedures. First we use logistic regression and get an error rate of 20
% on the training data and 30 % on the test data. Next we use 1-nearest
neighbors (i.e. K = 1) and get an average error rate (averaged over both
test and training data sets) of 18 %. Based on these results, which
method should we prefer to use for classification of new observations?
Why?
- Logistic Regression: Training Error = 20%, Test Error =
30%
- 1-Nearest Neighbor (K=1): Average Error = 18%
Which method should we prefer?
- KNN with K = 1 classifies each training observation by its own label,
so its training error is 0%. Since 18% is the average of the training
and test errors, its test error must be 2 × 18% − 0% = 36% (the
arithmetic is written out below).
- Logistic regression's test error of 30% is lower than KNN's implied
36%, so logistic regression should be preferred for classifying new
observations.
- KNN (K=1) is highly flexible and over-fits the training data, which
is exactly why its test error is so much worse than its training error.
- If one wanted to stay with KNN, increasing K (e.g., K=3 or 5) would
reduce the over-fitting.
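The arithmetic behind this conclusion, written out in R:

# 1-NN labels each training point with its own class, so its training error is 0%.
# The 18% figure is the average of training and test error, so:
knn_train_error <- 0
knn_avg_error   <- 0.18
knn_test_error  <- 2 * knn_avg_error - knn_train_error
knn_test_error
## [1] 0.36
logreg_test_error <- 0.30
knn_test_error > logreg_test_error  # TRUE, so prefer logistic regression
## [1] TRUE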
Q9. This problem has to do with odds.
(a) On average, what fraction of people with an odds of 0.37 of
defaulting on their credit card payment will in fact default?
odds <- 0.37
default_prob <- odds / (1 + odds)
default_prob
## [1] 0.270073
- A person with odds of 0.37 has an estimated 27% probability of
defaulting.
(b) Suppose that an individual has a 16 % chance of defaulting on
her credit card payment. What are the odds that she will default?
prob <- 0.16
default_odds <- prob / (1 - prob)
default_odds
## [1] 0.1904762
- If a person has a 16% probability of defaulting, the corresponding
odds are 0.19.