question 5

We now examine the differences between LDA and QDA.

  1. If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?
  1. If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on the training set? On the test set?
  1. In general, as the sample size n increases, do we expect the test prediction accuracy of QDA relative to LDA to improve, decline, or be unchanged? Why?
  1. True or False: Even if the Bayes decision boundary for a given problem is linear, we will probably achieve a superior test error rate using QDA rather than LDA because QDA is flexible enough to model a linear decision boundary. Justify your answer.

question 6

Suppose we collect data for a group of students in a statistics class with variables $X_1$ = hours studied, $X_2$ = undergrad GPA, and $Y$ = receive an A. We fit a logistic regression and produce estimated coefficients $\hat{\beta}_0 = -6$, $\hat{\beta}_1 = 0.05$, $\hat{\beta}_2 = 1$.

  1. Estimate the probability that a student who studies for 40 hours and has an undergrad GPA of 3.5 gets an A in the class.
# Coefficients
b0 <- -6
b1 <- 0.05
b2 <- 1

# Predictors
X1 <- 40      # hours studied
X2 <- 3.5     # GPA

# Compute the linear predictor
logit_val <- b0 + b1 * X1 + b2 * X2

# Convert log-odds to probability
prob <- 1 / (1 + exp(-logit_val))

cat("The estimated probability of getting an A is:", prob, "\n")
## The estimated probability of getting an A is: 0.3775407
  1. How many hours would the student in part (a) need to study to have a 50% chance of getting an A in the class?
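target_prob <- 0.5
X2_fixed <- 3.5  # same GPA as in part (a)

# Set the linear predictor equal to the target log-odds and solve for X1:
#   log(p / (1 - p)) = b0 + b1*X1_needed + b2*X2_fixed
#   => X1_needed = (log(p / (1 - p)) - b0 - b2*X2_fixed) / b1
# For p = 0.5 the log-odds are 0.
target_logit <- log(target_prob / (1 - target_prob))

X1_needed <- (target_logit - b0 - b2 * X2_fixed) / b1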

cat("Hours needed for a 50% chance of getting an A:", X1_needed, "\n")
## Hours needed for a 50% chance of getting an A: 50
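
As a quick sanity check on both parts, the fitted model can be wrapped in a small function and evaluated at 40 and at 50 hours. This is only a sketch: `p_hat` and `p_check` are names introduced here for illustration, and `plogis()` is base R's logistic function, equivalent to `1 / (1 + exp(-x))`.

```r
# Coefficients from the exercise
b0 <- -6
b1 <- 0.05
b2 <- 1

# Hypothetical helper: estimated probability of an A for given hours and GPA
p_hat <- function(hours, gpa) {
  plogis(b0 + b1 * hours + b2 * gpa)
}

# Part (a): 40 hours, GPA 3.5
cat("P(A | 40 hours, GPA 3.5):", p_hat(40, 3.5), "\n")

# Part (b): plugging the 50 hours back in should return the 50% target
p_check <- p_hat(50, 3.5)
cat("P(A | 50 hours, GPA 3.5):", p_check, "\n")
```

If the second value comes out at 0.5, the answer to part (b) is consistent with the model used in part (a).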

question 8

Although 1-nearest neighbors achieves a lower error rate when averaged over the training and test sets, what matters for classifying new observations is the test error rate alone. With K = 1, each training observation is its own nearest neighbor, so 1-NN has a training error of 0%, which pulls its average down. Its test error must therefore be about twice the reported average, roughly 36% if the training and test sets are the same size. Logistic regression, by contrast, has a test error of 30%. Logistic regression should therefore be preferred for classifying new observations in this scenario, since it has the lower error on unseen data.
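
The arithmetic behind the 36% figure can be sketched as follows. It assumes an average 1-NN error of 18% (the value consistent with the 36% quoted above), equally sized training and test sets, and a 1-NN training error of exactly 0%. The variable names are introduced here for illustration.

```r
# Assumption: 1-NN error averaged over training and test sets is 18%
avg_err_1nn <- 0.18

# With K = 1 each training point is its own nearest neighbor,
# so the training error is taken to be 0
train_err_1nn <- 0

# Assumption: equal-sized sets, so avg = (train + test) / 2
test_err_1nn <- 2 * avg_err_1nn - train_err_1nn

# Test error of logistic regression, as stated in the answer above
test_err_logistic <- 0.30

cat("Implied 1-NN test error:", test_err_1nn, "\n")
cat("Logistic regression test error:", test_err_logistic, "\n")
cat("Prefer logistic regression:", test_err_logistic < test_err_1nn, "\n")
```

If the two sets differ in size, the average would have to be weighted accordingly and the implied test error would change.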

question 9

This problem has to do with odds.

  1. On average, what fraction of people with an odds of 0.37 of defaulting on their credit card payment will in fact default?
odds_1 <- 0.37
p_1 <- odds_1 / (1 + odds_1)
cat("Probability of default if odds are 0.37:", p_1, "\n")
## Probability of default if odds are 0.37: 0.270073
  1. Suppose that an individual has a 16% chance of defaulting on her credit card payment. What are the odds that she will default?
p_2 <- 0.16
odds_2 <- p_2 / (1 - p_2)
cat("Odds of default if probability is 0.16:", odds_2, "\n")
## Odds of default if probability is 0.16: 0.1904762