Exercise 3

5. We now examine the differences between LDA and QDA.

a. If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?

Training set: QDA is expected to fit at least as well as LDA, because its extra flexibility lets it follow noise in the training data even when the true boundary is linear. Test set: LDA is expected to perform better, because its linear boundary already matches the form of the Bayes boundary, so QDA's extra flexibility adds variance without reducing bias.

b. If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on the training set? On the test set?

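QDA is expected to perform better on both. Its quadratic boundary can approximate a non-linear Bayes boundary, whereas LDA is restricted to a linear boundary and so suffers from bias on the training and test sets alike.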

c. In general, as the sample size n increases, do we expect the test prediction accuracy of QDA relative to LDA to improve, decline, or be unchanged? Why?

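It should improve. QDA is the more flexible method and so has higher variance, but that variance shrinks as n grows while its lower bias remains. With more data the variance cost matters less, so the test accuracy of QDA improves relative to LDA.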

d. True or False: Even if the Bayes decision boundary for a given problem is linear, we will probably achieve a superior test error rate using QDA rather than LDA because QDA is flexible enough to model a linear decision boundary. Justify your answer.

False. While QDA can model a linear decision boundary, it introduces unnecessary variance when the true boundary is linear. LDA will likely yield a better test error rate in this case.
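The claims in parts (a) and (d) can be checked empirically. The following is a minimal simulation sketch, not part of the original solution: it assumes two Gaussian classes with a shared covariance matrix (so the Bayes boundary is linear), and the sample sizes, means, covariance, and helper names `simulate` and `error_rate` are all illustrative choices. It relies on `lda()`, `qda()`, and `mvrnorm()` from the MASS package.

```r
# Sketch: compare LDA and QDA when the true (Bayes) boundary is linear.
# Assumption: two Gaussian classes sharing one covariance matrix.
library(MASS)  # provides lda(), qda(), mvrnorm()

set.seed(1)

# Draw n observations per class (hypothetical means and covariance).
simulate <- function(n) {
  Sigma  <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
  class0 <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)
  class1 <- mvrnorm(n, mu = c(1, 1), Sigma = Sigma)
  data.frame(
    x1 = c(class0[, 1], class1[, 1]),
    x2 = c(class0[, 2], class1[, 2]),
    y  = factor(rep(c(0, 1), each = n))
  )
}

# Misclassification rate of a fitted lda/qda model on a data set.
error_rate <- function(fit, data) {
  mean(predict(fit, data)$class != data$y)
}

train <- simulate(50)    # small training set, where variance matters
test  <- simulate(5000)  # large test set for a stable error estimate

lda_fit <- lda(y ~ x1 + x2, data = train)
qda_fit <- qda(y ~ x1 + x2, data = train)

errors <- rbind(
  train = c(lda = error_rate(lda_fit, train), qda = error_rate(qda_fit, train)),
  test  = c(lda = error_rate(lda_fit, test),  qda = error_rate(qda_fit, test))
)
print(errors)
```

A single run is noisy, so the comparison is more convincing when repeated over many seeds and averaged. Increasing the training size in `simulate(50)` is one way to explore part (c).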

6. Suppose we collect data for a group of students in a statistics class with variables X1 = hours studied, X2 = undergrad GPA, and Y = receive an A. We fit a logistic regression and produce estimated coefficients β̂0 = −6, β̂1 = 0.05, β̂2 = 1.

a. Estimate the probability that a student who studies for 40h and has an undergrad GPA of 3.5 gets an A in the class.

b0 <- -6
b1 <- 0.05
b2 <- 1
exp(b0 + b1*40 + b2*3.5) / (1 + exp(b0 + b1*40 + b2*3.5))
## [1] 0.3775407

37.75%
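As a cross-check, the same probability can be computed with R's built-in logistic function `plogis()`. This is a minimal sketch; the names `beta`, `student`, `log_odds`, and `p_a` are illustrative and do not appear in the original solution.

```r
# Coefficients from the exercise: intercept, hours studied, undergrad GPA.
beta <- c(-6, 0.05, 1)

# Predictor vector for the student; the leading 1 multiplies the intercept.
student <- c(1, 40, 3.5)

# Linear predictor (log-odds): -6 + 0.05 * 40 + 1 * 3.5 = -0.5
log_odds <- sum(beta * student)

# plogis(z) computes exp(z) / (1 + exp(z)).
p_a <- plogis(log_odds)
print(p_a)
```

The result agrees with the value obtained from the explicit formula.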

b. How many hours would the student in part (a) need to study to have a 50% chance of getting an A in the class?

Set p(X) = 0.5, which makes the odds equal to 1 and the log-odds equal to 0: −6 + 0.05·X1 + 1·3.5 = 0. This simplifies to 0.05·X1 = 2.5, so X1 = 50. The student needs to study for 50 hours to have a 50% chance of getting an A.
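The answer of 50 hours can be verified numerically by solving the log-odds equation for hours at a target probability of 0.5. This is a sketch under the coefficients given in the exercise; the function name `hours_for_prob` and its arguments are hypothetical helpers introduced only for illustration.

```r
# Solve log(p / (1 - p)) = b0 + b1 * hours + b2 * gpa for hours.
# Defaults are the estimated coefficients from the exercise.
hours_for_prob <- function(p, gpa, b0 = -6, b1 = 0.05, b2 = 1) {
  # qlogis(p) is the log-odds log(p / (1 - p)); it is 0 when p = 0.5.
  (qlogis(p) - b0 - b2 * gpa) / b1
}

hours_needed <- hours_for_prob(0.5, gpa = 3.5)
print(hours_needed)
```

Plugging the result back into the model with `plogis(-6 + 0.05 * hours_needed + 3.5)` should return a probability of 0.5, up to floating-point error.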

8. Suppose that we take a data set, divide it into equally-sized training and test sets, and then try out two different classification procedures. First we use logistic regression and get an error rate of 20% on the training data and 30% on the test data. Next we use 1-nearest neighbors (i.e. K = 1) and get an average error rate (averaged over both test and training data sets) of 18%. Based on these results, which method should we prefer to use for classification of new observations? Why?

With K = 1, each training observation is its own nearest neighbor, so the KNN training error is 0%. Because the training and test sets are the same size, an average error rate of 18% implies a test error of 2 × 18% − 0% = 36%. That is higher than the 30% test error of logistic regression, so logistic regression is preferable for classifying new observations.
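The key fact behind this exercise is that 1-nearest neighbors has a training error of zero, since each training point is its own nearest neighbor. The sketch below illustrates this on simulated data using `knn()` from the class package; the data-generating rule and all variable names are assumptions made for illustration. It then backs out the test error implied by the 18% average.

```r
library(class)  # provides knn()

set.seed(1)
n <- 100

# Hypothetical training data: two continuous predictors, noisy linear labels.
train_x <- matrix(rnorm(2 * n), ncol = 2)
train_y <- factor(ifelse(train_x[, 1] + train_x[, 2] + rnorm(n) > 0, "A", "B"))

# Predict the training points from the training set itself with K = 1.
train_pred <- knn(train = train_x, test = train_x, cl = train_y, k = 1)

# Each point's nearest neighbor is itself, so this is 0 as long as
# no two training points coincide.
train_err <- mean(train_pred != train_y)
print(train_err)

# Equal-sized sets: average = (train + test) / 2 = 0.18, so solve for test.
implied_test_err <- 2 * 0.18 - train_err
print(implied_test_err)
```

The implied test error can then be compared directly with the 30% test error reported for logistic regression.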

9. This problem has to do with odds.

a. On average, what fraction of people with an odds of 0.37 of defaulting on their credit card payment will in fact default?

Since odds = p / (1 − p), the probability is p = odds / (1 + odds) = 0.37 / 1.37 ≈ 0.27. So about 27% of people with odds of 0.37 will default.

b. Suppose that an individual has a 16% chance of defaulting on her credit card payment. What are the odds that she will default?

Odds = p / (1 − p) = 0.16 / 0.84 ≈ 0.19. The odds that she will default are about 0.19.
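Both parts of this problem use the same relationship between odds and probability, odds = p / (1 − p). A short sketch of the two conversions follows; the function names `odds_to_prob` and `prob_to_odds` are illustrative helpers, not part of the original solution.

```r
# Convert odds to a probability: p = odds / (1 + odds).
odds_to_prob <- function(odds) odds / (1 + odds)

# Convert a probability to odds: odds = p / (1 - p).
prob_to_odds <- function(p) p / (1 - p)

p_default    <- odds_to_prob(0.37)  # part (a)
odds_default <- prob_to_odds(0.16)  # part (b)

print(round(p_default, 4))     # 0.2701
print(round(odds_default, 4))  # 0.1905
```

The two functions are inverses of each other, so `prob_to_odds(odds_to_prob(0.37))` recovers 0.37 up to floating-point error.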