3. This problem relates to the QDA model, in which the observations within each class are drawn from a normal distribution with a class-specific mean vector and a class-specific covariance matrix. We consider the simple case where $p = 1$; i.e. there is only one feature. Suppose that we have $K$ classes, and that if an observation belongs to the $k$th class then $X$ comes from a one-dimensional normal distribution, $X \sim N(\mu_k, \sigma_k^2)$. Recall that the density function for the one-dimensional normal distribution is given in (4.16). Prove that in this case, the Bayes classifier is not linear. Argue that it is in fact quadratic. Hint: For this problem, you should follow the arguments laid out in Section 4.4.1, but without making the assumption that $\sigma_1^2 = \cdots = \sigma_K^2$.
answer: By replacing $\sigma$ with $\sigma_k$ in the second equation in question 2, and by not dropping the term $\frac{1}{\sqrt{2\pi}\,\sigma_k}$ in the numerator before taking the log, the discriminant function is as follows:
$$\delta_k(x) = \log(\pi_k) - \log(\sigma_k) - \frac{\mu_k^2}{2\sigma_k^2} + \frac{x\,\mu_k}{\sigma_k^2} - \frac{x^2}{2\sigma_k^2}$$
Because the coefficient on $x^2$, namely $-\frac{1}{2\sigma_k^2}$, differs across classes and therefore does not cancel when the discriminants are compared, the Bayes classifier cannot be linear in $x$; it is in fact quadratic.
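To make the argument concrete, here is a minimal Python sketch (not part of the original solution; the helper name `qda_discriminant` and the class parameters are illustrative choices) that evaluates the discriminant above for two classes with unequal variances and shows that comparing them leaves an $x^2$ term in play.

```python
import numpy as np

def qda_discriminant(x, pi_k, mu_k, sigma_k):
    """delta_k(x) for p = 1 QDA; the constant -log(sqrt(2*pi)) common to
    every class is dropped, as in the derivation above."""
    return (np.log(pi_k) - np.log(sigma_k)
            - mu_k**2 / (2 * sigma_k**2)
            + x * mu_k / sigma_k**2
            - x**2 / (2 * sigma_k**2))

# Illustrative parameters: two classes with different variances.
x = np.linspace(-5.0, 5.0, 11)
d1 = qda_discriminant(x, pi_k=0.5, mu_k=0.0, sigma_k=1.0)
d2 = qda_discriminant(x, pi_k=0.5, mu_k=1.0, sigma_k=2.0)

# The Bayes rule assigns x to the class with the larger discriminant. Because
# d1 - d2 still contains an x**2 term (the variances differ), the resulting
# decision boundary is quadratic in x, not linear.
print(np.where(d1 > d2, 1, 2))
```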
5. We now examine the differences between LDA and QDA. (a) If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set? Answer: Even when the Bayes decision boundary is linear, we expect QDA to perform better on the training set, because its greater flexibility lets it fit the training data more closely. On the test set, however, we expect LDA to perform better, since QDA's extra flexibility can overfit noise when the true boundary is linear.
(b) If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on the training set? On the test set? Answer: If the Bayes decision boundary is non-linear, we expect QDA to perform better on both the training and test sets.
(c) In general, as the sample size n increases, do we expect the test prediction accuracy of QDA relative to LDA to improve, decline, or be unchanged? Why? Answer: Improve. QDA is more flexible than LDA and therefore has higher variance; with a small training set this variance hurts test accuracy, but as n grows the variance shrinks and is no longer a major concern, so QDA's test accuracy relative to LDA improves.
(d) True or False: Even if the Bayes decision boundary for a given problem is linear, we will probably achieve a superior test error rate using QDA rather than LDA because QDA is flexible enough to model a linear decision boundary. Justify your answer. Answer: False. When the number of sample points is small, a more flexible method such as QDA is prone to overfitting, which can result in a worse test error rate than LDA when the true boundary is linear.
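As a rough empirical check of (a) and (d), the following sketch (assuming scikit-learn is available; the simulated class means, shared covariance, and sample sizes are arbitrary illustrative choices) generates data whose Bayes boundary is linear and compares LDA and QDA on a small training set and a large test set.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)
cov = [[1.0, 0.3], [0.3, 1.0]]  # shared covariance => linear Bayes boundary

def simulate(n_per_class):
    X = np.vstack([
        rng.multivariate_normal([0.0, 0.0], cov, size=n_per_class),
        rng.multivariate_normal([1.0, 1.0], cov, size=n_per_class),
    ])
    y = np.repeat([0, 1], n_per_class)
    return X, y

X_train, y_train = simulate(50)      # small training set
X_test, y_test = simulate(5000)      # large test set

for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    model.fit(X_train, y_train)
    print(type(model).__name__,
          "train:", round(model.score(X_train, y_train), 3),
          "test:", round(model.score(X_test, y_test), 3))
```

Under a setup like this one would typically see QDA fit the training set at least as well as LDA while LDA matches or beats it on the test set, consistent with the reasoning above.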
6. Suppose we collect data for a group of students in a statistics class with variables $X_1$ = hours studied, $X_2$ = undergrad GPA, and $Y$ = receive an A. We fit a logistic regression and produce estimated coefficients $\hat{\beta}_0 = -6$, $\hat{\beta}_1 = 0.05$, $\hat{\beta}_2 = 1$.
(a) Estimate the probability that a student who studies for 40 h and has an undergrad GPA of 3.5 gets an A in the class.
$$\hat{p}(X) = \frac{e^{-6 + 0.05X_1 + X_2}}{1 + e^{-6 + 0.05X_1 + X_2}} = \frac{e^{-6 + 0.05(40) + 3.5}}{1 + e^{-6 + 0.05(40) + 3.5}} = \frac{e^{-0.5}}{1 + e^{-0.5}} \approx 0.3775.$$
(b) How many hours would the student in part (a) need to study to have a 50% chance of getting an A in the class?
$$\frac{e^{-6 + 0.05X_1 + 3.5}}{1 + e^{-6 + 0.05X_1 + 3.5}} = 0.5,$$
which simplifies to $e^{-6 + 0.05X_1 + 3.5} = 1$. Taking the logarithm of both sides gives $-6 + 0.05X_1 + 3.5 = 0$, so $X_1 = \frac{2.5}{0.05} = 50$ hours.
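A quick numeric check of parts (a) and (b), assuming only the fitted coefficients given above ($\hat{\beta}_0 = -6$, $\hat{\beta}_1 = 0.05$, $\hat{\beta}_2 = 1$); the helper name `prob_A` is just for illustration:

```python
import math

b0, b1, b2 = -6.0, 0.05, 1.0

def prob_A(hours, gpa):
    """Estimated P(Y = A | X1 = hours, X2 = gpa) from the logistic model."""
    z = b0 + b1 * hours + b2 * gpa
    return math.exp(z) / (1 + math.exp(z))

# (a) 40 hours of study and a GPA of 3.5:
print(round(prob_A(40, 3.5), 4))       # ~0.3775

# (b) Hours needed for a 50% chance: set b0 + b1*X1 + b2*3.5 = 0 and solve.
hours_needed = -(b0 + b2 * 3.5) / b1
print(hours_needed)                    # 50.0
```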
9. This problem has to do with odds.
(a) On average, what fraction of people with an odds of 0.37 of defaulting on their credit card payment will in fact default?
$$\frac{p(X)}{1 - p(X)} = 0.37 \implies p(X) = \frac{0.37}{1 + 0.37} \approx 0.27.$$
So, on average, about 27% of people with these odds will in fact default on their credit card payment.
(b) Suppose that an individual has a 16% chance of defaulting on her credit card payment. What are the odds that she will default?
$$\frac{p(X)}{1 - p(X)} = \frac{0.16}{1 - 0.16} \approx 0.19,$$
so the odds of default are about 0.19.
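The two conversions can be checked with a few lines of Python; the helper names below are illustrative, and the only facts used are $p = \text{odds}/(1 + \text{odds})$ and $\text{odds} = p/(1 - p)$.

```python
def odds_to_prob(odds):
    """Convert odds p/(1-p) back to a probability p."""
    return odds / (1 + odds)

def prob_to_odds(p):
    """Convert a probability p to odds p/(1-p)."""
    return p / (1 - p)

print(round(odds_to_prob(0.37), 2))   # (a) ~0.27, so about 27% default
print(round(prob_to_odds(0.16), 2))   # (b) odds of about 0.19
```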