#3. This problem relates to the QDA model, in which the observations within each class are drawn from a normal distribution with a class specific mean vector and a class specific covariance matrix. We consider the simple case where p = 1; i.e. there is only one feature. Suppose that we have K classes, and that if an observation belongs to the kth class then X comes from a one-dimensional normal distribution, X ∼ N(µ_k, σ_k^2). Recall that the density function for the one-dimensional normal distribution is given in (4.16). Prove that in this case, the Bayes classifier is not linear. Argue that it is in fact quadratic. Hint: For this problem, you should follow the arguments laid out in Section 4.4.1, but without making the assumption that σ_1^2 = … = σ_K^2.

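# The Bayes classifier assigns x to the class k that maximizes π_k f_k(x), where f_k is the N(µ_k, σ_k^2) density. Taking logs and dropping the constant -(1/2)log(2π), which is the same for every class, gives the discriminant
# δ_k(x) = log(π_k) - log(σ_k) - (x - µ_k)^2/(2σ_k^2)
#        = -x^2/(2σ_k^2) + µ_k x/σ_k^2 - µ_k^2/(2σ_k^2) - log(σ_k) + log(π_k)
# Because σ_k^2 differs across classes, the x^2 term does not cancel when two classes are compared, so the discriminant (and hence the Bayes decision boundary) is quadratic in x, not linear.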
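
# As a numerical sanity check, the sketch below computes the coefficients of δ_1(x) - δ_2(x) for two classes. The function names and the example means, standard deviations, and priors are hypothetical and chosen only for illustration; they are not part of the exercise.

```python
import math


def discriminant(x, mu, sigma, prior):
    """Log of prior * normal density, dropping the class-independent constant."""
    return math.log(prior) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)


def boundary_coefficients(mu1, sigma1, prior1, mu2, sigma2, prior2):
    """Return (a, b, c) such that delta_1(x) - delta_2(x) = a*x^2 + b*x + c."""
    a = -1 / (2 * sigma1 ** 2) + 1 / (2 * sigma2 ** 2)
    b = mu1 / sigma1 ** 2 - mu2 / sigma2 ** 2
    c = (-mu1 ** 2 / (2 * sigma1 ** 2) + mu2 ** 2 / (2 * sigma2 ** 2)
         - math.log(sigma1) + math.log(sigma2)
         + math.log(prior1) - math.log(prior2))
    return a, b, c


# Hypothetical example: unequal variances leave a nonzero x^2 coefficient.
a, b, c = boundary_coefficients(0.0, 1.0, 0.5, 1.0, 2.0, 0.5)

# With equal variances the x^2 coefficient vanishes and the boundary is linear.
a_equal, b_equal, c_equal = boundary_coefficients(0.0, 1.0, 0.5, 1.0, 1.0, 0.5)
```

# With unequal variances the coefficient a is nonzero, so the boundary δ_1(x) = δ_2(x) is a quadratic equation in x. With equal variances a is zero, which recovers the linear (LDA) case.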

#5. We now examine the differences between LDA and QDA.

##(a) If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?

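# If the Bayes decision boundary is linear, we expect QDA to perform better on the training set, because its higher flexibility yields a closer fit. On the test set we expect LDA to perform better, because QDA's extra flexibility adds variance without any offsetting reduction in bias.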

##(b) If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on the training set? On the test set?

# If the Bayes decision boundary is non-linear, we expect QDA to perform better on both the training and test sets.

##(c) In general, as the sample size n increases, do we expect the test prediction accuracy of QDA relative to LDA to improve, decline, or be unchanged? Why?

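# We expect the test prediction accuracy of QDA relative to LDA to improve as n increases. QDA is more flexible and therefore has higher variance, but that variance becomes less of a concern as the training set grows.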

##(d) True or False: Even if the Bayes decision boundary for a given problem is linear, we will probably achieve a superior test error rate using QDA rather than LDA because QDA is flexible enough to model a linear decision boundary. Justify your answer.

# False. With fewer sample points, the variance from using a more flexible method like QDA may lead to overfitting, which in turn leads to an inferior test error rate.

#6. Suppose we collect data for a group of students in a statistics class with variables X1 = hours studied, X2 = undergrad GPA, and Y = receive an A. We fit a logistic regression and produce estimated coefficients, β̂_0 = −6, β̂_1 = 0.05, β̂_2 = 1.

##(a) Estimate the probability that a student who studies for 40 h and has an undergrad GPA of 3.5 gets an A in the class.

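# p(X) = e^(-6 + 0.05 x_1 + x_2) / (1 + e^(-6 + 0.05 x_1 + x_2))
# Plugging in x_1 = 40 and x_2 = 3.5 gives an exponent of -6 + 2 + 3.5 = -0.5, so p(X) = e^(-0.5)/(1 + e^(-0.5)) ≈ 0.3775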
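
# The arithmetic can be checked with a short Python sketch. The function name prob_of_a is hypothetical; the default coefficients are the estimates given in the problem statement.

```python
import math


def prob_of_a(hours, gpa, b0=-6.0, b1=0.05, b2=1.0):
    """Logistic-regression probability of receiving an A."""
    z = b0 + b1 * hours + b2 * gpa  # linear predictor (log-odds)
    return 1.0 / (1.0 + math.exp(-z))  # equivalent to e^z / (1 + e^z)


p = prob_of_a(40, 3.5)
print(round(p, 4))  # 0.3775
```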

##(b) How many hours would the student in part (a) need to study to have a 50 % chance of getting an A in the class?

# p(X) = e^(-6 + 0.05 x_1 + x_2) / (1 + e^(-6 + 0.05 x_1 + x_2))
# Setting p(X) = 0.5 means the exponent must equal 0, so -6 + 0.05 x_1 + 3.5 = 0. Solving for x_1 gives 50, so the student needs to study 50 hours.
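
# A minimal sketch of the same calculation, inverting the logistic model for hours studied. The function name hours_for_probability is hypothetical; the default coefficients are the estimates given in the problem statement.

```python
import math


def hours_for_probability(p, gpa, b0=-6.0, b1=0.05, b2=1.0):
    """Hours of study needed to reach probability p at a given GPA."""
    log_odds = math.log(p / (1.0 - p))  # equals 0 when p = 0.5
    return (log_odds - b0 - b2 * gpa) / b1


hours = hours_for_probability(0.5, 3.5)  # approximately 50
```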

#9. This problem has to do with odds.

##(a) On average, what fraction of people with an odds of 0.37 of defaulting on their credit card payment will in fact default?

# We first note that
# p(X)/(1-p(X)) = 0.37
# Rearranging gives
# p(X) = 0.37/(1+0.37) ≈ 0.27
# So on average, about 27% of people with odds of 0.37 will in fact default.

##(b) Suppose that an individual has a 16 % chance of defaulting on her credit card payment. What are the odds that she will default?

# 0.16/(1-0.16) ≈ 0.19
# The odds that she will default are about 0.19.
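
# Both conversions in parts (a) and (b) can be sketched as a pair of helper functions. The names odds_to_prob and prob_to_odds are hypothetical and introduced only for illustration.

```python
def odds_to_prob(odds):
    """Convert odds p/(1-p) back to the probability p."""
    return odds / (1.0 + odds)


def prob_to_odds(p):
    """Convert a probability p to odds p/(1-p)."""
    return p / (1.0 - p)


default_fraction = odds_to_prob(0.37)  # part (a): about 0.27
default_odds = prob_to_odds(0.16)      # part (b): about 0.19
```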