NikhilBharadwaj_DA

(5) We now examine the differences between LDA and QDA.

a) If the Bayes decision boundary is linear, do we expect LDA or QDA to perform better on the training set? On the test set?

Imagine you’re trying to classify things into different categories based on some features. For example, distinguishing between different types of fruits based on their color and size.

Now, if the natural way to separate these fruits is with a straight line (linear decision boundary), then a method called Linear Discriminant Analysis (LDA) would likely work well. LDA assumes that the spread or variability in the features is similar for all types of fruits.

On the other hand, if the boundary between different types of fruits is not a straight line (non-linear), Quadratic Discriminant Analysis (QDA) might be more suitable. QDA allows for more flexibility by considering different variabilities for each type of fruit.

So, when the underlying rule for separating fruits is a simple straight line, the answer differs between the two sets. On the training set, we expect QDA to do at least as well as LDA, because its extra flexibility lets it fit the training data more closely. On the test set, we expect LDA to do better: QDA uses that flexibility to capture more complexity than necessary, which leads to overfitting (fitting the training data too closely) and to poorer generalization to new, unseen fruits.

b) If the Bayes decision boundary is non-linear, do we expect LDA or QDA to perform better on the training set? On the test set?

If the Bayes decision boundary is non-linear, we expect Quadratic Discriminant Analysis (QDA) to perform better than Linear Discriminant Analysis (LDA) on both the training set and the test set. QDA is more flexible and can model non-linear decision boundaries by allowing for different covariance matrices for each class, making it better suited for scenarios where the true decision boundary is not a straight line.

c) In general, as the sample size n increases, do we expect the test prediction accuracy of QDA relative to LDA to improve, decline, or be unchanged? Why?

As the sample size (n) increases, we generally expect the test prediction accuracy of Quadratic Discriminant Analysis (QDA) relative to Linear Discriminant Analysis (LDA) to improve.

The reason behind this expectation is that with a larger sample size, QDA becomes better able to estimate the parameters of the model accurately. QDA estimates separate covariance matrices for each class, and with more data, it can capture the complexities of the underlying distribution more effectively. This increased accuracy in parameter estimation often leads to better generalization performance on the test set.

In contrast, LDA assumes a common covariance matrix for all classes, and it can be more robust when the sample size is small. However, as the sample size increases, QDA’s ability to capture more nuanced relationships within the data becomes advantageous, and it tends to outperform LDA on the test set.
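One way to see why QDA needs more data is to count the parameters each method has to estimate. The sketch below is illustrative only: the function names are hypothetical, it counts class means and covariance entries, and it leaves out the class prior probabilities.

```python
def covariance_param_count(p: int) -> int:
    """Free parameters in one symmetric p x p covariance matrix."""
    return p * (p + 1) // 2


def lda_param_count(p: int, k: int) -> int:
    """K class means (p values each) plus ONE shared covariance matrix."""
    return k * p + covariance_param_count(p)


def qda_param_count(p: int, k: int) -> int:
    """K class means (p values each) plus K separate covariance matrices."""
    return k * p + k * covariance_param_count(p)


# Compare the two methods for K = 2 classes and a few predictor counts p.
for p in (2, 10, 50):
    print(p, lda_param_count(p, k=2), qda_param_count(p, k=2))
```

With more parameters to estimate from the same data, QDA's estimates are noisier when n is small. That extra variance matters less as n grows.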

d) True or False: Even if the Bayes decision boundary for a given problem is linear, we will probably achieve a superior test error rate using QDA rather than LDA because QDA is flexible enough to model a linear decision boundary. Justify your answer.

The statement is False. If the underlying pattern in your data follows a linear decision boundary, using Linear Discriminant Analysis (LDA) is generally expected to give you better results on the test set. Quadratic Discriminant Analysis (QDA), which allows for more flexibility in modeling non-linear patterns, might end up fitting too closely to your training data and not perform as well on new, unseen data, potentially leading to a higher test error rate. So, in a nutshell, for a linear decision boundary, LDA is usually the better choice.

(6) Suppose we collect data for a group of students in a statistics class with variables X1 = hours studied, X2 = undergrad GPA, and Y = receive an A. We fit a logistic regression and produce estimated coefficients, β̂0 = −6, β̂1 = 0.05, and β̂2 = 1.

a) Estimate the probability that a student who studies for 40 h and has an undergrad GPA of 3.5 gets an A in the class.

Given:

β̂0 = -6, β̂1 = 0.05, β̂2 = 1, X1 = 40, X2 = 3.5

Plugging these values into the logistic model:

P(Y=1|X1,X2) = exp(-6 + (0.05 * 40) + (1 * 3.5)) / (1 + exp(-6 + (0.05 * 40) + (1 * 3.5))) = exp(-0.5) / (1 + exp(-0.5))

So, the estimated probability that a student who studies for 40 hours and has an undergrad GPA of 3.5 gets an A in the class is approximately 0.3775, or 37.75%.
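The calculation can be checked numerically. This is a minimal sketch; the function name `logistic_probability` is made up for illustration and is not taken from any library.

```python
import math


def logistic_probability(b0: float, b1: float, b2: float,
                         x1: float, x2: float) -> float:
    """P(Y = 1 | x1, x2) under a two-predictor logistic regression."""
    z = b0 + b1 * x1 + b2 * x2          # linear predictor (log-odds)
    return 1.0 / (1.0 + math.exp(-z))   # same as exp(z) / (1 + exp(z))


# Coefficients and student values from the problem statement.
p = logistic_probability(b0=-6, b1=0.05, b2=1, x1=40, x2=3.5)
print(round(p, 4))  # 0.3775
```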

b) How many hours would the student in part (a) need to study to have a 50% chance of getting an A in the class?

To find the number of hours a student would need to study to have a 50% chance of getting an A in the class, we can use the logistic regression model and solve for the hours studied (X1) when the probability P(Y=1|X1,X2) is 0.5.

Given:

β̂0 = -6, β̂1 = 0.05, β̂2 = 1, P(Y=1|X1,X2) = 0.5, X2 = 3.5

0.5 = exp(-6 + 0.05X1 + (1 * 3.5)) / (1 + exp(-6 + 0.05X1 + (1 * 3.5)))

A probability of 0.5 means the odds are 1, so the log-odds must be 0:

-6 + 0.05X1 + 3.5 = 0

0.05X1 = 2.5

X1 = 2.5 / 0.05 = 2.5 * 20 = 50

So, the student would need to study for 50 hours to have a 50% chance of getting an A in the class.
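The same result follows from inverting the model for X1. This is a small sketch under the problem's stated coefficients; the helper name `hours_for_probability` is hypothetical.

```python
import math


def hours_for_probability(target_p: float, b0: float, b1: float,
                          b2: float, gpa: float) -> float:
    """Solve b0 + b1 * hours + b2 * gpa = log(p / (1 - p)) for hours."""
    log_odds = math.log(target_p / (1.0 - target_p))
    return (log_odds - b0 - b2 * gpa) / b1


hours = hours_for_probability(target_p=0.5, b0=-6, b1=0.05, b2=1, gpa=3.5)
print(round(hours, 2))  # about 50 hours
```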

(8) Suppose that we take a data set, divide it into equally-sized training and test sets, and then try out two different classification procedures. First we use logistic regression and get an error rate of 20% on the training data and 30% on the test data. Next we use 1-nearest neighbors (i.e. K = 1) and get an average error rate (averaged over both test and training data sets) of 18%. Based on these results, which method should we prefer to use for classification of new observations? Why?

Firstly, when using the 1-nearest neighbors (K = 1) method, the training error rate is 0%, because each training observation is its own nearest neighbor (at distance zero) and is therefore assigned its own label. So, the training error is always perfect in this case, provided no two identical observations carry different labels.

Secondly, since the training and test sets are of equal size, both contribute equally to the average error rate calculated under the 1-nearest neighbors method. The average error rate is 0.18 and the training error is 0, so 0.18 = (0 + test error rate) / 2. This implies that the test error rate is 0.36, or 36%.

So, the test error rate for 1-nearest neighbors (K = 1) is 36%, which is 6 percentage points higher than the 30% test error rate for logistic regression. Since we prefer methods with smaller test errors, in this case, we would prefer using logistic regression over the 1-nearest neighbors method.
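Both steps of this argument can be illustrated in a few lines. In the sketch below, the toy training points and the `one_nn_predict` helper are invented for illustration and are not data from the problem. Only the 18% average and the 30% logistic test error come from the question.

```python
def one_nn_predict(train, query):
    """Label of the training point closest to `query` (squared Euclidean)."""
    nearest = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return nearest[1]


# Hypothetical toy training set: (features, label) pairs with distinct points.
toy_train = [((1.0, 2.0), "A"), ((2.0, 1.0), "B"),
             ((3.0, 3.5), "A"), ((0.5, 0.2), "B")]

# Each training point is its own nearest neighbor, so no training errors.
toy_train_error = sum(one_nn_predict(toy_train, x) != y
                      for x, y in toy_train) / len(toy_train)

# Figures from the problem: equal-sized sets, so average = (train + test) / 2.
knn_average_error = 0.18
knn_train_error = 0.0
knn_test_error = 2 * knn_average_error - knn_train_error
logistic_test_error = 0.30

print(toy_train_error, knn_test_error, knn_test_error > logistic_test_error)
```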

(9) This problem has to do with odds.

a) On average, what fraction of people with an odds of 0.37 of defaulting on their credit card payment will in fact default?

The relationship between odds and probabilities in logistic regression is given by the formula:

Probability = odds / (1 + odds)

The odds of defaulting on a credit card payment are 0.37.

So, the probability of defaulting is:

Probability= (0.37)/(1+0.37)

Now, we can simplify to find the fraction:

Probability ~ 0.27

Therefore, on average, approximately 27% of people with an odds of 0.37 of defaulting on their credit card payment will indeed default.
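As a quick numerical check of the conversion, here is a minimal sketch. The function name `odds_to_probability` is just an illustrative choice.

```python
def odds_to_probability(odds: float) -> float:
    """Convert odds p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)


default_probability = odds_to_probability(0.37)
print(round(default_probability, 4))  # 0.2701
```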

b) Suppose that an individual has a 16% chance of defaulting on her credit card payment. What are the odds that she will default?

The relationship between probability and odds in logistic regression is given by the formula:

Odds= (Probability)/(1-Probability)

In this case, the probability of defaulting on a credit card payment is given as 16%, which can be expressed as 0.16.

So, the odds of defaulting are:

Odds= 0.16/(1-0.16)

Odds ≈ 0.1905

Therefore, the odds that an individual with a 16% chance of defaulting on her credit card payment will actually default are approximately 0.1905.
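The reverse conversion can be checked the same way. Again, this is only a sketch, and `probability_to_odds` is a hypothetical helper name.

```python
def probability_to_odds(probability: float) -> float:
    """Convert a probability p (with p < 1) to odds p / (1 - p)."""
    return probability / (1.0 - probability)


default_odds = probability_to_odds(0.16)
print(round(default_odds, 4))  # 0.1905
```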

NikhilBharadwaj_DA_E3

2024-02-21