CHAPTER - 3

Conceptual Questions

Question-1

The p-values in Table 3.4 correspond to hypothesis tests that examine whether TV, radio, and newspaper advertising budgets have a significant impact on sales.

Null Hypotheses (H₀) in terms of sales:

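H₀ (TV): With radio and newspaper budgets held fixed, spending on TV advertisements has no association with sales.
H₀ (Radio): With TV and newspaper budgets held fixed, spending on radio advertisements has no association with sales.
H₀ (Newspaper): With TV and radio budgets held fixed, spending on newspaper advertisements has no association with sales.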
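
Written in regression terms, and assuming Table 3.4 reports the multiple linear regression of sales on the three budgets, the hypotheses can be sketched as follows. The symbols $\beta_0, \dots, \beta_3$ and $t_j$ are notation introduced here for illustration:

\[ \text{sales} = \beta_0 + \beta_1 \, \text{TV} + \beta_2 \, \text{radio} + \beta_3 \, \text{newspaper} + \epsilon \]

\[ H_0: \beta_j = 0 \quad \text{versus} \quad H_a: \beta_j \neq 0, \qquad j = 1, 2, 3 \]

\[ t_j = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)} \]

Each p-value is the probability, under the corresponding $H_0$, of observing a t-statistic at least as large in absolute value as $t_j$.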

Explanation of p-values:

The p-value for TV advertising is very small (below 0.0001), so we reject H₀ (TV): with radio and newspaper budgets held fixed, higher TV spending is associated with higher sales.
The p-value for radio advertising is also very small (below 0.0001), so we reject H₀ (Radio): with TV and newspaper budgets held fixed, higher radio spending is associated with higher sales.
The p-value for newspaper advertising is large (0.8599), so we cannot reject H₀ (Newspaper): once TV and radio spending are accounted for, there is no evidence that newspaper spending is associated with sales.
Conclusion: TV and radio advertising are both significantly related to sales, so the advertising budget is better directed to these media.
Newspaper advertising shows no detectable effect beyond TV and radio, so it is unlikely to be a cost-effective way to boost sales.

Question-2

The K-Nearest Neighbors (KNN) method is a non-parametric machine learning algorithm used for both classification and regression tasks. However, its application differs based on whether the target variable is categorical (classification) or continuous (regression).

KNN Classification

Used when the target variable is categorical.
The algorithm finds the k nearest data points based on distance.
It then assigns the majority class among the k neighbors to the test point.
Can be used for both binary and multi-class classification:
- Binary Classification: Only two possible classes.
- Multi-Class Classification: More than two possible classes.
Example: Predicting whether an email is spam (Yes/No) based on word frequency.

KNN Regression

Used when the target variable is continuous.
The algorithm finds the k nearest data points and computes the average (or weighted average) of their target values.
The predicted value is a numerical output rather than a category.
Example: Predicting the price of a house based on its size, number of rooms and location.

Key Differences Between KNN Classifier and KNN Regression

Feature	KNN Classification	KNN Regression
Target Variable	Categorical (classes)	Continuous (numerical)
Prediction Output	Majority class of the k neighbors	Average of the k neighbors' values
Example	Predicting if a customer will buy a product (Yes/No)	Predicting the price of a product
Decision Rule	Voting among k nearest neighbors	Mean or weighted mean of k nearest values

General Classification (Not Just Binary)

KNN works for multi-class classification, where there are more than two possible classes.
Instead of a simple Yes/No (binary classification), it assigns the test point to the most common class among the k neighbors.
Example: Classifying types of flowers as “Setosa”, “Versicolor” or “Virginica” based on petal and sepal measurements.
In short, KNN classification handles both binary and multi-class problems; it differs from KNN regression in how predictions are made: majority voting (classification) versus numerical averaging (regression).
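
The two decision rules can be sketched side by side. This is a minimal illustration rather than a reference implementation: the function names, the Euclidean distance, and the toy data are all assumptions made for the example.

```python
from collections import Counter

import numpy as np


def k_nearest_indices(X_train, x0, k):
    """Indices of the k training points closest to x0 (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x0, axis=1)
    return np.argsort(distances)[:k]


def knn_classify(X_train, labels, x0, k):
    """Classification rule: majority vote among the k nearest neighbors."""
    idx = k_nearest_indices(X_train, x0, k)
    votes = Counter(labels[i] for i in idx)
    return votes.most_common(1)[0][0]


def knn_regress(X_train, y_train, x0, k):
    """Regression rule: unweighted mean of the k nearest responses."""
    idx = k_nearest_indices(X_train, x0, k)
    return float(np.mean(y_train[idx]))


# Made-up toy data: one feature, six training points.
X_train = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
labels = np.array(["A", "A", "A", "B", "B", "B"])       # categorical target
y_train = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])   # continuous target
x0 = np.array([2.5])

predicted_class = knn_classify(X_train, labels, x0, k=3)
predicted_value = knn_regress(X_train, y_train, x0, k=3)
print(predicted_class, predicted_value)
```

Both functions share the same neighbor search; only the final aggregation step differs. Ties in the vote are not handled specially in this sketch.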

Question-3

We are given a regression model to predict starting salary after graduation (in thousands of dollars) based on the following predictors:

GPA ($X_1$)
IQ ($X_2$)
Level ($X_3$) → 1 for College and 0 for High School
Interaction between GPA and IQ ($X_4 = X_1 \times X_2$)
Interaction between GPA and Level ($X_5 = X_1 \times X_3$)

The estimated regression equation is:

\[ \hat{Y} = 50 + 20X_1 + 0.07X_2 + 35X_3 + 0.01X_4 - 10X_5 \]

(a) Which answer is correct, and why?

We analyze which of the four given statements about salary differences between high school and college graduates is correct.

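Base effect of College ($X_3 = 1$): The coefficient 35 adds 35 thousand dollars to the predicted salary of a college graduate.
Negative interaction effect of GPA × Level ($-10X_5$): Each additional GPA point adds 10 thousand dollars less for college graduates than for high school graduates, so the net college advantage is $35 - 10X_1$.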

Now, let’s compare the salaries:

High School Graduate ($X_3 = 0$): \[ \hat{Y}_{HS} = 50 + 20X_1 + 0.07X_2 + 0.01X_4 \]
College Graduate ($X_3 = 1$): \[ \hat{Y}_{College} = 50 + 20X_1 + 0.07X_2 + 35 + 0.01X_4 - 10X_1 \] \[ = 85 + 10X_1 + 0.07X_2 + 0.01X_4 \]

Finding When High School Graduates Earn More

For high school graduates to earn more than college graduates:

\[ 50 + 20X_1 + 0.07X_2 + 0.01X_4 > 85 + 10X_1 + 0.07X_2 + 0.01X_4 \]

Cancel common terms:

\[ 50 + 20X_1 > 85 + 10X_1 \]

\[ 10X_1 > 35 \]

\[ X_1 > 3.5 \]

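This means that, for a fixed IQ and GPA, high school graduates are predicted to earn more than college graduates when GPA is greater than 3.5; below 3.5, college graduates are predicted to earn more.

The correct option is (iii): high school graduates earn more, on average, than college graduates, provided that GPA is high enough (above 3.5).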
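
The crossover at GPA = 3.5 can be checked numerically. The sketch below codes the fitted equation directly; the function names and the GPA and IQ values tried are illustrative choices, not part of the question.

```python
def predicted_salary(gpa, iq, college):
    """Fitted model from the question; result is in thousands of dollars."""
    level = 1 if college else 0
    return (50 + 20 * gpa + 0.07 * iq + 35 * level
            + 0.01 * gpa * iq - 10 * gpa * level)


def college_minus_high_school(gpa, iq):
    """Predicted salary gap at the same GPA and IQ (algebraically 35 - 10 * gpa)."""
    return predicted_salary(gpa, iq, True) - predicted_salary(gpa, iq, False)


# Illustrative values on either side of the 3.5 threshold.
gap_low_gpa = college_minus_high_school(gpa=3.0, iq=110)   # positive: college ahead
gap_high_gpa = college_minus_high_school(gpa=4.0, iq=110)  # negative: high school ahead
print(gap_low_gpa, gap_high_gpa)
```

The IQ terms cancel in the difference, so the sign of the gap depends on GPA alone.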

(b) Predict the salary of a college graduate with IQ = 110 and GPA = 4.0

We substitute:

$X_1 = 4.0$ (GPA)
$X_2 = 110$ (IQ)
$X_3 = 1$ (College graduate)
$X_4 = X_1 \times X_2 = 4.0 \times 110 = 440$
$X_5 = X_1 \times X_3 = 4.0 \times 1 = 4.0$

\[ \hat{Y} = 50 + (20 \times 4.0) + (0.07 \times 110) + (35 \times 1) + (0.01 \times 440) - (10 \times 4.0) \]

\[ = 50 + 80 + 7.7 + 35 + 4.4 - 40 \]

\[ = 137.1 \]

The predicted starting salary is 137.1 thousand dollars, that is, 137,100 dollars.
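
The arithmetic can be reproduced term by term. This short sketch uses the coefficients from the fitted equation; the dictionary keys are labels chosen here for readability.

```python
gpa, iq, level = 4.0, 110, 1  # college graduate with GPA 4.0 and IQ 110

# Contribution of each term, in thousands of dollars.
terms = {
    "intercept": 50.0,
    "GPA": 20 * gpa,
    "IQ": 0.07 * iq,
    "Level": 35 * level,
    "GPA x IQ": 0.01 * gpa * iq,
    "GPA x Level": -10 * gpa * level,
}
salary_thousands = sum(terms.values())

for name, value in terms.items():
    print(f"{name:12s} {value:8.2f}")
print(f"{'total':12s} {salary_thousands:8.2f}")
```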

(c) True or False: The coefficient for the GPA/IQ interaction term is small, so there is very little evidence of an interaction effect. Justify your answer.

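The coefficient for the GPA × IQ interaction term ($X_4$) is 0.01.
The size of a coefficient depends on the scale of its predictor, so a small coefficient does not by itself mean that the interaction is weak.
Evidence for an interaction effect is judged from the coefficient relative to its standard error, that is, from the t-statistic and p-value for the hypothesis that the true interaction coefficient is zero, not from the coefficient size alone.
A small coefficient can still matter when the predictor takes large values: for GPA = 4.0 and IQ = 110, $X_4 = 440$ and the interaction term contributes $0.01 \times 440 = 4.4$ thousand dollars.

Since the standard error and p-value are not given, we cannot conclude that there is little evidence of an interaction.

So, the statement is False: a small coefficient does not necessarily indicate a weak interaction effect.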
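
The dependence of coefficient size on scale can be illustrated with a small simulation. Everything here is an assumption for the example: the data are synthetic, the helper `ols_t_stats` is hypothetical, and the model has a single predictor rather than the salary model above.

```python
import numpy as np


def ols_t_stats(X, y):
    """Least squares coefficients and their t-statistics for design matrix X."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the estimates
    return beta, beta / np.sqrt(np.diag(cov))


# Synthetic data (not from the question): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X_original = np.column_stack([np.ones(n), x])
X_rescaled = np.column_stack([np.ones(n), 100.0 * x])  # same predictor, values 100 times larger

beta_original, t_original = ols_t_stats(X_original, y)
beta_rescaled, t_rescaled = ols_t_stats(X_rescaled, y)
print(beta_original[1], beta_rescaled[1])  # slope shrinks by a factor of 100
print(t_original[1], t_rescaled[1])        # t-statistic is unchanged
```

Rescaling the predictor makes the coefficient 100 times smaller without changing the strength of evidence, which is why the t-statistic or p-value, not the raw coefficient, is the relevant quantity.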

Question-5

We are given the fitted values for a linear regression without an intercept:

\[ \hat{y}_i = x_i \hat{\beta} \]

where:

\[ \hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i'=1}^{n} x_{i'}^2} \]

We need to show that:

\[ \hat{y}_i = \sum_{i'=1}^{n} a_{i'} y_{i'} \]

and determine $a_{i'}$.

Substitute $\hat{\beta}$ into $\hat{y}_i$

Substituting the formula for $\hat{\beta}$ into $\hat{y}_i$:

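\[ \hat{y}_i = x_i \times \frac{\sum_{i'=1}^{n} x_{i'} y_{i'}}{\sum_{j=1}^{n} x_j^2} \]

Rewriting (the factor $x_i$ and the denominator do not depend on the summation index $i'$, so they can be moved inside the sum; the denominator is given its own index $j$ to avoid a clash with $i'$):

\[ \hat{y}_i = \sum_{i'=1}^{n} \left( \frac{x_i x_{i'}}{\sum_{j=1}^{n} x_j^2} \right) y_{i'} \]

Comparing with:

\[ \hat{y}_i = \sum_{i'=1}^{n} a_{i'} y_{i'} \]

we identify:

\[ a_{i'} = \frac{x_i x_{i'}}{\sum_{j=1}^{n} x_j^2} \]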
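
The identity can be checked numerically. The data below are made up purely for the check, and the matrix `A` (with entry `A[i, k]` holding the weight that observation `k` receives in fitted value `i`) is notation introduced for this sketch.

```python
import numpy as np

# Made-up data, used only to verify the identity.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Fitted values from the no-intercept least squares slope.
beta_hat = np.sum(x * y) / np.sum(x ** 2)
fitted_direct = x * beta_hat

# Weights: A[i, k] = x_i * x_k / sum_j x_j^2, so row i holds the a values for fitted value i.
A = np.outer(x, x) / np.sum(x ** 2)
fitted_weighted = A @ y

print(fitted_direct)
print(fitted_weighted)
```

Each row of `A` depends on $i$ through $x_i$, so every fitted value is a different linear combination of the same responses.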

Thus, the coefficient $a_{i'}$ is: