Statistical Learning Exercise-2

CHAPTER - 3

Conceptual Questions

Question-1

The p-values in Table 3.4 correspond to hypothesis tests that examine whether TV, radio, and newspaper advertising budgets have a significant impact on sales.

Null Hypotheses (H₀) in terms of sales:

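  • H₀ (TV): With radio and newspaper budgets held fixed, TV advertising spending has no association with sales (\(\beta_{\text{TV}} = 0\)).
  • H₀ (Radio): With TV and newspaper budgets held fixed, radio advertising spending has no association with sales (\(\beta_{\text{radio}} = 0\)).
  • H₀ (Newspaper): With TV and radio budgets held fixed, newspaper advertising spending has no association with sales (\(\beta_{\text{newspaper}} = 0\)).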

Explanation of p-values:

  • The p-value for TV advertising is very small, so we reject H₀ (TV): there is
    strong evidence that TV spending is associated with sales when radio and
    newspaper spending are held fixed.

  • The p-value for radio advertising is also very small, so we reject H₀ (Radio):
    radio spending is significantly associated with sales when the other two
    budgets are held fixed.

  • The p-value for newspaper advertising is large, so we fail to reject
    H₀ (Newspaper): once TV and radio spending are accounted for, there is no
    evidence that newspaper spending is associated with sales.

  • In practical terms, higher TV and radio budgets are associated with higher
    sales, so these are the media a company would most plausibly prioritize.

  • Newspaper spending shows no significant association with sales in the
    presence of TV and radio, so it may not be a cost-effective way to boost sales.
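
To make the link between a t-statistic and its p-value concrete, the sketch below converts a t-statistic into a two-sided p-value using a standard normal approximation, which is reasonable only for large samples. The t-statistics are hypothetical placeholders rather than the values in Table 3.4, and the function name `two_sided_p_value` is an assumption made for illustration.

```python
from statistics import NormalDist


def two_sided_p_value(t_stat: float) -> float:
    """Two-sided p-value for a t-statistic via a normal approximation.

    Assumes the sample is large enough that the t distribution is
    close to the standard normal distribution.
    """
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


# Hypothetical t-statistics (illustrative only, not taken from Table 3.4).
t_stats = {"TV": 30.0, "radio": 20.0, "newspaper": -0.2}

for name, t in t_stats.items():
    p = two_sided_p_value(t)
    verdict = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{name}: t = {t:+.1f}, p = {p:.4f} -> {verdict}")
```

A large absolute t-statistic gives a p-value near zero, while a t-statistic near zero gives a p-value near one, which matches the pattern described for TV, radio, and newspaper.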

Question-2

The K-Nearest Neighbors (KNN) method is a non-parametric machine learning algorithm used for both classification and regression tasks. However, its application differs based on whether the target variable is categorical (classification) or continuous (regression).

KNN Classification

  • Used when the target variable is categorical.
  • The algorithm finds the k nearest data points based on distance.
  • It then assigns the majority class among the k neighbors to the test point.
  • Can be used for both binary and multi-class classification:
    • Binary Classification: Only two possible classes.
    • Multi-Class Classification: More than two possible classes.
  • Example: Predicting whether an email is spam (Yes/No) based on word frequency.

KNN Regression

  • Used when the target variable is continuous.
  • The algorithm finds the k nearest data points and computes the average (or weighted average) of their target values.
  • The predicted value is a numerical output rather than a category.
  • Example: Predicting the price of a house based on its size, number of rooms and location.

Key Differences Between KNN Classifier and KNN Regression

| Feature | KNN Classification | KNN Regression |
|---|---|---|
| Target Variable | Categorical (classes) | Continuous (numerical) |
| Prediction Output | Majority class of the k neighbors | Average of the k neighbors' values |
| Example | Predicting if a customer will buy a product (Yes/No) | Predicting the price of a product |
| Decision Rule | Voting among the k nearest neighbors | Mean or weighted mean of the k nearest values |
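
The two decision rules can be sketched side by side in plain Python. This is a minimal illustration under stated assumptions: Euclidean distance, unweighted voting and averaging, and a made-up toy data set. The function names `k_nearest`, `knn_classify`, and `knn_regress` are hypothetical and do not come from any library.

```python
import math
from collections import Counter


def k_nearest(train_X, query, k):
    """Return the indices of the k training points closest to `query`."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    return order[:k]


def knn_classify(train_X, train_y, query, k):
    """Majority vote among the k nearest neighbors.

    Ties are broken by whichever class Counter encountered first.
    """
    neighbors = k_nearest(train_X, query, k)
    votes = Counter(train_y[i] for i in neighbors)
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k):
    """Unweighted mean of the k nearest neighbors' target values."""
    neighbors = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in neighbors) / k


# Toy data, made up for illustration: two well-separated clusters.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]      # categorical target
values = [10.0, 12.0, 11.0, 50.0, 55.0, 52.0]  # continuous target

print(knn_classify(X, labels, (1.2, 1.4), k=3))  # class label
print(knn_regress(X, values, (1.2, 1.4), k=3))   # numerical prediction
```

Both functions share the same neighbor search; only the final aggregation step (vote versus mean) differs.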

General Classification (Not Just Binary)

  • KNN works for multi-class classification, where there are more than two possible classes.

  • Instead of a simple Yes/No (binary classification), it assigns the test point to the most common class among the k neighbors.

  • Example: Classifying types of flowers as “Setosa”, “Versicolor” or “Virginica” based on petal and sepal measurements.

  • KNN classification can handle both binary and multi-class problems.

  • The main difference lies in how predictions are made: majority voting (classification) vs. numerical averaging (regression).

Question-3

We are given a regression model to predict starting salary after graduation (in thousands of dollars) based on the following predictors:

  • GPA (\(X_1\))
  • IQ (\(X_2\))
  • Level (\(X_3\)) → 1 for College and 0 for High School
  • Interaction between GPA and IQ (\(X_4 = X_1 \times X_2\))
  • Interaction between GPA and Level (\(X_5 = X_1 \times X_3\))

The estimated regression equation is:

\[ \hat{Y} = 50 + 20X_1 + 0.07X_2 + 35X_3 + 0.01X_4 - 10X_5 \]

(a) Which answer is correct, and why?

We analyze which of the four given statements about salary differences between high school and college graduates is correct.

  • Base effect of College (\(X_3 = 1\)): College graduates start with a salary boost of $35K.
  • Negative interaction effect of GPA × Level (\(-10X_5\)): The impact of GPA on salary is weaker for college graduates.

Now, let’s compare the salaries:

  1. High School Graduate (\(X_3 = 0\)): \[ \hat{Y}_{HS} = 50 + 20X_1 + 0.07X_2 + 0.01X_4 \]

  2. College Graduate (\(X_3 = 1\)): \[ \hat{Y}_{College} = 50 + 20X_1 + 0.07X_2 + 35 + 0.01X_4 - 10X_1 \] \[ = 85 + 10X_1 + 0.07X_2 + 0.01X_4 \]

Finding When High School Graduates Earn More

For high school graduates to earn more than college graduates:

\[ 50 + 20X_1 + 0.07X_2 + 0.01X_4 > 85 + 10X_1 + 0.07X_2 + 0.01X_4 \]

Cancel common terms:

\[ 50 + 20X_1 > 85 + 10X_1 \]

\[ 10X_1 > 35 \]

\[ X_1 > 3.5 \]

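This means that, for fixed values of IQ and GPA, the predicted salary of a high school graduate exceeds that of a college graduate when GPA is greater than 3.5. Below 3.5 the college graduate is predicted to earn more, and at exactly 3.5 the two predictions are equal.

The correct option is (iii): high school graduates earn more, on average, than college graduates, provided that GPA is high enough (above 3.5).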
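
As a quick numerical check of the crossover point, the sketch below evaluates the college-minus-high-school gap implied by the fitted equation. The function name `salary_gap` is hypothetical; the only inputs are the coefficients 35 and -10 from the model above.

```python
def salary_gap(gpa: float) -> float:
    """Predicted college salary minus high school salary, in thousands.

    With IQ and GPA held fixed, the X2 and X4 terms cancel, leaving
    35 * X3 - 10 * X1 * X3 evaluated at X3 = 1.
    """
    return 35.0 - 10.0 * gpa


for gpa in (3.0, 3.5, 4.0):
    gap = salary_gap(gpa)
    if gap > 0:
        leader = "college graduate earns more"
    elif gap < 0:
        leader = "high school graduate earns more"
    else:
        leader = "predictions are equal"
    print(f"GPA {gpa}: gap = {gap:+.1f}K -> {leader}")
```

The gap changes sign at a GPA of 3.5, in agreement with the inequality derived above.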

(b) Predict the salary of a college graduate with IQ = 110 and GPA = 4.0

We substitute:

  • \(X_1 = 4.0\) (GPA)
  • \(X_2 = 110\) (IQ)
  • \(X_3 = 1\) (College graduate)
  • \(X_4 = X_1 \times X_2 = 4.0 \times 110 = 440\)
  • \(X_5 = X_1 \times X_3 = 4.0 \times 1 = 4.0\)

\[ \hat{Y} = 50 + (20 \times 4.0) + (0.07 \times 110) + (35 \times 1) + (0.01 \times 440) - (10 \times 4.0) \]

\[ = 50 + 80 + 7.7 + 35 + 4.4 - 40 \]

\[ = 137.1 \]

Predicted salary is $137,100 (137.1 thousand dollars).
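
The same arithmetic can be reproduced with a short function. This is a minimal sketch; the name `predict_salary` and its boolean `college` argument are assumptions made for illustration, while the coefficients are those of the fitted equation.

```python
def predict_salary(gpa: float, iq: float, college: bool) -> float:
    """Predicted starting salary in thousands of dollars."""
    x1, x2 = gpa, iq
    x3 = 1 if college else 0   # Level: 1 for College, 0 for High School
    x4 = x1 * x2               # GPA x IQ interaction
    x5 = x1 * x3               # GPA x Level interaction
    return 50 + 20 * x1 + 0.07 * x2 + 35 * x3 + 0.01 * x4 - 10 * x5


salary = predict_salary(gpa=4.0, iq=110, college=True)
print(f"Predicted salary: {salary:.1f} thousand dollars")
```

Calling the function with `college=False` and the same GPA and IQ gives a prediction 5 thousand dollars higher, which is consistent with the result in part (a) for a GPA above 3.5.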

(c) True or False: The coefficient for the GPA/IQ interaction term is small, so there is very little evidence of an interaction effect. Justify your answer.

  • The coefficient for the GPA × IQ interaction term (\(X_4\)) is 0.01.
  • A small coefficient does not necessarily mean that the interaction is insignificant.
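  • The significance of an interaction effect should be evaluated using its p-value (equivalently, the t-statistic: the coefficient divided by its standard error), not the coefficient size alone.
  • The size of a coefficient also depends on the units of its predictor, so it can be made larger or smaller simply by rescaling.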
  • Even small coefficients can have meaningful effects if the input variables take large values.

Since we do not have the p-value, we cannot conclude that the interaction is weak.

So, it is False because a small coefficient does not necessarily indicate a weak interaction effect.
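
To illustrate why the raw size of a coefficient is not informative, the sketch below computes the contribution of the GPA × IQ term in two unit systems. The rescaling of IQ to units of 100 points is a hypothetical choice made purely for illustration, as is the function name `interaction_contribution`.

```python
def interaction_contribution(coef: float, gpa: float, iq: float) -> float:
    """Contribution of the GPA x IQ term to predicted salary (thousands)."""
    return coef * gpa * iq


# Original units: coefficient 0.01, IQ measured in points.
original = interaction_contribution(0.01, gpa=4.0, iq=110)

# Hypothetical rescaling: IQ measured in units of 100 points.
# The coefficient becomes 0.01 * 100 = 1.0, but the model is unchanged.
rescaled = interaction_contribution(1.0, gpa=4.0, iq=1.10)

print(f"original units: {original:.2f}K, rescaled units: {rescaled:.2f}K")
```

Both versions contribute about 4.4 thousand dollars to the prediction, even though one coefficient is 100 times larger than the other. The evidence for the effect is the same in either unit system.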

Question-5

We are given the fitted values for a linear regression without an intercept:

\[ \hat{y}_i = x_i \hat{\beta} \]

where:

\[ \hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i'=1}^{n} x_{i'}^2} \]

We need to show that:

\[ \hat{y}_i = \sum_{i'=1}^{n} a_{i'} y_{i'} \]

and determine \(a_{i'}\).

Substitute \(\hat{\beta}\) into \(\hat{y}_i\)

Substituting the formula for \(\hat{\beta}\) into \(\hat{y}_i\):

\[ \hat{y}_i = x_i \times \frac{\sum_{i'=1}^{n} x_{i'} y_{i'}}{\sum_{k=1}^{n} x_k^2} \]

Here the summation index in the denominator is written as \(k\) so that it does not clash with the index \(i'\) in the numerator.

Since \(x_i\) and the denominator do not depend on \(i'\), both can be moved inside the sum:

\[ \hat{y}_i = \sum_{i'=1}^{n} \left( \frac{x_i x_{i'}}{\sum_{k=1}^{n} x_k^2} \right) y_{i'} \]

Comparing with:

\[ \hat{y}_i = \sum_{i'=1}^{n} a_{i'} y_{i'} \]

we identify:

\[ a_{i'} = \frac{x_i x_{i'}}{\sum_{k=1}^{n} x_k^2} \]

This result confirms that the fitted values in this no-intercept regression are linear combinations of the response values \(y_{i'}\), with weights \(a_{i'}\) that depend only on the predictor values and not on the responses.
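
The identity can be checked numerically. The sketch below uses a small made-up data set and compares the fitted value computed directly as \(x_i \hat{\beta}\) with the weighted sum of the responses, where each weight is \(x_i x_{i'}\) divided by the sum of squared predictor values. The function names `beta_hat` and `weights` are hypothetical.

```python
import math


def beta_hat(x, y):
    """Least squares slope for regression through the origin."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xk * xk for xk in x)


def weights(x, i):
    """Weights a_{i'} for observation i: x_i * x_{i'} / sum of squared x."""
    sxx = sum(xk * xk for xk in x)
    return [x[i] * xj / sxx for xj in x]


# Made-up data for illustration.
x = [1.0, 2.0, 3.0]
y = [2.0, 3.0, 7.0]

b = beta_hat(x, y)
for i in range(len(x)):
    direct = x[i] * b
    as_combination = sum(a * yj for a, yj in zip(weights(x, i), y))
    print(f"i = {i}: direct = {direct:.4f}, weighted sum = {as_combination:.4f}")
    assert math.isclose(direct, as_combination)
```

The two columns agree for every observation, as the derivation predicts.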