Statistical Learning Exercise-2

CHAPTER - 3

Conceptual Questions

Question-1

The p-values in Table 3.4 correspond to hypothesis tests that examine whether TV, radio, and newspaper advertising budgets have a significant impact on sales.

Answer

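Null Hypotheses (H₀) for Sales Impact

Table 3.4 reports a multiple regression of sales on the TV, radio, and newspaper budgets. Each p-value tests whether one coefficient is zero, given that the other predictors are in the model:
- Intercept: \(H_0: \beta_0 = 0\) (expected sales are zero when all three budgets are zero).
- TV: \(H_0: \beta_1 = 0\) (TV advertising has no association with sales, holding radio and newspaper fixed).
- Radio: \(H_0: \beta_2 = 0\) (radio advertising has no association with sales, holding TV and newspaper fixed).
- Newspaper: \(H_0: \beta_3 = 0\) (newspaper advertising has no association with sales, holding TV and radio fixed).
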
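Interpretation of p-values

The p-values for the intercept, TV, and radio are very small, so we reject those null hypotheses: TV and radio spending are each associated with higher sales even after accounting for the other media. The p-value for newspaper is large, so we cannot reject \(H_0: \beta_3 = 0\); once TV and radio are in the model, the data give no evidence that newspaper spending is associated with sales.
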
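Business Implications

Additional TV or radio spending is associated with higher sales when the other budgets are held fixed, while newspaper spending adds nothing detectable beyond TV and radio. Newspaper is therefore the natural candidate when reallocating the advertising budget.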

Conclusion

Based on the hypothesis testing results, businesses should prioritize TV and radio advertisements to enhance sales. Newspaper ads may not be the most efficient way to increase revenue, and resources should be allocated accordingly.
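The p-values in a table like Table 3.4 come from t-tests on each coefficient of a multiple regression. The sketch below is an illustration only: it uses hypothetical simulated data (the budget ranges, true coefficients, and noise level are assumptions, and newspaper is given no true effect), fits the model with base R's `lm()`, and recomputes each p-value by hand as a two-sided t tail probability.

```r
# Hypothetical simulated data standing in for the Advertising data:
# the budget ranges, true coefficients, and noise level are assumptions.
set.seed(1)
n <- 200
TV <- runif(n, 0, 300)
radio <- runif(n, 0, 50)
newspaper <- runif(n, 0, 100)
# newspaper deliberately has no true effect on sales here
sales <- 3 + 0.045 * TV + 0.19 * radio + rnorm(n, sd = 1.7)

fit <- lm(sales ~ TV + radio + newspaper)
coefs <- summary(fit)$coefficients  # Estimate, Std. Error, t value, Pr(>|t|)

# Each p-value tests H0: beta_j = 0 with the other predictors in the model.
# Recompute it: t = estimate / SE, two-sided tail area of a t distribution
# with n - p - 1 residual degrees of freedom.
t_stat <- coefs[, "Estimate"] / coefs[, "Std. Error"]
p_manual <- 2 * pt(abs(t_stat), df = fit$df.residual, lower.tail = FALSE)
print(round(cbind(coefs, p_manual), 4))
```

With this setup the TV and radio p-values are essentially zero, while the newspaper p-value changes from one random draw to the next, as expected when its true coefficient is zero.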

Question-2

The K-Nearest Neighbors (KNN) method is a non-parametric machine learning algorithm used for both classification and regression tasks. However, its application differs based on whether the target variable is categorical (classification) or continuous (regression).

Answer

K-Nearest Neighbors (KNN) Algorithm

KNN Classification

  • Used when the target variable is categorical.
  • The algorithm identifies the k nearest data points based on distance.
  • The test point is assigned to the majority class among its k nearest neighbors.
  • Applicable for both binary and multi-class classification:
    • Binary Classification: Two possible categories (e.g., Yes/No, Spam/Not Spam).
    • Multi-Class Classification: More than two possible categories (e.g., classifying flower species).

Example:

Predicting whether an email is spam (Yes/No) based on word frequency.
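The classification rule above can be sketched in a few lines of base R. This is a minimal sketch, assuming Euclidean distance on unscaled features and breaking vote ties in favor of the first class in factor-level order; `knn_classify` is a hypothetical helper name, not a library function (the `class` package offers a ready-made `knn()`). The usage example borrows the built-in `iris` data, i.e. the multi-class flower example discussed further on.

```r
# Hypothetical helper: k-nearest-neighbor classification by majority vote.
knn_classify <- function(train_x, train_y, test_x, k = 5) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  apply(test_x, 1, function(x0) {
    # Euclidean distance from the test point x0 to every training point
    d <- sqrt(rowSums(sweep(train_x, 2, x0)^2))
    # Indices of the k closest training points
    nn <- order(d)[seq_len(k)]
    # Majority vote among the neighbors' labels (ties go to the first level)
    votes <- table(train_y[nn])
    names(votes)[which.max(votes)]
  })
}

# Usage on the built-in iris data: three new flowers, four measurements each
new_flowers <- rbind(c(5.0, 3.4, 1.5, 0.2),
                     c(5.9, 2.8, 4.2, 1.3),
                     c(6.9, 3.1, 5.8, 2.2))
pred <- knn_classify(iris[, 1:4], iris$Species, new_flowers, k = 5)
print(pred)
```

In practice the features are usually standardized first, because Euclidean distance is dominated by whichever variable has the largest scale.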


KNN Regression

  • Used when the target variable is continuous.
  • The algorithm identifies the k nearest neighbors and computes the average (or weighted average) of their target values.
  • The output is a numerical value rather than a category.

Example:

Predicting house prices based on size, number of rooms, and location.
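Swapping the vote for an average turns the same idea into KNN regression. Again this is a minimal sketch under stated assumptions (Euclidean distance, unweighted mean); `knn_regress` is a hypothetical helper, and the toy data are made up rather than real house prices.

```r
# Hypothetical helper: k-nearest-neighbor regression by averaging.
knn_regress <- function(train_x, train_y, test_x, k = 3) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  apply(test_x, 1, function(x0) {
    d <- sqrt(rowSums(sweep(train_x, 2, x0)^2))  # Euclidean distances
    nn <- order(d)[seq_len(k)]                    # k closest training points
    mean(train_y[nn])                             # plain (unweighted) average
  })
}

# Toy one-predictor example (made-up numbers): y = x^2 on x = 1, ..., 10
x_train <- matrix(1:10, ncol = 1)
y_train <- (1:10)^2
y_pred <- knn_regress(x_train, y_train, matrix(5.2, ncol = 1), k = 3)
print(y_pred)  # mean of 25, 36, 16
```

Replacing `mean(train_y[nn])` with `weighted.mean(train_y[nn], 1 / d[nn])` would give an inverse-distance weighted average, one common form of the weighted variant; it needs a guard for a distance of exactly zero.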


Key Differences Between KNN Classification and KNN Regression

| Feature | KNN Classification | KNN Regression |
|---|---|---|
| Target variable | Categorical (classes) | Continuous (numerical) |
| Prediction output | Majority class of the k neighbors | Average of the k neighbors' values |
| Example | Predicting whether a customer will buy a product (Yes/No) | Predicting the price of a product |
| Decision rule | Voting among the k nearest neighbors | Mean or weighted mean of the k nearest values |

General Classification (Beyond Binary)

Example:

Classifying flowers as Setosa, Versicolor, or Virginica based on petal and sepal measurements.

  • KNN can handle both binary and multi-class classification problems.

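Key Difference in Predictions

Let \(\mathcal{N}_0\) denote the \(k\) training points closest to a test point \(x_0\).
- Classification: estimate \(\Pr(Y = j \mid X = x_0) = \frac{1}{k} \sum_{i \in \mathcal{N}_0} I(y_i = j)\) for each class \(j\) and predict the class with the largest estimate (the majority vote).
- Regression: predict \(\hat{f}(x_0) = \frac{1}{k} \sum_{i \in \mathcal{N}_0} y_i\), the average response of the neighbors.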

Conclusion

KNN is a versatile algorithm that can be used for both classification and regression tasks. The key difference lies in how predictions are made: majority voting for classification and numerical averaging for regression.

Question-3

We are given a regression model that predicts starting salary after graduation (in thousands of dollars) from five predictors: GPA, IQ, education level, and two interaction terms, as defined below.

(a) Salary Differences Between High School and College Graduates

We analyze the given regression model:

\[ \hat{Y} = 50 + 20X_1 + 0.07X_2 + 35X_3 + 0.01X_4 - 10X_5 \]

where:
- \(X_1\) = GPA
- \(X_2\) = IQ
- \(X_3\) = Level (1 for College, 0 for High School)
- \(X_4 = X_1 \times X_2\) (GPA × IQ Interaction)
- \(X_5 = X_1 \times X_3\) (GPA × Level Interaction)

Base Effect of College (\(X_3 = 1\))

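  • At GPA = 0, college graduates would earn \(\beta_3 = 35\) (i.e., $35K) more than high school graduates; in general, the college premium is \(35 - 10X_1\) thousand dollars.
  • The negative GPA × Level interaction (\(-10X_5\)) means GPA has a weaker effect on salary for college graduates: each additional GPA point adds \(10 + 0.01X_2\) thousand dollars for college graduates versus \(20 + 0.01X_2\) for high school graduates.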

Salary Comparison

For High School Graduates (\(X_3 = 0\))

\[ \hat{Y}_{HS} = 50 + 20X_1 + 0.07X_2 + 0.01X_4 \]

For College Graduates (\(X_3 = 1\))

\[ \hat{Y}_{College} = 50 + 20X_1 + 0.07X_2 + 35 + 0.01X_4 - 10X_1 \]

\[ = 85 + 10X_1 + 0.07X_2 + 0.01X_4 \]

Finding When High School Graduates Earn More

For high school graduates to earn more than college graduates:

\[ 50 + 20X_1 + 0.07X_2 + 0.01X_4 > 85 + 10X_1 + 0.07X_2 + 0.01X_4 \]

Cancel common terms:

\[ 50 + 20X_1 > 85 + 10X_1 \]

\[ 10X_1 > 35 \]

\[ X_1 > 3.5 \]

Thus, for fixed values of IQ and GPA, high school graduates earn more on average than college graduates when GPA is greater than 3.5 (and college graduates earn more when GPA is below 3.5).
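The crossover can be checked numerically. This sketch codes the fitted model as an R function (the names `salary` and `premium` are illustrative) and evaluates the college premium, the college salary minus the high school salary at the same GPA and IQ, which simplifies to \(35 - 10X_1\).

```r
# Fitted salary model from the exercise (salary in thousands of dollars)
salary <- function(gpa, iq, level) {
  50 + 20 * gpa + 0.07 * iq + 35 * level + 0.01 * gpa * iq - 10 * gpa * level
}

# College premium at the same GPA and IQ; algebraically 35 - 10 * GPA,
# so it does not depend on IQ.
premium <- function(gpa, iq) salary(gpa, iq, 1) - salary(gpa, iq, 0)

gpas <- c(3.0, 3.5, 4.0)
print(premium(gpas, iq = 110))  # approximately 5, 0, -5
```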


(b) Predicting Salary for a College Graduate with IQ = 110 and GPA = 4.0

Given:
- \(X_1 = 4.0\) (GPA)
- \(X_2 = 110\) (IQ)
- \(X_3 = 1\) (College graduate)
- \(X_4 = X_1 \times X_2 = 4.0 \times 110 = 440\)
- \(X_5 = X_1 \times X_3 = 4.0 \times 1 = 4.0\)

\[ \hat{Y} = 50 + (20 \times 4.0) + (0.07 \times 110) + (35 \times 1) + (0.01 \times 440) - (10 \times 4.0) \]

\[ = 50 + 80 + 7.7 + 35 + 4.4 - 40 \]

\[ = 137.1 \]

Predicted Salary: $137,100 (137.1 thousand dollars).
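The same arithmetic in R, evaluating each term of the fitted model at the values given above (the variable names are illustrative):

```r
# Values from part (b)
gpa <- 4.0
iq <- 110
level <- 1

# Fitted model, with X4 = GPA * IQ and X5 = GPA * Level written out
y_hat <- 50 + 20 * gpa + 0.07 * iq + 35 * level +
  0.01 * (gpa * iq) - 10 * (gpa * level)
print(y_hat)  # 137.1 (thousand dollars)
```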


(c) Evaluating the Interaction Effect (True or False)

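The claim to evaluate is that, because the coefficient for the GPA × IQ interaction term (\(X_4\)) is very small (0.01), there is very little evidence of an interaction effect.

This is False. The size of a coefficient depends on the units of its predictor: \(X_4 = X_1 \times X_2\) is typically in the hundreds (440 in part (b)), so the term contributes \(0.01 \times 440 = 4.4\) thousand dollars there. Evidence for an interaction is judged by the coefficient's standard error, t-statistic, and p-value, none of which are given here.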
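A quick way to see why the size of a coefficient is not the same as the strength of evidence is to change the units of IQ. The sketch below uses hypothetical simulated data (all coefficients, distributions, and the sample size are assumptions) and fits the model twice, once with IQ in points and once in hundreds of points. The GPA × IQ coefficient grows by a factor of 100, while its t-statistic, and therefore its p-value, does not change.

```r
# Hypothetical simulated data (coefficients and distributions are assumptions)
set.seed(42)
n <- 100
gpa <- runif(n, 2, 4)
iq <- rnorm(n, 100, 15)
level <- rbinom(n, 1, 0.5)
salary <- 50 + 20 * gpa + 0.07 * iq + 35 * level + 0.01 * gpa * iq -
  10 * gpa * level + rnorm(n, sd = 5)

# Same model, with IQ in original units and in hundreds of points
fit_raw <- lm(salary ~ gpa + iq + level + gpa:iq + gpa:level)
iq100 <- iq / 100
fit_scaled <- lm(salary ~ gpa + iq100 + level + gpa:iq100 + gpa:level)

# Estimate, Std. Error, t value, Pr(>|t|) for the GPA x IQ interaction
s_raw <- summary(fit_raw)$coefficients["gpa:iq", ]
s_scaled <- summary(fit_scaled)$coefficients["gpa:iq100", ]
print(rbind(raw = s_raw, scaled = s_scaled))
```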

Question-5

Fitted Values in Linear Regression Without Intercept

We are given the fitted values for a linear regression model without an intercept:

\[ \hat{y}_i = x_i \hat{\beta} \]

where:

\[ \hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i'=1}^{n} x_{i'}^2} \]


Deriving \(\hat{y}_i\) as a Linear Combination of \(y_{i'}\)

Substituting the formula for \(\hat{\beta}\) into \(\hat{y}_i\):

\[ \hat{y}_i = x_i \times \frac{\sum_{i'=1}^{n} x_{i'} y_{i'}}{\sum_{i'=1}^{n} x_{i'}^2} \]

Rewriting:

\[ \hat{y}_i = \sum_{i'=1}^{n} \left( \frac{x_i x_{i'}}{\sum_{k=1}^{n} x_k^2} \right) y_{i'} \]

Comparing with the general form:

\[ \hat{y}_i = \sum_{i'=1}^{n} a_{i'} y_{i'} \]

we identify the weight coefficient (which also depends on \(i\)):

\[ a_{i'} = \frac{x_i x_{i'}}{\sum_{k=1}^{n} x_k^2} \]


Conclusion

This confirms that the fitted values in linear regression through the origin are linear combinations of the response values \(y_{i'}\), with weights \(a_{i'}\) that depend only on the \(x\) values.


Implementation in R

Below is an R code snippet that demonstrates how to compute the fitted values in a simple linear regression without an intercept.

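```r
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

# Compute beta hat (least squares slope without an intercept)
beta_hat <- sum(x * y) / sum(x^2)

# Compute fitted values
y_hat <- x * beta_hat

# Compute the coefficients a_i': entry [i, i'] is x_i * x_i' / sum(x^2)
a_i_prime <- outer(x, x, FUN = function(xi, xip) xi * xip / sum(x^2))

# Checks: each fitted value is a weighted sum of the responses,
# and beta_hat matches lm() fitted without an intercept
all.equal(as.vector(a_i_prime %*% y), y_hat)
all.equal(unname(coef(lm(y ~ x - 1))), beta_hat)
```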