Statistical Learning Exercise-2

CHAPTER - 3

Conceptual Questions

Question-1

The p-values in Table 3.4 correspond to hypothesis tests that examine whether TV, radio, and newspaper advertising budgets have a significant impact on sales.

Answer

Null Hypotheses (H₀) for Sales Impact

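H₀ (TV): Given the radio and newspaper budgets, spending on TV advertisements has no effect on sales (the TV coefficient is zero).
H₀ (Radio): Given the TV and newspaper budgets, spending on radio advertisements has no effect on sales (the radio coefficient is zero).
H₀ (Newspaper): Given the TV and radio budgets, spending on newspaper advertisements has no effect on sales (the newspaper coefficient is zero).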

Interpretation of p-values

The p-value for TV advertising is extremely small (below 0.0001), so we reject H₀ (TV): TV spending is associated with sales even after accounting for radio and newspaper spending.
The p-value for radio advertising is also extremely small (below 0.0001), so we reject H₀ (Radio) as well.
The p-value for newspaper advertising is large (about 0.86), so we cannot reject H₀ (Newspaper): there is no evidence that newspaper spending is associated with sales once TV and radio spending are accounted for.
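
As a rough illustration of where such p-values come from, the sketch below fits a multiple regression and reads the p-value column from the coefficient table. The data are synthetic and generated under an assumed model in which newspaper spending has no effect; they are not the Advertising data behind Table 3.4, and the variable names and effect sizes are chosen only for illustration.

```r
# Synthetic data for illustration only; NOT the data behind Table 3.4.
set.seed(42)
n <- 200
TV <- runif(n, 0, 300)
radio <- runif(n, 0, 50)
newspaper <- runif(n, 0, 100)

# Assumed data-generating process: newspaper has no effect on sales.
sales <- 3 + 0.045 * TV + 0.19 * radio + rnorm(n, sd = 1.7)

fit <- lm(sales ~ TV + radio + newspaper)

# Each p-value tests H0: that coefficient is zero, given the other predictors.
p_values <- coef(summary(fit))[, "Pr(>|t|)"]
print(signif(p_values, 3))
```

In this simulated setting the TV and radio p-values come out tiny, while the newspaper p-value reflects only noise.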

Business Implications

TV and radio advertisements are effective in increasing sales, making them valuable investments for marketing strategies.
Newspaper advertisements do not show a significant impact on sales, suggesting that they may not be the most cost-effective medium for boosting revenue.

Conclusion

Based on the hypothesis testing results, businesses should prioritize TV and radio advertisements to enhance sales. Newspaper ads may not be the most efficient way to increase revenue, and resources should be allocated accordingly.

Question-2

The K-Nearest Neighbors (KNN) method is a non-parametric machine learning algorithm used for both classification and regression tasks. However, its application differs based on whether the target variable is categorical (classification) or continuous (regression).

Answer

K-Nearest Neighbors (KNN) Algorithm

KNN Classification

Used when the target variable is categorical.
The algorithm identifies the k training points closest to the test point under a distance metric (for example, Euclidean distance).
The test point is assigned to the majority class among its k nearest neighbors.
Applicable for both binary and multi-class classification:
- Binary Classification: Two possible categories (e.g., Yes/No, Spam/Not Spam).
- Multi-Class Classification: More than two possible categories (e.g., classifying flower species).

Example:

Predicting whether an email is spam (Yes/No) based on word frequency.

KNN Regression

Used when the target variable is continuous.
The algorithm identifies the k nearest neighbors and computes the average (or weighted average) of their target values.
The output is a numerical value rather than a category.

Example:

Predicting house prices based on size, number of rooms, and location.

Key Differences Between KNN Classification and KNN Regression

| Feature | KNN Classification | KNN Regression |
|---|---|---|
| Target Variable | Categorical (classes) | Continuous (numerical) |
| Prediction Output | Majority class of the k neighbors | Average of the k neighbors' values |
| Example | Predicting if a customer will buy a product (Yes/No) | Predicting the price of a product |
| Decision Rule | Voting among the k nearest neighbors | Mean or weighted mean of the k nearest values |
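
The two decision rules can be sketched in a few lines of base R. This is a minimal illustration on a made-up one-dimensional training set, assuming absolute distance and an unweighted vote or mean; the helper names `nearest_k`, `knn_classify`, and `knn_regress` are hypothetical and not taken from any package.

```r
# Toy 1-D training set (made-up values for illustration only).
train_x     <- c(1, 2, 3, 6, 7, 8)
train_class <- c("A", "A", "A", "B", "B", "B")
train_value <- c(1.0, 2.1, 2.9, 6.2, 6.8, 8.1)

# Indices of the k training points closest to the query point x0.
nearest_k <- function(x0, k) order(abs(train_x - x0))[1:k]

# Classification: majority vote among the k nearest neighbors.
knn_classify <- function(x0, k) {
  votes <- table(train_class[nearest_k(x0, k)])
  names(votes)[which.max(votes)]
}

# Regression: plain average of the k nearest neighbors' responses.
knn_regress <- function(x0, k) mean(train_value[nearest_k(x0, k)])

pred_class <- knn_classify(2.5, k = 3)
pred_value <- knn_regress(2.5, k = 3)
print(pred_class)
print(pred_value)
```

Both functions share the same neighbor search; only the final aggregation step differs, which is the distinction the table summarizes.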

General Classification (Beyond Binary)

KNN is effective for multi-class classification, where there are more than two possible categories.
Instead of a simple Yes/No classification, the test point is assigned to the most common class among the k nearest neighbors.

Example:

Classifying flowers as Setosa, Versicolor, or Virginica based on petal and sepal measurements.

KNN can handle both binary and multi-class classification problems.

Key Difference in Predictions

KNN Classification: Assigns a class based on majority voting.
KNN Regression: Predicts a value based on numerical averaging.

Conclusion

KNN is a versatile algorithm that can be used for both classification and regression tasks. The key difference lies in how predictions are made: majority voting for classification and numerical averaging for regression.

Question-3

We are given a regression model that predicts starting salary after graduation (in thousands of dollars) from GPA, IQ, education level, and two interaction terms.

(a) Salary Differences Between High School and College Graduates

We analyze the given regression model:

\[ \hat{Y} = 50 + 20X_1 + 0.07X_2 + 35X_3 + 0.01X_4 - 10X_5 \]

where:
- $X_1$ = GPA
- $X_2$ = IQ
- $X_3$ = Level (1 for College, 0 for High School)
- $X_4 = X_1 \times X_2$ (GPA × IQ Interaction)
- $X_5 = X_1 \times X_3$ (GPA × Level Interaction)

Base Effect of College ($X_3 = 1$)

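The coefficient on $X_3$ gives college graduates a baseline salary boost of 35 thousand dollars (the gap at GPA = 0).
The negative GPA × Level interaction ($-10X_5$) means each additional GPA point is worth 10 thousand dollars less for college graduates than for high school graduates, so the gap shrinks as GPA rises.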

Salary Comparison

For High School Graduates ($X_3 = 0$)

\[ \hat{Y}_{HS} = 50 + 20X_1 + 0.07X_2 + 0.01X_4 \]

For College Graduates ($X_3 = 1$)

\[ \hat{Y}_{College} = 50 + 20X_1 + 0.07X_2 + 35 + 0.01X_4 - 10X_1 \]

\[ = 85 + 10X_1 + 0.07X_2 + 0.01X_4 \]

Finding When High School Graduates Earn More

For high school graduates to earn more than college graduates:

\[ 50 + 20X_1 + 0.07X_2 + 0.01X_4 > 85 + 10X_1 + 0.07X_2 + 0.01X_4 \]

Cancel common terms:

\[ 50 + 20X_1 > 85 + 10X_1 \]

\[ 10X_1 > 35 \]

\[ X_1 > 3.5 \]

Thus, for a fixed IQ and GPA, high school graduates earn more on average than college graduates once GPA exceeds 3.5; below 3.5, college graduates earn more.
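
The crossover can be checked numerically. The sketch below encodes the fitted model as a function (the name `predict_salary` and the IQ value of 110 are arbitrary choices for illustration) and compares the two groups at a few GPA values.

```r
# Fitted model from the question; salary is in thousands of dollars.
predict_salary <- function(gpa, iq, college) {
  50 + 20 * gpa + 0.07 * iq + 35 * college +
    0.01 * gpa * iq - 10 * gpa * college
}

gpa <- c(3.0, 3.5, 4.0)
iq  <- 110  # arbitrary; the IQ terms cancel in the difference

# Positive gap means high school graduates are predicted to earn more.
gap <- predict_salary(gpa, iq, college = 0) -
       predict_salary(gpa, iq, college = 1)
print(data.frame(gpa = gpa, hs_minus_college = gap))
```

The gap equals $10X_1 - 35$, so it is negative below a GPA of 3.5, zero at 3.5, and positive above it.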

(b) Predicting Salary for a College Graduate with IQ = 110 and GPA = 4.0

Given:
- $X_1 = 4.0$ (GPA)
- $X_2 = 110$ (IQ)
- $X_3 = 1$ (College graduate)
- $X_4 = X_1 \times X_2 = 4.0 \times 110 = 440$
- $X_5 = X_1 \times X_3 = 4.0 \times 1 = 4.0$

\[ \hat{Y} = 50 + (20 \times 4.0) + (0.07 \times 110) + (35 \times 1) + (0.01 \times 440) - (10 \times 4.0) \]

\[ = 50 + 80 + 7.7 + 35 + 4.4 - 40 \]

\[ = 137.1 \]

Predicted Salary: $137,100 (137.1 thousand dollars).
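
The same arithmetic can be reproduced in R as a dot product of the coefficients with the predictor values. The vector names below are illustrative only.

```r
# Coefficients of the fitted model, in the order of the equation.
coefs <- c(intercept = 50, gpa = 20, iq = 0.07,
           level = 35, gpa_iq = 0.01, gpa_level = -10)

# Predictor values for a college graduate with GPA 4.0 and IQ 110.
gpa <- 4.0
iq <- 110
level <- 1
x <- c(1, gpa, iq, level, gpa * iq, gpa * level)

y_hat <- sum(coefs * x)  # predicted salary in thousands of dollars
print(y_hat)
```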

(c) Evaluating the Interaction Effect (True or False)

The coefficient for the GPA × IQ interaction term ($X_4$) is 0.01.

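A small coefficient does not by itself indicate a weak interaction effect.
The size of a coefficient depends on the units of its predictor: GPA × IQ takes large values (440 in part (b)), so a coefficient of 0.01 still contributes 4.4 thousand dollars to that prediction.
Evidence for or against the interaction comes from the coefficient's standard error and the resulting t-statistic and p-value, not from the coefficient's size.
Since we are not given the standard error or p-value, we cannot conclude that the interaction is weak.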

Thus, the statement is False, as a small coefficient alone does not indicate a weak interaction effect.
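
To see why coefficient size is uninformative, the sketch below fits the same interaction model twice on synthetic data, once with IQ in its usual units and once with IQ divided by 100. The data-generating values and noise level are assumptions made for the illustration, not estimates from any real data set.

```r
# Synthetic data for illustration only.
set.seed(1)
n <- 200
gpa <- runif(n, 2, 4)
iq <- rnorm(n, mean = 100, sd = 15)
salary <- 50 + 20 * gpa + 0.07 * iq + 0.01 * gpa * iq + rnorm(n, sd = 5)

# Same model, two unit choices for IQ.
fit_raw <- lm(salary ~ gpa * iq)
iq100 <- iq / 100
fit_scaled <- lm(salary ~ gpa * iq100)

raw <- coef(summary(fit_raw))["gpa:iq", c("Estimate", "t value")]
scaled <- coef(summary(fit_scaled))["gpa:iq100", c("Estimate", "t value")]
print(rbind(raw, scaled))
```

Rescaling IQ multiplies the interaction estimate by 100 but leaves its t-statistic (and hence its p-value) unchanged, so the magnitude of the coefficient carries no information about the strength of evidence.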

Question-5

Fitted Values in Linear Regression Without Intercept

We are given the fitted values for a linear regression model without an intercept:

\[ \hat{y}_i = x_i \hat{\beta} \]

where:

\[ \hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i'=1}^{n} x_{i'}^2} \]

Deriving $\hat{y}_i$ as a Linear Combination of $y_{i'}$

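Substituting the formula for $\hat{\beta}$ into $\hat{y}_i$, with the numerator's summation index renamed to $i'$ so that it does not clash with the fixed index $i$: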

\[ \hat{y}_i = x_i \times \frac{\sum_{i'=1}^{n} x_{i'} y_{i'}}{\sum_{i'=1}^{n} x_{i'}^2} \]

Rewriting, with the denominator's summation index renamed to $j$ so that it is not confused with the outer index $i'$:

\[ \hat{y}_i = \sum_{i'=1}^{n} \left( \frac{x_i x_{i'}}{\sum_{j=1}^{n} x_j^2} \right) y_{i'} \]

Comparing with the general form:

\[ \hat{y}_i = \sum_{i'=1}^{n} a_{i'} y_{i'} \]

we identify the weight coefficient:

\[ a_{i'} = \frac{x_i x_{i'}}{\sum_{j=1}^{n} x_j^2} \]

Note that $a_{i'}$ depends on both $i$ and $i'$ (through the predictor values) but not on the response values.
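
As a small worked example (with predictor values chosen only for illustration), take $n = 3$ and $x = (1, 2, 3)$, so the sum of squares is $1^2 + 2^2 + 3^2 = 14$. For the second observation, $x_2 = 2$, and the weights are:

\[ a_{1} = \frac{2 \times 1}{14} = \frac{1}{7}, \quad a_{2} = \frac{2 \times 2}{14} = \frac{2}{7}, \quad a_{3} = \frac{2 \times 3}{14} = \frac{3}{7} \]

so that

\[ \hat{y}_2 = \frac{1}{7} y_1 + \frac{2}{7} y_2 + \frac{3}{7} y_3 \]

which agrees with $\hat{y}_2 = x_2 \hat{\beta} = 2 \times \frac{y_1 + 2y_2 + 3y_3}{14}$.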

Conclusion

This confirms that the fitted values in linear regression are linear combinations of the response values $y_{i'}$, weighted by $a_{i'}$.

Implementation in R

Below is an R code snippet that demonstrates how to compute the fitted values in a simple linear regression without an intercept.

```r
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

# Compute beta hat
beta_hat <- sum(x * y) / sum(x^2)

# Compute fitted values
y_hat <- x * beta_hat

# Compute the coefficients a_i' (row i holds the weights for y_hat[i])
a_i_prime <- outer(x, x, FUN = function(xi, xip) xi * xip / sum(x^2))

# Print results
cat("Estimated Beta (β̂):", beta_hat, "\n")
cat("Fitted Values (ŷ):", y_hat, "\n")
print("Coefficients a_i':")
print(a_i_prime)
```
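
As a cross-check, the weight formula can be compared against R's own no-intercept fit. The sketch below uses made-up, slightly noisy responses (so the fit is not exact) and assumes the formula `y ~ 0 + x` to drop the intercept in `lm`.

```r
# Made-up data for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Weight matrix: entry (i, i') is x_i * x_i' / sum of squared x.
A <- outer(x, x) / sum(x^2)

# Fitted values as linear combinations of the responses.
y_hat_weights <- as.vector(A %*% y)

# Fitted values from lm() with the intercept removed.
fit <- lm(y ~ 0 + x)
y_hat_lm <- unname(fitted(fit))

print(all.equal(y_hat_weights, y_hat_lm))
```

If the derivation holds, the two sets of fitted values agree up to floating-point error.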

Excercise_2

2025-02-26
