1 Chapter 3: ISLRv2 Exercises

1.1 Problem 2 – KNN Classifier vs KNN Regression

KNN classification predicts a discrete class label for a test point by majority vote among its k nearest neighbors in the training data.

KNN regression takes the average response value of the k nearest neighbors, producing a continuous numeric prediction.

  • Use classification when the response is categorical.
  • Use regression when the response is quantitative.
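
A minimal sketch of the contrast, assuming the class and FNN packages are installed (the training data below are simulated purely for illustration):

library(class)   # knn() for classification
library(FNN)     # knn.reg() for regression

set.seed(1)
x_train <- matrix(rnorm(100), ncol = 2)   # 50 training points in 2 dimensions
x_test  <- matrix(rnorm(10),  ncol = 2)   # 5 test points

# Classification: majority vote among the k = 3 nearest neighbors -> class labels
cl_train <- factor(sample(c("A", "B"), 50, replace = TRUE))
knn(train = x_train, test = x_test, cl = cl_train, k = 3)

# Regression: average response of the k = 3 nearest neighbors -> numeric predictions
y_train <- rnorm(50)
knn.reg(train = x_train, test = x_test, y = y_train, k = 3)$pred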

1.2 Problem 9 – Multiple Linear Regression on Auto Dataset

library(ISLR2)
data("Auto")

# (a) Scatterplot matrix
pairs(Auto)

# (b) Correlation matrix (excluding qualitative 'name')
Auto_num <- Auto[, sapply(Auto, is.numeric)]
cor(Auto_num)
# (c) Multiple linear regression
fit_auto <- lm(mpg ~ . - name, data = Auto)
summary(fit_auto)

Interpretation:

  • Yes, there is a relationship between the predictors and mpg: several predictors are significantly associated with it (e.g., weight, year, origin, and displacement), while others (e.g., horsepower and acceleration) are not significant once the rest are in the model.
  • The year coefficient is positive, suggesting that, with the other predictors held fixed, newer cars (higher model years) tend to have higher mpg.
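
To check which predictors the summary flags as significant, the coefficient table can be inspected directly (a quick sketch; the exact values depend on the fitted model):

# Coefficient estimates and p-values from the fitted model
coefs <- coef(summary(fit_auto))
coefs[order(coefs[, "Pr(>|t|)"]), ]                  # sorted by p-value

# Names of predictors with p-value below 0.05
rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]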

# (d) Diagnostic plots
par(mfrow = c(2, 2))
plot(fit_auto)
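
The fourth diagnostic panel plots leverage; the observation with the largest hat value can also be identified numerically (which row is flagged depends on the data):

# Largest leverage (hat) value and the index of that observation
hat_vals <- hatvalues(fit_auto)
which.max(hat_vals)
hat_vals[which.max(hat_vals)]
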
# (e) Interactions
fit_interact <- lm(mpg ~ (.-name)^2, data = Auto)
summary(fit_interact)
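
With this many interaction terms it helps to filter the summary down to the statistically significant ones (a sketch; which interactions survive depends on the fit):

# Keep only the terms with p-value < 0.05
int_coefs <- coef(summary(fit_interact))
int_coefs[int_coefs[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
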
# (f) Transformations
fit_transformed <- lm(mpg ~ log(weight) + sqrt(horsepower) + acceleration + year + origin, data = Auto)
summary(fit_transformed)
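
One simple way to compare the untransformed and transformed fits is their adjusted R-squared and AIC (a rough comparison rather than a formal test; both models use mpg as the response on the same data):

# Adjusted R-squared of each model
summary(fit_auto)$adj.r.squared
summary(fit_transformed)$adj.r.squared

# AIC: lower is better
AIC(fit_auto, fit_transformed)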

1.3 Problem 10 – Multiple Regression on Carseats Dataset

library(ISLR2)
data("Carseats")

# (a) Fit model
fit_carseats <- lm(Sales ~ Price + Urban + US, data = Carseats)
summary(fit_carseats)

(b) Coefficient interpretation:

  • Price: each $1 increase in price is associated with a drop in sales of about 0.054 (Sales is measured in thousands of units, so roughly 54 car seats), holding the other predictors fixed.
  • UrbanYes: urban stores sell about 0.022 thousand fewer units than non-urban stores, but this effect is not statistically significant.
  • USYes: US stores sell about 1.2 thousand more units than non-US stores (significant).

(c) Model:

\[ Sales = \beta_0 + \beta_1 \cdot Price + \beta_2 \cdot UrbanYes + \beta_3 \cdot USYes + \varepsilon \]

where UrbanYes and USYes are dummy variables equal to 1 for urban and US stores, respectively, and 0 otherwise.
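
Because UrbanYes and USYes are 0/1 dummies, the model splits into one regression line per store type:

\[
\mathrm{E}[Sales] =
\begin{cases}
\beta_0 + \beta_1 \cdot Price & \text{non-urban, non-US store} \\
\beta_0 + \beta_2 + \beta_1 \cdot Price & \text{urban, non-US store} \\
\beta_0 + \beta_3 + \beta_1 \cdot Price & \text{non-urban, US store} \\
\beta_0 + \beta_2 + \beta_3 + \beta_1 \cdot Price & \text{urban, US store}
\end{cases}
\]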

(d) Significant predictors:
Only Price and US (based on p-values < 0.05)

# (e) Reduced model
fit_carseats_reduced <- lm(Sales ~ Price + US, data = Carseats)
summary(fit_carseats_reduced)
# (f) Model fit comparison
summary(fit_carseats)
summary(fit_carseats_reduced)
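
Since the reduced model is nested in the full model, the two fits can also be compared with a partial F-test, which here amounts to testing whether Urban adds anything:

# Partial F-test: does dropping Urban significantly worsen the fit?
anova(fit_carseats_reduced, fit_carseats)
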
# (g) Confidence intervals
confint(fit_carseats_reduced)
# (h) Diagnostic plots
par(mfrow = c(2, 2))
plot(fit_carseats_reduced)
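
As a numerical complement to the plots, the usual rules of thumb can be checked directly (a sketch: |studentized residual| > 3 for outliers, leverage above about 2(p + 1)/n for high-leverage points):

# Studentized residuals larger than 3 in absolute value suggest outliers
which(abs(rstudent(fit_carseats_reduced)) > 3)

# Leverage above roughly 2 * (p + 1) / n flags high-leverage observations
n <- nrow(Carseats)
p <- length(coef(fit_carseats_reduced)) - 1
which(hatvalues(fit_carseats_reduced) > 2 * (p + 1) / n)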

1.4 Problem 12 – Regression of Y onto X vs X onto Y (No Intercept)

# (a) Without an intercept, the two slopes are equal only when sum(x^2) == sum(y^2)

# (b) Example: different coefficients
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
summary(lm(y ~ x + 0))
summary(lm(x ~ y + 0))

# (c) Example: same coefficients
x2 <- seq(-1, 1, length.out = 100)
y2 <- x2
summary(lm(y2 ~ x2 + 0))
summary(lm(x2 ~ y2 + 0))
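
The example above is degenerate, since y2 is identical to x2. A less trivial example, relying only on the condition that sum(x^2) equals sum(y^2), permutes the values of x:

# Permuting x preserves sum(x^2) == sum(y^2) without a perfect linear relationship
set.seed(2)
x3 <- rnorm(100)
y3 <- sample(x3)
all.equal(sum(x3^2), sum(y3^2))
coef(lm(y3 ~ x3 + 0))   # the two slopes agree up to numerical precision
coef(lm(x3 ~ y3 + 0))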

(a) Explanation:
With no intercept, the slope from regressing Y onto X and the slope from regressing X onto Y share the same numerator but have different denominators, so the two estimates are equal exactly when the sum of squares of the observed x values equals the sum of squares of the observed y values. Lying exactly on a straight line through the origin is neither necessary nor sufficient for this (unless that line has slope ±1).
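
Concretely, the no-intercept least-squares slopes are

\[
\hat{\beta}_{Y \sim X} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2},
\qquad
\hat{\beta}_{X \sim Y} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} y_i^2},
\]

and these coincide exactly when \( \sum_{i} x_i^2 = \sum_{i} y_i^2 \).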

(b) & (c): Confirmed in R using synthetic examples.