KNN classification assigns a class label based on the majority vote among the k nearest neighbors. It outputs discrete class labels.
KNN regression takes the average response value of the k nearest neighbors, producing a continuous numeric prediction.
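The contrast between the two variants can be sketched in base R. This is a minimal illustration, not a production implementation: `knn_predict` is a hypothetical helper written for this example, the toy data are made up, and ties in the vote are broken by taking the first class.

```r
# Minimal base-R sketch of KNN prediction for a single query point.
# knn_predict is a hypothetical helper, not a function from any package.
knn_predict <- function(train_x, train_y, query, k = 3) {
  # Euclidean distance from the query to every training row
  d <- sqrt(rowSums(sweep(train_x, 2, query)^2))
  nearest <- order(d)[seq_len(k)]
  if (is.factor(train_y)) {
    # Classification: majority vote among the k nearest neighbors
    votes <- table(train_y[nearest])
    names(votes)[which.max(votes)]
  } else {
    # Regression: mean response of the k nearest neighbors
    mean(train_y[nearest])
  }
}

# Made-up training data: two well-separated clusters
train_x <- matrix(c(0, 0,
                    0, 1,
                    1, 0,
                    5, 5,
                    5, 6,
                    6, 5), ncol = 2, byrow = TRUE)
labels <- factor(c("a", "a", "a", "b", "b", "b"))
values <- c(1, 2, 3, 10, 11, 12)

knn_predict(train_x, labels, query = c(0.2, 0.2), k = 3)  # discrete class label
knn_predict(train_x, values, query = c(0.2, 0.2), k = 3)  # continuous mean
```

The same neighbor search feeds both branches; only the final aggregation step (vote versus mean) differs.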
Auto Dataset
library(ISLR2)
data("Auto")
# (a) Scatterplot matrix
pairs(Auto)
# (b) Correlation matrix (excluding qualitative 'name')
Auto_num <- Auto[, sapply(Auto, is.numeric)]
cor(Auto_num)
# (c) Multiple linear regression
fit_auto <- lm(mpg ~ . - name, data = Auto)
summary(fit_auto)
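Interpretation:
- Yes, there is a relationship between the predictors and mpg: the overall F-test is highly significant, and displacement, weight, year, and origin have individually significant coefficients, while cylinders, horsepower, and acceleration do not.
- The year coefficient is positive (about 0.75), suggesting that, holding the other predictors fixed, each later model year is associated with roughly 0.75 more mpg.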
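The significance check can also be read off the coefficient table programmatically rather than by eye. The sketch below assumes a 0.05 cutoff; `significant_terms` is a hypothetical helper, and the data are simulated stand-ins so the block runs without ISLR2.

```r
# significant_terms is a hypothetical helper; alpha = 0.05 is an assumed cutoff.
significant_terms <- function(fit, alpha = 0.05) {
  tab <- coef(summary(fit))        # columns: estimate, std. error, t value, p-value
  p <- tab[, "Pr(>|t|)"]
  keep <- names(p)[p < alpha]
  setdiff(keep, "(Intercept)")     # report predictors only
}

# Simulated stand-in data: x1 drives y, x2 is pure noise
set.seed(1)
sim_sig <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
sim_sig$y <- 3 * sim_sig$x1 + rnorm(200)
fit_sig <- lm(y ~ x1 + x2, data = sim_sig)

significant_terms(fit_sig)
```

Applied to the model above, `significant_terms(fit_auto)` should list the predictors whose p-values fall below the cutoff.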
# (d) Diagnostic plots
par(mfrow = c(2, 2))
plot(fit_auto)
# (e) Interactions
fit_interact <- lm(mpg ~ (.-name)^2, data = Auto)
summary(fit_interact)
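To see what the `^2` operator does in a formula, it can help to expand it on a small case. In this sketch `y`, `a`, `b`, and `c` are placeholder names, not columns of Auto.

```r
# Expand a formula with ^2 to list the main effects and pairwise interactions.
# y, a, b, c are placeholder variable names used only for illustration.
f <- y ~ (a + b + c)^2
expanded <- attr(terms(f), "term.labels")
expanded
```

The result contains the three main effects plus every two-way interaction (`a:b`, `a:c`, `b:c`), which is why the interaction model above has many more coefficients than the additive one.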
# (f) Transformations
fit_transformed <- lm(mpg ~ log(weight) + sqrt(horsepower) + acceleration + year + origin, data = Auto)
summary(fit_transformed)
Carseats Dataset
library(ISLR2)
data("Carseats")
# (a) Fit model
fit_carseats <- lm(Sales ~ Price + Urban + US, data = Carseats)
summary(fit_carseats)
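(b) Coefficient Interpretation:
- Price: holding the other predictors fixed, every $1 increase in price is associated with a decrease in sales of approximately 0.054 thousand units (about 54 units), since Sales is recorded in thousands.
- UrbanYes: stores in urban locations sell about 0.022 thousand units fewer than non-urban stores, a difference that is not statistically significant.
- USYes: U.S. stores sell about 1.2 thousand units more than non-U.S. stores (significant).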
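(c) Model:
\[ \widehat{\text{Sales}} = \hat\beta_0 + \hat\beta_1 \cdot \text{Price} + \hat\beta_2 \cdot \text{UrbanYes} + \hat\beta_3 \cdot \text{USYes} \]
where UrbanYes equals 1 for an urban store and 0 otherwise, and USYes equals 1 for a U.S. store and 0 otherwise.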
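The indicator terms in the equation correspond to the dummy columns R builds for factor predictors. The sketch below uses a small made-up data frame (`toy`, three rows) rather than Carseats, purely to show the coding.

```r
# Made-up three-row data frame with the same kinds of predictors as the model
toy <- data.frame(
  Price = c(100, 120, 90),
  Urban = factor(c("Yes", "No", "Yes"), levels = c("No", "Yes")),
  US    = factor(c("No", "Yes", "Yes"), levels = c("No", "Yes"))
)

# Design matrix: factors become 0/1 indicator columns for the non-baseline level
X <- model.matrix(~ Price + Urban + US, data = toy)
X
```

Because "No" is the first level of each factor, it serves as the baseline, and the columns `UrbanYes` and `USYes` are 1 only for rows where the factor equals "Yes".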
(d) Significant predictors:
Only Price and US have p-values below 0.05, so the null hypothesis of a zero coefficient is rejected for them but not for Urban.
# (e) Reduced model
fit_carseats_reduced <- lm(Sales ~ Price + US, data = Carseats)
summary(fit_carseats_reduced)
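# (f) Model fit comparison: R-squared and residual standard error, side by side
c(full = summary(fit_carseats)$r.squared, reduced = summary(fit_carseats_reduced)$r.squared)
c(full = sigma(fit_carseats), reduced = sigma(fit_carseats_reduced))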
# (g) Confidence intervals
confint(fit_carseats_reduced)
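For reference, the default 95% intervals that `confint()` reports for a linear model are the estimate plus or minus a t quantile times the standard error. The sketch below reproduces them by hand on simulated data; `sim_ci` and `fit_ci` are names invented for this example.

```r
# Simulated stand-in data so the block runs without ISLR2
set.seed(42)
sim_ci <- data.frame(x = rnorm(50))
sim_ci$y <- 1 + 2 * sim_ci$x + rnorm(50)
fit_ci <- lm(y ~ x, data = sim_ci)

est   <- coef(fit_ci)                          # point estimates
se    <- sqrt(diag(vcov(fit_ci)))              # standard errors
tcrit <- qt(0.975, df = df.residual(fit_ci))   # two-sided 95% t quantile

manual_ci <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)
manual_ci
confint(fit_ci)   # should agree with manual_ci
```

An interval that excludes zero corresponds to a coefficient that is significant at the 5% level.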
# (h) Diagnostic plots
par(mfrow = c(2, 2))
plot(fit_carseats_reduced)
# (a) Without an intercept, the two coefficients coincide when sum(x^2) == sum(y^2)
# (b) Example: different coefficients
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
summary(lm(y ~ x + 0))
summary(lm(x ~ y + 0))
# (c) Example: same coefficients
x2 <- seq(-1, 1, length.out = 100)
y2 <- x2
summary(lm(y2 ~ x2 + 0))
summary(lm(x2 ~ y2 + 0))
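(a) Explanation:
For regression without an intercept, the coefficient of Y on X and the coefficient of X on Y are
\[ \hat\beta_{Y \mid X} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}, \qquad \hat\beta_{X \mid Y} = \frac{\sum_i x_i y_i}{\sum_i y_i^2}. \]
The numerators are identical, so the two coefficients are equal exactly when \(\sum_i x_i^2 = \sum_i y_i^2\) (or, trivially, when \(\sum_i x_i y_i = 0\)).
Lying on a straight line through the origin is not sufficient: for y = 2x the coefficients are 2 and 0.5. The example in (c) works because y2 = x2 makes the two sums of squares equal.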
(b) & (c): Confirmed in R using synthetic examples.
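A less degenerate check is to make y a random permutation of x, which keeps the sum of squares equal without forcing a perfect fit. The sketch below is one way to do that; `x3`, `y3`, `b_yx`, and `b_xy` are names introduced here for illustration.

```r
set.seed(1)
x3 <- rnorm(100)
y3 <- sample(x3)   # permutation of x3, so sum(y3^2) matches sum(x3^2)

# Closed-form no-intercept slopes: shared numerator, different denominators
b_yx <- sum(x3 * y3) / sum(x3^2)
b_xy <- sum(x3 * y3) / sum(y3^2)

# Compare against lm(); each pair should agree up to floating-point error
c(closed_form = b_yx, lm = unname(coef(lm(y3 ~ x3 + 0))))
c(closed_form = b_xy, lm = unname(coef(lm(x3 ~ y3 + 0))))
```

Here the points do not lie on a line at all, yet the two slopes agree, which shows that the condition concerns the sums of squares rather than the shape of the scatter.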