1 Exercise 2 — KNN classifier vs. KNN regression

Both methods are non-parametric and base their prediction on the \(K\) training observations closest to a test point \(x_0\) (its neighbourhood \(\mathcal{N}_0\)). They differ in the type of response they predict and in how they combine the neighbours.

  • KNN classifier predicts a qualitative (categorical) response. It estimates the conditional probability of each class \(j\) as the fraction of the \(K\) neighbours in that class, \[\Pr(Y = j \mid X = x_0) \approx \frac{1}{K}\sum_{i \in \mathcal{N}_0} I(y_i = j),\] and assigns \(x_0\) to the class with the highest estimated probability (a majority vote). The output is a class label and performance is judged by the classification error rate.

  • KNN regression predicts a quantitative response. It estimates the regression function as the average of the responses of the \(K\) neighbours, \[\hat f(x_0) = \frac{1}{K}\sum_{i \in \mathcal{N}_0} y_i,\] producing a numeric prediction; performance is judged by a quantity such as the test MSE.

In short, the classifier estimates \(\Pr(Y=j\mid X)\) and outputs a category by majority vote (approximating the Bayes classifier), whereas the regression method averages the neighbours’ numeric responses to estimate \(E(Y\mid X)\).
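The contrast can be sketched in a few lines of base R. The toy data, the helper name `knn_neighbours`, and the choice \(K = 3\) are illustrative assumptions, not part of the exercise; the point is only that both methods share the neighbour search and differ in the final aggregation step.

```r
# Toy 1-D training set (made-up values, for illustration only)
x_train <- c(1, 2, 3, 6, 7, 8)
y_class <- factor(c("a", "a", "a", "b", "b", "b"))  # qualitative response
y_num   <- c(1.0, 1.5, 2.0, 5.0, 5.5, 6.0)          # quantitative response

# Hypothetical helper: indices of the K training points closest to x0
knn_neighbours <- function(x0, x_train, K) {
  order(abs(x_train - x0))[seq_len(K)]
}

x0 <- 2.4
nb <- knn_neighbours(x0, x_train, K = 3)

# Classification: class proportions among the neighbours, then majority vote
class_prob <- table(y_class[nb]) / length(nb)
pred_class <- names(which.max(class_prob))

# Regression: average of the neighbours' numeric responses
pred_num <- mean(y_num[nb])

print(class_prob)
print(pred_class)
print(pred_num)
```

Here the three nearest neighbours of \(x_0 = 2.4\) are the first three training points, so the classifier returns class "a" and the regression returns the mean of their responses, \(1.5\).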

2 Exercise 9 — Multiple linear regression on Auto

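library(ISLR)  # assumed: provides Auto and Carseats (ISLR2 works too); the original does not show the library call
Auto <- na.omit(Auto)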
dim(Auto)
## [1] 392   9

2.1 (a) Scatterplot matrix of all variables

pairs(Auto[, names(Auto) != "name"])

2.2 (b) Matrix of correlations

round(cor(Auto[, names(Auto) != "name"]), 3)
##                 mpg cylinders displacement horsepower weight acceleration
## mpg           1.000    -0.778       -0.805     -0.778 -0.832        0.423
## cylinders    -0.778     1.000        0.951      0.843  0.898       -0.505
## displacement -0.805     0.951        1.000      0.897  0.933       -0.544
## horsepower   -0.778     0.843        0.897      1.000  0.865       -0.689
## weight       -0.832     0.898        0.933      0.865  1.000       -0.417
## acceleration  0.423    -0.505       -0.544     -0.689 -0.417        1.000
## year          0.581    -0.346       -0.370     -0.416 -0.309        0.290
## origin        0.565    -0.569       -0.615     -0.455 -0.585        0.213
##                year origin
## mpg           0.581  0.565
## cylinders    -0.346 -0.569
## displacement -0.370 -0.615
## horsepower   -0.416 -0.455
## weight       -0.309 -0.585
## acceleration  0.290  0.213
## year          1.000  0.182
## origin        0.182  1.000

2.3 (c) Multiple regression of mpg on all predictors except name

lm.fit <- lm(mpg ~ . - name, data = Auto)
summary(lm.fit)
## 
## Call:
## lm(formula = mpg ~ . - name, data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5903 -2.1565 -0.1169  1.8690 13.0604 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -17.218435   4.644294  -3.707  0.00024 ***
## cylinders     -0.493376   0.323282  -1.526  0.12780    
## displacement   0.019896   0.007515   2.647  0.00844 ** 
## horsepower    -0.016951   0.013787  -1.230  0.21963    
## weight        -0.006474   0.000652  -9.929  < 2e-16 ***
## acceleration   0.080576   0.098845   0.815  0.41548    
## year           0.750773   0.050973  14.729  < 2e-16 ***
## origin         1.426141   0.278136   5.127 4.67e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared:  0.8215, Adjusted R-squared:  0.8182 
## F-statistic: 252.4 on 7 and 384 DF,  p-value: < 2.2e-16

i. Is there a relationship between the predictors and the response? Yes. The overall \(F\)-statistic is large with a \(p\)-value of essentially zero, so we strongly reject the null hypothesis that all slope coefficients are zero; the predictors jointly explain about \(82\%\) of the variation in mpg.

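ii. Which predictors are statistically significant? At the \(5\%\) level, displacement, weight, year, and origin are significant; cylinders, horsepower, and acceleration are not. A likely reason is collinearity: the correlation matrix in (b) shows that cylinders and horsepower are strongly correlated with displacement (\(0.951\) and \(0.897\)) and with weight (\(0.898\) and \(0.865\)), which inflates their standard errors.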

iii. What does the year coefficient suggest? Its coefficient is about \(+0.75\): holding all other predictors fixed, each additional model year is associated with an increase of roughly \(0.75\) mpg. In other words, otherwise comparable cars became more fuel-efficient over this period.
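The collinearity explanation in (ii) could be checked with variance inflation factors, \(\text{VIF}_j = 1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on the other predictors. The sketch below is a minimal base-R version; `vif_manual` is a hypothetical helper name, and the simulated data are an assumption used only so the block runs without the Auto data.

```r
# Hypothetical helper: VIF for every non-intercept column of an lm fit
vif_manual <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]   # drop the intercept column
  sapply(colnames(X), function(v) {
    others <- X[, colnames(X) != v, drop = FALSE]
    r2 <- summary(lm(X[, v] ~ others))$r.squared
    1 / (1 - r2)
  })
}

# Simulated example (not the Auto data): x2 is nearly a copy of x1
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
y  <- 1 + x1 + x3 + rnorm(n)

vifs <- vif_manual(lm(y ~ x1 + x2 + x3))
print(round(vifs, 2))
```

In the simulation, x1 and x2 get very large VIFs while x3 stays close to 1. Assuming `lm.fit` from (c) is in the workspace, `vif_manual(lm.fit)` would give the corresponding values for the Auto predictors.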

2.4 (d) Diagnostic plots

par(mfrow = c(2, 2))
plot(lm.fit)

The residuals-vs-fitted plot shows a mild U-shape and a funnel (increasing spread at large fitted values), indicating some non-linearity and mild heteroscedasticity. The residuals-vs-leverage plot flags no extreme outliers, but observation 14 has unusually high leverage relative to the average leverage \((p+1)/n = 8/392 \approx 0.020\).

2.5 (e) Models with interactions

summary(lm(mpg ~ horsepower * weight, data = Auto))
## 
## Call:
## lm(formula = mpg ~ horsepower * weight, data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.7725  -2.2074  -0.2708   1.9973  14.7314 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        6.356e+01  2.343e+00  27.127  < 2e-16 ***
## horsepower        -2.508e-01  2.728e-02  -9.195  < 2e-16 ***
## weight            -1.077e-02  7.738e-04 -13.921  < 2e-16 ***
## horsepower:weight  5.355e-05  6.649e-06   8.054 9.93e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.93 on 388 degrees of freedom
## Multiple R-squared:  0.7484, Adjusted R-squared:  0.7465 
## F-statistic: 384.8 on 3 and 388 DF,  p-value: < 2.2e-16

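The horsepower:weight interaction is highly significant (\(p < 0.001\)), so the effect of horsepower on mpg depends on the car’s weight (and vice versa). Adding the interaction raises \(R^2\) relative to the additive model mpg ~ horsepower + weight (not shown). The \(R^2\) of \(0.748\) here is lower than the \(0.822\) in (c) only because this model uses two of the seven predictors.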
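With an interaction, the horsepower slope is no longer a single number: it equals \(\hat\beta_{\text{hp}} + \hat\beta_{\text{hp:weight}} \times \text{weight}\). The sketch below evaluates that expression using the coefficients printed above; the helper name `hp_slope` and the three weights are arbitrary illustrative choices.

```r
# Coefficients copied from the summary output above
b_hp  <- -2.508e-01   # horsepower main effect
b_int <-  5.355e-05   # horsepower:weight interaction

# Estimated change in mpg per extra unit of horsepower at a given weight
hp_slope <- function(weight) b_hp + b_int * weight

weights <- c(2000, 3000, 4000)   # illustrative weights, chosen arbitrarily
slopes  <- hp_slope(weights)
print(data.frame(weight = weights, hp_slope = slopes))
```

The slope is about \(-0.144\) mpg per horsepower at a weight of 2000 and about \(-0.037\) at 4000, so extra horsepower costs less fuel economy in heavier cars according to this fit.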

2.6 (f) Transformations of horsepower

fits <- list(
  "mpg ~ horsepower"            = lm(mpg ~ horsepower, data = Auto),
  "mpg ~ log(horsepower)"       = lm(mpg ~ log(horsepower), data = Auto),
  "mpg ~ sqrt(horsepower)"      = lm(mpg ~ sqrt(horsepower), data = Auto),
  "mpg ~ horsepower + hp^2"     = lm(mpg ~ horsepower + I(horsepower^2), data = Auto)
)
sapply(fits, function(m) round(summary(m)$r.squared, 4))
##        mpg ~ horsepower   mpg ~ log(horsepower)  mpg ~ sqrt(horsepower) 
##                  0.6059                  0.6683                  0.6437 
## mpg ~ horsepower + hp^2 
##                  0.6876

All of the non-linear transformations fit better than the raw linear term (\(R^2 = 0.606\)), confirming the curvature seen in the diagnostics. The quadratic (\(0.688\)) and \(\log\) (\(0.668\)) fits give the largest improvement, although the quadratic uses one more parameter than the others. This is consistent with mpg being a decreasing, convex function of horsepower.

3 Exercise 10 — The Carseats data set

dim(Carseats)
## [1] 400  11

3.1 (a) Fit Sales ~ Price + Urban + US

lm1 <- lm(Sales ~ Price + Urban + US, data = Carseats)
summary(lm1)
## 
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

3.2 (b) Interpretation of the coefficients

All effects below are interpreted holding the other predictors fixed. Sales is in thousands of units and Price is in dollars.

  • Price (\(\approx -0.054\)): each $1 increase in price is associated with a decrease of about \(0.054\) thousand (\(\approx 54\)) unit sales. Negative and highly significant.
  • Urban[Yes] (\(\approx -0.022\)): urban stores sell about \(0.022\) thousand units less than rural stores, but this is not significant (\(p \approx 0.94\)) — no detectable urban/rural difference.
  • US[Yes] (\(\approx +1.20\)): stores in the US sell about \(1.20\) thousand (\(\approx 1200\)) more units than non-US stores; significant.
  • Intercept (\(\approx 13.0\)): expected sales for a hypothetical rural, non-US store with Price \(= 0\).

3.3 (c) Model in equation form

With \(\text{Urban}=1\) if the store is urban (else \(0\)) and \(\text{US}=1\) if it is in the US (else \(0\)): \[\widehat{\text{Sales}} = 13.0435 - 0.0545\,\text{Price} - 0.0219\,\text{Urban} + 1.2006\,\text{US}.\]
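The equation can be turned into a small prediction function. This is only a sketch using the rounded coefficients from the equation; `predict_sales` is a hypothetical helper, and the price of 100 dollars is an arbitrary example value.

```r
# Fitted equation from (c), with Urban and US coded as 0/1 dummies
predict_sales <- function(price, urban, us) {
  13.0435 - 0.0545 * price - 0.0219 * urban + 1.2006 * us
}

# Predicted sales (thousands of units) at an illustrative price of $100
pred_urban_us    <- predict_sales(price = 100, urban = 1, us = 1)
pred_rural_nonus <- predict_sales(price = 100, urban = 0, us = 0)

print(c(urban_US = pred_urban_us, rural_nonUS = pred_rural_nonus))
```

At this price the two predictions are about \(8.77\) and \(7.59\) thousand units; the gap is the sum of the two dummy coefficients, \(1.2006 - 0.0219\).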

3.4 (d) For which predictors can we reject \(H_0:\beta_j = 0\)?

round(summary(lm1)$coefficients[, "Pr(>|t|)"], 4)
## (Intercept)       Price    UrbanYes       USYes 
##      0.0000      0.0000      0.9357      0.0000

We reject \(H_0\) for Price and US (and the intercept), but not for Urban.

3.5 (e) Smaller model Sales ~ Price + US

lm2 <- lm(Sales ~ Price + US, data = Carseats)
summary(lm2)
## 
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

3.6 (f) How well do (a) and (e) fit?

data.frame(
  model  = c("(a) Price+Urban+US", "(e) Price+US"),
  R2     = c(summary(lm1)$r.squared,     summary(lm2)$r.squared),
  adj_R2 = c(summary(lm1)$adj.r.squared, summary(lm2)$adj.r.squared),
  RSE    = c(summary(lm1)$sigma,         summary(lm2)$sigma)
)
##                model        R2    adj_R2      RSE
## 1 (a) Price+Urban+US 0.2392754 0.2335123 2.472492
## 2       (e) Price+US 0.2392629 0.2354305 2.469397

Both models explain essentially the same \(\approx 23.9\%\) of the variation in Sales: dropping the non-significant Urban term lowers \(R^2\) only from \(0.23928\) to \(0.23926\). Model (e) has a slightly higher adjusted \(R^2\) and a lower residual standard error, so it is the preferred, more parsimonious model. In absolute terms the fit is modest: about three-quarters of the variability in sales is left unexplained.
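The adjusted \(R^2\) values in the table follow from the usual formula \(1 - (1 - R^2)\,(n-1)/(n-p-1)\), which penalizes the extra Urban parameter. A minimal check from the reported \(R^2\) values (the helper name `adj_r2` is an assumption):

```r
# Adjusted R^2 from R^2, sample size n and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 400                                  # rows in Carseats
adj_a <- adj_r2(0.2392754, n, p = 3)      # model (a): Price + Urban + US
adj_e <- adj_r2(0.2392629, n, p = 2)      # model (e): Price + US

print(round(c(adj_a = adj_a, adj_e = adj_e), 7))
```

Because the two \(R^2\) values are nearly identical, the smaller penalty for \(p = 2\) is enough to give model (e) the higher adjusted \(R^2\).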

3.7 (g) 95% confidence intervals for the coefficients of model (e)

round(confint(lm2), 4)
##               2.5 %  97.5 %
## (Intercept) 11.7903 14.2713
## Price       -0.0648 -0.0442
## USYes        0.6915  1.7078
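These intervals are \(\hat\beta_j \pm t_{0.975,\,397}\,\text{SE}(\hat\beta_j)\). As a sketch, they can be reproduced from the estimates and standard errors printed in the summary of model (e); small differences in the last digit are possible because the inputs are rounded.

```r
# Estimates and standard errors copied from summary(lm2) above
est <- c("(Intercept)" = 13.03079, Price = -0.05448, USYes = 1.19964)
se  <- c("(Intercept)" = 0.63098,  Price = 0.00523,  USYes = 0.25846)

df     <- 397                 # residual degrees of freedom of model (e)
t_crit <- qt(0.975, df)       # two-sided 95% critical value

ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 4))
```

None of the three intervals contains zero, which agrees with the significance tests for model (e).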

3.8 (h) Outliers and high-leverage observations in model (e)

par(mfrow = c(2, 2))
plot(lm2)

n <- nrow(Carseats)
# rstandard() returns standardized (internally studentized) residuals;
# rstudent() would return the externally studentized version
stdres <- rstandard(lm2)
lev <- hatvalues(lm2)
avg_lev <- length(coef(lm2)) / n   # average leverage (p + 1) / n
cat("max |standardized residual| =", round(max(abs(stdres)), 3), "\n")
## max |standardized residual| = 2.865
cat("observations with |std. resid| > 3 :", sum(abs(stdres) > 3), "\n")
## observations with |std. resid| > 3 : 0
cat("average leverage =", round(avg_lev, 5),
    " max leverage =", round(max(lev), 5), "\n")
## average leverage = 0.0075  max leverage = 0.04334
cat("high-leverage (lev > 3*avg) :", sum(lev > 3 * avg_lev), "\n")
## high-leverage (lev > 3*avg) : 6

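The largest standardized residual (\(2.865\) in absolute value, computed with rstandard(), i.e. internally studentized) is below \(3\), so there are no clear outliers. Six observations exceed three times the average leverage (\(3 \times 0.0075 = 0.0225\)), and the maximum leverage is about \(0.043\), so there are some mildly high-leverage points, but none are extreme.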
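The code above uses rstandard(), which gives internally studentized residuals \(r_i\), whereas rstudent() gives the externally studentized version \(t_i\). The two are linked by \(t_i = r_i\sqrt{(n-k-1)/(n-k-r_i^2)}\), with \(k\) the number of fitted coefficients. The sketch below checks this identity on simulated data (an assumption, so the block runs without Carseats) and then converts the reported maximum; `to_rstudent` is a hypothetical helper name.

```r
# Convert internally studentized residuals to externally studentized ones
to_rstudent <- function(r, n, k) r * sqrt((n - k - 1) / (n - k - r^2))

# Check the identity on a small simulated regression (not the Carseats data)
set.seed(1)
x   <- rnorm(50)
y   <- 1 + x + rnorm(50)
fit <- lm(y ~ x)
identity_holds <- all.equal(unname(rstudent(fit)),
                            unname(to_rstudent(rstandard(fit), n = 50, k = 2)))

# Apply it to the reported maximum for model (e): n = 400, k = 3 coefficients
max_external <- to_rstudent(2.865, n = 400, k = 3)

print(identity_holds)
print(round(max_external, 3))
```

The converted maximum is about \(2.89\), still below \(3\), so the conclusion about outliers does not depend on which version of the residual is used.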

4 Exercise 12 — Simple linear regression without an intercept

For regression through the origin, the coefficient of \(Y\) onto \(X\) is \(\hat\beta_{Y\sim X} = \sum_i x_i y_i / \sum_i x_i^2\) and of \(X\) onto \(Y\) is \(\hat\beta_{X\sim Y} = \sum_i x_i y_i / \sum_i y_i^2\).

4.1 (a) When are the two coefficients equal?

Since the numerators (\(\sum_i x_i y_i\)) are identical, the two estimates are equal if and only if the denominators are equal, i.e. \[\sum_i x_i^2 = \sum_i y_i^2\] (assuming \(\sum_i x_i y_i \neq 0\); if that sum is zero, both estimates are zero and trivially equal). In general they differ.

4.2 (b) An example where the coefficients differ

set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
c(beta_Y_on_X = coef(lm(y ~ x + 0)),
  beta_X_on_Y = coef(lm(x ~ y + 0)))
## beta_Y_on_X.x beta_X_on_Y.y 
##     1.9938761     0.3911145
c(sum_x2 = sum(x^2), sum_y2 = sum(y^2))
##    sum_x2    sum_y2 
##  81.05509 413.21352

The sums of squares differ greatly, so the two coefficients are very different.
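The link between the two coefficients and the sums of squares can be checked directly: with a shared numerator, \(\hat\beta_{Y\sim X} / \hat\beta_{X\sim Y} = \sum_i y_i^2 / \sum_i x_i^2\). A short sketch using the same simulated data as above:

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# Closed-form no-intercept estimates
b_yx <- sum(x * y) / sum(x^2)   # Y onto X
b_xy <- sum(x * y) / sum(y^2)   # X onto Y

# They agree with lm() fitted through the origin
lm_yx <- unname(coef(lm(y ~ x + 0)))
lm_xy <- unname(coef(lm(x ~ y + 0)))

# The ratio of coefficients equals the ratio of sums of squares
ratio_coef <- b_yx / b_xy
ratio_ss   <- sum(y^2) / sum(x^2)

print(c(b_yx = b_yx, b_xy = b_xy))
print(c(ratio_coef = ratio_coef, ratio_ss = ratio_ss))
```

With the values reported above, both ratios are roughly \(5.1\), which is why the two slopes are so far apart.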

4.3 (c) An example where the coefficients are the same

set.seed(1)
x <- rnorm(100)
y <- sample(x)          # a permutation of x forces sum(y^2) == sum(x^2)
c(beta_Y_on_X = coef(lm(y ~ x + 0)),
  beta_X_on_Y = coef(lm(x ~ y + 0)))
## beta_Y_on_X.x beta_X_on_Y.y 
##   -0.07767695   -0.07767695
c(sum_x2 = sum(x^2), sum_y2 = sum(y^2))
##   sum_x2   sum_y2 
## 81.05509 81.05509

Because \(y\) is a permutation of \(x\), the denominators \(\sum x_i^2\) and \(\sum y_i^2\) are identical, so the two regression coefficients are exactly equal.
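A permutation is only one way to equalize the denominators. As an alternative sketch (not part of the original solution), any \(y\) can be rescaled by \(\sqrt{\sum x_i^2 / \sum y_i^2}\) so that the two sums of squares match; the variable name `y_scaled` is an assumption.

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# Rescale y so that sum(y_scaled^2) equals sum(x^2)
y_scaled <- y * sqrt(sum(x^2) / sum(y^2))

b_yx <- unname(coef(lm(y_scaled ~ x + 0)))
b_xy <- unname(coef(lm(x ~ y_scaled + 0)))

# Common value: sum(x*y) / sqrt(sum(x^2) * sum(y^2)), computed from the unscaled y
common <- sum(x * y) / sqrt(sum(x^2) * sum(y^2))

print(c(b_yx = b_yx, b_xy = b_xy, common = common))
```

Unlike the permutation, this keeps the strong relationship between the variables, and both slopes equal the uncentered correlation between \(x\) and \(y\).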