1. Load the Auto Dataset

if (!requireNamespace("ISLR", quietly = TRUE)) install.packages("ISLR", repos = "https://cloud.r-project.org")
library(ISLR)
data(Auto)
head(Auto)
##   mpg cylinders displacement horsepower weight acceleration year origin
## 1  18         8          307        130   3504         12.0   70      1
## 2  15         8          350        165   3693         11.5   70      1
## 3  18         8          318        150   3436         11.0   70      1
## 4  16         8          304        150   3433         12.0   70      1
## 5  17         8          302        140   3449         10.5   70      1
## 6  15         8          429        198   4341         10.0   70      1
##                        name
## 1 chevrolet chevelle malibu
## 2         buick skylark 320
## 3        plymouth satellite
## 4             amc rebel sst
## 5               ford torino
## 6          ford galaxie 500

2. Simple Linear Regression: mpg ~ horsepower

We fit a simple linear regression to predict miles per gallon (mpg) from horsepower.

lm_linear <- lm(mpg ~ horsepower, data = Auto)
summary(lm_linear)
## 
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16

3. Residual vs. Fitted Plot — Linear Model

# Residual vs Fitted for linear model
plot(
  lm_linear$fitted.values,
  lm_linear$residuals,
  xlab = "Fitted Values",
  ylab = "Residuals",
  main = "Residuals vs. Fitted Values\n[Linear Model: mpg ~ horsepower]",
  pch  = 20,
  col  = "steelblue"
)
abline(h = 0, col = "red", lwd = 2, lty = 2)
lines(lowess(lm_linear$fitted.values, lm_linear$residuals),
      col = "darkgreen", lwd = 2)
legend("topright",
       legend = c("Zero line (y = 0)", "LOWESS smoother"),
       col    = c("red", "darkgreen"),
       lty    = c(2, 1), lwd = 2)

Interpretation

The LOWESS smoother shows a clear U-shaped (concave) curve in the residuals. This systematic pattern tells us the linear model has failed to capture the nonlinear relationship between mpg and horsepower. Residuals are:

  • Positive at low fitted values (model under-predicts)
  • Negative in the middle (model over-predicts)
  • Positive again at high fitted values (model under-predicts again)

This U-shape is strong evidence of a nonlinear relationship.


4. Polynomial Regression: mpg ~ horsepower²

To address the nonlinearity, we add a quadratic term.

lm_poly <- lm(mpg ~ poly(horsepower, 2), data = Auto)
summary(lm_poly)
## 
## Call:
## lm(formula = mpg ~ poly(horsepower, 2), data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.7135  -2.5943  -0.0859   2.2868  15.8961 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            23.4459     0.2209  106.13   <2e-16 ***
## poly(horsepower, 2)1 -120.1377     4.3739  -27.47   <2e-16 ***
## poly(horsepower, 2)2   44.0895     4.3739   10.08   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.374 on 389 degrees of freedom
## Multiple R-squared:  0.6876, Adjusted R-squared:  0.686 
## F-statistic:   428 on 2 and 389 DF,  p-value: < 2.2e-16

5. Residual vs. Fitted Plot — Polynomial Model

plot(
  lm_poly$fitted.values,
  lm_poly$residuals,
  xlab = "Fitted Values",
  ylab = "Residuals",
  main = "Residuals vs. Fitted Values\n[Polynomial Model: mpg ~ horsepower + horsepower²]",
  pch  = 20,
  col  = "darkorange"
)
abline(h = 0, col = "red", lwd = 2, lty = 2)
lines(lowess(lm_poly$fitted.values, lm_poly$residuals),
      col = "darkgreen", lwd = 2)
legend("topright",
       legend = c("Zero line (y = 0)", "LOWESS smoother"),
       col    = c("red", "darkgreen"),
       lty    = c(2, 1), lwd = 2)

Interpretation

After adding the quadratic term, the residuals are now randomly scattered around zero with no systematic curvature. The LOWESS smoother runs nearly flat along the zero line, confirming that the polynomial model has successfully captured the nonlinear relationship.


6. Side-by-Side Comparison

par(mfrow = c(1, 2))

# --- Linear ---
plot(lm_linear$fitted.values, lm_linear$residuals,
     xlab = "Fitted Values", ylab = "Residuals",
     main = "Linear Model\nmpg ~ horsepower",
     pch = 20, col = "steelblue")
abline(h = 0, col = "red", lwd = 2, lty = 2)
lines(lowess(lm_linear$fitted.values, lm_linear$residuals),
      col = "darkgreen", lwd = 2)

# --- Polynomial ---
plot(lm_poly$fitted.values, lm_poly$residuals,
     xlab = "Fitted Values", ylab = "Residuals",
     main = "Polynomial Model\nmpg ~ horsepower + horsepower²",
     pch = 20, col = "darkorange")
abline(h = 0, col = "red", lwd = 2, lty = 2)
lines(lowess(lm_poly$fitted.values, lm_poly$residuals),
      col = "darkgreen", lwd = 2)

par(mfrow = c(1, 1))

7. Conclusion

Model Residual Pattern Verdict
Linear (mpg ~ horsepower) Clear U-shape curvature ❌ Nonlinear trend NOT captured
Polynomial (mpg ~ horsepower²) Random scatter around zero ✅ Nonlinear trend captured

The residual vs. fitted plot is a powerful diagnostic tool:

Adding a quadratic term (horsepower²) resolves the nonlinearity and produces a well-behaved residual plot, confirming that the relationship between mpg and horsepower is nonlinear (quadratic) in nature.