if (!requireNamespace("ISLR", quietly = TRUE)) install.packages("ISLR", repos = "https://cloud.r-project.org")
library(ISLR)
data(Auto)
head(Auto)
## mpg cylinders displacement horsepower weight acceleration year origin
## 1 18 8 307 130 3504 12.0 70 1
## 2 15 8 350 165 3693 11.5 70 1
## 3 18 8 318 150 3436 11.0 70 1
## 4 16 8 304 150 3433 12.0 70 1
## 5 17 8 302 140 3449 10.5 70 1
## 6 15 8 429 198 4341 10.0 70 1
## name
## 1 chevrolet chevelle malibu
## 2 buick skylark 320
## 3 plymouth satellite
## 4 amc rebel sst
## 5 ford torino
## 6 ford galaxie 500
mpg ~ horsepowerWe fit a simple linear regression to predict miles per gallon
(mpg) from horsepower.
lm_linear <- lm(mpg ~ horsepower, data = Auto)
summary(lm_linear)
##
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
# Residual vs Fitted for linear model
plot(
lm_linear$fitted.values,
lm_linear$residuals,
xlab = "Fitted Values",
ylab = "Residuals",
main = "Residuals vs. Fitted Values\n[Linear Model: mpg ~ horsepower]",
pch = 20,
col = "steelblue"
)
abline(h = 0, col = "red", lwd = 2, lty = 2)
lines(lowess(lm_linear$fitted.values, lm_linear$residuals),
col = "darkgreen", lwd = 2)
legend("topright",
legend = c("Zero line (y = 0)", "LOWESS smoother"),
col = c("red", "darkgreen"),
lty = c(2, 1), lwd = 2)
The LOWESS smoother shows a clear U-shaped (concave)
curve in the residuals. This systematic pattern tells us the
linear model has failed to capture the nonlinear
relationship between mpg and
horsepower. Residuals are:
This U-shape is strong evidence of a nonlinear relationship.
mpg ~ horsepower²To address the nonlinearity, we add a quadratic term.
lm_poly <- lm(mpg ~ poly(horsepower, 2), data = Auto)
summary(lm_poly)
##
## Call:
## lm(formula = mpg ~ poly(horsepower, 2), data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.7135 -2.5943 -0.0859 2.2868 15.8961
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.4459 0.2209 106.13 <2e-16 ***
## poly(horsepower, 2)1 -120.1377 4.3739 -27.47 <2e-16 ***
## poly(horsepower, 2)2 44.0895 4.3739 10.08 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.374 on 389 degrees of freedom
## Multiple R-squared: 0.6876, Adjusted R-squared: 0.686
## F-statistic: 428 on 2 and 389 DF, p-value: < 2.2e-16
plot(
lm_poly$fitted.values,
lm_poly$residuals,
xlab = "Fitted Values",
ylab = "Residuals",
main = "Residuals vs. Fitted Values\n[Polynomial Model: mpg ~ horsepower + horsepower²]",
pch = 20,
col = "darkorange"
)
abline(h = 0, col = "red", lwd = 2, lty = 2)
lines(lowess(lm_poly$fitted.values, lm_poly$residuals),
col = "darkgreen", lwd = 2)
legend("topright",
legend = c("Zero line (y = 0)", "LOWESS smoother"),
col = c("red", "darkgreen"),
lty = c(2, 1), lwd = 2)
After adding the quadratic term, the residuals are now randomly scattered around zero with no systematic curvature. The LOWESS smoother runs nearly flat along the zero line, confirming that the polynomial model has successfully captured the nonlinear relationship.
par(mfrow = c(1, 2))
# --- Linear ---
plot(lm_linear$fitted.values, lm_linear$residuals,
xlab = "Fitted Values", ylab = "Residuals",
main = "Linear Model\nmpg ~ horsepower",
pch = 20, col = "steelblue")
abline(h = 0, col = "red", lwd = 2, lty = 2)
lines(lowess(lm_linear$fitted.values, lm_linear$residuals),
col = "darkgreen", lwd = 2)
# --- Polynomial ---
plot(lm_poly$fitted.values, lm_poly$residuals,
xlab = "Fitted Values", ylab = "Residuals",
main = "Polynomial Model\nmpg ~ horsepower + horsepower²",
pch = 20, col = "darkorange")
abline(h = 0, col = "red", lwd = 2, lty = 2)
lines(lowess(lm_poly$fitted.values, lm_poly$residuals),
col = "darkgreen", lwd = 2)
par(mfrow = c(1, 1))
| Model | Residual Pattern | Verdict |
|---|---|---|
Linear (mpg ~ horsepower) |
Clear U-shape curvature | ❌ Nonlinear trend NOT captured |
Polynomial (mpg ~ horsepower²) |
Random scatter around zero | ✅ Nonlinear trend captured |
The residual vs. fitted plot is a powerful diagnostic tool:
Adding a quadratic term (horsepower²) resolves the
nonlinearity and produces a well-behaved residual plot, confirming that
the relationship between mpg and horsepower is
nonlinear (quadratic) in nature.