This analysis examines the relationship between horsepower and miles per gallon (mpg) in the Auto dataset from the ISLR package. A simple linear regression of mpg on horsepower is fitted, and a residuals vs fitted plot is then used to check whether a straight line describes the relationship adequately or whether it is nonlinear.
# Run this once in your console if ISLR is not yet installed:
# install.packages("ISLR")
library(ISLR)
data(Auto)
head(Auto)
##   mpg cylinders displacement horsepower weight acceleration year origin
## 1  18         8          307        130   3504         12.0   70      1
## 2  15         8          350        165   3693         11.5   70      1
## 3  18         8          318        150   3436         11.0   70      1
## 4  16         8          304        150   3433         12.0   70      1
## 5  17         8          302        140   3449         10.5   70      1
## 6  15         8          429        198   4341         10.0   70      1
##                        name
## 1 chevrolet chevelle malibu
## 2         buick skylark 320
## 3        plymouth satellite
## 4             amc rebel sst
## 5               ford torino
## 6          ford galaxie 500
model <- lm(mpg ~ horsepower, data = Auto)
summary(model)
##
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -13.5710  -3.2592  -0.3435   2.7630  16.9240
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
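The fitted model is mpg = 39.94 - 0.158 x horsepower, so each additional unit of horsepower is associated with a decrease of about 0.158 mpg on average. The slope is highly significant (p < 2e-16), and horsepower explains about 61% of the variance in mpg (R-squared = 0.6059), with a residual standard error of 4.906 mpg.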
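To make the interpretation of the coefficients concrete, the estimates can be pulled out of the model object and used for a prediction. The sketch below is an illustration, not part of the original analysis; the data frame `new_car` and the horsepower value of 100 are arbitrary choices made for the example.

```r
library(ISLR)

# Refit the same simple linear regression used in this analysis
model <- lm(mpg ~ horsepower, data = Auto)

# Point estimates and 95% confidence intervals for intercept and slope
coef(model)
confint(model)

# Predicted mean mpg at a hypothetical horsepower of 100,
# with a 95% confidence interval for the mean response
new_car <- data.frame(horsepower = 100)
pred <- predict(model, newdata = new_car, interval = "confidence")
pred
```

The `fit` column of `pred` is the intercept plus 100 times the slope; `lwr` and `upr` bound the estimated mean mpg, not the mpg of an individual car (for that, `interval = "prediction"` would be the usual choice).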
# Residuals against fitted values; the red line marks zero residual
plot(fitted(model), resid(model),
     xlab = "Fitted Values",
     ylab = "Residuals",
     main = "Residuals vs Fitted Plot")
abline(h = 0, col = "red")
The residual vs fitted plot helps determine whether the linear regression model adequately captures the relationship between horsepower and mpg. If the residuals are randomly scattered around the horizontal line at zero, the relationship can be considered approximately linear. However, if a systematic pattern such as a U-shape or inverted U-shape appears, this indicates that the linear model fails to capture a nonlinear relationship between the variables.
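A systematic pattern can be hard to judge by eye, so one common aid is to overlay a smooth trend on the residuals. The following is a minimal sketch, assuming the same model as above; the use of `lowess()` and the names `fit_vals`, `res_vals`, and `smooth` are choices made for this example rather than part of the original analysis.

```r
library(ISLR)

# Same simple linear regression as in the analysis
model <- lm(mpg ~ horsepower, data = Auto)

fit_vals <- fitted(model)
res_vals <- resid(model)

plot(fit_vals, res_vals,
     xlab = "Fitted Values",
     ylab = "Residuals",
     main = "Residuals vs Fitted Plot with Smoother")
abline(h = 0, col = "red")

# lowess() returns a list with x (sorted fitted values) and y (smoothed residuals)
smooth <- lowess(fit_vals, res_vals)
lines(smooth, col = "blue", lwd = 2)
```

If the blue curve stays close to the red zero line across the range of fitted values, the linear form is plausible; a curve that bends away from zero points to structure the model has missed. Base R's `plot(model, which = 1)` draws a similar diagnostic directly.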
In this case, the residual plot shows a curved pattern, suggesting that the relationship between horsepower and mpg is nonlinear. Therefore, a nonlinear model such as polynomial regression may provide a better fit for the data.
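As one way to follow up on this suggestion, a quadratic polynomial regression can be fitted and compared with the linear model. This is a sketch under the assumption that a second-degree polynomial is a reasonable first candidate; the degree and the names `linear_fit`, `quad_fit`, and `comparison` are chosen for this example and do not come from the original analysis.

```r
library(ISLR)

# Original linear model and a quadratic alternative
linear_fit <- lm(mpg ~ horsepower, data = Auto)
quad_fit   <- lm(mpg ~ poly(horsepower, 2), data = Auto)

# F-test for nested models: does the quadratic term significantly reduce
# the residual sum of squares?
comparison <- anova(linear_fit, quad_fit)
comparison

# R-squared of each model, for a rough comparison of fit
summary(linear_fit)$r.squared
summary(quad_fit)$r.squared
```

A small p-value in the `anova()` table would support keeping the quadratic term. Note that `poly()` uses orthogonal polynomials by default, so its coefficients are not directly the coefficients of horsepower and horsepower squared; `poly(horsepower, 2, raw = TRUE)` gives those instead, with the same fitted values.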
Overall, the simple linear regression explains about 61% of the variance in mpg but does not fully capture its relationship with horsepower. The curved residual pattern indicates nonlinearity, so a more flexible specification, such as the polynomial regression mentioned above, is a natural next step.