First, we will load the mtcars dataset and create the
model. Besides horsepower (hp) and weight (wt), we will be using these fields:
I(hp^2) represents the quadratic term for horsepower (hp). It captures the non-linear relationship between horsepower and miles per gallon (mpg).
am represents the transmission type of the car. It’s dichotomous because it takes on two values: 0 for automatic transmission and 1 for manual transmission.
hp:am represents the interaction between horsepower (hp) and transmission type (am). It captures how the relationship between horsepower and miles per gallon differs depending on whether the car has an automatic or manual transmission.
We will be using these to predict miles per gallon (mpg).
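Before fitting, it can help to see what columns these formula terms actually expand into. The following is a minimal sketch using base R's model.matrix(); the variable name X is only illustrative and is not part of the original analysis.

```r
# Sketch: inspect the columns that the model formula expands into.
data(mtcars)

# model.matrix() builds the design matrix for a given formula and data set
X <- model.matrix(mpg ~ hp + I(hp^2) + wt + am + hp:am, data = mtcars)

# Column names correspond to the coefficient names reported by summary()
colnames(X)

# The quadratic column is hp squared; the interaction column is hp times am
all.equal(unname(X[, "I(hp^2)"]), mtcars$hp^2)
all.equal(unname(X[, "hp:am"]), mtcars$hp * mtcars$am)

# am is coded 0/1 in mtcars; see ?mtcars for which value is which
table(mtcars$am)
```

Because am is 0 or 1, the hp:am column is simply hp for cars with am = 1 and zero for the rest, which is why its coefficient acts as an adjustment to the horsepower slope for that group.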
data(mtcars)
# fit the regression model
model <- lm(mpg ~ hp + I(hp^2) + wt + am + hp:am, data = mtcars)
# print the summary
summary(model)
##
## Call:
## lm(formula = mpg ~ hp + I(hp^2) + wt + am + hp:am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9689 -1.7441 -0.4744 1.2910 4.1313
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.854e+01 3.159e+00 12.200 2.9e-12 ***
## hp -1.142e-01 3.387e-02 -3.371 0.00235 **
## I(hp^2) 2.567e-04 9.441e-05 2.719 0.01152 *
## wt -2.757e+00 8.527e-01 -3.234 0.00331 **
## am 4.349e+00 2.140e+00 2.032 0.05245 .
## hp:am -2.498e-02 1.411e-02 -1.770 0.08841 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.299 on 26 degrees of freedom
## Multiple R-squared: 0.878, Adjusted R-squared: 0.8545
## F-statistic: 37.43 on 5 and 26 DF, p-value: 4.476e-11
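Since hp enters the model three times (hp, I(hp^2) and hp:am), its effect on mpg is not given by a single coefficient. As a rough sketch, the slope of predicted mpg with respect to horsepower can be assembled from the fitted coefficients as the derivative of the model equation. The helper name hp_slope and the chosen horsepower values are assumptions made for illustration only.

```r
# Sketch: slope of predicted mpg with respect to hp, by transmission code.
data(mtcars)
model <- lm(mpg ~ hp + I(hp^2) + wt + am + hp:am, data = mtcars)
b <- coef(model)

# Hypothetical helper: derivative of the fitted equation with respect to hp,
# i.e. b[hp] + 2 * b[I(hp^2)] * hp + b[hp:am] * am
hp_slope <- function(hp, am) {
  unname(b["hp"] + 2 * b["I(hp^2)"] * hp + b["hp:am"] * am)
}

# Slopes at a few illustrative horsepower values for each transmission code
hp_values <- c(100, 150, 200)
data.frame(
  hp = hp_values,
  slope_am0 = hp_slope(hp_values, am = 0),
  slope_am1 = hp_slope(hp_values, am = 1)
)
```

At any given horsepower, the two slope columns differ by exactly the hp:am coefficient, and the positive I(hp^2) coefficient makes both slopes less negative as horsepower grows.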
Interpreting the fields, we can see that:
(Intercept): Represents the estimated mpg when all predictors are zero. It’s highly statistically significant and positive, but no car has zero horsepower and zero weight, so it serves as a baseline for the other terms rather than as a realistic prediction.
hp: Represents the linear part of the effect of horsepower on mpg for cars with am = 0. Because hp also appears in I(hp^2) and hp:am, the change in mpg for a one-unit increase in horsepower depends on the current horsepower and on the transmission type, so this coefficient cannot be read in isolation. It is negative and statistically significant (p = 0.002), so higher horsepower is associated with lower mpg, and it is a relevant predictor.
wt: Represents the estimated change in mpg for a one-unit increase in weight (1000 lbs in mtcars), holding all other predictors constant. As this is negative, heavier cars have lower mpg and vice versa. It is about as statistically significant as hp (p = 0.003), so it is also relevant. This makes sense, as a heavier car typically needs more fuel to move.
am: Represents the estimated difference in mpg between manual (am = 1) and automatic (am = 0) cars, holding the other predictors constant. Because of the hp:am term, this is the difference at hp = 0. As this is positive, manual cars are estimated to have higher mpg at that baseline. With p = 0.052 it falls just short of the 0.05 level, so the evidence is weaker than for the previous two fields.
hp:am: Represents how the relationship between horsepower and mpg changes depending on the type of transmission. As it is negative, mpg falls faster with increasing horsepower for manual cars than for automatic cars. With p = 0.088 it is significant only at the 0.10 level, not at 0.05.
I(hp^2): Represents the curvature in the relationship between horsepower and mpg, allowing the effect of horsepower on mpg to be nonlinear rather than strictly linear. As it is positive, the decline in mpg flattens out as horsepower increases. It is statistically significant (p = 0.012), so it’s relevant to the prediction.
# histogram
hist(resid(model))
# standard lm diagnostic plots (residuals vs. fitted, Q-Q, scale-location, leverage) in a 2x2 grid
par(mfrow = c(2, 2))
plot(model)
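Residual analysis is done to assess whether the assumptions of linear regression are met: linearity, independence of the errors, constant variance of the residuals, and normality of the residuals.
Judging from these plots, the model passes our residual analysis, so the model was appropriate, with a slight caveat for the normality of residuals.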
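To look more closely at the normality caveat, one option is to pair a Q-Q plot of the residuals with a Shapiro-Wilk test. This is a minimal sketch using only base R functions (qqnorm, qqline, shapiro.test); it is an additional check and not part of the original analysis, and the names res and sw are illustrative.

```r
# Sketch: a closer look at the normality of the residuals.
data(mtcars)
model <- lm(mpg ~ hp + I(hp^2) + wt + am + hp:am, data = mtcars)
res <- resid(model)

# Q-Q plot: points close to the reference line suggest roughly normal residuals
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value would be evidence against normality
sw <- shapiro.test(res)
sw$p.value
```

With only 32 observations the test has limited power, so it is best read alongside the plots rather than as a verdict on its own.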