Discussion Board Prompt

Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?

Response to Prompt:

For this analysis I will be using the built-in dataset mtcars. For my model, I will use hp (horsepower) as the quadratic term, am (transmission) as the dichotomous term, and the interaction between am and wt (weight) as the dichotomous vs. quantitative interaction term.

The predictors I chose for the model are horsepower (hp), weight of the car in 1,000 lb (wt), type of transmission (am, where 0 = automatic and 1 = manual), and the time in seconds taken to travel 1/4 mile from a standing start (qsec). I believe each of these is related in some form to fuel efficiency (mpg).


Examine Dataset

# Load the sample data set
data("mtcars")
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
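
As a quick check on the belief that each predictor is related to mpg, the simple correlations can be computed. This is only a rough sketch: the names vars and cors are my own, and a pairwise correlation ignores the other predictors, so it will not necessarily agree with the regression coefficients.

```r
# Pairwise correlation of mpg with each predictor used in the model
data("mtcars")
vars <- c("hp", "wt", "qsec", "am")   # illustrative name, not from the original post
cors <- sapply(vars, function(v) cor(mtcars$mpg, mtcars[[v]]))
round(cors, 2)
```

Negative values for hp and wt and positive values for qsec and am would match the direction I expected for each predictor.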

Create Our Model

# Fit the multiple regression model
model <- lm(mpg ~ hp + wt + qsec + I(hp^2) + am + am:wt, data = mtcars)
# Print the model summary
summary(model)
## 
## Call:
## lm(formula = mpg ~ hp + wt + qsec + I(hp^2) + am + am:wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0468 -1.2187 -0.0369  1.1797  4.0086 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  2.233e+01  1.046e+01   2.135  0.04271 * 
## hp          -7.286e-02  4.127e-02  -1.765  0.08970 . 
## wt          -2.370e+00  8.052e-01  -2.943  0.00693 **
## qsec         5.853e-01  4.325e-01   1.353  0.18808   
## I(hp^2)      1.680e-04  9.186e-05   1.829  0.07934 . 
## am           1.332e+01  3.639e+00   3.660  0.00118 **
## wt:am       -4.257e+00  1.276e+00  -3.336  0.00266 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.034 on 25 degrees of freedom
## Multiple R-squared:  0.9082, Adjusted R-squared:  0.8861 
## F-statistic: 41.21 on 6 and 25 DF,  p-value: 8.93e-12

Interpret the coefficients

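Our estimated intercept is 22.33. This tells us that the predicted mpg is 22.33 when hp, wt, and qsec are all 0 and am is 0 (automatic transmission). No car has zero horsepower or zero weight, so the intercept is an extrapolation and has little practical meaning on its own. The p-value of 0.04271 tells us that it is statistically significant at the 0.05 level.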

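The estimated coefficient for hp is -0.07286. Because the model also contains I(hp^2), this is not the change in predicted mpg for every 1-unit increase in horsepower. It is the slope of the fitted hp curve at hp = 0, holding the other predictors fixed. At a given horsepower the slope is -0.07286 + 2(0.0001680)(hp). The p-value of 0.08970 tells us that this term is not statistically significant at the 0.05 level.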

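The estimated coefficient for wt is -2.370. Because of the wt:am interaction, this is the weight slope for cars with automatic transmissions (am = 0): as weight increases by 1 unit (1,000 lb), the predicted miles per gallon decreases by 2.370, holding the other predictors fixed. The p-value of 0.00693 tells us that this is statistically significant.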

The estimated coefficient for qsec is 0.5853. This tells us that as quarter-mile time increases by 1 second, the predicted miles per gallon increases by 0.5853, holding the other predictors fixed. The p-value of 0.18808 tells us that this is not statistically significant at the 0.05 level.

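The estimated coefficient for I(hp^2) is 0.0001680. The positive sign means the fitted relationship between hp and mpg curves upward: the drop in predicted mpg per additional horsepower gets smaller as horsepower grows. The p-value of 0.07934 tells us that this term is not statistically significant at the 0.05 level, so the evidence that the relationship is not strictly linear is only suggestive.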
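
Since hp enters the model through both hp and I(hp^2), its effect on predicted mpg depends on the horsepower level. The sketch below works this out from the fitted coefficients. The helper hp_slope and the name hp_turn are my own and are only for illustration.

```r
# Refit the model from this post so the block runs on its own
data("mtcars")
model <- lm(mpg ~ hp + wt + qsec + I(hp^2) + am + am:wt, data = mtcars)
b <- coef(model)

# Slope of predicted mpg with respect to hp at a given horsepower:
# b_hp + 2 * b_hp2 * hp (derivative of the quadratic part of the model)
hp_slope <- function(hp) b[["hp"]] + 2 * b[["I(hp^2)"]] * hp
hp_slope(c(100, 150, 200))

# Horsepower at which the fitted curve flattens (slope equals zero)
hp_turn <- -b[["hp"]] / (2 * b[["I(hp^2)"]])
hp_turn
```

Using the rounded coefficients from the summary, the slope reaches zero at roughly 217 hp, so over most of the horsepower range in the data the fitted effect of additional horsepower is negative but shrinking.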

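The estimated coefficient for am is 13.32. Because of the wt:am interaction, this is not an average difference between transmission types. It is the predicted difference in miles per gallon between manual (am = 1) and automatic (am = 0) cars at wt = 0. At a given weight, the predicted difference is 13.32 - 4.257(wt), holding hp and qsec fixed. The p-value of 0.00118 tells us that this is statistically significant.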

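The estimated coefficient for wt:am is -4.257. This tells us how the weight slope changes for cars with manual transmissions: as weight increases by 1 unit (1,000 lb), the predicted miles per gallon decreases by 4.257 more for manual cars than for automatic cars. The weight slope for manual cars is therefore -2.370 - 4.257 = -6.627. The p-value of 0.00266 tells us that this is statistically significant.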
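
To make the am and wt:am coefficients easier to read together, the sketch below computes the weight slope for each transmission type and the manual vs. automatic gap at a given weight. The names slope_auto, slope_manual, am_gap, and wt_cross are hypothetical helpers I made up for this illustration.

```r
# Refit the model from this post so the block runs on its own
data("mtcars")
model <- lm(mpg ~ hp + wt + qsec + I(hp^2) + am + am:wt, data = mtcars)
b <- coef(model)

# Weight slope for automatic cars (am = 0) and manual cars (am = 1)
slope_auto   <- b[["wt"]]
slope_manual <- b[["wt"]] + b[["wt:am"]]
c(automatic = slope_auto, manual = slope_manual)

# Predicted manual minus automatic difference in mpg at a given weight,
# holding hp and qsec fixed
am_gap <- function(wt) b[["am"]] + b[["wt:am"]] * wt
am_gap(c(2, 3, 4))

# Weight (in 1,000 lb) at which the predicted difference is zero
wt_cross <- -b[["am"]] / b[["wt:am"]]
wt_cross
```

With the rounded coefficients from the summary, the gap changes sign at about 3.13 (roughly 3,130 lb): lighter manual cars are predicted to get better mileage than comparable automatics, and heavier ones worse.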

Conduct residual analysis

# qqPlot() and spreadLevelPlot() come from the car package
library(car)
# Check for normality of residuals
qqPlot(model, main = "Normal Q-Q Plot")

##         Fiat 128 Pontiac Firebird 
##               18               25
# Check for homoscedasticity of residuals
spreadLevelPlot(model, main = "Spread-Level Plot")

## 
## Suggested power transformation:  0.687543
# residuals vs fitted value plots
plot(x = fitted(model), y = residuals(model), 
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. Fitted Values Plot")
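
The plots can be backed up with formal tests. This is a minimal sketch, assuming a Shapiro-Wilk test for normality and the score test for non-constant variance from car are acceptable checks here. The names res and sw are my own, and the ncvTest() call is skipped if the car package is not installed.

```r
# Refit the model from this post so the block runs on its own
data("mtcars")
model <- lm(mpg ~ hp + wt + qsec + I(hp^2) + am + am:wt, data = mtcars)
res <- residuals(model)

# Shapiro-Wilk test: a small p-value would suggest non-normal residuals
sw <- shapiro.test(res)
sw$p.value

# Score test for non-constant error variance (only if car is available)
if (requireNamespace("car", quietly = TRUE)) {
  print(car::ncvTest(model))
}

# The two largest residuals in absolute value, for comparison with qqPlot()
head(sort(abs(res), decreasing = TRUE), 2)
```

With only 32 cars these tests have limited power, so I would read them alongside the plots rather than in place of them.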

Conclusion

We can check the assumptions of the multiple regression model by examining the residual plots. If the residuals are approximately normally distributed, have constant variance, and are randomly scattered around the horizontal zero line in the residuals vs. fitted values plot, then the linear model is appropriate. If any of these assumptions is violated, then the linear model may not be appropriate. In this case, the residual plots show that the residuals are approximately normally distributed, have roughly constant variance, and are randomly scattered around the horizontal line. The Q-Q plot flags Fiat 128 and Pontiac Firebird as the two most extreme observations, and the spread-level plot suggests a power transformation of about 0.69, which is not far from 1 (no transformation). Overall, this indicates that the linear model is appropriate for this data set.