In this report for Motor Trend Magazine, regression analysis is used to examine how miles per gallon (MPG) varies with different car characteristics. The simplest model, mpg regressed on transmission type alone, shows that cars with a manual transmission get 7.25 mpg more than cars with an automatic transmission. After adjusting for the number of cylinders, displacement, weight and horsepower, the multivariate regression model estimates that manual transmission is 1.81 mpg better than automatic, although this difference is not statistically significant (p = 0.22). The multivariate model explains about 87% of the variance in mpg (R-squared = 0.866).
The variables and their distributions are explored with a density plot and the pairs() function; see the Appendix for details.
First, the variables that encode categories as numbers (am, cyl, gear, carb and vs) are converted to factors, and mpg is regressed on all other variables. In the resulting sequential ANOVA table, cyl, disp and wt are significant predictors of mpg at the 0.05 level.
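mtcars$am   <- as.factor(mtcars$am)    # transmission: 0 = automatic, 1 = manual
mtcars$cyl  <- as.factor(mtcars$cyl)   # number of cylinders: 4, 6 or 8
mtcars$gear <- as.factor(mtcars$gear)  # number of forward gears
mtcars$carb <- as.factor(mtcars$carb)  # number of carburetors
mtcars$vs   <- as.factor(mtcars$vs)    # engine: 0 = V-shaped, 1 = straight
fit_all <- lm(mpg ~ ., data = mtcars)  # regress mpg on all other variables
summary(aov(fit_all))                  # sequential (Type I) ANOVA table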
## Df Sum Sq Mean Sq F value Pr(>F)
## cyl 2 825 412 51.38 1.9e-07 ***
## disp 1 58 58 7.18 0.017 *
## hp 1 19 19 2.31 0.150
## drat 1 12 12 1.48 0.242
## wt 1 56 56 6.95 0.019 *
## qsec 1 2 2 0.19 0.669
## vs 1 0 0 0.04 0.849
## am 1 17 17 2.06 0.171
## gear 2 5 3 0.31 0.736
## carb 5 14 3 0.34 0.881
## Residuals 15 120 8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
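The table above uses sequential (Type I) sums of squares: each term is tested only after the terms listed before it, so these p-values depend on the order in which variables appear in the formula. As an order-independent cross-check (a swapped-in technique, not the method used in this report), base R's drop1() gives a partial F test for each term while all other terms stay in the model. A minimal sketch, rebuilding fit_all from a fresh copy of mtcars; its p-values will generally differ from the table above:

```r
# Rebuild the full model with the same factor conversions used above
d <- mtcars
for (v in c("am", "cyl", "gear", "carb", "vs")) d[[v]] <- factor(d[[v]])
fit_all <- lm(mpg ~ ., data = d)

# Partial F test: each term is dropped from the full model while all others stay in
partial <- drop1(fit_all, test = "F")
print(partial)

# Order dependence of sequential ANOVA: the sum of squares credited to wt
# is much larger when wt enters before cyl than when it enters after it
ss_wt_first <- anova(lm(mpg ~ wt + cyl, data = d))["wt", "Sum Sq"]
ss_wt_last  <- anova(lm(mpg ~ cyl + wt, data = d))["wt", "Sum Sq"]
print(c(wt_first = ss_wt_first, wt_last = ss_wt_last))
```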
library(car)                # provides vif()
cv <- vif(fit_all)          # generalized variance inflation factor (GVIF) for each term
head(cv[order(cv[, 3], decreasing = TRUE), ], 4)  # top 4 terms by GVIF^(1/(2*Df))
## GVIF Df GVIF^(1/(2*Df))
## disp 60.37 1 7.770
## hp 28.22 1 5.312
## wt 23.83 1 4.882
## cyl 128.12 2 3.364
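Two details of the vif() output above are easy to miss. Because cyl is a factor with three levels (4, 6 and 8), lm() codes it as two dummy columns against the baseline level 4, which is why it carries Df = 2. For such terms car reports a generalized VIF (GVIF), and the last column, GVIF^(1/(2*Df)), puts terms with different Df on a comparable scale; for a 1-df term it is simply the square root of the VIF. The sketch below checks both points, reusing the printed GVIF and Df values:

```r
# cyl as a factor: treatment contrasts give an intercept plus one dummy per non-baseline level
d  <- transform(mtcars, cyl = factor(cyl))
mm <- model.matrix(~ cyl, data = d)
print(colnames(mm))  # "(Intercept)" "cyl6" "cyl8"

# Reproduce the last vif() column from the printed GVIF and Df values
gvif    <- c(disp = 60.37, hp = 28.22, wt = 23.83, cyl = 128.12)
term_df <- c(disp = 1,     hp = 1,     wt = 1,     cyl = 2)
scaled  <- gvif^(1 / (2 * term_df))
print(round(scaled, 3))
```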
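The vif() output shows that displacement, horsepower, weight and cylinders have the largest variance inflation, meaning each of them is largely explained by the other predictors; these variables are strongly collinear.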
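Variance inflation measures how well a predictor is explained by all the other predictors together, not just by one of them: for predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. A minimal base-R sketch of that definition, using a hypothetical helper manual_vif() and a smaller model with only disp, hp and wt (chosen for illustration, so its values will not match the table above). With car installed, car::vif(lm(mpg ~ disp + hp + wt, data = mtcars)) should give the same numbers.

```r
# Hypothetical helper: VIF for each numeric predictor via auxiliary regressions
manual_vif <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # Regress predictor p on the other predictors, e.g. disp ~ hp + wt
    aux <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

v <- manual_vif(mtcars, c("disp", "hp", "wt"))
print(round(v, 2))
```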
The next step is to build multivariate models by adding these variables to the single-variable model. Based on the p-values and the variance inflation factors, cyl, disp, wt and hp are selected for the multivariate regression model. The R code below adds them one at a time, and anova() compares the nested models, reporting for each added term the residual degrees of freedom, the reduction in the residual sum of squares and the p-value.
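fit1 <- lm(mpg ~ am, data = mtcars)                         # transmission only
fit2 <- lm(mpg ~ am + cyl, data = mtcars)                   # + cylinders
fit3 <- lm(mpg ~ am + cyl + disp, data = mtcars)            # + displacement
fit4 <- lm(mpg ~ am + cyl + disp + wt, data = mtcars)       # + weight
fit5 <- lm(mpg ~ am + cyl + disp + wt + hp, data = mtcars)  # + horsepower
anova(fit1, fit2, fit3, fit4, fit5)                         # compare the nested models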
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + cyl
## Model 3: mpg ~ am + cyl + disp
## Model 4: mpg ~ am + cyl + disp + wt
## Model 5: mpg ~ am + cyl + disp + wt + hp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 721
## 2 28 264 2 456 37.93 2.7e-08 ***
## 3 27 230 1 34 5.66 0.0253 *
## 4 26 183 1 48 7.91 0.0094 **
## 5 25 150 1 32 5.40 0.0286 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
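When anova() is given several nested models, each row's F statistic compares a model with the one before it, but the denominator is the residual mean square of the largest model (fit5) for every row. For example, row 3 is (264 - 230) / 1 divided by 150 / 25, which is about 5.66. A base-R sketch that reproduces the F and p-value columns from each model's residual sum of squares and residual degrees of freedom:

```r
d <- transform(mtcars, am = factor(am), cyl = factor(cyl))
fits <- list(
  lm(mpg ~ am, data = d),
  lm(mpg ~ am + cyl, data = d),
  lm(mpg ~ am + cyl + disp, data = d),
  lm(mpg ~ am + cyl + disp + wt, data = d),
  lm(mpg ~ am + cyl + disp + wt + hp, data = d)
)

rss        <- sapply(fits, deviance)     # residual sum of squares of each model
res_df     <- sapply(fits, df.residual)  # residual degrees of freedom
sigma2_big <- rss[5] / res_df[5]         # residual mean square of the largest model

# F = (drop in RSS / drop in residual Df) / sigma2_big, for each consecutive pair
f_stat <- (-diff(rss) / -diff(res_df)) / sigma2_big
p_val  <- pf(f_stat, -diff(res_df), res_df[5], lower.tail = FALSE)
print(cbind(F = f_stat, p = p_val))

# The same table computed by anova(), for comparison
tab <- do.call(anova, fits)
```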
summary(fit1)$coefficients[1:2,] #single variable model
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.134e-15
## am1 7.245 1.764 4.106 2.850e-04
summary(fit5)$coefficients[1:2,] #multivariate model
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.864 2.695 12.564 2.668e-12
## am1 1.806 1.421 1.271 2.155e-01
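In the multivariate model, holding cyl, disp, wt and hp constant, manual transmission is associated with 1.81 mpg higher fuel economy than automatic transmission. However, this estimate is not statistically significant (p = 0.22), so once the other variables are accounted for, the data do not rule out that transmission type makes no difference.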
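A confidence interval makes the uncertainty in the 1.81 mpg estimate explicit: because am1 has p of about 0.22 in fit5, its 95% interval includes zero. A short sketch using base R's confint(), with the same interval recomputed by hand as estimate plus or minus the t quantile times the standard error:

```r
d    <- transform(mtcars, am = factor(am), cyl = factor(cyl))
fit5 <- lm(mpg ~ am + cyl + disp + wt + hp, data = d)

# 95% CI for the manual-vs-automatic difference, holding cyl, disp, wt and hp fixed
ci <- confint(fit5, "am1", level = 0.95)
print(ci)

# Same interval by hand, using the t distribution with the residual degrees of freedom
est     <- coef(summary(fit5))["am1", "Estimate"]
se      <- coef(summary(fit5))["am1", "Std. Error"]
ci_hand <- est + c(-1, 1) * qt(0.975, df.residual(fit5)) * se
print(ci_hand)
```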
R-squared, the proportion of the variance in mpg explained by the model, increases from 0.36 for the single-variable model (fit1) to 0.87 for the multivariate model (fit5). The residual plots of fit5 are shown in the Appendix.
c(summary(fit1)$r.squared, summary(fit5)$r.squared)
## [1] 0.3598 0.8664
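R-squared is 1 - RSS/TSS, the share of the total variation in mpg that the model accounts for. Because it never decreases when predictors are added, the adjusted R-squared, which penalizes the number of coefficients, is a fairer way to compare fit1 with fit5. A base-R sketch of both quantities (r2() and adj_r2() are hypothetical helper names):

```r
d    <- transform(mtcars, am = factor(am), cyl = factor(cyl))
fit1 <- lm(mpg ~ am, data = d)
fit5 <- lm(mpg ~ am + cyl + disp + wt + hp, data = d)

tss <- sum((d$mpg - mean(d$mpg))^2)  # total sum of squares
r2  <- function(fit) 1 - deviance(fit) / tss

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p),
# where p counts all coefficients including the intercept
adj_r2 <- function(fit) {
  n <- nobs(fit)
  p <- length(coef(fit))
  1 - (1 - r2(fit)) * (n - 1) / (n - p)
}

print(round(c(fit1 = r2(fit1), fit5 = r2(fit5)), 4))
print(round(c(fit1 = adj_r2(fit1), fit5 = adj_r2(fit5)), 4))
```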
Density Plot
Red: automatic transmission / Green: manual transmission
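The code for this plot is not shown in the report. A minimal base-graphics sketch, assuming the plot shows the distribution of mpg for each transmission type with the same red/green colors:

```r
auto   <- density(mtcars$mpg[mtcars$am == 0])  # am = 0: automatic
manual <- density(mtcars$mpg[mtcars$am == 1])  # am = 1: manual

# Set the axis ranges so both curves fit on the same plot
plot(auto, col = "red",
     xlim = range(auto$x, manual$x),
     ylim = range(0, auto$y, manual$y),
     main = "Density of mpg by transmission type", xlab = "mpg")
lines(manual, col = "green")
legend("topright", legend = c("Automatic", "Manual"),
       col = c("red", "green"), lty = 1)
```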
Residual Plots
par(mfrow = c(2, 2))  # 2 x 2 grid for the four diagnostic plots
plot(fit5)
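The fourth panel of plot(fit5) shows residuals against leverage with Cook's distance contours. To list the most influential cars numerically, a short sketch with base R's cooks.distance(); the 4/n cutoff is a common rule of thumb, not a formal test:

```r
d    <- transform(mtcars, am = factor(am), cyl = factor(cyl))
fit5 <- lm(mpg ~ am + cyl + disp + wt + hp, data = d)

cd     <- cooks.distance(fit5)  # one value per car, named by row name
cutoff <- 4 / nobs(fit5)        # rule-of-thumb threshold

print(head(sort(cd, decreasing = TRUE), 3))  # three most influential cars
print(names(cd)[cd > cutoff])                # cars above the rule-of-thumb cutoff
```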
Plotting Pairs
pairs(mtcars)
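With all eleven variables the pairs plot is dense. A variant sketch restricts it to mpg and the predictors selected for fit5 and colors points by transmission, reusing the red (automatic) / green (manual) convention; the choice of columns here is an illustration, not the plot shown above:

```r
vars <- c("mpg", "cyl", "disp", "hp", "wt")
cols <- ifelse(mtcars$am == 1, "green", "red")  # green: manual, red: automatic

pairs(mtcars[, vars], col = cols, pch = 19,
      main = "mpg and selected predictors by transmission")
```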