Executive Summary

This report explores the relationship between the factors that influence the mileage of cars and quantifies the difference between automatic and manual transmissions. The analysis is consistent with the general assumption that a manual transmission is better for mileage. The selected linear regression model suggests that, on average and holding cylinders, horsepower, and weight fixed, a manual transmission gives 1.81 MPG more than an automatic transmission, with a standard error of 1.40 MPG. The estimate is not statistically significant at the 5% level (p = 0.21).

Loading Data

library(datasets)
library(ggplot2)
library(car)
library(stats)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs transmission gear
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0       Manual    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0       Manual    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1       Manual    4
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1    Automatic    3
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0    Automatic    3
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1    Automatic    3
##                   carb
## Mazda RX4            4
## Mazda RX4 Wag        4
## Datsun 710           1
## Hornet 4 Drive       1
## Hornet Sportabout    2
## Valiant              1

Exploratory Data Analysis

par(mfrow=c(1,2))
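# Rename column 9 (am) and label its 0/1 codes
colnames(mtcars)[9]<-"transmission"
mtcars$transmission[mtcars$transmission=="0"]<-"Automatic"
mtcars$transmission[mtcars$transmission=="1"]<-"Manual"
mtcars$transmission <- as.factor(mtcars$transmission)
# Treat cylinder count as categorical; the cyl6/cyl8 model terms and the
# black/red/green plot colours both rely on cyl being a factor
mtcars$cyl <- as.factor(mtcars$cyl)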
boxplot(mtcars$mpg~mtcars$transmission,
        ylab="Miles per Gallon",col=c("salmon","gold"))
title("MPG vs Transmission")
plot(mtcars$wt,mtcars$mpg,col=mtcars$cyl,pch=17,
     ylab="Miles per Gallon",xlab="Weight")
title("MPG vs Weight vs Cyl")
legend("topright", title="No. of Cyl",
    c("4","6","8"),pch=17, col=c("black","red","green"), horiz=FALSE)
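
The boxplot comparison can also be summarised numerically. A minimal sketch, assuming a fresh copy of the built-in data (the names `cars` and `group_means` are illustrative and not part of the original analysis), computes the unadjusted mean MPG for each transmission type:

```r
# Work on a fresh copy so this sketch does not depend on earlier recoding
cars <- datasets::mtcars
cars$transmission <- factor(cars$am, levels=c(0,1),
                            labels=c("Automatic","Manual"))

# Unadjusted mean MPG for each transmission type
group_means <- tapply(cars$mpg, cars$transmission, mean)
round(group_means, 2)

# Raw difference (Manual minus Automatic), before adjusting for other variables
unname(group_means["Manual"] - group_means["Automatic"])
```

This raw gap ignores weight, horsepower, and cylinder count, which is why the adjusted models in the Model Selection section are needed.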

Model Selection

# Building possible models
lm1 <- lm(mpg~transmission,data=mtcars)
lm2 <- lm(mpg~transmission+cyl,data=mtcars)
lm3 <- lm(mpg~transmission+cyl+hp,data=mtcars)
lm4 <- lm(mpg~transmission+cyl+hp+wt,data=mtcars)
lm5 <- lm(mpg~transmission+cyl+hp+wt+carb,data=mtcars)
anova(lm1,lm2,lm3,lm4,lm5)
## Analysis of Variance Table
## 
## Model 1: mpg ~ transmission
## Model 2: mpg ~ transmission + cyl
## Model 3: mpg ~ transmission + cyl + hp
## Model 4: mpg ~ transmission + cyl + hp + wt
## Model 5: mpg ~ transmission + cyl + hp + wt + carb
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
## 1     30 720.90                                   
## 2     28 264.50  2    456.40 37.7751 2.783e-08 ***
## 3     27 197.20  1     67.30 11.1399  0.002647 ** 
## 4     26 151.03  1     46.17  7.6433  0.010547 *  
## 5     25 151.03  1      0.00  0.0000  0.999841    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Summary of the selected model: lm4 is kept because adding carb (lm5)
# gives no further reduction in RSS (p = 0.9998)
summary(lm4)
## 
## Call:
## lm(formula = mpg ~ transmission + cyl + hp + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9387 -1.2560 -0.4013  1.1253  5.0513 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        33.70832    2.60489  12.940 7.73e-13 ***
## transmissionManual  1.80921    1.39630   1.296  0.20646    
## cyl6               -3.03134    1.40728  -2.154  0.04068 *  
## cyl8               -2.16368    2.28425  -0.947  0.35225    
## hp                 -0.03211    0.01369  -2.345  0.02693 *  
## wt                 -2.49683    0.88559  -2.819  0.00908 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared:  0.8659, Adjusted R-squared:  0.8401 
## F-statistic: 33.57 on 5 and 26 DF,  p-value: 1.506e-10
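
The uncertainty of the transmission coefficient can also be expressed as a confidence interval rather than a standard error. A sketch, assuming the same recoding as in the exploratory section and using the illustrative names `cars`, `fit`, and `ci` (the original report does not compute this interval):

```r
# Rebuild the data and the selected model so the sketch is self-contained
cars <- datasets::mtcars
cars$transmission <- factor(cars$am, levels=c(0,1),
                            labels=c("Automatic","Manual"))
cars$cyl <- factor(cars$cyl)
fit <- lm(mpg ~ transmission + cyl + hp + wt, data=cars)

# 95% confidence interval for the Manual vs Automatic difference
ci <- confint(fit, "transmissionManual", level=0.95)
ci
```

Because the p-value for this coefficient is above 0.05, the 95% interval is expected to include zero, which is another way of saying that the adjusted difference is not clearly distinguishable from no difference.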

Results

Diagnostics

maxvalues <- c(Maximum.Beta=max(dfbetas(lm4)),
               Maximum.Hatvalue=max(hatvalues(lm4)))
round(maxvalues,3)
##     Maximum.Beta Maximum.Hatvalue 
##            0.939            0.471
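
To see which cars drive these maxima, the leverage values can be compared against a common rule of thumb of twice the average hat value, 2p/n. A sketch with illustrative names (`cars`, `fit`, `lev`, `cutoff`, `infl`); the cutoff is a convention assumed here, not something the original report applies:

```r
# Rebuild the data and the selected model so the sketch is self-contained
cars <- datasets::mtcars
cars$transmission <- factor(cars$am, levels=c(0,1),
                            labels=c("Automatic","Manual"))
cars$cyl <- factor(cars$cyl)
fit <- lm(mpg ~ transmission + cyl + hp + wt, data=cars)

# Leverage of each car and the assumed 2p/n rule-of-thumb cutoff
lev <- hatvalues(fit)
cutoff <- 2 * length(coef(fit)) / nrow(cars)

# Cars whose leverage exceeds the cutoff, highest first
sort(lev[lev > cutoff], decreasing=TRUE)

# Car with the largest absolute influence on the transmission coefficient
infl <- dfbetas(fit)[, "transmissionManual"]
names(which.max(abs(infl)))
```

Cars flagged here are candidates for a closer look, not automatic exclusions; refitting without them would show how sensitive the transmission estimate is.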

Residual Plot

par(mfrow=c(2,2))
plot(lm4)
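
The Q-Q panel of the residual plots can be complemented by a formal check. A sketch, assuming the same model refit under the illustrative name `fit`, applies the Shapiro-Wilk test to the residuals; the original report relies on the plots alone:

```r
# Rebuild the data and the selected model so the sketch is self-contained
cars <- datasets::mtcars
cars$transmission <- factor(cars$am, levels=c(0,1),
                            labels=c("Automatic","Manual"))
cars$cyl <- factor(cars$cyl)
fit <- lm(mpg ~ transmission + cyl + hp + wt, data=cars)

# Shapiro-Wilk test of normality on the model residuals
sw <- shapiro.test(residuals(fit))
sw
```

A large p-value would give no evidence against normally distributed residuals. With only 32 observations the test has limited power, so it is best read alongside the plots.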

Appendix

Source of Data:

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391-411.

Variables:

The data frame (mtcars) has 32 observations on 11 variables.