Executive Summary

This project investigated the effect of transmission type on fuel consumption, both on its own and together with other variables. A single-variable linear regression model and a multiple regression model were used. According to the exploratory analysis and the regression models, cars with manual transmission achieve higher mpg (that is, lower fuel consumption) than cars with automatic transmission. Both the single-variable model and the multivariable model (mpg ~ am + wt + qsec) support this conclusion, although the estimated difference shrinks from about 7.2 mpg to about 2.9 mpg once weight and quarter-mile time are accounted for. The details of the analysis follow:

Data Description and Exploratory Analysis

library(ggplot2)
library(datasets)
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
mtcars$cyl  <- factor(mtcars$cyl)
mtcars$vs   <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am   <- factor(mtcars$am,labels=c("Automatic","Manual"))
# am is already a labelled factor, so the axis shows "Automatic" and "Manual"
boxplot(mpg ~ am, data=mtcars, xlab="Transmission", ylab="Miles per Gallon",
        main="Boxplot of MPG vs. Transmission", col=c('powderblue', 'mistyrose'))

result <- t.test(mpg ~ factor(am), data=mtcars)
result$p.value
## [1] 0.001373638
result$estimate
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231

The null hypothesis is that mean mpg does not differ between cars with automatic and manual transmissions. A two-sample t-test was performed (R's t.test uses the Welch unequal-variance form by default). Since the p-value (0.00137) is below 0.05, the null hypothesis is rejected: there is a significant difference in mean mpg between the two groups. The box plot is consistent with this result, and the group means show that mpg is higher for cars with manual transmission (24.39) than for cars with automatic transmission (17.15).
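To make the test above less of a black box, the following sketch recomputes the Welch statistic by hand and checks it against t.test. It is an illustration only: the names mt, auto, manual, and p_calc are introduced here and are not part of the original analysis, and a fresh copy of the data is used so the factor conversions above are left untouched.

```r
# Fresh copy of the data; here am is still numeric (0 = automatic, 1 = manual)
mt <- datasets::mtcars
auto   <- mt$mpg[mt$am == 0]
manual <- mt$mpg[mt$am == 1]

# Per-group variance of the mean, and the standard error of the difference
v_auto   <- var(auto) / length(auto)
v_manual <- var(manual) / length(manual)
se_diff  <- sqrt(v_auto + v_manual)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(auto) - mean(manual)) / se_diff
df_w   <- (v_auto + v_manual)^2 /
  (v_auto^2 / (length(auto) - 1) + v_manual^2 / (length(manual) - 1))

# Two-sided p-value from the t distribution
p_calc <- 2 * pt(-abs(t_stat), df_w)

# Compare with the built-in test
welch <- t.test(mpg ~ am, data = mt)
all.equal(p_calc, welch$p.value)
```

The hand computation and the built-in call should agree up to floating-point tolerance, which confirms that the reported p-value comes from the unequal-variance version of the test.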

Regression Analysis

Single Variable Regression Analysis

SingleVariableRegression <- lm(mpg ~ factor(am), data=mtcars)
summary(SingleVariableRegression)
## 
## Call:
## lm(formula = mpg ~ factor(am), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        17.147      1.125  15.247 1.13e-15 ***
## factor(am)Manual    7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

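The null hypothesis of no transmission effect is rejected because the p-value (0.000285) is below 0.05. On average, manual cars get 7.245 mpg more than automatic cars. However, the R-squared value is only 0.3598, so transmission type alone explains about 36% of the variance in mpg, and a multivariable regression analysis is warranted.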
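With a single two-level factor as predictor, the regression coefficients have a direct reading: the intercept is the mean mpg of the reference group and the slope is the difference between the group means. The sketch below checks this on a fresh copy of the data; the names mt, fit, and group_means are illustrative and not part of the original analysis.

```r
# Fresh copy of the data with a labelled transmission factor
mt <- datasets::mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ am, data = mt)
group_means <- tapply(mt$mpg, mt$am, mean)

# Intercept = mean mpg of the reference level (Automatic)
intercept_ok <- all.equal(unname(coef(fit)["(Intercept)"]),
                          unname(group_means["Automatic"]))

# Slope = mean mpg of Manual minus mean mpg of Automatic
slope_ok <- all.equal(unname(coef(fit)["amManual"]),
                      unname(group_means["Manual"] - group_means["Automatic"]))

c(intercept_ok, slope_ok)
```

This is why the intercept (17.147) and the manual coefficient (7.245) in the summary above match the group means reported by the t-test.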

Multivariable Regression Analysis

First, the best variables for the multivariable regression model have to be determined:

BestModel <- step(lm(mpg ~ ., data=mtcars), trace=0, steps=10000)
summary(BestModel)
## 
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9387 -1.2560 -0.4013  1.1253  5.0513 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.70832    2.60489  12.940 7.73e-13 ***
## cyl6        -3.03134    1.40728  -2.154  0.04068 *  
## cyl8        -2.16368    2.28425  -0.947  0.35225    
## hp          -0.03211    0.01369  -2.345  0.02693 *  
## wt          -2.49683    0.88559  -2.819  0.00908 ** 
## amManual     1.80921    1.39630   1.296  0.20646    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared:  0.8659, Adjusted R-squared:  0.8401 
## F-statistic: 33.57 on 5 and 26 DF,  p-value: 1.506e-10

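The stepwise selection (BestModel) retains cyl, hp, wt, and am. In that model weight and horsepower are significant, while the transmission coefficient (1.81 mpg) is not (p = 0.206). A more compact alternative uses vehicle weight (wt) and quarter-mile time (qsec, a proxy for acceleration) alongside transmission type, so the multivariable regression below is based on: mpg ~ wt + qsec + am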
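Two specifications appear in this section: the one returned by step and mpg ~ wt + qsec + am. One possible way to put them side by side is to compare AIC (the criterion step minimizes) and adjusted R-squared. This is a sketch, not a step the original analysis performed; the names mt, step_fit, alt_fit, and comparison are illustrative.

```r
# Fresh copy of the data with the factor conversions both formulas need
mt <- datasets::mtcars
mt$cyl <- factor(mt$cyl)
mt$am  <- factor(mt$am, labels = c("Automatic", "Manual"))

# The specification selected by step(), and the alternative specification
step_fit <- lm(mpg ~ cyl + hp + wt + am, data = mt)
alt_fit  <- lm(mpg ~ am + wt + qsec, data = mt)

# Lower AIC and higher adjusted R-squared both indicate a better trade-off
# between fit and number of parameters
comparison <- data.frame(
  model  = c("cyl + hp + wt + am", "am + wt + qsec"),
  n_coef = c(length(coef(step_fit)), length(coef(alt_fit))),
  AIC    = c(AIC(step_fit), AIC(alt_fit)),
  adj_r2 = c(summary(step_fit)$adj.r.squared, summary(alt_fit)$adj.r.squared)
)
comparison
```

The two models are not nested, so an F test between them does not apply; information criteria such as AIC are the usual fallback in that situation.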

MultiVarReg <- lm(mpg~am + wt + qsec, data=mtcars)
summary(MultiVarReg)
## 
## Call:
## lm(formula = mpg ~ am + wt + qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## amManual      2.9358     1.4109   2.081 0.046716 *  
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

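The p-value for the transmission coefficient (0.0467) is below 0.05, so the null hypothesis is rejected in favor of the alternative that mpg differs between manual and automatic cars. Holding weight and quarter-mile time constant, manual cars are estimated to get about 2.94 mpg more than automatic cars. The model explains about 85% of the variance in mpg (adjusted R-squared 0.8336), compared with about 36% for the single-variable model.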
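Two follow-up checks can make this conclusion more concrete: a confidence interval for the adjusted transmission effect, and an F test of whether wt and qsec improve on the transmission-only model. The sketch below is an optional addition rather than part of the original analysis; the names mt, simple_fit, multi_fit, ci_am, and nested are illustrative.

```r
# Fresh copy of the data with a labelled transmission factor
mt <- datasets::mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))

simple_fit <- lm(mpg ~ am, data = mt)
multi_fit  <- lm(mpg ~ am + wt + qsec, data = mt)

# 95% confidence interval for the manual-vs-automatic difference,
# adjusted for weight and quarter-mile time
ci_am <- confint(multi_fit, "amManual", level = 0.95)
ci_am

# Nested-model F test: do wt and qsec add explanatory power beyond am?
nested <- anova(simple_fit, multi_fit)
nested
```

Because the p-value of the amManual coefficient is only slightly below 0.05, the lower end of the interval lies close to zero, so the size of the adjusted effect should be read with some caution.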

par(mfrow = c(2, 2))
plot(MultiVarReg, col=c('powderblue', 'mistyrose', 'turquoise','purple'))
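The four diagnostic panels are read by eye. As a complement, the sketch below computes numeric counterparts: a Shapiro-Wilk test for residual normality (the Q-Q panel) and leverage and Cook's distance values (the Residuals vs Leverage panel). It is one possible set of checks, not one the original analysis ran; the names mt, fit, sw, lev, and cook are illustrative.

```r
# Fresh copy of the data and the same model specification as MultiVarReg
mt <- datasets::mtcars
mt$am <- factor(mt$am, labels = c("Automatic", "Manual"))
fit <- lm(mpg ~ am + wt + qsec, data = mt)

# Normality of residuals: a small p-value would suggest non-normal residuals
sw <- shapiro.test(residuals(fit))
sw

# Leverage (hat values) and influence (Cook's distance) per car
lev  <- hatvalues(fit)
cook <- cooks.distance(fit)

# The three cars with the highest leverage and the highest influence
head(sort(lev, decreasing = TRUE), 3)
head(sort(cook, decreasing = TRUE), 3)
```

Hat values always sum to the number of estimated coefficients (four here), so cars whose leverage is well above the average of 4/32 are the ones worth inspecting in the plot.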