Introduction

In this project we analyze the relationship between fuel efficiency (MPG, i.e. miles per gallon) and other characteristics of a car. We use the mtcars dataset, in which each row is a car model and each column is a characteristic of that car. The analysis focuses on two questions: whether manual or automatic transmission is associated with better MPG, and how large the difference between the two is.

Here is a glimpse of the dataset.

data("mtcars")
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Transforming data for our convenience

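# Treat engine shape, gear count, and carburetor count as categorical
mtcars$vs <- factor(mtcars$vs)
# Labelled copy of am (0 = automatic, 1 = manual); the numeric am column is kept
mtcars$am.label <- factor(mtcars$am,
                         labels = c("Automatic", "Manual"))
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)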

Here vs, gear and carb are converted to factors, and a labelled factor am.label is created from am. Note that cyl and the original am column remain numeric.
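
As a quick sanity check on the recoding above, the sketch below cross-tabulates the original am codes against the new labels and lists the class of every column. It rebuilds am.label from a fresh copy of mtcars so that it runs on its own; the names coding_check and column_classes are introduced here for illustration only.

```r
# Start from a fresh copy of the data so the check is self-contained
data("mtcars")
mtcars$am.label <- factor(mtcars$am, labels = c("Automatic", "Manual"))

# Cross-tabulate the numeric code against the label:
# every 0 should fall under "Automatic" and every 1 under "Manual"
coding_check <- table(am = mtcars$am, label = mtcars$am.label)
print(coding_check)

# List the class of each column to see which ones are factors
column_classes <- sapply(mtcars, class)
print(column_classes)
```

All off-diagonal cells of the cross-tabulation should be zero if the labels were assigned in the intended order.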

Plot to show relation between MPG and Transmission type

library(ggplot2)
ggplot(data = mtcars, aes(x = am.label, y = mpg, color = am.label)) +
  geom_point()

The plot above suggests that cars with manual transmission tend to achieve better MPG than cars with automatic transmission, although the two groups overlap.
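
The visual impression can be backed by simple per-group summaries. The following is a minimal sketch using base R only; group_means and group_counts are illustrative names, and the data are reloaded so the snippet is self-contained.

```r
# Reload the data and rebuild the transmission label
data("mtcars")
mtcars$am.label <- factor(mtcars$am, labels = c("Automatic", "Manual"))

# Mean MPG and number of cars in each transmission group
group_means  <- tapply(mtcars$mpg, mtcars$am.label, mean)
group_counts <- table(mtcars$am.label)

print(round(group_means, 2))
print(group_counts)
```

The group sizes are worth reporting alongside the means, because the two transmission types are not equally represented in the dataset.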

Regression Analysis

test1<-lm(mpg~am.label,data=mtcars)
summary(test1)
## 
## Call:
## lm(formula = mpg ~ am.label, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      17.147      1.125  15.247 1.13e-15 ***
## am.labelManual    7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

The model above shows that cars with manual transmission average about 7.2 MPG more than cars with automatic transmission. The difference is statistically significant (p-value below 0.0003). However, the R-squared value indicates that transmission type alone explains only about 36% of the variance in MPG.
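
To put an uncertainty range on the estimated difference, one option is a confidence interval for the am.labelManual coefficient. The sketch below refits the same single-regressor model on a fresh copy of the data and uses confint() from base R; the 95% level is an assumption made here, not a choice stated in the analysis.

```r
# Refit the simple model so the snippet runs on its own
data("mtcars")
mtcars$am.label <- factor(mtcars$am, labels = c("Automatic", "Manual"))
test1 <- lm(mpg ~ am.label, data = mtcars)

# 95% confidence interval for the manual-vs-automatic difference in MPG
ci <- confint(test1, parm = "am.labelManual", level = 0.95)
print(ci)

# With a single two-level factor, intercept + coefficient is the manual group mean
manual_mean <- sum(coef(test1))
print(manual_mean)
```

With only one binary regressor, this model is equivalent to comparing the two group means, which is why the intercept equals the automatic-group mean.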

So we next look for other variables that explain a significant share of the variance in MPG.

anova(lm(mpg~.,data=mtcars))
## Analysis of Variance Table
## 
## Response: mpg
##           Df Sum Sq Mean Sq  F value    Pr(>F)    
## cyl        1 817.71  817.71 102.5913 2.298e-08 ***
## disp       1  37.59   37.59   4.7166  0.045252 *  
## hp         1   9.37    9.37   1.1757  0.294304    
## drat       1  16.47   16.47   2.0660  0.169883    
## wt         1  77.48   77.48   9.7202  0.006629 ** 
## qsec       1   3.95    3.95   0.4955  0.491609    
## vs         1   0.13    0.13   0.0163  0.900058    
## am         1  14.47   14.47   1.8160  0.196569    
## gear       2   2.32    1.16   0.1454  0.865782    
## carb       5  19.03    3.81   0.4774  0.787894    
## Residuals 16 127.53    7.97                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The analysis of variance for the full model shows that cyl, disp and wt each explain a significant share of the variance in MPG (p-value ≤ 0.05). These are sequential tests, so each p-value depends on the order in which the variables enter the model. These three variables are included in the final model along with the transmission variable.
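
Because anova() on a single lm fit reports sequential sums of squares, the share attributed to a variable depends on what was entered before it. The sketch below illustrates this with just two regressors, cyl and wt, fitted in both orders; the reduced two-variable models and the names a_cyl_first and a_wt_first are chosen here purely for illustration.

```r
data("mtcars")

# Same two regressors, entered in opposite orders
a_cyl_first <- anova(lm(mpg ~ cyl + wt, data = mtcars))
a_wt_first  <- anova(lm(mpg ~ wt + cyl, data = mtcars))

# The sum of squares credited to each variable changes with the order,
# while the total across all rows stays the same
print(a_cyl_first)
print(a_wt_first)
```

If order-independent tests are preferred, drop1() with test = "F" tests each term given all the others.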

mdl<-lm(mpg~am.label+cyl+disp+wt,data=mtcars)
summary(mdl)
## 
## Call:
## lm(formula = mpg ~ am.label + cyl + disp + wt, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.318 -1.362 -0.479  1.354  6.059 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    40.898313   3.601540  11.356 8.68e-12 ***
## am.labelManual  0.129066   1.321512   0.098  0.92292    
## cyl            -1.784173   0.618192  -2.886  0.00758 ** 
## disp            0.007404   0.012081   0.613  0.54509    
## wt             -3.583425   1.186504  -3.020  0.00547 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.642 on 27 degrees of freedom
## Multiple R-squared:  0.8327, Adjusted R-squared:  0.8079 
## F-statistic: 33.59 on 4 and 27 DF,  p-value: 4.038e-10

The summary of the multivariable model shows an R-squared of just over 0.83, meaning that the included variables explain about 83% of the variance in MPG. The variables cyl (number of engine cylinders) and wt (weight of the car) have p-values below 0.05 and act as confounding variables in the relationship between transmission type and MPG: once they are included, the estimated effect of manual transmission drops from about 7.2 MPG to about 0.13 MPG and is no longer significant (p-value of about 0.92).
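
The two fitted models are nested, so they can be compared directly with an F-test, and the transmission coefficient can be read off from each. This is a minimal sketch that refits both models on a fresh copy of the data; cmp and am_effect are illustrative names.

```r
# Refit both models so the comparison is self-contained
data("mtcars")
mtcars$am.label <- factor(mtcars$am, labels = c("Automatic", "Manual"))
test1 <- lm(mpg ~ am.label, data = mtcars)
mdl   <- lm(mpg ~ am.label + cyl + disp + wt, data = mtcars)

# F-test: do cyl, disp and wt jointly improve on the transmission-only model?
cmp <- anova(test1, mdl)
print(cmp)

# Transmission effect before and after adjusting for the other variables
am_effect <- c(unadjusted = coef(test1)[["am.labelManual"]],
               adjusted   = coef(mdl)[["am.labelManual"]])
print(round(am_effect, 3))
```

Placing the unadjusted and adjusted coefficients side by side makes the confounding visible: most of the raw difference between transmission types is absorbed by the added variables.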

Diagnostic Plots

par(mfrow = c(2, 2)) # 2 x 2 grid so all four diagnostic plots fit on one page
plot(mdl)

The Residuals vs Fitted plot above shows a few outliers, but the residuals appear homoscedastic rather than heteroscedastic: their spread is roughly constant across the fitted values. The residuals also appear approximately normally distributed, which is what the Normal Q-Q panel is used to check.
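
The visual checks can be complemented with numerical ones. The sketch below is one possible approach, not part of the original analysis: it applies the Shapiro-Wilk test to the residuals and lists the three cars with the largest Cook's distance. Both functions are in base R's stats package; the choice of three cars is arbitrary.

```r
# Refit the final model so the diagnostics are self-contained
data("mtcars")
mtcars$am.label <- factor(mtcars$am, labels = c("Automatic", "Manual"))
mdl <- lm(mpg ~ am.label + cyl + disp + wt, data = mtcars)

# Shapiro-Wilk test: a small p-value would indicate non-normal residuals
res <- residuals(mdl)
sw  <- shapiro.test(res)
print(sw)

# Cars with the largest influence on the fit, by Cook's distance
cooks <- cooks.distance(mdl)
top_influence <- head(sort(cooks, decreasing = TRUE), 3)
print(round(top_influence, 3))
```

With only 32 observations the test has limited power, so it is best read together with the Q-Q plot rather than in place of it.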

Summary of analysis

From this whole analysis we found that:

  • Cars with automatic transmission average about 17 MPG, while cars with manual transmission average about 24 MPG, so manual transmission is associated with better MPG when transmission type is considered alone.
  • The best model in this analysis used cyl, disp and wt as regressors along with the transmission type.
  • In the multivariable regression model, cyl (number of engine cylinders) and wt (weight of the car) are the confounding variables; after adjusting for them, the estimated difference between transmission types is about 0.13 MPG and is not significant.