Overview

This report analyzes a collection of data on cars. We explore the relationship between miles per gallon (mpg) and the other variables.

The data set used in this report is ‘mtcars’ from the datasets package.

Summary

In this report we analyze the mtcars dataset to explore how mpg (miles per gallon) relates to the other variables.

We are trying to answer the following two questions:

  • Is an automatic or manual transmission better for MPG?

  • Quantify the MPG difference between automatic and manual transmissions.

Throughout this report we carry out basic exploratory analysis and inference, then fit and compare several models to arrive at our final answer.

Data

Source : ?mtcars

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

Structure of Data

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

The data set contains 32 observations of 11 variables: the outcome mpg and 10 candidate regressors.

Variables Description

mpg Miles/(US) gallon
cyl Number of cylinders
disp Displacement (cu.in.)
hp Gross horsepower
drat Rear axle ratio
wt Weight (lb/1000)
qsec 1/4 mile time
vs Engine shape (0 = V-shaped, 1 = straight)
am Transmission (0 = automatic, 1 = manual)
gear Number of forward gears
carb Number of carburetors

Exploratory Analysis and Basic Inferences

First we explore how mpg varies across the two transmission types (am) using a simple box plot.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.3
ggplot(mtcars,aes(x=factor(am),y=mpg))+geom_boxplot()+
xlab("Transmission Type")+ylab("Miles Per Gallon (mpg)")+
ggtitle("Box Plot Showing MPG vs AM")

The box plot indicates that automatic transmission is associated with lower mpg. However, this comparison considers only the single variable am and ignores the remaining variables.
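The visual impression from the box plot can be checked with a simple two-sample test. The following is a minimal sketch, not part of the original analysis, assuming a Welch t-test is an acceptable unadjusted comparison; the object names group_means and tt are illustrative.

```r
# Unadjusted comparison of mpg between transmission types
# (0 = automatic, 1 = manual). Illustrative sketch only.
data(mtcars)

# Mean mpg per transmission type, for orientation
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
print(group_means)

# With a formula, t.test() runs the Welch (unequal variance) test by default
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)   # p-value for the difference in means
print(tt$conf.int)  # 95% confidence interval (automatic minus manual)
```

Like the box plot, this test ignores every variable other than am, so it says nothing about the adjusted effect.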

We will be doing further analysis by trying different models below.

Regression Models

Correlation Analysis

We begin our analysis by finding the correlations among the variables in the mtcars dataset.

The correlation plot is kept simple: we first identify the variables most strongly (positively or negatively) correlated with mpg, and then view scatterplots of those variables.

library(corrplot)
M<-cor(mtcars)
corrplot.mixed(M,lower="circle",upper="number")

Here we can see in the plot which variables are negatively or positively correlated with mpg.

We can also identify them numerically:

# Set the correlation of each variable with itself to 0
diag(M) <- 0

# Find variables whose absolute correlation with mpg exceeds 0.7
which(abs(M[1, ]) > 0.7)
##  cyl disp   hp   wt 
##    2    3    4    6
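To see the strength and sign of each correlation rather than only the column positions, the correlations with mpg can be ranked. This is a small sketch; the names mpg_cor, mpg_cor_sorted and strong are illustrative and not from the original report.

```r
# Rank the correlations of every other variable with mpg. Illustrative sketch.
data(mtcars)
M <- cor(mtcars)

# Correlations with mpg, dropping mpg's correlation with itself
mpg_cor <- M["mpg", colnames(M) != "mpg"]

# Order by absolute strength, strongest first
mpg_cor_sorted <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
print(round(mpg_cor_sorted, 3))

# Names of the variables above the 0.7 cutoff used in the report
strong <- names(mpg_cor)[abs(mpg_cor) > 0.7]
print(strong)
```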

Linear Model Fits

1. The Generic Model

We first regress mpg on all the other variables in the dataset and examine the coefficients.

fit0 <- lm(mpg ~ ., data = mtcars)
summary(fit0)$coefficients
##                Estimate  Std. Error    t value   Pr(>|t|)
## (Intercept) 12.30337416 18.71788443  0.6573058 0.51812440
## cyl         -0.11144048  1.04502336 -0.1066392 0.91608738
## disp         0.01333524  0.01785750  0.7467585 0.46348865
## hp          -0.02148212  0.02176858 -0.9868407 0.33495531
## drat         0.78711097  1.63537307  0.4813036 0.63527790
## wt          -3.71530393  1.89441430 -1.9611887 0.06325215
## qsec         0.82104075  0.73084480  1.1234133 0.27394127
## vs           0.31776281  2.10450861  0.1509915 0.88142347
## am           2.52022689  2.05665055  1.2254035 0.23398971
## gear         0.65541302  1.49325996  0.4389142 0.66520643
## carb        -0.19941925  0.82875250 -0.2406258 0.81217871
# am is numeric in this fit, so its coefficient is named "am" (not "am1")
coefam1 <- as.numeric(coef(fit0)["am"])

The summary shows the estimated effect of each variable on the outcome mpg, given that the other variables are held constant.

Our variable of interest is am, whose coefficient is 2.52. It is interpreted as follows:

With the other variables held constant, mpg is estimated to be 2.52 higher for manual transmission (am = 1) than for automatic transmission (am = 0).

This estimate holds only for the full model, so we explore further models below to get a better idea of the effect of am on mpg in combination with other variables.
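The uncertainty around the am estimate in the full model can be made explicit with a confidence interval. The following is a minimal sketch; am_est and am_ci are illustrative names.

```r
# 95% confidence interval for the am coefficient in the full model.
# Illustrative sketch only.
data(mtcars)
fit0 <- lm(mpg ~ ., data = mtcars)

# am is stored as a numeric 0/1 column, so the coefficient is named "am"
am_est <- coef(fit0)["am"]
am_ci  <- confint(fit0, "am", level = 0.95)

print(am_est)
print(am_ci)
```

An interval that contains zero means the full model cannot distinguish the transmission effect from no effect, which is consistent with the p-value in the coefficient table above.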

We then fit several models with mpg as the outcome and the highly correlated variables found above as predictors, and on those models we evaluate the effect of am.
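The code that fits these models does not appear in this report. The sketch below shows one plausible way the objects fit_wt_intr, fit_hp and fit_disp_intr used later could be defined. The formulas are assumptions inferred from the object names and from the description of the weight model as 'wt' and 'am' in interaction; they are not confirmed by the source.

```r
# ASSUMED model definitions: the formulas below are guesses based on the
# object names, not the author's confirmed code.
data(mtcars)

fit_wt_intr   <- lm(mpg ~ wt * am,   data = mtcars)  # weight interacting with am
fit_hp        <- lm(mpg ~ hp + am,   data = mtcars)  # horsepower plus am
fit_disp_intr <- lm(mpg ~ disp * am, data = mtcars)  # displacement interacting with am

# Coefficient tables, to inspect the am terms in each model
print(summary(fit_wt_intr)$coefficients)
print(summary(fit_hp)$coefficients)
print(summary(fit_disp_intr)$coefficients)
```

Comparing the R-squared values these fits produce with the values reported in the model comparison is one way to check whether the assumed formulas match the original ones.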

Comparison of Models

The variance explained by each of the three models below is high, which may be a sign of a good fit.

The highest of these is for the weight model:

summary(fit_wt_intr)$r.squared
## [1] 0.8330375
summary(fit_hp)$r.squared
## [1] 0.7820346
summary(fit_disp_intr)$r.squared
## [1] 0.7898895
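R-squared never decreases when terms are added, so it can favour the larger model. A hedged sketch of a comparison that also penalises model size follows. The model formulas are assumptions (the report does not show them), and the names models and comparison are illustrative.

```r
# Compare candidate models on R-squared, adjusted R-squared and AIC.
# ASSUMED formulas: not confirmed by the report.
data(mtcars)

models <- list(
  wt_intr   = lm(mpg ~ wt * am,   data = mtcars),
  hp        = lm(mpg ~ hp + am,   data = mtcars),
  disp_intr = lm(mpg ~ disp * am, data = mtcars)
)

comparison <- data.frame(
  r_squared     = sapply(models, function(m) summary(m)$r.squared),
  adj_r_squared = sapply(models, function(m) summary(m)$adj.r.squared),
  aic           = sapply(models, AIC)  # lower AIC indicates a better trade-off
)

print(round(comparison, 3))
```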

Residual Plot

Residual plots for the model in which ‘wt’ and ‘am’ interact:

par(mfrow=c(2,2))
plot(fit_wt_intr)
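The diagnostic plots can be complemented with a couple of numeric checks. This is a sketch only: the formula for fit_wt_intr is an assumption, and the Shapiro-Wilk test and hat values are a choice made here, not part of the original report.

```r
# Numeric residual diagnostics for the weight model. Illustrative sketch.
data(mtcars)
fit_wt_intr <- lm(mpg ~ wt * am, data = mtcars)  # ASSUMED formula

# Residuals and a normality test on them
res <- resid(fit_wt_intr)
sw  <- shapiro.test(res)
print(sw$p.value)  # a small p-value would suggest non-normal residuals

# Leverage (hat values): the cars with the most influence on the fit
lev <- hatvalues(fit_wt_intr)
print(head(sort(lev, decreasing = TRUE), 3))
```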

Results Summary

“Is an automatic or manual transmission better for MPG”

  • With all other variables held constant, the full model estimates mpg to be 2.52 higher for manual than for automatic transmission, although this coefficient is not statistically significant (p = 0.234).

  • However, the basic model containing all the variables did not explain the system well, so we looked for a better model using the parameters most highly correlated with mpg.

  • In the end, this question cannot be answered definitively, because many other parameters are present and may change the effect altogether.

“Quantify the MPG difference between automatic and manual transmissions”

  • In the best model found, which explained 83% of the variance, the estimated coefficient for manual versus automatic transmission was 14.87 mpg, but the difference became much smaller once the other variables in the model were adjusted for.

  • Again, this estimate depends on the chosen model and will vary from model to model.
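When a model contains an interaction between am and another variable, the coefficient on am alone is the estimated difference only where that other variable equals zero. The sketch below shows how the manual-minus-automatic difference could be evaluated at specific weights. It assumes the weight model has the form mpg ~ wt * am, which the report does not confirm, and am_diff is a hypothetical helper.

```r
# Transmission effect as a function of weight under an interaction model.
# ASSUMED formula: the report does not show how fit_wt_intr was fitted.
data(mtcars)
fit_wt_intr <- lm(mpg ~ wt * am, data = mtcars)
b <- coef(fit_wt_intr)

# Under mpg = b0 + b1*wt + b2*am + b3*wt*am, the manual-minus-automatic
# difference at a given weight is b2 + b3 * wt.
am_diff <- function(wt) unname(b["am"] + b["wt:am"] * wt)

# Evaluate at a few weights (units: 1000 lbs)
wts <- c(2, 3, 4)
print(data.frame(wt = wts, manual_minus_automatic = am_diff(wts)))
```

Reporting the difference at realistic weights, rather than the raw am coefficient, keeps the quantity within the range of the observed data.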