Executive Summary

Using the mtcars data set, which describes a collection of 32 cars, we explore the relationship between a set of variables and miles per gallon (MPG), the outcome. We are particularly interested in the following two questions:

1. “Is an automatic or manual transmission better for MPG”
2. “Quantifying the MPG difference between automatic and manual transmissions”

Exploratory Data Analysis

Let us load the data and examine its structure.

data(mtcars)
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
library(ggplot2)
g <- ggplot(data = mtcars, aes(x = am, y = mpg))
g <- g + geom_point(aes(fill = vs, col = vs)) + labs(title = "Miles Per Gallon vs Transmission Type")
##Figure In Appendix
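
As a numeric companion to the scatter plot, the following minimal sketch tabulates the count, mean, and standard deviation of mpg for each transmission type. The names `trans` and `mpg_summary` are illustrative and not part of the original analysis; the coding 0 = automatic, 1 = manual follows the mtcars documentation.

```r
data(mtcars)

# Label the transmission codes: 0 = automatic, 1 = manual (per ?mtcars)
trans <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Count, mean, and standard deviation of mpg per transmission type
mpg_summary <- data.frame(
  n    = as.vector(table(trans)),
  mean = as.vector(tapply(mtcars$mpg, trans, mean)),
  sd   = as.vector(tapply(mtcars$mpg, trans, sd)),
  row.names = levels(trans)
)
print(mpg_summary)
```

The group means here should match the sample estimates reported by the t test later in the report.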

Diagnostics

Let’s see which variables correlate with miles per gallon (mpg).

##Pairs Plot In Appendix
cor(mtcars)[,1]
##        mpg        cyl       disp         hp       drat         wt 
##  1.0000000 -0.8521620 -0.8475514 -0.7761684  0.6811719 -0.8676594 
##       qsec         vs         am       gear       carb 
##  0.4186840  0.6640389  0.5998324  0.4802848 -0.5509251

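We see that wt, disp, hp, drat, vs and am all correlate with mpg at a magnitude of about 0.6 or higher (as does cyl, at -0.85). The correlation for qsec is weaker (0.42), but we keep it as a candidate predictor as well.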
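
The correlations are easier to compare when ranked by absolute value. This is a small sketch, not part of the original analysis; `mpg_cor` and `ranked` are illustrative names.

```r
data(mtcars)

# Correlation of every variable with mpg, dropping mpg's correlation with itself
mpg_cor <- cor(mtcars)[, "mpg"]
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]

# Rank by strength of association, ignoring the sign
ranked <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
print(round(ranked, 3))
```

Ranking by magnitude puts wt first and qsec last, which is worth keeping in mind when choosing predictors.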

Hypothesis Testing & Inference

We now run a two-sample t test to check whether mean MPG differs between cars with automatic and manual transmissions.

Automatic <- mtcars$mpg[mtcars$am == 0]
Manual <- mtcars$mpg[mtcars$am == 1]
t.test(Automatic, Manual)
## 
##  Welch Two Sample t-test
## 
## data:  Automatic and Manual
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean of x mean of y 
##  17.14737  24.39231

Since the p-value (0.001374) is well below 0.05, we reject the null hypothesis and conclude that the true difference in means is not equal to 0. In other words, mean MPG for cars with automatic transmission (17.15) is significantly lower than for cars with manual transmission (24.39). This test does not adjust for any other variable.
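
To make the test statistic less of a black box, the sketch below recomputes the Welch t statistic, its degrees of freedom (Welch-Satterthwaite approximation), and the two-sided p-value by hand. The variable names are illustrative; the results should agree with the `t.test` output above.

```r
data(mtcars)
automatic <- mtcars$mpg[mtcars$am == 0]
manual    <- mtcars$mpg[mtcars$am == 1]

# Squared standard error of each group mean
se2_a <- var(automatic) / length(automatic)
se2_m <- var(manual) / length(manual)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(automatic) - mean(manual)) / sqrt(se2_a + se2_m)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (se2_a + se2_m)^2 /
  (se2_a^2 / (length(automatic) - 1) + se2_m^2 / (length(manual) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = welch_df)

print(c(t = t_stat, df = welch_df, p = p_value))
```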

Fitting Regression Models

In the first model, we fit all variables against the outcome mpg. None of the variables are significant, and therefore we won’t go with this model even though its adjusted R-squared is high. In the second model, adjusted R-squared has improved, but a few variables have insignificant p-values, so we keep trying different models. In the third model, we see a further improvement in adjusted R-squared. Lastly, we try a fourth model that adds the interaction between transmission type (am) and weight (wt).

fit1 <- lm(mpg ~ ., data = mtcars)
fit2 <- lm(mpg ~ factor(am) + wt + qsec + hp + disp + drat + vs, data = mtcars)
fit3 <- lm(mpg ~ factor(am) + wt + qsec, data = mtcars)
fit4 <- lm(mpg ~ factor(am) * wt + qsec, data = mtcars)
summary(fit4)
## 
## Call:
## lm(formula = mpg ~ factor(am) * wt + qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5076 -1.3801 -0.5588  1.0630  4.3684 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       9.723      5.899   1.648 0.110893    
## factor(am)1      14.079      3.435   4.099 0.000341 ***
## wt               -2.937      0.666  -4.409 0.000149 ***
## qsec              1.017      0.252   4.035 0.000403 ***
## factor(am)1:wt   -4.141      1.197  -3.460 0.001809 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.084 on 27 degrees of freedom
## Multiple R-squared:  0.8959, Adjusted R-squared:  0.8804 
## F-statistic: 58.06 on 4 and 27 DF,  p-value: 7.168e-13

Here the adjusted R-squared is 0.8804, telling us that the model explains about 88% of the variability in mpg after adjusting for the number of predictors. The residual standard error is 2.084 on 27 degrees of freedom, which is the lowest among all four models. We conclude that this is the model with the best fit.
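
The comparison across the four models can be tabulated in one step. The sketch below refits the models from above and collects their adjusted R-squared and residual standard error; `models` and `comparison` are illustrative names. Because fit3 is nested in fit4, an F test via `anova` is also a reasonable check on whether the interaction term is worth keeping.

```r
data(mtcars)

# Refit the four candidate models from the report
models <- list(
  fit1 = lm(mpg ~ ., data = mtcars),
  fit2 = lm(mpg ~ factor(am) + wt + qsec + hp + disp + drat + vs, data = mtcars),
  fit3 = lm(mpg ~ factor(am) + wt + qsec, data = mtcars),
  fit4 = lm(mpg ~ factor(am) * wt + qsec, data = mtcars)
)

# Adjusted R-squared and residual standard error for each model
comparison <- data.frame(
  adj_r_squared = sapply(models, function(m) summary(m)$adj.r.squared),
  sigma         = sapply(models, function(m) summary(m)$sigma),
  row.names     = names(models)
)
print(round(comparison, 4))

# Nested-model F test: does adding the am:wt interaction improve on fit3?
print(anova(models$fit3, models$fit4))
```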

##Residual Plot In Appendix

Here we run some more diagnostics to look out for overfitting. There appears to be none, and the model shows no signs of heteroskedasticity.

The result shows that when wt (weight in 1000 lbs) and qsec (1/4 mile time) are held constant, cars with manual transmission get 14.079 + (-4.141)*wt more MPG (miles per gallon) on average than cars with automatic transmission. That is, at the same weight (2000 lbs, so wt = 2) and the same 1/4 mile time, a car with manual transmission will have 14.079 - 4.141*2 = 5.797 more MPG than a car with automatic transmission.
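
Because of the interaction term, the manual-versus-automatic difference depends on weight. The following sketch evaluates it at a few weights using the fitted coefficients; `manual_advantage` and `break_even` are hypothetical helper names introduced here for illustration.

```r
data(mtcars)
fit4 <- lm(mpg ~ factor(am) * wt + qsec, data = mtcars)
b <- coef(fit4)

# Expected MPG gain of manual over automatic at a given weight (in 1000 lbs),
# holding qsec fixed: am coefficient plus the am:wt interaction times wt
manual_advantage <- function(wt) {
  unname(b["factor(am)1"] + b["factor(am)1:wt"] * wt)
}

# Advantage at 2000, 3000 and 4000 lbs
print(round(manual_advantage(c(2, 3, 4)), 3))

# Weight at which the estimated advantage is zero
break_even <- unname(-b["factor(am)1"] / b["factor(am)1:wt"])
print(round(break_even, 2))
```

With the rounded coefficients above, the break-even weight is 14.079 / 4.141, or about 3.4 (3400 lbs). Under this model, the estimated advantage of a manual transmission shrinks as weight increases and reverses for heavier cars.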

Appendix

Exploring Data

Pairs Plot
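
The figure itself is not reproduced here. A minimal sketch of how such a plot could be drawn with base graphics, assuming all eleven variables are shown (the original call and the title text are not known):

```r
data(mtcars)

# Scatter plot matrix of every pair of variables in mtcars
pairs(mtcars, main = "Pairs plot of mtcars")
```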

Diagnostics of Best Fit Model
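
The diagnostic figure is not reproduced here. One common way to produce residual diagnostics for the chosen model is the default `plot` method for `lm` objects, sketched below; whether the original report used exactly these four panels is an assumption.

```r
data(mtcars)
fit4 <- lm(mpg ~ factor(am) * wt + qsec, data = mtcars)

# Arrange the four default lm diagnostic plots in a 2 x 2 grid:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
old_par <- par(mfrow = c(2, 2))
plot(fit4)
par(old_par)  # restore the previous graphics settings
```

A roughly flat, patternless band in the residuals-vs-fitted and scale-location panels is what one would look for when checking for heteroskedasticity.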