Overview
‘Motor Trend’, a magazine about the automobile industry, is interested in exploring the relationship between a set of variables and miles per gallon (mpg).
They are particularly interested in the following:
> Is an automatic or manual transmission better for mpg?
> How large is the mpg difference between automatic and manual transmissions?
Executive Summary
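From this analysis, we conclude that:
Manual transmissions achieve, on average, 7.25 mpg more than automatic transmissions. However, transmission type alone explains only about 36% of the variance in mpg. Once cylinders, horsepower and weight are taken into account, the estimated difference falls to about 1.48 mpg and is no longer statistically significant.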
Data analysis
data(mtcars)
dim(mtcars)
## [1] 32 11
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
summary(mtcars$mpg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.40 15.43 19.20 20.09 22.80 33.90
Exploratory data analysis
In the variable ‘am’, 0 represents an automatic transmission and 1 represents a manual transmission.
Transmission<-factor(mtcars$am, labels=c("Automatic", "Manual"))
## Plot for the transmission types
boxplot(mpg~Transmission, mtcars, col=c("blue", "green"))

Exploratory data analysis conclusion:
The boxplot shows that cars with a manual transmission tend to have higher mpg than cars with an automatic transmission.
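The visual impression from the boxplot can be checked numerically. The following is a minimal sketch that computes the mean mpg for each transmission type and their difference; the names `group_means` and `mean_diff` are illustrative and not part of the original analysis.

```r
# Mean mpg by transmission type (am: 0 = automatic, 1 = manual)
data(mtcars)

# tapply() returns one mean per level of am, ordered 0 then 1
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
names(group_means) <- c("Automatic", "Manual")

# Difference in mean mpg, manual minus automatic
mean_diff <- unname(group_means["Manual"] - group_means["Automatic"])

print(round(group_means, 3))
print(round(mean_diff, 3))
```

This difference in group means is the same quantity that a regression of mpg on am alone estimates as the am coefficient.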
Regression Models
Linear Regression model:
Here we test the null hypothesis that transmission type has no effect on mpg, using a simple linear regression.
Linear<- lm(mpg~am, mtcars)
summary(Linear)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## am 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
Conclusion of linear regression model
The p-value is 0.000285, which is well below 0.05, so we reject the null hypothesis that transmission type has no effect on mpg.
The model gives an R-squared value of 0.3598. Since it explains only 35.98% of the variance in mpg, the single am variable is not sufficient to describe mpg performance.
Hence, the simple linear regression model on its own is not a suitable model for answering Motor Trend’s questions.
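To quantify the uncertainty around the unadjusted difference, a confidence interval for the am coefficient can be computed. The sketch below also runs an equal-variance two-sample t-test, which is equivalent to the slope test in a regression on a single binary predictor. The names `fit`, `ci` and `tt` are illustrative assumptions.

```r
# Confidence interval for the unadjusted transmission effect
data(mtcars)

fit <- lm(mpg ~ am, data = mtcars)

# 95% confidence interval for the am coefficient
ci <- confint(fit, "am", level = 0.95)

# Equal-variance two-sample t-test; its p-value matches the
# p-value of the am coefficient in the simple regression
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

print(ci)
print(tt$p.value)
```

If the interval excludes zero, the unadjusted difference is significant at the 5% level, in line with the p-value reported above.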
Multivariable Regression model:
## Adding cylinders (cyl), horsepower (hp) and weight (wt) as further predictors
Multiple<-lm(mpg~am+cyl+hp+wt, mtcars)
summary(Multiple)
##
## Call:
## lm(formula = mpg ~ am + cyl + hp + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4765 -1.8471 -0.5544 1.2758 5.6608
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.14654 3.10478 11.642 4.94e-12 ***
## am 1.47805 1.44115 1.026 0.3142
## cyl -0.74516 0.58279 -1.279 0.2119
## hp -0.02495 0.01365 -1.828 0.0786 .
## wt -2.60648 0.91984 -2.834 0.0086 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.509 on 27 degrees of freedom
## Multiple R-squared: 0.849, Adjusted R-squared: 0.8267
## F-statistic: 37.96 on 4 and 27 DF, p-value: 1.025e-10
Conclusion for Multivariable regression
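The R-squared value shows that this model explains about 85% of the variance in mpg. Of the added variables, only wt is significant at the 0.05 level (p = 0.0086); hp is marginal (p = 0.0786) and cyl is not significant (p = 0.2119). After adjusting for these variables, the am coefficient falls from 7.245 to 1.478 and is no longer significant (p = 0.3142), which suggests that they confound the relationship between mpg and am.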
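Whether the additional predictors improve on the single-variable model can be tested directly with a nested model comparison. The following sketch refits both models so that it runs on its own; the names `simple_fit`, `adjusted_fit` and `comparison` are illustrative.

```r
# Nested model comparison: mpg ~ am versus mpg ~ am + cyl + hp + wt
data(mtcars)

simple_fit   <- lm(mpg ~ am, data = mtcars)
adjusted_fit <- lm(mpg ~ am + cyl + hp + wt, data = mtcars)

# F-test of whether cyl, hp and wt jointly reduce the residual
# sum of squares by more than chance would explain
comparison <- anova(simple_fit, adjusted_fit)
print(comparison)

# am coefficient before and after adjustment
am_effect <- c(unadjusted = unname(coef(simple_fit)["am"]),
               adjusted   = unname(coef(adjusted_fit)["am"]))
print(round(am_effect, 3))
```

A small p-value in the comparison indicates that the larger model fits significantly better than the model with am alone.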
Analysis of variance model
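Here we check which variables explain a statistically significant share of the variation in mpg when all variables are included. Note that aov() uses sequential sums of squares, so each p-value depends on the order in which the terms enter the model.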
model<-aov(mpg~., mtcars)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## cyl 1 817.7 817.7 116.425 5.03e-10 ***
## disp 1 37.6 37.6 5.353 0.03091 *
## hp 1 9.4 9.4 1.334 0.26103
## drat 1 16.5 16.5 2.345 0.14064
## wt 1 77.5 77.5 11.031 0.00324 **
## qsec 1 3.9 3.9 0.562 0.46166
## vs 1 0.1 0.1 0.018 0.89317
## am 1 14.5 14.5 2.061 0.16586
## gear 1 1.0 1.0 0.138 0.71365
## carb 1 0.4 0.4 0.058 0.81218
## Residuals 21 147.5 7.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Variables with p-values below 0.05 (cyl, disp and wt) can be considered in addition to transmission type
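Because the analysis of variance table is sequential, the p-value for a variable depends on which terms precede it. The sketch below illustrates this with a smaller, hypothetical pair of models (am and wt only, chosen for illustration and not part of the original analysis), entering am first and then last.

```r
# Order dependence of sequential (Type I) sums of squares
data(mtcars)

# am entered first: tested without adjusting for wt
am_first <- anova(lm(mpg ~ am + wt, data = mtcars))

# am entered last: tested after wt has been accounted for
am_last <- anova(lm(mpg ~ wt + am, data = mtcars))

p_first <- am_first["am", "Pr(>F)"]
p_last  <- am_last["am", "Pr(>F)"]

print(c(am_first = p_first, am_last = p_last))
```

The two p-values differ because am and wt are correlated, so the table above should be read with the order of the terms in mind.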
Appendix
Correlation of the variables
## The matrix of scatter plots between mpg, am, wt and hp visualizes the relationship between each pair of variables
requiredVar<-mtcars[, c(1,9,6,4)]
par(mar=c(1,1,1,1))
pairs(requiredVar, panel=panel.smooth, col=9+mtcars$wt)
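The relationships shown in the scatter plot matrix can also be summarized as pairwise correlations. This is a minimal sketch using the same four variables; the name `cor_matrix` is illustrative.

```r
# Pairwise Pearson correlations between mpg, am, wt and hp
data(mtcars)

cor_matrix <- cor(mtcars[, c("mpg", "am", "wt", "hp")])
print(round(cor_matrix, 2))
```

A strong correlation between am and another predictor, such as wt, is one sign that the predictor may confound the transmission effect.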

Residual plots and Diagnostics
## Scatter plots of the multiple variable regression model residuals
par(mfrow=c(2,2))
plot(Multiple, col="green")

The Residuals vs Fitted plot shows no clear pattern, which suggests that the residuals are homoscedastic.
The Normal Q-Q plot shows that the residuals are approximately normally distributed, apart from a few outliers.
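The visual reading of the diagnostic plots can be complemented with numerical checks. The sketch below applies a Shapiro-Wilk test to the residuals of the multivariable model and lists the observations with the largest Cook's distance; the names `fit`, `normality` and `influence_top` are illustrative, and the choice of these two checks is an assumption rather than part of the original analysis.

```r
# Numerical checks on the residuals of the multivariable model
data(mtcars)

fit <- lm(mpg ~ am + cyl + hp + wt, data = mtcars)

# Shapiro-Wilk test: a small p-value would indicate a departure
# from normality in the residuals
normality <- shapiro.test(residuals(fit))
print(normality)

# The three cars with the largest Cook's distance, i.e. the
# observations with the most influence on the fitted coefficients
influence_top <- head(sort(cooks.distance(fit), decreasing = TRUE), 3)
print(round(influence_top, 3))
```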