This report explores how the variables in the mtcars dataset relate to fuel economy, measured in miles per gallon (mpg).
In particular, it analyses the relationship between mpg and transmission type (automatic or manual) and quantifies the difference.
Simple linear regression and hypothesis testing were used to quantify the difference in mpg by transmission. On average, cars with manual transmission achieved 7.245 mpg more than cars with automatic transmission.
However, to adjust for confounding variables such as number of cylinders, weight, displacement and horsepower, multivariate regression was used. Candidate models were compared using ANOVA; after adjustment, manual transmission cars get on average 1.80 mpg more than automatic transmission cars, a difference that is not statistically significant in the adjusted model (p = 0.215).
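The variables vs (V-shaped or straight engine; 0 or 1), am (automatic or manual transmission; 0 or 1) and cyl (number of cylinders; 4, 6 or 8) are categorical and should be treated as factors. The code below converts am and cyl; vs is binary, so leaving it numeric does not change the fitted models.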
data(mtcars)
mtcars$am <- as.factor(mtcars$am)    # transmission: 0/1 -> factor
mtcars$cyl <- as.factor(mtcars$cyl)  # cylinders: 4/6/8 -> factor
levels(mtcars$am) <- c("Automatic","Manual")  # 0 = Automatic, 1 = Manual
From the pairs plot in the Appendix, mpg is related to transmission type, being higher for cars with manual transmission.
However, mpg is also related to number of cylinders, displacement and weight. An analysis of variance model gives more detailed insight into the factors influencing mpg.
summary(aov(mpg~.,data=mtcars))
## Df Sum Sq Mean Sq F value Pr(>F)
## cyl 2 825 412 61.86 2.7e-09 ***
## disp 1 58 58 8.65 0.0081 **
## hp 1 19 19 2.78 0.1113
## drat 1 12 12 1.79 0.1963
## wt 1 56 56 8.37 0.0090 **
## qsec 1 2 2 0.23 0.6377
## vs 1 0 0 0.05 0.8336
## am 1 17 17 2.49 0.1306
## gear 1 4 4 0.56 0.4618
## carb 1 2 2 0.29 0.5948
## Residuals 20 133 7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
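From the table above, cyl, disp and wt are significant at the 5% level, and hp (p = 0.11) is the next strongest term. These four variables are therefore treated as the main candidate confounders of the effect of am on mpg. Note that the table reports sequential sums of squares, so each row is adjusted only for the terms listed above it.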
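Because aov reports sequential sums of squares, the value attributed to a term depends on where it appears in the formula. The sketch below illustrates this for am, using a reduced formula and a working copy named cars; both are assumptions made for illustration, not part of the original analysis.

```r
# Illustrative sketch: sequential (Type I) sums of squares depend on term order.
# 'cars' is a hypothetical working copy, so mtcars itself is left untouched.
data(mtcars)
cars <- mtcars
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

# Same set of terms, with am entered first versus last
tab_first <- summary(aov(mpg ~ am + cyl + disp + hp + wt, data = cars))[[1]]
tab_last  <- summary(aov(mpg ~ cyl + disp + hp + wt + am, data = cars))[[1]]

# Row names in the summary table are space-padded, so trim before matching
ss_for <- function(tab, term) tab[trimws(rownames(tab)) == term, "Sum Sq"]

ss_am_first <- ss_for(tab_first, "am")
ss_am_last  <- ss_for(tab_last, "am")

print(c(am_entered_first = ss_am_first, am_entered_last = ss_am_last))
```

When am is entered first it absorbs variation that it shares with the other predictors; entered last, it is credited only with what remains after they are accounted for.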
fit <- lm(mpg~am,data=mtcars); summary(fit)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.392 -3.092 -0.297 3.244 9.508
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.15 1.12 15.25 1.1e-15 ***
## amManual 7.24 1.76 4.11 0.00029 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.9 on 30 degrees of freedom
## Multiple R-squared: 0.36, Adjusted R-squared: 0.338
## F-statistic: 16.9 on 1 and 30 DF, p-value: 0.000285
From the summary of the fit, transmission type (automatic or manual) has a statistically significant association with mpg (p-value < 0.0003).
However, transmission alone accounts for only 36% of the variance in mpg (R-squared = 0.36). The estimated difference in mpg between manual and automatic cars is 7.24.
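With a single two-level factor as the only predictor, the slope reported by lm is simply the difference between the two group means. A minimal sketch of that check follows, assuming a working copy named cars (a hypothetical name, used so that mtcars is not modified):

```r
# Sketch: the amManual coefficient equals the difference in group means.
# 'cars' is a hypothetical working copy of mtcars.
data(mtcars)
cars <- mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Mean mpg within each transmission group
group_means <- tapply(cars$mpg, cars$am, mean)
diff_means  <- unname(group_means["Manual"] - group_means["Automatic"])

# Slope from the simple linear regression
slope <- unname(coef(lm(mpg ~ am, data = cars))["amManual"])

print(group_means)
print(c(difference = diff_means, slope = slope))
```

The intercept of the same model is the mean mpg of the reference group (Automatic).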
t.test(mpg~am,data=mtcars)
##
## Welch Two Sample t-test
##
## data: mpg by am
## t = -3.767, df = 18.33, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.28 -3.21
## sample estimates:
## mean in group Automatic mean in group Manual
## 17.15 24.39
From the t-test results, the mean mpg of cars with manual transmission (24.39) is significantly greater than that of cars with automatic transmission (17.15).
The evidence is the 95% confidence interval for the difference in mean mpg, taken as automatic minus manual. The interval (-11.28, -3.21) lies entirely below 0, so the difference is negative; that is, cars with manual transmission have higher mpg than cars with automatic transmission (difference = 7.245 mpg).
Taken on its own, this comparison suggests that manual transmission is associated with better mpg.
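The sign of the interval follows from the order of the factor levels: the test estimates the mean of the first level (Automatic) minus the mean of the second (Manual). The sketch below, which assumes a working copy named cars (hypothetical), extracts the interval and the group means from the test object to make that convention explicit.

```r
# Sketch: inspect the Welch t-test object to confirm the sign convention.
# 'cars' is a hypothetical working copy of mtcars.
data(mtcars)
cars <- mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

tt <- t.test(mpg ~ am, data = cars)   # unpaired Welch test is the default

ci    <- as.numeric(tt$conf.int)      # 95% CI for mean(Automatic) - mean(Manual)
means <- as.numeric(tt$estimate)      # group means, in factor-level order
diff_auto_minus_manual <- means[1] - means[2]

print(ci)
print(diff_auto_minus_manual)
```

Reversing the level order would flip the sign of both the interval and the difference without changing the conclusion.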
However, the above difference in mpg cannot be solely attributed to transmission.
From the analysis of variance table above, cyl, disp, wt and hp were identified as the most important variables apart from am (which is included in every model by default).
fit <- lm(mpg~am,data=mtcars)
fit1 <- update(fit, . ~ . + cyl)    # add number of cylinders
fit2 <- update(fit1, . ~ . + disp)  # add displacement
fit3 <- update(fit2, . ~ . + wt)    # add weight
fit4 <- update(fit3, . ~ . + hp)    # add horsepower
anova(fit,fit1,fit2,fit3,fit4)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + cyl
## Model 3: mpg ~ am + cyl + disp
## Model 4: mpg ~ am + cyl + disp + wt
## Model 5: mpg ~ am + cyl + disp + wt + hp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 721
## 2 28 264 2 456 37.93 2.7e-08 ***
## 3 27 230 1 34 5.66 0.0253 *
## 4 26 183 1 48 7.91 0.0094 **
## 5 25 150 1 32 5.40 0.0286 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the analysis of variance table, used here for nested model testing, Model 5 (mpg ~ am + cyl + disp + wt + hp) is preferred, as each successive addition of a confounding variable gives a p-value that is significant at the 5% level.
Hence am, cyl, disp, wt and hp are retained as the variables in the final model.
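Each F value in the nested comparison is the drop in residual sum of squares per added degree of freedom, divided by the residual mean square of the largest model in the comparison. The sketch below recomputes the first F value by hand; the names cars, small, mid, full and rss are hypothetical and chosen only for this illustration.

```r
# Sketch: recompute the F statistic for adding cyl to mpg ~ am by hand.
# All object names here are hypothetical, chosen for illustration.
data(mtcars)
cars <- mtcars
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

small <- lm(mpg ~ am, data = cars)
mid   <- lm(mpg ~ am + cyl, data = cars)
full  <- lm(mpg ~ am + cyl + disp + wt + hp, data = cars)

rss <- function(model) sum(residuals(model)^2)

# Numerator: reduction in RSS per degree of freedom used by the added term
numerator <- (rss(small) - rss(mid)) / (df.residual(small) - df.residual(mid))
# Denominator: residual mean square of the largest model being compared
denominator <- rss(full) / df.residual(full)
f_by_hand <- numerator / denominator

# Same statistic as reported by anova() when the largest model is included
f_from_anova <- anova(small, mid, full)$F[2]

print(c(by_hand = f_by_hand, from_anova = f_from_anova))
```

The result should agree with the value of about 37.93 shown for Model 2 in the table above.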
finalfit <- lm(mpg~am+cyl+wt+disp+hp,data=mtcars); summary(finalfit)
##
## Call:
## lm(formula = mpg ~ am + cyl + wt + disp + hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.94 -1.33 -0.39 1.19 5.08
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.86428 2.69542 12.56 2.7e-12 ***
## amManual 1.80610 1.42108 1.27 0.215
## cyl6 -3.13607 1.46909 -2.13 0.043 *
## cyl8 -2.71778 2.89815 -0.94 0.357
## wt -2.73869 1.17598 -2.33 0.028 *
## disp 0.00409 0.01277 0.32 0.751
## hp -0.03248 0.01398 -2.32 0.029 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.45 on 25 degrees of freedom
## Multiple R-squared: 0.866, Adjusted R-squared: 0.834
## F-statistic: 27 on 6 and 25 DF, p-value: 8.86e-10
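Together, am, cyl, wt, disp and hp explain 86.6% of the variance in mpg (adjusted R-squared = 0.834). From this model, holding the other variables fixed, manual transmission cars get on average 1.806 mpg more than automatic transmission cars. This adjusted difference is not statistically significant (p = 0.215), so most of the unadjusted 7.245 mpg gap is explained by the other variables.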
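A confidence interval for the adjusted transmission effect gives a sense of how uncertain the estimate is. The sketch below refits the same model on a working copy named cars and calls it final_model; both names are hypothetical.

```r
# Sketch: 95% confidence interval for the adjusted effect of manual transmission.
# 'cars' and 'final_model' are hypothetical names used for illustration.
data(mtcars)
cars <- mtcars
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

final_model <- lm(mpg ~ am + cyl + wt + disp + hp, data = cars)

am_estimate <- unname(coef(final_model)["amManual"])
am_ci       <- as.numeric(confint(final_model, "amManual", level = 0.95))

print(am_estimate)
print(am_ci)
```

The interval spans zero, which is consistent with the p-value of 0.215 reported for amManual in the summary.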
The diagnostic plots in Appendix 1 show no specific pattern in the residuals and approximately normal errors in the normal Q-Q plot.
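Diagnostic plots of this kind can be produced with the plot method for lm objects. The sketch below is one possible way to do so and is not necessarily how the Appendix figure was generated; cars and final_model are hypothetical names.

```r
# Sketch: residuals-vs-fitted and normal Q-Q plots for the final model.
# 'cars' and 'final_model' are hypothetical names used for illustration.
data(mtcars)
cars <- mtcars
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

final_model <- lm(mpg ~ am + cyl + wt + disp + hp, data = cars)

old_par <- par(mfrow = c(1, 2))   # two panels side by side
plot(final_model, which = 1)      # residuals vs fitted values
plot(final_model, which = 2)      # normal Q-Q plot of standardized residuals
par(old_par)                      # restore previous graphics settings
```

A random scatter around zero in the first panel and points close to the reference line in the second support the usual linear model assumptions.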