Executive Summary

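In this report, data gathered by Motor Trend is explored and analyzed to determine the relationship between a set of variables and the miles per gallon (MPG) of automobiles. In particular, the following two issues are addressed: whether an automatic or a manual transmission is better for MPG, and how large the MPG difference between the two transmission types is.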

Summary of data

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). The qualitative variables such as number of cylinders and gears were converted to factors. A description of the variables is available in the appendix.
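The factor conversion described above could look roughly like the sketch below. The exact set of converted columns and the "Automatic"/"Manual" labels are assumptions inferred from the coefficient names in the appendix output (cyl6, vs1, amManual, gear4, carb2); the original preprocessing code is not shown in the report.

```r
# Load the built-in dataset (ships with base R in the datasets package)
data(mtcars)

# Convert the qualitative variables to factors (assumed set of columns)
mtcars$cyl  <- factor(mtcars$cyl)
mtcars$vs   <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)

# Label the transmission levels (0 = automatic, 1 = manual); labels are assumed
mtcars$am <- factor(mtcars$am, levels = c(0, 1),
                    labels = c("Automatic", "Manual"))

# Inspect the resulting column types
str(mtcars)
```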

Exploratory analysis

A boxplot was produced to show the difference between automatic and manual transmissions in terms of MPG. Figure 1 shows that cars with manual transmission tend to have higher MPG. Next, a pairwise graph (figure 2) was created to get a better intuition of which other variables may be of interest. MPG appears to be roughly linearly related to each of cyl, disp, hp, drat, wt, qsec, vs and am. The correlation between MPG and every other variable was also computed (figure 3), and the positive values were noted (drat = 0.681, qsec = 0.419, vs = 0.664, am = 0.600, gear = 0.480). Then a linear model was fit on all the variables to determine which of them should be used in the final models; its coefficients are shown in figure 4. The variables with the lowest p-values were taken (i.e. wt = 0.063, am = 0.234, qsec = 0.274) as the strongest candidates for predicting MPG, although none of them falls below the usual 0.05 threshold.
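The exploratory steps above can be sketched as follows. This is an illustration on the raw numeric mtcars data, not the report's original plotting code; the axis labels, the panel.smooth option and the name mpg_cor are assumptions made here.

```r
data(mtcars)

# Boxplot of MPG by transmission type (am: 0 = automatic, 1 = manual)
boxplot(mpg ~ am, data = mtcars,
        names = c("Automatic", "Manual"),
        xlab = "Transmission", ylab = "Miles per gallon")

# Pairwise scatterplots of all variables, with a smoother in each panel
pairs(mtcars, panel = panel.smooth)

# Correlation of mpg with every other variable, largest first
mpg_cor <- sort(cor(mtcars)["mpg", -1], decreasing = TRUE)
round(mpg_cor, 3)
```

Sorting the correlations makes it easy to see which variables move with MPG (positive values) and which move against it (negative values).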

Model

From the initial model, the correlation matrix and visual inspection of the pairwise graph, the following variables stood out in particular: qsec, vs, am, wt and gear. Next, a stepwise selection process was used to obtain the predictors for the final model. This is done with the step function, which fits multiple regression models with different sets of variables and returns the model with the lowest AIC. As shown in figure 5, the predictors retained for MPG are cyl, hp, wt and am. The summary for this model is shown in figure 6; in particular, the formula is given as: lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars). The selected model yielded an R squared value of 84% (figure 6), meaning that a very high percentage of the variation in MPG is explained by the regression model. Next, the new model was compared with a basic model that uses only transmission type as its predictor. A p-value of 1.688e-08 was obtained (figure 7). This value is minuscule, which means that the added predictors significantly improve the model’s fit.
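As a rough illustration of how the fit statistics mentioned above can be read off, the sketch below refits the selected formula directly instead of calling step. The names final_model and fit_summary are introduced here for illustration, and only the cyl and am conversions needed by the formula are repeated (with assumed labels).

```r
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am  <- factor(mtcars$am, labels = c("Automatic", "Manual"))

# Fit the selected model using the formula reported in the text
final_model <- lm(mpg ~ cyl + hp + wt + am, data = mtcars)
fit_summary <- summary(final_model)

# Share of variance explained, without and with the penalty for model size
fit_summary$r.squared
fit_summary$adj.r.squared

# Nested-model F-test against the transmission-only model
basic_model <- lm(mpg ~ am, data = mtcars)
anova(basic_model, final_model)
```

Comparing the plain and adjusted R squared shows how much of the apparent fit comes from simply having more predictors.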

Diagnostics

The residuals from the final model are plotted below.

Figure 8 (residual plots for the final model)
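Residual plots of this kind are typically produced with the plot method for lm objects. The following is a minimal sketch under that assumption; the report's own plotting code is not shown, and final_model is a name introduced here.

```r
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am  <- factor(mtcars$am, labels = c("Automatic", "Manual"))
final_model <- lm(mpg ~ cyl + hp + wt + am, data = mtcars)

# Standard 2x2 panel: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
old_par <- par(mfrow = c(2, 2))
plot(final_model)
par(old_par)

# Residuals of a least-squares fit with an intercept sum to (numerically) zero
sum(resid(final_model))
```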

Statistical Inference

A Welch two-sample t-test was conducted to compare MPG between the two transmission types. The null hypothesis that transmission type has no effect on MPG is rejected when the p-value is less than 0.05. The results are shown in figure 8. The p-value of 0.001374 and the difference in means (17.15 MPG for automatic versus 24.39 MPG for manual) show that manual-transmission cars have significantly higher MPG than automatics.

Conclusions

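The transmission type of a car has a significant effect on its fuel efficiency when considered on its own. According to the model, which holds cyl, hp and wt fixed, manual transmission has on average 1.81 MPG more than automatic, although this coefficient is not significant at the 0.05 level (p = 0.2065, figure 5). According to the boxplot, manual transmission has ~ 6 MPG more than automatics.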
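To put the adjusted and unadjusted estimates side by side, one option is to compute a confidence interval for the transmission coefficient and the raw difference in group means. This is a sketch only; final_model, am_ci and raw_diff are names introduced here, and the factor labels are assumed.

```r
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am  <- factor(mtcars$am, labels = c("Automatic", "Manual"))
final_model <- lm(mpg ~ cyl + hp + wt + am, data = mtcars)

# 95% confidence interval for the manual-vs-automatic difference,
# adjusted for cyl, hp and wt
am_ci <- confint(final_model, "amManual", level = 0.95)
am_ci

# Unadjusted difference in mean MPG (manual minus automatic)
raw_diff <- diff(tapply(mtcars$mpg, mtcars$am, mean))
raw_diff
```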

Appendix

Description of variables
- mpg Miles/(US) gallon
- cyl Number of cylinders
- disp Displacement (cu.in.)
- hp Gross horsepower
- drat Rear axle ratio
- wt Weight (lb/1000)
- qsec ¼ mile time (seconds)
- vs Engine shape (0 = V-shaped, 1 = straight)
- am Transmission (0 = automatic, 1 = manual)
- gear Number of forward gears
- carb Number of carburetors

Figure 1 (boxplot of MPG by transmission type)

Figure 2 (pairwise plot of the variables)

Figure 3

head(cov2cor(cov(sapply(mtcars, as.numeric))), 1)
##     mpg     cyl    disp      hp   drat      wt   qsec    vs     am   gear
## mpg   1 -0.8522 -0.8476 -0.7762 0.6812 -0.8677 0.4187 0.664 0.5998 0.4803
##        carb
## mpg -0.6067

Figure 4

everything_model <- lm(mpg ~ ., data = mtcars)
everything_model$coeff
## (Intercept)        cyl6        cyl8        disp          hp        drat 
##    23.87913    -2.64870    -0.33616     0.03555    -0.07051     1.18283 
##          wt        qsec         vs1    amManual       gear4       gear5 
##    -4.52978     0.36784     1.93085     1.21212     1.11435     2.52840 
##       carb2       carb3       carb4       carb6       carb8 
##    -0.97935     2.99964     1.09142     4.47757     7.25041

Figure 5

new_model <- step(lm(mpg ~ ., data = mtcars), trace = 0)
summary(new_model)$coef
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept) 33.70832    2.60489 12.9404 7.733e-13
## cyl6        -3.03134    1.40728 -2.1540 4.068e-02
## cyl8        -2.16368    2.28425 -0.9472 3.523e-01
## hp          -0.03211    0.01369 -2.3450 2.693e-02
## wt          -2.49683    0.88559 -2.8194 9.081e-03
## amManual     1.80921    1.39630  1.2957 2.065e-01

Figure 6

new_model <- step(lm(mpg ~ ., data = mtcars), trace = 0)
new_model$coeff
## (Intercept)        cyl6        cyl8          hp          wt    amManual 
##    33.70832    -3.03134    -2.16368    -0.03211    -2.49683     1.80921

Figure 7

basic_model <- lm(mpg ~ am, data = mtcars)
compare <- anova(basic_model, new_model)
compare$Pr
## [1]        NA 1.688e-08

Figure 8

t_test <- t.test(mpg ~ am, data = mtcars)
t_test
## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.767, df = 18.33, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.28  -3.21
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                   17.15                   24.39