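In this report, data gathered by Motor Trend magazine is explored and analyzed to determine the relationship between a set of variables and the fuel efficiency of automobiles, measured in miles per gallon (MPG). In particular, two questions are addressed: whether an automatic or a manual transmission is better for MPG, and how large the MPG difference between the two transmission types is.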
The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). The qualitative variables (cyl, vs, am, gear and carb) were converted to factors. A description of the variables is available in the appendix.
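The report does not show the code used for the factor conversion. A minimal R sketch, assuming the converted columns are cyl, vs, am, gear and carb (inferred from the coefficient names cyl6, vs1, amManual, gear4 and carb2 in figure 4) and that am is labelled "Automatic"/"Manual" to match the group names in figure 8:

```r
# Start from a fresh copy of the built-in data set so that running this twice
# does not corrupt am (factor() with numeric levels on an existing factor gives NAs).
mtcars <- within(datasets::mtcars, {
  cyl  <- factor(cyl)
  vs   <- factor(vs)
  am   <- factor(am, levels = c(0, 1), labels = c("Automatic", "Manual"))
  gear <- factor(gear)
  carb <- factor(carb)
})

# Inspect the resulting factor levels
print(sapply(mtcars[, c("cyl", "vs", "am", "gear", "carb")], levels))
```

Because the converted copy keeps the name `mtcars`, the appendix code that uses `data = mtcars` can be run after it unchanged.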
A boxplot was produced to compare MPG between automatic and manual transmissions. Figure 1 shows that cars with a manual transmission tend to have higher MPG. Next, a pairwise scatterplot matrix (figure 2) was created to get a better sense of which other variables may be of interest; MPG appears to be roughly linearly related to each of cyl, disp, hp, drat, wt, qsec, vs and am. The correlation between MPG and every other variable was also computed (figure 3), and the positive correlations were noted (drat = 0.681, qsec = 0.419, vs = 0.664, am = 0.600, gear = 0.480). Then a linear model using all of the variables as predictors was fit to help decide which variables to use in the final model; its coefficients are shown in figure 4. The variables with the lowest p-values in this model were wt (0.063), am (0.234) and qsec (0.274), although none of them is significant at the 5% level.
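Both screening steps in this paragraph can be reproduced with a short R sketch. It assumes the same factor conversion of cyl, vs, am, gear and carb implied by figure 4, and stores the converted copy under the illustrative name `cars`. Since figure 4 lists only the estimates, the p-values are taken from `summary()`.

```r
# Factor version of the data (assumed preprocessing, matching figure 4's coefficient names)
cars <- within(datasets::mtcars, {
  cyl  <- factor(cyl)
  vs   <- factor(vs)
  am   <- factor(am, levels = c(0, 1), labels = c("Automatic", "Manual"))
  gear <- factor(gear)
  carb <- factor(carb)
})

# Correlation of mpg with every other variable, coding factors as integers as
# figure 3 does; cor() gives the same values as cov2cor(cov()).
mpg_cor  <- cor(sapply(cars, as.numeric))["mpg", -1]
positive <- sort(mpg_cor[mpg_cor > 0], decreasing = TRUE)
print(round(positive, 3))

# Full model and its coefficient p-values, smallest first (intercept dropped)
everything_model <- lm(mpg ~ ., data = cars)
p_values <- summary(everything_model)$coefficients[-1, "Pr(>|t|)"]
print(round(sort(p_values), 3))
```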
Based on the full model, the correlations and a visual inspection of the pairwise graph, the following variables stood out: qsec, vs, am, wt and gear. Next, stepwise model selection was used to choose the predictors for the final model. This was done with R's step function, which starts from the full model and repeatedly drops (or re-adds) terms as long as doing so lowers the Akaike information criterion (AIC), returning the model with the lowest AIC. As shown in figure 5, the selected model uses cyl, hp, wt and am as predictors; its coefficients are also listed in figure 6, and its formula is lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars). This model has an adjusted R-squared of about 0.84, meaning that, after accounting for the number of predictors, it explains roughly 84% of the variation in MPG.
Next, the new model was compared with a basic model that uses only transmission type as its predictor. An ANOVA F-test of the two nested models gives a p-value of 1.688e-08 (figure 7). This minuscule p-value indicates that the added predictors (cyl, hp and wt) significantly improve the fit of the model.
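The quantities quoted above can be checked with a sketch like the following. It repeats the assumed factor conversion so the block runs on its own, and computes the nested-model F-test by hand, F = ((RSS_basic - RSS_new) / (df_basic - df_new)) / (RSS_new / df_new), to show what `anova()` in figure 7 is doing.

```r
cars <- within(datasets::mtcars, {
  cyl  <- factor(cyl)
  vs   <- factor(vs)
  am   <- factor(am, levels = c(0, 1), labels = c("Automatic", "Manual"))
  gear <- factor(gear)
  carb <- factor(carb)
})

# Stepwise AIC selection from the full model, as in figure 5
new_model   <- step(lm(mpg ~ ., data = cars), trace = 0)
basic_model <- lm(mpg ~ am, data = cars)

# R-squared and adjusted R-squared of the selected model
fit <- summary(new_model)
print(c(r_squared = fit$r.squared, adj_r_squared = fit$adj.r.squared))

# Partial F-test for nested models, computed by hand
rss_basic <- deviance(basic_model)   # residual sum of squares of each lm fit
rss_new   <- deviance(new_model)
df_basic  <- df.residual(basic_model)
df_new    <- df.residual(new_model)
f_stat  <- ((rss_basic - rss_new) / (df_basic - df_new)) / (rss_new / df_new)
p_value <- pf(f_stat, df1 = df_basic - df_new, df2 = df_new, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))

# The p-value reported by anova() for the same comparison (figure 7)
anova_p <- anova(basic_model, new_model)[["Pr(>F)"]][2]
print(anova_p)
```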
The residuals from the final model are plotted below.
Figure 8
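The code behind the residual plot is not shown in the report. One common way to draw residual diagnostics for an lm fit in base R is sketched below; it refits the final formula directly (rather than through step()) and assumes cyl and am are factors as in figure 5.

```r
cars <- within(datasets::mtcars, {
  cyl <- factor(cyl)
  am  <- factor(am, levels = c(0, 1), labels = c("Automatic", "Manual"))
})
final_model <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# Four standard diagnostic panels: residuals vs fitted, normal Q-Q,
# scale-location and residuals vs leverage
old_par <- par(mfrow = c(2, 2))
plot(final_model)
par(old_par)

# Residuals of a least-squares fit with an intercept sum to (numerically) zero
res <- residuals(final_model)
print(summary(res))
```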
A Welch two-sample t-test was conducted to compare MPG between the two transmission types. The null hypothesis that transmission type has no effect on MPG is rejected if the p-value is less than 0.05. The results are shown in figure 8. The p-value of 0.001374 leads to rejecting the null hypothesis: manual cars have a significantly higher mean MPG than automatic cars (24.39 versus 17.15, a difference of about 7.24 MPG, with a 95% confidence interval of 3.21 to 11.28 MPG).
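To make the test in figure 8 concrete, the Welch statistic and its Welch-Satterthwaite degrees of freedom can be computed directly from the group means and variances. A minimal R sketch using the original 0/1 coding of am (0 = automatic, 1 = manual), with `t.test()` used only as a cross-check:

```r
# MPG for each transmission type (am: 0 = automatic, 1 = manual)
auto   <- datasets::mtcars$mpg[datasets::mtcars$am == 0]
manual <- datasets::mtcars$mpg[datasets::mtcars$am == 1]

# Squared standard errors of each group mean
se2_auto   <- var(auto) / length(auto)
se2_manual <- var(manual) / length(manual)

# Welch t statistic (automatic minus manual, the order used in figure 8)
t_stat <- (mean(auto) - mean(manual)) / sqrt(se2_auto + se2_manual)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_auto + se2_manual)^2 /
  (se2_auto^2 / (length(auto) - 1) + se2_manual^2 / (length(manual) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
print(c(t = t_stat, df = df_welch, p = p_value))

# Cross-check against R's built-in Welch test (var.equal = FALSE by default)
builtin <- t.test(auto, manual)
```

On mtcars this should reproduce the t = -3.767, df = 18.33 and p = 0.001374 shown in figure 8.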
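On its own, the transmission type of a car has a significant effect on its fuel efficiency. According to the final model, holding cyl, hp and wt constant, manual transmission cars get on average 1.81 MPG more than automatics, although this adjusted estimate is not statistically significant (p = 0.2065, figure 5). According to the boxplot, the median MPG of manual transmission cars is roughly 6 MPG higher than that of automatics, and the t-test gives a difference in means of about 7.24 MPG.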
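The gap between the 1.81 MPG model estimate and the much larger raw difference comes from adjusting for cyl, hp and wt. A minimal R sketch that puts the two estimates side by side, fitting the final formula directly and assuming cyl and am are factors as in figure 5:

```r
cars <- within(datasets::mtcars, {
  cyl <- factor(cyl)
  am  <- factor(am, levels = c(0, 1), labels = c("Automatic", "Manual"))
})

basic_model <- lm(mpg ~ am, data = cars)                  # transmission only
final_model <- lm(mpg ~ cyl + hp + wt + am, data = cars)  # adjusted for cyl, hp, wt

# With a single two-level factor, the coefficient equals the difference in group means
unadjusted <- coef(basic_model)[["amManual"]]
adjusted   <- coef(final_model)[["amManual"]]
print(c(unadjusted = unadjusted, adjusted = adjusted))

# 95% confidence interval for the adjusted effect of a manual transmission
ci <- confint(final_model)["amManual", ]
print(ci)
```

The adjusted interval contains 0, which is consistent with the p-value of 0.2065 reported in figure 5.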
Description of variables
- mpg: Miles per (US) gallon
- cyl: Number of cylinders
- disp: Displacement (cu. in.)
- hp: Gross horsepower
- drat: Rear axle ratio
- wt: Weight (1000 lb)
- qsec: Time to drive ¼ mile (seconds)
- vs: Engine shape (0 = V-shaped, 1 = straight)
- am: Transmission (0 = automatic, 1 = manual)
- gear: Number of forward gears
- carb: Number of carburetors
Figure 1: Boxplot of MPG by transmission type
Figure 2: Pairwise plot of the mtcars variables
Figure 3
# Correlation of mpg with every variable (factors coded as integers)
head(cov2cor(cov(sapply(mtcars, as.numeric))), 1)
## mpg cyl disp hp drat wt qsec vs am gear
## mpg 1 -0.8522 -0.8476 -0.7762 0.6812 -0.8677 0.4187 0.664 0.5998 0.4803
## carb
## mpg -0.6067
Figure 4
# Fit a linear model with every other variable as a predictor
everything_model <- lm(mpg ~ ., data = mtcars)
coef(everything_model)
## (Intercept) cyl6 cyl8 disp hp drat
## 23.87913 -2.64870 -0.33616 0.03555 -0.07051 1.18283
## wt qsec vs1 amManual gear4 gear5
## -4.52978 0.36784 1.93085 1.21212 1.11435 2.52840
## carb2 carb3 carb4 carb6 carb8
## -0.97935 2.99964 1.09142 4.47757 7.25041
Figure 5
# Stepwise AIC selection starting from the full model (trace = 0 hides the search log)
new_model <- step(lm(mpg ~ ., data = mtcars), trace = 0)
summary(new_model)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.70832 2.60489 12.9404 7.733e-13
## cyl6 -3.03134 1.40728 -2.1540 4.068e-02
## cyl8 -2.16368 2.28425 -0.9472 3.523e-01
## hp -0.03211 0.01369 -2.3450 2.693e-02
## wt -2.49683 0.88559 -2.8194 9.081e-03
## amManual 1.80921 1.39630 1.2957 2.065e-01
Figure 6
new_model <- step(lm(mpg ~ ., data = mtcars), trace = 0)
coef(new_model)
## (Intercept) cyl6 cyl8 hp wt amManual
## 33.70832 -3.03134 -2.16368 -0.03211 -2.49683 1.80921
Figure 7
# Nested-model F-test: transmission-only model vs the stepwise model
basic_model <- lm(mpg ~ am, data = mtcars)
compare <- anova(basic_model, new_model)
compare[["Pr(>F)"]]
## [1] NA 1.688e-08
Figure 8
# Welch two-sample t-test of MPG by transmission type (unequal variances)
t_test <- t.test(mpg ~ am, data = mtcars)
t_test
##
## Welch Two Sample t-test
##
## data: mpg by am
## t = -3.767, df = 18.33, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.28 -3.21
## sample estimates:
## mean in group Automatic mean in group Manual
## 17.15 24.39