Executive Summary

This report is written for Motor Trend, a magazine about the automobile industry. It analyzes a data set of 32 cars produced in 1973-1974 using linear regression models and explores the relationship between various car features and fuel efficiency, measured in miles per gallon (MPG). MPG is inversely related to fuel consumption, which is commonly expressed as gallons per 100 miles or, in Europe, as liters per 100 kilometers (L/100 km).

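We are mostly interested in answering the following two questions: is an automatic or a manual transmission better for MPG, and how large is the MPG difference between the two transmission types?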

The results show a statistically significant difference between the mean MPG of automatic and manual transmission cars. Manual transmission cars perform better than automatic ones by 7.25 miles per gallon on average. However, when quantifying the impact of transmission type on MPG with a simple regression model, we found that transmission type alone explains only 36% of the variation in MPG. Searching for a better multivariable linear regression model, we found that a combination of three car features, namely Weight, Quarter mile time and Transmission type, gives a prediction model that explains almost 85% of the variation in miles per gallon.

All the predictors used in this model proved to be statistically significant, but it is worth noting that transmission type was the least significant, while the weight of the car proved to be the most significant (explaining about 75% of the variation in MPG by itself in a simple model). Furthermore, we noticed that cars with automatic transmission are in general much heavier than cars with manual transmission. This makes the two variables highly correlated and supports the view that weight has the greatest impact on MPG, while transmission type has only a minor influence on the fuel efficiency of a car.


Data Description

The data were extracted from the 1974 Motor Trend US magazine and comprise fuel efficiency (in miles per gallon) together with 10 other aspects of automobile design and performance for 32 automobiles (1973-1974 models), which we use as features for our linear regression models.

These are all the variables in the dataset with their descriptions:

POS  NAME  DESCRIPTION                       VARIABLE TYPE
1.   mpg   Miles per US gallon               Numerical
2.   cyl   Number of cylinders               Factor (levels: 4, 6, 8)
3.   disp  Displacement (in cubic inches)    Numerical
4.   hp    Gross horsepower                  Numerical
5.   drat  Rear axle ratio                   Numerical
6.   wt    Weight (in 1000 pounds)           Numerical
7.   qsec  1/4 mile time (in seconds)        Numerical
8.   vs    V/S, V-engine or Straight engine  Factor (levels: V, S)
9.   am    Transmission type                 Factor (levels: automatic, manual)
10.  gear  Number of forward gears           Factor (levels: 3, 4, 5)
11.  carb  Number of carburetors             Factor (levels: 1, 2, 3, 4, 6, 8)
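
The variable types listed above can be set up in R roughly as follows. This is a minimal sketch, assuming the built-in `mtcars` data frame from the `datasets` package; the copy name `cars` and the label strings are illustrative choices, not taken from the report.

```r
# Work on a copy of the built-in data set (32 rows, 11 columns).
cars <- mtcars

# Recode the categorical columns as factors.
# In mtcars, am is coded 0 = automatic, 1 = manual and
# vs is coded 0 = V-shaped, 1 = straight; the labels are assumed names.
cars$am   <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$vs   <- factor(cars$vs, levels = c(0, 1), labels = c("V", "S"))
cars$cyl  <- factor(cars$cyl)
cars$gear <- factor(cars$gear)
cars$carb <- factor(cars$carb)

# Inspect the resulting column types.
str(cars)
```

Keeping `am` as a labelled factor makes later model output easier to read, because the coefficient name carries the level (`Manual`) rather than a bare 0/1 code.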

The focus of our analysis is on determining how different features of a car influence its fuel efficiency (MPG). Special consideration is given to the transmission type as an essential characteristic of a car.

Exploratory data analysis

Looking at the two scatterplots of our data in Figure 1, we observe that cars with manual transmission seem to achieve higher MPG than cars with automatic transmission at the same horsepower. We also notice from the second plot that cars with automatic transmission are in general a lot heavier than cars with manual transmission.

Figure 1: Miles per gallon vs Gross horsepower by Transmission type scatterplot.
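
The weight difference between the two groups can also be checked numerically. The sketch below assumes the built-in `mtcars` data; the names `cars` and `mean_wt`, the colours and the plot layout are illustrative and may differ from the original figure.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Mean weight (in 1000 lb) for each transmission type.
mean_wt <- tapply(cars$wt, cars$am, mean)
print(round(mean_wt, 2))

# Two scatterplots coloured by transmission type (1 = black, 2 = red).
op <- par(mfrow = c(1, 2))
plot(cars$hp, cars$mpg, col = as.integer(cars$am), pch = 19,
     xlab = "Gross horsepower", ylab = "Miles per gallon")
legend("topright", legend = levels(cars$am), col = 1:2, pch = 19)
plot(cars$wt, cars$mpg, col = as.integer(cars$am), pch = 19,
     xlab = "Weight (1000 lb)", ylab = "Miles per gallon")
par(op)
```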

To check whether there is any difference in miles per gallon between manual and automatic transmission cars, we plot the two groups next to each other:

Figure 2: Transmission type vs Miles per gallon boxplot and density plot for Miles per gallon.

The MPG comparison boxplot for cars with automatic and manual transmission in Figure 2 shows that cars with manual transmission achieve more miles per gallon on average. From the density plot on the right we also observe that cars with manual transmission tend to have higher MPG, with a larger variance in values.

Both plots suggest a difference in mean MPG between the two transmission types, so we form the hypothesis that cars with manual transmission and cars with automatic transmission have different mean MPG, meaning different fuel efficiency.
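
The same comparison can be summarized with a few group statistics. This is a sketch assuming the built-in `mtcars` data; `cars` and `mpg_summary` are hypothetical names introduced for illustration.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Mean, median and standard deviation of MPG for each transmission type.
# The result is a matrix with one column per group.
mpg_summary <- sapply(split(cars$mpg, cars$am), function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x))
})
print(round(mpg_summary, 2))
```

A larger standard deviation in the manual column corresponds to the wider spread seen in the density plot.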

Hypothesis test for an observed difference in means

To check whether the observed difference in means is statistically significant and not just due to chance in car selection, we perform a two-sample t-test (Welch's version, which does not assume equal variances, as the fractional degrees of freedom indicate) and report a 95% confidence interval. The p-value is very low (0.001), so we conclude the following: if the true means were equal, a difference at least this large would be very unlikely, and the difference between the means is statistically significant.

T-statistic df P-value Lower CL Upper CL Mean: Automatic Mean: Manual
-3.767 18.332 0.001 -11.28 -3.21 17.147 24.392

Furthermore, we can say with 95% confidence that the true difference in mean MPG between cars with manual and automatic transmission is between 3.21 and 11.28 miles per gallon, in favor of manual transmission.
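
A test of this kind can be run with `t.test` from base R. The sketch below assumes the built-in `mtcars` data and relies on the function's default of unequal variances (Welch); the object names `cars` and `tt` are illustrative.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Welch two-sample t-test of MPG by transmission type.
# The difference is computed as Automatic minus Manual, hence the negative sign.
tt <- t.test(mpg ~ am, data = cars, conf.level = 0.95)

# Pull out the quantities reported in the table above.
print(round(c(t = unname(tt$statistic),
              df = unname(tt$parameter),
              p = tt$p.value), 3))
print(round(tt$conf.int, 2))   # lower and upper confidence limits
print(round(tt$estimate, 3))   # group means
```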

Next, we make a pairwise plot to get an overview of how the variables in the dataset correlate with each other. The pairwise plot in Figure 3 gives an idea of which variables to include in our model and also points out possible multicollinearity problems.

Figure 3: Pairwise plots for mtcars dataset variables
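
A numeric companion to the pairwise plot is the correlation of each variable with MPG. This sketch uses the stock numeric `mtcars` columns; `mpg_cor` is a hypothetical name, and the smoothing panel is an optional choice that may differ from the original figure.

```r
# Correlation of every other column with mpg, strongest first.
mpg_cor <- cor(mtcars)[, "mpg"]
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]
mpg_cor <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
print(round(mpg_cor, 2))

# Correlation between weight and transmission type (coded 0/1),
# a quick check for the multicollinearity concern.
wt_am_cor <- cor(mtcars$wt, mtcars$am)
print(round(wt_am_cor, 2))

# Pairwise scatterplot matrix with a smoother in each panel.
pairs(mtcars, panel = panel.smooth, main = "Pairwise plots for mtcars")
```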

Simple linear regression model

The focus of the analysis is on determining how different features of a car influence its performance, looking solely at miles per gallon as the outcome. Another measure of performance in this dataset is the Horsepower variable, which we treat as a potential confounder of the relationship with MPG. Since Horsepower might interfere with our results, we remove it from the analysis. First, we fit a simple linear regression model with Transmission type (am) as the predictor and MPG as the outcome.

##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## amManual     7.244939   1.764422  4.106127 2.850207e-04

This model shows a 7.25 MPG increase going from automatic to manual transmission: the intercept (17.15) is the mean MPG of automatic transmission cars, and the amManual coefficient (7.25) is the difference in mean MPG for manual transmission cars. The R^2 value of this model is 0.36. The low p-value of 2.9 × 10^{-4} shows that the difference between automatic and manual transmission is significant, so we conclude that according to this model manual transmission cars have higher MPG (lower fuel consumption) than cars with automatic transmission.
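
A model of this form can be fitted with `lm`. The sketch assumes the built-in `mtcars` data with `am` recoded as a labelled factor; the names `cars`, `fit_am` and `r2` are illustrative.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Simple linear regression: MPG explained by transmission type only.
fit_am <- lm(mpg ~ am, data = cars)

# Coefficient table: intercept = mean MPG of automatic cars,
# amManual = difference in mean MPG for manual cars.
print(summary(fit_am)$coefficients)

# Share of MPG variation explained by transmission type alone.
r2 <- summary(fit_am)$r.squared
print(round(r2, 2))
```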

Optimizing regression model using more variables

Using only one feature of the car to predict MPG is not very effective. Since the dataset contains several variables that we can use as predictors, we try to come up with a better multivariable linear regression model that takes other features of a car into account.

To obtain a better model we use a stepwise selection algorithm, which searches for a better combination of predictors by adding and removing features (bidirectional elimination) and uses AIC (Akaike information criterion) to compare the candidate models.
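
One way to run such a search is `step` from the `stats` package. This is a sketch under stated assumptions rather than the exact call behind the report: it starts from the full model on the stock `mtcars` columns (all numeric except `am`, and with Horsepower still among the candidates), so the preprocessing may differ from the authors'. The names `cars`, `full` and `selected` are hypothetical.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Start from the model with every available predictor.
full <- lm(mpg ~ ., data = cars)

# Bidirectional stepwise search using AIC; trace = 0 suppresses the log.
selected <- step(full, direction = "both", trace = 0)

# Formula of the selected model and AIC before and after selection.
print(formula(selected))
print(round(c(full = AIC(full), selected = AIC(selected)), 2))
```

Stepwise selection is a heuristic: it follows one path through the candidate models, so the result is worth confirming with a separate check rather than taking it as the single best model.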

##                   Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)       9.617781  6.9595930  1.381946 1.779152e-01
## wt               -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec              1.225886  0.2886696  4.246676 2.161737e-04
## factor(am)Manual  2.935837  1.4109045  2.080819 4.671551e-02

The model obtained this way uses three variables: Weight, Quarter mile time and Transmission type. The R^2 value for this model is 0.85 and the adjusted R^2 is 0.834, so the model explains roughly 85% of the variation in MPG. Both values are high, which tells us that the model fits the data well. From the p-values for the coefficients we also observe that all three predictors are statistically significant at the 5% level. To check whether adding further predictors would give a better model, we compare four nested models with ANOVA F-tests.
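
The R^2 values quoted above can be recomputed directly from the fitted model. The sketch assumes the built-in `mtcars` data; `fit_final`, `rss` and `tss` are illustrative names, and the manual calculation uses the standard definition R^2 = 1 - RSS/TSS.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Three-predictor model: weight, quarter mile time and transmission type.
fit_final <- lm(mpg ~ wt + qsec + am, data = cars)
s <- summary(fit_final)
print(round(s$coefficients, 4))

# R^2 by hand: one minus residual over total sum of squares.
rss <- sum(residuals(fit_final)^2)
tss <- sum((cars$mpg - mean(cars$mpg))^2)
r2_manual <- 1 - rss / tss

print(round(c(r.squared = s$r.squared,
              r2_manual = r2_manual,
              adj.r.squared = s$adj.r.squared), 3))
```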

## Analysis of Variance Table
## 
## Model 1: mpg ~ wt + qsec + am
## Model 2: mpg ~ wt + qsec + am + cyl
## Model 3: mpg ~ wt + qsec + am + cyl + disp
## Model 4: mpg ~ wt + qsec + am + cyl + disp + drat
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     28 169.29                           
## 2     26 159.42  2    9.8616 0.7527 0.4819
## 3     25 157.73  1    1.6905 0.2581 0.6161
## 4     24 157.22  1    0.5168 0.0789 0.7812

Looking at the results of the ANOVA tests, we observe that each additional predictor reduces the residual sum of squares (RSS) only slightly and that none of the corresponding p-values is significant (all are above 0.48). The extra predictors therefore add complexity without improving the fit, so we keep the more parsimonious three-predictor model.
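
A nested comparison like the one in the table can be set up with `update` and `anova`. This sketch assumes the built-in `mtcars` data and treats `cyl` as a factor, which matches the 2 degrees of freedom shown for Model 2; the object names are illustrative.

```r
cars <- mtcars
cars$am  <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
cars$cyl <- factor(cars$cyl)

# Build each model by adding one term to the previous one.
m1 <- lm(mpg ~ wt + qsec + am, data = cars)
m2 <- update(m1, . ~ . + cyl)
m3 <- update(m2, . ~ . + disp)
m4 <- update(m3, . ~ . + drat)

# Sequential F-tests: each row compares a model with the one before it.
tab <- anova(m1, m2, m3, m4)
print(tab)
```

A large p-value in a row means the added term does not reduce the residual sum of squares by more than would be expected by chance.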

Our final model shows that, holding weight and quarter mile time constant, a manual transmission adds 2.94 MPG over an automatic transmission on average. Weight has a large impact on MPG: every additional 1000 pounds reduces MPG by about 3.92. Lastly, for each additional second a car needs to cover a quarter mile, MPG rises by about 1.23, which tells us that faster cars are, as expected, less fuel efficient.
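
The meaning of these coefficients can be illustrated by predicting MPG for a few hypothetical cars that differ in only one feature. The cars in `new_cars` are made up for illustration and are not in the dataset; the other names are likewise assumptions.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
fit_final <- lm(mpg ~ wt + qsec + am, data = cars)

# Row 1: baseline, row 2: manual instead of automatic,
# row 3: 1000 lb heavier, row 4: one second slower over the quarter mile.
new_cars <- data.frame(
  wt   = c(3, 3, 4, 3),
  qsec = c(18, 18, 18, 19),
  am   = factor(c("Automatic", "Manual", "Automatic", "Automatic"),
                levels = levels(cars$am))
)
pred <- unname(predict(fit_final, newdata = new_cars))

# Differences from the baseline reproduce the three coefficients.
effects <- c(manual = pred[2] - pred[1],
             plus_1000_lb = pred[3] - pred[1],
             plus_1_sec = pred[4] - pred[1])
print(round(effects, 2))
```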

Residual plots and diagnostics

Lastly, we want to check for any problems with the residuals, in particular any sort of pattern that might be emerging in the plots. Also, we need to check the normality of the residuals by plotting the theoretical quantiles of the standard normal distribution against the standardized residuals.

Figure 4: Residual plots for final model

From Figure 4 we observe that the residuals show no obvious pattern and the standardized residuals do not diverge considerably from normality, so we report no problems with the residuals and conclude our analysis.
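
Diagnostic plots of this kind are available through the `plot` method for `lm` objects. The sketch assumes the built-in `mtcars` data and the three-predictor model; the 2 x 2 layout and the names `fit_final`, `std_res` and `largest` are illustrative choices.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
fit_final <- lm(mpg ~ wt + qsec + am, data = cars)

# Four standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location and residuals vs leverage.
op <- par(mfrow = c(2, 2))
plot(fit_final)
par(op)

# Standardized residuals, with the largest ones listed by car name.
std_res <- rstandard(fit_final)
largest <- head(sort(abs(std_res), decreasing = TRUE), 3)
print(round(largest, 2))
```

Cars with the largest standardized residuals are the ones the model fits least well and are a natural starting point when a plot suggests a problem.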

Appendix

Original mtcars dataset used for analysis:

##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

This RMarkdown document was produced with RStudio v0.0.99.486 on R v3.2.2.