Overview of the Assignment

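You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and miles per gallon (MPG), the outcome. They are particularly interested in the following two questions: whether an automatic or a manual transmission is better for MPG, and how large the MPG difference between automatic and manual transmissions is.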

None of the code is shown here, per the instructions and space considerations of the assignment. Supporting figures can be found in the Appendix.

Executive Summary

The analysis indicates that manual transmission is better for miles per gallon. In a linear model with transmission type as the only predictor of mileage, a multiple R-squared of 0.36 was obtained. This means that 36% of the variance in miles per gallon is explained by transmission type in this data.
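The 0.36 figure can be checked with a minimal sketch. It assumes the data set is R's built-in `mtcars` (the variable names `mpg`, `am`, `cyl`, `hp` and `wt` in the regression output suggest this) and that `am` is coded 0 = automatic, 1 = manual. The name `am_fit` is illustrative and not taken from the report's own code.

```r
# Assumption: the analysis uses R's built-in mtcars data.
am_fit <- lm(mpg ~ factor(am), data = mtcars)

# Share of the variance in mpg explained by transmission type alone
r2 <- summary(am_fit)$r.squared
print(round(r2, 2))
```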

The best model obtained explains 87% of the variance in miles per gallon. Its variables are number of cylinders, horsepower, weight and transmission type. In this model, manual transmission is estimated to be more fuel efficient by 1.8 MPG with a standard error of 1.4, holding the other variables constant. With a p-value of 0.21, this difference is not statistically significant on its own.
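To put the 1.8 MPG estimate and its standard error in context, a 95% confidence interval can be worked out from the numbers in the coefficient table alone. This is a sketch using the rounded values printed in the regression output, so the result is approximate; the variable names are illustrative.

```r
# Values copied from the coefficient table of the selected model
estimate  <- 1.80921   # am1 estimate (manual minus automatic, in MPG)
std_error <- 1.39630   # its standard error
resid_df  <- 26        # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t quantile * standard error
t_crit <- qt(0.975, df = resid_df)
ci <- estimate + c(-1, 1) * t_crit * std_error
print(round(ci, 2))
```

The interval runs from roughly -1.1 to 4.7 MPG and so includes zero, which agrees with the p-value of 0.21 reported for `am1`.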

Exploratory Analysis

To get an overview of the correlations, a pairs plot is created; it shows that MPG is clearly correlated with many of the other variables. We then compare MPG by transmission type with a boxplot and a summary table.
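The two exploratory figures could be produced along the following lines. This is a sketch in base R graphics, assuming the built-in `mtcars` data with `am` coded 0 = automatic, 1 = manual; the report's own plotting code and styling may well differ.

```r
# Assumption: the data set is R's built-in mtcars.
# Pairs plot: every variable against every other, including mpg
pairs(mtcars, main = "mtcars: pairwise scatterplots")

# Boxplot of mpg by transmission type (am: 0 = automatic, 1 = manual)
bp <- boxplot(mpg ~ am, data = mtcars,
              names = c("automatic", "manual"),
              xlab = "Transmission", ylab = "Miles per gallon")
```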

Summary of MPG Difference by Transmission Type

## Source: local data frame [2 x 5]
## 
##   transmission count meanMpg medianMpg sdMpg
##         (fctr) (int)   (dbl)     (dbl) (dbl)
## 1    automatic    19    17.1      17.3   3.8
## 2       manual    13    24.4      22.8   6.2
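A table like the one above can be rebuilt as follows. The printed header suggests the original was produced with dplyr; this sketch uses base R instead to avoid the extra dependency, and assumes the built-in `mtcars` data. The column names follow the table, while `transmission`, `mpg_by_type` and `mpg_summary` are illustrative names.

```r
# Assumption: built-in mtcars, with am coded 0 = automatic, 1 = manual.
transmission <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("automatic", "manual"))

# One row per transmission type, columns named as in the table above
mpg_by_type <- split(mtcars$mpg, transmission)
mpg_summary <- data.frame(
  transmission = names(mpg_by_type),
  count        = sapply(mpg_by_type, length),
  meanMpg      = sapply(mpg_by_type, mean),
  medianMpg    = sapply(mpg_by_type, median),
  sdMpg        = sapply(mpg_by_type, sd),
  row.names    = NULL
)
print(mpg_summary, digits = 3)
```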

Regression with Linear Models

We start with a model of miles per gallon against all the other variables and use R’s step function to remove unnecessary variables and find the best fit. After that, a model with transmission type only is created. We then check with an F-test whether the best model is significantly better than the transmission-only model. It is: the F-value is 24.5 with a p-value of 1.688e-08, which is significant at the 0.001 level. Even though some of the variance can be explained by transmission type, there are other important variables to take into account.

## 
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mt2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9387 -1.2560 -0.4013  1.1253  5.0513 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.70832    2.60489  12.940 7.73e-13 ***
## cyl6        -3.03134    1.40728  -2.154  0.04068 *  
## cyl8        -2.16368    2.28425  -0.947  0.35225    
## hp          -0.03211    0.01369  -2.345  0.02693 *  
## wt          -2.49683    0.88559  -2.819  0.00908 ** 
## am1          1.80921    1.39630   1.296  0.20646    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared:  0.8659, Adjusted R-squared:  0.8401 
## F-statistic: 33.57 on 5 and 26 DF,  p-value: 1.506e-10
## 
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ cyl + hp + wt + am
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     30 720.90                                  
## 2     26 151.03  4    569.87 24.527 1.688e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
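The model selection and comparison above can be sketched as follows. It is an assumption that `mt2` is `mtcars` with the categorical columns converted to factors; the `cyl6`, `cyl8` and `am1` coefficient names are consistent with that, but the exact preparation is not shown in the report. The last lines recompute the F statistic by hand from the residual sums of squares, to make clear which number in the table is the F-value and which is the p-value.

```r
# Assumption: mt2 is mtcars with the categorical columns turned into factors.
mt2 <- mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
mt2[factor_cols] <- lapply(mt2[factor_cols], factor)

# Start from all predictors and let step() drop terms by AIC
# (with no scope given, step() searches backward only)
full_fit <- lm(mpg ~ ., data = mt2)
step_fit <- step(full_fit, trace = 0)
print(formula(step_fit))

# Fit the two models named in the output above and compare them
am_fit   <- lm(mpg ~ am, data = mt2)
best_fit <- lm(mpg ~ cyl + hp + wt + am, data = mt2)
cmp <- anova(am_fit, best_fit)
print(cmp)

# The F statistic by hand: drop in RSS per added parameter,
# divided by the residual variance of the larger model
rss <- cmp$RSS
df  <- cmp$Res.Df
f_stat  <- ((rss[1] - rss[2]) / (df[1] - df[2])) / (rss[2] / df[2])
p_value <- pf(f_stat, df[1] - df[2], df[2], lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```

With the rounded values from the table, the same arithmetic is (569.87 / 4) / (151.03 / 26), which is about 24.5.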

Residual Analysis

Looking at the residuals, the best model behaves fairly consistently across the range of predicted values. At the highest mileages the estimates are less accurate than elsewhere, but since the number of observations is small, those cases could be treated as outliers. With the transmission-only model, the residuals show some systematic structure, which suggests that it leaves out relevant variables. We therefore conclude that the model with more variables than just transmission type should be used.
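Residual plots for the two models could be drawn side by side as in the sketch below. It assumes `mt2` is the built-in `mtcars` data with `cyl` and `am` as factors, and it uses plain residuals-versus-fitted scatterplots; the report's own figures may use a different layout or R's built-in `plot()` method for `lm` objects.

```r
# Assumption: mt2 is mtcars with cyl and am converted to factors.
mt2 <- transform(mtcars, cyl = factor(cyl), am = factor(am))
best_fit <- lm(mpg ~ cyl + hp + wt + am, data = mt2)
am_fit   <- lm(mpg ~ am, data = mt2)

# Residuals against fitted values, one panel per model
op <- par(mfrow = c(1, 2))
plot(fitted(best_fit), resid(best_fit),
     xlab = "Fitted MPG", ylab = "Residual",
     main = "mpg ~ cyl + hp + wt + am")
abline(h = 0, lty = 2)
plot(fitted(am_fit), resid(am_fit),
     xlab = "Fitted MPG", ylab = "Residual",
     main = "mpg ~ am")
abline(h = 0, lty = 2)
par(op)

# Range of the residuals of the selected model
best_range <- range(resid(best_fit))
print(round(best_range, 2))
```

Because the transmission-only model has a single two-level predictor, its fitted values are just the two group means, so its panel shows two vertical strips of points.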

Appendix

Exploratory Analysis

Residual Analysis