This document is part of the Regression Models Course Project (offered by Johns Hopkins University on Coursera).
The project focuses on building regression models, and evaluating their diagnostics, for a data set of automobiles.

Abstract

This analysis is performed for Motor Trend, a magazine about the automobile industry. Using a data set of a collection of cars, we explore the relationship between a set of variables and miles per gallon (MPG) as the outcome. We are particularly interested in whether automatic or manual transmission cars achieve better MPG, and in quantifying the difference between them.

To answer these questions, both univariate and multivariate regression models were created and evaluated after a comprehensive exploratory data analysis. The most promising model was chosen by step-wise selection of the variables that contribute most to explaining the difference in MPG between automatic and manual transmission cars. These variables were found to be the weight of the car and its quarter-mile time.

According to the selected model, manual transmission cars cover on average ~2.94 more miles per gallon than automatic transmission cars.

Exploratory Data Analysis

The data were obtained from R (CRAN), where their documentation can also be found.
This dataset consists of 32 observations of 11 variables:
1. mpg - Miles/(US) gallon
2. cyl - Number of cylinders
3. disp - Displacement (cu.in.)
4. hp - Gross horsepower
5. drat - Rear axle ratio
6. wt - Weight (1000 lbs)
7. qsec - 1/4 mile time
8. vs - Engine (0 = V-engine, 1 = straight engine)
9. am - Transmission (0 = automatic, 1 = manual)
10. gear - Number of forward gears
11. carb - Number of carburetors

The table below shows descriptive statistics of the dataset. Naturally we will focus on the mpg and am variables (MPG & Transmission).

Variable   Min.     1st Qu.   Median    Mean      3rd Qu.   Max.
mpg        10.40    15.43     19.20     20.09     22.80     33.90
cyl        4.000    4.000     6.000     6.188     8.000     8.000
disp       71.1     120.8     196.3     230.7     326.0     472.0
hp         52.0     96.5      123.0     146.7     180.0     335.0
drat       2.760    3.080     3.695     3.597     3.920     4.930
wt         1.513    2.581     3.325     3.217     3.610     5.424
qsec       14.50    16.89     17.71     17.85     18.90     22.90
gear       3.000    3.000     4.000     3.688     4.000     5.000
carb       1.000    2.000     2.000     2.812     4.000     8.000

Counts for the categorical variables:
vs: 0 = 18, 1 = 14
am: 0 = 19, 1 = 13
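The descriptive statistics above could be reproduced with a few lines of R. This is only a sketch: it assumes the data set is R's built-in mtcars, copied into an object named cars (the name used in the model calls of this report), with vs and am converted to factors. The report itself does not show this preparation step.

```r
# Assumption: the report's `cars` object is the built-in mtcars data set,
# with the two binary variables stored as factors.
cars <- transform(mtcars, vs = factor(vs), am = factor(am))

# 32 observations of 11 variables
dim(cars)

# Min, quartiles, mean and max for numeric columns; counts for factors
summary(cars)
```

Storing am as a factor is what makes R label its regression coefficient am1 (the effect of level 1, manual, relative to level 0, automatic).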

From preliminary evaluation of the graphs we can see several points:

A basic t-test (displayed below) shows a significant difference in fuel consumption between manual and automatic transmissions. The p-value of the test is 0.00137, which supports the observation seen in the above boxplot. However, further investigation is needed to get a more comprehensive picture. The next section is dedicated to this objective.

t.statistic df p.value lower.CL upper.CL automatic.mean manual.mean
-3.767 18.332 0.001 -11.28 -3.21 17.147 24.392
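The fractional degrees of freedom (18.332) suggest a Welch two-sample t-test, which is the default in R. A minimal sketch of such a test follows, again assuming that cars is the built-in mtcars data with am as a factor.

```r
# Assumption: `cars` is mtcars with vs and am converted to factors.
cars <- transform(mtcars, vs = factor(vs), am = factor(am))

# Welch two-sample t-test (unequal variances) of mpg by transmission type
tt <- t.test(mpg ~ am, data = cars)

tt$statistic   # t statistic
tt$parameter   # Welch-Satterthwaite degrees of freedom
tt$p.value     # two-sided p-value
tt$conf.int    # 95% CI for mean(automatic) - mean(manual)
tt$estimate    # the two group means
```

The confidence interval is for the automatic mean minus the manual mean, which is why both of its limits are negative.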

Modeling

Linear regression is a basic modeling tool, which will be used here in an attempt to find a connection between MPG and transmission type.
First, we fit a univariate linear regression to see the direct effect of transmission on MPG (assuming that no other variable influences the outcome):

## 
## Call:
## lm(formula = mpg ~ am, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am1            7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
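With a single binary predictor, the intercept equals the mean MPG of the automatic cars and the am1 coefficient equals the difference between the two group means. The sketch below checks this, under the assumption that cars is the built-in mtcars data with am stored as a factor.

```r
# Assumption: `cars` is mtcars with vs and am converted to factors.
cars <- transform(mtcars, vs = factor(vs), am = factor(am))

# Univariate model: MPG explained by transmission type only
fit_am <- lm(mpg ~ am, data = cars)

# Mean MPG per transmission type (0 = automatic, 1 = manual)
group_means <- tapply(cars$mpg, cars$am, mean)

coef(fit_am)                    # intercept and am1
group_means                     # compare with the coefficients
summary(fit_am)$r.squared       # share of MPG variance explained
```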

Although this model shows a significant connection between the two variables (MPG by transmission), it has a small R² (0.36), which indicates that only about 36% of the variance is explained by this model. Hence, a better model is required to quantify the MPG difference between automatic and manual transmissions more accurately.
To evaluate which variables contribute to explaining the variance, a step-wise selection was applied:

Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.617781 6.9595930 1.381946 0.1779152
wt -3.916504 0.7112016 -5.506882 0.0000070
qsec 1.225886 0.2886696 4.246676 0.0002162
am1 2.935837 1.4109045 2.080819 0.0467155
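The report does not show how the step-wise selection was run. One common way to obtain a table like this is R's step function, which adds and drops terms by AIC. The sketch below assumes that approach, starting from the model with all ten predictors, and assumes cars is the built-in mtcars data with vs and am as factors.

```r
# Assumption: `cars` is mtcars with vs and am converted to factors.
cars <- transform(mtcars, vs = factor(vs), am = factor(am))

# Start from the full model with every other variable as a predictor
full <- lm(mpg ~ ., data = cars)

# AIC-based step-wise selection; trace = 0 suppresses the step-by-step log
best <- step(full, direction = "both", trace = 0)

formula(best)              # the predictors that survive the selection
coef(summary(best))        # estimates, standard errors, t and p values
```

Selection by AIC is only one possible criterion; a different criterion or a different coding of the variables (for example, treating cyl as a factor) can lead to a different final model.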

According to this step-wise selection, two variables contribute the most (in addition to transmission) to explaining the variance: the weight of the car (wt) and its quarter-mile time (qsec). A model that includes these 3 variables explains approximately 85% of the variance (R² of 0.85).
Therefore, the best model to show the connection between MPG and transmission should also include these 2 variables. Adding further variables would inflate the standard errors of the coefficient estimates, which in turn would weaken their significance.
The model:

Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.617781 6.9595930 1.381946 0.1779152
wt -3.916504 0.7112016 -5.506882 0.0000070
qsec 1.225886 0.2886696 4.246676 0.0002162
am1 2.935837 1.4109045 2.080819 0.0467155

As mentioned above, this model explains 85% of the variance in MPG. According to the am coefficient (transmission), we can conclude that on average, with weight and quarter-mile time held constant, manual transmission cars cover 2.94 more miles per gallon than automatic transmission cars.
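To see how much uncertainty surrounds the 2.94 figure, the selected model can be refitted and a confidence interval requested for the am1 coefficient. The sketch below also compares the model against the transmission-only model with a nested F-test. The interval and the F-test are illustrative additions, not part of the original analysis, and cars is again assumed to be the built-in mtcars data with am as a factor.

```r
# Assumption: `cars` is mtcars with vs and am converted to factors.
cars <- transform(mtcars, vs = factor(vs), am = factor(am))

# The selected model and the transmission-only baseline
fit  <- lm(mpg ~ wt + qsec + am, data = cars)
base <- lm(mpg ~ am, data = cars)

r2 <- summary(fit)$r.squared                # about 0.85
ci <- confint(fit, "am1", level = 0.95)     # 95% CI for the manual effect
cmp <- anova(base, fit)                     # does adding wt and qsec help?

r2
ci
cmp
```

Because the p-value of am1 is just below 0.05, the lower limit of the 95% interval lies only slightly above zero, so the size of the transmission effect is estimated rather imprecisely.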

Model Evaluation

Before accepting the selected model, it is crucial to assess its validity by examining the residuals:
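A sketch of how such residual diagnostics could be produced in R follows. The helper name plot_diagnostics is hypothetical, and the numeric checks (Shapiro-Wilk test, leverage, dfbetas) are illustrative companions to the plots rather than steps taken from the original report. As before, cars is assumed to be the built-in mtcars data with am as a factor.

```r
# Assumption: `cars` is mtcars with vs and am converted to factors.
cars <- transform(mtcars, vs = factor(vs), am = factor(am))
fit <- lm(mpg ~ wt + qsec + am, data = cars)

# Hypothetical helper: draw the four standard lm diagnostic plots on one page
plot_diagnostics <- function(model) {
  old <- par(mfrow = c(2, 2))
  on.exit(par(old))      # restore the previous plotting layout afterwards
  plot(model)
}

# Numeric companions to the plots
res <- resid(fit)
normality <- shapiro.test(res)         # H0: residuals are normally distributed
leverage  <- hatvalues(fit)            # diagonal of the hat matrix
influence <- dfbetas(fit)[, "am1"]     # each car's influence on the am1 estimate

normality
head(sort(leverage, decreasing = TRUE), 3)          # highest-leverage cars
head(sort(abs(influence), decreasing = TRUE), 3)    # most influential cars
```

Calling plot_diagnostics(fit) draws the residuals vs. fitted, normal Q-Q, scale-location and residuals vs. leverage panels.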

From the plots created above, several observations can be seen:

Conclusion

This report set out to answer questions about the connection between MPG and transmission type using only linear regression modeling. We found that, on average, manual transmission cars cover more miles per gallon than automatic transmission cars.
However, it is important to mention that other types of modeling and/or exploration of the data may reveal additional insights.