Summary:

Using regression models and exploratory data analysis, we examine a dataset of cars to explore the relationship between a set of variables and miles per gallon (MPG), the outcome. In particular, the analysis addresses two questions for Motor Trend, a magazine about the automobile industry: whether an automatic or a manual transmission is better for MPG, and how large the MPG difference between the two is.

Procedure:

To answer the questions above, it is important to examine potential influences on MPG besides transmission type, such as the weight of a car and the number of cylinders.

The dataset is mtcars, which ships with base R. The categorical variables need to be converted to factors (transmission type: Automatic or Manual; number of cylinders: 4, 6, or 8) so that patterns can be identified through exploratory analysis. Candidate models will be tested and ruled out in the process.

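# Label transmission type (0 = automatic, 1 = manual) and treat cylinder count as categorical
mtcars$am <- factor(mtcars$am, labels = c('Automatic', 'Manual'))
mtcars$cyl <- as.factor(mtcars$cyl)
head(mtcars)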
##                    mpg cyl disp  hp drat    wt  qsec vs        am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0    Manual    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0    Manual    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1    Manual    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1 Automatic    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0 Automatic    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1 Automatic    3    1

The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
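
As a quick sanity check on that description, a minimal sketch can confirm the dimensions and count the cars per transmission type. It works on a copy named `cars_df`, a name introduced here for illustration only.

```r
# Work on a copy so the built-in dataset stays untouched; `cars_df` is an illustrative name
cars_df <- datasets::mtcars
cars_df$am <- factor(cars_df$am, labels = c("Automatic", "Manual"))

dim(cars_df)        # rows = cars, columns = mpg plus 10 design/performance variables
table(cars_df$am)   # number of cars per transmission type
```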

Statistical Inference:

A Welch two-sample t-test rejects the null hypothesis of equal mean MPG for the two transmission types (p-value = 0.00137). Cars with manual transmission average about 7 MPG more than cars with automatic transmission (24.39 vs. 17.15 MPG).

mpgtransresult <- t.test(mtcars$mpg ~ mtcars$am)
mpgtranspvalue <- mpgtransresult$p.value
mpgtransrestimate <- mpgtransresult$estimate
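
The size of the difference and its confidence interval can be read off the test object. A minimal sketch, assuming a fresh copy of the data (`cars_df`, `tt`, and `mpg_gap` are illustrative names, not part of the original script):

```r
cars_df <- datasets::mtcars
cars_df$am <- factor(cars_df$am, labels = c("Automatic", "Manual"))

# Welch two-sample t-test of MPG by transmission type (unequal variances by default)
tt <- t.test(mpg ~ am, data = cars_df)

# Manual mean minus automatic mean, in MPG
mpg_gap <- unname(diff(tt$estimate))
mpg_gap

# 95% confidence interval for (automatic - manual);
# both bounds below zero means the manual group has the higher mean
tt$conf.int
```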

(see Appendix for more details of the above results)

Regression Model:

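We first fit a full model with every variable as a predictor, then use backward stepwise selection (step(), which compares candidate models by AIC) to find a smaller model, and check which coefficients are significant at the 0.05 level: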

fullModel <- lm(mpg ~ ., data=mtcars)
bestModel <- step(fullModel, direction = "backward", trace=0)

Full Model Result:

## Residual standard error: 2.833 on 15 degrees of freedom
## Multiple R-squared:  0.8931, Adjusted R-squared:  0.779 
## F-statistic:  7.83 on 16 and 15 DF,  p-value: 0.00012

Best Model Result:

## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared:  0.8659, Adjusted R-squared:  0.8401 
## F-statistic: 33.57 on 5 and 26 DF,  p-value: 1.506e-10

(see Appendix for more details of the above result)
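
To inspect the selected model's coefficients directly, the formula reported in the Appendix can be refitted on its own. This is a sketch under that assumption, not the original analysis script; `cars_df`, `fit`, and `coef_table` are illustrative names.

```r
cars_df <- datasets::mtcars
cars_df$am  <- factor(cars_df$am, labels = c("Automatic", "Manual"))
cars_df$cyl <- as.factor(cars_df$cyl)

# Refit the model that backward selection settled on (formula taken from the Appendix)
fit <- lm(mpg ~ cyl + hp + wt + am, data = cars_df)

# Estimates, standard errors, t values and p-values as a matrix
coef_table <- summary(fit)$coefficients
coef_table

# 95% confidence intervals; an interval that spans 0 matches a non-significant term
confint(fit)

# Adjusted R-squared of the reduced model
summary(fit)$adj.r.squared
```

Refitting the reduced formula keeps the example independent of how the full model was specified.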

The boxplot in Appendix Figure 1 shows that cars with manual transmission get more miles per gallon; however, other factors such as weight need to be taken into account. The scatterplot in Figure 2 shows that automatic cars are heavier than manual cars and that weight has a large effect on a car's MPG. Calculations show a significant effect at the 0.05 level (88%; see Appendix, WeightSignificance Results).
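
One way to check the role of weight is sketched here. It is not necessarily the calculation behind the WeightSignificance results: it compares mean weight by transmission type, then tests whether adding `wt` to a transmission-only model improves the fit. The names `cars_df`, `m_am`, `m_am_wt`, and `cmp` are illustrative.

```r
cars_df <- datasets::mtcars
cars_df$am <- factor(cars_df$am, labels = c("Automatic", "Manual"))

# Mean weight (in 1000 lb) per transmission type
wt_means <- tapply(cars_df$wt, cars_df$am, mean)
wt_means

# Nested models: transmission only vs. transmission plus weight
m_am    <- lm(mpg ~ am, data = cars_df)
m_am_wt <- lm(mpg ~ am + wt, data = cars_df)

# F-test for the improvement gained by adding weight
cmp <- anova(m_am, m_am_wt)
cmp
```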

Referring to Appendix Regression Model Plots:

  1. The Residuals vs. Fitted plot shows no consistent pattern, which supports the independence assumption.
  2. The Normal Q-Q plot shows that the residuals are approximately normally distributed, because the points lie close to the line.
  3. The Scale-Location plot shows that the constant variance assumption holds, as the points are randomly distributed.
  4. The Residuals vs. Leverage plot shows no influential outliers; all points fall well within the 0.5 Cook's distance bands.
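
The four diagnostic plots discussed above come from calling `plot()` on a fitted `lm` object. A minimal sketch, assuming the model formula from the Appendix (`cars_df`, `fit`, and `cooks` are illustrative names):

```r
cars_df <- datasets::mtcars
cars_df$am  <- factor(cars_df$am, labels = c("Automatic", "Manual"))
cars_df$cyl <- as.factor(cars_df$cyl)
fit <- lm(mpg ~ cyl + hp + wt + am, data = cars_df)

# 2 x 2 grid: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)  # restore the previous plotting layout

# Numeric companion to the leverage plot: Cook's distance per car, largest first
cooks <- sort(cooks.distance(fit), decreasing = TRUE)
head(cooks, 3)
```

Listing the largest Cook's distances gives a numeric check to set beside the visual reading of the leverage plot.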

Conclusion:

Although transmission type affects the mileage of cars, further analysis of the mtcars data shows that the weight of the car (automatic cars tend to be heavier) has a greater effect on mileage. Holding weight, horsepower, and number of cylinders constant, cars with manual transmission gain about 1.8 MPG on average over cars with automatic transmission, a difference that is not statistically significant (p = 0.206). By contrast, each additional 1,000 lb of weight lowers MPG by about 2.5, which is significant (p = 0.009).

APPENDIX:

FIGURE 1: Boxplot to Visualize the Difference in MPG for Automatic and Manual Transmissions
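
A plot like Figure 1 could be produced as follows. The axis labels, title, and the names `cars_df` and `bp` are assumptions for illustration, not the original plotting code.

```r
cars_df <- datasets::mtcars
cars_df$am <- factor(cars_df$am, labels = c("Automatic", "Manual"))

# One box per transmission type; boxplot() invisibly returns the summary statistics
bp <- boxplot(mpg ~ am, data = cars_df,
              xlab = "Transmission type", ylab = "Miles per gallon (MPG)",
              main = "MPG by Transmission Type")

# Row 3 of bp$stats holds the median of each group
bp$stats[3, ]
```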

Detailed Results 1: Statistical Inference

## 
##  Welch Two Sample t-test
## 
## data:  mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231
## [1] 0.001373638
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231

Detailed Results 2: Regression Model

Best Model Result

## 
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9387 -1.2560 -0.4013  1.1253  5.0513 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.70832    2.60489  12.940 7.73e-13 ***
## cyl6        -3.03134    1.40728  -2.154  0.04068 *  
## cyl8        -2.16368    2.28425  -0.947  0.35225    
## hp          -0.03211    0.01369  -2.345  0.02693 *  
## wt          -2.49683    0.88559  -2.819  0.00908 ** 
## amManual     1.80921    1.39630   1.296  0.20646    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared:  0.8659, Adjusted R-squared:  0.8401 
## F-statistic: 33.57 on 5 and 26 DF,  p-value: 1.506e-10

WeightSignificance Results

Regression Model Plots