Using regression models and exploratory data analysis, we examine a dataset of cars to explore the relationship between a set of variables and miles per gallon (MPG), the outcome. In particular, the answers to two questions will be published in Motor Trend, a magazine about the automobile industry: whether an automatic or manual transmission is better for MPG, and how large the MPG difference between the two transmission types is.
To answer these questions, it is important to examine potential influences on MPG other than transmission type, such as the weight of a car and the number of cylinders.
The dataset is mtcars, from base R. Two variables need to be converted to factors (transmission type: Automatic or Manual; number of cylinders: 4, 6, or 8) so that patterns can be identified through exploratory analysis. Candidate models are tested and ruled out in the process.
mtcars$am <- factor(mtcars$am,labels=c('Automatic', 'Manual'))
mtcars$cyl <- as.factor(mtcars$cyl)
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | Manual | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | Manual | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | Manual | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | Automatic | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | Automatic | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | Automatic | 3 | 1 |
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
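As a quick sanity check on the data preparation, the following minimal sketch confirms the dataset's dimensions and the factor levels produced by the conversion. It works on a copy named `cars` (a name introduced here for illustration only) so that the built-in dataset is left untouched.

```r
# Work on a copy so the built-in dataset is not modified.
# `cars` is an illustrative name, not one used in the original analysis.
cars <- datasets::mtcars
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))  # 0 = Automatic, 1 = Manual
cars$cyl <- as.factor(cars$cyl)

dim(cars)          # number of rows (cars) and columns (variables)
levels(cars$am)    # transmission levels after conversion
levels(cars$cyl)   # cylinder levels after conversion
table(cars$am)     # number of cars per transmission type
```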
The t-test rejects the null hypothesis of equal mean MPG, with a p-value of 0.00137. On average, cars with manual transmission get about 7 MPG more than cars with automatic transmission.
mpgtransresult <- t.test(mtcars$mpg ~ mtcars$am)
mpgtranspvalue <- mpgtransresult$p.value
mpgtransrestimate <- mpgtransresult$estimate
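The "about 7 MPG" figure is the gap between the two group means reported by the test. A minimal sketch of how it could be recovered from the test object is shown below; `cars`, `tt`, and `mpg_gap` are illustrative names, and the formula-with-`data` form is used here in place of the `$` form above.

```r
# Illustrative copy of the data with transmission as a labelled factor.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Welch two-sample t-test of MPG by transmission type (R's default).
tt <- t.test(mpg ~ am, data = cars)

# Difference in group means: Manual minus Automatic.
mpg_gap <- unname(diff(tt$estimate))

mpg_gap        # gap in mean MPG between the two groups
tt$p.value     # p-value of the test
tt$conf.int    # 95% CI for (Automatic mean - Manual mean)
```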
(see Appendix for more details of the above results)

###Regression Model:
Several models are fitted to find the coefficients that are significant at the 0.05 significance level:
fullModel <- lm(mpg ~ ., data=mtcars)
bestModel <- step(fullModel, direction = "backward", trace=0)
Full model (fullModel): Residual standard error: 2.833 on 15 degrees of freedom; Multiple R-squared: 0.8931; Adjusted R-squared: 0.779; F-statistic: 7.83 on 16 and 15 DF; p-value: 0.00012

Best model (bestModel): Residual standard error: 2.41 on 26 degrees of freedom; Multiple R-squared: 0.8659; Adjusted R-squared: 0.8401; F-statistic: 33.57 on 5 and 26 DF; p-value: 1.506e-10
(see Appendix for more details of the above result)
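To illustrate how much the additional predictors improve on transmission type alone, the sketch below compares a transmission-only model with the selected formula reported in the Appendix (`mpg ~ cyl + hp + wt + am`). The selected model is fitted directly rather than re-running `step()`, and the names `am_only`, `selected`, and `nested_p` are assumptions made for this example.

```r
# Illustrative copy of the data with the two factor conversions.
cars <- datasets::mtcars
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))
cars$cyl <- as.factor(cars$cyl)

# Transmission-only model versus the selected model from the Appendix.
am_only  <- lm(mpg ~ am, data = cars)
selected <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# Adjusted R-squared of each model.
adj_r2 <- c(am_only  = summary(am_only)$adj.r.squared,
            selected = summary(selected)$adj.r.squared)
adj_r2

# F-test for the nested models: do cyl, hp, and wt add explanatory power?
nested_test <- anova(am_only, selected)
nested_p <- nested_test[["Pr(>F)"]][2]
nested_p
```

A small `nested_p` would indicate that the extra terms explain variation in MPG that transmission type alone does not.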
The boxplot in Appendix Figure 1 shows that cars with manual transmission get better mileage per gallon; however, other factors such as weight need to be taken into account. The scatterplot in Figure 2 shows that automatic cars are heavier than manual cars and that weight has a large effect on MPG. Calculations show a significant effect, reported as 88% at the 0.05 level (see Appendix, WeightSignificance Results).
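The WeightSignificance calculation itself is not reproduced in this report, so the following is only one plausible way to check the role of weight, not necessarily the calculation used here. It compares mean weight by transmission type and then fits MPG on transmission and weight together; `cars`, `mean_wt`, `wt_model`, and `p_values` are illustrative names.

```r
# Illustrative copy of the data with transmission as a labelled factor.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Mean weight (in 1000 lb) for each transmission type.
mean_wt <- tapply(cars$wt, cars$am, mean)
mean_wt

# MPG modelled on transmission type and weight together.
wt_model <- lm(mpg ~ am + wt, data = cars)

# p-values of the coefficients; compare each against 0.05.
p_values <- summary(wt_model)$coefficients[, "Pr(>|t|)"]
p_values
```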
Although transmission type affects the mileage of cars, further analysis of the mtcars data shows that the weight of the car (automatic cars tend to be heavier) has a greater effect on mileage. Holding weight, number of cylinders, and horsepower constant, cars with manual transmission gain only about 1.8 MPG on average over cars with automatic transmission, a difference that is not statistically significant (p = 0.206). By contrast, a weight increase changes MPG significantly: each additional 1,000 lb lowers MPG by about 2.5 (p = 0.009).
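The conclusion can also be read off confidence intervals for the coefficients of the selected model. The sketch below is a hedged illustration using the formula shown in the Appendix; `cars`, `selected`, and `ci` are names introduced for this example.

```r
# Illustrative copy of the data with the two factor conversions.
cars <- datasets::mtcars
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))
cars$cyl <- as.factor(cars$cyl)

# Selected model as reported in the Appendix.
selected <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# 95% confidence intervals for the transmission and weight coefficients.
ci <- confint(selected, level = 0.95)
ci[c("amManual", "wt"), ]
```

An interval that spans zero (as expected for `amManual`, given its p-value) means the data do not rule out no effect, whereas an interval entirely below zero for `wt` indicates that heavier cars get lower MPG.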
##Detailed Results 1: Statistical Inference
##
## Welch Two Sample t-test
##
## data: mtcars$mpg by mtcars$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group Automatic mean in group Manual
## 17.14737 24.39231
## [1] 0.001373638
## mean in group Automatic mean in group Manual
## 17.14737 24.39231
##
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9387 -1.2560 -0.4013 1.1253 5.0513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.70832 2.60489 12.940 7.73e-13 ***
## cyl6 -3.03134 1.40728 -2.154 0.04068 *
## cyl8 -2.16368 2.28425 -0.947 0.35225
## hp -0.03211 0.01369 -2.345 0.02693 *
## wt -2.49683 0.88559 -2.819 0.00908 **
## amManual 1.80921 1.39630 1.296 0.20646
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared: 0.8659, Adjusted R-squared: 0.8401
## F-statistic: 33.57 on 5 and 26 DF, p-value: 1.506e-10