This report analyzes the relationship between transmission type and miles per gallon (MPG), using data on 32 automobiles collected from the 1974 Motor Trend US magazine. Methods of analysis include regression modeling, statistical inference and diagnostic tests. All supporting plots for the analysis are found in the appendices.
This report compares the effect of transmission type as a single predictor of MPG with the effects of several predictors together, using three multi-predictor models: all variables, best-fit variables, and the most significant variables.
The results show that transmission type by itself is not a sufficient explanation of an automobile's miles per gallon (MPG). Across all the variables, four of them (weight, horsepower, cylinders and transmission type) had the most influence on MPG. When MPG was modeled on transmission type alone, manual transmissions showed a 7.24 MPG improvement over automatics, but once the other influential variables were included the improvement was only about 2 MPG. Further analysis is required to determine how each variable affects MPG.
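Transform any data points for readability or usability (here the transmission indicator am becomes a factor); the structure of the resulting data frame is listed below.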
mtcars$am <- as.factor(mtcars$am)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
First, a quick look at the relationship between MPG and transmission type verifies that there is a difference before all the variables are explored.
## Estimate Std. Error t value Pr(>|t|)
## am0 17.14737 1.124603 15.24749 1.133983e-15
## am1 24.39231 1.359578 17.94109 1.376283e-17
The MPG empirical mean for automatics is 17.147 and for manuals is 24.392.
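The report does not show the call that produced the table above. The estimates are what a regression without an intercept would return, because each coefficient is then simply the mean of its group. A minimal sketch under that assumption (the names `mt`, `group_fit`, `group_means`, `direct_means` and `mean_diff` are introduced here for illustration only):

```r
# Work on a copy so the built-in data set is left untouched (hypothetical name).
mt <- mtcars
mt$am <- factor(mt$am)

# Dropping the intercept (- 1) makes each coefficient the mean MPG of its group.
group_fit <- lm(mpg ~ am - 1, data = mt)
group_means <- coef(group_fit)

# The same numbers computed directly, as a cross-check.
direct_means <- tapply(mt$mpg, mt$am, mean)

# Difference in mean MPG, manual (am1) minus automatic (am0).
mean_diff <- unname(group_means["am1"] - group_means["am0"])
```

The value of `mean_diff` is the 7.24 MPG gap that the rest of the report tries to explain.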
The data show that MPG differs by transmission type. Next, a two-sample t-test is performed to check whether the 7.24 MPG difference between the group means is statistically significant. The null hypothesis is that manual and automatic cars have the same mean MPG, that is, the gas mileage data come from populations with identical means.
t.test(mpg ~ am, data = mtcars)$p.value
## [1] 0.001373638
With p = 0.0014 (p < .005), the null hypothesis of equal mean MPG is rejected: considered on its own, transmission type is associated with a difference in gas efficiency. What effect do the other, potentially confounding variables have on MPG?
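For readers who want more than the p-value, the same test object also carries the group means and a confidence interval for their difference. A short sketch, using a working copy `mt` and an object name `welch` that are not part of the original report:

```r
# Working copy with the transmission indicator as a factor (hypothetical name).
mt <- mtcars
mt$am <- factor(mt$am)

# t.test() defaults to the Welch test (var.equal = FALSE); its null hypothesis
# is that the true difference between the two group means is zero.
welch <- t.test(mpg ~ am, data = mt)

welch$p.value    # p-value for the null hypothesis of equal means
welch$estimate   # mean MPG of automatics (am = 0) and manuals (am = 1)
welch$conf.int   # 95% interval for automatic mean minus manual mean
```

Because the interval is computed as the first group minus the second, an interval that lies entirely below zero corresponds to manuals having the higher mean MPG.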
In the next set of tests, three models are fitted: a univariate model (mpg ~ am), a multivariate model that controls for the influence of the other variables, and a stepwise model that searches for the most influential variables. R Squared, the proportion of the variance in MPG that a model explains, is used to measure how closely the data follow the fitted regression line. The higher the value, the closer the fit.
fit_mpg_am <- lm(mpg ~ am, data = mtcars)
R Squared shows that transmission type alone explains only 35.98% of the variance in MPG, which is low.
fit_mpg_all <- lm(mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear + carb, data = mtcars)
R Squared shows that the model with all variables explains 86.9% of the variance in MPG.
bestFit <- step(fit_mpg_all, direction = "both", trace = 0, k = log(nrow(mtcars)))
R Squared shows that the stepwise model, which keeps am, wt and qsec, explains 84.97% of the variance in MPG.
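The three R Squared figures can be pulled from the fitted models in one pass. The sketch below refits the models on a working copy `mt` (a name introduced here) so that it runs on its own; `fit_am`, `fit_all`, `fit_bic` and `r2` are likewise illustrative names. Setting `k = log(n)` in `step()` replaces the default AIC penalty with the BIC penalty, which favors smaller models.

```r
# Working copy with the transmission indicator as a factor (hypothetical name).
mt <- mtcars
mt$am <- factor(mt$am)

fit_am  <- lm(mpg ~ am, data = mt)   # transmission type only
fit_all <- lm(mpg ~ ., data = mt)    # every other column as a predictor

# Stepwise search in both directions; k = log(n) gives the BIC penalty.
fit_bic <- step(fit_all, direction = "both", trace = 0, k = log(nrow(mt)))

# Collect R Squared for the three models in a named vector.
r2 <- sapply(
  list(univariate = fit_am, multivariate = fit_all, best_fit = fit_bic),
  function(fit) summary(fit)$r.squared
)
round(r2, 4)
```

R Squared never decreases when predictors are added, so the adjusted value (`summary(fit)$adj.r.squared`) is usually the fairer number for comparing models of different sizes.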
Analysis of Variance (ANOVA) is used to find the variables with significant effects on the model.
anova(fit_mpg_all)
From the ANOVA table (see Figure 2), transmission type (am), weight (wt), cylinders (cyl) and horsepower (hp) have the greatest effect on the model, judging by their low p-values. We will use these for a second best-fit model called Significant Factors.
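One caveat worth keeping in mind: `anova()` on a single `lm` fit reports sequential sums of squares, so each p-value depends on the order in which the terms were entered. As a hedged cross-check that is not part of the original analysis, `drop1()` tests each term after all the others and may rank the predictors differently. The names `mt`, `fit_all`, `seq_tab` and `marg_tab` are illustrative:

```r
# Working copy with the transmission indicator as a factor (hypothetical name).
mt <- mtcars
mt$am <- factor(mt$am)

fit_all <- lm(mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear + carb,
              data = mt)

# Sequential table: each term is tested after the terms listed before it.
seq_tab <- anova(fit_all)

# Marginal table: each term is tested after all of the other terms.
marg_tab <- drop1(fit_all, test = "F")

seq_tab
marg_tab
```

If the two tables disagree about a variable, its apparent importance in the sequential table comes partly from its position in the formula.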
Figure 3 shows a scatter plot for each significant variable that influences MPG.
Based on the identified significant variables, create a linear model and compare it to the other models.
fit_mpg_sigfac <- lm(mpg ~ am + cyl + hp + wt, data = mtcars)
Based on the multivariate, best fit, and significant factors models displayed in the table, manual transmission cars get only between 1.5 and 2.9 more miles per gallon once other variables are included, far less than the 7.24 MPG of the univariate model and too small to show a clear statistical improvement.
| Model | R Squared | Adj. R Squared | Est. Mileage Increase (MPG) | p-value | Formula |
|---|---|---|---|---|---|
| Univariate | 0.3598 | 0.3385 | 7.24 | 2.8502074e-04 | mpg ~ am |
| Multivariate | 0.869 | 0.8066 | 2.52 | 3.7931521e-07 | mpg ~ all variables |
| Best Fit | 0.8497 | 0.8336 | 2.94 | 1.2104464e-11 | mpg ~ am + wt + qsec |
| Significant Factors | 0.849 | 0.8267 | 1.48 | 1.0246007e-10 | mpg ~ am + cyl + hp + wt |
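The report does not show how the table was assembled. One way to reproduce its columns is sketched below, assuming the p-value column is the overall F-test p-value of each model; the helper `summarise_fit` and the objects `mt`, `models` and `comparison` are hypothetical names introduced for this example.

```r
# Working copy with the transmission indicator as a factor (hypothetical name).
mt <- mtcars
mt$am <- factor(mt$am)

models <- list(
  univariate          = lm(mpg ~ am, data = mt),
  multivariate        = lm(mpg ~ ., data = mt),
  best_fit            = lm(mpg ~ am + wt + qsec, data = mt),
  significant_factors = lm(mpg ~ am + cyl + hp + wt, data = mt)
)

# Hypothetical helper: one row of summary statistics per fitted model.
summarise_fit <- function(fit) {
  s <- summary(fit)
  f <- s$fstatistic  # F value, numerator df, denominator df
  c(
    r_sq        = s$r.squared,
    adj_r_sq    = s$adj.r.squared,
    am_estimate = s$coefficients["am1", "Estimate"],  # manual vs automatic
    model_p     = unname(pf(f[1], f[2], f[3], lower.tail = FALSE))
  )
}

# Models in rows, statistics in columns.
comparison <- t(sapply(models, summarise_fit))
signif(comparison, 4)
```

The `am_estimate` column is the adjusted difference between manual and automatic cars in each model, which is the quantity the surrounding text calls the estimated mileage increase.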
Compare the univariate, best fit and multivariate models using Analysis of Variance (ANOVA)
anova(fit_mpg_am, bestFit, fit_mpg_all)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + wt + qsec
## Model 3: mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear + carb
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.90
## 2 28 169.29 2 551.61 39.2687 8.025e-08 ***
## 3 21 147.49 7 21.79 0.4432 0.8636
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on the plots shown in Figure 4 the following assumptions can be made:
The Significant Factors model will be used for diagnostics.
See Figure 5 for DFFITS scatter plots on the significant variables and Figure 6 for Cook's distance on the significant variables.
Check for influential observations using Cook's distance and DFFITS
which(dffits(fit_mpg_sigfac) > 0.5)
## Chrysler Imperial Fiat 128 Toyota Corolla Lotus Europa
## 17 18 20 28
## Maserati Bora
## 31
which(cooks.distance(fit_mpg_sigfac) > 0.1)
## Chrysler Imperial Fiat 128 Toyota Corolla Toyota Corona
## 17 18 20 21
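Both checks flag the Chrysler Imperial, Fiat 128 and Toyota Corolla as observations that may influence the results. The Lotus Europa and Maserati Bora exceed only the DFFITS cutoff, and the Toyota Corona exceeds only the Cook's distance cutoff.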
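The DFFITS check in the report looks only at positive values, so an observation that pulls its fitted value downward would be missed. The sketch below takes absolute values and, for comparison only, also applies two commonly quoted size-adjusted rules of thumb (2 * sqrt(p / n) for DFFITS and 4 / n for Cook's distance). These cutoffs are conventions rather than part of the original analysis, and all object names are illustrative.

```r
# Working copy with the transmission indicator as a factor (hypothetical name).
mt <- mtcars
mt$am <- factor(mt$am)
fit_sigfac <- lm(mpg ~ am + cyl + hp + wt, data = mt)

n <- nrow(mt)
p <- length(coef(fit_sigfac))  # estimated coefficients, including the intercept

# abs() also catches observations with large negative DFFITS.
dffits_flag <- names(which(abs(dffits(fit_sigfac)) > 0.5))

# Rule-of-thumb cutoffs that scale with sample size (shown for comparison).
dffits_rule <- names(which(abs(dffits(fit_sigfac)) > 2 * sqrt(p / n)))
cooks_rule  <- names(which(cooks.distance(fit_sigfac) > 4 / n))

# Cars flagged by both of the report's original checks.
both <- intersect(
  names(which(dffits(fit_sigfac) > 0.5)),
  names(which(cooks.distance(fit_sigfac) > 0.1))
)
```

Observations that appear in `both` are the natural candidates for a sensitivity check, for example refitting the model without them and comparing the am coefficient.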
Finally, leverage (hat values) is checked for outliers that may have an influence on the model. Figure 7 shows that all the hat values are small, so no single observation appears to have a direct impact on the fit, although this has not been investigated further.
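Whether a hat value counts as small depends on the model size, because hat values always sum to the number of estimated coefficients and therefore average p / n. A minimal sketch, assuming the common rule of thumb that flags leverage above twice that average (the cutoff and the object names are assumptions, not taken from the report):

```r
# Working copy with the transmission indicator as a factor (hypothetical name).
mt <- mtcars
mt$am <- factor(mt$am)
fit_sigfac <- lm(mpg ~ am + cyl + hp + wt, data = mt)

n <- nrow(mt)
p <- length(coef(fit_sigfac))  # estimated coefficients, including the intercept

h <- hatvalues(fit_sigfac)

sum(h)   # equals p, so the average leverage is p / n

# Observations with more than twice the average leverage.
high_leverage <- names(which(h > 2 * p / n))
high_leverage
```

High leverage alone does not make a point influential; it matters when it coincides with a large residual, which is what Cook's distance and DFFITS measure.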
Since the mtcars data set is small, we use the cutoff |DFBETAS| > 1 and count how many values exceed it.
sum((abs(dfbetas(fit_mpg_sigfac)))>1)
## [1] 0
Transmission type alone does not determine the gas savings a manual transmission will deliver. The analysis shows that other variables also affect the miles per gallon of a car. It suggests which variables have the greatest impact, but further research is needed to determine their exact influence.
Figure 1 shows a scatter plot of the Motor Trend Data Set
pairs(mtcars, panel=panel.smooth, main="Scatter Plots of Motor Trend Car Road Variables")
Figure 2 shows the Analysis of Variance for All Variables
## Analysis of Variance Table
##
## Response: mpg
## Df Sum Sq Mean Sq F value Pr(>F)
## am 1 405.15 405.15 57.6846 1.875e-07 ***
## cyl 1 449.53 449.53 64.0039 8.231e-08 ***
## disp 1 19.28 19.28 2.7452 0.11241
## hp 1 35.71 35.71 5.0849 0.03493 *
## drat 1 1.87 1.87 0.2663 0.61121
## wt 1 52.06 52.06 7.4127 0.01275 *
## qsec 1 13.34 13.34 1.8999 0.18260
## vs 1 0.22 0.22 0.0309 0.86214
## gear 1 0.97 0.97 0.1384 0.71365
## carb 1 0.41 0.41 0.0579 0.81218
## Residuals 21 147.49 7.02
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Figure 3 shows a scatter plot of the significant factor variables
Figure 4 shows the Residual Plots
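The plotting code for this figure is not included in the report. A sketch of how the standard residual diagnostics could be drawn for the Significant Factors model, assuming base R graphics were used (`mt`, `fit_sigfac` and `op` are names introduced here):

```r
# Working copy with the transmission indicator as a factor (hypothetical name).
mt <- mtcars
mt$am <- factor(mt$am)
fit_sigfac <- lm(mpg ~ am + cyl + hp + wt, data = mt)

# Arrange the four default lm diagnostic plots in a 2 x 2 grid:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage.
op <- par(mfrow = c(2, 2))
plot(fit_sigfac)
par(op)  # restore the previous graphics settings
```

The first two panels are the ones usually read for non-linearity and for departures from normally distributed residuals.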
Figure 5 shows DFFITS plots for each of the models
Figure 6 shows Cook's distance plots for each of the models
Figure 7 shows leverage plots for each of the multivariate models