Executive Summary

Motor Trend is a magazine about the automobile industry. It has provided a data set containing 11 specifications for each of 32 cars. We are interested in exploring the relationship between a set of variables and miles per gallon (MPG), the outcome. We are particularly interested in two questions: whether an automatic or a manual transmission is better for MPG, and how large the MPG difference between the two transmission types is.

After examining and analyzing the Motor Trend mtcars data set, mpg was compared with the remaining specifications (number of cylinders, displacement, horsepower, rear axle ratio, weight, quarter-mile time, V-shaped or straight engine, transmission type, number of forward gears, number of carburetors). The findings are summarized in the Conclusion.

Using Exploratory Data Analysis

First, the str function, below, lists the variables contained in the table along with their class and the first few values of each. Next, mpg is compared with the other variables in the mtcars table using the linear model (lm) function.

Figure 4 in the Appendix uses a binomial model to show that the probability of a manual transmission increases with increasing mpg.

Figure 5 in the Appendix is a boxplot of mpg for automatic and manual transmissions. The mean mpg of each group is given with the graph.

str(MTcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
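
The code behind Figures 4 and 5 is not shown in this report. A minimal sketch of how such figures might be produced, assuming the built-in mtcars data frame with am coded 0 for automatic and 1 for manual (the names logit_fit, p_manual and group_means are illustrative, not the author's):

```r
# Assumption: built-in mtcars, am = 0 (automatic) or 1 (manual).
data(mtcars)

# Binomial (logistic) model: probability of a manual transmission given mpg.
logit_fit <- glm(am ~ mpg, data = mtcars, family = binomial)
mpg_slope <- coef(logit_fit)[["mpg"]]  # a positive slope means P(manual) rises with mpg

# Predicted probability of a manual transmission at a low and a high mpg value.
p_manual <- predict(logit_fit,
                    newdata = data.frame(mpg = c(15, 30)),
                    type = "response")

# Group means of mpg that would accompany the boxplot.
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

# Boxplot of mpg by transmission type.
boxplot(mpg ~ am, data = mtcars,
        names = c("Automatic", "Manual"),
        xlab = "Transmission", ylab = "Miles per gallon")
print(round(group_means, 2))
```

A boxplot draws the group medians rather than the means, which is why the means are computed separately here.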

Analysis of the Data

The intercept (beta zero) of the full model is 12.3 mpg. The remaining coefficients (the beta one values) are the slopes obtained when every variable is entered as a linear predictor. Looking at the two extremes, am (2.52) has the largest positive slope, indicating a direct relationship with mpg, and wt (-3.72) has the largest negative slope, indicating an inverse relationship with mpg. (See Figure 1 in the Appendix for comparisons.)
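
The extreme slopes can also be read off programmatically instead of by eye. A minimal sketch, assuming the built-in mtcars data frame stands in for the report's MTcars (the names full_fit, slopes, largest_pos and largest_neg are illustrative):

```r
# Assumption: built-in mtcars is equivalent to the report's MTcars copy.
full_fit <- lm(mpg ~ ., data = mtcars)

# Drop the intercept so only the slopes remain.
slopes <- coef(full_fit)[-1]

# Largest positive and largest negative slope, keeping the variable names.
largest_pos <- slopes[which.max(slopes)]
largest_neg <- slopes[which.min(slopes)]
print(round(c(largest_pos, largest_neg), 2))
```

Each predictor is measured on its own scale, so the raw slopes show the direction of each relationship but are not directly comparable in size.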

Model Selection

Next we use the step function to select a formula-based model by Akaike's 'An Information Criterion' (AIC). This yields three candidate variables: wt, am and qsec. From the Adjusted R-squared value of 0.8336 we conclude that about 83% of the variance in mpg is explained by the independent variables am, wt and qsec. (See Figure 2 in the Appendix.)

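# fit the full model with every other variable as a predictor
mt <- lm(formula = mpg ~ ., data = MTcars)
# stepwise selection by AIC, allowing terms to be added and dropped
bt <- step(mt, direction = "both", trace = 0)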
summary(mt)$coeff
##                Estimate  Std. Error    t value   Pr(>|t|)
## (Intercept) 12.30337416 18.71788443  0.6573058 0.51812440
## cyl         -0.11144048  1.04502336 -0.1066392 0.91608738
## disp         0.01333524  0.01785750  0.7467585 0.46348865
## hp          -0.02148212  0.02176858 -0.9868407 0.33495531
## drat         0.78711097  1.63537307  0.4813036 0.63527790
## wt          -3.71530393  1.89441430 -1.9611887 0.06325215
## qsec         0.82104075  0.73084480  1.1234133 0.27394127
## vs           0.31776281  2.10450861  0.1509915 0.88142347
## am           2.52022689  2.05665055  1.2254035 0.23398971
## gear         0.65541302  1.49325996  0.4389142 0.66520643
## carb        -0.19941925  0.82875250 -0.2406258 0.81217871
summary(bt)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = MTcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
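
The Adjusted R-squared quoted above can be recovered from the Multiple R-squared, the sample size and the number of predictors. A minimal sketch, assuming the built-in mtcars data frame and refitting the selected formula directly (sel_fit, n, p, r2 and adj_r2 are illustrative names):

```r
# Assumption: built-in mtcars; the selected formula is refit directly.
sel_fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

n  <- nrow(mtcars)               # number of observations
p  <- length(coef(sel_fit)) - 1  # number of predictors, excluding the intercept
r2 <- summary(sel_fit)$r.squared

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(c(r2 = r2, adj_r2 = adj_r2), 4))
```

With n = 32 and p = 3 the correction factor is 31/28, which turns 0.8497 into 0.8336.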

Residuals and Diagnostics

Figure 3 shows the series of four diagnostic plots produced by plot(fit):

  • Residuals vs Fitted: plots ordinary residuals against fitted values; used to detect patterns that suggest missing variables or heteroskedasticity. The points are scattered, with no outliers.

  • Scale-Location: plots the square root of the absolute standardized residuals against fitted values; like the residual plot, it is used to detect patterns in the residuals. It indicates roughly constant variance, with a few outliers.

  • Normal Q-Q: plots the quantiles of the standardized residuals against the theoretical quantiles of a standard normal; used to evaluate normality of the errors. The points fall close to a straight line, which indicates conformity.

  • Residuals vs Leverage: plots standardized residuals against leverage, with Cook's distance contours, comparing the fit at each point with that point's potential for influence; used to detect points that have substantial influence on the regression model. The four highest-leverage points are Cadillac Fleetwood, Chrysler Imperial, Lincoln Continental and Merc 230.
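
The four diagnostic plots and the leverage ranking described in the list above might be reproduced as follows. This is a sketch assuming that fit is the selected model mpg ~ wt + qsec + am fitted to the built-in mtcars data; the names leverage and top_leverage are illustrative:

```r
# Assumption: fit is the selected model on the built-in mtcars data.
fit <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Draw the four standard lm diagnostic plots on one page.
par(mfrow = c(2, 2))
plot(fit)

# Leverage of each car is the diagonal of the hat matrix.
leverage <- hatvalues(fit)

# The four cars with the highest leverage, in ascending order.
top_leverage <- tail(sort(leverage), 4)
print(top_leverage)
```

High leverage means a car has unusual predictor values. It does not by itself make the car an outlier in mpg.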

R's anova function computes analysis of variance (or deviance) tables for one or more fitted model objects. Given a single model, it analyzes the change in residual variance as each term is added; given several nested models, it tests each model against the previous one. Here the base model mpg ~ am is compared with the selected model mpg ~ wt + qsec + am. The very small p-value in the table below (1.55e-09) shows that the other two variables, wt and qsec, do affect the outcome.

# perform analysis of variance: base model (am only) vs. the selected model
bl <- lm(mpg ~ am, data = mtcars)
anova(bl, bt)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ wt + qsec + am
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)    
## 1     30 720.90                                 
## 2     28 169.29  2    551.61 45.618 1.55e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
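
The F statistic in this table follows directly from the two residual sums of squares. A minimal sketch of the calculation, assuming the built-in mtcars data frame and refitting both models (base_fit, sel_fit and the other names are illustrative):

```r
# Assumption: both models are refit on the built-in mtcars data.
base_fit <- lm(mpg ~ am, data = mtcars)
sel_fit  <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Residual sums of squares of the two nested models.
rss_base <- sum(resid(base_fit)^2)
rss_sel  <- sum(resid(sel_fit)^2)

# Extra parameters in the larger model, and its residual degrees of freedom.
df_extra <- df.residual(base_fit) - df.residual(sel_fit)
df_resid <- df.residual(sel_fit)

# F = (drop in RSS per extra parameter) / (residual variance of the larger model).
f_stat  <- ((rss_base - rss_sel) / df_extra) / (rss_sel / df_resid)
p_value <- pf(f_stat, df_extra, df_resid, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```

With the values in the table this is (551.61 / 2) / (169.29 / 28), or about 45.6.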

Inference and Uncertainty

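Clearly, the null hypothesis of equal mean mpg for automatic and manual transmissions should be rejected, since the p-value (0.001374) is smaller than alpha = 0.05. The 95% confidence interval for the difference in means (-11.28 to -3.21) excludes zero, which confirms that the group means differ at the 5% significance level.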

t.test(mpg ~ am, data = mtcars)
## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231
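
The Welch statistic, its degrees of freedom and the confidence interval can be reproduced by hand to see where each number comes from. A minimal sketch, assuming the built-in mtcars data frame with am coded 0 for automatic and 1 for manual (all variable names are illustrative):

```r
# Assumption: built-in mtcars, am = 0 (automatic) or 1 (manual).
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Per-group variance of the mean (the groups are not assumed to share a variance).
v_auto   <- var(auto) / length(auto)
v_manual <- var(manual) / length(manual)

# Welch t statistic for the difference in means (automatic minus manual).
diff_means <- mean(auto) - mean(manual)
se         <- sqrt(v_auto + v_manual)
t_stat     <- diff_means / se

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v_auto + v_manual)^2 /
  (v_auto^2 / (length(auto) - 1) + v_manual^2 / (length(manual) - 1))

# Two-sided p-value and 95% confidence interval for the difference.
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci      <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se
print(round(c(t = t_stat, df = df_welch, p = p_value), 4))
print(round(ci, 3))
```

The sign of t is negative because the automatic group, which has the lower mean, comes first in the subtraction.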

Conclusion

From the analysis of the best model, we can conclude that:

Cars with manual transmission get better gas mileage than cars with automatic transmission (about 2.9 mpg more, adjusted for qsec and wt).

mpg decreases by about 3.9 for every 1000 lb increase in wt (adjusted for qsec and am). mpg decreases negligibly with increasing hp (about 0.02 mpg per horsepower in the full model).

If the number of cylinders, cyl, increases from 4 to 6 or to 8, mpg decreases by about 3 and 2.2 respectively (adjusted for hp, wt, and am).

APPENDIX - Reference Graphs

##  Cadillac Fleetwood   Chrysler Imperial Lincoln Continental 
##           0.2270069           0.2296338           0.2642151 
##            Merc 230 
##           0.2970422