knitr::opts_chunk$set(echo = TRUE)

Week 4 Peer-Reviewed assignment

Abstract

This is the final assignment in the Coursera Regression Models course. The aim is to analyze the mtcars data set and draw inferences from it. The data come from the 1974 Motor Trend magazine and cover fuel consumption together with 10 other parameters of interest for 32 different car models.

The data is interpreted in order to compare automatic (am = 0) with manual (am = 1) cars. In summary, among lighter cars those with manual transmission are more fuel efficient, while among heavier cars those with automatic transmission are more fuel efficient.

library(ggplot2)
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
# library(dplyr)
# library(explore)

dim(mtcars)
## [1] 32 11
# mtcars %>% explore_tbl()
# mtcars %>% describe()
# mtcars %>% explore_all()  Taking these three lines out - it brings nothing to the analysis

data(mtcars)
knitr::kable(
  mtcars[1:32,],
  caption = "Figure 1: Complete list of all records in the dataset"
)
Figure 1: Complete list of all records in the dataset
Model                 mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
Mazda RX4            21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
Mazda RX4 Wag        21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
Datsun 710           22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
Hornet 4 Drive       21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
Hornet Sportabout    18.7    8  360.0  175  3.15  3.440  17.02   0   0     3     2
Valiant              18.1    6  225.0  105  2.76  3.460  20.22   1   0     3     1
Duster 360           14.3    8  360.0  245  3.21  3.570  15.84   0   0     3     4
Merc 240D            24.4    4  146.7   62  3.69  3.190  20.00   1   0     4     2
Merc 230             22.8    4  140.8   95  3.92  3.150  22.90   1   0     4     2
Merc 280             19.2    6  167.6  123  3.92  3.440  18.30   1   0     4     4
Merc 280C            17.8    6  167.6  123  3.92  3.440  18.90   1   0     4     4
Merc 450SE           16.4    8  275.8  180  3.07  4.070  17.40   0   0     3     3
Merc 450SL           17.3    8  275.8  180  3.07  3.730  17.60   0   0     3     3
Merc 450SLC          15.2    8  275.8  180  3.07  3.780  18.00   0   0     3     3
Cadillac Fleetwood   10.4    8  472.0  205  2.93  5.250  17.98   0   0     3     4
Lincoln Continental  10.4    8  460.0  215  3.00  5.424  17.82   0   0     3     4
Chrysler Imperial    14.7    8  440.0  230  3.23  5.345  17.42   0   0     3     4
Fiat 128             32.4    4   78.7   66  4.08  2.200  19.47   1   1     4     1
Honda Civic          30.4    4   75.7   52  4.93  1.615  18.52   1   1     4     2
Toyota Corolla       33.9    4   71.1   65  4.22  1.835  19.90   1   1     4     1
Toyota Corona        21.5    4  120.1   97  3.70  2.465  20.01   1   0     3     1
Dodge Challenger     15.5    8  318.0  150  2.76  3.520  16.87   0   0     3     2
AMC Javelin          15.2    8  304.0  150  3.15  3.435  17.30   0   0     3     2
Camaro Z28           13.3    8  350.0  245  3.73  3.840  15.41   0   0     3     4
Pontiac Firebird     19.2    8  400.0  175  3.08  3.845  17.05   0   0     3     2
Fiat X1-9            27.3    4   79.0   66  4.08  1.935  18.90   1   1     4     1
Porsche 914-2        26.0    4  120.3   91  4.43  2.140  16.70   0   1     5     2
Lotus Europa         30.4    4   95.1  113  3.77  1.513  16.90   1   1     5     2
Ford Pantera L       15.8    8  351.0  264  4.22  3.170  14.50   0   1     5     4
Ferrari Dino         19.7    6  145.0  175  3.62  2.770  15.50   0   1     5     6
Maserati Bora        15.0    8  301.0  335  3.54  3.570  14.60   0   1     5     8
Volvo 142E           21.4    4  121.0  109  4.11  2.780  18.60   1   1     4     2

Dimensions of the data

The output of dim(mtcars) shows that there are 32 records, with 11 variables per record.

Initial Data Analysis

Initial observations (see the boxplot in the Appendix) show that manual transmission cars generally achieve higher MPG.

Inference

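The null hypothesis is that automatic and manual cars have the same mean MPG. It is tested with a two-sample t-test, which in R defaults to the Welch version that does not assume equal variances.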

mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- as.factor(mtcars$am)
mtcars$gear <- as.factor(mtcars$gear)
mtcars$carb <- as.factor(mtcars$carb)
attach(mtcars)
## The following object is masked from package:ggplot2:
## 
##     mpg
result <- t.test(mpg ~ am)

Now, identify the p-value

result$p.value
## [1] 0.001373638

Show the values for ‘Group 0’ (Automatic) and ‘Group 1’ (Manual)

result$estimate
## mean in group 0 mean in group 1 
##        17.14737        24.39231

The p-value is 0.00137, so we can comfortably reject the null hypothesis at the 5% level; the mean MPG for manual cars differs from the mean MPG for automatic cars. The means for the two groups differ by approximately 7.2 miles per gallon.
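As a cross-check on the test above, the Welch statistic can be rebuilt from the group means and variances. This is a minimal sketch rather than part of the original analysis; the names `cars`, `auto`, `manual`, `va` and `vm` are illustrative, and it works on a fresh copy of the data in which `am` is still numeric.

```r
# Sketch: reproduce the Welch two-sample t-test by hand.
cars   <- datasets::mtcars            # fresh copy; am is numeric 0/1 here
auto   <- cars$mpg[cars$am == 0]      # automatic transmission
manual <- cars$mpg[cars$am == 1]      # manual transmission

# Difference in group means (manual minus automatic)
mean_diff <- mean(manual) - mean(auto)

# Per-group variance of the mean, Welch standard error,
# and Welch-Satterthwaite degrees of freedom
va <- var(auto) / length(auto)
vm <- var(manual) / length(manual)
se <- sqrt(va + vm)
df <- (va + vm)^2 / (va^2 / (length(auto) - 1) + vm^2 / (length(manual) - 1))

# Two-sided p-value from the t distribution
t_stat  <- mean_diff / se
p_value <- 2 * pt(-abs(t_stat), df)

print(c(mean_diff = mean_diff, t = t_stat, df = df, p = p_value))
```

The sign of the statistic depends only on which group is subtracted from which; the two-sided p-value is unaffected.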

Regression Analysis

Full Model

The full model regresses MPG on all other variables. Its Residual Standard Error is 2.833 on 15 degrees of freedom, and its Adjusted R-Squared value is 0.779, meaning that the model explains about 77.9% of the variance in MPG after adjusting for the number of predictors. However, none of the coefficients is significant at the 5% significance level.

fullModel <- lm(mpg ~ ., data=mtcars)
summary(fullModel)
## 
## Call:
## lm(formula = mpg ~ ., data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5087 -1.3584 -0.0948  0.7745  4.6251 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 23.87913   20.06582   1.190   0.2525  
## cyl6        -2.64870    3.04089  -0.871   0.3975  
## cyl8        -0.33616    7.15954  -0.047   0.9632  
## disp         0.03555    0.03190   1.114   0.2827  
## hp          -0.07051    0.03943  -1.788   0.0939 .
## drat         1.18283    2.48348   0.476   0.6407  
## wt          -4.52978    2.53875  -1.784   0.0946 .
## qsec         0.36784    0.93540   0.393   0.6997  
## vs1          1.93085    2.87126   0.672   0.5115  
## am1          1.21212    3.21355   0.377   0.7113  
## gear4        1.11435    3.79952   0.293   0.7733  
## gear5        2.52840    3.73636   0.677   0.5089  
## carb2       -0.97935    2.31797  -0.423   0.6787  
## carb3        2.99964    4.29355   0.699   0.4955  
## carb4        1.09142    4.44962   0.245   0.8096  
## carb6        4.47757    6.38406   0.701   0.4938  
## carb8        7.25041    8.36057   0.867   0.3995  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.833 on 15 degrees of freedom
## Multiple R-squared:  0.8931, Adjusted R-squared:  0.779 
## F-statistic:  7.83 on 16 and 15 DF,  p-value: 0.000124

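Because the full model spreads the explained variance over many predictors, none of which is individually significant, backward stepwise selection is used next to find a smaller model. Setting k = log(nrow(mtcars)) makes step() apply the BIC penalty, although the output still labels the criterion "AIC".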

stepModel <- step(fullModel, k=log(nrow(mtcars)))
## Start:  AIC=101.32
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
##        Df Sum of Sq    RSS     AIC
## - carb  5   13.5989 134.00  87.417
## - gear  2    3.9729 124.38  95.428
## - cyl   2   10.9314 131.33  97.170
## - am    1    1.1420 121.55  98.157
## - qsec  1    1.2413 121.64  98.183
## - drat  1    1.8208 122.22  98.335
## - vs    1    3.6299 124.03  98.806
## - disp  1    9.9672 130.37 100.400
## <none>              120.40 101.321
## - wt    1   25.5541 145.96 104.014
## - hp    1   25.6715 146.07 104.040
## 
## Step:  AIC=87.42
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear
## 
##        Df Sum of Sq    RSS    AIC
## - gear  2    5.0215 139.02 81.662
## - cyl   2   12.5642 146.57 83.353
## - disp  1    0.9934 135.00 84.187
## - drat  1    1.1854 135.19 84.233
## - vs    1    3.6763 137.68 84.817
## - qsec  1    5.2634 139.26 85.184
## - am    1   11.9255 145.93 86.679
## <none>              134.00 87.417
## - wt    1   19.7963 153.80 88.360
## - hp    1   22.7935 156.79 88.978
## 
## Step:  AIC=81.66
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am
## 
##        Df Sum of Sq    RSS    AIC
## - cyl   2   10.4247 149.45 77.045
## - drat  1    0.9672 139.99 78.418
## - disp  1    1.5483 140.57 78.551
## - vs    1    2.1829 141.21 78.695
## - qsec  1    3.6324 142.66 79.022
## <none>              139.02 81.662
## - am    1   16.5665 155.59 81.799
## - hp    1   18.1768 157.20 82.129
## - wt    1   31.1896 170.21 84.674
## 
## Step:  AIC=77.04
## mpg ~ disp + hp + drat + wt + qsec + vs + am
## 
##        Df Sum of Sq    RSS    AIC
## - vs    1     0.645 150.09 73.717
## - drat  1     2.869 152.32 74.187
## - disp  1     9.111 158.56 75.473
## - qsec  1    12.573 162.02 76.164
## - hp    1    13.929 163.38 76.431
## <none>              149.45 77.045
## - am    1    20.457 169.91 77.684
## - wt    1    60.936 210.38 84.523
## 
## Step:  AIC=73.72
## mpg ~ disp + hp + drat + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - drat  1     3.345 153.44 70.956
## - disp  1     8.545 158.64 72.023
## - hp    1    13.285 163.38 72.965
## <none>              150.09 73.717
## - am    1    20.036 170.13 74.261
## - qsec  1    25.574 175.67 75.286
## - wt    1    67.572 217.66 82.146
## 
## Step:  AIC=70.96
## mpg ~ disp + hp + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - disp  1     6.629 160.07 68.844
## - hp    1    12.572 166.01 70.011
## <none>              153.44 70.956
## - qsec  1    26.470 179.91 72.583
## - am    1    32.198 185.63 73.586
## - wt    1    69.043 222.48 79.380
## 
## Step:  AIC=68.84
## mpg ~ hp + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - hp    1     9.219 169.29 67.170
## <none>              160.07 68.844
## - qsec  1    20.225 180.29 69.186
## - am    1    25.993 186.06 70.193
## - wt    1    78.494 238.56 78.147
## 
## Step:  AIC=67.17
## mpg ~ wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## <none>              169.29 67.170
## - am    1    26.178 195.46 68.306
## - qsec  1   109.034 278.32 79.614
## - wt    1   183.347 352.63 87.187
summary(stepModel)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am1           2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

The selected model is “mpg ~ wt + qsec + am”, meaning that the variables retained are the weight (in units of 1000 lbs), the quarter-mile time and the transmission type (manual or automatic). The Residual Standard Error is 2.459 on 28 degrees of freedom. The Adjusted R-Squared value is 0.8336. The p-value of the overall F-test is 1.21e-11. All three predictor coefficients are significant at the 5% level; only the intercept is not.
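The criterion reported in the last step of the trace can be recomputed directly. The sketch below assumes only base R: `extractAIC()` is the function `step()` uses to score each candidate, and `k = log(n)` gives the BIC penalty. The names `cars`, `fit` and `smaller` are illustrative.

```r
# Sketch: recompute the stepwise criterion for the selected model.
cars <- datasets::mtcars
n    <- nrow(cars)
fit  <- lm(mpg ~ wt + qsec + am, data = cars)

# Second element of extractAIC() is the criterion; k sets the penalty per parameter
crit_bic <- extractAIC(fit, k = log(n))[2]   # BIC-style penalty, as used in step()
crit_aic <- extractAIC(fit, k = 2)[2]        # ordinary AIC penalty, for contrast
print(round(c(k_log_n = crit_bic, k_2 = crit_aic), 2))

# extractAIC() and BIC() differ by a constant that depends only on n,
# so differences between models fitted to the same data agree on both scales
smaller <- lm(mpg ~ wt + qsec, data = cars)
d_bic     <- BIC(smaller) - BIC(fit)
d_extract <- extractAIC(smaller, k = log(n))[2] - crit_bic
print(c(BIC_difference = d_bic, extractAIC_difference = d_extract))
```

A positive difference means that dropping `am` makes the criterion worse, which is why the stepwise search stops at this model.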

amIntWtModel <- lm(mpg ~ wt + qsec + am + wt:am, data=mtcars)
summary(amIntWtModel)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am + wt:am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5076 -1.3801 -0.5588  1.0630  4.3684 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.723      5.899   1.648 0.110893    
## wt            -2.937      0.666  -4.409 0.000149 ***
## qsec           1.017      0.252   4.035 0.000403 ***
## am1           14.079      3.435   4.099 0.000341 ***
## wt:am1        -4.141      1.197  -3.460 0.001809 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.084 on 27 degrees of freedom
## Multiple R-squared:  0.8959, Adjusted R-squared:  0.8804 
## F-statistic: 58.06 on 4 and 27 DF,  p-value: 7.168e-13

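This model adds an interaction between weight and transmission type (wt:am) to the stepwise model, so that the effect of weight on MPG is allowed to differ between automatic and manual cars. It has a Residual Standard Error of 2.084 on 27 degrees of freedom. The Adjusted R-squared value is 0.8804, meaning that 88% of the variance of the MPG variable can be explained.

All coefficients other than the intercept are significant at the 5% level.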

amModel <- lm(mpg ~ am, data = mtcars)
summary(amModel)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am1            7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

The average fuel economy of a car with automatic transmission is 17.147 MPG (the intercept). Cars with manual transmission have an average of 24.392 MPG, which represents an improvement of 7.245 MPG (the am1 coefficient).

This model has a Residual Standard Error of 4.902 with 30 degrees of freedom. The Adjusted R-Squared value is 0.3385, which means that 34% of the variance of the MPG variable can be explained. The low Adjusted R-Squared value suggests that other variables need to be included in this model.
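With a single two-level factor as predictor, the regression coefficients are just the group means in another form. The following sketch illustrates this; the factor labels "automatic" and "manual" and the names `cars`, `group_means` and `b` are introduced here for readability and are not part of the original code.

```r
# Sketch: a regression on one binary factor reproduces the group means.
cars    <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

group_means <- tapply(cars$mpg, cars$am, mean)   # mean MPG per transmission type
fit <- lm(mpg ~ am, data = cars)
b   <- coef(fit)

print(group_means)
print(b)

# Intercept = mean of the reference group (automatic);
# slope     = mean(manual) - mean(automatic)
print(unname(b["ammanual"]) - unname(diff(group_means)))   # numerically zero
```

This is also why the slope matches the difference between the two group means reported by the t-test.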

anova(amModel, stepModel, fullModel, amIntWtModel)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ wt + qsec + am
## Model 3: mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## Model 4: mpg ~ wt + qsec + am + wt:am
##   Res.Df    RSS  Df Sum of Sq       F    Pr(>F)    
## 1     30 720.90                                    
## 2     28 169.29   2    551.61 34.3604 2.509e-06 ***
## 3     15 120.40  13     48.88  0.4685    0.9114    
## 4     27 117.28 -12      3.13                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(amIntWtModel)
##                  2.5 %    97.5 %
## (Intercept) -2.3807791 21.826884
## wt          -4.3031019 -1.569960
## qsec         0.4998811  1.534066
## am1          7.0308746 21.127981
## wt:am1      -6.5970316 -1.685721

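The model with the highest Adjusted R-Squared value (0.8804) is selected: “mpg ~ wt + qsec + am + wt:am”. Note that the last row of the ANOVA table has a negative Df and no F-test, because Model 4 is not an extension of Model 3; the confidence intervals show that every coefficient of the selected model except the intercept has a 95% interval that excludes zero.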
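One way to make the comparison explicit is to tabulate the Adjusted R-Squared of all four models and to run the F-tests only over a properly nested sequence. The sketch below is an illustration under that assumption, not the original analysis; the list `models` and its element names are hypothetical.

```r
# Sketch: compare the candidate models on adjusted R-squared,
# then test a nested sequence with anova().
cars <- datasets::mtcars
for (v in c("cyl", "vs", "am", "gear", "carb")) cars[[v]] <- factor(cars[[v]])

models <- list(
  am_only     = lm(mpg ~ am, data = cars),
  stepwise    = lm(mpg ~ wt + qsec + am, data = cars),
  full        = lm(mpg ~ ., data = cars),
  interaction = lm(mpg ~ wt + qsec + am + wt:am, data = cars)
)

# Adjusted R-squared for each model
adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
print(round(adj_r2, 4))

# Each model in this sequence extends the previous one, so every row gets an F-test
nested <- anova(models$am_only, models$stepwise, models$interaction)
print(nested)
```

The full model is left out of the `anova()` call because neither it nor the interaction model contains the other.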

round(coef(amIntWtModel), 3)
## (Intercept)          wt        qsec         am1      wt:am1 
##       9.723      -2.937       1.017      14.079      -4.141

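Because the model contains the wt:am interaction, the am1 coefficient alone is not the manual-versus-automatic difference. Holding the quarter-mile time “qsec” constant, a manual car is expected to achieve 14.079 - 4.141 × wt more miles per gallon than an automatic car of the same weight, with wt in units of 1000 lbs. The difference is positive for cars lighter than about 3.4 (roughly 3,400 lbs) and negative for heavier cars, which is the basis for the conclusion stated in the abstract.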
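The weight-dependent transmission effect can be evaluated directly from the fitted coefficients. This is a small sketch; the helper `manual_effect()` and the example weights are hypothetical and chosen only for illustration.

```r
# Sketch: manual-minus-automatic MPG difference as a function of weight.
cars    <- datasets::mtcars
cars$am <- factor(cars$am)
fit <- lm(mpg ~ wt + qsec + am + wt:am, data = cars)
b   <- coef(fit)

# Expected MPG gain of a manual over an automatic car at weight wt (1000 lbs),
# with qsec held fixed
manual_effect <- function(wt) b[["am1"]] + b[["wt:am1"]] * wt

weights <- c(2.0, 3.0, 4.0)
print(round(setNames(manual_effect(weights), weights), 2))

# Weight at which the two transmission types are expected to perform equally
crossover <- -b[["am1"]] / b[["wt:am1"]]
print(round(crossover, 2))
```

Below the crossover weight the effect is positive (manual better); above it the effect is negative (automatic better).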

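As a final check, dfbetas is used to count the observations whose removal would shift any coefficient by more than one standard error. The count is zero, so no single car dominates the fit.

sum((abs(dfbetas(amIntWtModel)))>1)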
## [1] 0

Appendix: Graphs of fuel economy under various test conditions

Boxplot of MPG versus Transmission (Manual / Automatic)

boxplot(mpg ~ am, data = mtcars, xlab = "Transmission (0 = Automatic, 1 = Manual)", ylab = "MPG", main = "Boxplot of MPG versus Transmission")

Fuel Efficiency graphs

pairs(mtcars, main = "Pair graph of Motor Trend fuel efficiency road tests", gap = 1/4)

Scatter plot of MPG versus Weight by Transmission

ggplot(mtcars, aes(x = wt, y = mpg, color = am)) +
  geom_point() +
  scale_color_discrete(labels = c("Automatic Transmission", "Manual Transmission")) +
  xlab("Weight (1000 lbs)") +
  ggtitle("Scatter Plot of MPG versus Weight and Transmission")

Plot of fuel economy versus displacement by number of cylinders

This next plot shows MPG against engine displacement in separate panels for cars with 4, 6 and 8 cylinders respectively

coplot(mpg ~ disp | as.factor(cyl), data = mtcars, panel = panel.smooth, rows = 1)

Residual diagnostic plots for the selected model

par(mfrow = c(1,1))
plot(amIntWtModel)