knitr::opts_chunk$set(echo = TRUE)
This is the final assignment in the Coursera Regression Models course. The aim is to analyze the mtcars data and draw inferences from it. The data comes from the 1974 Motor Trend magazine and covers fuel consumption together with 10 other variables for 32 car models.
The analysis asks how automatic (am = 0) and manual (am = 1) cars compare. In summary, among lighter cars, those with manual transmission are more fuel efficient; among heavier cars, those with automatic transmission are more fuel efficient.
library(ggplot2)
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
# library(dplyr)
# library(explore)
dim(mtcars)
## [1] 32 11
# mtcars %>% explore_tbl()
# mtcars %>% describe()
# mtcars %>% explore_all() Taking these three lines out - it brings nothing to the analysis
data(mtcars)
knitr::kable(
mtcars[1:32,],
caption = "Figure 1: Complete list of all records in the dataset"
)
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
| Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
| Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
| Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |
| Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
| Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
| Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
| Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
| Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
| Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
| Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
| Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
| Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 |
| AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 |
| Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
| Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
| Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |
| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 |
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
| Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 |
| Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 |
| Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 |
| Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 |
This shows that there are 32 records, with 11 variables per record.
Initial observations show that manual transmission cars generally have better MPG.
The null hypothesis is that automatic and manual cars have the same mean MPG. It is tested with a two-sample t-test (Welch's, the default for t.test).
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- as.factor(mtcars$am)
mtcars$gear <- as.factor(mtcars$gear)
mtcars$carb <- as.factor(mtcars$carb)
attach(mtcars)
## The following object is masked from package:ggplot2:
##
## mpg
result <- t.test(mpg ~ am)
result$p.value
## [1] 0.001373638
result$estimate
## mean in group 0 mean in group 1
## 17.14737 24.39231
The p-value is 0.00137, so we can comfortably reject the null hypothesis: the mean MPG for manual cars differs from that for automatic cars. The group means differ by approximately 7.2 miles per gallon.
The first regression model uses all ten other variables as predictors of MPG. None of its coefficients is significant at the 5% level, although hp and wt are significant at the 10% level.
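As a minimal sketch, the same test can be run without relying on attach() by passing the data frame explicitly. The object names `mt` and `welch` and the factor labels are illustrative choices, not part of the original analysis.

```r
# Welch two-sample t-test of MPG by transmission, using an explicit data argument
mt <- datasets::mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("automatic", "manual"))

welch <- t.test(mpg ~ am, data = mt)  # var.equal = FALSE is the default (Welch)

# Difference in group means, manual minus automatic
mean_diff <- unname(welch$estimate[2] - welch$estimate[1])
round(mean_diff, 3)

# 95% confidence interval; t.test reports it for (automatic minus manual)
welch$conf.int
```

The confidence interval is reported for the first factor level minus the second, so its sign is the reverse of `mean_diff`.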
fullModel <- lm(mpg ~ ., data=mtcars)
summary(fullModel)
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5087 -1.3584 -0.0948 0.7745 4.6251
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.87913 20.06582 1.190 0.2525
## cyl6 -2.64870 3.04089 -0.871 0.3975
## cyl8 -0.33616 7.15954 -0.047 0.9632
## disp 0.03555 0.03190 1.114 0.2827
## hp -0.07051 0.03943 -1.788 0.0939 .
## drat 1.18283 2.48348 0.476 0.6407
## wt -4.52978 2.53875 -1.784 0.0946 .
## qsec 0.36784 0.93540 0.393 0.6997
## vs1 1.93085 2.87126 0.672 0.5115
## am1 1.21212 3.21355 0.377 0.7113
## gear4 1.11435 3.79952 0.293 0.7733
## gear5 2.52840 3.73636 0.677 0.5089
## carb2 -0.97935 2.31797 -0.423 0.6787
## carb3 2.99964 4.29355 0.699 0.4955
## carb4 1.09142 4.44962 0.245 0.8096
## carb6 4.47757 6.38406 0.701 0.4938
## carb8 7.25041 8.36057 0.867 0.3995
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.833 on 15 degrees of freedom
## Multiple R-squared: 0.8931, Adjusted R-squared: 0.779
## F-statistic: 7.83 on 16 and 15 DF, p-value: 0.000124
This model has a Residual Standard Error of 2.833 with 15 degrees of freedom. The Adjusted R-Squared value is 0.779, which means that roughly 77.9% of the variance of the MPG variable is explained after adjusting for the number of predictors.
stepModel <- step(fullModel, k=log(nrow(mtcars)))
## Start: AIC=101.32
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
##
## Df Sum of Sq RSS AIC
## - carb 5 13.5989 134.00 87.417
## - gear 2 3.9729 124.38 95.428
## - cyl 2 10.9314 131.33 97.170
## - am 1 1.1420 121.55 98.157
## - qsec 1 1.2413 121.64 98.183
## - drat 1 1.8208 122.22 98.335
## - vs 1 3.6299 124.03 98.806
## - disp 1 9.9672 130.37 100.400
## <none> 120.40 101.321
## - wt 1 25.5541 145.96 104.014
## - hp 1 25.6715 146.07 104.040
##
## Step: AIC=87.42
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear
##
## Df Sum of Sq RSS AIC
## - gear 2 5.0215 139.02 81.662
## - cyl 2 12.5642 146.57 83.353
## - disp 1 0.9934 135.00 84.187
## - drat 1 1.1854 135.19 84.233
## - vs 1 3.6763 137.68 84.817
## - qsec 1 5.2634 139.26 85.184
## - am 1 11.9255 145.93 86.679
## <none> 134.00 87.417
## - wt 1 19.7963 153.80 88.360
## - hp 1 22.7935 156.79 88.978
##
## Step: AIC=81.66
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am
##
## Df Sum of Sq RSS AIC
## - cyl 2 10.4247 149.45 77.045
## - drat 1 0.9672 139.99 78.418
## - disp 1 1.5483 140.57 78.551
## - vs 1 2.1829 141.21 78.695
## - qsec 1 3.6324 142.66 79.022
## <none> 139.02 81.662
## - am 1 16.5665 155.59 81.799
## - hp 1 18.1768 157.20 82.129
## - wt 1 31.1896 170.21 84.674
##
## Step: AIC=77.04
## mpg ~ disp + hp + drat + wt + qsec + vs + am
##
## Df Sum of Sq RSS AIC
## - vs 1 0.645 150.09 73.717
## - drat 1 2.869 152.32 74.187
## - disp 1 9.111 158.56 75.473
## - qsec 1 12.573 162.02 76.164
## - hp 1 13.929 163.38 76.431
## <none> 149.45 77.045
## - am 1 20.457 169.91 77.684
## - wt 1 60.936 210.38 84.523
##
## Step: AIC=73.72
## mpg ~ disp + hp + drat + wt + qsec + am
##
## Df Sum of Sq RSS AIC
## - drat 1 3.345 153.44 70.956
## - disp 1 8.545 158.64 72.023
## - hp 1 13.285 163.38 72.965
## <none> 150.09 73.717
## - am 1 20.036 170.13 74.261
## - qsec 1 25.574 175.67 75.286
## - wt 1 67.572 217.66 82.146
##
## Step: AIC=70.96
## mpg ~ disp + hp + wt + qsec + am
##
## Df Sum of Sq RSS AIC
## - disp 1 6.629 160.07 68.844
## - hp 1 12.572 166.01 70.011
## <none> 153.44 70.956
## - qsec 1 26.470 179.91 72.583
## - am 1 32.198 185.63 73.586
## - wt 1 69.043 222.48 79.380
##
## Step: AIC=68.84
## mpg ~ hp + wt + qsec + am
##
## Df Sum of Sq RSS AIC
## - hp 1 9.219 169.29 67.170
## <none> 160.07 68.844
## - qsec 1 20.225 180.29 69.186
## - am 1 25.993 186.06 70.193
## - wt 1 78.494 238.56 78.147
##
## Step: AIC=67.17
## mpg ~ wt + qsec + am
##
## Df Sum of Sq RSS AIC
## <none> 169.29 67.170
## - am 1 26.178 195.46 68.306
## - qsec 1 109.034 278.32 79.614
## - wt 1 183.347 352.63 87.187
summary(stepModel)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## am1 2.9358 1.4109 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
The selected model is “mpg ~ wt + qsec + am”, meaning that the predictors retained are the weight (in units of 1000 lbs), the quarter-mile time and the transmission type. The Residual Standard Error is 2.459 with 28 degrees of freedom. The Adjusted R-Squared value is 0.8336. The p-value of the overall F-test is 1.21e-11. All three predictor coefficients are significant at the 5% level; the intercept is not.
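Passing k = log(n) to step() applies the BIC penalty, even though the printed trace labels the criterion "AIC". The sketch below repeats the selection silently and checks the result against BIC(). The names `mt`, `full_fit` and `bic_fit` are illustrative.

```r
# Backward selection with the BIC penalty, without the step-by-step trace
mt <- datasets::mtcars
for (v in c("cyl", "vs", "am", "gear", "carb")) mt[[v]] <- factor(mt[[v]])

full_fit <- lm(mpg ~ ., data = mt)
bic_fit  <- step(full_fit, k = log(nrow(mt)), trace = 0)

# Predictors kept by the selection
attr(terms(bic_fit), "term.labels")

# BIC() and step() use different additive constants, so only the
# difference between two models is comparable with the trace above
bic_drop <- BIC(full_fit) - BIC(bic_fit)
bic_drop
```

The difference should match the drop between the first and last criterion values printed by step(), since the constant cancels.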
amIntWtModel <- lm(mpg ~ wt + qsec + am + wt:am, data=mtcars)
summary(amIntWtModel)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am + wt:am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5076 -1.3801 -0.5588 1.0630 4.3684
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.723 5.899 1.648 0.110893
## wt -2.937 0.666 -4.409 0.000149 ***
## qsec 1.017 0.252 4.035 0.000403 ***
## am1 14.079 3.435 4.099 0.000341 ***
## wt:am1 -4.141 1.197 -3.460 0.001809 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.084 on 27 degrees of freedom
## Multiple R-squared: 0.8959, Adjusted R-squared: 0.8804
## F-statistic: 58.06 on 4 and 27 DF, p-value: 7.168e-13
This model has a Residual Standard Error of 2.084 on 27 degrees of freedom. The Adjusted R-squared value is 0.8804, meaning that about 88% of the variance of the MPG variable can be explained.
All predictor coefficients, including the wt:am interaction, are significant at the 5% level; the intercept is not.
amModel <- lm(mpg ~ am, data = mtcars)
summary(amModel)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## am1 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
The average fuel mileage of a car with automatic transmission is 17.147 MPG. Cars with manual transmission have an average fuel mileage of 24.392 MPG, which represents an improvement of 7.245 MPG.
This model has a Residual Standard Error of 4.902 with 30 degrees of freedom. The Adjusted R-Squared value is 0.3385, which means that 34% of the variance of the MPG variable can be explained. The low Adjusted R-Squared value suggests that other variables need to be included in this model.
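With a single two-level factor as predictor, the regression simply reproduces the group means. A short sketch of that check, with illustrative names `mt`, `group_means` and `fit`:

```r
# The am-only regression recovers the two group means
mt <- datasets::mtcars
mt$am <- factor(mt$am)

group_means <- tapply(mt$mpg, mt$am, mean)  # mean MPG per transmission type
fit <- lm(mpg ~ am, data = mt)

group_means
coef(fit)  # intercept = automatic mean, am1 = manual mean minus automatic mean
```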
anova(amModel, stepModel, fullModel, amIntWtModel)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ wt + qsec + am
## Model 3: mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## Model 4: mpg ~ wt + qsec + am + wt:am
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.90
## 2 28 169.29 2 551.61 34.3604 2.509e-06 ***
## 3 15 120.40 13 48.88 0.4685 0.9114
## 4 27 117.28 -12 3.13
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(amIntWtModel)
## 2.5 % 97.5 %
## (Intercept) -2.3807791 21.826884
## wt -4.3031019 -1.569960
## qsec 0.4998811 1.534066
## am1 7.0308746 21.127981
## wt:am1 -6.5970316 -1.685721
The model with the highest Adjusted R-Squared value is selected: “mpg ~ wt + qsec + am + wt:am”
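The anova() table lists the four models in an order that is not nested, which is why its last row has a negative Df and no F test. As a hedged alternative, the sketch below collects the adjusted R-squared values directly and runs the F test only for the nested pair (stepwise model versus interaction model). The list names are illustrative.

```r
# Compare adjusted R-squared across the four models, then test the nested pair
mt <- datasets::mtcars
for (v in c("cyl", "vs", "am", "gear", "carb")) mt[[v]] <- factor(mt[[v]])

models <- list(
  am_only     = lm(mpg ~ am, data = mt),
  stepwise    = lm(mpg ~ wt + qsec + am, data = mt),
  full        = lm(mpg ~ ., data = mt),
  interaction = lm(mpg ~ wt + qsec + am + wt:am, data = mt)
)

adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
round(adj_r2, 4)

# F test for adding wt:am to the stepwise model (these two models are nested)
nested <- anova(models$stepwise, models$interaction)
nested
```

Because only one term is added, the F test p-value equals the p-value of the wt:am1 coefficient in the interaction model summary.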
summary(amIntWtModel$coef)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -4.141 -2.937 1.017 3.548 9.723 14.079
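Because the model includes a wt:am interaction, the am1 coefficient of 14.079 is the estimated difference between manual and automatic cars only at a weight of zero. With “qsec” held constant, the estimated difference at a given weight is 14.079 − 4.141 × wt miles per gallon (wt in 1000 lbs), so manual cars are expected to be more efficient below roughly 3,400 lbs and automatic cars above it.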
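To make the role of the interaction term concrete, the sketch below refits the model and evaluates the manual-versus-automatic difference at a few weights, along with the weight at which the difference is zero. The helper `manual_gain` and the other names are hypothetical, introduced only for illustration.

```r
# Transmission effect as a function of weight in the interaction model
mt <- datasets::mtcars
mt$am <- factor(mt$am)
fit <- lm(mpg ~ wt + qsec + am + wt:am, data = mt)
b <- coef(fit)

# Estimated MPG gain of manual over automatic at weight wt (1000 lbs), qsec fixed
manual_gain <- function(wt) unname(b["am1"] + b["wt:am1"] * wt)

manual_gain(c(2, 3, 4))

# Weight at which both transmissions are predicted to give the same MPG
crossover_wt <- unname(-b["am1"] / b["wt:am1"])
crossover_wt
```

A positive value means the manual car is predicted to be more efficient at that weight, a negative value the automatic.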
sum((abs(dfbetas(amIntWtModel)))>1)
## [1] 0
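The count of zero means that no single car shifts any coefficient by more than one standard error when it is left out. A small sketch, under the same cutoff of 1, that also shows which cars come closest; the names `infl` and `max_per_car` are illustrative.

```r
# Influence check: largest standardized coefficient change per car
mt <- datasets::mtcars
mt$am <- factor(mt$am)
fit <- lm(mpg ~ wt + qsec + am + wt:am, data = mt)

infl <- abs(dfbetas(fit))           # rows = cars, columns = coefficients
max_per_car <- apply(infl, 1, max)  # worst case over coefficients for each car

# Five cars with the largest influence on any single coefficient
head(sort(max_per_car, decreasing = TRUE), 5)

# Number of |DFBETAS| values above the cutoff of 1
sum(infl > 1)
```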
boxplot(mpg ~ am, xlab="Transmission (0 = Automatic, 1 = Manual)", ylab = "MPG", main = "Boxplot of MPG versus Transmission")
pairs(mtcars, main = "Pair graph of Motor Trend fuel efficiency road tests", gap = 1/4)
Scatter plot of MPG versus Weight by Transmission
ggplot(mtcars, aes(x = wt, y = mpg, color = am)) +
  geom_point() +
  scale_color_discrete(labels = c("Automatic Transmission", "Manual Transmission")) +
  xlab("Weight (1000 lbs)") +
  ggtitle("Scatter Plot of MPG versus Weight and Transmission")
This next plot shows MPG against displacement for cars with 4, 6 and 8 cylinders respectively.
coplot(mpg ~ disp | as.factor(cyl), data = mtcars, panel = panel.smooth, rows = 1)
par(mfrow = c(1,1))
plot(amIntWtModel)