Motor Trend, a magazine about the automobile industry, is looking at a data set of cars and is interested in exploring the relationship between a set of variables and miles per gallon (MPG). In particular, it wants to answer the following two questions:
Is an automatic or manual transmission better for MPG?
Quantify the MPG difference between automatic and manual transmissions
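An exploratory data analysis (EDA) looks at how mpg varies with transmission type. The box plot (Plot 1) shows that manual cars tend to have higher mpg than automatic cars, so the initial results suggest that manual transmission is associated with better mpg. Whether this difference is significant, and whether it is due to transmission type alone, is tested with the regression models that follow.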
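The box plot can be backed up with per-group numbers. A minimal sketch in R, assuming a fresh copy of the built-in `mtcars` data in which `am` is still coded 0 = automatic, 1 = manual; `dat`, `by_am` and `mean_diff` are illustrative names, not part of the original analysis:

```r
# Fresh copy of the built-in data; am is still numeric (0 = automatic, 1 = manual)
dat <- datasets::mtcars

# Five-number summary plus mean of mpg for each transmission type,
# i.e. the numbers behind the box plot
by_am <- tapply(dat$mpg, dat$am, summary)
print(by_am)

# Difference in mean mpg between manual and automatic cars
mean_diff <- mean(dat$mpg[dat$am == 1]) - mean(dat$mpg[dat$am == 0])
print(mean_diff)
```

The mean difference equals the `am` coefficient of the simple linear model, because regressing mpg on a single 0/1 indicator just compares the two group means.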
We first fit a simple linear model with transmission type (am) as the predictor and miles per gallon (mpg) as the response.
fit <- lm(mpg ~ am, data = mtcars)
summary(fit)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## am 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
The simple linear model shows that cars with a manual transmission (am = 1) get on average 7.245 more miles per gallon than cars with an automatic transmission (am = 0), and this difference is highly significant (p = 0.000285). However, the R-squared of 0.36 means that transmission type alone explains only about 36% of the variation in mpg. We therefore need to take into account other variables that may also affect mpg.
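The R-squared quoted above can be reproduced from its definition, 1 - RSS/TSS, and for a model with a single predictor it also equals the squared correlation between that predictor and the response. A minimal sketch, assuming a fresh copy of `mtcars` with numeric `am`; `dat`, `r2_manual` and `r2_cor` are illustrative names:

```r
dat <- datasets::mtcars
fit <- lm(mpg ~ am, data = dat)

# R-squared from its definition: 1 - residual SS / total SS
rss <- sum(residuals(fit)^2)
tss <- sum((dat$mpg - mean(dat$mpg))^2)
r2_manual <- 1 - rss / tss

# With one predictor (plus intercept), R-squared is the squared Pearson correlation
r2_cor <- cor(dat$mpg, dat$am)^2

print(c(by_definition = r2_manual,
        from_lm       = summary(fit)$r.squared,
        squared_cor   = r2_cor))
```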
A multiple regression model can include additional variables that affect mpg. To identify candidate variables, we examine the pairs plot (Plot 2), which shows that cyl, disp, hp and wt have the strongest correlations with mpg. We build a new model, fit_new, with these variables and compare it with the simple model fit using an F-test (anova).
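The visual reading of the pairs plot can be checked numerically by ranking each variable's correlation with mpg. A minimal sketch, assuming a fresh copy of `mtcars` in which every column is numeric; `cor_mpg`, `ranked` and `top4` are illustrative names:

```r
dat <- datasets::mtcars

# Pearson correlation of every other variable with mpg
cor_mpg <- cor(dat)[, "mpg"]
cor_mpg <- cor_mpg[names(cor_mpg) != "mpg"]

# Rank by absolute strength; the sign shows the direction of the relationship
ranked <- cor_mpg[order(abs(cor_mpg), decreasing = TRUE)]
print(round(ranked, 3))

top4 <- names(ranked)[1:4]
print(top4)
```

Marginal correlations ignore the strong correlations among these predictors themselves (for example, cyl and disp), which is likely part of the reason their individual coefficients in the combined model have large standard errors.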
fit_new <- lm(mpg ~ am + cyl + disp + hp + wt, data = mtcars)
anova(fit, fit_new)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + cyl + disp + hp + wt
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.90
## 2 26 163.12 4 557.78 22.226 4.507e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F-test p-value of 4.507e-08 shows that adding cyl, disp, hp and wt significantly reduces the residual sum of squares (from 720.90 to 163.12), so fit_new fits the data significantly better than the simple model fit.
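The F-statistic in the ANOVA table is the partial F-test for nested models: the drop in residual sum of squares per added parameter, divided by the residual mean square of the larger model. A minimal sketch that recomputes it, assuming a fresh copy of `mtcars`; `rss_small`, `rss_big`, `f_stat` and `p_val` are illustrative names:

```r
dat <- datasets::mtcars
fit     <- lm(mpg ~ am, data = dat)
fit_new <- lm(mpg ~ am + cyl + disp + hp + wt, data = dat)

# deviance() of an lm object is its residual sum of squares
rss_small <- deviance(fit)        # 720.90 on 30 df
rss_big   <- deviance(fit_new)    # 163.12 on 26 df
df_small  <- df.residual(fit)
df_big    <- df.residual(fit_new)

# Partial F-statistic for the four added predictors
f_stat <- ((rss_small - rss_big) / (df_small - df_big)) / (rss_big / df_big)
p_val  <- pf(f_stat, df_small - df_big, df_big, lower.tail = FALSE)
print(c(F = f_stat, p = p_val))
```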
summary(fit_new)
##
## Call:
## lm(formula = mpg ~ am + cyl + disp + hp + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5952 -1.5864 -0.7157 1.2821 5.5725
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.20280 3.66910 10.412 9.08e-11 ***
## am 1.55649 1.44054 1.080 0.28984
## cyl -1.10638 0.67636 -1.636 0.11393
## disp 0.01226 0.01171 1.047 0.30472
## hp -0.02796 0.01392 -2.008 0.05510 .
## wt -3.30262 1.13364 -2.913 0.00726 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.505 on 26 degrees of freedom
## Multiple R-squared: 0.8551, Adjusted R-squared: 0.8273
## F-statistic: 30.7 on 5 and 26 DF, p-value: 4.029e-10
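The fit_new model explains 85.5% of the variance in mpg (adjusted R-squared 0.827), compared with 36% for fit. Once cyl, disp, hp and wt are included, the estimated am coefficient falls from 7.245 to 1.556, which suggests that much of the raw mpg gap between transmission types is accounted for by these variables. Holding them constant, manual transmission is associated with an increase of 1.556 miles per gallon over automatic transmission, although this estimate is not statistically significant (p = 0.29).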
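The coefficient table gives a p-value of 0.28984 for am in fit_new; a 95% confidence interval makes the uncertainty around the 1.556 mpg estimate concrete. A minimal sketch, assuming a fresh copy of `mtcars` with numeric `am`; `ci` is an illustrative name:

```r
dat <- datasets::mtcars
fit     <- lm(mpg ~ am, data = dat)
fit_new <- lm(mpg ~ am + cyl + disp + hp + wt, data = dat)

# am coefficient before and after adjusting for cyl, disp, hp and wt
print(c(unadjusted = coef(fit)[["am"]], adjusted = coef(fit_new)[["am"]]))

# 95% confidence interval for the adjusted transmission effect
ci <- confint(fit_new, "am", level = 0.95)
print(ci)
```

The interval spans zero, so the data are consistent with anything from a small disadvantage to an advantage of several mpg for manual cars once the other four variables are held fixed.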
Plot 3 tells us the following:
The Residuals vs Fitted plot indicates that the residuals are homoscedastic
The Normal Q-Q plot shows that the points fall mostly on the line, which indicates that the residuals are approximately normally distributed
The Scale-Location plot also shows the points randomly distributed, which supports the constant variance assumption
The Residuals vs Leverage plot shows that all points fall well within the Cook's distance 0.5 contours, which indicates that no single car is highly influential
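Two of these visual checks have simple numeric companions in base R: Cook's distance for influence, and the Shapiro-Wilk test for normality of the residuals. A minimal sketch, assuming a fresh copy of `mtcars`; `cd` and `sw` are illustrative names, and with only 32 observations the Shapiro-Wilk test has limited power, so a non-significant result is weak evidence of normality:

```r
dat <- datasets::mtcars
fit_new <- lm(mpg ~ am + cyl + disp + hp + wt, data = dat)

# Cook's distance for each car; the dashed contours in the Residuals vs
# Leverage panel of plot(fit_new) are drawn at Cook's distance 0.5 and 1
cd <- cooks.distance(fit_new)
print(head(sort(cd, decreasing = TRUE), 3))   # the three most influential cars
print(names(cd)[cd > 0.5])                    # cars beyond the 0.5 contour, if any

# Shapiro-Wilk test as a numeric companion to the Normal Q-Q plot
sw <- shapiro.test(residuals(fit_new))
print(sw$p.value)
```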
This study shows that mpg tends to be higher with manual transmission than with automatic transmission. However, the size of the difference depends on which other variables are taken into account: number of cylinders (cyl), displacement (disp), horsepower (hp) and weight (wt) all influence how much mpg improves with a manual transmission. Without these variables the estimated advantage is 7.245 mpg; with cyl, disp, hp and wt included in the regression model it drops to 1.556 mpg, and this adjusted estimate is not statistically significant (p = 0.29), so it should be interpreted with caution.
data(mtcars)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
[, 1]  mpg   Miles/(US) gallon
[, 2]  cyl   Number of cylinders
[, 3]  disp  Displacement (cu.in.)
[, 4]  hp    Gross horsepower
[, 5]  drat  Rear axle ratio
[, 6]  wt    Weight (1000 lbs)
[, 7]  qsec  1/4 mile time
[, 8]  vs    V/S
[, 9]  am    Transmission (0 = automatic, 1 = manual)
[,10]  gear  Number of forward gears
[,11]  carb  Number of carburetors
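# Plot 1: mpg by transmission type (am recoded as a factor: 0 = automatic, 1 = manual)
mtcars$am <- as.factor(mtcars$am)
boxplot(mpg ~ am, data = mtcars, xlab = "transmission type (0 = automatic, 1 = manual)", ylab = "miles per gallon")
# Plot 2: pairwise scatterplots with smoothed trend lines
pairs(mtcars, panel = panel.smooth)
# Plot 3: diagnostic plots for fit_new in a 2 x 2 grid
par(mfrow = c(2,2))
plot(fit_new)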