GitHub link: https://github.com/arubhardwaj/RegressionInR
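Motor Trend magazine is interested in the relationship between a set of variables and miles per gallon (MPG). On that basis, it wants to answer two questions: whether an automatic or a manual transmission is better for MPG, and how large the MPG difference between the two transmission types is.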
After performing the whole analysis, we find that the intercept of the multiple regression is 9.618 and the coefficients of wt, qsec and am are -3.9165, 1.226 and 2.936 respectively. On this basis, a one-unit increase in wt (1000 lbs) is associated with a decrease of 3.9165 in mpg, holding qsec and am constant. The other coefficients are read in the same way for their own variables.
Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
Here, we load the libraries required for the analysis and load the data.
library(datasets)
library(ggplot2)
data(mtcars)
Now we will run the regressions on which our analysis is based. First, we run a simple regression analysis of mpg on am.
model1 <- lm(mpg ~ am, mtcars)
summary(model1)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.3923 -3.0923 -0.2974  3.2439  9.5077
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
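The R-squared values and the p-value printed above can also be read directly from the summary object, which helps to avoid mixing up the multiple and the adjusted R-squared. The following is a minimal sketch; the names s1, r2, adj_r2 and p_am are illustrative and not part of the original analysis.

```r
# Refit the simple model so that this snippet runs on its own.
data(mtcars)
model1 <- lm(mpg ~ am, data = mtcars)
s1 <- summary(model1)

r2     <- s1$r.squared                        # multiple R-squared
adj_r2 <- s1$adj.r.squared                    # adjusted R-squared
p_am   <- s1$coefficients["am", "Pr(>|t|)"]   # p-value of the am coefficient

# Print the values rounded in the same way as the summary output.
round(c(r.squared = r2, adj.r.squared = adj_r2), 4)
signif(p_am, 3)
```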
Note that the p-value in this analysis is 0.000285, the multiple R-squared is 0.3598 and the adjusted R-squared is 0.3385. Next, we fit all the variables of mtcars in model2.
model2 <- lm(mpg ~ ., mtcars)
summary(model2)
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4506 -1.6044 -0.1196  1.2193  4.6271
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337   18.71788   0.657   0.5181
## cyl         -0.11144    1.04502  -0.107   0.9161
## disp         0.01334    0.01786   0.747   0.4635
## hp          -0.02148    0.02177  -0.987   0.3350
## drat         0.78711    1.63537   0.481   0.6353
## wt          -3.71530    1.89441  -1.961   0.0633 .
## qsec         0.82104    0.73084   1.123   0.2739
## vs           0.31776    2.10451   0.151   0.8814
## am           2.52023    2.05665   1.225   0.2340
## gear         0.65541    1.49326   0.439   0.6652
## carb        -0.19942    0.82875  -0.241   0.8122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
## F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
For model3 we use the step function, which starts from model2 with all variables and keeps only the subset chosen by stepwise selection on AIC.
model3 <- step(model2, direction = "both", trace = FALSE)
summary(model3)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4811 -1.5555 -0.7257  1.4110  4.6610
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.6178     6.9596   1.382 0.177915
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
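As a rough check of what step() did, the sketch below refits the full model, repeats the stepwise search and compares the two fits by AIC. The names full and reduced are illustrative. Note that step() ranks candidates with extractAIC(), whose values differ from those of AIC() by an additive constant for linear models fitted to the same data, so the ordering of the two models is the same.

```r
data(mtcars)

# Full model with every other column of mtcars as a predictor.
full <- lm(mpg ~ ., data = mtcars)

# Stepwise search in both directions, starting from the full model.
reduced <- step(full, direction = "both", trace = FALSE)

# Predictors that survive the search.
kept <- attr(terms(reduced), "term.labels")
kept

# Lower AIC indicates the preferred model under this criterion.
c(full = AIC(full), reduced = AIC(reduced))
```

A lower AIC for the reduced model is what one would expect here, since the search only accepts steps that improve the criterion.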
The R-squared here (0.8497) is almost the same as in model2 (0.869), while the adjusted R-squared is higher (0.8336 against 0.8066). The residual standard error is 2.459 on 28 degrees of freedom. model4 is the final model, which gives us the answers for the whole analysis.
model4 <- lm(mpg ~ wt + qsec + am, data = mtcars)
summary(model4)$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## am           2.935837  1.4109045  2.080819 4.671551e-02
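Since model4 is written with the same formula that step() selected for model3, the two fits should agree. The sketch below checks this and collects the fit statistics of the full and the final model side by side. The helper fit_stats and the objects same_fit and comparison are hypothetical names introduced only for this illustration.

```r
data(mtcars)
model2 <- lm(mpg ~ ., data = mtcars)
model3 <- step(model2, direction = "both", trace = FALSE)
model4 <- lm(mpg ~ wt + qsec + am, data = mtcars)

# TRUE when the stepwise model and the hand-written model have the same coefficients.
same_fit <- isTRUE(all.equal(coef(model3), coef(model4)))
same_fit

# Collect the statistics that the text compares across models.
fit_stats <- function(m) {
  s <- summary(m)
  c(r.squared     = s$r.squared,
    adj.r.squared = s$adj.r.squared,
    sigma         = s$sigma,          # residual standard error
    df.residual   = m$df.residual)    # residual degrees of freedom
}

comparison <- rbind(model2 = fit_stats(model2), model4 = fit_stats(model4))
round(comparison, 4)
```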
The final model shows that mpg depends on wt, qsec and am. The intercept is 9.618, which means that if all the independent variables were equal to 0, the predicted mpg would be 9.618. No car in the data has zero weight or a zero quarter-mile time, so this value is an extrapolation rather than a realistic mileage.
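The reading of the intercept can be checked with predict(): a prediction for a row in which every predictor is 0 should return the intercept itself. This is only a sketch; zero_car and pred0 are illustrative names, and the row does not describe any real car.

```r
data(mtcars)
model4 <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Hypothetical row with all predictors set to 0.
zero_car <- data.frame(wt = 0, qsec = 0, am = 0)
pred0 <- unname(predict(model4, newdata = zero_car))

# The prediction at zero coincides with the fitted intercept.
intercept <- unname(coef(model4)["(Intercept)"])
all.equal(pred0, intercept)

# Observed ranges, to see how far zero lies outside the data.
range(mtcars$wt)
range(mtcars$qsec)
```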
Figure 1 (see Appendix) shows that a manual transmission is better for MPG, which answers our first question. In the simple regression, cars with a manual transmission do better than cars with an automatic transmission by 7.245 mpg. Once wt and qsec are taken into account, the advantage of the manual transmission falls to 2.936 mpg.
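The two estimates of the transmission effect can be put next to each other as follows. This is a minimal sketch that assumes the coding documented for mtcars (am = 0 for automatic, am = 1 for manual); the object names are illustrative, and the 95% confidence interval is an addition that the original analysis does not report.

```r
data(mtcars)
model1 <- lm(mpg ~ am, data = mtcars)
model4 <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Raw difference in mean mpg between manual (1) and automatic (0) cars.
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
raw_diff <- unname(group_means["1"] - group_means["0"])

# The am coefficient of the simple regression equals that raw difference.
unadjusted <- unname(coef(model1)["am"])

# The am coefficient after adjusting for weight and quarter-mile time.
adjusted <- unname(coef(model4)["am"])

round(c(raw = raw_diff, unadjusted = unadjusted, adjusted = adjusted), 3)

# 95% confidence interval for the adjusted difference.
ci_adj <- confint(model4, "am", level = 0.95)
ci_adj
```

The adjusted estimate is much smaller than the raw one because part of the raw gap reflects differences in weight and acceleration between the two groups of cars rather than the transmission itself.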
This completes our discussion on regression analysis for Motor Trend.
Appendix: some supporting figures are drawn below. Figure 1 is the boxplot of MPG by transmission type.
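# Figure 1: mpg by transmission type (am: 0 = automatic, 1 = manual)
boxplot(mpg ~ am, data = mtcars, main = "Miles Per Gallon Vs. Transmission Type",
xlab = "Transmission Type", ylab = "Miles Per Gallon", col = 'grey',
names = c("Automatic", "Manual"))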
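# Diagnostic plots (2 x 2 grid) for the full model
par(mfrow = c(2, 2))
plot(model2)
# Diagnostic plots for the stepwise-selected model
par(mfrow = c(2, 2))
plot(model3)
# Diagnostic plots for the final model (same formula as model3)
par(mfrow = c(2, 2))
plot(model4)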