Executive Summary

GitHub link: https://github.com/arubhardwaj/RegressionInR

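Motor Trend magazine is interested in the relationship between a set of variables and miles per gallon (mpg). On that basis, it wants answers to two questions: whether an automatic or a manual transmission is better for mpg, and how large the mpg difference between the two transmission types is.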

After performing the analysis, we find that the intercept of the multiple regression is 9.618 and the coefficients of wt, qsec and am are -3.9165, 1.226 and 2.936 respectively. That is, holding the other two variables constant, a one-unit increase in wt (1,000 lbs) is associated with a decrease of 3.9165 mpg. The qsec and am coefficients are read the same way for their own variables.

Variables Used

Data Source

Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.

Libraries and Data

Here, we load the libraries required for the analysis and then load the data.

library(datasets)
library(ggplot2)
data(mtcars)

Analysis

Now we run the regressions on which the analysis is based. We start with a simple regression of mpg on am.

model1 <- lm(mpg ~ am, mtcars)
summary(model1)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
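
Because am is coded 0 (automatic) and 1 (manual) in mtcars, the two coefficients of this simple regression can be read as group means: the intercept is the mean mpg of automatic cars, and the intercept plus the am coefficient is the mean mpg of manual cars. The following is a small sketch of that check; the names group_means and comparison are introduced here for illustration only.

```r
library(datasets)
data(mtcars)

# Mean mpg within each transmission group (am: 0 = automatic, 1 = manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

# Refit the simple regression from the text
model1 <- lm(mpg ~ am, data = mtcars)
b <- coef(model1)

# Intercept = mean of the am = 0 group;
# intercept + slope = mean of the am = 1 group
comparison <- data.frame(
  group      = c("automatic (am = 0)", "manual (am = 1)"),
  group_mean = as.numeric(group_means),
  from_model = c(b[["(Intercept)"]], b[["(Intercept)"]] + b[["am"]])
)
print(comparison)
```

The two numeric columns should agree, which is why the am coefficient can be read directly as the raw difference in average mpg between the two groups.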

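Note that the p-value in this analysis is 0.000285 and R-squared is 0.3598 (adjusted R-squared 0.3385), so transmission type alone explains only about 36% of the variation in mpg. Next, we regress mpg on all the other variables of mtcars in model2.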

model2 <- lm(mpg ~ ., mtcars)
summary(model2)
## 
## Call:
## lm(formula = mpg ~ ., data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4506 -1.6044 -0.1196  1.2193  4.6271 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 12.30337   18.71788   0.657   0.5181  
## cyl         -0.11144    1.04502  -0.107   0.9161  
## disp         0.01334    0.01786   0.747   0.4635  
## hp          -0.02148    0.02177  -0.987   0.3350  
## drat         0.78711    1.63537   0.481   0.6353  
## wt          -3.71530    1.89441  -1.961   0.0633 .
## qsec         0.82104    0.73084   1.123   0.2739  
## vs           0.31776    2.10451   0.151   0.8814  
## am           2.52023    2.05665   1.225   0.2340  
## gear         0.65541    1.49326   0.439   0.6652  
## carb        -0.19942    0.82875  -0.241   0.8122  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared:  0.869,  Adjusted R-squared:  0.8066 
## F-statistic: 13.93 on 10 and 21 DF,  p-value: 3.793e-07

For model3 we use the step function, which starts from model2 and adds or drops variables stepwise, by AIC, to select a smaller model.

model3 <- step(model2, direction = "both", trace = FALSE)
summary(model3)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
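
One way to see what the stepwise selection gained is to compare the full and the reduced model directly. The sketch below refits both formulas under the illustrative names full and reduced (not names from the report), compares their AIC, and runs a partial F-test on the seven dropped predictors. It is meant as a sanity check, not as part of the original analysis.

```r
library(datasets)
data(mtcars)

full    <- lm(mpg ~ ., data = mtcars)               # same formula as model2
reduced <- lm(mpg ~ wt + qsec + am, data = mtcars)  # formula selected for model3

# Lower AIC indicates the preferred model
aic_table <- AIC(full, reduced)
print(aic_table)

# Partial F-test: do the seven dropped predictors add explanatory power?
f_test <- anova(reduced, full)
print(f_test)
```

A lower AIC for the reduced model together with a large p-value in the F-test would indicate that dropping the other predictors costs little in fit.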

The R-squared here (0.8497) is almost the same as in model2 (0.869), while the adjusted R-squared is higher (0.8336 versus 0.8066). The residual standard error is 2.459 on 28 degrees of freedom. model4 refits the formula selected in model3 and is the final model, which gives us the answers for the whole analysis.

model4 <- lm(mpg ~ wt + qsec + am, data = mtcars)
summary(model4)$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.617781  6.9595930  1.381946 1.779152e-01
## wt          -3.916504  0.7112016 -5.506882 6.952711e-06
## qsec         1.225886  0.2886696  4.246676 2.161737e-04
## am           2.935837  1.4109045  2.080819 4.671551e-02
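
The point estimates above come with uncertainty, and the standard errors can be turned into confidence intervals with confint. The following sketch refits the final model and prints 95% intervals. The object name ci is introduced here for illustration.

```r
library(datasets)
data(mtcars)

# Refit the final model from the text
model4 <- lm(mpg ~ wt + qsec + am, data = mtcars)

# 95% confidence intervals for every coefficient
ci <- confint(model4, level = 0.95)
print(ci)

# Interval for the transmission effect only
print(ci["am", ])
```

If the interval for am excludes zero, that agrees with the p-value below 0.05 reported for am in the coefficient table.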

Conclusion

The final model expresses mpg as the dependent variable, explained by wt, qsec and am. The intercept is 9.618, which means that if all three independent variables were equal to 0, the predicted mpg would be 9.618. No car has zero weight or zero quarter-mile time, so the intercept has no practical interpretation on its own.
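
To make the role of each coefficient concrete, the sketch below computes a prediction by hand and compares it with predict. The car described by new_car is hypothetical (3,000 lbs, an 18-second quarter mile, manual transmission), chosen only for illustration.

```r
library(datasets)
data(mtcars)

model4 <- lm(mpg ~ wt + qsec + am, data = mtcars)
b <- coef(model4)

# Hypothetical car (illustrative values only):
# wt = 3 (3,000 lbs), qsec = 18 seconds, am = 1 (manual)
new_car <- data.frame(wt = 3, qsec = 18, am = 1)

# Intercept plus each coefficient times its variable
by_hand <- b[["(Intercept)"]] +
  b[["wt"]]   * new_car$wt +
  b[["qsec"]] * new_car$qsec +
  b[["am"]]   * new_car$am

# The same prediction via predict()
by_predict <- unname(predict(model4, newdata = new_car))

print(c(by_hand = by_hand, by_predict = by_predict))
```

Setting am to 0 in new_car lowers the prediction by exactly the am coefficient, which is how that coefficient should be read.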

Figure 1 (see Appendix) shows that cars with manual transmission (am = 1) tend to have higher mpg than cars with automatic transmission (am = 0), which answers our first question. In the simple regression (model1), manual cars average 7.245 mpg more than automatic cars. Once weight (wt) and quarter-mile time (qsec) are taken into account (model4), the estimated advantage of manual transmission shrinks to 2.936 mpg.

This completes our discussion on regression analysis for Motor Trend.

Appendix

Supporting figures are shown below:

Fig. 1

boxplot(mpg ~ am, data = mtcars, main = "Miles Per Gallon Vs. Transmission Type", 
xlab = "Transmission Type", ylab = "Miles Per Gallon", col = 'grey')

Fig. 2

par(mfrow = c(2,2))
plot(model2)

Fig. 3

par(mfrow = c(2,2))
plot(model3)

Fig. 4

par(mfrow = c(2, 2))
plot(model4)