This article is part of the final project for the "Regression Models" course on Coursera.

The aim is to analyse the “mtcars” dataset and answer two questions:
1. Is an automatic or manual transmission better for MPG?
2. Quantify the MPG difference between automatic and manual transmissions.
Executive summary
Analysis of the “mtcars” dataset shows that the manual transmission group has significantly better mpg than the automatic group, with a difference in average mpg of about 7.24 miles/gallon.

Setting up the environment for data analysis

data("mtcars") ## Dataset used in this analysis
head(mtcars, n = 3)
##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
  According to ?mtcars, the "am" column is binary: 0 means automatic transmission and 1 means manual transmission.
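Because `am` is stored as a number, readable labels can make tables and plots easier to follow. A minimal sketch, assuming a separate copy named `mtcars_labelled` (a hypothetical name chosen so that the original `mtcars`, used by the models below, keeps its numeric `am` column):

```r
data("mtcars")
# Work on a copy so later lm(mpg ~ ., data = mtcars) calls still see numeric am
mtcars_labelled <- mtcars
# Coding documented in ?mtcars: 0 = automatic, 1 = manual
mtcars_labelled$am <- factor(mtcars_labelled$am,
                             levels = c(0, 1),
                             labels = c("automatic", "manual"))
# Number of cars in each transmission group
table(mtcars_labelled$am)
```

The table shows 19 automatic and 13 manual cars.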
  

Step 1: Exploratory analysis

coef(lm(mpg ~ ., data = mtcars))
## (Intercept)         cyl        disp          hp        drat          wt 
## 12.30337416 -0.11144048  0.01333524 -0.02148212  0.78711097 -3.71530393 
##        qsec          vs          am        gear        carb 
##  0.82104075  0.31776281  2.52022689  0.65541302 -0.19941925

Coefficient interpretation: The results above show that six predictors (disp, drat, qsec, vs, am and gear) have positive coefficients, meaning they are positively associated with mpg once the other predictors are held fixed, while four predictors (cyl, hp, wt and carb) have negative coefficients.
Among the positive coefficients, the largest is that of “am” (2.52), followed by qsec, drat, gear, vs and disp. Note that a coefficient's size depends on its predictor's units, so this ordering is not a direct ranking of importance. At first glance, the am coefficient of 2.52 suggests that manual transmission gives better mpg than automatic transmission. Let us explore this further by comparing the average mpg of the two groups, manual and automatic.
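The ordering of the positive coefficients described above can be reproduced by sorting them. Because raw coefficients carry each predictor's units, one common way to compare predictors on a shared scale is to refit the model on standardized variables; this is a sketch that is not part of the original analysis, and `full_fit`, `pos`, `std_fit` and `std_cf` are illustrative names.

```r
data("mtcars")
full_fit <- lm(mpg ~ ., data = mtcars)
cf <- coef(full_fit)[-1]                      # drop the intercept
pos <- sort(cf[cf > 0], decreasing = TRUE)    # positive coefficients, largest first
names(pos)

# Standardize every column (mean 0, sd 1) and refit; coefficients become unit-free
std_fit <- lm(mpg ~ ., data = as.data.frame(scale(mtcars)))
std_cf <- coef(std_fit)[-1]
round(std_cf[order(abs(std_cf), decreasing = TRUE)], 3)
```

On the standardized scale, wt (negative) has the largest coefficient in absolute value, ahead of am.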
Exploratory plot

Interestingly, disp has a positive coefficient in the full model, but a plot of mpg against disp shows a negative slope. What is the coefficient of disp on its own?

real <- coef(lm(mpg ~ disp, data = mtcars))[2] ## unadjusted coefficient of disp

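On its own, disp has a coefficient of -0.0412151. The sign reverses in the full model because disp is strongly correlated with other predictors (such as cyl, hp and wt); after adjusting for the other predictors, the remaining association between disp and mpg is slightly positive.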
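A short sketch of the sign flip: it draws the mpg-versus-disp scatter plot with its simple-regression line, compares the unadjusted and full-model coefficients of disp, and prints the correlation of disp with cyl, hp and wt (three predictors picked here for illustration). The names `simple_fit`, `unadjusted` and `adjusted` are illustrative.

```r
data("mtcars")
# Scatter plot of mpg against disp with the simple-regression line
plot(mpg ~ disp, data = mtcars)
simple_fit <- lm(mpg ~ disp, data = mtcars)
abline(simple_fit, col = "blue", lwd = 2)

unadjusted <- coef(simple_fit)[["disp"]]                 # about -0.0412
adjusted <- coef(lm(mpg ~ ., data = mtcars))[["disp"]]   # about 0.0133
c(unadjusted = unadjusted, adjusted = adjusted)

# disp moves together with other engine-size variables
round(sapply(mtcars[c("cyl", "hp", "wt")], cor, y = mtcars$disp), 2)
```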
What is the mean MPG in each am group?

library(dplyr)
mean_mpg_am <- mtcars %>%
  select(mpg, am) %>%
  group_by(am) %>%
  summarise(mpg_mean = mean(mpg))
mean_mpg_am
## # A tibble: 2 x 2
##      am mpg_mean
##   <dbl>    <dbl>
## 1     0     17.1
## 2     1     24.4

Manual transmission cars average 24.4 miles/gallon, while automatic cars average 17.1 miles/gallon.
The difference in average mpg between the two groups is about 7.24 miles/gallon.
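The 7.24 figure is simply the manual mean minus the automatic mean. As a cross-check (a base-R sketch; `group_means`, `mean_diff` and `slope_am` are illustrative names), the slope of a simple regression of mpg on the 0/1 variable am equals that same difference: with a binary predictor, the intercept is the automatic mean and the slope is manual minus automatic.

```r
data("mtcars")
group_means <- tapply(mtcars$mpg, mtcars$am, mean)   # elements named "0" and "1"
mean_diff <- group_means[["1"]] - group_means[["0"]]
slope_am <- coef(lm(mpg ~ am, data = mtcars))[["am"]]
c(mean_diff = mean_diff, slope_am = slope_am)        # both about 7.245
```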

Is the difference between these means significant? Let us compare them with a two-sample t-test.

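automatic_mpg <- mtcars %>%
  filter(am == 0) %>%
  pull(mpg)     ## mpg of automatic cars as a numeric vector

manual_mpg <- mtcars %>%
  filter(am == 1) %>%
  pull(mpg)     ## mpg of manual cars as a numeric vector
t.test(automatic_mpg, manual_mpg)$p.value  ## Welch two-sample t-test (R's default)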
## [1] 0.001373638

The t-test (Welch's version, the default in R) gives a p-value of about 0.0014, below 0.05, so we conclude that average mpg differs significantly between automatic and manual transmissions.
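R's `t.test()` runs Welch's unequal-variance test by default (`var.equal = FALSE`), which is what produced the p-value above. A sketch of the equivalent formula interface, alongside Student's equal-variance version for comparison (`welch` and `student` are illustrative names):

```r
data("mtcars")
# Welch two-sample t-test via the formula interface (groups: am = 0 vs am = 1)
welch <- t.test(mpg ~ am, data = mtcars)
# Student's t-test pools the variance, assuming it is equal in both groups
student <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
c(welch = welch$p.value, student = student$p.value)
```

Both p-values fall below 0.05, so the conclusion does not hinge on which version is used.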
Conclusion: The manual transmission group has significantly better mpg than the automatic group, with a difference in average mpg of about 7.24 miles/gallon.

Step 2: Create the model

mpgmodel <- lm(mpg ~ ., data = mtcars) ## Multivariable model with all 10 predictors
summary(mpgmodel)
## 
## Call:
## lm(formula = mpg ~ ., data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4506 -1.6044 -0.1196  1.2193  4.6271 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 12.30337   18.71788   0.657   0.5181  
## cyl         -0.11144    1.04502  -0.107   0.9161  
## disp         0.01334    0.01786   0.747   0.4635  
## hp          -0.02148    0.02177  -0.987   0.3350  
## drat         0.78711    1.63537   0.481   0.6353  
## wt          -3.71530    1.89441  -1.961   0.0633 .
## qsec         0.82104    0.73084   1.123   0.2739  
## vs           0.31776    2.10451   0.151   0.8814  
## am           2.52023    2.05665   1.225   0.2340  
## gear         0.65541    1.49326   0.439   0.6652  
## carb        -0.19942    0.82875  -0.241   0.8122  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared:  0.869,  Adjusted R-squared:  0.8066 
## F-statistic: 13.93 on 10 and 21 DF,  p-value: 3.793e-07
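Two quick checks on the fitted `mpgmodel`, as a sketch (`am_ci` and `am_unadj` are illustrative names): the 95% confidence interval for the am coefficient, and how the adjusted am coefficient compares with the unadjusted difference from `lm(mpg ~ am)`.

```r
data("mtcars")
mpgmodel <- lm(mpg ~ ., data = mtcars)
# 95% confidence interval for the adjusted am effect
am_ci <- confint(mpgmodel)["am", ]
# Unadjusted effect: difference in mean mpg between manual and automatic
am_unadj <- coef(lm(mpg ~ am, data = mtcars))[["am"]]
round(c(am_ci, adjusted = coef(mpgmodel)[["am"]], unadjusted = am_unadj), 2)
```

The interval includes zero, consistent with the p-value of 0.234 for am in the summary, and the adjusted effect (2.52) is much smaller than the unadjusted 7.24, which suggests that part of the raw difference is shared with other predictors.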

Residual plot and model diagnostics

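x <- mtcars$mpg         ## observed mpg, used as the x-axis
e <- resid(mpgmodel)    ## residuals of the multivariable model
plot(e ~ mpg, data = mtcars)
abline(h = 0, col = "black", lwd = 3)
## red vertical segments from each residual to the zero line
segments(x, e, x, 0, col = "red", lwd = 2)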

The residuals look balanced around zero. Next, we find the largest residual.

e[which.max(e)]
## Fiat 128 
## 4.627094
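The residual plot above puts observed mpg on the x-axis. For a least-squares fit with an intercept, residuals are uncorrelated with the fitted values by construction but positively correlated with the observed response, so the more usual diagnostic plots residuals against fitted values (base R's `plot(mpgmodel, which = 1)` draws this too). A minimal sketch, with `fit` and `res` as illustrative names:

```r
data("mtcars")
mpgmodel <- lm(mpg ~ ., data = mtcars)
fit <- fitted(mpgmodel)
res <- resid(mpgmodel)
# Residuals against fitted values, with a reference line at zero
plot(fit, res, xlab = "Fitted mpg", ylab = "Residual")
abline(h = 0, lwd = 2)
# About 0 against fitted values; sqrt(1 - R^2) against observed mpg
c(cor_with_fitted = cor(res, fit), cor_with_observed = cor(res, mtcars$mpg))
```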

Discussion: This is a multivariable model that predicts mpg from all 10 predictors. Its weakness is that every predictor, including am (p = 0.234), has a p-value above 0.05, so for no single predictor can we reject the null hypothesis that its coefficient is zero, even though the model as a whole is highly significant (F-test p-value 3.793e-07).
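One likely reason the overall F-test is highly significant while no single coefficient is: the predictors overlap heavily. A hedged base-R sketch computes variance inflation factors by hand, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other nine. The `car` package's `vif()` reports the same measure; this block avoids that dependency, and `predictors` and `vif` are illustrative names.

```r
data("mtcars")
predictors <- setdiff(names(mtcars), "mpg")
vif <- sapply(predictors, function(p) {
  # Regress predictor p on the remaining predictors (mpg is excluded)
  f <- reformulate(setdiff(predictors, p), response = p)
  r2 <- summary(lm(f, data = mtcars))$r.squared
  1 / (1 - r2)
})
round(sort(vif, decreasing = TRUE), 2)
```

Values above about 10 are a common rule-of-thumb warning sign of multicollinearity.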