Executive Summary

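The mtcars data set was extracted from the 1974 Motor Trend US magazine and covers fuel consumption and 10 aspects of design and performance for 32 automobiles. This study compares the effect of automatic and manual transmissions on miles per gallon (MPG) and quantifies the difference between the two groups. The results show that manual cars average about 7.2 MPG more than automatic cars (24.39 vs. 17.15); once weight and quarter-mile time are accounted for, the estimated advantage of manual transmission falls to about 2.9 MPG.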

Introduction

mtcars has 32 observations on 11 variables:

1 mpg: Miles per gallon

2 cyl: Number of cylinders

3 disp: Displacement (cu. in.)

4 hp: Gross horsepower

5 drat: Rear axle ratio

6 wt: Weight (1000 lbs)

7 qsec: 1/4 mile time (seconds)

8 vs: Engine shape (0 = V-shaped, 1 = straight)

9 am: Transmission (0 = automatic, 1 = manual)

10 gear: Number of forward gears

11 carb: Number of carburetors
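The 0/1 coding of am listed above is easier to read once it is converted to a labeled factor. The following is a minimal sketch that works on a copy of the data; the names cars and transmission are illustrative and are not used elsewhere in this analysis.

```r
# Work on a copy so the built-in data set stays untouched.
data(mtcars)
cars <- mtcars

# Hypothetical helper column: am recoded as a labeled factor.
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("Automatic", "Manual"))

# Number of cars in each transmission group.
counts <- table(cars$transmission)
print(counts)
```

The two groups are unequal in size, which is one reason a test that does not assume equal variances is a reasonable default later on.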

Setting Directory

  setwd("C:/Users/FARZAD/Desktop/Data Science/Course 7/Project")
  getwd()
  [1] "C:/Users/FARZAD/Desktop/Data Science/Course 7/Project"
  
  

Exploratory Analysis

Getting Data & summary

  data(mtcars)
  summary(mtcars)
  
  
      mpg             cyl             disp             hp             drat             wt
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0   Min.   :2.760   Min.   :1.513
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080   1st Qu.:2.581
 Median :19.20   Median :6.000   Median :196.3   Median :123.0   Median :3.695   Median :3.325
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7   Mean   :3.597   Mean   :3.217
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920   3rd Qu.:3.610
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0   Max.   :4.930   Max.   :5.424

      qsec             vs               am              gear            carb
 Min.   :14.50   Min.   :0.0000   Min.   :0.0000   Min.   :3.000   Min.   :1.000
 1st Qu.:16.89   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
 Median :17.71   Median :0.0000   Median :0.0000   Median :4.000   Median :2.000
 Mean   :17.85   Mean   :0.4375   Mean   :0.4062   Mean   :3.688   Mean   :2.812
 3rd Qu.:18.90   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
 Max.   :22.90   Max.   :1.0000   Max.   :1.0000   Max.   :5.000   Max.   :8.000


head(mtcars)


                  mpg   cyl disp  hp   drat    wt   qsec   vs  am  gear carb
Mazda RX4         21.0   6  160   110  3.90  2.620  16.46  0    1    4    4
Mazda RX4 Wag     21.0   6  160   110  3.90  2.875  17.02  0    1    4    4
Datsun 710        22.8   4  108   93   3.85  2.320  18.61  1    1    4    1
Hornet 4 Drive    21.4   6  258   110  3.08  3.215  19.44  1    0    3    1
Hornet Sportabout 18.7   8  360   175  3.15  3.440  17.02  0    0    3    2
Valiant           18.1   6  225   105  2.76  3.460  20.22  1    0    3    1

Evaluation of MPG according to Transmission

boxplot(mpg ~ am, data = mtcars,
        col = c("green", "pink"),
        xlab = "Transmission Type",
        ylab = "Miles / Gallon",
        main = "MPG by Transmission Type",
        names = c("Automatic", "Manual"),
        horizontal = FALSE)

The boxplot suggests that manual cars achieve more miles per gallon than automatic cars, but a hypothesis test is needed to confirm that the difference is statistically significant.

Hypothesis Testing

H0: Mean MPG for Automatic = Mean MPG for Manual

H1: Mean MPG for Automatic ≠ Mean MPG for Manual

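 # MPG values for each transmission group, as plain numeric vectors
 auto <- mtcars$mpg[mtcars$am == 0]
 manual <- mtcars$mpg[mtcars$am == 1]

 # Two-sample t-test; t.test() uses the Welch (unequal variances) form by default
 t.test(auto, manual)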
 
 
 Welch Two Sample t-test

  data:  auto and manual
  t = -3.7671, df = 18.332, p-value = 0.001374
  alternative hypothesis: true difference in means is not equal to 0
  95 percent confidence interval: -11.280194  -3.209684
  
  sample estimates:
  
  mean of x(Automatic)         mean of y(Manual) 
    17.14737                       24.39231 
    
    

Conclusion:

The p-value (0.001374) is below 0.05, so the null hypothesis is rejected: mean MPG differs between the two transmission types. Manual cars average 24.39 MPG against 17.15 MPG for automatic cars, so a manual car travels farther on the same amount of fuel.
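To make the test statistic less of a black box, the Welch t-test can be reproduced by hand from the group means, variances and sizes. This is a sketch of the standard Welch formulas (including the Welch-Satterthwaite degrees of freedom), not a replacement for t.test(); the variable names are illustrative.

```r
data(mtcars)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Variance of each group mean (sample variance divided by group size).
va <- var(auto) / length(auto)
vm <- var(manual) / length(manual)

# Welch t statistic: difference in means over its standard error.
t_stat <- (mean(auto) - mean(manual)) / sqrt(va + vm)

# Welch-Satterthwaite approximation to the degrees of freedom.
df <- (va + vm)^2 /
  (va^2 / (length(auto) - 1) + vm^2 / (length(manual) - 1))

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df)

print(c(t = t_stat, df = df, p = p_value))
```

The printed values should agree with the t, df and p-value reported by t.test(auto, manual).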

Regression Analysis

For the regression analysis, “mpg” is the dependent variable and “am” is the independent variable.

 reg_Mod<- lm(mpg~am,mtcars) 
 summary(reg_Mod)
 
 Call:
lm(formula = mpg ~ am, data = mtcars)

Residuals:
    Min      1Q        Median      3Q     Max 
  -9.3923   -3.0923   -0.2974    3.2439  9.5077 

Coefficients:
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   17.147     1.125        15.247    1.13e-15 ***
am(Manual)     7.245     1.764         4.106    0.000285 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.902 on 30 degrees of freedom
Multiple R-squared:  0.3598,    Adjusted R-squared:  0.3385 
F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

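This regression estimates that manual cars average 7.245 MPG more than automatic cars (the intercept, 17.147, is the automatic mean). R-squared is 0.36, so transmission type alone explains only about 36% of the variance in MPG.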
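With a single 0/1 predictor, the regression slope is simply the difference between the two group means, and the intercept is the mean of the group coded 0. The sketch below checks this on mtcars; the object names (fit, group_means, and so on) are illustrative.

```r
data(mtcars)
fit <- lm(mpg ~ am, data = mtcars)

# Mean MPG per transmission group; the names are "0" and "1".
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
mean_diff <- unname(group_means["1"] - group_means["0"])

# Coefficients and R-squared of the one-variable model.
intercept <- unname(coef(fit)["(Intercept)"])
slope     <- unname(coef(fit)["am"])
r_squared <- summary(fit)$r.squared

print(c(intercept = intercept, slope = slope,
        mean_diff = mean_diff, r_squared = r_squared))
```

This is why the simple regression and the t-test tell the same story about the raw difference between the groups.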

Multivariate Regression

To evaluate the effect of the other variables on MPG, a model with all ten predictors is fitted:

reg_total <- lm(mpg~.,mtcars)
summary(reg_total)

Call:
lm(formula = mpg ~ ., data = mtcars)

Residuals:
  Min      1Q       Median       3Q      Max 
-3.4506   -1.6044   -0.1196    1.2193    4.6271 

Coefficients:
           Estimate       Std.Error     t value    Pr(>|t|)  
(Intercept) 12.30337      18.71788      0.657      0.5181  
cyl         -0.11144      1.04502      -0.107      0.9161  
disp         0.01334      0.01786       0.747      0.4635  
hp          -0.02148      0.02177      -0.987      0.3350  
drat         0.78711      1.63537       0.481      0.6353  
wt          -3.71530      1.89441      -1.961      0.0633 .
qsec         0.82104      0.73084       1.123      0.2739  
vs           0.31776      2.10451       0.151      0.8814  
am           2.52023      2.05665       1.225      0.2340  
gear         0.65541      1.49326       0.439      0.6652  
carb        -0.19942      0.82875      -0.241      0.8122  

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  Residual standard error: 2.65 on 21 degrees of freedom
  Multiple R-squared:  0.869,   Adjusted R-squared:  0.8066 
  F-statistic: 13.93 on 10 and 21 DF,  p-value: 3.793e-07
  

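With all other variables included, manual transmission still has a positive effect, but the estimate drops to 2.52 MPG and is no longer significant (p = 0.234). The model explains 86.9% of the variance in MPG, yet none of the individual coefficients is significant at the 5% level, which suggests that several predictors carry overlapping information.

A stepwise regression is therefore used to select a smaller set of variables.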

reg_stepwise <- step(reg_total, trace = 0)
summary(reg_stepwise)

Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)

Residuals:
  Min      1Q       Median      3Q       Max 
-3.4811   -1.5555   -0.7257    1.4110    4.6610 

Coefficients:
             Estimate      Std. Error    t value     Pr(>|t|)    
(Intercept)   9.6178       6.9596       1.382       0.177915    
wt           -3.9165       0.7112      -5.507       6.95e-06 ***
qsec          1.2259       0.2887       4.247       0.000216 ***
am            2.9358       1.4109       2.081       0.046716 *  

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared:  0.8497,    Adjusted R-squared:  0.8336 
F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

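The stepwise procedure retains “wt”, “qsec” and “am” as the variables that best explain MPG. This model explains 84.9% of the variance, and all three coefficients are significant at the 5% level. Of the three, “am” has the weakest evidence (p = 0.047), while “wt” and “qsec” are highly significant. Holding weight and quarter-mile time constant, a manual car is estimated to achieve 2.94 MPG more than an automatic one.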
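By default, step() compares candidate models by AIC, where lower is better. As a rough cross-check, the three models discussed in this report can be refitted and their AIC values listed side by side. The names m_am, m_step and m_full are illustrative stand-ins for reg_Mod, reg_stepwise and reg_total.

```r
data(mtcars)

# Refit the three models so this example runs on its own.
m_am   <- lm(mpg ~ am, data = mtcars)              # transmission only
m_step <- lm(mpg ~ wt + qsec + am, data = mtcars)  # stepwise selection
m_full <- lm(mpg ~ ., data = mtcars)               # all predictors

# One row per model, with its parameter count (df) and AIC.
aic_table <- AIC(m_am, m_step, m_full)
print(aic_table)

# Name of the model with the lowest AIC.
best <- rownames(aic_table)[which.min(aic_table$AIC)]
print(best)
```

AIC penalizes extra parameters, so the full model is not automatically preferred even though its R-squared is the highest.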

Analysis of Variance (ANOVA)

anova(reg_Mod,reg_stepwise,reg_total)

Analysis of Variance Table

  Model 1: mpg ~ am
  Model 2: mpg ~ wt + qsec + am
  Model 3: mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
    Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
  1     30 720.90                                   
  2     28 169.29  2    551.61 39.2687 8.025e-08 ***
  3     21 147.49  7     21.79  0.4432    0.8636    

  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  

Conclusion:

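Model 2 fits significantly better than Model 1 (p = 8.025e-08), whereas adding the remaining seven variables in Model 3 gives no significant improvement over Model 2 (p = 0.8636). ANOVA therefore indicates that Model 2, with the three variables (“wt”, “qsec”, “am”), is the best choice for explaining MPG.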
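The F values in the table can be reproduced from the residual sums of squares. Each F is the drop in RSS per added parameter, divided by the residual mean square of the largest model in the comparison. The sketch below refits the models under illustrative names (m1, m2, m3) and compares the hand calculation with anova().

```r
data(mtcars)
m1 <- lm(mpg ~ am, data = mtcars)
m2 <- lm(mpg ~ wt + qsec + am, data = mtcars)
m3 <- lm(mpg ~ ., data = mtcars)

# Residual sums of squares and residual degrees of freedom.
rss <- c(deviance(m1), deviance(m2), deviance(m3))
rdf <- c(df.residual(m1), df.residual(m2), df.residual(m3))

# Residual mean square of the largest model, used as the common denominator.
scale <- rss[3] / rdf[3]

# F statistics for Model 2 vs Model 1 and Model 3 vs Model 2.
f_2_vs_1 <- ((rss[1] - rss[2]) / (rdf[1] - rdf[2])) / scale
f_3_vs_2 <- ((rss[2] - rss[3]) / (rdf[2] - rdf[3])) / scale

tab <- anova(m1, m2, m3)
print(c(f_2_vs_1 = f_2_vs_1, f_3_vs_2 = f_3_vs_2))
print(tab$F)
```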

Evaluation of Residuals

To check the best model, with the three variables (“wt”, “qsec”, “am”), its residuals are plotted against the fitted values:

plot(reg_stepwise, which = 1)

Correlation

cor(mtcars)[1,]

       mpg        cyl       disp         hp       drat         wt
 1.0000000 -0.8521620 -0.8475514 -0.7761684  0.6811719 -0.8676594

      qsec         vs         am       gear       carb
 0.4186840  0.6640389  0.5998324  0.4802848 -0.5509251
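The correlations with mpg are easier to compare when they are sorted by absolute size. The sketch below does this; cor_mpg and ranked are illustrative names.

```r
data(mtcars)

# Correlation of every variable with mpg (first row of the matrix).
cor_mpg <- cor(mtcars)["mpg", ]

# Drop mpg itself and sort the rest by absolute correlation, strongest first.
ranked <- sort(abs(cor_mpg[names(cor_mpg) != "mpg"]), decreasing = TRUE)
print(round(ranked, 3))
```

Weight has the strongest linear association with mpg, which is consistent with “wt” being retained by the stepwise procedure.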
    
    
res_all <- lm(mpg ~ wt + hp + disp + cyl + am, data = mtcars)
par(mfrow = c(1, 1))
plot(res_all)

pairs(mtcars)