Are Manual or Automatic Transmissions Better for MPG?

Loading data

Take the mtcars data set and write up an analysis that answers the question in the title using regression models and exploratory data analyses. The report must be written as a PDF printout of a compiled (using knitr) R markdown document. It must be brief: roughly the equivalent of 2 pages or less for the main text. Supporting figures can be included in an appendix, up to 5 total pages including the 2 for the main report. The appendix can only include figures. The report must open with an executive summary as its first paragraph.

A data frame with 32 observations on 11 variables.

[, 1] mpg   Miles/(US) gallon
[, 2] cyl   Number of cylinders
[, 3] disp  Displacement (cu.in.)
[, 4] hp    Gross horsepower
[, 5] drat  Rear axle ratio: driveshaft revolutions per revolution of the rear axle
[, 6] wt    Weight (1000 lbs)
[, 7] qsec  1/4 mile time (seconds)
[, 8] vs    Engine shape (0 = V-shaped, 1 = straight/inline), binary
[, 9] am    Transmission (0 = automatic, 1 = manual), binary
[,10] gear  Number of forward gears, integer
[,11] carb  Number of carburetors, integer
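The 0/1 codes for vs and am listed above are easier to read in output and plots when converted to labelled factors. A minimal sketch, assuming a working copy named `cars` (a name introduced here for illustration; the rest of this report keeps the numeric columns):

```r
# Work on a copy so the numeric mtcars columns stay available for lm(mpg ~ .)
data(mtcars)
cars <- transform(
  mtcars,
  vs = factor(vs, levels = c(0, 1), labels = c("V-shaped", "straight")),
  am = factor(am, levels = c(0, 1), labels = c("automatic", "manual"))
)

# Number of cars per transmission type
am_counts <- table(cars$am)
print(am_counts)
```

The counts show how unbalanced and small the two groups are, which matters for the per-group regressions that follow.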

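# Split the data by transmission type (am: 0 = automatic, 1 = manual)
autom <- subset(mtcars, am == 0)
manual <- subset(mtcars, am == 1)
# Full model within each group; am is constant inside each subset,
# so its coefficient cannot be estimated and step() drops it first
fita <- lm(mpg ~ ., data = autom)
fitm <- lm(mpg ~ ., data = manual)
# Backward selection with penalty k = log(n), i.e. BIC rather than AIC
step(fita, k = log(nrow(autom)))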

## Start:  AIC=37.67
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
## 
## Step:  AIC=37.67
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - drat  1    0.1256 29.424 34.810
## - vs    1    0.2817 29.580 34.910
## - cyl   1    0.7301 30.028 35.196
## - wt    1    2.4968 31.795 36.282
## - disp  1    4.3918 33.690 37.382
## - qsec  1    4.4261 33.724 37.402
## <none>              29.298 37.673
## - hp    1    5.9415 35.240 38.237
## - gear  1   16.0652 45.363 43.035
## - carb  1   20.8593 50.157 44.944
## 
## Step:  AIC=34.81
## mpg ~ cyl + disp + hp + wt + qsec + vs + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - vs    1    0.2652 29.689 32.036
## - cyl   1    1.7497 31.173 32.963
## - wt    1    2.4244 31.848 33.370
## - disp  1    4.8457 34.269 34.762
## <none>              29.424 34.810
## - qsec  1    5.1987 34.622 34.957
## - hp    1    8.0369 37.460 36.454
## - carb  1   20.9294 50.353 42.073
## - gear  1   22.8653 52.289 42.790
## 
## Step:  AIC=32.04
## mpg ~ cyl + disp + hp + wt + qsec + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - wt    1    2.2129 31.902 30.457
## - cyl   1    3.6986 33.387 31.322
## - disp  1    4.6443 34.333 31.853
## - qsec  1    4.9751 34.664 32.035
## <none>              29.689 32.036
## - hp    1    8.2840 37.973 33.767
## - carb  1   21.7825 51.471 39.546
## - gear  1   23.1149 52.804 40.032
## 
## Step:  AIC=30.46
## mpg ~ cyl + disp + hp + qsec + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - qsec  1     2.772 34.674 29.096
## - cyl   1     2.881 34.782 29.155
## - disp  1     3.055 34.957 29.251
## <none>              31.902 30.457
## - hp    1     6.168 38.069 30.871
## - gear  1    23.036 54.938 37.840
## - carb  1    34.766 66.668 41.517
## 
## Step:  AIC=29.1
## mpg ~ cyl + disp + hp + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - cyl   1     0.782 35.456 26.575
## - disp  1     5.689 40.363 29.038
## <none>              34.674 29.096
## - hp    1     8.928 43.602 30.505
## - gear  1    23.097 57.771 35.851
## - carb  1    35.065 69.739 39.428
## 
## Step:  AIC=26.58
## mpg ~ disp + hp + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## <none>              35.456 26.575
## - disp  1     8.053 43.509 27.520
## - hp    1     8.365 43.821 27.656
## - gear  1    27.866 63.322 34.650
## - carb  1    37.681 73.137 37.388

## 
## Call:
## lm(formula = mpg ~ disp + hp + gear + carb, data = autom)
## 
## Coefficients:
## (Intercept)         disp           hp         gear         carb  
##    -1.81912     -0.01180      0.04693      7.64765     -3.53781
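Because step() was called with k = log(n), the column labelled AIC in the trace above is really a BIC-type criterion. As a rough check, the final value can be recomputed by hand as n * log(RSS / n) + log(n) * p, where p is the number of estimated coefficients. The object names `final_a` and `by_hand` are introduced here for illustration only:

```r
data(mtcars)
autom <- subset(mtcars, am == 0)
n <- nrow(autom)

# Refit the model that step() selected for the automatic cars
final_a <- lm(mpg ~ disp + hp + gear + carb, data = autom)

rss <- sum(resid(final_a)^2)   # residual sum of squares
p <- length(coef(final_a))     # intercept + 4 slopes

# Criterion used by step() when k = log(n)
by_hand <- n * log(rss / n) + log(n) * p
from_r <- extractAIC(final_a, k = log(n))[2]

print(c(by_hand = by_hand, extractAIC = from_r))
```

Both values should agree with the last "Step: AIC=26.58" line of the trace.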

step(fitm, k = log(nrow(manual)))

## Start:  AIC=25.61
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
## 
## Step:  AIC=25.61
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - vs    1    0.1484 13.106 23.190
## - cyl   1    0.3444 13.302 23.383
## <none>              12.958 25.607
## - drat  1    3.5389 16.497 26.181
## - disp  1    4.5333 17.491 26.942
## - hp    1    5.4058 18.364 27.575
## - carb  1    5.7473 18.705 27.815
## - gear  1   14.7990 27.757 32.945
## - wt    1   19.2325 32.190 34.872
## - qsec  1   23.7588 36.717 36.582
## 
## Step:  AIC=23.19
## mpg ~ cyl + disp + hp + drat + wt + qsec + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - cyl   1    0.2996 13.406 20.919
## <none>              13.106 23.190
## - drat  1    3.4388 16.545 23.654
## - disp  1    4.7132 17.819 24.619
## - carb  1    6.4223 19.528 25.810
## - hp    1    7.9243 21.030 26.773
## - wt    1   20.9326 34.039 33.033
## - qsec  1   24.9928 38.099 34.498
## - gear  1   25.9273 39.033 34.813
## 
## Step:  AIC=20.92
## mpg ~ disp + hp + drat + wt + qsec + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## <none>              13.406 20.919
## - drat  1     6.450 19.856 23.461
## - hp    1    18.886 32.292 29.783
## - disp  1    22.215 35.620 31.058
## - carb  1    28.064 41.469 33.035
## - gear  1    28.131 41.537 33.056
## - qsec  1    47.030 60.436 37.931
## - wt    1    52.341 65.747 39.026

## 
## Call:
## lm(formula = mpg ~ disp + hp + drat + wt + qsec + gear + carb, 
##     data = manual)
## 
## Coefficients:
## (Intercept)         disp           hp         drat           wt  
##   -125.3722       0.1269      -0.1191      -3.4887      -9.6853  
##        qsec         gear         carb  
##      7.2555      10.9650       3.4573
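The selected model for the manual cars keeps seven predictors, while the manual subset is small. A quick sketch of how many residual degrees of freedom remain, which suggests the coefficients above should be read with caution (the name `final_m` is illustrative):

```r
data(mtcars)
manual <- subset(mtcars, am == 1)

# Refit the model that step() selected for the manual cars
final_m <- lm(mpg ~ disp + hp + drat + wt + qsec + gear + carb, data = manual)

# Observations, estimated parameters, and what is left to estimate the error variance
fit_size <- c(
  n = nrow(manual),
  parameters = length(coef(final_m)),
  residual_df = df.residual(final_m)
)
print(fit_size)
```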

summary(autom)

##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   :120.1   Min.   : 62.0  
##  1st Qu.:14.95   1st Qu.:6.000   1st Qu.:196.3   1st Qu.:116.5  
##  Median :17.30   Median :8.000   Median :275.8   Median :175.0  
##  Mean   :17.15   Mean   :6.947   Mean   :290.4   Mean   :160.3  
##  3rd Qu.:19.20   3rd Qu.:8.000   3rd Qu.:360.0   3rd Qu.:192.5  
##  Max.   :24.40   Max.   :8.000   Max.   :472.0   Max.   :245.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :2.465   Min.   :15.41   Min.   :0.0000  
##  1st Qu.:3.070   1st Qu.:3.438   1st Qu.:17.18   1st Qu.:0.0000  
##  Median :3.150   Median :3.520   Median :17.82   Median :0.0000  
##  Mean   :3.286   Mean   :3.769   Mean   :18.18   Mean   :0.3684  
##  3rd Qu.:3.695   3rd Qu.:3.842   3rd Qu.:19.17   3rd Qu.:1.0000  
##  Max.   :3.920   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am         gear            carb      
##  Min.   :0   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0   Median :3.000   Median :3.000  
##  Mean   :0   Mean   :3.211   Mean   :2.737  
##  3rd Qu.:0   3rd Qu.:3.000   3rd Qu.:4.000  
##  Max.   :0   Max.   :4.000   Max.   :4.000

summary(manual)

##       mpg             cyl             disp             hp       
##  Min.   :15.00   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:21.00   1st Qu.:4.000   1st Qu.: 79.0   1st Qu.: 66.0  
##  Median :22.80   Median :4.000   Median :120.3   Median :109.0  
##  Mean   :24.39   Mean   :5.077   Mean   :143.5   Mean   :126.8  
##  3rd Qu.:30.40   3rd Qu.:6.000   3rd Qu.:160.0   3rd Qu.:113.0  
##  Max.   :33.90   Max.   :8.000   Max.   :351.0   Max.   :335.0  
##       drat            wt             qsec             vs        
##  Min.   :3.54   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.85   1st Qu.:1.935   1st Qu.:16.46   1st Qu.:0.0000  
##  Median :4.08   Median :2.320   Median :17.02   Median :1.0000  
##  Mean   :4.05   Mean   :2.411   Mean   :17.36   Mean   :0.5385  
##  3rd Qu.:4.22   3rd Qu.:2.780   3rd Qu.:18.61   3rd Qu.:1.0000  
##  Max.   :4.93   Max.   :3.570   Max.   :19.90   Max.   :1.0000  
##        am         gear            carb      
##  Min.   :1   Min.   :4.000   Min.   :1.000  
##  1st Qu.:1   1st Qu.:4.000   1st Qu.:1.000  
##  Median :1   Median :4.000   Median :2.000  
##  Mean   :1   Mean   :4.385   Mean   :2.923  
##  3rd Qu.:1   3rd Qu.:5.000   3rd Qu.:4.000  
##  Max.   :1   Max.   :5.000   Max.   :8.000

# dp <- ggplot(mtcars, aes(x = factor(am), y = mpg, fill = factor(am))) +
#   geom_violin(trim = FALSE) +
#   geom_boxplot(width = 0.1, fill = "white") +
#   labs(title = "Plot of mpg by transmission", x = "Transmission (0 = automatic, 1 = manual)", y = "Miles per (US) gallon")

library(psych)
# pairs.panels(mtcars, 
#              method = "spearman", # correlation method
#              hist.col = "#00AFBB",
#              density = TRUE,  # show density plots
#              ellipses = FALSE # show correlation ellipses
#              )
pairs.panels(mtcars[, c(1,3,4,5,6,7)],lm=TRUE,ellipses=FALSE,hist.col = "#00AFBB")

pairs.panels(mtcars[, c(1,2,8,9,10,11)],lm=TRUE,ellipses=FALSE,hist.col = "blue")

describe(mtcars)

##      vars  n   mean     sd median trimmed    mad   min    max  range  skew
## mpg     1 32  20.09   6.03  19.20   19.70   5.41 10.40  33.90  23.50  0.61
## cyl     2 32   6.19   1.79   6.00    6.23   2.97  4.00   8.00   4.00 -0.17
## disp    3 32 230.72 123.94 196.30  222.52 140.48 71.10 472.00 400.90  0.38
## hp      4 32 146.69  68.56 123.00  141.19  77.10 52.00 335.00 283.00  0.73
## drat    5 32   3.60   0.53   3.70    3.58   0.70  2.76   4.93   2.17  0.27
## wt      6 32   3.22   0.98   3.33    3.15   0.77  1.51   5.42   3.91  0.42
## qsec    7 32  17.85   1.79  17.71   17.83   1.42 14.50  22.90   8.40  0.37
## vs      8 32   0.44   0.50   0.00    0.42   0.00  0.00   1.00   1.00  0.24
## am      9 32   0.41   0.50   0.00    0.38   0.00  0.00   1.00   1.00  0.36
## gear   10 32   3.69   0.74   4.00    3.62   1.48  3.00   5.00   2.00  0.53
## carb   11 32   2.81   1.62   2.00    2.65   1.48  1.00   8.00   7.00  1.05
##      kurtosis    se
## mpg     -0.37  1.07
## cyl     -1.76  0.32
## disp    -1.21 21.91
## hp      -0.14 12.12
## drat    -0.71  0.09
## wt      -0.02  0.17
## qsec     0.34  0.32
## vs      -2.00  0.09
## am      -1.92  0.09
## gear    -1.07  0.13
## carb     1.26  0.29

require(datasets); data(mtcars); require(GGally)

## Loading required package: GGally

ggpairs(mtcars, lower = list(continuous = "smooth_loess"))

summary(lm(mpg ~ . , data = mtcars))$coefficients

##                Estimate  Std. Error    t value   Pr(>|t|)
## (Intercept) 12.30337416 18.71788443  0.6573058 0.51812440
## cyl         -0.11144048  1.04502336 -0.1066392 0.91608738
## disp         0.01333524  0.01785750  0.7467585 0.46348865
## hp          -0.02148212  0.02176858 -0.9868407 0.33495531
## drat         0.78711097  1.63537307  0.4813036 0.63527790
## wt          -3.71530393  1.89441430 -1.9611887 0.06325215
## qsec         0.82104075  0.73084480  1.1234133 0.27394127
## vs           0.31776281  2.10450861  0.1509915 0.88142347
## am           2.52022689  2.05665055  1.2254035 0.23398971
## gear         0.65541302  1.49325996  0.4389142 0.66520643
## carb        -0.19941925  0.82875250 -0.2406258 0.81217871
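None of the coefficients in the full model above is significant at the 5% level, which is common when many correlated predictors are fitted to only 32 observations. One way to probe this is to compare nested models with anova(). The sketch below assumes wt and qsec are reasonable adjustment variables; that is an illustrative choice, not a selection made in this report:

```r
data(mtcars)

# Nested sequence: transmission only, then adjusting for weight, then for 1/4 mile time
fit1 <- lm(mpg ~ am, data = mtcars)
fit2 <- lm(mpg ~ am + wt, data = mtcars)
fit3 <- lm(mpg ~ am + wt + qsec, data = mtcars)

# F-tests for each added term
nested <- anova(fit1, fit2, fit3)
print(nested)

# How the am estimate changes as adjustment variables are added
am_estimates <- sapply(list(fit1, fit2, fit3), function(f) coef(f)[["am"]])
print(am_estimates)
```

Watching how the am estimate moves across the three fits gives a sense of how much of the raw manual/automatic difference is shared with the other variables.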

summary(lm(mpg ~ am, data = mtcars))$coefficients

##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## am           7.244939   1.764422  4.106127 2.850207e-04
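With a single 0/1 predictor, the intercept is the mean mpg of the automatic cars and the am coefficient is the difference in mean mpg between manual and automatic cars, which matches the group means reported by summary(autom) and summary(manual) (24.39 - 17.15 ≈ 7.24). A short sketch confirming this, together with the equivalent pooled-variance t-test (object names are illustrative):

```r
data(mtcars)

# Mean mpg per transmission type ("0" = automatic, "1" = manual)
means <- tapply(mtcars$mpg, mtcars$am, mean)
diff_means <- means[["1"]] - means[["0"]]

# Simple regression: slope on am equals the difference in group means
fit_am <- lm(mpg ~ am, data = mtcars)
print(c(diff_means = diff_means, am_coef = coef(fit_am)[["am"]]))

# Two-sample t-test with equal variances gives the same |t| as the regression
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
print(abs(unname(tt$statistic)))

# 95% confidence interval for the unadjusted difference
print(confint(fit_am)["am", ])
```

This is an unadjusted comparison: it says nothing about whether the difference persists once other variables are taken into account.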

custom_car <- ggpairs(mtcars[, c("mpg", "wt", "cyl", "am")], upper = "blank", title = "Custom Example")
# Label each car by name at its (am, mpg) position, coloured by transmission type
plot <- ggplot2::ggplot(mtcars, ggplot2::aes(x = am, y = mpg, label = rownames(mtcars))) +
  ggplot2::geom_text(ggplot2::aes(colour = factor(am)), size = 3) +
  ggplot2::scale_colour_discrete(l = 40)
# custom_car[1, 4] <- plot
# plot

library(GGally)
library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

library(scales)

## 
## Attaching package: 'scales'

## The following objects are masked from 'package:psych':
## 
##     alpha, rescale

data("mtcars")
# Function to return points and geom_smooth
# allow for the method to be changed
my_fn <- function(data, mapping, method="lm", ...){
    p <- ggplot(data = data, mapping = mapping) + 
        geom_point()  + 
        geom_smooth(method=method, ...)
    p
}

# Linear fit in each lower panel (my_fn defaults to method = "lm")
ggpairs(mtcars[, c(1,3,4,5,6,7)], lower = list(continuous = my_fn))

ggpairs(mtcars[, c(9,1)], lower = list(continuous = my_fn))

library(datasets); data(mtcars); require(stats); require(graphics)
pairs(mtcars, panel = panel.smooth, main = "Motor Trend")

library("GGally")
library(ggplot2)
# diag takes a single plot type per variable kind; histograms are used here
plot1 <- ggpairs(mtcars, lower = list(continuous = "smooth"),
                 diag = list(continuous = "barDiag"), axisLabels = "show",
                 columns = c("mpg", "wt", "qsec", "hp", "disp", "drat"))

plot1

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

plot2 <- ggpairs(mtcars, lower=list(continuous="smooth"), 
                 diag=list(continuous="barDiag"), axisLabels="show", 
                 columns = c("mpg", "am", "cyl", "vs", "carb", "gear"))
plot2

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

summary(mtcars)

##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

Arman Kirakosyan

August 25, 2016
