Developing Data Products project - Shiny Application and Reproducible Pitch

Erich F Gruhn

May 12, 2018

Introduction

Application

Dataset

The dataset used by the application is Motor Trend Car Road Tests (hereafter ‘mtcars’). The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

The structure of the dataset is shown below:

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
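
As a quick sanity check on the description above, the following sketch confirms the shape of the data and how the columns split into one response and ten predictors. The variable name `predictors` is introduced here for illustration only and does not appear in the application.

```r
# mtcars ships with base R (package 'datasets'), so nothing needs to be downloaded.
dim(mtcars)                                  # 32 rows, 11 columns

# Hypothetical helper variable: every column except the response (mpg).
predictors <- setdiff(names(mtcars), "mpg")
length(predictors)                           # 10 predictors remain once mpg is set aside

# Range and quartiles of the variable the model will try to predict.
summary(mtcars$mpg)
```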

Prediction

A Random Forest model is trained on the ‘mtcars’ dataset and evaluated with 10-fold cross-validation. The goal of this model is to predict fuel consumption (the mpg variable) from the remaining 10 variables:

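library(caret)  # provides train() and trainControl(); method = "rf" also needs the randomForest package

# Evaluate each candidate model with 10-fold cross-validation
customTrainControl <- trainControl(method = "cv", number = 10)

# Train a random forest that predicts mpg from all other mtcars columns
carsRFModel <- function() {
  return(
    train(
      mpg ~ .,
      data = mtcars,
      method = "rf",
      trControl = customTrainControl
    )
  )
}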

carsRFModel()
## Random Forest 
## 
## 32 samples
## 10 predictors
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 29, 28, 29, 30, 28, 28, ... 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE      Rsquared   MAE     
##    2    2.388950  0.9100982  2.075167
##    6    2.338089  0.9427356  2.032172
##   10    2.317244  0.9509582  2.018181
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 10.
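
To make the resampling figures above easier to interpret, here is a minimal sketch of 10-fold cross-validation written by hand in base R. It is not the application's code: `lm()` is swapped in for the random forest so that the example runs without extra packages, and the seed, the fold assignment, and the names `folds`, `fold_rmse`, and `cv_rmse` are assumptions made for illustration. The numbers it produces will therefore differ from the caret output shown above.

```r
# Sketch: 10-fold cross-validation by hand, using base R only.
# Assumption: lm() stands in for the random forest to avoid extra dependencies.
set.seed(42)  # arbitrary seed, only to make the fold assignment repeatable

k <- 10
n <- nrow(mtcars)

# Assign each row to one of k folds at random (fold sizes differ by at most one).
folds <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k - 1 folds, then score on the held-out fold.
fold_rmse <- sapply(seq_len(k), function(i) {
  train_rows <- mtcars[folds != i, ]
  test_rows  <- mtcars[folds == i, ]
  fit  <- lm(mpg ~ ., data = train_rows)
  pred <- predict(fit, newdata = test_rows)
  sqrt(mean((test_rows$mpg - pred)^2))
})

# Average the per-fold errors into a single cross-validated RMSE.
cv_rmse <- mean(fold_rmse)
cv_rmse
```

With only 32 rows, each held-out fold contains three or four cars, so the per-fold RMSE values vary considerably and the averaged figure should be read as a rough estimate.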