Predictive Model

Predicting city MPG (variable “UCity”, column BG).

Exploratory data analysis (EDA) guided the choice of variables and models (Ref).

Dataset

Feature Selection

  • Excluded variables:
      • city08: city MPG for fuelType1. Very strongly correlated (0.9975) with the dependent variable UCity.
      • comb08: combined MPG for fuelType1. Very strongly correlated (0.9859) with UCity.
      • highway08: highway MPG for fuelType1. Strongly correlated with UCity (cor(UCity, highway08) = 0.9334).
      • highway08U: unrounded highway MPG for fuelType1. Highly correlated with UCity where available; missing values are coded as zero, which would distort its distribution.
      • eng_dscr: 559 unique engine descriptions; adds no usable information.
      • c240bDscr: 6 descriptions of EV chargers.
      • c240Dscr: 5 descriptions of EV chargers.
      • mpgData: boolean flag indicating whether the vehicle has mpgData records.
      • tCharger: 34,653 missing values (NA).
      • sCharger
      • charge120: zero variance.
      • id: row identifier; adds no information to the model.

Data Preparation

  • Removed the 25 vehicles with UCity equal to zero

  • Treated missing values in atvType

  • Cast displ to factor

  • Cast year to factor

  • Cast cylinders to factor

  • Cast feScore to factor

  • Cast ghgScore to factor

  • Binarized evMotor
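The preparation steps above can be sketched as a single pass over the records. This is a minimal Python sketch under stated assumptions, not the report's R code: records are hypothetical dicts, "factor" is approximated by converting values to strings (in place, whereas the report's input list names some columns with a `_cat` suffix), missing atvType is filled with an assumed placeholder level "Conventional", and "binarized" is assumed to mean 1 when an evMotor description is present.

```python
# Columns the report casts to factor; here "factor" is approximated by str.
FACTOR_COLS = ("displ", "year", "cylinders", "feScore", "ghgScore")

def prepare(rows):
    """Apply the listed preparation steps to a list of dict records (hypothetical layout)."""
    prepared = []
    for row in rows:
        if row["UCity"] == 0:                 # drop vehicles with zero UCity
            continue
        out = dict(row)                       # copy so the input is not mutated
        # Assumption: a missing atvType becomes an explicit placeholder level.
        if not out.get("atvType"):
            out["atvType"] = "Conventional"
        for col in FACTOR_COLS:               # treat numeric codes as categories
            out[col] = str(out[col])
        # Assumption: "binarized" = 1 if an EV motor description is present, else 0.
        out["evMotor"] = 1 if out.get("evMotor") else 0
        prepared.append(out)
    return prepared

# Illustrative toy records, not taken from the dataset.
rows = [
    {"UCity": 23.3, "atvType": None, "displ": 2.0, "year": 2015,
     "cylinders": 4, "feScore": 6, "ghgScore": 6, "evMotor": None},
    {"UCity": 0, "atvType": "Diesel", "displ": 3.0, "year": 2010,
     "cylinders": 6, "feScore": 5, "ghgScore": 5, "evMotor": None},
    {"UCity": 57.9, "atvType": "Hybrid", "displ": 1.8, "year": 2016,
     "cylinders": 4, "feScore": 10, "ghgScore": 10, "evMotor": "53 kW"},
]
clean = prepare(rows)
print(len(clean))           # 2
print(clean[0]["atvType"])  # Conventional
```

Casting codes such as cylinders or feScore to categories lets a linear model fit a separate effect per level instead of forcing a single slope across them.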

Modeling Results

  • Input Variables
##       input_vars
## 1      barrels08
## 2          displ
## 3     fuelCost08
## 4   youSaveSpend
## 5        atvType
## 6  cylinders_cat
## 7          drive
## 8        evMotor
## 9    feScore_cat
## 10          make
## 11        VClass
## 12      year_cat
  • Complete dataset size
## [1] 39910
  • Modeling methods
      • Linear Regression
      • Random Forest
  • Evaluation method
      • 10-fold cross-validation
  • Results
## 
## Call:
## summary.resamples(object = results)
## 
## Models: LM, RF 
## Number of resamples: 10 
## 
## MAE 
##        Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
## LM 1.115548 1.138126 1.154752 1.151025 1.168733 1.179196    0
## RF 2.811906 2.909293 2.941965 2.947410 2.990543 3.073557    0
## 
## RMSE 
##        Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
## LM 1.724450 1.773954 1.811878 1.816579 1.850945 1.928517    0
## RF 4.191009 4.336133 4.534070 4.470805 4.566885 4.671541    0
## 
## Rsquared 
##         Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## LM 0.9127999 0.9269126 0.9287763 0.9276492 0.9310013 0.9329749    0
## RF 0.8316304 0.8365962 0.8428304 0.8435713 0.8486622 0.8593638    0