1 Introduction

This is role playing. I am your new boss. I am in charge of production at ABC Beverage and you are a team of data scientists reporting to me. My leadership has told me that new regulations require us to understand our manufacturing process and its predictive factors, and to be able to report our predictive model of pH to them.

Please use the historical data set I am providing. Build and report the factors in BOTH a technical and non-technical report. I like to use Word and Excel. Please provide your non-technical report in a business friendly readable document and your predictions in an Excel readable format. The technical report should show clearly the models you tested and how you selected your final approach.

Please submit both RPubs links and .rmd files or other readable formats for the technical and non-technical reports. Also submit the Excel file showing the predictions of your models for pH.

3 Load Data

Two data sets are downloaded from GitHub:

  • Training Data: StudentData.xlsx
  • Evaluation Data: StudentEvaluation.xlsx

4 Exploratory data analysis

According to the data summaries below:

  • The response variable [PH] is continuous, so regression models are built.
  • The data set contains 31 numerical predictors and 1 categorical predictor.
  • According to the missing value view, only about 1% of all values are missing. The predictor with the most missing values is [MFR], with a missing ratio of 212/2571 = 8.25%. No predictor therefore needs to be removed for missingness; imputation is included in the later preprocessing step.
  • [PH] is missing in 4 rows of the training set. Imputing the response variable in the training set is not meaningful, so these 4 rows are removed.
  • Most of the continuous numerical predictors in both the training and evaluation sets have skewed distributions, and some contain negative values. Because the Box-Cox transformation requires strictly positive values, the Yeo-Johnson transformation is used to reduce skewness.
  • Dummy variables will be created for the categorical predictor [Brand.Code].
  • After missing value imputation, the predictors [Balling], [Hyd.Pressure3], [Density], [Balling.Lvl] and [Filler.Level] each have a pairwise correlation greater than 0.9 with another predictor, so they are flagged for removal to avoid multicollinearity.
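
The reason Yeo-Johnson can be applied to predictors with negative values can be seen from its piecewise definition. The Python sketch below is only an illustration of the standard formula (the report itself was produced in R); the function name and the value lam = 0.5 are chosen for the example and are not taken from the report.

```python
import math

def yeo_johnson(x, lam):
    """Yeo-Johnson power transform of a single value x with parameter lam."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(x)                  # log(x + 1) when lam == 0
        return ((x + 1) ** lam - 1) / lam
    if abs(lam - 2) < 1e-12:
        return -math.log1p(-x)                    # -log(-x + 1) when lam == 2
    return -(((-x + 1) ** (2 - lam) - 1) / (2 - lam))

# Mnf.Flow spans roughly -100 to 229 in the training summary, so the
# transform has to accept negative inputs. lam = 0.5 is an arbitrary choice.
values = [-100.2, 0.0, 65.2, 229.4]
print([round(yeo_johnson(v, 0.5), 3) for v in values])
```

With lam = 1 the transform is the identity on both branches, which is a quick sanity check on the formula. In practice lam is estimated per predictor rather than fixed.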

4.1 Training Data Summary

Data summary
Name df
Number of rows 2571
Number of columns 33
_______________________
Column type frequency:
character 1
numeric 32
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Brand.Code 120 0.95 1 1 0 4 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Carb.Volume 10 1.00 5.37 0.11 5.04 5.29 5.35 5.45 5.70 ▁▆▇▅▁
Fill.Ounces 38 0.99 23.97 0.09 23.63 23.92 23.97 24.03 24.32 ▁▂▇▂▁
PC.Volume 39 0.98 0.28 0.06 0.08 0.24 0.27 0.31 0.48 ▁▃▇▂▁
Carb.Pressure 27 0.99 68.19 3.54 57.00 65.60 68.20 70.60 79.40 ▁▅▇▃▁
Carb.Temp 26 0.99 141.09 4.04 128.60 138.40 140.80 143.80 154.00 ▁▅▇▃▁
PSC 33 0.99 0.08 0.05 0.00 0.05 0.08 0.11 0.27 ▆▇▃▁▁
PSC.Fill 23 0.99 0.20 0.12 0.00 0.10 0.18 0.26 0.62 ▆▇▃▁▁
PSC.CO2 39 0.98 0.06 0.04 0.00 0.02 0.04 0.08 0.24 ▇▅▂▁▁
Mnf.Flow 2 1.00 24.57 119.48 -100.20 -100.00 65.20 140.80 229.40 ▇▁▁▇▂
Carb.Pressure1 32 0.99 122.59 4.74 105.60 119.00 123.20 125.40 140.20 ▁▃▇▂▁
Fill.Pressure 22 0.99 47.92 3.18 34.60 46.00 46.40 50.00 60.40 ▁▁▇▂▁
Hyd.Pressure1 11 1.00 12.44 12.43 -0.80 0.00 11.40 20.20 58.00 ▇▅▂▁▁
Hyd.Pressure2 15 0.99 20.96 16.39 0.00 0.00 28.60 34.60 59.40 ▇▂▇▅▁
Hyd.Pressure3 15 0.99 20.46 15.98 -1.20 0.00 27.60 33.40 50.00 ▇▁▃▇▁
Hyd.Pressure4 30 0.99 96.29 13.12 52.00 86.00 96.00 102.00 142.00 ▁▃▇▂▁
Filler.Level 20 0.99 109.25 15.70 55.80 98.30 118.40 120.00 161.20 ▁▃▅▇▁
Filler.Speed 57 0.98 3687.20 770.82 998.00 3888.00 3982.00 3998.00 4030.00 ▁▁▁▁▇
Temperature 14 0.99 65.97 1.38 63.60 65.20 65.60 66.40 76.20 ▇▃▁▁▁
Usage.cont 5 1.00 20.99 2.98 12.08 18.36 21.79 23.75 25.90 ▁▃▅▃▇
Carb.Flow 2 1.00 2468.35 1073.70 26.00 1144.00 3028.00 3186.00 5104.00 ▂▅▆▇▁
Density 1 1.00 1.17 0.38 0.24 0.90 0.98 1.62 1.92 ▁▅▇▂▆
MFR 212 0.92 704.05 73.90 31.40 706.30 724.00 731.00 868.60 ▁▁▁▂▇
Balling 1 1.00 2.20 0.93 -0.17 1.50 1.65 3.29 4.01 ▁▇▇▁▇
Pressure.Vacuum 0 1.00 -5.22 0.57 -6.60 -5.60 -5.40 -5.00 -3.60 ▂▇▆▂▁
PH 4 1.00 8.55 0.17 7.88 8.44 8.54 8.68 9.36 ▁▅▇▂▁
Oxygen.Filler 12 1.00 0.05 0.05 0.00 0.02 0.03 0.06 0.40 ▇▁▁▁▁
Bowl.Setpoint 2 1.00 109.33 15.30 70.00 100.00 120.00 120.00 140.00 ▁▂▃▇▁
Pressure.Setpoint 12 1.00 47.62 2.04 44.00 46.00 46.00 50.00 52.00 ▁▇▁▆▁
Air.Pressurer 0 1.00 142.83 1.21 140.80 142.20 142.60 143.00 148.20 ▅▇▁▁▁
Alch.Rel 9 1.00 6.90 0.51 5.28 6.54 6.56 7.24 8.62 ▁▇▂▃▁
Carb.Rel 10 1.00 5.44 0.13 4.96 5.34 5.40 5.54 6.06 ▁▇▇▂▁
Balling.Lvl 1 1.00 2.05 0.87 0.00 1.38 1.48 3.14 3.66 ▁▇▂▁▆

4.2 Evaluation Data Summary

Data summary
Name df_eval
Number of rows 267
Number of columns 33
_______________________
Column type frequency:
character 1
logical 1
numeric 31
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Brand.Code 8 0.97 1 1 0 4 0

Variable type: logical

skim_variable n_missing complete_rate mean count
PH 267 0 NaN :

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Carb.Volume 1 1.00 5.37 0.11 5.15 5.29 5.34 5.47 5.67 ▂▇▃▅▁
Fill.Ounces 6 0.98 23.97 0.08 23.75 23.92 23.97 24.01 24.20 ▁▅▇▃▁
PC.Volume 4 0.99 0.28 0.06 0.10 0.23 0.28 0.32 0.46 ▁▆▇▅▁
Carb.Pressure 0 1.00 68.25 3.86 60.20 65.30 68.00 70.60 77.60 ▃▆▇▃▂
Carb.Temp 1 1.00 141.23 4.30 130.00 138.40 140.80 143.80 154.00 ▁▆▇▃▁
PSC 5 0.98 0.09 0.05 0.00 0.04 0.08 0.11 0.25 ▆▇▃▂▁
PSC.Fill 3 0.99 0.19 0.11 0.02 0.10 0.18 0.26 0.62 ▇▇▃▁▁
PSC.CO2 5 0.98 0.05 0.04 0.00 0.02 0.04 0.06 0.24 ▇▃▂▁▁
Mnf.Flow 0 1.00 21.03 117.76 -100.20 -100.00 0.20 141.30 220.40 ▇▁▁▆▂
Carb.Pressure1 4 0.99 123.04 4.42 113.00 120.20 123.40 125.50 136.00 ▃▃▇▂▁
Fill.Pressure 2 0.99 48.14 3.44 37.80 46.00 47.80 50.20 60.20 ▁▇▇▂▁
Hyd.Pressure1 0 1.00 12.01 13.53 -50.00 0.00 10.40 20.40 50.00 ▁▁▇▆▂
Hyd.Pressure2 1 1.00 20.11 17.21 -50.00 0.00 26.80 34.80 61.40 ▁▁▆▇▁
Hyd.Pressure3 1 1.00 19.61 16.56 -50.00 0.00 27.70 33.00 49.20 ▁▁▆▃▇
Hyd.Pressure4 4 0.99 97.84 13.92 68.00 90.00 98.00 104.00 140.00 ▅▆▇▂▁
Filler.Level 2 0.99 110.29 15.50 69.20 100.60 118.60 120.20 153.20 ▂▃▇▇▁
Filler.Speed 10 0.96 3581.39 911.19 1006.00 3812.00 3978.00 3996.00 4020.00 ▁▁▁▁▇
Temperature 2 0.99 66.23 1.69 63.80 65.40 65.80 66.60 75.40 ▇▅▁▁▁
Usage.cont 2 0.99 20.90 3.00 12.90 18.12 21.44 23.74 24.60 ▁▃▃▃▇
Carb.Flow 0 1.00 2408.64 1161.36 0.00 1083.00 3038.00 3215.00 3858.00 ▂▃▁▆▇
Density 1 1.00 1.18 0.38 0.06 0.92 0.98 1.60 1.84 ▁▁▇▁▅
MFR 31 0.88 697.80 96.40 15.60 707.00 724.60 731.45 784.80 ▁▁▁▁▇
Balling 1 1.00 2.20 0.92 0.90 1.50 1.65 3.24 3.79 ▅▇▁▂▅
Pressure.Vacuum 1 1.00 -5.17 0.58 -6.40 -5.60 -5.20 -4.80 -3.60 ▁▇▆▃▁
Oxygen.Filler 3 0.99 0.05 0.05 0.00 0.02 0.03 0.05 0.40 ▇▁▁▁▁
Bowl.Setpoint 1 1.00 109.62 15.02 70.00 100.00 120.00 120.00 130.00 ▁▂▁▃▇
Pressure.Setpoint 2 0.99 47.73 2.06 44.00 46.00 46.00 50.00 52.00 ▁▇▁▆▁
Air.Pressurer 1 1.00 142.83 1.23 141.20 142.20 142.60 142.80 147.20 ▅▇▁▁▁
Alch.Rel 3 0.99 6.91 0.50 6.40 6.54 6.58 7.18 7.82 ▇▁▂▁▃
Carb.Rel 2 0.99 5.44 0.13 5.18 5.34 5.40 5.56 5.74 ▂▇▂▃▂
Balling.Lvl 0 1.00 2.05 0.88 0.00 1.38 1.48 3.08 3.42 ▁▃▇▁▇

4.3 Missing Value View

A plot of missing value distribution in the data set.
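
The missing ratios quoted in the exploratory analysis can be recomputed from the n_missing column of the training data summary. The Python sketch below is an illustration only (the report was built in R); the counts are copied from Section 4.1 and the variable names are made up for the example.

```python
# n_missing per column, copied from the training data summary (2571 rows, 33 columns).
n_rows, n_cols = 2571, 33
n_missing = {
    "Brand.Code": 120, "Carb.Volume": 10, "Fill.Ounces": 38, "PC.Volume": 39,
    "Carb.Pressure": 27, "Carb.Temp": 26, "PSC": 33, "PSC.Fill": 23,
    "PSC.CO2": 39, "Mnf.Flow": 2, "Carb.Pressure1": 32, "Fill.Pressure": 22,
    "Hyd.Pressure1": 11, "Hyd.Pressure2": 15, "Hyd.Pressure3": 15,
    "Hyd.Pressure4": 30, "Filler.Level": 20, "Filler.Speed": 57,
    "Temperature": 14, "Usage.cont": 5, "Carb.Flow": 2, "Density": 1,
    "MFR": 212, "Balling": 1, "Pressure.Vacuum": 0, "PH": 4,
    "Oxygen.Filler": 12, "Bowl.Setpoint": 2, "Pressure.Setpoint": 12,
    "Air.Pressurer": 0, "Alch.Rel": 9, "Carb.Rel": 10, "Balling.Lvl": 1,
}

# Per-column missing ratio and the share of missing cells in the whole table.
ratios = {name: count / n_rows for name, count in n_missing.items()}
worst = max(ratios, key=ratios.get)
overall = sum(n_missing.values()) / (n_rows * n_cols)

print(worst, f"{ratios[worst]:.2%}")
print(f"overall: {overall:.2%}")
```

The column with the highest ratio is MFR, and the overall share of missing cells is just under 1%, which agrees with the figures stated in the exploratory analysis.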

4.4 Numerical Predictor Correlation after Missing Data Imputation

  • Use KNN to impute missing values in the training data set.
  • Compute pairwise correlations and locate the predictors with a pairwise correlation greater than 0.9.
## Warning in gowerD(don_dist_var, imp_dist_var, weights = weightsx,
## numericalX, : NAs introduced by coercion

## Compare row 23  and column  21 with corr  0.955 
##   Means:  0.248 vs 0.154 so flagging column 23 
## Compare row 14  and column  13 with corr  0.925 
##   Means:  0.246 vs 0.147 so flagging column 14 
## Compare row 21  and column  31 with corr  0.948 
##   Means:  0.21 vs 0.141 so flagging column 21 
## Compare row 31  and column  29 with corr  0.921 
##   Means:  0.18 vs 0.136 so flagging column 31 
## Compare row 16  and column  26 with corr  0.946 
##   Means:  0.189 vs 0.133 so flagging column 16 
## All correlations <= 0.9
## [1] "Balling"       "Hyd.Pressure3" "Density"       "Balling.Lvl"  
## [5] "Filler.Level"
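
The log above compares, for each highly correlated pair, the mean absolute correlation of the two predictors and flags the one with the larger mean. A simplified Python sketch of that heuristic is given below. It is an illustration, not the R routine that produced the log; the function names and the three toy columns are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equally long numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def flag_correlated(columns, cutoff=0.9):
    """Greedily flag one column from each pair with |r| above the cutoff."""
    names = list(columns)
    corr = {(p, q): abs(pearson(columns[p], columns[q]))
            for p in names for q in names if p != q}
    dropped = []
    while True:
        kept = [n for n in names if n not in dropped]
        pairs = [(corr[p, q], p, q)
                 for i, p in enumerate(kept) for q in kept[i + 1:]
                 if corr[p, q] > cutoff]
        if not pairs:
            return dropped
        _, p, q = max(pairs)  # most strongly correlated remaining pair

        def mean_abs(name):
            return sum(corr[name, m] for m in kept if m != name) / (len(kept) - 1)

        # Flag the member of the pair that is more correlated with everything else.
        dropped.append(p if mean_abs(p) >= mean_abs(q) else q)

toy = {  # hypothetical data: b is almost a multiple of a
    "a": [1, 2, 3, 4, 5],
    "b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "c": [5, 3, 4, 1, 2],
}
print(flag_correlated(toy))
```

In the toy data only the pair (a, b) exceeds the cutoff, and a is flagged because its average absolute correlation with the other columns is slightly higher.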

5 Data Preprocess

For the training set:

  • Remove rows where PH is missing (NA).
  • Perform a train-test split, with 4/5 of the rows (80%) used for training.

For both the training and evaluation sets:

  • Impute missing values using bagged trees.
  • Create dummy variables for the categorical variable.
  • Center and scale numerical variables.
  • Remove skewness of numerical variables.
  • Remove predictors with near-zero variance.
  • Remove predictors with correlation greater than 0.9.

Note: Data preprocessing can be performed during model training. However, because multiple models are built in the later sections, preprocessing the data in advance is more efficient than repeating it in each model run.
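
When preprocessing is done in advance, one common convention is to estimate the centering and scaling values on the training rows only and then reuse them for the evaluation rows. The Python sketch below illustrates that convention; whether the report's pipeline does exactly this is an assumption, and the helper names and sample values are hypothetical.

```python
from statistics import mean, stdev

def fit_scaler(train_values):
    """Estimate center and scale from training data only."""
    return mean(train_values), stdev(train_values)

def apply_scaler(values, center, scale):
    """Standardize values with previously estimated center and scale."""
    return [(v - center) / scale for v in values]

train_temp = [65.2, 65.6, 66.4, 63.6, 76.2]   # hypothetical training rows
eval_temp = [65.4, 65.8, 66.6]                # hypothetical evaluation rows

center, scale = fit_scaler(train_temp)
train_z = apply_scaler(train_temp, center, scale)
eval_z = apply_scaler(eval_temp, center, scale)   # reuses training statistics
print([round(z, 2) for z in eval_z])
```

After this step the training column has mean 0 and standard deviation 1, which matches the pattern in the df_mod summary; the evaluation column will generally be close to, but not exactly, standardized.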

Data summary
Name df_mod
Number of rows 2567
Number of columns 29
_______________________
Column type frequency:
numeric 29
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Carb.Volume 0 1 0.00 1.00 -3.11 -0.72 -0.22 0.81 3.10 ▁▆▇▅▁
Fill.Ounces 0 1 0.00 1.00 -3.93 -0.63 -0.02 0.60 3.97 ▁▂▇▂▁
PC.Volume 0 1 0.00 1.00 -3.28 -0.63 -0.10 0.58 3.31 ▁▃▇▂▁
Carb.Pressure 0 1 0.00 1.00 -3.16 -0.74 0.00 0.67 3.15 ▁▅▇▃▁
Carb.Temp 0 1 0.00 1.00 -3.08 -0.67 -0.08 0.66 3.17 ▁▅▇▃▁
PSC 0 1 0.00 1.00 -1.69 -0.71 -0.14 0.56 3.78 ▆▇▃▁▁
PSC.Fill 0 1 0.00 1.00 -1.67 -0.81 -0.13 0.55 3.62 ▆▇▃▁▁
PSC.CO2 0 1 0.00 1.00 -1.32 -0.85 -0.39 0.55 4.29 ▇▅▂▁▁
Mnf.Flow 0 1 0.00 1.00 -1.04 -1.04 0.38 0.97 1.71 ▇▁▁▇▂
Carb.Pressure1 0 1 0.00 1.00 -3.60 -0.76 0.14 0.60 3.75 ▁▃▇▂▁
Fill.Pressure 0 1 0.00 1.00 -4.19 -0.60 -0.48 0.66 3.93 ▁▁▇▂▁
Hyd.Pressure1 0 1 0.00 1.00 -1.06 -1.00 -0.08 0.63 3.67 ▇▅▂▁▁
Hyd.Pressure2 0 1 0.00 1.00 -1.27 -1.27 0.47 0.84 2.35 ▇▂▇▅▁
Hyd.Pressure4 0 1 0.00 1.00 -2.62 -0.80 -0.04 0.42 3.46 ▂▆▇▂▁
Temperature 0 1 0.00 1.00 -1.71 -0.56 -0.27 0.30 7.34 ▇▃▁▁▁
Usage.cont 0 1 0.00 1.00 -3.00 -0.88 0.26 0.92 1.65 ▁▃▅▃▇
Carb.Flow 0 1 0.00 1.00 -2.29 -1.22 0.52 0.67 2.46 ▂▅▆▇▁
MFR 0 1 0.00 1.00 -5.13 0.17 0.38 0.45 1.55 ▁▁▁▂▇
Pressure.Vacuum 0 1 0.00 1.00 -2.43 -0.67 -0.32 0.38 2.83 ▂▇▅▃▁
Oxygen.Filler 0 1 0.00 1.00 -0.98 -0.55 -0.29 0.30 7.83 ▇▁▁▁▁
Bowl.Setpoint 0 1 0.00 1.00 -2.57 -0.61 0.70 0.70 2.00 ▁▂▃▇▁
Pressure.Setpoint 0 1 0.00 1.00 -1.77 -0.79 -0.79 1.17 2.16 ▁▇▁▆▁
Air.Pressurer 0 1 0.00 1.00 -1.68 -0.52 -0.19 0.14 4.42 ▅▇▁▁▁
Alch.Rel 0 1 0.00 1.00 -3.20 -0.71 -0.67 0.66 3.41 ▁▇▂▃▁
Carb.Rel 0 1 0.00 1.00 -3.70 -0.75 -0.28 0.80 4.85 ▁▇▆▂▁
PH 0 1 8.55 0.17 7.88 8.44 8.54 8.68 9.36 ▁▅▇▂▁
Brand.Code_B 0 1 0.00 1.00 -1.00 -1.00 1.00 1.00 1.00 ▇▁▁▁▇
Brand.Code_C 0 1 0.00 1.00 -0.41 -0.41 -0.41 -0.41 2.43 ▇▁▁▁▂
Brand.Code_D 0 1 0.00 1.00 -0.56 -0.56 -0.56 -0.56 1.78 ▇▁▁▁▂

6 Model building

Three categories of regression models are built in this section: linear regression models, non-linear regression models and tree-based models. The model with the best performance on the test data set will be selected as the final model.

The models to be built are listed below:

  • Linear Regression Models: PLS, Ridge, LASSO and Elastic Net
  • Non-linear Regression Models: KNN, SVM-Linear, SVM-Radial, MARS and Neural Network
  • Tree-based Regression Models: Random Forest, Gradient Boosting Machine and Cubist
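
Every model in this section is tuned with 10-fold cross-validation on 2054 training rows, which is why the outputs list resampling sample sizes of about 1849. The Python sketch below shows where that number comes from. It uses a plain even split; the actual fold assignment in the R run may differ slightly, and the function name is made up for the example.

```python
def fold_sizes(n_samples, n_folds):
    """Sizes of the held-out folds when samples are split as evenly as possible."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]

held_out = fold_sizes(2054, 10)
training = [2054 - h for h in held_out]   # rows used to fit in each resample
print(held_out)
print(training)
```

Each resample therefore fits on roughly 90% of the training rows and scores the remaining 10%; the reported RMSE, Rsquared and MAE are averages over the ten held-out folds.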

6.1 Linear Regression Models

6.1.1 PLS Regression

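  1. 7 latent variables (ncomp = 7) are optimal;

  2. The corresponding resampled (10-fold CV) estimates of RMSE and R2 are 0.1384305 and 0.3573092. The summary line printed after the model output reports RMSE = 0.1362656 and R2 = 0.3739715.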

## Partial Least Squares 
## 
## 2054 samples
##   28 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1849, 1849, 1849, 1849, 1849, 1849, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE       Rsquared   MAE      
##    1     0.1497005  0.2470540  0.1176600
##    2     0.1430215  0.3139339  0.1116965
##    3     0.1413576  0.3297154  0.1108805
##    4     0.1396517  0.3458175  0.1093216
##    5     0.1390031  0.3516492  0.1085059
##    6     0.1384918  0.3566973  0.1080004
##    7     0.1384305  0.3573092  0.1081537
##    8     0.1384597  0.3570316  0.1080082
##    9     0.1385041  0.3566531  0.1080056
##   10     0.1385358  0.3563692  0.1080224
##   11     0.1385680  0.3560643  0.1080587
##   12     0.1385836  0.3559539  0.1080834
##   13     0.1385914  0.3558839  0.1080780
##   14     0.1385636  0.3561045  0.1080470
##   15     0.1385706  0.3560451  0.1080609
##   16     0.1385782  0.3559804  0.1080730
##   17     0.1385796  0.3559731  0.1080733
##   18     0.1385952  0.3558288  0.1080827
##   19     0.1386021  0.3557690  0.1080826
##   20     0.1386004  0.3557876  0.1080814
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 7.
##      RMSE  Rsquared       MAE 
## 0.1362656 0.3739715 0.1064367
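
The three numbers in the last output line are RMSE, Rsquared and MAE. The Python sketch below shows how such a triple can be computed from predictions and observations, with Rsquared taken as the squared correlation between the two, which is how caret's default summary defines it. The function name and the toy values are hypothetical.

```python
from math import sqrt

def regression_metrics(pred, obs):
    """RMSE, squared-correlation R2 and MAE for paired predictions and observations."""
    n = len(obs)
    rmse = sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)
    mae = sum(abs(p - o) for p, o in zip(pred, obs)) / n
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return {"RMSE": rmse, "Rsquared": cov ** 2 / (var_p * var_o), "MAE": mae}

obs = [8.4, 8.5, 8.6, 8.7]     # hypothetical observed PH values
pred = [8.5, 8.5, 8.6, 8.6]    # hypothetical model predictions
print(regression_metrics(pred, obs))
```

Because Rsquared is a squared correlation here, it measures how well predictions track the observations but does not penalize a constant bias; RMSE and MAE do.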

6.1.2 Ridge Regression

  1. lambda = 0.03157895 is optimal;

  2. The corresponding resampled (10-fold CV) estimates of RMSE and R2 are 0.1384906 and 0.3567237. The summary line printed after the model output reports RMSE = 0.1299868 and R2 = 0.4415918.

## Ridge Regression 
## 
## 2054 samples
##   28 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1849, 1849, 1849, 1849, 1849, 1849, ... 
## Resampling results across tuning parameters:
## 
##   lambda      RMSE       Rsquared   MAE      
##   0.00000000  0.1386059  0.3557400  0.1080834
##   0.01052632  0.1385244  0.3564372  0.1080449
##   0.02105263  0.1384937  0.3566985  0.1080301
##   0.03157895  0.1384906  0.3567237  0.1080267
##   0.04210526  0.1385055  0.3565978  0.1080296
##   0.05263158  0.1385331  0.3563667  0.1080395
##   0.06315789  0.1385701  0.3560587  0.1080534
##   0.07368421  0.1386146  0.3556927  0.1080730
##   0.08421053  0.1386650  0.3552820  0.1081018
##   0.09473684  0.1387202  0.3548367  0.1081358
##   0.10526316  0.1387795  0.3543642  0.1081742
##   0.11578947  0.1388424  0.3538704  0.1082172
##   0.12631579  0.1389082  0.3533599  0.1082615
##   0.13684211  0.1389767  0.3528364  0.1083094
##   0.14736842  0.1390475  0.3523031  0.1083598
##   0.15789474  0.1391204  0.3517622  0.1084124
##   0.16842105  0.1391953  0.3512160  0.1084677
##   0.17894737  0.1392719  0.3506661  0.1085256
##   0.18947368  0.1393501  0.3501139  0.1085829
##   0.20000000  0.1394298  0.3495607  0.1086415
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was lambda = 0.03157895.
##      RMSE  Rsquared       MAE 
## 0.1299868 0.4415918 0.1021300
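
The effect of the ridge penalty is easiest to see with a single centered predictor, where the slope has a closed form. The Python sketch below is a textbook illustration with hypothetical data; the lambda scale used by the R package behind the table above may be parameterized differently.

```python
def ridge_slope(x, y, lam):
    """Ridge slope for one centered predictor: sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

x = [-1.5, -0.5, 0.5, 1.5]        # hypothetical centered predictor
y = [-0.30, -0.12, 0.08, 0.34]    # hypothetical centered response

for lam in (0.0, 0.5, 5.0):
    print(lam, round(ridge_slope(x, y, lam), 4))
```

With lam = 0 the result is the ordinary least-squares slope, and larger values of lam shrink it toward zero. The nearly flat RMSE curve in the tuning table suggests that shrinkage changes little for this data set.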

6.1.3 LASSO

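  1. The optimal fraction is 0.1, which is the largest value in the tuning grid; RMSE is still decreasing at that point, so a wider grid may give a better fit;

  2. The corresponding resampled (10-fold CV) estimates of RMSE and R2 are 0.1574345 and 0.2354022. The summary line printed after the model output reports RMSE = 0.1561285 and R2 = 0.2961838.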

## The lasso 
## 
## 2054 samples
##   28 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1849, 1849, 1849, 1849, 1849, 1849, ... 
## Resampling results across tuning parameters:
## 
##   fraction    RMSE       Rsquared   MAE      
##   0.01000000  0.1702219  0.1939752  0.1358928
##   0.01473684  0.1693460  0.1939752  0.1350627
##   0.01947368  0.1684926  0.1939752  0.1342976
##   0.02421053  0.1676620  0.1939752  0.1335504
##   0.02894737  0.1668545  0.1939752  0.1328056
##   0.03368421  0.1660705  0.1939752  0.1321218
##   0.03842105  0.1653103  0.1939752  0.1314585
##   0.04315789  0.1645743  0.1939752  0.1308063
##   0.04789474  0.1638627  0.1939752  0.1301621
##   0.05263158  0.1631759  0.1939752  0.1295284
##   0.05736842  0.1625142  0.1939752  0.1289066
##   0.06210526  0.1619037  0.1954704  0.1283338
##   0.06684211  0.1613301  0.1989758  0.1277900
##   0.07157895  0.1607555  0.2047889  0.1272578
##   0.07631579  0.1601689  0.2114635  0.1267473
##   0.08105263  0.1595945  0.2174196  0.1262680
##   0.08578947  0.1590325  0.2227269  0.1257959
##   0.09052632  0.1584829  0.2274517  0.1253301
##   0.09526316  0.1579511  0.2316237  0.1248748
##   0.10000000  0.1574345  0.2354022  0.1244261
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was fraction = 0.1.
##      RMSE  Rsquared       MAE 
## 0.1561285 0.2961838 0.1274395
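
The tuning parameter "fraction" is, as far as can be told from the output, the size of the lasso coefficient vector relative to the full least-squares solution, measured in the L1 norm. The Python sketch below illustrates the idea under the simplifying assumption of orthonormal predictors, where the lasso solution is a soft-thresholded copy of the least-squares coefficients. All names and numbers are hypothetical.

```python
def soft_threshold(beta, t):
    """Shrink beta toward zero by t and set it to zero if |beta| <= t."""
    if beta > t:
        return beta - t
    if beta < -t:
        return beta + t
    return 0.0

def l1_fraction(ols_coefs, t):
    """L1 norm of the thresholded coefficients relative to the unpenalized ones."""
    lasso = [soft_threshold(b, t) for b in ols_coefs]
    return sum(abs(b) for b in lasso) / sum(abs(b) for b in ols_coefs)

ols = [0.9, -0.4, 0.1]   # hypothetical least-squares coefficients
for t in (0.0, 0.2, 0.5):
    print(t, [round(soft_threshold(b, t), 2) for b in ols], round(l1_fraction(ols, t), 3))
```

A fraction of 1 corresponds to the unpenalized fit, and small fractions correspond to heavy shrinkage with most coefficients at zero.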

6.1.4 Elastic Net

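  1. The optimal values are fraction = 0.1 and lambda = 0.2, both at the upper edge of the tuning grid, so a wider grid may give a better fit;

  2. The corresponding resampled (10-fold CV) estimates of RMSE and R2 are 0.1597176 and 0.2167320. The summary line printed after the model output reports RMSE = 0.1589297 and R2 = 0.2697740.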

## Elasticnet 
## 
## 2054 samples
##   28 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1849, 1849, 1849, 1849, 1849, 1849, ... 
## Resampling results across tuning parameters:
## 
##   lambda      fraction    RMSE       Rsquared   MAE      
##   0.00000000  0.01000000  0.1702219  0.1939752  0.1358928
##   0.01052632  0.01473684  0.1694478  0.1939752  0.1351553
##   0.02105263  0.01947368  0.1687105  0.1939752  0.1344917
##   0.03157895  0.02421053  0.1680056  0.1939752  0.1338616
##   0.04210526  0.02894737  0.1673297  0.1939752  0.1332466
##   0.05263158  0.03368421  0.1666792  0.1939752  0.1326442
##   0.06315789  0.03842105  0.1660529  0.1939752  0.1321087
##   0.07368421  0.04315789  0.1654493  0.1939752  0.1315835
##   0.08421053  0.04789474  0.1648661  0.1939752  0.1310710
##   0.09473684  0.05263158  0.1643026  0.1939752  0.1305686
##   0.10526316  0.05736842  0.1637586  0.1939752  0.1300747
##   0.11578947  0.06210526  0.1632339  0.1939752  0.1295900
##   0.12631579  0.06684211  0.1627265  0.1939752  0.1291161
##   0.13684211  0.07157895  0.1622458  0.1942561  0.1286614
##   0.14736842  0.07631579  0.1618001  0.1953714  0.1282402
##   0.15789474  0.08105263  0.1613795  0.1982185  0.1278414
##   0.16842105  0.08578947  0.1609593  0.2025491  0.1274507
##   0.17894737  0.09052632  0.1605323  0.2074739  0.1270651
##   0.18947368  0.09526316  0.1601179  0.2122214  0.1266996
##   0.20000000  0.10000000  0.1597176  0.2167320  0.1263679
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were fraction = 0.1 and lambda = 0.2.
##      RMSE  Rsquared       MAE 
## 0.1589297 0.2697740 0.1299668
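
The tuning table above lists 20 rows in which lambda and fraction increase together, which suggests the two sequences were paired rather than crossed. The Python sketch below contrasts the two kinds of grid. It is an illustration of the difference only, not the report's tuning code, and the helper name is made up.

```python
from itertools import product

def linspace(start, stop, num):
    """num evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

lambdas = linspace(0.0, 0.2, 20)
fractions = linspace(0.01, 0.1, 20)

paired = list(zip(lambdas, fractions))        # 20 candidates along a diagonal
crossed = list(product(lambdas, fractions))   # every lambda with every fraction

print(len(paired), len(crossed))
```

A crossed grid evaluates each lambda with each fraction, so it can find combinations such as a small lambda with a large fraction that a paired grid never tries.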

6.2 Non-Linear Regression Models

6.2.1 KNN

  1. The optimal k is 7;

  2. The corresponding resampled (10-fold CV) estimates of RMSE and R2 are 0.1237475 and 0.4906292. The summary line printed after the model output reports RMSE = 0.10585060 and R2 = 0.62857413.

## k-Nearest Neighbors 
## 
## 2054 samples
##   28 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1849, 1849, 1849, 1849, 1849, 1849, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE       
##    5  0.1257029  0.4775757  0.09351756
##    7  0.1237475  0.4906292  0.09276375
##    9  0.1242006  0.4868828  0.09366748
##   11  0.1258387  0.4745378  0.09549822
##   13  0.1263061  0.4712000  0.09587242
##   15  0.1274855  0.4620434  0.09716663
##   17  0.1284044  0.4544409  0.09826715
##   19  0.1287749  0.4513034  0.09857713
##   21  0.1292793  0.4471276  0.09919209
##   23  0.1298352  0.4422596  0.09962648
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 7.
##       RMSE   Rsquared        MAE 
## 0.10585060 0.62857413 0.07894874
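
KNN regression predicts a new row as the average response of its k closest training rows. The Python sketch below shows the idea with Euclidean distance on already scaled predictors; the function name and toy data are hypothetical and the code is not the implementation used for the table above.

```python
from math import dist

def knn_predict(train_x, train_y, query, k):
    """Average the responses of the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]   # hypothetical scaled predictors
train_y = [8.4, 8.5, 8.6, 9.0]                               # hypothetical PH values
print(round(knn_predict(train_x, train_y, (0.1, 0.1), k=3), 3))
```

Because the method depends on distances, centering and scaling the predictors beforehand matters: an unscaled predictor with a large range would dominate the neighbor search.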

6.2.2 SVM-Linear

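  1. The model uses epsilon = 0.1 and cost C = 1; C was held constant rather than tuned;

  2. The corresponding resampled (10-fold CV) estimates of RMSE and R2 are 0.1405161 and 0.3452223. The summary line printed after the model output reports RMSE = 0.1381481 and R2 = 0.3615830.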

## Support Vector Machines with Linear Kernel 
## 
## 2054 samples
##   28 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1849, 1849, 1849, 1849, 1849, 1849, ... 
## Resampling results:
## 
##   RMSE       Rsquared   MAE      
##   0.1405161  0.3452223  0.1072494
## 
## Tuning parameter 'C' was held constant at a value of 1
## Support Vector Machine object of class "ksvm" 
## 
## SV type: eps-svr  (regression) 
##  parameter : epsilon = 0.1  cost C = 1 
## 
## Linear (vanilla) kernel function. 
## 
## Number of Support Vectors : 1831 
## 
## Objective Function Value : -1053.426 
## Training error : 0.643132
##      RMSE  Rsquared       MAE 
## 0.1381481 0.3615830 0.1045695
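
The "eps-svr" label in the output refers to support vector regression with an epsilon-insensitive loss: residuals smaller than epsilon cost nothing, and larger ones are penalized linearly. A minimal Python sketch of that loss follows; the function name is hypothetical, and the residuals are made-up numbers on whatever scale the response is modeled.

```python
def eps_insensitive(residual, epsilon=0.1):
    """Epsilon-insensitive loss: zero inside the tube, linear outside it."""
    return max(0.0, abs(residual) - epsilon)

residuals = [0.02, -0.05, 0.15, -0.30]   # hypothetical residuals
print([round(eps_insensitive(r), 2) for r in residuals])
```

Training points whose residuals fall on or outside the tube become support vectors, which is consistent with the large support vector count reported above. The cost C controls how strongly the losses outside the tube are weighted against model flatness.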

6.2.3 SVM-Radial

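  1. The optimal cost is C = 4, with sigma held constant at 0.0242724;

  2. The corresponding resampled (10-fold CV) estimates of RMSE and R2 are 0.1204826 and 0.5158644. The summary line printed after the model output reports RMSE = 0.08011998 and R2 = 0.79263724.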

## Support Vector Machines with Radial Basis Function Kernel 
## 
## 2054 samples
##   28 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1849, 1849, 1849, 1849, 1849, 1849, ... 
## Resampling results across tuning parameters:
## 
##   C        RMSE       Rsquared   MAE       
##      0.25  0.1286431  0.4526820  0.09577483
##      0.50  0.1256923  0.4758057  0.09278004
##      1.00  0.1231104  0.4952829  0.09035109
##      2.00  0.1210941  0.5106732  0.08867772
##      4.00  0.1204826  0.5158644  0.08851988
##      8.00  0.1212283  0.5141755  0.08924725
##     16.00  0.1224728  0.5116971  0.09033769
##     32.00  0.1258334  0.4986777  0.09296903
##     64.00  0.1326503  0.4687005  0.09806454
##    128.00  0.1389973  0.4449388  0.10296902
##    256.00  0.1452495  0.4218464  0.10818173
##    512.00  0.1510565  0.4016687  0.11316640
##   1024.00  0.1519305  0.3984248  0.11383537
##   2048.00  0.1519305  0.3984248  0.11383537
##   4096.00  0.1519305  0.3984248  0.11383537
## 
## Tuning parameter 'sigma' was held constant at a value of 0.0242724
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.0242724 and C = 4.
## Support Vector Machine object of class "ksvm" 
## 
## SV type: eps-svr  (regression) 
##  parameter : epsilon = 0.1  cost C = 4 
## 
## Gaussian Radial Basis kernel function. 
##  Hyperparameter : sigma =  0.0242723997688406 
## 
## Number of Support Vectors : 1748 
## 
## Objective Function Value : -2289.491 
## Training error : 0.216318
##       RMSE   Rsquared        MAE 
## 0.08011998 0.79263724 0.05028598
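
The radial basis function kernel named in the output measures similarity between two rows as exp(-sigma * squared distance). The Python sketch below evaluates it with the sigma value reported above; the function name and the two example rows are hypothetical.

```python
from math import exp

def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel: exp(-sigma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return exp(-sigma * sq_dist)

sigma = 0.0242724                 # value reported in the model output
row_a = [0.0, 0.5, -1.0]          # hypothetical scaled predictor rows
row_b = [1.0, 0.5, -1.0]
print(round(rbf_kernel(row_a, row_a, sigma), 4), round(rbf_kernel(row_a, row_b, sigma), 4))
```

The kernel equals 1 for identical rows and decays toward 0 as rows move apart. A small sigma such as this one makes the decay slow, so each prediction is influenced by many training rows.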

6.2.4 MARS

  1. The optimal values are nprune = 23 and degree = 2;

  2. The corresponding resampled (10-fold CV) estimates of RMSE and R2 are 0.1318251 and 0.4188860. The RMSE and R2 reported for the fitted model are 0.12396741 and 0.49036903.

## Multivariate Adaptive Regression Spline 
## 
## 2054 samples
##   28 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1849, 1849, 1849, 1849, 1849, 1849, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE       Rsquared   MAE       
##   1        2      0.1527874  0.2164850  0.11922540
##   1        3      0.1457986  0.2863438  0.11355089
##   1        4      0.1452585  0.2918423  0.11308342
##   1        5      0.1441796  0.3026028  0.11197668
##   1        6      0.1415169  0.3270726  0.10987591
##   1        7      0.1404233  0.3375773  0.10888650
##   1        8      0.1393220  0.3479701  0.10820117
##   1        9      0.1376909  0.3630413  0.10675904
##   1       10      0.1361410  0.3783478  0.10524108
##   1       11      0.1359734  0.3793368  0.10491628
##   1       12      0.1356803  0.3821798  0.10448616
##   1       13      0.1371386  0.3722615  0.10517051
##   1       14      0.1371648  0.3721012  0.10514100
##   1       15      0.1376772  0.3688911  0.10526938
##   1       16      0.1377746  0.3686955  0.10511907
##   1       17      0.1377336  0.3690273  0.10493933
##   1       18      0.1376191  0.3702063  0.10482253
##   1       19      0.1376361  0.3699936  0.10482417
##   1       20      0.1377596  0.3690091  0.10470162
##   1       21      0.1376697  0.3698763  0.10467845
##   1       22      0.1372426  0.3737903  0.10428601
##   1       23      0.1373113  0.3733069  0.10431459
##   1       24      0.1371606  0.3744426  0.10429175
##   1       25      0.1369533  0.3761997  0.10418090
##   1       26      0.1366941  0.3786725  0.10387195
##   1       27      0.1368112  0.3779475  0.10391476
##   1       28      0.1367762  0.3786085  0.10390836
##   1       29      0.1366418  0.3798313  0.10380135
##   1       30      0.1364046  0.3818173  0.10374862
##   1       31      0.1366823  0.3794642  0.10376088
##   1       32      0.1367684  0.3788260  0.10381211
##   1       33      0.1371391  0.3760662  0.10401979
##   1       34      0.1371362  0.3761394  0.10397847
##   1       35      0.1373108  0.3745209  0.10399745
##   1       36      0.1374006  0.3739643  0.10405588
##   1       37      0.1374042  0.3738738  0.10411443
##   1       38      0.1374042  0.3738738  0.10411443
##   2        2      0.1527874  0.2164850  0.11922540
##   2        3      0.1461656  0.2830313  0.11377071
##   2        4      0.1448162  0.2964531  0.11218248
##   2        5      0.1432118  0.3120895  0.11125386
##   2        6      0.1413509  0.3291617  0.10987207
##   2        7      0.1399609  0.3424904  0.10835457
##   2        8      0.1392349  0.3520761  0.10718962
##   2        9      0.1356488  0.3813322  0.10373772
##   2       10      0.1362731  0.3780690  0.10382089
##   2       11      0.1365127  0.3769165  0.10333758
##   2       12      0.1364264  0.3778924  0.10319761
##   2       13      0.1351820  0.3883888  0.10240425
##   2       14      0.1355442  0.3855695  0.10251147
##   2       15      0.1346713  0.3928119  0.10169420
##   2       16      0.1337491  0.4010124  0.10125582
##   2       17      0.1337139  0.4018183  0.10135034
##   2       18      0.1330777  0.4078082  0.10074254
##   2       19      0.1330450  0.4084463  0.10053691
##   2       20      0.1327534  0.4110547  0.10032196
##   2       21      0.1325528  0.4127307  0.10017536
##   2       22      0.1321892  0.4157352  0.09981311
##   2       23      0.1318251  0.4188860  0.09946667
##   2       24      0.1318599  0.4185769  0.09949694
##   2       25      0.1320799  0.4167887  0.09960169
##   2       26      0.1321709  0.4160313  0.09959809
##   2       27      0.1320612  0.4169283  0.09951555
##   2       28      0.1320617  0.4168964  0.09952595
##   2       29      0.1320048  0.4173713  0.09945129
##   2       30      0.1320310  0.4171650  0.09951915
##   2       31      0.1320310  0.4171650  0.09951915
##   2       32      0.1320310  0.4171650  0.09951915
##   2       33      0.1320310  0.4171650  0.09951915
##   2       34      0.1320310  0.4171650  0.09951915
##   2       35      0.1320310  0.4171650  0.09951915
##   2       36      0.1320310  0.4171650  0.09951915
##   2       37      0.1320310  0.4171650  0.09951915
##   2       38      0.1320310  0.4171650  0.09951915
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 23 and degree = 2.
## Selected 23 of 29 terms, and 11 of 28 predictors (nprune=23)
## Termination condition: RSq changed by less than 0.001 at 29 terms
## Importance: Mnf.Flow, Brand.Code_C, Alch.Rel, Bowl.Setpoint, ...
## Number of terms at each degree of interaction: 1 5 17
## GCV 0.01611237    RSS 31.31483    GRSq 0.4573021    RSq 0.4859905
##       RMSE   Rsquared        MAE 
## 0.12396741 0.49036903 0.09564496
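
The three values printed last (RMSE, Rsquared, MAE) have the same layout as the output of caret's `postResample()`. As a minimal sketch of what these metrics measure, assuming R2 is taken as the squared correlation between predictions and observations, they can be computed in base R. The vectors `obs` and `pred` and the helper name `regression_metrics` are made up for illustration and are not taken from this report.

```r
# Hypothetical observed and predicted pH values (illustration only).
obs  <- c(8.36, 8.26, 8.94, 8.24, 8.26)
pred <- c(8.40, 8.30, 8.80, 8.30, 8.20)

# Illustrative helper (our own name, not a caret function).
regression_metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((pred - obs)^2)),  # root mean squared error
    Rsquared = cor(pred, obs)^2,            # squared Pearson correlation
    MAE      = mean(abs(pred - obs)))       # mean absolute error
}

metrics <- regression_metrics(pred, obs)
print(metrics)
```

Lower RMSE and MAE and higher Rsquared indicate a better fit, which is how the models in this section are compared.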

6.2.5 Neural Network

The final neural network model uses size = 5 and decay = 0.01, with RMSE and R2 of 0.11423783 and 0.56938536 respectively.

## Model Averaged Neural Network 
## 
## 2054 samples
##   28 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1849, 1849, 1849, 1849, 1849, 1849, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE       Rsquared   MAE       
##   0.01   1     0.1390464  0.3530330  0.10718967
##   0.01   2     0.1434126  0.3389881  0.10782088
##   0.01   3     0.1526538  0.3942120  0.10075708
##   0.01   4     0.1257428  0.4693378  0.09528213
##   0.01   5     0.1233552  0.4889622  0.09328839
##   0.03   1     0.1386663  0.3554992  0.10775591
##   0.03   2     0.1388569  0.3613455  0.10756416
##   0.03   3     0.1315032  0.4201018  0.10026793
##   0.03   4     0.1258351  0.4704857  0.09536096
##   0.03   5     0.1247329  0.4793170  0.09451236
##   0.05   1     0.1384113  0.3582842  0.10760727
##   0.05   2     0.1418169  0.3406693  0.10996645
##   0.05   3     0.1307451  0.4301629  0.09983160
##   0.05   4     0.1269153  0.4592491  0.09681098
##   0.05   5     0.1243520  0.4819411  0.09394431
##   0.07   1     0.1387791  0.3543691  0.10793027
##   0.07   2     0.1433552  0.3242280  0.11137415
##   0.07   3     0.1307574  0.4302750  0.10031547
##   0.07   4     0.1275863  0.4536516  0.09767131
##   0.07   5     0.1249580  0.4762182  0.09519004
##   0.09   1     0.1384934  0.3572619  0.10786324
##   0.09   2     0.1388061  0.3677063  0.10781687
##   0.09   3     0.1297552  0.4408124  0.09960522
##   0.09   4     0.1263112  0.4645131  0.09599997
##   0.09   5     0.1251908  0.4737351  0.09525916
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 5, decay = 0.01 and bag
##  = FALSE.
##       RMSE   Rsquared        MAE 
## 0.11423783 0.56938536 0.08687277
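
The selection rule stated in the output ("RMSE was used to select the optimal model using the smallest value") amounts to picking the row with the minimum resampled RMSE. A small sketch of that step, using only the five size = 5 rows copied from the tuning table above (every row with a smaller size has a larger RMSE); the data frame name `nnet_results` is our own.

```r
# Resampled results for size = 5, copied from the tuning table above.
nnet_results <- data.frame(
  decay    = c(0.01, 0.03, 0.05, 0.07, 0.09),
  size     = 5,
  RMSE     = c(0.1233552, 0.1247329, 0.1243520, 0.1249580, 0.1251908),
  Rsquared = c(0.4889622, 0.4793170, 0.4819411, 0.4762182, 0.4737351)
)

# Keep the row with the smallest cross-validated RMSE.
best <- nnet_results[which.min(nnet_results$RMSE), ]
print(best)
```

The selected row is decay = 0.01, matching the final values reported by caret.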

6.3 Tree-Based Regression Models

6.3.1 Random Forest

  1. The optimal mtry = 15.

  2. The corresponding estimates of RMSE and R2 are 0.09784328 and 0.69226170 respectively; the 10-fold cross-validated values for mtry = 15 in the table below are 0.1046622 and 0.6441878.

## Random Forest 
## 
## 2054 samples
##   28 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1849, 1849, 1849, 1849, 1849, 1849, ... 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE       Rsquared   MAE       
##    2    0.1165576  0.5859532  0.08864558
##   15    0.1046622  0.6441878  0.07596282
##   28    0.1054225  0.6312982  0.07518499
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 15.
##       RMSE   Rsquared        MAE 
## 0.09784328 0.69226170 0.07327428
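
The report does not show the training call, so the following is only a sketch of how a fit of this shape could be set up with caret, assuming 10-fold cross-validation and `tuneLength = 3`. The data are synthetic stand-ins with 28 numeric predictors, and `ntree = 100` is chosen only to keep the example fast; neither comes from the report. It requires the caret and randomForest packages.

```r
library(caret)  # the randomForest package must also be installed

set.seed(123)
# Synthetic stand-in data: 200 rows, 28 numeric predictors (not the report's data).
x <- as.data.frame(matrix(rnorm(200 * 28), ncol = 28))
y <- 8.5 + 0.1 * x$V1 - 0.05 * x$V2 + rnorm(200, sd = 0.1)

# 10-fold cross-validation, as in the output above.
ctrl <- trainControl(method = "cv", number = 10)

# tuneLength = 3 lets caret pick three candidate mtry values.
rf_fit <- train(x = x, y = y, method = "rf",
                tuneLength = 3, trControl = ctrl, ntree = 100)

print(rf_fit$results[, c("mtry", "RMSE", "Rsquared", "MAE")])
print(rf_fit$bestTune)
```

With 28 predictors, the three candidate values are 2, 15 and 28, the same mtry grid that appears in the output above.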

6.3.2 Gradient Boosting Machine

  1. The optimal n.trees = 900, interaction.depth = 5, shrinkage = 0.1 and n.minobsinnode = 10.

  2. The corresponding estimates of RMSE and R2 are 0.1104675 and 0.5972602 respectively; the 10-fold cross-validated RMSE for this combination in the table below is 0.1146944.
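
The length of the tuning table below follows from the grid of candidate values. As a sketch, the values read off that table can be crossed with `expand.grid()`; that the grid was supplied this way (for example through caret's `tuneGrid` argument) is an assumption, and the name `gbm_grid` is our own.

```r
# Candidate values read off the tuning table (assumed to form a full grid).
gbm_grid <- expand.grid(
  shrinkage         = c(0.01, 0.10),
  interaction.depth = c(1, 3, 5, 7),
  n.minobsinnode    = c(5, 10),
  n.trees           = seq(100, 1000, by = 50)
)

# Number of parameter combinations: 2 * 4 * 2 * 19
print(nrow(gbm_grid))
```

This gives 304 combinations, one per row of the resampling results.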

## Stochastic Gradient Boosting 
## 
## 2054 samples
##   28 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1849, 1849, 1849, 1849, 1849, 1849, ... 
## Resampling results across tuning parameters:
## 
##   shrinkage  interaction.depth  n.minobsinnode  n.trees  RMSE     
##   0.01       1                   5               100     0.1535056
##   0.01       1                   5               150     0.1490350
##   0.01       1                   5               200     0.1458946
##   0.01       1                   5               250     0.1436576
##   0.01       1                   5               300     0.1419323
##   0.01       1                   5               350     0.1406554
##   0.01       1                   5               400     0.1396556
##   0.01       1                   5               450     0.1389351
##   0.01       1                   5               500     0.1382906
##   0.01       1                   5               550     0.1378580
##   0.01       1                   5               600     0.1374626
##   0.01       1                   5               650     0.1371200
##   0.01       1                   5               700     0.1367336
##   0.01       1                   5               750     0.1364098
##   0.01       1                   5               800     0.1361145
##   0.01       1                   5               850     0.1358792
##   0.01       1                   5               900     0.1356063
##   0.01       1                   5               950     0.1353368
##   0.01       1                   5              1000     0.1351189
##   0.01       1                  10               100     0.1536465
##   0.01       1                  10               150     0.1491078
##   0.01       1                  10               200     0.1460110
##   0.01       1                  10               250     0.1437385
##   0.01       1                  10               300     0.1420114
##   0.01       1                  10               350     0.1406584
##   0.01       1                  10               400     0.1397182
##   0.01       1                  10               450     0.1389405
##   0.01       1                  10               500     0.1383112
##   0.01       1                  10               550     0.1377765
##   0.01       1                  10               600     0.1373695
##   0.01       1                  10               650     0.1370119
##   0.01       1                  10               700     0.1366433
##   0.01       1                  10               750     0.1362744
##   0.01       1                  10               800     0.1360030
##   0.01       1                  10               850     0.1357687
##   0.01       1                  10               900     0.1355482
##   0.01       1                  10               950     0.1352849
##   0.01       1                  10              1000     0.1350739
##   0.01       3                   5               100     0.1443981
##   0.01       3                   5               150     0.1387395
##   0.01       3                   5               200     0.1352581
##   0.01       3                   5               250     0.1330947
##   0.01       3                   5               300     0.1316020
##   0.01       3                   5               350     0.1304906
##   0.01       3                   5               400     0.1295551
##   0.01       3                   5               450     0.1287383
##   0.01       3                   5               500     0.1280514
##   0.01       3                   5               550     0.1274304
##   0.01       3                   5               600     0.1267289
##   0.01       3                   5               650     0.1262282
##   0.01       3                   5               700     0.1258221
##   0.01       3                   5               750     0.1254674
##   0.01       3                   5               800     0.1251154
##   0.01       3                   5               850     0.1247970
##   0.01       3                   5               900     0.1245305
##   0.01       3                   5               950     0.1243256
##   0.01       3                   5              1000     0.1240812
##   0.01       3                  10               100     0.1445042
##   0.01       3                  10               150     0.1386595
##   0.01       3                  10               200     0.1351232
##   0.01       3                  10               250     0.1329691
##   0.01       3                  10               300     0.1313775
##   0.01       3                  10               350     0.1301674
##   0.01       3                  10               400     0.1292257
##   0.01       3                  10               450     0.1284388
##   0.01       3                  10               500     0.1277175
##   0.01       3                  10               550     0.1271278
##   0.01       3                  10               600     0.1265867
##   0.01       3                  10               650     0.1260797
##   0.01       3                  10               700     0.1256559
##   0.01       3                  10               750     0.1252243
##   0.01       3                  10               800     0.1248902
##   0.01       3                  10               850     0.1245717
##   0.01       3                  10               900     0.1242475
##   0.01       3                  10               950     0.1239413
##   0.01       3                  10              1000     0.1237621
##   0.01       5                   5               100     0.1407584
##   0.01       5                   5               150     0.1347082
##   0.01       5                   5               200     0.1310276
##   0.01       5                   5               250     0.1286710
##   0.01       5                   5               300     0.1269540
##   0.01       5                   5               350     0.1257140
##   0.01       5                   5               400     0.1246789
##   0.01       5                   5               450     0.1239186
##   0.01       5                   5               500     0.1233149
##   0.01       5                   5               550     0.1226674
##   0.01       5                   5               600     0.1220399
##   0.01       5                   5               650     0.1216075
##   0.01       5                   5               700     0.1212381
##   0.01       5                   5               750     0.1208889
##   0.01       5                   5               800     0.1205311
##   0.01       5                   5               850     0.1201081
##   0.01       5                   5               900     0.1198700
##   0.01       5                   5               950     0.1196036
##   0.01       5                   5              1000     0.1193918
##   0.01       5                  10               100     0.1405693
##   0.01       5                  10               150     0.1344117
##   0.01       5                  10               200     0.1307994
##   0.01       5                  10               250     0.1282400
##   0.01       5                  10               300     0.1265838
##   0.01       5                  10               350     0.1253331
##   0.01       5                  10               400     0.1242838
##   0.01       5                  10               450     0.1234494
##   0.01       5                  10               500     0.1226329
##   0.01       5                  10               550     0.1219982
##   0.01       5                  10               600     0.1214775
##   0.01       5                  10               650     0.1209410
##   0.01       5                  10               700     0.1206051
##   0.01       5                  10               750     0.1201725
##   0.01       5                  10               800     0.1198850
##   0.01       5                  10               850     0.1195760
##   0.01       5                  10               900     0.1192327
##   0.01       5                  10               950     0.1190077
##   0.01       5                  10              1000     0.1188001
##   0.01       7                   5               100     0.1384275
##   0.01       7                   5               150     0.1320876
##   0.01       7                   5               200     0.1282591
##   0.01       7                   5               250     0.1257032
##   0.01       7                   5               300     0.1239902
##   0.01       7                   5               350     0.1225967
##   0.01       7                   5               400     0.1216320
##   0.01       7                   5               450     0.1206244
##   0.01       7                   5               500     0.1199510
##   0.01       7                   5               550     0.1193445
##   0.01       7                   5               600     0.1187967
##   0.01       7                   5               650     0.1184142
##   0.01       7                   5               700     0.1180329
##   0.01       7                   5               750     0.1176940
##   0.01       7                   5               800     0.1173624
##   0.01       7                   5               850     0.1170728
##   0.01       7                   5               900     0.1167934
##   0.01       7                   5               950     0.1165117
##   0.01       7                   5              1000     0.1162684
##   0.01       7                  10               100     0.1381935
##   0.01       7                  10               150     0.1316462
##   0.01       7                  10               200     0.1276732
##   0.01       7                  10               250     0.1251146
##   0.01       7                  10               300     0.1232905
##   0.01       7                  10               350     0.1220781
##   0.01       7                  10               400     0.1210549
##   0.01       7                  10               450     0.1202724
##   0.01       7                  10               500     0.1195638
##   0.01       7                  10               550     0.1189483
##   0.01       7                  10               600     0.1183244
##   0.01       7                  10               650     0.1179829
##   0.01       7                  10               700     0.1175881
##   0.01       7                  10               750     0.1172371
##   0.01       7                  10               800     0.1169624
##   0.01       7                  10               850     0.1166893
##   0.01       7                  10               900     0.1163924
##   0.01       7                  10               950     0.1161454
##   0.01       7                  10              1000     0.1159055
##   0.10       1                   5               100     0.1353870
##   0.10       1                   5               150     0.1340044
##   0.10       1                   5               200     0.1331607
##   0.10       1                   5               250     0.1325162
##   0.10       1                   5               300     0.1322091
##   0.10       1                   5               350     0.1322188
##   0.10       1                   5               400     0.1319665
##   0.10       1                   5               450     0.1318188
##   0.10       1                   5               500     0.1319638
##   0.10       1                   5               550     0.1320879
##   0.10       1                   5               600     0.1321047
##   0.10       1                   5               650     0.1319325
##   0.10       1                   5               700     0.1323059
##   0.10       1                   5               750     0.1323037
##   0.10       1                   5               800     0.1322896
##   0.10       1                   5               850     0.1326825
##   0.10       1                   5               900     0.1326673
##   0.10       1                   5               950     0.1326552
##   0.10       1                   5              1000     0.1326691
##   0.10       1                  10               100     0.1353808
##   0.10       1                  10               150     0.1335157
##   0.10       1                  10               200     0.1326322
##   0.10       1                  10               250     0.1321384
##   0.10       1                  10               300     0.1319427
##   0.10       1                  10               350     0.1319482
##   0.10       1                  10               400     0.1320047
##   0.10       1                  10               450     0.1317027
##   0.10       1                  10               500     0.1320438
##   0.10       1                  10               550     0.1319677
##   0.10       1                  10               600     0.1317611
##   0.10       1                  10               650     0.1320805
##   0.10       1                  10               700     0.1319790
##   0.10       1                  10               750     0.1319191
##   0.10       1                  10               800     0.1318749
##   0.10       1                  10               850     0.1322117
##   0.10       1                  10               900     0.1322141
##   0.10       1                  10               950     0.1323673
##   0.10       1                  10              1000     0.1324594
##   0.10       3                   5               100     0.1247993
##   0.10       3                   5               150     0.1239652
##   0.10       3                   5               200     0.1232386
##   0.10       3                   5               250     0.1225885
##   0.10       3                   5               300     0.1222911
##   0.10       3                   5               350     0.1220773
##   0.10       3                   5               400     0.1220112
##   0.10       3                   5               450     0.1215983
##   0.10       3                   5               500     0.1211350
##   0.10       3                   5               550     0.1213181
##   0.10       3                   5               600     0.1209272
##   0.10       3                   5               650     0.1206428
##   0.10       3                   5               700     0.1205694
##   0.10       3                   5               750     0.1204253
##   0.10       3                   5               800     0.1203508
##   0.10       3                   5               850     0.1201017
##   0.10       3                   5               900     0.1202024
##   0.10       3                   5               950     0.1203198
##   0.10       3                   5              1000     0.1201014
##   0.10       3                  10               100     0.1246780
##   0.10       3                  10               150     0.1234524
##   0.10       3                  10               200     0.1226146
##   0.10       3                  10               250     0.1216866
##   0.10       3                  10               300     0.1209524
##   0.10       3                  10               350     0.1208231
##   0.10       3                  10               400     0.1208755
##   0.10       3                  10               450     0.1206491
##   0.10       3                  10               500     0.1206228
##   0.10       3                  10               550     0.1204732
##   0.10       3                  10               600     0.1202193
##   0.10       3                  10               650     0.1201130
##   0.10       3                  10               700     0.1202734
##   0.10       3                  10               750     0.1203011
##   0.10       3                  10               800     0.1200271
##   0.10       3                  10               850     0.1201333
##   0.10       3                  10               900     0.1202030
##   0.10       3                  10               950     0.1201546
##   0.10       3                  10              1000     0.1202208
##   0.10       5                   5               100     0.1212583
##   0.10       5                   5               150     0.1197410
##   0.10       5                   5               200     0.1192567
##   0.10       5                   5               250     0.1183344
##   0.10       5                   5               300     0.1177544
##   0.10       5                   5               350     0.1172734
##   0.10       5                   5               400     0.1170780
##   0.10       5                   5               450     0.1170512
##   0.10       5                   5               500     0.1170092
##   0.10       5                   5               550     0.1169448
##   0.10       5                   5               600     0.1168782
##   0.10       5                   5               650     0.1170498
##   0.10       5                   5               700     0.1169686
##   0.10       5                   5               750     0.1169860
##   0.10       5                   5               800     0.1168449
##   0.10       5                   5               850     0.1166501
##   0.10       5                   5               900     0.1166432
##   0.10       5                   5               950     0.1166941
##   0.10       5                   5              1000     0.1166490
##   0.10       5                  10               100     0.1203285
##   0.10       5                  10               150     0.1186437
##   0.10       5                  10               200     0.1178004
##   0.10       5                  10               250     0.1173107
##   0.10       5                  10               300     0.1169347
##   0.10       5                  10               350     0.1164742
##   0.10       5                  10               400     0.1160615
##   0.10       5                  10               450     0.1158243
##   0.10       5                  10               500     0.1154760
##   0.10       5                  10               550     0.1153967
##   0.10       5                  10               600     0.1152440
##   0.10       5                  10               650     0.1151973
##   0.10       5                  10               700     0.1150863
##   0.10       5                  10               750     0.1151455
##   0.10       5                  10               800     0.1148742
##   0.10       5                  10               850     0.1149837
##   0.10       5                  10               900     0.1146944
##   0.10       5                  10               950     0.1148337
##   0.10       5                  10              1000     0.1148119
##   0.10       7                   5               100     0.1184026
##   0.10       7                   5               150     0.1179315
##   0.10       7                   5               200     0.1180765
##   0.10       7                   5               250     0.1179658
##   0.10       7                   5               300     0.1175782
##   0.10       7                   5               350     0.1170519
##   0.10       7                   5               400     0.1168684
##   0.10       7                   5               450     0.1165944
##   0.10       7                   5               500     0.1164862
##   0.10       7                   5               550     0.1165281
##   0.10       7                   5               600     0.1164659
##   0.10       7                   5               650     0.1164430
##   0.10       7                   5               700     0.1162906
##   0.10       7                   5               750     0.1163773
##   0.10       7                   5               800     0.1164120
##   0.10       7                   5               850     0.1164677
##   0.10       7                   5               900     0.1165455
##   0.10       7                   5               950     0.1165966
##   0.10       7                   5              1000     0.1165314
##   0.10       7                  10               100     0.1175349
##   0.10       7                  10               150     0.1170537
##   0.10       7                  10               200     0.1166296
##   0.10       7                  10               250     0.1164940
##   0.10       7                  10               300     0.1162592
##   0.10       7                  10               350     0.1162896
##   0.10       7                  10               400     0.1160903
##   0.10       7                  10               450     0.1158844
##   0.10       7                  10               500     0.1161168
##   0.10       7                  10               550     0.1159057
##   0.10       7                  10               600     0.1157356
##   0.10       7                  10               650     0.1155581
##   0.10       7                  10               700     0.1154481
##   0.10       7                  10               750     0.1153436
##   0.10       7                  10               800     0.1156930
##   0.10       7                  10               850     0.1156195
##   0.10       7                  10               900     0.1155034
##   0.10       7                  10               950     0.1156759
##   0.10       7                  10              1000     0.1157331
##   Rsquared   MAE       
##   0.2923043  0.12104582
##   0.3175356  0.11713260
##   0.3345305  0.11446047
##   0.3457517  0.11262829
##   0.3539033  0.11130744
##   0.3601665  0.11020835
##   0.3651888  0.10940607
##   0.3693003  0.10887048
##   0.3730780  0.10838540
##   0.3758276  0.10801547
##   0.3780504  0.10770114
##   0.3807034  0.10741898
##   0.3834791  0.10710048
##   0.3859153  0.10683060
##   0.3881325  0.10657080
##   0.3898195  0.10634838
##   0.3920246  0.10610349
##   0.3939641  0.10585506
##   0.3953937  0.10564788
##   0.2901715  0.12113708
##   0.3173627  0.11721307
##   0.3324042  0.11449854
##   0.3442850  0.11266653
##   0.3530893  0.11133265
##   0.3596855  0.11022095
##   0.3647388  0.10947421
##   0.3687133  0.10884475
##   0.3724134  0.10833251
##   0.3762077  0.10791443
##   0.3788602  0.10762313
##   0.3814104  0.10728577
##   0.3841774  0.10699975
##   0.3866392  0.10666650
##   0.3886066  0.10641055
##   0.3902346  0.10622005
##   0.3918679  0.10600871
##   0.3938946  0.10576644
##   0.3955024  0.10553620
##   0.3895942  0.11361467
##   0.4085464  0.10883339
##   0.4221029  0.10593687
##   0.4322963  0.10407672
##   0.4401262  0.10275747
##   0.4464604  0.10176121
##   0.4524626  0.10089176
##   0.4575277  0.10012256
##   0.4619026  0.09946874
##   0.4662053  0.09887681
##   0.4714428  0.09827623
##   0.4750793  0.09783409
##   0.4776138  0.09744715
##   0.4799069  0.09709940
##   0.4823535  0.09674901
##   0.4844402  0.09642783
##   0.4861647  0.09615482
##   0.4873754  0.09593639
##   0.4890149  0.09567516
##   0.3877509  0.11364208
##   0.4095044  0.10877600
##   0.4228489  0.10593533
##   0.4328778  0.10405052
##   0.4421676  0.10273885
##   0.4491202  0.10165845
##   0.4548781  0.10077391
##   0.4600134  0.10005415
##   0.4650236  0.09932900
##   0.4689388  0.09871916
##   0.4725248  0.09816326
##   0.4759486  0.09764840
##   0.4788238  0.09718342
##   0.4817735  0.09675055
##   0.4839559  0.09636641
##   0.4862928  0.09609419
##   0.4885961  0.09572951
##   0.4907546  0.09543445
##   0.4919536  0.09523381
##   0.4296328  0.11056944
##   0.4465845  0.10542151
##   0.4595223  0.10223897
##   0.4698218  0.10009360
##   0.4794847  0.09860618
##   0.4862967  0.09744553
##   0.4927006  0.09649515
##   0.4973343  0.09570717
##   0.5007846  0.09510394
##   0.5049475  0.09443334
##   0.5090950  0.09384399
##   0.5117210  0.09335976
##   0.5139853  0.09298451
##   0.5162901  0.09264970
##   0.5187712  0.09233416
##   0.5216779  0.09195232
##   0.5232556  0.09171237
##   0.5249086  0.09144184
##   0.5262414  0.09120052
##   0.4305419  0.11040402
##   0.4496975  0.10519122
##   0.4622780  0.10210678
##   0.4745939  0.09982927
##   0.4829701  0.09830628
##   0.4898305  0.09716037
##   0.4960863  0.09617146
##   0.5011013  0.09532461
##   0.5062142  0.09453153
##   0.5101728  0.09390248
##   0.5133029  0.09335901
##   0.5171049  0.09284134
##   0.5192795  0.09246480
##   0.5219163  0.09200919
##   0.5236911  0.09174965
##   0.5257575  0.09142983
##   0.5281709  0.09110828
##   0.5296959  0.09084697
##   0.5309683  0.09059011
##   0.4553393  0.10873165
##   0.4715015  0.10322464
##   0.4840593  0.09986379
##   0.4955013  0.09761230
##   0.5037097  0.09601242
##   0.5112731  0.09465708
##   0.5169049  0.09363113
##   0.5232551  0.09268454
##   0.5271739  0.09195952
##   0.5309892  0.09133709
##   0.5346564  0.09079435
##   0.5369050  0.09041760
##   0.5390687  0.08998010
##   0.5410202  0.08963275
##   0.5431144  0.08930669
##   0.5450597  0.08901503
##   0.5467731  0.08875337
##   0.5487468  0.08850885
##   0.5503899  0.08830302
##   0.4579059  0.10834217
##   0.4765811  0.10278031
##   0.4897284  0.09926566
##   0.5007878  0.09692264
##   0.5099010  0.09521690
##   0.5154119  0.09405683
##   0.5212298  0.09301995
##   0.5255340  0.09224600
##   0.5296746  0.09152930
##   0.5334226  0.09091125
##   0.5373467  0.09030151
##   0.5391987  0.08993093
##   0.5415483  0.08946552
##   0.5437602  0.08909500
##   0.5455212  0.08878540
##   0.5473915  0.08851441
##   0.5493444  0.08819497
##   0.5510494  0.08794372
##   0.5526001  0.08768313
##   0.3925706  0.10578607
##   0.4009384  0.10441821
##   0.4073512  0.10356795
##   0.4121818  0.10286105
##   0.4147433  0.10252442
##   0.4148174  0.10231741
##   0.4170615  0.10200739
##   0.4186291  0.10180281
##   0.4173253  0.10176632
##   0.4168509  0.10178692
##   0.4171331  0.10169856
##   0.4191245  0.10159296
##   0.4156994  0.10191771
##   0.4163146  0.10185401
##   0.4170539  0.10192938
##   0.4138965  0.10206631
##   0.4142706  0.10204338
##   0.4145678  0.10190521
##   0.4147332  0.10185310
##   0.3921931  0.10586555
##   0.4072644  0.10407100
##   0.4136823  0.10302749
##   0.4161934  0.10222333
##   0.4176470  0.10183855
##   0.4179291  0.10164997
##   0.4172452  0.10159452
##   0.4196887  0.10133641
##   0.4169207  0.10150627
##   0.4182346  0.10139667
##   0.4198290  0.10107966
##   0.4171574  0.10112351
##   0.4181224  0.10117512
##   0.4190221  0.10095109
##   0.4197240  0.10083218
##   0.4169570  0.10094134
##   0.4164783  0.10101379
##   0.4157212  0.10097037
##   0.4151814  0.10098202
##   0.4816502  0.09612469
##   0.4868661  0.09486796
##   0.4913976  0.09386725
##   0.4968902  0.09321608
##   0.4999181  0.09285739
##   0.5016669  0.09260927
##   0.5026058  0.09225028
##   0.5067216  0.09199024
##   0.5100292  0.09154079
##   0.5092293  0.09162251
##   0.5125149  0.09122717
##   0.5148562  0.09089508
##   0.5157139  0.09088356
##   0.5173058  0.09063209
##   0.5182242  0.09069230
##   0.5202329  0.09060502
##   0.5195152  0.09066987
##   0.5188649  0.09053892
##   0.5206071  0.09044505
##   0.4819876  0.09626836
##   0.4904744  0.09454040
##   0.4967831  0.09377631
##   0.5045420  0.09303392
##   0.5104180  0.09212121
##   0.5116055  0.09181519
##   0.5111501  0.09173337
##   0.5131878  0.09165600
##   0.5138878  0.09147914
##   0.5156730  0.09137932
##   0.5175690  0.09106740
##   0.5185697  0.09105451
##   0.5177694  0.09111288
##   0.5178604  0.09125541
##   0.5197956  0.09083979
##   0.5190885  0.09096936
##   0.5189439  0.09100255
##   0.5192169  0.09083851
##   0.5190606  0.09082384
##   0.5082410  0.09219655
##   0.5199920  0.09033539
##   0.5226496  0.08951928
##   0.5302999  0.08894231
##   0.5350123  0.08839156
##   0.5388060  0.08805741
##   0.5407253  0.08783215
##   0.5409549  0.08772513
##   0.5418992  0.08763445
##   0.5425453  0.08759442
##   0.5428591  0.08757097
##   0.5418499  0.08771346
##   0.5427772  0.08765661
##   0.5428787  0.08758450
##   0.5443327  0.08749978
##   0.5455025  0.08728405
##   0.5460617  0.08722536
##   0.5459019  0.08738157
##   0.5462553  0.08736333
##   0.5157180  0.09190604
##   0.5283063  0.09030123
##   0.5342711  0.08958764
##   0.5382390  0.08888796
##   0.5414564  0.08830597
##   0.5452703  0.08796681
##   0.5488859  0.08731427
##   0.5513184  0.08694026
##   0.5541820  0.08661303
##   0.5551141  0.08667164
##   0.5565919  0.08631202
##   0.5569887  0.08610650
##   0.5580502  0.08590139
##   0.5579847  0.08578173
##   0.5601827  0.08565956
##   0.5593102  0.08578147
##   0.5617171  0.08551281
##   0.5610253  0.08569260
##   0.5614420  0.08571081
##   0.5312223  0.08936245
##   0.5350188  0.08859739
##   0.5341844  0.08839159
##   0.5354758  0.08786617
##   0.5386357  0.08724049
##   0.5429063  0.08694732
##   0.5444806  0.08656257
##   0.5469085  0.08651211
##   0.5481268  0.08621271
##   0.5481705  0.08636112
##   0.5488321  0.08636786
##   0.5491554  0.08641019
##   0.5504592  0.08633135
##   0.5500163  0.08634585
##   0.5499226  0.08638559
##   0.5497237  0.08636256
##   0.5493203  0.08638006
##   0.5490384  0.08638911
##   0.5496591  0.08631969
##   0.5379357  0.08898425
##   0.5416412  0.08834651
##   0.5448512  0.08767391
##   0.5458605  0.08698501
##   0.5480120  0.08693774
##   0.5481531  0.08704799
##   0.5502196  0.08686922
##   0.5522713  0.08686386
##   0.5508473  0.08707606
##   0.5526465  0.08682283
##   0.5542019  0.08661854
##   0.5556558  0.08641946
##   0.5565818  0.08623262
##   0.5573549  0.08605023
##   0.5550600  0.08633598
##   0.5557930  0.08638667
##   0.5567567  0.08629577
##   0.5556182  0.08635852
##   0.5552863  0.08638720
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 900,
##  interaction.depth = 5, shrinkage = 0.1 and n.minobsinnode = 10.
##      RMSE  Rsquared       MAE 
## 0.1104675 0.5972602 0.0845282

6.3.3 Cubist

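  1. The optimal tuning parameters are committees = 20 and neighbors = 5.

  2. The corresponding RMSE and R2 reported after the tuning summary are 0.09987318 and 0.67114775 respectively; the 10-fold cross-validated estimates for these settings are 0.1042786 and 0.6350231.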

## Cubist 
## 
## 2054 samples
##   28 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1849, 1849, 1849, 1849, 1849, 1849, ... 
## Resampling results across tuning parameters:
## 
##   committees  neighbors  RMSE       Rsquared   MAE       
##    1          0          0.1286602  0.4755080  0.08985362
##    1          5          0.1239531  0.5287986  0.08520446
##    1          9          0.1236001  0.5245257  0.08516113
##   10          0          0.1117588  0.5832072  0.08091727
##   10          5          0.1054796  0.6275221  0.07473292
##   10          9          0.1054210  0.6270252  0.07520324
##   20          0          0.1107426  0.5919454  0.08024516
##   20          5          0.1042786  0.6350231  0.07382106
##   20          9          0.1043447  0.6342982  0.07434306
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 5.
##       RMSE   Rsquared        MAE 
## 0.09987318 0.67114775 0.07325504
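
The three values printed after each tuning summary (RMSE, Rsquared, MAE) have the shape of caret's `postResample()` output. As a minimal sketch of what they measure, assuming Rsquared is the squared Pearson correlation between predictions and observations (caret's default), they can be recomputed in base R. The helper name `regression_metrics` and the toy vectors are hypothetical and are not taken from the report or the data set.

```r
# Hypothetical helper: recompute RMSE, R-squared and MAE for a set of predictions.
# Assumption: R-squared is the squared Pearson correlation, as in caret's postResample().
regression_metrics <- function(pred, obs) {
  stopifnot(length(pred) == length(obs))
  keep <- complete.cases(pred, obs)   # drop pairs with missing values
  pred <- pred[keep]
  obs  <- obs[keep]
  c(
    RMSE     = sqrt(mean((pred - obs)^2)),
    Rsquared = cor(pred, obs)^2,
    MAE      = mean(abs(pred - obs))
  )
}

# Toy pH-like values, for illustration only (not from the data set)
obs  <- c(8.36, 8.26, 8.94, 8.24, 8.26)
pred <- c(8.40, 8.30, 8.80, 8.30, 8.20)
metrics <- regression_metrics(pred, obs)
print(metrics)
```

Applying the same function to every candidate model's predictions on the same held-out rows would keep the comparison between models like for like.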

7 Model Selection

The SVM-Radial model has both the lowest RMSE and the highest R2, so it is selected as the best model.
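
The selection rule stated above (lowest RMSE, confirmed by highest R2) could be applied mechanically to a table with one row per candidate model. The following is only a sketch: the model names and metric values are placeholders invented for illustration, and the real table would hold the metrics reported for each model in section 6.

```r
# Hypothetical metric table: one row per candidate model.
# Values below are placeholders for illustration, not results from the report.
results <- data.frame(
  model    = c("model_a", "model_b", "model_c"),
  RMSE     = c(0.120, 0.105, 0.110),
  Rsquared = c(0.50, 0.63, 0.60),
  stringsAsFactors = FALSE
)

# Rank by RMSE (ascending); ties are broken by R-squared (descending)
ranked <- results[order(results$RMSE, -results$Rsquared), ]
best   <- ranked$model[1]

# Check whether the same model also has the highest R-squared
agrees <- best == results$model[which.max(results$Rsquared)]

print(ranked)
```

When `agrees` is `FALSE`, the two criteria point to different models and the choice needs an explicit tie-breaking argument rather than a single-metric ranking.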