Exercises 7.2 and 7.5

7.2 Friedman (1991) introduced several benchmark data sets created by simulation … Tune several models on these data … Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?
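
The simulated data can be generated along these lines, using the `mlbench::mlbench.friedman1` simulator the exercise is based on; the seed and the 5,000-sample test set are assumptions, though the 200-sample, 10-predictor training set matches the output below.

```r
library(mlbench)

set.seed(200)  # assumed seed
trainingData   <- mlbench.friedman1(200, sd = 1)   # 200 samples, 10 predictors
trainingData$x <- data.frame(trainingData$x)       # convert the matrix to a data frame
testData   <- mlbench.friedman1(5000, sd = 1)      # large test set for stable error estimates
testData$x <- data.frame(testData$x)
```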

1. K-Nearest Neighbors

## k-Nearest Neighbors 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    5  3.466085  0.5121775  2.816838
##    7  3.349428  0.5452823  2.727410
##    9  3.264276  0.5785990  2.660026
##   11  3.214216  0.6024244  2.603767
##   13  3.196510  0.6176570  2.591935
##   15  3.184173  0.6305506  2.577482
##   17  3.183130  0.6425367  2.567787
##   19  3.198752  0.6483184  2.592683
##   21  3.188993  0.6611428  2.588787
##   23  3.200458  0.6638353  2.604529
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 17.
##      RMSE  Rsquared       MAE 
## 3.2040595 0.6819919 2.5683461
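
A minimal sketch of the tuning that matches the output above, assuming the `trainingData`/`testData` objects from the simulation sketch; the seed is an assumption.

```r
library(caret)

set.seed(100)  # assumed seed
knnTuned <- train(trainingData$x, trainingData$y,
                  method     = "knn",
                  preProcess = c("center", "scale"),
                  tuneGrid   = data.frame(k = seq(5, 23, by = 2)),
                  trControl  = trainControl(method = "boot", number = 25))

# Test-set performance of the chosen model (k = 17)
postResample(predict(knnTuned, newdata = testData$x), testData$y)
```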

2. Neural Network

The tuned network used decay = 0.06 and size = 5 (five hidden units).

## a 10-5-1 network with 61 weights
## options were - linear output units  decay=0.06
##      RMSE  Rsquared       MAE 
## 1.5052969 0.9103803 1.1799201
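
The printout above (a 10-5-1 network with linear output units and decay = 0.06) is consistent with a caret fit along these lines; the candidate grid and iteration cap are assumptions.

```r
library(caret)

nnetGrid <- expand.grid(decay = c(0.01, 0.03, 0.06),  # assumed candidate values
                        size  = 1:5)
set.seed(100)
nnetTuned <- train(trainingData$x, trainingData$y,
                   method     = "nnet",
                   tuneGrid   = nnetGrid,
                   preProcess = c("center", "scale"),
                   linout     = TRUE,   # linear output units for regression
                   trace      = FALSE,
                   maxit      = 500)    # assumed iteration cap

postResample(predict(nnetTuned, newdata = testData$x), testData$y)
```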

3. MARS

## Selected 12 of 18 terms, and 6 of 10 predictors
## Termination condition: Reached nk 21
## Importance: X1, X4, X2, X5, X3, X6, X7-unused, X8-unused, X9-unused, ...
## Number of terms at each degree of interaction: 1 11 (additive model)
## GCV 2.540556    RSS 397.9654    GRSq 0.8968524    RSq 0.9183982
##      RMSE  Rsquared       MAE 
## 1.8136467 0.8677298 1.3911836
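
The MARS printout is consistent with a plain `earth()` fit (the "Reached nk 21" termination is earth's default for 10 predictors); a sketch:

```r
library(earth)

marsFit <- earth(trainingData$x, trainingData$y)  # degree = 1 by default (additive)
marsFit   # prints the term/predictor selection summary shown above

# Test-set performance
postResample(as.vector(predict(marsFit, newdata = testData$x)), testData$y)
```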

4. SVM (linear and radial)

## Support Vector Machines with Radial Basis Function Kernel 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   C     RMSE      Rsquared   MAE     
##   0.25  2.507298  0.7985476  1.988577
##   0.50  2.219032  0.8221279  1.723539
##   1.00  2.026322  0.8446958  1.569136
## 
## Tuning parameter 'sigma' was held constant at a value of 0.06472009
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.06472009 and C = 1.
##      RMSE  Rsquared       MAE 
## 2.2549651 0.8002404 1.7237658
## Support Vector Machines with Linear Kernel 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   2.443627  0.7597914  1.957003
## 
## Tuning parameter 'C' was held constant at a value of 1
##      RMSE  Rsquared       MAE 
## 2.7633860 0.6973384 2.0970616
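
Both SVM fits match caret's defaults: `tuneLength = 3` gives the C grid 0.25/0.50/1.00 for the radial kernel (with sigma estimated once via kernlab's `sigest`), and `svmLinear`'s default grid holds C at 1. A sketch, with the seed assumed:

```r
library(caret)

set.seed(100)  # assumed seed
svmRTuned <- train(trainingData$x, trainingData$y,
                   method     = "svmRadial",
                   preProcess = c("center", "scale"),
                   tuneLength = 3,   # C = 0.25, 0.50, 1.00 as in the output above
                   trControl  = trainControl(method = "cv", number = 10))

svmLTuned <- train(trainingData$x, trainingData$y,
                   method     = "svmLinear",
                   preProcess = c("center", "scale"),
                   trControl  = trainControl(method = "cv", number = 10))

postResample(predict(svmRTuned, newdata = testData$x), testData$y)
postResample(predict(svmLTuned, newdata = testData$x), testData$y)
```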

The models produce test-set RMSEs between 1.5 and 3.2. Given that the response has a mean of about 14.4 and a standard deviation of about 5 (see below), an RMSE of 1.5 is relatively low. That model, the neural network, also had an R-squared of .91, which suggests a reasonably good fit. MARS selected the informative predictors X1–X5 (plus the uninformative X6), so it does recover the true signal variables.

KNN is not generally considered a highly powerful model, so its weaker predictive performance is not surprising. The neural network was very sensitive to decay: at a decay of .06 it outperforms MARS, but at .01 it does not. The SVM models fell in between, with the radial kernel more accurate than the linear one.

##   Model      RMSE RSquared
## A KNN        3.2  0.68    
## B NeuralNet  1.5  0.91    
## C MARS       1.8  0.87    
## D SVM-Linear 2.8  0.70    
## E SVM-Radial 2.1  0.83
For reference, the mean and standard deviation of the simulated response y:

## [1] 14.38613
## [1] 4.964588

7.5. Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models. (a) Which nonlinear regression model gives the optimal resampling and test set performance?
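
A sketch of the setup, with the Exercise 6.3 choices assumed to be knn imputation and a split via `createDataPartition` (p = 0.8 reproduces the 143-sample training set on the 176-row data); standardizing the response is also an assumption, inferred from the near-zero mean and unit-scale sd reported later in this section.

```r
library(AppliedPredictiveModeling)
library(caret)

data(ChemicalManufacturingProcess)
predictors <- ChemicalManufacturingProcess[, -1]    # the 57 predictor columns
yield      <- ChemicalManufacturingProcess$Yield

# Impute missing values; knnImpute centers and scales as a side effect
pp      <- preProcess(predictors, method = "knnImpute")
imputed <- predict(pp, predictors)

set.seed(100)  # assumed seed
inTrain <- createDataPartition(yield, p = 0.8, list = FALSE)

# Standardize the response with the training-set mean and sd (assumed step)
mu <- mean(yield[inTrain]); s <- sd(yield[inTrain])
trainX <- imputed[inTrain, ];  trainY <- (yield[inTrain]  - mu) / s
testX  <- imputed[-inTrain, ]; testY  <- (yield[-inTrain] - mu) / s
```

The `train()` calls from 7.2 can then be reused on `trainX`/`trainY`; for example, a hypothetical `nnetChem <- train(trainX, trainY, method = "nnet", tuneGrid = nnetGrid, linout = TRUE, trace = FALSE)` would produce the neural network results reported in this section.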

1. K-Nearest Neighbors

## k-Nearest Neighbors 
## 
## 143 samples
##  57 predictor
## 
## Pre-processing: centered (57) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 143, 143, 143, 143, 143, 143, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    5  0.6231081  0.5890175  0.4452523
##    7  0.6253624  0.5924293  0.4487492
##    9  0.6320160  0.5928172  0.4549737
##   11  0.6376085  0.5958979  0.4637498
##   13  0.6430328  0.5978215  0.4684199
##   15  0.6492103  0.6022657  0.4700675
##   17  0.6554245  0.6026970  0.4730035
##   19  0.6620176  0.6023006  0.4773928
##   21  0.6688298  0.6023563  0.4820142
##   23  0.6732539  0.6048079  0.4854839
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.
##      RMSE  Rsquared       MAE 
## 0.6453094 0.6883410 0.4320985

2. Neural Network

The tuned network used decay = 0.03 and size = 4 (four hidden units).

##      RMSE  Rsquared       MAE 
## 0.5587323 0.7956238 0.3826069
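
Assuming the hypothetical `nnetChem` fit described above, the selected tuning values and the test metrics shown here could be pulled out with:

```r
nnetChem$bestTune                                   # expected: size = 4, decay = 0.03
postResample(predict(nnetChem, newdata = testX), testY)
```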

3. MARS

## Selected 13 of 21 terms, and 9 of 57 predictors
## Termination condition: RSq changed by less than 0.001 at 21 terms
## Importance: BiologicalMaterial06, BiologicalMaterial10, ...
## Number of terms at each degree of interaction: 1 12 (additive model)
## GCV 0.1155637    RSS 11.25251    GRSq 0.8768118    RSq 0.9149339
##      RMSE  Rsquared       MAE 
## 0.5833055 0.7457736 0.3888165

4. SVM (radial and linear)

## Support Vector Machines with Radial Basis Function Kernel 
## 
## 143 samples
##  57 predictor
## 
## Pre-processing: centered (57) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 129, 127, 128, 129, 129, 128, ... 
## Resampling results across tuning parameters:
## 
##   C     RMSE       Rsquared   MAE      
##   0.25  0.6002364  0.6847794  0.4089570
##   0.50  0.5507585  0.7084697  0.3705803
##   1.00  0.5273432  0.7107396  0.3509539
## 
## Tuning parameter 'sigma' was held constant at a value of 0.01291832
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.01291832 and C = 1.
##      RMSE  Rsquared       MAE 
## 0.6000082 0.7494038 0.3813698
## Support Vector Machines with Linear Kernel 
## 
## 143 samples
##  57 predictor
## 
## Pre-processing: centered (57), scaled (57) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 128, 128, 131, 130, 128, 128, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE      
##   1.100034  0.7018634  0.4937001
## 
## Tuning parameter 'C' was held constant at a value of 1
##      RMSE  Rsquared       MAE 
## 0.9539542 0.4053442 0.4620395

The models produce test-set RMSEs between .56 and .65 (with an outlier of .95 for the linear SVM). Given that the pre-processed response has a mean of about 0 and a standard deviation of 1.15 (see below), an RMSE of .56 is reasonable. That model, again the neural network, also had an R-squared of .80, suggesting a reasonably good fit.

With the exception of KNN and the linear SVM, all of the models fall within a small range. The lasso model fit for homework 7 is also a strong contender, with scores very comparable to the neural net's.

##   Model      RMSE RSquared
## A KNN        0.65 0.69    
## B NeuralNet  0.56 0.80    
## C MARS       0.58 0.75    
## D SVM-Linear 0.95 0.40    
## E SVM-Radial 0.60 0.75    
## F Lasso      0.57 0.80
For reference, the mean and standard deviation of the response after pre-processing:

## [1] -0.0278816
## [1] 1.154999

(b) Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?

The neural net model puts more manufacturing-process predictors at the forefront than the lasso model did. While the two models' performance is similar, there is little overlap between the sets of predictors they rank most important. (The neural net's importance scores below appear to label predictors by column position; e.g., X10 is the 10th predictor column.)

##      Overall
## X10 4.091391
## X1  3.693592
## X17 3.367948
## X55 3.229145
## X32 3.110172
## X51 2.877768
## X11 2.842688
## X42 2.716389
## X13 2.695383
## X39 2.689269
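
A sketch of how this top-ten list could be produced from the hypothetical `nnetChem` fit; caret's `varImp()` method for an nnet object returns combined-weight ("Overall") scores, with predictors labeled by column position.

```r
library(caret)

imp <- varImp(nnetChem$finalModel)   # data frame with an 'Overall' column
head(imp[order(imp$Overall, decreasing = TRUE), , drop = FALSE], 10)
```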

(c) Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?

Most of the biological predictors are highly correlated with yield; for the manufacturing-process predictors, however, r is low and p is high. This suggests the model is exploiting relationships that are not visible in simple pairwise correlations. Because of the high degree of multicollinearity, the model favors the set of predictors that explains unique variance, and some predictors may be useful only in combination with others.
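
One way to check these pairwise relationships, using the column positions flagged in the importance table above (the indices are copied from that output, not re-derived):

```r
# Correlation of each top-ranked predictor with the (training) response
topCols <- c(10, 1, 17, 55, 32)   # X10, X1, X17, X55, X32 from the table above
for (j in topCols) {
  ct <- cor.test(trainX[[j]], trainY)
  cat(sprintf("%-28s r = %6.3f  p = %.3g\n",
              names(trainX)[j], ct$estimate, ct$p.value))
}
```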