Question 7.2

Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:

\(y = 10 \sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10 x_4 + 5 x_5 + N(0, \sigma^2)\)

where the \(x\) values are random variables uniformly distributed on \([0, 1]\) (the simulation also creates five additional non-informative predictors). The package mlbench contains a function called mlbench.friedman1 that simulates these data:
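Following the setup given in the exercise, the data can be simulated as below (a sketch; the 200-sample training set matches the model printouts that follow, and the large hold-out set is used for the test-set metrics):

library(mlbench)
library(caret)

set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
trainingData$x <- data.frame(trainingData$x)  # convert the 'x' matrix to a data frame

# a large hold-out set to estimate the true error rate
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)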

Tune several models on these data. For example:

KNN Model
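A minimal sketch of the tuning call, modeled on the chapter's code (assuming trainingData from the simulation above; caret's defaults of 25 bootstrap resamples and tuneLength = 10 reproduce the grid printed below):

set.seed(200)
knnModel <- train(x = trainingData$x, y = trainingData$y,
                  method = "knn",
                  preProc = c("center", "scale"),
                  tuneLength = 10)
knnModel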

k-Nearest Neighbors 

200 samples
 10 predictor

Pre-processing: centered (10), scaled (10) 
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
Resampling results across tuning parameters:

  k   RMSE      Rsquared   MAE     
   5  3.466085  0.5121775  2.816838
   7  3.349428  0.5452823  2.727410
   9  3.264276  0.5785990  2.660026
  11  3.214216  0.6024244  2.603767
  13  3.196510  0.6176570  2.591935
  15  3.184173  0.6305506  2.577482
  17  3.183130  0.6425367  2.567787
  19  3.198752  0.6483184  2.592683
  21  3.188993  0.6611428  2.588787
  23  3.200458  0.6638353  2.604529

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 17.
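The held-out test set then gives an unbiased estimate of the final model's error (a sketch, assuming testData from above):

knnPred <- predict(knnModel, newdata = testData$x)
postResample(pred = knnPred, obs = testData$y)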
     RMSE  Rsquared       MAE 
3.2040595 0.6819919 2.5683461 

Which models appear to give the best performance? Does MARS select the informative predictors (those named X1-X5)?

MARS Model
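A sketch of the MARS tuning call (assuming the earth package is installed; the grid of degree 1:2 and nprune 2:15 matches the results printed below):

marsGrid <- expand.grid(degree = 1:2, nprune = 2:15)
set.seed(200)
marsModel <- train(x = trainingData$x, y = trainingData$y,
                   method = "earth",
                   preProc = c("center", "scale"),
                   tuneGrid = marsGrid)
marsModel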

Multivariate Adaptive Regression Spline 

200 samples
 10 predictor

Pre-processing: centered (10), scaled (10) 
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
Resampling results across tuning parameters:

  degree  nprune  RMSE      Rsquared   MAE     
  1        2      4.383438  0.2405683  3.597961
  1        3      3.645469  0.4745962  2.930453
  1        4      2.727602  0.7035031  2.184240
  1        5      2.449243  0.7611230  1.939231
  1        6      2.331605  0.7835496  1.833420
  1        7      1.976830  0.8421599  1.562591
  1        8      1.870959  0.8585503  1.464551
  1        9      1.804342  0.8683110  1.410395
  1       10      1.787676  0.8711960  1.386944
  1       11      1.790700  0.8707740  1.393076
  1       12      1.821005  0.8670619  1.419893
  1       13      1.858688  0.8617344  1.445459
  1       14      1.862343  0.8623072  1.446050
  1       15      1.871033  0.8607099  1.457618
  2        2      4.383438  0.2405683  3.597961
  2        3      3.644919  0.4742570  2.929647
  2        4      2.730222  0.7028372  2.183075
  2        5      2.481291  0.7545789  1.965749
  2        6      2.338369  0.7827873  1.825542
  2        7      2.030065  0.8328250  1.602024
  2        8      1.890997  0.8551326  1.477422
  2        9      1.742626  0.8757904  1.371910
  2       10      1.608221  0.8943432  1.255416
  2       11      1.474325  0.9111463  1.157848
  2       12      1.437483  0.9157967  1.120977
  2       13      1.439395  0.9164721  1.128309
  2       14      1.428565  0.9184503  1.118634
  2       15      1.434093  0.9182413  1.121622

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nprune = 14 and degree = 2.

The optimal MARS model minimized the resampling RMSE at nprune = 14 and degree = 2. Its test-set performance (computed with postResample, as for KNN) is:

     RMSE  Rsquared       MAE 
1.2779993 0.9338365 1.0147070 

The test-set RMSE of the MARS model is substantially lower than that of the KNN model (1.28 vs. 3.20). Let's see which variables are important.
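A quick check of variable importance (a sketch, assuming the fitted object is named marsModel):

varImp(marsModel)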

earth variable importance

    Overall
X1   100.00
X4    84.98
X2    68.87
X5    48.55
X3    38.96
X7     0.00
X9     0.00
X6     0.00
X10    0.00
X8     0.00

The MARS model selects the informative X1-X5 variables and assigns zero importance to the noise predictors X6-X10.

SVM Model
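A sketch of the SVM tuning call (assuming the kernlab package is installed; switching to 10-fold cross-validation with tuneLength = 10 matches the output below, and sigma is held constant at a value estimated by kernlab's sigest):

set.seed(200)
svmModel <- train(x = trainingData$x, y = trainingData$y,
                  method = "svmRadial",
                  preProc = c("center", "scale"),
                  tuneLength = 10,
                  trControl = trainControl(method = "cv"))
svmModel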

Support Vector Machines with Radial Basis Function Kernel 

200 samples
 10 predictor

Pre-processing: centered (10), scaled (10) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
Resampling results across tuning parameters:

  C       RMSE      Rsquared   MAE     
    0.25  2.534047  0.7923747  2.033670
    0.50  2.288685  0.8061041  1.833890
    1.00  2.140818  0.8219731  1.702844
    2.00  2.065252  0.8337042  1.632423
    4.00  1.977030  0.8470946  1.576586
    8.00  1.919041  0.8548963  1.533203
   16.00  1.915333  0.8556587  1.532406
   32.00  1.915333  0.8556587  1.532406
   64.00  1.915333  0.8556587  1.532406
  128.00  1.915333  0.8556587  1.532406

Tuning parameter 'sigma' was held constant at a value of 0.06670077
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.06670077 and C = 16.

The optimal SVM model has \(\sigma \approx 0.067\) and a cost of C = 16. Its test-set performance, followed by variable importance (caret reports a model-free loess \(R^2\) measure here, since SVMs have no built-in importance score):

     RMSE  Rsquared       MAE 
2.0829707 0.8242096 1.5826017 
loess r-squared variable importance

     Overall
X4  100.0000
X1   95.5047
X2   89.6186
X5   45.2170
X3   29.9330
X9    6.3299
X10   5.5182
X8    3.2527
X6    0.8884
X7    0.0000

The SVM ranks the informative variables X1-X5 at the top, though the loess filter assigns small nonzero scores to a few of the noise predictors.

Neural Network Model
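A sketch of the model-averaged neural network call. The tuning grid mirrors the printed results; the MaxNWts value is an assumption, chosen because it is consistent with the NaN rows (networks of size 6 or more would exceed this weight cap and fail to fit):

nnetGrid <- expand.grid(decay = c(0, 0.01, 0.1), size = 1:10, bag = FALSE)
set.seed(200)
nnetModel <- train(x = trainingData$x, y = trainingData$y,
                   method = "avNNet",
                   preProc = c("center", "scale"),
                   tuneGrid = nnetGrid,
                   trControl = trainControl(method = "cv"),
                   linout = TRUE,   # linear output units for regression
                   trace = FALSE,
                   MaxNWts = 5 * (ncol(trainingData$x) + 1) + 5 + 1,  # assumed cap
                   maxit = 500)
nnetModel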

Model Averaged Neural Network 

200 samples
 10 predictor

Pre-processing: centered (10), scaled (10) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
Resampling results across tuning parameters:

  decay  size  RMSE      Rsquared   MAE     
  0.00    1    2.434946  0.7705241  1.890180
  0.00    2    2.464028  0.7636460  1.954264
  0.00    3    1.952687  0.8503745  1.569414
  0.00    4    1.993307  0.8353695  1.564593
  0.00    5    2.163580  0.8268061  1.687171
  0.00    6         NaN        NaN       NaN
  0.00    7         NaN        NaN       NaN
  0.00    8         NaN        NaN       NaN
  0.00    9         NaN        NaN       NaN
  0.00   10         NaN        NaN       NaN
  0.01    1    2.430323  0.7698516  1.889068
  0.01    2    2.443860  0.7685031  1.905649
  0.01    3    2.128012  0.8220171  1.660445
  0.01    4    2.087196  0.8291606  1.598100
  0.01    5    2.033927  0.8350220  1.596082
  0.01    6         NaN        NaN       NaN
  0.01    7         NaN        NaN       NaN
  0.01    8         NaN        NaN       NaN
  0.01    9         NaN        NaN       NaN
  0.01   10         NaN        NaN       NaN
  0.10    1    2.437162  0.7674718  1.890319
  0.10    2    2.397189  0.7716524  1.858708
  0.10    3    2.108429  0.8240924  1.655143
  0.10    4    2.084801  0.8300933  1.582811
  0.10    5    2.078994  0.8321068  1.643078
  0.10    6         NaN        NaN       NaN
  0.10    7         NaN        NaN       NaN
  0.10    8         NaN        NaN       NaN
  0.10    9         NaN        NaN       NaN
  0.10   10         NaN        NaN       NaN

Tuning parameter 'bag' was held constant at a value of FALSE
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were size = 3, decay = 0 and bag = FALSE.

The best averaged neural network has size = 3 and decay = 0. (The NaN rows correspond to networks of size 6 and larger, which could not be fit, most likely because they exceeded the MaxNWts weight cap.) Its test-set performance and variable importance:

     RMSE  Rsquared       MAE 
1.8234934 0.8683951 1.3971626 
loess r-squared variable importance

     Overall
X4  100.0000
X1   95.5047
X2   89.6186
X5   45.2170
X3   29.9330
X9    6.3299
X10   5.5182
X8    3.2527
X6    0.8884
X7    0.0000

The five informative predictors X1-X5 again top the list; because the loess \(R^2\) filter is model-free, this ranking is identical to the SVM's. Overall, MARS gives the best performance of the four models (test-set RMSE 1.28, versus 1.82 for the neural network, 2.08 for the SVM, and 3.20 for KNN), and it selects exactly the informative predictors.

Question 7.5

Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.

I will run the data through the models from the chapter, keeping each model's setup as close as possible to that of the original linear model.
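A sketch of the data preparation, assuming the same choices as Exercise 6.3 (the imputation method, the near-zero-variance filter, and the 80/20 split proportion are assumptions consistent with the 144-sample, 56-predictor training set printed below):

library(caret)
library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)

# k-nearest-neighbor imputation (which also centers and scales every column)
imp  <- preProcess(ChemicalManufacturingProcess, method = "knnImpute")
chem <- predict(imp, ChemicalManufacturingProcess)

# drop near-zero-variance predictors (the printouts below show 56 of the 57)
nzv <- nearZeroVar(chem)
if (length(nzv) > 0) chem <- chem[, -nzv]

# train/test split stratified on the response
set.seed(100)
inTrain  <- createDataPartition(chem$Yield, p = 0.8, list = FALSE)
train_df <- chem[inTrain, ]
test_df  <- chem[-inTrain, ]

With the data prepared, the PLS model from Exercise 6.3 is refit as the linear benchmark: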

Partial Least Squares 

144 samples
 56 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 130, 129, 128, 129, 130, 129, ... 
Resampling results across tuning parameters:

  ncomp  RMSE       Rsquared   MAE      
   1     0.8824790  0.3779221  0.6711462
   2     1.1458456  0.4219806  0.7086431
   3     0.7363066  0.5244517  0.5688553
   4     0.8235294  0.5298005  0.5933120
   5     0.9670735  0.4846010  0.6371199
   6     0.9959036  0.4776684  0.6427478
   7     0.9119517  0.4986338  0.6200233
   8     0.9068621  0.5012144  0.6293371
   9     0.8517370  0.5220166  0.6163795
  10     0.8919356  0.5062912  0.6332243
  11     0.9173758  0.4934557  0.6463164
  12     0.9064125  0.4791526  0.6485663
  13     0.9255289  0.4542181  0.6620193
  14     1.0239913  0.4358371  0.6944056
  15     1.0754710  0.4365214  0.7077991
  16     1.1110579  0.4269065  0.7135684
  17     1.1492855  0.4210485  0.7222868
  18     1.1940639  0.4132534  0.7396357
  19     1.2271867  0.4079005  0.7494818
  20     1.2077102  0.4022859  0.7470327
  21     1.2082648  0.4026711  0.7452969
  22     1.2669285  0.3987044  0.7634170
  23     1.3663033  0.3970188  0.7957514
  24     1.4531634  0.3898475  0.8243034
  25     1.5624265  0.3820102  0.8612935

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 3.

Part A

Which nonlinear regression model gives the optimal resampling and test set performance?

KNN Model
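Each model below is tuned with the same pattern; for KNN, for example (a sketch, assuming the train_df split from above; tuneLength = 25 reproduces the grid of odd k from 5 to 53, while the MARS and avNNet grids mirror those used in Question 7.2 and the SVM uses the same tuneLength of 25):

set.seed(100)
knnFit <- train(Yield ~ ., data = train_df,
                method = "knn",
                trControl = trainControl(method = "cv"),
                tuneLength = 25)
knnFit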

k-Nearest Neighbors 

144 samples
 56 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 130, 129, 130, 130, 130, 131, ... 
Resampling results across tuning parameters:

  k   RMSE       Rsquared   MAE      
   5  0.7085987  0.4846711  0.5709345
   7  0.7331379  0.4559107  0.5981516
   9  0.7417741  0.4556845  0.6124459
  11  0.7505969  0.4526780  0.6277377
  13  0.7450412  0.4691830  0.6181457
  15  0.7539451  0.4604423  0.6246762
  17  0.7663763  0.4474991  0.6306860
  19  0.7694326  0.4480450  0.6315047
  21  0.7706841  0.4508441  0.6288689
  23  0.7780866  0.4419184  0.6350889
  25  0.7834333  0.4320541  0.6382165
  27  0.7877890  0.4339581  0.6454522
  29  0.7981302  0.4187796  0.6533767
  31  0.8060716  0.3977540  0.6610186
  33  0.8063566  0.4128701  0.6600828
  35  0.8144040  0.3964037  0.6671734
  37  0.8160580  0.4009313  0.6662999
  39  0.8220176  0.3884786  0.6693883
  41  0.8274778  0.3750873  0.6746765
  43  0.8288000  0.3820179  0.6758220
  45  0.8285016  0.3878549  0.6747216
  47  0.8331504  0.3808342  0.6774480
  49  0.8368096  0.3816669  0.6829726
  51  0.8391074  0.3804316  0.6857959
  53  0.8442763  0.3704957  0.6915276

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 5.

MARS Model

Multivariate Adaptive Regression Spline 

144 samples
 56 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 129, 130, 130, 128, 130, 128, ... 
Resampling results across tuning parameters:

  degree  nprune  RMSE       Rsquared   MAE      
  1        2      0.7744594  0.4159942  0.6084363
  1        3      0.7003597  0.5682276  0.5530374
  1        4      0.6469215  0.6086750  0.5154043
  1        5      0.6404520  0.6073071  0.5042864
  1        6      0.6960964  0.5396157  0.5588565
  1        7      0.6997921  0.5521698  0.5669813
  1        8      0.7147341  0.5423472  0.5621022
  1        9      0.7173559  0.5508650  0.5641610
  1       10      0.7487512  0.5325936  0.5869088
  1       11      0.7415722  0.5353676  0.5847493
  1       12      0.7353149  0.5519142  0.5768069
  1       13      0.7450704  0.5357967  0.5887685
  1       14      0.7419274  0.5400866  0.5844385
  1       15      0.7262459  0.5553445  0.5775694
  2        2      0.7814439  0.4052275  0.6150579
  2        3      0.6961165  0.5526862  0.5636101
  2        4      0.6983163  0.5544842  0.5575337
  2        5      0.6730604  0.5897508  0.5262153
  2        6      0.6707267  0.5848137  0.5147009
  2        7      0.6592644  0.5959026  0.5199324
  2        8      0.6170179  0.6400977  0.4808173
  2        9      0.5848818  0.6718214  0.4591934
  2       10      0.5844613  0.6757433  0.4556203
  2       11      0.5868888  0.6684397  0.4622055
  2       12      0.5804160  0.6762717  0.4550110
  2       13      0.5822871  0.6910726  0.4475039
  2       14      0.5914308  0.6855774  0.4499391
  2       15      0.5796279  0.6970556  0.4450954

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nprune = 15 and degree = 2.

SVM Model

Support Vector Machines with Radial Basis Function Kernel 

144 samples
 56 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 129, 129, 129, 130, 131, 131, ... 
Resampling results across tuning parameters:

  C           RMSE       Rsquared   MAE      
        0.25  0.7683015  0.4646282  0.6392703
        0.50  0.7174781  0.4848381  0.5965095
        1.00  0.6686679  0.5330949  0.5530530
        2.00  0.6409061  0.5552794  0.5275947
        4.00  0.6356640  0.5619233  0.5166512
        8.00  0.6286151  0.5761807  0.5115837
       16.00  0.6287131  0.5760690  0.5116965
       32.00  0.6287131  0.5760690  0.5116965
       64.00  0.6287131  0.5760690  0.5116965
      128.00  0.6287131  0.5760690  0.5116965
      256.00  0.6287131  0.5760690  0.5116965
      512.00  0.6287131  0.5760690  0.5116965
     1024.00  0.6287131  0.5760690  0.5116965
     2048.00  0.6287131  0.5760690  0.5116965
     4096.00  0.6287131  0.5760690  0.5116965
     8192.00  0.6287131  0.5760690  0.5116965
    16384.00  0.6287131  0.5760690  0.5116965
    32768.00  0.6287131  0.5760690  0.5116965
    65536.00  0.6287131  0.5760690  0.5116965
   131072.00  0.6287131  0.5760690  0.5116965
   262144.00  0.6287131  0.5760690  0.5116965
   524288.00  0.6287131  0.5760690  0.5116965
  1048576.00  0.6287131  0.5760690  0.5116965
  2097152.00  0.6287131  0.5760690  0.5116965
  4194304.00  0.6287131  0.5760690  0.5116965

Tuning parameter 'sigma' was held constant at a value of 0.015148
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.015148 and C = 8.

Neural Network Model

Model Averaged Neural Network 

144 samples
 56 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 131, 129, 130, 128, 130, 129, ... 
Resampling results across tuning parameters:

  decay  size  RMSE       Rsquared   MAE      
  0.00    1    0.7910248  0.4228412  0.6130383
  0.00    2    0.9218862  0.4080614  0.7428019
  0.00    3    0.7740539  0.5185395  0.6172976
  0.00    4    0.7024795  0.5648305  0.5597466
  0.00    5    0.6390399  0.6194867  0.5144565
  0.00    6          NaN        NaN        NaN
  0.00    7          NaN        NaN        NaN
  0.00    8          NaN        NaN        NaN
  0.00    9          NaN        NaN        NaN
  0.00   10          NaN        NaN        NaN
  0.01    1    0.7696626  0.4978424  0.6083146
  0.01    2    0.7584432  0.5120800  0.6254292
  0.01    3    0.7995990  0.5296820  0.6455941
  0.01    4    0.6883856  0.5995827  0.5517506
  0.01    5    0.6843272  0.5975191  0.5467633
  0.01    6          NaN        NaN        NaN
  0.01    7          NaN        NaN        NaN
  0.01    8          NaN        NaN        NaN
  0.01    9          NaN        NaN        NaN
  0.01   10          NaN        NaN        NaN
  0.10    1    0.7121318  0.5409956  0.5882917
  0.10    2    0.6924135  0.5545770  0.5497616
  0.10    3    0.6739546  0.6037860  0.5412176
  0.10    4    0.6559497  0.6294231  0.5376020
  0.10    5    0.6453881  0.6413499  0.5201350
  0.10    6          NaN        NaN        NaN
  0.10    7          NaN        NaN        NaN
  0.10    8          NaN        NaN        NaN
  0.10    9          NaN        NaN        NaN
  0.10   10          NaN        NaN        NaN

Tuning parameter 'bag' was held constant at a value of FALSE
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were size = 5, decay = 0 and bag = FALSE.

Summary
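The table below compares test-set performance. A sketch of how it might be assembled (assuming fitted objects named plsFit, svmFit, knnFit, nnetFit, and marsFit, each trained with the formula interface on train_df):

models <- list(PLS = plsFit, SVM = svmFit, KNN = knnFit,
               `Neural Network` = nnetFit, MARS = marsFit)
perf <- t(sapply(models, function(m)
  postResample(predict(m, newdata = test_df), test_df$Yield)))
perf[order(perf[, "RMSE"]), ]  # sort by test-set RMSE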

Model                RMSE  Rsquared       MAE
PLS             0.6192577 0.6771122 0.5059984
SVM             0.6815833 0.6251941 0.4926105
KNN             0.7149819 0.5894607 0.5047973
Neural Network  0.7197787 0.5648347 0.5760858
MARS            0.7694337 0.5378346 0.5809592

PLS, the linear benchmark, has the lowest test-set RMSE overall, but among the nonlinear models the SVM performed best.

Part B

Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?
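A sketch of the importance calls (assuming the svmFit and plsFit objects from above):

varImp(svmFit)  # model-free loess R-squared filter (SVMs have no intrinsic measure)
varImp(plsFit)  # PLS weight-based importance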

loess r-squared variable importance

  only 20 most important variables shown (out of 56)

                       Overall
ManufacturingProcess32  100.00
ManufacturingProcess13   93.82
ManufacturingProcess09   89.93
ManufacturingProcess17   88.20
BiologicalMaterial06     82.61
BiologicalMaterial03     79.44
ManufacturingProcess36   73.85
BiologicalMaterial12     72.36
ManufacturingProcess06   69.00
ManufacturingProcess11   62.34
ManufacturingProcess31   56.39
BiologicalMaterial02     50.34
BiologicalMaterial11     48.53
BiologicalMaterial09     44.76
ManufacturingProcess30   41.87
BiologicalMaterial08     40.24
ManufacturingProcess29   38.54
ManufacturingProcess33   38.16
BiologicalMaterial04     36.92
ManufacturingProcess25   36.83
pls variable importance

  only 20 most important variables shown (out of 56)

                       Overall
ManufacturingProcess32  100.00
ManufacturingProcess09   88.04
ManufacturingProcess36   82.20
ManufacturingProcess13   82.11
ManufacturingProcess17   80.25
ManufacturingProcess06   59.06
ManufacturingProcess11   55.93
BiologicalMaterial02     55.46
BiologicalMaterial06     54.64
BiologicalMaterial03     54.50
ManufacturingProcess33   53.91
ManufacturingProcess12   52.04
BiologicalMaterial08     49.76
BiologicalMaterial12     47.40
ManufacturingProcess34   45.47
BiologicalMaterial11     45.05
BiologicalMaterial01     44.18
BiologicalMaterial04     42.95
ManufacturingProcess04   39.94
ManufacturingProcess28   36.61

The SVM importance ranking is very similar to the PLS ranking, and in either case the manufacturing process variables dominate the list. Among the top ten, the SVM model includes BiologicalMaterial12 where the PLS model instead includes BiologicalMaterial02.

Part C

Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?
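A sketch of one such plot (assuming the train_df split from above; note that both axes are standardized by the earlier imputation step):

library(ggplot2)
ggplot(train_df, aes(x = BiologicalMaterial02, y = Yield)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +  # smoother to reveal nonlinearity
  labs(x = "BiologicalMaterial02 (standardized)",
       y = "Yield (standardized)")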

This indicates a positive relationship between BiologicalMaterial02 and the yield, although there appears to be a sweet spot, or a point of diminishing marginal returns, on the material: the highest yields are not at the far right of the graph.