Exercise 7.2

Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:

$$y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)$$

where the x values are random variables uniformly distributed on [0, 1] (five additional non-informative variables are also created in the simulation). The package mlbench contains a function called mlbench.friedman1 that simulates these data:

Create Simulated Training Set
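A minimal sketch of the simulation step, following the exercise's use of mlbench.friedman1; the seed is an assumption:

library(mlbench)

set.seed(200)  # assumed seed; any fixed value makes the run reproducible
trainingData <- mlbench.friedman1(200, sd = 1)
## Convert the 'x' matrix to a data frame so caret can use the columns directly
trainingData$x <- data.frame(trainingData$x)

## A large hold-out set to estimate the true error rate
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)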

View the Data with Skim and Plot with Feature Plot
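A sketch of the summary and plotting calls, assuming the objects created above (the feature plot itself is not reproduced here):

library(skimr)
library(caret)

## Skim the combined training set; flattening the list produces the
## x.X1 ... x.X10 column names seen below
skim(data.frame(trainingData))

## Scatter plots of each predictor against the response
featurePlot(x = trainingData$x, y = trainingData$y)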

Data summary
Name trainingData
Number of rows 200
Number of columns 11
_______________________
Column type frequency:
numeric 11
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
x.X1 0 1 0.51 0.29 0.00 0.25 0.51 0.75 1.00 ▇▇▇▇▇
x.X2 0 1 0.56 0.29 0.01 0.32 0.56 0.80 0.99 ▅▆▆▇▇
x.X3 0 1 0.50 0.28 0.00 0.26 0.54 0.75 1.00 ▇▇▇▇▇
x.X4 0 1 0.50 0.30 0.00 0.21 0.51 0.75 0.99 ▇▅▇▆▇
x.X5 0 1 0.52 0.29 0.00 0.28 0.52 0.78 0.99 ▆▆▇▆▇
x.X6 0 1 0.51 0.29 0.00 0.25 0.53 0.76 1.00 ▇▆▇▇▇
x.X7 0 1 0.51 0.28 0.00 0.31 0.51 0.76 0.99 ▆▆▇▆▇
x.X8 0 1 0.53 0.30 0.01 0.26 0.57 0.80 1.00 ▆▅▆▆▇
x.X9 0 1 0.50 0.28 0.00 0.26 0.49 0.72 1.00 ▆▆▇▆▆
x.X10 0 1 0.55 0.28 0.01 0.32 0.57 0.78 0.99 ▅▆▇▇▇
y 0 1 14.73 5.06 0.80 11.04 14.90 18.49 26.72 ▁▅▇▇▂

KNN Model
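One plausible caret call behind the output below: bootstrapping with 25 reps is caret's default resampling scheme, and tuneLength = 10 produces the odd k values from 5 to 23. The seed is an assumption:

set.seed(921)  # hypothetical seed
knnModel <- train(x = trainingData$x, y = trainingData$y,
                  method = "knn",
                  preProc = c("center", "scale"),
                  tuneLength = 10)
knnModel

## Test-set performance for the tuned model (the last two lines of output below)
knnPred <- predict(knnModel, newdata = testData$x)
postResample(pred = knnPred, obs = testData$y)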

## k-Nearest Neighbors 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    5  3.726109  0.4646980  3.014146
##    7  3.609130  0.4983845  2.916210
##    9  3.573850  0.5157660  2.876988
##   11  3.557243  0.5313498  2.865519
##   13  3.538419  0.5485584  2.862030
##   15  3.527742  0.5645340  2.849736
##   17  3.540169  0.5725169  2.865204
##   19  3.530327  0.5863954  2.851141
##   21  3.519411  0.6000860  2.842288
##   23  3.522842  0.6084923  2.844349
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 21.
##      RMSE  Rsquared       MAE 
## 3.3454465 0.6725733 2.7038661

MARS Model
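A sketch of the MARS fit, consistent with the tuning grid in the output below (degree 1 to 2, nprune 2 to 15); method = "earth" requires the earth package, and the seed is an assumption:

marsGrid <- expand.grid(degree = 1:2, nprune = 2:15)

set.seed(921)  # hypothetical seed
marsModel <- train(x = trainingData$x, y = trainingData$y,
                   method = "earth",
                   preProc = c("center", "scale"),
                   tuneGrid = marsGrid)
marsModel

marsPred <- predict(marsModel, newdata = testData$x)
postResample(pred = marsPred, obs = testData$y)
varImp(marsModel)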

## Multivariate Adaptive Regression Spline 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE      Rsquared   MAE     
##   1        2      4.164892  0.3611916  3.409897
##   1        3      3.660936  0.5062227  2.959827
##   1        4      3.244858  0.6187136  2.631157
##   1        5      2.805473  0.7142806  2.242194
##   1        6      2.505767  0.7703123  1.976545
##   1        7      2.467969  0.7775514  1.945278
##   1        8      2.215528  0.8214284  1.761183
##   1        9      2.140487  0.8342432  1.707313
##   1       10      2.158827  0.8315906  1.725630
##   1       11      2.182385  0.8294915  1.732187
##   1       12      2.221737  0.8232456  1.758400
##   1       13      2.245196  0.8199335  1.777172
##   1       14      2.252059  0.8192026  1.790685
##   1       15      2.267975  0.8178146  1.805161
##   2        2      4.208856  0.3476404  3.451024
##   2        3      3.736080  0.4869258  2.995870
##   2        4      3.284380  0.6084662  2.653734
##   2        5      2.852638  0.7034535  2.294502
##   2        6      2.514765  0.7672643  1.972002
##   2        7      2.281683  0.8090696  1.795828
##   2        8      2.093803  0.8393584  1.663283
##   2        9      1.824083  0.8783372  1.467502
##   2       10      1.588086  0.9080874  1.267878
##   2       11      1.461354  0.9219269  1.174227
##   2       12      1.423772  0.9260926  1.132255
##   2       13      1.434791  0.9245093  1.128864
##   2       14      1.431186  0.9249019  1.124739
##   2       15      1.450474  0.9230373  1.140071
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 12 and degree = 2.
##      RMSE  Rsquared       MAE 
## 1.1890158 0.9434200 0.9512591
## earth variable importance
## 
##    Overall
## X4  100.00
## X1   77.46
## X5   59.43
## X2   39.88
## X3    0.00

SVM Model
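A sketch of the SVM fit. Unlike the two models above, the output below shows 10-fold cross-validation; tuneLength = 10 yields the cost values 0.25 through 128, while sigma is estimated analytically by kernlab and held constant:

set.seed(921)  # hypothetical seed
svmModel <- train(x = trainingData$x, y = trainingData$y,
                  method = "svmRadial",
                  preProc = c("center", "scale"),
                  tuneLength = 10,
                  trControl = trainControl(method = "cv"))
svmModel

svmPred <- predict(svmModel, newdata = testData$x)
postResample(pred = svmPred, obs = testData$y)
varImp(svmModel)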

## Support Vector Machines with Radial Basis Function Kernel 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   C       RMSE      Rsquared   MAE     
##     0.25  2.869165  0.7532017  2.312402
##     0.50  2.585576  0.7823054  2.055335
##     1.00  2.409369  0.8062289  1.894586
##     2.00  2.321492  0.8173656  1.854516
##     4.00  2.205788  0.8351372  1.774367
##     8.00  2.152113  0.8422696  1.735025
##    16.00  2.151148  0.8391198  1.728622
##    32.00  2.151463  0.8390591  1.728927
##    64.00  2.151463  0.8390591  1.728927
##   128.00  2.151463  0.8390591  1.728927
## 
## Tuning parameter 'sigma' was held constant at a value of 0.05916194
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.05916194 and C = 16.
##     RMSE Rsquared      MAE 
## 1.936793 0.849370 1.513800
## loess r-squared variable importance
## 
##      Overall
## X4  100.0000
## X1   44.0360
## X5   33.8046
## X2   27.0904
## X3   18.6716
## X10   3.9053
## X7    1.3407
## X9    1.0309
## X8    0.6161
## X6    0.0000

Neural Network Model
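A sketch of the model-averaged neural network fit. The grid matches the output below (decay 0, 0.01, 0.1; size 1 to 10); the NaN rows for size > 5 suggest MaxNWts was set low enough to reject the larger networks, so the cap shown here is an inference rather than a certainty:

nnetGrid <- expand.grid(decay = c(0, 0.01, 0.1),
                        size = 1:10,
                        bag = FALSE)

set.seed(921)  # hypothetical seed
nnetModel <- train(x = trainingData$x, y = trainingData$y,
                   method = "avNNet",
                   preProc = c("center", "scale"),
                   tuneGrid = nnetGrid,
                   trControl = trainControl(method = "cv"),
                   linout = TRUE, trace = FALSE, maxit = 500,
                   ## inferred weight cap: networks with size > 5 exceed it,
                   ## which would explain the NaN rows below
                   MaxNWts = 5 * (ncol(trainingData$x) + 1) + 5 + 1)
nnetModel

postResample(pred = predict(nnetModel, newdata = testData$x), obs = testData$y)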

## Model Averaged Neural Network 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE      Rsquared   MAE     
##   0.00    1    2.765430  0.7214950  2.222278
##   0.00    2    2.454579  0.7755543  1.958788
##   0.00    3    2.345981  0.7944327  1.878345
##   0.00    4    2.384023  0.7976148  1.825766
##   0.00    5    2.571703  0.7713210  2.013408
##   0.00    6         NaN        NaN       NaN
##   0.00    7         NaN        NaN       NaN
##   0.00    8         NaN        NaN       NaN
##   0.00    9         NaN        NaN       NaN
##   0.00   10         NaN        NaN       NaN
##   0.01    1    2.739902  0.7233210  2.182945
##   0.01    2    2.502228  0.7644033  1.992281
##   0.01    3    2.327996  0.7978076  1.839162
##   0.01    4    2.313584  0.7936864  1.902121
##   0.01    5    2.399065  0.7850531  1.919526
##   0.01    6         NaN        NaN       NaN
##   0.01    7         NaN        NaN       NaN
##   0.01    8         NaN        NaN       NaN
##   0.01    9         NaN        NaN       NaN
##   0.01   10         NaN        NaN       NaN
##   0.10    1    2.740531  0.7201455  2.160466
##   0.10    2    2.542600  0.7600431  2.034750
##   0.10    3    2.447342  0.7772923  1.961562
##   0.10    4    2.281908  0.8054697  1.865467
##   0.10    5    2.333053  0.8006064  1.930540
##   0.10    6         NaN        NaN       NaN
##   0.10    7         NaN        NaN       NaN
##   0.10    8         NaN        NaN       NaN
##   0.10    9         NaN        NaN       NaN
##   0.10   10         NaN        NaN       NaN
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 4, decay = 0.1 and bag = FALSE.
##      RMSE  Rsquared       MAE 
## 2.1282054 0.8177093 1.6732538
## loess r-squared variable importance
## 
##      Overall
## X4  100.0000
## X1   44.0360
## X5   33.8046
## X2   27.0904
## X3   18.6716
## X10   3.9053
## X7    1.3407
## X9    1.0309
## X8    0.6161
## X6    0.0000

Findings

The MARS model is the clear winner when it comes to performance, posting the best RMSE, Rsquared, and MAE of the four models. This likely results from the feature selection and pruning built into MARS: as the variable importance output above shows, MARS retained only the informative predictors (X1 through X5) and discarded the noise variables.

RMSE Rsquared MAE Model
2.128205 0.8177093 1.6732538 Neural Network
1.936793 0.8493700 1.5138001 SVM
1.189016 0.9434200 0.9512591 MARS
3.345447 0.6725733 2.7038661 KNN
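A sketch of how the table above can be assembled from the test-set predictions computed in the earlier sketches:

rbind("Neural Network" = postResample(predict(nnetModel, testData$x), testData$y),
      "SVM"            = postResample(svmPred,  testData$y),
      "MARS"           = postResample(marsPred, testData$y),
      "KNN"            = postResample(knnPred,  testData$y))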

Exercise 7.5

Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.
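A sketch of the shared setup. The seed, split ratio, and exact preProcess method list are assumptions, but they are consistent with the console output below: knnImpute centers and scales every column (including Yield, hence the small RMSE values), a near-zero-variance filter drops one predictor to leave 56, and an 80/20 split of the 176 samples yields 144 training rows:

library(AppliedPredictiveModeling)
library(caret)
data(ChemicalManufacturingProcess)

## Impute missing values; knnImpute also centers and scales, and "nzv"
## drops the near-zero-variance predictor (57 -> 56 predictors)
pp   <- preProcess(ChemicalManufacturingProcess,
                   method = c("knnImpute", "nzv"))
chem <- predict(pp, ChemicalManufacturingProcess)

## 80/20 split, stratified on the response
set.seed(517)  # hypothetical seed
inTrain   <- createDataPartition(chem$Yield, p = 0.8, list = FALSE)
chemTrain <- chem[inTrain, ]
chemTest  <- chem[-inTrain, ]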

(a) Which nonlinear regression model gives the optimal resampling and test set performance?

KNN
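A plausible call for the KNN fit below, assuming the chemTrain object from the setup sketch; 10-fold cross-validation with tuneLength = 25 gives the k values from 5 to 53. The MARS, SVM, and neural network fits that follow reuse the same pattern, swapping in method = "earth" with the Exercise 7.2 grid, method = "svmRadial" with tuneLength = 25, and method = "avNNet" with the Exercise 7.2 grid, respectively:

set.seed(517)  # hypothetical seed
knnChem <- train(Yield ~ ., data = chemTrain,
                 method = "knn",
                 tuneLength = 25,
                 trControl = trainControl(method = "cv"))
knnChem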

## k-Nearest Neighbors 
## 
## 144 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 130, 129, 128, 129, 130, 129, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    5  0.7017857  0.4962847  0.5606515
##    7  0.7236139  0.4746428  0.5969490
##    9  0.7300591  0.4668491  0.6013968
##   11  0.7460436  0.4490707  0.6177046
##   13  0.7485067  0.4442744  0.6213027
##   15  0.7586368  0.4291603  0.6258871
##   17  0.7624259  0.4375912  0.6294124
##   19  0.7627696  0.4449437  0.6254787
##   21  0.7635900  0.4490280  0.6242544
##   23  0.7732659  0.4412447  0.6310587
##   25  0.7730124  0.4461752  0.6353608
##   27  0.7896268  0.4196195  0.6475769
##   29  0.7988773  0.4028687  0.6572028
##   31  0.8024417  0.4113070  0.6624965
##   33  0.8075687  0.4045510  0.6647987
##   35  0.8120014  0.4029270  0.6670706
##   37  0.8154453  0.3950009  0.6669333
##   39  0.8200178  0.3964573  0.6710715
##   41  0.8227606  0.3978416  0.6722223
##   43  0.8253769  0.3947873  0.6756019
##   45  0.8279943  0.3936810  0.6783199
##   47  0.8268730  0.4025667  0.6780751
##   49  0.8340750  0.3925289  0.6843632
##   51  0.8364038  0.3974095  0.6845187
##   53  0.8378127  0.3955672  0.6875498
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.

MARS

## Multivariate Adaptive Regression Spline 
## 
## 144 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 129, 129, 130, 132, 130, 130, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE       Rsquared   MAE      
##   1        2      0.7722023  0.3941996  0.5999321
##   1        3      0.6627211  0.5546441  0.5313042
##   1        4      0.6425518  0.5768163  0.5156998
##   1        5      0.6585956  0.5596575  0.5180076
##   1        6      0.6627592  0.5524007  0.5198269
##   1        7      0.6604789  0.5589514  0.5153873
##   1        8      0.6543415  0.5634510  0.5078212
##   1        9      0.6604364  0.5567869  0.5187573
##   1       10      0.6650075  0.5440094  0.5148558
##   1       11      0.6628266  0.5524088  0.5212075
##   1       12      0.6746918  0.5472778  0.5276799
##   1       13      0.6583636  0.5646934  0.5106820
##   1       14      0.6608923  0.5579993  0.5158548
##   1       15      0.6629204  0.5566854  0.5176025
##   2        2      0.7617132  0.4083614  0.5915262
##   2        3      0.6801367  0.5184663  0.5504259
##   2        4      0.6298407  0.5873378  0.5176954
##   2        5      0.6249017  0.5959341  0.5122018
##   2        6      0.6007549  0.6286730  0.4823449
##   2        7      0.5634810  0.6679785  0.4605940
##   2        8      0.5510920  0.6848203  0.4448332
##   2        9      0.5560838  0.6808989  0.4394491
##   2       10      0.5708689  0.6600874  0.4609790
##   2       11      0.5605395  0.6742829  0.4541922
##   2       12      0.5693831  0.6739141  0.4619783
##   2       13      0.5673132  0.6724538  0.4510969
##   2       14      0.5665736  0.6723828  0.4533791
##   2       15      0.5905837  0.6500161  0.4711157
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 8 and degree = 2.

SVM

## Support Vector Machines with Radial Basis Function Kernel 
## 
## 144 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 128, 130, 130, 130, 130, 128, ... 
## Resampling results across tuning parameters:
## 
##   C           RMSE       Rsquared   MAE      
##         0.25  0.7637178  0.4612331  0.6304200
##         0.50  0.7138185  0.4953123  0.5846336
##         1.00  0.6639811  0.5514336  0.5374400
##         2.00  0.6315988  0.5856940  0.5107699
##         4.00  0.6294062  0.5859009  0.5046017
##         8.00  0.6087599  0.6065785  0.4889208
##        16.00  0.6084278  0.6069749  0.4888434
##        32.00  0.6084278  0.6069749  0.4888434
##        64.00  0.6084278  0.6069749  0.4888434
##       128.00  0.6084278  0.6069749  0.4888434
##       256.00  0.6084278  0.6069749  0.4888434
##       512.00  0.6084278  0.6069749  0.4888434
##      1024.00  0.6084278  0.6069749  0.4888434
##      2048.00  0.6084278  0.6069749  0.4888434
##      4096.00  0.6084278  0.6069749  0.4888434
##      8192.00  0.6084278  0.6069749  0.4888434
##     16384.00  0.6084278  0.6069749  0.4888434
##     32768.00  0.6084278  0.6069749  0.4888434
##     65536.00  0.6084278  0.6069749  0.4888434
##    131072.00  0.6084278  0.6069749  0.4888434
##    262144.00  0.6084278  0.6069749  0.4888434
##    524288.00  0.6084278  0.6069749  0.4888434
##   1048576.00  0.6084278  0.6069749  0.4888434
##   2097152.00  0.6084278  0.6069749  0.4888434
##   4194304.00  0.6084278  0.6069749  0.4888434
## 
## Tuning parameter 'sigma' was held constant at a value of 0.01436322
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.01436322 and C = 16.

Neural Net

## Model Averaged Neural Network 
## 
## 144 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 131, 129, 129, 129, 129, 131, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE       Rsquared   MAE      
##   0.00    1    0.8550454  0.3905974  0.6736735
##   0.00    2    0.8397700  0.4677405  0.6781118
##   0.00    3    1.5264613  0.4818655  1.1051510
##   0.00    4    0.7159878  0.5603617  0.5629924
##   0.00    5    0.7164835  0.5410727  0.5823812
##   0.00    6          NaN        NaN        NaN
##   0.00    7          NaN        NaN        NaN
##   0.00    8          NaN        NaN        NaN
##   0.00    9          NaN        NaN        NaN
##   0.00   10          NaN        NaN        NaN
##   0.01    1    0.7755530  0.5041892  0.6271978
##   0.01    2    0.8231924  0.4582981  0.6547267
##   0.01    3    0.7477910  0.5072772  0.5946967
##   0.01    4    0.6783638  0.6022529  0.5179515
##   0.01    5    0.6213627  0.6623922  0.4979006
##   0.01    6          NaN        NaN        NaN
##   0.01    7          NaN        NaN        NaN
##   0.01    8          NaN        NaN        NaN
##   0.01    9          NaN        NaN        NaN
##   0.01   10          NaN        NaN        NaN
##   0.10    1    0.7522723  0.5255260  0.6046651
##   0.10    2    0.7397727  0.5402556  0.5802931
##   0.10    3    0.6544117  0.6037688  0.5332527
##   0.10    4    0.6473491  0.6268090  0.5231965
##   0.10    5    0.6113875  0.6511967  0.4892838
##   0.10    6          NaN        NaN        NaN
##   0.10    7          NaN        NaN        NaN
##   0.10    8          NaN        NaN        NaN
##   0.10    9          NaN        NaN        NaN
##   0.10   10          NaN        NaN        NaN
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 5, decay = 0.1 and bag = FALSE.

Findings

It appears that the neural network and SVM models performed better than the other models on an RMSE basis, while the MARS and SVM models also posted high Rsquared values during resampling. Overall, the SVM may be the best model to go with, as it shows relatively strong performance across all three metrics (RMSE, Rsquared, and MAE).

Model RMSE Rsquared MAE
Neural Network 0.6190814 0.6740072 0.4705560
SVM 0.6808183 0.6244615 0.4939512
KNN 0.7149819 0.5894607 0.5047973
MARS 0.7199111 0.5920601 0.5651110

(b) Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?

The most important predictors are ManufacturingProcess32, ManufacturingProcess13, ManufacturingProcess09, ManufacturingProcess17, and BiologicalMaterial06. The manufacturing process variables dominate the list both in number and in prominence: four of the top five predictors are manufacturing processes. Compared to the optimal linear model from last week, both the linear and nonlinear models selected ManufacturingProcess32 as the most important variable. After that, however, only one other variable, ManufacturingProcess09, appeared in the top ten of both the linear and nonlinear models.
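The scores below come from caret's model-free loess r-squared importance measure applied to the tuned SVM; the object name is an assumption:

varImp(svmChem)  # svmChem: the "svmRadial" fit described in part (a)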

## loess r-squared variable importance
## 
##   only 20 most important variables shown (out of 56)
## 
##                        Overall
## ManufacturingProcess32  100.00
## ManufacturingProcess13   93.82
## ManufacturingProcess09   89.93
## ManufacturingProcess17   88.20
## BiologicalMaterial06     82.61
## BiologicalMaterial03     79.44
## ManufacturingProcess36   73.85
## BiologicalMaterial12     72.36
## ManufacturingProcess06   69.00
## ManufacturingProcess11   62.34
## ManufacturingProcess31   56.39
## BiologicalMaterial02     50.34
## BiologicalMaterial11     48.53
## BiologicalMaterial09     44.76
## ManufacturingProcess30   41.87
## BiologicalMaterial08     40.24
## ManufacturingProcess29   38.54
## ManufacturingProcess33   38.16
## BiologicalMaterial04     36.92
## ManufacturingProcess25   36.83

(c) Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?

Below, I have plotted the top five nonlinear predictors that were not included in the list of top linear predictors. For some of the predictors we can easily see a positive or negative relationship with the response variable. For others (M36 and M12) there is no discernible relationship. I believe the predictors with no discernible relationship reflect the higher-order transformations that take place within the SVM regression.
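A sketch of the plotting call, assuming the chemTrain object from the setup sketch. The five predictor names are placeholders, since the exact set unique to the nonlinear model depends on last week's linear importance list:

## hypothetical list of top nonlinear predictors absent from the linear top ten
uniqueTop <- c("BiologicalMaterial06", "BiologicalMaterial03",
               "ManufacturingProcess36", "BiologicalMaterial12",
               "ManufacturingProcess06")

featurePlot(x = chemTrain[, uniqueTop],
            y = chemTrain$Yield,
            plot = "scatter",
            type = c("p", "smooth"))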