Exercise 7.2
Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:
$$y = 10 \sin(\pi x_1 x_2) + 20 (x_3 - 0.5)^2 + 10 x_4 + 5 x_5 + N(0, \sigma^2)$$
where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates five other non-informative predictors). The package mlbench contains a function called mlbench.friedman1 that simulates these data:
Create Simulated Training Set
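A minimal sketch of the simulation setup, following the standard code for this exercise from Kuhn and Johnson (the seed of 200 matches the book; the 5,000-point test set gives an honest estimate of test error):

library(mlbench)
library(caret)

set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
trainingData$x <- data.frame(trainingData$x)   # convert the matrix to a data frame for caret
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)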
View the Data with Skim and Plot with Feature Plot
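A sketch of the summary and plotting calls, assuming the skimr and caret packages (coercing the simulation list with as.data.frame() is what yields the x.X1 ... x.X10 column names seen below):

library(skimr)

sim_df <- as.data.frame(trainingData)        # columns x.X1 ... x.X10 and y
skim(sim_df)                                 # numeric summaries plus mini-histograms
featurePlot(trainingData$x, trainingData$y)  # each predictor vs. the response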
Data summary

| | |
|:--|:--|
| Name | trainingData |
| Number of rows | 200 |
| Number of columns | 11 |
| Column type frequency: numeric | 11 |
| Group variables | None |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|:--|
| x.X1 | 0 | 1 | 0.51 | 0.29 | 0.00 | 0.25 | 0.51 | 0.75 | 1.00 | ▇▇▇▇▇ |
| x.X2 | 0 | 1 | 0.56 | 0.29 | 0.01 | 0.32 | 0.56 | 0.80 | 0.99 | ▅▆▆▇▇ |
| x.X3 | 0 | 1 | 0.50 | 0.28 | 0.00 | 0.26 | 0.54 | 0.75 | 1.00 | ▇▇▇▇▇ |
| x.X4 | 0 | 1 | 0.50 | 0.30 | 0.00 | 0.21 | 0.51 | 0.75 | 0.99 | ▇▅▇▆▇ |
| x.X5 | 0 | 1 | 0.52 | 0.29 | 0.00 | 0.28 | 0.52 | 0.78 | 0.99 | ▆▆▇▆▇ |
| x.X6 | 0 | 1 | 0.51 | 0.29 | 0.00 | 0.25 | 0.53 | 0.76 | 1.00 | ▇▆▇▇▇ |
| x.X7 | 0 | 1 | 0.51 | 0.28 | 0.00 | 0.31 | 0.51 | 0.76 | 0.99 | ▆▆▇▆▇ |
| x.X8 | 0 | 1 | 0.53 | 0.30 | 0.01 | 0.26 | 0.57 | 0.80 | 1.00 | ▆▅▆▆▇ |
| x.X9 | 0 | 1 | 0.50 | 0.28 | 0.00 | 0.26 | 0.49 | 0.72 | 1.00 | ▆▆▇▆▆ |
| x.X10 | 0 | 1 | 0.55 | 0.28 | 0.01 | 0.32 | 0.57 | 0.78 | 0.99 | ▅▆▇▇▇ |
| y | 0 | 1 | 14.73 | 5.06 | 0.80 | 11.04 | 14.90 | 18.49 | 26.72 | ▁▅▇▇▂ |

The skim table above includes a mini-histogram for each variable, including the response y. The feature plot below shows each predictor plotted against the response.

[Feature plot: predictors X1-X10 vs. y]
KNN Model
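A sketch of the caret call consistent with the output below; bootstrapping with 25 reps is train()'s default resampling, and tuneLength = 10 yields the ten odd values of k shown (the seed is an assumption):

set.seed(100)  # assumed seed
knn_model <- train(x = trainingData$x,
                   y = trainingData$y,
                   method = "knn",
                   preProcess = c("center", "scale"),
                   tuneLength = 10)
knn_model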
## k-Nearest Neighbors
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 3.726109 0.4646980 3.014146
## 7 3.609130 0.4983845 2.916210
## 9 3.573850 0.5157660 2.876988
## 11 3.557243 0.5313498 2.865519
## 13 3.538419 0.5485584 2.862030
## 15 3.527742 0.5645340 2.849736
## 17 3.540169 0.5725169 2.865204
## 19 3.530327 0.5863954 2.851141
## 21 3.519411 0.6000860 2.842288
## 23 3.522842 0.6084923 2.844349
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 21.
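The held-out metrics printed next are consistent with scoring the final KNN fit on the 5,000-point test set, e.g.:

knn_pred <- predict(knn_model, newdata = testData$x)
postResample(pred = knn_pred, obs = testData$y)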
## RMSE Rsquared MAE
## 3.3454465 0.6725733 2.7038661
MARS Model
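A sketch of the MARS fit; the explicit grid over degree 1-2 and nprune 2-15 matches the 28 tuning rows below:

mars_grid <- expand.grid(degree = 1:2, nprune = 2:15)
set.seed(100)  # assumed seed
mars_model <- train(x = trainingData$x,
                    y = trainingData$y,
                    method = "earth",
                    preProcess = c("center", "scale"),
                    tuneGrid = mars_grid)
mars_model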
## Multivariate Adaptive Regression Spline
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 4.164892 0.3611916 3.409897
## 1 3 3.660936 0.5062227 2.959827
## 1 4 3.244858 0.6187136 2.631157
## 1 5 2.805473 0.7142806 2.242194
## 1 6 2.505767 0.7703123 1.976545
## 1 7 2.467969 0.7775514 1.945278
## 1 8 2.215528 0.8214284 1.761183
## 1 9 2.140487 0.8342432 1.707313
## 1 10 2.158827 0.8315906 1.725630
## 1 11 2.182385 0.8294915 1.732187
## 1 12 2.221737 0.8232456 1.758400
## 1 13 2.245196 0.8199335 1.777172
## 1 14 2.252059 0.8192026 1.790685
## 1 15 2.267975 0.8178146 1.805161
## 2 2 4.208856 0.3476404 3.451024
## 2 3 3.736080 0.4869258 2.995870
## 2 4 3.284380 0.6084662 2.653734
## 2 5 2.852638 0.7034535 2.294502
## 2 6 2.514765 0.7672643 1.972002
## 2 7 2.281683 0.8090696 1.795828
## 2 8 2.093803 0.8393584 1.663283
## 2 9 1.824083 0.8783372 1.467502
## 2 10 1.588086 0.9080874 1.267878
## 2 11 1.461354 0.9219269 1.174227
## 2 12 1.423772 0.9260926 1.132255
## 2 13 1.434791 0.9245093 1.128864
## 2 14 1.431186 0.9249019 1.124739
## 2 15 1.450474 0.9230373 1.140071
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 12 and degree = 2.
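Test-set performance for the tuned MARS model, assuming the same postResample() pattern:

mars_pred <- predict(mars_model, newdata = testData$x)
postResample(pred = mars_pred, obs = testData$y)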
## RMSE Rsquared MAE
## 1.1890158 0.9434200 0.9512591
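The importance listing below comes from varImp() on the earth fit; note that MARS retained only informative predictors (X1-X5) and none of the noise variables:

varImp(mars_model)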
## earth variable importance
##
## Overall
## X4 100.00
## X1 77.46
## X5 59.43
## X2 39.88
## X3 0.00
SVM Model
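A sketch of the radial-kernel SVM fit; ten-fold cross-validation and tuneLength = 10 reproduce the doubling cost grid below, with sigma estimated once from the data (caret uses kernlab's sigest):

set.seed(100)  # assumed seed
svm_model <- train(x = trainingData$x,
                   y = trainingData$y,
                   method = "svmRadial",
                   preProcess = c("center", "scale"),
                   tuneLength = 10,
                   trControl = trainControl(method = "cv"))
svm_model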
## Support Vector Machines with Radial Basis Function Kernel
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 2.869165 0.7532017 2.312402
## 0.50 2.585576 0.7823054 2.055335
## 1.00 2.409369 0.8062289 1.894586
## 2.00 2.321492 0.8173656 1.854516
## 4.00 2.205788 0.8351372 1.774367
## 8.00 2.152113 0.8422696 1.735025
## 16.00 2.151148 0.8391198 1.728622
## 32.00 2.151463 0.8390591 1.728927
## 64.00 2.151463 0.8390591 1.728927
## 128.00 2.151463 0.8390591 1.728927
##
## Tuning parameter 'sigma' was held constant at a value of 0.05916194
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.05916194 and C = 16.
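Test-set performance for the tuned SVM:

svm_pred <- predict(svm_model, newdata = testData$x)
postResample(pred = svm_pred, obs = testData$y)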
## RMSE Rsquared MAE
## 1.936793 0.849370 1.513800
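kernlab SVMs have no built-in importance measure, so varImp() falls back to a model-free filter (a loess R² of each predictor against the response), which is why the listing below is labeled "loess r-squared variable importance":

varImp(svm_model)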
## loess r-squared variable importance
##
## Overall
## X4 100.0000
## X1 44.0360
## X5 33.8046
## X2 27.0904
## X3 18.6716
## X10 3.9053
## X7 1.3407
## X9 1.0309
## X8 0.6161
## X6 0.0000
Neural Network Model
nnet_grid <- expand.grid(.decay = c(0, 0.01, 0.1), .size = 1:10, .bag = FALSE)

# Cap on the number of network weights. With 10 inputs, a network of size s needs
# s * (10 + 1) + s + 1 weights, so 5 * (10 + 1) + 5 + 1 = 61 accommodates at most
# 5 hidden units -- hence the NaN rows for size >= 6 in the results below.
nnet_maxnwts <- 5 * (ncol(trainingData$x) + 1) + 5 + 1

nnet_model <- train(x = trainingData$x,
                    y = trainingData$y,
                    method = "avNNet",
                    preProcess = c("center", "scale"),
                    tuneGrid = nnet_grid,
                    trControl = trainControl(method = "cv"),
                    linout = TRUE,    # linear output units for regression
                    trace = FALSE,
                    MaxNWts = nnet_maxnwts,
                    maxit = 500)
nnet_model
## Model Averaged Neural Network
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 2.765430 0.7214950 2.222278
## 0.00 2 2.454579 0.7755543 1.958788
## 0.00 3 2.345981 0.7944327 1.878345
## 0.00 4 2.384023 0.7976148 1.825766
## 0.00 5 2.571703 0.7713210 2.013408
## 0.00 6 NaN NaN NaN
## 0.00 7 NaN NaN NaN
## 0.00 8 NaN NaN NaN
## 0.00 9 NaN NaN NaN
## 0.00 10 NaN NaN NaN
## 0.01 1 2.739902 0.7233210 2.182945
## 0.01 2 2.502228 0.7644033 1.992281
## 0.01 3 2.327996 0.7978076 1.839162
## 0.01 4 2.313584 0.7936864 1.902121
## 0.01 5 2.399065 0.7850531 1.919526
## 0.01 6 NaN NaN NaN
## 0.01 7 NaN NaN NaN
## 0.01 8 NaN NaN NaN
## 0.01 9 NaN NaN NaN
## 0.01 10 NaN NaN NaN
## 0.10 1 2.740531 0.7201455 2.160466
## 0.10 2 2.542600 0.7600431 2.034750
## 0.10 3 2.447342 0.7772923 1.961562
## 0.10 4 2.281908 0.8054697 1.865467
## 0.10 5 2.333053 0.8006064 1.930540
## 0.10 6 NaN NaN NaN
## 0.10 7 NaN NaN NaN
## 0.10 8 NaN NaN NaN
## 0.10 9 NaN NaN NaN
## 0.10 10 NaN NaN NaN
##
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 4, decay = 0.1 and bag = FALSE.
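Test-set performance for the averaged neural network:

nnet_pred <- predict(nnet_model, newdata = testData$x)
postResample(pred = nnet_pred, obs = testData$y)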
## RMSE Rsquared MAE
## 2.1282054 0.8177093 1.6732538
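As with the SVM, the importance listing below is the model-free loess R² filter, which is why it is identical to the ranking printed for the SVM model.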
## loess r-squared variable importance
##
## Overall
## X4 100.0000
## X1 44.0360
## X5 33.8046
## X2 27.0904
## X3 18.6716
## X10 3.9053
## X7 1.3407
## X9 1.0309
## X8 0.6161
## X6 0.0000
Exercise 7.5
Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.
(a) Which nonlinear regression model gives the optimal resampling and test set performance?
KNN
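A sketch of the KNN fit on the chemical manufacturing data; train_df is assumed to be the imputed, pre-processed training split carried over from Exercise 6.3 (hence "No pre-processing" inside train() itself), and tuneLength = 25 yields the 25 values of k shown:

set.seed(100)  # assumed seed
knn_model <- train(Yield ~ ., data = train_df,
                   method = "knn",
                   tuneLength = 25,
                   trControl = trainControl(method = "cv"))
knn_model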
## k-Nearest Neighbors
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 130, 129, 128, 129, 130, 129, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 0.7017857 0.4962847 0.5606515
## 7 0.7236139 0.4746428 0.5969490
## 9 0.7300591 0.4668491 0.6013968
## 11 0.7460436 0.4490707 0.6177046
## 13 0.7485067 0.4442744 0.6213027
## 15 0.7586368 0.4291603 0.6258871
## 17 0.7624259 0.4375912 0.6294124
## 19 0.7627696 0.4449437 0.6254787
## 21 0.7635900 0.4490280 0.6242544
## 23 0.7732659 0.4412447 0.6310587
## 25 0.7730124 0.4461752 0.6353608
## 27 0.7896268 0.4196195 0.6475769
## 29 0.7988773 0.4028687 0.6572028
## 31 0.8024417 0.4113070 0.6624965
## 33 0.8075687 0.4045510 0.6647987
## 35 0.8120014 0.4029270 0.6670706
## 37 0.8154453 0.3950009 0.6669333
## 39 0.8200178 0.3964573 0.6710715
## 41 0.8227606 0.3978416 0.6722223
## 43 0.8253769 0.3947873 0.6756019
## 45 0.8279943 0.3936810 0.6783199
## 47 0.8268730 0.4025667 0.6780751
## 49 0.8340750 0.3925289 0.6843632
## 51 0.8364038 0.3974095 0.6845187
## 53 0.8378127 0.3955672 0.6875498
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.
MARS
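A sketch of the MARS fit, reusing the degree/nprune grid from Exercise 7.2:

mars_grid <- expand.grid(degree = 1:2, nprune = 2:15)
set.seed(100)  # assumed seed
mars_model <- train(Yield ~ ., data = train_df,
                    method = "earth",
                    tuneGrid = mars_grid,
                    trControl = trainControl(method = "cv"))
mars_model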
## Multivariate Adaptive Regression Spline
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 129, 129, 130, 132, 130, 130, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 0.7722023 0.3941996 0.5999321
## 1 3 0.6627211 0.5546441 0.5313042
## 1 4 0.6425518 0.5768163 0.5156998
## 1 5 0.6585956 0.5596575 0.5180076
## 1 6 0.6627592 0.5524007 0.5198269
## 1 7 0.6604789 0.5589514 0.5153873
## 1 8 0.6543415 0.5634510 0.5078212
## 1 9 0.6604364 0.5567869 0.5187573
## 1 10 0.6650075 0.5440094 0.5148558
## 1 11 0.6628266 0.5524088 0.5212075
## 1 12 0.6746918 0.5472778 0.5276799
## 1 13 0.6583636 0.5646934 0.5106820
## 1 14 0.6608923 0.5579993 0.5158548
## 1 15 0.6629204 0.5566854 0.5176025
## 2 2 0.7617132 0.4083614 0.5915262
## 2 3 0.6801367 0.5184663 0.5504259
## 2 4 0.6298407 0.5873378 0.5176954
## 2 5 0.6249017 0.5959341 0.5122018
## 2 6 0.6007549 0.6286730 0.4823449
## 2 7 0.5634810 0.6679785 0.4605940
## 2 8 0.5510920 0.6848203 0.4448332
## 2 9 0.5560838 0.6808989 0.4394491
## 2 10 0.5708689 0.6600874 0.4609790
## 2 11 0.5605395 0.6742829 0.4541922
## 2 12 0.5693831 0.6739141 0.4619783
## 2 13 0.5673132 0.6724538 0.4510969
## 2 14 0.5665736 0.6723828 0.4533791
## 2 15 0.5905837 0.6500161 0.4711157
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 8 and degree = 2.
SVM
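A sketch of the radial-kernel SVM fit; tuneLength = 25 generates the long doubling cost grid below (C = 2^-2 up to 2^22):

set.seed(100)  # assumed seed
svm_model <- train(Yield ~ ., data = train_df,
                   method = "svmRadial",
                   tuneLength = 25,
                   trControl = trainControl(method = "cv"))
svm_model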
## Support Vector Machines with Radial Basis Function Kernel
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 128, 130, 130, 130, 130, 128, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 0.7637178 0.4612331 0.6304200
## 0.50 0.7138185 0.4953123 0.5846336
## 1.00 0.6639811 0.5514336 0.5374400
## 2.00 0.6315988 0.5856940 0.5107699
## 4.00 0.6294062 0.5859009 0.5046017
## 8.00 0.6087599 0.6065785 0.4889208
## 16.00 0.6084278 0.6069749 0.4888434
## 32.00 0.6084278 0.6069749 0.4888434
## 64.00 0.6084278 0.6069749 0.4888434
## 128.00 0.6084278 0.6069749 0.4888434
## 256.00 0.6084278 0.6069749 0.4888434
## 512.00 0.6084278 0.6069749 0.4888434
## 1024.00 0.6084278 0.6069749 0.4888434
## 2048.00 0.6084278 0.6069749 0.4888434
## 4096.00 0.6084278 0.6069749 0.4888434
## 8192.00 0.6084278 0.6069749 0.4888434
## 16384.00 0.6084278 0.6069749 0.4888434
## 32768.00 0.6084278 0.6069749 0.4888434
## 65536.00 0.6084278 0.6069749 0.4888434
## 131072.00 0.6084278 0.6069749 0.4888434
## 262144.00 0.6084278 0.6069749 0.4888434
## 524288.00 0.6084278 0.6069749 0.4888434
## 1048576.00 0.6084278 0.6069749 0.4888434
## 2097152.00 0.6084278 0.6069749 0.4888434
## 4194304.00 0.6084278 0.6069749 0.4888434
##
## Tuning parameter 'sigma' was held constant at a value of 0.01436322
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.01436322 and C = 16.
Neural Net
nnet_grid <- expand.grid(.decay = c(0, 0.01, 0.1), .size = 1:10, .bag = FALSE)

# Cap on the number of network weights. ncol(train_df) = 57 (Yield plus 56
# predictors), and a network of size s needs s * (56 + 1) + s + 1 weights, so
# 5 * 57 + 5 + 1 = 291 accommodates at most 5 hidden units -- hence the NaN
# rows for size >= 6 in the results below.
nnet_maxnwts <- 5 * ncol(train_df) + 5 + 1

nnet_model <- train(
  Yield ~ ., data = train_df, method = "avNNet",
  # Note: center and scale are not train() arguments; they are passed through
  # the dots and ignored, which is why the output reports "No pre-processing".
  # To standardize inside train(), use preProcess = c("center", "scale").
  center = TRUE,
  scale = TRUE,
  tuneGrid = nnet_grid,
  trControl = trainControl(method = "cv"),
  linout = TRUE,    # linear output units for regression
  trace = FALSE,
  MaxNWts = nnet_maxnwts,
  maxit = 500
)
nnet_model
## Model Averaged Neural Network
##
## 144 samples
## 56 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 131, 129, 129, 129, 129, 131, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 0.8550454 0.3905974 0.6736735
## 0.00 2 0.8397700 0.4677405 0.6781118
## 0.00 3 1.5264613 0.4818655 1.1051510
## 0.00 4 0.7159878 0.5603617 0.5629924
## 0.00 5 0.7164835 0.5410727 0.5823812
## 0.00 6 NaN NaN NaN
## 0.00 7 NaN NaN NaN
## 0.00 8 NaN NaN NaN
## 0.00 9 NaN NaN NaN
## 0.00 10 NaN NaN NaN
## 0.01 1 0.7755530 0.5041892 0.6271978
## 0.01 2 0.8231924 0.4582981 0.6547267
## 0.01 3 0.7477910 0.5072772 0.5946967
## 0.01 4 0.6783638 0.6022529 0.5179515
## 0.01 5 0.6213627 0.6623922 0.4979006
## 0.01 6 NaN NaN NaN
## 0.01 7 NaN NaN NaN
## 0.01 8 NaN NaN NaN
## 0.01 9 NaN NaN NaN
## 0.01 10 NaN NaN NaN
## 0.10 1 0.7522723 0.5255260 0.6046651
## 0.10 2 0.7397727 0.5402556 0.5802931
## 0.10 3 0.6544117 0.6037688 0.5332527
## 0.10 4 0.6473491 0.6268090 0.5231965
## 0.10 5 0.6113875 0.6511967 0.4892838
## 0.10 6 NaN NaN NaN
## 0.10 7 NaN NaN NaN
## 0.10 8 NaN NaN NaN
## 0.10 9 NaN NaN NaN
## 0.10 10 NaN NaN NaN
##
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 5, decay = 0.1 and bag = FALSE.
Findings
(b) Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?
The most important predictors are ManufacturingProcess32 (M32), ManufacturingProcess13 (M13), ManufacturingProcess09 (M09), ManufacturingProcess17 (M17), and BiologicalMaterial06 (B06). The manufacturing process variables dominate the list in both number and prominence: four of the top five predictors are manufacturing processes. Compared with the optimal linear model from last week, both the linear and nonlinear models selected M32 as the most important variable. Beyond that, however, only one other variable, M09, appeared in the top ten of both the linear and nonlinear models.
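The ranking below is again the model-free loess R² filter that varImp() reports when a model lacks an intrinsic importance measure; assuming the SVM was the model judged optimal in part (a), the call would simply be:

varImp(svm_model)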
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 56)
##
## Overall
## ManufacturingProcess32 100.00
## ManufacturingProcess13 93.82
## ManufacturingProcess09 89.93
## ManufacturingProcess17 88.20
## BiologicalMaterial06 82.61
## BiologicalMaterial03 79.44
## ManufacturingProcess36 73.85
## BiologicalMaterial12 72.36
## ManufacturingProcess06 69.00
## ManufacturingProcess11 62.34
## ManufacturingProcess31 56.39
## BiologicalMaterial02 50.34
## BiologicalMaterial11 48.53
## BiologicalMaterial09 44.76
## ManufacturingProcess30 41.87
## BiologicalMaterial08 40.24
## ManufacturingProcess29 38.54
## ManufacturingProcess33 38.16
## BiologicalMaterial04 36.92
## ManufacturingProcess25 36.83
(c) Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?
Below, I have plotted the top nonlinear predictors that did not appear in the list of top linear predictors; a plotting sketch follows this paragraph. For some of the predictors we can easily see a positive or negative relationship with the response variable. For others (ManufacturingProcess36 and BiologicalMaterial12) there is no discernible relationship. I believe the predictors with no discernible relationship reflect the high-order transformations that take place in the SVM regression.
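A minimal sketch of how each scatter plot below could be produced (the loess smoother and the helper name are assumptions, not the original code):

library(ggplot2)

# Hypothetical helper: scatter of one predictor against Yield with a loess smoother
plot_predictor <- function(var) {
  ggplot(train_df, aes(x = .data[[var]], y = Yield)) +
    geom_point() +
    geom_smooth(method = "loess", se = FALSE) +
    labs(x = var, y = "Yield")
}

plot_predictor("ManufacturingProcess13")  # M13; repeat for the remaining predictors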
M13

M17

B06

B03

M36

B12
