library(tidyverse)
library(AppliedPredictiveModeling)
library(skimr)
library(caret)
library(GGally)
library(mlbench)
set.seed(200)
Homework 8
7.2
Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data: \(y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)\), where the x values are random variables uniformly distributed on [0, 1] (there are also 5 other non-informative variables created in the simulation). The package mlbench contains a function called mlbench.friedman1 that simulates these data:
trainingData <- mlbench.friedman1(200, sd = 1)
## We convert the 'x' data from a matrix to a data frame
## One reason is that this will give the columns names.
trainingData$x <- data.frame(trainingData$x)
## Look at the data using
featurePlot(trainingData$x, trainingData$y)
## or other methods.
## This creates a list with a vector 'y' and a matrix
## of predictors 'x'. Also simulate a large test set to
## estimate the true error rate with good precision:
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)
Tune several models on these data. For example:
knnModel <- train(
x = trainingData$x,
y = trainingData$y,
method = "knn",
preProc = c("center", "scale"),
tuneLength = 10
)
knnModel
k-Nearest Neighbors
200 samples
10 predictor
Pre-processing: centered (10), scaled (10)
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
Resampling results across tuning parameters:
k RMSE Rsquared MAE
5 3.466085 0.5121775 2.816838
7 3.349428 0.5452823 2.727410
9 3.264276 0.5785990 2.660026
11 3.214216 0.6024244 2.603767
13 3.196510 0.6176570 2.591935
15 3.184173 0.6305506 2.577482
17 3.183130 0.6425367 2.567787
19 3.198752 0.6483184 2.592683
21 3.188993 0.6611428 2.588787
23 3.200458 0.6638353 2.604529
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 17.
knnPred <- predict(knnModel, newdata = testData$x)
In addition to the KNN model in the example above, I tune neural network, Multivariate Adaptive Regression Splines (MARS), and Support Vector Machine (SVM) models below.
## There are no columns with pair-wise correlations above the threshold .75
findCorrelation(cor(trainingData$x), cutoff = .75)
integer(0)
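findCorrelation() returned integer(0), meaning no predictor was flagged for removal at the 0.75 cutoff. A quick cross-check is to look at the largest absolute off-diagonal correlation directly. The helper below, max_abs_cor, is a hypothetical illustration and not a caret function; for independent uniform columns like the Friedman predictors, the value should sit far below 0.75.

```r
# Hypothetical helper: largest absolute pairwise correlation among columns,
# ignoring each column's (trivial) correlation with itself.
max_abs_cor <- function(df) {
  cm <- abs(cor(df))
  diag(cm) <- 0
  max(cm)
}

# Ten independent uniform columns, similar in spirit to the Friedman predictors.
set.seed(1)
x <- as.data.frame(matrix(runif(200 * 10), ncol = 10))
max_abs_cor(x) < 0.75
```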
nnetGrid <- expand.grid(
.decay = c(0, 0.01, .1),
.size = c(1:10),
.bag = FALSE
)
# Pre-process the data and tune a model
ctrl <- trainControl(method = "cv", number = 10)
nnetTune <- train(
x = trainingData$x,
y = trainingData$y,
method = "avNNet",
tuneGrid = nnetGrid,
trControl = ctrl,
preProc = c("center", "scale"),
linout = TRUE,
trace = FALSE,
MaxNWts = 10 * (ncol(trainingData$x) + 1) + 10 + 1,
maxit = 500
)
nnetTune
Model Averaged Neural Network
200 samples
10 predictor
Pre-processing: centered (10), scaled (10)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
Resampling results across tuning parameters:
decay size RMSE Rsquared MAE
0.00 1 2.409642 0.7699766 1.901164
0.00 2 2.498321 0.7552210 1.997312
0.00 3 2.039894 0.8418315 1.608763
0.00 4 1.910037 0.8571933 1.536982
0.00 5 2.079055 0.8302563 1.600243
0.00 6 2.948820 0.7041657 2.086060
0.00 7 3.476673 0.6454372 2.477829
0.00 8 4.337363 0.5616218 2.837736
0.00 9 4.121967 0.5169400 2.745532
0.00 10 3.775717 0.6474748 2.544171
0.01 1 2.437185 0.7689840 1.934964
0.01 2 2.510986 0.7596193 1.988259
0.01 3 2.000010 0.8419513 1.555801
0.01 4 2.003357 0.8445290 1.549721
0.01 5 2.094085 0.8310163 1.666573
0.01 6 2.303160 0.8013569 1.848981
0.01 7 2.350215 0.8048656 1.877390
0.01 8 2.276100 0.8009925 1.823380
0.01 9 2.255870 0.8137568 1.772540
0.01 10 2.409138 0.7766479 1.970988
0.10 1 2.450906 0.7652288 1.942962
0.10 2 2.489401 0.7606440 1.997059
0.10 3 2.200694 0.8155493 1.786601
0.10 4 2.059323 0.8432341 1.651719
0.10 5 2.173964 0.8178289 1.717782
0.10 6 2.230339 0.8096536 1.765530
0.10 7 2.241135 0.8162395 1.823907
0.10 8 2.321584 0.8001632 1.803450
0.10 9 2.280454 0.7932034 1.852915
0.10 10 2.219990 0.8131718 1.771699
Tuning parameter 'bag' was held constant at a value of FALSE
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were size = 4, decay = 0 and bag = FALSE.
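The MaxNWts argument caps how many weights nnet will fit, so it has to cover the largest network in the tuning grid. For a single hidden layer with one linear output unit, each hidden unit has one weight per predictor plus a bias, and the output unit has one weight per hidden unit plus its own bias. A small sketch (n_weights is an illustrative helper, not part of caret or nnet) confirms that the expression used in the train() call matches the size = 10 network for the ten Friedman predictors:

```r
# Hypothetical helper: weight count for a single-hidden-layer network with
# p inputs, `size` hidden units, and one output unit (no skip-layer weights).
n_weights <- function(p, size) size * (p + 1) + size + 1

n_weights(p = 10, size = 10)   # 121, equal to 10 * (10 + 1) + 10 + 1
n_weights(p = 10, size = 1:10) # weight counts across the tuning grid
```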
nnetPred <- predict(nnetTune, newdata = testData$x)
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38)
marsTuned <- train(
x = trainingData$x,
y = trainingData$y,
method = "earth",
tuneGrid = marsGrid,
trControl = trainControl(method = "cv")
)
marsTuned
Multivariate Adaptive Regression Spline
200 samples
10 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
Resampling results across tuning parameters:
degree nprune RMSE Rsquared MAE
1 2 4.334325 0.2599883 3.607719
1 3 3.599334 0.4805557 2.888987
1 4 2.637145 0.7290848 2.087677
1 5 2.283872 0.7939684 1.817343
1 6 2.125875 0.8183677 1.647491
1 7 1.766013 0.8733619 1.410328
1 8 1.671282 0.8842102 1.324258
1 9 1.645406 0.8867947 1.322041
1 10 1.597968 0.8926582 1.297518
1 11 1.540109 0.8996361 1.237949
1 12 1.545349 0.8992979 1.243771
1 13 1.535169 0.9010122 1.233571
1 14 1.529405 0.9018457 1.223874
1 15 1.529405 0.9018457 1.223874
1 16 1.529405 0.9018457 1.223874
1 17 1.529405 0.9018457 1.223874
1 18 1.529405 0.9018457 1.223874
1 19 1.529405 0.9018457 1.223874
1 20 1.529405 0.9018457 1.223874
1 21 1.529405 0.9018457 1.223874
1 22 1.529405 0.9018457 1.223874
1 23 1.529405 0.9018457 1.223874
1 24 1.529405 0.9018457 1.223874
1 25 1.529405 0.9018457 1.223874
1 26 1.529405 0.9018457 1.223874
1 27 1.529405 0.9018457 1.223874
1 28 1.529405 0.9018457 1.223874
1 29 1.529405 0.9018457 1.223874
1 30 1.529405 0.9018457 1.223874
1 31 1.529405 0.9018457 1.223874
1 32 1.529405 0.9018457 1.223874
1 33 1.529405 0.9018457 1.223874
1 34 1.529405 0.9018457 1.223874
1 35 1.529405 0.9018457 1.223874
1 36 1.529405 0.9018457 1.223874
1 37 1.529405 0.9018457 1.223874
1 38 1.529405 0.9018457 1.223874
2 2 4.334325 0.2599883 3.607719
2 3 3.599334 0.4805557 2.888987
2 4 2.637145 0.7290848 2.087677
2 5 2.271844 0.7927888 1.823675
2 6 2.114868 0.8200184 1.659485
2 7 1.780140 0.8733216 1.429346
2 8 1.663164 0.8891928 1.294968
2 9 1.460976 0.9122520 1.180387
2 10 1.399692 0.9175376 1.122526
2 11 1.380002 0.9216251 1.110556
2 12 1.312883 0.9284253 1.063321
2 13 1.285612 0.9343029 1.014216
2 14 1.328520 0.9286650 1.052185
2 15 1.322954 0.9298515 1.045527
2 16 1.341454 0.9283961 1.053190
2 17 1.344590 0.9280972 1.054209
2 18 1.340821 0.9285264 1.050274
2 19 1.340821 0.9285264 1.050274
2 20 1.340821 0.9285264 1.050274
2 21 1.340821 0.9285264 1.050274
2 22 1.340821 0.9285264 1.050274
2 23 1.340821 0.9285264 1.050274
2 24 1.340821 0.9285264 1.050274
2 25 1.340821 0.9285264 1.050274
2 26 1.340821 0.9285264 1.050274
2 27 1.340821 0.9285264 1.050274
2 28 1.340821 0.9285264 1.050274
2 29 1.340821 0.9285264 1.050274
2 30 1.340821 0.9285264 1.050274
2 31 1.340821 0.9285264 1.050274
2 32 1.340821 0.9285264 1.050274
2 33 1.340821 0.9285264 1.050274
2 34 1.340821 0.9285264 1.050274
2 35 1.340821 0.9285264 1.050274
2 36 1.340821 0.9285264 1.050274
2 37 1.340821 0.9285264 1.050274
2 38 1.340821 0.9285264 1.050274
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nprune = 13 and degree = 2.
marsPred <- predict(marsTuned, newdata = testData$x)
svmRTuned <- train(
x = trainingData$x,
y = trainingData$y,
method = "svmRadial",
preProc = c("center", "scale"),
tuneLength = 14,
trControl = trainControl(method = "cv")
)
svmRTuned
Support Vector Machines with Radial Basis Function Kernel
200 samples
10 predictor
Pre-processing: centered (10), scaled (10)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
Resampling results across tuning parameters:
C RMSE Rsquared MAE
0.25 2.504105 0.7940789 1.987142
0.50 2.219946 0.8148914 1.750249
1.00 2.028115 0.8388693 1.590383
2.00 1.899331 0.8561464 1.486326
4.00 1.815659 0.8669649 1.424259
8.00 1.798336 0.8702845 1.427729
16.00 1.797151 0.8702727 1.431233
32.00 1.795246 0.8705185 1.429239
64.00 1.795246 0.8705185 1.429239
128.00 1.795246 0.8705185 1.429239
256.00 1.795246 0.8705185 1.429239
512.00 1.795246 0.8705185 1.429239
1024.00 1.795246 0.8705185 1.429239
2048.00 1.795246 0.8705185 1.429239
Tuning parameter 'sigma' was held constant at a value of 0.06104815
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.06104815 and C = 32.
svmPred <- predict(svmRTuned, newdata = testData$x)
Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?
data.frame(
Model = c("KNN", "Neural Network", "MARS", "SVM"),
rbind(
postResample(pred = knnPred, obs = testData$y),
postResample(pred = nnetPred, obs = testData$y),
postResample(pred = marsPred, obs = testData$y),
postResample(pred = svmPred, obs = testData$y)
)
) |>
arrange(RMSE)
Model RMSE Rsquared MAE
1 MARS 1.280306 0.9335241 1.016867
2 SVM 2.069332 0.8263570 1.571883
3 Neural Network 2.498334 0.7838112 1.686304
4 KNN 3.204059 0.6819919 2.568346
On the test set, the MARS model performs best: it has the lowest RMSE, the highest Rsquared, and the lowest MAE, followed by SVM, the neural network, and KNN.
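For reference, postResample() reports RMSE, R-squared, and MAE, and by default caret computes R-squared as the squared correlation between predictions and observations. A minimal base-R sketch of the same three metrics (regression_metrics is a hypothetical helper name, not a caret function) is:

```r
# Hypothetical helper mirroring the three metrics reported by caret::postResample().
regression_metrics <- function(pred, obs) {
  c(
    RMSE = sqrt(mean((obs - pred)^2)),  # root mean squared error
    Rsquared = cor(pred, obs)^2,        # squared Pearson correlation
    MAE = mean(abs(obs - pred))         # mean absolute error
  )
}

regression_metrics(pred = c(1.1, 1.9, 3.2, 3.8), obs = c(1, 2, 3, 4))
```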
varImp(marsTuned)
earth variable importance
Overall
X1 100.00
X4 75.33
X2 48.88
X5 15.63
X3 0.00
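Largely, yes. X1, X2, X4, and X5 carry clear importance, and none of the non-informative predictors (X6–X10) appear in the list. X3 is listed but has a scaled importance of 0.00, the lowest of the five, so MARS gives it the least weight.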
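Because the simulated response depends only on the first five predictors, writing the equation out shows why X6–X10 should be absent from the importance list. The sketch below is a hypothetical base-R re-implementation of the formula given at the start of this exercise (friedman1_mean and simulate_friedman1 are illustrative names, not mlbench functions), and it will not reproduce mlbench.friedman1's exact random draws. Scrambling X6–X10 leaves the noise-free response unchanged.

```r
# Noise-free Friedman (1991) mean function; only columns 1 to 5 enter it.
friedman1_mean <- function(x) {
  10 * sin(pi * x[, 1] * x[, 2]) + 20 * (x[, 3] - 0.5)^2 + 10 * x[, 4] + 5 * x[, 5]
}

# Hypothetical simulator: 10 uniform predictors on [0, 1] plus Gaussian noise.
simulate_friedman1 <- function(n, sd = 1) {
  x <- matrix(runif(n * 10), ncol = 10, dimnames = list(NULL, paste0("X", 1:10)))
  list(x = x, y = friedman1_mean(x) + rnorm(n, sd = sd))
}

set.seed(200)
sim <- simulate_friedman1(200)
x_shuffled <- sim$x
x_shuffled[, 6:10] <- x_shuffled[sample(nrow(x_shuffled)), 6:10]  # scramble X6-X10 only
all.equal(friedman1_mean(sim$x), friedman1_mean(x_shuffled))     # TRUE
```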
7.5
Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.
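Below, as in Exercise 6.3, I impute missing values with k-nearest neighbors, remove near-zero-variance predictors, and split the data into training and test sets. Note that the knnImpute step also centers and scales every column, including Yield, so the RMSE and MAE values reported in this exercise are in standardized units of yield.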
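caret's preProcess(method = "knnImpute") handles imputation for all columns at once. To show the idea, the sketch below is a simplified, hypothetical helper (knn_impute_column) for a single column; it is not caret's implementation and assumes the other columns have no missing values. Distances are computed on the standardized other columns, and each missing value is replaced by the mean of that column over its k nearest complete rows.

```r
# Hypothetical, simplified k-nearest-neighbor imputation for one column.
knn_impute_column <- function(df, target, k = 5) {
  others <- setdiff(names(df), target)
  z <- scale(df[, others, drop = FALSE])        # standardize the distance columns
  missing <- which(is.na(df[[target]]))
  donors <- which(!is.na(df[[target]]))
  for (i in missing) {
    # Euclidean distance from row i to every donor row
    d <- sqrt(colSums((t(z[donors, , drop = FALSE]) - z[i, ])^2))
    nearest <- donors[order(d)[seq_len(min(k, length(donors)))]]
    df[[target]][i] <- mean(df[[target]][nearest])
  }
  df
}

toy <- data.frame(a = c(1, 2, 3, 10, 11), b = c(1.5, 2.5, NA, 10.5, 11.5))
knn_impute_column(toy, "b", k = 2)$b  # row 3 becomes 2, the mean of rows 1 and 2
```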
data(ChemicalManufacturingProcess)
# impute
impute <- preProcess(
ChemicalManufacturingProcess,
method = c("knnImpute")
)
impute
Created from 152 samples and 58 variables
Pre-processing:
- centered (58)
- ignored (0)
- 5 nearest neighbor imputation (58)
- scaled (58)
# predict
chemical_impute <- predict(
impute,
ChemicalManufacturingProcess
)
# remove nzv predictors
nzv <- nearZeroVar(chemical_impute)
filtered_chemical <- chemical_impute[, -nzv]
filtered_chemical |> ncol()
[1] 57
# Split the data into a training and a test set
trainingRows <- createDataPartition(
filtered_chemical$Yield,
p = .80,
list = FALSE
)
chemical_train <- filtered_chemical[trainingRows, ]
chemical_test <- filtered_chemical[-trainingRows, ]
Next I train MARS, SVM, and neural network nonlinear regression models. For the SVM and the neural network, I also center and scale the predictors inside train().
# Train
marsTuned_chem <- train(
chemical_train[, !names(chemical_train) %in% "Yield"],
chemical_train$Yield,
method = "earth",
tuneGrid = marsGrid,
trControl = trainControl(method = "cv")
)
marsTuned_chem
Multivariate Adaptive Regression Spline
144 samples
56 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 130, 130, 130, 129, 131, 128, ...
Resampling results across tuning parameters:
degree nprune RMSE Rsquared MAE
1 2 0.8229694 0.3776693 0.6400285
1 3 0.6726292 0.5545466 0.5414789
1 4 0.6249329 0.6256565 0.5019373
1 5 0.6336078 0.6121999 0.5146433
1 6 0.6533184 0.6006798 0.5312103
1 7 0.6370705 0.6141122 0.5169774
1 8 0.6362795 0.6201796 0.5094093
1 9 0.6326781 0.6259690 0.5072746
1 10 0.6235149 0.6439206 0.5017406
1 11 0.6040794 0.6594559 0.4928326
1 12 0.6267566 0.6393797 0.5069740
1 13 0.6339930 0.6296114 0.5057972
1 14 0.6360183 0.6307692 0.5023013
1 15 0.6369935 0.6311722 0.5052953
1 16 0.6369935 0.6311722 0.5052953
1 17 0.6369935 0.6311722 0.5052953
1 18 0.6369935 0.6311722 0.5052953
1 19 0.6369935 0.6311722 0.5052953
1 20 0.6369935 0.6311722 0.5052953
1 21 0.6369935 0.6311722 0.5052953
1 22 0.6369935 0.6311722 0.5052953
1 23 0.6369935 0.6311722 0.5052953
1 24 0.6369935 0.6311722 0.5052953
1 25 0.6369935 0.6311722 0.5052953
1 26 0.6369935 0.6311722 0.5052953
1 27 0.6369935 0.6311722 0.5052953
1 28 0.6369935 0.6311722 0.5052953
1 29 0.6369935 0.6311722 0.5052953
1 30 0.6369935 0.6311722 0.5052953
1 31 0.6369935 0.6311722 0.5052953
1 32 0.6369935 0.6311722 0.5052953
1 33 0.6369935 0.6311722 0.5052953
1 34 0.6369935 0.6311722 0.5052953
1 35 0.6369935 0.6311722 0.5052953
1 36 0.6369935 0.6311722 0.5052953
1 37 0.6369935 0.6311722 0.5052953
1 38 0.6369935 0.6311722 0.5052953
2 2 0.8213749 0.3784062 0.6373551
2 3 0.6838571 0.5403019 0.5527968
2 4 0.6231565 0.6189251 0.5017234
2 5 0.7274223 0.5308450 0.5734149
2 6 0.7756026 0.5193310 0.5929477
2 7 0.7843063 0.5096186 0.5890326
2 8 0.7895617 0.4982613 0.5928996
2 9 0.7660442 0.5195915 0.5680261
2 10 0.7987959 0.4829875 0.5950913
2 11 0.8629878 0.4594946 0.6238048
2 12 0.8791300 0.4652098 0.6217018
2 13 0.8926965 0.4711372 0.6277474
2 14 0.8818186 0.4833008 0.6128424
2 15 0.8740984 0.4819925 0.6048763
2 16 0.8784880 0.4751070 0.6027940
2 17 0.8546525 0.4863916 0.5952616
2 18 0.9162123 0.4638653 0.6217777
2 19 0.9346913 0.4567074 0.6347407
2 20 0.9627305 0.4427165 0.6460796
2 21 0.9627305 0.4427165 0.6460796
2 22 0.9627305 0.4427165 0.6460796
2 23 0.9627305 0.4427165 0.6460796
2 24 0.9627305 0.4427165 0.6460796
2 25 0.9627305 0.4427165 0.6460796
2 26 0.9627305 0.4427165 0.6460796
2 27 0.9627305 0.4427165 0.6460796
2 28 0.9627305 0.4427165 0.6460796
2 29 0.9627305 0.4427165 0.6460796
2 30 0.9627305 0.4427165 0.6460796
2 31 0.9627305 0.4427165 0.6460796
2 32 0.9627305 0.4427165 0.6460796
2 33 0.9627305 0.4427165 0.6460796
2 34 0.9627305 0.4427165 0.6460796
2 35 0.9627305 0.4427165 0.6460796
2 36 0.9627305 0.4427165 0.6460796
2 37 0.9627305 0.4427165 0.6460796
2 38 0.9627305 0.4427165 0.6460796
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nprune = 11 and degree = 1.
# Predict
marsPred_chem <- predict(marsTuned_chem, newdata = chemical_test[, !names(chemical_test) %in% "Yield"])
# Train
svmRTuned_chem <- train(
chemical_train[, !names(chemical_train) %in% "Yield"],
chemical_train$Yield,
method = "svmRadial",
preProc = c("center", "scale"),
tuneLength = 14,
trControl = trainControl(method = "cv")
)
svmRTuned_chem
Support Vector Machines with Radial Basis Function Kernel
144 samples
56 predictor
Pre-processing: centered (56), scaled (56)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 130, 129, 129, 129, 129, 131, ...
Resampling results across tuning parameters:
C RMSE Rsquared MAE
0.25 0.7448695 0.5233613 0.5938431
0.50 0.6827689 0.5780817 0.5442406
1.00 0.6414690 0.6312418 0.5125969
2.00 0.6218519 0.6568918 0.4953282
4.00 0.6216708 0.6581026 0.4917260
8.00 0.6220222 0.6623731 0.4945375
16.00 0.6206753 0.6652832 0.4944964
32.00 0.6206753 0.6652832 0.4944964
64.00 0.6206753 0.6652832 0.4944964
128.00 0.6206753 0.6652832 0.4944964
256.00 0.6206753 0.6652832 0.4944964
512.00 0.6206753 0.6652832 0.4944964
1024.00 0.6206753 0.6652832 0.4944964
2048.00 0.6206753 0.6652832 0.4944964
Tuning parameter 'sigma' was held constant at a value of 0.01381503
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.01381503 and C = 16.
# Predict
svmRPred_chem <- predict(svmRTuned_chem, newdata = chemical_test[, !names(chemical_test) %in% "Yield"])
# There are several columns with pair-wise correlations above the threshold .75
tooHigh <- findCorrelation(cor(chemical_train[, !names(chemical_train) %in% "Yield"]), cutoff = .75)
trainXnnet <- chemical_train[, !names(chemical_train) %in% "Yield"][, -tooHigh]
testXnnet <- chemical_test[, !names(chemical_test) %in% "Yield"][, -tooHigh]
# Train
nnetTune_chem <- train(
chemical_train[, !names(chemical_train) %in% "Yield"],
chemical_train$Yield,
method = "avNNet",
tuneGrid = nnetGrid,
trControl = ctrl,
preProc = c("center", "scale"),
linout = TRUE,
trace = FALSE,
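# Note: MaxNWts is sized for trainXnnet, but x above is all 56 predictors.
# A network with `size` hidden units then needs 58 * size + 1 weights, which
# likely explains why sizes 7 to 10 fail (the NaN rows in the output).
# Passing trainXnnet as x (and testXnnet to predict()) would match the intent.
MaxNWts = 10 * (ncol(trainXnnet) + 1) + 10 + 1,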
maxit = 500
)
nnetTune_chem
Model Averaged Neural Network
144 samples
56 predictor
Pre-processing: centered (56), scaled (56)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 131, 130, 128, 129, 128, 130, ...
Resampling results across tuning parameters:
decay size RMSE Rsquared MAE
0.00 1 0.8053326 0.4824447 0.6491145
0.00 2 0.7380611 0.5107145 0.5895829
0.00 3 0.7796276 0.5462163 0.6153225
0.00 4 0.7795760 0.4864407 0.6708643
0.00 5 0.7912621 0.4787829 0.6364726
0.00 6 0.7393453 0.5416763 0.5919869
0.00 7 NaN NaN NaN
0.00 8 NaN NaN NaN
0.00 9 NaN NaN NaN
0.00 10 NaN NaN NaN
0.01 1 0.8214416 0.4605835 0.6502975
0.01 2 0.8519980 0.4899934 0.6802222
0.01 3 0.7658550 0.5509672 0.6097577
0.01 4 0.6848897 0.6033155 0.5528957
0.01 5 0.6427444 0.6525235 0.5283225
0.01 6 0.6187687 0.6875172 0.4976836
0.01 7 NaN NaN NaN
0.01 8 NaN NaN NaN
0.01 9 NaN NaN NaN
0.01 10 NaN NaN NaN
0.10 1 0.7924727 0.5240487 0.6425097
0.10 2 0.7482861 0.5492359 0.6141846
0.10 3 0.6252639 0.6736587 0.5089934
0.10 4 0.6694793 0.6411280 0.5401708
0.10 5 0.6091498 0.6966902 0.4871149
0.10 6 0.5980989 0.6927344 0.4833565
0.10 7 NaN NaN NaN
0.10 8 NaN NaN NaN
0.10 9 NaN NaN NaN
0.10 10 NaN NaN NaN
Tuning parameter 'bag' was held constant at a value of FALSE
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were size = 6, decay = 0.1 and bag = FALSE.
# Predict
nnetPred_chem <- predict(nnetTune_chem, newdata = chemical_test[, !names(chemical_test) %in% "Yield"])
- Which nonlinear regression model gives the optimal resampling and test set performance?
data.frame(
Model = c("Neural Network", "MARS", "SVM"),
rbind(
postResample(pred = nnetPred_chem, obs = chemical_test$Yield),
postResample(pred = marsPred_chem, obs = chemical_test$Yield),
postResample(pred = svmRPred_chem, obs = chemical_test$Yield)
)
) |>
arrange(RMSE)
Model RMSE Rsquared MAE
1 SVM 0.5523503 0.6305348 0.4160317
2 Neural Network 0.6923444 0.4789435 0.5398615
3 MARS 0.7352410 0.3991137 0.6000865
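The SVM model gives the best test set performance, with the lowest RMSE, the highest Rsquared, and the lowest MAE. Resampling tells a slightly different story: the best cross-validated RMSE belongs to the neural network (0.598, size = 6, decay = 0.1), followed by MARS (0.604, nprune = 11, degree = 1) and SVM (0.621, C = 16). Because the cross-validated differences are small and SVM leads clearly on the test set, I treat SVM as the optimal nonlinear model.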
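The split between resampling and test results can be read directly off the numbers printed above. The sketch below copies the best cross-validated RMSE for each model and the test-set RMSE into named vectors; in a live session, caret's getTrainPerf() on each fitted model should return the resampled values without copying them by hand.

```r
# Best cross-validated RMSE per model, copied from the tuning output above
cv_rmse <- c("Neural Network" = 0.5980989, MARS = 0.6040794, SVM = 0.6206753)
# Test-set RMSE per model, copied from the postResample() table above
test_rmse <- c(SVM = 0.5523503, "Neural Network" = 0.6923444, MARS = 0.7352410)

names(which.min(cv_rmse))    # "Neural Network"
names(which.min(test_rmse))  # "SVM"

# Change from resampled to test RMSE (negative means better on the test set)
round(test_rmse - cv_rmse[names(test_rmse)], 3)
```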
- Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?
importance <- varImp(svmRTuned_chem, scale = FALSE)$importance |> arrange(-Overall)
importance |> head(10)
Overall
ManufacturingProcess13 0.3926158
BiologicalMaterial06 0.3758590
ManufacturingProcess32 0.3480612
ManufacturingProcess17 0.3181553
BiologicalMaterial12 0.3013207
ManufacturingProcess36 0.2979227
BiologicalMaterial03 0.2919228
ManufacturingProcess09 0.2891698
BiologicalMaterial02 0.2706977
ManufacturingProcess31 0.2655716
most_important_vars <- importance |>
rownames() |>
head(10) |>
paste(collapse = ", ")
The 10 most important predictors in the optimal nonlinear model are, in order of importance: ManufacturingProcess13, BiologicalMaterial06, ManufacturingProcess32, ManufacturingProcess17, BiologicalMaterial12, ManufacturingProcess36, BiologicalMaterial03, ManufacturingProcess09, BiologicalMaterial02, ManufacturingProcess31. Process variables lead, but not by much: six of the ten are manufacturing process variables and four are biological materials. The optimal partial least squares linear model from Exercise 6.3 (published on RPubs) yielded the following predictors in order of importance:
They are all manufacturing process variables, and most of them also appear in the nonlinear model's top ten.
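The six-to-four split between process and biological variables in the nonlinear top ten can be checked by stripping the trailing digits from the predictor names. Here top10 is typed in by hand from the importance output; in the session it could come from rownames(importance) instead.

```r
# Top ten SVM predictors, copied from the varImp() output above
top10 <- c("ManufacturingProcess13", "BiologicalMaterial06", "ManufacturingProcess32",
           "ManufacturingProcess17", "BiologicalMaterial12", "ManufacturingProcess36",
           "BiologicalMaterial03", "ManufacturingProcess09", "BiologicalMaterial02",
           "ManufacturingProcess31")

# Drop the trailing number to get the predictor family, then count each family
process_counts <- table(sub("[0-9]+$", "", top10))
process_counts  # BiologicalMaterial: 4, ManufacturingProcess: 6
```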
- Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?
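# Quote the numbers so leading zeros are kept ("09", not "9") and names match the columns
linear_predictors <- paste0("ManufacturingProcess", c("32", "36", "13", "17", "09", "06"))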
top_predictors_data <- ChemicalManufacturingProcess[, c(importance |>
rownames() |>
head(10) |>
setdiff(linear_predictors), "Yield")]
# Create correlation plot
ggpairs(top_predictors_data) +
theme_classic()
I used ggpairs to plot the top predictors that are unique to the nonlinear model against the response: the bottom row shows each predictor plotted against yield, and the rightmost column gives each predictor's correlation with yield. These plots suggest that several of the important biological predictors have nonlinear relationships with yield, and that some of the important manufacturing process predictors may as well.