Homework 8

Author

Naomi Buell

library(tidyverse)
library(AppliedPredictiveModeling)
library(skimr)
library(caret)
library(GGally)
library(mlbench)
set.seed(200)

7.2

Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data: \(y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)\), where the x values are random variables uniformly distributed on [0, 1] (there are also 5 other non-informative variables created in the simulation). The package mlbench contains a function called mlbench.friedman1 that simulates these data:

trainingData <- mlbench.friedman1(200, sd = 1)

## We convert the 'x' data from a matrix to a data frame
## One reason is that this will give the columns names.
trainingData$x <- data.frame(trainingData$x)

## Look at the data using
featurePlot(trainingData$x, trainingData$y)

## or other methods.

## This creates a list with a vector 'y' and a matrix
## of predictors 'x'. Also simulate a large test set to
## estimate the true error rate with good precision:
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)
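To make the Friedman formula concrete, here is a minimal base-R sketch of its noise-free mean and a toy simulator. friedman1_mean and simulate_friedman1 are hypothetical helpers written for illustration. They assume 10 Uniform(0, 1) predictors, of which only the first five enter the formula. They are not the mlbench implementation.

```r
# Noise-free part of the Friedman #1 function (hypothetical helper).
# Only columns 1-5 of x enter the formula; any further columns are pure noise.
friedman1_mean <- function(x) {
    x <- as.matrix(x)
    10 * sin(pi * x[, 1] * x[, 2]) +
        20 * (x[, 3] - 0.5)^2 +
        10 * x[, 4] +
        5 * x[, 5]
}

# Toy simulator (hypothetical): n rows of 10 Uniform(0, 1) predictors
# plus N(0, sd^2) noise added to the mean.
simulate_friedman1 <- function(n, sd = 1) {
    x <- matrix(runif(n * 10), nrow = n, ncol = 10)
    colnames(x) <- paste0("X", 1:10)
    y <- friedman1_mean(x) + rnorm(n, mean = 0, sd = sd)
    list(x = as.data.frame(x), y = y)
}

set.seed(1)
sim <- simulate_friedman1(200, sd = 1)
friedman1_mean(matrix(0.5, nrow = 1, ncol = 10))  # 10 * sin(pi / 4) + 0 + 5 + 2.5 = 14.57107
```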

Tune several models on these data. For example:

knnModel <- train(
    x = trainingData$x,
    y = trainingData$y,
    method = "knn",
    preProc = c("center", "scale"),
    tuneLength = 10
)
knnModel

k-Nearest Neighbors 

200 samples
 10 predictor

Pre-processing: centered (10), scaled (10) 
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
Resampling results across tuning parameters:

  k   RMSE      Rsquared   MAE     
   5  3.466085  0.5121775  2.816838
   7  3.349428  0.5452823  2.727410
   9  3.264276  0.5785990  2.660026
  11  3.214216  0.6024244  2.603767
  13  3.196510  0.6176570  2.591935
  15  3.184173  0.6305506  2.577482
  17  3.183130  0.6425367  2.567787
  19  3.198752  0.6483184  2.592683
  21  3.188993  0.6611428  2.588787
  23  3.200458  0.6638353  2.604529

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 17.

knnPred <- predict(knnModel, newdata = testData$x)
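The following sketch illustrates what method = "knn" does with centering and scaling. It is a minimal base-R version of k-nearest-neighbor regression. It standardizes the predictors with the training means and standard deviations, then predicts each new point as the average response of its k closest training points under Euclidean distance. knn_regress is a hypothetical helper for illustration, not caret's own code.

```r
# Hypothetical k-NN regression helper (illustration only).
knn_regress <- function(train_x, train_y, new_x, k = 5) {
    train_x <- as.matrix(train_x)
    new_x <- as.matrix(new_x)
    # Standardize both sets with the training statistics
    mu <- colMeans(train_x)
    s <- apply(train_x, 2, sd)
    tr <- scale(train_x, center = mu, scale = s)
    te <- scale(new_x, center = mu, scale = s)
    apply(te, 1, function(point) {
        # Euclidean distance from this new point to every training row
        d <- sqrt(colSums((t(tr) - point)^2))
        # Average the responses of the k nearest training rows
        mean(train_y[order(d)[seq_len(k)]])
    })
}

# One predictor with y = 2x: the 3 nearest neighbors of 5.2 are 5, 6 and 4
knn_regress(train_x = 1:10, train_y = 2 * (1:10), new_x = 5.2, k = 3)  # 10
```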

In addition to the KNN model in the example above, I tune neural network, multivariate adaptive regression splines (MARS), and support vector machine (SVM) models below.

## No pair of predictors has an absolute correlation above the 0.75 cutoff
findCorrelation(cor(trainingData$x), cutoff = .75)

integer(0)
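The sketch below is a simplified base-R illustration of what findCorrelation does with cutoff = .75. It is a greedy filter. While some pair of remaining predictors has an absolute correlation above the cutoff, it drops the member of the most correlated pair that has the larger mean absolute correlation with the other remaining predictors. greedy_cor_filter is a hypothetical helper. caret's algorithm follows the same idea but may differ in its details.

```r
# Hypothetical simplified correlation filter (illustration, not caret's exact code).
# Returns the indices of columns to drop so that no remaining pair has |r| > cutoff.
greedy_cor_filter <- function(cor_mat, cutoff = 0.75) {
    a <- abs(cor_mat)
    diag(a) <- 0
    keep <- seq_len(ncol(a))
    dropped <- integer(0)
    repeat {
        sub <- a[keep, keep, drop = FALSE]
        if (max(sub) <= cutoff) break
        # Locate the most strongly correlated remaining pair
        pair <- which(sub == max(sub), arr.ind = TRUE)[1, ]
        i <- keep[pair[1]]
        j <- keep[pair[2]]
        # Drop whichever member has the larger mean |r| with the other remaining columns
        mean_i <- mean(a[i, setdiff(keep, i)])
        mean_j <- mean(a[j, setdiff(keep, j)])
        victim <- if (mean_i >= mean_j) i else j
        dropped <- c(dropped, victim)
        keep <- setdiff(keep, victim)
    }
    sort(dropped)
}

set.seed(1)
z <- rnorm(100)
m <- cbind(a = z, b = z + rnorm(100, sd = 0.1), c = rnorm(100))
greedy_cor_filter(cor(m), cutoff = 0.75)  # drops one of columns 1 ("a") or 2 ("b")
```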

nnetGrid <- expand.grid(
    .decay = c(0, 0.01, .1),
    .size = c(1:10),
    .bag = FALSE
)

# Use 10-fold cross-validation to tune a model-averaged neural network
ctrl <- trainControl(method = "cv", number = 10)

nnetTune <- train(
    x = trainingData$x,
    y = trainingData$y,
    method = "avNNet",
    tuneGrid = nnetGrid,
    trControl = ctrl,
    preProc = c("center", "scale"),
    linout = TRUE,
    trace = FALSE,
    MaxNWts = 10 * (ncol(trainingData$x) + 1) + 10 + 1,
    maxit = 500
)
nnetTune

Model Averaged Neural Network 

200 samples
 10 predictor

Pre-processing: centered (10), scaled (10) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
Resampling results across tuning parameters:

  decay  size  RMSE      Rsquared   MAE     
  0.00    1    2.409642  0.7699766  1.901164
  0.00    2    2.498321  0.7552210  1.997312
  0.00    3    2.039894  0.8418315  1.608763
  0.00    4    1.910037  0.8571933  1.536982
  0.00    5    2.079055  0.8302563  1.600243
  0.00    6    2.948820  0.7041657  2.086060
  0.00    7    3.476673  0.6454372  2.477829
  0.00    8    4.337363  0.5616218  2.837736
  0.00    9    4.121967  0.5169400  2.745532
  0.00   10    3.775717  0.6474748  2.544171
  0.01    1    2.437185  0.7689840  1.934964
  0.01    2    2.510986  0.7596193  1.988259
  0.01    3    2.000010  0.8419513  1.555801
  0.01    4    2.003357  0.8445290  1.549721
  0.01    5    2.094085  0.8310163  1.666573
  0.01    6    2.303160  0.8013569  1.848981
  0.01    7    2.350215  0.8048656  1.877390
  0.01    8    2.276100  0.8009925  1.823380
  0.01    9    2.255870  0.8137568  1.772540
  0.01   10    2.409138  0.7766479  1.970988
  0.10    1    2.450906  0.7652288  1.942962
  0.10    2    2.489401  0.7606440  1.997059
  0.10    3    2.200694  0.8155493  1.786601
  0.10    4    2.059323  0.8432341  1.651719
  0.10    5    2.173964  0.8178289  1.717782
  0.10    6    2.230339  0.8096536  1.765530
  0.10    7    2.241135  0.8162395  1.823907
  0.10    8    2.321584  0.8001632  1.803450
  0.10    9    2.280454  0.7932034  1.852915
  0.10   10    2.219990  0.8131718  1.771699

Tuning parameter 'bag' was held constant at a value of FALSE
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were size = 4, decay = 0 and bag = FALSE.

nnetPred <- predict(nnetTune, newdata = testData$x)
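The MaxNWts value above is the number of weights in the largest network in the tuning grid. This assumes a single hidden layer and one linear output unit, which is nnet's architecture without skip-layer connections. Each of the size hidden units has one weight per predictor plus a bias, and the output unit has one weight per hidden unit plus a bias. n_weights is a hypothetical helper written only to check the arithmetic.

```r
# Weights in a single-hidden-layer network with p inputs, `size` hidden units
# and one linear output: size * (p + 1) hidden-layer weights plus size + 1 output weights.
n_weights <- function(p, size) size * (p + 1) + size + 1

n_weights(p = 10, size = 10)  # 121, the same as 10 * (10 + 1) + 10 + 1
n_weights(p = 10, size = 1:10)
```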

marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38)

marsTuned <- train(
    x = trainingData$x,
    y = trainingData$y,
    method = "earth",
    tuneGrid = marsGrid,
    trControl = trainControl(method = "cv")
)
marsTuned

Multivariate Adaptive Regression Spline 

200 samples
 10 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
Resampling results across tuning parameters:

  degree  nprune  RMSE      Rsquared   MAE     
  1        2      4.334325  0.2599883  3.607719
  1        3      3.599334  0.4805557  2.888987
  1        4      2.637145  0.7290848  2.087677
  1        5      2.283872  0.7939684  1.817343
  1        6      2.125875  0.8183677  1.647491
  1        7      1.766013  0.8733619  1.410328
  1        8      1.671282  0.8842102  1.324258
  1        9      1.645406  0.8867947  1.322041
  1       10      1.597968  0.8926582  1.297518
  1       11      1.540109  0.8996361  1.237949
  1       12      1.545349  0.8992979  1.243771
  1       13      1.535169  0.9010122  1.233571
  1       14      1.529405  0.9018457  1.223874
  1       15      1.529405  0.9018457  1.223874
  1       16      1.529405  0.9018457  1.223874
  1       17      1.529405  0.9018457  1.223874
  1       18      1.529405  0.9018457  1.223874
  1       19      1.529405  0.9018457  1.223874
  1       20      1.529405  0.9018457  1.223874
  1       21      1.529405  0.9018457  1.223874
  1       22      1.529405  0.9018457  1.223874
  1       23      1.529405  0.9018457  1.223874
  1       24      1.529405  0.9018457  1.223874
  1       25      1.529405  0.9018457  1.223874
  1       26      1.529405  0.9018457  1.223874
  1       27      1.529405  0.9018457  1.223874
  1       28      1.529405  0.9018457  1.223874
  1       29      1.529405  0.9018457  1.223874
  1       30      1.529405  0.9018457  1.223874
  1       31      1.529405  0.9018457  1.223874
  1       32      1.529405  0.9018457  1.223874
  1       33      1.529405  0.9018457  1.223874
  1       34      1.529405  0.9018457  1.223874
  1       35      1.529405  0.9018457  1.223874
  1       36      1.529405  0.9018457  1.223874
  1       37      1.529405  0.9018457  1.223874
  1       38      1.529405  0.9018457  1.223874
  2        2      4.334325  0.2599883  3.607719
  2        3      3.599334  0.4805557  2.888987
  2        4      2.637145  0.7290848  2.087677
  2        5      2.271844  0.7927888  1.823675
  2        6      2.114868  0.8200184  1.659485
  2        7      1.780140  0.8733216  1.429346
  2        8      1.663164  0.8891928  1.294968
  2        9      1.460976  0.9122520  1.180387
  2       10      1.399692  0.9175376  1.122526
  2       11      1.380002  0.9216251  1.110556
  2       12      1.312883  0.9284253  1.063321
  2       13      1.285612  0.9343029  1.014216
  2       14      1.328520  0.9286650  1.052185
  2       15      1.322954  0.9298515  1.045527
  2       16      1.341454  0.9283961  1.053190
  2       17      1.344590  0.9280972  1.054209
  2       18      1.340821  0.9285264  1.050274
  2       19      1.340821  0.9285264  1.050274
  2       20      1.340821  0.9285264  1.050274
  2       21      1.340821  0.9285264  1.050274
  2       22      1.340821  0.9285264  1.050274
  2       23      1.340821  0.9285264  1.050274
  2       24      1.340821  0.9285264  1.050274
  2       25      1.340821  0.9285264  1.050274
  2       26      1.340821  0.9285264  1.050274
  2       27      1.340821  0.9285264  1.050274
  2       28      1.340821  0.9285264  1.050274
  2       29      1.340821  0.9285264  1.050274
  2       30      1.340821  0.9285264  1.050274
  2       31      1.340821  0.9285264  1.050274
  2       32      1.340821  0.9285264  1.050274
  2       33      1.340821  0.9285264  1.050274
  2       34      1.340821  0.9285264  1.050274
  2       35      1.340821  0.9285264  1.050274
  2       36      1.340821  0.9285264  1.050274
  2       37      1.340821  0.9285264  1.050274
  2       38      1.340821  0.9285264  1.050274

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nprune = 13 and degree = 2.

marsPred <- predict(marsTuned, newdata = testData$x)

svmRTuned <- train(
    x = trainingData$x,
    y = trainingData$y,
    method = "svmRadial",
    preProc = c("center", "scale"),
    tuneLength = 14,
    trControl = trainControl(method = "cv")
)
svmRTuned

Support Vector Machines with Radial Basis Function Kernel 

200 samples
 10 predictor

Pre-processing: centered (10), scaled (10) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
Resampling results across tuning parameters:

  C        RMSE      Rsquared   MAE     
     0.25  2.504105  0.7940789  1.987142
     0.50  2.219946  0.8148914  1.750249
     1.00  2.028115  0.8388693  1.590383
     2.00  1.899331  0.8561464  1.486326
     4.00  1.815659  0.8669649  1.424259
     8.00  1.798336  0.8702845  1.427729
    16.00  1.797151  0.8702727  1.431233
    32.00  1.795246  0.8705185  1.429239
    64.00  1.795246  0.8705185  1.429239
   128.00  1.795246  0.8705185  1.429239
   256.00  1.795246  0.8705185  1.429239
   512.00  1.795246  0.8705185  1.429239
  1024.00  1.795246  0.8705185  1.429239
  2048.00  1.795246  0.8705185  1.429239

Tuning parameter 'sigma' was held constant at a value of 0.06104815
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.06104815 and C = 32.

svmPred <- predict(svmRTuned, newdata = testData$x)
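The sigma reported above is the width parameter of the radial basis function kernel. The sketch below assumes kernlab's parameterization, k(x, x') = exp(-sigma * ||x - x'||^2), and computes the kernel matrix in base R. rbf_kernel is a hypothetical helper. Under caret's default grid for svmRadial, sigma is estimated once from the training data, and only the cost C is varied. That is why the output reports sigma as held constant.

```r
# Radial basis function kernel, parameterized as exp(-sigma * squared Euclidean distance)
rbf_kernel <- function(X, Y = X, sigma) {
    X <- as.matrix(X)
    Y <- as.matrix(Y)
    # Squared distances via ||x||^2 + ||y||^2 - 2 x.y
    sq <- outer(rowSums(X^2), rowSums(Y^2), "+") - 2 * X %*% t(Y)
    # pmax guards against tiny negative values from rounding
    exp(-sigma * pmax(sq, 0))
}

X <- rbind(c(0, 0), c(1, 1), c(3, 0))
K <- rbf_kernel(X, sigma = 0.5)
round(K, 4)  # diagonal is 1; K[1, 2] = exp(-0.5 * 2) = 0.3679
```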

Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?

data.frame(
    Model = c("KNN", "Neural Network", "MARS", "SVM"),
    rbind(
        postResample(pred = knnPred, obs = testData$y),
        postResample(pred = nnetPred, obs = testData$y),
        postResample(pred = marsPred, obs = testData$y),
        postResample(pred = svmPred, obs = testData$y)
    )
) |>
    arrange(RMSE)

           Model     RMSE  Rsquared      MAE
1           MARS 1.280306 0.9335241 1.016867
2            SVM 2.069332 0.8263570 1.571883
3 Neural Network 2.498334 0.7838112 1.686304
4            KNN 3.204059 0.6819919 2.568346

On the test set, the MARS model performs best: it has the lowest RMSE, the highest Rsquared, and the lowest MAE. The SVM, the neural network, and KNN follow, in that order.
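The comparison table relies on postResample. For regression it reports RMSE, R-squared computed as the squared correlation between predictions and observations, and MAE. The base-R sketch below writes out those three formulas so the ranking criteria are explicit. regression_metrics is a hypothetical helper, not caret's code.

```r
# The three regression metrics reported by postResample, written out in base R.
regression_metrics <- function(pred, obs) {
    c(
        RMSE = sqrt(mean((pred - obs)^2)),  # root mean squared error
        Rsquared = cor(pred, obs)^2,        # squared Pearson correlation
        MAE = mean(abs(pred - obs))         # mean absolute error
    )
}

obs <- c(1, 2, 3, 4)
pred <- c(1.5, 2, 2.5, 4)
regression_metrics(pred, obs)  # RMSE = sqrt(0.125), Rsquared = 16 / 17.5, MAE = 0.25
```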

varImp(marsTuned)

earth variable importance

   Overall
X1  100.00
X4   75.33
X2   48.88
X5   15.63
X3    0.00

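Yes. The importance list contains only the informative predictors X1 through X5, and none of the non-informative predictors X6 to X10 appear. X3 has a scaled importance of 0.00, the lowest of the five. By default, varImp rescales importance to a 0 to 100 range, so the least important listed predictor always receives 0. This means X3 contributes the least to the fitted model.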
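MARS represents each predictor it uses through hinge functions, max(0, x - c) and max(0, c - x), for a knot c. A predictor that never enters a hinge term does not affect the model's predictions. The base-R sketch below fits a single hinge pair with a fixed knot at 0.5 by ordinary least squares. It shows only the building block. earth also searches over knots and predictors in a forward pass and prunes terms in a backward pass, which this sketch does not do. hinge_pos and hinge_neg are hypothetical helper names.

```r
# Hinge (truncated linear) basis functions used by MARS
hinge_pos <- function(x, knot) pmax(0, x - knot)
hinge_neg <- function(x, knot) pmax(0, knot - x)

set.seed(42)
x <- runif(200)
# Piecewise-linear target with a bend at 0.5, plus a little noise
y <- 2 + 3 * pmax(0, x - 0.5) - 1 * pmax(0, 0.5 - x) + rnorm(200, sd = 0.05)

# With the knot fixed at 0.5, the hinge pair is just two regressors for lm();
# the estimates should land close to the true values 2, 3 and -1
fit <- lm(y ~ hinge_pos(x, 0.5) + hinge_neg(x, 0.5))
round(coef(fit), 2)
```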

7.5

Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.

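As in Exercise 6.3, I impute missing values with k-nearest neighbors (KNN), remove near-zero-variance predictors, and then split the data into training and test sets. knnImpute also centers and scales every column, including Yield. As a result, the RMSE and MAE values reported for this exercise are in standard-deviation units of yield.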

data(ChemicalManufacturingProcess)

# impute
impute <- preProcess(
    ChemicalManufacturingProcess,
    method = c("knnImpute")
)
impute

Created from 152 samples and 58 variables

Pre-processing:
  - centered (58)
  - ignored (0)
  - 5 nearest neighbor imputation (58)
  - scaled (58)

# predict
chemical_impute <- predict(
    impute,
    ChemicalManufacturingProcess
)

# remove nzv predictors
nzv <- nearZeroVar(chemical_impute)
filtered_chemical <- chemical_impute[, -nzv]
filtered_chemical |> ncol()

[1] 57
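The nearZeroVar call above flags predictors that are constant or nearly constant. The sketch below is a simplified base-R version of its default rule, assuming caret's defaults of freqCut = 95/5 and uniqueCut = 10. A predictor is flagged if it has a single unique value. It is also flagged if its most common value is more than 19 times as frequent as the second most common and its unique values make up at most 10% of the samples. nzv_flag is a hypothetical helper, and caret may handle edge cases such as missing values differently.

```r
# Hypothetical simplified near-zero-variance check for one numeric vector
nzv_flag <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
    x <- x[!is.na(x)]
    counts <- sort(table(x), decreasing = TRUE)
    if (length(counts) <= 1) return(TRUE)  # zero variance: a single unique value
    freq_ratio <- counts[[1]] / counts[[2]]   # most common vs. second most common
    pct_unique <- 100 * length(counts) / length(x)
    freq_ratio > freq_cut && pct_unique <= unique_cut
}

nzv_flag(c(rep(0, 98), 1, 2))  # TRUE: frequency ratio 98, 3% unique
nzv_flag(1:100)                # FALSE: every value is unique
nzv_flag(rep(5, 10))           # TRUE: constant
```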

# Split the data into a training and a test set
trainingRows <- createDataPartition(
    filtered_chemical$Yield,
    p = .80,
    list = FALSE
)
chemical_train <- filtered_chemical[trainingRows, ]
chemical_test <- filtered_chemical[-trainingRows, ]
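createDataPartition does not split purely at random. For a numeric outcome, it groups the values into quantile-based bins and samples within each bin, so the training and test sets have similar Yield distributions. The base-R sketch below approximates that idea, assuming quartile bins and ceiling(p * n) draws per bin. stratified_split is a hypothetical helper, not caret's code.

```r
# Hypothetical stratified split for a numeric outcome: bin by quantiles, sample within bins
stratified_split <- function(y, p = 0.8, groups = 5) {
    breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups)))
    bins <- cut(y, breaks, include.lowest = TRUE)
    idx <- unlist(lapply(split(seq_along(y), bins), function(i) {
        # sample.int avoids sample()'s special case for length-one vectors
        i[sample.int(length(i), ceiling(p * length(i)))]
    }), use.names = FALSE)
    sort(idx)
}

set.seed(1)
y <- rnorm(176)
idx <- stratified_split(y, p = 0.8)
length(idx)  # 144: four bins of 44 rows, 36 drawn from each
```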

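Next, I train MARS, SVM, and neural network nonlinear regression models. For the SVM and neural network, I also center and scale the predictors inside train() with preProc = c("center", "scale"), so this step is repeated within each cross-validation fold.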

# Train
marsTuned_chem <- train(
    chemical_train[, !names(chemical_train) %in% "Yield"],
    chemical_train$Yield,
    method = "earth",
    tuneGrid = marsGrid,
    trControl = trainControl(method = "cv")
)
marsTuned_chem

Multivariate Adaptive Regression Spline 

144 samples
 56 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 130, 130, 130, 129, 131, 128, ... 
Resampling results across tuning parameters:

  degree  nprune  RMSE       Rsquared   MAE      
  1        2      0.8229694  0.3776693  0.6400285
  1        3      0.6726292  0.5545466  0.5414789
  1        4      0.6249329  0.6256565  0.5019373
  1        5      0.6336078  0.6121999  0.5146433
  1        6      0.6533184  0.6006798  0.5312103
  1        7      0.6370705  0.6141122  0.5169774
  1        8      0.6362795  0.6201796  0.5094093
  1        9      0.6326781  0.6259690  0.5072746
  1       10      0.6235149  0.6439206  0.5017406
  1       11      0.6040794  0.6594559  0.4928326
  1       12      0.6267566  0.6393797  0.5069740
  1       13      0.6339930  0.6296114  0.5057972
  1       14      0.6360183  0.6307692  0.5023013
  1       15      0.6369935  0.6311722  0.5052953
  1       16      0.6369935  0.6311722  0.5052953
  1       17      0.6369935  0.6311722  0.5052953
  1       18      0.6369935  0.6311722  0.5052953
  1       19      0.6369935  0.6311722  0.5052953
  1       20      0.6369935  0.6311722  0.5052953
  1       21      0.6369935  0.6311722  0.5052953
  1       22      0.6369935  0.6311722  0.5052953
  1       23      0.6369935  0.6311722  0.5052953
  1       24      0.6369935  0.6311722  0.5052953
  1       25      0.6369935  0.6311722  0.5052953
  1       26      0.6369935  0.6311722  0.5052953
  1       27      0.6369935  0.6311722  0.5052953
  1       28      0.6369935  0.6311722  0.5052953
  1       29      0.6369935  0.6311722  0.5052953
  1       30      0.6369935  0.6311722  0.5052953
  1       31      0.6369935  0.6311722  0.5052953
  1       32      0.6369935  0.6311722  0.5052953
  1       33      0.6369935  0.6311722  0.5052953
  1       34      0.6369935  0.6311722  0.5052953
  1       35      0.6369935  0.6311722  0.5052953
  1       36      0.6369935  0.6311722  0.5052953
  1       37      0.6369935  0.6311722  0.5052953
  1       38      0.6369935  0.6311722  0.5052953
  2        2      0.8213749  0.3784062  0.6373551
  2        3      0.6838571  0.5403019  0.5527968
  2        4      0.6231565  0.6189251  0.5017234
  2        5      0.7274223  0.5308450  0.5734149
  2        6      0.7756026  0.5193310  0.5929477
  2        7      0.7843063  0.5096186  0.5890326
  2        8      0.7895617  0.4982613  0.5928996
  2        9      0.7660442  0.5195915  0.5680261
  2       10      0.7987959  0.4829875  0.5950913
  2       11      0.8629878  0.4594946  0.6238048
  2       12      0.8791300  0.4652098  0.6217018
  2       13      0.8926965  0.4711372  0.6277474
  2       14      0.8818186  0.4833008  0.6128424
  2       15      0.8740984  0.4819925  0.6048763
  2       16      0.8784880  0.4751070  0.6027940
  2       17      0.8546525  0.4863916  0.5952616
  2       18      0.9162123  0.4638653  0.6217777
  2       19      0.9346913  0.4567074  0.6347407
  2       20      0.9627305  0.4427165  0.6460796
  2       21      0.9627305  0.4427165  0.6460796
  2       22      0.9627305  0.4427165  0.6460796
  2       23      0.9627305  0.4427165  0.6460796
  2       24      0.9627305  0.4427165  0.6460796
  2       25      0.9627305  0.4427165  0.6460796
  2       26      0.9627305  0.4427165  0.6460796
  2       27      0.9627305  0.4427165  0.6460796
  2       28      0.9627305  0.4427165  0.6460796
  2       29      0.9627305  0.4427165  0.6460796
  2       30      0.9627305  0.4427165  0.6460796
  2       31      0.9627305  0.4427165  0.6460796
  2       32      0.9627305  0.4427165  0.6460796
  2       33      0.9627305  0.4427165  0.6460796
  2       34      0.9627305  0.4427165  0.6460796
  2       35      0.9627305  0.4427165  0.6460796
  2       36      0.9627305  0.4427165  0.6460796
  2       37      0.9627305  0.4427165  0.6460796
  2       38      0.9627305  0.4427165  0.6460796

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nprune = 11 and degree = 1.

# Predict
marsPred_chem <- predict(marsTuned_chem, newdata = chemical_test[, !names(chemical_test) %in% "Yield"])

# Train
svmRTuned_chem <- train(
    chemical_train[, !names(chemical_train) %in% "Yield"],
    chemical_train$Yield,
    method = "svmRadial",
    preProc = c("center", "scale"),
    tuneLength = 14,
    trControl = trainControl(method = "cv")
)
svmRTuned_chem

Support Vector Machines with Radial Basis Function Kernel 

144 samples
 56 predictor

Pre-processing: centered (56), scaled (56) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 130, 129, 129, 129, 129, 131, ... 
Resampling results across tuning parameters:

  C        RMSE       Rsquared   MAE      
     0.25  0.7448695  0.5233613  0.5938431
     0.50  0.6827689  0.5780817  0.5442406
     1.00  0.6414690  0.6312418  0.5125969
     2.00  0.6218519  0.6568918  0.4953282
     4.00  0.6216708  0.6581026  0.4917260
     8.00  0.6220222  0.6623731  0.4945375
    16.00  0.6206753  0.6652832  0.4944964
    32.00  0.6206753  0.6652832  0.4944964
    64.00  0.6206753  0.6652832  0.4944964
   128.00  0.6206753  0.6652832  0.4944964
   256.00  0.6206753  0.6652832  0.4944964
   512.00  0.6206753  0.6652832  0.4944964
  1024.00  0.6206753  0.6652832  0.4944964
  2048.00  0.6206753  0.6652832  0.4944964

Tuning parameter 'sigma' was held constant at a value of 0.01381503
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.01381503 and C = 16.

# Predict
svmRPred_chem <- predict(svmRTuned_chem, newdata = chemical_test[, !names(chemical_test) %in% "Yield"])

# Several predictors have pairwise absolute correlations above the 0.75 cutoff; remove the columns findCorrelation flags
tooHigh <- findCorrelation(cor(chemical_train[, !names(chemical_train) %in% "Yield"]), cutoff = .75)
trainXnnet <- chemical_train[, !names(chemical_train) %in% "Yield"][, -tooHigh]
testXnnet <- chemical_test[, !names(chemical_test) %in% "Yield"][, -tooHigh]

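# Train
# MaxNWts is computed from the reduced predictor set (trainXnnet), but the model is
# fit on all 56 predictors, where a network with size hidden units needs
# 58 * size + 1 weights; sizes whose weight count exceeds MaxNWts fail to fit
# and appear as NaN in the resampling results.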
nnetTune_chem <- train(
    chemical_train[, !names(chemical_train) %in% "Yield"],
    chemical_train$Yield,
    method = "avNNet",
    tuneGrid = nnetGrid,
    trControl = ctrl,
    preProc = c("center", "scale"),
    linout = TRUE,
    trace = FALSE,
    MaxNWts = 10 * (ncol(trainXnnet) + 1) + 10 + 1,
    maxit = 500
)
nnetTune_chem

Model Averaged Neural Network 

144 samples
 56 predictor

Pre-processing: centered (56), scaled (56) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 131, 130, 128, 129, 128, 130, ... 
Resampling results across tuning parameters:

  decay  size  RMSE       Rsquared   MAE      
  0.00    1    0.8053326  0.4824447  0.6491145
  0.00    2    0.7380611  0.5107145  0.5895829
  0.00    3    0.7796276  0.5462163  0.6153225
  0.00    4    0.7795760  0.4864407  0.6708643
  0.00    5    0.7912621  0.4787829  0.6364726
  0.00    6    0.7393453  0.5416763  0.5919869
  0.00    7          NaN        NaN        NaN
  0.00    8          NaN        NaN        NaN
  0.00    9          NaN        NaN        NaN
  0.00   10          NaN        NaN        NaN
  0.01    1    0.8214416  0.4605835  0.6502975
  0.01    2    0.8519980  0.4899934  0.6802222
  0.01    3    0.7658550  0.5509672  0.6097577
  0.01    4    0.6848897  0.6033155  0.5528957
  0.01    5    0.6427444  0.6525235  0.5283225
  0.01    6    0.6187687  0.6875172  0.4976836
  0.01    7          NaN        NaN        NaN
  0.01    8          NaN        NaN        NaN
  0.01    9          NaN        NaN        NaN
  0.01   10          NaN        NaN        NaN
  0.10    1    0.7924727  0.5240487  0.6425097
  0.10    2    0.7482861  0.5492359  0.6141846
  0.10    3    0.6252639  0.6736587  0.5089934
  0.10    4    0.6694793  0.6411280  0.5401708
  0.10    5    0.6091498  0.6966902  0.4871149
  0.10    6    0.5980989  0.6927344  0.4833565
  0.10    7          NaN        NaN        NaN
  0.10    8          NaN        NaN        NaN
  0.10    9          NaN        NaN        NaN
  0.10   10          NaN        NaN        NaN

Tuning parameter 'bag' was held constant at a value of FALSE
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were size = 6, decay = 0.1 and bag = FALSE.
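The NaN rows for sizes 7 to 10 are consistent with nnet refusing to fit networks with more weights than MaxNWts allows. Assuming one hidden layer and one linear output, a network on all 56 predictors needs size * (56 + 1) + size + 1 weights. MaxNWts in the call above was computed from ncol(trainXnnet), which is not printed. The sketch below therefore takes that column count as an input (reduced_p) and reports which sizes fit. The value 35 used in the example is hypothetical. n_weights and allowed_sizes are also hypothetical helpers.

```r
n_weights <- function(p, size) size * (p + 1) + size + 1

# Hidden-layer sizes that fit when the limit uses the MaxNWts formula above with
# reduced_p columns, but the network actually receives p inputs
allowed_sizes <- function(p, reduced_p, sizes = 1:10) {
    max_wts <- 10 * (reduced_p + 1) + 10 + 1
    sizes[n_weights(p, sizes) <= max_wts]
}

n_weights(p = 56, size = 6:7)          # 349 407
allowed_sizes(p = 56, reduced_p = 35)  # 1 2 3 4 5 6 (hypothetical reduced_p)
```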

# Predict
nnetPred_chem <- predict(nnetTune_chem, newdata = chemical_test[, !names(chemical_test) %in% "Yield"])

Which nonlinear regression model gives the optimal resampling and test set performance?

data.frame(
    Model = c("Neural Network", "MARS", "SVM"),
    rbind(
        postResample(pred = nnetPred_chem, obs = chemical_test$Yield),
        postResample(pred = marsPred_chem, obs = chemical_test$Yield),
        postResample(pred = svmRPred_chem, obs = chemical_test$Yield)
    )
) |>
    arrange(RMSE)

           Model      RMSE  Rsquared       MAE
1            SVM 0.5523503 0.6305348 0.4160317
2 Neural Network 0.6923444 0.4789435 0.5398615
3           MARS 0.7352410 0.3991137 0.6000865

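On the test set, the SVM model performs best, with the lowest RMSE, highest Rsquared, and lowest MAE. Resampling ranks the models differently. The cross-validated RMSE of the selected neural network (size = 6, decay = 0.1) is 0.598, compared with 0.604 for MARS (nprune = 11, degree = 1) and 0.621 for the SVM (C = 16). The neural network therefore gives the best resampling performance, although the differences are small, while the SVM gives the best test set performance.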

Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?

importance <- varImp(svmRTuned_chem, scale = FALSE)$importance |> arrange(-Overall)
importance |> head(10)

                         Overall
ManufacturingProcess13 0.3926158
BiologicalMaterial06   0.3758590
ManufacturingProcess32 0.3480612
ManufacturingProcess17 0.3181553
BiologicalMaterial12   0.3013207
ManufacturingProcess36 0.2979227
BiologicalMaterial03   0.2919228
ManufacturingProcess09 0.2891698
BiologicalMaterial02   0.2706977
ManufacturingProcess31 0.2655716

most_important_vars <- importance |>
    rownames() |>
    head(10) |>
    paste(collapse = ", ")
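The split between process and biological variables can be tallied directly from the predictor names. The sketch below hard-codes the ten names printed above and classifies them by prefix. In the full analysis, head(rownames(importance), 10) would supply the names instead.

```r
top10 <- c(
    "ManufacturingProcess13", "BiologicalMaterial06", "ManufacturingProcess32",
    "ManufacturingProcess17", "BiologicalMaterial12", "ManufacturingProcess36",
    "BiologicalMaterial03", "ManufacturingProcess09", "BiologicalMaterial02",
    "ManufacturingProcess31"
)
# Classify each predictor by its name prefix and count the groups
group <- ifelse(startsWith(top10, "Biological"), "Biological", "Process")
table(group)  # Biological 4, Process 6
```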

The 10 most important predictors in the optimal nonlinear model are, in order of importance: ManufacturingProcess13, BiologicalMaterial06, ManufacturingProcess32, ManufacturingProcess17, BiologicalMaterial12, ManufacturingProcess36, BiologicalMaterial03, ManufacturingProcess09, BiologicalMaterial02, and ManufacturingProcess31. Process variables dominate the list, but only slightly: six of the ten are process variables and four are biological. The optimal partial least squares (PLS) linear model from Exercise 6.3 (published on RPubs) yielded the following predictors in order of importance:

[Figure: most important predictors in the partial least squares (linear) model; all are manufacturing process variables.]

They are all manufacturing process variables, and most of them also appear among the top ten nonlinear predictors.

Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?

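# Top predictors from the linear (PLS) model; %02d zero-pads, e.g. "ManufacturingProcess09"
linear_predictors <- sprintf("ManufacturingProcess%02d", c(32, 36, 13, 17, 9, 6))
# Top ten nonlinear predictors not in the linear list, plus the response
unique_nonlinear <- importance |>
    rownames() |>
    head(10) |>
    setdiff(linear_predictors)
top_predictors_data <- ChemicalManufacturingProcess[, c(unique_nonlinear, "Yield")]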

# Scatterplot matrix: pairwise scatterplots below the diagonal, correlations above
ggpairs(top_predictors_data) +
    theme_classic()

I used ggpairs to plot the top predictors unique to the nonlinear model against one another and against yield. The bottom row plots each predictor against yield, and the rightmost column gives each predictor's correlation with yield. These plots suggest that several of the important biological predictors have nonlinear relationships with yield, and the important process predictors among them may as well.
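Predictor names in this data set zero-pad the process number (for example, ManufacturingProcess09). In R, a numeric literal such as 09 is simply 9, so paste0() produces an unpadded name that will not match the column. sprintf() with %02d keeps the padding. The sketch below hard-codes the ten nonlinear predictor names printed earlier and the six linear-model process numbers used in the code above. It then finds the predictors unique to the nonlinear model.

```r
# Numeric literals lose leading zeros, so pasting numbers gives unpadded names
paste0("ManufacturingProcess", c(9, 6))  # "ManufacturingProcess9" "ManufacturingProcess6"
# sprintf() with %02d pads to two digits, matching the column names
linear <- sprintf("ManufacturingProcess%02d", c(32, 36, 13, 17, 9, 6))

top10 <- c(
    "ManufacturingProcess13", "BiologicalMaterial06", "ManufacturingProcess32",
    "ManufacturingProcess17", "BiologicalMaterial12", "ManufacturingProcess36",
    "BiologicalMaterial03", "ManufacturingProcess09", "BiologicalMaterial02",
    "ManufacturingProcess31"
)
# Predictors in the nonlinear top ten that are absent from the linear list
setdiff(top10, linear)
```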