Packages:

library(caret)
library(tidyverse)
library(RColorBrewer)
library(knitr)
library(cowplot)
library(mlbench)
library(earth)
library(kernlab)
library(rminer)
library(AppliedPredictiveModeling)

Exercise 7.2:

Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:

\(y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10 x_4 + 5 x_5 + N(0, \sigma^2)\)

where the \(x\) values are random variables uniformly distributed on \([0, 1]\) (the simulation also creates five additional non-informative predictors). The mlbench package contains a function called mlbench.friedman1 that simulates these data:

set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
# We convert the 'x' data from a matrix to a data frame
# One reason is that this will give the columns names.
trainingData$x <- data.frame(trainingData$x)
# Look at the data using
featurePlot(trainingData$x, trainingData$y)

# or other methods.
# This creates a list with a vector 'y' and a matrix
# of predictors 'x'. Also simulate a large test set to
# estimate the true error rate with good precision:
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)

Tune several models on these data. For example:

knnModel <- train(x = trainingData$x,
                  y = trainingData$y,
                  method = "knn",
                  preProc = c("center", "scale"),
                  tuneLength = 10)
knnModel
## k-Nearest Neighbors 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    5  3.466085  0.5121775  2.816838
##    7  3.349428  0.5452823  2.727410
##    9  3.264276  0.5785990  2.660026
##   11  3.214216  0.6024244  2.603767
##   13  3.196510  0.6176570  2.591935
##   15  3.184173  0.6305506  2.577482
##   17  3.183130  0.6425367  2.567787
##   19  3.198752  0.6483184  2.592683
##   21  3.188993  0.6611428  2.588787
##   23  3.200458  0.6638353  2.604529
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 17.

A KNN model with \(k = 17\) is the final model selected in the example provided above.
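
As a quick sanity check on test-set performance, caret's postResample computes RMSE, \(R^2\), and MAE in one call (note that its \(R^2\) is the squared correlation between predictions and observations, which can differ slightly from the traditional \(R^2\) we compute below):

# Test-set RMSE, R-squared, and MAE for the tuned KNN model
knnPred <- predict(knnModel, newdata = testData$x)
postResample(pred = knnPred, obs = testData$y)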

Which models appear to give the best performance? Does MARS select the informative predictors (those named \(X1\) to \(X5\))?

To answer this question, we first examine which features the final KNN model from the example above deems most important.

We can estimate feature importance in this KNN model by calculating permutation-based importance. We do so by shuffling the values in a single feature column at a time, making new predictions using the shuffled dataset, and comparing them to previous predictions. We calculate how much the new predictions suffered from that single feature column being randomly reordered, and this measure of deterioration estimates the importance of that single feature. We repeat the process, starting with the original unshuffled dataset each time, until we’ve measured permutation-based importance for all the feature columns in the dataset.

Note that we will use predictive \(R^2\) to measure the deterioration, and since higher values of predictive \(R^2\) indicate superior performance, we need to subtract the predictive \(R^2\) for the shuffled test data from the predictive \(R^2\) for the original test data. Only then do higher deterioration values correspond to greater feature importance. (If we were instead using \(RMSE\), for which lower values indicate superior performance, we would take the difference in the opposite direction.)
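
For instance, a minimal sketch of both sign conventions (rsq_orig, rsq_shuffled, rmse_orig, and rmse_shuffled are hypothetical values computed with caret's R2 and RMSE helpers):

# Higher importance = more deterioration, whichever metric is used
importance_rsq  <- rsq_orig - rsq_shuffled    # R^2 falls when a useful column is shuffled
importance_rmse <- rmse_shuffled - rmse_orig  # RMSE rises when a useful column is shuffled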

# Baseline predictive R^2 on the unshuffled test set
orig_test_pred <- predict(knnModel, testData$x)
orig_pred_rsq <- as.numeric(R2(orig_test_pred, testData$y, form = "traditional"))
features <- colnames(testData$x)
feature_importance <- rep(0, length(features))
names(feature_importance) <- features
for (f in seq_along(features)){
    # Shuffle one feature column, re-predict, and record the R^2 drop
    test_x_shuffled <- testData$x
    rows <- sample(nrow(test_x_shuffled))
    test_x_shuffled[, f] <- test_x_shuffled[rows, f]
    new_test_pred <- predict(knnModel, test_x_shuffled)
    new_pred_rsq <- as.numeric(R2(new_test_pred, testData$y,
                                  form = "traditional"))
    feature_importance[f] <- orig_pred_rsq - new_pred_rsq
}
feature_importance <- sort(feature_importance, decreasing = TRUE) |>
    as.data.frame() |>
    rownames_to_column()
cols <- c("Feature", "Importance")
colnames(feature_importance) <- cols
knitr::kable(feature_importance, format = "simple")
Feature     Importance
--------  ------------
X4           0.3045902
X1           0.2190060
X2           0.1906078
X5           0.1063300
X3           0.0118921
X6           0.0054830
X7           0.0020844
X9           0.0007926
X10          0.0004151
X8          -0.0026107

The five most important predictors in the KNN model example provided above are X1 to X5, which is promising. We will proceed with training our own models below and determining which features each model deems (most) important. Then we’ll compare performance across all models.

First, we train a MARS model.

marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:10)
marsTuned <- train(trainingData$x, trainingData$y,
                   method = "earth",
                   # Explicitly declare the candidate models to test
                   tuneGrid = marsGrid,
                   trControl = trainControl(method = "cv"))
marsTuned
## Multivariate Adaptive Regression Spline 
## 
## 200 samples
##  10 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE      Rsquared   MAE     
##   1        2      4.387823  0.2450088  3.588646
##   1        3      3.655998  0.4632262  2.969890
##   1        4      2.588450  0.7255923  2.063827
##   1        5      2.363975  0.7687900  1.914621
##   1        6      2.297197  0.7869612  1.845519
##   1        7      1.818341  0.8632669  1.475048
##   1        8      1.795877  0.8680170  1.381346
##   1        9      1.710708  0.8813665  1.317972
##   1       10      1.683901  0.8837017  1.305783
##   2        2      4.387823  0.2450088  3.588646
##   2        3      3.743159  0.4397324  3.062032
##   2        4      2.591818  0.7244038  2.065765
##   2        5      2.274929  0.7869242  1.862727
##   2        6      2.327693  0.7779589  1.883137
##   2        7      1.859697  0.8612550  1.492758
##   2        8      1.800609  0.8668918  1.389252
##   2        9      1.600103  0.8887390  1.229885
##   2       10      1.516087  0.8991938  1.178427
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 10 and degree = 2.

A MARS model with \(nprune = 10\) and \(degree = 2\) is the final MARS model selected during training.

mars_feature_importance <- varImp(marsTuned, value = "gcv")
mars_feature_importance <- mars_feature_importance$importance |>
    arrange(desc(Overall)) |>
    rownames_to_column()
colnames(mars_feature_importance) <- cols
knitr::kable(mars_feature_importance, format = "simple")
Feature    Importance
--------  -----------
X1          100.00000
X4           75.57379
X2           49.29161
X5           15.94173
X3            0.00000

Only X1 to X5 appear in the final MARS model's importance table (the non-informative X6 to X10 are dropped entirely), so MARS selects the informative predictors, just as the KNN model did, though X3 receives an importance of zero.
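
We can corroborate this by inspecting the hinge terms of the final earth model directly; only predictors that appear in at least one basis function were actually selected (output omitted):

# Print the basis functions and coefficients of the selected MARS model
summary(marsTuned$finalModel)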

Next, we train two SVM models, one using the svmRadial (Radial Basis) kernel function, and one using the svmLinear (Linear) kernel function.

svmRTuned <- train(trainingData$x, trainingData$y,
                   method = "svmRadial",
                   preProc = c("center", "scale"),
                   tuneLength = 14,
                   trControl = trainControl(method = "cv"))
svmRTuned
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   C        RMSE      Rsquared   MAE     
##      0.25  2.520754  0.8007702  2.003864
##      0.50  2.268015  0.8149836  1.800009
##      1.00  2.090849  0.8343915  1.650897
##      2.00  1.978953  0.8445531  1.555449
##      4.00  1.913207  0.8540410  1.509552
##      8.00  1.909135  0.8546790  1.513806
##     16.00  1.912668  0.8542672  1.517069
##     32.00  1.912668  0.8542672  1.517069
##     64.00  1.912668  0.8542672  1.517069
##    128.00  1.912668  0.8542672  1.517069
##    256.00  1.912668  0.8542672  1.517069
##    512.00  1.912668  0.8542672  1.517069
##   1024.00  1.912668  0.8542672  1.517069
##   2048.00  1.912668  0.8542672  1.517069
## 
## Tuning parameter 'sigma' was held constant at a value of 0.06971645
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.06971645 and C = 8.

For the final SVM (Radial Basis) model selected during training, \(\sigma = 0.06971645\) and \(C = 8\).

We can use the Importance function from the rminer library to determine variable importance for this SVM (Radial Basis) model. This function doesn't work with model objects created by train, so we refit the final model with rminer's fit function instead, plugging in the tuned \(\sigma\) and \(C\) values. Then we produce a variable importance plot with rminer's mgraph function.

y <- trainingData$y
dat <- cbind(trainingData$x, y)
svmRFit <- fit(y ~ ., data = dat, model = "svm",
               kpar = list(sigma = 0.06971645), C = 8)
svmR.imp <- Importance(svmRFit, data = dat)
L <- list(runs = 1, sen = t(svmR.imp$imp),
          sresponses = svmR.imp$sresponses)
mgraph(L, graph = "IMP", leg = names(dat), col = "gray", Grid = 10,
       PDF = "")

svmLTuned <- train(trainingData$x, trainingData$y,
                   method = "svmLinear",
                   preProc = c("center", "scale"),
                   tuneLength = 14,
                   trControl = trainControl(method = "cv"))
svmLTuned
## Support Vector Machines with Linear Kernel 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   2.425194  0.7607949  1.960102
## 
## Tuning parameter 'C' was held constant at a value of 1

For the final SVM (Linear) model, the cost parameter was held constant at \(C = 1\) during training rather than tuned.
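
Note that caret's built-in grid for svmLinear is the single value \(C = 1\) regardless of tuneLength, so tuning the cost parameter requires an explicit grid. A sketch, with arbitrary candidate values of our own choosing:

# Hypothetical explicit cost grid; without it, caret holds C at 1
svmLGrid <- expand.grid(C = 2^(-2:7))
svmLTuned2 <- train(trainingData$x, trainingData$y,
                    method = "svmLinear",
                    preProc = c("center", "scale"),
                    tuneGrid = svmLGrid,
                    trControl = trainControl(method = "cv"))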

We again produce a variable importance plot for this SVM (Linear) model using the same method we used for the SVM (Radial Basis) model.

svmLFit <- fit(y ~ ., data = dat, model = "svm", C = 1)
svmL.imp <- Importance(svmLFit, data = dat)
L <- list(runs = 1, sen = t(svmL.imp$imp),
          sresponses = svmL.imp$sresponses)
mgraph(L, graph = "IMP", leg = names(dat), col = "gray", Grid = 10,
       PDF = "")

The five most important predictors in both the SVM (Radial Basis) model and the SVM (Linear) model are X1 to X5, so these two models select the informative predictors just as the previous two models did.
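
As a cross-check, caret's varImp can be applied to the train objects as well; kernlab SVMs have no model-specific importance measure, so caret falls back to a model-free, filter-based score, which should again rank X1 to X5 highest:

# Filter-based importance (caret's fallback for kernlab SVMs)
varImp(svmRTuned)
varImp(svmLTuned)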

Finally, we compare test-set performance across all four models.

test_pred1 <- predict(knnModel, testData$x)
test_pred2 <- predict(marsTuned, testData$x)
test_pred3 <- predict(svmRTuned, testData$x)
test_pred4 <- predict(svmLTuned, testData$x)
test_rsq1 <- as.numeric(R2(test_pred1, testData$y, form = "traditional"))
test_rsq2 <- as.numeric(R2(test_pred2, testData$y, form = "traditional"))
test_rsq3 <- as.numeric(R2(test_pred3, testData$y, form = "traditional"))
test_rsq4 <- as.numeric(R2(test_pred4, testData$y, form = "traditional"))
test_rmse1 <- as.numeric(RMSE(test_pred1, testData$y))
test_rmse2 <- as.numeric(RMSE(test_pred2, testData$y))
test_rmse3 <- as.numeric(RMSE(test_pred3, testData$y))
test_rmse4 <- as.numeric(RMSE(test_pred4, testData$y))
models <- c("KNN", "MARS", "SVMRadial", "SVMLinear")
rsqs <- round(c(test_rsq1, test_rsq2, test_rsq3, test_rsq4), 4)
rmses <- round(c(test_rmse1, test_rmse2, test_rmse3, test_rmse4), 4)
# data.frame (not cbind) keeps the metric columns numeric, so
# arrange() sorts them numerically rather than as strings
tbl <- data.frame(models, rsqs, rmses)
cols <- c("Model", "Predictive_RSquared", "RMSE")
colnames(tbl) <- cols
tbl <- tbl |>
    arrange(desc(Predictive_RSquared))
knitr::kable(tbl, format = "simple")
Model        Predictive_RSquared     RMSE
----------  --------------------  -------
MARS                      0.9197   1.4064
SVMRadial                 0.8238   2.0837
SVMLinear                 0.6901   2.7634
KNN                       0.5834   3.2041

The MARS model performs best on the test data, having the highest predictive \(R^2\) and the lowest \(RMSE\). Since the data were simulated with noise standard deviation \(\sigma = 1\), the MARS \(RMSE\) of about 1.41 is also the closest to the irreducible error floor of 1.
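
The cross-validated resampling distributions tell the same story. A sketch using caret's resamples, restricted to the three models tuned with 10-fold CV (the folds were not fixed with a shared seed, so this is only a rough comparison):

# Compare resampling distributions across the CV-tuned models
res <- resamples(list(MARS = marsTuned,
                      SVMRadial = svmRTuned,
                      SVMLinear = svmLTuned))
summary(res)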

Exercise 7.5:

Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.

data(ChemicalManufacturingProcess)
# Impute missing values with k-nearest neighbors, as in Exercise 6.3
x <- colSums(is.na(ChemicalManufacturingProcess))
missing_val_cols <- names(x[x > 0])
ChemicalManufacturingProcess <- ChemicalManufacturingProcess |>
    VIM::kNN(variable = missing_val_cols, k = 15, numFun = weighted.mean,
             weightDist = TRUE, imp_var = FALSE)
# Drop near-zero-variance predictors
nzv_predictors <- nearZeroVar(ChemicalManufacturingProcess |> select(-Yield),
                              names = TRUE, saveMetrics = FALSE)
ChemicalManufacturingProcess <- ChemicalManufacturingProcess |>
    select(-all_of(nzv_predictors))
# Shuffle the rows, then split roughly 70/30 into training and test sets
rows <- sample(nrow(ChemicalManufacturingProcess))
ChemicalManufacturingProcess <- ChemicalManufacturingProcess[rows, ]
sample <- sample(c(TRUE, FALSE), nrow(ChemicalManufacturingProcess),
                 replace = TRUE, prob = c(0.7, 0.3))
train_CMP <- ChemicalManufacturingProcess[sample, ]
train_CMP_x <- train_CMP |>
    select(-Yield)
train_CMP_y <- as.numeric(train_CMP$Yield)
test_CMP <- ChemicalManufacturingProcess[!sample, ]
test_CMP_x <- test_CMP |>
    select(-Yield)
test_CMP_y <- as.numeric(test_CMP$Yield)
knnModel_CMP <- train(x = train_CMP_x, y = train_CMP_y,
                      method = "knn",
                      preProc = c("center", "scale"),
                      tuneLength = 10)
knnModel_CMP
## k-Nearest Neighbors 
## 
## 122 samples
##  56 predictor
## 
## Pre-processing: centered (56), scaled (56) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 122, 122, 122, 122, 122, 122, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    5  1.431076  0.4192911  1.129584
##    7  1.425560  0.4244557  1.144716
##    9  1.418809  0.4368523  1.147813
##   11  1.413249  0.4460603  1.155365
##   13  1.401882  0.4607233  1.150914
##   15  1.408957  0.4611765  1.164165
##   17  1.407791  0.4714171  1.164250
##   19  1.426362  0.4619893  1.177743
##   21  1.440414  0.4569905  1.186808
##   23  1.447820  0.4584104  1.193982
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 13.

We first train a KNN model and determine that \(k = 13\) is optimal.
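
The tuning profile is easy to visualize with caret's plot method for train objects:

# RMSE as a function of k across the tuning grid
plot(knnModel_CMP)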

marsGrid_CMP <- expand.grid(.degree = 1:2, .nprune = 2:25)
marsTuned_CMP <- train(train_CMP_x, train_CMP_y,
                       method = "earth",
                       # Explicitly declare the candidate models to test
                       tuneGrid = marsGrid_CMP,
                       trControl = trainControl(method = "cv"))
marsTuned_CMP
## Multivariate Adaptive Regression Spline 
## 
## 122 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 110, 110, 110, 110, 110, 109, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE      Rsquared   MAE      
##   1        2      1.345663  0.5304703  1.0743235
##   1        3      1.253191  0.5620742  0.9903910
##   1        4      1.238262  0.5700063  0.9897225
##   1        5      1.278363  0.5450584  1.0100193
##   1        6      1.958623  0.5029833  1.2572020
##   1        7      1.858991  0.5344816  1.2056641
##   1        8      1.832332  0.5501380  1.1922490
##   1        9      1.923641  0.5595617  1.2171294
##   1       10      1.947791  0.5522955  1.2432530
##   1       11      1.952605  0.5602027  1.2090060
##   1       12      1.954832  0.5566508  1.2177418
##   1       13      1.989781  0.5454036  1.2614341
##   1       14      2.014663  0.5263943  1.2815657
##   1       15      1.978698  0.5464731  1.2592851
##   1       16      1.920762  0.5518982  1.2312847
##   1       17      1.934162  0.5531826  1.2363562
##   1       18      1.934162  0.5531826  1.2363562
##   1       19      1.945713  0.5484018  1.2622405
##   1       20      1.940239  0.5549285  1.2597543
##   1       21      1.942180  0.5558931  1.2617937
##   1       22      1.941669  0.5553461  1.2608124
##   1       23      1.937588  0.5569318  1.2548077
##   1       24      1.932762  0.5637326  1.2518945
##   1       25      1.932762  0.5637326  1.2518945
##   2        2      1.345663  0.5304703  1.0743235
##   2        3      1.212485  0.5816016  0.9845655
##   2        4      1.257410  0.5720474  1.0123890
##   2        5      1.285869  0.5404506  1.0193186
##   2        6      1.344152  0.5442357  1.0582372
##   2        7      1.289789  0.5565622  1.0193258
##   2        8      1.228916  0.5977539  0.9342822
##   2        9      1.183248  0.6144957  0.9146521
##   2       10      1.180303  0.6136883  0.9102565
##   2       11      1.385462  0.5696106  1.0190707
##   2       12      1.410491  0.5879520  1.0188693
##   2       13      1.392205  0.5883381  0.9926860
##   2       14      1.428436  0.5755075  1.0249686
##   2       15      1.511300  0.5480531  1.0588393
##   2       16      1.506910  0.5518312  1.0563591
##   2       17      1.538118  0.5395622  1.0839772
##   2       18      1.525670  0.5452110  1.0770787
##   2       19      1.527740  0.5464442  1.0789165
##   2       20      1.534004  0.5498893  1.1012723
##   2       21      1.521465  0.5575213  1.0936325
##   2       22      1.525954  0.5583420  1.0983232
##   2       23      1.516199  0.5616723  1.0910192
##   2       24      1.514744  0.5622591  1.0890956
##   2       25      1.514744  0.5622591  1.0890956
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 10 and degree = 2.

We next train a MARS model and determine that \(nprune = 10\) and \(degree = 2\) are optimal.

svmRTuned_CMP <- train(train_CMP_x, train_CMP_y,
                       method = "svmRadial",
                       preProc = c("center", "scale"),
                       tuneLength = 14,
                       trControl = trainControl(method = "cv"))
svmRTuned_CMP
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 122 samples
##  56 predictor
## 
## Pre-processing: centered (56), scaled (56) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 109, 110, 110, 110, 110, 110, ... 
## Resampling results across tuning parameters:
## 
##   C        RMSE      Rsquared   MAE      
##      0.25  1.387692  0.5034995  1.1343735
##      0.50  1.266481  0.5388283  1.0259833
##      1.00  1.181177  0.5843305  0.9454245
##      2.00  1.149952  0.6080905  0.9028646
##      4.00  1.147200  0.6154050  0.8963250
##      8.00  1.131401  0.6264408  0.8872035
##     16.00  1.131401  0.6264408  0.8872035
##     32.00  1.131401  0.6264408  0.8872035
##     64.00  1.131401  0.6264408  0.8872035
##    128.00  1.131401  0.6264408  0.8872035
##    256.00  1.131401  0.6264408  0.8872035
##    512.00  1.131401  0.6264408  0.8872035
##   1024.00  1.131401  0.6264408  0.8872035
##   2048.00  1.131401  0.6264408  0.8872035
## 
## Tuning parameter 'sigma' was held constant at a value of 0.01445795
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.01445795 and C = 8.

We next train an SVM (Radial Basis) model and determine that \(\sigma = 0.01445795\) and \(C = 8\) are optimal.

svmLTuned_CMP <- train(train_CMP_x, train_CMP_y,
                       method = "svmLinear",
                       preProc = c("center", "scale"),
                       tuneLength = 14,
                       trControl = trainControl(method = "cv"))
svmLTuned_CMP
## Support Vector Machines with Linear Kernel 
## 
## 122 samples
##  56 predictor
## 
## Pre-processing: centered (56), scaled (56) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 110, 109, 110, 110, 110, 110, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   2.773556  0.4807428  1.583001
## 
## Tuning parameter 'C' was held constant at a value of 1

We next train an SVM (Linear) model; as before, the cost parameter is held constant at \(C = 1\) rather than tuned.

test_pred1 <- predict(knnModel_CMP, test_CMP_x)
test_pred2 <- predict(marsTuned_CMP, test_CMP_x)
test_pred3 <- predict(svmRTuned_CMP, test_CMP_x)
test_pred4 <- predict(svmLTuned_CMP, test_CMP_x)
test_rsq1 <- as.numeric(R2(test_pred1, test_CMP_y, form = "traditional"))
test_rsq2 <- as.numeric(R2(test_pred2, test_CMP_y, form = "traditional"))
test_rsq3 <- as.numeric(R2(test_pred3, test_CMP_y, form = "traditional"))
test_rsq4 <- as.numeric(R2(test_pred4, test_CMP_y, form = "traditional"))
test_rmse1 <- as.numeric(RMSE(test_pred1, test_CMP_y))
test_rmse2 <- as.numeric(RMSE(test_pred2, test_CMP_y))
test_rmse3 <- as.numeric(RMSE(test_pred3, test_CMP_y))
test_rmse4 <- as.numeric(RMSE(test_pred4, test_CMP_y))
models <- c("KNN", "MARS", "SVMRadial", "SVMLinear")
rsqs <- round(c(test_rsq1, test_rsq2, test_rsq3, test_rsq4), 4)
rmses <- round(c(test_rmse1, test_rmse2, test_rmse3, test_rmse4), 4)
# data.frame (not cbind) keeps the metric columns numeric for sorting
tbl <- data.frame(models, rsqs, rmses)
cols <- c("Model", "Predictive_RSquared", "RMSE")
colnames(tbl) <- cols
tbl <- tbl |>
    arrange(desc(Predictive_RSquared))
knitr::kable(tbl, format = "simple")
Model        Predictive_RSquared     RMSE
----------  --------------------  -------
SVMRadial                 0.5181   1.2982
MARS                      0.4319   1.4095
KNN                       0.3299   1.5308
SVMLinear                -1.7948   3.1263

The SVM (Radial Basis) model performs best on the test data, with the highest predictive \(R^2\) and the lowest \(RMSE\). (The negative predictive \(R^2\) for the SVM (Linear) model means it predicts the test data worse than simply using the mean of the observed responses.)

Below are the top 10 most important predictors in the optimal SVM (Radial Basis) model:

y <- train_CMP_y
dat <- cbind(train_CMP_x, y)
svmRFit <- fit(y ~ ., data = dat, model = "svm",
               kpar = list(sigma = 0.01445795), C = 8)
svmR.imp <- Importance(svmRFit, data = dat)
L <- list(runs = 1, sen = t(svmR.imp$imp),
          sresponses = svmR.imp$sresponses)
# Keep only the predictors whose sensitivity clears the cutoff;
# 0.035 was chosen because it retains exactly the top 10 predictors
sen_vec <- as.numeric(L[["sen"]])
delete <- which(sen_vec < 0.035)
copy <- L
copy[["sen"]] <- t(as.matrix(copy[["sen"]][, -delete]))
copy[["sresponses"]] <- copy[["sresponses"]][-delete]
top_names <- sapply(copy[["sresponses"]], function(s) s[["n"]])
mgraph(copy, graph = "IMP", leg = top_names, col = "gray",
       PDF = "")

The manufacturing process variables dominate the list. For comparison, the top 10 most important predictors for the linear regression model from Exercise 6.3 were:

top_10_linear_vec <- c("ManufacturingProcess09", "ManufacturingProcess32",
                       "ManufacturingProcess34", "ManufacturingProcess45",
                       "ManufacturingProcess37", "ManufacturingProcess17",
                       "ManufacturingProcess29", "ManufacturingProcess28",
                       "ManufacturingProcess36", "BiologicalMaterial05")
top_10_linear <- as.data.frame(top_10_linear_vec)
colnames(top_10_linear) <- "Linear Top 10"
knitr::kable(top_10_linear, format = "simple")
Linear Top 10
----------------------
ManufacturingProcess09
ManufacturingProcess32
ManufacturingProcess34
ManufacturingProcess45
ManufacturingProcess37
ManufacturingProcess17
ManufacturingProcess29
ManufacturingProcess28
ManufacturingProcess36
BiologicalMaterial05

ManufacturingProcess34, ManufacturingProcess45, ManufacturingProcess29, and ManufacturingProcess28 were very important to the linear regression model but do not make the top 10 list for the optimal SVM (Radial Basis) model.

The top predictors that are unique to the optimal SVM (Radial Basis) model are:

# unique_preds avoids shadowing base R's unique() and names()
unique_preds <- top_names[!top_names %in% top_10_linear_vec]
unique_df <- as.data.frame(unique_preds)
colnames(unique_df) <- "Top Predictors Unique to SVM (Radial Basis)"
knitr::kable(unique_df, format = "simple")
Top Predictors Unique to SVM (Radial Basis)
--------------------------------------------
BiologicalMaterial09
ManufacturingProcess04
ManufacturingProcess06
ManufacturingProcess20

keep <- c(unique_preds, "Yield")
plot_df <- train_CMP |>
    select(all_of(keep)) |>
    pivot_longer(cols = all_of(unique_preds), names_to = "Variable",
                 values_to = "Value")
p <- plot_df |>
    ggplot() +
    geom_point(aes(x = Value, y = Yield)) +
    facet_wrap(vars(Variable), scales = "free_x", ncol = 2)
p

It's a little strange to look for linear relationships after developing an optimal nonlinear regression model, but a few patterns emerge. For BiologicalMaterial09, Yield appears highest when the value falls roughly between 12 and 13.5. For ManufacturingProcess04, decreasing the value tends to increase Yield, though with considerable variance. Increasing ManufacturingProcess06 is associated with increased Yield. And ManufacturingProcess20 shows a clustering from which we can't discern any clear meaning. The nonlinear model is capturing structure that we can't easily assess visually.
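
To make those nonlinear tendencies easier to see, we could overlay a smoother on each panel; a sketch using ggplot2's geom_smooth with an explicit loess fit:

# Add a loess trend line to each faceted scatterplot
p + geom_smooth(aes(x = Value, y = Yield), method = "loess",
                se = FALSE, color = "steelblue")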